SOFTWARE · PRIVATE AI

A private-AI deployment a partner can defend

A four-month build for a healthcare-adjacent organisation: on-premises model deployment with study-aware retrieval, audit logging, and a runbook that lets the in-house IT lead operate the system unattended.

Client: Healthcare-adjacent research operation, ~400 employees, US, HIPAA scope.
Duration: Four months
Practice: Software & Private AI

Results

0: Bytes of clinical-trial data that left the organisation's network boundary

23: Queries the model refused (the system declining when the source was missing, working as designed)

4 months: Discovery to operator handoff, weekly written updates throughout

0 exceptions: Recorded against the organisation's HIPAA register after compliance review

01 — Challenge

The organisation's principal investigators wanted to use a language model on clinical-trial documentation. Their compliance counsel had reviewed the cloud-API option and concluded that sending the documents to a third-party model — even one with a Business Associate Agreement — created a risk profile the organisation could not defend. The team had read the public material on private AI deployment and had three constraints to satisfy at once: clinical data must never leave the boundary, retrieval must respect study and arm boundaries, and the system must be operable by the existing IT lead after handoff.

02 — Scope

Threat model written against the organisation's HIPAA posture, including the data flows and trust boundaries for prompts, completions, and retrieved documents
Hardware sizing and procurement guidance for on-premises GPU infrastructure
Open-weight model selection with a written justification against the organisation's confidentiality and accuracy requirements
Retrieval architecture that enforces study and arm boundaries at the index layer, not the prompt layer
Audit logging — immutable, paragraph-level, queryable for compliance review
A written operator runbook and a working session with the in-house IT lead
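
The audit-logging item above can be illustrated with a short sketch. The case study does not say how immutability was achieved; the version below assumes a hash-chained, append-only log, which makes tampering detectable rather than impossible. The names AuditLog, append, verify, and for_study, and the entry fields, are hypothetical and chosen for illustration only.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only log: each entry commits to the hash of the one before it."""
    entries: list = field(default_factory=list)

    def append(self, user: str, study_id: str, paragraph_ids: list) -> dict:
        body = {
            "user": user,
            "study_id": study_id,
            "paragraph_ids": list(paragraph_ids),  # paragraph-level references
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash; any edited or reordered entry breaks the chain.
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def for_study(self, study_id: str) -> list:
        # Simple query a compliance reviewer might run.
        return [e for e in self.entries if e["study_id"] == study_id]
```

In a real deployment the entries would live on write-once storage rather than in memory; the chain only shows how a reviewer could confirm that no entry was altered after the fact.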

03 — Work

Weeks 1–2 were discovery: a threat model, an architecture sketch, and a signed design document the head of research and the IT lead both reviewed. Months 1–3 were the build — weekly written updates, production-quality code with tests, and a deployment that ran end-to-end on the organisation's hardware by the end of month two. Month four was hardening, retrieval-edge-case work, and the operator handoff: a six-hour working session, a runbook in writing, and a knowledge-transfer document that anticipates the questions a future IT lead will ask.

04 — Outcome

The system has been in active use across three principal investigators for the four months since handoff. The in-house IT lead operates the deployment unattended; Karakor has been retained for a quarterly check-in but is not on operational call. The organisation's compliance counsel has reviewed the architecture and incorporated the deployment into the organisation's HIPAA risk register without exception. The model has refused to answer twenty-three queries it could not source from the indexed corpus — refusals the team has documented as the feature working correctly.

When a model invents a citation, the failure is almost never the model. It is the system that decided what the model was allowed to see. One retrieval system that refuses to answer when the source is missing is worth ten that produce a confident, plausible, wrong answer.

From the deliverable

Short excerpts representative of the actual documents shipped in this engagement. Anonymised to preserve client confidentiality; faithful to the structure and substance of what we delivered.

From the design document

Trust boundary: the on-premises GPU host. Nothing in scope leaves it. Prompts are constructed inside the boundary; retrieved chunks are read from a local vector store; completions are returned to the same host. The web front-end terminates TLS inside the boundary. There is no third-party model API in the data path.
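
One way to make the boundary statement above checkable is a start-up test over every configured endpoint. This is a sketch under assumptions, not an excerpt from the delivered system: the function name is_inside_boundary and the ALLOWED_NETWORKS values are hypothetical, and the sketch deliberately rejects hostnames because DNS could resolve them to an address outside the boundary.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical boundary definition: loopback plus one internal subnet.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.20.0.0/24"),
]


def is_inside_boundary(url: str, allowed_networks=ALLOWED_NETWORKS) -> bool:
    """Return True only if the URL's host is a literal IP inside the boundary."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address: treat as outside the boundary.
        return False
    return any(addr in net for net in allowed_networks)
```

A deployment script could call this for the model server, the vector store, and the front-end, and refuse to start if any of them returns False.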

From the operator runbook

If the model returns a response without source citations, the system has either (a) been queried about something outside the indexed corpus, or (b) had its retrieval layer fail open. Both are reportable events. The expected behaviour, when the source is missing, is a refusal. Do not configure the system to fall back to answering without citation. That is the failure mode this architecture exists to prevent.
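
The refusal rule in this excerpt can be sketched as a thin wrapper around retrieval and generation. Everything named here (Chunk, Answer, answer_with_citations, and the retrieve and generate callables) is hypothetical; the sketch only shows the fail-closed shape: no retrieved source, or a retrieval error, yields a refusal with a reason that can be reported, never an uncited answer.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

REFUSAL_TEXT = "No source found in the indexed corpus; declining to answer."


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    paragraph: int
    text: str


@dataclass(frozen=True)
class Answer:
    text: str
    citations: tuple
    refused: bool
    reason: Optional[str] = None  # set on refusals so the event can be reported


def answer_with_citations(
    question: str,
    retrieve: Callable[[str], Sequence[Chunk]],
    generate: Callable[[str, Sequence[Chunk]], str],
) -> Answer:
    try:
        chunks = list(retrieve(question))
    except Exception:
        # Fail closed: a broken retrieval layer must not lead to an uncited answer.
        return Answer(REFUSAL_TEXT, (), True, "retrieval_error")
    if not chunks:
        return Answer(REFUSAL_TEXT, (), True, "out_of_corpus")
    text = generate(question, chunks)
    citations = tuple(f"{c.doc_id}#p{c.paragraph}" for c in chunks)
    return Answer(text, citations, False)
```

The two reason values correspond to cases (a) and (b) in the runbook text; how they are logged and escalated is left out of the sketch.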

From the threat model

Prompt-injection scenarios are mitigated at the retrieval layer, not the prompt layer. A document that says "ignore previous instructions" cannot affect the system's behaviour because the instructions are not in the prompt — they are in the retrieval policy, enforced before any document is selected.
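
A minimal sketch of a retrieval policy applied before selection, assuming each indexed chunk carries study and arm metadata and each user holds explicit grants. The types IndexedChunk and Grant and the keyword-overlap scoring are hypothetical stand-ins; a real deployment would use vector similarity. The point illustrated is that the access decision reads metadata only, so text inside a document cannot widen what is eligible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    study_id: str
    arm: str
    text: str


@dataclass(frozen=True)
class Grant:
    study_id: str
    arms: frozenset


def candidates(index, grants):
    """Policy step: keep only chunks whose (study, arm) the user is granted."""
    allowed = {(g.study_id, arm) for g in grants for arm in g.arms}
    return [c for c in index if (c.study_id, c.arm) in allowed]


def retrieve(index, grants, query, k=3):
    pool = candidates(index, grants)  # policy is applied before any ranking
    terms = set(query.lower().split())
    # Toy relevance score: number of query words that appear in the chunk.
    scored = [(len(terms & set(c.text.lower().split())), c) for c in pool]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]
```

With a grant for one study only, a chunk from another study is never ranked, whatever its text says, and a query with no match inside the permitted pool returns an empty list.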


Engage

We respond within two business days. Scoping calls are obligation-free and run thirty minutes.
