A private-AI deployment a partner can defend
A four-month build for a healthcare-adjacent organisation: on-premises model deployment with study- and arm-scoped retrieval, audit logging, and a runbook that lets the in-house IT lead operate the system unattended.
- Client: Healthcare-adjacent research operation, ~400 employees, US, HIPAA scope
- Duration: Four months
- Practice: Software & Private AI
The organisation's principal investigators wanted to use a language model on clinical-trial documentation. Their compliance counsel had reviewed the cloud-API option and concluded that sending the documents to a third-party model — even one with a Business Associate Agreement — created a risk profile the organisation could not defend. The team had read the public material on private AI deployment and had three constraints to satisfy at once: clinical data must never leave the boundary, retrieval must respect study and arm boundaries, and the system must be operable by the existing IT lead after handoff.
- Threat model written against the organisation's HIPAA posture, including the data flows and trust boundaries for prompts, completions, and retrieved documents
- Hardware sizing and procurement guidance for on-premises GPU infrastructure
- Open-weight model selection with a written justification against the organisation's confidentiality and accuracy requirements
- Retrieval architecture that enforces study and arm boundaries at the index layer, not the prompt layer
- Audit logging — immutable, paragraph-level, queryable for compliance review
- A written operator runbook and a working session with the in-house IT lead
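To illustrate what enforcing boundaries "at the index layer, not the prompt layer" can mean, here is a minimal sketch. It is not the code from this engagement: the names `Scope`, `Paragraph`, and `ScopedIndex` are hypothetical, and the keyword-overlap scoring stands in for whatever ranking a real deployment would use. The point it shows is structural. Each study and arm gets its own partition, and a query only ever reads the partitions the caller has been granted, so out-of-scope text never reaches the model's context in the first place.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical boundary key: one study, one arm."""
    study: str
    arm: str


@dataclass(frozen=True)
class Paragraph:
    doc_id: str
    paragraph_no: int
    text: str


class ScopedIndex:
    """One partition per (study, arm); a query reads only granted partitions."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def add(self, scope, paragraph):
        self._partitions[scope].append(paragraph)

    def search(self, query, granted, top_k=3):
        # Toy relevance score: number of query terms found in the paragraph.
        terms = set(query.lower().split())
        hits = []
        for scope in granted:
            # Partitions outside `granted` are never opened, so no prompt
            # instruction is needed to keep their contents out of the answer.
            for para in self._partitions.get(scope, []):
                overlap = len(terms & set(para.text.lower().split()))
                if overlap:
                    hits.append((overlap, scope, para))
        hits.sort(key=lambda hit: hit[0], reverse=True)
        return [(scope, para) for _, scope, para in hits[:top_k]]


index = ScopedIndex()
index.add(Scope("STUDY-A", "treatment"),
          Paragraph("protocol.pdf", 4, "dosage was increased at week six"))
index.add(Scope("STUDY-A", "placebo"),
          Paragraph("protocol.pdf", 9, "placebo dosage remained constant"))

# A caller granted only the treatment arm cannot retrieve placebo-arm text.
results = index.search("dosage week six",
                       granted={Scope("STUDY-A", "treatment")})
```

The design choice worth noting is that the grant is an argument to the search itself rather than a filter applied to results afterwards, which keeps the boundary check out of anything the model or the prompt can influence.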
Weeks 1–2 were discovery: a threat model, an architecture sketch, and a signed design document the head of research and the IT lead both reviewed. Months 1–3 were the build — weekly written updates, production-quality code with tests, and a deployment that ran end-to-end on the organisation's hardware by the end of month two. Month four was hardening, retrieval-edge-case work, and the operator handoff: a six-hour working session, a runbook in writing, and a knowledge-transfer document that anticipates the questions a future IT lead will ask.
The system has been in active use by three principal investigators for the four months since handoff. The in-house IT lead operates the deployment unattended; Karakor has been retained for a quarterly check-in but is not on operational call. The organisation's compliance counsel has reviewed the architecture and incorporated the deployment into the organisation's HIPAA risk register without exception. The model has refused to answer twenty-three queries it could not source from the indexed corpus, and the team has documented those refusals as the feature working correctly.
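Counting and reviewing refusals presupposes a log that compliance staff can query and trust. As a sketch of one common approach, and not a description of the deployed system, the example below chains each entry to the previous one with a SHA-256 hash so that later edits become detectable. The `AuditLog` class and the event field names are hypothetical. An in-memory chain like this is tamper-evident rather than truly immutable; a real deployment would presumably pair it with write-once storage.

```python
import hashlib
import json


class AuditLog:
    """Append-only, hash-chained event log (illustrative only)."""

    GENESIS = "0" * 64  # placeholder hash that precedes the first entry

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(prev_hash, event):
        # Serialise with sorted keys so the same event always hashes the same.
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event):
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        entry = {
            "event": event,
            "prev_hash": prev_hash,
            "hash": self._digest(prev_hash, event),
        }
        self._entries.append(entry)
        return entry

    def verify(self):
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True

    def query(self, **filters):
        """Return events whose fields match every keyword filter."""
        return [
            entry["event"]
            for entry in self._entries
            if all(entry["event"].get(k) == v for k, v in filters.items())
        ]


log = AuditLog()
log.append({"user": "pi-1", "outcome": "answered",
            "doc_id": "protocol.pdf", "paragraph_no": 4})
log.append({"user": "pi-2", "outcome": "refused",
            "doc_id": None, "paragraph_no": None})

refusals = log.query(outcome="refused")
chain_intact = log.verify()
```

Recording the document and paragraph number on each event is what would make the log paragraph-level: a reviewer can trace an answer back to the exact passage that supported it.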
When a model invents a citation, the failure is almost never the model. It is the system that decided what the model was allowed to see. One retrieval system that refuses to answer when the source is missing is worth ten that produce a confident, plausible, wrong answer.
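The refusal behaviour described here can be sketched as a gate that sits in front of the model. This is an assumption-laden illustration, not the production code: `answer_or_refuse`, `Response`, and the `generate` callable (a stand-in for the on-premises model) are all hypothetical names. The gate checks whether retrieval returned any source before the model is invoked at all, so a missing source yields a refusal instead of an unsupported answer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A retrieved source: (doc_id, paragraph_no, text).
Source = Tuple[str, int, str]

REFUSAL = "No indexed source supports an answer to this query."


@dataclass
class Response:
    refused: bool
    text: str
    citations: List[Tuple[str, int]] = field(default_factory=list)


def answer_or_refuse(
    query: str,
    sources: List[Source],
    generate: Callable[[str, str], str],
    min_sources: int = 1,
) -> Response:
    """Call the model only when retrieval produced enough sources."""
    if len(sources) < min_sources:
        # The model is never called, so it has no chance to improvise.
        return Response(refused=True, text=REFUSAL)
    context = "\n".join(text for _, _, text in sources)
    citations = [(doc_id, para_no) for doc_id, para_no, _ in sources]
    return Response(refused=False, text=generate(query, context),
                    citations=citations)


def stub_model(query: str, context: str) -> str:
    """Placeholder for the real model; it just echoes the context."""
    return f"Based on the indexed text: {context}"


unsupported = answer_or_refuse("What was the week-ten dosage?", [], stub_model)
supported = answer_or_refuse(
    "What happened at week six?",
    [("protocol.pdf", 4, "dosage was increased at week six")],
    stub_model,
)
```

Every non-refused response carries its citations with it, which is what lets a reader check the answer against the source paragraph rather than take it on trust.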
We respond within two business days. Scoping calls are obligation-free and run thirty minutes.
