Welcome to the Machine

The Sector Files #2: Healthcare

Apr 01, 2026

When the machine helps write the patient.

AI is entering healthcare through the language of assistance. It is sold as support for routine work: ambient documentation, note drafting, referral letters, discharge summaries, handovers, coding, and patient messaging. The attraction is obvious. Clinical documentation takes time, contributes to burnout, and often keeps clinicians at screens rather than with patients. The evidence that ambient voice and documentation tools can reduce that burden is now substantial. But in healthcare, routine does not mean low-stakes. A clinical note is not just admin. It is a record that travels: between clinicians, across departments, into referrals, into discharge, into complaints, into litigation, and sometimes into the long institutional memory of a person’s care. If AI starts shaping those records before healthcare systems have adequate ways to preserve provenance, challenge distortion, and correct the record, the issue is not simply efficiency. It is fidelity.

The central claim here is not a simple hallucination story. The problem is not only that a model may generate something false. The problem is that AI systems are beginning to generate, transform, and transmit clinical records before healthcare systems have adequate standards for meaning integrity, challenge, and correction. The governance failure is not inaccuracy alone, but the premature hardening of AI-shaped summaries into clinically actionable institutional fact.

The strongest available evidence still points first to workflow gain. Across studies and pilot settings, ambient scribing and related tools reduce documentation time and often increase patient-facing attention. The GOSH-led nine-site London study [1] is especially important in the UK context: more than 17,000 encounters, a reported 23.5% increase in patient interaction time, and a 51.7% reduction in documentation time. Those are not trivial gains. They should not be waved away. But they do not answer the governance question. They show that the tools can help. They do not show that the records they produce are sufficiently faithful, challengeable, or safe as they begin to travel through care systems.

This is where the Digital Narrative Care (DNC) lens matters. Existing governance language tends to ask whether an AI-generated note is accurate, safe, secure, interoperable, or usable. Those are necessary questions. They are not the only ones. DNC asks an additional one: does the record remain faithful enough to the patient, the presentation, and the context to be safely acted upon downstream? That is what meaning integrity means. A note can be factually correct and clinically useful, and still misrepresent the person whose encounter produced it.

That is not a speculative claim. The pre-AI baseline already tells us as much. Concealed-audio comparison work, notably Weiner et al. [2], showed just how imperfect physician-authored notes already were before GenAI. Experimental and observational work on stigmatising language then sharpened the point: records can shape downstream care not only through what they state, but through how they frame the patient. A randomised vignette study by Goddu et al. (*Journal of General Internal Medicine*, 2018) found that physicians-in-training who read stigmatising language about an otherwise identical hypothetical patient chose less aggressive pain management. Park et al. [3] catalogued the forms such language takes in real physician notes. Himmelstein et al. [4] then confirmed the pattern at scale: stigmatising language appeared in 2.5% of nearly 49,000 real hospital admission notes, and more often in notes about Black patients. AI trained on historical records does not inherit only clinical knowledge. It inherits whatever documentation biases those records contain.

That is why record hardening is the right term for what happens next. An AI-generated draft does not remain a draft for long. It is reviewed quickly, signed off under pressure, entered into the electronic record, and then treated downstream as the record of what happened. Once that hardening occurs, later clinicians often encounter the account not as a provisional representation but as settled institutional fact. If the note is wrong, compressed, or misleading, the problem is no longer just one clinician’s drafting error. It becomes a systems problem.

The quality of that systems problem is now becoming clearer. Asgari et al. [5] found a 1.47% hallucination rate and a 3.45% omission rate across clinician-annotated sentences in AI-generated clinical notes, with hallucinations especially concentrated in the Plan section — the part of the note most likely to drive downstream action. The Plan finding matters because it means the risk is not only descriptive. It is action-bearing. A system that gets the plan wrong is not merely imperfect documentation support. It is potentially reshaping care.

Other reviews sharpen the same point. Ambient scribes do not merely reproduce older speech-recognition errors. They introduce a broader set of failure modes: hallucination, critical omission, misattribution, contextual misinterpretation, and the production of notes that feel complete enough to discourage scrutiny. This is where persuasive coherence does its most precise work. AI-generated notes can read as more thorough, more organised, and more authoritative than physician-authored notes. That can be useful. It can also make them more difficult to challenge, because smoothness is easily mistaken for fidelity.

This is also why “human in the loop” is not yet an adequate answer. Formally, most healthcare deployments still require clinician sign-off. In practice, that may mean reviewing a polished draft under time pressure, without specialty-specific training in AI failure modes, without a strong audit trail of what the system changed or omitted, and without clear institutional routes for contesting or correcting the record later. Lawton et al. [6] named the structural trap precisely: clinicians remain the accountable signatories to AI-shaped records while lacking the time, tools, and governance infrastructure needed to exercise meaningful oversight. Liability remains human. Agency does not always. And this is a structural problem, not an individual competence problem — it cannot be solved by asking clinicians to review more carefully, because the failure is architectural.

The clearest current proof of this gap may be the newest one. The NHS Confederation’s March 2026 guide, *Demystifying Clinical AI in Mental Health* [7], produced in partnership with Limbic, is practical, serious, and exactly the sort of document trusts are likely to use. It asks eight sensible questions about adoption. Yet none of them is the question DNC exists to ask: does the AI preserve what the patient actually meant? This is not a failure of the document. It is a faithful reflection of where the field currently stands. Its cited RDaSH pilot makes the point even more sharply. In those mental health settings, only 61–80% of ambient voice technology (AVT) output was correct without editing, and automation bias was explicitly flagged as a concern requiring training. That is not an argument against adoption. It is an argument against pretending that generic documentation governance can simply be lifted unchanged into Tier 3 settings.

The same guide also warns that deterministic AI triage can “force people into a single outcome” [7] by routing patients according to fixed logic that cannot accommodate the nuance of the case in front of it. That is a healthcare version of record hardening applied not only to documentation, but to pathway allocation itself. It shows that the DNC problem is not confined to what gets written in the note. It also appears in what gets decided from the note, the score, or the triage output before there is enough room for ambiguity, contest, or re-reading.

Current NHS governance frameworks are serious, but partial. NHS England’s ambient scribing guidance [8] — alongside the associated DTAC, DCB0129/0160 clinical safety standards, MHRA registration requirements, and supplier registry — indicates that documentation AI is not entering a regulatory vacuum. That matters. It means the right argument is not “there is no governance.” The right argument is more precise: the current architecture is stronger on clinical safety, privacy, security, and interoperability than it is on meaning integrity, downstream fidelity, and aftercare. In other words, it is better at governing the tool as a technical artefact than the record as a travelling clinical representation.

That gap becomes more serious as records move beyond the original consultation. The note that is “good enough” for one clinician in one moment may not be good enough when turned into a discharge summary, referral letter, handover note, coding justification, or later medico-legal record. And there is now a further complication. The ambient AI market is no longer only selling relief from documentation burden; it is increasingly selling revenue-cycle performance. A policy brief in *npj Digital Medicine* [9] documents the pattern directly: ambient AI increases documented diagnoses, raises coding levels, and generates measurable revenue gains. This matters because it names a distinct subtype of meaning distortion: incentive-shaped coherence. The note may be accurate in every stated fact, yet still be organised by billing logic rather than encounter fidelity. That is not a side issue. It is one of the clearest ways meaning can be bent without becoming obviously false.

There is also a scale problem. The NHS is consolidating its record infrastructure: shared care records, ICS-level data flows, the Connecting Care Records Programme, the Single Patient Record ambition, and interoperability frameworks that make records more mobile than they have ever been. An AI-generated note that misrepresents a patient in one encounter is a local governance failure. The same note, stripped of its AI provenance and transmitted through shared care systems, ambulance access, out-of-hours services, and ICS-wide pathways, becomes a systemic one. This is where DNC’s Digital Architectural Memory concept becomes directly useful. The issue is no longer only whether systems can exchange the record. It is whether they can preserve what the record has passed through, what has been altered, and what still needs to be interpreted with care. Once AI starts shaping official records in high-consequence human systems, efficiency gains and meaning risks arrive together.

This is where DNC’s third pillar — aftercare — becomes essential. In healthcare, aftercare does not mean soft support language. It means the governance of what happens to a record after it is created: who can challenge it, how, and with what accountability if the record travels badly. In practice that means:

- provenance tagging that identifies AI-generated content in the record

- audit trails for post-sign-off amendments that distinguish meaning-level changes from factual corrections

- patient routes to challenge AI-assisted records

- preservation of provenance when records move between systems

- escalation triggers for clinically significant hallucinations or omissions

- integration of AI documentation failures into incident frameworks rather than IT support channels

That is a serious governance proposition, not an ethical flourish.
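To make that concrete, here is a minimal sketch in Python of how the first few items on the list might look as a data structure. It is an illustration under stated assumptions, not a specification: every name in it (`NoteSegment`, `Origin`, `AmendmentKind`, `needs_escalation`) is hypothetical, none is drawn from any NHS standard, vendor product, or record schema, and the escalation rule is an invented placeholder rather than a clinical threshold.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Origin(Enum):
    """Provenance tag: who or what first drafted this part of the note."""
    CLINICIAN = "clinician"
    AI_DRAFT = "ai_draft"


class AmendmentKind(Enum):
    """Distinguishes factual corrections from meaning-level changes."""
    FACTUAL = "factual"   # e.g. a wrong dose or date
    MEANING = "meaning"   # e.g. the patient's framing or uncertainty was lost


@dataclass(frozen=True)
class Amendment:
    kind: AmendmentKind
    author: str
    reason: str
    old_text: str
    new_text: str
    at: datetime


@dataclass
class NoteSegment:
    section: str          # e.g. "History", "Plan"
    text: str
    origin: Origin
    amendments: list[Amendment] = field(default_factory=list)

    def amend(self, kind: AmendmentKind, author: str, reason: str, new_text: str) -> None:
        """Record the change in the audit trail rather than overwriting silently."""
        self.amendments.append(
            Amendment(kind, author, reason, self.text, new_text,
                      datetime.now(timezone.utc))
        )
        self.text = new_text

    def needs_escalation(self) -> bool:
        """Assumed rule: a meaning-level fix to AI-drafted Plan content is
        treated as an incident candidate, not an IT support ticket."""
        return (
            self.origin is Origin.AI_DRAFT
            and self.section == "Plan"
            and any(a.kind is AmendmentKind.MEANING for a in self.amendments)
        )

    def export(self) -> dict:
        """Serialise for transfer; provenance and audit trail travel with the text."""
        return {
            "section": self.section,
            "text": self.text,
            "origin": self.origin.value,
            "amendments": [
                {
                    "kind": a.kind.value,
                    "author": a.author,
                    "reason": a.reason,
                    "old_text": a.old_text,
                    "new_text": a.new_text,
                    "at": a.at.isoformat(),
                }
                for a in self.amendments
            ],
        }
```

The design choice that matters is that `export` never returns the text alone. If the receiving system gets the origin tag and the amendment history along with the words, a later reader can see that a Plan entry began as an AI draft and was corrected for meaning, which is exactly the information that is lost when a note hardens into settled fact.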

One of the strengths of the Sector Files architecture is that it allows each piece to look sideways as well as inward. This pattern is not unique to healthcare. Bruff and Groves at the Ada Lovelace Institute [10] reach strikingly similar conclusions about AI transcription tools in social care: rapid adoption under resource pressure, incomplete evidence, unclear human-in-the-loop requirements, and real risks of hallucinations and misrepresentations entering statutory records. Social care is not the focus here. But the overlap clarifies that the healthcare problem is not an isolated quirk of clinical IT.

What DNC adds, then, is not another generic responsible-AI layer. It introduces a different object of assurance. Record integrity asks what was written, omitted, inferred, tagged, corrected, and retained. Meaning integrity asks whether the note preserves the patient’s own framing, clinical uncertainty, contextual detail, and protection from biased or incentive-shaped distortion. Aftercare asks what happens once the note begins to travel: whether provenance survives transmission, whether patients and clinicians can challenge it, and whether institutions know how to respond when an AI-shaped record turns out to have misled.

The danger is not only that AI may put something false into a clinical record. It is that AI is beginning to shape how healthcare systems witness, summarise, transmit, code, and remember patients before those systems have built adequate ways to preserve meaning, challenge distortion, and govern the afterlife of the record. The newest mental health evidence only strengthens that claim. The ICS and shared-record architecture scale it. And the coding-arms-race strand gives it a harder edge. The field still lacks the one question DNC exists to ask: does this system preserve what the patient actually meant? Until healthcare systems treat that as a core governance requirement rather than a background assumption, they should be more cautious about what they allow these tools to write down on their behalf.

AI doesn’t just write the note. It starts to write the patient.

Keep Meaning Human

Keep Healthcare Human

-----

References

[1] Wray J, Sridharan S et al. GOSH-led nine-site London ambient scribing evaluation (GOSH DRIVE / NHS England, 2025).

[2] Weiner SJ, Wang S, Kelly B, Sharma G, Schwartz A. “How accurate is the medical record? A comparison of the physician’s note with a concealed audio recording in unannounced standardized patient encounters.” *Journal of the American Medical Informatics Association* 27(5):770–775 (2020).

[3] Park J, Saha S, Chee B, Taylor J, Beach MC. “Physician use of stigmatizing language in patient medical records.” *JAMA Network Open* 4(7):e2117052 (2021).

[4] Himmelstein G, Bates D, Zhou L. “Examination of stigmatizing language in the electronic health record.” *JAMA Network Open* 5(1):e2144967 (2022).

[5] Asgari E et al. “A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation.” *npj Digital Medicine* (2025).

[6] Lawton T, Morgan P, Porter Z et al. “Clinicians risk becoming ‘liability sinks’ for artificial intelligence.” *Future Healthcare Journal* 11(1):100007 (2024).

[7] NHS Confederation / Limbic. *Demystifying Clinical AI in Mental Health* (26 March 2026).

[8] NHS England. Guidance on the use of AI-enabled ambient scribing products in health and care settings (April 2025, updated 2026).

[9] Policy brief: ambient AI scribes and the coding arms race. *npj Digital Medicine* (December 2025); see also Holmgren AJ et al., *JAMA Network Open* (January 2026), and Trilliant Health, *Increased Outpatient Coding Intensity Following Hospital Adoption of AI-Enabled Scribing* (March 2026).

[10] Bruff O, Groves L. *Scribe and prejudice? Exploring the use of AI transcription tools in social care.* Ada Lovelace Institute, 11 February 2026.

-----

Suggested further reading

- Ada Lovelace Institute, *Scribe and prejudice?* for the cross-sector comparator.

- NHS Confederation / Limbic, *Demystifying Clinical AI in Mental Health* for the newest Tier 3 comparator and outreach seam.
