Client Work // 04 · Healthcare AI · Prompt Engineering · LLM Architecture

When Seconds Matter, Doctors Shouldn't Have to Hunt for Information

EHR systems were built for documentation. Not for thinking under time pressure. This is a prompt architecture and interface concept built around one constraint: what should AI never do in a clinical setting?

Role

Lead UX Designer · Prompt Engineer

Type

Prototype · Ongoing

Stack

LLM · Prompt Architecture · React

Status

Ongoing. No hallucinations observed in tested scenarios.

The Problem

"A physician has 60–90 seconds to scan a patient's history before entering the consultation room."

Reported

Physician burnout pressure

Estimated

Diagnostic errors annually (US)

~7 items

Working memory capacity

A physician walks into a consultation with about ninety seconds before the patient looks up. In that window they need to know what matters. Not everything. What matters. The electronic health record gives them everything. Critical alerts sit next to routine observations. Missing information looks identical to present information. The system does not tell you what it does not know. The cognitive cost of sorting signal from noise falls entirely on the doctor, every time, before the conversation has even started.

"14 of 21 major studies reported an association between clinician burnout and clinically significant medical errors. The AMA frames this as a system failure. Not a lack of physician resilience."

Source: PubMed burnout–error meta-analysis; Armstrong Institute (Johns Hopkins)

My Role

Lead UX Designer · Prompt Engineer

I started from the opposite end of the problem: not what the AI can do, but what it should never do. That question shaped every decision. The AI should not interpret. It should not diagnose. It should not present uncertain data as though it were certain. Its job is to transform a fragmented pile of records into a card a physician can read in under ten seconds. Surface what is critical. Flag what is missing. Stop there. That constraint is not a limitation. It is the design.

Problem framing and research synthesis
Prompt architecture and constraint design
Output schema and card UI design
Hallucination testing and edge-case validation

The Solution

The Patient Summary Card

I designed a prompt engineering architecture that uses an LLM as a clinical structuring layer. The system is designed to extract clinically relevant details, surface critical flags first, and output a clean structured card. Built to minimise hallucination and avoid clinical interpretation. Clinical judgment stays with the physician.
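As a rough illustration of what a "clean structured card" could look like as data, here is a minimal TypeScript sketch. The type names, field names, and status values are assumptions made for this example, not the prototype's actual schema. The object is populated with values from the first example card in this section.

```typescript
// Hypothetical output schema for the summary card; all names are illustrative.
type FieldStatus = "present" | "pending" | "incomplete" | "not_found";

interface CardAlert {
  title: string;  // short flag, e.g. "HbA1c overdue, 14 months"
  detail: string; // supporting fact taken from the record
}

interface DataQualityItem {
  label: string;
  status: FieldStatus;
  value?: string; // omitted when nothing is recorded
}

interface PatientSummaryCard {
  patientId: string;
  alerts: CardAlert[];
  visitReason: string;
  activeConditions: string[];
  currentMedications: string[];
  patientVoice: string | null; // verbatim quote, or null when none is found
  dataQuality: DataQualityItem[];
}

const exampleCard: PatientSummaryCard = {
  patientId: "PT-00412",
  alerts: [
    { title: "HbA1c overdue, 14 months", detail: "Guideline: retest every 6 months for Type 2 DM" },
    { title: "Cardiology referral unresolved", detail: "Referred 8 months ago, no outcome recorded" },
  ],
  visitReason: "Fatigue and shortness of breath, onset 6 weeks ago, gradual",
  activeConditions: ["Type 2 diabetes", "Hypertension", "Hyperlipidaemia"],
  currentMedications: ["Metformin 500mg", "Lisinopril 10mg", "Atorvastatin 20mg", "Aspirin 81mg"],
  patientVoice: "The tiredness is affecting my work. I'm worried the breathing could be something serious.",
  dataQuality: [
    { label: "Renal panel", status: "pending" },
    { label: "Allergy detail", status: "incomplete", value: "Penicillin, reaction type unspecified" },
    { label: "Last BP reading", status: "present", value: "148 / 92 mmHg" },
  ],
};
```

Placing `alerts` first in the interface mirrors the card's reading order, and typing `patientVoice` as `string | null` forces the interface to handle the "none found" case explicitly rather than render an empty string.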

PT-00412 · Pre-consultation · Today 09:30

Margaret H., 54

Female · GP visit · Dr. A. Osei

Alerts

HbA1c overdue, 14 months

Guideline: retest every 6 months for Type 2 DM

Cardiology referral unresolved

Referred 8 months ago, no outcome recorded

Visit Reason

Fatigue and shortness of breath, onset 6 weeks ago, gradual

Active Conditions

Type 2 diabetes · Hypertension · Hyperlipidaemia

Current Medications

Metformin 500mg · Lisinopril 10mg · Atorvastatin 20mg · Aspirin 81mg

Patient Voice

"The tiredness is affecting my work. I'm worried the breathing could be something serious."

Data Quality

Renal panel: Pending

Allergy detail: Penicillin, reaction type unspecified

Last BP reading: 148 / 92 mmHg

DEFAULT STATE · ALL DATA PRESENT

PT-00397 · Pre-consultation · Today 10:00

David K., 67

Male · GP visit · Dr. A. Osei

Alerts

Allergy record incomplete

Sulfa allergy logged, no reaction severity documented

Visit Reason

Routine review, knee pain management

Active Conditions

Osteoarthritis · Type 2 diabetes · + 2 unverified

Current Medications

Celecoxib 100mg · Metformin 1g

Patient Voice

No patient-stated concern found in record

Data Quality

Last HbA1c: not found

Cardiology status: not found

Renal panel: not found

Last BP reading: 122 / 78 mmHg

INCOMPLETE DATA STATE · MISSING FIELDS SURFACED

PT-00441 · Pre-consultation · Today 10:30

Yusra M., 71

Female · GP visit · Dr. A. Osei

Alerts. Review Before Entering.

Warfarin + new Ibuprofen script, interaction risk

Prescribed by out-of-hours GP 3 days ago, not reviewed by this practice

INR result outstanding, 19 days

Last INR: 3.8 (above therapeutic range). Retest overdue.

Falls risk, 2 incidents in past 4 months

Formal falls assessment not completed

Renal function, last checked 11 months ago

Relevant given Warfarin dosing and age

Visit Reason

Dizziness and unsteadiness on feet, onset 2 weeks ago

Active Conditions

Atrial fibrillation · Hypertension · Osteoporosis

Patient Voice

"I feel unsteady when I stand up. I am frightened of falling again."

ALERT-HEAVY STATE · CRITICAL FLAGS PRIORITISED

Before vs After

Before

Raw EHR data: unformatted, inconsistent, dense
Physician scans multiple screens to build context
Critical flags buried inside narrative text
Missing fields silently absent, not flagged
Working memory overloaded before patient enters the room

After

Structured clinical card: consistent, scannable output
Single view designed to surface clinically relevant context
Alerts ordered by severity, visible immediately
Missing fields explicitly flagged in tested scenarios
Physician enters the room with context already formed
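The shift from "silently absent" to "explicitly flagged" can be sketched as a small function: every expected field produces a row, whether or not the record contains it. This is a minimal sketch under assumed names (`surfaceFields`, `SurfacedField`); the field labels are borrowed from the incomplete-data card above.

```typescript
// Hypothetical helper: every expected field yields a row, found or not.
type RecordFields = Record<string, string | undefined>;

interface SurfacedField {
  label: string;
  value: string;    // the recorded value, or the literal "not found"
  missing: boolean; // lets the UI style missing rows differently
}

function surfaceFields(expected: string[], record: RecordFields): SurfacedField[] {
  return expected.map((label) => {
    const raw = record[label]?.trim();
    // Treat absent and blank values the same way: both are surfaced as missing.
    if (raw === undefined || raw === "") {
      return { label, value: "not found", missing: true };
    }
    return { label, value: raw, missing: false };
  });
}

// Usage with the fields shown on the incomplete-data card.
const rows = surfaceFields(
  ["Last HbA1c", "Cardiology status", "Renal panel", "Last BP reading"],
  { "Last BP reading": "122 / 78 mmHg" },
);
```

Because the output is driven by the list of expected fields rather than by what the record happens to contain, a gap cannot disappear from the card simply by being absent from the data.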

The Architecture

Prompt Engineering as UX

The logic layer does what a good UX system always does: it reduces unnecessary decisions for the user. Five rules guided the prototype output.

Zero filler

No introductory text, no hedging language. Output was constrained to begin with the first JSON key.
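One way to enforce a zero-filler rule on the application side is to reject any response that does not begin with a JSON object or that carries text after it. The sketch below assumes a hypothetical `parseStrictJson` helper; it is not taken from the prototype.

```typescript
// Hypothetical post-check: accept the model response only if it is a bare JSON object.
function parseStrictJson(raw: string): Record<string, unknown> | null {
  // Any preamble ("Here is the summary...") or leading whitespace fails here.
  if (!raw.startsWith("{")) return null;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
    return null;
  } catch {
    // JSON.parse throws on trailing commentary or malformed JSON.
    return null;
  }
}
```

Returning `null` instead of trying to salvage a partially valid response keeps the failure visible, so the caller can retry or show an error state rather than render a card built from repaired output.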

Alert-first ordering

Overdose risks, unresolved allergies, and missing critical data surface before any summary content. A doctor must see risks before context.
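Alert-first ordering can also be applied deterministically in the component layer rather than left entirely to the model. The sketch below assumes a three-level severity scale, which the source does not specify. The severity labels attached to the sample alerts are placeholders for illustration, not clinical judgments.

```typescript
// Hypothetical severity scale; the actual levels are not given in the source.
type Severity = "critical" | "warning" | "info";

interface RankedAlert {
  title: string;
  severity: Severity;
}

const SEVERITY_RANK: Record<Severity, number> = { critical: 0, warning: 1, info: 2 };

function orderAlerts(alerts: RankedAlert[]): RankedAlert[] {
  // Copy before sorting so the caller's array is not mutated.
  // Array.prototype.sort is stable, so alerts of equal severity keep record order.
  return [...alerts].sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}

// Placeholder severities, assigned only to demonstrate the ordering.
const unordered: RankedAlert[] = [
  { title: "Renal function, last checked 11 months ago", severity: "info" },
  { title: "Falls risk, 2 incidents in past 4 months", severity: "warning" },
  { title: "Warfarin + new Ibuprofen script, interaction risk", severity: "critical" },
];
const ordered = orderAlerts(unordered);
```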

Confidence calibration

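The prompt was designed not to present uncertain data as though it were certain. Unverified or incomplete entries are labelled as such, for example "+ 2 unverified" conditions or an allergy with "reaction type unspecified", so the physician can see how much weight each item will bear.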

Human layer

One patient-stated concern extracted verbatim from notes. If none exists, marked as missing. Not inferred in tested scenarios.
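A verbatim requirement like this lends itself to a mechanical check: keep the extracted quote only if it appears word for word in the source notes, and otherwise treat it as missing. The function names below are hypothetical, and the sample note text is made up for the example.

```typescript
// Collapse runs of whitespace so line breaks in the notes do not cause false mismatches.
function normaliseWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Hypothetical guard: return the quote only if it is found verbatim in the notes.
function verifiedPatientVoice(candidate: string | null, sourceNotes: string): string | null {
  if (candidate === null) return null;
  const quote = normaliseWhitespace(candidate);
  if (quote === "") return null;
  return normaliseWhitespace(sourceNotes).includes(quote) ? quote : null;
}

// Made-up note text for illustration only.
const notes = 'Pt reports: "I feel unsteady when I stand up.\nI am frightened of falling again."';

const kept = verifiedPatientVoice("I feel unsteady when I stand up. I am frightened of falling again.", notes);
const dropped = verifiedPatientVoice("Patient is afraid of falling.", notes);
```

Here `kept` holds the quote, while `dropped` is `null` because the paraphrase does not occur in the notes. A `null` result maps to the card's "No patient-stated concern found in record" state.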

Confidence ceiling

The prompt was designed not to diagnose or interpret. It surfaces, and flags what it doesn't know.

FIG A. CLINICAL LOGIC ENGINE. Prompt architecture across input, logic, data, and component layers.

Research Grounding

The Evidence Base

Working memory ceiling

~7 items

A typical pre-consultation EHR data load far exceeds working memory capacity, forcing physicians into triage mode before the patient enters the room.

CLT in medicine (AMEE Guide No. 86)

AI vs full record review

Reported review-time reduction

One study found AI-generated summaries could reduce review time while maintaining clinical accuracy.

Comparing AI- vs. Clinician-Authored Summaries (medRxiv, 2025)

Diagnostic safety

Burnout and diagnostic-safety research

Burnout is associated with production pressure and path-of-least-resistance decisions. The AMA frames this as a system failure. Fix the interface, not the person.

PubMed meta-analysis; Armstrong Institute; AMA research

Quality of interaction, not duration

Substantial satisfaction increase

Patients who feel "seen" have higher enablement scores. The human layer in the card gives physicians the raw material to create that moment, even in 15 minutes.

Enablement After Consultation in Primary Healthcare (Dove Press, 2025)

"The most important decision in a safety-critical AI system is often what you refuse to let it do."
