Full Citation

Tao X, Zhou S, Ding K, Li S, Li Y, Wu B, et al. An LLM chatbot to facilitate primary-to-specialist care transitions: a randomized controlled trial. Nat Med. 2026;32:934-942.

Study type

Pragmatic multicenter randomized controlled trial of a patient-facing LLM preassessment system across 24 medical disciplines.

Identifier

No PMID listed

DOI

10.1038/s41591-025-04176-7

Background and Question

Specialist clinics often receive patients without structured referral information, especially where primary-care gatekeeping is weak. This produces short, overloaded consultations in which clinicians must elicit history, frame differentials, and plan tests under severe time pressure.

Research question

Can a co-designed patient-facing LLM chatbot collect preconsultation history, generate referral reports, and improve specialist workflow and patient experience in real clinical settings?

Methods and Evidence Chain

Participants

2,069 analyzed patients/care partners and 111 specialists from 24 disciplines across two health centers.

Randomization

Patients were assigned to PreA-only, PreA with staff support, or no PreA before specialist consultation.

Intervention

PreA performed general consultation tasks, preliminary diagnostic framing, test-order suggestions, and referral-report generation.

Outcomes

Primary endpoints included consultation duration, physician-perceived care coordination, and patient-reported communication ease.

Key Results

Efficiency

PreA-only reduced physician consultation duration by 28.7% compared with no PreA.
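To make the size of this effect concrete, the arithmetic behind a relative reduction can be sketched as below. The durations are hypothetical values chosen only for illustration, not trial data, and the summary does not state whether the trial computed the figure from means, medians, or a model estimate.

```python
def relative_reduction(control: float, intervention: float) -> float:
    """Percent reduction of the intervention value relative to the control value."""
    if control <= 0:
        raise ValueError("control must be positive")
    return (control - intervention) / control * 100.0


# Hypothetical consultation durations in minutes (illustrative only, not trial data).
control_minutes = 10.0
prea_minutes = 7.13

reduction = relative_reduction(control_minutes, prea_minutes)
print(f"{reduction:.1f}% shorter")
```

Under these assumed numbers, a 28.7% reduction means a 10-minute consultation would shrink to roughly 7.1 minutes.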

Coordination

Physician-perceived care coordination improved substantially with PreA referral reports.

Patient experience

Patient-reported communication ease improved compared with usual care.

Autonomy

PreA-only and staff-supported PreA produced equivalent outcomes, supporting scalable autonomous operation.

Mechanism Interpretation

PreA works as a workflow compression layer. It converts unstructured patient narratives into a structured preconsultation packet, allowing specialists to start from a synthesized history, plausible diagnostic frame, and test plan rather than from a blank encounter. The trial also suggests co-design can outperform passive fine-tuning on local dialogue because it aligns the model with desired care standards rather than reproducing local inefficiencies.

Mechanism / workflow schematic

Mermaid source is included so the website can render the diagram in supported browsers.

flowchart LR
  A[Patient symptoms before visit] --> B[LLM preassessment chat]
  B --> C[Structured referral report]
  C --> D[Specialist reviews summary]
  D --> E[Shorter consultation]
  D --> F[Improved care coordination]
  B --> G[Patient communication preparation]
  G --> H[Better reported communication ease]

Clinical and Translational Relevance

Clinical relevance

This is one of the more clinically meaningful LLM trials because it measures operational outcomes in live care rather than exam questions. It is directly relevant to outpatient triage, referral quality, documentation burden, and resource-limited specialist systems.

Translational value

Hospitals considering LLM deployment should copy the implementation logic more than the exact tool: stakeholder co-design, low-literacy patient interface, clinician-facing structured output, prospective trial endpoints, and monitoring for automation bias.

Limitations and Critique

Generalisability

The trial was conducted in specific Chinese tertiary centers; results may not transfer to settings with different payment, referral, and EHR environments.

Outcome depth

Consultation time and perceived coordination are valuable but do not prove improved diagnosis, morbidity, or long-term outcomes.

Blinding

Patient-reported experience outcomes are vulnerable to expectation effects.

Safety surveillance

Rare diagnostic errors, bias harms, and inappropriate reassurance need larger surveillance datasets.

Reviewer-style critique

The trial is strong because it is pragmatic, multicenter, and workflow-oriented. The main caution is that efficiency can be overvalued: clinical AI should shorten visits only if diagnostic quality, patient understanding, and safety are preserved or improved.

Practical Next Research Actions

Action 1

Replicate the workflow in a surgical outpatient pathway, using time-to-correct-specialty, missing-history rate, and downstream test appropriateness as endpoints.

Action 2

Audit generated referral reports for hallucinated symptoms, omitted red flags, and guideline-discordant test suggestions.
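One way such an audit could be started is a simple set comparison between what the patient said and what the report contains. The sketch below assumes symptoms have already been extracted into normalized strings by some upstream step. The function name, field names, and example terms are all hypothetical, and it does not cover guideline-discordant test suggestions, which would need a separate rule set.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    unsupported_symptoms: set  # in the report but never stated by the patient
    omitted_red_flags: set     # red flags stated by the patient but missing from the report


def audit_report(transcript_symptoms, report_symptoms, red_flags):
    """Compare a referral report against the chat transcript (hypothetical helper)."""
    transcript = {s.strip().lower() for s in transcript_symptoms}
    report = {s.strip().lower() for s in report_symptoms}
    flags = {s.strip().lower() for s in red_flags}
    return AuditResult(
        unsupported_symptoms=report - transcript,
        omitted_red_flags=(transcript & flags) - report,
    )


# Illustrative inputs only; not drawn from the trial.
result = audit_report(
    transcript_symptoms=["chest pain", "cough", "syncope"],
    report_symptoms=["chest pain", "fever"],
    red_flags=["syncope", "hemoptysis"],
)
print(result.unsupported_symptoms)
print(result.omitted_red_flags)
```

Exact string matching is a deliberate simplification; a real audit would likely need clinician review or terminology mapping to handle synonyms and negations.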

Action 3

Measure clinician overreliance by comparing note content, independent diagnosis, and management changes.

Action 4

Build dashboards for subgroup performance by age, literacy, income, specialty, and disease acuity.
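The aggregation behind such a dashboard can be sketched as a grouped summary of an outcome by subgroup and trial arm. The record fields (`age_band`, `arm`, `minutes`) and all values below are hypothetical placeholders, not the trial's data schema or results.

```python
from collections import defaultdict
from statistics import mean


def summarize(records, keys, value):
    """Group records by the given keys and return count and mean of one outcome."""
    buckets = defaultdict(list)
    for record in records:
        group = tuple(record[k] for k in keys)
        buckets[group].append(record[value])
    return {g: {"n": len(v), "mean": mean(v)} for g, v in buckets.items()}


# Illustrative records only; not trial data.
records = [
    {"age_band": "65+", "arm": "PreA", "minutes": 8.0},
    {"age_band": "65+", "arm": "PreA", "minutes": 9.0},
    {"age_band": "65+", "arm": "control", "minutes": 11.0},
    {"age_band": "<65", "arm": "PreA", "minutes": 6.0},
    {"age_band": "<65", "arm": "control", "minutes": 9.0},
]

summary = summarize(records, keys=("age_band", "arm"), value="minutes")
for group, stats in sorted(summary.items()):
    print(group, stats)
```

Reporting the count alongside each mean matters here because small subgroups can produce unstable estimates that a dashboard should flag rather than hide.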

Evidence-quality judgment

High workflow-efficacy evidence for the studied setting; moderate evidence for broader clinical benefit until diagnostic and safety endpoints mature.
