Full Citation
Tao X, Zhou S, Ding K, Li S, Li Y, Wu B, et al. An LLM chatbot to facilitate primary-to-specialist care transitions: a randomized controlled trial. Nat Med. 2026;32:934-942.
Background and Question
Specialist clinics often receive patients without structured referral information, especially where primary-care gatekeeping is weak. This produces short, overloaded consultations in which clinicians must elicit history, frame differentials, and plan tests under severe time pressure.
Research question
Can a co-designed patient-facing LLM chatbot collect preconsultation history, generate referral reports, and improve specialist workflow and patient experience in real clinical settings?
Methods and Evidence Chain
Participants
The analysis included 2,069 patients or care partners and 111 specialists from 24 disciplines across two health centers.
Randomization
Patients were randomized to one of three arms before the specialist consultation: PreA-only, PreA with staff support, or no PreA (usual care).
Intervention
PreA performed general consultation tasks, preliminary diagnostic framing, test-order suggestions, and referral-report generation.
Outcomes
Primary endpoints included consultation duration, physician-perceived care coordination, and patient-reported communication ease.
Key Results
PreA-only reduced physician consultation duration by 28.7% compared with no PreA.
Physician-perceived care coordination improved substantially with PreA referral reports.
Patient-reported communication ease improved compared with usual care.
PreA-only and staff-supported PreA produced equivalent outcomes, supporting scalable autonomous operation.
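For readers checking how a figure such as the 28.7% reduction is defined, a relative reduction is the difference between the control and intervention means divided by the control mean. The sketch below is illustrative only: the function name and the input values are assumptions, and the numbers in the usage line are round placeholders, not trial data.

```python
def relative_reduction_pct(control_mean: float, treated_mean: float) -> float:
    """Percent reduction of treated_mean relative to control_mean."""
    if control_mean <= 0:
        raise ValueError("control_mean must be positive")
    # Multiply before dividing so round inputs give round outputs.
    return 100.0 * (control_mean - treated_mean) / control_mean


# Placeholder durations in minutes (NOT values from the trial).
example = relative_reduction_pct(control_mean=10.0, treated_mean=8.0)
```

A relative figure alone does not show the absolute time saved per visit, so both are worth reporting when replicating the analysis.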
Mechanism Interpretation
PreA works as a workflow compression layer. It converts unstructured patient narratives into a structured preconsultation packet, allowing specialists to start from a synthesized history, plausible diagnostic frame, and test plan rather than from a blank encounter. The trial also suggests co-design can outperform passive fine-tuning on local dialogue because it aligns the model with desired care standards rather than reproducing local inefficiencies.
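To make the idea of a structured preconsultation packet concrete, here is a minimal sketch of what such a record might look like. The class name, field names, and report layout are hypothetical; the paper's actual report schema is not described in this summary.

```python
from dataclasses import dataclass, field


@dataclass
class PreconsultationPacket:
    """Hypothetical container for the output of a preassessment chat."""

    chief_complaint: str
    history_summary: str
    candidate_diagnoses: list[str] = field(default_factory=list)
    suggested_tests: list[str] = field(default_factory=list)

    def to_report(self) -> str:
        """Render the packet as a short plain-text referral report."""
        diagnoses = ", ".join(self.candidate_diagnoses) or "none recorded"
        tests = ", ".join(self.suggested_tests) or "none recorded"
        return "\n".join(
            [
                f"Chief complaint: {self.chief_complaint}",
                f"History: {self.history_summary}",
                f"Diagnostic frame: {diagnoses}",
                f"Suggested tests: {tests}",
            ]
        )


# Illustrative values only.
packet = PreconsultationPacket(
    chief_complaint="knee pain",
    history_summary="Three weeks of pain after a fall; no prior injury.",
    candidate_diagnoses=["meniscal tear"],
)
report_text = packet.to_report()
```

Keeping the diagnostic frame and test suggestions as separate fields would make it easier to audit each component independently of the narrative history.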
Mechanism / workflow schematic
Mermaid source is included so the website can render the diagram in supported browsers.
flowchart LR
    A[Patient symptoms before visit] --> B[LLM preassessment chat]
    B --> C[Structured referral report]
    C --> D[Specialist reviews summary]
    D --> E[Shorter consultation]
    D --> F[Improved care coordination]
    B --> G[Patient communication preparation]
    G --> H[Better reported communication ease]
Clinical and Translational Relevance
Clinical relevance
This is one of the more clinically meaningful LLM trials because it measures operational outcomes in live care rather than performance on exam-style questions. It is directly relevant to outpatient triage, referral quality, documentation burden, and resource-limited specialist systems.
Translational value
Hospitals considering LLM deployment should replicate the implementation logic rather than the exact tool: stakeholder co-design, a patient interface usable at low literacy levels, structured clinician-facing output, prospective trial endpoints, and monitoring for automation bias.
Limitations and Critique
The trial was conducted in specific Chinese tertiary centers; results may not transfer to other payment, referral, and EHR environments.
Consultation time and perceived coordination are valuable but do not prove improved diagnosis, morbidity, or long-term outcomes.
Patient-reported experience outcomes are vulnerable to expectation effects.
Rare diagnostic errors, bias harms, and inappropriate reassurance need larger surveillance datasets.
Reviewer-style critique
The trial is strong because it is pragmatic, multicenter, and workflow-oriented. The main caution is that efficiency can be overvalued: clinical AI should shorten visits only if diagnostic quality, patient understanding, and safety are preserved or improved.
Practical Next Research Actions
Action 1
Replicate the workflow in a surgical outpatient pathway, using time-to-correct-specialty, missing-history rate, and downstream test appropriateness as endpoints.
Action 2
Audit generated referral reports for hallucinated symptoms, omitted red flags, and guideline-discordant test suggestions.
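A first-pass version of such an audit could be automated before clinician review. The sketch below assumes the report's symptom list and a red-flag vocabulary are available as plain strings; the function name, inputs, and keyword-matching approach are all illustrative assumptions and would miss paraphrases and negations.

```python
def audit_report(
    transcript: str,
    reported_symptoms: list[str],
    red_flags: list[str],
) -> dict[str, list[str]]:
    """Flag symptoms absent from the transcript and red flags absent from the report."""
    text = transcript.lower()
    reported = {s.lower() for s in reported_symptoms}
    # Symptoms in the report that the patient never mentioned (possible hallucination).
    unsupported = [s for s in reported_symptoms if s.lower() not in text]
    # Red flags the patient mentioned that the report left out.
    omitted = [f for f in red_flags if f.lower() in text and f.lower() not in reported]
    return {"unsupported_symptoms": unsupported, "omitted_red_flags": omitted}


# Illustrative inputs only.
findings = audit_report(
    transcript="I have had chest pain and a cough for two days.",
    reported_symptoms=["cough", "fever"],
    red_flags=["chest pain", "syncope"],
)
# findings == {"unsupported_symptoms": ["fever"], "omitted_red_flags": ["chest pain"]}
```

Substring matching is deliberately crude; flagged items are candidates for human review, not confirmed errors.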
Action 3
Measure clinician overreliance by comparing note content, independent diagnoses, and management changes between clinicians with and without access to the PreA report.
Action 4
Build dashboards for subgroup performance by age, literacy, income, specialty, and disease acuity.
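The aggregation behind such a dashboard can be sketched in a few lines. The record layout, key names, and values below are hypothetical; a real deployment would draw these from the trial or EHR database and add uncertainty estimates.

```python
from collections import defaultdict
from statistics import mean


def subgroup_summary(records, group_key, value_key):
    """Return the count and mean of value_key for each level of group_key."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {
        level: {"n": len(values), "mean": mean(values)}
        for level, values in groups.items()
    }


# Illustrative records only (not trial data).
visits = [
    {"age_band": "18-39", "minutes": 4.0},
    {"age_band": "18-39", "minutes": 6.0},
    {"age_band": "65+", "minutes": 9.0},
]
summary = subgroup_summary(visits, group_key="age_band", value_key="minutes")
```

Reporting the count alongside each mean keeps small subgroups visible, so that a noisy estimate from a handful of patients is not read as a real disparity.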
Evidence-quality judgment
High workflow-efficacy evidence for the studied setting; moderate evidence for broader clinical benefit until diagnostic and safety endpoints mature.