Google Healthcare AI: From Med-PaLM to the AI Co-Clinician Pipeline

Between 2022 and 2026, Google's clinical reasoning AI moved through three distinct generations, each changing what the system can and cannot do. The progression from Med-PaLM through AMIE to the AI co-clinician is more than a sequence of incremental benchmark improvements: the clinical task itself changed, from static knowledge recall on multiple-choice exams, to text-based diagnostic conversation, to real-time multimodal telemedicine with live audio and video. For clinicians, researchers, and health executives evaluating where this technology fits into care delivery, understanding what changed between generations, and what did not, matters more than any single accuracy figure.

Generational Framework: From Knowledge Recall to Multimodal Telemedicine Agents

The pipeline can be understood as three generations, each defined by a different clinical task and a different level of interaction fidelity.

The first generation, anchored by Med-PaLM (2022) and its successor Med-PaLM 2 (2023), was fundamentally a knowledge-retrieval system. These models were evaluated on their ability to answer USMLE-style multiple-choice questions — a task that tests factual recall and basic clinical reasoning within a constrained format. Med-PaLM was the first AI system to pass the USMLE-style MedQA benchmark, achieving 67.6% accuracy. Med-PaLM 2 raised that to 85%, a level Google described as "expert doctor level." But the interaction was one-directional: the model received a question and returned an answer. There was no conversation, no follow-up, no ability to gather a history or explore ambiguity.

The second generation, AMIE (Articulate Medical Intelligence Explorer), represented a qualitative leap. AMIE was designed not to answer exam questions but to conduct diagnostic conversations. It could take a clinical history, ask clarifying questions, derive a differential diagnosis, and communicate with empathy, all within a text-based chat interface. In the study published in Nature, AMIE matched or exceeded primary care physicians on most evaluated axes in simulated text-based consultations with patient actors. In 2025, a multimodal version added the ability to interpret visual medical information, such as dermatologic images or radiology findings. This moved the system from pure language understanding into multimodal clinical perception.

The third generation, the AI co-clinician (announced April 30, 2026 by Google DeepMind), extends capability into real-time multimodal telemedicine. Unlike AMIE, which operated asynchronously over text, the AI co-clinician engages in live audio-video conversations with patients. Its dual-agent architecture — a Planner module that manages clinical reasoning and safety constraints, and a Talker agent that handles real-time conversation — is designed to enable supervised, physician-in-the-loop telemedical encounters. Research collaborations are underway with academic medical centers in the US, India, Australia, New Zealand, Singapore, and the UAE.
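The division of labor between the two agents can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the class names, the `Directive` structure, the keyword-based red-flag check, and the fixed question list are hypothetical and do not describe Google DeepMind's actual implementation, which has not been detailed here.

```python
from dataclasses import dataclass, field

# Hypothetical red-flag phrases; a real system would not rely on keyword matching.
RED_FLAGS = ("chest pain", "trouble breathing")


@dataclass
class Directive:
    goal: str                # what the Talker should do next
    escalate: bool = False   # True when the supervising physician must take over


@dataclass
class Planner:
    """Holds the clinical plan and applies safety rules (illustrative only)."""
    questions: list = field(
        default_factory=lambda: ["onset", "severity", "medications"]
    )

    def next_directive(self, patient_utterance: str) -> Directive:
        text = patient_utterance.lower()
        # Safety constraint: any red flag ends autonomous questioning.
        if any(flag in text for flag in RED_FLAGS):
            return Directive(goal="hand off to physician", escalate=True)
        # Otherwise continue through the remaining history questions.
        if self.questions:
            return Directive(goal=f"ask about {self.questions.pop(0)}")
        return Directive(goal="summarize for physician review")


class Talker:
    """Turns the Planner's current directive into a conversational reply."""

    def reply(self, directive: Directive) -> str:
        if directive.escalate:
            return "I'm bringing your physician into the call now."
        return f"[{directive.goal}]"


def run_turn(planner: Planner, talker: Talker, patient_utterance: str):
    """One conversational turn: the Planner decides, the Talker speaks."""
    directive = planner.next_directive(patient_utterance)
    return directive, talker.reply(directive)


if __name__ == "__main__":
    planner, talker = Planner(), Talker()
    for said in ["I've had a headache since Monday", "Now I have chest pain"]:
        directive, spoken = run_turn(planner, talker, said)
        print(directive.goal, "|", spoken)
```

The point of the split in this sketch is that the Talker never decides on its own what to ask or when to escalate; it only voices what the Planner has approved, which is one plausible way to keep a physician-in-the-loop constraint in a single place.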

Figure: Google's clinical reasoning AI has moved through three generations, each expanding the mode of interaction and clinical realism. Panels: Med-PaLM (2022), AMIE (2025), AI co-clinician (2026).

Performance Benchmarks Across Generations

The most commonly cited metric across this pipeline is accuracy on the MedQA dataset, a collection of USMLE-style multiple-choice questions. On that single number the progression is steep: 67.6% in 2022, 85.0% in 2023, and 91.1% in 2024, after which the newer systems are no longer evaluated on it.

USMLE-style MedQA accuracy progression across Google's clinical reasoning models. Note that AMIE and AI co-clinician are not evaluated on MedQA because their design goals extend beyond exam-style knowledge recall.
Model	Year	MedQA Accuracy	Interaction Mode
Med-PaLM	2022	67.6%	Static question-answering
Med-PaLM 2	2023	85.0%	Static question-answering
Med-Gemini	2024	91.1%	Multimodal understanding (text + images)
AMIE	2025	Not benchmarked on MedQA	Text-based diagnostic conversation
AI co-clinician	2026	Not benchmarked on MedQA	Real-time audio/video telemedicine
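To make the size of each step concrete, the three MedQA figures in the table above can be converted into percentage-point gains and relative reductions in error rate. The snippet below is a minimal sketch that uses only the table's numbers; the function name and the choice of error reduction as a summary are illustrative, and the comparison assumes the three scores are directly comparable even though they come from different evaluation setups.

```python
# MedQA accuracies (percent) as listed in the table above.
medqa_accuracy = [
    ("Med-PaLM", 2022, 67.6),
    ("Med-PaLM 2", 2023, 85.0),
    ("Med-Gemini", 2024, 91.1),
]


def stepwise_gains(rows):
    """Return (model, gain in points, relative error reduction in %) per step."""
    gains = []
    for (_, _, prev_acc), (name, _, acc) in zip(rows, rows[1:]):
        point_gain = acc - prev_acc
        prev_error = 100.0 - prev_acc  # share of questions the predecessor missed
        error_reduction = 100.0 * point_gain / prev_error
        gains.append((name, round(point_gain, 1), round(error_reduction, 1)))
    return gains


if __name__ == "__main__":
    for name, points, err_cut in stepwise_gains(medqa_accuracy):
        print(f"{name}: +{points} points, error rate cut by {err_cut}%")
```

Read this way, the move from Med-PaLM to Med-PaLM 2 removed roughly half of the remaining errors, and the move to Med-Gemini removed roughly two fifths of what was left, which is why later gains look small in points while still being substantial.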
