Between 2022 and 2026, Google's clinical reasoning AI has moved through three distinct architectural generations, each representing a fundamental shift in what the system can do — and what it cannot. The progression from Med-PaLM through AMIE to the AI co-clinician is not merely a sequence of incremental benchmark improvements. It is a story of expanding capability boundaries: from static knowledge recall on multiple-choice exams, to text-based diagnostic conversation, to real-time multimodal telemedicine with live audio and video. For clinicians, researchers, and health executives evaluating where this technology fits into care delivery, understanding what changed between generations — and what did not — matters more than any single accuracy figure.
Generational Framework: From Knowledge Recall to Multimodal Telemedicine Agents
The pipeline can be understood as three generations, each defined by a different clinical task and a different level of interaction fidelity.
The first generation, anchored by Med-PaLM (2022) and its successor Med-PaLM 2 (2023), was fundamentally a knowledge-retrieval system. These models were evaluated on their ability to answer USMLE-style multiple-choice questions — a task that tests factual recall and basic clinical reasoning within a constrained format. Med-PaLM was the first AI system to pass the USMLE-style MedQA benchmark, achieving 67.6% accuracy. Med-PaLM 2 raised that to 85%, a level Google described as "expert doctor level." But the interaction was one-directional: the model received a question and returned an answer. There was no conversation, no follow-up, no ability to gather a history or explore ambiguity.
The second generation, AMIE (Articulate Medical Intelligence Explorer), represented a qualitative leap. AMIE was designed not to answer exam questions but to conduct diagnostic conversations. It could take a clinical history, ask clarifying questions, derive a differential diagnosis, and communicate with empathy, all within a text-based chat interface. In a study published in Nature, AMIE matched physician performance in simulated text-based consultations. In 2025, a multimodal version added the ability to interpret visual medical information, such as dermatologic images or radiology findings. This moved the system from pure language understanding into multimodal clinical perception.
The third generation, the AI co-clinician (announced April 30, 2026 by Google DeepMind), extends capability into real-time multimodal telemedicine. Unlike AMIE, which operated asynchronously over text, the AI co-clinician engages in live audio-video conversations with patients. Its dual-agent architecture — a Planner module that manages clinical reasoning and safety constraints, and a Talker agent that handles real-time conversation — is designed to enable supervised, physician-in-the-loop telemedical encounters. Research collaborations are underway with academic medical centers in the US, India, Australia, New Zealand, Singapore, and the UAE.
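
The division of labor between the two agents can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the class names, the `Plan` fields, the red-flag list, and the escalation wording are hypothetical and are not taken from any published description of the system. It shows only the general pattern of a planner that decides what should happen next and applies safety rules, and a talker that turns that decision into an utterance.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical hand-off object passed from planner to talker."""
    next_question: str
    escalate: bool  # True when a safety rule says a physician must step in


class Planner:
    """Hypothetical planner: tracks the encounter and applies safety rules."""

    # Illustrative red flags only; a real system would be far more elaborate.
    RED_FLAGS = ("chest pain", "shortness of breath")

    def __init__(self):
        self.history = []

    def update(self, patient_utterance: str) -> Plan:
        self.history.append(patient_utterance)
        text = patient_utterance.lower()
        if any(flag in text for flag in self.RED_FLAGS):
            return Plan(next_question="", escalate=True)
        return Plan(
            next_question="How long have you had this symptom?",
            escalate=False,
        )


class Talker:
    """Hypothetical talker: turns the current plan into what is said."""

    def speak(self, plan: Plan) -> str:
        if plan.escalate:
            return "I am bringing the supervising physician into this call now."
        return plan.next_question


def turn(planner: Planner, talker: Talker, utterance: str):
    """Run one conversational turn; return the reply and the escalation flag."""
    plan = planner.update(utterance)
    return talker.speak(plan), plan.escalate
```

Keeping the safety decision in the planner means the conversational layer never has to decide on its own when a physician should take over, which is one plausible reading of a "physician-in-the-loop" design.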

Performance Benchmarks Across Generations
The most commonly cited metric across this pipeline is accuracy on the MedQA dataset, a collection of USMLE-style multiple-choice questions. Tracked on that one metric, the progression is striking, although the two conversational systems were not benchmarked on it.
| Model | Year | MedQA Accuracy | Interaction Mode |
|---|---|---|---|
| Med-PaLM | 2022 | 67.6% | Static question-answering |
| Med-PaLM 2 | 2023 | 85.0% | Static question-answering |
| Med-Gemini | 2024 | 91.1% | Multimodal understanding (text + images) |
| AMIE | 2025 | Not benchmarked on MedQA | Text-based diagnostic conversation |
| AI co-clinician | 2026 | Not benchmarked on MedQA | Real-time audio/video telemedicine |
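
The size of each step in the table can be worked out directly. The sketch below uses only the three MedQA figures listed above. The "error reduction" figure is a derived quantity added here for illustration, not a number reported in the source, and it assumes the three results are comparable even though evaluation setups may differ.

```python
# MedQA accuracies (percent) as listed in the table above.
medqa = [
    ("Med-PaLM", 2022, 67.6),
    ("Med-PaLM 2", 2023, 85.0),
    ("Med-Gemini", 2024, 91.1),
]


def gains(rows):
    """Return (model, percentage-point gain, relative error reduction) per step."""
    out = []
    for (_, _, prev), (name, _, cur) in zip(rows, rows[1:]):
        pp_gain = round(cur - prev, 1)
        # Share of the previous model's wrong answers that the newer model fixed.
        err_reduction = round((cur - prev) / (100.0 - prev), 3)
        out.append((name, pp_gain, err_reduction))
    return out


for name, pp, red in gains(medqa):
    print(f"{name}: +{pp} points, {red:.1%} fewer errors than its predecessor")
```

The absolute gain shrinks from 17.4 points to 6.1 points, which is expected as accuracy approaches the ceiling; measured as a share of remaining errors, the second step is still substantial.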