AI and Healthcare: The Evidence Gap Behind 1,451 FDA-Cleared Devices

[Figure: a dense grid of small white FDA certificate icons beside a tiny handful of glowing golden certificates, contrasting regulatory volume with scarce clinical evidence. Caption: 1,451 cleared devices, and fewer than 2% with RCT evidence.]

The FDA cleared 295 AI/ML-enabled medical devices in 2025 alone, according to IntuitionLabs' analysis. That brings the cumulative total to 1,451. Radiology accounts for 76% of them, or 1,104 devices. Yet a 2025 JAMA study found that fewer than 2% of cleared devices have been tested in a randomized clinical trial. If you work in a hospital, a health system, or a procurement office, you are making decisions based on clearance letters, not on outcomes data. That should bother you.

1,451 Devices Cleared. How Many Had an RCT?

I do not have a problem with the clearance count. The FDA is moving faster, partly by design. What I have a problem with is the gap between that speed and the evidence required to trust a device in a clinical workflow. When I hear “FDA-cleared” used as a synonym for “proven effective,” I want to point to that 2% figure. That is the gap.

The Evidence That's Missing

The 2% RCT figure is the sharpest number, but it is not the only one. When you dig into the FDA's decision summaries for these devices, you find something worse than a lack of trials: a lack of basic methodological reporting.

46.7% of FDA decision summaries do not describe the study design used to support the clearance.
53.3% omit the sample size.
Only 3 devices — less than 1% — report actual patient health outcomes.

Let me be blunt: that is not a documentation glitch. It means the FDA itself is not enforcing basic evidentiary reporting standards for AI devices. When nearly half of decision summaries do not even describe a study design, you cannot treat these summaries as reliable evidence. They are paperwork, not validation.

There is more nuance than a blanket “all devices lack evidence.” Some have retrospective or observational data. But for high-stakes deployment decisions — deciding whether a triage tool affects time to treatment, or whether a diagnostic algorithm changes outcomes — those designs are insufficient. The JAMA study made that clear. And as the ARISE State of Clinical AI report put it, “The time for context-specific prospective trials is now.”

Why So Little Evidence? The 510(k) Predicate Loophole

The reason most AI devices can get through without a trial is the 510(k) pathway. You do not need to prove your device is effective on its own — you only need to show it is “substantially equivalent” to a predicate device that was already cleared. If the predicate had thin evidence, the problem compounds. And because most predicates are earlier-generation narrow AI tools with similarly sparse data, the entire stack grows on a foundation of unverified equivalence.
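To make the compounding concrete, here is a minimal Python sketch of walking a predicate chain. Everything in it is hypothetical: the device names, the lineage, and the evidence flags are invented for illustration and are not drawn from FDA records. It only shows the shape of the problem: if no device in the chain has clinical evidence of its own, equivalence to a predicate adds none.

```python
from typing import Optional

# Hypothetical lineage: each device maps to the predicate it claimed
# substantial equivalence to. None marks the start of the chain.
PREDICATE_OF: dict[str, Optional[str]] = {
    "device_c": "device_b",
    "device_b": "device_a",
    "device_a": None,
}

# Hypothetical flags: does the device have clinical evidence of its own?
HAS_CLINICAL_EVIDENCE: dict[str, bool] = {
    "device_a": False,
    "device_b": False,
    "device_c": False,
}


def lineage(device: str) -> list[str]:
    """Return the device followed by its predicates, newest first."""
    chain: list[str] = []
    current: Optional[str] = device
    # The membership check guards against an accidental cycle in the data.
    while current is not None and current not in chain:
        chain.append(current)
        current = PREDICATE_OF.get(current)
    return chain


def nearest_evidence(device: str) -> Optional[str]:
    """Return the closest device in the chain with its own evidence, if any."""
    for candidate in lineage(device):
        if HAS_CLINICAL_EVIDENCE.get(candidate, False):
            return candidate
    return None


print(lineage("device_c"))
print(nearest_evidence("device_c"))
```

In this made-up chain the lookup finds nothing, which is the situation described above: three clearances, no evidence anywhere in the stack.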

This is not a criticism of the 510(k) pathway for all devices. It works for low-risk hardware and software. But when you are authorizing a device that could change a clinical decision — flag a stroke on a CT, prioritize a chest X-ray — you are asking clinicians to trust a regulatory shortcut. The evidence shows that shortcut is not working for AI.

PCCPs and Foundation Models: Progress or Just New Wrappers?

Two developments might suggest the gap is closing: Predetermined Change Control Plans and the first foundation-model clearance. Let me give you a dose of realism.

First, PCCPs allow manufacturers to update an AI device without a new clearance, provided the updates stay within a pre-approved plan. That is a sensible process change. But only 10% of 2025 clearances included a PCCP. And a PCCP is about how you iterate, not whether a device works. It does not generate clinical evidence.

Second, Aidoc's CARE1 foundation model was cleared by the FDA in February 2025, the first clearance for a foundation-model-powered clinical AI device. That is a genuine regulatory first. But it is a clearance, not a validation. The clinical evidence behind CARE1 is early, and the model's performance in practice remains to be tested prospectively. I am cautiously optimistic, but I refuse to treat a regulatory milestone as a clinical stamp of approval.

The ARISE report noted the “jagged frontier” — models that can outperform humans on tightly controlled tasks but fail to recognize their own uncertainty. Until we have prospective trials that measure real-world impact, those frontiers remain theoretical.

How to Evaluate an AI Device: An Evidence-First Framework

I have spent the last several paragraphs describing a problem. Let me now offer a way forward. If you are a health system evaluator, a CMIO, or a clinician sitting on a purchasing committee, here is what to look for before you let an AI device touch a patient.

Ask for the study design. If the FDA summary does not mention it, demand at least a prospective multi-center study or an RCT. Retrospective single-center data is not enough for high-risk decisions.
Examine the sample size and population diversity. A model trained on 500 images from one hospital is unlikely to generalize to your patient population. Look for external validation on a separate dataset.
Require patient-level outcome data. Sensitivity and specificity are useful, but what matters is whether the tool changes treatment times, reduces mortality, or lowers readmission rates. Only 3 devices have that data — you can ask your vendor to be the fourth.
Check the evidence tier. Some companies have published rigorous evidence. For the minority of devices that meet high evidence standards, see our profile of healthcare AI companies with strong clinical validation. But for most devices, you need to apply the framework yourself.
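If it helps to make the checklist operational, the first three questions can be written down as a small Python sketch. This is one possible encoding, not a validated scoring tool: the `DeviceEvidence` fields, the design labels, and the example device are all hypothetical names I am introducing for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the designs treated as sufficient for high-risk use.
STRONG_DESIGNS = {"prospective_multicenter", "rct"}


@dataclass
class DeviceEvidence:
    name: str
    study_design: Optional[str]      # None if no design is described
    sample_size: Optional[int]       # None if the sample size is omitted
    externally_validated: bool       # tested on a separate dataset
    reports_patient_outcomes: bool   # e.g. treatment times, mortality
    affects_clinical_decision: bool  # False for tools like ambient documentation


def evidence_gaps(device: DeviceEvidence) -> list[str]:
    """Return the checklist items the device fails; empty means none found."""
    if not device.affects_clinical_decision:
        # Low-risk tools can be judged on operational metrics instead.
        return []
    gaps: list[str] = []
    if device.study_design is None:
        gaps.append("study design not described")
    elif device.study_design not in STRONG_DESIGNS:
        gaps.append("study design is not prospective multi-center or an RCT")
    if device.sample_size is None:
        gaps.append("sample size not reported")
    if not device.externally_validated:
        gaps.append("no external validation on a separate dataset")
    if not device.reports_patient_outcomes:
        gaps.append("no patient-level outcome data")
    return gaps


# Hypothetical example: a triage tool backed only by retrospective data.
triage = DeviceEvidence(
    name="example-triage-tool",
    study_design="retrospective_single_center",
    sample_size=None,
    externally_validated=False,
    reports_patient_outcomes=False,
    affects_clinical_decision=True,
)
for gap in evidence_gaps(triage):
    print("-", gap)
```

The output is a list of questions to take back to the vendor, not a pass/fail verdict. A committee would still need to weigh each gap against the risk of the specific workflow.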

I am not saying every AI device needs an RCT before deployment. Some low-risk tools — like ambient documentation — can be evaluated on operational metrics. But for anything that affects a clinical decision, the bar should be higher. The FDA's clearance rate is not slowing down. The evidence gap needs to narrow on our end, in how we evaluate and adopt.

For a deeper look, see our analysis of methodological quality and reporting gaps in AI clinical research.

[Figure: three evaluation stages shown left to right: a gavel (regulatory clearance), a magnifying glass over a clipboard (evidence scrutiny), and a checkmark inside a shield (deployment decision). Caption: From clearance to deployment, each stage demands its own evidence.]
