FDA Clearance Is Not Clinical Validation

By the end of 2025, the FDA had authorized 1,451 AI-enabled medical devices. 1,104 of them — 76% — were for radiology. In 2025 alone, 295 new AI/ML devices were cleared, according to The Imaging Wire and Innolitics. If you glance at those totals, you might assume AI is already proven in clinical practice. That assumption is exactly what the evidence does not support.

Most clearances come through the 510(k) pathway. That only requires the device to be substantially equivalent to an already marketed predicate. It does not require the submitter to show clinical efficacy. So FDA clearance means the device is similar enough to something already on the market — not that it has been shown to improve diagnosis, reduce errors, or save time in a real hospital. That distinction matters more than most readers realize.

The Most Important Number in This Article

A rapid systematic scoping review by Lawrence et al., published in eClinicalMedicine in 2025, identified 140 relevant studies on AI for diagnostic radiology. Of those, only 53 were quantitative effectiveness studies — and only 23 of those 53 evaluated AI in a live clinical pathway. Only 7 measured diagnostic accuracy in a real-world setting. Let that sink in: out of 140 studies, fewer than 20% tested AI where it would actually be used, with real patients, real clinicians, and real workflow constraints.

To put it another way: for every one study that tested AI in a live pathway, there are about 60 FDA clearances. The gap between regulatory volume and clinical evidence is not small — it is a chasm.

Breakdown of 140 studies in Lawrence et al. review. The vast majority measured perceptions or simulated performance, not real-world impact.
Study typeNumber of studies
Implementation7
Experiences14
Perceptions74
Quantitative effectiveness53
Cost6
Editorial infographic comparing a tall stack of FDA clearance icons labeled 'Regulatory Volume' with 1,451, and a sparse stack of document icons labeled 'Clinical Evidence' annotated with '23 of 53 in live pathways' and '< 7% real-world settings', with a wide gap between them.
The gap between regulatory volume and real-world evidence is the central finding of this analysis.

Sensitivity Gains — Mostly for Novices

Proponents of AI in imaging often point to improved diagnostic accuracy. And indeed, the Lawrence review found that 19 of 25 studies measuring sensitivity reported improvements with AI assistance. Five studies showed no change, one actually saw a reduction. That sounds like a clear win — until you look at who actually benefited.

Of the 16 studies that assessed sensitivity by reader experience level, 9 found that improvements were limited to less experienced readers. The more seasoned radiologists did not see a statistically significant boost. This pattern suggests that AI's accuracy benefit may be largely a training-level effect — it brings novices closer to expert performance, but does little for the experts themselves. If you are an experienced radiologist, the evidence does not support the idea that AI will make you more accurate.

  • Sensitivity improved in 19/25 studies, no change in 5, reduction in 1.
  • Specificity improved in 14/24 studies, no change in 7, reduction in 3.
  • Of those analyzing reader experience, 9/16 showed gains only among novices.

The 67% Efficiency Headline — and Why Meta-Analysis Says It's Noise

You have probably seen the statistic: 67% of studies report reductions in task time with AI. That figure comes from a 2024 systematic review and meta-analysis by Wenderott et al., published in npj Digital Medicine. It sounds like a resounding endorsement. But the 67% is a simple count of studies that reported any reduction — not a pooled effect size. When the authors performed proper meta-analyses on the subgroups that allowed pooling, the results were strikingly non-significant:

All three meta-analyses found no statistically significant difference. The 67% headline reflects a count of directional studies, not a measured effect.
Meta-analysisStandardized mean difference (SMD)95% CIp-value
CT reading time (4 studies)-0.60-2.02 to 0.820.3096.35%
Colonoscopy procedure time (5 studies)-0.04-0.76 to 0.670.8799.45%
Triage turnaround time (3 studies, Aidoc ICH)0.03-0.50 to 0.560.8483.75%

I would not take the 67% figure as proof of efficiency. At best, it means some studies saw some reduction in some contexts. The meta-analyses say the effect is indistinguishable from zero. And the Wenderott review itself notes that only 3 of 48 included studies were RCTs — all with high risk of bias. Of 45 non-randomized studies, only 1 had low risk of bias. The evidence is too weak to conclude that AI improves efficiency in radiology.

What the Evidence Base Actually Looks Like

The efficiency literature is not just thin — it's structurally compromised. Wenderott et al. reported that more than 50% of the studies declared a relevant conflict of interest. Only 4 of 48 studies followed a reporting guideline (3 used STARD, 1 used CONSORT). Of the 45 non-randomized studies, 28 had serious risk of bias, 12 critical, and only 1 low. All 3 RCTs had high risk of bias. No studies explored patient or public experiences.

When a field produces evidence with majority conflicts of interest, abysmal reporting standards, and near-universal risk of bias, the burden of proof shifts. The question is no longer "Does AI work?" but "Why should we believe any of these positive findings?" Until we see large, low-bias, independently funded trials conducted in live clinical pathways, the answer is: we should not.

Horizontal workflow infographic showing a radiology diagnostic pathway splitting into three AI integration branches: Triage (43%), Second Reader (35%), and First Reader (13.5%), converging to Final Report.
How AI is actually deployed in clinical workflows: mostly as a triage or second reader, rarely as a first reader.

So What Should You Do?

I am not anti-AI. The evidence shows narrow, potentially valuable signals: sensitivity gains for less experienced readers, and some time reductions in specific, well-defined contexts. But those signals are not strong enough to justify the widespread adoption that the regulatory numbers might suggest. Until more studies evaluate AI in live clinical pathways — with adequate methodology, independent funding, and patient-relevant outcomes — clinicians should treat most vendor claims as unsubstantiated.

If you are evaluating an AI tool for procurement or clinical use, ask these questions:

  • Was the tool tested in a live clinical pathway, or only on retrospective datasets?
  • Are the supporting studies free of conflicts of interest? Did they follow STARD or CONSORT?
  • Do the reported benefits apply to your staff's experience level, or mostly to novices?
  • Is the evidence based on meta-analytic effect sizes, or just a count of positive studies?

The field needs large, low-bias RCTs in real-world settings that measure patient-relevant outcomes — not just AUC on a holdout set. Until that happens, the gap between the number of FDA clearances and the quality of clinical evidence will remain the most important story in AI and medical imaging.