Barriers and Success Factors for Conversational AI in Health Systems

Figure: A physician shifts from typing at a screen to speaking face to face with a patient while ambient AI captures the conversation in the background. The shift from screen-focused documentation to patient-focused dialogue is the core promise of conversational AI, but realizing it at scale requires navigating a complex set of deployment barriers.

The Real Adoption Landscape: What 43 Health Systems Report

In 2024, the Scottsdale Institute surveyed 43 large U.S. non-profit health systems — each with at least $1 billion in net patient revenue — to understand how they are actually adopting artificial intelligence. The results, published in a peer-reviewed journal, offer the most detailed snapshot available of real-world AI deployment priorities, successes, and pain points across major health systems. The survey achieved a 64% response rate, lending weight to its findings.

The headline finding is counterintuitive. The dominant barrier to AI adoption is not clinician skepticism or resistance to new technology. Only 17% of respondents ranked low clinician adoption among their top two barriers. Instead, the survey reveals three structural obstacles that health system leaders must address head-on:

AI tool immaturity: cited by 77% of respondents, the most frequently named barrier. Health systems report that many conversational AI tools are not yet reliable, accurate, or well-integrated enough for clinical use.
Financial concerns: 47% of health systems identified cost and uncertain return on investment as a major hurdle.
Regulatory uncertainty: 40% of respondents flagged the evolving and ambiguous regulatory landscape as a barrier to deployment.

The survey also reveals a stark divergence in success rates across AI use cases. Ambient clinical documentation — AI scribes that listen to patient encounters and generate clinical notes — stands out as the breakout success. It is the only AI use case where 100% of respondents reported adoption activity. Sixty percent have deployed it in at least limited areas, and 14% have achieved full deployment. More importantly, 53% of organizations report a high degree of success with clinical documentation AI.

By contrast, other well-funded AI categories are struggling to deliver on their promise. Imaging and radiology AI — the most heavily deployed category, with 90% of health systems reporting deployment — achieves high success in only 19% of organizations. Early sepsis detection AI, deployed by 67% of respondents, earns high success marks from just 38%. Revenue cycle AI, a category that has attracted enormous investment, delivers high success for only 23% of health systems.

These numbers tell a clear story: deployment volume does not equal deployment success. Understanding why some use cases thrive while others stall is essential for health system leaders making adoption decisions.

What Works: Ambient Notes as the Breakout Conversational AI Use Case

Ambient documentation has emerged as the unambiguous leader in conversational AI adoption, and the reasons are instructive for any health system evaluating where to invest first. The use case addresses a problem every clinician experiences directly: the burden of clinical documentation. Physician burnout is driven substantially by the time spent on electronic health record (EHR) data entry, and ambient AI scribes offer a direct intervention — they listen to the patient-clinician conversation and automatically generate structured clinical notes.

The evidence supporting this use case is growing. A multicenter study published in JAMA Network Open in 2025 evaluated 263 physicians across six U.S. health systems using an ambient AI scribe from Abridge. The results were striking: physician burnout dropped from 51.9% to 38.8% after 30 days of use, representing 74% lower odds of experiencing burnout. The study, led by Dr. Lee Schwamm at Yale, is the first large, multicenter evaluation of ambient AI scribes on clinician experience. More than 1,000 physicians at Yale New Haven Health System now use the technology.
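
A note on reading these figures: the drop from 51.9% to 38.8% is a change in prevalence, while the 74% figure is a change in odds, and the two are not interchangeable. The short Python sketch below computes the unadjusted odds ratio implied by the two prevalence figures alone. It is an illustration only, and the function and variable names are hypothetical. The gap between its result and the reported 74% suggests the published figure is an adjusted estimate rather than a direct calculation from the two percentages, though the summary above does not state the method.

```python
def odds(p: float) -> float:
    """Convert a probability (0 < p < 1) to odds."""
    return p / (1.0 - p)

# Burnout prevalence before and after 30 days, as quoted above.
before, after = 0.519, 0.388

# Unadjusted (crude) odds ratio computed from the two prevalence figures only.
crude_odds_ratio = odds(after) / odds(before)

# Relative drop in prevalence, for comparison with the odds-based figure.
relative_drop = (before - after) / before

print(f"crude odds ratio: {crude_odds_ratio:.2f}")
print(f"relative drop in prevalence: {relative_drop:.1%}")
```

With these inputs the crude odds ratio comes out near 0.59, or roughly 41% lower odds, so the 74% figure should be quoted as an odds-based result from the study rather than as a reduction in the burnout rate.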

The financial case is equally compelling. Replacing a physician who leaves practice due to burnout costs health systems between $800,000 and $1.3 million per physician. If ambient AI scribes can reduce burnout and improve retention, the ROI calculation becomes straightforward — even before accounting for improvements in documentation quality, coding accuracy, or patient satisfaction.

Comparison of ambient AI scribe performance and cost metrics against traditional documentation approaches. Sources: Yale/Abridge study (JAMA Network Open 2025), AssemblyAI cost data, Scottsdale Institute survey.
Metric	Ambient AI Scribes	Traditional Documentation
Physician burnout rate (after 30 days)	38.8% (74% lower odds)	51.9% baseline
Physician replacement cost	$800K–$1.3M per physician (avoided)	Full cost incurred
Speech-to-text infrastructure cost	$0.15/hour (AssemblyAI Universal-3 Pro)	$4.15/hour (Amazon Transcribe Medical)
Typical time to positive ROI	6–9 months	N/A
Health system adoption activity	100% of surveyed systems	N/A

Several factors explain why ambient documentation succeeds where other AI use cases struggle. First, the workflow integration is relatively straightforward: the AI listens to an existing conversation and produces a note that the clinician reviews and signs. It does not require changes to how clinicians examine patients or make decisions. Second, the regulatory pathway is clearer — ambient scribes are generally classified as clinical documentation tools rather than medical devices requiring FDA premarket clearance, though this is evolving. Third, the ROI is measurable and direct: reduced documentation time, improved clinician satisfaction, and better coding accuracy.

For health systems evaluating ambient AI scribe vendors, a structured procurement approach is essential. Our Evaluating Ambient AI Scribes: A Structured Procurement Guide for Health Systems provides a detailed framework for comparing vendor capabilities, integration requirements, and evidence quality. For a broader view of the technology landscape beyond basic scribe functionality, see our Ambient Clinical Intelligence: Beyond AI Scribes — The Full Capability Landscape for Health Systems in 2026.

What Struggles: Diagnostic AI and Clinical Risk Stratification

The contrast between ambient documentation and other AI categories is stark. Imaging applications account for most of the more than 1,000 AI-enabled medical devices the FDA has authorized through traditional premarket pathways, and imaging and radiology AI is deployed by 90% of surveyed health systems. Yet only 19% report a high degree of success. This is a sobering statistic for an area that has attracted billions in investment and years of clinical validation studies.

Early sepsis detection AI tells a similar story. Despite being deployed by 67% of health systems, only 38% report high success. Sepsis prediction has been a priority for health systems for over a decade, and numerous AI-powered early warning systems have been developed and commercialized. Yet the gap between controlled study performance and real-world clinical impact remains wide.

Deployment and success rates for major AI use cases across 43 U.S. health systems. Data from the Scottsdale Institute survey (2024).
AI Use Case	Deployment Rate	High Success Rate	Key Challenge
Ambient clinical documentation	60% deployed (14% fully)	53%	Evolving regulatory classification
Imaging / radiology AI	90%	19%	Workflow integration, false positives
Early sepsis detection AI	67%	38%	Alert fatigue, low specificity
Revenue cycle AI	Not specified	23%	Integration complexity, data quality
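
One way to read the table is as a gap between how widely a use case is deployed and how often it is rated a high success. The Python sketch below ranks the use cases by that gap. The `success_gap` helper is hypothetical, and subtracting the two percentages assumes they share the same denominator (all respondents), which the survey summary does not confirm.

```python
# (use case, % of systems reporting deployment, % reporting high success)
# Figures are copied from the table above. Revenue cycle AI is left out
# because its deployment rate is not specified.
use_cases = [
    ("Ambient clinical documentation", 60, 53),
    ("Imaging / radiology AI", 90, 19),
    ("Early sepsis detection AI", 67, 38),
]

def success_gap(deployed_pct: int, high_success_pct: int) -> int:
    """Percentage-point gap between deployment and high success."""
    return deployed_pct - high_success_pct

# Smallest gap first: deployment most often translates into high success.
ranked = sorted(use_cases, key=lambda row: success_gap(row[1], row[2]))

for name, deployed, success in ranked:
    print(f"{name}: {success_gap(deployed, success)} point gap")
```

Under that assumption, ambient documentation shows a gap of 7 points, sepsis detection 29, and imaging 71.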

Why do these use cases underperform despite years of development and substantial investment? The reasons are multifaceted. Diagnostic AI tools often require integration into complex clinical workflows — a radiologist must review AI-flagged images, a clinician must act on a sepsis alert — and the AI's output is just one input among many. False positives erode trust. Alert fatigue sets in. The AI may perform well in a controlled validation study but degrade in real-world settings where patient populations differ, data quality varies, and clinical workflows are unpredictable.
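
The link between false positives and eroded trust can be made concrete with positive predictive value (PPV), the share of alerts that turn out to be real. The Python sketch below applies the standard PPV formula. The sensitivity, specificity, and prevalence values are hypothetical and are not taken from the survey or from any specific sepsis or imaging product.

```python
def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Share of positive alerts that are true positives (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)

# Hypothetical alerting tool: 85% sensitive, 90% specific,
# applied to a population where 5% of patients have the condition.
ppv = positive_predictive_value(sensitivity=0.85, specificity=0.90, prevalence=0.05)

print(f"Share of alerts that are true positives: {ppv:.0%}")
```

In this hypothetical scenario fewer than one alert in three is a true positive even though the tool looks accurate on paper, and the share falls further as the condition becomes rarer in the deployed population.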

A systematic umbrella review published in the International Journal of Medical Informatics in March 2026 analyzed 44 review articles covering AI-based conversational agents in healthcare. The review found that only addiction-support applications — smoking cessation and substance use — reported uniformly positive health outcomes. Clinical decision support and mental health applications showed mixed results. The review concluded that most evidence relies on short-term studies and lacks longitudinal or experimental designs, calling for greater transparency and standardization in conversational agent development.

For a grounded perspective on how AI tools perform outside of controlled studies, see our analysis of AI and Healthcare: What Real Clinical Deployments Actually Look Like, which examines the messy realities of implementation across multiple use cases.

The Three Dominant Barriers: Immaturity, Finance, and Regulation

Barrier 1: AI Tool Immaturity (77%)

The finding that 77% of health systems cite AI tool immaturity as the top barrier should give pause to vendors and investors who have promoted conversational AI as ready for prime time. Health system leaders report that many tools are not yet reliable enough for clinical use — they produce errors, fail to handle edge cases, or degrade in performance when deployed at scale.

The evidence gap supports this perception. The umbrella review of 44 reviews found that most studies on conversational AI in healthcare are short-term and lack rigorous experimental designs. Longitudinal data on real-world performance, safety, and patient outcomes is scarce. Health systems are being asked to invest in tools whose long-term effectiveness remains unproven.

Barrier 2: Financial Concerns (47%)

Nearly half of health systems identify financial concerns as a top barrier. This is not simply about the upfront cost of AI software. It reflects a deeper uncertainty about whether conversational AI investments will generate sufficient return to justify the expense, especially when budgets are constrained and competing priorities abound.

The cost-benefit analysis varies dramatically by use case. For ambient documentation, the financial case is increasingly clear. Speech-to-text infrastructure costs have dropped significantly: AssemblyAI's Universal-3 Pro with Medical Mode costs approximately $0.15 per hour, roughly one twenty-eighth of the $4.15 per hour charged for Amazon Transcribe Medical. Set against the $800,000 to $1.3 million cost of replacing a burned-out physician, the ROI for ambient AI scribes becomes compelling, with industry estimates putting the typical time to positive returns at six to nine months.
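
The roughly 28-fold price difference can be checked directly, and the same arithmetic gives a rough annual transcription cost per clinician. The Python sketch below uses the two hourly rates quoted above. The workload figures (hours of recorded audio per day, clinic days per year) and the function name are hypothetical, and speech-to-text is only one component of what a scribe vendor charges.

```python
ASSEMBLYAI_PER_HOUR = 0.15  # USD per audio hour, as quoted above
AMAZON_PER_HOUR = 4.15      # USD per audio hour, as quoted above

def annual_transcription_cost(
    rate_per_hour: float, audio_hours_per_day: float, clinic_days_per_year: int
) -> float:
    """Yearly speech-to-text spend for one clinician, in USD."""
    return rate_per_hour * audio_hours_per_day * clinic_days_per_year

price_ratio = AMAZON_PER_HOUR / ASSEMBLYAI_PER_HOUR

# Hypothetical workload: 5 hours of recorded encounters on 220 clinic days.
low = annual_transcription_cost(ASSEMBLYAI_PER_HOUR, 5, 220)
high = annual_transcription_cost(AMAZON_PER_HOUR, 5, 220)

print(f"price ratio: {price_ratio:.1f}x")
print(f"annual cost per clinician: ${low:,.0f} vs ${high:,.0f}")
```

Under these assumptions the annual spend per clinician is about $165 at the lower rate and about $4,565 at the higher one, both small next to the physician replacement cost.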

For diagnostic AI and clinical risk stratification tools, the financial case is less clear. These tools often require significant integration costs, ongoing monitoring, and workflow redesign. The ROI depends on outcomes that are harder to measure — reduced length of stay, avoided adverse events, improved diagnostic accuracy — and may take years to materialize.

Barrier 3: Regulatory Uncertainty (40%)

Regulatory uncertainty is the third major barrier, cited by 40% of health systems. The FDA has authorized over 1,000 AI-enabled medical devices through traditional premarket pathways — predominantly 510(k) clearances for imaging applications. However, regulators acknowledge that adaptive and generalized AI systems present challenges for frameworks designed for static, single-indication devices.

As of early 2026, no generative AI system has been cleared by the FDA as software as a medical device (SaMD). This creates significant uncertainty for health systems considering conversational AI tools that use large language models or other generative approaches. If a tool is classified as a medical device, it requires FDA clearance, a process that can take years and cost millions. If it is not classified as a medical device, its regulatory status may still change as the FDA updates its guidance.

An npj Digital Medicine commentary published in September 2025 highlighted these challenges, noting that generative AI voice agents face significant technical and safety issues including latency, generative unpredictability (hallucinations), failure to identify high-risk scenarios, and regulatory uncertainty. The commentary also noted that a randomized crossover trial found an AI-enabled voice assistant achieved 97.7% agreement with human staff for COVID-19 screening and was rated 'good or outstanding' by 87% of participants — demonstrating that the technology can work in controlled settings, even as deployment at scale remains challenging.

For a deeper exploration of AI risk management frameworks that health systems can adopt to navigate this uncertainty, see our article on the NIST AI Risk Management Framework in Healthcare.

Financial Models and ROI: Building the Business Case

For health system decision-makers, building a credible business case for conversational AI requires understanding both the market trajectory and the specific cost-benefit dynamics of each use case.

The global conversational AI in healthcare market was estimated at $18.83 billion in 2025 and is projected to reach $59.12 billion by 2030, growing at a compound annual growth rate (CAGR) of 25.7%, according to a Research and Markets report published in April 2026. North America accounted for the largest revenue share in 2025. Key growth drivers include the increasing shortage of healthcare professionals, adoption of digital health platforms, expansion of value-based care models, and technological advances in generative AI and agentic AI.
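
As a quick consistency check on the market projection, the growth rate implied by the two endpoint values can be recomputed. The Python sketch below uses the standard compound annual growth rate formula. The function name is hypothetical and the five-year horizon assumes the projection runs from 2025 to 2030 as stated.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Market size in billions of USD, as quoted above.
cagr = implied_cagr(start_value=18.83, end_value=59.12, years=2030 - 2025)

print(f"Implied CAGR: {cagr:.1%}")
```

The implied rate matches the reported 25.7%, so the three figures are internally consistent.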

The cost-benefit analysis for specific use cases reveals wide variation. The following table summarizes the key financial considerations for the three most common conversational AI deployment categories:

Financial comparison of major conversational AI deployment categories. Cost data from AssemblyAI and Amazon Transcribe Medical. ROI timelines are estimates based on industry reports and should be validated against specific vendor and health system contexts.
Cost/Benefit Factor	Ambient Documentation	Diagnostic AI	Clinical Risk Stratification
Infrastructure cost (speech-to-text)	$0.15/hour (AssemblyAI) to $4.15/hour (Amazon Transcribe Medical)	N/A	Variable
Primary ROI driver	Burnout reduction, retention	Diagnostic accuracy, efficiency	Adverse event prevention
Physician replacement cost avoided	$800K–$1.3M per physician	N/A	N/A
Typical time to positive ROI	6–9 months	12–24 months	12–36 months
Integration complexity	Low to moderate	High	High
Regulatory risk	Low to moderate	High (FDA clearance likely required)	Moderate to high

Physician replacement cost is a particularly powerful argument for ambient documentation. When a physician leaves practice due to burnout, the health system incurs costs of $800,000 to $1.3 million for recruitment, onboarding, and temporary coverage. If ambient AI scribes can reduce the odds of burnout by 74%, as the Yale/Abridge study suggests, the potential savings across a multi-physician practice are substantial.
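
A simple break-even calculation makes the retention argument testable. The Python sketch below asks how many physician departures a scribe program would need to prevent each year to cover its own cost. The `RetentionScenario` class, the program cost, and the number of departures avoided are all hypothetical inputs. Note that lower odds of burnout is not the same as a retention rate, so the number of departures avoided has to come from a health system's own data.

```python
from dataclasses import dataclass

@dataclass
class RetentionScenario:
    annual_program_cost: float          # USD per year, hypothetical
    departures_avoided_per_year: float  # hypothetical, from local data
    replacement_cost: float             # USD per physician, range quoted above

    def net_annual_benefit(self) -> float:
        """Replacement costs avoided minus what the program costs."""
        avoided = self.departures_avoided_per_year * self.replacement_cost
        return avoided - self.annual_program_cost

def break_even_departures(annual_program_cost: float, replacement_cost: float) -> float:
    """Departures that must be prevented per year to cover program cost."""
    return annual_program_cost / replacement_cost

# Hypothetical program costing 600,000 USD per year, evaluated at both
# ends of the replacement cost range quoted above.
for cost in (800_000, 1_300_000):
    needed = break_even_departures(600_000, cost)
    print(f"replacement cost ${cost:,}: break even at {needed:.2f} departures/year")
```

In this hypothetical scenario the program pays for itself if it prevents less than one departure per year, which is why the replacement cost dominates the business case.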

A Maturity Model for Conversational AI Implementation

Health systems at different stages of conversational AI adoption face different challenges and require different strategies. A maturity model can help organizations assess their current position and plan their next steps. The following four-stage framework, adapted from industry analysis, provides a useful roadmap.

Figure: Conversational AI implementation maturity model, from foundational exploration to transformative, empathetic engagement.

Stage 1: Foundational Exploration

At this stage, health systems are conducting pilot projects with one or two conversational AI tools, typically in a single department or clinic. The focus is on understanding the technology's capabilities and limitations, assessing vendor quality, and gathering initial user feedback. Key activities include:

Running small-scale pilots with ambient documentation or patient triage tools
Establishing a cross-functional AI governance committee
Developing evaluation criteria for vendor selection
Identifying clinical champions in target departments

Stage 2: Integrated and Governed Implementation

Health systems at this stage have moved beyond pilots to structured deployments with formal governance, integration into EHR systems, and defined success metrics. The focus shifts from exploration to operationalization. Key activities include:

Deploying ambient documentation across multiple departments
Integrating AI tools with EHR workflows (e.g., Epic, Oracle Health)
Establishing ongoing monitoring and performance tracking
Developing clinician training and change management programs

Stage 3: Optimized and Proactive Assistance

At this stage, conversational AI is embedded in clinical workflows and begins to provide proactive, context-aware assistance. The AI does not just document — it surfaces relevant information, suggests next steps, and helps clinicians make decisions. Key activities include:

Deploying AI-powered clinical decision support at the point of care
Using AI to pre-populate orders, referrals, and follow-up instructions
Implementing real-time quality and safety monitoring
Expanding AI to additional clinical specialties and settings

Stage 4: Transformative and Empathetic Engagement

This is the aspirational stage where conversational AI enables fundamentally new models of care delivery. AI systems understand patient context, preferences, and emotional state, and engage in natural, empathetic conversations. The focus is on transforming the patient experience and enabling more personalized, proactive care. Key activities include:

Deploying AI-powered patient engagement and navigation tools
Using AI to support chronic disease management and care coordination
Integrating AI with population health and value-based care initiatives
Continuously improving AI models based on real-world outcomes

Most health systems are currently between Stages 1 and 2. The Scottsdale Institute survey data suggests that even among large, well-resourced organizations, full deployment of conversational AI remains rare — only 14% have fully deployed ambient documentation, the most successful use case. The path to higher maturity requires sustained investment, strong governance, and a willingness to learn from both successes and failures.
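
The four stages can be turned into a simple self-assessment checklist. The Python sketch below encodes the key activities listed above and reports the highest stage an organization has fully completed. The scoring rule (a stage counts only when all of its activities and all earlier stages are complete) is an assumption made for illustration and is not part of the framework itself.

```python
# Stage names and activities are taken from the maturity model above.
STAGES = [
    ("Foundational Exploration", [
        "Running small-scale pilots",
        "Establishing an AI governance committee",
        "Developing vendor evaluation criteria",
        "Identifying clinical champions",
    ]),
    ("Integrated and Governed Implementation", [
        "Deploying ambient documentation across departments",
        "Integrating AI tools with EHR workflows",
        "Establishing ongoing monitoring",
        "Developing clinician training programs",
    ]),
    ("Optimized and Proactive Assistance", [
        "Deploying point-of-care decision support",
        "Pre-populating orders and referrals",
        "Implementing real-time quality monitoring",
        "Expanding to additional specialties",
    ]),
    ("Transformative and Empathetic Engagement", [
        "Deploying patient engagement tools",
        "Supporting chronic disease management",
        "Integrating with population health initiatives",
        "Improving models from real-world outcomes",
    ]),
]

def current_stage(completed: set) -> int:
    """Highest stage (1-4) fully completed in order, or 0 if none."""
    stage = 0
    for number, (_, activities) in enumerate(STAGES, start=1):
        if all(activity in completed for activity in activities):
            stage = number
        else:
            break  # Later stages do not count until this one is complete.
    return stage

# Example: every Stage 1 activity is done, plus one Stage 2 activity.
done = set(STAGES[0][1]) | {"Integrating AI tools with EHR workflows"}
print(f"Current stage: {current_stage(done)}")
```

An organization in this example would be assessed at Stage 1 and working toward Stage 2, which matches where the article places most health systems.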

Recommendations for Health System Leaders

The evidence from the Scottsdale Institute survey, combined with the broader research landscape, supports several actionable recommendations for health system leaders planning conversational AI deployments.

Start with focused, high-impact use cases. Ambient documentation is the clear winner — it addresses a universal pain point, has the strongest evidence base, and offers the most straightforward ROI. Begin with a pilot in one or two departments, measure outcomes rigorously, and scale based on results.
Prioritize EHR integration and workflow alignment. The success of conversational AI depends on how well it fits into existing clinical workflows. Tools that require clinicians to change how they work will face resistance. Invest in integration with your EHR platform and involve clinical champions early in the evaluation process.
Invest in rigorous evaluation and monitoring frameworks. The evidence gap identified by the umbrella review underscores the importance of health systems conducting their own evaluations. Track not just adoption rates and user satisfaction, but also clinical outcomes, safety incidents, and equity impacts. Publish your findings to contribute to the broader evidence base.
Prepare for evolving regulatory requirements. The regulatory landscape for conversational AI is in flux. Stay informed about FDA guidance developments, particularly around generative AI and adaptive systems. Consider adopting the NIST AI Risk Management Framework as a governance foundation, regardless of whether your tools currently require FDA clearance.
Build the financial case on total cost of ownership, not just software licensing. The cost of conversational AI extends beyond the vendor subscription. Include integration costs, training, ongoing monitoring, and the potential savings from reduced clinician burnout and improved retention. The physician replacement cost of $800,000 to $1.3 million per physician is a powerful argument for investments that improve clinician satisfaction.
Be realistic about what works and what does not. The survey data is clear: deployment volume does not equal deployment success. Diagnostic AI and clinical risk stratification tools may require more time, investment, and workflow redesign than ambient documentation. Do not over-invest in struggling categories at the expense of proven ones.

The path to successful conversational AI deployment is not about finding the perfect technology — it is about understanding the real barriers, choosing the right use cases, and building the organizational infrastructure to support sustained adoption. Health systems that start with ambient documentation, invest in integration and governance, and maintain realistic expectations about what AI can and cannot do will be best positioned to realize the technology's potential.
