Executive Summary#
AI medical devices are proliferating faster than regulatory infrastructure can track their failures. With over 1,200 FDA-authorized AI devices and a 14% increase in AI-related malpractice claims since 2022, understanding the liability landscape has never been more critical.
This comprehensive analysis covers:
- Radiology AI misdiagnosis: cases, settlements, and the liability trap for radiologists
- Sepsis prediction failures: the Epic model controversy and hospital protocol implications
- Dermatology AI bias: documented racial disparities and discrimination liability
- Standard of care: what physicians and hospitals are expected to do when using AI
- Emerging case law: how courts are approaching AI liability
The Expanding Universe of AI Medical Devices#
The FDA’s AI-Enabled Medical Devices List now includes over 1,200 authorized devices, up from 864 at the end of 2023. In 2024 alone, 235 AI devices were authorized, the most in FDA history. This rapid proliferation has outpaced the regulatory infrastructure designed to track failures and adverse events.
Radiology dominates the AI medical device landscape, accounting for over 75% of FDA AI approvals. Cardiology, pathology, and neurology follow. Leading vendors include GE Healthcare (96 cleared tools), Siemens Healthineers (80), Philips (42), and numerous startups.
Yet this growth comes with a critical gap: the FDA’s adverse event reporting system was not designed for AI, and the mechanisms to identify when algorithms fail patients remain underdeveloped.
The MAUDE Database Challenge#
The FDA’s Manufacturer and User Facility Device Experience (MAUDE) database collects reports of adverse events involving medical devices. However, research reveals significant limitations when tracking AI-specific failures.
AI Events Are Underreported#
A 2025 study in npj Digital Medicine analyzed MAUDE data for AI/ML device adverse events:
- Between 2010 and 2023, researchers identified only 943 adverse event reports across 823 unique FDA-cleared AI/ML devices
- The vast majority of reports came from just two devices: a mass spectrometry microbial identification system and a mammography AI tool
- Most reported events were unrelated to the AI/ML algorithms themselves
- The current MAUDE system lacks a mechanism for reporting AI-specific failures
Missing Performance Data#
A 2025 cross-sectional study of 691 FDA-cleared AI/ML devices found alarming gaps in reported information:
| Reporting Gap | Share of Devices |
|---|---|
| Study design not described | 46.7% |
| Training sample size omitted | 53.3% |
| Demographic representation not reported | 95.5% |
| Premarket safety assessments not documented | 71.8% |
Postmarket adverse events, including one death, were reported for only 36 devices (5.2%).
AI Drift Goes Untracked#
Two critical AI failure modes remain essentially invisible in current reporting:
- Concept drift: The relationship between inputs and outputs changes over time as real-world data diverges from training data
- Covariate shift: Input data distributions change even when the underlying relationships remain constant
Neither phenomenon has a standardized reporting mechanism in MAUDE.
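Deployers can still watch for these failure modes locally even though MAUDE does not capture them. The sketch below is a minimal, hypothetical covariate-shift check: it compares a single input feature's recent production distribution against its training-era distribution with a two-sample Kolmogorov-Smirnov test. The feature name, data, and threshold are illustrative assumptions, not drawn from any cited study.

```python
# Minimal covariate-shift surveillance sketch (illustrative values only).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_lactate = rng.normal(loc=1.8, scale=0.6, size=5000)   # training-era feature values
recent_lactate = rng.normal(loc=2.3, scale=0.9, size=1000)  # shifted production values

result = ks_2samp(train_lactate, recent_lactate)
if result.pvalue < 0.01:
    print(f"Possible covariate shift (KS={result.statistic:.3f}, p={result.pvalue:.1e}); "
          "flag the model for revalidation.")
else:
    print("No significant distribution shift detected.")
```

Concept drift is harder to catch automatically because it requires adjudicated outcomes; in practice it is tracked by periodically recomputing performance on recently labeled cases, as sketched later in this analysis.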
Radiology AI: The Highest-Risk Category#
Radiology AI comprises the largest share of FDA-cleared devices and generates the most adverse event reports. As of July 2025, 873 radiology AI tools have been approved, accounting for 78% of all new AI medical device approvals. A study using MAUDE data found that mammography AI was implicated in 69% of reports, with the majority being near-miss events.
Performance Statistics#
Real-world AI radiology performance varies significantly:
| AI Application | Reported Performance | Caveat or Comparison |
|---|---|---|
| Viz.ai stroke detection | AUC > 0.90 | Retrospective data only |
| Aidoc intracranial hemorrhage | >90% sensitivity | Low false-positive rates in studies |
| Lunit mammography (INSIGHT MMG) | AUC 0.881 with AI | Up from 0.810 without AI |
| Breast cancer detection (Korean study) | 90% sensitivity | Radiologists: 78% |
| Early breast cancer detection | 91% accuracy | Radiologists: 74% |
However, a 2025 systematic review of 83 studies found overall diagnostic accuracy of only 52.1% for generative AI, with no significant performance difference between AI and physicians (p = 0.10).
Critical concern: An FDA-cleared AI algorithm misdiagnosed a finding as intracranial hemorrhage when the patient actually had an ischemic stroke, a condition that calls for a fundamentally different treatment.
The Liability Trap: When AI and Radiologists Disagree#
The integration of AI creates a novel liability trap for radiologists. As legal analysts note:
“If AI flags a lung nodule on a chest radiograph that the radiologist doesn’t see and therefore doesn’t mention in the report, and that nodule turns out to be cancerous, the radiologist may be liable not just for missing the cancer but for ignoring AI’s advice.”
This creates pressure to follow AI recommendations even when clinical judgment suggests otherwise. Yet legal scholars caution that accepting AI recommendations may also increase liability, particularly when AI recommends nonstandard care.
Validation Gaps#
A 2025 JAMA Network Open study examining FDA-cleared radiology AI devices found troubling validation deficiencies:
- Only 1.6% of devices (6 of 691) cited a randomized clinical trial in their FDA submissions
- Less than 1% (3 devices) reported actual patient health outcomes
- Real-world performance often drops 15-30% below benchmark accuracy due to population shifts and integration barriers
- 97% of devices were cleared through the 510(k) pathway, which does not require safety and efficacy assessment
Vendors report sensitivity, specificity, and AUC metrics, but these may not reflect performance in diverse clinical settings.
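A hospital does not have to take those figures on faith. The sketch below is a hypothetical local audit: given ground-truth labels and the tool's output scores on a sample of local cases, it computes sensitivity, specificity, and AUC for comparison with the vendor's reported numbers. The synthetic data, threshold, and the 0.90 benchmark are assumptions for illustration only.

```python
# Hypothetical local validation of a vendor-reported benchmark.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)                               # local ground truth (0/1)
scores = np.clip(y_true * 0.6 + rng.normal(0.3, 0.25, 500), 0, 1)   # AI output scores
y_pred = (scores >= 0.5).astype(int)                                # vendor-suggested cutoff

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
local_auc = roc_auc_score(y_true, scores)

print(f"Local sensitivity: {sensitivity:.2f}, specificity: {specificity:.2f}")
print(f"Local AUC: {local_auc:.2f} vs vendor-reported 0.90 (example figure)")
```

A gap between local and vendor-reported metrics is exactly the kind of evidence that both risk managers and plaintiffs' experts will look for.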
Recent Radiology Malpractice Verdicts (2023-2025)#
While few cases specifically name AI software, radiology misdiagnosis remains the most common malpractice allegation. Diagnostic errors account for over 75% of radiology malpractice claims. These cases establish the liability framework that will apply as AI becomes more prevalent:
$120 Million Verdict (New York, 2023): A patient’s basilar artery occlusion was not recognized on CT and was initially misinterpreted by the radiology resident on duty. This remains one of the largest radiology malpractice verdicts on record.
$9 Million Settlement (New York, 2024): A radiologist negligently failed to identify a breast mass as cancer, delaying diagnosis and treatment. The settlement was reached after evidence showed the mass was visible on prior imaging.
$9.9 Million Settlement (Atlanta, 2025): Multiple experts confirmed that an arteriovenous malformation (AVM) was “plainly visible” on the initial CT scan. The court found that failing to identify such an obvious abnormality represented “a glaring breach of the standard of care.”
$7.1 Million Verdict (Pennsylvania, 2024): A 27-year-old woman was left legally blind after a radiologist failed to diagnose cerebral venous thrombosis on a CT scan. The radiologist, reviewing remotely, reported the results as normal despite clear signs of blood clots. Breakdown: $2.35M future medical costs, $1.28M lost earnings, $3.5M pain and suffering.
$4 Million Verdict (New York, February 2024): A radiologist failed to properly interpret a mammogram and ultrasound, missing breast cancer for nearly two years despite visible abnormalities and a family history of the disease.
$3 Million Judgment (Maryland, 2024): An imaging center was found liable after misdiagnosing a growth as benign when it was cancerous. The cancer spread, the patient received a terminal diagnosis, and the judgment became the largest medical malpractice award in the county’s history.
Sepsis Prediction: A Case Study in Algorithm Failure#
Sepsis afflicts over 48.9 million people annually worldwide, with approximately 11 million deaths. In the United States, sepsis is a leading cause of hospital mortality and the most expensive condition billed to Medicare. AI prediction models were supposed to help, but the evidence suggests they may be causing as much harm as good.
The Scale of Hospital Sepsis#
- 1.7 million adults develop sepsis annually in the U.S.
- 270,000 die from sepsis-related complications
- $62 billion annual cost to U.S. healthcare system
- Every hour of delayed treatment increases mortality risk by 7-8%
Epic Sepsis Model Performance#
The Epic Sepsis Model (ESM) is used by roughly 170 Epic customer organizations, representing hundreds of hospitals across the United States. Independent research has raised significant concerns about its performance.
Key Findings:
A JAMA Internal Medicine study found:
- The model failed to identify two-thirds of sepsis patients
- Area under the curve (AUC) was only 0.63, well below the 0.76-0.83 range Epic reported
- The model triggered frequent false alarms
A February 2024 University of Michigan study found:
- Accuracy dropped to 62% when using data recorded before sepsis criteria were met
- Accuracy fell further to 53% when predictions were limited to before blood cultures were ordered
A follow-up study in 2024 concluded that Epic’s model “cheats” by relying on clinician judgments already in the EMR, essentially alerting to sepsis after clinicians have already begun suspecting it.
Clinical Impact#
The consequences of algorithm failure extend in both directions:
Missed Cases: With the model failing to flag two-thirds of sepsis patients, clinicians who rely on the alert may not recognize deteriorating patients until the disease has progressed.
False Alarms: “Errant alarms may lead to unnecessary care or divert clinicians from treating sicker patients in emergency departments or intensive care units where time and attention are finite resources.”
Alert Fatigue: When algorithms trigger frequent false positives, clinicians begin ignoring them, potentially missing the cases where the alert was accurate.
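The alert-fatigue problem follows directly from base rates. As a back-of-the-envelope illustration (the sensitivity, specificity, and prevalence figures below are assumptions, not Epic's published numbers), even a model with seemingly reasonable specificity produces mostly false alarms when sepsis prevalence on a monitored unit is low:

```python
# Back-of-the-envelope positive predictive value (PPV) for a sepsis alert.
# All numbers are illustrative assumptions, not vendor-published figures.
sensitivity = 0.60   # fraction of true sepsis cases flagged
specificity = 0.85   # fraction of non-sepsis patients correctly not flagged
prevalence = 0.05    # fraction of monitored patients who develop sepsis

true_pos = sensitivity * prevalence
false_pos = (1 - specificity) * (1 - prevalence)
ppv = true_pos / (true_pos + false_pos)

print(f"PPV: {ppv:.1%}")  # roughly 17%: about 5 of every 6 alerts are false alarms
```

Under these assumed numbers, clinicians would see about five false alarms for every true one, which is the arithmetic behind alert fatigue.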
Bias in Training Data#
A 2025 analysis in the American Journal of Bioethics highlighted training data limitations:
- Only 3% of training dataset sentences contained any social determinants of health (SDOH) data
- SDOH data are “notoriously under-documented in existing EHR structured data”
- This missing information compromises prediction accuracy for underserved populations
Epic’s Response#
Epic has defended its model, stating:
“Last fall, we released an updated version of the sepsis predictive model and we are working with our customers to implement it. The live organizations have seen more timely alerts and fewer false positives.”
Epic has also cited research showing its model reduced sepsis mortality odds by 44% and improved antibiotic administration timing by 40 minutes.
Legal Implications#
No Direct Lawsuits Yet: Despite documented performance issues, no lawsuits specifically targeting Epic’s sepsis prediction algorithm have been publicly reported as of late 2025. This gap reflects:
- Causation challenges: Proving that an algorithm failure, rather than the underlying disease or clinician decisions, caused harm is difficult
- Contract protections: Epic’s agreements with hospitals may limit liability
- Discovery barriers: Proprietary algorithms are difficult to examine in litigation
Potential Theories:
- False Claims Act: Health tech whistleblower attorneys note that incentive payments Epic makes to hospitals to adopt its algorithms could create False Claims Act liability if the algorithms are defective and harm Medicare/Medicaid patients
- Negligent deployment: Hospitals that deploy underperforming algorithms without local validation may face liability
- Failure to update: If Epic knew of performance issues and failed to warn customers or update the model, product liability may attach
Competing Models Show Promise#
Not all sepsis algorithms fail. The COMPOSER deep learning model, deployed at UC San Diego Health, showed:
- 1.9% absolute reduction in in-hospital sepsis mortality (17% relative decrease)
- 5% improvement in sepsis bundle compliance
- 4% reduction in 72-hour SOFA change after sepsis onset
This demonstrates that well-designed and validated algorithms can improve outcomes, making poorly performing algorithms harder to defend.
Dermatology AI: Documented Racial Bias#
AI systems for skin disease diagnosis exhibit well-documented performance disparities across skin tones, raising both quality-of-care and potential discrimination liability.
The Scope of Disparity#
| Metric | Light Skin | Dark Skin | Source |
|---|---|---|---|
| AI training images | 85-90% | 10-15% | Multiple studies |
| Textbook representation | 82-96% | 4-18% | Stanford HAI |
| Dermatologist accuracy | Higher | Lower | DDI study |
| AI-generated dermatology images | 89.8% | 10.2% | 2024-2025 study |
Performance Disparities#
Researchers created the Diverse Dermatology Images (DDI) dataset, the first publicly available, pathologically confirmed image dataset with diverse skin tones. Key findings:
- State-of-the-art dermatology AI models exhibit substantial limitations on dark skin tones
- Dermatologists who label AI datasets also perform worse on images of dark skin
- Fine-tuning AI models on diverse images closes the performance gap
A January 2024 study on skin lesion segmentation found:
- Significant correlation between segmentation performance and skin color
- Consistent challenges segmenting lesions for darker skin tones across diverse datasets
- Commonly used bias mitigation methods do not significantly reduce bias
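Disparities of this kind only surface when performance is reported by subgroup, which is what the DDI work does. The sketch below is a hypothetical per-skin-tone audit: it computes the model's sensitivity separately for each Fitzpatrick skin-type group in a labeled evaluation set. The data layout, column names, and values are illustrative, not from the DDI dataset itself.

```python
# Hypothetical per-skin-tone sensitivity audit for a lesion classifier.
# Data layout and values are illustrative only.
import pandas as pd

eval_df = pd.DataFrame({
    "fitzpatrick": ["I-II", "I-II", "III-IV", "III-IV", "V-VI", "V-VI"] * 50,
    "malignant":   [1, 0, 1, 0, 1, 0] * 50,
    "ai_flagged":  [1, 0, 1, 0, 0, 0] * 50,   # toy model misses malignancies on V-VI skin
})

malignant_cases = eval_df[eval_df["malignant"] == 1]
sensitivity_by_tone = malignant_cases.groupby("fitzpatrick")["ai_flagged"].mean()
print(sensitivity_by_tone)  # reveals the sensitivity gap across skin-tone groups
```

An audit like this, run before deployment, is the kind of documentation that distinguishes negligent from reasonable adoption of a dermatology AI tool.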
The Physician-AI Interaction Problem#
A February 2024 Northwestern University study in Nature Medicine revealed a critical finding:
- AI decision support increased diagnostic accuracy by 33% for dermatologists and 69% for primary care physicians
- Without AI, dermatology specialists achieved 38% accuracy; primary care physicians 19%
- However, for primary care physicians, accuracy increased more for light skin tones than dark ones
- AI assistance actually exacerbated accuracy disparities by 5 percentage points
- Primary care providers who reported seeing “mostly or all white patients” performed significantly worse on dark skin images
The lead researcher concluded: “It’s not the AI that is biased, it’s how physicians use it.”
Generative AI Compounds the Problem#
A 2024-2025 study evaluated AI-generated dermatological images from four leading AI models (Adobe Firefly, ChatGPT-4o, Midjourney, Stable Diffusion):
- Only 10.2% of images reflected dark skin
- Only 15% accurately depicted the intended condition
- AI programs may “exacerbate cognitive bias and health inequity”
As these tools are increasingly used for medical education and clinical references, they risk perpetuating the same gaps that created the bias in the first place.
Educational Material Gaps#
Stanford HAI research found that medical textbooks used to train dermatologists lack example images of darker skin tones:
- The STAR-ED framework found just 1 in 10 images in training materials is in the black-brown range on the Fitzpatrick Scale
- This training gap directly contributes to diagnostic disparities
- AI trained on these same images inherits and may amplify the bias
Dataset Analysis (HAM10000)#
A 2024 analysis of the HAM10000 dataset, one of the most widely used dermatology AI training sets, found:
- Significant imbalance in representation of darker skin tones
- Complete lack of metadata about patient demographics
- This poses “a considerable barrier to comprehensively understanding and addressing the dermatological needs of a racially diverse population”
Proposed Solutions#
Researchers have developed potential fixes:
- DermDiff: A generative model designed to create diverse, representative dermoscopic images
- DDI fine-tuning: Training on diverse datasets closes performance gaps
- STAR-ED framework: Machine learning tool to assess and flag bias in training materials
However, most deployed commercial dermatology AI tools have not incorporated these solutions.
Cardiac AI and Wearable Devices#
Apple Watch Arrhythmia Detection#
Apple’s FDA-cleared ECG and irregular rhythm notification features represent consumer-facing AI medical devices used by millions. Key regulatory details:
- The FDA does not classify smartwatches as medical devices subject to traditional requirements
- The ECG data is “intended for informational use only”
- Users are “not intended to interpret or take clinical action based on the device output without consultation of a qualified healthcare professional”
- The device is indicated only for adults aged 22 and older
Liability Limitation: Apple explicitly states: “This material is provided for information purposes only; Apple assumes no liability related to its use.”
In 2024, the FDA qualified Apple Watch’s AFib history feature for use as a secondary endpoint in cardiac ablation clinical trials, expanding its regulatory footprint while maintaining limited liability.
Surgical Robotics: The Da Vinci Precedent#
While not strictly AI, the da Vinci Surgical System litigation provides precedent for medical device product liability that will inform future AI cases.
Scale of Litigation#
Intuitive Surgical, manufacturer of da Vinci systems, has faced extensive litigation:
- Set aside $67 million to settle approximately 3,000 cases
- In its 2023 SEC filing, acknowledged being a defendant in “a number of individual product liability lawsuits”
Recent Cases#
2025 Florida Verdict: A $504,966 verdict against a physician (not the manufacturer) in a case where a patient died from sepsis after robotic hysterectomy.
2024 Sultzer Wrongful Death Case: Sandra Sultzer died after a da Vinci device allegedly burned and tore her small intestine during colon cancer surgery. The lawsuit alleges Intuitive Surgical knew of insulation problems that could cause electrical burns but failed to disclose this risk.
Common Allegations#
- Equipment malfunction and design defects
- Electrical burns and organ damage
- Inadequate training programs for surgeons
- Failure to warn of known risks
According to a study in The Journal for Healthcare Quality, 174 injuries and 71 deaths related to da Vinci surgery robots have been reported.
Pathology AI: Emerging Standards#
FDA-Approved Tools#
AI pathology tools are rapidly expanding:
- In 2021, cancer diagnostics comprised over 80% of FDA AI device approvals
- Pathology specifically accounted for 19.7% of approvals
- The FDA-approved Paige Prostate model achieved AUC of 0.99 on trial data
- Implementation increased pathologists’ diagnostic sensitivity from 74% to 90%
Performance Gaps#
Despite impressive benchmark metrics, real-world deployment reveals challenges:
- Tools with benchmark accuracies as high as 94.5% show performance drops of 15-30% in clinical use
- Underrepresentation of rural populations in training datasets has been linked to a 23% higher false-negative rate for pneumonia detection
- Melanoma detection errors are more prevalent among dark-skinned patients due to dataset imbalances
Does FDA Clearance Establish Standard of Care?#
A critical liability question: does FDA 510(k) clearance or De Novo authorization establish the standard of care, or create a ceiling on manufacturer liability?
The 510(k) Pathway#
Most AI medical devices reach market through the 510(k) pathway, which requires demonstration of “substantial equivalence” to a device already on the market:
- Does not require safety and efficacy assessment
- Is the “simplest, cheapest, and fastest” path to market
- Critics note that a device “substantially equivalent” to one that was recalled may remain on the market
Preemption Defense#
Manufacturers may assert federal preemption under the Medical Device Amendments:
- Section 360k provides express preemption for some state tort claims
- However, devices cleared through 510(k) generally receive less preemption protection than those with full Pre-Market Approval (PMA)
Evidence in Litigation#
A product’s compliance with FDA requirements is “properly considered as evidence in determining whether that product is defective.” However, plaintiffs increasingly argue that 510(k) clearance sets a floor, not a ceiling, for safety standards.
The Emerging Liability Framework#
Who Bears Responsibility?#
Current legal frameworks create a complex liability landscape:
Physicians and Hospitals: As the law currently stands, physicians and hospitals shoulder the primary burden of liability. Surgeons generally accept that ultimate responsibility remains theirs, even when using AI.
Manufacturers: May face liability for:
- Design defects in the AI algorithm
- Manufacturing defects (improper training, data poisoning)
- Failure to warn of known limitations
- Failure to provide security updates
Vendors and Integrators: The Dickson v. Dexcom Inc. (2024) case is being watched as a potential precedent for AI/ML manufacturer liability.
Rising Malpractice Claims#
Data from 2024 shows a 14% increase in malpractice claims involving AI tools compared to 2022. The majority stem from diagnostic AI used in:
- Radiology
- Cardiology
- Oncology
Missed cancer diagnoses by machine-learning software have become a central focus in several high-profile lawsuits.
Insurance Industry Response#
Malpractice insurers are adapting:
- Some policies now include AI-specific exclusions
- Others require physicians to undergo AI training to remain covered
- Coverage uncertainty creates risk for both healthcare providers and patients
The AMA Position#
The American Medical Association has stated that “liability and incentives should be aligned so that the individual(s) or entity(ies) best positioned to know the AI system risks and best positioned to avert or mitigate harm do so through design, development, validation, and implementation.”
Practical Implications#
For Healthcare Providers#
Document AI Interactions
- Record when AI recommendations are followed or overridden
- Maintain rationale for clinical decisions that diverge from AI suggestions
- Preserve evidence of human oversight
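One lightweight way to operationalize this documentation is a structured record written to the chart or an audit log each time an AI recommendation is accepted or overridden. The sketch below is a hypothetical schema; the field names, tool name, and values are illustrative and not drawn from any particular EHR vendor.

```python
# Hypothetical audit record for an AI-assisted decision; all fields are illustrative.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class AIDecisionRecord:
    patient_id: str
    tool_name: str
    tool_version: str          # preserve the exact algorithm version in use
    ai_recommendation: str
    clinician_action: str      # "followed" or "overridden"
    rationale: str             # reasoning when diverging from the AI suggestion
    timestamp: str

record = AIDecisionRecord(
    patient_id="example-001",
    tool_name="ChestXR-Nodule-Assist",      # hypothetical tool name
    tool_version="2.4.1",
    ai_recommendation="6 mm nodule flagged, right upper lobe",
    clinician_action="overridden",
    rationale="Finding consistent with prior granuloma on 2022 CT; no interval change.",
    timestamp=datetime.now(timezone.utc).isoformat(),
)
print(json.dumps(asdict(record), indent=2))
```

Capturing the tool version and the override rationale at the time of care is what later allows a provider to show that human oversight actually occurred.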
Validate Before Deployment
- Request vendor performance data specific to your patient population
- Conduct internal validation studies where feasible
- Monitor for drift and performance degradation
Review Insurance Coverage
- Confirm malpractice policies cover AI-assisted care
- Understand any AI-specific exclusions or training requirements
- Consider additional coverage if using novel AI tools
For Patients and Plaintiffs#
Request AI Disclosure
- Ask whether AI systems were used in diagnosis or treatment
- Request documentation of AI recommendations
Identify All Defendants
- Consider manufacturer liability alongside provider malpractice
- Evaluate hospital policies for AI deployment and oversight
- Examine vendor contracts and indemnification arrangements
Preserve Evidence
- AI systems may be updated or retired; document the version in use
- Request algorithm explanations and performance data in discovery
For Manufacturers#
Disclosure Obligations
- Document known limitations and failure modes
- Disclose demographic performance disparities
- Provide clear guidance on intended use and contraindications
Post-Market Surveillance
- Monitor real-world performance against clinical trial results
- Report adverse events that may be algorithm-related
- Issue updates when performance degradation is detected
Contractual Protections
- Review indemnification provisions with deployers
- Clarify liability allocation for off-label or modified use
- Address algorithm update and support obligations
The Standard of Care When Using Medical AI#
Understanding what courts and professional societies expect from physicians and hospitals using AI diagnostic tools is critical for both liability prevention and patient safety.
Core Principles#
Medical malpractice requires proving that a physician’s actions deviated from the standard of care and caused harm. The standard of care is set by the collective actions of professional peers. As AI becomes integrated into clinical practice, these principles apply:
1. Physicians Must Apply Independent Judgment
Regardless of AI output, physicians have a duty to independently evaluate patients and apply clinical reasoning. Legal scholars note:
“A physician who in good faith relies on an AI/ML system to provide recommendations may still face liability if the actions the physician takes fall below the standard of care.”
2. “The AI Told Me To” Is Not a Defense
Courts continue to expect human oversight. If a physician blindly accepts AI recommendations without applying clinical judgment, they can be held liable for the outcome.
3. AI Recommendations Must Be Documented
- Record when AI recommendations are followed or overridden
- Document the reasoning behind diverging from AI suggestions
- Preserve evidence of human oversight in the medical record
What Creates Liability?#
| Scenario | Potential Liability |
|---|---|
| AI flags abnormality, physician ignores it, harm results | Physician liable for ignoring available technology |
| AI misses abnormality, physician relies solely on AI, harm results | Physician liable for failing to apply independent judgment |
| AI recommends nonstandard treatment, physician follows it, harm results | Physician liable for accepting care outside standard practice |
| Hospital deploys unvalidated AI, predictable harm occurs | Hospital liable for negligent deployment |
| AI vendor knew of defect, failed to warn, harm occurs | Vendor liable under product liability |
When AI May Establish the Standard#
As AI systems consistently outperform human physicians in specific tasks, legal theorists argue they may eventually define the standard of care:
“When AI models consistently outperform the average or reasonable physician, such models would establish the standard of care in light of their superior-to-human performance.”
This creates a future where not using proven AI tools could itself be negligent.
Hospital System Obligations#
Healthcare systems deploying AI face institutional standard of care requirements:
- Validate AI on local populations: algorithms may perform differently on patient populations that differ from training data
- Train staff on capabilities and limitations: clinicians must understand what AI can and cannot do
- Monitor performance post-deployment: track accuracy, false positive/negative rates, and disparities
- Maintain human oversight mechanisms: ensure AI supports rather than replaces clinical judgment
- Update or retire underperforming tools: continued use of known-defective AI creates liability
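A minimal version of post-deployment monitoring can be a scheduled job that recomputes error rates on recently adjudicated cases and compares them with the baseline established at local validation. The sketch below is hypothetical; the data, baseline figure, and tolerance are assumptions for illustration.

```python
# Hypothetical post-deployment monitor: monthly false-negative rate vs. a local baseline.
import pandas as pd

cases = pd.DataFrame({
    "month":      ["2025-01"] * 4 + ["2025-02"] * 4,
    "ai_flagged": [1, 0, 1, 0, 0, 0, 1, 0],
    "confirmed":  [1, 1, 0, 0, 1, 1, 1, 0],   # adjudicated ground truth
})

positives = cases[cases["confirmed"] == 1]
fnr_by_month = 1 - positives.groupby("month")["ai_flagged"].mean()

BASELINE_FNR = 0.20   # illustrative rate from local validation
TOLERANCE = 0.10
for month, fnr in fnr_by_month.items():
    if fnr > BASELINE_FNR + TOLERANCE:
        print(f"{month}: false-negative rate {fnr:.0%} exceeds baseline; review the tool.")
    else:
        print(f"{month}: false-negative rate {fnr:.0%} within tolerance.")
```

A documented trail of this kind supports both sides of the institutional obligation: it demonstrates ongoing oversight, and it creates the trigger for updating or retiring a tool whose performance has degraded.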
The Black Box Problem#
Difficulties in apportioning liability arise when algorithms “cannot be fully understandable both for manufacturer and clinicians, constituting a black box.” This creates challenges for:
- Causation: Proving the AI specifically caused harm
- Discovery: Obtaining proprietary algorithm details in litigation
- Expert testimony: Finding experts who can evaluate AI performance
Frequently Asked Questions#
Who is liable when AI diagnostic software makes a mistake, the physician, the hospital, or the software vendor?
Can I sue if an AI system missed my cancer diagnosis?
Does FDA approval of an AI medical device mean it's safe and effective?
What should I do if I believe an AI medical device harmed me?
Are hospitals required to tell patients when AI is used in their diagnosis?
If dermatology AI is biased against darker skin tones, can I sue for discrimination?
Related Resources#
- AI Medical Misdiagnosis Case Tracker: structured tracker of specific misdiagnosis cases, verdicts, and settlements
- AI Product Liability: treating AI systems as products under strict liability
- Insurance Coverage for AI: malpractice policy gaps for AI use
External Resources#
- FDA AI-Enabled Medical Devices List
- FDA MAUDE Database
- AHRQ Patient Safety Network - ML-Enabled Device Safety
- NIST AI Risk Management Framework
- AMA AI in Health Care Resources
- Holland & Knight: Medical Malpractice in the Age of AI
Harmed by AI Medical Technology?
If you believe an AI diagnostic tool or medical device contributed to a misdiagnosis, delayed treatment, or injury, understanding your legal options is critical. AI medical liability is a rapidly evolving field; connect with attorneys who understand both the technology and the law.