
ChatGPT Health Missed Over Half of Medical Emergencies in AI Triage Test, Nature Medicine Study Warns

Researchers found ChatGPT Health underestimated many emergency conditions in clinical scenarios across multiple specialties

Author: M Subha Maheswari

A new study has raised concerns about the reliability of artificial intelligence tools in medical triage after researchers found that ChatGPT Health underestimated the severity of many emergency conditions.

The research, published on February 23, 2026, in Nature Medicine, evaluated the ability of ChatGPT Health, a consumer health guidance tool developed by OpenAI, to recommend appropriate levels of medical care. Researchers found that the system under-triaged more than half of the cases that required emergency treatment.

ChatGPT Health launched in January 2026 as a health-focused version of the chatbot designed to help users ask medical questions, analyze health records, and receive guidance about symptoms and care options.

Researchers Test ChatGPT Health Triage Using 60 Clinical Scenarios

The study was led by Dr. Ashwin Ramaswamy and colleagues at the Icahn School of Medicine at Mount Sinai in New York. The research team evaluated the system using 60 clinician-written patient scenarios covering 21 medical specialties.

The scenarios represented a wide range of health conditions, including routine medical issues, urgent problems, and life-threatening emergencies.

Researchers tested each scenario under 16 contextual conditions. These included variations in patient demographics, social factors such as a friend or family member minimizing the patient's symptoms, and barriers to healthcare access such as lack of insurance or transportation.

In total, the team generated 960 interactions with ChatGPT Health and compared the AI's recommendations with the consensus triage decisions of three independent physicians.

Triage refers to the process of determining how urgently a patient needs medical care and whether they should seek emergency treatment, urgent care, routine consultation, or home management.
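To make the study design concrete: the article's figures imply a grid of 60 scenarios crossed with 16 contextual conditions, yielding 960 interactions, each scored against the physicians' consensus triage level. The sketch below is illustrative only; the level names, the score_triage function, and the grid construction are assumptions for this example, not the study's actual protocol.

```python
from itertools import product

# Ordered acuity levels, least to most urgent, mirroring the four
# dispositions described above (assumed ordering for illustration).
LEVELS = ["home management", "routine consultation", "urgent care", "emergency"]
RANK = {level: i for i, level in enumerate(LEVELS)}

def score_triage(ai_level: str, consensus_level: str) -> str:
    """Compare one AI recommendation against the physician consensus."""
    diff = RANK[ai_level] - RANK[consensus_level]
    if diff < 0:
        return "under-triage"  # AI recommended a less urgent level of care
    if diff > 0:
        return "over-triage"   # AI recommended a more urgent level of care
    return "match"

# The evaluation grid: 60 scenarios x 16 contextual conditions = 960 interactions.
interactions = list(product(range(60), range(16)))
assert len(interactions) == 960

# Example: an emergency case answered with "consult a physician within
# 24 to 48 hours" counts as under-triage.
print(score_triage("routine consultation", "emergency"))  # -> under-triage
```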

ChatGPT Health Underestimated Many Emergency Medical Conditions

The study found that ChatGPT Health under-triaged 52 percent of cases that physicians determined required immediate emergency care.

In these cases, the system advised patients to seek non-urgent care or consult a physician within 24 to 48 hours rather than go to the emergency department.

Researchers observed that the system performed well in some clear and well-recognized emergencies such as stroke and anaphylaxis. However, it struggled with conditions that required more nuanced clinical judgment.

For example, the AI sometimes failed to recommend emergency care in scenarios involving diabetic ketoacidosis or early respiratory failure, conditions that require urgent treatment.

Researchers also noted that the chatbot sometimes recognized warning signs of serious illness in its explanation but still reassured the user instead of recommending emergency care.

Study Also Found Over-Triage in Non-Emergency Cases

While the system frequently underestimated serious conditions, it also overestimated risk in some situations.

Researchers found that 64.8 percent of cases involving healthy individuals or non-serious conditions were incorrectly directed to seek urgent medical attention.

The researchers described the system’s overall performance as following an inverted U-shaped pattern. The AI handled moderately urgent medical cases relatively well but showed weaker performance at both low-risk and high-risk extremes.


ChatGPT Health Showed Inconsistent Suicide Crisis Alerts

The study also evaluated how ChatGPT Health responded to scenarios involving suicidal thoughts.

The system is designed to display crisis resources such as the 988 Suicide and Crisis Lifeline when users express suicidal intent. However, researchers found that these alerts appeared inconsistently.

In some situations the system displayed the crisis warning during lower-risk conversations, while in other cases the alert failed to appear when users described specific plans for self-harm.

Researchers said this inconsistency raised concerns about safety safeguards in AI health systems.

Study Finds Context and Social Cues Influenced AI Triage Decisions

The researchers also examined how additional context affected the system’s triage decisions.

For example, when a scenario included a comment from a friend or family member minimizing the patient’s symptoms, the AI was more likely to recommend less urgent care.

The study did not find statistically significant differences in triage outcomes related to variables such as race or gender in the tested scenarios.

Experts Call for Independent Safety Testing of AI Health Tools

Experts not involved in the study said the findings highlight the need for careful oversight of AI systems used for health guidance.

Dr. Isaac Kohane, chair of biomedical informatics at Harvard Medical School, noted that large language models are increasingly used by patients seeking health advice online and stressed the importance of independent safety testing before widespread adoption.

Researchers emphasized that AI tools should not replace professional medical assessment and should be evaluated continuously to identify potential risks.

OpenAI Responds to Study on ChatGPT Health Safety Concerns

OpenAI said it welcomes independent research evaluating its AI systems, adding that the findings may not reflect how users typically interact with the tool in real-world settings.

The company also said that ChatGPT Health continues to undergo updates and improvements.

Researchers said further studies are needed to examine how such tools perform in real clinical environments and to develop safeguards that can reduce the risk of delayed care or inappropriate medical advice.

Reference:

1. Armstrong, Stephen. “ChatGPT’s Health AI Has Dangerous Flaws, Study Warns.” BMJ 392 (2026): s438. https://www.bmj.com/content/392/bmj.s438
