
ChatGPT Health Missed Over Half of Medical Emergencies in AI Triage Test, Nature Medicine Study Warns

Researchers found ChatGPT Health underestimated many emergency conditions in clinical scenarios across multiple specialties

Author: M Subha Maheswari

A new study has raised concerns about the reliability of artificial intelligence tools in medical triage after researchers found that ChatGPT Health underestimated the severity of many emergency conditions.

The research, published on February 23, 2026, in Nature Medicine, evaluated the ability of ChatGPT Health, a consumer health guidance tool developed by OpenAI, to recommend appropriate levels of medical care. Researchers found that the system under-triaged more than half of the cases that required emergency treatment.

ChatGPT Health launched in January 2026 as a health-focused version of the chatbot designed to help users ask medical questions, analyze health records, and receive guidance about symptoms and care options.

Researchers Test ChatGPT Health Triage Using 60 Clinical Scenarios

The study was led by Dr. Ashwin Ramaswamy and colleagues at the Icahn School of Medicine at Mount Sinai in New York. The research team evaluated the system using 60 clinician-written patient scenarios covering 21 medical specialties.

The scenarios represented a wide range of health conditions, including routine medical issues, urgent problems, and life-threatening emergencies.

Researchers tested each scenario under 16 contextual conditions. These included variations in patient demographics, social factors such as a friend or family member minimizing the patient's symptoms, and barriers to healthcare access such as lack of insurance or transportation.

In total, the team generated 960 interactions with ChatGPT Health and compared the AI's recommendations with the consensus triage decisions of three independent physicians.

Triage refers to the process of determining how urgently a patient needs medical care and whether they should seek emergency treatment, urgent care, routine consultation, or home management.
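To make the study design concrete: the article's figures imply a grid of 60 scenarios crossed with 16 contextual conditions, yielding 960 interactions, each scored against the physicians' consensus triage level. The sketch below is illustrative only; the level names, the score_triage function, and the grid construction are assumptions for this example, not the study's actual protocol.

```python
from itertools import product

# Ordered acuity levels, least to most urgent, mirroring the four
# dispositions described above (assumed ordering for illustration).
LEVELS = ["home management", "routine consultation", "urgent care", "emergency"]
RANK = {level: i for i, level in enumerate(LEVELS)}

def score_triage(ai_level: str, consensus_level: str) -> str:
    """Compare one AI recommendation against the physician consensus."""
    diff = RANK[ai_level] - RANK[consensus_level]
    if diff < 0:
        return "under-triage"  # AI recommended a less urgent level of care
    if diff > 0:
        return "over-triage"   # AI recommended a more urgent level of care
    return "match"

# The evaluation grid: 60 scenarios x 16 contextual conditions = 960 interactions.
interactions = list(product(range(60), range(16)))
assert len(interactions) == 960

# Example: an emergency case answered with "consult a physician within
# 24 to 48 hours" counts as under-triage.
print(score_triage("routine consultation", "emergency"))  # -> under-triage
```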

ChatGPT Health Underestimated Many Emergency Medical Conditions

The study found that ChatGPT Health under-triaged 52 percent of cases that physicians determined required immediate emergency care.

In these cases, the system advised patients to seek non-urgent care or consult a physician within 24 to 48 hours rather than go to the emergency department.

Researchers observed that the system performed well in some clear and well-recognized emergencies such as stroke and anaphylaxis. However, it struggled with conditions that required more nuanced clinical judgment.

For example, the AI sometimes failed to recommend emergency care in scenarios involving diabetic ketoacidosis or early respiratory failure, conditions that require urgent treatment.

Researchers also noted that the chatbot sometimes recognized warning signs of serious illness in its explanation but still reassured the user instead of recommending emergency care.

Study Also Found Over-Triage in Non-Emergency Cases

While the system frequently underestimated serious conditions, it also overestimated risk in some situations.

Researchers found that 64.8 percent of cases involving healthy individuals or non-serious conditions were incorrectly directed to seek urgent medical attention.

The researchers described the system’s overall performance as following an inverted U-shaped pattern. The AI handled moderately urgent medical cases relatively well but showed weaker performance at both low-risk and high-risk extremes.


ChatGPT Health Showed Inconsistent Suicide Crisis Alerts

The study also evaluated how ChatGPT Health responded to scenarios involving suicidal thoughts.

The system is designed to display crisis resources such as the 988 Suicide and Crisis Lifeline when users express suicidal intent. However, researchers found that these alerts appeared inconsistently.

In some situations the system displayed the crisis warning during lower-risk conversations, while in other cases the alert failed to appear when users described specific plans for self-harm.

Researchers said this inconsistency raised concerns about safety safeguards in AI health systems.

Study Finds Context and Social Cues Influenced AI Triage Decisions

The researchers also examined how additional context affected the system’s triage decisions.

For example, when a scenario included a comment from a friend or family member minimizing the patient’s symptoms, the AI was more likely to recommend less urgent care.

The study did not find statistically significant differences in triage outcomes related to variables such as race or gender in the tested scenarios.

Experts Call for Independent Safety Testing of AI Health Tools

Experts not involved in the study said the findings highlight the need for careful oversight of AI systems used for health guidance.

Dr. Isaac Kohane, chair of biomedical informatics at Harvard Medical School, noted that large language models are increasingly used by patients seeking health advice online and stressed the importance of independent safety testing before widespread adoption.

Researchers emphasized that AI tools should not replace professional medical assessment and should be evaluated continuously to identify potential risks.

OpenAI Responds to Study on ChatGPT Health Safety Concerns

OpenAI said it welcomes independent research evaluating its AI systems, adding that the findings may not reflect how users typically interact with the tool in real-world settings.

The company also said that ChatGPT Health continues to undergo updates and improvements.

Researchers said further studies are needed to examine how such tools perform in real clinical environments and to develop safeguards that can reduce the risk of delayed care or inappropriate medical advice.

Reference:

1. Armstrong, Stephen. “ChatGPT’s Health AI Has Dangerous Flaws, Study Warns.” BMJ 392 (2026): s438. https://www.bmj.com/content/392/bmj.s438
