Study Reveals ChatGPT Health 'Under-Triaged' Medical Emergencies

Mar 5, 2026, 2:30 AM


OpenAI's ChatGPT Health, a specialized chatbot launched to provide health guidance, frequently underestimates the severity of medical emergencies, according to a study published in the journal Nature Medicine. The study, the first independent evaluation of the tool, raises critical concerns about its safety in urgent medical decision-making.
Researchers tested ChatGPT Health's triage ability by presenting it with 60 clinical scenarios ranging from mild conditions to serious emergencies. Its responses were compared against the assessments of three independent physicians, who rated each case's urgency according to established medical guidelines. The study aimed to determine whether the chatbot could reliably advise users on when to seek immediate medical attention.
The results were concerning: ChatGPT Health under-triaged 51.6% of emergency cases, advising users to wait 24 to 48 hours instead of recommending immediate care. These included critical conditions such as diabetic ketoacidosis and respiratory failure, which can be life-threatening if not treated promptly. Dr Ashwin Ramaswamy, the study's lead author, emphasized that any trained healthcare professional would recognize the need for immediate intervention in these scenarios.
While the chatbot performed well on clear-cut emergencies, such as strokes, it struggled with more nuanced cases where clinical judgment is essential. In one example, ChatGPT Health correctly identified signs of respiratory failure yet still advised the patient to wait for further evaluation. This pattern of under-triage raises the concern that users may be given a false sense of security in genuinely critical situations.
The study also found that ChatGPT Health over-triaged 64.8% of non-urgent cases, often recommending unnecessary appointments for conditions that could be managed at home. For instance, the bot advised a patient with a three-day sore throat to book a doctor's appointment when home care would have sufficed. Such inconsistent triage could drive unnecessary healthcare utilization, further straining an already overwhelmed medical system.
Perhaps most alarmingly, the study highlighted the chatbot's inconsistent handling of scenarios involving suicidal ideation. Although ChatGPT Health is programmed to direct users who express suicidal thoughts to the 988 Suicide and Crisis Lifeline, it failed to do so in high-risk scenarios where patients also presented normal lab results. This inconsistency could have dire consequences: as Dr Ramaswamy noted, a safeguard that fails to activate in serious situations can be more dangerous than having no safeguard at all, because users may assume its silence means they are safe.
The results of this study have prompted calls for more rigorous testing and safety standards for AI tools used in healthcare. Dr John Mafi, an associate professor at UCLA Health, emphasized the importance of controlled trials to evaluate the benefits and risks of such technologies before they are widely adopted.
Despite the potential benefits of AI in healthcare, experts stress that these systems should not replace clinical judgment. Dr Ethan Goh, director of ARISE, noted that while chatbots can provide valuable information, they cannot substitute for professional medical advice. The findings serve as a reminder that even as AI assists in healthcare delivery, patient safety must remain the top priority.
OpenAI has acknowledged the study and expressed a commitment to improving the safety and reliability of ChatGPT Health before expanding its availability. As technology continues to advance, ongoing evaluation and updates will be necessary to ensure that AI tools effectively support, rather than endanger, patient health.
In conclusion, while ChatGPT Health offers a promising approach to providing medical guidance, significant concerns about its triage capabilities and response to critical situations must be addressed to prevent unnecessary harm and ensure patient safety.

Related articles

USF Health Morsani College of Medicine Achieves Top 10 NIH Rankings

The USF Health Morsani College of Medicine has achieved national recognition with two departments ranking in the Top 10 for NIH funding. This accomplishment reflects the college's commitment to cutting-edge research and its role in advancing medical science.

New NIH Funding Policy May Lower Medicine Costs

Recent changes to the National Institutes of Health (NIH) funding policy, which cap indirect costs at 15%, are expected to redirect more funds towards direct research costs. This shift could lead to lower medicine prices by reducing the overall research and development expenses associated with drug production.

Northwestern Study Reveals App's Role in Long COVID Recovery Tracking

A recent study from Northwestern Medicine highlights the effectiveness of a mobile app in tracking recovery patterns among long COVID patients. The app, which allows users to log symptoms and perceived improvements, revealed that recovery is often non-linear, with many patients experiencing fluctuations in their health.

AI Technologies Transforming Public Health Law and Practice

The integration of AI technologies in public health law and practice is increasingly recognized as vital for improving healthcare delivery and policy. With upcoming APHA meetings focusing on these innovations, professionals are encouraged to submit research that highlights AI's role in enhancing public health outcomes.

Electronic Caregiver Enhances Las Cruces as National AI Healthcare Hub

Electronic Caregiver, Inc is expanding its operations in Las Cruces, New Mexico, to establish the Rio Grande Health Technology Corridor. This initiative aims to position the region as a national hub for AI-driven healthcare infrastructure, creating high-wage jobs and integrating advanced technologies in patient care.