ChatGPT for Medical Advice: New Study Raises Serious Concerns

A recent study published in Nature casts doubt on the reliability of ChatGPT for medical guidance, despite its widespread use. More than 230 million people consult the AI chatbot each week with health-related questions, from simple allergy checks to symptom management. However, researchers found that ChatGPT Health routinely underestimates the urgency of serious medical emergencies, sometimes advising patients to wait instead of seeking immediate care.

Emergency Care Misdiagnosis

The study, led by Ashwin Ramaswamy at Mount Sinai in New York, assessed ChatGPT Health’s ability to correctly identify emergency situations across 60 clinical scenarios in 21 medical specialties. While the AI performed well in obvious cases like stroke or severe allergic reactions, it failed to advise emergency care in over half of the genuinely critical cases.

One example highlighted in the research involved an asthma scenario where ChatGPT correctly identified early signs of respiratory failure but still recommended waiting before seeking treatment. This demonstrates a dangerous flaw: the tool struggles when medical danger isn’t immediately apparent.

Suicidal Ideation and Inconsistent Safety Nets

The study also examined ChatGPT Health’s handling of suicidal ideation. Despite being programmed to encourage help-seeking in such cases, the AI’s “safety net” response was inconsistent: the suicide and crisis lifeline banner appeared only sporadically, and the model responded more reliably to users who had not specified a method of self-harm than to those who had, a counterintuitive and disturbing finding.

Evolving Technology and Unpredictable Performance

Researchers emphasize that AI language models are in constant flux, with frequent updates that can alter performance unpredictably. While they don’t advocate abandoning AI health tools entirely, they strongly caution against relying on them for critical medical decisions. Patients experiencing worsening symptoms (chest pain, shortness of breath, severe allergic reactions, changes in mental status) should seek immediate medical attention rather than relying solely on chatbot advice.

“As a medical student training alongside these tools, it’s clear that AI must be integrated thoughtfully into care, not as a substitute for clinical judgment,” explains Alvira Tyagi, study co-author.

The study underscores that today’s results are not set in stone; ongoing review and testing are crucial to ensure that improvements to AI models translate into safer care. In the rapidly evolving world of AI, trusting your health to a chatbot remains a significant risk.