
A Penn State study tested ChatGPT, Google Gemini, and Meta LLaMA on real patient queries. Nine physicians evaluated the responses and found that the large language models produced medically valid answers 76% of the time. The models performed well on primary care and differential diagnosis but struggled with dermatology and mental health.