ChatGPT Can Help Doctors—and Hurt Patients
Robert Pearl, a professor at Stanford medical school, was previously CEO of Kaiser Permanente, a US medical group with more than 12 million patients. If he were still in charge, he’d insist that all of its 24,000 physicians start using ChatGPT in their practice now.
“I think it will be more important to doctors than the stethoscope was in the past,” Pearl says. “No physician who practices high-quality medicine will do so without accessing ChatGPT or other forms of generative AI.”
Pearl no longer practices medicine but says he knows physicians using ChatGPT to summarize patient care, write letters, and even—when stumped—ask for ideas on how to diagnose patients. He suspects doctors will discover hundreds of thousands of useful applications of the bot for the betterment of human health.
As technology like OpenAI’s ChatGPT challenges the supremacy of Google search and triggers talk of an industry transformation, language models are starting to show the ability to take on tasks previously reserved for white-collar workers like programmers, lawyers, and doctors. That has sparked conversations among doctors about how the tech can help them serve patients. Medical professionals hope language models can unearth information in digital health records or supply patients with summaries of lengthy, technical notes, but there is also fear that they could fool doctors or provide inaccurate responses that lead to an incorrect diagnosis or treatment plan.
Companies developing AI technology have made medical school exams a benchmark in the competition to build more capable systems. Last year, Microsoft Research introduced BioGPT, a language model that achieved high marks on a range of medical tasks, and a paper from OpenAI, Massachusetts General Hospital, and AnsibleHealth claimed that ChatGPT can meet or exceed the 60 percent passing score of the US Medical Licensing Exam. Weeks later, Google and DeepMind researchers introduced Med-PaLM, which achieved 67 percent accuracy on the same test, although they also wrote that, while encouraging, their results “remain inferior to clinicians.” Microsoft and one of the world’s largest health care software providers, Epic Systems, have announced plans to use OpenAI’s GPT-4, which underpins ChatGPT, to search for trends in electronic health records.
Heather Mattie, a lecturer in public health at Harvard University who studies the impact of AI on health care, was impressed the first time she used ChatGPT. She asked for a summary of how modeling social connections has been used to study HIV, a topic she researches. Eventually the model touched on subjects outside of her knowledge, and she could no longer discern whether it was factual. She found herself wondering how ChatGPT reconciles two completely different or opposing conclusions from medical papers, and who determines whether an answer is suitable or harmful.
Mattie now describes herself as less pessimistic than she was after that early experience. ChatGPT can be a useful tool for tasks like summarizing text, she says, so long as the user knows that the bot may not be 100 percent correct and can generate biased results. She particularly worries about how ChatGPT treats diagnostic tools for cardiovascular disease and intensive care injury scoring, which have track records of race and gender bias. But she remains cautious about ChatGPT in a clinical setting, because it sometimes fabricates facts and doesn’t make clear how current the information it draws on is.