How to make AI work for medicine - STAT

Advancements in LLMs such as ChatGPT and GPT-4 have generated substantial excitement. Many see these models as assistants, or even potential replacements, for time-intensive tasks like patient-physician communication through the electronic health record. Designed to serve numerous downstream applications, these models convert data into representations that are useful for multiple tasks. As a result, they have been labeled “foundation models.”

Yet a core question remains: As exciting as it is to chat with an AI tool that has read more text than you will in your lifetime, will such models in their current state really transform health care? We think the answer is no. But one approach customized for medicine could.

Although largely based on established AI methodologies, foundation models owe their recent success in large part to their massive scale. Online sources like Wikipedia, Flickr, and YouTube provide a firehose of text, image, and video data for training. As of June 2022, an estimated nearly 500 hours of video were uploaded to YouTube every minute. The size and breadth of these corpora feed into the ability of foundation models to serve multiple downstream tasks.

However, most health care data are not readily available on the internet, and thus, as large as these training sets are, they have blind spots. These blind spots contribute directly to problems with accuracy. Coverage will likely improve as models are trained on more health care-specific data, but the appetite of these models may exceed the available supply of high-quality text as early as 2026.

Simply augmenting existing foundation models with health care data misses a bigger opportunity: Just as existing LLMs obtain useful representations from text to inform downstream applications in dialogue, a health care-specific foundation model could be used to represent data collected from the electronic health record and other digital health data at scale. Applied to smaller datasets more typical of clinical research, the model’s output could be used downstream in applications unique to health (e.g., predicting outcomes).
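The pattern described above can be sketched in a few lines. Here `pretrained_embed` stands in for a hypothetical health care foundation model (a frozen random projection, purely for illustration); a small, clinic-sized dataset is then mapped into the model's representation space, and a simple downstream classifier predicts an outcome from those representations.

```python
# Sketch: reusing a frozen foundation model's representations for a
# downstream clinical prediction task. `pretrained_embed` is a stand-in
# (a fixed random projection), not a real health care foundation model.
import numpy as np

rng = np.random.default_rng(0)


def pretrained_embed(records: np.ndarray) -> np.ndarray:
    """Map raw patient records (n_patients x n_raw_features) to a
    lower-dimensional representation, as a frozen foundation model would."""
    projection = rng.standard_normal((records.shape[1], 16))
    return np.tanh(records @ projection)


# Small, clinic-sized dataset: 200 patients, 64 raw features, binary outcome.
X_raw = rng.standard_normal((200, 64))
y = (X_raw[:, 0] + 0.1 * rng.standard_normal(200) > 0).astype(int)

# Downstream model: logistic regression trained on the frozen embeddings.
X_emb = pretrained_embed(X_raw)
w = np.zeros(X_emb.shape[1])
for _ in range(500):  # plain gradient descent on the logistic loss
    p = 1.0 / (1.0 + np.exp(-(X_emb @ w)))
    w -= 0.1 * X_emb.T @ (p - y) / len(y)

acc = ((1.0 / (1.0 + np.exp(-(X_emb @ w))) > 0.5) == y).mean()
```

The key design point is that only the small downstream model is trained on the local dataset; the representation itself comes pretrained, which is what makes small clinical datasets usable.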

Take the example of wearables, which have grown in popularity over the past decade. Clinicians have been drowning in data from these sensors, presented by their eager patients, with little to show for it. A clinician cannot read every heartbeat or footstep collected by your Fitbit, but a foundation model can. A health care-specific foundation model could learn to capture relationships among physiological signals from larger datasets and then be fine-tuned on smaller datasets to alert individuals when something seems wrong, such as a sharp change in blood glucose during periods of exercise.
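As a toy illustration of the kind of alert rule such a fine-tuned model might ultimately produce, consider flagging sharp glucose changes during exercise. The function name and the threshold are hypothetical and chosen for illustration only, not clinical guidance; a real model would learn a far richer rule from data.

```python
# Sketch of a hypothetical wearable alert rule. The 2.5 mg/dL-per-minute
# threshold is illustrative only, not a clinical recommendation.
def glucose_alerts(glucose_mg_dl, exercising, threshold=2.5):
    """Flag minutes where blood glucose changes sharply during exercise.

    glucose_mg_dl: list of readings, one per minute.
    exercising:    list of booleans, one per minute.
    Returns the indices of minutes that warrant an alert.
    """
    alerts = []
    for t in range(1, len(glucose_mg_dl)):
        # Absolute rate of change in mg/dL per minute.
        rate = abs(glucose_mg_dl[t] - glucose_mg_dl[t - 1])
        if exercising[t] and rate > threshold:
            alerts.append(t)
    return alerts
```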

To realize health care-specific foundation models, we are going to need a lot of data. At our academic medical center, there are records associated with more than 4 million patients. Yet even if each patient generated a book’s worth of text (a gross overestimate), this is far less data than what is currently used to train existing foundation models. Moreover, there would be entire “chapters” of health experiences missing as individuals moved across health systems.

To make the most of the data we have, we need approaches to facilitate data sharing. When it is not possible to share data, methods like federated learning, in which data are not directly shared but used to update models in a decentralized fashion, are a promising substitute.
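The federated idea can be made concrete with a minimal sketch of federated averaging (FedAvg, a standard decentralized training scheme): each site runs a few gradient steps on a shared model using its own data, and only the updated model weights, never the patient data, leave the site. All names and sizes here are illustrative.

```python
# Minimal federated averaging (FedAvg) sketch: data stay at each site;
# only model weights are shared and averaged.
import numpy as np


def local_update(w, X, y, lr=0.1, steps=10):
    """One site's local gradient steps on a least-squares loss."""
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w


def federated_round(w_global, sites):
    """Average the locally updated weights, weighted by site size."""
    updates = [local_update(w_global.copy(), X, y) for X, y in sites]
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    return np.average(updates, axis=0, weights=sizes)


rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
sites = []
for n in (50, 80):  # two hospitals with different amounts of data
    X = rng.standard_normal((n, 2))
    sites.append((X, X @ true_w))  # noise-free synthetic targets

w = np.zeros(2)
for _ in range(20):
    w = federated_round(w, sites)
# After enough rounds, the shared model recovers the underlying weights
# without either hospital ever sharing its raw data.
```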

And data alone are not enough; we need to think carefully about model training. Foundation models can be trained using a method called “self-supervision,” which does not require human-annotated labels. For example, GPT-4 was trained to predict the next word in a sequence, with the training targets drawn from the text itself rather than supplied by human annotators. While the idea of self-supervision is appealing, it’s not obvious what forms of self-supervision will serve the many potential downstream tasks in health care. Predicting the next word in a sentence makes sense in the context of language generation, but it does not immediately apply to multimodal health data (e.g., physiological waveforms). This is because of the multiple sources of data involved in health care, as well as the knowledge and deep understanding required to make medical decisions.
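The essence of self-supervision is that the training targets come from the data itself. A minimal sketch: next-word pairs can be built directly from raw text, and an analogous (illustrative, not established) recipe for a waveform is to hide samples and ask the model to reconstruct them from their neighbors.

```python
# Sketch: self-supervision needs no human labels; the targets are
# constructed from the raw data itself.
def next_word_pairs(text):
    """Turn raw text into (context, target) training pairs."""
    words = text.split()
    return [(tuple(words[:i]), words[i]) for i in range(1, len(words))]


def masked_samples(signal, mask_every=4):
    """An analogous idea for a waveform: hide every k-th sample and ask
    the model to reconstruct it from the surrounding samples."""
    inputs = [None if i % mask_every == 0 else x for i, x in enumerate(signal)]
    targets = {i: x for i, x in enumerate(signal) if i % mask_every == 0}
    return inputs, targets


pairs = next_word_pairs("patient reports chest pain")
# Each pair: the words so far, and the word the model must predict next.
```

Which of these (or other) pretext tasks actually yields useful clinical representations is exactly the open question the paragraph above raises.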

While significant challenges exist, there is additional value in moving toward a shared foundation model in health care. If done right, regulatory oversight could become easier: when thousands of models all depend on a shared foundation, the foundation itself becomes a single target for regulation and mitigation efforts.

Today, AI in health care is splintered at best. Patient data are confined to individual health systems that are left on their own to develop, validate and deploy AI tools. A shared starting point could help level the playing field and fulfill the promise of AI in health care.

Jenna Wiens is an associate professor of computer science and engineering, associate director of the Michigan Artificial Intelligence Lab, and co-director of Precision Health at the University of Michigan. Rada Mihalcea is the Janice M. Jenkins collegiate professor of computer science at the University of Michigan and director of the Michigan AI Lab. Brahmajee K. Nallamothu is a professor of internal medicine in the Division of Cardiovascular Medicine at the University of Michigan Medical School.