
Medicine Machines @ NEJM

A nice NEJM Podcast at the intersection of Machine Learning, Population Health, and Medicine.

Audio: Ziad Obermeyer at the New England Journal of Medicine (50:25)

Ziad Obermeyer: A lot of the things that I struggled with clinically, like figuring out, “OK, well, who is at high enough risk that they need to be in the hospital, and not out of the hospital?” That turns out to be a great machine-learning problem, because machine learning is really good at predicting a thing — so predicting who’s going to have a bad outcome on the basis of everything we see about someone now, great machine-learning problem. Diagnosis is a great machine-learning problem. So what is diagnosis but saying, “OK, I observe a physical exam, an electrocardiogram, some lab results, the way the patient’s . . . and now, I need to estimate the probability of acute coronary syndrome, pulmonary embolus, aortic dissection, etc., etc.”

So a lot of the core tasks that doctors are doing — in the emergency department, but everywhere — is the set of things that machine learning is really, really good at doing.

Lisa Rosenbaum: “Not otherwise specified” has always been one of my favorite phrases in medicine, not just because it’s a fancy way of saying, “We don’t really understand the root cause of something,” but also because it captures the human impulse to put tidy labels on things that remain largely unknown. In “NOS,” I talk with some of medicine’s most innovative thinkers to probe some of these messy unknowns behind our health care system, its players, and the stories that shape their lives. “NOS” makes time for the types of in-depth conversations that may not leave us with easy answers, but that shed fresh light on medicine’s toughest challenges, as well as the people envisioning its future. I’m Lisa Rosenbaum, and you’re listening to “Not Otherwise Specified” from the New England Journal of Medicine.

My guest today is Ziad Obermeyer. Ziad is a professor of health policy and management at UC Berkeley’s School of Public Health, an emergency medicine physician, and also a scientist who’s doing some of the most innovative and impactful work at the intersection of machine learning, medicine, and population health. Ziad’s always been interested in how physicians think and how we might make better decisions for our patients, given the limits of our own minds and all the data that we all must process all the time. Some of his work, then, has used machine-learning techniques to create new algorithms to help us overcome our own biases, and some of his work has actually done the opposite — using machine learning to identify biases in existing algorithms that are affecting the lives of millions of patients.

In 2019, he and his colleagues published a paper in Science which identified racial bias in a widely used population health algorithm that was intended to identify patients with complex health needs, but instead was actually worsening health inequities. After they identified the bias, they then worked with the company to remove it. I remember when I read the paper, and then learned about the collaboration between the scientists and the company, thinking to myself, “Wow, Ziad’s work is actually making the world a better place, and that is so cool.” In the 3 years since, he’s not only gone on to do highly impactful work, but it turns out I definitely was not the only person who was super-impressed. Ziad’s become a bit of a hot commodity.

Ziad, thank you so much for taking the time to be with us today. I’m excited, as always, to get to talk to you about your work and how you get your ideas, but also about how you became who you are, what drives you, and also how you bring your background in philosophy and history of medicine to a very pragmatic and data-driven approach to improving health care. So welcome to the show.

Ziad Obermeyer: Thank you so much for that very kind introduction. It’s always a pleasure to talk to you, Lisa. Even when some other people are listening, it’s always a treat.

Lisa Rosenbaum: I do want to start at the beginning of your trajectory in medicine. So how did you decide on emergency medicine, and then what was residency like for you?

Ziad Obermeyer: I decided on emergency medicine for two reasons. One, there was an intellectual reason, which was that I was really interested in global health at the time, and it felt like emergency medicine was a great way to get exposed to a cross-section of lots of interesting parts of medicine that were high acuity and really urgent and important. There was another part of me that was just responding to the fact that when I was in the hospital, all of the memories that I liked the most, or almost all of them, were from the ER — excuse me, the ED.

Lisa Rosenbaum: Oh yeah, we can’t say that.

Ziad Obermeyer: Yes. Sorry. It’s more than just a room. I think those were the reasons, but I think when I . . . . So, when I started residency, my first 2 years of residency, I would say that I — I had been a pretty bad medical student, because I had a bad attitude, and I was not excited about memorizing stuff. And I think in my first couple of years of residency, I was also a bad intern and a bad second-year resident. I think in part it was because I’d come into residency from this global health policy background. And I think part of me was still in this “thinking big thoughts” mode, where it was really like, what was the problem with the health care system? Well, it was incentives and waste . . . and actually, especially coming from a global health perspective, we spend so much over here, but so little over there. So I think when I was an intern or a second-year resident, and people would want to order a CT scan or an MRI to look for pulmonary embolus, I’d just roll my eyes, and I’d be like, “Oh, I can’t believe this guy wants me to do another high-cost, low-value test.” You can imagine what a joy I was to be around for my poor attendings. And I look back on that period, and I really feel bad, because I think I just wasn’t very good, and I was making a lot of decisions that were just not good, and I was trying to do less.

I think when you have that attitude, and you’re paying attention, you just see that things fall through the cracks because yes, a lot of tests come back negative, but some don’t. So I think especially being in the emergency setting, and seeing the huge value that you get from diagnosing an acute coronary syndrome, a PE, something else, it just becomes really clear that it’s not so much about doing less or doing more. It’s about doing the right thing for the right patient. I think even as I started to get better over my second, third, fourth years of residency, I still struggled a lot with those kinds of decisions — whom to test, whom to admit to the hospital, who is safe to send home.

And I’d just go home after shifts, and I would replay things in my head. I’d be agonizing about all of these decisions that you have to make in the emergency department every day, because you’re seeing 20 to 30 patients a day, and all of them are just full of these difficult decisions. So it’s a really hard job, and it’s a very stressful job.

Lisa Rosenbaum: But what do you think shifted in you? I mean, I understand, because so much of what you’re saying, I think, is the foundation for some of the work, at least, that you’ve done, and I want to talk about that. But on a personal level, you were just like sort of being a jerk, and then you weren’t. Was it because somebody talked to you, or because you had a massive misdiagnosis, or do you think you just sort of grew up?

Ziad Obermeyer: I mean, I think growing up is a great — growing up is the perfect description of it. I think part of it is related to the structure of residency, where even bad residents get promoted from PGY-2 to PGY-3. They were stuck with me in residency, and there’s a limited supply of warm bodies around to do the PGY-3 job. Like many things in life, there’s an equilibrium where you are doing a bad job, you’re not trusted with responsibility, you’re just not great. And then there’s an equilibrium where you are learning a lot, you’re trusted to do more things, and so transitioning from one to the other, I think, is always multifactorial.

It’s like transitioning from sickness to health or health to sickness. There are just lots of little things that add up, but I was so lucky to have great teachers around me that were constantly just pointing at these little things that were important like, “Oh, did you notice that this patient had a rash?” I’m like, “No, I didn’t, because I didn’t actually examine their feet.” I think that if you’re paying attention, you just start noticing all these things that really matter, and at some point, it confronts you like, “This is a really important job, and it’s really important to do it well.”

And I think the realization that was probably most important for me was actually taking pride in the work, and seeing that if you did this job well, you had this huge impact on people’s lives that was incredibly positive. And if you did it poorly, if you were mailing it in, it was just like, why are you even there? So I think growing up is a great description, and I think it was realizing that medicine is a great and really important job, and is worth doing well.

Lisa Rosenbaum: Let’s talk a little bit more about then how you evolved intellectually as far as thinking about the value of testing. I think, was it during your residency or soon after, you got a huge young, like early-career investigator award, right?

Ziad Obermeyer: That’s right. The NIH has a small percent of its portfolio dedicated to what they call high-risk, high-reward research, and so this is . . . . It works differently from a normal NIH grant in the sense that you — they don’t say this, exactly, but you get 5 years of support to just grow and learn and explore lots of different things. I think in many ways, they’re explicitly betting on you as a person rather than you as a lump of research that you’re producing, and they’re paying you a certain amount of money. And that was good for me, because many of the things that I proposed to do just turned out to be not good ideas, or not feasible, or things like that.

But that was transformative for me, because it gave me a few years of breathing room to just think and learn and start building out the collaborative networks that are still incredibly productive and high-value to me now. So I started off doing work that — the project that I proposed was about people who get sent home from the emergency department and drop dead in the next few days. And I was very directly inspired by a lot of my anxiety and stress in the ED, because most people you send home, and then you just never figure out what happened to them. So, after my shift, after I started having a better attitude, and growing up, and being a better doctor, a few times a month I would just go back through lists of every patient that I had seen on a shift, and I’d just open up their chart and see what happened to them. It’s a weird fact about our electronic health records that we have so many things we could learn from. It would be great to have some structured way to learn from all the patients you see. But how do you actually learn? Well, if a colleague happens to see you, and they’re like, “Oh, hey, remember that patient you saw?” And you’re like, “Oh my goodness, this is —”

Lisa Rosenbaum: The worst, the worst feeling.

Ziad Obermeyer: The worst. Yeah, no one ever is like, “yeah, everything happened exactly the way you thought it was going to be, and the patient’s doing fine.”

Lisa Rosenbaum: No, when someone says, “Hey, remember?,” you just, you know.

Ziad Obermeyer: It’s terrifying. So we learn in this incredibly ad hoc way, or in this painful way where you have to type in every MRN, and then look at the . . . . So the project that I was doing was basically taking advantage of a linkage between the electronic health record and Social Security data that told you whether or not someone had died, and specifically whether or not someone had died outside of the hospital. When I first started doing this project, I’d look at the charts of patients who had died not at the hospital that I was working in, but somewhere else at another hospital or out of the hospital. And there’d just be these notes from the clinic manager like, “Patient did not keep appointment. Patient did not keep appointment,” and they had no idea.

I think it speaks to how the fragmentation of our health record really inhibits both good clinical care, because you can’t see what tests the patient had, but also inhibits learning, because you don’t see what happened in a lot of cases, and in fact some of the most important cases. What we found was that this happened, I think, more often than I would’ve thought. So when we looked at Medicare data, at generally healthy, ambulatory people who come to the emergency department and get a diagnosis that doesn’t look like a bad diagnosis, we just found that people died after being sent home from the ED more often than we would’ve thought.

I think that that honestly raised a lot more questions for me than it answered, in the sense that, well, what you’d really want to know is like, “Well, what did they have? What could I have done differently?” I think that’s why a lot of the things that I produced in that phase of my research were interesting, and they prompted a lot of future projects but weren’t ultimately that satisfying on their own. So, if I were putting myself in the shoes of the people at NIH that handed me this check, I don’t know, I’d have been like, “Well, this guy didn’t produce that much of value.” But it produced a lot of learning, which is — I’m sure they’re very happy about that, but I learned a lot.

But it did very much set me on the path of research that I’m doing today. So even if you look at that grant, and look at the outputs, you’re like, “Eh, this is a little thin.” I think everything that I’m doing now is because the NIH essentially took a chance on me through that portfolio of grants from the Office of the Director, that was Francis Collins at the time.

Lisa Rosenbaum: Wow. Is that because of the networks you established, in terms of the people you collaborate with now, and the development of machine-learning techniques, or something else?

Ziad Obermeyer: Yeah, all of those things. I think what I realized during that period was that a lot of the things that I struggled with clinically like figuring out, “OK, well, who is at high enough risk that they need to be in the hospital, and not out of the hospital?” That turns out to be a great machine-learning problem, because machine learning is really good at predicting a thing, so predicting who’s going to have a bad outcome on the basis of everything we see about someone now — great machine-learning problem. Diagnosis is a great machine-learning problem. So what is diagnosis but saying, “OK, I observe a physical exam, an electrocardiogram, some lab results, the way the patient . . . and now, I need to estimate the probability of acute coronary syndrome, pulmonary embolus, aortic dissection, etc., etc.”

So a lot of the core tasks that doctors are doing — in the emergency department, but everywhere — is the set of things that machine learning is really, really good at doing. So it took me a while to start thinking about these problems that way, but that was the period where I was starting to think about these things and think about the really productive ways in which you could use machine learning to help doctors and others in the health care system make better decisions.

Lisa Rosenbaum: Then, I know you published, because it’s one of my favorite studies, in Science, I think around 2018, about end-of-life care and costs and our predictive abilities in terms of who will live and who will die. Can you talk a little bit about the narrative around end-of-life care at the time, how you got the idea to do the study, what you found, and then also whether you think it impacted the narrative about the fact that we were wasting so much money at the end of life?

Ziad Obermeyer: I’d been really interested in end-of-life care because one of the — I mean, it’s a really crazy thing about our job in medicine is you watch a lot of people die. It’s a crazy thing. It’s not an experience that most people, thankfully, have, but you spend a lot of time with death. I think especially in the emergency department, you spend a lot of time with people who have these chronic conditions. They’ve got end-stage cancer, and yet they come in, and everyone is just completely unprepared for it. No one’s talked about it, and it’s really hard to have those conversations.

So I think there’s a movement to try to find ways to have those conversations earlier and better. I think that movement is still absolutely right, and I think that the goal of better and more satisfying and better planned end-of-life care just is 110% correct. I think there’s a narrative that’s in parallel to that one which is saying that in addition to that being a great thing to do clinically, humanistically, whatever, that could also save us a ton of money, because if you just add up all of the health care that is delivered in the last year of life — so you look at people who die, and then you just turn the clock back a year — it’s an enormous fraction of our medical expenditures, even though only a very small fraction of people die.

So I think the reason to do this is not just a humanistic endeavor related to a good death, etc. It’s also, well, because this is a huge source of expenditures and potentially waste. So, naturally — because this is something that’s like, “Well, if only I’d known a few months ago that death was imminent . . . ” — that also seems like a great machine-learning problem. So I was very optimistic about the role of machine learning to both facilitate these conversations early, but also to save us a lot of money on futile health care. So working with some coauthors, we published the study that you mentioned, where we just tried to predict who was going to die.

This is in Medicare data, since most deaths are covered by Medicare. We tried to predict who was going to die with machine learning, and then figure out, “OK, well, once we knew, above a certain threshold, that someone was going to die, how many dollars were downstream of that point?” And one of the really surprising things that completely changed my mind about how I was thinking about end-of-life care and the savings associated with it was that it turns out to just be really hard to predict well enough to save the kinds of money that people were hoping for by turning down the knob on end-of-life care.

And that’s because not only do you need to know who’s going to die, you also need to know when they’re going to die. And it turns out that even if you had a near-perfect algorithm on any reasonable metric, like algorithms approaching the things that large tech companies have in practice for other kinds of outcomes, you’ll just never get to the point where the threshold of predictability is crossed and there’s a substantial amount of dollars on the other end of that. So it really changed my mind about the potential of better end-of-life care to save a lot of money. It didn’t change my mind about whether that was worth doing, because it’s absolutely worth doing for reasons completely unrelated to money.
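To see why the dollars are limited, it helps to run the calculation the way he describes it: predict each person’s risk of dying, pick a threshold, and add up the spending that sits above it. Here is a minimal back-of-the-envelope sketch; every number and distribution below is invented for illustration, and none of it comes from the actual study.

```python
# Toy version of the savings calculation: even an optimistic mortality model
# concentrates only a small slice of spending above any usable threshold.
# All numbers are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
true_p_death = rng.beta(0.5, 10, n)                        # annual mortality risk, skewed low
spending = rng.lognormal(8, 1, n) * (1 + 4 * true_p_death)  # sicker people cost more

# Suppose the model recovers true risk up to a little noise (optimistic).
predicted = np.clip(true_p_death + rng.normal(0, 0.05, n), 0, 1)

for threshold in [0.3, 0.5, 0.7]:
    flagged = predicted >= threshold
    share = spending[flagged].sum() / spending.sum()
    print(f"p(death) >= {threshold}: {flagged.mean():.3%} of patients, "
          f"{share:.2%} of spending")
```

Because predictably imminent death is rare, the fraction of total spending downstream of any reasonable threshold stays small, which is the intuition behind the result he describes next.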

Lisa Rosenbaum: Right, there’s a moral case and a financial case, and those get conflated, I think. I totally agree.

Ziad Obermeyer: Yeah, perfectly put.

Lisa Rosenbaum: I just want to back up, because every time I read one of your studies, I understand what you find, but I don’t understand this idea of how you train an algorithm to be better than we are. What does that actually mean? So for instance, if a patient comes in, if I see a patient in the emergency room, for instance, who’s had seven admissions in the last year for heart failure, and is extremely frail, and also lives alone and is 85, etc., obviously, I have a sense that that patient’s probably not going to do well. But what you’re finding in your data is that there’s an algorithm that’s going to know something that I can’t see, or that I can’t glean from the chart. And what I just never can quite wrap my head around is how can you overcome what’s wrong with my mind if something like my mind has to teach the algorithm?

Ziad Obermeyer: It’s a great question, and I think that the way a lot of machine learning works, in medicine and other places, is to say, “OK, I’m going to take this complex piece of data — so in medicine, let’s say it’s every part of the electronic health record before that patient walks into the emergency department — and then I’m going to predict something.” What that something is turns out to be really, really important, because you could predict, in this case, one of two things. One is, “Does Lisa think this person is going to die in the next month or 6 months or something like that?” You can get algorithms to predict human judgment.

So, in that case, algorithms learn from humans, which can be very useful in some settings. Or you can train algorithms to learn from patients and from nature and from what actually happens. And I think that that’s the way that algorithms can add an enormous amount of value over human judgment is when they’re learning from nature and not from humans. So an algorithm can look through that mass of electronic health record data leading up to the emergency department visit, and look at subtle trends in the potassium or the sodium or the relationship between those two things, or the relationship between those things and the electrocardiogram waveform.

And it can tie all of those together to make a prediction on whether that patient is going to die in the next however many days. And by circumventing the human judgment, like your judgment about whether or not that patient’s going to die, it can, in some cases, add a lot of value to the human decisions that get made that depend on whether or not that patient is going to die. For example, should I have that conversation with them? Should they get their finances in order? Should they make sure they have a will, etc., etc.?
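For readers who think in code, the distinction Obermeyer draws is, mechanically, just a choice of training label: the feature matrix is identical, and only the target changes. A minimal scikit-learn sketch on synthetic stand-in data (nothing here is from his actual pipeline):

```python
# Two ways to label the same EHR features: a recorded human judgment
# ("learn from humans") versus an observed outcome ("learn from nature").
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 20))  # stand-in for EHR features (labs, vitals, ECG summaries)

# Option 1: learn from humans. The label is a clinician's recorded judgment.
physician_says_high_risk = (X[:, 0] > 1.0).astype(int)

# Option 2: learn from nature. The label is the outcome that actually happened.
died_within_30_days = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 2.0).astype(int)

for name, y in [("judgment label", physician_says_high_risk),
                ("outcome label", died_within_30_days)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = GradientBoostingClassifier().fit(X_tr, y_tr)
    print(name, "held-out accuracy:", round(model.score(X_te, y_te), 3))
```

The first model can only ever reproduce the clinician, errors included; the second can, in principle, outperform the clinician, which is the point he makes above.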

Lisa Rosenbaum: OK, I have two questions related to that, then. First of all, we all know that a lot of the EHR is just crap. So you have all those objective data — the labs, for instance, the tests, although the reads of the tests, I assume, can be variably reliable — but then you have just so much copied and pasted nonsense. So I assume . . . I guess, how do you avoid the garbage in, garbage out problem? That’s one question. Then the other is once you have your algorithm that’s better than you are, can it just tell you what you should be paying attention to instead?

Ziad Obermeyer: Let’s start with the garbage in, garbage out problem, which is mostly a statement about what the algorithm is predicting. So if we’re predicting whether or not someone is going to die in the next month, now imagine you’ve given the algorithm the goal. So the algorithm’s goal is to tell you, “OK, well, what’s the probability this person’s going to die in the next month?” And it’s able to learn from thousands or millions of patients that it’s seen, and some of them die and others don’t. As the algorithm sorts through all of this EHR data to figure out, “OK, well, what’s correlated with a given person dying or not dying?,” if there’s a bunch of copy-pasted garbage that is not useful, well then the algorithm is going to disregard it, because it’s not helpful for predicting the thing.

Lisa Rosenbaum: I see.

Ziad Obermeyer: So the huge strength of these algorithms is that we don’t need to tell it, “Oh, don’t pay attention to this. Pay attention to this.” And this is just a comment about the enormous computational advances that we’ve made. The algorithm’s strength is its ability to distinguish the garbage from the gold, and use the gold, and disregard the garbage in making its predictions. And the whole, I think, special sauce of machine learning is exactly that task of figuring out exactly what’s useful for predicting the thing you want it to predict, and what’s not. Now, if the thing you want it to predict is not the right thing to predict, then it runs into a lot of problems.

And in fact, a lot of my work on algorithmic bias and how problems get into algorithms is about cases where the humans that build the algorithms tell the algorithm to predict the wrong thing — wrong in the sense that it’s not the thing we really care about, like “Is someone going to get sick?,” but a proxy measure, like “Is someone going to decide to spend money on this person because they think they’re sick?” And those two things are very different, and that’s where a lot of problems creep into algorithms.
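That proxy-label failure mode is easy to reproduce in a toy simulation, loosely in the spirit of the mechanism in the 2019 Science paper mentioned earlier (this is not its data or code). Below, two groups are equally sick on average, but one generates less cost for the same sickness; a model trained to predict cost then under-selects that group, even though it never sees group membership:

```python
# Toy illustration of proxy-label bias: training on cost instead of health
# need under-selects the group that gets less care per unit of sickness.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200_000
group = rng.integers(0, 2, n)
sickness = rng.gamma(2.0, 1.0, n)                       # true health need
access = np.where(group == 1, 0.7, 1.0)                 # less care per unit sickness
prior_cost = sickness * access + rng.normal(0, 0.2, n)
future_cost = sickness * access + rng.normal(0, 0.2, n)

# Group-blind features: a noisy read of sickness, plus utilization history.
X = np.column_stack([sickness + rng.normal(0, 0.5, n), prior_cost])

cost_model = LinearRegression().fit(X, future_cost)     # proxy target: dollars
need_model = LinearRegression().fit(X, sickness)        # the target we care about

for name, m in [("cost-trained", cost_model), ("need-trained", need_model)]:
    score = m.predict(X)
    selected = score >= np.quantile(score, 0.97)        # "auto-enroll the top 3%"
    print(name, "share of group 1 among selected:", round(group[selected].mean(), 3))
print("share of group 1 among the truly sickest 3%:",
      round(group[sickness >= np.quantile(sickness, 0.97)].mean(), 3))
```

Note that the need-trained model still inherits some bias through the cost-derived feature; relabeling toward health rather than cost, as the paper’s fix did, reduces the gap substantially but does not entirely eliminate it.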

Lisa Rosenbaum: Maybe we should talk a little bit about the data you have on cardiovascular testing, because I will be able also to understand, but my sense of those data — I think I heard you present them even a few years ago at NBER. Well, you can talk about it, but — I think that my sense of the bottom line is like we’re undertesting in high-risk people, and we’re overtesting in low-risk people, and your algorithm can better predict than we can. So obviously as a cardiologist, I want the algorithm to tell me who I ought to be testing. So that gets back to that second part of the last question. What can we as cardiologists learn, please help us, about the people who most need to be tested?

Ziad Obermeyer: I think this is a tool that is very salient and relevant to people who do any work in the emergency department, because one of the things that, in residency and practice, you’re always thinking about is, “Is this person in front of me having a heart attack?” Heart attack, as the medical listeners of course know, doesn’t always look like it does on television. It’s not always a middle-aged, White man clutching his chest. Sometimes it’s women. Sometimes they’re not White. Sometimes they have different symptoms. And so it’s really a dilemma.

And it’s a dilemma because the definitive tests that we use for heart attack, catheterization in particular, are invasive, risky, expensive, and uncomfortable, so you can’t just do them on everyone who has a little bit of nausea or mild chest pain or whatever. On the other hand, you don’t want to miss someone who’s having a heart attack in the ED, so it’s a really tough decision, because on the one hand, you can burn a lot of money, which is what cardiologists often accuse us of doing in the emergency department, but you’re stuck because you also don’t want to miss these cases where you can add a huge amount of value.

So I think everyone who’s practiced in the emergency department has a story of someone that they thought was — I remember this in particular, because this was in my first year or two as an attending at the Brigham, and I’d worked an overnight, and I’d seen this woman who was just tired, and she was having a hard time walking around. I saw her overnight. It was busy. I was like, “Well, I think she’s just like — she’s deconditioned. She just hasn’t been leaving the house a lot recently,” and so I signed her out to my department chair, Ron Walls — I don’t know if you ever met him — on Thursday morning.

Lisa Rosenbaum: Oh yeah.

Ziad Obermeyer: I was like, “I think she’s just deconditioned.” Anyway, it was very lucky that Ron is extremely good at this job, because something about the story didn’t sound right to him. And he went into the room, and he was like, “No, I think she needs a stress test.” She had a grossly positive stress test, ended up with three stents the same day.

Lisa Rosenbaum: Yikes.

Ziad Obermeyer: It’s just — and at the same time, I’ve ordered a lot of stress tests, and sent people for catheterization that came back negative. So everyone has these two experiences of ordering a lot of tests that are negative, and also missing, or coming close to missing, things that would’ve been catastrophic and that you should have tested. So that’s the dilemma. So what we did was we built an algorithm using data from the hospital where I was practicing, so I’m in the data set. And we had it look at every single patient over a long period who had been catheterized. So these are patients that doctors were worried enough about to send for catheterization in the few days after their emergency department visit, mostly on the same day or the day after.

And it just learned, well, what are the things that make a positive cath more likely, and what are the things that don’t? So the algorithm learned, I’d say, from the result of the test, and it learned to predict who was going to end up with a stent and who wasn’t, effectively. And then we took that algorithm, and we deployed it on a new set of patients that it had never seen. And we just compared what the algorithm was seeing to what the doctor was doing, and then what happened to the patient. So the first finding, as you alluded to, is that we do a lot of tests that are predictably negative using data that we already had at the triage desk.

So when we look at all of the patients that doctors decided to test with these invasive, high-cost tests, for about two thirds of them, the algorithm says, “Don’t do this test. You’re never going to find anything.” And those tests largely come out to be not just low-value, but extraordinarily low-value, and we knew that, or the algorithm knew that, from the beginning, from the triage desk. So, first finding, not that surprising: doctors do a lot of tests on low-risk patients that they shouldn’t do. Those tests come out to be negative. Not a big surprise to anyone who’s read the literature or who’s practiced in the ED or in the cath lab.
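A schematic of that design in code may help: fit a model only on the patients who were actually catheterized (the only ones with an observed result), score a held-out set of visits on triage-time data, and compare the model’s predicted yield with the doctors’ decisions. Everything below is a synthetic stand-in, not the paper’s data or code, and it glosses over the selective-labels problem (outcomes are observed only for tested patients) that the real study has to handle much more carefully.

```python
# Schematic of the cath-testing study design, on synthetic stand-in data.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 50_000
visits = pd.DataFrame({
    "risk_signal": rng.normal(size=n),   # stand-in for triage-time features
    "noise": rng.normal(size=n),
})
true_risk = 1 / (1 + np.exp(-3 * (visits["risk_signal"] - 1.5)))
visits["cath_positive"] = (rng.random(n) < true_risk).astype(int)
# Doctors test imperfectly: partly risk-driven, partly arbitrary.
visits["doctor_tested"] = (rng.random(n) < 0.5 * true_risk + 0.1).astype(int)

features = ["risk_signal", "noise"]
train, holdout = visits.iloc[:40_000], visits.iloc[40_000:].copy()

tested = train[train["doctor_tested"] == 1]   # labels observed only for tested patients
model = GradientBoostingClassifier().fit(tested[features], tested["cath_positive"])

holdout["predicted_yield"] = model.predict_proba(holdout[features])[:, 1]
overtested = holdout[(holdout["doctor_tested"] == 1) & (holdout["predicted_yield"] < 0.05)]
undertested = holdout[(holdout["doctor_tested"] == 0) & (holdout["predicted_yield"] > 0.50)]
print(len(overtested), "tests the model calls predictably negative")
print(len(undertested), "untested patients the model calls high-risk")
```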

I think the second finding was certainly much more surprising to me, which is that now we can get the algorithm to also look at the high-risk people, and about half of the very, very high-risk people, doctors don’t test. That’s interesting, because it suggests that at the same time that there is overtesting, there also could be undertesting of very high-risk people. I’ll just note parenthetically that you can come up with a great story for why incentives push doctors to overtest low-risk patients, whether it’s malpractice risk or financial benefit to the hospital or to the doctor. It is very hard to come up with an incentive story for why doctors are failing to test high-risk patients.

So when we look at those high-risk patients that doctors don’t test, and also don’t diagnose with anything related to heart attack, and in a lot of cases don’t even do an electrocardiogram or a troponin on — the most basic screening tests that anyone you suspect of heart attack would get — those people go on to have adverse events in the month after their emergency department visit at extraordinarily high rates, suggesting that they’re genuinely high-risk and the algorithm is seeing something that the doctors are missing. Then the last piece of evidence is that there’s a little experiment that happens when anyone walks into the ED, and that experiment is: Which providers do you see?

Some providers, it turns out, like to test more than others, and I think we all know this from just being in the hospital. Some people will test more. Other people test less. It turns out when those high-risk patients walk into the ED on a shift staffed by people who test more, they do a lot better than if they walk into the same ED on the same day, like a Monday afternoon, and they see a team of providers who test less. So, there’s a fairly large mortality difference for those high-risk patients, and only those high-risk patients, depending on whom they see in the ED and how much they get tested.

That was the last piece of evidence that really convinced me that there was a real story here around undertesting of high-risk patients. So we’re taking the results of this study, and we’re working with a large health care system called Providence on the West Coast. They have like —

Lisa Rosenbaum: Oh . . .

Ziad Obermeyer: Oh, you know them?

Lisa Rosenbaum: I mean, I’m from Oregon, so yes.

Ziad Obermeyer: Yeah, you know them. They are, just parenthetically, it is a fantastic hospital system. It’s really, when I think about hospitals that are on the frontier of adopting data science and technology in a safe and responsible way, they are an amazing system, so it’s really an honor to work with them. And what we’re doing is we’re rebuilding that algorithm inside of their system, and we’re going to deploy it as a randomized trial in some of their emergency departments and not others. And we’re just going to see if the results that we got on paper hold up in practice in this rigorous evaluation.

Lisa Rosenbaum: Wow, that is real march-of-science work — I mean, you created the algorithm. You then validated it, right, with . . . Am I . . . Is that correct? And now you’re doing an RCT with it.

Ziad Obermeyer: Yeah, and I think it’s something that I’ve come to realize is that it — we get very good at publishing papers, and publishing papers is important because there’s a process and peer review as . . . it has its flaws for sure, but peer review is just . . . it’s an important part of the process, but it’s not the end of the process. And I think I used to have a model where it’s like, “Well, my job is done when I publish the paper, and now it’s up to someone else to turn that paper into a real thing.” And as I’ve, I think, grown up as a researcher, I think I realized that there’s no one better to do the applied follow-up stuff from the paper than the people who wrote the paper. So that’s how I’m thinking about it now, and it’s a very satisfying way to think about research, just doing it from end to end.

Lisa Rosenbaum: Oh, that must be so fulfilling. My mind is going in two directions. One is I want to try to better understand, as a cardiologist who sees patients, how you get past . . . . I mean, physicians hate decision aids, for instance. So how do you get past the physician ego, essentially, of like, “Here’s this algorithm that’s better than you, possibly?” I want to talk about that, but then I want to come back, so we don’t forget, to this whole idea of the incentives around research, because you’ve now touched on a couple times this idea that somebody believed in you or a group of people believed in you, gave you time to not necessarily publish a ton of stuff, but to learn to think.

Then now, you’re able to winnow your focus, to see something through that really is having tremendous impact. I mean, we haven’t even talked about two of your biggest studies that have really addressed structural racism in our system in two different ways, so hopefully we can get to that. But how do you earn the goodwill of the doctors by introducing an algorithm? And then I would love to hear your thoughts about incentive systems around research production.

Ziad Obermeyer: All great questions. On the doctor interaction front, I think there’s a conventional wisdom that doctors don’t want help. I think that that is largely dependent on the kind of “help,” in air quotes, that you’re talking about. There are a lot of really bad decision aids out there. There are ones where you’ve already made your decision to do some kind of radiological study, and as you’re putting in the order, a dialogue box pops up and says, “Did you know that CT scans involve radiation?” It’s like, “I do know, and I decide. . . .” So you’re going to click whatever you need to click to just keep doing the thing that you decided to do.

And I think that there’s actually evidence that when these kinds of decision aids provide valuable information, doctors really like it. There was a study, I think at Children’s, that took pharmacogenomic data and presented it to doctors, saying, “Oh, you were going to write this dose, but this patient has this mutation that should make you write this dose of the medication instead.” And doctors uniformly adopt that recommendation, because it’s actually useful. It tells them something that they don’t already know.

Lisa Rosenbaum: Got it.

Ziad Obermeyer: I think that’s the perspective that I went into the design process with: “How can we help doctors do the thing that they already want to do?” And I think doctors don’t actually want to test a bunch of low-risk patients. I think that the financial incentives are completely overblown and very far from anyone’s mind when they’re actually practicing in the ED. What you want to do is just not miss heart attacks. So you can help the doctors process this enormous amount of information overload that’s in the chart and say, “Well, I’m going to surface the things that the algorithm is using to make its prediction.”

For example, this person had two negative catheterizations in the past 3 years, so we can surface that information, and we can help the doctor document, so we can create dot phrases in the electronic health record that pull in that information. So the doctor can just have a dot phrase that says, “Yes, I understand this patient is blank years old, and has a chief complaint of blank, but they also have these five low-risk factors so on balance, I don’t think this is an acute coronary syndrome, and I will not keep them for testing.” Adding value in that way is also — I think that’s the way to think about it.

Likewise for high-risk patients, doctors don’t want to miss high-risk patients. They don’t want to have a patient that they forgot to do an ECG on or a troponin on. So helping doctors realize that it’s busy, but think about doing this test on this patient. If you don’t want to, that’s fine too. We have to respect doctors’ autonomy, because the doctor — as we show in the paper, the doctor often knows a lot more than the algorithm, and so you’d never want a situation where you take away the physician’s autonomy, because the physician knows a ton of things that the algorithm doesn’t about how the patient looks, how they answer questions, what the results of testing in the ED are.

So I think coming into this with respect for what the physician knows and with a mentality of being helpful rather than being annoying has already gotten us a long way as we talk to doctors in the system, and try to get this thing implemented.

Lisa Rosenbaum: That’s another thing that I have just loved so much about your work, because the assumptions about physician intentions are so negative. I mean, the narrative around financial bias being the driving force of all of our behavior has been pervasive for so long. And I understand, obviously, partly because financial incentives do shape behavior, and also because we can measure them. So it’s just the lowest-hanging fruit in terms of studying anything, but your work is some of the only work I’ve come across, especially that study you did about end-of-life spending, that said, “Oh, hey, actually, maybe we are spending a lot on these patients because we think they might live, and they do indeed live.”

That was so refreshing, and I hope that in addition to everything else you’re doing, you continue to use your machine-learning techniques to get inside our minds and reveal what’s actually going on, because I just think that’s so cool.

Ziad Obermeyer: Thanks for — I totally agree with you. In part, I think both the problem and the solution are coming from the field of economics. One of the problems is that for a long time, because incentives, as you said, are the thing we can measure and the thing we understand, we look at everything through the lens of incentives. So there’s this view of doctors as these evil, profit-seeking geniuses who know exactly who’s at what risk, and they’re testing the high-risk patients, but then they’re also dipping down into the medium-risk patients, because that’s the financially optimal thing to do.

And it’s so crazy when you actually are the one making these decisions. And not to say that financial incentives don’t matter. They matter, but as the explanatory power for the decisions that doctors actually make, these things are very, very small. They’re visible, and they’re there, but the dominant thing is that medicine is just really hard, and you make a lot of mistakes. And I think, to the credit of economists, there’s a bunch of recent work that is showing exactly that. Janet Currie at Princeton has these beautiful studies on simultaneous over- and underuse.

I think there is — Jason Abaluck and Leila Agha also have a paper on over- and undertesting for pulmonary embolus. So I think that as economics has started to acknowledge a huge role for human error and biases and mistakes, I think that is now starting to come into medicine. And I think that really resonates with anyone who’s ever practiced medicine, because we are very far from perfect, and I think we need all the help we can get.

Lisa Rosenbaum: I agree with you. Maybe let’s talk about the incentives around research, because I’m sure you . . . since we’re talking about incentives anyway, and I’m sure you’ve thought a lot about it. What can we do as a profession to stop incentivizing quantity over quality? I mean, you really have had such an ideal trajectory, in that I sense from you, and I don’t want to put words in your mouth, that, I mean, you’re not only doing impactful work, but you’re doing exactly what you want to do every day. You’re passionate about it, and not every . . . yeah, go ahead.

Ziad Obermeyer: Well, thanks for saying so. That’s exactly right. Maybe in the spirit of our conversation about doctor incentives: of course there’s an incentive structure that currently exists, but I think that incentive structure is in part the result of a shortcoming in education. So one very weird thing about medicine is that there’s no Ph.D. in medicine. People get M.D./Ph.D.s, but then they have a Ph.D. in another field, and they largely just do work that is in that field, and they also happen to see some patients on the side.

So it’s weird that we don’t get any actual training in research; we’re just expected to pick it up as we go along. And I think that’s not how good research generally happens. Having now spent a lot of time with a lot of people who not only had the benefit of a Ph.D., but then also worked with a ton of really smart people afterwards and learned from them, I can see that it’s hard to produce good research, and it takes a lot of training and investment. If you contrast that with medicine, you’re just expected to pick it up as you go. And I think that’s why we end up with a system that incentivizes quantity over quality: we have a hard time evaluating the quality of research, because we haven’t been trained in how to do it. So it’s a lot easier to have a checklist or a threshold, or even better, count up NIH grant dollars alongside the number of publications, and this plus that equals tenure.

I think this incentive system is itself the result of a shortfall in medical education. I don’t think every doctor needs to know how to do research, but I think there should be a better way for people who are interested in clinical research, like the research that I do, that I’m very passionate about and that we’ve been talking about — how do you learn how to do that? There’s no easy way to do that, and so I think that’s why it’s in a bit of short supply.

Lisa Rosenbaum: Are you going to do something about that? I mean, no. . . . I’m kind of being facetious, but I’m serious. I mean, how are we going to. . . . There’s just so much inertia around not only — I mean, I don’t want to just say bad stuff about med ed, but if anything, it feels like it’s going the wrong direction in terms of, I don’t know, just trying to take on more things at the cost of, I mean, being really good at a smaller set of things that might be more in our purview.

Ziad Obermeyer: It’s a great point. I mean, medicine is complicated. There’s a lot of things you have to know, and it’s just hard for everyone to know everything. It’s the dilemma of specialization, and it’s great. It means knowledge is growing, but it does mean that you need to know a lot of stuff, and there’s a lot of people competing to get on the medical curriculum. I think there are some signs that at least like it. . . . I’ll tell you, so I teach an undergrad class at Berkeley, and it’s basically AI for health. And even though for branding purposes, it’s a course about AI, it’s really a course about very mundane things about health data and how to do useful stuff with it.

So I think at least in this area, we’re benefiting from a lot of people being interested in data and data science and computer science. And I think we can get those people early, so that by the time some of them go into medical school, they already know enough that even if medical schools hypothetically didn’t change at all, and didn’t decide to add value in this particular area, by virtue of getting different people in, that’ll change. I think there’s also a lot of movement at a lot of different universities around data science. So there’s a new program that I’m part of that’s joint between UCSF and Berkeley, called Computational Precision Health. There’s going to be a Ph.D. program and shared data resources; faculty from the Berkeley side bring computer science, statistics, and public health, and faculty from the UCSF side bring all of that clinical expertise, around this new field that’s at the intersection of data and clinical medicine.

So I think people are starting to see the need for these kinds of things, and I’m optimistic, or cautiously optimistic, that there will be a supply of these kinds of training opportunities in the future.

Lisa Rosenbaum: That’s awesome.

I’ve been talking with Ziad Obermeyer at UC Berkeley, who’s clearly a trailblazer in the actually useful and enlightening application of machine learning to population health. His research elucidates critical aspects of clinical care, contributing to our understanding of the value of interventions for specific patients at specific times and enhancing our admittedly limited prognostic abilities. It’s heartening to hear that he’s also working to train a new generation of researchers to seek answers to such crucial questions in medicine and health care.

Thank you for coming on today, and telling us about your amazing work and your outlook and your vision.

Ziad Obermeyer: Thanks, Lisa. It’s wonderful to talk to you, as always.

Dr. Obermeyer reports receiving research support from Google, Griffin Catalyst, and Schmidt Futures; receiving speaking or consulting fees from Anthem, Atrium Health, Blue Cross Blue Shield Tennessee, the Health Management Academy, Independence Blue Cross, and Premier Inc.; and holding equity interests in Dandelion Health and LookDeep Health.