Fighting Bias in Data and AI
DECEMBER 05, 2019
Eric Daimler, PhD, MS: We were probably going to get into this question of bias at some point, and this is a good place to touch on it. The issues of bias in data and in data collection manifest themselves in every part of this system. It's again why I ask people to have a sort of systems-level intelligence. If we look at just the collection of data, the sensing of data, you may think the radar on top of your autonomous car is objective, and certainly a thermometer, air temperature, air quality, those are objective. But it was within the last hundred years, well within the lifetime of if not our parents then our grandparents, that we were collecting, under a veneer of scientific objectivity, the shapes of people's heads, right?
And that was really for racist purposes, but we had a veneer of science on it, saying, "Oh no, we're completely objective about the shape of people's heads, the size of people's heads, and we'll use this to determine who is more superior." So that is an example of how data can be biased even before it's set for analysis, even in the collection of the data. Who's making the choice about which data we're keeping and which data we're getting rid of? Did we pay attention to every possible group? For a long time the difference between males and females was ignored. I was told a story recently about a drug that had 95 percent efficacy but had severe consequences for a particular ethnic group. But that didn't come to light until later.
This is bias in action, and it permeates our world. The difficulty in that conversation is this: when a drug company has invented a drug that's 95% effective for the vast majority of the population, how do you motivate it to then go invent a new drug for the small population that is very poorly served and for which there's no obvious second choice? These are the types of conversations we're going to need to have going forward.
Getting at our earlier conversation: when you have a result that says, well, I can tell the patient one thing and they're very likely to accept it, or I can tell the patient another thing that's more accurate but they may or may not accept it, what do you tell them? That's the output of data. If I tell them I have a drug that everybody expects, or I tell them it's an experimental drug, how do I represent this? That's an output of the data. How is that objective or subjective?
Simon D. Murray, MD: What if banks used algorithms to determine who got loans, and what if it was learned that women were better at repaying loans than men? Just suppose. Now, that's a bias, and they put that bias into the lending algorithm. Should that bias be taken out because it's a bias? Would that mean women would have an easier time getting loans than men? Now it's a fact, they say. Or suppose they learned that if you applied for a loan on Monday you were less likely to default than if you applied on Wednesday, and you happen to apply on Wednesday and the algorithm turns you down. Is that fair?
ED: My wife can give an even better example, actually. We returned 2 phones to Apple last year as a trade-in, and because of a mistake of Apple's, the phones were sent back. My phone was rejected because Apple had become hypersensitive about fraud coming from overseas, where people were changing the serial numbers of phones, and mine got flagged as that. My phone had been replaced on repair, so it had a different serial number, but the system just rejected it and sent it back, and I had no obvious recourse. There's no "hey, try again," there's no going to the store, there's no phone number to call; it just ends. My wife's phone got returned because it's connected to my phone, it's just on the same account, so my wife is somehow implicated in my supposed serial number fraud.
And what we wouldn't know, and this is where it begins to be even scarier, is the degree to which that trigger may have gone somewhere else. Maybe we trust Apple, we trust them with all of our data, and some very well-meaning bureaucrat says, "Gosh, I don't know how to assess these credit issues, these trustworthiness issues, but before I interview people for jobs in the federal government I'm going to use the credit scoring system that Apple has," right? And they're going to see that I was trying to pull a fraud, and so even though I didn't do anything, I could be rejected from even applying to the federal government. I could have missed the opportunity to have been in the White House because of these sorts of issues.
And this can make you mad, because you might then say, "Wow, I know of these connections, and now I was pulled aside by the TSA at the airport. Was that because the TSA also used this credit scoring system?" And then you might say that was linked to my Amazon Prime account, and then I go into the CVS drugstore and the doors open a little more slowly for me. These linkages of data can really begin to drive you crazy. This is again where I say we need a systems-intelligence view. We need to think about how all of these systems relate, because if you just add more information, or if you just resist this information, neither of those extremes is going to be really helpful. We need to be thinking about the nuances of data: where is it appropriate, where is it not appropriate, what are the guardrails, what are the standards that we can put in place? We need to have a conversation as a society. It's not for me to answer, and it's not for you to answer, but the more people who are engaged in this conversation, the richer that conversation will be, and the better decisions we'll make. Physicians, clinicians, researchers, all healthcare professionals are in a fantastic place to be engaged in this conversation.
Transcript edited for clarity.