Illustrations by Remie Geoffroi
Three years ago, a Georgia Tech study uncovered a major flaw in self-driving vehicles: they find it much harder to see darker-skinned pedestrians. The researchers were testing how accurately the vehicles’ artificial intelligence–based object detection models noticed pedestrians of different races. But no matter what variables they changed — how big the person was in the image, whether they were partially blocked, what time of day it was — the imbalance remained, raising fears that in real-life applications, racialized people could be at higher risk of being hit by a self-driving car. It’s just one of far too many examples showing how AI can be biased and, as a result, harm already-marginalized groups.
“Think of something like melanoma detection,” says Shingai Manjengwa, director of technical education at the Vector Institute for Artificial Intelligence. One of AI’s promises is that we can take a photo of a strange freckle or mole with our phone and upload it to a health care platform to determine whether we need to worry about it. “But what if there’s an issue with that model and it says, ‘You’re fine’?” Manjengwa says. “We all know that cancer is one of those things where if you catch it early, you have better chances of survival and treatment. The possibility of harm is why we need to be addressing bias.”
Computer scientists have known bias is an issue in AI for years, but they haven’t always taken that potential for harm seriously.
“Back then, researchers were just trying to build the fastest, most effective AI algorithm. They weren’t really considering the impact those algorithms would have on people’s lives,” says Elissa Strome, executive director of the pan-Canadian AI Strategy at CIFAR, a research organization. “Now there’s a lot more recognition of how critical this issue is.”
AI is used in retail, banking, policing and education, on the apps that we turn to for fun (like Snapchat and Netflix) and the programs that health care providers use to determine how to allocate care. Whether you are applying for a loan or heading through security at the airport, AI can affect how smoothly (or not so smoothly) that process goes; and because AI has a tendency to replicate and often amplify any bias in the data it’s applied to, the potential consequences are far-reaching. As Strome says, “bias in AI is something we have to work really hard and really intentionally on.”
It all comes down to people. AI is developed in phases: first the algorithm is designed, then the system is trained using massive data sets, then it’s deployed, where it continues to refine its predictions using real-world data. Golnoosh Farnadi, a core academic member and Canada CIFAR AI Chair at Montreal research institute Mila, calls this process the AI pipeline. She explains that “the problem of racism can appear anywhere in this pipeline,” often because of a lack of awareness among the people involved.
For starters, teams that are homogenous in race, gender and class — which many teams in tech still are — may unintentionally introduce bias when designing their algorithm. Take COMPAS, a legal-software system that uses AI to calculate whether defendants are likely to reoffend. Judges in several U.S. states use these predictions in their sentencing decisions.
But according to a 2016 ProPublica investigation, the designers of the software didn’t properly account for the way that age, race and gender can affect prior arrests or contact with the police, which meant that “Black defendants were often predicted to be at a higher risk of recidivism than they actually were,” the report said. White defendants, on the other hand, “were often predicted to be less risky than they were.” (The company disputes the findings.)
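The kind of disparity ProPublica described can be expressed as a gap in error rates between groups: how often people who did not reoffend were flagged as high risk (false positives), and how often people who did reoffend were rated low risk (false negatives). The sketch below is only an illustration of that comparison, not COMPAS’s method or ProPublica’s analysis. The function name, the group labels and every number in it are invented.

```python
def error_rates(records):
    """Compute error rates from (predicted_high_risk, reoffended) pairs."""
    false_pos = false_neg = did_not_reoffend = did_reoffend = 0
    for predicted_high_risk, reoffended in records:
        if reoffended:
            did_reoffend += 1
            if not predicted_high_risk:
                false_neg += 1  # rated low risk, but reoffended
        else:
            did_not_reoffend += 1
            if predicted_high_risk:
                false_pos += 1  # rated high risk, but did not reoffend
    return {
        "false_positive_rate": false_pos / did_not_reoffend if did_not_reoffend else 0.0,
        "false_negative_rate": false_neg / did_reoffend if did_reoffend else 0.0,
    }


# Invented toy data: each tuple is (predicted_high_risk, reoffended).
group_a = ([(True, False)] * 4 + [(False, False)] * 6
           + [(True, True)] * 8 + [(False, True)] * 2)
group_b = ([(True, False)] * 2 + [(False, False)] * 8
           + [(True, True)] * 5 + [(False, True)] * 5)

rates_a = error_rates(group_a)
rates_b = error_rates(group_b)
print("group A:", rates_a)
print("group B:", rates_b)
```

In this made-up data, group A is wrongly flagged as high risk twice as often as group B, while group B is wrongly rated low risk more often. A tool can look reasonable on overall accuracy and still spread its mistakes unevenly in this way.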
Algorithms can also learn bias from the data they’re trained on. IBM’s facial recognition products, for example, were 97 per cent accurate when predicting gender, as long as the subject had light skin. But when the subject had dark skin, that dropped to 78 per cent. These algorithms were so bad at analyzing dark-skinned subjects because they had not been trained on data that included enough photos of racialized people.
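A gap like this only shows up when accuracy is reported separately for each group rather than as one overall score. Here is a minimal sketch of that check. The records are toy data built to mirror the two percentages quoted above, not the real test set, and the function name and group labels are assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: share of correct predictions} for (group, predicted, actual) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy data: 100 records per group, constructed to match the quoted accuracy figures.
records = (
    [("lighter skin", "F", "F")] * 97 + [("lighter skin", "M", "F")] * 3
    + [("darker skin", "F", "F")] * 78 + [("darker skin", "M", "F")] * 22
)

rates = accuracy_by_group(records)
overall = sum(1 for _, predicted, actual in records if predicted == actual) / len(records)
gap = rates["lighter skin"] - rates["darker skin"]

print(rates)
print("overall accuracy:", overall)
print("accuracy gap:", round(gap, 2))
```

The overall figure for this toy data is 87.5 per cent, which hides the fact that one group gets far worse results than the other.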
That’s because “data is very expensive,” says Foteini Agrafioti, chief science officer at RBC and head of Borealis AI. “You can’t build a new data set because of the cost, so you want to leverage what already exists.”
Wikipedia is used often in machine learning, because it’s huge and there’s a lot of text in it. So is social media. “But these platforms are biased by definition because humans contribute that information, and humans themselves are biased,” says Agrafioti.
Biased algorithms can also get caught in a feedback loop of bad data. A prime example is PredPol, predictive policing software that uses historical data to predict where crimes will occur in order to suggest where police should patrol. Since racialized people have historically been overpoliced, this increases the likelihood that they will be targeted by PredPol’s predictions, too.
And according to an investigation by Gizmodo and The Markup, that’s exactly what happened. When journalists reviewed an archive of the program’s data dating back to 2018, they found that several white, middle- and upper-income neighbourhoods went years without having a single prediction of a crime. By contrast, poorer neighbourhoods inhabited by Black and Hispanic people saw many crime predictions, sometimes up to 11,000 a day.
“The software often recommended daily patrols in and around public and subsidized housing, targeting the poorest of the poor,” the reporters wrote. (The CEO said the journalists’ analysis was incomplete.) While it was impossible for the journalists to tell whether this focus led to specific stops, arrests or use of force, if even one person was arrested, that data would then be fed back into the software’s analysis, further justifying additional policing in these areas.
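The loop can be illustrated with a deliberately simple simulation. It is not PredPol’s algorithm, which is not described here. It assumes two hypothetical neighbourhoods with the same true crime rate, patrols that are assigned in proportion to past recorded incidents, and new incidents that are recorded only where patrols go. All names and numbers are invented.

```python
def simulate(recorded, true_rate, patrols_per_day, days):
    """Send patrols in proportion to past records; record new incidents where patrols go."""
    recorded = dict(recorded)  # copy so the caller's data is not modified
    for _ in range(days):
        total = sum(recorded.values())
        for area in recorded:
            # Each area's share of today's patrols equals its share of past records.
            patrols = patrols_per_day * recorded[area] / total
            # Every patrol has the same chance of recording an incident in either area.
            recorded[area] += patrols * true_rate
    return recorded


# Invented starting point: "north" was historically policed three times as heavily.
history = {"north": 300.0, "south": 100.0}
after = simulate(history, true_rate=0.1, patrols_per_day=20, days=365)
north_share = after["north"] / sum(after.values())

print(after)
print("north's share of recorded incidents:", round(north_share, 2))
```

Even though both areas are equally likely to see an incident per patrol, the north keeps three-quarters of the recorded incidents after a year. Under these assumptions the data never corrects the historical skew, because the software only collects new evidence where it already sent police.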
Intentionally revising the training material for AI programs can pose challenges of its own. When developers at OpenAI were building DALL-E 2, a program that generates images from text, they tried to filter out sexual imagery from its training data to prevent it from replicating the biases against women commonly found in that type of content. The AI responded by generating fewer images of women overall.
OpenAI also ran into problems revising the training for its chatbot GPT-3 so that it would be less biased against Muslims. Would the code understand the context in a piece of fiction about Islamophobia, for example? Or would that be just another text that affirms its biases?
Bias testing. Even in a world where tech was perfectly diverse and data was totally impartial, bias around race, gender, ability, class and body size can sneak into the pipeline. Developers will always need a system to identify problems.
That’s where technical methods come in, says Manjengwa, including double-checking AI-generated results. At Vector, her team has created a five-week course that trains developers to identify bias in their algorithm.
“We don’t have to just follow this, where we’re like, ‘Oh, it’s nobody’s fault, we didn’t know,’” she says.
And one day, that testing may be part of the algorithm itself. “There’s a lot of people right now who are dedicating their career to building algorithms that are self-correcting,” Agrafioti says. “Basically, they anticipate that something can go wrong, so they are automatically checking themselves and trying to highlight areas where they may fall short.”
The “human-in-the-loop” approach, in which people double-check AI-generated results, is already widely used everywhere from Meta to EY, but when it comes to preventing racism, it’s not yet a perfect solution. For one thing, humans are just as vulnerable to bias, whether that’s racial bias or automation bias (when people don’t give results proper scrutiny because they subconsciously believe that the machine is correct).
Many experts, including Manjengwa, think AI needs outside oversight. She believes people need opportunities for recourse when AI does lead to discriminatory practices. Already, AI is being used to determine everything from who gets a mortgage, to who gets into post-secondary institutions, to which COVID patients receive the oxygen they need, to whose face is detected on that fun new TikTok filter, to what you’re charged for an Uber or Lyft ride. And this is happening with no official oversight.
“Right now, if I think the Uber app has charged me more than my colleague, even though we’re leaving from the same place to go to the same place, all I get to do is go on the news, the company denies it and that’s it,” she says. “The more we use AI, the more we need a place to go that offers recourse and can institute some sort of technical review on what algorithms do.”
Farnadi points to government approval processes for pharmaceuticals as an example of how regulations could work. “When companies come up with a new medicine, they cannot just put it in the market,” she says. “They have to meet standards, and they have to be accountable. We should have that in AI as well.”
In fact, some experts believe government oversight may be the only way these necessary changes will happen. According to Parham Aarabi, a professor and director of the Applied AI Group at the University of Toronto, if there were a mandate that required AI to “work for everyone, it would have a huge impact on removing, or at least reducing, bias in AI.”
And that’s important, Aarabi says, because technology companies aren’t rushing to make those changes on their own. “When we have come up with very obvious cases that show major tech companies and tech products do have a bias, some but not all of them have tried to fix the issues. But it is something that they’ll have to do. There will be a reckoning.”
Can the problem be fixed? Yes. It’s not an easy task, of course, and as Agrafioti points out, it’s unlikely that a single group, company or academic institution will come up with the perfect solution. Instead, she says, the answer will likely come out of collaboration, which is why Borealis has developed a platform called RESPECT AI. It allows researchers to share the stumbling blocks they’ve encountered in their own work and, just as importantly, the best practices they’re developing to overcome them. The difficulty of fixing this problem only emphasizes why it must be solved.
“If we’re not doing anything today,” says Farnadi, “we’re actually going back in history and erasing everything that happened to preserve the rights of women and of different races.”