Why it's totally unsurprising that Amazon's recruitment AI was biased against women

Amazon abandoned a project to build an AI recruitment tool after engineers found it was discriminating against female candidates.
Dr Sandra Wachter, an AI researcher at Oxford University, told Business Insider that the gender bias was hardly surprising.
“You feed an AI with garbage and it will spit garbage out,” she said. In Amazon’s case, the machine may have reflected the fact that the historical data it was fed consisted predominantly of résumés from men.
Nonetheless, Wachter believes algorithms could become better decision-making tools than humans.

Amazon admitted this week that it had experimented with using machine learning to build a recruitment tool. The trouble is, the tool didn’t exactly produce fantastic results, and it was later abandoned.

According to Reuters, Amazon engineers found that besides churning out totally unsuitable candidates, the so-called AI project showed a bias against women.

To Oxford University researcher Dr Sandra Wachter, the news that an artificially intelligent system had taught itself to discriminate against women was nothing new.

“From a technical perspective it’s not very surprising, it’s what we call ‘garbage in and garbage out,’” she told Business Insider.

Garbage in, garbage out

The problem boils down to the data Amazon fed its algorithm, Wachter speculated.

"What you would do is you go back and look at historical data from the past and look at successful candidates and feed the algorithm with that data and try to find patterns or similarities," said Wachter.

"You ask the question who has been the most successful candidates in the past [...] and the common trait will be somebody that is more likely to be a man and white."

Reuters reported that the engineers building the program used résumés from a 10-year period, which were predominantly from men. Amazon did not provide Business Insider with the gender split in its engineering department but sent us a link to its diversity pages. Globally, its workforce is 60% men, and men hold 74% of its managerial roles.

"So if then somebody applies who doesn't fit that profile, it's likely that that person gets filtered out just because the algorithm learned from historical data," said Wachter. "That happens in recruitment, and that happens in basically everywhere where we use historical data and this data is biased."
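Amazon has not published how its tool worked, so what follows is only a hypothetical Python sketch of the mechanism Wachter describes, not a reconstruction of Amazon's system. The résumé terms, the toy `HISTORY` data and the scoring rule (each term weighted by the hire rate of past candidates who listed it) are all invented for illustration. Because past hires in the toy data skew towards one group, terms associated with the other group end up with low weights, even though the code never mentions gender.

```python
from collections import Counter

# Hypothetical past outcomes: (terms on a résumé, was the candidate hired?).
# Invented data in which past hires skew heavily towards one group.
HISTORY = [
    ({"python", "chess_club"}, True),
    ({"java", "chess_club"}, True),
    ({"python", "rugby"}, True),
    ({"java", "rugby"}, True),
    ({"python", "womens_chess_club"}, False),
    ({"java", "womens_rugby"}, False),
    ({"python", "chess_club"}, False),
]


def learn_term_weights(history):
    """Weight each term by the hire rate among past candidates who listed it."""
    seen, hired = Counter(), Counter()
    for terms, was_hired in history:
        for term in terms:
            seen[term] += 1
            if was_hired:
                hired[term] += 1
    return {term: hired[term] / seen[term] for term in seen}


def score(resume_terms, weights, default=0.5):
    """Average the learned weights; terms never seen before get a neutral default."""
    return sum(weights.get(t, default) for t in resume_terms) / len(resume_terms)


weights = learn_term_weights(HISTORY)

# Two candidates with the same skill, differing only in one gendered term.
a = score({"python", "chess_club"}, weights)
b = score({"python", "womens_chess_club"}, weights)
print(f"candidate A: {a:.2f}, candidate B: {b:.2f}")  # candidate A: 0.58, candidate B: 0.25
```

Candidate B is filtered towards the bottom purely because the toy history contains no successful candidate with that term: garbage in, garbage out in miniature.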

Garbage in, garbage out (sometimes abbreviated to "GIGO") simply means that bad input will result in bad output, and the same goes for bias. The problem is that algorithmic bias is incredibly difficult to filter out, because the algorithms we build pick up on the human prejudices embedded in the data they learn from.

"What is the algorithm supposed to do? It can only learn from our semantics and our data and how we interact with humans, and at the moment there is no gender parity yet, unfortunately," said Wachter.

Machine learning can produce self-fulfilling prophecies

This is far from the first time a computer program has displayed human bias. "It's just yet another example of how algorithmic decision-making and AI in general can actually reinforce existing stereotypes that we have in our society," said Wachter.

In 2016, a ProPublica investigation found that a computer program called COMPAS, designed to assess the risk of criminals re-offending, was discriminating against black people. As an example, the program deemed an 18-year-old black girl who briefly took a child's scooter to be more likely to re-offend than a 41-year-old white man who had been arrested for shoplifting power tools and already had prior convictions for armed robbery.

Wachter points out that COMPAS's software asked questions which led to individuals being judged by their social environment, such as "Was one of your parents ever sent to jail or prison?" or "How many of your friends/acquaintances are taking drugs illegally?"

"This is not about the individual anymore, that is about your social environment, and being judged based on other people," said Wachter. "If you apply that to every single person, that's a self-fulfilling prophecy."

Scanning for bias

That isn't to say there's no use in perfecting our algorithms in the meantime. The first thing we can do is come up with effective methods for spotting bias inside them.

Photo: An Amazon warehouse worker. Source: Sean Gallup/Getty Images

"There's been a lot of discussion in the field about trying to come up with standards and testing periods before we deploy those systems," Wachter said. "If you have a very easy-to-understand algorithm, detecting bias will be easier, but when it comes to machine learning, a very opaque system, testing for bias and discrimination, or even understanding what's going on in that system, will become more and more difficult."

Wachter has worked closely on devising ways to check for bias in machine learning models, and her work has been cited by Google in its "What If" tool, which lets users analyse machine learning models without writing extra code. She believes that before companies can deploy a system, they should be able to pass a standardised test demonstrating that it is not biased.

"Especially when it comes to employment, you should have some statistical evidence that your system isn't biased. And if you can't provide that, maybe you shouldn't use [the system] for making important decisions," she continued.
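Wachter does not say what such a standardised test should look like. One long-standing heuristic in US employment practice is the "four-fifths rule" from the Uniform Guidelines on Employee Selection Procedures: a selection rate for any sex or racial group that is less than 80% of the rate for the group with the highest rate is generally regarded as evidence of adverse impact. The Python sketch below computes that ratio for a hypothetical screening tool; the class and function names and the shortlist counts are invented for illustration, and it is not the test Wachter or Google proposes.

```python
from dataclasses import dataclass


@dataclass
class GroupOutcome:
    applicants: int
    selected: int

    @property
    def rate(self) -> float:
        """Share of this group's applicants the tool selected."""
        if self.applicants <= 0:
            raise ValueError("a group needs at least one applicant")
        return self.selected / self.applicants


def impact_ratios(outcomes: dict[str, GroupOutcome]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    best = max(o.rate for o in outcomes.values())
    if best == 0:
        raise ValueError("no group had any selections")
    return {group: o.rate / best for group, o in outcomes.items()}


def passes_four_fifths(outcomes: dict[str, GroupOutcome], threshold: float = 0.8) -> bool:
    """True if every group's ratio is at least the threshold (0.8 = four-fifths)."""
    return all(r >= threshold for r in impact_ratios(outcomes).values())


# Hypothetical shortlist counts from a screening tool (invented numbers).
shortlist = {
    "men": GroupOutcome(applicants=600, selected=120),   # 20% selected
    "women": GroupOutcome(applicants=400, selected=48),  # 12% selected
}
ratios = impact_ratios(shortlist)
print({g: round(r, 2) for g, r in ratios.items()})  # {'men': 1.0, 'women': 0.6}
print(passes_four_fifths(shortlist))                 # False
```

Passing a check like this would not prove a system is fair; it is one coarse screen, and a real audit would add significance testing and other fairness metrics.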

Amazon said in a statement that its hiring tool "was never used by Amazon recruiters to evaluate candidates." A source told Reuters that Amazon recruiters looked at its recommendations but never relied solely on them for actual decision-making.

"An algorithm doesn't get grumpy"

Although rooting out algorithmic bias poses a technical challenge, Wachter is confident that using AI properly could actually improve fair decision-making in our society.

"If you look at it from the other perspective, if we play this right and if we work on data provenance [...] I actually think algorithms could be a better decision-making tool than humans," she said. "An algorithm cannot lie to you, you cannot force an algorithm, you cannot entice or bribe an algorithm."

She also thinks that algorithmic decision-making could help cancel out a profoundly human quality: moodiness. "Algorithms are more consistent as well. If I sit on an employment panel for eight hours, my mood will swing from time to time. I might get angry, or grumpy, or hungry, so that could influence my judgement," she said. "An algorithm doesn't get grumpy or moody or hungry."

Wachter is not in favour of removing human oversight altogether; rather, she believes that humans and AI can play to each other's strengths. "I think ideally they would be complementary and cancel out each other's blind spots," she said.