I have seen numerous accounts of an artificial intelligence or machine learning system being handed a human-resources task in the hope that it won't perpetuate human biases, only for biases in its training material to make it replicate the discrimination anyway. As The Economist recently noted, this can happen even when information such as the sex and race of applicants isn't directly provided, since it can be inferred from other features in the data:
Such deficiencies are, at least in theory, straightforward to fix (IBM offered a more representative dataset for anyone to use). Other sources of bias can be trickier to remove. In 2017 Amazon abandoned a recruitment project designed to hunt through CVs to identify suitable candidates when the system was found to be favouring male applicants. The post mortem revealed a circular, self-reinforcing problem. The system had been trained on the CVs of previous successful applicants to the firm. But since the tech workforce is already mostly male, a system trained on historical data will latch onto maleness as a strong predictor of suitability.
Humans can try to forbid such inferences, says Fabrice Ciais, who runs PwC’s machine-learning team in Britain (and Amazon tried to do exactly that). In many cases they are required to: in most rich countries employers cannot hire on the basis of factors such as sex, age or race. But algorithms can outsmart their human masters by using proxy variables to reconstruct the forbidden information, says Mr Ciais. Everything from hobbies to previous jobs to area codes in telephone numbers could contain hints that an applicant is likely to be female, or young, or from an ethnic minority.
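To make the proxy problem concrete, here is a minimal sketch in Python. All of the data, feature names, and probabilities below are invented for illustration; the point is the mechanism, not the numbers. Even with the sex column withheld, a simple model can reconstruct it from correlated features, and a screening model trained on biased historical decisions absorbs the same signal through those proxies:

```python
# Toy illustration: a protected attribute removed from the data can be
# reconstructed from proxy variables. All names and data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5_000

# Hidden protected attribute (1 = female); never shown to either model.
female = rng.integers(0, 2, size=n)

# Proxy features that merely *correlate* with the protected attribute,
# e.g. a hobby flag and a coarse area-code indicator, plus noise.
hobby_flag = (rng.random(n) < np.where(female == 1, 0.7, 0.2)).astype(int)
area_code = (rng.random(n) < np.where(female == 1, 0.6, 0.3)).astype(int)
years_exp = rng.normal(5, 2, size=n)  # unrelated filler feature

X = np.column_stack([hobby_flag, area_code, years_exp])

# 1) The "forbidden" attribute is recoverable from the proxies alone.
X_tr, X_te, y_tr, y_te = train_test_split(X, female, random_state=0)
proxy_model = LogisticRegression().fit(X_tr, y_tr)
print(f"accuracy reconstructing sex from proxies: "
      f"{proxy_model.score(X_te, y_te):.2f}")  # well above chance

# 2) A screening model trained on biased historical decisions
#    (past hires skewed against women) picks up the same signal
#    through the proxies, even though 'female' is not a feature.
hired = (((rng.random(n) < 0.5) & (female == 0)) |
         ((rng.random(n) < 0.2) & (female == 1)))
screen_model = LogisticRegression().fit(X, hired.astype(int))
scores = screen_model.predict_proba(X)[:, 1]
print(f"mean screening score, men:   {scores[female == 0].mean():.2f}")
print(f"mean screening score, women: {scores[female == 1].mean():.2f}")
```

Removing the forbidden column accomplishes nothing here, because the remaining columns still encode it.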
In part, this proxy problem is an instance of the broader black-box problem in AI. In one well-known example, a classifier meant to distinguish dogs from wolves instead learned to detect which photos contained snow, because the wolves in its training set had mostly been photographed against snowy backgrounds. Since the output of such a system is a set of tuned probabilities rather than an explicit chain of reasoning, it is generally not possible to say what evidence led it to a conclusion, and that opacity creates the risk that it will behave in unpredictable and unwanted ways.
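The dogs-versus-wolves failure is easy to reproduce in miniature. In the sketch below (again with synthetic data and invented feature names), a background "snow" feature is almost perfectly correlated with the label during training, so the model leans on it and then collapses when that correlation is broken at test time:

```python
# Toy illustration of a spurious-correlation failure, in the spirit of
# the dogs-vs-wolves example. Data and feature names are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_data(n, snow_given_wolf):
    """Label 1 = wolf. 'fur' is a weak genuine cue; 'snow' is background."""
    wolf = rng.integers(0, 2, size=n)
    fur = wolf + rng.normal(0, 1.5, size=n)  # weak real signal
    snow = (rng.random(n) <
            np.where(wolf == 1, snow_given_wolf, 1 - snow_given_wolf))
    return np.column_stack([fur, snow.astype(float)]), wolf

# Training set: wolves almost always photographed in snow.
X_tr, y_tr = make_data(2_000, snow_given_wolf=0.98)
# Test set: the correlation is broken (snow appears at random).
X_te, y_te = make_data(2_000, snow_given_wolf=0.5)

model = LogisticRegression().fit(X_tr, y_tr)
print(f"train accuracy: {model.score(X_tr, y_tr):.2f}")  # looks excellent
print(f"test accuracy:  {model.score(X_te, y_te):.2f}")  # collapses
print(f"learned weights [fur, snow]: {model.coef_[0].round(2)}")
```

In a toy logistic regression the misplaced weight is at least visible in the coefficients; in a deep network trained on raw pixels, the same failure happens with no comparable readout, which is what makes the black-box risk so hard to audit.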