Which candidate should we hire? Who should be promoted? How should we choose which people get which shifts? In the hope of making better and fairer decisions about personnel matters such as these, companies have increasingly adopted AI tools, only to discover that those tools can be biased as well. How can we decide whether to keep human managers or go with AI? This article offers four considerations.
The initial promise of artificial intelligence as a broad-based tool for solving business problems has given way to something much more limited but still quite useful: algorithms from data science that make predictions better than we have been able to do so far.
In contrast to standard statistical models, which focus on one or two factors already known to be associated with an outcome like job performance, machine-learning algorithms are agnostic about which variables have worked before or why they work. The more the merrier: the algorithm throws them all together and produces one model to predict an outcome such as who will be a good hire, giving each applicant a single, easy-to-interpret score for how likely they are to perform well in the job.
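To make that concrete, here is a minimal sketch in Python of the kind of scoring model described above. The data is synthetic and the column names (years of experience, interview score, and so on) are hypothetical stand-ins for whatever attributes an employer happens to have on file; this is an illustration of the general approach, not any vendor's actual product.

```python
# Minimal sketch: many applicant attributes in, one predicted score out.
# Synthetic data with hypothetical column names, for illustration only.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
applicants = pd.DataFrame({
    "years_experience": rng.integers(0, 20, n),
    "interview_score": rng.normal(3.5, 1.0, n),
    "referral": rng.integers(0, 2, n),
    "typing_speed_wpm": rng.normal(60, 15, n),
})
# Outcome label from past records: 1 = rated a good performer, 0 = not.
performed_well = (applicants["interview_score"] + rng.normal(0, 1, n) > 3.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    applicants, performed_well, random_state=0)

# The algorithm is agnostic about which attributes matter; it simply uses them all.
model = GradientBoostingClassifier().fit(X_train, y_train)

# Each applicant ends up with a single score: the predicted probability of performing well.
scores = model.predict_proba(X_test)[:, 1]
print(scores[:5])
```

The single probability at the end is the easy-to-interpret score described above; in practice the model would be fed whichever columns the employer already has, which is exactly where biased historical data enters the picture.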
No doubt because the promise of these algorithms was so great, their limitations have also gotten a lot of attention, above all the fact that if the data used to build the model is biased, the algorithm generated from that data will perpetuate the bias. The best-known examples come from organizations that discriminated against women in the past: their job performance data reflects that discrimination, so algorithms built on it will be biased as well.
So how should employers proceed as they contemplate adopting AI to make personnel decisions? Here are four considerations:
1. The algorithm may be less biased than the existing practices that generate the data in the first place. Let’s not romanticize the status quo: human judgment is often poor, and most of our people management practices are disorganized. When we delegate hiring to individual supervisors, for example, each of them is likely to carry biases for and against candidates based on attributes that have nothing to do with good performance: Supervisor A may favor candidates who graduated from a particular college because she went there, while Supervisor B may do the reverse because he had a bad experience with some of its graduates. At least algorithms treat everyone with the same attributes equally, albeit not necessarily fairly.
2. We may not have good measures of all the outcomes we would like to predict, and we may not know how to weight the various factors in making final decisions. What makes for a “good employee,” for example? They have to accomplish their tasks well; they should also get along with colleagues, fit in with the “culture,” stay rather than quit, and so forth. Focusing on the one aspect we happen to measure will produce a hiring algorithm that selects on that aspect alone, even when it does not relate closely to the others: think of a salesperson who is great with customers but miserable with co-workers.
Here again, it isn’t clear that what we are doing now is any better: An individual supervisor making a promotion decision may in theory be able to consider all those criteria, but each assessment is loaded with bias, and the way those criteria are weighted is arbitrary. We know from rigorous research that the more hiring managers rely on their own judgment in these matters, the worse their decisions are.
3. The data that AI uses may raise moral issues. Algorithms that predict turnover, for example, now often rely on data from social media sites, such as Facebook postings. We may decide that it is an invasion of privacy to gather such data about our employees, but not using it comes at the price of models that will predict less well.
It may also be the case that an algorithm predicts well for the average employee but poorly for some subset of employees. It would not be surprising, for example, to find that the hiring models that pick good salespeople do not work well at picking engineers. Simply building separate models for each group would seem to be the solution. But what if the different groups are men and women, or white and African American employees, as is sometimes the case? Legal constraints then prevent us from applying different practices and different hiring models to different demographic groups.
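As a rough illustration of that subgroup problem, here is a small, self-contained Python sketch with synthetic data and hypothetical group labels (“sales” and “engineering”); it shows how a single pooled model can look acceptable on average while performing poorly for a smaller group.

```python
# A small, self-contained sketch of the subgroup problem described above.
# The data is synthetic and the group labels are hypothetical; the point is
# only that one pooled model can look fine overall yet misfire for a minority group.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 2000
group = rng.choice(["sales", "engineering"], n, p=[0.85, 0.15])
x = rng.normal(0, 1, (n, 2))   # shared applicant attributes

# The attribute-outcome relationship runs in opposite directions for the two
# groups, so a model fit to everyone mostly tracks the larger group.
signal = np.where(group == "sales", x[:, 0], -x[:, 0])
y = (signal + rng.normal(0, 1, n) > 0).astype(int)

model = LogisticRegression().fit(x, y)
probs = model.predict_proba(x)[:, 1]

print(f"overall AUC: {roc_auc_score(y, probs):.2f}")
for g in ["sales", "engineering"]:
    mask = group == g
    print(f"AUC for {g}: {roc_auc_score(y[mask], probs[mask]):.2f}")
```

In this toy example, fitting one model per group would close the gap, which is exactly the fix that legal constraints rule out when the groups in question are protected classes.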
4. It is often hard, if not impossible, to explain and justify the criteria behind algorithmic decisions. In most workplaces now, we at least have some accepted criteria for making employment decisions: He got the opportunity because he has been here longer; she was off this weekend because she had that shift last weekend; this is the way we have treated people before. If I don’t get the promotion or the shift I want, I can complain to the person who made the decision. He or she has a chance to explain the criterion and may even help me out next time around if the decision did not seem perfectly fair.
When we use algorithms to drive those decisions, we lose the ability to explain to employees how they were made. The algorithm simply pulls together all the available information to construct extremely complicated models that fit past outcomes. It is highly unlikely that those models correspond to any principle we could observe or explain, other than to say, “The overall model says this will work best.” The supervisor can’t help explain the decision or address fairness concerns.
Especially where such models do not perform much better than what we are already doing, it is worth asking whether the irritation they will cause employees is worth the benefit. The advantage, say, of just letting the most senior employee get first choice in picking his or her schedule is that this criterion is easily understood, it corresponds with at least some accepted notions of fairness, it is simple to apply, and it may have some longer-term benefits, such as increasing the rewards for sticking around. There may be some point where algorithms will be able to factor in issues like this, but we are nowhere close to that now.
Algorithmic models are arguably no worse than what we are doing now. But their fairness problems are easier to spot because they happen at scale. The way to solve them is to get more and better measures — data that is not biased. Doing that would help even if we were not using machine-learning algorithms to make personnel decisions.