Is your artificial intelligence fair?
Thanks to the increasing adoption of AI, this has become a question that data scientists and legal personnel now routinely confront. Despite the significant resources companies have spent on responsible AI efforts in recent years, organizations still struggle with the day-to-day task of understanding how to operationalize fairness in AI.
So what should companies do to steer clear of employing discriminatory algorithms? They can start by looking to a host of legal and statistical precedents for measuring and ensuring algorithmic fairness. In particular, existing legal standards derived from U.S. laws such as the Equal Credit Opportunity Act, the Civil Rights Act, and the Fair Housing Act, along with guidance from the Equal Employment Opportunity Commission, can help to mitigate many of the discriminatory challenges posed by AI.
At a high level, these standards are based on the distinction between intentional and unintentional discrimination, sometimes referred to as disparate treatment and disparate impact, respectively. Intentional discrimination is subject to the highest legal penalties and is something that all organizations adopting AI should obviously avoid. The best way to do so is by ensuring the AI is not exposed to inputs that can directly indicate a protected class, such as race or gender.
Avoiding unintentional discrimination, or disparate impact, however, is an altogether more complex undertaking. Disparate impact occurs when a seemingly neutral variable (like the level of home ownership) acts as a proxy for a protected variable (like race). What makes it so difficult to avoid in practice is that truly removing all proxies for protected classes is often extremely challenging. In a society shaped by profound systemic inequities, such as that of the United States, disparities can be so deeply embedded that it often requires painstaking work to determine which variables (if any) operate independently of protected attributes.
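One rough way to start looking for proxies is to check how strongly each candidate input tracks a protected attribute. The sketch below is a minimal illustration, not a prescribed legal or statistical test: the function names, the feature names, the sample data, and the 0.5 cutoff are all assumptions invented for this example, and it assumes the protected attribute is available for testing purposes as a 0/1 indicator.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def flag_possible_proxies(features, protected, threshold=0.5):
    """Return features whose absolute correlation with the protected
    indicator meets the threshold. The threshold is an arbitrary,
    illustrative cutoff, not a legal standard."""
    flagged = {}
    for name, values in features.items():
        r = pearson(values, protected)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged


# Hypothetical data, invented purely for illustration.
protected = [1, 1, 1, 0, 0, 0]  # 1 = member of the protected class
features = {
    "home_ownership": [0, 0, 1, 1, 1, 1],
    "years_at_job": [3, 5, 2, 3, 5, 2],
}

flagged = flag_possible_proxies(features, protected)
for name, r in flagged.items():
    print(f"{name}: correlation with protected attribute = {r:.2f}")
```

With this made-up data, only home ownership is flagged. A screen like this catches simple one-to-one relationships at best. Proxies can also emerge from combinations of variables that each look harmless on their own, which is part of why fully separating inputs from protected attributes is so hard.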
Indeed, because values like fairness are subjective in many ways (there are, for example, nearly two dozen conceptions of fairness, some of which are mutually exclusive), it’s sometimes not even clear what the fairest decision really is. In one study by Google AI researchers, the seemingly beneficial approach of giving disadvantaged groups easier access to loans had the unintended effect of reducing these groups’ credit scores overall. Easier access to loans actually increased the number of defaults within those groups, thereby lowering their collective scores over time.
Determining what constitutes disparate impact at a statistical level is also far from straightforward. Historically, statisticians and regulators have used a variety of methods to detect its occurrence under existing legal standards. Statisticians have, for example, used a group fairness metric called the “80 percent rule” (also known as the “adverse impact ratio”) as one central indicator of disparate impact. Originating in the employment context in the 1970s, the ratio is calculated by dividing the selection rate of the disadvantaged group (the proportion of its members who are selected) by the selection rate of the advantaged group. A ratio below 80% is generally considered to be evidence of discrimination. Other metrics, such as standardized mean difference or marginal effects analysis, have been used to detect unfair outcomes in AI as well.
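The arithmetic behind the adverse impact ratio is simple enough to sketch in a few lines. The example below is a minimal illustration under stated assumptions: the function names and the applicant counts are hypothetical, and the 0.8 cutoff simply encodes the 80 percent rule described above.

```python
def selection_rate(selected: int, total: int) -> float:
    """Share of a group's members who received the favorable outcome."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= selected <= total:
        raise ValueError("selected must be between 0 and total")
    return selected / total


def adverse_impact_ratio(
    disadv_selected: int,
    disadv_total: int,
    adv_selected: int,
    adv_total: int,
) -> float:
    """Selection rate of the disadvantaged group divided by the
    selection rate of the advantaged group."""
    disadv_rate = selection_rate(disadv_selected, disadv_total)
    adv_rate = selection_rate(adv_selected, adv_total)
    if adv_rate == 0:
        raise ValueError("advantaged group has no selections; ratio is undefined")
    return disadv_rate / adv_rate


def below_80_percent_rule(ratio: float, threshold: float = 0.8) -> bool:
    """True when the ratio falls below the 80 percent threshold."""
    return ratio < threshold


# Hypothetical counts, invented purely for illustration:
# 30 of 100 applicants selected in the disadvantaged group,
# 50 of 100 applicants selected in the advantaged group.
ratio = adverse_impact_ratio(30, 100, 50, 100)
print(f"adverse impact ratio: {ratio:.2f}")
print("below 80% threshold:", below_80_percent_rule(ratio))
```

With these made-up numbers, the selection rates are 30% and 50%, so the ratio works out to 0.6 and falls below the 80% threshold. A single ratio like this is only one indicator, and a different metric applied to the same outcomes can point to a different conclusion.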
All of which means that, in practice, when data scientists and lawyers are asked to ensure their AI is fair, they’re also being asked to select what “fairness” should mean in the context of each specific use case and how it should be measured. This can be an incredibly complex process, as a growing number of researchers in the machine learning community have noted in recent years.
Despite all these complexities, however, existing legal standards can provide a good baseline for organizations seeking to combat unfairness in their AI. These standards recognize the impracticality of a one-size-fits-all approach to measuring unfair outcomes. As a result, the question these standards ask is not simply “is disparate impact occurring?” Instead, existing standards mandate what amounts to two essential requirements for regulated companies.
First, regulated companies must clearly document all the ways they’ve attempted to minimize — and therefore to measure — disparate impact in their models. They must, in other words, carefully monitor and document all their attempts to reduce algorithmic unfairness.
Second, regulated organizations must also generate clear, good faith justifications for using the models they eventually deploy. If fairer methods existed that would have also met these same objectives, liability can ensue.
Companies using AI can and should learn from many of these same processes and best practices to both identify and minimize cases in which their AI is generating unfair outcomes. Clear standards for fairness testing that incorporate these two essential elements, along with clear documentation guidelines for how and when such testing should take place, will go a long way toward ensuring fairer and more carefully monitored outcomes for companies deploying AI. Companies can also draw from public guidance offered by experts such as BLDS’s Nicholas Schmidt and Bryce Stephens.
Are these existing legal standards perfect? Far from it. There is significant room for improvement, as regulators have in fact noted in recent months. (A notable exception is the Trump administration’s Department of Housing and Urban Development, which is currently attempting to roll back some of these standards.) Indeed, the U.S. Federal Trade Commission has signaled an increasing focus on fairness in AI, with one of its five commissioners publicly stating that it should expand its oversight of discriminatory AI.
New laws and guidance targeting fairness in AI, in other words, are clearly coming. If shaped correctly, they will be a welcome development when they arrive.
But until they come, it’s critical that companies build on existing best practices to combat unfairness in their AI. If deployed thoughtfully, the technology can be a powerful force for good. But if used without care, it is all too easy for AI to entrench existing disparities and discriminate against already-disadvantaged groups. This is an outcome that neither businesses nor society at large can afford.