Many organizations have adopted a set of high-level principles to ensure that their AI makes decisions in an ethical fashion and causes no harm. But to give the principles teeth, organizations need to have an implementation strategy that includes concrete metrics that can be measured and monitored by engineers, data scientists, and legal personnel. Because there is no one-size-fits-all approach to quantifying potential harms created by AI, metrics are likely to vary across organizations, use cases, and regulatory jurisdictions. But that need not discourage organizations; they can draw from a combination of existing research, legal precedents, and technical best practices.
Environmental well-being. Human agency. Transparency. These are just a few of the ill-defined principles commonly listed in ethical frameworks for artificial intelligence (AI), hundreds of which have now been released by organizations ranging from Google to the government of Canada to BMW. As organizations embrace AI with increasing speed, adopting these principles is widely viewed as one of the best ways to ensure AI does not cause unintended harms.
The problem? Many AI ethical frameworks cannot be clearly implemented in practice, as researchers have consistently demonstrated. Without a dramatic increase in the specificity of these frameworks, there’s simply not much technical personnel can do to uphold such high-level guidance. And this, in turn, means that while AI ethics frameworks may make for good marketing campaigns, they all too frequently fail to stop AI from causing the very harms they are meant to prevent.
In that sense, ethics guidelines can actually pose serious risks for the companies that adopt them — creating the false sense that organizations have made their AI free of risk when its dangers are in fact rampant. If organizations think that the act of drafting AI ethical principles is enough to ensure their AI is safe, they should think again.
So what can companies do to keep their AI from causing harm?
The answer does not lie in abandoning ethical AI efforts altogether. Instead, organizations should develop these frameworks in tandem with a broader strategy for ethical AI, one focused directly on implementation, with concrete metrics at the center. Every AI principle an organization adopts, in other words, should come with clear metrics that engineers, data scientists, and legal personnel can measure and monitor.
Organizations can start with an admission: Getting serious about ethical AI is no easy task. Applying the principles of ethical AI to real-world deployment environments requires significant time and resources spanning legal, data science, and risk departments, and in some cases outside expertise. Indeed, because there is no one-size-fits-all approach to quantifying potential harms created by AI, metrics for ethical AI are likely to vary across organizations, use cases, and regulatory jurisdictions. There is simply no single best way to measure a principle like AI transparency or well-being. But that need not discourage organizations that are serious about ethical AI, which can draw from a combination of existing research, legal precedents, and technical best practices.
In the realm of fairness, for example, there are years of case law on credit, housing, and employment in the United States that can serve as a guide to measuring algorithmic discrimination — a topic I’ve previously addressed. Organizations may want to adopt or modify measures like the adverse impact ratio, marginal effect, or standardized mean difference, which are widely used to quantify discrimination in highly regulated fair-lending environments. Using these or similar metrics for fairness will allow data scientists to understand when and how their AI is creating discriminatory harms and to act quickly when it does.
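To make this concrete, below is a minimal sketch, in Python with NumPy, of how a data science team might compute two of these fairness metrics on a model's outcomes. The data, group labels, and function names are hypothetical; real programs would add significance testing and apply whatever thresholds their regulators and counsel require.

```python
import numpy as np

def adverse_impact_ratio(approved, group, protected, reference):
    """Approval rate of the protected group divided by that of the
    reference group; values below ~0.8 are often flagged under the
    'four-fifths rule' used in U.S. employment contexts."""
    approved, group = np.asarray(approved, dtype=bool), np.asarray(group)
    return approved[group == protected].mean() / approved[group == reference].mean()

def standardized_mean_difference(scores, group, protected, reference):
    """Difference in mean model scores between groups, scaled by the
    pooled standard deviation (an effect-size-style measure)."""
    scores, group = np.asarray(scores, dtype=float), np.asarray(group)
    a, b = scores[group == protected], scores[group == reference]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

# Hypothetical model decisions and scores for two demographic groups.
approved = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
scores = [0.61, 0.42, 0.71, 0.66, 0.38, 0.83, 0.79, 0.74, 0.45, 0.88]
group = ["A"] * 5 + ["B"] * 5

print(f"Adverse impact ratio: {adverse_impact_ratio(approved, group, 'A', 'B'):.2f}")
print(f"Standardized mean difference: {standardized_mean_difference(scores, group, 'A', 'B'):.2f}")
```

The point is less the particular numbers than the fact that a principle like "fairness" becomes something an engineer can compute, log, and alert on.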
In the world of privacy, there are a host of metrics that organizations can adopt to quantify potential privacy violations as well. While there are numerous examples of research on the subject (this study is one of my favorites), a set of techniques called “privacy enhancing technologies” may be one of the best places to start for operationalizing principles related to privacy. Methods like differential privacy, which have open source packages that data scientists can adopt out of the box, are based upon the explicit notion that privacy can be quantified in large data sets and have been deployed by many tech giants for years. Similar research exists in the world of AI interpretability and security as well, which can be paired with a host of commonly espoused AI principles like transparency, robustness, and more.
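The core idea, that privacy loss can be expressed as a number (the privacy budget, usually called epsilon), is simple enough to sketch. The example below is an illustrative Laplace-mechanism calculation in Python, not a snippet from any particular open source package; in practice, teams should rely on a vetted library rather than hand-rolled noise.

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon, rng=None):
    """Release the mean of a sensitive column with epsilon-differential
    privacy via the Laplace mechanism: clip each record so one person's
    influence on the mean is bounded, then add noise scaled to that
    bound divided by the privacy budget epsilon."""
    rng = rng or np.random.default_rng()
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(values)  # max change from altering one record
    return values.mean() + rng.laplace(0.0, sensitivity / epsilon)

# Hypothetical example: publishing the average age in a sensitive dataset.
ages = [34, 29, 51, 46, 38, 62, 27, 44]
print(dp_mean(ages, lower=18, upper=90, epsilon=0.5))  # stronger guarantee, noisier answer
print(dp_mean(ages, lower=18, upper=90, epsilon=5.0))  # weaker guarantee, closer to the true mean
```

Smaller values of epsilon mean stronger privacy guarantees and noisier answers, and that trade-off is exactly the kind of thing an organization can document, monitor, and defend.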
All of which means that as organizations engage in the difficult work of applying ethics-related metrics to their AI, they need not start from scratch. With clear metrics attached to their ethics frameworks, organizations can more easily understand when ethical failures are occurring in their AI.
Are there downsides to an increased emphasis on the role of metrics in ethical AI? Certainly. Some facets of algorithmic decision-making are hard, if not impossible, to quantify. This means that many AI deployments are likely to have intangible risks that require careful and critical review. Overemphasizing metrics can, in some circumstances, cause those risks to be overlooked. Additionally, it takes far more than clear metrics to ensure that AI is not generating serious harms. Mechanisms for accountability, documentation, model inventories, and more must also form major components of any effort to deploy AI responsibly.
But failing to provide clear metrics for AI harms, as is all too often the case in organizations deploying AI, breeds confusion for frontline technical and legal personnel, who need measurable indicators to understand whether their AI is misbehaving and how to respond when it does.
And while the task of translating AI principles into concrete metrics is both intellectually challenging and resource intensive, organizations adopting AI should not be daunted. Indeed, they cannot afford to be. The alternative — waiting to measure the harms of AI until after they occur — will be far more difficult and costly to their consumers, their reputations, and their bottom lines.
Ultimately, the question is not whether AI harms need to be measured but when. If organizations wait too long to quantify the damages their AI can cause, the courts will start doing it for them.