Privacy has always meant minimizing the collection of data to what is “fit for purpose,” and then limiting access to and retention of that data to only what is required.
Artificial intelligence (AI), promising exciting opportunities for new services and products, has always been hungry for data from any source in order to derive insights.
These two worlds are rapidly colliding. As they do, organizations have the opportunity to realize the promise of AI while addressing the risks to data privacy.
Realizing this promise comes down to governance. Effective oversight requires structures most organizations do not yet have, spanning not just developers, such as data-science teams, but also teams that may procure AI solutions, such as operations and HR, and core functions, like privacy, that traditionally perform a governance role.
Inadequate governance exposes organizations to unnecessary risks, especially when teams are unaware of which data is restricted under which law. This risk was recently demonstrated when several organizations were sued for violating the California Consumer Privacy Act (CCPA) after sharing data with a third-party fraud-profiling tool. While the CCPA allows the use of data for fraud-detection and security purposes, it does not cover the voluntary transfer of that data to a third party.
In this case, the data shared was clearly personal, given it was customer-specific and was used to create customer-fraud profiles. But is it always clear what constitutes personal data?
Consider an online BMI calculator, where you enter your weight and height. Are your weight and height protected data points? What if, following your search for this BMI calculator, you are bombarded with advertisements for weight-loss supplements, likely driven by AI-based recommendation engines? Was your data shared with this third party? How did an advertiser know you want to lose weight?
Recent regulations like the CCPA, the California Privacy Rights Act (CPRA), and the General Data Protection Regulation (GDPR) have made data protection, data privacy, and transparency top priorities for organizations. When asked what actions their organizations would take to build AI responsibly, 38% of respondents to PwC’s 2021 AI Predictions survey indicated that compliance with regulation, including privacy, was a top priority for AI. While current privacy legislation focuses almost uniformly on the collection and use of personal data, proposed regulations such as the Algorithmic Accountability Act and California’s Automated Decisions Accountability Act focus on the broader impact of technology and data use.
Increased emphasis on regulation is complemented by the added focus organizations are placing on data, analytics, and AI-driven decision making. In response, privacy practices in leading organizations are considering expanding their roles. This pivot increasingly positions privacy teams as core to the emerging AI-governance area.
AI applications require significant quantities of data to make robust decisions, and they often require balancing benefits against risks given AI’s impact on stakeholders. By embedding ethics into privacy and data-protection practices, some organizations are placing increased responsibility on privacy teams to oversee AI. This change requires privacy groups to have a basic understanding of how models are developed and tested so they can evaluate development practices such as bias mitigation.
This is a needed shift. However, privacy is not the only group both looking to take charge of AI governance and required to take a more active role. Data-science teams, business units, and functional groups alike are all moving toward more robust operating models around AI.
What Constitutes Personal Data?
Because engagement with privacy functions has historically been driven by regulatory requirements, it has primarily been compliance oriented. As a result, privacy teams are often engaged only at the end of development, when they can no longer significantly influence the design of a solution.
Approaches such as incorporating privacy-by-design principles attempt to solve for potential privacy and data-protection issues by embedding privacy expertise throughout the software-development process. Even with this approach, a gap often remains between what development teams consider privacy conflicts and what actually is a conflict. Teams may not anticipate that sensor, transactional, or locational data used to train AI might be personal if such data could be linked to an individual.
In the BMI calculator example, for instance, development teams may not consider derived data, such as the computed BMI itself, to be protected data. Data is not just collected; it can also be observed using sensors or other technologies, enriched with other data sources that provide a finer-grained view of an individual, and even synthetically generated to create large synthetic populations by sampling a few individual data points.
What are the privacy implications of these approaches? Is it ethical to combine data in such a way, or generate a synthetic population based on the information of a few real individuals? What are the implications of the intended use of the data or insights?
There are implications to using AI throughout the data lifecycle (Figure 1), as well as the software-development lifecycle. As privacy regulations increasingly focus on data-minimization requirements, and ethical data and AI discussions move toward data protection, organizations may increasingly rely on these techniques. Yet even pseudonymization and anonymization can be undermined, and individuals re-identified, through data augmentation if the resulting combined dataset provides enough information to link back to an individual.
Figure 1: The Data Lifecycle (PwC)
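To make the re-identification risk concrete, here is a minimal sketch of a linkage attack, assuming Python with pandas. The column names, values, and the auxiliary dataset are hypothetical illustrations, not a description of any real system:

```python
import pandas as pd

# "Anonymized" health records: direct identifiers removed, but
# quasi-identifiers (ZIP code, birth year, sex) kept for analysis.
health = pd.DataFrame({
    "zip": ["94107", "94107", "10001"],
    "birth_year": [1985, 1990, 1985],
    "sex": ["F", "M", "F"],
    "bmi": [31.2, 24.8, 27.5],
})

# Auxiliary data obtained elsewhere (say, a marketing list) that pairs
# names with the same quasi-identifiers.
aux = pd.DataFrame({
    "name": ["Alice Example"],
    "zip": ["94107"],
    "birth_year": [1985],
    "sex": ["F"],
})

# Joining on the shared quasi-identifiers re-links the "anonymous" record
# to a named individual: the combined dataset is personal data again.
reidentified = aux.merge(health, on=["zip", "birth_year", "sex"])
print(reidentified)  # Alice Example is now associated with a BMI of 31.2
```

With only three shared attributes, the join is enough to single out one record; the more auxiliary sources an AI pipeline pulls in, the easier this kind of linkage becomes.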
The Language Gap
“Privacy” is not the only term that may mean different things to data-science and privacy teams. In the privacy regulatory context, “transparency” means informing subjects regarding how their data is processed; in data science, transparency refers to an explanation of a model’s decision making. In the privacy realm, “accuracy” refers to how correct a subject’s data is; in data science, the word refers to the performance of a model on a population: the proportion of decisions the model made correctly.
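A rough illustration of the data-science sense of the term, assuming Python; the labels and predictions below are invented for the example:

```python
# Privacy sense of "accuracy": is the subject's record correct
# (e.g., the right address on file)?
# Data-science sense: what proportion of the model's decisions are correct?

y_true = [1, 0, 1, 1, 0, 1]   # actual outcomes
y_pred = [1, 0, 0, 1, 0, 1]   # model predictions

model_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"Model accuracy: {model_accuracy:.0%}")  # 83% of decisions correct
```

A model can be highly accurate in this sense while individual records remain inaccurate in the privacy sense, which is exactly where the two teams can talk past each other.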
Effective integration of privacy-by-design practices and widespread education can start to close this language gap, but they do not resolve the broader questions regarding data, data use, and AI ethics. Other ethical principles organizations are moving to adopt, such as fairness, will face similar interpretation challenges.
AI Development Is Not Like Software Development
Software development traditionally follows Agile or Waterfall development methodologies. AI development requires an experimentation-driven approach, where the whole model is iteratively developed rather than individual components developed and added to one another. This difference yields many challenges for organizations, which extend to the stewardship and governance practices required to maintain effective oversight.
Expanding Past Privacy
To enable a robust layer of governance, your organization needs to take several steps:
- Consider what governance your organization needs to move past regulatory-driven privacy protection into data and data-use ethics. Data, data use, and AI ethics involve more than privacy. Some organizations are adopting principles around explainability, societal benefit, and fairness, among other principles. Identify which principles are relevant, and more importantly, what these principles mean to your organization. Get wide executive agreement on these principles, and translate them into concrete standards and procedures for each practice within your organization to enact trust-driven approaches.
- Appreciate the differences between AI and software. Find the best connection points with AI development teams, given their experimentation-driven approach. There should be oversight as to where and when in the development process data is accessed, cleaned, manipulated, augmented, and protected. Doing so can enable more thorough analysis of broader ethical considerations: privacy and beyond.
- Build off what you have. No one wants burdensome governance and compliance, but expanding governance practices unnecessarily can lead to poor compliance down the line. Building off processes that already exist can reduce the friction and change management required to gain effective oversight. Most use of personal data already requires a Privacy Impact Assessment, or a GDPR-required Data Protection Impact Assessment in higher-risk data scenarios. Augment these assessments with additional questions relating to AI, and mandate their use across all AI development. Place particular focus on questions that identify and assess the likelihood and significance of benefits, risks, and mitigation controls. Treat the assessment as a benefit-risk decision-making tool, and bring a broad cross-section of internal stakeholders into the decision.
- Align on language. And don’t go it alone: anticipate any differences in how terminology is used between privacy and AI teams. Educate and collaborate to develop a shared understanding of these terms. Collaborative governance with a multilayer approach can help reduce friction and provide the broad range of perspectives necessary for your organization to develop robust and lasting mechanisms for ethical AI.
- Use AI techniques to address privacy issues. While AI introduces a number of complications in dealing with privacy, it also provides innovative ways of solving privacy problems. Techniques such as homomorphic encryption allow computation on encrypted data, and differential privacy enables the sharing of aggregate results by adding calibrated noise (see the sketch that follows this list). Federated learning allows insights to be generated locally and aggregated without revealing private data. As an active research space, AI will undoubtedly yield new applications that enable privacy protection.
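A minimal sketch of the noise-addition idea behind differential privacy, assuming Python with NumPy; the dp_count helper, the epsilon value, and the BMI threshold are illustrative assumptions rather than a production-grade mechanism:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def dp_count(values, threshold, epsilon=1.0):
    """Return a differentially private count of values above a threshold.

    A counting query has sensitivity 1, so Laplace noise with scale
    1/epsilon masks the contribution of any single individual's record.
    """
    true_count = sum(v > threshold for v in values)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical use: share how many users have a BMI over 30 without
# exposing any individual's value.
bmi_values = [22.4, 31.2, 27.5, 33.1, 24.8, 30.6]
print(round(dp_count(bmi_values, threshold=30.0), 1))
```

Smaller epsilon values add more noise and stronger protection at the cost of accuracy; picking that trade-off is itself a governance decision.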
Enterprises need to get a handle on governing AI systems and the associated data. Understanding how to institute oversight and alignment with data, data-use, and ethical AI principles is key to establishing an effective governance model, which will ultimately require changing the nature of specific roles. Effective governance is about building trust and enabling the powerful benefits of responsible AI.