In Your Face Artificial Intelligence: Regulating the Collection and Use of Face Data (Part II)

The technologies behind “face data” collection, detection, recognition, and affect (emotion) analysis were summarized in the previous post, which also briefly discussed use cases for face data, reported concerns about the proliferation of face data collection efforts, and instances of face data misuse.

In this follow-on post, a proposed “face data” definition is explored from a governance perspective, with the purpose of providing more certainty as to when heightened requirements ought to be imposed on those involved in face data collection, storage, and use. This proposal is motivated in part by the increased risk of identity theft and other misuse resulting from unauthorized disclosure of face data, but it also recognizes that overregulation could subject persons and entities to onerous requirements.

Illinois’ decade-old Biometric Information Privacy Act (“BIPA”) (740 ILCS 14/1 (2008)), which has been widely cited by privacy hawks and asserted against social media and other companies in US federal and various state courts (primarily Illinois and California), provides a starting point for a uniform face data definition. The BIPA defines “biometric identifier” to include a scan of a person’s face geometry. The scope and meaning of the definition, however, remain ambiguous despite close scrutiny by several courts. In Monroy v. Shutterfly, Inc., for example, a federal district court found that mere possession of a digital photograph of a person, and “extraction” of information from such a photograph, are excluded from the BIPA:

“It is clear that the data extracted from [a] photograph cannot constitute ‘biometric information’ within the meaning of the statute: photographs are expressly excluded from the [BIPA’s] definition of ‘biometric identifier,’ and the definition of ‘biometric information’ expressly excludes ‘information derived from items or procedures excluded under the definition of biometric identifiers.’”

Slip op. No. 16-cv-10984 (N.D. Ill. 2017). Despite that finding, the Monroy court concluded that a “scan of face geometry” under the statute’s definition includes a “scan” of a person’s face from a photograph (or a live scan of a person’s face geometry). Because the issue was not before it, the court did not address whether the BIPA applies when a scan of any part of a person’s face geometry from an image is insufficient to identify the person in the image. That is, the Monroy holding arguably applies to any data generated by a scan, even if that data by itself cannot lead to identifying anyone.

By way of comparison, the European Union’s General Data Protection Regulation (GDPR), which governs “personal data” (i.e., any information relating to an identified or identifiable natural person), will regulate biometric information when it goes into effect in late May 2018. Like the BIPA, the GDPR will place restrictions on “personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allow or confirm the unique identification of that natural person, such as facial images or dactyloscopic data” (GDPR, Article 4(14)). Depending on how courts in EU member states interpret the GDPR generally, and Article 4 specifically, any process that creates biometric data relating to a person, or that allows one to identify a person or confirm that person’s identity, is potentially covered by the GDPR.

Thus, to enhance clarity for potentially regulated individuals and companies dealing with US citizens, “face data” could be defined, as set forth below, in a way that considers a minimum quantity or quality of data below which a regulated entity would not be within the scope of the definition (and thus not subject to regulation):

“Face data” means data in the possession or control of a regulated entity obtained from a scan of a person’s face geometry or face attribute, as well as any information and data derived from or based on the geometry or attribute data, if in the aggregate the data in the possession or control of the regulated entity is sufficient for determining an identity of the person or the person’s emotional (physiological) state.

The term “determining an identity of the person or the person’s emotional (physiological) state” relates to any known computational or manual technique for identifying a person or that person’s emotions.

The term “is sufficient” is open to interpretation; it would need to be defined explicitly (or, as is often the case in legislation, left to the courts to interpret). The intent of “sufficient” is to permit a regulated entity to anonymize or delete data after processing video signals or images of a person’s face, and thereby avoid being categorized as possessing regulated face data (to the extent probabilistic models and other techniques could not later be used to de-anonymize or reconstruct the missing data and identify the person or that person’s emotional state). The burden of establishing the quality and quantity of face data that is insufficient for identification purposes should rest with the regulated entity that possesses or controls the face data.

Face data could include data from the face of a “live” person captured by a camera (e.g., surveillance) as well as data extracted from existing media (e.g., stored images). It is not necessary, however, for the definition to encompass the mere virtual depiction or display of a person in a live video or existing image or video. Thus, digital pictures of friends or family on a personal smartphone would not be face data, and the owner of the phone should not be a regulated entity subject to face data governance. An app on that smartphone, however, that uses face detection algorithms to process the pictures for facial recognition and sends the resulting data to a remote app server for storage and use (e.g., for extraction of emotion information) would create face data.

By way of other examples, a process involving pixel-level data extracted from an image (a type of “scan”) by a regulated entity would create face data if that data, combined with any other data possessed or controlled by the entity, could be used in the aggregate to identify the person in the image or that person’s emotional state. Similarly, data reflecting changes in facial expressions, obtained by pixel-level comparisons of time-slice images from a video (also a type of scan), would be information derived from face data and thus would be regulated face data, assuming the derived data, combined with other data possessed or controlled by the entity, could be used to identify the person in the image or that person’s emotional state.
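To make the idea of a pixel-level comparison of time-slice images more concrete, here is a minimal sketch that compares two video frames and produces a difference map, the kind of derived data the proposed definition would reach if, combined with other data, it could identify a person or that person’s emotional state. It is a hypothetical illustration only: the frame format (grayscale rows of integer intensities from 0 to 255) and the function names `frame_difference` and `mean_change` are assumptions, not any particular product’s method, and real systems would rely on image-processing libraries and far more sophisticated models.

```python
from typing import List

# Hypothetical frame format: a grayscale image as rows of 0-255 pixel intensities.
Frame = List[List[int]]


def frame_difference(prev: Frame, curr: Frame) -> Frame:
    """Return the per-pixel absolute intensity change between two time-slice frames."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have identical dimensions")
    return [
        [abs(c - p) for p, c in zip(row_prev, row_curr)]
        for row_prev, row_curr in zip(prev, curr)
    ]


def mean_change(diff: Frame) -> float:
    """Summarize a difference map as the average absolute change per pixel."""
    total = sum(sum(row) for row in diff)
    count = sum(len(row) for row in diff)
    return total / count if count else 0.0
```

For example, comparing `[[10, 10], [10, 10]]` with `[[10, 30], [10, 50]]` yields the difference map `[[0, 20], [0, 40]]` and a mean change of 15.0. On its own, such a map says little about who appears in the video; under the proposed definition, whether it becomes regulated face data would depend on what other data the entity possesses or controls.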

Information about the relative positions of facial points based on facial action units could also be data derived from or based on the original scan and thus would be face data, assuming again that the data, combined with any other data possessed by a regulated entity, could be used to identify a person or that person’s emotional state. Classifications of a person’s emotional state (e.g., joy, surprise) based on extracted image data would also be information derived from or based on a person’s face data and thus would also be face data.

Features extracted from an image of a person’s face by a deep learning model, such as a convolutional neural network, could also be face data if those features, along with other data in the possession or control of a regulated entity, could be used to identify the person or that person’s emotional state.

For banks and other institutions that use face recognition for authentication purposes, sufficient face data would obviously need to be in the institution’s possession at some point in time to positively identify a customer making a transaction. This could subject the institution to face data governance during that time period. In contrast, a social media platform that permits users to upload images of people but does not scan or otherwise process the images (such as by cross-referencing other existing data) would not create face data, and thus would not be subject to face data governance, even if it also possessed tagged images of the same individuals who appear in the uploaded images. Thus, the mere possession or control of images, even images that could potentially contain identifying information, would not constitute face data. But if a platform were to scan (process) the uploaded images for identification purposes, or sell or provide user-uploaded images to a third party that scans them to extract face geometry or attribute data for purposes such as targeted advertising, both the platform and the third party could become subject to face data governance.

The proposed face data definition, which could be modified to include “body data” and “voice data,” is merely one example that US policymakers and stakeholders might consider in assessing the scope of face data governance in the US. The definition does not preclude any number of exceptions, exclusions, and limitations that could be implemented to avoid reaching actors and actions that should not be covered, while also maintaining consistency with existing laws and regulations. Also, the proposed definition is not intended to directly encompass the specific artificial intelligence technologies used or created by a regulated entity to collect and use face data, including the underlying algorithms, models, networks, settings, hyper-parameters, processors, source code, etc.

In a follow-on post, possible civil penalties for harms caused by face data collection, storage, and use will be briefly considered, along with possible defenses a regulated person or entity may raise in litigation.

How Privacy Law’s Beginnings May Suggest An Approach For Regulating Artificial Intelligence

A survey conducted in April 2017 by Morning Consult suggests most Americans are in favor of regulating artificial intelligence technologies. Of 2,200 American adults surveyed, 71% said they strongly or somewhat agreed that there should be national regulation of AI, while only 14% strongly or somewhat disagreed (15% did not express a view).

Technology and business leaders speaking out on whether to regulate AI fall into one of two camps: those who generally favor an ex post, case-by-case, common law approach, and those who prefer establishing a statutory and regulatory framework that, ex ante, sets forth clear do’s and don’ts and penalties for violations. (If you’re interested in learning about the challenges of ex post and ex ante approaches to regulation, check out Matt Scherer’s excellent article, “Regulating Artificial Intelligence Systems: Risks, Challenges, Competencies, and Strategies,” published in the Harvard Journal of Law and Technology (2016)).

Advocates for a proactive regulatory approach caution that the alternative is fraught with predictable danger. Elon Musk, for one, notes that “[b]y the time we’re reactive in A.I., regulation’s too late.” Others, including leaders of some of the biggest AI technology companies, backed by lobbying organizations like the Information Technology Industry Council (ITI), feel that the hype surrounding AI does not justify quick Congressional action at this time.

Musk criticized this wait-and-see approach. “Normally, the way regulation’s set up,” he said, “a whole bunch of bad things happen, there’s a public outcry, and then after many years, a regulatory agency is set up to regulate that industry. There’s a bunch of opposition from companies who don’t like being told what to do by regulators, and it takes forever. That in the past has been bad but not something which represented a fundamental risk to the existence of civilization.”

Assuming AI regulation is inevitable, how should regulators (and legislators) approach such a formidable task? After all, AI technologies come in many forms, and their uses extend across multiple industries, including some already burdened with regulation. The history of privacy law may provide the answer.

Without question, privacy concerns, and privacy laws, touch on AI technology use and development. That’s because so many of today’s human-machine interactions involving AI are powered by user-provided or user-mined data. Search histories, social media images in which people appear, purchasing habits, home ownership details, political affiliations, and many other data points are well known to marketers and others whose products and services rely on characterizing potential customers using, for example, machine learning algorithms, convolutional neural networks, and other AI tools. In the field of affective computing, human-robot and human-chatbot interactions are driven by a person’s voice, facial features, heart rate, and other physiological features, which are the percepts that the AI system collects, processes, stores, and uses when deciding what actions to take, such as how to respond to user queries.

Privacy law traces its origins to late-nineteenth-century America, when journalists were unrestrained in publishing sensational pieces in newspapers and magazines, basically the “fake news” of the time. This sensationalist press, later dubbed “yellow journalism,” prompted legal scholars to express the view that people had a “right to be let alone,” setting in motion the development of a new body of law involving privacy. The key to regulating AI, as it was in the development of regulations governing privacy, may be the recognition of a specific personal right that is, or is expected to be, infringed by AI systems.

In the case of privacy, attorneys Samuel Warren and Louis Brandeis (later, Justice Brandeis) were the first to articulate a personal privacy right. In The Right of Privacy, published in the Harvard Law Review in 1890, Warren and Brandeis observed that “the press is overstepping in every direction the obvious bounds of propriety and of decency. Gossip…has become a trade.” They contended that “for years there has been a feeling that the law must afford some remedy for the unauthorized circulation of portraits of private persons.” They argued that a right of privacy was entitled to recognition because “in every [] case the individual is entitled to decide whether that which is his shall be given to the public.” A violation of the person’s right of privacy, they wrote, should be actionable.

Soon after, courts began recognizing the right of privacy in civil cases. In 1960, in his seminal review article Privacy (48 Cal. L. Rev. 383), William Prosser wrote that, “[i]n one form or another,” the right of privacy “was declared to exist by the overwhelming majority of the American courts.” That recognition eventually led to more uniform standards. Some states enacted limited or sweeping state-specific statutes, replacing the common law with statutory provisions and penalties. Federal appeals courts weighed in when conflicts between state laws arose. This slow progression, from the initial recognition of a personal privacy right in 1890 to today’s modern statutes and expansive body of common law, won’t appeal to those pushing for regulation of AI now.

Even so, the process has to begin somewhere, and it could very well start with an assessment of which personal rights arising from interactions with, or the use of, AI technologies should be recognized. Personal rights already recognized by courts and embodied in statutes apply to AI technologies. But one personal right, potentially unique to AI technologies, has been suggested: the right to know why (or how) an AI technology took a particular action (or made a decision) affecting a person.

Take, for example, an adverse credit decision by a bank that relies on machine learning algorithms to decide whether a customer should be given credit. Should that customer have the right to know why (or how) the system made the credit-worthiness decision? Fast Company writer Cliff Kuang explored this proposition in his recent article, “Can A.I. Be Taught to Explain Itself?” published in the New York Times (November 21, 2017).

If AI could explain itself, the banking customer might want to ask what kind of training data was used and whether that data was biased, whether an errant line of Python code was to blame, or whether the AI gave appropriate weight to the customer’s credit history. Given the nature of AI technologies, some of these questions, and even more general ones, may only be answered by opening the AI black box. But even then, it may be impossible to pinpoint how the AI technology made its decision. In Europe, “tell me why/how” regulations are expected to become effective in May 2018. As I will discuss in a future post, many practical obstacles face those wishing to build a statute or regulatory framework around the right of consumers to demand that businesses’ AI systems explain why they made a particular decision or took a particular adverse action.

Regulation of AI will likely happen. In fact, we are already seeing the beginning of direct legislative/regulatory efforts aimed at the autonomous driving industry. Whether interest in expanding those efforts to other AI technologies grows or lags may depend at least in part on whether people believe they have personal rights at stake in AI, and whether those rights are being protected by current laws and regulations.