At the Intersection of AI, Face Swapping, Deep Fakes, Right of Publicity, and Litigation

Websites like GitHub, Reddit and others offer developers and hobbyists dozens of repositories containing artificial intelligence deep learning models, instructions for their use, and forums for learning how to “face swap,” a technique used to automatically replace the face of a person in a video with that of a different person. Older forms of face swapping, applied primarily to still images, have been around for years in the form of entertaining apps that produced results of unremarkable quality (think cut and paste at the low end, and Photoshop editing at the higher end). With the latest AI models, however, including deep neural networks, a video with a face-swapped actor (a so-called “deep fake”) may appear so seamless and uncanny as to fool even the closest inspection, and the quality is apparently still improving.
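
For readers curious how low the technical bar is, here is a toy sketch (illustrative only, with placeholder filenames, and not drawn from any particular deep fake repository) of the older cut-and-paste approach using OpenCV’s stock face detector. Deep fake models replace the crude paste step with a neural network trained on many frames of both faces, which is why their output can look seamless.

```python
# Toy "cut and paste" face swap; filenames are placeholders for illustration.
import cv2

# Pretrained frontal-face detector that ships with OpenCV
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

target = cv2.imread("target_frame.jpg")    # frame whose face will be replaced
donor_face = cv2.imread("donor_face.jpg")  # face image to paste in

gray = cv2.cvtColor(target, cv2.COLOR_BGR2GRAY)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    # Crude swap: resize the donor face to the detected region and paste it in.
    # The visible seams and lighting mismatch are what deep fakes eliminate.
    target[y:y + h, x:x + w] = cv2.resize(donor_face, (w, h))

cv2.imwrite("swapped_frame.jpg", target)
```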

With only subtle clues to suggest that an actor in one of these videos is fake, the developers behind them have become targets of criticism. Much of that criticism has also been leveled at the AI tech industry generally, for creating new AI tools with few restrictions on potential uses beyond their original intent. These concerns have now reached the halls of New York’s state legislature.

New York lawmakers are responding to the deep fake controversy, albeit in a narrow way, by proposing to make it illegal to use “digital replicas” of individuals without permission, a move that would indirectly regulate AI deep learning models. New York Assembly Bill No. A08155 (introduced in 2017, amended Jun. 5, 2018) is aimed at modernizing New York’s right of publicity law (N.Y. Civ. Rights Law §§ 50 and 51), one of the nation’s oldest publicity rights laws and one that provides no post-mortem publicity rights. Even so, the bill may do little to curb the broader proliferation of face-swapped and deep fake videos. In fact, only a relatively small slice of primarily famous New York actors, artists, athletes, and their heirs and estates would benefit from the proposed law’s digital replicas provision.

If enacted, New York’s right of publicity law would be amended to address a computer-generated or electronic reproduction of a living or deceased individual’s likeness or voice that “realistically depicts” the likeness or voice of the individual being portrayed (“realistic” is undefined). Use of a digital replica without the individual’s consent would violate the law if the use is in a scripted audiovisual or audio work (e.g., a movie or sound recording), or in a live performance of a dramatic work, that is intended to and does create the clear impression that the individual represented by the digital replica is performing, in the role of a fictional character, the activity for which he or she is known.

It would also be a violation of the law to use a digital replica of a person in a performance of a musical work that is intended to and creates the clear impression that the individual represented by the digital replica is performing the activity for which he or she is known, in such musical work.

Moreover, it would be a violation to use a digital replica of a person in an audiovisual work that is intended to and creates the clear impression that the athlete represented by the digital replica is engaging in an athletic activity for which he or she is known.

Based on First Amendment principles, the bill would exclude from a person’s right to control their persona uses involving parody, satire, commentary, and criticism; political, public interest, or newsworthy situations, including documentaries, regardless of the degree of fictionalization in the work; and de minimis or incidental uses.

Most relevant to deep fakes, the bill would make it a violation to use a digital replica of an individual, without that individual’s consent, in an audiovisual pornographic work in a manner that is intended to and creates the impression that the individual represented by the digital replica is performing.

Similar to the safe harbor provisions in other statutes, the New York law would provide limited immunity to any medium used for advertising (including, but not limited to, newspapers, magazines, radio and television networks and stations, cable television systems, billboards, and transit advertising) that makes unauthorized use of an individual’s persona for the purpose of advertising or trade, unless it is established that the owner or an employee had knowledge of the unauthorized use, through presence or inclusion, of the individual’s persona in the advertisement or publication.

Moreover, the law would provide a private right of action allowing an injured party to sue for an injunction and to seek damages. Statutory damages of $750 would be available, or compensatory damages, which could be significantly higher. The finder of fact (judge or jury) could also award “exemplary damages,” which could be substantial, to send a message to others not to violate the law.

So far, AI tech developers have largely avoided direct legislative or regulatory action targeting their AI technologies, in part because some have taken steps to self-regulate, which may be necessary to avoid the confines of command and control-style state or federal regulatory schemes that would impose standards, restrictions, and requirements, along with a right to sue for damages and attorneys’ fees. Tech companies’ efforts at self-regulation, however, have been limited to expressing carefully crafted AI policies for themselves and their employees, as well as taking public stances on issues of bias, ethics, and the civil rights impacts of AI machine learning. Despite those efforts, more laws like New York’s may be introduced at the state level if AI technologies are used in ways that have questionable utility or social benefit.

For more about the intersection of right of publicity laws and regulating AI technology, please see an earlier post on this website, available here.

Obama, Trump, and the Regulation of Artificial Intelligence

Near the end of his second term, President Obama announced a series of workshops and government working groups tasked with “Preparing for the Future of Artificial Intelligence.” Then, just weeks before the 2016 presidential general election, the Obama administration published two reports, including one titled “The National Artificial Intelligence Research and Development Strategic Plan.” That plan laid out seven strategies for AI-related R&D, including making long-term investments in AI research to enable the United States to remain a world leader in AI, developing effective methods for human-AI interaction, and ensuring the safety, security, and trustworthiness of AI systems. The Obama AI plan also included strategies for developing shared, high-quality public datasets and environments for AI training and testing, creating standards and benchmarks for evaluating AI technologies, and understanding national AI research workforce needs. The plan also recognized the need for collaboration among researchers to address the ethical, legal, and societal implications of AI, topics that still resonate today.

Two years after Obama’s AI announcement, the Trump administration convened an Artificial Intelligence Summit at the White House in May 2018 and then published an “Artificial Intelligence for the American People” fact sheet outlining President Trump’s AI priorities. The fact sheet highlights the President’s goal of funding fundamental AI R&D, including in the areas of computing infrastructure, machine learning, and autonomous systems. Trump’s AI priorities also include a focus on developing workforce training in AI, seeking a strategic military advantage in AI, and leveraging AI technology to improve efficiency in delivering government services. The Trump fact sheet makes no mention of Obama’s AI plan.

Despite some general overlap and commonality between Obama’s and Trump’s AI goals and strategies, such as funding for AI, workforce training, and maintaining the United States’ global leadership in AI, one difference stands out in stark contrast: regulating AI technology. While Obama’s AI strategy did not expressly call for regulating AI, it nonetheless recognized a need for setting regulatory policy for AI-enabled products. To that end, Obama recommended drawing on appropriate technical expertise at the senior levels of government and recruiting the necessary AI technical talent to ensure that there are sufficient technical seats at the table in regulatory policy discussions.

Trump, on the other hand, has rolled back regulations across a number of government areas and, in the case of AI, has stated that he would seek to “remove regulatory barriers” to AI innovation to foster new American industries and the deployment of AI-powered technologies. With the Trump administration’s express concerns about China’s plan to dominate high tech, including AI, by 2025, and with Congressional efforts at targeted AI legislation stalled in various committees, any substantive federal action toward regulating AI appears to be a long way off. That should be good news to many in the US tech industry who have long resisted efforts to regulate AI technologies and the AI industry.

California Jury to Decide if Facebook’s Deep Learning Facial Recognition Creates Regulated Biometric Information

Following a recent decision issued by Judge James Donato of the U.S. District Court for the Northern District of California, a jury to be convened in San Francisco in July will decide whether a Facebook artificial intelligence technology creates regulated “biometric information” under Illinois’ Biometric Information Privacy Act (BIPA).  In some respects, the jury’s decision could reflect general sentiment toward AI during a time when vocal opponents of AI have been widely covered in the media.  The outcome could also affect how US companies, already impacted by Europe’s General Data Protection Regulation (GDPR), view their use of AI technologies to collect and process user-supplied data. For lawyers, the case could highlight effective litigation tactics in highly complex AI cases where black box algorithms are often unexplainable and lack transparency, even to their own developers.

What’s At Stake? What Does BIPA Cover?

Uniquely personal biometric identifiers, such as a person’s face and fingerprints, are often seen as needing heightened protection from hackers because, unlike a stolen password that one can reset, a person cannot change their face or fingerprints if someone makes off with digital versions and uses them to steal the person’s identity or gain access to the person’s biometrically protected accounts, devices, and secure locations. The now 10-year-old BIPA (740 ILCS 14/1 (2008)) was enacted to ensure users are made aware of instances when their biometric information is being collected, stored, and used, and to give users the option to opt out. The law imposes requirements on companies and penalties for non-compliance, including liquidated and actual damages. At issue here, the law addresses “a scan” of a person’s “face geometry,” though it falls short of explicitly defining those terms.

Facebook users voluntarily upload to their Facebook accounts digital images depicting them, their friends, and/or family members. Some of those images are automatically processed by an AI technology to identify the people in the images. Plaintiffs–here, putative class action individuals–argue that Facebook’s facial recognition feature involves a “scan” of a person’s “face geometry” such that it collects and stores biometric data in violation of BIPA.

Summary of the Court’s Recent Decision

In denying the parties’ cross-motions for summary judgment and allowing the case to go to trial, Judge Donato found that the Plaintiffs and Facebook “offer[ed] strongly conflicting interpretations of how the [Facebook] software processes human faces.” See In Re Facebook Biometric Information Privacy Litigation, slip op. (Dkt. 302), No. 3:15-cv-03747-JD (N.D. Cal. May 14, 2018). The Plaintiffs, he wrote, argued that “the technology necessarily collects scans of face geometry because it uses human facial regions to process, characterize, and ultimately recognize face images.” On the other hand, “Facebook…says the technology has no express dependency on human facial features at all.”

Addressing Facebook’s interpretation of BIPA, Judge Donato considered the threshold question of what BIPA’s drafters meant by a “scan” in “scan of face geometry.” He rejected Facebook’s suggestion that BIPA requires an express measurement of human facial features, such as “a measurement of the distance between a person’s eyes, nose, and ears.” In doing so, he relied on extrinsic evidence in the form of dictionary definitions, specifically Merriam-Webster’s 11th edition, for the ordinary meaning of “to scan” (i.e., to “examine” by “observation or checking,” or “systematically . . . in order to obtain data especially for display or storage”) and “geometry” (which in everyday use means simply a “configuration,” which in turn denotes a “relative arrangement of parts or elements”). “[N]one of these definitions,” the Judge concluded, “demands actual or express measurements of spatial quantities like distance, depth, or angles.”

The Jury Could Face a Complex AI Issue

Digital images contain a numerical representation of what is shown in the image, specifically the color (or grayscale), transparency, and other information associated with each pixel of the image. An application running on a computer can render the image on a display device by reading the file data to identify what color or grayscale level each pixel should display. When one scans a physical image or takes a digital photo with a smartphone, they are systematically generating this pixel-level data. Digital image data may be saved to a file having a particular format designated by a file extension (e.g., .GIF, .JPG, .PNG, etc.).
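
As a simple illustration of that pixel-level data, the short sketch below (assuming a hypothetical file named photo.jpg) loads a digital image and inspects its numerical representation.

```python
# Illustrative only: read a digital image file and inspect its pixel data.
from PIL import Image
import numpy as np

img = Image.open("photo.jpg")   # e.g., a .JPG or .PNG file
pixels = np.array(img)          # numerical array of the image data

print(pixels.shape)   # (height, width, 3) for an RGB image
print(pixels[0, 0])   # red, green, blue values of the top-left pixel
```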

A deep convolutional neural network, a type of AI, can be used to further process a digital image file’s data to extract features from the data. In a way, the network replicates the cognitive process a human follows when examining a photograph. For instance, when we examine a face in a photo, we take note of features and attributes, such as the shape and contours of the nose and lips, as well as eye color and hair. Those and other features may help us recall from memory whose face we are looking at, even if we have never seen that particular image before.
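
A toy sketch of that feature-extraction step appears below, written in PyTorch purely for illustration; it is not a description of Facebook’s system, just a minimal example of convolutional layers reducing a face image’s pixel array to a compact vector of learned features.

```python
# Toy convolutional feature extractor (illustrative; not any production system).
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # detect low-level patterns (edges)
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # combine them into higher-level features
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                      # summarize each feature map
    nn.Flatten(),                                 # yields a 32-number feature vector
)

image = torch.randn(1, 3, 160, 160)   # stand-in for a 160x160 RGB face crop
features = feature_extractor(image)
print(features.shape)                 # torch.Size([1, 32])
```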

A deep neural network, once it is fully trained on many different face images, works in essentially the same manner. After processing image file data to extract and “recognize” features, the network uses the features to classify the image by associating it with an identity, assuming it has “seen” the face before (in which case it may compare the extracted features to a template image of the face, or preferably to several images of the face). Thus, just as a digital image file contains a numerical representation of what is shown in the image, a deep neural network creates a numerical representation of the features shown in the digital image in order to perform classification. A question for the jury, then, may be whether the processing of uploaded digital images using a deep convolutional neural network involves “a scan” of “a person’s face geometry.” That question will challenge the parties and their lawyers to help the jury understand digital image files and the nuances of AI technology.
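
Continuing the toy example above, the classification step can be sketched as comparing a newly extracted feature vector against stored “template” vectors, here using cosine similarity on made-up numbers; again, this illustrates the general technique, not Facebook’s implementation.

```python
# Illustrative matching step: compare a new feature vector to stored templates.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

templates = {                   # hypothetical stored feature vectors
    "alice": np.random.rand(32),
    "bob": np.random.rand(32),
}
new_face = np.random.rand(32)   # features extracted from an uploaded image

best_match = max(templates, key=lambda name: cosine_similarity(templates[name], new_face))
print("Closest template:", best_match)
```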

For Litigators, How to Tackle AI and Potential AI Bias?

The particulars of advanced AI have not been central to a major federal jury case to date.  Thus, the Facebook case offers an opportunity to evaluate a jury’s reaction to a particular AI technology.

In its summary judgment brief, Facebook submitted expert testimony that its AI “learned for itself what features of an image’s pixel values are most useful for the purposes of characterizing and distinguishing images of human faces” and it “combines and weights different combinations of different aspects of the entire face image’s pixel value.” This description did not persuade Judge Donato to conclude that an AI with “learning” capabilities escapes BIPA’s reach, at least not as a matter of law.  Whether it will be persuasive to a jury is an open question.

It is possible some potential jurors may have preconceived notions about AI, given the hype surrounding use cases for the technology.  Indeed, outside the courthouse, AI’s potential dark side and adverse impacts on society have been widely reported. Computer vision-enabled attack drones, military AI systems, jobs being taken over by AI-powered robots, algorithmic harm due to machine learning bias, and artificial general intelligence (AGI) taking over the world appear regularly in the media.  If bias for and against AI is not properly managed, the jury’s final decision might be viewed by some as a referendum on AI.

For litigators handling AI cases in the future, the outcome of the Facebook case could provide a roadmap for effective trial strategies involving highly complex AI systems that defy simple description. That is not to say that the outcome will create a new paradigm for litigating tech. After all, many trials involve technical experts who try to explain complex technologies in a way that resonates with a jury; complex technology is often central to disputes involving intellectual property, medical malpractice, finance, and other areas. But those cases usually don’t involve technologies that “learn” for themselves.

How Will the Outcome Affect User Data Collection?

The public is becoming more aware that tech companies entice users to their platforms and apps as a way to generate user-supplied data. While the Facebook case itself may not usher in a wave of new laws and regulations, or even self-policing by the tech industry aimed at curtailing user data collection, a sizeable damages award from the jury could have a measured chilling effect. In response, some companies may become more transparent about their data collection and provide improved notice and opt-out mechanisms.

In Your Face Artificial Intelligence: Regulating the Collection and Use of Face Data (Part I)

Of all the personal information individuals agree to provide companies when they interact with online or app services, perhaps none is more personal and intimate than a person’s facial features and their moment-by-moment emotional states. And while it may seem that face detection, face recognition, and affect analysis (emotional assessments based on facial features) are technologies only sophisticated and well-intentioned tech companies with armies of data scientists and stack engineers are competent to use, the reality is that advances in machine learning, microprocessor technology, and the availability of large datasets containing face data have lowered entrance barriers to conducting robust face detection, face recognition, and affect analysis to levels never seen before.

In fact, anyone with a bit of programming knowledge can incorporate open-source algorithms and publicly available image data, train a model, create an app, and start collecting face data from app users. At the most basic entry point, all one really needs is a video camera with built-in face detection algorithms and access to tagged images of a person to start conducting facial recognition. And several commercial APIs make it relatively easy to tap into facial coding databases for use in assessing others’ emotional states from face data. If you’re not persuaded by the relative ease with which face data can be captured and used, just drop by any college (or high school) hackathon and see creative face data tech in action.
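
As a rough illustration of how little code a hobbyist needs, the sketch below uses one widely available open-source package, the face_recognition library, with placeholder image filenames, to compare a face in a new photo against a tagged photo.

```python
# Sketch of hobbyist-level face recognition with an open-source package
# (pip install face_recognition). Filenames are placeholders.
import face_recognition

known_image = face_recognition.load_image_file("tagged_photo_of_alice.jpg")
unknown_image = face_recognition.load_image_file("new_photo.jpg")

known_encoding = face_recognition.face_encodings(known_image)[0]
unknown_encodings = face_recognition.face_encodings(unknown_image)

for encoding in unknown_encodings:
    match = face_recognition.compare_faces([known_encoding], encoding)[0]
    print("Is this the tagged person?", match)
```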

In this post, the uses of face data are considered, along with a brief summary of the concerns raised about collecting and using face and emotional data. Part II will explore options for face data governance, which include the possibility of new or stronger laws and regulations and policies that a self-regulating industry and individual stakeholders could develop.

The many uses of our faces

Today’s mobile and fixed cameras and AI-based face detection and recognition software enable real-time controlled access to facilities and devices. The same technology allows users to identify fugitive and missing persons in surveillance videos, private citizens interacting with police, and unknown persons of interest in online images.

The technology provides a means for conducting and verifying commercial transactions using face biometric information, tracking people automatically while in public view, and extracting physical traits from images and videos to supplement individual demographic, psychographic, and behavioristic profiles.

Face software and facial coding techniques and models are also making it easier for market researchers, educators, robot developers, and autonomous vehicle safety designers to assess emotional states of people in human-machine interactions.

These and other use cases are possible in part because of advances in camera technology, the proliferation of cameras (think smart phones, CCTVs, traffic cameras, laptop cameras, etc.) and social media platforms, where millions of images and videos are created and uploaded by users every day. Increased computer processing power has led to advances in face recognition and affect-based machine learning research and improved the ability of complex models to execute faster. As a result, face data is relatively easy to collect, process, and use.

One can easily imagine the many ways face data might be abused, and some of those abuses have already been reported. Face data and machine learning models have been improperly used to create pornography, for example, and to track individuals in stores and other public locations without notice and without seeking permission. Models based on face data have reportedly been developed for no apparent purpose other than the predictive classification of beauty and sexual orientation.

Face recognition models are also subject to errors. Misidentification, for example, is a weakness of face recognition and affect-based models: despite improvements, face recognition is not perfect, and its errors can translate into false positive identifications. Obviously, tragic consequences can occur if police or government agencies make decisions based on a false positive (or false negative) identification of a person.

Face data models have been shown to perform more accurately on persons with lighter skin color. And affect models, while raising fewer concerns compared to face recognition due mainly to the slower rate of adoption of the technology, may misinterpret emotions if culture, geography, gender, and other factors are not accounted for in training data.
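
To make those error concepts concrete, the toy calculation below uses made-up results to compute a false positive rate separately for two hypothetical demographic groups; a large gap between groups is the kind of disparity researchers have reported.

```python
# Toy per-group false positive rates from hypothetical recognition results.
# Each record: (group, model_said_match, actually_a_match)
results = [
    ("group_a", True, False), ("group_a", False, False), ("group_a", True, True),
    ("group_a", False, False), ("group_b", True, False), ("group_b", True, False),
    ("group_b", True, True), ("group_b", False, False),
]

for group in ("group_a", "group_b"):
    non_matches = [r for r in results if r[0] == group and not r[2]]
    false_positives = [r for r in non_matches if r[1]]
    rate = len(false_positives) / len(non_matches)
    print(f"{group}: false positive rate = {rate:.0%}")
```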

Of course, instances of reported abuse, bias, and data breaches overshadow the many unreported positive uses and machine learning applications of face data. But as is often the case, problems tend to catch the eyes of policymakers, regulators, and legislators, though overreaction to hyped problems can result in a patchwork of regulations and standards that go beyond addressing the underlying concerns and cause unintended effects, such as possibly stifling innovation and reducing competitiveness.

Moreover, reactionary regulation doesn’t play well with fast-moving disruptive tech, such as face recognition and affective computing, where the law always seems to be in catch-up mode. Compounding the governance problem, regulators and legislators cannot read a crystal ball: future uses of face data technologies may be hard to imagine today.

Even so, what matters to many is what governments and companies are doing with still images and videos, and specifically how face data extracted from media are being used, sometimes without consent. These concerns raise questions of transparency, privacy laws, terms of service and privacy policy agreements, data ownership, ethics, and data breaches, among others. They also implicate issues of whether and when federal and state governments should tighten existing regulations and impose new regulations where gaps exist in face data governance.

With recent data breaches making headlines and policymakers and stakeholders gathering in 2018 to examine AI’s impacts, there is no better time than now to revisit the need for stronger laws and to develop new technical- and ethical-based standards and guidelines applicable to face data. The next post will explore these issues.

Regulating Artificial Intelligence Technologies by Consensus

As artificial intelligence technologies continue to transform industries, several prominent voices in the technology community are calling for regulating AI to get ahead of what they see as AI’s actual and potential social and economic impacts. These calls for action follow reports of machine learning classification bias, instances of open source AI tools being misused, lack of transparency in AI algorithms, privacy and data security issues, and forecasts of workforce impacts as AI technologies spread.

Those advocating for strong state or federal legislative action around AI, however, may be disappointed by the rate at which policymakers in the US are tackling sensitive issues. But they may be even more disappointed by recent legislative efforts suggesting that AI technologies will not be regulated in the traditional sense, but instead may be governed through a process of consensus building without targeted and enforceable standards. This form of technological governance–often called “soft law”–is not new. In some industries, soft law governance has evolved and taken over the more traditional command and control “hard law” governance approach.

Certain transformative technologies like AI evolve faster than policymakers’ ability to keep up, and as a result, at least in the US, AI’s future may not be tied to traditional legislative lawmaking, notice-and-comment rulemaking, and regulation by multiple government agencies whose missions include overseeing specific industry activities. According to those who have studied this trend, the hard law approach is gradually dying when it comes to certain tech, with the exception of technologies in highly regulated segments such as autonomous vehicles (e.g., safety regulations) and fintech (e.g., regulatory oversight of distributed ledger tech and cryptocurrencies). Instead, an industry-led, self-regulatory, multistakeholder process is emerging whereby participants, including government policymakers, come up with consensus-based standards and processes that form a framework for regulating industry activities.

This process is already apparent when it comes to AI. Organizations like the IEEE have produced consensus-style standards for ethical considerations in the design and development of AI systems, and private companies are publishing their views on how they and others can self-regulate their activities, products, and services in the AI space. That is not to say that policymakers will play no role in the governance of AI. The US Congress and New York City, for example, are considering or in the process of implementing multistakeholder task forces for tackling the future of AI, workforce and education issues, and harms caused by machine learning algorithms.

A multistakeholder approach to regulating AI technologies is less likely to stifle innovation and competitiveness than a hard law prescriptive approach, which could involve numerous regulatory requirements, inflexible standards, and civil penalties for violations. But some view hard law governance as providing a measure of predictability that consensus approaches cannot duplicate. If multistakeholder governance is in AI’s future, stakeholders will need to develop and adopt meaningful standards, and the industry will need to demonstrate a willingness to be held accountable in ways that go beyond simply appeasing vocal opponents and assuaging negative public sentiment toward AI. If they don’t, legislators may feel pressure to take a more hard law tack with AI technologies.

Industry Focus: The Rise of Data-Driven Health Tech Innovation

Artificial intelligence-based healthcare technologies have contributed to improved drug discoveries, tumor identification, diagnosis, risk assessments, electronic health records (EHR), and mental health tools, among others. Thanks in large part to AI and the availability of health-related data, health tech is one of the fastest growing segments of healthcare and one of the reasons why the sector ranks highest on many lists.

According to a 2016 workforce study by Georgetown University, the healthcare industry experienced the largest employment growth among all industries since December 2007, netting 2.3 million jobs (about an 8% increase). Fourteen percent of all US workers work in healthcare, making it the country’s largest source of employment. According to the latest government figures, the US spends more on healthcare per person ($10,348) than any other country. In fact, healthcare spending is nearly 18 percent of US gross domestic product (GDP), a figure that is expected to increase. The healthcare IT segment is expected to grow at a CAGR greater than 10% through 2019. And the number of US patents issued in 2017 for AI-infused healthcare-related inventions rose more than 40% compared to 2016.

Investment in health tech has led to the development of some impressive AI-based tools. Researchers at a major university medical center, for example, invented a way to use AI to identify, from open source data, the emergence of health-related events around the world. The machine learning system they created extracted useful information and classified it according to disease-specific taxonomies. At the time of its development ten years ago, its “supervised” and “unsupervised” natural language processing models were leaps ahead of what others were using, and the work earned the inventors national recognition. More recently, medical researchers have created myriad new technologies from innovative uses of machine learning.

What most of the above and other health tech innovations today have in common is what drives much of the health tech sector: lots of data. Big data sets, especially labeled data, are needed by AI technologists to train and test machine learning algorithms that produce models capable of “learning” what to look for in new data. And there is no better place to find big data sets than in the healthcare sector. According to an article last year in the New England Journal of Medicine, by 2012 as much as 30% of the world’s stored data was being generated in the healthcare industry.

Traditional healthcare companies are finding value in data-driven AI. Biopharmaceutical company Roche’s recent announcement that it is acquiring software firm Flatiron Health Inc. for $1.9 billion illustrates the value of being able to access health-related data. Flatiron, led by former Google employees, makes software for real-time acquisition and analysis of oncology-specific EHR data and other structured and unstructured hospital-generated data for diagnostic and research purposes. Roche plans to leverage Flatiron’s algorithms–and all of its data–to enhance Roche’s ability to personalize healthcare strategies by way of accelerating the development of new cancer treatments. In a world powered by AI, where data is key to building new products that attract new customers, Roche is now tapped into one of the largest sources of labeled data.

Companies not traditionally in healthcare are also seeing opportunities in health-related data. Google’s AI-focused research division, for example, recently reported in Nature a promising use of so-called deep learning algorithms (computational networks loosely structured to mimic how neurons fire in the brain) to make cardiovascular risk predictions from retinal image data. After training their model, Google scientists said they were able to identify and quantify risk factors in retinal images and generate patient-specific risk predictions.

The growth of available healthcare data and the infusion of AI health tech into the healthcare industry will challenge companies to evolve. Health tech holds the promise of better and more efficient research, manufacturing, and distribution of healthcare products and services. At the same time, some have raised concerns about who will benefit most from these advances, bias in data sets, anonymizing data for privacy reasons, and other legal issues that go beyond healthcare, all of which will need to be addressed.

To be successful, tomorrow’s healthcare leaders may be those who have access to data that drives innovation in the health tech segment. This may explain why, according to a recent survey, healthcare CIOs whose companies plan spending increases in 2018 indicated that their investments will likely be directed first toward AI and related technologies.

“AI vs. Lawyers” – Interesting Result, Bad Headline

The recent clickbait headline “AI vs. Lawyers: The Ultimate Showdown” might lead some to believe that an artificial intelligence system and a lawyer were dueling adversaries or parties on opposite sides of a legal dispute (notwithstanding that an “intelligent” machine has not, as far as US jurisprudence is concerned, been recognized as having machine rights or standing in state or federal courts).

Follow the link, however, and you end up at LawGeex’s report titled “Comparing the Performance of Artificial Intelligence to Human Lawyers in the Review of Standard Business Contracts.” The 37-page report details a straightforward, but still impressive, comparison of the accuracy of machine learning models and lawyers in the course of performing a common legal task.

Specifically, LawGeex set out to consider, in what they call a “landmark” study, whether an AI-based model or skilled lawyers are better at issue spotting while reviewing Non-Disclosure Agreements (NDAs).

Issue spotting is a task that paralegals, associate attorneys, and partners at law firms and corporate legal departments regularly perform. It’s a skill learned early in one’s legal career and involves applying knowledge of legal concepts and issues to identify, in textual materials such as contract documents or court opinions, specific and relevant facts, reasoning, conclusions, and applicable laws or legal principles of concern. Issue spotting in the context of contract review may simply involve locating a provision of interest, such as a definition of “confidentiality” or an arbitration requirement in the document.

Legal tech tools using machine learning algorithms have proliferated in the last couple of years. Many involve combinations of AI technologies and typically require processing thousands of documents (often “labeled” by category or type of document) to create a model that “learns” what to look for in the next document it processes. In the LawGeex study, for example, the model was trained on thousands of NDA documents. Following training, it processed five new NDAs selected by a team of advisors, while 20 experienced contract attorneys were given the same five documents and four hours to review them.
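
Before turning to the results, here is a minimal sketch of what a provision-spotting classifier can look like. It is purely illustrative, built from a handful of made-up clauses and a simple TF-IDF model, and is not a description of LawGeex’s actual system, which was trained on thousands of NDAs.

```python
# Minimal, illustrative clause classifier (not LawGeex's actual system).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set: clause text labeled by provision type.
clauses = [
    "Recipient shall keep all Confidential Information strictly confidential.",
    "This Agreement shall be governed by the laws of the State of New York.",
    "Any dispute arising hereunder shall be settled by binding arbitration.",
    "Confidential Information excludes information in the public domain.",
]
labels = ["confidentiality", "governing_law", "arbitration", "confidentiality"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(clauses, labels)

new_clause = "Any dispute arising under this Agreement shall be settled by arbitration."
print(model.predict([new_clause])[0])   # should print: arbitration
```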

The results were unsurprising: LawGeex’s trained model was able to spot provisions, from a pre-determined set of 30 provisions, at a reported accuracy of 94% compared to an average of 85% for the lawyers (the highest-performing lawyer, LawGeex noted, had an accuracy of 94%, equaling the software).

Notwithstanding the AI vs. lawyers headline, LawGeex’s test results raise the question of whether the task of legal issue spotting in NDA documents has been effectively automated (assuming a mid-nineties accuracy is acceptable). And do machine learning advances like these portend that other common tasks lawyers perform will someday be handled by intelligent machines?

Maybe. But no matter how sophisticated AI tech becomes, algorithms will still require human input. And algorithms are a long way from being able to handle a client’s sometimes complex objectives, unexpected tactics opposing lawyers might deploy in adversarial situations, common sense, and other inputs that factor into a lawyer’s context-based legal reasoning and analysis duties. No AI tech is currently able to handle all that. Not yet anyway.

A Proposed AI Task Force to Confront Talent Shortage and Workforce Changes

Just over a month after House and Senate commerce committees received companion bills recommending a federal task force to globally examine the “FUTURE” of Artificial Intelligence in the United States (H.R. 4625; introduced Dec. 12, 2017), a House education and workforce committee is set to consider a bill calling for a task force assessment of the impacts of AI technologies on the US workforce.

If enacted, the “Artificial Intelligence Job Opportunities and Background Summary Act of 2018,” or the “AI JOBS Act of 2018” (H.R. 4829; introduced Jan. 18, 2018), would require the Secretary of Labor to report on impacts and growth of AI, industries and workers who may be most impacted by AI, expertise and education needed in an AI economy (compared to today), an identification of workers who will experience expanded career opportunities from AI and those who may be vulnerable to career displacement, and ways to alleviate workforce displacement and prepare a future AI workforce.

Assessing these issues now is critical. Former Senator Tom Daschle and David Beier, in a recent opinion piece published in The Hill, see a “dramatic set of changes” in the nature of work in America as AI technologies become more entrenched in the US economy. Citing a McKinsey Global Institute study of 800 occupations, Daschle and Beier conclude that AI technologies will not cause net job losses. Rather, job losses will likely be offset by job changes and gains in fields such as healthcare, infrastructure development, energy, and fields that do not exist today. They cite Gartner Research estimates suggesting millions of new jobs will be created directly or indirectly as a result of the AI economy.

Already there are more AI-related jobs than high-skilled workers to fill them. One popular professional networking site currently lists over 6,000 “artificial intelligence” jobs. Chinese internet giant Tencent estimates there are only 300,000 AI experts worldwide (recent estimates by Toronto-based Element AI put that figure at merely 90,000). In testimony this week before a House Information Technology subcommittee, Intel’s CTO Amir Khosrowshahi said, “Workers need to have the right skills to create AI technologies and right now we have too few workers to do the job.” Huge salaries for newly minted computer science PhDs will draw more people to the field, but job openings are likely to outpace available talent even as record numbers of students enroll in machine learning and related AI classes at top US universities.

If AI job gains shift workers disproportionately toward high-skilled jobs, the result may be continued job opportunity inequality. A 2016 study by Georgetown University’s Center on Education and the Workforce found that “out of the 11.6 million jobs created in the post-recession economy, 11.5 million went to workers with at least some college education.” The study authors found that, since 2008, graduate degree workers had the most job gains (83%), predominantly in high-skill occupations, and college graduates saw the next highest job gains (57%), also in high-skill jobs. The highest job growth was seen in management, healthcare, and computer and mathematical sciences. These same fields are prime for a future influx of highly-skilled AI workers.

The US is not alone in raising concerns about job and workforce changes in an AI economy. The UK Parliament’s Artificial Intelligence Committee, for example, is confronting challenges in re-educating UK’s workforce to improve skills needed to work alongside AI systems. The US may need to do more to catch up, according to Mr. Khosrowshahi. “Current federal funding levels [in tech education],” he argued, “are not keeping pace with the rest of the industrialized world.”

The AI JOBS Act of 2018 presents an opportunity for US policymakers to develop novel approaches to address expected workforce shifts caused by an AI economy. If nothing is done, the US could find itself at a competitive disadvantage with increasing economic inequality.

New York City Task Force to Consider Algorithmic Harm

One might hear discussions about backpropagation, activation functions, and gradient descent when visiting an artificial intelligence company. But more recently, terms like bias and harm associated with AI models and products have entered tech’s vernacular. These issues also have the attention of many outside of the tech world following reports of AI systems performing better for some users than for others when making life-altering decisions about prison sentences, creditworthiness, and job hiring, among others.

Considering the recent number of accepted conference papers about algorithmic bias, AI technologists, ethicists, and lawyers seem to be proactively addressing the issue by sharing various technical and other solutions with each other. At the same time, at least one legislative body, the New York City Council, has decided to explore ways to regulate AI technology with an unstated goal of rooting out bias (or at least revealing its presence) by making AI systems more transparent.

New York City’s “Automated decision systems used by agencies” law (NYC Local Law No. 49 of 2018, effective January 11, 2018) creates a task force under the aegis of Mayor de Blasio’s office. The task force will convene no later than early May 2018 for the purpose of identifying automated decision systems used by New York City government agencies, developing procedures for identifying and remedying harm, developing a process for public review, and assessing the feasibility of archiving automated decision systems and relevant data.

The law defines an “automated decision system” as:

“computerized implementations of algorithms, including those derived from machine learning or other data processing or artificial intelligence techniques, which are used to make or assist in making decisions.”

The law defines an “agency automated decision system” as:

“an automated decision system used by an agency to make or assist in making decisions concerning rules, policies or actions implemented that impact the public.”

While the law does not specifically call out bias, the source of algorithmic unfairness and harm can be traced in large part to biases in the data used to train algorithmic models. Data can be inherently biased when it reflects the implicit values of a limited number of people involved in its collection and labelling, or when the data chosen for a project does not represent a full cross-section of society (which is partly the result of copyright and other restrictions on access to proprietary data sets, and the ease of access to older or limited data sets where groups of people may be unrepresented or underrepresented). A machine algorithm trained on this data will “learn” the biases, and can perpetuate bias when it is asked to make decisions.
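
As a toy illustration of that point, the sketch below uses synthetic data (not any agency’s system) to train a model on a dataset in which one group is barely represented and follows a different labeling pattern than the majority; measuring accuracy separately for each group shows how the underrepresented group fares worse.

```python
# Synthetic demonstration of how skewed training data yields uneven performance.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, flip):
    """One numeric feature; the label rule is reversed for the minority group."""
    x = rng.normal(size=(n, 1))
    y = (x[:, 0] > 0).astype(int)
    return x, (1 - y if flip else y)

x_major, y_major = make_group(500, flip=False)  # well-represented group
x_minor, y_minor = make_group(20, flip=True)    # underrepresented group

model = LogisticRegression()
model.fit(np.vstack([x_major, x_minor]), np.concatenate([y_major, y_minor]))

for name, x, y in [("majority group", x_major, y_major),
                   ("minority group", x_minor, y_minor)]:
    print(f"{name}: accuracy = {model.score(x, y):.0%}")  # minority scores far lower
```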

Some argue that making algorithmic black boxes more transparent is key to understanding whether an algorithm is perpetuating bias. The New York City task force could recommend that software companies that provide automated decision systems to New York City agencies make their systems transparent by disclosing details about their models (including source code) and producing the data used to create their models.

Several stakeholders have already expressed concerns about disclosing algorithms and data to regulators. What local agency, for example, would have the resources to evaluate complex AI software systems? And how will source code and data, which may embody trade secrets and include personal information, be safeguarded from inadvertent public disclosure? And what recourse will model developers have before agencies turn over algorithms (and the underlying source code and data) in response to Freedom of Information requests and court-issued subpoenas?

Others have expressed concerns that regulating at the local level may lead to disparate and varying standards and requirements, placing a huge burden on companies. For example, New York City may impose standards different from those imposed by other local governments. Already, companies are having to deal with different state regulations governing AI-infused autonomous vehicles, and will soon have to contend with European Union regulations concerning algorithmic data (GDPR Art. 22; effective May 2018) that may be different than those imposed locally.

Before its work is done, New York City’s task force will likely hear from many stakeholders, each with their own special interests. In the end, the task force’s recommendations, especially those on how to remedy harm, will receive careful scrutiny, not just from local stakeholders but also from policymakers far removed from New York City, because as AI technology’s impacts on society grow, so too will the pressure to regulate AI systems on a national basis.

Information and/or references used for this post came from the following:

NYC Local Law No. 49 of 2018 (available here) and various hearing transcripts

Letter to Mayor Bill de Blasio, Jan. 22, 2018, from AI Now and others (available here)

EU General Data Protection Regulations (GDPR), Art. 22 (“Automated Individual Decision-Making, Including Profiling”), effective May 2018.

Dixon et al., “Measuring and Mitigating Unintended Bias in Text Classification”; AAAI 2018 (accepted paper).

W. Wallach and G. Marchant, “An Agile Ethical/Legal Model for the International and National Governance of AI and Robotics”; AAAI 2018 (accepted paper).

D. Tobey, “Software Malpractice in the Age of AI: A Guide for the Wary Tech Company”; AAAI 2018 (accepted paper).

Recognizing Individual Rights: A Step Toward Regulating Artificial Intelligence Technologies

In the movie Marjorie | Prime (August 2017), Jon Hamm plays an artificial intelligence version of Marjorie’s deceased husband, visible to Marjorie as a holographic projection in her beachfront home. As Marjorie (played by Lois Smith) interacts with Hamm’s Prime through a series of one-on-one conversations, the AI improves its cognition by observing and processing Marjorie’s emotional expressions, movements, and speech. The AI also learns from interactions with Marjorie’s son-in-law (Tim Robbins) and daughter (Geena Davis) as they recount highly personal and painful episodes of their lives. Through these interactions, Prime ends up possessing a collective knowledge greater, more personal, and more intimate than Marjorie’s original husband ever had.

Although not directly explored in the movie’s arc, the futuristic story touches on an important present-day debate about the fate of private personal data being uploaded to commercial and government AI systems, data that theoretically could persist in a memory device long after the end of the human lives from which the data originated, for as long as its owner chooses to keep it. It also raises questions about the fate of knowledge collected by other technologies perceiving other people’s lives, and to what extent these percepts, combined with people’s demographic, psychographic, and behavioristic characteristics, would be used to create sharply detailed personality profiles that companies and governments might abuse.

These are not entirely hypothetical issues to be addressed years down the road. Companies today provide the ability to create digital doppelgangers, or human digital twins, using AI technologies. And collecting personal information from people on a daily basis as they interact with digital assistants and other connected devices is not new. But as Marjorie | Prime and several non-cinematic AI technologies available today illustrate, AI systems allow the companies who build them unprecedented means for receiving, processing, storing, and taking actions based on some of the most personal information about people, including information about their present, past, and trending or future emotional states, which marketers for years have been suggesting are the keys to optimizing advertising content.

Congress recently acknowledged that “AI technologies are rapidly evolving in capability and application throughout society,” but the US currently has no federal policy towards AI and no part of the federal government has ownership of the advancement of AI technologies. Left unchecked in an unregulated market, as is largely the case today, AI technological advancements may trend in a direction that may be inconsistent with collective values and goals.

Identifying individual rights

One of the first questions those tasked with developing laws, regulations, and policies directed toward AI should ask is: what are the basic individual rights, arising in the course of people interacting with AI technologies, that should be recognized? Answering that question will be key to ensuring that enacted laws and promulgated regulations achieve one of Congress’s recently stated goals: ensuring AI technologies benefit society. Answering it now will also help ensure that policymakers have the necessary foundation in front of them and are not unduly swayed by influential stakeholders when they take up the task of deciding how and when to regulate AI technologies.

Identifying individual rights leads to their recognition, which leads to basic legal protections, whether in the form of legislation or regulation, or, initially, as common law from judges deciding if and how to remedy a harm to a person or property caused by an AI system. Fortunately, identifying individual rights is not a formidable task. The belief that people have a right to be let alone in their private lives, for example, established the basic premise for privacy laws in the US. Those same concerns about intrusion into personal lives ought to be among the first considerations for those tasked with formulating and developing AI legislation and regulations. The notion that people have a right to be let alone has also led to the identification of other individual rights that could protect people in their interactions with AI systems. These include the right of transparency and explanation; the right of audit (with the objective of revealing bias, discrimination, and content filtering, and thus maintaining accountability); the right to know when one is dealing with an AI system and not a human; and the right to be forgotten (that is, mandatory deletion of one’s personal data), among others.

Addressing individual rights, however, may not persuade everyone to trust AI systems, especially when AI creators cannot explain precisely the basis for certain actions taken by trained AI technologies. People want to trust that owners and developers of AI systems that use private personal data will employ the best safeguards to protect that data. Trust, but verify, may need to play a role in policy-making efforts even if policies appear to comprehensively address individual rights. Trust might be addressed by imposing specific reporting and disclosure requirements, such as those suggested by federal lawmakers in pending federal autonomous driving legislation.

In the end, however, laws and regulations developed with privacy and other individual rights in mind, that address data security and other concerns people have about trusting their data to AI companies, will invariably include gaps, omissions, and incomplete definitions. The result may be unregulated commercial AI systems, and AI businesses finding workarounds. In such instances, people may have limited options other than to fully opt out, or accept that individual AI technology developers’ work was motivated by ethical considerations and a desire to make something that benefits society. The pressure within many tech companies and startups to push new products out to the world every day, however, could make prioritizing ethical considerations a challenge. Many organizations focused on AI technologies, some of which are listed below, are working to make sure that doesn’t happen.

Rights, trust, and ethical considerations in commercial endeavors can get overshadowed by financial interests and the subjective interests and tastes of individuals. It doesn’t help that companies and policymakers may also feel that winning the race for AI dominance is a factor to be considered (which is not to say that such a consideration is antithetical to protecting individual rights). This underscores the need for thoughtful analysis, sooner rather than later, of the need for laws and regulations directed toward AI technologies.

To learn more about some of these issues, visit the websites of the following organizations, who are active in AI policy research: Access Now, AI Now, and Future of Life.