Will “Leaky” Machine Learning Usher in a New Wave of Lawsuits?

A computer science professor at Cornell University has a new twist on Marc Andreessen’s 2011 pronouncement that software is “eating the world.”  According to Vitaly Shmatikov, it is “machine learning [that] is eating the world” today.  His personification is clear: machine learning and other applications of artificial intelligence are disrupting society at a rate that shows little sign of leveling off.  With increasing numbers of companies and individual developers producing customer-facing AI systems, it seems all but inevitable that some of those systems will create unintended and unforeseen consequences, including harm to individuals and society at large.  Researchers like Shmatikov and his colleagues are starting to reveal those consequences, including one–“leaky” machine learning models–that could have serious legal implications.

This post explores the causes of action that might be asserted against a developer who publishes a leaky machine learning model, whether directly or via a machine learning as a service (MLaaS) cloud platform, along with possible defenses, using the lessons of cybersecurity litigation as a jumping-off point.

Over the last decade or more, the plaintiffs bar and the defense bar have contributed to a body of case law now commonly referred to as cybersecurity law.  This was inevitable, given the estimated 8,000 data breaches involving 11 billion data records made public since 2005. After some well-publicized breaches, lawsuits against companies that reported data thefts began appearing more frequently on court dockets across the country.  Law firms responded by marketing “cybersecurity” practice groups whose attorneys advised clients about managing risks associated with data security and the aftermath of data exfiltrations by cybercriminals.  Today, with an estimated 70 percent of all data generated by individuals (often related to those individuals’ activities), and with organizations globally expected to lose over 146 billion more data records between 2018 and 2023 if current cybersecurity tools are not improved (Juniper Research), the number of cybersecurity lawsuits is not expected to level off anytime soon.

While data exfiltration lawsuits may be the most prevalent type of cybersecurity lawsuit today, the plaintiffs bar has begun targeting other cyber issues, such as ransomware attacks, especially those affecting healthcare facilities (in ransomware cases, malicious software freezes an organization’s computer systems until a ransom is paid; while frozen, a business may not be able to effectively deliver critical services to customers).  The same litigators who have expanded into ransomware may soon turn their attention to a new kind of cyber-like “breach”: the so-called leaky machine learning models built on thousands of personal data records.

In their research, sponsored in part by the National Science Foundation (NSF) and Google, Shmatikov and his colleagues in early 2017 “uncovered multiple privacy and integrity problems in today’s [machine learning] pipelines” that could be exploited by adversaries to infer whether a particular person’s data record was used to train a machine learning model.  See R. Shokri et al., Membership Inference Attacks Against Machine Learning Models, Proceedings of the 38th IEEE Symposium on Security and Privacy (2017). They describe a health care machine learning model that could reveal to an adversary whether or not a certain patient’s data record was part of the model’s training data.  In another example, a different model trained on location and other data, used to categorize mobile users based on their movement patterns, was found to reveal by way of query whether a particular user’s location data was used.

These scenarios certainly raise alarms from a privacy perspective, and one can imagine other instances of machine learning models revealing to an attacker the kind of personal information that might cause harm to individuals.  While actual user data may not be revealed in these attacks, the mere inference that a person’s data record was included in a data set used to train a model, what Shmatikov and previous researchers refer to as “membership inference,” could cause that person (and the thousands of others whose data records were used) embarrassment and other consequences.
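The intuition behind a membership inference attack can be sketched in a few lines. The example below is purely illustrative (the model, records, and threshold are hypothetical): it stands in for the common symptom such attacks exploit, namely that overfit models tend to return higher confidence scores for records they were trained on than for unseen records.

```python
# Minimal sketch of a confidence-threshold membership inference attack.
# The "model" here is a hypothetical stand-in that mimics overfitting:
# it is more confident on training members than on unseen records.

TRAINING_SET = {"alice", "bob"}  # hypothetical training membership

def model_confidence(record_id):
    # Stand-in for querying a deployed model's confidence score.
    return 0.97 if record_id in TRAINING_SET else 0.61

def infer_membership(record_id, threshold=0.9):
    # The attacker needs only query access to confidence scores,
    # not the underlying training data.
    return model_confidence(record_id) >= threshold

print(infer_membership("alice"))  # True  -> inferred training member
print(infer_membership("carol"))  # False -> inferred non-member
```

Real attacks of this kind are more elaborate (the Shokri et al. paper trains “shadow models” to learn the threshold automatically), but the legal concern is the same: the model’s outputs alone can reveal whose records were used.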

Assuming for the sake of argument that a membership inference disclosure of the kind described above becomes legally actionable, it is instructive to consider what businesses facing membership inference lawsuits might expect in terms of statutory and common law causes of action so they can take steps to mitigate problems and avoid contributing more cyber lawsuits to already busy court dockets (and of course avoid leaking confidential and private information).  These causes of action could include invasion of privacy, consumer protection laws, unfair trade practices, negligence, negligent misrepresentation, innocent misrepresentation, negligent omission, breach of warranty, and emotional distress, among others.  See, e.g., In re Sony Gaming Networks & Customer Data Security Breach Litigation, 996 F. Supp. 2d 942 (S.D. Cal. 2014) (evaluating data exfiltration causes of action).

Negligence might be alleged, as it often is in cybersecurity cases, if plaintiff (or class action members) can establish evidence of the following four elements: the existence of a legal duty; breach of that duty; causation; and cognizable injury.  Liability might arise where defendant failed to properly safeguard and protect private personal information from unauthorized access, use, and disclosure, where such use and disclosure caused actual money or property loss or the loss of a legally-protected interest in the confidentiality and privacy of plaintiff’s/members’ personal information.

Misrepresentation might be alleged if plaintiff/members can establish evidence of a misrepresentation upon which they relied and a pecuniary loss resulting from reliance on the actionable misrepresentation. Liability under such a claim could arise if, for example, plaintiff’s data record has monetary value and a company makes representations about its security and data protection measures in user agreements, terms of service, and/or privacy policies that turn out to be in error (for example, the company’s measures lack robustness and do not prevent an attack on a model that is found to be leaky).  In some cases, actual reliance on statements or omissions may need to be alleged.

State consumer protection laws might also be alleged if plaintiff/members can establish (depending on which state law applies) deceptive misrepresentations or omissions regarding the standard, quality, or grade of a particular good or service that causes harm, such as those that mislead plaintiff/members into believing that their personal private information would be safe upon transmission to defendant when defendant knew of vulnerabilities in its data security systems. Liability could arise where defendant was deceptive in omitting notice that its machine learning model could reveal to an attacker the fact that plaintiff’s/members’ data record was used to train the model. In certain situations, plaintiff/members might have to allege with particularity the specific time, place, and content of the misrepresentation or omission if the allegations are based in fraud.

For their part, defendants in membership inference cases might challenge plaintiff’s/members’ lawsuit on a number of fronts.  As an initial tactic, defendants might challenge plaintiff’s/members’ standing on the basis that they failed to establish an actual injury caused by the disclosure (inference) that a data record was used to train a machine learning model.  See In re Science App. Intern. Corp. Backup Tape Data, 45 F. Supp. 3d 14 (D.D.C. 2014) (considering “when, exactly, the loss or theft of something as abstract as data becomes a concrete injury”).

Defendants might also challenge plaintiff’s/members’ assertions that an injury is imminent or certainly impending.  In data breach cases, defendants might rely on state court decisions that denied standing where injury from a mere potential risk of future identity theft resulting from the loss of personal information was not recognized, which might also apply in a membership inference case.

Defendants might also question whether permission and/or consent was given by plaintiff/members for the collection, storage, and use of personal data records.  This query would likely involve plaintiff’s/members’ awareness and acceptance of membership risks when they allowed their data to be used to train a machine learning model.  Defendants would likely examine whether the permission/consent given extended to and was commensurate in scope with the uses of the data records by defendant or others.

Defendants might also consider applicable agreements related to a user’s data records that limited plaintiff’s/members’ choice of forum and which state laws apply, which could affect pleading and proof burdens.  Defendants might rely on language in terms of service and other agreements that provide notice of the possibility of external attacks and the risks of leaks and membership inference.  Many other challenges to a plaintiff’s/members’ allegations could also be explored.

Apart from challenging causes of action on the merits, companies should also consider taking other measures like those used by companies in traditional data exfiltration cases.  These might include proactively testing their systems (in the case of machine learning models, testing for leakage) and implementing procedures to provide notice of a leaky model.  As Shmatikov and his colleagues suggest, machine learning model developers and MLaaS providers should take into account the risk that their models will leak information about their training data, warn customers about this risk, and “provide more visibility into the model and the methods that can be used to reduce this leakage.”  Machine learning companies should account for foreseeable risks and associated consequences and assess whether they are acceptable compared to the benefits received from their models.

If data exfiltration, ransomware, and related cybersecurity litigation are any indication, the plaintiffs bar may one day turn its attention to the leaky machine learning problem.  If machine learning model developers and MLaaS providers want to avoid such attention and the possibility of litigation, they should not delay taking reasonable steps to mitigate the leaky machine learning model problem.

California Jury to Decide if Facebook’s Deep Learning Facial Recognition Creates Regulated Biometric Information

Following a recent decision issued by Judge James Donato of the U.S. District Court for the Northern District of California, a jury to be convened in San Francisco in July will decide whether a Facebook artificial intelligence technology creates regulated “biometric information” under Illinois’ Biometric Information Privacy Act (BIPA).  In some respects, the jury’s decision could reflect general sentiment toward AI during a time when vocal opponents of AI have been widely covered in the media.  The outcome could also affect how US companies, already impacted by Europe’s General Data Protection Regulation (GDPR), view their use of AI technologies to collect and process user-supplied data. For lawyers, the case could highlight effective litigation tactics in highly complex AI cases where black box algorithms are often unexplainable and lack transparency, even to their own developers.

What’s At Stake? What Does BIPA Cover?

Uniquely personal biometric identifiers, such as a person’s face and fingerprints, are often seen as needing heightened protection from hackers because, unlike a stolen password that one can reset, a person cannot change their face or fingerprints if someone makes off with digital versions and uses them to steal the person’s identity or gain access to the person’s biometrically-protected accounts, devices, and secure locations. The now 10-year-old BIPA (740 ILCS 14/1 (2008)) was enacted to ensure users are made aware of instances when their biometric information is being collected, stored, and used, and to give users the option to opt out. The law imposes requirements on companies and penalties for non-compliance, including liquidated and actual damages. At issue here, the law addresses “a scan” of a person’s “face geometry,” though it falls short of explicitly defining those terms.

Facebook users voluntarily upload to their Facebook accounts digital images depicting them, their friends, and/or family members. Some of those images are automatically processed by an AI technology to identify the people in the images. Plaintiffs–here, putative class action individuals–argue that Facebook’s facial recognition feature involves a “scan” of a person’s “face geometry” such that it collects and stores biometric data in violation of BIPA.

Summary of the Court’s Recent Decision

In denying the parties’ cross-motions for summary judgment and allowing the case to go to trial, Judge Donato found that the Plaintiffs and Facebook “offer[ed] strongly conflicting interpretations of how the [Facebook] software processes human faces.” See In Re Facebook Biometric Information Privacy Litigation, slip op. (Dkt. 302), No. 3:15-cv-03747-JD (N.D. Cal. May 14, 2018). The Plaintiffs, he wrote, argued that “the technology necessarily collects scans of face geometry because it uses human facial regions to process, characterize, and ultimately recognize face images.” On the other hand, “Facebook…says the technology has no express dependency on human facial features at all.”

Addressing Facebook’s interpretation of BIPA, Judge Donato considered the threshold question of what BIPA’s drafters meant by a “scan” in “scan of face geometry.” He rejected Facebook’s suggestion that BIPA relates to an express measurement of human facial features such as “a measurement of the distance between a person’s eyes, nose, and ears.” In doing so, he relied on extrinsic evidence in the form of dictionary definitions, specifically Merriam-Webster’s 11th, for an ordinary meaning of “to scan” (i.e., to “examine” by “observation or checking,” or “systematically . . . in order to obtain data especially for display or storage”) and “geometry” (which, in everyday use, means simply a “configuration,” which in turn denotes a “relative arrangement of parts or elements”).  “[N]one of these definitions,” the Judge concluded, “demands actual or express measurements of spatial quantities like distance, depth, or angles.”

The Jury Could Face a Complex AI Issue

Digital images contain a numerical representation of what is shown in the image, specifically the color (or grayscale), transparency, and other information associated with each pixel of the image. An application running on a computer can render the image on a display device by reading the file data to identify what color or grayscale level each pixel should display. When one scans a physical image or takes a digital photo with a smartphone, they are systematically generating this pixel-level data. Digital image data may be saved to a file having a particular format designated by a file extension (e.g., .GIF, .JPG, .PNG, etc.).
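The pixel-level representation described above can be illustrated with a toy example. The snippet below (hypothetical values, no particular file format) shows a tiny image stored as rows of (red, green, blue) tuples, which is essentially what a decoder recovers from a .PNG or .JPG file before rendering:

```python
# A digital image as pixel-level data: a 2x2 RGB image stored as rows
# of (red, green, blue) tuples, each channel ranging 0-255. Real file
# formats compress and wrap this data, but decoders ultimately recover
# per-pixel values like these for display.

image = [
    [(255, 0, 0), (0, 255, 0)],     # row 0: red pixel, green pixel
    [(0, 0, 255), (128, 128, 128)], # row 1: blue pixel, gray pixel
]

r, g, b = image[1][0]  # read the pixel at row 1, column 0
print(r, g, b)  # 0 0 255 -> a pure blue pixel
```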

A deep convolutional neural network–a type of AI–can be used to further process a digital image file’s data to extract features from the data. In a way, the network replicates the human cognitive process of manually examining a photograph. For instance, when we examine a face in a photo, we take note of features and attributes, like the shape and contours of the nose and lips, as well as eye color and hair. Those and other features may help us recall from memory whose face we are looking at even if we have never seen the image before.

A deep neural network, once it is fully trained using many different face images, essentially works in a similar manner. After processing image file data to extract and “recognize” features, the network uses the features to classify the image by associating it with an identity, assuming it has “seen” the face before (in which case it may compare the extracted features to a template image of the face, preferably several images of the face). Thus, a digital image file may contain a numerical representation of what is shown in the image, and a deep neural network creates a numerical representation of features shown in the digital image to perform classification.  A question for the jury, then, may involve deciding if the processing of uploaded digital images using a deep convolutional neural network involves “a scan” of “a person’s face geometry.” This question will challenge the parties and their lawyers to help the jury understand digital image files and the nuances of AI technology.
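The feature-extraction step at the heart of this dispute can be sketched in miniature. The example below (a hypothetical hand-written filter, not anything resembling Facebook’s system) slides a small filter over a grayscale image and records how strongly each region matches; in a trained network, filters like this are learned from data rather than written by hand:

```python
# Minimal sketch of convolution, the feature-extraction operation in a
# convolutional neural network: slide a small filter (kernel) over a
# grayscale image and sum the element-wise products at each position.

def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = sum(image[i + di][j + dj] * kernel[di][dj]
                      for di in range(kh) for dj in range(kw))
            row.append(acc)
        out.append(row)
    return out

# A dark-to-bright vertical boundary in a 3x4 grayscale image.
img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]

# A hypothetical vertical-edge filter: responds where the left side of
# a 2x2 region is darker than the right side.
edge_filter = [[-1, 1],
               [-1, 1]]

print(convolve2d(img, edge_filter))  # [[0, 18, 0], [0, 18, 0]]
```

The large values in the middle column mark where the filter “found” the edge; a deep network stacks many such learned filters to build up from edges to facial features.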

For Litigators, How to Tackle AI and Potential AI Bias?

The particulars of advanced AI have not been central to a major federal jury case to date.  Thus, the Facebook case offers an opportunity to evaluate a jury’s reaction to a particular AI technology.

In its summary judgment brief, Facebook submitted expert testimony that its AI “learned for itself what features of an image’s pixel values are most useful for the purposes of characterizing and distinguishing images of human faces” and it “combines and weights different combinations of different aspects of the entire face image’s pixel value.” This description did not persuade Judge Donato to conclude that an AI with “learning” capabilities escapes BIPA’s reach, at least not as a matter of law.  Whether it will be persuasive to a jury is an open question.

It is possible some potential jurors may have preconceived notions about AI, given the hype surrounding use cases for the technology.  Indeed, outside the courthouse, AI’s potential dark side and adverse impacts on society have been widely reported. Computer vision-enabled attack drones, military AI systems, jobs being taken over by AI-powered robots, algorithmic harm due to machine learning bias, and artificial general intelligence (AGI) taking over the world appear regularly in the media.  If bias for and against AI is not properly managed, the jury’s final decision might be viewed by some as a referendum on AI.

For litigators handling AI cases in the future, the outcome of the Facebook case could provide a roadmap for effective trial strategies involving highly complex AI systems that defy simple description.  That is not to say that the outcome will create a new paradigm for litigating tech. After all, many trials involve technical experts who try to explain complex technologies in a way that resonates with a jury. For example, complex technology is often the central dispute in cases involving intellectual property, medical malpractice, finance, and others.  But those cases usually don’t involve technologies that “learn” for themselves.

How Will the Outcome Affect User Data Collection?

The public is becoming more aware that tech companies are enticing users to their platforms and apps as a way to generate user-supplied data. While the Facebook case itself may not usher in a wave of new laws and regulations or even self-policing by the tech industry aimed at curtailing user data collection, a sizeable damages award from the jury could have a measurable chilling effect. At a minimum, some companies may become more transparent about their data collection and provide improved notice and opt-out mechanisms.

10 Things I Wish Every Legal Tech Pitch Would Include

Due in large part to the emergence of advanced artificial intelligence-based legal technologies, the US legal services industry today is in the midst of a tech shakeup.  Indeed, the number of advanced legal tech startups continues to increase, and so too do the opportunities for law firms to receive product presentations from those vendors.

Over the last several months, I’ve participated in several pitches and demos from leading legal tech vendors.  Typically delivered by company founders, executives, technologists, and/or sales, these presentations have been delivered live, as audio-video conferences, audio by phone with a separate web demo, or pre-recorded audio-video demos (e.g., a slide deck video with voiceover).  Often, a vendor’s lawyer will discuss how his or her company’s software addresses various needs and issues arising in one or more law firm practice areas.  Most presentations will also include statements about advanced legal tech boosting law firm revenues, making lawyers more efficient, and improving client satisfaction (ostensibly, a reminder of what’s at stake for those who ignore this latest tech trend).

Based on these (admittedly small number of) presentations, here is my list of things I wish every legal tech presentation would provide:

1. Before a presentation, I wish vendors would provide an agenda and the bios of the company’s representatives who will be delivering their pitch. I want to know what’s being covered and who’s going to be giving the presentation.  Do they have a background in AI and the law, or are they tech generalists? This helps prepare for the meeting and frame questions during Q&A (and reduces the number of follow-up conference calls).  Ideally, presenters should know their own tech inside and out and an area of law so they can show how the software makes a difference in that area. I’ve seen pitches by business persons who are really good at selling, and programmers who are really good at talking about bag-of-words bootstrapping algorithms. It seems the best person to pitch legal tech is someone who knows both the practice of law and how tech works in a typical law firm setting.

2. Presenters should know who they are talking to at a pitch and tailor accordingly.  I’m a champion for legal tech and want to know the details so I can tell my colleagues about your product.  Others just want to understand what adopting legal tech means for daily law practice. Find out who’s who and which practice group(s) or law firm function they represent and then address their specific needs.

3. The legal tech market is filling up with offerings that each perform a narrow function, so I want to understand all the ways your application might help replace or augment law firm tasks. Mention how your tech could be utilized in different practice areas where it’s best deployed (or where it could be deployed in the future in the case of features still in the development pipeline). The more capabilities an application has, the more attractive your prices begin to appear (and the fewer vendor roll-outs and training sessions I and my colleagues will have to sit through).

4. Don’t oversell capabilities. If you claim new features will be implemented soon, they shouldn’t take months to deploy. If your software is fast and easy, it had better be both, judged from an experienced attorney’s perspective. If your machine learning text classification models are not materially different than your competitors’, avoid saying they’re special or unique. On the other hand, if your application includes a demonstrable unique feature, highlight it and show how it makes a tangible difference compared to other available products in the market. Finally, if your product shouldn’t be used for high stakes work or has other limitations, I want to understand where that line should be drawn.

5. Speaking of over-selling, if I hear about an application’s performance characteristics, especially numerical values for things like accuracy, efficiency, and time saved, I want to see the benchmarks and protocols used to measure those characteristics.  While accuracy and other metrics are useful for distinguishing one product from another, they can be misleading. For example, a claim that a natural language processing model is 95% accurate at classifying text by topic should be backed up with comparisons to a benchmark and an explanation of the measurement protocol used.  A claim that a law firm was 40-60% more efficient using your legal tech, without providing details about how those figures were derived, isn’t all that compelling.
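A toy example (hypothetical numbers) makes the point about misleading accuracy claims concrete: on imbalanced data, a model that never flags anything can still post an impressive-sounding figure.

```python
# Why a raw accuracy figure can mislead: hypothetical document set with
# 95 "routine" clauses and 5 "risky" ones.

labels = ["routine"] * 95 + ["risky"] * 5
predictions = ["routine"] * 100  # a useless model that flags nothing

accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
risky_found = sum(p == "risky" == t for p, t in zip(predictions, labels))

print(f"{accuracy:.0%} accurate")  # 95% accurate...
print(risky_found)                 # ...yet 0 risky clauses caught
```

This is why a bare “95% accurate” claim means little without the benchmark data set, the class balance, and metrics beyond accuracy (such as how many of the items that actually mattered were caught).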

6. I want to know if your application has been adopted by top law firms, major in-house legal departments, courts, and attorneys general, but be prepared to provide data to back up claims.  Are those organizations paying a hefty annual subscription fee but only using the service a few times a month, or are your cloud servers overwhelmed by your user base? Monthly active users, API requests per domain, etc., can place usage figures in context.

7. I wish proof-of-concept testing was easier.  It’s hard enough to get law firm lawyers and paralegals interested in new legal tech, so provide a way to facilitate testing your product. For example, if you pitch an application for use in transactional due diligence, provide a set of common due diligence documents and walk through a realistic scenario. This may need to be done for different practice groups and functions at a firm, depending on the nature of the application.

8. I want to know how a legal tech vendor has addressed confidentiality, data security, and data assurance in instances where a vendor’s legal tech is a cloud-based service. If a machine learning model runs on a platform that is not behind the firm’s firewall and intrusion detection systems, that’s a potential problem in terms of safeguarding client confidential information. While vendors need to coordinate first with a firm’s CSO about data assurance/security, I also want to know the details.

9. I wish vendors would provide better information demonstrating how their applications helped others develop business. For example, tell me if your application helped a law firm respond to a Request for Proposal (RFP) and win, or if a client gave more work to a firm that demonstrated advanced legal tech acumen.  While such information may merely be anecdotal, I can probably champion legal tech on the basis of business development even if a colleague isn’t persuaded by things like accuracy and efficiency.

10. Finally, a word about design.  I wish legal tech developers would place more emphasis on UI/UX. Some recent offerings appear ready for beta testing rather than a roll-out to prospective buyers. I’ve seen demos in which a vendor’s interface contained basic formatting errors, something any quality control process would have caught. Some UIs are bland and lack intuitiveness when they should be user-friendly and have a quality look and feel. Use a unique theme and graphics style, and adopt a brand that stands out. For legal tech to succeed in the market, technology and design both must meet expectations.

[The views and opinions expressed in this post are solely the author’s and do not necessarily represent or reflect the views or opinions of the author’s employer or colleagues.]

“AI vs. Lawyers” – Interesting Result, Bad Headline

The recent clickbait headline “AI vs. Lawyers: The Ultimate Showdown” might lead some to believe that an artificial intelligence system and a lawyer were dueling adversaries or parties on opposite sides of a legal dispute (notwithstanding that an “intelligent” machine has not, as far as US jurisprudence is concerned, been recognized as having machine rights or standing in state or federal courts).

Follow the link, however, and you end up at LawGeex’s report titled “Comparing the Performance of Artificial Intelligence to Human Lawyers in the Review of Standard Business Contracts.” The 37-page report details a straightforward, but still impressive, comparison of the accuracy of machine learning models and lawyers in the course of performing a common legal task.

Specifically, LawGeex set out to consider, in what they call a “landmark” study, whether an AI-based model or skilled lawyers are better at issue spotting while reviewing Non-Disclosure Agreements (NDAs).

Issue spotting is a task that paralegals, associate attorneys, and partners at law firms and corporate legal departments regularly perform. It’s a skill learned early in one’s legal career and involves applying knowledge of legal concepts and issues to identify, in textual materials such as contract documents or court opinions, specific and relevant facts, reasoning, conclusions, and applicable laws or legal principles of concern. Issue spotting in the context of contract review may simply involve locating a provision of interest, such as a definition of “confidentiality” or an arbitration requirement in the document.

Legal tech tools using machine learning algorithms have proliferated in the last couple of years. Many involve combinations of AI technologies and typically require processing thousands of documents (often “labeled” by category or type of document) to create a model that “learns” what to look for in the next document it processes. In the LawGeex study, for example, the model was trained on thousands of NDA documents. Following training, it processed five new NDAs selected by a team of advisors, while 20 experienced contract attorneys were given the same five documents and four hours to review them.

The results were unsurprising: LawGeex’s trained model was able to spot provisions, from a pre-determined set of 30 provisions, at a reported accuracy of 94% compared to an average of 85% for the lawyers (the highest-performing lawyer, LawGeex noted, had an accuracy of 94%, equaling the software).

Notwithstanding the AI vs. lawyers headline, LawGeex’s test results raise the question of whether the task of legal issue spotting in NDA documents has been effectively automated (assuming a mid-nineties accuracy is acceptable). And do machine learning advances like these generally portend other common tasks lawyers perform someday being performed by intelligent machines?

Maybe. But no matter how sophisticated AI tech becomes, algorithms will still require human input. And algorithms are a long way from being able to handle a client’s sometimes complex objectives, unexpected tactics opposing lawyers might deploy in adversarial situations, common sense, and other inputs that factor into a lawyer’s context-based legal reasoning and analysis duties. No AI tech is currently able to handle all that. Not yet anyway.

When It’s Your Data But Another’s Stack, Who Owns The Trained AI Model?

Cloud-based machine learning algorithms, made available as a service, have opened up the world of artificial intelligence to companies without the resources to organically develop their own AI models. Tech companies that provide these services promise to help companies extract insights from the company’s unique customer, employee, product, business process, and other data, and to use those insights to improve decisions, recommendations, and predictions without the company having an army of data scientists and full stack developers. Simply open an account, provide data to the service’s algorithms, train and test an algorithm, and then incorporate the final model into the company’s toolbox.

While it seems reasonable to assume a company owns a model it develops with its own data–even one based on an algorithm residing on another’s platform–the practice across the industry is not universal. Why this matters is simple: a company’s model (characterized in part by model parameters, network architecture, and architecture-specific hyperparameters associated with the model) may provide the company with an advantage over competitors. For instance, the company may have unique and proprietary data that its competitors do not have. If a company wants to extract the most value from its data, it should take steps to not only protect its valuable data, but also the models created based on that data.

How does a company know if it has not given away any rights to its own data uploaded to another’s cloud server, and that it owns the models it created based on its data? Conversely, how can a company confirm the cloud-based machine learning service has not reserved any rights to the model and data for its own use? The answer, of course, is likely embedded in multiple terms of service, privacy, and user license agreements that apply to the use of the service. If important provisions are missing, vague, or otherwise unfavorable, a company may want to look at alternative cloud-based platforms.

Consider the following example. Suppose a company wants to develop an AI model to improve an internal production process, one the company has refined over the years and that gives it a competitive advantage over others. Maybe its unique data set derives from a trade secret process or reflects expertise its competitors could not easily replicate. With data in hand, the company enters into an agreement with a cloud-based machine learning service, uploads its data, and builds a model using the service’s many AI technologies, such as natural language processing (NLP), computer vision classifiers, and supervised learning tools. Once the best algorithms are selected, the data is used to train them and a model is created. The model can then be used in the company’s operations to improve efficiency and cut costs.
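The workflow sketched above, train a model on proprietary process data and then use it to classify new cases, can be illustrated with a deliberately tiny example. Everything here is hypothetical: the "model" is a one-nearest-neighbor classifier, and the process measurements are invented stand-ins for whatever data a company might upload:

```python
# Toy version of the train-then-deploy workflow described above.
# The (temperature, pressure) measurements and pass/fail labels are
# hypothetical; a cloud service would supply far richer algorithms.

def train_nearest_neighbor(examples):
    """'Training' a 1-nearest-neighbor model just stores labeled examples."""
    return list(examples)

def predict(model, point):
    """Classify a new point by the label of its closest training example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda ex: dist(ex[0], point))[1]

# Hypothetical process data: (temperature, pressure) -> outcome
training_data = [((70, 1.2), "pass"), ((90, 1.8), "fail"), ((65, 1.0), "pass")]
model = train_nearest_neighbor(training_data)

# Use the trained model in operations on a new measurement
print(predict(model, (72, 1.1)))  # nearest example is (70, 1.2) -> "pass"
```

The point of the sketch is that the trained model is a distinct artifact built from the company's data, which is why the ownership provisions discussed next matter.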

Now let us assume the cloud service provider’s terms of service (TOS) states something like the following hypothetical:

“This agreement does not impliedly or otherwise grant either party any rights in or to the other’s content, or in or to any of the other’s trade secret rights or rights under intellectual property laws. The parties acknowledge and agree that Company owns all of its existing and future intellectual property and other rights in and concerning its data, the applications or models Company creates using the services, and Company’s project information provided as part of using the service, and Service owns all of its existing and future intellectual property and other rights in and to the services and software downloaded by Company to access the services. Service will not access or use Company’s data, except as necessary to provide the services to Company.”

These terms appear generally to protect the company’s rights and interests in its data and in any models created using that data, and they further indicate that the machine learning service will not use the company’s data, or the model trained on it, except to provide the service. That last part, the exception, deserves careful attention, because a provider can define the services it performs quite broadly.

Now consider the following additional hypothetical TOS:

“Company acknowledges that Service may access Company’s data submitted to the service for the purpose of developing and improving the service, and any other of Service’s current, future, similar, or related services, and Company agrees to grant Service, its licensees, affiliates, assigns, and agents an irrevocable, perpetual right and permission to use Company’s data, because without those rights and permission Service cannot provide or offer the services to Company.”

The company may not be comfortable agreeing to those terms, unless they are superseded by other, more favorable terms in another applicable agreement governing use of the cloud-based service.

So while AI may be “the new electricity” powering large portions of the tech sector today, data is an important commodity in its own right, and so are the models behind an AI company’s products. Don’t forget to review the fine print before uploading company data to a cloud-based machine learning service.

Legal Tech, Artificial Intelligence, and the Practice of Law in 2018

Due in part to a better understanding of available artificial intelligence legal tech tools, more lawyers will adopt and use AI technologies in 2018 than ever before. Better awareness will also drive the creation and marketing of specialized AI practice areas within law firms, more lawyers with AI expertise, new business opportunities across multiple practice groups, and the possibility of another round of Associate salary increases as the demand for AI talent, both in-house and at law firms, escalates in response to the continued expansion of AI in key industries.

The legal services industry is poised to adopt AI technologies at the highest level seen to date. But that doesn’t mean lawyers are currently unfamiliar with AI. In fact, AI technologies are already widely used by legal practitioners, such as the technology that powers case law searches (web services in which a user’s natural language query is processed by a machine learning algorithm that returns a ranked and sorted list of relevant cases) and the predictive analytics software used in electronic discovery (software that finds and tags relevant electronic documents for production during a lawsuit based on a taxonomy of keywords and phrases agreed upon by the parties).
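To make the case law search example concrete, here is a bare-bones sketch of query-based ranking. Commercial services use far more sophisticated machine learning models; the case names and the simple term-overlap scoring below are hypothetical stand-ins:

```python
# Toy relevance ranking in the spirit of the case law search tools
# described above. Real services use trained ranking models; the cases
# and the term-overlap scoring here are hypothetical simplifications.

def rank_cases(query, cases):
    """Rank cases by how many query terms appear in their summaries."""
    terms = set(query.lower().split())
    scored = [(sum(t in summary.lower() for t in terms), name)
              for name, summary in cases.items()]
    # Highest-scoring first; drop cases matching no query term at all
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

cases = {
    "Case A": "trade secret misappropriation involving customer data",
    "Case B": "patent infringement of a wireless networking standard",
    "Case C": "breach of contract over customer data licensing terms",
}
print(rank_cases("customer data trade secret", cases))  # ['Case A', 'Case C']
```

A production system would replace the term-overlap score with a learned relevance model, but the input-to-ranked-output shape of the task is the same.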

Newer AI-based software solutions, however, from companies like Kira and Ross, among dozens of others now available, may improve the legal services industry’s understanding of AI. These solutions offer increased efficiency, improved client service, and reduced operating costs. Efficiency, measured in terms of the time it takes to respond to client questions and the amount of billable hours expended, can translate into reduced operating costs for in-house counsel, law firm lawyers, judges, and their staffs, which is sure to get attention. AI-powered contract review software, for example, can take an agreement provided by opposing counsel and nearly instantaneously spot problems, a process that used to take an Associate or Partner a half-hour or more to accomplish, depending on the contract’s complexity. In-house counsel are wary of paying biglaw hourly rates for such mundane review work, so software that can perform some of the work seems like a perfect solution. The law firms and their lawyers that become comfortable using the latest AI-powered legal tech will be able to boast of being cutting edge and client-focused.
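To illustrate the contract review example, consider a deliberately simplified sketch that flags clauses containing known risk phrases. Real AI-powered review tools rely on trained classifiers rather than a fixed phrase list; the phrases and the contract text below are hypothetical:

```python
# Toy version of the contract-review flagging described above.
# Commercial tools use trained classifiers; this sketch pattern-matches
# a few hypothetical risk phrases to show the shape of the task.

RISK_PHRASES = ["irrevocable", "perpetual", "unlimited liability"]

def flag_clauses(contract_text):
    """Return the clauses that contain any known risk phrase."""
    clauses = [c.strip() for c in contract_text.split(".") if c.strip()]
    return [c for c in clauses
            if any(phrase in c.lower() for phrase in RISK_PHRASES)]

contract = ("Vendor grants Customer a license. The license is irrevocable "
            "and perpetual. Fees are due annually.")
print(flag_clauses(contract))  # ['The license is irrevocable and perpetual']
```

The appeal described above comes from scale: a reviewer still judges each flagged clause, but the software narrows a long agreement to the handful of provisions that need attention.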

Lawyers and law firms with AI expertise are beginning to market AI capabilities on their websites to retain existing clients and capture new business, and this should increase in 2018. Firms are focusing efforts on industry segments most active in AI, such as tech, financial services (banks and financial technology companies or “fintech”), computer infrastructure (cloud services and chip makers), and other peripheral sectors, like those that make computer vision sensors and other devices for autonomous vehicles, robots, and consumer products, to name a few. Those same law firms are also looking at opportunities within the ever-expanding software as a service industry, which provides solutions for leveraging information from a company’s own data, such as human resources data, process data, quality assurance data, etc. Law practitioners who understand how these industries are using AI technologies, and AI’s limitations and potential biases, will have an edge when it comes to business development in the above-mentioned industry segments.

The impacts of AI on the legal industry in 2018 may also be reflected in law firm headcounts and salaries. Some reports suggest that the spread of AI legal tech could lead to a decrease in lawyer ranks, though most agree this will happen slowly and over several years.

At the same time, however, the increased attention directed at AI technologies by law firm lawyers and in-house counsel in 2018 may put pressure on law firms to adjust Associate salaries upward, as many did during the dot-com era when demand skyrocketed for new and mid-level lawyers equipped to handle cash-infused Silicon Valley startups’ IPO, intellectual property, and contract issues. A possible Associate salary spike in 2018 may also be a consequence of, and fueled by, the huge salaries reportedly being paid in the tech sector, where big tech companies spent billions in 2016 and 2017 acquiring AI start-ups to add talent to their rosters. A recent report suggests annual salary and other incentives in the range of $350,000 to $500,000 are being paid to newly-minted PhDs and to those with just a few years of AI experience. At those levels, recent college graduates contemplating law school and a future in the legal profession might opt instead to head to graduate school for a Master’s or PhD in an AI field.

The AI Summit New York City: Takeaways For the Legal Profession

This week, business, technology, and academic thought leaders in Artificial Intelligence are gathered at The AI Summit in New York City, one of the premier international conferences offered for AI professionals. Below, I consider two of the three takeaways from Summit Day 1, published yesterday by AI Business, from the perspective of lawyers looking for opportunities in the burgeoning AI market.

“1. The tech landscape is changing fast – with big implications for businesses”

If a year from now your law practice has not fielded at least one query from a client about AI technologies, you are probably going out of your way to avoid the subject. It is almost universally accepted that AI technologies in one form or another will impact nearly every industry. Based on recently-published salary data, the industries most active in AI are tech (think Facebook, Amazon, Alphabet, Microsoft, Netflix, and many others), financial services (banks and financial technology companies or “fintech”), and computer infrastructure (Amazon, Nvidia, Intel, IBM, and many others; in areas such as chips for growing computational speed and throughput, and cloud computing for big data storage needs).

Of course, other industries are also seeing plenty of AI development. The automotive industry, for example, has already begun adopting machine learning, computer vision, and other AI technologies for autonomous vehicles. The robotics and chatbot industries have seen great strides lately, both in humanoid robot development and in consumer-machine interaction products such as stationary and mobile digital assistants (e.g., personal robotic assistants, as well as utility devices like autonomous vacuums). And the software as a service industry, which leverages information from a company’s own data, such as human resources data, process data, healthcare data, etc., seems to offer new solutions for improving efficiency every day.

All of this will translate into consumer adoption of specific AI technologies, which is reported to already be at 10% and growing. The fast pace of technology development and adoption may translate into new business opportunities for lawyers, especially those who invest the time to learn about AI technologies. After all, as in any area of law, understanding the challenges facing clients is essential for developing appropriate legal strategies, as well as for targeting business development resources.

“2. AI is a disruptive force today, not tomorrow – and business must adapt”

“Adapt or be left behind” is a cautionary refrain, but one with plenty of evidence demonstrating that it holds true in many situations.

Lawyers, and law firms as institutions, are generally slow to change, often because anything that disrupts the status quo is viewed through a cautionary lens. This is not surprising, given that a lawyer’s work often involves thoughtfully spotting potential risks and finding ways to address them. A fast-changing business landscape racing to keep up with the latest in AI technologies may be seen as inherently risky, especially in the absence of targeted laws and regulations providing guidance, as is the case today in the AI industry. Even so, exploring how to adapt one’s law practice to a world filled with AI technologies should be near the top of every lawyer’s list of things to consider for 2018.