One of the concerns expressed by those studying algorithmic decision-making is the apparent lack of transparency. Those impacted by adverse algorithmic decisions often seek transparency to better understand the basis for the decisions. In the case of software used in legal proceedings, parties who seek explanations about software face a number of obstacles, including those imposed by evidentiary rules, criminal or civil procedural rules, and by software companies that resist discovery requests.
The closely followed issue of algorithmic transparency was recently considered by a California appellate court in People v. Superior Court of San Diego County, slip op. Case D073943 (Cal. App. 4th October 17, 2018), in which the People sought relief from a discovery order requiring the production of software and source code used in the conviction of Florencio Jose Dominguez. Following a hearing and review of the record and amicus briefs in support of Dominguez filed by the American Civil Liberties Union, the American Civil Liberties Union of San Diego and Imperial Counties, the Innocence Project, Inc., the California Innocence Project, the Northern California Innocence Project at Santa Clara University School of Law, Loyola Law School’s Project for the Innocent, and the Legal Aid Society of New York City, the appeals court granted the People’s requested relief. In doing so, the court considered, but was not persuaded by, the defense team’s “black box” and “machine testimony” arguments.
At issue on appeal was Dominguez’s motion to compel production of a DNA testing program called STRmix used by local prosecutors in their analysis of forensic evidence (specifically, DNA found on the inside of gloves). STRmix is a “probabilistic genotyping” program that expresses a match between a suspect and DNA evidence in terms of the probability of a match compared to a coincidental match. Probabilistic genotyping is said to reduce subjectivity in the analysis of DNA typing results. Dominguez’s counsel moved the trial court for an order compelling the People to produce the STRmix software program and related updates as well as its source code, arguing that defendant had a right to look inside the software’s “black box.” The trial court granted the motion and the People sought writ relief from the appellate court.
On appeal, the appellate court noted that “computer software programs are written in specialized languages called source code” and “source code, which humans can read, is then translated into [a] language that computers can read.” Cadence Design Systems, Inc. v. Avant! Corp., 29 Cal. 4th 215, 218 at fn.3 (2002). The lab that used STRmix testified that it had no way to access the source code, which it licensed from an authorized software seller. Thus, the court considered whether the company that created the software should produce it. In concluding that the company was not obligated to produce the software and source code, the court, citing precedent, found that the company would have had no knowledge of the case but for the defendant’s subpoena duces tecum, and it did not act as part of the prosecutorial team such that it was obligated to turn over exculpatory evidence (assuming software itself is exculpatory, which the court was reluctant to find).
With regard to the defense team’s “black box” argument, the appellate court found nothing in the record to indicate that the STRmix software suffered a problem, as the defense team argued, that might have affected its results. Calling this allegation speculative, the court concluded that the “black box” nature of STRmix was not itself sufficient to warrant its production.
Moreover, the court was unpersuaded by the defense team’s argument that the STRmix program essentially usurped the lab analyst’s role in providing the final statistical comparison, and so the software program—not the analyst using the software—was effectively the source of the expert opinion rendered at trial. The lab, the defense argued, merely acted in a scrivener’s capacity for STRmix’s analysis, and since the machine was providing testimony, Dominguez should be able to evaluate the software to defend against the prosecution’s case against him.
The appellate court disagreed. While acknowledging the “creativity” of the defense team’s “machine testimony” argument (which relied heavily on Berkeley law professor Andrea Roth’s “Machine Testimony” article, 126 Yale L.J. 1972 (2017)), the panel noted the testimony that STRmix did not act alone, that there were humans in the loop: “[t]here are still decisions that an analyst has to make on the front end in terms of determining the number of contributors to a particular sample and determin[ing] which peaks are from DNA or from potentially artifacts” and that the program then performs a “robust breakdown of the DNA samples,” based at least in part on “parameters [the lab] set during validation.” Moreover, after STRmix renders “the diagnostics,” the lab “evaluate[s] … the genotype combinations … . to see if that makes sense, given the data [it’s] looking at.” After the lab “determine[s] that all of the diagnostics indicate that the STRmix run has finished appropriately,” it can then “make comparisons to any person of interest or … database that [it’s] looking at.”
While the appellate court’s decision mostly followed precedent and established procedure, it could easily have gone the other way and affirmed the trial judge’s decision granting Defendant’s motion to compel production of the STRmix software and source code, which would have given Dominguez better insight into the nature of the software’s algorithms, its parameters and limitations in view of validation studies, and the various possible outputs the model could have produced given a set of inputs. In particular, the court might have affirmed the trial judge’s decision to grant access to the STRmix software if the policy of imposing transparency in STRmix’s algorithmic decisions were given more consideration from the perspective of actual harm that might occur if software and source code are produced. Here, the source code owner’s objection to production was based in part on trade secret and other confidentiality concerns; however, procedures already exist to handle those concerns. Indeed, source code reviews happen all the time in the civil context, such as in patent infringement matters involving software technologies. While software makers are right to be concerned about the harm to their businesses if their code ends up in the wild, the real risk of this happening can be low if proper procedures, embodied in a suitable court-issued Protective Order, are followed by lawyers on both sides of a matter and if the court maintains oversight and demands status updates from the parties to ensure compliance and integrity in the review process. Instead of following the trial court’s approach, however, the appellate court conditioned access to STRmix’s “black box” on the demonstration of specific errors in the program’s results, which seems intractable: only by looking into the black box in the first place is a party able to understand whether problems exist that affect the result.
Interestingly, artificial intelligence had nothing to do with the outcome of the appellate court’s decision, yet the panel noted that “We do not underestimate the challenges facing the legal system as it confronts developments in the field of artificial intelligence.” The judges acknowledged that the notion of “machine testimony” in algorithmic decision-making matters is a subject about which there are widely divergent viewpoints in the legal community, a possible prelude to what is ahead when artificial intelligence software cases make their way through the courts in criminal or non-criminal cases. To that, the judges cautioned, “when faced with a novel method of scientific proof, we have required a preliminary showing of general acceptance of the new technique in the relevant scientific community before the scientific evidence may be admitted at trial.”
Lawyers in future artificial intelligence cases should consider how best to frame arguments concerning machine testimony in both civil and criminal contexts to improve their chances of overcoming evidentiary obstacles. Lawyers will need to effectively articulate the nature of artificial intelligence decision-making algorithms, as well as the relative roles of data scientists and model developers who make decisions about artificial intelligence model architecture, hyperparameters, data sets, model inputs, training and testing procedures, and the interpretation of results. Today’s artificial intelligence systems do not operate autonomously; there will always be humans associated with a model’s output or result and those persons may need to provide expert testimony beyond the machine’s testimony. Even so, transparency will be important to understanding algorithmic decisions and for developing an evidentiary record in artificial intelligence cases.