Court OKs Use of Computer-Assisted Review of Electronically Stored Information

Business Litigation Update

Date: February 27, 2012


In Da Silva Moore v. Publicis Groupe, a federal court for the first time approved the defendants' use of computer-assisted review, also known as predictive coding, and acknowledged that it can be a faster, more reliable, and more cost-effective method of document review and production. Predictive coding draws on recent advances in artificial intelligence to reduce the time and expense of review compared with other available technologies.

Computer-Assisted Review Explained

In the court's decision, United States Magistrate Judge Andrew J. Peck of the Southern District of New York described his understanding of computer-assisted review as follows:

By computer-assisted coding, I mean tools ... that use sophisticated algorithms to enable the computer to determine relevance, based on interaction with (i.e., training by) a human reviewer. Unlike manual review [a/k/a linear review], where a review is done by the most junior staff, computer-assisted coding involves a senior partner (or small team) who review and code a "seed set" of documents.

Magistrate Judge Peck further describes the process as one of continual refinement: the reviewer codes the seed set, and the computer and the reviewer then work together to build a predictive model that is applied to the larger document collection. The goal is to separate responsive documents, which warrant further review, from non-responsive or less responsive documents, which can be excluded from further review.
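For readers who want a concrete picture of this training loop, the following is a minimal Python sketch. It is not the software used in Da Silva Moore, whose algorithm the decision does not describe; we substitute a simple naive Bayes text classifier, and the function names (`train`, `score`, `review_rounds`) and the simulated `reviewer` function are hypothetical. In each round, the model ranks the unreviewed documents, a human codes the top-ranked batch, and those decisions are folded back into the training data.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(labeled):
    """Count words per class from (text, is_relevant) pairs coded by a reviewer."""
    words = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, relevant in labeled:
        words[relevant].update(tokenize(text))
        docs[relevant] += 1
    return words, docs

def score(model, text):
    """Log-odds that `text` is relevant under a Laplace-smoothed naive Bayes model."""
    words, docs = model
    vocab = len(set(words[True]) | set(words[False])) + 1  # +1 slot for unseen words
    totals = {c: sum(words[c].values()) for c in (True, False)}
    log_odds = math.log((docs[True] + 1) / (docs[False] + 1))  # smoothed prior
    for w in tokenize(text):
        p_rel = (words[True][w] + 1) / (totals[True] + vocab)
        p_non = (words[False][w] + 1) / (totals[False] + vocab)
        log_odds += math.log(p_rel / p_non)
    return log_odds

def review_rounds(seed_set, unreviewed, reviewer, rounds, batch_size):
    """Rank the unreviewed pool, have the reviewer code the top batch,
    add those decisions to the training data, retrain, and repeat."""
    labeled = list(seed_set)
    pool = list(unreviewed)
    for _ in range(rounds):
        model = train(labeled)
        pool.sort(key=lambda text: score(model, text), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled.extend((text, reviewer(text)) for text in batch)
    return train(labeled), pool

# Hypothetical seed set and collection; `reviewer` stands in for the human.
seed = [("pay disparity female promotion", True), ("lunch order friday", False)]
collection = ["promotion denied female director", "friday lunch menu",
              "pay raise discussion", "holiday party schedule"]

def reviewer(text):
    return bool({"pay", "promotion", "female"} & set(text.split()))

model, remaining = review_rounds(seed, collection, reviewer, rounds=2, batch_size=1)
```

A commercial review platform would use far richer features and stopping criteria; the sketch is meant only to show the feedback loop between reviewer and model.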

As Magistrate Judge Peck explained, review of electronically stored information (ESI) has evolved from searching data sets with simple keywords to using algorithms that return both documents containing the keywords and documents containing words that frequently appear alongside those keywords. Predictive coding takes litigation review software a step further by "recall[ing] documents that have similar concepts to those in a set of identified documents even if the same words aren't used in the two documents."
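The difference between plain keyword search and the co-occurrence approach Judge Peck describes can be illustrated with a short Python sketch. The helper names and the `min_count` threshold are our own assumptions for illustration. True concept search, which can match documents that share no words at all, relies on more sophisticated techniques that are not shown here.

```python
from collections import Counter

def tokens(doc):
    return set(doc.lower().split())

def keyword_search(docs, keywords):
    """Keyword approach: indices of documents containing any keyword."""
    kw = {k.lower() for k in keywords}
    return [i for i, doc in enumerate(docs) if kw & tokens(doc)]

def cooccurring_terms(docs, keywords, min_count=2):
    """Words that share a document with a keyword in at least `min_count` documents."""
    kw = {k.lower() for k in keywords}
    counts = Counter()
    for doc in docs:
        words = tokens(doc)
        if kw & words:
            counts.update(words - kw)
    return {word for word, c in counts.items() if c >= min_count}

def expanded_search(docs, keywords, min_count=2):
    """Co-occurrence approach: the keywords plus words that frequently appear with them."""
    terms = {k.lower() for k in keywords} | cooccurring_terms(docs, keywords, min_count)
    return keyword_search(docs, terms)

# Hypothetical four-document collection.
docs = [
    "salary gap equity review",
    "salary equity adjustment",
    "equity adjustment for analysts",
    "team lunch friday",
]
print(keyword_search(docs, ["salary"]))   # [0, 1]
print(expanded_search(docs, ["salary"]))  # [0, 1, 2]
```

Document 2 is retrieved by the expanded search even though it never mentions "salary," because "equity" appears alongside that keyword in two documents.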

The Da Silva Moore Case

In the case before the court, plaintiff Monique Da Silva Moore, a former public relations executive, and four other female plaintiffs sued Publicis Groupe, an international advertising conglomerate, and its U.S. public relations subsidiary, MSL Group, for gender discrimination under Title VII and other statutes. The other claims included Equal Pay Act and Fair Labor Standards Act claims that the plaintiffs sought to bring as collective and class actions, seeking $100 million in damages as well as back pay and attorneys' fees.

In light of the court's prior rulings, defense counsel succeeded in limiting first-phase discovery to 30 custodians rather than 44. Even with just 30 custodians, however, the relevant data set, consisting almost entirely of email correspondence, comprised more than 3 million documents. The defendants proposed using predictive coding to conduct the document review, and the parties, with the court's guidance, developed an ESI protocol.

The Da Silva Moore ESI Protocol

The Da Silva Moore ESI protocol calls for a random sample of nearly 2,500 of the more than 3 million documents to be manually reviewed for relevance; the coded sample forms the initial seed set used to train the predictive coding software. The defendants agreed to give the plaintiffs the seed set and the defendants' issue coding (the parties had agreed on eight issue tags) for review, and also agreed to modify the coding based on the plaintiffs' suggestions. In addition, the defendants coded documents selected through "judgmental sampling" to further train the software. This step involves running keyword and Boolean searches against the entire data set and having the parties manually review the top-ranking results.
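The size of a random seed set is a statistical choice. As an illustration only, Cochran's standard sample-size formula with a finite-population correction, assuming a 95% confidence level (z = 1.96), a margin of error of plus or minus 2%, and the most conservative 50% relevance rate, yields roughly 2,400 documents for a 3-million-document collection, consistent with the "nearly 2,500" figure in the protocol. These parameters are our assumptions; the article does not state them.

```python
import math

def sample_size(population, z=1.96, margin=0.02, p=0.5):
    """Cochran's formula with a finite-population correction.

    z: z-score for the confidence level (1.96 is about 95%).
    margin: acceptable margin of error (0.02 means plus or minus 2%).
    p: assumed share of relevant documents; 0.5 gives the largest,
       most conservative sample.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2  # size for an unlimited population
    n = n0 / (1 + (n0 - 1) / population)     # adjust for a finite collection
    return math.ceil(n)

print(sample_size(3_000_000))
```

Because the correction term is tiny for a collection this large, the required sample barely depends on the total number of documents; it is driven mainly by the chosen confidence level and margin of error.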

Next, the entire data set will undergo seven "iterative" rounds "to stabilize the training of the software." In each round, the software returns 500 documents for each of the issue tags, which are then manually reviewed. Finally, after the seventh round, the defendants will review another random sample of nearly 2,500 documents drawn from those the software deems irrelevant, to verify that they were properly coded. The plaintiffs will then be entitled to see all documents the defendants reviewed in this process (except privileged documents), including those deemed irrelevant.
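The final quality-control step, sampling the documents the software deems irrelevant, is often called an elusion test. The Python sketch below shows one common way to turn such a sample into an estimate of how many relevant documents remain in the discarded set, using a Wilson score interval. The counts in the example are hypothetical, and nothing in the decision indicates that the parties used this particular calculation.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return max(0.0, center - half), min(1.0, center + half)

def elusion_estimate(relevant_in_sample, sample_size, discarded_total):
    """Project the sample's relevance rate onto the whole discarded set.
    Returns (point estimate, low, high) counts of relevant documents missed."""
    rate = relevant_in_sample / sample_size
    low, high = wilson_interval(relevant_in_sample, sample_size)
    return rate * discarded_total, low * discarded_total, high * discarded_total

# Hypothetical figures: 12 relevant documents found in a 2,400-document
# sample drawn from 2.9 million documents the software coded as irrelevant.
point, low, high = elusion_estimate(12, 2_400, 2_900_000)
print(f"about {point:,.0f} relevant documents missed (95% CI {low:,.0f} to {high:,.0f})")
```

The Wilson interval is used here rather than the simpler normal approximation because it behaves better when, as expected in a discard pile, the share of relevant documents is very small.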

Although the court was careful to note that predictive coding is not necessarily the best ESI review method for every case, the bottom line here is that the parties will have to review only 10,000 to 15,000 documents to train the software to code properly.

Factors Favoring the Use of Predictive Coding

In endorsing the ESI protocol, the court cited Fed. R. Civ. P. 1, which requires construing the Federal Rules of Civil Procedure "to secure the just, speedy, and inexpensive determination of every action and proceeding," and Rule 26(b)(2)(C), which embodies the proportionality principle and requires the court to weigh, among other things, the burden and expense of the proposed discovery against the benefits and needs of the case. As the court noted, "the idea is not to make this perfect, it's not going to be perfect. The idea is to make it significantly better than the alternatives without nearly as much cost."

The court repeatedly indicated that this method may not be suitable in every case involving ESI. Several important factors drove the court's decision in Da Silva Moore:

  • The parties' agreement to use predictive coding
  • The size of the entire data set (more than 3 million documents)
  • The accuracy of predictive coding compared to other available methods
  • The Rule 26(b)(2)(C) proportionality principle
  • "[T]he transparent process proposed by [Defendants]"

The court's adoption of predictive coding in this case provides a framework for using computer-assisted technology in cases involving large volumes of ESI. Applied correctly, computer-assisted review can save both time and money while improving the accuracy of responsiveness determinations in high-volume cases.