Campbell et al v. Facebook Inc.

Filing 109

MOTION for Extension of Time to File Plaintiffs' Motion for Extension of Class Certification and Summary Judgment Deadlines filed by Matthew Campbell, Michael Hurley. (Attachments: # 1 Proposed Order, # 2 Declaration of David Rudolph, # 3 Exhibit 1, # 4 Exhibit 2, # 5 Exhibit 3, # 6 Exhibit 4, # 7 Exhibit 5, # 8 Exhibit 6, # 9 Exhibit 7, # 10 Exhibit 8, # 11 Exhibit 9, # 12 Exhibit 10, # 13 Exhibit 11, # 14 Exhibit 12, # 15 Exhibit 13, # 16 Exhibit 14, # 17 Exhibit 15, # 18 Exhibit 16, # 19 Exhibit 17, # 20 Exhibit 18, # 21 Exhibit 19, # 22 Exhibit 20, # 23 Exhibit 21)(Sobol, Michael) (Filed on 9/16/2015)

EXHIBIT 16

Lieff Cabraser Heimann & Bernstein, LLP
275 Battery Street, 29th Floor
San Francisco, CA 94111-3339
t 415.956.1000 f 415.956.1008

September 1, 2015

VIA E-MAIL

Priyanka Rajagopalan, Esq.
PRajagopalan@gibsondunn.com
Gibson, Dunn & Crutcher LLP
1881 Page Mill Road
Palo Alto, California 94304

RE: Campbell v. Facebook, Inc., N.D. Cal. Case No. 13-cv-05996-PJH

Dear Priyanka:

Thank you for your letter of August 20, 2015, responding to my letter of July 23, 2015 regarding Facebook's implementation of predictive coding. Based on our understanding of the process Facebook is implementing, we have concerns that Facebook's technology-assisted review ("TAR") may suffer from significant flaws and may not constitute a reliable document search and review process. We are particularly concerned with Facebook's implementation of keyword culling prior to processing, as well as the training process that Facebook has engaged in without input from Plaintiffs.

Keyword Culling

As an initial matter, Facebook's process of keyword culling—which was not agreed to by Plaintiffs prior to implementation but was instead unilaterally adopted by Facebook—is discouraged and recognized as a flawed methodology that is likely to filter out a significant portion of responsive documents. See Rio Tinto PLC v. Vale S.A., 2015 U.S. Dist. LEXIS 94117 (S.D.N.Y. July 15, 2015) ("[P]re-culling [using keywords] should not occur in a perfect world."); Progressive Cas. Ins. Co. v. Delaney, No. 11-CV-00678, 2014 U.S. Dist. LEXIS 69166, 2014 WL 3563467 (D. Nev. July 18, 2014) (where parties had stipulated to a keyword-then-manual-review protocol, the court would not allow Progressive to use TAR only on the positive keyword hits).
The Equivio process best practices are described in Equivio's "Relevance Project Framework Process Guidelines for Conducting a Predictive Coding Project with the Equivio Relevance Application" ("Guidelines") (available at https://law.duke.edu/sites/default/files/images/centers/judicialstudies/Panel_1Background_Paper_3.pdf). The Guidelines break down the process into phases for preparation, assessment, training, catch-up (a special case of training), decision, and verification.

The Guidelines' description of the preparation phase includes the following (p. 2):

2. Metadata culling should be performed prior to loading the data into the predictive coding application . . . In rare cases, basic keyword filters will also be applied. It is recommended that application of keyword filters be minimized as keyword filters are liable to inadvertently cull significant sections of relevant data.

3. Prior to starting the process, it is recommended to engage with opposing counsel, communicating intention to use predictive coding, and verifying the definition of relevance to be applied.

Here, Facebook proposed the use of keyword filters without indicating Facebook's intention to use them in the context of Equivio processing. A primary purpose of TAR is to obviate—not simply reinforce the inadequacies of—keyword searching, and courts have recognized that the value of TAR is as an alternative to traditional keyword searches. See, e.g., Nat'l Day Laborer Org. Network v. United States Immigration & Customs Enforcement Agency, 877 F. Supp. 2d 87, 110 (S.D.N.Y. 2012) ("[T]he use of keywords without testing and refinement (or more sophisticated techniques) will in fact not be reasonably calculated to uncover all responsive material.").
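The recall problem with pre-culling can be made concrete with a minimal sketch. The corpus, labels, and search terms below are entirely hypothetical illustrations, not Facebook's actual data or terms; the point is only that a responsive document containing none of the chosen keywords never reaches the TAR tool at all:

```python
# Hypothetical illustration: keyword pre-culling silently discards
# responsive documents that happen not to contain the search terms.
corpus = [
    {"id": 1, "text": "scan message for url and update like count", "responsive": True},
    {"id": 2, "text": "increment third-party like counter from private message", "responsive": True},
    {"id": 3, "text": "weekly cafeteria menu", "responsive": False},
    {"id": 4, "text": "tally hits on shared links in chats", "responsive": True},  # no keyword hit
    {"id": 5, "text": "quarterly budget review", "responsive": False},
]
keywords = {"url", "like", "message"}

def keyword_cull(docs, terms):
    """Keep only documents containing at least one search term."""
    return [d for d in docs if terms & set(d["text"].split())]

culled = keyword_cull(corpus, keywords)
responsive_total = sum(d["responsive"] for d in corpus)
responsive_kept = sum(d["responsive"] for d in culled)
recall = responsive_kept / responsive_total  # fraction of responsive docs surviving the cull
print(f"recall after culling: {recall:.0%}")  # document 4 is responsive but was culled
```

However well the downstream predictive coding model performs, its recall is capped by the recall of this cull, which is exactly the concern the Guidelines and the cited cases raise.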
Keyword searches alone may fail to capture up to 80 percent of relevant documents,[1] and Facebook's proposal of processing only documents returned by keyword searches exacerbates this problem. Plaintiffs did not consider, and did not agree to, Facebook's application of keyword filters as a method of culling documents prior to the application of the predictive coding process. Accordingly, in the absence of further information from Facebook regarding the details of the training it has conducted thus far as well as the population of documents against which the keyword searches were run (discussed below), Plaintiffs do not consent to Facebook's current implementation of predictive coding as we understand it.

[1] Maura R. Grossman & Terry Sweeney, What Lawyers Need to Know About Search Tools: The Alternatives to Keyword Searching Include Linguistic and Mathematical Models for Concept Searching, Nat. L.J. (Aug. 23, 2010).

Seeding

Your letter states: "[W]e created a training set to teach the computer. The training set includes (i) a set of documents identified as responsive in our linear review, (ii) the results of a review of randomly selected documents from a subset of the overall data set, and (iii) the materials included in Facebook's Production Volumes 3, 4, and 6. As we continue to review and produce responsive documents, those documents will likewise be incorporated into training sets to further train the computer."

We interpret this response as indicating Facebook is engaging in "seeding," which potentially biases the results of the process. As noted in the Guidelines (p. 5):

In most cases, seeding is not used and is not required. Seeding refers to a situation where the user has a set of documents which he knows to be relevant, and which can be fed into Relevance to train the system.
Unaided seeding is liable to bias the results; the system will find only documents which are similar to the seed documents, but will not capture other types of relevant documents which may be present, unbeknown to the application, in the population.

Plaintiffs have been given no input into the seeding process; indeed, Plaintiffs are largely in the dark as to the actual composition of the training sets. We request that Facebook:

1) produce and identify by Bates number the "set of documents identified as responsive in our linear review";

2) produce and identify by Bates number "the results of [the] review of randomly selected documents from a subset of the overall data set" identified in your letter;

3) confirm that all documents produced as Facebook's Production Volumes 3, 4, and 6 were included in the training set;

4) produce and identify by Bates number any further documents "incorporated into training sets to further train the computer"; and

5) provide the responsiveness classification made by the expert(s) with respect to each seed document.

Without a complete understanding of the documents being used as part of the seeding process, as well as input into that process, Plaintiffs simply cannot judge the effectiveness of Facebook's TAR implementation at this stage. We also note that there is no provision in the Equivio Guidelines for continuing to introduce new documents manually into the training process after initial seeding, and we request more particulars regarding the basis for this action, its implementation, and how it fits into the Equivio process.

Control Set

Under the Equivio Guidelines, during the Assessment phase Equivio creates a "control set" of randomly selected documents, which are assessed by the expert(s) and used by Equivio in guiding the training process.
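The role of a control set can be sketched in miniature. Because the control set is drawn at random and coded by the expert before training, comparing the expert's calls against the model's calls after stabilization yields unbiased estimates of recall and precision. The labels below are hypothetical, and the arithmetic is standard; it is not Equivio's implementation:

```python
# Hypothetical control set: (expert call, model call after stabilization),
# where "R" = responsive and "N" = non-responsive.
control_set = [
    ("R", "R"), ("R", "R"), ("R", "N"),
    ("N", "N"), ("N", "R"), ("N", "N"),
    ("R", "R"), ("N", "N"), ("N", "N"), ("R", "N"),
]

true_pos = sum(e == "R" and m == "R" for e, m in control_set)
false_neg = sum(e == "R" and m == "N" for e, m in control_set)
false_pos = sum(e == "N" and m == "R" for e, m in control_set)

recall = true_pos / (true_pos + false_neg)     # responsive docs the model recovered
precision = true_pos / (true_pos + false_pos)  # model "responsive" calls that were correct
print(f"estimated recall={recall:.2f}, precision={precision:.2f}")
```

This is why access to the control-set documents and to both sets of classifications matters: without them, the requesting party cannot reproduce or verify these estimates.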
Please produce and identify by Bates number those documents, and, with respect to each document in the set, provide the responsiveness classifications made (i) initially by your experts, and (ii) by Equivio after stabilization.

Documents Against Which Search Terms Were Run

In response to our request for "[t]he number of documents against which the search terms were run to produce the initial 600,000 unique documents used to create the predictive coding model," your letter states, "[t]he total number of documents against which the search terms were run is not readily available using existing tools." Please explain why Facebook is unable to provide this number. Please also provide 1) a precise listing of the repositories in which the documents against which the search terms were run were stored, and 2) an explanation of how this document population was selected for keyword searching. Plaintiffs are not in a position to assess the overall effectiveness of Facebook's TAR implementation, or its appropriateness in this instance, without this information.

Please provide the requested documents and information by September 4, 2015. Given the impending summary judgment and class certification deadlines, Facebook's production thus far—a significant portion of which is either publicly available or highly duplicative (i.e., individual responses to email chains spread across many documents)—appears to be inadequate, which may in part be due to Facebook's failure to implement best practices in its predictive coding process. For example, Facebook has yet to produce any of the documents related to Facebook's decision to scan private messages for URLs and to increase the "Like" count for third-party websites, as discussed in Hank Bates' August 20, 2015 letter.

If you have any questions about or would like to discuss the foregoing, please let us know.

Sincerely,

David T. Rudolph

DTR/wp
1271660.1


