Kadrey et al v. Meta Platforms, Inc.

Filing 288

Discovery Order re: 267 Joint Discovery Letter Brief. Signed by Judge Thomas S. Hixson on 11/25/2024. (tshlc1, COURT STAFF) (Filed on 11/25/2024)

UNITED STATES DISTRICT COURT
NORTHERN DISTRICT OF CALIFORNIA

RICHARD KADREY, et al.,
Plaintiffs,
v.
META PLATFORMS, INC.,
Defendant.

Case No. 23-cv-03417-VC (TSH)

DISCOVERY ORDER

Re: Dkt. No. 267

A. ECF No. 267

1. Issue #4

a. RFPs 64, 77, 45, 46, 53, 54, 59

Plaintiffs move to compel on seven RFPs.

RFP 64: “Documents and Communications sufficient to show each instance within the last three years where You have licensed copyrighted works for Meta’s commercial use.”

The Court agrees with Meta that this RFP is unreasonably overbroad because it seeks information concerning each instance in which Meta licensed a copyrighted work for Meta’s commercial use, regardless of whether the commercial use had anything to do with AI or any issue that is relevant to this case. This would include, for example, licensing a song to use in an advertisement. Plaintiffs’ motion is DENIED as to RFP 64.

RFP 77: “Communications Concerning any licensing copyrighted works that were used to train the Meta Language Models.”

The Court reads this RFP as if there were an “of” between “licensing” and “copyrighted.” Meta argues that with respect to copyrighted textual works, no such licenses exist and thus there is nothing to produce. However, this RFP is not limited to textual works, and although Plaintiffs’ copyrighted works are textual in nature, Meta does not explain why relevant evidence would be limited to textual works. The Court also does not think that “communications concerning” any licensing of copyrighted works are limited to communications that successfully resulted in a license. Communications would also be responsive if they resulted in no license being obtained.

The Court does agree with Meta that the words “that were used to train the Meta Language Models” means the RFP is limited to communications concerning any licensing of copyrighted works that were, in fact, used to train the Meta Language Models. Accordingly, the Court GRANTS Plaintiffs’ motion as to RFP 77 in part and ORDERS Meta to produce responsive documents regardless of whether the communications successfully resulted in a license, and not limited to textual works.

RFP 45: “All Documents and Communications Concerning any licensing, accreditation, or attribution mechanism, or similar tool for crediting, compensating, or seeking consent from owners of copyrighted works that were used to train the Meta Language Models.”

Meta does not dispute the relevance of this information but argues it has no responsive documents. Accordingly, the Court GRANTS Plaintiffs’ motion as to RFP 45. If there really is nothing responsive, then there is nothing for Meta to produce.

RFP 46: “All Documents and Communications sufficient to show Your actual or projected income from the sale or licensing of the Meta Language Models.”

RFP 53: “All Documents and Communications Concerning any income statement, balance sheet, or statement of cash flows, Concerning any of the Meta Language Models.”

Meta does not dispute the relevance of this information and states it is not withholding documents responsive to these RFPs. Accordingly, the Court GRANTS Plaintiffs’ motion as to RFPs 46 and 53. If Meta has already produced the responsive documents, then there is nothing more for Meta to produce.

RFP 54: “All Documents and Communications Concerning any decision by You to not develop an interface for end users to interact with any of the Meta Language Models.”

Other than asserting that the documents sought by RFP 54 are “clearly relevant,” Plaintiffs have not actually explained how responsive documents are relevant to any issue in this case. Plaintiffs’ motion is DENIED as to RFP 54.

RFP 59: “Documents and Communications Concerning the ability of any Meta Language Model to output fictional works.”

Plaintiffs do not explain why documents responsive to this RFP are relevant to this case. Their motion is DENIED as to RFP 59.

b. Time Frame for Document Productions

The parties agree that Meta has limited the relevant time period for document production to January 1, 2022 to the present. Plaintiffs argue that this time frame is too narrow. They say that the proposed class period begins on July 7, 2020 (three years before the Complaint was filed). They also argue that copying or discussions “may have” occurred before 2022. They say “[t]he Court should require Meta to run searches (and produce documents from) as far back as necessary to capture all instances in which Meta copied—or discussed copying—copyrighted data that was used to train Llama, or, at a minimum, as far back as the beginning of the class period.”

The request for Meta to produce documents from “as far back as necessary” to capture relevant conduct is indeterminate. The Court has to specify a time frame. The Court therefore considers the alternative request to expand the time frame to “the beginning of the class period.” That is a request to expand document production by a year and a half. This is the sort of request the Court does not expect to be made on the last day to move to compel concerning existing written discovery, which is when Plaintiffs filed this request.
The date range for document production is something that should be resolved much sooner than that. Plaintiffs have filed a motion to compel 35 days before the close of fact discovery seeking an additional year and a half of document production. That likely cannot be done by the close of fact discovery, and ordering Meta to try threatens to turn the close of fact discovery into a train wreck. The Court continues to be concerned by Plaintiffs’ repeated attempts to seek major expansions of the scope of discovery right near the end of fact discovery. A fire drill in the last month of fact discovery concerning a foundational issue that could have been raised much sooner is not proportional to the needs of the case. Plaintiffs’ motion to expand the time frame for document production is DENIED.

2. Issue #5

a. Llama Source Code

Plaintiffs move to compel additional Llama source code. Meta argues that Plaintiffs have no existing written discovery requests that seek source code, and that such requests were first served on October 9, 2024, and Meta’s responses were due on November 8, 2024, which is the date the joint discovery letter brief was filed.

Plaintiffs’ section of the letter brief does not identify any discovery request to which source code is responsive. A party moving to compel should demonstrate that it asked for the materials in discovery. Plaintiffs have not shown that. Accordingly, their motion to compel is DENIED as to the source code.

b. Llama Training Data

Plaintiffs move to compel on RFPs 1-3 and 7 and rog 1. RFPs 1-3 seek “[t]he Training Data” for Llama 1, 2 and 3.
RFP 7 seeks “[d]ocuments and Communications to, from, or with Library Genesis (aka LibGen) Concerning Training Data.” Rog 1 asks:

Describe in detail the data You have used to train or otherwise develop the Meta Language Models, Including, for each:
a. How You obtained the data, e.g., by scraping the data, purchasing it from third parties, or by other means;
b. All sources of Data, including any third parties that provided data sets;
c. To the extent the data was derived from publicly available websites, a list of all such websites and, for each, the percentage of the data corpus that is derived from that website;
d. The categories of content included in the data and the extent to which each category is represented in the data corpus (i.e., as a percentage of data used to train the model);
e. All policies and procedures Related to identifying, assessing, vetting and selecting sources of data for the model.

Plaintiffs do not present any argument with respect to RFP 7. Meta states that it is not aware of any responsive documents. The Court DENIES the motion to compel as to RFP 7 because the motion is unexplained.

With respect to the training data, Plaintiffs say they need more information about how Meta obtained and used it. Plaintiffs say that if the only issue in the case were the important binary question of whether Plaintiffs’ copyrighted materials were in Meta’s possession in some fashion, the data already produced would answer that question. But Plaintiffs say it is also important that they be permitted discovery that goes to the importance of and breadth of use of the copyrighted protected materials at issue.
For these reasons, Plaintiffs do not believe it is enough for them to have access to the set of training data for Llamas 1-3, and submit they are entitled to information from Meta that identifies the iterations of copies of training data with copyrighted material or books within their possession, custody or control. Plaintiffs seek information on how many times Meta downloaded each copyrighted work, from where it downloaded each, when it downloaded each, and how it is using each copy. To be clear, in light of Meta’s stated burden objection, Plaintiffs are not demanding that Meta produce each iteration of the copies. They would settle for a declaration or an amended answer to rog 1 that provides this information.

Meta has several responses. The major one is that Meta has identified and produced copies of the actual book-related training datasets that allegedly include copyrighted works that were actually used to train the Llama models. Meta argues that Plaintiffs’ request that Meta scour the entire company to determine if there are other stored and duplicative copies of those datasets – copies which would not have been the ones used to train Llama – is overly burdensome and not proportional to the needs of the case.

The Court DENIES Plaintiffs’ motion to compel because RFPs 1-3 and rog 1 (and RFP 7) did not ask for this information. RFPs 1-3 asked for the training data for Llama 1, 2 and 3. If there are other copies of the same copyrighted works in Meta’s possession that were not used to train Llama, the RFPs didn’t ask for those. Similarly, rog 1 asked Meta to “[d]escribe in detail the data You have used to train or otherwise develop the Meta Language Models . . .” This rog didn’t ask about the full scope of what Meta has or does with the copyrighted works. It just asked about the data used to train or otherwise develop the language models.
Plaintiffs do not make any argument that Meta’s response to rog 1 is incomplete or otherwise defective. Plaintiffs argue that they seek “information about how books were used in LLM training and operationalization, and how in turn the LLMs or book corpuses are used by Meta.” Rog 1 did not ask for that. It asked Meta to describe the data it used to train or develop the models, including (a) how Meta got it, (b) all the sources of data, (c) the public websites it got data from, (d) the categories of content included in the data, and (e) the policies and procedures for selecting sources of data. The rog asked for what data Meta used and where and how and why it got it. It did not ask anything about how the data was used in training or operationalization, or how the LLMs or book corpuses are used by Meta. Accordingly, Plaintiffs’ motion is DENIED as to these discovery requests.

IT IS SO ORDERED.

Dated: November 25, 2024

THOMAS S. HIXSON
United States Magistrate Judge
