Kadrey et al v. Meta Platforms, Inc.
Filing 288
Discovery Order re: 267 Joint Discovery Letter Brief. Signed by Judge Thomas S. Hixson on 11/25/2024. (tshlc1, COURT STAFF) (Filed on 11/25/2024)
UNITED STATES DISTRICT COURT
NORTHERN DISTRICT OF CALIFORNIA
RICHARD KADREY, et al.,
Plaintiffs,
v.
META PLATFORMS, INC.,
Defendant.
Case No. 23-cv-03417-VC (TSH)
DISCOVERY ORDER
Re: Dkt. No. 267
A. ECF No. 267
1. Issue #4
a. RFPs 64, 77, 45, 46, 53, 54, 59
Plaintiffs move to compel on seven RFPs.
RFP 64: “Documents and Communications sufficient to show each instance within the last three years where You have licensed copyrighted works for Meta’s commercial use.”
The Court agrees with Meta that this RFP is unreasonably overbroad because it seeks information concerning each instance in which Meta licensed a copyrighted work for Meta’s commercial use, regardless of whether the commercial use had anything to do with AI or any issue that is relevant to this case. This would include, for example, licensing a song to use in an advertisement. Plaintiffs’ motion is DENIED as to RFP 64.
RFP 77: “Communications Concerning any licensing copyrighted works that were used to train the Meta Language Models.”
The Court reads this RFP as if there were an “of” between “licensing” and “copyrighted.” Meta argues that with respect to copyrighted textual works, no such licenses exist and thus there is nothing to produce. However, this RFP is not limited to textual works, and although Plaintiffs’ copyrighted works are textual in nature, Meta does not explain why relevant evidence would be limited to textual works. The Court also does not think that “communications concerning” any licensing of copyrighted works are limited to communications that successfully resulted in a license. Communications would also be responsive if they resulted in no license being obtained. The Court does agree with Meta that the words “that were used to train the Meta Language Models” means the RFP is limited to communications concerning any licensing of copyrighted works that were, in fact, used to train the Meta Language Models. Accordingly, the Court GRANTS Plaintiffs’ motion as to RFP 77 in part and ORDERS Meta to produce responsive documents regardless of whether the communications successfully resulted in a license, and not limited to textual works.
RFP 45: “All Documents and Communications Concerning any licensing, accreditation, or attribution mechanism, or similar tool for crediting, compensating, or seeking consent from owners of copyrighted works that were used to train the Meta Language Models.”
Meta does not dispute the relevance of this information but argues it has no responsive documents. Accordingly, the Court GRANTS Plaintiffs’ motion as to RFP 45. If there really is nothing responsive, then there is nothing for Meta to produce.
RFP 46: “All Documents and Communications sufficient to show Your actual or projected income from the sale or licensing of the Meta Language Models.” RFP 53: “All Documents and Communications Concerning any income statement, balance sheet, or statement of cash flows, Concerning any of the Meta Language Models.”
Meta does not dispute the relevance of this information and states it is not withholding documents responsive to these RFPs. Accordingly, the Court GRANTS Plaintiffs’ motion as to RFPs 46 and 53. If Meta has already produced the responsive documents, then there is nothing more for Meta to produce.
RFP 54: “All Documents and Communications Concerning any decision by You to not develop an interface for end users to interact with any of the Meta Language Models.”
Other than asserting that the documents sought by RFP 54 are “clearly relevant,” Plaintiffs have not actually explained how responsive documents are relevant to any issue in this case. Plaintiffs’ motion is DENIED as to RFP 54.
RFP 59: “Documents and Communications Concerning the ability of any Meta Language Model to output fictional works.”
Plaintiffs do not explain why documents responsive to this RFP are relevant to this case. Their motion is DENIED as to RFP 59.
b. Time Frame for Document Productions
The parties agree that Meta has limited the relevant time period for document production to January 1, 2022 to the present. Plaintiffs argue that this time frame is too narrow. They say that the proposed class period begins on July 7, 2020 (three years before the Complaint was filed). They also argue that copying or discussions “may have” occurred before 2022. They say “[t]he Court should require Meta to run searches (and produce documents from) as far back as necessary to capture all instances in which Meta copied—or discussed copying—copyrighted data that was used to train Llama, or, at a minimum, as far back as the beginning of the class period.”
The request for Meta to produce documents from “as far back as necessary” to capture relevant conduct is indeterminate. The Court has to specify a time frame. The Court therefore considers the alternative request to expand the time frame to “the beginning of the class period.” That is a request to expand document production by a year and a half. This is the sort of request the Court does not expect to be made on the last day to move to compel concerning existing written discovery, which is when Plaintiffs filed this request. The date range for document production is something that should be resolved much sooner than that. Plaintiffs have filed a motion to compel 35 days before the close of fact discovery seeking an additional year and a half of document production. That likely cannot be done by the close of fact discovery, and ordering Meta to try threatens to turn the close of fact discovery into a train wreck. The Court continues to be concerned by Plaintiffs’ repeated attempts to seek major expansions of the scope of discovery right near the end of fact discovery. A fire drill in the last month of fact discovery concerning a foundational issue that could have been raised much sooner is not proportional to the needs of the case. Plaintiffs’ motion to expand the time frame for document production is DENIED.
2. Issue #5
a. Llama Source Code
Plaintiffs move to compel additional Llama source code. Meta argues that Plaintiffs have no existing written discovery requests that seek source code, and that such requests were first served on October 9, 2024, and Meta’s responses were due on November 8, 2024, which is the date the joint discovery letter brief was filed.
Plaintiffs’ section of the letter brief does not identify any discovery request to which source code is responsive. A party moving to compel should demonstrate that it asked for the materials in discovery. Plaintiffs have not shown that. Accordingly, their motion to compel is DENIED as to the source code.
b. Llama Training Data
Plaintiffs move to compel on RFPs 1-3 and 7 and rog 1. RFPs 1-3 seek “[t]he Training Data” for Llama 1, 2 and 3. RFP 7 seeks “[d]ocuments and Communications to, from, or with Library Genesis (aka LibGen) Concerning Training Data.” Rog 1 asks:
Describe in detail the data You have used to train or otherwise develop the Meta Language Models, Including, for each:
a. How You obtained the data, e.g., by scraping the data, purchasing it from third parties, or by other means;
b. All sources of Data, including any third parties that provided data sets;
c. To the extent the data was derived from publicly available websites, a list of all such websites and, for each, the percentage of the data corpus that is derived from that website;
d. The categories of content included in the data and the extent to which each category is represented in the data corpus (i.e., as a percentage of data used to train the model);
e. All policies and procedures Related to identifying, assessing, vetting and selecting sources of data for the model.
Plaintiffs do not present any argument with respect to RFP 7. Meta states that it is not aware of any responsive documents. The Court DENIES the motion to compel as to RFP 7 because the motion is unexplained.
With respect to the training data, Plaintiffs say they need more information about how Meta obtained and used it. Plaintiffs say that if the only issue in the case were the important binary question of whether Plaintiffs’ copyrighted materials were in Meta’s possession in some fashion, the data already produced would answer that question. But Plaintiffs say it is also important that they be permitted discovery that goes to the importance of and breadth of use of the copyrighted protected materials at issue. For these reasons, Plaintiffs do not believe it is enough for them to have access to the set of training data for Llamas 1-3, and submit they are entitled to information from Meta that identifies the iterations of copies of training data with copyrighted material or books within their possession, custody or control. Plaintiffs seek information on how many times Meta downloaded each copyrighted work, from where it downloaded each, when it downloaded each, and how it is using each copy. To be clear, in light of Meta’s stated burden objection, Plaintiffs are not demanding that Meta produce each iteration of the copies. They would settle for a declaration or an amended answer to rog 1 that provides this information.
Meta has several responses. The major one is that Meta has identified and produced copies of the actual book-related training datasets that allegedly include copyrighted works that were actually used to train the Llama models. Meta argues that Plaintiffs’ request that Meta scour the entire company to determine if there are other stored and duplicative copies of those datasets – copies which would not have been the ones used to train Llama – is overly burdensome and not proportional to the needs of the case.
The Court DENIES Plaintiffs’ motion to compel because RFPs 1-3 and rog 1 (and RFP 7) did not ask for this information. RFPs 1-3 asked for the training data for Llama 1, 2 and 3. If there are other copies of the same copyrighted works in Meta’s possession that were not used to train Llama, the RFPs didn’t ask for those. Similarly, rog 1 asked Meta to “[d]escribe in detail the data You have used to train or otherwise develop the Meta Language Models . . .” This rog didn’t ask about the full scope of what Meta has or does with the copyrighted works. It just asked about the data used to train or otherwise develop the language models. Plaintiffs do not make any argument that Meta’s response to rog 1 is incomplete or otherwise defective. Plaintiffs argue that they seek “information about how books were used in LLM training and operationalization, and how in turn the LLMs or book corpuses are used by Meta.” Rog 1 did not ask for that. It asked Meta to describe the data it used to train or develop the models, including (a) how Meta got it, (b) all the sources of data, (c) the public websites it got data from, (d) the categories of content included in the data, and (e) the policies and procedures for selecting sources of data. The rog asked for what data Meta used and where and how and why it got it. It did not ask anything about how the data was used in training or operationalization, or how the LLMs or book corpuses are used by Meta. Accordingly, Plaintiffs’ motion is DENIED as to these discovery requests.
IT IS SO ORDERED.
Dated: November 25, 2024
THOMAS S. HIXSON
United States Magistrate Judge