Disney Enterprises, Inc. et al v. Hotfile Corp. et al
Filing
217
MOTION to Strike and Memorandum of Law of Defendants Hotfile Corporation and Anton Titov to Strike Plaintiffs' Putative "Rebuttal" Report of Dr. Richard Waterman Before the Close of Expert Discovery on January 17, 2012 and Motion for Expedited Briefing and Hearing at the Upcoming Status Conference on January 13, 2012 by Hotfile Corp., Anton Titov. Responses due by 1/26/2012 (Attachments: # 1 Exhibit A, # 2 Exhibit B, # 3 Exhibit C, # 4 Exhibit D, # 5 Exhibit E, # 6 Exhibit F, # 7 Exhibit G, # 8 Exhibit H, # 9 Exhibit I, # 10 Exhibit J, # 11 Exhibit K)(Munn, Janet)
EXHIBIT “A”
UNITED STATES DISTRICT COURT
SOUTHERN DISTRICT OF FLORIDA
CASE NO. 11-20427-WILLIAMS/TURNOFF
DISNEY ENTERPRISES, INC.,
TWENTIETH CENTURY FOX FILM CORPORATION,
UNIVERSAL CITY STUDIOS PRODUCTIONS LLLP,
COLUMBIA PICTURES INDUSTRIES, INC., and
WARNER BROS. ENTERTAINMENT INC.,
Plaintiffs,
v.
HOTFILE CORP., ANTON TITOV, and
DOES 1-10.
Defendants.
HOTFILE CORP.,
Counterclaimant,
v.
WARNER BROS. ENTERTAINMENT INC.,
Counterdefendant.
RULE 26(a)(2)(B) REPORT OF DR. RICHARD WATERMAN
1.
My name is Richard Waterman and I am an Adjunct Professor of Statistics at The
Wharton School at the University of Pennsylvania, and the President and Co-Founder of
Analytic Business Services, Inc., a consultancy focused on providing expert advice and opinions
in the field of statistical analysis. I received my Ph.D. in Statistics from the Pennsylvania State
University in 1993. I have substantial experience designing and reviewing sampling protocols
for various large organizations, such as the United States Postal Service, for whom I designed
and analyzed a national multi-stage sample for the estimation of operational characteristics. I
have designed sampling protocols involving various filesharing technologies, specifically
BitTorrent, Gnutella and Usenet. I also have substantial experience in designing sampling
protocols in the private sector, and have developed market research studies for numerous large
corporate clients, which typically involve issues related to sampling. Further details of my
professional history, including a list of publications I have authored during the last ten years, can
be found on the resume attached as Exhibit A. Within the last four years, I have testified as an
expert at trial or deposition in the following cases, as further outlined in Exhibit B: Arista
Records LLC, et al. v. Lime Group LLC, et al., No. 06-Civ. 05936 (S.D.N.Y.); Columbia
Pictures Industries, Inc. et al. v. Gary Fung, No. 06-CV-5578 (C.D. Cal.); and Schappell v.
GEICO Corporation, No. 1333 S2001 (Pa. Commw. Ct.). I have submitted expert reports in
Columbia Pictures Industries, Inc. et al. v. Gary Fung, No. 06-CV-5578 (C.D. Cal.); Arista
Records LLC, et al. v. Usenet.com, Inc., No. 07-CV-08822 (S.D.N.Y.); Schappell v. GEICO
Corporation, No. 1333 S2001 (Pa. Commw. Ct.); Freedom Medical Supply, Inc. v. PMA Capital
Ins. Co., No. 003988 (Pa. Commw. Ct.); and Blehm v. Albert Jacobs et al., No. 1:09-cv-02865-RPM,
in the United States District Court for the District of Colorado. I am being compensated for my services in this
case at a rate of $450/hour ($550/hour for testimony).
2.
I have been asked by the plaintiffs to create a protocol for drawing a statistically
reliable sample for a study analyzing the percentage of files downloaded daily that were
identified as infringing from the website operated by the defendants, www.hotfile.com
("Hotfile").
3.
I devised a methodology, described in more detail below, to allow for a
scientifically reliable and unbiased sample of files to be selected from the population of interest.
After the sample had been drawn and the content obtained if available, an analysis of those files
was conducted by a copyright analyst, Mr. Scott Zebrak, supervising a team in aid of his
analysis. Mr. Zebrak's report describing the process he followed and his analysis is attached
hereto as Exhibit C. For the determination of the copyright infringement status of each file in the
sample, I relied on the work and conclusions of Mr. Zebrak.
4.
Based on the analysis of the content files in the sample, I performed statistical
analyses to derive the results for the infringement study. Those results are presented below,
beginning with a summary of my opinions and conclusions and followed by a description of the
study, the sampling protocol and analyses, and the bases and reasons for my opinions and
conclusions. In general, in reaching my opinions and conclusions, I relied upon my specialized
knowledge, education, and experience as applied to the facts and data discussed below, as well as
data about downloads from Hotfile produced by defendants, and the work and conclusions of Mr.
Zebrak. The exhibits I may use as a summary of or in support for my opinions are attached
hereto or are being produced concurrently with this Report.
5.
Based upon my review of the most recent data provided by Mr. Zebrak,
approximately 90.3% of all daily downloads of files on Hotfile were downloads of infringing or
highly likely infringing content; approximately 5.4% of the downloads of files per day on Hotfile
were downloads of non-infringing or highly likely non-infringing files; and the remaining
approximately 4.3% of the downloads of files per day on Hotfile were downloads of files whose
copyright status could not be reliably determined in the time allowed. Of the works classified as
non-infringing, 0.5% were identified as being likely illegal to distribute, making the infringement
analysis here conservative. This analysis was based on data showing downloads of files that was
provided by Hotfile.
6.
The following describes the processes I used to design the sampling protocol and
select the sample for the study:
7.
The first step in devising the sampling protocol was to define the relevant
population of interest from which the sample would be extracted, and to ensure the population
was accurately represented in the sampling frame. Since the objective of the Hotfile study was
to analyze the daily percentage of downloads of files from Hotfile that were of infringing files,
the population of interest consists of downloads of files from Hotfile in a specified time prior to
the complaint, January 2011.
8.
While defendants did not produce actual log data for the period before February
2011, they did produce a data table called "dailydownload" that efficiently summarizes all the
necessary information that would be found in a log file to enable an infringement analysis of the
recorded downloads. My understanding is that this table identifies files that were downloaded on
a specific day (represented in the "uploadid" field), the date of download (represented in the
"date" field), and the number of "premium" and "free" downloads of the files (represented in the
"premium" and "free" fields). My understanding is that "premium" and "free" downloads are
downloads by different kinds of users: those who have purchased Hotfile Premium
subscriptions, and those that have not, respectively. Adding the two together gives the number
of recorded downloads per day for the file on the indicated date. Thus, the "dailydownload" data
contains a summary of information of recorded downloads by file for any particular day.
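As an illustration only of the arithmetic described in paragraph 8, the following Python sketch adds the "premium" and "free" fields to obtain the recorded downloads per file and per day. The field names come from the description above; the row values and the function names are hypothetical and are not taken from the data produced by defendants.

```python
from collections import defaultdict

# Hypothetical rows shaped like the "dailydownload" table described above:
# (uploadid, date, premium, free). The values are invented for illustration.
rows = [
    (101, "2011-01-05", 3, 7),
    (102, "2011-01-05", 0, 2),
    (101, "2011-01-11", 1, 4),
]


def downloads_per_file(rows):
    """Recorded downloads of each file on each date: premium + free."""
    return {(uploadid, date): premium + free
            for uploadid, date, premium, free in rows}


def downloads_per_day(rows):
    """Total recorded downloads of all files on each date."""
    totals = defaultdict(int)
    for _uploadid, date, premium, free in rows:
        totals[date] += premium + free
    return dict(totals)


per_file = downloads_per_file(rows)
per_day = downloads_per_day(rows)
```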
9.
To understand the level of infringing activity on Hotfile prior to filing the
complaint I looked at the month of activity prior to the complaint filing, January 2011. In order
to understand the number of downloads per day in this month, I looked at different random days
in the month, and took a sample of downloads from each of those days. I designed the protocol
to randomly select five weekdays and two weekend days.
10.
In the first step of the protocol, I randomly selected five weekdays and two
weekend days, by consecutively assigning each weekday in January 2011 a number and
consecutively assigning each weekend day in January 2011 a number. I then used a standard
random number generator to generate a separate list of numbers for the set of weekdays and the
set of weekend days. This is a standard and universally accepted means to generate a simple
random sample. The days selected by this process were January 5, 11, 20, 21, and 24
(weekdays) and January 1 and 30 (weekend days).
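The day-selection step in paragraph 10 can be sketched in Python as follows. This is a minimal sketch, not the procedure actually run: the report does not identify the random number generator or any seed, so the function names, the use of Python's random module, and the seed parameter are all assumptions, and the sketch is not expected to reproduce the dates listed above.

```python
import calendar
import random


def split_days(year, month):
    """Return (weekdays, weekend_days) as lists of day-of-month numbers."""
    n_days = calendar.monthrange(year, month)[1]
    weekdays, weekend_days = [], []
    for day in range(1, n_days + 1):
        # calendar.weekday returns 0 (Monday) through 6 (Sunday)
        if calendar.weekday(year, month, day) < 5:
            weekdays.append(day)
        else:
            weekend_days.append(day)
    return weekdays, weekend_days


def select_days(year, month, n_weekdays, n_weekend_days, seed=None):
    """Draw separate simple random samples of weekdays and weekend days."""
    rng = random.Random(seed)  # seed is a hypothetical parameter
    weekdays, weekend_days = split_days(year, month)
    return (sorted(rng.sample(weekdays, n_weekdays)),
            sorted(rng.sample(weekend_days, n_weekend_days)))


weekdays, weekend_days = split_days(2011, 1)
picked_weekdays, picked_weekend_days = select_days(2011, 1, 5, 2, seed=0)
```

Under this sketch, January 2011 contains 21 weekdays and 10 weekend days, and each of the dates reported above falls in the list its label indicates.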
11.
Overall, the "dailydownload" table shows 145,691,820 downloads of files from
Hotfile in the month of January 2011. On each date selected, the "dailydownload" table shows
the number of recorded downloads of files per day. The combination of the "free" and
"premium" downloads per day for the selected days were as follows:
Date            Download Count
2011-Jan-01     4,180,329
2011-Jan-05     4,677,811
2011-Jan-11     4,568,087
2011-Jan-20     4,496,274
2011-Jan-21     4,631,944
2011-Jan-24     4,738,937
2011-Jan-30     5,125,537
12.
Within each selected day, the sample frame was obtained by taking the
dailydownload data and expanding the record of each file to capture the total number of recorded
downloads of that file on that day. For example, if a file was downloaded 5 times in a day, the
record would be expanded to reflect five separate downloads of that file. This method permits
simple random sampling of the complete set of recorded downloads of all files in a day. The
sample size was selected to obtain a 95% confidence interval with a margin of error of plus or
minus 5%. (Because of the consistency of daily download infringement proportions, the final
margin of error of the study was considerably smaller.) This allows for a high level of
confidence that the results of the study reflect the percentage of infringing downloads per day for
any day in the entire population, together with a high level of precision. To target this level of
precision, I concluded that the Hotfile sample size should be 1750 (250 per day), which is also
consistent with sample sizes in other similar online infringement studies conducted in other
cases.
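The expansion described in paragraph 12 can be illustrated without building a list of millions of entries. The Python sketch below treats each recorded download as a numbered position, draws distinct positions at random, and maps each position back to its file, which gives every download the same chance of selection as a literal expansion would. The function name, the input layout, and the seed parameter are assumptions made for illustration; the report does not describe the software actually used.

```python
import bisect
import itertools
import random


def sample_downloads(file_counts, sample_size, seed=None):
    """Simple random sample of individual downloads for one day.

    file_counts: list of (uploadid, downloads_that_day) pairs.
    Returns a list of uploadids, one entry per sampled download.
    """
    rng = random.Random(seed)  # seed is a hypothetical parameter
    ids = [uploadid for uploadid, _ in file_counts]
    # cumulative[i] is the number of downloads belonging to files 0..i
    cumulative = list(itertools.accumulate(count for _, count in file_counts))
    total = cumulative[-1]
    # Each download is a position 0..total-1; draw distinct positions.
    positions = rng.sample(range(total), sample_size)
    # Map each position back to the file whose block of positions contains it.
    return [ids[bisect.bisect_right(cumulative, pos)] for pos in positions]


# Invented example: file "a" was downloaded 5 times, "b" twice, "c" 3 times.
example = sample_downloads([("a", 5), ("b", 2), ("c", 3)], 4, seed=1)
```

A file downloaded five times occupies five positions, so it is five times as likely to appear in the sample as a file downloaded once.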
13.
I used "simple random sampling" to draw the sample within each day. "Simple
random sampling" is a universally accepted statistical methodology in which each item has the
same opportunity to be chosen as any other item. In this case, each download of a file in a
particular day had the same chance to be chosen as any other download of any file within that
day. For each day, I used a standard random number generator to generate a list of numbers to
select the downloads that constitute the sample. This too is a standard and universally accepted
means to generate a simple random sample.
14.
I am attaching herewith as Exhibit D the download instructions that implement
the sampling protocol I have described in the foregoing for the Hotfile study. The protocol
provides for replacement of files in the sample under only limited circumstances. First, if the file
appeared by its metadata to contain child or other illegal pornography, it was not included in the
sample. Second, if the content file was corrupt, inoperable, or unplayable/undisplayable, for
reasons other than being password-protected or encrypted, it was not included in the sample. In
those cases, the files were replaced in the sample by another randomly selected file according to
the protocol.
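One way a replacement rule of the kind described in paragraph 14 could operate is sketched below in Python. Exhibit D is not reproduced here, so this is an assumption about the mechanics rather than a statement of the actual instructions: candidates are visited in random order, an excluded candidate is skipped, and the next randomly ordered candidate takes its place until the target sample size is reached. The function name and the is_excluded callback are hypothetical.

```python
import random


def draw_sample_skipping_excluded(candidates, sample_size, is_excluded, seed=None):
    """Take candidates in random order, skipping any that meet an exclusion rule."""
    rng = random.Random(seed)  # seed is a hypothetical parameter
    order = list(candidates)
    rng.shuffle(order)  # random order over all candidate downloads
    kept = []
    for item in order:
        if is_excluded(item):
            continue  # the next randomly ordered candidate replaces this one
        kept.append(item)
        if len(kept) == sample_size:
            break
    return kept


# Invented example: even-numbered candidates stand in for excluded files.
sample = draw_sample_skipping_excluded(range(20), 5, lambda x: x % 2 == 0, seed=2)
```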
15.
Mr. Zebrak provided an analysis showing his conclusions as to which of the 1750
sample files analyzed were determined to be either confirmed or highly likely copyright
infringing, with the result broken down by download date. He also provided information as to
which files he classified as highly likely or confirmed non-infringing, those "unknowable" files
as to which no determination could be made, and "illegal" files that did not appear to be
copyright infringing but that Mr. Zebrak concluded were likely illegal to distribute for other
reasons. The infringement determinations of each download by day are itemized in the attached
Exhibit E.
16.
Based upon my review of the most recent data provided by Mr. Zebrak, by doing
the calculations described above, I am able to conclude that approximately 90.3% of all daily
downloads of files on Hotfile were downloads of infringing or highly likely infringing content;
approximately 5.4% of the downloads of files per day on Hotfile were downloads of non-infringing
or highly likely non-infringing files; and the remaining approximately 4.3% of the
downloads of files per day on Hotfile were downloads of files whose copyright status could not
be reliably determined in the time allowed. Of the works classified as non-infringing, 0.5% (nine
files in the study) were identified as being likely illegal to distribute.
17.
Using standard and universally accepted statistical methods to calculate a margin
of error at a 95% confidence level yields a margin of error for this study of approximately 1.3%.
This indicates a high level of reliability.
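For orientation, the textbook normal-approximation formula for the margin of error of a proportion from a simple random sample is z * sqrt(p(1 - p)/n). The report does not state which formula was applied, so the Python sketch below is an assumption, not a reproduction of the calculation; it ignores the day-by-day structure of the sample. With p = 0.903 and n = 1750 it gives roughly 1.4%, the same order of magnitude as the approximately 1.3% reported above.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def margin_of_error(p, n, z=Z_95):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)


def required_sample_size(margin, p=0.5, z=Z_95):
    """Smallest n whose margin of error is at most `margin`; p = 0.5 is the worst case."""
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


observed = margin_of_error(0.903, 1750)       # about 0.0139
worst_case = margin_of_error(0.5, 1750)       # about 0.0234
n_for_5_percent = required_sample_size(0.05)  # 385
```

Because the observed proportion is far from 50%, the margin at n = 1750 is well below the worst-case figure, which is consistent with the statement in paragraph 12 that the final margin of error was considerably smaller than the plus or minus 5% target.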
18.
In my professional opinion, the sampling procedures used in the Hotfile study are
based on standard and universally accepted statistical methods, and provide a scientifically valid
sample from which we can reliably estimate the incidents of copyright infringement through the
Hotfile website.
19.
I continue to consider additional statistical analyses that might be conducted with
additional data and/or time, including as to files that may have been uploaded to Hotfile but not
downloaded, and reserve the right to supplement this report based on such further analyses. I
further reserve the right to supplement or modify this report based on additional information that
may come to light or based on further analyses.
Dated: November __, 2011
Richard Waterman, Ph.D.
UNITED STATES DISTRICT COURT
SOUTHERN DISTRICT OF FLORIDA
CASE NO. 11-20427-WILLIAMS/TURNOFF
DISNEY ENTERPRISES, INC.,
TWENTIETH CENTURY FOX FILM CORPORATION,
UNIVERSAL CITY STUDIOS PRODUCTIONS LLLP,
COLUMBIA PICTURES INDUSTRIES, INC., and
WARNER BROS. ENTERTAINMENT INC.,
Plaintiffs,
v.
HOTFILE CORP., ANTON TITOV, and
DOES 1 - 10.
Defendants.
HOTFILE CORP.,
Counterclaimant,
v.
WARNER BROS. ENTERTAINMENT INC.,
Counterdefendant.
CERTIFICATE OF SERVICE
I HEREBY CERTIFY on this 18th day of November, 2011, I served the following
document on all counsel of record on the attached service list via the Court's CM/ECF filing
system:
RULE 26(a)(2)(B) REPORT OF DR. RICHARD WATERMAN
I further certify that I am admitted to the United States Court for the Southern District of Florida
and certify that this Certificate of Service was executed on this date.
By: Duane C. Pozza