Students for Fair Admissions, Inc. v. President and Fellows of Harvard College et al
Filing 87: DECLARATION re 86 Memorandum in Opposition to Motion to Compel (McCrary) by President and Fellows of Harvard College. (Ellsworth, Felicia) (Main Document 87 replaced on 7/31/2015) (Montes, Mariliz). (Additional attachment(s) added on 7/31/2015: # 1 Addendum) (Montes, Mariliz).
Case 1:14-cv-14176-ADB Document 87 Filed 07/30/15 Page 1 of 18
UNITED STATES DISTRICT COURT FOR
THE DISTRICT OF MASSACHUSETTS
BOSTON DIVISION
STUDENTS FOR FAIR ADMISSIONS, INC.,
Plaintiff,
v.
PRESIDENT AND FELLOWS OF HARVARD
COLLEGE (HARVARD CORPORATION),
Civil Action No. 1:14-cv-14176-ADB
Defendant.
Declaration of
Justin McCrary, Ph.D.
On Behalf of Defendants
July 30, 2015
I, Justin McCrary, hereby state under penalty of perjury:
I.
INTRODUCTION AND QUALIFICATIONS
1.
I am an economist with expertise in statistical methods, economic modeling, labor
economics, law and economics, and antitrust. I received my A.B. in Public Policy from
Princeton University in 1996. After working at National Economics Research Associates in
White Plains, New York, and the Federal Reserve Bank of New York from 1996-1998, I began
my Ph.D. in Economics at the University of California, Berkeley (“Berkeley”), completing the
degree in June 2003 with field specializations in labor economics and econometrics. I then spent
close to five years as Assistant Professor in the Gerald R. Ford School of Public Policy and the
Department of Economics at the University of Michigan. While at Michigan, I taught
introductory statistics and advanced microeconomic theory to M.P.P. students, and advanced
econometric theory to Ph.D. students. I became an Assistant Professor of Law at Berkeley in
January 2008 and was promoted to Professor in July 2010. While at Berkeley, I have taught
courses on introductory, intermediate, and advanced statistics to J.D. students, L.L.M. students,
and Ph.D. students; on law and economics to J.D. students as well as undergraduates; on
business law to J.D., L.L.M., and M.B.A. students; and on labor economics to Ph.D. students.
2.
In addition to my post as Professor, I am the Founding Director of D-Lab, the Social
Sciences Data Laboratory at Berkeley. At D-Lab, I lecture and advise graduate students and
faculty regarding high-performance computing, statistical software, statistical and econometric
techniques, and research design.
3.
From September 2009 until July 2014, when I began to direct the D-Lab, I co-directed
the Law and Economics Program at Berkeley Law with Bob Cooter and Dan Rubinfeld (2008-2011) and with Bob Cooter and Eric Talley (2012-2014).
4.
Since 2008 I have co-directed the Economics of Crime Working Group of the National
Bureau of Economic Research (“NBER”). The NBER is the preeminent professional association
of economists in the world, with approximately 1,300 members worldwide. I was invited to
become a Faculty Research Fellow of the NBER in 2006 and remained in that position until
2012, when I was invited to become a Faculty Research Associate.
5.
My research spans a diverse range of topics, including econometric and statistical
methodology, education, employment discrimination, antitrust, crime, fertility, financial markets,
income inequality, and monetary policy. Many of my articles have been published in leading
economics, econometrics, and statistics journals, such as the Review of Economics and Statistics
and the Journal of Econometrics. In addition, I have written or co-written three papers that were
published in the top economics journal in the world, the American Economic Review, and have
co-edited a book, Controlling Crime: Strategies and Tradeoffs, published by the University of
Chicago Press. Over the years, my research has been supported by the University of Michigan;
the University of California, Berkeley; the MacArthur Foundation; the NBER; the National
Institutes of Health; the National Science Foundation; and the Robert Wood Johnson Foundation.
6.
I am frequently asked to review articles for leading economics, econometrics, and
statistics journals, including Econometrica, the American Economic Review, the Quarterly
Journal of Economics, the Journal of Political Economy, the Review of Economic Studies, the
Review of Economics and Statistics, and the American Law and Economics Review. Since
coming to Berkeley Law, I have also been asked to comment on empirical papers submitted to
law reviews and to peer-reviewed law journals, including the California Law Review, the Law
and Society Review, the Journal of Law and Economics, and the Journal of Empirical Legal
Studies.
7.
I am currently a signatory to an Intergovernmental Personnel Agreement between the
Equal Employment Opportunity Commission (“EEOC”) and the University of California,
Berkeley. The EEOC has asked me to analyze its data regarding the racial and gender
composition of the workforce of public and private employers. I receive no monetary
compensation for this work.
8.
My consulting experience has spanned a wide range of industries and markets. For
example, I have previously analyzed the extent to which alleged collusive behavior among health
care providers affected prices; the extent of infringing sales in a patent lawsuit pertaining to
pharmaceuticals; the potential anti-competitive implications of a proposed telecommunications
merger; damages associated with an alleged price-fixing conspiracy in the corrugated packaging
industry; damages associated with an alleged price-fixing conspiracy in several prominent high-technology product markets; and damages associated with an alleged price-fixing conspiracy in
the sale of retail gasoline. In addition, I am currently a consultant for the California Attorney
General, tasked with analyzing the data systems maintained by the Attorney General’s office. I
have also been asked to assess the extent to which those data point to differences in criminal
justice outcomes between different racial groups.
9.
Finally, I am frequently asked to speak on the use of statistical methodologies in
empirical legal studies and for the past four summers have given day-long lectures for the week-long Causal Inference Workshop and its more advanced version, the Advanced Causal Inference
Workshop, both organized by Bernie Black (a Professor of Law and Business at Northwestern
University) and Matthew McCubbins (a Professor of Law and Political Science at Duke
University).
10.
A copy of my curriculum vitae, including a list of previous testimony and depositions, is
included as Appendix A.
11.
I am being compensated at my standard billing rate of $750 per hour. I have been
assisted in this matter by staff of Cornerstone Research, who worked under my direction. In
addition to my direct compensation for this work, I receive from Cornerstone Research a portion
of the amount that it bills for work supporting me. Neither my compensation from Defendant
nor my compensation from Cornerstone Research is in any way contingent or based on the
content of my opinion or the outcome of this or any other matter.
12.
The statements made herein are based on my personal knowledge and upon information
made available to me by Defendant’s counsel and by staff of Cornerstone Research who were
working under my direction.
II.
BACKGROUND AND ASSIGNMENT
13.
On July 16, 2015, Plaintiff filed a motion to compel the production of a preliminary
sample of 6,400 application files (“Plaintiff’s Motion”). I understand that the motion was filed
partly in response to Defendant’s proposal to produce a sample of 160 application files in
conjunction with an electronic database containing information about freshman undergraduate
applicants to Harvard. In support of Plaintiff’s Motion, Professor Peter Arcidiacono submitted
an expert declaration (“Arcidiacono Decl.”). In his declaration, Professor Arcidiacono opined
that a sample of 6,400 application files would be necessary in order to evaluate whether
Harvard’s admissions process discriminates against Asian-American applicants.1 Professor
Arcidiacono further opined that Harvard’s proposed sample of 160 application files would be
insufficient.2
1 Arcidiacono Decl. ¶¶ 27-36.
14.
I have been asked to review the Arcidiacono Declaration, Plaintiff’s Motion, and
Harvard’s electronic Admissions Office database (the “Database”) in order to evaluate whether
the Database (in conjunction with a sample of 160 application files) would be sufficient for the
statistical analysis and modeling described by Professor Arcidiacono.
15.
I submit this Declaration in support of Defendant’s Opposition to Plaintiff’s Motion.
III.
SUMMARY OF OPINIONS
16.
I have experience estimating statistical models of discrimination and understand the
extant academic literature on discrimination. I also understand the types of data necessary for
estimating statistical models.
17.
My understanding is that Harvard is proposing to produce detailed information from the
Database, for one or more years, for applicants for freshman admission. That information is
comprehensive and detailed and is sufficient for the statistical analyses and modeling described
by Professor Arcidiacono.
18.
The Database contains data for the full universe of freshman applicants—that is,
approximately 37,000 applicants each year—with several hundred fields of information in the
system. For the year of data that I reviewed, for example, I understand that there are more than
900 fields in the Database. In particular, the Database contains the kinds of information
identified by Professor Arcidiacono as important to the analyses he describes. Moreover, the
information in the Database will be sufficient for that analysis even if Harvard redacts from the
Database information that would directly identify the applicant; the applicant’s family members;
and other third parties such as individual Harvard alumni interviewers, high school teachers and
counselors, and others who recommended the applicant for admission to Harvard.
2 Arcidiacono Decl. ¶¶ 37-41.
19.
Professor Arcidiacono has not provided a reason, and I am not aware of any reason, why
a sample of 6,400 application files, which is a mere subset of the universe or population of all
applicants in the Database, would allow for him to perform the statistical analysis he has
described more reliably than would the full Database. In fact, for the statistical analyses
identified by Professor Arcidiacono, the Plaintiff’s sampling method would be less reliable than
analyzing the universe of applicant information contained in the Database. Furthermore, the
universe or population of all applicants in the Database can be produced at a substantially lower
cost than producing a sample of 6,400 application files.
20.
Moreover, Harvard’s proposal to also produce 160 applicant files (80 selected by SFFA
and 80 selected by Harvard) in addition to the Database is more than sufficient for Professor
Arcidiacono to assess whether the Database contains all the necessary information relevant for
his statistical analyses.
21.
I understand that the Plaintiff has argued that the proposed sample of 6,400 applications
is reasonable because, at about 4 percent of the total population of applications, it is smaller in
size than typical samples produced in comparable cases. From a statistical point of view,
however, it is irrelevant whether or not the requested sample is modest in size relative to samples
produced in other litigation. The Database includes the entire population of freshman applicants
and can be produced at a substantially lower cost than a sample of 6,400 application files. Thus,
the Database is preferable to any sample – regardless of size – for the analysis proposed by
Professor Arcidiacono.
IV.
ANALYZING A POPULATION IS PREFERABLE TO ANALYZING A SAMPLE
OF THE POPULATION
22.
Given Professor Arcidiacono’s focus on the importance of statistical sampling, it is
helpful to start with a brief overview of the purpose of statistical sampling. Any time a
researcher has easy and inexpensive access to data on the whole population, there is no need to
conduct statistical sampling. Typically, as Professor Arcidiacono and I agree, sampling is
performed when collecting data is costly. In such a context, collecting a sample is less costly
than collecting data on the whole population for the simple reason that a sample has fewer
observations than the population. Statistical sampling selects a subset of individuals from a
given population (here, the relevant population is students who applied for freshman admission
to Harvard College) in such a way that the characteristics of the sample are similar to those of the
population.3 A statistical sample can be used in conjunction with statistical assumptions to draw
inferences regarding the population from which the sample is drawn. In summary, statistical
sampling is commonly used because the relevant information for the entire population is not
available or it is too costly to collect information about the entire population.
23.
This fact is recognized by introductory statistics textbooks and treatises on the use of
statistics in legal settings. Professor Arcidiacono and I agree on this point. As he states: “In
statistical analysis, sampling relates to the selection of a subset of individuals from [a] statistical
population to estimate characteristics of the whole population. Analyzing the whole population
is preferable but tends to be costly.”4
3 There are different types of statistical samples, including random samples and stratified random samples, among other types.
4 Arcidiacono Decl. ¶¶ 18-19.
24.
A simple example helps illustrate the point. Consider a population of 100 individuals,
half of whom are men and half of whom are women. A researcher who did not know the fraction
of the population that is female might seek to estimate this fraction using a sample of, say, 30
individuals. Though unlikely, it is possible that a random sample of 30 individuals from this
population would contain 20 women and 10 men. Thus for such a random sample, the
researcher’s best guess about the female proportion of the population would be 66.7%.
25.
Estimates based on random samples are uncertain, in the sense that different random
samples would yield somewhat different estimates of the same underlying quantity: another
random sample of 30 individuals from our population of 100 might contain 16 women (53.3%)
instead of 20 (66.7%). For this reason, estimates derived from analysis of a sample are
inherently uncertain. Statisticians and economists quantify the degree of uncertainty by
reporting what is known as a confidence interval. A confidence interval is a range, such as 0.5 to
0.8, such that there is a quantifiable degree of confidence that the true fraction of the population
who are women is in that range. Such confidence intervals are justified under particular
statistical assumptions.
26.
If one were to analyze the entire population, there would be no uncertainty of the type
described above. The researcher would know that 50% of individuals in the population are
female and there would be no associated confidence interval surrounding the estimate. If
analyzing the entire population is no more difficult, intrusive, or costly than analyzing a sample,
it is always best to analyze the population, because doing so eliminates the uncertainty in
estimating the population statistic of interest.
27.
Thus, to the extent that the analysis contemplated by Professor Arcidiacono can
incorporate information from the full Database of applicants, there would be absolutely no
reason to prefer using a sample of application files, no matter how large.
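The example in paragraphs 24 through 26 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the random seed, the choice of a normal-approximation (Wald) interval, and the use of a finite population correction are assumptions made for illustration and are not drawn from any analysis described in this Declaration.

```python
import math
import random


def sample_proportion(population, n, rng):
    """Share of women (coded 1) in a simple random sample of size n, drawn without replacement."""
    sample = rng.sample(population, n)
    return sum(sample) / n


def wald_interval(p_hat, n, N=None, z=1.96):
    """Normal-approximation interval for a proportion (an assumed, textbook method).

    If the population size N is supplied, the standard error is scaled by the
    finite population correction sqrt((N - n) / (N - 1)).
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return p_hat - z * se, p_hat + z * se


# Hypothetical population from paragraph 24: 100 individuals, half women (1), half men (0).
population = [1] * 50 + [0] * 50
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Different random samples of 30 give somewhat different estimates (paragraph 25).
estimates = [sample_proportion(population, 30, rng) for _ in range(5)]
print("sample estimates:", [round(e, 3) for e in estimates])

# Interval around the estimate from a sample containing 20 women out of 30.
low, high = wald_interval(20 / 30, 30, N=len(population))
print("interval for 20/30:", round(low, 3), round(high, 3))

# Analyzing the whole population (paragraph 26): the share is known exactly,
# and with n equal to N the correction shrinks the interval to a single point.
true_share = sum(population) / len(population)
census_interval = wald_interval(true_share, len(population), N=len(population))
print("population share:", true_share, "interval:", census_interval)
```

The last step mirrors the point of paragraph 26: once every member of the population is observed, the sampling uncertainty that the interval is meant to capture disappears.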
V.
HARVARD’S DATABASE IS SUFFICIENT FOR THE ANALYSIS PROFESSOR
ARCIDIACONO PROPOSES
28.
As noted above, I have experience estimating statistical models of discrimination and am
knowledgeable about the academic literature on discrimination. In my Ph.D. dissertation, I
studied discrimination empirically in the context of litigation against police departments. This
work was later published in the American Economic Review in 2007.5 I also draw on this
expertise in connection with my work for the EEOC and the California Attorney General.
29.
In teaching statistics courses at the University of Michigan and the University of
California, Berkeley, and labor economics courses at the University of California, Berkeley, I
routinely draw on examples of statistical models used to assess the extent to which there is
evidence that might be consistent with discrimination. I also draw extensively on the literature
on discrimination in teaching law and economics at the University of California, Berkeley, and
have supervised Ph.D. student dissertations focusing on discrimination. As an expert in the use
of statistical and econometric methods, I understand the types of data necessary for estimating
statistical models such as those proposed by Professor Arcidiacono.
30.
Professor Arcidiacono proposes using a sample of 6,400 application files to examine
“whether application files are systematically scored differently on the basis of race.”6 He further
elaborates on the analysis that he intends to undertake: “With the raw files in hand, we can code
the various factors Harvard describes as important in determining application subscores (for
example, creating an indicator variable for whether the student was a valedictorian). We can
then use regression analysis to see, for example, whether Asians received lower subscores
conditional on the factors that Harvard describes as important for that subscore.”7 As I discuss
below, the Database is sufficient for this analysis.
5 McCrary, J., “The Effect of Court-Ordered Police Hiring Quotas on the Composition and Quality of Police,” American Economic Review, Volume 97, Number 1, March 2007.
6 Arcidiacono Decl. ¶ 28.
31.
Professor Arcidiacono also discusses using the sample of application files to score more
subjective factors such as “the level and extent of” participation in extracurricular activities, so
that a model predicting admissions decisions can properly account for such factors.8 In making
these arguments, Professor Arcidiacono overlooks a much simpler, and from a statistical
perspective more rigorous, solution: the use of Harvard’s Database.
32.
While I understand that Professor Arcidiacono has not yet had the opportunity to review
the variables included in the Database, I have looked at these variables carefully. The Database
that Harvard proposes to produce contains rich, detailed information on every applicant for
freshman admission to Harvard. In fact, the Database includes several hundred data fields for
each applicant.
33.
Importantly, the Database provides information on the two factors that Professor
Arcidiacono has specifically identified as important.
34.
First, the Database provides information relevant to determining whether an applicant
was the “valedictorian”9 of his or her class. In particular, it includes information regarding the
class rank of some applicants (as I understand it, those for whom their guidance counselor filled
out this information in the application materials), and it includes the applicant’s self-reported list
of high-school honors and achievements.
7 Arcidiacono Decl. ¶ 28.
8 Arcidiacono Decl. ¶ 34.
9 I understand that “valedictorian” may be defined differently for different candidates, because the honor can be defined differently by different secondary schools. Some candidates may have received the honor at the time of applying, and others may expect to receive the honor after applying. My understanding is that if the applicant disclosed among his or her honors that he or she is the valedictorian, that designation is captured in the database. Because many students do not know if they are going to be valedictorian at the time of the application, this information is often not disclosed in the application and therefore not captured in the database.
35.
Second, the Database contains extremely detailed information about each applicant’s
extracurricular activities. For example, for each extracurricular activity that an applicant reports
on his or her application (up to a maximum of twelve), the Database contains fields identifying
the type of activity (for example, School Newspaper/Journalism), the applicant’s role (for
example, Editor-in-Chief), the years during which the applicant was involved in the activity,
whether the applicant’s involvement was year-round or only during the school year, and the
number of hours per week and number of weeks per year that the applicant devoted to the
activity.
36.
These are just two examples of the detailed information available in the Database. There
are additional types of information in the Database that could be relevant for Professor
Arcidiacono’s analyses. For example, the Database includes:
• Numerous measures of academic success while in high school (e.g., GPA, class
rank, scores on the ACT, SAT I, SAT II, and TOEFL, AP courses taken and
scores on AP tests, honors reported by the applicant (such as National Merit
Scholar), and the level of the honor (such as national vs. school));
• Measures of intended areas of focus while at Harvard (e.g., intended major,
intended career, and intended graduate school plans);
• Extensive information on each applicant’s family (e.g., parents’ marital status and
education level, and siblings’ age and education level, and whether the applicant’s
parent(s) and/or siblings attended Harvard College);
• Information about the applicant’s and the applicant’s family’s financial situation,
(e.g., parents’ occupation and employer, whether the applicant applied for
financial aid, and whether they paid their application fee);
• Cultural and demographic information (e.g., race and ethnicity, languages spoken
by the applicant, and proficiency level for each language); and
• Ratings assigned by Harvard admissions officers and alumni interviewers (which
includes admissions officer ratings for academics, extracurricular activities,
athletics, and personal qualities).
37.
Given this extensive record for each applicant, it is my opinion that the Database is
sufficient for the analyses described in the Arcidiacono Declaration and preferable to the
sampling method proposed by the Plaintiff and Professor Arcidiacono. I am not aware of any
reason why a sample of 6,400 application files would allow for the analysis proposed by
Professor Arcidiacono to be conducted more reliably than would Harvard’s Database. In fact,
for the analyses that Professor Arcidiacono has described, his sampling method would be less
reliable than analyzing the full universe of applicant information contained in the Database.
38.
To the extent the Plaintiff would like to ascertain that the Database contains all the
necessary information for their statistical analysis, Harvard’s proposal to also produce 160
applicant files (80 selected by SFFA and 80 selected by Harvard) is more than sufficient to make
this assessment.
I declare under penalty of perjury under the laws of the United States of America that the
foregoing is true and correct. Executed on July 30, 2015.
_________________________
Professor Justin McCrary,
Ph.D.