The Authors Guild, Inc. et al v. Hathitrust et al
DECLARATION of John Wilkin in Support re: 100 MOTION for Summary Judgment. Document filed by Hathitrust. (Attachments: # 1 Exhibit A (Part 1 of 2), # 2 Exhibit A (Part 2 of 2), # 3 Exhibit B-C)(Petersen, Joseph)
KILPATRICK TOWNSEND & STOCKTON LLP
Joseph Petersen (JP 9071)
Robert Potter (RP 5757)
1114 Avenue of the Americas
New York, NY 10036
Telephone: (212) 775-8700
Facsimile: (212) 775-8800
Joseph M. Beck (admitted pro hac vice)
W. Andrew Pequignot (admitted pro hac vice)
Allison Scott Roach (admitted pro hac vice)
1100 Peachtree Street, Suite 2800
Atlanta, Georgia 30309-4530
Telephone: (404) 815-6500
Facsimile: (404) 815-6555
Attorneys for Defendants
UNITED STATES DISTRICT COURT
SOUTHERN DISTRICT OF NEW YORK
THE AUTHORS GUILD, INC., ET AL.,
Case No. 11 Civ. 6351 (HB)
HATHITRUST, ET AL.,
DECLARATION OF JOHN WILKIN IN SUPPORT OF
DEFENDANTS’ MOTION FOR SUMMARY JUDGMENT
I, John Wilkin, pursuant to 28 U.S.C. § 1746, hereby declare as follows:
I am an Associate University Librarian at the University of Michigan
(“Michigan”) and also serve as the Executive Director of HathiTrust, which, as is explained in
more detail below, is the name of a service provided by Michigan. I submit this declaration in
support of the defendant libraries’ (the “Libraries”) motion for summary judgment. Unless
otherwise noted, I make this declaration based upon my own personal knowledge or, where
specifically noted, based upon Michigan’s business records.
As Associate University Librarian for Library Information Technology, I am
responsible for, among other things, the online catalog and related technologies for the
University Library, which is physically spread over numerous buildings and individual libraries
(collectively, the “University Library”).1
My duties include ensuring the University Library has the necessary technological
infrastructure and networked systems to support the library’s core mission and services. I have
served in this role since June 1, 2002.
On June 1, 2012, I assumed responsibility for many of Michigan’s publishing
activities, including the University of Michigan Press and digital publishing operations.
Prior to my current role and responsibilities, I served as the Head of the Digital
Library Production Service at Michigan, a position that I held since 1996. In that role I was
responsible for campus- and internet-wide Michigan-hosted digital collection services.
I have served in various roles in Michigan’s library since graduating with a
Master’s in Library Science from the University of Tennessee in 1986, with the
exception of 1992 through 1994, during which time I served as Systems Librarian
for Information Services at the University of Virginia.
I have served as the Executive Director of HathiTrust since 2008. In that role, I
am responsible for the service’s operations, development, budget and the measures taken to
ensure the security of the works within the HathiTrust digital library.
As used in this declaration, and unless otherwise noted, the term “University Library” does not include certain
other libraries at Michigan including Bentley Historical Library, Clements Library, Kresge Business Administration
Library, Law Library, Thompson Library (Flint) and Mardigian Library (Dearborn), among others.
The University Library
Consistently ranked as one of the top ten academic research libraries in North
America, the University Library, which, as a part of a non-profit organization dedicated to
learning, is open to students, faculty and the general public, makes available an extraordinary
array of resources and services.
The University Library holds more than 8 million bound volumes housed in a
number of physical locations across the Michigan campus. All of the various libraries at
Michigan, including the law and business school libraries, hold more than 11 million volumes.
Last year, in fiscal year 2011, the University Library hosted nearly 4 million
The Core Function of Academic Libraries Such as the University Library
In order to place HathiTrust in context, it helps to have some background
regarding the basic functions of the University Library, indeed all academic libraries:
We buy works for academic and scholarly pursuits;
We curate, maintain, and preserve those works;
We help scholars and students identify works pertinent to their pursuits; and
We make those works available and accessible consistent with applicable law.
We have been performing these functions for nearly 175 years.
Acquisition of Works
Academic libraries such as the University Library acquire works to satisfy
anticipated future demand by University Library patrons. When a work is requested by many
patrons, and we find ourselves maintaining a waiting list for that work, we will often try to
purchase additional copies.
Last year, in fiscal year 2011, Michigan’s libraries spent more than 24 million
dollars on library materials and the vast majority of these sums were spent acquiring new works.
Although state appropriations for the university consistently decrease, the University Library’s
spending for acquisitions continually increases.
We spend millions of dollars each year on obtaining access to electronic
resources: we spend approximately 12 million dollars2 per year in order to acquire the right to
display the full text of works (most of which are in-copyright) to library patrons.
While I discuss this point in greater detail below, it bears emphasis that our
digitization efforts, including those associated with the HathiTrust Digital Library (“HDL”), do
not diminish our acquisitions of in-copyright material (digital or otherwise). Each year, we
spend millions of dollars to license the right to display the text of copyrighted works and to
acquire new books. Moreover, no portion of in-copyright materials is displayed to patrons
through the HDL (except to students, faculty and staff with certified print disabilities—please
see Section J, ¶¶ 100-106, below for a description of this service). In other words, the HDL is
not a substitute, in any respect, for our acquisitions of in-copyright material (whether print or
There are a number of reasons why academic libraries spend such enormous sums
on acquisitions every year. Academic libraries acquire works not just for current students and
faculty, but also for future generations.
Librarians cannot reliably predict which works may be of scholarly interest in ten,
fifty, or one hundred years. This is one reason why the University Library acquires an
extraordinarily broad range of materials on every conceivable subject.
This figure includes expenditures by all campus libraries because many such licenses are jointly negotiated or
funded for the campus as a whole.
Indeed, it is not unusual to hear scholars express pleasant surprise (and relief) to
discover that we have a particular work in our collection. Such statements are a testament to our
efforts to acquire and preserve a breathtaking number of works. We do so because of the mere
possibility that a particular work on a seemingly obscure topic may be valuable to a future
student or scholar.
The imperative that academic and research libraries acquire a broad range of
material for future scholarship is magnified by the fact that the library can have no assurance that
a work will remain available after it is first published. Indeed, most works go out of print after
the initial print run and once that print run is sold out, it can be difficult if not impossible to
obtain copies of the work.
As a result, academic libraries typically acquire works very shortly after they are
published—even before a definite scholarly need has surfaced—and they need to purchase a
sufficient number of copies of each work to accommodate anticipated user demands; otherwise,
the library may not be able to buy the work later. This is particularly true for books published
outside of the United States, for example in developing countries, and most journal issues are out
of print soon after the initial issue is distributed.
Books, in their physical form, are inherently subject to damage, deterioration and
loss. This is particularly true for books published between 1850 and 1990—approximately three-quarters of our entire collection—because books published during this time period were
generally published on paper with high acid content.
Paper with high acid content degrades far more quickly than paper with low acid
content. This is because the fibers that comprise paper degrade when acid meets the moisture in
the air. In what is referred to as an “acid hydrolysis reaction,” the paper fibers are repeatedly
split into smaller fragments so long as the source of acid remains in the paper. This process, in
fact, produces more acid and the degradation accelerates in a downward spiral.
As a result, books that are more than 160 years old—that typically were published
on rag cotton paper, which is relatively more durable—are usually in far better condition than
books that are less than 50 years old.
As of 2004, Michigan estimated that about half of its collection—approximately
3.5 million books—was printed on paper with high acid content, i.e. on paper that is particularly
vulnerable to deterioration and, ultimately, loss.
Prior to the digitization project at issue in this proceeding, the process of
searching the University Library’s immense collections to identify deteriorating books took so
long that, by the time we identified the most imperiled books from the millions potentially at risk,
it was often too late and the books had disintegrated or were unusable.
Our earliest, independent efforts to preserve deteriorating books through
digitization were also severely limited by the length of time it took us to digitize them. Indeed,
books were deteriorating so rapidly that, even if we could have instantly identified all of the
books in our vast collections that were on the brink of deterioration (as noted above, this is an
impossible task), we still could not have digitized the collection in time to preserve the content of
Indeed, the University Library was the industry leader in the average number of
works digitized per year. However, we were only capable of independently digitizing
approximately 5,000 books per year, which was but a small fraction of the imperiled works
within our collection.
In other words, prior to asking Google to digitize our collections, we were losing
the race to save significant portions of the library’s works.
Gradual disintegration is not the only threat to books in academic libraries.
Loss from theft, vandalism, fire, and floods presents an ever-looming threat.
Hurricane Katrina devastated Tulane University’s Howard-Tilton Memorial
Library. I understand from published reports that its basement floor (approximately the size of a
football field) flooded with over eight feet of water, destroying 90% of the 500,000 volumes in
one of the library’s collections.
Library destruction has not been limited to acts of nature. The most famous
example of a loss of a library is probably the destruction of the Library of Alexandria, but
millions of volumes have been destroyed in libraries during the World Wars, and the collection
of the National Library in Sarajevo lost over 1 million volumes due to shelling in the 1990’s.
Helping Scholars Identify Works of Potential Interest
Importantly, libraries aid scholars in the identification of relevant works. The
immense collections housed by academic libraries such as the University Library would be
significantly diminished without reliable and efficient search methods and related technology.
Until relatively recently, most searches of a library’s collection relied on a
physical card catalog. Each card contained limited information concerning a particular work,
including its title, author, publication date and publisher and limited information concerning the
work’s subject matter.
In order to automate cataloging, libraries began to share the work of creating
bibliographic description in the 1960’s. As part of these efforts, libraries created the Research
Libraries Group and the Online Computer Library Center (OCLC), a non-profit organization that
developed and maintained WorldCat, the largest online public access catalog (OPAC) in the
world. This paved the way for the creation of online catalogs in the 1970’s.
While converting the physical card catalog to a digital one empowered users to
perform broader searches, those searches were still limited to the work’s basic bibliographic
data, such as author, title, subject, etc., as illustrated in the following screen shot from
Michigan’s online catalog:
Even with the advent of the online catalog such as depicted above, the actual
content of the works remained closed to searches. Accordingly, a work that contained
information of great importance to a researcher would not be discoverable by that researcher
unless the work’s title, subject headings, or other limited bibliographic data happened to contain
certain key words or other evidently pertinent information.
Making Works Available Pursuant to Our Understanding of
One of the most basic functions of the University Library—indeed all academic
and research libraries—is lending books and other materials to patrons. Further, the University
Library makes works available in a variety of other ways:
We reproduce and distribute works that are in the public domain;
We make copies of works in order to provide equal access to those works to
students, faculty, and staff who have certified print disabilities; and
We reproduce and distribute works pursuant to Section 108 of the Copyright Act
in the event that a work is lost, damaged, deteriorating, or stolen and a
replacement copy cannot be obtained by the University Library at a fair price.
The University Library’s Early Digitization Efforts
Starting in the mid 1980’s, the University Library, like many leading academic
and research libraries, began investing in the equipment necessary to convert works from print to
We took this significant step because we recognized that, in the decades to follow,
basic library functions would increasingly require computing technology.
For example, as summarized above, one of the most critical missions served by
libraries such as the University Library is the preservation of works for future generations. It
was for this reason that in the late 1980’s we began converting at-risk materials to digital format.
We knew that by digitizing such works we were ensuring that they would be available for future
scholarly pursuits even in the event that the work in physical form was lost and we could not find
a replacement copy at a fair price.
As noted above (¶¶ 22-29), while we sought to preserve at-risk works through
digitization, we found that given the enormous size of our collections we could not digitize and,
thereby, preserve deteriorating works quickly enough. In fact, during this time period I
understand that we lost irreplaceable volumes which, as a result, have vanished from the
academic and cultural landscape.
The University Library’s early efforts at digitization also allowed for an
increased, though still very limited, number of works to be made more readily accessible to those
with certified print disabilities, and allowed for some improved search functionality across the
digitized works. A truly comprehensive solution, however, required large-scale digitization that
the University Library could not possibly accomplish on its own.
Google’s Involvement in Michigan’s Digitization Efforts
Prior to Google’s involvement in our digitization efforts, at our then rate of
scanning, it would have taken us more than 1,000 years to digitize the University Library’s then
over 7 million volumes.
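The scale of this estimate can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that the "then rate of scanning" corresponds to the approximately 5,000 books per year stated earlier in this declaration and that the rate stays constant. The function name is hypothetical.

```python
# Illustrative arithmetic only. Both figures are the approximate ones stated
# in this declaration; treating the rate as constant is an assumption.
VOLUMES = 7_000_000      # "over 7 million volumes"
BOOKS_PER_YEAR = 5_000   # approximate independent digitization rate


def years_to_digitize(volumes: int, books_per_year: int) -> float:
    """Years needed to scan `volumes` at a constant `books_per_year`."""
    if books_per_year <= 0:
        raise ValueError("books_per_year must be positive")
    return volumes / books_per_year


years = years_to_digitize(VOLUMES, BOOKS_PER_YEAR)
print(f"{years:,.0f} years")  # 1,400 years
```

Under these assumptions the result is about 1,400 years, which is consistent with the statement that the work would have taken more than 1,000 years.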
In 2002, we began speaking with Google about its interest in digitizing
Michigan’s entire library collections in less than a decade.
In late 2004, we entered into an agreement with Google under which Google
would convert hardcopy books from Michigan Library collections to a digital format and provide
digital copies of those books to Michigan. Attached as Exhibit A is a true and correct copy of
Michigan’s current agreement with Google concerning this project.
In return for giving Google access to books in the University Library collection,
Google was required to give the University Library a digital copy of the works digitized by
Google. We bargained for this right because it was important to us that we had the right to
control our own uses and satisfy one of our primary missions of providing specialized services to
the blind or other persons with disabilities. We knew that by maintaining control over our own
digitized copies of our collection we could ensure that students and faculty with print disabilities
had access to works within the HDL on par with their non-disabled peers.
Our aim in working with Google was to digitize as much of the University
Library as possible.3 If we digitized only select portions of our collections we would not have
accomplished our goals.
For example, if we limited the project solely to works known to be in the public
domain, we would have continued to lose books presumed to be in-copyright to inevitable
disintegration and decay and, potentially, to theft, vandalism and natural disaster or calamity.
Further, digitization held the promise of providing students and scholars with
print disabilities immediate access to our print collections on par with the access afforded other
library patrons. That promise would have been largely unrealized if we had limited digitization
to the public domain.
Finally, from the very outset of the project our goal was to offer scholars a better,
more comprehensive way to search for and discover pertinent works within the collection. If we
only allowed such searches over the portion of works known to be in the public domain, roughly
75% of the library’s collections would have been excluded. A search tool that excluded 75% of
our collections would be of significantly less value to students and scholars seeking to identify
the works most relevant to them.
While Michigan was the first academic library to work with Google in connection
with what would become the “Google Book Project,” it is my understanding that Google
ultimately partnered with a number of other universities and research libraries. For example, I
am aware that in addition to the defendants named in this lawsuit, Google worked with such
universities as Harvard University, Stanford University, Oxford University, Columbia
University, Princeton University, the University of Virginia, and the University of Texas at
Austin, among others. The benefits to society—in preserving books, making them accessible to
people with print disabilities, and enabling people to find them—increased significantly with
each institution that digitized books from its collections.
In certain instances, rights holders availed themselves of an opt-out program offered by Google in connection with
its digitization of works. In those situations, our digital collection does not include such works.
The Formation of HathiTrust
In 2008, Michigan formed HathiTrust, named for the Hindi word for elephant,
“hathi.” The “hathi” prefix is intended to evoke the qualities of memory, wisdom, and strength
symbolized by elephants.
The concept underlying the formation of HathiTrust was (and is) simple. It makes
no economic or functional sense for each research library to maintain its own digitized collection
of works. Rather, we believe that by working together and pooling resources we can better serve
our common goals of collecting, organizing, securing, preserving and, consistent with applicable
law, sharing the record of human knowledge.
Accordingly, pursuant to the HathiTrust mission, participating members
combined their digitized collections in order to provide more secure, long-term storage for the
works, more comprehensive research and discovery tools, improved access to works in the
public domain and improved access to works for students and faculty with print disabilities.
Michigan runs the HDL as a service not only for the benefit of Michigan but also for the benefit
of all participating institutions and, indeed, all users of the HathiTrust website located at
There are currently more than sixty institutions participating in HathiTrust,
including Michigan, and membership is open to institutions worldwide. Attached hereto as
Exhibit B is a true and correct copy of participating members as of today.
The Composition of the HathiTrust Digital Library (“HDL”)
The combined corpus of the HDL now totals more than 10 million works and is
growing every day.
While the HDL corpus contains a very large number of works, we have a
significant amount of information regarding the general composition of the corpus.
For example, an analysis of the Library of Congress call numbers of works
provides an overview of the subject matter of the works found in the HDL:
[Pie chart and legend: HDL works by Library of Congress call number]
Further, the HDL contains works in a multitude of languages as illustrated in the
Works within the HDL were also published across a broad range of dates,
commencing prior to the 15th century and running to the present as illustrated in the following
Interactive versions of each of the pie diagrams included in this declaration may be accessed through the
HathiTrust website located at www.hathitrust.org.
Approximately 30% of the corpus consists of material that is clearly within the
public domain. It bears emphasis that although we treat the balance of the works as if they are
in-copyright, this does not in fact mean that 70% of the corpus is in-copyright.
It is notoriously difficult to determine whether a particular work remains in-
copyright. For example, it is my understanding that works published between 1923 and 1963
entered the public domain unless they were renewed. Other copyright determinations may rely
on the death date of authors about whom very little is known or documented.
While the vast majority of works from this period were not renewed, determining
the renewal status of works from this period is an extraordinarily difficult task. The Copyright
Office’s records prior to January 1, 1978 are not completely or reliably digitized at the present
time. Therefore, the process of confirming whether a work was renewed involves the laborious
task of checking the physical records at the Copyright Office in Washington, D.C. Of course,
even if a search confirmed that the work had been renewed, there is no guarantee that a
subsequent search would identify the current rights holder.
Accordingly, we err on the side of classifying works as in-copyright even though
we are confident that many of those works are, in fact, in the public domain.
While precise calculation is difficult given the size of the HDL corpus, it is my
understanding that the vast majority of works in the corpus are now out of print (and, in fact, for
older works within the collection, have been out of print for decades).
Based upon an analysis of the call numbers within the archive, less than 9% of the
HDL corpus consists of prose fiction, poetry and drama. The remainder, approximately 90% of
the corpus, is likely to consist of factual works such as books and journals in many disciplines of
the arts, humanities, social sciences and sciences.
The Limited Uses of the Works within the HDL
We permit extremely limited use of the works in the HDL presumed to be in-
copyright. Specifically, patrons can only make the following uses of such works:
Full-Text Search: Through the Internet, the University’s students, faculty, and
staff, as well as the general public, may search for a particular term across all
works within the HDL. For those works that are not in the public domain or for
which the copyright holder has not expressly authorized use, the search results
indicate only the frequency and page numbers for which a particular term is
found within a particular book or periodical. (Unlike Google’s service, the results
do not show portions of text in “snippet” format.) At no time does the user have
digital access to any of the actual written content within such works (unless
he/she is afforded access as a certified print disabled user). In other words, none
of the work’s text is ever displayed on the computer screen or available for print,
not even one word.
Preservation: As noted above, before Google assisted with our digital
conversion, we were losing works every year as a result of the natural
disintegration of books (particularly the large segment of the collection written
on paper with high acid content). There was also the ever-present risk of other
more sudden forms of loss such as those occasioned by fire, flood, or theft. The
HDL now constitutes an extremely valuable protection against the prospect of
such loss and permits us to make copies pursuant to Section 108 of the Copyright
Act in the event that a work is lost, damaged, deteriorating, or stolen and a
replacement copy cannot be obtained by the University Library at a fair price.
Access for people with certified print disabilities: For decades, the University
Library has converted works in its collection to alternative formats for the blind
and other persons who have disabilities that prevent them from accessing printed
materials. Because digitization has significantly improved the quality of access
for print-disabled readers, the HDL was designed specifically to enable libraries
to make their collections accessible to such readers in digital format.
It is important to emphasize that given the very limited uses made of in-copyright
works in the HDL, our digitization of such materials does not diminish our purchases of in-copyright works.
Indeed, in my opinion, if the HDL has any impact whatsoever on the University
Library’s acquisition of in-copyright material, it has a positive effect on our purchasing.
Experience and basic common sense tell me that scholars, students, and other
patrons are more likely to discover and use works that they can locate through digital search.
Such increased demand for works invariably translates into increased purchases. This is
because, as noted above (see ¶ 13), if a work is frequently requested by patrons, and we find
ourselves maintaining a waiting list for that work, we try to acquire more copies of it
to meet patron demand.
For instance, the University Library includes in its print collection a work called
Television Program Master Index. This work contains an index of critical and historical
information regarding over 1,000 television shows contained in hundreds of books.
We digitized Television Program Master Index and, since it is presumed to be in-
copyright, we only permit HDL users to search the text of the work (i.e., the text is not available
except to users who have print disabilities). The HDL search functionality does quickly allow a
user to determine whether a particular show is covered in the work and, if it is, users typically
follow up by borrowing a print copy of the work from the library.
After we made the Television Program Master Index searchable through the
HDL, the usage of the University Library’s copy of the work increased dramatically and we
decided to acquire two additional copies of the work to satisfy the increasing demand.
The Benefits of the HDL’s Full-text Search Functionality
Full-text searching easily constitutes the most significant advance in library
search technology since the 1960’s.
Rather than combing through electronic cataloging records and attempting to
discern which works in our collection may be of interest, scholars can access the HDL website
and search the actual text of over 10 million books and journals. They can then immediately
access those works that are in the public domain or for which HathiTrust has the rights to display
the work in full text mode.
The Libraries, through the HDL, have made it possible for university students,
faculty, and staff, as well as the general public, to search the combined digital collections
contributed by the HathiTrust members. The search results display bibliographic information—
including title, author, publisher, and publication date—for books containing the search term, as
well as the frequency and page numbers for which the term is found, giving some clues as to
how useful the book might be.
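The behavior described above, in which a search returns bibliographic information together with hit counts and page numbers but never the text itself, can be sketched as follows. This is a minimal illustration and not a description of how the HDL is actually implemented: the `Page` and `Volume` types, the `search_only_result` function, and the sample volume and its page text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Page:
    number: int
    text: str  # held internally; never included in a search-only result


@dataclass
class Volume:
    title: str
    author: str
    pages: List[Page]


def search_only_result(volume: Volume, term: str) -> Optional[dict]:
    """Return bibliographic data plus per-page hit counts, or None if no hits.

    The returned dictionary deliberately contains no page text.
    """
    needle = term.lower()
    hits: Dict[int, int] = {}
    for page in volume.pages:
        count = page.text.lower().count(needle)
        if count:
            hits[page.number] = count
    if not hits:
        return None
    return {
        "title": volume.title,
        "author": volume.author,
        "total_hits": sum(hits.values()),
        "pages": hits,  # page number -> number of occurrences on that page
    }


# Hypothetical sample data, invented purely for illustration.
sample = Volume(
    title="Example Volume",
    author="A. N. Author",
    pages=[
        Page(1, "Anaphylactic shock is defined here; anaphylactic shock recurs."),
        Page(2, "No match on this page."),
        Page(3, "A closing note on anaphylactic shock."),
    ],
)
result = search_only_result(sample, "anaphylactic shock")
print(result["total_hits"], result["pages"])
```

The design point is that the result object carries only counts and page numbers, so nothing downstream of the search can display or print the underlying text.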
For example, as of June 8, 2012, a search for “anaphylactic shock” identifies
38,239 works. If the user selects the work titled Allergy and Tissue Metabolism by W.G. Smith
– in which the term “anaphylactic shock” appears – the following bibliographic information is
Only the bibliographic information is displayed for this work. As reflected in the
above screen shot, the “viewability” of this work is search only; none of the text of the work is
available for display or download. (Only certified users who have print disabilities are able to
access the text through a secure network.)
If the user searches for the same term in this book, a screen showing the page
numbers for each use of the term is displayed as follows:
Based upon the search results, the user may decide to purchase the book or check
out a physical copy from one of the member libraries. The text of books is not made digitally
available unless it is determined that they are in the public domain or unless the rights holder has
given us permission to provide access to the work.5
Without the ability to search the entire full text, the content within these
resources—as distinct from basic bibliographic information describing that text—is invisible, or
nearly so, to the majority of researchers.
Moreover, full-text searching is incredibly helpful even with respect to resources
that could be identified as potentially relevant through a catalog search. For example, many
libraries, including the University Library, store a substantial portion of their collections in
off-site facilities; these materials are not immediately available to the scholar, and their retrieval
may require significant library staff time. If full-text searching is available, a researcher can use
it to determine whether a potentially relevant off-site work may be pertinent to her research.
The exception to this is that Michigan students, faculty and staff certified through the Office of Services for
Students with Disabilities as having print disabilities are granted access to digitized files of presumed in-copyright
Indeed, the HDL empowers scholars to perform types of research on a scale that
simply could not be performed before the HathiTrust libraries digitized their collections. For
example, and as explained in more detail in the Declaration of Dr. Neil R. Smalheiser, a digital
research method commonly called “text mining” is already proving itself a vital tool for
scholarly research and could potentially have application to works within the HDL corpus.
In short, having a digitized library of the magnitude represented by the HDL has
the potential to yield breakthrough scientific discoveries – potentially lifesaving discoveries –
that simply would not be possible if the service ceased to exist.
HathiTrust’s Efforts to Preserve the Libraries’ Collections and the Cultural Record
One of the primary goals of HathiTrust is the preservation of the published record
of human knowledge through the creation of reliable and accessible electronic representations of
the works within the corpus.
The use of redundant storage in geographically remote locations ensures long-
term preservation of digital data by protecting against the total loss that would otherwise occur
from the failure or destruction of the primary storage.
The HDL corpus is stored at Michigan with a “live” mirror site located at Indiana
University’s Indianapolis campus.
The existence of the mirror site allows for balancing the load of user web traffic
to avoid overburdening a single site, and each site acts as a back-up of the HDL collection in the
event that one site were to cease operation (for example, due to failure caused by a disaster, or
even as a result of routine maintenance).
To allow for recovery of the HDL collection in the event of a disaster, automatic
tape backups are created. Two encrypted backup tapes are created and are stored separately from
each other, as well as from the two “live” storage instances, and the tape backups are not
connected to the Internet. In the event of a disaster causing large-scale data loss to the primary
HDL corpus at Michigan and the mirror site at Indiana University’s Indianapolis campus, the
backup tapes could be used to restore the lost data.
The HDL has been certified as a trustworthy digital repository by the Center for
Research Libraries (“CRL”) through its rigorous Trustworthy Repositories Audit and
Certification (“TRAC”) assessment program, which included an in-depth preservation audit of
the HDL. Attached as Exhibit C is a true and correct copy of this certification report.
This audit began in November 2009 and was completed in December 2010. Only
a small number of digital repositories have been granted this certification.
In addition to protecting the digital data in the HDL from loss through disaster or
mechanical failure, Michigan employs many levels of security to control access to the content in
the HDL. The security employed by Michigan with respect to the digital library meets, and in
many ways exceeds, the specifications developed by the parties in the Google Books proposed
First, Michigan maintains, and requires Indiana University to maintain,
rigorous physical security controls. HDL servers, storage, and networking equipment at
Michigan and Indiana University are mounted in locked racks, and only six individuals at
Michigan and three at Indiana University have keys. The data centers housing HDL servers,
storage, and networking equipment at each site location are monitored by video surveillance, and
entry requires use of both a keycard and a biometric sensor.
Second, network access to the HDL corpus is highly restricted, even for the staff
of the data centers housing HDL equipment at Michigan and Indiana University. For example,
two levels of network firewalls are in place at each site, and Indiana University data center staff
do not have network access to the HDL corpus, only access to the physical equipment. For the
backup tapes, network access is limited to the administrators of the backup system, and these
individuals are not provided the encryption key that would be required to access the encrypted
files on the backup tapes.
Web access to the HDL corpus is also highly restricted. Access by users of the
HDL service is governed primarily by the HDL rights database, which classifies each work by
presumed copyright status, and also by a user’s authentication to the system (e.g., as an
individual certified to have a print disability by Michigan’s Office of Services for Students with
Disabilities).
Michigan staff who have been granted web access to in-copyright works in the
HDL in order to perform a job function for HathiTrust (e.g., because they are involved in
copyright status determinations or image quality research) must use a specific, approved IP
address and successfully authenticate to the system using their username and password. In
addition, this web access is encrypted using Secure Sockets Layer, a cryptographic protocol
providing communication security over the Internet.
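For illustration only, the access rules described above can be sketched as a single decision function. This is a hypothetical sketch and not the actual HDL code: the names (`Rights`, `User`, `may_view_full_text`), the two-value rights classification, and the placeholder IP address are all assumptions introduced here, and the sketch ignores any geographic restrictions.

```python
from dataclasses import dataclass
from enum import Enum


class Rights(Enum):
    """Presumed copyright status, as a rights database might record it (assumed values)."""
    PUBLIC_DOMAIN = "public_domain"
    IN_COPYRIGHT = "in_copyright"


@dataclass(frozen=True)
class User:
    authenticated: bool = False               # logged in with username and password
    print_disability_certified: bool = False  # certified by a qualified office
    staff_authorized: bool = False            # granted access for a job function
    ip_address: str = ""


# Hypothetical allow-list; 192.0.2.0/24 is reserved for documentation.
APPROVED_STAFF_IPS = frozenset({"192.0.2.10"})


def may_view_full_text(user: User, rights: Rights) -> bool:
    """Return True if this user may read the full text of a work with these rights."""
    if rights is Rights.PUBLIC_DOMAIN:
        return True
    # Everything below concerns presumed in-copyright works.
    if not user.authenticated:
        return False
    if user.print_disability_certified:
        return True
    # Staff need both the authorization and an approved IP address.
    return user.staff_authorized and user.ip_address in APPROVED_STAFF_IPS
```

In this sketch an unauthenticated visitor can read only works presumed to be in the public domain, and an authorized staff member connecting from an unapproved address is refused.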
Even where we do permit a work to be read online, such as a work in the public
domain, we make efforts to ensure that inappropriate levels of access do not take place. For
example, a mass download prevention system called “choke” is used to measure the rate of
activity (such as the rate at which a user is reading pages) by each individual user. If a user’s rate of
activity exceeds certain thresholds, the system assumes that the user is mechanized (e.g., a web
robot) and blocks that user’s access for a set period of time.
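By way of illustration, rate-based blocking of the kind described above can be sketched as a sliding-window counter. This is a hypothetical sketch and does not describe how “choke” is actually implemented: the class name, the threshold, the window length, and the block duration are all assumptions chosen for the example.

```python
import time
from collections import defaultdict, deque


class Choke:
    """Block a user for a set period once their request rate exceeds a threshold."""

    def __init__(self, max_requests=100, window_seconds=60.0,
                 block_seconds=300.0, clock=time.monotonic):
        self._max = max_requests        # assumed threshold, not the real value
        self._window = window_seconds   # assumed measurement window
        self._block = block_seconds     # assumed block duration
        self._clock = clock             # injectable so the sketch can be tested
        self._requests = defaultdict(deque)  # user id -> recent request times
        self._blocked_until = {}             # user id -> time the block ends

    def allow(self, user_id):
        """Record one request and return True if it should be served."""
        now = self._clock()
        blocked_until = self._blocked_until.get(user_id)
        if blocked_until is not None and now < blocked_until:
            return False
        recent = self._requests[user_id]
        # Forget requests that have fallen out of the measurement window.
        while recent and now - recent[0] >= self._window:
            recent.popleft()
        recent.append(now)
        if len(recent) > self._max:
            # Rate too high: treat the user as mechanized and start a block.
            self._blocked_until[user_id] = now + self._block
            recent.clear()
            return False
        return True
```

Each user is tracked separately, so one user exceeding the threshold does not affect any other user's access.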
A proxy detection system is also used. The proxy detection system consults real-
time blacklist services to identify users that appear to be employing known proxy or
anonymization servers to falsify a physical presence in the United States (in an attempt to
subvert HDL use limitations that restrict access to some materials to users in the United States).
If the proxy detection system identifies such a user, that user’s access is blocked. Web access to
the HDL is also logged by IP address and, when available, by username, the HTTP request
string, and whether or not the volume requested is identified as presumed in-copyright in the
HDL rights database. Such logs are regularly reviewed.
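For illustration only, the proxy check and the access log described above can be sketched together. This is a hypothetical sketch: a static list of networks stands in for the real-time blacklist services, and the function names, logger name, and log format are assumptions rather than a description of the actual HDL system.

```python
import ipaddress
import logging

logger = logging.getLogger("hdl_access_sketch")  # hypothetical logger name

# Stand-in for a real-time blacklist lookup; 203.0.113.0/24 is a
# documentation-only range, used here as a placeholder.
KNOWN_PROXY_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_known_proxy(ip):
    """Return True if the address falls inside a listed proxy network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in KNOWN_PROXY_NETWORKS)


def handle_request(ip, request_line, presumed_in_copyright, username=None):
    """Log the request and return True if access should be allowed."""
    blocked = is_known_proxy(ip)
    logger.info(
        "ip=%s user=%s request=%r presumed_in_copyright=%s blocked=%s",
        ip, username or "-", request_line, presumed_in_copyright, blocked,
    )
    return not blocked
```

The log entry records the same fields the text lists: IP address, username when available, the HTTP request string, and whether the volume is presumed in-copyright.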
Access for Individuals with Print Disabilities
One of the primary goals of HathiTrust has always been to enable people who
have print disabilities to access the wealth of information within library collections. We
constructed the archive with the objective of making the world’s first accessible research library.
Access for people who have print disabilities is a part of our agreements with HathiTrust
members and it is one of the core services around which the archive is built, along with
preservation and search.
For centuries, libraries have been inaccessible to people who have a broad range
of disabilities because library collections have not been available in accessible formats. As a
result, individuals with print disabilities simply do not have equal access to library collections
and are denied the full promise offered by libraries. This is particularly true in the academy,
where access to the written record is at the heart of most scholarly pursuits.
For instance, given the number of works a student must review to write a typical
term paper, she may have to wait weeks or even months to get access to converted works—in
Braille or audio recording format—that she has not yet even been able to determine whether or
not she will use. Even digitizing the works on a case-by-case basis can take weeks, which makes
pursuing an education that much more difficult for a student who has a print disability.
Accessibility has been a hallmark of our digitization efforts from the earliest days
and, when we discussed digitizing the library in connection with Google, it was one of our
primary objectives to make our library collections immediately accessible to people who have
print disabilities.
In fact, in 2005, when the National Federation of the Blind first contacted us
about Google Books, we invited them to campus to demonstrate the accessible library we had
Our accessible library works like this:
A person who has a print disability seeks certification from a qualified expert.
The expert informs the library when a particular patron has a print disability for
which digital access is a reasonable accommodation.
We explain the digital library to the patron, we describe appropriate uses of the
service (including warnings about copyright infringement), and we enable the
patron to get secure access to the accessible library.
If we have a digital copy of a work, the authorized patron with a print disability
will have immediate access to that work in a format that can be made accessible
through a variety of adaptive technologies. For example, the disabled user can
enable software that translates the text into spoken words.
For our patrons who have print disabilities, this service makes it possible for these
individuals to achieve their full academic and scholarly potential.
The Orphan Works Project
As noted above, we believe it is clear that scholars, students, and other patrons are
more likely to discover and use works that they can access in a digital format.
Therefore, we strive to find ways to provide lawful access to digital works we
have digitized and we spend millions of dollars each year to license access to digital content for
our campus. Unfortunately, with orphan works, because a rights holder cannot be identified,
there is no way to license digital access.
With this in mind, we developed a project that we called the “Orphan Works
Project” (or “OWP” for short). The goal of the OWP was to identify orphan works and then to
enable limited uses of those works to students, scholars, and walk-in patrons.
The OWP, as initially contemplated, had two steps: (i) first, we would identify potential
orphan works through a diligent, reasonable process that eliminates works that are claimed by a
putative rights holder or that are otherwise found not to be orphans; and (ii) based on the results
of the first step, we would enable limited uses of orphan works by Michigan students,
scholars, and walk-in patrons.
As contemplated, the OWP would have allowed users access to orphan works for
the purpose of online review, with the number of users permitted to view a given work limited at
any one time to the number of copies held by the library. Readers would have been reminded,
through watermarking and other explicit notices, that the books are subject to copyright.
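The contemplated limit on simultaneous viewers can be illustrated with a short sketch. The OWP never reached this step, so this is purely hypothetical: the class name, the method names, and the per-work copy counts are assumptions made for the example.

```python
class ConcurrentViewLimiter:
    """Cap simultaneous viewers of a work at the number of copies held."""

    def __init__(self, copies_held):
        self._copies_held = dict(copies_held)  # work id -> copies held
        self._viewers = {}                     # work id -> set of current viewers

    def open_view(self, work_id, user_id):
        """Return True if the user may view the work right now."""
        limit = self._copies_held.get(work_id, 0)
        viewers = self._viewers.setdefault(work_id, set())
        if user_id in viewers:
            return True   # this user already holds one of the slots
        if len(viewers) >= limit:
            return False  # every held copy is already "in use"
        viewers.add(user_id)
        return True

    def close_view(self, work_id, user_id):
        """Release the user's slot so another reader can take it."""
        self._viewers.get(work_id, set()).discard(user_id)
```

With two copies held, a third simultaneous reader is refused until one of the first two finishes.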
After completing its initial process to identify potential orphan works, Michigan
posted a list of potential orphan works on the Michigan library website on or about July 15, 2011
and provided a link to the list on the HathiTrust website. The public posting was a conscious
part of the identification process. With more eyes on possible orphan works, it was our intent to
increase the accuracy of the identification process.
The public scrutiny component of the OWP worked as intended in instances when
works identified as potential orphan works were claimed by the putative rights holder or the
rights holder was identified by a third party. Had the project moved forward, these works would
not have been treated as orphan works.
In evaluating the methods used to determine potential orphan works, we
concluded that there were flaws in our pilot process and that we needed to remedy those flaws
before moving ahead with the OWP. We therefore suspended the process and never proceeded
to the second step of the project (i.e., we never proceeded to enable limited uses of putative
orphan works).
Michigan, which is the only member of HathiTrust that has actively engaged in
the work of the OWP, is continuing to study ways to improve the candidate identification
process. In fact, we reached out to plaintiff The Authors Guild and other associations (including
the Association of American Publishers) to invite their input on ways to improve the candidate
identification process. After initially expressing interest in speaking with us and participating in
this process, the Authors Guild thereafter abruptly filed this lawsuit.
Not a single patron has been given access to a work through the OWP and, at
present, we do not know whether or how the OWP will continue.
In the event that Michigan decided to move forward with the OWP and provide
users with access to works through the project, it would seek to comply with the requirements of
section 108(e) of the Copyright Act.