Public.Resource.org v. United States Internal Revenue Service
Filing 48
Declaration of Carl Malamud in Support of 47 MOTION for Summary Judgment Plaintiff Public.Resource.Org's Consolidated Cross-Motion for Summary Judgment and Opposition to Defendant's Motion for Summary Judgment filed by Public.Resource.org. (Attachments: # 1 Exhibit A)(Related document(s) 47 ) (Burke, Thomas) (Filed on 9/29/2014)
DAVIS WRIGHT TREMAINE LLP
THOMAS R. BURKE (CA State Bar No. 141930)
DAVIS WRIGHT TREMAINE LLP
505 Montgomery Street, Suite 800
San Francisco, California 94111
Telephone: (415) 276-6500
Facsimile: (415) 276-6599
Email: thomasburke@dwt.com

RONALD G. LONDON (Pro Hac Vice)
DAVIS WRIGHT TREMAINE LLP
1919 Pennsylvania Ave., N.W., Suite 800
Washington, DC 20006
Telephone: (202) 973-4200
Email: ronnielondon@dwt.com

DAN LAIDMAN (State Bar No. 274482)
DAVIS WRIGHT TREMAINE LLP
865 South Figueroa Street, Suite 2400
Los Angeles, CA 90017-2566
Telephone: (213) 633-6800
Facsimile: (213) 633-6899
Email: danlaidman@dwt.com

DAVID HALPERIN (Pro Hac Vice)
1530 P Street NW
Washington, DC 20005
Telephone: (202) 905-3434
Email: davidhalperindc@gmail.com

Attorneys for Plaintiff Public.Resource.Org
IN THE UNITED STATES DISTRICT COURT
THE NORTHERN DISTRICT OF CALIFORNIA
SAN FRANCISCO DIVISION
PUBLIC.RESOURCE.ORG., a California non-profit organization,
Plaintiff,
v.
UNITED STATES INTERNAL REVENUE SERVICE,
Defendant.

Case No. 3:13-CV-02789-WHO
DECLARATION OF CARL MALAMUD
DECLARATION OF CARL MALAMUD
Case No. 3:13-CV-02789-WHO
DWT 24908918v1 0200593-000001
I, Carl Malamud, declare as follows:
1. Since 2007, I have been the President and Founder of Public.Resource.Org, a nonprofit corporation and the Plaintiff in this FOIA action. I have personal knowledge of the matters stated in this declaration and could competently testify to them if called as a witness.
Mr. Malamud’s Background and Experience
2. My formal education was in Business Economics and Public Policy at the Indiana University School of Business, where I completed all coursework for the Doctorate in Business Administration and received an MBA in 1982.
3. From 1982 to 1992, I worked professionally in the field of computer networks. That work included positions at the Board of Governors of the Federal Reserve System and numerous consulting engagements with government groups such as the Department of Defense. I also wrote as a Contributing Editor and columnist for numerous trade publications such as Communications Week, and authored 8 professional reference books.
4. From 1993 to 1996, I served full-time as the founder and executive director of the Internet Multicasting Service, where I started and ran the first radio station on the Internet. As part of my work at the Internet Multicasting Service, I was also responsible for putting the U.S. Securities and Exchange Commission EDGAR system on the Internet and then donating computers and software to the SEC so they could take my system over. I was also responsible for putting numerous other government databases on the Internet for the first time, including the U.S. Patent database.
5. In 1998 and 1999, I was the CEO of Invisible Worlds. During that period, I worked with my Chief Technology Officer, Dr. Marshall T. Rose, to help develop the tools used to produce Internet Standards. These tools are based on the XML markup language, which is the same language that the IRS uses for their Modernized e-File (MeF) format. These tools continue to be used as the basis for authoring documents for the Internet standards process. The specifications for this work have been published as Internet Request for Comments 2629, “Writing I-Ds and RFCs using XML.” That standard may be found at http://tools.ietf.org/html/rfc2629.
6. In 2004, I was a consultant on documentation strategies to the Internet Systems Consortium, a nonprofit corporation that produces software essential to the operation of the Domain Name System. I was the founding Chairman of the Board of the Internet Systems Consortium in 1994. ISC is the author and publisher of BIND, which is used by many large Domain Name Servers throughout the world. ISC also operates the “F” Root Name server, which is one of the core authoritative name servers that make the Internet function. As a consultant on documentation strategies, I spent a great deal of time working with DocBook, an XML-based authoring language for technical documentation.
7. In 2007, I founded Public.Resource.Org, a nonprofit corporation which is based in California. We are responsible for placing the historical opinions of the U.S. Court of Appeals, back to the founding of the court, on the Internet for the first time. As part of that work, we discovered numerous Social Security Numbers (SSNs) in those opinions and notified the Court of the presence of this information. On July 16, 2008, Chief Judge Lee H. Rosenthal thanked us for our efforts on behalf of the Committee on Rules of Practice and Procedure of the Judicial Conference of the United States. That letter may be found at https://public.resource.org/scribd/7512576.pdf.
8. In 2008 and 2009, I conducted a series of audits on 20 million pages of PACER documents and discovered numerous SSNs. We notified the Chief Judges of 32 U.S. District Courts of these findings. This resulted in changes in the privacy procedures for the PACER documents and in acknowledgment of our efforts by several Chief Judges and by the Committee on Rules of Practice and Procedure of the Judicial Conference of the United States.
9. In 2007 and then again in 2010, I submitted reports to the Speaker of the House of Representatives concerning my recommendations for broader availability of video from Congressional hearings. On January 5, 2011, the Speaker of the House acknowledged my efforts and authorized me to work with the Committee on Oversight and Government Reform and the House Broadcast Studio, an effort that led to the posting of over 14,000 hours of video from Congressional hearings. The letter from the Speaker may be found at https://law.resource.org/rfcs/gov.house.20110105.pdf.
10. In 2008, I served as an advisor to the Presidential transition, where I outlined a series of proposed changes in how the Official Journals of Government, including the Federal Register, can be published. Those changes were implemented and have resulted in a substantial improvement in the online system, which is visible at federalregister.gov.
11. In 2008, I began a program called FedFlix in cooperation with the National Technical Information Service (NTIS) and the National Archives and Records Administration. The program sent volunteers into the National Archives to copy videos and obtained copies of video from numerous agencies, including the Department of Defense, OSHA, and the Mine Safety and Health Administration. Approximately 6,000 videos were copied and posted to YouTube and the Internet Archive and have since been viewed over 50 million times.
Mr. Malamud’s Work with the IRS Exempt Organizations Database
12. In 2008, I began working with the IRS Exempt Organizations database by submitting payment for 6 years of DVDs and developing software to process that data and post it on the Internet with no restrictions on use. Since 2008, I have processed and posted on the Internet over 7,634,050 instances of the Form 990 filed by Exempt Organizations. The data that I processed was made available on our servers and on nonprofit services such as the Internet Archive, and forms the basis for numerous other commercial and non-commercial systems that analyze and host Form 990 data. Our archive of Form 990s is the only one freely available on the Internet with no restrictions on access or use. We make this data available free of charge and with no restrictions, just as we have with court documents and numerous other government databases, because we believe that these Works of Government should be more broadly available.
13. As part of my work, I performed audits of the Exempt Organizations database, looking for instances where the IRS has released individuals’ SSNs as part of its release of Form 990 data. Our best estimate is that there are close to 600,000 SSNs in the Exempt Organizations data we purchased from the IRS. When I find SSNs in a Form 990, I redact that information and replace the files we made available for public view. I also systematically notify the IRS, GuideStar, the Foundation Center, and others who I know have copies of this database.
14. On July 2, 2013, I notified the IRS and the Treasury Inspector General for Tax Administration (TIGTA) of a large number of Social Security Numbers for political organizations filing under Section 527 that were on the IRS web site. That notification can be found at https://bulk.resource.org/irs.gov/eo/doc/irs.gov.20130702.pdf. The Inspector General assigned complaint number 63-1307-0025-C to their investigation of this matter.
15. On July 15, 2013, Congressman Tom Latham and 41 other members of the House of Representatives wrote to the Acting Commissioner of the Internal Revenue Service to request an explanation of this privacy breach. That letter may be found at https://bulk.resource.org/irs.gov/eo/doc/irs.gov.20130715.pdf.
16. On September 16, 2013, the Acting Commissioner wrote to Congressman Tom Latham and informed the Congress that the IRS had changed its position on redaction of Social Security Numbers. That letter may be found at https://bulk.resource.org/irs.gov/eo/doc/irs.gov.20130916.pdf.
17. On December 6, 2013, the Internal Revenue Service updated section 3.20.13.13.2 of the Internal Revenue Manual to permit redaction of Social Security Numbers for Section 527 Political Organizations. Those changes were effective January 1, 2014. This section of the IRM may be found at http://www.irs.gov/irm/part3/irm_03-020-013r.html.
18. On April 22, 2014, I notified the IRS Commissioner and the Inspector General of a large number of Social Security Numbers in returns for Exempt Organizations that are not Political Organizations. That letter may be found at https://bulk.resource.org/irs.gov/eo/doc/irs.gov.20140422.pdf.
19. On July 7, 2014, I concluded the audit of SSNs and sent the IRS Commissioner and the Inspector General detailed audit results, including copies of 9,392 returns that I had redacted, along with detailed recommendations on steps the IRS should take to mitigate this problem. The cover letter for this audit may be found at https://bulk.resource.org/irs.gov/eo/doc/irs.gov.20140707.pdf. The Inspector General assigned complaint number 63-1407-0060-C to their investigation of this matter.
20. On July 24, 2014, I notified the IRS of my analysis of the April, 2014 shipment of returns. In that notification, I informed the IRS of a major privacy breach for an exempt organization that had e-filed their results. That notice can be found at https://bulk.resource.org/irs.gov/eo/doc/irs.gov.20140707.pdf.
21. In order to find privacy breaches in Exempt Organization filings, I am forced to use Optical Character Recognition (OCR). For the April, 2014 results, this required running OCR on 546,631 pages of returns. I started that process on July 18 and, by devoting a 12-CPU system entirely to the task, was able to process 177,144 pages per day. The process was completed on July 22.
22. In addition to taking a lot of time, OCR is, in my considerable experience, inherently inaccurate. For example, the letter O can easily be confused with the number 0.
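The O-for-0 confusion described in paragraph 22 can be sketched in a few lines of Python. This is an illustration only and not the software used in the audits described in this declaration; the two patterns, the function name find_ssn_candidates, the list of look-alike characters, and the sample string are all assumptions made for the example.

```python
import re

# Letters that OCR output may contain where a digit was printed.
# This list is illustrative and incomplete.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})

# Strict pattern: three digits, two digits, four digits, hyphen-separated.
STRICT_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Tolerant pattern: also accepts the look-alike letters in digit positions.
TOLERANT_SSN = re.compile(r"\b[\dOolI]{3}-[\dOolI]{2}-[\dOolI]{4}\b")


def find_ssn_candidates(text, tolerant=True):
    """Return SSN-shaped strings found in text, with look-alike letters
    normalized to digits."""
    pattern = TOLERANT_SSN if tolerant else STRICT_SSN
    return [m.group(0).translate(OCR_DIGIT_FIXES) for m in pattern.finditer(text)]


# Hypothetical OCR output in which the digit 0 was read as the letter O.
ocr_text = "Preparer SSN OOO-12-3456"
strict_hits = find_ssn_candidates(ocr_text, tolerant=False)
tolerant_hits = find_ssn_candidates(ocr_text)
```

A tolerant pattern of this kind can recover numbers that a digits-only search would miss, at the cost of more false positives that require manual review.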
Mr. Malamud’s Work with IRS Form 990
23. As part of my work on the IRS Exempt Organizations database, I have carefully examined the documentation on the Modernized e-File (MeF) format. That information can be found at http://www.irs.gov/Tax-Professionals/e-File-Providers-&-Partners/Modernized-e-File-Program-Information.
24. I have read and am familiar with the MeF Submission Composition Guide which details the structure of an e-file submission, including the XML format for a submitted return, the “envelope” for that submission in the SOAP format (which is also based on XML), and the rules for submitting attachments as PDF files. That guide may be found at http://www.irs.gov/pub/irs-schema/MeF_Submission_Composition_Guide_v1-4.pdf.
25. I have read and am familiar with the Schemas and Business Rules for Exempt Organizations, including Forms 990, 990EZ, 990-N, 990-PF, 1120-POL, and 8868 as well as Corporate Forms 1120, 1120S, and 7004. That information may be found at http://www.irs.gov/Charities-&-Non-Profits/Current-Valid-XML-Schemas-and-Business-Rules-for-Exempt-Organizations-Modernized-e-File.
26. The IRS does not provide a sample instance of an XML file for the Form 990 or Form 990-PF. However, I was able to examine a sample instance of an XML file for a corporate return based on Form 1120. That file is contained in the IRS publication “2014 Valid XML Schemas and Business Rules for 1120, 1120S, 1120-F, and 7004 Modernized e-File (MeF).” That information can be found at http://www.irs.gov/Tax-Professionals/e-File-Providers-&-Partners/2014-Valid-XML-Schemas-and-Business-Rules-for-1120-1120S-1120-F-and-7004-Modernized-e-File.
27. The name of the file that I examined is Example_TransmissionWithConsolidatedReturn.xml. A copy of the file I examined is available at https://bulk.resource.org/irs.gov/eo/doc/doc/Example_TransmissionWithConsolidatedReturn.xml.
28. In order to remove (redact) one element nested inside an XML file, I use a common programmer’s tool called a “text editor.” Any professional programmer has access to such software. I use a text editor called bbedit on my Apple computer. Other examples of text editors are “vi” on any Unix computer, and “notepad” on any Windows computer. I used the bbedit software on the file named Example_TransmissionWithConsolidatedReturn.xml, removed the element IRS1120LScheduleB, and saved the file with a new name. That entire process took me 57 seconds.
29. There are a number of techniques used to transform and process XML files. A common technique is the use of Style Sheets, a standard defined by the World Wide Web Consortium, the standards-making body for the World Wide Web. The definition of Extensible Stylesheet Language Transformations (XSLT) may be found at http://www.w3.org/TR/xslt.
30. The IRS uses this technique to publish a number of sample files that can be used to transform returns in MeF. These style sheets can be used by businesses, tax preparers, and others to transform a return into another format, such as transforming the XML into HTML for display in a web browser. The IRS publishes these style sheets at http://www.irs.gov/Tax-Professionals/e-File-Providers-&-Partners/Modernized-e-File-MeF-Stylesheets.
31. I wrote a very simple style sheet, a true and correct copy of which is attached as Exhibit A, that is based on something called an “identity transformation.” An identity transformation is a style sheet that copies everything that is input to the output with no changes. An example of the identity transformation may be found in Section 7.5 of the XSLT specification. I added a single line to the style sheet which copies every element except the element IRS1120LScheduleB. It took me almost one hour to write this style sheet because it had been several years since I had looked at style sheets and I had to use Google to understand how to list the namespaces that the IRS uses. Using a free open source program called xsltproc, which comes on my computer, I was able to specify the name of an input file, the name of the style sheet, and the name of the output file. I ran that program and produced an XML file with the Schedule B removed. It took 1.429 seconds to execute this command on my desktop computer. I ran this program on a single instance of a Form 990, but this program could also be used, without modification, to process hundreds or thousands of instances of the Form 990. It can also be easily modified to remove multiple schedules.
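The idea behind paragraph 31, copying everything from input to output except one named element, can be sketched without XSLT. The following is a Python analogue that uses only the standard library; it is not the style sheet attached as Exhibit A and it does not use xsltproc. The sample document, the namespace URI, the returnVersion value, and every element name other than IRS1120LScheduleB are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def copy_except(elem, excluded):
    """Return a copy of elem and its descendants, omitting any descendant
    whose local name is in the excluded set. With an empty set this is an
    identity copy."""
    clone = ET.Element(elem.tag, dict(elem.attrib))
    clone.text = elem.text
    clone.tail = elem.tail
    for child in elem:
        if local_name(child.tag) not in excluded:
            clone.append(copy_except(child, excluded))
    return clone


# Hypothetical, heavily simplified return used only for this example.
SAMPLE = (
    '<Return xmlns="http://www.irs.gov/efile" returnVersion="sample">'
    "<ReturnData>"
    "<IRS1120><Name>Example Corp</Name></IRS1120>"
    "<IRS1120LScheduleB><Donor>Sample Donor</Donor></IRS1120LScheduleB>"
    "</ReturnData>"
    "</Return>"
)

source = ET.fromstring(SAMPLE)
unchanged = copy_except(source, excluded=set())
redacted = copy_except(source, excluded={"IRS1120LScheduleB"})
```

Because the excluded names are passed as a set, removing several schedules is a matter of listing more names, and the same function can be called in a loop over many files.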
32. Returns in MeF format are significantly easier to work with than the bitmap files produced by the IRS and shipped on DVDs. For my particular application, finding Social Security Numbers in current returns, having the e-file data would have saved me a week of initial processing of the data and would have produced much more reliable results.
33. In addition to locating SSNs, the availability of the data in MeF format would unlock a large number of other applications. For example, in order to find returns in our collection of over 7.5 million Form 990s, computer programs must use a variety of search indices. With the data the IRS currently provides, we know the name of the nonprofit and rudimentary information such as the city, state, date of filing, and assets. If information were available in MeF format, much more useful search capabilities would be possible using all of the data fields in the return to help the public readily access the information that they desire.
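The kind of field-level search described in paragraph 33 can be sketched as a small inverted index built from XML returns. This is a minimal illustration under stated assumptions, not a description of any existing system: the function name index_returns, the return identifiers, and the element names (BusinessName, City, State) are hypothetical and are not taken from the MeF schemas.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def index_returns(documents):
    """Build a map from (field name, lowercased value) to the set of
    return identifiers in which that field has that value."""
    index = defaultdict(set)
    for return_id, xml_text in documents.items():
        for elem in ET.fromstring(xml_text).iter():
            value = (elem.text or "").strip()
            if value:  # skip container elements that hold only whitespace
                index[(local_name(elem.tag), value.lower())].add(return_id)
    return index


# Hypothetical returns keyed by a made-up identifier.
DOCUMENTS = {
    "return-1": "<Return><BusinessName>Alpha Fund</BusinessName>"
                "<City>San Francisco</City><State>CA</State></Return>",
    "return-2": "<Return><BusinessName>Beta Trust</BusinessName>"
                "<City>Washington</City><State>DC</State></Return>",
    "return-3": "<Return><BusinessName>Gamma Society</BusinessName>"
                "<City>San Francisco</City><State>CA</State></Return>",
}

index = index_returns(DOCUMENTS)
in_san_francisco = index[("City", "san francisco")]
```

Every element that carries a value becomes searchable, which is what distinguishes this approach from an index limited to a handful of fields such as name, city, and state.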
34. The Exempt Organization returns that Public.Resource.Org has requested in MeF format, instead of bitmap images, would be of substantial use in performing audits for privacy violations in Exempt Organization returns. If the MeF format data were available, I would be able to notify the IRS and other organizations with copies of this data more quickly about any breaches that were discovered. In addition to finding privacy breaches, there would be a large number of other beneficial applications in the public interest. It is my considered technical opinion, based on over 30 years as a computer professional, extensive work with the XML standard, and 6 years of