Xerox Corporation v. Google Inc. et al

Filing 142

CLAIM CONSTRUCTION OPENING BRIEF filed by Xerox Corporation. (Attachments: # 1 Exhibit A-D, # 2 Exhibit E-J, # 3 Appendix A)(Day, John)

IN THE UNITED STATES DISTRICT COURT
FOR THE DISTRICT OF DELAWARE

XEROX CORPORATION,
Plaintiff-Counterclaim Defendant,

v.

GOOGLE INC., YAHOO! INC., RIGHT MEDIA INC., RIGHT MEDIA LLC, YOUTUBE, INC. AND YOUTUBE, LLC,
Defendants-Counterclaim Plaintiffs.

C.A. No. 10-136-LPS

PLAINTIFF XEROX CORPORATION'S OPENING CLAIM CONSTRUCTION BRIEF

Lawrence C. Ashby (I.D. #468)
John G. Day (I.D. #2403)
Lauren E. Maguire (I.D. #4261)
ASHBY & GEDDES
500 Delaware Avenue, 8th Floor
P.O. Box 1150
Wilmington, DE 19899
(302) 654-1888
Attorneys for Plaintiff Xerox Corporation

Of Counsel:
Richard J. Stark
Andrei Harasymiak
CRAVATH, SWAINE & MOORE LLP
Worldwide Plaza
825 Eighth Avenue
New York, NY 10019-7475
(212) 474-1000

Dated: March 25, 2011

{00502187;v1}

TABLE OF CONTENTS

I. INTRODUCTION
II. GOVERNING LAW
III. ASSERTED CLAIMS
IV. DISPUTED CLAIM TERMS
   A. "Selected document content" (Claims 1, 2, 18, 19)
      1. Law governing indefiniteness
      2. "Selected document content" is the input to the claimed method
      3. "Selected document content" can consist of all or part of a document
      4. "Selected document content" must be in electronic form
   B. "Classification label" (Claims 1, 18)
      1. Ordinary meaning of "label"
      2. The classification label identifies categories in the organized classification of document content but need not be in any particular format
   C. "Categorizing the selected document content using the organized classification of document content for assigning the selected document content a classification label" (Claims 1, 18)
      1. The default patent law rule is that "a" means "one or more"
      2. The specification provides no support for requiring a single classification label
      3. Xerox's proposed construction reflects the agreed-upon meaning of OCDC
   D. "Query" (Claims 1, 2, 18, 19)
      1. Xerox's proposed construction accords with the claim language
      2. Xerox's proposed construction accords with the specification
   E. "To restrict a search at the information retrieval system for information concerning the set of entities to the category of information in the information retrieval system identified by the assigned classification label" (Claims 1, 18)
   F. "Characteristic vocabulary" (Claim 10)
   G. Order of steps (Claims 1 and 2; Claims 18 and 19)
   H. Order of steps (Claim 1; Claim 18)
V. CONCLUSION

TABLE OF AUTHORITIES

Cases
Altiris, Inc. v. Symantec Corp., 318 F.3d 1363 (Fed. Cir. 2003)
Baldwin Graphic Systems, Inc. v. Siebert, Inc., 512 F.3d 1338 (Fed. Cir. 2008)
Brookhill-Wilk 1, LLC v. Intuitive Surgical, Inc., 334 F.3d 1294 (Fed. Cir. 2003)
Exxon Res. & Eng'g Co. v. U.S., 265 F.3d 1371 (Fed. Cir. 2001)
Ingenio, Filiale De Loto-Quebec, Inc. v. GameLogic, Inc., 445 F.Supp.2d 443 (D. Del. 2006)
KCJ Corp. v. Kinetic Concepts, Inc., 223 F.3d 1351 (Fed. Cir. 2000)
Leader Techs., Inc. v. Facebook, Inc., 692 F.Supp.2d 425 (D. Del. 2010)
Markman v. Westview Instruments, Inc., 52 F.3d 967 (Fed. Cir. 1995), aff'd, 517 U.S. 370 (1996)
Phillips v. AWH Corp., 415 F.3d 1303 (Fed. Cir. 2005)
Praxair, Inc. v. ATMI, Inc., 543 F.3d 1306 (Fed. Cir. 2008)
Young v. Lumenis, Inc., 492 F.3d 1336 (Fed. Cir. 2007)

Statutes & Rules
35 U.S.C. § 112

I. INTRODUCTION

The inventions of U.S. Patent No. 6,778,979 ("'979 Patent") are directed to search technology. A copy of the patent is attached as an appendix to this memorandum. Specifically, the inventions use the content of a document as the basis for automatically generating a query to an information retrieval system such as a database. The objective of the claimed automatic query generation techniques is "to improve the quality (e.g., in terms of precision recall) of information retrieval systems". (979/48:37-39.)[1] "Precision recall" means both successfully retrieving information ("recall") and retrieving relevant information ("precision"). (See Ex. A at 429.)

The '979 Patent achieves this objective in three ways. First, it automatically identifies certain recognized words and phrases (referred to as "entities") in the document content. Second, it automatically determines the categories that describe the subject matter of the document content. Third, it automatically formulates a query using both the entities and "classification labels" (identifiers associated with the subject-matter categories). (979/2:64-3:15.) The "classification labels" correspond to categories of information in an information retrieval system. Their inclusion in the query therefore restricts any subsequent search at the information retrieval system to those particular information-system categories. (979/76:10-31.)

The patent specification provides the following example. The phrase "seven up" is automatically identified in document content. As it happens, "seven up" is not only a soft drink but also a gene found in fruit flies. (See Ex. B at 1123.) The document content is analyzed and automatically categorized to determine its subject matter. Finally, a query is formulated that includes "seven up" and the category (or classification) label "science+biology+genetics":

[Fig. 39 excerpt]

(979/50:1-11; Fig. 39.)

[1] This memorandum uses the citation convention "[patent number]/[column]:[line]".
A query concerning "seven up" alone would locate a great deal of information concerning soda. The formulated query, by contrast, is restricted to "science+biology+genetics". It would therefore locate information in the category of science, biology and genetics, greatly improving the precision (i.e., relevance) of the results to the original document content.

II. GOVERNING LAW

Claim construction is performed by the court as a matter of law. Markman v. Westview Instruments, Inc., 52 F.3d 967, 979 (Fed. Cir. 1995), aff'd, 517 U.S. 370 (1996). Claim construction should be based as much as possible on the intrinsic evidence: the language of the claims, the patent specification and the prosecution history. See id.; Phillips v. AWH Corp., 415 F.3d 1303, 1313-14 (Fed. Cir. 2005) (en banc). Claim terms are read in light of the claims in which they appear and the specification. Phillips, 415 F.3d at 1313. The specification is a valuable guide in claim construction. Even so, the Federal Circuit has warned against confining the claims to the specific embodiments disclosed in the specification. Id. at 1323. A claim term is given its ordinary and customary meaning to a person of ordinary skill in the field to which the invention pertains. The exception is where the intrinsic evidence indicates that the inventors used that term with a different meaning. Brookhill-Wilk 1, LLC v. Intuitive Surgical, Inc., 334 F.3d 1294, 1298 (Fed. Cir. 2003).

III. ASSERTED CLAIMS

Xerox is asserting seven claims of the '979 Patent: Claims 1, 2, 3, 5, 10, 18 and 19. Claims 1 and 18 are independent claims. Claim 1 is directed to a method. Claim 18 is directed to an article of manufacture[2] that contains a memory and computer instructions that perform the method of Claim 1.
Accordingly, when this memorandum discusses Claim 1, the analysis applies equally to Claim 18. Similarly, Claim 2 (which depends from Claim 1) covers a method, while Claim 19 (which depends from Claim 18) covers computer instructions for performing that method. The remaining claims depend directly or indirectly from Claim 1.

[2] The parties agree that the claimed "article of manufacture" is "a computer program existent (permanently, temporarily, or transitorily) on any computer-usable medium such as on any memory device or in any transmitting device". (D.I. 133, at 3 (attached as Ex. C).)

The specification of the '979 Patent describes a comprehensive document management system. Many of its capabilities are not claimed by the '979 Patent, and some are the subject of other Xerox patents. Section F.3 (cols. 48-52) is the principal portion of the specification that describes the '979 invention.[3] Earlier descriptions of a "text categorizer" (Section F.1 (cols. 41-46)) are relevant to categorization of document content for query formulation, as stated in Section F.3. (979/48:45-48.) In addition, Section F.3 explains that both the categorizer and Section B.4 (cols. 10-11) are relevant to automatic entity identification. (979/51:27-30.)

IV. DISPUTED CLAIM TERMS

A. "Selected document content" (Claims 1, 2, 18, 19)

Defendants' Construction: indefinite
Xerox's Construction: all or part of the content of a document in electronic form

Defendants contend that "selected document content" is indefinite. Defendants' position is untenable. In April 2010, Defendants were served with interrogatories seeking detailed contentions for all invalidity defenses. Yet Defendants never contended that any claim term is indefinite, let alone provided a basis for contending that the term "selected document content" is in any way indefinite. (D.I. 125 at 8-17; D.I. 127, Ex. B at 2-4, 6-9.) Accordingly, Defendants should be precluded from raising this defense.
Furthermore, the examiner never raised any uncertainty concerning this term during prosecution, and there is nothing unclear about it. The intrinsic evidence demonstrates that "selected document content" comprises an input to the claimed method: the document content in which entities are identified and which is categorized. The specification teaches that either all or part of a document can be used as the input for that entity identification and categorization. Finally, the claims plainly apply in the context of a computer system. They also require "automatic" identification of entities in, and "automatic" categorization of, the selected document content. That content must therefore be in electronic form. Accordingly, the Court should reject Defendants' indefiniteness argument and adopt Xerox's construction of this term.

[3] As noted at various points in the patent prosecution (see, e.g., Ex. D at 4; Ex. E at 3).

1. Law governing indefiniteness

General principles of claim construction apply when determining whether a claim term is indefinite. Young v. Lumenis, Inc., 492 F.3d 1336, 1346 (Fed. Cir. 2007); Leader Techs., Inc. v. Facebook, Inc., 692 F.Supp.2d 425, 436 (D. Del. 2010). "A claim will be found indefinite only if it 'is insolubly ambiguous, and no narrowing construction can properly be adopted . . . .'" Praxair, Inc. v. ATMI, Inc., 543 F.3d 1306, 1319 (Fed. Cir. 2008); Leader Techs., 692 F.Supp.2d at 436; see also Exxon Research & Eng'g Co. v. U.S., 265 F.3d 1371, 1375 (Fed. Cir. 2001). Conversely, a claim term is definite if it can be given any reasonable meaning. Young, 492 F.3d at 1346; Leader Techs., 692 F.Supp.2d at 436. "If the meaning of the claim is discernible, even though the task may be formidable and the conclusion may be one over which reasonable persons will disagree, [the Federal Circuit has] held the claim sufficiently clear to avoid invalidity on indefiniteness grounds".
Exxon, 265 F.3d at 1375; Leader Techs., 692 F.Supp.2d at 436. As discussed below, "selected document content" is not indefinite under these exacting tests, since its meaning is readily apparent from the patent.

2. "Selected document content" is the input to the claimed method

As the claim language indicates, "selected document content" simply comprises the document content that serves as an input to the claimed method:

A method for automatically generating a query from selected document content comprising: ...
[b] automatically identifying a set of entities in the selected document content . . .;
[c] automatically categorizing the selected document content using the organized classification of document content for assigning the selected document content a classification label . . . .

(979/76:10-26; emphasis added.) The "selected document content" is the content in a document in which entities are identified and which is categorized. That process produces the data used to automatically generate a query. (See also Figs. 36, 38 (item 3612); Fig. 39 (item 3902); Ex. F at 10 ("Applicant's claims recite automatically generating a query from selected document content, from which both a set of entities and a classification label are automatically identified and assigned, respectively.") (emphasis in original).) Indeed, the parties agree that the same "selected document content" is used for entity identification, categorization and query generation. (Ex. C at 2.)

3. "Selected document content" can consist of all or part of a document

Nor is there anything unclear about what the "selected document content" used in performing the claimed method consists of: all or part of a document. The specification discusses Figure 38, which "illustrates the elements and flow of information for generating a query" (979/48:40-41). In that discussion, it plainly states:

In operation as shown in FIG. 38, the document content 3612 or alternatively limited context (i.e., words, sentences, or paragraphs) surrounding the entity 3808 is analyzed by categorizer 3610 to produce a set of categories 3620.

(979/48:52-55.) Thus, the specification expressly teaches that the "selected document content" in which an entity is identified and which is categorized can be all or part of the document.[4]

[4] This is further confirmed by Claim 2 and its specification embodiment. Claim 2 requires that the formulated query also include "terms relating to context information surrounding the set of entities in the selected document content". (979/76:33-35; emphasis added.) During prosecution, Xerox indicated that the quoted language was embodied in "aspect vector 3822". (See Ex. E at 9.) The specification's description of the aspect vector again indicates that all or part of a document is the relevant content for generating the aspect vector: "[p]roducing an aspect vector contextualizes queries related to the entities by examining a portion of the document content that may range from all of it to one or more paragraphs and/or segments around the entity". (979/50:21-25; emphasis added.)

4. "Selected document content" must be in electronic form

"Selected document content" comprises all or part of the content of a document, and it must also be in electronic form. The specification defines the term "document" broadly as "an electronic (e.g., digital) or physical (e.g., paper) recording of information. . .". (979/6:52-53.) However, the claims do not use the term "document"; they use the term "selected document content". Moreover, the claims are directed to "automatically generating queries" from that content. They do so both by "automatically identifying" entities in that content and by "automatically categorizing" that content. The repeated use of the word "automatic" indicates that the selected document content must be in electronic form.

Furthermore, the context for the inventions is obviously a computer system. (See Claim 1 (referring to an "information retrieval system"); Claim 18 ("an article of manufacture for use in a computer system").) The specification further demonstrates this. For example, in the specification, a "text categorizer" is used to perform the "categorizing" step (Step (c))[5] of Claim 1. (See cols. 41-46.) The specification indicates that this categorizer is "a utility integrated with or accessed by the meta-document server". (979/41:50-51.) A "utility" (i.e., a "utility program") is "[a] program that performs a specific task related to the management of computer functions, resources, or files". (Ex. I at 1511.) The specification, in turn, teaches that the referenced meta-document server is a computer-implemented system. (See 979/74:31-37.)

Similarly, the specification teaches that "[e]ntity identification or extraction can be performed: (a) manually by a user, (b) automatically by entity extractor 3804 shown in FIG. 38 using for example a method as described in section B.4, or (c) by the categorizer 3610". (979/51:27-30.) The asserted claims require "automatic" (as opposed to manual) entity identification. For that purpose, the specification teaches using either the categorizer 3610 (a software utility) or the computer-implemented methods described in Section B.4 (979/10:44-11:45). Since the entity identification and categorization methods taught in the patent all operate on electronic document content, "selected document content" must be in electronic form.

Because "selected document content" is readily construed, Defendants' belated indefiniteness argument fails, and the Court should adopt Xerox's construction of that term.

[5] The parties refer to the steps of the independent claims using letter references. (See Ex. G.)

B. "Classification label" (Claims 1, 18)

Defendants' Construction: classifying word or phrase
Xerox's Construction: a label in any format that identifies a category in the organized classification of document content

Defendants' construction improperly limits the format of a "classification label" to a "classifying word or phrase", even though no such restriction appears in the patent. By contrast, Xerox's construction correctly reflects the ordinary meaning of "label" and the plain text of the claim. The claim simply requires that a label be associated with the subject-matter classifications used in determining the subject matter of document content. That is what makes it a "classification label".

1. Ordinary meaning of "label"

The ordinary meaning of "label" is an identifier: "[a]n item used to identify something or someone". (Ex. I at 773.) In computer science, a "label" has a similar meaning, with the label specifically identifying data: "[a]n identifier within or attached to a set of data elements". (Ex. J at 374.) Neither definition restricts the permissible format of a label. For example, in the ordinary meaning of the term, a label could be an alphanumeric code, as in the Library of Congress classification system. Alternatively, a label could be a number, as in the Dewey Decimal System. In both systems, the label is associated with and identifies a category used to describe the subject matter of document content (books). The format is of no moment. The classification label of the claims is equally capable of being in any format.

2. The classification label identifies categories in the organized classification of document content but need not be in any particular format

Claim 1 begins with the step of "[d]efining an organized classification of document content with each class in the organized classification of document content having associated therewith a classification label . . . ." (979/76:13-15; emphasis added.)
The parties have agreed that an "organized classification of document content" ("OCDC") is "an organized set of categories that can be used to describe the subject matter of document content".[6] (Ex. C at 3.) Hence, consistent with Xerox's proposed construction, the claim simply requires that a "classification label" be associated with, and therefore identify, a descriptive category in the OCDC. The claim language also requires the classification label to "correspond[] to a category of information in an information retrieval system". Beyond that, the claims impose no limitations on the claimed classification label. They do not require any particular format.

Nor does the specification indicate that the classification label must be in any particular format. It is certainly not required to be a word or phrase. For example, in describing categorization techniques, the specification treats classification labels as mathematical "variables" or "values". (979/43:41-56.) Specifically, the specification states that classification may involve assigning "a document (or a body of text) class labels drawn from a discrete set of possible labels C" (979/43:41-44). It also teaches that a "classifier accepts as input a document 'Doc' and predicts the target value C, or a classification label . . .". (979/43:44-51; emphasis added.) Defendants' proposed construction should therefore be rejected.

[6] The parties have also agreed that "defining an organized classification of document content" means "setting an organized classification of document content," i.e., setting the particular organized classification of document content that is to be used in performing the steps of the claimed method. (Ex. C at 3.)

C. "Categorizing the selected document content using the organized classification of document content for assigning the selected document content a classification label" (Claims 1, 18)

Defendants' Construction: using the organized classification of document content to categorize the selected document content and to assign to the selected document content a single classification label
Xerox's Construction: determining the subject matter of the selected document content using one or more of the categories defining the organized classification of document content and assigning the corresponding classification label(s) to the selected document content

Defendants contend that categorizing selected document content must result in the assignment of only one classification label. This contention finds no support in patent law or in the patent and should be rejected. The remainder of Defendants' proposed construction merely rearranges the claim language. By contrast, Xerox's construction dovetails with the parties' agreed construction of the OCDC.

1. The default patent law rule is that "a" means "one or more"

"The United States Court of Appeals for the Federal Circuit 'has repeatedly emphasized that an indefinite article "a" or "an" in patent parlance carries the meaning "one or more" in open-ended claims containing the transitional phrase "comprising".'" Ingenio, Filiale De Loto-Quebec, Inc. v. GameLogic, Inc., 445 F.Supp.2d 443, 453-54 (D. Del. 2006) (quoting KCJ Corp. v. Kinetic Concepts, Inc., 223 F.3d 1351, 1356 (Fed. Cir. 2000)). This is a basic rule to which exceptions are extremely limited:

That "a" or "an" can mean "one or more" is best described as a rule, rather than merely as a presumption or even a convention. The exceptions to this rule are extremely limited: a patentee must "evince[ ] a clear intent" to limit "a" or "an" to "one".
The subsequent use of definite articles "the" or "said" in a claim to refer back to the same claim term does not change the general plural rule, but simply reinvokes that non-singular meaning. An exception to the general rule that "a" or "an" means more than one only arises where the language of the claims themselves, the specification, or the prosecution history necessitate a departure from the rule.

Baldwin Graphic Sys., Inc. v. Siebert, Inc., 512 F.3d 1338, 1342-43 (Fed. Cir. 2008) (internal citations omitted). Therefore, the black-letter rule is that "a classification label" or "the classification label" refers to one or more classification labels.

2. The specification provides no support for requiring a single classification label

The specification of the '979 Patent provides no support for departing from that black-letter rule. To the contrary, the specification repeatedly confirms that categorization of document content can result in the selection of multiple classes/categories and hence multiple classification labels. Thus, in Section F.3, which describes query formulation, the specification teaches that "[i]n generating the set of categories 3620, the categorizer 3610 classifies input document to generate classification labels for the document content 3612". (979/49:18-20; emphasis added.) The specification then goes on to state clearly that:

Document classification labels define the set of categories 3620 output by the categorizer 3610. These classification labels in one embodiment are appended to the query 3812 by query generator 3810 to restrict the scope of the query (i.e., the entity 3808 and the context vector 3822) to folders corresponding to classification labels in a document collection of an information retrieval system.

(979/49:31-37; emphasis added.) Moreover, as the specification quotations indicate, categorization for the purpose of query formulation is performed by "categorizer 3610".
The specification states that relevant details concerning the operation of this categorizer are found earlier in the specification:

FIG. 38 illustrates the elements and flow of information for generating a query 3812 by query generator 3810. The query generated may include some or all of the following elements. . . . (b) a set of categories 3620 generated by the categorizer 3610 (as described above in further detail while referring to FIG. 36) . . . .

(979/48:40-48; emphasis added.) The operation of the "categorizer 3610" (also referred to as "text categorizer 3610") is discussed in Section F.1 of the patent with reference to Figure 36. (See 979/41:45-51, which introduces Section F.1 ("Text Categorizer") (979/41:52-46:67).) There, the specification expressly teaches that documents may be assigned to one or more classes:

The goal of a text classification system, such as text categorizer 3610, is to classify a document 3612 into a set of one or more classes 3620, which are also referred to as categories. In operation, the text categorizer 3610 assigns a document one or more classes in a set of classes that are defined in an ontology represented in knowledge base 3622 . . . .

(979/41:53-58; emphasis added.)[7] The specification further states:

To simplify the description of the text categorizer 3610, it is assumed that documents 3612 will be assigned to no more than one class. However, it will be appreciated by those skilled in the art that the text categorization method described herein may be readily extended to assign documents to more than one class.

(979/43:35-40; emphasis added.) The claims expressly state that "each class" is associated with a "classification label". The categorization of document content into multiple classes would therefore result in the assignment of multiple classification labels. Defendants' attempt to require a single classification label is therefore entirely without merit and should be rejected.

[7] During prosecution, Xerox confirmed that element 3612 embodied "selected document content". (See Ex. E at 3.)

3. Xerox's proposed construction reflects the agreed-upon meaning of OCDC

Xerox's construction of "categorizing" follows logically from the agreed-upon meaning of OCDC: "an organized set of categories that can be used to describe the subject matter of document content". (Ex. C at 3.) Step (c) of Claim 1 covers categorizing document content "using" the OCDC. This step therefore involves two things. It determines the subject matter of the selected document content using those descriptive OCDC classes/categories. It then assigns the associated classification label to that content. By contrast, Defendants' construction ignores the agreed-upon meaning of OCDC and merely rearranges the claim language without clarifying it.

D. "Query" (Claims 1, 2, 18, 19)

Defendants' Construction: request for search results
Xerox's Construction: a set of data specifying search criteria

The asserted claims of the '979 Patent are directed to automatic query generation. As such, the claims cover what the generated query must contain (i.e., entity data restricted by classification label data). The claim language stops there; the claimed method ends with the formulation of the query. There is no reference to "search results" in the claim. No claim language describes or limits how the formulated query must be used or implemented in the information retrieval system. Xerox's proposed construction correctly reflects the subject matter of the claims, which is query formulation. It describes the query in terms of its content, without requiring any particular interaction with the information retrieval system. By contrast, Defendants' proposed construction ignores the subject matter of the claims.

1. Xerox's proposed construction accords with the claim language

The term "query" appears both in the preamble to Claim 1 ("[a] method for automatically generating a query from selected document content")[8] and in Step (d) ("automatically formulating the query to restrict a search at the information retrieval system . . ."). (979/76:10-11, 27-28.) Thus, the only query activity mentioned by the claims is "generating" or "formulating" the query. In that regard, Step (d) further specifies what the formulated query contains: data resulting from Step (b) (the entity identification step) and Step (c) (the categorization step). Xerox's proposed construction ("a set of data specifying search criteria") is therefore consistent with the use and role of a "query" in the asserted claims.

[8] Claim 18 contains analogous language: "instructions stored in the memory for operating a method for automatically generating a query from selected document content". (979/78:14-16.)

2. Xerox's proposed construction accords with the specification

The relevant specification descriptions of query formulation likewise focus on the contents of the query. The key specification section, Section F.3, references Figure 38, which shows "the elements and flow of information for generating a query". (979/48:40-41.) In Figure 38, the query is marked with the reference number 3812 (the relevant excerpt from Figure 38 is shown to the right). Consistent with Xerox's proposed construction, the query is depicted entirely in terms of its potential contents, which constitute search criteria. The specification discusses the elements of the query shown in Figure 38 as follows:

FIG. 38 illustrates the elements and flow of information for generating a query 3812 by query generator 3810. The query generated may include some or all of the following elements...: (a) a set of entities 3808..., (b) a set of categories 3620 generated by the categorizer 3610...
(c) an aspect vector 3822 generated by categorizer 3610 or short run aspect vector generator 3820, and (d) a category vocabulary 3621 generated by the categorizer 3610.[9]

(979/48:40-51.)

Although the discussion covers in detail the data that could appear in the query, it places no restriction on how the query functionally interacts with the information retrieval system. The specification goes on to describe an application of the method of Claim 1 (979/50:1-11), which is depicted in Figure 39. The query in Figure 39 looks like this:

[Excerpt of the query from Figure 39]

Again, it is depicted solely with reference to its contents: the entities identified in document content ("seven up") and the result of categorizing that content ("science/biology/genetics").

To be sure, the intrinsic evidence indicates that the formulated query will be used in performing a search at an information retrieval system and that the scope of that search must be restricted. But, as discussed in the next section, the search scope restriction is accomplished by specifying what the query contains, not how it interacts with the information retrieval system. The search governed by a given query will be equally limited in scope regardless of whether the query "requests search results", merely initiates a search without expecting any results (because, for example, the results may be passed on rather than returned), or is simply stored for later use.

[9] Some of these elements of the depicted query embody asserted dependent claims, i.e., aspect vector (Claims 2 and 3) and category vocabulary (Claim 10). (See also 979/48:63-66, which also describes the formulated query in terms of its contents.)

Therefore, because Defendants' proposed construction ignores the only salient characteristic of a query for purposes of the asserted claims (the contents of the query) and instead introduces an unsupported and extraneous requirement specifying how the query must interact with the information retrieval system, it should be rejected.
E. "To restrict a search at the information retrieval system for information concerning the set of entities to the category of information in the information retrieval system identified by the assigned classification label" (Claims 1, 18)

Defendants' Construction: to confine a search at the information retrieval system to the category of information identified by the assigned classification label, where the search seeks information concerning the set of entities
Xerox's Construction: the set of data specifying search criteria includes data items corresponding to one or more entities identified in the 'automatically identifying' step and one or more classification labels assigned in the 'automatically categorizing' step

Xerox's proposed construction for this claim term provides an easily comprehensible test that is fully consistent with the claims and the specification. Defendants' construction, by contrast, simply rearranges the claim language without clarifying anything.

The disputed claim term appears in Step (d) of Claim 1 and modifies the preceding phrase "automatically formulating the query": "automatically formulating the query to restrict a search at the information retrieval system for information concerning the set of entities to the category of information in the information retrieval system identified by the assigned classification label". (979/76:27-31; emphasis added.)

As discussed above in relation to "query," the asserted claims are directed to formulating a query, i.e., formulating the set of data that comprises the search criteria for the search. The remainder of Step (d) simply specifies what that set of query data must contain in terms of search criteria.

First, the claim language states that the query is for information concerning "the set of entities". The use of "the" before "set of entities" indicates that this term refers back to the set of entities identified in Step (b), i.e., entities in selected document content.
As the specification example depicted in Figure 39 indicates, the formulated query is for "information concerning the set of entities" because it includes data corresponding to that set of entities. Thus, in Figure 39, the query contains the set of entities "seven up". Xerox's proposed construction reflects the claim language and specification teachings by requiring that "the set of data specifying search criteria [i.e., the query] include[] data items corresponding to one or more entities identified in the 'automatically identifying' step". (Ex. C at 6.)

Next, the claim language requires that the formulated query restrict the search "to the category of information in the information retrieval system identified by the assigned classification label". (979/76:29-31.) The "assigned classification label" clearly refers back to Step (c) of Claim 1: the classification label associated with the particular class in the OCDC that, as a result of automatic categorization, is found to describe the subject matter of the document content. Because Claim 1 also requires that each classification label "correspond[] to a category of information in an information retrieval system," (979/76:15-17) the inclusion in the query of data corresponding to the classification label(s) will restrict a search at the information retrieval system (whatever the timing, form or purpose of the search) to the categories of information corresponding to the assigned classification labels. For example, as depicted in Figure 39 and discussed at 979/50:1-11, the query containing the set of entities "seven up" also contains data corresponding to the classification labels that reflect the subject matter of the document content ("science>biology>genetics"), thus focusing the search on the corresponding categories of information in the information retrieval system.
Again, by specifying that the query include data items corresponding to one or more classification labels assigned in the "automatically categorizing" step, Xerox's proposed construction accurately reflects the manner in which classification labels restrict the scope of the query.

In contrast to Xerox's construction, which provides a logical, practical and easily understood interpretation of the relevant claim language, Defendants' proposed construction merely rearranges the existing claim text in an awkward manner and needlessly substitutes "confine" for "restrict" (which requires no clarification).

F. "Characteristic vocabulary" (Claim 10)

Defendants' Construction: one or more words or phrases that describe the category of information corresponding to the class
Xerox's Construction: one or more words or phrases that describe a class in the organized classification of document content

The parties disagree about which element of Claim 1 the "characteristic vocabulary" of dependent Claim 10 describes. Xerox's construction tracks the claim language: this vocabulary simply describes a class in the OCDC (i.e., the organized set of categories that can be used to describe the subject matter of document content). By contrast, Defendants argue that the vocabulary describes a "category of information," which, they have informed Xerox, refers to a category of information in the information retrieval system. (Ex. H.) Neither the claim language nor the specification supports Defendants' construction.

Claim 10 reads "[t]he method according to claim 1, wherein each class in the organized classification of document content has associated therewith a characteristic vocabulary". (979/77:11-13; emphasis added.) Thus, the claim language expressly links the "characteristic vocabulary" to each class in the OCDC. The claim says nothing about the categories of information in the information retrieval system.

The specification likewise contradicts Defendants' construction.
The specification states that the term "characteristic vocabulary" is synonymous with "category vocabulary" and has the reference number 3621 in Figure 36: "the characteristic vocabulary (i.e., category vocabulary) 3621 associated with the corresponding classes". (979/49:44-46.) However, the word "category" in the phrase "category vocabulary" does not refer to the categories of information in the information retrieval system. Instead, the specification plainly indicates that it refers to the set of classes in the OCDC that are used to categorize document content.

The specification first introduces the term "category vocabulary" in conjunction with a discussion of the "set of categories 3620 generated by categorizer 3610". (979/48:46-47.) This "set of categories 3620" refers to the classes in the OCDC used during categorization:

The goal of a text classification system, such as text categorizer 3610, is to classify a document 3612 into a set of one or more classes 3620, which are also referred to as categories. In operation, the text categorizer 3610 assigns a document one or more classes in a set of classes that are defined in an ontology. . . . An example of an ontology is the DMOZ ontology . . . .

(979/41:53-59.)

The specification then clearly defines "category vocabulary" as follows:

[T]he document from which the entity is extracted is categorized. Categorization involves producing a category 3620 and a category vocabulary 3621. The category vocabulary for a category consists of one or more terms that describe the category.

(979/51:33-37; emphasis added.)

Because the "category" being discussed bears the reference number 3620 ("category 3620"), it refers to the "classes 3620" that comprise the OCDC and that are used to categorize document content--not the categories of information in the information retrieval system.
Thus, the specification tracks the claim language and directly supports Xerox's proposed construction while providing no support for Defendants' proposed construction.

G. Order of steps (Claims 1 and 2; Claims 18 and 19)

Defendants' Construction: The steps of claim 1 must be performed before the step of 2. The steps of claim 18 must be performed before the step of 19.
Xerox's Construction: The step of Claim 2 must be performed during or after the completion of step (d) of Claim 1. The step of Claim 19 must be performed during or after the completion of step (f) of Claim 18.

The law is clear that "although a method claim necessarily recites the steps of the method in a particular order, as a general rule the claim is not limited to performance of the steps in the order recited, unless the claim explicitly or implicitly requires a specific order". Baldwin Graphic Sys., Inc. v. Siebert, Inc., 512 F.3d 1338, 1345 (Fed. Cir. 2008). To determine whether a specific order is required, the Federal Circuit has explained:

First, [the court looks] to the claim language to determine if, as a matter of logic or grammar, they must be performed in the order written. . . . If not, [the court] next look[s] to the rest of the specification to determine whether it directly or implicitly requires such a narrow construction. If not, the sequence in which such steps are written is not a requirement.

Altiris, Inc. v. Symantec Corp., 318 F.3d 1363, 1369-70 (Fed. Cir. 2003) (internal citations omitted).

As a dependent claim, Claim 2 "incorporate[s] by reference all the limitations of the claim to which it refers" (35 U.S.C. § 112, ¶ 4), here Claim 1. Thus, Claim 2 requires that the query of Claim 1 be limited "by adding terms relating to context information surrounding the set of entities in the selected document content".[10] (979/76:33-35.)
Because nothing in the claim language requires that these "terms" be added to the query as search criteria only after data corresponding to the set of entities and the classification labels is present in the query, Defendants' attempt to read a narrow construction into the claims is unavailing. All else being equal, the query "[entity]+[classification label]+[terms]" has the same scope as the query "[entity]+[terms]+[classification label]".

Indeed, the specification expressly provides for a query constructed by appending classification labels to a query already consisting of an entity plus the "terms" of Claim 2. As mentioned above (see n.4), during the prosecution history, Xerox indicated that the "terms" of Claims 2 and 19 were embodied in "aspect vector 3822 shown in [Applicant's] Figure 38". (See Ex. E at 9.) The specification teaches that "[d]ocument classification labels define the set of categories 3620 output by the categorizer 3610. These classification labels in one embodiment are appended to the query 3812 by query generator 3810 to restrict the scope of the query (i.e., the entity 3808 and the context vector 3822) . . .". (979/49:31-35; emphasis added.) Hence, the specification expressly teaches an embodiment in which the entities from Step (b) ("automatically identifying a set of entities") of Claim 1 and the "terms" of Claim 2 are present in the query being formulated before the classification labels derived in Step (c) ("automatically categorizing the selected document content") of Claim 1 are added.

[10] This memorandum will focus on Claims 1 and 2, but the same analysis applies to analogous Claims 18 and 19.

H. Order of steps (Claim 1; Claim 18)

Defendants' Construction:
Claim 1:
Step (a) must be performed before steps (c) and (d).
Step (b) must be performed before step (d).
Step (c) must be performed before step (d).
Claim 18:
Step (c) must be performed before steps (e) and (f).
Step (d) must be performed before step (f).
Step (e) must be performed before step (f).

Xerox's Construction:
Claim 1:
Step (a) must be performed before steps (c) and (d).
Step (b) must be performed before the completion of step (d).
Step (c) must be performed before the completion of step (d).
Claim 18:
Step (c) must be performed before steps (e) and (f).
Step (d) must be performed before the completion of step (f).
Step (e) must be performed before the completion of step (f).

The parties agree that Step (b) of Claim 1 ("automatically identifying a set of entities") must precede the completion of Step (d) ("automatically formulating the query"), and that Step (c) ("automatically categorizing the selected document content") must precede the completion of Step (d). This is necessarily so because, as explained above, the query formulated in Step (d) uses the data generated in both Steps (b) and (c).

Defendants go further and insist that Steps (b) and (c) must be performed in their entirety before the query formulation step begins. However, nothing in the claim language or the specification precludes an iterative process whereby, for example, the entity identification step finds an entity, the query formulation step adds that entity to the query, then the entity identification step finds another entity, and so on. Likewise, the categorization step could find one category at a time and pass each one to the query formulation step to add the classification label to the query. Alternatively, as is well known, computer systems often support multiple concurrent processes, multiple processors, or multiple computers working together in a distributed system. (See Ex. J at 132, 447 (defining "computer system", "multiprocessor" and "multiprocessing"); see also 979/75:15-27.)
In such an arrangement, the entity identification step could proceed as one stream of activity concurrently with the categorization step as another stream of activity, with both passing their results (entities and classification labels) to another concurrent stream of activity comprising the query formulation step. The claim language imposes no requirement that the entity identification step or the categorization step must proceed to completion before the query formulation step can begin. All that is required is that the query formulation step be supplied with entities and classification labels to put into the query.

Put another way, the claim language merely requires that the final formulated query contain data corresponding to entities and classification labels, and it is that final set of data that gives the formulated query its scope. The timing of when particular items are added to the query has no effect on whether the query ultimately formulated satisfies the requirements of the claim, or on its ultimate scope. Hence, there is no logical reason based on the claim language to impose any timing requirement. Nor does the specification impose one. Accordingly, Defendants' attempt to narrow the claims should be rejected.

V. CONCLUSION

For the foregoing reasons, Xerox respectfully requests that the Court adopt its proposed constructions for the disputed claim terms of the '979 Patent.

ASHBY & GEDDES

/s/ John G. Day
Lawrence C. Ashby (I.D. #468)
John G. Day (I.D. #2403)
Lauren E. Maguire (I.D. #4261)
500 Delaware Avenue, 8th Floor
P.O. Box 1150
Wilmington, DE 19899
(302) 654-1888
lashby@ashby-geddes.com
jday@ashby-geddes.com
lmaguire@ashby-geddes.com

Attorneys for Plaintiff Xerox Corporation

Of Counsel:
Richard J. Stark
Andrei Harasymiak
CRAVATH, SWAINE & MOORE LLP
Worldwide Plaza
825 Eighth Avenue
New York, NY 10019
(212) 474-1000

Dated: March 25, 2011