PA Advisors, LLC v. Google Inc. et al
Filing
433
MOTION in Limine and Daubert Motion to Exclude the Testimony of Mr. Stanley Peters by PA Advisors, LLC. (Attachments: #1 Affidavit, #2 Exhibit A, #3 Exhibit B-1, #4 Exhibit B-2, #5 Exhibit B-3, #6 Exhibit B-4, #7 Exhibit B-5, #8 Exhibit B-6, #9 Exhibit B-7, #10 Exhibit B-8, #11 Exhibit B-9, #12 Exhibit B-10, #13 Exhibit B-11, #14 Exhibit B-12, #15 Exhibit B-13, #16 Exhibit C, #17 Text of Proposed Order)(Wiley, Elizabeth)
Exhibit B-4
ACC - 4
Invalidity Chart Culliss in view of Herz and Additional Prior Art References
The `067 Patent
1. A data processing method for enabling a user utilizing a local computer system having a local data storage system to locate desired data from a plurality of data items stored in a remote data storage system in a remote computer system, the remote computer system being linked to the local computer system by a telecommunication link, the method comprising the steps of:
Culliss
Culliss 1:28-31 "Given the large amount of information available over the Internet, it is desirable to reduce this information down to a manageable number of articles which fit the needs of a particular user."
Herz
Herz 79:11-14 "A method for cataloging a plurality of target objects that are stored on an electronic storage media, where users are connected via user terminals and bidirectional data communication connections to a target server that accesses said electronic storage media."
Herz 1:19-21 "This invention relates to customized electronic identification of desirable objects, such as news articles, in an electronic media environment."
Herz See also Abstract; 1:18-43; 4:35-48; 28:41-55:42; Figures 1-16.
Additional Prior Art References
Salton `89 p. 229 "Information retrieval systems process files of records and requests for information, and identify and retrieve from the files certain records in response to the information requests. The retrieval of particular records depends on the similarity between the records and the queries, which in turn is measured by comparing the values of certain attributes attached to records and information requests."
Salton `68 p. 7 "Because of their special importance in the present context, it is useful to describe in more detail the operations that lead to the retrieval of stored information in answer to user search requests. In practice, searches often may be conducted by using author names or citations or titles as principal criteria. Such searches do not require a detailed content analysis of each item and are relatively easy to perform, provided that there is a unified system for generating and storing the bibliographic citations pertinent to each item."
Braden 5:2-6 "In accordance with our broad teachings, the present invention satisfies this need by employing natural language processing to improve the accuracy of a keyword-based document search performed by, e.g., a statistical web search engine."
Ahn 1:31-33 "The present invention is directed to a system and method for searching through documents maintained in electronic form. The present invention is capable of searching through individual documents, or groups of documents."
Brookes 1:9-14 "This invention relates to information technology and, in particular, to a method and apparatus whereby users of a database system may be alerted to important information including text, graphics and other electronically stored information within the system and by which means information may be efficiently disseminated."
Dasan 1:10-15 "The present invention relates to information retrieval. More specifically, the present invention relates to a client server model for information retrieval based upon a user-defined profile, for example, for the generation of an "electronic" newspaper which contains information of interest to a particular user."
Dedrick See, e.g., Abstract, Figures 1-8.
Krishnan See 1:6-12.
Kupiec 3:23-29 "The present invention provides a method for answer extraction. A system operating according to this method accepts a natural-language input string such as a user supplied question and a set of relevant documents that are assumed to contain the answer to the question. In response, it generates answer hypotheses and finds these hypotheses within the documents."
Reese 1:55-57 "A method and a system for requesting and retrieving information from distinct web network content sites is disclosed."
Menczer p. 157 "In this paper we discuss the use of algorithms based on adaptive, intelligent, autonomous, distributed populations of agents making local decisions as a way to automate the online information search and discovery process in the Web or similar environments."
Armstrong p. 4 "We have experimented with a variety of representations that re-represent the arbitrary-length text associated with pages, links, and goals as a fixed-length feature vector. This idea is common within information retrieval systems [Salton and McGill, 1983]. It offers the advantage that the information in an arbitrary amount of text is summarized in a fixed length feature vector compatible with current machine learning methods."
(a) extracting, by one of the local computer system and the remote computer system, a user profile from user linguistic data previously provided by the user, said user data profile being representative of a first linguistic pattern of the said user linguistic data;
Culliss 3:46-48 "Inferring Personal Data Users can explicitly specify their own personal data, or it can be inferred from a history of their search requests or article viewing habits. In this respect, certain key words or terms, such as those relating to sports (i.e. "football" and "soccer"), can be detected within search requests and used to classify the user as someone interested in sports."
Culliss 3:13-36 "The present embodiment of the invention utilizes personal data to further refine search results . . . . Personal activity data includes data about past actions of the user, such as reading habits, viewing habits, searching habits, previous articles displayed or selected, previous search requests entered, previous or current site visits, previous key terms utilized within previous search results, and time or date of any previous activity."
Herz 56:19-27 "Initialize Users' Search Profile Sets. The news clipping service instantiates target profile interest summaries as search profile sets, so that a set of high interest search profiles is stored for each user. The search profiles associated with a given user change over time. As in any application involving search profiles, they can be initially determined for a new user (or explicitly altered by an existing user) by any of a number of procedures, including the following preferred methods: (1) asking the user to specify search profiles directly by giving keywords and/or numeric attributes, (2) using copies of the profiles of target objects or target clusters that the user indicates are representative of his or her interest, (3) using a standard set of search profiles copied or otherwise determined from the search profile sets of people who are demographically similar to the user."
Herz 6:58-60 "Each user's target profile interest summary is automatically updated on a continuing basis to reflect the user's changing interests."
Herz 7:26-29 "The accuracy of this filtering system improves over time by noting which articles the user reads and by generating a measurement of the depth to which the user reads each article. This information is then used to update the user's target profile interest summary."
Herz 27:47-49 "[T]he disclosed method for determining topical interest through similarity requires users as well as target objects to have profiles."
Herz 27:62-67 "In a variation, each user's user profile is subdivided into a set of long-term attributes, such as demographic characteristics, and a set of short-term attributes . . . such as the user's textual and multiple-choice answers to questions"
Herz 56:20-28 "As in any application involving search profiles, they can be initially determined for a new user (or explicitly altered by an existing user) by any of a number of procedures, including the following preferred methods: . . . (2) using copies of the profiles of target objects or target clusters that the user indicates are representative of his or her interest."
Herz 59:24-27 "The user's desired attributes . . . would be some form of word frequencies such as TF/IDF and potentially other attributes such as the source, reading level, and length of the article."
Herz See also Abstract; 1:18-43; 4:8:8; 55:44-56:14; 56:15-30; 58:57-60:9; Figures 1-16.
Salton `89 p. 405-6 "To help furnish semantic interpretations outside specialized or restricted environments, the existence of a knowledge base is often postulated. Such a knowledge base classifies the principal entities or concepts of interest and specifies certain relationships between the entities. [43-45] . . . . The literature includes a wide variety of different knowledge representations . . . [one of the] best-known knowledge-representation techniques [is] the semantic-net. . . . In generating a semantic network, it is necessary to decide on a method of representation for each entity, and to relate or characterize the entities. The following types of knowledge representations are recognized: [46-48]. . . . A linguistic level in which the elements are language specific and the links represent arbitrary relationships between concepts that exist in the area under consideration."
Salton `89 p. 378 "A prescription for a complete language-analysis package might be based on the following components: A knowledge base consisting of stored entities and predicates, the latter used to characterize and relate the entities."
Salton `68 p. 9, Fig. 1-3 "different content analysis procedures are available to generate identifiers for documents and requests. . . statistical and syntactic procedures to identify relations between words and concepts, and phrase generating methods."
Salton `68 p. 11 (Statistical association methods, Syntactic analysis methods, and Statistical phrase recognition methods)
Salton `68 p. 33 "The phrase dictionaries. Both the regular and the stem thesauruses are based on entries corresponding either to single words or to single word stems. In attempting to perform a subject analysis of written text, it is possible, however, to go further by trying to locate phrases consisting of sets of words that are judged to be important in a given subject area."
Salton `68 p. 35-36 "The syntactic phrase dictionary has a more complicated structure, as shown by the excerpt reproduced in Fig. 2-6. Here, each syntactic phrase, also known as criterion tree or criterion phrase, consists not only of a specification of the component concepts but also of syntactic indicators, as well as of syntactic relations that may obtain between the included concepts. . . . More specifically, there are four main classes of syntactic specifications, corresponding to noun phrases, subject-verb relations, verb-object relations, and subject-object relations."
Braden 7:19-23 "Generally speaking and in accordance with our present invention, we have recognized that precision of a retrieval engine can be significantly enhanced by employing natural language processing to process, i.e., specifically filter and rank, the records, i.e., ultimately the documents, provided by a search engine used therein."
Braden See, e.g., 11:62-14:61.
Brookes 12:38-43 "creating and storing an interest profile for each database user indicative of categories of information of interest to said each database user, said interest profile comprising (i) a list of keywords taken from said finite hierarchical set and (ii) an associated priority level value for each keyword."
Brookes See also, 1:66-2:3.
Chislenko 3:38-39 "Each user profile associates items with the ratings given to those items by the user. Each user profile may also store information in addition to the user's ratings."
Chislenko 4:15-18 "For example, the system may assume that Web sites for which the user has created "bookmarks" are liked by that user and may use
those sites as initial entries in the user's profile."
Chislenko 4:40-50 "Ratings can be inferred by the system from the user's usage pattern. For example, the system may monitor how long the user views a particular Web page and store in that user's profile an indication that the user likes the page, assuming that the longer the user views the page, the more the user likes the page. Alternatively, a system may monitor the user's actions to determine a rating of a particular item for the user. For example, the system may infer that a user likes an item which the user mails to many people and enter in the user's profile and indication that the user likes that item."
Chislenko 21:64-22:2 "(a) storing, using the machine, a user profile in a memory for each of the plurality of users, wherein at least one of the user profiles includes a plurality of values, one of the plurality of values representing a rating given to one of a plurality of items by the user and another of the plurality of values representing additional information."
Chislenko 22:29-35 "storing, using the machine, a user profile in a memory for each of the plurality of users, wherein at least one of the user profiles includes a plurality of values, one of the plurality of values representing a rating given to one of a plurality of items by the user and another of the plurality of values representing information relating to the given ratings."
Dasan 3:21-24 "The present invention is a method and apparatus for automatically scanning information using a user-defined profile, and providing relevant stories from that information to a user based upon
that profile."
Dasan 4:1-25 "[T]he user is able to connect to the remote server and specify a user profile, setting forth his interests. The user is able to specify the context for the information to be searched (e.g. the date). The user is able to save the profile on the remote machine. Finally the user is able to retrieve the personal profile (with any access control, if desired) and edit (add or delete entries) and save it for future operations."
Dasan 4:34-39 "Using this interface, and HTTP, the server may notify the client of the results of that execution upon completion. The server's application program, the personal newspaper generator maintains a record of the state of each user's profile, and thus, provides state functionality from session to session to an otherwise stateless protocol."
Dasan See, e.g., 5:37-6:3; 8:53-67.
Dedrick 7:28-38 "Data is collected for personal profile database 27 by direct input from the end user and also by client activity monitor 24 monitoring the end user's activity. When the end user consumes a piece of electronic information, each variable (or a portion of each variable) within the header block for that piece of electronic information is added to the database for this end user. For example, if this piece of electronic information is made available to the end user for consumption in both audio and video format, and the end user selects the audio format, then this choice of format selection is stored in personal profile database 27 for this end user."
Dedrick 3:54-4:4 "The GUI may also have hidden
fields relating to "consumer variables." Consumer variables refer to demographic, psychographic and other profile information. Demographic information refers to the vital statistics of individuals, such as age, sex, income and marital status. Psychographic information refers to the lifestyle and behavioral characteristics of individuals, such as likes and dislikes, color preferences and personality traits that show consumer behavioral characteristics. Thus, the consumer variables refer to information such as marital status, color preferences, favorite sizes and shapes, preferred learning modes, employer, job title, mailing address, phone number, personal and business areas of interest, the willingness to participate in a survey, along with various lifestyle information. This information will be referred to as user profile data, and is stored on a consumer owned portable profile device such as a Flash memory-based PCMCIA pluggable card."
Dedrick See, e.g., Abstract, Figures 1-8.
Eichstaedt 1:34-43 "The present invention provides a profiling technique that generates user interest profiles by monitoring and analyzing a user's access to a variety of hierarchical levels within a set of structured documents, e.g., documents available at a web site. Each information document has parts associated with it and the documents are classified into categories using a known taxonomy. In other words, each document is hierarchically structured into parts, and the set of documents is classified as well."
Eichstaedt 3:28-31 "The profile generation algorithm in the present embodiment learns from positive feedback. Each view of a document signifies an
interest level in the content of the document."
Eichstaedt 1:43-55 "In other words, each document is hierarchically structured into parts, and the set of documents is classified as well. The user interest profiles are automatically generated based on the type of content viewed by the user. The type of content is determined by the text within the parts of the documents viewed and the classifications of the documents viewed. In addition, the profiles also are generated based on other factors including the frequency and currency of visits to documents having a given classification, and/or the hierarchical depth of the levels or parts of the documents viewed. User profiles include an interest category code and an interest score to indicate a level of interest in a particular category. Unlike static registration information, the profiles in this invention are constantly changing to more accurately reflect the current interests of an individual."
Eichstaedt 2:15-41 "A preferred embodiment of the present invention automatically generates a profile that accurately captures a user's stable interest after monitoring the user's interaction with a set of structured documents. The technique of the present embodiment is based on the following three assumptions. First, each document in the corpus has different levels, parts, or views. These views are used to determine the level of interest a user has in a particular document. A hierarchical document structure is a good example for a document with different views. Structured documents such as patents have a title, an abstract and a detailed description. These parts of the document may be categorized according to a 3-level hierarchy which then can be used to determine how interested a user
is in a particular topic. For example, if a user only views the title of a patent document, the user probably has little or no interest in the content of the document. If the user views the abstract as well, the user can be assumed to have more interest in the content of the document. If the user goes on to view the detailed description, then there is good evidence that the user has a strong interest in the document, and the category into which it is classified. Generally, the more views, levels, or parts a document has, the finer will be the granularity of the present system. Although not all documents are structured at present, with the advent of XML, it is likely that the proportion of hierarchical documents available on the internet and in other databases will only increase."
Eichstaedt 3:15-18 "In the system of the present invention, a special access analyzer and profile generator 62 analyzes information about user access to database 60 to generate a profile for the user. The profile is then used by a webcasting system 64 to provide or "push" customized information back to the user 54."
Eichstaedt 5:32-36 "The automatic profile generation algorithm is completely automated and derives the user profiles from implicit feedback. Therefore, the user community does not have to learn new rules to customize the pushed information stream."
Krishnan 2:37-41 "The information access monitor computes user/group profiles to identify information needs and interests within the organization and can then automatically associate users/groups with information of relevance."
Krishnan 4:1-4 "[A] profile of a user's attributes is termed a `user profile'; a summary of digital profiles of objects accessed by a user and/or noted as of interest to the user, is termed the `interest summary' of that user."
Krishnan See also Fig. 6.
Reese 4:35-53 "The user profile is intended to focus the retrieved results on meaningful data. One type of user profile is related to the demographics of the user. For example, the user profile might include the area code, zip code, state, sex, and age of a user. With such a profile, the matching server would retrieve data to the client related to the client's demographics. For example, if the user were interested in current events in the state of Oregon, the matching server would retrieve data and compile an aggregate database relating to current events pertinent to the user's age and area, e.g., Portland. Similarly, if the user sought information regarding retail purchases, the matching server would retrieve data relevant to the user's demographics. A demographics user profile is also very effective for advertisers that wish to advertise their goods or services on the matching server so that specific advertisements can be targeted at user's with specific user profile demographics. Other user profiles include, but are not limited to, areas of interest, business, politics, religion, education, etc."
Reese 5:55-65 "The user profile form 600 includes a Search Type field 630 that allows a user to select whether the user wants an exact match of the user profile with the search data or whether the user will accept some lesser amount of exactness as acceptable for retrieved data. The user profile form 600 further
allows the user to enter demographics specific to the user. In FIG. 6, the demographics include area code 640, zip code 650, state 660, sex 670, age 680, and some other identifiers 690. Once the user enters the appropriate data in the user profile form 600, the user is instructed to save the profile by a "Save Profile" 694 button."
Reese 8:26-35 "Thus far, the invention is focused on a user-created user profile. The invention also contemplates that the user profile may be constructed by the client based on the user's search habits. In other words, an artificial intelligence system may be created to develop a user profile. In the same way that a system is trained to be associative with regard to matching profile elements, the entire profile may be trained based on a user's search habits. For instance, a user profile that relates to demographics can be trained by recognizing user habits relating to demographics."
Sheena 4:40-49 "Ratings can be inferred by the system from the user's usage pattern. For example, the system may monitor how long the user views a particular Web page and store in that user's profile an indication that the user likes the page, assuming that the longer the user views the page, the more the user likes the page. Alternatively, a system may monitor the user's actions to determine a rating of a particular item for the user. For example, the system may infer that a user likes an item which the user mails to many people and enter in the user's profile an indication that the user likes that item."
Sheena 2:9-14 "In one aspect the present invention relates to a method for recommending an item to one of a plurality of users. The method begins by storing
a user profile in a memory by writing user profile data to a memory management data object. Item profile data is also written to a memory management data object."
Sheena 3:34-67 "Each user profile associates items with the ratings given to those items by the user. Each user profile may also store information in addition to the user's rating. In one embodiment, the user profile stores information about the user, e.g. name, address, or age. In another embodiment, the user profile stores information about the rating, such as the time and date the user entered the rating for the item. User profiles can be any data construct that facilitates these associations, such as an array, although it is preferred to provide user profiles as sparse vectors of n-tuples. Each n-tuple contains at least an identifier representing the rated item and an identifier representing the rating that the user gave to the item, and may include any number of additional pieces of information regarding the item, the rating, or both. Some of the additional pieces of information stored in a user profile may be calculated based on other information in the profile, for example, an average rating for a particular selection of items (e.g., heavy metal albums) may be calculated and stored in the user's profile. In some embodiments, the profiles are provided as ordered n-tuples. Alternatively, a user profile may be provided as an array of pointers; each pointer is associated with an item rated by the user and points to the rating and information associated with the rating. A profile for a user can be created and stored in a memory element when that user first begins rating items, although in multidomain applications user profiles may be created for particular domains only when the user begins to explore, and rate items within, those domains.
Alternatively, a user profile may be created for a user before the user rates any items in a domain. For example, a default user profile may be created for a domain which the user has not yet begun to explore based on the ratings the user has given to items in a domain that the user has already explored."
Sheena 28:16-21 "(a) storing a user profile, in the memory, for each of a plurality of users, wherein the user profile comprises a separate rating value, supplied by a particular one of the users, for each corresponding one of a plurality of items, said items including the item non-rated by the user."
Siefert 2:48-59 "In addition, in other forms of the invention, a profile is maintained which specifies certain preferences of the user. Two such preferences are (1) a preferred natural language (such as English or French), (2) the type of interface which the user prefers. The invention presents the resource in a manner compatible with the profile. Also, another profile, termed a "learning profile," is maintained, which, in a simplified sense, specifies the current status of a user with respect to a curriculum which the user is undertaking. The invention ensures compatibility between the resource and the learning profile, if possible."
Siefert 8:60-62 "As stated above, the user profile contains information identifying the preferences of the user."
Siefert 11:57-63 "The user profile specifies preferences of a user. It may not be possible, in all cases, to cause a resource selected by a user to become compatible with all specified preferences. However, insofar as the resource is transformed so
that more preferences are matched than previously, the invention can be said to "enhance" the compatibility between the resource and the preferences."
Belkin p. 397 "The search intermediary uses his knowledge about the IR system (with its data collections) and the searcher to formulate requests directly to the IR system. The search intermediary has formulated a model of the user and taken advantage of his existing model of the IR system."
Belkin p. 399 "In the general information seeking interaction, the IR system needs to have (see Table 1 for a brief listing of the ten functions and their acronyms): a model of the user himself, including goals, intentions and experience (UM)."
Han p. 409 "Personalized Web Agents Another group of Web agents includes those that obtain or learn user preferences and discover Web information sources that correspond to these preferences, and possibly those of other individuals with similar interests (using collaborative filtering)"
Han p. 409 "As the user browses the Web, the profile creation module builds a custom profile by recording documents of interest to the user. The number of times a user visits a document and the total amount of time a user spends viewing a document are just a few methods for determining user interest [1, 3, 4]. Once WebACE has recorded a sufficient number of interesting documents, each document is reduced to a document vector and the document vectors are passed to the clustering modules."
Menczer p. 158-9 "Words are the principal asset in
text collections, and virtually all information retrieval systems take advantage of words to describe and characterize documents, query, and concepts such as "relevance" or "aboutness" . . . This metric can be called word topology and is the reason why documents are usually represented as word vectors in information retrieval . . . [l]inks, constructed manually to point from one page to another, reflect an author's attempts to relate her writings to others.' Word topology is a epiphenomenal consequence of word vocabulary choices made by many authors, across many pages. The entire field of free text information retrieval is based on the statistical patterns reliably present in such vocabulary usage. By making our agents perceptually sensitive to word topology features."
Menczer p. 160 "For the reasons outlined in Section 2, each agent's genotype also contains a list of keywords, initialized with the query terms." [Agent's genotype is its version of a user profile.]
Menczer p. 163 "The user initially provides a list of keywords and a list of starting points, in the form of a bookmark file." [The bookmarks and starting points are evidence of the profile the agent uses in creating its genotype.]
Armstrong p. 1 "In interactive mode, WebWatcher acts as a learning apprentice [Mitchell et al., 1985; Mitchell et. al., 1994], providing interactive advice to the Mosaic user regarding which hyperlinks to follow next, then learning by observing the user's reaction to this advice as well as the eventual success or failure of the user's actions."
Armstrong p. 4 "1. Underlined words in the
hyperlink. 200 boolean features are allocated to encode selected words that occur within the scope of the hypertext link (i.e., the underlined words seen by the user). These 200 features correspond to only the 200 words found to be most informative over all links in the training data (see below.)"
Armstrong p. 4 "The task of the learner is to learn the general function UserChoice?, given a sample of training data logged from users."
(b) constructing, by the remote computer system, a plurality of data item profiles, each plural data item profile corresponding to a different one of each plural data item stored in the remote data storage system, each of said plural data item profiles being representative of a second linguistic pattern of a corresponding plural data item, each said plural second linguistic pattern being substantially unique to each corresponding plural data item;
Culliss 2:33-37 "The articles can each be associated with one or more of these key terms by any conceivable method of association now known or later developed. A key term score is associated with each article for each of the key terms. Optionally, a key term total score can also be associated with the article."
Herz 79:11-22 "A method for cataloging a plurality of target objects that are stored on an electronic storage media, where users are connected via user terminals and bidirectional data communication connections to a target server that accesses said electronic storage media, said method comprising the steps of: storing on said electronic storage media each target object; automatically generating in said target server, target profiles for each of said target objects that are stored on said electronic storage media, each of said target profiles being generated from the contents of an associated one of said target objects and their associated target object characteristics" Herz 6:43-46 "The specific
Salton `89 p. 275. "[I]n these circumstances, it is advisable first to characterize record and query content by assigning special content descriptions, or profiles, identifying the items and representing text content. The text profiles can be used as short-form descriptions; they also serve as document, or query, surrogates during the text-search and [text] retrieval operations." Salton `89 p. 294-6 (see also fn. 28-30) (Linguistic methodologies including syntactic class indicators (adjective, noun, adverb, etc.) are assigned to the terms). Salton `89 p. 389 (see also fn. 23-25) (Syntactic class markers, such as [noun], adjective, and pronoun, are first attached to the text words. Syntactic class patterns are then specified, such as "noun-noun", or "adjective-adjective-noun," and groups of text words corresponding to permissible syntactic class patterns are assigned to the texts for content identification. Word frequency and word distance constraints may also be used to refine phrase construction.)
Herz embodiment of this system disclosed herein illustrates the use of a first module which automatically constructs a "target profile" for each target object in the electronic media based on various descriptive attributes of the target object."
Additional Prior Art References Salton `89 p. 391, Fig. 11.3 Salton `68 p. 11 (Statistical association methods, Syntactic analysis methods, and Statistical phrase recognition methods).
Salton `68 p. 30 "The word stem thesaurus and suffix list. One of the earliest ideas in automatic information retrieval was the suggested use of words contained in documents and search requests for purposes of content identification. No elaborate content analysis is then required, and the similarity between different items can be measured simply by the amount of overlap between the respective vocabularies."
Salton `68 p. 33 "The phrase dictionaries. Both the regular and the stem thesauruses are based on entries corresponding either to single words or to single word stems. In attempting to perform a subject analysis of written text, it is possible, however, to go further by trying to locate phrases consisting of sets of words that are judged to be important in a given subject area."
Salton `68 p. 35-36 "The syntactic phrase dictionary has a more complicated structure, as shown by the excerpt reproduced in Fig. 2-6. Here, each syntactic phrase, also known as criterion tree or criterion phrase, consists not only of a specification of the component concepts but also of syntactic indicators, as well as of syntactic relations that may obtain between the included concepts. . . . More specifically, there are four main classes of syntactic specifications, corresponding to noun phrases, subject-verb relations, verb-object relations, and
Herz 12:54-13:53 "In particular, a textual attribute, such as the full text of a movie review, can be replaced by a collection of numeric attributes that represent scores to denote the presence and significance of the words "aardvark," "aback," "abacus," and so on through "zymurgy" in that text. The score of a word in a text may be defined in numerous ways. The simplest definition is that the score is the rate of the word in the text, which is computed by computing the number of times the word occurs in the text, and dividing this number by the total number of words in the text. This sort of score is often called the "term frequency" (TF) of the word. The definition of term frequency may optionally be modified to weight different portions of the text unequally: for example, any occurrence of a word in the text's title might be counted as a 3-fold or more generally k-fold
Herz occurrence (as if the title had been repeated k times within the text), in order to reflect a heuristic assumption that the words in the title are particularly important indicators of the text's content or topic. However, for lengthy textual attributes, such as the text of an entire document, the score of a word is typically defined to be not merely its term frequency, but its term frequency multiplied by the negated logarithm of the word's "global frequency," as measured with respect to the textual attribute in question. The global frequency of a word, which effectively measures the word's uninformativeness, is a fraction between 0 and 1, defined to be the fraction of all target objects for which the textual attribute in question contains this word. This adjusted score is often known in the art as TF/IDF ("term frequency times inverse document frequency"). When global frequency of a word is taken into account in this way, the common, uninformative words have scores comparatively close to zero, no matter how often or rarely they appear in the text. Thus, their rate has little influence on the object's target profile. Alternative methods of
Additional Prior Art References subject-object relations." Braden 7:19-23 "Generally speaking and in accordance with our present invention, we have recognized that precision of a retrieval engine can be significantly enhanced by employing natural language processing to process, i.e., specifically filter and rank, the records, i.e., ultimately the documents, provided by a search engine used therein." Braden 11:62-14:61 "In general, to generate logical form triples for an illustrative input string, e.g. for input string 510, that string is first parsed into its constituent words. Thereafter, using a predefined record (not to be confused with document records employed by a search engine), in a stored lexicon, for each such word, the corresponding records for these constituent words, through predefined grammatical rules, are themselves combined into larger structures or analyses which are then, in turn, combined, again through predefined grammatical rules, to form even larger structures, such as a syntactic parse tree. A logical form graph is then built from the parse tree. Whether a particular rule will be applicable to a particular set of constituents is governed, in part, by presence or absence of certain corresponding attributes and their values in the word records. The logical form graph is then converted into a series of logical form triples. Illustratively, our invention uses such a lexicon having approximately 165,000 head word entries. This lexicon includes various classes of words, such as, e.g., prepositions, conjunctions, verbs, nouns, operators and quantifiers that define syntactic and semantic properties inherent in the words in an input string so that a parse tree can be constructed therefor. Clearly, a logical form (or, for that matter, any other representation, such as logical
Herz calculating word scores include latent semantic indexing or probabilistic models. Instead of breaking the text into its component words, one could alternatively break the text into overlapping word bigrams (sequences of 2 adjacent words), or more generally, word n-grams. These word n-grams may be scored in the same way as individual words. Another possibility is to use character n-grams. For example, this sentence contains a sequence of overlapping character 5-grams which starts "for e", "or ex", "r exa", " exam", "examp", etc. The sentence may be characterized, imprecisely but usefully, by the score of each possible character 5-gram ("aaaaa", "aaaab", ... "zzzzz") in the sentence. Conceptually speaking, in the character 5-gram case, the textual attribute would be decomposed into at least 26^5 = 11,881,376 numeric attributes. Of course, for a given target object, most of these numeric attributes have values of 0, since most 5-grams do not appear in the target object attributes. These zero values need not be stored anywhere. For purposes of digital storage, the value of a textual attribute could be characterized by storing the set
Additional Prior Art References form triples or logical form graph within a logical form, capable of portraying a semantic relationship) can be precomputed, while a corresponding document is being indexed, and stored, within, e.g., a record for that document, for subsequent access and use rather than being computed later once that document has been retrieved. Using such precomputation and storage, as occurs in another embodiment of our invention discussed in detail below in conjunction with FIGS. 10-13B, drastically and advantageously reduces the amount of natural language processing, and hence execution time associated therewith, required to handle any retrieved document in accordance with our invention. In particular, an input string, such as sentence 510 shown in FIG. 5A, is first morphologically analyzed, using the predefined record in the lexicon for each of its constituent words, to generate a so-called "stem" (or "base") form therefor. Stem forms are used in order to normalize differing word forms, e.g., verb tense and singular-plural noun variations, to a common morphological form for use by a parser. Once the stem forms are produced, the input string is syntactically analyzed by the parser, using the grammatical rules and attributes in the records of the constituent words, to yield the syntactic parse tree therefor. This tree depicts the structure of the input string, specifically each word or phrase, e.g. noun phrase "The octopus", in the input string, a category of its corresponding grammatical function, e.g., NP for noun phrase, and link(s) to each syntactically related word or phrase therein. For illustrative sentence 510, its associated syntactic parse tree would be:
Herz of character 5-grams that actually do appear in the text, together with the nonzero score of each one. Any 5-gram that is not included in the set can be assumed to have a score of zero. The decomposition of textual attributes is not limited to attributes whose values are expected to be long texts. A simple, one-term textual attribute can be replaced by a collection of numeric attributes in exactly the same way. Consider again the case where the target objects are movies. The "name of director" attribute, which is textual, can be replaced by numeric attributes giving the scores for "Federico-Fellini," "Woody-Allen," "Terence-Davies," and so forth, in that attribute."
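For illustration only, the scoring described in Herz 12:54-13:53 can be sketched in Python as follows. The sketch computes the term frequency of each unit (word or character 5-gram) and multiplies it by the negated logarithm of the unit's global frequency, storing only the units that actually appear. All function names are hypothetical, and the natural logarithm is an assumption, since the quoted text does not state a base. Title weighting and the alternative scoring methods mentioned in the passage are not shown.

```python
import math
from collections import Counter
from typing import Dict, List


def tokens(text: str) -> List[str]:
    """Split a text into lower-cased words (whitespace tokenization is an assumption)."""
    return text.lower().split()


def char_ngrams(text: str, n: int = 5) -> List[str]:
    """Return the overlapping character n-grams of a text, in order."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def term_frequency(units: List[str]) -> Dict[str, float]:
    """Rate of each unit: occurrences divided by the total number of units."""
    counts = Counter(units)
    total = len(units)
    return {unit: count / total for unit, count in counts.items()}


def tf_idf_profiles(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Score each unit as TF times the negated log of its global frequency.

    Global frequency is the fraction of all documents containing the unit.
    Units absent from a document are simply not stored (implicit score 0).
    """
    n_docs = len(docs)
    containing = Counter(unit for doc in docs for unit in set(doc))
    profiles = []
    for doc in docs:
        tf = term_frequency(doc)
        profiles.append(
            {unit: rate * -math.log(containing[unit] / n_docs) for unit, rate in tf.items()}
        )
    return profiles


docs = [tokens("the octopus has three hearts"), tokens("the shark has fins")]
profiles = tf_idf_profiles(docs)
```

In this two-document example, "the" and "has" appear in every document, so their global frequency is 1 and their score is zero, while "octopus" receives a positive score. Passing `char_ngrams(text)` instead of `tokens(text)` scores character 5-grams in the same way.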
Additional Prior Art References
A start node located in the upper-left hand corner of the tree defines the type of input string being parsed. Sentence types include "DECL" (as here) for a declarative sentence, "IMPR" for an imperative sentence and "QUES" for a question. Displayed vertically to the right and below the start node is a first level analysis. This analysis has a head node indicated by an asterisk, typically a main verb (here the word "has"), a premodifier (here the noun phrase "The octopus"), followed by a postmodifier (the noun phrase "three hearts"). Each leaf of the tree contains a lexical term or a punctuation mark. Here, as labels, "NP" designates a noun phrase, and "CHAR" denotes a punctuation mark. The syntactic parse tree is then further processed using a different set of rules to yield a logical form graph, such as graph 515 for input string 510. The process of producing a logical form graph involves extracting underlying structure from syntactic analysis of the input string; the logical form graph includes those words that are defined as having a semantic relationship there between and the functional nature of the relationship. The "deep" cases or functional roles used to categorize different semantic
Herz 79:11-23 "A method for cataloging a plurality of target objects that are stored on an electronic storage media, . . . said method comprising the steps of: . . . automatically generating in said target server, target profiles for each of said target objects that are stored on said electronic storage media, each of said target profiles being generated from the contents of an associated one of said target objects and their associated target object
Herz characteristics." Herz 5:7-11 "The system for electronic identification of desirable objects of the present invention automatically constructs both a target profile for each target object in the electronic media based, for example, on the frequency with which each word appears in an article relative to its overall frequency of use in all articles."
Additional Prior Art References relationships include:
Herz 10:63-67; 11:1-7 "However, a more sophisticated system would consider a longer target profile, including numeric and associative attributes: (a.) full text of document . . . (d.) language in which document is written . . . (g.) length in words . . . (h.) reading level." Herz See also Abstract; 1:18-43; 4:49-8:8; 9:1-16:62; 26:43-27:43; 55:44-56:14; 56:52-57:10.
To identify all the semantic relationships in an input string, each node in the syntactic parse tree for that string is examined. In addition to the above relationships, other semantic roles are used. In any event, the results of such analysis for input string 510 is logical form graph 515. Those words in the input string that exhibit a semantic relationship therebetween (such as, e.g. "Octopus" and "Have") are shown linked to each other with the relationship therebetween being specified as a linking attribute (e.g. Dsub). This graph, typified by graph 515 for input string 510, captures the structure of arguments and adjuncts for each input string. Among other things, logical form analysis maps function words,
Additional Prior Art References such as prepositions and articles, into features or structural relationships depicted in the graph. Logical form analysis also resolves anaphora, i.e., defining a correct antecedent relationship between, e.g., a pronoun and a co-referential noun phrase; and detects and depicts proper functional relationships for ellipsis. Additional processing may well occur during logical form analysis in an attempt to cope with ambiguity and/or other linguistic idiosyncrasies. Corresponding logical form triples are then simply read in a conventional manner from the logical form graph and stored as a set. Each triple contains two node words as depicted in the graph linked by a semantic relationship therebetween. For illustrative input string 510, logical form triples 525 result from processing graph 515. Here, logical form triples 525 contain three individual triples that collectively convey the semantic information inherent in input string 510. Similarly, as shown in FIGS. 5B-5D, for input strings 530, 550 and 570, specifically exemplary sentences "The octopus has three hearts and two lungs.", "The octopus has three hearts and it can swim.", and "I like shark fin soup bowls.", logical form graphs 535, 555 and 575, as well as logical form triples 540, 560 and 580, respectively result. There are three logical form constructions for which additional natural language processing is required to correctly yield all the logical form triples, apart from the conventional manner, including a conventional "graph walk", in which logical form triples are created from the logical form graph. In the case of coordination, as in exemplary sentence "The octopus has three hearts and two lungs", i.e. input string 530, a logical form triple is created for a word, its semantic relation, and each of the values of the coordinated constituent. According to a "special" graph walk, we find in FIG. 540 two logical form
Additional Prior Art References triples "have-Dobj-heart" and "have-Dobj-lung". Using only a conventional graph walk, we would have obtained only one logical form triple "have-Dobj-and". Similarly, in the case of a constituent which has referents (Refs), as in exemplary sentence "The octopus has three hearts and it can swim", i.e. input string 550, we create a logical form triple for a word, its semantic relation, and each of the values of the Refs attribute, in addition to the triples generated by the conventional graph walk. According to this special graph walk, we find in triples 560 the logical form triple "swim-Dsub-octopus" in addition to the conventional logical form triple "swim-Dsub-it". Finally, in the case of a constituent with noun modifiers, as in the exemplary sentence "I like shark fin soup bowls", i.e. input string 570, additional logical form triples are created to represent possible internal structure of the noun compounds. The conventional graph walk created the logical form triples "bowl-Mods-shark", "bowl-Mods-fin" and "bowl-Mods-soup", reflecting the possible internal structure [[shark] [fin] [soup] bowl]. In the special graph walk, we create additional logical form triples to reflect the following possible internal structures [[shark fin] [soup] bowl] and [[shark] [fin soup] bowl] and [[shark [fin] soup] bowl], respectively: "fin-Mods-shark", "soup-Mods-fin", and "soup-Mods-shark". Inasmuch as the specific details of the morphological, syntactic, and logical form processing are not relevant to the present invention, we will omit any further details thereof. However, for further details in this regard, the reader is referred to co-pending United States patent applications entitled "Method and System for Computing Semantic Logical Forms from Syntax Trees", filed Jun. 28, 1996 and assigned Ser. No. 08/674,610 and particularly "Information Retrieval Utilizing
Additional Prior Art References Semantic Representation of Text", filed Mar. 7, 1997 and assigned Ser. No. 08/886,814; both of which have been assigned to the present assignee hereof and are incorporated by reference herein." Braden 7:47-53 "each of the documents in the set is subjected to natural language processing, specifically morphological, syntactic and logical form, to produce logical forms for each sentence in that document. Each such logical form for a sentence encodes semantic relationships, particularly argument and adjunct structure, between words in a linguistic phrase in that sentence." Ahn 2:32-34 "Also, a document tree and a document index table is maintained for each document (such as Document Dl)." Brookes 12:27-37 "storing in association with each information item in the database system a plurality of parameters including (i) at least one keyword indicative of the subject matter of said information item, and (ii) a priority level value for each information item, wherein said priority level value is selected from a predetermined set of priority level values, and wherein said at least one keyword is selected from a finite hierarchical set of keywords having a tree structure relating broad keywords to progressively narrower keywords." Brookes See also, 1:57-65. Dedrick 15:41-44 "The metering server 14 is capable of storing units of information relating to the content databases of the publisher/advertiser, including the entire content database." Dedrick See, e.g., Abstract, Figures 1-8.
Additional Prior Art References Eichstaedt 2:42-50 "The second assumption is that the documents must already be assigned to at least one category of a known taxonomy tree for the database. Notice, however, that this system works with any existing taxonomy tree and does not require any changes to a legacy system. FIG. 1 illustrates a taxonomy tree with six leaf categories 50. Each leaf category has an interest value associated with it. Taxonomies are available for almost all domain-specific document repositories because they add significant value for the human user." Eichstaedt 1:34-43 "The present invention provides a profiling technique that generates user interest profiles by monitoring and analyzing a user's access to a variety of hierarchical levels within a set of structured documents, e.g., documents available at a web site. Each information document has parts associated with it and the documents are classified into categories using a known taxonomy. In other words, each document is hierarchically structured into parts, and the set of documents is classified as well." Krishnan 3:64-4:1 "[I]nformation, which is typically electronic in nature and available for access by a user via the Internet, is termed an `object'; a digitally represented profile indicating an object's attributes is termed an `object profile.'" Krishnan 7:13-42 "The basic [document] indexing operation comprises three steps, noted above as: filtering, word breaking, and normalization . . . . Once the content filter has operated on the source file, the word breaker step is activated to divide the received text stream from the content filter into
Additional Prior Art References words and phrases. Thus, the word breaker accepts a stream of characters as an input and outputs words . . . . The final step of indexing is the normalization process, which removes `noise' words and eliminates capitalization, punctuation, and the like." Krishnan See also Fig. 6. Kupiec 13:13-20 "In step 250 the match sentences retained for further processing in step 245 are analyzed to detect phrases they contain. The match sentences are analyzed in substantially the same manner as the input string is analyzed in step 220 above. The detected phrases typically comprise noun phrases and can further comprise title phrases or other kinds of phrases. The phrases detected in the match sentences are called preliminary hypotheses." Reese 7:1-24 "In collecting the information that matches the query request, the server may collect different forms of information. First, the server may collect entire content site data, for example, entire files or documents on a particular content server. Instead, the server may collect key words from particular sites (e.g., files) on individual content servers, monitor how often such key words are used in a document, and construct a database based on these key words (step 822). Another way of collecting data is through the collection of content summaries (step 824). In this manner, rather than entire files or documents being transmitted to the server and ultimately to the client, only summaries of the documents or files are collected and presented. The summaries offer a better description of the content of the particular files or documents than the key words, because the user can form a better opinion of what is contained in the abbreviated
Additional Prior Art References document or file based on summaries rather than a few key words. The summaries may be as simple as collective abstracts or may involve the matching server identifying often used key words and extracting phrases or sentences using these key words from the document. Finally, the invention contemplates that titles may also be retrieved by the matching server and submitted to the client rather than entire documents or files." Sheena 2:14-15 "Similarity factors are calculated for each of the users and the similarity factors are used to select a neighboring user set for each user of the system." Sheena 4:56-5:17 "Profiles for each item that has been rated by at least one user may also be stored in memory. Each item profile records how particular users have rated this particular item. Any data construct that associates ratings given to the item with the user assigning the rating can be used. It is preferred is to provide item profiles as a sparse vector of n-tuples. Each n-tuple contains at least an identifier representing a particular user and an identifier representing the rating that user gave to the item, and it may contain other information, as described above in connection with user profiles. As with user profiles, item profiles may also be stored as an array of pointers. Item profiles may be created when the first rating" Siefert 8:22-33 "In a very simple sense, the expert identifies the language of a sample of words, by reading the sample. Then, the invention analyzes samples of each language, in order to find unique character- and word patterns (or other patterns). Now the invention can associate unique patterns with
Additional Prior Art References each language. The invention stores the unique patterns, together with the corresponding language identities, in a reference table. Later, to identify a language, the invention looks for the unique patterns within a sample of the language, such as in a file whose language is to be identified. When a pattern is found, the invention identifies the language containing it, based on the table." Armstrong p. 4 "1. Underlined words in the hyperlink. 200 boolean features are allocated to encode selected words that occur within the scope of the hypertext link (i.e., the underlined words seen by the user). These 200 features correspond to only the 200 words found to be most informative over all links in the training data (see below.)"
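For illustration only, the reference-table approach described in the Siefert passage quoted above can be sketched in Python as follows. Patterns that occur in only one language's sample are stored with that language, and a later text is identified by looking its patterns up in the table. The use of character trigrams as the patterns, the majority vote over matches, and all function names are assumptions made for this sketch; the quoted text says only that the language is identified when a pattern is found.

```python
from collections import Counter
from typing import Dict, Optional, Set


def ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams in a lower-cased text."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def build_reference_table(samples: Dict[str, str], n: int = 3) -> Dict[str, str]:
    """Map each pattern found in exactly one language's sample to that language."""
    seen: Dict[str, Set[str]] = {}
    for language, text in samples.items():
        for gram in ngrams(text, n):
            seen.setdefault(gram, set()).add(language)
    return {gram: next(iter(langs)) for gram, langs in seen.items() if len(langs) == 1}


def identify(text: str, table: Dict[str, str], n: int = 3) -> Optional[str]:
    """Return the language whose unique patterns match most often, or None."""
    votes = Counter(table[gram] for gram in ngrams(text, n) if gram in table)
    return votes.most_common(1)[0][0] if votes else None


# Tiny hypothetical samples, labeled by a human expert as in the quoted passage.
samples = {
    "english": "the quick brown fox",
    "german": "der schnelle braune fuchs",
}
table = build_reference_table(samples)
print(identify("the fox", table))
```

Real samples would need to be far larger than these two phrases for the stored patterns to be reliable.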
(c) providing, by the user to the local computer system, search request data representative of the user's expressed desire to locate data substantially pertaining to said search request data;
Culliss 2:39-41 "[T]he invention can accept a search query from a user and a search engine will identify matched articles." Culliss 12:41-51 "A method of organizing a plurality of articles comprising . . . (b) accepting a first search query from a first user having first personal data."
Herz 66:52-61 "However, in a variation, the user optionally provides a query consisting of textual and/or other attributes, from which query the system constructs a profile in the manner described herein, optionally altering textual attributes as described herein before decomposing them into numeric attributes. Query profiles are similar to the search profiles in a user's search profile set, except that their attributes are explicitly specified by a user, most often for one-time usage, and unlike search profiles, they are not automatically updated to reflect changing interests."
Salton `89 p. 160 "Several types of query specifications can be distinguished. A simple query is one containing the value of a single search key. A range query contains a range of values for a single key for example, a request for all the records of employee ages 22 to 25. A functional query is specified by using a function for the values for certain search keys, for example the age of employees exceeding a given stated threshold." Salton `68 p. 7 "When the search criteria are based in one way or another on the contents of a document, it becomes necessary to use some system of content identification, such as an existing subject classification or a set of content identifiers attached to each item, which may help in restricting the search to items within a certain subject area and in distinguishing items likely to be pertinent from others to be rejected."
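For illustration only, the three query types distinguished at Salton `89 p. 160 can be sketched in Python as follows. The record layout, field names, and function names are hypothetical; the employee-age examples follow the quoted text.

```python
from typing import Callable, Dict, List

Record = Dict[str, int]


def simple_query(records: List[Record], key: str, value: int) -> List[Record]:
    """Simple query: match the value of a single search key."""
    return [r for r in records if r.get(key) == value]


def range_query(records: List[Record], key: str, low: int, high: int) -> List[Record]:
    """Range query: match a range of values (inclusive) for a single key."""
    return [r for r in records if key in r and low <= r[key] <= high]


def functional_query(
    records: List[Record], key: str, predicate: Callable[[int], bool]
) -> List[Record]:
    """Functional query: match records whose key value satisfies a function."""
    return [r for r in records if key in r and predicate(r[key])]


# Hypothetical employee records.
employees = [
    {"id": 1, "age": 22},
    {"id": 2, "age": 31},
    {"id": 3, "age": 25},
]

aged_31 = simple_query(employees, "age", 31)
aged_22_to_25 = range_query(employees, "age", 22, 25)
over_threshold = functional_query(employees, "age", lambda age: age > 24)
```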
Herz Herz See also Abstract; 1:18-43; 4:49-8:8; 55:445:14; 56:15-30; 58:57-60:9; Figures 1-16.
Additional Prior Art References Salton `68 p. 413 "The user participates in the system by furnishing information about his needs and interests, by directing the search and retrieval operations in accordance with his special requirements, by introducing comments out systems operations, by specifying output format requirements, and nearly by influencing file establishment and file maintenance procedures." Braden 7:35-38 "Specifically, in operation, a user supplies a search query to system 5. The query should be in full-text (commonly referred to as "literal") form in order to take full advantage of its semantic content through natural language processing." Ahn 3:37-42 "In step 408, the invention receives a user search request containing a keyword and determines whether the search request is directed to searching an individual document or a group of documents. If the search request is directed to searching an individual document, then step 414 is performed." Brookes 8:48-54 "In this manner the information in the system may be augmented by input from the users, questions may be asked of specific users and responses directed accordingly. A collection of information items related in this manner is termed a `discussion'. The context of a discussion is defined by the parameters (especially keywords) of its constituent information items." Brookes See, e.g., 12:27-37 "storing in association with each information item in the database system a plurality of parameters including (i) at least one keyword indicative of the subject matter of said
Additional Prior Art References information item, and (ii) a priority level value for each information item, wherein said priority level value is selected from a predetermined set of priority level values, and wherein said at least one keyword is selected from a finite hierarchical set of keywords having a tree structure relating broad keywords to progressively narrower keywords." Dasan 7:28-38 "the user specifies search terms used in the full-text search. These are illustrated in field 804. Any number of search terms may be used and the "|" character is treated as a disjunction ("or"). Then, by selecting either of user interface objects 806 or 808, the user specifies whether the search terms are case sensitive or not. This is detected at step 706. At step 708, using either a scrollable list containing selectable item(s), as illustrated in field 810, or other means, the user specifies the search context(s) (the publications, newsfeeds, etc...) in which to search. By the selection of icon 812 or other commit means." Dedrick See, e.g., Figures 1-8, 8:20-9:24, 14:55-64. Krishnan 7:61-63 "The query screen allows a user to express a query by simply filling out fields in a form." Krishnan 12:36-47 "[A] method for enhancing efficiencies with which objects retrieved from the Internet are maintained for access by the multiple members, the method comprising: . . . receiving a member-generated query for one or more objects that can be obtained from the Internet." Krishnan See also Fig. 6. Kupiec 4:7-8 "The method begins by accepting as
Additional Prior Art References input the user's question and a set of documents that are assumed to contain the answer to the question." Reese 7:1-23 "In collecting the information that matches the query request, the server may collect different forms of information." Menczer p. 162 "Consider for example the following query: "Political institutions: The structure, branches and offices of government." Menczer p. 163 "The user initially provides a list of keywords and a list of starting points, in the form of a bookmark file. In step (0), the population is initialized by pre-fetching the starting documents. Each agent is "positioned" at one of these document and given a random behavior (depending on the representation) and an initial reservoir of "energy". In step (2), each agent "senses" its local neighborhood by analyzing the text of the document where it is currently situated. This way, the relevance of all neighboring documents -those pointed to by the hyperlinks in the current document- is estimated. Based on these link relevance estimates, an agent "moves" by choosing and following one of the links from the current document." Armstrong p. 4 "4. Words used to define the user goal. These features indicate words entered by the user while defining the information search goal. In our experiments, the only goals considered were searches for technical papers, for which the user could optionally enter the title, author, organization, etc. (see Figure 3). All words entered in this way throughout the training set were included (approximately 30 words, though the exact number varied with the training set used in the particular
Additional Prior Art References experiment). The encoding of the boolean feature in this case is assigned a 1 if and only if the word occurs in the user-specified goal and occurs in the hyperlink, sentence, or headings associated with this example." Salton `89 p.275 "In these circumstances, it is advisable first to characterize record and query content by assigning special content descriptions, or profiles, identifying the items and representing text content. The text profiles can be used as short-form descriptions; they also serve as document, or query, sur