I/P Engine, Inc. v. AOL, Inc. et al

Filing 1

COMPLAINT against AOL, Inc., Gannett Company, Inc., Google, Inc., IAC Search & Media, Inc., Target Corporation ( Filing fee $ 350 receipt number 14683024186.), filed by I/P Engine, Inc. (Attachments: # 1 Exhibit A, # 2 Exhibit B, # 3 Receipt, # 4 Civil Cover Sheet, # 5 Letter)(ecav, )

Download PDF
EXHIBIT B US006775664B2 (12) United States Patent (io) Patent No.: (45) Date of Patent: Lang et al. (54) (75) INFORMATION FILTER SYSTEM AND METHOD FOR INTEGRATED CONTENTBASED AND COLLABORATIVE/ADAPTIVE FEEDBACK QUERIES (*) Notice: Oct. 22, 2001 Prior Publication Data 5,649,186 A 5,734,893 A Int. Cl.7 C06F 17/30 707/3; 707/5; 707/1; 707/10; 707/2 707/1, 3, 5, 10, 707/2 (56) References Cited U.S. PATENT DOCUMENTS 5,019,961 A 5/1991 5,117,349 A 5,249,262 A 5/1992 Tirfingelal 9/1993 Baule 5.471.610A 11/1995 7/1997 3/1998 Ferguson Li et al ♦ 348/7 707/10 707/4 11/1998 Miller et al • 2/1999 I-ang el al 707/2 707/1 * 3/1999 Robinson 705/27 (List continued on next page.) OTHER PUBLICATIONS Primary Examiner—John E. Breene 1998, now Pal. No. 6,314,420, which is a conlinualion-inpait of application No. 08/627,436, filed on Apr. 4, 1996, now Pat. No. 5,867,799, application No. 10/045,198, and a continualion-in-part of application No. 09/195,708, filed on Nov. 19,1998, now Pat. No. 6,308,175, which is a continu alion-in-part of application No. 08/627,436, filed on Apr. 4, 1996, now Pat. No. 5,867,794. Field of Search Farry el al Assistant Examiner—Kucn S. Lu Continuation of application No. 09/204,149, filed on Dec. 3, (58) 395/149 3/1997 (List continued on next page.) Related U.S. Application Data U.S. Cl 395/149 Yaksich et al 5,608,447 A US 2002/0120609 Al Aug. 29, 2002 (52) Yaksich et al 10/1996 5.842.199A 707/3 364/419.19 Persin, "Document Filtering for Fast Ranking", Proceedings of the Seventeenth Annual International ACM-SIGIR Con ference on Research and Development in Information Retrieval, Jul. 6, 1994 pp. 339-348. Appl. No.: 10/045,198 (51) 10/1996 5.563.999 A Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 467 days. (63) Henderson el al 5.563.998 A Lycos, Inc., Waltham, MA (US) (65) Arnram el al 8/1996 5,884,282 A Assignee: Filed: 7/1996 5.544,049 A 5,867,799 A (73) (22) Aug. 10,2004 5,537,586 A Inventors: Andrew K. Lang, Pittsburgh, PA (US); Donald M. Kosnk, Pittsburgh, PA (US) (21) US 6,775,664 B2 Addcsso cl al Kawaguehi el al 364/192 707/5 395/66 707/4 (74) Attorney, Agent, or Firm—Testa, Hurwiiz & Thibeaull, LLP (57) ABSTRACT A search engine system is provided for a portal site on the internet. The search engine system employs a regular search engine to make one-shot or demand searches for information entities which provide at least threshold matches to user queries. The search engine system also employs a collaborativc/content-based filter to make continuing searches for information entities which match existing wire queries and are ranked and stored over time in useraccessible, system wires corresponding to the respective queries. A user feedback system provides collaborative feedback data for integration with content profile data in the operation of the collaborative/content-based filter. A query processor determines whether a demand search or a wire search is made for an input query. 38 Claims, 10 Drawing Sheets US 6,775,664 B2 Page 2 U.S. PATENT DOCUMENTS Shelh, "learning Approach to Personalized Information 5,983,214 A 11/1999 707/1 Filtering", Massachusetts Institute of Technology, (Feb. 6,006,222 A 12/1999 Culliss 707/5 6,014,665 A 1/2000 Culliss 707/5 1994) pp. 1-74. 6,029,141 A 6,029,161 A • Lang et al 2/2000 2/2000 Bezos cl al Lang el al 6,078,916 A 6/2000 Culliss 6,182,068 Bl 1/2001 Culliss 705/27 707/1 707/5 707/5 6,308,175 151 * 10/2001 Lang el al 707/10 6,314,420 Bl • 11/2001 I.ang cl al 707.G OTHER PUBLICATIONS Dumais el al., "Using Latent Semantic Analysis to Improve Access lo Textual Information", In Proceedings of CHI-88 Conference on Human Factors in Computing Systems, (1998, New York: ACM) 6 pp. Evans et al., "A Summary of the Cl ARIT Project", Tech nical Report, Laboratory for Computational Linguistics, Knowles, "Software Bots filter Web dala; Agent Technology Delivers Customized Information for BBN's PIN", PC Week v.12, nl7 May 1995 pp. 112-113. Carnegie-Mellon University, Sep. 1991, pp. 1-12. Rcsnick ct al., "Open Architecture for Collaboration Filter ing of Netnews", Proceedings of the Conference on Com puter Supported Cooperative Work, Oct. 1994 pp. 175-186. Goldberg et al., "Using Collaborative Filtering to Weave an Information Tapestry", Communications of the ACM vol. 35, No. 12, Dec. 1992 pp. 61-70. Structured Information Spaces", Human Factors in Comput Fischer et al., "Information Access in Complex, Poorly ing Systems, CHI '91 Conference Proceedings, ACM, 1991 8 pp. * cited by examiner FIRST MEANS MEANS USER 30' L 29 33' INFORMATION FILTER APPARATUS X 32 PREDICTION ADAPTATION ADAPTATION MEANS SECOND T DISTRIBUTED NETWORK RESOURCE fl 28e 27b #1 USER [/*> MEANS COMMUNICATION RESOURCE #2 NETWORK DISTRIBUTED 31 COMPUTER STORAGE MEANS 13 COMPUTER SYSTEM -25 35 27a s 17 y USER USER USER USER USER -\ FILTER FILTER FILTER FILTER FILTER 28a CREDIBILITY FILTER COMMUNITY FILTER COMMUNITY FILTER ADAPTIVE FILTER MEANS ADAPTIVE CONTENT FILTER EXTRACTION MEANS T7 13 USER1- FIG. I 16 to (/I C/3 d crs 5? NOTE: AF=AOAPTIVE FILTER NOTE: MC=MEMB£R CLIENT .PROFILE. ADAPTIVE ADAPTIVE CONTENT to W en s* o c s n I d U.S. Patent Aug. 10,2004 100- Sheet 3 of 10 PROVIDING DYNAMIC US 6,775,664 B2 -105 INFORMON CHARACTERIZATION COMMUNITY FILTER 135- 110 HO—| CLIENT FILTER ADAPTIVELY FILTER RAW INFORMOKS PRESENTING PROPOSED INFORMON TO USER -115 -120 ADAPTING ■125 CONTENT AND/OR COLLABORATIVE PROFILE UPDATING DYNAMIC INFORMON CHARACTERIZATION -130 IPREDICTING MC RESPONSESH50 -170 GROUP INTO PREFERENCE COHORT 175 TARGETED INFORMON -165 _r 155 CREATE CREDIBILITY CONSUMER PROFILE FILTER RECOMMENDATION FILTER FIG. 3 _r UPDATE 160 CREDIBILITY PROFILE CONSULTATION FILTER U.S. Patent Aug. 10,2004 Sheet 4 of 10 US 6,775,664 B2 ■205 PREFILTER PARTITION EACH USER INTO MULTIPLE MC's 200- GROUP MC's TO FORM MULTIPLE COMMUNITIES -210 PREDICTING COMMUNITY PROFILE -215 SELECT PROPOSED INFORMON 255f-{ CONTENT FILTER 260- COLLABORATION FILTER 265 MC FILTER PROVIDE PROPOSED INFORMONS -235 TO USER FIG. 4 UPDATE ' PREDICTION -245 CRITERIA -240 U.S. Patent Aug. 10,2004 Sheet 5 of 10 PARSE DATA INTO TOKENS US 6,775,664 B2 ■301 300 CREATING MODE-INVARIANT -305 COMPONENTS CONCEPT-BASED INDEXING DETAILED COMMUNITY -310 -315 PROFILING ELIMINATE POOR FITS DETAILED MC PROFILING 345- 340- UPDATE PROFILES OBTAIN USER FEEDBACK ELIMINATE POOR FITS PRESENTING PROPOSED INFORMONS TO USER FIG. 5 -320 -325 -330 -335 1 I W L 425b 425d LEARNING FUNCTION L 425c LEARNING FUNCTION L LEARNING FUNCTION COLLABORATIVE INPUT ERROR-CORRECTION UNIT 420- 425a FUNCTION INFORMATION 415- r LEARNING UNSTRUCTURED FEATURE 410- STRUCTURED FEATURE INFORMATION 405 CORREUTED-FEATURE, 400 428d- 428c- 428b- 428a- 426c UP IRP r 426d UP IRP _r UP IRP -426b UP IRP -426o 4270 FIG. 6 WEIGHTED IRP FUNCTION ~427d FUNCTION WEIGHTED IRP FUNCTION WEIGHTED IRP ^- 429d PREDICTOR UP=UNCERTAINTY NOTE: IRP=INDEPENDENT RATING PREDICTOR 432 N> o ON era IS d T SUB-SUB MIND POOL 502o SUB-SUB MIND POOL SUB-SUB MIND POOL FIG. 7 SUB-SUB MIND POOL \. MIND POOL SUB SUB-SUB MIND POOL 502c SUB-SUB MIND POOL 504^ SUB-SUB MIND POOL a -a en o c I 2 OS > S3 CZ3 U.S. Patent Aug. 10,2004 USER PROVIDES Sheet 8 of 10 US 6,775,664 B2 20C QUERY TERMS LOOKUP 22C QUERY IN TABLE DISPLAY RESULTS FROM PREVIOUSLY CREATED WIRE PERFORM REGULAR SEARCH ON QUERY ENGINE FIG. 8 28C 26C OTHER USER USER INDIVIDUAL 30C ► 42C1 43C — 32C 48C 42C AQAPTIVF FFFORACK DATA i ADAPTIVE FFFDBACK DATA PROCESSOR QUERY MODE DEMAND MODE PROCESSING/ RATING RATING WIRE MODE INFORMON SEARCH RETURN PROCESSOR -48C1 PROFILES DEMAND PROFILE 50C 42C2 ■40C CONTENT-BASED WIRE PROFILES MOLD MIND AGENT STRUCTURE FILTER BASED CONTENT- SELECTOR 38C FEEDBACK DATA A. FEEDBACK PROCESSORFEEDBACK DATA RELEVANT TO CURRENT INFORMON FEEDBACK DATA SPIDER MEMORY SYSTEM -46CM FIG. 9 SITE M WEB A SITE «-► WEB On ^4 d e NO i CA era CO p 62C-* 64C STATIONS USER OTHER USER STATION INDIVIDUAL ► 60C—j 74C r QUERY PROCESSOR BASED FILTER STRUCTURE CONTENT- (DEMAND) SEARCH ENGINE 80C r DEMAND SEARCH RETURN PROCESSOR 76C — r 80FD COLLABORATIVE/ COLLABORATIVE CONTENT-BASED FEEDBACK FILTER FEEDBACK PROCESSOR- DATA STRUCTURE FEEDBACK DATAFOR CONSIDERED INFORMONS ADAPT VE FEEDBACK DATA FOR USER CONTENT PROFILE ADAPTIVE FEEDBACK DATA FOR OTHER USER CONTENT PROFILES WIRE ROTATION TO USERS RELEVANT INFORMONS LIST Ml SITE MEMORY SEARCH -78C WEB -68C A <♦-*■ WEB SITE DEMAND -82C ON DEMAND SPIDERINFORMON ACQUISITION MEMORY -78CM MEMORY INFORMON ACQUISITION CONT1NOUS SPIDER- 72C FIG. 10 4—► ■66C Memory WIRE OS e © I § CO US 6,775,664 B2 1 and informons obtained from the network are compared to the profile for relevancy and ranking. A continuously oper ating "spider" scans the network to find informons which are received and processed for relevancy to the individual user's wire or to wires established by numerous other users. INFORMATION FILTER SYSTEM AND METHOD FOR INTEGRATED CONTENTBASED AND COLLABORATIVE/ADAPTIVE FEEDBACK QUERIES This is a continuation of U.S. Ser. No. 09/204,149 filed Dec. 3, 1998 now U.S. Pat. No. 6,314,420, which is a continuation in part of U.S. Ser. No. 08/627,436, filed Apr. 4, 1996 now U.S. Pal. No. 5,867,799, the disclosures of which arc hereby incorporated by reference herein. This application is also a continuation in part of U.S. Ser. No. 09/195,708 filed Nov. 19, 1998 now U.S. Pat. No. 6^08,175, which is a continuation in part of U.S. Ser. No. 08/627,436, filed Apr. 4,1996 now U.S. Pat. No. 5,867,794, the disclosures of which are hereby incorporated by refer The integrated filter system compares received informons to the individual user's query profile data, combined with collaborative data, and ranks, in order of value, informons 10 select any listed informon for consideration. As the system continues to feed the individual user's "wire", the stored relevant informon list typically changes 15 ence herein. BACKGROUND OF THE INVENTION The present invention relates to information processing systems for large or massive intbrmation networks, such as the internet, and more particularly to such information systems especially adapted for operation in portal and other web sites wherein a search engine operates with collabora tive and content-based filtering to provide better search responses to user queries. 20 user's query. In essence, the pre-search process is a short term search for quickly finding and quickly iclcniifying information entities which are content matched to the user's informons, adjustments in the user's query, feedback evalu ations by the user for considered informons, and updatings in collaborative feedback data. Received informons arc similarly processed for other users' wires established in the information filter system. Thus, the integrated information filter system performs continued long-term searching, i.e., it compares network informons to multiple users' queries to course of time, whereas conventional search engines initiate a search in response lo an individual user's query and use 2S content-based filtering to compare the query lo accessed network informons typically to find matching informons during a limited, short-term search time period. thousands of sites for consideration by a user at the user's site having a search capability, and thereafter enters a particular query, i.e., a request for informons relevant to a topic, a field of interest, etc. Thereafter, the search site typically employs a "spider" scanning system and a contentbased filter in a search engine to search the internet and find informons which match the query. This process is basically a pre-search process in which matching informons are found, at the time of initiating a search for the user's query, by comparing informons in an "informon data base" to the due to factors including a return of new and more relevant find matching informons for various users' wires over the In the operation of the internet, a countless number of informons arc available for downloading from any of at least location. A user typically connects to a portal or other web found to be relevant. The system maintains the ranked informons in a stored list from which the individual user can llic present invention is directed to an information pro 30 35 cessing system especially adapted for use at internet portal or other web sites to make network searches for information entities relevant to user queries, with collaborative feedback data and content-based data and adaptive filter structuring, being used in filtering operations to produce significantly improved search results. SUMMARY OF THE INVENTION A search engine system employs a content-based filtering system for receiving informons from a network on a con 40 query. tinuing basis and for filtering the informons for relevancy to a wire or demand query from an individual user. A feedback system provides feedback data from other users. Another system controls the operation of the filtering The return list of matching informons can be very exten system lo filter for one of a wire response and a demand sive according to the subject of the query and the breadth of 45 response and to return the one response to the user. The the query. More specific queries typically result in shorter filtering system combines pertaining feedback data from the return lists. In some cases, the search site may also be feedback system with content profile data in determining the structured to find web sites which probably have stored relevancy of the informons for inclusion in at least a wire informons matching the entered query. Collaborative data can be made available lo assist in informon rating when a user actually downloads an response lo the query. 50 BRIEF DESCRIPTION OF THE DRAWINGS informon, considers and evaluates it, and returns data to the FIG. 1 is an diagrammatic representation f an embodi ment of an information filtering apparalus *ici;urding to the present invention. search site as a representation of the value of the considered informon lo the user. In the patent application which is parent to this FIG. 2 is an diagrammatic representation of another embodiment of an information filtering apparatus according continualion-in-parl application, i.e. Ser. No. 08/627,436, filed by the present inventors on Apr. 4, 1996, and hereby lo Ihc present invention. incorporated by reference, an advanced collaborative/ content-based information filter system is employed to pro vide superior filtering in the process of finding and rating informoas which match a user's query. The information filter structure in this system integrates content-based filter ing and collaborative filtering to determine relevancy of informons received from various sites in the internet or other network. In operation, a user enters a query and a corre sponding "wire" is established, i.e., the query is profiled in storage on a content basis and adaptively updated over time, FIG. 3 is a How diagram for an embodiment of an information filtering method according to the present invenlion. FIG. 4 is a flow diagram for another embodiment of an information filtering meihod according tu the present inven tion. 65 FIG. 5 is a flow diagram for yet another embodiment of an information filtering meihod according lo the present invention. US 6,775,664 B2 FIG. 6 is an illustration of a ihree-component-input model and profile with associated predictors. FIG. 7 is an illustration of a mindpool hierarchy. FIG. 8 is a logic diagram illustrating a search selection feature of the invention; FIG. 9 is a functional block diagram of an embodiment of the invention in which an integrated information processing system employs a search engine and operates with combined collaborative filtering and content-based filtering, which is preferably adaptive, to develop responses to user queries. FIG. 10 shows another and presently preferred embodi ment of the invention in which an information processing system includes an integrated filter structure providing 10 represented by ihe member client, can increase the impor tance of that particular factor, A's authorship, for others of the user's member clients. Each of the clients of one user can be associated with the individual clients of other users insofar as the profiles of the respeclive clients have similar attributes. A "community" is a group of clients, called member clients, that have similar member client profiles, i.e., that share a subset of attributes or interests. In general, the subset of shared attributes forms the community profile for a given community and is representative of the commu nity norms, or common client attributes. The "relevance" of a particular informon broadly describes how well it satisfies the user's information need. The more relevant an informon is to a user, the higher the collaborative/adaptive-contcnt-bascd filtering to develop "signal" content. ITlc less relevant the informon, the higher longer term, continuing responses to user queries, and a the "noise" content. Clearly, the notion of what is relevant to search engine structure which provides short term, demand a particular user can vary over time and with context, and the responses to user queries, with the system directing user user can find the relevance of a particular informon limited queries to the appropriate structure for responses. to only a few of the user's potentially vast interest areas. 20 Because a user's interests typically change slowly, relative DETAILED DESCRIPTION OF THE to the data stream, it is preferred to use adaptive procedures EMBODIMENTS to track the user's current interests and follow them over The invention herein is preferably configured with an time. Provision, too, is preferred to be made for sudden apparatus and method for information filtering in a computer changes in interest, e.g., taking up antiquarian sword col system receiving a data stream from a computer network, in 35 lecting and discontinuing stamp collecting, so that the which entities of information relevant to the user, or method and apparatus track the evolution of "relevance" to "infonnons," are extracted from the data stream using a user and the communities of which the user is a member. content-based and collaborative filtering. The information In general, information filtering is the process of selecting filtering is long term in the sense that it operates on a the information that a users wishes to see, i.e., informons, continuing basis, and is both interactive and distributed in x from a large amount of data. Content-based filtering is a structure and method. It is interactive in that communication process of filtering by extracting features from the informon, is substantially bi-directional at each level of the filter. It is e.g., the text of a document, to determine the informon's distributed in that all or part of the information filler can relevance. Collaborative filtering, on the other hand, is the include a purely hierarchical (up-and-down/parent-child) process of filtering informons, e.g., documents, by deter structure or method, a purely parallel (peer-to-peer) structure 35 mining what informons other users with similar interests or or method, or a combination of hierarchical and parallel needs found to be relevant. structures and method. The system apparatus includes a filter structure having As used herein, the term "informon" comprehends an adaptive content-based filters and adaptive collaborative filters, which respectively include, and respond to, an adap ticular user. In general, informons can be heterogeneous in 40 tive content profile and an adaptive collaboration profile. As nature and can be all or part of a textual, a visual, or an audio used herein, the term "content-based filler" means a filter in entity. Also, informons can be composed of a combination of which content data, such as key words, is used in performing the aforementioned entities, thereby being a multimedia the filtering process. In a collaborative filter, other user data entity. Furthermore, an informon can be an entity of pat is used in performing Ihe filtering process. A collaborative terned data, such as a data file containing a digital repre- 45 filler is also sometimes referred to as a "content" filler since sentation of signals and can be a combination of any of the it ultimately performs the task of finding an object or previously-mentioned entities. Although some of the data in document having content relevant to the content desired by a data stream, including informons, may be included in an a user. If there are some instances herein where the term informon, not all data is relevant to a user, and is not within "content filter" is used as distinguished from a collaborative the definition of an informon. By analogy, an informon may $0 filter, it is intended that the term "content filter" mean information entity of potential or actual interest to a par be considered to be a "signal," and the total data stream may "content-based filter." The adaptive filters each are preferred be considered to be "signal+noise." Therefore, an informa to include at least a portion of a community filler for each community serviced by the apparatus, and a portion of a tion filtering apparatus is analogous to other types of signal filters in that it is designed to separate the "signal" from the "noise." member client filler for each member client of the serviced 55 communities. For this reason, the adaptive filtering is dis communication with the network. Because an individual tributed in Ihal each of the community filters perform adaptive collaborative filtering and adaptive content user can be interested in multiple categories of information, filtering, even if on different levels, and even if many fillers Also as used herein, the term "user" is an individual in exist on a given level. The integrated tillering permits an a unique profile, or set of attributes. Each member client 60 individual user lo be a unique member client of multiple profile, then, is representative of a particular group of user communities, with each community including multiple preferences. Collectively, the member client profiles asso member clients sharing similar interests. The adaptive fea ciated with each user is the user profile. The present inven tures permit the interests of member clients and entire tion can apply the learned knowledge of one of a user's communities to change gradually over time. Also a member member clients to others of the user's member clients, so 65 client has the ability to indicate a sudden change in Ihal the importance of the learned knowledge, e.g., ihe user's preference, e.g., the member client remains a collector but is preference for a particular author in one interest area as no longer interested in coin collecting. the user can be considered to be multiple clients each having US 6,775,664 B2 ers can be augmented to include their respective credibility The filter structure also implements adaptive credibility profiles, community membership, and professional or avofiltering, providing member clients with a measure of inforcational affiliations. A rating of each recommendation pro mon credibility, as judged by other member clients in the vided by a responder, relative to other responders' community. For example, a new member client in a first community, having no credibility, can inject an Mormon 5 recommendations, also can be supplied. The requester can into the data flow, thereby providing other member clients in provide feedback to each of the responders, including a other communities with the proposed informon, based on the rating of the credibility of the responder on the particular respective community profile and member client profiles. If topic, or the quality of the recommendation. As before, the the other member clients believe the content of the informon responders can accrue quality points, value tokens, or "info to be credible, the adaptive credibility profile will reflect a 10 bucks," as apportioned by the requester, in return for the growing credibility. Conversely, feedback profiles from informon recipients that indicate a lack of credibility cause the adaptive credibility profile, for the informon author, to reflect untrustworthiness. However, the growth and declina useful recommendation. Furthermore, certain embodiments are preferred to be self-optimizing in that some or all of the adaptive filters used tion of credibility are not "purely democratic," in the sense that one's credibility is susceptible to the bias of others' perceptions, so the growth or declination of one's credibility is generally proportional to how the credibility of the new 15 collaboration, credibility, reliability, etc. member client is viewed by other member clients. Member clients can put their respective reputations "on the line," and engage in spirited discussions which can be refereed by other interested member clients. The credibility 20 ibility sub-profiles for the credibility of the content of the and the like, and can be fed back to discussion participants, reviewers, and observers to monitor the responses of others to the debate. The adaptive credibility profiles for those member clients with top credibility ratings in their commu nities may be used to establish those member clients as The filter structure herein is capable of identifying the preferences of individual member clients and communities, providing direct and inferential consumer preference information, and tracking shifts in the preferences whether the shifts be gradual or sudden. The consumer preference information can be used to target particular consumer pref erence groups, or cohorts, and provide members of the cohort with targeted informons relevant to their consumer profile further can be partitioned to permit separate cred informon, the author, the author's community, the reviewers, in the system dynamically seek optimal values for the function intended by the filter, e.g., content analysis, 25 preferences. This information also may be used to follow demographical shifts so that activities relying on accurate demographical data, such as retail marketing, can use the consumer preference information to anticipate evolving con 30 sumer needs in a timely manner. To provide a basis for adaptation, it is preferred that each raw informon be processed into a standardized vector, which With this functionality, additional features can be may be on the order of 20,000 to 100,000 tokens long. The implemented, including, for example, "instant polling" on a learning and optimization methods that ultimately are cho matter of political or consumer interest. In conjunction with both content and collaborative filtering, credibility filtering, 35 sen are preferred to be substantially robust to the problems which can be presented by such high-dimensional input and the resulting adaptive credibility profiles, also may be spaces. Dimensionality reduction using methods such as the used to produce other features, such as on-line consultation singular value decomposition (SVD), or auto-encoding neu and recommendation services. Although the "experts" in the ral networks attempt to reduce the size of the space while communities most closely related to the topic can be afforded special status as such, member clients from other 40 initially retaining the information contained in the original representation. However, the SVD can lose information communities also can participate in the consultation or during the transformation and may give inferior results. Two recommendation process. adaptation/learning methods that are presently preferred In one embodiment of the consultation service, credibility include the TF-IDF technique and the MDL technique. "experts" in their respective communities. filtering can be augmented to include consultation filtering. With this feature, a member client can transmit an informon to the network with a request for guidance on an issue, for example, caring for a sick tropical fish. Other member clients can respond to the requester with informons related to the topic, e.g., suggestions for water temperature and 45 antibiotics. The informons of the respondent can include 50 their respective credibility profiles, community membership, and professional or vocational affiliations. The requester can provide feedback to each of the responders, including a rating of the credibility of the respondcr on the particular topic. Additionally, the responders can accrue quality points, 55 FIG. 1 illustrates one embodiment of an information filtering apparatus 1 structured for search engine implemen tation in accordance with the invention as described subse quently herein in connection with FIGS. 8 and 9. In general, a data stream is conveyed through network 3, which can be a global internetwork. A skilled artisan would recognize that apparatus 1 can be used with other types of networks, including, for example, an enterprise-wide network, or "intranet." Using network 3, User #1 (5) can communicate with other users, for example. User #2 (7) and User #3 (9), and also with distributed network resources such as resource value tokens, or "info bucks," as apportioned by the #1 (11) and resource #2 (13). requester, in return for useful guidance. Apparatus 1 is preferred to be part of computer system 16, although User #1 (5) is not required to be the sole user of Similarly, one embodiment of an on-line recommendation class medieval armor rcfurbishcrs. In this embodiment, the computer system 16. In one present embodiment, it is preferred that computer system 16 having information filter apparatus 1 therein filters information for a plurality of users. One application for apparatus 1, for example, could be requester can transmit the informon to the network bearing the request for recommendation. Other member clients can commercial information filtering service, which can be service uses recommendation filtering and adaptive recom mendation profiles to give member clients recommendations on matters as diverse as local auto mechanics and world- respond to the requester with informons having specific recommendations or disrecommendations, advice, etc. As with the consultation service, the informons of the respond 60 that user 5 and similar users may be subscribers to a 65 provided by the owner of computer system 16. Extraction means 17 can be coupled with, and receives data stream 15 from, network 3. Extraction means 17 can US 6,775,664 B2 7 identify and extract raw infonnons 19 from data stream 15. Each of the raw informons 19 has an information content. Extraction means 17 uses an adaptive content filter, and at least part of the adaptive content profile, to analyze the data stream for the presence of raw informons. Raw infonnons 5 are those data entities whose content identifies them as being "in the ballpark," or of potential interest to a community coupled to apparatus 1. Extraction means 17 can remove 8 A predicted response anticipated by adaptive filtering means 21 can be compared to the actual feedback response 29 of user 5 by first adaptation means 30, which derives a prediction error. First adaptation means 30 also can include prediction means 33, which collects a number of temporally- duplicate informons, even if the informons arrive from spaced feedback responses, to update the adaptive collaboration profile, the adaptive content profile, or both, with an adapted future prediction 34, in order to minimize subsem prec[icuon errors by the respective adaptive collabo- Extraction means 17 also can use at least part of a commu- >° °™ embodiment of the invention herein, it is preferred different sources, so that user resources are not wasted by handling and viewing repetitive and cumulative information. ra,ion filt(,r and ad ive „,„„,„, fiIu.r .,.«•._. ■■.-•• r ■ nity profile and a user profile for User #1 (5) to determine whether the informon content is relevant to the community »hat prediction means 33 be a self-optimizing prediction means using a preselected learning technique. Such tech- informon is a selected raw informon that, based upon the respective member client and community profiles, is pre- also can include a neural network therein and employ a neural network learning technique for adaptation and pre of which User #1 is a part. niques can include, for example, one or more of a top-keyFilter means 21 adaptively filters raw informons 19 and 15 word-selection learning technique, a nearest-neighbor learnproduces proposed informons 23 which are conveyed to ing technique, a term-weighting learning technique, and a User #1 (5) by communication means 25. A proposed probabilistic learning technique. First adaptation means 30 dicted to be of particular interest to a member client of User 20 diction. In one present embodiment of the invention, the 5. Filter means 21 can include a plurality of community filters 270,6 and a plurality of member client fillers 28a-e, term-weighting learning technique is preferred to be a TF-IDF technique and the probabilistic learning technique is profiles. When raw Mormons 19 are filtered by filler means adapIaliori means 30 further can include second each respectively having community and member client artlcXt^^ User #1 (5), responsive to the respective community and member client profiles, are conveyed thereto. Where such is desired, filter means 21 also can include a credibility filter preferred to be a MDL learning technique, collaboration profiles the adaptive content profiles, the community profile, and the user profile, responsive to at lcasl ooe of lhc otner Proflles-In thls manner>lrends altnb- which enables means 21 to perform credibility filtering of ,n utable Io individual member clients, individual users, and raw informons 19 according to a credibility profile. * '"dividual communities in one domain of system 16 can be It is preferred that the adaptive filtering performed within "cognized by and influence, similar entities in other filter means 21 by the plurality of filters 27a,b, l%a-<, and 35, use a self-optimizing adaptive filtering so that each of the domil™ <meludinS of a8e"1 ™nds )• conta«led wUmn «*»" tem 16 t0 l.hue extem lhal the «*Pective entities share parameters processed by filters 27a,i>, 2%a-e, and 35, is 35 common attributes. driven continually to respective values corresponding to a minimal error for each individual parameter. Self- Apparatus 1 also can include a computer storage means 31 for storing the profiles, including the adaptive content error, etc., are favored to prevail. remote analysis, for example, by User #2 (7). optimization encourages a dynamic, marketplace-like operaProfiIe and Ihe adaptive collaboration profile. Additional tion of the system, in that those entities having the most trend-tracking infonnation can be stored for later retrieval in desirable value, e.g., highest credibility, lowest predicted 40 storage means 31, or may be conveyed to network 3 for Self-optimization can be effected according to respective FIG. 2 illustrates another preferred embodiment of inforpresclcctcd self-optimizing adaptation techniques including, malion filtering apparatus 50, in computer system 51. Appafor example, one or more of a top-key-word-selection adapranis 50 can include first processor 52, second processors tation technique, a nearest-neighbor adaptation technique, a 45 53"A third processors Ma-d, and a fourth processor 55, to term-weighting adaptation technique, a probabilistic adapeffect the desired information filtering. First processor 52 tation technique, and a neural network learning technique. In can be coupled to, and receive a data stream 56 from, one present embodiment of the invention, the termnetwork 57. First processor 52 can serve as a pre-processor weighting adaptation technique is preferred to be a TF-IDF by extracting raw informons 58 from data stream 56 rcspontechnique and the probabilistic adaptation technique is pre- 50 sive to preprocessing profile 49 and conveying informons 58 ferred to be a MDL technique. to second processors 53a,b. When user 5 receives proposed informon 23 from appaBecause of Ihe inconsistencies presenled by the nearlyratus 1, user 5 is provided with multiple feedback queries infinite individual differences in the modes of along with the proposed informon. By answering, user 5 conceptualization, expression, and vocabulary among users, creates a feedback profile that corresponds to feedback 55 even within a community of coinciding interests, similar response 29. User feedback response 29 can be active notions can be described with vastly different terms and feedback, passive feedback, or a combination. Active feedconnotations, greatly complicating informon characterizaback can include the user's numerical rating for an tion. Mode variations can be even greater between disparate informon, hints, and indices. Hints can include like or dislike communities, discouraging interaction and knowledge- of an author, and informon source and timeliness. Indices 60 sharing among communities. Therefore, it is particularly can include credibility, agreement with content or author, preferred that processor 52 create a mode-invariant reprehumor, or value. Feedback response 29 provides an aclual scntalion for each raw informon, thus allowing fast, accurate response to proposed informon 23, which is a measure of the informon characterization and collaborative filtering. Moderelevance of the proposed informon to the information need invariant representations tend to facilitate relevant informon of user 5. Such relevance feedback attempts to improve the 65 selection and distribution within and among communities, performance for a particular profile by modifying Ihe thereby promoting knowledge-sharing, thereby benefiting profiles, based on feedback response 29. the group of interlinked communities, i.e., a society, as well. US 6,775,664 B2 10 First processor 52 also can be used to prevent duplicate informons, e.g., the same information from different sources, from further penetrating, and thus consuming the resources of, the filtering process. Other processors 53,a,6, 54a-d, also may be used to perform the duplicate informa tion elimination function, but additionally may measure the differences between the existing informon and new infor mons. That difference between the content of the informon the previous time the user reviewed it and the content of the informon in its present form is the "delta" of interest. Processors Sia,b, 54a-d may eliminate the informon from technique, a nearest-neighbor adaptation technique, a termweighting adaptation technique, and a probabilistic adapta tion technique. Any of the adaptive filters 66a-d may include a neural network. In one present embodiment of the invention, the term-weighting adaptation technique is pre ferred to be a TF-IDF technique and the probabilistic adap 10 tation technique is preferred to be a MDL technique. An artisan would recognize that one or more of the processors 52-55 could be combined functionally so that the actual number of processors used in the apparatus 50 could be less than, or greater than, that illustrated in FIG. 2. For example, in one embodiment of the present invention, first further processing, or direct the new, altered informon to the processor 52 can be in a single microcomputer workstation, member client, in the event that nature or extent of the with processors 53-55 being implemented in additional change exceeds a "delta" threshold. In general, from the notion of exceeding a preselected delta threshold, one may respective microcomputer systems. Suitable microcomputer 15 infer that the informon has changed to the extent that the systems can include those based upon the Intel® Pentiumchange is interesting to the user. The nature of this change Pro™ microprocessor. In fact, the flexibility of design can be shared among all of a user's member clients. This presented by the invention allows for extensive scalability of delta threshold can be preselected by the user, or by the apparatus 50, in which the number of users, and the com preselected learning technique. Such processing, or "delta 20 munities supported may be easily expanded by adding learning" can be accomplished by second processors SSa,b, suitable processors. As described in the context of FIG. 1, alone or in concert with third processors S4a-d. Indeed, the interrelation of the several adaptive profiles and respec third processor S4a-d can be (he locus for delta learning, tive filters allow trends attributable to individual member where processors 54a-d adapts a delta learning profile for clients, individual users, and individual communities in one each member client of the community, i.e. user, thus antici ,5 domain of system 51 to be recognized by, and influence, pating those changes in existing informons that the user may similar entities in other domains, of system 51 to the extent find "interesting." that the respective entities in the different domains share Second processors S3a,b can filter raw informons 58 and extract proposed community informons 59a,b therefrom. Informons 59a,b are those predicted by processors 53a,b to be relevant to the respective communities, in response to common attributes. The above described system operates in accordance with 30 communily profiles 48a,b that are unique to the communi ties. Although only two second processors S3a,b are shown in FIG. 2, system 51 can be scaled to support many more processors, and communities. It is presently preferred that second processors S3a,b extract community informons S9a,b using a two-step process. Where processor 52 has generated mode-invariant concept representations of the raw informons, processor 53a,b can perform concept-based 35 a method 100 for information filtering in a computer system, as illustrated in FIG. 3, which includes providing a dynamic informon characterization (step 105) having a plurality of profiles encoded therein, including an adaptive content profile and an adaptive collaboration profile; and adaptively filtering the raw informons (step 110) responsive to the dynamic informon characterization, thereby producing a proposed informon. The method continues by presenting the proposed informon to the user (step 115) and receiving a feedback profile from the user (step 120), responsive to the indexing, and then provide detailed communily filtering of 4U proposed informon. Also, the method includes adapting at each informon. least one of the adaptive content profile (step 125) and the Third processors 54a—d can receive community infor adaptive collaboration profile responsive to the feedback mons S9a,b from processors 53a, 6, and extract proposed profile; and updating the dynamic informon characterization member client informons 61a-d therefrom, responsive to (step 130) responsive thereto. unique member client profiles 62a-d for respective ones of 45 The adaptive filtering (step 110) in method 100 can be member clients 63a-d. Each user can be represented by machine distributed adaptive filtering that includes commu multiple member clients in multiple communities. For nity filtering (substep 135), Using a community profile for example, each of users 64a,b can maintain interests in each each community, and client filtering (substep 140), similarly of the communities serviced by respective second processors using a member client profile for each member client of each 53a,b, and each receive separate member client informons 50 community. It is preferred that the filtering in substeps 135 6lb,c and 61a,d, respectively. and 140 be responsive to the adaptive content profile and the Each member client 63a-d provides respective member adaptive collaboration profile. Method 100 comprehends client feedback 6Sa-d to fourth processor 55, responsive to servicing multiple communities and multiple users. In turn, the proposed member client informons 6la-d. Based upon each user may be represented by multiple member clients, the member client feedback 6Sa-d, processor 55 updates at 5; with each client having a unique member client profile and least one of the preprocessing profile 49, community profiles being a member of a selected communily. It is preferred that 48a,b and member client profiles 62a-d. Also, processor 55 updating the dynamic informon characterization (step 130) further include predicting selected subsequent member cli adapts at least one of the adaptive content profile 68 and the adaptive collaboration profile 69, responsive to profiles 49, 4Sa,b, and 62a-d. ent responses (step 150). 60 Method 100 can also include credibility filtering (step Fourth processor 55 can include a plurality of adaptive 155) of the raw informons responsive to an adaptive cred filters 66a-d for each of the aforementioned profiles and ibility profile and updating the credibility profile (step 160) computer storage therefor. It is preferred that the plurality of responsive to the user feedback profile. Method 100 further adaptive filters 66a-d be self-optimizing adaptive fillers. can include creating a consumer profile (step 165) respon Self-optimization can be effected according to a preselected 65 sive to the user feedback profile. In general, the consumer self-optimizing adaptation technique including, for example, profile is representative of predetermined consumer prefer one or more of a top-key-word-selection adaptation ence criteria relative to the communities of which the user is US 6,775,664 B2 11 12 a member client. Furthermore, grouping selected ones (step 170) of the users into a preference cohort, responsive to the It is preferred that coherent portions of the data stream, i.e., potential raw informons, be first parsed (step 301) into preselected consumer preference criteria, can facilitate pro- generalized words, called tokens. Tokens include punctua- viding a targeted informon (step 175), such as an advertisement, to the preference cohort. 5 FIG. 4 illustrates yet another preferred method 200. In general, method 200 includes partitioning (step 205) each (ion and other specialized symbols that may be part of the structure found in the article headers. For example, in addition to typical words such as "seminar" counting as user into multiple member clients, each having a unique member client profile with multiple client attributes and tokens, the punctuation mark "S" and the symbol "Newsgrouprcomp.ai" are also tokens. Using noun phrases as tokens also can be useful mutinies with each member client in a particular community „,.. thereby providing each community with a unique commu- tokens not occurring in, the document. Using this type of grouping member clients (step 210) .0 form multiple com- „ sharing selected client attributes with other member clients, nity profile having common client attributes. Nex( ' , token cou|Us for ^ documenl {& aeited . . ,,.,,, . . ... ,. Thls vector ls the slze of *e l°lal ™«bu|«y. Wl'h. zeros »°' vector 1S fmet.mes «1 fed the bag-of-words model. While Method 200 continues by predicting a community profile ,, Ihe bag-of-words model does no. capture the order of the (step 215) for each community using firs, prediction criteria, '5 lokens in the dofme"V wlnch "** be «f? forflm8uistic and predicting a member client profile (step 220) for a member client in a particular community using second or symaCc analysts, il captures most of the information needed for filImnS PurPoses- prediction criteria. Method 200 also includes the steps of Although, it is common in information retrieval systems extracting raw informons (step 225) from a data stream and „ to SrouP 'he tokens together by their common lingmshc selecting proposed informons (step 230) from raw infor- "° roots. callcd stemming, as a next step it is preferred in the mons. The proposed informons generally are correlated with Presenl invention that the tokens be left in their unstemmed client to whom the proposed informon is offered. After providing the proposed informons to the user (step 235), " receiving user feedback (step 240) in response to the proposed informons permits the updating of the first and second prediction criteria (step 245) responsive to the user feed- Creating a mode-invariant profile (step 305), C, includes creating a conceptual representation for each informon, A, that is invariant with respect to the form-of-expression, e.g., vocabulary and conceptualization. Each community can consist of a "Meta-U-Zine" collection, M, of informons. one or more of the common client attributes of a community, and of the member client attributes of the particular member back. form- ln lhis form-lhe tokens are amenable to being dassifled int0 mode-invariant concept components, Based upon profile C, the appropriate communities, if any, Method 200 further may include prefiltering the data 3° for each informon in the data stream are selected by concept- stream (step 250) using the predicted community profile, with the predicted community profile identifying the raw informons in the data stream based indexing (step 310) into each M. That is, for each c°ncepl C that describes A, put A into a queue Q^, for each M wmch k related to C. It is preferred that there is a list of filtering the raw informons using an adaptive content filter (step 255) responsive to the informon content; filtering the raw informons using an adaptive collaboration filter (step 260) responsive to the common client attributes for the index-searched. Each A that is determined to be a poor fit for a Particular M is eliminated from further processing. Once A has bcen matched w,th a particular M, a more complex community profile Pw is developed and maintained for each an adaptive member client filter (step 26S) responsive to the unique member client profile. determine whether it matches PA, strongly enough to be relained or "weeded" out (step 325) at this stage. Step 230 of selecting proposed informons can include 35 Ms Ihal » slored for each concePt and lhat can be eas«y pertaining community; and filtering the raw informons using 40 M <sleP 315>-lf A has fallen inl° Q"- lhen A >» analyzed to It is preferred that updating the first and second prediction Eacn A for a particular M is sent to each user's personal criteria (step 245) employ a self-optimizing adaptation aScnl' or member client U of M, for additional analysis technique, including, for example, one or more of a top- 45 based on lhe member client's profile (step 325). Each A that key-word-selection adaptation technique, a nearest-neighbor fils u>s interests sufficiently is selected for U's personal adaptation technique, a term-weighting adaptation informon, or "U-Zinc," collection, Z. Poor-fitting informons technique, and a probabilistic adaptation technique. Il is are eliminated from placement in Z (step 330). This userfurther preferred that the term-weighting adaptation techlevel staBe of analysis and selection may be performed on a nique be a TF-IDF technique and the probabilistic adapta- 50 cenlralizcd server site or on the user's computer, lion technique be a minimum description length technique. Next, the proposed informons are presented to user U The information filtering method shown in FIG. 5 pro- <sleP 335) for review. User U reads and rales each selected 10 the appropriate member clicnl. The method 300 includes of a ralinS for now "interesting" U found A to be, as well as mode-invariant (MI) profile of the informon (slep 305); selecting lhe most appropriate communities for each Opinion feedback: Did U agree, disagree, or have no opinion regarding the position of A? informon, based on the MI profile, using concept-based indexing (step 310); detailed analysis (step 315) of each Credibility Feedback: Did U find the facts, logic, sources, and quotes in A to be truthful and credible or not? informon with regard to its fit within each community; to eliminating poor-fitting informons (step 320); detailed fil- Informon Qualities: How does the user rate the informons qualities, for example, "interestingness," credibility, ftinnincss, content value, writing quality, violence content, sexual content, profanity level, business importance, scientific merit, surprise/unexpectedness of information content, artistic quality, dramatic appeal, entertainment value, (rendiness/importance to future direclions, and opinion agreement. vides rapid, efficient data reduction and rouiing, or filtering, A found in z (sleP 340) Th* feedback from U can consist parsing the data stream into tokens (step 301); creating a 55 onc or morc of tnc following: Icringof each informon relative to fit for each member client (step 325); eliminating poor-fitting informons (step 330); presenting the informon to the member client/user (slep 335); and obtaining the member client/user response, includ- 65 ing multiple ratings for different facets of lhe user's response to the informon (step 340). US 6,775,664 B2 13 14 Specific Reason Feedback: Why did the user like or dislike A? Because of the authority? Because of the source? Because A is out-of-date (e.g. weather report from 3 weeks ago)? Because the information contained in A has been seen already? (i.e., the problem of duplicate information delivery) Categorization Feedback: Did U liked A? Was it placed within the correct M and Z? Such multi-faceted feedback queries can produce rich feedback profiles from U that can be used to adapt each of the profiles used in the filtering process to some optimal operating point. One embodiment of creating a Ml profile (step 305) for each concept can include concept profiling, creation, and optimization. Broad descriptors can be used to create a substantially-invariant concept profile, ideally without the word choice used to express concept C. A concept profile can include positive concept clues (PCC) and negative concept clues (NCC). The PCC and NCC can be combined by a processor to create a measure-of-fit that can be compared to a predetermined threshold. If the combined effect of the PCC and NCC exceeds the predetermined threshold, then informon A can be assumed to be related to concept C; otherwise it is eliminated from further processing. PCC is a set of words, phrases, and other features, such as the source or the author, each with an associated weight, that tend to be in A which contains C. In contrast, NCC is a set of words, phrases, and other features, such as the source or the author, each with an associated weight that tend to make it more unlikely that A is contained in C. For example, if the term "car" is in A, then it is likely to be about automobiles. However, if the phrase "bumper car" also is in A, then it is more likely that A related to amusement parks. Therefore, "bumper car" would fall into the profile of negative concept clues for the concept "automobile." accept the article for subsequent filtering, for example, 5 10 15 20 25 30 35 detailed indexing of incoming A. It is preferred that a data structure including a database be used, so that the vector of Ms, that are related to any concept C, may be looked-up. Furthermore, when a Z is created by U, the concept clues given by U to the information filter can be used to determine a set of likely concepts C that describe what U is seeking. For example, if U types in "basketball" as a likely word in the associated Z, then all concepts that have a high positive weight for the word "basketball" are associated with the new Z. If no such concepts C seem to pre-exist, an entirely new concept C is created that is endowed with the clues U has given as the starting profile. To augment the effectiveness of concept-based indexing, it is preferred to provide continual optimization learning. In general, when a concept C no longer uniquely triggers any documents that have been classified and liked by member clients U in a particular community M, then that M is removed from the list of M indexed into by C. Also, when there appears to be significant overlap between articles fitting concept C, and articles that have been classified by users as belonging to M, and if C does not currently index into M, then M can be added to the list of M indexed into by C. The foregoing heuristic for expanding the concepts C that are covered by M, can potentially make M too broad and, thus, accept too many articles, Therefore, it further is preferred that a reasonable but arbitrary limit is set on the conceptual size covered by M. With regard to the detailed analysis of each informon A with respect to the community profile for each M, each A must pass through this analysis for each U subscribing to a particular M, i.e., for each member client in a particular community. After A has passed that stage, it is then filtered at a more personal, member client level for each of those users. The profile and filtering process are very similar for both the community level and the member client level, except that at the community level, the empirical data obtained is for all U who subscribed to M, and not merely Typically, concept profile C can be created by one or more an individual U. Other information about the individual U means. First, C can be explicitly created by user U. Second, 40 can be used to help the filter, such as what U thinks of what C can be created by an electronic thesaurus or similar device a particular author writes in other Zs that the user reads, and that can catalog and select from a set of concepts and the articles that can't be used for the group-level M processing, words that can be associated with that concept. Third, C can FIG. 6 illustrates the development of a profile, and its be created by using co-occurrence information that can be associated predictors. Typically, regarding the structure of a generated by analyzing the content of an informon. This 45 profile 400, the information input into the structure can be means uses the fact that related features of a concept tend to divided into three broad categories: (1) Structured Feature occur more often within the same document than in general. Information (SFI) 405; (2) Unstructured Feature Informa- Fourth, C can be created by the analysis of collections, H, of tion (UFI) 410; and (3) Collaborative Input (Cl) 415. FeaA that have been rated by one or more U. Combinations of tures derived from combinations of these three types act as features that tend to occur repeatedly in II can be grouped 50 additional peer-level inputs for the next level of the rating together as PCC for the analysis of a new concept. Also, an prediction function, called (4) Correlated-Feature, ErrorA that one or more U have rated and determined not to be Correction Units (CFECU) 420. From inputs 405,410,415, within a particular Z can be used for the extraction of NCC. 420, learning functions 42Sa-d can be applied to get two Concept profiles can be optimized or learned continually computed functions 426<w/, 428a-d of the inputs. These after their creation, with the objective that nearly all As that 55 two functions are the Independent Rating Predictors (1RIJ) Us have found interesting, and belonging in M, should pass 426a-</, and the associated Uncertainty Predictors (UP) the predetermined threshold of at least one C that can serve 428a-d. IRPs 426a-d can be weighted by dividing them by as an index into M. Another objective of concept profile their respective UPs 42Sa-d, so that the more certain an 1RP management is that, for each A that does not fall into any of 426a-d is, the higher its weight. Each weighted 1RP 429a-d the one or more M that are indexed by C, the breadth of C 60 is brought together with other IRPs 429a-d in a combination is adjusted to preserve the first objective, insofar as possible. function 427a-d. This combination function 427a-rf can be For example, if C's threshold is exceeded for a given A, C's from a simple, weighted, additive function to a far more breadth can be narrowed by reducing PCC, increasing NCC, complex neural network function. The results from this arc or both, or by increasing the threshold for C. normalized by the total uncertainty across all UPs, from In the next stage of filtering, one embodiment of content- 65 Certain-zero to Uncertain-infinity, and combined using the based indexing takes an A that has been processed into the Certainty Weighting Function (CWF) 430. Once the CWF set of C that describe it, and determine which M should 430 has combined the IRPs 426a-rf, it is preferred that result US 6,775,664 B2 15 16 432 be shaped via a monotonically increasing function, to map 19 the range and distribution of the actual ratings. This function is called the Complete Rating Predictor (CRP) 432. SFI 405 can include vectors of authors, sources, and other of Ihe profile if, for example, an author's average rating for a particular M is not "significantly" different from the average for the author across all Ms. Here, "significance" is used is in a statistical sense, and frameworks such as the and concepts that help lo determine the degree lo which A on|y two of the above lenns: features of informon A that may be influential in determining 5 Minimum Description Length (MDL) Principle can be used the degree to which A falls into the categories in a given M. to determine when to store or use a more "local" component UFI 410 can include vectors of important words, phrases, of Ihe IRP. As a simple example, the following IRP employs falls into a given M. Vectors can exist for different canonical parts of A. For example, individual vectors may be provided 10 IRP (author)=weightcd sum of avcraSc (MM& 6'vcn thls author s° far ln lhis M) for subject/headings, content body, related information in . u, aXera.ge (r*"n8?, givcn a.U ^otsusotlir m thls M) a posinve and negative vector exists for each canonical par,. £**$£ T^S^I^^Sl complex other referenced informons, and the like. It is preferred that ™* 2 Sives Rvalues attained for the four new articles, CI 415 is received from other Us who already have seen neura| £, h ^ be used How , ^^ A and have rated it. The input used for CI 415 can include, ,5 me(hod> usef£ffor ,his examplej fe simply t0 repeat lhe same for example, "inlereslingness, credibility, funmness, con- ^ (hat was ^ for ,he ,Rp bul> ms,ead of predicting tent value, writing quality, violence content, sexual content, profanity level, business importance, scientific merit, the rating> it ^ preferred to predict the squared-error, given the feature vector. The exact square-error values can be used surprise/unexpectedness of information content, artistic as the informon weights, instead of using a rating-weight quality, dramatic appeal, entertainment value, trendiness/ 20 lookup table. A more optimal mapping function could also importance to future directions, and opinion agreement. be computed, if indicated by the application. Each CFECU 420 is a unit that can detect sets of specific feature combinations which are exceptions in combination. For example, author X's articles are generally disliked in lhe Z for woodworking, except when X writes about lathes. 25 Token 1 When an informon authored by X contains the concept of Token 2 Token 3 Token 4 16.68 8.73 12.89 11.27 [RPpos.vec.ot "lathes," then the appropriate CFECU 420 is triggered to signal that this is an exception, and accordingly a signal is sent to offset the general negative signal otherwise triggered [rp nCg. vector —. woodworking L. made tf) InformaUon-Type, and other fields previously identified to other ^ u fa , difficull exemplary SH below, accounts only for the Author field. rcgarding which uscrs alrcad , „, . 15.20 8.87 . 4.27 , . 5.04 ... , because of the genera, dllike fork's informons^ the 30 ^^^^ff^l^^^S^ • ,Tr- w«mP ' * ?"? °a £lnictu5ed Fealure lDjolma' t.on (SFI) 405 can include fields such as Author, Source, , a sjmjiarj,y measure, and then a mapping function can be used ^^ an UP. to get collaboralive Maki e£fective be of particular value m the analysis^ For simplicity the 35 seven issues FJ For this example, assume three authors A, B^and C have ,here £ t (CI) from b]cm becausc ofIheVfoU'Owing „ fa nQ , iorj knowled * will have r[tcd an informln ^iorl making a prediction for a user U, who hasn't yet collectively subm.tted 10 angles that have been read and have been rated as m TABLE 1 (followmg the text of this read Momon A^wtoK> a moM for diclion mus/be ationa] no maller which subseI of J j ,s h t0 spec.ficat.on). In the accompanying rating scheme, a rating 40 ^available, if any, at a given time. Second, computational can vary between 1 and 5, will, 5 indicating a "most interesting article^ If four new articles (11,12,13,14) arrive cffid musl b/maintSned in H hl of a polenl^uy very , se, of users and informons. ^ increFmenlai u;dal/s B C and a new author D has con.nbu.ed, a simple IRP for r orled^om users r that have no. yet been rated, and, ,n addition .0 authors A, o(%{i edictions oflen are desired as more feedbPack is di an informon Fourlh in leam. lhe Author field, that just takes sums of the averages, would 45 in^ good models for ^akin| raling fictions, only very as toiiows. IRP (author)-weigbted sum of sparse data typically is available for each users rating of each document. Thus, a large "missing data" problem must be average (ratings given the author so far) average (ratings given the author so far in this M) average (ratings given all authors so far in this M) average (ratings given all authors) dealI wjlh effectjveiy Fifth mos, ,entia, so]u(ions (o ,he c, Mcm ifc so indepenc|ence assumptiOns that, when grossly violated, give very poor resuhs average ratings given the author so Tar by a particular user U)* M m e ,e of an indcpcndcnce assUmption violation, assume that ten users of a collaboralive fii,e|.jng sysiem) called the "B-Team," always rale all average (ratings given the aulhor so far m this M by a artkles exact|y in ,he samc wa% for examp,e> because ,hey particular user U) 55 think very much alike. Further assume that user A's ralings particular user U) correlated with user C at the 0.9 level. Now, suppose user C (if for a personal Z) i( is reasonabk lo predicl ,hal A-s raling ako mighl ^ a ..5.. average (ratings given all authors so far in Ihis M by a average (ratings given all authors by a particular user)* are „,„.!„„, wiln lhe B.Tcam a, lhc 0 5 levelj and are rcads an arlide and rates ;, a ..5.. Bascd on |ha, c>s ra(ingi Ilie purpose of the weighted sum is to make use of broader, 60 Furlher_ su ^,se tllaI a mernber of the B-Tcam reads the more general statistics, when strong statistics for a particular user reading an informon by a particular aulhor, within a particular Z may not yet be available. When stronger slalistics arc available, the broader terms can be eliminated by artic|c an(, ratcs i( , ..,„ Exis|j collaborativc filtcri methods are hkc| |0 dfcl ,ha, A>s ralj R sub A wou|d be. that used for creating CWFs 430, for the profiles as a wholei In principle, if other members of the B-Team then read and Some of the averages may be left out in the actual storage rate lhe article, it should not affect the prediction of A's using smaller weights. This weighting scheme is similar 10 65 «^-(0.9x?»o.5x2)/(0.9+o.5)-3.93 US 6,775,664 B2 17 18 rating, R^ub.A, because it is known that other B-Team members always rate the article with the same value as the first member of the B-Team. However, the prediction for A by existing collaborative filtering schemes would tend to give 10 times the weight to the "2" rating, and would be: comparing the similarity between the profiles. Next, the similarity of the member client profiles and informon con tent profiles can be compared. A neural network could be used to learn how to compare profiles so that the error in predicted ratings is minimized. However, the invention can be embodied with use of the invention can be embodied with Kv(0-9*5+10x0.5x2)/(0.9+10x0.5),.2.46 use of a simple cosine similarity metric, like that previously considered in connection with Unstructured Feature Infor Existing collaborative filtering schemes do not work well in this case because B-Team's ratings are not independent, and have a correlation among one another of 1. 'Hie information mation (UFI) can be used. filter according to the present invention can recognize and compensate for such inter-user correlation. Sixth, information about the community of people is known, other than each user's ratings of intbrmons. This information can include the present topics the users like, what authors the users like, etc. This information can make the system more effective when it is used for learning 15 frequencies that can be multiplied in the standard token frequency comparison method, which would be recognized by a skilled artisan. stronger associations between community members. For example, because Users A and B in a particular community M have never yet read and rated an informon in common, no correlation between their likes and dislikes can be made, based on common ratings alone. However, users A and B have both read and liked several informons authored by the same author, X, although Users A and B each read a distinctly different Zs. Such information can be used to make The method used preferably includes more than just the tokens, such as the author and other SFI; and, it is preferred that the three vectors for component also are able to be compared. SFIs may be handled by transforming them into an entity that can be treated in comparable way to token Continuing in the ongoing example, the Author field may 20 be used. Where user A and user B have rated authors K and L, the token frequency vector may appear as follows: 25 the inference that there is a possible relationship between user A's interests and user B's interests. For the most part, existing collaborative filtering systems can not take advan tage of this knowledge. Seventh, information about the informon under consider 30 ation also is known, in addition to the ratings given it so far. For example, from knowing that informon A is about the concept of "gardening", better use can be made of which Further, the author component of the member client profiles users' ratings are more relevant in the context of the inforof user A and user B may be compared by taking a special mation in the informon. If user B's rating agrees with user 3S weighted correlation of each author under comparison. In D'sralingofarticleswhenthesubjectisabout"politics",but general, the weight is a function F of the sample sizes for B's ratings agree more with user D when informon A is user A's and user B's rating of the author, where F is the about "gardening", then the relationship between User B's product of a monotonically-incrcasing function of the ratings and User D's ratings are preferred to be emphasized sample size for each of user A and user B. Also, a simple to a greater extent than the relationship between User B and 4U function G of whether the informon D is by the author or not User C when making predictions about informon A. is used. This function can be: G=q if so, and G=p<q if not, With regard to the aforementioned fourth, sixth and where p and q are optimized constraints according to the seventh issues namely, making effective use of sparse, but domain of the filtering system. When there has been no known, information about the community and the informon, rating of an author by a user, then the function of the zero it is possible to determine the influence of user A's rating of 45 sample size is positive. This is because the fact that the user an informon on the predicted rating of the informon for a did not read anything by the author can signify some second user, B. For example, where user A and user B have indication that the author might not produce an informon read and rated in common a certain number of informons, which would be highly rated by the user. In this case, the the influence of user A's rating of informon D on the exact value is an increasing function H of the total articles predicted rating of informon D for user B can be defined by 50 read by a particular user so far, because it becomes more a relationship that has two components. First, there can be a likely lhat the user is intentionally avoiding reading infor common "mindset," S.sub.M, between user A and user B mons by that author with each subsequent article that has and informon D, that may be expressed as: been read but is not prepared by the author. In general, the exact weighting function and parameters can be empirically Afj-profileH)xprofile(B)xDocumcntProfilc(D). 5J derived rather than theoretically derived, and so is chosen by the optimization of the overall rating prediction hinclions. Second, a correlation may be taken between user A's past ratings and user B's past ratings with respect to informons that are similar to D. This correlation can be taken by weighting all informons E lhat A and B have rated in common by the similarity of li to D, S.ED: Continuing in the present example, a correlalion can be computed with the following weights for Ihe aulhors K, I. and M. 60 S/ro=Wcightcd_Corrclation (ratings (A), ratings (B)) bach of the examples can be weighted by Author W^-weight for rating pair(rating A,D), rating (B,D)) =DoaimcntProfile(E)xDocumentProfilc (D) Note that the "X" in the above equation may not be a mere multiplication or cross-product, but rather be a method for 65 Weight F(21,l, nut author) log(2l + 1) x log(l author) - 0.04 1) x G(not US 6,775,664 B2 20 19 develop more complex composite informon qualities, and more sublime mindpools. FIG. 7 illustrates a preferred embodiment of a mindpool system 500. It is preferred that all users be members of the uppermost portion of the hierarchy, namely, the top mindpool 501. Mindpool 501 can be broken into sub-mindpools 502a-c, which separate users into those having at least some common interests. Furthermore, each sub-mindpool 502a-c can be respectively broken into sub-sub-mindpools 503a-6, -continued Author Weight F(5,7, author or D) log(5 + 1) x log(7 + 1) x G(aulhor) - 0.70 F(0.2, not author) • H(26) x log(2 + 1) x G (not author) - 0.02 It is preferred that the logarithm be used as the monotonically-increasing function and that p»l, q=0.1. Also used are H=log(samplc_sizc*0.1) and an assumed rating, for those authors who are unrated by a user, to the value of "2." The correlation for the author SFI can be mapped to a non-zero range, so that it can be included in the cosine similarity metric. This mapping can be provided by a simple one-neuron neural network, or a linear function such as, (correlation+l)*P0. Where the Po is an optimized parameter used to produce the predicted ratings with the lowest error in the given domain for filtering. An artisan skilled in information retrieval would recog nize that there are numerous methods that can be used to effect informon comparisons, particularly document com parisons. One preferred method is to use a TF-1DF weight ing technique in conjunction with the cosine similarity metric. SFI including author, can be handled by including them as another token in the vector. However, the token is preferred to be weighted by a factor that is empirically optimized rather than using a TF-IDF approach. Each com ponent of the relationship between user A's and user B's can be combined to produce the function to predict the rating of informon D for user B. The combination function can be a simple additive function, a product function, or a complex 10 15 20 25 30 503c-rf, S03e,f,g to which users 504a-g are respective members. As used herein, mindpool 501 is the parent node to sub-mindpools 502a-c, and sub-mindpools 502o-c arc the respective parent nodes to sub-sub-mindpools 503a-g. Sub-mindpools 502u-c arc the child nodes to mindpool 501 and sub-sub-mindpools 503a-g are child nodes to respective mindpools 503o-c. Sub-sub-mindpools 503a-g can be con sidered to be end nodes. Users 505a,b can be members of sub-mindpoul 502a, 502c, if such more closely matches their interests than would membership in a sub-submindpool 503a-g. In general, the objective is to break down the entire population of users into subsets that are optimally similar. For example, the set of users who find the same articles about "gardening" by author A to be interesting but nevertheless found other articles by author A on "gardening" to be uninteresting may be joined in one subset. A processing means or mindpool manager may be used to handle the management of each of the mindpools 501, 502a-c, and 503a-g. A mindpool manager performs the following functions: (1) receiving rating information from child-node mindpool managers and from those users coupled directly to the manager; (2) passing rating informa tion or compiled statistics of the rating information up to the manager's parent node, if such exists; (3) receiving estima tions of the mindpool consensus on the rating for an infor function, including, for example, a neural network mapping function, depending upon computational efficiency con- 35 mon from the manager's parent mindpool, if such exists; and straints encountered in the application. Optimization of the (4) making estimations of the mindpool consensus on the combination function can be achieved by minimizing the rating for a specific informon for the users that come under predicted rating error as an objective. the manager's domain; and (5) passing the estimations from In addition to determining the relationship between two function 4 down to either a child-node mindpool or, if the user's ratings, a relationship that can be used and combined 40 manager is an end node in the hierarchy, to the respective across a large population of users can be developed. This user's CWF, for producing the user's predicted rating. relationship is most susceptible to the aforementioned first, Function 4 also can include combining the estimations second, third, and fifth issues in the effective use of col received from the manager's parent node, and Uncertainty laborative input. Specifically, the difficulty with specifying a Predictions can be estimated based on sample size, standard user rating relationship across a large population of users is 45 deviation, etc. Furthermore, as alluded to above, users can compounded by the lack of a priori knowledge regarding a be allowed to belong to more than one mindpool if they large volume of dynamically changing information that may don't lit precisely into one mindpool but have multiple fit have unexpected correlation and therefore grossly violate views regarding the conceptual domain of the informon. independence assumptions. Also, it is preferred that lateral communication be provided In one embodiment of the present invention, it is preferred 50 between peer managers who have similar users beneath that users be broken into distributed groups called "mind them to share estimation information. When a rating comes pools." Mindpools can be purely hierarchical, purely in from a user, it can be passed lo the immediate manager(s) parallel, or a combination of both. Mindpools can he similar node above that user. It is preferred that the manager(s) first lo the aforementioned "community" or may instead be one decide whether the rating will effect its current estimation or of many subcommunities. These multiple hierarchies can be 55 whether the statistics should be passed upward to a parentused to represent different qualities of an article. Some node. If the manager estimation would change by an amount qualities that can be maintained in separate hierarchies above an empirically-derived minimum threshold, then the include: interest ingncss; credibility; funnincss; valuableness; writing quality; violence content; sexual content; pro manager should pass that estimation down to all of its child-nodes. In the event that the compiled statistics are fanity level; business importance; scientific merit; artistic (>o changed by more than another minimum threshold amount, quality; dramatic appeal; entertainment value; surprise or then the compiled statistics should be passed to the manag unexpectedness of information content; trendiness or impor er's parent-node, if any, and the process rccurscs upward and tance to future directions; and opinion agreement. Each of downward in the hierarchy. these qualities can be optionally addressed by users with a Because no mindpool manager is required lo have accu rating feedback mechanism and, therefore, these qualities 65 rate information, but just an estimation of the rating and an can be used to drive separate mindpool hierarchies. Also, the uncertainty level, any manager may respond with a simple qualities can be used in combinations, if appropriate, to average of all previous documents, and with a higher degree US 6,775,664 B2 21 22 of uncertainty, if none of its child-nodes has any rating information yet. The preferred distributed strategy tends to production of other uncertainty combination methods described above. Approximations can be made by precomputing all terms thai do not change significantly, based on the particular informon, or the subset of actual ratings given so far lo the mindpool manager. As slated previously, the correlaled-feature error-correclion units (CFECUs) are intended to detect irregularities or statistical exceptions. Indeed, Iwo objectives of the CFECU units are to (1) find non-linear exceptions to the general structure of the three aforementioned types of inputs (SFI, UFI, and CI); and (2) find particular combinations of informon sub-features that statistically stand out as having special structure which is not captured by the rest of the general model; and (3) trigger an additional signal to the CFECU's conditions are met, in order to reduce prediction error. The following exemplifies reduce the communication needed between processors, and the computation tends to be pooled, thereby eliminating a substantial degree of redundancy. Using this distributed strategy, the estimations tend to settle to the extent that the updating of other nodes, and the other users predictions are minimized. Therefore, as the number of informons and users becomes large, the computation and prediction updates grow as the sum of the number of informons and the number of users, rather than the product of the number of informons and the number of users. In addition, incremental updates can be accomplished by the passing of estimations up and 10 down the hierarchy. Incremental updates of rating predic tions continue to move until the prediction becomes stable 15 the CFECU operation: due to the large sample size. The distributed division of users can reduce the effects of independent assumption violations. In the previous example with the B-Team of ten users, the B-Team can be organized as a particular mindpool. With the additional ratings from each of the B-Tcam members, the estimation from the B-Team mindpool typically does not change significantly because of the exact correlation between the members of that mindpool. This single estima tion then can be combined with other estimations to achieve the desired result, regardless of how many B-Team members have read the article at any given lime. The mindpool hierarchies can be created by either computer-guided or human-guided methods. If the hierarchy creation is human-guided, there often is a natural breakdown of people based on information such as job position, com 30 User B's Number of mon interests, or any other information that is known about tnformons Read Aboul them. Where the mindpool hierarchy is created Average over automatically, the previously described measure of the col Gardening Politics Topics laborative input relationship between users can be employed in a standard hierarchical clustering algorithm to produce 35 7 Author A's 40 1.69 Articles 1.84 70 200 each group of users or nodes in the mindpool hierarchy. Other Authors Such standard hierarchical clustering algorithms can include, for example, the agglomerative method, or the divide-and-conquer method. A skilled artisan would recog In Ihis example, it is desired thai author A's informon D nize that many other techniques also are available for 40 about gardening have a high predicted rating for user B. However, because Ihe average rating for aulhor A by user B incrementally-adjusting the clusters as new information is is only 1.69, and the average rating for the gardening collected. Typically, clustering is intended to (1) bring concept is only 1.68, a three-part model (SFI-UFI-CI) that together users whose rating information is clearly not inde does not evaluate the informon features in combination pendent; and (2) produce mindpool estimations that are 45 would tend lo not rank informon D very highly. In this case, substantially independent among one another. the first CFECU would first find sources of error in past Estimations are made in a manner similar to other esti examples. This could include using the three-part model mations described herein. For example, for each user or against the known examples that user B has rated so far. In sub-mindpool (sub-informant), a similarity between the subthis example, seven articles that user B has rated, have an informant and the centroid of the mindpool can be computed average rating of 4.S, though even the Ihree-part model only in order to determine how relevant the sub-informant is in so predicts a raiing of about 1.68. When such a large error computing the estimation. Uncertainty estimators also are appears, and has statistical strength due to the number of associated with these sub-informants, so that they can be examples with the common characteristics of, for example, weighted with respect to their reliability in providing the the same aulhor and topic, a CFECU is created lo idenlify most accurate estimation. Optionally, the informon under thai this exception lo Ihe Ihree-part model has been triggered evaluation can be used to modulate the relevancy of a 55 and thai a correction signal is needed. Second, it is preferred sub-informant. This type of evaluation also can lake advan to index the new CFECU into a database so thai, when tage of the Iwo previously-determined collaborative infor triggering features appear in an informon, for example, mation relationship components, thereby lending lo magnify aulhor and topic, the correction signal is scm into the relationships that are stronger for particular types of infor appropriate CWF. One method which can be used to effect mons than for others. Once a suitable set of weights are 60 the first step is a cascade correlation neural network, in established for each user within a mindpool for a particular which the neural net finds new connection neural net units informon, a simple wcightcd-avcragc can be used to make to progressively reduce the prediction error. Another method the estimation. It is preferred that the "simple" weighled is lo search through each informon lhal has been rated but average used be more conservative regarding input infor whose predicted rating has a high error, and storing the mation that a simple independent linear regression. Also, the 65 informons profile. overall Uncertainty can be derived from the Uncertainty When "enough" informons have been found with high Predictions of the sub-informants, in a manner similar to the error and common characteristics, the common characters- US 6,775,664 B2 23 24 tics can be joined together as a candidate for anew CFECU. Next, the candidate can be tested on all the samples, whether When a user makes a query for which a wire already exists, wire search results are preferably returned instead of they have a high prediction or a low prediction error regular search engine results. As shown in the logic diagram than a minimum threshold level, the CFECU can be added block 26C reUlms results from ,he existi associated with them. Then, the overall error change (reduction or increase) for all of the examples can be 5 computed to determine ir the CFECU should be added to the informon profile. If the estimated error reduction is greater to the profile. As successful CFECU are Covered for users profiles, they also can be added to a database of CFECU's that may be useful for analyzing other profiles. If of FIG. 7, a user provides a query as indicated by block 20C. nje query is applied to a Lookup Table, as indicated by block 22C, block 24C applies a test to determine from the table wnethcr a wire already exists for the new query. If so, wfre otherwise, bIock 28C commands , demand ^ bby a regular query . ?,,•■! L r • u u a particular CFECU has a sufficiently broad application, it can be moved up in the filtering process, so that it is . WlIh lhe f5 of Wlre se,arch relu1rns> eafh uf ™ rev>ew lhc. rctu,rn,cd rcsulls and Pr0Vldc fccdback dala about can be included in the representation that is computed in the the hller Prollles Xlsed m Processing informons for the wire, be made by taking the average of those informons for which the CFECU responds. Also, the Uncertainty can be chosen such that the CFECU signal optimally outweighs the other tion of previous users' feedback data. By analyzing documents which users rale as meeting a particular quality such as 1 interestingness, the system can find common document computed for every entity once. Also, the particular CFECU reviewed documents. Such feedback data is incorporated in pre-processing stage as a new feature. In general, the esti- 15 Therefore, when a future user makes substantially the same mation of the predicted rating from a particular CFECU can querv.lne wire wi" havc bee" improved by the incorpora- signals being sent to the CWF. One method of self- 20 features which can be used to return more like documents to optimization that can be employed is, for example, the future users who make substantially the same query, gradient descent method, although a skilled artisan would Alternatively, all queries applied to a search engine sys- recognize that other appropriate optimization methods may tern of the invention can set up new wires. After a search be used. query is presented to the search engine system, a wire is The invention of this continuation-in-part application, as 25 created on the basis of the query terms, and all new shown in FIGS. 8 and 9, provides a collaborative and documents subsequently received from the network are preferably adaptive search engine system in which elements filtered by the new wire. A push-model may be used to send of the structure and principles of operation of the apparatus all passed, new documents to the user, of FIGS. 1-7 are applied. Accordingly, a search engine Among other basic search engine system structures, an system of the invention, as preferably embodied, integrates 30 integrated system can be employed in which collaborative collaborative filtering with adaptive content-based filtering and content-based filtering is structured to provide demand to provide improved search engine performance. The aerosearches with or without collaborative filtering, or wire nym "CASE" refers to a search engine system of the searches. In the operation of the preferred basic structure and invention, i.e., a collaborative, adaptive search engine. other basic structures, a query processor can be employed, if In the operation of conventional search engines at portal 35 needed, to make search-type assignments for user queries, web sites, user queries are searched on demand to find Generally, basic search engine system structures of the relevant informons across the web. Content-based filtering is invention arc preferably embodied with the use of a protypically used in measuring the relevancy of informons, and grammed computer system. the search results are resented in the form of a list of Collaborative filtering employs additional data from other informons ranked by relevancy. 40 users to improve search results for an individual user for The present invention combines collaborative filtering with content-based filtering in measuring informons for relevancy, and further preferably applies adaptive updating of the content-based filtering operation. In providing these whom a search is being conducted. The collaborative data can be feedback informon rating data, and/or it can be content-profile data for agent mind melding which is more fully disclosed in Serial Number (Docket # LYC 4), entitled results, the invention can be embodied as a search engine 45 INTEGRATED COLLABORATIVE/CONTENT-BASED system in accordance with different basic structures. In the FILTER STRUCTURE EMPLOYING SELECTIVELY presently preferred basic structure, an integrated SHARED, CONTENT-BASED PROFILE DATA TO collaborative/content-based filter (FIGS. 1-7) is operated to EVALUATE INFORMATION ENTITIES IN A MASSIVE provide ongoing or continuous searching for selected user INFORMATION NETWORK, filed by the current inventors system. A suitable analysis is applied to the search engine connection with FIGS. 1-7. queries, with a "wire" being established for each query. On 50 on Nov. 19, 1998, and hereby incorporated by reference, the other hand, a regular search engine Ls operated to make Many types of user rating information can be used. For immediate or short-term "demand" searches for other user example, users can sort documents which they have read queries on the basis of content-based filtering. This basic from best to worst. Alternatively, users can select on a scale structure of the invention is especially beneficial for use in (numeric, such as 1 to 10, or worded, such as good, medium, applying the invention to existing search engine structure. 55 poor) how much they enjoyed reading a document. Further, Demand search results can be returned if no wire exists user monitoring can measure lime spent by users on each for an input query. Otherwise, wire search results are document, thereby indicating user interest (normalized by returned if a wire docs exist, or collaborative ranking data document length). Among other possibilities, the choices of can be applied from the wire filter structure to improve the documents for reading by other users can be simply used as results of (he demand search from the regular search engine. 60 an indication of interesting documents. In all cases, the In the currently preferred embodiment, wires are created feedback rating data can be based on interestingness or any for the most common queries received by the search engine of a variety of other document qualities, as described in operations to determine which queries are most common, Feedback ranking information can be used in a number of and respective wires are then created for each of these 65 ways, and the invention is not limited by the method of queries. An analysis update can be made from time to time feedback information use. Use methods range in spectrum to make wire additions or deletions as warranted. from weighting relative ranks by a set amount (possibly US 6,775,664 B2 25 26 equally, possibly heavy weighting one above the other) to dynamically adjusting the weight by measuring how stalls- the processor 48C rates informons returned by the spider system 46C in a demand search as indicated by the reference lically significant the user feedback is. For example, if only character 48C2. Collaborative rating data is used in the one person has ranked an article, it may not be significant. However, if many people have consistently ranked an article 5 the same, more credibility may be placed on the user's informon rating process in the wire search mode, and if applied in the demand search mode, to the extent that collaborative data is available for the informons in the weighting. search return. Search results are returned to the users 34C FIG. 9 shows a generalized embodiment of the invention and 36C from the ^^.j, relum processor 4«C as shown in in which system elements in a CASE system 30C are pjQ g integrally configured to provide wire and/or demand 10 ' '. . . , . searches. A quer? processor 32C receives queries from an Tte lnvenllon » Pref"abl>' embodied as s^own in ."<?• individual user 34C and other users 36C. A mode selector 10 A °-uery processor 60C receives queries from an inch- already exists, and wires exist only for those queries found to be commonly entered as previously described. In the continuously scans a network 70C for informons providing a threshold-level match for conlenl based profiles (i.e., 38C responds to the currently processed query to set a vidual user 62C and other users 64C and determines whether content-based filler structure 40C for wire search operation » w're already exists for each entered query. If a wire exists, or demand search operation. In the preferred application of 15 <be query is routed to a collaborativc/content-based filter the invention, the wire mode is selected only if a wire structure 66C lice that of FIGS. 1-7. A spider system 68C demand search mode, the filter structure 40C can function preprocessing profiles at the top level of the preferred similarly to a normal search engine. 20 multi-level filter structure, at least one of which reflects the Otherwise, various schemes can be used for determining whether a wire search or a demand search is made. For example, every query can call for a wire search, with a content profile of a current wire query). Informons which are passed by the filter 66C for existing wires are stored in a mernOry 72C according to ihe wire or wires lo which they demand search being made the first time a particular query belong is entered and with wire searches being made for subsequent 25 e entries of the same query. As another example, the user may <•„„ _ select a demand search, or, if continuing network searching •, ■■ • ■ <• . , . , sysIem of nG- 7 '° P™"* collaborative feedback data for is desired, the user may select a wire search. The filter structure 40C operates in its set wire search integration with the content-based data in the measurement of informon relevancy by the filter 66C. An informon rating mode or demand search mode, arid employs content-based 30 structure like that of FIG. 6 is employed for this purpose, profiles 42C in content-based filtering (preferably mulliAdaptive feedback data is applied from the users to the filter level as described in connection with FIGS. 1-7). Wire profilcs 42C I arc adaptivcly updated with informon 66C as shown in order to update content profiles as previously described. evaluation, feedback data from users respectively associated if no wire exists for a currently input query, the query is therewith. These profiles are used by the filter structure 40C 35 ^m Xo a regular search engine where a content profile is in wire searches in the wire mode. established for content based filtering of informons returned in demand searches in the demand mode. Collaborative 70c The Demand profiles 42C2 arc used by the filter structure 40C profile data can be integrated with the wire profiles through agent mind melding 43C as previously explained. A spider system 46C scans a network 44C to find infor by a spider svs|em 78C in a demancl ^^ of lhe network 40 ider s ,em 78C can have its own me ,em 78CM M considered in connection with the spider /,q - p.Q „ mons for a current demand search, and to find informons Once filtering is performed on returned informons, those with continued network scanning for existing wires. In informons which provide a satisfactory match to the query selecting available informons for return, the spider system are returned as a list to lhe user through a search reiurn 46C uses a content threshold derived from the content based 45 processor 80C. The processor 80C creates a new wire for the profile for which an informon search is being conducted. current query for which a demand search was made, if a In many instances, it s preferable that the spider system demand search memory 82C indicates that the current query 46C have a memory system 46CM which holds an informon has been made over lime with sufficient frequency to qualify data base wherein index information is stored from inforas a "common" query for which a wire is justified. As mons previously collected from the network. In this manner, 50 indicated by dashed connector line 80TD, collaborative demand searches can be quickly made from the spider feedback data can be, and preferably is, integrated into the memory 46CM as opposed to making a time consuming demand search processing by Ihe processor 80C. search and downloading in response to a search demand query from the search engine. Many alterations and modifications may be made by those having ordinary skill in the art without departing from the A search return processor 48C receives either demand 55 spirit and scope of the invention, lliereforc, it must be search informons or wire search informons passed by the understood thai the illustrated embodiments have been set conieni-based filler structure 40C according to the operating forth only for the purposes of example, and thai it should not mode of the latter, and includes an informon rating system be taken as limiting the invention as defined by the following which is like that of FIG. 6. The informon rating system claims. The following claims are, therefore, to be read lo combines content-based filtering data with collaborative 60 include not only Ihe combination of elements which are feedback rating data, from users through a feedback proccsliterally set forth but all equivalent elements for performing sor 50C at least in the wire search mode and, if desired, in substantially the same function in substantially the same lhe demand search mode. way to obtain substantially the same result. The claims are In the wire search mode, the processor 48C rales inforthus to be understood to include what Ls specifically illusmons on a continuing basis as they arc received from Ihe 65 trated and described above, what is conceptually equivalent, network 44C through the spider system 46C as indicated by and also what incorporates the essential idea of the inventhe reference character 48C1. In Ihe demand search mode, tion. US 6,775,664 B2 27 28 Ihe delivered information to determine relevance to the at least one of the first user and the query. 13. The search system of claim 10 wherein the feedback response further comprises information rating data. 14. The search system of claim 13 further comprising a ranking module to apply a weight to Ihe information rating TABLE 1 Article Author Rating Given data. 15. The search system of claim 13 wherein the informa tion rating data is dynamically adjusted by measuring sta 10 tistical significance of the information. 16. A web portal comprising the search system of claim 1 for providing information relevant to a query from the first user. 17. 'ITie search system of claim 1 wherein the plurality of users comprises at least one mindpool of users. 25 What is claimed is: 1. A search system comprising: a scanning syslem for searching for information relevant to a query associated with a first user in a plurality of 18. The search system of claim 17 wherein users in the mindpool of users are grouped into at least one distributed group. users; a feedback system for receiving information found to be 19. The search system of claim 17 wherein the mindpool 30 of users comprises at least one of a distributed group of users having a hierarchical structure, a distributed group of users having a parallel structure, and a distributed group of users having a combination of a hierarchical structure and a relevant to the query by other users; and a content-based filter syslem for combining Ihe informa parallel structure. tion from Ihe feedback system with the informalion 20. The search system of claim 1 wherein the contentfrom the scanning system and for filtering the com 35 based filler system filters the combined information relevant bined information for relevance to at least one of the to bolh the query and the first user. query and the first user. 21. The search system of claim 1 wherein the conlenibascd filter syslem filters by extracting features from the information. 22. The search system of claim 21 wherein the extracted 2. 'Ihe search syslem of claim 1 wherein the conleni-bascd filter system filters informalion on a continuing basis. 3. The search system of claim 1 wherein the information comprises an informon. 4. The search engine of claim information relevant to at least one query is used to anticipate a future 5. The search syslem of claim information is an advertisement. 1 wherein the filtered of the first user and the query by the first user. 1 wherein Ihe filtered features comprise content dala indicative of the relevance to the at least one of ihe query and ihe user. 23. The search system of claim 22 wherein the content 45 lion obtained from the information received from ihe feed 6. The search system of claim 1 further comprising an information delivery system for delivering the filtered infor back syslem. 24. The search system of claim 1 wherein the scanning mation to the first user. 7. The search syslem of claim 1 furl her comprising feedback communication means for delivering information data indicative of the relevance to the at least one of the query and the user comprises specific elements of informa 50 syslem further comprises scanning a network upon a demand search request. 25. The search system of claim 22 wherein the search lo at least one of the olher users. syslem applies adaptive user feedback dala to Ihe content- 8. The search syslem of claim 7 wherein ihe information delivered to the at least one of the other users further based filter syslem lo provide a learning component for the comprises the filtered information. content profile dala. delivered to the at least one of the olher users further user comprising: 26. A method for obtaining information relevant lo a first 9. The search system of claim 7 wherein the informalion searching for information relevant to a query associated with a first user in a plurality of users; comprises a feedback query. 10. The search system of claim 9 wherein the information received by Ihe feedback system found to be relevant to the query further comprises a feedback response lo the feedback query. 11. The search system of claim 7 further comprising a monitor for measuring time spent by the at least one of the other users accessing the delivered informalion. 12. The search system of claim 11 wherein the contentbased filter system uses the measured lime spenl accessing oo receiving information found to be relevant to the query by olher users; combining ihe informalion found lo be relevant to the query by other users with the searched information; and content-based filtering the combined information for rel evance to at least one of ihe query and the first user. 27. The method of claim 26 further comprising the step of filtering informalion on a continuing basis. US 6,775,664 B2 29 28. The method of claim 26 further comprising the step of 30 34. The method of claim 33 further comprising the step of using the measured lime spent to determine relevance of the delivering the filtered information to the first user. 29. The search system of claim 26 further comprising the at least one of the first user and the query. step of delivering information to at least one of the other users. s 30. The search system of claim 29 wherein the informa tion delivered to the at least one of the other users further comprises the step of delivering the filtered information to the at least one of the other users. 31. The method of claim 29 further comprising the step of 10 providing a feedback query to the at least one of the other users. 32. The method of claim 31 wherein the receiving of information found to be relevant to the query further com prises the step of receiving a feedback response to the is feedback query. 33. The method of claim 29 further comprising the step of measuring lime spent by the at least one of the other users accessing the delivered information. 35. The method of claim 32 further comprising the step of receiving a rating of the relevance of the delivered infor mation to the query. 36. The method of claim 26 further comprising the step of grouping at least two users in the plurality of users into a mindpool. 37. The method of claim 36 further comprising the step of grouping the mindpool users into at least one of a hierar chical structure, a parallel structure, and a combination of a hierarchical structure and a parallel structure. 38. The method of claim 26 wherein the searching step comprises scanning a network in response to a demand search for the information relevant to the query associated with the first user.

Disclaimer: Justia Dockets & Filings provides public litigation records from the federal appellate and district courts. These filings and docket sheets should not be considered findings of fact or liability, nor do they necessarily reflect the view of Justia.


Why Is My Information Online?