I/P Engine, Inc. v. AOL, Inc. et al
Filing
1
COMPLAINT against AOL, Inc., Gannett Company, Inc., Google, Inc., IAC Search & Media, Inc., Target Corporation ( Filing fee $ 350 receipt number 14683024186.), filed by I/P Engine, Inc. (Attachments: # 1 Exhibit A, # 2 Exhibit B, # 3 Receipt, # 4 Civil Cover Sheet, # 5 Letter)(ecav, )
EXHIBIT B
US006775664B2
(12) United States Patent
(io) Patent No.:
(45) Date of Patent:
Lang et al.
(54)
(75)
INFORMATION FILTER SYSTEM AND
METHOD FOR INTEGRATED CONTENTBASED AND COLLABORATIVE/ADAPTIVE
FEEDBACK QUERIES
(*)
Notice:
Oct. 22, 2001
Prior Publication Data
5,649,186 A
5,734,893 A
Int. Cl.7
C06F 17/30
707/3; 707/5; 707/1; 707/10;
707/2
707/1, 3, 5, 10,
707/2
(56)
References Cited
U.S. PATENT DOCUMENTS
5,019,961 A
5/1991
5,117,349 A
5,249,262 A
5/1992 Tirfingelal
9/1993 Baule
5.471.610A
11/1995
7/1997
3/1998
Ferguson
Li et al
♦
348/7
707/10
707/4
11/1998
Miller et al
•
2/1999
I-ang el al
707/2
707/1
*
3/1999
Robinson
705/27
(List continued on next page.)
OTHER PUBLICATIONS
Primary Examiner—John E. Breene
1998, now Pal. No. 6,314,420, which is a conlinualion-inpait of application No. 08/627,436, filed on Apr. 4, 1996,
now Pat. No. 5,867,799, application No. 10/045,198, and a
continualion-in-part of application No. 09/195,708, filed on
Nov. 19,1998, now Pat. No. 6,308,175, which is a continu
alion-in-part of application No. 08/627,436, filed on Apr. 4,
1996, now Pat. No. 5,867,794.
Field of Search
Farry el al
Assistant Examiner—Kucn S. Lu
Continuation of application No. 09/204,149, filed on Dec. 3,
(58)
395/149
3/1997
(List continued on next page.)
Related U.S. Application Data
U.S. Cl
395/149
Yaksich et al
5,608,447 A
US 2002/0120609 Al Aug. 29, 2002
(52)
Yaksich et al
10/1996
5.842.199A
707/3
364/419.19
Persin, "Document Filtering for Fast Ranking", Proceedings
of the Seventeenth Annual International ACM-SIGIR Con
ference on Research and Development in Information
Retrieval, Jul. 6, 1994 pp. 339-348.
Appl. No.: 10/045,198
(51)
10/1996
5.563.999 A
Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 467 days.
(63)
Henderson el al
5.563.998 A
Lycos, Inc., Waltham, MA (US)
(65)
Arnram el al
8/1996
5,884,282 A
Assignee:
Filed:
7/1996
5.544,049 A
5,867,799 A
(73)
(22)
Aug. 10,2004
5,537,586 A
Inventors: Andrew K. Lang, Pittsburgh, PA (US);
Donald M. Kosnk, Pittsburgh, PA (US)
(21)
US 6,775,664 B2
Addcsso cl al
Kawaguehi el al
364/192
707/5
395/66
707/4
(74) Attorney, Agent, or Firm—Testa, Hurwiiz & Thibeaull,
LLP
(57)
ABSTRACT
A search engine system is provided for a portal site on the
internet. The search engine system employs a regular search
engine to make one-shot or demand searches for information
entities which provide at least threshold matches to user
queries. The search engine system also employs a
collaborativc/content-based filter to make continuing
searches for information entities which match existing wire
queries and are ranked and stored over time in useraccessible, system wires corresponding to the respective
queries. A user feedback system provides collaborative
feedback data for integration with content profile data in the
operation of the collaborative/content-based filter. A query
processor determines whether a demand search or a wire
search is made for an input query.
38 Claims, 10 Drawing Sheets
US 6,775,664 B2
Page 2
U.S. PATENT DOCUMENTS
Shelh, "learning Approach to Personalized Information
5,983,214 A
11/1999
707/1
Filtering", Massachusetts Institute of Technology, (Feb.
6,006,222 A
12/1999 Culliss
707/5
6,014,665 A
1/2000 Culliss
707/5
1994) pp. 1-74.
6,029,141 A
6,029,161 A
•
Lang et al
2/2000
2/2000
Bezos cl al
Lang el al
6,078,916 A
6/2000
Culliss
6,182,068 Bl
1/2001
Culliss
705/27
707/1
707/5
707/5
6,308,175 151 * 10/2001
Lang el al
707/10
6,314,420 Bl • 11/2001
I.ang cl al
707.G
OTHER PUBLICATIONS
Dumais el al., "Using Latent Semantic Analysis to Improve
Access lo Textual Information", In Proceedings of CHI-88
Conference on Human Factors in Computing Systems,
(1998, New York: ACM) 6 pp.
Evans et al., "A Summary of the Cl ARIT Project", Tech
nical Report, Laboratory for Computational Linguistics,
Knowles, "Software Bots filter Web dala; Agent Technology
Delivers Customized Information for BBN's PIN", PC
Week v.12, nl7 May 1995 pp. 112-113.
Carnegie-Mellon University, Sep. 1991, pp. 1-12.
Rcsnick ct al., "Open Architecture for Collaboration Filter
ing of Netnews", Proceedings of the Conference on Com
puter Supported Cooperative Work, Oct. 1994 pp. 175-186.
Goldberg et al., "Using Collaborative Filtering to Weave an
Information Tapestry", Communications of the ACM vol.
35, No. 12, Dec. 1992 pp. 61-70.
Structured Information Spaces", Human Factors in Comput
Fischer et al., "Information Access in Complex, Poorly
ing Systems, CHI '91 Conference Proceedings, ACM, 1991
8 pp.
* cited by examiner
FIRST
MEANS
MEANS
USER
30'
L
29
33'
INFORMATION FILTER APPARATUS
X
32
PREDICTION
ADAPTATION ADAPTATION
MEANS
SECOND
T
DISTRIBUTED
NETWORK
RESOURCE fl
28e
27b
#1
USER
[/*>
MEANS
COMMUNICATION
RESOURCE #2
NETWORK
DISTRIBUTED
31
COMPUTER
STORAGE
MEANS
13
COMPUTER SYSTEM
-25
35
27a
s 17
y
USER USER USER USER USER -\
FILTER FILTER FILTER FILTER FILTER 28a
CREDIBILITY FILTER
COMMUNITY
FILTER
COMMUNITY
FILTER
ADAPTIVE FILTER MEANS
ADAPTIVE CONTENT FILTER
EXTRACTION
MEANS
T7
13
USER1-
FIG. I
16
to
(/I
C/3
d
crs
5?
NOTE: AF=AOAPTIVE FILTER
NOTE: MC=MEMB£R CLIENT
.PROFILE.
ADAPTIVE
ADAPTIVE
CONTENT
to
W
en
s*
o
c
s
n
I
d
U.S. Patent
Aug. 10,2004
100-
Sheet 3 of 10
PROVIDING
DYNAMIC
US 6,775,664 B2
-105
INFORMON
CHARACTERIZATION
COMMUNITY
FILTER
135-
110
HO—| CLIENT FILTER
ADAPTIVELY FILTER
RAW INFORMOKS
PRESENTING
PROPOSED
INFORMON
TO USER
-115
-120
ADAPTING
■125
CONTENT AND/OR
COLLABORATIVE
PROFILE
UPDATING DYNAMIC
INFORMON CHARACTERIZATION
-130
IPREDICTING MC RESPONSESH50
-170
GROUP INTO
PREFERENCE
COHORT
175
TARGETED
INFORMON
-165
_r 155
CREATE
CREDIBILITY
CONSUMER
PROFILE
FILTER
RECOMMENDATION
FILTER
FIG. 3
_r
UPDATE
160
CREDIBILITY
PROFILE
CONSULTATION
FILTER
U.S. Patent
Aug. 10,2004
Sheet 4 of 10
US 6,775,664 B2
■205
PREFILTER
PARTITION EACH
USER INTO
MULTIPLE MC's
200-
GROUP MC's
TO FORM
MULTIPLE
COMMUNITIES
-210
PREDICTING
COMMUNITY
PROFILE
-215
SELECT PROPOSED INFORMON
255f-{ CONTENT FILTER
260-
COLLABORATION FILTER
265
MC FILTER
PROVIDE
PROPOSED
INFORMONS
-235
TO USER
FIG. 4
UPDATE '
PREDICTION
-245
CRITERIA
-240
U.S. Patent
Aug. 10,2004
Sheet 5 of 10
PARSE DATA
INTO TOKENS
US 6,775,664 B2
■301
300
CREATING
MODE-INVARIANT
-305
COMPONENTS
CONCEPT-BASED
INDEXING
DETAILED
COMMUNITY
-310
-315
PROFILING
ELIMINATE
POOR FITS
DETAILED
MC
PROFILING
345-
340-
UPDATE
PROFILES
OBTAIN
USER
FEEDBACK
ELIMINATE
POOR FITS
PRESENTING
PROPOSED
INFORMONS
TO USER
FIG. 5
-320
-325
-330
-335
1 I
W
L
425b
425d
LEARNING
FUNCTION
L
425c
LEARNING
FUNCTION
L
LEARNING
FUNCTION
COLLABORATIVE
INPUT
ERROR-CORRECTION
UNIT
420-
425a
FUNCTION
INFORMATION
415-
r
LEARNING
UNSTRUCTURED
FEATURE
410-
STRUCTURED
FEATURE
INFORMATION
405
CORREUTED-FEATURE,
400
428d-
428c-
428b-
428a-
426c
UP
IRP
r 426d
UP
IRP
_r
UP
IRP
-426b
UP
IRP
-426o
4270
FIG. 6
WEIGHTED IRP
FUNCTION
~427d
FUNCTION
WEIGHTED IRP
FUNCTION
WEIGHTED IRP
^-
429d
PREDICTOR
UP=UNCERTAINTY
NOTE: IRP=INDEPENDENT
RATING PREDICTOR
432
N>
o
ON
era
IS
d
T
SUB-SUB
MIND POOL
502o
SUB-SUB
MIND POOL
SUB-SUB
MIND POOL
FIG. 7
SUB-SUB
MIND POOL
\.
MIND POOL
SUB
SUB-SUB
MIND POOL
502c
SUB-SUB
MIND POOL
504^
SUB-SUB
MIND POOL
a
-a
en
o
c
I
2
OS
>
S3
CZ3
U.S. Patent
Aug. 10,2004
USER
PROVIDES
Sheet 8 of 10
US 6,775,664 B2
20C
QUERY
TERMS
LOOKUP
22C
QUERY
IN
TABLE
DISPLAY RESULTS
FROM
PREVIOUSLY
CREATED
WIRE
PERFORM
REGULAR
SEARCH ON
QUERY
ENGINE
FIG. 8
28C
26C
OTHER
USER
USER
INDIVIDUAL
30C
►
42C1
43C —
32C
48C
42C
AQAPTIVF FFFORACK DATA i
ADAPTIVE
FFFDBACK DATA
PROCESSOR
QUERY
MODE
DEMAND
MODE
PROCESSING/
RATING
RATING
WIRE MODE
INFORMON
SEARCH RETURN
PROCESSOR
-48C1
PROFILES
DEMAND
PROFILE
50C
42C2
■40C
CONTENT-BASED
WIRE
PROFILES
MOLD
MIND
AGENT
STRUCTURE
FILTER
BASED
CONTENT-
SELECTOR
38C
FEEDBACK DATA
A.
FEEDBACK PROCESSORFEEDBACK DATA RELEVANT
TO CURRENT INFORMON
FEEDBACK DATA
SPIDER
MEMORY
SYSTEM
-46CM
FIG. 9
SITE
M
WEB
A
SITE
«-► WEB
On
^4
d
e
NO
i
CA
era
CO
p
62C-*
64C
STATIONS
USER
OTHER
USER
STATION
INDIVIDUAL
►
60C—j
74C
r
QUERY
PROCESSOR
BASED
FILTER
STRUCTURE
CONTENT-
(DEMAND)
SEARCH
ENGINE
80C
r
DEMAND SEARCH
RETURN PROCESSOR
76C —
r
80FD
COLLABORATIVE/
COLLABORATIVE
CONTENT-BASED
FEEDBACK
FILTER
FEEDBACK PROCESSOR- DATA
STRUCTURE
FEEDBACK DATAFOR
CONSIDERED INFORMONS
ADAPT VE FEEDBACK DATA FOR USER CONTENT PROFILE
ADAPTIVE FEEDBACK DATA FOR OTHER USER CONTENT PROFILES
WIRE ROTATION TO USERS RELEVANT INFORMONS LIST
Ml
SITE
MEMORY
SEARCH
-78C
WEB
-68C
A
<♦-*■ WEB
SITE
DEMAND
-82C
ON DEMAND
SPIDERINFORMON
ACQUISITION
MEMORY
-78CM
MEMORY
INFORMON
ACQUISITION
CONT1NOUS
SPIDER-
72C
FIG. 10
4—►
■66C
Memory
WIRE
OS
e
©
I
§
CO
US 6,775,664 B2
1
and informons obtained from the network are compared to
the profile for relevancy and ranking. A continuously oper
ating "spider" scans the network to find informons which are
received and processed for relevancy to the individual user's
wire or to wires established by numerous other users.
INFORMATION FILTER SYSTEM AND
METHOD FOR INTEGRATED CONTENTBASED AND COLLABORATIVE/ADAPTIVE
FEEDBACK QUERIES
This is a continuation of U.S. Ser. No. 09/204,149 filed
Dec. 3, 1998 now U.S. Pat. No. 6,314,420, which is a
continuation in part of U.S. Ser. No. 08/627,436, filed Apr.
4, 1996 now U.S. Pal. No. 5,867,799, the disclosures of
which arc hereby incorporated by reference herein.
This application is also a continuation in part of U.S. Ser.
No. 09/195,708 filed Nov. 19, 1998 now U.S. Pat. No.
6^08,175, which is a continuation in part of U.S. Ser. No.
08/627,436, filed Apr. 4,1996 now U.S. Pat. No. 5,867,794,
the disclosures of which are hereby incorporated by refer
The integrated filter system compares received informons
to the individual user's query profile data, combined with
collaborative data, and ranks, in order of value, informons
10
select any listed informon for consideration.
As the system continues to feed the individual user's
"wire", the stored relevant informon list typically changes
15
ence herein.
BACKGROUND OF THE INVENTION
The present invention relates to information processing
systems for large or massive intbrmation networks, such as
the internet, and more particularly to such information
systems especially adapted for operation in portal and other
web sites wherein a search engine operates with collabora
tive and content-based filtering to provide better search
responses to user queries.
20
user's query. In essence, the pre-search process is a short
term search for quickly finding and quickly iclcniifying
information entities which are content matched to the user's
informons, adjustments in the user's query, feedback evalu
ations by the user for considered informons, and updatings
in collaborative feedback data. Received informons arc
similarly processed for other users' wires established in the
information filter system. Thus, the integrated information
filter system performs continued long-term searching, i.e., it
compares network informons to multiple users' queries to
course of time, whereas conventional search engines initiate
a search in response lo an individual user's query and use
2S content-based filtering to compare the query lo accessed
network informons typically to find matching informons
during a limited, short-term search time period.
thousands of sites for consideration by a user at the user's
site having a search capability, and thereafter enters a
particular query, i.e., a request for informons relevant to a
topic, a field of interest, etc. Thereafter, the search site
typically employs a "spider" scanning system and a contentbased filter in a search engine to search the internet and find
informons which match the query. This process is basically
a pre-search process in which matching informons are
found, at the time of initiating a search for the user's query,
by comparing informons in an "informon data base" to the
due to factors including a return of new and more relevant
find matching informons for various users' wires over the
In the operation of the internet, a countless number of
informons arc available for downloading from any of at least
location. A user typically connects to a portal or other web
found to be relevant. The system maintains the ranked
informons in a stored list from which the individual user can
llic present invention is directed to an information pro
30
35
cessing system especially adapted for use at internet portal
or other web sites to make network searches for information
entities relevant to user queries, with collaborative feedback
data and content-based data and adaptive filter structuring,
being used in filtering operations to produce significantly
improved search results.
SUMMARY OF THE INVENTION
A search engine system employs a content-based filtering
system for receiving informons from a network on a con
40
query.
tinuing basis and for filtering the informons for relevancy to
a wire or demand query from an individual user. A feedback
system provides feedback data from other users.
Another system controls the operation of the filtering
The return list of matching informons can be very exten
system lo filter for one of a wire response and a demand
sive according to the subject of the query and the breadth of 45
response and to return the one response to the user. The
the query. More specific queries typically result in shorter
filtering system combines pertaining feedback data from the
return lists. In some cases, the search site may also be
feedback system with content profile data in determining the
structured to find web sites which probably have stored
relevancy of the informons for inclusion in at least a wire
informons matching the entered query.
Collaborative data can be made available lo assist in
informon rating when a user actually downloads an
response lo the query.
50
BRIEF DESCRIPTION OF THE DRAWINGS
informon, considers and evaluates it, and returns data to the
FIG. 1 is an diagrammatic representation f an embodi
ment of an information filtering apparalus *ici;urding to the
present invention.
search site as a representation of the value of the considered
informon lo the user.
In the
patent
application
which
is parent
to this
FIG. 2 is an diagrammatic representation of another
embodiment of an information filtering apparatus according
continualion-in-parl application, i.e. Ser. No. 08/627,436,
filed by the present inventors on Apr. 4, 1996, and hereby
lo Ihc present invention.
incorporated by reference, an advanced collaborative/
content-based information filter system is employed to pro
vide superior filtering in the process of finding and rating
informoas which match a user's query. The information
filter structure in this system integrates content-based filter
ing and collaborative filtering to determine relevancy of
informons received from various sites in the internet or other
network. In operation, a user enters a query and a corre
sponding "wire" is established, i.e., the query is profiled in
storage on a content basis and adaptively updated over time,
FIG. 3 is a How diagram for an embodiment of an
information filtering method according to the present invenlion.
FIG. 4 is a flow diagram for another embodiment of an
information filtering meihod according tu the present inven
tion.
65
FIG. 5 is a flow diagram for yet another embodiment of
an information filtering meihod according lo the present
invention.
US 6,775,664 B2
FIG. 6 is an illustration of a ihree-component-input model
and profile with associated predictors.
FIG. 7 is an illustration of a mindpool hierarchy.
FIG. 8 is a logic diagram illustrating a search selection
feature of the invention;
FIG. 9 is a functional block diagram of an embodiment of
the invention in which an integrated information processing
system employs a search engine and operates with combined
collaborative filtering and content-based filtering, which is
preferably adaptive, to develop responses to user queries.
FIG. 10 shows another and presently preferred embodi
ment of the invention in which an information processing
system includes an integrated filter structure providing
10
represented by ihe member client, can increase the impor
tance of that particular factor, A's authorship, for others of
the user's member clients. Each of the clients of one user can
be associated with the individual clients of other users
insofar as the profiles of the respeclive clients have similar
attributes. A "community" is a group of clients, called
member clients, that have similar member client profiles,
i.e., that share a subset of attributes or interests. In general,
the subset of shared attributes forms the community profile
for a given community and is representative of the commu
nity norms, or common client attributes.
The "relevance" of a particular informon broadly
describes how well it satisfies the user's information need.
The more relevant an informon is to a user, the higher the
collaborative/adaptive-contcnt-bascd filtering to develop
"signal" content. ITlc less relevant the informon, the higher
longer term, continuing responses to user queries, and a
the "noise" content. Clearly, the notion of what is relevant to
search engine structure which provides short term, demand
a particular user can vary over time and with context, and the
responses to user queries, with the system directing user
user can find the relevance of a particular informon limited
queries to the appropriate structure for responses.
to only a few of the user's potentially vast interest areas.
20 Because a user's interests typically change slowly, relative
DETAILED DESCRIPTION OF THE
to the data stream, it is preferred to use adaptive procedures
EMBODIMENTS
to track the user's current interests and follow them over
The invention herein is preferably configured with an
time. Provision, too, is preferred to be made for sudden
apparatus and method for information filtering in a computer
changes in interest, e.g., taking up antiquarian sword col
system receiving a data stream from a computer network, in 35 lecting and discontinuing stamp collecting, so that the
which entities of information relevant to the user, or
method and apparatus track the evolution of "relevance" to
"infonnons," are extracted from the data stream using
a user and the communities of which the user is a member.
content-based and collaborative filtering. The information
In general, information filtering is the process of selecting
filtering is long term in the sense that it operates on a
the information that a users wishes to see, i.e., informons,
continuing basis, and is both interactive and distributed in x from a large amount of data. Content-based filtering is a
structure and method. It is interactive in that communication
process of filtering by extracting features from the informon,
is substantially bi-directional at each level of the filter. It is
e.g., the text of a document, to determine the informon's
distributed in that all or part of the information filler can
relevance. Collaborative filtering, on the other hand, is the
include a purely hierarchical (up-and-down/parent-child)
process of filtering informons, e.g., documents, by deter
structure or method, a purely parallel (peer-to-peer) structure 35 mining what informons other users with similar interests or
or method, or a combination of hierarchical and parallel
needs found to be relevant.
structures and method.
The system apparatus includes a filter structure having
As used herein, the term "informon" comprehends an
adaptive content-based filters and adaptive collaborative
filters, which respectively include, and respond to, an adap
ticular user. In general, informons can be heterogeneous in 40 tive content profile and an adaptive collaboration profile. As
nature and can be all or part of a textual, a visual, or an audio
used herein, the term "content-based filler" means a filter in
entity. Also, informons can be composed of a combination of
which content data, such as key words, is used in performing
the aforementioned entities, thereby being a multimedia
the filtering process. In a collaborative filter, other user data
entity. Furthermore, an informon can be an entity of pat
is used in performing Ihe filtering process. A collaborative
terned data, such as a data file containing a digital repre- 45 filler is also sometimes referred to as a "content" filler since
sentation of signals and can be a combination of any of the
it ultimately performs the task of finding an object or
previously-mentioned entities. Although some of the data in
document having content relevant to the content desired by
a data stream, including informons, may be included in an
a user. If there are some instances herein where the term
informon, not all data is relevant to a user, and is not within
"content filter" is used as distinguished from a collaborative
the definition of an informon. By analogy, an informon may $0 filter, it is intended that the term "content filter" mean
information entity of potential or actual interest to a par
be considered to be a "signal," and the total data stream may
"content-based filter." The adaptive filters each are preferred
be considered to be "signal+noise." Therefore, an informa
to include at least a portion of a community filler for each
community serviced by the apparatus, and a portion of a
tion filtering apparatus is analogous to other types of signal
filters in that it is designed to separate the "signal" from the
"noise."
member client filler for each member client of the serviced
55 communities. For this reason, the adaptive filtering is dis
communication with the network. Because an individual
tributed in Ihal each of the community filters perform
adaptive collaborative filtering and adaptive content
user can be interested in multiple categories of information,
filtering, even if on different levels, and even if many fillers
Also as used herein, the term "user" is an individual in
exist on a given level. The integrated tillering permits an
a unique profile, or set of attributes. Each member client 60 individual user lo be a unique member client of multiple
profile, then, is representative of a particular group of user
communities, with each community including multiple
preferences. Collectively, the member client profiles asso
member clients sharing similar interests. The adaptive fea
ciated with each user is the user profile. The present inven
tures permit the interests of member clients and entire
tion can apply the learned knowledge of one of a user's
communities to change gradually over time. Also a member
member clients to others of the user's member clients, so 65 client has the ability to indicate a sudden change in
Ihal the importance of the learned knowledge, e.g., ihe user's
preference, e.g., the member client remains a collector but is
preference for a particular author in one interest area as
no longer interested in coin collecting.
the user can be considered to be multiple clients each having
US 6,775,664 B2
ers can be augmented to include their respective credibility
The filter structure also implements adaptive credibility
profiles, community membership, and professional or avofiltering, providing member clients with a measure of inforcational affiliations. A rating of each recommendation pro
mon credibility, as judged by other member clients in the
vided by a responder, relative to other responders'
community. For example, a new member client in a first
community, having no credibility, can inject an Mormon 5 recommendations, also can be supplied. The requester can
into the data flow, thereby providing other member clients in
provide feedback to each of the responders, including a
other communities with the proposed informon, based on the
rating of the credibility of the responder on the particular
respective community profile and member client profiles. If
topic, or the quality of the recommendation. As before, the
the other member clients believe the content of the informon
responders can accrue quality points, value tokens, or "info
to be credible, the adaptive credibility profile will reflect a 10 bucks," as apportioned by the requester, in return for the
growing credibility. Conversely, feedback profiles from
informon recipients that indicate a lack of credibility cause
the adaptive credibility profile, for the informon author, to
reflect untrustworthiness. However, the growth and declina
useful recommendation.
Furthermore, certain embodiments are preferred to be
self-optimizing in that some or all of the adaptive filters used
tion of credibility are not "purely democratic," in the sense
that one's credibility is susceptible to the bias of others'
perceptions, so the growth or declination of one's credibility
is generally proportional to how the credibility of the new
15
collaboration, credibility, reliability, etc.
member client is viewed by other member clients.
Member clients can put their respective reputations "on
the line," and engage in spirited discussions which can be
refereed by other interested member clients. The credibility
20
ibility sub-profiles for the credibility of the content of the
and the like, and can be fed back to discussion participants,
reviewers, and observers to monitor the responses of others
to the debate. The adaptive credibility profiles for those
member clients with top credibility ratings in their commu
nities may be used to establish those member clients as
The filter structure herein is capable of identifying the
preferences of individual member clients and communities,
providing direct and inferential consumer preference
information, and tracking shifts in the preferences whether
the shifts be gradual or sudden. The consumer preference
information can be used to target particular consumer pref
erence groups, or cohorts, and provide members of the
cohort with targeted informons relevant to their consumer
profile further can be partitioned to permit separate cred
informon, the author, the author's community, the reviewers,
in the system dynamically seek optimal values for the
function intended by the filter, e.g., content analysis,
25 preferences. This information also may be used to follow
demographical shifts so that activities relying on accurate
demographical data, such as retail marketing, can use the
consumer preference information to anticipate evolving con
30
sumer needs in a timely manner.
To provide a basis for adaptation, it is preferred that each
raw informon be processed into a standardized vector, which
With this functionality, additional features can be
may be on the order of 20,000 to 100,000 tokens long. The
implemented, including, for example, "instant polling" on a
learning and optimization methods that ultimately are cho
matter of political or consumer interest. In conjunction with
both content and collaborative filtering, credibility filtering, 35 sen are preferred to be substantially robust to the problems
which can be presented by such high-dimensional input
and the resulting adaptive credibility profiles, also may be
spaces. Dimensionality reduction using methods such as the
used to produce other features, such as on-line consultation
singular value decomposition (SVD), or auto-encoding neu
and recommendation services. Although the "experts" in the
ral networks attempt to reduce the size of the space while
communities most closely related to the topic can be
afforded special status as such, member clients from other 40 initially retaining the information contained in the original
representation. However, the SVD can lose information
communities also can participate in the consultation or
during the transformation and may give inferior results. Two
recommendation process.
adaptation/learning methods that are presently preferred
In one embodiment of the consultation service, credibility
include the TF-IDF technique and the MDL technique.
"experts" in their respective communities.
filtering can be augmented to include consultation filtering.
With this feature, a member client can transmit an informon
to the network with a request for guidance on an issue, for
example, caring for a sick tropical fish. Other member
clients can respond to the requester with informons related
to the topic, e.g., suggestions for water temperature and
45
antibiotics. The informons of the respondent can include
50
their respective credibility profiles, community membership,
and professional or vocational affiliations. The requester can
provide feedback to each of the responders, including a
rating of the credibility of the respondcr on the particular
topic. Additionally, the responders can accrue quality points,
55
FIG. 1 illustrates one embodiment of an information
filtering apparatus 1 structured for search engine implemen
tation in accordance with the invention as described subse
quently herein in connection with FIGS. 8 and 9. In general,
a data stream is conveyed through network 3, which can be
a global internetwork. A skilled artisan would recognize that
apparatus 1 can be used with other types of networks,
including, for example, an enterprise-wide network, or
"intranet." Using network 3, User #1 (5) can communicate
with other users, for example. User #2 (7) and User #3 (9),
and also with distributed network resources such as resource
value tokens, or "info bucks," as apportioned by the
#1 (11) and resource #2 (13).
requester, in return for useful guidance.
Apparatus 1 is preferred to be part of computer system 16,
although User #1 (5) is not required to be the sole user of
Similarly, one embodiment of an on-line recommendation
class medieval armor rcfurbishcrs. In this embodiment, the
computer system 16. In one present embodiment, it is
preferred that computer system 16 having information filter
apparatus 1 therein filters information for a plurality of
users. One application for apparatus 1, for example, could be
requester can transmit the informon to the network bearing
the request for recommendation. Other member clients can
commercial information filtering service, which can be
service uses recommendation filtering and adaptive recom
mendation profiles to give member clients recommendations
on matters as diverse as local auto mechanics and world-
respond to the requester with informons having specific
recommendations or disrecommendations, advice, etc. As
with the consultation service, the informons of the respond
60
that user 5 and similar users may be subscribers to a
65
provided by the owner of computer system 16.
Extraction means 17 can be coupled with, and receives
data stream 15 from, network 3. Extraction means 17 can
US 6,775,664 B2
7
identify and extract raw infonnons 19 from data stream 15.
Each of the raw informons 19 has an information content.
Extraction means 17 uses an adaptive content filter, and at
least part of the adaptive content profile, to analyze the data
stream for the presence of raw informons. Raw infonnons 5
are those data entities whose content identifies them as being
"in the ballpark," or of potential interest to a community
coupled to apparatus 1. Extraction means 17 can remove
8
A predicted response anticipated by adaptive filtering
means 21 can be compared to the actual feedback response
29 of user 5 by first adaptation means 30, which derives a
prediction error. First adaptation means 30 also can include
prediction means 33, which collects a number of temporally-
duplicate informons, even if the informons arrive from
spaced feedback responses, to update the adaptive collaboration profile, the adaptive content profile, or both, with an
adapted future prediction 34, in order to minimize subsem prec[icuon errors by the respective adaptive collabo-
Extraction means 17 also can use at least part of a commu-
>° °™ embodiment of the invention herein, it is preferred
different sources, so that user resources are not wasted by
handling and viewing repetitive and cumulative information.
ra,ion filt(,r and ad
ive „,„„,„, fiIu.r
.,.«•._.
■■.-••
r
■
nity profile and a user profile for User #1 (5) to determine
whether the informon content is relevant to the community
»hat prediction means 33 be a self-optimizing prediction
means using a preselected learning technique. Such tech-
informon is a selected raw informon that, based upon the
respective member client and community profiles, is pre-
also can include a neural network therein and employ a
neural network learning technique for adaptation and pre
of which User #1 is a part.
niques can include, for example, one or more of a top-keyFilter means 21 adaptively filters raw informons 19 and 15 word-selection learning technique, a nearest-neighbor learnproduces proposed informons 23 which are conveyed to
ing technique, a term-weighting learning technique, and a
User #1 (5) by communication means 25. A proposed
probabilistic learning technique. First adaptation means 30
dicted to be of particular interest to a member client of User 20 diction. In one present embodiment of the invention, the
5. Filter means 21 can include a plurality of community
filters 270,6 and a plurality of member client fillers 28a-e,
term-weighting learning technique is preferred to be a
TF-IDF technique and the probabilistic learning technique is
profiles. When raw Mormons 19 are filtered by filler means
adapIaliori means 30 further can include second
each respectively having community and member client
artlcXt^^
User #1 (5), responsive to the respective community and
member client profiles, are conveyed thereto. Where such is
desired, filter means 21 also can include a credibility filter
preferred to be a MDL learning technique,
collaboration profiles the adaptive content profiles, the
community profile, and the user profile, responsive to at
lcasl ooe of lhc otner Proflles-In thls manner>lrends altnb-
which enables means 21 to perform credibility filtering of ,n utable Io individual member clients, individual users, and
raw informons 19 according to a credibility profile.
* '"dividual communities in one domain of system 16 can be
It is preferred that the adaptive filtering performed within
"cognized by and influence, similar entities in other
filter means 21 by the plurality of filters 27a,b, l%a-<, and
35, use a self-optimizing adaptive filtering so that each of the
domil™ , 2%a-e, and 35, is 35 common attributes.
driven continually to respective values corresponding to a
minimal error for each individual parameter. Self-
Apparatus 1 also can include a computer storage means
31 for storing the profiles, including the adaptive content
error, etc., are favored to prevail.
remote analysis, for example, by User #2 (7).
optimization encourages a dynamic, marketplace-like operaProfiIe and Ihe adaptive collaboration profile. Additional
tion of the system, in that those entities having the most
trend-tracking infonnation can be stored for later retrieval in
desirable value, e.g., highest credibility, lowest predicted 40 storage means 31, or may be conveyed to network 3 for
Self-optimization can be effected according to respective
FIG. 2 illustrates another preferred embodiment of inforpresclcctcd self-optimizing adaptation techniques including,
malion filtering apparatus 50, in computer system 51. Appafor example, one or more of a top-key-word-selection adapranis 50 can include first processor 52, second processors
tation technique, a nearest-neighbor adaptation technique, a 45 53"A third processors Ma-d, and a fourth processor 55, to
term-weighting adaptation technique, a probabilistic adapeffect the desired information filtering. First processor 52
tation technique, and a neural network learning technique. In
can be coupled to, and receive a data stream 56 from,
one present embodiment of the invention, the termnetwork 57. First processor 52 can serve as a pre-processor
weighting adaptation technique is preferred to be a TF-IDF
by extracting raw informons 58 from data stream 56 rcspontechnique and the probabilistic adaptation technique is pre- 50 sive to preprocessing profile 49 and conveying informons 58
ferred to be a MDL technique.
to second processors 53a,b.
When user 5 receives proposed informon 23 from appaBecause of Ihe inconsistencies presenled by the nearlyratus 1, user 5 is provided with multiple feedback queries
infinite individual differences in the modes of
along with the proposed informon. By answering, user 5
conceptualization, expression, and vocabulary among users,
creates a feedback profile that corresponds to feedback 55 even within a community of coinciding interests, similar
response 29. User feedback response 29 can be active
notions can be described with vastly different terms and
feedback, passive feedback, or a combination. Active feedconnotations, greatly complicating informon characterizaback can include the user's numerical rating for an
tion. Mode variations can be even greater between disparate
informon, hints, and indices. Hints can include like or dislike
communities, discouraging interaction and knowledge-
of an author, and informon source and timeliness. Indices 60 sharing among communities. Therefore, it is particularly
can include credibility, agreement with content or author,
preferred that processor 52 create a mode-invariant reprehumor, or value. Feedback response 29 provides an aclual
scntalion for each raw informon, thus allowing fast, accurate
response to proposed informon 23, which is a measure of the
informon characterization and collaborative filtering. Moderelevance of the proposed informon to the information need
invariant representations tend to facilitate relevant informon
of user 5. Such relevance feedback attempts to improve the 65 selection and distribution within and among communities,
performance for a particular profile by modifying Ihe
thereby promoting knowledge-sharing, thereby benefiting
profiles, based on feedback response 29.
the group of interlinked communities, i.e., a society, as well.
US 6,775,664 B2
10
First processor 52 also can be used to prevent duplicate
informons, e.g., the same information from different
sources, from further penetrating, and thus consuming the
resources of, the filtering process. Other processors 53,a,6,
54a-d, also may be used to perform the duplicate informa
tion elimination function, but additionally may measure the
differences between the existing informon and new infor
mons. That difference between the content of the informon
the previous time the user reviewed it and the content of the
informon in its present form is the "delta" of interest.
Processors Sia,b, 54a-d may eliminate the informon from
technique, a nearest-neighbor adaptation technique, a termweighting adaptation technique, and a probabilistic adapta
tion technique. Any of the adaptive filters 66a-d may
include a neural network. In one present embodiment of the
invention, the term-weighting adaptation technique is pre
ferred to be a TF-IDF technique and the probabilistic adap
10
tation technique is preferred to be a MDL technique.
An artisan would recognize that one or more of the
processors 52-55 could be combined functionally so that the
actual number of processors used in the apparatus 50 could
be less than, or greater than, that illustrated in FIG. 2. For
example, in one embodiment of the present invention, first
further processing, or direct the new, altered informon to the
processor 52 can be in a single microcomputer workstation,
member client, in the event that nature or extent of the
with processors 53-55 being implemented in additional
change exceeds a "delta" threshold. In general, from the
notion of exceeding a preselected delta threshold, one may
respective microcomputer systems. Suitable microcomputer
15
infer that the informon has changed to the extent that the
systems can include those based upon the Intel® Pentiumchange is interesting to the user. The nature of this change
Pro™ microprocessor. In fact, the flexibility of design
can be shared among all of a user's member clients. This
presented by the invention allows for extensive scalability of
delta threshold can be preselected by the user, or by the
apparatus 50, in which the number of users, and the com
preselected learning technique. Such processing, or "delta 20 munities supported may be easily expanded by adding
learning" can be accomplished by second processors SSa,b,
suitable processors. As described in the context of FIG. 1,
alone or in concert with third processors S4a-d. Indeed,
the interrelation of the several adaptive profiles and respec
third processor S4a-d can be (he locus for delta learning,
tive filters allow trends attributable to individual member
where processors 54a-d adapts a delta learning profile for
clients, individual users, and individual communities in one
each member client of the community, i.e. user, thus antici ,5 domain of system 51 to be recognized by, and influence,
pating those changes in existing informons that the user may
similar entities in other domains, of system 51 to the extent
find "interesting."
that the respective entities in the different domains share
Second processors S3a,b can filter raw informons 58 and
extract proposed community informons 59a,b therefrom.
Informons 59a,b are those predicted by processors 53a,b to
be relevant to the respective communities, in response to
common attributes.
The above described system operates in accordance with
30
communily profiles 48a,b that are unique to the communi
ties. Although only two second processors S3a,b are shown
in FIG. 2, system 51 can be scaled to support many more
processors, and communities. It is presently preferred that
second processors S3a,b extract community informons
S9a,b using a two-step process. Where processor 52 has
generated mode-invariant concept representations of the raw
informons, processor 53a,b can perform concept-based
35
a method 100 for information filtering in a computer system,
as illustrated in FIG. 3, which includes providing a dynamic
informon characterization (step 105) having a plurality of
profiles encoded therein, including an adaptive content
profile and an adaptive collaboration profile; and adaptively
filtering the raw informons (step 110) responsive to the
dynamic informon characterization, thereby producing a
proposed informon. The method continues by presenting the
proposed informon to the user (step 115) and receiving a
feedback profile from the user (step 120), responsive to the
indexing, and then provide detailed communily filtering of 4U proposed informon. Also, the method includes adapting at
each informon.
least one of the adaptive content profile (step 125) and the
Third processors 54a—d can receive community infor
adaptive collaboration profile responsive to the feedback
mons S9a,b from processors 53a, 6, and extract proposed
profile; and updating the dynamic informon characterization
member client informons 61a-d therefrom, responsive to
(step 130) responsive thereto.
unique member client profiles 62a-d for respective ones of 45
The adaptive filtering (step 110) in method 100 can be
member clients 63a-d. Each user can be represented by
machine distributed adaptive filtering that includes commu
multiple member clients in multiple communities. For
nity filtering (substep 135), Using a community profile for
example, each of users 64a,b can maintain interests in each
each community, and client filtering (substep 140), similarly
of the communities serviced by respective second processors
using a member client profile for each member client of each
53a,b, and each receive separate member client informons 50 community. It is preferred that the filtering in substeps 135
6lb,c and 61a,d, respectively.
and 140 be responsive to the adaptive content profile and the
Each member client 63a-d provides respective member
adaptive collaboration profile. Method 100 comprehends
client feedback 6Sa-d to fourth processor 55, responsive to
servicing multiple communities and multiple users. In turn,
the proposed member client informons 6la-d. Based upon
each user may be represented by multiple member clients,
the member client feedback 6Sa-d, processor 55 updates at 5; with each client having a unique member client profile and
least one of the preprocessing profile 49, community profiles
being a member of a selected communily. It is preferred that
48a,b and member client profiles 62a-d. Also, processor 55
updating the dynamic informon characterization (step 130)
further include predicting selected subsequent member cli
adapts at least one of the adaptive content profile 68 and the
adaptive collaboration profile 69, responsive to profiles 49,
4Sa,b, and 62a-d.
ent responses (step 150).
60
Method 100 can also include credibility filtering (step
Fourth processor 55 can include a plurality of adaptive
155) of the raw informons responsive to an adaptive cred
filters 66a-d for each of the aforementioned profiles and
ibility profile and updating the credibility profile (step 160)
computer storage therefor. It is preferred that the plurality of
responsive to the user feedback profile. Method 100 further
adaptive filters 66a-d be self-optimizing adaptive fillers.
can include creating a consumer profile (step 165) respon
Self-optimization can be effected according to a preselected 65 sive to the user feedback profile. In general, the consumer
self-optimizing adaptation technique including, for example,
profile is representative of predetermined consumer prefer
one or more of a top-key-word-selection adaptation
ence criteria relative to the communities of which the user is
US 6,775,664 B2
11
12
a member client. Furthermore, grouping selected ones (step
170) of the users into a preference cohort, responsive to the
It is preferred that coherent portions of the data stream,
i.e., potential raw informons, be first parsed (step 301) into
preselected consumer preference criteria, can facilitate pro-
generalized words, called tokens. Tokens include punctua-
viding a targeted informon (step 175), such as an
advertisement, to the preference cohort.
5
FIG. 4 illustrates yet another preferred method 200. In
general, method 200 includes partitioning (step 205) each
(ion and other specialized symbols that may be part of the
structure found in the article headers. For example, in
addition to typical words such as "seminar" counting as
user into multiple member clients, each having a unique
member client profile with multiple client attributes and
tokens, the punctuation mark "S" and the symbol "Newsgrouprcomp.ai" are also tokens. Using noun phrases as
tokens also can be useful
mutinies with each member client in a particular community
„,..
thereby providing each community with a unique commu-
tokens not occurring in, the document. Using this type of
grouping member clients (step 210) .0 form multiple com- „
sharing selected client attributes with other member clients,
nity profile having common client attributes.
Nex( '
,
token cou|Us for ^ documenl {& aeited
.
.
,,.,,,
.
.
...
,.
Thls vector ls the slze of *e l°lal ™«bu|«y. Wl'h. zeros »°'
vector 1S fmet.mes «1 fed the bag-of-words model. While
Method 200 continues by predicting a community profile ,, Ihe bag-of-words model does no. capture the order of the
(step 215) for each community using firs, prediction criteria, '5 lokens in the dofme"V wlnch "** be «f? forflm8uistic
and predicting a member client profile (step 220) for a
member client in a particular community using second
or symaCc analysts, il captures most of the information
needed for filImnS PurPoses-
prediction criteria. Method 200 also includes the steps of
Although, it is common in information retrieval systems
extracting raw informons (step 225) from a data stream and „ to SrouP 'he tokens together by their common lingmshc
selecting proposed informons (step 230) from raw infor- "° roots. callcd stemming, as a next step it is preferred in the
mons. The proposed informons generally are correlated with
Presenl invention that the tokens be left in their unstemmed
client to whom the proposed informon is offered. After
providing the proposed informons to the user (step 235), "
receiving user feedback (step 240) in response to the proposed informons permits the updating of the first and second
prediction criteria (step 245) responsive to the user feed-
Creating a mode-invariant profile (step 305), C, includes
creating a conceptual representation for each informon, A,
that is invariant with respect to the form-of-expression, e.g.,
vocabulary and conceptualization. Each community can
consist of a "Meta-U-Zine" collection, M, of informons.
one or more of the common client attributes of a community,
and of the member client attributes of the particular member
back.
form- ln lhis form-lhe tokens are amenable to being dassifled int0 mode-invariant concept components,
Based upon profile C, the appropriate communities, if any,
Method 200 further may include prefiltering the data 3° for each informon in the data stream are selected by concept-
stream (step 250) using the predicted community profile,
with the predicted community profile identifying the raw
informons in the data stream
based indexing (step 310) into each M. That is, for each
c°ncepl C that describes A, put A into a queue Q^, for each
M wmch k related to C. It is preferred that there is a list of
filtering the raw informons using an adaptive content filter
(step 255) responsive to the informon content; filtering the
raw informons using an adaptive collaboration filter (step
260) responsive to the common client attributes for the
index-searched. Each A that is determined to be a poor fit for
a Particular M is eliminated from further processing. Once
A has bcen matched w,th a particular M, a more complex
community profile Pw is developed and maintained for each
an adaptive member client filter (step 26S) responsive to the
unique member client profile.
determine whether it matches PA, strongly enough to be
relained or "weeded" out (step 325) at this stage.
Step 230 of selecting proposed informons can include 35 Ms Ihal » slored for each concePt and lhat can be eas«y
pertaining community; and filtering the raw informons using 40 M -lf A has fallen inl° Q"- lhen A >» analyzed to
It is preferred that updating the first and second prediction
Eacn A for a particular M is sent to each user's personal
criteria (step 245) employ a self-optimizing adaptation
aScnl' or member client U of M, for additional analysis
technique, including, for example, one or more of a top- 45 based on lhe member client's profile (step 325). Each A that
key-word-selection adaptation technique, a nearest-neighbor
fils u>s interests sufficiently is selected for U's personal
adaptation technique, a term-weighting adaptation
informon, or "U-Zinc," collection, Z. Poor-fitting informons
technique, and a probabilistic adaptation technique. Il is
are eliminated from placement in Z (step 330). This userfurther preferred that the term-weighting adaptation techlevel staBe of analysis and selection may be performed on a
nique be a TF-IDF technique and the probabilistic adapta- 50 cenlralizcd server site or on the user's computer,
lion technique be a minimum description length technique.
Next, the proposed informons are presented to user U
The information filtering method shown in FIG. 5 pro-
usef£ffor ,his examplej fe simply t0 repeat lhe same
for example, "inlereslingness,
credibility, funmness, con-
^ (hat was ^ for ,he ,Rp bul> ms,ead of predicting
tent value, writing quality, violence content, sexual content,
profanity level, business importance, scientific merit,
the rating> it ^ preferred to predict the squared-error, given
the feature vector. The exact square-error values can be used
surprise/unexpectedness of information content, artistic
as the informon weights, instead of using a rating-weight
quality, dramatic appeal, entertainment value, trendiness/ 20 lookup table. A more optimal mapping function could also
importance to future directions, and opinion agreement.
be computed, if indicated by the application.
Each CFECU 420 is a unit that can detect sets of specific
feature combinations which are exceptions in combination.
For example, author X's articles are generally disliked in lhe
Z for woodworking, except when X writes about lathes. 25
Token 1
When an informon authored by X contains the concept of
Token 2
Token 3
Token 4
16.68
8.73
12.89
11.27
[RPpos.vec.ot
"lathes," then the appropriate CFECU 420 is triggered to
signal that this is an exception, and accordingly a signal is
sent to offset the general negative signal otherwise triggered
[rp nCg. vector
—.
woodworking L.
made tf)
InformaUon-Type, and other fields previously identified to
other ^ u fa , difficull
exemplary SH below, accounts only for the Author field.
rcgarding which uscrs alrcad
, „,
.
15.20
8.87
.
4.27
, .
5.04
...
,
because of the genera, dllike fork's informons^ the 30 ^^^^ff^l^^^S^
• ,Tr- w«mP ' * ?"? °a £lnictu5ed Fealure lDjolma'
t.on (SFI) 405 can include fields such as Author, Source,
, a sjmjiarj,y measure, and then a mapping
function can be used ^^ an UP.
to get collaboralive
Maki
e£fective
be of particular value m the analysis^ For simplicity the 35 seven issues FJ
For this example, assume three authors A, B^and C have
,here
£
t (CI) from
b]cm becausc ofIheVfoU'Owing
„ fa nQ ,
iorj knowled *
will have r[tcd an informln
^iorl making a prediction for a user U, who hasn't yet
collectively subm.tted 10 angles that have been read and
have been rated as m TABLE 1 (followmg the text of this
read Momon A^wtoK> a moM for diclion mus/be
ationa] no maller which subseI of J j ,s h
t0
spec.ficat.on). In the accompanying rating scheme, a rating 40 ^available, if any, at a given time. Second, computational
can vary between 1 and 5, will, 5 indicating a "most
interesting article^ If four new articles (11,12,13,14) arrive
cffid
musl b/maintSned in H hl of a polenl^uy very
,
se, of users and informons. ^ increFmenlai u;dal/s
B C and a new author D has con.nbu.ed, a simple IRP for
r orled^om users r
that have no. yet been rated, and, ,n addition .0 authors A,
o(%{i
edictions oflen are desired as more feedbPack is
di
an informon Fourlh in leam.
lhe Author field, that just takes sums of the averages, would 45 in^ good models for ^akin| raling fictions, only very
as toiiows.
IRP (author)-weigbted sum of
sparse data typically is available for each users rating of each
document. Thus, a large "missing data" problem must be
average (ratings given the author so far)
average (ratings given the author so far in this M)
average (ratings given all authors so far in this M)
average (ratings given all authors)
dealI wjlh effectjveiy
Fifth mos,
,entia, so]u(ions (o ,he c,
Mcm
ifc
so indepenc|ence assumptiOns that, when grossly violated, give
very poor resuhs
average ratings given the author so Tar by a particular
user U)*
M m e
,e of an indcpcndcnce
assUmption violation, assume that ten users of a collaboralive fii,e|.jng sysiem) called the "B-Team," always rale all
average (ratings given the aulhor so far m this M by a
artkles exact|y in ,he samc wa% for examp,e> because ,hey
particular user U)
55 think very much alike. Further assume that user A's ralings
particular user U)
correlated with user C at the 0.9 level. Now, suppose user C
(if for a personal Z)
i( is reasonabk lo predicl ,hal A-s raling ako mighl ^ a ..5..
average (ratings given all authors so far in Ihis M by a
average (ratings given all authors by a particular user)*
are „,„.!„„, wiln lhe B.Tcam a, lhc 0 5 levelj and are
rcads an arlide and rates ;, a ..5.. Bascd on |ha, c>s ra(ingi
Ilie purpose of the weighted sum is to make use of broader, 60 Furlher_ su ^,se tllaI a mernber of the B-Tcam reads the
more general statistics, when strong statistics for a particular
user reading an informon by a particular aulhor, within a
particular Z may not yet be available. When stronger slalistics arc available, the broader terms can be eliminated by
artic|c an(, ratcs i( , ..,„ Exis|j
collaborativc filtcri
methods are hkc| |0
dfcl ,ha, A>s ralj
R sub A wou|d
be.
that used for creating CWFs 430, for the profiles as a wholei
In principle, if other members of the B-Team then read and
Some of the averages may be left out in the actual storage
rate lhe article, it should not affect the prediction of A's
using smaller weights. This weighting scheme is similar 10 65
«^-(0.9x?»o.5x2)/(0.9+o.5)-3.93
US 6,775,664 B2
17
18
rating, R^ub.A, because it is known that other B-Team
members always rate the article with the same value as the
first member of the B-Team. However, the prediction for A
by existing collaborative filtering schemes would tend to
give 10 times the weight to the "2" rating, and would be:
comparing the similarity between the profiles. Next, the
similarity of the member client profiles and informon con
tent profiles can be compared. A neural network could be
used to learn how to compare profiles so that the error in
predicted ratings is minimized. However, the invention can
be embodied with use of the invention can be embodied with
Kv(0-9*5+10x0.5x2)/(0.9+10x0.5),.2.46
use of a simple cosine similarity metric, like that previously
considered in connection with Unstructured Feature Infor
Existing collaborative filtering schemes do not work well in
this case because B-Team's ratings are not independent, and
have a correlation among one another of 1. 'Hie information
mation (UFI) can be used.
filter according to the present invention can recognize and
compensate for such inter-user correlation.
Sixth, information about the community of people is
known, other than each user's ratings of intbrmons. This
information can include the present topics the users like,
what authors the users like, etc. This information can make
the system more effective when it is used for learning
15
frequencies that can be multiplied in the standard token
frequency comparison method, which would be recognized
by a skilled artisan.
stronger associations between community members. For
example, because Users A and B in a particular community
M have never yet read and rated an informon in common, no
correlation between their likes and dislikes can be made,
based on common ratings alone. However, users A and B
have both read and liked several informons authored by the
same author, X, although Users A and B each read a
distinctly different Zs. Such information can be used to make
The method used preferably includes more than just the
tokens, such as the author and other SFI; and, it is preferred
that the three vectors for component also are able to be
compared. SFIs may be handled by transforming them into
an entity that can be treated in comparable way to token
Continuing in the ongoing example, the Author field may
20
be used. Where user A and user B have rated authors K and
L, the token frequency vector may appear as follows:
25
the inference that there is a possible relationship between
user A's interests and user B's interests. For the most part,
existing collaborative filtering systems can not take advan
tage of this knowledge.
Seventh, information about the informon under consider 30
ation also is known, in addition to the ratings given it so far.
For example, from knowing that informon A is about the
concept of "gardening", better use can be made of which
Further, the author component of the member client profiles
users' ratings are more relevant in the context of the inforof user A and user B may be compared by taking a special
mation in the informon. If user B's rating agrees with user 3S weighted correlation of each author under comparison. In
D'sralingofarticleswhenthesubjectisabout"politics",but
general, the weight is a function F of the sample sizes for
B's ratings agree more with user D when informon A is
user A's and user B's rating of the author, where F is the
about "gardening", then the relationship between User B's
product of a monotonically-incrcasing function of the
ratings and User D's ratings are preferred to be emphasized
sample size for each of user A and user B. Also, a simple
to a greater extent than the relationship between User B and 4U function G of whether the informon D is by the author or not
User C when making predictions about informon A.
is used. This function can be: G=q if so, and G=po changed by more than another minimum threshold amount,
quality; dramatic appeal; entertainment value; surprise or
then the compiled statistics should be passed to the manag
unexpectedness of information content; trendiness or impor
er's parent-node, if any, and the process rccurscs upward and
tance to future directions; and opinion agreement. Each of
downward in the hierarchy.
these qualities can be optionally addressed by users with a
Because no mindpool manager is required lo have accu
rating feedback mechanism and, therefore, these qualities 65 rate information, but just an estimation of the rating and an
can be used to drive separate mindpool hierarchies. Also, the
uncertainty level, any manager may respond with a simple
qualities can be used in combinations, if appropriate, to
average of all previous documents, and with a higher degree
US 6,775,664 B2
21
22
of uncertainty, if none of its child-nodes has any rating
information yet. The preferred distributed strategy tends to
production of other uncertainty combination methods
described above. Approximations can be made by precomputing all terms thai do not change significantly, based
on the particular informon, or the subset of actual ratings
given so far lo the mindpool manager. As slated previously,
the correlaled-feature error-correclion units (CFECUs) are
intended to detect irregularities or statistical exceptions.
Indeed, Iwo objectives of the CFECU units are to (1) find
non-linear exceptions to the general structure of the three
aforementioned types of inputs (SFI, UFI, and CI); and (2)
find particular combinations of informon sub-features that
statistically stand out as having special structure which is not
captured by the rest of the general model; and (3) trigger an
additional signal to the CFECU's conditions are met, in
order to reduce prediction error. The following exemplifies
reduce the communication needed between processors, and
the computation tends to be pooled, thereby eliminating a
substantial degree of redundancy. Using this distributed
strategy, the estimations tend to settle to the extent that the
updating of other nodes, and the other users predictions are
minimized. Therefore, as the number of informons and users
becomes large, the computation and prediction updates grow
as the sum of the number of informons and the number of
users, rather than the product of the number of informons
and the number of users. In addition, incremental updates
can be accomplished by the passing of estimations up and
10
down the hierarchy. Incremental updates of rating predic
tions continue to move until the prediction becomes stable 15 the CFECU operation:
due to the large sample size. The distributed division of users
can reduce the effects of independent assumption violations.
In the previous example with the B-Team of ten users, the
B-Team can be organized as a particular mindpool. With the
additional ratings from each of the B-Tcam members, the
estimation from the B-Team mindpool typically does not
change significantly because of the exact correlation
between the members of that mindpool. This single estima
tion then can be combined with other estimations to achieve
the desired result, regardless of how many B-Team members
have read the article at any given lime.
The mindpool hierarchies can be created by either
computer-guided or human-guided methods. If the hierarchy
creation is human-guided, there often is a natural breakdown
of people based on information such as job position, com 30
User B's Number of
mon interests, or any other information that is known about
tnformons Read Aboul
them. Where the mindpool hierarchy is created
Average over
automatically, the previously described measure of the col
Gardening
Politics
Topics
laborative input relationship between users can be employed
in a standard hierarchical clustering algorithm to produce 35
7
Author A's
40
1.69
Articles
1.84
70
200
each group of users or nodes in the mindpool hierarchy.
Other Authors
Such standard hierarchical clustering algorithms can
include, for example, the agglomerative method, or the
divide-and-conquer method. A skilled artisan would recog
In Ihis example, it is desired thai author A's informon D
nize that many other techniques also are available for 40 about gardening have a high predicted rating for user B.
However, because Ihe average rating for aulhor A by user B
incrementally-adjusting the clusters as new information is
is only 1.69, and the average rating for the gardening
collected. Typically, clustering is intended to (1) bring
concept is only 1.68, a three-part model (SFI-UFI-CI) that
together users whose rating information is clearly not inde
does not evaluate the informon features in combination
pendent; and (2) produce mindpool estimations that are
45 would tend lo not rank informon D very highly. In this case,
substantially independent among one another.
the first CFECU would first find sources of error in past
Estimations are made in a manner similar to other esti
examples. This could include using the three-part model
mations described herein. For example, for each user or
against the known examples that user B has rated so far. In
sub-mindpool (sub-informant), a similarity between the subthis example, seven articles that user B has rated, have an
informant and the centroid of the mindpool can be computed
average rating of 4.S, though even the Ihree-part model only
in order to determine how relevant the sub-informant is in so
predicts a raiing of about 1.68. When such a large error
computing the estimation. Uncertainty estimators also are
appears, and has statistical strength due to the number of
associated with these sub-informants, so that they can be
examples with the common characteristics of, for example,
weighted with respect to their reliability in providing the
the same aulhor and topic, a CFECU is created lo idenlify
most accurate estimation. Optionally, the informon under
thai this exception lo Ihe Ihree-part model has been triggered
evaluation can be used to modulate the relevancy of a 55 and thai a correction signal is needed. Second, it is preferred
sub-informant. This type of evaluation also can lake advan
to index the new CFECU into a database so thai, when
tage of the Iwo previously-determined collaborative infor
triggering features appear in an informon, for example,
mation relationship components, thereby lending lo magnify
aulhor and topic, the correction signal is scm into the
relationships that are stronger for particular types of infor
appropriate CWF. One method which can be used to effect
mons than for others. Once a suitable set of weights are 60 the first step is a cascade correlation neural network, in
established for each user within a mindpool for a particular
which the neural net finds new connection neural net units
informon, a simple wcightcd-avcragc can be used to make
to progressively reduce the prediction error. Another method
the estimation. It is preferred that the "simple" weighled
is lo search through each informon lhal has been rated but
average used be more conservative regarding input infor
whose predicted rating has a high error, and storing the
mation that a simple independent linear regression. Also, the 65 informons profile.
overall Uncertainty can be derived from the Uncertainty
When "enough" informons have been found with high
Predictions of the sub-informants, in a manner similar to the
error and common characteristics, the common characters-
US 6,775,664 B2
23
24
tics can be joined together as a candidate for anew CFECU.
Next, the candidate can be tested on all the samples, whether
When a user makes a query for which a wire already
exists, wire search results are preferably returned instead of
they have a high prediction or a low prediction error
regular search engine results. As shown in the logic diagram
than a minimum threshold level, the CFECU can be added
block 26C reUlms results from ,he existi
associated with them. Then, the overall error change
(reduction or increase) for all of the examples can be 5
computed to determine ir the CFECU should be added to the
informon profile. If the estimated error reduction is greater
to the profile. As successful CFECU are Covered for
users
profiles, they also can be added to a database of
CFECU's that may be useful for analyzing other profiles. If
of FIG. 7, a user provides a query as indicated by block 20C.
nje query is applied to a Lookup Table, as indicated by
block 22C, block 24C applies a test to determine from the
table wnethcr a wire already exists for the new query. If so,
wfre otherwise,
bIock 28C commands , demand ^ bby a regular query
.
?,,•■! L
r •
u
u
a particular CFECU has a sufficiently broad application, it
can be moved up in the filtering process, so that it is
. WlIh lhe f5 of Wlre se,arch relu1rns> eafh uf ™ rev>ew
lhc. rctu,rn,cd rcsulls and Pr0Vldc fccdback dala about
can be included in the representation that is computed in the
the hller Prollles Xlsed m Processing informons for the wire,
be made by taking the average of those informons for which
the CFECU responds. Also, the Uncertainty can be chosen
such that the CFECU signal optimally outweighs the other
tion of previous users' feedback data. By analyzing documents which users rale as meeting a particular quality such
as 1 interestingness, the system can find common document
computed for every entity once. Also, the particular CFECU
reviewed documents. Such feedback data is incorporated in
pre-processing stage as a new feature. In general, the esti- 15 Therefore, when a future user makes substantially the same
mation of the predicted rating from a particular CFECU can
querv.lne wire wi" havc bee" improved by the incorpora-
signals being sent to the CWF. One method of self- 20 features which can be used to return more like documents to
optimization that can be employed is, for example, the
future users who make substantially the same query,
gradient descent method, although a skilled artisan would
Alternatively, all queries applied to a search engine sys-
recognize that other appropriate optimization methods may
tern of the invention can set up new wires. After a search
be used.
query is presented to the search engine system, a wire is
The invention of this continuation-in-part application, as 25 created on the basis of the query terms, and all new
shown in FIGS. 8 and 9, provides a collaborative and
documents subsequently received from the network are
preferably adaptive search engine system in which elements
filtered by the new wire. A push-model may be used to send
of the structure and principles of operation of the apparatus
all passed, new documents to the user,
of FIGS. 1-7 are applied. Accordingly, a search engine
Among other basic search engine system structures, an
system of the invention, as preferably embodied, integrates 30 integrated system can be employed in which collaborative
collaborative filtering with adaptive content-based filtering
and content-based filtering is structured to provide demand
to provide improved search engine performance. The aerosearches with or without collaborative filtering, or wire
nym "CASE" refers to a search engine system of the
searches. In the operation of the preferred basic structure and
invention, i.e., a collaborative, adaptive search engine.
other basic structures, a query processor can be employed, if
In the operation of conventional search engines at portal 35 needed, to make search-type assignments for user queries,
web sites, user queries are searched on demand to find
Generally, basic search engine system structures of the
relevant informons across the web. Content-based filtering is
invention arc preferably embodied with the use of a protypically used in measuring the relevancy of informons, and
grammed computer system.
the search results are resented in the form of a list of
Collaborative filtering employs additional data from other
informons ranked by relevancy.
40 users to improve search results for an individual user for
The present invention combines collaborative filtering
with content-based filtering in measuring informons for
relevancy, and further preferably applies adaptive updating
of the content-based filtering operation. In providing these
whom a search is being conducted. The collaborative data
can be feedback informon rating data, and/or it can be
content-profile data for agent mind melding which is more
fully disclosed in Serial Number (Docket # LYC 4), entitled
results, the invention can be embodied as a search engine 45 INTEGRATED COLLABORATIVE/CONTENT-BASED
system in accordance with different basic structures. In the
FILTER STRUCTURE EMPLOYING SELECTIVELY
presently preferred basic structure, an integrated
SHARED, CONTENT-BASED PROFILE DATA TO
collaborative/content-based filter (FIGS. 1-7) is operated to
EVALUATE INFORMATION ENTITIES IN A MASSIVE
provide ongoing or continuous searching for selected user
INFORMATION NETWORK, filed by the current inventors
system. A suitable analysis is applied to the search engine
connection with FIGS. 1-7.
queries, with a "wire" being established for each query. On 50 on Nov. 19, 1998, and hereby incorporated by reference,
the other hand, a regular search engine Ls operated to make
Many types of user rating information can be used. For
immediate or short-term "demand" searches for other user
example, users can sort documents which they have read
queries on the basis of content-based filtering. This basic
from best to worst. Alternatively, users can select on a scale
structure of the invention is especially beneficial for use in
(numeric, such as 1 to 10, or worded, such as good, medium,
applying the invention to existing search engine structure. 55 poor) how much they enjoyed reading a document. Further,
Demand search results can be returned if no wire exists
user monitoring can measure lime spent by users on each
for an input query. Otherwise, wire search results are
document, thereby indicating user interest (normalized by
returned if a wire docs exist, or collaborative ranking data
document length). Among other possibilities, the choices of
can be applied from the wire filter structure to improve the
documents for reading by other users can be simply used as
results of (he demand search from the regular search engine. 60 an indication of interesting documents. In all cases, the
In the currently preferred embodiment, wires are created
feedback rating data can be based on interestingness or any
for the most common queries received by the search engine
of a variety of other document qualities, as described in
operations to determine which queries are most common,
Feedback ranking information can be used in a number of
and respective wires are then created for each of these 65 ways, and the invention is not limited by the method of
queries. An analysis update can be made from time to time
feedback information use. Use methods range in spectrum
to make wire additions or deletions as warranted.
from weighting relative ranks by a set amount (possibly
US 6,775,664 B2
25
26
equally, possibly heavy weighting one above the other) to
dynamically adjusting the weight by measuring how stalls-
the processor 48C rates informons returned by the spider
system 46C in a demand search as indicated by the reference
lically significant the user feedback is. For example, if only
character 48C2. Collaborative rating data is used in the
one person has ranked an article, it may not be significant.
However, if many people have consistently ranked an article 5
the same, more credibility may be placed on the user's
informon rating process in the wire search mode, and if
applied in the demand search mode, to the extent that
collaborative data is available for the informons in the
weighting.
search return. Search results are returned to the users 34C
FIG. 9 shows a generalized embodiment of the invention
and 36C from the ^^.j, relum processor 4«C as shown in
in which system elements in a CASE system 30C are
pjQ g
integrally configured to provide wire and/or demand 10
' '.
.
.
,
.
searches. A quer? processor 32C receives queries from an
Tte lnvenllon » Pref"abl>' embodied as s^own in ."•
individual user 34C and other users 36C. A mode selector
10 A °-uery processor 60C receives queries from an inch-
already exists, and wires exist only for those queries found
to be commonly entered as previously described. In the
continuously scans a network 70C for informons providing
a threshold-level match for conlenl based profiles (i.e.,
38C responds to the currently processed query to set a
vidual user 62C and other users 64C and determines whether
content-based filler structure 40C for wire search operation
» w're already exists for each entered query. If a wire exists,
or demand search operation. In the preferred application of 15
Disclaimer: Justia Dockets & Filings provides public litigation records from the federal appellate and district courts. These filings and docket sheets should not be considered findings of fact or liability, nor do they necessarily reflect the view of Justia.
Why Is My Information Online?