I/P Engine, Inc. v. AOL, Inc. et al
Filing
1
COMPLAINT against AOL, Inc., Gannett Company, Inc., Google, Inc., IAC Search & Media, Inc., Target Corporation ( Filing fee $ 350 receipt number 14683024186.), filed by I/P Engine, Inc. (Attachments: # 1 Exhibit A, # 2 Exhibit B, # 3 Receipt, # 4 Civil Cover Sheet, # 5 Letter)(ecav, )
EXHIBIT A
US006314420B1
United States Patent
(io) Patent No.:
US 6,314,420 Bl
(45) Date of Patent:
*Nov. 6,2001
Lang et al.
(54)
(75)
COLLABORATIVE/ADAPTIVE SEARCH
5,608,447 •
ENGINE
Inventors: Andrew K. Lang; Donald M. Kosak,
Notice:
6,014,665
1/2000 Culliss
6,029,161 • 2/2000 Lang et al
patent is extended or adjusted under 35
U.S.C. 154(b) by 0 days.
retrieval, Jul. 6, 1994, pp. 339-348."
Appl. No.: 09/204,149
* cited by examiner
Dec. 3, 1998
Primary Examiner—Thomas Black
Assistant Examiner—Frantz Coby
Related U.S. Application Data
(63)
I.LP
Apr. 4, 1996, now Pat. No. 5,867,799.
Int. Cl.7
(52)
U.S. Cl
(58)
(74) Attorney, Agent, or Finn—Testa, Hurwitz & Thibeault
Continuation-in-pait of application No. 08/627 436 filed on
(51)
(57)
G06F 17/30
internet. The search engine system employs a regular search
engine to make one-shot or demand searches for information
entities which provide at least threshold matches to user
707/5
707/1, 10, 102,
707/3, 2, 5
References Cited
U.S. PATENT DOCUMENTS
5,019,961
5,117,349
5,249,262
5,471,610
5,537,586
5,544,049
5,563,998
5,563,999
5/1991 Addusso et al
5/1992 Tirfing ct al
9/1993
11/1995
7/1996
8/1996
10/1996
10/1996
Baule
Kawaguchi el al
Amram ct al
Henderson et al
Yaksichclal
Yaksich et al
364/192
707/5
395/66
707/4
707/3
364/419.19
395/149
395/149
ABSTRACT
A search engine system is provided for a portal site on the
707/3; 707/10; 707/2;
Field of Search
(56)
707/5
707/5
Michael Persin, Document Filtering for Fast Ranking, Pro
ceeding of the seventeenth annual international ACM-SIGIR conference on research and development in information
claimer.
Filed:
6/2000 Culliss
1/2001 Culliss
707/1
707/5
OTHER PUBLICATIONS
This patent is subject to a terminal dis
(22)
'" 707/5
707/1
6,078,916
6,182,068
Subject to any disclaimer, the term of this
(21)
" 707/1
5,983,214 ♦ 11/1999 Lang et al
6,006,222
12/1999 Cullisi
Assignee: Lycos, Inc., Waltham, MA (US)
(*)
343/7
707/10
707/2
5,867,799 • 2/1999 Lang el al
both of Pittsburgh, PA (US)
(73)
3/1997 Fany el al
5,649,186 • 7/1997 Ferguson
5,842,199 ♦ 11/1998 Miller et al
queries. The search engine system also employs a
collaborative/content-based filter to make continuing
searches for information entities which match existing wire
queries and are ranked and stored over time in useraccessible, system wires corresponding to the respective
queries. A user feedback system provides collaborative
feedback data for integration with content profile data in the
operation of the collaborative/contcnt-based filter. A query
processor determines whether a demand search or a wire
search is made for an input query.
36 Claims, 10 Drawing Sheets
KCVCIC
cnuu:
IfQMOi
CHAHtCTiailtlCH
TZI
USER
30'
r_
29
33-
TJ
INFORMATION FILTER APPARATUS
J
32
SECOND
FIRST
PREDICTION
ADAPTATION ADAPTATION
MEANS
MEANS
MEANS
DISTRIBUTED
NETWORK
RESOURCE |I
28e
lib
COMMUNITY
FILTER
USER
COMMUNICATION
MEANS
NETWORK
31
COMPUTER
STORAGE
MEANS
RESOURCE §2
COMPUTER SYSTEM
-25
35
27q
s 17
USER USER USER USER USER
FILTER FILTER FILTER FILTER FILTER
28a
CREDIBILITY FILTER
COMMUNITY
FILTER
ADAPTIVE FILTER MEANS
r-19
ADAPTIVE CONTENT FILTER
EXTRACTION
MEANS
1.5
#3
DISTRIBUTED
FIG. I
16
OS
©
©
to
o
i
7"
64b
\ '
48a
if
MC
MC
~62o
PROFILE
MC
LPROFILE
65a
PROFILE
W
65a
LPROFILEJ
FEEDBACK
59cr
MC
54br
\
66a-
r 53o
SECOND
PROCESSOR
FEEDBACK
COMMUNITY
PROFILE
NOTE: AF=ADAPTIVE FILTER
THIRD
PROCESSOR
WC
iUSER
61a..
54o
51
I
66b'
AF
MC
I
MC
FIG. 2
MC
' "user" V
AF
65d
]'
MC
^
MC
NOTE: MC=MEMBER CLIENT
54d
-59b
COMMUNITY
PROFILE
PROFILE
ADAPTIVE
CONTENT
PROFILE
ADAPTIVE
THIRD
PROCESSOR
T61d
-57
SECOND
PROCESSOR
_N
53b
49
FEEDBACK
PROFILE
MC
PROFILE
PROCESSOR
PROFILE
FEEDBACK
PROFILE 65c
MC
PROriLE
54c
r55
52
PROCESSOR
THIRD
v
-66c
AF
FOURTH
PROCESSOR
..64a I
THIRD
PROCESSOR
V
AF
58^
FIRST
PROCESSOR
-56
ON
la
o
©
f
o
U.S. Patent
Nov. 6,2001
100-
Sheet 3 of 10
PROVIDING
DYNAMIC
INFORMON
CHARACTERIZATION
COMMUNITY
FILTER
135-
US 6,314,420 Bl
-105
■110
140—1 CLIENT FILTER
ADAPTIVELY FILTER
RAW INFORMONS
PRESENTING
PROPOSEO
INFORMON
TO USER
-115
RECEIVING
FEEDBACK
PROFILE
-120
ADAPTING
-125
CONTENT AND/OR
COLLABORATIVE
PROFILE
UPDATING DYNAMIC
INFORMON CHARACTERIZATION
IPREDICTING MC RESPONSESV -150
r170
GROUP INTO
PREFERENCE
COHORT
175
TARGETED
INFORMON
-165
-155
CREATE
CONSUMER
PROFILE
CREDIBILITY
FILTER
RECOMMENDATION
FILTER
FIG. 3
-160
UPDATE
CREDIBILITY
PROFILE
CONSULTATION
FILTER
U.S. Patent
Nov. 6,2001
Sheet 4 of 10
US 6,314,420 Bl
■205
PREFILTER
PARTITION EACH
USER INTO
MULTIPLE MC's
200-
GROUP MC's
TO FORM
MULTIPLE
COMMUNITIES
-210
PREDICTING
COMMUNITY
PROFILE
-215
PREDICT
MC
PROFILE
-220
EXTRACT
-225
RAW
INFORMONS
r230
SELECT PROPOSED INFORMON
255
CONTENT FILTER
260-
COLLABORATION FILTER
265
MC FILTER
PROVIOE
PROPOSED
\*i
-235
INFORMONS
TO USER
FIG. 4
UPDATE
PREDICTION
CRITERIA
-245
RECEIVE
USER
FEEDBACK
■240
U.S. Patent
345-
340-
FIG. 5
INPUT
r
420
425b
L.
425c
425d
LEARNING
FUNCTION
r
LEARNING
FUNCTION
COLUBORATIVE
415-
r
425a
LEARNING
FUNCTION
UNSTRUCTURED
FEATURE
INFORMATION
410-
STRUCTURED
FEATURE
INFORMATION
405-
CORRELATED-FEATURE,
LEARNING
ERROR-CORRECTION
FUNCTION
UNIT
400
428d-
428c-
428b-
428a-
-426d
r 426c
r 426b
r 426a
427a
427c
FIG. 6
FUNCTION
WEIGHTED IRP
r
427d
WEIGHTED IRP
FUNCTION
r
WEIGHTED IRP
FUNCTION
WEIGHTED IRP
FUNCTION
r
429d
430
COMPLETE
RATING
PREDICTOR
-432
UP=UNCERTAINTY
PREDICTOR
NOTE: IRP=INDEPENDENT
RATING PREDICTOR
CERTAINTY
WEIGHTING
FUNCTION
r
o
I—»
o
ON
o
13
to
in
U.S. Patent
Nov. 6,2001
Sheet 7 of 10
US 6,314,420 Bl
U.S. Patent
Nov. 6,2001
USER
Sheet 8 of 10
US 6,314,420 Bl
20C
PROVIDES
QUERY
TERMS
LOOKUP
QUERY
22C
IN
TABLE
DOES
A WIRE
ALREADY EXIST
ON THE
QUERY
DISPLAY RESULTS
FROM
PREVIOUSLY
CREATED
WIRE
TOPIC?
PERFORM
REGULAR
SEARCH ON
QUERY
ENGINE
FIG. 8
28C
26C
->
OTHER
USER
-36CI
INDIVIDUAL
USER
30C
FEEDBACK PROCESSOR-
43C —
32C
SEARCH RETURN
PROCESSOR
48C
RATING
DEMAND
MODE
PROCESSING/
RATING
INFORMON
WIRE MODE
CONTENT-BASED
PROFILE
DEMAND
PROFILES
50C
48C2
■40C
38C
42C
WIRE
PROFILES
MOLD
AGENT
MIND
FILTER
STRUCTURE
BASED
CONTENT-
MODE
SELECTOR
I
TO CURRENT INFORMON
FEEDBACK DATA RELEVANT
ADAPT! VF FFFDBACK DATA
FEEDBACK DATA
ADAPTIVE
QUERY
PROCESSOR
FEEDBACK DATA
FEEDBACK DATA ■
MEMORY
SPIDER
46C
FIG. 9
M
WEB
SITE
A
WEB
SITE
W
OS
o
SO
o
C/5
a
62C-=,
64C
STATIONS
USER
QUERY
►
PROCESSOR
60C—-I
74C
r
CONSIDERED INFORMONS
OTHER
USER
STATION
£
80FD
WIRE
66C
MEMORY
MEMORY
-78CM
MEMORY
ACQUISITION
SPIDERCONTINOUS
INFORMON
72C
80C
r
RETURN PROCESSOR
FIG. 10
MEMORY
DEMAND
SEARCH
-82C
ON DEMAND
ACQUISITION
FILTER
STRUCTURE
BASED
CONTENT-
ENGINE
(DEMAND)
SPIDERINFORMON
SEARCH
DEMAND SEARCH
76C —
r
COLLABORATIVE COLLABORATIVE/
CONTENT-BASED
FEEDBACK
FILTER
FEEDBACK PROCESSOR- DATA
STRUCTURE
FEEDBACK DATA FOR
VE FEEDBACK DATA FOR USER CONTENT PROFILE
INDIVIDUAL
ADAP
ADAPTIVE FEEDBACK DATA FOR OTHER USER CONTENT PROFILES ,
WIRE RETURNS TO USERS RELEVANT INFORMONS LIST
-78C
-68C
«-► WEB
SITE
M
«-► WEB
SITE
A
td
O
GO
C5
US 6,314,420 Bl
found to be relevant. The system maintains the ranked
informons in a stored list from which the individual user can
select any listed informon for consideration.
COLLABORATIVE/ADAPTIVE SEARCH
ENGINE
This application is a continuation-in-part of copending
application Ser. No. 08/627,436 filed on Apr. 4, 1996 now
5
U.S Pat. No. 5,867,799, the entire contents of which are
hereby incorporated by reference.
As the system continues to feed (he individual user's
"wire", the stored relevant informon list typically changes
due to factors including a return of new and more relevant
informons, adjustments in the user's query, feedback evalu
ations by the user for considered informons, and updatings
BACKGROUND OF THE INVENTION
in collaborative feedback data. Received informons are
The present invention relates to information processing 10 similarly processed for other users' wires established in the
systems for large or massive information networks, such as
information filter system. Thus, the integrated information
the internet, and more particularly to such information
filter system performs continued long-term searching, i.e., it
systems especially adapted for operation in portal and other
compares network informons to multiple users' queries to
web sites wherein a search engine operates with collabora
find matching informons for various users' wires over the
tive and content-based filtering to provide better search 15 course of time, whereas conventional search engines initiate
responses to user queries.
a search in response to an individual user's query and use
content-based filtering to compare the query to accessed
In the operation of the internet, a countless number of
network informons typically to find matching informons
information are available for downloading from any of at
during a limited, short-term search time period.
least thousands of sites for consideration by a user at the
user's location. A user typically connects to a portal or other 20
The present invention is directed lo an information pro
cessing system especially adapted for use at internet portal
or other web sites lo make network searches for information
entities relevant to user queries, with collaborative feedback
data and content-based data and adaptive filler structuring,
based filter in a search engine to search the internet and find 25 being used in filtering operations to produce significantly
information which match the query. This process is basically
improved search resulls.
web site having a search capability, and thereafter enters a
particular query, i.e., a request for information relevant to a
topic, a field of interest, etc. Thereafter, the search site
typically employs a "spider" scanning system and a content-
a pre-search process in which matching informons are
found, at the time of initiating a search for the user's query,
SUMMARY OF THE INVENTION
by comparing informons in an "informon data base" to the
user's query. In essence, the pre-search process is a short ^
term search for quickly rinding and quickly identifying
information entities which are content matched to the user's
query.
The return list of matching informons can be very exten
sive according to the subject of the query and the breadth of
the query. More specific queries typically result in shorter
return lists. In some cases, the search site may also be
structured to find web sites which probably have stored
informons matching the entered query.
Collaborative data can be made available to assist in
informon rating when a user actually downloads an
informon, considers and evaluates it, and returns data to the
search site as a representation of the value of the considered
informon to the user.
A search engine system employs a content-based filtering
system for receiving informons from a network on a con
tinuing basis and for tillering the informons for relevancy to
a wire or demand query from an individual user. A feedback
system provides feedback dala from other users.
Another system controls the operation of the filtering
system to filler for one of a wire response and a demand
response and 10 return the one response to the user. The
filtering system combines pertaining feedback data from the
feedback system with conlent profile dala in determining the
40
relevancy of the informons for inclusion in al least a wire
response to the query.
BRIEF' DESCRIF11ON OF THE DRAWINGS
45
FIG. 1 is an diagrammatic representation of an embodi
ment of an information filtering apparatus according lo the
In the patent application which is parent to this
present invention.
continuation-in-part application, i.e. Ser. No. 08/627,436,
filed by the present inventors on Apr. 4,1996, now U.S. Pat.
FIG. 2 is an diagrammatic representation of another
No. 5,867,799 and hereby incorporated by reference, an
embodiment of an information filtering apparatus according
advanced collaborative/content-bascd information filter sys- 50 to the present invention.
tem is employed to provide superior filtering in the process
FIG. 3 is a flow diagram for an embodiment of an
of finding and rating informons which match a user's query.
informalion filtering mclhod according to the present inven
The information filter structure in this system integrates
tion.
content-based filtering and collaborative filtering to deter
FIG. 4 is a flow diagram for another embodiment of an
mine relevancy of informons received from various sites in 55
information filtering method according to the present inven
the Internet or other network. In operation, a user enters a
tion.
query and a corresponding "wire" is established, i.e., the
FIG. 5 is a flow diagram for yei another embodiment of
query is profiled in storage on a content basis and adaplively
an information filtering method according lo the present
updated over time, and informons obtained from the net
work are compared to the profile for relevancy and ranking. (,0 invention.
FIG. 6 is an illustration of a threc-componcnl-inpul model
A continuously operating "spider" scans the network to find
and profile with associated predictors.
informons which are received and processed to determine
relevancy to the individual user's wire or to wires estab
lished by numerous other users.
FIG. 7 is an illustration of a mind pool hierarchy.
FIG. 8 is a logic diagram illustrating a search selection
The integrated filter system compares received informons 65 feature of Hie invenlion;
to the individual user's query profile data, combined with
FIG. 9 is a functional block diagram of an embodiment of
collaborative da la, and ranks, in order of value, informons
the invenlion in which an intcgralcd informalion processing
US 6,314,420 Bl
system employs a search engine and operates with combined
collaborative filtering and content-based tillering, which is
preferably adaptive, to develop responses to user queries.
i.e., that share a subset of attributes or interests. In general,
ihc subset of shared attributes forms the community profile
for a given community and is representative of the commu
nity norms, or common client attributes.
The "relevance" of a particular informon broadly
describes how well it satisfies the user's information need.
The more relevant an informon is to a user, the higher the
"signal" content. The less relevant the informon, the higher
the "noise" content. Clearly, the notion of what is relevant to
a particular user can vary over time and with context, and the
user can find the relevance of a particular informon limited
queries to the appropriate structure for responses.
lo only a few of the user's potentially vast interest areas.
Because a user's interests typically change slowly, relative
DETAILED DESCRIPTION OF THE
lo the data stream, it is preferred lo use adaptive procedures
EMBODIMENTS
15 lo track the user's current interests and follow them over
The invention herein is preferably configured with an
lime. Provision, too, is preferred to be made for sudden
apparatus and method for information filtering in a computer
changes in inlerest, e.g., taking up antiquarian sword col
system receiving a data stream from a computer network, in
lecting and discontinuing stamp collecting, so that the
which entities of information relevant to the user, or
method and apparatus track the evolution of "relevance" to
"informons," are extracted from the data stream using 2n a user and Ihe communities of which the user is a member.
content-based and collaborative filtering. The information
In general, information filtering is the process of selecting
filtering is long term in the sense that it operates on a
the information that a users wishes to see, i.e., informons,
continuing basis, and is both interactive and distributed in
from a large amount of data. Content-based filtering is a
structure and method. It is interactive in that communication
process of filtering by extracting features from the informon,
is substantially big-directional at each level of the filter. It is 25 e.g., the text of a document, to determine the informon's
distributed in that all or part of the information filter can
relevance. Collaborative filtering, on the other hand, is the
include a purely hierarchical (up-and-down/parent-child)
process of filtering informons, e.g., documents, by deter
structure or method, a purely parallel (peer-to-peer) structure
mining what informons other users wilh similar interests or
or method, or a combination of hierachical and parallel
needs found lo be rclcvanl.
FIG. 10 shows another and presently preferred embodi
ment of the invention in which an information processing 5
system includes an integrated filter structure providing
collaborativc/adaplivc-conlcnt-bascd filtering to develop
longer term, continuing responses to user queries, and a
search engine structure which provides short term, demand
responses to user queries, with the system directing user io
structures and method.
30
The system apparatus includes a filler structure having
As used herein, the term "informon" comprehends an
adaptive content based filters and adaptive collaborative
information entity of potential or actual interest lo a par
fillers, which respectively include, and respond to, an adap
ticular user. In general, informons can be hetcrogenous in
tive content profile and an adaptive collaboration profile. As
nature and can be all or part of a textual, a visual, or an audio
used herein, the term "content-based filter" means a filter in
entity. Also, informons can be composed of a combination of 35 which content data, such as key words, is used in performing
the aforementioned entities, thereby being a multimedia
ihe filtering process. In a collaborative filter, other user data
entity. Furthermore, an informon can be an entity of pat
is used in performing Ihe filtering process. A collaborative
terned data, such as, a data file containing a digital repre
filter is also sometimes referred to as a "content" filter since
sentation of signals and can be a combination of any of the
it ultimately performs the lask of finding an object or
previously-mentioned entities. Although some of the data in 40 document having content relevant to the content desired by
a data stream, including informons, may be included in an
a user. If there are some instances herein where the term
informon, not all data is relevant to a user, and is not within
"contenl filter" is used as distinguished from a collaborative
the definition of an informon. By analogy, an informon may
filler, it is intended thai the term "content filter" mean
be considered to be a "signal," and the total data stream may
"conteni-bascd filler." The adaptive filters each are preferred
be considered to be "signal +noise." Therefore, an informa 45 lo include at lcasl a portion of a community filter for each
tion filtering apparatus is analogous to other types of signal
community serviced by the apparatus, and a portion of a
filters in that it is designed to separate the "signal" from the
member client filler for each member client of Ihe serviced
"noise."
communities. For this reason, ihe adaptive filtering is dis
Also as used herein, the term "user" is an individual in
tributed in that each of the community fillers perform
communication with the network. Because an individual so adaptive collaborative filtering and adaptive content
user can be interested in multiple categories of information,
filtering, even if on different levels, and even if many filters
the user can be considered to be multiple clients each having
exisl on a given level. The integrated filtering permits an
a unique profile, or set of attributes. Each member client
individual user to be a unique member client of multiple
profile, then, is representative of a particular group of user
communities, wilh each community including multiple
preferences. Collectively, the member client profiles asso 55 member clicnls sharing similar interests. The adaptive fea
ciated with each user is the user profile. The present inven
tures permit the interests of member clients and entire
tion can apply the learned knowledge of one of a user's
communities lo change gradually over time. Also a member
member clients to others of the user's member clients, so
client has the ability to indicate a sudden change in
that the importance of the learned knowledge, e.g., the user's
preference, e.g., the member client remains a collector but is
preference for a particular author in one interest area as 60 no longer interested in coin collecting.
represented by the member client, can increase the impor
tance of that particular factor, A's authorship, for others of
Ihc user's member clients. Each of Ihc clients of one user can
be associated with Ihc individual clicnls of olhcr users
insofar as the profiles of the respective clients have similar 65
attributes. A "community" is a group of clicnls, called
member clients, that have similar member client profiles,
The filter structure also implements adaptive credibility
filtering, providing member clicnls wilh a measure of infor
mon credibility, as judged by other member clients in the
community. For example, a new member client in a first
community, having no credibility, can inject an informon
into Ihe data flow, thereby providing other member clients in
other communities with the proposed informon, based on the
US 6,314,420 Bl
5
6
respective community profile and member client profiles. If
the other member clients believe the content of the informon
topic, or the quality of the recommendation. As before, the
responders can accrue quality points, value tokens, or "info
to be credible, the adaptive credibility profile will reflect a
growing credibility. Conversely, feedback profiles from
bucks," as apportioned by the requester, in return for the
useful recommendation
informon recipients that indicate a lack of credibility cause 5
Furthermore, certain' embodiments are preferred to be
the adaptive credibility profile, for the informon author to
, IP
,
• •
. ,. ,
„ f,.
. \.
...
.
reflect untrustworlhinessHowever, the growth and dcclina«K*P«»n«»g " thaI *>™ °r a»°f *e adaptive filters used
lion of credibility are not "purely democratic," in the sense
f the syslem ^"^.cally seek optimal values for the
that one's credibility is susceptible to the bias of others1
functlon lnIended bv Ihe filler. «**. content analysis,
perceptions, so the growth or declination of one's credibility
collaboration, credibility, reliability, etc.
member client is viewed by other member clients.
preferences of individual member clients and communities.
nities may be used to establish those member clients as
sutncr nccds in a timcly man"cr
is generally proportional to how the credibility of the new
T"6 filter structure herein is capable of identifying, the
Member clients can put their respective reputations "on
providing direct and inferential consumer preference
the line," and engage in spirited discussions which can be
information, and tracking shifts in Ihe preferences whether
refcreed by other interested member clients. The credibility
lhe shifts be gradual or sudden. The consumer preference
profile further can be partitioned to permit separate credinformation can be used to target particular consumer prefibility sub-profiles for the credibility of the content of the
erence S^F5' or c°h°f|s. ™d provide members of the
informon, the author, the author's community, the reviewers,
cohort wlth lar8eled informons relevant to their consumer
and the like, and can be fed back to discussion participants,
preferences. This information also may be used to follow
reviewers, and observers to monitor the responses of others M dcmographical shifts so that activities relying on accurate
to the debate. The adaptive credibility profiles for those
demographical data, such as retail marketing, can use the
member clients with top credibility ratings in their commuconsumer preference information to anticipate evolving con"experts" in their respective communities.
To provide a basis for adaptation, it is preferred that each
With this functionality, additional features can be ,s raw informon be processed into a standardized vector, which
implemented, including, for example, "instant polling" ona "
may be on lne order of 20,000 to 100,000 tokens long. The
matter of political or consumer interest. In conjunction with
learning and optimization methods that ultimately are choboth content and collaborative filtering, credibility filtering,
sen are preferred to be substantially robust to Ihe problems
and the resulting adaptive credibility profiles, also may be
which can be presented by such high-dimensional input
used to produce other features, such as on-line consultation 30 sPaces- Dimensionality reduction using methods such as the
and recommendation services. Although the "experts" in the
singular value decomposition (SVD), or auto-encoding neucommunities most closely related to the topic can be
ra| networks attempt to reduce the size of the space while
afforded special status as such, member clients from other
initially retaining the information contained in the original
communities also can participate in the consultation or
representation. However, the SVD can lose information
recommendation process.
In one embodiment of the consultation service, credibility
filtering can be augmented to include consultation filtering.
3J during the transformation and may give inferior results. Two
adaptation/learning methods that are presently preferred
include the TF-IDF technique and lhe MDL technique.
With this feature, a member client can transmit an informon
to the network with a request for guidance on an issue, for
FIG. 1 illustrates one embodiment of an information
filtering apparatus 1 structured for search engine implemcn-
example, caring for a sick tropical fish. Other member 40 Iat'on m accordance with the invention as described subsc-
clients can respond to the requester with informons related
quently herein in connection with FIGS. 8 and 9. In general,
lo the topic, e.g., suggestions for water temperature and
a dala stream is conveyed through network 3, which can be
antibiotics. The informons of the responders can include
a global internet work. A skilled artisan would recognize that
apparatus 1 can be used with other types of networks,
their respective credibility profiles, community membership,
and professional or avocational affiliations. The requester 45 including, for example, an enterprise-wide network, or
can provide feedback to each of the responders, including a
"intranet." Using network 3, User #1 (5) can communicate
rating of the credibility of the respondcr on the particular
wiIn olner users> for example, User #2 (7) and User #3 (9),
topic. Additionally, the responders can accrue quality points,
and a'so w'tn distributed network resources such as resource
value tokens, or "info bucks," as apportioned by the
#1 (U) and resource #2 (13).
requester, in return for useful guidance.
Similarly, one embodiment of an on-line recommendation
50
Apparatus 1 is preferred lo be part of computer system 16,
although User #1 (5) is not required to be the sole user of
service uses recommendation filtering and adaptive recomcomputer system 16. In one present embodiment, it is
mendation profiles to give member clients recommendations
preferred that computer system 16 having information filter
on matters as diverse as local auto mechanics and worldapparatus 1 therein filters information for a plurality of
class medieval armor refurbishers. In this embodiment, Ihe 55 users- One application for apparatus 1, for example, could be
requester can transmit the informon to Ihe network bearing
that user 5 and similar users may be subscribers to a
the request for recommendation. Olher member clients can
commercial information filtering service, which can be
respond 10 Ihe requester with informons having specific
provided by Ihe owner of computer sysiem 16.
recommendations or dis-recommcndalions, advice, etc. As
Extraction means 17 can be coupled wilh, and receives
with lhe consultation service, the informons of lhe respond- 60 dala stream IS from, network 3. Extraction means 17 can
ers can be augmented to include their respective credibility
profiles, community membership, and professional or avocational affiliations. A rating of each recommendation provided by a respondcr, rclalivc to olher responders'
identify and cxiraci raw informons 19 from dala stream 15.
Each of the raw informons 19 has an information content,
Extraction means 17 uses the adaptive content filter, and at
least pan of lhe adaptive content profile, lo analyze the data
recommendations, also can be supplied. The requester can 65 stream for Ihe presence of raw informons. Raw informons
provide feedback lo each of the rcspondcrs, including a
arc those dala entities whose contcnl identifies them as being
raling of lhe credibility of the respondcr on the particular
"in the ballpark," or of potential interest 10 a community
US 6,314,420 Bl
8
coupled to apparatus 1. Extraction means 17 can remove
duplicate informons, even if the informons arrive from
different sources, so that user resources are not wasted by
handling and viewing repetitive and cumulative information.
Extraction means 17 also can use at least part of a community profile and a user profile for User #1 (5) to determine
whether the informon content is relevant to the community
5
spaced feedback responses, to update the adaptive collaboration profile, the adaptive content profile, or both, with an
adapted future prediction 34, in order to minimize subsequent prediclion errors by the respective adaptive collaboration filter and adaptive content filter.
in one embodiment of the invention herein, it is preferred
,hat prediction means 33 be a self-optimizing prediction
means using a preselected learning technique.
of which User #1 is a part.
Filter means 21 adaptively filters raw informons 19 and
Such techniques can include, for example, one or more of
produces proposed informons 23 which are conveyed to " a op-key-word-selection learning technique, a nearestUser #1 (5) by communication means 25. A proposed
neighbor learning technique, a term-weighting learning
informon is a selected raw informon that, based upon the
technique, and a probabilistic learning technique. First adaprespective member client and community profiles, is pretation means 30 also can include a neural nelwork therein
dieted to be of particular inierest to a member client of User
and employ a neural network learning technique for adap5. Filter means 21 can include a plurality of community « tation and prediction. In one present embodiment of ihe
fillers Zla.b and a plurality of member client fillers 28«-e,
invention, the lerm-weighting learning technique is preeach respectively having community and member client
ferred to be a TF-IDF technique and the probabilistic leamprofiles. When raw informons 19 are filtered by filter means
ing technique is preferred to be a MDL learning technique
21 those informons that are predicted to be suitable for a
Firsl adaptalion means 30 further can include ^^
pariicuhr member client of a particular community, e.g., 20 adaplation ^eans 32 for adaplin al least one of Ihe ad
ive
User #1 (5), response to the respective community and
coiiaboration profiles, the adaptive content profiles! the
member client profiles, are conveyed thereto. Where such .s
dfi
«w JT
?r h P,
r,!-r lllty « lC
raw informons 19 according .0 a credulity profile.
communitv proglc> and the J, fl, responsive o at
least one of ihe other profiles. In this ma^er, trends a.irib-
„ ulable lo individual member clienls. individual users, and
25 individual communilics in one domajn of syslem 16 can be
It is preferred that the adaptive filtering performed within
recognized by, and influence, similar entities in other
domains (melding agent "minds"), contained within system
16 to the extern lhal the respective entities share common
filter means 21 by the plurality of filters 27o,fc, 2&a-e, and
35, use a self-optimizing adaptive filtering so that each of ihe
parameters processed by filters 21a,b, 28«-e, and 35, is
driven continually .0 respective values corresponding .0 a
minimal error for each individual parameter. Self-
»
error, etc., are favored to prevail.
Apparatus 1 also can include a computer slorage means
31 for slori
the
fil
inch|di
me ad
profile and |ne Jtivc coUaboratfon
opumizationencouragesadynamic,marketplace-likeopera-
tion of the system, ,n that those entities having the most
desirable value e.g. h.ghes. cred.b.hty, lowest pred.c.ed
attributes
;v*
fileP Addiun™
trend-lracking information can be stored for later retrieval in
storagc
mcans 31> or
^
cd
,0
nc,work
3> remolc analysiS( fof cxamp]e> by ^ #, (?)
Self-optimization can be effected according to respective
preselected self-optimizing adaption techniques including,
3
f
FIG. 2 illustrates another preferred embodiment of infor-
maIion filIering apparams s0, in computer system 51. Appa-
for example, one or more of a lop-key-word-selection adap-
lation technique, and a neural network learning technique. In
ratus 50 can include firsl processor 52, second processor
53a,b, third processor 64a-d, and a fourth processor 55, .0
effect the desired information filtering. First processor 52
can be coupled lo, and receive a data stream 56 from
weighting adaptation technique is preferred to be a TF-IDF
by extracling raw fnformons 58 from data streaPm &
anon technique, a nearest-neighbor adaptation technique, a
lerm-weighting adaptation technique, a probab.listic adapone present embodiment of the invention, the term-
SlTV Sn?'?
*UC PtaU°n tCChmqUe 'S PR!"
ferred to be a MDL technique.
nelwork 57. Firsl processor 52 can serve as a pre-processor
.
« sive «° ^processing profile 49 and conveying informons 58
to second processor 53fl,6.
When user 5 receives proposed informon 23 from appa-
Because of the inconsistencies presented by Ihe ncarlyinfinite individual differences in the modes of
along with the proposed informon. By answering, user 5
conceptualization, expression, and vocabulary among users
creates a feedback profile that corresponds to feedback 50 even wiihin a communily of coinciding interests, similar
response 29. User feedback response 29 can be active
notions can be described with vastly different terms and
feedback, passive feedback, or a combination. Active Teedconnotations, greatly complicating informon characierizaback can include the user s numerical rating for an
Iion. Mode variations can be even greater between disparate
informon, hmts, and indices. Hints can include like or dislike
communilies, discouraging interaction and knowlcdgeof an author, and informon source and timeliness. Indices 55 sharing among communities. Therefore, it is particularly
can include credibility agreement with consent or author,
preferred thai processor 52 create a mode-invariant reprehumor, or value. Feedback response 29 provides an actual
station for each raw informon, thus allowing fast accurate
ratus 1, user 5 is provided with multiple feedback queries
response to proposed informon 23, which is a measure of the
relevance of the proposed informon to the information need
of user 5. Such relevance feedback attempts ,0 .rnprove ,he
performance lor a particular profile by modifying the
profiles, based on feedback response 29.
informon characterization and collaborative filtering Mode-
60
invariant representations lend 10 facilitate relevant informon
selection and distribulion wilhin and
coramuniticSj
thereby promoting knowledge-sharing, thereby benefiting
,he group of inler,inked communilies, i.e., a society, as welL
A predicted response anticipated by adaptive filtering
First processor 52 also can be used to prevent duplicate
means 21 can be compared to the actual feedback response
informons, e.g., the same information from different
29 of user 5 by firsl adaplation means 30, which derives a 65 sources, from further pcnelraling, and thus consuming the
prediction error. First adaptation mcans 30 also can include
resources of, the filtering process. Other processors 53 a b
prediction mcans 33, which collects a number of temporally54<7-rf, also may be used 10 perform the duplicate informa-
US 6,314,420 Bl
10
tion elimination function, but additionally may measure the
differences between the existing informon and new infor-
ferred to be a TF-IDI-' technique and the probabilistic adap
tation technique Ls preferred to be a MDL technique.
mons. That difference between the content of the informon
the previous time the user reviewed it and the content of the
informon in its present form is the "delta" of interest.
processors 52-55 could be combined functionally so that the
learning" can be accomplished by second processor SSa,b,
alone or in concert with third processor 54a-d. Indeed, third
suitable processors. As described in the context of FIG. 1,
An artisan would recognize that one or more of the
s actual number of processors used in the apparatus 50 could
Processors 53a,b, S4a-d may eliminate the informon from
be less than, or greater than, that illustrated in FIG. 2. For
further processing, or direct the new, altered informon to the
example, in one embodiment of the present invention, first
member client, in the event that nature or extent of the
processor 52 can be in a single microcomputer workstation,
change exceeds a "delta" threshold. In general, from the
with processors 53-55 being implemented in additional
notion of exceeding a preselected delta threshold, one may ]0 respective microcomputer systems. Suitable microcomputer
infer that the informon has changed to the extent that the
systems can include those based upon the Intel® Pentiumchange is interesting to the user. The nature of this change
Pro™ microprocessor. In fact, the flexibility of design
can be shared among all of a user's member clients. This
presented by the invention allows for extensive scalability of
delta threshold can be preselected by the user, or by the
apparatus 50, in which the number of users, and the compreselected learning technique. Such processing, or "delta 15 munities supported may be easily expanded by adding
the interrelation of the several adaptive profiles and respec
tive filters allow trends attributable to individual member
processor 54a-d adapts a delta learning profile for each
clients, individual users, and individual communities in one
member client of the community, i.e. user, thus anticipating 2q domain of system 51 to be recognized by, and influence,
those changes in existing informons that the user may find
similar entities in other domains, of system 51 to the extent
"interesting."
that the respective entities in the different domains share
Second processor S3a,b can filter raw informons 58 and
common attributes.
extract proposed community informons 59a,b therefrom.
The above described system operates in accordance with
Informons 59a,b are those predicted by processor S3a,b to 25 100 for information filtering in a computer system, as
be relevant to the respective communities, in response to a
illustrated in FIG. 3, which includes providing a dynamic
community profiles 48a,b that are unique to the communi
informon characterization (step 105) having a plurality of
ties. Although only two second processors 53a,b are shown
profiles encoded therein, including an adaptive content
in FIG. 2, system SI can be scaled to support many more
profile and an adaptive collaboration profile; and adaptivcly
processors, and communities. It is presently preferred that 3U filtering the raw informons (step 110) responsive to the
second processor S3a,b extract community informons 59a,b
dynamic informon characterization, thereby producing a
using a two-step process. Where processor 52 has generated
proposed informon. The method continues by presenting the
mode-invariant concept representations of the raw
proposed informon to the user (step 115) and receiving a
processor 54a-d can be the locus for delta learning, where
informons, processor 53a,b can perform concept-based
feedback profile from the user (step 120), responsive to the
indexing, and then provide detailed community filtering of 35 proposed informon. Also, the method includes adapting at
each informon.
least one of the adaptive content profile (step 125) and the
Third processors 54a-d can receive community infor
adaptive collaboration profile responsive to the feedback
mons 59a,b from processors Sia,b, and extract proposed
profile; and updating the dynamic informon characterization
member client informons 6la~d therefrom, responsive to
(step 130) responsive thereto.
unique member client profiles 62a-d for respective ones of 40
The adaptive filtering (step 110) in method 100 can be
member clients 63a-d. Each user can be represented by
machine distributed adaptive filtering that includes commu
multiple member clients in multiple communities. For
example, each of users 64a,b can maintain interests in each
of the communities serviced by respective second processors
nity filtering (sub-step 135), using a community profile for
each community, and client filtering (sub-step 140), simi
Fourth processor 55 can include a plurality of adaptive
fillers 66a-d for each of the aforementioned profiles and
computer storage therefor. It is preferred that the plurality of
adaptive filters 66a-d be self-optimizing adaptive fillers.
155) of Ihe raw informons responsive lo an adaptive cred
ibility profile and updaling the credibility profile (step 160)
responsive to the user feedback profile. Method 100 further
can include creating a consumer profile (slcp 165) respon-
larly using a member client profile for each member client
S3a,b, and each receive separate member client informons 45 of each community. It is preferred that the filtering in
61b,c and 61a,d, respectively.
sub-steps 135 and 140 be responsive to the adaptive content
Each member client 63a-d provides respective member
profile and the adaptive collaboration profile. Method 100
client feedback 6Sa-d to fourth processor 55, responsive to
comprehends servicing multiple communities and multiple
the proposed member client informons 61a-d. Rased upon
of users. In turn, each user may be represented by multiple
the member client feedback 6Sa-d, processor 55 updates at 50 member clients, with each client having a unique member
least one of the preprocessing profile 49, community profiles
client profile and being a member of a selected community.
48a, fc and member client profiles 62a-d. Also, processor 55
It is preferred that updating the dynamic informon charac
adapts at least one of the adaptive content profile 68 and the
terization (step 130) further include predicting selected
adaptive collaboration profile 69, responsive to profiles 49,
subsequent member client responses (step 150).
48a, b, and 62a-d.
55
Method 100 can also include credibility tillering (step
Sclf-optimizalion can be effected according to a preselected 60 sive to the user feedback profile. In general, Ihe consumer
self-optimizing adaptation technique including, for example,
profile is representative of predetermined consumer prefer
one or more of a top-key-word-seleclion adaptation
ence criteria relative to Ihe communities of which the user is
technique, a nearest-neighbor adaptation technique, a terma member client. Furthermore, grouping selected ones (step
weighting adaptation technique, and a probabilistic adapta
170) of the users into a preference cohort, rcspoasive to the
tion technique. Any of the adaptive fillers 66a-d may 65 preselected consumer preference criteria, can facilitate pro
include a neural network In one present embodiment of the
viding a targeted informon (slcp 175), such as an
invention, the term-weighting adaptation technique is pre
advertisement, to the preference cohort.
US 6,314,420 Bl
11
12
FIG. 4 illustrates yet another preferred method 200. In
structure found in the article headers. For example, in
general, method 200 includes partitioning (step 20S) each
user into multiple member clients, each having a unique
addition to typical words such as "seminar" counting as
tokens, the punctuation mark "$" and the symbol "News-
member client profile with multiple client attributes and
grouping member clients (step 210) to form a multiple
5
communities with each member client in a particular community sharing selected client attributes with other member
clients, thereby providing each community with a unique
community profile having common client attributes.
group:comp.ai" are also tokens. Using noun phrases as
tokens also can be useful.
Next a vector of token counts for the document is created,
This vector is the size of the total vocabulary, with zeros for
tokens not occurring in the document. Using this type of
vector is sometimes called Ihe bag-of-words model. While
Method 200 continues by predicting a community profile 10 the bag-of-words model does not capture the order of the
(step 215) for each community using first prediction criteria,
tokens in the document, which may be needed for linguistic
and predicting a member client profile (step 220) for a
or syntactic analysis, it captures most of the information
member client in a particular community using second
needed for filtering purposes.
prediclion criteria. Method 200 also includes the steps of
Although, it is common in information retrieval systems
extraciing raw informons (step 225) from a data stream and 15 io group the tokens together by their common linguistic
selecting proposed informons (step 230) from raw inforroots, called stemming, as a next step it is preferred in the
mons. The proposed informons generally are correlated with
one or more of the common client attributes of a community,
and of the member client attributes of the particular member
present invention that the tokens be left in their unstemmed
form. In this form, the tokens are amenable to being classifted into mode-invariant concept components,
posed informons permits the updating of Ihe first and second
vocabulary and conceptualization. Each community can
client to whom the proposed informon is offered. After 2"
Creating a mode-invariant profile (slep 305), C, includes
providing the proposed informons to the user (step 235),
creating a conceptual representalion for each informon, A,
receiving user feedback (step 240) in response to Ihe pro,hat is invariani wilh respecl ,0 lhe form-of-expression, e.g.,
prediction criteria (step 245) responsive to the user feed- ^
kack-
Method 200 further may include prefiltering the data
stream (step 250) using the predicted community profile,
with the predicted community profile identifying the raw
informons in the data stream.
consisl o[ a «Meta-U-Zine" collection, M, of informons.
25 Based upon profile C, the appropriate communities, if any.
for each informon in the data stream are selected by conceptbased indexing (step 310) into each M. That is, for each
concept C that describes A, put A into a queue QM, for each
M which is related to C. It is preferred that there is a list of
Step 230 of selecting proposed informons can include
filtering the raw informons using an adaptive content filter
(step 255) responsive to the informon content; filtering the
raw informons using an adaptive collaboration filter (step
260) responsive to the common client attributes for the
Ms that is stored for each concept and that can be easily
index-searched. Each A that is determined to be a poor fit for
a particular M is eliminated from further processing. Once
A has been matched with a particular M, a more complex
community profile PM is developed and maintained for each
an adaptive member client filler (step 265) responsive to the
unique member client profile.
determine whether it matches l'M strongly enough to be
retained or "weeded" out (step 325) at this stage.
pertaining community; and filtering the raw informons using 35 M (step 315). If A has fallen into QM, then A is analyzed to
Il is preferred that updating the first and second prediction
Each A for a particular M is sent to each user's personal
criteria (step 245) employ a self-optimizing adaptation 40 age"t, or member client U of M, for additional analysis
technique,, including, for example, one or more of a topbased on the member client's profile (step 325). Each A that
key-word-selection adaptation technique, a nearest-neighbor
?lls U's interests sufficiently is selected for U's personal
adaptation technique, a term-weighting adaptation
informon, or "U-Zine," collection, Z. Poor-filling informons
technique, and a probabilistic adaptation technique. It is
*& eliminated from placement in Z (step 330). This userfurther preferred that the term-weighting adaptation tech- 45 leve' sla8e of analysis and selection may be performed on a
nique be a TF-IDF technique and the probabilistic adaptacentralized server site or on the user's computer,
lion technique be a minimum description length technique.
Next, the proposed informons are presented to user U
The information filtering method shown in FIG. 5 pro(sleP 33S) for review. User U reads and rates each selected
vides rapid, efficient data reduction and routing, or filtering,
A found in Z (step 340). The feedback from U can consist
to the appropriate member client. The method 300 includes 50 °^ a rating for how "interesting" U found A to be, as well as
parsing the data stream into tokens (step 301); creating a
one or more °f tne following:
mode-invariant (MI) profile of the informon (step 305);
selecting the most appropriate communities for each
Opinion feedback: Did U agree, disagree, or have no
opinion regarding the position of A?
informon with regard to its fit within each community;
Informon Qualities: How does lhe user rate the informons
informon, based on the Ml profile, using concept-based
indexing (slep 310); detailed analysis (step 315) of each 55
eliminating poor-fitting informons (step 320); detailed filtcring of each informon relalive to fit for each member client
(step 325); eliminating poor-fitting informons (step 330);
presenting the informon Io lhe member client/user (step 60
335); and obtaining the member client/user response, includ-
ing multiple ratings for different facets of the user's response
to the informon (slep 340).
II is preferred that coherent portions of the dala stream.
i.e., potential raw informons, be first parsed (slep 301) into 65
generalized words, called tokens. Tokens include punctua-
lion and other specialized symbols thai may be part of lhe
Credibility Feedback: Did U find the facts, logic, sources,
and quotes in A to be truthful and credible or nol?
qualities, for example, "interestingness," credibility,
funniness, content value, writing quality, violence
content, sexual content, profanity level, business
imporiance, scientific merit, surprise/unexpectedness
of information content, artistic quality, dramatic appeal,
cnlertainmenl value, trendiness/importance to future
directions, and opinion agreement.
Specific Reason Feedback: Why did the user like or
dislike A?
Because of the authority?
Because of the source?
US 6,314,420 Bl
14
13
Ms, that are related to any concept C, may be Iooked-up.
Furthermore, when a Z is created by U, the concept clues
Because A is out-of-date (e.g. weather report from 3
weeks ago)?
Because the information contained in A has been seen
already? (I.e., the problem of duplicate information
delivery)
Categorization Feedback: Did U liked A? Was it placed
within the correct M and Z?
Such multi-faceted feedback queries can produce rich feed
back profiles from U that can be used to adapt each of the
5
given by U to the information filter can be used to determine
a set of likely concepts C that describe what U is seeking.
For example, if U types in "basketball" as a likely word in
the associated Z, then all concepts that have a high positive
weight for the word "basketball" are associated with the new
z. If no such concepts C seem to pre-exist, an entirely new
concept C is created that is endowed with the clues U has
profiles used in the filtering process to some optimal oper- 10 given as the starting profile.
To augment the effectiveness of concept-based indexing,
ating point.
it is preferred to provide continual optimization learning. In
One embodiment of creating a MI profile (step 305) for
general, when a concept C no longer uniquely triggers any
each concept can include concept profiling, creation, and
documents that have been classified and liked by member
optimization. Broad descriptors can be used to create a
substantially-invariant concept profile, ideally without the 15 clients U in a particular community M, then that M is
removed from the list of M indexed into by C. Also, when
word choice used to express concept C. A concept profile
there appears to be significant overlap between articles
can include positive concept clues (PCC) and negative
fitting concept C, and articles that have been classified by
concept clues (NCQ. The PCC and NCC can be combined
users as belonging to M, and if C does not currently index
by a processor to create a measure-of-fit that can be com
pared to a predetermined threshold. If Ihe combined effect of 20 into M, then M can be added to the list of M indexed into
by C. The foregoing heuristic for expanding the concepts C
the PCC and NCC exceeds the predetermined threshold,
that are covered by M, can potentially make M too broad
then informon A can be assumed to be related to concept C;
and, thus, accept loo many articles. Therefore, it further is
otherwise it is eliminated from further processing. PCC is a
preferred that a reasonable but arbitrary limit is set on the
set of words, phrases, and other features, such as the source
or the author, each with an associated weight, that tend to be 25 conceptual size covered by M.
With regard to the detailed analysis of each informon A
in A which contains C. In contrast, NCC is a set of words,
with respect to the community profile for each M, each A
phrases, and other features, such as the source or the author,
must pass through this analysis for each U subscribing to a
each with an associated weight that tend to make it more
particular M, i.e., for each member client in a particular
unlikely that A is contained in C. For example, if the term
"car" is in A, then it is likely to be about automobiles. 30 community. After A has passed that stage, it is then filtered
at a more personal, member client level for each of those
However, if the phrase "bumper car" also is in A, then it is
users. The profile and filtering process arc very similar for
more likely that A related to amusement parks. Therefore,
both the community level and the member client level,
"bumper car" would fall into Ihe profile of negative concept
except that at the community level, the empirical data
clues for the concept "automobile."
Typically, concept profile C can be created by one or more 35 obtained is for all U who subscribed to M, and not merely
an individual U. Other information about the individual U
means. First, C can be explicitly created by user U.
can be used to help the filter, such as what U thinks of what
Second, C can be created by an electronic thesaurus or
a particular author writes in other Zs that the user reads, and
similar device that can catalog and select from a set of
articles that can't be used for the group-level M processing.
concepts and the words that can be associated with that
FIG. 6 illustrates the development of a profile, and its
concept. Third, C can be created by using co-occurrence 40
associated predictors. Typically, regarding the structure of a
information that can be generated by analyzing the content
profile 400, the information input into the structure can be
of an informon. This means uses the fact that related features
of a concept lend to occur more often within Ihe same
divided into three broad calegories: (I) Structured Feature
Information (SFI) 405; (2) Unstructured Feature Informa
analysis of collections, H, of A that have been rated by one 45 tion (UFI) 410; and (3) Collaborative Input (CI) 415. Fea
tures derived from combinations of these three types acl as
or more U. Combinations of features that tend to occur
repeatedly in II can be grouped together as PCC for the
additional peer-level inputs for the next level of the rating
analysis of a new concept. Also, an A that one or more U
prediction function, called (4) Correlated-Feature, Errorhave rated and determined not to be within a particular Z can
Correction Units (CFECU) 420. From inputs 405,410,415,
be used for the extraction of NCC.
50 420, learning functions 425a-d can be applied to get two
document than in general. Fourth, C can be created by the
Concept profiles can be optimized or learned continually
after their creation, with the objective that nearly all As that
computed functions 426a-d,428a-d of the inputs. These two
functions are the Independent Rating Predictors (IRP)
Us have found interesting, and belonging in M, should pass
426a-d, and the associated Uncertainty Predictors (UP)
Ihe predetermined threshold of at least one C that can serve
428a-d. IRPs 426a-d can be weighted by dividing them by
as an index into M. Another objective of concept profile 55 their respective UPs 428a-d, so that the more certain an IRP
management is that, for each A that docs not fall into any of
426a-d is, the higher its weight. Each weighted IRP429<7-
the one or more M that are indexed by C, the breadth of C
is brought together with other IRPs 429a-d in a combination
is adjusted to preserve the first objective, insofar as possible.
function 427-d. This combination function 427a-d can be
For example, if C's threshold is exceeded for a given A, C's
from a simple, weighted, additive function to a far more
breadth can be narrowed by reducing PCC, increasing NCC, 60 complex neural network function. The results from this are
normalized by the total uncertainty across all UPs, from
In the next stage of filtering, one embodiment of contentCertain=zero to Uncertain°infinity, and combined using the
based indexing takes an A thai has been processed into the
Certainty Weighting Function (CWF) 430. Once the CWF
set of C that describe it, and determine which M should
430 has combined the IRPs 426a-d, it is preferred that result
accept the article for subsequent filtering, for example, 65 432 be shaped via a monolonically increasing function, to
detailed indexing of incoming A. It is preferred that a data
map to the range and distribution of the actual ratings. This
structure including a database be used, so that the vector of
function is called the Complete Rating Predictor (CRP) 432.
or both, or by increasing the threshold for C.
US 6,314,420 Bl
16
15
SF1405 can include vectors of authors, sources, and other
features of informon A that may be influential in determining
the degree to which A falls into the categories in a given M.
UFI 410 can include vectors of important words, phrases,
and concepts that help to determine the degree to which A
falls into a given M. Vectors can exist for different canonical
parts of A. For example, individual vectors may be provided
for subject/headings, content body, related information in
other referenced informons, and the like. It is preferred that
a positive and negative vector exists for each canonical part.
Q 41S is received from other Us who already have seen
A and have rated it. The input used for CI 415 can include,
for example, "interestingness," credibility, funniness, con
tent value, writing quality, violence content, sexual content,
profanity level, business importance, scientific merit,
surprise/unexpectedness of information content, artistic
quality, dramatic appeal, entertainment value, trendincss/
"significance" is used is in a statistical sense, and frame
works such as the Minimum Description Length (MDL)
Principle can be used to determine when to store or use a
more "local" component of the IRP. As a simple example,
the following IRP employs only two of the above terms:
lRP(aulhor)=>wcighled sum of
10
15
importance to future directions, and opinion agreement.
Each CFECU 420 is a unit that can detect sets of specific
feature combinations which are exceptions in combination.
For example, author X's articles are generally disliked in the
Z for woodworking, except when X writes about lathes.
average (ratings given this author so far in this M)
average (ratings given all authors so far in this M)
Table 2 gives the values attained for the four new articles.
It is preferred that an estimate of the uncertainty resulting
from a positive or negative IRP be made, and a complex
neural net approach could be used. However, a simpler
method, useful for this example, is simply to repeal the same
process that was used for the IRP but, instead of predicting
the rating, it is preferred to predict the squared-error, given
the feature vector. The exact square-error values can be used
as the informon weights, instead of using a rating-weight
lookup table. A more optimal mapping function could also
be computed, if indicated by the application.
20
Token 1
Token 2
Token 3
Token 4
IRP pas. vector
16.68
8.73
12.89
11.27
IRP ncg. vector
15.20
8.87
4.27
5.04
When an informon authored by X contains the concept of
"lathes," then the appropriate CFECU 420 is triggered to
signal that this is an exception, and accordingly a signal is
sent to offset the general negative signal otherwise triggered
25
because of the general dislike for X's informons in the
The UPs then can be computed in a manner similar to the
IRP's: comparisons with the actual document vectors can be
woodworking Z.
As an example the form of Structured Feature Information
made to gel a similarity measure, and then a mapping
(SF1) 405 can include fields such as Author, Source, 30
function can be used to gel an UP.
Information-Type, and other fields previously identified to
Making effective use of collaborative input (Cl) from
be of particular value in the analysis. For simplicity, the
other users U is a difficult problem because of the following
exemplary SFI, below, accounts only for the Author field.
seven issues. First, there generally is no a priori knowledge
For this example, assume three authors A, B, and C, have
regarding which users already will have rated an informon
collectively submitted 10 articles that have been read, and 35
A, before making a prediction for a user U, who hasn't yet
have been rated as in TABLE 1 (following the text of this
read informon A. Therefore, a model for prediction must be
specification. In the accompanying rating scheme, a rating
operational no matter which subset of the inputs happen to
can vary between 1 and 5, with 5 indicating a "most
be available, if any, at a given time. Second, computational
interesting" article. If four new articles (11,12,13,14) arrive
efficiency must be maintained in light of a potentially verythat have not yet been rated, and, in addition to authors A, 40
large set of users and informons. Third, incremental updates
B, C, and a new author D has contributed, a simple IRP for
of rating predictions often are desired, as more feedback is
the Author field, that just lakes sums of the averages, would
reported from users regarding an informon. Fourth, in learn
be as follows:
ing good models for making rating predictions, only very
lRP(author)=weighted sum of
sparse data typically is available for each users rating of each
45
average(ratings given the author so far)
document. Thus, a large "missing data" problem must be
averagc(ratings given the author so far in this M)
dealt with effectively.
averagc( ratings given all authors so far in this M)
Fifth, most potential solutions to the CI problem require
average(ratings given all authors)
independence assumptions that, when, grossly violated, give
averagc(ratings given the author so far by a particular
very poor results. As an example of an independence
50
user U)*
assumption violation, assume that ten users of a collabora
average(ratings given the author so far in this M by a
tive filtering system, called the "B-Team," always rate all
particular user U)*
articles exactly in the same way, for example, because they
average(ratings given all authors so far in this M by a
think very much alike. Further assume that user A's ratings
particular user U)*
are correlated with the B-Team at the 0.5 level, and are
avcragc(ratings given all authors by a particular user)* 55
correlated with user C at the 0.9 level. Now, suppose user C
* (if for a personal Z)
reads an article and rates it a "5". Based on that C's rating,
The purpose of the weighted sum is to make use of
it is reasonable to predict that A's rating also might be a "5".
broader, more general statistics, when strong statistics for a
particular user reading an informon by a particular author,
within a particular Z may not yet be available. When
stronger statistics arc available, the broader terms can be
eliminated by using smaller weights. This weighting scheme
is similar to that used for creating CWFs 430, for the profiles
as a whole. Some of the averages may be left out in the
actual storage of the profile if, for example, an author's
average rating for a particular M is not "significantly"
different from the average for the author across all Ms. Here,
60
Further, suppose that a member of the B-Team reads the
article, and rates it a "2". Existing collaborative filtering
methods are likely to predict thai A's rating RA would be:
R,,-(0.9x5h-0.5x2)/(0.9+0.5)=3.93
In principle, if other members of the B-Tcam then read and
65
rale the article, it should not affect the prediction of A's
rating, RA, because it is known that other B-Tcam members
always rate the article with the same value as the first
US 6,314,420 Bl
17
member of the B-Team. However, the prediction for A by
existing collaborative filtering schemes would tend to give
10 times the weight to the "2" rating, and would be:
R,-(0.9x5+10x0.5x2)/(0.9+10x0.5)-2.46
18
comparing the similarity between the profiles. Next, the
similarity of the member client profiles and informon content profiles can be compared. A neural network could be
used to learn how to compare profiles so that the error in
predicled ralings is minimized. However, the invention can
be embodied with use of a simple cosine similarity metric,
Existing collaborative filtering schemes do not work well in
like that previously considered in connection with Unstructhis case because B-Team's ratings are not independent, and
tured Feature Information (UFI) can be used.
have a correlation among one another of 1. The information
filter according to the present invention can recognize and
yhe method used is preferred to be able to include more
10 Il]an jusl ,ne tokens> such ^ the author and other SFI; and,
compensate for such inter-user correlation.
Sixth, information about the community of people is
jt is preferrej tnal ,he three vectors for component also are
known, other than each user's ratings of informons. This
ab|e ,0 ^ C0ll)paret]. sFIs may be handled by transforming
information can include the present topics the users like,
mem m(o an cntjly lnat cim bc lrta,cli ;n a comparable way
what authors the users like, etc. This information can make
,0 ,oken frequencjes ttiat can be multiplied in the standard
the system more effective when it is used for learning 15 Ioke[) frequency comparison method, which would be recstronger associations between community members. For
ognized by a skilled artisan.
example, because Users A and B in a particular community
ConIinuing in lhe ongoing eXiimplc> lhe Author field may
M have never yet read and rated an informon in common, no
correlation between their likes and dislikes can be made,
based on common ratings alone. However, users A and B 2U
have both read and liked several informons authored by the
same author, X, although Users A and B each read a
distinctly different Zs. Such information can be used to make
the inference that there is a possible relationship between
user A's interests and user B's interests. For the most part, -5
existing collaborative filtering systems can not take advan
tage of this knowledge.
Seventh, information about the informon under consider
ation also is known, in addition to the ratings given it so far.
For example, from knowing that informon A is about the *>
concept of "gardening", better use can be made of which
users' ratings are more relevant in the context of the infor
mation in the informon. If user B's rating agrees with user
D's rating of articles when the subject is about "politics", but
B's ratings agree more with user D when informon A is 3S
about "gardening", then the relationship between User B's
ratings and User D's ratings are preferred to be emphasized
to a greater extent than the relationship between User B and
User C when making predictions about informon A.
ioken
for
t
known, information about the community and the informon,
it is possible to determine the influence of user A's rating of
an informon on the predicted rating of the informon for a
common "mindset," Sw between user A and user B and 50
informon D, that may be expressed as:
M^profilefA) X profile(B) X DocumentProfile(D).
Second, a correlation may be taken between user A's past
ratings and user B's past ratings with respect to informons
that are similar to D. This correlation can be taken by
weighting all informons E that A and B have rated in
common by the similarity of E to D, SED:
^^
a
correlation of each author under comparife funclion p f ^
|e ^
£
an'd usef B*
q(^ auth
^fa the
monotonically-increasing function of the
J
\ and
fB M
a sj
^
an
*
^P
*
^
^
read P
e
constraints according to the
* J
when lhere has ^ nQ
q{ ^ zefo
user
soffie
amhor ^
h
raicd
^
^
Jn
thjs
cag
J
[ P
intentionally avoiding reading inforsubsequent article that has
been read but is not prepared bv the author. In general, the
exact weighting function and parameters can bc empirically
derived rather ,,)an ,heoretically derived, and so Ls chosen by
(he
limizalion of lh<; ovcran ralinj, prediction functions.
Continuing in the present example, a correlation can be
com
led wilh the rollowin& W(.ighls for Ihe aulhors K, L
a[)d ^
S£D=Weighled_Correlation(ratings(A),ratings(B))
Author
Weight
W^ = weight for rating pair (rating (A. £)). rating (B. DU
• togpl ♦ 1) x log(l «l|x Gfnot author)
-004
F(5.7. author or D)
" log(5 + 1) x log(7 + 1) x G(Milhoi)
Note that the "X" in the above equation may not bc a mere
multiplication or cross-product, but rather bc a method for
(he
exact value is an increasing function H of the total articles
read by a particular user so far, because it becomes more
Each of the examples can be weighted by
= DocumemProfile (E)x DocummtProfiletD)
fc
of whether the informon D is by the author or not
can be;
^
second user, B. For example, where user A and user B have 45
read and rated in common a certain number of informons,
the influence of user A's rating of informon D on the
predicted rating of informon D for user B can be defined by
a relationship that has two components. First, there can be a
as follows:
component of the member client
and ^ £
for each f
With regard to the aforementioned fourth, sixth and 40
seventh issues namely, making effective use of sparse, but
vcctor m
fi)cs f
£
^
P
f
- 0.70
US 6,314,420 131
19
20
FIO. 7 illustrates a preferred embodiment of a mind pool
-continued
Auihor
M
Weighi
;
svstem 50°- U fe Preferred that aU users be members of |he
T
I(°K26)°x C(2 + i)« G(not amho,)
"
. o.O2
5
uppermost portion of the hierarchy, namely, the top mind
pool 501. Mind pool 501 can be broken into sub-mindpools
502a-e, which separate users into those having at least some
common interests. Furthermore, each sub-mind pool 502«-c
can be respectively broken into sub-sub-mindpools 503a-o,
I, is preferred that, ,1,,, irwiriihrn he used as the
the logarithm be used as the
m olonfi^^
used are H=log(sample_Ji2,.0.1) and an assumed rating, for
503c-<*, SO5e,f,g to which users 504a-g are respective
members. As used herein, mind pool 501 is h p
^^s. As used_h«ein, mind pool SO^fethe^arent^ode
those authors who arc unrated by a user, to the value of "2."
respechve parent nodes to sub-sub-mindpools S03«-g Subpools 502a-c arc the child nodes to mind pool 501 and
one-neuron neural network, or a linear function such as,
(correlation* 1)*PO. Where the Po is an optimized parameter
502a, 502c, if such more closely matches their interests than
would membership in a sub-sub-mind pool 503a-g. In
nize that there are numerous methods that can be used to
effect informon comparisons, particularly document com-
hy autjjOr A l0 he interesting but nevertheless found other
^^^ by author A on "gardening" to be uninteresting may
ing technique in conjunction with the cosine similarity
A processing means or mind pool manager may be used
The correlation for the author' SFI can be mapped to a
sub-pools 503a-g are child nodes to respective mindpools
non-zero range, so that it can be included in the cosine
503a-c. Sub-pools 503a-g can be considered to be end
similarity metric. This mapping can be provided by a simple 15 nodes. Users 505a,i> can be members of sub-mind pool
used to produce the predicted ratings with the lowest error
general, the objective is to break down the entire population
in the given domain for filtering.
of users into subsets that are optimally similar. For example.
An artisan skilled in information retrieval would recog- 2Q meseiofuserswho find the same articles about "gardening"
parisons. One preferred method is to use a TF-1DF weight-
metric. SH including author, can be handled by including
be . [ned fa one subse,
P
management of each of the mindpools 501,
ponent of the relationship between user A's and user B's can
child-node mind pool managers and from those users
simple additive function, a product function, or a complex
manager's parent node, if such exists; (3) receiving estima-
straints encountered in the application. Optimization of the
exists; and (4) making estimations of the mind pool con-
be combined to produce the function to predict the rating of
coupled directly to the manager; (2) passing rating informainformon D for user B. The combination function can be a 30 lion or compiled statistics of the rating information up to the
function, including, for example, a neural network mapping
function, depending upon computational efficiency con-
tions of the mind pool consensus on the rating for an
informon from the manager's parent mind pool, if such
combination function can be achieved by minimizing the 35 sensus on the rating for a specific informon for the users that
predicted rating error as an objective.
In addition to determining the relationship between two
user's ratings, a relationship that can be used and combined
across a large population of users can be developed. This
ibl
h
fid fi
laborative input. Specifically, the difficuly
pyg
user rating relationship across a large population of users is
compounded by the lack of a priori knowledge regarding a
come under the manager's domain; and (5) passing the
estimations from function 4 down to either a child-node
mind pool or, if the manager is an end node in the hierarchy,
to the respective user's CWF, for producing the user's
ditd
ti
Function 4 also can include combining the
y
size, standard deviation, etc. Furthermore, as alluded to
above, users can be allowed to belong to more than one mind
large volume of dynamically changing information that may 45 pool if they don't fit precisely into one mind pool but have
have unexpected correlation and therefore grossly violate
multiple views regarding the conceptual domain of the
to the aforementioned "community" or may instead be one
managers) first decide whether (he rating will effect its
independence assumptions.
informon. Also, it is preferred that lateral communication be
In one embodiment of the present invention, it is preferred
procided between peer managers who have similar users
that users be broken inlo distributed groups called "mindbeneath them to share estimation information. When a rating
pools." Mindpools can be purely hierarchical, purely so comes in from a user, it can be passed to the immediate
parallel, or a combination of both. Mindpools can be similar
manager(s) node above that user. It is preferred that the
of many subcommunities. These multiple hierarchies can be
current estimation or whether the statistics should be passed
used to represent different qualities of an article. Some
upward 10 a parent-node. If the manager estimation would
qualities that can be maintained in separate hierarchies 55 change by an amount above an empirically-derived miniinclude: interestingness; credibility; funniness; valuablemum threshold, then the manager should pass that estimaness; writing quality; violence content; sexual content; protion down to all of its child-nodes. In the event that the
fanity level; business importance; scientific merit; ariislic
compiled statistics arc changed by more than another miniquality; dramatic appeal; entertainment value; surprise or
mum threshold amount, then the compiled statistics should
unexpectedness of information content; Irendiness or impor- 50 be passed to the manager's parent-node, if any ,and the
lance to future directions; and opinion agreement. Each of
process recurses upward and downward in the hierarchy,
these qualities can be optionally addressed by users with a
Because no mind pool manager is required to have
rating feedback mechanism and, therefore, these qua lilies
accurate information, bin just an estimation of the rating and
can be used to drive separate mind pool hierarchies. Also,
an unccriainly level, any manager may respond with a
the qualities can be used in combinations, if appropriate, to 65 simple average of all previous documents, and with a higher
develop more complex composite informon qualities, and
degree of uncertainty, if none of its child-nodes has any
more sublime mindpools.
rating information yet. The preferred distributed strategy
US 6,314,420 Bl
22
21
methods described above. Approximations can be made by
pre-computing all terms that do not change significantly,
based on the particular informon, or the subset of actual
tends lo reduce the communication needed between
processors, and the computation tends to be pooled, thereby
eliminating a substantial degree of redundancy. Using this
distributed strategy, Ihe estimations tend to settle to the
ratings given so far to the mind pool manager.
As stated previously, the correlated-feature error-
extent that the updating of other nodes, and the other users
correction units (CFECUs) are intended to detect irregulari
ties or statistical exceptions. Indeed, two objectives of the
CFECU units are to (1) find non-linear exceptions to the
general structure of the three aforementioned types of inputs
(SFI, UFI, and CI); and (2) find particular combinations of
informon sub-features thai statistically stand out as having
special structure which is not captured by the rest of the
general model; and (3) trigger an additional signal to the
CFECU's conditions are met, in order to reduce prediction
error. The following exemplifies the CFECU operation.
predictions are minimized. Therefore, as the number of
informons and users becomes large, the computation and
prediction updates grow as the sum of the number of
informoas and the number of users, rather than the product
of the number of informons and the number of users. In
addition, incremental updates can be accomplished by the
passing of estimations up and down the hierarchy. Incre
mental updates of rating predictions continue to move until
the prediction becomes stable due to the large sample size.
10
The distributed division of users can reduce the effects of
15
independent assumption violations. In the previous example
with the B-Team often users, the B-Team can be organized
as a particular mind pool. With the additional ratings from
each of the B-Team members, the estimation from the
B-Team mind pool typically does not change significantly
because of the exact correlation between the members of
that mind pool. This single estimation then can be combined
with other estimations to achieve the desired result, regard
less of how many B-Team members have read the article at
20
any given time.
25
The mind pool hierarchies can be created by either
computer- or human-guided methods. If the hierarchy cre
ation is human-guided, there often is a natural breakdown of
people based on information such as job position, common
interests, or any other information that is known about them.
30
Where the mind pool hierarchy is created automatically,
because the previously described measure of the collabora
tive input relationship between users can be employed in a
standard hierarchical clustering algorithm to produce each
group of users or nodes in the mind pool hierarchy. Such 35
In this example, it is desired that author A's informon D
standard hierarchical clustering algorithms can include, for
about gardening have a high predicted rating for user B.
example, the agglomerative method, or the divide-andHowever, because the average rating for author A by user B
conquer method. A skilled artisan would recognize that
is only 1.69, and the average rating for the gardening
many other techniques also are available for incrementallyadjusting the clusters as new information is collected. 40 concept is only 1.68, a three-part model (SFI-UFI-C1) that
does not evaluate the informon features in combination
Typically, clustering is intended to (1) bring together users
would tend to noi rank informon D very highly. In this case,
the first CFECU would firsl find sources of error in past
examples. This could include using the three-part model
pendent among one another.
Estimations are made in a manner similar to other esti 45 against the known examples that user B has rated so far. In
this example, seven articles that user B has rated, have an
mations described herein. For example, for each user or
average rating of 4.5, though even the three-part model only
sub-mind pool (sub-informant), a similarity between the
predicts a rating of about 1.68. When such a large error
sub-informant and the centroid of the mind pool can be
appears, and has statistical strength due to the number of
computed in order to determine how relevant the sub-
whose rating information is clearly not independent; and (2)
produce mind pool estimations that are substantially inde
informant is in computing the estimation. Uncertainty esti- 50 examples with the common characteristics of, for example,
mators also are associated with these sub-informants, so that
the same author and topic, a CFECU is created to identify
that this exception lo ihe three-part model has been triggered
and that a correction signal is needed. Second, it is preferred
they can be weighted with respect to their reliability in
providing the most accurate estimation. Optionally, the
informon under evaluation can be used lo modulate the
relevancy of a sub-informant. 'Iliis type of evaluation also
can take advantage of the two previously-determined col
laborative information relationship components, thereby
tending to magnify relationships that are stronger for par
ticular types of informons than for others. Once a suitable set
of weights are established for each user within a mind pool
for a particular informon, a simple weighled-avcrage can be
used to make the estimation. It is preferred that the "simple"
weighted average used is more conservative regarding input
information that a simple independent linear regression.
Also, the overall Uncertainty can be derived from Ihe
Uncertainty Prcdiclions of the sub-informants, in a manner
similar lo the produclion of other uncertainty combination
55
to index the new CI-'LCU into a database so thai, when
triggering features appear in an informon, for example,
author and topic, the correction signal is sent into the
appropriate CWF. One method which can be used to effect
the first step is a cascade correlation neural network, in
which the neural net finds new connection neural net units
65
to progressively reduce the prediction error. Another method
is to search through each informon lhai has been rated but
whose predicted raling has a high error, and storing Ihe
informons profile.
When "enough" informons have been found with high
error and common characteristics, ihe common characteris
tics can be joined together as a candidate for a new CFECU.
Ncxi, the candidate can be tested on all Ihe samples, whether
US 6,314,420 Bl
23
24
they have a high prediction or a low prediction error
regular search engine results. As shown in the logic diagram
of Fl G. 7, a user provides a query as indicated by block 20C.
The query is applied to a Lookup Table, as indicated by
block 22C, block 24C applies a test to determine from the
users'profiles, they also can be added to a database of
engine.
mation of the predicted rating from a particular CFECU can
be made by taking the average of those informons for which
the CFECU responds. Also, the Uncertainty can be chosen
such that the CFECU signal optimally outweighs the other
tion of previous users' feedback data. By analyzing docu
associated with them. Then, the overall error change
(reduction or increase) for all of the examples can be
computed to determine if the CFECU should be added to the
informon profile. If the estimated error reduction is greater 5 table whether a wire already exists for the new query. If so,
block 26C returns results from the existing wire. Otherwise,
than a minimum threshold level, the CFECU can be added
block 28C commands a demand search by a regular query
to the profile. As successful CFECU are discovered for
CFECU's that may be useful for analyzing other profiles. If
With the use of wire search returns, each user can review
a particular CFECU has a sufficiently broad application, it 10 the returned results and provide feedback data about
can be moved up in the filtering process, so that it is
reviewed documents. Such feedback data is incorporated in
computed for every entity once. Also, the particular CFECU
the filter profiles used in processing informons for the wire.
can be included in the representation that is computed in the
Therefore, when a future user makes substantially the same
pre-processing stage as a new feature. In general, the esti
query, the wire will have been improved by the incorpora
ments which users rate as meeting a particular quality such
as interestingness, the system can find common document
features which can be used to return more like documents to
signals being sent to the CWF. One method of selffuture users who make substantially the same query.
optimization that can be employed is, for example, the
Alternatively, all queries applied to a search engine sys
gradient descent method, although a skilled artisan would 20
tem of the invention can set up new wires. After a search
recognize that other appropriate optimization methods may
query is presented to the search engine system, a wire is
be used.
created on the basis of the query terms, and all new
The invention of this continuation-in-part application, as
documents subsequently received from the network are
shown in FIGS. 8 and 9, provides a collaborative and
preferably adaptive search engine system in which elements 25 filtered by the new wire. A push-model may be used to send
all passed, new documents to the user.
of the structure and principles of operation of the apparatus
of FIGS. 1-7 are applied. Accordingly, a search engine
system of the invention, as preferably embodied, integrates
collaborative filtering with adaptive content-based filtering
to provide improved search engine performance. The acro
nym "CASE" refers to a search engine system of the
invention, i.e., a collaborative, adaptive search engine.
In the operation of conventional search engines at portal
web sites., user queries are searched on demand to find
relevant informons across the web. Content-based filtering is
typically used in measuring the relevacy of informons, and
the search results are resented in the form of a list of
Among other basic search engine system structures, an
integrated system can be employed in which collaborative
and content-based filtering is structured to provide demand
30 searches with or without collaborative filtering, or wire
searches. In the operation of the preferred basic structure and
other basic structures, a query processor can be employed, if
needed, to make search-type assignments for user queries.
Generally, basic search engine system structures of the
35 invention are preferably embodied with the use of a pro
grammed computer system.
Collaborative fillering employs additional data from other
users to improve search results for an individual user for
whom a search is being conducted. The collaborative data
informons ranked by relevancy.
The present invention combines collaborative filtering
with content-based filtering in measuring informons for
relevancy, and further preferably applies adaptive updating
40
of the content-based filtering operation. In providing these
INTEGRATED COLLABORATIVE/CONTENT-BASED
results, the invention can be embodied as a search engine
system in accordance wilh different basic structures. In the
presently preferred basic structure, an integrated
collaborative/conlent-based filter (FIGS. 1-7) is operated to
provide ongoing or continuous searching for selected user
queries, with a "wire" being established for each query. On
the other hand, a regular search engine is operated to make
can be feedback informon rating data, and/or it can be
content-profile data for agent mind melding which is more
fully disclosed in Ser. No. 09/195,708 now pending, entitled
FILTER STRUCTURE EMPLOYING SELECTIVELY
45
SHARED, CONTENT-BASED PROFILE DATA TO
EVALUATE INFORMATION ENTITIES IN A MASSIVE
INFORMATION NETWORK, filed by the current inventors
on Nov. 19, 1998, and hereby incorporated by reference.
Many types of user rating information can be used. For
immediate or short-term "demand" searches for other user 50 example, users can sort documents which they have read
from best to worst. Alternatively, users can select on a scale
(numeric, such as 1 to 10, or worded, such as good, medium,
poor) how much they enjoyed reading a document. Further,
user monitoring can measure time spent by users on each
for an input query. Otherwise, wire search results are 55 document, thereby indicating user interest (normalized by
document length). Among other possibilities, the choices of
returned if a wire does exist, or collaborative ranking data
documents for reading by other users can be simply used as
can be applied from the wire filler structure to improve the
an indication of interesting documents. In all cases, the
results of the demand search from the regular search engine.
feedback rating data can be based on inlerestingness or any
In the currently preferred embodiment, wires are created
queries on the basis of content-based filtering. This basic
structure of the invention is especially beneficial for use in
applying the invention to existing search engine structure.
Demand search results can be returned if no wire exists
for the most common queries received by the search engine 60 of a variety of other document qualities, as described in
connection with FIGS. 1-7.
system. A suitable analysis is applied to the search engine
Feedback ranking information can be used in a number of
ways, and the invention is not limited by the method of
and respective wires arc then created for each of these
feedback information use. Use methods range in spectrum
queries. An analysis update can be made from time to time
65 from weighting relative ranks by a set amount (possibly
to make wire additions or deletions as warranted.
equally, possibly heavy weighting one above the other) to
When a user makes a query for which a wire already
dynamically adjusting the weight by measuring how statisexists, wire search results are preferably returned instead of
operations to determine which queries are most common,
US 6,314,420 Bl
25
26
tically significant the user feedback is. For example, if only
character 48C2. Collaborative rating data is used in the
However, if many people have consistently ranked an article
applied in the demand search mode, to the extent that
one person has ranked an article, it may not be significant.
informon rating process in the wire search mode, and if
the same, more credibility may be placed on the user's
collaborative data is available for the informons in the
weighting.
5 search return. Search results are returned to the users 34C
FIG. 9 shows a generalized embodiment of the invention
and 3fic from ,he ^^^ return processor 48C as shown in
in which system elements in a CASE system 30C are
^ 9
integrally configured to provide wire and/or demand
'
.
searches A query processor 32C receives queries from an
The invention .s preferably embodied as shown in FIG.
38C responds to the currently processed query to set a
content-based filter structure 40C for wire search operation
vidual user 62C and other users 64C and determines whether
a wire already exists for each entered query. If a wire exists,
individual user 34C and other users 36C. A mode selector 10 10. A query processor 60C receives queries from an indior demand search operation. In the preferred application of
,he query is routed to a collaborative/conlenl-based filter
the invention, the wire mode is selected only if a wire
struciure 66C like that of FIGS. 1-7. A spider system 68C
already exists, and wires exist only for those queries found 15 continllousiy scaas a nctWOrk 70C for informons providing
to be commonly entered as previously described In the
^ ,hreshold.level match for conlen, based profiles (i.e.,
demand search mode, the filter structure 40C can funct.on
cessi
files „ Ihe ,
lcvel o[ lhe preferred
"e^vruiXme^beusedfordetennining
mul.i-ieve, filter structure, a, leas, one of which reflects the
demand search being made the first time a particular query
memory 72C according to the wire or wires to which they
is entered and with wire searches being made for subsequent
entries of the same query. As another example, the user may
belong.
A feedback processor 74C is structured like the mind pool
is desired, the user may select a wire search.
integration with the content-based data in the measurement
mode or demand search mode, and employs content-based
profiles 42C in content-based filtering preferably multt-
slmcme like ,hal of FIG. 6 is employed for this purpose,
AdUve feetlback dala ^ appiied from thc USers to the filter
whether a wire search or a demand search is made. For 20 content profile of a current wire query). Informons which are
example, every query can call for a wire search, with a
passed by the filter 66C for existing wires are stored in a
select a demand search, or, if continuing network searching 25 SyStcm of pjg 7 t0 provide collaborative feedback data for
The filter structure 40C operates in its set wire search
Sies iTrj\^^;^r^2i:
of infonnon K{cvAncy by ,hc fi|,cr 66C. An informon rating
evaluation, feedback data from users respectively associated
ouslv
in demand
profile data
agent mind
A spider
by a spider system 78C in a demand search of the network
70c. The spider system 78C can have its own memory
system 78CM as considered in connection with the spider
45c of FIG. 9.
therewith. These profiles are used by the filter structure 40C
If no wire exists for a currently input query, the query is
in wire searches in the wire mode.
sent to a regular search engine where a content profile is
Demand profiles 42C2 are used by the filler structure 40C 35 established for content based filtering of informons returned
searches in the demand mode. Collaborative
can be integrated with the wire profiles through
melding 43C as previously explained.
system 46C scans a network 44C to find infor-
mons for a current demand search, and to find Mormons 40
with continued network scanning for existing wires. In
selecting available mfonnons for return, the sp.der system
46C uses a content threshold derived from the content-based
Once fi,(eri
which
is performed on returned informons, those
e a satisfacl
'
^
H.. ,
.
.
match ,0 the query
^
h a ^^ fmm
re for
profile for which an informon search is being conducted.
prutcssoi »^'
.
it.
46C have a memory system 46CM which holds an informon
data base wherein index information is stored from infor-
demand search memory 82C ind,cates that the current query
has been made over time with sufhcienl frequency to qualify
P In many instances, it s preferable that the spider system 45 current query for which a demand search, was made fa
mons previously collected from the network. In this manner,
as a "common" query for which a wire is justified. As
demand searches can be quickly made from the spider
indicated by dashed connector line 801D, collaborative
memory 46CM as opposed to making a time consuming 50 feedback data can be, and preferably is, integrated into the
search and downloading in response to a search demand
demand search processing by the processor 80C.
query from the search engine.
A search return processor 48C receives either demand
search informons or wire search informoas passed by the
Many alterations and modifications may be made by those
having ordinary skill in the art without departing from the
spiri( and xope of ,he jnvcnljon. Therefore, it must be
content-based filter structure 40C according to the operating 55 undel5|ood ihal the illustrated embodiments have been set
mode of the latter, and includes an informon rating system
which is like lhat of FIG. 6. The informon rating system
combines content-based filtering data with collaborative
forth Q. for |he
be (akc[) as ,„
oses of exampie> and that it should not
|hc invemion as^^ b the following
lhefeforC) ,0 ^ rea(] ,0
feedback rating data from users through a feedback proces,he combination of elements which are
sor SOC a. leas, ,n the wire search mode and, ,f desxred, ■„ « ^^ ^ ^^ ^ ^.^ ^^ fw ^^
in rwirflearch mode, the processor 48C rates infor-
mons on a continuing basis as they arc received from the
-"stantiaHy the same function in substantially the same
way to obtain substantially the same result, llie claims are
network 44C through the spider sysicm 46C as indicated by
ihus to be understood to include whai is specifically illusthc reference character 48C1. In the demand search mode, 65 traled and described above, whai is conceptually equivalent,
the processor 48C rates informons returned by the spider
and also what incorporates the essential idea of the mvensyslcm 46C in a demand search as indicalcd by the reference
lion.
US 6,314,420 Bl
28
27
10. A search engine system comprising:
a system for scanning a network to make a demand search
TABLE 1
Article
Author
for informons relevant to a query from an individual
Rating given
user;
a content-based filter system for receiving the informons
from the scanning system and for filtering the infor
mons on the basis of applicable content profile data for
relevance to the query; and
a feedback system for receiving collaborative feedback
data from system users relative to informons consid
10
ered by such users;
the filter system combining pertaining feedback data
from the feedback system with the content profile
data in filtering each informon for relevance to the
query.
11. The system of claim 10 wherein adaptive user feed
What is claimed is:
1. A search engine system comprising:
back data is applied to the content-based filter to provide a
learning component for content profile data employed
a first system for receiving informons from a network on
a continuing search basis, for filtering such informons 30 therein.
12. The system of claim 10 wherein:
for relevancy to a query from an individual user, and for
the scanning system further scans the network on a
storing a ranked list of relevant informons as a wire;
continuing basis to make a wire search for informons
a second system for receiving informons from a network
relevant to wire queries from system users; and
on a current demand search basis and for filtering such
informons for relevancy to the query from the indi
35
vidual user; and
data in filtering each wire informon for relevance to
a third system for selecting at least one of the first and
second systems to make a search for the query and to
applicable wire query.
13. The system of claim 10 wherein the collaborative
feedback data comprises active feedback data.
14. The system of claim 10 wherein the collaborative
feedback data comprises passive feedback data.
15. The system of claim 14 wherein the passive feedback
data is obtained by passively monitoring the actual response
return the wire or demand search results to the indi
vidual user.
2. The system of claim 1 wherein the third system selects
the first system to make a wire search only if a wire already
exists for the query.
3. The system of claim 1 wherein:
a feedback system is provided for receiving collaborative
feedback data from system users relative to informons
to a proposed informon.
16. The system of claim 10 wherein the collaborative
feedback data comprises a combination of active feedback
considered by such users; and
at least the first system combines pertaining data from the
feedback system with content profile data of the first
system in filtering each informon for relevance to the
query and inclusion in the wire.
4. The system of claim 3 wherein the first system includes
a multi-level, content-based filter having descending levels
including at least an upper preprocessing level, a middle user
community level, and a bottom user level.
5. The method of claim 3 wherein the collaborative
data and passive feedback data.
50
from an individual user;
55
6. The method of claim 3 wherein the collaborative
feedback data comprises passive feedback data.
7. The method of claim 6 wherein the passive feedback
data is obtained by passively moniloring the actual response
to a proposed informon.
users;
a system for controlling the operation of the filtering
system to filter for one of a wire response and a demand
response and to return the one response to the user; and
the filtering system combining pertaining feedback data
from the feedback system with content profile data in
determining the relevancy of the informons for inclu
sion in at least a wire response to the query.
18. The system of claim 17 wherein:
the content-based filtering system includes
8. The method of claim 3 wherein the collaborative
feedback data comprises a combination of active feedback
data and passive feedback data.
of content profile data employed therein.
17. A search engine system comprising:
a content-based filtering system for receiving informons
from a network on a continuing basis and for filtering
the informons for relevancy to a wire or demand query
a feedback system providing feedback data from other
feedback data comprises active feedback data.
9. The system of claim 1 wherein adaptive user feedback
data is applied at least to the first system to provide updating
the filter system combines pertaining feedback data from
the feedback system with applicable content profile
65
a
collaborative/content based filter for filtering infor
mons for relevancy to a wire query on a continuing
basis; and
US 6,314,420 Bl
30
29
30. A method for operating a search engine system
the content-based filtering system includes a regular
comprising:
search engine for filtering informons for relevancy to a
receiving informons in a content-based filtering system
demand query.
from a network on a continuing basis and filtering the
informons for relevancy to a wire or demand query
19 The system of claim 18 wherein adaptive user feed
back data is applied at least to the collaborative/content-
based filter to provide learning for content profile data
employed therein.
from an individual user;
.
20. The search engine system of claim 17 wherein the
feedback system provides active feedback data.
21. The search engine system of claim 17 wherein the
providing feedback data from other users;
controlling the operation of the filtering system to filter
for one of a wire response and a demand response and
10
to return the one response to the user; and
feedback system provides passive feedback data.
combining pertaining feedback data with content profile
22. The search engine system of claim 21 wherein the
passive feedback data is obtained by passively monitoring
data in the filtering system in determining the relevancy
of the informons for inclusion in at least a wire
the actual response to a proposed informon.
23 The system of claim 17 wherein the feedback system 15
response to the query.
provides a combination of active feedback data and passive
feedback data.
24. A method for operating a search engine system
comprising:
receiving informons in a first system from a network on 20
31. The method of claim 30 wherein the step of providing
feedback data comprises providing active feedback data.
32. The method of claim 30 wherein the step of providing
feedback data comprises providing passive feedback data.
a continuing search basis, for filtering such informons
33. The method of claim 32 wherein the passive feedback
data is obtained by passively monitoring the actual response
from at least one of the other users to a proposed informon.
for relevancy to a query from an individual user and for
storing a ranked list of relevant informons as a wire;
34. The method of claim 30 wherein the step of providing
receiving informons in a second system from a network ^ feedback data comprises providing a combination of active
on a current demand search basis for filtering such
feedback data and passive feedback data.
informons for relevancy to the query from the indi
35. A search engine system comprising:
vidual user; and
selecting at least one of the first and second systems to
make a search for the query and to return the wire or ^
demand search results to the individual user.
25. A method for operating a search engine system
comprising:
scanning a network to make a demand search for infor
mons relevant to a query from an individual user,
35
receiving the inforraons in a content-based filter system
means for receiving informons from a network on a
continuing search basis, for filtering such informons for
relevancy to a query from an individual user, and for
storing a ranked list of relevant informons as a wire;
means for receiving informons from a network on a
current demand search basis and for filtering such
informons for relevancy to the query from the indi
vidual user; and
means for selecting at least one of the first and second
systems to make a search for the query and to return the
from the scanning system and filtering the informons
on the basis of applicable content profile data for
relevance to the query;
wire or demand search results to the individual user.
36. A search engine system comprising:
relative to informons considered by such users; and
means for content-based filtering informons received
receiving collaborative feedback data from system users 40
from a network on a continuing basis for relevancy to
a wire or demand query from an individual user;
combining pertaining feedback data with the content
profile data in filtering each informon for relevance to
the query.
.
45
26. The method of claim 25 wherein the collaborative
feedback data comprises active feedback data.
27. The method of claim 25 wherein the collaborative
feedback data provides passive feedback data.
28. The method of claim 27 wherein the passive feedback _q
data is obtained by passively monitoring the actual response
to a proposed informon.
29. The method of claim 25 wherein the collaborative
feedback data comprises a combination of active feedback
data and passive feedback data.
means for collecting feedback data from other users;
means for controlling the operation of the filtering means
to filter for one of a wire response and a demand
response and to return the one response to the user; and
the filtering means combining pertaining feedback data
from the feedback system with content profile data in
determining the relevancy of the informons for inclu
sion in at least a wire response to the query.
Disclaimer: Justia Dockets & Filings provides public litigation records from the federal appellate and district courts. These filings and docket sheets should not be considered findings of fact or liability, nor do they necessarily reflect the view of Justia.
Why Is My Information Online?