Personalized User Model LLP v. Google Inc.
Filing
474
REDACTED VERSION of 457 Declaration re Answering Brief in Opposition to Motion for Summary Judgment of Invalidity (Declaration of Jaime G. Carbonell) (Volume 1 of 3) by Personalized User Model LLP. (Tigan, Jeremy)
IN THE UNITED STATES DISTRICT COURT
FOR THE DISTRICT OF DELAWARE
PERSONALIZED USER MODEL, L.L.P.,
)
)
Plaintiff,
)
)
v.
)
)
GOOGLE, INC.,
)
)
Defendant.
)
)
)
)
GOOGLE, INC.
)
)
Counterclaimant,
)
)
v.
)
PERSONALIZED USER MODEL, L.L.P. and )
)
YOCHAI KONIG,
)
Counterclaim-Defendants. )
C.A. No. 09-525 (LPS)
PUBLIC VERSION
VOLUME 1 OF 3 - DECLARATION OF JAIME G. CARBONELL IN SUPPORT OF
PLAINTIFF PERSONALIZED USER MODEL, L.L.P.’S ANSWERING
BRIEF IN OPPOSITION TO GOOGLE'S MOTION FOR SUMMARY
JUDGMENT ON INVALIDITY
OF COUNSEL:
Marc S. Friedman
SNR Denton US LLP
1221 Avenue of the Americas
New York, NY 10020-1089
(212) 768-6700
Mark C. Nelson
SNR Denton US LLP
2000 McKinney Avenue, Ste. 1900
Dallas, TX 75201
(214) 259-0901
Jennifer D. Bennett
SNR Denton US LLP
1530 Page Mill Road, Ste. 200
Palo Alto, CA 94304-1125
(650) 798-0300
Original Filing Date: January 14, 2013
Redacted Filing Date: January 23, 2013
MORRIS, NICHOLS, ARSHT & TUNNELL LLP
Karen Jacobs Louden (#2881)
Jeremy A. Tigan (#5239)
1201 N. Market Street
P.O. Box 1347
Wilmington, DE 19899-1347
(302) 658-9200
klouden@mnat.com
Attorneys for Personalized User Model, L.L.P.
IN THE UNITED STATES DISTRICT COURT
FOR THE DISTRICT OF DELAWARE
PERSONALIZED USER MODEL, L.L.P.,
)
)
Plaintiff,
)
)
v.
)
)
GOOGLE, INC.,
)
)
Defendant.
)
)
)
)
GOOGLE, INC.
)
)
Counterclaimant,
)
)
v.
)
PERSONALIZED USER MODEL, L.L.P. and )
)
YOCHAI KONIG,
)
Counterclaim-Defendants. )
C.A. No. 09-525 (LPS)
PUBLIC VERSION
DECLARATION OF JAIME G. CARBONELL IN SUPPORT OF
PLAINTIFF PERSONALIZED USER MODEL, L.L.P.’S ANSWERING
BRIEF IN OPPOSITION TO GOOGLE'S MOTION FOR SUMMARY
JUDGMENT ON INVALIDITY
I, Jaime G. Carbonell, declare:
1.
I submit this Declaration in Support of Plaintiff Personalized User Model,
L.L.P.'s ("PUM") Answering Brief in Opposition to Google's Motion for Summary Judgment on
Invalidity.
2.
I have been retained by counsel for PUM to respond to the Report of Defendant's
Expert Michael I. Jordan, Ph.D., Concerning Invalidity of Claims 1, 11, 22, 32 and 34 of U.S.
Patent No. 6,981,040 and Claims 1, 3, 5, 6, 7, 21 and 22 of U.S. Patent No. 7,685,276, and to
respond where appropriate.
3.
Attached hereto as Exhibit A is a true and correct copy of my Rebuttal Expert
Witness Report signed November 28, 2012, and selected exhibits. I hereby incorporate by
reference and adopt all statements and opinions reflected in my Expert Witness Report as if set
forth fully herein.
I declare under penalty of perjury under the laws of the United States of America that the
foregoing is true and correct.
Executed this 12th day of January 2013, at Pittsburgh, Pennsylvania.
Jaime G. Carbonell
-2-
CERTIFICATE OF SERVICE
I hereby certify that on January 14, 2013, I caused the foregoing to be
electronically filed with the Clerk of the Court using CM/ECF which will send electronic
notification of such filing to all registered participants.
Additionally, I hereby certify that true and correct copies of the foregoing were
caused to be served on January 14, 2013, upon the following individuals in the manner indicated:
BY E-MAIL
BY E-MAIL
Richard L. Horwitz
David E. Moore
POTTER ANDERSON & CORROON LLP
1313 N. Market St., 6th Floor
Wilmington, DE 19801
Brian C. Cannon
QUINN EMANUEL URQUHART
& SULLIVAN, LLP
555 Twin Dolphin Dr., 5th Floor
Redwood Shores, CA 94065
Charles K. Verhoeven
David A. Perlson
Antonio R. Sistos
Andrea Pallios Roberts
Joshua Lee Sohn
QUINN EMANUEL URQUHART
& SULLIVAN, LLP
50 California Street, 22nd Floor
San Francisco, CA 94111
/s/ Jeremy A. Tigan
Jeremy A. Tigan (#5239)
EXHIBIT A
REDACTED
IN ITS
ENTIRETY
Exhibit 1
Personal WebWatcher: design and
implementation
Dunja Mladenic
Dpt. for Intelligent Systems, J.Stefan Institute,
Jamova 39, 11000 Ljubljana, Slovenia
Phone: (+38)(61) 1773 272, Fax: (+38)(61) 1258-158
E-mail: Dunja.Mladenic@ijs.si
http://www-ai.ijs.si/DunjaMladenic/home.html
School of Computer Science, Carnegie Mellon University
Pittsburgh, PA, USA
1 Introduction
With the growing availability of information sources, especially non-homogeneous, distributed sources like the World Wide Web, there is also a growing interest in tools that can help in making a good and quick selection of the information we are interested in. Recent work that arises at the intersection of Information Retrieval and Machine Learning offers some novel solutions to this problem, as does work in Intelligent Agents. For example, Armstrong et al. [2] developed WebWatcher, a system that assists users in locating information on the World Wide Web, taking keywords from the user, suggesting hyperlinks and receiving evaluation. Balabanovic et al. [3] developed "a system which learns to browse the Internet on behalf of a user". It searches the World Wide Web taking a bounded amount of time, selects the best pages and receives an evaluation from the user. The evaluation is used to update the search and selection heuristics. Pazzani et al. [26] collect ratings of the explored Web pages from the user and learn a user profile from them. Pages are separated according to their topic and a separate profile is learned for each topic. Mitchell et al. [25] proposed a system connected to the user's electronic calendar that generates sets of rules capturing the user's scheduling preferences and some other information about individual attendees of meetings. It uses these rules to provide advice to the user for new, unscheduled meetings. Lang [21] developed a system for electronic news filtering that uses text-learning to generate models of user interests. Krulwich and Burkey [20] proposed "The ContactFinder agent" that reads and responds to bulletin board messages, assists users by referring them to other people who can help them, and categorizes messages and extracts their topic areas. Maes [24] described "interface agents" that learn from the user as well as from other agents. As examples of such agents they developed agents for electronic mail handling, meeting scheduling, electronic news filtering and entertainment recommendation. Some of these agents use the context of documents and adopt Information Retrieval approaches (e.g. news filtering). Others rely on correlation between different users, performing "social learning" (e.g. entertainment recommendation). Holte and Drummond [15] and Drummond et al. [9] designed a system that assists browsing of software libraries, taking keywords from the user and using a rule-based system with forward chaining inference, assuming that the library consists of one type of item and the user goal is a single item. Etzioni and Weld [11] offer an integrated interface to the Internet combining the UNIX shell and the World Wide Web to interact with Internet resources. Their agent accepts high-level user goals and dynamically synthesizes the appropriate sequence of Internet commands to satisfy those goals. Hammond et al. [13] and Burke et al. [6] developed a system that uses a "natural language question-based interface to access distributed text information sources" and helps the user to find answers to her/his questions in databases such as FAQ files.
One of the available information sources is the World Wide Web, and it is currently growing quickly, attracting many users with different interests. Since interaction with the World Wide Web (WWW) is by means of a computer, one can use computers to "observe" and record user actions and use this information to help users find their way in the WWW.
Personal WebWatcher is a system that observes users of the WWW and suggests pages they might be interested in. It learns user interests from the pages requested by the user. The learned model of user interests is then used to suggest hyperlinks on new HTML-pages requested by and presented to the user via a Web browser that enables connection to a "proxy", e.g. Netscape. Section 2 gives an overview of Personal WebWatcher, its functionality and some directions for further development. Section 3 describes some of the research problems related to Personal WebWatcher's learning. The structure of the system and its implementations are described in Section 4 and Perl code is given in the Appendix. Results of the first experiments are given in Section 5.
2 "Personalizing" WebWatcher
Personal WebWatcher is mainly inspired by WebWatcher, "a Learning Apprentice for the World Wide Web" [2], [16], and some other work related to learning apprentices and learning from text [17], [19], [21], [25]. The idea of a learning apprentice is to automatically customize to individual users, using each user interaction as a training example.
WebWatcher can be described as an agent that assists users in locating information on the WWW. It learns by observing a user on her/his way through the WWW and suggests interesting hyperlinks whenever it is confident enough. The idea is that the user provides a few keywords describing a search goal and WebWatcher highlights related hyperlinks on the current page and/or adds new hyperlinks to the current page. It can also suggest pages related to the current page using information stored in the structure of the hypertext without considering the text itself [16], or send an e-mail message to the user whenever specified pages change. The same WebWatcher version is designed to serve all users, collecting information and sharing it between users. For example, if someone recognizes a page as being related to the keywords she/he typed into the system at the beginning of the search, this can be useful for any user searching for similar information.
Unlike WebWatcher, Personal WebWatcher (PWW) is structured to specialize for a particular user, modeling her/his interests. It "watches over the user's shoulder" in a similar way to WebWatcher, but it avoids involving the user in its learning process (it doesn't ask the user for any keywords or opinions about pages). It solely records the addresses of pages requested by the user and highlights hyperlinks that it believes will be of interest. In the learning phase (typically during the night), the requested pages are analyzed and a model of user interests is generated/updated. This model is used to give advice for hyperlinks on retrieved HTML-pages requested by and presented to the user via a Web browser.
Since each user has her/his own copy of the system - her/his own agent - these agents can communicate and exchange information on the basis of similarity between their users, often referred to as collaborative or social learning [22], [31]. There are also some other types of "personal agents" the user could use, for example an agent for Calendar Apprentice [25], and these agents can exchange information they have about the same user in different fields/activities. We plan to investigate these in future work on Personal WebWatcher.
3 PWW "Behind the stage"
There are many research questions behind Personal WebWatcher that wait to be answered. Here, we consider some of them that are related to learning. There are also many others, for example, how to design communication between user and agent and between different agents, how to provide privacy to the user, to what extent the agent should involve the user in its learning/exploration process, etc.
If we concentrate on learning, we first want to know what kind of problem we are faced with, how to represent it to some learning algorithm and which kind of algorithm to use. Currently we have restricted our work to text documents (plain text and HTML), so we are faced with the problem of text-learning, having mainly short to medium size documents with varying vocabulary. This section gives different approaches related to (1) document representation, (2) feature selection and (3) learning used on text-learning problems. Table 1 summarizes them over some related papers in order to give an idea about current trends.
3.1 Document representation
The document representation frequently used in Information Retrieval and text-learning is the so-called TFIDF-vector representation. It is a bag-of-words representation: all words from the document are taken and no ordering of words or any structure of the text is used (see Figure 1). Since most of our documents are in HTML format, there is a well-defined underlying structure that could be used. There is also additional information in plain text, for example the structure of sentences, the position of words or neighboring words. The question is how much we can gain by considering additional information in learning (and what information to consider), and what is the price we have to pay for it? There is currently no well-studied comparison or directions that we are aware of. There is some evidence in Information Retrieval research that, for long documents, considering information additional to the bag-of-words is not worth the effort. But our documents are mostly HTML-documents on the WWW and they are not especially long!
Paper reference | Doc. Representation | Feature Selection | Learning
Apte et al. [1] | bag-of-words (frequency) | stop list + frequency weight | Decision Rules
Armstrong et al. [2] | bag-of-words (Boolean) | mutual info. | TFIDF, Winnow, WordStat
Balabanovic et al. [3] | bag-of-words (frequency) | stop list + stemming + keep 10 best words | TFIDF
Bartell et al. [4] | bag-of-words (frequency) | latent semantic indexing using SVD | -
Berry et al. [5], Foltz and Dumais [12] | bag-of-words (frequency) | latent semantic indexing using SVD | TFIDF
Cohen [8] | bag-of-words (Boolean) | infrequent words pruned | Decision Rules, ILP
Joachims [17] | bag-of-words (frequency) | in/frequent words + mutual info. | TFIDF, PrTFIDF, Naive Bayes
Lewis et al. [23] | bag-of-words (Boolean) | log likelihood ratio | logistic regression combined with Naive Bayes
Maes [24] | bag-of-words + header info. | mail/news header info. + selecting keywords | Memory-Based reasoning
Pazzani et al. [26] | bag-of-words (Boolean) | stop list + mutual info. | TFIDF, Naive Bayes, Nearest Neighbor, Decision Trees
Sorensen and Mc Elligott [33], [10] | n-gram graph (only bigrams) | weighting graph edges | connectionist combined with Genetic Algorithms
Yang [35] | bag-of-words (Boolean, frequency, TFIDF) | stop list | adapted k-Nearest Neighbor
Table 1: Document representation, feature selection and learning algorithms used in
some related work.
In the process of using text to learn how to give advice for a hyperlink, different approaches can be used to decide which part of the text to use and how to represent it. Personal WebWatcher uses an approach similar to that of WebWatcher. In WebWatcher the bag-of-words representation is used, where the considered text consists of the underlined words, the words in the sentence containing the hyperlink, the words in all the headings above the hyperlink and the words given as keywords by the user [2]. Some later versions of the WebWatcher system slightly change the way of constructing text for learning, e.g. adding the words in the document retrieved behind the hyperlink. Many current systems that learn on text use the bag-of-words representation, using either Boolean features indicating whether a specific word occurred in a document (e.g. [2], [8], [23], [26], [35]) or the frequency of a word in a given document (e.g. [1], [3], [4], [5], [17], [35]). There is also some work that uses additional information such as word position [8] or word tuples called n-grams [33].
We decided to use the bag-of-words representation using word frequency and to observe the success of the given advice (whether the user selected the advised hyperlink). In case of poor system performance, some additional information from the HTML-structure could be added, for example the frequency of a word in the headlines of a given document. We would like to spend time on extracting and using this additional information only when it is highly probable that we will gain a decent improvement in system performance. This is currently under research.
Figure 1: Bag-of-words representation using a frequency vector (illustrated on a short JAIR home-page text fragment - "JAIR is a refereed journal, covering all areas of Artificial Intelligence, which is distributed free of charge over the internet. Each volume of the journal is also published by Morgan Kaufman..." - with occurrence counts for words such as agent, intelligence, journal, learning, internet, webwatcher, perl5 and volume).
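To make the frequency-vector representation illustrated in Figure 1 concrete, the following is a minimal sketch of turning a page into a bag-of-words; it is not the paper's Perl/C++ code, and the tag-stripping regex and the tiny stop list are assumptions of the sketch.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "of", "is", "by", "for", "over", "each", "also", "which"}  # illustrative stop list

def bag_of_words(text):
    """Strip HTML tags, lowercase, and count word frequencies (bag-of-words)."""
    text = re.sub(r"<[^>]+>", " ", text)                 # drop HTML markup
    words = re.findall(r"[a-z][a-z0-9]+", text.lower())  # crude tokenizer (sketch assumption)
    return Counter(w for w in words if w not in STOP_WORDS)

doc = ("JAIR is a refereed journal, covering all areas of Artificial Intelligence, "
       "which is distributed free of charge over the internet.")
print(bag_of_words(doc))   # e.g. Counter({'journal': 1, 'intelligence': 1, ...})
```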
3.2 Feature selection
If we decide to ignore all the additional information and use the bag-of-words approach, we still end up with several tens of thousands of different words that occur in our documents. Not only is using all these words time-consuming, but also many of them are not really important for our learning task. One of the approaches to reduce the number of different words is to use a "stop-list" containing common English words (e.g. a, the, with) or to prune the infrequent and/or very frequent words ([8], [17]). There is also the possibility of word stemming. Many approaches introduce some sort of word weighting and select only the best words (e.g. [1], [2], [3], [17], [23], [26]) or reduce dimensionality using latent semantic indexing with singular value decomposition (e.g. [4], [5], [12]). Some of the Machine Learning techniques for feature selection could also be used (e.g. [7], [32], [18]) but most of them take too long in situations with several tens of thousands of features.
In the current implementation of PWW we weight words using the mutual information between word occurrence and class value [34], in the same way as used in [2] or [17]. A research topic is the number of best words to consider, and we observe its influence on classification accuracy and on the precision of the best suggested hyperlink (see Section 5). Mutual information assigns a higher weight to the words that make a better distinction between interesting and uninteresting documents. One of the practical problems during classification is that a new document often contains very few or even none of the selected words. Since we are more interested in the positive class (interesting documents) and we want to have words that are frequent, it might be better to include in the weighting formula the probability of a word occurring in the positive class or the frequency of the word ([23]):

  TF(w) \log \frac{P(w|c)}{P(w|\bar{c})}

where w is a selected word, c is the positive class and TF(w) is the frequency of word w.
We plan to make additional experiments using the proposed combined weighting as
well as using some other weighting methods. For example, combining a stop list with weighting words by their frequency and keeping the most frequent words [35], or using the word weighting used in the odds ratio method [27]. Odds ratio is a method of ranking documents according to their relevance for a given problem (e.g. being interesting for the user). The ranking of documents is defined as

  ranking(d, c) = \log \frac{P(c|d)}{P(\bar{c}|d)} = \log \frac{P(c) P(d|c)}{P(\bar{c}) P(d|\bar{c})} = \log \frac{P(c) \prod_{w \in d} P(w|c)}{P(\bar{c}) \prod_{w \in d} P(w|\bar{c})} = \sum_{w \in d} weight(w) + const.

The word weight that it defines can be used for feature selection, weighting all words and selecting the words with the highest weight. This word weight is defined as

  weight(w) = \log \frac{p(w) (1 - \bar{p}(w))}{(1 - p(w)) \bar{p}(w)}

where p(w) = (TF(w, c) + 1) / (#docs(c) + const), docs(c) is the number of documents in class c, and \bar{p}(w) is the same as p(w) except that c is substituted with \bar{c}. Shaw [28] proposed special handling of singularities in the above formulas for p(w) and \bar{p}(w), namely p(w) = 1/#docs(c)^2 when TF(w, c) = 0 and p(w) = 1 - 1/#docs(c)^2 when TF(w, c) = #docs(c).
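A minimal sketch of this kind of feature selection, assuming the training documents are given as (word set, label) pairs; the add-one smoothing inside the mutual-information estimate is an assumption of the sketch, not the paper's exact formula.

```python
import math
from collections import defaultdict

def mutual_information_scores(docs):
    """docs: list of (set of words, label) pairs, label 1 = interesting, 0 = uninteresting.
    Returns a dict mapping each word to the mutual information between its occurrence and the class."""
    n = len(docs)
    n_class = [sum(1 for _, y in docs if y == c) for c in (0, 1)]
    occ = defaultdict(lambda: [0, 0])          # word -> [#docs of class 0 containing it, #docs of class 1]
    for words, y in docs:
        for w in words:
            occ[w][y] += 1
    scores = {}
    for w, counts in occ.items():
        total_occ = counts[0] + counts[1]
        mi = 0.0
        for o in (0, 1):                       # word absent (0) / present (1)
            p_o = ((total_occ if o else n - total_occ) + 2) / (n + 4)
            for c in (0, 1):                   # class value
                joint = counts[c] if o else n_class[c] - counts[c]
                p_joint = (joint + 1) / (n + 4)            # add-one smoothing (sketch assumption)
                p_c = (n_class[c] + 2) / (n + 4)
                mi += p_joint * math.log(p_joint / (p_o * p_c))
        scores[w] = mi
    return scores

def select_top_words(docs, k=50):
    """Keep the k highest-scoring words (the 'vector size' studied in Section 5)."""
    scores = mutual_information_scores(docs)
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]
```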
3.3 Learning algorithm
One of the well-established techniques for text in Information Retrieval is to represent each document as a TFIDF-vector in the space of words that appeared in the training documents [30], sum all interesting document vectors and use the resulting vector as a model for classification (based on the relevance feedback method of [29]). Each component d^(i) = TF(w_i, d) IDF(w_i) of a document vector is calculated as the product of TF (Term Frequency, the number of times word w_i occurred in the document) and IDF(w_i) = \log \frac{|D|}{DF(w_i)} (Inverse Document Frequency), where |D| is the number of documents and the document frequency DF(w_i) is the number of documents word w_i occurred in at least once. The exact formulas used in different approaches may vary slightly (some factors are added, normalization is performed [30]) but the idea remains the same. A new document is then represented as a vector in the same vector space as the generated model and the distance between them is measured (usually defined as the cosine of the angle between the vectors) in order to classify the document. This technique has already been used in Machine Learning experiments on World Wide Web data (e.g. [2], [3], [5], [17], [26]).
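A minimal sketch of the TFIDF-prototype technique described above: sum the TFIDF vectors of the interesting documents and classify a new document by its cosine similarity to that prototype. Normalization details vary between approaches, as the text notes, so the choices below are assumptions of the sketch.

```python
import math
from collections import Counter

def build_idf(all_docs):
    """IDF(w) = log(|D| / DF(w)), where DF(w) is the number of documents containing w at least once."""
    df = Counter()
    for doc in all_docs:              # each doc is a list of words
        df.update(set(doc))
    n = len(all_docs)
    return {w: math.log(n / df_w) for w, df_w in df.items()}

def tfidf_vector(doc, idf):
    """Component for word w is TF(w, d) * IDF(w); unseen words get weight 0."""
    return {w: tf * idf.get(w, 0.0) for w, tf in Counter(doc).items()}

def cosine(u, v):
    dot = sum(u.get(w, 0.0) * x for w, x in v.items())
    nu = math.sqrt(sum(x * x for x in u.values())) or 1.0
    nv = math.sqrt(sum(x * x for x in v.values())) or 1.0
    return dot / (nu * nv)

def build_prototype(positive_docs, idf):
    """Sum the TFIDF vectors of all interesting (positive) documents into a single model vector."""
    proto = Counter()
    for doc in positive_docs:
        for w, x in tfidf_vector(doc, idf).items():
            proto[w] += x
    return proto

def score(new_doc, proto, idf):
    """Higher cosine similarity to the prototype means 'more interesting'."""
    return cosine(proto, tfidf_vector(new_doc, idf))
```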
There are also some other techniques for model generation that have been used in text-learning. Armstrong et al. [2] used a statistical approach they called WordStat that assumes mutual independence of words and defines the probability of class c as P(c) = 1 - \prod_w (1 - P(c|w)). Pazzani et al. [26] used a Naive (Simple) Bayesian classifier on Boolean vectors, which assumes independence of words and defines the probability of class c for a given document doc that contains words w as proportional to P(c) \prod_w P(c|w). They also used Nearest Neighbor and symbolic learning using Decision Trees. A variant of k-Nearest Neighbor was also used by Yang [35], where the relevance of class c given document doc is defined as

  rel(c|doc) = \sum_{i=1}^{k} similarity(doc, D_i) P(c|D_i)

where similarity is measured by the cosine between vectors and P(c|D_i) = #(D_i \in c) / #(D_i \in training data) (the same document may occur several times, being classified in different categories). Joachims [17] introduced Probabilistic TFIDF, which takes into account the document representation and defines the probability of class c for a given document doc that contains words w as

  P(c|doc) = \sum_w \frac{P(c) P(w|c)}{\sum_i P(c_i) P(w|c_i)} P(w|doc), \quad where \quad P(w|c) = \frac{TF(w, c)}{\sum_i TF(w_i, c)} \quad and \quad P(w|doc) = \frac{TF(w, doc)}{\sum_i TF(w_i, doc)}.
He also used a Naive (Simple) Bayesian classifier on frequency vectors, the same as we used in PWW. It assumes independence of words and defines the probability of class c for a given document doc that contains words w as

  P(c|doc) = \frac{P(c) \prod_w P(w|c)^{TF(w, doc)}}{\sum_i P(c_i) \prod_w P(w|c_i)^{TF(w, doc)}}

where P(w|c) = \frac{1 + TF(w, c)}{#words + \sum_i TF(w_i, c)}, TF(w, c) denotes the frequency of word w in documents of class c and TF(w, doc) denotes the frequency of word w in document doc.
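The Naive Bayes variant that the text says PWW uses can be sketched as follows, re-implemented in Python from the formula above rather than taken from the system's Perl/C++ code; the log-space accumulation is an implementation choice of the sketch.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (Counter of word frequencies, class).  Returns (class priors, P(w|c), vocabulary)."""
    class_docs = Counter()
    tf = defaultdict(Counter)                 # class -> Counter of word frequencies TF(w, c)
    vocab = set()
    for freqs, c in docs:
        class_docs[c] += 1
        tf[c].update(freqs)
        vocab.update(freqs)
    priors = {c: n / len(docs) for c, n in class_docs.items()}
    cond = {}
    for c in class_docs:
        total = sum(tf[c].values())
        # P(w|c) = (1 + TF(w, c)) / (#words + sum_i TF(w_i, c)), as in the formula above
        cond[c] = {w: (1 + tf[c][w]) / (len(vocab) + total) for w in vocab}
    return priors, cond, vocab

def classify(freqs, priors, cond, vocab):
    """Return P(c|doc) for each class, accumulating word log-probabilities weighted by TF(w, doc)."""
    logp = {}
    for c in priors:
        lp = math.log(priors[c])
        for w, n in freqs.items():
            if w in vocab:
                lp += n * math.log(cond[c][w])
        logp[c] = lp
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    return {c: math.exp(v - m) / z for c, v in logp.items()}
```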
Apte et al. [1] used Decision Rules and observed that, in the case of different topics being the categories, it is better to select features for each given topic (using a stop-list and frequency weighting) than for all topics at once, even if the set of features is additionally reduced for each topic using an entropy-based measure to weight features. Cohen [8] used Decision Rules and the Inductive Logic Programming systems FOIL and FLIPPER. Lewis and Gale [23] used a combination of a Naive Bayesian classifier and logistic regression, defining the probability of class c for a given document doc that contains words w as

  P(c|doc) = \frac{\exp(a + b \sum_w \log \frac{P(w|c)}{P(w|\bar{c})})}{1 + \exp(a + b \sum_w \log \frac{P(w|c)}{P(w|\bar{c})})}.

Maes [24] used Memory-Based reasoning; McElligott and Sorensen [10], [33] used a connectionist approach combined with Genetic Algorithms.
We decided to test different learning algorithms on PWW data (see Section 5), since it is not clear which algorithm is the most appropriate. The current version of PWW uses a Naive (Simple) Bayesian classifier on frequency vectors to generate the model of user interests that is used for advising hyperlinks.
4 Structure of Personal WebWatcher
Personal WebWatcher consists of two main parts: a proxy server that interacts with the user via a Web browser and a learner that provides the user model to the server (see Figure 2). Communication between them is via disk: the proxy saves the addresses of visited documents (URLs) and the learner uses them to generate a model of user interests. The whole system is implemented in approximately 2500 lines of Perl code and 1500 lines of C++ code.
The proxy server consists of three main parts, each implemented as a Perl script: proxy (which additionally calls external fetcher code to fetch the page), adviser and classifier (which calls external C++ code for classification). Proxy waits in an infinite loop for a page request from the browser. On request, it fetches the requested document and, if it is an HTML-document, adds advice and forwards the document to the user. To add advice, proxy forwards the page to adviser, which extracts the hyperlinks from the document and calls the external classification code that uses the generated user model. A limited number of hyperlinks that are scored above some threshold are recommended to the user, indicating their scores using graphical symbols placed around each advised hyperlink.

Figure 2: Structure of Personal WebWatcher (user, Web, proxy with fetcher, adviser and classifier, the log of visited URLs, and the learner that generates the model). The learning part is described separately.

For example, in Figure 3 three hyperlinks are suggested by PWW: "Machine
Learning Information Services" and two project members (Dayne Freitag, Thorsten
Joachims). There is a banner at the top of the page showing that PWW is "watching
over the user's shoulder".
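A minimal sketch of the adviser step: extract the hyperlinks from a fetched HTML page, score their anchor text with the learned model, and keep the few links above a threshold. The score_text callback stands in for the external classifier and is a hypothetical name; the original system also uses a richer extended hyperlink representation and graphical markers, which are omitted here.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML page."""
    def __init__(self):
        super().__init__()
        self.links, self._href, self._text = [], None, []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []
    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)
    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join(self._text).strip()))
            self._href = None

def advise(html, score_text, threshold=0.5, max_advice=3):
    """Return up to max_advice hyperlinks whose anchor text scores above the threshold."""
    parser = LinkExtractor()
    parser.feed(html)
    scored = [(score_text(text), href, text) for href, text in parser.links]
    scored = [s for s in scored if s[0] >= threshold]
    return sorted(scored, reverse=True)[:max_advice]
```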
4.1 Structure of the learning module
Learner works in two versions: learning a new model from scratch (learner) or updating an existing model (updater), as shown in Figure 4. The difference is that the first one has to define the domain (the words to be used) and starts learning with an empty model, while the second one has already defined which words to use in representing documents as frequency vectors and has an existing model to modify. Both versions fetch the visited documents and the documents one step behind the hyperlinks of visited documents, and store them as positive or negative examples of user interests, depending on whether the user visited the document or not (getDoc in Figure 4). Hyperlinks from visited HTML-documents are extracted and stored in an extended hyperlink format, the same that is used by adviser. Each hyperlink is represented in extended format, taking into account the underlined words, the words in a window around the hyperlink and the words in all the headings above the hyperlink. Using hyperlinks represented only with underlined words is often a bad idea, e.g. click here...
Figure 3: Example of an HTML-page presented to the user by PWW.

Hyperlinks whose documents were visited by the user are considered to be positive examples, and all the others to be negative examples of the user interests. The idea is that all hyperlinks were presented to the user and the user chose to visit some of them that meet her/his interests. This simplification is introduced to minimize the user's involvement in the learning process and to enable learning without asking the user for page ratings. We plan to make it optional in some later versions of the system.
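A sketch of how training examples could be assembled from the visited-URL log in the way the paragraph above describes (documents behind visited hyperlinks are positive, all other hyperlinks on visited pages are negative); fetch_links and fetch_document are hypothetical helpers standing in for the getDoc step.

```python
def build_examples(visited_urls, fetch_links, fetch_document):
    """visited_urls: ordered list of URLs the user requested.
    fetch_links(url) -> list of URLs linked from that page (hypothetical helper).
    fetch_document(url) -> text of the document one step behind a hyperlink (hypothetical helper).
    Returns (document_text, label) pairs, label 1 = visited (positive), 0 = not visited (negative)."""
    visited = set(visited_urls)
    examples = []
    for url in visited_urls:
        for link in fetch_links(url):
            label = 1 if link in visited else 0
            examples.append((fetch_document(link), label))
    return examples
```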
Learner transforms documents into examples in two phases: (1) (docs2exs and docs2addexs in Figure 4) parsing each document, assigning an index to each word and representing the document in three files as a line of word indices containing: all words, only headline words, only underlined words; (2) (exs2vec in Figure 4) calculating a score (e.g. information gain) for each word, selecting some top words and representing documents as a bag-of-words keeping the frequency of each of the top words.
Updater uses the given top words (the domain definition) and represents documents as a bag-of-words, in the same way learner does (docs2ddexs in Figure 4).
During the model generation (GenModel in Figure 4), the system can ask for additional information about some words (stating which kind of information - functions - and on which words - basic attributes). The kind of information it could ask for is specified as so-called background knowledge, e.g. a feature saying how many times a word occurred in document headlines (genAttr in Figure 4). This is currently under development.
The learning part consists basically of eight Perl scripts that call external C++ code for model generation/updating: two scripts integrate the parts of learner (updater), and the other six are represented with rectangles in Figure 4 (getDocs, docs2exs, docs2addexs, exs2vec, genAttr, docs2ddexs). Two additional rectangles in Figure 4, GenModel and UpdateModel, represent C++ code.
Figure 4: Structure of PWW Learner (learner: getDoc, docs2exs/docs2addexs, exs2vec, genAttr, GenModel; updater: getDoc, docs2ddexs, UpdateModel).
4.2 Model of user interests
The model of user interests is designed to predict whether some document is a positive or negative example of user interests. It is used to advise hyperlinks on the HTML-document requested by the user. Since the prediction should be performed while the user is waiting for an HTML-document, we are actually predicting the interestingness of a document based on the hyperlink pointing to it, and not on the document itself (retrieving the documents behind the requested hyperlinks is usually time consuming).
The model of user interests is generated "off-line", usually during the night, and thus its generation is not as time-critical as its usage for prediction. One of the simplest ideas for learning is to use the hyperlinks that occurred on the documents presented to the user as training examples and learn to predict whether a new hyperlink is a positive or negative example of the user interests:

  User_HL : HyperLink -> {pos, neg}

What we use is an extended representation of the hyperlink (see Section 4.1) that tries to capture information related to the document behind the hyperlink. But during the learning phase we can afford to use more time than when adding advice, so why not retrieve the documents behind the hyperlinks, instead of using the extended hyperlink representation? In that case, we can learn the model of user interests directly from the documents whose interestingness we are trying to predict:

  User_DOC : Document -> {pos, neg}

In this case, we end up using the model generated from documents to predict the interestingness of hyperlinks. Since our hyperlinks are represented as short HTML-documents (including headlines and some portions of text) this is not so unusual, but we could also learn a model that predicts document content based on a given hyperlink:

  Document_HL : HyperLink -> Document

and then predict the interestingness of the predicted document content using the above described model User_DOC. Our first experiments are in learning the first two models.
5 First experimental results
In order to select a good document representation, feature selection method and learning algorithm, we decided to test different possibilities and compare them. Since PWW has to recommend interesting hyperlinks to the user, we are interested in measuring the precision of our system on the most highly recommended hyperlink. Precision is frequently used as a metric in Information Retrieval and is defined as the percentage of positive suggestions among all suggestions made. In our case, it is either zero (in case the best suggestion is a negative example and shouldn't be suggested to the user) or one (if we made a correct suggestion). We also measured the traditional Machine Learning quality estimate, classification accuracy, defined as the percentage of correctly classified examples (over all classes). Both quality estimates are calculated using the 10-fold cross-validation technique (see Figure 5), using the same example splits for all tested algorithms.
Figure 5: Illustration of 10-fold cross-validation experiments.
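A minimal sketch of the evaluation loop of Figure 5, assuming a train(examples) callback that returns a model whose score(x) gives the probability of the positive class (both names are hypothetical). It reports the two quality estimates used in this section: average classification accuracy and the precision of the best suggestion per fold.

```python
import random

def cross_validate(examples, train, k=10, seed=0):
    """examples: list of (features, label), label 1 = interesting.
    Returns (mean classification accuracy, mean precision of the best suggestion)."""
    examples = examples[:]
    random.Random(seed).shuffle(examples)
    folds = [examples[i::k] for i in range(k)]          # the same splits are reused for every algorithm
    accs, best_prec = [], []
    for i in range(k):
        test = folds[i]
        if not test:
            continue
        model = train([e for j, f in enumerate(folds) if j != i for e in f])
        scores = [(model.score(x), y) for x, y in test]
        accs.append(sum(1 for s, y in scores if (s >= 0.5) == (y == 1)) / len(test))
        best_prec.append(1.0 if max(scores)[1] == 1 else 0.0)   # is the top-scored example truly positive?
    return sum(accs) / len(accs), sum(best_prec) / len(best_prec)
```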
In the first experiments we observe how the vector length (the number of features selected) influences model quality for two learning algorithms: the Naive Bayesian classifier on frequency vectors as used by Joachims [17] (see Section 3.3) and a k-Nearest Neighbor approach on frequency vectors using the Euclidean distance between examples and summing the class probabilities predicted by the k neighbors. We tested both algorithms on data for the documents behind hyperlinks, User_DOC, and on data for the hyperlinks, User_HL (see Section 4.2). Documents are currently represented using the bag-of-words approach (see Section 3.1) and feature selection is performed using the mutual information approach (see Section 3.2).
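A minimal sketch of the k-Nearest Neighbor variant just described (Euclidean distance on frequency vectors, summing the class votes of the k nearest examples); treating each neighbour's prediction as a 0/1 class distribution is one plausible reading of "summing class probabilities predicted by the k neighbors".

```python
import math
from collections import Counter

def euclidean(u, v):
    """Euclidean distance between two frequency vectors given as dicts (word -> frequency)."""
    words = set(u) | set(v)
    return math.sqrt(sum((u.get(w, 0) - v.get(w, 0)) ** 2 for w in words))

def knn_classify(train, x, k=5):
    """train: list of (frequency vector, class).  Returns class probabilities obtained by
    summing the (here: 0/1) class distributions of the k nearest training examples."""
    neighbours = sorted(train, key=lambda ex: euclidean(ex[0], x))[:k]
    votes = Counter(c for _, c in neighbours)
    total = sum(votes.values())
    return {c: n / total for c, n in votes.items()}
```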
Our experiments are performed on data collected for four users participating in the HomeNet project [14], with the data characteristics given in Table 2.

UserId | data source | probability of interestingness | number of examples | data entropy
usr150101 | Doc | 0.094 | 1 333 | 0.449
usr150101 | HL | 0.104 | 2 528 | 0.480
usr150202 | Doc | 0.107 | 3 415 | 0.492
usr150202 | HL | 0.053 | 4 798 | 0.301
usr150211 | Doc | 0.089 | 2 038 | 0.436
usr150211 | HL | 0.044 | 2 221 | 0.259
usr150502 | Doc | 0.100 | 1 272 | 0.468
usr150502 | HL | 0.100 | 2 498 | 0.468
Table 2: Data characteristics for document (Doc) and hyperlink (HL) data for each of
the four HomeNet users.
In all experiments k-Nearest Neighbor achieved slightly higher classification accuracy than the Naive Bayesian classifier (see Figures 6, 7, 8, 9), but the difference is significant in only one out of six experiments (see Figure 8). Adding more than approximately 50 features doesn't appear to help classification accuracy, but it also doesn't hurt. The high classification accuracy achieved for all four users by the k-Nearest Neighbor algorithm is in fact the default accuracy obtained if the negative class is predicted for all documents. So what we are really interested in is making good predictions for positive documents, and that is why we decided to measure the precision of the best suggested hyperlink.
Precision varies much more with vector size than classification accuracy does (see Figures 10, 11, 12, 13); it actually drops in seven out of eight experiments. It seems that k-Nearest Neighbor is more stable in precision than the Naive Bayesian classifier, achieving higher precision for longer vectors and about the same for up to 100 features, with the exception of one user (see Figure 12), where the precision of the Naive Bayesian classifier is 0 for most vector sizes for the document model and much better on short vectors than k-Nearest Neighbor for the hyperlink model.
In order to draw some conclusions about vector size and the quality of the algorithms, we need to perform more experiments on different users. These first experiments show that increasing the vector size probably isn't as beneficial as one could expect and could even hurt the precision of the best suggestion. There is no evidence that the algorithms differ substantially in classification accuracy, although k-Nearest Neighbor seems to be more promising both in accuracy and precision. If further experiments confirm the hypothesis that long vectors are not advisable, a closer look at the short vectors (e.g. see Figure 14) should give an idea about the number of features that works well for both accuracy and precision.
Figure 6: Influence of vector size on the classification accuracy of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150101. Notice that the classification accuracy scale starts at 80% accuracy. Error bars show the standard deviation, since accuracy is calculated as the average of 10 results.
Figure 7: Influence of vector size on the classification accuracy of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150202. Notice that the classification accuracy scale starts at 80% accuracy. Error bars show the standard deviation, since accuracy is calculated as the average of 10 results.
References
[1] Apte, C., Damerau, F., Weiss, S.M., Toward Language Independent Automated Learning of Text Categorization Models, Proc. of the 7th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Dublin, 1994.
Figure 8: Influence of vector size on the classification accuracy of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150211. Notice that the classification accuracy scale starts at 80% accuracy. Error bars show the standard deviation, since accuracy is calculated as the average of 10 results.
[2] Armstrong, R., Freitag, D., Joachims, T., Mitchell, T., WebWatcher: A Learning Apprentice for the World Wide Web, AAAI 1995 Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, Stanford, March 1995. URL: http://www.cs.cmu.edu/afs/cs/project/theo-6/web-agent/www/webagentsplus.ps.Z
[3] Balabanovic, M., Shoham, Y., Learning Information Retrieval Agents: Experiments with Automated Web Browsing, AAAI 1995 Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, Stanford, March 1995. URL: http://robotics.stanford.edu/people/marko/papers/lira.ps
[4] Bartell, B.T., Cottrell, G.W., Belew, R.K., Latent Semantic Indexing is an Optimal Special Case of Multidimensional Scaling, Proceedings of the ACM SIG Information Retrieval, Copenhagen, 1992.
[5] Berry, M.W., Dumais, S.T., O'Brien, G.W., Using linear algebra for intelligent information retrieval, SIAM Review, Vol. 37, No. 4, pp. 573-595, December 1995.
[6] Burke, R., Hammond, K., Kozlovsky, J., Knowledge-based Information Retrieval for Semi-Structured Text, Working Notes from AAAI Fall Symposium on AI Applications in Knowledge Navigation and Retrieval, pp. 19-24, American Association for Artificial Intelligence, 1995.
[7] Caruana, R., Freitag, D., Greedy Attribute Selection, Proc. of the 11th International Conference on Machine Learning ICML94, pp. 28-36, 1994.
[8] Cohen, W.W., Learning to Classify English Text with ILP Methods, Workshop on Inductive Logic Programming, Leuven, September 1995.
Figure 9: Influence of vector size on the classification accuracy of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150502. Notice that the classification accuracy scale starts at 80% accuracy. Error bars show the standard deviation, since accuracy is calculated as the average of 10 results.
Figure 10: Influence of vector size on the precision of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150101.
[9] Drummond, C., Ionescu, D., Holte, R., A Learning Agent that Assists the Browsing of Software Libraries, Technical Report TR-95-12, Computer Science Dept., University of Ottawa, 1995.
[10] Mc Elligott, M., Sorensen, H., An emergent approach to information filtering, Abakus. U.C.C. Computer Science Journal, Vol. 1, No. 4, December 1993.
[11] Etzioni, O., Weld, D., A Softbot-Based Interface to the Internet, Communications of the ACM, Vol. 37, No. 7, pp. 72-79, July 1994.
[12] Foltz, P.W. and Dumais, S.T., Personalized information delivery: An analysis of information filtering methods, Communications of the ACM, 35(12), pp. 51-60, 1992.
Figure 11: Influence of vector size on the precision of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150202.
Figure 12: Influence of vector size on the precision of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150211.
[13] Hammond, K., Burke, R., Schmitt, K., A Case-Based Approach to Knowledge Navigation, AAAI Workshop on Indexing and Reuse in Multimedia Systems, pp. 46-57, American Association for Artificial Intelligence, 1994.
[14] HomeNet: a research project at Carnegie Mellon University, Pittsburgh, PA, USA, 1996. URL: http://homenet.andrew.cmu.edu/progress/
[15] Holte, R.C., Drummond, C., A Learning Apprentice For Browsing, AAAI Spring Symposium on Software Agents, 1994.
[16] Joachims, T., Mitchell, T., Freitag, D., Armstrong, R., WebWatcher: Machine Learning and Hypertext, Fachgruppentreffen Maschinelles Lernen, Dortmund, August 1995.
[17] Joachims, T., A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization, Technical report CMU-CS-96-118, School of Computer Science, Carnegie Mellon University, March 1996.
Figure 13: Influence of vector size on the precision of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150502.
Figure 14: Influence of vector size on the precision of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150101. Notice the log scale on the x-axis.
[18] John, G.H., Kohavi, R., Pfleger, K., Irrelevant Features and the Subset Selection Problem, Proc. of the 11th International Conference on Machine Learning ICML94, pp. 121-129, 1994.
[19] de Kroon, H.C.M., Mitchell, T., Kerckhoffs, E.J.H., Improving Learning Accuracy in Information Filtering, ICML-96 Workshop "Machine learning meets human computer interaction", 1996. URL: http://www.ics.forth.gr/ moustaki/ICML96 HCI ML/kroon.ps
[20] Krulwich, B., Burkey, C., The ContactFinder agent: Answering bulletin board questions with referrals, Proceedings of the Thirteenth National Conference on Artificial Intelligence AAAI 96, pp. 10-15, 1996.
[21] Lang, K., NewsWeeder: Learning to Filter Netnews, Proc. of the 12th International Conference on Machine Learning ICML95, 1995.
[22] Lashkari, Y., The WebHund Personalized Document Filtering System, 1995.
[23] Lewis, D.D., Gale, W.A., A Sequential Algorithm for Training Text Classifiers, Proc. of the 7th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Dublin, 1994.
[24] Maes, P., Agents that Reduce Work and Information Overload, Communications of the ACM, Vol. 37, No. 7, pp. 30-40, July 1994.
[25] Mitchell, T., Caruana, R., Freitag, D., McDermott, J., Zabowski, D., Experience with a Learning Personal Assistant, Communications of the ACM, Vol. 37, No. 7, pp. 81-91, July 1994. URL: http://www.cs.cmu.edu/afs/cs/user/mitchell/ftp/cacm.ps.Z
[26] Pazzani, M., Muramatsu, J., Billsus, D., Syskill & Webert: Identifying interesting web sites, AAAI Spring Symposium on Machine Learning in Information Access, Stanford, March 1996, and Proceedings of the Thirteenth National Conference on Artificial Intelligence AAAI 96, pp. 54-61, 1996.
[27] van Rijsbergen, C.J., Harper, D.J., Porter, M.F., The selection of good search terms, Information Processing & Management, 17, pp. 77-91, 1981.
[28] Shaw Jr., W.M., Term-relevance computations and perfect retrieval performance, Information Processing & Management, 31(4), pp. 491-498, 1995.
[29] Rocchio, J., Relevance Feedback in Information Retrieval, in The SMART Retrieval System: Experiments in Automatic Document Processing, Chapter 14, pp. 313-323, Prentice-Hall Inc., 1971.
[30] Salton, G., Buckley, C., Term Weighting Approaches in Automatic Text Retrieval, Technical report COR-87-881, Department of Computer Science, Cornell University, November 1987.
[31] Shardanand, U., Maes, P., Social Information Filtering: Algorithms for Automating "Word of Mouth", CHI'95 Mosaic of Creativity, pp. 210-217, 1995.
[32] Skalak, D.B., Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms, Proc. of the 11th International Conference on Machine Learning ICML94, pp. 293-301, 1994.
[33] Sorensen, H., McElligott, M., PSUN: A Profiling System for Usenet News, CIKM'95 Intelligent Information Agents Workshop, Baltimore, December 1995.
[34] Quinlan, J.R., Constructing Decision Trees, in C4.5: Programs for Machine Learning, pp. 17-26, Morgan Kaufman Publishers, 1993.
[35] Yang, Y., Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval, Proc. of the 7th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Dublin, 1994.
Exhibit 4
Exhibit 5
Collecting
User Access Patterns for Building User Profiles
and Collaborative Filtering
Ahmad M. Ahmad Wasfi
School of Computer Sciences
University of Science, Malaysia
Penang 11800, Malaysia
ahmadm@cs.usm.my
ABSTRACT
The paper proposes a new learning mechanism to extract user preferences transparently for a World Wide Web recommender system. The general idea is that we use the entropy of the page being accessed to determine its interestingness, based on its occurrence probability following a sequence of pages accessed by the user. The probability distribution of the pages is obtained by collecting the access patterns of users navigating on the Web. A finite context-model is used to represent the usage information. Based on our proposed model, we have developed an autonomous agent, named ProfBuilder, that works as an online recommender system for a Web site. ProfBuilder uses the usage information as a base for content-based and collaborative filtering.

Keywords
Autonomous agent, Classical information theory, Finite context-model, Content-based filtering, Collaborative filtering.

INTRODUCTION
The World Wide Web hypertext system is a very large distributed digital information space. Some estimates suggested that the Web included about 150 million pages and that this number doubled every four months [7]. As more information becomes available, it becomes increasingly difficult to search for information without specialized aides.

Agent-based recommenders have been proposed as a solution to this problem. During a browsing session, these computer systems work collaboratively with a user without the need of an explicit initiation. Such a system has a static or dynamic collection of pages to be recommended. It assists the user by recommending pages that match his/her needs.

Since the system involves repeated interaction with the user and this may extend over a long period of time, the user's interests cannot be assumed to stay constant. The change in interests could be anything from a slight shift in relative priorities to completely transforming into other domains. Thus, the recommender system must be able to specialize to the current interests of the user and adapt as they change over time. A variety of learning mechanisms have been employed within recommender systems. They mostly revolve around the following three basic techniques:

1. Direct learning technique: The user provides a set of keywords to describe his/her interests (e.g. SIFT [22]). These keywords may include objective terms (such as author name and date of publication) or content keywords to reflect the information manifested in the desired documents. One advantage of this technique is its predictability: the user can usually guess why such information has been delivered. The problem with this technique is that it requires a lot of effort on the part of the user. The user should continually update the keywords to reflect his/her new interests. The user may also not formulate effective keywords to describe his/her interests.

2. Partially direct learning technique: The system learns user preferences by explicitly eliciting some kind of user feedback (e.g. Syskill & Webert [13], InfoFinder [10], Ringo [18], and Fab [3]). When the user moves from one page to another, the user rates how interesting the current page is. It needs less user effort than the first technique. However, the user still has a greater mental load compared to when he/she browses normally.

3. Indirect or transparent learning technique: The system learns user preferences transparently without any extra effort from the user. However, current methods are not adequately practical. For example, Letizia [11] infers user preferences from observing user-browsing behavior such as saving a reference to a page. This approach is intuitively reasonable to indicate an interest in the page. During a browsing session, however, a user may refer to many pages in his/her bookmarks just for future examination and not because of their interestingness.

To address the problems of the basic techniques, some systems use a mixture of them to get a compromise model between user effort and predictability. For example, Anatagonomy [14] learns user preferences by joining both the second and third techniques.
SiteHelper [19] and WebWatcher [2] use the first technique to specify the initial user area of interests and use the second technique to get feedback from the user to refine their recommendations. However, the problems of the basic techniques are still inherent in the recommender systems.

With the drawbacks of the learning techniques in mind, we propose a new mechanism that learns user interests and adapts automatically to their changes without user intervention. It relies on using the probability distribution of the pages to be accessed and tools derived from classical information theory. In order to illustrate how the learning mechanism plays out in a real application, we present an agent-based recommender, called ProfBuilder, that inhabits a Web site and is assigned the goal of being online responsive to the information needs of the site's users.

The following section provides a detail of our new proposed learning mechanism. We then present an overview of the architecture of ProfBuilder. Finally we outline some related work, followed by some possible ideas to address the limitations of ProfBuilder.

THE LEARNING MECHANISM

Background
A profile is a description of user interests. To deliver information a user wants to see, we should search for pages that are similar to his/her profile. An appropriate representation for profiles and pages is based on the vector-space representation, commonly used in the information retrieval (IR) literature [15].

In the vector-space model, pages and queries (profiles) are both represented as vectors in some hyper-space. The model assumes that there is an available keyword set k, where each element k_i is a keyword. Both pages and profiles can then be represented as weight vectors of the form
Another form of relationship
is that B has a great
informative value from A. This is because B may contain
important hyper information (hyperlinks pointing to other
pages), an important content, or may be both. In this case,
B may not be considered to reflect the actual user’s
interests. The user may visit B just because of its
importance at some time. As an example, consider a user
reads the top-headline story “Iraq Standoff as Diplomatic
Efforts Continue, Sabers Rattle” from the main page of
CNN site. The user may read the story just because of its
significance
as it is the top-headline
story and is not
interested to get more information
about Iraq. Since B
gives a little information about user interests because a
user will likely visit B regardless of his/her interests, the
importance of B should be low.
D, = < d, >
and
Qi = < qi >
where d, and qi represent the weight of ki in vector Di or Qi,
respectively. di (or qi) is set equal to 0 when k, is absent
from the vector.
For
this
representation,
the
method
for
profile
reformulation in response to the changes of user’s interest
is based on vector adjustment. Since profiles and pages are
both vectors, the profile should move closer to the vectors
representing pages which are relevant and away from the
vector representing
pages which are non-relevant.
The
implicit assumption of this is that pages resemble each
other are represented by reasonably similar vectors.
If, on the other hand, the probability
of visiting B
following A is very low, one may consider that there is no
relationship between A and B (such as in content). Thus, if
a user chooses to jump from A to B, B gives much
information about his/her interest. For example, suppose a
user reads a non-headline story about space from the main
page of CNN site. Since it is reasonable to assume that the
probabilities
of visiting non-headline
stories are low in
comparison to the headline ones, one would expect that the
user is interested in space stories. In other cases, B may be
Consider that page si is the current page of user Uj. Let us
assume that variable t,,, which is a nonnegative
number
the relevance
or
between
zero and one, indicates
importance of page si to user Uj. A reformulation of vector
Qj representing the user profile is obtained by taking Qj
58
GGL-PUM00101615
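The adjustment formula itself is not legible in this copy of the paper. Purely as an illustration of the kind of vector adjustment described above, and not as the paper's actual formula, a keyword-weight profile could be nudged toward the vector of the current page as in the Java sketch below; the class name, the map representation, and the additive update rule are all assumptions introduced for this example.

    import java.util.HashMap;
    import java.util.Map;

    /** Illustrative sketch only: a keyword-weight profile adjusted toward a page vector. */
    public class ProfileAdjustmentSketch {

        /** Move profile Qj toward page vector Di, weighted by the importance tij (0..1).
            This additive form is an assumption; the paper's exact formula is not shown here. */
        static void adjust(Map<String, Double> profile, Map<String, Double> page, double tij) {
            for (Map.Entry<String, Double> e : page.entrySet()) {
                double q = profile.getOrDefault(e.getKey(), 0.0);
                profile.put(e.getKey(), q + tij * e.getValue());
            }
        }

        public static void main(String[] args) {
            Map<String, Double> profile = new HashMap<>();   // Qj: the user's profile weights
            Map<String, Double> page = new HashMap<>();      // Di: weights of the current page
            page.put("space", 0.8);
            page.put("shuttle", 0.5);
            adjust(profile, page, 0.9);                      // tij: importance of the page
            System.out.println(profile);                     // prints the adjusted profile weights
        }
    }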
If we consider the foregoing examples, we see that the importance of a page is inversely proportional to its probability following the page (or the sequence of pages) visited by the user.

The importance of classical information theory is embodied in the idea that the value of the information content H of messages sent from message sources to message receivers can be measured quantitatively. Because of the redundancy that occurs due to the dependency between successive messages, the value of a message is assumed to be inversely proportional to the probability with which the message could have been predicted by the receiver before the message arrived [17].

H of a message is then defined as a decreasing function H(p), also known as the entropy, of the probability p of that message. Because the information content of two messages should be an additive function of their individual content values, that is, H(p1·p2) = H(p1) + H(p2), and the value of a message received with probability one should be zero, the information content of a message can be defined as (for a formal proof see [1])

H(p) = - log(p)

If the browsing process is considered as a transference of information from a collection of pages such as a Web site (message source) to a user (message receiver), it is possible to use tools derived from information theory to quantify the information value of a page (message) as it relates to each user.

The importance or interestingness t_ij of page s_i to user u_j is then assumed to be its entropy H(pr), based on its conditional probability pr of being accessed following the sequence of pages accessed by user u_j:

t_ij = H(pr) = - log(pr)

where pr is the conditional probability of visiting page s_i given the sequence of pages s_j1, s_j2, ..., s_jm.
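As an illustration only, and not code from the paper, the short Java sketch below evaluates the entropy-based importance -log(pr) for two hypothetical probabilities: a heavily visited top-headline page and a rarely followed link. The probability values are invented for the example, and the natural logarithm is used here since the paper does not fix a base.

    /** Illustrative sketch: entropy-based importance tij = -log(pr) of a visited page. */
    public class PageImportanceSketch {

        /** Importance of a page given the probability pr of visiting it after the user's path. */
        static double importance(double pr) {
            return -Math.log(pr);   // H(pr) = -log(pr); low pr means high importance
        }

        public static void main(String[] args) {
            // A top-headline page that almost everyone visits carries little information.
            System.out.println("pr = 0.75 -> t = " + importance(0.75));
            // A rarely followed page says a lot about this particular user's interests.
            System.out.println("pr = 0.05 -> t = " + importance(0.05));
        }
    }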
Representation for Web Navigation
The probability distribution of the pages to be accessed is based on collecting the visiting patterns of many users. The usage history of a collection of pages, such as a Web site, was represented previously as a directed graph of pages [21][9]. This means that the occurrence of a page depends only upon the previous page. A more general type of representation is that the occurrence of a page may depend upon a context consisting of a finite number m of preceding pages. Such a representation is called finite-context modeling, which is commonly used in statistical modeling of an information source (e.g., for text compression [8]). The advantage of using the context model is that it better reflects the actual distribution of the pages. The model is specified by giving the finite set s consisting of q pages (where each element s_i is a page) and the set of conditional probabilities

pr(s_i | s_j1, s_j2, ..., s_jm)   for i = 1, 2, ..., q; j_p = 1, 2, ..., q

A context model may be an order-m fixed model, based on a fixed number m of previous pages in its probability determination, or an order-m mixed model, based on contexts of several lengths with a maximum length of m.

In an order-m fixed model, the probability (or, more accurately, frequency) of a given page is known only if we know the m preceding pages. For instance, if m = 0 then no context is used and the probability of the current page is the probability of its occurrence in the collection. If m = 1 then the previous page is used to determine the probability of the current page. If m = 2 then the previous two pages are used, and so on. At any one time, therefore, we shall call the m preceding pages the state of the order-m model at that time. Since there are q pages, an order-m model will have q^m possible states. A handy way to illustrate the behavior of a context model is through the use of a state diagram. In a state diagram we represent each of the q^m possible states of the model by a circle, and the possible transitions from state to state by arrows.

As an example, suppose we have a very simple Web site consisting of two pages A and B. That is, s = {A, B} and q = 2. When m = 2, the probability distribution of page s_i (i = 1 or 2) is determined by the previous two pages s_j and s_k (j, k = 1 or 2). Let us assume that the conditional distribution of s_i given s_j and s_k is as follows:

pr(s_i = A | s_j = A, s_k = A) = 1/4
pr(s_i = A | s_j = A, s_k = B) = 1/5
pr(s_i = A | s_j = B, s_k = A) = 1/7
pr(s_i = B | s_j = B, s_k = B) = 2/3
pr(s_i = B | s_j = α, s_k = β) = 1 - pr(s_i = A | s_j = α, s_k = β)

where (α, β = A or B). Because q is equal to 2 and we have assumed an order-2 fixed context model, we have four states: AA, AB, BA, and BB. The state diagram of this model is shown in Figure 1.

[Figure 1. State diagram of an order-2 fixed model with q = 2.]

The possible states of the site are indicated by the four circles. The possible state transitions are indicated by arrows from state to state, with the probability of a transition shown by a number associated with each arrow. For example, if we are in state AA we can go to either AA or AB but not to state BA or BB. The probability of remaining in state AA is shown as 1/4, and the probability of going to state AB is shown as 3/4.
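As an illustration only (not code from the paper), the following Java sketch encodes the conditional distribution quoted above and reproduces the transition probabilities read off the state diagram; the data structure and method names are assumptions made for the example.

    import java.util.HashMap;
    import java.util.Map;

    /** Illustrative sketch of the order-2 fixed model for the two-page site s = {A, B}. */
    public class Order2FixedModelSketch {

        // pr(next = A | state), using the conditional probabilities given in the text.
        static final Map<String, Double> PR_A = new HashMap<>();
        static {
            PR_A.put("AA", 1.0 / 4);
            PR_A.put("AB", 1.0 / 5);
            PR_A.put("BA", 1.0 / 7);
            PR_A.put("BB", 1.0 - 2.0 / 3);   // pr(B | BB) = 2/3, so pr(A | BB) = 1/3
        }

        /** Probability of visiting 'next' ("A" or "B") from the given two-page state. */
        static double pr(String state, String next) {
            double a = PR_A.get(state);
            return next.equals("A") ? a : 1.0 - a;   // pr(B | state) = 1 - pr(A | state)
        }

        public static void main(String[] args) {
            // From state AA we can stay in AA (next page A) or move to AB (next page B).
            System.out.println("AA -> AA : " + pr("AA", "A"));   // 0.25
            System.out.println("AA -> AB : " + pr("AA", "B"));   // 0.75, as in Figure 1
        }
    }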
On the other hand, in an order-m mixed model, if m = 2 then we use the previous two pages, one predecessor if the two-page context fails to determine the probability of a page, and the probability of the page occurrence if both the two-page and one-page contexts fail. A mixed model may be either fully or partially mixed. The model is fully mixed if it contains all the fixed sub-models whose orders equal or are less than the order of the model. That is, a fully mixed model with m = 3 bases its determination on sub-models of orders 3, 2, 1, and 0. The model is partially mixed if it uses some, but not all, of the sub-models.

The Order of a Context Model
Determining the order m of a model is critical to reflect the actual probability distribution of a collection of pages. Let us consider the behavior of the model as m equals zero and as m gets larger. When m = 0, the probability of a page is just the probability of its occurrence in the collection, which is obviously not enough to show the dependency among the pages. As m increases, it reflects better the actual dependency among the pages. But when m gets very large, the dependence of a page on the previous m pages becomes very weak. Thus, an order of a few pages is very reasonable to determine the dependency among the pages.
Calculating the Probability
The general mechanism to calculate the probability pr of page s_i in a fully mixed order-m model is dictated by the m most recent pages of the user path, if page s_i has occurred in this particular path before by past users. In this case, only the order-m probability distribution is used. Otherwise, sub-models of lower orders are consulted. If the order-0 sub-model is consulted (i.e., the page s_i has never occurred in the context of any higher sub-model before), pr is assumed to be proportional to

n_i / N

where n_i is the occurrence frequency of page s_i, and N is the total number of times of visiting all the pages. This will guarantee to supply a probability for any page in collection s.
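A sketch of this fallback, shown for m = 1 for brevity and offered only as an illustration under assumed data structures (it is not the paper's code): consult the order-1 counts for the user's previous page first, and fall back to the order-0 frequencies n_i / N when the page has never been seen in that context.

    import java.util.HashMap;
    import java.util.Map;

    /** Illustrative sketch: calculating pr in a mixed model, falling back to the order-0 sub-model. */
    public class ProbabilityCalculationSketch {

        /** pr of 'page': use the order-1 context of the previous page if the page has been seen
            there before; otherwise fall back to the order-0 frequencies, pr proportional to n_i / N. */
        static double pr(String page, String previousPage,
                         Map<String, Map<String, Integer>> order1,
                         Map<String, Integer> order0, int totalVisits) {
            Map<String, Integer> followers = order1.get(previousPage);
            if (followers != null && followers.containsKey(page)) {
                int contextTotal = followers.values().stream().mapToInt(Integer::intValue).sum();
                return (double) followers.get(page) / contextTotal;
            }
            return (double) order0.getOrDefault(page, 0) / totalVisits;
        }

        public static void main(String[] args) {
            // Hypothetical counts collected from past users of a small site.
            Map<String, Integer> order0 = new HashMap<>();
            order0.put("home", 6); order0.put("news", 3); order0.put("about", 1);
            Map<String, Map<String, Integer>> order1 = new HashMap<>();
            Map<String, Integer> afterHome = new HashMap<>();
            afterHome.put("news", 3); afterHome.put("about", 1);
            order1.put("home", afterHome);

            System.out.println(pr("news", "home", order1, order0, 10));   // 3/4 from the order-1 context
            System.out.println(pr("about", "news", order1, order0, 10));  // never seen after "news": 1/10
        }
    }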
ARCHITECTURE
This section discusses ProfBuilder (an acronym for Profile Builder), a transparent, adaptive, autonomous agent which works as a recommender system. ProfBuilder's learning mechanism is implemented based on the algorithm described in the preceding section.

ProfBuilder inhabits a Web site and is assigned the goal of finding relevant local pages for the site's users. The advantage of this architecture is that ProfBuilder does not need to search the Web to collect the pages to be recommended, as they are the site's pages. Thus, ProfBuilder has the benefit of operating without using bandwidth from the Internet except for a trivial amount when it delivers its recommendations to the users. Moreover, such a system is also transparent to the users, requiring no external installation.

ProfBuilder is autonomous as it can take actions relating to page filtering on the user's behalf. It is adaptive as it learns the preferences of the user and adapts as they change over time. It is transparent as it extracts the preferences without user intervention.

ProfBuilder keeps track of each individual user and provides that person online assistance. The assistance includes two lists of recommendations based on two different filtering paradigms: content-based and collaborative. ProfBuilder updates the lists each time the user changes his/her current page. Content-based filtering is based on the correlation between the content of the pages and the user's preferences. The collaborative filtering is based on a comparison between the user's path of navigation and the access patterns of past users. Combining the two paradigms may eliminate the shortcomings of each approach. By doing collaborative filtering, we can deal with any kind of content and explore new domains to find something interesting to the user. By doing content-based filtering, we can deal with pages unseen by others [3].

To overcome the problem of stateless connections in HTTP, ProfBuilder follows users by tracking their IP addresses. To track user presence, a timeout mechanism is used to delete a user's session information after a predetermined amount of idle time, so that a connection having the same IP after the specified period is identified as a new user. This method is fairly easy to implement. The problem with this approach is that many users connect to the Internet through proxy servers. Consequently, the IP of a proxy server may represent two or more people who are accessing the same Web site simultaneously in their browsing sessions, causing an obvious conflict. However, the reality is that many large sites use this method and have not had any clashes [20].

Fast performance is a key requirement as ProfBuilder is in the middle of every Web transaction. ProfBuilder was built in a highly multi-threaded fashion using the Java language so that no information that ought to be delivered to the browser gets stuck somewhere in the agent system.
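As an illustration of the timeout idea only, and not the paper's implementation, the sketch below keeps the time of the last request per IP address and treats a long-idle address as a new user. The timeout value and the data structure are assumptions, and the thread safety and HTTP plumbing a real deployment would need are omitted.

    import java.util.HashMap;
    import java.util.Map;

    /** Illustrative sketch: identifying users by IP address with an idle-time timeout. */
    public class IpSessionTrackerSketch {

        static final long IDLE_TIMEOUT_MS = 30 * 60 * 1000;   // assumed value; the paper does not give one

        // IP address -> time of the last request seen from that address
        final Map<String, Long> lastSeen = new HashMap<>();

        /** Returns true if this request should start a new session: the IP is unknown
            or has been idle longer than the timeout. */
        boolean isNewSession(String ip, long now) {
            Long previous = lastSeen.put(ip, now);
            return previous == null || now - previous > IDLE_TIMEOUT_MS;
        }

        public static void main(String[] args) {
            IpSessionTrackerSketch tracker = new IpSessionTrackerSketch();
            long t0 = System.currentTimeMillis();
            System.out.println(tracker.isNewSession("10.0.0.7", t0));                       // true: first visit
            System.out.println(tracker.isNewSession("10.0.0.7", t0 + 60_000));              // false: active session
            System.out.println(tracker.isNewSession("10.0.0.7", t0 + 3 * 60 * 60 * 1000));  // true: idle too long
        }
    }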
The architecture of ProfBuilder can be broken down into three modules:

• The graphical user interface module is responsible for displaying ProfBuilder's interface.
• The learning module is responsible for maintaining the mapping between the actual interests of the user and the user profiles.
• The filtering module is responsible for the content-based and collaborative filtering.

The Graphical User Interface
The graphical user interface of ProfBuilder is a separate resizable HTML frame at the top of the current page. Figure 2 illustrates ProfBuilder's interface. (For evaluation purposes, we have modified ProfBuilder to a proxy-based architecture, as there was no practical site physically available.) The result frame displays a list of the recommended pages represented by their respective titles. The title is also associated with the page size, which may be useful to distinguish among pages having the same title, as well as allowing users to estimate the time and space it will take to retrieve the page.

[Figure 2. ProfBuilder's interface.]

ProfBuilder highlights each recommendation to show its relevance and access frequency (given the user's current path) by putting 'ball' and 'man' icons, respectively, in front of the title. The number of balls shows levels of relevance: one-ball pages are poor, two-ball pages are neutral, three-ball pages are good, and so on. The number of men shows the level of access frequency logarithmically: one-man pages are visited once, two-man pages are visited two to three times, three-man pages are visited four to seven times, four-man pages are visited eight to fifteen times, and so on.

To read the content of a page, the user clicks on its title. Titles in bold font indicate unread pages, while titles in normal font indicate pages that have been read by the user.

In addition to the list, the frame also shows two buttons for choosing the type of filtering. In the content-based filtering, the selected pages are sorted in decreasing order of their interestingness, while in the collaborative filtering they are sorted in decreasing order of their access frequencies.
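The 'man' ranges quoted above (1, 2-3, 4-7, 8-15, ...) imply a simple logarithmic rule. As an illustration only, and not code from the paper, the sketch below computes that man-icon count; the ball thresholds are not specified precisely enough to reproduce here.

    /** Illustrative sketch: the logarithmic 'man' icon count implied by the ranges quoted above. */
    public class AccessFrequencyIconsSketch {

        /** 1 visit -> 1 man, 2-3 -> 2, 4-7 -> 3, 8-15 -> 4, and so on: floor(log2(visits)) + 1.
            Assumes visits >= 1. */
        static int menIcons(int visits) {
            // computed with integer arithmetic to avoid floating-point edge cases
            return 31 - Integer.numberOfLeadingZeros(visits) + 1;
        }

        public static void main(String[] args) {
            for (int visits : new int[] {1, 3, 7, 15, 16}) {
                System.out.println(visits + " visit(s) -> " + menIcons(visits) + " man icon(s)");
            }
        }
    }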
The Learning Module
The learning module handles the task of mapping user interests to the profile and maintaining the correlation between the two. It is implemented on the basis of our proposed learning mechanism.

The Web site usage is represented as a full order-1 mixed model. A mixed model is desirable and essentially unavoidable to show the actual probability distribution of the pages. The order has been chosen because of its space and time effectiveness as well as for the reasons mentioned in the previous section.

The frequency of occurrence of each page in the order-0 sub-model is initialized based on its visibility (roughly, the number of pages pointing to the page [4]). The visibility is a sign of popularity [12] and frequency of access. For example, we expect that a page with a visibility of 10 will be accessed more than a page with a visibility of 1. The visibility data is obtained by querying the Infoseek search engine.

The context model is built progressively as users jump from one page to another using any navigation technique. The general mechanism of building is to update the frequency of occurrence of the current page in the order-0 sub-model, and to update its frequency of occurrence in the order-1 sub-models based on the user's previous page.
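Offered only as an illustration under assumed data structures, not as the paper's code, the sketch below shows the building step just described: seed the order-0 counts from visibility, then bump the order-0 and order-1 counts on every page jump. The page names and visibility numbers are hypothetical.

    import java.util.HashMap;
    import java.util.Map;

    /** Illustrative sketch: building the order-0 and order-1 counts as a user navigates the site. */
    public class UsageModelBuildingSketch {

        final Map<String, Integer> order0 = new HashMap<>();                 // page -> visit count
        final Map<String, Map<String, Integer>> order1 = new HashMap<>();    // previous page -> (page -> count)

        /** Seed the order-0 sub-model with a page's visibility (number of pages pointing to it). */
        void initialize(String page, int visibility) {
            order0.put(page, visibility);
        }

        /** On every jump, bump the current page in order-0 and, given the previous page, in order-1. */
        void onPageVisit(String previousPage, String currentPage) {
            order0.merge(currentPage, 1, Integer::sum);
            if (previousPage != null) {
                order1.computeIfAbsent(previousPage, k -> new HashMap<>())
                      .merge(currentPage, 1, Integer::sum);
            }
        }

        public static void main(String[] args) {
            UsageModelBuildingSketch model = new UsageModelBuildingSketch();
            model.initialize("home", 10);
            model.initialize("news", 2);
            model.onPageVisit(null, "home");
            model.onPageVisit("home", "news");
            System.out.println(model.order0);   // e.g. {home=11, news=3}
            System.out.println(model.order1);   // e.g. {home={news=1}}
        }
    }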
The Information Filtering Module

Content-based Filtering
The filtering process consists of translating pages to their vector-space representation, finding pages that are similar to the profile, and selecting the top-scoring pages for presentation to the user.

The vector representation is obtained by a text analysis of HTML pages. This is done by extracting keywords from page titles, all levels of headings, and anchor hypertexts. This narrow analysis leads to retrieval of fewer pages, but most of the retrieved material is likely to be helpful to the user, as it is reasonable to assume that the author of a Web page used these words to give the main aspect of the page. Stop words [5] are filtered out and word stemming [6] is then performed to improve IR performance. The keywords are weighted based on the well-tested algorithm TFIDF [16]. The weight of a keyword in one page is the product of its keyword frequency and the inverse of its document frequency. The weight of the keyword k_j is given by

w_ij = tf_ij x idf_j

where tf_ij is the number of occurrences of k_j in page s_i, and idf_j is the inverse document frequency of k_j in the Web site.

The similarity metric between the vector D_i representing page s_i and the vector Q_j representing the interests of user u_j is calculated by taking the scalar product of the two vectors:

Similarity(D_i, Q_j) = Σ_k w_ik · w_jk
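As an illustration only, and not the paper's code, the Java sketch below computes TFIDF-style keyword weights and the scalar-product similarity. The idf form used here, log of total pages over pages containing the keyword, is one common choice and an assumption; the paper only names the quantity. The site size and keyword counts are hypothetical.

    import java.util.HashMap;
    import java.util.Map;

    /** Illustrative sketch: TFIDF keyword weights and the scalar-product similarity used for filtering. */
    public class ContentFilteringSketch {

        /** w_ij = tf_ij * idf_j, with idf_j taken as log(totalPages / pagesWithKeyword) (an assumption). */
        static double weight(int tf, int totalPages, int pagesWithKeyword) {
            return tf * Math.log((double) totalPages / pagesWithKeyword);
        }

        /** Similarity(D_i, Q_j) = sum over shared keywords of w_ik * w_jk. */
        static double similarity(Map<String, Double> page, Map<String, Double> profile) {
            double sum = 0.0;
            for (Map.Entry<String, Double> e : page.entrySet()) {
                sum += e.getValue() * profile.getOrDefault(e.getKey(), 0.0);
            }
            return sum;
        }

        public static void main(String[] args) {
            // Hypothetical site of 100 pages; keyword counts taken from a page title and headings.
            Map<String, Double> page = new HashMap<>();
            page.put("iraq", weight(3, 100, 10));
            page.put("diplomacy", weight(1, 100, 5));

            Map<String, Double> profile = new HashMap<>();
            profile.put("iraq", 2.5);

            System.out.println(similarity(page, profile));   // only the shared keyword contributes
        }
    }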
Collaborative Filtering
The filtering process is based on the following hypothesis: making available the work of a large number of past users can be useful for finding out relationships or interdependencies between pages. Thus, it is reasonable to advise one user of what was done by others (the previous section discussed relationship types). For instance, when there is a high probability of visiting page B given page A, it may indicate that B has important content. This reduces the chances of missing something particularly significant.

The module finds pages from the user's current path, which in this application is only his/her current page, and selects the top-frequency pages for presentation to the user.
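Purely as an illustration of that selection step, and not as the paper's implementation, the sketch below ranks the pages recorded after the current page by access frequency and keeps the top few; the counts and page names are hypothetical.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Illustrative sketch: recommending the pages most frequently visited after the current page. */
    public class CollaborativeFilteringSketch {

        /** Sort the pages recorded after 'currentPage' by access frequency and keep the top n. */
        static List<String> topFollowers(Map<String, Map<String, Integer>> order1,
                                         String currentPage, int n) {
            Map<String, Integer> followers = order1.getOrDefault(currentPage, new HashMap<>());
            List<String> pages = new ArrayList<>(followers.keySet());
            pages.sort((a, b) -> Integer.compare(followers.get(b), followers.get(a)));
            return pages.subList(0, Math.min(n, pages.size()));
        }

        public static void main(String[] args) {
            // Hypothetical order-1 counts gathered from past users.
            Map<String, Map<String, Integer>> order1 = new HashMap<>();
            Map<String, Integer> afterHome = new HashMap<>();
            afterHome.put("news", 40);
            afterHome.put("sports", 25);
            afterHome.put("about", 3);
            order1.put("home", afterHome);

            System.out.println(topFollowers(order1, "home", 2));   // [news, sports]
        }
    }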
RELATED RESEARCH
This section compares ProfBuilder with other recommender systems. Among these systems, four are selected for comparison: Letizia [11], SiteHelper [19], Ringo [18], and Fab [3]. The systems were selected because they cover the best features of the other recommender systems.

Letizia
Letizia [11] is an agent that assists a user browsing the Web. Letizia uses the idle time spent by the user reading the current document to explore the neighborhood looking for pages that are related to the user's interest. Similar to ProfBuilder, the goal of the Letizia agent is to autonomously provide assistance to the user, without his/her intervention, while the user is browsing. However, Letizia learns the areas that are of interest to a user by recording the user's browsing behavior. Letizia uses only content-based filtering, while ProfBuilder uses both content-based and collaborative filtering. Furthermore, Letizia requires considerable bandwidth to operate, resulting in network overload and bandwidth shortages. On the other side, ProfBuilder inhabits a Web site and operates locally in assisting external users. Finally, Letizia requires a Macintosh and a Netscape browser to operate, which severely limits its extent. In contrast, ProfBuilder is platform independent.

SiteHelper
SiteHelper [19] is an agent that acts as a housekeeper for a Web site. It helps a user to find relevant information at the site. This is similar to the ProfBuilder design. However, the learning mechanism is not transparent. SiteHelper prompts the user for a set of keywords and asks for explicit feedback rating the keywords. In contrast, the learning mechanism used by ProfBuilder works transparently without user intervention. In addition, SiteHelper's recommendations are based only on content-based filtering.

Ringo
Ringo [18] is a system which creates personalized recommendations for music albums and artists. It can be accessed through electronic mail or the Web. Users describe their interests to the system by explicitly rating some music. A user profile is a record of the user's interests (positive as well as negative) in specific items. Ringo makes recommendations based on comparisons among user profiles (collaborative filtering). Its processing takes an hour, while ProfBuilder operates concurrently with the user in his/her browsing session.

Fab
Fab [3] is a system that helps Web users to discover new and interesting sites. Fab combines both content-based and collaborative filtering systems. The system maintains user profiles based on content analysis, and directly compares these profiles to determine similar users for collaborative recommendation. The system delivers a number of pages that it thinks would interest the users. Users evaluate the pages and provide explicit feedback to the system. Fab uses the Web for collecting pages for recommendation. This is an advantage over ProfBuilder's approach, since ProfBuilder is restricted to the site's pages. However, Fab uses considerable bandwidth for collecting these pages. Moreover, the system does not operate concurrently with users during their browsing session. Users get the recommendations by explicitly accessing their account in Fab's database through the Web.

CONCLUSION
In this paper, we have proposed a new learning mechanism to learn user preferences from the retrieved pages. It is based on their probabilities, which are obtained from collecting the visiting patterns of past users. We assumed that the importance of a page is its entropy based on its probability of being visited following a sequence of pages visited by the user.

We have also introduced ProfBuilder, an agent-based recommender for a Web site. ProfBuilder uses the site usage information to learn user interests and as a base for collaborative filtering. ProfBuilder helps the user to find relevant pages on the site by providing both content-based and collaborative filtering. ProfBuilder is independent of the browser as it assists the users by inserting its recommendations in the requested pages.

The efficiency of the learning mechanism and the usefulness of the recommendations given by ProfBuilder have not been formally evaluated so far. We believe that ProfBuilder performs well for sites that consist of 1000 HTML URLs and above, as it is infeasible to read the entire content of such sites.

FUTURE WORK
The limitations of ProfBuilder and possible solutions are as follows:

First, the main limitation of our system is that there is no real assessment of ProfBuilder. We plan to do serious evaluation studies in the near future. For example, we plan to examine different combinations of sub-models with various orders.

Second, one problem of its learning mechanism is that it needs a large number of users in order to have enough data for reflecting the interdependency between pages. We intend to create simulated or virtual users who represent a particular taste to show the content relationships among the pages. In the CNN site, for example, we can create a virtual 'Iraq' user, who visits only pages containing the word 'Iraq'. In the case of an order-1 fixed model, for instance, all the pages will be connected to form a complete graph. The frequency of a transition between two pages is assigned proportional to the product of their weights of the word 'Iraq'.

Third, since the number of states of an order-m fixed context model increases exponentially with m, it is even difficult to have a model with m = 2, as the space required to store all the context information is prohibitive. We intend to solve this problem as others have done (e.g., [8]) by using self-organizing lists and hashing techniques to provide a means of representing approximated context models of any order in a reasonable amount of memory.

Finally, ProfBuilder assists a user by finding relevant information on only one Web site. We intend to solve this problem by maintaining user profiles across different Web sites that use ProfBuilder, so that, when a user jumps to another site, the user's profile will also be transferred to the new site, whose ProfBuilder will search for pages similar to the profile. Thus, the user can find relevant recommendations in the first page accessed in the new site.
REFERENCES
1. Aczel, J., and Daroczy, Z. 1975. On Measures of Information and their Characterizations. Academic Press.
2. Armstrong, R.; Freitag, D.; Joachims, T.; and Mitchell, T. 1995. WebWatcher: A Learning Apprentice for the World Wide Web. In Proceedings of the AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, Stanford, CA.
3. Balabanovic, M., and Shoham, Y. 1997. Fab: Content-Based, Collaborative Recommendation. Communications of the ACM 40(3):66-72.
4. Bray, T. 1996. Measuring the Web. In Proceedings of the Fifth International World Wide Web Conference, Paris.
5. Fox, C. 1990. A Stop List for General Text. SIGIR Forum 24(1-2):19-35.
6. Frakes, W. B. 1984. Term Conflation for Information Retrieval. New York: Cambridge University Press.
7. Gudivada, V. N.; Raghavan, V. V.; Grosky, W. I.; and Kasanagottu, R. 1997. Information Retrieval on the World Wide Web. IEEE Internet Computing 1(5):58-68.
8. Hirschberg, D. S., and Lelewer, D. A. 1992. Context Modeling for Text Compression. In Image and Text Compression. Kluwer Academic Press, Norwell, MA.
9. Jaczynski, M., and Trousse, B. 1997. Broadway: A World Wide Web Browsing Advisor Reusing Past Navigations from a Group of Users. http://www.inria.fr/aid/papers/97/ukcbr/html/ukcbr.html
10. Krulwich, B., and Burkey, C. 1997. The InfoFinder Agent: Learning User Interests through Heuristic Phrase Extraction. IEEE Intelligent Systems 12(5):22-27.
11. Lieberman, H. 1995. Letizia: An Agent that Assists Web Browsing. In Proceedings of the International Joint Conference on Artificial Intelligence, Montreal.
12. Marchiori, M. 1997. The Quest for Correct Information on the Web: Hyper Search Engines. In Proceedings of the Sixth International World Wide Web Conference, Santa Clara.
13. Pazzani, M.; Muramatsu, J.; and Billsus, D. 1996. Syskill & Webert: Identifying Interesting Web Sites. In Workshop on Machine Learning in Information Access, AAAI Spring Symposium Series, Stanford, CA.
14. Sakagami, H., and Kamba, T. 1997. Learning Personal Preferences on Online Newspaper Articles from User Behaviors. In Proceedings of the Sixth International World Wide Web Conference, Santa Clara.
15. Salton, G., and McGill, M. J. 1983. An Introduction to Modern Information Retrieval. McGraw-Hill.
16. Salton, G., and Yang, C. S. 1973. On the Specification of Term Values in Automatic Indexing. Journal of Documentation 29(4):351-372.
17. Shannon, C. E. 1948. A Mathematical Theory of Communication. Bell System Technical Journal 27(3):623-656.
18. Shardanand, U., and Maes, P. 1995. Social Information Filtering: Algorithms for Automating "Word of Mouth". In Proceedings of the Human Factors in Computing Systems Conference (CHI'95), Denver.
19. Siaw, D., and Ngu, W. 1997. SiteHelper: A Localized Agent that Helps Incremental Exploration of the World Wide Web. In Proceedings of the Sixth International World Wide Web Conference, Santa Clara.
20. Thomas, B. 1997. Recipe for E-commerce. IEEE Internet Computing 1(6):72-74.
21. Wexelblat, A., and Maes, P. 1997. Footprints: Visualizing Histories for Web Browsing. In RIAO'97: Proceedings of Computer-Assisted Information Retrieval on the Internet, Montreal.
22. Yan, T. W., and Garcia-Molina, H. 1995. SIFT - A Tool for Wide Area Information Dissemination. In Proceedings of the USENIX Technical Conference.
Exhibit 8
GGL-PUM00101538
GGL-PUM00101539
GGL-PUM00101540
GGL-PUM00101541
GGL-PUM00101542
GGL-PUM00101543
GGL-PUM00101544
GGL-PUM00101545
GGL-PUM00101546
GGL-PUM00101547
GGL-PUM00101548
GGL-PUM00101549
GGL-PUM00101550
GGL-PUM00101551
GGL-PUM00101552
GGL-PUM00101553
GGL-PUM00101554
GGL-PUM00101555
Exhibit 9
Exhibit 15
REDACTED
IN ITS
ENTIRETY
Exhibit 16
REDACTED
IN ITS
ENTIRETY
Exhibit 17
REDACTED
IN ITS
ENTIRETY
Exhibit 18
REDACTED
IN ITS
ENTIRETY
Exhibit 26
Exhibit 27
Bing Making search yours - Search Blog - Site Blogs - Bing Community
Making search yours
The Bing Team
2/10/2011 9:00 AM
Comments (10)
http://www.bing.com/community/site_blogs/b/search/archive/2011/02/10/making-search-y... 7/17/2012
For years people have talked about personalized search as the next evolution of our amazing technology.
Over the years, the biggest obstacle facing search engineers is the simple fact that human behavior is not
predictable. Don’t get us wrong, people can be creatures of habit and we can build functions that enable us to
display different results based on logical assumptions made in the aggregate. For example, ‘traffic’ at 6pm on
a Friday likely refers to road conditions, not the movie or the band. So we can generally use some math
magic to make really good predictions about what you mean when you type ‘traffic’ at 6pm tonight, but what if
we’re wrong? It’s easy to see how that could happen especially if you, say, walk to work.
In that case a more personal search would benefit you; having more detailed information about the person
doing the search can make results more effective.
We think one of the challenges with delivering results which are truly individualized is that, to date,
personalized search “can’t see the forest for the trees”. In other words everyone is collecting everything and
trying to figure out the foibles of human behavior from a mass of digital bits. To an extent, we’ve all been
looking at the wrong inputs which in turn haven't given us the output we want.
We’ve found something interesting: a person’s history or profile often does not necessarily deliver better
Even as we continue to develop more relevant search through smart personalization, we are very focused on
maintaining an industry-leading privacy stance. For more information, see here.
We’re currently ‘flighting’ (or “testing”, for non search-geeks) a raft of experiments to see which techniques deliver the best results for a given user behavior, but today we want to talk about two we’ve recently put out there for you all! First, something relatively simple: automatically tailoring search results based on your physical location.

As 76% of people use search engines to plan trips, events or social gatherings, Bing Local has always provided you with maps to nearby business listings, authoritative reviews and areas of interest.
relevant result for that user is not necessarily the same as that for the majority of people in the U.S. To
numerous users with an interest in pursuing a career in chemistry, the most relevant result may be the
American Chemical Society, but to someone interested in how they can get involved in the fight against
cancer, the most relevant result is more likely to be the American Cancer Society.
Suppose, in this latter case, the chemistry fan selects American Chemical Society. Our research shows that users commonly re-issue such navigational queries and the intent of that user rarely changes. This new personal search feature uses this human behavior as its core premise: if Bing thinks a user is trying to "re-find" a site, the relevant result is promoted to the top position on the page:
The beauty of thinking differently about personalized searching is that it enables us to construct elegant
solutions that require a minimal amount of personal information and, frankly, often exhibit better results than a
more computationally complex predictive model alone.
There is much more to come, but take Bing for a spin and tell us what you think!
- Aidan Crook & Sanaz Ahari, Bing Search
Join Bing Community
2/14/2011 4:51 AM
http://bollywoodmasala.info/
sunny.seo15
2/16/2011 9:27 PM
I like very much...Pizza..............:)
kuriositee
2/17/2011 6:40 AM
Will you offer a way to turn personalized search results off like Google does? Or manually change your
location?
Exhibit 28
REDACTED
IN ITS
ENTIRETY