Personalized User Model LLP v. Google Inc.

Filing 474

REDACTED VERSION of 457 Declaration re Answering Brief in Opposition to Motion for Summary Judgment of Invalidity (Declaration of Jaime G. Carbonell) (Volume 1 of 3) by Personalized User Model LLP. (Tigan, Jeremy)

IN THE UNITED STATES DISTRICT COURT FOR THE DISTRICT OF DELAWARE PERSONALIZED USER MODEL, L.L.P., ) ) Plaintiff, ) ) v. ) ) GOOGLE, INC., ) ) Defendant. ) ) ) ) GOOGLE, INC. ) ) Counterclaimant, ) ) v. ) PERSONALIZED USER MODEL, L.L.P. and ) ) YOCHAI KONIG, ) Counterclaim-Defendants. ) C.A. No. 09-525 (LPS) PUBLIC VERSION VOLUME 1 OF 3 - DECLARATION OF JAIME G. CARBONELL IN SUPPORT OF PLAINTIFF PERSONALIZED USER MODEL, L.L.P.’S ANSWERING BRIEF IN OPPOSITION TO GOOGLE'S MOTION FOR SUMMARY JUDGMENT ON INVALIDITY OF COUNSEL: Marc S. Friedman SNR Denton US LLP 1221 Avenue of the Americas New York, NY 10020-1089 (212) 768-6700 Mark C. Nelson SNR Denton US LLP 2000 McKinney Avenue, Ste. 1900 Dallas, TX 75201 (214) 259-0901 Jennifer D. Bennett SNR Denton US LLP 1530 Page Mill Road, Ste. 200 Palo Alto, CA 94304-1125 (650) 798-0300 Original Filing Date: January 14, 2013 Redacted Filing Date: January 23, 2013 MORRIS, NICHOLS, ARSHT & TUNNELL LLP Karen Jacobs Louden (#2881) Jeremy A. Tigan (#5239) 1201 N. Market Street P.O. Box 1347 Wilmington, DE 19899-1347 (302) 658-9200 klouden@mnat.com Attorneys for Personalized User Model, L.L.P. IN THE UNITED STATES DISTRICT COURT FOR THE DISTRICT OF DELAWARE PERSONALIZED USER MODEL, L.L.P., ) ) Plaintiff, ) ) v. ) ) GOOGLE, INC., ) ) Defendant. ) ) ) ) GOOGLE, INC. ) ) Counterclaimant, ) ) v. ) PERSONALIZED USER MODEL, L.L.P. and ) ) YOCHAI KONIG, ) Counterclaim-Defendants. ) C.A. No. 09-525 (LPS) PUBLIC VERSION DECLARATION OF JAIME G. CARBONELL IN SUPPORT OF PLAINTIFF PERSONALIZED USER MODEL, L.L.P.’S ANSWERING BRIEF IN OPPOSITION TO GOOGLE'S MOTION FOR SUMMARY JUDGMENT ON INVALIDITY I, Jaime G. Carbonell, declare: 1. I submit this Declaration in Support of Plaintiff's Personalized User Model, L.L.P’s ("PUM") Answering Brief in Opposition to Google’s Motion for Summary Judgment on Invalidity. 2. I have been retained by counsel for PUM to respond to the Report of Defendant's Expert Michael I Jordan, Ph.D., Concerning Invalidity of Claims 1, 11, 22, 32 and 34 of U.S. Patent No. 6,981,040 and Claim 1, 3, 5, 6, 7, 21 and 22 of U.S. Patent No. 7,685,276, and to respond where appropriate. 3. Attached hereto as Exhibit A is a true and correct copy of my Rebuttal Expert Witness Report signed November 28, 2012, and selected exhibits. I hereby incorporate by reference and adopt all statements and opinions reflected in my Expert Witness Report as if set forth fully herein. I declare under penalty of perjury under the laws of the United States of America that the foregoing is true and correct. Executed this 12th day of January 2013, at Pittsburgh, Pennsylvania. Jaime G. Carbonell -2- CERTIFICATE OF SERVICE I hereby certify that on January 14, 2013, I caused the foregoing to be electronically filed with the Clerk of the Court using CM/ECF which will send electronic notification of such filing to all registered participants. Additionally, I hereby certify that true and correct copies of the foregoing were caused to be served on January 14, 2013, upon the following individuals in the manner indicated: BY E-MAIL BY E-MAIL Richard L. Horwitz David E. Moore POTTER ANDERSON & CORROON LLP 1313 N. Market St., 6th Floor Wilmington, DE 19801 Brian C. Cannon QUINN EMANUEL URQUHART & SULLIVAN, LLP 555 Twin Dolphin Dr., 5th Floor Redwood Shores, CA 94065 Charles K. Verhoeven David A. Perlson Antonio R. Sistos Andrea Pallios Roberts Joshua Lee Sohn QUINN EMANUEL URQUHART & SULLIVAN, LLP 50 California Street, 22nd Floor San Francisco, CA 94111 /s/ Jeremy A. Tigan Jeremy A. 
Tigan (#5239) EXHIBIT A REDACTED IN ITS ENTIRETY Exhibit 1

Personal WebWatcher: design and implementation

Dunja Mladenic
Dpt. for Intelligent Systems, J. Stefan Institute, Jamova 39, 11000 Ljubljana, Slovenia
Phone: (+38)(61) 1773 272, Fax: (+38)(61) 1258-158
E-mail: Dunja.Mladenic@ijs.si
http://www-ai.ijs.si/DunjaMladenic/home.html
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

1 Introduction

With the growing availability of information sources, especially non-homogeneous, distributed sources like the World Wide Web, there is also a growing interest in tools that can help in making a good and quick selection of the information we are interested in. Recent work that arises at the intersection of Information Retrieval and Machine Learning offers some novel solutions to this problem, as does work in Intelligent Agents. For example, Armstrong et al. [2] developed WebWatcher, a system that assists users in locating information on the World Wide Web, taking keywords from the user, suggesting hyperlinks and receiving evaluation. Balabanovic et al. [3] developed "a system which learns to browse the Internet on behalf of a user". It searches the World Wide Web taking a bounded amount of time, selects the best pages and receives an evaluation from the user. The evaluation is used to update the search and selection heuristics. Pazzani et al. [26] collect ratings of the explored Web pages from the user and learn a user profile from them. Pages are separated according to their topic and a separate profile is learned for each topic. Mitchell et al. [25] proposed a system connected to the user's electronic calendar that generates sets of rules capturing the user's scheduling preferences and some other information about individual attendees of meetings. It uses these rules to provide advice to the user for new, unscheduled meetings. Lang [21] developed a system for electronic news filtering that uses text-learning to generate models of user interests. Krulwich and Burkey [20] proposed "The ContactFinder agent", which reads and responds to bulletin board messages, assists users by referring them to other people who can help them, and categorizes messages and extracts their topic areas. Maes [24] described "interface agents" that learn from the user as well as from other agents. As examples of such agents they developed agents for electronic mail handling, meeting scheduling, electronic news filtering and entertainment recommendation. Some of these agents use the content of documents and adopt Information Retrieval approaches (e.g. news filtering). Others rely on correlation between different users, performing "social learning" (e.g. entertainment recommendation). Holte and Drummond [15] and Drummond et al. [9] designed a system that assists browsing of software libraries, taking keywords from the user and using a rule-based system with forward-chaining inference, assuming that the library consists of one type of item and that the user goal is a single item. Etzioni and Weld [11] offer an integrated interface to the Internet combining the UNIX shell and the World Wide Web to interact with Internet resources. Their agent accepts high-level user goals and dynamically synthesizes the appropriate sequence of Internet commands to satisfy those goals. Hammond et al. [13] and Burke et al. [6] developed a system that uses a "natural language question-based interface to access distributed text information sources" and helps the user to find answers to her/his questions in databases such as FAQ files.
One of the available information sources is the World Wide Web, and it is currently growing quickly, attracting many users with different interests. Since interaction with the World Wide Web (WWW) is by means of a computer, one can use computers to "observe" and record user actions and use this information to help users find their way in the WWW. Personal WebWatcher is a system that observes users of the WWW and suggests pages they might be interested in. It learns user interests from the pages requested by the user. The learned model of user interests is then used to suggest hyperlinks on new HTML-pages requested by and presented to the user via a Web browser that enables connection through a proxy (e.g. Netscape). Section 2 gives an overview of Personal WebWatcher, its functionality and some directions for further development. Section 3 describes some of the research problems related to Personal WebWatcher's learning. The structure of the system and its implementations are described in Section 4, and Perl code is given in the Appendix. Results of the first experiments are given in Section 5.

2 "Personalizing" WebWatcher

Personal WebWatcher is mainly inspired by WebWatcher, "a Learning Apprentice for the World Wide Web" [2], [16], and some other work related to learning apprentices and learning from text [17], [19], [21], [25]. The idea of a learning apprentice is to automatically customize to individual users, using each user interaction as a training example. WebWatcher can be described as an agent that assists users in locating information on the WWW. It learns by observing a user on her/his way through the WWW and suggests interesting hyperlinks whenever it is confident enough. The idea is that the user provides a few keywords describing a search goal and WebWatcher highlights related hyperlinks on the current page and/or adds new hyperlinks to the current page. It can also suggest pages related to the current page using information stored in the structure of the hypertext without considering the text itself [16], or send an e-mail message to the user whenever specified pages change. The same WebWatcher version is designed to serve all users, collecting information and sharing it between users. For example, if someone recognizes a page as being related to the keywords she/he typed into the system at the beginning of the search, this can be useful for any user searching for similar information. Unlike WebWatcher, Personal WebWatcher (PWW) is structured to specialize for a particular user, modeling her/his interests. It "watches over the user's shoulder" in a similar way to WebWatcher, but it avoids involving the user in its learning process (it doesn't ask the user for any keywords or opinions about pages). It solely records the addresses of pages requested by the user and highlights hyperlinks that it believes will be of interest. In the learning phase (typically during the night), requested pages are analyzed and a model of user interests is generated/updated. This model is used to give advice for hyperlinks on retrieved HTML-pages requested by and presented to the user via the Web browser. Since each user has her/his own copy of the system - her/his own agent - these agents can communicate and exchange information on the basis of similarity between their users, often referred to as collaborative or social learning [22], [31].
There are also some other types of "personal agents" the user could use, for example an agent for Calendar Apprentice [25], and these agents can exchange the information they have about the same user in different fields/activities. We plan to investigate these in future work on Personal WebWatcher.

3 PWW "Behind the stage"

There are many research questions behind Personal WebWatcher that wait to be answered. Here we consider some of them that are related to learning. There are also many others, for example, how to design communication between user and agent and between different agents, how to provide privacy to the user, to what extent the agent should involve the user in its learning/exploration process, ... If we concentrate on learning, we first want to know what kind of problem we are faced with, how to represent it to some learning algorithm and which kind of algorithm to use. Currently we have restricted our work to text documents (plain text and HTML), so we are faced with a text-learning problem having mainly short to medium size documents with varying vocabulary. This section gives different approaches related to (1) document representation, (2) feature selection and (3) learning used on the text-learning problem. Table 1 summarizes them over some related papers in order to give an idea about current trends.

Table 1: Document representation, feature selection and learning algorithms used in some related work.

Paper reference | Doc. representation | Feature selection | Learning
Apte et al. [1] | bag-of-words (frequency) | stop list + frequency weight | Decision Rules
Armstrong et al. [2] | bag-of-words (Boolean) | mutual info. | TFIDF, Winnow, WordStat
Balabanovic et al. [3] | bag-of-words (frequency) | stop list + stemming + keep 10 best words | TFIDF
Bartell et al. [4] | bag-of-words (frequency) | latent semantic indexing using SVD | -
Berry et al. [5], Foltz and Dumais [12] | bag-of-words (frequency) | latent semantic indexing using SVD | TFIDF
Cohen [8] | bag-of-words (Boolean) | infrequent words pruned | Decision Rules, ILP
Joachims [17] | bag-of-words (frequency) | in/frequent words + mutual info. | TFIDF, PrTFIDF, Naive Bayes
Lewis et al. [23] | bag-of-words (Boolean) | log likelihood ratio | logistic regression combined with Naive Bayes
Maes [24] | bag-of-words + header info. | mail/news header info. + selecting keywords | Memory-Based reasoning
Pazzani et al. [26] | bag-of-words (Boolean) | stop list + mutual info. | TFIDF, Naive Bayes, Nearest Neighbor, Decision Trees
Sorensen and McElligott [33], [10] | n-gram graph (only bigrams) | weighting graph edges | connectionist combined with Genetic Algorithms
Yang [35] | bag-of-words (Boolean, frequency, TFIDF) | stop list | adapted k-Nearest Neighbor

3.1 Document representation

The frequently used document representation in Information Retrieval and text-learning is the so-called TFIDF-vector representation. It is a bag-of-words representation: all words from the document are taken and no ordering of words or any structure of the text is used (see Figure 1). Since most of our documents are in HTML format, there is a well defined underlying structure that could be used. There is also additional information in plain text, for example the structure of sentences, the position of words or neighboring words. The question is how much we can gain by considering additional information in learning (and what information to consider) and what price we have to pay for it. There is currently no well studied comparison or directions that we are aware of. There is some evidence in Information Retrieval research that, for long documents, considering information additional to bag-of-words is not worth the effort. But our documents are mostly HTML-documents on the WWW and they are not especially long!

In the process of using text to learn how to give advice for a hyperlink, different approaches can be used to decide which part of the text to use and how to represent it. Personal WebWatcher uses an approach similar to that of WebWatcher. In WebWatcher the bag-of-words representation is used, where the considered text consists of underlined words, words in the sentence containing the hyperlink, words in all the headings above the hyperlink and words given as keywords by the user [2]. Some later versions of the WebWatcher system change slightly the way of constructing text for learning, e.g. adding words in the document retrieved behind the hyperlink. Many current systems that learn on text use the bag-of-words representation with either Boolean features indicating if a specific word occurred in the document (e.g. [2], [8], [23], [26], [35]) or the frequency of a word in a given document (e.g. [1], [3], [4], [5], [17], [35]). There is also some work that uses additional information such as word position [8] or word tuples called n-grams [33]. We decided to use the bag-of-words representation with word frequencies and to observe the success of the given advice (whether the user selected the advised hyperlink). In case of poor system performance, some additional information from the HTML structure could be added, for example, the frequency of a word in the headlines of a given document. We would like to spend time on extracting and using this additional information only when it is highly probable that we will gain a decent improvement in system performance. This is currently under research.

Figure 1: Bag-of-words representation using frequency vector.

3.2 Feature selection

If we decide to ignore all the additional information and use the bag-of-words approach, we still end up with several tens of thousands of different words that occur in our documents. Not only is using all these words time-consuming, but many of them are also not really important for our learning task. One of the approaches to reduce the number of different words is to use a "stop-list" containing common English words (e.g. a, the, with) or to prune the infrequent and/or very frequent words ([8], [17]). There is also the possibility of word stemming. Many approaches introduce some sort of word weighting and select only the best words (e.g. [1], [2], [3], [17], [23], [26]) or reduce dimensionality using latent semantic indexing with singular value decomposition (e.g. [4], [5], [12]). Some of the Machine Learning techniques for feature selection could also be used (e.g. [7], [32], [18]), but most of them take too long in situations with several tens of thousands of features. In the current implementation of PWW we weight words using the mutual information between word occurrence and class value [34], the same way as used in [2] or [17]. The research topic is the number of best words to consider, and we observe its influence on classification accuracy and on the precision of the best suggested hyperlink (see Section 5).
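To make the feature-selection step concrete, the sketch below (in Perl, since the system itself is written mostly in Perl) scores words by the mutual information between word occurrence and class value and keeps the highest-scoring ones. It is only an illustration of the technique named above, not the PWW code; the data layout and the subroutine names are assumptions made for the example.

use strict;
use warnings;

# %$docs_with_word: word => { pos => #positive docs containing it,
#                             neg => #negative docs containing it }
# $n_pos, $n_neg: total number of positive / negative training documents.
sub mutual_information {
    my ($docs_with_word, $n_pos, $n_neg) = @_;
    my $n = $n_pos + $n_neg;
    my %score;
    for my $word (keys %$docs_with_word) {
        my $pos = $docs_with_word->{$word}{pos} || 0;
        my $neg = $docs_with_word->{$word}{neg} || 0;
        # joint counts for the four (occurrence, class) combinations
        my %joint = (
            'present,pos' => $pos,          'present,neg' => $neg,
            'absent,pos'  => $n_pos - $pos, 'absent,neg'  => $n_neg - $neg,
        );
        my %p_class = (pos => $n_pos / $n, neg => $n_neg / $n);
        my %p_occ   = (present => ($pos + $neg) / $n,
                       absent  => ($n - $pos - $neg) / $n);
        my $mi = 0;
        for my $cell (keys %joint) {
            next unless $joint{$cell} > 0;           # empty cells contribute 0
            my ($occ, $class) = split /,/, $cell;
            my $p_joint = $joint{$cell} / $n;
            $mi += $p_joint * log($p_joint / ($p_occ{$occ} * $p_class{$class}));
        }
        $score{$word} = $mi;
    }
    return \%score;
}

# keep the $k best-scoring words as the learning domain (the "vector size")
sub top_words {
    my ($score, $k) = @_;
    my @sorted = sort { $score->{$b} <=> $score->{$a} } keys %$score;
    my $take = $k < @sorted ? $k : scalar @sorted;
    return @sorted[0 .. $take - 1];
}

Keeping, say, the 50 to 100 best-scoring words corresponds to the shorter vector sizes examined in Section 5; the choice of k is exactly the open question studied there.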
Mutual information assigns higher weight to the words that make a better distinction between interesting and uninteresting documents. One of the practical problems during classification is that a new document often contains very few or even none of the selected words. Since we are more interested in the positive class (interesting documents) and we want to have words that are frequent, it might be better to include in the weighting formula the probability of a word occurring in the positive class or the frequency of the word ([23]):

TF(w) log( P(w|c) / P(w|c̄) ),

where w is a selected word, c is the positive class, c̄ is the negative class and TF(w) is the frequency of word w. We plan to make additional experiments using the proposed combined weighting as well as using some other weighting methods. For example, combining a stop list with weighting words by their frequency and keeping the most frequent words [35], or using the word weighting used in the odds ratio method [27]. Odds ratio is a method of ranking documents according to their relevance for a given problem (e.g. being interesting for the user). The ranking of documents is defined as

ranking(d, c) = log( P(c|d) / P(c̄|d) ) = log( P(d|c) P(c) / (P(d|c̄) P(c̄)) ) = Σ_{w∈d} weight(w) + const.

The word weight that it defines can be used for feature selection, weighting all words and selecting the words with the highest weight. This word weight is defined as

weight(w) = log( p(w) (1 − p̄(w)) / ((1 − p(w)) p̄(w)) ),

where p(w) = (TF(w, c) + 1) / (#docs(c) + const), docs(c) is the number of documents in class c, and p̄(w) is the same as p(w) except that c is substituted with the negative class c̄. Shaw [28] proposed special handling of singularities in the above formulas for p(w) and p̄(w), namely p(w) = 1 / (2 #docs(c)) when TF(w, c) = 0 and p(w) = 1 − 1 / (2 #docs(c)) when TF(w, c) = #docs(c).

3.3 Learning algorithm

One of the well-established techniques for text in Information Retrieval is to represent each document as a TFIDF-vector in the space of words that appeared in the training documents [30], sum all interesting document vectors and use the resulting vector as a model for classification (based on the relevance feedback method of [29]). Each component of a document vector, d^(i) = TF(w_i, d) IDF(w_i), is calculated as the product of TF (Term Frequency - the number of times word w_i occurred in the document) and IDF(w_i) = log( D / DF(w_i) ) (Inverse Document Frequency), where D is the number of documents and the document frequency DF(w_i) is the number of documents word w_i occurred in at least once. The exact formulas used in different approaches may vary slightly (some factors are added, normalization is performed [30]), but the idea remains the same. A new document is then represented as a vector in the same vector space as the generated model and the distance between them is measured (usually defined as the cosine of the angle between the vectors) in order to classify the document. This technique has already been used in Machine Learning experiments on World Wide Web data (e.g. [2], [3], [5], [17], [26]).

There are also some other techniques for model generation that have been used in text-learning. Armstrong et al. [2] used a statistical approach they called WordStat that assumes mutual independence of words and defines the probability of class c as P(c) = 1 − Π_i (1 − P(c|w_i)). Pazzani et al. [26] used a Naive (Simple) Bayesian classifier on Boolean vectors, which assumes independence of words and defines the probability of class c for a given document doc that contains words w_i as proportional to P(c) Π_i P(w_i|c).
They also used Nearest Neighbor and symbolic learning using Decision Trees. A variant of k-Nearest Neighbor was also used by Yang [35], where the relevance of class c given document doc is defined as

rel(c|doc) = Σ_{i=1..k} similarity(doc, D_i) P(c|D_i),

where similarity is measured by the cosine between vectors and P(c|D_i) = #(D_i in c) / #(D_i in training data) (the same document may occur several times, being classified in different categories). Joachims [17] introduced Probabilistic TFIDF, which takes into account the document representation and defines the probability of class c for a given document doc that contains words w as

P(c|doc) = Σ_w ( P(c) P(w|c) / Σ_i P(c_i) P(w|c_i) ) P(w|doc),

where P(w|c) = TF(w, c) / Σ_i TF(w_i, c) and P(w|doc) = TF(w, doc) / Σ_i TF(w_i, doc). He also used a Naive (Simple) Bayesian classifier on frequency vectors, the same as we used in PWW. It assumes independence of words and defines the probability of class c for a given document doc that contains words w as

P(c|doc) = P(c) Π_w P(w|c)^TF(w,doc) / Σ_i ( P(c_i) Π_w P(w|c_i)^TF(w,doc) ),

where P(w|c) = (1 + TF(w, c)) / (#words + Σ_i TF(w_i, c)); TF(w, c) denotes the frequency of word w in documents of class c and TF(w, doc) denotes the frequency of word w in document doc. Apte et al. [1] used Decision Rules and observed that, in the case of different topics being the categories, it is better to select features for each given topic (using a stop-list and frequency weighting) than for all topics at once, even if the set of features is additionally reduced for each topic using an entropy-based measure to weight features. Cohen [8] used Decision Rules and the Inductive Logic Programming systems FOIL and FLIPPER. Lewis and Gale [23] used a combination of a Naive Bayesian classifier and logistic regression, defining the probability of class c for a given document doc that contains words w as

P(c|doc) = exp( a + b Σ_w log( P(w|c) / P(w|c̄) ) ) / ( 1 + exp( a + b Σ_w log( P(w|c) / P(w|c̄) ) ) ).

Maes [24] used Memory-Based reasoning; McElligott and Sorensen [10], [33] used a connectionist approach combined with Genetic Algorithms. We decided to test different learning algorithms on PWW data (see Section 5), since it is not clear which algorithm is the most appropriate. The current version of PWW uses a Naive (Simple) Bayesian classifier on frequency vectors to generate a model of user interests, which is used for advising hyperlinks.

4 Structure of Personal WebWatcher

Personal WebWatcher consists of two main parts: a proxy server that interacts with the user via the Web browser and a learner that provides the user model to the server (see Figure 2). Communication between them is via disk: the proxy saves the addresses of visited documents (URLs) and the learner uses them to generate a model of user interests. The whole system is implemented in approximately 2500 lines of Perl code and 1500 lines of C++ code. The proxy server consists of three main parts, each implemented as a Perl script: proxy (which additionally calls external fetcher code to fetch the page), adviser and classifier (which calls external C++ code for classification). The proxy waits in an infinite loop for a page request from the browser. On request, it fetches the requested document and, if it is an HTML-document, adds advice and forwards the document to the user. To add advice the proxy forwards the page to the adviser, which extracts hyperlinks from the document and calls the external classification code that uses the generated user model.
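As an illustration of the classification step that the adviser delegates to the external classifier, the following Perl sketch scores a frequency vector with the Naive Bayes formula given in Section 3.3. It is not the actual C++ classifier; the model layout (class priors, Laplace-smoothed word probabilities and a default probability for unseen words) is an assumption made for the example.

use strict;
use warnings;
use List::Util qw(sum);

# $model = { prior   => { pos => P(pos), neg => P(neg) },
#            pword   => { pos => { word => P(word|pos), ... },
#                         neg => { word => P(word|neg), ... } },
#            default => { pos => smoothed prob. for an unseen word, ... } };
# $tf = { word => frequency of the word in the new document (TF(w,doc)) }
sub naive_bayes_class_probs {
    my ($model, $tf) = @_;
    my %logscore;
    for my $class (keys %{ $model->{prior} }) {
        my $s = log($model->{prior}{$class});
        for my $word (keys %$tf) {
            my $p = $model->{pword}{$class}{$word} // $model->{default}{$class};
            $s += $tf->{$word} * log($p);   # P(w|c)^TF(w,doc) in log space
        }
        $logscore{$class} = $s;
    }
    # normalize over classes to obtain P(c|doc)
    my $max  = (sort { $b <=> $a } values %logscore)[0];
    my %prob = map { $_ => exp($logscore{$_} - $max) } keys %logscore;
    my $z    = sum(values %prob);
    $prob{$_} /= $z for keys %prob;
    return \%prob;
}

The adviser could then use P(pos|doc), computed this way for each extracted hyperlink (represented as a short pseudo-document), as that hyperlink's score.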
A limited number of hyperlinks that are scored above some threshold are recommended to the user, indicating their scores with graphical symbols placed around each advised hyperlink. For example, in Figure 3 three hyperlinks are suggested by PWW: "Machine Learning Information Services" and two project members (Dayne Freitag, Thorsten Joachims). There is a banner at the top of the page showing that PWW is "watching over the user's shoulder".

Figure 2: Structure of Personal WebWatcher. The learning part is described separately.

4.1 Structure of the learning module

Learner works in two versions: learning a new model from scratch (learner) or updating an existing model (updater), as shown in Figure 4. The difference is that the first one has to define the domain (the words to be used) and starts learning with an empty model, while the second one has already defined which words to use in representing documents as frequency vectors and has an existing model to modify. Both versions fetch visited documents and the documents one step behind the hyperlinks of visited documents and store them as positive or negative examples of user interests, depending on whether the user visited the document or not (getDoc in Figure 4). Hyperlinks from visited HTML-documents are extracted and stored in an extended hyperlink format, the same that is used by the adviser. Each hyperlink is represented in the extended format, taking into account underlined words, words in a window around the hyperlink and words in all the headings above the hyperlink. Using hyperlinks represented only with underlined words is often a bad idea, e.g. click here... Hyperlinks whose documents were visited by the user are considered to be positive examples, and all the others to be negative examples of the user interests. The idea is that all hyperlinks were presented to the user and the user chose to visit some of them that meet her/his interests. This simplification is introduced to minimize the user's involvement in the learning process and to enable learning without asking the user for page ratings. We plan to make it optional in some later versions of the system.

Figure 3: Example of HTML-page presented to the user by PWW.

Learner transforms documents into examples in two phases: (1) (docs2exs and docs2addexs in Figure 4) parsing each document, assigning an index to each word and representing the document in three files as a line of word indices containing all words, only headline words, and only underlined words; (2) (exs2vec in Figure 4) calculating a score (e.g. information gain) for each word, selecting some top words and representing documents as bag-of-words, keeping the frequency for each of the top words. Updater uses the given top words (the domain definition) and represents documents as bag-of-words the same way learner does (docs2ddexs in Figure 4). During the model generation (GenModel in Figure 4), the system can ask for additional information about some words (stating which kind of information - functions - and on which words - basic attributes). The kind of information it could ask for is specified as so-called background knowledge, e.g. a feature saying how many times a word occurred in document headlines (genAttr in Figure 4). This is currently under development. The learning part consists basically of eight Perl scripts that call external C++ code for model generation/updating.
Two scripts integrate the parts of learner (updater), and the other six are represented as rectangles in Figure 4 (getDocs, docs2exs, docs2addexs, exs2vec, genAttr, docs2ddexs). Two additional rectangles in Figure 4, GenModel and UpdateModel, represent C++ code.

Figure 4: Structure of PWW Learner.

4.2 Model of user interests

The model of user interests is designed to predict whether some document is a positive or negative example of user interests. It is used to advise hyperlinks on the HTML-document requested by the user. Since the prediction should be performed while the user is waiting for an HTML-document, we are actually predicting the interestingness of a document based on the hyperlink pointing to it, and not on the document itself (retrieving the documents behind the requested hyperlinks is usually time consuming). The model of user interests is generated "off-line", usually during the night, and thus its generation is not as time-critical as its usage for prediction. One of the simplest ideas for learning is to use the hyperlinks that occurred on the documents presented to the user as training examples and learn to predict whether a new hyperlink is a positive or negative example of the user interests:

User_HL : HyperLink → {pos, neg}

What we use is an extended representation of the hyperlink (see Section 4.1) that tries to capture information related to the document behind the hyperlink. But during the learning phase we can afford to use more time than when adding advice, so why not retrieve the documents behind the hyperlinks instead of using the extended hyperlink representation? In that case, we can learn the model of user interests directly from the documents whose interestingness we are trying to predict:

User_DOC : Document → {pos, neg}

In this case, we end up using the model generated from documents to predict the interestingness of hyperlinks. Since our hyperlinks are represented as short HTML-documents (including headlines and some portions of text) this is not so unusual, but we could also learn a model that predicts document content based on a given hyperlink:

Document_HL : HyperLink → Document

and then predict the interestingness of the predicted document content using the above described model User_DOC. Our first experiments are in learning the first two models.

5 First experimental results

In order to select a good document representation, feature selection method and learning algorithm, we decided to test different possibilities and compare them. Since PWW has to recommend interesting hyperlinks to the user, we are interested in measuring the precision of our system on the most highly recommended hyperlink. Precision is frequently used as a metric in Information Retrieval and it is defined as the percentage of positive suggestions among all suggestions made. In our case, it is either zero (in case the best suggestion is a negative example and shouldn't be suggested to the user) or one (if we made a correct suggestion). We also measured the traditional Machine Learning quality estimate, classification accuracy, defined as the percentage of correctly classified examples (over all classes).
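The two quality estimates just defined can be computed as in the following Perl sketch (an illustration under assumed data structures and an assumed decision threshold, not the code used for the experiments): classification accuracy over all test examples, and precision of the best suggestion, which is 1 when the highest-scored test example is positive and 0 otherwise.

use strict;
use warnings;

# @test: list of { label => 'pos'|'neg', score => P(pos|example) }
sub classification_accuracy {
    my (@test) = @_;
    my $correct = grep {
        my $predicted = $_->{score} >= 0.5 ? 'pos' : 'neg';   # assumed threshold
        $predicted eq $_->{label};
    } @test;
    return $correct / @test;
}

# precision of the best suggestion: look only at the example the model
# ranks highest and check whether it really is a positive example
sub precision_of_best_suggestion {
    my (@test) = @_;
    my ($best) = sort { $b->{score} <=> $a->{score} } @test;
    return $best->{label} eq 'pos' ? 1 : 0;
}

Averaging both estimates over the ten cross-validation splits gives the values reported in the figures.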
Both quality estimates are calculated using the 10-fold cross-validation technique (see Figure 5), using the same example splits for all tested algorithms.

Figure 5: Illustration of 10-fold cross-validation experiments.

In the first experiments we observe how vector length (the number of features selected) influences model quality for two learning algorithms: the Naive Bayesian classifier on frequency vectors as used by Joachims [17] (see Section 3.3) and a k-Nearest Neighbor approach on frequency vectors using Euclidean distance between examples and summing the class probabilities predicted by the k neighbors. We tested both algorithms on data for documents behind hyperlinks (User_DOC) and on data for hyperlinks (User_HL) (see Section 4.2). Documents are currently represented using the bag-of-words approach (see Section 3.1) and feature selection is performed using the mutual information approach (see Section 3.2). Our experiments are performed on data collected for four users participating in the HomeNet project [14], with the data characteristics given in Table 2.

Table 2: Data characteristics for document (Doc) and hyperlink (HL) data for each of the four HomeNet users.

UserId | data source | probability of interestingness | number of examples | data entropy
usr150101 | Doc | 0.094 | 1 333 | 0.449
usr150101 | HL | 0.104 | 2 528 | 0.480
usr150202 | Doc | 0.107 | 3 415 | 0.492
usr150202 | HL | 0.053 | 4 798 | 0.301
usr150211 | Doc | 0.089 | 2 038 | 0.436
usr150211 | HL | 0.044 | 2 221 | 0.259
usr150502 | Doc | 0.100 | 1 272 | 0.468
usr150502 | HL | 0.100 | 2 498 | 0.468

In all experiments k-Nearest Neighbor achieved slightly higher classification accuracy than the Naive Bayesian classifier (see Figures 6, 7, 8, 9), but the difference is significant only in one out of six experiments (see Figure 8). Adding more than approximately 50 features doesn't appear to help classification accuracy, but it also doesn't hurt. The high classification accuracy achieved for all four users by the k-Nearest Neighbor algorithm is in fact the default accuracy obtained if the negative class is predicted for all documents. So what we are really interested in is making good predictions for positive documents, and that is why we decided to measure the precision of the best suggested hyperlink. Precision varies much more with vector size than classification accuracy (see Figures 10, 11, 12, 13); it actually drops in seven out of eight experiments. It seems that k-Nearest Neighbor is more stable in precision than the Naive Bayesian classifier, achieving higher precision for longer vectors and about the same for up to 100 features, with the exception of one user (see Figure 12), where the precision of the Naive Bayesian classifier is 0 for most vector sizes for the document model and much better on short vectors than k-Nearest Neighbor for the hyperlink model. In order to draw some conclusions about vector size and the quality of the algorithms, we need to perform more experiments on different users. These first experiments show that increasing the vector size probably isn't as beneficial as one could expect and it could even hurt the precision of the best suggestion. There is no evidence that the algorithms differ substantially in classification accuracy, although k-Nearest Neighbor seems to be more promising both in accuracy and precision. If further experiments confirm the hypothesis that long vectors are not advisable, a closer look at the short vectors (e.g. see Figure 14) should give an idea about the number of features that work well for both accuracy and precision.

Figure 6: Influence of vector size on classification accuracy of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150101. Notice that the classification accuracy scale starts at 80% accuracy. Error bars show standard deviation since accuracy is calculated as the average of 10 results.

Figure 7: Influence of vector size on classification accuracy of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150202. Notice that the classification accuracy scale starts at 80% accuracy. Error bars show standard deviation since accuracy is calculated as the average of 10 results.

Figure 8: Influence of vector size on classification accuracy of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150211. Notice that the classification accuracy scale starts at 80% accuracy. Error bars show standard deviation since accuracy is calculated as the average of 10 results.

References

[1] Apte, C., Damerau, F., Weiss, S.M., Toward Language Independent Automated Learning of Text Categorization Models, Proc. of the 7th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Dublin, 1994.

[2] Armstrong, R., Freitag, D., Joachims, T., Mitchell, T., WebWatcher: A Learning Apprentice for the World Wide Web, AAAI 1995 Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, Stanford, March 1995. URL: http://www.cs.cmu.edu/afs/cs/project/theo-6/web-agent/www/webagentsplus.ps.Z

[3] Balabanovic, M., Shoham, Y., Learning Information Retrieval Agents: Experiments with Automated Web Browsing, AAAI 1995 Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, Stanford, March 1995. URL: http://robotics.stanford.edu/people/marko/papers/lira.ps

[4] Bartell, B.T., Cottrell, G.W., Belew, R.K., Latent Semantic Indexing is an Optimal Special Case of Multidimensional Scaling, Proceedings of the ACM SIG Information Retrieval, Copenhagen, 1992.

[5] Berry, M.W., Dumais, S.T., O'Brien, G.W., Using linear algebra for intelligent information retrieval, SIAM Review, Vol. 37, No. 4, pp. 573-595, December 1995.

[6] Burke, R., Hammond, K., Kozlovsky, J., Knowledge-based Information Retrieval for Semi-Structured Text, Working Notes from AAAI Fall Symposium on AI Applications in Knowledge Navigation and Retrieval, pp. 19-24, American Association for Artificial Intelligence, 1995.

[7] Caruana, R., Freitag, D., Greedy Attribute Selection, Proc. of the 11th International Conference on Machine Learning ICML94, pp. 28-36, 1994.
[8] Cohen, W.W., Learning to Classify English Text with ILP Methods, Workshop on Inductive Logic Programming, Leuven, September 1995.

[9] Drummond, C., Ionescu, D., Holte, R., A Learning Agent that Assists the Browsing of Software Libraries, Technical Report TR-95-12, Computer Science Dept., University of Ottawa, 1995.

[10] McElligott, M., Sorensen, H., An emergent approach to information filtering, Abakus, U.C.C. Computer Science Journal, Vol. 1, No. 4, December 1993.

[11] Etzioni, O., Weld, D., A Softbot-Based Interface to the Internet, Communications of the ACM, Vol. 37, No. 7, pp. 72-79, July 1994.

[12] Foltz, P.W. and Dumais, S.T., Personalized information delivery: An analysis of information filtering methods, Communications of the ACM, 35(12), pp. 51-60, 1992.

[13] Hammond, K., Burke, R., Schmitt, K., A Case-Based Approach to Knowledge Navigation, AAAI Workshop on Indexing and Reuse in Multimedia Systems, pp. 46-57, American Association for Artificial Intelligence, 1994.

[14] HomeNet: a research project at Carnegie Mellon University, Pittsburgh, PA, USA, 1996. URL: http://homenet.andrew.cmu.edu/progress/

[15] Holte, R.C., Drummond, C., A Learning Apprentice For Browsing, AAAI Spring Symposium on Software Agents, 1994.

[16] Joachims, T., Mitchell, T., Freitag, D., Armstrong, R., WebWatcher: Machine Learning and Hypertext, Fachgruppentreffen Maschinelles Lernen, Dortmund, August 1995.

Figure 9: Influence of vector size on classification accuracy of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150502. Notice that the classification accuracy scale starts at 80% accuracy. Error bars show standard deviation since accuracy is calculated as the average of 10 results.

Figure 10: Influence of vector size on precision of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150101.

Figure 11: Influence of vector size on precision of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150202.

Figure 12: Influence of vector size on precision of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150211.
Figure 13: Influence of vector size on precision of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150502.

Figure 14: Influence of vector size on precision of the model based on documents (left) and hyperlinks (right) for the HomeNet project user with id 150101. Notice the log scale on the x-axis.

[17] Joachims, T., A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization, Technical report CMU-CS-96-118, School of Computer Science, Carnegie Mellon University, March 1996.

[18] John, G.H., Kohavi, R., Pfleger, K., Irrelevant Features and the Subset Selection Problem, Proc. of the 11th International Conference on Machine Learning ICML94, pp. 121-129, 1994.

[19] de Kroon, H.C.M., Mitchell, T., Kerckhoffs, E.J.H., Improving Learning Accuracy in Information Filtering, ICML-96 Workshop "Machine learning meets human computer interaction", 1996. URL: http://www.ics.forth.gr/ moustaki/ICML96 HCI ML/kroon.ps

[20] Krulwich, B., Burkey, C., The ContactFinder agent: Answering bulletin board questions with referrals, Proceedings of the Thirteenth National Conference on Artificial Intelligence AAAI 96, pp. 10-15, 1996.

[21] Lang, K., NewsWeeder: Learning to Filter Netnews, Proc. of the 12th International Conference on Machine Learning ICML95, 1995.

[22] Lashkari, Y., The WebHound Personalized Document Filtering System, 1995.

[23] Lewis, D.D., Gale, W.A., A Sequential Algorithm for Training Text Classifiers, Proc. of the 7th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Dublin, 1994.

[24] Maes, P., Agents that Reduce Work and Information Overload, Communications of the ACM, Vol. 37, No. 7, pp. 30-40, July 1994.

[25] Mitchell, T., Caruana, R., Freitag, D., McDermott, J., Zabowski, D., Experience with a Learning Personal Assistant, Communications of the ACM, Vol. 37, No. 7, pp. 81-91, July 1994. URL: http://www.cs.cmu.edu/afs/cs/user/mitchell/ftp/cacm.ps.Z

[26] Pazzani, M., Muramatsu, J., Billsus, D., Syskill & Webert: Identifying interesting web sites, AAAI Spring Symposium on Machine Learning in Information Access, Stanford, March 1996, and Proceedings of the Thirteenth National Conference on Artificial Intelligence AAAI 96, pp. 54-61, 1996.

[27] van Rijsbergen, C.J., Harper, D.J., Porter, M.F., The selection of good search terms, Information Processing & Management, 17, pp. 77-91, 1981.

[28] Shaw Jr., W.M., Term-relevance computations and perfect retrieval performance, Information Processing & Management, 31(4), pp. 491-498, 1995.

[29] Rocchio, J., Relevance Feedback in Information Retrieval, in The SMART Retrieval System: Experiments in Automatic Document Processing, Chapter 14, pp. 313-323, Prentice-Hall Inc., 1971.

[30] Salton, G., Buckley, C., Term Weighting Approaches in Automatic Text Retrieval, Technical report COR-87-881, Department of Computer Science, Cornell University, November 1987.
31] Shardanand, U., Maes, P., Social Information Filtering: Algorithms for Automating \Word of Mouth", CHI'95 Mosaic of Creativity, pp. 210|217, 1995. 32] Skalak, D.B., Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms, Proc. of the 11th International Conference on Machine Learning ICML94, pp. 293|301, 1994. 33] Sorensen, H., McElligott, M., PSUN: A Pro ling System for Usenet News, CIKM'95 Intelligent Information Agents Workshop, Baltimore, December 1995. 34] Quinlan, J.R., Constructing Decision Tree in C4.5: Programs for Machine Learning, pp.17{26, Morgan Kaufman Publishers, 1993. 35] Yang, Y., Expert Network: E ective and E cient Learning form Human Decisions in Text Categorization and Retrieval, Proc. of the 7th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Dubline, 1994. 18 GGL-PUM001011645 Exhibit 4 Exhibit 5 Collecting User Access Patterns for Building User Profiles and Collaborative Filtering Ahmad M. Ahmad Wasfi School of Computer Sciences University of Science, Malaysia Penang 11800, Malaysia ahmadm@cs.usm.my relative priorities to completely transforming into other domains. Thus, the recommender system must be able to . specialize to the current interests of the user and adapt as they change over time. A variety of learning mechanisms have been employed within recommender systems. They mostly revolve around the following three basic techniques ABSTRACT The paper proposes a new learning mechanism to extract user preferences transparently for a World Wide Web recommender system. The general idea is that we use the entropy of the page being accessed to determine its interestingness based on its occurrence probability following a sequence of pages accessed by the user. The probability distribution of the pages is obtained by collecting the access patterns of users navigating on the Web. A finite context-model is used to represent the usage information. Based on our proposed model, we have developed an autonomous agent, named ProfBuilder, that works as an online recommender system for a Web site. ProfBuilder uses the usage information as a base for content-based and collaborative filtering. u41. Direct learning technique: The user provides a set of keywords to describe his/her interests (e.g. SIFT [22]). These keywords may include objective terms (such as author name and date of publication) or content keywords to reflect the information manifested in the desired documents. One advantage of this technique is its predictability. The user can usually guess why such information has been delivered. The problem with this technique is that it requires a lot of effort on the part of the user. The user should continually update the keywords to reflect his/her new interests. The user may not also formulate effective keywords to describe his/her interests. Keywords Autonomous agent, Classical context-model, Content-based filtering. information filtering, theory, Finite Collaborative INTRODUCTION Partially direct learning technique: The system learns user preferences by eliciting explicitly some kind of user feedback (e.g. SysKill & Webert [13], InfoFinder [lo], Ringo [18], and Fab [3]). When the user moves from one page to another, the user rates how interesting the current page is. It needs less user effort than the first technique. However, the user still has a greater mental load compared when he/she browses normally. 
The World Wide Web hypertext system is a very large distributed digital information space. Some estimates suggested that the Web included about 150 million pages and this number doubled every four months [7]. As more information becomes available, it becomes increasingly difficult to search for information without specialized aides. Agent-based recommenders have been proposed as a solution to this problem. During a browsing session, these computer systems work collaboratively with a user without the need of an explicit initiation. It has a static or dynamic collection of pages to be recommended. It assists the user by recommending pages that match his/her needs. Indirect or transparent learning technique: The system learns user preferences transparently without any extra effort from the user. However, current methods are not adequately practical. For example, Letizia [1 l] infers user preferences from observing user-browsing behavior such as saving a reference to a page. This approach is intuitively reasonable to indicate an interest to the page. During a browsing session, however, a user may refer to many pages in his/her bookmark just for future examination and not because of their interestingness. Since the system involves repeated interaction with the user and this may extend over a long period of time, the user’s interests cannot be assumed to stay constant. The change in interests could be anything from a slight shift in Permissioli to make digital ~er5011d ck~room or use or hard cop& (fallor partof this work for is grranted without fee procidrd that copies To address the problems of the basic techniques, some systems use a mixture of them to get a compromise model between user effort and predictability. For example, Anatagonomy [14] learns user preferences by joining both are not Wide or distnhutcd Ihr profit or commercial a&z_~ntage and that topics bear this ncoticc and the full citation on the first page. ‘fo copy othsrwk. to republish. to post on sewers or to redistribute to lists, rquires prior specific permissionand!ora f&2. ICI 99 Redondo Copyright ACM Beach CA USA 1999 l-581 13-098~8/99/01...$5.00 57 GGL-PUM00101614 the second and third techniques. SiteHelper [ 191 and WebWatcher [2] use the first technique to specify the initial user area of interests and use the second technique to get feedback from the user to refme their recommendations. However, the problems of the basic techniques are still inherent in the recommender systems. and adding the vector elements Di representing after it is changed in proportion to to, Q=Q-I-ti*Di i.e. the weight of each word in Di is modified proportional to tii. The resulting effect is that, for those words already present in the profile, the word-weights are modified in proportion to tii * di. Words which are not in the profile are added to it. With the drawbacks of the learning techniques in mind, we propose a new mechanism that learns user interests and adapts automatically to their changes without user intervention. It relies on using the probability distribution of the pages to be accessed and tools derived from classical information theory. In order to illustrate how the learning mechanism plays out in a real application, we present an agent-based recommender, called ProfBuilder, that inhabits a Web site and is assigned the goal of being online responsive to the information needs of the site’s users. It remains to find an effective way for inferring tq. 
The Method Before introducing the algorithm, let us first get our bearings by considering a few examples. Suppose we have a hypertext collection of 1000 pages (e.g. a Web site). Users navigate in the collection by using any navigation technique such as selecting hypertext links, specifying page addresses, or selecting pages from interest lists. Suppose we have collected the access patterns of a large number of users. Consider that A and B are two pages in the collection. If the conditional probability of visiting A given B is very high (i.e. it is very probable that when a user visits A, the user will jump next to B), one would consider that there is a strong interdependency or relationship between A and B. One form of relationship is that there is a high resemblance between A and B in content. In effect this means that the user is still interested in the same domain of A. Hence visiting B does not give much new information about the user’s interests as the content of B is redundant to A’s. To guarantee that the agent can adapt quickly to the changes of user interests, the importance of B should be low. The following section provides a detail of our new proposed learning mechanism. We then present an overview of the architecture of ProfBuilder. Finally we outline some related work, followed by some possible ideas to address the limitations of ProfBuilder. THE LEARNING page si MECHANISM Background A profile is a description of user interests. To deliver information a user wants to see, we should search for pages that are similar to his/her profile. An appropriate representation for profiles and pages is based on vectorspace representation, commonly used in information retrieval (IR) literature [ 151. In the vector-space model, pages and queries (profiles) are both represented as vectors in some hyper-space. The model assumes that there is an available keyword set k, where each element k, is a keyword. Both pages and profiles can then be represented as weight vectors of the form Another form of relationship is that B has a great informative value from A. This is because B may contain important hyper information (hyperlinks pointing to other pages), an important content, or may be both. In this case, B may not be considered to reflect the actual user’s interests. The user may visit B just because of its importance at some time. As an example, consider a user reads the top-headline story “Iraq Standoff as Diplomatic Efforts Continue, Sabers Rattle” from the main page of CNN site. The user may read the story just because of its significance as it is the top-headline story and is not interested to get more information about Iraq. Since B gives a little information about user interests because a user will likely visit B regardless of his/her interests, the importance of B should be low. D, = < d, > and Qi = < qi > where d, and qi represent the weight of ki in vector Di or Qi, respectively. di (or qi) is set equal to 0 when k, is absent from the vector. For this representation, the method for profile reformulation in response to the changes of user’s interest is based on vector adjustment. Since profiles and pages are both vectors, the profile should move closer to the vectors representing pages which are relevant and away from the vector representing pages which are non-relevant. The implicit assumption of this is that pages resemble each other are represented by reasonably similar vectors. 
If, on the other hand, the probability of visiting B following A is very low, one may consider that there is no relationship between A and B (such as in content). Thus, if a user chooses to jump from A to B, B gives much information about his/her interest. For example, suppose a user reads a non-headline story about space from the main page of CNN site. Since it is reasonable to assume that the probabilities of visiting non-headline stories are low in comparison to the headline ones, one would expect that the user is interested in space stories. In other cases, B may be Consider that page si is the current page of user Uj. Let us assume that variable t,,, which is a nonnegative number the relevance or between zero and one, indicates importance of page si to user Uj. A reformulation of vector Qj representing the user profile is obtained by taking Qj 58 GGL-PUM00101615 considered to be a threshold point some domains and gaining interest importance of B should be high. of losing interest in in others. Thus, the where pr is the conditional probability of visiting given the sequence of pages si 1sjl, sjz,..., sjm A context model may be an order-m fixed model based on a fixed number m of previous pages in its probability determination or may be an order-m mixed model based on contexts of several lengths with a maximum-length of m. In an order-m fixed model, the probability (or, more accurately, frequency) of a given page is known only if we know the m preceding pages. For instance, if m = 0 then no context is used and the probability of the current page is the probability of its occurrence in the collection. If m = 1 then the previous page is used to determine the probability of the current page. If m = 2 then the previous two pages are used, and so on. At any one time, therefore, we shall call the m preceding pages the state of the order-m model at that time. Since there are q pages, an order-m model will have qm possible states. A handy way to illustrate the behavior of a context model is through the use of a state diagram. In a state diagram we represent each of the qm possible states of the model by a circle, and the possible transitions from state to state by arrows. If we consider the foregoing examples, we see that the importance of a page is inversely proportional to its probability following the page (or the sequence of pages) visited by the user. The importance of classical information theory is embodied in the idea that the value of information content H of messages sent from message sources to message receivers can be measured quantitatively. Because of the redundancy that occurs due to the dependency between successive messages, the value of a message is assumed to be inversely proportional to the probability with which the message could have been predicted by the receiver before the message arrived [ 171. H of a message is then defined as a decreasing function H(p), also known as the entropy, of the probability p of that message. Because the information content of two messages should be an additive function of their individual content values, that is, H@,pz) = H(p,) + H(p,), and the value of a message received with the probability one should be zero, the information content of a message can be defined as (for formal prove see[l]), HO As an example, suppose we have a very simple Web site consisting of two pages A and B. That is, s = {A, B} and q = 2. 
As an example, suppose we have a very simple Web site consisting of two pages A and B. That is, s = {A, B} and q = 2. When m = 2, the probability distribution of page s_i (i = 1 or 2) is determined by the previous two pages s_j and s_k (j, k = 1 or 2). Let us assume that the conditional distribution of s_i given s_j and s_k is as follows:

pr(s_i = A | s_j = A, s_k = A) = 1/4
pr(s_i = A | s_j = B, s_k = A) = 1/7
pr(s_i = A | s_j = A, s_k = B) = 1/5
pr(s_i = B | s_j = B, s_k = B) = 2/3
pr(s_i = B | s_j = α, s_k = β) = 1 - pr(s_i = A | s_j = α, s_k = β), where α, β = A or B.

Because q is equal to 2 and we have assumed an order-2 fixed context model, we have four states: AA, AB, BA, and BB. The state diagram of this model is shown in Figure 1. The possible states of the site are indicated by the four circles, and the possible state transitions are indicated by arrows from state to state, with the probability of a transition shown by a number associated with each arrow. For example, if we are in state AA we can go to either AA or AB but not to state BA or BB. The probability of remaining in state AA is shown as 1/4, and the probability of going to state AB is shown as 3/4.

Figure 1. State diagram of an order-2 fixed model with q = 2.
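The two-page example can be written down directly. The sketch below encodes the conditional distribution above as an order-2 fixed model, then lists the transitions out of state AA together with the importance a user would earn by following each one. The dictionary encoding and the base-2 logarithm are illustrative assumptions; the probabilities are taken from the example in the text.

```python
import math

# Conditional distribution of the next page given the previous two pages (s_j, s_k),
# taken from the example; pr(B | context) = 1 - pr(A | context).
PR_A = {
    ("A", "A"): 1 / 4,
    ("B", "A"): 1 / 7,
    ("A", "B"): 1 / 5,
    ("B", "B"): 1 - 2 / 3,   # pr(B | B, B) = 2/3, so pr(A | B, B) = 1/3
}

def pr(next_page, context):
    """Probability of `next_page` ('A' or 'B') given the two-page context."""
    p_a = PR_A[context]
    return p_a if next_page == "A" else 1 - p_a

def importance(p):
    return -math.log2(p)   # entropy-based importance; base 2 assumed

# From state AA the site can stay in AA (prob 1/4) or move to AB (prob 3/4),
# matching the state diagram described for Figure 1.
for nxt in ("A", "B"):
    p = pr(nxt, ("A", "A"))
    print(f"AA -> A{nxt}: pr = {p:.3f}, importance = {importance(p):.2f} bits")
```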
In an order-m mixed model, on the other hand, if m = 2 then we use the previous two pages; one predecessor if the two-page context fails to determine the probability of a page; and the probability of the page's occurrence if both the two-page and one-page contexts fail. A mixed model may be either fully or partially mixed. The model is fully mixed if it contains all the fixed sub-models whose orders are equal to or less than the order of the model. That is, a fully mixed model with m = 3 bases its determination on sub-models of orders 3, 2, 1, and 0. The model is partially mixed if it uses some, but not all, of the sub-models.

The Order of a Context Model

Determining the order m of a model is critical to reflecting the actual probability distribution of a collection of pages. Let us consider the behavior of the model when m equals zero and when m gets larger. When m = 0, the probability of a page is just the probability of its occurrence in the collection, which is obviously not enough to show the dependency among the pages. As m increases, the model better reflects the actual dependency among the pages. But when m gets very large, the dependence of a page on the previous m pages becomes very weak. Thus, an order of a few pages is very reasonable for determining the dependency among the pages.

Calculating the Probability

The general mechanism for calculating the probability pr of page s_i in a fully mixed order-m model is dictated by the m most recent pages of the user's path, if page s_i has occurred in this particular context before for past users. In this case, only the order-m probability distribution is used. Otherwise, sub-models of lower orders are consulted. If the order-0 sub-model is consulted (i.e. the page s_i has never occurred in the context of any higher sub-model before), pr is assumed to be proportional to

n_i / N

where n_i is the occurrence frequency of page s_i, and N is the total number of times all the pages have been visited. This guarantees that a probability can be supplied for any page in the collection s.
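The probability calculation for a fully mixed model can be sketched as a fallback over sub-models of decreasing order: consult the order-m counts for the user's m most recent pages, fall back to shorter contexts when the page has never been seen in that context, and finally fall back to the order-0 estimate n_i / N. The data layout below (nested count dictionaries) and the class name are assumptions; the paper describes only the behavior.

```python
from collections import defaultdict

class MixedContextModel:
    """Sketch of a fully mixed order-m context model (assumed data layout:
    counts[k][context_tuple][page] for each sub-model order k = 0..m)."""

    def __init__(self, order):
        self.order = order
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(order + 1)]
        self.total_visits = 0

    def record_visit(self, path, page):
        """Update every sub-model with the newly visited `page`, given the user's path."""
        self.total_visits += 1
        for k in range(self.order + 1):
            if k == 0 or len(path) >= k:
                context = tuple(path[-k:]) if k else ()
                self.counts[k][context][page] += 1

    def probability(self, path, page):
        """pr of `page`: use the highest-order sub-model in which the page has
        occurred in this context; otherwise fall back, down to order 0 (n_i / N)."""
        for k in range(self.order, 0, -1):
            seen = self.counts[k].get(tuple(path[-k:]))
            if seen and page in seen:
                return seen[page] / sum(seen.values())
        n_i = self.counts[0][()][page]
        return n_i / self.total_visits if self.total_visits else 0.0

model = MixedContextModel(order=1)
model.record_visit([], "A")
model.record_visit(["A"], "B")
model.record_visit(["A", "B"], "A")
print(model.probability(["A"], "B"))   # 1.0, from the order-1 sub-model
print(model.probability(["B"], "C"))   # unseen page "C": falls back to order-0 -> 0.0
```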
ARCHITECTURE

This section discusses ProfBuilder (an acronym for Profile Builder), a transparent, adaptive, autonomous agent which works as a recommender system. ProfBuilder's learning mechanism is implemented on the basis of the algorithm described in the preceding section.

ProfBuilder inhabits a Web site and is assigned the goal of finding relevant local pages for the site's users. The advantage of this architecture is that ProfBuilder does not need to search the Web to collect the pages to be recommended, as they are the site's own pages. Thus, ProfBuilder has the benefit of operating without using bandwidth from the Internet, except a trivial amount when it delivers its recommendations to the users. Moreover, such a system is also transparent to the users, requiring no external installation. ProfBuilder is autonomous as it can take actions relating to page filtering on the user's behalf. It is adaptive as it learns the preferences of the user and adapts as they change over time. It is transparent as it extracts the preferences without user intervention.

ProfBuilder keeps track of each individual user and provides that person online assistance. The assistance includes two lists of recommendations based on two different filtering paradigms: content-based and collaborative. ProfBuilder updates the lists each time the user changes his/her current page. Content-based filtering is based on the correlation between the content of the pages and the user's preferences. Collaborative filtering is based on a comparison between the user's path of navigation and the access patterns of past users. Combining the two paradigms may eliminate the shortcomings of each approach: with collaborative filtering, we can deal with any kind of content and explore new domains to find something interesting to the user; with content-based filtering, we can deal with pages unseen by others [3].

To overcome the problem of the stateless connection in HTTP, ProfBuilder follows users by tracking their IP addresses. To track user presence, a timeout mechanism is used to delete a user's session information after a predetermined amount of idle time, so that a connection from the same IP after the specified period is identified as a new user. This method is fairly easy to implement. The problem with it is that many users connect to the Internet through proxy servers; consequently, the IP of a proxy server may represent two or more people who are accessing the same Web site simultaneously in their browsing sessions, causing an obvious conflict. However, the reality is that many large sites use this method and have not had any clashes [20].

Fast performance is a key requirement, as ProfBuilder is in the middle of every Web transaction. ProfBuilder was built in a highly multi-threaded fashion using the Java language so that no information that ought to be delivered to the browser gets stuck somewhere in the agent system.
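The IP-plus-timeout session tracking described above needs only a map from IP address to last-seen time: a request whose previous activity is older than the idle timeout is treated as a new user. ProfBuilder is a multi-threaded Java system; the Python sketch below, its names, and the default timeout value are illustrative assumptions, since the paper does not state a timeout value.

```python
import time

class SessionTracker:
    """Sketch of IP-based session tracking with an idle timeout (assumed API and default)."""

    def __init__(self, idle_timeout_seconds=30 * 60):
        self.idle_timeout = idle_timeout_seconds
        self.sessions = {}   # ip -> last-seen timestamp

    def touch(self, ip, now=None):
        """Return True if this request starts a new session for `ip`
        (first visit, or the previous session has expired), and record the visit."""
        now = time.time() if now is None else now
        last_seen = self.sessions.get(ip)
        is_new = last_seen is None or (now - last_seen) > self.idle_timeout
        self.sessions[ip] = now
        return is_new

tracker = SessionTracker(idle_timeout_seconds=600)
print(tracker.touch("192.0.2.7", now=0))      # True: new user
print(tracker.touch("192.0.2.7", now=300))    # False: same session
print(tracker.touch("192.0.2.7", now=2000))   # True: idle too long, treated as a new user
```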
The architecture of ProfBuilder can be broken down into three modules:

• The graphical user interface module is responsible for displaying ProfBuilder's interface.
• The learning module is responsible for maintaining the mapping between the actual interests of the user and the user profiles.
• The filtering module is responsible for the content-based and collaborative filtering.

The Graphical User Interface

The graphical user interface of ProfBuilder is a separate resizable HTML frame at the top of the current page. Figure 2 illustrates ProfBuilder's interface. (For evaluation purposes, we have modified ProfBuilder to a proxy-based architecture, as there was no practical site physically available.) The result frame displays a list of the recommended pages represented by their respective titles. Each title is also associated with the page size, which may be useful to distinguish among pages having the same title, as well as allowing users to estimate the time and space it will take to retrieve the page.

ProfBuilder highlights each recommendation to show its relevance and access frequency (given the user's current path) by putting 'ball' and 'man' icons, respectively, in front of the title. The number of balls shows the level of relevance: one-ball pages are poor, two-ball pages are neutral, three-ball pages are good, and so on. The number of men shows the level of access frequency logarithmically: one-man pages are visited once, two-man pages are visited two to three times, three-man pages are visited four to seven times, four-man pages are visited eight to fifteen times, and so on.

To read the content of a page, the user clicks on its title. Titles in bold font indicate unread pages, while titles in normal font indicate pages that have been read by the user. In addition to the list, the frame also shows two buttons for choosing the type of filtering. In content-based filtering, the selected pages are sorted in decreasing order of their interestingness, while in collaborative filtering they are sorted in decreasing order of their access frequencies.

Figure 2. ProfBuilder's interface.

The Learning Module

The learning module handles the task of mapping user interests to the profile and maintaining the correlation between the two. It is implemented on the basis of our proposed learning mechanism.

The Information Filtering Module

Content-based Filtering

The filtering process consists of translating pages to their vector space representation, finding pages that are similar to the profile, and selecting the top-scoring pages for presentation to the user. The vector representation is obtained by a text analysis of HTML pages. This is done by extracting keywords from page titles, all levels of headings, and anchor hypertexts. This narrow analysis leads to retrieval of fewer pages, but most of the retrieved material is likely to be helpful to the user, as it is reasonable to assume that the author of a Web page used these words to give the main aspect of the page. Stop words [5] are filtered out and word stemming [6] is then performed to improve IR performance. The keywords are weighted based on the well-tested TFIDF algorithm [16]. The weight of a keyword in a page is the product of its keyword frequency and the inverse of its document frequency; the weight of keyword k_j in page s_i is given by

w_ij = tf_ij * idf_j

where tf_ij is the number of occurrences of k_j in page s_i, and idf_j is the inverse document frequency of k_j in the Web site. The similarity metric between the vector D_i representing page s_i and the vector Q_j representing the interests of user u_j is calculated by taking the scalar product of the two vectors:

Similarity(D_i, Q_j) = Σ_k w_ik * w_jk

Collaborative Filtering

The filtering process is based on the following hypothesis: making available the work of a large number of past users can be useful for finding relationships or interdependencies between pages. Thus, it is reasonable to advise one user of what was done by others (the previous section discussed the types of relationship). For instance, when there is a high probability of visiting page B given page A, it may indicate that B has important content. This reduces the chances of missing something particularly significant.

The Web site usage is represented as a fully mixed order-1 model. A mixed model is desirable, and essentially unavoidable, to show the actual probability distribution of the pages. The order has been chosen because of its space and time effectiveness, as well as for the reasons mentioned in the previous section. The frequency of occurrence of each page in the order-0 sub-model is initialized based on its visibility (roughly, the number of pages pointing to the page [4]). The visibility is a sign of popularity [12] and frequency of access; for example, we expect that a page with a visibility of 10 will be accessed more than a page with a visibility of 1. The visibility data is obtained by querying the Infoseek search engine. The context model is then built progressively as users jump from one page to another using any navigation technique. The general mechanism of building is to update the frequency of occurrence of the current page in the order-0 sub-model, and to update its frequency of occurrence in the order-1 sub-model based on the user's previous page.

The module finds pages from the user's current path (which in this application is only his/her current page) and selects the top-frequency pages for presentation to the user.
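The two recommendation lists described in this section can be sketched end to end: TF-IDF weights and a scalar product against the profile for the content-based list, and access frequencies from the usage model for the collaborative list. The idf definition log(N / df), the toy pages, and the toy access counts below are assumptions for illustration; the paper only states that the weight is the product of keyword frequency and inverse document frequency, and ProfBuilder itself was implemented in Java.

```python
import math
from collections import Counter

def tfidf_vectors(pages):
    """pages: {page_id: [keywords...]} -> {page_id: {keyword: tf * idf}}.
    idf_j = log(N / df_j) is an assumed (conventional) definition."""
    n_pages = len(pages)
    df = Counter(k for kws in pages.values() for k in set(kws))
    return {
        pid: {k: tf * math.log(n_pages / df[k]) for k, tf in Counter(kws).items()}
        for pid, kws in pages.items()
    }

def similarity(page_vec, profile_vec):
    """Scalar product of a page vector D_i and the profile vector Q_j."""
    return sum(w * profile_vec.get(k, 0.0) for k, w in page_vec.items())

pages = {
    "p1": ["iraq", "standoff", "diplomacy", "iraq"],
    "p2": ["space", "shuttle", "launch"],
    "p3": ["space", "iraq"],
}
vectors = tfidf_vectors(pages)
profile = {"space": 1.0, "launch": 0.5}

# Content-based list: pages sorted by decreasing interestingness (similarity to the profile).
content_list = sorted(vectors, key=lambda p: similarity(vectors[p], profile), reverse=True)

# Collaborative list: pages sorted by decreasing access frequency after the current page
# (toy counts standing in for the order-1 sub-model's frequencies).
access_counts = {"p1": 12, "p2": 3, "p3": 7}
collaborative_list = sorted(access_counts, key=access_counts.get, reverse=True)

print(content_list)         # ['p2', 'p3', 'p1']
print(collaborative_list)   # ['p1', 'p3', 'p2']
```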
RELATED RESEARCH

This section compares ProfBuilder with other recommender systems. Among these systems, four are selected for comparison: Letizia [11], SiteHelper [19], Ringo [18], and Fab [3]. These systems were selected because they cover the best features of the other recommender systems.

Letizia

Letizia [11] is an agent that assists a user browsing the Web. Letizia uses the idle time spent by the user reading the current document to explore the neighborhood looking for pages that are related to the user's interests. Similar to ProfBuilder, the goal of the Letizia agent is to autonomously provide assistance to the user, without his/her intervention, while the user is browsing. However, Letizia learns the areas that are of interest to a user by recording that user's browsing behavior, and it uses only content-based filtering, while ProfBuilder uses both content-based and collaborative filtering. Furthermore, Letizia requires considerable bandwidth to operate, resulting in network overload and bandwidth shortages; ProfBuilder, on the other hand, inhabits a Web site and operates locally in assisting external users. Finally, Letizia requires a Macintosh and the Netscape browser to operate, which severely limits its reach. In contrast, ProfBuilder is platform independent.

SiteHelper

SiteHelper [19] is an agent that acts as a housekeeper for a Web site: it helps a user to find relevant information at the site. This is similar to ProfBuilder's design. However, SiteHelper's learning mechanism is not transparent; it prompts the user for a set of keywords and asks for explicit feedback rating the keywords. In contrast, the learning mechanism used by ProfBuilder works transparently without user intervention. In addition, SiteHelper's recommendations are based only on content-based filtering.

Ringo

Ringo [18] is a system which creates personalized recommendations for music albums and artists. It can be accessed through electronic mail or the Web. Users describe their interests to the system by explicitly rating some music. A user profile is a record of the user's interests (positive as well as negative) in specific items. Ringo makes recommendations based on comparisons among user profiles (collaborative filtering). Its processing takes an hour, while ProfBuilder operates concurrently with the user in his/her browsing session.

Fab

Fab [3] is a system that helps Web users discover new and interesting sites. Fab combines both content-based and collaborative filtering. The system maintains user profiles based on content analysis, and directly compares these profiles to determine similar users for collaborative recommendation. The system delivers a number of pages that it thinks would interest the users; users evaluate the pages and provide explicit feedback to the system. Fab uses the Web for collecting pages for recommendation. This is an advantage over ProfBuilder's approach, which is restricted to the site's pages. However, Fab uses considerable bandwidth for collecting these pages. Moreover, the system does not operate concurrently with users during their browsing session: users get the recommendations by explicitly accessing their account in Fab's database through the Web.

CONCLUSION

In this paper, we have proposed a new learning mechanism to learn user preferences from the retrieved pages. It is based on their probabilities, which are obtained by collecting the visiting patterns of past users. We assumed that the importance of a page is its entropy, based on its probability of being visited following the sequence of pages visited by the user. We have also introduced ProfBuilder, an agent-based recommender for a Web site. ProfBuilder uses the site usage information to learn user interests and as a base for collaborative filtering.
ProfBuilder helps the user to find relevant pages on the site by providing both content-based and collaborative filtering. ProfBuilder is independent of the browser, as it assists the users by inserting its recommendations in the requested pages. The efficiency of the learning mechanism and the usefulness of the recommendations given by ProfBuilder have not been formally evaluated so far. We believe that ProfBuilder performs well for sites that consist of 1000 HTML URLs and above, as it is infeasible for a user to read the entire content of such sites.

FUTURE WORK

The limitations of ProfBuilder, and possible solutions, are as follows:

First, the main limitation of our system is that there has been no real assessment of ProfBuilder. We plan to do serious evaluation studies in the near future; for example, we plan to examine different combinations of sub-models with various orders.

Second, one problem with the learning mechanism is that it needs a large number of users in order to have enough data to reflect the interdependency between pages. We intend to create simulated or virtual users who represent a particular taste, to show the content relationships among the pages. In the CNN site, for example, we can create a virtual 'Iraq' user who visits only pages containing the word 'Iraq'. In the case of an order-1 fixed model, for instance, all the pages will be connected to form a complete graph, and the frequency of a transition between two pages is assigned in proportion to the product of their weights for the word 'Iraq' (a sketch of this idea appears after these items).

Third, since the number of states of an order-m fixed context model increases exponentially with m, it is difficult to have a model with even m = 2, as the space required to store all the context information is prohibitive. We intend to solve this problem as others have done (e.g. [8]), by using self-organizing lists and hashing techniques to provide a means of representing approximated context models of any order in a reasonable amount of memory.

Finally, ProfBuilder assists a user by finding relevant information on only one Web site. We intend to address this by maintaining user profiles across different Web sites that use ProfBuilder, so that when a user jumps to another site, the user's profile is also transferred to the new site, whose ProfBuilder will search for pages similar to the profile. Thus, the user can find relevant recommendations on the first page accessed in the new site.
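The virtual-user idea in the second item can be made concrete as follows: for an order-1 fixed model over all pages, assign the transition frequency between every pair of pages in proportion to the product of their weights for the chosen keyword (here, 'Iraq'). This is only an illustration of a proposed, unimplemented idea; the scaling constant, function name, and data layout are assumptions.

```python
def virtual_user_transitions(keyword_weights, scale=100):
    """keyword_weights: {page: weight of the keyword (e.g. 'Iraq') in that page}.
    Returns synthetic order-1 transition frequencies freq[(a, b)] proportional to
    the product of the two pages' keyword weights (complete graph over the pages).
    The scale factor is an illustrative assumption."""
    pages = list(keyword_weights)
    return {
        (a, b): round(scale * keyword_weights[a] * keyword_weights[b])
        for a in pages for b in pages if a != b
    }

weights = {"iraq_news": 0.9, "iraq_analysis": 0.7, "sports": 0.0}
print(virtual_user_transitions(weights))
# Pages unrelated to the keyword (weight 0) receive zero synthetic transitions.
```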
REFERENCES

1. Aczel, J., and Daroczy, Z. 1975. On Measures of Information and their Characterizations. Academic Press.
2. Armstrong, R.; Freitag, D.; Joachims, T.; and Mitchell, T. 1995. WebWatcher: A Learning Apprentice for the World Wide Web. In Proceedings of the AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, Stanford, CA.
3. Balabanovic, M., and Shoham, Y. 1997. Fab: Content-Based, Collaborative Recommendation. Communications of the ACM 40(3):66-72.
4. Bray, T. 1996. Measuring the Web. In Proceedings of the Fifth International World Wide Web Conference, Paris.
5. Fox, C. 1990. A Stop List for General Text. SIGIR Forum 24(1-2):19-35.
6. Frakes, W. B. 1984. Term Conflation for Information Retrieval. New York: Cambridge University Press.
7. Gudivada, V. N.; Raghavan, V. V.; Grosky, W. I.; and Kasanagottu, R. 1997. Information Retrieval on the World Wide Web. IEEE Internet Computing 1(5):58-68.
8. Hirschberg, D. S., and Lelewer, D. A. 1992. Context Modeling for Text Compression. In Image and Text Compression. Kluwer Academic Press, Norwell, MA.
9. Jaczynski, M., and Trousse, B. 1997. Broadway: A World Wide Web Browsing Advisor Reusing Past Navigations from a Group of Users. http://www.inria.fr/aidlpapers/97/ukcbr/htmllukcbr.htm
10. Krulwich, B., and Burkey, C. 1997. The InfoFinder Agent: Learning User Interests through Heuristic Phrase Extraction. IEEE Intelligent Systems 12(5):22-27.
11. Lieberman, H. 1995. Letizia: An Agent that Assists Web Browsing. In Proceedings of the International Joint Conference on Artificial Intelligence, Montreal.
12. Marchiori, M. 1997. The Quest for Correct Information on the Web: Hyper Search Engines. In Proceedings of the Sixth International World Wide Web Conference, Santa Clara.
13. Pazzani, M.; Muramatsu, J.; and Billsus, D. 1996. Syskill & Webert: Identifying Interesting Web Sites. In Workshop on Machine Learning in Information Access, AAAI Spring Symposium Series, Stanford, CA.
14. Sakagami, H., and Kamba, T. 1997. Learning Personal Preferences on Online Newspaper Articles from User Behaviors. In Proceedings of the Sixth International World Wide Web Conference, Santa Clara.
15. Salton, G., and McGill, M. J. 1983. An Introduction to Modern Information Retrieval. McGraw-Hill.
16. Salton, G., and Yang, C. S. 1973. On the Specification of Term Values in Automatic Indexing. Journal of Documentation 29(4):351-372.
17. Shannon, C. E. 1948. A Mathematical Theory of Communication. Bell System Technical Journal 27(3):623-656.
18. Shardanand, U., and Maes, P. 1995. Social Information Filtering: Algorithms for Automating "Word of Mouth". In Proceedings of the Human Factors in Computing Systems Conference (CHI'95), Denver.
19. Siaw, D., and Ngu, W. 1997. SiteHelper: A Localized Agent that Helps Incremental Exploration of the World Wide Web. In Proceedings of the Sixth International World Wide Web Conference, Santa Clara.
20. Thomas, B. 1997. Recipe for E-commerce. IEEE Internet Computing 1(6):72-74.
21. Wexelblat, A., and Maes, P. 1997. Footprints: Visualizing Histories for Web Browsing. In RIAO'97: Computer-Assisted Information Retrieval on the Internet, Montreal.
22. Yan, T. W., and Garcia-Molina, H. 1995. SIFT - A Tool for Wide Area Information Dissemination. In USENIX Technical Conference.

Exhibit 8

Exhibit 9

Exhibit 15 REDACTED IN ITS ENTIRETY

Exhibit 16 REDACTED IN ITS ENTIRETY

Exhibit 17 REDACTED IN ITS ENTIRETY

Exhibit 18 REDACTED IN ITS ENTIRETY

Exhibit 26

Exhibit 27

Making search yours
The Bing Team, Bing Search Blog, 2/10/2011 9:00 AM

For years people have talked about personalized search as the next evolution of our amazing technology. Over the years, the biggest obstacle facing search engineers has been the simple fact that human behavior is not predictable. Don't get us wrong, people can be creatures of habit and we can build functions that enable us to display different results based on logical assumptions made in the aggregate. For example, 'traffic' at 6pm on a Friday likely refers to road conditions, not the movie or the band. So we can generally use some math magic to make really good predictions about what you mean when you type 'traffic' at 6pm tonight, but what if we're wrong? It's easy to see how that could happen especially if you, say, walk to work. In that case a more personal search would benefit you; having more detailed information about the person doing the search can make results more effective.
We think one of the challenges with delivering results which are truly individualized is that, to date, personalized search "can't see the forest for the trees". In other words, everyone is collecting everything and trying to figure out the foibles of human behavior from a mass of digital bits. To an extent, we've all been looking at the wrong inputs, which in turn haven't given us the output we want. We've found something interesting: a person's history or profile often does not necessarily deliver better ...

Even as we continue to develop more relevant search through smart personalization, we are very focused on maintaining an industry-leading privacy stance. For more information, see here. We're currently 'flighting' (or "testing", for non search-geeks) a raft of experiments to see which techniques deliver the best results for a given user behavior, but today we want to talk about two we've recently put out there for you all!

First, something relatively simple: automatically tailoring search results based on your physical location. As 76% of people use search engines to plan trips, events or social gatherings, Bing Local has always provided you with maps to nearby business listings, authoritative reviews and areas of interest.

... the relevant result for that user is not necessarily the same as that for the majority of people in the U.S. To numerous users with an interest in pursuing a career in chemistry, the most relevant result may be the American Chemical Society, but to someone interested in how they can get involved in the fight against cancer, the most relevant result is more likely to be the American Cancer Society. Suppose the chemistry fan selects American Chemical Society. Our research shows that users commonly re-issue such navigational queries and the intent of that user rarely changes.
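The behavior just described — re-issued navigational queries with stable intent — can be exploited with very little per-user state: remember which result the user chose for a query and promote it when the same query recurs, which is what the post goes on to describe. The sketch below is purely illustrative and is not Bing's implementation; every name and data structure in it is an assumption.

```python
# Illustrative sketch only -- not Bing's implementation.
class RefindPersonalizer:
    def __init__(self):
        # (user_id, normalized query) -> URL the user previously selected
        self.last_choice = {}

    def record_click(self, user_id, query, url):
        self.last_choice[(user_id, query.strip().lower())] = url

    def rerank(self, user_id, query, results):
        """If the user appears to be re-finding a site they chose before,
        promote that result to the top position; otherwise leave results unchanged."""
        chosen = self.last_choice.get((user_id, query.strip().lower()))
        if chosen in results:
            return [chosen] + [r for r in results if r != chosen]
        return results

p = RefindPersonalizer()
results = ["cancer.org", "acs.org", "chemistry.org"]
p.record_click("u1", "american chemical society", "acs.org")
print(p.rerank("u1", "american chemical society", results))   # previously chosen result promoted
```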
This new personal search feature uses this human behavior as its core premise: if Bing thinks a user is trying to "refind" a site, the relevant result is promoted to the top position on the page.

The beauty of thinking differently about personalized search is that it enables us to construct elegant solutions that require a minimal amount of personal information and, frankly, often exhibit better results than a more computationally complex predictive model alone. There is much more to come, but take Bing for a spin and tell us what you think!

- Aidan Crook & Sanaz Ahari, Bing Search

Exhibit 28 REDACTED IN ITS ENTIRETY


