Selene Communication Technologies LLC v. Google Inc.

Filing 1

COMPLAINT filed with Jury Demand against Google Inc. - Magistrate Consent Notice to Pltf. ( Filing fee $ 400, receipt number 0311-1503856.) - filed by Selene Communication Technologies LLC. (Attachments: # 1 Exhibit A, part 1, # 2 Exhibit A, part 2, # 3 Exhibit A, part 3, # 4 Exhibit A, part 4, # 5 Civil Cover Sheet)(cla, )

US 6,363,377 B1

SEARCH DATA PROCESSOR

This application claims benefit of the filing date of provisional application No. 60/094,694 filed Jul. 30, 1998.

This invention was made under U.S. Government Contract NROXXX-96-G-3006. The Government has certain rights in the invention.

TECHNICAL FIELD

This invention relates generally to the field of search techniques used on an information management system or on the global information network ("the World Wide Web"). More specifically, the present invention is a method and system for refining and improving search queries and for organizing the results of a search query by different and overlapping criteria.

BACKGROUND OF THE INVENTION

The blossoming of the World Wide Web in the 1990s has given computer users access to vast quantities of information, an estimated 100-300 million Web pages, many terabytes of data. The user provides the Uniform Resource Locator ("URL") of a page to the browser, the browser retrieves the page from the Internet and displays it to the user. When the user knows the URL of the page, the procedure is simple. However, to find information on the Web, the user must access a search engine. The user submits a query and the search engine returns a list of URLs of pages that satisfy the query together with a summary of each page. The continuing exponential growth of the Web makes the task of finding the relevant information exceedingly difficult. This effort is further aggravated by the unorganized and extremely dynamic nature of the Web.

There are two paths to searching for information on the Web. One path is consulting a manually compiled Web catalog, such as Yahoo. Any manual catalog of the Web necessarily suffers two drawbacks: the nature of the information on the Web makes any cataloging efforts necessarily limited and incomplete, and the catalog offers no help to a user interested in a subject that happens not to be covered by the catalogers.

The other path to searching for information on the Web is using a Web engine. The major ones as of January 1998 are Alta Vista, Excite, HotBot, InfoSeek, Lycos, NorthernLight, and Web Crawler, plus a number of branded versions of these. These engines send out programs called robots, or crawlers, which automatically peruse the Web and gather Web pages they discover. The collected pages are automatically indexed and collected into a data base. In this process, known as indexing, Internet URLs are associated with relevant words from the page they identify. Many search engines store page summaries along with URLs. Page summarization varies from one search engine to another. Some search engines store the first fifty words of a document. Other engines try to understand the content of the pages. They attempt to define relevant "ideas" based on associations of words within documents and they summarize the Web pages by storing these "ideas". The users can query the indices for pages meeting certain criteria. For example, a user can request all the Web pages found by the search engine that have the phrase "cryptography software" somewhere in the text.

There are two major problems with using the search engines: 1) incomplete coverage and 2) difficulty of effective use. Not a single engine contains a complete index of the Web; they index anywhere from 2 million pages by WebCrawler to 100 million pages by Alta Vista. Given the explosive growth of the Web and the limitation of time and space faced by search engines, it is unlikely that full coverage of the Web is forthcoming.

Most users feel the incompleteness of the indices only indirectly, since they cannot miss a web page if they do not know it exists. The more pressing problem is that using the search engines can be a frustrating, time-consuming, and often unsuccessful process for the user. In most search sessions, the user's needs are well enough formulated in her head that only a small number of web pages would exactly meet her need. The problem, then, is getting the search engine to understand the user's needs. Unfortunately, the state of the art in human-machine interaction is far from meeting such a goal. Many user queries produce unsatisfactory results, yielding thousands of matching documents. The search engine indices support many basic information retrieval queries, but the users are offered little guidance in determining which keywords and in which combination would yield the desired content. Typically, the user ends up alternating between specifying too few keywords which yield too many matching documents, and supplying too many keywords which yield no matches. Many search engines lack efficiency in eliminating duplicate URLs from their indices. As a consequence, redundant information is sometimes returned to users, and can create a lot of frustration.

While a number of tools have been developed to help the user search more intelligently, by allowing selection of additional search criteria, none of them offers useful analysis of the query results that could give guidance to the user in reformulating a more appropriate query. Some search engines group and display results based on the popularity of the site, while others attempt to do some type of organization. One such search engine, Northern Light, organizes all the query results into at most 10 folders based on subject, type, source and language. While this is a step in the right direction, the user is not given any information on how the categories are derived or on how many results are in each folder.

SUMMARY OF THE INVENTION

The present invention is embodied in a simple and effective method for improving the searching of an information management system using a search engine and for refining and organizing the search results.

The present invention provides for a query tuner, allowing a user to effectively reformulate a query in order to find a reasonable number of matching documents from the search engine by automatically and selectively modifying individual query terms in the user's query to be weaker or stronger.

One aspect of the present invention provides for a dynamic filter, using a dynamic set of record tokens to restrict the results of a search query to include only records which correspond to the record tokens.

Another aspect of the present invention provides for a results organizer, to aid the user in organizing and understanding a large number of matching documents returned in response to a search query by clustering like items returned from the search.

Another aspect of the present invention provides for a search history, to allow the user to save, organize and search the queries and the documents that best satisfy the query.

It is to be understood that both the foregoing general description and the following detailed description are exemplary, but are not restrictive, of the invention.

DESCRIPTION OF THE DRAWING

The invention is best understood from the following detailed description when read in connection with the accompanying drawings. Included in the drawing are the following Figures:

FIG. 1A is a flowchart illustrating a high level chart of the invention;
FIG. 1B is an example of a data processing system in which the invention may be implemented;
FIG. 1C is an example of another data processing system in which the invention may be implemented;
FIG. 2A is a portion of a flow chart illustrating an exemplary implementation of the query tuner operation shown in FIG. 1A;
FIG. 2B is a portion of a flow chart illustrating an exemplary implementation of the dynamic filter operation shown in FIG. 1A;
FIG. 2C is a portion of a flow chart illustrating an exemplary implementation of the result organizer operation shown in FIG. 1A;
FIG. 3 is a further illustration of the user's operating environment illustrated in FIG. 1B;
FIG. 4 is an example of a graphical display of a search query according to a first exemplary embodiment of the present invention;
FIG. 5 is an example of a graphical display of a search query according to a second exemplary embodiment of the present invention;
FIG. 6 is an example of a graphical display of a selected search result of FIG. 5;
FIG. 7 is a Venn diagram of the theoretical operation according to a third exemplary embodiment of the present invention;
FIG. 8 is a functional block diagram of an exemplary implementation of the third exemplary embodiment of the invention;
FIG. 9(a) is a hierarchy tree according to a fourth exemplary embodiment of the invention;
FIG. 9(b) is another hierarchy tree according to the fourth exemplary embodiment;
FIG. 9(c) is yet another hierarchy tree according to the fourth exemplary embodiment;
FIG. 10(a) is a further hierarchy tree according to the fourth exemplary embodiment;
FIG. 10(b) is a further hierarchy tree according to the fourth exemplary embodiment;
FIG. 11 is an example of an implementation of the query tuner.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1A shows an overview of the search data processing system. The search data processing system is a computer program which may reside on a carrier such as a disk, diskette or modulated carrier wave. The system, in step 5, begins processing when a user initiates or continues a search session. In step 10 the user enters a search query. If the user is continuing a prior search session, then the history is retrieved as shown in step 11 and the previous search's keywords are added to the search query. Next, in step 12, the system determines which of the following processing options are to be performed:

1. Query Tuner Option: Reformulation of a query
2. Dynamic Filter Option: Restriction of the results from a query
3. Results Organizer Option: Organization of the results from a query

The system then begins to process each option individually. First, the system checks, in step 14, if the query tuner option has been selected. If the option has been selected then, in step 16, the query refinement process is initiated and the query is modified prior to the search being performed. The search is then performed as shown in step 18.

The system, in step 20, checks for the existence of additional processing options to be performed. If the system determines, in step 22, that the dynamic filter option has been selected, then the dynamic filter process is performed in step 24. The system, in step 26, determines if the result organizer option has been selected. If this option has been selected, then in step 28, the results organization process is performed. Next, after all options have been processed, the system displays the results in step 30. The system concludes with the user selection of the results as shown in step 32 and, optionally, the user saves the results of the query at step 34.

An example of a data processing system which can use the search data processing system to search the Web is shown in FIG. 1B. In FIG. 1B, the Web server 41 executes the invention and provides the users 43 access to the Web. The users 43 send their queries over the LAN 45 to the Web server 41. FIG. 3 further illustrates a typical user's interaction with the Web when performing a search. The Web server relays a user's query to a search engine to perform the search.

Although the invention is illustrated in terms of an Internet browser searching pages on the World Wide Web, it is contemplated that it may be generally applied to any information management system. This implementation of the invention is shown in FIG. 1C, where the user 42 executes the invention and information management system 49 is the information management system to be searched. Alternatively, the information management system may be a distributed information management system including both of the information management systems 49 and 49'. In applying the searching techniques described below, it may be desirable to substitute information management system records for the documents and web pages described below and to substitute record tokens or some other identifying field from an information management system record for the URL of the web page.

FIG. 2C provides the details of the results organizer from step 28 in FIG. 1A. The results organizer processes the documents that match the query and clusters them according to common themes. Clustering may be accomplished, for example, by removing all the common stop-words from the documents and then hashing phrases of different lengths (referred to hereafter as clean phrases), such as phrases consisting of single words, pairs of consecutive words and long sequences of words, to determine which phrases occur in multiple documents that were returned by the search operation.

The hashing function takes all the text fields contained in the documents, deletes all the common stop-words, and then hashes all the clean phrases into a particular position in a hash table. Typically, a hash address value for a particular item is generated by applying an algorithm (the hashing function) directly to the item. The hashing function generates different hash table addresses for different items while generating the same hash table address for identical items.

While the exemplary embodiments of the invention are described as using a hashing function to cluster the query results, it is contemplated that other methods of clustering may be used instead of the hashing function. One such alternate method might be to form a concordance. Clean phrases in each document may be alphabetically sorted as they are received to form a list of all of the words in the combined documents. Each item in the list may include the clean phrase, a list of the documents in which the clean phrase occurs and the offset in each document at which the clean phrase occurs. This concordance may be used to cluster clean phrases in the documents based on the occurrence of single words or on the near occurrences of groups of words in the documents. Another alternate method might be to form a vector for each document in the multidimensional space defined by all the clean phrases in the documents. Each dimension of this space can correspond to a single clean phrase in the document collection, and the corresponding position in a document's vector is set to 1 if the document contains the clean phrase and to 0 otherwise. Any number of geometric clustering algorithms can then be used to cluster the vectors into a small number of clusters so as to minimize a geometric measure of the cluster, such as the volume of the cluster or the cluster's diameter.

As illustrated in FIG. 2C, after the documents have been hashed in step 63, the hash tables are analyzed to identify the clusters as shown in step 65. The results of the clustering are then displayed in step 67 and shown by example in FIG. 4.

FIG. 4 is an example of a graphical display of a search query according to a first embodiment of the present invention. In the exemplary embodiment, the clustering algorithm is implemented in the language Perl, which includes a non-collision hashing function. An exemplary embodiment hashes each clean phrase from the document title, URL, and summary to an entry in the hash table (also known as a hash bucket) using the hashing function in Perl. The exemplary hash table entry includes counts of the number of documents that contain the hashed clean phrase. At the end of the hashing process, each entry in the table may or may not represent a cluster. The entries are analyzed to determine the best clusters by weighing both the number of documents that contain the common clean phrase and the length of the clean phrase. The best clusters are output to the user.

FIG. 4 is an illustration of a clustered display for a query. For example, the query produced over 400 matching documents. The system discovers a small number of interesting patterns by using pattern matching and clustering algorithms. The results organizer produced clusters 410, 420, 430 and 440 for this sample set of documents. The partition of the documents into only 4 clusters is not intended to limit the scope of the invention; rather, it is shown for simplicity and illustrative purposes only. For each cluster, the system displays the number of documents that are in the cluster, the common clean phrase, and a representative document from the cluster. For example, cluster 410 contains 23 documents whose common theme is the phrase " airports". For a URL, any characters found between consecutive slashes are interpreted as a word in the text. For example, a URL ewr0444/dayd.html would cause the following "words" to be hashed:, airports, newark, ewr0444 and dayd. In addition to single "words", the following two-word phrases and long phrases would also be hashed: airports/newark, newark/ewr0444, ewr0444/dayd and newark/ewr0444/dayd.

The user may choose to view any of the discovered clusters; the system, then, displays the documents that appear in the selected cluster. For example, as shown in FIG. 4, if the user were to choose cluster 410, the system would display the 23 documents that contain " airports".

FIG. 5 shows an example of a graphical display of a search query for a second exemplary embodiment of the invention. The clustering lenses interface consists of a display of the title, URL, content and age lenses and a Combination window. For each lens, the corresponding part of each matching document is analyzed. As a result, a small number of interesting patterns are discovered and presented to the user by using pattern matching and clustering algorithms. Users also have the option of specifying their own patterns. More specifically, each tool takes one field at a time and partitions all the documents returned by the search engine according to a pattern found in that field. The documents may be partitioned into 1 to 5 clusters or more. Since the pattern analysis is performed on each field separately, it corresponds to viewing the documents through a lens that only displays the field of interest and hides the other fields.

FIG. 5 shows an illustration of a display for a query about New Jersey restaurants. For example, this query produces 100 matching documents. Title lens 500 partitions the documents found into 3 clusters corresponding to cells 502, 504 and 506. Title lens 500 considers similarities in the titles of the matching documents. Searching for similarity in both format and words does the partitioning. For example, a format similarity is documents with "No Title" or documents whose title begins with "Re:". A word similarity refers to any common subsequence of words in the title. The strongest word similarity is identical titles; a weaker word similarity is an identical phrase within titles or identical words separated by other words, e.g. "Jane K. Doe" and "Jane Katherine Doe".

Title lens 500 finds that 40 titles contain the phrase "NJWeb: Dining in New Jersey" corresponding to a cluster in cell 502. In cell 504, title lens 500 finds 20 titles that start with the word "Yahoo!". In cell 506, title lens 500 finds that the remaining 40 titles do not have any interesting patterns. In this exemplary embodiment of the invention, the width of each cell in the display is proportional to the number of documents the cell represents. The partition of the documents into only 3 clusters is not intended to limit the scope of the invention; rather, it is shown for simplicity and illustrative purposes only.

Also shown in FIG. 5 is URL lens 510, which partitions the 100 documents found into four clusters corresponding to cells 512, 514, 516 and 518. URL lens 510 considers similarities in the matching documents' Web addresses. For example, if there are many files with "pub/biblio" as part of the pathname, they may form a cluster. In general, any nontrivial contiguous part of the file path is mined for patterns. URL lens 510 finds 40 URLs that contain the term "" corresponding to cell 512. In cell 514, URL lens 510 finds 20 URLs that contain the term "". In cell 516, URL lens 510 finds 20 URLs that contain the term "". In cell 518, URL lens 510 finds 20 URLs that have no patterns. Furthermore, the 40 URLs having "" as a substring are exactly those with titles "NJWeb: Dining in New Jersey". Such a fact is indicated by the edges 550 joining cells 502 and 512. Edges 552 indicate that the documents clustered in cell 504 are exactly those documents clustered in cell 514.

Further, FIG. 5 shows content lens 520 with the 100 documents found partitioned into 4 clusters corresponding to cells 522, 524, 526 and 528. Content lens 520 considers similarity in the short excerpts of the matching documents.
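
The clean-phrase hashing that the results organizer uses for the first embodiment can be illustrated with a short sketch. The Python below is only one possible reading of that description and is not code from the patent, which names Perl as the implementation language: a Python dict stands in for the Perl hash table, and the stop-word list, the function names, the sample titles, the sample path, and the weighting (document count times phrase length in words) are all assumptions made for illustration. Phrases are joined with spaces here rather than slashes.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the source does not enumerate one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is"}


def text_words(text):
    """Lower-case a text field and drop the common stop-words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def url_words(path):
    """Treat the characters between consecutive slashes as words."""
    # Dropping everything after the last dot in a segment is an assumption
    # based on "dayd.html" yielding the word "dayd".
    return [seg.rsplit(".", 1)[0] for seg in path.split("/") if seg]


def clean_phrases(words, max_len=3):
    """Return every run of 1..max_len consecutive words as a clean phrase."""
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def cluster(documents, min_docs=2, top=4):
    """Map doc id -> title text to the best (phrase, doc ids) clusters."""
    table = defaultdict(set)  # a dict stands in for the Perl hash table
    for doc_id, title in documents.items():
        for phrase in clean_phrases(text_words(title)):
            table[phrase].add(doc_id)
    # Only phrases shared by several documents can represent a cluster.
    shared = [(p, ids) for p, ids in table.items() if len(ids) >= min_docs]
    # Assumed weighting: document count times phrase length in words.
    shared.sort(key=lambda c: (-len(c[1]) * len(c[0].split()), c[0]))
    return [(p, sorted(ids)) for p, ids in shared[:top]]


# Hypothetical sample titles, not taken from the figures.
docs = {
    1: "Dining in New Jersey",
    2: "The best dining in New Jersey",
    3: "Newark airport parking",
}
best = cluster(docs)
```

In this sketch the longest phrase shared by documents 1 and 2 ranks first, and document 3, which shares no clean phrase with the others, falls into no cluster. A real implementation would also hash the URL and summary fields and would need a weighting tuned to its own data.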

