Selene Communication Technologies LLC v. Google Inc.
Filing 1
COMPLAINT filed with Jury Demand against Google Inc. - Magistrate Consent Notice to Pltf. ( Filing fee $ 400, receipt number 0311-1503856.) - filed by Selene Communication Technologies LLC. (Attachments: # 1 Exhibit A, part 1, # 2 Exhibit A, part 2, # 3 Exhibit A, part 3, # 4 Exhibit A, part 4, # 5 Civil Cover Sheet)(cla, )
US 6,363,377 B1

SEARCH DATA PROCESSOR

This application claims benefit of the filing date of provisional application
No. 60/094,694 filed Jul. 30, 1998.

This invention was made under U.S. Government Contract NROXXX-96-G-3006. The
Government has certain rights in the invention.

TECHNICAL FIELD

This invention relates generally to the field of search techniques used on an
information management system or on the global information network ("the World
Wide Web"). More specifically, the present invention is a method and system for
refining and improving search queries and for organizing the results of a
search query by different and overlapping criteria.

BACKGROUND OF THE INVENTION

The blossoming of the World Wide Web in the 1990s has given computer users
access to vast quantities of information, an estimated 100-300 million Web
pages, many terabytes of data. The user provides the Uniform Resource Locator
("URL") of a page to the browser, the browser retrieves the page from the
Internet and displays it to the user. When the user knows the URL of the page,
the procedure is simple. However, to find information on the Web, the user must
access a search engine. The user submits a query and the search engine returns
a list of URLs of pages that satisfy the query together with a summary of each
page. The continuing exponential growth of the Web makes the task of finding
the relevant information exceedingly difficult. This effort is further
aggravated by the unorganized and extremely dynamic nature of the Web.

There are two paths to searching for information on the Web. One path is
consulting a manually compiled Web catalog, such as Yahoo. Any manual catalog
of the Web necessarily suffers two drawbacks: the nature of the information on
the Web makes any cataloging efforts necessarily limited and incomplete, and
the catalog offers no help to a user interested in a subject that happens not
to be covered by the catalogers.

The other path to searching for information on the Web is using a Web engine.
The major ones as of January 1998 are Alta Vista, Excite, HotBot, InfoSeek,
Lycos, NorthernLight, and WebCrawler, plus a number of branded versions of
these. These engines send out programs called robots, or crawlers, which
automatically peruse the Web and gather Web pages they discover. The collected
pages are automatically indexed and collected into a data base. In this
process, known as indexing, Internet URLs are associated with relevant words
from the page they identify. Many search engines store page summaries along
with URLs. Page summarization varies from one search engine to another. Some
search engines store the first fifty words of a document. Other engines try to
understand the content of the pages. They attempt to define relevant "ideas"
based on associations of words within documents and they summarize the Web
pages by storing these "ideas". The users can query the indices for pages
meeting certain criteria. For example, a user can request all the Web pages
found by the search engine that have the phrase "cryptography software"
somewhere in the text. There are two major problems with using the search
engines: 1) incomplete coverage and 2) difficulty of effective use. Not a
single engine contains a complete index of the Web; they index anywhere from
2 million pages by WebCrawler to 100 million pages by Alta Vista. Given the
explosive growth of the Web and the limitation of time and space faced by
search engines, it is unlikely that full coverage of the Web is forthcoming.

Most users feel the incompleteness of the indices only indirectly, since they
cannot miss a web page if they do not know it exists. The more pressing problem
is that using the search engines can be a frustrating, time-consuming, and
often unsuccessful process for the user. In most search sessions, the user's
needs are well enough formulated in her head that only a small number of web
pages would exactly meet her need. The problem, then, is getting the search
engine to understand the user's needs. Unfortunately, the state of the art in
human-machine interaction is far from meeting such a goal. Many user queries
produce unsatisfactory results, yielding thousands of matching documents. The
search engine indices support many basic information retrieval queries, but the
users are offered little guidance in determining which keywords and in which
combination would yield the desired content. Typically, the user ends up
alternating between specifying too few keywords, which yield too many matching
documents, and supplying too many keywords, which yield no matches. Many search
engines lack efficiency in eliminating duplicate URLs from their indices. As a
consequence, redundant information is sometimes returned to users, and can
create a lot of frustration.

While a number of tools have been developed to help the user search more
intelligently, by allowing selection of additional search criteria, none of
them offers useful analysis of the query results that could give guidance to
the user in reformulating a more appropriate query. Some search engines group
and display results based on the popularity of the site, while others attempt
to do some type of organization. One such search engine, Northern Light,
organizes all the query results into at most 10 folders based on subject, type,
source and language. While this is a step in the right direction, the user is
not given any information on how the categories are derived or on how many
results are in each folder.

SUMMARY OF THE INVENTION

The present invention is embodied in a simple and effective method for
improving the searching of an information management system using a search
engine and for refining and organizing the search results.

The present invention provides for a query tuner, allowing a user to
effectively reformulate a query in order to find a reasonable number of
matching documents from the search engine by automatically and selectively
modifying individual query terms in the user's query to be weaker or stronger.

One aspect of the present invention provides for a dynamic filter, using a
dynamic set of record tokens to restrict the results of a search query to
include only records which correspond to the record tokens.

Another aspect of the present invention provides for a results organizer, to
aid the user in organizing and understanding a large number of matching
documents returned in response to a search query by clustering like items
returned from the search.

Another aspect of the present invention provides for a search history, to allow
the user to save, organize and search the queries and the documents that best
satisfy the query.

It is to be understood that both the foregoing general description and the
following detailed description are exemplary, but are not restrictive, of the
invention.

DESCRIPTION OF THE DRAWING

The invention is best understood from the following detailed description when
read in connection with the accompanying drawings. Included in the drawing are
the following Figures:

FIG. 1A is a flowchart illustrating a high level chart of the invention;
FIG. 1B is an example of a data processing system in which the invention may be
implemented;
FIG. 1C is an example of another data processing system in which the invention
may be implemented;
FIG. 2A is a portion of a flow chart illustrating an exemplary implementation
of the query tuner operation shown in FIG. 1A;
FIG. 2B is a portion of a flow chart illustrating an exemplary implementation
of the dynamic filter operation shown in FIG. 1A;
FIG. 2C is a portion of a flow chart illustrating an exemplary implementation
of the result organizer operation shown in FIG. 1A;
FIG. 3 is a further illustration of the user's operating environment
illustrated in FIG. 1B;
FIG. 4 is an example of a graphical display of a search query according to a
first exemplary embodiment of the present invention;
FIG. 5 is an example of a graphical display of a search query according to a
second exemplary embodiment of the present invention;
FIG. 6 is an example of a graphical display of a selected search result of
FIG. 5;
FIG. 7 is a Venn diagram of the theoretical operation according to a third
exemplary embodiment of the present invention;
FIG. 8 is a functional block diagram of an exemplary implementation of the
third exemplary embodiment of the invention;
FIG. 9(a) is a hierarchy tree according to a fourth exemplary embodiment of the
invention;
FIG. 9(b) is another hierarchy tree according to the fourth exemplary
embodiment;
FIG. 9(c) is yet another hierarchy tree according to the fourth exemplary
embodiment;
FIG. 10(a) is a further hierarchy tree according to the fourth exemplary
embodiment;
FIG. 10(b) is a further hierarchy tree according to the fourth exemplary
embodiment;
FIG. 11 is an example of an implementation of the query tuner.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1A shows an overview of the search data processing system. The search data
processing system is a computer program which may reside on a carrier such as a
disk, diskette or modulated carrier wave. The system, in step 5, begins
processing when a user initiates or continues a search session. In step 10 the
user enters a search query. If the user is continuing a prior search session,
then the history is retrieved as shown in step 11 and the previous search's
keywords are added to the search query. Next, in step 12, the system determines
which of the following processing options are to be performed:

1 - Query Tuner Option - Reformulation of a query
2 - Dynamic Filter Option - Restriction of the results from a query
3 - Results Organizer Option - Organization of the results from a query

The system then begins to process each option individually. First, the system
checks, in step 14, if the query tuner option has been selected. If the option
has been selected then, in step 16, the query refinement process is initiated
and the query is modified prior to the search being performed. The search is
then performed as shown in step 18.

The system, in step 20, checks for the existence of additional processing
options to be performed. If the system determines, in step 22, that the dynamic
filter option has been selected, then the dynamic filter process is performed
in step 24. The system, in step 26, determines if the result organizer option
has been selected. If this option has been selected, then in step 28, the
results organization process is performed. Next, after all options have been
processed, the system displays the results in step 30. The system concludes
with the user selection of the results as shown in step 32 and, optionally, the
user saves the results of the query at step 34.

An example of a data processing system which can use the search data processing
system to search the Web is shown in FIG. 1B. In FIG. 1B, the Web server 41
executes the invention and provides the users 43 access to the Web. The users
43 send their queries over the LAN 45 to the Web server 41. FIG. 3 further
illustrates a typical user's interaction with the Web when performing a search.
The Web server relays a user's query to a search engine to perform the search.

Although the invention is illustrated in terms of an Internet browser searching
pages on the World Wide Web, it is contemplated that it may be generally
applied to any information management system. This implementation of the
invention is shown in FIG. 1C, where the user 42 executes the invention and
information management system 49 is the information management system to be
searched. Alternatively, the information management system may be a distributed
information management system including both of the information management
systems 49 and 49'. In applying the searching techniques described below, it
may be desirable to substitute information management system records for the
documents and web pages described below and to substitute record tokens or some
other identifying field from an information management system record for the
URL of the web page.

FIG. 2C provides the details of the results organizer from step 28 in FIG. 1A.
The results organizer processes the documents that match the query and clusters
them according to common themes. Clustering may be accomplished, for example,
by removing all the common stop-words from the documents and then hashing
phrases of different lengths (referred to hereafter as clean phrases), such as
phrases consisting of single words, pairs of consecutive words and long
sequences of words, to determine which phrases occur in multiple documents that
were returned by the search operation.

The hashing function takes all the text fields contained in the documents,
deletes all the common stop-words, and then hashes all the clean phrases into a
particular position in a hash table. Typically, a hash address value for a
particular item is generated by applying an algorithm (the hashing function)
directly to the item. The hashing function generates different hash table
addresses for different items while generating the same hash table address for
identical items. While the exemplary embodiments of the invention are described
as using a hashing function to cluster the query
results, it is contemplated that other methods of clustering may be used
instead of the hashing function. One such alternate method might be to form a
concordance. Clean phrases in each document may be alphabetically sorted as
they are received to form a list of all of the words in the combined documents.
Each item in the list may include the clean phrase, a list of the documents in
which the clean phrase occurs and the offset in each document at which the
clean phrase occurs. This concordance may be used to cluster clean phrases in
the documents based on the occurrence of single words or on the near
occurrences of groups of words in the documents. Another alternate method might
be to form a vector for each document in the multidimensional space defined by
all the clean phrases in the documents. Each dimension of this space can
correspond to a single clean phrase in the document collection, and the
corresponding position in a document's vector is set to 1 if the document
contains the clean phrase and to 0 otherwise. Any number of geometric
clustering algorithms can then be used to cluster the vectors into a small
number of clusters so as to minimize a geometric measure of the cluster, such
as the volume of the cluster or the cluster's diameter.

As illustrated in FIG. 2C, after the documents have been hashed in step 63, the
hash tables are analyzed to identify the clusters as shown in step 65. The
results of the clustering are then displayed in step 67 and shown by example in
FIG. 4. FIG. 4 is an example of a graphical display of a search query according
to a first embodiment of the present invention.

In the exemplary embodiment, the clustering algorithm is implemented in the
language Perl, which includes a non-collision hashing function. An exemplary
embodiment hashes each clean phrase from the document title, URL, and summary
to an entry in the hash table (also known as a hash bucket) using the hashing
function in Perl. The exemplary hash table entry includes counts of the number
of documents that contain the hashed clean phrase. At the end of the hashing
process, each entry in the table may or may not represent a cluster. The
entries are analyzed to determine the best clusters by weighing both the number
of documents that contain the common clean phrase and the length of the clean
phrase. The best clusters are output to the user.

FIG. 4 is an illustration of a clustered display for a query. For example, the
query produced over 400 matching documents. The system discovers a small number
of interesting patterns by using pattern matching and clustering algorithms.
The results organizer produced clusters 410, 420, 430 and 440 for this sample
set of documents. The partition of the documents into only 4 clusters is not
intended to limit the scope of the invention; rather, it is shown for
simplicity and illustrative purposes only. For each cluster, the system
displays the number of documents that are in the cluster, the common clean
phrase, and a representative document from the cluster. For example, cluster
410 contains 23 documents whose common theme is the phrase
"www.quickaid.com/airports". For a URL, any characters found between
consecutive slashes are interpreted as a word in the text. For example, a URL
http://www.quickaid.com/airports/newark/ewr0444/dayd.html would cause the
following "words" to be hashed: www.quickaid.com, airports, newark, ewr0444 and
dayd. In addition to single "words", the following two-word phrases and long
phrases would also be hashed: www.quickaid.com/airports, airports/newark,
newark/ewr0444, ewr0444/dayd and www.quickaid.com/airports/newark/ewr0444/dayd.

The user may choose to view any of the discovered clusters; the system then
displays the documents that appear in the selected cluster. For example, as
shown in FIG. 4, if the user were to choose cluster 410, the system would
display the 23 documents that contain "www.quickaid.com/airports".

FIG. 5 shows an example of a graphical display of a search query for a second
exemplary embodiment of the invention. The clustering lenses interface consists
of a display of the title, URL, content and age lenses and a Combination
window. For each lens, the corresponding part of each matching document is
analyzed. As a result, a small number of interesting patterns are discovered
and presented to the user by using pattern matching and clustering algorithms.
Users also have the option of specifying their own patterns. More specifically,
each tool takes one field at a time and partitions all the documents returned
by the search engine according to a pattern found in that field. The documents
may be partitioned into 1 to 5 clusters or more. Since the pattern analysis is
performed on each field separately, it corresponds to viewing the documents
through a lens that only displays the field of interest and hides the other
fields.

FIG. 5 shows an illustration of a display for a query about New Jersey
restaurants. For example, this query produces 100 matching documents. Title
lens 500 partitions the documents found into 3 clusters corresponding to cells
502, 504 and 506. Title lens 500 considers similarities in the titles of the
matching documents. The partitioning is done by searching for similarity in
both format and words. For example, a format similarity is documents with "No
Title" or documents whose title begins with "Re:". A word similarity refers to
any common subsequence of words in the title. The strongest word similarity is
identical titles; a weaker word similarity is an identical phrase within titles
or identical words separated by other words, e.g. "Jane K. Doe" and "Jane
Katherine Doe".

Title lens 500 finds that 40 titles contain the phrase "NJWeb: Dining in New
Jersey" corresponding to a cluster in cell 502. In cell 504, title lens 500
finds 20 titles that start with the word "Yahoo!". In cell 506, title lens 500
finds that the remaining 40 titles do not have any interesting patterns. In
this exemplary embodiment of the invention, the width of each cell in the
display is proportional to the number of documents the cell represents. The
partition of the documents into only 3 clusters is not intended to limit the
scope of the invention; rather, it is shown for simplicity and illustrative
purposes only.

Also shown in FIG. 5 is URL lens 510, which partitions the 100 documents found
into four clusters corresponding to cells 512, 514, 516 and 518. URL lens 510
considers similarities in the matching documents' Web addresses. For example,
if there are many files with "pub/biblio" as part of the pathname, they may
form a cluster. In general, any nontrivial contiguous part of the file path is
mined for patterns. URL lens 510 finds 40 URLs that contain the term
"www.njweb.com/dining" corresponding to cell 512. In cell 514, URL lens 510
finds 20 URLs that contain the term "yahoo.com". In cell 516, URL lens 510
finds 20 URLs that contain the term "metrocast.com". In cell 518, URL lens 510
finds 20 URLs that have no patterns. Furthermore, the 40 URLs having
"www.njweb.com/dining" as a substring are exactly those with titles "NJWeb:
Dining in New Jersey". Such a fact is indicated by the edges 550 joining cells
502 and 512. Edges 552 indicate that the documents clustered in cell 504 are
exactly those documents clustered in cell 514.

Further, FIG. 5 shows content lens 520 with the 100 documents found partitioned
into 4 clusters corresponding to cells 522, 524, 526 and 528. Content lens 520
considers
similarity in the short excerpts of the matching documents.
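
As an illustration only, the clean-phrase hashing that the description applies
to URLs (the www.quickaid.com example) might be sketched in Python, with a dict
standing in for the Perl hash table. The function names, the min_docs
threshold, the scoring rule (document count multiplied by phrase length in
words) and the way the file extension is stripped are assumptions made for this
sketch; the text above does not specify them. URLs are assumed to include a
scheme such as http://.

```python
from collections import defaultdict
from urllib.parse import urlsplit
import posixpath


def url_clean_phrases(url):
    """Return the single 'words', consecutive word pairs, and the full
    slash-separated sequence for one URL, as in the quickaid example."""
    parts = urlsplit(url)
    path_segments = [s for s in parts.path.split("/") if s]
    if path_segments:
        # Assumption: drop the file extension ("dayd.html" -> "dayd"),
        # inferred from the example in the text.
        path_segments[-1] = posixpath.splitext(path_segments[-1])[0]
    segments = [parts.netloc] + path_segments
    phrases = set(segments)                                   # single words
    phrases.update("/".join(p) for p in zip(segments, segments[1:]))  # pairs
    phrases.add("/".join(segments))                           # long phrase
    return phrases


def cluster_by_phrase(urls, min_docs=2):
    """Hash every clean phrase to the set of documents containing it and
    rank the entries shared by at least min_docs documents."""
    table = defaultdict(set)  # clean phrase -> indices of matching documents
    for doc_id, url in enumerate(urls):
        for phrase in url_clean_phrases(url):
            table[phrase].add(doc_id)
    candidates = [(p, docs) for p, docs in table.items() if len(docs) >= min_docs]
    # Hypothetical weighting: number of documents times words in the phrase.
    candidates.sort(
        key=lambda item: (len(item[1]) * (item[0].count("/") + 1), item[0]),
        reverse=True,
    )
    return candidates


if __name__ == "__main__":
    sample = [
        "http://www.quickaid.com/airports/newark/ewr0444/dayd.html",
        "http://www.quickaid.com/airports/jfk/index.html",
        "http://www.example.org/about.html",
    ]
    for phrase, docs in cluster_by_phrase(sample):
        print(len(docs), phrase)
```

Each dict entry plays the role of a hash bucket holding the documents that
share a clean phrase; entries with a single document are dropped as not
representing a cluster, and longer shared phrases are ranked ahead of shorter
ones with the same document count.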