Selene Communication Technologies LLC v. Google Inc.
Filing
1
COMPLAINT filed with Jury Demand against Google Inc. - Magistrate Consent Notice to Pltf. ( Filing fee $ 400, receipt number 0311-1503856.) - filed by Selene Communication Technologies LLC. (Attachments: # 1 Exhibit A, part 1, # 2 Exhibit A, part 2, # 3 Exhibit A, part 3, # 4 Exhibit A, part 4, # 5 Civil Cover Sheet)(cla, )
US 6,363,377 Bl
7
8
Content lens 520 first eliminates stop words, such as "a",
search engine returns a large set of URLs, one may want to
"an", "the", "to", etc., and then tries to partition the docurestrict it to pages that were visited only last week, or to
ments by common sentences, phrases or words. As an
pages that have been bookmarked, or to a smaller set of
example, content lens 520 finds 40 documents that contain
URLs that are relevant, based upon some specific criteria not
the term "Home allendale bayonne belleville bergenfield 5 previously captured in the search engine index. Sometimes
bloomfield butler" corresponding to a cluster in cell 522. In
it is desirable to filter the output to exclude pages to which
cell 524, content lens 520 finds 20 documents that contain
a user should not have access. Referring to FIG. 2B, there is
the term "top business and economy companies restaurants
shown the details of the dynamic filter processor in step 24
organizations". In cell 526, content lens 520 finds 20 docuof FIG. lA. The dynamic filtering, based on a dynamic set
ments that contain the term "cape may county". Contact lens
10 of URLs, is used to restrict the results of a search query. In
520 finds 20 documents that have no patterns in cell 528.
the exemplary embodiment of the invention, the dynamic set
Edges 554 indicate that documents clustered in cell 512 are
of URLs can be determined explicitly by the user from a user
exactly the same documents corresponding to cell 522.
profile as shown in step 57 of FIG. 2B, or, in general, from
Edges 556 indicate that the documents found in cell 514 are
information stored in other information management sysexactly the same as the documents clustered in cell 524.
FIG. 5 also shows Age legs 530. Age lens 530 clusters on 15 terns. Once the profile is accessed, then in step 59 the URL's
may be filtered and the results are displayed in step 61.
the documents' date of last update. Age lens 530 partitions
FIG. 7 is a Venn diagram which illustrates a dynamic
the 500 documents found into 4 clusters corresponding to
filtering operation. In FIG. 7, area 710 defines Sl as the set
cells 532, 534, 536, and 538. Cell 532 shows 40 documents
of URLs of pages and their summaries, returned by the
which were updated on Mar. 4, 1997. Cell 536 finds 30
documents updated on 1997. Cell 536 finds 20 documents 20 search engine, in response to a query for text in its own
updated on 1996. Cell 538 finds 10 documents updated on
information management system. Let n be the number of
1995.
URLs in Sl. There are also n summaries corresponding to
FIG. 5 shows the window of clusters results from which
each URL of Sl. Area 712 defines S2 as the set of URLs that
the user may select any number of cells from Lenses 500,
is dynamically generated by the user or by a query external
510, 520 and 530. The system, then, displays the documents 25 to the search engine. Let m be the number of URLs in S2.
that satisfy the conditions in all the selected cells. For
There are no summaries associated with these URLs. Sl and
example, as shown in FIG. 6 which is a graphical display of
S2 are likely to have URLs in common. The filter returns the
a selected search result of FIG. 5. In FIG. 5, the user chooses
URLs in the intersection of Sl and S2 and the respective
cell 526 corresponding to "Cape May County", cell 534
summaries corresponding to area 714.
corresponding to the year 1997 and cell 536 corresponding 30
The dynamic filtering of the present invention may
to year 1996. As a result, the system displays 15 documents
improve upon the poor performance of other filtering techthat contain "Cape May County" and were last updated in
niques which typically involve multiple disk fetch and store
1996 or 1997. The documents found are clustered and
operations and several sorting steps. The performance of a
fetch and store filtering technique of this type would be
displayed in cell 600. Cell 602 indicates that 10 of the
documents found were updated in 1997. Cell 604 indicates 35 O((m+n)log(m+n)). In contrast, the dynamic filtering of the
that 5 out of the 15 documents found were last updated in
present invention uses hashing and associated arrays in an
1995.
intelligent fashion and has a performance of O(m+n).
First, for set S2, an associative array is setup as shown in
In another embodiment of the invention, the set of keyTable 1.
words from the search query are used to rank the documents
returned by the search engine. The more keywords that 40
Table 1
appear in a document, the higher the document is ranked.
The results organizer outputs results from the highest ranked
Flag[url_l]=l;
to the lowest ranked.
Flag[url_2]=1; ...
The benefit of clusters to the user is that the clusters may
contain all items of interest or duplicate items. In the first 45
Flag[url_m ]=1;
In table 1, "url_n" represents a hash address generated from
case, only items in the cluster need be reviewed by the user.
While in the second case, only one item from the cluster
a particular URL. This process takes m steps. The Flag array
needs to be reviewed by the user prior to rejecting all the
indicates the URLs to be included from Sl.
other documents in the cluster. The "Vote" column in the
As the search engine starts returning URLs of set Sl, a
display allows the user to indicate the relevancy of a cluster 50 hash index into the associative array is generated from the
to his informational needs. If the user votes positively on a
URL and a check is performed to see if the corresponding
cluster, the system can use the documents in the desired
Flag is set. Only the URLs with the corresponding Flag set
and their summaries are provided as the output URL's of the
cluster (referred to hereafter as good documents) to recluster
dynamic filter. Everything else is ignored. After the URL is
the remaining documents, giving a higher weight to documents that are similar to the good documents. If the user 55 provided as an output URL, its Flag is reset to 0, to ensure
votes negatively on a cluster, the system can use the docuthat the same URL is not presented again. This is performed
ments in the undesired cluster (referred to hereafter as bad
sequentially and hence takes n steps. The entire algorithm is
therefore O(m+n) since lookup using a hash table in an
documents) to recluster the remaining documents, giving a
lower weight to documents that are similar to the bad
associative array is 0(1). The result of this process is the
documents.
60 intersection of Sl and S2, with the associated summaries
All clusters which receive a yes vote are saved along with
from Sl. Duplicate URLs, if returned by the search engine
the query in a search context folder. A user has the ability to
are eliminated.
find a query and its results by either browsing the search
FIG. 8 shows the architecture of an exemplary implemencontext folders or doing a keyword based search for them
tation according to the third embodiment of the present
among all the search context folders.
65 invention. The implementation uses commercially available
tools such as a Harvest search engine, an Informix database
Often times, it is desirable to filter the output of search
and a dynamic filter implemented using JavaScript and Perl.
engines, to prevent information from being displayed. If the
US 6,363,377 Bl
9
10
As an example, a particular implementation is the one of a
of the query is precisely the process that a user currently
performs, with no help from the search engine, in order to
student wanting to search pages of an Internet course that he
refine the user's search. A query tuner according to the
is taking, but limiting it to the pages he visited last week.
present invention offers the user a helpful guide in the user's
The Student is presented with a query page 810. Query
page 810 is a form where the student user inserts the text to 5 quest by taking the user's query and generating a small
number of additional queries according to a query hierarchy.
search and specifies to limit the search only to pages he
The user's query and the additional queries are forwarded to
visited last week. When the form is submitted, the query is
the search engine. The query tuner, then, evaluates the
split in two other queries 812 and 814. Query 814 goes to the
results of all the queries and suggests possible query reforInformix database 816 which has tracked the student's
10 mulations to the user, together with the expected number of
navigation in the course. The Informix database 816 genermatching documents each reformulation would yield.
ates a dynamic set 822 of URLs (S2) which represents the
Since each search engine has its own query language, the
pages he visited last week. Dynamic set 822 of URLs S2 is
query tuner, as shown in FIG. 2A, is defined for an abstract
then passed to dynamic filter 824, which sets up an assoquery language that can easily be mapped to any particular
ciative array of flags as explained above, with reference to
15 engine's language. In most search engines, there are two
FIG. 7.
types of data that the user can input: content that is to be
Query 812 goes to the Harvest search engine 816 which
matched to the text of documents, and structure, or metais configured to index all the pages of the course. Upon
data, related to each document that represents conditions to
receiving query 812, the Harvest search engine 816 starts to
be satisfied in order for a document to be considered a
return pages from its index. The pages returned are all of the
pages for the course no matter when the student visited the 20 match. For example, in the AltaVista query shown in Table
2.
course pages, since student access information is not stored
in the search index. The Harvest search engine 816 also
Table 2
returns pages that the student has not yet visited. The output
title: "cryptographic protocols"
set 818 from Harvest Sl, is processed against the Flag array
English language
by dynamic filter 824 and the intersection is returned. The 25
dated after Jan. 1, 1997
algorithm also filters out duplicates.
Dynamic filtering according to the present invention, can
For this example, content is "cryptographic protocols" and
be implemented anywhere one desires to restrict/filter one
meta-data is all the other information, consisting of the
set of URLs by another set of URLs. This may be desirable,
requirements that the content appear in the title of the
for example, for security reasons, when a company wants to 30 document, that the document's language be English, and that
restrict access to its resources based on the employee's
the document be dated after Jan. 1, 1997. The query requests
identity. Different employees may have associated with
all pages that have the keyword "cryptographic protocols" in
different lists of URLs that they have permission to read.
the title, that are written in the English, and that are dated
When a search is performed, the company can insure that
after Jan. 1, 1997.
only those URLs that are not restricted for the particular user 35
In addition, to the types of data, there is an implicit or
are presented. This addresses a common problem that search
explicit Boolean operation to be performed on the different
engines have: returning summaries of pages to which a user
parts of the query. In the above query, there is an implicit
"AND" operator among all of the query parts. In other
does not have access. Usually, search engines return sumwords, implicitly the query specifies that the phrase "crypmaries of protected pages, even if the user is restricted
access to those pages via web server password protection or 40 tographic protocols" appear in the content of the document
AND that this match be in the title AND that the document's
other mechanisms. Adult related material can also be filtered
language be English AND that the document be dated after
the same way. Internet service providers could use dynamic
Jan. 1, 1997. A query may be formally defined to be a
filtering to prevent children from searching adult web sites
Boolean expression Q=(q;) op (q), where op is a Boolean
and newsgroups. Personalizing search engines in an educational environment can be taken to higher levels. A student 45 operation from the set {AND, OR AND NOT, BEFOREw,
can associate various topics with the visited pages and then,
NEARw}, and each of q;, qj, is either a Boolean expression,
search the database based on URL's that are associated with
or of the form (m:k), where m is a (possibly empty)
meta-data quantifier and k is a (possibly empty) keyword.
a particular topic. A teacher may also mark certain pages as
The Boolean operators AND, OR and NOT have the stanbeing essential for a final exam, or as potential topics for
independent study. Then searching could be restricted to this 50 dard meanings; x NEARw y is TRUE if an only if x appears
within w words of y; x BEFOREw, y is TRUE if x appears
set of pages only.
For each component of the user's query, there are a few
at most w words before y.
A document satisfies or matches a query if it satisfies the
natural ways in which the query may be either restricted
Boolean expression Q, where satisfying (m:k) means that
further or relaxed further to potentially produce either fewer
results or a greater number of results. This concept is best 55 the keyword k satisfies (or appears in) meta data m. A
illustrated by an example. Suppose the user's query specifies
keyword k can be a single word, a phrase, or a word with
some wildcard characters. Meta-data can be any structural
that the phrase "cryptographic protocols" appear as a headinformation such as title, heading, URL, domain, filename,
ing in the document. A more restrictive, or stricter, query
would require that "cryptographic protocols" be part of the
file extension, date; it can also be a specification to the
title, which would typically yield fewer results in compari- 60 quality of the match required. For example, an approximate
son to the user's query. A less restrictive, broader query
match meta-data operator (approxa y) evaluates to TRUE for
would specify that "cryptographic protocols" appear anya phrase x if and only if x can be transformed into y (or vice
where in the text, which would typically yield a greater
versa) with at most d single-letter deletes, insertions or
number of results in comparison to the user's query. Another
substitutions; a synonyms operator (syn y) evaluates to
way to relax the user's query is to require that the words 65 TRUE for a phrase x if and only if x is a synonym of y.
"cryptographic" and "protocols" appear near each other in a
A hierarchy forest may be defined on all the meta-data of
heading, rather than as a phrase. Relaxation and restriction
a query, where each tree contains meta-data that has certain
US 6,363,377 Bl
11
12
relationship to other meta data. As shown in FIG. 9(a), for
time to receive the results of just the user's query. Referring
example, the structural meta-data of an H1ML document
to FIG. 2A step 55, the next step performed by the query
forms a natural hierarchical list from the tags that specify the
tuner is to formulate the related queries. This process as well
most prominent information in the document, i.e. title, to
as how the results of the query aid the user is described
tags that specify the least prominent information in the 5 below.
document, i.e. text. In FIG. 9(a), requiring the keywords to
The formulation of related queries according the query
hierarchies is illustrated based on a sample query
appear in the title 900 is more restrictive than requiring the
keywords to appear in the title or level 1 heading 902. Cell
Q=(( title;cryptographic) BEFO RE 1 ( title;protocols)) AND
902, in turn is more restrictive than requiring the keywords
((English language) AND (dated after Jan. 1, 1997)). The
to appear in the title or level 1 or level 2 heading 904. An 10 term item is used to refer to any atomic part of the query: a
even broader search query can be done with respect to cell
meta-datum, a keyword or a Boolean operator. For example,
Q contains the following set of items {title, cryptographic,
904, which adds the level 2 headings to the query in cell 902.
The broadest most general query can be done with respect to
BEFORE1 , title, protocols, AND, English language, AND,
cell 906, which allows any text to be searched.
dated after Jan. 1, 1997}. For each query item t, define h(t)
In general, each hierarchy tree is ordered top to bottom 15 to be the node in the hierarchy forest corresponding to the
from meta-data values that most restrict the query to values
item t. Related queries consists of a set of queries, each of
that least restrict the query. For example, numerous hierarwhich takes the original user query and modifies some items
chies are appropriate for the date field, depending on the
in it by either restricting or broadening them according to the
hierarchy forest. The act of broadening (restricting) a query
desired granularity. In the context of recent news stories, a
daily granularity is appropriate as shown in FIG. 9(b). Cell 20 item t corresponds to using a descendant (an ancestor) ofh(t)
in place oft within Q.
910 restricts the search to documents dated after Jan. 2, 1997
while cell 912 restricts the search to documents dated after
For example, one set of related queries for our sample
query Q is shown in Table 3
Jan. 1, 1997. A search with a cell 912 restriction is broader
and encompasses the search with a cell 910 restriction. An
Table 3
even broader search can be done with a cell 914 restriction 25
((title:cryptographic) BEFORE 1 (title: protocols))
which includes documents dated after Dec. 31, 1996. On the
other hand, in the context of general web pages, a yearly or
(( or title: cryptographic) BEFORE 1 ( or title:
biannual granularity is more relevant as shown in FIG. 9(c).
protocols)) AND((English language) AND (dated after
Cell 920 restricts the search to documents dated after Jul. 1,
Jan. 1, 1997))
1997 while cell 927 restricts the search to documents dated 30 ((title:cryptographic) BEFORE 2 (title: protocols)) AND
after Jan. 1, 1997. A broader search can be done with respect
((English language) AND (dated after Jan. 1, 1997))
((title:cryptographic) NEAR 1 (title: protocols) AND
to documents dated after Jul. 1, 1996 in cell 924.
As with the meta-data, there is a hierarchy for the key((English language) AND (dated after Jan. 1, 1997))
words. For example, as shown in FIG. lO(a), the top of the
((title:cryptographic) BEFORE 1 (title: protocols)) AND
hierarchy is represented by cell 1010 and "keyword" corre- 35
(dated after Jan. 1, 1997))
sponds to the most restrictive search query. Second on the
((title:cryptographic) BEFORE 1 (title: protocols)) AND
hierarchy is cell 1012 corresponding to a broader search that
(English language))
can be done with the "all the English stemmings of key((title:cryptographic) BEFORE 1 (title: protocols)) AND
word". Cell 1014 is at the bottom of the hierarchy and
(English language) AND (dated after Jan. 1, 1997)
corresponds to the broadest search query related to "key- 40 Where represents the main index level or the highest
level heading in the HTML of the page.
word or any of its synonyms".
Finally, a hierarchy on the Boolean operators that form the
The exemplary tree shown in FIG. lO(b) indicates that the
query Q is defined as follows. For a single-word keyword
search can be contracted or restricted by moving up the tree.
with or without wildcards, the hierarchy is shown in FIG.
In addition, it indicates that the search can be expanded by
lO(a). When the keyword is a phrase, it is converted into a 45 moving down the tree. For example, a search limited to x
Boolean expression to which the Boolean hierarchy applies.
AND y according to cell 1040 can be restricted by moving
up the tree and searching according to cell 1050 where the
More specifically, ifk=w1 , w 2 . . . w,, and W; is the i-th word
in the keyword phrase, m:k becomes (m:w1 ) BEFORE 1
search is restricted to x NEARn y. In contrast, the search can
be expanded by moving down the tree and searching for only
(m:w2 ) BEFORE 1 . . . BEFORE 1 (m:w,). Although not
shown, the bottom-most node in each hierarchy is the NULL 50 x according to cell 1030 or searching for only y according
expression.
to cell 1035. The search can be further expanded by searchThese query hierarchies may be used to help the user
ing for x OR y according to cell 1020.
The generation of a set of related queries may be
refine a given query more effectively. In the Internet's
accomplished, for example, by holding all but one items of
current state, the slowest operation for a user performing a
search is the network delays in communicating with the 55 Q constant, while broadening or restricting the chosen item
search engine. In a typical search session, the user formut of Q. The broadening (restricting) may be accomplished by
lates a query, sends it to the search engine, waits some time,
traversing any number of edges up (down) the hierarchy tree
receives an answer, then reformulates the query and repeats
from h(t). Since different edges in the hierarchy forest have
the process. Some of the user's frustration comes from
different restrictive/broadening effects on the query, it is
having to pay for the network delay during each query 60 more efficient to traverse different number of edges in the
tree for different items in the query. In the exemplary
reformulation. The present invention cuts down the number
of reformulation iterations used to find the relevant inforembodiment of the invention, the number of edges traversed
is a fraction of the height of the hierarchy tree. Formally, an
mation. When the user poses a query, the browser generates
a number of related queries and sends all the queries to the
f-family is defined as a set of queries, each of which takes
search engine in parallel. The time to receive the complete 65 the original user query and modifies a set {t1 t2 . . . ts} of
results for the users query and just the number of matches for
items in the query, where each modified item tj is replaced
each of the related queries is asymptotically the same as the
by a node that is exactly min{l, f*H} edges away from h(t)
Disclaimer: Justia Dockets & Filings provides public litigation records from the federal appellate and district courts. These filings and docket sheets should not be considered findings of fact or liability, nor do they necessarily reflect the view of Justia.
Why Is My Information Online?