PA Advisors, LLC v. Google Inc. et al
Filing
255
REPLY to #248 Claim Construction Brief filed by Google Inc.. (Attachments: #1 Exhibit A, #2 Exhibit B, #3 Exhibit C, #4 Exhibit D, #5 Exhibit E (part 1), #6 Exhibit E (part 2), #7 Exhibit E (part 3), #8 Exhibit E (part 4), #9 Exhibit F, #10 Exhibit G, #11 Exhibit H, #12 Exhibit I, #13 Exhibit J, #14 Exhibit K, #15 Exhibit L)(Cannon, Brian)
EXHIBIT E
Part 1 of 4
U.S. UTILITY PATENT APPLICATION
[File jacket form with fields for patent date, sector, class, subclass, art unit, and examiner; the handwritten entries are not legible in the scan]
FILED WITH: DISK (CRF), FICHE (Attached in pocket on right inside flap)
PREPARED AND APPROVED FOR ISSUE
The information disclosed herein may be restricted. Unauthorized disclosure may be prohibited by the United States Code Title 35, Sections 122, 181 and 368. Possession outside the U.S. Patent & Trademark Office is restricted to authorized employees and contractors only.
(FACE)
Bib Data Sheet for Local Print
Printed 10/21/2000
Serial No. 09/422,286
Provisional Application 60/116,582, filed 01/20/1999
CONTENTS
[Index of file wrapper papers, items 1 through 81, with columns headed "Date Received (Incl. C. of M.) or Date Mailed"; the only surviving entry reads "ro-3-00"]
(FRONT)
US006199067B1
(12) United States Patent
Geller
(10) Patent No.: US 6,199,067 B1
(45) Date of Patent: Mar. 6, 2001
Title: SYSTEM AND METHOD FOR GENERATING PERSONALIZED USER PROFILES AND FOR UTILIZING THE GENERATED USER PROFILES TO PERFORM ADAPTIVE INTERNET SEARCHES
Inventor: Ilya Geller, Brooklyn, NY (US)
Assignee: Mightiest Logicon Unisearch, Inc., Brooklyn, NY (US)
Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
Appl. No.: 09/422,286
Filed: Oct. 21, 1999
Related U.S. Application Data: Provisional application No. 60/116,582, filed on Jan. 20, 1999.
Int. Cl.7: G06F 17/30
U.S. Cl.: 707/10; 707/3; 707/5; 707/100; 704/4; 704/9; 704/247; 700/17
Field of Search: 707/1-4, 100, 707/10, 200; 704/1, 4, 9, 270.1, 247, 250; 382/181, 190, 209; 700/11, 17, 56, 83, 86; 369/13
References Cited
U.S. PATENT DOCUMENTS
4,914,590 * 4/1990 Loatman et al. 704/8
5,483,650 * 1/1996 Pedersen et al. 707/2
5,696,962 * 12/1997 Kupiec 707/4
(List continued on next page.)
OTHER PUBLICATIONS
Helm, Richard et al., "Integrating Information Retrieval and Domain Specific Approaches for Browsing and Retrieval in Object-Oriented Class Libraries," Conference proceedings on Object-Oriented programming systems, languages, and applications, May 1991.*
Maass, Henning, "Location-aware mobile applications based on dictionary services," Mobile Network and Applications, vol. 3, Issue 2, Aug. 1998, pp. 157-173.*
Xu, Yaowu et al., "Hierarchical Content Description and Object Formation by Learning," Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL '99), Jun. 22, 1999, pp. 84-88.*
Primary Examiner: Hosain T. Alam
Assistant Examiner: Shahid Alam
(74) Attorney, Agent, or Firm: Edward Etkin, Esq.
(57) ABSTRACT
A system and method for automatically generating personalized user profiles and for utilizing the generated profiles to perform adaptive Internet or computer data searches is provided. In accordance with the present invention, particular linguistic patterns and their frequency of recurrence are extracted from personal texts provided by the users of the system of the present invention and stored in a user profile data file such that the user profile data file is representative of the user's overall linguistic patterns and the frequencies of recurrence thereof. All documents in a remote computer system, such as the Internet, are likewise analyzed and their linguistic patterns and pattern frequencies are also extracted and stored in corresponding document profiles. When a search for particular data is initiated by the user, linguistic patterns are also extracted from a search string provided by the user into a search profile. The user profile is then cross matched with the search profile and the document profiles to determine whether any linguistic patterns match in all three profiles and to determine the magnitude of the match based on summation of respective frequencies of recurrence of the matching patterns. The documents with document profiles having the highest matching magnitudes are presented to the user as not only matching the subject of the search string, but also as corresponding to the user's cultural, educational, and social backgrounds as well as the user's psychological profile.
62 Claims, 8 Drawing Sheets
US 6,199,067 B1
Page 2
U.S. PATENT DOCUMENTS
5,822,748 * 10/1998 Cohen et al. 707/2
5,848,408 * 12/1998 Jakobsson et al. 707/3
5,943,648 * 8/1999 Tel 704/260
5,974,408 * 10/1999 Cohen et al. 707/2
6,016,487 * 1/2000 Rioux et al. 707/2
6,018,734 * 1/2000 Zhang et al. 707/3
6,081,750 * 6/2000 Hoffberg et al. 700/17
* cited by examiner
U.S. Patent, Mar. 6, 2001, Sheets 1 to 8 of 8, US 6,199,067 B1
[Drawing sheets 1 to 8; figure content is not recoverable from the scan]
SYSTEM AND METHOD FOR GENERATING PERSONALIZED USER PROFILES AND FOR UTILIZING THE GENERATED USER PROFILES TO PERFORM ADAPTIVE INTERNET SEARCHES
RELATED APPLICATIONS
This application claims priority from U.S. Provisional Patent Application Ser. No. 60/116,582, entitled "Internet Search Vehicles" which was filed on Jan. 20, 1999.

FIELD OF THE INVENTION
The present invention relates generally to computer data searches and more particularly to a system and method for automatically generating personalized user profiles and for utilizing the generated profiles to perform adaptive Internet or computer data searches.

BACKGROUND OF THE INVENTION
In recent years, computers have taken the world by storm. Today, most businesses rely entirely on computers to conduct daily operations. In the academic world, computers have become essential for learning, teaching, and research. In homes, computers are used to perform daily tasks ranging from paying bills to playing games. One unifying requirement for all computer applications is the ability of a user to utilize a computer to locate particular information or data desired by the user.

During the past few years, the quantity and diversity of information and services available over the public (e.g. Internet) and private (e.g. Intranet) local and wide area networks has grown substantially. In particular, the variety of information accessible through Internet-based services is growing rapidly both in terms of scope and depth.

In simple terms, the Internet is a massive collection of individual computer networks operated by government, industry, academia, and private parties that are linked together to exchange information. While originally the Internet was used mostly by scientists, the advent of the World Wide Web has brought the Internet into mainstream use. The World Wide Web (hereafter "WWW") is an international, virtual-network-based information service composed of Internet host computers that provide on-line information in a specific hypertext format. WWW host servers provide hypertext metalanguage (HTML) formatted documents using a hypertext transfer protocol (HTTP).

Information on the WWW is accessed with a hypertext browser, such as the Netscape navigator or Microsoft Explorer. Web sites are collections of interconnected WWW documents. Typically, users communicate with the Internet through a communication gateway that may be implemented and controlled by an Internet service provider (i.e. an ISP), a company that offers a user access to the Internet and the WWW through a software application that controls communication between the user's computer and the communication gateway. The role of the ISP may also be taken directly by a particular organization that allows Internet access to its employees or members. The user can access and navigate the WWW using a hypertext browser application residing on, and executed by, the user's computer.

No hierarchy exists in the WWW, and the same information may be found by many different approaches. Hypertext links in WWW HTML documents allow readers to move from one place in a document to another (or even between documents) as they want to. One of the advantages of the WWW is that there is no predetermined order that must be followed in navigating through various WWW documents. Readers can explore new sources of information by following links from place to place. Following links has been made as easy as clicking a mouse button on the link related to the subject a user wants to access. Each WWW document also has a unique uniform resource locator ("URL") that serves as an "address" that, when followed, leads the user to the document or file location on the WWW. Using the browser, the user can also mark and store "favorites", URLs of particular WWW documents that interest the user, such that the user can quickly and easily return to these documents in the future by selecting them from the favorites list in the browser.

Because of the vastness of the Internet and the WWW, locating specific information desired by the user can be very difficult. To facilitate search for information a number of "search engines" have been developed and implemented. A search engine is a software application that searches the Internet for web sites containing information on the subject in which the user is interested. These searches are accomplished in a variety of ways, all well-known in the art. Typically, a user first inputs a "search string" to the hypertext browser containing key words representative of the information desired by the user. The search engine then applies the search string to a previously constructed index of a multitude of web sites to locate a certain number of web sites having content that matches the user's search string. The located web site URLs are then presented to the user in the order of relevance to the key words in the user's search string. For example, a user providing the key word (PLANT) would obtain an exhaustive list of all registered sites that refer to plants. This list, however, would be so large that the user would want to limit this search. Depending on the search engine used, the user could limit the search by entering a combination of key words such as the following: PLANT AND FLOWER AND GARDEN. This would limit the search to only Internet sites that contain all three words. In addition, users could further limit the search by entering PLANT AND FLOWER AND GARDEN NOT TREE NOT ORCHID. The results from this search would be further limited to exclude sites in which trees and orchids are listed as keywords.

A number of approaches have been developed to improve the performance and accuracy of typical key word searches. For example, U.S. Pat. No. 5,845,278, issued to Kirsch, et al., teaches approaches to establishing a quantitative basis for selecting client database sets (i.e. Internet documents or web sites) that include the use of comprehensive indexing strategies, ranking systems based on training queries, expert systems using rule-based deduction methodologies, and inference networks. These approaches were used to examine knowledge base descriptions of client document collections or databases.

However, the key word searching approaches utilized by previously known search engines suffer from a number of significant disadvantages. Most search systems are viewed as often ineffective in identifying the likely most relevant documents. Accordingly, the users are often presented with overwhelming amounts of information in response to their key words. Thus, using proper key word searching techniques becomes an art in itself, an art that is outside the capabilities of most Internet users.

Most importantly, typical key word and even more advanced searches only provide the user with search results that depend entirely on the search string entered by the user,
without any regard to the user's cultural, educational, social backgrounds or the user's psychological profiles. The results returned by the search engines are tailored only to the search string provided by the user and not to the user's background. None of the previously known search engines tailor results of user's searches based on his or her background and unexpressed interests. For example, a twelve year old child using key word searches on the Internet for some information on computers may be presented with a multitude of documents that are far above the child's reading and educational level. In another example, a physician searching the Internet for information on a particular disease may be presented with dozens of web sites that contain very generic information, while the physician's "unexpressed" interest was to find web sites about the disease that are on his educational and professional level.

It would thus be desirable to provide a system and method for extracting and using linguistic patterns of textual data to assist a user in locating requested data that, in addition to matching the user's specific request, also corresponds to the user's professional, cultural, educational, and social backgrounds as well as to the user's psychological profile and thus addresses the user's "unexpressed" requests.

SUMMARY OF THE INVENTION
This invention relates to use of linguistic patterns of documents to assist a user in locating requested data that, in addition to matching the user's specific request, also corresponds to the user's cultural, educational, professional, and social backgrounds as well as to the user's psychological profile, and thus addresses the user's "unexpressed" requests. The present invention provides a system and method for automatically generating a personalized user profile based on linguistic patterns of documents provided by the user and for utilizing the generated profile to perform adaptive Internet or computer data searches.

The system of the present invention advantageously overcomes the drawbacks of previously known data searching techniques. As was noted earlier, typical key word and even more advanced searches only provide the user with search results that depend entirely on the search string entered by the user, without any regard to the user's cultural, educational, professional, and social backgrounds or to the user's psychological profile.

All texts composed by the user, or adopted by the user as favorite or inimical (such as a favorite book or short story), contain certain recurring linguistic patterns, or combinations of various parts of speech (nouns, verbs, adjectives, etc.) in sentences that reflect the user's cultural, educational, social backgrounds and the user's psychological profile. Research has shown that most people have readily identifiable linguistic patterns in their expression and that people with similar cultural, educational, and social backgrounds will have similar linguistic patterns. Furthermore, research has shown that such factors as psychological profile, life experience, profession, socioeconomic status, educational background, etc. contribute to determining the frequency of occurrences of particular linguistic patterns within the user's written expression.

In accordance with the present invention, particular linguistic patterns and their frequencies of occurrence are extracted from the texts provided by a user of the system of the present invention and stored in a user profile data file. The user profile data file is thus representative of the user's overall linguistic patterns and their respective frequencies. All documents in a remote computer system, such as the Internet, are likewise analyzed and their linguistic patterns and frequencies thereof also extracted and stored in corresponding document profiles. When a search for particular data is initiated by the user, linguistic patterns are also extracted from a search string provided by the user into a search profile. The user profile is then cross matched with the search profile and the document profiles to determine whether any linguistic patterns match in all three profiles and to determine the magnitude of the match based on summation of relative frequencies of matching patterns in the user profile and the document profile. The documents with document profiles having the highest matching magnitudes are presented to the user as not only matching the subject of the search string, but also as corresponding to the user's cultural, educational, and social backgrounds as well as the user's psychological profile. Thus, a world renowned physicist searching for information on quasars would be presented with very sophisticated physics documents that are oriented towards his level of expertise.

It should be noted that the user's background and psychological characteristics are not evident directly from the linguistic patterns themselves or from their frequencies. Accordingly, the system of the present invention matches the user's linguistic patterns to the linguistic patterns of data requested by the user without extracting any actual information about the user's background and psychological characteristics from the user profile. Thus, the user's privacy is not impinged by the creation and retention of the user profile.

The profiling/search system includes a local computer system, connected to a remote computer network (e.g. the Internet) via a telecommunication link. The local computer system includes a control unit and related circuitry for controlling the operation of the local computer system and for executing application programs, a memory for temporarily storing control program instructions and variables during the execution of application programs by the control unit; a storage memory for long term storage of data and application programs; and input devices for accepting input from the user. The local computer system further includes: output devices for providing output data to the user and a communication device for transmitting to, and receiving data from, the remote computer system via the telecommunication link. The remote computer system includes a communication gateway connected to the telecommunication link, a remote data storage system for long term data storage, and a remote computer system control unit (hereinafter RCS control unit).

In summary, the system of the present invention operates in three separate independent stages, each stage being controlled by a particular control program executed by one of the local computer system and the remote computer system. In a first stage, a user profiling control program is executed to generate or update a user profile computer file representative of the user's linguistic patterns and the frequencies with which these patterns recur in texts submitted by the user and/or automatically acquired by the inventive system. The user is then invited to provide textual data composed by the user such as e-mail messages, memorandums, essays as well as documents composed by others that the user has adopted as "favorites", such as favorite web sites, short stories, etc. These textual documents are temporarily stored in a user data file. The inventive system also monitors the user's data searching and data browsing (e.g. Internet browsing) to automatically add additional textual information to the user data file. Once the user data file attains a sufficient size, or when other criteria for updating the user profile are met, the
system executes a profile extraction subroutine to create/update the user profile by extracting linguistic patterns from the user data file. During the profile extraction subroutine, the system retrieves individual textual documents from the user data file, and separates each document into sentences. The system then extracts a linguistic pattern, or a segment, from each sentence characterized by first identifying words in the sentence as being particular parts of speech (i.e. nouns, verbs, adjectives, etc.), and then selecting a predetermined combination of the identified parts of speech and storing this combination as a segment. In a preferred embodiment of the present invention, each segment comprises a triad of three parts of speech: noun-verb-adjective. The segment extraction process is repeated for all textual documents in the user data file. The system then groups identical segments together and determines their frequency of occurrence in the user profile. Thus, the resulting user profile contains the linguistic patterns from all texts submitted by the user (or automatically gathered by the system) and the frequencies with which those patterns recur within the texts.

In a second stage of the present invention, a data profiling control program is executed to generate data item profile computer files, representative of linguistic patterns and their respective frequencies, of all data items. The data items may include documents, web sites, and other textual data that may be subjected to a search by the user. A list of all data items and their respective data addresses (such as Internet URL addresses) is first provided to the system. The data item profile generation procedure is then performed for each data item in the list in a similar manner to the user-profiling procedure, except that data item address information is stored in each data item's profile.
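As a rough illustration of the profile extraction steps described above (separate each document into sentences, identify parts of speech, keep a noun-verb-adjective triad as a segment, then count identical segments), a minimal Python sketch follows. The toy lexicon, the rule of taking the first word found for each part of speech, the use of raw counts as "frequency", and all function names are assumptions made for illustration only; the text does not specify a tagger, a selection rule, or a file format.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {
    "dog": "noun", "cat": "noun", "telescope": "noun",
    "chases": "verb", "sees": "verb",
    "small": "adjective", "bright": "adjective", "old": "adjective",
}


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' (a deliberately crude rule)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def extract_segment(sentence, lexicon=TOY_LEXICON):
    """Return a (noun, verb, adjective) triad, or None if any part is missing.

    Assumption: the first word found for each part of speech is used.
    """
    found = {}
    for word in re.findall(r"[a-z']+", sentence.lower()):
        pos = lexicon.get(word)
        if pos is not None and pos not in found:
            found[pos] = word
    if all(p in found for p in ("noun", "verb", "adjective")):
        return (found["noun"], found["verb"], found["adjective"])
    return None


def build_profile(documents, lexicon=TOY_LEXICON):
    """Group identical segments and count how often each one recurs."""
    counts = Counter()
    for doc in documents:
        for sentence in split_sentences(doc):
            segment = extract_segment(sentence, lexicon)
            if segment is not None:
                counts[segment] += 1
    return counts


def build_data_item_profile(address, text, lexicon=TOY_LEXICON):
    """Same procedure for a data item, with its address stored alongside."""
    return {"address": address, "segments": build_profile([text], lexicon)}
```

A user profile would be built by calling `build_profile` on the texts in the user data file; `build_data_item_profile` applies the same procedure to one data item and keeps its address, mirroring the second stage.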
Thus, the resulting data item profile of each data item contain the data item address, the linguistic patterns of the data item and the frequencies with which those patterns recur therein. In a third stage of the present invention, the system executes a data searching program that enables a user to utilize the system to perform advanced searches for desired data files, such that the data files returned as search results correspond to the user's social, educational, and cultural backgrounds and to the user's psychological profile. The search program is initiated when the user provides a search string representative of data requested by the user to the system. The system then creates a search profile representative of linguistic patterns in the search string in a similar manner to the user-profiling procedure, except that frequencies of recurring segments are not recorded in the search profile. Optionally, the system expands the search profile by generating additional segments that contain synonyms of the parts of speech in the existing segments already in the search profile, and storing the additional segments therein. After the search profile is complete, the system retrieves the user profile of the user performing the search and compares the segments stored in the user profile with the segments stored in the search profile to determine a number of matches between various segments in each of the profiles and then, for each matching segment records the frequency with which the matching segment recurs within the user profile. The system then applies the original search string to a standard match engine to obtain a list of data item addresses that potentially match the user's search requirements and then retrieves the data item profiles corresponding to the data item addresses on the list. 
This procedure is optional but is recommended because a direct linguistic pattern search over all data items stored on the remote computer system can be very time consuming given the modern computing and data transfer technologies.

The system then compares, for each data item profile, the segments stored in the data item profile with the segments stored in the search profile to determine a number of matches between various segments in each of the profiles and then, for each matching segment records the frequency with which the matching segment recurs within the data item profile. A match value is then determined by the system for each segment in the data item profile that also appears in the search profile and in the user profile, by adding the frequency of the segment's occurrence in the data item profile to the frequency of the segment's occurrence in the user profile. Finally, the system computes a final value for each data item profile by adding together the match values of all matching segments in each data item. The final value is representative of the degree to which the linguistic pattern of the data item matches the linguistic pattern of the user in light of the linguistic pattern and subject matter of the search string. The data items, corresponding to data item profiles having the highest final values, are then retrieved by the system. The system then presents the user with several data items having the highest final values, starting with the data item with the highest final value.

Other objects and features of the present invention will become apparent from the following detailed description considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed solely for purposes of illustration and not as a definition of the limits of the invention, for which reference should be made to the appended claims.
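The match value and final value computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: profiles are modeled as plain dictionaries mapping a segment to its frequency, the search profile as a set of segments (it carries no frequencies), and the names `rank_data_items`, `segments`, and `address` are hypothetical.

```python
def rank_data_items(user_profile, search_profile, data_item_profiles):
    """Rank data items by final value, highest first.

    user_profile:       dict mapping segment -> frequency in the user's texts
    search_profile:     set of segments extracted from the search string
    data_item_profiles: list of dicts with keys "address" and "segments",
                        where "segments" maps segment -> frequency
    """
    ranked = []
    for item in data_item_profiles:
        final_value = 0
        for segment, item_freq in item["segments"].items():
            # A segment counts only if it appears in all three profiles.
            if segment in search_profile and segment in user_profile:
                # Match value: data item frequency plus user frequency.
                final_value += item_freq + user_profile[segment]
        ranked.append((item["address"], final_value))
    # Highest final value first; ties keep their original order.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical example data: segments are abbreviated to single letters.
user = {"A": 3, "B": 1}
search = {"A", "B"}
items = [
    {"address": "doc1", "segments": {"A": 2, "C": 5}},
    {"address": "doc2", "segments": {"A": 1, "B": 4}},
    {"address": "doc3", "segments": {"C": 7}},
]
results = rank_data_items(user, search, items)
```

In this example `doc2` scores (1 + 3) + (4 + 1) = 9, `doc1` scores 2 + 3 = 5, and `doc3` scores 0, so `doc2` would be presented first. Segment "C" contributes nothing because it is absent from the search and user profiles.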
BRIEF DESCRIPTION OF THE DRAWINGS
In the drawings, wherein like reference characters denote elements throughout the several views:
FIG. 1 is a schematic block diagram of a profiling/search system for automatically generating personalized user profiles and for utilizing the generated profiles to perform adaptive Internet or computer data searches;
FIG. 2 is a logic flow diagram representative of a user profiling control program executed by the profiling/search system of FIG. 1 in accordance with the present invention;
FIGS. 3 to 4 are logic flow diagrams representative of a profile procedure subroutine program executed by the profiling/search system of FIG. 1 in accordance with the present invention;
FIGS. 5 to 6 are logic flow diagrams representative of a data profile control program executed by the profiling/search system of FIG. 1 in accordance with the present invention; and
FIGS. 7 to 8 are logic flow diagrams representative of a data searching program executed by the profiling/search system of FIG. 1 in accordance with the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Although the present invention is described with reference to interfacing a local computer workstation with the Internet, it should be understood that the system and method of present invention may be applied, without departing from the spirit of the invention, to any arrangement where the local computer workstation is connected via a telecommunication link to a remote computer system such as a workstation or computer network, where the remote computer system may range from a single computer server workstation to a local area or distributed network. Furthermore it should be understood that the system and method of present
invention may be applied, without departing from the spirit of the invention, to a self contained single computer workstation having long-term data storage. Finally, it should be noted that the system and method of the present invention are completely language independent and may thus be applied and utilized with any language.

Referring initially to FIG. 1, a profiling/search system 10 for automatically generating personalized user profiles and for utilizing the generated profiles to perform adaptive Internet or computer data searches is shown. As shown, the profiling/search system 10 includes a local computer system 12, connected to a remote computer network 30 via a telecommunication link 26. The local computer system 12 includes a control unit 14, such as a CPU and related circuitry for controlling the operation of the local computer system 12 and for executing application programs, a memory 16, such as random access memory, for temporarily storing control program instructions and variables during the execution of application programs by the control unit 14; a storage memory 18, such as flash memory or a disk drive for long term storage of data and application programs; and input device(s) 20 for accepting input from the user, that include at least one of the following input devices: a keyboard, a selection device (i.e. mouse, trackball, or touchpad), and a voice recognition device with speech to text capabilities. The local computer system 12 further includes: output device(s) 22 for providing output data to the user, that include at least one of the following output devices: a display unit such as a CRT monitor or flat panel display, a printer, and a text to speech device with sound output capabilities; and a communication device 24 for transmitting to, and receiving data from, the remote computer system 30 via the telecommunication link 26, such as a modem or other telecommunication device.
The telecommunication link 26 may be a standard telephone line, a DSL line, a high speed data transmission such as a T1 or T3 line, or a wireless telecommunication (i.e. cellular or radio) link. The local computer system 12 may be a generally conventional desktop personal computer, an informational kiosk, or a portable computer such as a laptop or a personal digital assistant (PDA). The remote computer system 30 may be any remote computer system such as a single computer server or a network of interconnected computer systems, such as a local area or a wide area network. The remote computer system 30 includes a communication gateway 28, such as a modem and/or a network router connected to the telecommunication link 26; a remote data storage system 32 for long term data storage, and a remote computer system control unit 34 (hereinafter RCS control unit 34), such as a single CPU and associated devices, when the remote computer system 30 is a single computer server, or a set of independent CPUs and associated devices when the remote computer system 30 is a network of interconnected computer systems. The remote data storage system 32 may be a single data storage device, such as a disk drive, or a distributed data storage system over a plurality of separate interconnected computer systems each having individual data storage units (not shown). In an embodiment of the present invention depicted in FIG. 1, the remote computer system 30 is preferably the Internet (hereinafter, the remote computer system 30 is interchangeably referred to as the Internet 30). Before describing the present invention in greater detail, it is helpful to briefly describe the Internet and related concepts. Simply stated, the Internet is a massive collection of individual networks operated by government, industry, academia, and
private parties computers that are linked together to exchange information. While originally, the Internet was used mostly by scientists, the advent of the World Wide Web has brought the Internet into mainstream use. The World Wide Web (hereafter "WWW") is an international, virtual network-based information service composed of Internet host computers that provide on-line information in a specific hypertext format. WWW host servers provide hypertext metalanguage (HTML) formatted documents using a hypertext transfer protocol (HTTP). Information on the WWW is accessed with a hypertext browser such as the Netscape navigator or Microsoft Explorer. Web sites are collections of interconnected WWW documents. Assuming the remote computer system 30 is the Internet, certain functional explanation is necessary for the communication gateway 28, the remote data storage system 32, and the RCS control unit 34. The communication gateway 28 may be implemented and controlled by an Internet service provider (i.e. an ISP), a company that offers the user of the local computer system 12 access to the Internet 30 and the WWW through a software application stored in storage memory 18 that controls communication between the communication device 24 and the communication gateway 28. Typically, the user can access and navigate the WWW using a hypertext browser application residing on the local computer system 12. The remote data storage system 32 is not a single device, but is representative of the storage devices that are used by the multitude of Internet host computers and networks (not shown). The RCS control unit 34 is representative of a plurality of control units of the multitude of Internet host computers and networks (not shown). No hierarchy exists in the WWW, and the same information may be found by many different approaches. Hypertext links in WWW HTML documents allow readers to move from one place in a document to another (or even between documents) as they want to. 
One of the advantages of WWW is that there is no predetermined order that must be followed in navigating through various WWW documents. Readers can explore new sources of information by linking from place to place. This linking has been made as easy as clicking a mouse button on the subject a user wants to access. Each WWW document also has a unique uniform resource locator ("URL") that serves as an "address" that, when followed, leads the user to the document or files location on the WWW. Using the browser, the user can also mark and store "favorites", that is, URLs of particular WWW documents that interest the user, such that the user can quickly and easily return to these documents in the future by selecting them from the favorites list in the browser. Because of the vastness of the Internet and the WWW, locating specific information desired by the user can be very difficult. To facilitate search for information a number of "search engines" have been developed and implemented. A search engine is a software application that searches the Internet for web sites containing information on the subject in which the user is interested. These searches are accomplished in a variety of ways, all well-known in the art. Typically, a user first inputs a "search string" to the hypertext browser containing key words representative of the information desired by the user. The search engine then applies the search string to a previously constructed index of a multitude of web sites to locate a certain number of web sites having content that matches the user's search string. The located web site URLs are then presented to the user in the order of relevance to the key words in the user's search string. However, as was noted earlier, typical key word and even more advanced searches only provide the user with search
results that depend entirely on the search string entered by the user, without any regard to the user's cultural, educational, social backgrounds or the user's psychological profiles. For example, a twelve year old child using key word searches on the Internet for some information on computers may be presented with a multitude of documents that are far above the child's reading and educational level.
All texts composed by the user, or adopted by the user as favorite or inimical (such as a favorite book or short story), contain certain linguistic patterns, or combinations of various parts of speech (nouns, verbs, adjectives, etc.) in sentences that reflect the user's cultural, educational, social backgrounds and the user's psychological profile. Research has shown that most people have readily identifiable linguistic patterns in their expression and that people with similar cultural, educational, and social backgrounds will have similar recurring linguistic patterns. In summary, in accordance with the present invention, particular linguistic patterns and their frequencies of recurrence are extracted from the texts provided by the users of the system of the present invention and stored in a user profile data file. The user profile data file is thus representative of the user's overall linguistic patterns. All documents in a remote computer system, such as the Internet, are likewise analyzed and their linguistic patterns and respective recurrence frequencies also extracted and stored in corresponding document profiles. When a search for particular data is initiated by the user, linguistic patterns are also extracted from a search string provided by the user into a search profile. The user profile is then cross matched with the search profile and the document profiles to determine whether any linguistic patterns match in all three profiles and to determine the magnitude of the match based on relative recurrence frequencies of matching user and document linguistic patterns. The documents with document profiles having the highest matching magnitudes are presented to the user as not only matching the subject of the search string, but also as corresponding to the user's cultural, educational, and social backgrounds as well as the user's psychological profile. Thus, a world renowned physicist searching for information on quasars would be presented with very sophisticated physics documents that are oriented towards his level of expertise.
Referring now to FIG. 2, a logic flow diagram representing a user profiling control program for the control unit 14 of FIG. 1 in accordance with a preferred embodiment of the present invention is shown. As a matter of design choice, one or more of the steps of the user profiling control program may be executed by the RCS control unit 34 without departing from the spirit of the present invention. The purpose of the user profiling control program is to generate or update a User-Profile computer file representative of the user's linguistic patterns (and thus representative of the user's social, educational, and cultural background, as well as of the user's psychological profile).
The user profiling control program begins at a step 100 where the user's identity is verified by the control unit 14, for example by asking the user to provide a password or some form of a biometric identifier such as a fingerprint, a voice sample or a retinal image to the input device 20. At a test 102, the control unit 14 determines whether a User-Profile has been previously generated for the user. Because a particular local computer system 12 may be used by multiple users, a variety of User-Profiles, one for each individual user, may be stored in the storage memory 18 in a local profile database. In addition to, or instead of, the local profile database, User-Profiles may be stored in a remote central profile database located in a profile storage device 36, such as a storage memory device attached to a specific Internet host computer, in the remote data storage system 32. Storing User-Profiles in the central profile database is advantageous because a user may be able to utilize his or her User-Profile even when accessing the remote computer system 30 from a computer other than the local computer system 12.
Thus, at the test 102, the control unit 14 searches the storage memory 18 to determine whether the local profile database contains a User-Profile that has been previously created for the user that has been identified at the step 100. In addition, since the User-Profile may be also stored in the central profile database in the profile storage device 36, at the test 102 the control unit 14 also searches the profile storage device 36 to determine whether the central profile database contains a User-Profile that has been previously created for the user. Optionally, if User-Profiles are stored both in a local profile database and the central profile database, the control unit 14 ensures that both User-Profiles are identical to one another, by replacing an older User-Profile with a newer one if the User-Profiles in each of the databases differ from one another.
If at the test 102, the control unit 14 determines that a User-Profile for the identified user does not exist, then a new empty User-Profile is created at a step 104 and stored in the storage memory 18. At a test 106, the control unit 14 queries the user whether the user wishes to voluntarily contribute User-Data to the User-Profile. User-Data may be of two types: personal textual data generated by the user, and favorite textual data generated by a source other than the user. Personal textual data preferably consists of any documents created and composed by the user and may include, but is not limited to: books, articles, memorandums, essays, compositions, e-mails, reports, and web sites. Favorite textual data preferably consists of any documents that were created by a source other than the user but that the user has adopted as being particularly interesting, fascinating, or appealing, and may include, but is not limited to books, articles, memorandums, essays, compositions, e-mails, reports, and web sites. Furthermore, a user with an existing User-Profile may initiate the user profiling control program from the test 106 when the user wishes to update his or her profile by supplying additional User-Data to the control unit 14.
At the test 106, the user preferably instructs the control unit 14 to acquire all of personal textual data stored in the storage memory 18, for example by scanning the user's "sent" e-mail folders, document directories and any directories with any other documents that the user identifies as personal textual data. Alternately, the user may identify specific personal documents to be used as personal textual items. The user may also instruct the control unit 14 to acquire selected favorite textual data from documents identified by the user as "favorite" that are stored in the storage memory 18, or instruct the control unit 14 to retrieve WWW documents from the remote data storage system 32 of the Internet 30 in accordance with the URLs stored in the "favorites" section of the browser. In addition, the user may identify additional WWW documents to the control unit 14 as favorite textual data, such that the control unit 14 retrieves these additional documents and adds them to User-Data. Furthermore, the user may specify, to the control unit 14, certain long texts such as full text classical books stored on the Internet 30 as being favorite textual data. For example, the user may regard Homer's Iliad as his favorite book and thus identify it as favorite textual data. Both personal and
favorite textual data are stored in User-Data as Text-Items, i.e. individual text documents. User-Data is preferably structured as a computer data file that contains a number of sequential individual Text-Items that are separated from one another by some sort of a delimiter readily identifiable by the control unit 14. The quantity and quality of User-Data provided by the user is directly proportional to the quality, accuracy and usefulness of the User-Profile that will be based on the User-Data. Thus, the user is encouraged to provide as much personal and favorite textual data as possible. It should be noted that although the user may submit very personal texts as personal textual data to the control unit 14, as will be explained below in connection with FIG. 3, the User-Profile does not contain any private information about the user nor does it contain any textual excerpts from the user's private texts. Instead, as was previously explained, the control unit 14 extracts linguistic patterns from the texts rather than the actual information conveyed by the texts. Of course, in some circumstances the user may not have any personal or favorite textual data stored in the storage memory 18, for example if the local computer system 12 is brand new. Also, it is possible that the user may not have enough knowledge of the Internet to specify any favorite web sites or on-line documents. In both cases the user may be unable to provide any User-Data to the control unit 14. It should be noted that after the completion of the user profiling control program, the User-Data file is purged by the control unit 14. If the control unit 14 determines at the test 106 that User-Data is to be contributed by the user, then at a step 108, the control unit 14 acquires User-Data, including personal and favorite Text-Items identified by the user at the test 106, from the storage memory 18 and/or from the remote data storage system 32 of the Internet 30. The control unit 14 then proceeds to a test 118. 
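The User-Data layout described above (sequential Text-Items separated by a delimiter that the control unit can readily identify) can be illustrated with a minimal Python sketch. The delimiter string and the function names below are hypothetical; the specification does not name a particular delimiter or file format.

```python
# Hypothetical delimiter: the specification only requires one that the
# control unit can readily identify between sequential Text-Items.
DELIMITER = "\n<<<END_OF_TEXT_ITEM>>>\n"


def pack_user_data(text_items):
    """Join individual Text-Items into one sequential User-Data string."""
    return DELIMITER.join(text_items)


def unpack_user_data(user_data):
    """Split a User-Data string back into its individual Text-Items."""
    return [item for item in user_data.split(DELIMITER) if item.strip()]


if __name__ == "__main__":
    items = ["A personal essay written by the user.", "A favorite short story."]
    user_data = pack_user_data(items)
    print(unpack_user_data(user_data))
```

Because only extracted linguistic patterns are kept in the User-Profile, a structure like this would be temporary and could be discarded once profiling completes, in line with the purge described above.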
If, on the other hand, the control unit 14 determines at the test 106 that User-Data is not to be contributed (i.e., for example if the user does not have any data stored in the storage memory 18), then the control unit 14 proceeds to a step 110. Returning to the test 102, if at the test 102, the control unit 14 determines that a User-Profile for the identified user already exists, then at the step 110 the user begins an Internet browsing session using a hypertext browser (such as Netscape or Explorer). During a browsing session, the user may navigate through a variety of web sites, HTML documents or other types of Text-Items. In an alternate embodiment of the present invention, when the remote computer system 30 is not the Internet, at the step 110 the user may begin using any software application that may be installed on the local computer system 12 and that is configured for searching for data and/or for navigating through a plurality of data files, i.e., Text-Items. It should be noted that steps 112-116 may be performed by the control unit 14 substantially simultaneously. Furthermore, it should be noted that steps 112 and 114 are optional. At a step 112, the control unit 14 begins to monitor the user's browsing session initiated at the step 110 for the entire duration of the browsing session. If the user spends more than a pre-determined "M" period of time viewing a particular Text-Item, then the control unit 14 adds the Text-Item to User-Data; in effect, by spending more than a particular period of time browsing a Text-Item, the user has adopted the Text-Item as one of the user's favorite textual items. Preferably, the control unit 14 accumulates a total duration of time Q that each Text-Item is viewed by the user over a predetermined period P. If during the period P, Q exceeds the period M, then the control unit 14 adds the
Text-Item to User-Data. The time period P is preferably 24 hours, but may be as long as one week, or longer. The period M may be one or more hours and is preferably set in accordance with the period P. Thus, for example, if P is set to 24 hours, M is preferably set between one to two hours, while if P is set to one week M may be set to five to ten hours. To illustrate the operation of the step 112, assuming P is set to 24 hours and M is set to two hours, if the user views a particular Text-Item for a total of two or more hours (viewing time Q is greater than M) during the 24 hour period, then the control unit 14 adds the viewed Text-Item to User-Data. At a step 114, the control unit 14 monitors the operation of the browser, such that when the user adds any Text-Item to the browser's "favorites" section, the control unit 14 automatically adds the Text-Item to User-Data. For example, if the user visits a web site and the user becomes interested enough in the site's material that the user adds the web site (Text-Item) to the favorites section of the browser, the control unit 14 adds the Text-Item to User-Data. At a step 116, the control unit 14 monitors the operation of the browser to automatically add, as Text-Items to User-Data, any search strings that the user inputs into the browser. Thus, for example, when the user utilizes the browser's search capabilities to search for "computer that mimics human thinking process and artificial intelligence and neural network", the control unit 14 adds this search string to User-Data as a Text-Item. At the optional test 118, the control unit 14 determines if the User-Profile should be updated. If the User-Profile file was created at the step 104, then a determination of whether sufficient User-Data has been accumulated at the step 108 or the steps 112-116 may be required. 
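The viewing-time rule of step 112 can be sketched in Python as follows. This is only an illustration under stated assumptions: the event format, the function name, and the choice to attribute each viewing to the period P in which it starts are all hypothetical. The text says both "two or more hours" and "Q is greater than M"; the sketch treats Q equal to M as sufficient, following the worked example.

```python
from collections import defaultdict


def items_adopted_by_viewing(view_events, period_p_hours=24.0, threshold_m_hours=2.0):
    """Return Text-Items whose accumulated viewing time Q reaches M within one period P.

    view_events is an iterable of (text_item, start_hour, duration_hours),
    with start_hour counted from the beginning of monitoring (an assumed format).
    """
    totals = defaultdict(float)  # (text_item, period index) -> accumulated Q
    for text_item, start_hour, duration_hours in view_events:
        period_index = int(start_hour // period_p_hours)
        totals[(text_item, period_index)] += duration_hours
    adopted = {item for (item, _), q in totals.items() if q >= threshold_m_hours}
    return sorted(adopted)


if __name__ == "__main__":
    events = [
        ("site-a", 1.0, 1.5),   # first day
        ("site-a", 10.0, 0.5),  # first day, brings Q for site-a to 2.0 hours
        ("site-b", 2.0, 1.0),   # first day, only 1.0 hour in total
        ("site-b", 30.0, 1.0),  # second day, counted in a separate period
    ]
    print(items_adopted_by_viewing(events))
```

With P set to 24 hours and M set to two hours, only `site-a` is adopted in this example, since `site-b` never accumulates two hours inside a single 24 hour period.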
Preferably, the control unit 14 counts the total number of words in all Text-Items in User-Data and compares the total to a predetermined word count threshold. If the total number of words in User-Data exceeds the word count threshold, then User-Data is sufficient for updating the User-Profile, and the control unit 14 proceeds to the step 120. On the other hand, if the total number of words in User-Data is below the word count threshold, then User-Data is insufficient for updating of the initial User-Profile. The control unit 14 then returns to the step 110 where the user may continue the browsing session so that the control unit 14 may continue to accumulate additional Text-Items for User-Data at the steps 112 to 116. This approach is advantageous because it ensures that the User-Profile is based on sufficient linguistic data provided by the user before its utilization. If a new User-Profile based on insufficient linguistic data is used it may provide inaccurate results. The word count threshold may be selected as a matter of design choice, keeping in mind that the magnitude of User-Data is proportional to the accuracy of the User-Profile derived from User-Data. For example, the threshold total may be set between 1000 and 3000 words. Alternatively, instead of counting the total number of words in User-Data, the control unit 14 may count a total of all Text-Items in User-Data and compare that total to another threshold. For example, the threshold may be set to twenty Text-Items. If, at the test 102, the control unit 14 determined that a User-Profile for the user already exists, then the control unit 14 determines whether the existing User-Profile should be updated. If frequent updating of the existing User-Profile is undesirable (for example to conserve computing resources), then an update criteria for updating the User-Profile may be
set as a matter of design choice. The update criteria may include, but is not limited to: a particular period of time between updates, for example updating no more than once per 24 hours, or addition of a particular number of words to User-Data during the steps 112 to 116 and/or the step 108 if the user voluntarily contributed Text-Items to User-Data to update the existing User-Profile. For example, this particular number of words may be 500 or more. If the update criteria is not met, then the control unit 14 returns to the step 110. If, on the other hand, frequent updating of the existing User-Profile is desired or if the update criteria has been met, then the control unit 14 proceeds to the step 120. At the step 120, the control unit 14 performs a profile procedure subroutine to update the User-Profile. Subroutines are known in the computer programming art as functions designed to perform specific tasks requested by a main control program. As a matter of design choice, one or more of the steps of the profile procedure subroutine may be executed by the RCS control unit 34 without departing from the spirit of the present invention. One of the advantages of using subroutines is that two or more programs can use the same subroutine to perform a particular function. Modern programming techniques also encompass programmable "objects" which function similarly to subroutines. The main advantage of programmable "objects" is that once an "object" is developed to perform a particular function, it may be used in any program wishing to use that function. The purpose of the profile procedure subroutine is to compose/update the User-Profile by analyzing and extracting linguistic patterns from the Text-Items in User-Data and adding the extracted linguistic patterns to the User-Profile. Referring now to FIG. 
3, the profile procedure subroutine begins at a step 200 and proceeds to a step 202 where the control unit 14 retrieves and opens the User-Profile from the storage memory 18. At a step 204, the control unit 14 retrieves the first Text-Item from User-Data. At a step 206, the control unit 14 separates the retrieved Text-Item into at least one separate "sentence", a collection of words from which linguistic patterns will be extracted to form the User-Profile. Most Text-Items are documents that consist of a plurality of typical grammatical sentences separated by "end of sentence" (hereinafter "EOS") punctuation marks, such as periods, colons, and exclamation and question marks. Thus, the control unit 14 can readily separate a typical Text-Item into a number of separate sentences by identifying each separate sentence as a set of words ending in an EOS punctuation mark. Other Text-Items, such as search strings, may not have any EOS punctuation marks and may be of significant length. Furthermore, certain compound sentences, such as patent claims, may contain multiple clauses and may also be of significant length. Preferably, a maximum sentence word count is defined as L as a matter of design choice. For example, L may be set to fifty words. The control unit 14 analyzes the Text-Item and counts words until an EOS punctuation mark is reached; if the word count reaches L and an EOS punctuation mark is not reached, then the control unit 14 identifies the L words as a sentence (i.e. as if an EOS punctuation mark was actually reached at L words) and begins a new word count for the next sentence. For example if the Text-Item is a 158 word search string, and L is set to fifty words, then this Text-Item will be separated into four sentences with fifty words in each of the first three sentences and eight words in the fourth sentence. If an EOS punctuation mark is reached before the word count reaches L, then the control unit 14 first identifies the words before the EOS
punctuation mark as a sentence and then begins a new word count for the next sentence. At a test 208, the control unit 14 determines whether all sentences from the Text-Item retrieved at the step 204, have been retrieved. Because sentences are retrieved at a later step 210, during a first iteration of the test 208, where no sentences have been retrieved by the control unit 14 thus far, the control unit 14 proceeds directly to the step 210. During subsequent iterations, if all sentences have been retrieved from the current Text-Item, then the control unit 14 proceeds to a test 220 (FIG. 4). If, on the other hand, not all sentences have been retrieved, then the control unit 14 proceeds to the step 210. At the step 210, the control unit 14 retrieves the first sentence identified at the step 206 (or, during subsequent iteration of this step, retrieves the next sentence). The control unit 14 then identifies and tags each word in the retrieved sentence as a particular part of speech (hereinafter "POS"), i.e. a noun, pronoun, verb, etc. To simplify further processing of the POS, after tagging the POS, the control unit 14 automatically brings all verbs to simple present tense, and brings all nouns to singular form. For example in a sentence "Joe walked to his beautiful home", the control unit 14 would tag "Joe" and "home" as nouns, "walk" as a verb, "to" as a preposition, "his" as a pronoun, and "beautiful" as an adjective. However, since for the purpose of performing data searches only a few POS are necessary, the control unit 14 preferably only identifies and tags certain predetermined POS such as nouns, verbs and adjectives. This procedure is performed in accordance with standardized rules of grammar. Automatic identification of parts of speech in a sentence is well known in the art and need not be described herein. For example, many conventional word processors utilize grammar checking functions that are capable of identifying parts of speech in a sentence. 
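The sentence separation of step 206, including the maximum sentence word count L, can be illustrated with a short Python sketch. The function name and the whitespace-based word splitting are assumptions made for illustration. A word counts as ending a sentence only when its last character is an EOS mark, so abbreviations such as "e.g." are treated as sentence ends and a mark followed by a closing quote is missed, a simplification the specification does not address.

```python
# EOS punctuation marks named in the text: periods, colons,
# exclamation marks and question marks.
EOS_MARKS = (".", ":", "!", "?")


def split_into_sentences(text_item, max_words=50):
    """Split a Text-Item into sentences, each a list of words.

    A sentence ends at a word that ends in an EOS mark, or after
    max_words words (the limit L) if no EOS mark has been reached.
    """
    sentences = []
    current = []
    for word in text_item.split():
        current.append(word)
        if word.endswith(EOS_MARKS) or len(current) == max_words:
            sentences.append(current)
            current = []
    if current:  # trailing words that never reached an EOS mark or L
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    search_string = " ".join(["word"] * 158)  # 158 words, no EOS marks
    print([len(s) for s in split_into_sentences(search_string)])
```

Run on a 158 word string with L set to fifty, the sketch yields sentence lengths of 50, 50, 50 and 8, matching the example given for step 206.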
The particular POS that are identified and tagged by the control unit 14 may include, but are not limited to: noun, pronoun, verb, adverb, adjective, gerund, prepositions, conjunctions and interjections. To simplify further processing of the POS, during the step 210, after tagging the POS the control unit 14 automatically brings all verbs to simple present tense, and brings all nouns to singular. At a test 212, the control unit 14 analyzes each word in the sentence and determines if it is a unique POS. Certain words may be used as different parts of speech, for example, the word "police" may be used both as a noun and as a verb. This determination may be done with reference to a dictionary stored in the storage memory 18 or in the remote data storage system 32. If the word is a unique POS, then the control unit 14 proceeds to a step 216. Otherwise, if the word is not a unique POS, then the control unit 14 proceeds to a step 214, where the control unit 14 tags the word with multiple tags in accordance with its possible POS usage. For example, the word "police" would be tagged as a noun and as a verb. At the step 216, the control unit 14 extracts one or more segments from the sentence retrieved at the step 210 that are representative of the linguistic patterns of the sentence. A segment consists of one or more predetermined types of POS arranged in a predetermined order. The number, the type, and the order of POS in a segment may be selected as a matter of design choice, depending on the purpose for which the User-Profile will be utilized. For the purpose of performing data searches, preferably each segment is a triad (i.e. N=3) of three POS arranged as follows: noun-verb-adjective. Thus, in accordance with this embodiment,
previously, at the step 210, the control unit 14 only identifies and tags nouns, verbs and adjectives, and at the step 216 the control unit 14 extracts noun-verb-adjective segments from each sentence. Alternately, the following other arrangements may be used for the segment if desired: noun-adverb-adjective; gerund-verb-adjective; gerund-adverb-adjective; pronoun-verb-adjective; pronoun-adverb-adjective. Accordingly, the appropriate POS used in the segment would need to have been previously tagged by the control unit 14 at the step 210. Furthermore, in an alternate embodiment of the present invention, the segments may consist of one or more POS.
Because a sentence may contain multiple POS of the same type, i.e. two nouns, several segments may potentially be composed by the control unit 14 from a single sentence. Thus, in accordance with the present invention, the control unit 14 extracts every possible noun-verb-adjective segment from the sentence. For example, if the sentence is "Joe walked to his beautiful new house", then the control unit 14 would extract the following segments therefrom:
Joe-walk-beautiful
Joe-walk-new
house-walk-beautiful
house-walk-new
However, if a particular sentence is missing one of the three POS (noun, verb, adjective) required in the segment, then the control unit 14 inserts a "blank" flag (for example the characters "<>") into the position of the missing POS. For example, if the sentence is "Joe walked to his house", then the control unit 14 would extract the following segments therefrom:
Joe-walk-<>
house-walk-<>
The blank flag "<>" was inserted by the control unit 14 into the position of the adjective POS that was not present in the sentence.
At a step 218, the control unit 14 temporarily stores all segments extracted at the step 216 in the User-Profile and then returns to the test 208.
Referring now to FIG. 4, at the test 220, the control unit 14 determines if all Text-Items have been retrieved from User-Data. If all Text-Items have not been retrieved, then the control unit 14 returns to the step 204 where the control unit 14 retrieves the next Text-Item. Otherwise, if the control unit 14 determines that all Text-Items have been retrieved from User-Data, then the control unit 14 proceeds to a step 222.
Thus, in summary, during steps 204 to 220, the control unit 14 retrieves all Text-Items from User-Data, splits each Text-Item into sentences, analyzes each sentence to extract segments representative of the sentence's linguistic patterns and stores the extracted segments in User-Profile.
At the step 222, the control unit 14 groups identical segments together into sets, counts the occurrence of identical segments in each set, and then records the number of identical segments in each set in User-Profile as User-Profile segment count (hereinafter "UP-SC") next to each set of identical segments. For example, if the segment "computer-execute-fast" appears twenty seven times in User-Profile, the UP-SC for that segment would be recorded next to that segment as "27". If the User-Profile already contains an identical segment set with an existing UP-SC, then the UP-SC determined at the step 222 is added to the existing UP-SC. For example, if the User-Profile already contains the segment set "instruction-execute-fast" with UP-SC of 15, and at the step 222 the control unit 14 determines that five such segments were extracted from User-Data during the steps 204 to 220, then the control unit 14 adds the new UP-SC of 5 to the existing UP-SC of 15 and records the new UP-SC of 20 next to the segment set "instruction-execute-fast". A high UP-SC for a segment is indicative of the relative importance of the segment as a representation of the user's linguistic pattern.
At a step 224, the control unit 14 sorts the identical segment groups in the User-Profile from the identical segment group with the highest UP-SC to the segment group with the lowest UP-SC. Thus, after the step 224, the User-Profile may look as follows:

Segment                     UP-SC
computer-execute-fast       27
instruction-execute-fast    20
.                           .
.                           .
Joe-walk-<>                 5
police-follow-vigilant      1

The number of different segment sets that may be stored in the User-Profile is practically limited only by the sizes of the storage memory 18 and the remote data storage system 32, and the computing capabilities of the control unit 14 or of the RCS control unit 34. However, experimentation has shown that a large number of segment sets in the User-Profile offers diminishing returns as balanced against the storage for the User-Profile and the computing power required for the control unit 14 in order to effectively work with the User-Profile. Thus, preferably only a certain amount of segment sets with the highest UP-SC should be stored in the User-Profile. As a result, at a step 226, the control unit 14 saves only Y of the segments having the highest UP-SCs to the User-Profile, deleting all the remaining segments. For example, Y may be set to 5000, such that only 5000 of the most commonly occurring segments are saved by the control unit 14 to the User-Profile. Alternatively, Y% of the most commonly occurring segments may be saved to the User-Profile. For example, if Y is set to 20, the control unit 14 may save the top 20% of the segments with the highest UP-SCs.
At a step 228, the control unit 14 returns the updated User-Profile to user profiling control program (FIG. 2).
Returning now to FIG. 2, at a step 122, the control unit 14 stores the updated User-Profile in at least one of the local profile database in the storage memory 18 and the central profile database in the profile storage device 36. Preferably, the User-Profile is stored "confidentially", i.e. encrypted and protected by a password or by other access control means such as biometrics (e.g. a fingerprint scan, voice pattern matching, etc.) such that only the user can access and update his or her User-Profile. The control unit 14 then ends user profiling control program at a step 124, or optionally returns to the step 110, where the user can continue the browsing session.
Referring now to FIG. 5, a logic flow diagram representing a data profiling control program for the control unit 14 of FIG. 1 in accordance with a preferred embodiment of the present invention is shown. Data-Item refers to any document, whether flat text or hypertext, that may be a target during a potential data search by the user. Accordingly, Data-Items include all documents that are stored in the remote data storage system 32 on the remote computer system 30. For example, if the remote computer system 30 is th
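The segment extraction of step 216 and the counting, sorting and pruning of steps 222 to 226 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names, the `(word, pos)` input format and the dictionary representation of the User-Profile are hypothetical, and the input is assumed to be already tagged and normalized as in step 210. Returning no segment for a sentence with no tagged words at all is an added assumption; the text does not cover that case.

```python
from collections import Counter
from itertools import product

BLANK = "<>"  # blank flag inserted in place of a missing POS


def extract_segments(tagged_words):
    """Return every noun-verb-adjective segment for one tagged sentence.

    tagged_words is a list of (word, pos) pairs, with pos being
    "noun", "verb" or "adjective" (an assumed input format).
    """
    slots = []
    for pos in ("noun", "verb", "adjective"):
        words = [word for word, tag in tagged_words if tag == pos]
        slots.append(words or [BLANK])
    if all(slot == [BLANK] for slot in slots):
        return []  # assumption: nothing to extract from an untagged sentence
    return ["-".join(combo) for combo in product(*slots)]


def update_profile(profile, new_segments, keep_top=5000):
    """Add new segment counts to existing UP-SCs, sort by UP-SC, keep the top Y."""
    counts = Counter(profile)
    counts.update(new_segments)
    return dict(counts.most_common(keep_top))


if __name__ == "__main__":
    sentence = [("Joe", "noun"), ("walk", "verb"), ("beautiful", "adjective"),
                ("new", "adjective"), ("house", "noun")]
    print(extract_segments(sentence))
    profile = {"computer-execute-fast": 27, "instruction-execute-fast": 15}
    print(update_profile(profile, ["instruction-execute-fast"] * 5))
```

For the sentence "Joe walked to his beautiful new house" the sketch produces the four segments listed for step 216, and adding five new "instruction-execute-fast" segments to an existing UP-SC of 15 gives 20, as in the example for step 222.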