PersonalWeb Technologies LLC v. Google, Inc. et al

Filing 1

COMPLAINT for Patent Infringement against All Defendants ( Filing fee $ 350 receipt number 0540-3354008.), filed by PersonalWeb Technologies LLC. (Attachments: # 1 Civil Cover Sheet, # 2 Exhibit A - Patent '791, # 3 Exhibit B - Patent '280, # 4 Exhibit C - Patent '442, # 5 Exhibit D - Patent '310, # 6 Exhibit E - Patent '539, # 7 Exhibit F - Patent '544, # 8 Exhibit G - Patent '662, # 9 Exhibit H - Patent '096)(Tribble, Max) (Additional attachment(s) added on 12/14/2011: # 10 Exhibit A Searchable, # 11 Exhibit B Searchable, # 12 Exhibit C Searchable, # 13 Exhibit D Searchable, # 14 Exhibit E Searchable, # 15 Exhibit F Searchable, # 16 Exhibit G Searchable, # 17 Exhibit H Searchable) (mjc, ). (Entered: 12/08/2011)

Download PDF
Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 1 of 57 PageID #: 14 EXHIBIT A Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 2 of 57 PageID #: 15 111111 1111111111111111111111111111111111111111111111111111111111111 US005978791A United States Patent [11] Farber et ai. [54] Filed: 4,922,414 5/1990 Holloway et al. ...................... 364/200 4,972,367 11/1990 Burke ...................................... 364/900 5,007,658 4/1991 Bendert et al. ... ... .... ... ... ... ... ... 395/600 5,025,421 6/1991 Cho .................................... 365/230.05 5,050,074 9/1991 Marca ..................................... 364/200 5,050,212 9/1991 Dyson ....................................... 380/25 5,057,837 10/1991 Colwell et al. ........................... 341/55 5,129,081 7/1992 Kobayashi et al. ..................... 395/600 5,129,082 7/1992 Tirling et al. .. ... ... .... ... ... ... ... ... 395/600 5,144,667 9/1992 Pogue, Jr. et al. ........................ 380/45 5,179,680 1/1993 Colwell et al. ......................... 395/425 5,202,982 4/1993 Gramlich et al. ....................... 395/600 5,208,858 5/1993 Vollert et al. ............................. 380/43 5,276,901 1/1994 Howell et al. .......................... 395/800 5,301,286 4/1994 Rajani ..................................... 395/400 5,301,316 4/1994 Hamilton et al. ....................... 395/600 5,343,527 8/1994 Moore ......................................... 380/4 5,357,623 10/1994 Megory-Cohen ....................... 395/425 5,384,565 1/1995 Cannon .............................. 340/825.44 5,404,508 4/1995 Konrad et al. .......................... 395/600 Appl. No.: 08/960,079 [22] Nov. 2, 1999 Assignee: Kinetech, Inc., Northbrook, Ill. [21] Date of Patent: Inventors: David A. Farber, Ojai, Calif.; Ronald D. Lachman, Northbrook, Ill. [73] 5,978,791 DATA PROCESSING SYSTEM USING SUBSTANTIALLY UNIQUE IDENTIFIERS TO IDENTIFY DATA ITEMS, WHEREBY IDENTICAL DATA ITEMS HAVE THE SAME IDENTIFIERS [75] Patent Number: [45] [19] Oct. 24, 1997 Related U.S. Application Data [63] Continuation of application No. 08/425,160, Apr. 11, 1995, abandoned. [51] [52] [58] Int. CI. 6 ...................................................... G06F 17/30 U.S. CI. .................................... 707/2; 707/1; 707/200 Field of Search ..................................... 707/2, 1,200 [56] References Cited U.S. PATENT DOCUMENTS 3,668,647 4,215,402 4,290,105 4,376,299 4,405,829 4,412,285 4,414,624 4,441,155 4,464,713 4,490,782 4,571,700 4,577,293 4,642,793 4,675,810 4,691,299 4,725,945 4,773,039 4,887,235 4,888,681 6/1972 7/1980 9/1981 3/1983 9/1983 10/1983 11/1983 4/1984 8/1984 12/1984 2/1986 3/1986 2/1987 6/1987 9/1987 2/1988 9/1988 12/1989 12/1989 Evangelisti et al. ................. 340/172.5 Mitchell et al. ........................ 364/200 Cichelli et al. ... ... ... ... .... ... ... ... 364/200 Rivest ..................................... 364/900 Rivest et al. ........................... 178/22.1 Neches et al. .......................... 364/200 Summer, Jr. et al. .................. 364/200 Fletcher et al. ......................... 364/200 Benhase et al. .. ... ... ... .... ... ... ... 364/200 Dixon et al. ............................ 364/200 Emry, Jr. et al. .. ... ... .... ... ... ..... 364/900 Matick et al. .......................... 365/189 Meaden ................................... 364/900 Gruner et al. .......................... 364/200 Rivest et al. ............................ 365/185 Kronstadt et al. ... ... ... ... .... ... ... 364/200 Zamora ................................... 364/900 Holloway et al. ... ... ... ... .... ... ... 364/900 Barnes et al. .. ... ... ... ... .... ... ... ... 364/200 OTHER PUBLICATIONS Witold Litwin et ai, Linear Hashing for Distributed Files, ACM SIGMOD, May, 1993 pp. 327-336. Ming-Ling Lo, et ai, On Optimal Processor Allocation to Support Pipelined Hash loins, ACM SIGMOD, pp. 69-78, May 1993. Thomas A. Berson, Differential Cryptanalysis Mod 2 32 with Applications to MD5, pp. 69-81, 1992. (List continued on next page.) Primary Examiner-Paul V. Kulik Assistant Examiner-lean R. Homere Attorney, Agent, or Firm-Pillsbury Madison & Sutro LLP [57] ABSTRACT In a data processing system, a mechanism identifies data items by substantially unique identifiers which depend on all of the data in the data items and only on the data in the data items. The system also determines whether a particular data item is present in the database by examining the identifiers of the plurality of data items. 48 Claims, 31 Drawing Sheets =_:r--_102 I"'l ~ ["126l L":':J ["126l ~ ["126l L-J ["126l ~ MEMORY Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 3 of 57 PageID #: 16 5,978,791 Page 2 OlliER PUBLICATIONS William Perrizo, et aI., Distributed Join Processing Performance Evaluation, 1994. Twenty-Seventh Hawaii International Conference on System Sciences, vol. II, pp. 236-244. A concurrency Control Mechanism based on Extendible Hashing for Main Memory Database Systems, Vijay Kumar, pp. 109-113, ACM, vol. 3, 1989. Birgit Pfitzmann, Sorting Out Signature Schemes, Nov. 1993, 1st Conf. Computer & Comm. Security '93 pp. 74-85. Bert dem Boer, et aI., Collisions for the compression function of MDs pp. 292-304, 1994. Sakti Pramanik, et aI., Multi-Directory Hashing, 1993, Info. Sys., vol. 18, No.1, pp. 63-74. Murlidhar Koushik, Dynamic Hashing With Distributed Overflow Space: A File Organization With Good Insertion Performance, 1993, Info. Sys., vol. 18, No.5, pp. 299-317. Witold Litwin, et aI., LH*-Linear Hashing for Distributed Files, HP Labs Tech. Report No. HPL-93-21 Jun. 1993 pp. 1-22. Yuliang Zheng, et aI., HAVAL - A One-Way Hashing Algorithm with Variable Length of Output (Extended Abstract), pp. 83-105, Advances in Cryptology, AUSCRIPT '92,1992. Chris Charnes and Josef Pieprzky, Linear Nonequivalence versus Nonlinearity, Pieprzky, pp. 156-164, 1993. Zhiyu Tian, et aI., A New Hashing Function: Statistical Behaviour and Algorithm, pp. 3-13, SIGIR Forum, 1993. G. L. Friedman, Digital Camera With Apparatus For Authentication of Images Produced From an Image File, NASA Case No. NPO-19108-1-CU, Serial No. 08/159,980, Nov. 24, 1993. H. Goodman, Feb. 9, 1994 Ada, Object-Oriented Techniques, and Concurrency in Teaching Data Sructures and File Management Report Documentation P. AD-A275 385 - 94-04277. Advances in Cryptology-EUROCRYPT '93, Workshop on the Theory and Application of Cryptographic Techniques Lofthus, Norway, May 23-27, 1993 Proceedings. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, vol. 22, Issue 2, Jun. 1993. Advances in Cryptology-AUSCRYPT '92 - Workshop on the Theory and Application of Cryptographic Techniques Gold Coast, Queensland, Australia Dec. 13-16, 1992 Proceedings. Search Report dated Jun. 24, 1996. Case 6:11-cv-00656 Document 1-2 u.s. Patent Nov. 2, 1999 Filed 12/08/11 Page 4 of 57 PageID #: 17 0 0:: 0r- ~ 0 O N 0 ,... ,... CI) CI) W 0 0 CI) CI) N 0 5,978,791 Sheet 1 of 31 0:: a. w (.) 0 ~ a. • I • • c -. lL. I O::! 0 CI) CJ) N 0 ,... (!) W 0 I 0 0::: I , a. 0:: 0 'W (!)W ~ ~~ ~ 0 fij N 0 ,... CI) CI) W (.) 0 ~ a. 1- 0 )CI) • • • 0:: 0 N 0 ,... f\ ~w ~ ~ g ~ 0 £ij 1-0 \ ) CI) CJ) CJ) w (.) 0 0:: a. ~I Case 6:11-cv-00656 Document 1-2 u.s. Patent r---- Filed 12/08/11 Page 5 of 57 PageID #: 18 Nov. 2, 1999 5,978,791 Sheet 2 of 31 -------------------------------------------------l I I : I I I I I I I ~ I L'- I I I C) I I ~ I~ ~ II~ ~ II~ ~ II~ ~ II~ ~ I i o I H I I I IC\J 10 I I I I I I ....... 0:: ('<)...l w 00::: [JI- [JI- 81- E]...l ~ C\I 0 ..-...l C\I ....... U. C\I «') 1-..- ..- CJ) LO ..- ..- (!) I I I I I I I~ IC) I I I t ~ CJ) /\ CJ) W U : ~ ..q ~ co :J C\I ~ ~ W .J(!)W .~ <2:: ~ ~ ~~ ~ fij ,u t I I I I I t : I 0.. I- C I I \ } CJ) I I _______________________________________________________ J L L: . (!) lL. FILE SYSTEM I REGION I I 117 ~ ~ I 117 REGION ~ ..... ..... REGION • • • = I r117 REGION I z o 118 118 DIRECTORY ••• DIRECTORY ~N '""'" \C \C \C 'JJ. =- 1 ( FILE FILE) ~ ~ ..... l ~ FILE o ...., ~ '""'" I 122 SEGMENT ~ I 12~ SEGMENT SEGMENT '-- Ul .... \C ""-l 00 .... ""-l \C ~ Filed 12/08/11 Page 6 of 57 PageID #: 19 DIRECTORY 118 ~ Case 6:11-cv-00656 Document 1-2 d • rJl • Case 6:11-cv-00656 Document 1-2 u.s. Patent Nov. 2, 1999 Filed 12/08/11 Page 7 of 57 PageID #: 20 5,978,791 Sheet 4 of 31 FIG.3 138 Region ID Pathname True Name Type File ID Time of last access Time of last modification Safe flag Lock flaq Size Owner FIG.4 140 True Name File ID Compressed File ID Source IDs Dependent processors Use count Time of last access Expiration Grooming delete count 142 Reqion ID Reqion file system Reqion pathname Region status Mirror processor(s) Mirror duplication count Policy FIG.5 144 FIG.6 ~ source ID ~ ..... ..... source type ~ = source rights source availability source location 146 ~ Original Name ~N Operation '""'" \C \C \C Type Processor ID Timestamp 'JJ. =- Pathname ~ ~ ..... True Name FIG.8 l::::: :::~ Ul ---- 148 o ...., ~ '""'" True Name 150 FIG.9 IT~e Name l~censee Ul .... \C ""-l 00 .... ""-l \C ~ Filed 12/08/11 Page 8 of 57 PageID #: 21 FIG. 7 z o Case 6:11-cv-00656 Document 1-2 d • rJl • Case 6:11-cv-00656 Document 1-2 u.s. Patent Filed 12/08/11 Page 9 of 57 PageID #: 22 Nov. 2, 1999 Sheet 6 of 31 5,978,791 FIG. 10(0) SIMPL DATA ITEM ", -------------- --------------\ I \ S212 I I , \ COMPUTE MD FUNCTION ON DATA ITEM S214 APPEND LENGTH MODULO 32 OF , DATA ITEM , I \ I \ \ \ , .... ,. ----------------------------TRUE NAME ~ I Case 6:11-cv-00656 Document 1-2 u.s. Patent Filed 12/08/11 Page 10 of 57 PageID #: 23 Nov. 2, 1999 8216 _YES 5,978,791 Sheet 7 of 31 0 _ _..., FIG. IO(b) DATA ITEM SIMPLE? S220 PARTITION DATA ITEM INTO SEGMENTS S222 ASSIMILATE EACH SEGMENT (COMPUTING ITS TRUE NAME) I ~ r - - - - -82 f8- - ---'\ r , r ' , , COMPUTE TRUE : , NAME OF SIMPLE : , : DATA ITEM : ------ I S224 / CREATE INDIRECT BLOCK OF SEGMENT TRUE NAMES 8226 ASSIMILATE INDIRECT BLOCK (COMPUTING ITS TRUE NAME) S228 REPLACE FINAL 32 BITS OF TRUE NAME WITH LENGHT MOD 32 OF DATA ITEM ~ 8230 FI G. II ~ ..... ..... DETERMINE TRUE NAME a---/" ~ = z 0 ---- .... _- .......... _. ,. __ .... _ .... _I. -_ ",-YES ~ ~N . 8236 'JJ. * * * * =- ~ ~ CREATE NEW ENTRY SET USE COUNT TO 1 STORE FILE 10 SET OTHER FIELDS ... ..... 00 o ...., ~ '""'" 8238 8239 DELETE FILE 10 STORE FILE ID Ul .... \C ""-l 00 .... ""-l \C ~ Filed 12/08/11 Page 11 of 57 PageID #: 24 '""'" \C \C \C Case 6:11-cv-00656 Document 1-2 d • rJl • Case 6:11-cv-00656 Document 1-2 u.s. Patent Filed 12/08/11 Page 12 of 57 PageID #: 25 Nov. 2, 1999 5,978,791 Sheet 9 of 31 FIG.12 YES S240 UPDATE DEPENDENCY LIST NO S242 SEND MESSAGE TO ~--------------~ CACHESERVERTO UPDATE CACHE S244 COMPRESS (IF DESIRED) 8246 MIRROR (IF DESIRED) Case 6:11-cv-00656 Document 1-2 u.s. Patent Filed 12/08/11 Page 13 of 57 PageID #: 26 Nov. 2,1999 5,978,791 Sheet 10 of 31 FIG.13 8250 SEARCH FOR THE FAIL PATH NAME FOUND S258 ASSIMILATE FILE 10 S256 FREEZE DIRECTORY Case 6:11-cv-00656 Document 1-2 u.s. Patent Nov. 2, 1999 Filed 12/08/11 Page 14 of 57 PageID #: 27 5,978,791 Sheet 11 of 31 8260 CONFIRM THAT TRUE NAME EXISTS LOCALLY 8262 FIG.14 SEARCH FOR PATHNAME IN LDETABLE 8264 CONFIRM THAT DIRECTORY EXISTS YES NO 8270 CREATE ENTRY IN LDE & UPDATE S268 DELETE TRUE FILE NO ./ ........... ~ ~ ..... ..... YES ~ = NEGATIVE RESPONSE S278 REQUEST MOUNT S274 SEND RTF MESSAGE & WAIT FOR RESPONSE z o ~ ~N POSITIVE RESPONSE FAIL 'JJ. =- ~ ~ ..... 'N " ""' S276 S280 FIND FILE S282 l----------.-t-.I FIG.15 ENTER TRUE FILE RETURNED INTO TFR o ...., ~ '""'" VERIFY TRUE FILE (IF I .... - - - - - - - - - J ...I DESIRED) Ol .... \C ~ oe .... ~ \C ~ Filed 12/08/11 Page 15 of 57 PageID #: 28 '""'" \C \C \C Case 6:11-cv-00656 Document 1-2 ~ • rJJ. • Case 6:11-cv-00656 Document 1-2 u.s. Patent Nov. 2,1999 Filed 12/08/11 Page 16 of 57 PageID #: 29 5,978,791 Sheet 13 of 31 "" C "'"' <.0 -L L • (!) ...J « 11. ~~ o CIJ II-CIJ z< O I-CIJ ZI- w-< w _0 ...J< 00 (33: 0:: m LU L -____________________________ ~C/) I0 i=:~ --0 ::;:, ~ ll. LU ffifact~ <::ctO~ ~ ~ ..... ..... ~ = S290 0 STORE PROCESSOR ID ~ Z 0 ~ ~N S290B LOOK UP TFR FOR TRUE NAME & ADD SOURCE LOCATION ID TO SOURCE IDS FOR TRUE NAME FIG. 16(b) 'JJ. =- ~ ~ ..... '""'" ~ o ...., ~ '""'" S291c SEND MESSAGE TO RESERVE TRUE FILE ON SOURCE PROCESSOR o ~ S290C OURCE IS ->- -1 PUBLISHING SYSTEM? .. ~ YES S291d DETERMINE EXPIRATION DATE AND ADD TO LIST Ul .... \C ""-l 00 .... ""-l \C ~ Filed 12/08/11 Page 17 of 57 PageID #: 30 '""'" \C \C \C Case 6:11-cv-00656 Document 1-2 d • rJl • Case 6:11-cv-00656 Document 1-2 u.s. Patent Filed 12/08/11 Page 18 of 57 PageID #: 31 Nov. 2,1999 5,978,791 Sheet 15 of 31 (J) (J) w w Z ~ co 0) c ..J N 0 en ~ a.. :is 0 0 w c fa 0 z ).: fa );; 0 <:: - • (!) 0 <:: Case 6:11-cv-00656 Document 1-2 u.s. Patent Filed 12/08/11 Page 19 of 57 PageID #: 32 Nov. 2,1999 Sheet 16 of 31 5,978,791 I I i r , 0 0 .... ('C') -~ CI) I c W ... 0:::: ~ 0 (!) l.L W Z • 0 C len I W ::J~Ci) en N>-O:::: o~w Cl)O::J Z CI) ('C')I-en 1- 9 ow 0 wo ('C') ..JO:::: "¢ W::J eno en llJ9 tt llJ 0<"> ::s!5 00 <:(1) W ..J ~u:: co c:(W 0 ('C') CI) 0'" 00 ..J~ W 0:::: 0:::: 0 I-o::::w ('C') wu..° CD 0 CI) ~w!5 ..J..J0 l5U::en 0:::: f3 >: ~ FIG.18{c) ,--YES, ~ ..... ..... ~ = z o ~ '""'" \C \C \C 'JJ. =- DELETE TRUE FILE 8320 CREATENSW I~~~______~ seRATCH FILE ~ ~ ..... 8322 MAKE TRUE '""'" -..J o ...., ~ '""'" FILE LOCAL Ul .... \C DONE ""-l 00 .... ""-l \C ~ Filed 12/08/11 Page 20 of 57 PageID #: 33 YEs{;;] ~N Case 6:11-cv-00656 Document 1-2 d • rJl • ~ - - - - -r-- - ---- ~ ..... ..... ~ = 0 / USE COUNT """ YES z 0 ~ ~N FIG.18(b) '""'" \C \C \C .- 'JJ. =- S330 COpy FILE TO NEW FILE, STORE FILE 10 IN LDE TABLE, DECREMENT USE COUNT ~ ~ ..... S328 SAVE FILE 10 & REMOVETFR ENTRY '""'" 00 0 ...., ~ '""'" Ul .... \C ""-l 00 .... ""-l \C ~ Filed 12/08/11 Page 21 of 57 PageID #: 34 ..... Case 6:11-cv-00656 Document 1-2 d • rJl • ~ ~ S332 ..... ..... ~ INCREMENT FREEZE LOCK = FIG. 19(0) z o f ~ y- ~N ~ FOR EACH S334 SUBORDINATE r-----I~~I FREEZE IF FILE AND DIRECTORY DIRECTORY IN THE GIVEN DIRECTORY \ . ) H '""'" \C \C \C i S336 ASSIMILATE UNASSIMILATED FILE 'JJ. =- ~ ~ ..... '""'" \C o ...., ~ '""'" S337 CREATE NEW DATA ITEM ___ I-_ Ul .... \C ""-l 00 .... ""-l \C ~ Filed 12/08/11 Page 22 of 57 PageID #: 35 ""'" Case 6:11-cv-00656 Document 1-2 . d • rJl • / ~ ~ ~ ~ 1 FOR EACH 8338 SUBORDINATE ADD ENTRY TO .---~~~ NEWDATA FILE AND DIRECTORY IN THE ITEM GIVEN DIRECTORY I 8340 I I ~ RECORD ADDITIONAL DESIRED INFORMATION ~ ..... ..... ~ = z o J ~ ~N '""'" \C \C \C ASSIMILATE THE NEW DATA ITEM 'JJ. =- FIG.19(b) ~ ~ ..... N C o ...., ~ • '""'" S344 DECREMENT THE FREEZE LOCK ... Ul .... \C ""-l 00 .... ""-l \C ~ Filed 12/08/11 Page 23 of 57 PageID #: 36 ~ S342 Case 6:11-cv-00656 Document 1-2 d • rJl • ~ ~ ..... ..... ~. S346 FIG. 20 ~ = MAKE TRUE FILE LOCAL z o ~ ~,. 8354 ) NO MORE ENTruES \ S353 FOR EACH DIRECTORY ENTRY .4~ ~N ""\ S348 MORE t-ENTRIES .... ~ READ DIRECTORY + S350 CREATE FULL PATHNAME • '""'" \C \C \C 'JJ. =- ~ ~ ..... N '" o"'" ...., ~ '""'" S352 LINK PATH TO TRUE NAME Ul .... \C ""-l 00 .... ""-l \C ~ Filed 12/08/11 Page 24 of 57 PageID #: 37 r L_DONE~ ~ Case 6:11-cv-00656 Document 1-2 d • rJl • Case 6:11-cv-00656 Document 1-2 u.s. Patent Nov. 2,1999 Filed 12/08/11 Page 25 of 57 PageID #: 38 5,978,791 Sheet 22 of 31 S354 WAIT FOR FREEZE LOCK TO TURN OFF S356 FINDTFR ENTRY FIG.21 S358 DECREMENT REFERENCE COUNT YES S362 DELETE TRUE FILE NO S364 REMOVE FILE 10 AND COMPRESSED FILE to Case 6:11-cv-00656 Document 1-2 u.s. Patent Filed 12/08/11 Page 26 of 57 PageID #: 39 Sheet 23 of 31 Nov. 2,1999 5,978,791 S365 GET OPERATION FIG. 22 S368 ">--_YES _ _ _ _ _~ ASSIMILATE S369 NEW TRUE FILE ~_YES S378 MODIFY USE COUNT OF EACH COMPONENT S379 FOR EACH PARENT DIRECTORY OR FILE, UPDATE USE COUNT, LAST ACCESS AND MODIFY TIMES S370 RECORD TRUE NAME IN AUDIT FILE Case 6:11-cv-00656 Document 1-2 u.s. Patent Nov. 2,1999 Filed 12/08/11 Page 27 of 57 PageID #: 40 5,978,791 Sheet 24 of 31 ." 8382 FIG. 23 VERIFY GROOMING LOCK OFF S384 SET GROOMING LOCK ." S386 SET GROOM COUNTS ." Case 6:11-cv-00656 Document 1-2 u.s. Patent Filed 12/08/11 Page 28 of 57 PageID #: 41 Nov. 2,1999 Sheet 25 of 31 S388 FIND LDE RECORD FIG. 24 ." S390 FIND TFR RECORD S392 INCREMENT GROOMING DELETE COUNT ." S394 ADJUST FILE SIZES 5,978,791 Case 6:11-cv-00656 Document 1-2 u.s. Patent Filed 12/08/11 Page 29 of 57 PageID #: 42 Nov. 2,1999 Sheet 26 of 31 5,978,791 FIG. 25 S396 DELETE FILE S398 UNLOCK GROOMING LOCK ~ _____NO c YES ____---, FIG. 26(0) ~ ~ ..... ..... ~ = 8404 8408 OPEN DETERMINE REGION O~ PROHIBIT z o ~ ~N YES_ 'JJ. =- ~ ~ ..... N -..J 0 ...., ~ YES '""'" 8422 PROHIBIT OPEN Ul .... \C YES ----- ""-l . 00 .... ""-l \C ~ Filed 12/08/11 Page 30 of 57 PageID #: 43 '""'" \C \C \C Case 6:11-cv-00656 Document 1-2 d • rJl • ~ ~ ..... ..... ~ = YES I ..I", LOCK IF NOT LOCKED z o ~ ~- S417 CREATE SCRATCH COpy S406 CREATE SCRATCH FILE S424 RETURN SCRATCH FILE ID , 1-' ~ FIG.26(b) . 5420 MAKE LOCAL VERSION & RETURN FILE ID FROMTFR . 'JJ. =- ~ ~ ..... N 00 o ...., ~ '""'" Ul .... \C ""-l 00 .... ""-l \C ~ Filed 12/08/11 Page 31 of 57 PageID #: 44 '""'" \C \C \C '--"10 _ _--, ~ERASEFILE ~N Case 6:11-cv-00656 Document 1-2 d • rJl • ~ ~ ..... ..... S422 DETERMINE LDE & RTENTRY RECORDS FOR FILE ~ = z 0 ~ N ~ ~ PROHIBIT DELETION 'JJ. =- ~ ~ ..... N \C 0 N°O ...., Y ~ '""'" 8424 IDENTIFY TRUE FILE FROM TRUE NAME Y I FIG. 27(0) Ul .... \C -.....l 00 .... -.....l \C ~ Filed 12/08/11 Page 32 of 57 PageID #: 45 '""'" \C \C \C Case 6:11-cv-00656 Document 1-2 d • rJl • ~ ~ ..... ..... ~ ) NO _ _ _--. = z o ~ ") DELETE SCRATCH COPY OF FILE YES ~N '""'" \C \C \C 'JJ. =- ~ ~ ..... S430 ~ c DELETE TRUE FILE 8431 REDUCE USE COUNT BY ONE I o ...., ~ I ~ '""'" S428 ~ ADD ENTRY TO AUDIT FILE Ul .... \C FIG. 27(b) ""-l 00 .... ""-l \C ~ Filed 12/08/11 Page 33 of 57 PageID #: 46 S427 Case 6:11-cv-00656 Document 1-2 d • rJl • ~ S432 LOOKUP TRUE NAME YES FIG. 28 ~ ..... ..... ~ = NO z o ~ ~N '""'" \C \C \C =- ~ ~ ..... ~ '""' ....," 0 ~ '""'" S444 POSITIVE RESPONSE Ul .... NEGATIVE RESPONSE \C ""-l 00 .... ""-l \C ~ Filed 12/08/11 Page 34 of 57 PageID #: 47 80 ~ FO~;RD ~YES~EQUESTTO REQUEST FORWARDED? 'JJ. Case 6:11-cv-00656 Document 1-2 d • rJl • Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 35 of 57 PageID #: 48 5,978,791 1 2 object-oriented programming, locations in memory or on a physical device, or the like) are always defined relative to a specific context. For instance, the file identified by a particular file name can only be determined when the directory 5 containing the file (the context) is known. The file identified by a pathname can be determined only when the file system (context) is known. Similarly, the addresses in a process This is a continuation of application Ser. No. 08/425,160, address space, the keys in a database table, or domain names filed on Apr. 11, 1995, which was abandoned upon the filing on a global computer network such as the Internet are hereof. 10 meaningful only because they are specified relative to a BACKGROUND OF THE INVENTION context. In prior art systems for identifying data items there is no 1. Field of the Invention direct relationship between the data names and the data item. This invention relates to data processing systems and, The same data name in two different contexts may refer to more particularly, to data processing systems wherein data 15 different data items, and two different data names in the items are identified by substantially unique identifiers which same context may refer to the same data item. depend on all of the data in the data items and only on the In addition, because there is no correlation between a data data in the data items. name and the data it refers to, there is no a priori way to 2. Background of the Invention confirm that a given data item is in fact the one named by a Data processing (DP) systems, computers, networks of 20 data name. For instance, in a DP system, if one processor computers, or the like, typically offer users and programs requests that another processor deliver a data item with a various ways to identify the data in the systems. given data name, the requesting processor cannot, in Users typically identify data in the data processing system general, verify that the data delivered is the correct data by giving the data some form of name. For example, a (given only the name). Therefore it may require further typical operating system (OS) on a computer provides a file 25 processing, typically on the part of the requestor, to verify system in which data items are named by alphanumeric that the data item it has obtained is, in fact, the item it identifiers. Programs typically identify data in the data requested. processing system using a location or address. For example, A common operation in a DP system is adding a new data a program may identify a record in a file or database by using item to the system. When a new data item is added to the 30 a record number which serves to locate that record. system, a name can be assigned to it only by updating the In all but the most primitive operating systems, users and context in which names are defined. Thus such systems programs are able to create and use collections of named require a centralized mechanism for the management of data items, these collections themselves being named by names. Such a mechanism is required even in a multiidentifiers. These named collections can then, themselves, processing system when data items are created and identified be made part of other named collections. For example, an 35 at separate processors in distinct locations, and in which OS may provide mechanisms to group files (data items) into there is no other need for communication when data items directories (collections). These directories can then, themare added. selves be made part of other directories. A data item may In many data processing systems or environments, data thus be identified relative to these nested directories using a items are transferred between different locations in the sequence of names, or a so-called pathname, which defines 40 system. These locations may be processors in the data a path through the directories to a particular data item (file processing system, storage devices, memory, or the like. For or directory). example, one processor may obtain a data item from another As another example, a database management system may processor or from an external storage device, such as a group data records (data items) into tables and then group 45 floppy disk, and may incorporate that data item into its these tables into database files (collections). The complete system (using the name provided with that data item). address of any data record can then be specified using the However, when a processor (or some location) obtains a database file name, the table name, and the record number of data item from another location in the DP system, it is that data record. possible that this obtained data item is already present in the Other examples of identifying data items include: identi- 50 system (either at the location of the processor or at some fying files in a network file system, identifying objects in an other location accessible by the processor) and therefore a object-oriented database, identifying images in an image duplicate of the data item is created. This situation is database, and identifying articles in a text database. common in a network data processing environment where In general, the terms "data" and "data item" as used herein proprietary software products are installed from floppy disks refer to sequences of bits. Thus a data item may be the 55 onto several processors sharing a common file server. In contents of a file, a portion of a file, a page in memory, an these systems, it is often the case that the same product will object in an object-oriented program, a digital message, a be installed on several systems, so that several copies of digital scanned image, a part of a video or audio signal, or each file will reside on the common file server. any other entity which can be represented by a sequence of In some data processing systems in which several probits. The term "data processing" herein refers to the pro- 60 cessors are connected in a network, one system is designated cessing of data items, and is sometimes dependent on the as a cache server to maintain master copies of data items, type of data item being processed. For example, a data and other systems are designated as cache clients to copy processor for a digital image may differ from a data prolocal copies of the master data items into a local cache on an cessor for an audio signal. as-needed basis. Before using a cached item, a cache client In all of the prior data processing systems the names or 65 must either reload the cached item, be informed of changes to the cached item, or confirm that the master item correidentifiers provided to identify data items (the data items being files, directories, records in the database, objects in sponding to the cached item has not changed. In other words, DATA PROCESSING SYSTEM USING SUBSTANTIALLY UNIQUE IDENTIFIERS TO IDENTIFY DATA ITEMS, WHEREBY IDENTICAL DATA ITEMS HAVE THE SAME IDENTIFIERS Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 36 of 57 PageID #: 49 5,978,791 3 4 a cache client must synchronize its data items with those on the system provides the size, age, and location of groups the cache server. This synchronization may involve reloadof data items in order to decide whether they can be safely ing data items onto the cache client. The need to keep the removed from a local file system; cache synchronized or reload it adds significant overhead to the system can efficiently record and preserve any colexisting caching mechanisms. 5 lection of data items; In view of the above and other problems with prior art the system can efficiently make a copy of any collection systems, it is therefore desirable to have a mechanism which of data items, to support a version control mechanism for allows each processor in a multiprocessor system to detergroups of the data items; mine a common and substantially unique identifier for a data the system can publish data items, allowing other, possiitem, using only the data in the data item and not relying on 10 bly anonymous, systems in a network to gain access to the any sort of context. data items and to rely on the availability of the data items; It is further desirable to have a mechanism for reducing the system can maintain a local inventory of all the data multiple copies of data items in a data processing system and items located on a given removable medium, such as a to have a mechanism which enables the identification of diskette or CD-ROM, the inventory is independent of other identical data items so as to reduce multiple copies. It is 15 properties of the data items such as their name, location, and further desirable to determine whether two instances of a date of creation; data item are in fact the same data item, and to perform the system allows closely related sets of data items, such various other systems' functions and applications on data as matching or corresponding directories on disconnected items without relying on any context information or prop20 computers, to be periodically resynchronized with one erties of the data item. another; It is also desirable to provide such a mechanism in such the system can verify that data retrieved from another a way as to make it transparent to users of the data location is the desired or requested data, using only the data processing system, and it is desirable that a single mechanism be used to address each of the problems described 25 identifier used to retrieve the data; the system can prove possession of specific data items by above. content without disclosing the content of the data items, for SUMMARY OF THE INVENTION purposes of later legal verification and to provide anonymity; This invention provides, in a data processing system, a the system tracks possession of specific data items accordmethod and apparatus for identifying a data item in the 30 ing to content by owner, independent of the name, date, or system, where the identity of the data item depends on all of other properties of the data item, and tracks the uses of the data in the data item and only on the data in the data item. specific data items and files by content for accounting Thus the identity of a data item is independent of its name, purposes. origin, location, address, or other information not derivable directly from the data, and depends only on the data itself. 35 Other objects, features, and characteristics of the present invention as well as the methods of operation and functions This invention further provides an apparatus and a method of the related elements of structure, and the combination of for determining whether a particular data item is present in parts and economies of manufacture, will become more the system or at a location in the system, by examining only apparent upon consideration of the following description the data identities of a plurality of data items. Using the method or apparatus of the present invention, 40 and the appended claims with reference to the accompanying drawings, all of which form a part of this specification. the efficiency and integrity of a data processing system can be improved. The present invention improves the design and BRIEF DESCRIPTION OF THE DRAWINGS operation of a data storage system, file system, relational database, object-oriented database, or the like that stores a FIGS. l(a) and l(b) depicts a typical data processing plurality of data items, by making possible or improving the 45 system in which a preferred embodiment of the present design and operation of at least some or all of the following invention operates; features: FIG. 2 depicts a hierarchy of data items stored at any the system stores at most one copy of any data item at a location in such a data processing system; given location, even when multiple data names in the system 50 FIGS. 3-9 depict data structures used to implement an refer to the same contents; embodiment of the present invention; and the system avoids copying data from source to destination FIGS. 10(a)-28 are flow charts depicting operation of locations when the destination locations already have the various aspects of the present invention. data; DETAILED DESCRIPTION OF THE the system provides transparent access to any data item by PRESENTLY PREFERRED EXEMPLARY reference only to its identity and independent of its present 55 EMBODIMENTS location, whether it be local, remote, or offline; the system caches data items from a server, so that only An embodiment of the present invention is now described the most recently accessed data items need be retained; with reference to a typical data processing system 100, when the system is being used to cache data items, 60 which, with reference to FIGS. l(a) and l(b), includes one problems of maintaining cache consistency are avoided; or more processors (or computers) 102 and various storage devices 104 connected in some way, for example by a bus the system maintains a desired level of redundancy of data 106. items in a network of servers, to protect against failure by ensuring that multiple copies of the data items are present at Each processor 102 includes a CPU 108, a memory 110 different locations in the system; 65 and one or more local storage devices 112. The CPU 108, memory 110, and local storage device 112 may be internally the system automatically archives data items as they are connected, for example by a bus 114. Each processor 102 created or modified; Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 37 of 57 PageID #: 50 5,978,791 5 6 may also include other devices (not shown), such as a processor, a storage device, a removable storage medium keyboard, a display, a printer, and the like. (such as a floppy disk or compact disk), or any other physical location in the system. The term "local" with respect to a In a data processing system 100, wherein more than one particular processor 102 refers to the memory and storage processor 102 is used, that is, in a multiprocessor system, the processors may be in one of various relationships. For 5 devices of that particular processor. example, two processors 102 may be in a client/server, In the following, the terms "True Name", "data identity" client/client, or a server/server relationship. These interand "data identifier" refer to the substantially unique data processor relationships may be dynamic, changing dependidentifier for a particular data item. The term "True File" ing on particular situations and functions. Thus, a particular refers to the actual file, segment, or data item identified by processor 102 may change its relationship to other proces- 10 a True Name. sors as needed, essentially setting up a peer-to-peer relationA file system for a data processing system 100 is now ship with other processors. In a peer-to-peer relationship, described which is intended to work with an existing opersometimes a particular processor 102 acts as a client ating system by augmenting some of the operating system's processor, whereas at other times the same processor acts as file management system codes. The embodiment provided a server processor. In other words, there is no hierarchy 15 relies on the standard file management primitives for actuimposed on or required of processors 102. ally storing to and retrieving data items from disk, but uses In a multiprocessor system, the processors 102 may be the mechanisms of the present invention to reference and homogeneous or heterogeneous. Further, in a multiprocessor access those data items. data processing system 100, some or all of the processors The processes and mechanisms (services) provided in this 102 may be disconnected from the network of processors for 20 embodiment are grouped into the following categories: periods of time. Such disconnection may be part of the primitive mechanisms, operating system mechanisms, normal operation of the system 100 or it may be because a remote mechanisms, background mechanisms, and extended particular processor 102 is in need of repair. mechanisms. Within a data processing system 100, the data may be Primitive mechanisms provide fundamental capabilities organized to form a hierarchy of data storage elements, 25 used to support other mechanisms. The following primitive wherein lower level data storage elements are combined to mechanisms are described: form higher level elements. This hierarchy can consist of, for 1. Calculate True Name; example, processors, file systems, regions, directories, data 2. Assimilate Data Item; files, segments, and the like. For example, with reference to 3. New True File; FIG. 2, the data items on a particular processor 102 may be 30 organized or structured as a file system 116 which comprises 4. Get True Name from Path; regions 117, each of which comprises directories 118, each 5. Link path to True Name; of which can contain other directories 118 or files 120. Each 6. Realize True File from Location; file 120 being made up of one or more data segments 122. 35 7. Locate Remote File; In a typical data processing system, some or all of these 8. Make True File Local; elements can be named by users given certain implementation specific naming conventions, the name (or pathname) of 9. Create Scratch File; an element being relative to a context. In the context of a 10. Freeze Directory; data processing system 100, a pathname is fully specified by 40 11. Expand Frozen Directory; a processor name, a filesystem name, a sequence of zero or 12. Delete True File; more directory names identifying nested directories, and a 13. Process Audit File Entry; final file name. (Usually the lowest level elements, in this 14. Begin Grooming; case segments 122, cannot be named by users.) 15. Select For Removal; and In other words, a file system 116 is a collection of 45 16. End Grooming. directories 118. A directory 118 is a collection of named files Operating system mechanisms provide typical familiar 120---both data files 120 and other directory files 118. A file file system mechanisms, while maintaining the data struc120 is a named data item which is either a data file (which tures required to offer the mechanisms of the present invenmay be simple or compound) or a directory file 118. A simple file 120 consists of a single data segment 122. A 50 tion. Operating system mechanisms are designed to augment existing operating systems, and in this way to make the compound file 120 consists of a sequence of data segments present invention compatible with, and generally transparent 122. A data segment 122 is a fixed sequence of bytes. An to, existing applications. The following operating system important property of any data segment is its size, the mechanisms are described: number of bytes in the sequence. 1. Open File; A single processor 102 may access one or more file 55 systems 116, and a single storage device 104 may contain 2. Close File; one or more file systems 116, or portions of a file system 116. 3. Read File; For instance, a file system 116 may span several storage 4. Write File; devices 104. 5. Delete File or Directory; In order to implement controls in a file system, file system 60 6. Copy File or Directory; 116 may be divided into distinct regions, where each region 7. Move File or Directory; is a unit of management and control. A region consists of a 8. Get File Status; and given directory 118 and is identified by the pathname (user defined) of the directory. 9. Get Files in Directory. In the following, the term "location", with respect to a 65 Remote mechanisms are used by the operating system in data processing system 100, refers to any of a particular responding to requests from other processors. These mechaprocessor 102 in the system, a memory of a particular nisms enable the capabilities of the present invention in a Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 38 of 57 PageID #: 51 5,978,791 7 8 peer-to-peer network mode of operation. The following 100. However, they can also be shared by placing them on remote mechanisms are described: a remote, shared file server (for instance, in a local area network of machines). In order to accommodate sharing data 1. Locate True File; structures, it is necessary that the processors accessing the 2. Reserve True File; 5 shared database use the appropriate locking techniques to 3. Request True File; ensure that changes to the shared database do not interfere 4. Retire True File; with one another but are appropriately serialized. These 5. Cancel Reservation; locking techniques are well understood by ordinarily skilled 6. Acquire True File; programmers of distributed applications. 10 7. Lock Cache; It is sometimes desirable to allow some regions to be local to a particular processor 102 and other regions to be shared 8. Update Cache; and among processors 102. (Recall that a region is a unit of file 9. Check Expiration Date. system management and control consisting of a given direcBackground mechanisms are intended to run occasionally tory identified by the pathname of the directory.) In the case and at a low priority. These provide automated management capabilities with respect to the present invention. The fol- 15 of local and shared regions, there would be both local and shared versions of each data structure. Simple changes to the lowing background mechanisms are described: processes described below must be made to ensure that 1. Mirror True File; appropriate data structures are selected for a given operation. 2. Groom Region; The local directory extensions (LDE) table 124 is a data 3. Check for Expired Links; and 20 structure which provides information about files 120 and 4. Verify Region; and directories 118 in the data processing system 100. The local 5. Groom Source List. directory extensions table 124 is indexed by a pathname or Extended mechanisms run within application programs contextual name (that is, a user provided name) of a file and over the operating system. These mechanisms provide soluincludes the True Name for most files. The information in tions to specific problems and applications. The following 25 local directory extension table 124 is in addition to that provided by the native file system of the operating system. extended mechanisms are described: The True File registry (TFR) 126 is a data store for listing 1. Inventory Existing Directory; actual data items which have True Names, both files 120 and 2. Inventory Removable, Read-only Files; segments 122. When such data items occur in the True File 3. Synchronize directories; 30 registry 126 they are known as True Files. True Files are 4. Publish Region; identified in True File registry 126 by their True Names or 5. Retire Directory; identities. The table True File registry 126 also stores 6. Realize Directory at location; location, dependency, and migration information about True 7. Verify True File; Files. 8. Track for accounting purposes; and 35 The region table (RT) 128 defines areas in the network 9. Track for licensing purposes. storage which are to be managed separately. Region table The file system herein described maintains sufficient 128 defines the rules for access to and migration of files 120 information to provide a variety of mechanisms not ordiamong various regions with the local file system 116 and narily offered by an operating system, some of which are remote peer file systems. listed and described here. Various processing performed by 40 The source table (ST) 130 is a list of the sources of True this embodiment of the present invention will now be Files other than the current True File registry 126. The described in greater detail. source table 130 includes removable volumes and remote In some embodiments, some files 120 in a data processing processors. system 100 do not have True Names because they have been The audit file (AF) 132 is a list of records indicating recently received or created or modified, and thus their True 45 changes to be made in local or remote files, these changes to Names have not yet been computed. A file that does not yet be processed in background. have a True Name is called a scratch file. The process of The accounting log (AL) 134 is a log of file transactions assigning a True Name to a file is referred to as assimilation, used to create accounting information in a manner which and is described later. Note that a scratch file may have a preserves the identity of files being tracked independent of user provided name. 50 their name or location. Some of the processing performed by the present invenThe license table (LT) 136 is a table identifying files, tion can take place in a background mode or on a delayed or which may only be used by licensed users, in a manner as-needed basis. This background processing is used to independent of their name or location, and the users licensed determine information that is not immediately required by to use them. the system or which may never be required. As an example, 55 in some cases a scratch file is being changed at a rate greater Detailed Descriptions of the Data Structures than the rate at which it is useful to determine its True Name. In these cases, determining the True Name of the file can be The following table summarizes the fields of an local postponed or performed in the background. directory extensions table entry, as illustrated by record 138 Data Structures 60 in FIG. 3. The following data structures, stored in memory 110 of one of more processors 102 are used to implement the Field Description mechanisms described herein. The data structures can be local to each processor 102 of the system 100, or they can Region ID identifies the region in which this file is reside on only some of the processors 102. 65 contained. the user provided name or contextural name Pathname The data structures described are assumed to reside on individual peer processors 102 in the data processing system Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 39 of 57 PageID #: 52 5,978,791 9 -continued Field True Name Type Scratch File ID Time of last access Time of last modification Safe flag Lock flag Size Owner 10 -continued Description of the file or directory, relative to the region in which it occurs. the computed True Name or identity of the file or directory. This True Name is not always up th date, and it is set to a special value when a file is modified and is later recomputed in the background. indicates whether the file is a data file or a directory. the physical location of the file in the file system, when no True Name has been calculated for the file. As noted above, such a file is called a scratch file. the last access time to this file. If this file is a directory, this is the last access time to any file in the directory. the time of last change of this file. If this file is a directory, this is the last modification time of any file in the directory. indicates that this file (and, if this file is a directory, all of its subordinate files) have been backed up on some other system, and it is therefore safe to remove them. indicates whether a file is locked, that is, it is being modified by the local processor or a remote processor. Only one processor may modify a file at a time. the full size of this directory (including all subordinate files), if all files in it were fully expanded and duplicated. For a file that is not a directory this is the size of the actual True File. the identity of the user who owns this file, for accounting and license tracking purposes. Field 5 Use count 10 15 20 A region table 128, specified by a directory pathname, records storage policies which allow files in the file system to be stored, accessed and migrated in different ways. Storage policies are programmed in a configurable way using a set of rules described below. Each region table record 142 of region table 128 includes the fields described in the following table (with reference to FIG. 5): Field 25 Description Region ID internally used identifier for this region. file system on the local processor of which this region is a part. a pathname relative to the region file system which defines the location of this region. The region consists of all files and directories subordinate to this pathname, except those in a region subordinate to this region. zero or more identifiers of processors which are to keep mirror or archival copies of all files in the current region. Multiple mirror processors can be defined to form a mirror group. number of copies of each file in this region that should be retained in a mirror group. specifies whether this region is local to a single processor 102, shared by several processors 102 (if, for instance, it resides on a shared file server), or managed by a remote processor. the migration policy to apply to this region. A single region might participate in several policies. The policies are as follows (parameters in brackets are specified as part of the policy): region is a cached version from [processor 10]; region is a member of a mirror set defined by [processor 10]. region is to be archived on [processor 10]. region is to be backed up locally, by placing new copies in [region ID]. region is read only and may not be changed. region is published and expires on [date]. Files in this region should be compressed. Region file system Region pathname 30 Mirror processor(s) 35 Each record of the True File registry 126 has the fields shown in the True File registry record 140 in FIG. 4. The True File registry 126 consists of the database described in the table below as well as the actual True Files identified by the True File IDs below. Field computed True Name or identity of the file. compressed version of the True File may be stored insteaded of, or in addition to, an uncompressed version. This field provides the identity of the actual representation of the compressed version of the file. tentative count of how many references have been selected for deletion during a grooming operation. most recent date and time the content of this file was accessed. date and time after which this file may be deleted by this server. processor IDs of other processors which contain references to this True File. source ID(s) of zero or more sources form which this file or data item may be retrieved. identity or disk location of the actual physical representation of the file or file segment. It is sufficient to use a filename in the registration directory of the Mirror duplication count Region status 40 Description True Name Compressed File ID Grooming delete count Time of last access Expiration Dependent processors Source IDs True File 10 Description underlying operation system. The True File ID is absent if the actual file is not currently present at the current location. number of other records on this processor which identify this True File. Policy 45 50 55 60 65 A source table 130 identifies a source location for True Files. The source table 130 is also used to identify client processors making reservations on the current processor. Each source record 144 of the source table 130 includes the fields summarized in the following table, with reference to FIG. 6: Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 40 of 57 PageID #: 53 5,978,791 11 12 Field Description Field Description source ID internal identifier used to identify a particular source. type of source location: Removable Storage Volume Local Region Cache Server Mirror Group Server Cooperative Server Publishing Server Client includes information about the rights of this processor, such as whether it can ask the local processor to store data items for it. measurement of the bandwidth, cost, and reliability of the connection to this source of True Files. The availability is used to select from among several possible sources. information on how the local processor is to access the source. This may be, for example, the name of a removable storage volume, or the processor ID and region path of a region on a remote processor. True Name True Name of a data item subject to license validation. identity of a user authorized to have access to this 0 bj ect. source type 5 licensee Various other data structures are employed on some or all of the processors 102 in the data processing system 100. Each processor 102 has a global freeze lock (GFL) 152 source (FIG. 1), which is used to prevent synchronization errors rights when a directory is frozen or copied. Any processor 102 may include a special archive directory (SAD) 154 into which 15 directories may be copied for the purposes of archival. Any source availability processor 102 may include a special media directory (SMD) 156, into which the directories of removable volumes are stored to form a media inventory. Each processor has a grooming lock 158, which is set during a grooming operasource location 20 tion. During this period the grooming delete count of True File registry entries 140 is active, and no True Files should be deleted until grooming is complete. While grooming is in effect, grooming information includes a table of pathnames selected for deletion, and keeps track of the amount of space 25 that would be freed if all of the files were deleted. The audit file 132 is a table of events ordered by Primitive Mechanisms timestamp, each record 146 in audit file 132 including the The first of the mechanisms provided by the present fields summarized in the following table (with reference to invention, primitive mechanisms, are now described. The FIG. 7): mechanisms described here depend on underlying data man30 agement mechanisms to create, copy, read, and delete data items in the True File registry 126, as identified by a True Field Description File ID. This support may be provided by an underlying operating system or disk storage manager. path of the file in question. Original Name whether the file was created, read, Operation The following primitive mechanisms are described: written, copied or deleted. 35 1. Calculate True Name; Type specifies whether the source is a file or a directory. 2. Assimilate Data Item; Processor ID ID of the remote processor generating 3. New True File; this event (if not local). 4. Get True Name from Path; time and date file was closed (required Timestamp only for accessed/modified files). 40 5. Link Path to True Name; Name of the file (required only for Pathname 6. Realize True File from Location; rename). computed True Name of the file. This is True Name 7. Locate Remote File; used by remote systems to mirror changes 8. Make True File Local; to the directory and is filled in during background processing. 9. Create Scratch File; 45 10. Freeze Directory; 11. Expand Frozen Directory; Each record 148 of the accounting log 134 records an 12. Delete True File; event which may later be used to provide information for billing mechanisms. Each accounting log entry record 148 13. Process Audit File Entry; includes at least the information summarized in the follow- 50 14. Begin Grooming; ing table, with reference to FIG. 8: 15. Select For Removal; and 16. End Grooming. 1. Calculate True Name Description Field A True Name is computed using a function, MD, which 55 date and time of this log entry. date of reduces a data block B of arbitrary length to a relatively entry small, fixed size identifier, the True Name of the data block, type of Entry types include create file, such that the True Name of the data block is virtually delete file, and transmit file. entry guaranteed to represent the data block B and only data block True Name True Name of data item in question. identity of the user responsible for owner 60 B. this action. The function MD must have the following properties: 1. The domain of the function MD is the set of all data items. The range of the function MD is the set of True Each record 150 of the license table 136 records a Names. relationship between a licensable data item and the user licensed to have access to it. Each license table record 150 65 2. The function MD must take a data item of arbitrary length and reduce it to an integer value in the range 0 includes the information summarized in the following table, with reference to FIG. 9: to N-1, where N is the cardinality of the set of True 10 Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 41 of 57 PageID #: 54 5,978,791 13 Names. That is, for an arbitrary length data block B, 14 A mechanism for calculating a True Name given a data item is now described, with reference to FIGS. 10(a) and O~MD(B)~N. 10(b). 3. The results of MD(B) must be evenly and randomly A simple data item is a data item whose size is less than distributed over the range of N, in such a way that simple or regular changes to B are virtually guaranteed 5 a particular given size (which must be defined in each particular implementation of the invention). To determine to produce a different value of MD(B). the True Name of a simple data item, with reference to FIG. 4. It must be computationally difficult to find a different 10(a), first compute the MD function (described above) on value B' such that MD(B)=MD(B'). the given simple data item (Step S212). Then append to the 5. The function MD(B) must be efficiently computed. resulting 128 bits, the byte length modulo 32 of the data item A family of functions with the above properties are the 10 (Step S214). The resulting 160-bit value is the True Name of so-called message digest functions, which are used in digital the simple data item. security systems as techniques for authentification of data. A compound data item is one whose size is greater than These functions (or algorithms) include MD4, MD5, and the particular given size of a simple data item. To determine SHA. the True Name of an arbitrary (simple or compound) data In the presently preferred embodiments, either MD5 or 15 item, with reference to FIG. 10(b), first determine if the data SHA is employed as the basis for the computation of True item is a simple or a compound data item (Step S216). If the Names. Whichever of these two message digest functions is data item is a simple data item, then compute its True Name employed, that same function must be employed on a in step S218 (using steps S212 and S214 described above), system-wide basis. otherwise partition the data item into segments (Step S220) It is impossible to define a function having a unique 20 and assimilate each segment (Step S222) (the primitive output for each possible input when the number of elements mechanism, Assimilate a Data Item, is described below), computing the True Name of the segment. Then create an in the range of the function is smaller than the number of indirect block consisting of the computed segment True elements in its domain. However, a crucial observation is that the actual data items that will be encountered in the Names (Step S224). An indirect block is a data item which operation of any system embodying this invention form a 25 consists of the sequence of True Names of the segments. very sparse subset of all the possible inputs. Then, in step S226, assimilate the indirect block and compute its True Name. Finally, replace the final thirty-two (32) A colliding set of data items is defined as a set wherein, bits of the resulting True Name (that is, the length of the for one or more pairs x and y in the set, MD(x)=MD(y). indirect block) by the length modulo 32 of the compound Since a function conforming to the requirements for MD must evenly and randomly distribute its outputs, it is 30 data item (Step S228). The result is the True Name of the compound data item. possible, by making the range of the function large enough, to make the probability arbitrarily small that actual inputs Note that the compound data item may be so large that the encountered in the operation of an embodiment of this indirect block of segment True Names is itself a compound invention will form a colliding set. data item. In this case the mechanism is invoked recursively To roughly quantify the probability of a collision, assume 35 until only simple data items are being processed. that there are no more than 230 storage devices in the world, Both the use of segments and the attachment of a length to the True Name are not strictly required in a system using and that each storage device has an average of at most 2 20 different data items. Then there are at most 2 50 data items in the present invention, but are currently considered desirable the world. If the outputs of MD range between 0 and 2 128 , features in the preferred embodiment. it can be demonstrated that the probability of a collision is 40 2. Assimilate Data Item approximately 1 in 229 . Details on the derivation of these A mechanism for assimilating a data item (scratch file or probability values are found, for example, in P. Flajolet and segment) into a file system, given the scratch file ID of the A. M. Odlyzko, "Random Mapping Statistics," Lecture data item, is now described with reference to FIG. 11. The purpose of this mechanism is to add a given data item to the Notes in Computer Science 434: Advances in CryptologyEurocrypt '89 Proceedings, Springer-Verlag, pp. 329-354. 45 True File registry 126. If the data item already exists in the Note that for some less-preferred embodiments of the True File registry 126, this will be discovered and used during this process, and the duplicate will be eliminated. present invention, lower probabilities of uniqueness may be acceptable, depending on the types of applications and Thereby the system stores at most one copy of any data mechanisms used. In some embodiments it may also be item or file by content, even when multiple names refer to useful to have more than one level of True Names, with 50 the same content. First, determine the True Name of the data item corresome of the True Names having different degrees of uniquesponding to the given scratch File ID using the Calculate ness. If such a scheme is implemented, it is necessary to ensure that less unique True Names are not propagated in the True Name primitive mechanism (Step S230). Next, look for system. an entry for the True Name in the True File registry 126 While the invention is described herein using only the 55 (Step S232) and determine whether a True Name entry, True Name of a data item as the identifier for the data item, record 140, exists in the True File registry 126. If the entry other preferred embodiments use tagged, typed, categorized record includes a corresponding True File ID or compressed or classified data items and use a combination of both the File ID (Step S237), delete the file with the scratch File ID True Name and the tag, type, category or class of the data (Step S238). Otherwise store the given True File ID in the item as an identifier. Examples of such categorizations are 60 entry record (step S239). If it is determined (in step S232) that no True Name entry files, directories, and segments; executable files and data exists in the True File registry 126, then, in Step S236, create files, and the like. Examples of classes are classes of objects in an object-oriented system. In such a system, a lower a new entry in the True File registry 126 for this True Name. degree of True Name uniqueness is acceptable over the Set the True Name of the entry to the calculated True Name, entire universe of data items, as long as sufficient uniqueness 65 set the use count for the new entry to one, store the given is provided per category of data items. This is because the True File ID in the entry and set the other fields of the entry tags provide an additional level of uniqueness. as appropriate. Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 42 of 57 PageID #: 55 5,978,791 15 16 Because this procedure may take some time to compute, record 140 of the corresponding True Name; note whether it is intended to run in background after a file has ceased to the entry is a directory by reading the True File to see if it change. In the meantime, the file is considered an unassimicontains a tag (magic number) indicating that it represents a lated scratch file. frozen directory (see also the description of the Freeze 5 Directory primitive mechanism regarding the tag); and com3. New True File pute and set the other fields of the local directory extensions The New True File process is invoked when processing appropriately. For instance, search the region table 128 to the audit file 132, some time after a True File has been identify the region of the path, and set the time of last access assimilated (using the Assimilate Data Item primitive and time of last modification to the current time. mechanism). Given a local directory extensions table entry record 138 in the local directory extensions table 124, the 10 6. Realize True File from Location This mechanism is used to try to make a local copy of a New True File process can provide the following steps (with True File, given its True Name and the name of a source reference to FIG. 12), depending on how the local processor location (processor or media) that may contain the True File. is configured: First, in step S238, examine the local directory extensions This mechanism is now described with reference to FIG. 15. table entry record 138 to determine whether the file is locked 15 First, in step S272, determine whether the location speciby a cache server. If the file is locked, then add the ID of the fied is a processor. If it is determined that the location cache server to the dependent processor list of the True File specified is a processor, then send a Request True File registry table 126, and then send a message to the cache message (using the Request True File remote mechanism) to server to update the cache of the current processor using the the remote processor and wait for a response (Step S274). If 20 a negative response is received or no response is received Update Cache remote mechanism (Step 242). after a timeout period, this mechanism fails. If a positive If desired, compress the True File (Step S246), and, if response is received, enter the True File returned in the True desired, mirror the True File using the Mirror True File File registry 126 (Step S276). (If the file received was background mechanism (Step S248). compressed, enter the True File ID in the compressed File ID 4. Get True Name from Path The True Name of a file can be used to identify a file by 25 field.) contents, to confirm that a file matches its original contents, If, on the other hand, it is determined in step S272 that the or to compare two files. The mechanism to get a True Name location specified is not a processor, then, if necessary, given the pathname of a file is now described with reference request the user or operator to mount the indicated volume to FIG. 13. (Step S278). Then (Step S280) find the indicated file on the First, search the local directory extensions table 124 for 30 given volume and assimilate the file using the Assimilate Data Item primitive mechanism. If the volume does not the entry record 138 with the given pathname (Step S250). If the pathname is not found, this process fails and no True contain a True File registry 126, search the media inventory to find the path of the file on the volume. If no such file can Name corresponding to the given pathname exists. Next, be found, this mechanism fails. determine whether the local directory extensions table entry At this point, whether or not the location is determined (in record 138 includes a True Name (Step S252), and if so, the 35 mechanism's task is complete. Otherwise, determine step S272) to be a processor, if desired, verify the True File whether the local directory extensions table entry record 138 (in step S282). identifies a directory (Step S254), and if so, freeze the 7. Locate Remote File directory (Step S256) (the primitive mechanism Freeze This mechanism allows a processor to locate a file or data Directory is described below). 40 item from a remote source of True Files, when a specific source is unknown or unavailable. A client processor system Otherwise, in step S258, assimilate the file (using the may ask one of several or many sources whether it can Assimilate Data Item primitive mechanism) defined by the File ID field to generate its True Name and store its True supply a data object with a given True Name. The steps to Name in the local directory extensions entry record. Then perform this mechanism are as follows (with reference to return the True Name identified by the local directory 45 FIGS. 16(a) and 16(b). extensions table 124. The client processor 102 uses the source table 145 to 5. Link Path to True Name select one or more source processors (Step S284). If no The mechanism to link a path to a True Name provides a source processor can be found, the mechanism fails. Next, way of creating a new directory entry record identifying an the client processor 102 broadcasts to the selected sources a existing, assimilated file. This basic process may be used to 50 request to locate the file with the given True Name using the Locate True File remote mechanism (Step S286). The copy, move, and rename files without a need to copy their request to locate may be augmented by asking to propagate contents. The mechanism to link a path to a True Name is now described with reference to FIG. 14. this request to distant servers. The client processor then waits for one or more servers to respond positively (Step First, if desired, confirm that the True Name exists locally by searching for it in the True Name registry or local 55 S288). After all servers respond negatively, or after a timeout directory extensions table 135 (Step S260). Most uses of this period with no positive response, the mechanism repeats mechanism will require this form of validation. Next, search selection (Step S284) to attempt to identify alternative for the path in the local directory extensions table 135 (Step sources. If any selected source processor responds, its proS262). Confirm that the directory containing the file named cessor ID is the result of this mechanism. Store the processor in the path already exists (Step S264). If the named file itself 60 ID in the source field of the True File registry entry record exists, delete the File using the Delete True File operating 140 of the given True Name (Step S290). If the source location of the True Name is a different system mechanism (see below) (Step S268). Then, create an entry record in the local directory extenprocessor or medium than the destination (Step S290a), perform the following steps: sions with the specified path (Step S270) and update the (i) Look up the True File registry entry record 140 for the entry record and other data structures as follows: fill in the 65 True Name field of the entry with the specified True Name; corresponding True Name, and add the source location ID to the list of sources for the True Name (Step S290b); and increment the use count for the True File registry entry Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 43 of 57 PageID #: 56 5,978,791 17 18 (ii) If the source is a publishing system, determine the expiration date on the publishing system for the True Name and add that to the list of sources. If the source is not a publishing system, send a message to reserve the True File on the source processor (Step S290c). Source selection in step S284 may be based on optimizations involving general availability of the source, access time, bandwidth, and transmission cost, and ignoring previously selected processors which did not respond in step S288. 8. Make True File Local This mechanism is used when a True Name is known and a locally accessible copy of the corresponding file or data item is required. This mechanism makes it possible to actually read the data in a True File. The mechanism takes a True Name and returns when there is a local, accessible copy of the True File in the True File registry 126. This mechanism is described here with reference to the flow chart of FIGS. 17(a) and 17(b). First, look in the True File registry 126 for a True File entry record 140 for the corresponding True Name (Step S292). If no such entry is found this mechanism fails. If there is already a True File ID for the entry (Step S294), this mechanism's task is complete. If there is a compressed file ID for the entry (Step S296), decompress the file corresponding to the file ID (Step S298) and store the decompressed file ID in the entry (Step S300). This mechanism is then complete. If there is no True File ID for the entry (Step S294) and there is no compressed file ID for the entry (Step S296), then continue searching for the requested file. At this time it may be necessary to notify the user that the system is searching for the requested file. If there are one or more source IDs, then select an order in which to attempt to realize the source ID (Step S304). The order may be based on optimizations involving general availability of the source, access time, bandwidth, and transmission cost. For each source in the order chosen, realize the True File from the source location (using the Realize True File from Location primitive mechanism), until the True File is realized (Step S306). If it is realized, continue with step S294. If no known source can realize the True File, use the Locate Remote File primitive mechanism to attempt to find the True File (Step S308). If this succeeds, realize the True File from the identified source location and continue with step S296. 9. Create Scratch File A scratch copy of a file is required when a file is being created or is about to be modified. The scratch copy is stored in the file system of the underlying operating system. The scratch copy is eventually assimilated when the audit file record entry 146 is processed by the Process Audit File Entry primitive mechanism. This Create Scratch File mechanism requires a local directory extensions table entry record 138. When it succeeds, the local directory extensions table entry record 138 contains the scratch file ID of a scratch file that is not contained in the True File registry 126 and that may be modified. This mechanism is now described with reference to FIGS. 18(a) and 18(b). First determine whether the scratch file should be a copy of the existing True File (Step S310). If so, continue with step S312. Otherwise, determine whether the local directory extensions table entry record 138 identifies an existing True File (Step S316), and if so, delete the True File using the Delete True File primitive mechanism (Step S318). Then create a new, empty scratch file and store its scratch file ID in the local directory extensions table entry record 138 (Step S320). This mechanism is then complete. If the local directory extensions table entry record 138 identifies a scratch file ID (Step S312), then the entry already has a scratch file, so this mechanism succeeds. If the local directory extensions table entry record 138 identifies a True File (S316), and there is no True File ID for the True File (S312), then make the True File local using the Make True File Local primitive mechanism (Step S322). If there is still no True File ID, this mechanism fails. There is now a local True File for this file. If the use count in the corresponding True File registry entry record 140 is one (Step S326), save the True File ID in the scratch file ID of the local directory extensions table entry record 138, and remove the True File registry entry record 140 (Step S328). (This step makes the True File into a scratch file.) This mechanism's task is complete. Otherwise, if the use count in the corresponding True File registry entry record 140 is not one (in step S326), copy the file with the given True File ID to a new scratch file, using the Read File OS mechanism and store its file ID in the local directory extensions table entry record 138 (Step S330), and reduce the use count for the True File by one. If there is insufficient space to make a copy, this mechanism fails. 10. Freeze Directory This mechanism freezes a directory in order to calculate its True Name. Since the True Name of a directory is a function of the files within the directory, they must not change during the computation of the True Name of the directory. This mechanism requires the pathname of a directory to freeze. This mechanism is described with reference to FIGS. 19(a) and 19(b). In step S332, add one to the global freeze lock. Then search the local directory extensions table 124 to find each subordinate data file and directory of the given directory, and freeze each subordinate directory found using the Freeze Directory primitive mechanism (Step S334). Assimilate each unassimilated data file in the directory using the Assimilate Data Item primitive mechanism (Step S336). Then create a data item which begins with a tag or marker (a "magic number") being a unique data item indicating that this data item is a frozen directory (Step S337). Then list the file name and True Name for each file in the current directory (Step S338). Record any additional information required, such as the type, time of last access and modification, and size (Step S340). Next, in step S342, using the Assimilate Data Item primitive mechanism, assimilate the data item created in step S338. The resulting True Name is the True Name of the frozen directory. Finally, subtract one from the global freeze lock (Step S344). 11. Expand Frozen Directory This mechanism expands a frozen directory in a given location. It requires a given pathname into which to expand the directory, and the True Name of the directory and is described with reference to FIG. 20. First, in step S346, make the True File with the given True Name local using the Make True File Local primitive mechanism. Then read each directory entry in the local file created in step S346 (Step S348). For each such directory entry, do the following: Create a full pathname using the given pathname and the file name of the entry (Step S350); and link the created path to the True Name (Step S352) using the Link Path to True Name primitive mechanism. 12. Delete True File This mechanism deletes a reference to a True Name. The underlying True File is not removed from the True File registry 126 unless there are no additional references to the file. With reference to FIG. 21, this mechanism is performed as follows: 5 10 15 20 25 30 35 40 45 50 55 60 65 Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 44 of 57 PageID #: 57 5,978,791 19 20 If the global freeze lock is on, wait until the global freeze lock is turned off (Step S354). This prevents deleting a True File while a directory which might refer to it is being frozen. Next, find the True File registry entry record 140 given the True Name (Step S356). If the reference count field of the True File registry 126 is greater than zero, subtract one from the reference count field (Step S358). If it is determined (in step S360) that the reference count field of the True File registry entry record 140 is zero, and if there are no dependent systems listed in the True File registry entry record 140, then perform the following steps: (i) If the True File is a simple data item, then delete the True File, otherwise, (ii) (the True File is a compound data item) for each True Name in the data item, recursively delete the True File corresponding to the True Name (Step S362). (iii) Remove the file indicated by the True File ID and compressed file ID from the True File registry 126, and remove the True File registry entry record 140 (Step S364). 13. Process Audit File Entry This mechanism performs tasks which are required to maintain information in the local directory extensions table 124 and True File registry 126, but which can be delayed while the processor is busy doing more time-critical tasks. Entries 142 in the audit file 132 should be processed at a background priority as long as there are entries to be processed. With reference to FIG. 22, the steps for processing an entry are as follows: Determine the operation in the entry 142 currently being processed (Step S365). If the operation indicates that a file was created or written (Step S366), then assimilate the file using the Assimilate Data Item primitive mechanism (Step S368), use the New True File primitive mechanism to do additional desired processing (such as cache update, compression, and mirroring) (Step S369), and record the newly computed True Name for the file in the audit file record entry (Step S370). Otherwise, if the entry being processed indicates that a compound data item or directory was copied (or deleted) (Step S376), then for each component True Name in the compound data item or directory, add (or subtract) one to the use count of the True File registry entry record 140 corresponding to the component True Name (Step S378). In all cases, for each parent directory of the given file, update the size, time of last access, and time of last modification, according to the operation in the audit record (Step S379). Note that the audit record is not removed after processing, but is retained for some reasonable period so that it may be used by the Synchronize Directory extended mechanism to allow a disconnected remote processor to update its representation of the local system. 14. Begin Grooming This mechanism makes it possible to select a set of files for removal and determine the overall amount of space to be recovered. With reference to FIG. 23, first verify that the global grooming lock is currently unlocked (Step S382). Then set the global grooming lock, set the total amount of space freed during grooming to zero and empty the list of files selected for deletion (Step S384). For each True File in the True File registry 126, set the delete count to zero (Step S386). 15. Select For Removal This grooming mechanism tentatively selects a pathname to allow its corresponding True File to be removed. With reference to FIG. 24, first find the local directory extensions table entry record 138 corresponding to the given pathname (Step S388). Then find the True File registry entry record 140 corresponding to the True File name in the local directory extensions table entry record 138 (Step S390). Add one to the grooming delete count in the True File registry entry record 140 and add the pathname to a list of files selected for deletion (Step S392). If the grooming delete count of the True File registry entry record 140 is equal to the use count of the True File registry entry record 140, and if the there are no entries in the dependency list of the True File registry entry record 140, then add the size of the file indicated by the True File ID and or compressed file ID to the total amount of space freed during grooming (Step S394). 16. End Grooming This grooming mechanism ends the grooming phase and removes all files selected for removal. With reference to FIG. 25, for each file in the list of files selected for deletion, delete the file (Step S396) and then unlock the global grooming lock (Step S398). Operating System Mechanisms The next of the mechanisms provided by the present invention, operating system mechanisms, are now described. The following operating system mechanisms are described: 1. Open File; 2. Close File; 3. Read File; 4. Write File; 5. Delete File or Directory; 6. Copy File or Directory; 7. Move File or Directory; 8. Get File Status; and 9. Get Files in Directory. 1. open File A mechanism to open a file is described with reference to FIGS. 26(a) and 26(b). This mechanism is given as input a pathname and the type of access required for the file (for example, read, write, read/write, create, etc.) and produces either the File ID of the file to be opened or an indication that no file should be opened. The local directory extensions table record 138 and region table record 142 associated with the opened file are associated with the open file for later use in other processing functions which refer to the file, such as read, write, and close. First, determine whether or not the named file exists locally by examining the local directory extensions table 124 to determine whether there is an entry corresponding to the given pathname (Step S400). If it is determined that the file name does not exist locally, then, using the access type, determine whether or not the file is being created by this opening process (Step S402). If the file is not being created, prohibit the open (Step S404). If the file is being created, create a zero-length scratch file using an entry in local directory extensions table 124 and produce the scratch file ID of this scratch file as the result (Step S406). If, on the other hand, it is determined in step S400 that the file name does exist locally, then determine the region in which the file is located by searching the region table 128 to find the record 142 with the longest region path which is a prefix of the file pathname (Step S408). This record identifies the region of the specified file. Next, determine using the access type, whether the file is being opened for writing or whether it is being opened only for reading (Step S410). If the file is being opened for reading only, then, if the file is a scratch file (Step S419), return the scratch File ID of the file (Step S424). Otherwise 5 10 15 20 25 30 35 40 45 50 55 60 65 Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 45 of 57 PageID #: 58 5,978,791 21 22 5. Delete File or Directory get the True Name from the local directory extensions table 124 and make a local version of the True File associated with The process of deleting a file, for a given pathname, is the True Name using the Make True File Local primitive described here with reference to FIGS. 27(a) and 27(b). mechanism, and then return the True File ID associated with First, determine the local directory extensions table entry the True Name (Step S420). 5 record 138 and region table entry record 142 for the file If the file is not being opened for reading only (Step (Step S422). If the file has no local directory extensions table S410), then, if it is determined by inspecting the region table entry record 138 or is locked or is in a read-only region, entry record 142 that the file is in a read-only directory (Step prohibit the deletion. S416), then prohibit the opening (Step S422). Identify the corresponding True File given the True Name If it is determined by inspecting the region table 128 that 10 of the file being deleted using the True File registry 126 the file is in a cached region (Step S423), then send a Lock (Step S424). If the file has no True Name, (Step S426) then Cache message to the corresponding cache server, and wait delete the scratch copy of the file based on its scratch file ID for a return message (Step S418). If the return message says in the local directory extensions table 124 (Step S427), and the file is already locked, prohibit the opening. continue with step S428. If the access type indicates that the file being modified is If the file has a True Name and the True File's use count being rewritten completely (Step S419), so that the original 15 is one (Step S429), then delete the True File (Step S430), and data will not be required, then Delete the File using the continue with step S428. Delete File OS mechanism (Step S421) and perform step If the file has a True Name and the True File's use count S406. Otherwise, make a scratch copy of the file (Step S417) is greater than one, reduce its use count by one (Step S431). and produce the scratch file ID of the scratch file as the result (Step S424). 20 Then proceed with step S428. 2. Close File In Step S428, delete the local directory extensions table This mechanism takes as input the local directory extenentry record, and add an entry to the audit file 132 indicating sions table entry record 138 of an open file and the data the time and the operation performed (delete). maintained for the open file. To close a file, add an entry to 6. Copy File or Directory the audit file indicating the time and operation (create, read 25 A mechanism is provided to copy a file or directory given or write). The audit file processing (using the Process Audit a source and destination processor and pathname. The Copy File Entry primitive mechanism) will take care of assimiFile mechanism does not actually copy the data in the file, lating the file and thereby updating the other records. only the True Name of the file. This mechanism is performed 3. Read File as follows: To read a file, a program must provide the offset and 30 (A) Given the source path, get the True Name from the length of the data to be read, and the location of a buffer into path. If this step fails, the mechanism fails. which to copy the data read. (B) Given the True Name and the destination path, link The file to be read from is identified by an open file the destination path to the True Name. descriptor which includes a File ID as computed by the Open (C) If the source and destination processors have different File operating system mechanism defined above. The File ID 35 True File registries, find (or, if necessary, create) an entry for may identify either a scratch file or a True File (or True File the True Name in the True File registry table 126 of the segment). If the File ID identifies a True File, it may be destination processor. Enter into the source ID field of this either a simple or a compound True File. Reading a file is new entry the source processor identity. accomplished by the following steps: (D) Add an entry to the audit file 132 indicating the time In the case where the File ID identifies a scratch file or a 40 and operation performed (copy). simple True File, use the read capabilities of the underlying This mechanism addresses capability of the system to operating system. avoid copying data from a source location to a destination In the case where the File ID identifies a compound file, location when the destination already has the data. In break the read operation into one or more read operations on addition, because of the ability to freeze a directory, this component segments as follows: 45 mechanism also addresses capability of the system immeA. Identify the segment(s) to be read by dividing the diately to make a copy of any collection of files, thereby to specified file offset and length each by the fixed size of a support an efficient version control mechanisms for groups segment (a system dependent parameter), to determine the of files. segment number and number of segments that must be read. 7. Move File or Directory B. For each segment number computed above, do the 50 A mechanism is described which moves (or renames) a following: file from a source path to a destination path. The move i. Read the compound True File index block to determine operation, like the copy operation, requires no actual transfer the True Name of the segment to be read. of data, and is performed as follows: ii. Use the Realize True File from Location primitive (A) Copy the file from the source path to the destination mechanism to make the True File segment available 55 path. locally. (If that mechanism fails, the Read File mecha(B) If the source path is different from the destination nism fails). path, delete the source path. iii. Determine the File ID of the True File specified by the 8. Get File Status True Name corresponding to this segment. This mechanism takes a file pathname and provides iv. Use the Read File mechanism (recursively) to read 60 information about the pathname. First the local directory extensions table entry record 138 corresponding to the from this segment into the corresponding location in pathname given is found. If no such entry exists, then this the specified buffer. 4. Write File mechanism fails, otherwise, gather information about the file File writing uses the file ID and data management capaand its corresponding True File from the local directory bilities of the underlying operating system. File access 65 extensions table 124. The information can include any (Make File Local described above) can be deferred until the information shown in the data structures, including the size, first read or write. type, owner, True Name, sources, time of last access, time of Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 46 of 57 PageID #: 59 5,978,791 23 24 last modification, state (local or not, assimilated or not, One the other hand, if a True File registry entry record 140 compressed or not), use count, expiration date, and reseris not found (Step S434), and the flag indicates that the vations. request for this True File is to be forwarded (Step S436), 9. Get Files in Directory then forward a request for this True File to some other This mechanism enumerates the files in a directory. It is 5 processors in the system (Step S442). If the source table for used (implicitly) whenever it is necessary to determine the current processor identifies one or more publishing whether a file exists (is present) in a directory. For instance, servers which should have a copy of this True File, then forward the request to each of those publishing servers (Step it is implicitly used in the Open File, Delete File, Copy File S436). or Directory, and Move File operating system mechanisms, because the files operated on are referred to by pathnames 10 If a True File registry entry record 140 is found for the required True File (Step S434), and if the entry includes a containing directory names. The mechanism works as follows: True File ID or Compressed File ID (Step S440), respond positively (Step S444). If the entry includes a True File ID The local directory extensions table 124 is searched for an entry 138 with the given directory pathname. If no such then this provides the identity or disk location of the actual entry is found, or if the entry found is not a directory, then 15 physical representation of the file or file segment required. If the entry include a Compressed File ID, then a comthis mechanism fails. If there is a corresponding True File field in the local pressed version of the True File may be stored instead of, or directory extensions table record, then it is assumed that the in addition to, an uncompressed version. This field provides True File represents a frozen directory. The Expand Frozen the identity of the actual representation of the compressed Directory primitive mechanism is used to expand the exist- 20 version of the file. If the True File registry entry record 140 is found (Step ing True File into directory entries in the local directory S434) but does not include a True File ID (the File ID is extensions table. absent if the actual file is not currently present at the current Finally, the local directory extensions table 124 is again searched, this time to find each directory subordinate to the location) (Step S440), and if the True File registry entry given directory. The names found are provided as the result. 25 record 140 includes one or more source processors, and if Remote Mechanisms the request can be forwarded, then forward the request for The remote mechanisms provided by the present inventhis True File to one or more of the source processors (Step tion are now described. Recall that remote mechanisms are S444). 2. Reserve True File used by the operating system in responding to requests from other processors. These mechanisms enable the capabilities 30 This mechanism allows a remote processor to indicate of the present invention in a peer-to-peer network mode of that it depends on the local processor for access to a specific operation. True File. It takes a True Name as input. This mechanism is In a presently preferred embodiment, processors commudescribed here. (A) Find the True File registry entry record 140 associated nicate with each other using a remote procedure call (RPC) style interface, running over one of any number of commu- 35 with the given True File. If no entry exists, reply negatively. nication protocols such as IPX/SPX or TCP/IP. Each peer (B) If the True File registry entry record 140 does not include a True File ID or compressed File ID, and if the True processor which provides access to its True File registry 126 or file regions, or which depends on another peer processor, File registry entry record 140 includes no source IDs for provides a number of mechanisms which can be used by its removable storage volumes, then this processor does not peers. 40 have access to a copy of the given file. Reply negatively. The following remote mechanisms are described: (C) Add the ID of the sending processor to the list of dependent processors for the True File registry entry record 1. Locate True File; 140. Reply positively, with an indication of whether the 2. Reserve True File; reserved True File is on line or off line. 3. Request True File; 45 3. Request True File 4. Retire True File; This mechanism allows a remote processor to request a 5. Cancel Reservation; copy of a True File from the local processor. It requires a 6. Acquire True File; True Name and responds positively by sending a True File 7. Lock Cache; back to the requesting processor. The mechanism operates as 8. Update Cache; and 50 follows: (A) Find the True File registry entry record 140 associated 9. Check Expiration Date. 1. Locate True File with the given True Name. If there is no such True File This mechanism allows a remote processor to determine registry entry record 140, reply negatively. whether the local processor contains a copy of a specific (B) Make the True File local using the Make True File True File. The mechanism begins with a True Name and a 55 Local primitive mechanism. If this mechanism fails, the flag indicating whether to forward requests for this file to Request True File mechanism also fails. other servers. This mechanism is now described with refer(C) Send the local True File in either it is uncompressed ence to FIG. 28. or compressed form to the requesting remote processor. First determine if the True File is available locally or if Note that if the True File is a compound file, the components there is some indication of where the True File is located (for 60 are not sent. example, in the Source IDs field). Look up the requested (D) If the remote file is listed in the dependent process list True Name in the True File registry 126 (Step S432). of the True File registry entry record 140, remove it. If a True File registry entry record 140 is not found for this 4. Retire True File True Name (Step S434), and the flag indicates that the This mechanism allows a remote processor to indicate request is not to be forwarded (Step S436), respond nega- 65 that it no longer plans to maintain a copy of a given True tively (Step S438). That is, respond to the effect that the True File. An alternate source of the True File can be specified, if, File is not available. for instance, the True File is being moved from one server Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 47 of 57 PageID #: 60 5,978,791 25 26 to another. It begins with a True Name, a requesting proLink the given pathname to the given True Name using cessor ID, and an optional alternate source. This mechanism the Link Path to True Name primitive mechanism. operates as follows: Unlock the local directory extensions table entry record (A) Find a True Name entry in the True File registry 126. 138 and return positively. If there is no entry for this True Name, this mechanism's task 5 9. Check Expiration Date is complete. Return current or new expiration date and possible alter(B) Find the requesting processor on the source list and, native source to caller. if it is there, remove it. Background Processes and Mechanisms (C) If an alternate source is provided, add it to the source The background processes and mechanisms provided by list for the True File registry entry record 140. 10 the present invention are now described. Recall that back(D) If the source list of the True File registry entry record ground mechanisms are intended to run occasionally and at 140 has no items in it, use the Locate Remote File primitive a low priority to provide automated management capabilities mechanism to search for another copy of the file. If it fails, . . with respect to the present invention. raIse a senous error. The following background mechanisms are described: 5. Cancel Reservation 1. Mirror True File; This mechanism allows a remote processor to indicate 15 that it no longer requires access to a True File stored on the 2. Groom Region; local processor. It begins with a True Name and a requesting 3. Check for Expired Links; processor ID and proceeds as follows: 4. Verify Region; and (A) Find the True Name entry in the True File registry 126. If there is no entry for this True Name, this mecha- 20 5. Groom Source List. nism's task is complete. 1. Mirror True File (B) Remove the identity of the requesting processor from This mechanism is used to ensure that files are available the list of dependent processors, if it appears. in alternate locations in mirror groups or archived on archi(C) If the list of dependent processors becomes zero and val servers. The mechanism depends on application-specific the use count is also zero, delete the True File. 25 migration/archival criteria (size, time since last access, num6. Acquire True File ber of copies required, number of existing alternative This mechanism allows a remote processor to insist that sources) which determine under what conditions a file a local processor make a copy of a specified True File. It is should be moved. The Mirror True File mechanism operates used, for example, when a cache client wants to write as follows, using the True File specified, perform the folthrough a new version of a file. The Acquire True File 30 lowing steps: mechanism begins with a data item and an optional True (A) Count the number of available locations of the True Name for the data item and proceeds as follows: File by inspecting the source list of the True File registry (A) Confirm that the requesting processor has the right to entry record 140 for the True File. This step determines how require the local processor to acquire data items. If not, send many copies of the True File are available in the system. a negative reply. (B) If the True File meets the specified migration criteria, (B) Make a local copy of the data item transmitted by the 35 select a mirror group server to which a copy of the file remote processor. should be sent. Use the Acquire True File remote mechanism (C) Assimilate the data item into the True File registry of to copy the True File to the selected mirror group server. Add the local processor. the identity of the selected system to the source list for the (D) If a True Name was provided with the file, the True Name calculation can be avoided, or the mechanism can 40 True File. 2. Groom Region verify that the file received matches the True Name sent. This mechanism is used to automatically free up space in (E) Add an entry in the dependent processor list of the true file registry record indicating that the requesting processor a processor by deleting data items that may be available elsewhere. The mechanism depends on application-specific depends on this copy of the given True File. 45 grooming criteria (for instance, a file may be removed if (F) Send a positive reply. there is an alternate online source for it, it has not been 7. Lock Cache accessed in a given number of days, and it is larger than a This mechanism allows a remote cache client to lock a local file so that local users or other cache clients cannot given size). This mechanism operates as follows: change it while the remote processor is using it. The Repeat the following steps (i) to (iii) with more aggressive mechanism begins with a pathname and proceeds as follows: 50 grooming criteria until sufficient space is freed or until all grooming criteria have been exercised. Use grooming infor(A) Find the local directory extensions table entry record 138 of the specified pathname. If no such entry exists, reply mation to determine how much space has been freed. Recall negatively. that, while grooming is in effect, grooming information (B) If an local directory extensions table entry record 138 includes a table of pathnames selected for deletion, and exists and is already locked, reply negatively that the file is 55 keeps track of the amount of space that would be freed if all already locked. of the files were deleted. (C) If an local directory extensions table entry record 138 (i) Begin Grooming (using the primitive mechanism). exists and is not locked, lock the entry. Reply positively. (ii) For each pathname in the specified region, for the True 8. Update Cache File corresponding to the pathname, if the True File is This mechanism allows a remote cache client to unlock a 60 present, has at least one alternative source, and meets local file and update it with new contents. It begins with a application specific grooming criteria for the region, select pathname and a True Name. The file corresponding to the the file for removal (using the primitive mechanism). True Name must be accessible from the remote processor. (iii) End Grooming (using the primitive mechanism). If the region is used as a cache, no other processors are This mechanism operates as follows: Find the local directory extensions table entry record 138 65 dependent on True Files to which it refers, and all such True corresponding to the given pathname. Reply negatively if no Files are mirrored elsewhere. In this case, True Files can be such entry exists or if the entry is not locked. removed with impunity. For a cache region, the grooming Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 48 of 57 PageID #: 61 5,978,791 27 28 criteria would ordinarily eliminate the least recently Extended Mechanisms accessed True Files first. This is best done by sorting the The extended mechanisms provided by the present invenTrue Files in the region by the most recent access time tion are now described. Recall that extended mechanisms before performing step (ii) above. The application specific run within application programs over the operating system criteria would thus be to select for removal every True File 5 to provide solutions to specific problems and applications. encountered (beginning with the least recently used) until The following extended mechanisms are described: the required amount of free space is reached. 1. Inventory Existing Directory; 3. Check for Expired Links 2. Inventory Removable, Read-only Files; This mechanism is used to determine whether dependencies on published files should be refreshed. The following 10 3. Synchronize Directories; steps describe the operation of this mechanism: 4. Publish Region; For each pathname in the specified region, for each True 5. Retire Directory; File corresponding to the pathname, perform the following 6. Realize Directory at Location; step: If the True File registry entry record 140 corresponding to 7. Verify True File; the True File contains at least one source which is a 15 8. Track for Accounting Purposes; and publishing server, and if the expiration date on the depen9. Track for Licensing Purposes. dency is past or close, then perform the following steps: 1. Inventory Existing Directory (A) Determine whether the True File registry entry record This mechanism determines the True Names of files in an contains other sources which have not expired. existing on-line directory in the underlying operating sys(B) Check the True Name expiration of the server. If the 20 tem. One purpose of this mechanism is to install True Name expiration date has been extended, or an alternate source is mechanisms in an existing file system. suggested, add the source to the True File registry entry An effect of such an installation is to eliminate immedirecord 140. ately all duplicate files from the file system being traversed. (C) If no acceptable alternate source was found in steps 25 If several file systems are inventoried in a single True File (A) or (B) above, make a local copy of the True File. registry, duplicates across the volumes are also eliminated. (D) Remove the expired source. (A) Traverse the underlying file system in the operating 4. Verify Region system. For each file encountered, excluding directories, This mechanism can be used to ensure that the data items perform the following: in the True File registry 126 have not been damaged acci(i) Assimilate the file encountered (using the Assimilate 30 dentally or maliciously. The operation of this mechanism is File primitive mechanism). This process computes its described by the following steps: True Name and moves its data into the True File (A) Search the local directory extensions table 124 for registry 126. each pathname in the specified region and then perform the (ii) Create a pathname consisting of the path to the volume 35 following steps: directory and the relative path of the file on the media. (i) Get the True File name corresponding to the pathname; Link this path to the computed True Name using the (ii) If the True File registry entry 140 for the True File Link Path to True Name primitive mechanism. does not have a True File ID or compressed file ID, 2. Inventory Removable, Read-only Files ignore it. A system with access to removable, read-only media (iii) Use the Verify True File mechanism (see extended 40 volumes (such as WORM disks and CD-ROMs) can create mechanisms below) to confirm that the True File specia usable inventory of the files on these disks without having fied is correct. to make online copies. These objects can then be used for S. Groom Source List archival purposes, directory overlays, or other needs. An The source list in a True File entry should be groomed operator must request that an inventory be created for such sometimes to ensure there are not too many mirror or archive 45 a volume. copies. When a file is deleted or when a region definition or This mechanism allows for maintaining inventories of the its mirror criteria are changed, it may be necessary to inspect contents of files and data items on removable media, such as the affected True Files to determine whether there are too diskettes and CD-ROMS, independent of other properties of many mirror copies. This can be done with the following the files such as name, location, and date of creation. steps: 50 The mechanism creates an online inventory of the files on For each affected True File, one or more removable volumes, such as a floppy disk or (A) Search the local directory extensions table to find CD-ROM, when the data on the volume is represented as a each region that refers to the True File. directory. The inventory service uses a True Name to iden(B) Create a set of "required sources", initially empty. tify each file, providing a way to locate the data independent (C) For each region found, 55 of its name, date of creation, or location. (a) determine the mirroring criteria for that region, The inventory can be used for archival of data (making it (b) determine which sources for the True File satisfy the possible to avoid archiving data when that data is already on mirroring criteria, and a separate volume), for grooming (making it possible to delete infrequently accessed files if they can be retrieved (c) add these sources to the set of required sources. (D) For each source in the True File registry entry, if the 60 from removable volumes), for version control (making it source identifies a remote processor (as opposed to removpossible to generate a new version of a CD-ROM without having to copy the old version), and for other purposes. able media), and if the source is not a publisher, and if the The inventory is made by creating a volume directory in source is not in the set of required sources, then eliminate the source, and use the Cancel Reservation remote mechanism the media inventory in which each file named identifies the to eliminate the given processor from the list of dependent 65 data item on the volume being inventoried. Data items are processors recorded at the remote processor identified by the not copied from the removable volume during the inventory source. process. Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 49 of 57 PageID #: 62 5,978,791 29 An operator must request that an inventory be created for a specific volume. Once created, the volume directory can be frozen or copied like any other directory. Data items from either the physical volume or the volume directory can be accessed using the Open File operating system mechanism which will cause them to be read from the physical volume using the Realize True File from Location primitive mechamsm. To create an inventory the following steps are taken: (A) A volume directory in the media inventory is created to correspond to the volume being inventoried. Its contextual name identifies the specific volume. (B) A source table entry 144 for the volume is created in the source table 130. This entry 144 identifies the physical source volume and the volume directory created in step (A). (C) The filesystem on the volume is traversed. For each file encountered, excluding directories, the following steps are taken: (i) The True Name of the file is computed. An entry is created in the True Name registry 124, including the True Name of the file using the primitive mechanism. The source field of the True Name registry entry 140 identifies the source table entry 144. (ii) A pathname is created consisting of the path to the volume directory and the relative path of the file on the media. This path is linked to the computed True Name using Link Path to True Name primitive mechanism. (D) After all files have been inventoried, the volume directory is frozen. The volume directory serves as a table of contents for the volume. It can be copied using the Copy File or Directory primitive mechanism to create an "overlay" directory which can then be modified, making it possible to edit a virtual copy of a read-only medium. 3. Synchronize Directories Given two versions of a directory derived from the same starting point, this mechanism creates a new, synchronized version which includes the changes from each. Where a file is changed in both versions, this mechanism provides a user exit for handling the discrepancy. By using True Names, comparisons are instantaneous, and no copies of files are necessary. This mechanism lets a local processor synchronize a directory to account for changes made at a remote processor. Its purpose is to bring a local copy of a directory up to date after a period of no communication between the local and remote processor. Such a period might occur if the local processor were a mobile processor detached from its server, or if two distant processors were run independently and updated nightly. An advantage of the described synchronization process is that it does not depend on synchronizing the clocks of the local and remote processors. However, it does require that the local processor track its position in the remote processor's audit file. This mechanism does not resolve changes made simultaneously to the same file at several sites. If that occurs, an external resolution mechanism such as, for example, operator intervention, is required. The mechanism takes as input a start time, a local directory pathname, a remote processor name, and a remote directory pathname name, and it operates by the following steps: (A) Request a copy of the audit file 132 from the remote processor using the Request True File remote mechanism. (B) For each entry 146 in the audit file 132 after the start time, if the entry indicates a change to a file in the remote directory, perform the following steps: 30 (i) Compute the pathname of the corresponding file in the local directory. Determine the True Name of the corresponding file. (ii) If the True Name of the local file is the same as the old 5 True Name in the audit file, or if there is no local file and the audit entry indicates a new file is being created, link the new True Name in the audit file to the local pathname using the Link Path to True Name primitive mechanism. 10 (iii) Otherwise, note that there is a problem with the synchronization by sending a message to the operator or to a problem resolution program, indicating the local pathname, remote pathname, remote processor, and time of change. 15 (C) After synchronization is complete, record the time of the final change. This time is to be used as the new start time the next time this directory is synchronized with the same remote processor. 4. Publish Region 20 The publish region mechanism allows a processor to offer the files in a region to any client processors for a limited period of time. The purpose of the service is to eliminate any need for client processors to make reservations with the publishing 25 processor. This in turn makes it possible for the publishing processor to service a much larger number of clients. When a region is published, an expiration date is defined for all files in the region, and is propagated into the pub30 lishing system's True File registry entry record 140 for each file. When a remote file is copied, for instance using the Copy File operating system mechanism, the expiration date is copied into the source field of the client's True File registry 35 entry record 140. When the source is a publishing system, no dependency need be created. The client processor must occasionally and in background, check for expired links, to make sure it still has access to these files. This is described in the background 40 mechanism Check for Expired Links. 5. Retire Directory This mechanism makes it possible to eliminate safely the True Files in a directory, or at least dependencies on them, after ensuring that any client processors depending on those 45 files remove their dependencies. The files in the directory are not actually deleted by this process. The directory can be deleted with the Delete File operating system mechanism. The mechanism takes the pathname of a given directory, and optionally, the identification of a preferred alternate 50 source processor for clients to use. The mechanism performs the following steps: (A) Traverse the directory. For each file in the directory, perform the following steps: 55 60 65 (i) Get the True Name of the file from its path and find the True File registry entry 140 associated with the True Name. (ii) Determine an alternate source for the True File. If the source IDs field of the TFR entry includes the preferred alternate source, that is the alternate source. If it does not, but includes some other source, that is the alternate source. If it contains no alternate sources, there is no alternate source. (iii) For each dependent processor in the True File registry entry 140, ask that processor to retire the True File, specifying an alternate source if one was determined, using the remote mechanism. Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 50 of 57 PageID #: 63 5,978,791 31 32 6. Realize Directory at Location (A) Note every time a file is created or deleted, for instance by monitoring audit entries in the Process Audit This mechanism allows the user or operating system to force copies of files from some source location to the True File Entry primitive mechanism. When such an event is File registry 126 at a given location. The purpose of the encountered, create an entry 148 in the accounting log 134 mechanism is to ensure that files are accessible in the event 5 that shows the responsible party and the identity of the file the source location becomes inaccessible. This can happen created or deleted. for instance if the source or given location are on mobile (B) Every time a file is transmitted, for instance when a computers, or are on removable media, or if the network file is copied with a Request True File remote mechanism or connection to the source is expected to become unavailable, an Acquire True File remote mechanism, create an entry in 10 the accounting log 134 that shows the responsible party, the or if the source is being retired. This mechanism is provided in the following steps for identity of the file, and the source and destination proceseach file in the given directory, with the exception of sors. subdirectories: (C) Occasionally run an accounting program to process (A) Get the local directory extensions table entry record the accounting log 134, distributing the events to the account 138 given the pathname of the file. Get the True Name of the local directory extensions table entry record 138. This 15 records of each responsible party. The account records can eventually be summarized for billing purposes. service assimilates the file if it has not already been assimi9. Track for Licensing Purposes 1ated. This mechanism ensures that licensed files are not used by (B) Realize the corresponding True File at the given unauthorized parties. The True Name provides a safe way to location. This service causes it to be copied to the given 20 identify licensed material. This service allows proof of location from a remote system or removable media. possession of specific files according to their contents with7. Verify True File out disclosing their contents. This mechanism is used to verify that the data item in a Enforcing use of valid licenses can be active (for example, True File registry 126 is indeed the correct data item given by refusing to provide access to a file without authorization) its True Name. Its purpose is to guard against device errors, 25 or passive (for example, by creating a report of users who do malicious changes, or other problems. not have proper authorization). If an error is found, the system has the ability to "heal" One possible way to perform license validation is to itself by finding another source for the True File with the perform occasional audits of employee systems. The service given name. It may also be desirable to verify that the error described herein relies on True Names to support such an has not propagated to other systems, and to log the problem or indicate it to the computer operator. These details are not 30 audit, as in the following steps: (A) For each licensed product, record in the license table described here. 136 the True Name of key files in the product (that is, files To verify a data item that is not in a True File registry 126, which are required in order to use the product, and which do use the Calculate True Name primitive mechanism described not occur in other products) Typically, for a software above. The basic mechanism begins with a True Name, and 35 product, this would include the main executable image and perhaps other major files such as clip-art, scripts, or online operates in the following steps: help. Also record the identity of each system which is (A) Find the True File registry entry record 140 correauthorized to have a copy of the file. sponding to the given True Name. (B) Occasionally, compare the contents of each user (B) If there is a True File ID for the True File registry entry record 140 then use it. Otherwise, indicate that no file 40 processor against the license table 136. For each True Name in the license table do the following: exists to verify. (i) Unless the user processor is authorized to have a copy (C) Calculate the True Name of the data item given the file of the file, confirm that the user processor does not have ID of the data item. a copy of the file using the Locate True File mecha(D) Confirm that the calculated True Name is equal to the msm. 45 given True Name. (ii) If the user processor is found to have a file that it is (E) If the True Names are not equal, there is an error in the True File registry 126. Remove the True File ID from the not authorized to have, record the user processor and True File registry entry record 140 and place it somewhere True Name in a license violation table. else. Indicate that the True File registry entry record 140 The System in Operation contained an error. 50 Given the mechanisms described above, the operation of a typical DP system employing these mechanisms is now 8. Track for Accounting Purposes described in order to demonstrate how the present invention This mechanism provides a way to know reliably which files have been stored on a system or transmitted from one meets its requirements and capabilities. system to another. The mechanism can be used as a basis for In operation, data items (for example, files, database a value-based accounting system in which charges are based 55 records, messages, data segments, data blocks, directories, on the identity of the data stored or transmitted, rather than instances of object classes, and the like) in a DP system simply on the number of bits. employing the present invention are identified by substanThis mechanism allows the system to track possession of tially unique identifiers (True Names), the identifiers depending on all of the data in the data items and only on the specific data items according to content by owner, independent of the name, date, or other properties of the data item, 60 data in the data items. The primitive mechanisms Calculate and tracks the uses of specific data items and files by content True Name and Assimilate Data Item support this property. for accounting purposes. True names make it possible to For any given data item, using the Calculate True Name identify each file briefly yet uniquely for this purpose. primitive mechanism, a substantially unique identifier or Tracking the identities of files requires maintaining an True Name for that data item can be determined. accounting log 134 and processing it for accounting or 65 Further, in operation of a DP system incorporating the billing purposes. The mechanism operates in the following present invention, multiple copies of data items are avoided steps: (unless they are required for some reason such as backups or Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 51 of 57 PageID #: 64 5,978,791 33 34 mirror copies in a fault-tolerant system). Multiple copies of Location primitive mechanism to make sure that component data items are avoided even when multiple names refer to segments are locally available, and then uses the operating the same data item. The primitive mechanisms Assimilate system file mechanisms to read data from the local file. Thus, when a compound file is copied from a remote Data Items and New True File support this property. Using the Assimilate Data Item primitive mechanism, if a data item 5 system, only its True Name is copied. When it is opened, already exists in the system, as indicated by an entry in the only its indirect block is copied. When the corresponding file True File registry 126, this existence will be discovered by is read, the required component segments are realized and this mechanism, and the duplicate data item (the new data therefore copied. item) will be eliminated (or not added). Thus, for example, In operation data items can be accessed by reference to if a data file is being copied onto a system from a floppy 10 their identities (True N ames) independent of their present location. The actual data item or True File corresponding to disk, if, based on the True Name of the data file, it is determined that the data file already exists in the system (by a given data identifier or True Name may reside anywhere in the same or some other name), then the duplicate copy will the system (that is, locally, remotely, offline, etc). If a not be installed. If the data item was being installed on the required True File is present locally, then the data in the file system by some name other than its current name, then, 15 can be accessed. If the data item is not present locally, there using the Link Path to True Name primitive mechanism, the are a number of ways in which it can be obtained from other (or new) name can be linked to the already existing wherever it is present. Using the source IDs field of the True data item. File registry table, the location(s) of copies of the True File In general, the mechanisms of the present invention corresponding to a given True Name can be determined. The operate in such a way as to avoid recreating an actual data 20 Realize True File from Location primitive mechanism tries item at a location when a copy of that data item is already to make a local copy of a True File, given its True Name and present at that location. In the case of a copy from a floppy the name of a source location (processor or media) that may disk, the data item (file) may have to be copied (into a contain the True File. If, on the other hand, for some reason it is not known where there is a copy of the True File, or if scratch file) before it can be determined that it is a duplicate. This is because only one processor is involved. On the other 25 the processors identified in the source IDs field do not respond with the required True File, the processor requiring hand, in a multiprocessor environment or DP system, each the data item can make a general request for the data item processor has a record of the True Names of the data items on that processor. When a data item is to be copied to using the Request True File remote mechanism from all processors in the system that it can contact. another location (another processor) in the DP system, all As a result, the system provides transparent access to any that is necessary is to examine the True Name of the data 30 item prior to the copying. If a data item with the same True data item by reference to its data identity, and independent of its present location. Name already exists at the destination location (processor), In operation, data items in the system can be verified and then there is no need to copy the data item. Note that if a data item which already exists locally at a destination location is have their integrity checked. This is from the manner in still copied to the destination location (for example, because 35 which True Names are determined. This can be used for the remote system did not have a True Name for the data security purposes, for instance, to check for viruses and to verify that data retrieved from another location is the desired item or because it arrives as a stream of un-named data), the Assimilate Data Item primitive mechanism will prevent and requested data. For example, the system might store the multiple copies of the data item from being created. True Names of all executable applications on the system and Since the True Name of a large data item (a compound 40 then periodically redetermine the True Names of each of data item) is derived from and based on the True Names of these applications to ensure that they match the stored True components of the data item, copying of an entire data item Names. Any change in a True Name potentially signals can be avoided. Since some (or all) of the components of a corruption in the system and can be further investigated. The large data item may already be present at a destination Verify Region background mechanism and the Verify True location, only those components which are not present there 45 File extended mechanisms provide direct support for this need be copied. This property derives from the manner in mode of operation. The Verify Region mechanism is used to ensure that the data items in the True File registry have not which True Names are determined. When a file is copied by the Copy File or Directory been damaged accidentally or maliciously. The Verify True operating system mechanism, only the True Name of the file File mechanism verifies that a data item in a True File 50 registry is indeed the correct data item given its True Name. is actually replicated. Once a processor has determined where (that is, at which When a file is opened (using the Open File operating other processor or location) a copy of a data item is in the system mechanism), it uses the Make True File Local DP system, that processor might need that other processor or primitive mechanism (either directly or indirectly through location to keep a copy of that data item. For example, a the Create Scratch File primitive mechanism) to create a local copy of the file. The Open File operating system 55 processor might want to delete local copies of data items to mechanism uses the Make True File Local primitive make space available locally while knowing that it can rely mechanism, which uses the Realize True File from Location on retrieving the data from somewhere else when needed. To this end the system allows a processor to Reserve (and primitive mechanism, which, in turn uses the Request True cancel the reservation of) True Files at remote locations File remote mechanism. The Request True File remote mechanism copies only a 60 (using the remote mechanism). In this way the remote single data item from one processor to another. If the data locations are put on notice that another location is relying on item is a compound file, its component segments are not the presence of the True File at their location. copied, only the indirect block is copied. The segments are A DP system employing the present invention can be copied only when they are read (or otherwise needed). made into a fault-tolerant system by providing a certain The Read File operating system mechanism actually reads 65 amount of redundancy of data items at multiple locations in data. The Read File mechanism is aware of compound files the system. Using the Acquire True File and Reserve True and indirect blocks, and it uses the Realize True File from File remote mechanisms, a particular processor can imp le- Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 52 of 57 PageID #: 65 5,978,791 35 36 ment its own form of fault-tolerance by copying data items A publishing server, on the other hand, may want to to other processors and then reserving them there. However, provide access to many clients, and possibly anonymous the system also provides the Mirror True File background ones, without incurring the overhead of tracking dependenmechanism to mirror (make copies) of the True File availcies for each client. Therefore, a public server can provide able elsewhere in the system. Any degree of redundancy 5 expiration dates for True Files in its registry. This allows (limited by the number of processors or locations in the client systems to safely maintain references to a True File on system) can be implemented. As a result, this invention the public server. The Check For Expired Links background maintains a desired degree or level of redundancy in a mechanism allows the client of a publishing server to network of processors, to protect against failure of any occasionally confirm that its dependencies on the publishing particular processor by ensuring that multiple copies of data 10 server are safe. items exist at different locations. In a variation of this aspect of the invention, a processor The data structures used to implement various features that is newly connected (or reconnected after some absence) and mechanisms of this invention store a variety of useful to the system can obtain a current version of all (or of information which can be used, in conjunction with the needed) data in the system by requesting it from a server various mechanisms, to implement storage schemes and policies in a DP system employing the invention. For 15 processor. Any such processor can send a request to update or resynchronize all of its directories (starting at a root example, the size, age and location of a data item (or of directory), simply by using the Synchronize Directories groups of data items) is provided. This information can be extended mechanism on the needed directories. used to decide how the data items should be treated. For Using the accounting log or some other user provided example, a processor may implement a policy of deleting local copies of all data items over a certain age if other 20 mechanism, a user can prove the existence of certain data items at certain times. By publishing (in a public place) a list copies of those data items are present elsewhere in the system. The age (or variations on the age) can be determined of all True Names in the system on a given day (or at some using the time of last access or modification in the local given time), a user can later refer back to that list to show directory extensions table, and the presence of other copies that a particular data item was present in the system at the of the data item can be determined either from the Safe Flag 25 time that list was published. Such a mechanism is useful in tracking, for example, laboratory notebooks or the like to or the source IDs, or by checking which other processors in prove dates of conception of inventions. Such a mechanism the system have copies of the data item and then reserving at least one of those copies. also permits proof of possession of a data item at a particular In operation, the system can keep track of data items date and time. regardless of how those items are named by users (or 30 The accounting log file can also track the use of specific regardless of whether the data items even have names). The data items and files by content for accounting purposes. For system can also track data items that have different names instance, an information utility company can determine the (in different or the same location) as well as different data data identities of data items that are stored and transmitted through its computer systems, and use these identities to items that have the same name. Since a data item is identified by the data in the item, without regard for the context of the 35 provide bills to its customers based on the identities of the data, the problems of inconsistent naming in a DP system are data items being transmitted (as defined by the substantially overcome. unique identifier). The assignment of prices for storing and In operation, the system can publish data items, allowing transmitting specific True Files would be made by the other, possibly anonymous, systems in a network to gain information utility and/or its data suppliers; this information access to the data items and to rely on the availability of 40 would be joined periodically with the information in the accounting log file to produce customer statements. these data items. True Names are globally unique identifiers Backing up data items in a DP system employing the which can be published simply by copying them. For example, a user might create a textual representation of a file present invention can be done based on the True Names of on system A with True Name N (for instance as a hexadecithe data items. By tracking backups using True Names, mal string), and post it on a computer bulletin board. 45 duplication in the backups is prevented. In operation, the Another user on system B could create a directory entry F system maintains a backup record of data identifiers of data for this True Name N by using the Link Path to True Name items already backed up, and invokes the Copy File or primitive mechanism. (Alternatively, an application could Directory operating system mechanism to copy only those be developed which hides the True Name from the users, but data items whose data identifiers are not recorded in the provides the same public transfer service.) 50 backup record. Once a data item has been backed up, it can be restored by retrieving it from its backup location, based When a program on system B attempts to open pathname on the identifier of the data item. Using the backup record F linked to True Name N, the Locate Remote File primitive produced by the backup to identify the data item, the data mechanism would be used, and would use the Locate True item can be obtained using, for example, the Make True File File remote mechanism to search for True Name N on one or more remote processors, such as system A. If system B 55 Local primitive mechanism. has access to system A, it would be able to realize the True In operation, the system can be used to cache data items File (using the Realize True File from Location primitive from a server, so that only the most recently accessed data mechanism) and use it locally. Alternatively, system B could items need be retained. To operate in this way, a cache client find True Name N by accessing any publicly available True is configured to have a local registry (its cache) with a Name server, if the server could eventually forward the 60 remote Local Directory Extensions table (from the cache request to system A. server). Whenever a file is opened (or read), the Local Clients of a local server can indicate that they depend on Directory Extensions table is used to identify the True Name, and the Make True File Local primitive mechanism a given True File (using the Reserve True File remote inspects the local registry. When the local registry already mechanism) so that the True File is not deleted from the server registry as long as some client requires access to it. 65 has a copy, the file is already cached. Otherwise, the Locate (The Retire True File remote mechanism is used to indicate True File remote mechanism is used to get a copy of the file. that a client no longer needs a given True File.) This mechanism consults the cache server and uses the Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 53 of 57 PageID #: 66 5,978,791 37 38 Whenever a pathname is traversed, the Get Files in Request True File remote mechanism to make a local copy, Directory operating system mechanism is used, and when it effectively loading the cache. encounters a frozen directory, it uses the Expand Frozen The Groom Cache background mechanism flushes the Directory primitive mechanism. cache, removing the least-recently-used files from the cache A frozen directory can be copied from one pathname to client's True File registry. While a file is being modified on 5 another efficiently, merely by copying its True Name. The a cache client, the Lock Cache and Update Cache remote Copy File operating system mechanism is used to copy a mechanisms prevent other clients from trying to modify the frozen directory. same file. Thus it is possible to efficiently create copies of different In operation, when the system is being used to cache data versions of a directory, thereby creating a record of its items, the problems of maintaining cache consistency are 10 history (hence a version control system). avoided. In operation, the system can maintain a local inventory of To access a cache and to fill it from its server, a key is all the data items located on a given removable medium, required to identify the data item desired. Ordinarily, the key such as a diskette or CD-ROM. The inventory is indepenis a name or address (in this case, it would be the pathname dent of other properties of the data items such as their name, of a file). If the data associated with such a key is changed, 15 location, and date of creation. the client's cache becomes inconsistent; when the cache The Inventory Existing Directory extended mechanism client refers to that name, it will retrieve the wrong data. In provides a way to create True File Registry entries for all of order to maintain cache consistency it is necessary to notify the files in a directory. One use of this inventory is as a way every client immediately whenever a change occurs on the to pre-load a True File registry with backup record infor20 mation. Those files in the registry (such as previously server. installed software) which are on the volumes inventoried By using an embodiment of the present invention, the need not be backed up onto other volumes. cache key uniquely identifies the data it represents. When The Inventory Removable, Read-only Files extended the data associated with a name changes, the key itself mechanism not only determines the True Names for the files changes. Thus, when a cache client wishes to access the modified data associated with a given file name, it will use 25 on the medium, but also records directory entries for each a new key (the True Name of the new file) rather than the key file in a frozen directory structure. By copying and modifying this directory, it is possible to create an on line patch, to the old file contents in its cache. The client will always or small modification of an existing read-only file. For request the correct data, and the old data in its cache will be example, it is possible to create an online representation of eventually aged and flushed by the Groom Cache back30 a modified CD-ROM, such that the unmodified files are ground mechanism. actually on the CD-ROM, and only the modified files are Because it is not necessary to immediately notify clients when changes on the cache server occur, the present invenonline. tion makes it possible for a single server to support a much In operation, the system tracks possession of specific data larger number of clients than is otherwise possible. items according to content by owner, independent of the In operation, the system automatically archives data items 35 name, date, or other properties of the data item, and tracks the uses of specific data items and files by content for as they are created or modified. After a file is created or accounting purposes. Using the Track for Accounting Purmodified, the Close File operating system mechanism creposes extended mechanism provides a way to know reliably ates an audit file record, which is eventually processed by the Process Audit File Entry primitive mechanism. This which files have been stored on a system or transmitted from mechanism uses the New True File primitive mechanism for 40 one system to another. any file which is newly created, which in turn uses the True Names in Relational and Object-Oriented Databases Mirror True File background mechanism if the True File is Although the preferred embodiment of this invention has in a mirrored or archived region. This mechanism causes one been presented in the context of a file system, the invention of True Names would be equally valuable in a relational or or more copies of the new file to be made on remote 45 object-oriented database. A relational or object-oriented processors. database system using True Names would have similar In operation, the system can efficiently record and prebenefits to those of the file system employing the invention. serve any collection of data items. The Freeze Directory primitive mechanism creates a True File which identifies all For instance, such a database would permit efficient elimiof the files in the directory and its subordinates. Because this nation of duplicate records, support a cache for records, True File includes the True Names of its constituents, it 50 simplify the process of maintaining cache consistency, prorepresents the exact contents of the directory tree at the time vide location-independent access to records, maintain it was frozen. The frozen directory can be copied with its archives and histories of records, and synchronize with distant or disconnected systems or databases. components preserved. The mechanisms described above can be easily modified The Acquire True File remote mechanism (used in mirroring and archiving) preserves the directory tree structure 55 to serve in such a database environment. The True Name registry would be used as a repository of database records. by ensuring that all of the component segments and True Files in a compound data item are actually copied to a All references to records would be via the True Name of the record. (The Local Directory Extensions table is an example remote system. Of course, no transfer is necessary for data items already in the registry of the remote system. of a primary index that uses the True Name as the unique In operation, the system can efficiently make a copy of 60 identifier of the desired records.) any collection of data items, to support a version control In such a database, the operations of inserting, updating, mechanism for groups of the data items. and deleting records would be implemented by first assimiThe Freeze Directory primitive mechanism is used to lating records into the registry, and then updating a primary create a collection of data items. The constituent files and key index to map the key of the record to its contents by segments referred to by the frozen directory are maintained 65 using the True Name as a pointer to the contents. in the registry, without any need to make copies of the The mechanisms described in the preferred embodiment, constituents each time the directory is frozen. or similar mechanisms, would be employed in such a Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 54 of 57 PageID #: 67 5,978,791 39 40 system. These mechanisms could include, for example, the record of identifiers of data items backed up, and mechanisms for calculating true names, assimilating, invoking duplication means to copy only those data locating, realizing, deleting, copying, and moving True items whose data identifiers are not recorded in the Files, for mirroring True Files, for maintaining a cache of backup record. True Files, for grooming True Files, and other mechanisms 5 9. An apparatus as in claim 8, further comprising: based on the use of substantially unique identifiers. recovery means for retrieving a data item previously While the invention has been described in connection backed up by said backup means, based on the identiwith what is presently considered to be the most practical fier of the data item, said recovery means using the and preferred embodiments, it is to be understood that the backup record to identify the data item, and invoking invention is not to be limited to the disclosed embodiment, 10 access means to retrieve the data item. but on the contrary, is intended to cover various modifica10. An apparatus as in claim 2, wherein a location is a tions and equivalent arrangements included within the spirit computer among a network of computers, the apparatus and scope of the appended claims. further comprising: What is claimed is: 1. In a data processing system, an apparatus comprising: remote existence means for determining whether a data identity means for determining, for any of a plurality of 15 item is present at a remote location in the system from data items present in the system, a substantially unique a current location in the system, based on the identifier identifier, the identifier being determined using and of the data item, said remote location using local depending on all of the data in the data item and only existence means at the remote location to determine the data in the data item, whereby two identical data whether the data item is present at the remote location, items in the system will have the same identifier; and 20 and providing the current location with an indication of existence means for determining whether a particular data the presence of the data item at the remote location. item is present in the system, by examining the iden11. An apparatus as in claim 4, wherein a location is a tifiers of the plurality of data items. computer among a network of computers, the apparatus 2. An apparatus as in claim 1, further comprising: further comprising: local existence means for determining whether an 25 requesting means for requesting a data item at a current instance of a particular data item is present at a parlocation in the system from a remote location in the ticular location in the system, based on the identifier of system, based on the identifier of the data item, said the data item. remote location using access means at the remote 3. An apparatus as in claim 2, wherein each location location to obtain the data item and to send it to the contains a distinct plurality of data items, and wherein said 30 current location if it is present. local existence means determines whether a particular data 12. An apparatus as in claim 1, further comprising: item is present at a particular location in the system by context means for making and maintaining a context examining the identifiers of the plurality of data items at said association between at least one contextual name of a particular location in the system. data item in the system and the identifier of the data 35 4. An apparatus as in claim 2, further comprising: item; and data associating means for making and maintaining, for a referencing means for obtaining the identifier of a data data item in the system, an association between the data item in the system given a contextual name for the data item and the identifier of the data item; and item, using said context association. access means for accessing a particular data item using 40 13. An apparatus as in claim 12, further comprising: the identifier of the data item. assignment means for assigning a data item to a contex5. An apparatus as in claim 2, further comprising: tual name, invoking said identity means to determine duplication means for copying a data item from a source the identifier of the data item, and invoking said context to a destination in the data processing system, by means to make or modify the context association providing said destination with the data item only if it 45 between the contextual name of the data item and the is determined using the data identifier that the data item identifier of the data item. is not present at the destination. 14. An apparatus as in claim 12, further comprising: 6. An apparatus as in claim 4, further comprising: data associating means for making and maintaining, for a assimilation means for assimilating a new data item into data item in the system, an association between the data the system, said assimilation means invoking said 50 item and the identifier of the data item; identity means to determine the identifier of the new access means for accessing a particular data item using data item and invoking said data associating means to the identifier of the particular data item; and associate the new data item with its identifier. 7. An apparatus as in claim 4, further comprising: contextual name access means for accessing a data item in the system for a given context name of the data item, duplication means for duplicating a data item from a 55 determining the data identifier associated with the source location to a destination location in the data given context name, and invoking said access means to processing system, based on the identifier of the data access the data item using the data identifier. item, said duplication means invoking said local exist15. An apparatus as in claim 11, further comprising: ence means to determine whether an instance of the data item is present at the destination location, and 60 transparent access means for accessing a data item from invoking said access means to provide said destination one of several locations, using the identifier of the data with the data item only if said local existence means item, said transparent access means invoking said local determines that no instance of the data item is present existence means to determine if the particular data item at the destination. is present at the current location, and, in the case when 8. An apparatus as in claim 7, further comprising: 65 the particular data item is not present at the current backup means for making copies of data items in the location, invoking said requesting means to obtain the system, said backup means maintaining a backup data item from a remote location. Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 55 of 57 PageID #: 68 5,978,791 41 42 16. An apparatus as in claim 15, further comprising: requesting with a particular data identifier, to confirm identifier copy means for copying an identifier of a data that the data item obtained from the requesting means item from a source location to a destination location. is the same data item as the data item requested, the 17. An apparatus as in claim 15, further comprising: verifying means invoking the identity means to detercontext means for making and maintaining a context 5 mine the data identifier of the obtained data item, and association between a contextual name of a data item in comparing the determined data identifier with the parthe system and the identifier of the data item; ticular data identifier to verify the obtained data item. context copy means for copying a data item from a source 24. An apparatus as in claim 2, wherein a location is at location to a destination location, given the contextual least one of a storage location and a processing location, and name of the data item, by copying only the context 10 wherein a storage location is at least one of a data storage association between the contextual identifier and the device and a data storage volume, and wherein a processing data identifier from the source location to the destinalocation is at least one of a data processor and a computer. tion location; and 25. An apparatus as in claim 3, wherein at least some of transparent referencing means for obtaining a data item from one of several locations the system given a 15 said data items are compound data items, each compound contextual name for the data item, said transparent data item including at least some component data items in a referencing means invoking said context association to fixed sequence, and wherein the identity means determines determine the data identifier of a data item given a the identifier of a compound data item based on the identifier contextual name, and invoking said transparent access of each component data item of the compound data item. means to access the data item from one of several 20 26. An apparatus as in claim 3, further comprising: locations given the identifier of the data item. context associating means for making and maintaining a 18. An apparatus as in claim 1, wherein at least some of context association, for any data item in the system, said data items are compound data items, each compound between the identifier of the data item and at least one data item including at least some component data items in a contextual name of the data item at a particular location fixed sequence, and wherein the identity means determines 25 in the system; the identifier of a compound data item based on each means for obtaining the identifier of a data item in the component data item of the compound data item. system given a contextual name for the data item at a 19. An apparatus as in claim 18, wherein said compound particular location in the system; and data items are files and said component data items are logical copy means for associating the data identifier segments, and wherein the identity means determines the 30 corresponding to a contextual name at a source location identifier of a file based on the identifier of each data with a contextual name at a destination location in the segment of the file. data processing system. 20. An apparatus as in claim 18, wherein said compound 27. An apparatus as in claim 25, wherein said compound data items are directories and said component data items are files or subordinate directories, and wherein the identity 35 data items are files and said component data items are segments, and wherein the identity means determines the means determines the identifier of a given directory based on identifier of a file based on the identifier of each data each file and subordinate directory within the given direcsegment of the file. tory. 28. An apparatus as in claim 25, further comprising: 21. An apparatus as in claim 11, further comprising: compound copy means for copying a data item from a means for advertising a data item from a location in the 40 source location to a destination location in the data system to at least one other location in the system, said processing system, said compound copy means invokmeans for advertising providing each of said at least ing said local existence means to determine whether the one other location with the data identifier of the data data item is present at the destination, and to determine, item, and providing the data item to only those locawhen the data item is a compound data item, whether tions of said other locations that request said data item 45 the component data items of the compound data item in response to said providing. are present at the destination, and providing said des22. An apparatus as in claim 18, further comprising: tination with the data item only if said local existence local existence means for determining whether a particumeans determines that the data item is not present at the lar data item is present at a particular location in the destination, and providing said destination with each system, based on the identifier of the data item; and 50 component data item only if said local existence means compound copy means for copying a data item from a determines that the component data item is not present source to a destination in the data processing system, at the destination. said compound copy means invoking said local exist29. An apparatus as in any of claims 1-28, wherein a data ence means to determine whether the data item is present at the destination, and to determine, when the 55 item is at least one of a file, a database record, a message, a data segment, a data block, a directory, and an instance an data item is a compound data item, whether the comobject class. ponent data items of the compound data item are 30. A method of identifying a data item present in a data present at the destination, and providing said destinaprocessing system for subsequent access to the data item, the tion with the data item only if said local existence means determines that the data item is not present at the 60 method comprising: determining a substantially unique identifier for the data destination, and providing said destination with each item, the identifier depending on and being determined component data item only if said local existence means using all of the data in the data item and only the data determines that the component data item is not present in the data item, whereby two identical data items in the at the destination. system will have the same identifier; and 65 23. An apparatus as in claim 11, further comprising: means for verifying the integrity of a data item obtained accessing a data item in the system using the identifier of the data item. from the requesting means in response to providing the Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 56 of 57 PageID #: 69 5,978,791 43 44 31. A method as in claim 30, further comprising: (A) maintammg a backup record of identifiers of data items backed up at the previous backup time; and making and maintaining, for a plurality of data items present in the system, an association between each of (B) for each of the plurality of data items present in the the data items and the identifier of each of the data data processing system, items, wherein said accessing a data item accesses a 5 (i) determining a substantially unique identifier for the data item via the association. data item, the identifier depending on and being 32. A method as in claim 31, further comprising: determined using all of the data in the data item and only the data in the data item, whereby two identical assimilating a new data item into the system, by deterdata items in the system will have the same identimining the identifier of the new data item and associ10 fier; ating the new data item with its identifier. (ii) determining those data items of the plurality of data 33. A method for duplicating a given data item present at items whose identifiers are not in the backup record; a source location to a destination location in a data processand ing system, the method comprising: (iii) based on the determining, copying only those data determining a substantially unique identifier for the given items whose data identities are not recorded in the data item, the identifier depending on and being deter- 15 backup record. mined using all of the data in the data item and only the 37. A method as in claim 36, further comprising: data in the data item, whereby two identical data items recording in the backup record the identifiers of those data in the system will have the same identifier; items copied in said copying. determining, using the data identifier, whether the data 20 38. A method of locating a particular data item at a item is present at the destination location; and location in a data processing system, the method comprisbased on the determining whether the data item is present, ing: providing the destination location with the data item (A) determining a substantially unique identifier for the only if the data item is not present at the destination. data item, the identifier depending on and being deter34. A method as in claim 33, wherein the given data item 25 mined using all of the data in the data item and only the is a compound data item having a plurality of component data in the data item, whereby two identical data items data items, the method further comprising: in the system will have the same identifier; for each data item of the component data items, (B) requesting the particular data item by sending the data obtaining the component data identifier of the data item identifier of the data item from the requester location to by determining a substantially unique identifier for 30 at least one location of a plurality of provider locations the data item, the identifier depending on and being in the system; and determined using all of the data in the data item and (C) on at least some of the provider locations, only the data in the data item, whereby two identical (a) for each data item of a plurality of data items at the data items in the system will have the same identiprovider locations, fier; 35 (i) determining a substantially unique identifier for the determining, using the obtained component data data item, the identifier depending on and being identifier, whether the data item is present at the determined using all of the data in the data item and destination; and only on the data in the data item, whereby two based on the determining, providing the destination identical data items in the system will have the same with the data item only if the data item is not present 40 identifier; and at the destination. (ii) making and maintaining a set of identifiers of data 35. A method for determining whether a particular data items, item is present in a data processing system, the method (b) determining, based on the set of identifiers, whether comprising: the data item corresponding to the requested data (A) for each data item of a plurality of data items present 45 identifier is present at the provider location; and in the system, (c) based on the determining, when the provider loca(i) determining a substantially unique identifier for the tion determines that the particular data item is data item, the identifier depending on and being present at the provider location, notifying the determined using all of the data in the data item and requestor that the provider has a copy of the given only the data in the data item, whereby two identical 50 data item. data items in the system will have the same identi39. The method of claim 38, further comprising: fier; and (a) for each data item of a plurality of data items present (ii) making and maintaining a set of identifiers of the at said provider locations, plurality of data items; and making and maintaining an association between the (B) for the particular data item, 55 data item and the identifier of the data item, (i) determining a particular substantially unique iden(b) in response to said notifying, said client location tifier for the data item, the identifier depending on copying said data item from one of said responding and being determined using all of the data in the data remote locations, using said association to access the item and only the data in the data item, whereby two data item given the data identifier. identical data items in the system will have the same 60 40. A method of locating a particular data item among a identifier; and plurality of locations, each of the locations having a plurality (ii) determining whether the particular identifier is in of data items, the method comprising: the set of data items. 36. A method of backing up, of a plurality of data items determining, for the particular data item and for each data present in a data processing system, data items modified 65 item of the plurality of data items, a substantially since a previous backup time in the data processing system, unique identifier for the data item, the identifier the method comprising: depending on and being determined using all of the Case 6:11-cv-00656 Document 1-2 Filed 12/08/11 Page 57 of 57 PageID #: 70 5,978,791 45 46 data in the data item and only the data in the data item, whereby two identical data items in the system will have the same identifier; and determining the presence of the particular data item in each of the plurality of locations by determining whether the identifier of the particular data item is present at each of the locations. 41. The method of claim 30, wherein said accessing further comprises: for a given data identifier and for a given current location and a remote location in the system: determining whether the data item corresponding to the given data identifier is present at the current location, and based on said determining, if said data item is not present at the current location, fetching the data item from a remote location in the system to the current location. 42. The method of claim 41, further comprising: for each contextual name at a location, making and maintaining a context association between the context name of a data item and the identifier of said data item, and when some context association changes at said current location, and notifying said remote location of a modification to the context association. 43. The method of claim 42, further comprising: at said remote location, updating the association between the contextual identifier of the data item and the identifier of the data item. 44. The method of claim 43, further comprising: from said remote location, notifying all other locations that said data item has been modified, by providing the contextual identifier and data identifier of said data item to said other locations. 45. The method of claim 44, further comprising, at each location notified that the data item has been modified: modifying an association between the contextual identifier of the data item and the data identifier of the data item, to record that the data item has been modified. 46. A method of maintaining at least a predetermined number of copies of a given data item in a data processing system, at different locations in the data processing system, the data processing system being one wherein data is identified by a substantially unique identifier, the identifier depending on and being determined using all of the data in the data item and only the data in the data item, whereby two identical data items in the system will have the same identifier, and wherein any data item in the system may be accessed using only the identifier of the data item, the method comprising: (i) sending, from a first location in the system, the data identifier of the given data item to other locations in the system; and (ii) in response to the sending, at each of the other locations, (A) determining whether the data item corresponding to the data identifier is present at the other location, and based on the determining, and (B) informing the first location whether the data item is present at the other location; and (iii) in response to the informing from the other locations, at the first location, (A) determining whether the data item is present in at least the predetermined number of other locations, and based on the determining, (B) when less than the predetermined number of other locations have a copy of the data item, requesting some locations that do not have a copy of the data item make a copy of the data item. 47. A method as in claim 46, wherein said step (iii) further comprises: (C) when more than the predetermined number of other locations have a copy of the data item present, requesting some locations that do have a copy of the data item present delete the copy of the data item. 48. A method as in any of claims 30--45, 46 and 47, wherein said data items are at least one of a file, a database record, a message, a data segment, a data block, a directory, and an instance of an object class. 5 10 15 20 25 30 35 40 * * * * *

Disclaimer: Justia Dockets & Filings provides public litigation records from the federal appellate and district courts. These filings and docket sheets should not be considered findings of fact or liability, nor do they necessarily reflect the view of Justia.


Why Is My Information Online?