PersonalWeb Technologies LLC v. Google, Inc. et al
Filing
1
COMPLAINT for Patent Infringement against All Defendants ( Filing fee $ 350 receipt number 0540-3354008.), filed by PersonalWeb Technologies LLC. (Attachments: # 1 Civil Cover Sheet, # 2 Exhibit A - Patent '791, # 3 Exhibit B - Patent '280, # 4 Exhibit C - Patent '442, # 5 Exhibit D - Patent '310, # 6 Exhibit E - Patent '539, # 7 Exhibit F - Patent '544, # 8 Exhibit G - Patent '662, # 9 Exhibit H - Patent '096)(Tribble, Max) (Additional attachment(s) added on 12/14/2011: # 10 Exhibit A Searchable, # 11 Exhibit B Searchable, # 12 Exhibit C Searchable, # 13 Exhibit D Searchable, # 14 Exhibit E Searchable, # 15 Exhibit F Searchable, # 16 Exhibit G Searchable, # 17 Exhibit H Searchable) (mjc, ). (Entered: 12/08/2011)
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 1 of 57 PageID #: 14
EXHIBIT A
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 2 of 57 PageID #: 15
111111
1111111111111111111111111111111111111111111111111111111111111
US005978791A
United States Patent
[11]
Farber et ai.
[54]
Filed:
4,922,414 5/1990 Holloway et al. ...................... 364/200
4,972,367 11/1990 Burke ...................................... 364/900
5,007,658 4/1991 Bendert et al. ... ... .... ... ... ... ... ... 395/600
5,025,421 6/1991 Cho .................................... 365/230.05
5,050,074 9/1991 Marca ..................................... 364/200
5,050,212 9/1991 Dyson ....................................... 380/25
5,057,837 10/1991 Colwell et al. ........................... 341/55
5,129,081 7/1992 Kobayashi et al. ..................... 395/600
5,129,082 7/1992 Tirling et al. .. ... ... .... ... ... ... ... ... 395/600
5,144,667 9/1992 Pogue, Jr. et al. ........................ 380/45
5,179,680 1/1993 Colwell et al. ......................... 395/425
5,202,982 4/1993 Gramlich et al. ....................... 395/600
5,208,858 5/1993 Vollert et al. ............................. 380/43
5,276,901 1/1994 Howell et al. .......................... 395/800
5,301,286 4/1994 Rajani ..................................... 395/400
5,301,316 4/1994 Hamilton et al. ....................... 395/600
5,343,527 8/1994 Moore ......................................... 380/4
5,357,623 10/1994 Megory-Cohen ....................... 395/425
5,384,565 1/1995 Cannon .............................. 340/825.44
5,404,508 4/1995 Konrad et al. .......................... 395/600
Appl. No.: 08/960,079
[22]
Nov. 2, 1999
Assignee: Kinetech, Inc., Northbrook, Ill.
[21]
Date of Patent:
Inventors: David A. Farber, Ojai, Calif.; Ronald
D. Lachman, Northbrook, Ill.
[73]
5,978,791
DATA PROCESSING SYSTEM USING
SUBSTANTIALLY UNIQUE IDENTIFIERS TO
IDENTIFY DATA ITEMS, WHEREBY
IDENTICAL DATA ITEMS HAVE THE SAME
IDENTIFIERS
[75]
Patent Number:
[45]
[19]
Oct. 24, 1997
Related U.S. Application Data
[63]
Continuation of application No. 08/425,160, Apr. 11, 1995,
abandoned.
[51]
[52]
[58]
Int. CI. 6 ...................................................... G06F 17/30
U.S. CI. .................................... 707/2; 707/1; 707/200
Field of Search ..................................... 707/2, 1,200
[56]
References Cited
U.S. PATENT DOCUMENTS
3,668,647
4,215,402
4,290,105
4,376,299
4,405,829
4,412,285
4,414,624
4,441,155
4,464,713
4,490,782
4,571,700
4,577,293
4,642,793
4,675,810
4,691,299
4,725,945
4,773,039
4,887,235
4,888,681
6/1972
7/1980
9/1981
3/1983
9/1983
10/1983
11/1983
4/1984
8/1984
12/1984
2/1986
3/1986
2/1987
6/1987
9/1987
2/1988
9/1988
12/1989
12/1989
Evangelisti et al. ................. 340/172.5
Mitchell et al. ........................ 364/200
Cichelli et al. ... ... ... ... .... ... ... ... 364/200
Rivest ..................................... 364/900
Rivest et al. ........................... 178/22.1
Neches et al. .......................... 364/200
Summer, Jr. et al. .................. 364/200
Fletcher et al. ......................... 364/200
Benhase et al. .. ... ... ... .... ... ... ... 364/200
Dixon et al. ............................ 364/200
Emry, Jr. et al. .. ... ... .... ... ... ..... 364/900
Matick et al. .......................... 365/189
Meaden ................................... 364/900
Gruner et al. .......................... 364/200
Rivest et al. ............................ 365/185
Kronstadt et al. ... ... ... ... .... ... ... 364/200
Zamora ................................... 364/900
Holloway et al. ... ... ... ... .... ... ... 364/900
Barnes et al. .. ... ... ... ... .... ... ... ... 364/200
OTHER PUBLICATIONS
Witold Litwin et ai, Linear Hashing for Distributed Files,
ACM SIGMOD, May, 1993 pp. 327-336.
Ming-Ling Lo, et ai, On Optimal Processor Allocation to
Support Pipelined Hash loins, ACM SIGMOD, pp. 69-78,
May 1993.
Thomas A. Berson, Differential Cryptanalysis Mod 2 32 with
Applications to MD5, pp. 69-81, 1992.
(List continued on next page.)
Primary Examiner-Paul V. Kulik
Assistant Examiner-lean R. Homere
Attorney, Agent, or Firm-Pillsbury Madison & Sutro LLP
[57]
ABSTRACT
In a data processing system, a mechanism identifies data
items by substantially unique identifiers which depend on all
of the data in the data items and only on the data in the data
items. The system also determines whether a particular data
item is present in the database by examining the identifiers
of the plurality of data items.
48 Claims, 31 Drawing Sheets
=_:r--_102
I"'l
~
["126l
L":':J
["126l
~
["126l
L-J
["126l
~
MEMORY
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 3 of 57 PageID #: 16
5,978,791
Page 2
OlliER PUBLICATIONS
William Perrizo, et aI., Distributed Join Processing Performance Evaluation, 1994. Twenty-Seventh Hawaii International Conference on System Sciences, vol. II, pp. 236-244.
A concurrency Control Mechanism based on Extendible
Hashing for Main Memory Database Systems, Vijay Kumar,
pp. 109-113, ACM, vol. 3, 1989.
Birgit Pfitzmann, Sorting Out Signature Schemes, Nov.
1993, 1st Conf. Computer & Comm. Security '93 pp. 74-85.
Bert dem Boer, et aI., Collisions for the compression function of MDs pp. 292-304, 1994.
Sakti Pramanik, et aI., Multi-Directory Hashing, 1993, Info.
Sys., vol. 18, No.1, pp. 63-74.
Murlidhar Koushik, Dynamic Hashing With Distributed
Overflow Space: A File Organization With Good Insertion
Performance, 1993, Info. Sys., vol. 18, No.5, pp. 299-317.
Witold Litwin, et aI., LH*-Linear Hashing for Distributed
Files, HP Labs Tech. Report No. HPL-93-21 Jun. 1993 pp.
1-22.
Yuliang Zheng, et aI., HAVAL - A One-Way Hashing
Algorithm with Variable Length of Output (Extended
Abstract), pp. 83-105, Advances in Cryptology, AUSCRIPT
'92,1992.
Chris Charnes and Josef Pieprzky, Linear Nonequivalence
versus Nonlinearity, Pieprzky, pp. 156-164, 1993.
Zhiyu Tian, et aI., A New Hashing Function: Statistical
Behaviour and Algorithm, pp. 3-13, SIGIR Forum, 1993.
G. L. Friedman, Digital Camera With Apparatus For
Authentication of Images Produced From an Image File,
NASA Case No. NPO-19108-1-CU, Serial No. 08/159,980,
Nov. 24, 1993.
H. Goodman, Feb. 9, 1994 Ada, Object-Oriented Techniques, and Concurrency in Teaching Data Sructures and
File Management Report Documentation P. AD-A275
385 - 94-04277.
Advances in Cryptology-EUROCRYPT '93, Workshop on
the Theory and Application of Cryptographic Techniques
Lofthus, Norway, May 23-27, 1993 Proceedings.
Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, vol. 22, Issue 2, Jun. 1993.
Advances in Cryptology-AUSCRYPT '92 - Workshop on
the Theory and Application of Cryptographic Techniques
Gold Coast, Queensland, Australia Dec. 13-16, 1992 Proceedings.
Search Report dated Jun. 24, 1996.
Case 6:11-cv-00656 Document 1-2
u.s. Patent
Nov. 2, 1999
Filed 12/08/11 Page 4 of 57 PageID #: 17
0
0::
0r-
~
0
O
N
0
,...
,...
CI)
CI)
W
0
0
CI)
CI)
N
0
5,978,791
Sheet 1 of 31
0::
a.
w
(.)
0
~
a.
•
I
•
•
c
-.
lL.
I
O::!
0
CI)
CJ)
N
0
,...
(!)
W
0
I
0
0:::
I
,
a.
0::
0
'W
(!)W
~ ~~
~ 0 fij
N
0
,...
CI)
CI)
W
(.)
0
~
a.
1- 0
)CI)
•
•
•
0::
0
N
0
,...
f\
~w
~ ~ g
~ 0 £ij
1-0
\ ) CI)
CJ)
CJ)
w
(.)
0
0::
a.
~I
Case 6:11-cv-00656 Document 1-2
u.s.
Patent
r----
Filed 12/08/11 Page 5 of 57 PageID #: 18
Nov. 2, 1999
5,978,791
Sheet 2 of 31
-------------------------------------------------l
I
I
:
I
I
I
I
I
I
I
~
I
L'-
I
I
I
C)
I
I
~ I~ ~ II~ ~ II~ ~ II~ ~ II~ ~ I
i
o
I
H
I
I
I
IC\J
10
I
I
I
I
I
I
.......
0::
('<)...l
w 00::: [JI- [JI- 81- E]...l
~
C\I
0
..-...l
C\I
.......
U.
C\I
«')
1-..-
..-
CJ)
LO
..-
..-
(!)
I
I
I
I
I
I
I~
IC)
I
I
I
t
~
CJ)
/\
CJ)
W
U
: ~
..q
~
co :J
C\I
~ ~
W
.J(!)W
.~ <2:: ~
~ ~~ ~ fij
,u
t
I
I
I
I
I
t
:
I 0..
I- C
I
I
\ } CJ)
I
I _______________________________________________________ J
L
L:
.
(!)
lL.
FILE
SYSTEM
I
REGION
I
I
117
~
~
I
117
REGION
~
.....
.....
REGION
• • •
=
I
r117
REGION
I
z
o
118
118
DIRECTORY
•••
DIRECTORY
~N
'""'"
\C
\C
\C
'JJ.
=-
1
( FILE
FILE)
~
~
.....
l
~
FILE
o
....,
~
'""'"
I 122
SEGMENT
~
I
12~
SEGMENT
SEGMENT
'--
Ul
....
\C
""-l
00
....
""-l
\C
~
Filed 12/08/11 Page 6 of 57 PageID #: 19
DIRECTORY
118
~
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
Case 6:11-cv-00656 Document 1-2
u.s.
Patent
Nov. 2, 1999
Filed 12/08/11 Page 7 of 57 PageID #: 20
5,978,791
Sheet 4 of 31
FIG.3
138
Region ID
Pathname
True Name
Type
File ID
Time of last access
Time of last modification
Safe flag
Lock flaq
Size
Owner
FIG.4
140
True Name
File ID
Compressed File ID
Source IDs
Dependent processors
Use count
Time of last access
Expiration
Grooming delete count
142
Reqion ID
Reqion file system
Reqion pathname
Region status
Mirror processor(s)
Mirror duplication count
Policy
FIG.5
144
FIG.6
~
source ID
~
.....
.....
source type
~
=
source rights
source availability
source location
146
~
Original Name
~N
Operation
'""'"
\C
\C
\C
Type
Processor ID
Timestamp
'JJ.
=-
Pathname
~
~
.....
True Name
FIG.8
l::::: :::~
Ul
----
148
o
....,
~
'""'"
True Name
150
FIG.9
IT~e
Name
l~censee
Ul
....
\C
""-l
00
....
""-l
\C
~
Filed 12/08/11 Page 8 of 57 PageID #: 21
FIG. 7
z
o
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
Case 6:11-cv-00656 Document 1-2
u.s. Patent
Filed 12/08/11 Page 9 of 57 PageID #: 22
Nov. 2, 1999
Sheet 6 of 31
5,978,791
FIG. 10(0)
SIMPL
DATA ITEM
",
-------------- --------------\
I
\
S212
I
I
,
\
COMPUTE MD FUNCTION ON
DATA ITEM
S214
APPEND LENGTH MODULO 32 OF
,
DATA ITEM
,
I
\
I
\
\
\
, ....
,.
----------------------------TRUE NAME
~
I
Case 6:11-cv-00656 Document 1-2
u.s. Patent
Filed 12/08/11 Page 10 of 57 PageID #: 23
Nov. 2, 1999
8216
_YES
5,978,791
Sheet 7 of 31
0 _ _...,
FIG. IO(b)
DATA ITEM
SIMPLE?
S220
PARTITION DATA ITEM INTO
SEGMENTS
S222
ASSIMILATE EACH SEGMENT
(COMPUTING ITS TRUE NAME)
I
~
r
-
-
-
-
-82 f8- - ---'\
r
,
r
'
,
, COMPUTE TRUE
:
, NAME OF SIMPLE :
,
:
DATA ITEM
:
------
I
S224
/
CREATE INDIRECT BLOCK OF
SEGMENT TRUE NAMES
8226
ASSIMILATE INDIRECT BLOCK
(COMPUTING ITS TRUE NAME)
S228
REPLACE FINAL 32 BITS OF TRUE
NAME WITH LENGHT MOD 32 OF DATA
ITEM
~
8230
FI G. II
~
.....
.....
DETERMINE
TRUE NAME
a---/"
~
=
z
0
---- .... _- ..........
_. ,. __ .... _ .... _I. -_ ",-YES
~
~N
.
8236
'JJ.
*
*
*
*
=-
~
~
CREATE NEW ENTRY
SET USE COUNT TO 1
STORE FILE 10
SET OTHER FIELDS
...
.....
00
o
....,
~
'""'"
8238
8239
DELETE FILE 10
STORE FILE ID
Ul
....
\C
""-l
00
....
""-l
\C
~
Filed 12/08/11 Page 11 of 57 PageID #: 24
'""'"
\C
\C
\C
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
Case 6:11-cv-00656 Document 1-2
u.s. Patent
Filed 12/08/11 Page 12 of 57 PageID #: 25
Nov. 2, 1999
5,978,791
Sheet 9 of 31
FIG.12
YES
S240
UPDATE
DEPENDENCY
LIST
NO
S242
SEND MESSAGE TO
~--------------~ CACHESERVERTO
UPDATE CACHE
S244
COMPRESS
(IF DESIRED)
8246
MIRROR
(IF DESIRED)
Case 6:11-cv-00656 Document 1-2
u.s. Patent
Filed 12/08/11 Page 13 of 57 PageID #: 26
Nov. 2,1999
5,978,791
Sheet 10 of 31
FIG.13
8250
SEARCH FOR
THE
FAIL
PATH NAME
FOUND
S258
ASSIMILATE
FILE 10
S256
FREEZE
DIRECTORY
Case 6:11-cv-00656 Document 1-2
u.s.
Patent
Nov. 2, 1999
Filed 12/08/11 Page 14 of 57 PageID #: 27
5,978,791
Sheet 11 of 31
8260
CONFIRM THAT
TRUE NAME
EXISTS LOCALLY
8262
FIG.14
SEARCH FOR
PATHNAME IN
LDETABLE
8264
CONFIRM THAT
DIRECTORY
EXISTS
YES
NO
8270
CREATE
ENTRY IN LDE
& UPDATE
S268
DELETE
TRUE FILE
NO
./
...........
~
~
.....
.....
YES
~
=
NEGATIVE
RESPONSE
S278
REQUEST
MOUNT
S274
SEND RTF
MESSAGE &
WAIT FOR
RESPONSE
z
o
~
~N
POSITIVE
RESPONSE
FAIL
'JJ.
=-
~
~
.....
'N "
""'
S276
S280
FIND FILE
S282
l----------.-t-.I
FIG.15
ENTER TRUE FILE
RETURNED INTO
TFR
o
....,
~
'""'"
VERIFY TRUE
FILE (IF
I .... - - - - - - - - - J
...I
DESIRED)
Ol
....
\C
~
oe
....
~
\C
~
Filed 12/08/11 Page 15 of 57 PageID #: 28
'""'"
\C
\C
\C
Case 6:11-cv-00656 Document 1-2
~
•
rJJ.
•
Case 6:11-cv-00656 Document 1-2
u.s.
Patent
Nov. 2,1999
Filed 12/08/11 Page 16 of 57 PageID #: 29
5,978,791
Sheet 13 of 31
""
C
"'"'
<.0
-L
L
•
(!)
...J
«
11.
~~
o
CIJ
II-CIJ
z<
O
I-CIJ
ZI-
w-<
w
_0
...J<
00
(33:
0::
m
LU
L -____________________________
~C/)
I0
i=:~ --0
::;:,
~
ll.
LU
ffifact~
<::ctO~
~
~
.....
.....
~
=
S290
0
STORE
PROCESSOR ID
~
Z
0
~
~N
S290B
LOOK UP TFR FOR
TRUE NAME & ADD
SOURCE LOCATION ID
TO SOURCE IDS FOR
TRUE NAME
FIG. 16(b)
'JJ.
=-
~
~
.....
'""'"
~
o
....,
~
'""'"
S291c
SEND MESSAGE TO
RESERVE TRUE FILE
ON SOURCE
PROCESSOR
o
~
S290C
OURCE IS
->- -1
PUBLISHING
SYSTEM? ..
~
YES
S291d
DETERMINE
EXPIRATION DATE
AND ADD TO LIST
Ul
....
\C
""-l
00
....
""-l
\C
~
Filed 12/08/11 Page 17 of 57 PageID #: 30
'""'"
\C
\C
\C
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
Case 6:11-cv-00656 Document 1-2
u.s.
Patent
Filed 12/08/11 Page 18 of 57 PageID #: 31
Nov. 2,1999
5,978,791
Sheet 15 of 31
(J)
(J)
w
w
Z
~
co
0)
c
..J
N
0
en
~
a..
:is
0
0
w
c
fa
0
z
).:
fa
);;
0
<::
-
•
(!)
0
<::
Case 6:11-cv-00656 Document 1-2
u.s. Patent
Filed 12/08/11 Page 19 of 57 PageID #: 32
Nov. 2,1999
Sheet 16 of 31
5,978,791
I
I
i
r
,
0
0
.... ('C')
-~
CI)
I
c
W
...
0::::
~
0
(!)
l.L
W
Z
•
0
C
len
I
W
::J~Ci)
en
N>-O::::
o~w
Cl)O::J
Z
CI)
('C')I-en
1-
9
ow
0
wo
('C') ..JO::::
"¢
W::J
eno
en
llJ9
tt llJ
0<">
::s!5
00
<:(1)
W
..J
~u::
co c:(W
0
('C')
CI)
0'"
00
..J~
W
0::::
0:::: 0 I-o::::w
('C') wu..°
CD
0
CI)
~w!5
..J..J0
l5U::en
0::::
f3
>:
~
FIG.18{c)
,--YES,
~
.....
.....
~
=
z
o
~
'""'"
\C
\C
\C
'JJ.
=-
DELETE
TRUE FILE
8320
CREATENSW I~~~______~
seRATCH FILE
~
~
.....
8322
MAKE TRUE
'""'"
-..J
o
....,
~
'""'"
FILE LOCAL
Ul
....
\C
DONE
""-l
00
....
""-l
\C
~
Filed 12/08/11 Page 20 of 57 PageID #: 33
YEs{;;]
~N
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
~
- - - - -r-- - ----
~
.....
.....
~
=
0
/
USE COUNT """
YES
z
0
~
~N
FIG.18(b)
'""'"
\C
\C
\C
.-
'JJ.
=-
S330
COpy FILE TO NEW
FILE, STORE FILE 10
IN LDE TABLE,
DECREMENT USE
COUNT
~
~
.....
S328
SAVE FILE 10 &
REMOVETFR
ENTRY
'""'"
00
0
....,
~
'""'"
Ul
....
\C
""-l
00
....
""-l
\C
~
Filed 12/08/11 Page 21 of 57 PageID #: 34
.....
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
~
~
S332
.....
.....
~
INCREMENT
FREEZE LOCK
=
FIG. 19(0)
z
o
f
~
y-
~N
~
FOR EACH
S334
SUBORDINATE r-----I~~I FREEZE IF
FILE AND
DIRECTORY
DIRECTORY IN THE
GIVEN DIRECTORY
\
.
)
H
'""'"
\C
\C
\C
i
S336
ASSIMILATE
UNASSIMILATED
FILE
'JJ.
=-
~
~
.....
'""'"
\C
o
....,
~
'""'"
S337
CREATE NEW
DATA ITEM
___ I-_
Ul
....
\C
""-l
00
....
""-l
\C
~
Filed 12/08/11 Page 22 of 57 PageID #: 35
""'"
Case 6:11-cv-00656 Document 1-2
.
d
•
rJl
•
/
~
~
~
~
1
FOR EACH
8338
SUBORDINATE
ADD ENTRY TO
.---~~~ NEWDATA
FILE AND
DIRECTORY IN THE
ITEM
GIVEN DIRECTORY
I
8340
I
I
~
RECORD
ADDITIONAL
DESIRED
INFORMATION
~
.....
.....
~
=
z
o
J
~
~N
'""'"
\C
\C
\C
ASSIMILATE THE
NEW DATA ITEM
'JJ.
=-
FIG.19(b)
~
~
.....
N
C
o
....,
~
•
'""'"
S344
DECREMENT
THE FREEZE
LOCK
...
Ul
....
\C
""-l
00
....
""-l
\C
~
Filed 12/08/11 Page 23 of 57 PageID #: 36
~
S342
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
~
~
.....
.....
~.
S346
FIG. 20
~
=
MAKE TRUE
FILE LOCAL
z
o
~
~,.
8354 )
NO MORE
ENTruES
\
S353
FOR EACH
DIRECTORY
ENTRY
.4~
~N
""\
S348
MORE
t-ENTRIES
....
~
READ
DIRECTORY
+
S350
CREATE FULL
PATHNAME
•
'""'"
\C
\C
\C
'JJ.
=-
~
~
.....
N
'"
o"'"
....,
~
'""'"
S352
LINK PATH TO
TRUE NAME
Ul
....
\C
""-l
00
....
""-l
\C
~
Filed 12/08/11 Page 24 of 57 PageID #: 37
r
L_DONE~
~
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
Case 6:11-cv-00656 Document 1-2
u.s.
Patent
Nov. 2,1999
Filed 12/08/11 Page 25 of 57 PageID #: 38
5,978,791
Sheet 22 of 31
S354
WAIT FOR
FREEZE LOCK
TO TURN OFF
S356
FINDTFR
ENTRY
FIG.21
S358
DECREMENT
REFERENCE
COUNT
YES
S362
DELETE
TRUE FILE
NO
S364
REMOVE FILE 10
AND COMPRESSED
FILE to
Case 6:11-cv-00656 Document 1-2
u.s. Patent
Filed 12/08/11 Page 26 of 57 PageID #: 39
Sheet 23 of 31
Nov. 2,1999
5,978,791
S365
GET
OPERATION
FIG. 22
S368
">--_YES _ _ _ _ _~
ASSIMILATE
S369
NEW TRUE
FILE
~_YES
S378
MODIFY USE
COUNT OF EACH
COMPONENT
S379
FOR EACH PARENT
DIRECTORY OR FILE,
UPDATE USE COUNT,
LAST ACCESS AND
MODIFY TIMES
S370
RECORD TRUE
NAME IN AUDIT
FILE
Case 6:11-cv-00656 Document 1-2
u.s. Patent
Nov. 2,1999
Filed 12/08/11 Page 27 of 57 PageID #: 40
5,978,791
Sheet 24 of 31
."
8382
FIG. 23
VERIFY
GROOMING
LOCK OFF
S384
SET
GROOMING
LOCK
."
S386
SET GROOM
COUNTS
."
Case 6:11-cv-00656 Document 1-2
u.s. Patent
Filed 12/08/11 Page 28 of 57 PageID #: 41
Nov. 2,1999
Sheet 25 of 31
S388
FIND LDE
RECORD
FIG. 24
."
S390
FIND TFR
RECORD
S392
INCREMENT
GROOMING
DELETE COUNT
."
S394
ADJUST FILE
SIZES
5,978,791
Case 6:11-cv-00656 Document 1-2
u.s. Patent
Filed 12/08/11 Page 29 of 57 PageID #: 42
Nov. 2,1999
Sheet 26 of 31
5,978,791
FIG. 25
S396
DELETE
FILE
S398
UNLOCK
GROOMING
LOCK
~
_____NO
c
YES ____---,
FIG. 26(0)
~
~
.....
.....
~
=
8404
8408
OPEN
DETERMINE
REGION
O~ PROHIBIT
z
o
~
~N
YES_
'JJ.
=-
~
~
.....
N
-..J
0
....,
~
YES
'""'"
8422
PROHIBIT
OPEN
Ul
....
\C
YES
-----
""-l
.
00
....
""-l
\C
~
Filed 12/08/11 Page 30 of 57 PageID #: 43
'""'"
\C
\C
\C
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
~
~
.....
.....
~
=
YES
I
..I",
LOCK IF NOT
LOCKED
z
o
~
~-
S417
CREATE
SCRATCH
COpy
S406
CREATE
SCRATCH FILE
S424
RETURN
SCRATCH FILE
ID
,
1-'
~
FIG.26(b)
.
5420
MAKE LOCAL
VERSION &
RETURN FILE ID
FROMTFR
.
'JJ.
=-
~
~
.....
N
00
o
....,
~
'""'"
Ul
....
\C
""-l
00
....
""-l
\C
~
Filed 12/08/11 Page 31 of 57 PageID #: 44
'""'"
\C
\C
\C
'--"10 _ _--,
~ERASEFILE
~N
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
~
~
.....
.....
S422
DETERMINE LDE &
RTENTRY
RECORDS FOR
FILE
~
=
z
0
~
N
~
~
PROHIBIT
DELETION
'JJ.
=-
~
~
.....
N
\C
0
N°O
....,
Y
~
'""'"
8424
IDENTIFY TRUE
FILE FROM TRUE
NAME
Y
I
FIG. 27(0)
Ul
....
\C
-.....l
00
....
-.....l
\C
~
Filed 12/08/11 Page 32 of 57 PageID #: 45
'""'"
\C
\C
\C
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
~
~
.....
.....
~
)
NO _ _ _--.
=
z
o
~
")
DELETE
SCRATCH COPY
OF FILE
YES
~N
'""'"
\C
\C
\C
'JJ.
=-
~
~
.....
S430
~
c
DELETE
TRUE FILE
8431
REDUCE USE
COUNT BY ONE
I
o
....,
~
I
~
'""'"
S428
~
ADD ENTRY TO
AUDIT FILE
Ul
....
\C
FIG. 27(b)
""-l
00
....
""-l
\C
~
Filed 12/08/11 Page 33 of 57 PageID #: 46
S427
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
~
S432
LOOKUP
TRUE NAME
YES
FIG. 28
~
.....
.....
~
=
NO
z
o
~
~N
'""'"
\C
\C
\C
=-
~
~
.....
~
'""'
....,"
0
~
'""'"
S444
POSITIVE
RESPONSE
Ul
....
NEGATIVE
RESPONSE
\C
""-l
00
....
""-l
\C
~
Filed 12/08/11 Page 34 of 57 PageID #: 47
80
~ FO~;RD ~YES~EQUESTTO
REQUEST
FORWARDED?
'JJ.
Case 6:11-cv-00656 Document 1-2
d
•
rJl
•
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 35 of 57 PageID #: 48
5,978,791
1
2
object-oriented programming, locations in memory or on a
physical device, or the like) are always defined relative to a
specific context. For instance, the file identified by a particular file name can only be determined when the directory
5 containing the file (the context) is known. The file identified
by a pathname can be determined only when the file system
(context) is known. Similarly, the addresses in a process
This is a continuation of application Ser. No. 08/425,160,
address space, the keys in a database table, or domain names
filed on Apr. 11, 1995, which was abandoned upon the filing
on a global computer network such as the Internet are
hereof.
10 meaningful only because they are specified relative to a
BACKGROUND OF THE INVENTION
context.
In prior art systems for identifying data items there is no
1. Field of the Invention
direct relationship between the data names and the data item.
This invention relates to data processing systems and,
The same data name in two different contexts may refer to
more particularly, to data processing systems wherein data
15 different data items, and two different data names in the
items are identified by substantially unique identifiers which
same context may refer to the same data item.
depend on all of the data in the data items and only on the
In addition, because there is no correlation between a data
data in the data items.
name and the data it refers to, there is no a priori way to
2. Background of the Invention
confirm that a given data item is in fact the one named by a
Data processing (DP) systems, computers, networks of 20 data name. For instance, in a DP system, if one processor
computers, or the like, typically offer users and programs
requests that another processor deliver a data item with a
various ways to identify the data in the systems.
given data name, the requesting processor cannot, in
Users typically identify data in the data processing system
general, verify that the data delivered is the correct data
by giving the data some form of name. For example, a
(given only the name). Therefore it may require further
typical operating system (OS) on a computer provides a file 25 processing, typically on the part of the requestor, to verify
system in which data items are named by alphanumeric
that the data item it has obtained is, in fact, the item it
identifiers. Programs typically identify data in the data
requested.
processing system using a location or address. For example,
A common operation in a DP system is adding a new data
a program may identify a record in a file or database by using
item to the system. When a new data item is added to the
30
a record number which serves to locate that record.
system, a name can be assigned to it only by updating the
In all but the most primitive operating systems, users and
context in which names are defined. Thus such systems
programs are able to create and use collections of named
require a centralized mechanism for the management of
data items, these collections themselves being named by
names. Such a mechanism is required even in a multiidentifiers. These named collections can then, themselves,
processing system when data items are created and identified
be made part of other named collections. For example, an 35 at separate processors in distinct locations, and in which
OS may provide mechanisms to group files (data items) into
there is no other need for communication when data items
directories (collections). These directories can then, themare added.
selves be made part of other directories. A data item may
In many data processing systems or environments, data
thus be identified relative to these nested directories using a
items are transferred between different locations in the
sequence of names, or a so-called pathname, which defines 40 system. These locations may be processors in the data
a path through the directories to a particular data item (file
processing system, storage devices, memory, or the like. For
or directory).
example, one processor may obtain a data item from another
As another example, a database management system may
processor or from an external storage device, such as a
group data records (data items) into tables and then group 45 floppy disk, and may incorporate that data item into its
these tables into database files (collections). The complete
system (using the name provided with that data item).
address of any data record can then be specified using the
However, when a processor (or some location) obtains a
database file name, the table name, and the record number of
data item from another location in the DP system, it is
that data record.
possible that this obtained data item is already present in the
Other examples of identifying data items include: identi- 50 system (either at the location of the processor or at some
fying files in a network file system, identifying objects in an
other location accessible by the processor) and therefore a
object-oriented database, identifying images in an image
duplicate of the data item is created. This situation is
database, and identifying articles in a text database.
common in a network data processing environment where
In general, the terms "data" and "data item" as used herein
proprietary software products are installed from floppy disks
refer to sequences of bits. Thus a data item may be the 55 onto several processors sharing a common file server. In
contents of a file, a portion of a file, a page in memory, an
these systems, it is often the case that the same product will
object in an object-oriented program, a digital message, a
be installed on several systems, so that several copies of
digital scanned image, a part of a video or audio signal, or
each file will reside on the common file server.
any other entity which can be represented by a sequence of
In some data processing systems in which several probits. The term "data processing" herein refers to the pro- 60 cessors are connected in a network, one system is designated
cessing of data items, and is sometimes dependent on the
as a cache server to maintain master copies of data items,
type of data item being processed. For example, a data
and other systems are designated as cache clients to copy
processor for a digital image may differ from a data prolocal copies of the master data items into a local cache on an
cessor for an audio signal.
as-needed basis. Before using a cached item, a cache client
In all of the prior data processing systems the names or 65 must either reload the cached item, be informed of changes
to the cached item, or confirm that the master item correidentifiers provided to identify data items (the data items
being files, directories, records in the database, objects in
sponding to the cached item has not changed. In other words,
DATA PROCESSING SYSTEM USING
SUBSTANTIALLY UNIQUE IDENTIFIERS TO
IDENTIFY DATA ITEMS, WHEREBY
IDENTICAL DATA ITEMS HAVE THE SAME
IDENTIFIERS
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 36 of 57 PageID #: 49
5,978,791
3
4
a cache client must synchronize its data items with those on
the system provides the size, age, and location of groups
the cache server. This synchronization may involve reloadof data items in order to decide whether they can be safely
ing data items onto the cache client. The need to keep the
removed from a local file system;
cache synchronized or reload it adds significant overhead to
the system can efficiently record and preserve any colexisting caching mechanisms.
5 lection of data items;
In view of the above and other problems with prior art
the system can efficiently make a copy of any collection
systems, it is therefore desirable to have a mechanism which
of data items, to support a version control mechanism for
allows each processor in a multiprocessor system to detergroups of the data items;
mine a common and substantially unique identifier for a data
the system can publish data items, allowing other, possiitem, using only the data in the data item and not relying on 10
bly anonymous, systems in a network to gain access to the
any sort of context.
data items and to rely on the availability of the data items;
It is further desirable to have a mechanism for reducing
the system can maintain a local inventory of all the data
multiple copies of data items in a data processing system and
items located on a given removable medium, such as a
to have a mechanism which enables the identification of
diskette or CD-ROM, the inventory is independent of other
identical data items so as to reduce multiple copies. It is 15
properties of the data items such as their name, location, and
further desirable to determine whether two instances of a
date of creation;
data item are in fact the same data item, and to perform
the system allows closely related sets of data items, such
various other systems' functions and applications on data
as matching or corresponding directories on disconnected
items without relying on any context information or prop20 computers, to be periodically resynchronized with one
erties of the data item.
another;
It is also desirable to provide such a mechanism in such
the system can verify that data retrieved from another
a way as to make it transparent to users of the data
location is the desired or requested data, using only the data
processing system, and it is desirable that a single mechanism be used to address each of the problems described 25 identifier used to retrieve the data;
the system can prove possession of specific data items by
above.
content without disclosing the content of the data items, for
SUMMARY OF THE INVENTION
purposes of later legal verification and to provide anonymity;
This invention provides, in a data processing system, a
the system tracks possession of specific data items accordmethod and apparatus for identifying a data item in the 30
ing to content by owner, independent of the name, date, or
system, where the identity of the data item depends on all of
other properties of the data item, and tracks the uses of
the data in the data item and only on the data in the data item.
specific data items and files by content for accounting
Thus the identity of a data item is independent of its name,
purposes.
origin, location, address, or other information not derivable
directly from the data, and depends only on the data itself. 35
Other objects, features, and characteristics of the present
invention as well as the methods of operation and functions
This invention further provides an apparatus and a method
of the related elements of structure, and the combination of
for determining whether a particular data item is present in
parts and economies of manufacture, will become more
the system or at a location in the system, by examining only
apparent upon consideration of the following description
the data identities of a plurality of data items.
Using the method or apparatus of the present invention, 40 and the appended claims with reference to the accompanying drawings, all of which form a part of this specification.
the efficiency and integrity of a data processing system can
be improved. The present invention improves the design and
BRIEF DESCRIPTION OF THE DRAWINGS
operation of a data storage system, file system, relational
database, object-oriented database, or the like that stores a
FIGS. l(a) and l(b) depicts a typical data processing
plurality of data items, by making possible or improving the 45 system in which a preferred embodiment of the present
design and operation of at least some or all of the following
invention operates;
features:
FIG. 2 depicts a hierarchy of data items stored at any
the system stores at most one copy of any data item at a
location in such a data processing system;
given location, even when multiple data names in the system 50
FIGS. 3-9 depict data structures used to implement an
refer to the same contents;
embodiment of the present invention; and
the system avoids copying data from source to destination
FIGS. 10(a)-28 are flow charts depicting operation of
locations when the destination locations already have the
various aspects of the present invention.
data;
DETAILED DESCRIPTION OF THE
the system provides transparent access to any data item by
PRESENTLY PREFERRED EXEMPLARY
reference only to its identity and independent of its present 55
EMBODIMENTS
location, whether it be local, remote, or offline;
the system caches data items from a server, so that only
An embodiment of the present invention is now described
the most recently accessed data items need be retained;
with reference to a typical data processing system 100,
when the system is being used to cache data items, 60 which, with reference to FIGS. l(a) and l(b), includes one
problems of maintaining cache consistency are avoided;
or more processors (or computers) 102 and various storage
devices 104 connected in some way, for example by a bus
the system maintains a desired level of redundancy of data
106.
items in a network of servers, to protect against failure by
ensuring that multiple copies of the data items are present at
Each processor 102 includes a CPU 108, a memory 110
different locations in the system;
65 and one or more local storage devices 112. The CPU 108,
memory 110, and local storage device 112 may be internally
the system automatically archives data items as they are
connected, for example by a bus 114. Each processor 102
created or modified;
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 37 of 57 PageID #: 50
5,978,791
5
6
may also include other devices (not shown), such as a
processor, a storage device, a removable storage medium
keyboard, a display, a printer, and the like.
(such as a floppy disk or compact disk), or any other physical
location in the system. The term "local" with respect to a
In a data processing system 100, wherein more than one
particular processor 102 refers to the memory and storage
processor 102 is used, that is, in a multiprocessor system, the
processors may be in one of various relationships. For 5 devices of that particular processor.
example, two processors 102 may be in a client/server,
In the following, the terms "True Name", "data identity"
client/client, or a server/server relationship. These interand "data identifier" refer to the substantially unique data
processor relationships may be dynamic, changing dependidentifier for a particular data item. The term "True File"
ing on particular situations and functions. Thus, a particular
refers to the actual file, segment, or data item identified by
processor 102 may change its relationship to other proces- 10 a True Name.
sors as needed, essentially setting up a peer-to-peer relationA file system for a data processing system 100 is now
ship with other processors. In a peer-to-peer relationship,
described which is intended to work with an existing opersometimes a particular processor 102 acts as a client
ating system by augmenting some of the operating system's
processor, whereas at other times the same processor acts as
file management system codes. The embodiment provided
a server processor. In other words, there is no hierarchy 15 relies on the standard file management primitives for actuimposed on or required of processors 102.
ally storing to and retrieving data items from disk, but uses
In a multiprocessor system, the processors 102 may be
the mechanisms of the present invention to reference and
homogeneous or heterogeneous. Further, in a multiprocessor
access those data items.
data processing system 100, some or all of the processors
The processes and mechanisms (services) provided in this
102 may be disconnected from the network of processors for 20 embodiment are grouped into the following categories:
periods of time. Such disconnection may be part of the
primitive mechanisms, operating system mechanisms,
normal operation of the system 100 or it may be because a
remote mechanisms, background mechanisms, and extended
particular processor 102 is in need of repair.
mechanisms.
Within a data processing system 100, the data may be
Primitive mechanisms provide fundamental capabilities
organized to form a hierarchy of data storage elements, 25 used to support other mechanisms. The following primitive
wherein lower level data storage elements are combined to
mechanisms are described:
form higher level elements. This hierarchy can consist of, for
1. Calculate True Name;
example, processors, file systems, regions, directories, data
2. Assimilate Data Item;
files, segments, and the like. For example, with reference to
3. New True File;
FIG. 2, the data items on a particular processor 102 may be 30
organized or structured as a file system 116 which comprises
4. Get True Name from Path;
regions 117, each of which comprises directories 118, each
5. Link path to True Name;
of which can contain other directories 118 or files 120. Each
6. Realize True File from Location;
file 120 being made up of one or more data segments 122.
35
7. Locate Remote File;
In a typical data processing system, some or all of these
8. Make True File Local;
elements can be named by users given certain implementation specific naming conventions, the name (or pathname) of
9. Create Scratch File;
an element being relative to a context. In the context of a
10. Freeze Directory;
data processing system 100, a pathname is fully specified by 40
11. Expand Frozen Directory;
a processor name, a filesystem name, a sequence of zero or
12. Delete True File;
more directory names identifying nested directories, and a
13. Process Audit File Entry;
final file name. (Usually the lowest level elements, in this
14. Begin Grooming;
case segments 122, cannot be named by users.)
15. Select For Removal; and
In other words, a file system 116 is a collection of 45
16. End Grooming.
directories 118. A directory 118 is a collection of named files
Operating system mechanisms provide typical familiar
120---both data files 120 and other directory files 118. A file
file system mechanisms, while maintaining the data struc120 is a named data item which is either a data file (which
tures required to offer the mechanisms of the present invenmay be simple or compound) or a directory file 118. A
simple file 120 consists of a single data segment 122. A 50 tion. Operating system mechanisms are designed to augment
existing operating systems, and in this way to make the
compound file 120 consists of a sequence of data segments
present invention compatible with, and generally transparent
122. A data segment 122 is a fixed sequence of bytes. An
to, existing applications. The following operating system
important property of any data segment is its size, the
mechanisms are described:
number of bytes in the sequence.
1. Open File;
A single processor 102 may access one or more file 55
systems 116, and a single storage device 104 may contain
2. Close File;
one or more file systems 116, or portions of a file system 116.
3. Read File;
For instance, a file system 116 may span several storage
4. Write File;
devices 104.
5. Delete File or Directory;
In order to implement controls in a file system, file system 60
6. Copy File or Directory;
116 may be divided into distinct regions, where each region
7. Move File or Directory;
is a unit of management and control. A region consists of a
8. Get File Status; and
given directory 118 and is identified by the pathname (user
defined) of the directory.
9. Get Files in Directory.
In the following, the term "location", with respect to a 65
Remote mechanisms are used by the operating system in
data processing system 100, refers to any of a particular
responding to requests from other processors. These mechaprocessor 102 in the system, a memory of a particular
nisms enable the capabilities of the present invention in a
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 38 of 57 PageID #: 51
5,978,791
7
8
peer-to-peer network mode of operation. The following
100. However, they can also be shared by placing them on
remote mechanisms are described:
a remote, shared file server (for instance, in a local area
network of machines). In order to accommodate sharing data
1. Locate True File;
structures, it is necessary that the processors accessing the
2. Reserve True File;
5 shared database use the appropriate locking techniques to
3. Request True File;
ensure that changes to the shared database do not interfere
4. Retire True File;
with one another but are appropriately serialized. These
5. Cancel Reservation;
locking techniques are well understood by ordinarily skilled
6. Acquire True File;
programmers of distributed applications.
10
7. Lock Cache;
It is sometimes desirable to allow some regions to be local
to a particular processor 102 and other regions to be shared
8. Update Cache; and
among processors 102. (Recall that a region is a unit of file
9. Check Expiration Date.
system management and control consisting of a given direcBackground mechanisms are intended to run occasionally
tory identified by the pathname of the directory.) In the case
and at a low priority. These provide automated management
capabilities with respect to the present invention. The fol- 15 of local and shared regions, there would be both local and
shared versions of each data structure. Simple changes to the
lowing background mechanisms are described:
processes described below must be made to ensure that
1. Mirror True File;
appropriate data structures are selected for a given operation.
2. Groom Region;
The local directory extensions (LDE) table 124 is a data
3. Check for Expired Links; and
20 structure which provides information about files 120 and
4. Verify Region; and
directories 118 in the data processing system 100. The local
5. Groom Source List.
directory extensions table 124 is indexed by a pathname or
Extended mechanisms run within application programs
contextual name (that is, a user provided name) of a file and
over the operating system. These mechanisms provide soluincludes the True Name for most files. The information in
tions to specific problems and applications. The following 25 local directory extension table 124 is in addition to that
provided by the native file system of the operating system.
extended mechanisms are described:
The True File registry (TFR) 126 is a data store for listing
1. Inventory Existing Directory;
actual data items which have True Names, both files 120 and
2. Inventory Removable, Read-only Files;
segments 122. When such data items occur in the True File
3. Synchronize directories;
30 registry 126 they are known as True Files. True Files are
4. Publish Region;
identified in True File registry 126 by their True Names or
5. Retire Directory;
identities. The table True File registry 126 also stores
6. Realize Directory at location;
location, dependency, and migration information about True
7. Verify True File;
Files.
8. Track for accounting purposes; and
35
The region table (RT) 128 defines areas in the network
9. Track for licensing purposes.
storage which are to be managed separately. Region table
The file system herein described maintains sufficient
128 defines the rules for access to and migration of files 120
information to provide a variety of mechanisms not ordiamong various regions with the local file system 116 and
narily offered by an operating system, some of which are
remote peer file systems.
listed and described here. Various processing performed by 40
The source table (ST) 130 is a list of the sources of True
this embodiment of the present invention will now be
Files other than the current True File registry 126. The
described in greater detail.
source table 130 includes removable volumes and remote
In some embodiments, some files 120 in a data processing
processors.
system 100 do not have True Names because they have been
The audit file (AF) 132 is a list of records indicating
recently received or created or modified, and thus their True 45 changes to be made in local or remote files, these changes to
Names have not yet been computed. A file that does not yet
be processed in background.
have a True Name is called a scratch file. The process of
The accounting log (AL) 134 is a log of file transactions
assigning a True Name to a file is referred to as assimilation,
used to create accounting information in a manner which
and is described later. Note that a scratch file may have a
preserves the identity of files being tracked independent of
user provided name.
50 their name or location.
Some of the processing performed by the present invenThe license table (LT) 136 is a table identifying files,
tion can take place in a background mode or on a delayed or
which may only be used by licensed users, in a manner
as-needed basis. This background processing is used to
independent of their name or location, and the users licensed
determine information that is not immediately required by
to use them.
the system or which may never be required. As an example, 55
in some cases a scratch file is being changed at a rate greater
Detailed Descriptions of the Data Structures
than the rate at which it is useful to determine its True Name.
In these cases, determining the True Name of the file can be
The following table summarizes the fields of an local
postponed or performed in the background.
directory extensions table entry, as illustrated by record 138
Data Structures
60 in FIG. 3.
The following data structures, stored in memory 110 of
one of more processors 102 are used to implement the
Field
Description
mechanisms described herein. The data structures can be
local to each processor 102 of the system 100, or they can
Region ID
identifies the region in which this file is
reside on only some of the processors 102.
65
contained.
the user provided name or contextural name
Pathname
The data structures described are assumed to reside on
individual peer processors 102 in the data processing system
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 39 of 57 PageID #: 52
5,978,791
9
-continued
Field
True Name
Type
Scratch
File ID
Time of
last
access
Time of
last modification
Safe flag
Lock flag
Size
Owner
10
-continued
Description
of the file or directory, relative to the
region in which it occurs.
the computed True Name or identity of the
file or directory. This True Name is not
always up th date, and it is set to a
special value when a file is modified and
is later recomputed in the background.
indicates whether the file is a data file
or a directory.
the physical location of the file in the
file system, when no True Name has been
calculated for the file. As noted above,
such a file is called a scratch file.
the last access time to this file. If this
file is a directory, this is the last
access time to any file in the directory.
the time of last change of this file. If
this file is a directory, this is the last
modification time of any file in the
directory.
indicates that this file (and, if this file
is a directory, all of its subordinate
files) have been backed up on some other
system, and it is therefore safe to remove
them.
indicates whether a file is locked, that
is, it is being modified by the local processor or a remote processor. Only one
processor may modify a file at a time.
the full size of this directory (including
all subordinate files), if all files in it
were fully expanded and duplicated. For a
file that is not a directory this is the
size of the actual True File.
the identity of the user who owns this
file, for accounting and license tracking
purposes.
Field
5
Use count
10
15
20
A region table 128, specified by a directory pathname,
records storage policies which allow files in the file system
to be stored, accessed and migrated in different ways.
Storage policies are programmed in a configurable way
using a set of rules described below.
Each region table record 142 of region table 128 includes
the fields described in the following table (with reference to
FIG. 5):
Field
25
Description
Region ID
internally used identifier for this
region.
file system on the local processor of
which this region is a part.
a pathname relative to the region file
system which defines the location of
this region. The region consists of
all files and directories subordinate
to this pathname, except those in a
region subordinate to this region.
zero or more identifiers of processors
which are to keep mirror or archival
copies of all files in the current
region. Multiple mirror processors
can be defined to form a mirror group.
number of copies of each file in this
region that should be retained in a
mirror group.
specifies whether this region is local
to a single processor 102, shared by
several processors 102 (if, for
instance, it resides on a shared file
server), or managed by a remote
processor.
the migration policy to apply to this
region. A single region might
participate in several policies. The
policies are as follows (parameters in
brackets are specified as part of the
policy):
region is a cached version from
[processor 10];
region is a member of a mirror set
defined by [processor 10].
region is to be archived on
[processor 10].
region is to be backed up locally,
by placing new copies in [region
ID].
region is read only and may not be
changed.
region is published and expires on
[date].
Files in this region should be
compressed.
Region file system
Region pathname
30
Mirror processor(s)
35
Each record of the True File registry 126 has the fields
shown in the True File registry record 140 in FIG. 4. The
True File registry 126 consists of the database described in
the table below as well as the actual True Files identified by
the True File IDs below.
Field
computed True Name or identity of
the file.
compressed version of the True File
may be stored insteaded of, or in
addition to, an uncompressed
version. This field provides the
identity of the actual
representation of the compressed
version of the file.
tentative count of how many
references have been selected for
deletion during a grooming
operation.
most recent date and time the
content of this file was accessed.
date and time after which this file
may be deleted by this server.
processor IDs of other processors
which contain references to this
True File.
source ID(s) of zero or more
sources form which this file or
data item may be retrieved.
identity or disk location of the
actual physical representation of
the file or file segment. It is
sufficient to use a filename in the
registration directory of the
Mirror duplication
count
Region status
40
Description
True Name
Compressed
File ID
Grooming
delete count
Time of last
access
Expiration
Dependent
processors
Source IDs
True File 10
Description
underlying operation system. The
True File ID is absent if the
actual file is not currently
present at the current location.
number of other records on this
processor which identify this True
File.
Policy
45
50
55
60
65
A source table 130 identifies a source location for True
Files. The source table 130 is also used to identify client
processors making reservations on the current processor.
Each source record 144 of the source table 130 includes the
fields summarized in the following table, with reference to
FIG. 6:
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 40 of 57 PageID #: 53
5,978,791
11
12
Field
Description
Field
Description
source ID
internal identifier used to identify a
particular source.
type of source location:
Removable Storage Volume
Local Region
Cache Server
Mirror Group Server
Cooperative Server
Publishing Server
Client
includes information about the rights
of this processor, such as whether it
can ask the local processor to store
data items for it.
measurement of the bandwidth, cost,
and reliability of the connection to
this source of True Files. The availability is used to select from among
several possible sources.
information on how the local processor
is to access the source. This may be,
for example, the name of a removable
storage volume, or the processor ID
and region path of a region on a
remote processor.
True Name
True Name of a data item subject to
license validation.
identity of a user authorized to have
access to this 0 bj ect.
source
type
5
licensee
Various other data structures are employed on some or all
of the processors 102 in the data processing system 100.
Each processor 102 has a global freeze lock (GFL) 152
source
(FIG. 1), which is used to prevent synchronization errors
rights
when a directory is frozen or copied. Any processor 102 may
include a special archive directory (SAD) 154 into which
15 directories may be copied for the purposes of archival. Any
source
availability
processor 102 may include a special media directory (SMD)
156, into which the directories of removable volumes are
stored to form a media inventory. Each processor has a
grooming lock 158, which is set during a grooming operasource
location
20 tion. During this period the grooming delete count of True
File registry entries 140 is active, and no True Files should
be deleted until grooming is complete. While grooming is in
effect, grooming information includes a table of pathnames
selected for deletion, and keeps track of the amount of space
25 that would be freed if all of the files were deleted.
The audit file 132 is a table of events ordered by
Primitive Mechanisms
timestamp, each record 146 in audit file 132 including the
The first of the mechanisms provided by the present
fields summarized in the following table (with reference to
invention, primitive mechanisms, are now described. The
FIG. 7):
mechanisms described here depend on underlying data man30 agement mechanisms to create, copy, read, and delete data
items in the True File registry 126, as identified by a True
Field
Description
File ID. This support may be provided by an underlying
operating system or disk storage manager.
path of the file in question.
Original Name
whether the file was created, read,
Operation
The following primitive mechanisms are described:
written, copied or deleted.
35
1. Calculate True Name;
Type
specifies whether the source is a file
or a directory.
2. Assimilate Data Item;
Processor ID
ID of the remote processor generating
3. New True File;
this event (if not local).
4. Get True Name from Path;
time and date file was closed (required
Timestamp
only for accessed/modified files).
40
5. Link Path to True Name;
Name of the file (required only for
Pathname
6. Realize True File from Location;
rename).
computed True Name of the file. This is
True Name
7. Locate Remote File;
used by remote systems to mirror changes
8. Make True File Local;
to the directory and is filled in during
background processing.
9. Create Scratch File;
45
10. Freeze Directory;
11. Expand Frozen Directory;
Each record 148 of the accounting log 134 records an
12. Delete True File;
event which may later be used to provide information for
billing mechanisms. Each accounting log entry record 148
13. Process Audit File Entry;
includes at least the information summarized in the follow- 50
14. Begin Grooming;
ing table, with reference to FIG. 8:
15. Select For Removal; and
16. End Grooming.
1. Calculate True Name
Description
Field
A True Name is computed using a function, MD, which
55
date and time of this log entry.
date of
reduces a data block B of arbitrary length to a relatively
entry
small, fixed size identifier, the True Name of the data block,
type of
Entry types include create file,
such that the True Name of the data block is virtually
delete file, and transmit file.
entry
guaranteed to represent the data block B and only data block
True Name
True Name of data item in question.
identity of the user responsible for
owner
60 B.
this action.
The function MD must have the following properties:
1. The domain of the function MD is the set of all data
items. The range of the function MD is the set of True
Each record 150 of the license table 136 records a
Names.
relationship between a licensable data item and the user
licensed to have access to it. Each license table record 150 65
2. The function MD must take a data item of arbitrary
length and reduce it to an integer value in the range 0
includes the information summarized in the following table,
with reference to FIG. 9:
to N-1, where N is the cardinality of the set of True
10
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 41 of 57 PageID #: 54
5,978,791
13
Names. That is, for an arbitrary length data block B,
14
A mechanism for calculating a True Name given a data
item is now described, with reference to FIGS. 10(a) and
O~MD(B)~N.
10(b).
3. The results of MD(B) must be evenly and randomly
A simple data item is a data item whose size is less than
distributed over the range of N, in such a way that
simple or regular changes to B are virtually guaranteed 5 a particular given size (which must be defined in each
particular implementation of the invention). To determine
to produce a different value of MD(B).
the True Name of a simple data item, with reference to FIG.
4. It must be computationally difficult to find a different
10(a), first compute the MD function (described above) on
value B' such that MD(B)=MD(B').
the given simple data item (Step S212). Then append to the
5. The function MD(B) must be efficiently computed.
resulting 128 bits, the byte length modulo 32 of the data item
A family of functions with the above properties are the 10
(Step S214). The resulting 160-bit value is the True Name of
so-called message digest functions, which are used in digital
the simple data item.
security systems as techniques for authentification of data.
A compound data item is one whose size is greater than
These functions (or algorithms) include MD4, MD5, and
the particular given size of a simple data item. To determine
SHA.
the True Name of an arbitrary (simple or compound) data
In the presently preferred embodiments, either MD5 or 15 item, with reference to FIG. 10(b), first determine if the data
SHA is employed as the basis for the computation of True
item is a simple or a compound data item (Step S216). If the
Names. Whichever of these two message digest functions is
data item is a simple data item, then compute its True Name
employed, that same function must be employed on a
in step S218 (using steps S212 and S214 described above),
system-wide basis.
otherwise partition the data item into segments (Step S220)
It is impossible to define a function having a unique 20 and assimilate each segment (Step S222) (the primitive
output for each possible input when the number of elements
mechanism, Assimilate a Data Item, is described below),
computing the True Name of the segment. Then create an
in the range of the function is smaller than the number of
indirect block consisting of the computed segment True
elements in its domain. However, a crucial observation is
that the actual data items that will be encountered in the
Names (Step S224). An indirect block is a data item which
operation of any system embodying this invention form a 25 consists of the sequence of True Names of the segments.
very sparse subset of all the possible inputs.
Then, in step S226, assimilate the indirect block and compute its True Name. Finally, replace the final thirty-two (32)
A colliding set of data items is defined as a set wherein,
bits of the resulting True Name (that is, the length of the
for one or more pairs x and y in the set, MD(x)=MD(y).
indirect block) by the length modulo 32 of the compound
Since a function conforming to the requirements for MD
must evenly and randomly distribute its outputs, it is 30 data item (Step S228). The result is the True Name of the
compound data item.
possible, by making the range of the function large enough,
to make the probability arbitrarily small that actual inputs
Note that the compound data item may be so large that the
encountered in the operation of an embodiment of this
indirect block of segment True Names is itself a compound
invention will form a colliding set.
data item. In this case the mechanism is invoked recursively
To roughly quantify the probability of a collision, assume 35 until only simple data items are being processed.
that there are no more than 230 storage devices in the world,
Both the use of segments and the attachment of a length
to the True Name are not strictly required in a system using
and that each storage device has an average of at most 2 20
different data items. Then there are at most 2 50 data items in
the present invention, but are currently considered desirable
the world. If the outputs of MD range between 0 and 2 128 ,
features in the preferred embodiment.
it can be demonstrated that the probability of a collision is 40 2. Assimilate Data Item
approximately 1 in 229 . Details on the derivation of these
A mechanism for assimilating a data item (scratch file or
probability values are found, for example, in P. Flajolet and
segment) into a file system, given the scratch file ID of the
A. M. Odlyzko, "Random Mapping Statistics," Lecture
data item, is now described with reference to FIG. 11. The
purpose of this mechanism is to add a given data item to the
Notes in Computer Science 434: Advances in CryptologyEurocrypt '89 Proceedings, Springer-Verlag, pp. 329-354. 45 True File registry 126. If the data item already exists in the
Note that for some less-preferred embodiments of the
True File registry 126, this will be discovered and used
during this process, and the duplicate will be eliminated.
present invention, lower probabilities of uniqueness may be
acceptable, depending on the types of applications and
Thereby the system stores at most one copy of any data
mechanisms used. In some embodiments it may also be
item or file by content, even when multiple names refer to
useful to have more than one level of True Names, with 50 the same content.
First, determine the True Name of the data item corresome of the True Names having different degrees of uniquesponding to the given scratch File ID using the Calculate
ness. If such a scheme is implemented, it is necessary to
ensure that less unique True Names are not propagated in the
True Name primitive mechanism (Step S230). Next, look for
system.
an entry for the True Name in the True File registry 126
While the invention is described herein using only the 55 (Step S232) and determine whether a True Name entry,
True Name of a data item as the identifier for the data item,
record 140, exists in the True File registry 126. If the entry
other preferred embodiments use tagged, typed, categorized
record includes a corresponding True File ID or compressed
or classified data items and use a combination of both the
File ID (Step S237), delete the file with the scratch File ID
True Name and the tag, type, category or class of the data
(Step S238). Otherwise store the given True File ID in the
item as an identifier. Examples of such categorizations are 60 entry record (step S239).
If it is determined (in step S232) that no True Name entry
files, directories, and segments; executable files and data
exists in the True File registry 126, then, in Step S236, create
files, and the like. Examples of classes are classes of objects
in an object-oriented system. In such a system, a lower
a new entry in the True File registry 126 for this True Name.
degree of True Name uniqueness is acceptable over the
Set the True Name of the entry to the calculated True Name,
entire universe of data items, as long as sufficient uniqueness 65 set the use count for the new entry to one, store the given
is provided per category of data items. This is because the
True File ID in the entry and set the other fields of the entry
tags provide an additional level of uniqueness.
as appropriate.
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 42 of 57 PageID #: 55
5,978,791
15
16
Because this procedure may take some time to compute,
record 140 of the corresponding True Name; note whether
it is intended to run in background after a file has ceased to
the entry is a directory by reading the True File to see if it
change. In the meantime, the file is considered an unassimicontains a tag (magic number) indicating that it represents a
lated scratch file.
frozen directory (see also the description of the Freeze
5 Directory primitive mechanism regarding the tag); and com3. New True File
pute and set the other fields of the local directory extensions
The New True File process is invoked when processing
appropriately. For instance, search the region table 128 to
the audit file 132, some time after a True File has been
identify the region of the path, and set the time of last access
assimilated (using the Assimilate Data Item primitive
and time of last modification to the current time.
mechanism). Given a local directory extensions table entry
record 138 in the local directory extensions table 124, the 10 6. Realize True File from Location
This mechanism is used to try to make a local copy of a
New True File process can provide the following steps (with
True File, given its True Name and the name of a source
reference to FIG. 12), depending on how the local processor
location (processor or media) that may contain the True File.
is configured:
First, in step S238, examine the local directory extensions
This mechanism is now described with reference to FIG. 15.
table entry record 138 to determine whether the file is locked 15
First, in step S272, determine whether the location speciby a cache server. If the file is locked, then add the ID of the
fied is a processor. If it is determined that the location
cache server to the dependent processor list of the True File
specified is a processor, then send a Request True File
registry table 126, and then send a message to the cache
message (using the Request True File remote mechanism) to
server to update the cache of the current processor using the
the remote processor and wait for a response (Step S274). If
20 a negative response is received or no response is received
Update Cache remote mechanism (Step 242).
after a timeout period, this mechanism fails. If a positive
If desired, compress the True File (Step S246), and, if
response is received, enter the True File returned in the True
desired, mirror the True File using the Mirror True File
File registry 126 (Step S276). (If the file received was
background mechanism (Step S248).
compressed, enter the True File ID in the compressed File ID
4. Get True Name from Path
The True Name of a file can be used to identify a file by 25 field.)
contents, to confirm that a file matches its original contents,
If, on the other hand, it is determined in step S272 that the
or to compare two files. The mechanism to get a True Name
location specified is not a processor, then, if necessary,
given the pathname of a file is now described with reference
request the user or operator to mount the indicated volume
to FIG. 13.
(Step S278). Then (Step S280) find the indicated file on the
First, search the local directory extensions table 124 for 30 given volume and assimilate the file using the Assimilate
Data Item primitive mechanism. If the volume does not
the entry record 138 with the given pathname (Step S250).
If the pathname is not found, this process fails and no True
contain a True File registry 126, search the media inventory
to find the path of the file on the volume. If no such file can
Name corresponding to the given pathname exists. Next,
be found, this mechanism fails.
determine whether the local directory extensions table entry
At this point, whether or not the location is determined (in
record 138 includes a True Name (Step S252), and if so, the 35
mechanism's task is complete. Otherwise, determine
step S272) to be a processor, if desired, verify the True File
whether the local directory extensions table entry record 138
(in step S282).
identifies a directory (Step S254), and if so, freeze the
7. Locate Remote File
directory (Step S256) (the primitive mechanism Freeze
This mechanism allows a processor to locate a file or data
Directory is described below).
40 item from a remote source of True Files, when a specific
source is unknown or unavailable. A client processor system
Otherwise, in step S258, assimilate the file (using the
may ask one of several or many sources whether it can
Assimilate Data Item primitive mechanism) defined by the
File ID field to generate its True Name and store its True
supply a data object with a given True Name. The steps to
Name in the local directory extensions entry record. Then
perform this mechanism are as follows (with reference to
return the True Name identified by the local directory 45 FIGS. 16(a) and 16(b).
extensions table 124.
The client processor 102 uses the source table 145 to
5. Link Path to True Name
select one or more source processors (Step S284). If no
The mechanism to link a path to a True Name provides a
source processor can be found, the mechanism fails. Next,
way of creating a new directory entry record identifying an
the client processor 102 broadcasts to the selected sources a
existing, assimilated file. This basic process may be used to 50 request to locate the file with the given True Name using the
Locate True File remote mechanism (Step S286). The
copy, move, and rename files without a need to copy their
request to locate may be augmented by asking to propagate
contents. The mechanism to link a path to a True Name is
now described with reference to FIG. 14.
this request to distant servers. The client processor then
waits for one or more servers to respond positively (Step
First, if desired, confirm that the True Name exists locally
by searching for it in the True Name registry or local 55 S288). After all servers respond negatively, or after a timeout
directory extensions table 135 (Step S260). Most uses of this
period with no positive response, the mechanism repeats
mechanism will require this form of validation. Next, search
selection (Step S284) to attempt to identify alternative
for the path in the local directory extensions table 135 (Step
sources. If any selected source processor responds, its proS262). Confirm that the directory containing the file named
cessor ID is the result of this mechanism. Store the processor
in the path already exists (Step S264). If the named file itself 60 ID in the source field of the True File registry entry record
exists, delete the File using the Delete True File operating
140 of the given True Name (Step S290).
If the source location of the True Name is a different
system mechanism (see below) (Step S268).
Then, create an entry record in the local directory extenprocessor or medium than the destination (Step S290a),
perform the following steps:
sions with the specified path (Step S270) and update the
(i) Look up the True File registry entry record 140 for the
entry record and other data structures as follows: fill in the 65
True Name field of the entry with the specified True Name;
corresponding True Name, and add the source location ID to
the list of sources for the True Name (Step S290b); and
increment the use count for the True File registry entry
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 43 of 57 PageID #: 56
5,978,791
17
18
(ii) If the source is a publishing system, determine the
expiration date on the publishing system for the True Name
and add that to the list of sources. If the source is not a
publishing system, send a message to reserve the True File
on the source processor (Step S290c).
Source selection in step S284 may be based on optimizations involving general availability of the source, access
time, bandwidth, and transmission cost, and ignoring previously selected processors which did not respond in step
S288.
8. Make True File Local
This mechanism is used when a True Name is known and
a locally accessible copy of the corresponding file or data
item is required. This mechanism makes it possible to
actually read the data in a True File. The mechanism takes
a True Name and returns when there is a local, accessible
copy of the True File in the True File registry 126. This
mechanism is described here with reference to the flow chart
of FIGS. 17(a) and 17(b).
First, look in the True File registry 126 for a True File
entry record 140 for the corresponding True Name (Step
S292). If no such entry is found this mechanism fails. If
there is already a True File ID for the entry (Step S294), this
mechanism's task is complete. If there is a compressed file
ID for the entry (Step S296), decompress the file corresponding to the file ID (Step S298) and store the decompressed file ID in the entry (Step S300). This mechanism is
then complete.
If there is no True File ID for the entry (Step S294) and
there is no compressed file ID for the entry (Step S296), then
continue searching for the requested file. At this time it may
be necessary to notify the user that the system is searching
for the requested file.
If there are one or more source IDs, then select an order
in which to attempt to realize the source ID (Step S304). The
order may be based on optimizations involving general
availability of the source, access time, bandwidth, and
transmission cost. For each source in the order chosen,
realize the True File from the source location (using the
Realize True File from Location primitive mechanism), until
the True File is realized (Step S306). If it is realized,
continue with step S294. If no known source can realize the
True File, use the Locate Remote File primitive mechanism
to attempt to find the True File (Step S308). If this succeeds,
realize the True File from the identified source location and
continue with step S296.
9. Create Scratch File
A scratch copy of a file is required when a file is being
created or is about to be modified. The scratch copy is stored
in the file system of the underlying operating system. The
scratch copy is eventually assimilated when the audit file
record entry 146 is processed by the Process Audit File Entry
primitive mechanism. This Create Scratch File mechanism
requires a local directory extensions table entry record 138.
When it succeeds, the local directory extensions table entry
record 138 contains the scratch file ID of a scratch file that
is not contained in the True File registry 126 and that may
be modified. This mechanism is now described with reference to FIGS. 18(a) and 18(b).
First determine whether the scratch file should be a copy
of the existing True File (Step S310). If so, continue with
step S312. Otherwise, determine whether the local directory
extensions table entry record 138 identifies an existing True
File (Step S316), and if so, delete the True File using the
Delete True File primitive mechanism (Step S318). Then
create a new, empty scratch file and store its scratch file ID
in the local directory extensions table entry record 138 (Step
S320). This mechanism is then complete.
If the local directory extensions table entry record 138
identifies a scratch file ID (Step S312), then the entry already
has a scratch file, so this mechanism succeeds.
If the local directory extensions table entry record 138
identifies a True File (S316), and there is no True File ID for
the True File (S312), then make the True File local using the
Make True File Local primitive mechanism (Step S322). If
there is still no True File ID, this mechanism fails.
There is now a local True File for this file. If the use count
in the corresponding True File registry entry record 140 is
one (Step S326), save the True File ID in the scratch file ID
of the local directory extensions table entry record 138, and
remove the True File registry entry record 140 (Step S328).
(This step makes the True File into a scratch file.) This
mechanism's task is complete.
Otherwise, if the use count in the corresponding True File
registry entry record 140 is not one (in step S326), copy the
file with the given True File ID to a new scratch file, using
the Read File OS mechanism and store its file ID in the local
directory extensions table entry record 138 (Step S330), and
reduce the use count for the True File by one. If there is
insufficient space to make a copy, this mechanism fails.
10. Freeze Directory
This mechanism freezes a directory in order to calculate
its True Name. Since the True Name of a directory is a
function of the files within the directory, they must not
change during the computation of the True Name of the
directory. This mechanism requires the pathname of a directory to freeze. This mechanism is described with reference
to FIGS. 19(a) and 19(b).
In step S332, add one to the global freeze lock. Then
search the local directory extensions table 124 to find each
subordinate data file and directory of the given directory, and
freeze each subordinate directory found using the Freeze
Directory primitive mechanism (Step S334). Assimilate
each unassimilated data file in the directory using the
Assimilate Data Item primitive mechanism (Step S336).
Then create a data item which begins with a tag or marker
(a "magic number") being a unique data item indicating that
this data item is a frozen directory (Step S337). Then list the
file name and True Name for each file in the current
directory (Step S338). Record any additional information
required, such as the type, time of last access and
modification, and size (Step S340). Next, in step S342, using
the Assimilate Data Item primitive mechanism, assimilate
the data item created in step S338. The resulting True Name
is the True Name of the frozen directory. Finally, subtract
one from the global freeze lock (Step S344).
11. Expand Frozen Directory
This mechanism expands a frozen directory in a given
location. It requires a given pathname into which to expand
the directory, and the True Name of the directory and is
described with reference to FIG. 20.
First, in step S346, make the True File with the given True
Name local using the Make True File Local primitive
mechanism. Then read each directory entry in the local file
created in step S346 (Step S348). For each such directory
entry, do the following:
Create a full pathname using the given pathname and the
file name of the entry (Step S350); and
link the created path to the True Name (Step S352) using
the Link Path to True Name primitive mechanism.
12. Delete True File
This mechanism deletes a reference to a True Name. The
underlying True File is not removed from the True File
registry 126 unless there are no additional references to the
file. With reference to FIG. 21, this mechanism is performed
as follows:
5
10
15
20
25
30
35
40
45
50
55
60
65
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 44 of 57 PageID #: 57
5,978,791
19
20
If the global freeze lock is on, wait until the global freeze
lock is turned off (Step S354). This prevents deleting a True
File while a directory which might refer to it is being frozen.
Next, find the True File registry entry record 140 given the
True Name (Step S356). If the reference count field of the
True File registry 126 is greater than zero, subtract one from
the reference count field (Step S358). If it is determined (in
step S360) that the reference count field of the True File
registry entry record 140 is zero, and if there are no
dependent systems listed in the True File registry entry
record 140, then perform the following steps:
(i) If the True File is a simple data item, then delete the
True File, otherwise,
(ii) (the True File is a compound data item) for each True
Name in the data item, recursively delete the True File
corresponding to the True Name (Step S362).
(iii) Remove the file indicated by the True File ID and
compressed file ID from the True File registry 126, and
remove the True File registry entry record 140 (Step S364).
13. Process Audit File Entry
This mechanism performs tasks which are required to
maintain information in the local directory extensions table
124 and True File registry 126, but which can be delayed
while the processor is busy doing more time-critical tasks.
Entries 142 in the audit file 132 should be processed at a
background priority as long as there are entries to be
processed. With reference to FIG. 22, the steps for processing an entry are as follows:
Determine the operation in the entry 142 currently being
processed (Step S365). If the operation indicates that a file
was created or written (Step S366), then assimilate the file
using the Assimilate Data Item primitive mechanism (Step
S368), use the New True File primitive mechanism to do
additional desired processing (such as cache update,
compression, and mirroring) (Step S369), and record the
newly computed True Name for the file in the audit file
record entry (Step S370).
Otherwise, if the entry being processed indicates that a
compound data item or directory was copied (or deleted)
(Step S376), then for each component True Name in the
compound data item or directory, add (or subtract) one to the
use count of the True File registry entry record 140 corresponding to the component True Name (Step S378).
In all cases, for each parent directory of the given file,
update the size, time of last access, and time of last
modification, according to the operation in the audit record
(Step S379).
Note that the audit record is not removed after processing,
but is retained for some reasonable period so that it may be
used by the Synchronize Directory extended mechanism to
allow a disconnected remote processor to update its representation of the local system.
14. Begin Grooming
This mechanism makes it possible to select a set of files
for removal and determine the overall amount of space to be
recovered. With reference to FIG. 23, first verify that the
global grooming lock is currently unlocked (Step S382).
Then set the global grooming lock, set the total amount of
space freed during grooming to zero and empty the list of
files selected for deletion (Step S384). For each True File in
the True File registry 126, set the delete count to zero (Step
S386).
15. Select For Removal
This grooming mechanism tentatively selects a pathname
to allow its corresponding True File to be removed. With
reference to FIG. 24, first find the local directory extensions
table entry record 138 corresponding to the given pathname
(Step S388). Then find the True File registry entry record
140 corresponding to the True File name in the local
directory extensions table entry record 138 (Step S390). Add
one to the grooming delete count in the True File registry
entry record 140 and add the pathname to a list of files
selected for deletion (Step S392). If the grooming delete
count of the True File registry entry record 140 is equal to
the use count of the True File registry entry record 140, and
if the there are no entries in the dependency list of the True
File registry entry record 140, then add the size of the file
indicated by the True File ID and or compressed file ID to
the total amount of space freed during grooming (Step
S394).
16. End Grooming
This grooming mechanism ends the grooming phase and
removes all files selected for removal. With reference to
FIG. 25, for each file in the list of files selected for deletion,
delete the file (Step S396) and then unlock the global
grooming lock (Step S398).
Operating System Mechanisms
The next of the mechanisms provided by the present
invention, operating system mechanisms, are now described.
The following operating system mechanisms are
described:
1. Open File;
2. Close File;
3. Read File;
4. Write File;
5. Delete File or Directory;
6. Copy File or Directory;
7. Move File or Directory;
8. Get File Status; and
9. Get Files in Directory.
1. open File
A mechanism to open a file is described with reference to
FIGS. 26(a) and 26(b). This mechanism is given as input a
pathname and the type of access required for the file (for
example, read, write, read/write, create, etc.) and produces
either the File ID of the file to be opened or an indication that
no file should be opened. The local directory extensions
table record 138 and region table record 142 associated with
the opened file are associated with the open file for later use
in other processing functions which refer to the file, such as
read, write, and close.
First, determine whether or not the named file exists
locally by examining the local directory extensions table 124
to determine whether there is an entry corresponding to the
given pathname (Step S400). If it is determined that the file
name does not exist locally, then, using the access type,
determine whether or not the file is being created by this
opening process (Step S402). If the file is not being created,
prohibit the open (Step S404). If the file is being created,
create a zero-length scratch file using an entry in local
directory extensions table 124 and produce the scratch file
ID of this scratch file as the result (Step S406).
If, on the other hand, it is determined in step S400 that the
file name does exist locally, then determine the region in
which the file is located by searching the region table 128 to
find the record 142 with the longest region path which is a
prefix of the file pathname (Step S408). This record identifies the region of the specified file.
Next, determine using the access type, whether the file is
being opened for writing or whether it is being opened only
for reading (Step S410). If the file is being opened for
reading only, then, if the file is a scratch file (Step S419),
return the scratch File ID of the file (Step S424). Otherwise
5
10
15
20
25
30
35
40
45
50
55
60
65
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 45 of 57 PageID #: 58
5,978,791
21
22
5. Delete File or Directory
get the True Name from the local directory extensions table
124 and make a local version of the True File associated with
The process of deleting a file, for a given pathname, is
the True Name using the Make True File Local primitive
described here with reference to FIGS. 27(a) and 27(b).
mechanism, and then return the True File ID associated with
First, determine the local directory extensions table entry
the True Name (Step S420).
5 record 138 and region table entry record 142 for the file
If the file is not being opened for reading only (Step
(Step S422). If the file has no local directory extensions table
S410), then, if it is determined by inspecting the region table
entry record 138 or is locked or is in a read-only region,
entry record 142 that the file is in a read-only directory (Step
prohibit the deletion.
S416), then prohibit the opening (Step S422).
Identify the corresponding True File given the True Name
If it is determined by inspecting the region table 128 that
10 of the file being deleted using the True File registry 126
the file is in a cached region (Step S423), then send a Lock
(Step S424). If the file has no True Name, (Step S426) then
Cache message to the corresponding cache server, and wait
delete the scratch copy of the file based on its scratch file ID
for a return message (Step S418). If the return message says
in the local directory extensions table 124 (Step S427), and
the file is already locked, prohibit the opening.
continue with step S428.
If the access type indicates that the file being modified is
If the file has a True Name and the True File's use count
being rewritten completely (Step S419), so that the original 15
is one (Step S429), then delete the True File (Step S430), and
data will not be required, then Delete the File using the
continue with step S428.
Delete File OS mechanism (Step S421) and perform step
If the file has a True Name and the True File's use count
S406. Otherwise, make a scratch copy of the file (Step S417)
is greater than one, reduce its use count by one (Step S431).
and produce the scratch file ID of the scratch file as the result
(Step S424).
20 Then proceed with step S428.
2. Close File
In Step S428, delete the local directory extensions table
This mechanism takes as input the local directory extenentry record, and add an entry to the audit file 132 indicating
sions table entry record 138 of an open file and the data
the time and the operation performed (delete).
maintained for the open file. To close a file, add an entry to
6. Copy File or Directory
the audit file indicating the time and operation (create, read 25
A mechanism is provided to copy a file or directory given
or write). The audit file processing (using the Process Audit
a source and destination processor and pathname. The Copy
File Entry primitive mechanism) will take care of assimiFile mechanism does not actually copy the data in the file,
lating the file and thereby updating the other records.
only the True Name of the file. This mechanism is performed
3. Read File
as follows:
To read a file, a program must provide the offset and 30
(A) Given the source path, get the True Name from the
length of the data to be read, and the location of a buffer into
path. If this step fails, the mechanism fails.
which to copy the data read.
(B) Given the True Name and the destination path, link
The file to be read from is identified by an open file
the destination path to the True Name.
descriptor which includes a File ID as computed by the Open
(C) If the source and destination processors have different
File operating system mechanism defined above. The File ID 35 True File registries, find (or, if necessary, create) an entry for
may identify either a scratch file or a True File (or True File
the True Name in the True File registry table 126 of the
segment). If the File ID identifies a True File, it may be
destination processor. Enter into the source ID field of this
either a simple or a compound True File. Reading a file is
new entry the source processor identity.
accomplished by the following steps:
(D) Add an entry to the audit file 132 indicating the time
In the case where the File ID identifies a scratch file or a 40 and operation performed (copy).
simple True File, use the read capabilities of the underlying
This mechanism addresses capability of the system to
operating system.
avoid copying data from a source location to a destination
In the case where the File ID identifies a compound file,
location when the destination already has the data. In
break the read operation into one or more read operations on
addition, because of the ability to freeze a directory, this
component segments as follows:
45 mechanism also addresses capability of the system immeA. Identify the segment(s) to be read by dividing the
diately to make a copy of any collection of files, thereby to
specified file offset and length each by the fixed size of a
support an efficient version control mechanisms for groups
segment (a system dependent parameter), to determine the
of files.
segment number and number of segments that must be read.
7. Move File or Directory
B. For each segment number computed above, do the 50
A mechanism is described which moves (or renames) a
following:
file from a source path to a destination path. The move
i. Read the compound True File index block to determine
operation, like the copy operation, requires no actual transfer
the True Name of the segment to be read.
of data, and is performed as follows:
ii. Use the Realize True File from Location primitive
(A) Copy the file from the source path to the destination
mechanism to make the True File segment available 55 path.
locally. (If that mechanism fails, the Read File mecha(B) If the source path is different from the destination
nism fails).
path, delete the source path.
iii. Determine the File ID of the True File specified by the
8. Get File Status
True Name corresponding to this segment.
This mechanism takes a file pathname and provides
iv. Use the Read File mechanism (recursively) to read 60 information about the pathname. First the local directory
extensions table entry record 138 corresponding to the
from this segment into the corresponding location in
pathname given is found. If no such entry exists, then this
the specified buffer.
4. Write File
mechanism fails, otherwise, gather information about the file
File writing uses the file ID and data management capaand its corresponding True File from the local directory
bilities of the underlying operating system. File access 65 extensions table 124. The information can include any
(Make File Local described above) can be deferred until the
information shown in the data structures, including the size,
first read or write.
type, owner, True Name, sources, time of last access, time of
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 46 of 57 PageID #: 59
5,978,791
23
24
last modification, state (local or not, assimilated or not,
One the other hand, if a True File registry entry record 140
compressed or not), use count, expiration date, and reseris not found (Step S434), and the flag indicates that the
vations.
request for this True File is to be forwarded (Step S436),
9. Get Files in Directory
then forward a request for this True File to some other
This mechanism enumerates the files in a directory. It is 5 processors in the system (Step S442). If the source table for
used (implicitly) whenever it is necessary to determine
the current processor identifies one or more publishing
whether a file exists (is present) in a directory. For instance,
servers which should have a copy of this True File, then
forward the request to each of those publishing servers (Step
it is implicitly used in the Open File, Delete File, Copy File
S436).
or Directory, and Move File operating system mechanisms,
because the files operated on are referred to by pathnames 10
If a True File registry entry record 140 is found for the
required True File (Step S434), and if the entry includes a
containing directory names. The mechanism works as follows:
True File ID or Compressed File ID (Step S440), respond
positively (Step S444). If the entry includes a True File ID
The local directory extensions table 124 is searched for an
entry 138 with the given directory pathname. If no such
then this provides the identity or disk location of the actual
entry is found, or if the entry found is not a directory, then 15 physical representation of the file or file segment required.
If the entry include a Compressed File ID, then a comthis mechanism fails.
If there is a corresponding True File field in the local
pressed version of the True File may be stored instead of, or
directory extensions table record, then it is assumed that the
in addition to, an uncompressed version. This field provides
True File represents a frozen directory. The Expand Frozen
the identity of the actual representation of the compressed
Directory primitive mechanism is used to expand the exist- 20 version of the file.
If the True File registry entry record 140 is found (Step
ing True File into directory entries in the local directory
S434) but does not include a True File ID (the File ID is
extensions table.
absent if the actual file is not currently present at the current
Finally, the local directory extensions table 124 is again
searched, this time to find each directory subordinate to the
location) (Step S440), and if the True File registry entry
given directory. The names found are provided as the result. 25 record 140 includes one or more source processors, and if
Remote Mechanisms
the request can be forwarded, then forward the request for
The remote mechanisms provided by the present inventhis True File to one or more of the source processors (Step
tion are now described. Recall that remote mechanisms are
S444).
2. Reserve True File
used by the operating system in responding to requests from
other processors. These mechanisms enable the capabilities 30
This mechanism allows a remote processor to indicate
of the present invention in a peer-to-peer network mode of
that it depends on the local processor for access to a specific
operation.
True File. It takes a True Name as input. This mechanism is
In a presently preferred embodiment, processors commudescribed here.
(A) Find the True File registry entry record 140 associated
nicate with each other using a remote procedure call (RPC)
style interface, running over one of any number of commu- 35 with the given True File. If no entry exists, reply negatively.
nication protocols such as IPX/SPX or TCP/IP. Each peer
(B) If the True File registry entry record 140 does not
include a True File ID or compressed File ID, and if the True
processor which provides access to its True File registry 126
or file regions, or which depends on another peer processor,
File registry entry record 140 includes no source IDs for
provides a number of mechanisms which can be used by its
removable storage volumes, then this processor does not
peers.
40 have access to a copy of the given file. Reply negatively.
The following remote mechanisms are described:
(C) Add the ID of the sending processor to the list of
dependent processors for the True File registry entry record
1. Locate True File;
140. Reply positively, with an indication of whether the
2. Reserve True File;
reserved True File is on line or off line.
3. Request True File;
45 3. Request True File
4. Retire True File;
This mechanism allows a remote processor to request a
5. Cancel Reservation;
copy of a True File from the local processor. It requires a
6. Acquire True File;
True Name and responds positively by sending a True File
7. Lock Cache;
back to the requesting processor. The mechanism operates as
8. Update Cache; and
50 follows:
(A) Find the True File registry entry record 140 associated
9. Check Expiration Date.
1. Locate True File
with the given True Name. If there is no such True File
This mechanism allows a remote processor to determine
registry entry record 140, reply negatively.
whether the local processor contains a copy of a specific
(B) Make the True File local using the Make True File
True File. The mechanism begins with a True Name and a 55 Local primitive mechanism. If this mechanism fails, the
flag indicating whether to forward requests for this file to
Request True File mechanism also fails.
other servers. This mechanism is now described with refer(C) Send the local True File in either it is uncompressed
ence to FIG. 28.
or compressed form to the requesting remote processor.
First determine if the True File is available locally or if
Note that if the True File is a compound file, the components
there is some indication of where the True File is located (for 60 are not sent.
example, in the Source IDs field). Look up the requested
(D) If the remote file is listed in the dependent process list
True Name in the True File registry 126 (Step S432).
of the True File registry entry record 140, remove it.
If a True File registry entry record 140 is not found for this
4. Retire True File
True Name (Step S434), and the flag indicates that the
This mechanism allows a remote processor to indicate
request is not to be forwarded (Step S436), respond nega- 65 that it no longer plans to maintain a copy of a given True
tively (Step S438). That is, respond to the effect that the True
File. An alternate source of the True File can be specified, if,
File is not available.
for instance, the True File is being moved from one server
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 47 of 57 PageID #: 60
5,978,791
25
26
to another. It begins with a True Name, a requesting proLink the given pathname to the given True Name using
cessor ID, and an optional alternate source. This mechanism
the Link Path to True Name primitive mechanism.
operates as follows:
Unlock the local directory extensions table entry record
(A) Find a True Name entry in the True File registry 126.
138 and return positively.
If there is no entry for this True Name, this mechanism's task 5 9. Check Expiration Date
is complete.
Return current or new expiration date and possible alter(B) Find the requesting processor on the source list and,
native source to caller.
if it is there, remove it.
Background Processes and Mechanisms
(C) If an alternate source is provided, add it to the source
The background processes and mechanisms provided by
list for the True File registry entry record 140.
10 the present invention are now described. Recall that back(D) If the source list of the True File registry entry record
ground mechanisms are intended to run occasionally and at
140 has no items in it, use the Locate Remote File primitive
a low priority to provide automated management capabilities
mechanism to search for another copy of the file. If it fails,
.
.
with respect to the present invention.
raIse a senous error.
The following background mechanisms are described:
5. Cancel Reservation
1. Mirror True File;
This mechanism allows a remote processor to indicate 15
that it no longer requires access to a True File stored on the
2. Groom Region;
local processor. It begins with a True Name and a requesting
3. Check for Expired Links;
processor ID and proceeds as follows:
4. Verify Region; and
(A) Find the True Name entry in the True File registry
126. If there is no entry for this True Name, this mecha- 20
5. Groom Source List.
nism's task is complete.
1. Mirror True File
(B) Remove the identity of the requesting processor from
This mechanism is used to ensure that files are available
the list of dependent processors, if it appears.
in alternate locations in mirror groups or archived on archi(C) If the list of dependent processors becomes zero and
val servers. The mechanism depends on application-specific
the use count is also zero, delete the True File.
25 migration/archival criteria (size, time since last access, num6. Acquire True File
ber of copies required, number of existing alternative
This mechanism allows a remote processor to insist that
sources) which determine under what conditions a file
a local processor make a copy of a specified True File. It is
should be moved. The Mirror True File mechanism operates
used, for example, when a cache client wants to write
as follows, using the True File specified, perform the folthrough a new version of a file. The Acquire True File 30
lowing steps:
mechanism begins with a data item and an optional True
(A) Count the number of available locations of the True
Name for the data item and proceeds as follows:
File by inspecting the source list of the True File registry
(A) Confirm that the requesting processor has the right to
entry record 140 for the True File. This step determines how
require the local processor to acquire data items. If not, send
many copies of the True File are available in the system.
a negative reply.
(B) If the True File meets the specified migration criteria,
(B) Make a local copy of the data item transmitted by the 35
select a mirror group server to which a copy of the file
remote processor.
should be sent. Use the Acquire True File remote mechanism
(C) Assimilate the data item into the True File registry of
to copy the True File to the selected mirror group server. Add
the local processor.
the identity of the selected system to the source list for the
(D) If a True Name was provided with the file, the True
Name calculation can be avoided, or the mechanism can 40 True File.
2. Groom Region
verify that the file received matches the True Name sent.
This mechanism is used to automatically free up space in
(E) Add an entry in the dependent processor list of the true
file registry record indicating that the requesting processor
a processor by deleting data items that may be available
elsewhere. The mechanism depends on application-specific
depends on this copy of the given True File.
45 grooming criteria (for instance, a file may be removed if
(F) Send a positive reply.
there is an alternate online source for it, it has not been
7. Lock Cache
accessed in a given number of days, and it is larger than a
This mechanism allows a remote cache client to lock a
local file so that local users or other cache clients cannot
given size). This mechanism operates as follows:
change it while the remote processor is using it. The
Repeat the following steps (i) to (iii) with more aggressive
mechanism begins with a pathname and proceeds as follows: 50 grooming criteria until sufficient space is freed or until all
grooming criteria have been exercised. Use grooming infor(A) Find the local directory extensions table entry record
138 of the specified pathname. If no such entry exists, reply
mation to determine how much space has been freed. Recall
negatively.
that, while grooming is in effect, grooming information
(B) If an local directory extensions table entry record 138
includes a table of pathnames selected for deletion, and
exists and is already locked, reply negatively that the file is 55 keeps track of the amount of space that would be freed if all
already locked.
of the files were deleted.
(C) If an local directory extensions table entry record 138
(i) Begin Grooming (using the primitive mechanism).
exists and is not locked, lock the entry. Reply positively.
(ii) For each pathname in the specified region, for the True
8. Update Cache
File corresponding to the pathname, if the True File is
This mechanism allows a remote cache client to unlock a 60 present, has at least one alternative source, and meets
local file and update it with new contents. It begins with a
application specific grooming criteria for the region, select
pathname and a True Name. The file corresponding to the
the file for removal (using the primitive mechanism).
True Name must be accessible from the remote processor.
(iii) End Grooming (using the primitive mechanism).
If the region is used as a cache, no other processors are
This mechanism operates as follows:
Find the local directory extensions table entry record 138 65 dependent on True Files to which it refers, and all such True
corresponding to the given pathname. Reply negatively if no
Files are mirrored elsewhere. In this case, True Files can be
such entry exists or if the entry is not locked.
removed with impunity. For a cache region, the grooming
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 48 of 57 PageID #: 61
5,978,791
27
28
criteria would ordinarily eliminate the least recently
Extended Mechanisms
accessed True Files first. This is best done by sorting the
The extended mechanisms provided by the present invenTrue Files in the region by the most recent access time
tion are now described. Recall that extended mechanisms
before performing step (ii) above. The application specific
run within application programs over the operating system
criteria would thus be to select for removal every True File 5 to provide solutions to specific problems and applications.
encountered (beginning with the least recently used) until
The following extended mechanisms are described:
the required amount of free space is reached.
1. Inventory Existing Directory;
3. Check for Expired Links
2. Inventory Removable, Read-only Files;
This mechanism is used to determine whether dependencies on published files should be refreshed. The following 10
3. Synchronize Directories;
steps describe the operation of this mechanism:
4. Publish Region;
For each pathname in the specified region, for each True
5. Retire Directory;
File corresponding to the pathname, perform the following
6. Realize Directory at Location;
step:
If the True File registry entry record 140 corresponding to
7. Verify True File;
the True File contains at least one source which is a 15
8. Track for Accounting Purposes; and
publishing server, and if the expiration date on the depen9. Track for Licensing Purposes.
dency is past or close, then perform the following steps:
1. Inventory Existing Directory
(A) Determine whether the True File registry entry record
This mechanism determines the True Names of files in an
contains other sources which have not expired.
existing on-line directory in the underlying operating sys(B) Check the True Name expiration of the server. If the 20 tem. One purpose of this mechanism is to install True Name
expiration date has been extended, or an alternate source is
mechanisms in an existing file system.
suggested, add the source to the True File registry entry
An effect of such an installation is to eliminate immedirecord 140.
ately all duplicate files from the file system being traversed.
(C) If no acceptable alternate source was found in steps 25 If several file systems are inventoried in a single True File
(A) or (B) above, make a local copy of the True File.
registry, duplicates across the volumes are also eliminated.
(D) Remove the expired source.
(A) Traverse the underlying file system in the operating
4. Verify Region
system. For each file encountered, excluding directories,
This mechanism can be used to ensure that the data items
perform the following:
in the True File registry 126 have not been damaged acci(i) Assimilate the file encountered (using the Assimilate
30
dentally or maliciously. The operation of this mechanism is
File primitive mechanism). This process computes its
described by the following steps:
True Name and moves its data into the True File
(A) Search the local directory extensions table 124 for
registry 126.
each pathname in the specified region and then perform the
(ii) Create a pathname consisting of the path to the volume
35
following steps:
directory and the relative path of the file on the media.
(i) Get the True File name corresponding to the pathname;
Link this path to the computed True Name using the
(ii) If the True File registry entry 140 for the True File
Link Path to True Name primitive mechanism.
does not have a True File ID or compressed file ID,
2. Inventory Removable, Read-only Files
ignore it.
A system with access to removable, read-only media
(iii) Use the Verify True File mechanism (see extended 40 volumes (such as WORM disks and CD-ROMs) can create
mechanisms below) to confirm that the True File specia usable inventory of the files on these disks without having
fied is correct.
to make online copies. These objects can then be used for
S. Groom Source List
archival purposes, directory overlays, or other needs. An
The source list in a True File entry should be groomed
operator must request that an inventory be created for such
sometimes to ensure there are not too many mirror or archive 45 a volume.
copies. When a file is deleted or when a region definition or
This mechanism allows for maintaining inventories of the
its mirror criteria are changed, it may be necessary to inspect
contents of files and data items on removable media, such as
the affected True Files to determine whether there are too
diskettes and CD-ROMS, independent of other properties of
many mirror copies. This can be done with the following
the files such as name, location, and date of creation.
steps:
50
The mechanism creates an online inventory of the files on
For each affected True File,
one or more removable volumes, such as a floppy disk or
(A) Search the local directory extensions table to find
CD-ROM, when the data on the volume is represented as a
each region that refers to the True File.
directory. The inventory service uses a True Name to iden(B) Create a set of "required sources", initially empty.
tify each file, providing a way to locate the data independent
(C) For each region found,
55 of its name, date of creation, or location.
(a) determine the mirroring criteria for that region,
The inventory can be used for archival of data (making it
(b) determine which sources for the True File satisfy the
possible to avoid archiving data when that data is already on
mirroring criteria, and
a separate volume), for grooming (making it possible to
delete infrequently accessed files if they can be retrieved
(c) add these sources to the set of required sources.
(D) For each source in the True File registry entry, if the 60 from removable volumes), for version control (making it
source identifies a remote processor (as opposed to removpossible to generate a new version of a CD-ROM without
having to copy the old version), and for other purposes.
able media), and if the source is not a publisher, and if the
The inventory is made by creating a volume directory in
source is not in the set of required sources, then eliminate the
source, and use the Cancel Reservation remote mechanism
the media inventory in which each file named identifies the
to eliminate the given processor from the list of dependent 65 data item on the volume being inventoried. Data items are
processors recorded at the remote processor identified by the
not copied from the removable volume during the inventory
source.
process.
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 49 of 57 PageID #: 62
5,978,791
29
An operator must request that an inventory be created for
a specific volume. Once created, the volume directory can be
frozen or copied like any other directory. Data items from
either the physical volume or the volume directory can be
accessed using the Open File operating system mechanism
which will cause them to be read from the physical volume
using the Realize True File from Location primitive mechamsm.
To create an inventory the following steps are taken:
(A) A volume directory in the media inventory is created
to correspond to the volume being inventoried. Its contextual name identifies the specific volume.
(B) A source table entry 144 for the volume is created in
the source table 130. This entry 144 identifies the physical
source volume and the volume directory created in step (A).
(C) The filesystem on the volume is traversed. For each
file encountered, excluding directories, the following steps
are taken:
(i) The True Name of the file is computed. An entry is
created in the True Name registry 124, including the
True Name of the file using the primitive mechanism.
The source field of the True Name registry entry 140
identifies the source table entry 144.
(ii) A pathname is created consisting of the path to the
volume directory and the relative path of the file on the
media. This path is linked to the computed True Name
using Link Path to True Name primitive mechanism.
(D) After all files have been inventoried, the volume
directory is frozen. The volume directory serves as a table of
contents for the volume. It can be copied using the Copy File
or Directory primitive mechanism to create an "overlay"
directory which can then be modified, making it possible to
edit a virtual copy of a read-only medium.
3. Synchronize Directories
Given two versions of a directory derived from the same
starting point, this mechanism creates a new, synchronized
version which includes the changes from each. Where a file
is changed in both versions, this mechanism provides a user
exit for handling the discrepancy. By using True Names,
comparisons are instantaneous, and no copies of files are
necessary.
This mechanism lets a local processor synchronize a
directory to account for changes made at a remote processor.
Its purpose is to bring a local copy of a directory up to date
after a period of no communication between the local and
remote processor. Such a period might occur if the local
processor were a mobile processor detached from its server,
or if two distant processors were run independently and
updated nightly.
An advantage of the described synchronization process is
that it does not depend on synchronizing the clocks of the
local and remote processors. However, it does require that
the local processor track its position in the remote processor's audit file.
This mechanism does not resolve changes made simultaneously to the same file at several sites. If that occurs, an
external resolution mechanism such as, for example, operator intervention, is required.
The mechanism takes as input a start time, a local
directory pathname, a remote processor name, and a remote
directory pathname name, and it operates by the following
steps:
(A) Request a copy of the audit file 132 from the remote
processor using the Request True File remote mechanism.
(B) For each entry 146 in the audit file 132 after the start
time, if the entry indicates a change to a file in the remote
directory, perform the following steps:
30
(i) Compute the pathname of the corresponding file in the
local directory. Determine the True Name of the corresponding file.
(ii) If the True Name of the local file is the same as the old
5
True Name in the audit file, or if there is no local file
and the audit entry indicates a new file is being created,
link the new True Name in the audit file to the local
pathname using the Link Path to True Name primitive
mechanism.
10
(iii) Otherwise, note that there is a problem with the
synchronization by sending a message to the operator
or to a problem resolution program, indicating the local
pathname, remote pathname, remote processor, and
time of change.
15
(C) After synchronization is complete, record the time of
the final change. This time is to be used as the new start time
the next time this directory is synchronized with the same
remote processor.
4. Publish Region
20
The publish region mechanism allows a processor to offer
the files in a region to any client processors for a limited
period of time.
The purpose of the service is to eliminate any need for
client processors to make reservations with the publishing
25
processor. This in turn makes it possible for the publishing
processor to service a much larger number of clients.
When a region is published, an expiration date is defined
for all files in the region, and is propagated into the pub30 lishing system's True File registry entry record 140 for each
file.
When a remote file is copied, for instance using the Copy
File operating system mechanism, the expiration date is
copied into the source field of the client's True File registry
35 entry record 140. When the source is a publishing system, no
dependency need be created.
The client processor must occasionally and in
background, check for expired links, to make sure it still has
access to these files. This is described in the background
40 mechanism Check for Expired Links.
5. Retire Directory
This mechanism makes it possible to eliminate safely the
True Files in a directory, or at least dependencies on them,
after ensuring that any client processors depending on those
45 files remove their dependencies. The files in the directory are
not actually deleted by this process. The directory can be
deleted with the Delete File operating system mechanism.
The mechanism takes the pathname of a given directory,
and optionally, the identification of a preferred alternate
50 source processor for clients to use. The mechanism performs
the following steps:
(A) Traverse the directory. For each file in the directory,
perform the following steps:
55
60
65
(i) Get the True Name of the file from its path and find the
True File registry entry 140 associated with the True
Name.
(ii) Determine an alternate source for the True File. If the
source IDs field of the TFR entry includes the preferred
alternate source, that is the alternate source. If it does
not, but includes some other source, that is the alternate
source. If it contains no alternate sources, there is no
alternate source.
(iii) For each dependent processor in the True File registry
entry 140, ask that processor to retire the True File,
specifying an alternate source if one was determined,
using the remote mechanism.
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 50 of 57 PageID #: 63
5,978,791
31
32
6. Realize Directory at Location
(A) Note every time a file is created or deleted, for
instance by monitoring audit entries in the Process Audit
This mechanism allows the user or operating system to
force copies of files from some source location to the True
File Entry primitive mechanism. When such an event is
File registry 126 at a given location. The purpose of the
encountered, create an entry 148 in the accounting log 134
mechanism is to ensure that files are accessible in the event 5 that shows the responsible party and the identity of the file
the source location becomes inaccessible. This can happen
created or deleted.
for instance if the source or given location are on mobile
(B) Every time a file is transmitted, for instance when a
computers, or are on removable media, or if the network
file is copied with a Request True File remote mechanism or
connection to the source is expected to become unavailable,
an Acquire True File remote mechanism, create an entry in
10 the accounting log 134 that shows the responsible party, the
or if the source is being retired.
This mechanism is provided in the following steps for
identity of the file, and the source and destination proceseach file in the given directory, with the exception of
sors.
subdirectories:
(C) Occasionally run an accounting program to process
(A) Get the local directory extensions table entry record
the accounting log 134, distributing the events to the account
138 given the pathname of the file. Get the True Name of the
local directory extensions table entry record 138. This 15 records of each responsible party. The account records can
eventually be summarized for billing purposes.
service assimilates the file if it has not already been assimi9. Track for Licensing Purposes
1ated.
This mechanism ensures that licensed files are not used by
(B) Realize the corresponding True File at the given
unauthorized parties. The True Name provides a safe way to
location. This service causes it to be copied to the given
20 identify licensed material. This service allows proof of
location from a remote system or removable media.
possession of specific files according to their contents with7. Verify True File
out disclosing their contents.
This mechanism is used to verify that the data item in a
Enforcing use of valid licenses can be active (for example,
True File registry 126 is indeed the correct data item given
by refusing to provide access to a file without authorization)
its True Name. Its purpose is to guard against device errors,
25 or passive (for example, by creating a report of users who do
malicious changes, or other problems.
not have proper authorization).
If an error is found, the system has the ability to "heal"
One possible way to perform license validation is to
itself by finding another source for the True File with the
perform occasional audits of employee systems. The service
given name. It may also be desirable to verify that the error
described herein relies on True Names to support such an
has not propagated to other systems, and to log the problem
or indicate it to the computer operator. These details are not 30 audit, as in the following steps:
(A) For each licensed product, record in the license table
described here.
136 the True Name of key files in the product (that is, files
To verify a data item that is not in a True File registry 126,
which are required in order to use the product, and which do
use the Calculate True Name primitive mechanism described
not occur in other products) Typically, for a software
above.
The basic mechanism begins with a True Name, and 35 product, this would include the main executable image and
perhaps other major files such as clip-art, scripts, or online
operates in the following steps:
help. Also record the identity of each system which is
(A) Find the True File registry entry record 140 correauthorized to have a copy of the file.
sponding to the given True Name.
(B) Occasionally, compare the contents of each user
(B) If there is a True File ID for the True File registry
entry record 140 then use it. Otherwise, indicate that no file 40 processor against the license table 136. For each True Name
in the license table do the following:
exists to verify.
(i) Unless the user processor is authorized to have a copy
(C) Calculate the True Name of the data item given the file
of the file, confirm that the user processor does not have
ID of the data item.
a copy of the file using the Locate True File mecha(D) Confirm that the calculated True Name is equal to the
msm.
45
given True Name.
(ii) If the user processor is found to have a file that it is
(E) If the True Names are not equal, there is an error in
the True File registry 126. Remove the True File ID from the
not authorized to have, record the user processor and
True File registry entry record 140 and place it somewhere
True Name in a license violation table.
else. Indicate that the True File registry entry record 140
The System in Operation
contained an error.
50
Given the mechanisms described above, the operation of
a typical DP system employing these mechanisms is now
8. Track for Accounting Purposes
described in order to demonstrate how the present invention
This mechanism provides a way to know reliably which
files have been stored on a system or transmitted from one
meets its requirements and capabilities.
system to another. The mechanism can be used as a basis for
In operation, data items (for example, files, database
a value-based accounting system in which charges are based 55 records, messages, data segments, data blocks, directories,
on the identity of the data stored or transmitted, rather than
instances of object classes, and the like) in a DP system
simply on the number of bits.
employing the present invention are identified by substanThis mechanism allows the system to track possession of
tially unique identifiers (True Names), the identifiers
depending on all of the data in the data items and only on the
specific data items according to content by owner, independent of the name, date, or other properties of the data item, 60 data in the data items. The primitive mechanisms Calculate
and tracks the uses of specific data items and files by content
True Name and Assimilate Data Item support this property.
for accounting purposes. True names make it possible to
For any given data item, using the Calculate True Name
identify each file briefly yet uniquely for this purpose.
primitive mechanism, a substantially unique identifier or
Tracking the identities of files requires maintaining an
True Name for that data item can be determined.
accounting log 134 and processing it for accounting or 65
Further, in operation of a DP system incorporating the
billing purposes. The mechanism operates in the following
present invention, multiple copies of data items are avoided
steps:
(unless they are required for some reason such as backups or
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 51 of 57 PageID #: 64
5,978,791
33
34
mirror copies in a fault-tolerant system). Multiple copies of
Location primitive mechanism to make sure that component
data items are avoided even when multiple names refer to
segments are locally available, and then uses the operating
the same data item. The primitive mechanisms Assimilate
system file mechanisms to read data from the local file.
Thus, when a compound file is copied from a remote
Data Items and New True File support this property. Using
the Assimilate Data Item primitive mechanism, if a data item 5 system, only its True Name is copied. When it is opened,
already exists in the system, as indicated by an entry in the
only its indirect block is copied. When the corresponding file
True File registry 126, this existence will be discovered by
is read, the required component segments are realized and
this mechanism, and the duplicate data item (the new data
therefore copied.
item) will be eliminated (or not added). Thus, for example,
In operation data items can be accessed by reference to
if a data file is being copied onto a system from a floppy 10 their identities (True N ames) independent of their present
location. The actual data item or True File corresponding to
disk, if, based on the True Name of the data file, it is
determined that the data file already exists in the system (by
a given data identifier or True Name may reside anywhere in
the same or some other name), then the duplicate copy will
the system (that is, locally, remotely, offline, etc). If a
not be installed. If the data item was being installed on the
required True File is present locally, then the data in the file
system by some name other than its current name, then, 15 can be accessed. If the data item is not present locally, there
using the Link Path to True Name primitive mechanism, the
are a number of ways in which it can be obtained from
other (or new) name can be linked to the already existing
wherever it is present. Using the source IDs field of the True
data item.
File registry table, the location(s) of copies of the True File
In general, the mechanisms of the present invention
corresponding to a given True Name can be determined. The
operate in such a way as to avoid recreating an actual data 20 Realize True File from Location primitive mechanism tries
item at a location when a copy of that data item is already
to make a local copy of a True File, given its True Name and
present at that location. In the case of a copy from a floppy
the name of a source location (processor or media) that may
disk, the data item (file) may have to be copied (into a
contain the True File. If, on the other hand, for some reason
it is not known where there is a copy of the True File, or if
scratch file) before it can be determined that it is a duplicate.
This is because only one processor is involved. On the other 25 the processors identified in the source IDs field do not
respond with the required True File, the processor requiring
hand, in a multiprocessor environment or DP system, each
the data item can make a general request for the data item
processor has a record of the True Names of the data items
on that processor. When a data item is to be copied to
using the Request True File remote mechanism from all
processors in the system that it can contact.
another location (another processor) in the DP system, all
As a result, the system provides transparent access to any
that is necessary is to examine the True Name of the data 30
item prior to the copying. If a data item with the same True
data item by reference to its data identity, and independent
of its present location.
Name already exists at the destination location (processor),
In operation, data items in the system can be verified and
then there is no need to copy the data item. Note that if a data
item which already exists locally at a destination location is
have their integrity checked. This is from the manner in
still copied to the destination location (for example, because 35 which True Names are determined. This can be used for
the remote system did not have a True Name for the data
security purposes, for instance, to check for viruses and to
verify that data retrieved from another location is the desired
item or because it arrives as a stream of un-named data), the
Assimilate Data Item primitive mechanism will prevent
and requested data. For example, the system might store the
multiple copies of the data item from being created.
True Names of all executable applications on the system and
Since the True Name of a large data item (a compound 40 then periodically redetermine the True Names of each of
data item) is derived from and based on the True Names of
these applications to ensure that they match the stored True
components of the data item, copying of an entire data item
Names. Any change in a True Name potentially signals
can be avoided. Since some (or all) of the components of a
corruption in the system and can be further investigated. The
large data item may already be present at a destination
Verify Region background mechanism and the Verify True
location, only those components which are not present there 45 File extended mechanisms provide direct support for this
need be copied. This property derives from the manner in
mode of operation. The Verify Region mechanism is used to
ensure that the data items in the True File registry have not
which True Names are determined.
When a file is copied by the Copy File or Directory
been damaged accidentally or maliciously. The Verify True
operating system mechanism, only the True Name of the file
File mechanism verifies that a data item in a True File
50 registry is indeed the correct data item given its True Name.
is actually replicated.
Once a processor has determined where (that is, at which
When a file is opened (using the Open File operating
other processor or location) a copy of a data item is in the
system mechanism), it uses the Make True File Local
DP system, that processor might need that other processor or
primitive mechanism (either directly or indirectly through
location to keep a copy of that data item. For example, a
the Create Scratch File primitive mechanism) to create a
local copy of the file. The Open File operating system 55 processor might want to delete local copies of data items to
mechanism uses the Make True File Local primitive
make space available locally while knowing that it can rely
mechanism, which uses the Realize True File from Location
on retrieving the data from somewhere else when needed. To
this end the system allows a processor to Reserve (and
primitive mechanism, which, in turn uses the Request True
cancel the reservation of) True Files at remote locations
File remote mechanism.
The Request True File remote mechanism copies only a 60 (using the remote mechanism). In this way the remote
single data item from one processor to another. If the data
locations are put on notice that another location is relying on
item is a compound file, its component segments are not
the presence of the True File at their location.
copied, only the indirect block is copied. The segments are
A DP system employing the present invention can be
copied only when they are read (or otherwise needed).
made into a fault-tolerant system by providing a certain
The Read File operating system mechanism actually reads 65 amount of redundancy of data items at multiple locations in
data. The Read File mechanism is aware of compound files
the system. Using the Acquire True File and Reserve True
and indirect blocks, and it uses the Realize True File from
File remote mechanisms, a particular processor can imp le-
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 52 of 57 PageID #: 65
5,978,791
35
36
ment its own form of fault-tolerance by copying data items
A publishing server, on the other hand, may want to
to other processors and then reserving them there. However,
provide access to many clients, and possibly anonymous
the system also provides the Mirror True File background
ones, without incurring the overhead of tracking dependenmechanism to mirror (make copies) of the True File availcies for each client. Therefore, a public server can provide
able elsewhere in the system. Any degree of redundancy 5 expiration dates for True Files in its registry. This allows
(limited by the number of processors or locations in the
client systems to safely maintain references to a True File on
system) can be implemented. As a result, this invention
the public server. The Check For Expired Links background
maintains a desired degree or level of redundancy in a
mechanism allows the client of a publishing server to
network of processors, to protect against failure of any
occasionally confirm that its dependencies on the publishing
particular processor by ensuring that multiple copies of data 10
server are safe.
items exist at different locations.
In a variation of this aspect of the invention, a processor
The data structures used to implement various features
that is newly connected (or reconnected after some absence)
and mechanisms of this invention store a variety of useful
to the system can obtain a current version of all (or of
information which can be used, in conjunction with the
needed) data in the system by requesting it from a server
various mechanisms, to implement storage schemes and
policies in a DP system employing the invention. For 15 processor. Any such processor can send a request to update
or resynchronize all of its directories (starting at a root
example, the size, age and location of a data item (or of
directory), simply by using the Synchronize Directories
groups of data items) is provided. This information can be
extended mechanism on the needed directories.
used to decide how the data items should be treated. For
Using the accounting log or some other user provided
example, a processor may implement a policy of deleting
local copies of all data items over a certain age if other 20 mechanism, a user can prove the existence of certain data
items at certain times. By publishing (in a public place) a list
copies of those data items are present elsewhere in the
system. The age (or variations on the age) can be determined
of all True Names in the system on a given day (or at some
using the time of last access or modification in the local
given time), a user can later refer back to that list to show
directory extensions table, and the presence of other copies
that a particular data item was present in the system at the
of the data item can be determined either from the Safe Flag 25 time that list was published. Such a mechanism is useful in
tracking, for example, laboratory notebooks or the like to
or the source IDs, or by checking which other processors in
prove dates of conception of inventions. Such a mechanism
the system have copies of the data item and then reserving
at least one of those copies.
also permits proof of possession of a data item at a particular
In operation, the system can keep track of data items
date and time.
regardless of how those items are named by users (or 30
The accounting log file can also track the use of specific
regardless of whether the data items even have names). The
data items and files by content for accounting purposes. For
system can also track data items that have different names
instance, an information utility company can determine the
(in different or the same location) as well as different data
data identities of data items that are stored and transmitted
through its computer systems, and use these identities to
items that have the same name. Since a data item is identified
by the data in the item, without regard for the context of the 35 provide bills to its customers based on the identities of the
data, the problems of inconsistent naming in a DP system are
data items being transmitted (as defined by the substantially
overcome.
unique identifier). The assignment of prices for storing and
In operation, the system can publish data items, allowing
transmitting specific True Files would be made by the
other, possibly anonymous, systems in a network to gain
information utility and/or its data suppliers; this information
access to the data items and to rely on the availability of 40 would be joined periodically with the information in the
accounting log file to produce customer statements.
these data items. True Names are globally unique identifiers
Backing up data items in a DP system employing the
which can be published simply by copying them. For
example, a user might create a textual representation of a file
present invention can be done based on the True Names of
on system A with True Name N (for instance as a hexadecithe data items. By tracking backups using True Names,
mal string), and post it on a computer bulletin board. 45 duplication in the backups is prevented. In operation, the
Another user on system B could create a directory entry F
system maintains a backup record of data identifiers of data
for this True Name N by using the Link Path to True Name
items already backed up, and invokes the Copy File or
primitive mechanism. (Alternatively, an application could
Directory operating system mechanism to copy only those
be developed which hides the True Name from the users, but
data items whose data identifiers are not recorded in the
provides the same public transfer service.)
50 backup record. Once a data item has been backed up, it can
be restored by retrieving it from its backup location, based
When a program on system B attempts to open pathname
on the identifier of the data item. Using the backup record
F linked to True Name N, the Locate Remote File primitive
produced by the backup to identify the data item, the data
mechanism would be used, and would use the Locate True
item can be obtained using, for example, the Make True File
File remote mechanism to search for True Name N on one
or more remote processors, such as system A. If system B 55 Local primitive mechanism.
has access to system A, it would be able to realize the True
In operation, the system can be used to cache data items
File (using the Realize True File from Location primitive
from a server, so that only the most recently accessed data
mechanism) and use it locally. Alternatively, system B could
items need be retained. To operate in this way, a cache client
find True Name N by accessing any publicly available True
is configured to have a local registry (its cache) with a
Name server, if the server could eventually forward the 60 remote Local Directory Extensions table (from the cache
request to system A.
server). Whenever a file is opened (or read), the Local
Clients of a local server can indicate that they depend on
Directory Extensions table is used to identify the True
Name, and the Make True File Local primitive mechanism
a given True File (using the Reserve True File remote
inspects the local registry. When the local registry already
mechanism) so that the True File is not deleted from the
server registry as long as some client requires access to it. 65 has a copy, the file is already cached. Otherwise, the Locate
(The Retire True File remote mechanism is used to indicate
True File remote mechanism is used to get a copy of the file.
that a client no longer needs a given True File.)
This mechanism consults the cache server and uses the
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 53 of 57 PageID #: 66
5,978,791
37
38
Whenever a pathname is traversed, the Get Files in
Request True File remote mechanism to make a local copy,
Directory operating system mechanism is used, and when it
effectively loading the cache.
encounters a frozen directory, it uses the Expand Frozen
The Groom Cache background mechanism flushes the
Directory primitive mechanism.
cache, removing the least-recently-used files from the cache
A frozen directory can be copied from one pathname to
client's True File registry. While a file is being modified on 5
another efficiently, merely by copying its True Name. The
a cache client, the Lock Cache and Update Cache remote
Copy File operating system mechanism is used to copy a
mechanisms prevent other clients from trying to modify the
frozen directory.
same file.
Thus it is possible to efficiently create copies of different
In operation, when the system is being used to cache data
versions of a directory, thereby creating a record of its
items, the problems of maintaining cache consistency are 10
history (hence a version control system).
avoided.
In operation, the system can maintain a local inventory of
To access a cache and to fill it from its server, a key is
all the data items located on a given removable medium,
required to identify the data item desired. Ordinarily, the key
such as a diskette or CD-ROM. The inventory is indepenis a name or address (in this case, it would be the pathname
dent of other properties of the data items such as their name,
of a file). If the data associated with such a key is changed, 15 location, and date of creation.
the client's cache becomes inconsistent; when the cache
The Inventory Existing Directory extended mechanism
client refers to that name, it will retrieve the wrong data. In
provides a way to create True File Registry entries for all of
order to maintain cache consistency it is necessary to notify
the files in a directory. One use of this inventory is as a way
every client immediately whenever a change occurs on the
to pre-load a True File registry with backup record infor20 mation. Those files in the registry (such as previously
server.
installed software) which are on the volumes inventoried
By using an embodiment of the present invention, the
need not be backed up onto other volumes.
cache key uniquely identifies the data it represents. When
The Inventory Removable, Read-only Files extended
the data associated with a name changes, the key itself
mechanism not only determines the True Names for the files
changes. Thus, when a cache client wishes to access the
modified data associated with a given file name, it will use 25 on the medium, but also records directory entries for each
a new key (the True Name of the new file) rather than the key
file in a frozen directory structure. By copying and modifying this directory, it is possible to create an on line patch,
to the old file contents in its cache. The client will always
or small modification of an existing read-only file. For
request the correct data, and the old data in its cache will be
example, it is possible to create an online representation of
eventually aged and flushed by the Groom Cache back30 a modified CD-ROM, such that the unmodified files are
ground mechanism.
actually on the CD-ROM, and only the modified files are
Because it is not necessary to immediately notify clients
when changes on the cache server occur, the present invenonline.
tion makes it possible for a single server to support a much
In operation, the system tracks possession of specific data
larger number of clients than is otherwise possible.
items according to content by owner, independent of the
In operation, the system automatically archives data items 35 name, date, or other properties of the data item, and tracks
the uses of specific data items and files by content for
as they are created or modified. After a file is created or
accounting purposes. Using the Track for Accounting Purmodified, the Close File operating system mechanism creposes extended mechanism provides a way to know reliably
ates an audit file record, which is eventually processed by
the Process Audit File Entry primitive mechanism. This
which files have been stored on a system or transmitted from
mechanism uses the New True File primitive mechanism for 40 one system to another.
any file which is newly created, which in turn uses the
True Names in Relational and Object-Oriented Databases
Mirror True File background mechanism if the True File is
Although the preferred embodiment of this invention has
in a mirrored or archived region. This mechanism causes one
been presented in the context of a file system, the invention
of True Names would be equally valuable in a relational or
or more copies of the new file to be made on remote
45 object-oriented database. A relational or object-oriented
processors.
database system using True Names would have similar
In operation, the system can efficiently record and prebenefits to those of the file system employing the invention.
serve any collection of data items. The Freeze Directory
primitive mechanism creates a True File which identifies all
For instance, such a database would permit efficient elimiof the files in the directory and its subordinates. Because this
nation of duplicate records, support a cache for records,
True File includes the True Names of its constituents, it 50 simplify the process of maintaining cache consistency, prorepresents the exact contents of the directory tree at the time
vide location-independent access to records, maintain
it was frozen. The frozen directory can be copied with its
archives and histories of records, and synchronize with
distant or disconnected systems or databases.
components preserved.
The mechanisms described above can be easily modified
The Acquire True File remote mechanism (used in mirroring and archiving) preserves the directory tree structure 55 to serve in such a database environment. The True Name
registry would be used as a repository of database records.
by ensuring that all of the component segments and True
Files in a compound data item are actually copied to a
All references to records would be via the True Name of the
record. (The Local Directory Extensions table is an example
remote system. Of course, no transfer is necessary for data
items already in the registry of the remote system.
of a primary index that uses the True Name as the unique
In operation, the system can efficiently make a copy of 60 identifier of the desired records.)
any collection of data items, to support a version control
In such a database, the operations of inserting, updating,
mechanism for groups of the data items.
and deleting records would be implemented by first assimiThe Freeze Directory primitive mechanism is used to
lating records into the registry, and then updating a primary
create a collection of data items. The constituent files and
key index to map the key of the record to its contents by
segments referred to by the frozen directory are maintained 65 using the True Name as a pointer to the contents.
in the registry, without any need to make copies of the
The mechanisms described in the preferred embodiment,
constituents each time the directory is frozen.
or similar mechanisms, would be employed in such a
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 54 of 57 PageID #: 67
5,978,791
39
40
system. These mechanisms could include, for example, the
record of identifiers of data items backed up, and
mechanisms for calculating true names, assimilating,
invoking duplication means to copy only those data
locating, realizing, deleting, copying, and moving True
items whose data identifiers are not recorded in the
Files, for mirroring True Files, for maintaining a cache of
backup record.
True Files, for grooming True Files, and other mechanisms 5
9. An apparatus as in claim 8, further comprising:
based on the use of substantially unique identifiers.
recovery means for retrieving a data item previously
While the invention has been described in connection
backed up by said backup means, based on the identiwith what is presently considered to be the most practical
fier of the data item, said recovery means using the
and preferred embodiments, it is to be understood that the
backup record to identify the data item, and invoking
invention is not to be limited to the disclosed embodiment, 10
access means to retrieve the data item.
but on the contrary, is intended to cover various modifica10. An apparatus as in claim 2, wherein a location is a
tions and equivalent arrangements included within the spirit
computer among a network of computers, the apparatus
and scope of the appended claims.
further comprising:
What is claimed is:
1. In a data processing system, an apparatus comprising:
remote existence means for determining whether a data
identity means for determining, for any of a plurality of 15
item is present at a remote location in the system from
data items present in the system, a substantially unique
a current location in the system, based on the identifier
identifier, the identifier being determined using and
of the data item, said remote location using local
depending on all of the data in the data item and only
existence means at the remote location to determine
the data in the data item, whereby two identical data
whether the data item is present at the remote location,
items in the system will have the same identifier; and 20
and providing the current location with an indication of
existence means for determining whether a particular data
the presence of the data item at the remote location.
item is present in the system, by examining the iden11. An apparatus as in claim 4, wherein a location is a
tifiers of the plurality of data items.
computer among a network of computers, the apparatus
2. An apparatus as in claim 1, further comprising:
further comprising:
local existence means for determining whether an 25
requesting means for requesting a data item at a current
instance of a particular data item is present at a parlocation in the system from a remote location in the
ticular location in the system, based on the identifier of
system, based on the identifier of the data item, said
the data item.
remote location using access means at the remote
3. An apparatus as in claim 2, wherein each location
location to obtain the data item and to send it to the
contains a distinct plurality of data items, and wherein said 30
current location if it is present.
local existence means determines whether a particular data
12. An apparatus as in claim 1, further comprising:
item is present at a particular location in the system by
context means for making and maintaining a context
examining the identifiers of the plurality of data items at said
association between at least one contextual name of a
particular location in the system.
data item in the system and the identifier of the data
35
4. An apparatus as in claim 2, further comprising:
item; and
data associating means for making and maintaining, for a
referencing means for obtaining the identifier of a data
data item in the system, an association between the data
item in the system given a contextual name for the data
item and the identifier of the data item; and
item, using said context association.
access means for accessing a particular data item using 40
13. An apparatus as in claim 12, further comprising:
the identifier of the data item.
assignment means for assigning a data item to a contex5. An apparatus as in claim 2, further comprising:
tual name, invoking said identity means to determine
duplication means for copying a data item from a source
the identifier of the data item, and invoking said context
to a destination in the data processing system, by
means to make or modify the context association
providing said destination with the data item only if it 45
between the contextual name of the data item and the
is determined using the data identifier that the data item
identifier of the data item.
is not present at the destination.
14. An apparatus as in claim 12, further comprising:
6. An apparatus as in claim 4, further comprising:
data associating means for making and maintaining, for a
assimilation means for assimilating a new data item into
data item in the system, an association between the data
the system, said assimilation means invoking said 50
item and the identifier of the data item;
identity means to determine the identifier of the new
access means for accessing a particular data item using
data item and invoking said data associating means to
the identifier of the particular data item; and
associate the new data item with its identifier.
7. An apparatus as in claim 4, further comprising:
contextual name access means for accessing a data item in
the system for a given context name of the data item,
duplication means for duplicating a data item from a 55
determining the data identifier associated with the
source location to a destination location in the data
given context name, and invoking said access means to
processing system, based on the identifier of the data
access the data item using the data identifier.
item, said duplication means invoking said local exist15. An apparatus as in claim 11, further comprising:
ence means to determine whether an instance of the
data item is present at the destination location, and 60
transparent access means for accessing a data item from
invoking said access means to provide said destination
one of several locations, using the identifier of the data
with the data item only if said local existence means
item, said transparent access means invoking said local
determines that no instance of the data item is present
existence means to determine if the particular data item
at the destination.
is present at the current location, and, in the case when
8. An apparatus as in claim 7, further comprising:
65
the particular data item is not present at the current
backup means for making copies of data items in the
location, invoking said requesting means to obtain the
system, said backup means maintaining a backup
data item from a remote location.
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 55 of 57 PageID #: 68
5,978,791
41
42
16. An apparatus as in claim 15, further comprising:
requesting with a particular data identifier, to confirm
identifier copy means for copying an identifier of a data
that the data item obtained from the requesting means
item from a source location to a destination location.
is the same data item as the data item requested, the
17. An apparatus as in claim 15, further comprising:
verifying means invoking the identity means to detercontext means for making and maintaining a context 5
mine the data identifier of the obtained data item, and
association between a contextual name of a data item in
comparing the determined data identifier with the parthe system and the identifier of the data item;
ticular data identifier to verify the obtained data item.
context copy means for copying a data item from a source
24. An apparatus as in claim 2, wherein a location is at
location to a destination location, given the contextual
least one of a storage location and a processing location, and
name of the data item, by copying only the context 10
wherein a storage location is at least one of a data storage
association between the contextual identifier and the
device and a data storage volume, and wherein a processing
data identifier from the source location to the destinalocation is at least one of a data processor and a computer.
tion location; and
25. An apparatus as in claim 3, wherein at least some of
transparent referencing means for obtaining a data item
from one of several locations the system given a 15 said data items are compound data items, each compound
contextual name for the data item, said transparent
data item including at least some component data items in a
referencing means invoking said context association to
fixed sequence, and wherein the identity means determines
determine the data identifier of a data item given a
the identifier of a compound data item based on the identifier
contextual name, and invoking said transparent access
of each component data item of the compound data item.
means to access the data item from one of several 20
26. An apparatus as in claim 3, further comprising:
locations given the identifier of the data item.
context associating means for making and maintaining a
18. An apparatus as in claim 1, wherein at least some of
context association, for any data item in the system,
said data items are compound data items, each compound
between the identifier of the data item and at least one
data item including at least some component data items in a
contextual name of the data item at a particular location
fixed sequence, and wherein the identity means determines 25
in the system;
the identifier of a compound data item based on each
means for obtaining the identifier of a data item in the
component data item of the compound data item.
system given a contextual name for the data item at a
19. An apparatus as in claim 18, wherein said compound
particular location in the system; and
data items are files and said component data items are
logical copy means for associating the data identifier
segments, and wherein the identity means determines the 30
corresponding to a contextual name at a source location
identifier of a file based on the identifier of each data
with a contextual name at a destination location in the
segment of the file.
data processing system.
20. An apparatus as in claim 18, wherein said compound
27. An apparatus as in claim 25, wherein said compound
data items are directories and said component data items are
files or subordinate directories, and wherein the identity 35 data items are files and said component data items are
segments, and wherein the identity means determines the
means determines the identifier of a given directory based on
identifier of a file based on the identifier of each data
each file and subordinate directory within the given direcsegment of the file.
tory.
28. An apparatus as in claim 25, further comprising:
21. An apparatus as in claim 11, further comprising:
compound copy means for copying a data item from a
means for advertising a data item from a location in the 40
source location to a destination location in the data
system to at least one other location in the system, said
processing system, said compound copy means invokmeans for advertising providing each of said at least
ing said local existence means to determine whether the
one other location with the data identifier of the data
data item is present at the destination, and to determine,
item, and providing the data item to only those locawhen the data item is a compound data item, whether
tions of said other locations that request said data item 45
the component data items of the compound data item
in response to said providing.
are present at the destination, and providing said des22. An apparatus as in claim 18, further comprising:
tination with the data item only if said local existence
local existence means for determining whether a particumeans determines that the data item is not present at the
lar data item is present at a particular location in the
destination, and providing said destination with each
system, based on the identifier of the data item; and 50
component data item only if said local existence means
compound copy means for copying a data item from a
determines that the component data item is not present
source to a destination in the data processing system,
at the destination.
said compound copy means invoking said local exist29. An apparatus as in any of claims 1-28, wherein a data
ence means to determine whether the data item is
present at the destination, and to determine, when the 55 item is at least one of a file, a database record, a message,
a data segment, a data block, a directory, and an instance an
data item is a compound data item, whether the comobject class.
ponent data items of the compound data item are
30. A method of identifying a data item present in a data
present at the destination, and providing said destinaprocessing system for subsequent access to the data item, the
tion with the data item only if said local existence
means determines that the data item is not present at the 60 method comprising:
determining a substantially unique identifier for the data
destination, and providing said destination with each
item, the identifier depending on and being determined
component data item only if said local existence means
using all of the data in the data item and only the data
determines that the component data item is not present
in the data item, whereby two identical data items in the
at the destination.
system will have the same identifier; and
65
23. An apparatus as in claim 11, further comprising:
means for verifying the integrity of a data item obtained
accessing a data item in the system using the identifier of
the data item.
from the requesting means in response to providing the
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 56 of 57 PageID #: 69
5,978,791
43
44
31. A method as in claim 30, further comprising:
(A) maintammg a backup record of identifiers of data
items backed up at the previous backup time; and
making and maintaining, for a plurality of data items
present in the system, an association between each of
(B) for each of the plurality of data items present in the
the data items and the identifier of each of the data
data processing system,
items, wherein said accessing a data item accesses a 5
(i) determining a substantially unique identifier for the
data item via the association.
data item, the identifier depending on and being
32. A method as in claim 31, further comprising:
determined using all of the data in the data item and
only the data in the data item, whereby two identical
assimilating a new data item into the system, by deterdata items in the system will have the same identimining the identifier of the new data item and associ10
fier;
ating the new data item with its identifier.
(ii) determining those data items of the plurality of data
33. A method for duplicating a given data item present at
items whose identifiers are not in the backup record;
a source location to a destination location in a data processand
ing system, the method comprising:
(iii) based on the determining, copying only those data
determining a substantially unique identifier for the given
items whose data identities are not recorded in the
data item, the identifier depending on and being deter- 15
backup record.
mined using all of the data in the data item and only the
37. A method as in claim 36, further comprising:
data in the data item, whereby two identical data items
recording in the backup record the identifiers of those data
in the system will have the same identifier;
items copied in said copying.
determining, using the data identifier, whether the data 20
38. A method of locating a particular data item at a
item is present at the destination location; and
location in a data processing system, the method comprisbased on the determining whether the data item is present,
ing:
providing the destination location with the data item
(A) determining a substantially unique identifier for the
only if the data item is not present at the destination.
data item, the identifier depending on and being deter34. A method as in claim 33, wherein the given data item 25
mined using all of the data in the data item and only the
is a compound data item having a plurality of component
data in the data item, whereby two identical data items
data items, the method further comprising:
in the system will have the same identifier;
for each data item of the component data items,
(B) requesting the particular data item by sending the data
obtaining the component data identifier of the data item
identifier of the data item from the requester location to
by determining a substantially unique identifier for 30
at least one location of a plurality of provider locations
the data item, the identifier depending on and being
in the system; and
determined using all of the data in the data item and
(C) on at least some of the provider locations,
only the data in the data item, whereby two identical
(a) for each data item of a plurality of data items at the
data items in the system will have the same identiprovider locations,
fier;
35
(i) determining a substantially unique identifier for the
determining, using the obtained component data
data item, the identifier depending on and being
identifier, whether the data item is present at the
determined using all of the data in the data item and
destination; and
only on the data in the data item, whereby two
based on the determining, providing the destination
identical data items in the system will have the same
with the data item only if the data item is not present 40
identifier; and
at the destination.
(ii) making and maintaining a set of identifiers of data
35. A method for determining whether a particular data
items,
item is present in a data processing system, the method
(b) determining, based on the set of identifiers, whether
comprising:
the data item corresponding to the requested data
(A) for each data item of a plurality of data items present 45
identifier is present at the provider location; and
in the system,
(c) based on the determining, when the provider loca(i) determining a substantially unique identifier for the
tion determines that the particular data item is
data item, the identifier depending on and being
present at the provider location, notifying the
determined using all of the data in the data item and
requestor that the provider has a copy of the given
only the data in the data item, whereby two identical 50
data item.
data items in the system will have the same identi39. The method of claim 38, further comprising:
fier; and
(a) for each data item of a plurality of data items present
(ii) making and maintaining a set of identifiers of the
at said provider locations,
plurality of data items; and
making and maintaining an association between the
(B) for the particular data item,
55
data item and the identifier of the data item,
(i) determining a particular substantially unique iden(b) in response to said notifying, said client location
tifier for the data item, the identifier depending on
copying said data item from one of said responding
and being determined using all of the data in the data
remote locations, using said association to access the
item and only the data in the data item, whereby two
data item given the data identifier.
identical data items in the system will have the same 60
40. A method of locating a particular data item among a
identifier; and
plurality of locations, each of the locations having a plurality
(ii) determining whether the particular identifier is in
of data items, the method comprising:
the set of data items.
36. A method of backing up, of a plurality of data items
determining, for the particular data item and for each data
present in a data processing system, data items modified 65
item of the plurality of data items, a substantially
since a previous backup time in the data processing system,
unique identifier for the data item, the identifier
the method comprising:
depending on and being determined using all of the
Case 6:11-cv-00656 Document 1-2
Filed 12/08/11 Page 57 of 57 PageID #: 70
5,978,791
45
46
data in the data item and only the data in the data item,
whereby two identical data items in the system will
have the same identifier; and
determining the presence of the particular data item in
each of the plurality of locations by determining
whether the identifier of the particular data item is
present at each of the locations.
41. The method of claim 30, wherein said accessing
further comprises: for a given data identifier and for a given
current location and a remote location in the system:
determining whether the data item corresponding to the
given data identifier is present at the current location,
and
based on said determining, if said data item is not present
at the current location, fetching the data item from a
remote location in the system to the current location.
42. The method of claim 41, further comprising:
for each contextual name at a location,
making and maintaining a context association between
the context name of a data item and the identifier of
said data item, and when some context association
changes at said current location, and
notifying said remote location of a modification to the
context association.
43. The method of claim 42, further comprising:
at said remote location, updating the association between
the contextual identifier of the data item and the identifier of the data item.
44. The method of claim 43, further comprising:
from said remote location, notifying all other locations
that said data item has been modified, by providing the
contextual identifier and data identifier of said data item
to said other locations.
45. The method of claim 44, further comprising, at each
location notified that the data item has been modified:
modifying an association between the contextual identifier of the data item and the data identifier of the data
item, to record that the data item has been modified.
46. A method of maintaining at least a predetermined
number of copies of a given data item in a data processing
system, at different locations in the data processing system,
the data processing system being one wherein data is identified by a substantially unique identifier, the identifier
depending on and being determined using all of the data in
the data item and only the data in the data item, whereby two
identical data items in the system will have the same
identifier, and wherein any data item in the system may be
accessed using only the identifier of the data item, the
method comprising:
(i) sending, from a first location in the system, the data
identifier of the given data item to other locations in the
system; and
(ii) in response to the sending, at each of the other
locations,
(A) determining whether the data item corresponding to
the data identifier is present at the other location, and
based on the determining, and
(B) informing the first location whether the data item is
present at the other location; and
(iii) in response to the informing from the other locations,
at the first location,
(A) determining whether the data item is present in at least
the predetermined number of other locations, and based
on the determining,
(B) when less than the predetermined number of other
locations have a copy of the data item, requesting some
locations that do not have a copy of the data item make
a copy of the data item.
47. A method as in claim 46, wherein said step (iii) further
comprises:
(C) when more than the predetermined number of other
locations have a copy of the data item present, requesting some locations that do have a copy of the data item
present delete the copy of the data item.
48. A method as in any of claims 30--45, 46 and 47,
wherein said data items are at least one of a file, a database
record, a message, a data segment, a data block, a directory,
and an instance of an object class.
5
10
15
20
25
30
35
40
* * * * *
Disclaimer: Justia Dockets & Filings provides public litigation records from the federal appellate and district courts. These filings and docket sheets should not be considered findings of fact or liability, nor do they necessarily reflect the view of Justia.
Why Is My Information Online?