Institute For Information Industry v. Google Inc.
Filing
1
COMPLAINT For Patent Infringement against Google Inc. ( Filing fee $ 400 receipt number 0540-4133094.), filed by Institute For Information Industry. (Attachments: # 1 Civil Cover Sheet, # 2 Exhibit A)(Huff, Winston)
EXHIBIT
A
1
1111111114)1111 11
1111111111111111
11111111 ! 1 11 11
1
1
1
( 2 United States Patent
1)
(to) Patent No.: U S 6,845,354 B 1
(45) Date of Patent: J a n . 18, 2005
Kuo et al.
(54)
I N F O R M AT I O N R E T R I E VA L SYSTEM W I T H
A N E M O - F U Z Z Y STRUCTURE
(75)
Inventors: You-Hwang Kuo, Tainan (11V);
Jang-Pong Hsu, Tainan (TIV)
(73)
Assignee: Institute for Information Industry,
Taipei (TIV)
Notice:
6,101,491 A * 8 / 2 0 0 0 Wo o d s
(57)
Int. C I .
7
U.S. Cl.
7
0
4
/
9
;
707/6
G
Field o f Search
7 0 4 / 1 , 9, 10; 707/2-3,
707/4, 5-6, 104, 102; 715/531, 532-534
O
6
References Cited
(56)
F
U.S. PATENT DOCUMENTS
1
7
5,642,502 A * 6 / 1 9 9 7 D r i s c o l l
7 0 7 / 5
5,724,571 A * 3 / 1 9 9 8 Wo o d s
7 0 7 / 5
/
5,933,822 A * 8/1999 Braden-Harder et al. 7 0 7 / 5
2
5,956,711 A * 9 / 1 9 9 9 Sullivan et al.
7 0 7 / 6
7
(51)
(52)
(58)
(-101
ENCODING
SECTION
QUERY
WORDS
(11)
B
S
T
R
A
C
(
SIMILARITY
CALCULATION
SECTION 1
0
7
3
T
22 Claims, 7 Drawing Sheets
0
3
DETERMINING
SECTION
1
L.
A
(
CUSTOMIZABLE SYNONYM
ADJUSTING BLOCK
9
(
/
An intelligent information retrieval system for finding information components corresponding to an input query word.
The system includes a synonym block for finding synonymous indexed keywords of the query word; an information
indexing block for finding corresponding information components based o n t h e indexed keyword; a component
ranking-filtering block for ranking and filtering the found
information components and outputting the desired information components being selected; a synonym adjusting
block for adjusting the fuzzy mechanism o f the synonym
block based on the found information components; and a
filtering adjusting block for adjusting the fuzzy mechanism
of the ranking-filtering block based on the found information
components. T h e aforementioned synonym b l o c k and
information-indexing block are implemented b y neurofuzzy networks f o r accelerating parallel processing and
automatic learning. Further, the synonym block can tolerate
input errors by way o f query word encoding and position
shift compensation.
Filed: S e p . 9, 1999
RETRIEVED
COMPONENT
KEYWORDS (19)
7
* cited by examiner
Appl. No.: 09/392,572
(22)
0
Primary Examiner—David D. Knepper
(74) Attorney, Agent, o r Firm—Birch, Stewart, Kolasch &
Birch, LLP
Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
154(b) by 0 days.
(21)
7
(
CORRELATION I N D E X E D
SECTION K E Y W O R D
(13)
1
0
5
U.S. P a t e n t
Sheet 1 of 7
J a n . 18, 2005
U
S
6,845,354 B1
I
—
ril
M
•c:
N
0
If)
L
glr'"
6
'61
7_
( - - -
c
L
'
U.S. P a t e n t
J a n . 18, 2005
.
CO
T
Sheet 2 of 7
C
O
C
U
S
O
.
t
C
D
C
O
T
1
-
a)
c
o
=
I
2 / ›< ><
o i C) co c )
cD > >
0
c
D
c
o
>< > < _ J
(--) CO C..)
CD > >
h;
CL
>N
F-
E
a)
-1-,
c
o
>,
(/)
14'
E
Ja
m
cn
6,845,354 B1
U.S. P a t e n t
Sheet 3 of 7
J a n . 18, 2005
U
S
6,845,354 B1
1) c4
4
1
x c)
n
'
OIMEMEMINIMILEMMIMM
,
4 1 t•••,.,._,
r
z
i
1—'
r
A
•
•
•
•
•
•
•
•
•
•
,
L
i
r
Z7.,
a
L
1-CD
L
0
zz
10
6
415 0 E'
0 0
7
41 4 1
: L . Z C/)
A A
L.
i
U . S . P a t e n t J a n . 18, 2005
CD
CD
C:)
C:)
CD
CD
CD
CD
Cb
CD
1
CD
CD
C:)
CD
CD
C)
C:)
CD
u:)
1
T
-
co
1
CD
n---
CD
CD
co
I
CNI
CNI
CD
Sheet 4 of 7
ci
N
1
CD
C:)
C:)
CD
C)
C)
C:)
C:)
C)
o
C:)
CD
CD
C)
CD
.
3
,
-
C:)
CD
1
0
2
2
(NJ
1
1
.%•
r
s
•
1
to
1
>.%
re)
,
I
N
N
Co
olf•
—
••
,
I
l
1
.1
•
•
•
Pr)
•
T
1
1
N•••••
CD
1
—
'.1•
•I•••••
CD
Co
IC)
re)
I
CD
C:)
6,845,354 B1
ol•••
mr--
CD
co
1
CNI
1
.
•
CD
CD
I
1•••
CD
CD
0)
S
—
•
1
(NI
C:)
S
CZ)
o l •
'1'
.
•
U
1
4
—
•I
.
•
U.S. P a t e n t
Sheet 5 of 7
J a n . 18, 2005
U
,
S
L,
2
-z H
,, i
6,845,354 B1
,
1.0
6
L2
7
a
L
.—
LL,
L
L.
L
_
I
ri
._.
4
t
•
__
,
U.S. P a t e n t
J a n . 18, 2005
re)
CD
re)
L
Sheet 6 o f 7
U
s
6,845,354 B 1
U.S. Patent
Jan. 18, 2005
Sheet 7 of 7
US 6
9
845
9
354
B
1
-,
..,
1
US 6,845,354 B1
I N F O R M AT I O N R E T R I E VA L SYSTEM W I T H
A NEURO-FUZZY STRUCTURE
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an information retrieval
system. M o r e particularly, i t relates t o a n information
retrieval system with a neuro-fuzzy approach for building a
customizable thesaurus, tolerating an operator's input errors,
performing fuzzy queries, and providing a quick select
option by prioritizing the information retrieved by the system according to the operator's preference.
2. Description of the Related Art
The rapid growth in the Internet and information industry
has diversified the w a y information i s retrieved. I t has
become crucial for an operator to effectively find the information o f his or her interest. For example, a software
programmer who develops software programs with objectoriented technology may hope to quickly assemble a prototype software application directly from the appropriate
retrieved and developed software components and test i t
after acquiring the formal specification upon receipt o f
demands f o r large scale software development. Such
demands also hold true in public library query system, World
Wide Web search tools and other information services (such
as the management and query of massive news archives and
legal documents) where operators hope to get the desired
information that meets their professional needs.
However, the exponential growth in information loads or
software components included i n software component
libraries has led t o uncertainty i n the entire process o f
retrieving information and software components. Efficiency
is normally contingent upon an operator's experience and
expertise, that is, whether the query words the operator
inputs precisely describe the features of the desired information. Consequently, information retrieval systems have
become a critical issue i n information management and
information recycling technology.
Conventional information retrieval systems are constructed based on Boolean logic. That is, the query words
entered by an operator are matched with specific keywords
in the information o r software components to determine
whether they meet operator's needs. Such conventional
information retrieval systems offer very limited flexibility
and have the following drawbacks:
2
should be accurate, i.e., conventional systems tolerate
no erroneous input. Perfect accuracy cannot be guaranteed for routine input, and synonymous words with a
different part of speech may be entered. For example,
5 a n operator may skip a character in typing a query
word, or the operator may enter a noun form o f the
query word instead of the verb form as found in the
thesaurus. Similar situations will prohibit the operator
from finding a corresponding keyword.
10 T o tackle this problem, prior art has created a replace table
to make up the deficiency. The replace table lists occurrences of possible errors, declensions and inflections of the
query words in the thesaurus to revise potential input errors
the operator may make. However, this kind of replace table
15 i s quite rigid and basically unable to compensate f o r all
possible errors.
3. Information retrieval systems o f the prior art fail to
process indefinite information, that is, fail to provide
fuzzy query processing. In other words, the operator
20 h a s to precisely specify various descriptors or features
of the query f o r information archives o r software
components to obtain the desired information. That is
to say, the operator has to have certain knowledge o f
the information document or software components to
25 b e queried; otherwise, the queried result from erroneous information will not fulfill expectations.
4. Lastly, a significant number of documents or software
components can normally be retrieved for a specific
keyword in conventional information retrieval systems.
30 G e n e r a l l y , advanced information retrieval systems may
find the correspondence between documents or software components and query words and output the
ranking. Since the ranking-filtering functions are rigid,
every operator has to follow the same ranking-filtering
35 p r o c e d u r e s . Consequently, no flexibility is provided in
using the systems.
Current information retrieval systems, as specified above,
can perform basic query and retrieval tasks; however, they
are not conveniently accessible and have difficulty finding
40 appropriate files or software components. And this problem
is what this invention intends to resolve.
SUMMARY OF THE INVENTION
Therefore, the primary object o f this invention i s t o
provide an intelligent information retrieval system, where
the thesaurus can be customizable according to the operator's preference and be queried according to the keyword the
operator enters after performing the customizable conversion of synonyms.
50
Another object of this invention is to provide an intelligent information retrieval system that tolerates erroneous
input. That is, the system can still proceed with a query even
the operator has to some extent made mistakes.
45
1. I n a typical information retrieval system, the system
normally expects the query words an operator enters
upon accessing query libraries to fully match the keywords provided by the system. In other words, those
keywords are formulated to describe the features of an
information archive or software component when the
system i s built. However, i n practice, i t i s nearly 55 S t i l l another object o f this invention is to provide an
intelligent information retrieval system, which performs and
impossible to demand that operators precisely use the
processes indefinite data, that is, performs fuzzy query, to
keywords stored in the system.
enhance the flexibility in usage.
To tackle this problem, a thesaurus is created for keywords in the systems of the prior art. That is, each query
And another object o f this invention is to provide an
word in the system is defined by a group of synonyms in the 60 intelligent information retrieval system capable of performthesaurus. It can therefore be inferred that an input word is
ing ranking of the retrieved software components according
a query w o r d when one o f the synonyms i s entered.
to operator's preference and performing filtering according
However, a thesaurus as such generally adopts a rigid
to operator's settings, thereby enabling easy access and
formula to list the correspondence between query words and
enhancing the system's function.
synonyms.
65 I n accordance w i t h the aforementioned objects, this
2. I n conventional information retrieval systems, i t i s
invention provides an intelligent information retrieval sysassumed that query words entered b y the operator
tem for querying the corresponding information components
US 6,845,354 B1
4
with an inputted query word. The system includes a synonym block for finding the correspondence o f at least an
indexed keyword according to the query word based on a
fuzzy synonym mechanism; an information indexing block
for finding at least an information component corresponding
to the indexed keyword based on the aforementioned query
word; a component ranking-filtering block for filtering and
ranking the found information components based on a fuzzy
ranking-filtering mechanism and for outputting the desired
information components for selection; a synonym adjusting
block for adjusting the fuzzy mechanism o f the synonym
block based on the selected information components; and a
filtering adjusting block for adjusting the fuzzy mechanism
of the component ranking-filtering block based o n the
selected information components.
This invention consists of information components independently packed f o r a specific task as aforementioned
information components. I n addition, the above synonym
block and information indexing block are implemented by
fuzzy neural networks for accelerating parallel processing.
Also, the synonym block can tolerate input errors due t o
query word encoding and position shift compensation.
The aforementioned synonym block can perform in two
phases, namely: recall phase and learning phase. In the recall
phase, the synonym block encodes a query word with an
encoding section to generate an encoded pattern of the query
word t o correspond w i t h a feature o f the query word.
Further, a similarity section is used to match the encoded
pattern of the query word with the encoded pattern of system
synonyms to generate a corresponding similarity value representing the degree o f similarity between the query word
and those synonyms in the system. Those system synonyms
correspond respectively to a plurality of keywords set in the
system. Comparisons are made with the similarity value and
the correlation value between the system synonyms and
system keywords through a correlation section and a default
threshold so as to select the output as the above-mentioned
indexed keywords from those system keywords. In practice,
however, the similarity calculation section can be composed
of the nodes o f a fuzzy neural network with each node
corresponding to a synonym set in the system. The correlation section can be composed o f nodes to correspond to
each individual keyword set in the system.
In the learning phase, the synonym b l o c k generates
respectively an encoded pattern o f a query word and an
encoded pattern of a retrieved component keyword for the
query word and the retrieved component keyword corresponding to the selected information component through an
encoding section. Next, the similarity calculation section
matches respectively the encoded pattern of the query word
and the encoded pattern of the retrieved component keyword
with the encoded pattern of the system synonym to generate
a first set of corresponding similarity values and a second set
of corresponding similarity values. The correlation section
calculates the corresponding output values based on the first
and second sets of similarity values and the original correlation value between the system synonyms and the system
keywords. Finally, the determining section is used to determine whether it should correct the aforementioned similarity
calculation section and the correlation section according to
the first set and the second set of similarity values and the
above output value.
The aforementioned information indexing block includes
an indexed keyword register section for sequentially storing
the indexed keywords; and a plurality of component indexing units for corresponding to various information components and for storing the component keywords correspond-
ing to the information components and the identification
data of the corresponding information components. When an
indexed keyword i s received, its similarity to the saved
component keyword is calculated, and it is judged whether
5 a predetermined threshold is exceeded to determine whether
the identification data o f the corresponding information
component should be outputted.
BRIEF DESCRIPTION OF THE DRAWINGS
io T h e above objects, features and advantages of this invention will become apparent from the detailed description of a
preferred embodiment w i t h reference t o t h e following
drawings, wherein
FIG. 1 i s a block diagram illustrating the functional
15 structure o f an intelligent information retrieval system o f
this invention;
FIG. 2 is a schematic diagram illustrating an exemplary
query screen of the intelligent information retrieval system
of this embodiment;
2
FIG. 3 i s a block diagram illustrating the functional
structure of the synonym block and customizable synonym
adjusting section of the embodiment of this invention;
FIG. 4 is a schematic diagram illustrating an example of
25 the encoding procedure o f the embodiment;
FIG. 5 is a diagram illustrating the detailed structure of a
synonym block realized with the structure o f fuzzy neural
network of the embodiment of this invention;
FIG. 6 i s a block diagram illustrating the functional
30 structure o f the information indexing block of the embodiment of this invention; and
FIG. 7 is a diagram illustrating the detailed structure of an
actualized information indexing block with the structure of
the fuzzy neural network of the embodiment of this inven35 t i o n
DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS
The information retrieval system as disclosed i n this
40 embodiment is a system for retrieving software components.
The software component mentioned here refers to source
codes for a specific functions as well as private data used by
the source codes, w h i c h a r e packed a s independent
components, such as the software components used i n
45 object-oriented technology f o r performing a function.
However, application of this invention should not be limited
to the environments as described in this embodiment since
modifications o f this embodiment involving, for example,
environments i n which multimedia data such as various
50 documents, images, animated pictures a n d songs are
retrieved, can still be made by those who are skillful in the
art.
According to the above description o f the prior art, the
information retrieval system of this embodiment is provided
55 t o achieve the following objects by: Firstly, enabling the
operator to customize the thesaurus for easy query according
to personal habit and preference; secondly, tolerating input
errors; thirdly, providing representation and processing o f
indefinite information; and fourthly, ranking the retrieved
60 software components and filtering out those with low correlation according to operator's personal retrieval preference
and habit for convenience in selection. The above objects of
this invention are realized by making use of fuzzy information retrieval, knowledge engineering and machine learning
65 technologies. The embodiment of this invention is described
in further detail with reference to the accompanying drawings.
US 6,845,354 B1
5
FIG. 1 i s a block diagram illustrating the functional
structure o f an intelligent software component retrieval
system o f this invention. A s shown i n FIG. 1, the entire
software component retrieval system is composed o f synonym block 1, information-indexing block 3, component
ranking-filtering b l o c k 5 , customizable ranking-filtering
adjusting block 7 , and customizable synonym adjusting
block 9. Functionally, the system is divided into two parts:
the query retrieval mechanism comprising synonym block 1,
information indexing b l o c k 3 a n d component rankingfiltering block 5 as shown in the upper section of FIG. 1, and
the adjusting mechanism comprising customizable rankingfiltering adjusting b l o c k 7 a n d customizable synonym
adjusting block 9 in the bottom section of FIG. 1.
In the process o f query retrieval, the operator 10 first
specifies a query word 11 to the synonym block 1 for finding
corresponding keywords. The keywords are set in the system
for specifying the characteristics of software components to
be retrieved. I n this embodiment, the query word 11 specified b y the operator 1 0 includes f o u r query facets f o r
defining the software components to be retrieved by the
operator. The four facets are software component function,
software component file type, associated medium and system type, respectively. The operator can define the four
facets for retrieving the desired software components. The
query item o f software component function is for defining
the function and capability of the software components. I n
this embodiment, the query item for software component
function accepts multiple query w o r d input, which are
divided by slashes. The query item for software component
type is for defining the file type of the software components,
such as OCX, VBX (visual basic), VCL, DLL (dynamic link
library), C++, and delphi among others. The query item for
associated medium i s f o r defining a device used b y a
software component, such as disk, keyboard, input/output
port, network, screen and interface. The query item f o r
system type is for defining an appropriate operating platform
for a software component, such as Windows 3.x/951NT,
DOS, OS/2, and UNIX, among others. In this embodiment,
only the query item for the software component function is
entered by the operator, other query items, which are in a
fixed keyword table provided with the system, are for the
operator to select.
FIG. 2 is a schematic diagram illustrating an exemplary
query screen o f the information retrieval system o f this
embodiment. A s shown in the figure, the operator 10 can
define the desired software components with settings for
various query items, wherein 62 denotes a software component function, 64 an associated medium, 66 a file type and
68 a system type. It is clear from the figure that the operator
can freely key in query words to define the function of the
software components; other query items can be selected
directly from the system keywords. Consequently, only the
query words the operator enters in the function item will be
processed for synonyms.
After a query word is entered into the synonym block 1
for synonymy analysis and processing, its corresponding
keywords predefined b y the system are called "indexed
keywords" 1 3 i n t h i s embodiment. Indexed keywords
belong to keywords predefined in the system and have a
certain degree o f synonymy with the entered query word.
Such synonymy is defined by the synonym block 1. In this
embodiment, the synonym block 1 comprises a fuzzy neural
network and can tolerate input errors by virtue of query word
encoding and position shift compensation. T h e detailed
construction of the synonym block 1 will be explained later.
Also, as described above, since the synonym block 1 of this
6
embodiment only processes the synonyms f o r the query
words in the function item of the software components, other
query words entered i n the query item w i l l be directly
selected from the keywords preset i n the system, hence,
5 problems associated with synonyms and input errors are
eliminated.
The information-indexing block 3 performs the retrieval
of software components based on the indexed keywords
found i n the synonym block 1. I n this embodiment, the
to information-indexing block 3 w i l l output an identification
data 15 for a retrieved software component. Moreover, the
information-indexing block 3 also comprises a fuzzy neural
network and can enhance parallel processing efficiency.
Detailed description follows.
Finally, the identification data of software component 15
5
outputted from the information indexing block 3 is supplied
to the component ranking-filtering block 5 for ranking and
filtering processing and outputted to the display device for
the operator 10 to select. Generally, a significant number of
software components that match the conditions set for the
20 query word by the operator will be found for a query word
after the processing in the synonym block 1 and the information indexing block 3. The number of indexed words that
match the query word w i l l increase with the burgeoning
growth i n the volume and field of software components. It
25 w i l l be deemed improbable for the a operator to easily find
the desired software component from such a massive number of messages. Therefore, it is necessary to categorize and
filter out the retrieved software components. The purpose of
ranking processing is to sequentially display the retrieved
30 software components according to the correlation with the
query word and the operator's previous preference in selecting software components so the operator may find and select
the desired software component as soon as possible. The
purpose o f filtering processing i s t o filter o u t software
35 components with lower correlation so the operator may find
the desired software components more easily. Consequently,
a threshold may be modified and set by the operator in the
component ranking and filtering block 5. The operator may
then select the software components 21 he really needs from
40 the displayed software components.
Another feature of the software component retrieval system of this embodiment lies in that an automatic feedback
adjusting mechanism is provided for adapting the component ranking-filtering b l o c k 5 a n d synonym b l o c k 1 ,
45 respectively, according to the retrieved components 21 and
the keywords corresponding to the software components
desired and selected by the operator 10, and the operating
habit and retrieval preference the operator has demonstrated.
Customizable ranking-filtering adjusting block 7 adapts
50 according t o t h e retrieved components 2 1 t h e ranking
mechanism and filtering mechanism i n the component
ranking-filtering block 5 to make them learn to suit the needs
of practical operations. Customizable synonym adjusting
section 9 adapts according to software component keywords
55 the synonymy in the synonym block 1 to make it learn to suit
the needs of practical operations.
The synonym block 1, the information-indexing block 3
and component ranking-filtering block 5 are described in the
following, respectively, with reference t o the associative
60 drawings.
Synonym Block (1)
FIG. 3 i s a block diagram illustrating the functional
structure of the synonym block 1 and customizable synonym
adjusting section 9 of the embodiment of this invention. As
65 shown i n the figure, the synonym block 1 comprises an
encoding section 101, similarity calculation section 103,
correlation section 105 and a determining section 107.
US 6,845,354 B1
7
8
The synonym block 1 undergoes its operation i n two
The correlation value between system-defined synonyms
phases: the recall phase and the learn phase.
and system-defined keywords is saved i n the correlation
Recall Phase
section 105. For example, i f the keyword for describing the
In the recall phase, the query word 11 the operator inputs
function of a system-defined software component is "about
is first encoded i n t h e encoding section 101. I n t h i s 5 d i a l o g box," the synonyms may be "interface," "dialog," and
embodiment, the encoding format displays in parallel the
"help" among others. From user experience, the above three
number of letters and the location o f their first appearance
synonyms differ in the degree of correlation with the actual
with the word to be encoded (at this time, the query word).
keyword. For example, "dialog" may have a high correlation
The encoding process of this embodiment is as follows:
of 0.7 with "about the dialog box," the synonymy of "help
(a) Convert first all letters in capital (upper) case into to b o x " with a lower correlation of 0.5, and "interface" with an
lower case, and all other characters other than a through
even lower correlation 0.3 due to its broad meaning.
z and space into spaces.
Consequently, the correlation between a query word and
(b) Next, set 27 registers corresponding to the English
each keyword can be determined based on the correlation
alphabet a to z and the space.
value between each synonym and an actual keyword after
(c) Further, calculate the number o f each letter o f the 15 the similarity value being calculated by the similarity calconverted word to be encoded and save the number of
culation section 103 is supplied to the correlation section
these letters in their corresponding registers.
105. I n this embodiment, the lesser o f the corresponding
(d) By now, there are still some registers with empty data.
similarity value and correlation value is taken to determine
Therefore, those empty registers can be used for saving
the correlation between the query word and each keyword
the location o f the first occurrence o f each letter 20 and matched w i t h a pre-determined threshold t o further
(including the space) being converted but remaining to
determine whether the keyword is to be outputted as an
be encoded. Saving is realized by calculating the corindexed keyword 13.
responding location o f each letter to be encoded and
In the recall mode, the synonym block 1 will use neither
assigning it a negative value and then saving it in an
the determining section 107 nor the customizable synonym
empty register adjacent t o the corresponding letter 25 adjusting 9. Also, the synonym block 1 further includes a
register, preferably, i n the l e f t register. I f the l e f t
position shift calculation section (not shown in FIG. 3, its
register is already filled with data, then the data will be
function is realized in L2 node of FIG. 5) for providing the
saved in the right register. I f both registers are filled
capability o f tolerating input errors based on the encoding
with data, then the location value is abandoned.
format o f the query words 11 to be described later.
(e) Though most registers are filled with location data, 30 Learning Phase
there are still some among these 27 registers that w i l l
In the learning phase, the synonym block 1 may make use
remain empty. At this time, all letters being converted
of the functions provided b y the customizable synonym
adjusting section 9 to adjust the similarity value calculation
but not yet encoded (that is, letters a to z and the space)
are sequentially inserted i n t o a l l empty registers.
parameters (to be described later) of the similarity calculaHence, the encoding o f the words to be encoded is 35 t i o n section 103 and the correlation value o f correlation
realized.
section 105. The adjustment is determined based on the
FIG. 4 is a schematic diagram illustrating the encoding
software component chosen by the operator and its corresponding keyword 19 o f the retrieved component.
procedure o f "fuzzy control" o f this embodiment. The (a),
First, the query word 11 and the retrieved component
(b), (c) and (d) i n FIG. 4 represent the content o f each
register of the above steps (b), (c), (d) and (e), respectively. 40 keyword 19 are inputted to the encoding section 101 for
Therein, " 0 " represents empty data status, and the first 26
encoding so as to obtain the encoding format corresponding
registers correspond respectively to letters "j," "e," "q," "i,"
to the query word and the encoding format corresponding to
"r," "w," "t," "k," "o," "v,"
the retrieved component keyword. The encoding format
"1," "g," "c," " y, " " u , " " h , " " p , " " d , " and ''in''. The 2 7
corresponding to the query word and the encoding format
register corresponds to the space. Such a distribution o f 45 corresponding t o the retrieved component keyword are
th
letter locations follows the rule that most frequently used
further simultaneously supplied to the similarity calculation
letter w i l l be assigned to a location adjacent to the least
section 103 for calculating respectively the degree of simifrequently used letter based on the encoding features of the
larity between the two and each system synonym. A t this
encoding section to increase the data recording capacity in
time, two sets of similarity value are generated to correspond
the coding after the encoding.
50 the query word and the system synonym, respectively. The
The encoding format generated by the encoding section
two sets o f similarity values w i l l be supplied to both the
101 will be supplied to the similarity calculation section 103
correlation section 105 and the determining section 107.
However, at this time, the correlation section 105 executes
for calculating its similarity with system-defined synonyms
and then forwarded to the correlation section 105 for deterthe operation different from that in the recall phase, further
mining what system keyword is relevant to the actual query 55 generates another set o f output values to the determining
word, the determined result being outputted as an indexed
section 107. The determining section 107 w i l l determine
keyword.
whether it should adjust and how it should adjust based on
The encoding format o f system-defined synonyms are
the similarity value and the output value o f the t w o sets,
pre-saved i n the similarity calculation section 103. I t s
further the adjustment is actually performed by the customiencoding is in the same manner as described above. System- 60 zable synonym adjusting section 9.
defined synonyms correspond to a specific system-defined
The encoding of the encoding section 101 and the calculation o f the similarity of the similarity calculation section
keyword, the keyword is the word for directly describing the
103 are performed in the same manner as those specified in
function o f a software component (in this embodiment).
Consequently, the query word 11 and system keyword are
the recall phase.
matched in the similarity calculation section 103 to generate 65 F I G . 5 i s a diagram illustrating the detailed structure o f
a similarity value representing the degree o f similarity
the synonym block realized with the structure of the fuzzy
between them.
neural network o f the embodiment o f this invention. The
US 6,845,354 B1
9
fuzzy neural network provides advantages of parallel processing and learning capabilities. As shown in FIG. 5, the
fuzzy neural network comprises nodes in five layers, represented as L i , L2, L3, L4 and L5, respectively. The function
of the node o f each layer is described in the following.
Layer L i
There are two sublayers for nodes on layer L i : L l a and
L i b , each sublayer has n nodes. So sublayer H a comprises
nodes q
k , „
i
, Sublayers H a and Lib are for storing query words 11 and
the encoding format of the retrieved component keyword 19.
q ,
Therefore, n in this embodiment is set to 27, and the function
, nodes q
of
q
words 11 and the total number of each letter, the location of
i
u qappearance of each letter and actual alphabetical data in
first ,
,
the
, retrieved component keyword 17. Encoded results of the
„
q
encoding section 101 o f FIG. 3 are saved in nodes o f L i
a
u
layer, except f o r those actual performing modules (not
n
a
n
shown).
d
d Moreover, in the recall phase, only the data in every node
s sublayer H a are used, whereas in the learning phase, the
of o
n u
b l e nodes of both sublayers L i a and L i b are used.
data of
d
Layer
a y L2
s
e The L2 layer comprises nodes P
k r
P,
L i f o r corresponding t o each system-defined synonym,
i
respectively, and hence, also for storing the encoding format
i , P 2 , P
, the system synonym. Each node receives from layer L i
of
b 3encoding formats o f query word 11 and the retrieved
the
k
P
c ,
component keyword 19. I n the recall phase, the similarity
,
o 4
value between both the encoding format of the query word
„
P
,
11
m ,and the encoding format o f the retrieved component
k
keyword 19 is calculated; whereas in the learning phase, the
a
n
p „
,
similarity values between the encoding format o f system
d
r
,
synonym and the encoding format of the query word 11 and
i encoding format of the retrieved component keyword 19
the
i
s calculated, respectively. The similarity value represents
are
s
e degree or extent o f similarity between the input word
the
t
(query word 11 or the retrieved component keyword 19) and
s
o
the
n system synonym. Moreover, input errors can be detected
s
through the position shift amount calculated with the nodes
o
t layer L 2 t o compensate the encoding format o f the
of
d
o
inputted query word H .
e In this embodiment, each node o f layer L2 performs the
r
s
following calculations:
e
k
q
(1)
i
u
s(q, P
,
X1}]
i
e
r
y
)
=
d(q, Pi) = II q— Pill =
10
Please take note that equation (1) only calculates the
negative value o f the encoding format, i.e., calculates only
when both q, and P
if a r e
u q, is negative, then q,' equals 1 or 0; i f p
5 pn e sg a te i gv ae t i v e ,
u i
n
q„-Ph
u .
t
e
n
calculated to have a value equal to or greater than 1 or
u
' T h e r e
smaller than or equal to —1, all position information with a
o
e f ro r value f o r the encoding format q needs t o b e
negative e ,
q iO
lo compensated with (by deducting or adding) this shift value.
.
u n The position shift amount s (q, P,), calculated with the
W
a e
encoding format and equation (1), can check for missing
q
u
h
character in inputting the query word. For example, i f the
l a
t
i
e
operator intends to key in "fuzzy" as the query word but
s o
n
misses the first letter by keying in "uzzy" instead. It will be
15 n
1 (
1
found that the position shift amount is greater than or equal
t
o )
,
to 1 when the shift i s calculated and this shift amount
r h
denotes that there is a letter missing in the front, so the shift
0 e
amount is compensated to the position amount of the format
; p
20 after encoding and the problem associated with the missing
i o
letter in front of the inputted query word is then resolved.
f s
Therefore, input errors will be reduced by making use of the
i
q fuzzy mechanism and encoding format of this embodiment.
, t Equation ( 2 ) denotes the average Hamming distance
a i
25 between a query word q and a synonym P
symbols d
in o
the three sets are defined first: S,, is a set of possible values
n
u
.d T h e
t h r e e
of
, various letters recorded in an encoded pattern of a word,
p s P
a
u
u h set with a positive integer value smaller than 27; S
30 o f valuesa
with a negative integer value larger than —27 for
s e t
,
a ip i s
recording the location where each letter displays for the first
f n
a
r time indan encoded pattern of a word; and S, is a set for
q
„
e t
recording each letter in an encoded pattern o f a word, a set
n e
n a ASCII values larger or equal to 32. So there are four
of
e
e m d
35 possible values for d
o
t
g u
absolute value of P, and finally, the absolute value of q„-P
as
u v a l
o
a ; shownuinethe following:
ii
n
b
t 1 ,
,
i f (q E S
t
e h
i t
e
p
40 s
(q = 0 and ( P E S
e x
v a b s o l
a n d
p o r
P
( P S, and (1
(q1 c
( tl e
p
e u
i /
E
= =
3
0
q i a
a
S E ,S
)
(q
, v
l
0 r
o
P
)
p
,
n
t u ee
o iE S, and P i ' e S, and q
i
(q
o n
r
a
r
E
S
d
h P
id (q E S, and ( P
f
45 o
P
r
iP E
j
S
)
h
e f
u (P
i)f )
p j
j
i
e
E
io
j
n q
„ I qj — /Ail otherwise /
o Sr
E
S
r
E
s
r
b ,
3
p
e
S
6
a
e
, t
h
)
,
E
S
n
)
50 Moreover, P
a
.
, e
e
d
o
n
u H following:
) )
q
e the
(2)
u
c
t
u
(
,u(g, Pi) = (1 + (d(q, P
l
3
"q
i
i
)
) F d )
u
h
i
F.
Equation ( 1 ) denotes the position shift amount s(q, P ) ta
(
'
)
55
P
between the synonym (P,) and the inputted query word 11 l
1
e
'
(q) saved in a node (P
s
l
represents the encoding format o f query word q and the p
i
j
encoding format of synonym P, j is a number between 1 and s
q
) o f t h e
27. W
i
e
l a L2 and r j
y e the
60
)
layer
u
/
Lhe and can be considered as the importance o f the j t v
.
L2 d2a t a
t
r
2
e
h
Wrt h data r e o f the query word U . Generally, such u
i ncoded m item
e e
1
p
importance can be represented with the probability o f the
o
i sn f ,
e
occurrence of a specific letter. For example, letter "z" only
t
e
q n th appears in query words; therefore, i t w i l l become
e
seldom
i
s
i
more important i f it shows up, and should be more heavily
t
a
weighted. n
n
o
w
d
P
d
e
e
i
P
i
,
,
r
d
o
(
r
q
E
S
i
/ / 0 o t h e r w ps e
o
r;
3
o
e
,= 6
o s1
r
i
q
Also, q, lmay have three possible values, as shown in the
i bi
f
E
following: P
f
S
(
i
a Pl j J i f q ,)
E i E
e ij
)
S
p
E
s
, (11 = {„ 1Si tfh e(qwes e
0o
r i S
S
p o r
q j
a
p
65
E
S
o
s
c
Equation (3) calculates the similarity between the two
r
s according to the average Hamming distance, wherein Fd and
)
P
i
h
j
o
E
S
w
,
e
m ar y
e vt e
a
o rf P e
h
i
US 6,845,354 B1
12
11
Fe represent t w o respective fuzzifiers, which define the
Niv, represents the correlation with the ith synonym of layer
range covered in the membership function o f the average
L4. R
Hamming distance between patterns after encoding and the
represent a respective synonym with the node o f layer LA,
i
sensitivity to the average Hamming distance. Therefore, the
that is, with the i
d e
similarity value /..t, (q,P,) of a query word to a corresponding 5 p „ . Q is an encoded pattern o f the query word 11, k an
t L
n o 4
synonym can be generated.
encoded pattern o f the retrieved component keyword 19.
n
t eo d e
The calculation of the similarity value for query words 11
The n n e
c o correlation between the query word and the related
as described above can also be applied to the calculation for
s
keywordncan be obtained with equation (5).
c t i
g
retrieving component keywords 19. Moreover, the function
t
In equation (6), L , represents the identification data o f
of nodes o f the L 2 layer corresponds t o the similarity
t
h
10 system keyword to indicate an actual word rather than its
calculation section 103 o f FIG. 3.
o
e
encoding format. Further, threshold 0
Layer L3
L
i 1
operator for n
2
c a processing in a broader term, narrower term or
b e
Layer L3 includes a node SA. Node SA is used for finding
2
normal t
means, the smaller the threshold 0
t
s e term. Thatb
y
a similarity value that is most similar to the input query word
n
value,h o broader the term used for processing; the greater
the e
2
n
t
11 among all values supplied from all nodes of layer U . In
15 the threshold 0
d
e
1
o
this embodiment, node S A functions principally i n the
2 1 processing. In equation (7), 0 ( 1
s
d
learning phase, its operation is represented as follows:
respective
v a3 l u e , output value o f nodes o f L 2 layer relevant to
p
e
nodes of L4 layer.
t 1
,h e
o
P, ) = M K , P
i
)a Finally, O 0 ( P
n f ar n r d o
a
20 o f R
nodes of layer LA in the recall phase and learning phase
)
w 2e r
n
l
(
supplied respectively to the information indexing block 3
( h Q
I
d layer L 5 . Layer L A corresponds t o the correlation
a
where K4 denotes a query word 11 or a retrieved compo- t )
and n
a p r e s e n t
)
section
nent keyword 19. X denotes an aggregation function, or the e r e 105 o f FIG. 3.
y
d
max function i n this embodiment. ,a(K,P,) denotes the t a e L5
e
Layer
r r ye
similarity output for node of layer L2 corresponding to the 25 L a0 m r , L5 comprises only a node RA and receives input
( sI
synonym found most similar to K. Also in this embodiment, u from every node o f layer L 4 i n the learning phase. The
L
Q d
the output SA of node S A , is a vector comprising another e function o f node R A is to select the maximal output R A ,
4
r e
pair of nodes w of layer L2 most similar to the query word f from correlation (co-dependency procedure) outputs of each
,
11 and the retrieved component keyword 19 in addition to a
p
o node orf layer L4. Its action can be illustrated as follows:
a
pair with a value most similar to the query word 11 and the 30
r n e sA
R
retrieved component keyword 19, as mentioned above.
e n
0
d
Layer L 4
t =
p In this embodiment, layer L 5 and layer L 3 are i n the
Layer L 4 comprises nodes R
a m phase, and determine whether any adjustment is
learning a
respectively correspond t o system-defined keywords. A s
,
x (
i
r ye(
necessary or how the adjustment will be made. As described
described above, a system keyword m a y correspond t o 35 a
, R
s
above,Rip, this embodiment, layer L 3 outputs at least two
n
several system-defined synonyms, each o f which having a
n
2
)
maximal ,similarity values corresponding to the query word
e )c
different correlation to the system keyword. Nodes of layer
d (with encoded pattern q) and the retrieved component
,
R
11 t ii =
L4 determine the correlation between the input query word
1 ,
p
3
keyword 19 (with encoded pattern k), respectively. Layer L5
v ,e
11 and system-defined synonyms according to the similarity
,
,
R
value calculated in aforementioned layer L2 (corresponding 40 then outputs an output with the greatest value. Customizable
o e
(
synonym adjusting section 109 can perform adjustment with
4
to each individual system synonym) and the correlation
u 5
these three values and two independently set thresholds 03
, corresponding to ,each individual node of layer U . In
R
value
t )
and O
t
o
this embodiment, the correlation between the query word 11
p
(Case 1)
r
and the keywords can be determined by the maximal simiu
When the output o f node R A ( L 5 ) i s greater than the
larity value (that is, most similar synonym) and its minimal 45
t
threshold 0,, or both of the two output similarity values of
corresponding correlation value. Next, each node of layer L4
node SA(L3) are greater than the threshold 0,, then execute
is set with a threshold 0
the following learning procedures:
of (
2 L4i node) to determine whether the query word 11 and its
(a) Modify system encoded patterns P
correspondinge n t s keywords meet the requirement. I f the
system
r e p r e s
n d
P
50 thewt lw oacorresponding nodes (set as W , and W
requirement is met, then the L4 node can output the correa
n
w 2
s a values o
sponding keyword as a n indexed keyword 1 3 t o t h e 2 maximal similarity v e d f layer L 2 . Modification i s
i
n
d
e
performed
) w ii t h n as follows, where N
information-indexing block 3.
x
respective d
w l a n number of variations being made.
N
Each L4 node has different operative characteristics in the
o
r
recall phase and the learning phase. In the recall phase, the
w 2
p
i
d
c
min n
operationi is executed once, whereas the min operation 55 r e v r e s e n t
p
a executed twice i n the learning phase. The operation
t
o
r
is
(9)
a
p
1r 2 „ S r ,
,
performed by each L4 node is represented as follows:
=
2 / ( k v 2
A p v 1
+
=
1 )
X p v 1
•
Ar2_XTK
+
,p
1
i
2
w
+ 2
+
1
1 0 2
+
1 )
•
k
,
X
p
60
v
(6)
LR(Ri) i f li(Ri) 0
OR(R) = '
where 1
i=10.
Nutt, o t h e r w i s e
/
2
(b) Create a new L4 node w3 when the output value of the
(
node RA(L5) is smaller than the threshold 03, wherein the
X
correlation values being saved and the initial value o f
p
0L(1,)=min(min(it(q,P2,),*'),min(y(1cP2),V1P))
(
7
)
65 variations are set respectively as:
v
Therein, equations (5) and (6) are performed in the recall
1
(10)
phase and equation (7) in the learning phase. In equation (5),
+
1
)
T
,
,u(R,)=min(y(q,P2,),TP)
(
5
)
US 6,845,354 B1
13
14
Xr'
3
Otherwise, the correlation value I V
.1
layer L4 with the v e d output from min operation of L4
W3 3
s a maximal
layernwill be modified ase
positive learning. Modifications are
i
t
h
W3
performed as follows, where N represents the number of
n
o
d
e
variations.
o
f
tion 301 and software component indexing unit 303 f o r
generating corresponding I D 15 for software components
that meet some requirements based on the indexed keyword
13 supplied by the synonym block 1.
As described above in this embodiment, there are four
query items f o r operators t o enter, namely: function o f
software components, associated medium, file type of software components and system type, as shown i n FIG. 2.
Therein, only the input for function of software components
can be freely set to generate several relevant indexed keywords 13 after the processing in the synonym block 1; other
index items can be selected with one of keywords provided
by the system. The indexed keywords 13 (only those corresponding to the function o f software components) found
by the synonym block 1 are sequentially selected and
the-keywords from the four query items are combined to
form a set o f actual query information, which are then
supplied to and saved in the indexed keyword register 301,
respectively. Next, each keyword stored i n the indexed
keyword register 301 is supplied to the software component
indexing unit 303 for matching so as to find an appropriate
software component, then the identification data for those
software components are outputted to complete the function
of software component retrieval. The above process continues until all indexed keywords are completely searched.
FIG. 7 is a diagram illustrating the detailed structure of an
actualized information indexing block with the structure of
the fuzzy neural network of the embodiment of this invention. The fuzzy neural network for actualizing this information indexing block 3 consists o f two layers, L 6 and L7,
corresponding t o the indexed keyword register 301 and
software component indexing unit 303, respectively. The
function of each node of layers L 6 and L7 is described as
follows.
l
'
i
r
1
3
=
1
3
0
'
1
3
+
1
1
.
1
)
V
1
3
+
1
(
(
'
'
1
3
1
)
.
1
.
5
10
3
=
g
"
'
3
+
1
(
1
1
)
(c) Output in the recall phase a node o f layer L4 with a
value other than the maximal value in later L4. Assume the
correlation value saved i n the node to be Wr, and T4'' to
represent the number of variations, the modification, known
as negative learning, is performed as follows:
15
Tfr=(k+b-F1)•Tir-ill(k+b-F1)•0,(R,)
k=k+1
(12)
20
where b represents a bias f o r preventing the correlation
value from being greatly reduced in negative learning in the
initial several learning processes.
(Case 2)
25
When the output o f node RA(L5) is smaller than the
threshold 0
dures:
3
, (a) Create a new L2 node w l when the output with the
maximal similarity value o f the node RA(L2) to the 30
t h e n
e x query word 11 (with encoded pattern q) is smaller than
e
c u the threshold 01, wherein the encoded patterns being
t
saved and the initial, value of variations are set respece
tively as:
Layer L6
t
35
Layer L6 is composed of m sublayers designated respech
,
tively as L61, L 6 2 „ L6m, and each sublayer has n nodes.
e
For example, sublayer L61 comprises n nodes from Q
(13)
f o l
iQ,„
l o
and o
l1 t so on. Each sublayer is for storing a keyword for each
(b) Create a new L2 node w2 when the output with the
w i
40 query keyword set, hence, m represents t h e keyword
,
maximal similarity value o f the node RA(L2) to the
n g
included i n each set o f query words ( m i s 4 i n this
a
retrieved component keyword 19 (with encoded pattern
l
e
embodiment), and whereas n represents the maximal numk) is smaller than the threshold 0
n
ber of letters for the query word (n is 21) in this embodiment.
a
patterns being saved and the initial value of variations
1r
d
In other words, in this embodiment, the nodes Q
n
are set e r e i n
, i w h respectively as:
s
45 of sublayer L61 are for storing the keywords for the function
1
n
g h
t
e
u software
of 1 t o components and the keywords are sequentially
Pr
Q
p
e2 n c o d e d
b
set2by the indexed keywords found with the synonym block
1
r
=w
g
l The nodes Q
1. 1
k
2
o
a
keywords for associated medium. The nodes Q
1
i 1
=
c (c) Create a new L 4 node w3, wherein the correlation 50 sublayer L63 are for storing the keywords for software file
,
y
(i
1 t o Q
2
e
=
value being saved and the initial value of variations are
1
type. o nodes Q
e t
3 1 The Q
2
1
4
set ,respectively as:
the
1
r
2 1keywords for system type. In this embodiment, the latter
,
)
three o
keywords are selected from the keywords predefined
4
iW w
L
3 t f o Q f
o
by 1 system.
i
2 the b l
6 u
3
s
55 Layer L 7
IS
(15)
4 y e r f
o
m
a Layer L7 consists of u nodes, designated as C
'= 1 .
1
s u b l a
c respectively. Each node of layer L7 corresponds to each
o
L i
6
C„,
1
It has become evident from the above description of the
y e r
3
o
2 , C
synonym block 1 and the customizable synonym adjusting
individual software component set in the system. Software
=
1
6
1
m
component keywords describing the corresponding software
a z
section 9 t h a t the information (software component)
60 4
p
retrieval system o f this embodiment is able not only t o
components and the identification data corresponding to the
r , C
a
software rcomponents are saved respectively i n nodes o f
dynamically adjust the structure of the synonym procedure,
r
e 3
e
but also tolerate input errors.
layer L7. Therein, software component keywords containing
i
f ,
the
Information Indexing Block (3)
f keywords for the four query items with their values
o
s
o
inputted by nodes of layer L6 when the nodes of layer L7 are
FIG. 6 i s a block diagram illustrating the functional
r
e
r
structure o f the information-indexing block of the embodi- 65 created. Theo I D number indicates the location o f the softs
t
s
t
ware component in the software libraries or the location of
ment of this invention. As shown in the figure, the informar
ir
n
n
o
retrieval in the Internet.
tion procedure 3 comprises an index keyword register secg
n
i
n
o
g
d
t
e
h
US 6,845,354 B1
15
16
Each node o f layer L7 receives a keyword supplied by
each individual node o f layer L 6 f o r matching with the
software c o m p o n e n t k e y w o r d s , i f t h e y m e e t t h e
requirements, their corresponding identification data are
outputted. The action o f each node o f layer L 7 can be
represented in the following equation:
The setting of the threshold 0
described. L 7
4 o f
When startingsto use the information retrieval system o f
n o d e
this invention, the operator will be prompted to enter various
w
i
l
l
5 q u en y items as displayed on the query screen shown in FIG.
r
o
w
2 f o r indexing software component libraries. The query
b input methods as adopted in this embodiment can be
e
word
(11Q, CA = A „ ,
(16)
if K
m
if K
*
m
0
otherwise
114 (1 — pk) =
a
0
n
d
K
11(Q,C,)=2* (1+(cl(Q,C i)1Far')
m
*
(17)
one to be freely entered by the operator as in function o f
software components 62 or another to be selected by the
to operator from the pop-up menu, such as associated medium
64, software file type 66 and system type 68. I f software
component function 6 2 i s selected f o r direct input, the
=
0(C,)=
0
LC(Ci)
i f
il(Q, Ci)
operator can key in multiple query words with each of the
(18)
(19
)
query words being separated by a slash. Query words in the
15 query items w i l l be processed one after another b y the
system. Each individual result w i l l finally be combined and
processed. I f query words are selected from the pop-up
menu, only one query item will be selected each time, since
options are individually independent and mutually exclu-
20 sive. Besides, only software component function 62 w i l l
undergo processing in the synonym block 1 to obtain numerous keywords. In all other query items that do not require
Equation (16) is used for calculating the distance between
synonymy processing, the query word selected by the operathe keyword set Q (composed o f four keywords) being
tor is the default keyword. Each time i n the information
entered and the software component keyword set C, of the 25 index block 3, a keyword is selected from each query item
i
to form a set of keywords for component retrieval processIn
t this embodiment, Q includes four vectors (1—m) and each
ing.
vector includes 21 elements (1—n. Similarly, C, also includes
h
There are three possible outcomes when all the software
four vectors and each vector includes 2 1 elements. I n
components found b y the information index block 3 are
s
addition, the value A represents the distance of a vector (one 30 presented to the operator for selection. First, the operator
o
of the 1—m) and is calculated with equation (17).
may find the appropriate software component immediately
f
In equation (17), the distance between a vector Q o f the
from a fair number o f the retrieved software component.
t
input keywords and the corresponding vectors A„, o f the
Second, the operator may find it difficult to select a software
w
encoded pattern C, o f software component is calculated.
component from a massive number of the retrieved software
a
Particularly, only portions o f query items (vectors) w i t h 35 components. Third, the operator may not find the approprir
actual input keywords in the input keywords Q are calcuate software component from a limited number of software
e
lated; the distance A
components. The problems with the last two outcomes can
to O.
c . fI no this manner, the processing can be smoothly perm
r
be resolved by resetting the threshold 0
formed r y
o u e i n this embodiment even when the operator i s
the component e
q
4 o f
t h ranking-filtering block 5. When the number
L
7
uncertainr o f d certain keyword for querying (for example, 40 o f software components becomes excessive, the threshold 0
a
m o
w
n o d e s
unsure of system type). That is, fuzzy input retrieval can be 4 is reset higher. That is, the threshold 0
p
s
i
n
realized. Therein, W"' represents the weight of the M
output r c o n t r o l l i can
4 f o of software components n g be reset higher to reduce
o
w
i
t
(that e c t o r
the number e f software components for output when the
o
t h vis, query item) of the input keyword set Q and indicates
t
h
n
h
the importance of the vector among all vectors in the input
number o f the outputted software components becomes
e
n
o
keyword set.
45 excessive. Conversely, the threshold 0
n Equation 18 is used for calculating the similarity value
i
n
p
4 when the number of the e s e components becomes too
c a n
b e
r software t
t
u
t
between the input keyword set Q and the software compo- l limited so e
that rthe operator may have more software como
w
d
i
nent keyword set C
ponents to choose from.
e o f Fd' h e Fe' are fuzzifiers for describing the range
s
Therein, t and
From the above description of information index block 3,
r
coveredgby the membership function and the sensitivity in 50 i t becomes evident that the information (software
s
o r i
i n a
relation
component) retrieval system o f this embodiment can perc l y to distance between the input keyword set Q and the
e
l
form retrieval in a fuzzy retrieval manner in response to the
software component keyword set C, saved in the node. The
r
t
s
a
v
similarity value calculated with equation 18 can be supplied
query words the operator has entered. That is, even when the
i
e
d
operator can not describe the exact content for a query item,
not only to the L7 node for determining whether the idenb
i
tification data of the corresponding software component can 55 this information retrieval system can still find all possible
e
n
software components in a most approximate manner.
be outputted, but also to the component ranking-filtering
d
n
o
Component ranking-filtering block (5)
block 5 for determining the order of ranking for the retrieved
a
d
softwaree
components.
The component ranking-filtering b l o c k 5 filters t h e
n Equation 19 is used for determining whether the software
amount of the retrieved software component based on the
C
d
component corresponding t o L 7 node i s retrieved. That 60 threshold 0
.
s
means, i f the similarity value kt(Q,C,) between the input 4 software components based on the corresponding similarity
o r
value r
keyword set Q and the software keyword set C
a
d e t e supplied by the information index block 3. I n other
words, s
ithe software i component has exceeded a predetermined m i n e the greater the similarity value, the more approxiv d e s c r b i n g
mate the input query word to the keyword describing the
threshold 0
e
t
h
software component is determined to be outputted, its cor- 65 software component.
4
d
e
Customizable ranking-filtering adjusting section 7 can
responding similarity value is also outputted to the compo,
b
r
a
n
nent e n
adjust the ranking mechanism and the filtering mechanism
t h ranking-filtering block 5 for further processing.
y
k
i
n
t
t h e
g
i
h d e n
o
r
t i f i
e
d
e
Null, o t h e r w i s e
US 6,845,354 B1
17
18
of the component ranking-filtering block 5 according to the
software component the operator actually selects to adjust to
suit personal habit and retrieval preference.
According to the above description, the software component retrieval system o f this embodiment provides the following advantages of:
Firstly, enabling the operator to customize the thesaurus
for easy querying according to personal habit and preference;
Secondly, tolerating input errors;
T h i r d l y, p e r f o r m i n g a n d p r o c e s s i n g i n d e f i n i t e
information, that is, providing methods of fuzzy querying;
and
Fourthly, ranking the retrieved software component and
filtering out those low in correlation according to operator's
personal retrieval preference and habit for convenience in
selection.
It should be understood that this invention is not limited
to the preferred embodiment as disclosed above. Variations
and modifications can be made by those who are skillful in
the art without departing from the spirit and scope o f the
present invention as defined in the appended claims.
What is claimed is:
4. The information retrieval system as claimed in claim 1,
wherein the correlation means comprises:
a plurality o f correlation units, corresponding t o the
system keywords, respectively, for receiving the simi5 l a r i t y values generated b y the similarity means and
comparing the predetermined threshold value with the
combination of the correlation values and the similarity
values, and determining whether the corresponding
system keywords serve as the indexed keywords.
lo 5 . The information retrieval system as claimed in claim 1,
wherein the encoding means encodes the query word into
the corresponding encoded pattern o f the query word by
recording the occurrences of the alphabetical letters in the
query word, the location of the first occurrence for each of
15 the alphabetical letters and a portion of the query word.
6. The information retrieval system as claimed in claim 5,
wherein the synonym block further comprises a shifting
means for calculating the position shift amount o f each
alphabetical letter by the location of the first occurrence of
20 each alphabetical letter to determine whether there is any
input error, and compensating the query word when there is
an input error.
7. An information retrieval system for finding correspond1. An information retrieval system for finding corresponding information components according to an entered query
ing information components according to an entered query 25 word, comprising:
word, comprising:
a synonym block for finding indexed keywords correa synonym block for finding indexed keywords according
sponding to the query word;
to the query word;
an information indexing block for finding a corresponding
an information indexing block for finding a corresponding
information component based o n the indexed keywords; and
30
a component ranking-filtering block for ranking and filtering the found information components and for outputting the information components for selection;
35
wherein the synonym block comprises:
encoding means f o r encoding the query word and
generating an encoded pattern of the query word, the
encoded pattern of the query word corresponding to
a feature of the query word;
40
similarity means for comparing the encoded pattern of
the query word with the encoded patterns of system
synonyms, the latter generated by encoding a plurality o f system synonyms, and generating corresponding similarity values, the similarity values rep- 45
resenting the similarity between the query word and
the system synonyms, each of the system synonyms
corresponding t o a t least o n e o f t h e system
keywords, respectively; and
correlation means for selecting the indexed keywords 50
from the system keywords according to correlation
values between the system synonyms and the system
keywords, the similarity values generated b y the
similarity means and a predetermined threshold
value for comparison.
55
2. The information retrieval system as claimed in claim 1,
wherein the information components are software components classified by types, functions, associated storage media
and operating platforms.
3. The information retrieval system as claimed in claim 1, 60
wherein the similarity means comprises:
information component based o n the indexed keywords; and
a component ranking-filtering block for ranking and filtering the found information components and for outputting the information components for selection;
w h e r e i n the synonym block comprises:
encoding means for encoding the query word and a
keyword for the found information components and
generating an encoded pattern of the query word and
an encoded pattern o f the keyword for the found
i n f o r m a t i o n components, the encoded pattern of the
query word corresponding to a feature o f the query
word, the encoded pattern o f the keyword for the
found information components corresponding to a
feature o f the keyword for the found information
components;
similarity means f o r respectively comparing t h e
encoded pattern of the query word and the encoded
pattern o f the keyword for the found information
components w i t h encoded patterns o f system
s y n o n y m s , the latter generated by encoding a plurality of system synonyms, and generating a first set
of similarity values and a second set o f similarity
values, the first set of the similarity values representing the similarity between the query word and the
s y s t e m synonyms, the second set o f the similarity
values representing the similarity between the keyword for the found information components and the
system synonyms, each o f the system synonyms
corresponding t o a t least o n e o f t h e system
k e y w o r d s , respectively;
correlation means f o r calculating a n output value
a plurality of similarity-calculating units for respectively
according to the first set of the similarity values, the
storing the encoded patterns o f the system synonyms
second set o f the similarity values and original
corresponding to the system synonyms, receiving the
correlation values between the system synonyms and
encoded pattern of the query word, and calculating the 65 t h e system keywords; and
corresponding s i m i l a r i t y values according t o a
determining means for determining whether the simisimilarity-calculating function.
larity means and the correlation means need adjust-
US 6,845,354 B1
19
20
ment according t o the first set o f the similarity
a plurality o f indexing units, respectively corresponding
values, the second set of the similarity values and the
to the information components and storing a compooutput value.
nent keyword set and identification data pertaining to
8. The information retrieval system as claimed in claim 7,
the corresponding information components, for calcufurther comprising an adjusting means f o r adjusting the 5 l a t i n g a similarity value between the component keysimilarity means and the correlation means according to the
word set and the received indexed keywords and outoutput value generated by the determining means.
putting t h e identification data pertaining t o t h e
9. The information retrieval system as claimed in claim 7,
corresponding information component when the simiwherein the information components are software compolarity value exceeds a predetermined threshold value.
nents classified by types, functions, associated storage media lo 1 6 . The information retrieval system as claimed in claim
and operating platforms.
15, wherein the information components are software com10. The information retrieval system as claimed in claim
ponents classified b y types, functions, associated storage
7, wherein the similarity means comprises:
media and operating platforms.
a plurality of similarity-calculating units for respectively
17. The information retrieval system as claimed in claim
storing the encoded patterns of the system synonyms 15 1 5 , wherein the predetermined threshold value is adjustable
corresponding to the system synonyms and receiving
and the adjusted threshold value i s used t o decide the
the encoded pattern of the query word and the encoded
number of the outputted identification data pertaining to the
pattern o f the keyword f o r the found information
corresponding information components.
components, to calculate the first set of the similarity
18. A n information retrieval system for finding conevalues and the second set of the similarity values.
20 sponding information components according to an entered
11. The information retrieval system as claimed in claim
query word, comprising:
8, wherein the determining means comprises a first selection
a synonym block for finding indexed keywords responmeans for comparing a first threshold value with the maxisive to the query word according to a fuzzy synonym
mal of the first set of the similarity values and the maximal
procedure;
of the second set to decide an adjustment procedure per- 2 5
an information-indexing block for finding a correspondtaining to the similarity means and the correlation means.
ing information component based on the indexed key12. The information retrieval system as claimed in claim
words;
8, wherein the determining means comprises a second
a component ranking-filtering block for ranking and filselection means f o r comparing a second threshold value
30 t e r i n g the found information components and for outwith the maximal of the output values to decide an adjustputting t h e information components f o r selection
ment procedure pertaining to the similarity means and the
according to a fuzzy ranking-filtering procedure;
correlation means.
a synonym adjusting block for adjusting the fuzzy syn13. The information retrieval system as claimed in claim
onym procedure of the synonym block according to the
7, wherein the encoding means encodes the query word and
the keyword for the found information components into the 35 f o u n d information components; and
corresponding encoded patterns b y recording the occurrences of the alphabetical letters in the query word and the
keyword for the found information components, the location
of the first occurrence for each of the alphabetical letters and
a portion of the query word and the keyword for the found
information components.
14. The information retrieval system as claimed in claim
13, wherein the synonym block further comprises a shifting
means f o r calculating the position shift amount o f each
alphabetical letter by the location of the first occurrence o f
each alphabetical letter to determine whether there is any
input error, and compensating the query word when there is
an input error.
15. A n information retrieval system f o r finding corresponding information components according to an entered
query word, comprising:
a synonym block for finding indexed keywords according
to the query word;
an information indexing block for finding a corresponding
information component based o n the indexed keywords; and
a component ranking-filtering block for ranking and filtering the found information components and for outputting the information components for selection;
wherein the information indexing block comprises:
storing means for sequentially storing the indexed keywords; and
a ranking-filtering adjusting block for adjusting the fuzzy
ranking-filtering procedure of the component rankingfiltering block according to the found information components.
40 1 9 . The information retrieval system as claimed in claim
18, wherein the information components are software components classified b y types, functions, associated storage
media and operating platforms.
20. The information retrieval system as claimed in claim
45 1 8 , wherein the synonym block is implemented by a fuzzy
neural network for accelerating parallel processing.
21. The information retrieval system as claimed in claim
18, wherein the information-indexing block is implemented
by a fuzzy neural network for accelerating parallel process50 i
n g 22. The information retrieval system as claimed in claim
. 18, wherein the synonym block comprises encoding means
for encoding the query word into an encoded pattern by
recording the occurrences of the alphabetical letters in the
55 query word, the location of the first occurrence for each of
the alphabetical letters and a portion of the query word; and
shifting means for calculating the position shift amount of
each alphabetical letter by the location of the first occurrence
of each alphabetical letter to determine whether there is any
60 input error and compensating the query word when there is
an input error.
Disclaimer: Justia Dockets & Filings provides public litigation records from the federal appellate and district courts. These filings and docket sheets should not be considered findings of fact or liability, nor do they necessarily reflect the view of Justia.
Why Is My Information Online?