Apple, Inc. v. Motorola, Inc. et al
Filing
92
Declaration of Christine Saunders Haskett filed by Plaintiffs Apple, Inc., NEXT SOFTWARE, INC. re: 90 Motion Requesting Claims Construction (Attachments: # 1 Ex. 1 Moto Infring. Cont. Ex. A, # 2 Ex. 2 '157 patent, # 3 Ex. 3 '179 patent, # 4 Ex. 4 '329 patent, # 5 Ex. 5 '230 file history, # 6 Ex. 6 Oxford dictionary definition, # 7 Ex. 7 '559 file history, # 8 Ex. 8 The OSI Model, # 9 Ex. 9 ISO Standard, # 10 Ex. 10 Japanese file history, # 11 Ex. 11 Japanese prosecution appeal, # 12 Ex. 13 Moto Infring. Cont. Ex. E, # 13 Ex. 14 IEEE Standard, # 14 Ex. 15 '333 patent, # 15 Ex. 16 '721 file history, # 16 Ex. 17 '193 file history, # 17 Ex. 18 Moto Infring. Cont. Ex. F, # 18 Ex. 19 Merriam Webster Dictionary, # 19 Ex. 20 Webster's Dictionary) (Haslam, Robert)
EXHIBIT 4
United States Patent [19]
Taguchi
[11] 4,301,329
[45] Nov. 17, 1981
[54] SPEECH ANALYSIS AND SYNTHESIS APPARATUS
[75] Inventor: Tetsu Taguchi, Tokyo, Japan
[73] Assignee: Nippon Electric Co., Ltd., Tokyo, Japan
[21] Appl. No.: 942
[22] Filed: Jan. 4, 1979
[30] Foreign Application Priority Data
Jan. 9, 1978 [JP] Japan 53-1282
Jan. 9, 1978 [JP] Japan 53-1283
Nov. 10, 1978 [JP] Japan 53-138690
[51] Int. Cl.3: G10L 1/00
[52] U.S. Cl.: 179/1 SA
[58] Field of Search: 179/1 SA, 1 SD
[56] References Cited
U.S. PATENT DOCUMENTS
3,715,512 2/1973 Kelly 179/1 SA
4,038,495 7/1977 White 179/1 SA
Primary Examiner: Mark E. Nusbaum
Assistant Examiner: E. S. Kemeny
Attorney, Agent, or Firm: Sughrue, Mion, Zinn, Macpeak & Seas
[57] ABSTRACT
In order to prevent errors and instability which may occur in a speech analysis and synthesis apparatus when the normalized predictive residual power falls to low levels, for example in high-pitched speech, the calculation of linear predictor coefficients from the autocorrelation coefficients of the speech sound is stopped when the normalized predictive residual power falls below a predetermined threshold level. Either a variable stage synthesis filter is used having its number of stages determined by the number of linear predictor coefficients actually calculated, or a fixed number of stages can be used and a zero value filter stage coefficient supplied to those stages in excess of the number of coefficients calculated. Degradation of speech quality due to quantization and transmission errors can be alleviated by computing the normalized predictive residual power on the synthesis side from the transmitted predictor coefficients and using it to excite the input to the synthesis filter. In one embodiment especially suitable for high ambient noise conditions, both a sound source and a noise source are employed and two different conversion and window processing channels are provided; one for noise-affected speech and the other for pure noise. Autocorrelations in each channel are performed along with correlations between channels, and the autocorrelations and correlations are then appropriately combined to provide an autocorrelation coefficient of the speech sound.
10 Claims, 7 Drawing Figures
U.S. Patent    Nov. 17, 1981    Sheets 1 to 5    4,301,329
[Drawing sheets 1 to 5: FIGS. 1 to 7, block diagrams]
[...] |K| > 1, the filter is unstable, as is known, so that the stability of the filter can be checked by using the K parameter. Thus, the K parameter is of importance. Additionally, the K parameter is coincident with a K parameter appearing as an interim parameter in the course of the computation by the above-mentioned recursive method and is expressed as a function of a normalized predictive residual power (see the above-mentioned article by J. MAKHOUL). The normalized predictive residual power is defined as the value obtained by dividing u in equation (1) by the power of the speech sound in the analysis frame.
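As an illustration of the relationship described above, the following is a minimal sketch of the textbook Levinson-Durbin recursion, assuming it corresponds to the "recursive method" the text refers to. The function names `durbin` and `is_stable` and the sample autocorrelation values are hypothetical and are not taken from the patent.

```python
def durbin(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns (k, u): the K (reflection) parameters and the normalized
    predictive residual power after each order. Illustrative sketch only.
    """
    a = [0.0] * (order + 1)          # predictor coefficients a[1..m]
    err = r[0]                       # residual power at order 0
    k, u = [], []
    for m in range(1, order + 1):
        # Correlation not yet explained by the order m-1 predictor.
        acc = r[m] - sum(a[j] * r[m - j] for j in range(1, m))
        km = acc / err               # K parameter of order m
        prev = a[:]
        a[m] = km
        for j in range(1, m):
            a[j] = prev[j] - km * prev[m - j]
        err *= 1.0 - km * km         # residual power shrinks by (1 - K^2)
        k.append(km)
        u.append(err / r[0])         # normalized by the frame power r[0]
    return k, u


def is_stable(k):
    """The synthesis filter is stable only if every |K| is below 1."""
    return all(abs(km) < 1.0 for km in k)


# Hypothetical autocorrelation values of a first-order process.
k, u = durbin([1.0, 0.5, 0.25], order=2)
print(k, u, is_stable(k))
```

Each K parameter falls out of the recursion as an interim value, and the normalized residual power is reduced by the factor (1 - K^2) at every order, which is why a residual power too small for the arithmetic to represent puts the stability check at risk.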
The speech analysis and synthesis is discussed in more detail in an article "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave" by B. S. ATAL and SUZANNE L. HANAUER, The Journal of the Acoustical Society of America, Vol. 50, Number 2 (Part 2), 1971, pp. 637 to 655.
The conventional speech analysis and synthesis apparatus of this kind has a very limited computational speed
due to the limitation on the scale of the apparatus allowed therefor. An arithmetic unit of limited accuracy, such as one based on a limited word length with a fixed decimal point, is usually employed for such apparatus. The normalized predictive residual power is relatively small in the voiced sound with high periodicity but relatively large in the unvoiced sound with low periodicity, and its value is lower as the analyzing order is higher (see the article by ATAL et al, FIG. 5 on page 642, for example).
The conventional speech analysis and synthesis apparatus has a synthesis filter of a fixed number of stages corresponding to the order of the linear predictor coefficient. Therefore, when a waveform of extremely high periodicity, i.e., of clear spectrogram structure, such as the stationary part of a voiced sound, is processed, the normalized predictive residual power tends to be smaller than the smallest significant value that can be handled by the above-mentioned limited accuracy arithmetic. More definitely, this means that the K parameters, which are given as a function of the normalized predictive residual power, tend to be |K| > 1, adversely affecting the stability of the synthesis filter. The window processing applied to successive prefixed lengths of sound waveform may help increase the normalized predictive residual power, because the window length rarely equals an integral multiple of the pitch period of the sound even if it is of high periodicity and, consequently, because the spectral structure of the sound waveform within a single window length has a lower clarity. Such increased normalized predictive residual power may help avoid the above-mentioned instability of the synthesis filter. However, the use of the window processing does not necessarily mean an increase in the predictive residual power sufficient to contribute to the stability of the synthesis filter, because a high-pitched voice sound, such as a female voice, has a sufficient periodicity within a very short window length to lower the predictive residual power.
When the linear predictor coefficient for the analysis is made to be of high order while the number of stages of the synthesizing digital filter is reduced to overcome such difficulty, the approximation of the spectral envelope of a less stationary speech sound or of the voiced sound having a relatively large predictive residual power compared with the arithmetic accuracy is considerably reduced, deteriorating the quality of the synthesized speech sound.
The calculation of the linear predictor coefficient under a high ambient noise involves errors since the signal wave to be analysed is the superposition of the ambient noise on the speech wave. The spectral envelope calculated from the linear predictor coefficient affected by the ambient noise is different from the spectral envelope of the original speech wave. Under the influence of the ambient noise, the linear predictor coefficient must be analyzed to remove the influence by the ambient noise. Such analysis is usually carried out by using an autocorrelation coefficient as follows. The autocorrelation coefficient $\rho_{(SN)(SN)\tau}$ of a noise-affected speech sound at a delay $\tau$ is given as
$$\rho_{(SN)(SN)\tau} = \sum_{i=0}^{N-1} (S_i + N_i)(S_{i+\tau} + N_{i+\tau})$$
where $S_0, S_1, S_2, \ldots$ are a series of samples of a speech sound wave; $N_0, N_1, N_2, \ldots$, a series of samples of a noise wave; $S_0+N_0, S_1+N_1, S_2+N_2, \ldots$, a series of samples of a noise-affected speech sound; N, the number of samples of a waveform to be analyzed; and i, the number of each sample. The right side of the above equation is rewritten in the form of the autocorrelation:
$$\rho_{(SN)(SN)\tau} = \rho_{(S)(S)\tau} + \rho_{(SN)(N)\tau} - \rho_{(N)(N)\tau} + \rho_{(N)(SN)\tau}$$
where
$$\rho_{(S)(S)\tau} = \sum_{i=0}^{N-1} S_i \cdot S_{i+\tau}$$
$$\rho_{(N)(N)\tau} = \sum_{i=0}^{N-1} N_i \cdot N_{i+\tau}$$
$$\rho_{(SN)(N)\tau} = \sum_{i=0}^{N-1} (S_i + N_i) \cdot N_{i+\tau}$$
$$\rho_{(N)(SN)\tau} = \sum_{i=0}^{N-1} N_i \cdot (S_{i+\tau} + N_{i+\tau})$$
Generalizing the delay $\tau$, $\rho_{(SN)(SN)\tau}$ is defined as the first autocorrelation coefficient and $(\rho_{(SN)(N)\tau} - \rho_{(N)(N)\tau} + \rho_{(N)(SN)\tau})$ is defined as the second autocorrelation coefficient. Under this definition, the autocorrelation of a speech sound is expressed as a difference between the first and second autocorrelation coefficients.
As described above, to obtain the parameter to correctly express only the feature of the speech sound under high ambient noise, the autocorrelation of the speech sound is expressed in terms of the difference between the first and second autocorrelation coefficients. More specifically, a conventional method employs an acoustic-to-electrical signal converting unit for noise detection as well as an acoustic-to-electrical signal converting unit for speech signal detection. With these units, the acoustic signal from a noise source and the acoustic signal from a speaker are detected as a synthesis acoustic signal while at the same time only the acoustic signal derived from the noise source is detected. Then, the autocorrelation coefficient of the noise-affected speech sound and the autocorrelation coefficient of the noise are measured. Following this, the correlation coefficient between the noise-affected speech signal and the noise is measured from the above two kinds of signals. Similarly, the correlation coefficient between the noise and the noise-affected speech signal is measured. Then, the autocorrelation coefficient of the speech sound signal is measured on the basis of the two autocorrelation coefficients, and the linear predictor coefficient is measured on the basis of the autocorrelation coefficient of the speech signal. In the conventional method, however, when the spatial distances from the noise source to the acoustic-to-electrical signal converters for signal detection and noise detection are different from each other, no linearity or analogy exists between the input signals to both converting units. Therefore, the relation established may be inaccurate among the autocorrelation coefficient of the speech signal relative to the autocorrelation coefficient of the noise-affected speech signal, the autocorrelation coefficient of the noise, the correlation coefficient between the noise-affected speech signal and the noise, and the correlation coefficient between the noise and the noise-affected speech signal.
As a result, there is a possibility that the autocorrelation coefficient measured of the speech sound at delay $\tau$ becomes larger than that of the sound per se. Specifically, when the autocorrelation value at delay $\tau$ is normalized to "1", the autocorrelation value of the speech sound measured at delay $\tau$ may be closer to "1", compared to that of the speech sound per se, and, as the case may be, it exceeds "1". When the autocorrelation value exceeds "1", the synthesizing filter with the coefficient which is the linear predictor coefficient calculated from the autocorrelation coefficient becomes unstable. This is seen, for example, from the fact that when the linear predictor coefficient is of first degree, the K parameter which is the interim parameter in the calculation of the linear predictor coefficient by the Durbin method exceeds "1".
The above-mentioned conventional method to obtain the linear predictor coefficient for the purpose of expressing correctly only the feature of the speech sound under the condition of high ambient noise has a disadvantage that the speech synthesis filter with the obtained linear predictor coefficient as its coefficient becomes unstable because of the influence of noise. As described above, the conventional method first measures the autocorrelation coefficient of the speech sound on the basis of the autocorrelation coefficient of the noise-affected speech sound, the autocorrelation of noise, the correlation coefficient between the noise-affected speech sound and noise, and the correlation coefficient between noise and the noise-affected speech sound, and then obtains the linear predictor coefficient depending on the autocorrelation coefficient measured of the speech sound.
Evidently, the conventional method suffers from the same disadvantage when the noise source has a spatially large volume, or when the transfer function in the acoustic area ranging from the noise source to the converter for speech sound detection is different from that in the acoustic area from the noise source to the converter for noise detection.
In the characteristic parameters of the speech sound obtained on the analysis side, the speech sound source information, particularly the normalized predictive residual power representative of the amplitude information or the complex parameter of a short time average power and a normalized predictive residual power, have a much larger rate of time variation than that of the linear predictor coefficient α or the K parameter. This arises from the fact that, while the K parameter representative of the reflection coefficient of the vocal tract depends on the cross sectional area of the vocal tract changing with muscular motion of a human and therefore slowly varies with time, the normalized predictive residual power U as expressed by
$$U = \prod_{i=1}^{p} (1 - K_i^2) \qquad (2)$$
where $K_i$ is the K parameter of i-th order and p is the number of order, is affected by the amplification of all the changes of the respective $K_i$'s and therefore its variation is complicated and steep.
For this reason, in the analysis of the parameter including the normalized predictive residual power, the analysis frame length must be set shorter than that of the analysis frame required for analyzing the other parameters such as the linear predictor coefficient and the like, resulting in the increase of transmission capacity.
Since the time variation of the parameters including the normalized predictive residual power is significant, the parameters are easily influenced by transmission error due to external and internal causes in the course of the transmission. Further, when the parameters are quantized they involve quantization error. When the normalized predictive residual power influenced by such errors is applied as the amplitude information of the original speech sound to the synthesizing filter, the reproducibility of the amplitude is, of course, poor. Specifically, in the conventional apparatus, the linear predictor coefficient is exactly coincident with the normalized predictive residual power representative of the spectral envelope of the speech sound on the analysis side, while, on the synthesis side, the normalized predictive residual power is largely influenced by the above errors but the linear predictor coefficient is little affected by errors. Therefore, the speech sound synthesized by using both the factors is poor in amplitude reproducibility.
SUMMARY OF THE INVENTION
Accordingly, an object of the invention is to provide a speech analysis and synthesis apparatus capable of making speech analysis and synthesis with high stability even when the normalized predictive residual power is below the limited accuracy of the apparatus, as in the stationary part of voiced sound.
Another object of the invention is to provide a speech analysis and synthesis apparatus which is stably operable even under high ambient noise.
Still another object of the invention is to provide a speech analysis and synthesis apparatus which can compensate for deterioration of the amplitude reproducibility due to quantization error and transmission error and is capable of making speech analysis and synthesis with high stability even when the amount of information to be transmitted is little.
According to the invention, the normalized residual power obtained on the analysis side is monitored and, when it falls below a predetermined value, either the synthesis filter is controlled to have the number of stages corresponding to the order reached in such a case, or the linear predictor coefficients of higher order than that are transmitted as zero, to thereby eliminate the instability of the synthesis filter. Further, the normalized residual power is obtained from the linear predictor coefficient on the synthesis side and is used to excite the synthesis filter to thereby prevent speech quality from being degraded due to quantization error and transmission error.
In one embodiment especially suitable for high ambient noise conditions, both a sound source and a noise source are employed and two different conversion and window processing channels are provided; one for noise-affected speech and the other for pure noise. Autocorrelations in each channel are performed along with correlations between channels, and the autocorrelations and correlations are then appropriately combined to provide an autocorrelation coefficient of the speech sound.
Other objects and features of the invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which:
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a block diagram of an ordinary speech analysis and synthesis apparatus;
FIG. 2 shows a block diagram of a part of the circuit shown in FIG. 1;
FIG. 3 shows a block diagram of the analysis side of a speech analysis and synthesis apparatus according to the invention;
FIG. 4 shows a block diagram of the synthesis side of the speech analysis and synthesis apparatus according to the invention;
FIG. 5 shows a block diagram of the analysis side of the apparatus which is another embodiment according to the invention;
FIG. 6 shows a block diagram of a speech analysis and synthesis apparatus which is another embodiment of the invention and includes the analysis side and synthesis side; and
FIG. 7 shows a block diagram of another example of a speech synthesizing digital filter.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Reference is first made to FIG. 1 illustrating an ordinary speech analysis and synthesis apparatus. In operation, a speech sound signal is applied through a waveform input terminal 100 to an analog to digital (A-D) converter 102. In the A-D converter 102, a high frequency component of the speech sound signal is filtered out by a low-pass filter with a cut-off frequency of 3,400 Hz and the filtered speech signal is sampled by sampling pulses of 8,000 Hz derived from terminal (a) of timing source 101. The sampled signal is then converted into a digital signal with 12 bits per sample for storage in a buffer memory 103. The buffer memory 103 temporarily stores the digitized speech wave for approximately one analysis frame period (for example, 20 msec) and supplies the speech wave stored for every one analysis frame period to a window processing memory 104, in response to the signal from the output terminal (b) of the timing source 101. The window processing memory 104 includes a memory capable of storing the speech wave of one analysis window length, for example, 30 msec, and stores the speech wave of the total of 30 msec: 10 msec of the speech wave transferred from the buffer memory 103 in the preceding frame, the 10 msec part being adjacent to the present frame, and the whole speech wave in the present frame transferred from the buffer memory 103. The window processing memory 104 then multiplies the speech wave stored by a window such as the Hamming window and then applies the multiplied speech wave to an autocorrelator 105 and a pitch picker 106.
The autocorrelator 105 calculates an autocorrelation coefficient in delay $\tau$ from a delay 1, for example, 125 μsec to a delay p, for example, 1250 μsec (p = 10), by using a speech wave representative of word code in accordance with the following equation (3):
$$\rho_\tau = \frac{\sum_{i=0}^{N-1-\tau} S_i \cdot S_{i+\tau}}{\sum_{i=0}^{N-1} S_i^2} \qquad (3)$$
Further, the autocorrelator 105 supplies to an amplitude signal instrument 108 the energy of the speech wave code word within one window length, that is, short time average power
$$P = \sum_{i=0}^{N-1} S_i^2.$$
A linear predictor coefficient instrument 107 measures the p K parameters and the normalized predictive residual power U from the autocorrelation coefficient supplied from the autocorrelator 105 by the method known as an autocorrelating method and distributes the K parameters measured to a quantizer 110 and the normalized predictive residual power U to an amplitude signal meter 108.
The amplitude signal meter 108 measures an exciting amplitude as $\sqrt{U \cdot P}$ from the short time average power P supplied from the autocorrelator 105 and the normalized predictive residual power U supplied from the linear predictor coefficient meter 107 and supplies the measured exciting amplitude to the quantizer 110.
The pitch picker 106 measures the pitch period from the voiced speech wave representing word code supplied from the window processing memory 104 by a known autocorrelation method, or the Cepstrum method, as described in an article "Automatic Speaker Recognition Based on Pitch Contours" by B. S. Atal, Ph.D. thesis, Polytech. Brooklyn (1968) and in an article "Cepstrum Pitch Determination" by A. M. Noll, J. Acoust. Soc. Amer., Vol. 41, pp. 293 to 309, Feb. 1967. The result of the measurement is applied as the pitch period information to the quantizer 110.
A voiced/unvoiced judging unit 109 judges voiced or unvoiced signal by a well known method using parameters such as K parameters measured by the linear predictor coefficient meter 107, and the normalized predictive residual power. This method is discussed in detail in an article "A Pattern Recognition Approach to Voiced-Unvoiced-Silence Classification with Application to Speech Recognition", IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-24, No. 3, June 1976.
The quantizer 110 quantizes K parameters $K_1, K_2, \ldots, K_p$ supplied from the linear predictor coefficient measuring unit 107, the exciting amplitude information $\sqrt{U \cdot P}$ fed from the amplitude signal meter 108, the judging information supplied from the voiced/unvoiced judging unit 109, and the pitch period information fed from the pitch picker 106, into 71 bits. With one bit derived from the output terminal (c) of the timing source 101 added to the 71 bit code for the transmission frame synchronization, the quantization output is transmitted in the form of 72 bit transmission frames through a transmission line 111.
The transmission line 111 is capable of transmitting data of 3600 bits/sec, for example, and carries the data, in 72 bit frames at a 20 msec frame period, i.e., at 3600 baud, to a demodulator 112.
The demodulator 112 detects the frame synchronizing bit of the data fed through the transmission line 111, and delivers demodulated K parameters to a K/α converter 113, the exciting amplitude information to a multiplier 114, the voiced/unvoiced decision information to a switch 115, and the pitch period information to an impulse generator 116.
The impulse generator 116 generates a train of impulses with the same period as the pitch period obtained from the pitch period information and supplies it to one of the fixed contacts of the switch 115. A noise generator 117 generates white noise for transfer to the other fixed contact of the switch 115. The switch 115 couples the impulse generator through the movable contact with the multiplier 114, when the voiced/unvoiced judging information indicates the voiced sound. On the other hand, when the judging information indicates the unvoiced sound, the switch 115 couples the noise generator 117 with the multiplier 114.
The multiplier 114 multiplies the impulse train or the white noise passed through the switch 115 by the exciting amplitude information, i.e., the amplitude coefficient, and sends the multiplied signal to an adder 118. The
9
4,301,329
10
ler 301.and to an amplitude signal instrument 108. The
adder 118 provides a summation of the output signal
from the mUltiplier 114 and the signal delivered from an
controller 301 checks to determine whether the normaladder 120 and delivers the sum to a one-sample-period
ized predictive residual power is larger than a predeterdelay 121 and a digital"to analog (D-A)converter 127.
mined value or not, corresponding to the limited accuThe delay 121 delays the input signal by one sampling 5 racy of the apparatus. When it is smaller than the predeperiod of the A-D converter 102 and sends the output
termined value, a calculation stop signal is applied to
signal to the multiplier 124 and to a one-sample-period
linear predictor coefficient instrument 107. Upon redelay 122. Similarly, the output signal of the one-samceipt of the calculation stop signal, the linear predictor
pIe-period delay 122 is applied to a multiplier 125 and
coefficient instrument 107 stops its calculation. When
the next stage one-sample-period delay. In a similar 10 no calculation stop signal is applied thereto, it calculates
manner, the output of the adder 118 is successively
the linear predictor coefficient of the second order and
delayed finally through one-sample-period delay 123
the normalized predictive residual power of the second
and then is applied to a multiplier 126.
order by using the autocorrelation coefficient representTne multiplier factors of the multipliers 124, 125 and
ing the waveform reproducibility, the predictor coeffi126 are determined by a parameters supplied from Kia IS cient of the first order, and the normalized predictive
converter 113. The result of the multiplication of each
residual powder of the first order. Succeedingly, the
multiplier is successively added in adders 119 and 120.
instrument 107 recursively calculates the linear predicThe Kia converter 113 converts K parameters to linear
tor coefficient until the controllel' 301 produces the
predictor coefficients al, az, a3, ... ap by the recursive
calculation stop signai. Alternatively, the maximum
method mentioned above, and delivers at to the multi- 20 predictor order Nt may be present to thereby stop the
plier 124, a2 to the multiplier 125, . . . and ap to the
calculation of the coefficient measuring unit 107 automultiplier 126.
matically when it calculates the maximum one Nl, reThe adders 118 to 120, the one-sample delays 121 to
gardless of the calculation stop signal, preventing the
123, and the multipliers 124 to 126 cooperate to form a
need for the increased order number for the linear prespeech sound synthesizing filter. The synthesized 25
dictor coefficient.
speech sound is converted into analog form by the D-A
If the measuring unit 107 stops its calculation after
converter 127 and then is passed through a low-pass
calculating the linear predictor coefficient of the N2
filter 128 of 3400 Hz so that the synthesized speech
order, the N2 order linear predictor coefficient is apsound is obtained at the speech sound output terminal
30 plied to a variable sage synthesis filter 40 in the synthe129.
sis side shown in FIG. 4. The controller 301 applies a
In the circuit thus far described, the speech analysis
variable filter control signal for controlling the number
part from the speech sound input terminal 100 to the
of the fIlter stages corresponding to the N2 order to the
quantitizing circuit 110 may be disposed at the transmitvariable stage synthesis filter 40. The filter coefficient of
ting side, the transmission line 111 may be constructed
by an ordinary telephone line, and the speech synthesis 35 the filter 40 is controlled by the linear predictor coefficient of the N2 order and the number of filter stages of
part from the demodulator 112 to the output terminal
the fIlter 40 is controlled by the variable stage filter
129 may be disposed at the receiving side.
control signal. Under such controls, the filter 40 is exThe autocorrelation measuring unit shown in FIG. 1
cited by an exciting signal and produces a synthesized
may be of the product-summation type shown in FIG.
2. With S(O), S(l), ... S(N-l) for the speech wave code 40 speech sound signal to the D-A converter 127. As
shown in FIG. 4, synthesis filter 40 is comprised of an
words which are input signals to the window processadder 118, adders 410 to 414 of the same number as the
ing memory (in the designation, N designates the numfilter stage number n previously set, multipliers 420 to
ber of sampling pulses within one window length),
424, one-sample delays 430 to 434 and switches for
wave data S(t) corresponding to one sampling pulse and
another wave data S(t+i) spaced by i sample periods
from the wave data S(t) are applied to a multiplier 201
of which the output signal is applied to an adder 202.
The output signal from the adder 202 is applied to a
register 203 of which the output is coupled with the
other input of the adder 202. Through the process in the
instrument shown in FIG. 2, the numerator components
of the autocorrelation coefficient ρτ shown in Eq. (3)
are obtained as the output signal from the coefficient
measuring unit 105 (the denominator component, i.e.,
the short time average power, corresponds to the output
signal at delay 0). The autocorrelation coefficient ρτ
is calculated by using these components in accordance
with the equation (3).
Turning now to FIGS. 3 and 4, there are shown block
diagrams of the analysis side and the synthesis side in
the apparatus of the invention. In these drawings, like
reference numerals denote like parts or portions in FIG.
1. Linear predictor coefficient instrument 107 calculates
the linear predictor coefficient of the first order and the
normalized predictive residual power from the autocorrelation
coefficient representing the reproducibility of a
waveform delivered from the autocorrelator 105. The
normalized predictive residual power is fed to a control-

controlling the number of filter stages. A control signal
fed from the controller 301 on the analysis side is demodulated
by a demodulator 112 on the synthesis side
and is then sent to the filter stage controller 401. The
controller 401, in response to the control signal, turns
on switches SW0 to SWN2 (in the drawing, SW4 is
expressed SWN2) and turns off the remaining switches.
With respect to the coefficient of the synthesis filter, the
K parameter of the N2 order calculated on the synthesis
side is converted into an α parameter by the K-α converter
113. The α parameter of the N2 order is applied
to the corresponding multiplier 420 to 424. In the drawing,
the α parameter corresponding to the N2 order is
applied to the multiplier 423 for setting the filter coefficient.
In place of the arrangement having the measuring
unit 107 supplying the linear predictor coefficient of the
N2 order and the controller 301 supplying the variable
stage synthesis filter control signal to the synthesis filter,
the linear predictor coefficient of the N3 order can
be always transferred and the linear predictor coefficients
from the (N2)+1 to the N3 order set to zero. In this
alternative, the use of the fixed stage synthesis filter of
N3 stages can attain approximately the same effect as
that attained by the variable stage synthesis filter.
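The equivalence between a variable stage filter and a fixed stage filter whose higher-order coefficients are set to zero can be illustrated with a minimal Python sketch. It assumes an all-pole recursion of the form s[t] = e[t] + sum over k of alpha[k]*s[t-k], which is a common convention and not necessarily the sign convention of the patent's figures. The function names, the coefficient values, and the orders N2 = 2 and N3 = 6 are hypothetical.

```python
def synthesize(excitation, alpha):
    """All-pole synthesis (assumed form): s[t] = e[t] + sum_k alpha[k-1] * s[t-k]."""
    out = []
    for t, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(alpha, start=1):
            if t - k >= 0:
                acc += a * out[t - k]  # one filter stage per coefficient
        out.append(acc)
    return out


def pad_to_fixed_order(alpha, n3):
    """Set the coefficients of orders N2+1 .. N3 to zero."""
    return list(alpha) + [0.0] * (n3 - len(alpha))


excitation = [1.0] + [0.0] * 15   # unit impulse as a stand-in excitation
alpha_n2 = [0.9, -0.5]            # hypothetical coefficients, N2 = 2

variable = synthesize(excitation, alpha_n2)                       # N2 stages
fixed = synthesize(excitation, pad_to_fixed_order(alpha_n2, 6))   # N3 = 6 stages
```

In this idealized model the two outputs coincide, because a stage whose coefficient is zero contributes nothing to the sum.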
In the above-mentioned example according to the
invention, when the normalized predictive residual
power of the high order exceeds the accuracy range in
the limited accuracy arithmetic because of high predictivity, as in the stationary part of voiced sound, the
controller 301 detects this and stops the calculation of the
linear predictor coefficients of the superfluous order.
The filter stage control signal corresponds to
the order where the normalized predictive residual
power is within the accuracy range of the apparatus.
Further, the linear predictor coefficient of a higher
order than that limiting order is treated as zero. As a result,
the speech sound may be stably synthesized at all times.
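The order-limiting behaviour described above can be sketched as a Levinson-Durbin recursion that stops early. This is only an illustration under stated assumptions: the patent does not give this code, the function name `limited_order_lpc` and the threshold value are hypothetical, and the choice to discard the order at which the residual power first drops below the threshold is one reading of the text.

```python
import math


def limited_order_lpc(r, max_order, threshold):
    """Levinson-Durbin recursion with early stopping (illustrative sketch).

    r         : autocorrelation values r[0] .. r[max_order]
    threshold : hypothetical lower limit for the normalized residual power
    Returns (coefficients padded with zeros to max_order, order used,
             normalized residual power at that order).
    """
    if r[0] <= 0.0:
        return [0.0] * max_order, 0, 1.0
    alpha = []
    power = 1.0  # normalized predictive residual power
    order = 0
    for m in range(1, max_order + 1):
        acc = r[m] - sum(alpha[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / (r[0] * power)           # K (reflection) parameter of order m
        new_power = power * (1.0 - k * k)
        if new_power < threshold:          # outside the usable accuracy range
            break
        alpha = [alpha[j] - k * alpha[m - 2 - j] for j in range(m - 1)] + [k]
        power = new_power
        order = m
    # Coefficients above the limiting order are treated as zero.
    return alpha + [0.0] * (max_order - order), order, power


# A pure sinusoid is almost perfectly predictable at order 2.
omega = 0.3
r = [math.cos(omega * k) for k in range(5)]
alpha, order, power = limited_order_lpc(r, max_order=4, threshold=1e-3)
```

For this highly predictable input the recursion stops after the first order, and the remaining three coefficients are returned as zeros.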
Turning now to FIG. 5, there is shown another embodiment of the speech analysis and synthesis apparatus
according to the invention which is operable stably
even under high ambient noise. FIG. 5 illustrates in
block form the construction of the analysis side as in
FIG. 3. In the figure, like reference numerals denote
like structural elements shown in FIG. 3. An acoustic
signal generated by a noise source 405 is applied to an
acoustic-to-electrical signal converter 501 and to another similar type converter 502, each of which may be
a microphone. The converter 501 converts a signal
mixed with acoustic signals generated by a speech
sound and noise source N into an electrical signal and
supplies the converted electrical signal to a window
processing memory 503, through an A-D converter 102
and a buffer memory 103. The converter 502 converts
the acoustic signal from the noise source into an electrical signal which in turn is applied to the window processing memory 504. The window processor 503 segments an electrical signal into windows such as rectangular windows or Hamming windows, and stores the
segmented signals and produces the stored data at the
fixed delay speech sound output terminal 505 and the
variable delay speech sound output terminal 506. The
window processing memory 504 segments an electrical
signal derived from the converter 502 into windows
such as rectangular windows or Hamming windows,
stores the segmented signals therein and then produces
them at the fixed delay noise output terminal 507 and
the variable delay noise output terminal 508. Correlation instrumental memories 509 to 512 measure the
correlation coefficients from delay 0 to T and store
them therein.
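The windowing and correlation measurement described above can be sketched in Python as follows. The sketch assumes that a correlation coefficient at delay tau is the sum of products of the fixed delay signal and the variable delay signal shifted by tau samples, mirroring a multiplier followed by an accumulating adder and register. The function names and the sample frame are hypothetical.

```python
import math


def hamming(n):
    """Hamming window of length n (standard 0.54 / 0.46 form)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed(frame, window):
    """Segment a frame by multiplying it with a window."""
    return [x * w for x, w in zip(frame, window)]


def correlation(fixed, variable, max_delay):
    """c[tau] = sum_t fixed[t] * variable[t + tau] for tau = 0 .. max_delay."""
    n = len(fixed)
    coeffs = []
    for tau in range(max_delay + 1):
        acc = 0.0                                   # running sum (adder plus register)
        for t in range(n - tau):
            acc += fixed[t] * variable[t + tau]     # multiplier output
        coeffs.append(acc)
    return coeffs


frame = [1.0, 2.0, 3.0, 4.0]
rect = [1.0] * len(frame)                 # rectangular window
segment = windowed(frame, rect)
coeffs = correlation(segment, segment, max_delay=2)   # [30.0, 20.0, 11.0]
```

Passing the same segment twice gives an autocorrelation; passing a speech segment and a noise segment gives a cross-correlation of the kind used in the noise-compensating embodiment.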
The correlation instrumental memory 509 measures
the autocorrelation coefficient of a noise-affected
speech sound signal from delay 0 to T by using a noise-affected speech sound signal which is derived from the
fixed delay speech sound output terminal and has no
delay relative to the output signal derived from the
variable delay speech sound output terminal 506, and by
using a noise-affected speech sound signal which is
derived from terminal 506 and has delays from 0 to T
relative to the output signal from the output terminal
505. The correlation instrumental memory 509 then
stores the autocorrelation coefficient measured. Similarly, the remaining correlation instrumental memories
510 to 512 measure, from delay 0 to T, the autocorrelation
coefficient of noise, the correlation
coefficient between a noise-affected speech sound and
noise, and the correlation coefficient between noise and
a noise-affected speech sound. Each memory stores the
coefficient it measures. A correlation
adder/subtractor 513 performs the following calculation on the three kinds of correlation coefficients
with respect to delays from 0 to T: (correlation coefficient between a noise-affected speech sound and noise) + (correlation coefficient between noise and a noise-affected speech sound) - (autocorrelation coefficient of
noise). The adder/subtractor 513 then applies the result
of the calculation as the second autocorrelation coefficient to a correlation subtractor 514. The correlation
subtractor 514 is supplied with the autocorrelation coefficient of the noise-affected speech sound stored in the
correlation instrumental memory 509. The autocorrelation coefficient in this case is treated as a first correlation coefficient. The second correlation coefficient is then subtracted from the first correlation coefficient linearly, nonlinearly, or linearly in a weighted manner. The result of the
subtraction is applied as a third correlation coefficient
to a linear predictive coefficient calculator 107. The
nonlinear and weighted linear subtraction methods
may be enumerated as follows:
Third correlation coefficient = first correlation coefficient - f(first correlation coefficient at delay 0, second correlation coefficient at delay 0) × second correlation coefficient
Third correlation coefficient = first correlation coefficient - f(τ) × second correlation coefficient
Third correlation coefficient = first correlation coefficient - f(first correlation coefficient at delay 0, second correlation coefficient at delay 0, τ) × second correlation coefficient
where τ represents a delay ranging from 0 to T; f
(first correlation coefficient at delay 0, second correlation coefficient at delay 0) is a function expressed by
K1 - K2·exp(-K3 × second correlation coefficient at
delay 0 / first correlation coefficient at delay 0); K1 to
K3 are constants; f(τ) is a function which monotonically
increases with τ and satisfies the relation 0
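The correlation arithmetic of the adder/subtractor 513 and the subtractor 514 can be sketched in Python as follows. This is an idealized illustration, not the patent's circuit: it assumes the reference converter picks up exactly the noise waveform that is mixed into the speech signal, and the function names, sample values, and constants K1 to K3 are hypothetical. Under that assumption, plain linear subtraction recovers the autocorrelation of the clean speech.

```python
import math


def correlation(a, b, max_delay):
    """c[tau] = sum_t a[t] * b[t + tau] for tau = 0 .. max_delay."""
    return [sum(a[t] * b[t + tau] for t in range(len(a) - tau))
            for tau in range(max_delay + 1)]


def second_coefficient(noisy, noise, max_delay):
    """(noisy-with-noise) + (noise-with-noisy) - (noise autocorrelation)."""
    xn = correlation(noisy, noise, max_delay)
    nx = correlation(noise, noisy, max_delay)
    nn = correlation(noise, noise, max_delay)
    return [p + q - s for p, q, s in zip(xn, nx, nn)]


def weight(first0, second0, k1, k2, k3):
    """f = K1 - K2 * exp(-K3 * second(0) / first(0)); constants are hypothetical."""
    return k1 - k2 * math.exp(-k3 * second0 / first0)


def third_coefficient(first, second, f=1.0):
    """third = first - f * second; f = 1.0 is the plain linear case."""
    return [r1 - f * r2 for r1, r2 in zip(first, second)]


speech = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
noise = [1.0, 1.0, -2.0, 0.0, 1.0, -1.0, 2.0, 0.0]
noisy = [s + n for s, n in zip(speech, noise)]   # idealized additive mixing

T = 3
first = correlation(noisy, noisy, T)
second = second_coefficient(noisy, noise, T)
linear_third = third_coefficient(first, second)          # equals the clean autocorrelation here
f = weight(first[0], second[0], k1=1.0, k2=0.5, k3=2.0)
weighted_third = third_coefficient(first, second, f)     # first weighted form
clean = correlation(speech, speech, T)
```

With real microphones the reference noise differs from the noise in the speech channel, which is presumably why the weighted and nonlinear forms are offered as alternatives to plain subtraction.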