Apple Computer Inc. v. Burst.com, Inc.

Filing 108

Declaration of Allen Gersho in Support of 107 Response of Burst.com, Inc.'s Opposition to Plaintiff Apple Computer, Inc.'s Motion for Summary Judgment on Invalidity Based on Kramer and Kepley Patents, filed by Burst.com, Inc. (Attachments: # 1 Exhibit A to A. Gersho Declaration, # 2 Exhibit B to A. Gersho Declaration, # 3 Exhibit C to A. Gersho Declaration, # 4 Exhibit D to A. Gersho Declaration, # 5 Exhibit E to A. Gersho Declaration, # 6 Exhibit F to A. Gersho Declaration (Part 1), # 7 Exhibit F to A. Gersho Declaration (Part 2), # 8 Exhibit G to A. Gersho Declaration, # 9 Exhibit H to A. Gersho Declaration, # 10 Exhibit I to A. Gersho Declaration (Part 1), # 11 Exhibit I to A. Gersho Declaration (Part 2), # 12 Exhibit J to A. Gersho Declaration, # 13 Exhibit K to A. Gersho Declaration, # 14 Exhibit L to A. Gersho Declaration, # 15 Exhibit M to A. Gersho Declaration, # 16 Exhibit N to A. Gersho Declaration, # 17 Exhibit O to A. Gersho Declaration) (Related document(s): 107) (Crosby, Ian) (Filed on 6/7/2007)

Apple Computer Inc. v. Burst.com, Inc., Doc. 108, Att. 12 (Case 3:06-cv-00019-MHP, Document 108-13, filed 06/07/2007)

HIERARCHICAL VECTOR QUANTIZATION OF SPEECH WITH DYNAMIC CODEBOOK ALLOCATION

Allen Gersho and Yair Shoham
Department of Electrical and Computer Engineering
University of California
Santa Barbara, CA 93106

(This work was supported in part by the Air Force Office of Scientific Research.)

ABSTRACT

This paper introduces a Hierarchical Vector Quantization (HVQ) scheme that can operate on "supervectors" of dimensionality in the hundreds of samples. HVQ is based on a tree-structured decomposition of the original supervector into a large number of low dimensional vectors. The supervector is partitioned into subvectors, the subvectors into minivectors, and so on. The "glue" that links subvectors at one level to the parent vector at the next higher level is a feature vector that characterizes the correlation pattern of the parent vector and controls the quantization of lower level feature vectors and ultimately of the final descendant data vectors. Each component of a feature vector is a scalar parameter that partially describes a corresponding subvector. The paper presents a three level HVQ for which the feature vectors are based on subvector energies. Gain normalization and dynamic codebook allocation are used in coding both the feature vectors and the final data subvectors. Simulation results demonstrate the effectiveness of HVQ for speech waveform coding at 9.6 and 16 kb/s.

1. INTRODUCTION

Speech waveforms may contain significant correlation over hundreds of samples, i.e., corresponding to basic sound units of duration 0.1 seconds or larger. Although vector quantization (VQ) is in principle the ideal way to encode a block or segment of this size, the associated computational and storage complexities are astronomically high. Existing suboptimal VQ schemes that trade performance for a reduced complexity are still limited to dimensionalities in the range of 10 to 20. For a review of current work in VQ, see [15] and [16].

The Hierarchical Vector Quantization (HVQ) scheme proposed in this paper can operate on vectors of dimensionality in the hundreds of samples. HVQ is particularly suitable for speech waveform coding because speech is characterized by large correlated segments, due to its quasi-periodicity and its slowly varying production mechanism. The correlation pattern within a speech segment, or some other feature of its internal structure, may be described by a properly defined feature vector of dimensionality much lower than that of the segment itself. This vector contains valuable side information about the segment, which can be used to efficiently code lower dimensional subsets of the segment.

HVQ is based on a tree-structured decomposition of the given large dimensionality input vector, called the supervector. The supervector is split into subvectors, each subvector is then split into lower dimensional minivectors, and so on until the lowest level subvectors, called cells, are obtained. At each level (except the bottom level), a feature vector is extracted from each data vector at that level. The feature vectors are themselves vector quantized and used to control the bit allocation for quantizing lower level feature vectors and ultimately the cells themselves. Each quantized feature vector contains a partial description of the data vector at that level and provides a vital unit of side information about the supervector characteristics. The bottom level is composed of low dimensional vectors, the cells, which are efficiently quantized using the acquired side information.

The essence of this scheme lies in the definition of the various feature vectors and in the way they are used in the coding process. It is crucial that these vectors be highly correlated and easy to extract from the data vectors. Also, the way vectors from different levels are interrelated during the coding process must be simple and efficient. This paper focuses on a three level HVQ using the Euclidean norm of a subvector as the basic feature. Normalization and codeword allocation are used in exploiting the side information during the coding process.

In Section 2, the system is introduced and explained in detail. Section 3 discusses the dynamic codeword allocation problem. Section 4 contains a discussion of the design of the various codebooks used in the system. Section 5 considers the capability of the system as a variable dimension VQ. Finally, Section 6 describes simulation results.
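To make the decomposition concrete, here is a minimal Python/NumPy sketch of the tree-structured split described above, using the parameters reported later in the paper (M = 16 subvectors, L = 12 cells, cell dimension k = 4). The function name and array layout are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def decompose(x, M, L):
    """Split a K-dimensional supervector into M subvectors of L cells each.

    Returns an array of shape (M, L, k) with k = K // (M * L).
    """
    K = x.shape[0]
    k = K // (M * L)
    assert M * L * k == K, "K must factor as M * L * k"
    return x.reshape(M, L, k)

# Example with the paper's reported parameters: M=16, L=12, k=4, K=768.
x = np.random.randn(768)
cells = decompose(x, M=16, L=12)
print(cells.shape)  # (16, 12, 4)
```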
2. THREE LEVEL HVQ - GENERAL DESCRIPTION

The proposed scheme is shown in Figure 1.

Figure 1. Three level HVQ scheme.

The high dimensional supervector X = (x_1, x_2, ..., x_K), where K is the dimension of the input supervector, is partitioned into M subvectors, X = (X_1, X_2, ..., X_M), where X_i denotes the i-th subvector. An M-dimensional feature vector, denoted by S = (s_1, ..., s_M), is defined and associated with the supervector on the basis of one scalar feature per subvector. In the proposed coder, this feature is simply the norm of the corresponding subvector, although other definitions are possible. s_i is therefore given by:

    s_i = \|X_i\| = \Bigl(\sum_{n} x_{in}^2\Bigr)^{1/2}    (1)

where the x_in are the components of X_i. Each subvector is further partitioned into L vectors of dimension k = K/(ML), called cells: X_i = (X_i1, X_i2, ..., X_iL), where X_ij is the j-th cell of the i-th subvector. An L-dimensional feature vector, denoted by P_i = (p_i1, p_i2, ..., p_iL), is associated with each subvector and consists of the corresponding cell norms. Thus, the norm of the j-th cell in the i-th subvector is given by:

    p_{ij} = \|X_{ij}\| = \Bigl(\sum_{n} x_{ijn}^2\Bigr)^{1/2}    (2)

The quantization is carried out in a hierarchical fashion. In the first level, the M-dimensional vector S is quantized to Ŝ using one first level codebook. In the second level, each L-dimensional vector P_i is normalized using ŝ_i and then quantized using a second codebook. In the third level, each cell is normalized by the corresponding quantized norm from the second level and quantized by a set of third level variable size codebooks.

The main idea of this quantization structure is that some features of the entire supervector characteristics are passed all the way down to every minivector, thereby influencing the way it is quantized. One way in which the quantization of an individual cell is influenced by the entire input speech segment is the normalization process mentioned above. The second way is through dynamic codebook allocation, whereby the coder uses the quantized vectors of norms to optimally allocate a codebook of an appropriate size to each cell while maintaining a fixed number of bits for the overall specification of the supervector.
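A small sketch of the feature extraction in equations (1) and (2), continuing the array layout of the previous sketch. It uses the identity s_i^2 = Σ_j p_ij^2, which ties the two feature levels together; names are illustrative.

```python
import numpy as np

def features(cells):
    """cells: (M, L, k) array from the decomposition sketch above."""
    P = np.linalg.norm(cells, axis=2)    # p_ij = ||X_ij||, eq. (2)
    S = np.sqrt((P ** 2).sum(axis=1))    # s_i = ||X_i||,  eq. (1)
    return S, P

# Usage, continuing the previous example:
# S, P = features(cells)   # S has shape (16,), P has shape (16, 12)
```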
3. THE CODEBOOK ALLOCATION PROBLEM

The codebook allocation takes place in the third level, where each cell is coded using a codebook of a different size. Through this operation (and the normalization process, to be discussed later), the coding of each individual cell is affected by the entire block pattern in such a way as to minimize the overall distortion.

The problem is to find the allocation rule, i.e., the mapping from the quantized versions of the norms, p̂_ij, to the corresponding codebook sizes, denoted by n_ij. The codebook sizes are more conveniently expressed by the required number of bits: r_ij = log2 n_ij. One basic constraint is imposed on these allocation values, namely:

    \sum_{i=1}^{M} \sum_{j=1}^{L} r_{ij} = R = K(b - b_s)    (3)

where R is the overall number of bits assigned to the main information (the quantized cells), b is the overall system bit rate, and b_s is the bit rate of the side information (the quantized norms). Both b and b_s are given in bits/sample.

Since there are practical limitations on the number and size of the third level codebooks, r_ij can only assume values from a given finite set:

    r_{ij} \in B = \{B_1, B_2, \ldots, B_q\}    (4)

Also, the allowed codebook sizes 2^{B_m}, m = 1, ..., q, must be non-negative integers. A reasonable, but not necessary, choice for the set B is a uniform set.

The optimal allocation rule aims at minimizing the average of a given distortion measure, subject to the above constraints. The distortion measure is defined over the entire supervector and denoted by d(X, X̂), where X̂ is the quantized supervector. For mathematical tractability we impose an additivity restriction on the distortion measure, requiring it to be decomposable in terms of subvectors and cells:

    d(X, \hat{X}) = \sum_{i=1}^{M} d_i(X_i, \hat{X}_i) = \sum_{i=1}^{M} \sum_{j=1}^{L} d_{ij}(X_{ij}, \hat{X}_{ij})    (5)

This also decouples different cells with respect to their contribution to the overall distortion, so that each cell can be assigned a measure d_ij independently of the other cells. It is easy, in this case, to achieve noise shaping by assigning the cells different weights.

The definition of the features as norms of subvectors is particularly suited to the following distortion measure:

    d(X, \hat{X}) = (X - \hat{X})^t W (X - \hat{X}) = \sum_{i=1}^{M} \sum_{j=1}^{L} w_{ij} \|X_{ij} - \hat{X}_{ij}\|^2    (6)

where W is a non-negative diagonal matrix whose elements on the main diagonal are the constants w_ij. The constants w_ij may be chosen to achieve a desired shaping of the quantization noise. If no shaping is desired, we set w_ij = 1 and get the usual squared Euclidean distance for the distortion measure. In the subsequent discussion we assume w_ij = 1 for simplicity; it can be shown that the only change required to account for the case w_ij ≠ 1 is to replace p_ij by w_ij^{1/2} p_ij.
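The additivity in (5)-(6) is easy to evaluate numerically. The sketch below computes the weighted distortion of equation (6) cell by cell; with all weights equal to one it reduces to the plain squared Euclidean distance. Function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def distortion(cells, cells_hat, w=None):
    """Weighted additive distortion of eq. (6), summed over all cells.

    cells, cells_hat: (M, L, k) original and quantized cells.
    w: optional (M, L) noise-shaping weights w_ij; None means w_ij = 1.
    """
    err = np.sum((cells - cells_hat) ** 2, axis=2)  # ||X_ij - X̂_ij||^2 per cell
    if w is not None:
        err = w * err                               # noise-shaping weights
    return float(err.sum())
```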
Using (6), the expected value of the distortion measure, D = E d(X, X̂), may be written in terms of conditional expectations as follows:

    D = E \sum_{i=1}^{M} \sum_{j=1}^{L} D_{ij}(\hat{p}_{ij}, r_{ij})    (7)

where

    D_{ij}(\hat{p}_{ij}, r_{ij}) = E\{\, \|X_{ij} - \hat{X}_{ij}\|^2 \mid \hat{p}_{ij}, r_{ij} \,\}    (8)

This expression is conditioned on the scalars p̂_ij and r_ij, which are randomly related to each other. Nevertheless, the allocation algorithm (to be determined) deterministically maps the set P̂ = {p̂_ij : i = 1,...,M, j = 1,...,L} to the set R = {r_ij : i = 1,...,M, j = 1,...,L}. To minimize (7), the double summation has to be minimized over all possible mappings, subject to the constraints imposed on r_ij. The optimization problem can now be stated as follows: given the set P̂, minimize Σ_i Σ_j D_ij(p̂_ij, r_ij) over all possible sets R, subject to Σ_i Σ_j r_ij = R and r_ij ∈ B.

To solve the problem, the conditional expectations D_ij(p̂_ij, r_ij) should be explicitly expressed in terms of p̂_ij and r_ij. This functional dependency, which is crucial to the allocation algorithm, is not given. However, a good estimate of it can be found, based on the following arguments.

As will be explained in the next section, the cell quantizer is of the gain-shape type, which means that the quantized cell is given in the form:

    \hat{X}_{ij} = \hat{p}_{ij} A_{ij}    (9)

where A_ij denotes a codevector from a shape codebook of size n_ij. Using (9) we get:

    D_{ij}(\hat{p}_{ij}, r_{ij}) = \hat{p}_{ij}^2 \, E\{\, \|X_{ij}/\hat{p}_{ij} - A_{ij}\|^2 \mid \hat{p}_{ij}, r_{ij} \,\}    (10)

We now assume that the conditional expectation in (10) is essentially independent of p̂_ij. This assumption, which has been verified experimentally, is heuristically explained by the fact that, due to the large long-term dynamic range of the speech waveform, the cell energy is essentially independent of the cell shape. Therefore, the performance of the shape codebook in coding the cell shape is almost independent of the gain of the cell. As a result, (10) can be rewritten as:

    D_{ij}(\hat{p}_{ij}, r_{ij}) = \hat{p}_{ij}^2 \, G(r_{ij})    (11)

G(r) represents the distortion of a normalized cell quantizer as a function of the codebook size (in bits). Values of G(r) versus r, for r ∈ B, are obtained during the codebook design. Experimental data show that G(r) can quite accurately be modeled as:

    G(r) = 2^{-\kappa r}    (12)

The parameter κ can be found from a best fit to the design data, and is approximately equal to 2/k. Using (12), the problem reduces to that of minimizing:

    \sum_{i=1}^{M} \sum_{j=1}^{L} \hat{p}_{ij}^2 \, 2^{-\kappa r_{ij}}    (13)

with respect to the set R, for a given set P̂, subject to (3) and (4).

We briefly outline two methods for solving the problem. In the first method, the problem is initially solved without the constraint (4), i.e., the r_ij are assumed to be continuous, non-negative values. This problem is solved using variational calculus techniques; see [1]. The resulting r_ij's are then quantized to the nearest value in the set B, as required by (4). This last operation may violate the constraint (3), but, by properly constructing the set B, the effect of this violation can be made negligible.

The second method is based on the "marginal returns" approach [2] and provides an exact solution. It works in the case where the set B is uniform, i.e., B = {0, t, 2t, ..., qt}, where t is the bit allocation increment. The process goes as follows: the r_ij are initialized to zero and the marginal returns are found for all i, j; the r_ij for which the marginal return is maximum is incremented by t bits. These steps are repeated until (3) is satisfied.
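The marginal returns method lends itself to a short greedy implementation. The sketch below assumes the uniform set B = {0, t, 2t, ..., qt} and the exponential model of equation (12); kappa, t and q are illustrative parameters (kappa ≈ 2/k gives 0.5 for the paper's k = 4), and the function name is not from the paper.

```python
import numpy as np

def allocate_bits(p_hat, R, t=1, q=8, kappa=0.5):
    """Greedy marginal-returns allocation over B = {0, t, ..., q*t}.

    p_hat: (M, L) quantized cell norms; R: total bit budget of eq. (3).
    Returns r: (M, L) bits per cell, each a multiple of t, summing to <= R.
    """
    r = np.zeros(p_hat.shape)
    d = p_hat.astype(float) ** 2        # modeled distortion at r = 0, eq. (11)
    shrink = 2.0 ** (-kappa * t)        # distortion factor per t extra bits, eq. (12)
    budget = R
    while budget >= t:
        gain = d * (1.0 - shrink)       # marginal return of t more bits per cell
        gain[r >= q * t] = -np.inf      # cell already at the largest codebook B_q
        if not np.isfinite(gain).any():
            break                       # every cell saturated
        ij = np.unravel_index(np.argmax(gain), gain.shape)
        r[ij] += t
        d[ij] *= shrink
        budget -= t
    return r
```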
4. THE CODEBOOK DESIGN

Three codebooks must be designed. The first level codebook is used in coding the vector S. The second level codebook codes each of the vectors P_i, i = 1,...,M, and provides the set P̂ which determines the allocation set R. A set of q third level codebooks of sizes B_1, B_2, ..., B_q (in bits) is used to code the cells X_ij. This set constitutes the dynamically controlled, variable size third level codebook.

The first level codebook is designed as a standard VQ using the LBG algorithm [3] with the Euclidean distance for the distortion measure.

Both the second and the third level quantizers are based on a special form of gain-shape quantizer [4] in which the gain is provided externally and is not subject to optimization. Figure 2 depicts the structure of such a quantizer.

Figure 2. Gain normalized vector quantizer.

As shown, the quantized version of the input vector X is given by:

    \hat{X} = a A_j    (14)

where a, an input gain value, is supplied by an external quantizer, and A_j represents a codevector from a shape codebook. The design objective is to optimize this shape codebook taking into account the statistical relationship of the supplied gain to the input vector. The basic idea behind this structure is the utilization of the information carried by a about the input vector to improve the coder performance.

The design of this coder is carried out using an algorithm similar to the LBG algorithm. The coding (partition) rule and the centroid calculation are modified to account for the special situation, but the logical flow of the algorithm is exactly the same as described in [3]. The coder has, in fact, two inputs, X and a, which may be combined into one augmented input vector (X, a). The coding rule is defined over the space of the augmented input. Let us denote by A_j the j-th codevector in a codebook of size N. The optimal coding rule is then given by:

    j^* = \arg\min_{1 \le j \le N} \|X - a A_j\|^2    (15)

where j* is the index of the best codeword, given the input (X, a). This coding rule implies a partition of the augmented space, whose individual subsets are denoted by:

    C_j = \{\, (X, a) : A_j \text{ is chosen} \,\}    (16)

With the aid of (16) we define, for a function φ of the augmented input, the conditional expectation:

    \bar{\varphi}_j = E\{\, \varphi(X, a) \mid C_j \,\}    (17)

Then it can be shown that the centroid, or the optimal codeword, is given by:

    A_j = \frac{E\{\, a X \mid C_j \,\}}{E\{\, a^2 \mid C_j \,\}}    (18)

Note that, unlike [4], we do not impose the constraint that the shape vector have unit norm. Rather, an optimal normalization process determines the shape norms, allowing improved performance. Expressions (15) and (16), which state necessary conditions for optimality, are used in the LBG design procedure to get the optimal shape codebooks of the second and third levels.

The codebook design is usually based on a training set which is assumed to adequately represent the random process X. The conditional expectations are estimated from the training set by:

    E\{\, a X \mid C_j \,\} \approx \frac{1}{N_j} \sum_{(X,a) \in C_j} a X    (19)

    E\{\, a^2 \mid C_j \,\} \approx \frac{1}{N_j} \sum_{(X,a) \in C_j} a^2    (20)

where N_j is the number of training vectors in C_j.

To avoid excessive coding complexity, in cases where the required shape codebook is too large, a suboptimal low complexity VQ method can be incorporated. In particular, it is convenient in this case to use the multi-stage VQ approach [5]. It can be shown that equations (10) through (12) still hold, so the allocation algorithm does not change.

The training set used in the codebook design is a large set of augmented vectors. This means that a set of quantized gains has to be prepared before designing the second and third level codebooks.
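One iteration of the modified LBG design can be written compactly. The sketch below applies the coding rule (15) and the centroid update (18), with the expectations replaced by the training-set averages of (19)-(20); note that the common 1/N_j factor cancels in the ratio. All names are illustrative, and this is a sketch of the described procedure rather than the authors' implementation.

```python
import numpy as np

def lbg_step(X, a, A):
    """One design iteration for the externally-gained shape codebook.

    X: (n, k) training cells, a: (n,) supplied gains, A: (N, k) shape codebook.
    Returns the updated codebook and the partition index of each input.
    """
    # Coding rule (15): nearest codeword under ||x - a * A_j||^2.
    err = X[:, None, :] - a[:, None, None] * A[None, :, :]  # (n, N, k)
    j = np.argmin((err ** 2).sum(axis=2), axis=1)           # best index per input
    # Centroid update (18), via the sample averages (19)-(20);
    # the common 1/N_j factor cancels in the ratio.
    A_new = A.copy()
    for m in range(A.shape[0]):
        sel = (j == m)
        if sel.any():
            A_new[m] = (a[sel, None] * X[sel]).sum(axis=0) / (a[sel] ** 2).sum()
    return A_new, j
```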
The overall system design is therefore conducted in the following sequence:

(1) Given a training set of K-dimensional supervectors, prepare an M-dimensional training set of vectors of norms by calculating the norms of the corresponding subvectors.

(2) Design the first level codebook using the above training set, and store the resulting training set of quantized M-dimensional norm vectors.

(3) Prepare a training set of L-dimensional norm vectors by splitting each subvector into L cells and calculating the L subnorms.

(4) Design the second level codebook using the training set of step (3) and the quantized norms (gains) of step (2). Store the resulting training set of quantized subnorm vectors.

(5) Prepare a training set of cells by splitting the subvectors into L k-dimensional cells.

(6) Design the third level codebooks using the training set of step (5) and the quantized gains of step (4). Note that q such codebooks have to be designed, one for each of the B_1, B_2, ..., B_q codebook sizes.

5. VARIABLE DIMENSION VQ

The HVQ scheme provides an easy way of implementing variable dimension vector quantization (VDVQ). This is because the major difficulty in VDVQ, maintaining a fixed bit rate, is readily solved here through the partitioning and codeword allocation processes. The parameters L, M and k are kept fixed to allow for a maximum dimension of K_max = kLM. If the actual dimension K is less than K_max, the missing data points are simply assigned zero values, as sketched below. Unlike the usual VQ schemes, bits are not wasted on coding the zero valued components, since they define cells with zero norm and, by means of the feature vectors, are assigned zero bits and are not transmitted. An additional small amount of side information is transmitted to inform the receiver of the actual segment size K. To keep the bit rate fixed, the allocation algorithm is adjusted to the new dimension by changing only the parameter R. VDVQ is important in the context of speech waveform coding: it enables an efficient segmentation of the waveform into its natural variable-length speech units.
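A minimal sketch of the zero-padding step referenced above, assuming K_max = kLM. Padded cells have zero norm, so the allocation rule of Section 3 gives them zero bits; the actual length K is sent as side information. Names are illustrative.

```python
import numpy as np

def pad_segment(x, K_max):
    """Zero-pad a segment of actual length K <= K_max up to K_max.

    Returns the padded segment and K, which must be sent as side information.
    """
    K = x.shape[0]
    assert K <= K_max, "segment longer than the maximum dimension"
    return np.concatenate([x, np.zeros(K_max - K)]), K
```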
6. SIMULATION RESULTS

To demonstrate the operation of the system we constructed a special training set with highly correlated vectors of norms. This was done by segmenting a speech waveform into variable size "steady state" segments and transforming each segment to the frequency domain using the discrete cosine transform. Since the smoothed short-term power spectrum of a speech segment belongs to a very limited family of possible shapes, the resulting vectors are highly correlated. The speech material was composed of 3 male and 3 female voices. The full speech material was 75 sec long and was sampled at a rate of 8000 samples/sec.

Three-stage codebooks [5] were used to reduce the coding complexity. In the simulations, the parameters M, L and k were 16, 12 and 4, respectively. The segment length varied in the range from 256 to 768 data points. About 30% of the total number of bits was allocated to the side information. The system was tested at two different bit rates: 2.0 and 1.2 bits/sample. Preliminary simulations without any system optimization gave the following results in terms of MSE signal to quantization noise ratio: 20 dB at 16.0 kb/s and 15 dB at 9.6 kb/s.

Although the SNR is not necessarily indicative of speech perceptual quality, it enables quick comparison with other coding systems. The use of noise shaping allows an improved perceptual quality that is not indicated by the SNR values. Informal listening tests indicate that good communication quality is achieved at 9.6 kb/s and very nearly toll quality at 16 kb/s. References [6]-[16] describe several representative speech waveform coding systems for which the SNR is reported; in particular, [12]-[16] refer to VQ based systems. Based on the above preliminary results, the HVQ scheme outperforms all of these systems and thus appears to be very promising.

REFERENCES

[1] A. Segall, "Bit Allocation and Encoding for Vector Sources", IEEE Trans. on Information Theory, Vol. IT-22, No. 2, pp. 162-169, Mar. 1976.
[2] B. Fox, "Discrete Optimization via Marginal Analysis", Management Science, Vol. 13, pp. 210-216, Nov. 1966.
[3] Y. Linde, A. Buzo, R.M. Gray, "An Algorithm for Vector Quantizer Design", IEEE Trans. on Communications, Vol. COM-28, pp. 84-95, Jan. 1980.
[4] M.J. Sabin and R.M. Gray, "Product Code Vector Quantizers for Speech Waveform Coding", Conf. Rec. Globecom 82, pp. 1087-1091, Dec. 1982.
[5] B.H. Juang, A.H. Gray Jr., "Multiple Stage Vector Quantization for Speech Coding", Proc. Int. Conf. ASSP, pp. 597-600, Paris, May 1982.
[6] R.V. Cox, R.E. Crochiere, "Real Time Simulation of Adaptive Transform Coding", IEEE Trans. on Acoust., Speech, Sig. Proc., Vol. ASSP-29, No. 2, Apr. 1981.
[7] N.S. Jayant, "Pitch Adaptive DPCM Coding of Speech With Two Bit Quantization and Fixed Spectrum Prediction", BSTJ, pp. 440-454, March 1977.
[8] C.S. Xydeas, C.C. Evci, R. Steele, "Sequential Adaptive Predictors for ADPCM Speech Encoders", IEEE Trans. on Communications, Vol. COM-30, No. 8, pp. 1942-1954, Aug. 1982.
[9] R.E. Crochiere, S.A. Webber, J.L. Flanagan, "Digital Coding of Speech in Sub-bands", BSTJ, Vol. 55, pp. 1069-1085, Oct. 1976.
[10] L.C. Stewart, R.M. Gray, Y. Linde, "The Design of Trellis Waveform Coders", IEEE Trans. on Communications, Vol. COM-30, pp. 702-710, Apr. 1982.
[11] R. Zelinski, P. Noll, "Adaptive Transform Coding of Speech Signals", IEEE Trans. on Acoust., Speech, Sig. Proc., Vol. ASSP-25, No. 4, Aug. 1977.
[12] R.M. Gray, H. Abut, "Full Search and Tree Search Vector Quantization of Speech Waveforms", Proc. ICASSP-82, pp. 593-596, Paris, May 1982.
[13] H. Abut, R.M. Gray, "Vector Quantization of Speech and Speech-Like Waveforms", IEEE Trans. on ASSP, Vol. ASSP-30, pp. 423-435, June 1982.
[14] V. Cuperman and A. Gersho, "Adaptive Differential Vector Coding of Speech", Conf. Rec., IEEE Global Communications Conf., Miami, Dec. 1982, pp. 1092-1096.
[15] A. Gersho and V. Cuperman, "Vector Quantization: A Pattern-Matching Approach to Speech Coding", IEEE Communications Magazine, to appear, December 1983.
[16] R.M. Gray, "Vector Quantization", IEEE ASSP Magazine, to appear, April 1984.


