Apple Computer Inc. v. Burst.com, Inc.

Filing 108

Declaration of Allen Gersho in Support of 107 Response of Burst.com, Inc.'s Opposition to Plaintiff Apple Computer, Inc.'s Motion for Summary Judgment on Invalidity Based on Kramer and Kepley Patents, filed by Burst.com, Inc. (Attachments: # 1 Exhibit A to A. Gersho Declaration, # 2 Exhibit B to A. Gersho Declaration, # 3 Exhibit C to A. Gersho Declaration, # 4 Exhibit D to A. Gersho Declaration, # 5 Exhibit E to A. Gersho Declaration, # 6 Exhibit F to A. Gersho Declaration (Part 1), # 7 Exhibit F to A. Gersho Declaration (Part 2), # 8 Exhibit G to A. Gersho Declaration, # 9 Exhibit H to A. Gersho Declaration, # 10 Exhibit I to A. Gersho Declaration (Part 1), # 11 Exhibit I to A. Gersho Declaration (Part 2), # 12 Exhibit J to A. Gersho Declaration, # 13 Exhibit K to A. Gersho Declaration, # 14 Exhibit L to A. Gersho Declaration, # 15 Exhibit M to A. Gersho Declaration, # 16 Exhibit N to A. Gersho Declaration, # 17 Exhibit O to A. Gersho Declaration) (Related document(s) 107) (Crosby, Ian) (Filed on 6/7/2007)

Apple Computer Inc. v. Burst.com, Inc., Doc. 108 Att. 6

VECTOR QUANTIZATION

Robert M. Gray

A vector quantizer is a system for mapping a sequence of continuous or discrete vectors into a digital sequence suitable for communication over or storage in a digital channel. The goal of such a system is data compression: to reduce the bit rate so as to minimize communication channel capacity or digital storage memory requirements while maintaining the necessary fidelity of the data. The mapping for each vector may or may not have memory in the sense of depending on past actions of the coder, just as in well-established scalar techniques such as PCM, which has no memory, and predictive quantization, which does. Even though information theory implies that one can always obtain better performance by coding vectors instead of scalars, scalar quantizers have remained by far the most common data compression system because of their simplicity and good performance when the communication rate is sufficiently large. In addition, relatively few design techniques have existed for vector quantizers. During the past few years several design algorithms have been developed for a variety of vector quantizers, and the performance of these codes has been studied for speech waveforms, speech linear predictive parameter vectors, images, and several simulated random processes. It is the purpose of this article to survey some of these design techniques and their applications.

Data compression is the conversion of a stream of analog or very high rate discrete data into a stream of relatively low rate data for communication over a digital communication link or storage in a digital memory. As digital communication and secure communication have become increasingly important, the theory and practice of data compression have received increased attention. While it is true that in many systems bandwidth is relatively inexpensive, e.g., fiber optic and cable TV links, in most systems the growing amount of information that users wish to communicate or store necessitates some form of compression for efficient, secure, and reliable use of the communication or storage medium. A prime example arises with image data, where simple schemes require bit rates too large for many communication links or storage devices. Another example where compression is required results from the fact that if speech is digitized using a simple PCM system consisting of a sampler followed by scalar quantization, the resulting signal will no longer have a small enough bandwidth to fit on ordinary telephone channels. That is, digitization (which may be desirable for security or reliability) causes bandwidth expansion. Hence data compression will be required if the original communication channel is to be used.

The two examples of image compression and speech compression or, as they are often called, image coding and speech coding, are probably the currently most important applications of data compression. They are also among the most interesting for study because experience has shown that both types of data exhibit sufficient structure to permit considerable compression with sufficiently sophisticated codes. Such conversion of relatively high rate data to lower rate data virtually always entails a loss of fidelity or an increase in distortion.
Hence a fundamental goal of data compression is to obtain the best possible fidelity for the given rate or, equivalently, to minimize the rate required for a given fidelity. If a system has a sufficiently high rate constraint, then good fidelity is relatively easy to achieve, and techniques such as PCM, transform coding, predictive coding, and adaptive versions of these techniques have become quite popular because of their simplicity and good performance [1,2,3]. All of these techniques share a fundamental property: the actual quantization or coding or conversion of continuous quantities into discrete quantities is done on scalars, e.g., on individual real-valued samples of waveforms or pixels of images. PCM does this in a memoryless fashion; that is, each successive input is encoded using a rule that does not depend on any past inputs or outputs of the encoder. Transform coding does it by first taking block transforms of a vector and then scalar coding the coordinates of the transformed vector. Predictive coding does it by quantizing an error term formed as the difference between the new sample and a prediction of the new sample based on past coded outputs.

A fundamental result of Shannon's rate-distortion theory, the branch of information theory devoted to data compression, is that better performance can always be achieved by coding vectors instead of scalars, even if the data source is memoryless, e.g., consists of a sequence of independent random variables, or if the data compression system can have memory, i.e., the action of an encoder at each time is permitted to depend on past encoder inputs or outputs [4,5,6,7,8]. While some traditional compression schemes such as transform coding operate on vectors and achieve significant improvement over PCM, the quantization is still accomplished on scalars, and hence these systems are, in a Shannon sense, inherently suboptimal: better performance is always achievable in theory by coding vectors instead of scalars, even if the scalars have been produced by preprocessing the original input data so as to make them uncorrelated or independent!

This theory has had a limited impact on actual system design because 1) the Shannon theory does not provide constructive design techniques for vector coders, and 2) traditional scalar coders often yield satisfactory performance with enough adaptation and fine tuning. As a result, few design techniques for vector quantizers were considered in the literature prior to the late 1970's, when it was found that a simple algorithm of Lloyd [9] for the iterative design of scalar quantization or PCM systems extended in a straightforward way to the design of memoryless vector quantizers, that is, of vector quantizers which encode successive input vectors in a manner not depending on previous encoder input vectors or their coded outputs. Variations of the basic algorithm have since proved useful for the design of vector quantizers with and without memory for a variety of data sources including speech waveforms, speech parameter vectors, images, and several random process models, the latter being useful for gauging the performance of the resulting codes against the optimal performance bounds of information theory.
This paper is intended as a survey of the basic design algorithm and many of its variations and applications. We begin with the simplest example of a memoryless vector quantizer, a vector generalization of PCM. For convenience we use the shorthand VQ for both vector quantization and vector quantizer. Necessary properties of optimal quantizers are described, and an algorithm is given which uses these properties to iteratively improve a code. For concreteness, we focus on two examples of distortion measures: the ubiquitous mean-squared error and the Itakura-Saito distortion. The first example, which is popular in waveform coding applications, provides a geometric flavor to the development; the second example, which is useful in voice coding applications, helps to demonstrate the generality and power of the technique.

Next, various techniques are described for designing the initial codes required by the algorithm. These techniques also indicate some useful structure that can be imposed on vector quantizers to make them more implementable. Several variations of the basic VQ are described which permit reduced complexity or memory or both at the expense of a hopefully tolerable loss of performance. These include tree-searched codes, product codes, and multistep codes.

We then turn from memoryless vector quantizers to those with memory: feedback vector quantizers such as vector predictive quantizers and finite-state vector quantizers. These codes are not yet well understood, but they possess a structure highly suited to VLSI implementation, and initial studies suggest that they offer significant performance gains. For comparison, we also briefly describe trellis encoding systems, or "lookahead" or "delayed decision" or "multipath search" codes, which use the same decoder as a feedback vector quantizer but which permit the encoder to base its decision on a longer input data sequence. A final general code structure is described which uses vector quantization to adapt a waveform coder, which may be another VQ.

We next present a variety of simulation results describing the performance of various VQ systems on various data sources. Examples of all of the above VQ varieties are tested for waveform coding applications on two common data sources: a Gauss Markov source and real sampled speech. One bit per sample coders for these sources are compared on the basis of performance, memory requirements, and computational complexity. Both memoryless and simple feedback vector quantizers are studied for voice coding applications at a rate of 0.062 bits/sample and less, and for image coding at a rate of 0.5 bit per sample. One example is given of a simple adaptive predictive vector quantizer for speech waveform coding.

By studying a variety of coding systems on common data sources, the results yield some general comparisons and trends among the various vector quantization techniques. The reader should, however, keep two caveats in mind when interpreting such quantitative results: First, the emphasis here is on low bit rate systems, e.g., speech coders using 1 bit per sample or less and image coders using 1/2 bit per pixel. Comparisons favoring certain systems at such low rates may not be valid for the same systems at higher rates. Second, the numbers reported here are intended to provide comparisons for different systems used on common data sources; they can be compared with other numbers reported in the literature only with great care: the input data and the system design parameters such as sampling rate and pre- or post-filtering may be quite different.

Applications of vector quantization to real data sources such as sampled speech waveforms and images are still young, and the algorithms do not yet incorporate the sophisticated "bells and whistles" of many well-established scalar quantization schemes. The preliminary experiments described here, using fairly simple vector quantizers with and without memory, demonstrate that the general approach holds considerable promise for some applications. For example, good quality vocoding systems using VQ and the Itakura-Saito distortion have been developed at 800 bits per second, a significant reduction in the bit rate previously required for comparable quality [10]. While the compression achieved so far in waveform coding and image coding applications using the squared-error distortion has not yet been as significant, we believe that it has yielded comparable or better performance at low rates than traditional scalar schemes of greater complexity. The quality of the 1/2 bit per pixel images shown here is promising given the simplicity of the coding scheme used.

We attempt to use a minimum of mathematics and a maximum of English in the presentation so as to focus on the intuitive ideas underlying the design and operation of vector quantizers. The detailed descriptions of the various algorithms can be found in the cited references. The reader is also referred to a recent tutorial by Gersho and Cuperman [11], which presents a brief overview of VQ applied to speech waveform coding.
MEMORYLESS VECTOR QUANTIZERS

In this section we introduce the basic definition of memoryless vector quantizers, their properties, and an algorithm for their design.

Quantization

Mathematically, a k-dimensional memoryless vector quantizer or, simply, a VQ (without modifying adjectives) consists of two mappings: an encoder γ which assigns to each input vector x = (x_0, x_1, ..., x_{k-1}) a channel symbol γ(x) in some channel symbol set M, and a decoder β assigning to each channel symbol u in M a value in a reproduction alphabet Â. The channel symbol set is often assumed to be a space of binary vectors for convenience, e.g., M may be the set of all 2^R binary R-dimensional vectors. The reproduction alphabet may or may not be the same as the input vector space; in particular, it may consist of real vectors of a different dimension. If M has M elements, then the quantity R = log_2 M is called the rate of the quantizer in bits per vector, and r = R/k is the rate in bits per symbol or, when the input is a sampled waveform, bits per sample.

The application of a quantizer to data compression is depicted in the standard Fig. 1. The input data vectors might be consecutive samples of a waveform, consecutive parameter vectors in a voice coding system, or consecutive rasters or subrasters in an image coding system. For integer values of R it is useful to think of the channel symbols, the encoded input vectors, as binary R-dimensional vectors. As is commonly done in information and communication theory, we assume that the channel is noiseless, that is, that Û_n = U_n. While real channels are rarely noiseless, the joint source and channel coding theorem of information theory implies that a good data compression system designed for a noiseless channel can be combined with a good error correction coding system for a noisy channel in order to produce a complete system. In other words, the assumption of a noiseless channel is made simply to focus on the problem of data compression system design and not to reflect any practical model.

[Figure 1. Data Compression System. The data or information source {X_n; n = 0, 1, ...} is a sequence of random vectors. The encoder produces a sequence of channel symbols {U_n; n = 0, 1, 2, ...}, which is delivered to the receiver by the digital channel as {Û_n; n = 0, 1, 2, ...}. The decoder then maps this sequence into the final reproduction sequence {X̂_n; n = 0, 1, 2, ...}.]

Observe that unlike scalar quantization, general VQ permits fractional rates in bits per sample. For example, scalar PCM must have a bit rate of at least 1 bit per sample, while a k-dimensional VQ can have a bit rate of only 1/k bits per sample by having only a single binary channel symbol for k-dimensional input vectors.
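To make the encoder γ, decoder β, and rate definitions above concrete, the following is a minimal Python sketch (not part of the article); the class name, codebook values, and dimensions are illustrative assumptions, and the encoder uses the squared-error nearest-neighbor rule discussed below.

```python
import numpy as np

class MemorylessVQ:
    """Minimal k-dimensional memoryless vector quantizer (illustrative sketch).

    The encoder maps an input vector to the index (channel symbol) of the
    nearest codeword; the decoder maps an index back to its codeword.
    """

    def __init__(self, codebook):
        self.codebook = np.asarray(codebook, dtype=float)      # shape (M, k)
        self.M, self.k = self.codebook.shape
        self.rate_per_vector = np.log2(self.M)                 # R = log2 M bits per vector
        self.rate_per_sample = self.rate_per_vector / self.k   # r = R / k bits per sample

    def encode(self, x):
        """Nearest-neighbor (minimum squared-error) encoder."""
        d = np.sum((self.codebook - x) ** 2, axis=1)
        return int(np.argmin(d))

    def decode(self, u):
        """Table-lookup decoder."""
        return self.codebook[u]

# Example: a 2-dimensional, 4-word codebook, so R = 2 bits/vector and r = 1 bit/sample.
vq = MemorylessVQ([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
u = vq.encode(np.array([0.7, -1.2]))
print(u, vq.decode(u), vq.rate_per_sample)
```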
The goal of such a quantization system is to produce the "best" possible reproduction sequence for a given rate R. To quantify this idea, to define the performance of a quantizer, and to complete the definition of a quantizer, we require the idea of a distortion measure.

Distortion

A distortion measure d is an assignment of a cost d(x, x̂) to reproducing any input vector x as a reproduction vector x̂. Given such a distortion measure, we can quantify the performance of a system by an average distortion E d(X, X̂) between the input and the final reproduction: a system will be good if it yields a small average distortion. In practice, the important average is the long term sample average or time average

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} d(X_i, X̂_i)    (1)

provided, of course, that the limit makes sense. If the vector process is stationary and ergodic, then, with probability one, the limit exists and equals the expectation E d(X, X̂). For the moment we will assume that such conditions are met and that such long term sample averages are given by expectations. Later remarks will focus on the general assumptions required and their implications for practice.

Ideally a distortion measure should be tractable to permit analysis, computable so that it can be evaluated in real time and used in minimum distortion systems, and subjectively meaningful so that large or small quantitative distortion measures correlate with bad and good subjective quality. Here we do not consider the difficult and controversial issues of selecting a distortion measure; we assume that one has been selected and consider means of designing systems which yield small average distortion. For simplicity and to ease exposition, we focus on two important specific examples:

(1) The squared error distortion measure: Here the input and reproduction spaces are k-dimensional Euclidean space and

    d(x, x̂) = ||x − x̂||² = Σ_{i=0}^{k−1} (x_i − x̂_i)²,

the square of the Euclidean distance between the vectors. This is the simplest distortion measure and the most common for waveform coding. While not subjectively meaningful in many cases, generalizations permitting input-dependent weighting have proved useful and are only slightly more complicated. For the squared-error distortion it is common practice to measure the performance of a system by the signal-to-noise ratio (or signal-to-quantization-noise ratio)

    SNR = 10 log_{10} ( E||X||² / E||X − X̂||² ).

This corresponds to normalizing the average distortion by the average energy and plotting it on a logarithmic scale: large (small) SNR corresponds to small (large) average distortion.
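As a hedged illustration of the sample-average distortion (1) and the SNR figure of merit for the squared-error case, here is a small Python sketch; the function names are hypothetical, and X and X_hat are assumed to be arrays of input and reproduced vectors.

```python
import numpy as np

def average_squared_error(X, X_hat):
    """Sample-average squared-error distortion (1/L) * sum_i ||x_i - xhat_i||^2."""
    X, X_hat = np.asarray(X, float), np.asarray(X_hat, float)
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))

def snr_db(X, X_hat):
    """Signal-to-quantization-noise ratio in dB:
    10 * log10( average input energy / average squared-error distortion )."""
    energy = float(np.mean(np.sum(np.asarray(X, float) ** 2, axis=1)))
    return 10.0 * np.log10(energy / average_squared_error(X, X_hat))
```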
(2) The (modified) Itakura-Saito distortion: This distortion measure is useful in voice coding applications where the receiver is sent a linear model of the underlying voice production process. The distortion measure is based on the "error matching measure" developed in the pioneering work of Itakura and Saito on the PARCOR or LPC approach to voice coding [12]. More generally, this distortion measure is a special case of a minimum relative entropy or discrimination measure; VQ using such distortion measures can be viewed as an application of the minimum relative entropy pattern classification technique introduced by Kullback [13] as an application of information theory to statistical pattern classification. (See also [14,15].)

We here introduce a minimum of notation to present a definition of the Itakura-Saito distortion measure. Details and generalizations may be found in [16,17,14,15]. Here the input vector can again be considered as a collection of consecutive waveform samples. Now, however, the output vectors have the form x̂ = (α, a_1, ..., a_p), where α is a positive gain or residual energy term and where the a_i, with a_0 = 1, are inverse filter coefficients in the sense that if

    A(z) = Σ_{i=0}^{p} a_i z^{−i},

then the all-pole filter with z-transform 1/A(z) is a stable filter. Here the reproduction vectors may be thought of as all-pole models for synthesizing the reproduction at the receiver using a locally generated noise or periodic source, in other words, as the filter portion of a linear predictive coding (LPC) model in a vocoding (voice coding) system. The Itakura-Saito distortion between the input vector and the model can be defined in the time domain as

    d(x, x̂) = aᵀR(x)a / α − ln( α_x(x) / α ) − 1,

where aᵀ = (1, a_1, ..., a_p), R(x) is the (p + 1) × (p + 1) sample autocorrelation matrix of the input vector x, and α_x(x) is an input gain (residual energy) term defined as the minimum value of bᵀR(x)b, where the minimum is taken over all vectors b with first component equal to 1. There are many equivalent forms of the distortion measure, some useful for theory and some for computation. Frequency domain forms show that minimizing the above distortion can be interpreted as trying to match the sample spectrum of the input vector to the power spectral density of the linear all-pole model formed by driving the filter with z-transform 1/A(z) with white noise of constant power spectral density.

The above formula for the distortion is one of the simplest, yet it demonstrates that the distortion measure is indeed complicated: it is not a simple function of an error vector, it is not symmetric in its input and output arguments, and it is not a metric or distance. Because of the intimate connection of this distortion measure with LPC vocoding techniques, we will refer to VQ's designed using this distortion measure as LPC VQ's.
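The following Python sketch evaluates the (modified) Itakura-Saito distortion as defined above, under the common convention of a biased sample autocorrelation; the helper names are hypothetical, and a direct linear-algebra solve stands in for the Levinson recursion usually used in practice.

```python
import numpy as np

def sample_autocorrelation_matrix(x, p):
    """(p+1) x (p+1) Toeplitz matrix of (biased) sample autocorrelations r(0..p)."""
    x = np.asarray(x, float)
    r = np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(p + 1)])
    return np.array([[r[abs(i - j)] for j in range(p + 1)] for i in range(p + 1)])

def residual_energy(R):
    """alpha_x(x): minimum of b^T R b over all vectors b with first component 1."""
    c = -np.linalg.solve(R[1:, 1:], R[1:, 0])   # optimal remaining coefficients
    b = np.concatenate(([1.0], c))
    return float(b @ R @ b)

def itakura_saito(x, a, alpha):
    """Modified Itakura-Saito distortion between frame x and all-pole model (alpha, a),
    with a = (1, a_1, ..., a_p) and gain alpha > 0:
        d(x, xhat) = a^T R(x) a / alpha - ln(alpha_x(x) / alpha) - 1.
    """
    a = np.asarray(a, float)
    R = sample_autocorrelation_matrix(x, len(a) - 1)
    return float(a @ R @ a) / alpha - np.log(residual_energy(R) / alpha) - 1.0
```

Note that the distortion is zero when the model's inverse filter and gain exactly match the frame's optimal LPC solution, and positive otherwise, consistent with the error-matching interpretation above.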
Average distortion

As the average distortion quantifies the performance of a system, and since we will be trying to minimize this quantity using good codes, we pause to consider what the average means in theory and in practice.

As previously noted, in practice it is the long term sample average of (1) that we actually measure and which we would like to be small. If the process is stationary and ergodic, then this limiting time average is the same as the mathematical expectation. The mathematical expectation is useful for developing information theoretic performance bounds, but it is often impossible to calculate in practice because the required probability distributions are not known, e.g., there are no noncontroversial, generally accepted, accurate probability distributions for real speech and image data. Hence a pragmatic approach to system design is to take long sequences of training data, estimate the "true" but unknown expected distortion by the sample average, and attempt to design a code that minimizes the sample average distortion for the training sequence. If the input source is indeed stationary and ergodic, the resulting sample average should be nearly the expected value, and the same code used on future data should yield approximately the same averages [18].

The above motivates a training sequence based design for stationary and ergodic data sources. In fact, even if the "true" probability distributions are known, as in the case of a Gauss Markov source, the training sequence approach reduces to a standard Monte Carlo approach.

An immediate objection to the above approach, however, is whether or not it makes sense for real sources which may be neither stationary nor ergodic. The answer is an emphatic "yes" in the following sense: The desired property is that if we design a code based on a sufficiently long training sequence and then use the code on future data produced by the same source, then the performance of the code on the new data should be roughly that achieved on the training data. The theoretical issue is to provide conditions under which this statement can be made rigorous. For reasonable distortion measures, a sufficient condition for this to be true for memoryless VQ design is that the source be asymptotically mean stationary; it need not be either stationary nor ergodic [19,20,21,22,23]. Asymptotically mean stationary sources include all stationary sources, block (or cyclo) stationary sources, and asymptotically stationary sources. Processes such as speech which exhibit distinct short term and long term stationarity properties are well modeled by asymptotically mean stationary sources [21].

The key point here is that the general design approach using long training sequences does not require either ergodicity nor stationarity to have a solid mathematical foundation. In fact, the mathematics suggest the following pragmatic approach: Try to design a code which minimizes the sample average distortion for a very long training sequence. Then use the code on test sequences produced by the same source, but not in the training sequence. If the performance is reasonably close to the design values, then one can have a certain amount of confidence that the code will continue to yield roughly the same performance in the future. If the training and test performance are significantly different, then probably the training sequence is not sufficiently long. In other words, do not try to prove mathematically that a source is asymptotically mean stationary; instead try to design codes for it and then see if they work on new data.

Henceforth for brevity we will write expectations with the assumption that they are to be interpreted as shorthand for long term sample averages. (A sample average L⁻¹ Σ_{i=0}^{L−1} d(X_i, X̂_i) is, in fact, an expectation with respect to the sample distribution which assigns a probability of 1/L to each vector in the training sequence.)
Properties of optimal quantizers

A VQ is optimal if it minimizes an average distortion E d{X, β[γ(X)]}. Two necessary conditions for a VQ to be optimal follow easily using the same logic as in Lloyd's [9] classical development for optimal PCM with a mean-squared error distortion measure. The following definition is useful for stating these properties: The collection of possible reproduction vectors C = {all y: y = β(u), some u in M} is called the reproduction codebook or, simply, codebook of the quantizer, and its members are called codewords (or templates). The encoder knows the structure of the decoder and hence all of the possible final output codewords.

Property 1: Given the goal of minimizing the average distortion and given a specific decoder β, no memoryless quantizer encoder can do better than to select the channel symbol u in M that will yield the minimum possible distortion at the output, that is, to select the channel symbol u yielding

    d{x, β[γ(x)]} = min_{v ∈ M} d[x, β(v)] = min_{y ∈ C} d(x, y).    (2)

That is, for a given decoder in a memoryless vector quantizer the best encoder is a minimum distortion or nearest neighbor mapping

    γ(x) = min⁻¹_{v ∈ M} d[x, β(v)],    (3)

where the inverse minimum notation means that we select the v giving the minimum of (2). Gersho [24] calls a quantizer with a minimum distortion encoder a Voronoi quantizer, since the Voronoi regions about a set of points in a space correspond to a partition of that space according to the nearest-neighbor rule. The word quantizer, however, is practically always associated with such a minimum distortion mapping. We observe that such a vector quantizer with a minimum distortion encoder is exactly the Shannon model for a block source code subject to a fidelity criterion, which is used in information theory to develop optimal performance bounds for data compression systems.

An encoder γ can be thought of as a partition of the input space into cells where all input vectors yielding a common reproduction are grouped together. Such a partition according to a minimum distortion rule is called a Voronoi or Dirichlet partition. A general minimum distance VQ encoder is depicted in Fig. 2. A simple example of such a partition and hence of an encoder is depicted in Fig. 3 (a more interesting example follows shortly). Observe that this vector quantizer is just two uses of a scalar quantizer in disguise.

[Figure 2. VQ Encoder. The distortion between the input vector and each stored codeword is computed. The encoded output is then the binary representation of the index of the minimum distortion codeword.]

[Figure 3. Two-Dimensional Minimum Distortion Partition. The four circles are the codewords of a two-dimensional codebook. The Voronoi regions are the quadrants containing the circles. The x's were produced by a training sequence of twelve two-dimensional Gaussian vectors. Each input vector is mapped into the nearest-neighbor codeword, that is, the circle in the same quadrant. (x = training vectors, o = codewords, P_i = region encoded into codeword i.)]

As the minimum distortion rule optimizes the encoder of a memoryless VQ for a decoder, we can also optimize the decoder for a given encoder.

Property 2: Given an encoder γ, no decoder can do better than that which assigns to each channel symbol v the generalized centroid (or center of gravity or barycenter) of all source vectors encoded into v, that is,

    β(v) = cent(v) = min⁻¹_{x̂ ∈ Â} E[ d(X, x̂) | γ(X) = v ],    (4)

that is, β(v) is the vector yielding the minimum conditional average distortion given that the input vector was mapped into v.
While minimizing such a conditional average may be quite difficult for an arbitrary random process and distortion measure, it is often easy to find for a sample distribution and a nice distortion measure. For example, the centroid in the case of a sample distribution and a squared-error distortion measure is simply the ordinary Euclidean centroid, or the vector average of all input vectors encoded into the given channel symbol; that is, given the sample distribution defined by a training sequence {x_i; i = 0, 1, ..., L − 1},

    cent(v) = (1 / i(v)) Σ_{x_i : γ(x_i) = v} x_i,

where i(v) is the number of indices i for which γ(x_i) = v.

The Euclidean centroids of the example of Fig. 3 are depicted in Fig. 4. (The numerical values may be found in [25].) The new codewords better represent the training vectors mapping into the old codewords, but they yield a different minimum distortion partition of the input alphabet, as indicated by the broken line in Fig. 4. This is the key of the algorithm: iteratively optimize the codebook for the old encoder and then use a minimum distortion encoder for the new codebook.

[Figure 4. Centroids of Figure 3. The new centroids of the old Voronoi regions of Fig. 3 are drawn as circles. Note that the centroid computation has moved the codewords to better represent the input vectors which yielded those codewords; that is, if one used the same encoder as in Fig. 3 but replaced the reproduction codewords produced at the decoder by these new centroids, the average distortion would decrease. The broken line delineates the new Voronoi regions for these codewords.]

The Itakura-Saito distortion example is somewhat more complicated, but still easily computable. As with the squared error distortion, one groups all input vectors yielding a common channel symbol. Instead of averaging the vectors, however, the sample autocorrelation matrices for all of the vectors are averaged. The centroid is then given by the standard LPC all-pole model for this average autocorrelation, that is, the centroid is found by a standard Levinson's recursion run on the average autocorrelation.
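Here is a minimal sketch (not from the article) of the Euclidean centroid update of Property 2 for a training-sequence sample distribution; the function name is hypothetical, and keeping the old codeword when a cell receives no training vectors is a pragmatic assumption rather than a rule from the article.

```python
import numpy as np

def euclidean_centroids(training, assignments, old_codebook):
    """Replace each reproduction codeword by the Euclidean centroid (mean) of the
    training vectors encoded into its channel symbol (squared-error case).

    training:    (L, k) array of training vectors.
    assignments: length-L array of channel symbols gamma(x_i) in 0..M-1.
    A cell with no assigned training vectors keeps its old codeword.
    """
    training = np.asarray(training, float)
    new_codebook = np.array(old_codebook, dtype=float, copy=True)
    for v in range(len(new_codebook)):
        members = training[assignments == v]
        if len(members) > 0:
            new_codebook[v] = members.mean(axis=0)   # cent(v)
    return new_codebook
```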
The generalized Lloyd algorithm

The fact that the encoder can be optimized for the decoder and vice versa formed the basis of Lloyd's original optimal PCM design algorithm for a scalar random variable with a known probability density function and a squared error distortion. The general VQ design algorithms considered here are based on the simple observation that Lloyd's basic development is valid for vectors, for sample distributions, and for a variety of distortion measures. The only requirement on the distortion measure is that one can compute the centroids. The basic algorithm is the following:

Step 0. Given: A training sequence and an initial decoder.
Step 1. Encode the training sequence into a sequence of channel symbols using the given decoder and the minimum distortion rule. If the average distortion is small enough, quit.
Step 2. Replace the old reproduction codeword of the decoder for each channel symbol v by the centroid of all training vectors which mapped into v in Step 1. Go to Step 1.

Means of generating initial decoders will be considered in the next section. Each step of the algorithm must either reduce average distortion or leave it unchanged. The algorithm is usually stopped when the relative distortion decrease falls below some small threshold. The algorithm was developed for vector quantizers, training sequences, and general distortion measures by Linde, Buzo, and Gray [25], and it is sometimes referred to as the LBG algorithm. Previously, Lloyd's algorithm had been considered for vectors and difference distortion measures in cluster analysis and pattern recognition problems (e.g., MacQueen [26] and Diday and Simon [27]) and in two-dimensional quantization (e.g., Chen [28] and Adoul et al. [29]). Only recently, however, has it been extensively studied for vector quantization applications using several different distortion measures.

Before continuing, it should be emphasized that such iterative improvement algorithms need not in general yield truly optimum codes. It is known that subject to some mathematical conditions the algorithm will yield locally optimum quantizers, but in general there may be numerous such codes and many may yield poor performance. (See, e.g., [30].) It is often useful, therefore, to enhance the algorithm's potential by providing it with good initial codebooks and perhaps by trying it on several different initial codebooks.
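The following is a hedged Python sketch of the generalized Lloyd (LBG) iteration for the squared-error distortion; the stopping threshold, iteration cap, and the k-means-style initialization used in the example are illustrative choices, not prescriptions from the article.

```python
import numpy as np

def generalized_lloyd(training, initial_codebook, threshold=1e-4, max_iter=100):
    """Generalized Lloyd (LBG) design for the squared-error distortion (sketch).

    Alternates Step 1 (minimum-distortion encoding of the training sequence) and
    Step 2 (centroid replacement) until the relative distortion decrease falls
    below `threshold` or `max_iter` passes have been made.
    """
    training = np.asarray(training, float)
    codebook = np.array(initial_codebook, dtype=float, copy=True)
    prev_distortion = np.inf
    distortion = np.inf
    for _ in range(max_iter):
        # Step 1: nearest-neighbor encoding of every training vector.
        dists = np.sum((training[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
        assignments = np.argmin(dists, axis=1)
        distortion = float(np.mean(dists[np.arange(len(training)), assignments]))
        if np.isfinite(prev_distortion) and \
                (prev_distortion - distortion) / prev_distortion <= threshold:
            break
        prev_distortion = distortion
        # Step 2: replace each codeword by the centroid of its cell (empty cells keep the old word).
        for v in range(len(codebook)):
            members = training[assignments == v]
            if len(members) > 0:
                codebook[v] = members.mean(axis=0)
    return codebook, distortion

# Example: a rate-2-bits-per-vector code for 2-D Gaussian training data, using the
# first 2^R training vectors as the initial codebook (the k-means style start below).
rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 2))
codebook, d = generalized_lloyd(train, train[:4])
```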
INITIAL CODEBOOKS

The basic design algorithm of the previous section is an iterative improvement algorithm and requires an initial code to improve. Two basic approaches have been developed: One can start with some simple codebook of the correct size, or one can start with a simple small codebook and recursively construct larger ones.

Random codes

Perhaps the simplest example of the first technique is that used in the k-means variation of the algorithm [26]: use the first 2^R vectors in the training sequence as the initial codebook. An obvious modification more natural for highly correlated data is to select several widely spaced words from the training sequence. This approach is sometimes called random code generation, but we avoid this nomenclature because of its confusion with the random code techniques of information theory which are used to prove the performance bounds.

Product codes

Another example of the first approach is to use a scalar code such as a uniform quantizer k times in succession and then prune the resulting vector codebook down to the correct size. The mathematical model for such a code is a product code, which we pause to define for current and later use: Say we have a collection of codebooks C_i, i = 0, 1, ..., m − 1, each consisting of M_i vectors of dimension k_i and having rate R_i = log_2 M_i bits per vector. Then the product codebook C is defined as the collection of all M = Π_i M_i possible concatenations of words drawn successively from the m codebooks C_i. The dimension of the product codebook is k = Σ_{i=0}^{m−1} k_i, the sum of the dimensions of the component codebooks. The product code is denoted mathematically as a Cartesian product:

    C = ×_{i=0}^{m−1} C_i = {all vectors of the form (x̂_0, x̂_1, ..., x̂_{m−1}); x̂_i in C_i for i = 0, 1, ..., m − 1}.

Thus, for example, using a scalar quantizer with rate R/k k times in succession yields a product k-dimensional vector quantizer of rate R bits per vector. This product code can be used as an initial code for the design algorithm. The scalar quantizers may be identical uniform quantizers with a range selected to match the source, or they may be different, e.g., a positive codebook for a gain and uniform quantizers for [−1, 1] for reflection coefficients in an LPC VQ system.

In waveform coding applications where the reproduction and input alphabets are the same k-dimensional Euclidean space, an alternative product code provides a means of growing better initial guesses from smaller dimensional codes [31]. Begin with a scalar quantizer C_0 and use a two-dimensional product code C_0 × C_0 as an initial guess for designing a two-dimensional VQ. On completion of the design we have a two-dimensional code, say C_2. Form an initial guess for a three dimensional code as all possible pairs from C_2 and scalars from C_0, that is, use the product code C_2 × C_0 as an initial guess. Continuing in this way, given a good k − 1 dimensional VQ described by a codebook C_{k−1}, an initial guess for a k-dimensional code design is the product code C_{k−1} × C_0. One can also use such product code constructions with a different initial scalar code C_0, such as those produced by the scalar version of the next algorithm.

Splitting

Instead of constructing long codes from smaller dimensional codes, we can construct a sequence of bigger codes having a fixed dimension using a "splitting" technique [25,16]. This method can be used for any fixed dimension, including scalar codes. Here one first finds the optimum rate 0 code, the centroid of the entire training sequence, as depicted in Fig. 5a for a two-dimensional input alphabet. This single codeword is then split to form two codewords (Fig. 5b). For example, the energy can be perturbed slightly to form a second distinct word, or one might purposefully find a word distant from the first. It is convenient to have the original codeword a member of the new pair to ensure that the distortion will not increase. The algorithm is then run to get a good rate 1 bit per vector code as indicated in Fig. 5c. The design continues in this way in stages as shown: the final code of one stage is split to form an initial code for the next.

[Figure 5. Splitting. A large code is defined in stages: at each stage each codeword of a small code is split into two new codewords, giving an initial codebook of twice the size. The algorithm is run to get a new, better codebook. (a) Rate 0: the centroid of the entire training sequence. (b) Initial Rate 1: the one codeword is split to form an initial guess for a two word code. (c) Final Rate 1: the algorithm produces a good code with two words; the dotted line indicates the Voronoi regions. (d) Initial Rate 2: the two words are split to form an initial guess for a four word code. (e) Final Rate 2: the algorithm is run to produce a final four word code.]
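A minimal sketch of the splitting technique follows, reusing the hypothetical `generalized_lloyd` function from the previous sketch; the multiplicative perturbation factor used to split each codeword is an assumption, not a prescription from the article.

```python
import numpy as np

def design_by_splitting(training, rate_bits, epsilon=0.01, threshold=1e-4):
    """Splitting design for a fixed dimension (illustrative sketch).

    Start from the rate-0 code (centroid of the whole training sequence); at each
    stage keep every codeword, add a slightly perturbed copy of it, and run the
    generalized Lloyd algorithm on the doubled codebook.
    """
    training = np.asarray(training, float)
    codebook = training.mean(axis=0, keepdims=True)     # rate 0: one codeword
    distortion = float(np.mean(np.sum((training - codebook[0]) ** 2, axis=1)))
    for _ in range(rate_bits):
        codebook = np.vstack([codebook, codebook * (1.0 + epsilon)])   # split each word
        codebook, distortion = generalized_lloyd(training, codebook, threshold=threshold)
    return codebook, distortion

# Example: grow a 2-D codebook to rate 2 bits per vector (four codewords).
# codebook, d = design_by_splitting(train, rate_bits=2)
```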
VARIATIONS OF MEMORYLESS VECTOR QUANTIZERS

In this section we consider some of the variations of memoryless vector quantization aimed at reducing the computation or memory requirements of a full search memoryless VQ.

Tree-searched VQ

Tree-searched vector quantizers were first proposed by Buzo et al. [16] and are a natural byproduct of the splitting algorithm for generating initial code guesses. We focus on the case of a binary tree for simplicity, but more general trees will provide better performance while retaining a significant reduction in complexity. Say that we have a good rate 1 code as in Fig. 5c and we form a new rate 2 code by splitting the two codewords as in Fig. 5d. Instead of running a full search VQ design on the resulting 4-word codebook, however, we divide the training sequence into two pieces, collecting together all those vectors encoded into a common word in the 1 bit codebook, that is, all of the training sequence vectors in a common cell of the Voronoi partition. For each of these subsequences of training vectors, we then find a good 1-bit code using the algorithm. The final codebook (so far) consists of the four codewords in the two 1-bit codebooks designed for the two subsequences. A tree-searched encoder selects one of the words not by an ordinary full search of this codebook; instead it uses the first one bit codebook designed on the whole sequence to select a second code, and it then picks the best word in the second code. This encoder can then be used to further subdivide the training sequence and construct even better codebooks for the subsequences. The encoder operation can be depicted as a tree as in Fig. 6. The tree is designed one layer at a time, each new layer being designed so that the new codebook available from each node is good for the vectors encoded into that node.

Observe that there are 2^R possible reproduction vectors as in the full search VQ, but now R binary searches are made instead of a single 2^R-ary search. In addition, the encoder storage requirements have doubled. The encoder is no longer optimal for the decoder in the sense of Property 1 since it can no longer perform an exhaustive search of the codebook. The search, however, is much more efficient if done sequentially than is a full search. Thus one may trade performance for efficiency of implementation.

[Figure 6. Tree-Searched VQ. A binary encoder tree is shown for a three-dimensional one bit per sample VQ (k = R = 3). The encoder makes a succession of R minimum distortion binary choices from binary codebooks, where the available codebook at each level consists of labels of the nodes in the next level. The labels of the nodes of the final layer are the actual reproduction codewords. At each node the encoder chooses the minimum distortion available label and, if the new index is a 0 (1), sends a channel symbol 0 (1) and advances up (down) to the next node. After R binary selections the complete channel codeword has been sent and the reproduction codeword specified to the decoder.]

Nonbinary trees can also be used, where at the ith layer codebooks of rate R_i are used and the overall rate is then Σ_i R_i. For example, a depth three tree for VQ of LPC parameter vectors using successive rates of 4, 4, and 2 bits per vector yields performance nearly as good as a full search VQ of the same total rate of 10 bits per vector, yet for the tree search one need only compute 2⁴ + 2⁴ + 2² = 36 distortions instead of 2¹⁰ = 1024 distortions [10].

Other techniques can be used to design tree-searched codes. For example, Adoul et al. [32] use a separating hyperplane approach. Another approach is to begin with a full search codebook and to design a tree-search into the codebook. One technique for accomplishing this is to first group the codewords into close disjoint pairs and then form the centroids of the pairs as the node label of the immediate ancestor of the pair. One then works backwards through the tree, always grouping close pairs. Ideally, one would like a general design technique for obtaining a tree search into an arbitrary VQ codebook with only a small loss of average distortion. Gersho and Cheng [33] have reported preliminary results for designing a variable-length tree search for an arbitrary codebook and have demonstrated its implementability for several small dimensional examples.
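The sketch below illustrates a binary tree-searched VQ designed by recursive splitting of the training sequence as described above; it reuses the hypothetical `generalized_lloyd` helper, represents the tree as nested dictionaries, and assumes every node receives enough training vectors. It is an illustrative sketch, not the article's implementation.

```python
import numpy as np

def design_binary_tree(training, depth, epsilon=0.01):
    """Design a binary tree-searched VQ (illustrative sketch).

    Each node holds a 2-word codebook designed on the training vectors routed to
    it; the leaf-level codewords are the actual reproduction codewords.
    Returns a nested dict: {'codebook': (2, k) array, 'children': [left, right] or None}.
    """
    training = np.asarray(training, float)
    start = np.vstack([training.mean(axis=0),
                       training.mean(axis=0) * (1.0 + epsilon)])   # split the node centroid
    node_codebook, _ = generalized_lloyd(training, start)
    node = {'codebook': node_codebook, 'children': None}
    if depth > 1:
        d = np.sum((training[:, None, :] - node_codebook[None, :, :]) ** 2, axis=2)
        labels = np.argmin(d, axis=1)
        node['children'] = [design_binary_tree(training[labels == b], depth - 1, epsilon)
                            for b in (0, 1)]
    return node

def tree_encode(x, node):
    """Make successive binary minimum-distortion choices down the tree; return the
    bit path (channel codeword) and the final reproduction codeword."""
    bits = []
    while True:
        d = np.sum((node['codebook'] - x) ** 2, axis=1)
        b = int(np.argmin(d))
        bits.append(b)
        if node['children'] is None:
            return bits, node['codebook'][b]
        node = node['children'][b]

# Example: a depth-2 (rate 2 bits/vector) tree on the earlier 2-D training data.
# tree = design_binary_tree(train, depth=2); bits, xhat = tree_encode(train[0], tree)
```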
Multistep VQ

A multistep VQ is a tree-searched VQ where only a single small codebook is stored for each layer of the tree instead of a different codebook for each node of each layer. Such codes provide the computation reduction of tree-searched codes while reducing the storage requirements below that of even ordinary VQ's. The first example of such a code was the multistage codebook [34]. For simplicity we again confine interest to codes which make a sequence of binary decisions. The first layer binary code is designed as in the tree-searched case. This codebook is used to encode the training sequence, and then a training sequence of error or residual vectors is formed. For waveform coding applications the error vectors are simply the difference of the input vectors and their codewords. For vocoding applications, the error vectors are residuals formed by passing the input waveform through the inverse filter A(z)/√α. The algorithm is then run to design a VQ for this training sequence of coding errors. The reconstruction for these two bits is then formed by combining the two codewords: for waveform coding this is accomplished by adding the first codeword to the error codeword; for voice coding this is accomplished by using the cascade of the two all-pole filters for synthesis. This reproduction can then be used to form a "finer" error vector and a code designed for it. Thus an input vector is encoded in stages as with the tree-searched code, but now only R binary codebooks, and hence 2R total codewords, need to be stored. Observe that there are still 2^R possible final codewords, but we have not needed this much storage because the codewords can be constructed by adding different combinations of a smaller set of words. A multistage VQ is depicted in Fig. 7.

[Figure 7. Multistage VQ with 2 Stages. The input vector is first encoded by one VQ and an error vector is formed. The second VQ then encodes the error vector. The two channel symbols from the two VQ's together form the complete channel symbol for the entire encoder. The decoder adds together the corresponding reproduction vectors.]
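For the waveform-coding case, a two-stage (multistage) VQ can be sketched as below; it reuses the hypothetical `design_by_splitting` helper and simply adds the stage codewords at the decoder, as described above. This is an illustrative sketch under those assumptions, not the article's implementation.

```python
import numpy as np

def design_two_stage_vq(training, bits_per_stage=(1, 1)):
    """Two-stage (multistage) VQ design for waveform coding (sketch).

    Stage 1 is designed on the input vectors; stage 2 is designed on the error
    (residual) vectors left over after stage-1 encoding.
    """
    training = np.asarray(training, float)
    stage1, _ = design_by_splitting(training, bits_per_stage[0])
    idx = np.argmin(np.sum((training[:, None, :] - stage1[None, :, :]) ** 2, axis=2), axis=1)
    residuals = training - stage1[idx]
    stage2, _ = design_by_splitting(residuals, bits_per_stage[1])
    return stage1, stage2

def two_stage_encode_decode(x, stage1, stage2):
    """Encode in stages; the reproduction is the sum of the two selected codewords."""
    u1 = int(np.argmin(np.sum((stage1 - x) ** 2, axis=1)))
    e = x - stage1[u1]
    u2 = int(np.argmin(np.sum((stage2 - e) ** 2, axis=1)))
    return (u1, u2), stage1[u1] + stage2[u2]
```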
Product codes

Another useful structure for memoryless VQ is a product code. In one extreme, multiple use of scalar quantizers is equivalent to a product VQ and is obviously simple to implement. More general product VQ's, however, may permit one to take advantage of the performance achievable by VQ's while still being able to achieve the higher rates required for good fidelity. In addition, such codes may yield a smaller computational complexity than an ordinary VQ of the same rate and performance (but different dimension). The basic technique is useful when there are differing aspects of the input vector that one might wish to code separately because of different effects, e.g., on dynamic range or finite word length implementation.

Gain/shape VQ

One example of a product code is a gain/shape VQ where separate, but interdependent, codes are used to code the "shape" and "gain" of the waveform, where the "shape" is defined as the original input vector normalized by removal of a "gain" term such as energy in a waveform coder or LPC residual energy in a vocoder. Gain/shape encoders were introduced by Buzo et al. [16] and were subsequently extended and optimized by Sabin and Gray [35,36]. A gain/shape VQ for waveform coding with a squared-error distortion is illustrated in Fig. 8.

[Figure 8. Gain/Shape VQ. First a unit vector is chosen to match the input vector by maximizing the inner product over the shape codewords. Given the resulting shape vector, a scalar gain codeword is selected so as to minimize the indicated quantity. The encoder yields the product codeword (gain times shape) with the minimum possible squared error distortion from the input vector. Thus this multistep encoder is optimum for the product codebook.]

Figure 8 sketches the surprising fact that for the squared error case considered, the two-step selection of the product codeword is an optimal encoding for the given product codebook. We emphasize that here the encoder is optimal for the given product codebook or decoder, but the codebook itself is in general suboptimal because of the constrained product form. A similar property holds for the Itakura-Saito distortion gain/shape VQ. Thus in this case, if one devotes R_s bits to the shape and R_g bits to the gain, where R_s + R_g = R, then one need only compute 2^{R_s} vector distortions and an easy scalar quantization. The full search encoder would require 2^R vector distortions, yet both encoders yield the same minimum distortion codeword!

Variations of the basic VQ algorithm can be used to iteratively improve a gain/shape code by alternately optimizing the shape for the gain and vice versa. The resulting conditional centroids are easy to compute. The centroid updates can be made either simultaneously or alternately after each iteration [36]. One can experimentally determine the optimal bit allocation between the gain and the shape codebooks.
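A minimal sketch of the gain/shape encoder for the squared-error case follows, assuming unit-norm shape codewords and positive gains as in the Fig. 8 description; the function name and the codebooks in the example are hypothetical.

```python
import numpy as np

def gain_shape_encode(x, shape_codebook, gain_codebook):
    """Gain/shape product-code encoder for squared error (illustrative sketch).

    shape_codebook: (Ms, k) array of unit-norm shape codewords.
    gain_codebook:  (Mg,) array of positive scalar gains.
    The shape maximizing the inner product with x is chosen first; the gain
    codeword closest to that inner product is chosen second.  Per the article,
    this two-step choice selects the minimum-distortion product codeword.
    """
    inner = shape_codebook @ x
    i = int(np.argmax(inner))                               # best shape
    j = int(np.argmin((gain_codebook - inner[i]) ** 2))     # best gain for that shape
    return (i, j), gain_codebook[j] * shape_codebook[i]

# Example with a hypothetical 2-D shape codebook and a 3-level gain codebook.
shapes = np.array([[1.0, 0.0], [0.0, 1.0],
                   [0.70710678, 0.70710678], [0.70710678, -0.70710678]])
gains = np.array([0.5, 1.0, 2.0])
symbol, xhat = gain_shape_encode(np.array([1.3, -1.1]), shapes, gains)
```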
Separating mean VQ

Another example of a multistep product code is the separating mean VQ, where a sample mean instead of an energy term is removed [37]. Define the sample mean ⟨x⟩ of a k-dimensional vector by k⁻¹ Σ_{i=0}^{k−1} x_i. In a separated mean VQ one first uses a scalar quantizer to code the sample mean of a vector; then the coded sample mean is subtracted from all of the components of the input vector to form a new vector with approximately zero sample mean. This new vector is then vector quantized. Such a system is depicted in Fig. 9. The basic motivation here is that in image coding the sample mean of pixel intensities in a small rectangular block represents a relatively slowly varying average background value of pixel intensity around which there are variations. To design such a VQ, first use the algorithm to design a scalar quantizer for the sample mean sequence ⟨x_j⟩, j = 0, 1, ..., L − 1. Let q(⟨x⟩) denote the reproduction of ⟨x⟩ using this quantizer. Then use the vector training sequence x_j − q(⟨x_j⟩)1, where 1 = (1, 1, ..., 1), to design a VQ for the difference. Like the gain/shape VQ, a product codebook and a multistep encoder are used, but unlike the gain/shape VQ it can be shown that the multistep encoder here does not select the best possible mean, shape pair; that is, the multistep encoder is not equivalent to a full search encoder.

[Figure 9. Separating Mean VQ. The sample mean of the input vector is computed, scalar quantized, and then subtracted from each component of the input vector. The resulting vector with approximately zero sample mean is then vector quantized. The decoder adds the coded sample mean to all components of the coded shape vector.]

Lattice VQ

A final VQ structure capable of efficient searches and memory usage is the lattice quantizer, a k-dimensional generalization of the scalar uniform quantizer. A lattice in k-dimensional space is a collection of all vectors of the form y = Σ_{i=0}^{n−1} a_i e_i, where n ≤ k, where e_0, ..., e_{n−1} are a set of linearly independent vectors in R^k, and where the a_i are arbitrary integers. A lattice quantizer is a quantizer whose codewords form a subset of a lattice. Lattice quantizers were introduced by Gersho [38], and the performance and efficient coding algorithms were developed for many particular lattices by Conway and Sloane [39,40,41] and Barnes and Sloane [42]. The disadvantage of lattice quantizers is that they cannot be improved by a variation of the Lloyd algorithm without losing their structure, and good quantizers produced by the Lloyd algorithm cannot generally be well approximated by lattices. Lattice codes can work well on source distributions that are approximately uniform over a bounded region of space. In fact, lattices that are asymptotically optimal in the limit of large rate are known for this case in two and three dimensions, and good lattices are known for dimensions up to 16.

Ideally, one would like to take a full search, unconstrained VQ and find some fast means of encoding having complexity more like the above techniques than that of the full search. For example, some form of multidimensional companding followed by a lattice quantizer, as suggested by Gersho [24], would provide both good performance and efficient implementation. Unfortunately, however, no design methods accomplishing this goal have yet been found.

FEEDBACK VECTOR QUANTIZERS

Memory can be incorporated into a vector quantizer in a simple manner by using different codebooks for each input vector, where the codebooks are chosen based on past input vectors. The decoder must know which codebook is being used by the encoder in order to decode the channel symbols. This can be accomplished in two ways: 1) The encoder can use a codebook selection procedure that depends only on past encoder outputs, and hence the codebook sequence can be tracked by the decoder. 2) The decoder is informed of the selected codebook via a special low-rate side channel. The first approach is called feedback vector quantization.
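To illustrate the feedback idea just introduced (approach 1: codebook selection driven only by past channel symbols, so the decoder can track it without side information), here is a minimal Python sketch; the per-state codebooks and the next-state rule are purely illustrative assumptions and are not taken from the article.

```python
import numpy as np

def feedback_vq_encode(X, codebooks, next_state, initial_state=0):
    """Feedback VQ sketch: the codebook used for each input vector is selected by a
    state that depends only on past channel symbols, so the decoder can track it.

    codebooks[s] is the (M, k) codebook for state s; next_state(s, u) gives the
    next state from the current state and the channel symbol just sent.
    """
    state, symbols = initial_state, []
    for x in np.asarray(X, float):
        cb = codebooks[state]
        u = int(np.argmin(np.sum((cb - x) ** 2, axis=1)))   # nearest-neighbor encode
        symbols.append(u)
        state = next_state(state, u)                        # decoder can apply the same rule
    return symbols

def feedback_vq_decode(symbols, codebooks, next_state, initial_state=0):
    """Decoder tracks the same state sequence from the received channel symbols."""
    state, out = initial_state, []
    for u in symbols:
        out.append(codebooks[state][u])
        state = next_state(state, u)
    return np.array(out)

# Example: two states, each with its own 2-word, 2-D codebook, and the next state
# simply equal to the last channel symbol (a hypothetical selection rule).
codebooks = [np.array([[1.0, 1.0], [-1.0, -1.0]]),
             np.array([[2.0, 0.0], [-2.0, 0.0]])]
rule = lambda s, u: u
data = np.array([[0.9, 1.2], [-1.8, 0.1], [1.7, -0.2]])
decoded = feedback_vq_decode(feedback_vq_encode(data, codebooks, rule), codebooks, rule)
```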

Disclaimer: Justia Dockets & Filings provides public litigation records from the federal appellate and district courts. These filings and docket sheets should not be considered findings of fact or liability, nor do they necessarily reflect the view of Justia.

