CNET Networks, Inc. v. Etilize, Inc.

Filing 188

MEMORANDUM AND ORDER by Judge Marilyn Hall Patel: defendant's motion on noninfringement is GRANTED in part and DENIED in part; plaintiff's motions to strike are DENIED; defendant's motion to bifurcate is DENIED; and defendant's motion to file a supplemental brief is DENIED (awb, COURT-STAFF) (Filed on 9/2/2008)

UNITED STATES DISTRICT COURT
NORTHERN DISTRICT OF CALIFORNIA

CNET NETWORKS, INC.,
Plaintiff,
v.
ETILIZE, INC.,
Defendant.
_____________________________________/

No. C 06-5378 MHP

MEMORANDUM & ORDER
Defendant's Motion for Summary Judgment on Noninfringement

Plaintiff CNET Networks, Inc. ("CNET") filed this action against Etilize, Inc. ("Etilize") alleging infringement of two patents that teach a process of compiling information about various consumer products into a database. Now before the court is defendant Etilize's motion for summary judgment on noninfringement of the asserted claims of the two patents, United States Patent Nos. 6,714,933 ("the '933 Patent") and 7,082,426 ("the '426 Patent"). Having fully considered the parties' arguments and submissions1 and for the reasons set forth below, the court enters the following memorandum and order.

BACKGROUND

I. The Patented Inventions

This infringement action relates to patents claiming methods and processes that aggregate content for online purchasing and cataloging systems. The '933 Patent, issued on March 30, 2004, and the '426 Patent, issued on July 25, 2006, disclose methods of aggregating product information from a plurality of sources using crawlers, computational linguistics and software.

A. The '933 Patent

The '933 Patent, titled "Content Aggregation Method and Apparatus For On-Line Purchasing," claims methods of gathering information about various products from multiple sources for storage in a product database. The two claims at issue for the '933 Patent are Claims 1 and 15.
Claim 1 claims:

    A method of aggregating product information for use in a product database including various products arranged in product categories, the product information being collected from a plurality of sources in a networked computer environment regarding products of a product category comprising the steps of:
    generating a crawler from a server interconnected to the network computer environment to visit the plurality of sources;
    gathering product phrase information from each of the plurality of sources via said crawler; and
    determining whether at least one phrase of said product phrase information is a product characteristic associated with a product category;
    wherein said crawler utilizes computational linguistics to gather said product phrase information which includes a phrase and at least one characteristic of said phrase.

'933 Patent at 18:49–65. Claim 15 depends upon the method of Claim 1.

Specifically, a crawler generated from a server interconnected to a network computer environment gathers pertinent phrase information from a plurality of sources. In other words, a crawler (a software program that visits websites and has the ability to identify and gather information from these sites) is launched from a computer connected to the internet. This crawler scours different websites on the internet to gather pertinent information. Pertinent information is determined through the use of computational linguistics, a field of statistical or rule-based modeling of natural language that uses computational analysis. The crawler then determines whether at least one phrase of the phrase information gathered is a product characteristic associated with a product category.

B. The '426 Patent

The '426 Patent, titled "Content Aggregation Method and Apparatus For An On-Line Product Catalog," is a continuation-in-part of the '933 Patent.
It populates a catalog by categorizing and storing into the catalog product information from multiple web pages. The '426 Patent claims methods of processing disparate product information records from various sources into one or more groups. Determining the group in which to place a particular record depends on which product information records are likely to correspond to the same product. Each group, which corresponds to a particular product, is given a unique identifier. This identifier is then compared to categories in a taxonomy to determine a category for that particular product in the taxonomy. Finally, product attributes are determined for each categorized product based on the earlier collected product information records for that product.

There are eight claims at issue for the '426 Patent: Claims 1, 14, 16, 20, 23, 24, 39 and 52. Of these, only Claims 1, 39 and 52 are independent, while the rest are dependent.

1. Claim 1

Claim 1 of the '426 Patent claims as follows:

    A method of creating a product catalog stored on computer readable media by aggregating product information from a plurality of product information sources having disparate formats for product information and storing the information in a taxonomy, said method comprising:
    processing plural product information records from the product information sources into one or more groups based on which product information records are likely to correspond to the same product;
    correlating a unique product ID corresponding to the product associated with each of said groups to identify the product;
    electronically comparing each identified product to categories of a taxonomy to determine a category for the identified products in the taxonomy; and
    electronically parsing the product information records corresponding to each group to electronically determine attributes for each categorized product based on the product information records;
    electronically generating product specifications based on the determined attributes; and
    storing the product specification in the corresponding determined categories of the taxonomy.

'426 Patent at 36:22–45. Claims 14, 16, 20, 23 and 24 depend upon Claim 1.

2. Claim 39

Claim 39 of the '426 Patent is substantially similar to Claim 1. However, Claim 39 does not store the information comprising the product catalog in a taxonomy. Furthermore, Claim 39 repeats the processing and correlating steps after performing the comparing step to revise the groups in which the product information records belong. See '426 Patent at 39:66–40:23.2

3. Claim 52

Claim 52 of the '426 Patent claims as follows:

    A method of aggregating product information from a plurality of product information sources in a networked computer environment comprising the steps of:
    generating a crawler from a server interconnected to the network computer environment to visit the plurality of sources;
    gathering product phrase information and characteristics of said product phrase information from each of the plurality of sources via said crawler;
    grouping said product phrase information based on which product phrase information are likely to correspond to the same product and based on the characteristics of said product phrase information;
    electronically parsing said grouped product phrase information to determine attributes for each product based on at least one of the product phrase information and the characteristics of said product phrase information; and
    creating a catalog of products based on the determined attributes.

'426 Patent at 41:36–56.

C. Claim Terms

The court uses the following definitions for claim terms that are implicated by the instant motion:

Source: A webpage or other document that may be defined by a URL, or text, graphics, or links within such webpage or other document.
Computational Linguistics: A cross-disciplinary field of modeling of language utilizing computational analysis to process language data.
Phrase: A string of characters, such as an alpha-numeric character string or strings present in a source.
Crawler: A software program or programs which visit and search sources of content on a networked computer environment; have the capability to identify and gather information from the sources; and can include bots, robots, automated site searchers, and the like.
Phrase Characteristics: Some attribute of a phrase that can be used to distinguish it from other phrases, such as the frequency, location, font size, font style, font case, font effects, and font color as well as the frequency of collocation (phrases immediately next to each other) and co-occurrence of phrases (phrases within a predetermined number of words of each other) includes the co-occurrence of phrases, which is when a particular word is within a number of words of another (e.g., "weight" and "lbs.").

See generally Docket Nos. 66, 82.

II. The Accused Product

The accused product, SpeX, is created by Etilize through the use of a pair of software tools called aQuire and Xtract. Etilize uses these tools in situations where manufacturers store information about their products in a consistent manner on their website. These tools help extract information from the website in an automated fashion. Hameed Dec. ¶ 1. aQuire is a semi-automated search script used to download entire web pages describing a particular product's specifications. Id. ¶ 11. Xtract is then used to extract data from these downloaded web pages for entry into Etilize's templates. Id. ¶ 12.

A. aQuire

The aQuire tool fetches web pages by using predefined URLs and simple searching patterns. Id. ¶ 11. aQuire downloads all of the web pages from a specified URL directory by sequentially accessing the web pages associated with each product. It does so by sequentially substituting different model numbers and their associated product categories into the URL template. Id. For example, if the product category is "laptop computers" made by Dell and Dell has five separate models associated with this product category, then each such model number and the product category "laptop computers" would be substituted into the URL template. Id.
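As an illustration only, the sequential URL-template substitution described above can be sketched in a few lines of Python. The template string, the example.com host, the function name build_urls and the model identifiers are all hypothetical; the record does not disclose aQuire's actual template or code, and the sketch only builds the URLs without downloading anything.

```python
from typing import Iterator, List

# Hypothetical URL template; the record does not disclose the real one.
URL_TEMPLATE = "https://www.example.com/{category}/{model}"


def build_urls(category: str, models: List[str],
               template: str = URL_TEMPLATE) -> Iterator[str]:
    """Yield one URL per known model by filling the template in order."""
    for model in models:
        yield template.format(category=category, model=model)


# Only models supplied in advance by a person are visited; nothing is discovered.
urls = list(build_urls("laptop_computer", ["model1", "model2", "model3"]))
for url in urls:
    print(url)
```

The list of models is supplied by a person beforehand, which matches the description of a tool that fetches pages only for predefined products.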
For instance, aQuire would first download www.dell.com/laptop_computer/model#1, followed by www.dell.com/laptop_computer/model#2, and so on. This process would repeat until all web pages for the models known to aQuire were downloaded by aQuire. Id. aQuire does not look for unknown products; it is configured by humans to fetch web pages associated with predefined products if a manufacturer's website is consistently structured. Id. Specifically, aQuire simply stores "copies of the pages that it accesses on an Etilize Pakistan server to facilitate data extraction at a later time." Id.

B. Xtract

After aQuire downloads and stores copies of certain web pages on the Etilize server, Xtract is used to both "obtain raw attribute/value pairs (e.g. attribute: hard drive capacity; value: 40GB)" from these downloaded websites and to store these gathered pairs into the relevant product categories in Etilize Pakistan's templates. Id. ¶ 12. In order to gather and store these attribute/value pairs, Xtract utilizes predefined expression patterns to parse the downloaded web pages. Id. The Xtract software tool is semi-automated and researchers must specify, for each product attribute: 1) where to extract the information from the web page; 2) what information to extract; and 3) where to put it in the Etilize template. Id.

LEGAL STANDARD

I. Summary Judgment

As in any other civil action, summary judgment is proper in a patent infringement action when the pleadings, discovery and affidavits show that there is "no genuine issue as to any material fact and that the moving party is entitled to judgment as a matter of law." Fed. R. Civ. P. 56(c); see also Southwall Techs., Inc. v. Cardinal IG Co., 54 F.3d 1570, 1575 (Fed. Cir.), cert. denied, 516 U.S. 987 (1995). Material facts are those which may affect the outcome of the case. Anderson v.
Liberty Lobby, Inc., 477 U.S. 242, 248 (1986). A dispute as to a material fact is genuine if there is sufficient evidence for a reasonable jury to return a verdict in favor of the nonmoving party. Id. The party moving for summary judgment bears the burden of identifying those portions of the pleadings, discovery and affidavits that demonstrate the absence of a genuine issue of material fact. Celotex Corp. v. Catrett, 477 U.S. 317, 323 (1986). On an issue for which the opposing party will have the burden of proof at trial, the moving party need only point out "that there is an absence of evidence to support the nonmoving party's case." Id. at 325; Crown Operations Int'l, Ltd. v. Solutia, Inc., 289 F.3d 1367, 1377 (Fed. Cir. 2002). On the other hand, where the moving party bears the burden of proof on an issue, it must submit evidence sufficient to establish that no reasonable jury could find against it on that issue at trial. See Frank's Casing Crew & Rental Tools, Inc. v. Weatherford Int'l, Inc., 389 F.3d 1370, 1376 (Fed. Cir. 2004); Gart v. Logitech, Inc., 254 F.3d 1334, 1339 (Fed. Cir. 2001), cert. denied, 534 U.S. 1114 (2002).

Once the moving party meets its initial burden, the nonmoving party must go beyond the pleadings and, by its own affidavits or discovery, "set forth specific facts showing that there is a genuine issue for trial." Fed. R. Civ. P. 56(e). Mere allegations or denials do not defeat a moving party's allegations. Id.; Gasaway v. Nw. Mut. Life Ins. Co., 26 F.3d 957, 960 (9th Cir. 1994). The court may not make credibility determinations, and inferences to be drawn from the facts must be viewed in the light most favorable to the party opposing the motion. Masson v. New Yorker Magazine, 501 U.S. 496, 520 (1991); Anderson, 477 U.S. at 249.

II. Patent Infringement

Under the Patent Act, 35 U.S.C.
section 271, liability for patent infringement may be imposed on any person who without permission of the patentee "makes, uses, offers to sell, or sells any patented invention[] within the United States or imports into the United States any patented invention during the term of the patent therefor." The rights granted to the patentee are defined by the patent's claims. Markman v. Westview Instruments, Inc., 517 U.S. 370, 373 (1996). In determining whether an allegedly infringing device falls within the scope of the claims, a two-step process is used: first, the court must determine as a matter of law the meaning of the particular claim or claims at issue; and second, it must consider whether the accused product infringes one or more of the properly construed claims. Id. at 384; Allen Eng'g Corp. v. Bartell Indus., Inc., 299 F.3d 1336, 1344 (Fed. Cir. 2002). The second inquiry is a question of fact, although summary judgment of infringement or noninfringement may nonetheless be appropriate when no genuine dispute of material fact exists. Irdeto Access, Inc. v. Echostar Satellite Corp., 383 F.3d 1295, 1299 (Fed. Cir. 2004) (quoting Bai v. L & L Wings, Inc., 160 F.3d 1350, 1353 (Fed. Cir. 1998)).

The patentee bears the burden of proving infringement by a preponderance of the evidence. Laitram Corp. v. Rexnord, Inc., 939 F.2d 1533, 1535 (Fed. Cir. 1991). This burden can be met by showing that the patent is infringed either literally or under the doctrine of equivalents. See Linear Tech. Corp. v. Impala Linear Corp., 379 F.3d 1311, 1318 (Fed. Cir. 2004). To support a finding of literal infringement, the patentee must establish that "every limitation recited in the claim appears in the accused product, i.e., the properly construed claim reads on the accused product exactly." Jeneric/Pentron, Inc. v. Dillon Co., 205 F.3d 1377, 1382 (Fed. Cir. 2000) (citing Amhil Enters. Ltd.
v. Wawa, Inc., 81 F.3d 1554, 1562 (Fed. Cir. 1996)). Alternatively, where one or more elements of the claim are not literally present in the allegedly infringing product or process, infringement may nonetheless be found under the doctrine of equivalents if the differences between the accused device and the patented invention are "insubstantial." Honeywell Int'l, Inc. v. Hamilton Sundstrand Corp., 370 F.3d 1131, 1139 (Fed. Cir. 2004) (quoting Eagle Comtronics, Inc. v. Arrow Communication Labs., Inc., 305 F.3d 1303, 1315 (Fed. Cir. 2002)), cert. denied, 537 U.S. 1172 (2003).

As with literal infringement, the inquiry into whether infringement may be found under the doctrine of equivalents requires an element-by-element comparison of the patented invention to the accused device. Warner-Jenkinson Co. v. Hilton Davis Chem. Co., 520 U.S. 17, 40 (1997). Consequently, in applying the doctrine, the court must consider whether the accused device "contain[s] elements that are either identical or equivalent to each claimed element of the patented invention." Id.; EMI Group N. Am., Inc. v. Intel Corp., 157 F.3d 887, 896 (Fed. Cir. 1998), cert. denied, 526 U.S. 1112 (1999). Under the classic formulation of the doctrine of equivalents set forth in Graver Tank & Mfg. Co. v. Linde Air Prods. Co., 339 U.S. 605, 608 (1950), a feature of the accused device is "equivalent" to an element of the claimed invention if it performs substantially the same function in substantially the same way to achieve substantially the same result. See also Schoell v. Regal Mar. Indus., Inc., 247 F.3d 1202, 1209–10 (Fed. Cir. 2001). However, as the Supreme Court subsequently acknowledged in Warner-Jenkinson, this particular "linguistic framework" may not be appropriate in every case. 520 U.S. at 39–40.
Rather, the Court observed that "[a]n analysis of the role played by each element in the context of the specific patent claim [must] inform the inquiry as to whether a substitute element matches the function, way, and result of the claimed element, or whether the substitute element plays a role substantially different from the claimed element." Id. at 40. A number of other considerations may also be relevant in determining the range of equivalents to which the claimed invention is entitled, including the prosecution history of the patent-in-suit, the pioneer status of the invention (or lack thereof), and the limitations on patentability of the allegedly equivalent device that would have been imposed by the existing prior art at the time that the patent application was filed. Intel Corp. v. Int'l Trade Comm'n, 946 F.2d 821, 842 (Fed. Cir. 1991); see also K-2 Corp. v. Salomon S.A., 191 F.3d 1356, 1366–68 (Fed. Cir. 1999).

DISCUSSION

Etilize argues that multiple limitations in the claims of the patents-in-suit are not practiced by its devices. Before reaching the merits, the court discusses a threshold matter with respect to the manual processes employed by Etilize.

I. Manual Processes

The court notes that Etilize's manual collection of product information for input into the taxonomy, which it utilizes a majority of the time, cannot infringe upon either the '933 Patent or the '426 Patent. During patent prosecution, in order to distinguish its '426 Patent application from Blutinger, a prior art reference, CNET disavowed: 1) the manual determination of attributes; 2) the manual comparison of products to categories of the taxonomy; and 3) the manual generation of product specifications based on the determined attributes. Khaliq Dec., Exh. G at 28. CNET stated that "in contrast [to Blutinger] the present invention parses the product information records corresponding to each group electronically to determine the attributes;" makes comparisons "between the [sic] each identified product to categories of a taxonomy . . . electronically, not manually;" and generates product specifications based on the determined attributes electronically. Id.

Further, during patent prosecution, CNET also disavowed the manual collection of desired product information for input into the database in order to distinguish its '933 Patent application from Blutinger. CNET stated that "Blutinger does not disclose or suggest implementing [a system utilizing a crawler] to extract information from a website or a URL, but rather, discloses a centralized master catalog that is essentially manually generated." Id. at 30. Etilize, similar to Blutinger, utilizes a team of researchers who "typically go to a manufacturer's website, . . . gather the desired information, and fill the gathered information into the relevant fields of the template." Hameed Dec. ¶ 11. Further, Etilize utilizes a template based system for categorizing, comparing and generating product specifications and the creation of this template is a manual process. Id. ¶ 9. Accordingly, Etilize's use of manual methods to perform the comparing, determining, generating and gathering functions cannot infringe either of CNET's patents.

The court turns now to CNET's patent infringement claim based on Etilize's use of software tools to extract information from websites.

II. Infringement of the '933 Patent

Etilize argues its software tools do not infringe the '933 Patent because they do not (1) use a crawler (2) generated from a server (3) that visits a plurality of sources and (4) uses computational linguistics. The court discusses each in turn.

A. Crawler

A crawler is "a software program or programs which visit and search sources of content on a networked computer environment; have the capability to identify and gather info from the sources; and can include bots, robots, automated site searchers and the like." Docket No. 82 at 6. Rather than arguing that it does not employ software that falls within the construction of crawler provided by the court, Etilize focuses on the fact that because it does not use a server to deploy a crawler, and because it does not utilize a program which visits a "plurality of sources," it cannot be using a crawler. Both the "server" and "plurality of sources" issues are discussed below. However, to the extent that Etilize argues that it does not utilize a crawler, the court rejects this argument. aQuire is a "software program" that visits web pages or "sources" of content in a "networked computer environment" (i.e., the internet). Further, Xtract is a software tool that has the ability to "identify and gather" information from these web pages. Accordingly, the combination of aQuire and Xtract constitutes use of a crawler by Etilize.

B. Generated from a Server Interconnected to a Network Computer Environment

Etilize argues that aQuire is deployed by human operators from individual client computers connected to the internet, which it contends are not servers as required by Claim 1 of the '933 Patent and Claim 52 of the '426 Patent. Thus, Etilize is essentially making two arguments for the proposition that it does not generate a crawler from a server: 1) the crawler is not automatically generated but instead is deployed by human operators; and 2) the crawler is deployed from an individual client computer which does not constitute a server.

Etilize's first argument has already been rejected by this court in its claim construction order.
In rejecting Etilize's argument that a crawler must operate without human intervention, the court stated that "crawlers are not intended or claimed as software which operate perpetually, without any human intervention or instruction. Neither patent disclaims human initiation of the crawler search." Docket No. 82 at 12. Indeed, neither patent speaks to who deploys the crawler. Accordingly, a crawler deployed by a human operator can meet the "generating a crawler from a server" limitation at issue here.

The court now turns to Etilize's second argument that it does not generate a crawler from a server. The preferred embodiments of both the '933 and '426 patents state that a server "refers to any type of computing device . . . such as a personal computer, a portable computer . . . a hand held device, a wireless phone, or any combination of such devices. The various clients and servers can be a single computer at a single location." '933 Patent at 10:24–28 (emphases added); see also '426 Patent at 8:34–38. Accordingly, because a single computer can constitute a server, and Etilize utilizes a crawler that is deployed from individual client computers connected to the internet, Etilize practices this limitation found in Claim 1 of the '933 Patent and Claim 52 of the '426 Patent. Alternatively, this limitation is met under the doctrine of equivalents even if a "computer" connected to the internet, as opposed to a "server," generates the crawler.

C. Plurality of Sources

A "source" is "a webpage or other document that may be defined by a URL, or text, graphics, or links within such webpage or other document." Docket No. 66, Exh. A at 1.
This agreed upon definition is not surprising as the preferred embodiments of both the '933 and '426 patents state that the "crawler may crawl through the plurality of Web pages linked to the home Web page to gather product phrase information." '933 Patent at 12:55–58; '426 Patent at 11:11–14. Etilize now contends that the term "plurality of sources" should be construed as referring to web pages hosted by separate manufacturers or merchants. Thus, it is conflating the definition of "source" with that of an internet domain. However, as is clear from the parties' agreed construction of "source," a plurality of sources can be a plurality of web pages and is not limited to a plurality of separately hosted websites. The plain language of the construction states that a source is a "webpage . . . that may be defined by a URL." There is no argument that aQuire visits a multitude of web pages, each defined by a separate URL. Consequently, this limitation is practiced by the software tools utilized by Etilize.

D. Computational Linguistics

The final question is whether Etilize literally, or under the doctrine of equivalents, infringes the computational linguistics limitation of Claims 1 and 15 of the '933 Patent. The claim limitation states that the "crawler utilizes computational linguistics to gather said product phrase information which includes a phrase and at least one characteristic of said phrase." '933 Patent at 18:63–65. A "phrase" is a "string of characters, such as an alpha-numeric character string or strings present in a source." Docket No. 66, Exh. A at 1.
Further, "phrase characteristics" is defined as:

    Some attribute of a phrase that can be used to distinguish it from other phrases, such as the frequency, location, font size, font style, font case, font effects, and font color as well as the frequency of collocation (phrases immediately next to each other) and co-occurrence of phrases (phrases within a predetermined number of words of each other) includes the co-occurrence of phrases, which is when a particular word is within a number of words of another (e.g., 'weight' and 'lbs.').

Id. Based on the above, if the string of characters "weight: 5.8 lbs" appears in a source and that string is gathered, then "product phrase information" could have been gathered. Specifically, the co-occurrence of the phrase "weight" and the phrase "lbs" within a few words of each other distinguishes the phrase "weight" from other phrases. Thus, the "at least one characteristic of said phrase" limitation is met by the string through co-occurrence. Further, there can be no argument that the string of characters constituting the word "weight" is a phrase. Consequently, since both a "phrase" and "at least one characteristic of said phrase" have been gathered, "product phrase information" could have been gathered. There is also no argument that Xtract gathers exactly this type of information. The only remaining question is whether "computational linguistics" was utilized to gather the product phrase information.

"Computational linguistics" is "[a] cross disciplinary field of modeling of language utilizing computational analysis to process language data." Id. CNET's expert testifies that the definition of "computational linguistics" encompasses the use of regular expressions which he contends "use computational analysis to process text language data and are a way to describe text through pattern matching." Gray Dec., Exh. A at 36.
CNET contends Xtract practices this limitation by its use of regular expressions. A regular expression is a string of characters that is used to describe or match a set of strings according to certain rules used to construct sentences in natural languages. The specific rules of construction vary depending on the task, but regular expressions can search, manipulate and process text-based patterns. Language data, e.g., "weight: 5.8 lbs" can be matched according to a regular expression that looks for the following pattern: the string "weight: #.#" followed immediately by either the string "lbs" or "oz" and where # stands for any string of numerical characters only. Computational analysis would be required to parse a phrase to determine if it matches the language data sought. Thus, this modeling of parsed phrases could meet the definition of computational linguistics. Accordingly, CNET's expert is correct in concluding that regular expressions could be encompassed within the construction of computational linguistics. Alternatively, even if Etilize does not use regular expressions, the court finds that use of pattern matching to find relevant phrases on a web page could meet the definition of computational linguistics. This is because pattern matching utilizes computational analysis (comparing letters in a string) to process language data (phrases in the web page).

Etilize incorrectly argues that because it utilizes a manual process, and not a crawler, it cannot carry out the computational linguistics required by Claim 1. Etilize thus merges two separate inquiries into one, because if the court finds use of a crawler, the computational linguistics issue remains ripe. Because of this conflation, Etilize correctly argues only that aQuire does not carry out computational linguistics and thus does not infringe the '933 Patent.
It, however, ignores the parsing done by Xtract.3 Etilize acknowledges that Xtract pulls or gathers attribute/value pairs from the collected web pages using predefined expression patterns.4 Hameed Dec. ¶ 12. Based on this admission, the court finds it unnecessary to rely upon the Christensen declaration, which points out the deficiencies in CNET's expert report. Indeed, even though Etilize asserts that much of CNET's expert declaration is deficient because it analyzes source code that is not used by Xtract, nowhere does Etilize state that Xtract does not make use of predefined expressions.

Further, instead of refuting the argument that use of predefined expressions constitutes use of computational linguistics, Etilize argues that because the use of Xtract is not fully automated, it cannot utilize computational linguistics. When making this argument Etilize states that since "researchers have to specify for every product attribute where to extract the information from the web page, what information to extract, and where to put it in the template," id., Xtract does not utilize computational linguistics. This argument fails for two reasons. First, the court has already rejected Etilize's proposition that two additional limitations, "automatically" and "without human intervention," should be inserted into the term "crawler." Docket No. 82 at 10. Thus, the fact that some, or even a majority, of the operations Etilize performs to gather product information are performed manually does not mean that Xtract cannot utilize computational linguistics.
Second, although researchers have to specify where to extract product information from a web page, what information to extract and where to put it in the template, the actual obtaining or gathering of the attribute/value pairs, e.g., "attribute: weight; value: #.# lbs," is nevertheless performed by Xtract through the use of predefined expression patterns. The expression patterns used by Xtract can be used to gather product phrase information because they gather a phrase and at least one characteristic of said phrase as required by Claim 1 of the `933 Patent. For example, where a string of characters with the values of "price," "hard drive capacity," and "weight" are found on a website, regular expressions may be used to produce the following results: "price: $1299.99;" "hard drive: 40GB;" and "weight: 5.4 lbs." In sum, it is the use of expression patterns that may constitute the use of computational linguistics to gather product phrase information. Etilize argues that Xtract does not gather any formatting or other characteristics of the extracted attributes; instead, it simply allows a research operator to "pull" attribute/value pairs from isolated web pages for entry into the template. These two statements by Etilize are inconsistent because "pulling" attribute/value pairs is synonymous with "gathering" those pairs. Since pulling name/value pairs of data through pattern matching can be the utilization of computational linguistics to gather product phrase information, the court concludes that CNET has met its burden and this limitation may be practiced by Etilize.5 In sum, since the software tools employed by Etilize, aQuire and Xtract, could, in combination, be found to practice every limitation set forth in Claims 1 and 15 of the `933 Patent, summary judgment on non-infringement is DENIED.
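For illustration only, the kind of pattern matching described above (the string "weight:" followed by a number of the form #.# and then either "lbs" or "oz") might be sketched in Python as follows. This sketch is not drawn from the record and does not purport to reflect Xtract's source code; the pattern, the function name `extract_weight` and the sample strings are all hypothetical.

```python
import re

# Hypothetical pattern modeled on the example in the text: the label
# "weight:", a number of the form #.#, then either "lbs" or "oz".
WEIGHT_PATTERN = re.compile(r"weight:\s*(\d+\.\d+)\s*(lbs|oz)\b", re.IGNORECASE)


def extract_weight(text):
    """Return an (attribute, value) pair, or None if the pattern is absent."""
    match = WEIGHT_PATTERN.search(text)
    if match is None:
        return None
    number, unit = match.groups()
    return ("weight", f"{number} {unit}")


if __name__ == "__main__":
    # Sample phrases invented for this sketch.
    print(extract_weight("Shipping weight: 5.8 lbs (approx.)"))
    print(extract_weight("weight: heavy"))
```

Whether the match succeeds depends only on a character-by-character comparison of the phrase against the pattern, which is the sense in which the text above describes pattern matching as computational analysis of language data.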
III. Infringement of the `426 Patent

Etilize argues its software tools do not infringe the claims of the `426 Patent because they do not practice the following limitations: 1) grouping; 2) electronic comparing; 3) electronic parsing; 4) electronic generating; 5) use of a crawler; 6) crawler generated from a server; and 7) visiting a plurality of sources. Limitations five through seven are discussed above and the remaining limitations, to the extent necessary, are discussed here. However, the court first discusses Etilize's contention that it does not perform the claim limitations in the order specified in the claims.

A. Order of Performance

The court accepts Etilize's argument that the claims at issue here cannot be infringed unless the accused product practices each limitation in the same order as stated in the claim. CNET cites to Loral Fairchild Corp. v. Sony Corp., 181 F.3d 1313, 1322 (Fed. Cir. 1999), to argue that order is irrelevant if all the limitations are practiced by the accused device. In Loral, the court concluded that "although not every process claim is limited to the performance of its steps in the order written, the language of the claim, the specification and the prosecution history [may] support a limiting construction." Id. at 1321. Additionally, where the "sequential nature of the claim steps is apparent from the plain meaning of the claim language and nothing in the written description [of the patent] suggests otherwise" the steps of the claim are to be performed in sequential order. Mantech Envtl. Corp. v. Hudson Envtl. Servs., Inc., 152 F.3d 1368, 1375-76 (Fed. Cir. 1998). Here, the plain meaning of the claim language supports a finding that the steps within Claims 1, 39 and 52 are to be performed sequentially.
Claim 1 of the `426 Patent recites:

    A method of creating a product catalog stored on computer readable media by aggregating product information from a plurality of product information sources having disparate formats for product information and storing the information in a taxonomy, said method comprising:
    processing plural product information records from the product information sources into one or more groups based on which product information records are likely to correspond to the same product;
    correlating a unique product ID corresponding to the product associated with each of said groups to identify the product;
    electronically comparing each identified product to categories of a taxonomy to determine a category for the identified products in the taxonomy; and
    electronically parsing the product information records corresponding to each group to electronically determine attributes for each categorized product based on the product information records;
    electronically generating product specifications based on the determined attributes; and
    storing the product specification in the corresponding determined categories of the taxonomy.

`426 Patent at 36:22-45.

Claim 1 must be performed sequentially. The initial step, the processing step, creates the groups. The second step correlates a unique product ID to each group. No correlation of a product ID to the groups can occur until the groups are created. The third step compares the unique product to categories of a taxonomy. However, no categorization can occur until the product is identifiable by a product ID. The fourth step determines the categorized product's attributes. This obviously requires that the product be categorized. The fifth step generates product specifications based on the determined attributes.
However, no specifications could be generated until the attributes are determined in the fourth step. Finally, the product specification is stored in the previously determined category of the taxonomy. No product specification could be stored until it is created in the previous step. Thus, the steps of this method claim must be performed in order. Because Claim 39 is substantially similar to Claim 1, the same conclusion follows for Claim 39.6

Etilize's process categorizes the product in question before any of the steps in the claims are performed. Specifically, after Etilize determines that information about a particular product must be acquired, it first classifies the product to a category. Hameed Dec. ¶ 8. This classification can be manual or automatic. Id. Since each product category has a template associated with it, upon association of a product with a category, a product template with relevant product attributes is also associated with the product. Id. ¶ 9. For instance, the template for the category of notebook computer would include relevant product attributes such as processor technology, RAM memory space, hard drive capacity and display size. If no template exists for the category, one is created manually. Id. The template could then be populated using aQuire and Xtract. Claims 1 and 39 of the `426 Patent require that the grouping step occur before the product is categorized; however, the Etilize process categorizes before grouping. CNET's expert simply asserts that ConQuire, another software tool utilized by Etilize, compares the product to categories of a taxonomy to determine the appropriate category.7 That may be true; however, the same does not demonstrate that the grouping occurs before the categorization. Indeed, the use of ConQuire, to the extent it performs categorization, is perfectly consistent with Hameed's declaration, which states that the categorization is sometimes performed automatically. Id. ¶ 8.
With respect to the order in which the steps are performed, CNET's expert simply states that "Etilize's asserted different order, or combination of multiple steps into a single step is an insubstantial change from the process elements of Claim 1, and any such change performs substantially the same function in substantially the same way to achieve substantially the same result." Gray Dec., Exh. A at 47. This conclusory statement cannot carry the day. Consequently, the court finds that Etilize's process categorizes the product before performing any of the steps of Claims 1 and 39 of the `426 Patent. Thus, Etilize does not infringe Claims 1 and 39, as well as all claims dependent on these independent claims of the `426 Patent.

B. Grouping

The "grouping" limitation is present in the first limitation of Claims 1 and 39 of the `426 Patent as well as in the third limitation of Claim 52 of the `426 Patent. Etilize argues that it does not sort or otherwise process a collection of product information records into groups based on their similarities or differences. The use of aQuire "necessitates that there [be] a vendor with a large number of similar products being acquired and a URL pattern [that] can be repeated over and over again [by] just replacing the product ID or SKU." Hameed Dec., Exh. D at 5. Hameed states that if, for example, the "product category is disk drive systems available from IBM and the IBM site generally has three pages that are associated with each product" aQuire would store copies of all three pages associated with the product on an Etilize server. Hameed Dec. ¶ 11. For instance, to fetch pages related to IBM's x500 disk drive, aQuire would copy: "ibm.com/disk_drive/x500/index.html," "ibm.com/disk_drive/x500/features.html" and "ibm.com/disk_drive/x500/specifications.html."
According to CNET, aQuire practices this limitation because it downloads multiple web pages associated with a singular product and these downloaded web pages constitute a "group." CNET's expert states that "[p]rior to extraction from the data from the web page [by Xtract] the process method assigns the [downloaded] page to a group based on the site data related to the link that was crawled to obtain the page." Gray Dec., Exh. A at 44 (emphasis added). No explanation is given as to the "site data related to the link." Consequently, the court is left to conjecture. The court finds that "site data related to the link" simply means that the uniform resource locator ("URL") of the website is used to identify the downloaded web page. Simply stated, Etilize's process likely stores the website www.ibm.com/disk_drive/x500/index.html on its servers with the filename "ibm.com_disk_drive_x500_index.html."8 Other downloaded web pages associated with the x500 disk drive likely have similar names. It is this similarity of identification that CNET presumably claims constitutes the group. This so-called grouping, however, is fundamentally different from the grouping envisioned by the claim limitation. The limitation speaks to "processing plural product information records from the product information sources into one or more groups based on which product information records are likely to correspond to the same product." Here, it is uncontested that each web page, which CNET implicitly contends is a "product information record," refers to a particular known product. No determination as to which web page belongs to which product needs to be made. This information is known well before the web page is even downloaded.
Consequently, the notion that web pages need to be placed into a group such that all group members likely correspond to the same product is nonsensical. CNET is unsuccessful even if the court accepts that Etilize's process requires that web pages be placed into groups based on which product information records, i.e., web pages, are likely to correspond to the same product. An argument can be made that all downloaded web pages with "x500" in their filename belong to a group that represents a unique product, likely IBM's x500 disk drive. This grouping based on the "site data related to the link," however, is not performed by Etilize to identify the product or its attributes, as required by the claim limitations. The so-called grouping is based on URLs associated with an identified product that already has predefined attributes in Etilize's template, and therefore, no reasonable jury could find that it is performed with the purpose of either identifying the product, `426 Patent, Claims 1, 39, or identifying the attributes of the product, `426 Patent, Claim 52. CNET argues that grouping should extend not only to the grouping of product information records based on which records are likely to correspond to the same product, but also to the grouping and prioritizing of Etilize's customer requests. Gray Dec., Exh. A at 45. By broadly interpreting the grouping limitation, not only does CNET seek to now expand upon the construction that the parties agreed to but also to recapture what it disavowed during the prosecution of the `426 Patent. In order to differentiate the `426 Patent from the prior-art reference of Blutinger, CNET specified that "the recited initial processing of the product information records into one or more groups does not fall under any taxonomy structure.
Instead, this grouping refers to the fact that the product information records are analyzed for similarities and differences and associated together based upon which product information records are likely to correspond to the same product." Khaliq Dec., Exh. G at 27. Based on this prior disavowal and the parties' agreement to the contrary, the court rejects CNET's broad interpretation and restricts the limitation to associating together product information records that are likely to correspond to the same product.9 In sum, Etilize's products do not practice this limitation and summary judgment on noninfringement of every independent, and consequently dependent, claim of the `426 Patent still in issue in this action is GRANTED.10

C. Burden Shifting

CNET argues that summary judgment is premature because the court has not yet addressed the applicability of 35 U.S.C. section 295. Section 295 states:

    In actions alleging infringement of a process patent based on the importation, sale, offer for sale, or use of a product which is made from a process patented in the United States, if the court finds--
    (1) that a substantial likelihood exists that the product was made by the patented process, and
    (2) that the plaintiff has made a reasonable effort to determine the process actually used in the production of the product and was unable so to determine,
    the product shall be presumed to have been so made, and the burden of establishing that the product was not made by the patented process shall be on the party asserting that it was not so made.

35 U.S.C. § 295 (emphasis added). Here, CNET has been able to determine the process actually used in the production of the allegedly infringing product. Indeed, CNET has all of Etilize's source code. Thus, because the second factual requirement is not met, the burden shifting of section 295 does not apply.
IV. Motions to Strike

CNET objects to the declarations submitted by Etilize's Chief Executive Officer ("CEO") and an individual in its employ. Specifically, CNET argues that: 1) Etilize's CEO, Azhar Hameed, is not competent to testify because he has no personal knowledge of the software at issue; and 2) Etilize's employee, Benjamin Christensen, is not competent to testify because he is introducing new arguments for the first time in Etilize's reply and because of a lack of personal knowledge.

A. Azhar Hameed

CNET makes three specific arguments. First, Hameed testified that he has neither seen nor used the Xtract system; second, Hameed has no familiarity with Webspinx; and third, Hameed did not prepare the entirety of his declaration on his own. In the alternative, CNET seeks to strike paragraphs 1, 5 and 7-14 of Hameed's declaration. Each of these arguments is unconvincing. First, CNET is correct to state that Hameed simply states, in a conclusory manner, that his declaration is based on personal knowledge. He does not specify his basis for the same. However, Hameed is Etilize's CEO. In his capacity as CEO, he must have knowledge as to how his company conducts business. Specifically, the fact that Hameed has never used the software in question is of no occasion. Hameed does not purport to explain the details of the software or testify as an expert witness. He simply seeks to explain the general process employed by the software. He need not run the software himself in order to understand the broad strokes; he may rely upon subordinates to explain the same to him. Indeed, as CEO, it is proper for the court to infer that Hameed has general personal knowledge of the process used to obtain data for Etilize's product. See Barthelemy v. Air Lines Pilots Ass'n, 897 F.2d 999, 1018 (9th Cir.
1990) ("personal knowledge and competence to testify are reasonably inferred from [the declarant's] position[] and the nature of their participation in the matters to which they swore"). Second, the court does not consider Hameed's testimony with respect to Webspinx and consequently, objections to the evidence on that issue are moot. Third, the fact that Hameed may have copied, verbatim, language from opinion counsel's opinion is also unpersuasive. Hameed swore to the functionality of the software under penalty of perjury and if opinion counsel's language was the clearest formulation of the same, then the court will not penalize Hameed for recycling the language. CNET also argues that Hameed's declaration contains conclusions that are unsupported by underlying factual information. To the extent that is true, it pertains to the evidentiary weight of the declaration, not the admissibility thereof. In sum, CNET's motion to strike Hameed's declaration is DENIED. For the same reasons, CNET's alternative request to strike particular paragraphs of Hameed's declaration is also DENIED.

B. Benjamin Christensen

The court does not rely upon Christensen's declaration in reaching its decision. Consequently, all of CNET's objections to Christensen's declaration are DENIED as moot.

V. Motion to Bifurcate

Etilize seeks to bifurcate this action into two phases. Specifically, it seeks resolution of whether Etilize infringes CNET's patents and whether those patents are valid and enforceable before resolution of whether CNET has engaged in unfair business practices. The motion is denied as untimely for the following reasons.
Federal Rule of Civil Procedure 42(b) states: "For convenience, to avoid prejudice, or to expedite and economize, the court may order a separate trial of one or more separate issues, claims, crossclaims, counterclaims, or third-party claims." Etilize has raised some compelling reasons to bifurcate the trial in this action: 1) bifurcation will reduce the risk of jury confusion; 2) the issues are separable; and 3) bifurcation will lead to judicial economy. However, the trial date for this action is still at least six months away. See Docket No. 86. Indeed, expert discovery is ongoing and there are recently filed dispositive motions that the court has yet to hear. Thus, the issues that will actually be tried may be vastly different from the issues currently present in the action. Consequently, the court shall not order the trial bifurcated at this time.

The court notes that even if it ordered the trial bifurcated, the same does not necessitate that discovery on the bifurcated issues be stayed. Indeed, the motion to bifurcate seems designed by Etilize to simply delay discovery on counterclaims brought by Etilize. Etilize states it does not wish to incur the expense necessary to prepare the expert report for its unfair business practices cause of action. However, Etilize brought the counterclaim and cannot now be heard to argue that bifurcation will save it the expense of preparing expert reports. Furthermore, none of the arguments Etilize advances in support of bifurcation supports a stay of discovery regarding the issues to be bifurcated. For this additional reason, the motion to bifurcate is DENIED. While bifurcation of discovery is not justified, the court may at a later date consider bifurcation for trial.

VI. Etilize's Supplemental Brief

The court does not rely upon the arguments regarding the `426 Patent in Etilize's supplemental brief since the same is unnecessary. Further, the court does not rely upon Etilize's arguments regarding the "determining" step of the `933 Patent since the same was not put into issue by Etilize in its opening brief.11 Further, considering Etilize's submission would unfairly prejudice CNET by allowing Etilize to place new information and argument before this court that CNET has not had an opportunity to refute. To the extent that the supplemental brief relies upon new information previously unavailable to Etilize, Etilize may not be heard to complain. Etilize filed the instant motion on non-infringement before expert discovery was complete. Consequently, to the extent Etilize did not have the relevant information at the time it filed its motion, the ignorance was self-inflicted.

CONCLUSION

For the foregoing reasons, defendant's motion on noninfringement is GRANTED in part and DENIED in part; plaintiff's motions to strike are DENIED; defendant's motion to bifurcate is DENIED; and defendant's motion to file a supplemental brief is DENIED.

IT IS SO ORDERED.

Dated: August 29, 2008
MARILYN HALL PATEL
United States District Court Judge
Northern District of California

ENDNOTES

1. Defendant's request to file an oversized reply brief is GRANTED.

2. The difference between the two correlating steps is insubstantial. Compare "correlating a unique product ID corresponding to an identified product for each of said groups" with "correlating a unique product ID corresponding to the product associated with each of said groups to identify the product." `426 Patent at 40:8-9; 36:31-33.

3. The fact that Xtract, which uses html identifiers, is run on a downloaded copy of a web page as opposed to the same web page on the internet is of no significance. Running Xtract on either results in the same functions being carried out in the same way to achieve the same result.

4. The fact that the extraction takes place from tables or predefined coordinates on a web page is irrelevant since pattern matching is nevertheless used to perform the extraction.

5. The analysis for finding infringement as a matter of law requires that all claim limitations be practiced by the accused product. CNET has not moved for the same here.

6. Claim 52 also requires sequential performance. The first step generates the crawler, the second uses the crawler to gather information, the third groups this information, the fourth parses the grouped information to determine attributes and the fifth creates a catalog based on the determined attributes. The steps must be performed in order because each subsequent step depends upon the results generated by its predecessor.

7. The fact that ConQuire is not itself accused of infringement is irrelevant because SpeX, the final product sold by Etilize and which utilizes ConQuire, is the product accused of infringement.

8. Depending on the operating system used, the software also likely changes the URL slightly to account for special characters that may not be used as part of a filename.

9. CNET also argues that placing web pages to be downloaded via aQuire in a queue is "grouping." This does not constitute a "group" in any sense of the word. Indeed, it is impossible to determine the group members. Furthermore, to the extent this queuing could constitute "grouping" as envisioned by the claims of the `426 Patent, it is not performed for the purpose of identifying the product or its attributes.

10. Based on this holding, the court does not reach the parties' "electronically" arguments.

11. The same rationale applies to the argument regarding the "determining" step in Etilize's reply brief.
