Personalized User Model LLP v. Google Inc.
Filing
118
APPENDIX re 116 Claim Construction Opening Brief [Appendix of Exhibits (Vol. II of II)] by Google Inc.. (Attachments: # 1 Exhibit C part 1, # 2 Exhibit C part 2, # 3 Exhibit C part 3, # 4 Exhibit C part 4, # 5 Exhibit C part 5, # 6 Exhibit D, # 7 Exhibit E, # 8 Exhibit F)(Moore, David)
Personalized User Model LLP v. Google Inc.
D
Doc. 118 Att.
EXHIBIT E
ockets.Justia.com
FOU N DATIONS OF STATISTICAL NATURAL LA N GUAGE PROCESSING
© 1999 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. Typeset in 10/13 Lucida Bright by the authors using LATEX 2E. Printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Information Manning, Christopher D. Foundations of statistical natural language processing / Christopher D. Manning, Hinrich Schiitze.
p. cm. Includes bibliographical references (p. ) and index. ISBN 978-0-262-13360-9 (hc.:alkpaper)
1. Computational linguistics-Statistical methods. 1. Schiitze, Hinrich.
II. Title. P98.5.S83M36 1999 410'.285-dc21
99-21137
CIP
10
2.1 Elementary Probability Theory
41
s P(Q) = 1 DISJOINT s Countable additivity : For disjoint sets Aj E F (i.e., Aj n Ak = 0 for j ^ k) co 00 P(U Aj) _ Y. P(Aj)
j=1
j=1
PROBABILITY SPACE
We call P(A) the probability of the event A. These axioms say that an event that encompasses, say, three distinct possibilities must have a probability that is the sum of the probabilities of each possibility, and that since an experiment must have some basic outcome as its result, the probability of that is 1. Using basic set theory, we can derive from these axioms a set of further properties of probability functions; see exercise 2.1. A well-founded probability space consists of a sample space 0, a o--field of events F, and a probability function P. In Statistical NLP applications, we always seek to properly define such a probability space for our models. Otherwise, the numbers we use are merely ad hoc scaling factors, and there is no mathematical theory to help us. In practice, though, corners often have been, and continue to be, cut. Example 1: A fair coin is tossed 3 times. What is the chance of 2 heads?
Solution : The experimental protocol is clear. The sample space is: 0 = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT } Each of the basic outcomes in 0 is equally likely, and thus has probability 1/8. A situation where each basic outcome is equally likely is called a uniform distribution. In a finite sample space with equiprobable basic outcomes, P(A) _ 1' 0(where A is the number of elements in a set A). 1
UNIFORM DISTRIBUTION
The event of interest is:
A = {HHT, HTH, THH} So: P(A) = iii = 3
42
2 Mathematical Foundations
Figure 2 .1 A diagram illustrating the calculation of conditional probability P(AI B). Once we know that the outcome is in B, the probability of A becomes P(A n B)/P(B).
2.1.2
Conditional probability and independence
Sometimes we have partial knowledge about the outcome of an experiment and that naturally influences what experimental outcomes are possible. We capture this knowledge through the notion of conditional probability. This is the updated probability of an event given some knowledge. The probability of an event before we consider our additional knowledge is called the prior probability of the event, while the new probability that results from using our additional knowledge is referred to as the posterior probability of the event. Returning to example 1 (the chance of getting 2 heads when tossing 3 coins), if the first coin has been tossed and is a head, then of the 4 remaining possible basic outcomes, 2 result in 2 heads, and so the probability of getting 2 heads now becomes z . The conditional probability of an event A given that an event B has occurred (P (B) > 0) is: P(AIB) = P(A n B) P (B)
CONDITIONAL PROBABILITY PRIOR PROBABILITY POSTERIOR PROBABILITY
(2.2)
Even if P (B) = 0 we have that: (2.3) P(A n B) = P(B)P(AEB) = P(A)P(BjA) [The multiplication rule]
We can do the conditionalization either way because set intersection is symmetric (A n B = B n A). One can easily visualize this result by looking at the diagram in figure 2.1.
2.1 Elementary Probability Theory
43
CHAIN RULE
The generalization of this rule to multiple events is a central result that will be used throughout this book, the chain rule:
(2.4)
P(AI n ... n An) = P(AI)P(A21AI)P(A3IAI n A2) ... P(AnI nni=1 Ai) V The chain rule is used in many places in Statistical NLP, such as working out the properties of Markov models in chapter 9. Two events A, B are independent of each other if P(AnB ) = P(A)P(B). Unless P(B) = 0 this is equivalent to saying that P (A) = P(AIB) ( i.e., knowing that B is the case does not affect the probability of A). This equivalence follows trivially from the chain rule . Otherwise events are dependent. We can also say that A and B are conditionally independent given C when P( A n C- P A I ) (IC B I ) - (C P B )
INDEPENDENCE
DEPENDENCE CONDITIONAL INDEPENDENCE
2.1.3
BAYES' THEOREM
Bayes' theorem
Bayes' theorem lets us swap the order of dependence between events. That is, it lets us calculate P (B I A) in terms of P (AI B). This is useful when the former quantity is difficult to determine . It is a central tool that we will use again and again, but it is a trivial consequence of the definition of conditional probability and the chain rule introduced in equations (2.2) and (2.3):
(2.5) NORMALIZING CONSTANT
P(BIA)
= P(B n A) - P(AIB)P(B) P(A) P(A)
The righthand side denominator P(A) can be viewed as a normalizing constant , something that ensures that we have a probability function. If we are simply interested in which event out of some set is most likely given A, we can ignore it. Since the denominator is the same in all cases, we have that: arg max B P(AIB)P (B) = argmaxP (AIB)P(B) P(A) B
(2.6)
However , we can also evaluate the denominator by recalling that: P(A n B) = P(AIB)P(B) P(A n B) = P(AIB)P(B) So we have: P(A) = P(A n B) + P(A n B) = P(AIB)P(B) +P(AIB)P(B) [additivity]
Disclaimer: Justia Dockets & Filings provides public litigation records from the federal appellate and district courts. These filings and docket sheets should not be considered findings of fact or liability, nor do they necessarily reflect the view of Justia.
Why Is My Information Online?