"The Apple iPod iTunes Anti-Trust Litigation"

Filing 753

***ERRONEOUS ENTRY, PLEASE REFER TO DOCUMENT NO. 754 *** EXHIBITS re 752 Opposition/Response to Motion, filed by Apple Inc. (Attachments: # 1 Exhibit 2, # 2 Exhibit 3, # 3 Exhibit 4, # 4 Exhibit 5, # 5 Exhibit 6, # 6 Exhibit 7, # 7 Exhibit 8, # 8 Exhibit 9, # 9 Exhibit 11, # 10 Exhibit 12, # 11 Proposed Order)(Related document(s) 752 ) (Kiernan, David) (Filed on 1/14/2014) Modified on 1/14/2014 (jlmS, COURT STAFF).

Exhibit 12

ARTICLE IN PRESS
Journal of Econometrics 141 (2007) 597–620
www.elsevier.com/locate/jeconom

Asymptotic properties of a robust variance matrix estimator for panel data when T is large

Christian B. Hansen
University of Chicago, Graduate School of Business, 5807 South Woodlawn Ave., Chicago, IL 60637, USA
Available online 20 November 2006
E-mail address: chansen1@chicagoGSB.edu.
0304-4076/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2006.10.009

Abstract

I consider the asymptotic properties of a commonly advocated covariance matrix estimator for panel data. Under asymptotics where the cross-section dimension, $n$, grows large with the time dimension, $T$, fixed, the estimator is consistent while allowing essentially arbitrary correlation within each individual. However, many panel data sets have a non-negligible time dimension. I extend the usual analysis to cases where $n$ and $T$ go to infinity jointly and where $T \to \infty$ with $n$ fixed. I provide conditions under which t and F statistics based on the covariance matrix estimator provide valid inference and illustrate the properties of the estimator in a simulation study. © 2007 Elsevier B.V. All rights reserved.

JEL classification: C12; C13; C23

Keywords: Panel; Heteroskedasticity; Autocorrelation; Robust; Covariance matrix

1. Introduction

The use of heteroskedasticity robust covariance matrix estimators, cf. White (1980), in cross-sectional settings and of heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimators, cf. Andrews (1991), in time series contexts is extremely common in applied econometrics. The popularity of these robust covariance matrix estimators is due to their consistency under weak functional form assumptions. In particular, their use allows the researcher to form valid confidence regions about a set of parameters from a model of interest without specifying an exact process for the disturbances in the model.

With the increasing availability of panel data, it is natural that the use of robust covariance matrix estimators for panel data settings that allow for arbitrary within individual correlation is becoming more common. A recent paper by Bertrand et al. (2004) illustrated the pitfalls of ignoring serial correlation in panel data, finding through a simulation study that inference procedures which fail to account for within individual serial correlation may be severely size distorted. As a potential resolution of this problem, Bertrand et al. (2004) suggest the use of a robust covariance matrix estimator proposed by Arellano (1987) and explored in Kezdi (2002) which allows arbitrary within individual correlation and find in a simulation study that tests based on this estimator of the covariance parameters have correct size.

One drawback of the estimator of Arellano (1987), hereafter referred to as the "clustered" covariance matrix (CCM) estimator, is that its properties are only known in conventional panel asymptotics as the cross-section dimension, $n$, increases with the time dimension, $T$, fixed. While many panel data sets are indeed characterized by large $n$ and relatively small $T$, this is not necessarily the case. For example, in many differences-in-differences and policy evaluation studies, the cross-section is composed of states and the time dimension of yearly or quarterly (or occasionally monthly) observations on each state for 20 or more years.

In this paper, I address this issue by exploring the theoretical properties of the CCM estimator in asymptotics that allow $n$ and $T$ to go to infinity jointly and in asymptotics where $T$ goes to infinity with $n$ fixed. I find that the CCM estimator, appropriately normalized, is consistent without imposing any conditions on the rate of growth of $T$ relative to $n$ even when the time series dependence between the observations within each individual is left unrestricted.
In this case, both the OLS estimator and the CCM estimator converge at only the $\sqrt{n}$-rate, essentially because the only information is coming from cross-sectional variation. If the time series process is restricted to be strongly mixing, I show that the OLS estimator is $\sqrt{nT}$-consistent but that, because high lags are not downweighted, the robust covariance matrix estimator still converges at only the $\sqrt{n}$-rate. This behavior suggests, as indicated in the simulations found in Kezdi (2002), that it is the $n$ dimension and not the size of $n$ relative to $T$ that matters for determining the properties of the CCM estimator.

It is interesting to note that the limiting behavior of $\hat\beta$ changes "discontinuously" as the amount of dependence is limited. In particular, the rate of convergence of $\hat\beta$ changes from $\sqrt{n}$ in the "no-mixing case" to $\sqrt{nT}$ when mixing is imposed. However, despite the difference in the limiting behavior of $\hat\beta$, there is no difference in the behavior of standard inference procedures based on the CCM estimator between the two cases. In particular, the same t and F statistics will be valid in either case (and in the $n \to \infty$ with $T$ fixed case) without reference to the asymptotics or degree of dependence in the data.

I also derive the behavior of the CCM estimator as $T \to \infty$ with $n$ fixed, where I find the estimator is not consistent but does have a limiting distribution. This result corresponds to asymptotic results for HAC estimators without truncation found in recent work by Kiefer and Vogelsang (2002, 2005), Phillips et al. (2003), and Vogelsang (2003).
While the limiting distribution is not proportional to the true covariance matrix in general, it is proportional to the covariance matrix in the important special case of iid data across individuals,¹ allowing construction of asymptotically pivotal statistics in this case. In fact, in this case, the standard t-statistic is not asymptotically normal but converges in distribution to a random variable which is exactly proportional to a $t_{n-1}$ distribution. This behavior suggests the use of the $t_{n-1}$ for constructing confidence intervals and tests when the CCM estimator is used as a general rule, as this will provide asymptotically correct critical values under any asymptotic sequence.

¹ Note that this still allows arbitrary correlation and heteroskedasticity within individuals, but restricts that the pattern is the same across individuals.

I then explore the finite sample behavior of the CCM estimator and tests based upon it through a short simulation study. The simulation results indicate that tests based on the robust standard error estimates generally have approximately correct size in serially correlated panel data even in small samples. However, the standard error estimates themselves are considerably more variable than their counterparts based on simple parametric models. The bias of the simple parametric estimators is also typically smaller in the cases where the parametric model is correct, suggesting that these standard error estimates are likely preferable when the researcher is confident in the form of the error process. In the simulation, I also explore the behavior of an analog of White's (1980) direct test for heteroskedasticity proposed by Kezdi (2002).² The results indicate the performance of the test is fairly good for moderate $n$, though it is quite poor when $n$ is small.
This simulation behavior suggests that this test may be useful for choosing between the use of robust standard error estimates and standard errors estimated from a more parsimonious model when $n$ is reasonably large.

² Solon and Inoue (2004) offers a different testing procedure for detecting serial correlation in fixed effects panel models. See also Bhargava et al. (1982), Baltagi and Wu (1999), Wooldridge (2002, pp. 275, 282–283), and Drukker (2003).

The remainder of this paper is organized as follows. In Section 2, I present the basic framework and the estimator and test statistics that will be considered. The asymptotic properties of these estimators are collected in Section 3, and Section 4 contains a discussion of a Monte Carlo study assessing the finite sample performance of the estimators in simple models. Section 5 concludes.

2. A heteroskedasticity–autocorrelation consistent covariance matrix estimator for panel data

Consider a regression model defined by

$$y_{it} = x_{it}'\beta + \epsilon_{it}, \tag{1}$$

where $i = 1, \ldots, n$ indexes individuals, $t = 1, \ldots, T$ indexes time, $x_{it}$ is a $k \times 1$ vector of observable covariates, and $\epsilon_{it}$ is an unobservable error component. Note that this formulation incorporates the standard fixed effects model as well as models which include other covariates that enter the model with individual specific coefficients, such as individual specific time trends, where these covariates have been partialed out. In these cases, the variables $x_{it}$, $y_{it}$, and $\epsilon_{it}$ should be interpreted as residuals from regressions of $x_{it}^*$, $y_{it}^*$, and $\epsilon_{it}^*$ on an auxiliary set of covariates $z_{it}^*$ from the underlying model $y_{it}^* = x_{it}^{*\prime}\beta + z_{it}^{*\prime}\gamma + \epsilon_{it}^*$. For example, in the fixed effects model, $Z^*$ is a matrix of dummy variables for each individual and $\gamma$ is a vector of individual specific fixed effects. In this case, $x_{it} = x_{it}^* - (1/T)\sum_{t=1}^{T} x_{it}^*$, and $y_{it}$ and $\epsilon_{it}$ are defined similarly. Alternatively, $x_{it}$, $y_{it}$, and $\epsilon_{it}$ could be interpreted as variables resulting from other transformations which remove the nuisance parameters from the equation, such as first-differencing to remove the fixed effects. In what follows, all properties are given in terms of the transformed variables for convenience. Alternatively, conditions could be imposed on the underlying variables and the properties derived as $T \to \infty$ as in Hansen (2006).³

Within each individual, the equations defined by (1) may be stacked and represented in matrix form as

$$y_i = x_i\beta + \epsilon_i, \tag{2}$$

where $y_i$ is a $T \times 1$ vector of individual outcomes, $x_i$ is a $T \times k$ matrix of observed covariates, and $\epsilon_i$ is a $T \times 1$ vector of unobservables affecting the outcomes $y_i$ with $E[\epsilon_i\epsilon_i' \mid x_i] = \Omega_i$. The OLS estimator of $\beta$ from Eq. (2) may then be defined as $\hat\beta = \left(\sum_{i=1}^{n} x_i'x_i\right)^{-1}\sum_{i=1}^{n} x_i'y_i$. The properties of $\hat\beta$ as $n \to \infty$ with $T$ fixed are well known. In particular, under regularity conditions, $\sqrt{n}(\hat\beta - \beta)$ is asymptotically normal with covariance matrix $Q^{-1}WQ^{-1}$ where $Q = \lim_n (1/n)\sum_{i=1}^{n} E[x_i'x_i]$ and $W = \lim_n (1/n)\sum_{i=1}^{n} E[x_i'\Omega_i x_i]$.

The problem of robust covariance matrix estimation is then estimating $W$ without imposing a parametric structure on the $\Omega_i$. In this paper, I consider the estimator suggested by Arellano (1987) which may be defined as

$$\hat W = \frac{1}{nT}\sum_{i=1}^{n} x_i'\hat\epsilon_i\hat\epsilon_i'x_i, \tag{3}$$

where $\hat\epsilon_i = y_i - x_i\hat\beta$ are OLS residuals from Eq. (2). This estimator is an appealing generalization of White's (1980) heteroskedasticity consistent covariance matrix estimator that allows for arbitrary intertemporal correlation patterns and heteroskedasticity across individuals.⁴ The estimator is also appealing in that, unlike HAC estimators for time series data, its implementation does not require the selection of a kernel or bandwidth parameter.

The properties of $\hat W$ under conventional panel asymptotics where $n \to \infty$ with $T$ fixed are well-established.
In the remainder of this paper, I extend this analysis by considering the properties of $\hat W$ under asymptotic sequences where $T \to \infty$ as well.

³ This is especially relevant in Theorem 3 where the mixing conditions will not hold for the transformed variables if, for example, the transformation is to remove fixed effects by differencing out the individual means. Hansen (2006) provides conditions on the untransformed variables which will cover this case in a different but related context. This approach complicates the proof and notation and is not pursued here.

⁴ It does, however, ignore the possibility of cross-sectional correlation, and it will be assumed that there is no cross-sectional correlation for the remainder of the paper.

The chief reason for interest in the CCM estimator is for performing inference about $\hat\beta$. Suppose $\sqrt{d_{nT}}(\hat\beta - \beta) \xrightarrow{d} N(0, B)$ and define an estimator of the asymptotic variance of $\hat\beta$ as $(1/d_{nT})\hat B$ where $\hat B \xrightarrow{p} B$. The following estimator of the asymptotic variance of $\hat\beta$ based on $\hat W$ is used throughout the remainder of the paper:

$$\widehat{\mathrm{Avar}}(\hat\beta) = \left(\sum_{i=1}^{n} x_i'x_i\right)^{-1}(nT\hat W)\left(\sum_{i=1}^{n} x_i'x_i\right)^{-1} = \left(\sum_{i=1}^{n} x_i'x_i\right)^{-1}\left(\sum_{i=1}^{n} x_i'\hat\epsilon_i\hat\epsilon_i'x_i\right)\left(\sum_{i=1}^{n} x_i'x_i\right)^{-1}. \tag{4}$$

In addition, for testing the hypothesis $R\beta = r$ for a $q \times k$ matrix $R$ with rank $q$, the usual t (for $R$ a $1 \times k$ vector) and Wald statistics can be defined as

$$t^* = \frac{\sqrt{nT}\,(R\hat\beta - r)}{\sqrt{R\hat Q^{-1}\hat W\hat Q^{-1}R'}} \tag{5}$$

and

$$F^* = nT\,(R\hat\beta - r)'\left[R\hat Q^{-1}\hat W\hat Q^{-1}R'\right]^{-1}(R\hat\beta - r), \tag{6}$$

respectively, where $\hat W$ is defined above and $\hat Q = (1/nT)\sum_{i=1}^{n} x_i'x_i$. In Section 3, I verify that, despite differences in the limiting behavior of $\hat\beta$, $t^* \xrightarrow{d} N(0, 1)$, $F^* \xrightarrow{d} \chi^2_q$, and $\widehat{\mathrm{Avar}}(\hat\beta)$ is valid for estimating the asymptotic variance of $\hat\beta$ as $n \to \infty$ regardless of the behavior of $T$.
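As a minimal illustration of Eqs. (3)–(5), the following Python sketch computes the pooled OLS estimate, $\hat W$, $\hat Q$, and $\widehat{\mathrm{Avar}}(\hat\beta)$ for the special case of a single regressor ($k = 1$) and a balanced panel. It is not code from the paper: the function names, the list-of-lists data layout, and the restriction to $k = 1$ (chosen so that no matrix library is needed) are assumptions made for this example.

```python
from math import sqrt

def ccm_single_regressor(x, y):
    """Clustered (CCM) variance for a one-regressor, balanced panel.

    x, y: lists of n individuals, each a list of T already-transformed
    observations (hypothetical layout chosen for this sketch).
    Returns (beta_hat, W_hat, Q_hat, avar_hat).
    """
    n, T = len(x), len(x[0])
    sxx = sum(a * a for xi in x for a in xi)              # sum_i x_i' x_i
    sxy = sum(a * b for xi, yi in zip(x, y) for a, b in zip(xi, yi))
    beta_hat = sxy / sxx                                  # pooled OLS
    # One score x_i' e_i per individual, built from the OLS residuals.
    scores = [sum(a * (b - a * beta_hat) for a, b in zip(xi, yi))
              for xi, yi in zip(x, y)]
    sum_sq = sum(s * s for s in scores)                   # sum_i x_i' e_i e_i' x_i
    W_hat = sum_sq / (n * T)                              # Eq. (3)
    Q_hat = sxx / (n * T)
    avar_hat = sum_sq / sxx ** 2                          # Eq. (4)
    return beta_hat, W_hat, Q_hat, avar_hat

def t_statistic(beta_hat, W_hat, Q_hat, n, T, r=0.0):
    """Eq. (5) with R = 1: sqrt(nT) (beta_hat - r) / sqrt(Q^-1 W Q^-1)."""
    return sqrt(n * T) * (beta_hat - r) / sqrt(W_hat / Q_hat ** 2)
```

Because the residuals enter only through the per-individual scores $x_i'\hat\epsilon_i$, any correlation pattern within an individual is absorbed into a single term. In this scalar case $t^*$ from Eq. (5) coincides with $(\hat\beta - r)/\sqrt{\widehat{\mathrm{Avar}}(\hat\beta)}$.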
I also consider the behavior of $t^*$ and $F^*$ as $T \to \infty$ with $n$ fixed. In this case, $\hat W$ is not consistent for $W$ but does have a limiting distribution; and when the data are iid across $i$,⁵ I show that $t^* \xrightarrow{d} (n/(n-1))^{1/2}\, t_{n-1}$ and that $F^*$ is asymptotically pivotal and so can be used to construct valid tests. This behavior suggests that inference using $(n/(n-1))\hat W$ and forming critical values using a $t_{n-1}$ distribution will be valid regardless of the asymptotic sequence considered.

It is worth noting that the estimator $\hat W$ has also been used extensively in multilevel models to account for the presence of correlation between individuals within cells; cf. Liang and Zeger (1986) and Bell and McCaffrey (2002). For example, in a schooling study, one might have data on individual outcomes where the individuals are grouped into classes. In this case, the cross-sectional unit of observation could be defined as the class, and arbitrary correlation between all individuals within each class could be allowed. In this case, one would expect the presence of a classroom specific random effect resulting in equicorrelation between all individuals within a class. While this would clearly violate the mixing assumptions imposed in obtaining the asymptotic behavior as $T \to \infty$ with $n$ fixed, it would not invalidate the use of $\hat W$ for inference about $\beta$ in cases where $n$ and $T$ go to infinity jointly.

In addition to being useful for performing inference about $\hat\beta$, $\hat W$ may also be used to test the specification of simple parametric models of the error process.⁶ Such a test may be useful for a number of reasons. If a parametric model is correct, the estimates of the variance of $\hat\beta$ based on this model will tend to behave better than the estimates obtained from $\hat W$.
In particular, parametric estimates of the variance of $\hat\beta$ will often be considerably less variable and will typically converge faster than estimates made using $\hat W$; and if the parametric model is deemed to be adequate, this model may be used to perform FGLS estimation. The FGLS estimator is asymptotically more efficient than the OLS estimator, and simulation evidence in Hansen (2006) suggests that the efficiency gain to using FGLS over OLS in serially correlated panel data may be substantial.

⁵ Note that this still allows arbitrary correlation and heteroskedasticity within individuals but restricts that the pattern is the same across individuals.

⁶ The test considered is a straightforward generalization of the test proposed by White (1980) for heteroskedasticity and was suggested in the panel context by Kezdi (2002).

To define the specification test, called hereafter the heteroskedasticity–autocorrelation (HA) test, let $\hat W(\hat\theta) = (1/nT)\sum_{i=1}^{n} x_i'\Omega_i(\hat\theta)x_i$ where $\hat\theta$ are estimates of a finite set of parameters describing the disturbance process and $\Omega_i(\hat\theta)$ is the implied covariance matrix for individual $i$.⁷ Define a test statistic

$$S^* = (nT)\left[\mathrm{vec}(\hat W - \hat W(\hat\theta))\right]'\hat D^{-}\left[\mathrm{vec}(\hat W - \hat W(\hat\theta))\right], \tag{7}$$

where $\hat D$ is a positive semi-definite weighting matrix that estimates the variance of $\mathrm{vec}(\hat W - \hat W(\hat\theta))$ and $A^{-}$ is the generalized inverse of a matrix $A$.⁸ In the following section, it will be shown that $S^* \xrightarrow{d} \chi^2_{k(k+1)/2}$ for $\hat D$ defined below.

A natural choice for $\hat D$ is

$$\hat D = \frac{1}{nT}\sum_{i=1}^{n}\left[\left(\mathrm{vec}(x_i'\hat\epsilon_i\hat\epsilon_i'x_i - x_i'\Omega_i(\hat\theta)x_i)\right)\left(\mathrm{vec}(x_i'\hat\epsilon_i\hat\epsilon_i'x_i - x_i'\Omega_i(\hat\theta)x_i)\right)'\right]. \tag{8}$$

Under asymptotics where $\{n, T\} \to \infty$ jointly, another potential choice for $\hat D$ is an estimate of the asymptotic variance of $\hat W$:

$$\hat V = \frac{1}{nT}\sum_{i=1}^{n}\left[\left(\mathrm{vec}(x_i'\hat\epsilon_i\hat\epsilon_i'x_i - \hat W)\right)\left(\mathrm{vec}(x_i'\hat\epsilon_i\hat\epsilon_i'x_i - \hat W)\right)'\right]. \tag{9}$$

That $\hat V$ provides an estimator of the variance of $\mathrm{vec}(\hat W - \hat W(\hat\theta))$ follows from the fact that as $\{n, T\} \to \infty$, $\mathrm{vec}(\hat W)$ is $\sqrt{n}$-consistent while $\mathrm{vec}(\hat W(\hat\theta))$ will be $\sqrt{nT}$-consistent in many cases, so $\mathrm{vec}(\hat W(\hat\theta))$ may be taken as a constant relative to $\mathrm{vec}(\hat W)$. The difference in rates of convergence would arise, for example, in a fixed effects panel model where the errors follow an AR process with common AR coefficients across individuals. However, it is important to note that this will not always be the case. In particular, in random effects models, the estimator of the variance of the individual specific shock will converge at only a $\sqrt{n}$ rate, implying the same rate of convergence for both the robust and parametric estimators of the variance.

⁷ Consistency and asymptotic normality of $\hat W(\hat\theta)$ will generally follow from consistency and asymptotic normality of $\hat\theta$. In particular, defining $W_i(\theta)$ as the derivative of $W$ with respect to $\theta_i$ and letting $\theta$ be a $p \times 1$ vector, a Taylor series expansion of $W(\hat\theta)$ yields $W(\hat\theta) = W(\theta) + \sum_{i=1}^{p} W_i(\bar\theta)(\hat\theta_i - \theta_i)$ where $\bar\theta$ is an intermediate value. As long as a uniform law of large numbers applies to $W_i(\theta)$, $W(\hat\theta) - W(\theta)$ will inherit the properties of $\hat\theta - \theta$. The problem is then reduced to finding an estimator of $\theta$ that is consistent and asymptotically normal with a mean zero asymptotic distribution. Finding such an estimator in fixed effects panel models with serial correlation and/or heteroskedasticity when $n \to \infty$ and $T/n \to \rho$ where $\rho < \infty$ is complicated, though there are estimators which exist. See, for example, Nickell (1981), MaCurdy (1982), Solon (1984), Lancaster (2002), Hahn and Kuersteiner (2002), Hahn and Newey (2004), and Hansen (2006).

⁸ The test could alternatively be defined by only considering the $k(k+1)/2$ unique elements of $\hat W - \hat W(\hat\theta)$ and using the inverse of the implied covariance matrix. This test will be equivalent to the test outlined above.

In the following section, I outline the asymptotic properties of $\hat\beta$, $\hat W$, and $\hat V$ from which the behavior of $t^*$, $F^*$, and $S^*$ will follow. The properties of $\hat D$, though not discussed, will generally be the same as those of $\hat V$ under the different asymptotic sequences considered.

3. Asymptotic properties of the robust covariance matrix estimator

To develop the asymptotic inference results, I impose the following conditions.

Assumption 1. $\{x_i, \epsilon_i\}$ are independent across $i$, and $E[\epsilon_i\epsilon_i' \mid x_i] = \Omega_i$.

Assumption 2. $Q_{nT} = E\left[\sum_{i=1}^{n}(x_i'x_i/nT)\right]$ is uniformly positive definite with constant limit $Q$ where limits are taken as $n \to \infty$ with $T$ fixed in Theorem 1, as $\{n, T\} \to \infty$ in Theorems 2 and 3, and as $T \to \infty$ with $n$ fixed in Theorem 4.

In addition, I impose either Assumption 3(a) or Assumption 3(b) depending on the context.

Assumption 3.
(a) $E[\epsilon_i \mid x_i] = 0$.
(b) $E[x_{it}\epsilon_{it}] = 0$.

Assumptions 1–3 are quite standard for panel data models. Assumption 1 imposes independence across individuals, ruling out cross-sectional correlation, but leaves the time series correlation unconstrained and allows general heterogeneity across individuals. Assumption 2 is a standard full rank condition, and the restriction that $Q_{nT}$ has a constant limit could be relaxed at the cost of more complicated notation. Assumption 3 imposes that one of two orthogonality conditions is satisfied. Assumption 3(b) imposes that $x_{it}$ and $\epsilon_{it}$ are uncorrelated and is weaker than the strict exogeneity imposed in Assumption 3(a). Assumption 3(a) is stronger than necessary, but it simplifies the proof of asymptotic normality of $\hat W$ and consistency of $\hat V$. In addition, Assumption 3(a) would typically be imposed in fixed effects models.⁹

The first theorem, which is stated here for completeness, collects the properties of $\hat\beta$ and $\hat W$ in asymptotics where $n \to \infty$ with $T$ fixed.

Theorem 1.
Suppose the data are generated by model (1), that Assumptions 1 and 2 are satisfied, and that $n \to \infty$ with $T$ fixed.

(i) If Assumption 3(b) holds and $E|x_{ith}|^{4+\delta} < \Delta < \infty$ and $E|\epsilon_{it}|^{4+\delta} < \Delta < \infty$ for some $\delta > 0$, then

$$\sqrt{nT}(\hat\beta - \beta) \xrightarrow{d} Q^{-1} N\left(0,\; W = \lim_n \frac{1}{nT}\sum_{i=1}^{n} E[x_i'\Omega_i x_i]\right),$$

and $\hat W \xrightarrow{p} W$.

(ii) In addition, if Assumption 3(a) holds and $E|x_{ith}|^{8+\delta} < \Delta < \infty$ and $E|\epsilon_{it}|^{8+\delta} < \Delta < \infty$ for some $\delta > 0$, then

$$\sqrt{nT}\left[\mathrm{vec}(\hat W - W)\right] \xrightarrow{d} N\left(0,\; V = \lim_n \frac{1}{nT}\sum_{i=1}^{n} E\left[\left(\mathrm{vec}(x_i'\epsilon_i\epsilon_i'x_i - W)\right)\left(\mathrm{vec}(x_i'\epsilon_i\epsilon_i'x_i - W)\right)'\right]\right),$$

and $\hat V \xrightarrow{p} V$.

⁹ Note that a balanced panel has also implicitly been assumed. All of the results with the exception of Corollary 4.1 could be extended to accommodate unbalanced panels at the cost of more complicated notation.

Remark 3.1. It follows from Theorem 1(i) that the asymptotic variance of $\hat\beta$ can be estimated using (4) since

$$\widehat{\mathrm{Avar}}(\hat\beta) = \left(\sum_{i=1}^{n} x_i'x_i\right)^{-1}\left(\sum_{i=1}^{n} x_i'\hat\epsilon_i\hat\epsilon_i'x_i\right)\left(\sum_{i=1}^{n} x_i'x_i\right)^{-1} = \frac{1}{nT}\left(\frac{1}{nT}\sum_{i=1}^{n} x_i'x_i\right)^{-1}\hat W\left(\frac{1}{nT}\sum_{i=1}^{n} x_i'x_i\right)^{-1} = \frac{1}{nT}\hat Q^{-1}\hat W\hat Q^{-1},$$

where $\hat Q^{-1}\hat W\hat Q^{-1} \xrightarrow{p} Q^{-1}WQ^{-1}$. It also follows immediately from the definitions of $t^*$ and $F^*$ in Eqs. (5) and (6) and Theorem 1(i) that, under the null hypothesis, $t^* \xrightarrow{d} N(0, 1)$ and $F^* \xrightarrow{d} \chi^2_q$. Similarly, using Theorem 1(ii) and assuming $\hat W(\hat\theta)$ has properties similar to those of $\hat W$, it will follow that the HA test statistic, $S^*$, formed using $\hat D$ defined above converges in distribution to a $\chi^2_{k(k+1)/2}$ under the null hypothesis.

Theorem 1 verifies that $\hat\beta$ and $\hat W$ are consistent and asymptotically normal as $n \to \infty$ with $T$ fixed without imposing any restrictions on the time series dimension. In the following results, I consider alternate asymptotic approximations under the assumption that both $n$ and $T$ are going to infinity.¹⁰ In these cases, consistency and asymptotic normality of suitably normalized versions of $\hat W$ are established under weak conditions.
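The HA statistic of Eq. (7), with the weighting matrix of Eq. (8), can be sketched in the same single-regressor setting. The parametric null used here, $\Omega_i(\theta) = \sigma^2 I_T$ (iid homoskedastic errors), and the estimator $\hat\sigma^2$ (mean squared residual, with no degrees-of-freedom correction) are assumptions chosen only to keep the example short; the paper leaves the parametric model general. With $k = 1$ the statistic is compared against a $\chi^2_1$ critical value, since $k(k+1)/2 = 1$.

```python
def ha_statistic_iid_null(x, y):
    """HA statistic S* of Eq. (7) for one regressor and a balanced panel.

    The parametric null Omega_i = sigma^2 * I_T is an assumption of this
    sketch, as is the list-of-lists data layout.
    """
    n, T = len(x), len(x[0])
    sxx = sum(a * a for xi in x for a in xi)
    sxy = sum(a * b for xi, yi in zip(x, y) for a, b in zip(xi, yi))
    beta_hat = sxy / sxx
    resid = [[b - a * beta_hat for a, b in zip(xi, yi)]
             for xi, yi in zip(x, y)]
    sigma2_hat = sum(e * e for ei in resid for e in ei) / (n * T)
    # d_i = x_i' e_i e_i' x_i - x_i' Omega_i(theta_hat) x_i (a scalar when k = 1)
    d = []
    for xi, ei in zip(x, resid):
        score = sum(a * e for a, e in zip(xi, ei))
        d.append(score ** 2 - sigma2_hat * sum(a * a for a in xi))
    diff = sum(d) / (n * T)                    # W_hat - W_hat(theta_hat)
    D_hat = sum(v * v for v in d) / (n * T)    # Eq. (8)
    if D_hat == 0.0:
        return 0.0                             # generalized inverse of 0 is 0
    return n * T * diff ** 2 / D_hat           # Eq. (7)

# 95th percentile of the chi-square distribution with 1 degree of freedom.
CHI2_1_CRIT_5PCT = 3.841
```

A value of the statistic above `CHI2_1_CRIT_5PCT` would be read as evidence against the assumed parametric error model at the 5% level, in which case the robust estimate $\hat W$ would be preferred.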
Theorem 2, given immediately below, covers the case where $n$ and $T$ are going to infinity and there is not weak dependence in the time series. In particular, the results of Theorem 2 are only interesting in the case where $W = \lim_{n,T}(1/nT^2)\sum_{i=1}^{n} E[x_i'\Omega_i x_i] > 0$. Perhaps the leading case where this behavior would occur is in a model where $\epsilon_{it}$ includes an individual specific random effect that is uncorrelated with $x_{it}$ and the estimated model does not include an individual specific effect. In this case, all observations for a given individual will be equicorrelated, and the condition given above will hold. Theorem 3, given following Theorem 2, covers the case where there is mixing in the time series.

¹⁰ One could also consider sequential limits in which one takes limits as $n$ or $T$ goes to infinity with the other dimension fixed and then lets the other dimension go to infinity. It could be shown that under the conditions of Theorem 2 and appropriate normalizations sequential limits taken first with respect to either $n$ or $T$ would yield the same results as the joint limit. Similarly, under the conditions of Theorem 3, the sequential limits taken first with respect to either $n$ or $T$ would produce the same results as the joint limit.

Theorem 2. Suppose the data are generated by model (1), that Assumptions 1 and 2 are satisfied, and that $\{n, T\} \to \infty$ jointly.

(i) If Assumption 3(b) holds and $E|x_{ith}|^{4+\delta} < \Delta < \infty$ and $E|\epsilon_{it}|^{4+\delta} < \Delta < \infty$ for some $\delta > 0$, then

$$\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} Q^{-1} N\left(0,\; W = \lim_{n,T} \frac{1}{nT^2}\sum_{i=1}^{n} E[x_i'\Omega_i x_i]\right),$$

and $\hat W/T \xrightarrow{p} W$.

(ii) In addition, if Assumption 3(a) holds and $E|x_{ith}|^{8+\delta} < \Delta < \infty$ and $E|\epsilon_{it}|^{8+\delta} < \Delta < \infty$ for some $\delta > 0$, then

$$\sqrt{n}\left[\mathrm{vec}(\hat W/T - W)\right] \xrightarrow{d} N\left(0,\; V = \lim_{n,T} \frac{1}{nT^4}\sum_{i=1}^{n} E\left[\left(\mathrm{vec}(x_i'\epsilon_i\epsilon_i'x_i - W)\right)\left(\mathrm{vec}(x_i'\epsilon_i\epsilon_i'x_i - W)\right)'\right]\right),$$

and $\hat V/T^3 \xrightarrow{p} V$.

Remark 3.2.
It is important to note that the results presented in Theorem 2 are not interesting in the setting where the $\{j, k\}$ element of $\Omega_i$ becomes small when $|j - k|$ is large since in these circumstances $(1/nT^2)\sum_{i=1}^{n} E[x_i'\Omega_i x_i] \to 0$. Theorem 3 presents results which are relevant in this case.

Remark 3.3. Theorem 2 verifies consistency and asymptotic normality of both $\hat\beta$ and $\hat W$ while imposing essentially no constraints on the time series dependence in the data. The large cross-section effectively allows the time series dimension to be ignored even when $T$ is large. However, without constraints on the time series, $\hat\beta$ is $\sqrt{n}$-consistent, not $\sqrt{nT}$-consistent. Intuitively, the slower rate of convergence is due to the fact that there may be little information contained in the time series since it is allowed to be arbitrarily dependent.

Remark 3.4. The fact that $\hat\beta$ and $\hat W$ are not $\sqrt{nT}$-consistent will not affect practical implementation of inference about $\beta$. In particular, the estimate of the asymptotic variance of $\hat\beta$ based on Eq. (4) is

$$\widehat{\mathrm{Avar}}(\hat\beta) = \left(\sum_{i=1}^{n} x_i'x_i\right)^{-1}\left(\sum_{i=1}^{n} x_i'\hat\epsilon_i\hat\epsilon_i'x_i\right)\left(\sum_{i=1}^{n} x_i'x_i\right)^{-1} = \frac{1}{n}\left(\frac{1}{nT}\sum_{i=1}^{n} x_i'x_i\right)^{-1}(\hat W/T)\left(\frac{1}{nT}\sum_{i=1}^{n} x_i'x_i\right)^{-1} = \frac{1}{n}\hat Q^{-1}(\hat W/T)\hat Q^{-1},$$

where $\hat Q^{-1}(\hat W/T)\hat Q^{-1} \xrightarrow{p} Q^{-1}WQ^{-1}$. The t-statistic defined in Eq. (5) may also be expressed as

$$t^* = \frac{\sqrt{nT}\,(R\hat\beta - r)}{\sqrt{R\hat Q^{-1}\hat W\hat Q^{-1}R'}} = \frac{\sqrt{n}\,(R\hat\beta - r)}{\sqrt{R\hat Q^{-1}(\hat W/T)\hat Q^{-1}R'}},$$

which converges in distribution to a $N(0, 1)$ random variable under the null hypothesis, $R\beta = r$, by Theorem 2(i). Similarly, it follows that $F^* \xrightarrow{d} \chi^2_q$ under the null. Finally, the HA test statistic, $S^*$, defined above also satisfies

$$S^* = (nT)\left[\mathrm{vec}(\hat W - \hat W(\hat\theta))\right]'\hat D^{-}\left[\mathrm{vec}(\hat W - \hat W(\hat\theta))\right] = n\left[\mathrm{vec}(\hat W/T - \hat W(\hat\theta)/T)\right]'(\hat D/T^3)^{-}\left[\mathrm{vec}(\hat W/T - \hat W(\hat\theta)/T)\right],$$

which converges in distribution to a $\chi^2_{k(k+1)/2}$ under the conditions of the theorem and the additional assumption that $\hat W(\hat\theta)$ behaves similarly to $\hat V$.

The previous theorem establishes the properties of $\hat\beta$ and the robust variance matrix estimator as $n$ and $T$ go to infinity jointly without imposing restrictions on the time series dependence. While the result is interesting, there are many cases in which one might expect the time series dependence to diminish over time. In the following theorem, the properties of $\hat\beta$ and $\hat W$ are established under the assumption that the data are strong mixing in the time series dimension.

Theorem 3. Suppose the data are generated by model (1), that Assumptions 1 and 2 are satisfied, and that $\{n, T\} \to \infty$ jointly.

(i) If Assumption 3(b) is satisfied, $E|x_{ith}|^{r+\delta} < \Delta$ and $E|\epsilon_{it}|^{r+\delta} < \Delta$ for some $\delta > 0$, and $\{x_{it}, \epsilon_{it}\}$ is a strong mixing sequence in $t$ with $\alpha$ of size $-3r/(r-4)$ for $r > 4$, then

$$\sqrt{nT}(\hat\beta - \beta) \xrightarrow{d} Q^{-1} N\left(0,\; W = \lim_{n,T} \frac{1}{nT}\sum_{i=1}^{n} E[x_i'\Omega_i x_i]\right),$$

and $\hat W - W \xrightarrow{p} 0$.

(ii) In addition, if Assumption 3(a) is satisfied, $E|x_{ith}|^{r+\delta} < \Delta$ and $E|\epsilon_{it}|^{r+\delta} < \Delta$ for some $\delta > 0$, and $\{x_{it}, \epsilon_{it}\}$ is a strong mixing sequence in $t$ with $\alpha$ of size $-7r/(r-8)$ for $r > 8$, then

$$\sqrt{n}\left[\mathrm{vec}(\hat W - W)\right] \xrightarrow{d} N\left(0,\; V = \lim_{n,T} \frac{1}{nT^2}\sum_{i=1}^{n} E\left[\left(\mathrm{vec}(x_i'\epsilon_i\epsilon_i'x_i - W)\right)\left(\mathrm{vec}(x_i'\epsilon_i\epsilon_i'x_i - W)\right)'\right]\right),$$

and $\hat V/T \xrightarrow{p} V$.

Remark 3.5. Theorem 3 verifies consistency and asymptotic normality of both $\hat\beta$ and $\hat W$ under fairly conventional conditions on the time series dependence of the variables. The added restriction on the time series dependence allows estimation of $\beta$ at the $\sqrt{nT}$-rate, which differs from the case above where $\hat\beta$ is only $\sqrt{n}$-consistent.
Intuitively, the increase in the rate of convergence is due to the fact that under the mixing conditions, the time series is more informative than in the case analyzed in Theorem 2.

Remark 3.6. It follows immediately from the conclusions of Theorem 3 and the definitions of $\widehat{\mathrm{Avar}}(\hat\beta)$, $t^*$, and $F^*$ in Eqs. (4)–(6) that $\widehat{\mathrm{Avar}}(\hat\beta)$ is valid for estimating the asymptotic variance of $\hat\beta$ and that $t^* \xrightarrow{d} N(0, 1)$ and $F^* \xrightarrow{d} \chi^2_q$ under the null hypothesis. The HA test statistic, $S^*$, also satisfies

$$S^* = (nT)\left[\mathrm{vec}(\hat W - \hat W(\hat\theta))\right]'\hat D^{-}\left[\mathrm{vec}(\hat W - \hat W(\hat\theta))\right] = n\left[\mathrm{vec}(\hat W - \hat W(\hat\theta))\right]'(\hat D/T)^{-}\left[\mathrm{vec}(\hat W - \hat W(\hat\theta))\right],$$

which converges in distribution to a $\chi^2_{k(k+1)/2}$ under the conditions of the theorem and the assumption that $\hat D$ behaves similarly to $\hat V$. In this case, $\hat V$ could also typically be used as the weighting matrix in forming $S^*$ since it will often be the case that $\hat W(\hat\theta)$ will be $\sqrt{nT}$-consistent while $\hat W$ is $\sqrt{n}$-consistent.

Theorems 1–3 establish that conventional estimators of the asymptotic variance of $\hat\beta$ and t and F statistics formed using $\hat W$ have their usual properties as long as $n \to \infty$ regardless of the behavior of $T$. In addition, the results indicate that it is essentially only the size of $n$ that matters for the asymptotic behavior of the estimators under these sequences. To complete the theoretical analysis, I present the asymptotic properties of $\hat W$ as $T \to \infty$ with $n$ fixed below. The results are interesting in providing a justification for a commonly used procedure and in unifying the results and the different asymptotics considered.

Theorem 4. Suppose the data are generated by model (1), that Assumptions 1, 2, and 3(b) are satisfied, and that $T \to \infty$ with $n$ fixed.
If $E|x_{ith}|^{r+\delta} < \Delta$, $E|\varepsilon_{it}|^{r+\delta} < \Delta$, and $\{x_{it}, \varepsilon_{it}\}$ is a strong mixing sequence in $t$ with $\alpha$ of size $-3r/(r-4)$ for $r > 4$, then
\[
\sqrt{nT}(\hat{\beta} - \beta) \xrightarrow{d} Q^{-1}N(0, W), \qquad
x_i'x_i/(nT) - Q_i/n \xrightarrow{p} 0, \qquad
x_i'\varepsilon_i/\sqrt{nT} \xrightarrow{d} N(0, W_i/n),
\]
and
\[
\hat{W} \xrightarrow{d} U = \frac{1}{n}\sum_{i=1}^{n}\Bigg(
\Lambda_i B_i B_i' \Lambda_i
- \Lambda_i B_i \Big(\sum_{j=1}^{n} B_j' \Lambda_j\Big)\Big(\sum_{j=1}^{n} Q_j\Big)^{-1} Q_i
- Q_i \Big(\sum_{j=1}^{n} Q_j\Big)^{-1}\Big(\sum_{j=1}^{n} \Lambda_j B_j\Big) B_i' \Lambda_i
+ Q_i \Big(\sum_{j=1}^{n} Q_j\Big)^{-1}\Big(\sum_{j=1}^{n} \Lambda_j B_j\Big)\Big(\sum_{j=1}^{n} B_j' \Lambda_j\Big)\Big(\sum_{j=1}^{n} Q_j\Big)^{-1} Q_i
\Bigg),
\]
where $W_i = \lim_T (1/T)E[x_i'\Omega_i x_i]$, $W = \lim_T (1/nT)\sum_i E[x_i'\Omega_i x_i]$, $B_i \sim N(0, I_k)$ is a $k$-dimensional normal vector with $E[B_i B_j'] = 0$ for $i \neq j$, and $\Lambda_i = W_i^{1/2}$.

Remark 3.7. Theorem 4 verifies that $\hat{W}$ is not consistent but does have a limiting distribution as $T \to \infty$ with $n$ fixed. Unfortunately, the result here differs from results obtained in Phillips et al. (2003), Kiefer and Vogelsang (2002, 2005), and Vogelsang (2003), who consider HAC estimation in time series data without truncation, in that how to construct asymptotically pivotal statistics from $U$ is not immediately obvious. However, in one important special case, $U$ is proportional to the true covariance matrix, allowing construction of asymptotically pivotal tests.

Corollary 4.1. Suppose the conditions of Theorem 4 are satisfied and that $Q_i = Q$ and $W_i = W$ for all $i$. Then
\[
\hat{W} \xrightarrow{d} U = \frac{1}{n}\,\Lambda\left(\sum_{i=1}^{n} B_i B_i' - \frac{1}{n}\sum_{i=1}^{n} B_i \sum_{i=1}^{n} B_i'\right)\Lambda
\]
for $B_i$ defined in Theorem 4 and $\Lambda = W^{1/2}$. Then, for testing the null hypothesis $H_0: R\beta = r$ against the alternative $H_1: R\beta \neq r$ for a $q \times k$ matrix $R$ with rank $q$, the limiting distributions of the conventional Wald ($F^*$) and t-type ($t^*$) tests under $H_0$ are
\[
F^* = (nT)(R\hat{\beta} - r)'[R\hat{Q}^{-1}\hat{W}\hat{Q}^{-1}R']^{-1}(R\hat{\beta} - r)
\xrightarrow{d} \tilde{B}_{q,n}'\left[\frac{1}{n}\left(\sum_{i} B_{q,i}B_{q,i}' - \tilde{B}_{q,n}\tilde{B}_{q,n}'\right)\right]^{-1}\tilde{B}_{q,n}
= \frac{nq}{n-q}\,F_{q,n-q}, \tag{10}
\]
and
\[
t^* = \frac{\sqrt{nT}\,(R\hat{\beta} - r)}{\sqrt{R\hat{Q}^{-1}\hat{W}\hat{Q}^{-1}R'}}
\xrightarrow{d} \frac{\tilde{B}_{1,n}}{\sqrt{(1/n)\left(\sum_i B_{1,i}^2 - \tilde{B}_{1,n}^2\right)}}
= \sqrt{\frac{n}{n-1}}\;t_{n-1}, \tag{11}
\]
where $B_{q,i} \sim N(0, I_q)$, $\tilde{B}_{q,n} = (1/\sqrt{n})\sum_{i=1}^{n} B_{q,i}$, $t_{n-1}$ is a t distribution with $n-1$ degrees of freedom, and $F_{q,n-q}$ is an F distribution with $q$ numerator and $n-q$ denominator degrees of freedom.

Corollary 4.1 gives the limiting distribution of $\hat{W}$ as $T \to \infty$ under the additional restriction that $Q_i = Q$ and $W_i = W$ for all $i$. These restrictions would be satisfied when the data vectors for each individual $\{x_i, y_i\}$ are iid across $i$. While this is more restrictive than the condition imposed in Assumption 1, it still allows for quite general forms of conditional heteroskedasticity and does not impose any structure on the time series process within individuals.

The most interesting feature about the result in Corollary 4.1 is that under the conditions imposed, the limiting distribution of $\hat{W}$ is proportional to the actual covariance matrix in the data. This allows construction of asymptotically pivotal statistics based on standard t and Wald tests as in Phillips et al. (2003), Kiefer and Vogelsang (2002, 2005), and Vogelsang (2003). This is particularly convenient in the panel case since the limiting distribution of the t-statistic is exactly $\sqrt{n/(n-1)}\,t_{n-1}$, where $t_{n-1}$ denotes the t distribution with $n-1$ degrees of freedom.[11] It is also interesting that $E[U] = (1 - 1/n)W$. This suggests normalizing the estimator $\hat{W}$ by $n/(n-1)$ will result in an asymptotically unbiased estimator in asymptotics where $T \to \infty$ with $n$ fixed and will likely reduce the finite-sample bias under asymptotics where $n \to \infty$. In addition, the t-statistic constructed based on the estimator defined by $(n/(n-1))\hat{W}$ will be asymptotically distributed as a $t_{n-1}$, for which critical values are readily available.[12]

[11] If $n = 1$, $\hat{W}$ is identically equal to 0. In this case, it is easy to verify that $U$ equals 0, though the results of Theorem 4 and Corollary 4.1 are obviously uninteresting in this case.

[12] This is essentially the normalization used in Stata's cluster command, which normalizes $\hat{W}$ by $[(nT-1)/(nT-k)][n/(n-1)]$, where the normalization is motivated as a finite-sample adjustment under the usual $n \to \infty$, $T$ fixed asymptotics; see Stata User's Guide Release 8, p. 275 (Stata Corporation, 2003).

The conclusions of Corollary 4.1 suggest a simple procedure for testing hypotheses regarding regression coefficients which will be valid under any of the asymptotics considered. Using $(n/(n-1))\hat{W}$ and obtaining critical values from a $t_{n-1}$ distribution will yield tests which are asymptotically valid regardless of the asymptotic sequence since $t_{n-1} \to N(0,1)$ and $n/(n-1) \to 1$ as $n \to \infty$. Thus, this approach will yield valid tests under any of the asymptotics considered in the presence of quite general heteroskedasticity and serial correlation.[13] In addition, it is important to note that in the cases where there is weak dependence in the time series and $T$ is large, more efficient estimators of the covariance matrix which make use of this information are available. In particular, standard time series HAC estimators which downweight the correlation between observations that are far apart will have faster rates of convergence than the CCM estimator.
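The testing procedure suggested by Corollary 4.1 can be sketched in a few lines. The following is a minimal illustration rather than the paper's own code: it assumes a scalar regressor, a balanced panel, no intercept, and that $\hat{W}$ takes the usual clustered form $(1/nT)\sum_i x_i'\hat{u}_i\hat{u}_i'x_i$ (Eq. (3) is not reproduced in this section). The function name `clustered_t_test` is hypothetical.

```python
import math


def clustered_t_test(x, y, beta_null=0.0):
    """Clustered t-statistic for a scalar-regressor panel model without intercept.

    x and y are lists of n per-individual lists, each of length T (balanced panel).
    Returns (beta_hat, standard_error, t_statistic, degrees_of_freedom).
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two individuals (W_hat is 0 when n = 1)")
    nT = sum(len(xi) for xi in x)

    sum_xx = sum(v * v for xi in x for v in xi)
    sum_xy = sum(a * b for xi, yi in zip(x, y) for a, b in zip(xi, yi))
    beta_hat = sum_xy / sum_xx

    # Per-individual score x_i' u_i from OLS residuals u_it = y_it - x_it * beta_hat.
    scores = [
        sum(a * (b - beta_hat * a) for a, b in zip(xi, yi))
        for xi, yi in zip(x, y)
    ]

    q_hat = sum_xx / nT                      # Q_hat = (1/nT) sum_i x_i' x_i
    w_hat = sum(s * s for s in scores) / nT  # assumed form of W_hat (clustered)
    w_adj = n / (n - 1) * w_hat              # n/(n-1) normalization from the text

    # Var(beta_hat) is estimated by Q^{-1} W Q^{-1} / (nT).
    std_error = math.sqrt(w_adj / (q_hat * q_hat) / nT)
    t_stat = (beta_hat - beta_null) / std_error
    return beta_hat, std_error, t_stat, n - 1
```

The returned t-statistic would be compared with a critical value from a $t_{n-1}$ distribution, for example `scipy.stats.t.ppf(0.975, n - 1)` for a two-sided 5% test if SciPy is available. For a model with individual fixed effects, `x` and `y` would first be demeaned within each individual.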
Finally, it is worth noting that the maximum rank of $\hat{W}$ will generally be $n-1$, which suggests that $\hat{W}$ will be rank deficient when $k > n-1$. Since $\hat{W}$ is supposed to estimate a full rank matrix, it seems likely that inference based on $\hat{W}$ will perform poorly in these cases. Also, the above development ignores time effects, which will often be included in panel data models. Under $T$ fixed, $n \to \infty$ asymptotics, the time effects can be included in the covariate vector $x_{it}$ and pose no additional complications. However, as $T \to \infty$, they also need to be considered separately from $x$ and partialed out with the individual fixed effects. This partialing out will generally result in the presence of an $O(1/n)$ correlation between individuals. When $n$ is large, this correlation should not matter, but in the fixed $n$, $T \to \infty$ case, it will invalidate the results. The effect of the presence of time effects was explored in a simulation study with the same design as that reported in the following section where each model was estimated including a full set of time fixed effects. The results, which are not reported below but are available upon request, show that tests based on $\hat{W}$ are somewhat more size distorted than when no time effects are included for small $n$, but that this size distortion diminishes quickly as $n$ increases.

4. Monte Carlo evidence

The asymptotic results presented above suggest that tests based on the robust standard error estimates should have good properties regardless of the relative sizes of $n$ and $T$. I report results from a simple simulation study used to assess the finite sample effectiveness of the robust covariance matrix estimator and tests based upon it below. Specifically, the simulation focuses on t-tests for regression coefficients and the HA test discussed above. The Monte Carlo simulations are based on two different specifications: a "fixed effect" specification and a "random effects" specification.
The terminology refers to the fact that in the "fixed effect" specification, the models will be estimated including individual specific fixed effects with the goal of focusing on the case where the underlying disturbances exhibit weak dependence. In the "random effects" specification individual specific effects are not estimated and the goal is to examine the behavior of the CCM estimator and tests based upon it in an equicorrelated model.

[13] This argument also applies to testing multiple parameters using $F^*$.

The fixed effect specification is
\[
y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it},
\]
where $x_{it}$ is a scalar and $\alpha_i$ is an individual specific effect. The data generating process for the fixed effect specification allows for serial correlation in both $x_{it}$ and $\varepsilon_{it}$ and heteroskedasticity:
\[
x_{it} = .5\,x_{it-1} + v_{it}, \qquad v_{it} \sim N(0, .75),
\]
\[
\varepsilon_{it} = \rho\,\varepsilon_{it-1} + \sqrt{a_0 + a_1 x_{it}^2}\;u_{it}, \qquad u_{it} \sim N(0, 1 - \rho^2),
\]
\[
\alpha_i \sim N(0, .5).
\]
Data are simulated using four different values of $\rho$, $\rho \in \{0, .3, .6, .9\}$, in both the homoskedastic ($a_0 = 1$, $a_1 = 0$) and heteroskedastic ($a_0 = a_1 = .5$) cases, resulting in a total of eight distinct parameter settings. The models are estimated including $x_{it}$ and a full set of individual specific fixed effects.[14]

The random effects specification is
\[
y_{it} = x_{it}'\beta + \varepsilon_{it},
\]
where $x_{it}$ is a normally distributed scalar with $E[x_{it}^2] = 1$ and $E[x_{it_1}x_{it_2}] = .8$ for all $t_1 \neq t_2$. $\varepsilon_{it}$ contains an individual specific random component and a random error term:
\[
\varepsilon_{it} = \alpha_i + u_{it}, \qquad \alpha_i \sim N(0, \rho), \qquad u_{it} \sim N(0, 1 - \rho).
\]
Note that the random effects data generating process implies that $E[\varepsilon_{it_1}\varepsilon_{it_2}] = \rho$ for $t_1 \neq t_2$. Three values of $\rho$ are employed for the random effects specification: .3, .6, and .9. The model is estimated by regressing $y_{it}$ on $x_{it}$ and a constant.

The fixed effects model is commonly used in empirical work when panel data are available.
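As a rough illustration of the fixed effect design described above, the following sketch draws one panel from that data generating process. Several details are assumptions made only for the example: the value of `beta`, the zero starting values with a burn-in period (the text does not state initial conditions), and the reading of the second argument of $N(\cdot,\cdot)$ as a variance. The function name `simulate_fixed_effects_panel` is hypothetical.

```python
import math
import random


def simulate_fixed_effects_panel(n, T, rho, a0, a1, beta=1.0, burn_in=100, seed=0):
    """Draw one balanced panel from the fixed effect design.

    beta, burn_in, and seed are illustrative assumptions; the text does not state them.
    Returns (x, y) as lists of n per-individual lists of length T.
    """
    rng = random.Random(seed)
    sd_v = math.sqrt(0.75)            # v_it ~ N(0, .75), read as a variance
    sd_u = math.sqrt(1.0 - rho ** 2)  # u_it ~ N(0, 1 - rho^2)
    sd_alpha = math.sqrt(0.5)         # alpha_i ~ N(0, .5)

    x, y = [], []
    for _ in range(n):
        alpha_i = rng.gauss(0.0, sd_alpha)
        x_prev, e_prev = 0.0, 0.0     # assumed starting values, washed out by burn-in
        x_i, y_i = [], []
        for t in range(burn_in + T):
            x_t = 0.5 * x_prev + rng.gauss(0.0, sd_v)
            u_t = rng.gauss(0.0, sd_u)
            e_t = rho * e_prev + math.sqrt(a0 + a1 * x_t ** 2) * u_t
            if t >= burn_in:          # keep only post burn-in draws
                x_i.append(x_t)
                y_i.append(beta * x_t + alpha_i + e_t)
            x_prev, e_prev = x_t, e_t
        x.append(x_i)
        y.append(y_i)
    return x, y
```

For example, `simulate_fixed_effects_panel(n=10, T=50, rho=0.6, a0=0.5, a1=0.5)` would correspond to one draw from a heteroskedastic setting with $\rho = .6$.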
The random effects specification is also widely used in the policy evaluation literature. In many policy evaluation studies, the covariate of interest is a policy variable that is highly correlated within aggregate cells, often with a correlation of one, which has led to the dominance of the random effects estimator in this context. For example, a researcher may desire to estimate the effect of classroom level policies on student-level micro data containing observations from multiple classrooms. In this setting, $T$ indexes the number of students within each class, $n$ indexes the number of classrooms, and $\alpha_i$ is a classroom specific random effect. The CCM estimator has been widely utilized in such situations in order to consistently estimate standard errors.[15]

[14] Since $\alpha_i$ is uncorrelated with $x_i$, this model could be estimated using random effects. I chose to consider a different specification for the random effects estimates where the $x_{it}$ were generated to more closely resemble covariates which appear in policy analysis studies.

[15] This is, in fact, one of the original motivations for the development of the CCM estimator, cf. Liang and Zeger (1986).

Simulation results for various values of the cross-sectional ($n$) and time ($T$) dimensions are reported. For each $\{n, T\}$ combination, reported results for each of the 11 parameter settings (eight for the fixed effects specification and three for the random effects specification) are based on 1,000 simulation repetitions. Each simulation estimates three types of standard errors for $\hat{\beta}$: unadjusted OLS standard errors, $\hat{s}_{OLS}$, CCM standard errors, $\hat{s}_{CLUS}$, and standard errors consistent with an AR(1) process, $\hat{s}_{AR(1)}$.[16] For the random effects specification, standard errors consistent with random effects, $\hat{s}_{RE}$, are substituted for $\hat{s}_{AR(1)}$.[17] $\hat{s}_{CLUS}$ is consistent for all parameter settings. $\hat{s}_{OLS}$ is consistent only in the iid case (the homoskedastic data generating process with $\rho = 0$). $\hat{s}_{AR(1)}$ is consistent in all homoskedastic data generating processes, and $\hat{s}_{RE}$ is consistent in all models for which it is reported. In all cases, the CCM estimator is computed using the normalization implied by $T \to \infty$ with $n$ fixed asymptotics; that is, the CCM estimator is computed as $(n/(n-1))\hat{W}$ for $\hat{W}$ defined in Eq. (3).

[16] $\hat{s}_{AR(1)}$ imposes the parametric structure implied by an AR(1) process. The $\rho$ parameter is estimated from the OLS residuals using the procedure described in Hansen (2006), which consistently estimates AR parameters in fixed effects panel models. The standard errors are then computed as $(X'X)^{-1}X'\Omega(\hat{\rho})X(X'X)^{-1}$, where $\Omega(\hat{\rho})$ is the covariance matrix implied by an AR(1) process.

Tables 1–4 present the results of the Monte Carlo study, where each table corresponds to a different $\{n, T\}$ combination.[18] In each table, Panel A presents the fixed effects results for the homoskedastic and heteroskedastic cases, while Panel B presents the random effects results. Column (1) presents t-test rejection rates for 5% level tests based on OLS, CCM, and AR(1) standard errors. The critical values for tests based on OLS and AR(1) errors are taken from a $t_{nT-n-1}$ distribution, and the critical values for tests based on clustered standard errors are taken from a $t_{n-1}$ distribution. Columns (2) and (3) present the mean and standard deviation of the estimated standard errors respectively. Column (4) presents the standard deviation of the $\hat{\beta}$'s. The difference between columns (2) and (4) is therefore the bias of the estimated standard errors. Finally, column (5) presents the rejection rates for the HA test described above, which tests the null hypothesis that both the CCM estimator and the parametric estimator are consistent.
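The first four table columns described above can be computed from the replication draws along the following lines. This is a hedged sketch, not the author's code: the function name `summarize_replications` and its arguments are hypothetical, and the use of the sample standard deviation (divisor one less than the number of replications) is an assumption, since the text does not say which divisor was used.

```python
import statistics


def summarize_replications(beta_hats, std_errors, beta_true, critical_value):
    """Summarize Monte Carlo draws for one estimator and one parameter setting.

    beta_hats and std_errors hold one entry per simulation repetition.
    critical_value is the two-sided critical value, e.g. from a t_{n-1} table.
    """
    rejections = [
        abs(b - beta_true) / s > critical_value
        for b, s in zip(beta_hats, std_errors)
    ]
    return {
        "rejection_rate": sum(rejections) / len(rejections),  # column (1)
        "mean_se": statistics.mean(std_errors),               # column (2)
        "std_se": statistics.stdev(std_errors),               # column (3), sample std assumed
        "std_beta": statistics.stdev(beta_hats),              # column (4), sample std assumed
    }
```

The gap between `mean_se` and `std_beta` then corresponds to the bias of the estimated standard errors, as in the comparison of columns (2) and (4).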
As expected, tests based on $\hat{s}_{OLS}$ and $\hat{s}_{AR(1)}$ perform well in the cases where the assumed model is consistent with the data across the full range of $n$ and $T$ combinations. The results are also consistent with the asymptotic theory, clearly illustrating the $\sqrt{nT}$-consistency of $\hat{\beta}$ and $\hat{W}$ with the bias of $\hat{W}$ and the variance of both $\hat{\beta}$ and $\hat{W}$ decreasing as either $n$ or $T$ increases. Of course, when the assumed parametric model is inconsistent with the data, tests based on parametric standard errors suffer from size distortions and the standard error estimates are biased. The RE tests have the correct size for moderate and large $n$, but not for small $n$ (i.e. $n = 10$); and as indicated by the asymptotic theory, the $T$ dimension has no apparent impact on the size of RE based tests or the overall performance of the RE estimates.

Tests based on the CCM estimator have approximately correct size across all combinations of $n$ and $T$ and all models of the disturbances considered in the fixed effect specification. The estimator does, however, display a moderate bias in the small $n$ case; it seems likely that this bias does not translate into a large size distortion due to the fact that the bias is small relative to the standard error of the estimator and the use of the $t_{n-1}$ distribution to obtain the critical values. While the clustered standard errors perform well in terms of size of tests and reasonably well in terms of bias, the simulations reveal that a potential weakness of the clustered estimator is a relatively high variance. The CCM estimates have a substantially higher standard deviation than the other estimators and this difference, in percentage terms, increases with $T$. This behavior is consistent with the $\sqrt{n}$-consistency of the estimator and does suggest that if a parametric estimator is available, it may have better properties for estimating the variance of $\hat{\beta}$.

[17] $\hat{s}_{RE}$ is estimated in a manner analogous to $\hat{s}_{AR(1)}$ where the covariance parameters are estimated in the usual manner from the OLS and within residuals.

[18] Tables 1–4 correspond to $\{n, T\} = \{10, 10\}$, $\{n, T\} = \{10, 50\}$, $\{n, T\} = \{50, 10\}$, $\{n, T\} = \{50, 50\}$, respectively. Additional results for $\{n, T\} = \{10, 200\}$, $\{n, T\} = \{50, 20\}$, $\{n, T\} = \{50, 200\}$, $\{n, T\} = \{200, 10\}$, and $\{n, T\} = \{200, 50\}$ are available from the author upon request. The results are consistent with the asymptotic theory with the performance of the CCM estimator improving as either $n$ or $T$ increases in the fixed effects specification and as $n$ increases in the random effects specification. In the random effects case, the performance does not appear to be greatly influenced by the size of $T$ relative to $n$.

Table 1
Data generating process: N = 10, T = 10
Columns: (1) t-test rejection rate; (2) Mean (s.e.); (3) Std (s.e.); (4) Std ($\hat{\beta}$); (5) HA test rejection rate

                        (1)      (2)      (3)      (4)      (5)
A. Fixed effects
Homoskedastic, $\rho = 0$
  OLS                 0.038   0.1180   0.0133   0.1152   0.152
  Cluster             0.043   0.1149   0.0330   0.1152
  AR1                 0.041   0.1170   0.0141   0.1152   0.135
Homoskedastic, $\rho = .3$
  OLS                 0.082   0.1130   0.0136   0.1269   0.095
  Cluster             0.054   0.1212   0.0357   0.1269
  AR1                 0.055   0.1240   0.0161   0.1269   0.133
Homoskedastic, $\rho = .6$
  OLS                 0.093   0.1005   0.0133   0.1231   0.074
  Cluster             0.060   0.1167   0.0352   0.1231
  AR1                 0.051   0.1219   0.0181   0.1231   0.123
Homoskedastic, $\rho = .9$
  OLS                 0.145   0.0609   0.0090   0.0818   0.038
  Cluster             0.053   0.0772   0.0249   0.0818
  AR1                 0.054   0.0795   0.0136   0.0818   0.085
Heteroskedastic, $\rho = 0$
  OLS                 0.126   0.1150   0.0126   0.1502   0.051
  Cluster             0.057   0.1410   0.0458   0.1502
  AR1                 0.126   0.1140   0.0137   0.1502   0.042
Heteroskedastic, $\rho = .3$
  OLS                 0.171   0.1165   0.0137   0.1708   0.036
  Cluster             0.068   0.1538   0.0500   0.1708
  AR1                 0.143   0.1284   0.0172   0.1708   0.044
Heteroskedastic, $\rho = .6$
  OLS                 0.187   0.1238   0.0153   0.1853   0.027
  Cluster             0.074   0.1717   0.0572   0.1853
  AR1                 0.117   0.1503   0.0219   0.1853   0.049
Heteroskedastic, $\rho = .9$
  OLS                 0.198   0.1406   0.0209   0.2181   0.031
  Cluster             0.087   0.1872   0.0641   0.2181
  AR1                 0.097   0.1830   0.0336   0.2181   0.074
B. Random effects
$\rho = .3$
  OLS                 0.295   0.1063   0.0231   0.1926   0.017
  Cluster             0.115   0.1561   0.0609   0.1926
  RE                  0.097   0.1693   0.0460   0.1926   0.027
$\rho = .6$
  OLS                 0.399   0.1030   0.0248   0.2438   0.054
  Cluster             0.118   0.2024   0.0788   0.2438
  RE                  0.094   0.2180   0.0600   0.2438   0.023
$\rho = .9$
  OLS                 0.482   0.0987   0.0293   0.2925   0.093
  Cluster             0.108   0.2346   0.0909   0.2925
  RE                  0.095   0.2546   0.0723   0.2925   0.018

Table 2
Data generating process: N = 10, T = 50
Columns: (1) t-test rejection rate; (2) Mean (s.e.); (3) Std (s.e.); (4) Std ($\hat{\beta}$); (5) HA test rejection rate

                        (1)      (2)      (3)      (4)      (5)
A. Fixed effects
Homoskedastic, $\rho = 0$
  OLS                 0.054   0.0462   0.0024   0.0472   0.184
  Cluster             0.050   0.0449   0.0117   0.0472
  AR1                 0.057   0.0460   0.0026   0.0472   0.185
Homoskedastic, $\rho = .3$
  OLS                 0.088   0.0459   0.0024   0.0519   0.077
  Cluster             0.043   0.0520   0.0133   0.0519
  AR1                 0.050   0.0529   0.0031   0.0519   0.159
Homoskedastic, $\rho = .6$
  OLS                 0.155   0.0447   0.0028   0.0590   0.049
  Cluster             0.042   0.0574   0.0150   0.0590
  AR1                 0.047   0.0598   0.0044   0.0590   0.184
Homoskedastic, $\rho = .9$
  OLS                 0.225   0.0372   0.0034   0.0600   0.046
  Cluster             0.046   0.0562   0.0159   0.0600
  AR1                 0.049   0.0583   0.0072   0.0600   0.150
Heteroskedastic, $\rho = 0$
  OLS                 0.158   0.0459   0.0021   0.0637   0.052
  Cluster             0.051   0.0606   0.0169   0.0637
  AR1                 0.162   0.0458   0.0023   0.0637   0.057
Heteroskedastic, $\rho = .3$
  OLS                 0.199   0.0479   0.0022   0.0724   0.046
  Cluster             0.041   0.0735   0.0198   0.0724
  AR1                 0.142   0.0553   0.0032   0.0724   0.047
Heteroskedastic, $\rho = .6$
  OLS                 0.229   0.0558   0.0031   0.0934   0.067
  Cluster             0.043   0.0928   0.0260   0.0934
  AR1                 0.112   0.0748   0.0054   0.0934   0.059
Heteroskedastic, $\rho = .9$
  OLS                 0.239   0.0857   0.0079   0.1490   0.059
  Cluster             0.046   0.1428   0.0451   0.1490
  AR1                 0.076   0.1338   0.0163   0.1490   0.099
B. Random effects
$\rho = .3$
  OLS                 0.568   0.0471   0.0092   0.1636   0.147
  Cluster             0.104   0.1356   0.0547   0.1636
  RE                  0.097   0.1475   0.0413   0.1626   0.014
$\rho = .6$
  OLS                 0.703   0.0466   0.0105   0.2331   0.212
  Cluster             0.104   0.1897   0.0727   0.2331
  RE                  0.095   0.2079   0.0567   0.2331   0.007
$\rho = .9$
  OLS                 0.744   0.0450   0.0130   0.2785   0.245
  Cluster             0.106   0.2310   0.0920   0.2785
  RE                  0.103   0.2539   0.0701   0.2785   0.014

Table 3
Data generating process: N = 50, T = 10
Columns: (1) t-test rejection rate; (2) Mean (s.e.); (3) Std (s.e.); (4) Std ($\hat{\beta}$); (5) HA test rejection rate

                        (1)      (2)      (3)      (4)      (5)
A. Fixed effects
Homoskedastic, $\rho = 0$
  OLS                 0.049   0.0522   0.0026   0.0526   0.106
  Cluster             0.057   0.0515   0.0062   0.0526
  AR1                 0.047   0.0522   0.0028   0.0526   0.099
Homoskedastic, $\rho = .3$
  OLS                 0.080   0.0500   0.0027   0.0569   0.053
  Cluster             0.059   0.0552   0.0072   0.0569
  AR1                 0.055   0.0556   0.0033   0.0569   0.092
Homoskedastic, $\rho = .6$
  OLS                 0.102   0.0447   0.0026   0.0539   0.132
  Cluster             0.048   0.0549   0.0071   0.0539
  AR1                 0.049   0.0553   0.0037   0.0539   0.072
Homoskedastic, $\rho = .9$
  OLS                 0.156   0.0273   0.0273   0.0387   0.220
  Cluster             0.075   0.0364   0.0367   0.0387
  AR1                 0.067   0.0367   0.0367   0.0387   0.078
Heteroskedastic, $\rho = 0$
  OLS                 0.119   0.0517   0.0025   0.0659   0.213
  Cluster             0.047   0.0673   0.0093   0.0659
  AR1                 0.116   0.0516   0.0028   0.0659   0.210
Heteroskedastic, $\rho = .3$
  OLS                 0.197   0.0521   0.0026   0.0768   0.369
  Cluster             0.062   0.0741   0.0114   0.0768
  AR1                 0.139   0.0581   0.0033   0.0768   0.140
Heteroskedastic, $\rho = .6$
  OLS                 0.214   0.0558   0.0031   0.0840   0.451
  Cluster             0.048   0.0820   0.0126   0.0840
  AR1                 0.108   0.0688   0.0045   0.0840   0.056
Heteroskedastic, $\rho = .9$
  OLS                 0.152   0.0623   0.0043   0.0883   0.324
  Cluster             0.038   0.0899   0.0144   0.0883
  AR1                 0.057   0.0834   0.0070   0.0883   0.023
B. Random effects
$\rho = .3$
  OLS                 0.291   0.0451   0.0041   0.0822   0.673
  Cluster             0.062   0.0776   0.0135   0.0822
  RE                  0.059   0.0788   0.0091   0.0822   0.058
$\rho = .6$
  OLS                 0.357   0.0452   0.0049   0.1034   0.892
  Cluster             0.073   0.1004   0.0183   0.1034
  RE                  0.068   0.1028   0.0127   0.1034   0.054
$\rho = .9$
  OLS                 0.497   0.0447   0.0056   0.1246   0.943
  Cluster             0.062   0.1192   0.0212   0.1246
  RE                  0.063   0.1210   0.0147   0.1246   0.048

Table 4
Data generating process: N = 50, T = 20
Columns: (1) t-test rejection rate; (2) Mean (s.e.); (3) Std (s.e.); (4) Std ($\hat{\beta}$); (5) HA test rejection rate

                        (1)      (2)      (3)      (4)      (5)
A. Fixed effects
Homoskedastic, $\rho = 0$
  OLS                 0.050   0.0342   0.0013   0.0341   0.097
  Cluster             0.049   0.0341   0.0040   0.0341
  AR1                 0.052   0.0342   0.0014   0.0341   0.088
Homoskedastic, $\rho = .3$
  OLS                 0.094   0.0334   0.0013   0.0393   0.077
  Cluster             0.051   0.0379   0.0045   0.0393
  AR1                 0.056   0.0382   0.0016   0.0393   0.086
Homoskedastic, $\rho = .6$
  OLS                 0.120   0.0315   0.0014   0.0414   0.300
  Cluster             0.059   0.0407   0.0052   0.0414
  AR1                 0.050   0.0412   0.0021   0.0414   0.092
Homoskedastic, $\rho = .9$
  OLS                 0.200   0.0222   0.0013   0.0336   0.580
  Cluster             0.059   0.0327   0.0047   0.0336
  AR1                 0.060   0.0329   0.0024   0.0336   0.094
Heteroskedastic, $\rho = 0$
  OLS                 0.168   0.0340   0.0011   0.0479   0.408
  Cluster             0.063   0.0458   0.0056   0.0479
  AR1                 0.171   0.0340   0.0012   0.0479   0.406
Heteroskedastic, $\rho = .3$
  OLS                 0.209   0.0350   0.0012   0.0536   0.675
  Cluster             0.051   0.0527   0.0068   0.0536
  AR1                 0.145   0.0399   0.0016   0.0536   0.294
Heteroskedastic, $\rho = .6$
  OLS                 0.228   0.0394   0.0017   0.0653   0.802
  Cluster             0.050   0.0636   0.0084   0.0653
  AR1                 0.119   0.0514   0.0027   0.0653   0.123
Heteroskedastic, $\rho = .9$
  OLS                 0.196   0.0507   0.0028   0.0775   0.681
  Cluster             0.036   0.0809   0.0131   0.0775
  AR1                 0.058   0.0751   0.0056   0.0775   0.034
B. Random effects
$\rho = .3$
  OLS                 0.405   0.0320   0.0029   0.0756   0.915
  Cluster             0.069   0.0726   0.0131   0.0756
  RE                  0.063   0.0738   0.0085   0.0756   0.064
$\rho = .6$
  OLS                 0.515   0.0318   0.0033   0.1012   0.944
  Cluster             0.066   0.0976   0.0169   0.1012
  RE                  0.055   0.0996   0.0118   0.1012   0.055
$\rho = .9$
  OLS                 0.614   0.0314   0.0038   0.1203   0.948
  Cluster             0.054   0.1166   0.0204   0.1203
  RE                  0.051   0.1194   0.0140   0.1203   0.053

The clustered estimator performs less well in the random effects specification. For small $n$, tests based on the CCM estimator suffer from a substantial size distortion for all values of $T$. For moderate to large values of $n$, the tests have the correct size, and the overall performance does not appear to depend on $T$. In addition, the variance of $\hat{\beta}$ does not appear to decrease as $T$ increases. These results are consistent with the lack of $\sqrt{nT}$-consistency in this case.[19]

The performance of the HA test is much less robust than that of t-tests based on clustered standard errors.
For small $n$, the tests are badly size distorted and have essentially no power against any alternative hypotheses. As $n$ and $T$ grow, the test performance improves. With $n = 50$, the test remains size distorted, but it does have some power against alternatives that increases as $T$ increases. The HA test also performs poorly for the random effects specification for small $n$. However, for moderate or large $n$, the test has both the correct size and good power.

Overall, the simulation results support the use of clustered standard errors for performing inference on regression coefficient estimates in serially correlated panel data, though they also suggest care should be taken if $n$ is small and one suspects a "random effects" structure. The poor performance of $\hat{W}$ in "random effects" models with small $n$ is already well-known; see for example Bell and McCaffrey (2002), who also suggest a bias reduction for $\hat{W}$ in this case. However, that the estimator does quite well even for small $n$ in the serially correlated case where the errors are mixing is somewhat surprising and is a new result which is suggested by the asymptotic analysis of the previous section.

The simulation results confirm the asymptotic results, suggesting that the clustered standard errors are consistent as long as $n \to \infty$ and that they are not sensitive to the size of $n$ relative to $T$. The chief drawback of the CCM estimator is that the robustness comes at the cost of increasing the variance of the standard error estimate relative to that of standard errors estimated through more parsimonious models. The HA test offers one simple information based criterion for choosing between the CCM estimator and a simple parametric model of the error process. However, the simulation evidence regarding its usefulness is mixed. In particular, the properties of the test are poor in small sample settings where there is likely to be the largest gain to using a parsimonious model.
However, in moderate sized samples, the test performs reasonably well, and there still may be gains to using a simple parametric model in these cases.

[19] The inconsistency of $\hat{\beta}$ when $T$ increases with $n$ fixed in differences-in-differences and policy evaluation studies has also been discussed in Donald and Lang (2001).

5. Conclusion

This paper explores the asymptotic behavior of the robust covariance matrix estimator of Arellano (1987). It extends the usual analysis performed under asymptotics where $n \to \infty$ with $T$ fixed to cases where $n$ and $T$ go to infinity jointly, considering both non-mixing and mixing cases, and to the case where $T \to \infty$ with $n$ fixed. The limiting behavior of the OLS estimator, $\hat{\beta}$, in each case is different. However, the analysis shows that the conventional estimator of the asymptotic variance and the usual t and F statistics have the same properties regardless of the behavior of the time series as long as $n \to \infty$. In addition, when $T \to \infty$ with $n$ fixed and the data satisfy mixing conditions and an iid assumption across individuals, the usual t and F statistics can be used for inference despite the fact that the robust covariance matrix estimator is not consistent but converges in distribution to a limiting random variable. In this case, it is shown that the t statistic constructed using $n/(n-1)$ times the estimator of Arellano (1987) is asymptotically $t_{n-1}$, suggesting the use of $n/(n-1)$ times the estimator of Arellano (1987) and critical values obtained from a $t_{n-1}$ in all cases. The use of this procedure is also supported in a short simulation experiment, which verifies that it produces tests with approximately correct size regardless of the relative size of $n$ and $T$ in cases where the time series correlation between observations diminishes as the distance between observations increases.
The simulations also verify that tests based on the robust standard errors are consistent as $n$ increases regardless of the relative size of $n$ and $T$ even in cases when the data are equicorrelated.

Acknowledgments

The research reported in this paper was motivated through conversations with Byron Lutz, to whom I am very grateful for input in developing this paper. I would like to thank Whitney Newey and Victor Chernozhukov as well as anonymous referees and a coeditor for helpful comments and suggestions. This work was partially supported by the William S. Fishman Faculty Research Fund at the Graduate School of Business, the University of Chicago. All remaining errors are mine.

Appendix

For brevity, sketches of the proofs are provided below. More detailed versions are available in an additional Technical Appendix from the author upon request and in Hansen (2004).

Proof of Theorem 1. $\hat{\beta} - \beta \xrightarrow{p} 0$ and $\sqrt{nT}(\hat{\beta} - \beta) \xrightarrow{d} Q^{-1}N(0, W)$, with $W = \lim_n (1/nT)\sum_{i=1}^{n} E[x_i'\Omega_i x_i]$, follow immediately under the conditions of Theorem 1 from the Markov LLN and the Liapounov CLT. The remaining conclusions follow from repeated use of the Cauchy–Schwarz inequality, Minkowski's inequality, the Markov LLN, and the Liapounov CLT. □

The proofs of Theorems 2 and 3 make use of the following lemmas which provide a LLN and CLT for inid data as $\{n, T\} \to \infty$ jointly.

Lemma 1. Suppose $\{Z_{i,T}\}$ are independent across $i$ for all $T$ with $E[Z_{i,T}] = \mu_{i,T}$ and $E|Z_{i,T}|^{1+\delta} < \Delta < \infty$ for some $\delta > 0$ and all $i, T$. Then $(1/n)\sum_{i=1}^{n}(Z_{i,T} - \mu_{i,T}) \xrightarrow{p} 0$ as $\{n, T\} \to \infty$ jointly.

Proof. The proof follows from standard arguments, cf. Chung (2001) Chapter 5. Details are given in Hansen (2004). □

Lemma 2. For $k \times 1$ vectors $Z_{i,T}$, suppose $\{Z_{i,T}\}$ are independent across $i$ for all $T$ with $E[Z_{i,T}] = 0$, $E[Z_{i,T}Z_{i,T}'] = \Omega_{i,T}$, and $E\|Z_{i,T}\|^{2+\delta} < \Delta < \infty$ for some $\delta > 0$. Assume $\Omega = \lim_{n,T}(1/n)\sum_{i=1}^{n}\Omega_{i,T}$ is positive definite with minimum eigenvalue $\lambda_{\min} > 0$. Then $(1/\sqrt{n})\sum_{i=1}^{n} Z_{i,T} \xrightarrow{d} N(0, \Omega)$ as $\{n, T\} \to \infty$ jointly.

Proof. The result follows from verifying the Lindeberg condition of Theorem 2 in Phillips and Moon (1999) using an argument similar to that used in the proof of Theorem 3 in Phillips and Moon (1999). Details are given in Hansen (2004). □

Proof of Theorem 2. The conclusions follow from conventional arguments making repeated use of the Cauchy–Schwarz inequality, Minkowski's inequality, and Lemmas 1 and 2. □

In addition to using Lemmas 1 and 2, I make use of the following mixing inequality, restated from Doukhan (1994) Theorem 2 with a slight change of notation, to establish the properties of the estimators as $\{n, T\} \to \infty$ when mixing conditions are imposed. Its proof may be found in Doukhan (1994, pp. 25–30).

Lemma 3. Let $\{z_t\}$ be a strong mixing sequence with $E[z_t] = 0$, $E\|z_t\|^{\tau+\epsilon} < \Delta < \infty$, and mixing coefficient $\alpha(m)$ of size $(1-c)r/(r-c)$ where $c \in 2\mathbb{N}$, $c \geq \tau$, and $r > c$. Then there is a constant $C$ depending only on $\tau$ and $\alpha(m)$ such that $E|\sum_{t=1}^{T} z_t|^{\tau} \leq C\,D(\tau, \epsilon, T)$ with $D(\tau, \epsilon, T)$ defined in Doukhan (1994) and satisfying $D(\tau, \epsilon, T) = O(T)$ if $\tau \leq 2$ and $D(\tau, \epsilon, T) = O(T^{\tau/2})$ if $\tau > 2$.

Proof of Theorem 3. The conclusions follow under the conditions of the theorem by making use of the Cauchy–Schwarz inequality, Minkowski's inequality, and Lemma 3 to verify the conditions of Lemmas 1 and 2. □

Proof of Theorem 4. Under the hypotheses of the theorem, $\sqrt{nT}(\hat{\beta} - \beta) \xrightarrow{d} Q^{-1}N(0, W)$, $x_i'x_i/T - Q_i \xrightarrow{p} 0$, and $x_i'\varepsilon_i/\sqrt{T} \xrightarrow{d} N(0, W_i)$ are immediate from a LLN and CLT for mixing sequences, cf. White (2001, Theorems 3.47 and 5.20). The conclusion then follows from the definition of $\hat{W}$ and $\hat{\varepsilon}_i$. □

Proof of Corollary 4.1. Consider $t^* = \sqrt{nT}\,(R\hat{\beta} - r)\big/\sqrt{R\hat{Q}^{-1}\hat{W}\hat{Q}^{-1}R'}$.
Under the null b pffiffiffiffiffiffiffi P b nT Rðb À bÞ ¼ Rðð1=nTÞ i x0i xi ÞÀ1 hypothesis, Rb ¼ r, so the numerator of tà is pffiffiffiffiffiffiffi P P pffiffiffi d ðð1= nT Þ i x0i i Þ ! RQÀ1 L i Bi = n. From Theorem 4 and the hypotheses of the Corollary, the denominator of tà converges in distribution to vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ! u n n n X u 1X X 0 À1 1 0 tRQ L Bi Bi À Bi Bi LQÀ1 R0 . n n i¼1 i¼1 i¼1 It follows from the Continuous Mapping Theorem that P pffiffiffi RQÀ1 L i Bi = n à d t ! qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi . P P P ð1=nÞRQÀ1 Lð n Bi B0i À ð1=nÞ n Bi n B0i ÞLQÀ1 R0 i¼1 i¼1 i¼1 Define d ¼ ðRQÀ1 LLQÀ1 R0 Þ1=2 , so P pffiffiffi d i B1;i = n d tà ! U ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi P P P ðd2 =nÞð n B1;i B01;i À ð1=nÞ n B1;i n B01;i Þ i¼1 i¼1 i¼1 e B1;n ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi . P e2 ð1=nÞð i B2 À B1;n Þ 1;i ARTICLE IN PRESS C.B. 
It is straightforward to show that $\widetilde B_{1,n} \sim N(0,1)$, that $\sum_i B_{1,i}^2 - \widetilde B_{1,n}^2 \sim \chi^2_{n-1}$, and that $\sum_i B_{1,i}^2 - \widetilde B_{1,n}^2$ and $\widetilde B_{1,n}$ are independent, from which it follows that

$$U = \Big(\frac{n}{n-1}\Big)^{1/2} \frac{\widetilde B_{1,n}}{\sqrt{\Big(\sum_i B_{1,i}^2 - \widetilde B_{1,n}^2\Big)/(n-1)}} \sim \Big(\frac{n}{n-1}\Big)^{1/2} t_{n-1}.$$

The result for $F^*$ is obtained through a similar argument, and using a result from Rao (2002) Chapter 8b to verify that the resulting quantity follows an F distribution. □

References

Andrews, D.W.K., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59 (3), 817–858.
Arellano, M., 1987. Computing robust standard errors for within-groups estimators. Oxford Bulletin of Economics and Statistics 49 (4), 431–434.
Baltagi, B.H., Wu, P.X., 1999. Unequally spaced panel data regressions with AR(1) disturbances. Econometric Theory 15, 814–823.
Bell, R.M., McCaffrey, D.F., 2002. Bias reduction in standard errors for linear regression with multi-stage samples. Mimeo RAND.
Bertrand, M., Duflo, E., Mullainathan, S., 2004. How much should we trust differences-in-differences estimates? Quarterly Journal of Economics 119 (1), 249–275.
Bhargava, A., Franzini, L., Narendranathan, W., 1982. Serial correlation and the fixed effects model. Review of Economic Studies 49, 533–549.
Chung, K.L., 2001. A Course in Probability Theory, third ed. Academic Press, San Diego.
Donald, S., Lang, K., 2001. Inference with differences in differences and other panel data. Mimeo.
Doukhan, P., 1994. Mixing: properties and examples. In: Fienberg, S., Gani, J., Krickeberg, K., Olkin, I., Wermuth, N. (Eds.), Lecture Notes in Statistics, vol. 85. Springer, New York.
Drukker, D.M., 2003. Testing for serial correlation in linear panel-data models. Stata Journal 3, 168–177.
Hahn, J., Kuersteiner, G.M., 2002. Asymptotically unbiased inference for a dynamic panel model with fixed effects when both N and T are large. Econometrica 70 (4), 1639–1657.
Hahn, J., Newey, W.K., 2004. Jackknife and analytical bias reduction for nonlinear panel models. Econometrica 72 (4), 1295–1319.
Hansen, C.B., 2004. Inference in linear panel data models with serial correlation and an essay on the impact of 401(k) participation on the wealth distribution. Ph.D. Dissertation, Massachusetts Institute of Technology.
Hansen, C.B., 2006. Generalized least squares inference in multilevel models with serial correlation and fixed effects. Journal of Econometrics, doi:10.1016/j.jeconom.2006.07.011.
Kezdi, G., 2002. Robust standard error estimation in fixed-effects panel models. Mimeo.
Kiefer, N.M., Vogelsang, T.J., 2002. Heteroskedasticity–autocorrelation robust testing using bandwidth equal to sample size. Econometric Theory 18, 1350–1366.
Kiefer, N.M., Vogelsang, T.J., 2005. A new asymptotic theory for heteroskedasticity–autocorrelation robust tests. Econometric Theory 21, 1130–1164.
Lancaster, T., 2002. Orthogonal parameters and panel data. Review of Economic Studies 69, 647–666.
Liang, K.-Y., Zeger, S., 1986. Longitudinal data analysis using generalized linear models. Biometrika 73 (1), 13–22.
MaCurdy, T.E., 1982. The use of time series processes to model the error structure of earnings in a longitudinal data analysis. Journal of Econometrics 18 (1), 83–114.
Nickell, S., 1981. Biases in dynamic models with fixed effects. Econometrica 49 (6), 1417–1426.
Phillips, P.C.B., Moon, H.R., 1999. Linear regression limit theory for nonstationary panel data. Econometrica 67 (5), 1057–1111.
Phillips, P.C.B., Sun, Y., Jin, S., 2003. Consistent HAC estimation and robust regression testing using sharp origin kernels with no truncation. Cowles Foundation Discussion Paper 1407.
Rao, C.R., 2002. Linear Statistical Inference and Its Application. Wiley-Interscience.
Solon, G., 1984. Estimating autocorrelations in fixed effects models. NBER Technical Working Paper no. 32.
Solon, G., Inoue, A., 2004. A portmanteau test for serially correlated errors in fixed effects models. Mimeo.
Stata Corporation, 2003. Stata User’s Guide Release 8. Stata Press, College Station, Texas.
Vogelsang, T.J., 2003. Testing in GMM models without truncation. In: Fomby, T.B., Hill, R.C. (Eds.), Advances in Econometrics, volume 17, Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later. Elsevier, Amsterdam, pp. 192–233.
White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48 (4), 817–838.
White, H., 2001. Asymptotic Theory for Econometricians, revised edition. Academic Press, San Diego.
Wooldridge, J.M., 2002. Econometric Analysis of Cross Section and Panel Data. The MIT Press, Cambridge, MA.
