"The Apple iPod iTunes Anti-Trust Litigation"

Filing 753

***ERRONEOUS ENTRY, PLEASE REFER TO DOCUMENT NO. 754 *** EXHIBITS re 752 Opposition/Response to Motion, filed by Apple Inc. (Attachments: # 1 Exhibit 2, # 2 Exhibit 3, # 3 Exhibit 4, # 4 Exhibit 5, # 5 Exhibit 6, # 6 Exhibit 7, # 7 Exhibit 8, # 8 Exhibit 9, # 9 Exhibit 11, # 10 Exhibit 12, # 11 Proposed Order)(Related document(s) 752 ) (Kiernan, David) (Filed on 1/14/2014) Modified on 1/14/2014 (jlmS, COURT STAFF).

Exhibit 12

ARTICLE IN PRESS

Journal of Econometrics 141 (2007) 597–620
www.elsevier.com/locate/jeconom

Asymptotic properties of a robust variance matrix estimator for panel data when T is large

Christian B. Hansen
University of Chicago, Graduate School of Business, 5807 South Woodlawn Ave., Chicago, IL 60637, USA
E-mail address: chansen1@chicagoGSB.edu

Available online 20 November 2006

Abstract

I consider the asymptotic properties of a commonly advocated covariance matrix estimator for panel data. Under asymptotics where the cross-section dimension, n, grows large with the time dimension, T, fixed, the estimator is consistent while allowing essentially arbitrary correlation within each individual. However, many panel data sets have a non-negligible time dimension. I extend the usual analysis to cases where n and T go to infinity jointly and where $T \to \infty$ with n fixed. I provide conditions under which t and F statistics based on the covariance matrix estimator provide valid inference and illustrate the properties of the estimator in a simulation study.

© 2007 Elsevier B.V. All rights reserved.

JEL classification: C12; C13; C23
Keywords: Panel; Heteroskedasticity; Autocorrelation; Robust; Covariance matrix

0304-4076/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2006.10.009

1. Introduction

The use of heteroskedasticity robust covariance matrix estimators, cf. White (1980), in cross-sectional settings and of heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimators, cf. Andrews (1991), in time series contexts is extremely common in applied econometrics. The popularity of these robust covariance matrix estimators is due to their consistency under weak functional form assumptions. In particular, their use allows the researcher to form valid confidence regions about a set of parameters from a model of interest without specifying an exact process for the disturbances in the model.

With the increasing availability of panel data, it is natural that the use of robust covariance matrix estimators for panel data settings that allow for arbitrary within individual correlation is becoming more common. A recent paper by Bertrand et al. (2004) illustrated the pitfalls of ignoring serial correlation in panel data, finding through a simulation study that inference procedures which fail to account for within individual serial correlation may be severely size distorted. As a potential resolution of this problem, Bertrand et al. (2004) suggest the use of a robust covariance matrix estimator proposed by Arellano (1987) and explored in Kezdi (2002) which allows arbitrary within individual correlation and find in a simulation study that tests based on this estimator of the covariance parameters have correct size.

One drawback of the estimator of Arellano (1987), hereafter referred to as the "clustered" covariance matrix (CCM) estimator, is that its properties are only known in conventional panel asymptotics as the cross-section dimension, n, increases with the time dimension, T, fixed. While many panel data sets are indeed characterized by large n and relatively small T, this is not necessarily the case. For example, in many differences-in-differences and policy evaluation studies, the cross-section is composed of states and the time dimension of yearly or quarterly (or occasionally monthly) observations on each state for 20 or more years.

In this paper, I address this issue by exploring the theoretical properties of the CCM estimator in asymptotics that allow n and T to go to infinity jointly and in asymptotics where T goes to infinity with n fixed. I find that the CCM estimator, appropriately normalized, is consistent without imposing any conditions on the rate of growth of T relative to n even when the time series dependence between the observations within each individual is left unrestricted.
In this case, both the OLS estimator and the CCM estimator converge at only the $\sqrt{n}$-rate, essentially because the only information is coming from cross-sectional variation. If the time series process is restricted to be strongly mixing, I show that the OLS estimator is $\sqrt{nT}$-consistent but that, because high lags are not downweighted, the robust covariance matrix estimator still converges at only the $\sqrt{n}$-rate. This behavior suggests, as indicated in the simulations found in Kezdi (2002), that it is the n dimension and not the size of n relative to T that matters for determining the properties of the CCM estimator.

It is interesting to note that the limiting behavior of $\hat\beta$ changes "discontinuously" as the amount of dependence is limited. In particular, the rate of convergence of $\hat\beta$ changes from $\sqrt{n}$ in the "no-mixing case" to $\sqrt{nT}$ when mixing is imposed. However, despite the difference in the limiting behavior of $\hat\beta$, there is no difference in the behavior of standard inference procedures based on the CCM estimator between the two cases. In particular, the same t and F statistics will be valid in either case (and in the $n \to \infty$ with T fixed case) without reference to the asymptotics or degree of dependence in the data.

I also derive the behavior of the CCM estimator as $T \to \infty$ with n fixed, where I find the estimator is not consistent but does have a limiting distribution. This result corresponds to asymptotic results for HAC estimators without truncation found in recent work by Kiefer and Vogelsang (2002, 2005), Phillips et al. (2003), and Vogelsang (2003). While the limiting distribution is not proportional to the true covariance matrix in general, it is proportional to the covariance matrix in the important special case of iid data across individuals,^1 allowing construction of asymptotically pivotal statistics in this case. In fact, in this case, the standard t-statistic is not asymptotically normal but converges in distribution to a random variable which is exactly proportional to a $t_{n-1}$ distribution. This behavior suggests the use of the $t_{n-1}$ for constructing confidence intervals and tests when the CCM estimator is used as a general rule, as this will provide asymptotically correct critical values under any asymptotic sequence.

^1 Note that this still allows arbitrary correlation and heteroskedasticity within individuals, but restricts that the pattern is the same across individuals.

I then explore the finite sample behavior of the CCM estimator and tests based upon it through a short simulation study. The simulation results indicate that tests based on the robust standard error estimates generally have approximately correct size in serially correlated panel data even in small samples. However, the standard error estimates themselves are considerably more variable than their counterparts based on simple parametric models. The bias of the simple parametric estimators is also typically smaller in the cases where the parametric model is correct, suggesting that these standard error estimates are likely preferable when the researcher is confident in the form of the error process. In the simulation, I also explore the behavior of an analog of White's (1980) direct test for heteroskedasticity proposed by Kezdi (2002).^2 The results indicate the performance of the test is fairly good for moderate n, though it is quite poor when n is small. This simulation behavior suggests that this test may be useful for choosing between the use of robust standard error estimates and standard errors estimated from a more parsimonious model when n is reasonably large.

The remainder of this paper is organized as follows. In Section 2, I present the basic framework and the estimator and test statistics that will be considered.
The asymptotic properties of these estimators are collected in Section 3, and Section 4 contains a discussion of a Monte Carlo study assessing the finite sample performance of the estimators in simple models. Section 5 concludes.

^2 Solon and Inoue (2004) offers a different testing procedure for detecting serial correlation in fixed effects panel models. See also Bhargava et al. (1982), Baltagi and Wu (1999), Wooldridge (2002, pp. 275, 282–283), and Drukker (2003).

2. A heteroskedasticity–autocorrelation consistent covariance matrix estimator for panel data

Consider a regression model defined by

$$y_{it} = x_{it}'\beta + \epsilon_{it}, \tag{1}$$

where $i = 1, \ldots, n$ indexes individuals, $t = 1, \ldots, T$ indexes time, $x_{it}$ is a $k \times 1$ vector of observable covariates, and $\epsilon_{it}$ is an unobservable error component. Note that this formulation incorporates the standard fixed effects model as well as models which include other covariates that enter the model with individual specific coefficients, such as individual specific time trends, where these covariates have been partialed out. In these cases, the variables $x_{it}$, $y_{it}$, and $\epsilon_{it}$ should be interpreted as residuals from regressions of $x_{it}^*$, $y_{it}^*$, and $\epsilon_{it}^*$ on an auxiliary set of covariates $z_{it}^*$ from the underlying model $y_{it}^* = x_{it}^{*\prime}\beta + z_{it}^{*\prime}\gamma + \epsilon_{it}^*$. For example, in the fixed effects model, $Z^*$ is a matrix of dummy variables for each individual and $\gamma$ is a vector of individual specific fixed effects. In this case, $x_{it} = x_{it}^* - (1/T)\sum_{t=1}^{T} x_{it}^*$, and $y_{it}$ and $\epsilon_{it}$ are defined similarly. Alternatively, $x_{it}$, $y_{it}$, and $\epsilon_{it}$ could be interpreted as variables resulting from other transformations which remove the nuisance parameters from the equation, such as first-differencing to remove the fixed effects. In what follows, all properties are given in terms of the transformed variables for convenience.
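As a rough illustration of the fixed effects case described above, the within transformation subtracts each individual's time average from its observations. The sketch below assumes a balanced panel stored as a NumPy array of shape (n, T, k); the function name `within_transform`, the array layout, and the toy numbers are illustrative assumptions, not part of the paper.

```python
import numpy as np

def within_transform(x_star):
    """Subtract each individual's time mean: x_it = x*_it - (1/T) sum_t x*_it.

    x_star: array of shape (n, T, k); a balanced panel layout is assumed.
    """
    x_star = np.asarray(x_star, dtype=float)
    # The mean over axis 1 is the time average for each individual and covariate.
    return x_star - x_star.mean(axis=1, keepdims=True)

# Hypothetical toy panel: n = 2 individuals, T = 3 periods, k = 1 covariate.
x_star = np.array([[[1.0], [2.0], [3.0]],
                   [[10.0], [10.0], [13.0]]])
x = within_transform(x_star)
```

The same function could be applied to the outcome by passing it with a trailing axis of length one. After the transformation, each individual's observations sum to zero over time, which is what removes the individual specific intercept.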
Alternatively, conditions could be imposed on the underlying variables and the properties derived as $T \to \infty$ as in Hansen (2006).^3

^3 This is especially relevant in Theorem 3 where the mixing conditions will not hold for the transformed variables if, for example, the transformation is to remove fixed effects by differencing out the individual means. Hansen (2006) provides conditions on the untransformed variables which will cover this case in a different but related context. This approach complicates the proof and notation and is not pursued here.

Within each individual, the equations defined by (1) may be stacked and represented in matrix form as

$$y_i = x_i\beta + \epsilon_i, \tag{2}$$

where $y_i$ is a $T \times 1$ vector of individual outcomes, $x_i$ is a $T \times k$ matrix of observed covariates, and $\epsilon_i$ is a $T \times 1$ vector of unobservables affecting the outcomes $y_i$ with $E[\epsilon_i\epsilon_i' \mid x_i] = \Omega_i$. The OLS estimator of $\beta$ from Eq. (2) may then be defined as $\hat\beta = \left(\sum_{i=1}^n x_i'x_i\right)^{-1}\sum_{i=1}^n x_i'y_i$. The properties of $\hat\beta$ as $n \to \infty$ with T fixed are well known. In particular, under regularity conditions, $\sqrt{n}(\hat\beta - \beta)$ is asymptotically normal with covariance matrix $Q^{-1}WQ^{-1}$ where $Q = \lim_n (1/n)\sum_{i=1}^n E[x_i'x_i]$ and $W = \lim_n (1/n)\sum_{i=1}^n E[x_i'\Omega_i x_i]$.

The problem of robust covariance matrix estimation is then estimating W without imposing a parametric structure on the $\Omega_i$. In this paper, I consider the estimator suggested by Arellano (1987) which may be defined as

$$\hat W = \frac{1}{nT}\sum_{i=1}^n x_i'\hat\epsilon_i\hat\epsilon_i'x_i, \tag{3}$$

where $\hat\epsilon_i = y_i - x_i\hat\beta$ are OLS residuals from Eq. (2). This estimator is an appealing generalization of White's (1980) heteroskedasticity consistent covariance matrix estimator that allows for arbitrary intertemporal correlation patterns and heteroskedasticity across individuals.^4 The estimator is also appealing in that, unlike HAC estimators for time series data, its implementation does not require the selection of a kernel or bandwidth parameter.

^4 It does, however, ignore the possibility of cross-sectional correlation, and it will be assumed that there is no cross-sectional correlation for the remainder of the paper.

The properties of $\hat W$ under conventional panel asymptotics where $n \to \infty$ with T fixed are well-established. In the remainder of this paper, I extend this analysis by considering the properties of $\hat W$ under asymptotic sequences where $T \to \infty$ as well.

The chief reason for interest in the CCM estimator is for performing inference about $\hat\beta$. Suppose $\sqrt{d_{nT}}(\hat\beta - \beta) \xrightarrow{d} N(0, B)$ and define an estimator of the asymptotic variance of $\hat\beta$ as $(1/d_{nT})\hat B$ where $\hat B \xrightarrow{p} B$. The following estimator of the asymptotic variance of $\hat\beta$ based on $\hat W$ is used throughout the remainder of the paper:

$$\widehat{\mathrm{Avar}}(\hat\beta) = \left(\sum_{i=1}^n x_i'x_i\right)^{-1}(nT\hat W)\left(\sum_{i=1}^n x_i'x_i\right)^{-1} = \left(\sum_{i=1}^n x_i'x_i\right)^{-1}\left(\sum_{i=1}^n x_i'\hat\epsilon_i\hat\epsilon_i'x_i\right)\left(\sum_{i=1}^n x_i'x_i\right)^{-1}. \tag{4}$$

In addition, for testing the hypothesis $R\beta = r$ for a $q \times k$ matrix R with rank q, the usual t (for R a $1 \times k$ vector) and Wald statistics can be defined as

$$t^* = \frac{\sqrt{nT}(R\hat\beta - r)}{\sqrt{R\hat Q^{-1}\hat W\hat Q^{-1}R'}} \tag{5}$$

and

$$F^* = nT(R\hat\beta - r)'[R\hat Q^{-1}\hat W\hat Q^{-1}R']^{-1}(R\hat\beta - r), \tag{6}$$

respectively, where $\hat W$ is defined above and $\hat Q = (1/nT)\sum_{i=1}^n x_i'x_i$. In Section 3, I verify that, despite differences in the limiting behavior of $\hat\beta$, $t^* \xrightarrow{d} N(0, 1)$, $F^* \xrightarrow{d} \chi^2_q$, and $\widehat{\mathrm{Avar}}(\hat\beta)$ is valid for estimating the asymptotic variance of $\hat\beta$ as $n \to \infty$ regardless of the behavior of T.

I also consider the behavior of $t^*$ and $F^*$ as $T \to \infty$ with n fixed. In this case, $\hat W$ is not consistent for W but does have a limiting distribution; and when the data are iid across i,^5 I show that $t^* \xrightarrow{d} (n/(n-1))^{1/2}\,t_{n-1}$ and that $F^*$ is asymptotically pivotal and so can be used to construct valid tests.
This behavior suggests that inference using $(n/(n-1))\hat W$ and forming critical values using a $t_{n-1}$ distribution will be valid regardless of the asymptotic sequence considered.

It is worth noting that the estimator $\hat W$ has also been used extensively in multilevel models to account for the presence of correlation between individuals within cells; cf. Liang and Zeger (1986) and Bell and McCaffrey (2002). For example, in a schooling study, one might have data on individual outcomes where the individuals are grouped into classes. In this case, the cross-sectional unit of observation could be defined as the class, and arbitrary correlation between all individuals within each class could be allowed. Here one would expect the presence of a classroom specific random effect resulting in equicorrelation between all individuals within a class. While this would clearly violate the mixing assumptions imposed in obtaining the asymptotic behavior as $T \to \infty$ with n fixed, it would not invalidate the use of $\hat W$ for inference about $\beta$ in cases where n and T go to infinity jointly.

In addition to being useful for performing inference about $\hat\beta$, $\hat W$ may also be used to test the specification of simple parametric models of the error process.^6 Such a test may be useful for a number of reasons. If a parametric model is correct, the estimates of the variance of $\hat\beta$ based on this model will tend to behave better than the estimates obtained from $\hat W$. In particular, parametric estimates of the variance of $\hat\beta$ will often be considerably less variable and will typically converge faster than estimates made using $\hat W$; and if the parametric model is deemed to be adequate, this model may be used to perform FGLS estimation. The FGLS estimator is asymptotically more efficient than the OLS estimator, and simulation evidence in Hansen (2006) suggests that the efficiency gain to using FGLS over OLS in serially correlated panel data may be substantial.
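The estimator in Eq. (3), the sandwich variance in Eq. (4), and the t-statistic in Eq. (5) with the $n/(n-1)$ rescaling can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes a balanced panel stored as arrays of shape (n, T, k) and (n, T), variables already transformed to remove nuisance parameters, and a single linear restriction. The function name `ccm_inference` and the simulated data are hypothetical.

```python
import numpy as np

def ccm_inference(x, y, R, r):
    """OLS with the clustered covariance matrix (CCM) estimator.

    x: (n, T, k) covariates, y: (n, T) outcomes (balanced panel assumed).
    R: (k,) restriction vector, r: scalar, for the hypothesis R beta = r.
    Returns beta_hat, the variance estimate of Eq. (4), and the t-statistic
    of Eq. (5) computed with (n / (n - 1)) * W_hat in place of W_hat.
    """
    n, T, k = x.shape
    xx = np.einsum("itj,itl->jl", x, x)         # sum_i x_i' x_i
    xy = np.einsum("itj,it->j", x, y)           # sum_i x_i' y_i
    beta_hat = np.linalg.solve(xx, xy)
    resid = y - x @ beta_hat                    # (n, T) OLS residuals
    scores = np.einsum("itj,it->ij", x, resid)  # row i holds x_i' e_i
    meat = scores.T @ scores                    # sum_i x_i' e_i e_i' x_i = nT * W_hat
    xx_inv = np.linalg.inv(xx)
    avar = xx_inv @ meat @ xx_inv               # Eq. (4)
    # Rescaled standard error; the resulting statistic is meant to be
    # compared with critical values from a t distribution with n - 1 df.
    se = np.sqrt(n / (n - 1) * (R @ avar @ R))
    t_stat = (R @ beta_hat - r) / se
    return beta_hat, avar, t_stat

# Simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
n, T, k = 50, 10, 2
x = rng.normal(size=(n, T, k))
beta = np.array([1.0, -0.5])
# An individual specific shock induces correlation within each individual.
eps = rng.normal(size=(n, 1)) + rng.normal(size=(n, T))
y = x @ beta + eps
beta_hat, avar, t_stat = ccm_inference(x, y, R=np.array([1.0, 0.0]), r=1.0)
```

Because $\widehat{\mathrm{Avar}}(\hat\beta) = (1/nT)\hat Q^{-1}\hat W\hat Q^{-1}$, dividing $R\hat\beta - r$ by the square root of $R\,\widehat{\mathrm{Avar}}(\hat\beta)R'$ reproduces Eq. (5) without forming $\hat Q$ and $\hat W$ separately. Critical values could then be taken from a $t_{n-1}$ distribution, for example with `scipy.stats.t.ppf` if SciPy is available.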
^5 Note that this still allows arbitrary correlation and heteroskedasticity within individuals but restricts that the pattern is the same across individuals.

^6 The test considered is a straightforward generalization of the test proposed by White (1980) for heteroskedasticity and was suggested in the panel context by Kezdi (2002).

To define the specification test, called hereafter the heteroskedasticity–autocorrelation (HA) test, let $\hat W(\hat\theta) = (1/nT)\sum_{i=1}^n x_i'\Omega_i(\hat\theta)x_i$ where $\hat\theta$ are estimates of a finite set of parameters describing the disturbance process and $\Omega_i(\hat\theta)$ is the implied covariance matrix for individual i.^7 Define a test statistic

$$S^* = (nT)\,[\mathrm{vec}(\hat W - \hat W(\hat\theta))]'\,\hat D^{-}\,[\mathrm{vec}(\hat W - \hat W(\hat\theta))], \tag{7}$$

where $\hat D$ is a positive semi-definite weighting matrix that estimates the variance of $\mathrm{vec}(\hat W - \hat W(\hat\theta))$ and $A^{-}$ is the generalized inverse of a matrix A.^8 In the following section, it will be shown that $S^* \xrightarrow{d} \chi^2_{k(k+1)/2}$ for $\hat D$ defined below.

A natural choice for $\hat D$ is

$$\hat D = \frac{1}{nT}\sum_{i=1}^n \left[\mathrm{vec}\big(x_i'\hat\epsilon_i\hat\epsilon_i'x_i - x_i'\Omega_i(\hat\theta)x_i\big)\right]\left[\mathrm{vec}\big(x_i'\hat\epsilon_i\hat\epsilon_i'x_i - x_i'\Omega_i(\hat\theta)x_i\big)\right]'. \tag{8}$$

Under asymptotics where $\{n, T\} \to \infty$ jointly, another potential choice for $\hat D$ is an estimate of the asymptotic variance of $\hat W$:

$$\hat V = \frac{1}{nT}\sum_{i=1}^n \left[\mathrm{vec}\big(x_i'\hat\epsilon_i\hat\epsilon_i'x_i - \hat W\big)\right]\left[\mathrm{vec}\big(x_i'\hat\epsilon_i\hat\epsilon_i'x_i - \hat W\big)\right]'. \tag{9}$$

That $\hat V$ provides an estimator of the variance of $\mathrm{vec}(\hat W - \hat W(\hat\theta))$ follows from the fact that as $\{n, T\} \to \infty$, $\mathrm{vec}(\hat W)$ is $\sqrt{n}$-consistent while $\mathrm{vec}(\hat W(\hat\theta))$ will be $\sqrt{nT}$-consistent in many cases, so $\mathrm{vec}(\hat W(\hat\theta))$ may be taken as a constant relative to $\mathrm{vec}(\hat W)$. The difference in rates of convergence would arise, for example, in a fixed effects panel model where the errors follow an AR process with common AR coefficients across individuals. However, it is important to note that this will not always be the case.
In particular, in random effects models, the estimator of the variance of the individual specific shock will converge at only a $\sqrt{n}$ rate, implying the same rate of convergence for both the robust and parametric estimators of the variance.

In the following section, I outline the asymptotic properties of $\hat\beta$, $\hat W$, and $\hat V$ from which the behavior of $t^*$, $F^*$, and $S^*$ will follow. The properties of $\hat D$, though not discussed, will generally be the same as those of $\hat V$ under the different asymptotic sequences considered.

^7 Consistency and asymptotic normality of $\hat W(\hat\theta)$ will generally follow from consistency and asymptotic normality of $\hat\theta$. In particular, defining $W_i(\theta)$ as the derivative of W with respect to $\theta_i$ and letting $\theta$ be a $p \times 1$ vector, a Taylor series expansion of $W(\hat\theta)$ yields $W(\hat\theta) = W(\theta) + \sum_{i=1}^p W_i(\bar\theta)(\hat\theta_i - \theta_i)$ where $\bar\theta$ is an intermediate value. As long as a uniform law of large numbers applies to $W_i(\theta)$, $W(\hat\theta) - W(\theta)$ will inherit the properties of $\hat\theta - \theta$. The problem is then reduced to finding an estimator of $\theta$ that is consistent and asymptotically normal with a mean zero asymptotic distribution. Finding such an estimator in fixed effects panel models with serial correlation and/or heteroskedasticity when $n \to \infty$ and $T/n \to \rho$ where $\rho < \infty$ is complicated, though there are estimators which exist. See, for example, Nickell (1981), MaCurdy (1982), Solon (1984), Lancaster (2002), Hahn and Kuersteiner (2002), Hahn and Newey (2004), and Hansen (2006).

^8 The test could alternatively be defined by only considering the $k(k+1)/2$ unique elements of $\hat W - \hat W(\hat\theta)$ and using the inverse of the implied covariance matrix. This test will be equivalent to the test outlined above.

3. Asymptotic properties of the robust covariance matrix estimator

To develop the asymptotic inference results, I impose the following conditions.

Assumption 1. $\{x_i, \epsilon_i\}$ are independent across i, and $E[\epsilon_i\epsilon_i' \mid x_i] = \Omega_i$.

Assumption 2. $Q_{nT} = E\left[\sum_{i=1}^n (x_i'x_i/nT)\right]$ is uniformly positive definite with constant limit Q where limits are taken as $n \to \infty$ with T fixed in Theorem 1, as $\{n, T\} \to \infty$ in Theorems 2 and 3, and as $T \to \infty$ with n fixed in Theorem 4.

In addition, I impose either Assumption 3(a) or Assumption 3(b) depending on the context.

Assumption 3.
(a) $E[\epsilon_i \mid x_i] = 0$.
(b) $E[x_{it}\epsilon_{it}] = 0$.

Assumptions 1–3 are quite standard for panel data models. Assumption 1 imposes independence across individuals, ruling out cross-sectional correlation, but leaves the time series correlation unconstrained and allows general heterogeneity across individuals. Assumption 2 is a standard full rank condition, and the restriction that $Q_{nT}$ has a constant limit could be relaxed at the cost of more complicated notation. Assumption 3 imposes that one of two orthogonality conditions is satisfied. Assumption 3(b) imposes that $x_{it}$ and $\epsilon_{it}$ are uncorrelated and is weaker than the strict exogeneity imposed in Assumption 3(a). Assumption 3(a) is stronger than necessary, but it simplifies the proof of asymptotic normality of $\hat W$ and consistency of $\hat V$. In addition, Assumption 3(a) would typically be imposed in fixed effects models.^9

^9 Note that a balanced panel has also implicitly been assumed. All of the results with the exception of Corollary 4.1 could be extended to accommodate unbalanced panels at the cost of more complicated notation.

The first theorem, which is stated here for completeness, collects the properties of $\hat\beta$ and $\hat W$ in asymptotics where $n \to \infty$ with T fixed.

Theorem 1. Suppose the data are generated by model (1), that Assumptions 1 and 2 are satisfied, and that $n \to \infty$ with T fixed.

(i) If Assumption 3(b) holds and $E|x_{ith}|^{4+\delta} < \Delta < \infty$ and $E|\epsilon_{it}|^{4+\delta} < \Delta < \infty$ for some $\delta > 0$, then

$$\sqrt{nT}(\hat\beta - \beta) \xrightarrow{d} Q^{-1}N\left(0,\; W = \lim_n \frac{1}{nT}\sum_{i=1}^n E[x_i'\Omega_i x_i]\right),$$

and $\hat W \xrightarrow{p} W$.

(ii) In addition, if Assumption 3(a) holds and $E|x_{ith}|^{8+\delta} < \Delta < \infty$ and $E|\epsilon_{it}|^{8+\delta} < \Delta < \infty$ for some $\delta > 0$, then

$$\sqrt{nT}\,[\mathrm{vec}(\hat W - W)] \xrightarrow{d} N\left(0,\; V = \lim_n \frac{1}{nT}\sum_{i=1}^n E\left[\big(\mathrm{vec}(x_i'\epsilon_i\epsilon_i'x_i - W)\big)\big(\mathrm{vec}(x_i'\epsilon_i\epsilon_i'x_i - W)\big)'\right]\right),$$

and $\hat V \xrightarrow{p} V$.

Remark 3.1. It follows from Theorem 1(i) that the asymptotic variance of $\hat\beta$ can be estimated using (4) since

$$\widehat{\mathrm{Avar}}(\hat\beta) = \left(\sum_{i=1}^n x_i'x_i\right)^{-1}\left(\sum_{i=1}^n x_i'\hat\epsilon_i\hat\epsilon_i'x_i\right)\left(\sum_{i=1}^n x_i'x_i\right)^{-1} = \frac{1}{nT}\left(\frac{1}{nT}\sum_{i=1}^n x_i'x_i\right)^{-1}\hat W\left(\frac{1}{nT}\sum_{i=1}^n x_i'x_i\right)^{-1} = \frac{1}{nT}\hat Q^{-1}\hat W\hat Q^{-1},$$

where $\hat Q^{-1}\hat W\hat Q^{-1} \xrightarrow{p} Q^{-1}WQ^{-1}$. It also follows immediately from the definitions of $t^*$ and $F^*$ in Eqs. (5) and (6) and Theorem 1(i) that, under the null hypothesis, $t^* \xrightarrow{d} N(0, 1)$ and $F^* \xrightarrow{d} \chi^2_q$. Similarly, using Theorem 1(ii) and assuming $\hat W(\hat\theta)$ has properties similar to those of $\hat W$, it will follow that the HA test statistic, $S^*$, formed using $\hat D$ defined above converges in distribution to a $\chi^2_{k(k+1)/2}$ under the null hypothesis.

Theorem 1 verifies that $\hat\beta$ and $\hat W$ are consistent and asymptotically normal as $n \to \infty$ with T fixed without imposing any restrictions on the time series dimension. In the following results, I consider alternate asymptotic approximations under the assumption that both n and T are going to infinity.^10 In these cases, consistency and asymptotic normality of suitably normalized versions of $\hat W$ are established under weak conditions.

Theorem 2, given immediately below, covers the case where n and T are going to infinity and there is not weak dependence in the time series. In particular, the results of Theorem 2 are only interesting in the case where $W = \lim_{n,T}(1/nT^2)\sum_{i=1}^n E[x_i'\Omega_i x_i] > 0$. Perhaps the leading case where this behavior would occur is in a model where $\epsilon_{it}$ includes an individual specific random effect that is uncorrelated to $x_{it}$ and the estimated model does not include an individual specific effect. In this case, all observations for a given individual will be equicorrelated, and the condition given above will hold.
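To see why the $T^2$ normalization separates the two cases, a small numeric sketch compares $x_i'\Omega_i x_i/T^2$ for an equicorrelated error covariance with the same quantity for a covariance whose entries decay with $|j - k|$. The constant regressor $x_{it} = 1$, the correlation value 0.5, the AR(1)-type decay, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def equicorrelated(T, rho):
    """Covariance with unit variances and common correlation rho."""
    return (1.0 - rho) * np.eye(T) + rho * np.ones((T, T))

def ar1(T, rho):
    """Covariance with (j, k) entry rho**|j - k| (decaying dependence)."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def normalized_quadratic_form(omega):
    """x' Omega x / T^2 for the assumed regressor x_t = 1."""
    T = omega.shape[0]
    x = np.ones(T)
    return x @ omega @ x / T**2

for T in (10, 100, 1000):
    print(T,
          normalized_quadratic_form(equicorrelated(T, 0.5)),
          normalized_quadratic_form(ar1(T, 0.5)))
```

With equicorrelation the quadratic form equals $\rho + (1 - \rho)/T$, so it stays bounded away from zero as T grows. With decaying dependence it is of order $1/T$ and vanishes, which is the situation where the $1/nT$ normalization is the relevant one.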
Theorem 3, given following Theorem 2, covers the case where there is mixing in the time series.

^10 One could also consider sequential limits in which one takes limits as n or T goes to infinity with the other dimension fixed and then lets the other dimension go to infinity. It could be shown that under the conditions of Theorem 2 and appropriate normalizations sequential limits taken first with respect to either n or T would yield the same results as the joint limit. Similarly, under the conditions of Theorem 3, the sequential limits taken first with respect to either n or T would produce the same results as the joint limit.

Theorem 2. Suppose the data are generated by model (1), that Assumptions 1 and 2 are satisfied, and that $\{n, T\} \to \infty$ jointly.

(i) If Assumption 3(b) holds and $E|x_{ith}|^{4+\delta} < \Delta < \infty$ and $E|\epsilon_{it}|^{4+\delta} < \Delta < \infty$ for some $\delta > 0$, then

$$\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} Q^{-1}N\left(0,\; W = \lim_{n,T}\frac{1}{nT^2}\sum_{i=1}^n E[x_i'\Omega_i x_i]\right),$$

and $\hat W/T \xrightarrow{p} W$.

(ii) In addition, if Assumption 3(a) holds and $E|x_{ith}|^{8+\delta} < \Delta < \infty$ and $E|\epsilon_{it}|^{8+\delta} < \Delta < \infty$ for some $\delta > 0$, then

$$\sqrt{n}\,[\mathrm{vec}(\hat W/T - W)] \xrightarrow{d} N\left(0,\; V = \lim_{n,T}\frac{1}{nT^4}\sum_{i=1}^n E\left[\big(\mathrm{vec}(x_i'\epsilon_i\epsilon_i'x_i - W)\big)\big(\mathrm{vec}(x_i'\epsilon_i\epsilon_i'x_i - W)\big)'\right]\right),$$

and $\hat V/T^3 \xrightarrow{p} V$.

Remark 3.2. It is important to note that the results presented in Theorem 2 are not interesting in the setting where the $\{j, k\}$ element of $\Omega_i$ becomes small when $|j - k|$ is large since in these circumstances $(1/nT^2)\sum_{i=1}^n E[x_i'\Omega_i x_i] \to 0$. Theorem 3 presents results which are relevant in this case.

Remark 3.3. Theorem 2 verifies consistency and asymptotic normality of both $\hat\beta$ and $\hat W$ while imposing essentially no constraints on the time series dependence in the data. The large cross-section effectively allows the time series dimension to be ignored even when T is large. However, without constraints on the time series, $\hat\beta$ is $\sqrt{n}$-consistent, not $\sqrt{nT}$-consistent.
Intuitively, the slower rate of convergence is due to the fact that there may be little information contained in the time series since it is allowed to be arbitrarily dependent.

Remark 3.4. The fact that $\hat\beta$ and $\hat W$ are not $\sqrt{nT}$-consistent will not affect practical implementation of inference about $\hat\beta$. In particular, the estimate of the asymptotic variance of $\hat\beta$ based on Eq. (4) is

$$\widehat{\mathrm{Avar}}(\hat\beta) = \left(\sum_{i=1}^n x_i'x_i\right)^{-1}\left(\sum_{i=1}^n x_i'\hat\epsilon_i\hat\epsilon_i'x_i\right)\left(\sum_{i=1}^n x_i'x_i\right)^{-1} = \frac{1}{n}\left(\frac{1}{nT}\sum_{i=1}^n x_i'x_i\right)^{-1}(\hat W/T)\left(\frac{1}{nT}\sum_{i=1}^n x_i'x_i\right)^{-1} = \frac{1}{n}\hat Q^{-1}(\hat W/T)\hat Q^{-1},$$

where $\hat Q^{-1}(\hat W/T)\hat Q^{-1} \xrightarrow{p} Q^{-1}WQ^{-1}$. The t-statistic defined in Eq. (5) may also be expressed as

$$t^* = \frac{\sqrt{nT}(R\hat\beta - r)}{\sqrt{R\hat Q^{-1}\hat W\hat Q^{-1}R'}} = \frac{\sqrt{n}(R\hat\beta - r)}{\sqrt{R\hat Q^{-1}(\hat W/T)\hat Q^{-1}R'}},$$

which converges in distribution to a $N(0, 1)$ random variable under the null hypothesis, $R\beta = r$, by Theorem 2(i). Similarly, it follows that $F^* \xrightarrow{d} \chi^2_q$ under the null. Finally, the HA test statistic, $S^*$, defined above also satisfies

$$S^* = (nT)\,[\mathrm{vec}(\hat W - \hat W(\hat\theta))]'\,\hat D^{-}\,[\mathrm{vec}(\hat W - \hat W(\hat\theta))] = n\,[\mathrm{vec}(\hat W/T - \hat W(\hat\theta)/T)]'\,(\hat D/T^3)^{-}\,[\mathrm{vec}(\hat W/T - \hat W(\hat\theta)/T)],$$

which converges in distribution to a $\chi^2_{k(k+1)/2}$ under the conditions of the theorem and the additional assumption that $\hat W(\hat\theta)$ behaves similarly to $\hat V$.

The previous theorem establishes the properties of $\hat\beta$ and the robust variance matrix estimator as n and T go to infinity jointly without imposing restrictions on the time series dependence. While the result is interesting, there are many cases in which one might expect the time series dependence to diminish over time. In the following theorem, the properties of $\hat\beta$ and $\hat W$ are established under the assumption that the data are strong mixing in the time series dimension.

Theorem 3. Suppose the data are generated by model (1), that Assumptions 1 and 2 are satisfied, and that $\{n, T\} \to \infty$ jointly.

(i) If Assumption 3(b) is satisfied, $E|x_{ith}|^{r+\delta} < \Delta$ and $E|\epsilon_{it}|^{r+\delta} < \Delta$ for some $\delta > 0$, and $\{x_{it}, \epsilon_{it}\}$ is a strong mixing sequence in t with $\alpha$ of size $-3r/(r-4)$ for $r > 4$,

$$\sqrt{nT}(\hat\beta - \beta) \xrightarrow{d} Q^{-1}N\left(0,\; W = \lim_{n,T}\frac{1}{nT}\sum_{i=1}^n E[x_i'\Omega_i x_i]\right)$$

and $\hat W - W \xrightarrow{p} 0$.

(ii) In addition, if Assumption 3(a) is satisfied, $E|x_{ith}|^{r+\delta} < \Delta$ and $E|\epsilon_{it}|^{r+\delta} < \Delta$ for some $\delta > 0$, and $\{x_{it}, \epsilon_{it}\}$ is a strong mixing sequence in t with $\alpha$ of size $-7r/(r-8)$ for $r > 8$,

$$\sqrt{n}\,[\mathrm{vec}(\hat W - W)] \xrightarrow{d} N\left(0,\; V = \lim_{n,T}\frac{1}{nT^2}\sum_{i=1}^n E\left[\big(\mathrm{vec}(x_i'\epsilon_i\epsilon_i'x_i - W)\big)\big(\mathrm{vec}(x_i'\epsilon_i\epsilon_i'x_i - W)\big)'\right]\right),$$

and $\hat V/T \xrightarrow{p} V$.

Remark 3.5. Theorem 3 verifies consistency and asymptotic normality of both $\hat\beta$ and $\hat W$ under fairly conventional conditions on the time series dependence of the variables. The added restriction on the time series dependence allows estimation of $\beta$ at the $\sqrt{nT}$-rate, which differs from the case above where $\hat\beta$ is only $\sqrt{n}$-consistent. Intuitively, the increase in the rate of convergence is due to the fact that under the mixing conditions, the time series is more informative than in the case analyzed in Theorem 2.

Remark 3.6. It follows immediately from the conclusions of Theorem 3 and the definitions of $\widehat{\mathrm{Avar}}(\hat\beta)$, $t^*$, and $F^*$ in Eqs. (4)–(6) that $\widehat{\mathrm{Avar}}(\hat\beta)$ is valid for estimating the asymptotic variance of $\hat\beta$ and that $t^* \xrightarrow{d} N(0, 1)$ and $F^* \xrightarrow{d} \chi^2_q$ under the null hypothesis. The HA test statistic, $S^*$, also satisfies

$$S^* = (nT)\,[\mathrm{vec}(\hat W - \hat W(\hat\theta))]'\,\hat D^{-}\,[\mathrm{vec}(\hat W - \hat W(\hat\theta))] = n\,[\mathrm{vec}(\hat W - \hat W(\hat\theta))]'\,(\hat D/T)^{-}\,[\mathrm{vec}(\hat W - \hat W(\hat\theta))],$$

which converges in distribution to a $\chi^2_{k(k+1)/2}$ under the conditions of the theorem and the assumption that $\hat D$ behaves similarly to $\hat V$.
In this case, $\hat V$ could also typically be used as the weighting matrix in forming $S^*$ since it will often be the case that $\hat W(\hat\theta)$ will be $\sqrt{nT}$-consistent while $\hat W$ is $\sqrt{n}$-consistent.

Theorems 1–3 establish that conventional estimators of the asymptotic variance of $\hat\beta$ and t and F statistics formed using $\hat W$ have their usual properties as long as $n \to \infty$ regardless of the behavior of T. In addition, the results indicate that it is essentially only the size of n that matters for the asymptotic behavior of the estimators under these sequences. To complete the theoretical analysis, I present the asymptotic properties of $\hat W$ as $T \to \infty$ with n fixed below. The results are interesting in providing a justification for a commonly used procedure and in unifying the results and the different asymptotics considered.

Theorem 4. Suppose the data are generated by model (1), that Assumptions 1, 2, and 3(b) are satisfied, and that $T \to \infty$ with n fixed. If $E|x_{ith}|^{r+\delta} < \Delta$, $E|\epsilon_{it}|^{r+\delta} < \Delta$, and $\{x_{it}, \epsilon_{it}\}$ is a strong mixing sequence in t with $\alpha$ of size $-3r/(r-4)$ for $r > 4$, then

$$\sqrt{nT}(\hat\beta - \beta) \xrightarrow{d} Q^{-1}N(0, W); \qquad x_i'x_i/nT - Q_i/n \xrightarrow{p} 0; \qquad x_i'\epsilon_i/\sqrt{nT} \xrightarrow{d} N(0, W_i/n),$$

and

$$\hat W \xrightarrow{d} U = \frac{1}{n}\sum_{i=1}^n\Bigg(\Lambda_iB_iB_i'\Lambda_i - \Lambda_iB_i\Big(\sum_{j=1}^n B_j'\Lambda_j\Big)\Big(\sum_{j=1}^n Q_j\Big)^{-1}Q_i - Q_i\Big(\sum_{j=1}^n Q_j\Big)^{-1}\Big(\sum_{j=1}^n \Lambda_jB_j\Big)B_i'\Lambda_i + Q_i\Big(\sum_{j=1}^n Q_j\Big)^{-1}\Big(\sum_{j=1}^n \Lambda_jB_j\Big)\Big(\sum_{j=1}^n B_j'\Lambda_j\Big)\Big(\sum_{j=1}^n Q_j\Big)^{-1}Q_i\Bigg),$$

where $W_i = \lim_T (1/T)E[x_i'\Omega_i x_i]$, $W = \lim_T (1/nT)\sum_i E[x_i'\Omega_i x_i]$, $B_i \sim N(0, I_k)$ is a k-dimensional normal vector with $E[B_iB_j'] = 0$ for $i \neq j$, and $\Lambda_i = W_i^{1/2}$.

Remark 3.7. Theorem 4 verifies that $\hat W$ is not consistent but does have a limiting distribution as $T \to \infty$ with n fixed. Unfortunately, the result here differs from results obtained in Phillips et al. (2003), Kiefer and Vogelsang (2002, 2005), and Vogelsang (2003), who consider HAC estimation in time series data without truncation, in that how to construct asymptotically pivotal statistics from U is not immediately obvious.
However, in one important special case, U is proportional to the true covariance matrix allowing construction of asymptotically pivotal tests. Corollary 4.1. Suppose the conditions of Theorem 4 are satisﬁed and that Qi ¼ Q and W i ¼ W for all i. Then ! n n n X 1 1X X 0 d b Bi B0i À Bi Bi L W !U ¼ L n n i¼1 i¼1 i¼1 ARTICLE IN PRESS C.B. Hansen / Journal of Econometrics 141 (2007) 597–620 608 for Bi deﬁned in Theorem 4 and L ¼ W 1=2 . Then, for testing the null hypothesis H0 : Rb ¼ r against the alternative H1 : Rbar for a q Â k matrix R with rank q, the limiting distributions of the conventional Wald (F Ã ) and t-type ðtÃ Þ tests under H0 are bÀ1 b bÀ1 b b F Ã ¼ ðnTÞðRb À rÞ0 ½RQ W Q R0 À1 ðRb À rÞÞ " !#À1 X nq d e0 1 e e e0 F q;nÀq , ! Bq;n Bq;n ; ¼ Bq;i B0q;i À Bq;n Bq;n n nÀq i ð10Þ and pﬃﬃﬃﬃﬃﬃﬃ nT ðRb À rÞ b t ¼ qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ À1 b b bÀ1 RQ W Q R0 rﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ e B1;n n d ! qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ¼ ð11Þ tnÀ1 , P 2 2 nÀ1 e ð1=nÞð i B1;i À B1;n Þ pﬃﬃﬃ P e where Bq;i $Nð0; I q Þ, Bq;n ¼ ð1= nÞ n Bq;i , tnÀ1 is a t distribution with n À 1 degrees of i¼1 freedom, and F q;nÀq is an F distribution with q numerator and n À q denominator degrees of freedom. Ã b Corollary 4.1 gives the limiting distribution of W as T ! 1 under the additional restriction that Qi ¼ Q and W i ¼ W for all i. These restrictions would be satisﬁed when the data vectors for each individual fxi ; yi g are iid across i. While this is more restrictive than the condition imposed in Assumption 1, it still allows for quite general forms of conditional heteroskedasticity and does not impose any structure on the time series process within individuals. The most interesting feature about the result in Corollary 4.1 is that under the b conditions imposed, the limiting distribution of W is proportional to the actual covariance matrix in the data. This allows construction of asymptotically pivotal statistics based on standard t and Wald tests as in Phillips et al. 
(2003), Kiefer and Vogelsang (2002, 2005), and Vogelsang (2003). This is particularly convenient in the panel case since the limiting distribution of the t-statistic is exactly $\sqrt{n/(n-1)}\; t_{n-1}$, where $t_{n-1}$ denotes the t distribution with $n-1$ degrees of freedom.[11] It is also interesting that $E[U] = (1 - 1/n)W$. This suggests that normalizing the estimator $\hat W$ by $n/(n-1)$ will result in an asymptotically unbiased estimator in asymptotics where $T \to \infty$ with n fixed and will likely reduce the finite-sample bias under asymptotics where $n \to \infty$. In addition, the t-statistic constructed based on the estimator defined by $(n/(n-1))\hat W$ will be asymptotically distributed as a $t_{n-1}$, for which critical values are readily available.[12]

The conclusions of Corollary 4.1 suggest a simple procedure for testing hypotheses regarding regression coefficients which will be valid under any of the asymptotics considered. Using $(n/(n-1))\hat W$ and obtaining critical values from a $t_{n-1}$ distribution will yield tests which are asymptotically valid regardless of the asymptotic sequence, since $t_{n-1} \to N(0, 1)$ and $n/(n-1) \to 1$ as $n \to \infty$.

[11] If $n = 1$, $\hat W$ is identically equal to 0. In this case, it is easy to verify that U equals 0, though the results of Theorem 4 and Corollary 4.1 are obviously uninteresting in this case.

[12] This is essentially the normalization used in Stata's cluster command, which normalizes $\hat W$ by $[(nT-1)/(nT-k)][n/(n-1)]$, where the normalization is motivated as a finite-sample adjustment under the usual $n \to \infty$, T fixed asymptotics; see Stata User's Guide Release 8, p. 275 (Stata Corporation, 2003).
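The procedure described above can be sketched in a few lines for the special case of a scalar regressor with individual fixed effects. This is a minimal illustration rather than the paper's own code: the function names `within_demean` and `clustered_t_stat` are hypothetical, the data layout (one inner list per individual) is an assumption, and the sketch relies on $\hat W = (1/nT)\sum_i x_i'\hat\epsilon_i\hat\epsilon_i'x_i$, which is how the clustered estimator is usually written. The returned degrees of freedom, $n-1$, are the ones to use when looking up $t_{n-1}$ critical values.

```python
import math


def within_demean(panel):
    """Subtract each individual's time average, which removes fixed effects."""
    return [[v - sum(row) / len(row) for v in row] for row in panel]


def clustered_t_stat(x, y, beta_null=0.0):
    """Clustered t-statistic for a scalar regressor with individual fixed effects.

    x and y hold one inner list of T observations per individual (assumed layout).
    Returns (beta_hat, standard_error, t_statistic, degrees_of_freedom).
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two individuals (clusters)")
    xd, yd = within_demean(x), within_demean(y)

    sxx = sum(a * a for row in xd for a in row)  # sum_i x_i'x_i
    sxy = sum(a * b for rx, ry in zip(xd, yd) for a, b in zip(rx, ry))
    beta_hat = sxy / sxx

    # "Meat" of the sandwich: sum over individuals of (x_i' e_i)^2,
    # where e_i are that individual's OLS residuals.
    meat = 0.0
    for rx, ry in zip(xd, yd):
        score = sum(a * (b - beta_hat * a) for a, b in zip(rx, ry))
        meat += score * score

    # Sandwich variance of beta_hat with the n/(n-1) rescaling from the text.
    variance = (n / (n - 1)) * meat / (sxx * sxx)
    se = math.sqrt(variance)
    return beta_hat, se, (beta_hat - beta_null) / se, n - 1
```

The t-statistic would then be compared with a $t_{n-1}$ critical value rather than a standard normal one; with few individuals the difference is material.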
Thus, this approach will yield valid tests under any of the asymptotics considered in the presence of quite general heteroskedasticity and serial correlation.[13]

In addition, it is important to note that in the cases where there is weak dependence in the time series and T is large, more efficient estimators of the covariance matrix which make use of this information are available. In particular, standard time series HAC estimators which downweight the correlation between observations that are far apart will have faster rates of convergence than the CCM estimator.

Finally, it is worth noting that the maximum rank of $\hat W$ will generally be $n-1$, which suggests that $\hat W$ will be rank deficient when $k > n-1$. Since $\hat W$ is supposed to estimate a full rank matrix, it seems likely that inference based on $\hat W$ will perform poorly in these cases. Also, the above development ignores time effects, which will often be included in panel data models. Under T fixed, $n \to \infty$ asymptotics, the time effects can be included in the covariate vector $x_{it}$ and pose no additional complications. However, as $T \to \infty$, they also need to be considered separately from x and partialed out with the individual fixed effects. This partialing out will generally result in the presence of an $O(1/n)$ correlation between individuals. When n is large, this correlation should not matter, but in the fixed n, $T \to \infty$ case, it will invalidate the results. The effect of the presence of time effects was explored in a simulation study with the same design as that reported in the following section, where each model was estimated including a full set of time fixed effects. The results, which are not reported below but are available upon request, show that tests based on $\hat W$ are somewhat more size distorted than when no time effects are included for small n, but that this size distortion diminishes quickly as n increases.

4.
Monte Carlo evidence

The asymptotic results presented above suggest that tests based on the robust standard error estimates should have good properties regardless of the relative sizes of n and T. I report results from a simple simulation study used to assess the finite sample effectiveness of the robust covariance matrix estimator and tests based upon it below. Specifically, the simulation focuses on t-tests for regression coefficients and the HA test discussed above.

The Monte Carlo simulations are based on two different specifications: a "fixed effect" specification and a "random effects" specification. The terminology refers to the fact that in the "fixed effect" specification, the models will be estimated including individual specific fixed effects with the goal of focusing on the case where the underlying disturbances exhibit weak dependence. In the "random effects" specification, individual specific effects are not estimated and the goal is to examine the behavior of the CCM estimator and tests based upon it in an equicorrelated model.

The fixed effect specification is

$$y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it},$$

where $x_{it}$ is a scalar and $\alpha_i$ is an individual specific effect. The data generating process for the fixed effect specification allows for serial correlation in both $x_{it}$ and $\varepsilon_{it}$ and heteroskedasticity:

$$x_{it} = .5\,x_{it-1} + v_{it}, \qquad v_{it} \sim N(0, .75),$$
$$\varepsilon_{it} = \rho\,\varepsilon_{it-1} + \sqrt{a_0 + a_1 x_{it}^2}\; u_{it}, \qquad u_{it} \sim N(0, 1-\rho^2),$$
$$\alpha_i \sim N(0, .5).$$

Data are simulated using four different values of $\rho$, $\rho \in \{0, .3, .6, .9\}$, in both the homoskedastic ($a_0 = 1$, $a_1 = 0$) and heteroskedastic ($a_0 = a_1 = .5$) cases, resulting in a total of eight distinct parameter settings.

[13] This argument also applies to testing multiple parameters using $F^*$.
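The fixed effect data generating process above can be simulated roughly as follows. This is a sketch under stated assumptions, not the paper's simulation code: the function name `simulate_fixed_effects_panel`, the true coefficient `beta`, the burn-in length, and the seed are all hypothetical choices, and the second argument of each $N(\cdot,\cdot)$ is read as a variance. The text does not say how the processes are initialized, so a burn-in from zero is used here.

```python
import math
import random


def simulate_fixed_effects_panel(n, T, rho, a0, a1, beta=1.0, burn_in=100, seed=0):
    """Draw one panel from the AR(1) fixed effect design described in the text.

    Returns (x, y), each a list of n lists with T observations.
    beta, burn_in and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        alpha = rng.gauss(0.0, math.sqrt(0.5))  # individual effect, variance .5
        x_prev, e_prev = 0.0, 0.0
        xi, yi = [], []
        for t in range(burn_in + T):
            # x_it = .5 x_{it-1} + v_it, with Var(v_it) = .75
            x_t = 0.5 * x_prev + rng.gauss(0.0, math.sqrt(0.75))
            # e_it = rho e_{it-1} + sqrt(a0 + a1 x_it^2) u_it, Var(u_it) = 1 - rho^2
            u_t = rng.gauss(0.0, math.sqrt(1.0 - rho ** 2))
            e_t = rho * e_prev + math.sqrt(a0 + a1 * x_t ** 2) * u_t
            if t >= burn_in:  # keep only post-burn-in observations
                xi.append(x_t)
                yi.append(beta * x_t + alpha + e_t)
            x_prev, e_prev = x_t, e_t
        x.append(xi)
        y.append(yi)
    return x, y
```

For instance, `simulate_fixed_effects_panel(10, 50, rho=0.6, a0=0.5, a1=0.5)` would correspond to one draw from the heteroskedastic design with $\rho = .6$.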
The models are estimated including $x_{it}$ and a full set of individual specific fixed effects.[14]

The random effects specification is

$$y_{it} = x_{it}'\beta + \epsilon_{it},$$

where $x_{it}$ is a normally distributed scalar with $E[x_{it}^2] = 1$ and $E[x_{it_1}x_{it_2}] = .8$ for all $t_1 \neq t_2$. $\epsilon_{it}$ contains an individual specific random component and a random error term:

$$\epsilon_{it} = \alpha_i + u_{it}, \qquad \alpha_i \sim N(0, \rho), \qquad u_{it} \sim N(0, 1-\rho).$$

Note that the random effects data generating process implies that $E[\epsilon_{it_1}\epsilon_{it_2}] = \rho$ for $t_1 \neq t_2$. Three values of $\rho$ are employed for the random effects specification: .3, .6, and .9. The model is estimated by regressing $y_{it}$ on $x_{it}$ and a constant.

The fixed effects model is commonly used in empirical work when panel data are available. The random effects specification is also widely used in the policy evaluation literature. In many policy evaluation studies, the covariate of interest is a policy variable that is highly correlated within aggregate cells, often with a correlation of one, which has led to the dominance of the random effects estimator in this context. For example, a researcher may desire to estimate the effect of classroom level policies on student-level micro data containing observations from multiple classrooms. In this setting, T indexes the number of students within each class, n indexes the number of classrooms, and $\alpha_i$ is a classroom specific random effect. The CCM estimator has been widely utilized in such situations in order to consistently estimate standard errors.[15]

Simulation results for various values of the cross-sectional (n) and time (T) dimensions are reported. For each $\{n, T\}$ combination, reported results for each of the 11 parameter settings (eight for the fixed effects specification and three for the random effects specification) are based on 1,000 simulation repetitions.
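A draw from the random effects design might be generated as in the sketch below. The text only states the moments of $x_{it}$, so the construction $x_{it} = \sqrt{.8}\,c_i + \sqrt{.2}\,w_{it}$ with independent standard normal $c_i$ and $w_{it}$ is an assumption; it is one simple way to obtain $E[x_{it}^2] = 1$ and $E[x_{it_1}x_{it_2}] = .8$. The function name, the true coefficient `beta`, and the seed are likewise hypothetical.

```python
import math
import random


def simulate_random_effects_panel(n, T, rho, beta=1.0, seed=0):
    """Draw one panel from the equicorrelated random effects design.

    Returns (x, y), each a list of n lists with T observations.
    The factor construction of x, beta and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        common = rng.gauss(0.0, 1.0)            # shared part of x within individual i
        alpha = rng.gauss(0.0, math.sqrt(rho))  # random effect, variance rho
        xi, yi = [], []
        for _ in range(T):
            # Var(x_it) = .8 + .2 = 1 and Cov(x_it1, x_it2) = .8
            x_t = math.sqrt(0.8) * common + math.sqrt(0.2) * rng.gauss(0.0, 1.0)
            u_t = rng.gauss(0.0, math.sqrt(1.0 - rho))  # idiosyncratic error, variance 1 - rho
            xi.append(x_t)
            yi.append(beta * x_t + alpha + u_t)
        x.append(xi)
        y.append(yi)
    return x, y
```

Because both the regressor and the error share a component within each individual, adding observations along T adds little independent information, which is the feature this design is meant to stress.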
Each simulation estimates three types of standard errors for $\hat\beta$: unadjusted OLS standard errors, $\hat s_{OLS}$, CCM standard errors, $\hat s_{CLUS}$, and standard errors consistent with an AR(1) process, $\hat s_{AR(1)}$.[16] For the random effects specification, standard errors consistent with random effects, $\hat s_{RE}$, are substituted for $\hat s_{AR(1)}$.[17] $\hat s_{CLUS}$ is consistent for all parameter settings. $\hat s_{OLS}$ is consistent only in the iid case (the homoskedastic data generating process with $\rho = 0$). $\hat s_{AR(1)}$ is consistent in all homoskedastic data generating processes, and $\hat s_{RE}$ is consistent in all models for which it is reported. In all cases, the CCM estimator is computed using the normalization implied by $T \to \infty$ with n fixed asymptotics; that is, the CCM estimator is computed as $(n/(n-1))\hat W$ for $\hat W$ defined in Eq. (3).

[14] Since $\alpha_i$ is uncorrelated with $x_i$, this model could be estimated using random effects. I chose to consider a different specification for the random effects estimates where the $x_{it}$ were generated to more closely resemble covariates which appear in policy analysis studies.

[15] This is, in fact, one of the original motivations for the development of the CCM estimator, cf. Liang and Zeger (1986).

[16] $\hat s_{AR(1)}$ imposes the parametric structure implied by an AR(1) process. The $\rho$ parameter is estimated from the OLS residuals using the procedure described in Hansen (2006), which consistently estimates AR parameters in fixed effects panel models. The standard errors are then computed as $(X'X)^{-1}X'\Omega(\hat\rho)X(X'X)^{-1}$, where $\Omega(\hat\rho)$ is the covariance matrix implied by an AR(1) process.

Tables 1–4 present the results of the Monte Carlo study, where each table corresponds to a different $\{n, T\}$ combination.[18] In each table, Panel A presents the fixed effects results for the homoskedastic and heteroskedastic cases, while Panel B presents the random effects results.
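For a scalar regressor, the parametric sandwich $(X'X)^{-1}X'\Omega(\hat\rho)X(X'X)^{-1}$ mentioned for the AR(1) standard errors can be sketched as below. This is illustrative only: the function name `ar1_sandwich_se` is hypothetical, $\rho$ and the error variance are taken as given inputs rather than estimated by the Hansen (2006) procedure, and $\Omega(\rho)$ is assumed to have the usual stationary AR(1) form with entries $\sigma^2\rho^{|t-s|}$ within an individual and zeros across individuals.

```python
import math


def ar1_sandwich_se(x, rho, sigma2):
    """Standard error of beta_hat under an assumed stationary AR(1) error process.

    x: scalar regressor, one inner list of T observations per individual,
       already demeaned within individual if fixed effects are used.
    rho, sigma2: AR(1) coefficient and error variance, treated as known here.
    """
    sxx = sum(a * a for row in x for a in row)  # X'X for a scalar regressor
    meat = 0.0
    for row in x:
        # x_i' Omega(rho) x_i with Omega[t][s] = sigma2 * rho^|t - s|
        for t, x_t in enumerate(row):
            for s, x_s in enumerate(row):
                meat += x_t * sigma2 * rho ** abs(t - s) * x_s
    return math.sqrt(meat) / sxx
```

With `rho = 0` this collapses to the usual homoskedastic OLS formula $\sqrt{\sigma^2/\sum x_{it}^2}$, which gives a quick sanity check on any implementation.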
Column (1) presents t-test rejection rates for 5% level tests based on OLS, CCM, and AR(1) standard errors. The critical values for tests based on OLS and AR(1) errors are taken from a $t_{nT-n-1}$ distribution, and the critical values for tests based on clustered standard errors are taken from a $t_{n-1}$ distribution. Columns (2) and (3) present the mean and standard deviation of the estimated standard errors, respectively. Column (4) presents the standard deviation of the $\hat\beta$'s. The difference between columns (2) and (4) is therefore the bias of the estimated standard errors. Finally, column (5) presents the rejection rates for the HA test described above, which tests the null hypothesis that both the CCM estimator and the parametric estimator are consistent.

As expected, tests based on $\hat s_{OLS}$ and $\hat s_{AR(1)}$ perform well in the cases where the assumed model is consistent with the data across the full range of n and T combinations. The results are also consistent with the asymptotic theory, clearly illustrating the $\sqrt{nT}$-consistency of $\hat\beta$ and $\hat W$, with the bias of $\hat W$ and the variance of both $\hat\beta$ and $\hat W$ decreasing as either n or T increases. Of course, when the assumed parametric model is inconsistent with the data, tests based on parametric standard errors suffer from size distortions and the standard error estimates are biased. The RE tests have the correct size for moderate and large n, but not for small n (i.e. $n = 10$); and as indicated by the asymptotic theory, the T dimension has no apparent impact on the size of RE based tests or the overall performance of the RE estimates.

Tests based on the CCM estimator have approximately correct size across all combinations of n and T and all models of the disturbances considered in the fixed effect specification.
The estimator does, however, display a moderate bias in the small n case; it seems likely that this bias does not translate into a large size distortion due to the fact that the bias is small relative to the standard error of the estimator and the use of the $t_{n-1}$ distribution to obtain the critical values. While the clustered standard errors perform well in terms of size of tests and reasonably well in terms of bias, the simulations reveal that a potential weakness of the clustered estimator is a relatively high variance. The CCM estimates have a substantially higher standard deviation than the other estimators and this difference, in percentage terms, increases with T. This behavior is consistent with the $\sqrt{n}$-consistency of the estimator and does suggest that if a parametric estimator is available, it may have better properties for estimating the variance of $\hat\beta$.

[17] $\hat s_{RE}$ is estimated in a manner analogous to $\hat s_{AR(1)}$, where the covariance parameters are estimated in the usual manner from the OLS and within residuals.

[18] Tables 1–4 correspond to $\{n, T\} = \{10, 10\}$, $\{n, T\} = \{10, 50\}$, $\{n, T\} = \{50, 10\}$, $\{n, T\} = \{50, 50\}$, respectively. Additional results for $\{n, T\} = \{10, 200\}$, $\{n, T\} = \{50, 20\}$, $\{n, T\} = \{50, 200\}$, $\{n, T\} = \{200, 10\}$, and $\{n, T\} = \{200, 50\}$ are available from the author upon request. The results are consistent with the asymptotic theory, with the performance of the CCM estimator improving as either n or T increases in the fixed effects specification and as n increases in the random effects specification. In the random effects case, the performance does not appear to be greatly influenced by the size of T relative to n.

Column key for Tables 1–4: (1) t-test rejection rate; (2) Mean (s.e.); (3) Std (s.e.); (4) Std ($\hat\beta$); (5) HA test rejection rate.

Table 1
Data generating process: N = 10, T = 10

              (1)      (2)      (3)      (4)      (5)
A. Fixed effects
Homoskedastic, $\rho = 0$
  OLS         0.038    0.1180   0.0133   0.1152   0.152
  Cluster     0.043    0.1149   0.0330   0.1152
  AR1         0.041    0.1170   0.0141   0.1152   0.135
Homoskedastic, $\rho = .3$
  OLS         0.082    0.1130   0.0136   0.1269   0.095
  Cluster     0.054    0.1212   0.0357   0.1269
  AR1         0.055    0.1240   0.0161   0.1269   0.133
Homoskedastic, $\rho = .6$
  OLS         0.093    0.1005   0.0133   0.1231   0.074
  Cluster     0.060    0.1167   0.0352   0.1231
  AR1         0.051    0.1219   0.0181   0.1231   0.123
Homoskedastic, $\rho = .9$
  OLS         0.145    0.0609   0.0090   0.0818   0.038
  Cluster     0.053    0.0772   0.0249   0.0818
  AR1         0.054    0.0795   0.0136   0.0818   0.085
Heteroskedastic, $\rho = 0$
  OLS         0.126    0.1150   0.0126   0.1502   0.051
  Cluster     0.057    0.1410   0.0458   0.1502
  AR1         0.126    0.1140   0.0137   0.1502   0.042
Heteroskedastic, $\rho = .3$
  OLS         0.171    0.1165   0.0137   0.1708   0.036
  Cluster     0.068    0.1538   0.0500   0.1708
  AR1         0.143    0.1284   0.0172   0.1708   0.044
Heteroskedastic, $\rho = .6$
  OLS         0.187    0.1238   0.0153   0.1853   0.027
  Cluster     0.074    0.1717   0.0572   0.1853
  AR1         0.117    0.1503   0.0219   0.1853   0.049
Heteroskedastic, $\rho = .9$
  OLS         0.198    0.1406   0.0209   0.2181   0.031
  Cluster     0.087    0.1872   0.0641   0.2181
  AR1         0.097    0.1830   0.0336   0.2181   0.074
B. Random effects
$\rho = .3$
  OLS         0.295    0.1063   0.0231   0.1926   0.017
  Cluster     0.115    0.1561   0.0609   0.1926
  RE          0.097    0.1693   0.0460   0.1926   0.027
$\rho = .6$
  OLS         0.399    0.1030   0.0248   0.2438   0.054
  Cluster     0.118    0.2024   0.0788   0.2438
  RE          0.094    0.2180   0.0600   0.2438   0.023
$\rho = .9$
  OLS         0.482    0.0987   0.0293   0.2925   0.093
  Cluster     0.108    0.2346   0.0909   0.2925
  RE          0.095    0.2546   0.0723   0.2925   0.018

Table 2
Data generating process: N = 10, T = 50

              (1)      (2)      (3)      (4)      (5)
A. Fixed effects
Homoskedastic, $\rho = 0$
  OLS         0.054    0.0462   0.0024   0.0472   0.184
  Cluster     0.050    0.0449   0.0117   0.0472
  AR1         0.057    0.0460   0.0026   0.0472   0.185
Homoskedastic, $\rho = .3$
  OLS         0.088    0.0459   0.0024   0.0519   0.077
  Cluster     0.043    0.0520   0.0133   0.0519
  AR1         0.050    0.0529   0.0031   0.0519   0.159
Homoskedastic, $\rho = .6$
  OLS         0.155    0.0447   0.0028   0.0590   0.049
  Cluster     0.042    0.0574   0.0150   0.0590
  AR1         0.047    0.0598   0.0044   0.0590   0.184
Homoskedastic, $\rho = .9$
  OLS         0.225    0.0372   0.0034   0.0600   0.046
  Cluster     0.046    0.0562   0.0159   0.0600
  AR1         0.049    0.0583   0.0072   0.0600   0.150
Heteroskedastic, $\rho = 0$
  OLS         0.158    0.0459   0.0021   0.0637   0.052
  Cluster     0.051    0.0606   0.0169   0.0637
  AR1         0.162    0.0458   0.0023   0.0637   0.057
Heteroskedastic, $\rho = .3$
  OLS         0.199    0.0479   0.0022   0.0724   0.046
  Cluster     0.041    0.0735   0.0198   0.0724
  AR1         0.142    0.0553   0.0032   0.0724   0.047
Heteroskedastic, $\rho = .6$
  OLS         0.229    0.0558   0.0031   0.0934   0.067
  Cluster     0.043    0.0928   0.0260   0.0934
  AR1         0.112    0.0748   0.0054   0.0934   0.059
Heteroskedastic, $\rho = .9$
  OLS         0.239    0.0857   0.0079   0.1490   0.059
  Cluster     0.046    0.1428   0.0451   0.1490
  AR1         0.076    0.1338   0.0163   0.1490   0.099
B. Random effects
$\rho = .3$
  OLS         0.568    0.0471   0.0092   0.1636   0.147
  Cluster     0.104    0.1356   0.0547   0.1636
  RE          0.097    0.1475   0.0413   0.1626   0.014
$\rho = .6$
  OLS         0.703    0.0466   0.0105   0.2331   0.212
  Cluster     0.104    0.1897   0.0727   0.2331
  RE          0.095    0.2079   0.0567   0.2331   0.007
$\rho = .9$
  OLS         0.744    0.0450   0.0130   0.2785   0.245
  Cluster     0.106    0.2310   0.0920   0.2785
  RE          0.103    0.2539   0.0701   0.2785   0.014

Table 3
Data generating process: N = 50, T = 10

              (1)      (2)      (3)      (4)      (5)
A. Fixed effects
Homoskedastic, $\rho = 0$
  OLS         0.049    0.0522   0.0026   0.0526   0.106
  Cluster     0.057    0.0515   0.0062   0.0526
  AR1         0.047    0.0522   0.0028   0.0526   0.099
Homoskedastic, $\rho = .3$
  OLS         0.080    0.0500   0.0027   0.0569   0.053
  Cluster     0.059    0.0552   0.0072   0.0569
  AR1         0.055    0.0556   0.0033   0.0569   0.092
Homoskedastic, $\rho = .6$
  OLS         0.102    0.0447   0.0026   0.0539   0.132
  Cluster     0.048    0.0549   0.0071   0.0539
  AR1         0.049    0.0553   0.0037   0.0539   0.072
Homoskedastic, $\rho = .9$
  OLS         0.156    0.0273   0.0273   0.0387   0.220
  Cluster     0.075    0.0364   0.0367   0.0387
  AR1         0.067    0.0367   0.0367   0.0387   0.078
Heteroskedastic, $\rho = 0$
  OLS         0.119    0.0517   0.0025   0.0659   0.213
  Cluster     0.047    0.0673   0.0093   0.0659
  AR1         0.116    0.0516   0.0028   0.0659   0.210
Heteroskedastic, $\rho = .3$
  OLS         0.197    0.0521   0.0026   0.0768   0.369
  Cluster     0.062    0.0741   0.0114   0.0768
  AR1         0.139    0.0581   0.0033   0.0768   0.140
Heteroskedastic, $\rho = .6$
  OLS         0.214    0.0558   0.0031   0.0840   0.451
  Cluster     0.048    0.0820   0.0126   0.0840
  AR1         0.108    0.0688   0.0045   0.0840   0.056
Heteroskedastic, $\rho = .9$
  OLS         0.152    0.0623   0.0043   0.0883   0.324
  Cluster     0.038    0.0899   0.0144   0.0883
  AR1         0.057    0.0834   0.0070   0.0883   0.023
B. Random effects
$\rho = .3$
  OLS         0.291    0.0451   0.0041   0.0822   0.673
  Cluster     0.062    0.0776   0.0135   0.0822
  RE          0.059    0.0788   0.0091   0.0822   0.058
$\rho = .6$
  OLS         0.357    0.0452   0.0049   0.1034   0.892
  Cluster     0.073    0.1004   0.0183   0.1034
  RE          0.068    0.1028   0.0127   0.1034   0.054
$\rho = .9$
  OLS         0.497    0.0447   0.0056   0.1246   0.943
  Cluster     0.062    0.1192   0.0212   0.1246
  RE          0.063    0.1210   0.0147   0.1246   0.048

Table 4
Data generating process: N = 50, T = 20

              (1)      (2)      (3)      (4)      (5)
A. Fixed effects
Homoskedastic, $\rho = 0$
  OLS         0.050    0.0342   0.0013   0.0341   0.097
  Cluster     0.049    0.0341   0.0040   0.0341
  AR1         0.052    0.0342   0.0014   0.0341   0.088
Homoskedastic, $\rho = .3$
  OLS         0.094    0.0334   0.0013   0.0393   0.077
  Cluster     0.051    0.0379   0.0045   0.0393
  AR1         0.056    0.0382   0.0016   0.0393   0.086
Homoskedastic, $\rho = .6$
  OLS         0.120    0.0315   0.0014   0.0414   0.300
  Cluster     0.059    0.0407   0.0052   0.0414
  AR1         0.050    0.0412   0.0021   0.0414   0.092
Homoskedastic, $\rho = .9$
  OLS         0.200    0.0222   0.0013   0.0336   0.580
  Cluster     0.059    0.0327   0.0047   0.0336
  AR1         0.060    0.0329   0.0024   0.0336   0.094
Heteroskedastic, $\rho = 0$
  OLS         0.168    0.0340   0.0011   0.0479   0.408
  Cluster     0.063    0.0458   0.0056   0.0479
  AR1         0.171    0.0340   0.0012   0.0479   0.406
Heteroskedastic, $\rho = .3$
  OLS         0.209    0.0350   0.0012   0.0536   0.675
  Cluster     0.051    0.0527   0.0068   0.0536
  AR1         0.145    0.0399   0.0016   0.0536   0.294
Heteroskedastic, $\rho = .6$
  OLS         0.228    0.0394   0.0017   0.0653   0.802
  Cluster     0.050    0.0636   0.0084   0.0653
  AR1         0.119    0.0514   0.0027   0.0653   0.123
Heteroskedastic, $\rho = .9$
  OLS         0.196    0.0507   0.0028   0.0775   0.681
  Cluster     0.036    0.0809   0.0131   0.0775
  AR1         0.058    0.0751   0.0056   0.0775   0.034
B. Random effects
$\rho = .3$
  OLS         0.405    0.0320   0.0029   0.0756   0.915
  Cluster     0.069    0.0726   0.0131   0.0756
  RE          0.063    0.0738   0.0085   0.0756   0.064
$\rho = .6$
  OLS         0.515    0.0318   0.0033   0.1012   0.944
  Cluster     0.066    0.0976   0.0169   0.1012
  RE          0.055    0.0996   0.0118   0.1012   0.055
$\rho = .9$
  OLS         0.614    0.0314   0.0038   0.1203   0.948
  Cluster     0.054    0.1166   0.0204   0.1203
  RE          0.051    0.1194   0.0140   0.1203   0.053

The clustered estimator performs less well in the random effects specification. For small n, tests based on the CCM estimator suffer from a substantial size distortion for all values of T. For moderate to large values of n, the tests have the correct size, and the overall performance does not appear to depend on T. In addition, the variance of $\hat\beta$ does not appear to decrease as T increases. These results are consistent with the lack of $\sqrt{nT}$-consistency in this case.[19]

The performance of the HA test is much less robust than that of t-tests based on clustered standard errors.
For small n, the tests are badly size distorted and have essentially no power against any alternative hypotheses. As n and T grow, the test performance improves. With $n = 50$, the test remains size distorted, but it does have some power against alternatives that increases as T increases. The HA test also performs poorly for the random effects specification for small n. However, for moderate or large n, the test has both the correct size and good power.

Overall, the simulation results support the use of clustered standard errors for performing inference on regression coefficient estimates in serially correlated panel data, though they also suggest care should be taken if n is small and one suspects a "random effects" structure. The poor performance of $\hat W$ in "random effects" models with small n is already well-known; see for example Bell and McCaffrey (2002), who also suggest a bias reduction for $\hat W$ in this case. However, that the estimator does quite well even for small n in the serially correlated case where the errors are mixing is somewhat surprising and is a new result which is suggested by the asymptotic analysis of the previous section. The simulation results confirm the asymptotic results, suggesting that the clustered standard errors are consistent as long as $n \to \infty$ and that they are not sensitive to the size of n relative to T.

The chief drawback of the CCM estimator is that the robustness comes at the cost of increasing the variance of the standard error estimate relative to that of standard errors estimated through more parsimonious models. The HA test offers one simple information based criterion for choosing between the CCM estimator and a simple parametric model of the error process. However, the simulation evidence regarding its usefulness is mixed. In particular, the properties of the test are poor in small sample settings where there is likely to be the largest gain to using a parsimonious model.
However, in moderate sized samples, the test performs reasonably well, and there still may be gains to using a simple parametric model in these cases.

5. Conclusion

This paper explores the asymptotic behavior of the robust covariance matrix estimator of Arellano (1987). It extends the usual analysis performed under asymptotics where $n \to \infty$ with T fixed to cases where n and T go to infinity jointly, considering both non-mixing and mixing cases, and to the case where $T \to \infty$ with n fixed. The limiting behavior of the OLS estimator, $\hat\beta$, in each case is different. However, the analysis shows that the conventional estimator of the asymptotic variance and the usual t and F statistics have the same properties regardless of the behavior of the time series as long as $n \to \infty$. In addition, when $T \to \infty$ with n fixed and the data satisfy mixing conditions and an iid assumption across individuals, the usual t and F statistics can be used for inference despite the fact that the robust covariance matrix estimator is not consistent but converges in distribution to a limiting random variable. In this case, it is shown that the t statistic constructed using $n/(n-1)$ times the estimator of Arellano (1987) is asymptotically $t_{n-1}$, suggesting the use of $n/(n-1)$ times the estimator of Arellano (1987) and critical values obtained from a $t_{n-1}$ in all cases.

The use of this procedure is also supported in a short simulation experiment, which verifies that it produces tests with approximately correct size regardless of the relative size of n and T in cases where the time series correlation between observations diminishes as the distance between observations increases.

[19] The inconsistency of $\hat\beta$ when T increases with n fixed in differences-in-differences and policy evaluation studies has also been discussed in Donald and Lang (2001).
The simulations also verify that tests based on the robust standard errors are consistent as n increases regardless of the relative size of n and T even in cases when the data are equicorrelated.

Acknowledgments

The research reported in this paper was motivated through conversations with Byron Lutz, to whom I am very grateful for input in developing this paper. I would like to thank Whitney Newey and Victor Chernozhukov as well as anonymous referees and a coeditor for helpful comments and suggestions. This work was partially supported by the William S. Fishman Faculty Research Fund at the Graduate School of Business, the University of Chicago. All remaining errors are mine.

Appendix

For brevity, sketches of the proofs are provided below. More detailed versions are available in an additional Technical Appendix from the author upon request and in Hansen (2004).

Proof of Theorem 1. $\hat\beta - \beta \xrightarrow{p} 0$ and $\sqrt{nT}(\hat\beta - \beta) \xrightarrow{d} Q^{-1}N\big(0, W = \lim_n (1/nT)\sum_{i=1}^{n} E[x_i'\Omega_i x_i]\big)$ follow immediately under the conditions of Theorem 1 from the Markov LLN and the Liapounov CLT. The remaining conclusions follow from repeated use of the Cauchy–Schwarz inequality, Minkowski's inequality, the Markov LLN, and the Liapounov CLT. □

The proofs of Theorems 2 and 3 make use of the following lemmas, which provide a LLN and CLT for inid data as $\{n, T\} \to \infty$ jointly.

Lemma 1. Suppose $\{Z_{i,T}\}$ are independent across i for all T with $E[Z_{i,T}] = \mu_{i,T}$ and $E|Z_{i,T}|^{1+\delta} < \Delta < \infty$ for some $\delta > 0$ and all i, T. Then $(1/n)\sum_{i=1}^{n}(Z_{i,T} - \mu_{i,T}) \xrightarrow{p} 0$ as $\{n, T\} \to \infty$ jointly.

Proof. The proof follows from standard arguments, cf. Chung (2001) Chapter 5. Details are given in Hansen (2004). □

Lemma 2. For $k \times 1$ vectors $Z_{i,T}$, suppose $\{Z_{i,T}\}$ are independent across i for all T with $E[Z_{i,T}] = 0$, $E[Z_{i,T}Z_{i,T}'] = \Omega_{i,T}$, and $E\|Z_{i,T}\|^{2+\delta} < \Delta < \infty$ for some $\delta > 0$. Assume $\Omega = \lim_{n,T}(1/n)\sum_{i=1}^{n}\Omega_{i,T}$ is positive definite with minimum eigenvalue $\lambda_{\min} > 0$. Then $(1/\sqrt{n})\sum_{i=1}^{n} Z_{i,T} \xrightarrow{d} N(0, W)$ as $\{n, T\} \to \infty$ jointly.
Proof. The result follows from verifying the Lindeberg condition of Theorem 2 in Phillips and Moon (1999) using an argument similar to that used in the proof of Theorem 3 in Phillips and Moon (1999). Details are given in Hansen (2004). □

Proof of Theorem 2. The conclusions follow from conventional arguments making repeated use of the Cauchy–Schwarz inequality, Minkowski's inequality, and Lemmas 1 and 2. □

In addition to using Lemmas 1 and 2, I make use of the following mixing inequality, restated from Doukhan (1994) Theorem 2 with a slight change of notation, to establish the properties of the estimators as $\{n, T\} \to \infty$ when mixing conditions are imposed. Its proof may be found in Doukhan (1994, pp. 25–30).

Lemma 3. Let $\{z_t\}$ be a strong mixing sequence with $E[z_t] = 0$, $E\|z_t\|^{\tau+\epsilon} < \Delta < \infty$, and mixing coefficient $\alpha(m)$ of size $(1-c)r/(r-c)$, where $c \in 2\mathbb{N}$, $c \geq \tau$, and $r > c$. Then there is a constant C depending only on $\tau$ and $\alpha(m)$ such that $E\big|\sum_{t=1}^{T} z_t\big|^{\tau} \leq C\,D(\tau, \epsilon, T)$, with $D(\tau, \epsilon, T)$ defined in Doukhan (1994) and satisfying $D(\tau, \epsilon, T) = O(T)$ if $\tau \leq 2$ and $D(\tau, \epsilon, T) = O(T^{\tau/2})$ if $\tau > 2$.

Proof of Theorem 3. The conclusions follow under the conditions of the theorem by making use of the Cauchy–Schwarz inequality, Minkowski's inequality, and Lemma 3 to verify the conditions of Lemmas 1 and 2. □

Proof of Theorem 4. Under the hypotheses of the theorem, $\sqrt{nT}(\hat\beta - \beta) \xrightarrow{d} Q^{-1}N(0, W)$, $x_i'x_i/T - Q_i \xrightarrow{p} 0$, and $x_i'\epsilon_i/\sqrt{T} \xrightarrow{d} N(0, W_i)$ are immediate from a LLN and CLT for mixing sequences, cf. White (2001, Theorems 3.47 and 5.20). The conclusion then follows from the definition of $\hat W$ and $\hat\epsilon_i$. □

Proof of Corollary 4.1. Consider $t^* = \sqrt{nT}(R\hat\beta - r)\big/\sqrt{R\hat Q^{-1}\hat W\hat Q^{-1}R'}$. Under the null hypothesis, $R\beta = r$, so the numerator of $t^*$ is

$$\sqrt{nT}\,R(\hat\beta - \beta) = R\Big((1/nT)\sum_i x_i'x_i\Big)^{-1}\Big((1/\sqrt{nT})\sum_i x_i'\epsilon_i\Big) \xrightarrow{d} RQ^{-1}\Lambda\sum_i B_i/\sqrt{n}.$$
From Theorem 4 and the hypotheses of the Corollary, the denominator of $t^*$ converges in distribution to

$$\sqrt{\frac{1}{n}\,RQ^{-1}\Lambda\left(\sum_{i=1}^{n} B_iB_i' - \frac{1}{n}\sum_{i=1}^{n} B_i\sum_{i=1}^{n} B_i'\right)\Lambda Q^{-1}R'}\,.$$

It follows from the Continuous Mapping Theorem that

$$t^* \xrightarrow{d} \frac{RQ^{-1}\Lambda\sum_i B_i/\sqrt{n}}{\sqrt{(1/n)\,RQ^{-1}\Lambda\big(\sum_{i=1}^{n} B_iB_i' - (1/n)\sum_{i=1}^{n} B_i\sum_{i=1}^{n} B_i'\big)\Lambda Q^{-1}R'}}\,.$$

Define $\delta = (RQ^{-1}\Lambda\Lambda Q^{-1}R')^{1/2}$, so

$$t^* \xrightarrow{d} U = \frac{\delta\sum_i B_{1,i}/\sqrt{n}}{\sqrt{(\delta^2/n)\big(\sum_{i=1}^{n} B_{1,i}B_{1,i}' - (1/n)\sum_{i=1}^{n} B_{1,i}\sum_{i=1}^{n} B_{1,i}'\big)}} = \frac{\tilde B_{1,n}}{\sqrt{(1/n)\big(\sum_i B_{1,i}^2 - \tilde B_{1,n}^2\big)}}\,.$$

It is straightforward to show that $\tilde B_{1,n} \sim N(0, 1)$, that $\sum_i B_{1,i}^2 - \tilde B_{1,n}^2 \sim \chi^2_{n-1}$, and that $\sum_i B_{1,i}^2 - \tilde B_{1,n}^2$ and $\tilde B_{1,n}$ are independent, from which it follows that

$$U = \left(\frac{n}{n-1}\right)^{1/2}\frac{\tilde B_{1,n}}{\sqrt{\big(\sum_i B_{1,i}^2 - \tilde B_{1,n}^2\big)/(n-1)}} \sim \left(\frac{n}{n-1}\right)^{1/2} t_{n-1}.$$

The result for $F^*$ is obtained through a similar argument, and using a result from Rao (2002) Chapter 8b to verify that the resulting quantity follows an F distribution. □

References

Andrews, D.W.K., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59 (3), 817–858.
Arellano, M., 1987. Computing robust standard errors for within-groups estimators. Oxford Bulletin of Economics and Statistics 49 (4), 431–434.
Baltagi, B.H., Wu, P.X., 1999. Unequally spaced panel data regressions with AR(1) disturbances. Econometric Theory 15, 814–823.
Bell, R.M., McCaffrey, D.F., 2002. Bias reduction in standard errors for linear regression with multi-stage samples. Mimeo RAND.
Bertrand, M., Duflo, E., Mullainathan, S., 2004.
How much should we trust differences-in-differences estimates? Quarterly Journal of Economics 119 (1), 249–275.
Bhargava, A., Franzini, L., Narendranathan, W., 1982. Serial correlation and the fixed effects model. Review of Economic Studies 49, 533–549.
Chung, K.L., 2001. A Course in Probability Theory, third ed. Academic Press, San Diego.
Donald, S., Lang, K., 2001. Inference with differences in differences and other panel data. Mimeo.
Doukhan, P., 1994. Mixing: properties and examples. In: Fienberg, S., Gani, J., Krickeberg, K., Olkin, I., Wermuth, N. (Eds.), Lecture Notes in Statistics, vol. 85. Springer, New York.
Drukker, D.M., 2003. Testing for serial correlation in linear panel-data models. Stata Journal 3, 168–177.
Hahn, J., Kuersteiner, G.M., 2002. Asymptotically unbiased inference for a dynamic panel model with fixed effects when both N and T are large. Econometrica 70 (4), 1639–1657.
Hahn, J., Newey, W.K., 2004. Jackknife and analytical bias reduction for nonlinear panel models. Econometrica 72 (4), 1295–1319.
Hansen, C.B., 2004. Inference in linear panel data models with serial correlation and an essay on the impact of 401(k) participation on the wealth distribution. Ph.D. Dissertation, Massachusetts Institute of Technology.
Hansen, C.B., 2006. Generalized least squares inference in multilevel models with serial correlation and fixed effects. Journal of Econometrics, doi:10.1016/j.jeconom.2006.07.011.
Kezdi, G., 2002. Robust standard error estimation in fixed-effects panel models. Mimeo.
Kiefer, N.M., Vogelsang, T.J., 2002. Heteroskedasticity–autocorrelation robust testing using bandwidth equal to sample size. Econometric Theory 18, 1350–1366.
Kiefer, N.M., Vogelsang, T.J., 2005. A new asymptotic theory for heteroskedasticity–autocorrelation robust tests. Econometric Theory 21, 1130–1164.
Lancaster, T., 2002. Orthogonal parameters and panel data. Review of Economic Studies 69, 647–666.
Liang, K.-Y., Zeger, S., 1986.
Longitudinal data analysis using generalized linear models. Biometrika 73 (1), 13–22.
MaCurdy, T.E., 1982. The use of time series processes to model the error structure of earnings in a longitudinal data analysis. Journal of Econometrics 18 (1), 83–114.
Nickell, S., 1981. Biases in dynamic models with fixed effects. Econometrica 49 (6), 1417–1426.
Phillips, P.C.B., Moon, H.R., 1999. Linear regression limit theory for nonstationary panel data. Econometrica 67 (5), 1057–1111.
Phillips, P.C.B., Sun, Y., Jin, S., 2003. Consistent HAC estimation and robust regression testing using sharp origin kernels with no truncation. Cowles Foundation Discussion Paper 1407.
Rao, C.R., 2002. Linear Statistical Inference and Its Application. Wiley-Interscience.
Solon, G., 1984. Estimating autocorrelations in fixed effects models. NBER Technical Working Paper no. 32.
Solon, G., Inoue, A., 2004. A portmanteau test for serially correlated errors in fixed effects models. Mimeo.
Stata Corporation, 2003. Stata User's Guide Release 8. Stata Press, College Station, Texas.
Vogelsang, T.J., 2003. Testing in GMM models without truncation. In: Fomby, T.B., Hill, R.C. (Eds.), Advances in Econometrics, volume 17, Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later. Elsevier, Amsterdam, pp. 192–233.
White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48 (4), 817–838.
White, H., 2001. Asymptotic Theory for Econometricians, revised edition. Academic Press, San Diego.
Wooldridge, J.M., 2002. Econometric Analysis of Cross Section and Panel Data. The MIT Press, Cambridge, MA.
