Robust Data-Driven Inference for Density-Weighted Average Derivatives
Matias D. Cattaneo
Department of Economics, University of Michigan
Richard K. Crump
Federal Reserve Bank of New York
Michael Jansson
Department of Economics, UC Berkeley and CREATES
February 10, 2010
Abstract.
This paper presents a novel data-driven bandwidth selector com-
patible with the small bandwidth asymptotics developed in Cattaneo, Crump, and
Jansson (2009) for density-weighted average derivatives. The new bandwidth selector
is of the plug-in variety, and is obtained based on a mean squared error expansion of
the estimator of interest. An extensive Monte Carlo experiment shows a remarkable
improvement in performance when the bandwidth-dependent robust inference proce-
dures proposed by Cattaneo, Crump, and Jansson (2009) are coupled with this new
data-driven bandwidth selector. The resulting robust data-driven confidence intervals
compare favorably to the alternative procedures available in the literature. The online
supplemental material to this paper contains further results from the simulation study.
Keywords: Averaged derivatives, bandwidth selection, robust inference, small band-
width asymptotics.
The authors thank Sebastian Calonico, Lutz Kilian, seminar participants at Georgetown, Michigan, Penn
State and Wisconsin, and conference participants at the 2009 Latin American Meeting of the Econometric
Society and 2010 North American Winter Meeting of the Econometric Society for comments. We also thank
the editor, associate editor and a referee for comments and suggestions that improved this paper. The first
author gratefully acknowledges financial support from the National Science Foundation (SES 0921505). The
third author gratefully acknowledges financial support from the National Science Foundation (SES 0920953)
and the research support of CREATES (funded by the Danish National Research Foundation).
Robust Data-Driven Inference for Averaged Derivatives
1. Introduction
Semiparametric models, which include both a finite dimensional parameter of interest and
an infinite dimensional nuisance parameter, play a central role in modern statistical and
econometric theory, and are potentially of great interest in empirical work. However, the
applicability of semiparametric estimators is seriously hampered by the sensitivity of their
performance to seemingly ad hoc choices of “smoothing” and “tuning” parameters involved
in the estimation procedure. Although classical large sample theory for semiparametric
estimators is now well developed, these theoretical results are typically invariant to the
particular choice of parameters associated with the nonparametric estimator employed, and
usually require strong untestable assumptions (e.g., smoothness of the infinite dimensional
nuisance parameter). As a consequence, inference procedures based on these estimators are
in general not robust to changes in the choice of tuning and smoothing parameters underlying
the nonparametric estimator, and to departures from key unobservable model assumptions.
These facts suggest that classical asymptotic results for semiparametric estimators may not
always accurately capture their behavior in finite samples, posing considerable restrictions
on the overall applicability they may have for empirical work.
This paper proposes two robust data-driven inference procedures for the semiparametric
density-weighted average derivatives estimator of Powell, Stock, and Stoker (1989). The
averaged derivative is a simple yet important semiparametric estimand of interest, which
naturally arises in many statistical and econometric models such as (nonadditive) single-index
models (see, e.g., Powell (1994) and Matzkin (2007) for reviews). Moreover, this estimand
has been considered in a variety of empirical problems, including nonparametric demand
estimation (Härdle, Hildenbrand, and Jerison (1991)), policy analysis of tax and subsidy
reform (Deaton and Ng (1998)) and nonlinear pricing in labor markets (Coppejans and
Sieg (2005)). This paper focuses on the density-weighted average derivatives estimator not
only because of its own importance, but also because it admits a particular U-statistic
representation. As discussed in detail below, this representation is heavily exploited in the
theoretical developments presented here, which implies that the results in this paper may be
extended to cover other estimators having a similar representation.
The main idea is to develop a novel data-driven bandwidth selector compatible with the
small bandwidth asymptotic theory presented in Cattaneo, Crump, and Jansson (2009). This
alternative (first-order) large sample theory encompasses the classical large sample theory
available in the literature, and also enjoys several robustness properties. In particular, (i)
it provides valid inference procedures for (small) bandwidth sequences that would render
the classical results invalid, (ii) it permits the use of a second-order kernel regardless of
the dimension of the regressors and therefore removes strong smoothness assumptions, and
(iii) it provides a limiting distribution that is in general not invariant to the particular
choices of smoothing and tuning parameters, without necessarily forcing a slower than
root-n rate of convergence (where n is the sample size). The key theoretical insight behind these
results is to accommodate bandwidth sequences that break down the asymptotic linearity
of the estimator of interest, leading to a more general first-order asymptotic theory that
is no longer invariant to the particular choices of parameters underlying the preliminary
nonparametric estimator. Consequently, it is expected that an inference procedure based
on this alternative asymptotic theory would (at least partially) “adapt” to the particular
choices of these parameters.
The preliminary simulation results in Cattaneo, Crump, and Jansson (2009) show that
this alternative asymptotic theory opens the possibility for the construction of a robust in-
ference procedure, providing a range of (small) bandwidths for which the appropriate test
statistic enjoys approximately correct size. However, the bandwidth selectors available in the
literature turn out to be incompatible with these new results in the sense that they would
not deliver a bandwidth choice within the robust range. The new data-driven bandwidth
selector presented here achieves this goal, thereby providing a robust automatic (i.e., fully
data-driven) inference procedure for the estimand of interest. These results are corroborated
by an extensive Monte Carlo experiment, which shows that the asymptotic theory developed
in Cattaneo, Crump, and Jansson (2009) coupled with the data-driven bandwidth selector
proposed here leads to remarkable improvements in inference when compared to the alternative
procedures available in the literature. In particular, the resulting confidence intervals
exhibit close-to-correct empirical coverage across all designs considered. Among other
advantages, these data-driven statistical procedures allow for the use of a second-order kernel,
which is believed to deliver more stable results in applications (see, e.g., Horowitz and Härdle
(1996)), and appear to be considerably more robust to the additional variability introduced
by the estimation of the bandwidth selectors. Furthermore, these results are important be-
cause the standard nonparametric bootstrap is not a valid alternative in general to the large
sample theory employed in this paper (Cattaneo, Crump, and Jansson (2010)).
Another interesting feature of the analysis presented here is related to the well known
trade-off between efficiency and robustness in statistical inference. In particular, the novel
procedures presented here are considerably more robust while in general (semiparametric)
inefficient. This feature is captured by the behavior of the new robust confidence intervals
in the simulation study, where they are seen to have correct size and less bias but larger
length on average. For example, when the classical procedure is valid (i.e., when using a
higher-order kernel), the efficiency loss is found to be around 10% on average, while the bias
of the estimator is reduced by about 60% on average.
This paper contributes to the important literature of semiparametric inference for weighted
average derivatives. This population parameter was originally introduced by Stoker (1986),
and has been intensely studied since then. Härdle and Stoker (1989) and Härdle, Hart, Mar-
ron, and Tsybakov (1992) study general weighted average derivatives estimators, although
their results are considerably complicated by the fact that their representation requires han-
dling stochastic denominators and appears to be very sensitive to the choice of trimming
parameters. The density-weighted average derivatives estimator circumvents this problem,
while retaining the desirable properties of the general weighted average derivative, and leads
to a simple and useful semiparametric estimator. Powell, Stock, and Stoker (1989) study the
first-order large sample properties of this estimator and provide sufficient (but not necessary)
conditions for root-n consistency and asymptotic normality. Under appropriate restrictions,
Newey and Stoker (1993) discuss semiparametric efficiency of weighted average derivatives.
Nishiyama and Robinson (2000, 2005) study the second-order large sample properties of
density-weighted average derivatives by deriving valid Edgeworth expansions for the estima-
tor considered in this paper (see also Robinson (1995)), while Härdle and Tsybakov (1993)
and Powell and Stoker (1996) provide second-order mean squared error expansions for this
estimator (see also Newey, Hsieh, and Robins (2004)). Both types of higher-order expansions
provide simple plug-in bandwidth selectors targeting different properties of this estimator,
and are compatible with the classical large sample theory available in the literature. Ichimura
and Todd (2007) provide a recent survey with particular emphasis on implementation.
The rest of the paper is organized as follows. Section 2 describes the model and reviews
the main results available in the literature regarding first-order large sample inference for
density-weighted average derivatives. Section 3 presents the higher-order mean squared error
expansion and develops the new (infeasible) theoretical bandwidth selector, while Section 4
describes how to construct a feasible (i.e., data-driven) bandwidth selector and establishes
its consistency. Section 5 summarizes the results of an extensive Monte Carlo experiment.
Section 6 discusses how the results may be generalized and concludes.
2. Model and Previous Results
Let $z_i = (y_i, x_i')'$, $i = 1, \dots, n$, be a random sample from a vector $z = (y, x')'$, where
$y \in \mathbb{R}$ is a dependent variable and $x = (x_1, \dots, x_d)' \in \mathbb{R}^d$ is a continuous
explanatory variable with a density $f(\cdot)$. The population parameter of interest is the
density-weighted average derivative given by
$$\theta = \mathbb{E}\left[ f(x) \frac{\partial}{\partial x} g(x) \right],$$
where $g(x) = \mathbb{E}[y|x]$ denotes the population regression function. For example, this estimand
is a popular choice for the estimation of the coefficients (up to scale) in a single-index model
with unknown link function. To see this, note that $\theta \propto \beta$ when $g(x) = \tau(x'\beta)$ for an
unknown (link) function $\tau(\cdot)$, a semiparametric problem that arises in a variety of contexts,
including discrete choice and censored models.
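The proportionality claim can be verified in one line. The following is only a sketch: it assumes, beyond what is stated above, that the link function $\tau$ is differentiable with derivative $\dot\tau$ (a symbol introduced here for illustration).

```latex
\begin{align*}
\theta
  &= \mathbb{E}\left[ f(x)\, \frac{\partial}{\partial x}\, \tau(x'\beta) \right]
   = \mathbb{E}\left[ f(x)\, \dot{\tau}(x'\beta) \right] \beta .
\end{align*}
```

Under this assumption $\theta$ equals $\beta$ times the scalar $\mathbb{E}[f(x)\dot\tau(x'\beta)]$, so $\beta$ is recovered only up to scale, and only when that scalar is non-zero.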
The following assumption collects typical regularity conditions imposed on this model.
Assumption 1. (a) $\mathbb{E}[y^4] < \infty$, $\mathbb{E}[\sigma^2(x) f(x)] > 0$ and
$\mathbb{V}[\partial e(x)/\partial x - y\,\partial f(x)/\partial x]$ is positive definite, where
$\sigma^2(x) = \mathbb{V}[y|x]$ and $e(x) = f(x) g(x)$.
(b) $f$ is $(Q+1)$ times differentiable, and $f$ and its first $(Q+1)$ derivatives are bounded,
for some $Q \geq 2$.
(c) $g$ is twice differentiable, and $e$ and its first two derivatives are bounded.
(d) $v$ is differentiable and
$\sup_{x \in \mathbb{R}^d} [ v(x) f(x) + v(x) \|\partial f(x)/\partial x\| + \|\partial v(x)/\partial x\| ] < \infty$,
where $\|\cdot\|$ is the Euclidean norm and $v(x) = \mathbb{E}[y^2|x]$.
(e) $\lim_{\|x\| \to \infty} [ f(x) + |e(x)| ] = 0$.
Assumption 1 and integration by parts lead to $\theta = -2\,\mathbb{E}[y\,\partial f(x)/\partial x]$, which in turn
motivates the analogue estimator of Powell, Stock, and Stoker (1989) given by
$$\hat\theta_n = -2\,\frac{1}{n} \sum_{i=1}^{n} y_i \frac{\partial}{\partial x} \hat f_{n,i}(x_i), \qquad
\hat f_{n,i}(x) = \frac{1}{n-1} \sum_{j=1,\, j \neq i}^{n} \frac{1}{h_n^d} K\left( \frac{x_j - x}{h_n} \right),$$
where $\hat f_{n,i}(\cdot)$ is a “leave-one-out” kernel density estimator for some kernel function
$K : \mathbb{R}^d \to \mathbb{R}$ and some positive (bandwidth) sequence $h_n$. Typical regularity conditions
imposed on the kernel-based nonparametric estimator are given in the following assumption.
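Assumption 2. (a) $K$ is even and differentiable, and $K$ and its first derivative are bounded.
(b) $\int_{\mathbb{R}^d} \dot K(u) \dot K(u)'\,du$ is positive definite, where $\dot K(u) = \partial K(u)/\partial u$.
(c) For some $P \geq 2$, $\int_{\mathbb{R}^d} [ |K(u)| (1 + \|u\|^P) + \|\dot K(u)\| (1 + \|u\|^2) ]\,du < \infty$, and
$$\int_{\mathbb{R}^d} u_1^{l_1} \cdots u_d^{l_d} K(u)\,du =
\begin{cases} 1, & \text{if } l_1 + \cdots + l_d = 0, \\ 0, & \text{if } 0 < l_1 + \cdots + l_d < P. \end{cases}$$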
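As a concrete illustration of the estimator defined before Assumption 2, the following minimal Python sketch computes $\hat\theta_n$ from its leave-one-out form. It assumes a standard Gaussian product kernel, which is even, differentiable and of order $P = 2$; the function names `gaussian_kernel_grad` and `theta_hat_loo` and the simulated linear design are hypothetical choices made here, not part of the paper.

```python
import numpy as np

def gaussian_kernel_grad(u):
    """Gradient of the standard d-variate Gaussian kernel: dK(u)/du = -u * K(u)."""
    d = u.shape[-1]
    K = np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi) ** (d / 2.0)
    return -u * K[..., None]

def theta_hat_loo(y, x, h):
    """-2/n * sum_i y_i * d/dx f_hat_{n,i}(x_i), with a leave-one-out density estimate."""
    n, d = x.shape
    u = (x[None, :, :] - x[:, None, :]) / h          # u[i, j] = (x_j - x_i) / h
    # d/dx of h^{-d} K((x_j - x)/h) at x = x_i is -h^{-(d+1)} * Kdot((x_j - x_i)/h).
    # The j = i term is zero because Kdot(0) = 0, so summing over all j is leave-one-out.
    grad_f = -h ** (-(d + 1)) * gaussian_kernel_grad(u).sum(axis=1) / (n - 1)
    return -2.0 * np.mean(y[:, None] * grad_f, axis=0)

# Illustrative data (assumed design): linear model with two standard normal regressors.
rng = np.random.default_rng(0)
n, beta = 300, np.array([1.0, -0.5])
x = rng.normal(size=(n, 2))
y = x @ beta + rng.normal(size=n)
print(theta_hat_loo(y, x, h=0.6))
```

The computation stores all $n^2$ pairwise differences, which is convenient for moderate sample sizes but would need to be processed in blocks for large $n$.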
Powell, Stock, and Stoker (1989) showed that, under appropriate restrictions on the
bandwidth sequence and kernel function, the estimator $\hat\theta_n$ is asymptotically linear with
influence function given by $L(z) = 2[\partial e(x)/\partial x - y\,\partial f(x)/\partial x - \theta]$. Thus, the asymptotic
variance of this estimator is given by $\Sigma = \mathbb{E}[L(z) L(z)']$. Moreover, although not covered
by the results in Newey and Stoker (1993), it is possible to show that $L(z)$ is the efficient
influence function for $\theta$, and hence $\Sigma$ is the semiparametric efficiency bound for this estimand.
The following result describes the exact conditions and summarizes the main conclusion.
(Limits are taken as $n \to \infty$ unless otherwise noted.)
Result 1. (Powell, Stock, and Stoker (1989)) If Assumptions 1 and 2 hold, and if
$n h_n^{2\min(P,Q)} \to 0$ and $n h_n^{d+2} \to \infty$, then
$$\sqrt{n}(\hat\theta_n - \theta) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} L(z_i) + o_p(1) \to_d \mathcal{N}(0, \Sigma).$$
Result 1 follows from noting that the estimator $\hat\theta_n$ admits an $n$-varying U-statistic
representation given by
$$\hat\theta_n = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} U(z_i, z_j; h_n), \qquad
U(z_i, z_j; h) = -h^{-(d+1)} \dot K\left( \frac{x_i - x_j}{h} \right) (y_i - y_j),$$
which leads to the Hoeffding decomposition $\hat\theta_n = \theta_n + L_n + W_n$, where
$$\theta_n = \mathbb{E}[U(z_i, z_j; h_n)], \qquad
L_n = \frac{1}{n} \sum_{i=1}^{n} L(z_i; h_n), \qquad
W_n = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} W(z_i, z_j; h_n),$$
with $L(z_i; h) = 2[\mathbb{E}[U(z_i, z_j; h)|z_i] - \mathbb{E}[U(z_i, z_j; h)]]$ and
$W(z_i, z_j; h) = U(z_i, z_j; h) - (L(z_i; h) + L(z_j; h))/2 - \mathbb{E}[U(z_i, z_j; h)]$.
This decomposition shows that the estimator admits a bilinear
form representation in general, which clearly justifies the conditions imposed on the bandwidth
sequence and the kernel function: (i) condition $n h_n^{2\min(P,Q)} \to 0$ ensures that the bias
of the estimator is asymptotically negligible because $\theta_n - \theta = O(h_n^{\min(P,Q)})$, and (ii) condition
$n h_n^{d+2} \to \infty$ ensures that the “quadratic term” of the Hoeffding decomposition is also
asymptotically negligible because $W_n = O_p(n^{-1} h_n^{-(d+2)/2})$. Under the same conditions, Powell,
Stock, and Stoker (1989) also develop a simple consistent estimator for $\Sigma$, which is given by
the analogue estimator
$$\hat\Sigma_n = \frac{1}{n} \sum_{i=1}^{n} \hat L_{n,i} \hat L_{n,i}', \qquad
\hat L_{n,i} = 2 \left[ \frac{1}{n-1} \sum_{j=1,\, j \neq i}^{n} U(z_i, z_j; h_n) - \hat\theta_n \right].$$
Consequently, under the conditions imposed in Result 1, it is straightforward to form
a studentized version of $\hat\theta_n$, leading to an asymptotically pivotal test statistic given by
$\sqrt{n}\,\hat\Sigma_n^{-1/2} (\hat\theta_n - \theta) \to_d \mathcal{N}(0, I_d)$, with $\hat\Sigma_n \to_p \Sigma$. This test statistic
may be used in the usual way to construct a confidence interval for $\theta$ (or, equivalently, to
carry out the corresponding dual hypothesis test).
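The classical procedure just described can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: it uses a Gaussian product kernel for $K$, the hypothetical function names `pairwise_U` and `classical_inference`, a direction vector `lam` selecting the linear combination of $\theta$ of interest, and the value 1.96 as an approximation to the 0.975 standard normal quantile.

```python
import numpy as np

def pairwise_U(y, x, h):
    """U(z_i, z_j; h) = -h^{-(d+1)} Kdot((x_i - x_j)/h) (y_i - y_j), Gaussian K (assumed)."""
    n, d = x.shape
    u = (x[:, None, :] - x[None, :, :]) / h
    K = np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi) ** (d / 2.0)
    Kdot = -u * K[..., None]                          # gradient of the Gaussian kernel
    return -h ** (-(d + 1)) * Kdot * (y[:, None] - y[None, :])[..., None]

def classical_inference(y, x, h, lam, z=1.96):
    """Return theta_hat, Sigma_hat and the classical interval for lam' theta."""
    n = x.shape[0]
    U = pairwise_U(y, x, h)                           # shape (n, n, d), zero diagonal
    theta = U.sum(axis=(0, 1)) / (n * (n - 1))        # U-statistic form of theta_hat
    L = 2.0 * (U.sum(axis=1) / (n - 1) - theta)       # L_hat_{n,i}, one row per i
    Sigma = L.T @ L / n                               # Sigma_hat_n
    se = np.sqrt(lam @ Sigma @ lam / n)               # classical standard error
    center = lam @ theta
    return theta, Sigma, (center - z * se, center + z * se)

# Illustrative data (assumed design).
rng = np.random.default_rng(0)
x = rng.normal(size=(300, 2))
y = x @ np.array([1.0, -0.5]) + rng.normal(size=300)
print(classical_inference(y, x, h=0.6, lam=np.array([1.0, 0.0])))
```

Because `U` is symmetric in its first two indices and has a zero diagonal, dividing the full double sum by $n(n-1)$ reproduces the average over the $\binom{n}{2}$ unordered pairs.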
As discussed in Newey (1994), asymptotic linearity of a semiparametric estimator has
several distinct features that may be considered attractive from a theoretical point of view.
In particular, asymptotic linearity is a necessary condition for semiparametric efficiency and
leads to a limiting distribution of the statistic of interest that is invariant to the choice of
the nonparametric estimator used in the construction of the semiparametric procedure. In
other words, regardless of the particular choice of preliminary nonparametric estimator, the
limiting distribution will not depend on the nonparametric estimator whenever the semi-
parametric estimator admits an asymptotic linear representation.
However, achieving an asymptotic linear representation of a semiparametric estimator
imposes several strong model assumptions and leads to a large sample theory that may not
accurately represent the finite sample behavior of the estimator. In the case of $\hat\theta_n$, asymptotic
linearity would require $P > 2$ unless $d = 1$, which in turn requires strong smoothness
conditions ($Q \geq P$). Consequently, classical asymptotic theory will require the use of a
higher-order kernel whenever more than one covariate is included. In addition, classical
asymptotic theory (whenever valid) leads to a limiting experiment which is invariant to the
particular choices of smoothing ($K$) and tuning ($h_n$) parameters involved in the construction
of the estimator, and therefore it is unlikely to be able to “adapt” to changes in these
parameters. In other words, inference based on classical asymptotic theory is silent with
respect to the impact that these parameters may have on the finite sample behavior of $\hat\theta_n$.
In an attempt to better characterize the finite sample behavior of $\hat\theta_n$, Cattaneo, Crump,
and Jansson (2009) show that it is possible to increase the robustness of this estimator by
considering a different asymptotic experiment. In particular, instead of forcing asymptotic
linearity of the estimator, the authors develop an alternative first-order asymptotic theory
that accommodates weaker assumptions than those imposed in the classical first-order
asymptotic theory discussed above. Intuitively, the idea is to characterize the (joint)
asymptotic behavior of both the linear ($L_n$) and quadratic ($W_n$) terms. The following result
collects the main findings.
Result 2. (Cattaneo, Crump, and Jansson (2009)) If Assumptions 1 and 2 hold, and if
$\min(n h_n^{d+2}, 1)\, n h_n^{2\min(P,Q)} \to 0$ and $n^2 h_n^d \to \infty$, then
$$(\mathbb{V}[\hat\theta_n])^{-1/2} (\hat\theta_n - \theta) \to_d \mathcal{N}(0, I_d),$$
where
$$\mathbb{V}[\hat\theta_n] = \frac{1}{n} [\Sigma + o(1)] + \binom{n}{2}^{-1} h_n^{-(d+2)} [\Delta + o(1)],$$
with $\Delta = 2\,\mathbb{E}[\sigma^2(x) f(x)] \int_{\mathbb{R}^d} \dot K(u) \dot K(u)'\,du$. In addition,
$$\frac{1}{n} \hat\Sigma_n = \frac{1}{n} [\Sigma + o_p(1)] + 2 \binom{n}{2}^{-1} h_n^{-(d+2)} [\Delta + o_p(1)].$$
Result 2 shows that the conditions on the bandwidth sequence may be considerably
weakened without invalidating the limiting Gaussian distribution. In particular, whenever
$h_n$ is chosen so that $n h_n^{d+2}$ is bounded, the limiting distribution will cease to be invariant
with respect to the underlying preliminary nonparametric estimator because $\hat\theta_n$ is no longer
asymptotically linear. (Note that $n h_n^{d+2} \to \kappa > 0$ retains the root-n consistency
of $\hat\theta_n$.) In addition, because $h_n$ is allowed to be “smaller” than usual, the bias of the estimator
is controlled in a different way, removing the need for higher-order kernels. Moreover,
Result 2 remains valid even in cases when the estimator is not consistent. Finally, this result
also highlights the well known trade-off between robustness and efficiency in the context of
semiparametric estimation: the estimator $\hat\theta_n$ is semiparametric efficient if and
only if $n h_n^{d+2} \to \infty$, while it is possible to construct more robust inference procedures under
considerably weaker conditions.
It follows from Result 2 that the feasible classical testing procedure based on
$\sqrt{n}\,\hat\Sigma_n^{-1/2} (\hat\theta_n - \theta)$ will be invalid unless $n h_n^{d+2} \to \infty$, which corresponds to the
classical large sample theory case (Result 1). To solve this problem, Cattaneo, Crump, and
Jansson (2009) propose two alternative corrections to the standard error matrix $\hat\Sigma_n$, leading
to two options for “robust” standard errors. To construct the first “robust” standard error
formula, the authors introduce a simple consistent estimator for $\Delta$, under the same conditions
of Result 2, which is given by the analogue estimator
$$\hat\Delta_n = h_n^{d+2} \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \hat W_{n,ij} \hat W_{n,ij}', \qquad
\hat W_{n,ij} = U(z_i, z_j; h_n) - \frac{1}{2} (\hat L_{n,i} + \hat L_{n,j}) - \hat\theta_n.$$
Thus, using this estimator,
$$\hat V_{1,n} = \frac{1}{n} \hat\Sigma_n - \binom{n}{2}^{-1} h_n^{-(d+2)} \hat\Delta_n$$
yields a consistent standard error estimate under small bandwidth asymptotics (i.e., under
the weaker conditions imposed in Result 2, which include in particular those imposed in
Result 1). To describe the second “robust” standard error formula, let $\hat\Sigma_n(H_n)$ be the
estimator $\hat\Sigma_n$ constructed using a bandwidth sequence $H_n$ (e.g., $\hat\Sigma_n = \hat\Sigma_n(h_n)$ by definition).
Then, under the same conditions of Result 2,
$$\hat V_{2,n} = \frac{1}{n} \hat\Sigma_n\left( 2^{1/(d+2)} h_n \right)$$
also yields a consistent standard error estimate under small bandwidth asymptotics.
Consequently, under the conditions imposed in Result 2, it is straightforward to form
a studentized version of $\hat\theta_n$, leading to two simple, robust and pivotal test statistics of the
form $\hat V_{k,n}^{-1/2} (\hat\theta_n - \theta) \to_d \mathcal{N}(0, I_d)$, with $\hat V_{k,n}^{-1} \mathbb{V}[\hat\theta_n] \to_p I_d$, $k = 1, 2$. These test statistics
may also be used to construct (asymptotically equivalent) confidence intervals for $\theta$ under
the (weaker) conditions imposed in Result 2, and constitute alternative procedures to the
classical confidence interval introduced above.
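The two “robust” variance estimators can be sketched as follows. This is an illustration only: it assumes a Gaussian product kernel for $K$, and the function names `pairwise_U`, `sigma_parts` and `robust_variances` are hypothetical. Note that the first correction subtracts a positive semidefinite matrix, so in a given sample it is not guaranteed to return a positive definite matrix.

```python
import numpy as np

def pairwise_U(y, x, h):
    """U(z_i, z_j; h) = -h^{-(d+1)} Kdot((x_i - x_j)/h) (y_i - y_j), Gaussian K (assumed)."""
    n, d = x.shape
    u = (x[:, None, :] - x[None, :, :]) / h
    K = np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi) ** (d / 2.0)
    return -h ** (-(d + 1)) * (-u * K[..., None]) * (y[:, None] - y[None, :])[..., None]

def sigma_parts(y, x, h):
    """Return U, theta_hat, the rows L_hat_{n,i}, and Sigma_hat_n for bandwidth h."""
    n = x.shape[0]
    U = pairwise_U(y, x, h)
    theta = U.sum(axis=(0, 1)) / (n * (n - 1))
    L = 2.0 * (U.sum(axis=1) / (n - 1) - theta)
    return U, theta, L, L.T @ L / n

def robust_variances(y, x, h):
    """Return theta_hat, Sigma_hat, Delta_hat and the two robust variance estimates."""
    n, d = x.shape
    U, theta, L, Sigma = sigma_parts(y, x, h)
    # W_hat_{n,ij} = U_ij - (L_hat_i + L_hat_j)/2 - theta_hat, kept for pairs i < j only.
    W = U - 0.5 * (L[:, None, :] + L[None, :, :]) - theta
    Wp = W[np.triu_indices(n, k=1)]
    n_pairs = n * (n - 1) / 2.0
    Delta = h ** (d + 2) * (Wp.T @ Wp) / n_pairs
    V1 = Sigma / n - Delta / (n_pairs * h ** (d + 2))           # first correction
    V2 = sigma_parts(y, x, 2.0 ** (1.0 / (d + 2)) * h)[3] / n   # rescaled bandwidth
    return theta, Sigma, Delta, V1, V2

# Illustrative data (assumed design).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = x @ np.array([1.0, -0.5]) + rng.normal(size=200)
theta, Sigma, Delta, V1, V2 = robust_variances(y, x, h=0.4)
print(np.diag(Sigma) / 200, np.diag(V1), np.diag(V2))
```

The second estimator only re-evaluates $\hat\Sigma_n$ at the larger bandwidth $2^{1/(d+2)} h_n$, which halves the factor $h_n^{-(d+2)}$ and thereby removes the double counting of the $\Delta$ term shown in Result 2.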
These results, however, have the obvious drawback of being dependent on the choice of $h_n$,
which is unrestricted beyond the rate restrictions imposed in Result 2. A preliminary Monte
Carlo experiment reported in Cattaneo, Crump, and Jansson (2009) shows that the new,
robust standard error formulas have the potential to deliver good finite sample behavior if
the initial bandwidth is chosen to be small enough. Unfortunately, the plug-in rules available
in the literature for $h_n$ fail to deliver a choice of bandwidth that would enjoy the robustness
property introduced by the new asymptotic theory described in Result 2. This is not too
surprising, since these bandwidth selectors are typically constructed to balance (higher-order)
bias and variance in a way that is “appropriate” for the classical large sample theory.
3. MSE Expansion and “Optimal” Bandwidth Selectors
This paper considers the mean squared error expansion of $\hat\theta_n$ as the starting point for the
construction of the plug-in “optimal” bandwidth selector. To derive this expansion it is necessary
to strengthen the assumptions concerning the data generating process. The following
assumption describes these additional mild sufficient conditions.
Assumption 3. (a) $\mathbb{E}[\|\partial g(x)/\partial x\|^2 f(x)] < \infty$.
(b) $g$ is $(Q+1)$ times differentiable, and $e$ and its first $(Q+1)$ derivatives are bounded.
(c) $v$ is three times differentiable, and $vf$ and its first three derivatives are bounded.
(d) $\lim_{\|x\| \to \infty} [ \sigma^2(x) f(x) + \|\partial \sigma^2(x)/\partial x\| f(x) ] = 0$.
Assumption 3(a) is used to ensure that the higher-order mean squared expansion is valid
up to the order needed in this paper. Assumptions 3(b) and 3(c) are in agreement with
those imposed in Powell and Stoker (1996) and Nishiyama and Robinson (2000, 2005), while
Assumption 3(d) is slightly stronger than the analogue restriction imposed in those papers.
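Theorem 1. If Assumptions 1, 2 and 3 hold, then for $s = \min(P, Q)$ and $\dot f(x) = \partial f(x)/\partial x$,
$$\mathbb{E}\left[ (\hat\theta_n - \theta)(\hat\theta_n - \theta)' \right]
= \frac{1}{n} \Sigma + \binom{n}{2}^{-1} h_n^{-(d+2)} \Delta + \binom{n}{2}^{-1} h_n^{-d} \mathcal{V} + h_n^{2s} \mathcal{B}\mathcal{B}'
+ O\left( n^{-1} h_n^{s} \right) + o\left( n^{-2} h_n^{-d} + h_n^{2s} \right),$$
where
$$\mathcal{B} = \frac{-2(-1)^s}{s!} \sum_{\substack{0 \leq l_1, \dots, l_d \leq s \\ l_1 + \cdots + l_d = s}}
\left[ \int_{\mathbb{R}^d} u_1^{l_1} \cdots u_d^{l_d} K(u)\,du \right]
\mathbb{E}\left[ \left( \frac{\partial^{(l_1 + \cdots + l_d)}}{\partial x_1^{l_1} \cdots \partial x_d^{l_d}} \dot f(x) \right) g(x) \right]$$
and
$$\mathcal{V} = \int_{\mathbb{R}^d} \dot K(u) \dot K(u)'\, u'\, \mathbb{E}\left[ \sigma^2(x) \frac{\partial^2}{\partial x \partial x'} f(x)
+ \frac{\partial}{\partial x} g(x) \frac{\partial}{\partial x} g(x)' f(x) \right] u\,du.$$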
The result in Theorem 1 is similar to the one obtained by Härdle and Tsybakov (1993)
and Powell and Stoker (1996), the key difference being that the additional term of order
$O(n^{-2} h_n^{-d})$ is explicitly retained here. (Recall that Result 2 requires $n^2 h_n^d \to \infty$.)
To motivate the new “optimal” bandwidth selector, recall that the “robust” variance
matrix in Result 2 is given by the first two terms of the mean squared error expansion
presented in Theorem 1, which suggests considering the next two terms of the expansion to
construct an “optimal” bandwidth selector. (Note that, as is common in the literature,
this approach implicitly assumes that both $\mathcal{B}$ and $\mathcal{V}$ are non-zero.) Intuitively, balancing
these terms corresponds to the case of $n h_n^{d+2} \to \kappa < \infty$, and therefore pushes the selected
bandwidth to the “small bandwidth region”. This approach may be considered “optimal”
in a mean squared error sense because it makes the leading terms ignored in the general large
sample approximation presented in Result 2 as small as possible.
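To describe the new bandwidth selector, let $\lambda \in \mathbb{R}^d$ and consider (for simplicity) a
bandwidth that minimizes the next two terms of $\mathbb{E}[(\lambda'(\hat\theta_n - \theta))^2]$. This “optimal” bandwidth
selector is given by
$$h_{CCJ} = \begin{cases}
\left( \dfrac{d\,(\lambda' \mathcal{V} \lambda)}{s\,(\lambda' \mathcal{B})^2\, n^2} \right)^{\frac{1}{2s+d}} & \text{if } \lambda' \mathcal{V} \lambda > 0, \\[2ex]
\left( \dfrac{2\,|\lambda' \mathcal{V} \lambda|}{(\lambda' \mathcal{B})^2\, n^2} \right)^{\frac{1}{2s+d}} & \text{if } \lambda' \mathcal{V} \lambda < 0.
\end{cases}$$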
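The two cases of this selector can be traced back to the expansion in Theorem 1 with a short calculation. The sketch below is an interpretation rather than a derivation taken from the paper: it writes $v = \lambda'\mathcal{V}\lambda$ and $b = \lambda'\mathcal{B}$ (shorthand introduced here), approximates $\binom{n}{2}^{-1}$ by $2n^{-2}$, and reads the second case as the bandwidth at which the two terms cancel.

```latex
\begin{align*}
m(h) &= 2 n^{-2} h^{-d}\, v + h^{2s} b^{2}
  && \text{(next two terms of the expansion)} \\
v > 0:\quad
m'(h) &= -2 d\, n^{-2} h^{-d-1} v + 2 s\, h^{2s-1} b^{2} = 0
  \;\Longrightarrow\; h^{2s+d} = \frac{d\, v}{s\, b^{2} n^{2}} \\
v < 0:\quad
2 n^{-2} h^{-d} |v| &= h^{2s} b^{2}
  \;\Longrightarrow\; h^{2s+d} = \frac{2\,|v|}{b^{2} n^{2}}
\end{align*}
```

In both cases the resulting bandwidth is proportional to $n^{-2/(2s+d)}$.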
This new theoretical bandwidth selector is consistent with the small bandwidth asymptotics
described in Result 2 because $n^2 (h_{CCJ})^d \to \infty$. In addition, observe that
$n^{-1} h_n^{s} = o(n^{-2} h_n^{-d})$ whenever $n h_n^{s+d} \to 0$, which is satisfied when $h_n = h_{CCJ}$.
This new bandwidth selector may be compared to the two competing plug-in bandwidth
selectors available in the literature, proposed by Powell and Stoker (1996) and Nishiyama
and Robinson (2005), and given by
$$h_{PS} = \left( \frac{(d+2)\,(\lambda' \Delta \lambda)}{s\,(\lambda' \mathcal{B})^2\, n^2} \right)^{\frac{1}{2s+d+2}}
\qquad \text{and} \qquad
h_{NR} = \left( \frac{2\,(\lambda' \Delta \lambda)}{(\lambda' \mathcal{B})^2\, n^2} \right)^{\frac{1}{2s+d+2}},$$
respectively. Inspection of these bandwidth selectors shows that $h_{CCJ} \prec h_{PS} \asymp h_{NR}$, leading
to a bandwidth selection of smaller order.¹
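The three infeasible selectors are simple closed-form expressions, as the following Python sketch shows. The function names `h_ccj`, `h_ps` and `h_nr`, the argument names, and the numerical inputs are hypothetical and chosen only for illustration; the constants $\mathcal{B}$, $\mathcal{V}$ and $\Delta$ are treated as known here.

```python
import numpy as np

def h_ccj(lam, B, V, n, s, d):
    """Selector of smaller order, h ~ n^{-2/(2s+d)}; the branch depends on the sign of lam' V lam."""
    v = float(lam @ V @ lam)
    b2 = float(lam @ B) ** 2
    if b2 == 0.0 or v == 0.0:
        raise ValueError("lam'B and lam'V lam must both be non-zero")
    if v > 0:
        return (d * v / (s * b2 * n ** 2)) ** (1.0 / (2 * s + d))
    return (2.0 * abs(v) / (b2 * n ** 2)) ** (1.0 / (2 * s + d))

def h_ps(lam, B, Delta, n, s, d):
    """Powell and Stoker (1996) selector, h ~ n^{-2/(2s+d+2)}."""
    q, b2 = float(lam @ Delta @ lam), float(lam @ B) ** 2
    return ((d + 2) * q / (s * b2 * n ** 2)) ** (1.0 / (2 * s + d + 2))

def h_nr(lam, B, Delta, n, s, d):
    """Nishiyama and Robinson (2005) selector, h ~ n^{-2/(2s+d+2)}."""
    q, b2 = float(lam @ Delta @ lam), float(lam @ B) ** 2
    return (2.0 * q / (b2 * n ** 2)) ** (1.0 / (2 * s + d + 2))

# Made-up constants, used only to compare orders of magnitude.
lam = np.array([1.0, 0.0])
B = np.array([1.0, 0.0])
V = np.eye(2)
Delta = np.eye(2)
for n in (100, 1000, 10000):
    print(n, h_ccj(lam, B, V, n, 2, 2), h_ps(lam, B, Delta, n, 2, 2), h_nr(lam, B, Delta, n, 2, 2))
```

With these inputs the first selector shrinks at rate $n^{-1/3}$ and the other two at rate $n^{-1/4}$, which is the “smaller order” comparison made in the text.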
4. Data-Driven Bandwidth Selectors
The previous section described a new (infeasible) plug-in bandwidth selector that is compatible
with the small bandwidth asymptotic theory introduced in Result 2. In order to
implement this selector in practice, as well as its competitors $h_{PS}$ and $h_{NR}$, it is necessary to
construct consistent estimates for each of the leading constants. These estimates would lead
to a data-driven (i.e., automatic) bandwidth selector, denoted $\hat h_{CCJ}$. This section introduces
easy to implement, consistent nonparametric estimators for $\mathcal{B}$, $\Delta$ and $\mathcal{V}$.²
To describe the data-driven plug-in bandwidth selectors, let $b_n$ be a preliminary positive
bandwidth sequence, which may be different for each estimator. A simple analogue estimator
of $\Delta$ was introduced in Section 2. In particular, let $\hat\Delta_n(b_n)$ be the estimator $\hat\Delta_n$ constructed
using a bandwidth sequence $b_n$ (e.g., $\hat\Delta_n = \hat\Delta_n(h_n)$ by definition). Note that this estimator
is an $n$-varying U-statistic as well. Theorem 1 and the calculations provided in Cattaneo,
Crump, and Jansson (2009) show that, if Assumptions 1, 2 and 3 hold, then
$$\hat\Delta_n(b_n) = \Delta + b_n^2 \mathcal{V} + O_p\left( b_n^3 + n^{-1/2} + n^{-1} b_n^{-d/2} \right),$$
which gives the consistency of this estimator if $b_n \to 0$ and $n^2 b_n^d \to \infty$.

¹ Nishiyama and Robinson (2000) derive a third alternative bandwidth selector which is not explicitly
discussed here because this procedure is targeted to one-sided hypothesis testing. Nonetheless, inspection
of this alternative bandwidth selection procedure, denoted $h_{NR00}$, shows that $h_{CCJ} \prec h_{NR00}$ whenever
$d + 8 > 2s$. Therefore, $h_{CCJ}$ is of smaller order unless strong smoothness assumptions are imposed in the
model and a corresponding higher-order kernel is employed.
² Alternatively, a straightforward bandwidth selector may be constructed using a “rule-of-thumb” estimator
based on some ad-hoc distributional assumptions.
Next, consider the construction of consistent estimators of $\mathcal{B}$ and $\mathcal{V}$, the two parameters
entering the new bandwidth selector $h_{CCJ}$. To this end, let $k$ be a kernel function, which
may be different for each estimator, and may be different from $K$. The following assumption
collects a set of sufficient conditions to establish consistency of the plug-in estimators
proposed in this paper for $\mathcal{B}$ and $\mathcal{V}$.
Assumption 4. (a) $f$, $v$ and $e$ are $(s+1+S)$ times differentiable, and $f$, $vf$, $e$ and their
first $(s+1+S)$ derivatives are bounded, for some $S \geq 1$.
(b) $k$ is even and $M$ times differentiable, and $k$ and its first $M$ derivatives are bounded,
for some $M \geq 0$.
(c) For some $R \geq 2$, $\int_{\mathbb{R}^d} |k(u)| (1 + \|u\|^R)\,du < \infty$, and
$$\int_{\mathbb{R}^d} u_1^{l_1} \cdots u_d^{l_d} k(u)\,du =
\begin{cases} 1, & \text{if } l_1 + \cdots + l_d = 0, \\ 0, & \text{if } 0 < l_1 + \cdots + l_d < R. \end{cases}$$
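For the bias $\mathcal{B}$, a plug-in estimator is given by
$$\hat{\mathcal{B}}_n = \frac{-2(-1)^s}{s!} \sum_{\substack{0 \leq l_1, \dots, l_d \leq s \\ l_1 + \cdots + l_d = s}}
\left[ \int_{\mathbb{R}^d} u_1^{l_1} \cdots u_d^{l_d} K(u)\,du \right] \hat\vartheta_{l_1, \dots, l_d, n},$$
where
$$\hat\vartheta_{l_1, \dots, l_d, n} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n}
b_n^{-(d+1)} \left( \frac{\partial^{(l_1 + \cdots + l_d)}}{\partial x_1^{l_1} \cdots \partial x_d^{l_d}} \dot k\left( \frac{x_i - x_j}{b_n} \right) \right) y_i.$$
The estimator $\hat\vartheta_{l_1, \dots, l_d, n}$ is the sample analogue estimator of
$\mathbb{E}[(\partial^{(l_1 + \cdots + l_d)} \dot f(x) / \partial x_1^{l_1} \cdots \partial x_d^{l_d})\, y]$,
and is also an $n$-varying U-statistic estimator employing a leave-one-out kernel-based density
estimator.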
It is also possible to form an obvious plug-in estimator for the new higher-order term $\mathcal{V}$.
However, this estimator would have the unappealing property of requiring the estimation
of several nonparametric objects ($\sigma^2(x)$, $\partial^2 f(x)/\partial x \partial x'$, $\partial g(x)/\partial x$, $f(x)$). Moreover, this
direct plug-in approach is likely to be less stable when implemented because it would require
handling stochastic denominators. Fortunately, it is possible to construct an alternative,
indirect estimator that is much easier to implement in practice. This estimator is intuitively
justified as follows: the results presented above show that, under appropriate regularity conditions,
$b_n^{-2} (\hat\Delta_n(b_n) - \Delta) = \mathcal{V} + O_p(b_n + n^{-1/2} b_n^{-2} + n^{-1} b_n^{-d/2-2})$, and therefore an estimator
satisfying $\tilde\Delta_n = \Delta + o_p(b_n^2)$ would lead to
$$\hat{\mathcal{V}}_n = b_n^{-2} \left( \hat\Delta_n(b_n) - \tilde\Delta_n \right) = \mathcal{V} + o_p(1),$$
if $b_n \to 0$, $n b_n^4 \to \infty$ and $n^2 b_n^{d+4} \to \infty$ (so that the two stochastic remainder terms above
vanish). Under appropriate conditions, an estimator having these properties is given by
$$\tilde\Delta_n = \hat\delta_n \int_{\mathbb{R}^d} \dot K(u) \dot K(u)'\,du, \qquad
\hat\delta_n = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} b_n^{-d}\, k\left( \frac{x_j - x_i}{b_n} \right) (y_i - y_j)^2.$$
In this case, $\hat\delta_n$ is a sample analogue estimator of $2\,\mathbb{E}[\sigma^2(x) f(x)]$, which is also an $n$-varying
U-statistic estimator employing a leave-one-out kernel-based density estimator.
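The indirect estimator can be sketched in Python as follows. Everything specific in this block is an assumption made for illustration: the preliminary kernel $k$ is taken to be a fourth-order Gaussian-based product kernel (so that $R = 4$), the closed form used for $\int \dot K(u)\dot K(u)'\,du$ is valid only when $K$ is the standard Gaussian product kernel, and the function names `delta_hat`, `delta_tilde` and `V_hat` are hypothetical.

```python
import numpy as np

def delta_hat(y, x, b):
    """Pairwise average of b^{-d} k((x_j - x_i)/b) (y_i - y_j)^2 over i < j."""
    n, d = x.shape
    u = (x[None, :, :] - x[:, None, :]) / b           # u[i, j] = (x_j - x_i) / b
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    # Fourth-order product kernel (assumed): k(u) = prod_m 0.5 * (3 - u_m^2) * phi(u_m).
    k = np.prod(0.5 * (3.0 - u ** 2) * phi, axis=-1)
    sq = (y[:, None] - y[None, :]) ** 2
    iu = np.triu_indices(n, k=1)
    return float(np.mean(b ** (-d) * k[iu] * sq[iu]))

def delta_tilde(y, x, b):
    """Delta_tilde_n = delta_hat_n * int Kdot Kdot' du, assuming a Gaussian kernel K."""
    d = x.shape[1]
    # For the standard Gaussian K: int Kdot(u) Kdot(u)' du = 0.5 * (4*pi)^{-d/2} * I_d.
    KK = 0.5 * (4.0 * np.pi) ** (-d / 2.0) * np.eye(d)
    return delta_hat(y, x, b) * KK

def V_hat(Delta_hat_b, Delta_tilde_b, b):
    """Indirect estimator b^{-2} (Delta_hat_n(b) - Delta_tilde_n)."""
    return (Delta_hat_b - Delta_tilde_b) / b ** 2

# Illustrative data (assumed design): y independent of x with unit variance.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
y = rng.normal(size=500)
print(delta_hat(y, x, b=0.5), 2.0 / (4.0 * np.pi))   # estimate and its population target
```

The first argument of `V_hat` is $\hat\Delta_n(b_n)$, that is, the estimator of $\Delta$ from Section 2 evaluated at the preliminary bandwidth; it is not recomputed here to keep the sketch short.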
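Theorem 2. If Assumptions 1, 3 and 4 hold, then:
(i) For $M \geq s + 1$,
$$\hat\vartheta_{l_1, \dots, l_d, n} = \mathbb{E}\left[ \left( \frac{\partial^{(l_1 + \cdots + l_d)}}{\partial x_1^{l_1} \cdots \partial x_d^{l_d}} \dot f(x) \right) y \right]
+ O_p\left( b_n^{\min(R,S)} + n^{-1/2} + n^{-1} b_n^{-(d+2+2s)/2} \right).$$
(ii) For $R \geq 3$,
$$\hat\delta_n = 2\,\mathbb{E}\left[ \sigma^2(x) f(x) \right] + O_p\left( b_n^{\min(R,\, s+1+S)} + n^{-1/2} + n^{-1} b_n^{-d/2} \right).$$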
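This theorem gives simple sufficient conditions to construct a robust data-driven bandwidth
selector consistent with the small bandwidth asymptotics derived in Cattaneo, Crump,
and Jansson (2009). In particular, define
$$\hat h_{CCJ} = \begin{cases}
\left( \dfrac{d\,(\lambda' \hat{\mathcal{V}}_n \lambda)}{s\,(\lambda' \hat{\mathcal{B}}_n)^2\, n^2} \right)^{\frac{1}{2s+d}} & \text{if } \lambda' \hat{\mathcal{V}}_n \lambda > 0, \\[2ex]
\left( \dfrac{2\,|\lambda' \hat{\mathcal{V}}_n \lambda|}{(\lambda' \hat{\mathcal{B}}_n)^2\, n^2} \right)^{\frac{1}{2s+d}} & \text{if } \lambda' \hat{\mathcal{V}}_n \lambda < 0.
\end{cases}$$
The following corollary establishes the consistency of the new bandwidth selector $\hat h_{CCJ}$.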
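Corollary 1. If Assumptions 1, 2, 3 and 4 hold with $M \geq s + 1$ and $R \geq 3$, and if $b_n \to 0$
and $n^2 b_n^{\max(8,\, d+2+2s)} \to \infty$, then for $\lambda \in \mathbb{R}^d$ such that $\lambda' \mathcal{B} \neq 0$ and $\lambda' \mathcal{V} \lambda \neq 0$,
$$\frac{\hat h_{CCJ}}{h_{CCJ}} \to_p 1.$$
(The analogous result also holds for $\hat h_{PS}$ and $\hat h_{NR}$.)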
The results presented so far are silent about the selection of the initial bandwidth choice
$b_n$ in applications, beyond the rate restrictions imposed by Corollary 1. A simple choice
for the preliminary bandwidth $b_n$ may be based on some data-driven bandwidth selector
developed for a nonparametric object present in the corresponding target estimands $\mathcal{B}$, $\Delta$ and
$\mathcal{V}$. Typical examples of such procedures include simple rules of thumb, plug-in bandwidth
selectors and (smoothed) cross-validation.
As shown in the simulations presented in the next section, it appears that a simple
data-driven bandwidth selector from the literature of nonparametric estimation works well
for the choice of bn. Nonetheless, it may be desirable to improve upon this preliminary
bandwidth selector in order to obtain better …nite sample behavior. Although beyond the
scope of this paper, a conceptually feasible (but computationally demanding) idea would be
to compute second-order mean squared error expansions for ^
#l1; ;ld;n, ^ n and ^n. Since these
three estimators are n-varying U-statistics, the results from Powell and Stoker (1996) may be
applied to obtain a corresponding set of “optimal” bandwidth choices. These procedures will,
in turn, also depend on a preliminary bandwidth when implemented empirically, which again
would need to be chosen in some way. This idea mimics, in the context of semiparametric
estimation, the well-known second-generation direct plug-in bandwidth selector (of level 2)
from the literature of nonparametric density estimation. (See, e.g., Wand and Jones (1995)
for a detailed discussion.) Although the validity of such bandwidth selectors would require
stronger assumptions, by analogy from the nonparametric density estimation literature, they
would be expected to improve the finite sample properties of the bandwidth selector for $h_n$
and, in turn, the performance of the semiparametric inference procedure.
5. Monte Carlo Experiment
This section summarizes the main findings from an extensive Monte Carlo experiment
conducted to analyze the finite sample properties of the new robust data-driven procedures and
their relative merits when compared to the other procedures available. The online
supplemental material includes a larger set of results from this simulation study, which shows that
the findings reported here are consistent across all designs considered.
Following the results reported in Cattaneo, Crump, and Jansson (2009), the Monte Carlo
experiment considers six different models of the “single index” form $y_i = \tau(y_i^*)$, where
$y_i^* = x_i'\beta + \varepsilon_i$, $\tau(\cdot)$ is a nondecreasing (link) function and $\varepsilon_i \sim N(0,1)$ is independent
of the vector of regressors $x_i \in R^d$. Three different link functions are considered:
$\tau(y^*) = y^*$, $\tau(y^*) = 1(y^* > 0)$ and $\tau(y^*) = y^* 1(y^* > 0)$, which correspond to a linear
regression, probit, and Tobit model, respectively. ($1(\cdot)$ represents the indicator function.)
The vector of regressors is generated using independent random variables and standardized to
have $E[x_i] = 0$ and $E[x_i x_i'] = I_d$, with the first component $x_{1i}$ having either a Gaussian
distribution or a chi-squared distribution with 4 degrees of freedom (denoted $\chi_4$), while the
remaining components have a Gaussian distribution throughout the experiment. All the
components of $\beta$ are set equal to unity, and for simplicity only results for the first
component $\theta_1$ are considered.
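The data-generating process just described can be sketched in a few lines of Python. This is an illustrative reading of the design, not the authors' simulation code: the function name `simulate_design`, its arguments, and the use of NumPy's random generator are assumptions, and the chi-squared regressor is standardized using its mean 4 and variance 8.

```python
import numpy as np


def simulate_design(n, d, link="linear", chi2_first=False, rng=None):
    """Draw (x, y) from a single-index design with beta = (1, ..., 1)'.

    link is one of "linear", "probit", "tobit" (hypothetical labels).
    If chi2_first is True, the first regressor is a standardized chi-squared(4).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal((n, d))
    if chi2_first:
        # chi-squared(4) has mean 4 and variance 8
        x[:, 0] = (rng.chisquare(4, size=n) - 4.0) / np.sqrt(8.0)
    y_star = x.sum(axis=1) + rng.standard_normal(n)  # latent index x'beta + eps
    if link == "linear":
        y = y_star
    elif link == "probit":
        y = (y_star > 0).astype(float)
    elif link == "tobit":
        y = np.maximum(y_star, 0.0)  # equals y* times 1(y* > 0)
    else:
        raise ValueError("link must be 'linear', 'probit' or 'tobit'")
    return x, y
```

For instance, `simulate_design(400, 2, link="probit", chi2_first=True)` would draw one sample from the probit design with a chi-squared first regressor at the sample size reported in the main text.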
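Table I: Monte Carlo Models

| | $y_i = y_i^*$ | $y_i = 1(y_i^* > 0)$ | $y_i = y_i^* 1(y_i^* > 0)$ |
|---|---|---|---|
| $x_{1i} \sim N(0,1)$ | Model 1: $\theta_1 = \frac{1}{4\pi}$ | Model 3: $\theta_1 = \frac{1}{8\pi^{3/2}}$ | Model 5: $\theta_1 = \frac{1}{8\pi}$ |
| $x_{1i} \sim \frac{\chi_4 - 4}{\sqrt{8}}$ | Model 2: $\theta_1 = \frac{1}{4\sqrt{2\pi}}$ | Model 4: $\theta_1 = 0.02795$ | Model 6: $\theta_1 = 0.03906$ |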
Table I summarizes the Monte Carlo models, reports the value of the population parameter
of interest, and provides the corresponding label of each model considered. (Whenever
unavailable in closed form, the population parameters are computed by a numerical
approximation.) The simulation study considers three sample sizes (n = 100, n = 400, n = 700),
two dimensions of the regressors vector (d = 2, d = 4), and two kernel orders (P = 2, P = 4).
The kernel function $K(\cdot)$ is chosen to be a Gaussian product kernel, and the preliminary
kernel function $k(\cdot)$ is chosen to be a fourth-order Gaussian product kernel as required by
Corollary 1. For each combination of parameters 10,000 replications are carried out. To
conserve space this section only includes the results for d = 2 and n = 400.
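As an illustration of how a population parameter can be approximated numerically, the sketch below treats the two linear-link designs with d = 2. It assumes the estimand is the density-weighted average derivative $\theta = E[f(x)\,\partial g(x)/\partial x]$, so that with a linear link and unit coefficients $\theta_1 = E[f(x)] = \int f(x)^2 dx$, which factors into a product over the independent regressors. The helper names and grid choices are hypothetical.

```python
import numpy as np


def integral_of_squared_density(density, lower, upper, num=400_001):
    """Riemann-sum approximation of the integral of density(v)^2 over [lower, upper]."""
    grid = np.linspace(lower, upper, num)
    step = grid[1] - grid[0]
    return float(np.sum(density(grid) ** 2) * step)


def normal_pdf(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)


def chi2_4_pdf(v):
    # chi-squared density with 4 degrees of freedom: v * exp(-v / 2) / 4 for v >= 0
    return v * np.exp(-0.5 * v) / 4.0


int_normal = integral_of_squared_density(normal_pdf, -12.0, 12.0)
# Rescaling v -> (v - 4) / sqrt(8) multiplies the integral of the squared density by sqrt(8)
int_chi2_std = np.sqrt(8.0) * integral_of_squared_density(chi2_4_pdf, 0.0, 80.0)

theta1_gaussian = int_normal * int_normal   # both regressors Gaussian
theta1_chi2 = int_chi2_std * int_normal     # first regressor standardized chi-squared(4)
```

Under these assumptions the two approximations agree with the closed forms $1/(4\pi)$ and $1/(4\sqrt{2\pi})$, respectively. The probit and Tobit designs would need the corresponding derivative of the regression function inside the integral.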
The simulation experiment considers the three (infeasible) population bandwidth choices
derived in Section 3 ($h^*_{PS}$, $h^*_{NR}$, $h^*_{CCJ}$), and their corresponding data-driven estimates
($\hat{h}_{PS}$, $\hat{h}_{NR}$, $\hat{h}_{CCJ}$). The three estimated bandwidths are obtained using the results
described in Section 4 with a common initial bandwidth plug-in estimate used to construct
$\hat{B}_n$, $\hat{\Delta}_n$ and $\hat{V}_n$. To provide a parsimonious data-driven procedure, an estimate of the
initial bandwidth $b_n$ is constructed as a sample average of a second-generation direct plug-in
level-two estimate for the (marginal) density of each dimension of the regressors vector (see,
e.g., Wand and Jones (1995)). Confidence intervals for $\theta_1$ are constructed using the classical
test statistic $\sqrt{n}\,\hat{\Sigma}_n^{-1/2}(\hat{\theta}_n - \theta)$, denoted PSS, and the two alternative robust test
statistics $\hat{V}_{k,n}^{-1/2}(\hat{\theta}_n - \theta)$, k = 1, 2, denoted by CCJ1 and CCJ2, respectively. The
classical inference procedure PSS is only theoretically valid when P = 4, while the robust
procedures CCJ1 and CCJ2 are always valid across all simulation designs.
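The point estimator entering all three test statistics can be sketched as follows. The sketch assumes the pairwise form $\hat{\theta}_n = \binom{n}{2}^{-1} \sum_{i<j} U_n(z_i, z_j)$ with $U_n(z_i, z_j) = -h_n^{-(d+1)} \dot{K}((x_i - x_j)/h_n)(y_i - y_j)$, which is the Powell, Stock, and Stoker (1989) representation and is consistent with the conditional expectation used in the Appendix; the estimator's definition itself lies outside this section. It uses a second-order Gaussian product kernel, and the function name `dwad_estimate` is hypothetical. The variance estimators behind PSS, CCJ1 and CCJ2 are not reproduced.

```python
import numpy as np


def dwad_estimate(x, y, h):
    """Pairwise (U-statistic) density-weighted average derivative estimate.

    x : (n, d) regressors, y : (n,) outcomes, h : bandwidth.
    Uses a Gaussian product kernel K, for which K_dot(u) = -u K(u).
    """
    n, d = x.shape
    u = (x[:, None, :] - x[None, :, :]) / h  # u_ij = (x_i - x_j) / h, shape (n, n, d)
    kern = np.exp(-0.5 * np.sum(u**2, axis=2)) / (2.0 * np.pi) ** (d / 2.0)
    dy = y[:, None] - y[None, :]             # y_i - y_j
    # U_ij = -h^{-(d+1)} K_dot(u_ij) (y_i - y_j) = h^{-(d+1)} u_ij K(u_ij) (y_i - y_j)
    U = u * (kern * dy)[:, :, None] / h ** (d + 1)
    # U_ij = U_ji and U_ii = 0, so averaging over ordered pairs equals averaging over i < j
    return U.sum(axis=(0, 1)) / (n * (n - 1))
```

The implementation stores all $n^2$ pairs at once, which is convenient for the sample sizes considered here (n up to 700) but would need blocking for much larger samples.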
Figures 1 and 2 plot the empirical coverage for the three competing 95% confidence
intervals as a function of the choice of bandwidth for each of the six models. To facilitate
the analysis, two additional horizontal lines at 0.90 and at the nominal coverage rate 0.95
are included for reference, and the three population bandwidth selectors ($h^*_{PS}$, $h^*_{NR}$, $h^*_{CCJ}$)
are plotted as vertical lines. (Note that $h^*_{PS} = h^*_{NR}$ for the case d = 2 and P = 2.) These
figures highlight the potential robustness properties that the test statistics CCJ1 and CCJ2
may have when using the new data-driven plug-in bandwidth selector. In particular, the
theoretical bandwidth selector $h^*_{CCJ}$ lies within the robust region for which both CCJ1 and
CCJ2 have correct empirical coverage for a range of bandwidths. For example, this suggests
that (at least) some of the variability introduced by the estimation of this bandwidth selector
will not affect the performance of the robust test statistics CCJ1 and CCJ2, a property
unlikely to hold for the classical procedure PSS. Table 1 reports the empirical coverage of
each confidence interval (PSS, CCJ1, CCJ2) when using each population bandwidth
selector ($h^*_{PS}$, $h^*_{NR}$, $h^*_{CCJ}$).
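For reference, the empirical coverage reported in these figures and tables is the share of replications in which the studentized statistic falls inside the normal critical values. A minimal sketch, with hypothetical argument names and with the standard errors taken as given:

```python
import numpy as np


def empirical_coverage(estimates, std_errors, truth, critical_value=1.959964):
    """Share of replications whose interval estimate +/- critical_value * se covers truth."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    t_stats = (estimates - truth) / std_errors
    return float(np.mean(np.abs(t_stats) <= critical_value))
```

With 10,000 replications, the simulation standard error of a coverage rate near 0.95 is about $\sqrt{0.95 \times 0.05 / 10000} \approx 0.002$, which gives a sense of which differences across cells are meaningful.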
Figures 3 and 4 plot corresponding kernel density estimates for the test statistic PSS
coupled with either $h^*_{PS}$ or $h^*_{NR}$, and for the test statistics CCJ1 and CCJ2 coupled with
$h^*_{CCJ}$. To facilitate the comparison, the density of the standard normal is also depicted.
These figures show that the Gaussian approximation of the robust test statistics using the
new bandwidth selector is considerably better than the corresponding approximation for PSS
when constructed using either of the classical bandwidth selectors. In particular, the
empirical distribution of the classical procedure appears to be more biased and more concentrated
than the empirical distributions of either CCJ1 or CCJ2. These findings highlight the
well-known trade-off between efficiency and robustness previously discussed. These results are
verified in Table 2, where the average empirical bias and average empirical interval length are
reported for each competing confidence interval when coupled with each possible population
bandwidth selector.
To analyze the performance of the new data-driven bandwidth selector, and the resulting
robust data-driven confidence intervals, Table 3 presents the empirical coverage of each
possible confidence interval (PSS, CCJ1, CCJ2) when using each possible estimated bandwidth
selector ($\hat{h}_{PS}$, $\hat{h}_{NR}$, $\hat{h}_{CCJ}$). These results provide concrete evidence of the superior
performance (in terms of achieving correct coverage) of the robust test statistics when coupled
with the new estimated bandwidth. Both robust confidence intervals (CCJ1, CCJ2) using
$\hat{h}_{CCJ}$ provide close-to-correct empirical coverage across all designs, a property not enjoyed
by the classical confidence interval (PSS) using either $\hat{h}_{PS}$ or $\hat{h}_{NR}$.
The good performance of CCJ1 and CCJ2 is maintained not only when using a second-order
kernel (P = 2), but also when the dimension of x is larger (d = 4), which provides
simulation evidence of the relatively low sensitivity of the new robust data-driven procedures
to the so-called “curse of dimensionality.” This finding may be (heuristically) justified by the
fact that under the small bandwidth asymptotics, the limiting distribution is not invariant
to the “parameter” d, which in turn may lead to the additional robustness properties found.
In addition, as suggested by the superior distributional approximation reported in Figures 3
and 4, the main findings continue to hold if other nominal confidence levels are considered.
6. Extensions and Final Remarks
This paper introduced a novel data-driven plug-in bandwidth selector compatible with the
small bandwidth asymptotics developed in Cattaneo, Crump, and Jansson (2009) for
density-weighted average derivatives. This new bandwidth selector is of the plug-in variety, and is
obtained based on a mean squared error expansion of the estimator of interest. An extensive
Monte Carlo experiment showed a remarkable improvement in performance of the resulting
new robust data-driven inference procedure. In particular, the new confidence intervals
provide approximately correct coverage in cases where there are no valid alternative inference
procedures (i.e., using a second-order kernel with at least two regressors), and also compare
favorably to the alternative, classical confidence intervals when they are theoretically
justified.
Since these results are derived by exploiting the n-varying U-statistic representation of $\hat{\theta}_n$,
it is plausible that similar results could be obtained for other estimators having an analogous
representation. For example, the class of estimands considered in Newey, Hsieh, and Robins
(2004, Section 2) has this representation, and therefore it seems possible that the results
presented here could be generalized to cover that class. More generally, as suggested in
Cattaneo, Crump, and Jansson (2009), an n-varying U-statistic may be represented as a
minimizer of the U-process:
\[
\hat{\theta}_n = \arg\min_{\theta} \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} Q(z_i, z_j; \theta, h_n),
\qquad
Q(z_i, z_j; \theta, h) = \left\| U(z_i, z_j; h) - \theta \right\|^2,
\]
which also suggests that the results presented here may be extended to cover this class of
estimators (see, e.g., Aradillas-Lopez, Honore, and Powell (2007, pp. 1120–1122)).
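The link between the minimization problem and the U-statistic can be checked in one step. Writing $U_{ij} = U(z_i, z_j; h_n)$ as shorthand (notation introduced here only for this illustration), the first-order condition of the quadratic criterion gives:

```latex
\[
\frac{\partial}{\partial \theta} \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left\| U_{ij} - \theta \right\|^2
= -2 \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( U_{ij} - \theta \right) = 0
\quad \Longrightarrow \quad
\hat{\theta}_n = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} U_{ij}.
\]
```

Since the criterion is strictly convex in $\theta$, this stationary point is the unique minimizer, so the minimizer coincides with the pairwise average of the kernel $U$.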
7. Appendix
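Proof of Theorem 1. To save notation, for any function $a : R^d \to R$ let $\dot{a}(x) = \partial a(x)/\partial x$ and
$\ddot{a}(x) = \partial^2 a(x)/\partial x \partial x'$. A Hoeffding decomposition of $\hat{\theta}_n$ gives
\[
E\left[ (\hat{\theta}_n - \theta)(\hat{\theta}_n - \theta)' \right]
= V[\hat{\theta}_n] + \left( E[\hat{\theta}_n] - \theta \right)\left( E[\hat{\theta}_n] - \theta \right)'
= V[L_n] + V[W_n] + h_n^{2s} B B' + o\left( h_n^{2s} \right),
\]
where the bias expansion follows immediately by a Taylor series expansion.
For $V[L_n]$, using integration by parts,
\[
E\left[ U_n(z_i, z_j) \mid z_i \right] = \int_{R^d} \dot{e}(x_i + u h_n) K(u)\, du - y_i \int_{R^d} \dot{f}(x_i + u h_n) K(u)\, du,
\]
and therefore $V[L_n] = 4 n^{-1} V\left[ E[U_n(z_i, z_j) \mid z_i] - \theta_n \right] = n^{-1} \Sigma + O\left( n^{-1} h_n^{s} \right)$.
For $V[W_n]$, by standard calculations,
\[
V[W_n] = \binom{n}{2}^{-1} E\left[ U_n(z_i, z_j) U_n(z_i, z_j)' \right] + O\left( n^{-2} \right)
= \binom{n}{2}^{-1} h_n^{-(d+2)} \int_{R^d} \int_{R^d} \dot{K}(u) \dot{K}(u)' \, T(x, u h_n)\, dx\, du + O\left( n^{-2} \right),
\]
with $T(x, u) = \left( v(x) + v(x + u) - 2 g(x) g(x + u) \right) f(x) f(x + u)$. Then, using a Taylor series
expansion, $T(x, u h_n) = T_1(x) + T_2(x)' u h_n + u' T_3(x) u h_n^2 + o(h_n^2)$, where
\[
T_1(x) = 2 \sigma^2(x) f(x)^2, \qquad
T_2(x) = 2 \sigma^2(x) f(x) \dot{f}(x) + f(x)^2 \dot{\sigma}^2(x),
\]
\[
T_3(x) = \sigma^2(x) f(x) \ddot{f}(x) + f(x) \dot{\sigma}^2(x) \dot{f}(x)' + \left( \ddot{v}(x)/2 - g(x) \ddot{g}(x) \right) f(x)^2.
\]
Note that
\[
\int_{R^d} \int_{R^d} \dot{K}(u) \dot{K}(u)' \, T_1(x)\, dx\, du
= \int_{R^d} \int_{R^d} \dot{K}(u) \dot{K}(u)' \, 2 \sigma^2(x) f(x)^2\, dx\, du = \Delta
\]
and, using integration by parts,
\[
h_n \int_{R^d} \int_{R^d} \dot{K}(u) \dot{K}(u)' \, T_2(x)' u \, dx\, du
= h_n \int_{R^d} \dot{K}(u) \dot{K}(u)' \left[ \int_{R^d} \left( 2 \sigma^2(x) f(x) \dot{f}(x) + f(x)^2 \dot{\sigma}^2(x) \right) dx \right]' u \, du = 0.
\]
Finally, using integration by parts and the fact that
$\ddot{\sigma}^2(x) = \ddot{v}(x) - 2 \dot{g}(x) \dot{g}(x)' - 2 g(x) \ddot{g}(x)$,
\[
h_n^2 \int_{R^d} \int_{R^d} \dot{K}(u) \dot{K}(u)' \, u' T_3(x) u \, dx\, du
= h_n^2 \int_{R^d} \dot{K}(u) \dot{K}(u)' \, u' \left[ \int_{R^d} \sigma^2(x) \ddot{f}(x) f(x)\, dx + \int_{R^d} \dot{g}(x) \dot{g}(x)' f(x)^2\, dx \right] u \, du.
\]
Therefore,
\[
V[W_n] = \binom{n}{2}^{-1} h_n^{-(d+2)} \Delta + \binom{n}{2}^{-1} h_n^{-d} V + o\left( n^{-2} h_n^{-d} \right).
\]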
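The expansion above makes the bandwidth trade-off explicit. As a worked check, and only as an illustration, suppose that for a direction $\lambda$ one balances the squared-bias term against the variance term of order $n^{-2} h^{-d}$, using $\binom{n}{2}^{-1} \approx 2/n^2$. The objective $m(h)$ below is an assumption made for this sketch; the objective actually used in Section 3 is not reproduced here.

```latex
\[
m(h) = h^{2s} (\lambda' B)^2 + \frac{2}{n^2} h^{-d}\, \lambda' V \lambda ,
\qquad \lambda' V \lambda > 0,
\]
\[
m'(h) = 2s\, h^{2s-1} (\lambda' B)^2 - \frac{2d}{n^2} h^{-d-1}\, \lambda' V \lambda = 0
\quad \Longrightarrow \quad
h^{2s+d} = \frac{d\, \lambda' V \lambda}{s\, (\lambda' B)^2\, n^2}.
\]
```

This agrees with the first branch of the plug-in formula for $\hat{h}_{CCJ}$ in Section 4 once $B$ and $V$ are replaced by their estimates. When $\lambda' V \lambda < 0$ the objective has no interior minimizer; equating the magnitudes of the two terms, $h^{2s} (\lambda' B)^2 = 2 n^{-2} h^{-d} |\lambda' V \lambda|$, gives $h^{2s+d} = 2 |\lambda' V \lambda| / ((\lambda' B)^2 n^2)$, which matches the second branch.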
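Proof of Theorem 2. For part (i), note that $\hat{\vartheta}_{l_1,\ldots,l_d,n}$ may be written as an n-varying
U-statistic (assuming without loss of generality that s is even), given by
\[
\hat{\vartheta}_{l_1,\ldots,l_d,n} = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} u_1(z_i, z_j; b_n),
\]
with (recall that $s = l_1 + \cdots + l_d$)
\[
u_1(z_i, z_j; b) = b^{-(d+1+s)} \left( \left. \frac{\partial^s}{\partial x_1^{l_1} \cdots \partial x_d^{l_d}} \dot{k}(x) \right|_{x = (x_i - x_j)/b} \right) (y_i - y_j).
\]
First, change of variables and integration by parts give
\[
E\left[ u_1(z_i, z_j; b_n) \mid z_i \right] = \int_{R^d} k(u) \left( \left. \frac{\partial^s}{\partial x_1^{l_1} \cdots \partial x_d^{l_d}} \dot{f}(x) \right|_{x = x_i - u b_n} y_i - \left. \frac{\partial^s}{\partial x_1^{l_1} \cdots \partial x_d^{l_d}} \dot{e}(x) \right|_{x = x_i - u b_n} \right) du.
\]
Second, a Taylor series expansion gives $E[u_1(z_i, z_j; b_n)] = \vartheta_{l_1,\ldots,l_d} + O(b_n^{\min(R,S)})$. Next, letting
$\hat{\vartheta}_n = \hat{\vartheta}_{l_1,\ldots,l_d,n}$ to save notation, a Hoeffding decomposition gives
$V[\hat{\vartheta}_n] = V[\hat{\vartheta}_{1,n}] + V[\hat{\vartheta}_{2,n}]$, where
\[
\hat{\vartheta}_{1,n} = \frac{1}{n} \sum_{i=1}^{n} 2 \left[ E[u_1(z_i, z_j; b_n) \mid z_i] - E[u_1(z_i, z_j; b_n)] \right],
\]
and
\[
\hat{\vartheta}_{2,n} = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left[ u_1(z_i, z_j; b_n) - E[u_1(z_i, z_j; b_n) \mid z_i] - E[u_1(z_i, z_j; b_n) \mid z_j] + E[u_1(z_i, z_j; b_n)] \right].
\]
Finally, using standard calculations, $V[\hat{\vartheta}_{1,n}] = O(n^{-1})$ and $V[\hat{\vartheta}_{2,n}] = O(n^{-2} b_n^{-(d+2+2s)})$, and
the conclusion follows by Markov’s Inequality.
For part (ii), note that $\hat{\delta}_n$ is also an n-varying U-statistic, given by
\[
\hat{\delta}_n = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} u_2(z_i, z_j; b_n),
\qquad
u_2(z_i, z_j; b) = b^{-d} k\left( \frac{x_j - x_i}{b} \right) (y_i - y_j)^2.
\]
First, change of variables gives
\[
E\left[ u_2(z_i, z_j; b_n) \mid z_i \right] = \int_{R^d} k(u) \left[ y_i^2 f(x_i - u b_n) + v(x_i - u b_n) f(x_i - u b_n) - 2 y_i e(x_i - u b_n) \right] du.
\]
Second, a Taylor series expansion gives $E[\hat{\delta}_n] = 2E[\sigma^2(x) f(x)] + O(b_n^{\min(R,s+1+S)})$. Next, a
Hoeffding decomposition gives $V[\hat{\delta}_n] = V[\hat{\delta}_{1,n}] + V[\hat{\delta}_{2,n}]$, where
\[
\hat{\delta}_{1,n} = \frac{1}{n} \sum_{i=1}^{n} 2 \left[ E[u_2(z_i, z_j; b_n) \mid z_i] - E[u_2(z_i, z_j; b_n)] \right],
\]
and
\[
\hat{\delta}_{2,n} = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left[ u_2(z_i, z_j; b_n) - E[u_2(z_i, z_j; b_n) \mid z_i] - E[u_2(z_i, z_j; b_n) \mid z_j] + E[u_2(z_i, z_j; b_n)] \right].
\]
Finally, using standard calculations, $V[\hat{\delta}_{1,n}] = O(n^{-1})$ and $V[\hat{\delta}_{2,n}] = O(n^{-2} b_n^{-d})$, and the
conclusion follows by Markov’s Inequality.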
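The pairwise form of the estimator in part (ii) translates directly into code. The sketch below is illustrative only: it uses a second-order Gaussian product kernel for $k$, whereas the theorem requires a higher-order kernel ($R \geq 3$, a fourth-order Gaussian product kernel in the simulations), and the function name `delta_hat` is hypothetical.

```python
import numpy as np


def delta_hat(x, y, b):
    """Pairwise estimate based on u_2 = b^{-d} k((x_j - x_i) / b) (y_i - y_j)^2.

    x : (n, d) regressors, y : (n,) outcomes, b : preliminary bandwidth.
    k is taken to be a second-order Gaussian product kernel (an assumption).
    """
    n, d = x.shape
    u = (x[None, :, :] - x[:, None, :]) / b  # (x_j - x_i) / b, shape (n, n, d)
    kern = np.exp(-0.5 * np.sum(u**2, axis=2)) / (2.0 * np.pi) ** (d / 2.0)
    dy2 = (y[:, None] - y[None, :]) ** 2     # (y_i - y_j)^2, zero on the diagonal
    u2 = kern * dy2 / b**d
    # u2 is symmetric with a zero diagonal, so this equals the average over pairs i < j
    return float(u2.sum() / (n * (n - 1)))
```

As a sanity check under these assumptions, with $x \sim N(0, I_2)$ and $y$ equal to independent standard normal noise, $\sigma^2(x) = 1$ and the target $2E[\sigma^2(x) f(x)]$ equals $1/(2\pi)$; the estimate approaches this value as n grows and b shrinks.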
8. Supplemental Material
Further Simulation Results: This document contains a comprehensive set of results from the
Monte Carlo experiment summarized in Section 5. These results include all combinations of
sample sizes (n = 100, n = 400, n = 700), dimension of regressors vector (d = 2, d = 4), and
kernel orders (P = 2, P = 4).
References
Aradillas-Lopez, A., B. E. Honore, and J. L. Powell (2007): “Pairwise Difference Estimation
with Nonparametric Control Variables,” International Economic Review, 48, 1119–1158.
Cattaneo, M. D., R. K. Crump, and M. Jansson (2009): “Small Bandwidth Asymptotics for
Density-Weighted Average Derivatives,” working paper.
Cattaneo, M. D., R. K. Crump, and M. Jansson (2010): “On the Validity of the Bootstrap for
Density-Weighted Average Derivatives,” working paper.
Coppejans, M., and H. Sieg (2005): “Kernel Estimation of Average Derivatives and Differences,”
Journal of Business and Economic Statistics, 23, 211–225.
Deaton, A., and S. Ng (1998): “Parametric and Nonparametric Approaches to Price and Tax
Reform,” Journal of the American Statistical Association, 93, 900–909.
Härdle, W., J. Hart, J. Marron, and A. Tsybakov (1992): “Bandwidth Choice for Average
Derivative Estimation,” Journal of the American Statistical Association, 87, 218–226.
Härdle, W., W. Hildenbrand, and M. Jerison (1991): “Empirical Evidence on the Law of
Demand,” Econometrica, 59, 1525–1549.
Härdle, W., and T. Stoker (1989): “Investigating Smooth Multiple Regression by the Method
of Average Derivatives,” Journal of the American Statistical Association, 84, 986–995.
Härdle, W., and A. Tsybakov (1993): “How Sensitive are Average Derivatives?,” Journal of
Econometrics, 58, 31–48.
Horowitz, J., and W. Härdle (1996): “Direct Semiparametric Estimation of Single-Index
Models with Discrete Covariates,” Journal of the American Statistical Association, 91, 1632–1640.
Ichimura, H., and P. E. Todd (2007): “Implementing Nonparametric and Semiparametric
Estimators,” in Handbook of Econometrics, Volume VIB, ed. by J. Heckman, and E. Leamer,
pp. 5370–5468. Elsevier Science B.V.
Matzkin, R. L. (2007): “Nonparametric Identification,” in Handbook of Econometrics, Volume
VIB, ed. by J. Heckman, and E. Leamer, pp. 5307–5368. Elsevier Science B.V.
Newey, W. K. (1994): “The Asymptotic Variance of Semiparametric Estimators,” Econometrica,
62, 1349–1382.
Newey, W. K., F. Hsieh, and J. M. Robins (2004): “Twicing Kernels and a Small Bias
Property of Semiparametric Estimators,” Econometrica, 72, 947–962.
Newey, W. K., and T. M. Stoker (1993): “Efficiency of Weighted Average Derivative Estimators
and Index Models,” Econometrica, 61, 1199–1223.
Nishiyama, Y., and P. M. Robinson (2000): “Edgeworth Expansions for Semiparametric
Averaged Derivatives,” Econometrica, 68, 931–979.
Nishiyama, Y., and P. M. Robinson (2005): “The Bootstrap and the Edgeworth Correction for
Semiparametric Averaged Derivatives,” Econometrica, 73, 197–240.
Powell, J. L. (1994): “Estimation of Semiparametric Models,” in Handbook of Econometrics,
Volume IV, ed. by R. Engle, and D. McFadden, pp. 2443–2521. Elsevier Science B.V.
Powell, J. L., J. H. Stock, and T. M. Stoker (1989): “Semiparametric Estimation of Index
Coefficients,” Econometrica, 57, 1403–1430.
Powell, J. L., and T. M. Stoker (1996): “Optimal Bandwidth Choice for Density-Weighted
Averages,” Journal of Econometrics, 75, 291–316.
Robinson, P. M. (1995): “The Normal Approximation for Semiparametric Averaged Derivatives,”
Econometrica, 63, 667–680.
Stoker, T. M. (1986): “Consistent Estimation of Scaled Coefficients,” Econometrica, 54,
1461–1481.
Wand, M., and M. Jones (1995): Kernel Smoothing. Chapman & Hall/CRC, Florida.
[Figure 1: Empirical Coverage Rates for 95% Confidence Intervals: d = 2, P = 2, n = 400. Six panels (Models 1 to 6); horizontal axis: Bandwidth; vertical axis: Empirical Coverage; series: PSS, CCJ1, CCJ2; vertical lines: h_PS, h_NR, h_CCJ.]
[Figure 2: Empirical Coverage Rates for 95% Confidence Intervals: d = 2, P = 4, n = 400. Six panels (Models 1 to 6); horizontal axis: Bandwidth; vertical axis: Empirical Coverage; series: PSS, CCJ1, CCJ2; vertical lines: h_PS, h_NR, h_CCJ.]
[Figure 3: Empirical Gaussian Approximation with Population Bandwidth: d = 2, P = 2, n = 400. Six panels (Models 1 to 6); series: PSS−h_PS, PSS−h_NR, CCJ1−h_CCJ, CCJ2−h_CCJ, N(0,1).]
[Figure 4: Empirical Gaussian Approximation with Population Bandwidth: d = 2, P = 4, n = 400. Six panels (Models 1 to 6); series: PSS−h_PS, PSS−h_NR, CCJ1−h_CCJ, CCJ2−h_CCJ, N(0,1).]
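Table 1: Empirical Coverage Rates of 95% Confidence Intervals with Population Bandwidth: d = 2, n = 400.

| P | Bandwidth | Model 1 BW | Model 1 PSS | Model 1 CCJ1 | Model 1 CCJ2 | Model 3 BW | Model 3 PSS | Model 3 CCJ1 | Model 3 CCJ2 | Model 5 BW | Model 5 PSS | Model 5 CCJ1 | Model 5 CCJ2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | $h^*_{PS}$ | 0.244 | 0.931 | 0.878 | 0.876 | 0.260 | 0.939 | 0.887 | 0.881 | 0.258 | 0.929 | 0.885 | 0.880 |
| 2 | $h^*_{NR}$ | 0.244 | 0.931 | 0.878 | 0.876 | 0.260 | 0.939 | 0.887 | 0.881 | 0.258 | 0.929 | 0.885 | 0.880 |
| 2 | $h^*_{CCJ}$ | 0.121 | 0.994 | 0.948 | 0.952 | 0.110 | 0.995 | 0.947 | 0.954 | 0.125 | 0.993 | 0.947 | 0.951 |
| 4 | $h^*_{PS}$ | 0.470 | 0.949 | 0.926 | 0.920 | 0.483 | 0.951 | 0.930 | 0.921 | 0.488 | 0.941 | 0.925 | 0.918 |
| 4 | $h^*_{NR}$ | 0.498 | 0.940 | 0.920 | 0.912 | 0.512 | 0.943 | 0.925 | 0.912 | 0.517 | 0.935 | 0.918 | 0.910 |
| 4 | $h^*_{CCJ}$ | 0.335 | 0.978 | 0.942 | 0.943 | 0.333 | 0.981 | 0.945 | 0.945 | 0.342 | 0.975 | 0.940 | 0.941 |

| P | Bandwidth | Model 2 BW | Model 2 PSS | Model 2 CCJ1 | Model 2 CCJ2 | Model 4 BW | Model 4 PSS | Model 4 CCJ1 | Model 4 CCJ2 | Model 6 BW | Model 6 PSS | Model 6 CCJ1 | Model 6 CCJ2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | $h^*_{PS}$ | 0.161 | 0.970 | 0.921 | 0.916 | 0.172 | 0.978 | 0.935 | 0.931 | 0.197 | 0.968 | 0.920 | 0.919 |
| 2 | $h^*_{NR}$ | 0.161 | 0.970 | 0.921 | 0.916 | 0.172 | 0.978 | 0.935 | 0.931 | 0.197 | 0.968 | 0.920 | 0.919 |
| 2 | $h^*_{CCJ}$ | 0.081 | 0.994 | 0.944 | 0.946 | 0.093 | 0.993 | 0.947 | 0.949 | 0.074 | 0.995 | 0.946 | 0.950 |
| 4 | $h^*_{PS}$ | 0.325 | 0.951 | 0.917 | 0.907 | 0.338 | 0.964 | 0.938 | 0.927 | 0.366 | 0.962 | 0.936 | 0.931 |
| 4 | $h^*_{NR}$ | 0.344 | 0.940 | 0.909 | 0.897 | 0.358 | 0.958 | 0.933 | 0.922 | 0.388 | 0.956 | 0.931 | 0.926 |
| 4 | $h^*_{CCJ}$ | 0.254 | 0.977 | 0.940 | 0.939 | 0.273 | 0.982 | 0.945 | 0.943 | 0.220 | 0.990 | 0.945 | 0.949 |

Note: Column BW reports population bandwidths.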
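Table 2: Empirical Average Length of 95% Confidence Intervals with Population Bandwidth: d = 2, n = 400.

| P | Bandwidth | Model 1 BIAS | Model 1 PSS | Model 1 CCJ1 | Model 1 CCJ2 | Model 3 BIAS | Model 3 PSS | Model 3 CCJ1 | Model 3 CCJ2 | Model 5 BIAS | Model 5 PSS | Model 5 CCJ1 | Model 5 CCJ2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | $h^*_{PS}$ | 0.005 | 0.036 | 0.031 | 0.030 | 0.002 | 0.013 | 0.011 | 0.011 | 0.003 | 0.022 | 0.019 | 0.019 |
| 2 | $h^*_{NR}$ | 0.005 | 0.036 | 0.031 | 0.030 | 0.002 | 0.013 | 0.011 | 0.011 | 0.003 | 0.022 | 0.019 | 0.019 |
| 2 | $h^*_{CCJ}$ | 0.002 | 0.110 | 0.080 | 0.080 | 0.000 | 0.053 | 0.038 | 0.038 | 0.001 | 0.064 | 0.047 | 0.047 |
| 4 | $h^*_{PS}$ | 0.183 | 3.096 | 2.842 | 2.755 | 0.050 | 1.184 | 1.091 | 1.043 | 0.090 | 1.981 | 1.849 | 1.782 |
| 4 | $h^*_{NR}$ | 0.221 | 2.971 | 2.762 | 2.657 | 0.065 | 1.133 | 1.057 | 1.001 | 0.112 | 1.909 | 1.802 | 1.723 |
| 4 | $h^*_{CCJ}$ | 0.070 | 4.302 | 3.566 | 3.556 | 0.002 | 1.720 | 1.417 | 1.409 | 0.021 | 2.696 | 2.279 | 2.268 |

| P | Bandwidth | Model 2 BIAS | Model 2 PSS | Model 2 CCJ1 | Model 2 CCJ2 | Model 4 BIAS | Model 4 PSS | Model 4 CCJ1 | Model 4 CCJ2 | Model 6 BIAS | Model 6 PSS | Model 6 CCJ1 | Model 6 CCJ2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | $h^*_{PS}$ | 0.006 | 0.080 | 0.065 | 0.064 | 0.002 | 0.029 | 0.023 | 0.023 | 0.002 | 0.030 | 0.025 | 0.025 |
| 2 | $h^*_{NR}$ | 0.006 | 0.080 | 0.065 | 0.064 | 0.002 | 0.029 | 0.023 | 0.023 | 0.002 | 0.030 | 0.025 | 0.025 |
| 2 | $h^*_{CCJ}$ | 0.002 | 0.270 | 0.193 | 0.194 | 0.000 | 0.083 | 0.060 | 0.060 | 0.000 | 0.168 | 0.119 | 0.120 |
| 4 | $h^*_{PS}$ | 0.483 | 5.983 | 5.292 | 5.093 | 0.114 | 2.229 | 1.973 | 1.905 | 0.125 | 2.558 | 2.266 | 2.227 |
| 4 | $h^*_{NR}$ | 0.551 | 5.651 | 5.077 | 4.853 | 0.132 | 2.108 | 1.896 | 1.818 | 0.143 | 2.432 | 2.190 | 2.142 |
| 4 | $h^*_{CCJ}$ | 0.270 | 7.995 | 6.555 | 6.454 | 0.061 | 2.843 | 2.353 | 2.317 | 0.042 | 5.024 | 3.796 | 3.821 |

Note: Column BIAS reports absolute difference between average of $\hat{\theta}_n$ (across simulations) and $\theta_0$. All figures times 100.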
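Table 3: Empirical Coverage Rates of 95% Confidence Intervals with Estimated Bandwidth: d = 2, n = 400.

| P | Bandwidth | Model 1 BW | Model 1 PSS | Model 1 CCJ1 | Model 1 CCJ2 | Model 3 BW | Model 3 PSS | Model 3 CCJ1 | Model 3 CCJ2 | Model 5 BW | Model 5 PSS | Model 5 CCJ1 | Model 5 CCJ2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | $\hat{h}_{PS}$ | 0.248 | 0.870 | 0.817 | 0.809 | 0.255 | 0.883 | 0.819 | 0.809 | 0.252 | 0.887 | 0.833 | 0.823 |
| 2 | $\hat{h}_{NR}$ | 0.248 | 0.870 | 0.817 | 0.809 | 0.255 | 0.883 | 0.819 | 0.809 | 0.252 | 0.887 | 0.833 | 0.823 |
| 2 | $\hat{h}_{CCJ}$ | 0.113 | 0.980 | 0.937 | 0.940 | 0.132 | 0.976 | 0.932 | 0.932 | 0.120 | 0.981 | 0.938 | 0.941 |
| 4 | $\hat{h}_{PS}$ | 0.290 | 0.978 | 0.921 | 0.924 | 0.290 | 0.980 | 0.922 | 0.923 | 0.290 | 0.979 | 0.923 | 0.926 |
| 4 | $\hat{h}_{NR}$ | 0.308 | 0.975 | 0.921 | 0.922 | 0.307 | 0.977 | 0.921 | 0.921 | 0.308 | 0.975 | 0.921 | 0.922 |
| 4 | $\hat{h}_{CCJ}$ | 0.187 | 0.993 | 0.949 | 0.953 | 0.198 | 0.994 | 0.948 | 0.954 | 0.192 | 0.995 | 0.949 | 0.954 |

| P | Bandwidth | Model 2 BW | Model 2 PSS | Model 2 CCJ1 | Model 2 CCJ2 | Model 4 BW | Model 4 PSS | Model 4 CCJ1 | Model 4 CCJ2 | Model 6 BW | Model 6 PSS | Model 6 CCJ1 | Model 6 CCJ2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | $\hat{h}_{PS}$ | 0.201 | 0.858 | 0.796 | 0.780 | 0.208 | 0.903 | 0.851 | 0.838 | 0.212 | 0.920 | 0.860 | 0.854 |
| 2 | $\hat{h}_{NR}$ | 0.201 | 0.858 | 0.796 | 0.780 | 0.208 | 0.903 | 0.851 | 0.838 | 0.212 | 0.920 | 0.860 | 0.854 |
| 2 | $\hat{h}_{CCJ}$ | 0.104 | 0.972 | 0.916 | 0.919 | 0.119 | 0.973 | 0.929 | 0.930 | 0.105 | 0.986 | 0.943 | 0.946 |
| 4 | $\hat{h}_{PS}$ | 0.239 | 0.975 | 0.912 | 0.911 | 0.241 | 0.981 | 0.925 | 0.925 | 0.241 | 0.986 | 0.922 | 0.925 |
| 4 | $\hat{h}_{NR}$ | 0.254 | 0.967 | 0.908 | 0.906 | 0.255 | 0.976 | 0.925 | 0.921 | 0.256 | 0.981 | 0.919 | 0.921 |
| 4 | $\hat{h}_{CCJ}$ | 0.166 | 0.991 | 0.942 | 0.945 | 0.175 | 0.993 | 0.943 | 0.948 | 0.164 | 0.995 | 0.951 | 0.958 |

Note: Column BW reports sample mean of estimated bandwidths.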