Joint Statistical Meetings - Business & Economic Statistics Section
INVESTIGATING MODEL-BASED TIME SERIES METHODS TO IMPROVE ESTIMATES FROM MONTHLY VALUE
OF CONSTRUCTION PUT-IN-PLACE SURVEYS

Thuy Trang T. Nguyen, William R. Bell, and James M. Gomish, U.S. Census Bureau
Thuy Trang T. Nguyen, U.S. Census Bureau, 4700 Silver Hill Road, Washington, DC 20233-6900

KEY WORDS: Signal extraction, sampling error autocorrelation, sampling error model

This paper reports the results of research and analysis undertaken by Census Bureau staff. It has undergone a Census Bureau review more limited in scope than that given to official Census Bureau publications. This report is released to inform interested parties of ongoing research and to encourage discussion of work in progress.

1. INTRODUCTION

The Value of Construction Put in Place (VIP) is a U.S. Census Bureau publication measuring the value of construction installed or erected at construction sites during a given month. The VIP estimates come from the monthly Construction Progress Reporting Survey (CPRS) augmented with estimates of a non-CPRS component based on regulatory filings, phasing of other Census data, administrative records, and trade association data. In July 2002 the Census Bureau began publishing the monthly VIP for new “types of construction” (TC) categories that reclassified and expanded the previous TC categories. (The latter can be found in U.S. Census Bureau (2002a).) The new TC categories contain many more series and levels of detail than do the old TC categories. This expansion to more levels of detail resulted in relatively small sample sizes and large sampling errors for the direct survey estimates for many categories.

In this paper we investigate the use of time series modeling and signal extraction methods to borrow information over time for improving the VIP estimates. Scott and Smith (1974) and Scott, Smith, and Jones (1977) proposed use of time series techniques to improve estimates in repeated surveys. More recent work in this area includes papers by Bell and Hillmer (1990), Binder and Dick (1989, 1990), and Pfeffermann (1991). The approach requires the development of time series models for the sampling errors in the direct estimates as well as for the true underlying series being estimated. Here we develop such models for 70 VIP time series from a subset of the TC categories that refer to privately owned nonresidential construction. All these series start in January 1993 and end in December 2000, and are estimated entirely from the CPRS. These direct estimates have last-year (year 2000) average coefficient of variation (CV) ranging from 3% to 27%. Table 1 gives a complete list of the TCs and their last-month, last-year average, and last-four-year average CVs. We also perform signal extraction with the fitted models to examine the potential for variance reduction in the estimates by borrowing information over time through the models. Sections 2 through 4 develop these models for the VIP series, and Section 5 presents the signal extraction results.

We view the results presented here as preliminary due to some significant data limitations: a) the direct VIP estimates used in this study were adjusted with undercoverage and late selection factors, whereas the estimates of sampling variances and autocovariances that we had available did not reflect these adjustments, b) the length of the time series we analyzed is eight years, and we had estimates of sampling variances and autocovariances only for the last four years. The first of these limitations means that we are simply forced to assume that the sampling variances and autocovariances we had available are at least approximately valid for the time series of estimates we are analyzing. Further analysis might assess this assumption or partially address this limitation. The second limitation is perhaps more directly relevant to the results presented here. The limited amount of data available means that we had difficulty determining the appropriateness of the models we developed, and that even if we assume we chose models of appropriate form, we remain quite uncertain about the model parameters. This is important because the signal extraction results as computed are optimistic in that they assume the correct model is being used, including use of the true values of the model parameters.

2. MODELING OF THE SAMPLING VARIANCES AND AUTOCORRELATIONS

Given concerns about the high level of sampling error in the VIP point estimates there is reason for concern also about the level of sampling error in their corresponding estimates of sampling variances, autocovariances, and autocorrelations. To reduce the level of statistical uncertainty in these estimates we take the raw (direct) sampling variance and autocorrelation estimates and model them. Our philosophy is that, if the direct survey point estimates need to be improved via modeling, then so too do the direct survey variance and autocorrelation estimates.

Unfortunately, sampling variance and autocovariance estimates are not available prior to January 1997, i.e., we have no sampling variance and autocovariance estimates for the first four years of our observed time series (1993-1996). Because of the sample size and frame changes in January 1997 (Cartwright 1996; Mesenbourg 1997), this means we also lack information to relate pre-1997 sampling variances to post-1997 sampling variances, and similarly for autocovariances. Thus, here and in Section 3 we develop sampling error models based only on the post-1997 data. Use of these models for the full length of the observed time series in Section 4 necessitates some “heroic” assumptions as noted there.
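
The concern raised above, that signal extraction results are only as reliable as the sampling variances supplied to them, can be illustrated with a deliberately simplified, single-period sketch. It is not the computation performed later in the paper: it assumes a zero-mean true value with known variance, observed with independent sampling error, and all function names and numbers below are hypothetical. The sketch compares the error variance that would be reported under an assumed sampling variance with the error variance actually incurred when the true sampling variance differs.

```python
def weight_on_direct_estimate(signal_var, noise_var):
    """Weight w in the linear minimum-MSE estimate w * y of a zero-mean signal."""
    return signal_var / (signal_var + noise_var)


def reported_error_variance(signal_var, assumed_noise_var):
    """Error variance reported when the assumed sampling variance is taken as correct."""
    w = weight_on_direct_estimate(signal_var, assumed_noise_var)
    return (1.0 - w) * signal_var


def actual_error_variance(signal_var, assumed_noise_var, true_noise_var):
    """Actual mean squared error when the weight uses a misstated sampling variance."""
    w = weight_on_direct_estimate(signal_var, assumed_noise_var)
    # The error is (1 - w) * signal - w * noise, with signal and noise uncorrelated.
    return (1.0 - w) ** 2 * signal_var + w ** 2 * true_noise_var


# Hypothetical values on an arbitrary scale (not taken from the VIP data).
signal_var = 1.0         # variance of the true value
assumed_noise_var = 0.5  # sampling variance supplied to the method
true_noise_var = 1.0     # sampling variance that actually applies

reported = reported_error_variance(signal_var, assumed_noise_var)
actual = actual_error_variance(signal_var, assumed_noise_var, true_noise_var)
print(f"reported error variance: {reported:.3f}")
print(f"actual error variance:   {actual:.3f}")
```

With these made-up numbers the reported error variance is about 0.33 while the actual one is about 0.56, so an understated sampling variance makes the apparent gain from signal extraction look larger than it is. When the assumed and true sampling variances agree, the two quantities coincide.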

2.1 Modeling of the Sampling Variances

The sampling variance $\mathrm{Var}_p(y_t)$ of the direct VIP survey estimate $y_t$ is estimated using the stratified jackknife method. This is done using the VPLX program (Fay 1998). Sampling variances are expected to depend on sample size and possibly also on the VIP levels. Various alternative models for this dependence were thus fitted to the direct sampling variance estimates and compared empirically.

The first stage in modeling the variances was defining $n_t$, the sample size at time $t$, for each of the 70 TCs. The definition of $n_t$ is not so obvious for the VIP estimates. We examined different definitions of sample size $n_t$. The alternative definitions differ in regard to the extent to which they count certainty cases (that is, cases having a sampling rate of 1-in-1) and how they treat the cases that were originally sampled as belonging to another TC (not the TC under consideration) but later discovered to belong to this TC. The latter cases were not part of the planned sample for the given TC, though they currently are part of the sample. One can argue that certainty cases do not contribute to sampling variability and thus should not be counted towards sample size. (This argument would be more compelling if the estimate were broken into pieces from the certainty and noncertainty portions of the sample, and the variance estimates were used for modeling the separate noncertainty portion.) It is possible that the choice among the alternative definitions of $n_t$ will make little difference to the variance modeling for a given TC. To check this we computed 5×5 correlation matrices between the alternatively defined $n_t$'s for each of the 70 TCs. The correlation coefficients exceeded .90 for almost all pairs of alternative $n_t$'s for all TCs. This high correlation suggests that choice of a particular $n_t$ is unlikely to appreciably affect the fit of the variance model. Our tentative choice was the definition that removed the certainty cases from the sample size count. For those series where the correlation coefficients for the chosen $n_t$ definition and the other $n_t$ definitions were less than .90 we compared fits of variance models (discussed below) using these alternative definitions. The models of the chosen $n_t$ definition had smaller AICs than models for the other $n_t$ definitions, and so this definition remained the preferred choice.

A natural way to account for possible dependence of sampling variances on the level of the VIP estimates is to model relative sampling variances rather than directly model the sampling variances. The relative sampling variance of $y_t$ is defined as $\mathrm{RelVar}_p(y_t) = \mathrm{Var}_p(y_t) / Y_t^2$, where $Y_t$ is the underlying population quantity estimated by $y_t$ (true VIP for the TC). From a Taylor series linearization, $\mathrm{RelVar}_p(y_t)$ is approximately $\mathrm{Var}_p[\log(y_t)]$, a property that will be relevant to the time series modeling of Section 4. Since $Y_t$ is unknown, we use $\widehat{\mathrm{RelVar}}_p(y_t) = \widehat{\mathrm{Var}}_p(y_t) / y_t^2$ to estimate the relative sampling variances.

To investigate alternative possibilities for the dependence of sampling variances on level and sample size, we fitted the following generalized variance function (GVF) models by linear regression (Wolter (1985, ch. 5) discusses GVFs):

1) $\widehat{\mathrm{RelVar}}_p(y_t) = b_1 / n_t + \mathrm{error}$
2) $\widehat{\mathrm{RelVar}}_p(y_t) = b_0 + b_1 / n_t + \mathrm{error}$
3) $\widehat{\mathrm{RelVar}}_p(y_t) = b_0 + \mathrm{error}$
4) $\widehat{\mathrm{Var}}_p(y_t) = b_1 / n_t + \mathrm{error}$
5) $\widehat{\mathrm{Var}}_p(y_t) = b_0 + b_1 / n_t + \mathrm{error}$
6) $\widehat{\mathrm{Var}}_p(y_t) = b_0 + \mathrm{error}$
7) $\widehat{\mathrm{Var}}_p(y_t) = b_1 n_t + \mathrm{error}$
8) $\widehat{\mathrm{Var}}_p(y_t) = b_0 + b_1 n_t + \mathrm{error}$

Models 1 to 3 allow for dependence of sampling variances on level through modeling of the relative variances, whereas models 4 to 8 imply no explicit dependence of sampling variances on level. Models 1 and 4 allow for sampling variability to be inversely proportional to sample size, and models 2 and 5 generalize this dependence with an intercept term. Models 3 and 6, however, allow no dependence of sampling variability on sample size. Models 7 and 8, which imply that sampling variances increase with increasing sample size (assuming $b_1 > 0$), require some explanation. Such dependence is possible because sample size increases with the level of construction activity (more active projects in sample), as does the level of the VIP series, and as would the variance of the estimates of VIP.

We examined scatter plots of $\widehat{\mathrm{RelVar}}_p(y_t)$ or $\widehat{\mathrm{Var}}_p(y_t)$ versus $n_t$, with the fitted GVF curves superimposed. While some, not all, of the plots were quite noisy, those that were not suggested that models 4 to 6 are unreasonable, i.e., sampling variances are positively related to the level of the series. The plots also suggested dependence of sampling variances on sample size, eliminating model 3 from consideration. We thus kept four models for further analysis (models 1, 2, 7, and 8) and discarded the others. We computed AICs from these four models using results of the regression fits and assuming the error terms were normal and homoscedastic. ($\mathrm{AIC} = m \log(\mathrm{SSE} / m) + 2p$, where $m$ is the number of data points in the fit, $p$ is the number of parameters in the GVF, and SSE is the regression error sum of squares.) Model 2 had the smallest AIC for 62 out of 70 series.

Because the normality assumption for the error terms in models 1 to 8 is questionable given that the data are estimated variances and relative variances, we also tried fitting models for the logs of the variances and relative variances. These analogs to models 1, 2, 7, and 8 are:

1-log) $\log[\widehat{\mathrm{RelVar}}_p(y_t)] = \log(b_1 / n_t) + \mathrm{error}$
2-log) $\log[\widehat{\mathrm{RelVar}}_p(y_t)] = \log(b_0 + b_1 / n_t) + \mathrm{error}$
7-log) $\log[\widehat{\mathrm{Var}}_p(y_t)] = \log(b_1 n_t) + \mathrm{error}$
8-log) $\log[\widehat{\mathrm{Var}}_p(y_t)] = \log(b_0 + b_1 n_t) + \mathrm{error}$

Note that models 1-log and 7-log reduce to linear models that can be fit by linear regression, while models 2-log and 8-log require fitting by nonlinear regression (done using PROC NLIN in SAS (1990)). We compared AICs for the four models, finding that, of these models, 2-log had the smallest AIC for 64 out of the 70 series. We thus discarded models 7-log and 8-log along with models 7 and 8. We then compared AICs for models 1, 1-log, 2, and 2-log. For the log models we added to the AICs −2 times the log-Jacobian of the log transformation, which is $-2 \sum_{t=1}^{m} \log|J_t|$, where $J_t = \partial \log(\nu_t) / \partial \nu_t = 1 / \nu_t$ and $\nu_t$ is the $t$th observation of the data being modeled: $\nu_t = \widehat{\mathrm{RelVar}}_p(y_t)$ for model 1-log and model 2-log. Model 2-log had the smallest AIC for 63 of the 70 series, and for three of the other TCs model 1-log, a special case of model 2-log, had the lowest AIC. Of the remaining four TCs for which model 2 was preferred by AIC, there was only one TC for which the difference was substantial. To avoid the complexity of using
different variance models for a few different TCs, we adopted model 2-log for all 70 TCs.

2.2 Modeling of the Sampling Autocorrelations

For all 70 TCs in this study, we produced estimates of sampling autocovariances and autocorrelations for each pair of months from January 1997 through December 2000. Like the sampling variances, the sampling autocovariances between time $t$ and $t-k$, $\widehat{\mathrm{Cov}}_p(y_t, y_{t-k})$, were also estimated using the VPLX program with the stratified jackknife method. The estimated sampling autocorrelations are then computed from the estimated sampling autocovariances and variances as

$\widehat{\mathrm{Corr}}_p(y_t, y_{t-k}) = \widehat{\mathrm{Cov}}_p(y_t, y_{t-k}) / [\widehat{\mathrm{Var}}_p(y_t) \, \widehat{\mathrm{Var}}_p(y_{t-k})]^{.5}$.

Assuming stationarity of the autocorrelations, for each TC we averaged all the estimated autocorrelations for a given lag, that is, averaging 47 estimated lag-1 autocorrelations, 46 estimated lag-2 autocorrelations, etc. We then used the averaged autocorrelations to calculate corresponding partial autocorrelations by solving the Yule-Walker equations of successively higher order (Box and Jenkins 1976, pp. 64-65). Graphs of the resulting autocorrelation function (ACF) and partial autocorrelation function (PACF) were produced and examined for all 70 TCs.

The patterns of the ACF and PACF plots were quite similar across all 70 TCs. The ACF is dominated by an exponential decay (apart from some persistent positive, though small, autocorrelations at higher lags that were not characteristic of the 70 TCs in general). The PACF has a large spike at lag 1 and a much smaller spike at lag 2 (more so for some series than others). Candidate models for such patterns include the first-order or second-order autoregressive (AR(1) or AR(2)) model and the mixed ARMA(1,1) model. Again, for simplicity, we wanted to use the same model for all the TCs. The AR(2) seemed to be a suitable choice for this purpose.

3. DEVELOPMENT OF THE SAMPLING ERROR MODEL

In Section 4 we develop models for the time series of the logarithms of the VIP estimates for the 70 TCs, denoting the time series for a given TC by $\log(y_t)$. Here we complete development of the models for the sampling error component $e_t$ of $\log(y_t)$. In Section 2.1 we developed models for $\mathrm{Var}(e_t) = \mathrm{Var}_p[\log(y_t)]$, noting that from a Taylor series linearization $\mathrm{Var}_p[\log(y_t)] \approx \mathrm{RelVar}_p(y_t)$. In Section 2.2 we noted that the sampling error autocorrelations generally appeared to be well-modeled by an AR(2) model. Putting these two parts of the model together, we have the following general form of the sampling error model: $e_t = h_t \tilde{e}_t$, where $h_t$ is the standard deviation of $e_t$, i.e., $h_t = [\mathrm{Var}(e_t)]^{.5} \approx [\mathrm{RelVar}_p(y_t)]^{.5}$, and $\tilde{e}_t$ has variance one and follows the AR(2) model $(1 - \phi_1 B - \phi_2 B^2) \tilde{e}_t = c_t$. $B$ is the backward shift operator and $c_t$ is white noise. So that $\tilde{e}_t$ has variance one we need to set $\mathrm{Var}(c_t)$ so that the variance computed from the AR(2) model above is one. From Box and Jenkins (1976, p. 62) this implies that

$\mathrm{Var}(c_t) = \sigma_c^2 = [(1 + \phi_2) / (1 - \phi_2)] \times [(1 - \phi_2)^2 - \phi_1^2]$.

To estimate the parameters $\phi_1$ and $\phi_2$ we used the averaged sampling autocorrelations developed in Section 2.2 and applied the Yule-Walker equations for the AR(2) model. From Box and Jenkins (1976, p. 60) this gives

$\hat{\phi}_1 = r_1 (1 - r_2) / (1 - r_1^2)$ and $\hat{\phi}_2 = (r_2 - r_1^2) / (1 - r_1^2)$,

where $r_1$ and $r_2$ are the averaged sampling autocorrelations at lags 1 and 2.

The estimates of $h_t$ come from the fitted variance models developed in Section 2.1. The model chosen there (model 2-log) is fitted to the estimates, $\log[\widehat{\mathrm{RelVar}}_p(y_t)]$, which are taken as estimates of $\gamma_t = \log[\mathrm{Var}(e_t)]$. Denote the fitted values by $\hat{\gamma}_t = \log(\hat{b}_0 + \hat{b}_1 / n_t)$. We wish to convert these to estimates of $\mathrm{Var}(e_t)$. Simple exponentiation is one obvious way to do this, i.e., we set $h_t^2 = \exp(\hat{\gamma}_t)$.

A more involved approach to converting fitted values from the sampling variance model 2-log to estimates of $\mathrm{Var}(e_t)$ attempts to correct for bias from the log transformation. We assume that the direct relative variance estimates, $\nu_t = \widehat{\mathrm{RelVar}}_p(y_t)$, are approximately unbiased for $\mathrm{Var}(e_t)$. If we knew the parameters $b_0$ and $b_1$, and the variance of the error term (say $\omega^2$) in model 2-log, then from properties of the lognormal distribution $E(\nu_t) = \exp(\gamma_t + .5\omega^2) \equiv (b_0 + b_1 / n_t) \exp(.5\omega^2)$. Since we only have the fitted model we assume the fitted values $\hat{\gamma}_t = \log(\hat{b}_0 + \hat{b}_1 / n_t)$ are approximately normally distributed with means $\gamma_t$ and variances $\mathrm{Var}(\hat{\gamma}_t)$. This means that $\exp(\hat{\gamma}_t)$ is approximately lognormal with

$E[\exp(\hat{\gamma}_t)] = \exp[\gamma_t + .5 \mathrm{Var}(\hat{\gamma}_t)]$.

Assuming that the residual variance from the fit of model 2-log, $\hat{\omega}^2 = \sum_t [\log(\nu_t) - \hat{\gamma}_t]^2 / (m - 2)$, is a consistent estimate of the true residual variance $\omega^2$, this implies that

$E(\exp\{\hat{\gamma}_t + .5[\hat{\omega}^2 - \mathrm{Var}(\hat{\gamma}_t)]\}) \approx \exp(\gamma_t + .5\omega^2) = E(\nu_t) \approx \mathrm{Var}(e_t)$,

so we set

$h_t^2 = \exp\{\hat{\gamma}_t + .5[\hat{\omega}^2 - \mathrm{Var}(\hat{\gamma}_t)]\}$.

We tried both of these approaches and found they sometimes gave different results. The differences from the two approaches were greater than 10% for 30 out of 70 time series, but the other 40 series had less than 10% difference. The series with large differences tended to be noisier than the series with less than 10% differences. The sampling error standard deviation, $h_t$, that will be used in Section 4 is the result of the second approach.

Another problem is that we had sampling variance estimates only from January 1997 to December 2000, and so fitted the sampling variance models using data from this period. Because of the sample design changes in January 1997 (sampling rate changes and sampling frame change) we really have no information to relate sampling variances prior to 1997 to those from 1997 on. Therefore, from January 1993 to December 1996 we simply set the value of $h_t$ to its value for January 1997. This is not a good solution to this problem, but the only other option is to restrict the time series modeling to start in January 1997, which would give us only four years of data. We intend to pursue the second option later, but for now take the first course of filling in the
earlier $h_t$ values, keeping in mind that this is a significant limitation to our results.

4. DEVELOPMENT OF MODELS FOR THE TIME SERIES OF THE DIRECT ESTIMATES

The direct log VIP estimate is equal to the true log VIP plus sampling error, $\log(y_t) = \log(Y_t) + e_t$. The model for the observed time series $\log(y_t)$ is determined by the models for the two components $\log(Y_t)$ and $e_t$; we call such a model a “component model.” When the model for $\log(Y_t)$ includes regression terms, we call the model for $\log(y_t)$ a RegComponent model. Given the models for the sampling error components $e_t$ developed in Section 3, and given a specified form for a time series model for $\log(Y_t)$, we can fit the resulting RegComponent model to the observed series $\log(y_t)$ to estimate the unknown parameters of the model for $\log(Y_t)$. In doing so the parameters of the model for the sampling error component $e_t$ are held fixed. The REGCMPNT program developed by the Time Series Staff of the Census Bureau performs this type of model fitting.

As part of exploratory analysis to determine suitable forms for the models for the true time series $\log(Y_t)$, we used the X-12-ARIMA program (U.S. Census Bureau 2002b) to fit some RegARIMA models (regression models with error terms following ARIMA models), ignoring the sampling error components. This allowed us to check for trading-day effects and outliers in the series. Any outliers found were carried over for use in the RegComponent model since the REGCMPNT program does not perform outlier detection. For the ARIMA models we started in all cases with the airline model (Box and Jenkins 1976, ch. 9). In cases where the estimate of the seasonal moving average parameter was close to 1 we cancelled the seasonal difference and noninvertible seasonal MA operator and converted the model to an ARIMA(0,1,1) with fixed seasonal effects and a trend constant.

Having made a preliminary determination of the need for trading-day effects and outliers, the resulting RegComponent models were fitted by the REGCMPNT program for each of the 70 VIP series. We used airline models for $\log(Y_t)$ except when the ARIMA model fitting results suggested fixed seasonality. The REGCMPNT fitting results were examined and changes were made to the models when they exhibited any of the following properties:

1) If an outlier (included in the RegComponent model using the appropriate regression variable) had a t-statistic less than 3.8 (the critical value used in the X-12-ARIMA outlier detection), the outlier was dropped from the model.
2) If the model included trading-day effects but the chi-squared statistic testing the significance of the trading-day effects was insignificant at the .05 level, then the trading-day effects were dropped from the model.
3) If the model included fixed seasonal effects and the fixed seasonal p-value was extremely large (p-value > .45; note that these p-values tended to be either very large or less than .05), then the fixed seasonal effects were dropped from the model, leaving a nonseasonal model.
4) If the estimate of the nonseasonal MA parameter was near 1, the nonseasonal difference was cancelled with the nonseasonal MA operator and a trend constant was added to the model.
5) If the estimate of the seasonal MA parameter was now near 1, the seasonal difference was cancelled and a fixed seasonal and a trend constant were included.

We continued to modify models as needed until the results seemed reasonable or simply the best that we could do. We did not feel the need to stray from the airline model since in the cases where the RegComponent model seemed not to fit well the situation generally was not improved much by changing the airline model to some other model. Basically, some of the series were just quite noisy and difficult to model.

Given that we lacked sampling variance estimates for the first four years of our series our sampling error models are questionable for this period. Motivated by this, we shortened the VIP series to the four years starting in January 1997 for which we did have sampling variance estimates, and tried fitting the RegComponent models to these shortened series. Unfortunately, we were generally unsuccessful in modeling these extremely short series.

5. APPLICATION OF SIGNAL EXTRACTION RESULTS TO INVESTIGATE POTENTIAL FOR IMPROVING ESTIMATES OF THE TRUE VIP SERIES

The REGCMPNT program produces finite sample signal extraction estimates of the component series along with signal extraction error variances for these estimates. We denote the signal extraction estimates for the log VIP series by $\log(\hat{Y}_t)$. Our interest here is primarily in the signal extraction error variances, denoted $\mathrm{Var}[\log(Y_t) - \log(\hat{Y}_t)]$. The square roots of these error variances can be interpreted in percentage terms, analogous to CVs. When compared to the original sampling error CVs for the direct VIP estimates, this provides a measure of the improvement from signal extraction.

The signal extraction results from REGCMPNT assume that the correct model is used. In particular, no allowance is made in the signal extraction error variances to account for uncertainty due to using estimated model parameters. With reasonably long time series the consequence of this is generally some amount of understatement of the signal extraction error variance. With the limitations of our modeling (very short time series, some series have high levels of sampling error, no sampling variances prior to 1997) we are quite uncertain about the true values of our model parameters. This raises the possibility that the signal extraction variances we examine here significantly understate the true error variances. However, overstatement of variances could also occur if the innovations variance in the model for $\log(Y_t)$ is overestimated. The bottom line here is that, due to the significant amount of uncertainty about our model parameters, the signal extraction variances generally provide at best rough indications of potential for improvement from signal extraction. Results for single series should not be taken too seriously, particularly for those series with high
levels of sampling error. Results considered over all 70 VIP series probably provide better general indications of potential for improvement. The results should not be taken as precise quantifications of the potential improvement.

Keeping these limitations in mind, Table 1 presents results derived from the signal extraction variances produced by REGCMPNT. The table shows the estimated improvements from signal extraction over the sampling CVs of the direct estimates. To summarize the results, the percent improvements are shown in the CVs of the last month estimate, along with the average percent improvements over the last year of estimates and the last four years of estimates. The percent improvements shown are multiplicative percent improvements on the sampling CVs expressed as percents. Thus, if the sampling CV was 20% and the improvement was 25%, then the signal extraction CV was (1 − .25) × 20% = 15%.

Table 1. Sampling Coefficient of Variation (CV) and Percentage Improvements in CV from Signal Extraction for the Last-month Average, Last-year Average, and Last-four-year Average

                           Last Month        Last Year Average   Last Four-Year Average
Types of                Sampling   Improve-   Sampling   Improve-   Sampling   Improve-
Construction              CV (%)   ment (%)     CV (%)   ment (%)     CV (%)   ment (%)
---------------------------------------------------------------------------------------
Lodging                        5          3          5          4          5          5
Office                         4         13          4         16          5         27
Commercial                     3         30          3         30          4         43
Health Care                    5         32          4         33          5         46
Educational                    5         14          5         18          6         24
Religious                      5         34          5         34          7         50
Public Safety                  9          1          8          2          8          3
Amuse & Rec                    5          8          5         12          6         17
Transportation                 5          6          6          9          8         11
Sewer & WstDisp               23         14         27         17         22         18
Water Sup Sys                 28         31         26         40         26         46
Manufacturing                  3          4          3          7          4          8
Food/Bev/Tobac                10         10         10         14         12         17
Textile/App//Leath            14          6         12          8         17         10
Wood                          10          3         15          6         16          8
Furniture                     12         10         24         14         24         18
Paper Products                16         27         17         30         24         37
Print/Publishing              17          5         15          7         15          9
Chemical/Allied                7          2          8          4          8          5
Petroleum/Coal                 7          6         17         10         12         12
Rubber/Plastics                9          2         10          2          8          3
Stone/Clay/Glass               6          3          8          3         13          9
Primary Metal                  8          2          5          2          7          3
Fabricated Metal               9          3         11          4         11          5
Machinery/Non-el              17          5         19          9         17         12
Computer/Elect/El              7          2          7          3          7          3
Transportation                 8         15          9         18         13         24
Miscellaneous                  8          3          8          5         12          7
Financial                     12         52         11         46         14         53
Automotive                     8         27          9         30         12         40
Food/Beverage                  9         38          9         34         12         49
Multi-Retail                   6          9          5         13          6         18
Other Commercial               8         25          8         24         13         45
Warehouse                      4         13          5         17          6         22
Hospital                       7         28          5         31          7         45
Medical Building               8         13          9         18          9         21
Special Care                   9         49          9         45         11         49
Preschool                     22         28         24         31         33         50
Primary & Second              10         17         11         21         13         27
Higher Education               6         28         15         27          9         38
Other Educational             18          9          7         15         16         19
House of Worship               6         35          6         36          8         50
Other Religious                9         45          9         43         12         50
Theme/Amusemnt                10          3         13          5         16          6
Sports                        15          6         10         10         12         11
Fitness                       15         30         14         31         18         43
Perform/mtCnter               10          4         12          5         14          7
Social Places                 13         30         12         32         16         42
MovieTheatr & Stud             9          3         10          5          9          7
Air                            4          2          4          3          7          4
Land                          17          4         16          6         16          8
General Offices                4         12          4         15          6         26
Auto Sales                    16         19         17         24         20         36
Auto Service/Parts            13         24         13         26         18         38
Parking                        8          6         14         10         18         12
Food                          11         36         12         31         15         47
Dining/Drinking               18         39         16         39         18         50
Fast Food                     20         35         26         41         29         51
General Merchand               5         16          9         20         10         28
Shopping Center                9          6          7         11          8         16
Shopping Mall                 21         12         12         16         10         19
Other Stores                  13         24         12         26         19         43
Drug Stores                   18         23          9         25         22         40
Building Supplies              9          6         16         10         14         20
General Warehses               4         10          5         18          6         28
Instructional                  8         29          9         30         11         40
Dormitory                     14         45         13         41         21         47
Sport/Rec Facility            19         12         21         17         22         20
Gallery/Museum                22         10         16         17         17         22
Auxiliary Buildings           13         12         13         17         16         21

Table 1 indicates that there was a wide range of estimated improvements. Note the average improvement in the CVs over the last year ranged from 2% (public safety TC) to 46% (financial TC). For those TCs whose last-year average sampling error CV is less than 10%, the average percent improvement from signal extraction ranged between 2% and 45%. For TCs whose last-year average sampling error CV is greater than 10%, the average percent improvement ranged between 4% and 46%. This showed potential for improvements both for TCs with relatively small sampling CVs as well as for TCs with relatively large sampling CVs. However, improvements in the accuracy of estimates for TCs whose sampling CVs are already quite low (say < 5%) may not be of much interest.

6. CONCLUSIONS

The results presented provide rough indications of potential for improvement of the VIP estimates through time series modeling and signal extraction. Assessment of the actual improvements that can be realized, however, is made difficult by the significant data limitations (short series with no sampling variance estimates for the first four years). These limitations leave us with considerable uncertainty about the parameters of our models, and this affects the validity of the signal extraction results. (The signal extraction variances can be thought of as estimates, here fairly imprecise estimates, of the true variances of the errors in the signal extraction estimates.) The high level of sampling error in some of the series is another limitation on the modeling results as it too contributes to uncertainty about model parameters. Series with high levels of sampling error are the most interesting in regard to potentially improving on the accuracy of the direct survey estimates. Unfortunately, high levels of sampling error make series more difficult to model.

In the future we hope to do additional work to at least partially address some of the limitations of this study. First, we will soon have available one additional year of VIP estimates to extend our time series, and can also generate corresponding sampling variance estimates. The resulting series will still be rather short, but not quite so short as before. Second, we intend to pursue a Bayesian approach to inference with our models to recognize uncertainty about the model parameters. (This will most likely recognize uncertainty only about the parameters in the models for log(Yt), taking the fitted sampling error models as given, but it is the uncertainty about the parameters in the models for log(Yt) that is of most concern.) The goal here is not really to
reduce the uncertainty, but simply to account for it in the signal extraction results. Finally, if we are able to achieve satisfactory results with the Bayesian approach, we intend to use the models to investigate model-based seasonal adjustment of the VIP series, and the potential for the use of the models to improve seasonal adjustment results. Again, this will probably need to be done with a Bayesian approach.

ACKNOWLEDGEMENTS

We would like to thank Brian Monsell of the Statistical Research Division for his assistance in writing routines to compute the partial autocorrelations with the autocorrelations as input. We thank Bob Fay of the Director’s Office for his assistance with VPLX. We also thank Masato Asanuma of the Manufacturing and Construction Division for providing the information regarding the survey design.

REFERENCES

Bell, William R. and S. C. Hillmer (1990), “The Time Series Approach to Estimation for Repeated Surveys,” Survey Methodology, 16, 195-215.

Binder, David A. and Peter J. Dick (1989), “Modeling and Estimation for Repeated Surveys,” Survey Methodology, 15, 29-45.

---------- (1990), “A Method for the Analysis of Seasonal ARIMA Models,” Survey Methodology, 16, 239-253.

Box, George E. and Gwilym M. Jenkins (1976), Time Series Analysis: Forecasting and Control, California: Holden-Day Inc.

Cartwright, David C. (1996), “Construction Progress Reporting Surveys (CPRS) System Redesign – Revised Sampling Rates,” MCD Memorandum, April 29, 1996.

Fay, Robert E. (1998), VPLX Program Documentation, Washington, DC: U.S. Census Bureau.

Mesenbourg, Thomas L. (1997), “Construction Progress Reporting Surveys System Redesign – Revised Sampling Rates,” MCD Memorandum, July 16, 1997.

Pfeffermann, Danny (1991), “Estimation and Seasonal Adjustment of Population Means Using Data from Repeated Surveys,” Journal of Business and Economic Statistics, 9, 163-175.

SAS Institute, Inc. (1990), SAS Language: Reference, Version 6, First Edition, Cary, NC: SAS Institute, Inc.

Scott, Alastair J. and T.M.F. Smith (1974), “Analysis of Repeated Surveys Using Time Series Methods,” Journal of the American Statistical Association, 69, 674-678.

Scott, Alastair J., T.M.F. Smith, and Roger G. Jones (1977), “The Application of Time Series Methods to the Analysis of Repeated Surveys,” International Statistical Review, 45, 13-28.

U.S. Census Bureau (2002a), “Value of Construction Put in Place,” (Report No. C30/02-5 (Annual)), Washington, DC: U.S. Department of Commerce.

---------- (2002b), X-12-ARIMA Reference Manual, Final Version 0.2, Washington, DC: U.S. Census Bureau.

Wolter, Kirk M. (1985), Introduction to Variance Estimation, New York: Springer-Verlag Inc.