Computing Standard Errors for MEPS Estimates
Steven Machlin, William Yu, and Marc Zodet
Introduction
The Household Component of the
Medical Expenditure Panel Survey (MEPS-HC) is designed to
produce national and regional estimates of the health care use,
expenditures, sources of payment, and insurance coverage of the
U.S. civilian noninstitutionalized population. The sample design
of the survey includes stratification, clustering, multiple
stages of selection, and disproportionate sampling. Furthermore,
the MEPS sampling weights reflect adjustments for survey
nonresponse and adjustments to population control totals from
the Current Population Survey. These survey design and
estimation complexities require special consideration when
analyzing MEPS data (i.e., it is not appropriate to assume
simple random sampling).
To obtain accurate estimates from MEPS survey data, for either
descriptive statistics or more sophisticated analyses based on
multivariate models, the MEPS survey design complexities need to
be taken into account by applying MEPS survey weights to produce
estimates and using an appropriate technique to derive standard
errors associated with the weighted estimates. Several methods
for estimating standard errors for estimates from complex
surveys have been developed, including the Taylor-series
linearization method, balanced repeated replication, and the
jackknife method.
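To make the Taylor-series idea concrete, the following is a minimal Python sketch (Python is not one of the packages discussed in this document) of a linearized standard error for a weighted mean. It assumes a stratified design in which PSUs are treated as sampled with replacement within strata, the same assumption requested by design=wr in SUDAAN and type=wr in SPSS in the examples below. The function name and the toy data are hypothetical, and the sketch omits features the packages provide, such as finite population corrections and missing-data handling.

```python
import math
from collections import defaultdict


def taylor_mean_se(y, w, strata, psu):
    """Weighted mean and its Taylor-series (linearization) standard error.

    PSUs are treated as sampled with replacement within strata.
    Every stratum must contain at least two PSUs.
    """
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Linearized value of each record for the ratio estimator sum(w*y)/sum(w).
    z = [wi * (yi - mean) / total_w for wi, yi in zip(w, y)]

    # Add up the linearized values within each (stratum, PSU) pair.
    psu_totals = defaultdict(float)
    for zi, h, j in zip(z, strata, psu):
        psu_totals[(h, j)] += zi

    # Collect the PSU totals belonging to each stratum.
    by_stratum = defaultdict(list)
    for (h, _), t in psu_totals.items():
        by_stratum[h].append(t)

    # Between-PSU variation within each stratum, summed over strata.
    var = 0.0
    for h, totals in by_stratum.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h} has only one PSU")
        t_bar = sum(totals) / n_h
        var += n_h / (n_h - 1) * sum((t - t_bar) ** 2 for t in totals)
    return mean, math.sqrt(var)


# Hypothetical toy data: 2 strata x 2 PSUs x 2 persons, equal weights,
# with identical values inside each PSU (strong clustering).
y = [1, 1, 3, 3, 5, 5, 7, 7]
w = [1.0] * 8
strata = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 1, 1, 2, 2]

mean, se = taylor_mean_se(y, w, strata, psu)
# Treating every person as a separate PSU ignores the clustering.
_, se_no_cluster = taylor_mean_se(y, w, strata, list(range(8)))
print(mean, round(se, 4), round(se_no_cluster, 4))  # prints: 4.0 0.7071 0.4082
```

Because persons within a PSU tend to resemble one another, treating each person as a separate PSU (the second call) understates the standard error, which is one reason simple random sampling formulas are not appropriate for MEPS.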
The MEPS public use files include
variables to obtain weighted estimates and to implement a
Taylor-series approach to estimate standard errors for weighted
survey estimates. These variables, which jointly reflect the MEPS
survey design, include the estimation weight, sampling strata, and
primary sampling unit (PSU). The documentation and codebook for
MEPS public use files contain these survey design variables. For
example, the documentation for file HC070 (2002 full-year
consolidated data file) includes the person weight (PERWT02F),
stratum (VARSTR), and PSU (VARPSU) variables.
Statistical software packages that
are commonly used to estimate standard errors from complex
multistage designs using the Taylor-series linearization method
include SAS® (version 8.2 or higher), SUDAAN®, Stata®, and SPSS®
(version 12.0 or higher). Examples of basic programming code from
these packages to produce selected estimates and the corresponding
standard errors are provided in this document. The software
packages vary with respect to the specific types of estimates and
models that can be produced accounting for the complex survey
design and the treatment of missing data. For complete information
on the capabilities of each package, analysts need to refer to the
appropriate software user documentation manuals. The Web sites for
SAS, SUDAAN, Stata, and SPSS are http://www.sas.com, http://www.rti.org,
http://www.stata.com, and
http://www.spss.com, respectively. The
R language also has a package for complex survey analysis.
Information on this package can be found in the June 2003 R News
newsletter available on the R website at
http://www.r-project.org.
Standard errors for MEPS estimates
are most accurate when the analytic file contains all of the MEPS
sample persons (i.e., those with positive values for the person
weight variable) and the appropriate syntax is used to analyze
population subgroups. Section I below provides examples of basic
programming code for SAS, SUDAAN, Stata, and SPSS to generate
estimates from MEPS person-level files, both for the total
population and for population subgroups. Section II provides
options for estimation in situations where analytic files do not
include all of the MEPS sample persons. These situations include
analyses based solely on data from MEPS event files, which only
contain sample persons that received a particular type of care,
and analyses of data from MEPS supplements (e.g., the diabetes
supplement data in PUF HC070), which require the use of special
analytic weights that exclude the sample persons who were not
included in the supplement.
I. MEPS Person-Level Files
A. Analyses of the Total Population
Example: Using the 2002 MEPS full-year consolidated file
(PUF HC070) as the analytic file, the basic programming
code provided below for each software package will produce
correct estimates of the overall mean total expenditures
in 2002 ($2,813.24) and the corresponding standard error
($58.99).
SAS
proc surveymeans;
strata varstr;
cluster varpsu;
weight perwt02f;
var totexp02;
run;
SUDAAN
Note: SUDAAN requires that the data be sorted by the
survey design variables that appear on the NEST statement
(i.e., varstr varpsu in example below).
proc descript filetype=sas design=wr;
nest varstr varpsu;
weight perwt02f;
var totexp02;
Stata (syntax below applies to releases 8.0 and higher)
svyset [pweight=perwt02f], strata(varstr) psu(varpsu)
svymean totexp02
SPSS
csplan analysis
/plan file='filename'
/planvars analysisweight=perwt02f
/design strata=varstr cluster=varpsu
/estimator type=wr.
csdescriptives
/plan file='filename'
/summary variables=totexp02
/mean
/statistics se.
B. Analyses Limited to a Population Subgroup
Analyses are often limited to a subgroup of the population.
However, creating a special analysis file that contains
only observations for the subgroup of interest may yield
incorrect standard errors or an error message (e.g., "stratum
with only one psu detected" in Stata) because subsetting may
delete all of the observations in one or more PSUs, so the
program no longer sees the full MEPS sample design (for
example, a stratum may be left with a single PSU). Therefore,
it is advisable to preserve the entire survey design structure
by reading the entire person-level file into the program. Each
software package provides a capability to limit the analysis
to a subgroup of the population without subsetting the
analysis file.
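The difference between subsetting and a domain (subgroup) analysis can be illustrated with a minimal Python sketch of the linearization idea: for a domain mean, records outside the domain stay in the file with a linearized value of 0, so every PSU of the full design still enters the variance. The function name, the toy file, and the children indicator built from a hypothetical age variable are assumptions for illustration only; this is not the internal code of any of the packages.

```python
import math
from collections import defaultdict


def domain_mean_se(y, w, strata, psu, in_domain):
    """Weighted mean for a subgroup (domain) and its linearized standard error.

    All records are kept; records outside the domain get a linearized value
    of 0, so every PSU of the full design still enters the variance.
    """
    w_d = sum(wi for wi, d in zip(w, in_domain) if d)
    mean = sum(wi * yi for wi, yi, d in zip(w, y, in_domain) if d) / w_d
    psu_totals = defaultdict(float)
    for yi, wi, d, h, j in zip(y, w, in_domain, strata, psu):
        z = wi * (yi - mean) / w_d if d else 0.0
        psu_totals[(h, j)] += z
    by_stratum = defaultdict(list)
    for (h, _), t in psu_totals.items():
        by_stratum[h].append(t)
    var = 0.0
    for h, totals in by_stratum.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h} has only one PSU")
        t_bar = sum(totals) / n_h
        var += n_h / (n_h - 1) * sum((t - t_bar) ** 2 for t in totals)
    return mean, math.sqrt(var)


# Hypothetical toy file: 2 strata x 2 PSUs; PSU (2, 2) contains no children.
age = [5, 40, 10, 50, 12, 30, 45, 60]
y = [2, 10, 4, 12, 6, 14, 8, 16]
w = [1.0] * 8
strata = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 1, 1, 2, 2]
children = [1 if a < 18 else 0 for a in age]  # hypothetical 0/1 indicator

# Domain approach: full file plus a subgroup indicator.
mean, se = domain_mean_se(y, w, strata, psu, children)
print(mean, round(se, 3))

# Subsetting approach: stratum 2 is left with a single PSU.
keep = [i for i, c in enumerate(children) if c]
try:
    domain_mean_se([y[i] for i in keep], [w[i] for i in keep],
                   [strata[i] for i in keep], [psu[i] for i in keep],
                   [1] * len(keep))
except ValueError as err:
    print("subset file:", err)
```

With the full file, the domain call returns a mean of 4.0 and a standard error of about 0.943; with the subset file, stratum 2 is left with a single PSU and the variance cannot be computed, which mirrors the Stata error message mentioned above.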
Example: Using the 2002 MEPS full-year consolidated file
(PUF HC070) as the analytic file, the following statements
will produce accurate estimates of the average total
expenditures in 2002 for children younger than 18 years
of age ($1,085.82) and the corresponding standard error
($70.28).
SAS
proc surveymeans;
strata varstr;
cluster varpsu;
weight perwt02f;
var totexp02;
domain agegroup;
run;
Note: The domain statement in this example will generate
estimates for all categories of the variable agegroup
(a hypothetical constructed analytic variable where the
youngest group is children under 18). There is no option
within the surveymeans procedure to select only a specific
population subgroup (e.g., agegroup=1).
SUDAAN
proc descript filetype=sas design=wr;
nest varstr varpsu;
weight perwt02f;
var totexp02;
subpopn agegroup=1;
Note: The subpopn statement in this example generates
estimates for children under 18 (where agegroup is a
constructed analytic variable that is equal to 1 for
children under 18).
Stata (syntax below applies to releases 8.0 and higher)
svyset [pweight=perwt02f], strata(varstr) psu(varpsu)
svymean totexp02, subpop(children)
Note: The subpop() option in this example generates
estimates for children under 18 only (where children
is a constructed variable set equal to 1 for persons
under 18 and set equal to 0 for all other persons).
SPSS
csplan analysis
/plan file='filename'
/planvars analysisweight=perwt02f
/design strata=varstr cluster=varpsu
/estimator type=wr.
csdescriptives
/plan file='filename'
/summary variables=totexp02
/mean
/statistics se
/subpop table=children.
Note: The subpop statement in this example will generate
estimates for all categories of the variable children
(a hypothetical constructed dichotomous analytic variable
where 1=children under 18 and 0=adults 18 and over).
There is no option within the csdescriptives procedure
to select only a specific population subgroup (e.g.,
children=1).
II. Analysis of MEPS Event-Level Files and MEPS Supplements
There are some situations where it is not convenient to include all of the
MEPS sample persons in the analytic file. In particular, MEPS event-level files
only contain sample persons that received a particular type of care. Also,
while data from MEPS supplements are typically contained on person-level files
that include all sample persons, their analysis requires the use of special
analytic weights that essentially exclude sample persons who were not included
in the supplement.
While standard errors are technically most accurate when the analytic file
contains all of the MEPS sample persons (i.e., those with positive values on
the person weight variable), it is possible to produce standard errors that
will usually be fairly accurate without creating an analytic file that includes
the entire sample. The following two sections provide more detailed information
as well as examples of programming code for SAS, SUDAAN, and SPSS when working with
MEPS event-level files and data from MEPS supplements.
A. MEPS Event-Level Files
MEPS event-level files include only records for persons with health care use in the year. Therefore, analyses based solely on event-level files do not preserve the entire estimation structure because persons in the sample without health care use are not represented. If a substantial number of persons in the sample are not represented in the file and some strata contain only observations from one PSU, then estimating standard errors becomes problematic.
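Before analyzing an event-level file, it can help to check which strata are left with fewer than two PSUs. A minimal Python sketch of such a check follows. The function name and the toy records are hypothetical, and the optional weights argument (which ignores records with a weight of 0) is an assumption about how zero-weight records would be excluded from an analysis.

```python
from collections import defaultdict


def single_psu_strata(strata, psu, weights=None):
    """Return the strata that have fewer than two distinct PSUs.

    If weights are given, only records with a positive weight are counted,
    which mimics an analysis that drops zero-weight records.
    """
    psus_seen = defaultdict(set)
    for i, (h, j) in enumerate(zip(strata, psu)):
        if weights is None or weights[i] > 0:
            psus_seen[h].add(j)
    return sorted(h for h, js in psus_seen.items() if len(js) < 2)


# Hypothetical event-file extract: one row per event, with the design
# variables of the person who had the event. Nobody in PSU (2, 2) or
# PSU (3, 1) had an event, so strata 2 and 3 each keep a single PSU.
varstr = [1, 1, 1, 2, 2, 3]
varpsu = [1, 2, 2, 1, 1, 2]
print(single_psu_strata(varstr, varpsu))  # prints: [2, 3]
```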
SUDAAN and Stata generate error messages when only one PSU is encountered
in a stratum, because the within-stratum variance cannot be estimated from a
single PSU. While the MISSUNIT
option on the NEST statement in SUDAAN will generate standard error estimates
(see note under SUDAAN example below), Stata does not provide an option for
estimating standard errors when there are some strata with only one PSU. However,
there is a discussion of options for dealing with this situation on the Stata
Web site at http://www.stata.com/support/faqs/stat/stratum.html.
In contrast to SUDAAN and Stata, SAS and SPSS automatically generate standard errors when some strata contain only one PSU. However, the method SAS and SPSS use for these strata differs from the method SUDAAN uses when the MISSUNIT option is specified (see the notes under the examples below). Consequently, standard errors from these packages will not necessarily be identical (see example below).
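The following minimal Python sketch contrasts the two treatments of one-PSU strata on hypothetical toy data. The "zero" rule follows the SAS note below (the stratum contributes nothing to the variance). The "center" rule is only a rough reading of the SUDAAN MISSUNIT note (the single PSU's linearized total is compared with the mean of all PSU totals); SUDAAN's exact formula may differ, so the sketch is meant to show why the packages can disagree rather than to reproduce either one. The function name and the rule labels are hypothetical.

```python
import math
from collections import defaultdict


def mean_se_with_singletons(y, w, strata, psu, singleton="zero"):
    """Linearized SE of a weighted mean when some strata have only one PSU.

    singleton="zero":   a one-PSU stratum contributes nothing to the variance.
    singleton="center": a one-PSU stratum contributes the squared difference
                        between its PSU total and the mean of all PSU totals
                        (an assumed reading of the MISSUNIT note, not
                        SUDAAN's documented formula).
    """
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    psu_totals = defaultdict(float)
    for yi, wi, h, j in zip(y, w, strata, psu):
        psu_totals[(h, j)] += wi * (yi - mean) / total_w
    grand = sum(psu_totals.values()) / len(psu_totals)
    by_stratum = defaultdict(list)
    for (h, _), t in psu_totals.items():
        by_stratum[h].append(t)
    var = 0.0
    for totals in by_stratum.values():
        n_h = len(totals)
        if n_h == 1:
            if singleton == "center":
                var += (totals[0] - grand) ** 2
            continue  # under "zero", the stratum adds nothing
        t_bar = sum(totals) / n_h
        var += n_h / (n_h - 1) * sum((t - t_bar) ** 2 for t in totals)
    return mean, math.sqrt(var)


# Hypothetical data: stratum 2 has a single PSU.
y, w = [1, 3, 8], [1.0, 1.0, 1.0]
strata, psu = [1, 1, 2], [1, 2, 1]
for rule in ("zero", "center"):
    m, se = mean_se_with_singletons(y, w, strata, psu, rule)
    print(rule, m, round(se, 4))
```

On these data the "center" rule gives a larger standard error than the "zero" rule (about 1.49 versus 0.67), because the one-PSU stratum adds a positive contribution instead of none.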
Example: Using the 2002 MEPS hospital inpatient stays file (PUF HC067D) as the analytic file, the following sample programming code for SAS, SUDAAN, and SPSS produces estimates of the average total expense per hospital stay in 2002 ($8,698.00) and the corresponding standard error ($286.55 from SAS and SPSS versus $298.30 from SUDAAN with the MISSUNIT option).
SAS
proc surveymeans;
strata varstr;
cluster varpsu;
weight perwt02f;
var ipxp02x;
run;
Note: SAS treats the variance contribution of any stratum with only one PSU
as 0 in the overall computation of standard errors.
SUDAAN
Note: SUDAAN requires that the data be sorted by the survey design variables
that appear on the NEST statement (i.e., varstr varpsu in example below).
proc descript filetype=sas design=wr;
nest varstr varpsu/missunit;
weight perwt02f;
var ipxp02x;
Note: The missunit option on the nest statement specifies that if only one
sample unit is encountered within a stage (i.e., one PSU in a stratum), then
the contribution of that unit toward the overall standard error is estimated
using the difference between that unit's value and the overall mean value of the
population.
SPSS
csplan analysis
/plan file='filename'
/planvars analysisweight=perwt02f
/design strata=varstr cluster=varpsu
/estimator type=wr.
csdescriptives
/plan file='filename'
/summary variables=ipxp02x
/mean
/statistics se.
B. MEPS Supplements
The MEPS includes periodic supplements that collect data for only a subset
of sample persons (e.g., persons with a specific health condition). Analyzing
data from these supplements requires the use of special weights that are set
to 0 for persons not included in the supplement. Analysis of MEPS supplements
in which a substantial number of persons have a weight of 0 can produce similar
problems and require similar approaches to those described above in the section
on MEPS event-level files.
Example: Using the 2002 MEPS full-year consolidated file (PUF HC070) as the
analytic file (which contains data from the diabetes supplement), the following
sample programming code for SAS, SUDAAN, and SPSS produces estimates of the
proportion of the diabetes supplement population in 2002 that treated their
diabetes with insulin injections (26.63 percent) and the corresponding standard
error (1.17 percent from SAS and SPSS versus 1.19 percent from SUDAAN with the
MISSUNIT option).
The analytic variable INSINJECT in this example was set equal to 1 if the respondent
indicated they used insulin injections and set equal to 0 if they indicated
they did not use insulin injections.
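Because INSINJECT is coded 0/1, its weighted mean is the estimated proportion. A minimal Python sketch of the recode and the point estimate follows. The raw response codes (1 = yes, 2 = no, other values treated as missing), the function names, and the records are hypothetical and are not taken from the HC070 codebook; the sketch computes only the point estimate, not its standard error. Records with a supplement weight of 0 contribute nothing to either the numerator or the denominator.

```python
def recode_yes_no(raw):
    """Hypothetical recode: 1 (yes) -> 1, 2 (no) -> 0, anything else -> None."""
    return {1: 1, 2: 0}.get(raw)


def weighted_proportion(flags, weights):
    """Weighted share of 1s among records with a positive weight and a
    non-missing flag. Records with weight 0 (outside the supplement) add
    nothing to either the numerator or the denominator."""
    num = den = 0.0
    for flag, wt in zip(flags, weights):
        if wt > 0 and flag is not None:
            num += wt * flag
            den += wt
    return num / den


# Hypothetical records: raw insulin-injection answers and supplement weights.
raw_answers = [1, 2, 2, 1, -1]
supp_weights = [100.0, 200.0, 100.0, 0.0, 50.0]
insinject = [recode_yes_no(r) for r in raw_answers]
print(weighted_proportion(insinject, supp_weights))  # prints: 0.25
```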
SAS
proc surveymeans;
strata varstr;
cluster varpsu;
weight diabw02f;
var insinject;
run;
Note: SAS treats the variance contribution of any stratum with only one PSU
as 0 in the overall standard error computation.
SUDAAN
Note: SUDAAN requires that the data be sorted by the survey design variables
that appear on the NEST statement (i.e., varstr varpsu in example below).
proc descript filetype=sas design=wr;
nest varstr varpsu/missunit;
weight diabw02f;
var insinject;
Note: The missunit option on the nest statement specifies that if only one
sample unit is encountered within a stage (i.e., one PSU in a stratum), then
the contribution of that unit toward the overall standard error is estimated
using the difference between that unit's value and the overall mean value of the
population.
SPSS
csplan analysis
/plan file='filename'
/planvars analysisweight=diabw02f
/design strata=varstr cluster=varpsu
/estimator type=wr.
csdescriptives
/plan file='filename'
/summary variables=insinject
/mean
/statistics se.
Suggested Citation:
Machlin, S., Yu, W., and Zodet, M. Computing Standard
Errors for MEPS Estimates. January 2005. Agency for Healthcare
Research and Quality, Rockville, Md. http://www.meps.ahrq.gov/survey_comp/standard_errors.jsp


