Methodology Report #21: An Analysis of the Effects of Post-stratification on Errors for Estimates Using the 2003 Medical Expenditure Panel Survey Household Component

John P. Sommers, PhD, Agency for Healthcare Research and Quality

Table of Contents

Abstract

The Medical Expenditure Panel Survey (MEPS)

Background: Post-stratification

Household Component of the Medical Expenditure Panel Survey and Post-stratification

Results: Taylor Series Error Estimates

Table A. Average relative standard errors using regional raking and Taylor series, MEPS-HC, 2003

Table B. Average relative standard errors using state raking and Taylor series, MEPS-HC, 2003

Results: Balanced Repeated Replication Error Estimates

Table C. Average relative standard errors using regional raking and balanced repeated replication, MEPS-HC, 2003

Table D. Average relative standard errors using state raking and balanced repeated replication, MEPS-HC, 2003

Conclusions and Recommendations

References

Appendix

Abstract

This report shows the effects of post-stratification of weights for the Household Component of the 2003 Medical Expenditure Panel Survey, using various sets of marginal control totals, on two types of variance estimates for a variety of national, regional, and state estimates. The variance estimation methods are a balanced repeated replication (BRR) method, which includes the effects of the raking on the variance, and a Taylor series method, which does not. It is shown that although the BRR error estimates decrease when a limited set of selected controls is added to the raking process, the Taylor series error estimates for the same values actually increase.

The estimates in this report are based on the most recent data available at the time the report was written. However, selected elements of MEPS data may be revised on the basis of additional analyses, which could result in slightly different estimates from those shown here. Please check the MEPS Web site for the most current file releases.

Center for Financing, Access, and Cost Trends
Agency for Healthcare Research and Quality
540 Gaither Road
Rockville, MD 20850
http://www.meps.ahrq.gov/

The Medical Expenditure Panel Survey (MEPS)

Background

The Medical Expenditure Panel Survey (MEPS) is conducted to provide nationally representative estimates of health care use, expenditures, sources of payment, and insurance coverage for the U.S. civilian noninstitutionalized population. MEPS is cosponsored by the Agency for Healthcare Research and Quality (AHRQ), formerly the Agency for Health Care Policy and Research, and the National Center for Health Statistics (NCHS).

MEPS comprises three component surveys: the Household Component (HC), the Medical Provider Component (MPC), and the Insurance Component (IC). The HC is the core survey, and it forms the basis for the MPC sample and part of the IC sample. Together these surveys yield comprehensive data that provide national estimates of the level and distribution of health care use and expenditures, support health services research, and can be used to assess health care policy implications.

MEPS is the third in a series of national probability surveys conducted by AHRQ on the financing and use of medical care in the United States. The National Medical Care Expenditure Survey (NMCES) was conducted in 1977, the National Medical Expenditure Survey (NMES) in 1987. Beginning in 1996, MEPS continues this series with design enhancements and efficiencies that provide a more current data resource to capture the changing dynamics of the health care delivery and insurance system.

The design efficiencies incorporated into MEPS are in accordance with the Department of Health and Human Services (DHHS) Survey Integration Plan of June 1995, which focused on consolidating DHHS surveys, achieving cost efficiencies, reducing respondent burden, and enhancing analytical capacities. To accommodate these goals, new MEPS design features include linkage with the National Health Interview Survey (NHIS), from which the sample for the MEPS-HC is drawn, and enhanced longitudinal data collection for core survey components. The MEPS-HC augments NHIS by selecting a sample of NHIS respondents, collecting additional data on their health care expenditures, and linking these data with additional information collected from the respondents’ medical providers, employers, and insurance providers.

Household Component

The MEPS-HC, a nationally representative survey of the U.S. civilian noninstitutionalized population, collects medical expenditure data at both the person and household levels. The HC collects detailed data on demographic characteristics, health conditions, health status, use of medical care services, charges and payments, access to care, satisfaction with care, health insurance coverage, income, and employment.

The HC uses an overlapping panel design in which data are collected through a preliminary contact followed by a series of five rounds of interviews over a two-and-a-half-year period. Using computer-assisted personal interviewing (CAPI) technology, data on medical expenditures and use for two calendar years are collected from each household. This series of data collection rounds is launched each subsequent year on a new sample of households to provide overlapping panels of survey data and, when combined with other ongoing panels, will provide continuous and current estimates of health care expenditures.

The sampling frame for the MEPS-HC is drawn from respondents to NHIS, conducted by NCHS. NHIS provides a nationally representative sample of the U.S. civilian noninstitutionalized population, with oversampling of Hispanics and blacks.

Medical Provider Component

The MEPS-MPC supplements and validates information on medical care events reported in the MEPS-HC by contacting medical providers and pharmacies identified by household respondents. The MPC sample includes all hospitals, hospital physicians, home health agencies, and pharmacies reported in the HC. Also included in the MPC are all office-based physicians:

Providing care for HC respondents receiving Medicaid.
Associated with a 75 percent sample of households receiving care through an HMO (health maintenance organization) or managed care plan.
Associated with a 25 percent sample of the remaining households.

Data are collected on medical and financial characteristics of medical and pharmacy events reported by HC respondents, including:

Diagnoses coded according to ICD-9 (9th Revision, International Classification of Diseases) and DSM-IV (Fourth Edition, Diagnostic and Statistical Manual of Mental Disorders).
Physician procedure codes classified by CPT-4 (Current Procedural Terminology, Version 4).
Inpatient stay codes classified by DRG (diagnosis related group).
Prescriptions coded by national drug code (NDC), medication names, strength, and quantity dispensed.
Charges, payments, and the reasons for any difference between charges and payments.

The MPC is conducted through telephone interviews and mailed survey materials.

Insurance Component

The MEPS-IC collects data on health insurance plans obtained through private and public sector employers. Data obtained in the IC include the number and types of private insurance plans offered, benefits associated with these plans, premiums, contributions by employers and employees, and employer characteristics.

Establishments participating in the MEPS-IC are selected through three sampling frames:

A list of employers or other insurance providers identified by MEPS-HC respondents who report having private health insurance at the Round 1 interview.
A Bureau of the Census list frame of private-sector business establishments.
The Census of Governments from the Bureau of the Census.

To provide an integrated picture of health insurance, data collected from the first sampling frame (employers and other insurance providers) are linked back to data provided by the MEPS-HC respondents. Data from the other two sampling frames are collected to provide annual national and state estimates of the supply of private health insurance available to American workers and to evaluate policy issues pertaining to health insurance. Since 2000, the Bureau of Economic Analysis has used national estimates of employer contributions to group health insurance from the MEPS-IC in the computation of Gross Domestic Product (GDP).

The MEPS-IC is an annual panel survey. Data are collected from the selected organizations through a prescreening telephone interview, a mailed questionnaire, and a telephone follow-up for nonrespondents.

Survey Management

MEPS data are collected under the authority of the Public Health Service Act. They are edited and published in accordance with the confidentiality provisions of this act and the Privacy Act. NCHS provides consultation and technical assistance.

As soon as data collection and editing are completed, the MEPS survey data are released to the public in staged releases of summary reports and microdata files. Summary reports are released as printed documents and electronic files. Microdata files are released on CD-ROM and/or as electronic files.

Printed documents and CD-ROMs are available through the AHRQ Publications Clearinghouse. Write or call:

AHRQ Publications Clearinghouse
Attn: (publication number)
P.O. Box 8547
Silver Spring, MD 20907
800-358-9295
703-437-2078 (callers outside the United States only)
888-586-6340 (toll-free TDD service; hearing impaired only)

To order online, send an e-mail to: ahrqpubs@ahrq.gov.

Be sure to specify the AHRQ number of the document or CD-ROM you are requesting. Selected electronic files are available through the Internet on the MEPS Web site: http://www.meps.ahrq.gov/

For more information, visit the MEPS Web site or e-mail mepspd@ahrq.gov.

Background: Post-stratification

Post-stratification is a process used in survey sampling in which estimates from a survey are normalized to external control totals by adjusting the weights of the sampling units so that they sum to those totals. These totals are usually Census or other values that are considered to have no error or far less error than the estimates from the survey. Although weight adjustments can be quite complex, the most straightforward method, cell adjustment, adjusts the weights within each of a set of cells. A simple example follows.

Suppose we had a demographic person-level survey where each person had a weight $W_{ij}$, where $i = 1$ or $2$ and $j = 1, 2, \ldots, n_i$. Suppose that the value of $i$ indicated the gender of the respondent and that, for some measurement $y_{ij}$ collected from each person, the gender of the person was highly correlated with the value of $y_{ij}$. An example would be the height of the respondent.

Suppose one desired an estimate of the average height of the entire population. Using only information from the survey, the estimate of average height would be

$$h = \frac{\sum_{j=1}^{n_1} W_{1j} y_{1j} + \sum_{j=1}^{n_2} W_{2j} y_{2j}}{\sum_{j=1}^{n_1} W_{1j} + \sum_{j=1}^{n_2} W_{2j}},$$

a simple weighted average.

However, if $c_i$ were the census total for the $i$th group, then a better estimate might be

$$h_a = \frac{c_1 R_1 + c_2 R_2}{c_1 + c_2},$$

where

$$R_i = \frac{\sum_{j=1}^{n_i} W_{ij} y_{ij}}{\sum_{j=1}^{n_i} W_{ij}}.$$

If one defined the estimated census total for a group as the sum of the weights in that group,

$$\hat{c}_i = \sum_{j=1}^{n_i} W_{ij},$$

then one could see that the original survey-only estimate of average height is

$$h = \frac{\hat{c}_1 R_1 + \hat{c}_2 R_2}{\hat{c}_1 + \hat{c}_2}.$$

Thus, one has basically formed the new estimate, $h_a$, by replacing estimates of the number in each group with a better estimate or census total. If the expected values of the R’s for each group are very different and the estimation errors for the R’s are not large, this new estimate could be much better than the original estimate for the average.

One can see that if one changed the weights to

$$W'_{ij} = \frac{c_i W_{ij}}{\hat{c}_i},$$

where $c_i$ is the census total for the $i$th group and $\hat{c}_i$ is the sum of the original weights in that group, then using these adjusted weights in the original estimator would yield the adjusted estimate. Thus, one can use the old form of the estimator and the adjusted weights to obtain the new adjusted estimates.
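
The equivalence can be checked in a few lines. The notation here is chosen for illustration: $W_{ij}$ are the original weights, $y_{ij}$ the measurements, $c_i$ the census totals, $\hat{c}_i = \sum_j W_{ij}$ the estimated totals, $R_i = \sum_j W_{ij} y_{ij} / \hat{c}_i$ the group averages, and $W'_{ij} = c_i W_{ij} / \hat{c}_i$ the adjusted weights. Since $\sum_j W_{ij} y_{ij} = \hat{c}_i R_i$,

$$
\frac{\sum_i \sum_j W'_{ij}\, y_{ij}}{\sum_i \sum_j W'_{ij}}
= \frac{\sum_i (c_i/\hat{c}_i) \sum_j W_{ij}\, y_{ij}}{\sum_i (c_i/\hat{c}_i) \sum_j W_{ij}}
= \frac{\sum_i (c_i/\hat{c}_i)\, \hat{c}_i R_i}{\sum_i (c_i/\hat{c}_i)\, \hat{c}_i}
= \frac{\sum_i c_i R_i}{\sum_i c_i}.
$$

The right-hand side is the census-weighted combination of the group averages, so the adjusted weights reproduce the adjusted estimate.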

In general, simple cell adjustment post-stratification could break the population into many more cells. These cells could be defined by the crosses of several variables. At this point, the estimate for the entire data set would be

$$h_a = \frac{\sum_i c_i R_i}{\sum_i c_i},$$

where the sums run over all cells $i$.

In the cell adjustment method just specified, because adjustments to the weights are the same for each person in the cell, the estimates $R_i$ are the original estimates for the average for the cell. This is not always the case with weight-adjustment methods.
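
As a minimal sketch of cell adjustment, the following Python example applies the weight adjustment to a made-up two-group sample. The data, the control totals, and the function names are all hypothetical and are not drawn from MEPS; the point is only to show that the weighted mean under the adjusted weights matches the census-weighted combination of the original cell means.

```python
# Hypothetical toy data: (group, weight, height in cm). Not MEPS data.
sample = [
    (1, 100.0, 178.0),
    (1, 120.0, 181.0),
    (1, 80.0, 175.0),
    (2, 150.0, 163.0),
    (2, 90.0, 166.0),
]
# Hypothetical census control totals for the two groups.
census = {1: 400.0, 2: 420.0}


def weighted_mean(rows):
    """Weighted average: sum(w * y) / sum(w)."""
    return sum(w * y for _, w, y in rows) / sum(w for _, w, _ in rows)


def cell_adjust(rows, controls):
    """Scale each weight by (control total) / (sum of weights in the cell)."""
    est_totals = {}
    for g, w, _ in rows:
        est_totals[g] = est_totals.get(g, 0.0) + w
    return [(g, w * controls[g] / est_totals[g], y) for g, w, y in rows]


adjusted = cell_adjust(sample, census)

# Cell means computed from the ORIGINAL weights.
cell_mean = {g: weighted_mean([r for r in sample if r[0] == g]) for g in census}
h_a = sum(census[g] * cell_mean[g] for g in census) / sum(census.values())

print(weighted_mean(sample))    # unadjusted estimate
print(weighted_mean(adjusted))  # post-stratified estimate
print(h_a)                      # same value, up to floating-point rounding
```

Because every weight in a cell is multiplied by the same factor, the cell means are unchanged; only the relative contribution of each cell to the overall estimate moves.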

If the cells were defined using the cross of more than one cell-defining variable, then the adjustment is made for cells defined using the cross of all the variables. For instance, if one had two gender groups and five age groups, there would be 10 cells, each cell would have its estimated population adjusted to its population total, and the R’s would be the estimates, using the original weights, of the rate for persons with both characteristics used to define the cell, that is, persons of a specific gender and a specific age group.

There is improvement in precision if the expected values of the R’s are dispersed and the sampling errors in the R’s are not large relative to the differences in expected values. If the errors are larger than the differences, the errors essentially mask the difference in expected values. For example, if the expected R’s were equal in value and their errors were large because cell samples were small, post-stratification would not help because there is no difference in the cell averages, so the cells should be combined to make a better estimate of the single R value. In such cases, there could be higher precision if the number of cells were reduced. Kish says this happens when the samples in the subcells are subject to variability and the sample sizes become significantly out of proportion to the cell totals (Kish, 1965).

The effects of post-stratification on standard errors depend upon the survey and the cells. However, its positive effects may be minimal. According to Kish, it seldom results in large gains, and Korn and Graubard (1999) imply it has little effect and that most people do not consider post-stratification in their error calculations because the difference in results would be minimal. Kish also notes that the counts being estimated from the survey must be the same item as the control total being used or there is bias in the values of the post-stratified estimate. Thus, for instance, if the control counts came from another survey that had a different expected value in its counts per cell due to question differences or different methods of administration of the surveys, bias could occur if counts with different expected values per cell from the other survey were used for post-stratification. One should also note that the adjustment of weights may increase the variability of the weights and lead to higher errors for domains that are not one of the weighting cells.

Another common method of using control totals is raking. In this method, the sum of the weights, the estimate of the population for the cell, is again adjusted to population totals. However, if multiple variables are used in raking, the weights are adjusted so that the marginal totals are equal to the population totals, not the values for the individual cells made by crossing the variables. Thus, in our example above with two gender and five age classes, the sample totals would equal the population totals for the two gender groups and the five age groups but not necessarily for the cells defined by the cross of a specific gender and age group. This type of adjustment is usually done using some type of iterative process (Kalton and Flores-Cervantes, 2003).
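
One common iterative process for raking is iterative proportional fitting: adjust the weights to match the first set of marginal totals, then the second, and repeat until the marginals stop changing. The sketch below assumes that approach; it is not the MEPS production procedure, and the function name `rake`, the toy sample, and the control totals are hypothetical.

```python
def rake(weights, margins, max_iter=200, tol=1e-10):
    """Iterative proportional fitting over unit-level weights.

    margins is a list of (categories, totals) pairs: categories[k] is the
    category of unit k on that dimension, totals maps category -> control total.
    """
    w = list(weights)
    for _ in range(max_iter):
        worst = 0.0
        for categories, totals in margins:
            sums = {}
            for wk, cat in zip(w, categories):
                sums[cat] = sums.get(cat, 0.0) + wk
            # Largest relative gap between a current marginal sum and its control.
            worst = max(worst, max(abs(sums[c] / totals[c] - 1.0) for c in totals))
            w = [wk * totals[cat] / sums[cat] for wk, cat in zip(w, categories)]
        if worst < tol:
            break
    return w


def margin(weights, categories):
    """Sum of weights within each category of one dimension."""
    out = {}
    for wk, cat in zip(weights, categories):
        out[cat] = out.get(cat, 0.0) + wk
    return out


# Hypothetical toy sample: one unit per gender-by-age cell (not MEPS data).
gender = ["F", "F", "F", "M", "M", "M"]
age = ["young", "middle", "old", "young", "middle", "old"]
start = [100.0, 150.0, 120.0, 130.0, 160.0, 140.0]
gender_totals = {"F": 520.0, "M": 480.0}
age_totals = {"young": 300.0, "middle": 400.0, "old": 300.0}

raked = rake(start, [(gender, gender_totals), (age, age_totals)])

print(margin(raked, gender))  # close to gender_totals
print(margin(raked, age))     # close to age_totals
```

Only the two sets of marginal sums are controlled here. No control total exists for any single gender-by-age cell, which is the distinction from cell adjustment drawn in the text.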

When raking is used, the values of the R’s for the raked cells are not the original R values because weights are changed within the cells. Thus, if, for example, we raked over two dimensions, age and race, then if we considered the sum over age:

$$h_a = \frac{\sum_i c_i R'_i}{\sum_i c_i},$$

where $i$ indexes the age groups, $c_i$ is the census total for the $i$th age group, and $R'_i$ is the estimate for that group computed with the raked weights, we would need to consider the effect of changed weights on the variance of $R'_i$ because the raking can change the weights within the cells. One needs to be cautious, as this unequal weighting may increase the error more than the controlling of the weights to population totals can decrease it.
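
One rough way to gauge the cost of unequal weighting, offered here as an illustration and not as a method used in this report, is Kish's approximate design effect from weighting, $n \sum w^2 / (\sum w)^2$, which equals 1 plus the squared coefficient of variation of the weights. The function name below is hypothetical.

```python
def kish_weighting_effect(weights):
    """Kish's approximate variance inflation from unequal weights.

    Returns n * sum(w^2) / (sum(w))^2, which is 1.0 for equal weights and
    grows as the weights become more variable.
    """
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


print(kish_weighting_effect([1.0, 1.0, 1.0, 1.0]))  # 1.0, no inflation
print(kish_weighting_effect([1.0, 1.0, 1.0, 3.0]))  # about 1.33
```

A raking step that pushes this factor up by more than the controls reduce the variance would leave the estimate worse off, which is the caution raised above.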

Typically, post-stratification cells are defined using age, race/ethnicity, and sex plus perhaps a variable that is uniquely related to the purpose of the survey. For instance, the National Survey of Family Growth uses number of children born along with the standard demographics. Cell classification is likely to be used if sample sizes within cells support this method. If they do not, then raking might be used to cut the number of cells (Korn and Graubard, 1999).

Household Component of the Medical Expenditure Panel Survey and Post-stratification

The Household Component of the Medical Expenditure Panel Survey (MEPS-HC) is a national household survey of the U.S. civilian noninstitutionalized population. The survey collects information on the respondent’s health, health care, and health expenditures along with certain personal demographic, insurance, and job information. Currently, the sample for the HC is selected as a subsample of respondents from the National Health Interview Survey (NHIS) (Cohen, 2000). The sample contains an oversample of blacks, Hispanics, Asians, and the poor (MEPS Web site). The final weights for the HC are created using adjustments to the NHIS weights. The weights are adjusted for dwelling unit nonresponse and person-level (survey attrition) nonresponse. After nonresponse adjustment, the weights are adjusted using a complex raking post-stratification adjustment that uses multiple sets of controls developed using sets of cells defined using crosses of age, race/ethnicity, sex, region, poverty status, and metropolitan statistical area (MSA) status (whether one lives in an MSA) (Alvarez-Rojas, 2005). The purpose of this analysis is to gain knowledge of the effects of raking post-stratification on MEPS-HC error estimates.

To begin this analysis, we created a large number of sets of post-stratified weights for the MEPS-HC 2003 data. These weights were created by raking to an array of control totals defined by the factors region, state, race/ethnicity, age, MSA status, poverty status, and gender. Within region, we first created a set of weights raked to each of the factors other than state and gender (age, race/ethnicity, MSA status, and poverty status) individually. Using these four marginals, we then raked to each combination of two marginals, then to combinations of three marginals, and then to all four marginals. We then did the case with five marginals by including the gender marginal control. The list of different sets of marginals considered can be seen more clearly in the next sections, which discuss results. We did not create any weights where the controls were defined by the cross of two of the controls within region. So, when we say we raked to region by age and MSA status, we mean we raked to the age and MSA status marginals within each of the four regions.

We repeated the process using a set of state-defined geographic cells, where each of the 30 largest states was an individual cell similar to a region and the remaining states within each region made up four more cells. Thus, if we say we raked to state by age and MSA status, it means we raked to the marginal totals for age group and MSA status group within each of the 34 state groups. This created a much finer break. We treated only 30 states individually because we knew from experience that for the remaining states the samples were very small and estimates could not be made for the smallest states. The definitions of the cells that define the marginals can be seen in the Appendix. We also created national, regional, and state post-stratifications by using the geographic controls to do a cell adjustment for each of the three sets of geographic cells. These were done as base cases.

Note: We did not trim the weights as is done in the actual raking with the MEPS-HC, since in 2003 only one weight was trimmed and our largest weights after raking were not much different from those in operations after trimming. Thus, we believed trimming was not necessary for this project.
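
The sets of marginals described above can be enumerated mechanically. The sketch below is only an illustration of that description: the labels follow the row names used in tables A through D, and the variable names are hypothetical.

```python
from itertools import combinations

# The four factors raked individually and in combination within each geography.
base_factors = ["Age", "Ethnicity", "MSA", "Poverty"]

# Every combination of one, two, three, and four of the base marginals.
marginal_sets = [
    combo
    for size in range(1, len(base_factors) + 1)
    for combo in combinations(base_factors, size)
]

# The five-marginal case adds the gender (sex) marginal to all four factors.
marginal_sets.append(tuple(base_factors) + ("Sex",))

for s in marginal_sets:
    print(", ".join(s))
print(len(marginal_sets))  # 4 + 6 + 4 + 1 + 1 = 16
```

Each of these 16 sets is applied once within the four regions and once within the 34 state groups. The results tables also report one further set, "Age, Ethnicity, Sex", along with the national-only and geography-only base cases.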

With each set of weights, we created a set of national-, regional-, and state-level estimates for six types of expenditure classes: all, dental, prescription drugs, office based, hospital outpatient, and hospital inpatient. For each class, we created estimates of the percentage of persons with an expenditure, the mean expenditure for those with an expenditure, the percentage paid out of pocket, and the percentage paid by private insurance. For each estimate, we created a variance estimate using two methods:

Taylor series method, which does not consider the effect of post-stratification on errors.
Balanced repeated replication (BRR), where the sets of weights were raked for each of the replicates to the control totals. This method should show the effects of the post-stratification on the error estimates (Wolter, 1985).
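
The difference between re-adjusting the weights in every replicate and leaving the final weights fixed can be seen in a small BRR sketch. Everything here is an assumption made for illustration: a design with two PSUs per stratum, a simple cell adjustment standing in for the full raking, invented data and control totals, and hypothetical function names. It is not the MEPS variance code.

```python
def hadamard(order):
    """Sylvester construction of a Hadamard matrix; order must be a power of 2."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def poststratify(units, controls):
    """Cell adjustment: scale weights so each group's weight sum hits its control."""
    sums = {}
    for u in units:
        sums[u["g"]] = sums.get(u["g"], 0.0) + u["w"]
    return [dict(u, w=u["w"] * controls[u["g"]] / sums[u["g"]]) for u in units]


def brr_variance(units, controls, value, rerake):
    """BRR variance of the weighted total of value(u).

    rerake=True  : each half-sample is post-stratified again (reflects the controls).
    rerake=False : final weights are simply doubled (ignores the controls).
    """
    n_strata = 1 + max(u["h"] for u in units)
    order = 1
    while order <= n_strata:  # smallest power of 2 greater than n_strata
        order *= 2
    full = poststratify(units, controls)
    theta = sum(u["w"] * value(u) for u in full)
    base = units if rerake else full
    total_sq = 0.0
    rows = hadamard(order)
    for row in rows:
        # Column 0 is all ones, so stratum h uses column h + 1 to pick its PSU.
        rep = [dict(u, w=2.0 * u["w"]) for u in base
               if u["psu"] == (0 if row[u["h"] + 1] == 1 else 1)]
        if rerake:
            rep = poststratify(rep, controls)
        theta_r = sum(u["w"] * value(u) for u in rep)
        total_sq += (theta_r - theta) ** 2
    return total_sq / len(rows)


# Hypothetical design: 4 strata, 2 PSUs each, one unit of each group per PSU.
units = [
    {"h": h, "psu": psu, "g": g,
     "w": 100.0 + 10 * h + 5 * psu + 3 * k,
     "y": 50.0 + 20 * k + 2 * h + psu}
    for h in range(4) for psu in range(2) for k, g in enumerate(("a", "b"))
]
controls = {"a": 1000.0, "b": 900.0}  # hypothetical control totals

for label, value in (("count", lambda u: 1.0), ("total of y", lambda u: u["y"])):
    print(label,
          brr_variance(units, controls, value, rerake=True),
          brr_variance(units, controls, value, rerake=False))
```

For the population count, every re-adjusted replicate reproduces the control totals, so the variance with `rerake=True` is essentially zero, while the fixed-weight version reports a positive variance. This is the same kind of gap the report examines between BRR with raked replicates and a Taylor series estimate that ignores the raking.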

Since the MEPS-HC sample that is post-stratified to the set of point-in-time control totals is the set of persons in the sample who are still living at the end of the survey year, this sample was the only sample used in this study. This sample is the vast majority of the total MEPS-HC sample, and the relative results should still hold when the small remaining sample is included in estimates. The dropped sample consists of a small number of persons who die during the year or who are in the civilian noninstitutionalized population for only part of the year. This small sample does not go through the post-stratification process we are discussing (MEPS Web site).

Results: Taylor Series Error Estimates

The results of Taylor Series estimation of relative standard errors (RSE) using the various sets of raked weights are given in tables A and B. Table A has the results for estimates made with weights raked at the regional level, and table B has results where the raking cells were created at the state level. For each set of raked weights, four averages are given. The first is for 120 estimates of means made at the national and regional levels. The second is for the 816 mean estimates at the state level. The third is for estimates of sums made at the national and regional levels. The fourth is for estimates of sums made at the state level. We divided the results into these categories because these groupings show the key differences in results. In reviewing these results, one must remember that the Taylor Series estimates of error do not reflect the effects of post-stratification.
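
The counts of 120 and 816 are consistent with one reading of the design described earlier: six expenditure classes times four estimate types per geographic area, across five national and regional areas or 34 state groups. That reading is an assumption; the report does not spell out the breakdown. The arithmetic is:

```python
expenditure_classes = 6       # all, dental, prescription drugs, office based, outpatient, inpatient
estimate_types = 4            # assumed: the four estimate types computed for each class
per_area = expenditure_classes * estimate_types

national_and_regional_areas = 1 + 4   # the nation plus four regions
state_groups = 30 + 4                 # 30 individual states plus four residual groups

print(per_area * national_and_regional_areas)  # 120
print(per_area * state_groups)                 # 816
```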

The following are some key features of these results:

There is not a wide range of results for any of the types of estimates.
For means for each set of estimates (column), the average errors are higher for post-stratification to the same defining variables done at the state level than at the regional level. (For example, the results for means for the weights raked to poverty at the state level are higher than those for weights raked to poverty at the regional level.)
For means for each set of estimates, there is a high correlation between the standard errors of the distributions of the sets of weights and the average standard errors obtained with the set of weights. This is true for each of the two mean columns taken individually in table A or table B. It is also true for each of the two columns if the results from tables A and B are pooled. Since the state post-stratifications in table B have many more sets of controls, and therefore more variable weights, this would explain the second feature above.
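
The correlation described in the last feature can be checked from the published figures. The sketch below computes a Pearson correlation between the "Std error of weights" column and the regional/national mean column of Table A. The choice of Pearson correlation is an assumption, since the report does not say which measure was used.

```python
from math import sqrt

# Table A: standard error of weights, one value per row of the table.
weight_se = [6001, 6012, 6092, 6006, 5958, 6069, 6075, 6040, 6145, 5922,
             6049, 6037, 5994, 6112, 6107, 6078, 5983, 6051, 6052]
# Table A: average RSE for regional/national mean estimates, same row order.
mean_rse = [0.05523, 0.05524, 0.05581, 0.05529, 0.05462, 0.05529, 0.05584,
            0.05513, 0.05587, 0.05461, 0.05528, 0.05463, 0.05506, 0.05586,
            0.05514, 0.05589, 0.05453, 0.05505, 0.05506]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson(weight_se, mean_rse)
print(round(r, 2))  # strongly positive
```

The same function can be applied to the other mean column or to the pooled rows of tables A and B.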

**Table A. Average relative standard errors using regional raking and Taylor series, MEPS-HC, 2003**
Regional raking cells	Std error of weights	Avg RSE, mean estimates (regional/national)	Avg RSE, mean estimates (state)	Avg RSE, sum estimates (regional/national)	Avg RSE, sum estimates (state)
National^*	6001	0.05523	0.13637	0.08981	0.32013
None^**	6012	0.05524	0.13637	0.08980	0.32013
Age	6092	0.05581	0.13672	0.09015	0.32018
Ethnicity	6006	0.05529	0.13639	0.08990	0.32015
MSA	5958	0.05462	0.13604	0.08927	0.31809
Poverty	6069	0.05529	0.13629	0.09007	0.32035
Age, Ethnicity	6075	0.05584	0.13674	0.09021	0.32019
Age, MSA	6040	0.05513	0.13638	0.08960	0.31818
Age, Poverty	6145	0.05587	0.13672	0.09036	0.32037
Ethnicity, MSA	5922	0.05461	0.13606	0.08924	0.31798
Poverty, Ethnicity	6049	0.05528	0.13627	0.09003	0.32035
Poverty, MSA	6037	0.05463	0.13587	0.08953	0.31840
Age, Ethnicity, MSA	5994	0.05506	0.13643	0.09009	0.32001
Age, Poverty, Ethnicity	6112	0.05586	0.13672	0.09030	0.32036
Age, Poverty, MSA	6107	0.05514	0.13632	0.08980	0.31844
Age, Ethnicity, Sex	6078	0.05589	0.13678	0.09025	0.32024
Poverty, Ethnicity, MSA	5983	0.05453	0.13582	0.08936	0.31823
Age, Poverty, Ethnicity, MSA	6051	0.05505	0.13630	0.08961	0.31827
Age, Poverty, Ethnicity, MSA, Sex	6052	0.05506	0.13631	0.08961	0.31831

^* Raked to national total only
^** Raked to regional totals only, no sub-regional raking

**Table B. Average relative standard errors using state raking and Taylor series, MEPS-HC, 2003**
State raking cells	Std error of weights	Avg RSE, mean estimates (regional/national)	Avg RSE, mean estimates (state)	Avg RSE, sum estimates (regional/national)	Avg RSE, sum estimates (state)
National^*	6001	0.05523	0.13637	0.08981	0.32013
None^**	6210	0.05624	0.13637	0.08844	0.32013
Age	6334	0.05660	0.13735	0.08831	0.31971
Ethnicity	6317	0.05632	0.13603	0.08862	0.32016
MSA	6368	0.05624	0.13771	0.08870	0.31943
Poverty	6404	0.05681	0.13760	0.08884	0.32065
Age, Ethnicity	6435	0.05660	0.13689	0.08843	0.31971
Age, MSA	6485	0.05658	0.13865	0.08867	0.31924
Age, Poverty	6522	0.05697	0.13867	0.08849	0.32019
Ethnicity, MSA	6474	0.05637	0.13734	0.08906	0.31969
Poverty, Ethnicity	6482	0.05673	0.13706	0.08887	0.32072
Poverty, MSA	6565	0.05661	0.13854	0.08890	0.32002
Age, Ethnicity, MSA	6585	0.05634	0.13809	0.08901	0.32124
Age, Poverty, Ethnicity	6600	0.05677	0.13803	0.08839	0.32025
Age, Poverty, MSA	6673	0.05674	0.13958	0.08860	0.31969
Age, Ethnicity, Sex	6445	0.05660	0.13697	0.08842	0.31968
Poverty, Ethnicity, MSA	6643	0.05663	0.13795	0.08918	0.32029
Age, Poverty, Ethnicity, MSA	6752	0.05665	0.13898	0.08871	0.31995
Age, Poverty, Ethnicity, MSA, Sex	6760	0.05662	0.13911	0.08865	0.31989

^* Raked to national total only
^** Raked to state totals only, no sub-state raking

Results: Balanced Repeated Replication Error Estimates

The BRR results are shown in tables C and D. Table C contains average RSEs for regional post-stratifications, and table D contains those for state-level post-stratifications. The BRR estimates are at overall levels similar to those of the Taylor Series. However, the patterns and relationships among the results are different, since these results reflect the effect of post-stratification on the variances.

Key features of the BRR results include the following:

Post-stratification has a small but significant effect on the errors of the estimates. For instance, for regional and national estimates and regional post-stratification, the best average RSE for means is about 4 percent lower than the average RSE with simple one-cell national-level post-stratification.
Adding raking variables generally improves results. Results with two variables are generally better than those with one variable, three variables better than two, etc. Adding the fifth variable, sex, only seems to help for totals. However, this may be because sex is a weak predictor, not because we have reached any limit caused by using too many cells.
Of the four main variables used (age, ethnicity, MSA status, and poverty), no variable consistently gives better results in terms of precision. For instance, for regional-level means with regional raking, poverty was the single variable that gave the best improvement. But for the state-level estimates with state raking, age seems to be the single most important variable. Because the improvements are so small, it may be impossible to order the variables by their overall effect, even though all of the variables improve results, as seen in the overall downward trend when more variables are added.
Regional raking works best for regional results and state raking works best for state results. State-level raking for regional and national estimates may create too many cells for those types of estimates.
Any type of geographic raking to a geographic level equal to or below that of the estimates has a marked effect on the errors of totals. State or regional raking helped regional estimates for totals significantly. State-level raking improved state estimates, but regional raking did not.
Within groups of estimates, there was no significant correlation between the standard errors of the weights and the average standard errors of the estimates.
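
The size of these effects can be read off tables C and D with a short calculation. The sketch below takes a handful of average RSEs from those tables and expresses each change as a percentage reduction from the national-only base case. The helper name `pct_reduction` and the variable names are hypothetical.

```python
def pct_reduction(base, new):
    """Percentage by which `new` falls below `base`."""
    return 100.0 * (base - new) / base


# Regional/national means, regional raking (Table C).
national_only_mean = 0.05505   # "National" row
best_regional_mean = 0.05298   # "Age, Poverty, Ethnicity, MSA" row

# Regional/national sums, regional raking (Table C).
national_only_sum = 0.09431    # "National" row
regional_only_sum = 0.06794    # "None" row (regional totals only)

# State sums, state raking (Table D).
national_only_state_sum = 0.31878  # "National" row
state_only_state_sum = 0.18075     # "None" row (state totals only)

print(f"{pct_reduction(national_only_mean, best_regional_mean):.1f}")         # 3.8
print(f"{pct_reduction(national_only_sum, regional_only_sum):.1f}")           # 28.0
print(f"{pct_reduction(national_only_state_sum, state_only_state_sum):.1f}")  # 43.3
```

The first figure is the roughly 4 percent decline for means noted above. The other two show how much larger the effect of geographic controls is on totals.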

**Table C. Average relative standard errors using regional raking and balanced repeated replication, MEPS-HC, 2003**
Regional raking cells	Std error of weights	Avg RSE, mean estimates (regional/national)	Avg RSE, mean estimates (state)	Avg RSE, sum estimates (regional/national)	Avg RSE, sum estimates (state)
National^*	6001	0.05505	0.14061	0.09431	0.31878
None^**	6012	0.05504	0.14061	0.06794	0.31265
Age	6092	0.05452	0.14044	0.06675	0.31175
Ethnicity	6006	0.05481	0.14065	0.06870	0.31156
MSA	5958	0.05431	0.14027	0.06748	0.31042
Poverty	6069	0.05428	0.14063	0.06712	0.31238
Age, Ethnicity	6075	0.05424	0.14046	0.06629	0.31067
Age, MSA	6040	0.05379	0.14011	0.06610	0.30959
Age, Poverty	6145	0.05393	0.14030	0.06614	0.31145
Ethnicity, MSA	5922	0.05402	0.14027	0.06666	0.30908
Poverty, Ethnicity	6049	0.05422	0.14038	0.06706	0.31141
Poverty, MSA	6037	0.05349	0.13990	0.06628	0.31025
Age, Ethnicity, MSA	5994	0.05342	0.14035	0.06598	0.31018
Age, Poverty, Ethnicity	6112	0.05383	0.14075	0.06598	0.31049
Age, Poverty, MSA	6107	0.05315	0.13992	0.06544	0.30939
Age, Ethnicity, Sex	6078	0.05429	0.14053	0.06623	0.31078
Poverty, Ethnicity, MSA	5983	0.05333	0.13991	0.06616	0.30903
Age, Poverty, Ethnicity, MSA	6051	0.05298	0.13996	0.06522	0.30820
Age, Poverty, Ethnicity, MSA, Sex	6052	0.05302	0.14001	0.06513	0.30830

^* Raked to national total only
^** Raked to regional totals only, no sub-regional raking

**Table D. Average relative standard errors using state raking and balanced repeated replication, MEPS-HC, 2003**
State raking cells	Std error of weights	Avg RSE, mean estimates (regional/national)	Avg RSE, mean estimates (state)	Avg RSE, sum estimates (regional/national)	Avg RSE, sum estimates (state)
National^*	6001	0.05505	0.14061	0.09431	0.31878
None^**	6210	0.05632	0.14061	0.06868	0.18075
Age	6334	0.05596	0.13746	0.06777	0.17516
Ethnicity	6317	0.05626	0.13981	0.06870	0.18032
MSA	6368	0.05555	0.13948	0.06731	0.17889
Poverty	6407	0.05637	0.14063	0.06877	0.18068
Age, Ethnicity	6435	0.05585	0.13664	0.06629	0.17473
Age, MSA	6485	0.05512	0.13621	0.06669	0.17325
Age, Poverty	6525	0.05574	0.13763	0.06725	0.17490
Ethnicity, MSA	6473	0.05551	0.13861	0.06748	0.17824
Poverty, Ethnicity	6484	0.05611	0.13950	0.06853	0.18021
Poverty, MSA	6568	0.05544	0.13902	0.06736	0.17833
Age, Ethnicity, MSA	6585	0.05476	0.13519	0.06671	0.17392
Age, Poverty, Ethnicity	6602	0.05540	0.13643	0.06693	0.17459
Age, Poverty, MSA	6677	0.05477	0.13582	0.06612	0.17250
Age, Ethnicity, Sex	6444	0.05583	0.13658	0.06749	0.17459
Poverty, Ethnicity, MSA	6646	0.05527	0.13787	0.06737	0.17788
Age, Poverty, Ethnicity, MSA	6585	0.05455	0.13465	0.06604	0.17224
Age, Poverty, Ethnicity, MSA, Sex	6762	0.05453	0.13473	0.06590	0.17218

^* Raked to national total only
^** Raked to state totals only, no sub-state raking
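
To make the size of these differences easier to read, the short sketch below computes the percent reduction in the average relative standard error of state-level sum estimates, relative to raking to the national total only. The four input values are copied from Tables C and D; the helper name `pct_reduction` is introduced here for illustration and is not part of the report.

```python
def pct_reduction(baseline, alternative):
    """Percent reduction in average relative standard error versus a baseline."""
    return 100.0 * (baseline - alternative) / baseline


# Average relative standard errors of state-level sum estimates (BRR),
# copied from Tables C and D.
national_only = 0.31878      # raked to the national total only
regional_all_five = 0.30830  # Table C: Age, Poverty, Ethnicity, MSA, Sex
state_only = 0.18075         # Table D: state totals only
state_all_five = 0.17218     # Table D: Age, Poverty, Ethnicity, MSA, Sex

for label, value in [
    ("regional raking, all five dimensions", regional_all_five),
    ("state totals only", state_only),
    ("state raking, all five dimensions", state_all_five),
]:
    print(f"{label}: {pct_reduction(national_only, value):.1f}% lower")
```

Under these figures, regional raking with all five dimensions lowers the average by about 3 percent, while raking to state totals lowers it by roughly 43 to 46 percent.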

Conclusions and Recommendations

From the research performed to date on post-stratification of MEPS-HC data, we see that:

Post-stratification with the key non-geographic HC variables has, for the most part, improved results. No one variable seems to be the most effective across all types of estimates.
Post-stratification by state is useful for state estimates, especially totals, but not for higher-level geographic estimates.
It appears that using all five of the selected raking dimensions does not diminish the quality of the results when the raking is done at the regional level. This suggests that more post-stratification dimensions could be added at the regional level without decreasing the quality of the results.
Taylor series estimates of error are of the same magnitude as the BRR estimates, which take into account the effects of post-stratification. However, the Taylor series estimates increase slightly as the weights are post-stratified, even when the post-stratification reduces the expected error as indicated by the BRR estimates.
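
The contrast in the last point can be illustrated with a minimal sketch of balanced repeated replication for a weighted mean. It assumes a design with two PSUs per stratum and a set of balanced half-samples given as rows of +1/-1 values. The function names, the argument layout, and the optional `reweight` hook are all hypothetical; the hook marks where the post-stratification would be repeated on each replicate so that its effect enters the variance estimate. This is not the production MEPS procedure.

```python
def weighted_mean(values, weights):
    """Weighted mean of `values` using `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def brr_variance(values, weights, strata, half, hadamard, reweight=None):
    """BRR variance of a weighted mean for a two-PSU-per-stratum design.

    values, weights : per-person data and base weights
    strata          : per-person stratum index (0-based)
    half            : per-person half-sample indicator (0 or 1) within the stratum
    hadamard        : rows of +1/-1; one row per replicate, one column per stratum
    reweight        : optional (hypothetical) hook that post-stratifies a
                      replicate's weights; identity if omitted
    """
    adjust = reweight or (lambda w: w)
    full = weighted_mean(values, adjust(weights))
    squared_diffs = []
    for row in hadamard:
        # Keep one half-sample per stratum at double weight, drop the other.
        rep = [
            2.0 * w if (row[s] == 1) == (h == 0) else 0.0
            for w, s, h in zip(weights, strata, half)
        ]
        squared_diffs.append((weighted_mean(values, adjust(rep)) - full) ** 2)
    return sum(squared_diffs) / len(squared_diffs)


# Toy data: two strata, one person per PSU, equal base weights.
toy_variance = brr_variance(
    values=[1.0, 2.0, 3.0, 4.0],
    weights=[1.0, 1.0, 1.0, 1.0],
    strata=[0, 0, 1, 1],
    half=[0, 1, 0, 1],
    hadamard=[[1, 1], [-1, 1], [1, -1], [-1, -1]],
)
print(toy_variance)
```

Passing a raking function as `reweight` would make each replicate estimate reflect the post-stratification. A Taylor series estimate computed from the final weights alone has no such step, which is one way to understand why the two methods can move in opposite directions.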

It is recommended that:

State estimates be made with data that is post-stratified to state control totals plus at least age and ethnicity.
More work be done with post-stratification at the regional level using somewhat more complex marginal control totals. For instance, one might want to add cross classifications of some of the variables for the regional marginal controls; e.g., one might use age crossed with ethnicity or MSA status control totals.
Other variables that are not now used in the HC raking be tried. An example would be marital status.
The effect of post-stratification at the national level using more complex sets of cells, such as cells defined by crosses of three variables, be examined for national- and regional-level estimates.
The effect of the current HC post-stratification be evaluated.
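
As an illustration of the cross-classified margins suggested above, the following sketch rakes a set of weights to two margins by iterative proportional fitting, where one margin is age crossed with ethnicity. The function `rake`, the two-level age grouping, and every control total are made up for the example; they are not MEPS values or the raking program used for the HC.

```python
def rake(weights, categories, control_totals, max_iter=100, tol=1e-8):
    """Rake weights to marginal control totals by iterative proportional fitting.

    weights        : list of base weights, one per person
    categories     : dict of dimension name -> list of category labels per person
    control_totals : dict of dimension name -> {category label: population total}
    Assumes every category has a positive weighted total.
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_change = 0.0
        for dim, labels in categories.items():
            # Current weighted total in each category of this dimension.
            sums = {}
            for wi, lab in zip(w, labels):
                sums[lab] = sums.get(lab, 0.0) + wi
            # Scale each category so it matches its control total.
            for i, lab in enumerate(labels):
                factor = control_totals[dim][lab] / sums[lab]
                largest_change = max(largest_change, abs(factor - 1.0))
                w[i] *= factor
        if largest_change < tol:
            break
    return w


# Hypothetical two-level age grouping, used only to keep the example small.
age = ["under 45"] * 4 + ["45 and over"] * 4
ethnicity = ["Hispanic", "Hispanic", "Others", "Others"] * 2
msa = ["MSA", "non-MSA"] * 4

categories = {
    # One margin formed by crossing age with ethnicity.
    "age_x_ethnicity": [f"{a} / {e}" for a, e in zip(age, ethnicity)],
    "msa": msa,
}
# Made-up control totals; both margins sum to the same population of 200.
control_totals = {
    "age_x_ethnicity": {
        "under 45 / Hispanic": 30.0,
        "under 45 / Others": 70.0,
        "45 and over / Hispanic": 20.0,
        "45 and over / Others": 80.0,
    },
    "msa": {"MSA": 160.0, "non-MSA": 40.0},
}

raked = rake([25.0] * 8, categories, control_totals)
msa_total = sum(w for w, m in zip(raked, msa) if m == "MSA")
print([round(w, 3) for w in raked], round(msa_total, 3))
```

Crossing two variables multiplies the number of cells in that margin, so sparse cells become more likely; in practice such cells would probably need to be collapsed before raking.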

References

Alvarez-Rojas, L. (2005). Panel 7 and Panel 8 combined, Full Year 2003: raking person weights including the poverty status to obtain the Expenditure person weights. Memorandum to Tom Hankins. Internal Memorandum, WGTS 1230.02, D-P7P8FY03: 10.02, Rockville, Md.: Westat, Inc.

Cohen, S. B. (2000). Sample Design of the 1997 Medical Expenditure Panel Survey Household Component. MEPS Methodology Report No. 11. AHRQ Pub. No. 01-0001. Rockville, Md.: AHRQ.

Kalton, G. and Flores-Cervantes, I. (2003). Weighting Methods. Journal of Official Statistics, 19, 81-97.

Kish, L. (1965). Survey Sampling. New York: Wiley.

Korn, E. L. and Graubard, B. I. (1999). Analysis of Health Surveys. New York: Wiley.

MEPS Web site. HC-082 2004 Full Year Population Characteristics File (replaced by HC-089 Full Year Consolidated Data File). http://www.meps.ahrq.gov/mepsweb/data_stats/download_data_files_detail.jsp?cboPufNumber=HC-089

Wolter, K. M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag.

Appendix

Variable Sex
Male
Female

Variable MSA
non-MSA
MSA

Variable Region
Northeast
Midwest
South
West

Variable Poverty
Poor/near poor (less than 125 percent of poverty line)
Low (125 percent to 200 percent of poverty line)
Middle (200 percent to 400 percent of poverty line)
High (over 400 percent of poverty line)

Variable Ethnicity
Hispanic
Non-Hispanic black
Others

Variable Age
0
1-19
20-29
30-44
45-64
65 and over


Suggested Citation:
Sommers, J. P. An Analysis of the Effects of Post-stratification on Errors for Estimates Using the 2003 Medical Expenditure Panel Survey Household Component. Methodology Report No. 21. March 2007. Agency for Healthcare Research and Quality, Rockville, Md. http://www.meps.ahrq.gov/mepsweb/data_files/publications/mr21/mr21.shtml
