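The ma_projection() function implements a model-assisted projection estimator for combining information from two independent surveys. This method is especially useful in survey sampling scenarios where one survey (Survey 1) collects only auxiliary variables, while the other (Survey 2) collects both the outcome variable and the same auxiliary variables.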
This vignette illustrates how to use ma_projection() for domain-level estimation using various supervised learning models, including machine learning techniques via the parsnip interface.
The approach follows the work of Kim & Rao (2012), where a working model is trained on Survey 2 to predict the outcome variable. Predictions are made for the auxiliary-only Survey 1 data. These predictions are then aggregated by domain to generate small area estimates.
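The three steps described above (train on Survey 2, predict for Survey 1, aggregate by domain) can be sketched in base R. This is a minimal illustration on simulated data, not the internals of ma_projection(): the data frames svy1 and svy2, their columns, and the constant weights are all made up, and the sketch ignores strata, clusters, and variance estimation.

```r
# Toy illustration of the projection idea; all names and values are hypothetical.
set.seed(1)

# Survey 2: smaller, observes both the outcome y and the auxiliary variable x
svy2 <- data.frame(x = runif(50, 0, 10), w = rep(20, 50))
svy2$y <- 2 + 3 * svy2$x + rnorm(50)

# Survey 1: larger, observes only the auxiliary variable x (plus a domain label)
svy1 <- data.frame(
  x      = runif(500, 0, 10),
  w      = rep(2, 500),
  domain = sample(c("A", "B", "C"), 500, replace = TRUE)
)

# Step 1: train the working model on Survey 2 (weighted least squares here)
fit <- lm(y ~ x, data = svy2, weights = w)

# Step 2: predict the outcome for every unit in Survey 1
svy1$y_hat <- predict(fit, newdata = svy1)

# Step 3: aggregate predictions by domain as a weighted mean
num <- tapply(svy1$w * svy1$y_hat, svy1$domain, sum)
den <- tapply(svy1$w, svy1$domain, sum)
proj_mean <- num / den

print(proj_mean)
```

In practice the working model can be any supervised learner, and the survey design of both samples has to be taken into account, which is what the design arguments of ma_projection() in the examples below are for.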
# Load packages (assumes ma_projection(), df_svy22 and df_svy23 come from sae.projection)
library(sae.projection)
library(dplyr)       # filter(), between(), %>%
library(tidymodels)  # linear_reg(), logistic_reg(), boost_tree(), tune()

# Filter non-missing values for income
svy22_income <- df_svy22 %>% filter(!is.na(income))
svy23_income <- df_svy23 %>% filter(!is.na(income))

# Fit projection model
lm_result <- ma_projection(
  income ~ age + sex + edu + disability,
  cluster_ids = "PSU",
  weight = "WEIGHT",
  strata = "STRATA",
  domain = c("PROV", "REGENCY"),
  working_model = linear_reg(),
  data_model = svy22_income,
  data_proj = svy23_income,
  nest = TRUE
)

# View results
head(lm_result$df_result)
# Filter youth population for NEET classification
svy22_neet <- df_svy22 %>% filter(between(age, 15, 24))
svy23_neet <- df_svy23 %>% filter(between(age, 15, 24))

# Fit logistic regression model (design variables given as one-sided formulas)
lr_result <- ma_projection(
  formula = neet ~ sex + edu + disability,
  cluster_ids = ~ PSU,
  weight = ~ WEIGHT,
  strata = ~ STRATA,
  domain = ~ PROV + REGENCY,
  working_model = logistic_reg(),
  data_model = svy22_neet,
  data_proj = svy23_neet,
  nest = TRUE
)

# View results
head(lr_result$df_result)
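# The "lightgbm" engine for boost_tree() is registered by the bonsai package
library(bonsai)

# Define LightGBM model; tune() marks hyperparameters to be tuned
lgbm_model <- boost_tree(
  mtry = tune(), trees = tune(), min_n = tune(),
  tree_depth = tune(), learn_rate = tune(),
  engine = "lightgbm"
)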

# Fit with cross-validation
lgbm_result <- ma_projection(
  formula = neet ~ sex + edu + disability,
  cluster_ids = "PSU",
  weight = "WEIGHT",
  strata = "STRATA",
  domain = c("PROV", "REGENCY"),
  working_model = lgbm_model,
  data_model = svy22_neet,
  data_proj = svy23_neet,
  cv_folds = 3,
  tuning_grid = 5,
  nest = TRUE
)

# View results
head(lgbm_result$df_result)
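To make the roles of cv_folds = 3 and tuning_grid = 5 more concrete, the following base R sketch scores 5 candidate hyperparameter values with 3-fold cross-validation and refits the best one. It is only an illustration of the general idea, under the assumption that tuning_grid is the number of candidate settings (as an integer grid is in tune::tune_grid()). It is not the tuning code used by ma_projection(), and it uses a polynomial degree as a stand-in hyperparameter instead of LightGBM settings.

```r
# Toy grid search with k-fold cross-validation; data and names are hypothetical.
set.seed(42)
n <- 120
dat <- data.frame(x = runif(n, -2, 2))
dat$y <- sin(dat$x) + rnorm(n, sd = 0.2)

k <- 3                # analogous to cv_folds = 3
candidates <- 1:5     # five candidate settings, analogous to tuning_grid = 5

# Assign each row to one of k folds at random
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each candidate, average the held-out RMSE over the k folds
cv_rmse <- sapply(candidates, function(deg) {
  fold_rmse <- sapply(seq_len(k), function(f) {
    train <- dat[fold_id != f, ]
    test  <- dat[fold_id == f, ]
    fit   <- lm(y ~ poly(x, deg), data = train)
    sqrt(mean((test$y - predict(fit, newdata = test))^2))
  })
  mean(fold_rmse)
})

# Keep the candidate with the lowest cross-validated error and refit on all data
best_degree <- candidates[which.min(cv_rmse)]
final_fit <- lm(y ~ poly(x, best_degree), data = dat)

print(data.frame(degree = candidates, cv_rmse = cv_rmse))
```

More folds and a larger grid give a more thorough search at the cost of more model fits (here 3 x 5 = 15, plus the final refit).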
ma_projection() supports many working models using the parsnip interface, including:
- linear_reg(), logistic_reg() (also with Stan engine)
- poisson_reg(), mlp(), naive_bayes(), nearest_neighbor()
- decision_tree(), bag_tree(), boost_tree() with LightGBM/XGBoost, rand_forest() (ranger, aorsf), bart()
- svm_linear(), svm_poly(), svm_rbf()
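As a brief sketch of how such working models are specified, the parsnip constructors take an engine argument that selects the underlying implementation. The particular engines and the trees values below are illustrative choices, not recommendations from the package, and the corresponding engine packages (for example rstanarm, ranger, xgboost, kernlab) must be installed before a specification can actually be fitted.

```r
library(parsnip)

# Each call only builds a model specification; nothing is fitted here.
specs <- list(
  linear        = linear_reg(),                                # default engine "lm"
  logistic_stan = logistic_reg(engine = "stan"),               # Bayesian logistic regression
  forest        = rand_forest(engine = "ranger", trees = 500), # random forest via ranger
  xgb           = boost_tree(engine = "xgboost", trees = 200), # gradient boosting via XGBoost
  svm           = svm_rbf(engine = "kernlab")                  # radial-basis SVM
)

# Every element is a parsnip model specification
is_spec <- vapply(specs, inherits, logical(1), what = "model_spec")
print(is_spec)
```

Any of these objects could then be supplied as the working_model argument in the same way as linear_reg() or lgbm_model in the examples above.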
Kim, J. K., & Rao, J. N. K. (2012). Combining data from two independent surveys: a model-assisted approach. Biometrika, 99(1), 85–100. doi:10.1093/biomet/asr063
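ma_projection() provides a flexible way to combine data from two independent surveys, using any working model available through the parsnip interface. It applies to a wide range of domain-level indicators, such as the income and NEET examples shown here, as well as other socioeconomic and health estimates.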