---
title: "Factor Analysis with Missing Data"
author: "Po-Hsien Huang"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Factor Analysis with Missing Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r comment = "", message = FALSE, setup, include=FALSE}
options(digits = 3)
options(width = 100)
```

In this example, we show how to use `lslx` to conduct semi-confirmatory factor analysis with missing data.
The example uses the `HolzingerSwineford1939` data set from the `lavaan` package, so `lavaan` must be installed.

## Missing Data Construction
Because `HolzingerSwineford1939` contains no missing values, we use code from `semTools` to create `NA`s (see the example for the `twostage()` function in `semTools`).

```{r comment = "", message = FALSE}
data_miss <- lavaan::HolzingerSwineford1939
# Set x5 to NA for cases whose x1 falls at or below its 30% quantile
data_miss$x5 <- ifelse(data_miss$x1 <= quantile(data_miss$x1, .3),
                       NA, data_miss$x5)
# Age in years, computed from the year and month components
data_miss$age <- data_miss$ageyr + data_miss$agemo/12
# Set x9 to NA for cases whose age falls at or below its 30% quantile
data_miss$x9 <- ifelse(data_miss$age <= quantile(data_miss$age, .3),
                       NA, data_miss$x9)
```

By construction, the missingness of `x5` depends on the value of `x1`, and the missingness of `x9` depends on `age`.
Note that `age` is computed from `ageyr` and `agemo`.
Since `ageyr` and `agemo` are not variables of interest, they are treated as auxiliary variables in the later analysis.

## Model Specification and Object Initialization
A standard confirmatory factor analysis (CFA) model is specified.

```{r comment = "", message = FALSE}
model_miss <- "visual :=> x1 + x2 + x3
               textual :=> x4 + x5 + x6
               speed :=> x7 + x8 + x9
               visual <=> 1 * visual
               textual <=> 1 * textual
               speed <=> 1 * speed"
```

Here, the `1` before `*` is interpreted as `fix(1)`.

To initialize an `lslx` object with auxiliary variables, we need to specify the `auxiliary_variable` argument.
The `auxiliary_variable` argument only accepts numeric variables.
If a categorical variable is to be used as an auxiliary variable, the user should first transform it into a set of dummy variables.
One possible method is to use the `model.matrix()` function.

```{r comment = "", message = FALSE}
library(lslx)
lslx_miss <- lslx$new(model = model_miss,
                      data = data_miss,
                      auxiliary_variable = c("ageyr", "agemo"))
```

Because the specified CFA might not fit the data well, we add a correlated residual structure to the model with `$penalize_block()`.

```{r comment = "", message = FALSE}
lslx_miss$penalize_block(block = "y<->y",
                         type = "fixed",
                         verbose = FALSE)
```

This code penalizes all coefficients in the `y<->y` block whose parameter type is fixed.
Note that this model is not identified under the usual SEM framework.
The penalized likelihood (PL) method can still estimate it because the penalty function introduces additional constraints on the parameters.
However, we do not recommend this type of model because it is difficult to interpret.

## Model Fitting
So far, the specified auxiliary variables are only stored in the `lslx` object.
They are actually used once one of the `$fit()`-related methods is called.

```{r comment = "", message = FALSE}
lslx_miss$fit_lasso()
```

By default, the `fit`-related methods use a two-step method (possibly with auxiliary variables) to handle missing values.
The user can specify the missing-data method explicitly via the `missing_method` argument.
The other method available in the current version is listwise deletion.
However, listwise deletion has no theoretical advantage over the two-step method.

## Model Summarizing
The following code summarizes the fitting result under the penalty level selected by a robust version of the Akaike information criterion (RAIC).
The `number of missing patterns` shows how many missing patterns are present in the data set (including the complete pattern).
If the `lslx` object is initialized with raw data, a corrected sandwich standard error is used by default for coefficient tests.
The correction is based on the asymptotic covariance of the saturated moments derived by full information maximum likelihood.
The mean-adjusted likelihood ratio test is also based on this quantity.
For references, please see the Missing Data section in `?lslx`.

```{r comment = "", message = FALSE, fig.width = 24, fig.height = 14}
lslx_miss$summarize(selector = "raic")
```
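
The `number of missing patterns` reported in the summary can be cross-checked by hand.
The following is a minimal base R sketch and is not part of `lslx`: the helper `count_missing_patterns()` and the `toy` data frame are hypothetical names with made-up values, introduced only for illustration.
It counts the distinct combinations of observed and missing variables, with the complete pattern included.

```{r comment = "", message = FALSE}
# Hypothetical toy data; the values are made up for illustration only.
toy <- data.frame(
  x1 = c(1.2, 0.4, 2.1, 0.3, 1.8, 0.9),
  x5 = c(3.0, NA, 2.5, NA, 4.1, 3.3),
  x9 = c(5.1, 4.8, NA, NA, 5.6, 4.9)
)

# Hypothetical helper: tabulate the missing patterns in a data frame.
# Each row is coded as a string of 0/1 flags, one per column (1 = missing),
# so "000" is the complete pattern.
count_missing_patterns <- function(data) {
  flags <- is.na(data)
  pattern <- apply(flags, 1,
                   function(row) paste(as.integer(row), collapse = ""))
  table(pattern)
}

pattern_table <- count_missing_patterns(toy)
pattern_table          # frequency of each pattern
length(pattern_table)  # number of distinct patterns
```

The same helper could be applied to the analysis variables in `data_miss`.
Whether `lslx` also counts patterns over the auxiliary variables is not something this sketch assumes, so the two numbers should be compared with that in mind.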