In this example, we will show how
to use lslx
to conduct semi-confirmatory factor analysis
with missing data. The example uses data
HolzingerSwineford1939
in the package lavaan
.
Hence, lavaan
must be installed.
Because HolzingerSwineford1939
doesn’t contain missing
values, we use the code in semTools
to create
NA
(see the example of twostage()
function in
semTools
).
data_miss <- lavaan::HolzingerSwineford1939
data_miss$x5 <- ifelse(data_miss$x1 <= quantile(data_miss$x1, .3),
NA, data_miss$x5)
data_miss$age <- data_miss$ageyr + data_miss$agemo/12
data_miss$x9 <- ifelse(data_miss$age <= quantile(data_miss$age, .3),
NA, data_miss$x9)
By the construction, we can see that the missingness of
x5
depends on the value of x1
and the
missingness of x9
relies on the age
variable.
Note that age
is created by ageyr
and
agemo
. Since ageyr
and agemo
are
not the variables that we are interested, the two variables are treated
as auxiliary in the later analysis.
A usual confirmatory factor analysis (CFA) model is specified.
model_miss <- "visual :=> x1 + x2 + x3
textual :=> x4 + x5 + x6
speed :=> x7 + x8 + x9
visual <=> 1 * visual
textual <=> 1 * textual
speed <=> 1 * speed"
Here, 1
before *
will be interpreted as
fix(1)
. To initialize an lslx
object with
auxiliary variables, we need to specify the
auxiliary_variable
argument. The
auxiliary_variable
argument only accepts numeric variables.
If any categorical variable is considered as a valid auxiliary variable,
user should transform it as a set of dummy variables first. One possible
method is using model.matrix
function.
library(lslx)
lslx_miss <- lslx$new(model = model_miss, data = data_miss,
auxiliary_variable = c("ageyr", "agemo"))
An 'lslx' R6 class is initialized via 'data' argument.
Response Variables: x1 x2 x3 x4 x5 x6 x7 x8 x9
Latent Factors: visual textual speed
Auxiliary Variables: ageyr agemo
Because the specified CFA might not fit the data well, we add a
correlated residual structure to the model by
$penalize_block()
The code penalizes all the coefficients in y<->y
block with fixed parameter type. Note that this model is not identified
under the usual SEM framework. PL method can still estimate it because
the penalty function introduces additional constraints on parameters.
However, we don’t recommend using such type of model because it is
difficult to be interpreted.
So far, the specified auxiliary variables are only stored in
lslx
object. They are actually used after implementing the
$fit()
related methods.
By default, fit
related methods implement two-step
method (possibly with auxiliary variables) for handling missing values.
User can specify the missing method explicitly via
missing_method
argument. Another missing method in the
current version is listwise deletion. However, listwise deletion has no
theoretical advantages over the two-step method.
The following code summarizes the fitting result under the penalty
level selected by a Robust version of Akaike information criterion
(RAIC). The number of missing patterns
shows how many
missing patterns present in the data set (include the complete pattern).
If the lslx
object is initialized via raw data, by default,
a corrected sandwich standard error will be used for coefficient test.
The correction is based on the asymptotic covariance of saturated
moments derived by full information maximum likelihood. Also, the mean
adjusted likelihood ratio test is based on this quantity. For the
reference, please see the section of Missing Data in
?lslx
.
General Information
number of observations 301.000
number of complete observations 138.000
number of missing patterns 4.000
number of groups 1.000
number of responses 9.000
number of factors 3.000
number of free coefficients 30.000
number of penalized coefficients 36.000
Numerical Conditions
selected lambda 0.130
selected delta none
selected step none
objective value 0.168
objective gradient absolute maximum 0.001
objective Hessian convexity 0.741
number of iterations 6.000
loss value 0.051
number of non-zero coefficients 41.000
degrees of freedom 13.000
robust degrees of freedom 16.175
scaling factor 1.244
Fit Indices
root mean square error of approximation (rmsea) 0.025
comparative fit index (cfi) 0.997
non-normed fit index (nnfi) 0.992
standardized root mean of residual (srmr) 0.032
Likelihood Ratio Test
statistic df p-value
unadjusted 15.428 13.000 0.281
mean-adjusted 12.399 13.000 0.495
Root Mean Square Error of Approximation Test
estimate lower upper
unadjusted 0.025 0.000 0.071
mean-adjusted 0.000 0.000 0.068
Coefficient Test (Std.Error = "sandwich")
Factor Loading
type estimate std.error z-value P(>|z|) lower upper
x1<-visual free 0.921 0.092 9.995 0.000 0.740 1.101
x2<-visual free 0.441 0.079 5.560 0.000 0.286 0.597
x3<-visual free 0.607 0.070 8.631 0.000 0.469 0.745
x4<-textual free 0.974 0.063 15.585 0.000 0.852 1.097
x5<-textual free 1.057 0.063 16.738 0.000 0.933 1.180
x6<-textual free 0.920 0.059 15.687 0.000 0.805 1.035
x7<-speed free 0.438 0.094 4.669 0.000 0.254 0.622
x8<-speed free 0.526 0.093 5.627 0.000 0.343 0.709
x9<-speed free 0.799 0.092 8.647 0.000 0.618 0.980
Covariance
type estimate std.error z-value P(>|z|) lower upper
textual<->visual free 0.486 0.075 6.504 0.000 0.340 0.633
speed<->visual free 0.567 0.077 7.405 0.000 0.417 0.718
speed<->textual free 0.243 0.089 2.732 0.006 0.069 0.417
x2<->x1 pen 0.000 - - - - -
x3<->x1 pen 0.000 - - - - -
x4<->x1 pen 0.048 0.050 0.959 0.338 -0.050 0.147
x5<->x1 pen -0.112 0.089 -1.253 0.210 -0.286 0.063
x6<->x1 pen 0.000 - - - - -
x7<->x1 pen -0.080 0.063 -1.260 0.208 -0.204 0.044
x8<->x1 pen 0.000 - - - - -
x9<->x1 pen 0.000 - - - - -
x3<->x2 pen 0.103 0.074 1.387 0.166 -0.043 0.249
x4<->x2 pen 0.000 - - - - -
x5<->x2 pen 0.000 - - - - -
x6<->x2 pen 0.000 - - - - -
x7<->x2 pen -0.104 0.058 -1.817 0.069 -0.217 0.008
x8<->x2 pen 0.000 - - - - -
x9<->x2 pen 0.000 - - - - -
x4<->x3 pen 0.000 - - - - -
x5<->x3 pen -0.079 0.060 -1.317 0.188 -0.196 0.039
x6<->x3 pen 0.000 - - - - -
x7<->x3 pen 0.000 - - - - -
x8<->x3 pen 0.000 - - - - -
x9<->x3 pen 0.000 - - - - -
x5<->x4 pen 0.000 - - - - -
x6<->x4 pen 0.000 - - - - -
x7<->x4 pen 0.079 0.043 1.851 0.064 -0.005 0.163
x8<->x4 pen 0.000 - - - - -
x9<->x4 pen 0.000 - - - - -
x6<->x5 pen 0.000 - - - - -
x7<->x5 pen 0.000 - - - - -
x8<->x5 pen 0.000 - - - - -
x9<->x5 pen 0.008 0.065 0.120 0.904 -0.120 0.136
x7<->x6 pen 0.000 - - - - -
x8<->x6 pen 0.013 0.042 0.300 0.764 -0.070 0.095
x9<->x6 pen -0.017 0.048 -0.354 0.723 -0.111 0.077
x8<->x7 pen 0.254 0.072 3.530 0.000 0.113 0.394
x9<->x7 pen 0.000 - - - - -
x9<->x8 pen 0.000 - - - - -
Variance
type estimate std.error z-value P(>|z|) lower upper
visual<->visual fixed 1.000 - - - - -
textual<->textual fixed 1.000 - - - - -
speed<->speed fixed 1.000 - - - - -
x1<->x1 free 0.473 0.143 3.301 0.001 0.192 0.754
x2<->x2 free 1.150 0.108 10.683 0.000 0.939 1.361
x3<->x3 free 0.878 0.077 11.471 0.000 0.728 1.028
x4<->x4 free 0.387 0.053 7.305 0.000 0.284 0.491
x5<->x5 free 0.410 0.064 6.435 0.000 0.285 0.535
x6<->x6 free 0.348 0.046 7.496 0.000 0.257 0.439
x7<->x7 free 0.930 0.086 10.859 0.000 0.762 1.098
x8<->x8 free 0.719 0.090 8.019 0.000 0.543 0.895
x9<->x9 free 0.334 0.137 2.431 0.015 0.065 0.603
Intercept
type estimate std.error z-value P(>|z|) lower upper
x1<-1 free 4.936 0.067 73.473 0.000 4.804 5.067
x2<-1 free 6.088 0.068 89.855 0.000 5.955 6.221
x3<-1 free 2.250 0.065 34.579 0.000 2.123 2.378
x4<-1 free 3.061 0.067 45.694 0.000 2.930 3.192
x5<-1 free 4.420 0.093 47.369 0.000 4.237 4.603
x6<-1 free 2.186 0.063 34.667 0.000 2.062 2.309
x7<-1 free 4.186 0.063 66.766 0.000 4.063 4.309
x8<-1 free 5.527 0.058 94.854 0.000 5.413 5.641
x9<-1 free 5.366 0.074 72.693 0.000 5.221 5.510