In this example, we will show how
to use lslx
to conduct semi-confirmatory factor analysis.
The example uses data HolzingerSwineford1939
in the package
lavaan
. Hence, lavaan
must be installed.
In the following specification, x1
- x9
is
assumed to be measurements of 3 latent factors: visual
,
textual
, and speed
.
model_fa <- "visual :=> x1 + x2 + x3
textual :=> x4 + x5 + x6
speed :=> x7 + x8 + x9
visual :~> x4 + x5 + x6 + x7 + x8 + x9
textual :~> x1 + x2 + x3 + x7 + x8 + x9
speed :~> x1 + x2 + x3 + x4 + x5 + x6
visual <=> fix(1)* visual
textual <=> fix(1)* textual
speed <=> fix(1)* speed"
The operator :=>
means that the LHS latent factors is
defined by the RHS observed variables. In particular, the loadings are
freely estimated. The operator :~>
also means that the
LHS latent factors is defined by the RHS observed variables, but these
loadings are set as penalized coefficients. In this model,
visual
is mainly measured by x1
-
x3
, textual
is mainly measured by
x4
- x6
, and speed
is mainly
measured by x7
- x9
. However, the inclusion of
the penalized loadings indicates that each measurement may not be only
influenced by one latent factor. The operator <=>
means that the LHS and RHS variables/factors are covaried. If the LHS
and RHS variable/factor are the same, <=>
specifies
the variance of that variable/factor. For scale setting,
visual <=> fix(1) * visual
makes the variance of
visual
to be zero. Details of model syntax can be found in
the section of Model Syntax via ?lslx
.
lslx
is written as an R6
class. Every time
we conduct analysis with lslx
, an lslx
object
must be initialized. The following code initializes an lslx
object named lslx_fa
.
An 'lslx' R6 class is initialized via 'data' argument.
Response Variables: x1 x2 x3 x4 x5 x6 x7 x8 x9
Latent Factors: visual textual speed
Here, lslx
is the object generator for lslx
object and $new()
is the build-in method of
lslx
to generate a new lslx
object. The
initialization of lslx
requires users to specify a model
for model specification (argument model
) and a data to be
fitted (argument sample_data
). The data set must contain
all the observed variables specified in the given model. In is also
possible to initialize an lslx
object via sample moments
(see vignette("structural-equation-modeling")
).
To see the model specification, we may use the
$extract_specification()
method.
relation left right group reference matrix block type start
x1<-1/g x1<-1 x1 1 g FALSE alpha y<-1 free NA
x2<-1/g x2<-1 x2 1 g FALSE alpha y<-1 free NA
x3<-1/g x3<-1 x3 1 g FALSE alpha y<-1 free NA
x4<-1/g x4<-1 x4 1 g FALSE alpha y<-1 free NA
x5<-1/g x5<-1 x5 1 g FALSE alpha y<-1 free NA
x6<-1/g x6<-1 x6 1 g FALSE alpha y<-1 free NA
x7<-1/g x7<-1 x7 1 g FALSE alpha y<-1 free NA
x8<-1/g x8<-1 x8 1 g FALSE alpha y<-1 free NA
x9<-1/g x9<-1 x9 1 g FALSE alpha y<-1 free NA
x1<-visual/g x1<-visual x1 visual g FALSE beta y<-f free NA
x2<-visual/g x2<-visual x2 visual g FALSE beta y<-f free NA
x3<-visual/g x3<-visual x3 visual g FALSE beta y<-f free NA
x4<-visual/g x4<-visual x4 visual g FALSE beta y<-f pen NA
x5<-visual/g x5<-visual x5 visual g FALSE beta y<-f pen NA
x6<-visual/g x6<-visual x6 visual g FALSE beta y<-f pen NA
x7<-visual/g x7<-visual x7 visual g FALSE beta y<-f pen NA
x8<-visual/g x8<-visual x8 visual g FALSE beta y<-f pen NA
x9<-visual/g x9<-visual x9 visual g FALSE beta y<-f pen NA
x1<-textual/g x1<-textual x1 textual g FALSE beta y<-f pen NA
x2<-textual/g x2<-textual x2 textual g FALSE beta y<-f pen NA
x3<-textual/g x3<-textual x3 textual g FALSE beta y<-f pen NA
x4<-textual/g x4<-textual x4 textual g FALSE beta y<-f free NA
x5<-textual/g x5<-textual x5 textual g FALSE beta y<-f free NA
x6<-textual/g x6<-textual x6 textual g FALSE beta y<-f free NA
x7<-textual/g x7<-textual x7 textual g FALSE beta y<-f pen NA
x8<-textual/g x8<-textual x8 textual g FALSE beta y<-f pen NA
x9<-textual/g x9<-textual x9 textual g FALSE beta y<-f pen NA
x1<-speed/g x1<-speed x1 speed g FALSE beta y<-f pen NA
x2<-speed/g x2<-speed x2 speed g FALSE beta y<-f pen NA
x3<-speed/g x3<-speed x3 speed g FALSE beta y<-f pen NA
x4<-speed/g x4<-speed x4 speed g FALSE beta y<-f pen NA
x5<-speed/g x5<-speed x5 speed g FALSE beta y<-f pen NA
x6<-speed/g x6<-speed x6 speed g FALSE beta y<-f pen NA
x7<-speed/g x7<-speed x7 speed g FALSE beta y<-f free NA
x8<-speed/g x8<-speed x8 speed g FALSE beta y<-f free NA
x9<-speed/g x9<-speed x9 speed g FALSE beta y<-f free NA
visual<->visual/g visual<->visual visual visual g FALSE phi f<->f fixed 1
textual<->visual/g textual<->visual textual visual g FALSE phi f<->f free NA
speed<->visual/g speed<->visual speed visual g FALSE phi f<->f free NA
textual<->textual/g textual<->textual textual textual g FALSE phi f<->f fixed 1
speed<->textual/g speed<->textual speed textual g FALSE phi f<->f free NA
speed<->speed/g speed<->speed speed speed g FALSE phi f<->f fixed 1
x1<->x1/g x1<->x1 x1 x1 g FALSE phi y<->y free NA
x2<->x2/g x2<->x2 x2 x2 g FALSE phi y<->y free NA
x3<->x3/g x3<->x3 x3 x3 g FALSE phi y<->y free NA
x4<->x4/g x4<->x4 x4 x4 g FALSE phi y<->y free NA
x5<->x5/g x5<->x5 x5 x5 g FALSE phi y<->y free NA
x6<->x6/g x6<->x6 x6 x6 g FALSE phi y<->y free NA
x7<->x7/g x7<->x7 x7 x7 g FALSE phi y<->y free NA
x8<->x8/g x8<->x8 x8 x8 g FALSE phi y<->y free NA
x9<->x9/g x9<->x9 x9 x9 g FALSE phi y<->y free NA
label penalty set weight
x1<-1/g <NA> none 0 0
x2<-1/g <NA> none 0 0
x3<-1/g <NA> none 0 0
x4<-1/g <NA> none 0 0
x5<-1/g <NA> none 0 0
x6<-1/g <NA> none 0 0
x7<-1/g <NA> none 0 0
x8<-1/g <NA> none 0 0
x9<-1/g <NA> none 0 0
x1<-visual/g <NA> none 0 0
x2<-visual/g <NA> none 0 0
x3<-visual/g <NA> none 0 0
x4<-visual/g <NA> default 1 1
x5<-visual/g <NA> default 1 1
x6<-visual/g <NA> default 1 1
x7<-visual/g <NA> default 1 1
x8<-visual/g <NA> default 1 1
x9<-visual/g <NA> default 1 1
x1<-textual/g <NA> default 1 1
x2<-textual/g <NA> default 1 1
x3<-textual/g <NA> default 1 1
x4<-textual/g <NA> none 0 0
x5<-textual/g <NA> none 0 0
x6<-textual/g <NA> none 0 0
x7<-textual/g <NA> default 1 1
x8<-textual/g <NA> default 1 1
x9<-textual/g <NA> default 1 1
x1<-speed/g <NA> default 1 1
x2<-speed/g <NA> default 1 1
x3<-speed/g <NA> default 1 1
x4<-speed/g <NA> default 1 1
x5<-speed/g <NA> default 1 1
x6<-speed/g <NA> default 1 1
x7<-speed/g <NA> none 0 0
x8<-speed/g <NA> none 0 0
x9<-speed/g <NA> none 0 0
visual<->visual/g <NA> none 0 0
textual<->visual/g <NA> none 0 0
speed<->visual/g <NA> none 0 0
textual<->textual/g <NA> none 0 0
speed<->textual/g <NA> none 0 0
speed<->speed/g <NA> none 0 0
x1<->x1/g <NA> none 0 0
x2<->x2/g <NA> none 0 0
x3<->x3/g <NA> none 0 0
x4<->x4/g <NA> none 0 0
x5<->x5/g <NA> none 0 0
x6<->x6/g <NA> none 0 0
x7<->x7/g <NA> none 0 0
x8<->x8/g <NA> none 0 0
x9<->x9/g <NA> none 0 0
The row names show the coefficient names. The most two relevant
columns are type
which shows the type of the coefficient
and start
which gives the starting value. In
lslx
, many extract
related methods are
defined. extract
related methods can be used to extract
quantities stored in lslx
object. For details, please see
the section of Extract-Related Methods via ?lslx
.
After an lslx
object is initialized, method
$fit()
can be used to fit the specified model to the given
data.
lslx_fa$fit(penalty_method = "mcp",
lambda_grid = seq(.01, .60, .01),
delta_grid = c(1.5, 3.0, Inf))
The fitting process requires users to specify the penalty method
(argument penalty_method
) and the considered penalty levels
(argument lambda_grid
and delta_grid
). In this
example, the mcp
penalty is implemented on the lambda grid
seq(.01, .60, .01)
and delta grid
c(1.5, 3, Inf)
. Note that in this example
lambda = 0
is not considered because it may result in
unidentified model. All the fitting result will be stored in the
fitting
field of lslx_fa
.
Unlike traditional SEM analysis, lslx
fits the model
into data under all the penalty levels considered. To summarize the
fitting result, a selector to determine an optimal penalty level must be
specified. Available selectors can be found in the section of Penalty
Level Selection via ?lslx
. The following code summarize the
fitting result under the penalty level selected by Bayesian information
criterion (BIC).
General Information
number of observations 301
number of complete observations 301
number of missing patterns none
number of groups 1
number of responses 9
number of factors 3
number of free coefficients 30
number of penalized coefficients 18
Numerical Conditions
selected lambda 0.140
selected delta 1.500
selected step none
objective value 0.158
objective gradient absolute maximum 0.001
objective Hessian convexity 0.593
number of iterations 8.000
loss value 0.103
number of non-zero coefficients 34.000
degrees of freedom 20.000
robust degrees of freedom 20.640
scaling factor 1.032
Fit Indices
root mean square error of approximation (rmsea) 0.043
comparative fit index (cfi) 0.988
non-normed fit index (nnfi) 0.978
standardized root mean of residual (srmr) 0.030
Likelihood Ratio Test
statistic df p-value
unadjusted 30.903 20.000 0.056
mean-adjusted 29.945 20.000 0.071
Root Mean Square Error of Approximation Test
estimate lower upper
unadjusted 0.043 0.000 0.075
mean-adjusted 0.041 0.000 0.075
Coefficient Test (Std.Error = "sandwich")
Factor Loading
type estimate std.error z-value P(>|z|) lower upper
x1<-visual free 0.709 0.095 7.477 0.000 0.523 0.895
x2<-visual free 0.555 0.082 6.798 0.000 0.395 0.715
x3<-visual free 0.744 0.070 10.576 0.000 0.606 0.882
x4<-visual pen 0.000 - - - - -
x5<-visual pen -0.102 0.073 -1.411 0.158 -0.245 0.040
x6<-visual pen 0.000 - - - - -
x7<-visual pen -0.254 0.118 -2.147 0.032 -0.486 -0.022
x8<-visual pen 0.000 - - - - -
x9<-visual pen 0.323 0.075 4.287 0.000 0.175 0.471
x1<-textual pen 0.255 0.081 3.136 0.002 0.096 0.415
x2<-textual pen 0.000 - - - - -
x3<-textual pen 0.000 - - - - -
x4<-textual free 0.987 0.061 16.133 0.000 0.867 1.106
x5<-textual free 1.143 0.062 18.583 0.000 1.023 1.264
x6<-textual free 0.913 0.058 15.726 0.000 0.799 1.027
x7<-textual pen 0.000 - - - - -
x8<-textual pen 0.000 - - - - -
x9<-textual pen 0.000 - - - - -
x1<-speed pen 0.000 - - - - -
x2<-speed pen 0.000 - - - - -
x3<-speed pen 0.000 - - - - -
x4<-speed pen 0.000 - - - - -
x5<-speed pen 0.000 - - - - -
x6<-speed pen 0.000 - - - - -
x7<-speed free 0.824 0.105 7.817 0.000 0.617 1.031
x8<-speed free 0.731 0.069 10.603 0.000 0.596 0.866
x9<-speed free 0.499 0.063 7.960 0.000 0.376 0.622
Covariance
type estimate std.error z-value P(>|z|) lower upper
textual<->visual free 0.325 0.086 3.767 0.000 0.156 0.494
speed<->visual free 0.374 0.100 3.729 0.000 0.177 0.570
speed<->textual free 0.278 0.078 3.566 0.000 0.125 0.431
Variance
type estimate std.error z-value P(>|z|) lower upper
visual<->visual fixed 1.000 - - - - -
textual<->textual fixed 1.000 - - - - -
speed<->speed fixed 1.000 - - - - -
x1<->x1 free 0.671 0.113 5.950 0.000 0.450 0.892
x2<->x2 free 1.073 0.106 10.165 0.000 0.866 1.280
x3<->x3 free 0.719 0.091 7.943 0.000 0.542 0.896
x4<->x4 free 0.377 0.050 7.507 0.000 0.279 0.476
x5<->x5 free 0.415 0.060 6.869 0.000 0.297 0.534
x6<->x6 free 0.362 0.046 7.904 0.000 0.273 0.452
x7<->x7 free 0.596 0.109 5.464 0.000 0.382 0.810
x8<->x8 free 0.487 0.084 5.772 0.000 0.322 0.653
x9<->x9 free 0.540 0.064 8.431 0.000 0.414 0.666
Intercept
type estimate std.error z-value P(>|z|) lower upper
x1<-1 free 4.936 0.067 73.473 0.000 4.804 5.067
x2<-1 free 6.088 0.068 89.855 0.000 5.955 6.221
x3<-1 free 2.250 0.065 34.579 0.000 2.123 2.378
x4<-1 free 3.061 0.067 45.694 0.000 2.930 3.192
x5<-1 free 4.341 0.074 58.452 0.000 4.195 4.486
x6<-1 free 2.186 0.063 34.667 0.000 2.062 2.309
x7<-1 free 4.186 0.063 66.766 0.000 4.063 4.309
x8<-1 free 5.527 0.058 94.854 0.000 5.413 5.641
x9<-1 free 5.374 0.058 92.546 0.000 5.260 5.488
In this example, we can see that most penalized coefficients are
estimated as zero under the selected penalty level except for
x9<-visual
, which shows the benefit of using the
semi-confirmatory approach. The summarize
method also shows
the result of significance tests for the coefficients. In
lslx
, the default standard errors are calculated based on
sandwich formula whenever raw data is available. It is generally valid
even when the model is misspecified and the data is not normal. However,
it may not be valid after selecting an optimal penalty level.
lslx
provides four methods for visualizing the fitting
results. The method $plot_numerical_condition()
shows the
numerical condition under all the penalty levels. The following code
plots the values of n_iter_out
(number of iterations in
outer loop), objective_gradient_abs_max
(maximum of
absolute value of gradient of objective function), and
objective_hessian_convexity
(minimum of univariate
approximate hessian). The plot can be used to evaluate the quality of
numerical optimization.
The method $plot_information_criterion()
shows the
values of information criteria under all the penalty levels.
The method $plot_fit_index()
shows the values of fit
indices under all the penalty levels.
The method $plot_coefficient()
shows the solution path
of coefficients in the given block. The following code plots the
solution paths of all coefficients in the block y<-f
,
which contains all the regression coefficients from latent factors to
observed variables (i.e., factor loadings).
In lslx
, many quantities related to SEM can be extracted
by extract-related method. For example, the loading matrix can be
obtained by
$g
visual textual speed
x1 0.709 0.255 0.000
x2 0.555 0.000 0.000
x3 0.744 0.000 0.000
x4 0.000 0.987 0.000
x5 -0.102 1.143 0.000
x6 0.000 0.913 0.000
x7 -0.254 0.000 0.824
x8 0.000 0.000 0.731
x9 0.323 0.000 0.499
The model-implied covariance matrix and residual matrix can be obtained by
$g
x1 x2 x3 x4 x5 x6 x7 x8 x9
x1 1.3560 0.4393 0.5892 0.479 0.474 0.444 0.0755 0.246 0.423
x2 0.4393 1.3806 0.4129 0.178 0.149 0.165 0.0299 0.152 0.283
x3 0.5892 0.4129 1.2728 0.239 0.200 0.221 0.0401 0.203 0.379
x4 0.4792 0.1779 0.2386 1.350 1.095 0.901 0.1444 0.200 0.240
x5 0.4742 0.1493 0.2002 1.095 1.657 1.013 0.1618 0.204 0.226
x6 0.4435 0.1646 0.2208 0.901 1.013 1.196 0.1336 0.185 0.222
x7 0.0755 0.0299 0.0401 0.144 0.162 0.134 1.1832 0.533 0.381
x8 0.2456 0.1516 0.2034 0.200 0.204 0.185 0.5327 1.021 0.453
x9 0.4235 0.2828 0.3792 0.240 0.226 0.222 0.3812 0.453 1.014
$g
x1 x2 x3 x4 x5 x6 x7 x8 x9
x1 -0.00233 0.03195 0.00934 -0.025600 0.03358 -0.01127 -9.24e-03 -0.018274 -0.03488
x2 0.03195 -0.00115 -0.03818 -0.031028 -0.06179 -0.08290 1.27e-01 0.041960 0.03874
x3 0.00934 -0.03818 -0.00202 0.030433 0.08795 -0.02328 -4.83e-02 -0.008963 0.00539
x4 -0.02560 -0.03103 0.03043 -0.000307 -0.00284 0.00524 -7.54e-02 0.074751 -0.00298
x5 0.03358 -0.06179 0.08795 -0.002840 -0.00319 -0.00118 1.88e-02 0.023548 -0.06891
x6 -0.01127 -0.08290 -0.02328 0.005237 -0.00118 -0.00027 -1.05e-02 0.019988 -0.01352
x7 -0.00924 0.12662 -0.04829 -0.075384 0.01876 -0.01047 2.54e-05 -0.002561 0.00787
x8 -0.01827 0.04196 -0.00896 0.074751 0.02355 0.01999 -2.56e-03 -0.000536 -0.00433
x9 -0.03488 0.03874 0.00539 -0.002981 -0.06891 -0.01352 7.87e-03 -0.004329 -0.00102
plsem()
and S3
MethodsAfter version 0.6.3, the previous analysis can be equivalently
conducted by plsem()
with lavaan
style model
syntax.
model_fa <- "visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
pen() * visual =~ x4 + x5 + x6 + x7 + x8 + x9
pen() * textual =~ x1 + x2 + x3 + x7 + x8 + x9
pen() * speed =~ x1 + x2 + x3 + x4 + x5 + x6
visual ~~ 1 * visual
textual ~~ 1 * textual
speed ~~ 1 * speed"
lslx_fa <- plsem(model = model_fa,
data = lavaan::HolzingerSwineford1939,
penalty_method = "mcp",
lambda_grid = seq(.01, .60, .01),
delta_grid = c(1.5, 3.0, Inf))
An 'lslx' R6 class is initialized via 'data' argument.
Response Variables: x1 x2 x3 x4 x5 x6 x7 x8 x9
Latent Factors: visual textual speed
General Information
number of observations 301
number of complete observations 301
number of missing patterns none
number of groups 1
number of responses 9
number of factors 3
number of free coefficients 30
number of penalized coefficients 18
Numerical Conditions
selected lambda 0.140
selected delta 1.500
selected step none
objective value 0.158
objective gradient absolute maximum 0.001
objective Hessian convexity 0.593
number of iterations 8.000
loss value 0.103
number of non-zero coefficients 34.000
degrees of freedom 20.000
robust degrees of freedom 20.639
scaling factor 1.032
Fit Indices
root mean square error of approximation (rmsea) 0.043
comparative fit index (cfi) 0.988
non-normed fit index (nnfi) 0.978
standardized root mean of residual (srmr) 0.030
Likelihood Ratio Test
statistic df p-value
unadjusted 30.902 20.000 0.057
mean-adjusted 29.945 20.000 0.071
Root Mean Square Error of Approximation Test
estimate lower upper
unadjusted 0.043 0.000 0.075
mean-adjusted 0.041 0.000 0.075
Coefficient Test (Std.Error = "sandwich")
Factor Loading
type estimate std.error z-value P(>|z|) lower upper
x1<-visual free 0.709 0.095 7.478 0.000 0.523 0.895
x2<-visual free 0.555 0.082 6.797 0.000 0.395 0.715
x3<-visual free 0.744 0.070 10.574 0.000 0.606 0.882
x4<-visual pen 0.000 - - - - -
x5<-visual pen -0.103 0.073 -1.413 0.158 -0.245 0.040
x6<-visual pen 0.000 - - - - -
x7<-visual pen -0.254 0.118 -2.146 0.032 -0.486 -0.022
x8<-visual pen 0.000 - - - - -
x9<-visual pen 0.323 0.075 4.290 0.000 0.176 0.471
x1<-textual pen 0.255 0.081 3.136 0.002 0.096 0.415
x2<-textual pen 0.000 - - - - -
x3<-textual pen 0.000 - - - - -
x4<-textual free 0.987 0.061 16.133 0.000 0.867 1.106
x5<-textual free 1.143 0.062 18.583 0.000 1.023 1.264
x6<-textual free 0.913 0.058 15.726 0.000 0.799 1.027
x7<-textual pen 0.000 - - - - -
x8<-textual pen 0.000 - - - - -
x9<-textual pen 0.000 - - - - -
x1<-speed pen 0.000 - - - - -
x2<-speed pen 0.000 - - - - -
x3<-speed pen 0.000 - - - - -
x4<-speed pen 0.000 - - - - -
x5<-speed pen 0.000 - - - - -
x6<-speed pen 0.000 - - - - -
x7<-speed free 0.824 0.105 7.815 0.000 0.617 1.030
x8<-speed free 0.731 0.069 10.607 0.000 0.596 0.866
x9<-speed free 0.499 0.063 7.960 0.000 0.376 0.622
Covariance
type estimate std.error z-value P(>|z|) lower upper
textual<->visual free 0.325 0.086 3.766 0.000 0.156 0.494
speed<->visual free 0.374 0.100 3.733 0.000 0.178 0.570
speed<->textual free 0.278 0.078 3.565 0.000 0.125 0.431
Variance
type estimate std.error z-value P(>|z|) lower upper
visual<->visual fixed 1.000 - - - - -
textual<->textual fixed 1.000 - - - - -
speed<->speed fixed 1.000 - - - - -
x1<->x1 free 0.671 0.113 5.950 0.000 0.450 0.892
x2<->x2 free 1.073 0.106 10.164 0.000 0.866 1.280
x3<->x3 free 0.719 0.091 7.945 0.000 0.542 0.897
x4<->x4 free 0.377 0.050 7.507 0.000 0.279 0.476
x5<->x5 free 0.415 0.060 6.869 0.000 0.297 0.534
x6<->x6 free 0.362 0.046 7.903 0.000 0.273 0.452
x7<->x7 free 0.596 0.109 5.466 0.000 0.382 0.810
x8<->x8 free 0.487 0.084 5.769 0.000 0.322 0.653
x9<->x9 free 0.540 0.064 8.430 0.000 0.414 0.666
Intercept
type estimate std.error z-value P(>|z|) lower upper
x1<-1 free 4.936 0.067 73.473 0.000 4.804 5.067
x2<-1 free 6.088 0.068 89.855 0.000 5.955 6.221
x3<-1 free 2.250 0.065 34.579 0.000 2.123 2.378
x4<-1 free 3.061 0.067 45.694 0.000 2.930 3.192
x5<-1 free 4.341 0.074 58.452 0.000 4.195 4.486
x6<-1 free 2.186 0.063 34.667 0.000 2.062 2.309
x7<-1 free 4.186 0.063 66.766 0.000 4.063 4.309
x8<-1 free 5.527 0.058 94.854 0.000 5.413 5.641
x9<-1 free 5.374 0.058 92.546 0.000 5.260 5.488