The structural model

The structural model specifies the relationships between constructs (i.e., the statistical representations of concepts) via paths (arrows) and associated path coefficients. The path coefficients, sometimes also called structural coefficients, express the magnitude of the influence exerted by the construct at the start of the arrow on the construct at the arrow's end. In composite-based SEM, constructs are always operationalized (not modeled!) as composites, i.e., weighted linear combinations of their respective indicators. Consequently, depending on how a given construct is modeled, such a composite may either serve as a proxy for an underlying latent variable (common factor) or as a composite in its own right. Despite this crucial difference, we stick with the common, although somewhat ambivalent, notation and represent both the construct and the latent variable (which is only one possible kind of construct) by $\eta$. Let $x_{kj}$ $(k = 1, \dots, K_j)$ be an indicator (observable) belonging to construct $\eta_j$ $(j = 1, \dots, J)$ and $w_{kj}$ be a weight. A composite is defined as:

$$\hat{\eta}_j = \sum^{K_j}_{k = 1} w_{kj} x_{kj}$$

Again, $\hat{\eta}_j$ may represent a latent variable $\eta_j$ but may also serve as a composite in its own right, in which case we would essentially say that $\hat{\eta}_j = \eta_j$ and refer to $\eta_j$ as a construct instead of a latent variable. Since $\hat{\eta}_j$ generally does not have a natural scale, weights are usually chosen such that $\hat{\eta}_j$ is standardized. Therefore, unless otherwise stated:

$$E(\hat\eta_j) = 0 \quad\quad \text{and} \quad\quad Var(\hat\eta_j) = E(\hat\eta^2_j) = 1$$

Since the relations between concepts (or their statistical siblings, the constructs) are a product of the researcher's theory and the assumptions to be analyzed, some constructs are typically not directly connected by a path. Technically, this implies restricting the path between constructs $j$ and $i$ to zero. If all constructs of the researcher's model are connected by a path, we call the structural model saturated. If at least one path is restricted to zero, the structural model is called non-saturated.
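The distinction can be sketched with a path-coefficient matrix. In this hypothetical example (invented coefficients, recursive model), `B[i, j]` holds the effect of construct $j$ on construct $i$, and a zero below the diagonal represents a path restricted to zero:

```python
import numpy as np

# Hypothetical path-coefficient matrices for J = 3 constructs
# (lower triangular for a recursive model).
B_saturated = np.array([
    [0.0, 0.0, 0.0],
    [0.4, 0.0, 0.0],
    [0.3, 0.5, 0.0],
])
B_nonsaturated = np.array([
    [0.0, 0.0, 0.0],
    [0.4, 0.0, 0.0],
    [0.0, 0.5, 0.0],   # path from construct 1 to construct 3 restricted to zero
])

def is_saturated(B):
    """A recursive structural model is saturated iff every
    below-diagonal path is free (represented here as non-zero)."""
    lower = B[np.tril_indices_from(B, k=-1)]
    return bool(np.all(lower != 0))

print(is_saturated(B_saturated), is_saturated(B_nonsaturated))
```

Treating a numerical zero as "restricted" is a simplification for illustration; in practice the restriction is part of the model specification, not of the estimated values.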

The reflective measurement model

Define the general reflective (congeneric) measurement model as:

$$x_{kj} = \eta_{kj} + \varepsilon_{kj} = \lambda_{kj}\eta_j + \varepsilon_{kj} \quad\text{for}\quad k = 1, \dots, K_j \quad\text{and}\quad j = 1, \dots, J$$

Call $\eta_{kj} = \lambda_{kj}\eta_j$ the (indicator) true/population score and $\eta_j$ the underlying latent variable supposed to be the common factor or cause of the $K_j$ indicators connected to latent variable $\eta_j$. Call $\lambda_{kj}$ the loading or direct effect of the latent variable on its indicator. Let $x_{kj}$ be an indicator (observable), $\varepsilon_{kj}$ be a measurement error and
$$\hat{\eta}_j = \sum^{K_j}_{k = 1} w_{kj} x_{kj} = \sum^{K_j}_{k = 1} w_{kj} \eta_{kj} + \sum^{K_j}_{k = 1} w_{kj} \varepsilon_{kj} = \bar\eta_{j} + \bar\varepsilon_{j} = \eta_j\sum_{k=1}^{K_j}w_{kj}\lambda_{kj} + \bar\varepsilon_{j},$$

be a proxy/test score/composite/stand-in for $\eta_j$ based on a weighted sum of observables, where $w_{kj}$ is a weight to be determined and $\bar\eta_j$ the proxy true score, i.e., a weighted sum of (indicator) true scores. Note the distinction between what we refer to as the indicator true score $\eta_{kj}$ and the proxy true score, which is the true score for $\hat\eta_j$ (i.e., the true score of a score that is in fact a linear combination of (indicator) scores!).

We will usually refer to $\hat\eta_j$ as a proxy for $\eta_j$, as this stresses the fact that $\hat\eta_j$ is generally not the same as $\eta_j$ unless $\bar\varepsilon_{j} = 0$ and $\sum_{k=1}^{K_j}w_{kj}\lambda_{kj} = 1$.
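To make the decomposition concrete, here is a small simulation sketch (hypothetical numbers, NumPy rather than cSEM itself) that draws indicator scores from the reflective model and verifies that the proxy splits exactly into the proxy true score and the weighted error term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: one latent variable eta with K = 3 indicators,
# x_k = lambda_k * eta + eps_k, Var(eta) = 1, uncorrelated errors.
n = 100_000
lam = np.array([0.8, 0.7, 0.6])                          # loadings
eta = rng.standard_normal(n)                             # latent scores
eps = rng.standard_normal((n, 3)) * np.sqrt(1 - lam**2)  # error scores
x = eta[:, None] * lam + eps                             # indicator scores

w = np.array([0.4, 0.35, 0.25])                          # some weights
eta_hat = x @ w                                          # the proxy

# Decomposition: eta_hat = eta * sum_k(w_k * lambda_k) + eps_bar
eta_bar = eta * (w @ lam)                                # proxy true score
eps_bar = eps @ w                                        # weighted error
assert np.allclose(eta_hat, eta_bar + eps_bar)
```

Because $\sum_k w_k\lambda_k \neq 1$ and $\bar\varepsilon_j \neq 0$ in this sketch, `eta_hat` and `eta` differ, which is exactly why $\hat\eta_j$ is called a proxy.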

Assume that $E(\varepsilon_{kj}) = E(\eta_j) = Cov(\eta_j, \varepsilon_{kj}) = 0$. Further assume that $Var(\eta_j) = E(\eta^2_j) = 1$ to determine the scale.

It often suffices to look at a generic test score/latent variable. For the sake of clarity, the index $j$ is therefore dropped unless it is necessary to avoid confusion.

Note that most of the classical literature on quality criteria such as reliability is centered around the idea that the proxy $\hat\eta$ is in fact a simple sum score, which implies that all weights are set to one. The treatment here is more general, since $\hat{\eta}$ is allowed to be any weighted sum of related indicators. Readers familiar with the "classical treatment" may simply set the weights to one (unit weights) to "translate" the results into the known formulae.

Based on the assumptions and definitions above the following quantities necessarily follow:

$$
\begin{align}
Cov(x_k, \eta) &= \lambda_k \\
Var(\eta_k) &= \lambda^2_k \\
Var(x_k) &= \lambda^2_k + Var(\varepsilon_k) \\
Cor(x_k, \eta) &= \rho_{x_k, \eta} = \frac{\lambda_k}{\sqrt{Var(x_k)}} \\
Cov(\eta_k, \eta_l) &= Cor(\eta_k, \eta_l) = E(\eta_k\eta_l) = \lambda_k\lambda_lE(\eta^2) = \lambda_k\lambda_l \\
Cov(x_k, x_l) &= \lambda_k\lambda_lE(\eta^2) + \lambda_kE(\eta\varepsilon_k) + \lambda_lE(\eta\varepsilon_l) + E(\varepsilon_k\varepsilon_l) = \lambda_k\lambda_l + \delta_{kl} \\
Cor(x_k, x_l) &= \frac{\lambda_k\lambda_l + \delta_{kl}}{\sqrt{Var(x_k)Var(x_l)}} \\
Var(\bar\eta) &= E(\bar\eta^2) = \sum w_k^2\lambda^2_k + 2\sum_{k < l} w_k w_l \lambda_k\lambda_l = \left(\sum w_k\lambda_k \right)^2 = (\boldsymbol{\mathbf{w}}'\boldsymbol{\mathbf{\lambda}})^2 \\
Var(\bar\varepsilon) &= E(\bar\varepsilon^2) = \sum w_k^2E(\varepsilon_k^2) + 2\sum_{k < l} w_k w_lE(\varepsilon_k\varepsilon_l) \\
Var(\hat\eta) &= E(\hat\eta^2) = \sum w_k^2(\lambda^2_k + Var(\varepsilon_k)) + 2\sum_{k < l} w_k w_l (\lambda_k\lambda_l + \delta_{kl}) \\
&= \sum w_k^2\lambda^2_k + 2\sum_{k < l} w_k w_l \lambda_k\lambda_l + \sum w_k^2Var(\varepsilon_k) + 2\sum_{k < l} w_k w_l \delta_{kl} \\
&= Var(\bar\eta) + Var(\bar\varepsilon) = (\boldsymbol{\mathbf{w}}'\boldsymbol{\mathbf{\lambda}})^2 + Var(\bar\varepsilon) = \boldsymbol{\mathbf{w}}'\boldsymbol{\mathbf{\Sigma}}\boldsymbol{\mathbf{w}} \\
Cov(\eta, \hat\eta) &= E\left(\sum w_k \lambda_k \eta^2\right) = \sum w_k\lambda_k = \boldsymbol{\mathbf{w}}'\boldsymbol{\mathbf{\lambda}} = \sqrt{Var(\bar\eta)}
\end{align}
$$
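As a quick plausibility check, a couple of these implied moments can be verified by simulation. This is a hypothetical NumPy sketch with invented loadings and weights, not cSEM code:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate the reflective model with uncorrelated errors (delta_kl = 0).
n = 500_000
lam = np.array([0.8, 0.7, 0.6])
w = np.array([0.4, 0.35, 0.25])

eta = rng.standard_normal(n)
eps = rng.standard_normal((n, 3)) * np.sqrt(1 - lam**2)
x = eta[:, None] * lam + eps
eta_hat = x @ w

# Cov(x_k, eta) = lambda_k  (means are zero in the population)
cov_x_eta = (x * eta[:, None]).mean(axis=0)
# Cov(eta, eta_hat) = w'lambda
cov_eta_hat = (eta * eta_hat).mean()

assert np.allclose(cov_x_eta, lam, atol=0.01)
assert abs(cov_eta_hat - w @ lam) < 0.01
```

The tolerances only account for Monte Carlo error; the population identities hold exactly.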

where $\delta_{kl} = Cov(\varepsilon_{k}, \varepsilon_{l})$ for $k \neq l$ is the measurement error covariance and $\boldsymbol{\mathbf{\Sigma}}$ is the indicator variance-covariance matrix implied by the measurement model:

$$\boldsymbol{\mathbf{\Sigma}} = \begin{pmatrix}
\lambda^2_1 + Var(\varepsilon_1) & \lambda_1\lambda_2 + \delta_{12} & \dots & \lambda_1\lambda_K + \delta_{1K} \\
\lambda_2\lambda_1 + \delta_{21} & \lambda^2_2 + Var(\varepsilon_2) & \dots & \lambda_2\lambda_K + \delta_{2K} \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_{K}\lambda_1 + \delta_{K1} & \lambda_K\lambda_2 + \delta_{K2} & \dots & \lambda^2_K + Var(\varepsilon_K)
\end{pmatrix}$$
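This matrix can be written compactly as $\boldsymbol{\lambda}\boldsymbol{\lambda}' + \boldsymbol{\Theta}$, where $\boldsymbol{\Theta}$ collects the error variances and covariances. A hypothetical NumPy sketch with invented parameters:

```python
import numpy as np

# Hypothetical parameters for K = 3 standardized indicators.
lam = np.array([0.9, 0.8, 0.7])       # loadings
var_eps = 1 - lam**2                   # error variances
Delta = np.zeros((3, 3))               # delta_kl = 0 here (no error covariances)

# Model-implied indicator covariance matrix: Sigma = lam lam' + Theta,
# where Theta has Var(eps_k) on the diagonal and delta_kl off it.
Theta = Delta + np.diag(var_eps)
Sigma = np.outer(lam, lam) + Theta

# With standardized indicators the diagonal is lambda_k^2 + (1 - lambda_k^2) = 1.
assert np.allclose(np.diag(Sigma), 1.0)

# Consistency with the derivation above: Var(eta_hat) = w'Sigma w
w = np.array([0.5, 0.3, 0.2])
assert np.isclose(w @ Sigma @ w, (w @ lam)**2 + np.sum(w**2 * var_eps))
```

Setting `Delta` to a non-zero symmetric matrix (with zero diagonal) reproduces the general form shown above.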

In cSEM, indicators are always standardized and weights are always appropriately scaled such that the variance of $\hat\eta$ is equal to one. Furthermore, unless explicitly specified otherwise, measurement error covariances are restricted to zero. As a consequence, it necessarily follows that:

$$
\begin{align}
Var(x_k) &= 1 \\
Cov(x_k, \eta) &= Cor(x_k, \eta) \\
Cov(x_k, x_l) &= Cor(x_k, x_l) \\
Var(\hat\eta) &= \boldsymbol{\mathbf{w}}'\boldsymbol{\mathbf{\Sigma}}\boldsymbol{\mathbf{w}} = 1 \\
Var(\varepsilon_k) &= 1 - Var(\eta_k) = 1 - \lambda^2_k \\
Cov(\varepsilon_k, \varepsilon_l) &= 0 \\
Var(\bar\varepsilon) &= \sum w_k^2 (1 - \lambda_k^2)
\end{align}
$$

For most formulae this implies a significant simplification; however, for ease of comparison with the extant literature we stick with the "general form" here and mention the "simplified form" or "cSEM form" in the Methods and Formula sections.
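The scaling convention can be sketched as follows (hypothetical numbers): given any weight vector, dividing by $\sqrt{\boldsymbol{\mathbf{w}}'\boldsymbol{\mathbf{\Sigma}}\boldsymbol{\mathbf{w}}}$ yields a composite with unit variance:

```python
import numpy as np

# Hypothetical Sigma for standardized indicators with delta_kl = 0:
# off-diagonal lambda_k * lambda_l, unit diagonal.
lam = np.array([0.9, 0.8, 0.7])
Sigma = np.outer(lam, lam)
np.fill_diagonal(Sigma, 1.0)

w_raw = np.ones(3)                            # e.g. unit weights
w = w_raw / np.sqrt(w_raw @ Sigma @ w_raw)    # appropriate scaling

assert np.isclose(w @ Sigma @ w, 1.0)         # Var(eta_hat) = 1
```

The same rescaling works for any weighting scheme, since $Var(\hat\eta)$ is a quadratic form in the weights.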

Notation table

| Symbol | Dimension | Description |
|--------|-----------|-------------|
| $x_{kj}$ | $(1 \times 1)$ | The $k$'th indicator of construct $j$ |
| $\eta_{kj}$ | $(1 \times 1)$ | The $k$'th (indicator) true score related to construct $j$ |
| $\eta_j$ | $(1 \times 1)$ | The $j$'th common factor/latent variable |
| $\lambda_{kj}$ | $(1 \times 1)$ | The $k$'th (standardized) loading or direct effect of $\eta_j$ on $x_{kj}$ |
| $\varepsilon_{kj}$ | $(1 \times 1)$ | The $k$'th measurement error or error score |
| $\hat\eta_j$ | $(1 \times 1)$ | The $j$'th test score/composite/proxy for $\eta_j$ |
| $w_{kj}$ | $(1 \times 1)$ | The $k$'th weight |
| $\bar\eta_j$ | $(1 \times 1)$ | The $j$'th (proxy) true score, i.e., the weighted sum of (indicator) true scores |
| $\delta_{kl}$ | $(1 \times 1)$ | The covariance between the $k$'th and the $l$'th measurement error |
| $\boldsymbol{\mathbf{w}}$ | $(K \times 1)$ | A vector of weights |
| $\boldsymbol{\mathbf{\lambda}}$ | $(K \times 1)$ | A vector of loadings |