TU-L0022 - Statistical Research Methods D, Lecture, 2.11.2021-6.4.2022
Estimation of factor models (4:47)
This video briefly explains how confirmatory factor analysis models are estimated and what to do when models may not fit the observed data.
We will now take a look at the estimation of factor models, and particularly the confirmatory factor analysis model. This is important to understand because sometimes your factor analysis results indicate that the model doesn't fit the data, as shown by the chi-square statistic, and then you have to understand what to do. To understand what to do, you have to understand what the factor analysis actually does and what kind of relationships it models in the data.
So let's take a look at how confirmatory factor analysis models are estimated. The idea in confirmatory factor analysis estimation is that you apply tracing rules. These are the same rules that you apply in mediation models or in regression models when you estimate them from a correlation matrix. We have a factor model here, and we can specify that the correlations between a1 and a2, between a1 and b1, and of a1 with itself (which is the variance) are functions of the model parameters. We use the Greek letter phi for the factor correlation, that's a convention, and we use lambda for the factor loadings, which is also a convention. All these lambdas are different lambdas, so they have different values.
The correlation between a1 and a2 is the sum over all the distinct paths between them. We can go from a1 up to the factor A and then down to a2, and that's one path; there are no other paths from a1 to a2. We multiply everything along the way, so we have one factor loading and then another factor loading, which gives lambda a1 times lambda a2. That's the correlation between a1 and a2, assuming that these are standardized estimates.
The correlation between a1 and b1 is calculated similarly. The path goes from a1 to A, then over the factor correlation to B, and then from B down to b1, giving lambda a1 times phi times lambda b1. For the variance of a1, we have two different ways to go somewhere and come back: we can go to A and come back, or we can go to the error term e and come back. Together those two paths give the variance of a1. To estimate the model, we then calculate the model-implied correlations for all indicators and adjust the model parameters so that the implied correlations match the observed data.
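As a minimal sketch of these tracing rules, here is a small Python example. The specific loading values (0.8, 0.7, 0.6 per factor) and the factor correlation (0.4) are assumptions chosen for illustration, not values from the lecture; the matrix form Lambda Phi Lambda' plus error variances on the diagonal is simply the tracing rules applied to all indicator pairs at once.

```python
import numpy as np

# Hypothetical standardized loadings: a1-a3 load on factor A, b1-b3 on B.
lam = np.array([
    [0.8, 0.0],  # lambda a1
    [0.7, 0.0],  # lambda a2
    [0.6, 0.0],  # lambda a3
    [0.0, 0.8],  # lambda b1
    [0.0, 0.7],  # lambda b2
    [0.0, 0.6],  # lambda b3
])
phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])  # assumed factor correlation of 0.4

# Tracing rules for all pairs at once: Lambda Phi Lambda' gives the
# factor-explained part; error variances fill the diagonal so that
# each standardized indicator has variance 1.
implied = lam @ phi @ lam.T
implied += np.diag(1.0 - np.diag(implied))  # add error variances

# Compare with the hand-traced values from the transcript:
print(implied[0, 1], 0.8 * 0.7)        # corr(a1, a2) = lambda a1 * lambda a2
print(implied[0, 3], 0.8 * 0.4 * 0.8)  # corr(a1, b1) = lambda a1 * phi * lambda b1
print(implied[0, 0])                   # var(a1) = 1.0
```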
Here we have positive degrees of freedom. We are estimating altogether 13 different things from the data: six factor loadings, six error variances, and one factor correlation, and six plus six plus one is 13. We have 21 units of information, because there are 21 unique elements in a correlation matrix of 6 indicators: 6 variances and 15 unique correlations. The elements below the diagonal don't count because they duplicate the ones above it. The degrees of freedom are therefore 21 minus 13, which is 8. Having positive degrees of freedom means the model is overidentified, so we typically cannot solve it exactly. That is, we cannot find a set of model parameters such that every model-implied correlation would match the observed correlation.
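As a quick check of this counting, a few lines of Python that just restate the arithmetic from the transcript:

```python
p = 6                                # number of indicators
moments = p * (p + 1) // 2           # 6 variances + 15 unique correlations = 21
free_params = 6 + 6 + 1              # loadings + error variances + factor correlation
df = moments - free_params
print(moments, free_params, df)      # prints: 21 13 8
```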
Because we cannot solve the model exactly, we have to find a way to quantify the difference between the implied correlations and the observed correlations. We could take the sum of squared differences, which gives the unweighted least squares estimator. Typically we take a weighted sum of the squared differences between the implied and the observed correlations, and a particular set of weights produces the maximum likelihood estimator. So the idea is that we find the model parameters so that the implied correlations are as close to the observed correlations as possible. Before you can actually estimate the model, there are some other things that need to be considered. These relate to identification and scale setting, which I'll describe in the next video.
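To make the idea of minimizing this discrepancy concrete, here is a toy unweighted least squares fit in Python. This is a sketch under simplifying assumptions (standardized indicators, error variances taken as 1 minus the squared loading, and a fabricated observed matrix), not how SEM software actually works internally; the weighted and maximum likelihood estimators mentioned above use different discrepancy functions.

```python
import numpy as np
from scipy.optimize import minimize

def implied_corr(params):
    """Model-implied correlation matrix from 6 standardized loadings
    and 1 factor correlation (error variances follow from standardization)."""
    lam = np.zeros((6, 2))
    lam[:3, 0] = params[:3]            # a1-a3 load on A
    lam[3:, 1] = params[3:6]           # b1-b3 load on B
    r = params[6]                      # factor correlation
    phi = np.array([[1.0, r], [r, 1.0]])
    sigma = lam @ phi @ lam.T
    np.fill_diagonal(sigma, 1.0)       # standardized: all variances are 1
    return sigma

def uls_discrepancy(params, observed):
    """Unweighted least squares: sum of squared differences between
    the implied and the observed correlations."""
    diff = implied_corr(params) - observed
    return np.sum(diff ** 2)

# Fabricated 'observed' correlation matrix for illustration only:
# the implied matrix of known parameters plus a little noise.
rng = np.random.default_rng(1)
true = np.array([0.8, 0.7, 0.6, 0.8, 0.7, 0.6, 0.4])
observed = implied_corr(true) + 0.02 * rng.standard_normal((6, 6))
observed = (observed + observed.T) / 2
np.fill_diagonal(observed, 1.0)

start = np.full(7, 0.5)
fit = minimize(uls_discrepancy, start, args=(observed,))
print(np.round(fit.x, 2))  # estimates should land close to 'true'
```

In practice you would fit this kind of model with dedicated software rather than a hand-rolled optimizer, but the principle is the same: search for the parameter values that make the implied correlations as close as possible to the observed ones.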