TU-L0022 - Statistical Research Methods D, Lecture, 2.11.2021-6.4.2022
Diagnostics after CFA (8:14)
This video goes through diagnostics after CFA. It explains how, after a statistical analysis, we nearly always have to do some sort of diagnostics on the results before they can be trusted. The video also covers what to do when the model does not fit, including modification indices and residuals.
Transcript
After a statistical analysis you will nearly always have to do some kind of diagnostics on the results before you can trust them. In confirmatory factor analysis the most important diagnostic information is the chi-square statistic. When the chi-square is significant, it indicates that the model did not reproduce the empirical correlation matrix completely: the model does not explain the data well enough for the residuals to be attributed to chance alone. In this case I estimated the same data set as in the empirical example, but I specified a factor model in which some factor correlations were constrained to be zero. The chi-square test detects that those correlations were not actually zero in the population, and therefore it rejects the model.

So what do we do? It is very common that the chi-square statistic rejects your model, so you cannot simply conclude that everything is well. You have to understand why the rejection occurs, which means doing some diagnostics.
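To make the test concrete, here is a minimal sketch of how the chi-square test of exact fit is computed from the sample and model-implied covariance matrices. The matrices, sample size, and degrees of freedom below are illustrative placeholders; in practice your SEM software reports this statistic for you.

import numpy as np
from scipy import stats

def chi_square_exact_fit(S, Sigma, n_obs, df):
    """ML chi-square test of exact fit: T = (N - 1) * F_ML."""
    p = S.shape[0]
    f_ml = (np.log(np.linalg.det(Sigma))           # ln|Sigma(theta)|
            + np.trace(S @ np.linalg.inv(Sigma))   # + tr(S Sigma^-1)
            - np.log(np.linalg.det(S))             # - ln|S|
            - p)                                   # - number of variables
    T = (n_obs - 1) * f_ml
    return T, stats.chi2.sf(T, df)                 # statistic, p-value

# Illustrative 3-variable example: the model forces one correlation to zero
S = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
Sigma = np.array([[1.0, 0.5, 0.4],
                  [0.5, 1.0, 0.0],
                  [0.4, 0.0, 1.0]])
T, p_value = chi_square_exact_fit(S, Sigma, n_obs=200, df=1)
print(T, p_value)   # a significant p-value rejects exact fit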
There are two main ways of doing diagnostics for a confirmatory factor analysis in an exploratory manner. Exploratory means that you do not have a prior hypothesis about what is incorrect.

The first approach is modification indices. I said earlier that your software can indicate that adding a correlation between two error terms would improve the fit of the model: it would make the chi-square smaller and, we hope, non-significant. The idea of modification indices is that the computer calculates changes you could make to your model to make it fit better. That should not be done mindlessly. Mesquita and Lazzarini give a good example of how to report modification indices. First they state the purpose of the indices: to make the model reproduce the correlation matrix better by adding something to the model. Then they explain what they actually added.

Is such a change justified? Every change to your model has to be justified by theory. For example, if we have six indicators and a modification index suggests that two error terms should be correlated, we have to explain what that correlation means. If we have indicators of innovativeness and indicators of productivity, we could argue that one indicator also measures something about personnel and another one does as well; the indicators share this personnel dimension, and therefore we say that their errors should be correlated.

In the first structural regression modeling course that I took, the instructor told us that unless a modification index gives you this kind of aha moment, you should not add anything to your model. A modification index only tells you which part of the model you should reconsider; it is up to you to decide whether the change makes sense.
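Because a modification index approximates the drop in chi-square you would get by freeing that one parameter, you can sanity-check it with a chi-square difference test after actually refitting the modified model. A minimal sketch, using hypothetical fit statistics:

from scipy import stats

# Hypothetical chi-squares before and after freeing one error covariance
chisq_base, df_base = 187.3, 120
chisq_mod, df_mod = 163.1, 119     # one parameter freed -> one df fewer
delta = chisq_base - chisq_mod     # roughly what the MI predicted
print(delta, stats.chi2.sf(delta, df_base - df_mod))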
The idea of a factor analysis model is not to reproduce the data perfectly. The idea is to have a theoretical representation of the process that could have caused your data, and it is also possible that the factor analysis simply tells you that your data do not measure the things you claim they measure. That, too, is a result. So every modification must be based on theory.
Another way of doing diagnostics is to look at the residuals. Residual correlations are the differences between the model-implied matrix and the observed correlation or covariance matrix. Here are the residuals for the full model. There are two things that we need to check. The first is the overall distribution of the residuals: it turns out that if the model is correctly specified, the residual correlations are normally distributed with mean zero. Here we can see a bump on the right-hand tail, and that bump indicates local misspecification: some part of the model is incorrectly specified. The model is mostly OK, because most of the residual correlations are close to zero, but the bumps, one big and one smaller, indicate that there are parts of the data that the model does not reproduce.
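As a rough sketch of this distribution check, assuming you can extract the observed and implied correlation matrices from your software (the matrices below are illustrative):

import numpy as np

S = np.array([[1.0, 0.5, 0.4],      # observed correlations (illustrative)
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
Sigma = np.array([[1.0, 0.5, 0.4],  # model-implied correlations
                  [0.5, 1.0, 0.0],
                  [0.4, 0.0, 1.0]])
resid = S - Sigma                   # residual correlations
off_diag = resid[np.triu_indices_from(resid, k=1)]
print(off_diag.mean(), off_diag.std())  # should be close to zero if well specified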
Then it is up to us to look at the residuals and see where the high values are. We can see that one block of items, the vertical governance and horizontal governance indicators, correlates much more than the model implies. Then we have to look at the model and think: we have an implied correlation of, say, zero, so why is it zero in the implied correlation matrix? That relates back to the tracing rules: what in the model predicts that correlation? In this case I constrained these two factors to be uncorrelated, and that is what caused the residuals to go up. It indicates that the model is misspecified, because horizontal and vertical governance are actually quite highly correlated. We can also see high values for the single-indicator factors, which I constrained to be uncorrelated with the other factors as well.

In this way you can look at the residuals, find the correlations that the model does not explain well, and then ask what influences those correlations in your model and whether that part of the model is correct. This requires a bit more expertise than just looking at modification indices. But the problem with modification indices is that sometimes they do not make any sense at all, and it is easier to make a nonsensical decision using modification indices than using residuals. So the way I do diagnostics is that if my model does not fit well, I take a quick look at the modification indices and then print out the residuals.
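When printing the residuals, a small sketch like the following can rank the cells by absolute size so the problem block stands out (the residual matrix and indicator names here are made-up illustrations):

import numpy as np

names = ["hg1", "hg2", "vg1", "vg2"]          # hypothetical indicator names
resid = np.array([[0.00, 0.02, 0.21, 0.18],   # hypothetical residuals
                  [0.02, 0.00, 0.19, 0.22],
                  [0.21, 0.19, 0.00, 0.01],
                  [0.18, 0.22, 0.01, 0.00]])
i, j = np.triu_indices_from(resid, k=1)
order = np.argsort(-np.abs(resid[i, j]))      # largest residuals first
for k in order[:3]:
    print(f"{names[i[k]]} - {names[j[k]]}: {resid[i[k], j[k]]:+.2f}")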
It may also make sense to print out only a part of the residuals. This is a big matrix, so going through it one cell at a time is difficult, but once you have identified the segment of the matrix with large values, you can fit a submodel. For example, we could fit a model with only horizontal governance, vertical governance, and perhaps one other factor. So the way to do diagnostics is that if the full model does not work, you start fitting submodels. Can you get a smaller model to work? Drop something from the model, and if the smaller model works, you know that whatever you dropped was the reason the full model did not work. You can then examine the part that you dropped, or split the model into two and do the diagnostics for the first part, then, once you are happy with that, for the second part, and finally for the full model. This is a good engineering principle: when you have a big system that does not work, start by looking at the individual parts, figure out which of them do not work and whether they can be fixed, and only after verifying all the parts do you look at the whole, because inspecting the big correlation matrix all at once is very difficult.
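As a sketch of the submodel strategy, here is how one might fit just the two problem factors in Python using the semopy package, which uses lavaan-style model syntax. The factor and indicator names are assumptions, and the random data is only a placeholder for your own data frame:

import numpy as np
import pandas as pd
import semopy

# Placeholder data; substitute your own indicator data frame
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.normal(size=(200, 6)),
                    columns=["hg1", "hg2", "hg3", "vg1", "vg2", "vg3"])

# Submodel with only the two factors whose residuals looked problematic
desc = """
HorizontalGovernance =~ hg1 + hg2 + hg3
VerticalGovernance   =~ vg1 + vg2 + vg3
"""
model = semopy.Model(desc)
model.fit(data)
print(semopy.calc_stats(model))   # chi-square and other fit indices

If this submodel fits, move on to the next block of factors, and only then refit the full model.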