TU-L0022 - Statistical Research Methods D, Lecture, 2.11.2021-6.4.2022
This course space end date is set to 06.04.2022
Structural regression models (15:38)
What are structural equation
models? How do they differ from regression with sum scales? What are the
benefits of SEMs and why should you use them? What should you know
before trying to apply SEMs?
This video explains the basic idea of structural regression models, which are sometimes referred to as structural equation models in the literature. What is a structural regression model? The technique is used, for example, in Mesquita and Lazzarini's paper, where it is described as a combination of factor analysis and path analysis. Path analysis is essentially regression analysis with multiple equations: for example, when you estimate a mediation model using the simultaneous-equations approach, that is a path analysis. So path analysis is regression with observed variables, except that there is more than one dependent variable, and factor analysis is the analysis where we check what different indicators have in common and whether we can group those indicators and consider them measures of the same concept.
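Since path analysis is just regression with several simultaneous equations, the idea can be sketched in a few lines of Python with simulated data. This is an illustrative sketch, not an analysis from the paper; the variable names and effect sizes are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulate a simple mediation model: X -> M -> Y plus a direct X -> Y path
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)             # a-path, true value 0.5
y = 0.4 * m + 0.3 * x + rng.normal(size=n)   # b-path 0.4, direct effect 0.3

def ols(dep, preds):
    """Least-squares slopes for the predictors (intercept added, then dropped)."""
    design = np.column_stack([np.ones(len(preds)), preds])
    coefs, *_ = np.linalg.lstsq(design, dep, rcond=None)
    return coefs[1:]

# Two equations estimated side by side -- this is the "path analysis" part
a = ols(m, x.reshape(-1, 1))[0]              # X -> M
b, c = ols(y, np.column_stack([m, x]))       # M -> Y and X -> Y
indirect = a * b                             # mediated (indirect) effect

print(f"a={a:.2f}, b={b:.2f}, direct={c:.2f}, indirect={indirect:.2f}")
```

With this much data the estimates land close to the simulated values, and the indirect effect is simply the product of the two paths.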
SEM, or the structural regression model, combines these two analysis approaches. To understand what SEM is and what it does, we can start from the basic regression model. The basic regression model makes the important assumption that X1 and X2 are measured without any measurement error. That is, X1 and X2 are the quantities of interest themselves rather than measures of the quantities of interest: X1 is of direct interest instead of being a measure, possibly with some error, of a concept that we cannot observe directly. Regression analysis makes that assumption, and if the assumption of no measurement error fails, the regression coefficients beta 1 and beta 2 will be inconsistent and biased. Then we have the factor analysis model. The idea of the factor analysis model is that we have a set of indicators, we ask what those indicators have in common, and what they have in common is the factor.
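The bias caused by violating the no-measurement-error assumption is easy to demonstrate by simulation. A minimal sketch, with a reliability of 0.5 chosen purely for illustration: when the predictor is observed with error, the estimated slope shrinks toward the true slope multiplied by the reliability.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

x_true = rng.normal(size=n)                 # the quantity of interest
y = 1.0 * x_true + rng.normal(size=n)       # true slope = 1.0

# Observed measure = true score + error; reliability = 1 / (1 + 1) = 0.5
x_obs = x_true + rng.normal(scale=1.0, size=n)

def slope(dep, pred):
    """Simple-regression slope: cov(x, y) / var(x)."""
    return np.cov(pred, dep)[0, 1] / np.var(pred, ddof=1)

b_true = slope(y, x_true)   # close to the true slope, 1.0
b_obs = slope(y, x_obs)     # attenuated toward reliability * slope = 0.5

print(f"error-free slope = {b_true:.2f}, error-contaminated slope = {b_obs:.2f}")
```

The attenuated estimate does not get better with more data; it converges to the wrong value, which is what "inconsistent" means here.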
In confirmatory factor analysis we specify the structure - for example, that these indicators represent one factor - and the computer tells us whether that structure is consistent with the data. In exploratory factor analysis, which is not part of the structural regression model, the computer finds the factors for us.
So we define a factor structure and then we estimate it. That is the factor-analysis part of the structural regression model.
The idea of the structural regression model is that we take these two analysis approaches and combine them. We have a regression model in which, instead of indicators that are possibly contaminated with measurement error, we model the regression between the latent variables X1, X2, and Y, and then we attach the factor analysis directly to the model.
So we have a combination of factor analysis and regression analysis between the factors in the factor analysis. This is clearly a more complicated setup than simply applying regression analysis to scale scores. The model has two parts. The inner part with the latent variables is referred to as the latent variable model. Some people call this part the structural model, but that is a bit misleading, because the measurement relationships are equally structural in the sense that they also have theoretical causal interpretations.
The outer part, linking the measures to the factors, is called the measurement model, and this is a uniformly accepted definition: whenever anyone talks about a measurement model, it means the part that links the latent variables to their indicators. So this is a big, complicated model, and the question is - this is clearly more complicated than taking a sum of indicators and using regression analysis - why would you want to use the more complicated approach? The structural regression model has a couple of advantages over regression analysis with scale scores. Let's look at this example. We have the concepts A and B represented by two latent variables, and then we have their indicators. Each indicator's variance consists of variance due to concept A or concept B plus different sources of measurement error: random noise and some item uniqueness that is not related to the concept the indicator is supposed to measure.
When we take a sum of the indicators of A and a sum of the indicators of B, all the sources of variation, including the measurement errors, end up in the sums. So the sums are a combination of mostly variation of interest but also some variation that is not of interest. When we then estimate the regression coefficient beta, the estimate will be too small - it is attenuated - and it is inconsistent and biased. So what can SEM bring us that helps with this problem? The idea of SEM, or the structural regression model, is that instead of taking sums of the indicators we estimate the factor model and a regression analysis between the factors. The idea of confirmatory factor analysis was that you take the variation of the indicators apart: for example, the variation of the indicators b1, b2, and b3 is modeled as being due to the factor and due to the measurement error components. Because the factors are presumed to be free of measurement error, the correlation between the factors - the beta - is going to be correct. The advantage is that a structural regression model, or structural equation model, corrects for measurement error. The correction comes with certain assumptions that I will explain a bit later in this video, but that is the basic idea: if your model is correct, then measurement error is controlled for.
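SEM performs this correction by estimating the factor model; the classical Spearman correction for attenuation makes the same point in closed form. A sketch with simulated data - three indicators per concept and a population correlation of 0.6, both chosen for illustration. Note that the simulation computes the sums' reliability from the known latent scores, which a real analysis cannot do; in practice SEM estimates the equivalent quantities from the factor loadings.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
rho = 0.6  # population correlation between concepts A and B

# Draw correlated latent scores for the two concepts
cov = np.array([[1.0, rho], [rho, 1.0]])
latent = rng.multivariate_normal([0.0, 0.0], cov, size=n)
A, B = latent[:, 0], latent[:, 1]

# Three indicators per concept, each contaminated with unit-variance noise
a_items = A[:, None] + rng.normal(size=(n, 3))
b_items = B[:, None] + rng.normal(size=(n, 3))

# Sum scales: measurement error ends up in the sums, attenuating the correlation
sum_a, sum_b = a_items.sum(axis=1), b_items.sum(axis=1)
r_sums = np.corrcoef(sum_a, sum_b)[0, 1]   # roughly 0.75 * rho = 0.45

# Reliability of each sum = variance due to the latent score / total variance
rel = np.var(3 * A) / np.var(sum_a)        # roughly 9 / (9 + 3) = 0.75

# Spearman disattenuation: divide by the square root of the two reliabilities
r_corrected = r_sums / np.sqrt(rel * rel)  # recovers roughly 0.6

print(f"sum-scale r = {r_sums:.3f}, disattenuated r = {r_corrected:.3f}")
```

The attenuated correlation sits well below the population value, while the corrected one recovers it - the same correction that SEM builds into the estimation itself.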
The practical outcome is presented here. This is from a paper that I have written, where we simulated datasets from two concepts, each measured with three indicators, so six indicators in total. We take a sum of the first three indicators and a sum of indicators 4, 5, and 6, and we calculate the correlation between those two sums. We vary how much the concepts correlate in the population, from 0.0 to 0.6, and we replicate the analysis 300 times. We estimate the correlation using SEM and using sum scales with regression analysis. We can see clearly that when we take a sum of the indicators and apply regression analysis - regardless of whether we take an unweighted sum or use weights that maximize reliability, there is not much difference - the correlations will be too small, because measurement error ends up in the sum of the scale items either way.
In SEM - because we model not the correlation between two sums but the correlation between two factors - the effect is unbiased. We can see that the estimates are correct: they are roughly normally distributed around the true value. So SEM provides this advantage in accuracy, which is a good thing if you can apply the technique well. There is also another advantage of SEM that I have demonstrated in earlier videos: testing the model. In the confirmatory factor analysis example we had the chi-square test that tells us whether the factor model fits the data, and if it does not, you have to do diagnostics. In the mediation example we also had the chi-square test that tells us whether the full mediation model fits the data well or not.
The idea of the chi-square test, again, is to test whether the constraints implied by the model are close enough to the correlations in the data that we can say the differences are due to chance only. Here we want to not reject the null hypothesis, because rejecting the null hypothesis - that the discrepancies between the implied and observed correlations are due to chance only - means we have to conclude that the model is not correctly specified and we need to do diagnostics to understand why. So this is the second advantage of structural regression models: they allow you to test whether the model fits the data. Regression analysis does not allow you to test the model; it only allows you to assess how much the model explains the data, not whether the model is correct. That is the second big advantage.
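Under multivariate normality, the test statistic behind this is (N - 1) times the maximum-likelihood discrepancy between the sample covariance matrix S and the model-implied covariance matrix Sigma. A sketch with a fully specified Sigma - no free parameters, so df = p(p+1)/2; real SEM software additionally subtracts the number of estimated parameters - comparing a correct model against an independence model whose constraints contradict the data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 500, 3
true_cov = np.array([[1.0, 0.5, 0.5],
                     [0.5, 1.0, 0.5],
                     [0.5, 0.5, 1.0]])
data = rng.multivariate_normal(np.zeros(p), true_cov, size=n)
S = np.cov(data, rowvar=False)

def chi_square_fit(S, sigma, n, n_free_params=0):
    """Likelihood-ratio fit statistic T = (n - 1) * F_ML and its p-value."""
    p = S.shape[0]
    f_ml = (np.log(np.linalg.det(sigma)) - np.log(np.linalg.det(S))
            + np.trace(S @ np.linalg.inv(sigma)) - p)
    df = p * (p + 1) // 2 - n_free_params
    T = (n - 1) * f_ml
    return T, df, stats.chi2.sf(T, df)

# Model 1: implied covariance equals the data-generating one -> small T
T1, df1, p1 = chi_square_fit(S, true_cov, n)
# Model 2: independence model -> its zero-correlation constraints fail badly
T2, df2, p2 = chi_square_fit(S, np.eye(p), n)

print(f"correct model:      T = {T1:6.1f}, df = {df1}, p = {p1:.3f}")
print(f"independence model: T = {T2:6.1f}, df = {df2}, p = {p2:.2g}")
```

The correct model yields a small statistic (the discrepancies are chance-sized), while the misspecified model is rejected decisively - exactly the diagnostic signal described above.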
There are also other advantages of SEM - for example, we can model relationships that go both ways, such as reciprocal causation, but that is more advanced. These are the reasons why people typically apply structural regression models, or SEMs, instead of regression with sum scales. There is also a slippery slope toward SEM. Whenever you have a scale with multiple items, you should apply a factor analysis. So every time you have a survey instrument, for example, and you get data, you run a factor analysis - you must do that, for example, to calculate coefficient alpha to assess reliability. Then, if you do an exploratory factor analysis, in most cases a confirmatory factor analysis would actually be better, because it is a bit more rigorous, it allows you to test whether the model is correct, and in cases where exploratory factor analysis cannot find a solution, it is possible that confirmatory factor analysis still works, because you provide the solution instead of requiring the computer to find it for you. But then, if you apply confirmatory factor analysis, instead of taking sums of indicators and using those in regression analysis, you really should be using the structural regression model, because it is again more rigorous, it controls for measurement error, and it allows you to do more tests overall. So every time you do a survey or any other multiple-item measurement, you must do a factor analysis; if you do a factor analysis, it is better to apply CFA; and if you do CFA, it is better to apply structural regression models than regression analysis with sum scales.
This is all good, but there are reasons why you probably should not apply structural regression models as your first analysis technique. If structural regression models are so much better than regression with sum scales, why would you not use them? That is the question, and there are good reasons.
The first reason not to use structural regression models is that they are more complicated to apply, and that has two implications. The first implication is that if you are a beginner and want to get your first paper or first conference publication out, doing that with regression of sum scales is easier, and you can get more done with regression analysis than with SEM. With SEM it is possible that when you give the computer your data, it does not give you any results at all. That does not happen with regression analysis, and if it happens with SEM, you need expertise to get the model to work. The second implication is that if you know a slightly suboptimal tool well - regression analysis cannot deal with measurement error the way structural equation models can - it is nevertheless better to use that technique than a more complicated technique that you may not understand very well. It is better to have results that you know are done correctly using a slightly suboptimal technique than results done with the state-of-the-art technique when you are not sure whether they are done correctly. So I would encourage you to first learn to do regression analysis really well, and only then move to the more complicated techniques.
SEM also has some statistical issues. SEM requires that the model is correctly specified: if your model is not correct, the SEM results can be highly misleading. Model correctness means that the measurement model must be correctly specified - each indicator must load on the factors it is claimed to measure - and all the causal relationships between the factors must be correctly specified. Otherwise the results can be very misleading. What helps you here is the chi-square test. If the chi-square test rejects the model, that means the model is incorrect for the data in some way; you have to understand why and do diagnostics. That requires expertise, and unless you do it, the results can be wildly misleading. It is probably easier to get misleading results with structural regression models than with regression analysis with sum scores. My personal take is that if you know how to use structural regression models well, you should probably always use them as your main analysis technique instead of regression analysis.
Then again, I have the impression that most people who apply structural regression models, or structural equation models, probably do not understand these techniques well enough to use them in a way that lets us rely on the results being correct. That is a big problem, and for that reason I recommend that people start with regression analysis instead. Finally, if you want to get started with structural regression models, study a good book. There are so many different ways things can go wrong, and my favorite SEM book is Kline's Principles and Practice of Structural Equation Modeling. He concludes the book with a nice chapter on how to fool yourself with SEM, listing at least 52 different things that can go wrong. You need to know these things before you apply the technique, because otherwise you will have problems and your results may not be trustworthy. But it is a technique worth learning in the long run, because it allows you to do things that you cannot do with regression analysis.