TU-L0022 - Statistical Research Methods D, Lecture, 2.11.2021-6.4.2022
Simultaneous equations approach to mediation (15:58)
This video presents an alternative approach to calculating a mediation
model using a covariance matrix, or in this case a correlation matrix.
Parallels are drawn between path analysis and regression analysis to
explain how the model is fitted to the data.
Transcript
The simultaneous
equation approach is another way of calculating a mediation model. How
this approach works is that we take the mediation model as one large
model and instead of estimating the regressions of y and m separately,
we derive the model implied covariance matrix. I will be using the
correlation metric here for simplification but in practice we work with
covariances nearly always. So we
look at, for example, the correlation between X and Y. We find that we
can go from X to Y using two different paths: we go from X through m to Y,
and we go from X to Y directly. So that gives us two elements: the
mediation effect, beta m1 times beta y2, plus the direct effect, beta y1.
Similarly, we can calculate the correlation between m and Y: it is the
direct path plus the spurious correlation due to X, which is a common
cause of both. That gives us the correlation between m and Y here.
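As a small illustration of these tracing rules, the model implied correlations can be written out in R with made-up standardized coefficients (the numeric values below are arbitrary and only show the mechanics):

# Hypothetical standardized path coefficients
beta_m1 <- 0.5   # X -> m
beta_y1 <- 0.3   # X -> Y (direct effect)
beta_y2 <- 0.4   # m -> Y

# Model implied correlations from the tracing rules
cor_xm <- beta_m1                       # single path from X to m
cor_xy <- beta_y1 + beta_m1 * beta_y2   # direct path plus mediated path
cor_my <- beta_y2 + beta_m1 * beta_y1   # direct path plus spurious path through X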
How we estimate this model is that we find the betas so that the data
correlation matrix and the model implied correlation matrix are as close
to one another as possible. To understand the calculation, we have to
first take a look at the degrees of freedom, because that's important
for this particular problem. The degrees of freedom for this model are
calculated based on these correlations. Because we only use information
from the correlations, we don't look at the individual observations. Our
units of information, the data, are the five elements of the correlation
matrix that depend on the model parameters. Importantly, the variance of
X doesn't count, because it doesn't depend on any of the model
parameters. So we have these five elements: the variance of m, the
variance of Y, and all the correlations that depend on the model. So we
have five units of data.
Then we have five things that we estimate, five free parameters: three
regression coefficients and two variances, the variance of this error
term and the variance of that error term. So we estimate five different
things, and the degrees of freedom for this model is then zero, because
it's the difference between these two.
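To make the counting concrete, a back-of-the-envelope version of the same calculation:

# Degrees of freedom for the partial mediation model
data_moments <- 5   # var(m), var(Y), cor(X,m), cor(X,Y), cor(m,Y)
free_params  <- 5   # three regression coefficients + two error variances
data_moments - free_params   # 0, so the model is just identified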
And we say that this is a 'just identified' model. Just identified means
that we can estimate the model, but we are using all the information
from the data to estimate it and we could not add anything more to the
model. It also means that the model will fit perfectly; what that means
I will explain a bit later in the video.
So when we have a just identified model, it means that we can find the
values of these variances and the betas so that the model implied
correlation matrix matches the data correlation matrix exactly. We can
do that, for example, using the lavaan package in R; lavaan gives us
output, and you can do the same with the sem command in Stata. The
output contains, importantly, two different sections.
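A minimal sketch of how such a model could be estimated with lavaan, assuming three variables named x, m, and y summarized by a correlation matrix; the matrix values and the sample size below are made up:

library(lavaan)

# Hypothetical observed correlation matrix and sample size
R <- matrix(c(1.0, 0.5, 0.4,
              0.5, 1.0, 0.5,
              0.4, 0.5, 1.0),
            nrow = 3,
            dimnames = list(c("x", "m", "y"), c("x", "m", "y")))

# Partial mediation model: m regressed on x, y regressed on m and x
partial_model <- '
  m ~ x
  y ~ m + x
'
fit_partial <- sem(partial_model, sample.cov = R, sample.nobs = 200)
summary(fit_partial)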
So we have the estimation information, and here it is not particularly
useful: because our degrees of freedom are zero, we can't do model
testing. If we had positive degrees of freedom, we could test the model,
and I'll talk more about that later in the video. And then we have the
coefficients.
So we have the regressions: the regression of Y on m and X, which gives
beta y1 and beta y2, and the regression of m on X, which gives beta m1.
Then we have the estimated error variances of m and Y. We get the
estimates, the standard errors, and these Z values, which are not t
values because this is based on large sample theory. And then we get
p-values for these estimates.
Then we can also calculate, using this package, the mediation effect: we
define it in the model, and the software calculates it for us
automatically, along with its standard error, Z value, and p-value. We
also have the total effect, which is the effect of X on Y that goes
directly plus the effect that goes through m. So the total effect is the
influence of X on Y regardless of whether it goes directly or through m.
And then the direct effect is just beta y1.
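In lavaan syntax, the mediation, total, and direct effects can be defined with labeled parameters, roughly like this (the labels b_m1, b_y1, and b_y2 are just illustrative names, and R and the sample size carry over from the earlier sketch):

partial_model <- '
  m ~ b_m1 * x
  y ~ b_y2 * m + b_y1 * x
  indirect := b_m1 * b_y2          # mediation effect
  total    := b_y1 + b_m1 * b_y2   # direct plus mediated effect
  direct   := b_y1                 # direct effect only
'
fit_partial <- sem(partial_model, sample.cov = R, sample.nobs = 200)
summary(fit_partial)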
So that gives us the estimates for the partial mediation model.
Importantly, these estimates will be exactly the same as those you get
from regression analysis: if you estimate this model as separate
regressions, you will get the exact same results.
There will be differences once we start to estimate models that are over
identified, for example, if we directly estimate a full mediation model.
So we're saying that there is no path from X to Y; we estimate the model
where we assume that all effects of X on Y go through m. When we apply
the tracing rules again, we can see that the equations are a bit simpler
here, because we only go from X to Y using this one path, beta m1 times
beta y2.
So there's no direct path anymore from X to Y; it's only this product.
And this model has positive degrees of freedom. The data are the same,
so we have five units of data, but we now only have four parameters that
we estimate: two regression coefficients and two error variances. The
degrees of freedom is the difference, so we have one degree of freedom,
and we call this an over identified model.
The problem, or feature, whichever you want to call it, of these over
identified models is that generally we cannot make the model implied
correlation matrix exactly equal the data correlation matrix. Instead of
making those the same and solving, we have to make the model implied
correlation matrix as close as possible to the data correlation matrix.
To make the model implied correlation matrix as close as possible to the
data correlation matrix, we have to define what we mean by close. So we
have to define how we quantify the distance: how different the model
implied correlation matrix is from the data correlation matrix. This
problem of quantifying the difference between these two matrices is
comparable to regression analysis.
In regression analysis we use a discrepancy function: we calculate the
difference between the regression line and the actual observations. To
do that we calculate the residuals, the differences between the line and
the observations, and we take the squares of the residuals. The idea of
taking squares is that we want to avoid large prediction errors: we are
OK with small prediction errors, but we want to avoid large ones. Then
we take the sum of these squares, and that gives us the ordinary least
squares estimator. We minimize that sum, which gives us the regression
coefficients.
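As a small sketch of that idea (y, X, and b here are generic placeholders, not objects from the lecture):

# Ordinary least squares discrepancy: sum of squared residuals
ols_discrepancy <- function(y, X, b) {
  e <- y - X %*% b   # residuals: observed minus predicted values
  sum(e^2)           # squaring penalizes large prediction errors heavily
}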
In path analysis we calculate the difference between each unique cell of
the observed correlation, or covariance, matrix and the corresponding
cell of the model implied correlation, or covariance, matrix. We raise
those differences to the second power. The idea, again, is that we want
to avoid models that explain some parts of the data really badly, and we
are kind of OK with models that are slightly off compared to the data.
Then we sum these squared differences, and that gives us the unweighted
least squares estimator.
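The path analysis counterpart can be sketched the same way, assuming observed and implied are correlation (or covariance) matrices of the same variables:

# Unweighted least squares discrepancy between two matrices
uls_discrepancy <- function(observed, implied) {
  d <- observed - implied                 # residual correlation matrix
  sum(d[lower.tri(d, diag = TRUE)]^2)     # sum of squared unique elements
}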
There's another parallel between path analysis and regression analysis.
Besides minimizing the discrepancy function, which gives us estimates
that are in some way ideal, the discrepancy can also be used to quantify
the goodness of fit of the model. One definition of R-squared in
regression analysis is based on these sums of squares: we calculate the
regression sum of squares and compare it to the total sum of squares,
and that gives us the R-squared. Here we have the sum of squares of
these covariance errors, and that can be used to quantify the model fit
as well. Let's take a look.
So I estimated the model, and here is the estimation information again.
We have one degree of freedom for this full mediation model, and we have
a p-value that is non-significant. I'll go through the p-value shortly.
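With lavaan, the test statistic, its degrees of freedom, and the p-value can be extracted, for example, like this (using the hypothetical fit_full object from the earlier sketch):

fitMeasures(fit_full, c("chisq", "df", "pvalue"))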
The idea of the p-value is that it quantifies how different the actual
observed correlation matrix is from the implied correlation matrix. The
difference between the observed correlation matrix and the model implied
correlation matrix is called the residual correlation matrix. Again,
there's a parallel to regression analysis residuals: when we work with
raw observations, as in regression analysis, a residual is the
difference between an actual observation and its predicted value. Here,
when we work with correlations, a residual is the difference between a
predicted correlation and an observed correlation. So this residual
correlation matrix is basically the observed correlations minus the
implied correlations, and you can verify that this is actually the case
here.
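In lavaan, the residual correlation matrix can be inspected directly, for example (again with the hypothetical fit_full object):

residuals(fit_full, type = "cor")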
So the question that the p-value answers is whether this small residual
correlation can be due to chance only: is it possible that sampling
error in the observed correlation matrix produces this kind of
discrepancy? Here the residual is close to zero, so we can say that it
is probably due to chance. But if it was far from zero, then we would
know that this model doesn't adequately explain the correlation between
X and Y, and we would probably conclude that X also has a direct effect
on Y, so it would be a partial mediation model instead of the full
mediation model specified here.
So that's the test here: the p-value of about 0.7 indicates that getting
this kind of discrepancy by chance only is plausible. This is called an
over identification test, because we have one degree of freedom and we
are testing whether that one degree of freedom is consistent with the
model. Here we want to accept the chi-square test's null hypothesis,
which is different from how we normally use hypothesis tests. The reason
is that in regression analysis we are normally interested in showing
that the null hypothesis that a coefficient is zero is not supported,
because we usually want to say that there is an effect.
Now we want to say that there is no difference between the model implied
matrix and the actual matrix. We are saying that the model implied
matrix fits the data well, and therefore we can conclude that the model
is in some sense correct. So we want to accept the null hypothesis. If
we reject the null hypothesis, then we conclude that this model is
inadequate for the data and we shouldn't make many inferences based on
the model estimates. Instead, we should look at why the model doesn't
explain the data well and perhaps adjust the model, for example, by
adding the direct path from X to Y.
Now, here we have just one statistic, so we could simply compare this
statistic against an appropriately chosen normal distribution. We don't
do that; instead we use the chi-square test. The reason is that for more
complicated models there is typically more than one element of the
residual correlation matrix that is nonzero. When we ask whether this
small difference can be by chance only, we can look at the normal
distribution and how far from zero the estimate is, and that gives us
the p-value: we compute the Z value, the estimate divided by its
standard error.
If we have two cells here that are different from zero, then we have to
test that both are zero at the same time. So we are looking at the
plane: instead of looking at one variable, we look at two variables and
how far they are from zero. You may remember, from an earlier video or
from your math classes in high school, that this distance is calculated
by taking the square of one coordinate and the square of the other
coordinate, taking the sum, and then taking the square root.
In practice, we don't take the square root, because we can use a
reference distribution that takes the square root into account. So we
have the square of one estimate and the square of the other estimate; we
take the sum, and that gives us the chi-square statistic. The chi-square
is the sum of squares of two normally distributed variables when both
have a mean of zero.
So when the null hypothesis that both of these are zero holds, the
distribution is chi-square: we take one normally distributed random
variable centered at zero, we square it, we take another one, we square
it, we take the sum, and that gives us the reference distribution.
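A quick simulation sketch of this construction (the numbers are made up and only illustrate the reference distribution):

set.seed(1)
z1 <- rnorm(10000)   # first standard normal variable
z2 <- rnorm(10000)   # second standard normal variable
q  <- z1^2 + z2^2    # sum of squares
mean(q > qchisq(0.95, df = 2))   # roughly 0.05, as expected for a chi-square with 2 df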
So basically there's a parallel again: in regression we minimize the sum
of squared residuals, and here we want to minimize the sum of squares of
these differences, and we quantify those differences by looking at that
same sum of squares. We square each standardized estimate, the estimate
divided by its standard error, and the sum gives us the chi-square
statistic. So the logic is that instead of comparing just one statistic
against a normal distribution, we compare the sum of squares of two
differences against the sum of squares of two normally distributed
variables. If it's plausible that a random process of two normally
distributed variables would have produced the same distance, then we
conclude that the differences could be due to chance only.