TU-L0022 - Statistical Research Methods D, Lecture, 2.11.2021-6.4.2022
Model comparisons (7:56)
This video explains the logic of comparing multiple regression models and the indicators used for model comparison.
In a typical research paper that uses multiple regression analysis, we estimate many different regression models. The reason for that is that we want to do model comparisons. We will now take a look at why we compare models and how we do it.
In Hekman's paper, which is our example for this video, we will be focusing on their first study. They say that they used hierarchical moderated regression analysis. So what does that mean? "Hierarchical" is the key term here: it simply means that they estimate multiple models. Start with a simple one, then add more variables and compare, then add more variables and compare again. The "moderated" part means that they have interaction terms in their model. They could just as well have said that they used regression analysis, because we nearly always use regression analysis in this hierarchical way, and it is obvious from the regression results that they contain interaction terms. So this is a somewhat unnecessarily complicated way of saying: we did regression, and we estimated multiple models.
Now let's look at the actual models, the modeling results, and the logic of multiple model comparisons. In the first model they have the control variables only, and in the second model they included some of the interesting variables. So we will be focusing on the first two models: Model 1 has control variables only, and Model 2 has the controls plus some interesting variables.
The logic of this kind of model comparison is to ask the question: do the interesting variables and the controls together explain the dependent variable better than the controls only? If the control variables and the interesting variables together don't explain the data better than the controls only, then we conclude that the interesting variables are not very useful in explaining the dependent variable, and that they don't really have an effect.
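As a sketch of this logic in code, the snippet below fits a controls-only model and a fuller model and runs a nested-model F test with statsmodels. The data file and all variable names (y, control1, x1, and so on) are hypothetical placeholders, not Hekman's actual variables.

    # A minimal sketch of a hierarchical (nested) regression comparison
    # in Python with statsmodels. The data file and all variable names
    # are hypothetical placeholders.
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    df = pd.read_csv("data.csv")  # hypothetical data set

    # Model 1: control variables only (the restricted model)
    m1 = smf.ols("y ~ control1 + control2", data=df).fit()

    # Model 2: controls plus the interesting variables (the unrestricted model)
    m2 = smf.ols("y ~ control1 + control2 + x1 + x2 + x3", data=df).fit()

    # Does Model 2 explain significantly more variance than Model 1?
    print(anova_lm(m1, m2))  # nested-model F test
    print("R-squared increase:", m2.rsquared - m1.rsquared)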
We do a model comparison by comparing the R-squared statistics. Here they report both the adjusted R-squared and the actual R-squared. If we just want to assess the magnitude of how much better Model 2 is, the more appropriate statistic in small samples is the adjusted R-squared. However, the adjusted R-squared statistic doesn't really have a well-known test, so instead of looking at the adjusted R-squared, we test the R-squared difference. They present the R-squared difference here: the difference between the first model's R-squared and the second model's R-squared, marked with some stars. So the important question is, does the second model explain the data better than the first model? The adjusted R-squared difference is 0.04, and the actual R-squared difference is 0.07, that is, seven percentage points of explained variance. So the interesting variables explain the data a bit more than the control variables only.
Now we will be focusing on these test statistics. So where do these stars come from? They come from an F test of the null hypothesis that the regression coefficients of all variables added to the model are zero. Let's look at the logic of the test. The F test between the first two models is a nested model comparison test. One model is nested in another when it is a special case of the other. In this case Model 2 is the unrestricted, or unconstrained, model, and Model 1 is the restricted, or constrained, model. So why can we say that Model 1 is a special case of the more general Model 2? The reason is that Model 1, which leaves out the interesting variables, is the same model as Model 2, except that the effects of those variables are constrained to be zero. By leaving a variable out, we constrain its regression coefficient to be zero. That is why we say that Model 1 is a constrained version of Model 2: in Model 2 the effects of the last three variables are freely estimated, while in Model 1 they are constrained to be zero.
So how do we test whether the difference in R-squared is more than what we could expect by chance alone? Remember that every time we add something to the model, the R-squared can only go up: it can stay the same or increase, and typically it increases. So is that increase in R-squared statistically significant? To answer that question, we do the F test. Let's do the F test by hand now. We first need the degrees of freedom for the first two models. The degrees of freedom for a regression model is n, the sample size, minus k, the number of estimated parameters, or regression coefficients, or variables in the model, minus 1 for the intercept.
So we have a sample that provides us with 113 units of information. For the first model we estimate the effects of 15 variables plus the intercept, so we have 113 - 15 - 1 = 97 degrees of freedom remaining for the restricted model. In the unrestricted model we estimate three more things, so it has 113 - 18 - 1 = 94 degrees of freedom. These degrees of freedom calculations are pretty simple; it's just basic subtraction.
Now we need a test statistic as well, and the F statistic can be defined based on the R-squared values: it is the R-squared difference, divided by the degrees of freedom difference, divided by a scaling term based on the unrestricted model's unexplained variance. The formula is written out below; your econometrics textbook will explain where it comes from. Importantly, what we are interested in here is how much the R-squared increases per degree of freedom consumed when we estimate the model. Quite often we compare increased explanation against increased complexity; that is a fairly general principle used in many different tests.
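Written out, this is the standard nested-model F test found in econometrics textbooks (the lecture shows it on a slide), where q is the number of restrictions, here 3, and df_ur is the unrestricted model's residual degrees of freedom, here 94:

    F = \frac{(R^2_{\mathrm{unrestricted}} - R^2_{\mathrm{restricted}}) / q}
             {(1 - R^2_{\mathrm{unrestricted}}) / df_{\mathrm{unrestricted}}}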
So we do that: we plug in the numbers and get a result of 3.22. Comparing that against the appropriate F distribution gives a p-value of 0.026, which corresponds to one significance star. They presented two stars; why, I have no idea. I have done this example in multiple classes over multiple years and I don't know why it is different. It is probably a typo in the paper, because getting that kind of difference from rounding error in the R-squared values is quite unlikely. So that is the idea of the F test: you take a constrained model and an unconstrained model, you calculate the R-squared difference per degree of freedom difference, you scale it by the unexplained variance term, and you get a test statistic that you compare against the F distribution.
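As a check on this arithmetic, the whole calculation fits in a few lines of Python. Only the R-squared difference of 0.07 is given in the lecture; the individual values of 0.25 and 0.32 below are assumptions chosen to match that difference, and they come close to the reported F of 3.22 and p of 0.026.

    # Reproducing the F test by hand. The individual R-squared values are
    # assumptions (only their difference, 0.07, is given in the lecture).
    from scipy import stats

    n = 113                    # sample size
    df_r = n - 15 - 1          # restricted model (Model 1): 97
    df_ur = n - 18 - 1         # unrestricted model (Model 2): 94
    q = df_r - df_ur           # number of restrictions: 3

    r2_r, r2_ur = 0.25, 0.32   # assumed; the 0.07 difference is from the lecture

    F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)
    p = stats.f.sf(F, q, df_ur)  # upper-tail p-value from the F distribution

    print(f"F = {F:.2f}, p = {p:.3f}")  # close to the reported F = 3.22, p = 0.026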
For more complicated models, whose behavior in small samples we don't know, we use the chi-square distribution instead of the F distribution. But the principle is the same.
In practice, your software will do these calculations for you, but it is useful to understand that they are not complicated, and to have a little understanding of the logic behind them.