TU-L0022 - Statistical Research Methods D, Lecture, 2.11.2021-6.4.2022
Statistical controls (11:40)
This video explains the importance and characteristics of control variables in social science research.
Because experiments are not always feasible in business research, we do
statistical controlling for alternative explanations. So that's our
second strategy for making causal claims. The idea of statistical
controls is to introduce alternative plausible explanations to your
analysis. So instead of just comparing men-led companies against
women-led companies, we introduce possible confounding factors to the
analysis.
There are a couple of different ways to do this. One intuitive way is an
instance of a general strategy called matching, so we try to make the
samples more comparable.
So let's assume that there are only a few women-led companies with more
than 250 people, and most women-led companies have 250 people or less.
We could make the samples more comparable by dropping large companies.
So we only focus on medium-sized companies with 250 people or less. And
that would be a fairer comparison. And if size actually is a factor
that influences both CEO gender, through CEO selection, and performance,
then these more comparable samples should give us a smaller performance
difference, which they do in this case. So we can see that when we make
the samples more comparable, the difference is 1.4 instead of 4.7.
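The matching idea can be sketched with a small simulation. This is only an illustration under stated assumptions: the data-generating process, the probabilities, the coefficients and the helper names (`simulate_company`, `roa_gap`) are all invented here and are not the data or the numbers from the lecture's example.

```python
import random
from statistics import mean

random.seed(0)

# Assumed true effect of having a woman as CEO on ROA (invented value).
TRUE_GENDER_EFFECT = 1.0


def simulate_company():
    """Return (employees, female_ceo, roa) for one hypothetical company."""
    employees = random.randint(10, 1000)
    # Assumption: smaller companies are more likely to have a woman as CEO.
    p_female = 0.5 if employees <= 250 else 0.1
    female_ceo = random.random() < p_female
    # Assumption: smaller companies are more profitable.
    roa = 10.0 - 0.005 * employees + random.gauss(0, 1)
    if female_ceo:
        roa += TRUE_GENDER_EFFECT
    return employees, female_ceo, roa


def roa_gap(sample):
    """Mean ROA of women-led companies minus mean ROA of men-led companies."""
    women = [roa for _, female_ceo, roa in sample if female_ceo]
    men = [roa for _, female_ceo, roa in sample if not female_ceo]
    return mean(women) - mean(men)


companies = [simulate_company() for _ in range(20000)]

# Naive comparison: all companies, so size differences leak into the gap.
gap_all = roa_gap(companies)

# More comparable samples: keep only companies with 250 people or less.
gap_matched = roa_gap([c for c in companies if c[0] <= 250])

print(f"gap, all companies:      {gap_all:.2f}")
print(f"gap, 250 people or less: {gap_matched:.2f}")
```

In this made-up setup the gap in the full sample overstates the assumed gender effect, while the gap in the restricted sample lands close to it, because within the restricted sample size no longer differs systematically between the two groups.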
For example, we could say that this difference between men-led and
women-led companies (these are just arbitrary values here) is not
because of gender; instead, we could claim that it's a company size
difference. So this overlap between gender and performance, the
correlation, is partially caused by gender, but it's partially also
because smaller companies are more likely to hire women as CEOs, and
smaller companies are more profitable. So we say that this relationship
between gender and performance is at least partially explained by size
being a factor in CEO decisions, and size being a factor in influencing
performance. So, how do we take that kind of control variable into
account?
Matching is an intuitive way of understanding statistical
controlling, but it's not a practical strategy for a couple of reasons.
First of all, when you have multiple different things that you want to
control for, constructing this kind of matched sample with this kind of
simple strategy is not a viable option anymore, because you cannot have
exactly the same companies in both samples. So once the number of
factors to be controlled increases, it's not possible to construct two
samples that are comparable on all those factors. So to take that into
consideration, we don't normally apply matching. Instead, we apply a
statistical model. So we say that return on assets depends on CEO
gender and company size, so that we can express return on assets as a
linear function of CEO gender and size: we multiply CEO gender (female
is one, male is zero) by a coefficient beta 1, we multiply company size
by another coefficient beta 2, and we add a constant beta 0. Then we ask
the computer to give us estimates for beta 0, beta 1 and beta 2, so that
we can predict the return on assets as well as possible. The computer
will do that for us, and then we interpret the results to see whether
the gender effect actually exists.
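Written out as an equation, the model described above might look like the following. The company subscript i, the variable names and the error term epsilon are notational assumptions added here; the lecture itself only names beta 0, beta 1 and beta 2.

```latex
\mathrm{ROA}_i = \beta_0 + \beta_1 \,\mathrm{Female}_i + \beta_2 \,\mathrm{Size}_i + \varepsilon_i
```

In this notation, beta 1 is the expected difference in return on assets between a women-led and a men-led company of the same size, which is the gender effect we are interested in.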
Either way, regardless of how we actually implement this statistical
controlling, we need to decide which factors we need to control for. And
the factors that we control for are called control variables. So control
variables are present in nearly every study in business research. Quite
often you actually see a section in the paper that is explicitly
labelled as control variables, like here in Hekman's paper, which we use
as an example. So control variables are alternative explanations or
alternative theories for the data. If we say that the women-led
companies are more profitable than
men-led companies, we have to think really hard, why is that the case?
Then we have all kinds of reasons, plausible reasons that we could come
up with: the size effect, industry effect, selection effect, and then we
include those in the same model. So we say that we have an independent
variable that we assume to influence the dependent variable, and we also
have control variables in the same model, and we put these variables
together to compete against one another to see which of them actually
explains the dependent variable, return on assets in this case. So it's
important that the control variables are selected based on theory,
instead of just throwing in a standard set such as gender and age if we
have people, or industry and revenue if we have companies. So you need
to choose them carefully to rule out alternative explanations, and it's
important that you justify why you think that the control variable is
related to both your independent variable and the dependent variable.
One common thing that I see in articles, and which I complain about as a
reviewer, is that the authors generally only justify the relationship
between the control and the dependent variable, but it's almost as
important that you justify why you think that the control and the
interesting independent variable, CEO gender in this case, are
correlated.
Let's take a look at an example. We have the article by Deephouse, and
they have a variable called market share. So is market share a good
control variable, based on this correlation matrix? To understand
whether it's a good control variable empirically, we have to look at
certain correlations. Market share is a relevant control variable if
it's correlated with both the key independent variable and the dependent
variable, and we are looking at the effect of strategic deviation,
variable number four, on relative return on assets, variable number one.
So we need to take a look at the correlations of market share with
variable one and variable four. Here, market share is weakly and
negatively correlated with return on assets, and it's very strongly
correlated with strategic deviation. We can't infer whether there is or
is not a causal relationship based on a correlation. But this strong
correlation raises the question: if market share has an effect on return
on assets, then because it's correlated with the strategic deviation
variable, it could create a spurious correlation. So market share is
relevant to control for if we have theoretical reasons that return on
assets depends on market share.
Let's take a look at the actual modelling results. So this is based on
Deephouse's paper. They say that market share has a negative effect on
return on assets, so that when your market share goes up, return on
assets goes down. And compared to the other effects in the article, this
is a very large effect. The effect of strategic deviation is -0.02, so
it's small; you can't compare the two directly, but we will do that for
convenience now. And the two variables are highly correlated, so what
will happen, what is the interpretation of this figure?
The interpretation is that larger firms, firms with more market share,
are more strategically deviant, according to their definition. Larger
firms are also less profitable, and these two relationships cause a
spurious relationship. If larger firms are more deviant and larger firms
have smaller ROA, it means that if this effect was not controlled for,
we would get a very different estimate for strategic deviation. If we
don't control for market share, then this effect here will be inflated,
because it confounds the effects of market share and strategic
deviation. So let's assume we leave market share out. Then our estimate
for strategic deviation would be the actual direct effect of strategic
deviation plus part of the effect of size, because size is correlated
with deviation. So the effect would be -0.058, or three times as large
as before.
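The mechanism described here can be sketched with simulated data. This is a minimal illustration under invented assumptions, not a reanalysis of Deephouse's data: only the -0.02 direct effect is taken from the lecture, while the market share coefficient (-0.05), the strength of the link between market share and deviation (0.8), the noise levels and all variable names are made up for the example.

```python
import random

random.seed(1)
n = 5000

# Hypothetical data: market share drives both deviation and ROA.
share = [random.gauss(0, 1) for _ in range(n)]
deviation = [0.8 * s + random.gauss(0, 0.6) for s in share]
roa = [-0.02 * d - 0.05 * s + random.gauss(0, 0.1)
       for d, s in zip(deviation, share)]


def cross(a, b):
    """Sum of cross-products of a and b around their means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


s_dd = cross(deviation, deviation)
s_ss = cross(share, share)
s_ds = cross(deviation, share)
s_dy = cross(deviation, roa)
s_sy = cross(share, roa)

# Long regression: ROA on deviation AND market share (closed-form OLS).
det = s_dd * s_ss - s_ds ** 2
b_dev_long = (s_dy * s_ss - s_sy * s_ds) / det
b_share_long = (s_sy * s_dd - s_dy * s_ds) / det

# Short regression: ROA on deviation only, market share omitted.
b_dev_short = s_dy / s_dd

# Auxiliary regression: omitted market share on included deviation.
delta = s_ds / s_dd

print(f"deviation effect, share controlled: {b_dev_long:.3f}")
print(f"deviation effect, share omitted:    {b_dev_short:.3f}")
print(f"long effect + share effect * delta: "
      f"{b_dev_long + b_share_long * delta:.3f}")
```

The last two printed values agree because, for ordinary least squares, the estimate from the short regression equals the estimate from the long regression plus the omitted variable's coefficient times the slope from regressing the omitted variable on the included one. With the invented numbers above, the estimate with market share omitted comes out several times larger in magnitude than the assumed direct effect.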
So omitting an important control variable would have a serious
consequence for the modelling results. In this case, it would result in
omitted variable bias, which makes the estimate three times as large as
it otherwise would be, assuming that the model is otherwise correctly
specified. Because the controls are so important for your causal claims,
you should take very seriously which variables you include, and really
think about what kind of alternative explanations there are for the
observed associations, or the association that you expect to observe.
Statistical controls and experimental approaches can be compared. So in
experiments, you have
treatment and control groups that you assign yourself, and you apply
the treatment, so you have full control over the study. And the groups
are perfectly comparable to start with, because of randomization, and if
after treatment there is a difference between the two groups, then we
can make a claim that the difference is because of the treatment, so
that's fairly simple.
In statistical controls, we don't have control over the cases, so we are
just passive observers of what happens. And the only way we can rule out
alternative explanations is to think, based on existing theory, what
kind of other plausible explanations there are for an association, and
then we rule them out using control variables in our analysis.