TU-L0022 - Statistical Research Methods D, Lecture, 2.11.2021-6.4.2022
Link between theory and data (10:50)
This video goes through the link between theory and data and how that link is built in empirical papers. The video explains propositions, hypotheses, and measurement, and it also explains validity and reliability.
Transcript
Arguing that your data are reliable and valid measures of the constructs in your theory is a challenging task. In this video I will look at the link between theory and data and how that link is built in empirical papers.

The idea of the link between theory and data is that data is something we observe, and quite far from the data is the theoretical concept; we have to somehow argue that the data are related to the theory. If the data are unrelated to the theory, then we cannot claim that the data allow us to test the theory. So what exactly the nature of this link is, is something that your study needs to address. One way to think about this issue is to introduce an empirical concept between the theoretical concept and the actual measurement result, which is your data. The idea of an empirical concept is that it is a lower-level concept than your theoretical concept, and it allows you to actually collect some data. So let's take a look at how that approach works in practice.
We need an example, and I'm going to use one I have used in the past. In 2005, among the 500 largest Finnish companies, women-led companies were 4.7 percentage points more profitable than men-led companies, and we want to make the claim that naming a woman as a CEO causes profitability to increase.

So our first theoretical concept here is CEO gender, and the second theoretical concept is profitability, or performance. Then we have to figure out how exactly we link those two theoretical concepts to the data.
How it works is that we introduce the empirical concept. We have used this diagram before, when we were discussing inductive and deductive logic. The idea was that we start with a theoretical proposition. From the theoretical proposition we derive a testable hypothesis that is on a lower level of abstraction. Then we collect some data and test for a statistical association, which allows us to make claims about the correctness of the hypothesis. The idea was that we apply deductive logic: if the proposition is correct, then the hypothesis should be observed, and we check whether we actually do observe it by calculating something based on our measurement results.

Our focus so far has been on the proposition, the hypothesis, and the statistical association, and we haven't really discussed much about these arrows here. So now we are going to look specifically at what these two arrows mean.
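To make the "statistical association" step concrete, here is a minimal sketch of how a simple group comparison could be computed for the CEO-gender example. The ROA numbers and the choice of a two-sample t-test are illustrative assumptions, not the actual analysis behind the 4.7 percentage point figure.

```python
# Minimal sketch of testing the statistical association in the CEO-gender example.
# The ROA values are made up for illustration; they are not the actual 2005 data.
from scipy import stats

roa_women_led = [0.12, 0.09, 0.15, 0.11, 0.13]
roa_men_led = [0.08, 0.07, 0.10, 0.06, 0.09]

# Two-sample t-test: is the difference in mean ROA larger than chance alone would suggest?
t_stat, p_value = stats.ttest_ind(roa_women_led, roa_men_led)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```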
Let's go back to our example. The first concept was CEO gender, and we need an empirical concept that we can actually collect data for. For example, if the gender of the CEO is the theoretical concept, we could use the result of a medical examination as the empirical concept; that is something we can observe data for, but it is not a practical solution. In practice, we can define our empirical concept as whether the CEO's first name is a man's name or a woman's name. That, of course, could have some reliability or validity problems, because we may not be able to know for sure that a name indicates a woman, since some names are used for both genders. Then we have the specific names of specific CEOs as our data.
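A minimal sketch of what such name-based coding could look like in code; the name lists and the helper function are hypothetical, intended only to show where the ambiguous-name problem enters the measurement.

```python
# Minimal sketch of coding the empirical concept "CEO gender" from the CEO's first name.
# The name sets are hypothetical; a real study would use official name statistics.
FEMALE_NAMES = {"Anna", "Maria", "Laura"}
MALE_NAMES = {"Juha", "Mikko", "Antti"}
AMBIGUOUS_NAMES = {"Kaino"}  # used for both genders: a reliability/validity problem

def code_ceo_gender(first_name: str) -> str:
    """Return 'woman', 'man', or 'unknown' based on the first name."""
    if first_name in AMBIGUOUS_NAMES:
        return "unknown"  # cannot be coded with confidence from the name alone
    if first_name in FEMALE_NAMES:
        return "woman"
    if first_name in MALE_NAMES:
        return "man"
    return "unknown"

print(code_ceo_gender("Anna"))   # woman
print(code_ceo_gender("Kaino"))  # unknown
```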
The same thing applies here: we need an empirical concept. We have performance as the theoretical concept for the dependent variable, ROA is the empirical concept in the example, and then we have ROA data for specific firms.

Now the question is, how do we justify these relationships? How do we justify that whether the CEO's name is a man's name or a woman's name is a reliable and valid measure of the theoretical concept? How do we justify that ROA is a valid performance measure, and how do we justify that our data are reliable?
Let's take a look at ROA. Why would ROA be a valid and reliable measure? We first have to understand what reliability and validity mean here. Reliability, in this figure, concerns the link between return on assets, the conceptual definition of the empirical concept, and the actual data: do we get the same data again if we collect the data for the same sample again? With ROA, because it is an accounting figure that comes from a database, we concluded it is probably highly reliable. So that is reliability; validity, on the other hand, is a much more challenging question.
Can we claim that return on assets is actually a valid measure of performance, and how do we do that? Reliability is fairly simple to argue: the simplest way would be just to measure the same thing again and demonstrate that you get the same result; then it is reliable. So reliability is not about whether the variable actually measures what it is supposed to measure. It is simply about whether, if we did the study again, we would get the same result. Doing the study again, doing the measurement again, is a simple way of demonstrating it.
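As a minimal sketch of the "measure it again" logic, assuming made-up ROA figures collected twice for the same firms, the test-retest idea could be checked like this:

```python
# Minimal sketch of the test-retest idea: collect the same measure twice for the same
# sample and check that the two rounds agree. The numbers are made up for illustration.
import numpy as np

roa_first = np.array([0.08, 0.12, 0.05, 0.15, 0.09])   # first collection round
roa_second = np.array([0.08, 0.12, 0.05, 0.15, 0.09])  # same firms, collected again

# A correlation close to 1 between the rounds indicates a highly reliable measurement.
test_retest_r = np.corrcoef(roa_first, roa_second)[0, 1]
print(f"Test-retest correlation: {test_retest_r:.2f}")
```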
Validity, on the other hand: we have to argue that return on assets is a valid performance measure. How exactly do we do that? There are a couple of different strategies, but this is a non-statistical argument; it is an argument based on theory and on our understanding of the phenomenon. For example, we could argue that ROA, return on assets, is a valid measure of performance because it is a performance measure that investors and managers care about.
study then it's a valid measure. That's one way. Another way of thinking about
is that the purpose of the company is to generate profits and earn money for
the owners so that's the purpose of a business organization. And then
return on assets is a function of that money generated divided by the money
invested in terms of assets. So it's kind of like a way of standardizing taking
into account that companies of different size produce different amount of
results. So it's scales the ultimate output which is the profits based on the
company size. So that would be an argument for ROA as well. But this is not a
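In formula form, the scaling argument above is simply profit divided by the assets used to generate it:

$$\text{ROA} = \frac{\text{Net income}}{\text{Total assets}}$$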
But this is not a statistical argument. It is an argument that this is a relevant metric, based either on a theoretical understanding of what the purpose of the organization is, so that we can say the measure reflects that purpose, or on arguing that it is a relevant variable for practitioners. Either way, it is a substantive rather than a methodological argument. So reliability is a statistical problem, whereas validity is a theoretical and philosophical problem: it is really about whether the measure is relevant for your audience, the readers, and for your theory.
Most researchers, when we do research, use the empirical concept as a proxy; in practice that means we simply assume that the empirical concept is equal to the theoretical concept. Once we have argued that the empirical concept has some relevance for the theory, we use it as a substitute, or proxy, for the theoretical concept. The reason is that we really cannot measure a theoretical concept directly, so using the empirical concept as a proxy is the best thing we can actually do.
Let's take a look at how the Deephouse paper does this kind of thinking. They had a proposition about strategic similarity and performance. They use relative ROA as their performance measure, the empirical concept, and strategic deviation as the empirical concept measuring strategic similarity, and then they had data that they used to calculate these measures.

How do we argue that strategic deviation is a valid measure of strategic similarity? The mere fact that it is labeled similarly to strategic similarity does not really mean anything.
The fact that we decide to give something a label does not give it that meaning. That is called the nominalist fallacy: claiming that just because we decided to give the measure a name that resembles strategic similarity, it must be a measure of that similarity, is not a valid argument.
So how do we justify it? We talked about ROA on the last slide, so that is simple. For strategic similarity, their argument is basically that which asset categories a bank holds is one of the most important strategic decisions of commercial banks. That is the argument for why they take these different asset categories into consideration. Then they note that previous research has summarized the asset categories that they use for calculating the deviation in a certain way, they use the same approach, and they cite that other study as justification.
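As an illustration of the general idea only, not the exact formula used in the Deephouse paper, strategic deviation can be sketched as the distance between a bank's asset-category proportions and the industry average; the categories and numbers below are hypothetical.

```python
# Illustrative sketch only (not the exact measure used in the Deephouse paper):
# strategic deviation as the distance between a bank's asset allocation and the
# industry-average allocation across asset categories.
import numpy as np

# Hypothetical asset-category proportions per bank (each row sums to 1).
banks = np.array([
    [0.50, 0.30, 0.20],  # bank A
    [0.40, 0.35, 0.25],  # bank B
    [0.10, 0.20, 0.70],  # bank C: clearly a different strategy
])

industry_mean = banks.mean(axis=0)

# Sum of absolute differences from the industry mean:
# larger values = more deviation, i.e. less strategic similarity.
strategic_deviation = np.abs(banks - industry_mean).sum(axis=1)
print(strategic_deviation)
```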
So there are a couple of different ways to argue for validity. You first have to explain the relevance of the variables, or the data, for your theory; in this case, asset categories are relevant for banks. Then, for the actual measurement approach, you either have to justify it yourself or you can say that others have used this approach and have provided the justification. If you do that, you must be careful to check that the paper you cite actually provides a justification, because sometimes researchers use completely unjustified measures, and the mere fact that something has been published with a given measurement approach does not necessarily make that measurement approach valid. So you have to look at the actual validity claims and validity evidence in published studies when you decide which measurement approach to use.