TU-L0022 - Statistical Research Methods D, Lecture, 2.11.2021-6.4.2022
Link between theory and data (10:50)
This video goes through the link between theory and data and how that link is built in empirical papers. It explains propositions, hypotheses, and measurement, as well as validity and reliability.
Transcript
Arguing
that your data are reliable and valid measures of the constructs in your theory
is a challenging task. In this video I will look at the link between theory and
data and how that link is built in empirical papers. The idea of
the link between theory and data is that the data are something that we observe, while the theoretical concept is quite far from the data, and we have to somehow argue that the data are related to the theory. If the data are unrelated to the theory, then we cannot claim that the data allow us to test the theory. So what exactly the nature of this link is, is something that your study needs to
address. One way to think about this issue is to introduce an empirical concept between the theoretical concept and the actual measurement result, which is your data. The idea of an empirical concept is that it is a lower-level concept than your theoretical concept and it allows you to actually collect some data. So let's take a look
at how that approach works in practice. We need an
example, and I'm going to use an example that I have used in the past. In 2005, among the 500 largest Finnish companies, there was a finding that women-led companies were 4.7 percentage points more profitable than men-led companies, and we want to make the claim that naming a woman as CEO causes profitability to increase. So our first theoretical concept here is CEO gender and the second theoretical concept is profitability, or performance. Then we have to figure out how exactly we link those two theoretical concepts to the data. How it
works is that we introduce the empirical concept, and we have used this diagram before when we were discussing inductive and deductive logic. The idea was that we start with a theoretical proposition. Then from the theoretical proposition we derive a testable hypothesis that is on a lower level of abstraction. Then we collect some data and we test for a statistical association, which allows us to make claims about the correctness of the hypothesis. The idea was that we apply deductive logic, so that if the proposition is correct then the hypothesis should be observed, and then we check whether we actually do observe it by calculating something based on our measurement results.
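As a concrete sketch of this "test for statistical association" step, the following Python example uses made-up firm-level data and a two-sample t-test; both the numbers and the choice of test are illustrative assumptions, not the analysis behind the 2005 figure.

```python
# Illustrative only: made-up ROA data for men-led and women-led firms,
# and a two-sample t-test as one possible test of statistical association.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

roa_men_led = rng.normal(loc=10.0, scale=5.0, size=470)    # hypothetical ROA (%)
roa_women_led = rng.normal(loc=14.7, scale=5.0, size=30)   # hypothetical ROA (%)

difference = roa_women_led.mean() - roa_men_led.mean()
t_stat, p_value = stats.ttest_ind(roa_women_led, roa_men_led, equal_var=False)

print(f"Difference in mean ROA: {difference:.1f} percentage points")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```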
Our focus so far has been on the proposition, the hypothesis, and the statistical association, and we haven't really discussed these arrows much. So now we're going to look specifically at what these two arrows mean. And let's
go back to our example. So the first concept was CEO gender, and we need to have an empirical concept that we can actually collect data for. For example, if the gender of the CEO is the theoretical concept, we could have the result of a medical examination as an empirical concept; that is something that we could observe data for, but it's not a practical solution. In practice, we can use a simpler empirical concept: we can define it as whether the CEO's first name is a man's name or a woman's name. That of course could have some reliability or validity problems, because we may not be able to know for sure that a name indicates a woman, as some names are used for both genders. Then we have the specific names of specific CEOs.
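As a sketch of how such a name-based coding could look in practice, the name lists below are hypothetical and only illustrate the ambiguity problem described above.

```python
# Illustrative only: coding CEO gender from first names with hypothetical
# name lists. Names that appear in both lists show the validity problem.
WOMEN_NAMES = {"Anna", "Maria", "Laura"}
MEN_NAMES = {"Juha", "Mikko", "Pekka"}

def ceo_gender_from_name(first_name: str) -> str:
    """Return 'woman', 'man', or 'unknown' based on a first-name lookup."""
    is_woman = first_name in WOMEN_NAMES
    is_man = first_name in MEN_NAMES
    if is_woman and not is_man:
        return "woman"
    if is_man and not is_woman:
        return "man"
    return "unknown"  # name missing or used for both genders

print(ceo_gender_from_name("Anna"))   # woman
print(ceo_gender_from_name("Kaino"))  # unknown
```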
The same thing applies here: we need an empirical concept. We have performance, which is the dependent variable, the theoretical concept. ROA is the empirical concept in the example, and then we have ROA data for specific firms. Now the
question is how do we justify these relationships? How do we justify that
whether the CEO's name is a man's name is a reliable and valid measure of the
theoretical concept? How do we justify here that ROA is a valid performance
measure and how do we justify that our data is reliable? Let's take
a look at ROA. So why would ROA be a valid and reliable measure? We first have to understand what reliability and validity mean here. Reliability in this figure is between return on assets, the conceptual definition of the empirical concept, and the actual data: do we get the same data again if we collect the same data for the same sample? With ROA, because it's an accounting figure that comes from a database, we concluded that it is probably highly reliable. So reliability sits here, and validity, on the other hand, is a much more challenging question. Can we
claim that return on assets is actually a valid measure of performance, and how do we do that? Reliability is fairly simple to argue: the simplest way is just to measure the same thing again and demonstrate that you get the same result; then it's reliable. So reliability is not about whether the variable actually measures what it is supposed to measure. It is simply about whether, if we did the study again, we would get the same result. Doing the study again, doing the measurement again, is a simple way of demonstrating it.
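A minimal sketch of this "measure again and compare" idea, using made-up data; summarizing test-retest reliability as a Pearson correlation is a common choice but an assumption on my part, not something specified in the lecture.

```python
# Illustrative only: test-retest reliability as the correlation between
# two measurement rounds of the same made-up sample.
import numpy as np

rng = np.random.default_rng(seed=2)

true_scores = rng.normal(size=100)
round_1 = true_scores + rng.normal(scale=0.3, size=100)  # first measurement
round_2 = true_scores + rng.normal(scale=0.3, size=100)  # repeated measurement

reliability = np.corrcoef(round_1, round_2)[0, 1]
print(f"Test-retest correlation: {reliability:.2f}")  # close to 1 means reliable
```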
Validity, on the other hand: we have to argue that return on assets is a valid performance measure. So how exactly do we do that? There are a couple of different strategies, but this is a non-statistical argument, so it's an argument based on theory and on our understanding of the phenomenon. For example, we could argue that ROA, return on assets, is a valid measure of performance because it is a performance measure that investors and managers care about. So if it's a relevant measure for the investors and managers whom we hope to inform with our study, then it's a valid measure. That's one way. Another way of thinking about it is that the purpose of a company is to generate profits and earn money for the owners; that's the purpose of a business organization. And return on assets is the money generated divided by the money invested, in terms of assets. So it's a way of standardizing, taking into account that companies of different sizes produce different amounts of results. It scales the ultimate output, which is the profits, by the company size. So that would be an argument for ROA as well.
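As a concrete version of this scaling argument, here is a short sketch using the common textbook definition of ROA (net income divided by total assets) with made-up figures.

```python
# Illustrative only: ROA scales profit by firm size, so two firms with
# very different absolute profits can be equally profitable.
def return_on_assets(net_income: float, total_assets: float) -> float:
    return net_income / total_assets

small_firm = return_on_assets(net_income=1.0, total_assets=10.0)      # 0.10
large_firm = return_on_assets(net_income=100.0, total_assets=1000.0)  # 0.10
print(small_firm, large_firm)  # same ROA despite very different sizes
```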
But this is not a statistical argument. It's an argument that this is a relevant metric, and it's based either on our theoretical understanding of what the purpose of the organization is, so that we say the measure reflects that purpose, or on arguing that it is a relevant variable for practitioners. Either way, it's a substantive instead of a methodological argument. So reliability is a statistical problem, whereas validity is a theoretical and philosophical problem: it relates to whether this is really relevant for the readers in your audience and for your theory. So most
researchers, when we do research, we apply the empirical concept as a proxy and
in practice that means that we simply assume that the empirical concept is
equal to the theoretical concept. So once we have argued that this empirical
concept has some relevance for the theory then we use it as a substitute or a
proxy for the theoretical concept. The reason for that is that we really cannot
measure a theoretical concept directly so using this empirical concept as a
proxy is the best thing that we can actually do. Let's take
a look at how the Deephouse paper does this kind of thinking. They had a proposition about strategic similarity and performance. They use relative ROA as their performance measure, the empirical concept, and strategic deviation as the empirical concept measuring strategic similarity, and then they had some data that they used to calculate the result. How do we argue that strategic deviation is a valid measure of strategic similarity? The mere fact that it's labeled similarly to strategic similarity doesn't really mean anything. The fact
that we decide to label something doesn't give it a meaning. That is called the nominalist fallacy. Claiming that our measure must capture similarity just because we decided to name it strategic similarity is not a valid argument. So how do we justify it? We talked about ROA on the previous slide, so that's simple. For strategic similarity, their argument is basically that which asset categories a bank holds is one of the most important strategic decisions of commercial banks. That's the argument for why they take these different asset categories into consideration. Then they claim that previous research has summarized the different asset categories they use for calculating the deviation in a certain way, and they use the same approach, using another study as the justification.
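As an illustration only, the sketch below treats strategic deviation as the distance between a bank's asset-category proportions and the market-average proportions; the categories, figures, and the absolute-difference formula are my own simplifying assumptions and not necessarily the exact measure used in the Deephouse paper.

```python
# Illustrative only: strategic deviation as the distance between a bank's
# asset-category weights and the market-average weights. Categories and
# figures are hypothetical; the formula may differ from Deephouse's measure.
import numpy as np

# Rows: banks, columns: hypothetical asset categories
# (e.g. real estate loans, commercial loans, consumer loans, securities).
asset_weights = np.array([
    [0.40, 0.30, 0.20, 0.10],
    [0.35, 0.35, 0.20, 0.10],
    [0.10, 0.20, 0.30, 0.40],   # this bank deviates from the others
])

market_average = asset_weights.mean(axis=0)
strategic_deviation = np.abs(asset_weights - market_average).sum(axis=1)
print(strategic_deviation)  # larger value = less similar strategy
```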
So the way you argue for validity, there are a couple of different approaches. You first have to explain the relevance of the variables or the data for your theory; in this case, the asset categories are relevant for banks. Then, for the actual measurement approach, you either have to justify it yourself or you can say that others have used this approach and have provided the justification. If you do the latter, you must be careful to actually check that the paper you cite provides a justification, because sometimes researchers use completely unjustified measures, and just the fact that something has been published with a measurement approach doesn't necessarily make that measurement approach valid. So you have to look at the actual validity claims and validity evidence in published studies when you decide which measurement approach to use.