TU-L0022 - Statistical Research Methods D, Lecture, 2.11.2021-6.4.2022
Statistical inference (12:38)
This video explains statistical inference, which is the task of making statistical claims about the population.
Transcript
Now we start talking about statistical inference, which refers to the task of making statistical claims about the population. Our question now is: we have an observation that the return on assets difference is 4.7 percentage points. Is it a big deal? Does it matter? What does the data tell us, and what kind of inferences can we make from this sample? 4.7 percentage points is a pretty big difference, but so what? What does it mean? What the data tell us directly is that at one point in time, and in one sample, firms led by women are more profitable. That's what the data tell us, and now the question is: can we generalize? Can we say something beyond that particular sample? Can we say that this generalizes to other years, or is it just one year? If it's just one year, and the women-led companies happened to be more profitable but it wouldn't generalize to other years, then it's not a big deal. If it generalizes to other years, then it's probably a big deal. The second question is: does it generalize to other firms? Is it just these 500 companies in which the women-led companies are more profitable, or does it generalize to the thousand largest companies, or all companies in Finland, or all companies in all countries? How do we generalize, and how widely can we generalize?
We are not yet discussing causality or causal claims, only claims that there is an association between two variables, or a difference between two groups, in a population, based on sample data.
And our example now comes from the Talouselämä 500 magazine that I covered in a previous video. This is a Finnish business magazine that follows the 500 largest Finnish companies. In one particular year, 2005, there were big headlines in Finnish newspapers, because on this list the return on assets of women-led companies was 4.7 percentage points higher than that of men-led companies.
The 4.7 percentage points is a sample statistic: a number calculated from a sample. The first question that we need to ask when we start discussing the generalizability of a sample statistic is: does it generalize to the population? We have to ask: could this be by chance only? Is it possible that, because of sampling variation, the companies that were led by women just happened to have a better year than the companies that were led by men? Could it be just a random occurrence, or is it evidence of a systematic difference?
To answer whether it could be by chance only, we have to ask two important questions. The first is: is 4.7 percentage points a large difference? Large differences rarely occur by chance only; small differences occur by chance only frequently. When we calculate something from a sample, the sample estimate is hardly ever exactly the population value; it's somewhere close. So is it far enough to say that it's improbable that this kind of result could occur by chance only? Or is it close enough to the no-difference value that it actually makes no difference?
Then we have to look at whether it is a large effect. The mean ROA is about 10% in this sample, and a 4.7 percentage point difference would mean that if the men-led companies have, let's say, an 8% ROA, then the women-led companies have about a 13% ROA. So they are more than 50% more profitable than the men-led companies (roughly 12.7 / 8 ≈ 1.6). That's a big thing. That's a big difference.
The second important question relates to sample size. We know that the full sample is 500 companies, but that's not the full story. We also have to consider how many women-led companies there are. If there are just five, or if there are 250, those two conditions would lead to very different conclusions. It happens to be that there were 22 women in the sample, so that's a fairly small number of observations.
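To get a feel for why the number of women-led companies matters so much, here is a minimal sketch in Python. The ROA standard deviation of 15 percentage points is an assumed, illustrative value, not a figure from the Talouselämä data:

```python
import math

def se_of_difference(sd, n_group, n_rest):
    """Standard error of a difference between two group means,
    assuming the same ROA standard deviation sd in both groups."""
    return math.sqrt(sd**2 / n_group + sd**2 / n_rest)

sd_roa = 15.0  # assumed ROA standard deviation in percentage points (illustrative only)

print(se_of_difference(sd_roa, 22, 478))    # ~3.3 points: 22 firms give a noisy comparison
print(se_of_difference(sd_roa, 250, 250))   # ~1.3 points: equal groups would be far more precise
```

With only 22 women-led companies, even a difference of several percentage points is within the range that sampling noise alone can produce.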
Now to the question of statistical inference. We want to see whether this return on assets difference of 4.7 percentage points is large enough that we can conclude that there probably is a systematic difference, and that it is not due to sampling fluctuations only.
We have to ask the question: what would be the probability of getting this kind of difference by chance only? You watched the video about John Rauser. What would John Rauser do in this scenario? We have 500 companies, and we want to know whether the difference between the women-led companies and the men-led companies could occur by chance only. One strategy for answering that question is to do a permutation analysis, or permutation test, which is a fairly intuitive way of understanding statistical testing. What we do is take the list of the largest companies. I got the data from a database, so these may not be the exact same 500 companies, but it doesn't matter for the example. We choose 22 companies at random, compare them against the remaining 478, and calculate the difference: the mean of the 22 companies compared to the mean of the 478 companies. We repeat this 10,000 times and see what the differences look like. What is the probability of getting at least a 4.7 percentage point difference in these comparisons?
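Here is a minimal sketch of that permutation test in Python. The file name and the roa and female_led column names are hypothetical placeholders; the procedure is the one just described:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per company with its ROA (in percentage points)
# and an indicator for whether the company is led by a woman.
firms = pd.read_csv("talouselama500.csv")          # 500 firms
observed = (firms.loc[firms.female_led == 1, "roa"].mean()
            - firms.loc[firms.female_led == 0, "roa"].mean())

rng = np.random.default_rng(1)
diffs = np.empty(10_000)

for i in range(10_000):
    # Shuffling the labels picks 22 firms at random and compares them
    # against the remaining 478.
    labels = rng.permutation(firms["female_led"].values)
    diffs[i] = firms["roa"][labels == 1].mean() - firms["roa"][labels == 0].mean()

# Share of random splits with a difference at least as large as the observed one
p_value = np.mean(diffs >= observed)
print(observed, p_value)
```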
So let's take a look at the results. I did the analysis, and here are the first 200 comparisons. We can see that quite often, when we take 22 companies at random and compare them against the 478 remaining companies, the difference is very close to zero: no difference. Sometimes we get a negative difference. There is no systematic difference, and there cannot be, because I chose the companies randomly, and two random samples are always comparable.
But we do get differences larger than 4.7 in 9 of the 200 comparisons using this permutation testing strategy. So the probability of getting a difference of 4.7 percentage points or larger in this test is 0.045 for the first 200 replications. Is that enough evidence to conclude that the 4.7 percentage point difference is unlikely to be by chance only? Let's take a look at the bigger picture. Here we have the distribution of the estimates over 10,000 repeated samples. Sometimes we get a large negative estimate, sometimes a large positive estimate, but typically we get an estimate close to zero, where there is no difference. There should not be any difference, because we are taking a random sample from the population and comparing it to another random sample, so because of the randomization there shouldn't be any systematic differences.
The probability of getting a difference of 4.7 percentage points or larger over the 10,000 replications is 0.0347. This probability is called the p-value.
The p-value is the probability of observing an effect equally large or larger when there is in fact no effect. We don't actually have to do the permutation testing or the random sampling, because this shape here looks familiar: it's the normal distribution. We see that the differences are normally distributed, and many quantities in statistics follow the normal distribution. So instead of approximating this distribution by taking random samples, we only need to find out what the right normal distribution is, that is, where we draw the distribution, and then compare against that normal distribution.
So here is the normal distribution overlaid on the observed distribution of estimates. The mean of the normal distribution is zero, which is our base case of no difference. We also need to know the dispersion of the normal distribution, the standard deviation, and this standard deviation is estimated using the standard error, which the statistical software will print out for us. So we draw a normal distribution with its mean at 0, which is the null hypothesis value of no difference, and with its dispersion quantified by the standard error.
Then we compare how probable it is, the size of this tail area, to get an estimate of 4.7 percentage points or larger given the null hypothesis. That probability is about 0.04, which is less than 0.05, the conventional criterion for statistical significance.
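This normal approximation takes only a couple of lines. The standard error of 2.7 percentage points below is an illustrative value chosen to be roughly consistent with the result above, not a number reported in the lecture:

```python
from scipy.stats import norm

estimate = 4.7     # observed ROA difference, in percentage points
std_error = 2.7    # illustrative standard error; in practice the software reports it

# One-sided p-value: the area under a normal curve with mean 0 (no difference)
# and standard deviation equal to the standard error, to the right of 4.7.
p_value = norm.sf(estimate, loc=0, scale=std_error)
print(round(p_value, 3))   # roughly 0.04 with these illustrative numbers
```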
Could it be by chance only? The p-value is less than 0.05. If this were a research paper, we would conclude that there is a statistically significant difference, and we would write a paper. We would hopefully get it published somewhere, because we have a statistically significant result.
Of course, we have to consider that, in this particular scenario, there are probably reporters who want to say something positive about women. So they could do multiple comparisons: they could compare growth, profitability, and other important statistics.
And if they happen to find one statistic that makes women look better, then they'll write a newspaper article about it. The p-value works well when you do just one comparison, but because of the nature of the test, we will eventually get large effects by chance only. If we repeat this study every year, for example, checking profitability, liquidity, and growth over ten years, we have 30 comparisons. One of those comparisons will almost certainly give us p less than 0.05 by chance only.
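A quick check of that claim, assuming for simplicity that the 30 comparisons are independent:

```python
# Probability of at least one p < 0.05 result among 30 comparisons
# when there is no real effect in any of them.
alpha = 0.05
n_comparisons = 30
prob_false_positive = 1 - (1 - alpha) ** n_comparisons
print(round(prob_false_positive, 2))   # ~0.79
```

So under independence, the chance of at least one spurious "significant" finding is already close to 80%.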
So p less than 0.05 is not very strong evidence. It is some evidence if it is just one comparison, but if we do multiple comparisons, we can do this kind of data mining and always get a p-value that is less than 0.05. If the p-value were less than 0.001, then I would buy the claim that there is actually an effect in the population.