TU-L0022 - Statistical Research Methods D, Lecture, 25.10.2022-29.3.2023
Interpretation with odds ratios (7:56)
This video explains how to interpret
logistic regression results and covers the concept of odds ratios with
the help of specific examples.
Interpreting the logistic regression analysis results differs a bit from normal regression analysis interpretation. Let's take a look at the results from logistic regression analysis, using the "menarche" dataset.
The R glm command gives us these results. We'll just focus on the actual coefficients for now, and leave these other things for another video. We have the estimate, which is the estimated regression coefficient. Then we have the standard error, which quantifies how much the estimate is likely to change from one sample to another if we repeat the study. We have a Z value, which is the estimate divided by the standard error. So the Z value plays the same role as the T value in regression analysis. It is called a Z statistic instead of a T statistic because the maximum likelihood estimates are based on large sample theory, and instead of comparing against the T distribution, we compare this against the normal distribution. So under the null hypothesis that this coefficient is zero in the population, and if the sample size is large enough, the Z value follows a standard normal distribution, and that allows us to calculate the p-values. So whether age has an effect or not can be interpreted from these p-values. We can see that the result for age is highly statistically significant. So we can confidently say that age has some kind of effect on the probability of having had menarche.
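As a rough illustration of how the Z value and its p-value relate to the estimate and the standard error, here is a minimal Python sketch. The estimate and standard error below are hypothetical numbers chosen for the example, not the values on the slide, and `z_test` is a helper name introduced only for this illustration.

```python
import math

def z_test(estimate, std_error):
    """Return the z statistic and the two-sided p-value under a standard normal null."""
    z = estimate / std_error
    # P(|Z| >= |z|) for Z ~ N(0, 1); erfc gives the two-tailed area directly.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers for illustration only (not taken from the glm output).
z, p = z_test(estimate=1.6, std_error=0.4)
print(f"z = {z:.2f}, two-sided p = {p:.6f}")
```

The larger the estimate is relative to its standard error, the further the Z value lies in the tail of the normal distribution and the smaller the p-value becomes.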
What the magnitude of that effect is, is a more complicated question to answer. We really can't say that the probability of having had menarche increases by about 1.6 when the girl gets one year older. One reason is that an increase of 1.6 takes us beyond the range of possible probabilities. If the probability is initially 0 and you increase age by one, the predicted probability would be 1.62. So it doesn't work that way. The reason why we can't interpret this directly is that these are the effects before the logistic function is applied. So these are effects on the linear predictor and not on the actual dependent variable. It's the same thing as when you do a log transformation of a dependent variable: the coefficient tells you what the effect is on the log scale, and you want to know what the effect is on the original scale. This coefficient here tells you what the effect is on the scale of the linear predictor. But you are not really interested in that; you are interested in what the effect is on the scale of the observed variable. So we don't interpret these directly; instead we interpret them as odds ratios.
So the odds ratio is a concept that is useful for logistic regression analysis and for some other models as well. The idea is that odds are the ratio of two outcomes. Here we have the outcome of a girl having had menarche and the outcome of not having had menarche. If 1 in 100 girls has had menarche, then the odds for having had menarche are 1 to 99, because 1 girl out of the sample of 100 has had it, and the remaining 99 haven't. One common use of odds is in gambling. If you have two soccer teams, and one has won two matches in the past and the other has won five matches in the past, then you say that based on that data the odds for the first team winning are two to five. So that's the idea of odds. More formally, if the probability of an outcome is p, then the odds are defined as p against one minus p. So it's the probability of one outcome divided by the probability of the other outcome, if you have only two possible outcomes. And if you exponentiate the logistic regression coefficients, those can be interpreted as odds ratios. The idea is that the exponentiated coefficient tells you by what factor the odds are multiplied when the independent variable increases by one unit.
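The definition of odds and the exponentiation step can be sketched in a few lines of Python. This is only an illustration: `odds` is a helper name introduced here, and the slope 1.62 is the rounded age coefficient quoted earlier in the lecture.

```python
import math

def odds(p):
    """Odds of an outcome with probability p: p against (1 - p)."""
    return p / (1.0 - p)

# 1 girl in 100 has had menarche, so the odds are 1 to 99.
print(odds(0.01))  # about 0.0101, i.e. 1/99

# Exponentiating a logistic regression coefficient gives an odds ratio.
b_age = 1.62  # rounded slope for age, as quoted in the lecture
odds_ratio = math.exp(b_age)
print(round(odds_ratio, 2))
```

With this rounded slope the odds ratio comes out at roughly 5, which matches the "about fivefold" reading used in the lecture.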
I'll show you an example. Let's take a look at the idea of the odds ratio, and why we can interpret these coefficients as odds ratios. Here are example odds for the data, with some of the results: we have the linear prediction, we have the fitted probability, and we have the fitted odds, which is the probability against the other probability, and we calculate the value. So the odds for this first girl having had menarche are 74% to 26%, which is 2.79. The odds for the second girl are 8% to 92%, which is 0.09, and so on. So these are the odds.
Next we calculate marginal predictions. In regression analysis, we are interested in the marginal effect: what is the effect on the dependent variable of increasing one independent variable by one unit, holding everything else constant. Let's calculate marginal effects now for girls of different ages. Instead of using the actual data, we have a hypothetical girl at the age of 9, 10, 11, 12 and so on. We calculate the fitted probabilities using our model and we calculate the odds. When we compare two odds here, the ratio of these two odds, which are shown as 0 but are not exactly 0, is 4.6. So every time we increase the girl's age by one, the odds are multiplied by 4.6. Every additional year increases the odds by a factor of 4.6, not by 4.6 units. That's the odds ratio interpretation: the ratio between the odds at consecutive ages is always 4.6. And how do we use that in regression analysis? Well, we calculate the odds ratios, and this is for the actual data. We calculate the odds ratio, which is about 5, and the interpretation is that each additional year of age increases the odds of having had menarche fivefold. That quantifies how large the effect of age is. We know that if something increases fivefold, then it's a pretty large effect.
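The constant ratio between the odds at consecutive ages can be checked with a short Python sketch. The slope 1.62 is the rounded value quoted in the lecture; the intercept of -21.0 is a made-up placeholder, so the fitted probabilities here will not match the slide. The point is only that the ratio of odds does not depend on the age or on the intercept.

```python
import math

B0 = -21.0  # hypothetical intercept, assumed for illustration only
B1 = 1.62   # rounded slope for age, as quoted in the lecture

def fitted_probability(age):
    """Logistic function applied to the linear predictor B0 + B1 * age."""
    eta = B0 + B1 * age
    return 1.0 / (1.0 + math.exp(-eta))

def fitted_odds(age):
    """Fitted probability against its complement."""
    p = fitted_probability(age)
    return p / (1.0 - p)

ages = [9, 10, 11, 12, 13, 14]
ratios = []
for younger, older in zip(ages, ages[1:]):
    ratio = fitted_odds(older) / fitted_odds(younger)
    ratios.append(ratio)
    print(f"age {younger} -> {older}: odds ratio {ratio:.3f}")

print(f"exp(B1) = {math.exp(B1):.3f}")
```

Every printed ratio equals exp(B1) up to rounding, because the odds are exp(B0 + B1 * age) and the intercept cancels when two odds are divided.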
The problem still is that with odds ratios, we can't really say how much the actual probability increases, because odds and probability are not the same thing. And quite often we want to know how much the probability of having had menarche depends on age, and what the effect looks like. To do that we would need to plot the marginal predictions from the model.
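To see why a constant odds ratio does not translate into a constant change in probability, the marginal predictions can be tabulated before plotting them. This is a minimal Python sketch under the same assumptions as before: the slope 1.62 is the rounded value quoted in the lecture, and the intercept of -21.0 is a hypothetical placeholder, so the numbers are illustrative rather than the model's actual predictions.

```python
import math

B0 = -21.0  # hypothetical intercept, assumed for illustration only
B1 = 1.62   # rounded slope for age, as quoted in the lecture

def predicted_probability(age):
    """Predicted probability of having had menarche at a given age."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * age)))

ages = list(range(9, 18))
probs = [predicted_probability(a) for a in ages]

# Change in predicted probability for each additional year of age.
steps = [later - earlier for earlier, later in zip(probs, probs[1:])]

for age, p in zip(ages, probs):
    print(f"age {age:2d}: predicted probability {p:.3f}")
for age, step in zip(ages, steps):
    print(f"age {age:2d} -> {age + 1:2d}: probability changes by {step:.3f}")
```

The yearly change in probability is small at the youngest and oldest ages and largest in the middle of the range, even though the odds are multiplied by the same factor every year. Plotting `probs` against `ages` would give the S-shaped curve of marginal predictions.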