TU-L0022 - Statistical Research Methods D, Lecture, 25.10.2022-29.3.2023
Maximum likelihood estimation of logistic regression model (6:39)
This video explains how the maximum likelihood estimation principle can be applied to the logistic regression model.
In this video, I will explain how the maximum likelihood estimation
principle can be applied to estimate a logistic regression model.
Our data are on girls, and the dependent variable is whether a girl has
had menarche or not. The independent variable is age, and we're fitting
the logistic curve. We can see here that when the girl's age is close
to 10, which is the minimum of the sample, the predicted probability
of menarche is about zero, and when the girl is 18, which is about the
maximum of the sample, the predicted probability of menarche is about
1. We want to estimate this logistic curve, how it goes, because it
tells us the relationship between age and menarche.
We apply probability calculations to values that are 1's and 0's,
that's the dependent variable. To do that, we use the Bernoulli
distribution. The idea of a Bernoulli distribution is that we only have
1's and 0's. In this example, the 0's are twice as prevalent as the 1's.
The population is always assumed to be very large in maximum likelihood
estimation, so that when we take one observation, a 0 or a 1, away from
the population, the ratio of 1's to 0's stays the same. The probability
of getting a 0 from this population is 67%, and the probability of
getting a 1 is 33%. Then we have this set of observed values, that's our
sample: seven 0's and two 1's. They happen to be in this order at
random; the order doesn't have any significance or meaning. We calculate
the probability of each value, and then we calculate the total
probability by multiplying all these individual probabilities together.
So when we know what the population is, we know the probability of
getting particular values from that population.
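The calculation just described can be sketched in a few lines of Python.
This is only an illustration: the order of the values in `sample` and the
helper name `sample_probability` are assumptions made for the example,
since the lecture only says that the sample has seven 0's and two 1's.

```python
# Observed sample: seven 0's and two 1's (the order here is an assumption;
# it does not affect the result).
sample = [0, 0, 1, 0, 0, 0, 1, 0, 0]

# Known population: 0's are twice as prevalent as 1's, so P(1) = 1/3.
p_one = 1 / 3


def sample_probability(values, p_one):
    """Multiply the Bernoulli probabilities of the individual values."""
    total = 1.0
    for v in values:
        # A 1 has probability p_one, a 0 has probability 1 - p_one.
        total *= p_one if v == 1 else 1.0 - p_one
    return total


prob = sample_probability(sample, p_one)
print(f"Probability of the sample: {prob:.4f}")
```

The result is small (about 0.0065) because nine probabilities, each
below 1, are multiplied together.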
In maximum likelihood estimation, the population is not known, and we
have to estimate what the effect of age on menarche is in the
population and what the base level is. So we don't talk about
probabilities, we talk about likelihoods. The idea of maximum likelihood
estimation is that we try to find the population that has the maximum
likelihood of having produced these values here. We don't know what the
mean is or what the ratio of 1's and 0's is, we only know the data, and
we assume that the model holds for the population. We make some guess
for this ratio, calculate the likelihood of each observation, multiply
them into the cumulative likelihood, and maximize the cumulative
likelihood by changing our model parameters; that gives the maximum
likelihood estimate. For example, we could guess that the ratio of 1's
to 0's is 2 to 7, which gives us probabilities of 78% and 22% for 0's
and 1's. We calculate the cumulative probability, that is, we multiply
everything together, and this is the likelihood of the sample given our
estimated population.
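As a rough sketch of this idea, the likelihood can be written as a
function of the guessed probability of a 1 and evaluated for several
guesses. The grid search below is an assumption chosen for simplicity,
not the method used by statistical software, and the order of the values
in `sample` is again arbitrary.

```python
# Observed sample: seven 0's and two 1's (order assumed for illustration).
sample = [0, 0, 1, 0, 0, 0, 1, 0, 0]


def likelihood(p_one, values):
    """Likelihood of the sample if the population has P(1) = p_one."""
    total = 1.0
    for v in values:
        total *= p_one if v == 1 else 1.0 - p_one
    return total


# Guess from the lecture: ratio of 1's to 0's is 2 to 7, so P(1) = 2/9.
lik_guess = likelihood(2 / 9, sample)

# Alternative guess: 0's twice as prevalent as 1's, so P(1) = 1/3.
lik_other = likelihood(1 / 3, sample)

# Crude maximization: try many guesses and keep the one with the
# largest likelihood.
grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=lambda p: likelihood(p, sample))

print(f"L(2/9) = {lik_guess:.4f}, L(1/3) = {lik_other:.4f}")
print(f"Best guess on the grid: {best_p:.3f}")
```

The best guess on the grid lands next to 2/9, the observed share of 1's
in the sample, which is what we would expect for a Bernoulli sample.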
The maximum likelihood estimate is simply found by changing our guess
of the ratio of 1's and 0's so that this value here becomes as large as
possible. This principle is applied to the logistic regression analysis.
The idea is that we use this logistic curve together with the known
ages and the known menarche status of these girls. We calculate the
individual likelihood for each observation, and then we use those
individual likelihoods to find the best possible logistic curve for the
data. How it works in practice is that we start with some kind of guess.
We guess that menarche is a linear function of age and an intercept,
transformed using the logistic function. Let's say that the intercept
is -20 and the effect of age is 1.54. We apply the logistic function to
the linear prediction, and that gives us the expected probabilities.
Then we check how likely each particular observation is, given the
fitted probability. For example, the first girl here is 13.6 years old
and she has had menarche. The linear prediction for that girl using this
equation here is 0.94. The fitted probability, from applying the
logistic function to this linear prediction, is 73.6%. If the
probability is 73.6% and the girl has had menarche, then the likelihood
for that observation is 73.6%. Then we move on to the next girl. She is
11.4 years old and she has not had menarche. The linear prediction,
calculated using this equation here, is -2.44, and applying the logistic
function gives us an 8% predicted probability. Because there is only an
8% probability that this girl would have had menarche given her age,
and she hasn't, the likelihood for this observation is 1 - 8%, which is
92% here.
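The two worked observations can be reproduced with a short Python
sketch. The function names are assumptions made for this example, and
the coefficients are the rounded guesses quoted in the lecture (-20 and
1.54). With these rounded values the first girl's fitted probability
comes out near 72% rather than the quoted 73.6%; the difference
presumably comes from rounding of the coefficients shown on the slide.

```python
import math

# Guessed model parameters, as quoted (rounded) in the lecture.
intercept = -20.0
slope = 1.54


def logistic(x):
    """Logistic (inverse logit) function: maps any number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_prediction(age):
    """Intercept plus the effect of age."""
    return intercept + slope * age


def observation_likelihood(age, had_menarche):
    """Likelihood of one observation given the guessed curve."""
    p = logistic(linear_prediction(age))
    # If menarche was observed, the likelihood is p; otherwise 1 - p.
    return p if had_menarche else 1.0 - p


# The two girls discussed in the lecture: (age, has had menarche).
girls = [(13.6, True), (11.4, False)]
for age, status in girls:
    print(
        f"age={age}: linear={linear_prediction(age):.2f}, "
        f"likelihood={observation_likelihood(age, status):.3f}"
    )
```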
We do that calculation for all the girls, and multiplying the
likelihoods gives us the product, 6.4%. For computational reasons, we
don't typically work with these raw likelihoods and multiply them
together. Instead, we work with logarithms. We calculate the logarithm
of the likelihood, called the log-likelihood, for each individual
observation, and we take the sum of these log-likelihoods; that gives us
the full log-likelihood of the sample. Then we adjust the value of the
intercept and the coefficient for age to make this full-sample
log-likelihood as large as possible. In practice, this is almost always
a negative number, so we try to make it closer to zero, that is, a
negative number of smaller magnitude.
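A minimal sketch of the log-likelihood calculation follows. It uses only
the two girls mentioned in the lecture, not the full sample, so the
numbers do not match the 6.4% product; the function names and the flat
comparison curve (intercept 0, slope 0) are assumptions made for
illustration. In real software an optimizer would adjust the intercept
and the slope until the sum stops increasing.

```python
import math


def logistic(x):
    """Logistic (inverse logit) function."""
    return 1.0 / (1.0 + math.exp(-x))


def log_likelihood(intercept, slope, ages, outcomes):
    """Sum of the log-likelihoods of the individual observations."""
    total = 0.0
    for age, y in zip(ages, outcomes):
        p = logistic(intercept + slope * age)
        # log(p) if menarche was observed (y == 1), log(1 - p) otherwise.
        total += math.log(p if y == 1 else 1.0 - p)
    return total


# Only the two observations quoted in the lecture, not the full data.
ages = [13.6, 11.4]
outcomes = [1, 0]

# Log-likelihood under the guess from the lecture.
ll_guess = log_likelihood(-20.0, 1.54, ages, outcomes)

# Log-likelihood under a flat curve that predicts 50% for everyone.
ll_flat = log_likelihood(0.0, 0.0, ages, outcomes)

print(f"guess: {ll_guess:.3f}, flat curve: {ll_flat:.3f}")
```

Both values are negative because each is a sum of logarithms of numbers
between 0 and 1. The guess from the lecture gives the value closer to
zero, so it fits these two observations better than the flat curve.
Summing logarithms also avoids the very small products that arise when
many likelihoods are multiplied together.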