TU-L0022 - Statistical Research Methods D, Lecture, 25.10.2022-29.3.2023
Bootstrapping (15:57)
This video goes through bootstrapping. It explains what bootstrapping is and when it works. The video also covers bootstrapping a regression coefficient, confidence intervals, and normal-approximation and empirical confidence intervals.
When we do statistical analysis we always get a point estimate of the effect, just one regression coefficient or one number. We also need to know how certain we are about that number, and that certainty is quantified by the standard error. The standard error quantifies the precision, and we use the standard error and the actual estimate to calculate a statistic that gives us the p-value. In some scenarios calculating the standard error is hard, or it requires assumptions that we are not willing to make, or assumptions that we know are not true for our particular data and analysis. Bootstrapping provides an alternative way of calculating standard errors, or estimating how much a statistic would vary from one sample to another; bootstrapping is a computational approach to the problem of calculating a standard error.
How bootstrapping works is that we have an original sample. So we have a sample of 10 observations here from a normally distributed population with mean of 0 and standard deviation of 1. That's our original sample here, and its mean is 0.13. If we take multiple samples from the same population, here is the sampling distribution of the sample mean when the sample size is 10 from this population. Most of the time we get values close to 0, which is the population value of the mean, and sometimes we get estimates that are far from the actual population value. The idea of bootstrapping is that if we don't know how to estimate the width or shape of this sampling distribution using statistical theory or a closed-form equation, then we can do it empirically. So instead of calculating it using an equation, we take repeated samples from our original sample. Our original sample forms the population for the bootstrap. Then we take a resample: we first draw 0.31, which is here, and then we put it back, so we allow every observation to be included in the sample multiple times. Then we randomly draw another one, 0.83, it's here, and we put it back; then we draw yet another number, and yet another, we draw -0.84 a second time, and so on. So we take these samples from the original data, and every observation can be included in the sample multiple times. Each of these is a randomly chosen number and doesn't depend on any previous choices. Using this bootstrap sample we get a sample mean of 0.34, and we repeat this many times, typically one hundred, five hundred, a thousand, or even ten thousand times depending on the complexity of the calculation. A thousand repetitions is quite normal nowadays.
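As a rough illustration of this resampling procedure, here is a minimal Python sketch (my own example, not from the lecture): it draws 10 observations from a standard normal population and then resamples them with replacement to build a bootstrap distribution of the sample mean. The sample size, number of replications, seed, and variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Original sample: 10 observations from a N(0, 1) population
x = rng.normal(loc=0, scale=1, size=10)
print("original sample mean:", x.mean())

# Bootstrap: resample the original data with replacement, many times
n_reps = 1000
boot_means = np.empty(n_reps)
for r in range(n_reps):
    resample = rng.choice(x, size=len(x), replace=True)  # each value can appear several times
    boot_means[r] = resample.mean()

# The spread of these replicate means approximates the sampling distribution of the mean
print("mean of bootstrap replicates:", boot_means.mean())
```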
We can see that this sample mean varies from sample to sample, and the distribution of the sample mean calculated from our thousand bootstrap replications is about the same shape as it would be if we took the samples from the actual population. So these two distributions are quite similar, and we can use the knowledge that these two distributions are similar. They approach each other when the sample size increases. We can use that knowledge to say that this distribution here is a good representation of that distribution, and if we want to estimate the standard deviation of that sampling distribution, which is what the standard error quantifies or estimates, then we can just use the standard deviation of the bootstrap distribution.
Here we can see that the mean of this bootstrap distribution is slightly off. That's called the bootstrap bias. The mean of the bootstrap distribution is roughly at the mean of the original sample, so it is centered not on the population mean but on the mean of this particular sample. The width of this distribution is also, in this case, slightly smaller, so the dispersion here is slightly smaller than the dispersion here, and that is also something that we sometimes need to take into consideration. The key thing in bootstrapping is that when the sample size increases, this mean and standard deviation will be closer to that mean and that standard deviation.
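To make these two quantities concrete, here is a small self-contained Python sketch (my own illustration, not from the lecture; data and seed are arbitrary) computing the bootstrap standard error and the bootstrap bias from the replicate means.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.normal(size=10)                       # original sample of 10 observations

idx = rng.integers(0, len(x), size=(1000, len(x)))
boot_means = x[idx].mean(axis=1)              # 1000 bootstrap replicate means

se_boot = boot_means.std(ddof=1)              # bootstrap estimate of the standard error
bias = boot_means.mean() - x.mean()           # bootstrap bias: replicate mean vs. sample mean
print(f"SE = {se_boot:.3f}, bias = {bias:.3f}")
```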
Let's take a look at a demonstration of how bootstrapping works. This is a video from the Department of Statistics at the University of Auckland, and they demonstrate that you have your original sample here. We have two variables, an X variable and a Y variable, and then we have a regression coefficient. So we calculate the regression coefficient here, and we are interested in how much this regression coefficient, the slope, would vary if we were to take this sample over and over from the same population. That's what the standard error quantifies. For some reason we don't want to use the normal formula that our statistical software uses to calculate the standard error; we want to do it by bootstrapping. So we take samples from the original data. You can see here that each observation can be included multiple times, and sometimes an observation is not included in the sample at all. Then we get a regression coefficient that is slightly different from the original one. We do another bootstrap sample and get another regression coefficient, again slightly different from the original one. We take yet another bootstrap sample, we get a slightly different one, and we go on a hundred times, a thousand times, and ultimately we get an estimate of how much this regression coefficient would really vary if we were to take multiple different samples. So when you have a thousand samples, or a hundred samples, you can see how much the regression coefficient varies between the bootstrap samples, and if the sample size is large enough, this variation over the bootstrap samples is a good approximation of how much the regression coefficient would vary if we were to repeatedly draw independent samples from the same population and calculate the regression analysis again and again from those independent samples. Bootstrapping can be used to calculate the standard error, in which case we just take the standard deviation of these regression slopes, and that is our standard error estimate.
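A minimal sketch of this kind of case-resampling bootstrap for a regression slope might look like the following (my own Python illustration, not the Auckland demonstration itself; the simulated data, slope of 0.5, and number of replications are arbitrary assumptions).

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Simulated original sample: y depends on x plus noise
n = 50
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs[1]

# Case resampling: draw whole (x, y) pairs with replacement and refit
n_reps = 1000
boot_slopes = np.empty(n_reps)
for r in range(n_reps):
    idx = rng.integers(0, n, size=n)
    boot_slopes[r] = slope(x[idx], y[idx])

print("original slope:", slope(x, y))
print("bootstrap SE of the slope:", boot_slopes.std(ddof=1))
```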
We can also use bootstrapping to calculate confidence intervals. The idea of a confidence interval is that instead of estimating a standard error and a p-value, we estimate a point estimate, for example a value of a correlation, one single value, and then we estimate an interval, let's say a 95% interval, which has an upper limit and a lower limit. If we repeat the calculation many times from independent samples, then the population value will be within the interval, if it is a valid interval, 95% of the time.
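To illustrate that coverage property, here is a small Python simulation (my own example, not from the lecture, and it uses a simple normal-theory interval for a mean rather than a correlation): it draws many independent samples, computes a 95% interval from each, and counts how often the interval contains the true population value.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
true_mean, n, n_sims = 0.0, 30, 2000

covered = 0
for _ in range(n_sims):
    sample = rng.normal(loc=true_mean, scale=1.0, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)
    lower = sample.mean() - 1.96 * se
    upper = sample.mean() + 1.96 * se
    covered += (lower <= true_mean <= upper)

print("coverage:", covered / n_sims)  # should be close to 0.95
```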
So this is an example with a correlation. We can see that when there is a zero correlation in the population and we have a small sample size, the correlation estimates vary between about -0.2 and +0.2, and most of the time, when we draw the confidence interval, which is the line here, the line includes the population value. For about two and a half percent of the replications, here, the interval does not include the population value: the population value falls above the upper limit. Here we have extremely large correlations, and for about two and a half percent of the replications the population value falls below the lower limit. In 95% of the cases the population value is within the interval. So that's the idea of confidence intervals.
Here we can see that when the population value is large, the width of the confidence interval depends on the correlation estimate. When the correlation estimate is very high, the confidence interval is narrow; when the correlation estimate is very low, the confidence interval is a lot wider. So the confidence interval depends on the value of the statistic, and it also depends on the estimated standard error of the statistic.
Now there are a couple of ways that bootstrapping can be used for calculating a confidence interval. Normally when we do confidence intervals we use the normal approximation. The idea is that we assume that the estimate is normally distributed over repeated samples. Then we calculate the confidence interval as the estimate plus or minus 1.96, which covers 95% of the normal distribution, multiplied by the standard error. That gives us the plus or minus. So if we have an estimate of a correlation that is here, then we multiply the standard error by 1.96: the estimate minus 1.96 times the standard error is the lower limit, and the estimate plus 1.96 times the standard error is the upper limit, in this example 1 percent and 13 percent when the actual estimate is about 5 percent. The way we use bootstrapping for this calculation is that the standard error is simply the standard deviation of the bootstrap estimates. So if we take a correlation and bootstrap it, then we calculate how much the correlation varies between the bootstrap samples using the standard deviation, and then we plug that into the formula, which gives us the confidence interval. So that works when we can assume that the estimate is normally distributed.
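As a rough sketch of the normal-approximation interval (my own Python illustration; the data, statistic, and seed are arbitrary): the bootstrap replicates supply the standard error, and the interval is the estimate plus or minus 1.96 times that standard error.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
x = rng.normal(loc=0.5, scale=1.0, size=40)   # original sample

# Bootstrap replicates of the mean
idx = rng.integers(0, len(x), size=(1000, len(x)))
boot_means = x[idx].mean(axis=1)

est = x.mean()
se_boot = boot_means.std(ddof=1)

# Normal-approximation 95% confidence interval
lower = est - 1.96 * se_boot
upper = est + 1.96 * se_boot
print(f"95% CI (normal approximation): [{lower:.3f}, {upper:.3f}]")
```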
What if we can't assume that the estimate is normally distributed? That is the case when we can use empirical confidence intervals based on bootstrapping. The idea of the normal approximation interval is that the estimate is normally distributed, and then we can use this equation; alternatively, we can use empirical confidence intervals. The idea of an empirical confidence interval is that we do the bootstrapping, take our thousand bootstrap replications, order them from smallest to largest, and take the 25th value of the bootstrap replicates as the lower limit of the confidence interval. Then we take the 975th value, and that is the upper limit, so that's the 2.5% and 97.5% points of the bootstrap distribution. These are called percentile intervals. So when we have this kind of bootstrap distribution, we take the 25th replication here, which is our lower limit, and the 975th replication here, which is our upper limit. That gives us the confidence interval for the mean that is estimated here.
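A percentile interval is easy to read off from the ordered replicates; here is a minimal sketch (my own illustration, continuing the same kind of bootstrap-of-the-mean example; names and seed are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(seed=5)
x = rng.normal(size=40)                         # original sample

idx = rng.integers(0, len(x), size=(1000, len(x)))
boot_means = np.sort(x[idx].mean(axis=1))       # 1000 replicate means, ordered

# Percentile interval: the 25th and 975th ordered replicates (2.5% and 97.5%)
lower, upper = boot_means[24], boot_means[974]  # 0-based indexing
# Equivalently: np.percentile(boot_means, [2.5, 97.5])
print(f"95% percentile interval: [{lower:.3f}, {upper:.3f}]")
```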
This approach has two problems. First, the bootstrap distribution is biased: the mean of these bootstrap replications is about 0.15, while the actual sample value for the mean is zero. To account for that bias we have bias-corrected confidence intervals. The idea of bias-corrected confidence intervals is that instead of taking the 25th and 975th bootstrap replicates as the endpoints, we first estimate how large the bootstrap bias is, and then, based on that estimate, we take for example the 40th and 980th replications. So instead of taking the fixed 25th and the fixed 975th, we adjust which replicates we take as the endpoints. There is also the problem that the variance, the standard deviation here, is not always the same as the standard deviation here. In the correlation example you saw that the confidence interval narrowed as the correlation estimate went up, so the width of the interval depends on the value of the estimate. To take that into account we have bias-corrected and accelerated confidence intervals, which apply the same idea as the bias-corrected ones, but instead of taking only the bias into account, they also take the estimated differences in variance of these two distributions into account when choosing the endpoints for the confidence intervals.
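In practice these adjusted intervals are usually computed by a library rather than by hand. As one possible illustration (not from the lecture), SciPy's stats.bootstrap can produce percentile and BCa intervals; the data and settings below are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=6)
x = rng.normal(size=40)  # original sample

# Bias-corrected and accelerated (BCa) interval for the mean
res_bca = stats.bootstrap((x,), np.mean, n_resamples=1000,
                          confidence_level=0.95, method='BCa',
                          random_state=0)

# Plain percentile interval for comparison
res_pct = stats.bootstrap((x,), np.mean, n_resamples=1000,
                          confidence_level=0.95, method='percentile',
                          random_state=0)

print("BCa:       ", res_bca.confidence_interval)
print("percentile:", res_pct.confidence_interval)
```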
Now the question is: this looks really good, so we can estimate the variance of any statistic empirically and we don't have to know the math. That is basically true, with some qualifications. The main qualification is that bootstrapping requires a large sample size. There is a good book chapter by Koopman and co-authors in the book on statistical myths and urban legends edited by Vandenberg, and they point out that there are claims made in the literature that bootstrapping works well in small samples. But there is the fact that bootstrapping assumes that the sample is representative of the population. If our sample is very different from the population, then the bootstrap samples that we take from our original sample cannot approximate how samples would actually behave if drawn from the real population. And sampling error, which means how different the sample is from the population, is troublesome in small samples: in small samples the sample may not be a very accurate representation of the population. So if small samples are not representative of the population, and bootstrapping requires that the sample be representative of the population, then bootstrapping cannot work well in small samples. So bootstrapping generally requires a large sample size. There are also some boundary conditions under which bootstrapping doesn't work even if you have a large sample, so such scenarios exist, but for most practical applications the sample size is the only thing that you need to be concerned about. The problem is that it is very hard to say when your sample size is large enough.