**Type:** Research Essays

**Sample donated:** Teresa Matthews

**Last updated:** August 20, 2019

In regression analysis, bootstrapping is an efficient tool for statistical inference that builds a sampling distribution by resampling the originally observed data with replacement [1]. The term bootstrapping, proposed by Bradley Efron in his paper "Bootstrap methods: another look at the jackknife," published in 1979, comes from the cliché of 'pulling oneself up by one's bootstraps' [2].

Following this idea, the sample data is treated as a population, and samples are repeatedly drawn from it with replacement to support statistical inference about the sample. The essential bootstrap analogy states that "the population is to the sample as the sample is to the bootstrap samples" [2]. The bootstrap falls into two types, parametric and nonparametric. Parametric bootstrapping assumes that the original data set is drawn from a specific distribution, e.g. a normal distribution [2], and the bootstrap samples are generally drawn at the same size as the original data set. Nonparametric bootstrapping is the approach described at the start of this summary: it repeatedly and randomly draws bootstrap samples of a given size from the original data.
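The difference between the two types can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data values are hypothetical, and the normal model in the parametric draw is an assumed choice, not something the data dictates.

```r
# Hypothetical observed sample; the values are illustrative only.
set.seed(1)
x <- c(4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4)
n <- length(x)

# Nonparametric: draw n values from the observed data with replacement.
nonparam_sample <- sample(x, size = n, replace = TRUE)

# Parametric (assuming a normal model): estimate the parameters from the
# data, then draw n new values from the fitted distribution.
param_sample <- rnorm(n, mean = mean(x), sd = sd(x))
```

The nonparametric draw can only contain values already present in the data, while the parametric draw can produce new values consistent with the assumed distribution.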

According to our regression analysis lecture, bootstrapping is quite useful in non-linear regression and generalized linear models. For small sample sizes, the parametric bootstrapping method is highly preferred [2]. For large sample sizes, the nonparametric bootstrapping method is preferable. To clarify nonparametric bootstrapping further, suppose a sample data set $A = \{x_1, x_2, \ldots, x_k\}$ is randomly drawn from a population $B = \{X_1, X_2, \ldots, X_K\}$, where K is much larger than k. The statistic T = t(A) is considered an estimate of the corresponding population parameter P = t(B) [2]. Nonparametric bootstrapping estimates the sampling distribution of a statistic empirically. No assumptions about the form of the population are necessary.

Next, a sample of size k is drawn from the elements of A with replacement, denoted $A^*_1 = \{x^*_{11}, x^*_{12}, \ldots, x^*_{1k}\}$. In the resampling, an asterisk (*) is added to distinguish resampled data from the original data. Sampling with replacement is mandatory; otherwise only the original sample A would be reproduced. The resampling is typically repeated one thousand or ten thousand times, a number that keeps growing as computing power increases [1]. The statistic is computed for each bootstrap sample, and the mean of these bootstrap estimates is used to estimate the expectation of the bootstrapped statistic.
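The resampling loop described above can be sketched in base R as follows. The data values, the choice of the median as the statistic, and the names `t_stat`, `T_obs`, `T_star`, and `n_boot` are all assumptions made for illustration.

```r
# Hypothetical sample A of size k; values are illustrative only.
set.seed(42)
A <- c(2.3, 3.1, 1.8, 4.0, 2.9, 3.5, 2.2, 3.8, 2.7, 3.3)
k <- length(A)
n_boot <- 1000                      # number of bootstrap resamples

t_stat <- function(x) median(x)     # the statistic t(); an assumed choice
T_obs <- t_stat(A)                  # T = t(A), the original estimate

# Draw n_boot resamples of size k with replacement and evaluate the
# statistic on each one.
T_star <- replicate(n_boot, t_stat(sample(A, size = k, replace = TRUE)))

# Mean of the bootstrap estimates: estimates the expectation of the
# bootstrapped statistic.
mean(T_star)
```

Setting `replace = FALSE` here would only permute A, so every resample would return the same statistic as the original sample, which is why replacement is mandatory.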

This mean minus T is the estimate of the bias of T. Similarly, the bootstrap variance of T* estimates the sampling variance of T. Bootstrap confidence intervals can then be constructed using either the bootstrap percentile interval approach or the normal-theory interval approach.
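Written out, the quantities just described take the following form. The notation is an assumption introduced here for illustration: $R$ is the number of bootstrap resamples and $T^*_b$ is the statistic computed on the $b$-th resample.

$$
\bar{T}^* = \frac{1}{R}\sum_{b=1}^{R} T^*_b, \qquad
\widehat{B}^* = \bar{T}^* - T, \qquad
\widehat{V}^*(T^*) = \frac{1}{R-1}\sum_{b=1}^{R}\left(T^*_b - \bar{T}^*\right)^2
$$

Here $\bar{T}^*$ is the mean of the bootstrap estimates, $\widehat{B}^*$ is the bootstrap estimate of bias, and the square root of $\widehat{V}^*(T^*)$ serves as the bootstrap standard error.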

The bootstrap percentile method builds a confidence interval from the empirical quantiles of the bootstrap estimates, written as $T^*_{(\text{lower})} < P < T^*_{(\text{upper})}$. In more detail, it can be written as $\hat{T} - (\hat{T}^*_{\text{upper}} - \hat{T}^*) \le P \le \hat{T} + (\hat{T}^* - \hat{T}^*_{\text{lower}})$ [2]. Bootstrapping is an effective method to double-check the stability of model estimation results. Bootstrap intervals are asymptotically more accurate than the standard intervals obtained from the sample variance under a normality assumption. Simplicity is another important benefit of bootstrapping.
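The two interval constructions mentioned above can be sketched in base R. The data are hypothetical, the statistic is assumed to be the sample mean, and the normal-theory interval is shown in its simplest form (estimate plus or minus a normal quantile times the bootstrap standard error), which is one common variant rather than the only one.

```r
# Hypothetical sample; values are illustrative only.
set.seed(7)
A <- c(12.1, 9.8, 11.4, 10.6, 13.2, 9.1, 10.9, 12.7, 11.8, 10.2)
T_obs <- mean(A)

# Bootstrap replicates of the statistic (here the sample mean).
T_star <- replicate(2000, mean(sample(A, replace = TRUE)))

alpha <- 0.10   # for a 90% interval

# Percentile interval: empirical quantiles of the bootstrap estimates.
ci_percentile <- quantile(T_star, probs = c(alpha / 2, 1 - alpha / 2))

# Normal-theory interval (simplest form, an assumed variant):
# T plus or minus z times the bootstrap standard error.
se_boot <- sd(T_star)
ci_normal <- T_obs + c(-1, 1) * qnorm(1 - alpha / 2) * se_boot

ci_percentile
ci_normal
```

For a roughly symmetric bootstrap distribution such as this one, the two intervals should be close; they tend to diverge when the distribution of the bootstrap estimates is skewed.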

For complicated estimators, such as correlation coefficients and percentile points, and for complex parameters of the distribution, it is a straightforward way to generate estimates of confidence intervals and standard errors. However, this simplicity can also be a disadvantage, because it makes the important assumptions behind bootstrapping easy to neglect [1]. Bootstrapping also tends to be over-optimistic and does not provide general finite-sample guarantees [1].

There are several types of bootstrapping schemes in regression problems. One typical approach is to resample residuals from the regression model. The main procedure is as follows: first, fit the model to the original data set, generate the model estimates $\hat{\beta}$, and calculate the residuals $\hat{\varepsilon}$; second, randomly and repeatedly sample the residuals with replacement (typically 1000 or 10000 times) to get K sets of residuals of size k (here K denotes the number of bootstrap samples), and add each resampled residual to the fitted values from the original equation, generating bootstrapped Y*; finally, use the bootstrapped Y* to refit the model and get the bootstrap estimate $\hat{\beta}^*$ [2].

Another typical approach in the regression context is random-x resampling, which is also called case resampling [2]. We can either apply a Monte Carlo algorithm, which repeatedly resamples the data with replacement at the same size as the original data set, or enumerate every possible resample of the data set [2]. In our case, before fitting the regression model, the original predictor and response pairs $(x_i, y_i)$, for i = 1, 2, ..., k, are resampled to get K new data sets of size k. The regression model is then fit to each of these K new data sets, and $\hat{\beta}^*$ is obtained from the K parameter estimates.

In the next section, I review the nonparametric bootstrapping package in R with some examples from my research area, population pharmacokinetic analysis. R has a package called "boot", which provides various facilities for bootstrapping either a single statistic or a vector.
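Before the details of the boot package, the two regression resampling schemes described above (residual resampling and case resampling) can be sketched in base R. The simulated data, the linear model, and all variable names are assumptions chosen for illustration; this is not the procedure of any particular study.

```r
# Hypothetical data: a simple linear relationship with noise.
set.seed(123)
k <- 30
dat <- data.frame(x = runif(k, 0, 10))
dat$y <- 2 + 0.5 * dat$x + rnorm(k, sd = 1)
n_boot <- 1000                       # number of bootstrap samples

# Step 1: fit the model once and keep estimates, fits, and residuals.
fit <- lm(y ~ x, data = dat)
beta_hat <- coef(fit)
res <- unname(residuals(fit))
yhat <- unname(fitted(fit))

# Residual resampling: keep x fixed, add resampled residuals to the
# fitted values to build Y*, then refit the model.
beta_resid <- t(replicate(n_boot, {
  d_star <- data.frame(x = dat$x, y = yhat + sample(res, replace = TRUE))
  coef(lm(y ~ x, data = d_star))
}))

# Case (random-x) resampling: resample whole (x_i, y_i) rows, then refit.
beta_case <- t(replicate(n_boot, {
  idx <- sample(k, replace = TRUE)
  coef(lm(y ~ x, data = dat[idx, ]))
}))

# Bootstrap standard errors of intercept and slope under each scheme.
apply(beta_resid, 2, sd)
apply(beta_case, 2, sd)
```

Residual resampling treats the predictor values as fixed and relies on the fitted model form being correct, whereas case resampling makes no such assumption, which is one reason to compare the two sets of standard errors.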

To run the boot function in the boot library, there are 3 necessary parameters [3]: 1) data, which can be a vector, matrix, or data frame for bootstrap resampling; 2) statistic, the function that produces the statistic to be bootstrapped, which should take the data set and an indices parameter giving the selection of cases for each resample; 3) R, the number of bootstrap replicates. The function boot() calls the statistic function R times. In each call, it generates a set of random indices with replacement to select a sample. The statistics calculated for each sample are then collected in the returned bootobject. So the function boot() is used as bootobject <- boot(data= , statistic= , R=, ...) [3].
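A minimal usage sketch of the call pattern above follows. It assumes the boot package is installed; the simulated data frame and the statistic function `cor_stat` are hypothetical names introduced for illustration. The final line uses boot.ci() from the same package to request a percentile interval.

```r
library(boot)

# Hypothetical data frame with two numeric columns; values are simulated.
set.seed(2019)
dat <- data.frame(x = rnorm(50))
dat$y <- 0.6 * dat$x + rnorm(50, sd = 0.8)

# Statistic function: the first argument is the data, the second is the
# vector of row indices that boot() generates for each resample.
cor_stat <- function(data, indices) {
  d <- data[indices, ]
  cor(d$x, d$y)
}

bootobject <- boot(data = dat, statistic = cor_stat, R = 1000)

bootobject$t0        # statistic computed on the original data
head(bootobject$t)   # matrix with one row per bootstrap replicate

# Percentile confidence interval from the bootstrap replicates.
boot.ci(bootobject, conf = 0.95, type = "perc")
```

The correlation coefficient is used here because it is one of the "complicated estimators" for which an analytic standard error is awkward, so the bootstrap is a natural fit.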

After inspecting the results, for example with plot(bootobject), and finding them satisfactory, we use boot.ci(bootobject, conf= , type= ) to get confidence intervals [3].

Bootstrapping is widely used in population analyses of clinical trials in the pharmaceutical and biotech industries. It is a useful tool to assess and control the stability of the model analysis. A good example is how bootstrapping validates a population pharmacokinetic (PK) model for Triptan, a vasopressor used for the acute treatment of migraine attacks [5]. A single oral dose of 50 mg was given to 26 healthy Korean male subjects. Plasma samples were obtained pre-dose and at 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, 3, 4, 6, 8, 10, and 12 h post-dose [5]. Population PK analysis of Triptan was performed on the plasma concentration data using NONMEM, our software for building models with differential equations. A total of 364 observations of plasma concentrations were successfully described by a one-compartment model with first-order absorption with lag time, first-order elimination, and a combined transit compartment [5]. The model scheme is shown in Figure 1 below:

Figure 1: The scheme of the final PK model of Triptan [5]

The final model was validated by bootstrapping with 1000 data sets resampled from the original data set [5].

The median and 90% confidence intervals of all the PK parameters are shown in Table 1 together with the final parameter estimates.

Table 1: NONMEM estimated parameters and bootstrap results [5]

Results from the visual predictive check with 1000 simulations were assessed by visually comparing the gray area, which is the 90% prediction interval from the simulated data, with an overlay of the raw data shown as circles. An excess of data points falling outside the gray area would indicate that the estimates were not reliable.

Figure 2: Visual predictive check plot of the model from time 0 to 12 h after a single oral administration of 50 mg Triptan. Circles represent the raw data set; the gray area is the 90% confidence interval of the 1000 simulations; solid lines show the observed concentrations at the 5th, median, and 95th percentiles. [5]

Our conclusion is that the final model and its estimated parameters were sufficiently robust and stable according to the bootstrap assessment. All estimated parameters from the final model were within the 95% bootstrap confidence intervals.