Resampling methods
The following program generates an artificial sample of size 50 from the
standard normal and bootstraps the distribution of the sample mean and
median.
Note that the theoretical standard error of the sample median for the
standard normal of size n is Ö
(0.5p / n) which is approximately
0.177 for n = 50. By comparison, the standard error of the sample
mean is Ö(1 / n)
» 0.141.
' bootstrap sample mean and median
' series proc version (slower than matrix version)
' version 4
' 11/2/99 h
' revised and last checked 9/27/2000 h
' set size of sample
!n = 50
' create workfile
workfile bootmed1 u 1 !n
' set seed of random number generator
rndseed 456
' generate pseudo-draws from a standard normal
series x = nrnd
' set number of bootstrap replications
!reps = 1000
' store means and medians in matrix
matrix(!reps,2) out
' bootstrap loop
tic
for !i = 1 to !reps
' generate bootstrap sample
x.resample x_b
' store bootstrap median
out(!i,1) = @median(x_b)
' store bootstrap mean
out(!i,2) = @mean(x_b)
next
scalar elp = @toc
' display descriptive statistics
show out.stats
' theoretical standard error of median from normal
' se(med) = 1 / ( 2*m*(obs)^0.5 )
' where m is the median value
' (Kendall-Stuart, 1977, 4th ed, vol 1, p.252)
!pi = @acos(-1)
scalar se_med = @sqrt(0.5*!pi/!n)
scalar se_mea = @sqrt(1/!n)
' convert bootstrap sample into series
expand 1 !reps
smpl 1 !reps
series out_medi
series out_mean
group gout out_medi out_mean
mtos(out, gout)
' display results in table
table tab
setcolwidth(tab,1,20)
tab(1,2) = "median"
tab(1,3) = "mean"
setline(tab,2)
tab(3,1) = "bootstrap mean:"
tab(3,2) = @mean(out_medi)
tab(3,3) = @mean(out_mean)
tab(4,1) = "bootstrap s.d.:"
tab(4,2) = @stdev(out_medi)
tab(4,3) = @stdev(out_mean)
tab(5,1) = "theoretical s.d.:"
tab(5,2) = se_med
tab(5,3) = se_mea
setline(tab,6)
tab(7,1) = "sample size = "
tab(7,2) = @str(!n)
tab(8,1) = "bootstrap replications = "
tab(8,2) = @str(!reps)
tab(9,1) = "elapsed time = "
tab(9,2) = @str(elp) + " seconds"
show tab
'get kernel density estimates
!n=100 'points to evaluate
do out_mean.kdensity(!n,o=dist_mea,b=0.04)
do out_medi.kdensity(!n,o=dist_med,b=0.2)
smpl 1 !n
'convert to series
series x_mean
series f_mean
group g1 x_mean f_mean
mtos(dist_mea, g1)
series x_medi
series f_medi
group g2 x_medi f_medi
mtos(dist_med, g2)
'compare bootstrap distribution
group g3 g1 g2
freeze(gra1) g3.xyline(b)
'gra1.setelem(1) legend(mean)
'gra1.setelem(2) legend(median)
gra1.addtext(t) Kernel density estimates of bootstrap distribution
gra1.legend -display
gra1.addtext(1.3,0.2) Mean
gra1.addtext(0.2,2.2) Median
show gra1
The following is the matrix function version of the above program that
produces identical results. Note that while the above program took about
4.7 seconds on a Pentium II (333 mhz) machine, the program below takes
about 1.9 seconds.
' bootstrap sample mean and median
' matrix function version (faster than series version)
' version 4
' 11/2/99 h
' last checked 9/19/2000 h
' create workfile
workfile bootmedian2 u 1 1
' set seed of random number generator
rndseed 456
' set size of sample
!n = 50
' generate pseudo-draws from a standard normal
matrix(!n,1) x
nrnd(x)
' set number of bootstrap replications
!reps = 1000
' store means and medians in vectors
vector(!reps,1) out_medi
vector(!reps,1) out_mean
' declare matrix to hold bootstrap sample
matrix x_b
' bootstrap loop
' reset timer
tic
for !i = 1 to !reps
' generate bootstrap sample
x_b = @resample(x)
' store bootstrap median
out_medi(!i) = @median(x_b)
' store bootstrap mean
out_mean(!i) = @mean(x_b)
next
' store elapsed time in seconds
scalar elp = @toc
' theoretical standard error of median from normal
' se(med) = 1 / ( 2*m*(obs)^0.5 )
' where m is the median value
' (Kendall-Stuart, 1977, 4th ed, vol 1, p.252)
!pi = @acos(-1)
scalar se_med = @sqrt(0.5*!pi/!n)
scalar se_mea = @sqrt(1/!n)
' display results in table
table tab
setcolwidth(tab,1,20)
tab(1,2) = "median"
tab(1,3) = "mean"
setline(tab,2)
tab(3,1) = "bootstrap mean:"
tab(3,2) = @mean(out_medi)
tab(3,3) = @mean(out_mean)
tab(4,1) = "bootstrap s.d.:"
tab(4,2) = @stdev(out_medi)
tab(4,3) = @stdev(out_mean)
tab(5,1) = "theoretical s.d.:"
tab(5,2) = se_med
tab(5,3) = se_mea
setline(tab,6)
tab(7,1) = "sample size = "
tab(7,2) = @str(!n)
tab(8,1) = "bootstrap replications = "
tab(8,2) = @str(!reps)
tab(9,1) = "elapsed time = "
tab(9,2) = @str(elp) + " seconds"
show tab
The following program replicates the permutation test in Johnston-DiNardo
(1997, 11.3). The hypothesis under test is whether there is a significant
difference in the sample of employment changes between two states. The
program constructs a nonparameteric 95% confidence interval of the difference
in means using 1000 permutation samples.
' two-sample permutation test
' (Johnston-DiNardo 11.2)
' matrix permutation function (fast)
' for version 4
' 11/3/99 h
' revised and last checked 9/19/2000 h
' create workfile
workfile cardkrug u 1 1
' input data in matrix (first 33 rows from N.J.)
matrix(40) data
data.fill -20, -17.5, -13, -12.5, -4.5, -4, -3.5, -2, -1.5, -1, -0.5, -0.5, 0, 0, 0.5, 0.5, 1.5, 2, 2, 2.25, 3, 4.5, 4.5, 5.5, 6, 6.25, 8.25, 9, 10, 10.5, 12, 14.75, 34, -7, -6, -2.5, -0.5, 4, 4.5, 4.5
' extract submatrices for each sample
matrix sub0 = @subextract(data,1,1,33,1)
matrix sub1 = @subextract(data,34,1)
' compute difference in means for actual sample
scalar mdiff = @mean(sub0) - @mean(sub1)
' set number of replications
!reps = 1000
' set seed of random number generator
rndseed 123456
' declare storage matrices
matrix pdata ' permuted data
vector(!reps) pdiff ' difference in means from pdata
' permutation loop
for !i=1 to !reps
' permute data
pdata = @permute(data)
' extract submatrices from permuted data
sub0 = @subextract(pdata,1,1,33,1)
sub1 = @subextract(pdata,34,1)
' store difference in means
pdiff(!i) = @mean(sub0) - @mean(sub1)
next
' 95% interval (percentile method)
scalar lower = @quantile(pdiff,0.025)
scalar upper = @quantile(pdiff,0.975)
' display output in table
table out
setcolwidth(out,1,30)
out(1,1) = "Sample difference in means:"
out(1,2) = mdiff
out(2,1) = "Permutation confidence interval:"
out(2,2) = "[" + @str(lower) + "," + @str(upper) + "]"
out(3,1) = "Number of permutations:"
out(3,2) = @str(!reps)
show out
For reference, we also provide a program that uses the group resample
procedure. This program also computes the t-statistic for testing the
difference of means.
' two-sample t-test
' (Johnston-DiNardo 11.2)
' group resample function (slow)
' for version 4
' 11/3/99 h
' last checked 9/19/2000 h
' create workfile
workfile cardkrug u 1 40
' input data in matrix
' first 33 from N.J.
series x
x.fill -20, -17.5, -13, -12.5, -4.5, -4, -3.5, -2, -1.5, -1, -0.5, -0.5, 0, 0, 0.5, 0.5, 1.5, 2, 2, 2.25, 3, 4.5, 4.5, 5.5, 6, 6.25, 8.25, 9, 10, 10.5, 12, 14.75, 34, -7, -6, -2.5, -0.5, 4, 4.5, 4.5
' create dummy for each subsample
series dum = 0
smpl 1 33
dum = 1
smpl @all
'----------------------------------------------------------------------------
' parametric t-test
'----------------------------------------------------------------------------
freeze(ttest) x.testby(mean) dum
show ttest
'----------------------------------------------------------------------------
' non-parametric permutation test
'----------------------------------------------------------------------------
' compute difference in means for actual sample
series mdiff = @mean(x,"@all if dum=1") - @mean(x,"@all if dum=0")
scalar dmean = @elem(mdiff,"1")
' set number of replications
!reps = 1000
' set seed of random number generator
rndseed 123456
' declare storage matrices
matrix pdata ' permuted data
vector(!reps) pdiff ' difference in means from pdata
' permutation loop
for !i=1 to !reps
' permute data
x.resample(permute) x_b
' compute difference in means
mdiff = @mean(x_b,"@all if dum=1") - @mean(x_b,"@all if dum=0")
' store difference in means
pdiff(!i) = @elem(mdiff,"1")
next
' 95% interval (percentile method)
scalar lower = @quantile(pdiff,0.025)
scalar upper = @quantile(pdiff,0.975)
' display output in table
table out
setcolwidth(out,1,30)
out(1,1) = "Sample difference in means:"
out(1,2) = dmean
out(2,1) = "Permutation confidence interval:"
out(2,2) = "[" + @str(lower) + "," + @str(upper) + "]"
out(3,1) = "Number of permutations:"
out(3,2) = @str(!reps)
show out