Resampling methods

Bootstrapping the sample median (bootmed1.prg, bootmed2.prg)
Two-sample permutation test (cardkrug1.prg, cardkrug2.prg)

Bootstrapping the sample median

The following program generates an artificial sample of size 50 from the standard normal and bootstraps the distribution of the sample mean and median. Note that the theoretical standard error of the sample median for the standard normal of size n is Ö (0.5p / n) which is approximately 0.177 for n = 50. By comparison, the standard error of the sample mean is Ö(1 / n) » 0.141.

' bootstrap sample mean and median
' series proc version (slower than matrix version)
' version 4 
' 11/2/99 h
' revised and last checked 9/27/2000 h


' set size of sample
!n = 50

' create workfile
workfile bootmed1 u 1 !n

' set seed of random number generator
rndseed 456

' generate pseudo-draws from a standard normal
series x = nrnd

' set number of bootstrap replications
!reps = 1000

' store means and medians in matrix
matrix(!reps,2) out

' bootstrap loop
tic
for !i = 1 to !reps
	' generate bootstrap sample
	x.resample x_b
	' store bootstrap median
	out(!i,1) = @median(x_b)
	' store bootstrap mean
	out(!i,2) = @mean(x_b)
next
scalar elp = @toc

' display descriptive statistics
show out.stats

' theoretical standard error of median from normal
' se(med) = 1 / ( 2*m*(obs)^0.5 )
' where m is the median value
' (Kendall-Stuart, 1977, 4th ed, vol 1, p.252)
!pi = @acos(-1)
scalar se_med = @sqrt(0.5*!pi/!n)
scalar se_mea = @sqrt(1/!n)

' convert bootstrap sample into series
expand 1 !reps
smpl 1 !reps

series out_medi
series out_mean
group gout out_medi out_mean
mtos(out, gout)

' display results in table
table tab
setcolwidth(tab,1,20)

tab(1,2) = "median"
tab(1,3) = "mean"

setline(tab,2)

tab(3,1) = "bootstrap mean:"
tab(3,2) = @mean(out_medi)
tab(3,3) = @mean(out_mean)

tab(4,1) = "bootstrap s.d.:"
tab(4,2) = @stdev(out_medi)
tab(4,3) = @stdev(out_mean)

tab(5,1) = "theoretical s.d.:"
tab(5,2) = se_med
tab(5,3) = se_mea

setline(tab,6)

tab(7,1) = "sample size = "
tab(7,2) = @str(!n)

tab(8,1) = "bootstrap replications = " 
tab(8,2) = @str(!reps)

tab(9,1) = "elapsed time = " 
tab(9,2) = @str(elp) + " seconds"
show tab

'get kernel density estimates
!n=100	'points to evaluate
do out_mean.kdensity(!n,o=dist_mea,b=0.04)
do out_medi.kdensity(!n,o=dist_med,b=0.2)

smpl 1 !n

'convert to series
series x_mean
series f_mean
group g1 x_mean f_mean
mtos(dist_mea, g1)

series x_medi
series f_medi
group g2 x_medi f_medi
mtos(dist_med, g2)

'compare bootstrap distribution
group g3 g1 g2
freeze(gra1) g3.xyline(b)
'gra1.setelem(1) legend(mean)
'gra1.setelem(2) legend(median)
gra1.addtext(t) Kernel density estimates of bootstrap distribution
gra1.legend -display
gra1.addtext(1.3,0.2) Mean
gra1.addtext(0.2,2.2) Median
show gra1

The following is the matrix function version of the above program that produces identical results. Note that while the above program took about 4.7 seconds on a Pentium II (333 mhz) machine, the program below takes about 1.9 seconds.

' bootstrap sample mean and median
' matrix function version (faster than series version)
' version 4 
' 11/2/99 h
' last checked 9/19/2000 h

' create workfile
workfile bootmedian2 u 1 1

' set seed of random number generator
rndseed 456

' set size of sample
!n = 50

' generate pseudo-draws from a standard normal
matrix(!n,1) x
nrnd(x)

' set number of bootstrap replications
!reps = 1000

' store means and medians in vectors
vector(!reps,1) out_medi
vector(!reps,1) out_mean

' declare matrix to hold bootstrap sample
matrix x_b

' bootstrap loop
' reset timer
tic
for !i = 1 to !reps
	' generate bootstrap sample
	x_b = @resample(x)
	' store bootstrap median
	out_medi(!i) = @median(x_b)
	' store bootstrap mean
	out_mean(!i) = @mean(x_b)
next
' store elapsed time in seconds
scalar elp = @toc

' theoretical standard error of median from normal
' se(med) = 1 / ( 2*m*(obs)^0.5 )
' where m is the median value
' (Kendall-Stuart, 1977, 4th ed, vol 1, p.252)
!pi = @acos(-1)
scalar se_med = @sqrt(0.5*!pi/!n)
scalar se_mea = @sqrt(1/!n)

' display results in table
table tab
setcolwidth(tab,1,20)

tab(1,2) = "median"
tab(1,3) = "mean"

setline(tab,2)

tab(3,1) = "bootstrap mean:"
tab(3,2) = @mean(out_medi)
tab(3,3) = @mean(out_mean)

tab(4,1) = "bootstrap s.d.:"
tab(4,2) = @stdev(out_medi)
tab(4,3) = @stdev(out_mean)

tab(5,1) = "theoretical s.d.:"
tab(5,2) = se_med
tab(5,3) = se_mea

setline(tab,6)

tab(7,1) = "sample size = "
tab(7,2) = @str(!n)

tab(8,1) = "bootstrap replications = " 
tab(8,2) = @str(!reps)

tab(9,1) = "elapsed time = " 
tab(9,2) = @str(elp) + " seconds"
show tab

Two-sample permutation test

The following program replicates the permutation test in Johnston-DiNardo (1997, 11.3). The hypothesis under test is whether there is a significant difference in the sample of employment changes between two states. The program constructs a nonparameteric 95% confidence interval of the difference in means using 1000 permutation samples.

' two-sample permutation test
' (Johnston-DiNardo 11.2)
' matrix permutation function (fast)
' for version 4
' 11/3/99 h
' revised and last checked 9/19/2000 h

' create workfile
workfile cardkrug u 1 1

' input data in matrix (first 33 rows from N.J.)
matrix(40) data
data.fill -20, -17.5, -13, -12.5, -4.5, -4, -3.5, -2, -1.5, -1, -0.5, -0.5, 0, 0, 0.5, 0.5, 1.5, 2, 2, 2.25, 3, 4.5, 4.5, 5.5, 6, 6.25, 8.25, 9, 10, 10.5, 12, 14.75, 34, -7, -6, -2.5, -0.5, 4, 4.5, 4.5

' extract submatrices for each sample
matrix sub0 = @subextract(data,1,1,33,1)
matrix sub1 = @subextract(data,34,1)

' compute difference in means for actual sample
scalar mdiff = @mean(sub0) - @mean(sub1)

' set number of replications
!reps = 1000

' set seed of random number generator
rndseed 123456

' declare storage matrices
matrix pdata				' permuted data
vector(!reps) pdiff			' difference in means from pdata

' permutation loop
for !i=1 to !reps
	' permute data
	pdata = @permute(data)
	' extract submatrices from permuted data
	sub0 = @subextract(pdata,1,1,33,1)
	sub1 = @subextract(pdata,34,1)
	' store difference in means
	pdiff(!i) = @mean(sub0) - @mean(sub1) 
next

' 95% interval (percentile method)
scalar lower = @quantile(pdiff,0.025) 
scalar upper = @quantile(pdiff,0.975)

' display output in table
table out
setcolwidth(out,1,30)
out(1,1) = "Sample difference in means:"
out(1,2) = mdiff

out(2,1) = "Permutation confidence interval:"
out(2,2) = "[" + @str(lower) + "," + @str(upper) + "]" 

out(3,1) = "Number of permutations:"
out(3,2) = @str(!reps)
show out

For reference, we also provide a program that uses the group resample procedure. This program also computes the t-statistic for testing the difference of means.

' two-sample t-test
' (Johnston-DiNardo 11.2)
' group resample function (slow)
' for version 4
' 11/3/99 h
' last checked 9/19/2000 h

' create workfile
workfile cardkrug u 1 40

' input data in matrix
' first 33 from N.J.
series x
x.fill -20, -17.5, -13, -12.5, -4.5, -4, -3.5, -2, -1.5, -1, -0.5, -0.5, 0, 0, 0.5, 0.5, 1.5, 2, 2, 2.25, 3, 4.5, 4.5, 5.5, 6, 6.25, 8.25, 9, 10, 10.5, 12, 14.75, 34, -7, -6, -2.5, -0.5, 4, 4.5, 4.5

' create dummy for each subsample
series dum = 0
smpl 1 33
dum = 1
smpl @all

'----------------------------------------------------------------------------
' parametric t-test
'----------------------------------------------------------------------------

freeze(ttest) x.testby(mean) dum
show ttest

'----------------------------------------------------------------------------
' non-parametric permutation test
'----------------------------------------------------------------------------

' compute difference in means for actual sample
series mdiff = @mean(x,"@all if dum=1") - @mean(x,"@all if dum=0")
scalar dmean = @elem(mdiff,"1")

' set number of replications
!reps = 1000

' set seed of random number generator
rndseed 123456

' declare storage matrices
matrix pdata				' permuted data
vector(!reps) pdiff			' difference in means from pdata

' permutation loop
for !i=1 to !reps
	' permute data
	x.resample(permute)  x_b
	' compute difference in means
	mdiff = @mean(x_b,"@all if dum=1") - @mean(x_b,"@all if dum=0")
	' store difference in means
	pdiff(!i) = @elem(mdiff,"1") 
next

' 95% interval (percentile method)
scalar lower = @quantile(pdiff,0.025) 
scalar upper = @quantile(pdiff,0.975)

' display output in table
table out
setcolwidth(out,1,30)
out(1,1) = "Sample difference in means:"
out(1,2) = dmean

out(2,1) = "Permutation confidence interval:"
out(2,2) = "[" + @str(lower) + "," + @str(upper) + "]" 

out(3,1) = "Number of permutations:"
out(3,2) = @str(!reps)
show out