Monte Carlo Basics


This document guides you through the steps of a Monte Carlo simulation exercise in EViews. To follow the material, you should be fairly comfortable with the EViews programming language. If you have not yet done so, first read Chapter 5 (Working with Matrices) and Chapter 7 (Programming in EViews) of the Command and Programming Reference before proceeding.

A typical Monte Carlo simulation exercise consists of the following steps:

  1. Completely specify the "true" model, or data generating process underlying the data.
  2. Simulate a draw from the data generating process and estimate the model using the simulated data.
  3. Repeat step 2 many times, each time storing the results of interest.
  4. The end result is a set of estimation results, one for each repetition. We can then characterize the empirical distribution of these results by tabulating the sample moments or by plotting a histogram or kernel density estimate.

Step 2 typically involves simulating a random draw from a specified distribution. EViews has built-in pseudo-random number generating functions for a wide range of commonly used distributions; see Appendix A, Statistical Distribution Functions, pp. 426-429 of the Command and Programming Reference.
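As a minimal sketch of such draws, the lines below use only the two basic generators, nrnd (standard normal) and rnd (uniform on (0,1)). The workfile name rnddemo and the series names are hypothetical and chosen only for illustration. Note that a normal draw with variance 9 is obtained by scaling nrnd by the standard deviation 3, not by the variance.

```eviews
' hypothetical scratch workfile with 10 observations
workfile rnddemo u 1 10

' fix the seed so that the draws can be reproduced
rndseed 123456

' standard normal draws, N(0, 1)
series e_std = nrnd

' N(0, 9) draws: multiply by the standard deviation (3), not the variance (9)
series e = 3*nrnd

' uniform draws on (0, 1)
series u = rnd
```

Running the program twice with the same rndseed value should reproduce the same draws, which makes it easier to debug a simulation.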

The step that requires a little thinking is how to store the results from each repetition (step 3). There are two methods that you can use in EViews, each with its own advantages and disadvantages. The first method is to store the results in a series or group of series. This is probably what most users will try first. The difficulty with this approach is that a series is indexed by a sample in EViews. To store the result from each replication as a different observation in a series, you have to keep shifting the sample every time you store a new result. Moreover, the length of the series is constrained by the size of your workfile range. If you are doing 1000 replications in a workfile with 100 observations, you will not be able to store all 1000 results in a series with 100 observations.

The alternative method is to store the results in a matrix (or vector). A matrix is indexed by its row-column position, and its size is independent of the workfile sample. For example, you can declare a matrix with 1000 rows and 10 columns in a workfile with only 1 observation. (Verify this!) The disadvantage of the matrix method is that a matrix object does not have as many built-in functions as a series object. For example, there is no kernel density estimate view for a matrix (which is available for a series object).
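A quick way to verify the claim is sketched below. The workfile name mtest and the object names m, nrows and ncols are hypothetical; the only thing the sketch relies on is the matrix declaration syntax used later in this document and the @rows and @columns functions.

```eviews
' workfile with a single observation
workfile mtest u 1 1

' declare a 1000 x 10 matrix; its size does not depend on the workfile range
matrix(1000,10) m

' check the dimensions of the matrix
scalar nrows = @rows(m)
scalar ncols = @columns(m)

' open the matrix to inspect it
show m
```

If the declaration succeeds, nrows and ncols should hold 1000 and 10 even though the workfile itself has only one observation.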

For pedagogical purposes, I will illustrate both approaches. (My own recommendation is a mixed approach: store all the results in a matrix, and if you need to do further processing, convert the matrix into a group of series.) As a concrete example, consider the following exercise (3.26) from Damodar Gujarati, Basic Econometrics, 3rd edition:

Refer to the 10 X values of Table 3.2 (which are: 80, 100, 120, 140, 160, 180, 200, 220, 240, 260). Let Beta(1) = 25 and Beta(2) = 0.5. Assume that the errors are distributed N(0, 9), that is, the errors are normally distributed with mean 0 and variance 9. Generate 100 samples using these values, obtaining 100 estimates of Beta(1) and Beta(2). Graph these estimates. What conclusions can you draw from the Monte Carlo study?

The first step is to write a program file that does one replication of the Monte Carlo experiment. For this problem, the program file will look something like this:
' create workfile
workfile mcarlo u 1 10

' create data series for x
series x
x.fill 80, 100, 120, 140, 160, 180, 200, 220, 240, 260

' set seed for random number generator
rndseed 123456

' simulate y data: Beta(1) = 25, Beta(2) = 0.5, errors N(0, 9) (std. dev. 3)
series y = 25 + 0.5*x + 3*nrnd

' regress y on a constant and x
equation eq1.ls y c x

' display results
show eq1.output
Since we want to run the regression several times (in this case 100 times), we will add a loop in the program as follows:
' create workfile
workfile mcarlo u 1 10

' create data series for x 
' NOTE: x is fixed in repeated samples
series x
x.fill 80, 100, 120, 140, 160, 180, 200, 220, 240, 260

' set seed for random number generator
rndseed 123456

' assign number of replications to a control variable
!reps = 100

' begin loop
for !i = 1 to !reps
	' simulate y data
	series y = 25 + 0.5*x + 3*nrnd

	' regress y on a constant and x
	equation eq1.ls y c x
next
' end of loop
This program is not useful as it is since it does not store the results from each replication. At the end of the program, the C coefficient vector only contains the estimated parameter values from the last replication.

Storing results in a series

In order to store results for 100 replications in a series, we expand the workfile to the larger of the number of replications and the number of observations for one sample. For this problem this will be 100. Then inside the loop, we set the sample to the first 10 observations, draw a sample for y, and estimate the model. Then we reset the sample to the replication number and store the estimated coefficient in the appropriate observation number. This is repeated for each replication. Thus for the 1st replication, we store the estimates in observation 1, the 2nd replication estimates are stored in observation 2, and so on until the last replication which is stored in the last observation. The program will look as follows:
' store monte carlo results in a series

' set workfile range to number of monte carlo replications
workfile mcarlo u 1 100

' create data series for x 
' NOTE: x is fixed in repeated samples
' only first 10 observations are used (remaining 90 obs missing)
series x
x.fill 80, 100, 120, 140, 160, 180, 200, 220, 240, 260

' set seed for random number generator
rndseed 123456

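' assign number of replications to a control variable
!reps = 100

' true parameter values (used to simulate y and to mark the graphs below)
!beta1 = 25
!beta2 = 0.5

' begin loop
for !i = 1 to !reps
	' set sample to estimation sample
	smpl 1 10

	' simulate y data (only for 10 obs)
	series y = !beta1 + !beta2*x + 3*nrnd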

	' regress y on a constant and x
	equation eq1.ls y c x

	' set sample to one observation
	smpl !i !i

	' and store each coefficient estimate in a series
	series b1 = eq1.@coefs(1)
	series b2 = eq1.@coefs(2)
next
' end of loop

' set sample to full sample
smpl 1 100

' show kernel density estimate for each coef
freeze(gra1) b1.kdensity
' draw vertical dashline at true parameter value
gra1.draw(dashline, bottom, rgb(156,156,156)) !beta1
show gra1

freeze(gra2) b2.kdensity
' draw vertical dashline at true parameter value
gra2.draw(dashline, bottom, rgb(156,156,156)) !beta2
show gra2
The crucial step in the above program is the line
smpl !i !i
inside the loop. If you can answer the following question, you are probably ready to write code for your own application: what would happen if that smpl statement were not there? Without that line, the sample stays at the first 10 observations set at the top of the loop, and hence the first 10 observations of the series b1 and b2 are all assigned the estimated coefficient values from that iteration. Therefore, when the program exits the loop, b1 and b2 contain only the estimates from the last iteration in observations 1 to 10, and observations 11 to 100 are missing. (Again, verify this!)
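One way to carry out this check is to rerun the series program with that single line left out and then inspect the result. The sketch below does this; the workfile name mcverify is hypothetical, and the intercept is set to 25 as given in the exercise statement.

```eviews
' verification run: the smpl !i !i statement is deliberately omitted
workfile mcverify u 1 100

series x
x.fill 80, 100, 120, 140, 160, 180, 200, 220, 240, 260

rndseed 123456
!reps = 100

for !i = 1 to !reps
	' estimation sample
	smpl 1 10

	' simulate y data and estimate
	series y = 25 + 0.5*x + 3*nrnd
	equation eq1.ls y c x

	' no "smpl !i !i" here, so the assignment covers observations 1 to 10
	series b1 = eq1.@coefs(1)
	series b2 = eq1.@coefs(2)
next

' reset to the full range and inspect the stored values
smpl 1 100
show b1
```

If the explanation above is right, b1 should show one and the same value in observations 1 to 10 and NA everywhere else.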

Storing results in a matrix

The alternative approach is to store the results in a matrix object. The workfile range can now be set to the sample size. Before you enter the loop, you declare the matrix to store the results. Then inside the loop, you store the results in the appropriate cell of the storage matrix. The code will be something like this:
' store monte carlo results in a matrix

' set workfile range to number of obs
workfile mcarlo u 1 10

' create data series for x 
' NOTE: x is fixed in repeated samples
series x
x.fill 80, 100, 120, 140, 160, 180, 200, 220, 240, 260

' set seed for random number generator
rndseed 123456

' assign number of replications to a control variable
!reps = 100

' declare storage matrix 
matrix(!reps,2) beta

' begin loop
for !i = 1 to !reps 
	' simulate y data
	series y = 25 + 0.5*x + 3*nrnd

	' regress y on a constant and x
	equation eq1.ls y c x

	' store each coefficient estimate in matrix
	beta(!i,1) = eq1.@coefs(1)	' column 1 is intercept
	beta(!i,2) = eq1.@coefs(2)	' column 2 is slope
next
' end of loop

' show descriptive stats of coef distribution
beta.stats
If you run this program, you will notice that the matrix version runs much faster than the series version. However, you will not be able to draw the kernel density estimate directly from a matrix object.
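The mixed approach recommended earlier gets around this limitation: store the results in a matrix, then convert its columns into series. The following is a sketch rather than a definitive implementation. It assumes that mtos requires the current sample to contain as many observations as the vector has rows, which is why the workfile is created with 100 observations and the estimation sample is restricted to the first 10. The names mcmixed, v1 and v2 are hypothetical, and the intercept of 25 is taken from the exercise statement.

```eviews
' mixed approach: store results in a matrix, then convert to series
' workfile range equals the number of replications
workfile mcmixed u 1 100

' x is fixed in repeated samples; only the first 10 obs are filled
series x
x.fill 80, 100, 120, 140, 160, 180, 200, 220, 240, 260

rndseed 123456
!reps = 100
matrix(!reps,2) beta

' the estimation sample stays fixed inside the loop
smpl 1 10
for !i = 1 to !reps
	series y = 25 + 0.5*x + 3*nrnd
	equation eq1.ls y c x
	beta(!i,1) = eq1.@coefs(1)
	beta(!i,2) = eq1.@coefs(2)
next

' switch to the full range so that the sample matches the matrix rows
smpl 1 !reps

' extract each column and convert it into a series
vector v1 = @columnextract(beta, 1)
vector v2 = @columnextract(beta, 2)
mtos(v1, b1)
mtos(v2, b2)

' series views such as the kernel density are now available
freeze(gra1) b1.kdensity
show gra1
```

The loop itself never touches the sample, so it keeps the matrix version's structure, while b1 and b2 end up as ordinary series that can be graphed or tabulated.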

(last revised 11/2/2000 h)