Output Data Analysis

Introduction

Let's recall the steps that we take when performing a simulation study. First, we conduct a preliminary analysis of the system. Then, we build models. Third, we verify and validate our models and their implementations, potentially returning to previous steps depending on the outcome. Next, we design experiments, conduct simulation runs, and perform statistical analysis of the output. Finally, we implement our solutions.

Don't forget to analyze simulation output!

Why Worry about Output?

Since the input processes driving a simulation - interarrival times, service times, breakdown times, etc. - are random variables, we must regard the simulation's output as random.

As a result, simulation runs only yield estimates of measures of system performance. For example, we might run our simulation many times and arrive at an estimate for the mean customer waiting time. We derive these estimates from observations output by the simulation.

Since the output is random, the estimators are also random variables. Therefore, they are subject to sampling error, which we must consider to make valid inferences about system performance.

For example, we wouldn't want to run a simulation several times and then report just the average value for a particular metric. Ideally, we'd like to present a confidence interval that presents a range of likely values.

Measures of Interest

We might be interested in means. For example, if we are simulating a waiting room, we might be very curious about the mean waiting time.

We also have to look at variances. Again, if we are simulating a waiting room, we should look at how much waiting time is liable to vary from customer to customer.

Quantiles can also be very important, especially in the domain of risk analysis. For example, we might want to know the 99% quantile for waiting time in a particular queue.

We could be interested in success probabilities. We might want to know the probability that a particular job finishes before a certain time.

Additionally, we want to have point estimators and confidence intervals for the various statistics above.

The Problem

The problem is that simulations rarely produce raw output that is independent and identically distributed (iid). We have to be careful about how we analyze the data since we cannot make this iid assumption.

For example, let's consider customer waiting times from a queueing system. These observations are not independent - typically, they are serially correlated. If one customer has to wait a long time, we expect the person next to them to have to wait a long time as well.

Additionally, these observations are not identically distributed. Customers showing up early in the morning are likely to have much shorter waiting times than those who arrive during peak hours.

Furthermore, these observations are not even normally distributed. Waiting times skew right - occasionally, we have to wait a very long time - and they certainly cannot be negative.

As a result, we can't apply the classical statistical techniques to the analysis of simulation output. We will focus on other methods to perform statistical analysis of output from discrete-event computer simulations.

If we conduct improper statistical analysis of our output, we can invalidate any results we may wish to present. Conversely, if we give the output the proper treatment, we may uncover something important. Finally, if we want to study further, there are many exciting research problems to explore.

Types of Simulation

There are two types of simulations we will discuss with respect to output analysis.

When we look at finite-horizon or terminating simulations, we look at the short-term performance of systems with a defined beginning and end. In steady-state simulations, we are interested in the performance of a long-running system.

Finite-Horizon Simulations

A finite-horizon simulation ends at a specific time or because of a specific event. For instance, if we are simulating a mass-transit system during rush hour, the simulation ends when rush hour ends. If we are simulating a distribution system for one month, the simulation ends when the month ends. Other examples of finite-horizon simulations include simulating a production system until a set of machines breaks down or simulating the start-up phase of a system.

Steady-State Simulations

The purpose of a steady-state simulation is to study the long-run behavior of a system. A performance measure in such a system might be a parameter characteristic of the process's long-term equilibrium. For instance, we might be interested in a continuously operating communication system where the objective is the computation of the mean delay of a packet in the long run.

A Mathematical Interlude (OPTIONAL)

In this lesson, we will look at some math that informs why we cannot give simulation output the classical statistical treatment.

A Mathematical Interlude

We'll look at a few examples illustrating the fact that results look a little different when we don't have iid observations. Like we've said, we have to be careful.

Let's talk about our working assumptions. For the remainder of this lesson, let's assume that our observations, $Y_1, Y_2, \ldots, Y_n$, are identically distributed with mean $\mu$, but are not independent. Such an assumption often applies in the context of steady-state simulation.

We'll compute the sample mean and sample variance, which we know are very effective estimators when the observations are iid. As we will see, the sample mean still works well, but we'll have to be careful regarding the sample variance.

Properties of the Sample Mean

Let's look at the expected value of the sample mean, $\bar Y_n$:

$$E[\bar Y_n] = \frac{1}{n}\sum_{i=1}^n E[Y_i]$$

Since $E[Y_i] = \mu$, $E[\bar Y_n] = \mu$. Therefore, the sample mean is still unbiased for $\mu$. We already knew this: the sample mean is always unbiased for $\mu$ so long as all of the observations come from a distribution with mean $\mu$, regardless of whether they are correlated with one another.

Now let's consider the covariance function, which we define as $R_k$:

$$R_k \equiv \text{Cov}(Y_i, Y_{i+k})$$

We also know that, by definition, the variance of a random variable is equal to the covariance of that random variable with itself:

$$\text{Var}(X) = \text{Cov}(X,X)$$

Thus, we can create an expression for the variance of the sample mean:

$$\text{Var}(\bar Y_n) = \text{Cov}(\bar Y_n, \bar Y_n)$$

We can re-express the covariance of the sample mean using the definition of the sample mean. Notice that we have two summation expressions, and we divide by $n$ twice:

$$\text{Var}(\bar Y_n) = \frac{1}{n^2} \sum_{i=1}^n\sum_{j=1}^n \text{Cov}(Y_i, Y_j)$$

By definition of $R_k$, we can replace $\text{Cov}(Y_i, Y_j)$ with $R_{|i-j|}$:

$$\text{Var}(\bar Y_n) = \frac{1}{n^2} \sum_{i=1}^n\sum_{j=1}^n R_{|i-j|} \quad (1)$$

Notice that we take the absolute value of $i-j$ to prevent a negative value for $k$. After a bunch of algebra, equation $(1)$ simplifies to a single summation expression:

$$\text{Var}(\bar Y_n) = \frac{1}{n} \left[R_0 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)R_k\right] \quad (2)$$

How did we get from $(1)$ to $(2)$? Let's take all of the $(Y_i, Y_j)$ pairs and build an $n \times n$ covariance matrix.

$$\begin{pmatrix} R_0 & R_1 & R_2 & \cdots & R_{n-3} & R_{n-2} & R_{n-1} \\ R_1 & R_0 & R_1 & \cdots & R_{n-4} & R_{n-3} & R_{n-2} \\ R_2 & R_1 & R_0 & \cdots & R_{n-5} & R_{n-4} & R_{n-3} \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ R_{n-3} & R_{n-4} & R_{n-5} & \cdots & R_0 & R_1 & R_2 \\ R_{n-2} & R_{n-3} & R_{n-4} & \cdots & R_1 & R_0 & R_1 \\ R_{n-1} & R_{n-2} & R_{n-3} & \cdots & R_2 & R_1 & R_0 \end{pmatrix}$$

Let's look at the first row. These values correspond to $(Y_1, Y_j)$ pairs. For instance, the first cell in the row is the $(Y_1, Y_1)$ pair, which, using the $R_k$ notation, is $R_0$. We continue to increment the $j$'s until we reach $(Y_1, Y_n)$, which has a corresponding $R_k$ value of $R_{n-1}$.

Similarly, we can see that the second row corresponds to $(Y_2, Y_j)$ pairs, the $n$th row corresponds to $(Y_n, Y_j)$ pairs, the first column corresponds to $(Y_i, Y_1)$ pairs, and the last column corresponds to $(Y_i, Y_n)$ pairs.

We can see that there are $n$ $R_0$ terms, since we have an $n \times n$ matrix. How many $R_1$ terms do we have? If we shift the $R_1$ diagonal above the $R_0$ diagonal down by one row, we see that we have $n-1$ $R_1$ terms. If we shift the $R_1$ diagonal below the $R_0$ diagonal up by one row, we see that we have another $n-1$ $R_1$ terms. In total, we have $2(n-1)$ $R_1$ terms.

If we continue this process, we will see that we have $2(n-2)$ $R_2$ terms and, in general, $2(n-k)$ $R_k$ terms, all the way down to $2(n-(n-1)) = 2$ $R_{n-1}$ terms.

Equation $(2)$ above is very important because it relates the variance of the sample mean to the covariances of the process. With this in mind, let's define a new quantity, $\sigma^2_n$, which is $n$ times the variance of the sample mean:

$$\sigma^2_n \equiv n\text{Var}(\bar Y_n) = R_0 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)R_k$$

We can also define the related variance parameter, $\sigma^2$, which is the limit of $\sigma^2_n$ as $n$ goes to infinity:

$$\sigma^2 \equiv \lim_{n \to \infty} \sigma^2_n \overset{*}{=} R_0 + 2 \sum_{k=1}^\infty R_k = \sum_{k=-\infty}^\infty R_k$$

Note that, as $n$ goes to infinity, the $k/n$ term goes to zero, so:

$$R_0 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)R_k \to R_0 + 2\sum_{k=1}^{\infty}R_k$$

The weird $\overset{*}{=}$ notation means that the equality holds only if the $R_k$ terms decrease to 0 quickly as $k \to \infty$.

If the $Y_i$'s are iid, then for all $k \neq 0$, we have $R_k = 0$. In that case, $\sigma^2_n = \sigma^2 = R_0 = \text{Var}(Y_1)$.

However, in the dependent case, $\sigma^2$ adds in the effects of all of the pairwise, nonzero covariances. In queueing applications, these covariances are always positive, which makes sense: if we wait a long time, the person next to us probably waits a long time. As a result, $\sigma^2 \doteq \sigma^2_n \gg \text{Var}(Y_1)$, which may be much bigger than we expect.

We can think of the ratio $\sigma^2_n / \text{Var}(Y_1)$ as, roughly, the number of correlated $Y_i$ observations needed to obtain the information equivalent to one "independent" observation.

If $\sigma^2_n \gg \text{Var}(Y_1)$, then the classical confidence interval (CI) for the mean $\mu$ will not be accurate; we'll discuss this phenomenon more shortly.

Example

The first-order autoregressive process is defined by:

$$Y_i = \phi Y_{i-1} + \epsilon_i, \quad i = 1,2,\ldots$$

Note that $-1 < \phi < 1$, $Y_0 \sim \text{Nor}(0,1)$, and the $\epsilon_i$'s are iid $\text{Nor}(0, 1-\phi^2)$ random variables that are independent of $Y_0$. The $Y_i$'s are all $\text{Nor}(0,1)$, and the covariance function is defined as:

$$R_k = \phi^{|k|}$$

Let's apply the definition of $\sigma^2$:

$$\sigma^2 = \sum_{k=-\infty}^\infty \phi^{|k|}$$

After some algebra, we get the following result:

$$\sigma^2 = \frac{1 + \phi}{1 - \phi}$$

Note that $\sigma^2 \neq 1 = \text{Var}(Y_1)$. As an example, if $\phi = 0.9$, then $\sigma^2 = 19$. We need to collect nineteen correlated observations to have the information that we would get from just one iid observation.
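To see this variance inflation numerically, here is a minimal Python sketch (assuming NumPy is available) that simulates the AR(1) process with $\phi = 0.9$ and compares the empirical variance of the sample mean, estimated over many runs, against the naive $\text{Var}(Y_1)/n$ and the asymptotic $\sigma^2/n$:

```python
import numpy as np

def ar1_path(phi, n, rng):
    """Generate one AR(1) path with Y_0 ~ Nor(0,1) and iid Nor(0, 1-phi^2) noise."""
    y = np.empty(n)
    prev = rng.normal(0.0, 1.0)                      # Y_0
    for i in range(n):
        prev = phi * prev + rng.normal(0.0, np.sqrt(1.0 - phi**2))
        y[i] = prev
    return y

phi, n, runs = 0.9, 1000, 2000
rng = np.random.default_rng(42)
means = np.array([ar1_path(phi, n, rng).mean() for _ in range(runs)])

print("empirical Var(Y-bar)          :", means.var(ddof=1))
print("naive Var(Y_1)/n = 1/n        :", 1.0 / n)
print("sigma^2/n = (1+phi)/((1-phi)n):", (1 + phi) / ((1 - phi) * n))
```

The empirical variance of the sample mean lands near $\sigma^2/n = 0.019$, roughly nineteen times larger than the naive $1/n$.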

Properties of the Sample Variance

Let's remember the formula for the sample variance:

$$S_Y^2 \equiv \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar Y_n)^2$$

If $Y_1, Y_2, \ldots, Y_n$ are iid, then $S^2_Y$ is unbiased for $R_0 = \text{Var}(Y_1)$. Moreover, $S^2_Y$ is also unbiased for $\sigma^2_n = n\text{Var}(\bar Y_n) = R_0$ and $\sigma^2 = \lim_{n \to \infty}\sigma^2_n = R_0$.

If the $Y_i$'s are dependent, then $S^2_Y$ may not be such a great estimator for $\text{Var}(Y_1)$, $\sigma^2_n$, or $\sigma^2$.

Let's again suppose that $Y_1, Y_2, \ldots, Y_n$ are identically distributed with mean $\mu$, and correlated with covariance function $R_k$.

Let's get the expected value of $S^2_Y$:

$$\begin{aligned} E[S^2_Y] &= \frac{1}{n-1} E\left[\sum_{i=1}^n(Y_i - \bar Y_n)^2\right] \\ &= \frac{1}{n-1} E\left[\sum_{i=1}^nY_i^2 - n\bar Y_n^2\right] \end{aligned}$$

Note that, since the $Y_i$'s are identically distributed, $E[Y_i^2] = E[Y_1^2]$:

$$\begin{aligned} E[S^2_Y] &= \frac{1}{n-1} \left(\sum_{i=1}^n E[Y_1^2] - nE[\bar Y_n^2]\right) \\ &= \frac{1}{n-1} \left(nE[Y_1^2] - nE[\bar Y_n^2]\right) \\ &= \frac{n}{n-1} \left(E[Y_1^2] - E[\bar Y_n^2]\right) \end{aligned}$$

Now, let's remember that $\text{Var}(X) = E[X^2] - (E[X])^2$. Therefore:

$$\begin{aligned} E[S^2_Y] &= \frac{n}{n-1} \left(E[Y_1^2] - E[\bar Y_n^2]\right) \\ &= \frac{n}{n-1} \left[\left\{\text{Var}(Y_1) + (E[Y_1])^2\right\} - \left\{\text{Var}(\bar Y_n) + (E[\bar Y_n])^2\right\}\right] \end{aligned}$$

Since $\mu = E[Y_1] = E[\bar Y_n]$, the squared-mean terms cancel:

$$\begin{aligned} E[S^2_Y] &= \frac{n}{n-1} \left[\left\{\text{Var}(Y_1) + (E[Y_1])^2\right\} - \left\{\text{Var}(\bar Y_n) + (E[\bar Y_n])^2\right\}\right] \\ &= \frac{n}{n-1} \left[\text{Var}(Y_1) - \text{Var}(\bar Y_n)\right] \end{aligned}$$

Let's assume that the $R_k$'s are greater than zero. Remember equation $(2)$ from above:

$$\text{Var}(\bar Y_n) = \frac{1}{n} \left[R_0 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)R_k\right] \quad (2)$$

Given that $R_0 = \text{Var}(Y_1)$, we can rewrite the expected value of the sample variance as:

$$E[S^2_Y] = \frac{n}{n-1}\left\{R_0 - \frac{1}{n}\left[R_0 + 2 \sum_{k=1}^{n-1}\left(1-\frac{k}{n}\right)R_k\right]\right\}$$

Let's simplify:

$$E[S^2_Y] = R_0 - \frac{2}{n-1} \sum_{k=1}^{n-1}\left(1-\frac{k}{n}\right)R_k$$

If the $R_k$'s are positive, which we expect them to be since the $Y_i$'s are correlated, then:

$$\frac{2}{n-1} \sum_{k=1}^{n-1}\left(1-\frac{k}{n}\right)R_k > 0$$

Therefore:

$$E[S^2_Y] < R_0$$

In turn:

$$R_0 \ll \sigma^2_n = R_0 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)R_k$$

Collecting these results shows that:

$$E[S^2_Y] < \text{Var}(Y_1) \ll n \text{Var}(\bar Y_n)$$

As a result, we should not use $S^2_Y / n$ to estimate $\text{Var}(\bar Y_n)$. What happens if we do use it?

Let's look at the classical $100(1 - \alpha)\%$ CI for the mean $\mu$ of iid normal observations with unknown variance:

$$\mu \in \bar Y_n \pm t_{\alpha/2, n-1}\sqrt{S^2_Y/n}$$

Since we just showed that $E[S^2_Y/n] \ll \text{Var}(\bar Y_n)$, the CI will have true coverage $\ll 1 - \alpha$. Our confidence interval will have much less coverage than we claim it has. This result demonstrates why we have to be very careful with correlated data.
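To see the coverage collapse concretely, here is a small Python sketch (an illustration assuming NumPy and SciPy are available, reusing the AR(1) process from the earlier example as the correlated data) that estimates the true coverage of the nominal 95% classical CI built from $S^2_Y/n$:

```python
import numpy as np
from scipy import stats

def ar1_path(phi, n, rng):
    """AR(1) with Y_0 ~ Nor(0,1) and iid Nor(0, 1-phi^2) noise; the true mean is 0."""
    y = np.empty(n)
    prev = rng.normal()
    for i in range(n):
        prev = phi * prev + rng.normal(0.0, np.sqrt(1.0 - phi**2))
        y[i] = prev
    return y

phi, n, runs, alpha, mu = 0.9, 200, 2000, 0.05, 0.0
rng = np.random.default_rng(0)
t = stats.t.ppf(1 - alpha / 2, n - 1)

covered = 0
for _ in range(runs):
    y = ar1_path(phi, n, rng)
    half = t * np.sqrt(y.var(ddof=1) / n)     # classical half-length based on S_Y^2 / n
    covered += (y.mean() - half <= mu <= y.mean() + half)

print("nominal coverage :", 1 - alpha)
print("observed coverage:", covered / runs)   # typically far below 0.95 for phi = 0.9
```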

Finite-Horizon Analysis

In this lesson, we will talk about how to deal with correlated data in the context of finite-horizon simulations using a technique called the method of independent replications.

Finite-Horizon Analysis

Our goal is to simulate some system of interest over a finite time horizon and analyze the output. For now, we will look at discrete simulation output, $Y_1, Y_2, \ldots, Y_m$, where the number of observations, $m$, can be a constant or random.

For example, we can obtain the waiting times, $Y_1, Y_2, \ldots, Y_{100}$, of the first one hundred customers to show up at a store. In this case, $m$ is a specified constant: 100. Alternatively, $m$ could denote the random number of customers observed during a time interval, $[0, T]$, where $T$ itself is known or random. For instance, we might consider all the customers that show up between 10 a.m. and 2 p.m. In this case, $T$ is specified, but $m$ is random. We could also consider all customers from 10 a.m. until the store owner has to leave to pick up his kid from school. In this case, $T$ is random.

We don't have to focus only on discrete output. We might observe continuous simulation output, $\{Y(t) \mid 0 \leq t \leq T\}$, over an interval $[0, T]$, where $T$ again can be known or random. For example, $Y(t)$ might denote the number of customers in the queue between 8 a.m. and 5 p.m., or it might denote the number of customers between 8 a.m. and whenever the store owner has to leave to pick his kid up. In the first case, $T$ is known; in the second, $T$ is random.

Estimating Sample Mean

We want to estimate the expected value of the sample mean of a collection of discrete observations. For example, we might be taking observations at the bank, and we might want to estimate the average waiting time for a customer. Let's call the expected value of the sample mean $\theta$. Therefore:

$$\theta \equiv E[\bar Y_m], \quad \bar Y_m \equiv \frac{1}{m}\sum_{i=1}^m Y_i$$

By definition, $\bar Y_m$ is an unbiased estimator for $\theta$ because $E[\bar Y_m] = \theta$. Now, we also need to provide an estimate for the variance of $\bar Y_m$. With these two estimates, we can provide a confidence interval for the true value of $\theta$.

Since the $Y_i$'s are not necessarily iid, the variance of $\bar Y_m$ is not equal to $\text{Var}(Y_1) / m$. As a result, we cannot use the sample variance, $S^2_Y$, divided by $m$ to estimate $\text{Var}(\bar Y_m)$. What should we do?

Method of Independent Replications

We can use the method of independent replications to estimate $\text{Var}(\bar Y_m)$. This method involves conducting $r$ independent simulation runs, or replications, of the system under study, where each replication consists of $m$ observations. We can easily make the replications independent by re-initializing each replication with a different pseudo-random number seed.

We denote the sample mean from replication $i$ by:

$$Z_i \equiv \frac{1}{m} \sum_{j=1}^m Y_{i,j}$$

Note that $Y_{i,j}$ is observation $j = 1,2,\ldots,m$ from replication $i = 1,2,\ldots,r$. Put another way, $Y_{i,j}$ is customer $j$'s waiting time in replication $i$. For example, if we have five replications, each with one hundred observations, then $Y_{3,20}$ is the twentieth observation from the third replication.

If we start each replication under the same operating conditions - all of the queues are empty and idle, for example - then the replication sample means, $Z_1, Z_2, \ldots, Z_r$, are iid random variables.

We can define the grand sample mean, $\bar Z_r$, as the mean of the replication means:

$$\bar Z_r \equiv \frac{1}{r} \sum_{i=1}^r Z_i$$

The obvious point estimator for $\text{Var}(\bar Y_m) = \text{Var}(Z_i)$ is the sample variance of the $Z_i$'s, because those observations are iid. Let's remember the definition of the sample variance:

$$S^2_Z \equiv \frac{1}{r-1} \sum_{i=1}^r (Z_i - \bar Z_r)^2$$

Note that the forms of $S^2_Z$ and $S^2_Y$, which we shouldn't use, resemble each other. But, since the replication sample means are iid, $S^2_Z$ is usually much less biased for $\text{Var}(\bar Y_m) = \text{Var}(Z_i)$ than is $S^2_Y / m$. Since the $Z_i$'s are iid, we can use $S^2_Z / r$ as a reasonable estimator for the variance of $\bar Z_r$.

If the number of observations per replication, $m$, is large enough, the central limit theorem tells us that the replicate sample means, $Z_1, Z_2, \ldots, Z_r$, are approximately iid $\text{Nor}(\theta, \text{Var}(Z_1))$. From that result, we can see that $S^2_Z$ is approximately distributed as a $\chi^2$ random variable:

$$S^2_Z \approx \frac{\text{Var}(Z_1)\chi^2(r-1)}{r-1}$$

From there, we have the approximate independent replications $100(1-\alpha)\%$ two-sided confidence interval for $\theta$:

$$\theta \in \bar Z_r \pm t_{\alpha/2, r-1}\sqrt{S^2_Z / r}$$

Let's recap what we did. We took $r$ replications of $m$ correlated observations, which we then transformed into $r$ iid $Z_i$'s. From there, we took the sample mean and the sample variance, and we were able to construct a confidence interval for $\theta$.
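Here is a minimal Python sketch of the independent-replications confidence interval (assuming NumPy and SciPy are available; `run_replication(seed)` is a hypothetical user-supplied function that returns one replication's $m$ observations, and the toy version below is only a stand-in so the example runs):

```python
import numpy as np
from scipy import stats

def ir_confidence_interval(run_replication, r, alpha=0.05):
    """Method of independent replications: r reps -> point estimate and CI for theta."""
    z = np.array([np.mean(run_replication(seed)) for seed in range(r)])  # Z_1,...,Z_r
    z_bar = z.mean()                               # grand sample mean
    s2_z = z.var(ddof=1)                           # sample variance of the Z_i's
    half = stats.t.ppf(1 - alpha / 2, r - 1) * np.sqrt(s2_z / r)
    return z_bar, (z_bar - half, z_bar + half)

def run_replication(seed, m=5000, phi=0.8):
    """Toy stand-in: m positively correlated 'waiting times' from one replication."""
    rng = np.random.default_rng(seed)
    y, prev = np.empty(m), 5.0
    for j in range(m):
        prev = 1.0 + phi * prev + rng.exponential(1.0)
        y[j] = prev
    return y

theta_hat, ci = ir_confidence_interval(run_replication, r=5)
print("point estimate:", theta_hat, "95% CI:", ci)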

Example

Suppose we want to estimate the expected average waiting time for the first $m = 5000$ customers at the bank on a given day. We will make $r=5$ independent replications, each initialized empty and idle and consisting of $5000$ waiting times. Consider the following replicate means:

$$\begin{array}{c|ccccc} i & 1 & 2 & 3 & 4 & 5 \\ \hline Z_i & 3.2 & 4.3 & 5.1 & 4.2 & 4.6 \end{array}$$

If we plug and chug, we can see that $\bar Z_5 = 4.28$ and $S^2_Z = 0.487$. If $\alpha = 0.05$, the corresponding t-quantile is $t_{0.025, 4} = 2.78$, which gives us the following 95% confidence interval for the expected average waiting time for the first 5000 customers:

$$\theta \in 4.28 \pm (2.78) \sqrt{0.487 / 5} = [3.41, 5.15]$$

Finite-Horizon Extensions

In this lesson, we will expand the range of benefits that the method of independent replications (IR) can provide. For example, we can use IR to get smaller confidence intervals for the mean or to produce confidence intervals for things other than the mean, like quantiles.

Smaller Confidence Intervals

If we want a smaller confidence interval, we need to run more replications. Let's recall the approximate independent replications $100(1-\alpha)\%$ two-sided confidence interval for $\theta$:

$$\theta \in \bar Z_r \pm t_{\alpha/2, r-1}\sqrt{S^2_Z / r}$$

We can define the half-length of the current confidence interval as the term to the right of the plus-minus sign. We'll call that $H$:

$$H \equiv t_{\alpha/2,r-1}\sqrt{S^2_Z / r}$$

Suppose we would like our confidence interval to have half-length $\epsilon$:

$$\epsilon \equiv t_{\alpha/2,r^*-1}\sqrt{S^2_Z / r^*}$$

Note here that we are holding the variance estimator fixed based on the current number of replications - we can't look into the future. We need to find some $r^* > r$ such that our confidence interval is sufficiently narrow.

Let's solve the equation for the desired half-length for $r^*$:

$$\begin{aligned} \epsilon &= t_{\alpha/2,r^*-1}\sqrt{S^2_Z / r^*} \\ \epsilon^2 &= \frac{t^2_{\alpha/2,r^*-1}S^2_Z}{r^*} \\ r^* &= \frac{t^2_{\alpha/2,r^*-1}S^2_Z}{\epsilon^2} \end{aligned}$$

As we increase the degrees of freedom, the corresponding value of the $t$-quantile decreases. Therefore, since $r^* > r$:

$$t^2_{\alpha/2,r^*-1} < t^2_{\alpha/2,r-1}$$

So:

$$r^* = \frac{t^2_{\alpha/2,r^*-1}S^2_Z}{\epsilon^2} < \frac{t^2_{\alpha/2,r-1}S^2_Z}{\epsilon^2}$$

With a little algebra, we can see that:

$$H^2 r = t^2_{\alpha/2,r-1}S^2_Z$$

So:

$$r^* = \frac{t^2_{\alpha/2,r^*-1}S^2_Z}{\epsilon^2} < \frac{t^2_{\alpha/2,r-1}S^2_Z}{\epsilon^2} = (H/\epsilon)^2 r$$

Using this expression, we can set $r^* = (H/\epsilon)^2 r$, run $r^* - r$ more replications, and re-calculate the confidence interval using all $r^*$ reps. Our new confidence interval will have a half-length that is probably close to $\epsilon$.

Notice the squared term in this expression: if we want to reduce the length of the confidence interval by a factor of $x$, we will need to increase our replications by a factor of $x^2$.
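A short Python sketch of this two-stage rule (NumPy/SciPy assumed; the replicate means from the example above are reused as pilot data, and the target half-length of 0.5 is chosen purely for illustration):

```python
import numpy as np
from scipy import stats

def reps_needed(current_half_length, target_half_length, r):
    """Two-stage rule: r* = ceil((H / epsilon)^2 * r), holding S_Z^2 fixed."""
    return int(np.ceil((current_half_length / target_half_length) ** 2 * r))

# Pilot stage: r = 5 replications with the replicate means from the example.
z = np.array([3.2, 4.3, 5.1, 4.2, 4.6])
r, alpha, eps = len(z), 0.05, 0.5
H = stats.t.ppf(1 - alpha / 2, r - 1) * np.sqrt(z.var(ddof=1) / r)

r_star = reps_needed(H, eps, r)
print(f"current half-length H = {H:.3f}; run {r_star - r} more reps for half-length ~ {eps}")
```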

Quantiles

The $p$-quantile of a random variable, $W$, having cdf $F(w)$ is defined as:

$$\xi_p \equiv \min \{w \mid F(w) \geq p\}$$

If $W$ is continuous, then $\xi_p$ is the unique value such that $F(\xi_p) = p$, i.e., $F^{-1}(p) = \xi_p$.

For example, let's suppose that we have a random variable $W \sim \text{Exp}(\lambda)$. We know that $W$ has the cdf $F(w) = 1 - e^{-\lambda w}$. Therefore:

$$\begin{aligned} p &= 1 - e^{-\lambda \xi_p} \\ 1 - p &= e^{-\lambda \xi_p} \\ \ln(1-p) &= -\lambda \xi_p \\ \frac{-1}{\lambda} \ln(1-p) &= \xi_p \end{aligned}$$
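As a quick numerical check of this closed form (a small sketch assuming NumPy is available), the formula matches the empirical quantile of a large exponential sample:

```python
import numpy as np

lam, p = 2.0, 0.99
xi_p = -np.log(1 - p) / lam                       # closed-form p-quantile of Exp(lambda)

rng = np.random.default_rng(1)
sample = rng.exponential(scale=1 / lam, size=1_000_000)
print("closed form:", xi_p, " empirical:", np.quantile(sample, p))
```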

We can use the method of independent replications to obtain confidence intervals for quantiles. For example, let $W_i$ denote the maximum waiting time that some customer experiences at the airport ticket counter between 8 a.m. and 5 p.m. during replication $i$ of a simulation, $i = 1,2,\ldots,r$.

Let's order the iid $W_i$'s: $W_{(1)} \leq W_{(2)} \leq \cdots \leq W_{(r)}$. Then, the typical point estimator for the quantile $\xi_p$ is:

$$\widehat \xi_p \equiv W_{(\lfloor rp + 0.5\rfloor)}$$

In layman's terms, we retrieve the $W_i$ that is larger than $rp$ other $W_i$'s. Since we are using order statistics, we can just retrieve, essentially, the $rp$th order statistic. As we have seen before, the $0.5$ addition is a continuity correction, and $\lfloor \cdot \rfloor$ is the "round-down" function. For example, if $r=100$ and $p = 0.75$, we would look at the $\lfloor 75.0 + 0.5\rfloor^{\text{th}} = 75^{\text{th}}$ order statistic.

Now that we have a point estimator for $\xi_p$, we can get a confidence interval for $\xi_p$. A slightly conservative, approximate, nonparametric confidence interval will turn out to be of the form $\xi_p \in [W_{(j)}, W_{(k)}]$.

Let's remember that, by definition:

$$P(W_i \leq \xi_p) = p$$

As a result, we can think about a single $W_i \leq \xi_p$ event as a Bern($p$) trial. If we define a random variable, $A$, as the number of $W_i$'s that are less than or equal to $\xi_p$, then:

$$A \sim \text{Bin}(r,p)$$

The event $\{j \leq A \leq k-1\}$ means that between $j$ and $k-1$ of the $W_i$'s are $\leq \xi_p$. This event is equivalent to the following expression involving the order statistics $W_{(j)}$ and $W_{(k)}$:

$$\left\{(\xi_p \geq W_{(j)}) \text{ and } (\xi_p < W_{(k)})\right\}$$

Therefore:

$$P(W_{(j)} \leq \xi_p < W_{(k)}) = P(j \leq A \leq k-1)$$

We have a binomial expression that is equivalent to the expression on the right above:

$$P(j \leq A \leq k-1) = \sum_{l=j}^{k-1}\binom{r}{l}p^l(1-p)^{r-l}$$

This binomial expression is approximately equal to:

$$\Phi\left(\frac{k - 0.5 - rp}{\sqrt{rp(1-p)}}\right) - \Phi\left(\frac{j - 0.5 - rp}{\sqrt{rp(1-p)}}\right)$$

Note that $\Phi(\cdot)$ is the Nor(0,1) cdf, the 0.5 terms are continuity corrections, and the approximation requires that $rp, r(1-p) \geq 5$.

To get the approximate $100(1-\alpha)\%$ confidence interval of the form $\xi_p \in [W_{(j)}, W_{(k)}]$, we need to find $j$ and $k$ such that:

$$\Phi\left(\frac{k - 0.5 - rp}{\sqrt{rp(1-p)}}\right) - \Phi\left(\frac{j - 0.5 - rp}{\sqrt{rp(1-p)}}\right) \geq 1 - \alpha$$

How do we choose $j$ and $k$? We can set:

$$\frac{j - 0.5 - rp}{\sqrt{rp(1-p)}} = -z_{\alpha/2}, \quad \frac{k - 0.5 - rp}{\sqrt{rp(1-p)}} = z_{\alpha/2}$$

After some algebra:

$$\begin{aligned} j &= \left\lfloor rp + 0.5 - z_{\alpha/2}\sqrt{rp(1-p)}\right\rfloor \\ k &= \left\lceil rp + 0.5 + z_{\alpha/2}\sqrt{rp(1-p)}\right\rceil \end{aligned}$$

Remember that we need the floor and ceiling functions here because $j$ and $k$ must be integers.

Note that if we want to get reasonably small half-lengths for "extreme" ($\geq 0.999$) quantiles, we will likely need to run a huge number of replications.

Example

Let's suppose that we want a $95\%$ confidence interval for $\xi_{0.9}$ and we've made $r = 1000$ replications. The point estimator for $\xi_{0.9}$ is:

$$\widehat \xi_{0.9} = W_{(\lfloor 1000(0.9) + 0.5\rfloor)} = W_{(900)}$$

With the confidence interval in mind, let's compute $j$ and $k$:

$$j = \left\lfloor 900.5 - 1.96\sqrt{90}\right\rfloor = 881, \quad k = \left\lceil 900.5 + 1.96\sqrt{90}\right\rceil = 920$$

As a result, the $95\%$ confidence interval for the quantile is $[W_{(881)}, W_{(900)}]$... rather, $[W_{(881)}, W_{(920)}]$, and the point estimator is $W_{(900)}$.
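A short Python sketch of this quantile point estimate and nonparametric CI (NumPy/SciPy assumed; `w` is a hypothetical array holding one replicate value, such as a maximum waiting time, per replication, and the gamma sample below is only placeholder data):

```python
import numpy as np
from scipy import stats

def quantile_ci(w, p, alpha=0.05):
    """Point estimate W_(floor(rp+0.5)) and CI [W_(j), W_(k)] from iid replicate values."""
    w_sorted = np.sort(w)
    r = len(w)
    z = stats.norm.ppf(1 - alpha / 2)
    point = w_sorted[int(np.floor(r * p + 0.5)) - 1]              # order statistics are 1-indexed
    j = int(np.floor(r * p + 0.5 - z * np.sqrt(r * p * (1 - p))))
    k = int(np.ceil(r * p + 0.5 + z * np.sqrt(r * p * (1 - p))))
    return point, (w_sorted[j - 1], w_sorted[k - 1])

# Toy data shaped like the example: r = 1000 replicate maxima.
rng = np.random.default_rng(7)
w = rng.gamma(shape=3.0, scale=2.0, size=1000)
print(quantile_ci(w, p=0.9))
```

With $r = 1000$ and $p = 0.9$, this reproduces the indices from the example: the point estimate is the 900th order statistic and the interval runs from the 881st to the 920th.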

Simulation Initialization Issues

In this lesson, we will look at how to start a simulation and when to start keeping data for steady-state analysis.

Introduction

Before running a simulation, we must provide initial values for all of the simulation's state variables. We might not know appropriate initial values for these variables, so we might choose them arbitrarily. As always, we have to be careful. Such a choice can have a significant but unrecognized impact on the simulation run's outcome.

For example, it might be convenient to initialize a queue as empty and idle, but that might not be the best choice. What if the store that we are modeling is having a sale, and there is a long line already formed by the time the store opens?

This problem of initialization bias can lead to errors, particularly in steady-state output analysis.

Some Initialization Issues

Visual detection of initialization effects is sometimes very difficult. For example, queuing systems have high intrinsic variance, and it can be hard to see the initialization effects through the noise of the variance.

We have to think about how we should initialize a simulation. For example, suppose a machine shop closes at a certain time each day. There might be customers remaining at the end of the day, and we have to make sure that the next day starts with those customers in mind.

Initialization bias can lead to point estimators for steady-state parameters that have a high mean-squared error and confidence intervals that have poor coverage, all because the initialization is not representative of the steady-state.

Detecting Initialization Bias

We can try to detect initialization bias visually by scanning the simulation output. However, this approach is not simple: the analysis is tedious, and it's easy to miss bias amidst general random noise.

To make the analysis more efficient, we can first transform the data (for instance, take logarithms or square roots), smooth it, average it across several replications, or apply other strategies to illuminate bias.

We can also conduct formal statistical tests for initialization bias. For example, we might conduct a test to see if the mean or variance of a process changes over time. If it does, then the process was not completely in steady-state when we began to collect observations.

Dealing with Bias

We can truncate the output by allowing the simulation to "warm up" before retaining data. Ideally, we want the remaining data to be representative of the steady-state system. Output truncation is the most popular method for dealing with bias, and most simulation languages have built-in truncation functions.

We have to be careful when selecting a truncation point. If we truncate the output too early, then significant bias might still exist in our retained data. If we truncate the output too late, we end up wasting good steady-state observations.

To make matters worse, no simple rule for choosing a truncation point performs well in general. However, a reasonable practice is to average the observations across several replications and then visually choose a truncation point. There are also newer, sophisticated, sequential change-point detection algorithms that are proving to be useful.

Example

We want to select a truncation point somewhere in the following series of observations.

Let's run three replications and look at the resulting data.

Let's average the observations across the three replications.

We can even smooth out the average using moving averages.

Here we can identify what looks like a reasonable truncation point. We'd start retaining observations from this point forward.
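Here is a minimal Python sketch of that workflow (an illustration assuming NumPy and Matplotlib are available; the warm-up curve and the candidate truncation point are synthetic stand-ins, not output from any particular model): average the output across replications, smooth it with a moving average, and plot so a truncation point can be chosen by eye.

```python
import numpy as np
import matplotlib.pyplot as plt

def smoothed_average(replications, window=25):
    """Average output across replications, then smooth with a moving average."""
    avg = replications.mean(axis=0)                    # pointwise average over replications
    kernel = np.ones(window) / window
    return avg, np.convolve(avg, kernel, mode="valid")

# Toy data: 3 replications of a process that warms up toward a steady-state mean of 10.
rng = np.random.default_rng(3)
t = np.arange(2000)
replications = 10 * (1 - np.exp(-t / 300)) + rng.normal(0, 2, size=(3, t.size))

avg, smooth = smoothed_average(replications)
plt.plot(avg, alpha=0.4, label="average of replications")
plt.plot(np.arange(len(smooth)) + 12, smooth, label="moving average")
plt.axvline(900, linestyle="--", label="candidate truncation point")   # chosen visually
plt.legend()
plt.show()
```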

Dealing with Bias, Again

As an alternative, we could take an extremely long simulation run to overwhelm the initialization effects. This strategy is conceptually simple to carry out and may yield point estimators having lower mean-squared error than the truncation method. However, this approach can be wasteful with observations. In other words, we might need an excessively long run before the initialization effects become negligible.

Steady-State Analysis

In this lesson, we will introduce steady-state simulation analysis and the method of batch means.

Steady-State Analysis

Let's assume that we have removed our initialization bias, and we have steady-state, or stationary, simulation output $Y_1, Y_2, \ldots, Y_n$. Our goal is to use this output to estimate some parameter of interest, such as the mean customer waiting time or the expected profit produced by a certain factory configuration.

In particular, suppose the mean of this output is the unknown quantity $\mu$. Like we've seen in the past, we can use the sample mean, $\bar Y_n$, to estimate $\mu$. As in the case of terminating / finite-horizon simulations, we must accompany the value of any point estimator with a measure of its variance. In other words, we need to provide an estimate for $\text{Var}(\bar Y_n)$.

Instead of estimating $\text{Var}(\bar Y_n)$ directly, we can estimate the variance parameter, which is basically equivalent:

$$\sigma^2 \equiv \lim_{n \to \infty} n\text{Var}(\bar Y_n)$$

Remember that, if the observations are correlated, then $n\text{Var}(\bar Y_n)$ is equal to the following:

$$n\text{Var}(\bar Y_n) = R_0 + 2 \sum_{k=1}^{n-1} \left(1 - \frac{k}{n}\right)R_k$$

If the $R_k$'s decrease quickly as $k \to \infty$, then $\sigma^2$ is just the sum of the covariances:

$$\sigma^2 = \sum_{k=-\infty}^\infty R_k$$

The quantity $\sigma^2$ shows up all over the place, from simulation output analysis to Brownian motions to financial engineering applications, and more.

For example, consider an MA(1) process, $Y_{i+1} = \theta\epsilon_i + \epsilon_{i+1}$, where the $\epsilon_i$'s are iid Nor(0,1). We determined that $R_0 = 1+ \theta^2$, $R_{\pm 1} = \theta$, and $R_k = 0$ for $|k| \geq 2$. Therefore:

$$\begin{aligned} \sigma^2 &= \lim_{n \to \infty}\left[R_0 + 2 \sum_{k=1}^{n-1} \left(1 - \frac{k}{n}\right)R_k\right] \\ &= \lim_{n \to \infty}\left[(1 + \theta^2) + 2\theta\left(1-\frac{1}{n}\right) + 2 \sum_{k=2}^{n-1}0\right] \\ &= \lim_{n \to \infty}\left[(1 + \theta^2) + 2\theta-\frac{2\theta}{n}\right] \\ &= \lim_{n \to \infty}\left[(1 + \theta)^2 -\frac{2\theta}{n}\right] \\ &= (1+\theta)^2 \end{aligned}$$

As another example, consider an AR(1) process, $Y_{i+1} = \phi Y_i + \epsilon_{i+1}$, where the $\epsilon_i$'s are iid $\text{Nor}(0, 1-\phi^2)$, $-1 < \phi < 1$, and $Y_0 \sim \text{Nor}(0,1)$. For this process, $R_k = \phi^{|k|}$. We determined that $\sigma^2 = (1+\phi) / (1- \phi)$, so $\sigma^2 \to \infty$ as $\phi \to 1$.

Many methods exist for estimating ฯƒ2\sigma^2 and conducting steady-state output analysis in general, such as:

  • batch means
  • independent replications
  • standardized time series
  • spectral analysis
  • regeneration
  • ARMA time series modeling

Method of Batch Means

We can use the method of batch means (BM) to estimate $\sigma^2$ and calculate confidence intervals for $\mu$. We divide one long simulation run into several contiguous batches and then appeal to a central limit theorem to assume that the resulting batch sample means are approximately iid normal.

In particular, suppose we partition our observations, $Y_1, Y_2, \ldots, Y_n$, into $b$ nonoverlapping, contiguous batches, each consisting of $m$ observations. Therefore, $n = bm$. For example, the first batch contains the observations $Y_1,\ldots,Y_m$, the second batch contains the observations $Y_{m+1},\ldots,Y_{2m}$, and the $b$th batch contains the observations $Y_{(b-1)m+1},\ldots,Y_{bm}$.

We can admit that the observations at the end of batch $b-1$ and the beginning of batch $b$ are slightly correlated. Still, we can treat just about all of the observations across batches as being independent.

The $i$th batch mean is the sample mean of the $m$ observations from batch $i = 1,2,\ldots,b$:

$$\bar Y_{i,m} \equiv \frac{1}{m} \sum_{j=1}^m Y_{(i-1)m + j}$$

Batch $i$ begins at observation $Y_{(i-1)m + 1}$ and ends at $Y_{(i-1)m + m} = Y_{im}$. Summing those $m$ observations and dividing by $m$ gives the batch sample mean.

The batch means are correlated for small $m$, but, for large $m$:

$$\bar Y_{1,m},\ldots,\bar Y_{b,m} \overset{\text{iid}}{\approx} \text{Nor}(\mu, \text{Var}(\bar Y_{i,m})) \approx \text{Nor}(\mu, \sigma^2 /m)$$

Similar to the method of independent replications, we can define the batch means estimator for $\sigma^2 = \lim_{n \to \infty} n \text{Var}(\bar Y_n) = \lim_{m \to \infty} m \text{Var}(\bar Y_{1,m})$ as:

$$\widehat V_B \equiv \frac{m}{b-1} \sum_{i=1}^b \left(\bar Y_{i,m} - \bar Y_n\right)^2$$

We can see that $\widehat V_B$ looks a lot like the sample variance of the batch means. Notice the $m$ in the numerator. Since we defined $\sigma^2$ as the limit of $m \text{Var}(\bar Y_{1,m})$, we need to include the $m$ in the estimator. We can also see that $\widehat V_B$ is approximately distributed as a $\chi^2$ random variable:

$$\widehat V_B \approx \frac{\sigma^2\chi^2(b-1)}{b-1}$$

How good is $\widehat V_B$ as an estimator for $\sigma^2$? Let's look at the expected value:

$$E[\widehat V_B] \doteq \frac{\sigma^2}{b-1} E[\chi^2(b-1)]$$

Since $E[\chi^2(b-1)] = b-1$, then $E[\widehat V_B] \doteq \sigma^2$, so $\widehat V_B$ is asymptotically unbiased for $\sigma^2$ as the batch size $m \to \infty$.
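Here is a minimal Python sketch of batch means in practice (assuming NumPy and SciPy are available; `y` is hypothetical post-warm-up steady-state output, and the CI shown is the usual batch-means interval analogous to the independent-replications interval, with $S^2_Z/r$ replaced by $\widehat V_B/n$):

```python
import numpy as np
from scipy import stats

def batch_means_ci(y, b, alpha=0.05):
    """Split y into b contiguous batches, estimate sigma^2, and return a CI for mu."""
    n = len(y) - (len(y) % b)                        # drop leftovers so n = b * m
    m = n // b
    batch_means = y[:n].reshape(b, m).mean(axis=1)   # batch means, i = 1..b
    y_bar = y[:n].mean()
    v_b = m * np.sum((batch_means - y_bar) ** 2) / (b - 1)   # batch means estimator of sigma^2
    half = stats.t.ppf(1 - alpha / 2, b - 1) * np.sqrt(v_b / n)
    return y_bar, (y_bar - half, y_bar + half), v_b

# Toy steady-state AR(1) output with phi = 0.9 (true mean 0, sigma^2 = 19).
rng = np.random.default_rng(5)
phi, n = 0.9, 100_000
eps = rng.normal(0, np.sqrt(1 - phi**2), n)
y = np.empty(n)
prev = rng.normal()
for i in range(n):
    prev = phi * prev + eps[i]
    y[i] = prev

print(batch_means_ci(y, b=30))
```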

Properties of Batch Means (OPTIONAL)

In this lesson, we will look at some properties of the batch means estimator for $\sigma^2$, as well as the batch means confidence interval for the steady-state mean. This confidence interval is the key result in steady-state analysis, if not output analysis as a whole.

Estimator

Let's recall the batch means estimator for $\sigma^2$:

$$\widehat V_B \equiv \frac{m}{b-1} \sum_{i=1}^b (\bar Y_{i,m} - \bar Y_n)^2$$

Here, $m$ is the batch size, $b$ is the number of batches, $n$ is the sample size, $\bar Y_{i,m}$ is the $i$th batch mean, and $\bar Y_n$ is the sample mean. As we saw last time, $\widehat V_B$ is approximately distributed as a $\chi^2$ random variable:

$$\widehat V_B \approx \frac{\sigma^2\chi^2(b-1)}{b-1}$$

How good is $\widehat V_B$ as an estimator of $\sigma^2$? Let's look at its mean and variance. As we saw last time:

$$E[\widehat V_B] \doteq \frac{\sigma^2}{b-1}E[\chi^2(b-1)] = \sigma^2$$

Since the estimator's expected value equals the value it estimates, $\widehat V_B$ is asymptotically unbiased for $\sigma^2$ as $m \to \infty$.

More-Precise Result

It turns out that we can come up with a more precise expression for the expected value of $\widehat V_B$:

$$E[\widehat V_B] = \sigma^2 + \frac{\gamma(b+1)}{mb} + o(1/m)$$

We don't particularly care about the $o$ function, as it goes to zero faster than $1/m$ as $m$ gets big.

However, the middle term doesn't go away as quickly, and it's a first-order bias term. The good news is that, whatever the number of batches, $b$, this term decreases as the batch size, $m$, gets big.

Let's look at the definition of $\gamma$, which is a function of the covariances:

$$\gamma \equiv -2 \sum_{k=1}^\infty kR_k$$

Let's look at the variance of our estimator:

$$\text{Var}(\widehat V_B) \doteq \frac{\sigma^4}{(b-1)^2} \text{Var}(\chi^2(b-1)) = \frac{2\sigma^4}{b-1}$$

Mean-Squared Error of $\widehat V_B$

These facts immediately imply that, for large $m$ and $b$:

$$\text{MSE}(\widehat V_B) = \text{Bias}^2 + \text{Var} \doteq \left(\frac{\gamma(b+1)}{mb}\right)^2 + \frac{2\sigma^4}{b-1}$$

Of course, for large $b$, $b - 1 \approx b + 1 \approx b$, so:

$$\text{MSE}(\widehat V_B) \doteq \frac{\gamma^2}{m^2} + \frac{2\sigma^4}{b}$$

Our goal is to find the best choice of $b$ and $m$, subject to the constraint $n = bm$, so as to minimize the MSE.

Let's fix $b$ for now and find the appropriate value for $m$. To do so, let's take $m = cn^\delta$, where $c > 0$ and $0 < \delta < 1$.

$$\text{MSE}(\widehat V_B) \doteq \frac{\gamma^2}{c^2n^{2\delta}} + \frac{2\sigma^4}{b}$$

We know that $b = n / m$, so:

$$b = n / (cn^\delta) = n^{1-\delta} / c$$

Therefore:

$$\text{MSE}(\widehat V_B) \doteq \frac{\gamma^2}{c^2n^{2\delta}} + \frac{2c\sigma^4}{n^{1-\delta}}$$

If we choose any $\delta \neq 1/3$, one of the two terms will converge to zero more slowly than the other. If we choose $\delta = 1/3$, both terms converge to zero at rate $1/n^{2/3}$. Therefore:

$$\text{MSE}(\widehat V_B) \doteq \frac{1}{n^{2/3}} \left[ \frac{\gamma^2}{c^2} + 2c\sigma^4\right]$$
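A tiny Python sketch of this rule of thumb (an assumption-laden illustration: $c$ is arbitrarily set to 1, since the truly optimal constant would require knowledge of $\gamma$ and $\sigma^2$): given a run length $n$, choose $m \approx cn^{1/3}$ and let $b = n/m$.

```python
import numpy as np

def mse_optimal_batching(n, c=1.0, delta=1/3):
    """Choose batch size m = c * n**delta (delta = 1/3 balances bias^2 and variance)."""
    m = max(1, int(round(c * n ** delta)))
    b = n // m                                  # number of full batches
    return m, b

for n in (10_000, 100_000, 1_000_000):
    print(n, mse_optimal_batching(n))
```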