Why the sample variance is unbiased

Have you ever wondered why the sample variance has an $n-1$ in the denominator?

$\hat{\sigma^2} = \frac{1}{n-1} \sum_{i=1}^{n}(X_i - \bar{X})^2$

This video works it out. I'll write out the math here.
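As a quick sanity check (not in the video; the sample values here are made up for illustration), the formula above is exactly what NumPy computes when you pass `ddof=1`:

```python
import numpy as np

# A small made-up sample, just for illustration
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

xbar = x.mean()
by_hand = ((x - xbar) ** 2).sum() / (len(x) - 1)

# ddof=1 tells NumPy to divide by n-1 instead of n
print(by_hand, np.var(x, ddof=1))  # both print 4.571428571428571
```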

First, remember that by linearity of expectation:
$E(\Sigma X) = \Sigma E(X)$
$E(cX) = cE(X)$

And also that the variance of a random variable can be written in terms of expectation like so:
$Var(X) = E(X^2) - E(X)^2$

Rewriting this, we get:
$E(X^2) = Var(X) + E(X)^2$
$E(X^2) = \sigma^2 + \mu^2$
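You can check this identity numerically with a quick simulation (the values of $\mu$ and $\sigma$ here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0          # arbitrary values for the demo

x = rng.normal(mu, sigma, size=1_000_000)
print((x ** 2).mean())        # ~ sigma^2 + mu^2 = 13
```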

Also recall that $Var(\bar{X}) = \frac{\sigma^2}{n}$ and so:
$Var(\bar{X}) = E(\bar{X}^2) - E(\bar{X})^2$
$E(\bar{X}^2) = Var(\bar{X}) + E(\bar{X})^2$
$E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2$
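This one is easy to verify by simulation too (again, the parameter values are arbitrary): draw many samples of size $n$, take each sample's mean, and look at the variance of those means.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 3.0, 2.0, 10   # arbitrary values for the demo
trials = 100_000

# Each row is one sample of size n; take each row's mean
means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
print(means.var())            # ~ sigma^2 / n = 0.4
```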

Now let's work out the expectation of the sample variance.
$E[\hat{\sigma^2}] = E[\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}]$

We are going to confirm that it's equal to the true variance. Let's just consider the numerator for now. We'll also leave off the summation bounds for ease of reading (and typing...):
$E[\Sigma(X_i - \bar{X})^2]$
$E[\Sigma(X_i^2 - 2X_i\bar{X} + \bar{X}^2)]$
$E[\Sigma(X_i^2) - \Sigma(2X_i\bar{X}) + \Sigma(\bar{X}^2)]$

Because $\bar{X}$ is constant w.r.t. the sum, we can pull it out of the second and third summations:
$E[\Sigma(X_i^2) - 2\bar{X}\Sigma(X_i) + n\bar{X}^2]$

Now since the sample mean $\bar{X} = \frac{\Sigma X_i}{n}$, we can replace $\Sigma X_i = n\bar{X}$:
$E[\Sigma(X_i^2) - 2\bar{X}n\bar{X} + n\bar{X}^2]$
$E[\Sigma(X_i^2) - 2n\bar{X}^2 + n\bar{X}^2]$
$E[\Sigma(X_i^2) - n\bar{X}^2]$
$\Sigma E(X_i^2) - n E(\bar{X}^2)$

And now using the expectations we worked out earlier:
$\Sigma(\sigma^2 + \mu^2) - n (\frac{\sigma^2}{n} + \mu^2)$
$n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2$
$n\sigma^2 - \sigma^2$
$(n-1)\sigma^2$

So that works out the numerator. When we divide by $(n-1)$, we are indeed left with the variance, $\sigma^2$, making the sample variance $\hat{\sigma^2}$ (as defined above) an unbiased estimator.
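To see the bias in action, here's a quick simulation (the true variance and the small sample size are arbitrary choices) comparing dividing by $n$ against dividing by $n-1$:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, n = 4.0, 5            # arbitrary true variance and (small) sample size
trials = 200_000

samples = rng.normal(10.0, np.sqrt(sigma2), size=(trials, n))
xbar = samples.mean(axis=1, keepdims=True)
ss = ((samples - xbar) ** 2).sum(axis=1)   # the numerator from the derivation

biased = (ss / n).mean()           # divide by n: averages ~ (n-1)/n * sigma^2 = 3.2
unbiased = (ss / (n - 1)).mean()   # divide by n-1: averages ~ sigma^2 = 4.0
print(biased, unbiased)
```

With $n = 5$, dividing by $n$ underestimates the true variance by a factor of $\frac{n-1}{n} = 0.8$, exactly as the derivation predicts.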


