Sample Variance

The sample variance $S^2$ measures the spread of observations in a sample around the Sample Mean $\bar{X}$.

Definition

For a sample $X_1, \ldots, X_n$:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the sample mean.
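
As a quick illustration, the sample variance can be computed directly from the definition; in NumPy this corresponds to `np.var` with `ddof=1`. The sample values below are arbitrary.

```python
import numpy as np

x = np.array([2.1, 3.5, 4.0, 5.2, 6.8])  # arbitrary example sample
n = len(x)

x_bar = x.sum() / n                              # sample mean
s2_manual = ((x - x_bar) ** 2).sum() / (n - 1)   # definition: divide by n - 1
s2_numpy = np.var(x, ddof=1)                     # ddof=1 applies Bessel's correction

print(s2_manual, s2_numpy)  # both give the same value
```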

Why $n-1$?

The divisor is $n-1$ (not $n$) to make $S^2$ an unbiased estimator of the population variance $\sigma^2$:

$$E[S^2] = \sigma^2$$

This correction is called Bessel's correction. The $n-1$ accounts for the Degrees of Freedom lost when estimating $\mu$ with $\bar{X}$.
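
A small simulation makes the effect of Bessel's correction visible: averaging $S^2$ over many samples recovers $\sigma^2$, while dividing by $n$ underestimates it. The distribution, sample size, and replication count below are arbitrary choices, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 5.0, 4.0, 10, 100_000   # arbitrary parameters

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n - 1
s2_biased = samples.var(axis=1, ddof=0)     # divide by n

print(s2_unbiased.mean())  # ≈ 4.0  (unbiased)
print(s2_biased.mean())    # ≈ 3.6  (biased low by factor (n-1)/n)
```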

Proof: Unbiased Estimator

Let $X_1, \ldots, X_n$ be a Random Sample (iid) from a distribution with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$.

We want to show that $E[S^2] = \sigma^2$.

Lemma

First, we establish a useful identity: $E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right] = (n-1)\sigma^2$.

  1. Linearity: $E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right] = \sum_{i=1}^{n} E[X_i^2] - n\,E[\bar{X}^2]$
  2. Recall Variance Identities:
    • $E[X_i^2] = \sigma^2 + \mu^2$
    • $\text{Var}(\bar{X}) = \frac{\sigma^2}{n}$
    • $E[\bar{X}^2] = \text{Var}(\bar{X}) + E[\bar{X}]^2 = \frac{\sigma^2}{n} + \mu^2$
  3. Substitution: $\sum_{i=1}^{n}\left[\sigma^2 + \mu^2\right] - n\left[\frac{\sigma^2}{n} + \mu^2\right] = n\left[\sigma^2 + \mu^2\right] - \sigma^2 - n\mu^2 = n\sigma^2 - \sigma^2 = (n-1)\sigma^2$
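
As a sanity check on this identity (not part of the proof), one can estimate $E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right]$ by simulation and compare it to $(n-1)\sigma^2$; the distribution and parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n, reps = 2.0, 9.0, 8, 200_000   # arbitrary parameters

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
x_bar = x.mean(axis=1)
stat = (x ** 2).sum(axis=1) - n * x_bar ** 2   # sum X_i^2 - n * X_bar^2

print(stat.mean())        # ≈ (n - 1) * sigma2
print((n - 1) * sigma2)   # 63.0
```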

Derivation of Expectation

Using the algebraic expansion of the sum of squares:

$$S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
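
In detail, using $\sum_{i=1}^{n} X_i = n\bar{X}$:

$$\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$$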

Taking the expectation:

$$E[S^2] = \frac{1}{n-1}\,E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right]$$

Applying the Lemma:

$$E[S^2] = \frac{1}{n-1}\,(n-1)\sigma^2 = \sigma^2$$

Alternative Form

Equivalently, multiplying the definition through by $n-1$:

$$(n-1)S^2 = \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

This form appears frequently in distribution theory, particularly when working with the Chi-squared Distribution.
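
A one-line numerical check of this rescaled form, with an arbitrary example sample:

```python
import numpy as np

x = np.array([1.2, 0.7, 3.3, 2.8, 4.1, 2.0])  # arbitrary example sample
n = len(x)

lhs = (n - 1) * np.var(x, ddof=1)     # (n - 1) * S^2
rhs = ((x - x.mean()) ** 2).sum()     # sum of squared deviations

print(np.isclose(lhs, rhs))  # True
```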

Sampling Distribution

When sampling from a Normal Distribution $N(\mu, \sigma^2)$:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

See Marginal Distribution of Sample Variance for details.
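
A simulation sketch of this result: draw many normal samples, compute $(n-1)S^2/\sigma^2$ for each, and compare the empirical mean and variance to those of a $\chi^2_{n-1}$ distribution, which are $n-1$ and $2(n-1)$. The parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2, n, reps = 0.0, 2.0, 6, 200_000   # arbitrary parameters

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
stat = (n - 1) * samples.var(axis=1, ddof=1) / sigma2

print(stat.mean(), n - 1)         # empirical mean ≈ n - 1 = 5
print(stat.var(), 2 * (n - 1))    # empirical variance ≈ 2(n - 1) = 10
```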

Summary