Sample Variance

The sample variance $S^2$ measures the spread of observations in a sample around the Sample Mean $\bar{X}$.

Definition

For a sample $X_1, \ldots, X_n$:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the sample mean.
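
As a quick illustration, the sample variance can be computed directly from the definition; in NumPy this corresponds to `np.var` with `ddof=1`. The sample values below are arbitrary.

```python
import numpy as np

x = np.array([2.1, 3.5, 4.0, 5.2, 6.8])  # arbitrary example sample
n = len(x)

x_bar = x.sum() / n                              # sample mean
s2_manual = ((x - x_bar) ** 2).sum() / (n - 1)   # definition: divide by n - 1
s2_numpy = np.var(x, ddof=1)                     # ddof=1 applies Bessel's correction

print(s2_manual, s2_numpy)  # both give the same value
```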

Why $n-1$?

The divisor is $n-1$ (not $n$) to make $S^2$ an unbiased estimator of the population variance $\sigma^2$:

$$E[S^2] = \sigma^2$$

This correction is called Bessel's correction. The $n-1$ accounts for the Degrees of Freedom lost when estimating $\mu$ with $\bar{X}$.
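
A small simulation makes the effect of Bessel's correction visible: averaging $S^2$ over many samples recovers $\sigma^2$, while dividing by $n$ underestimates it. The distribution, sample size, and replication count below are arbitrary choices, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 5.0, 4.0, 10, 100_000   # arbitrary parameters

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n - 1
s2_biased = samples.var(axis=1, ddof=0)     # divide by n

print(s2_unbiased.mean())  # ≈ 4.0  (unbiased)
print(s2_biased.mean())    # ≈ 3.6  (biased low by factor (n-1)/n)
```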

Proof: Unbiased Estimator

Let $X_1, \ldots, X_n$ be a Random Sample (iid) from a distribution with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$.

We want to show that $E[S^2] = \sigma^2$.

Lemma

First, we establish a useful identity: $E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right] = (n-1)\sigma^2$.

  1. Linearity: $E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right] = \sum_{i=1}^{n} E[X_i^2] - n\,E[\bar{X}^2]$
  2. Recall Variance Identities:
    • $E[X_i^2] = \sigma^2 + \mu^2$
    • $\text{Var}(\bar{X}) = \frac{\sigma^2}{n}$
    • $E[\bar{X}^2] = \text{Var}(\bar{X}) + E[\bar{X}]^2 = \frac{\sigma^2}{n} + \mu^2$
  3. Substitution: $\sum_{i=1}^{n}\left[\sigma^2 + \mu^2\right] - n\left[\frac{\sigma^2}{n} + \mu^2\right] = n\left[\sigma^2 + \mu^2\right] - \sigma^2 - n\mu^2 = n\sigma^2 - \sigma^2 = (n-1)\sigma^2$
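
As a sanity check on this identity (not part of the proof), one can estimate $E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right]$ by simulation and compare it to $(n-1)\sigma^2$; the distribution and parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n, reps = 2.0, 9.0, 8, 200_000   # arbitrary parameters

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
x_bar = x.mean(axis=1)
stat = (x ** 2).sum(axis=1) - n * x_bar ** 2   # sum X_i^2 - n * X_bar^2

print(stat.mean())        # ≈ (n - 1) * sigma2
print((n - 1) * sigma2)   # 63.0
```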

Derivation of Expectation

Using the algebraic expansion of the sum of squares:

$$S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
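
In detail, using $\sum_{i=1}^{n} X_i = n\bar{X}$:

$$\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$$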

Taking the expectation:

$$E[S^2] = \frac{1}{n-1}\,E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right]$$

Applying the Lemma:

$$E[S^2] = \frac{1}{n-1}\,(n-1)\sigma^2 = \sigma^2$$

Alternative Form

Equivalently, multiplying the definition through by $n-1$:

$$(n-1)S^2 = \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

This form appears frequently in distribution theory, particularly when working with the Chi-squared Distribution.
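
A one-line numerical check of this rescaled form, with an arbitrary example sample:

```python
import numpy as np

x = np.array([1.2, 0.7, 3.3, 2.8, 4.1, 2.0])  # arbitrary example sample
n = len(x)

lhs = (n - 1) * np.var(x, ddof=1)     # (n - 1) * S^2
rhs = ((x - x.mean()) ** 2).sum()     # sum of squared deviations

print(np.isclose(lhs, rhs))  # True
```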

Sampling Distribution

When sampling from a Normal Distribution $N(\mu, \sigma^2)$:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

See Marginal Distribution of Sample Variance for details.
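
A simulation sketch of this result: draw many normal samples, compute $(n-1)S^2/\sigma^2$ for each, and compare the empirical mean and variance to those of a $\chi^2_{n-1}$ distribution, which are $n-1$ and $2(n-1)$. The parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2, n, reps = 0.0, 2.0, 6, 200_000   # arbitrary parameters

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
stat = (n - 1) * samples.var(axis=1, ddof=1) / sigma2

print(stat.mean(), n - 1)         # empirical mean ≈ n - 1 = 5
print(stat.var(), 2 * (n - 1))    # empirical variance ≈ 2(n - 1) = 10
```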

Summary