STA260 Lecture 03
- Let $X_1, \dots, X_n$ be a random sample from a normal distribution, denoted as $X_i \sim N(\mu, \sigma^2)$.
- The sample mean is defined as $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
- Under these conditions, the sampling distribution of the mean is exactly $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$.
- Note: This result holds exactly if the population is normal. If the population is not normal, $\bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right)$ approximately as $n \to \infty$ via the Central Limit Theorem. This is just one specific case.
- General Property of Linear Combinations:
  - If independent random variables are distributed as $X_i \sim N(\mu_i, \sigma_i^2)$ for $i = 1, \dots, n$ …
  - Then their linear combination follows: $\sum_{i=1}^{n} a_i X_i \sim N\!\left(\sum_{i=1}^{n} a_i \mu_i, \; \sum_{i=1}^{n} a_i^2 \sigma_i^2\right)$.
  - #tk expected to remember
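Since this property is flagged as one to remember, a quick Monte Carlo check can make it concrete. The sketch below is my own illustration (not lecture code), with made-up numbers: it samples the linear combination $5X_1 - 2X_2$ of independent normals and compares the empirical mean and variance against what the formula predicts.

```python
import random
import statistics

# Monte Carlo check of the linear-combination property:
# if X1 ~ N(1, 2^2) and X2 ~ N(3, 4^2) are independent, then
# 5*X1 - 2*X2 ~ N(5*1 - 2*3, 5^2*2^2 + 2^2*4^2) = N(-1, 164).
# (These specific numbers are illustrative, not from the lecture.)
random.seed(0)
samples = [5 * random.gauss(1, 2) - 2 * random.gauss(3, 4) for _ in range(200_000)]

print(round(statistics.mean(samples), 1))      # should be close to -1
print(round(statistics.variance(samples), 0))  # should be close to 164
```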
- A consequence of the $N(\mu, \sigma^2)$ distribution:
  - A difference in $\mu$ affects the location of the center (mean, median, mode).
  - A difference in $\sigma^2$ obviously affects the dispersion (spread).
- Standardization of the Normal Distribution:
  - To work with standard tables, we shift $\bar{X}$ towards the middle by subtracting the mean: $\bar{X} - \mu$.
  - The distribution of this shifted variable is $\bar{X} - \mu \sim N\!\left(0, \frac{\sigma^2}{n}\right)$.
  - This is how we standardize the Normal Distribution to become a Standard Normal Distribution.
  - We divide by the standard deviation of the mean (standard error): $\frac{\sigma}{\sqrt{n}}$.
- So now we have the final Standard Normal Distribution variable $Z$: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, where $Z \sim N(0, 1)$.
- Summary of Procedure:
  - Given $X_i \sim N(\mu, \sigma^2)$.
  - For the statistic $\bar{X}$: $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$.
  - Since our standardization is $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
  - We can use Z-score tables to evaluate probabilities for the sample mean.
- General Example:
  - Original problem has: $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$.
  - We want to find the probability $P(a \leq \bar{X} \leq b)$.
  - Then we convert it into $P\!\left(\frac{a - \mu}{\sigma/\sqrt{n}} \leq Z \leq \frac{b - \mu}{\sigma/\sqrt{n}}\right)$, where $Z \sim N(0, 1)$.
  - Then use the Z-Score table (CDF $\Phi$) to find the output.
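The conversion above can be wrapped in a small helper. This is a sketch, not lecture code: `phi` implements the standard normal CDF $\Phi$ via the error function, and the numbers in the example call are made up for illustration.

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_mean_between(a, b, mu, sigma, n):
    """P(a <= Xbar <= b) for Xbar ~ N(mu, sigma^2/n)."""
    se = sigma / sqrt(n)  # standard error of the mean
    return phi((b - mu) / se) - phi((a - mu) / se)

# Illustrative numbers (not from the lecture): mu=50, sigma=10, n=25,
# so the standard error is 2 and the bounds standardize to +/-1.
print(round(prob_mean_between(48, 52, mu=50, sigma=10, n=25), 4))  # → 0.6827
```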
Example Problem 1: Probability Calculation
- Given: $X_i \sim N(\mu, \sigma^2)$, where the population variance $\sigma^2$ is known (so the standard deviation $\sigma$ is known too).
- Sample size $n$.
- We want to find: $P(|\bar{X} - \mu| \leq a)$ for a given distance $a$.
- Solution steps:
  - You can rewrite the inequality as the distance from the mean: $|\bar{X} - \mu| \leq a$.
  - Expanded: $-a \leq \bar{X} - \mu \leq a$.
  - Divide all parts by the standard error $\frac{\sigma}{\sqrt{n}}$:
    - Because $\frac{\sigma}{\sqrt{n}} > 0$ (it's positive), we don't need to worry about flipping the inequality signs.
  - Substitute $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$: $P\!\left(-\frac{a}{\sigma/\sqrt{n}} \leq Z \leq \frac{a}{\sigma/\sqrt{n}}\right)$.
  - Substitute values ($a$, $\sigma$, $n$): in this example the standardized bounds work out to $\pm 0.9$.
  - Simplify: $P(-0.9 \leq Z \leq 0.9)$.
  - Calculation using standard normal tables: $P(-0.9 \leq Z \leq 0.9) = 1 - 2\,P(Z > 0.9) = 1 - 2(0.1841) = 0.6318$.
- Visualization:

```latex
\begin{tikzpicture}[>=stealth, scale=3]
  % Shaded central area between z = -0.9 and z = 0.9
  \fill[color=blue!15] plot[domain=-0.9:0.9, samples=100] (\x, {exp(-0.5*\x*\x)}) -- (0.9,0) -- (-0.9,0) -- cycle;
  % Normal density curve
  \draw[thick, color=blue!80!black] plot[domain=-3:3, samples=100] (\x, {exp(-0.5*\x*\x)});
  % Horizontal axis
  \draw[->] (-3.5,0) -- (3.5,0) node[right] {$z$};
  % Vertical dashed lines at z = -0.9 and z = 0.9
  \draw[dashed, thin] (-0.9,0) -- (-0.9, {exp(-0.5*0.81)});
  \draw[dashed, thin] (0.9,0) -- (0.9, {exp(-0.5*0.81)});
  % z-axis labels
  \node[below] at (-0.9,0) {\small $-0.9$};
  \node[below] at (0.9,0) {\small $0.9$};
  \node[below] at (0,0) {\small $0$};
  % Central area label
  \node at (0, 0.4) {\scalebox{0.9}{$0.6318$}};
  % Labelling the symmetrical tails
  \draw[<-] (-1.3, 0.1) -- (-2, 0.5) node[above] {\small $0.1841$};
  \draw[<-] (1.3, 0.1) -- (2, 0.5) node[above] {\small $0.1841$};
  % Symmetry indicator arrow
  \draw[<->, bend left=30, color=gray] (-1.1, 1.1) to node[above, color=black] {\footnotesize Symmetry} (1.1, 1.1);
  % Formula context
  \node[above] at (0, 1.3) {\footnotesize $P(-0.9 \leq Z \leq 0.9) = 1 - 2(0.1841)$};
\end{tikzpicture}
```
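As a sanity check on the table lookup, the same probability can be computed directly from the standard normal CDF. A quick sketch (not part of the lecture):

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# P(-0.9 <= Z <= 0.9) two ways: directly from the CDF,
# and as 1 minus the two symmetric tails (the table method).
direct = phi(0.9) - phi(-0.9)
table_method = 1 - 2 * (1 - phi(0.9))

print(round(direct, 4))        # ~0.6319; the table's 0.6318 comes from the rounded tail 0.1841
print(round(table_method, 4))  # identical by symmetry
```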
Example Problem 2: Sample Size Determination
- Question: How big of a sample size ($n$) do we want if we require the sample mean to be within $\varepsilon$ of $\mu$ with probability $0.95$?
- Setup: $P(|\bar{X} - \mu| \leq \varepsilon) = 0.95$.
- Standardize: $P\!\left(\frac{-\varepsilon}{\sigma/\sqrt{n}} \leq Z \leq \frac{\varepsilon}{\sigma/\sqrt{n}}\right) = 0.95$.
- Simplify fraction: $\frac{\varepsilon}{\sigma/\sqrt{n}} = \frac{\varepsilon \sqrt{n}}{\sigma}$.
- Express as absolute value: $P\!\left(|Z| \leq \frac{\varepsilon \sqrt{n}}{\sigma}\right) = 0.95$.
- Solving for $n$:
  - Let $z^{*} = \frac{\varepsilon \sqrt{n}}{\sigma}$.
  - We need $P(-z^{*} \leq Z \leq z^{*}) = 0.95$.
  - This implies the tails sum to $0.05$, so each tail is $0.025$.
  - We look for the Z-score where the area to the right is $0.025$ (or cumulative area is $0.975$).
  - From tables, $z_{0.025} = 1.96$.
  - Set equations equal: $\frac{\varepsilon \sqrt{n}}{\sigma} = 1.96$. $\sqrt{n} = \frac{1.96\,\sigma}{\varepsilon}$. $n = \left(\frac{1.96\,\sigma}{\varepsilon}\right)^{2}$.
  - Find the ceiling: $n = \left\lceil \left(\frac{1.96\,\sigma}{\varepsilon}\right)^{2} \right\rceil$. We always round up when determining sample size to ensure the probability condition is fully met.
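The ceiling formula translates directly into code. The sketch below assumes the 95% level from the lecture ($z_{0.025} = 1.96$); the `sigma` and `eps` values in the example call are illustrative stand-ins, not from the lecture.

```python
from math import ceil

def required_sample_size(sigma: float, eps: float, z: float = 1.96) -> int:
    """Smallest n such that P(|Xbar - mu| <= eps) meets the level matching z.

    z = 1.96 corresponds to 95% confidence (each tail 0.025).
    We round up: a fractional n would fall just short of the target probability.
    """
    return ceil((z * sigma / eps) ** 2)

# Illustrative: sigma = 10, and we want Xbar within 2 units of mu.
print(required_sample_size(sigma=10, eps=2))  # (1.96*10/2)^2 = 96.04 -> n = 97
```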
Review: Convergence Concepts
- Review from STA256:
  - The Cumulative Distribution Function (CDF) is defined as $F_X(x) = P(X \leq x)$.
- Convergence in Distribution:
  - Let $X_1, X_2, \dots$ be a sequence of random variables with corresponding CDFs $F_1, F_2, \dots$.
  - Let $X$ be a random variable with CDF $F$.
  - We say $X_n \xrightarrow{d} X$ (converges in distribution) if: $\lim_{n \to \infty} F_n(x) = F(x)$ at every point $x$ where $F$ is continuous.
- Application to Central Limit Theorem:
  - This explains why the distribution of sample means approaches normality.
  - Let $X_1, X_2, \dots$ be an iid sequence of Random Variables with finite mean $\mu$ and variance $\sigma^2$.
  - Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
  - The Central Limit Theorem tells us $\bar{X}_n \approx N\!\left(\mu, \frac{\sigma^2}{n}\right)$ for large $n$.
  - Or in standardized form: $\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$.
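The CLT can be watched happening in simulation. The sketch below is my own illustration (not lecture code): it draws sample means from a decidedly non-normal Exponential(1) population, standardizes them, and checks that roughly 95% land inside $\pm 1.96$, just as a true $N(0,1)$ variable would.

```python
import random
from math import sqrt

# CLT demo: Xi ~ Exponential(rate 1) has mean 1 and variance 1 but is
# heavily right-skewed. Still, the standardized sample mean for n = 200
# should behave approximately like N(0, 1).
random.seed(1)
n, reps = 200, 20_000
zs = []
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    zs.append((xbar - 1.0) / (1.0 / sqrt(n)))  # (xbar - mu) / (sigma/sqrt(n))

# Fraction inside +/-1.96 should be near 0.95 if the zs are ~ N(0, 1).
print(round(sum(-1.96 <= z <= 1.96 for z in zs) / reps, 3))
```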
Example Problem 3: CLT Application
- Given: $\mu$ (mean service time in minutes), $\sigma$ (standard deviation).
- Question: Approximate the probability that $n = 100$ customers can be served in $2$ hours.
  - Convert time units: $2$ hours $= 120$ minutes.
  - We want to find $P\!\left(\sum_{i=1}^{100} X_i \leq 120\right)$.
- Approach:
  - Since we don't know the distribution of each $X_i$, we have to use the CLT.
  - We assume the $X_i$ are iid with mean $\mu$ and variance $\sigma^2$.
  - By the Central Limit Theorem: $\bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{100}\right)$.
  - Rewrite the sum in terms of the sample mean: $\sum_{i=1}^{100} X_i = 100\,\bar{X}$, so the event becomes $\bar{X} \leq \frac{120}{100} = 1.2$.
- Calculation:
  - Standardize the event: $P(\bar{X} \leq 1.2) = P\!\left(Z \leq \frac{1.2 - \mu}{\sigma/\sqrt{100}}\right) = \Phi\!\left(\frac{1.2 - \mu}{\sigma/10}\right)$.
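Since the original $\mu$ and $\sigma$ values did not survive in these notes, the sketch below keeps them as parameters; the numbers in the example call are hypothetical stand-ins chosen only to show the mechanics.

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_served_in_time(mu, sigma, n=100, total_minutes=120.0):
    """CLT approximation of P(sum of n iid service times <= total_minutes).

    Equivalent to P(Xbar <= total_minutes/n) with Xbar ~ N(mu, sigma^2/n).
    """
    xbar_bound = total_minutes / n              # 1.2 minutes per customer here
    z = (xbar_bound - mu) / (sigma / sqrt(n))   # standardize the event
    return phi(z)

# Hypothetical stand-ins: mu = 1.1 min, sigma = 0.5 min -> z = 2.0.
print(round(prob_served_in_time(mu=1.1, sigma=0.5), 4))  # → 0.9772
```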
Responses to #tk Flags
#tk Item 1: Linear Combinations of Normal Variables
Context: "If independent random variables are distributed as $X_i \sim N(\mu_i, \sigma_i^2)$, then their linear combination follows $\sum a_i X_i \sim N\!\left(\sum a_i \mu_i, \sum a_i^2 \sigma_i^2\right)$."
Explanation:
This property is fundamental to statistical theory involving normal distributions. It states that any linear combination of independent normal random variables is itself normally distributed.
There are three key components to remember for this formula:
- Normality: The sum of normal variables remains normal. It does not change shape to some other distribution.
- Mean (Linearity of Expectation): The expected value operator is linear.
- Variance (Independence): The variance operator is not linear in the same way. When variables are independent, the variance of a sum is the sum of the variances. Crucially, constants pull out squared.
Relevance to Lecture: This explains why the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ (a linear combination with each $a_i = \frac{1}{n}$) is itself normal, with:
- Mean: $E[\bar{X}] = \sum_{i=1}^{n} \frac{1}{n}\mu = \mu$.
- Variance: $\mathrm{Var}(\bar{X}) = \sum_{i=1}^{n} \frac{1}{n^2}\sigma^2 = \frac{\sigma^2}{n}$.
Lecture Summary
Main Thesis:
This lecture establishes the foundational machinery for statistical inference by demonstrating how to calculate probabilities for sample means using the Central Limit Theorem (CLT) and the Standard Normal Distribution. It connects the theoretical concept of Convergence in Distribution to the practical application of approximating probabilities for large samples, even when the underlying population distribution is unknown.
Key Concepts:
- Sampling Distribution of $\bar{X}$: If the population is normal, $\bar{X}$ is exactly normal. If the population is non-normal but $n$ is large, $\bar{X}$ is approximately normal ($\bar{X} \approx N(\mu, \sigma^2/n)$) with mean $\mu$ and variance $\frac{\sigma^2}{n}$.
- Standardization ($Z$-score): To compute probabilities, any normal sample mean can be transformed into a standard normal variable using the formula $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
- Sample Size Determination: We can reverse-engineer the probability formula to find the minimum sample size $n$ required to ensure the sample mean falls within a specific margin of error with a desired confidence level (e.g., 95%).
- Convergence in Distribution: The formal mathematical definition involves the limit of Cumulative Distribution Functions (CDFs). Specifically, $X_n \xrightarrow{d} X$ means the CDF of the sequence approaches the CDF of the target distribution as $n \to \infty$.
Practice Questions
Remember/Understand Level
- Define Convergence in Distribution. What specifically must converge for a sequence of random variables to converge in distribution?
- State the parameters. If $X \sim N(\mu, \sigma^2)$ and we take a sample of size $n$, what are the mean and variance of the sampling distribution of $\bar{X}$?
- Explain the Standard Error. What is the difference between $\sigma$ and $\frac{\sigma}{\sqrt{n}}$? When do you use one versus the other?
Apply/Analyze Level
- Calculate Probability. A factory produces bolts with a mean length of 10 cm and standard deviation of 0.2 cm. If you sample 25 bolts, what is the probability that the average length is greater than 10.05 cm?
- Determine Sample Size. You want to estimate a population mean. You know $\sigma$. How large a sample is needed so that the probability of your estimate being off by more than 2 units is only 0.01? (Hint: $z_{0.005} = 2.576$)
- Linear Combinations. Let $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ be independent. What is the distribution of $X - Y$?
Evaluate/Create Level
- Evaluate Assumptions. In the example problem where we calculated the probability of serving 100 customers in 2 hours, we assumed the service times were independent. What would happen to our estimate if the service times were positively correlated (e.g., a slow server implies the next customer is also served slowly)? Would the variance of the sum be higher or lower?
Challenging Concepts to Review
Concept 1: Standard Deviation Vs. Standard Error
Why it's challenging: Students often confuse the population standard deviation $\sigma$ (the spread of individual observations) with the standard error $\frac{\sigma}{\sqrt{n}}$ (the spread of the sample mean), forgetting that the latter shrinks as $n$ grows.
Study strategy: Visualize the difference.
Concept 2: The Variance of Linear Combinations
Why it's challenging: It is intuitive to add means ($E[X + Y] = E[X] + E[Y]$), but it is easy to forget that constants come out of a variance squared: $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$.
Study strategy: Use the definition of variance, $\mathrm{Var}(X) = E[(X - \mu)^2]$, and expand $\mathrm{Var}(aX + bY)$ for independent $X$ and $Y$ to see where the squared constants and the sum of variances come from.
Concept 3: Convergence in Distribution
Why it's challenging: It is a limiting concept involving functions (CDFs) rather than single numbers. It is more abstract than "the numbers get closer."
Study strategy: Think of it graphically. Imagine the graph of the CDF for $X_n$ gradually deforming, as $n$ grows, until it lies on top of the CDF of the limiting variable $X$ at every point where that CDF is continuous.