STA260 Lecture 03
- Let $X_1, \dots, X_n$ be a random sample from a normal distribution, denoted as $X_i \sim N(\mu, \sigma^2)$.
- The sample mean is defined as $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
- Under these conditions, the sampling distribution of the mean is exactly $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$.
- Note: This result holds exactly if the population is normal. If the population is not normal, $\bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right)$ approximately as $n \to \infty$ via the Central Limit Theorem. This is just one specific case.
- General Property of Linear Combinations:
  - If independent random variables are distributed as $X_i \sim N(\mu_i, \sigma_i^2)$ for $i = 1, \dots, n$ …
  - Then their linear combination follows: $\sum_{i=1}^{n} a_i X_i \sim N\!\left(\sum_{i=1}^{n} a_i \mu_i, \; \sum_{i=1}^{n} a_i^2 \sigma_i^2\right)$.
  - #tk expected to remember
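Since this property is flagged as one to remember, a quick Monte Carlo check can make it concrete. The sketch below is my own illustration (not lecture code), with made-up numbers: it samples the linear combination $5X_1 - 2X_2$ of independent normals and compares the empirical mean and variance against what the formula predicts.

```python
import random
import statistics

# Monte Carlo check of the linear-combination property:
# if X1 ~ N(1, 2^2) and X2 ~ N(3, 4^2) are independent, then
# 5*X1 - 2*X2 ~ N(5*1 - 2*3, 5^2*2^2 + 2^2*4^2) = N(-1, 164).
# (These specific numbers are illustrative, not from the lecture.)
random.seed(0)
samples = [5 * random.gauss(1, 2) - 2 * random.gauss(3, 4) for _ in range(200_000)]

print(round(statistics.mean(samples), 1))      # should be close to -1
print(round(statistics.variance(samples), 0))  # should be close to 164
```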
- A consequence of the $N(\mu, \sigma^2)$ distribution:
  - A difference in $\mu$ affects the location of the center (mean, median, mode).
  - A difference in $\sigma^2$ obviously affects the dispersion (spread).
- Standardization of the Normal Distribution:
  - To work with standard tables, we shift $\bar{X}$ towards the middle by subtracting the mean: $\bar{X} - \mu$.
  - The distribution of this shifted variable is $\bar{X} - \mu \sim N\!\left(0, \frac{\sigma^2}{n}\right)$.
  - This is how we standardize the Normal Distribution to become a Standard Normal Distribution.
  - We divide by the standard deviation of the mean (standard error): $\frac{\sigma}{\sqrt{n}}$.
- So now we have the final Standard Normal Distribution variable $Z$: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, where $Z \sim N(0, 1)$.
- Summary of Procedure:
  - Given $X_i \sim N(\mu, \sigma^2)$.
  - For the statistic $\bar{X}$: $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$.
  - Since our standardization is $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
  - We can use Z-score tables to evaluate probabilities for the sample mean.
- General Example:
  - Original problem has: $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$.
  - We want to find the probability $P(a \leq \bar{X} \leq b)$.
  - Then we convert it into $P\!\left(\frac{a - \mu}{\sigma/\sqrt{n}} \leq Z \leq \frac{b - \mu}{\sigma/\sqrt{n}}\right)$, where $Z \sim N(0, 1)$.
  - Then use the Z-Score table (CDF $\Phi$) to find the output.
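The conversion above can be wrapped in a small helper. This is a sketch, not lecture code: `phi` implements the standard normal CDF $\Phi$ via the error function, and the numbers in the example call are made up for illustration.

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_mean_between(a, b, mu, sigma, n):
    """P(a <= Xbar <= b) for Xbar ~ N(mu, sigma^2/n)."""
    se = sigma / sqrt(n)  # standard error of the mean
    return phi((b - mu) / se) - phi((a - mu) / se)

# Illustrative numbers (not from the lecture): mu=50, sigma=10, n=25,
# so the standard error is 2 and the bounds standardize to +/-1.
print(round(prob_mean_between(48, 52, mu=50, sigma=10, n=25), 4))  # → 0.6827
```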
Example Problem 1: Probability Calculation
- Given: $X_i \sim N(\mu, \sigma^2)$, where the population variance $\sigma^2$ is known (so the standard deviation $\sigma$ is known too).
- Sample size $n$.
- We want to find: $P(|\bar{X} - \mu| \leq a)$ for a given distance $a$.
- Solution steps:
  - You can rewrite the inequality as the distance from the mean: $|\bar{X} - \mu| \leq a$.
  - Expanded: $-a \leq \bar{X} - \mu \leq a$.
  - Divide all parts by the standard error $\frac{\sigma}{\sqrt{n}}$:
    - Because $\frac{\sigma}{\sqrt{n}} > 0$ (it's positive), we don't need to worry about flipping the inequality signs.
  - Substitute $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$: $P\!\left(-\frac{a}{\sigma/\sqrt{n}} \leq Z \leq \frac{a}{\sigma/\sqrt{n}}\right)$.
  - Substitute values ($a$, $\sigma$, $n$): in this example the standardized bounds work out to $\pm 0.9$.
  - Simplify: $P(-0.9 \leq Z \leq 0.9)$.
  - Calculation using standard normal tables: $P(-0.9 \leq Z \leq 0.9) = 1 - 2\,P(Z > 0.9) = 1 - 2(0.1841) = 0.6318$.
- Visualization:

```latex
\begin{tikzpicture}[>=stealth, scale=3]
  % Shaded central area between z = -0.9 and z = 0.9
  \fill[color=blue!15] plot[domain=-0.9:0.9, samples=100] (\x, {exp(-0.5*\x*\x)}) -- (0.9,0) -- (-0.9,0) -- cycle;
  % Normal density curve
  \draw[thick, color=blue!80!black] plot[domain=-3:3, samples=100] (\x, {exp(-0.5*\x*\x)});
  % Horizontal axis
  \draw[->] (-3.5,0) -- (3.5,0) node[right] {$z$};
  % Vertical dashed lines at z = -0.9 and z = 0.9
  \draw[dashed, thin] (-0.9,0) -- (-0.9, {exp(-0.5*0.81)});
  \draw[dashed, thin] (0.9,0) -- (0.9, {exp(-0.5*0.81)});
  % z-axis labels
  \node[below] at (-0.9,0) {\small $-0.9$};
  \node[below] at (0.9,0) {\small $0.9$};
  \node[below] at (0,0) {\small $0$};
  % Central area label
  \node at (0, 0.4) {\scalebox{0.9}{$0.6318$}};
  % Labelling the symmetrical tails
  \draw[<-] (-1.3, 0.1) -- (-2, 0.5) node[above] {\small $0.1841$};
  \draw[<-] (1.3, 0.1) -- (2, 0.5) node[above] {\small $0.1841$};
  % Symmetry indicator arrow
  \draw[<->, bend left=30, color=gray] (-1.1, 1.1) to node[above, color=black] {\footnotesize Symmetry} (1.1, 1.1);
  % Formula context
  \node[above] at (0, 1.3) {\footnotesize $P(-0.9 \leq Z \leq 0.9) = 1 - 2(0.1841)$};
\end{tikzpicture}
```
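As a sanity check on the table lookup, the same probability can be computed directly from the standard normal CDF. A quick sketch (not part of the lecture):

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# P(-0.9 <= Z <= 0.9) two ways: directly from the CDF,
# and as 1 minus the two symmetric tails (the table method).
direct = phi(0.9) - phi(-0.9)
table_method = 1 - 2 * (1 - phi(0.9))

print(round(direct, 4))        # ~0.6319; the table's 0.6318 comes from the rounded tail 0.1841
print(round(table_method, 4))  # identical by symmetry
```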
Example Problem 2: Sample Size Determination
- Question: How big of a sample size ($n$) do we want if we require the sample mean to be within $\varepsilon$ of $\mu$ with probability $0.95$?
- Setup: $P(|\bar{X} - \mu| \leq \varepsilon) = 0.95$.
- Standardize: $P\!\left(\frac{-\varepsilon}{\sigma/\sqrt{n}} \leq Z \leq \frac{\varepsilon}{\sigma/\sqrt{n}}\right) = 0.95$.
- Simplify fraction: $\frac{\varepsilon}{\sigma/\sqrt{n}} = \frac{\varepsilon \sqrt{n}}{\sigma}$.
- Express as absolute value: $P\!\left(|Z| \leq \frac{\varepsilon \sqrt{n}}{\sigma}\right) = 0.95$.
- Solving for $n$:
  - Let $z^{*} = \frac{\varepsilon \sqrt{n}}{\sigma}$.
  - We need $P(-z^{*} \leq Z \leq z^{*}) = 0.95$.
  - This implies the tails sum to $0.05$, so each tail is $0.025$.
  - We look for the Z-score where the area to the right is $0.025$ (or cumulative area is $0.975$).
  - From tables, $z_{0.025} = 1.96$.
  - Set equations equal: $\frac{\varepsilon \sqrt{n}}{\sigma} = 1.96$. $\sqrt{n} = \frac{1.96\,\sigma}{\varepsilon}$. $n = \left(\frac{1.96\,\sigma}{\varepsilon}\right)^{2}$.
  - Find the ceiling: $n = \left\lceil \left(\frac{1.96\,\sigma}{\varepsilon}\right)^{2} \right\rceil$. We always round up when determining sample size to ensure the probability condition is fully met.
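The ceiling formula translates directly into code. The sketch below assumes the 95% level from the lecture ($z_{0.025} = 1.96$); the `sigma` and `eps` values in the example call are illustrative stand-ins, not from the lecture.

```python
from math import ceil

def required_sample_size(sigma: float, eps: float, z: float = 1.96) -> int:
    """Smallest n such that P(|Xbar - mu| <= eps) meets the level matching z.

    z = 1.96 corresponds to 95% confidence (each tail 0.025).
    We round up: a fractional n would fall just short of the target probability.
    """
    return ceil((z * sigma / eps) ** 2)

# Illustrative: sigma = 10, and we want Xbar within 2 units of mu.
print(required_sample_size(sigma=10, eps=2))  # (1.96*10/2)^2 = 96.04 -> n = 97
```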
Review: Convergence Concepts
- Review from STA256:
  - The Cumulative Distribution Function (CDF) is defined as $F_X(x) = P(X \leq x)$.
- Convergence in Distribution:
  - Let $X_1, X_2, \dots$ be a sequence of random variables with corresponding CDFs $F_1, F_2, \dots$.
  - Let $X$ be a random variable with CDF $F$.
  - We say $X_n \xrightarrow{d} X$ (converges in distribution) if: $\lim_{n \to \infty} F_n(x) = F(x)$ at every point $x$ where $F$ is continuous.
- Application to Central Limit Theorem:
  - This explains why the distribution of sample means approaches normality.
  - Let $X_1, X_2, \dots$ be an iid sequence of Random Variables with finite mean $\mu$ and variance $\sigma^2$.
  - Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
  - The Central Limit Theorem tells us $\bar{X}_n \approx N\!\left(\mu, \frac{\sigma^2}{n}\right)$ for large $n$.
  - Or in standardized form: $\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$.
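The CLT can be watched happening in simulation. The sketch below is my own illustration (not lecture code): it draws sample means from a decidedly non-normal Exponential(1) population, standardizes them, and checks that roughly 95% land inside $\pm 1.96$, just as a true $N(0,1)$ variable would.

```python
import random
from math import sqrt

# CLT demo: Xi ~ Exponential(rate 1) has mean 1 and variance 1 but is
# heavily right-skewed. Still, the standardized sample mean for n = 200
# should behave approximately like N(0, 1).
random.seed(1)
n, reps = 200, 20_000
zs = []
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    zs.append((xbar - 1.0) / (1.0 / sqrt(n)))  # (xbar - mu) / (sigma/sqrt(n))

# Fraction inside +/-1.96 should be near 0.95 if the zs are ~ N(0, 1).
print(round(sum(-1.96 <= z <= 1.96 for z in zs) / reps, 3))
```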
Example Problem 3: CLT Application
- Given: $\mu$ (mean service time in minutes), $\sigma$ (standard deviation).
- Question: Approximate the probability that $n = 100$ customers can be served in $2$ hours.
  - Convert time units: $2$ hours $= 120$ minutes.
  - We want to find $P\!\left(\sum_{i=1}^{100} X_i \leq 120\right)$.
- Approach:
  - Since we don't know the distribution of each $X_i$, we have to use the CLT.
  - We assume the $X_i$ are iid with mean $\mu$ and variance $\sigma^2$.
  - By the Central Limit Theorem: $\bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{100}\right)$.
  - Rewrite the sum in terms of the sample mean: $\sum_{i=1}^{100} X_i = 100\,\bar{X}$, so the event becomes $\bar{X} \leq \frac{120}{100} = 1.2$.
- Calculation:
  - Standardize the event: $P(\bar{X} \leq 1.2) = P\!\left(Z \leq \frac{1.2 - \mu}{\sigma/\sqrt{100}}\right) = \Phi\!\left(\frac{1.2 - \mu}{\sigma/10}\right)$.
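Since the original $\mu$ and $\sigma$ values did not survive in these notes, the sketch below keeps them as parameters; the numbers in the example call are hypothetical stand-ins chosen only to show the mechanics.

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_served_in_time(mu, sigma, n=100, total_minutes=120.0):
    """CLT approximation of P(sum of n iid service times <= total_minutes).

    Equivalent to P(Xbar <= total_minutes/n) with Xbar ~ N(mu, sigma^2/n).
    """
    xbar_bound = total_minutes / n              # 1.2 minutes per customer here
    z = (xbar_bound - mu) / (sigma / sqrt(n))   # standardize the event
    return phi(z)

# Hypothetical stand-ins: mu = 1.1 min, sigma = 0.5 min -> z = 2.0.
print(round(prob_served_in_time(mu=1.1, sigma=0.5), 4))  # → 0.9772
```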
Responses to #tk Flags
#tk Item 1: Linear Combinations of Normal Variables
Context: "If independent random variables are distributed as $X_i \sim N(\mu_i, \sigma_i^2)$, then their linear combination follows $\sum a_i X_i \sim N\!\left(\sum a_i \mu_i, \sum a_i^2 \sigma_i^2\right)$."
Explanation:
This property is fundamental to statistical theory involving normal distributions. It states that any linear combination of independent normal random variables is itself normally distributed.
There are three key components to remember for this formula:
- Normality: The sum of normal variables remains normal. It does not change shape to some other distribution.
- Mean (Linearity of Expectation): The expected value operator is linear.
- Variance (Independence): The variance operator is not linear in the same way. When variables are independent, the variance of a sum is the sum of the variances. Crucially, constants pull out squared.
Relevance to Lecture: This explains why the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ (a linear combination with each $a_i = \frac{1}{n}$) is itself normal, with:
- Mean: $E[\bar{X}] = \sum_{i=1}^{n} \frac{1}{n}\mu = \mu$.
- Variance: $\mathrm{Var}(\bar{X}) = \sum_{i=1}^{n} \frac{1}{n^2}\sigma^2 = \frac{\sigma^2}{n}$.
Lecture Summary
Main Thesis:
This lecture establishes the foundational machinery for statistical inference by demonstrating how to calculate probabilities for sample means using the Central Limit Theorem (CLT) and the Standard Normal Distribution. It connects the theoretical concept of Convergence in Distribution to the practical application of approximating probabilities for large samples, even when the underlying population distribution is unknown.
Key Concepts:
- Sampling Distribution of $\bar{X}$: If the population is normal, $\bar{X}$ is exactly normal. If the population is non-normal but $n$ is large, $\bar{X}$ is approximately normal ($\bar{X} \approx N(\mu, \sigma^2/n)$) with mean $\mu$ and variance $\frac{\sigma^2}{n}$.
- Standardization ($Z$-score): To compute probabilities, any normal sample mean can be transformed into a standard normal variable using the formula $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
- Sample Size Determination: We can reverse-engineer the probability formula to find the minimum sample size $n$ required to ensure the sample mean falls within a specific margin of error with a desired confidence level (e.g., 95%).
- Convergence in Distribution: The formal mathematical definition involves the limit of Cumulative Distribution Functions (CDFs). Specifically, $X_n \xrightarrow{d} X$ means the CDF of the sequence approaches the CDF of the target distribution as $n \to \infty$.
Practice Questions
Remember/Understand Level
- Define Convergence in Distribution. What specifically must converge for a sequence of random variables to converge in distribution?
- State the parameters. If $X \sim N(\mu, \sigma^2)$ and we take a sample of size $n$, what are the mean and variance of the sampling distribution of $\bar{X}$?
- Explain the Standard Error. What is the difference between $\sigma$ and $\frac{\sigma}{\sqrt{n}}$? When do you use one versus the other?
Apply/Analyze Level
- Calculate Probability. A factory produces bolts with a mean length of 10 cm and standard deviation of 0.2 cm. If you sample 25 bolts, what is the probability that the average length is greater than 10.05 cm?
- Determine Sample Size. You want to estimate a population mean. You know $\sigma$. How large a sample is needed so that the probability of your estimate being off by more than 2 units is only 0.01? (Hint: $z_{0.005} = 2.576$)
- Linear Combinations. Let $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ be independent. What is the distribution of $X - Y$?
Evaluate/Create Level
- Evaluate Assumptions. In the example problem where we calculated the probability of serving 100 customers in 2 hours, we assumed the service times were independent. What would happen to our estimate if the service times were positively correlated (e.g., a slow server implies the next customer is also served slowly)? Would the variance of the sum be higher or lower?
Challenging Concepts to Review
Concept 1: Standard Deviation Vs. Standard Error
Why it's challenging: Students often confuse the population standard deviation $\sigma$ (the spread of individual observations) with the standard error $\frac{\sigma}{\sqrt{n}}$ (the spread of the sample mean), forgetting that the latter shrinks as $n$ grows.
Study strategy: Visualize the difference.
Concept 2: The Variance of Linear Combinations
Why it's challenging: It is intuitive to add means ($E[X + Y] = E[X] + E[Y]$), but it is easy to forget that constants come out of a variance squared: $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$.
Study strategy: Use the definition of variance, $\mathrm{Var}(X) = E[(X - \mu)^2]$, and expand $\mathrm{Var}(aX + bY)$ for independent $X$ and $Y$ to see where the squared constants and the sum of variances come from.
Concept 3: Convergence in Distribution
Why it's challenging: It is a limiting concept involving functions (CDFs) rather than single numbers. It is more abstract than "the numbers get closer."
Study strategy: Think of it graphically. Imagine the graph of the CDF for $X_n$ gradually deforming, as $n$ grows, until it lies on top of the CDF of the limiting variable $X$ at every point where that CDF is continuous.