STA258 Lecture 01

Pre-Lecture: 1. Introduction.pdf (STA258 Pre-Lecture Summary 01)
Lecture: 1. Introduction.pdf

Textbook: Mathematical Statistics with Applications (7th Edition), Wackerly, Mendenhall and Scheaffer, Cengage Learning.
In a typical Statistics problem we have a random variable whose distribution family is known, but whose parameters are unknown.
Ex:
Lifetime of a rat in a lab — what distribution is suitable? The Exponential Distribution, since the rat can't live forever; it has to die sometime.
X ∼ Exp ( λ )
What's the value of λ ?
We have to infer the value of λ from the data we collect.
Other examples:
X ∼ N ( μ , σ 2 )
The errors of measurement follow a Gaussian / Normal Distribution.
X ∼ Poisson ( λ )
We represent the parameters of our distributions with θ .
Instead of λ , μ , σ 2 for these distributions, we just use θ
So we have X ∼ Exp ( θ ) or X ∼ N ( θ 1 , θ 2 )
Or X ∼ Poisson ( θ )
How can we estimate the unknown θ based on our observations?
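This estimation step can be sketched with a toy simulation (not from the lecture; the seed, sample size, and use of Python's `random.expovariate` are my own choices). Note that `expovariate(lam)` uses the rate parameterization, so E[X] = 1/lam:

```python
import random
import statistics

# Toy simulation: pretend the rate lam is unknown and recover it from data.
random.seed(0)
lam = 0.5                                   # true rate (unknown in practice)
data = [random.expovariate(lam) for _ in range(10_000)]

# For expovariate, E[X] = 1/lam, so 1/x-bar is a natural estimate of lam.
lam_hat = 1 / statistics.mean(data)
print(lam_hat)
```

With a large sample the estimate lands close to the true rate 0.5.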
Statistic
Ex:
We have X 1 , … , X n ∼ f ( x ; θ )
This distribution is parameterized by θ
The sequence of Random Variables : X 1 , … , X n is called a sample
Each of the RVs is called an Observation
Once the sample is Realized , we denote the numerical values taken by each observation using lowercase letters.
X 1 ∼ Exp ( λ )
X 2 ∼ Exp ( λ )
X 3 ∼ Exp ( λ )
X ¯ is an Estimator for λ
x ¯ is an Estimate for λ
Our Estimate is always known.
Realized sample mean: average
Prior to measurement each observation is a random variable; then we realize it and get our observations:

Prior to Measurement      Observations
(Probability Theory)      (Statistics)
X1 ∼ Exp(λ)               x1 = 1
X2 ∼ Exp(λ)               x2 = 2
X3 ∼ Exp(λ)               x3 = 3
Note a Statistic is always a function of the sample
Γ = g ( X 1 , … , X n )
So X̄ = ( X 1 + X 2 + ⋯ + X n ) / n is the sample mean
X ( n ) = max { X 1 , … , X n } is the sample max
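Both statistics are just functions of the realized sample; a minimal check in Python (sample values chosen arbitrarily):

```python
import statistics

# A statistic is a function g(x1, ..., xn) of the sample.
sample = [1.0, 2.0, 3.0]          # realized observations x1, x2, x3

x_bar = statistics.mean(sample)   # realized sample mean
x_max = max(sample)               # realized sample maximum X_(n)
print(x_bar, x_max)               # → 2.0 3.0
```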
Each of these Statistics can be an Estimator
While we call T an Estimator of θ , we call its realization t an estimate of θ .
The Estimator is always a random variable
A sample is random if the observations are independent and identically distributed (iid).
We have quantitative and qualitative data .
Population Studies
Types of Studies
Ex:
5 observations
110 102 130 130 115
We can get the sample mean X̄ = ( 110 + 102 + 130 + 130 + 115 ) / 5 = 117.4
We can get variance as S 2 = 1 n − 1 ∑ i = 1 n ( X i − X ¯ ) 2
S 2 = 1 5 − 1 [ ( 110 − 117.4 ) 2 + ( 102 − 117.4 ) 2 + ⋯ + ( 115 − 117.4 ) 2 ] = 615.2 4 = 153.8
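The computation can be checked with Python's statistics module: the sum of squared deviations is 615.2, so S² = 615.2 / 4 = 153.8 (`statistics.variance` divides by n − 1, matching the formula for S²):

```python
import statistics

data = [110, 102, 130, 130, 115]

x_bar = statistics.mean(data)     # sample mean
s2 = statistics.variance(data)    # sample variance, divides by n - 1
print(x_bar, s2)                  # → 117.4 153.8
```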
These are Estimators for the population mean and population variance, respectively.
The average is not a robust Estimator:
110 102 130 1000 115
We have one outlier
Can be due to manual data entry errors or whatever
Now average X ¯ = 110 + 102 + 130 + 1000 + 115 5 = 1457 5 = 291.4
A huge difference between this and the last set.
So X̄, our sample mean, is not a robust Estimator against outliers.
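The sensitivity of the mean to a single corrupted entry is easy to demonstrate:

```python
import statistics

clean = [110, 102, 130, 130, 115]
dirty = [110, 102, 130, 1000, 115]   # one outlier (e.g. a data entry error)

print(statistics.mean(clean))        # → 117.4
print(statistics.mean(dirty))        # → 291.4
```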
Sample Standard Deviation
S = √S² = √153.8 ≈ 12.40
Median
Sort the data:
102 110 115 ⏟ Median 130 130
With a data set with outliers:
102 110 115 ⏟ Median 130 1000
Same Median
So the Median is a robust Estimator against outliers
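The same comparison as before, now with the median, shows the robustness:

```python
import statistics

clean = [110, 102, 130, 130, 115]
dirty = [110, 102, 130, 1000, 115]   # one outlier

# median() sorts internally and takes the middle value,
# so a single extreme entry does not move it.
print(statistics.median(clean))      # → 115
print(statistics.median(dirty))      # → 115
```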
Mode
Ex:
102 110 115 130 130
See that 130 appears twice, so that's our mode.
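Python's statistics module computes this directly as the most frequent value:

```python
import statistics

data = [102, 110, 115, 130, 130]
print(statistics.mode(data))         # → 130 (appears twice)
```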
Sometimes the mode shows the maximum of the distribution: if you have a pdf like the Normal, the mode sits at μ, so it identifies one parameter.
If a distribution has two modes, it's bimodal
Average doesn't tell you much here for the whole class.
If we have a unimodal distribution, the average does tell us about the class.
Otherwise average doesn't mean much.
Ex: 50 60 70 80 90 and 68 69 70 71 72
We have μ 1 = 70 = μ 2
Average doesn't tell us anything, same for both
But the second set performed more consistently.
range 1 = 40
range 2 = 4
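A quick check of both sets (taking the first set as 50, 60, 70, 80, 90, which is consistent with the stated mean of 70 and range of 40):

```python
import statistics

set1 = [50, 60, 70, 80, 90]
set2 = [68, 69, 70, 71, 72]

# Same mean, very different spread.
print(statistics.mean(set1), statistics.mean(set2))    # → 70 70
print(max(set1) - min(set1), max(set2) - min(set2))    # → 40 4
```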