STA260 Lecture 11
- Review
- Maximum Likelihood Estimation
- If $X_1, \dots, X_n \overset{\text{iid}}{\sim} f(x; \theta)$, where $\theta$ is a parameter.
- We have the score function as $S(\theta) = \frac{\partial}{\partial \theta}\,\ell(\theta)$, where $\ell(\theta) = \log L(\theta)$ is the log-likelihood.
- This gives us information about the strength and direction of the evidence, using the magnitude and the sign of $S(\theta)$.
- For sufficiently small $\epsilon$, $\ell(\theta + \epsilon) \approx \ell(\theta) + \epsilon\, S(\theta)$.
- Small $\epsilon$ shows us the localized behaviour of $\ell$ around $\theta$.
- This is useful in higher dimensions, where we can't visualize $\ell(\theta)$ directly; the score function gives a sense of where the maximum is.
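- As a concrete illustration of using the score to locate the maximum (a Poisson model is assumed here for the sketch; it is not necessarily the lecture's example):

```latex
% Sketch: score for X_1, ..., X_n iid Poisson(\theta)
\ell(\theta) = \sum_{i=1}^n x_i \log\theta - n\theta - \sum_{i=1}^n \log(x_i!)
\qquad
S(\theta) = \ell'(\theta) = \frac{\sum_{i=1}^n x_i}{\theta} - n
% S(\theta) > 0 for \theta < \bar{x} and S(\theta) < 0 for \theta > \bar{x},
% so solving S(\hat\theta) = 0 gives \hat\theta = \bar{x}.
```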
- Fisher Information
- Measures the information about $\theta$ carried by a single observation (a point) or by a sample.
- High Fisher information means $\ell(\theta)$ is steep around its maximum, which means precise parameter estimation from the data is possible.
- Low Fisher information means $\ell(\theta)$ is flat around its maximum, which means the parameter estimate will not be very precise.
- Under certain conditions (regularity conditions) on $f(x;\theta)$, $\sqrt{n I(\theta)}\,(\hat\theta_{\text{MLE}} - \theta)$ converges in distribution to a standard normal distribution as $n \to \infty$.
- This means that $\hat\theta_{\text{MLE}}$ is approximately normally distributed with mean $\theta$ and variance $\frac{1}{n I(\theta)}$ for large $n$.
- Under regularity conditions:
- $\log f(x;\theta)$ is twice differentiable with respect to $\theta$.
- The support of $f(x;\theta)$ does not depend on $\theta$, so differentiation and integration can be interchanged.
- $I(\theta) = -E\!\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]$: Fisher information of a point.
- $I_n(\theta) = n\,I(\theta)$: Fisher information in a sample of size $n$.
- Multiple data points are required to get a precise estimate of $\theta$.
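- For reference (this identity is standard under the regularity conditions, and is what the "first derivative method" below relies on), the Fisher information of a point can be computed in two equivalent ways:

```latex
I(\theta)
= E\!\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{\!2}\right]
= -\,E\!\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]
```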
- Now let's manipulate the score $S(\theta)$ to show the above for Fisher information.
- Under the regularity conditions $E[S(\theta)] = 0$, so the variance of the score is $\operatorname{Var}(S(\theta)) = E[S(\theta)^2] = n I(\theta) = I_n(\theta)$.
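- A sketch of the standard argument for one observation, writing $S_1(\theta)$ for its score and assuming we may differentiate under the integral sign:

```latex
% Differentiate \int f(x;\theta)\,dx = 1 with respect to \theta:
0 = \int \frac{\partial f}{\partial\theta}\,dx
  = \int \frac{\partial \log f}{\partial\theta}\,f\,dx
  = E[S_1(\theta)]
% Differentiate once more and rearrange:
0 = E\!\left[\frac{\partial^2 \log f}{\partial\theta^2}\right] + E\!\left[S_1(\theta)^2\right]
\;\Rightarrow\;
\operatorname{Var}(S_1(\theta)) = E[S_1(\theta)^2]
  = -E\!\left[\frac{\partial^2 \log f}{\partial\theta^2}\right] = I(\theta)
```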
- Example Using Theorem 10
- Find the distribution of the MLE of $\theta$.
- Let $X_1, \dots, X_n$ be an i.i.d. sample.
- Fisher Information:
- #tk practice doing this with the first-derivative method as well.
- From the previous section, we know the MLE of $\theta$ (found by setting the score function to zero).
- By Theorem 10, $\hat\theta_{\text{MLE}}$ is approximately $N\!\left(\theta, \tfrac{1}{n I(\theta)}\right)$ for large $n$.
- We know $I(\theta)$ from the step above, so we plug it into the variance.
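- A fully worked instance of this recipe, assuming a Poisson$(\theta)$ model for concreteness (the steps are the same for any regular model):

```latex
% Assumed model: X_1, ..., X_n iid Poisson(\theta), so \hat\theta_{\text{MLE}} = \bar{X}.
I(\theta) = -E\!\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]
          = E\!\left[\frac{X}{\theta^2}\right] = \frac{1}{\theta}
% By Theorem 10:
\sqrt{n I(\theta)}\,(\hat\theta_{\text{MLE}} - \theta) \xrightarrow{d} N(0,1)
\;\Rightarrow\;
\hat\theta_{\text{MLE}} \approx N\!\left(\theta,\; \frac{\theta}{n}\right) \text{ for large } n
```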
- Bayesian Approach to Parameter Estimation
- Review from STA256, Bayes' Theorem
- For two events $A$ and $B$ with $P(B) > 0$, we have $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$.
- Apply this to parameter estimation:
- Let $\theta$ be a parameter of interest and $x$ be the observed data.
- We look at $\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)}$.
- Probability of a parameter given data.
- $\pi(\theta)$ is our prior belief about $\theta$ before seeing the data.
- Initial best guess.
- Can be based on previous studies, expert knowledge, or subjective belief.
- Can be informative (specific belief) or non-informative (vague belief).
- $f(x \mid \theta)$:
- Likelihood of the data given the parameter $\theta$.
- How well does the parameter explain the observed data?
- Same as the likelihood function in MLE.
- $m(x)$:
- Marginal likelihood or evidence.
- Normalizing constant to ensure $\pi(\theta \mid x)$ is a valid probability distribution.
- Calculated by integrating over all possible values of $\theta$:
- $m(x) = \int f(x \mid \theta)\,\pi(\theta)\,d\theta$
- $\pi(\theta \mid x)$: Posterior distribution of $\theta$ given the observed data $x$.
- Prior: Past.
- Posterior: Present and future beliefs.
- Updated belief about after observing the data.
- Combines prior belief and likelihood of the observed data.
- Select a prior $\pi(\theta)$.
- Best initial guess / starting point.
- Determine the likelihood $f(x \mid \theta)$.
- Calculate it using the assumed distribution of the data.
- How well does the parameter explain the observed data?
- Notice:
- $m(x)$ doesn't involve $\theta$; there is no parameter in it.
- So treat it as a constant. Toss it.
- To get the marginal distribution of $x$, we integrate out $\theta$: $m(x) = \int f(x \mid \theta)\,\pi(\theta)\,d\theta$.
- Substitute it into Bayes' Theorem.
- The denominator is often hard to integrate, and it only gives information about our data.
- It carries no information about the parameter $\theta$, so we can ignore it.
- $\propto$ means "proportional to".
- $\pi(\theta \mid x) = c \cdot f(x \mid \theta)\,\pi(\theta)$, where $c$ is a constant that doesn't depend on $\theta$.
- Examine the resulting form and compare it with a known distribution; matching it gives the form of the posterior distribution.
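- A minimal numeric sketch of this recipe (hypothetical coin-flip data with a flat prior; a grid sum stands in for the integral $m(x)$):

```python
import numpy as np

# Grid approximation of a posterior, illustrating "keep the kernel,
# toss the constants, normalize at the end".
# Hypothetical setup: x_i ~ Bernoulli(theta) with a flat prior on theta.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])     # assumed toy data
theta = np.linspace(0.001, 0.999, 999)     # grid over the parameter

prior = np.ones_like(theta)                # pi(theta): flat prior
likelihood = theta**x.sum() * (1 - theta)**(len(x) - x.sum())
kernel = likelihood * prior                # pi(theta | x) up to a constant

# Dividing by the total plays the role of m(x): it turns the kernel
# into a valid (discretized) probability distribution.
posterior = kernel / kernel.sum()

print("posterior mean of theta:", (theta * posterior).sum())
```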
- Example:
- $\pi(\theta)$ is our prior.
- Apply the Bayesian approach to find the posterior distribution of $\theta$.
- We have the prior, so we just need the likelihood.
- Assuming independence, we can write the likelihood as a product of the individual likelihoods for each $X_i$.
- The first part is a constant (it involves only the data), so we can ignore it.
- $\theta$ has the prior given above.
- Its form is different from the one on the formula sheet.
- To find $\pi(\theta)$, do a change of variables:
- Throw away the constants; they involve only the data.
- Bring it together:
- Likelihood times prior.
- Group the constant terms together; they involve only the data.
- Define two new quantities that combine the prior parameters with the data terms.
- Because we're missing the normalizing term, we can only claim proportionality.
- Now divide by the normalizing constant to make it a valid distribution.
- Absorb the first part into the $\propto$.
- The result looks like a gamma distribution, so we can say the posterior is gamma with the updated parameters.
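- A worked sketch matching this outline, under assumed distributions (exponential data with mean $\theta$, and $1/\theta \sim \text{Gamma}(\alpha, \beta)$; these are illustrative choices, not necessarily the lecture's):

```latex
% Change of variables on the prior: if 1/\theta ~ Gamma(\alpha, \beta), then
% \pi(\theta) \propto \theta^{-\alpha-1} e^{-\beta/\theta}.
\pi(\theta \mid x)
\propto \underbrace{\theta^{-n} e^{-\sum_i x_i/\theta}}_{\text{likelihood}}
\cdot \underbrace{\theta^{-\alpha-1} e^{-\beta/\theta}}_{\text{prior}}
= \theta^{-(\alpha+n)-1}\, e^{-(\beta + \sum_i x_i)/\theta}
% Let a = \alpha + n and b = \beta + \sum_i x_i: this is the kernel of an
% inverse-gamma(a, b), i.e. 1/\theta \mid x ~ Gamma(a, b).
```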