n <- ? # How many samples to average over
l <- ? # Population mean
s <- ? # Population standard deviation
x <- ? # The odd observation
1 - pnorm(x, mean = ?, sd = ?)
# Simulation
k <- 10000 # The number of e.g. years to simulate
X <- matrix(rpois(n*k, l), k, n) # Simulate data
mX <- apply(X, 1, mean) # Compute row-wise mean
hist(mX)
sum(mX > x) / length(mX)
9 Central limit theorem
- Understand what the central limit theorem is, and why it is useful when dealing with uncertain data.
- Know the difference between the distribution of a population and the distribution of the estimated population parameters (e.g. the sample mean).
9.1 Reading material
- Central Limit Theorem (CLT) - in short
- Videos on the central limit theorem:
- Chapter 8.3 in Introduction to Probability and Statistics Using R by G. Jay Kerns
- Available on Absalon
9.2 Central Limit Theorem (CLT) - in short
In short, the Central Limit Theorem (CLT) says that a central parameter (e.g. the mean) computed from a sample (Danish: stikprøve) drawn from ANY distribution with finite variance is approximately normally distributed when the sample is large, with mean equal to the population mean and variance equal to the population variance divided by the sample size. That is:
\[ \bar{X} \sim \mathcal{N}(\mu,\sigma^2/n) \quad \text{for } n \rightarrow \infty \]
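The statement above can be checked with a small simulation. The sketch below is only an illustration under assumed values: the exponential distribution, the sample size `n = 50`, the number of repetitions `k = 10000` and the rate `2` are choices made here, not part of the course material. It draws many samples from a clearly skewed distribution and compares the mean and variance of the sample means with \(\mu\) and \(\sigma^2/n\).

```r
set.seed(1)   # fixed seed so the sketch is reproducible
n    <- 50    # sample size (assumed value)
k    <- 10000 # number of simulated samples (assumed value)
rate <- 2     # exponential rate: mu = 1/rate, sigma^2 = 1/rate^2

# Each row is one sample of size n from a skewed (exponential) distribution
X  <- matrix(rexp(n * k, rate = rate), nrow = k, ncol = n)
mX <- rowMeans(X) # one sample mean per row

# Compare the simulated moments with the values predicted by the CLT
c(simulated_mean = mean(mX), clt_mean = 1 / rate)
c(simulated_var  = var(mX),  clt_var  = (1 / rate^2) / n)

# Histogram of the sample means with the N(mu, sigma^2/n) density on top
hist(mX, breaks = 50, freq = FALSE, main = "Sample means (n = 50)")
curve(dnorm(x, mean = 1 / rate, sd = (1 / rate) / sqrt(n)), add = TRUE)
```

Rerunning with a smaller `n` (e.g. 3) should make the histogram visibly skewed, which illustrates that the normal shape is a large-sample approximation.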
As we are dealing with samples of finite size, and further need to estimate the parameters (mean and variance) from this sample, the normal distribution is replaced by the T-distribution. The T-distribution takes the uncertainty of having finite data into account. That is, if \(X_1,X_2,\ldots,X_n\) are randomly sampled from some distribution with mean \(\mu\) and variance \(\sigma^2\) (estimated by \(s_X^2\)), then, exactly for normally distributed observations and approximately otherwise:
\[ \frac{\bar{X} - \mu}{s_X/\sqrt{n}} \sim \mathcal{T}(df) \quad \text{for finite } n \]
Here, \(df\) (the degrees of freedom) reflects how much information is available for estimating the variance (usually \(df = n-1\)).
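As a minimal sketch of how the standardized mean is used in practice, the code below computes the statistic by hand for a small simulated sample and cross-checks it against R's built-in `t.test()`. The data, the sample size `n = 15` and the reference mean `mu0 = 10` are assumptions made for illustration only; the observations are drawn from a normal distribution, for which the T-distribution holds exactly.

```r
set.seed(2)                      # fixed seed so the sketch is reproducible
n   <- 15                        # small sample size (assumed value)
mu0 <- 10                        # reference value for the mean (assumed)
y   <- rnorm(n, mean = 10, sd = 3) # simulated observations

ybar <- mean(y)                  # sample mean
s_y  <- sd(y)                    # sample standard deviation
df   <- n - 1                    # degrees of freedom

# Standardized mean: (ybar - mu0) / (s_y / sqrt(n))
t_stat <- (ybar - mu0) / (s_y / sqrt(n))

# Two-sided p-value and 95% confidence interval from the T-distribution
p_value <- 2 * pt(-abs(t_stat), df = df)
ci      <- ybar + c(-1, 1) * qt(0.975, df = df) * s_y / sqrt(n)

# Cross-check against the built-in one-sample t-test
tt <- t.test(y, mu = mu0)
c(by_hand = t_stat, t_test = unname(tt$statistic))
```

Replacing `qt(0.975, df = df)` with `qnorm(0.975)` gives a narrower interval, which shows how the T-distribution widens the interval to account for the estimated variance.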