Central Limit Theorem


This entry is part 3 of 5 in the series Inferential Statistics

The Central Limit Theorem (CLT) states that no matter the distribution of the population (Binomial, Uniform, Exponential, or any other), the sampling distribution of the mean will approximate a normal distribution. The mean of that sampling distribution equals the population mean, and the larger the sample size, the closer a sample mean tends to be to the population mean.

Data professionals use the central limit theorem to estimate population parameters for data in economics, science, business, and many other fields.

What do we mean by the sampling distribution of the mean? Suppose we have a large population of people. Take a sample of people’s heights and calculate the mean. Do this over and over again, and we end up with a new dataset consisting of a bunch of sample means. A distribution formed from a sample statistic like this is called a sampling distribution; if the statistic is the mean, we have a sampling distribution of the mean. The mean of all those sample means will be a very close approximation of the population mean, and the variance of the sampling distribution will equal the population variance divided by the sample size.
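The idea above is easy to check with a quick simulation. This is a minimal sketch: the exponential population (mean 2.0), the sample size of 50, and the number of samples are arbitrary choices for illustration, not values from the text.

```python
import random
import statistics

# Population: exponential with mean 2.0 (clearly not normal).
population_mean = 2.0

random.seed(42)

# Draw many samples and record each sample's mean.
sample_means = [
    statistics.mean(random.expovariate(1 / population_mean) for _ in range(50))
    for _ in range(10_000)
]

# The mean of the sample means approximates the population mean,
# and a histogram of sample_means would look roughly bell-shaped.
print(statistics.mean(sample_means))  # close to 2.0
```

Even though the exponential distribution is heavily skewed, the distribution of the 10,000 sample means comes out approximately normal, centered on the population mean.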

For the Central Limit Theorem (CLT) to apply, a common rule of thumb is a sample size of at least 30 observations; heavily skewed populations may require more.

What makes the CLT important in statistics? The CLT allows us to perform tests, solve problems and make inferences using the normal distribution even when the population is not normally distributed.

Standard Error

What is the standard error? The standard error is the standard deviation of the distribution formed by the sample means. In other words – the standard deviation of the sampling distribution. The standard error decreases as the sample size increases. Recall that the variance of the sampling distribution is sigma squared (σ²) divided by n. Like a standard deviation, the standard error measures variability.

SE = σ / √n

The standard deviation of a sample statistic is called the standard error. The standard error of the mean measures variability among all your sample means: you have taken several samples from the same population and computed the mean for each sample. Do not confuse this with the standard deviation, which refers to the variability of the individual values themselves.
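We can verify the formula SE = σ / √n empirically by comparing it to the standard deviation of simulated sample means. This sketch assumes a uniform population on [0, 1], whose standard deviation is 1/√12; the sample size of 40 is an arbitrary choice.

```python
import math
import random
import statistics

# Population: uniform on [0, 1]; its standard deviation is 1/sqrt(12).
sigma = 1 / math.sqrt(12)
n = 40  # sample size (arbitrary choice for this sketch)

random.seed(0)

# Many samples of size n; record each sample's mean.
sample_means = [
    statistics.mean(random.random() for _ in range(n))
    for _ in range(20_000)
]

theoretical_se = sigma / math.sqrt(n)           # SE = σ / √n
empirical_se = statistics.pstdev(sample_means)  # std dev of the sample means

print(theoretical_se, empirical_se)  # the two values should be close
```

With 20,000 simulated samples, the empirical standard deviation of the sample means lands very close to the theoretical σ / √n.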

Do not confuse this with the margin of error.

Estimator

What is an estimator of a population parameter? It is an approximation depending solely on sample information. A specific value is called an estimate. There are two types of estimates – point estimates and confidence interval estimates. A point estimate is a single number, while a confidence interval naturally is an interval (or range). The point estimate is located exactly in the middle of the confidence interval. However, confidence intervals provide much more information and are preferred when making inferences.

The sample mean, x bar (x̄), is a point estimate of the population mean, mu (μ). Similarly, the sample variance, S squared (S²), is a point estimate of the population variance, sigma squared (σ²). A point estimate is a type of statistic.
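Computing these point estimates from a sample is straightforward. The data below is a small hypothetical sample (heights in cm), made up for illustration.

```python
import statistics

# A small hypothetical sample (heights in cm).
sample = [170.2, 165.5, 181.1, 175.0, 168.3, 172.9, 177.4]

x_bar = statistics.mean(sample)          # point estimate of μ
s_squared = statistics.variance(sample)  # point estimate of σ² (divides by n - 1)

print(x_bar, s_squared)
```

Note that `statistics.variance` divides by n − 1 rather than n, which is what makes S² an unbiased estimator of σ².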

We always want the most efficient and unbiased estimators.

Confidence Intervals New Series

In the next post we’ll look at Confidence Intervals. This post is in a new series of statistics posts called Confidence Intervals.

More Information

Here is an article on the Central Limit Theorem at Investopedia.

Learn with YouTube

Here is a video by 3Blue1Brown called But what is the Central Limit Theorem?

