Measures of Dispersion


This entry is part 6 of 9 in the series Statistics

After considering measures of central tendency and asymmetry in our discussion of descriptive statistics, it’s time to look at measures of dispersion. How widely spread is the data? Are the numbers fairly close together, far apart, or somewhere in between? For measures of variability, we’ll look at the range, variance, standard deviation and coefficient of variation.

Sample and Population

We typically use different formulas when working with population data and sample data. Some statistics, such as the mean, are computed in the same way in both cases and differ only in notation. A population is the full set of data points of interest, and a sample is a subset drawn from it. When you take a sample of a population and compute a sample statistic, it is interpreted as an approximation of the corresponding population parameter. The sample mean is the average of the sample data points, while the population mean is the average of the population data points. The population mean is

\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}

The sample mean is

\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}

Variation

Given a set of numbers, how far apart or how close together are they? Two sets of numbers could have the same mean and median, but one set could be spread out much farther than the other. This spread is called dispersion, or variability. How will we calculate it? We can use the range, variance, standard deviation and coefficient of variation. There are others, but these are the most common ways of measuring variability.

Range

The range is calculated by subtracting the lowest value from the highest. Because it depends only on the two extremes, it is strongly affected by outlying (extreme) values.
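
As a minimal sketch, the range (highest value minus lowest value) can be computed in Python as shown here. The datasets and the helper name data_range are made up for illustration. The two sets share a mean and median of 50, yet their ranges differ widely, and a single extreme value is enough to inflate the range.

```python
# Hypothetical data: both sets have mean 50 and median 50
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]


def data_range(values):
    """Return the range: the highest value minus the lowest value."""
    return max(values) - min(values)


print(data_range(tight))  # 4
print(data_range(wide))   # 80

# One outlier is enough to change the range dramatically
print(data_range(tight + [500]))  # 452
```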

Variance

In statistics, we typically use different formulas when working with population data and sample data. When you work with the whole population, every data point is known, so the measures you calculate are exact. When you take a sample of the population and compute a sample statistic, it is only an approximation of the population parameter. Statisticians account for this by adjusting the formulas for many statistics. For the variance, the sample formula divides by n - 1 instead of n. Below is the formula for population variance.

\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}

Below is the formula for sample variance.

s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}
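
The two variance formulas can be sketched directly in Python and checked against the standard library's statistics module. The dataset and the function names population_variance and sample_variance are assumptions chosen for illustration. The only difference between the two functions is the denominator.

```python
import math
import statistics

# Hypothetical dataset; its mean is 5 and the squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


print(population_variance(data))  # 4.0
print(sample_variance(data))      # 32 / 7, about 4.57

# The standard library provides both versions
print(math.isclose(population_variance(data), statistics.pvariance(data)))
print(math.isclose(sample_variance(data), statistics.variance(data)))
```

Dividing by n - 1 makes the sample variance slightly larger than dividing by n would, and the gap shrinks as the sample grows.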

Standard Deviation

The deviations in the variance are squared so that negative and positive deviations do not cancel each other out. However, the result is expressed in squared units, so it is often large compared to the data itself and hard to interpret. The fix is to take the square root of the variance and obtain the standard deviation, which is in the same units as the data. We generally prefer this statistic over the variance because it’s more meaningful. The population formulas use Greek letters such as μ and σ and a capital N, while the sample formulas use Latin letters such as x̄ and s and a lowercase n.

For a single dataset, the standard deviation is the most common measure of variability. Below is the formula for sample standard deviation.

s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}
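
As a short sketch, taking the square root of the sample variance gives the sample standard deviation. The data and the choice of centimetres as the unit are assumptions for illustration; statistics.variance and statistics.stdev are the standard library's sample (n - 1) versions.

```python
import math
import statistics

# Hypothetical measurements, say in centimetres
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = statistics.variance(data)  # n - 1 denominator, in squared centimetres
sample_sd = math.sqrt(sample_var)       # back in centimetres, like the data

print(f"sample variance: {sample_var:.3f}")
print(f"sample standard deviation: {sample_sd:.3f}")

# statistics.stdev computes the same quantity in one step
print(math.isclose(sample_sd, statistics.stdev(data)))
```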

Coefficient of Variation

The coefficient of variation (CV) is calculated as the standard deviation divided by the mean. Another name for it is relative standard deviation, because the CV expresses the standard deviation relative to the mean. Standard deviation is the most common measure of variability for a single data set. But why do we need yet another measure such as the coefficient of variation? Comparing the standard deviations of two data sets measured in different units or on very different scales is not meaningful, but comparing their coefficients of variation is.

For comparing two or more datasets, use the coefficient of variation.
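
Written with the sample statistics used in this entry, the definition becomes the formula below. The label CV on the left-hand side is an assumed notation, since the definition is otherwise given only in words. For population data, the same ratio would use the population standard deviation and the population mean.

\text{CV} = \dfrac{s}{\bar{x}}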

Use Case for Coefficient of Variation

Suppose you want to compare the variability of two sample datasets. You have a product and want to compare the variability of its prices, but the data comes from two different countries with two different currencies. No problem. We’ll use the coefficient of variation. Because the standard deviation and the mean are expressed in the same units, their ratio is a pure number with no units attached. We’ll be able to say which country’s variation is greater or lesser even when they are using different currencies. We can compare variation across different units.
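
The price comparison can be sketched in Python as follows. The prices, the currencies, and the helper name coefficient_of_variation are all made up for illustration, and the sketch uses the sample standard deviation. The raw standard deviations differ mostly because of the currency scale, while the CVs can be compared directly.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean (unitless)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical prices of the same product in two countries
prices_usd = [10.0, 11.0, 9.0, 10.5, 9.5]
prices_jpy = [1400.0, 1700.0, 1300.0, 1600.0, 1500.0]

# Standard deviations are in different currencies, so they are not comparable
print(f"sd USD: {statistics.stdev(prices_usd):.2f}")
print(f"sd JPY: {statistics.stdev(prices_jpy):.2f}")

# The CVs are unitless and can be compared directly
cv_usd = coefficient_of_variation(prices_usd)
cv_jpy = coefficient_of_variation(prices_jpy)
print(f"CV USD: {cv_usd:.3f}")  # 0.079
print(f"CV JPY: {cv_jpy:.3f}")  # 0.105
```

In this made-up data the yen prices vary more relative to their mean, even though the comparison of raw standard deviations would say little on its own.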
