What is the variance for the following population of scores: 5, 2, 5, 4?

Let’s calculate the variance of the following data set: 2, 7, 3, 12, 9.

The first step is to calculate the mean. The sum is 33 and there are 5 data points. Therefore, the mean is 33 ÷ 5 = 6.6. Then you take each value in the data set, subtract the mean, and square the difference. For instance, for the first value:

(2 - 6.6)² = 21.16

The squared differences for all values are added:

21.16 + 0.16 + 12.96 + 29.16 + 5.76 = 69.20

The sum is then divided by the number of data points, which treats the data set as a complete population:

69.20 ÷ 5 = 13.84

The variance is 13.84. To get the standard deviation, you calculate the square root of the variance, which is approximately 3.72.
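The steps above can be sketched in R as follows. The helper name `pop_variance` is hypothetical (it is not a built-in function), and the last call assumes the title question's scores are meant to be read as 5, 2, 5, 4.

```r
# Hypothetical helper: population variance (divide by N, not N - 1)
pop_variance <- function(x) {
  mu <- mean(x)                    # step 1: the mean
  sum((x - mu)^2) / length(x)      # steps 2-4: square, add, divide by N
}

scores <- c(2, 7, 3, 12, 9)
v <- pop_variance(scores)          # 13.84, as in the worked example
s <- sqrt(v)                       # about 3.72

# R's built-in var() divides by N - 1, so it gives a different answer
v_sample <- var(scores)            # 69.2 / 4 = 17.3

# Assumed reading of the title question: scores 5, 2, 5, 4
v_title <- pop_variance(c(5, 2, 5, 4))   # 1.5

print(c(v, s, v_sample, v_title))
```

Note that `var()` is not wrong here; it answers a different question (estimating a population variance from a sample).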

Measures of Variability

David M. Lane

Prerequisites

Percentiles, Distributions, Measures of Central Tendency

Learning Objectives

  1. Determine the relative variability of two distributions
  2. Compute the range
  3. Compute the inter-quartile range
  4. Compute the variance in the population
  5. Estimate the variance from a sample
  6. Compute the standard deviation from the variance

What is Variability?

Variability refers to how "spread out" a group of scores is. To see what we mean by spread out, consider the graphs in Figure 1. These graphs represent the scores on two quizzes. The mean score for each quiz is 7.0. Despite the equality of means, you can see that the distributions are quite different. Specifically, the scores on Quiz 1 are more densely packed and those on Quiz 2 are more spread out. The differences among students were much greater on Quiz 2 than on Quiz 1.

[Bar chart: Quiz 1]

[Bar chart: Quiz 2]

Figure 1. Bar charts of two quizzes.

The terms variability, spread, and dispersion are synonyms, and refer to how spread out a distribution is. Just as in the section on central tendency where we discussed measures of the center of a distribution of scores, in this chapter we will discuss measures of the variability of a distribution. There are four frequently used measures of variability: the range, interquartile range, variance, and standard deviation. In the next few paragraphs, we will look at each of these four measures of variability in more detail.

Range

The range is the simplest measure of variability to calculate, and one you have probably encountered many times in your life. The range is simply the highest score minus the lowest score. Let’s take a few examples. What is the range of the following group of numbers: 10, 2, 5, 6, 7, 3, 4? Well, the highest number is 10, and the lowest number is 2, so 10 - 2 = 8. The range is 8. Let’s take another example. Here’s a dataset with 10 numbers: 99, 45, 23, 67, 45, 91, 82, 78, 62, 51. What is the range? The highest number is 99 and the lowest number is 23, so 99 - 23 equals 76; the range is 76.

Now consider the two quizzes shown in Figure 1. On Quiz 1, the lowest score is 5 and the highest score is 9. Therefore, the range is 4. The range on Quiz 2 was larger: the lowest score was 4 and the highest score was 10. Therefore the range is 6.
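The two numeric examples above can be checked with a short R sketch. The helper `score_range` is a hypothetical name. Note that R's built-in `range()` returns the minimum and maximum as a pair rather than their difference, which is why `diff()` is applied to it below.

```r
# Hypothetical helper: highest score minus lowest score
score_range <- function(x) max(x) - min(x)

a <- c(10, 2, 5, 6, 7, 3, 4)
b <- c(99, 45, 23, 67, 45, 91, 82, 78, 62, 51)

range_a <- score_range(a)      # 10 - 2 = 8
range_b <- score_range(b)      # 99 - 23 = 76

# Equivalent form using the built-in range(), which returns c(min, max)
range_b_alt <- diff(range(b))

print(c(range_a, range_b, range_b_alt))
```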

Interquartile Range

The interquartile range (IQR) is the range of the middle 50% of the scores in a distribution. It is computed as follows:

IQR = 75th percentile - 25th percentile

For Quiz 1, the 75th percentile is 8 and the 25th percentile is 6. The interquartile range is therefore 2. For Quiz 2, which has greater spread, the 75th percentile is 9, the 25th percentile is 5, and the interquartile range is 4. Recall that in the discussion of box plots, the 75th percentile was called the upper hinge and the 25th percentile was called the lower hinge. Using this terminology, the interquartile range is referred to as the H-spread.

A related measure of variability is called the semi-interquartile range. The semi-interquartile range is defined simply as the interquartile range divided by 2. If a distribution is symmetric, the median plus or minus the semi-interquartile range contains half the scores in the distribution.
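As an illustration, the interquartile and semi-interquartile ranges for Quiz 1 can be computed in R from the scores listed in Table 1. This sketch uses `type = 6` for the percentile definition because that is the setting the document's own R code uses; other `type` values can give different percentiles on other data.

```r
# Quiz 1 scores, as listed in Table 1
q1 <- c(9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5)

# 25th and 75th percentiles (type = 6 matches the document's R code)
q <- quantile(q1, probs = c(0.25, 0.75), type = 6)

iqr <- unname(q[2] - q[1])     # 8 - 6 = 2
semi_iqr <- iqr / 2            # 1

print(q)
print(c(iqr = iqr, semi_iqr = semi_iqr))
```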

Variance

Variability can also be defined in terms of how close the scores in the distribution are to the middle of the distribution. Using the mean as the measure of the middle of the distribution, the variance is defined as the average squared difference of the scores from the mean. The data from Quiz 1 are shown in Table 1. The mean score is 7.0. Therefore, the column "Deviation from Mean" contains the score minus 7. The column "Squared Deviation" is simply the previous column squared.

Table 1. Calculation of Variance for Quiz 1 scores.

Scores  Deviation from Mean  Squared Deviation
9        2                   4
9        2                   4
9        2                   4
8        1                   1
8        1                   1
8        1                   1
8        1                   1
7        0                   0
7        0                   0
7        0                   0
7        0                   0
7        0                   0
6       -1                   1
6       -1                   1
6       -1                   1
6       -1                   1
6       -1                   1
6       -1                   1
5       -2                   4
5       -2                   4
Means
7        0                   1.5

One thing that is important to notice is that the mean deviation from the mean is 0. This will always be the case. The mean of the squared deviations is 1.5. Therefore, the variance is 1.5. Analogous calculations with Quiz 2 show that its variance is 6.7. The formula for the variance is:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where σ² is the variance, μ is the mean, and N is the number of scores. For Quiz 1, μ = 7 and N = 20.
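The columns of Table 1 and its "Means" row can be rebuilt in R as a check on the arithmetic. This is a minimal sketch; the data frame and column names are illustrative choices, not part of the original text.

```r
# Quiz 1 scores, as listed in Table 1
q1 <- c(9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5)

table1 <- data.frame(score = q1)
table1$deviation <- table1$score - mean(table1$score)   # score minus 7
table1$squared_deviation <- table1$deviation^2

# The "Means" row: mean score, mean deviation, mean squared deviation
col_means <- colMeans(table1)
print(col_means)

# The mean squared deviation is the population variance
sigma2 <- col_means[["squared_deviation"]]
```

The mean of the deviation column comes out as 0, which is the property the text says always holds.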

If the variance in a sample is used to estimate the variance in a population, then the previous formula underestimates the variance and the following formula should be used:

$$s^2 = \frac{\sum (X - M)^2}{N - 1}$$

where s² is the estimate of the variance and M is the sample mean. Note that M is the mean of a sample taken from a population with a mean of μ. Since, in practice, the variance is usually computed in a sample, this formula is most often used. The simulation "estimating variance" illustrates the bias in the formula with N in the denominator.
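The bias can also be seen in a small simulation of one's own. The sketch below is not the "estimating variance" simulation mentioned above; it assumes a normal population with variance 4, samples of size 5, and 20,000 repetitions, all of which are arbitrary choices made for illustration.

```r
set.seed(1)          # fixed seed so the run is reproducible

pop_sd <- 2          # assumed population: normal, variance = 4
n <- 5               # assumed sample size
reps <- 20000        # assumed number of simulated samples

est <- replicate(reps, {
  x <- rnorm(n, mean = 0, sd = pop_sd)
  ss <- sum((x - mean(x))^2)             # sum of squared deviations from M
  c(divide_by_N = ss / n,                # formula with N in the denominator
    divide_by_N_minus_1 = ss / (n - 1))  # formula with N - 1
})

# Average estimate across all samples, one value per formula
avg <- rowMeans(est)
print(avg)
```

With these settings the N version should average close to 4 × (n - 1)/n = 3.2, while the N - 1 version should average close to the true variance of 4.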

Let's take a concrete example. Assume the scores 1, 2, 4, and 5 were sampled from a larger population. To estimate the variance in the population you would compute s² as follows:

M = (1 + 2 + 4 + 5)/4 = 12/4 = 3

s² = [(1 - 3)² + (2 - 3)² + (4 - 3)² + (5 - 3)²]/(4 - 1)

   = (4 + 1 + 1 + 4)/3 = 10/3 = 3.333

There are alternate formulas that can be easier to use if you are doing your calculations with a hand calculator. You should note that these formulas are subject to rounding error if your values are very large and/or you have an extremely large number of observations.

$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$

and

$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}$$

For this example,

$$\sum X^2 = 1^2 + 2^2 + 4^2 + 5^2 = 46, \qquad \frac{(\sum X)^2}{N} = \frac{12^2}{4} = 36$$

$$\sigma^2 = \frac{46 - 36}{4} = 2.5, \qquad s^2 = \frac{46 - 36}{3} = 3.333$$
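The calculator-style shortcut, which works from the sum of the scores and the sum of their squares, can be sketched in R for the scores 1, 2, 4, and 5. The second half illustrates the rounding-error warning by shifting every score by one billion, an offset chosen purely for illustration. Shifting does not change the true variance, but the shortcut loses it in floating-point arithmetic.

```r
x <- c(1, 2, 4, 5)
N <- length(x)

sum_sq <- sum(x^2)             # 1 + 4 + 16 + 25 = 46
correction <- sum(x)^2 / N     # 12^2 / 4 = 36

sigma2 <- (sum_sq - correction) / N        # 2.5
s2 <- (sum_sq - correction) / (N - 1)      # 3.333..., same as var(x)

# Rounding-error illustration: same spread, very large values
big <- x + 1e9
s2_shortcut_big <- (sum(big^2) - sum(big)^2 / N) / (N - 1)
s2_builtin_big <- var(big)     # computed from deviations about the mean

print(c(sigma2, s2, s2_shortcut_big, s2_builtin_big))
```

The deviation-based calculation stays at 3.333 for the shifted scores, whereas the shortcut result can be far off.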

Standard Deviation

The standard deviation is simply the square root of the variance. This makes the standard deviations of the two quiz distributions 1.257 and 2.203 when computed with the sample formula (N - 1 in the denominator), which is what R's sd() function uses; the square root of the population variance of 1.5 for Quiz 1 is 1.225.

The standard deviation is an especially useful measure of variability when the distribution is normal or approximately normal (see Chapter on Normal Distributions) because the proportion of the distribution within a given number of standard deviations from the mean can be calculated. For example, approximately 68% of the distribution is within one standard deviation of the mean and approximately 95% of the distribution is within two standard deviations of the mean. Therefore, if you had a normal distribution with a mean of 50 and a standard deviation of 10, then about 68% of the distribution would be between 50 - 10 = 40 and 50 + 10 = 60. Similarly, about 95% of the distribution would be between 50 - 2 × 10 = 30 and 50 + 2 × 10 = 70.

The symbol for the population standard deviation is σ; the symbol for an estimate computed in a sample is s. Figure 2 shows two normal distributions. The red distribution has a mean of 40 and a standard deviation of 5; the blue distribution has a mean of 60 and a standard deviation of 10. For the red distribution, 68% of the distribution is between 35 and 45; for the blue distribution, 68% is between 50 and 70.

Figure 2. Normal distributions with standard deviations of 5 and 10.
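The 68% and 95% figures quoted above can be checked against the normal cumulative distribution function, which R provides as `pnorm()`. The helper `within_k_sd` is a hypothetical name introduced for this sketch.

```r
# Hypothetical helper: proportion of a normal distribution within
# k standard deviations of its mean
within_k_sd <- function(k) pnorm(k) - pnorm(-k)

p1 <- within_k_sd(1)   # about 0.683
p2 <- within_k_sd(2)   # about 0.954

# Same calculation for the mean-50, SD-10 example: area between 40 and 60
p_40_60 <- pnorm(60, mean = 50, sd = 10) - pnorm(40, mean = 50, sd = 10)

print(round(c(p1, p2, p_40_60), 3))
```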

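# Quiz 1 scores from Table 1
q1 <- c(9,9,9,8,8,8,8,7,7,7,7,7,6,6,6,6,6,6,5,5)
IQR(q1, type = 6)   # interquartile range
# [1] 2

# The four sampled scores from the s² example
x <- c(1,2,4,5)
var(x)              # sample variance (N - 1 in the denominator)
# [1] 3.333333

sd(q1)              # sample standard deviation of Quiz 1
# [1] 1.256562

# Quiz 2 scores
q2 <- c(10,10,9,9,9,8,8,8,7,7,7,6,6,6,5,5,4,4,3,3)
sd(q2)              # sample standard deviation of Quiz 2
# [1] 2.202869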

How do you find the population variance?

The population variance is the variance computed over every member of a population. To calculate it, use the formula

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

where μ is the population mean and N is the size of the population consisting of x1, x2, …

What is the value of SS for the following sample: 1, 2, 6?

Scores: 1, 2, 6. Options: SS = 17; SS = (9)²; SS = 14; SS = 13/2. The sample mean is 3, so SS = (1 - 3)² + (2 - 3)² + (6 - 3)² = 4 + 1 + 9 = 14.

What is the value of SS for the following set of scores: 0, 1, 4, 5?

Answer and Explanation: The scores are 0, 1, 4, and 5. Their mean is 2.5, so SS = 6.25 + 2.25 + 2.25 + 6.25 = 17. Hence the value of SS is 17, and none of the options is correct.
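Both SS answers above can be verified with a one-line R function. SS here means the sum of squared deviations from the mean, and the function name `ss` is a hypothetical choice.

```r
# Hypothetical helper: sum of squared deviations from the mean
ss <- function(x) sum((x - mean(x))^2)

ss_a <- ss(c(1, 2, 6))       # mean 3:   4 + 1 + 9 = 14
ss_b <- ss(c(0, 1, 4, 5))    # mean 2.5: 6.25 + 2.25 + 2.25 + 6.25 = 17

print(c(ss_a, ss_b))
```

Dividing SS by N gives the population variance, and dividing it by N - 1 gives the sample estimate.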

How many standard deviations from the mean will contain at least 99% of the data for any type of distribution?

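The Empirical Rule states that 99.7% of data observed following a normal distribution lies within 3 standard deviations of the mean. Under this rule, 68% of the data falls within one standard deviation, 95% within two standard deviations, and 99.7% within three standard deviations from the mean. The Empirical Rule applies only to normal (or approximately normal) distributions. For any type of distribution, Chebyshev's inequality guarantees that at least 1 - 1/k² of the data lies within k standard deviations of the mean, so at least 99% of the data lies within 10 standard deviations.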
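As a numerical check, the sketch below contrasts the normal-distribution figure for three standard deviations with the distribution-free bound from Chebyshev's inequality, which guarantees at least 1 - 1/k² of the data within k standard deviations for any distribution with a finite variance. The helper names are hypothetical.

```r
# Hypothetical helper: smallest k for which Chebyshev's inequality
# guarantees at least proportion p within k standard deviations
chebyshev_k <- function(p) 1 / sqrt(1 - p)

# Hypothetical helper: Chebyshev's guaranteed minimum proportion for a given k
chebyshev_bound <- function(k) 1 - 1 / k^2

k_99 <- chebyshev_k(0.99)              # 10, up to floating-point rounding
bound_3 <- chebyshev_bound(3)          # 8/9: guaranteed for any distribution
normal_3 <- pnorm(3) - pnorm(-3)       # about 0.997: normal distributions only

print(c(k_99, bound_3, normal_3))
```

Three standard deviations therefore cover about 99.7% of a normal distribution, but only about 88.9% is guaranteed when nothing is known about the distribution's shape.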