When dealing with statistical data, it is important to distinguish between
"population" data sets and "sample" data sets.

A population data set contains all members of a specified group (the entire list of possible data values). [Uses the count n in formulas.]
Example: The population may be "ALL people living in the US."

A sample data set contains a part, or a subset, of a population. The size of a sample is always less than the size of the population from which it is taken. [Uses the count n - 1 in formulas.]
Example: The sample may be "SOME people living in the US."

When calculating mean absolute deviation (MAD), variance, and standard deviation, it is important to know whether you are working with an entire population (where you have all of the possible data) or with only a sample (a part) of the data. In addition, if you are using a sample of the data, you need to know whether you will be making generalizations about the entire population based upon this sample.
The only difference between the formulas in each section is division by n or n - 1.
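As a brief illustration for variance, here are the two standard versions written in the notation adopted on this page (n for the number of data entries and x̄ for the mean). The labels "pop" and "sample" are used here only to tell the two apart:

```latex
\text{variance}_{\text{pop}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}
\qquad
\text{variance}_{\text{sample}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
```

The numerators are identical. Only the divisor changes, so for the same data the sample version is always at least as large as the population version.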
Read more about these formulas under Measures of Spread.
Note: When working with "sample data sets", statisticians use n for the number of data entries and x̄ for the mean. When working with "population data sets", however, they use N for the number of data entries and μ for the mean. In Algebra 1, to avoid confusion and to coordinate with the notations used by the TI-84+ calculators, we will be using n for the number of data entries and x̄ for the mean for both population and sample data sets (as seen above).
Let's take a look at an example dealing with variance, to see an application of "population" versus "sample".
(a)
Find the variance of the heights of all fourteen-year-old boys in your Algebra class.
This task deals only with the heights of the fourteen-year-old boys in one specific class. The intent is not to estimate the heights of all fourteen-year-old boys in the world. The "population" in this task is only the fourteen-year-old boys in your Algebra class. Since you have the entire population available for this situation, you will be finding the population variance (dividing by n).
(b)
Find the variance of the heights of all fourteen-year-old boys in the world.
In this situation, the population is extremely large, and there is no practical way of obtaining all of its data. You will need to use a sample of the population and "estimate" the population's variance based upon the variance of that sample. You will be finding the sample variance (dividing by n - 1).
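The two situations can be sketched in Python. This is only an illustration: the list `heights` is made-up data (in inches), and the helper name `variance_of` is hypothetical. The functions `statistics.pvariance` and `statistics.variance` come from Python's standard library and are used here as a cross-check.

```python
import statistics

# Hypothetical heights, in inches (made-up data for illustration only).
heights = [62, 64, 65, 67, 67]


def variance_of(data, is_sample):
    """Return the variance of data, dividing by n - 1 for a sample or n for a population."""
    n = len(data)
    mean = sum(data) / n
    # Sum of the squared deviations from the mean (same numerator in both formulas).
    squared_deviations = sum((x - mean) ** 2 for x in data)
    divisor = n - 1 if is_sample else n
    return squared_deviations / divisor


# (a) The list is the whole class, so treat it as a population: divide by n.
population_variance = variance_of(heights, is_sample=False)

# (b) The list is only a sample of all boys in the world: divide by n - 1.
sample_variance = variance_of(heights, is_sample=True)

print("Population variance (divide by n):", population_variance)
print("Sample variance (divide by n - 1):", sample_variance)

# Cross-check against the standard library.
print("statistics.pvariance:", statistics.pvariance(heights))
print("statistics.variance: ", statistics.variance(heights))
```

The same five numbers give two different answers depending on which situation applies, which is why it matters to decide first whether the data is the entire population or only a sample of it.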