Chapter 2: Descriptive Statistics

Chapter 2 Review

Chapter Review from 2.1

A stem-and-leaf plot is a way to plot data and look at the distribution. Its advantage is that every data value within a class is visible, unlike a histogram, which shows only classes of data values. A line graph is often used to represent a set of data values in which a quantity varies with time. These graphs are useful for finding trends, that is, general patterns in data such as temperature, sales, employment, or company profit or cost over a period of time. A bar graph is a chart that uses either horizontal or vertical bars to show comparisons among categories. One axis of the chart shows the specific categories being compared, and the other axis represents a discrete value. Some bar graphs present bars clustered in groups of more than one (grouped bar graphs), and others show the bars divided into subparts to show cumulative effect (stacked bar graphs). Bar graphs are especially useful when the data are categorical.

Exercises

For each of the following data sets, create a stem plot and identify any outliers.

1. The miles per gallon ratings for 30 cars are shown below (lowest to highest).

19, 19, 19, 20, 21, 21, 25, 25, 25, 26, 26, 28, 29, 31, 31, 32, 32, 33, 34, 35, 36, 37, 37, 38, 38, 38, 38, 41, 43, 43

Solution
Stem Leaf
1 9 9 9
2 0 1 1 5 5 5 6 6 8 9
3 1 1 2 2 3 4 5 6 7 7 8 8 8 8
4 1 3 3
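The stem plot above can also be built programmatically. The following is a minimal Python sketch, assuming two-digit whole-number data with the tens digit as the stem; the function name `stem_plot` is illustrative and not part of the text.

```python
from collections import defaultdict

def stem_plot(values):
    """Return stem-and-leaf rows as (stem, leaves) pairs, using tens as stems."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 43 -> stem 4, leaf 3
        rows[stem].append(leaf)
    # Keep empty stems between the smallest and largest so gaps stay visible.
    return [(s, rows.get(s, [])) for s in range(min(rows), max(rows) + 1)]

mpg = [19, 19, 19, 20, 21, 21, 25, 25, 25, 26, 26, 28, 29, 31, 31,
       32, 32, 33, 34, 35, 36, 37, 37, 38, 38, 38, 38, 41, 43, 43]

for stem, leaves in stem_plot(mpg):
    print(stem, " ".join(str(leaf) for leaf in leaves))
```

Keeping empty stems in the output is a design choice: a gap between stems is often how a potential outlier shows up in a stem plot.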

2. The height in feet of 25 trees is shown below (lowest to highest).

25, 27, 33, 34, 34, 34, 35, 37, 37, 38, 39, 39, 39, 40, 41, 45, 46, 47, 49, 50, 50, 53, 53, 54, 54

 

3. The data are the prices of different laptops at an electronics store. Round each value to the nearest ten.

249, 249, 260, 265, 265, 280, 299, 299, 309, 319, 325, 326, 350, 350, 350, 365, 369, 389, 409, 459, 489, 559, 569, 570, 610

Solution
Stem Leaf
2 5 5 6 7 7 8
3 0 0 1 2 3 3 5 5 5 7 7 9
4 1 6 9
5 6 7 7
6 1

 

4. The data are daily high temperatures in a town for one month.

61, 61, 62, 64, 66, 67, 67, 67, 68, 69, 70, 70, 70, 71, 71, 72, 74, 74, 74, 75, 75, 75, 76, 76, 77, 78, 78, 79, 79, 95

Exercises

For the next three exercises, use the data to construct a line graph.

1. In a survey, 40 people were asked how many times they visited a store before making a major purchase. The results are shown in [link].

Number of times in store Frequency
1 4
2 10
3 16
4 6
5 4
Solution
This is a line graph that matches the supplied data. The x-axis shows the number of times people reported visiting a store before making a major purchase, and the y-axis shows the frequency.

2. In a survey, several people were asked how many years it has been since they purchased a mattress. The results are shown in [link].

Years since last purchase Frequency
0 2
1 8
2 13
3 22
4 16
5 9

3. Several children were asked how many TV shows they watch each day. The results of the survey are shown in [link].

Number of TV Shows Frequency
0 12
1 18
2 36
3 7
4 2
Solution
This is a line graph that matches the supplied data. The x-axis shows the number of TV shows a kid watches each day, and the y-axis shows the frequency.

Exercises

The students in Ms. Ramirez’s math class have birthdays in each of the four seasons. [link] shows the four seasons, the number of students who have birthdays in each season, and the percentage (%) of students in each group. Construct a bar graph showing the number of students.

Seasons Number of students Proportion of population
Spring 8 24%
Summer 9 26%
Autumn 11 32%
Winter 6 18%

Using the data from Ms. Ramirez’s math class supplied in [link], construct a bar graph showing the percentages.

Solution
This is a bar graph that matches the supplied data. The x-axis shows the seasons of the year, and the y-axis shows the proportion of birthdays.

Exercises

David County has six high schools. Each school sent students to participate in a county-wide science competition. [link] shows the percentage breakdown of competitors from each school, and the percentage of the entire student population of the county that goes to each school. Construct a bar graph that shows the population percentage of competitors from each school.

High School Science competition population Overall student population
Alabaster 28.9% 8.6%
Concordia 7.6% 23.2%
Genoa 12.1% 15.0%
Mocksville 18.5% 14.3%
Tynneson 24.2% 10.1%
West End 8.7% 28.8%

Use the data from the David County science competition supplied in [link]. Construct a bar graph that shows the county-wide population percentage of students at each school.

Solution
This is a bar graph that matches the supplied data. The x-axis shows the county high schools, and the y-axis shows the proportion of county students.

 

Chapter Review from 2.2

A histogram is a graphic version of a frequency distribution. The graph consists of bars of equal width drawn adjacent to each other. The horizontal scale represents classes of quantitative data values and the vertical scale represents frequencies. The heights of the bars correspond to frequency values. Histograms are typically used for large, continuous, quantitative data sets. A frequency polygon can also be used when graphing large data sets with data points that repeat. The data values (class midpoints) usually go on the x-axis, with the frequency graphed on the y-axis. Time series graphs can be helpful when looking at large amounts of data for one variable over a period of time.

Exercises

Sixty-five randomly selected car salespersons were asked the number of cars they generally sell in one week. Fourteen people answered that they generally sell three cars; nineteen generally sell four cars; twelve generally sell five cars; nine generally sell six cars; eleven generally sell seven cars. Complete the table.

Data Value (# cars) Frequency Relative Frequency Cumulative Relative Frequency
Solution
Data Value (# cars) Frequency Relative Frequency Cumulative Relative Frequency
 3 14 0.22 0.22
4 19 0.29 0.51
5 12 0.18 0.69
6 9 0.14 0.83
7 11 0.17 1
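The table above can be reproduced with a short calculation. This Python sketch divides each frequency by the total to get the relative frequency, then keeps a running sum for the cumulative relative frequency; the variable names are illustrative.

```python
from itertools import accumulate

cars_sold = [3, 4, 5, 6, 7]
frequency = [14, 19, 12, 9, 11]

n = sum(frequency)                        # total number of salespersons
relative = [f / n for f in frequency]     # proportion for each data value
cumulative = list(accumulate(relative))   # running total of the proportions

for value, f, rel, cum in zip(cars_sold, frequency, relative, cumulative):
    print(f"{value}  {f:2d}  {rel:.2f}  {cum:.2f}")
```

The last cumulative relative frequency is always 1 (up to rounding), because by that point every data value has been counted.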

What does the frequency column in the table sum to? Why?

Solution

65

What does the relative frequency column in the table sum to? Why?

What is the difference between relative frequency and frequency for each data value in the table?

Solution

The relative frequency shows the proportion of data points that have each value. The frequency tells the number of data points that have each value.

What is the difference between cumulative relative frequency and relative frequency for each data value?

To construct the histogram for the data in the table, determine appropriate minimum and maximum x and y values and the scaling. Sketch the histogram. Label the horizontal and vertical axes with words. Include numerical scaling.

An empty graph template for use with this question.
Solution

Answers will vary. One possible histogram is shown:

 

Exercises

Construct a frequency polygon for the following:

  1. Pulse Rates for Women Frequency
    60–69 12
    70–79 14
    80–89 11
    90–99 1
    100–109 1
    110–119 0
    120–129 1
  2. Actual Speed in a 30 MPH Zone Frequency
    42–45 25
    46–49 14
    50–53 7
    54–57 3
    58–61 1
  3. Tar (mg) in Nonfiltered Cigarettes Frequency
    10–13 1
    14–17 0
    18–21 15
    22–25 7
    26–29 2

 

Exercises

Construct a frequency polygon from the frequency distribution for the 50 highest ranked countries for depth of hunger.

Depth of Hunger Frequency
230–259 21
260–289 13
290–319 5
320–349 7
350–379 1
380–409 1
410–439 1
Solution

Find the midpoint for each class. These will be graphed on the x-axis. The frequency values will be graphed on the y-axis.

This is a frequency polygon that matches the supplied data. The x-axis shows the depth of hunger, and the y-axis shows the frequency.
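The midpoint step described above can be sketched in Python as follows. The class limits and frequencies are copied from the table; the names `classes`, `midpoints`, and `points` are illustrative.

```python
classes = [(230, 259), (260, 289), (290, 319), (320, 349),
           (350, 379), (380, 409), (410, 439)]
frequency = [21, 13, 5, 7, 1, 1, 1]

# The midpoint of a class is the average of its lower and upper limits.
midpoints = [(low + high) / 2 for low, high in classes]

# Each (midpoint, frequency) pair is one vertex of the frequency polygon.
points = list(zip(midpoints, frequency))
for x, y in points:
    print(x, y)
```

Plotting these pairs and joining consecutive points with line segments gives the polygon.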

Exercises

Use the two frequency tables to compare the life expectancy of men and women from 20 randomly selected countries. Include an overlaid frequency polygon and discuss the shapes of the distributions, the center, the spread, and any outliers. What can we conclude about the life expectancy of women compared to men?

Life Expectancy at Birth – Women Frequency
49–55 3
56–62 3
63–69 1
70–76 3
77–83 8
84–90 2
Life Expectancy at Birth – Men Frequency
49–55 3
56–62 3
63–69 1
70–76 1
77–83 7
84–90 5

Exercises

Construct a time series graph for (a) the number of male births, (b) the number of female births, and (c) the total number of births.

Sex/Year 1855 1856 1857 1858 1859 1860 1861
Female 45,545 49,582 50,257 50,324 51,915 51,220 52,403
Male 47,804 52,239 53,158 53,694 54,628 54,409 54,606
Total 93,349 101,821 103,415 104,018 106,543 105,629 107,009
Sex/Year 1862 1863 1864 1865 1866 1867 1868 1869
Female 51,812 53,115 54,959 54,850 55,307 55,527 56,292 55,033
Male 55,257 56,226 57,374 58,220 58,360 58,517 59,222 58,321
Total 107,069 109,341 112,333 113,070 113,667 114,044 115,514 113,354
Sex/Year 1870 1871 1872 1873 1874 1875
Female 56,431 56,099 57,472 58,233 60,109 60,146
Male 58,959 60,029 61,293 61,467 63,602 63,432
Total 115,390 116,128 118,765 119,700 123,711 123,578
Solution

Exercises

The following data sets list full-time police per 100,000 citizens along with homicides per 100,000 citizens for the city of Detroit, Michigan during the period from 1961 to 1973.

Year 1961 1962 1963 1964 1965 1966 1967
Police 260.35 269.8 272.04 272.96 272.51 261.34 268.89
Homicides 8.6 8.9 8.52 8.89 13.07 14.57 21.36
Year 1968 1969 1970 1971 1972 1973
Police 295.99 319.87 341.43 356.59 376.69 390.19
Homicides 28.03 31.49 37.39 46.26 47.24 52.33
  1. Construct a double time series graph using a common x-axis for both sets of data.
  2. Which variable increased the fastest? Explain.
  3. Did Detroit’s increase in police officers have an impact on the murder rate? Explain.

Chapter Review from 2.3

The values that divide a rank-ordered set of data into 100 equal parts are called percentiles. Percentiles are used to compare and interpret data. For example, an observation at the 50th percentile would be greater than 50 percent of the other observations in the set. Quartiles divide data into quarters. The first quartile (Q1) is the 25th percentile, the second quartile (Q2 or median) is the 50th percentile, and the third quartile (Q3) is the 75th percentile. The interquartile range, or IQR, is the range of the middle 50 percent of the data values. The IQR is found by subtracting Q1 from Q3, and can help determine outliers by using the following two expressions as cutoffs: a value above the first or below the second is a potential outlier.

  • Q3 + IQR(1.5)
  • Q1 – IQR(1.5)
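As a rough illustration of the outlier rule, the sketch below computes the upper cutoff Q3 + 1.5(IQR) and the lower cutoff Q1 – 1.5(IQR) from given quartiles. The function names and the sample quartiles (Q1 = 4, Q3 = 6) are chosen only for illustration.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(data, q1, q3):
    """List the values that fall outside the two cutoffs."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

lower, upper = outlier_fences(4, 6)   # illustrative quartiles, IQR = 2
print(lower, upper)                   # 1.0 9.0
```

A value flagged this way is only a potential outlier; whether it is an error or a genuine extreme observation still has to be judged in context.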

Formula Review

[latex]i=\left(\frac{k}{100}\right)\left(n+1\right)[/latex]

where i = the ranking or position of a data value,

k = the kth percentile,

n = total number of data.

Expression for finding the percentile of a data value: [latex]\left(\frac{x\text{ + }0.5y}{n}\right)(100)[/latex], where

x = the number of values counting from the bottom of the data list up to but not including the data value for which you want to find the percentile,

y = the number of data values equal to the data value for which you want to find the percentile,

n = total number of data
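Both formulas can be tried out in code. The Python sketch below is one possible reading of them: `kth_percentile` locates position i = (k/100)(n + 1) in sorted data and, when i is not a whole number, averages the two neighboring values (an assumed convention, though it agrees with the worked answers in this review). The function names are illustrative.

```python
def kth_percentile(sorted_data, k):
    """Find the kth percentile (k a whole number) using i = (k/100)(n + 1)."""
    n = len(sorted_data)
    # Integer arithmetic avoids floating-point error in the position i.
    whole, remainder = divmod(k * (n + 1), 100)
    if whole < 1 or whole > n or (remainder and whole == n):
        raise ValueError("position falls outside the data")
    if remainder == 0:
        return sorted_data[whole - 1]   # i is a whole number: take the ith value
    # Otherwise average the values just below and just above position i.
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2

def percentile_of_value(x, y, n):
    """Percentile of a data value: ((x + 0.5y) / n)(100)."""
    return (x + 0.5 * y) / n * 100

ages = [18, 21, 22, 25, 26, 27, 29, 30, 31, 33, 36, 37, 41, 42, 47,
        52, 55, 57, 58, 62, 64, 67, 69, 71, 72, 73, 74, 76, 77]
print(kth_percentile(ages, 40))                    # 37
print(round(percentile_of_value(143, 1, 180), 2))  # 79.72
```

Other textbooks and software packages use slightly different percentile conventions, so results can differ a little from one tool to another.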

Exercises

Listed are 29 ages for Academy Award winning best actors in order from smallest to largest.

18; 21; 22; 25; 26; 27; 29; 30; 31; 33; 36; 37; 41; 42; 47; 52; 55; 57; 58; 62; 64; 67; 69; 71; 72; 73; 74; 76; 77

  1. Find the 40th percentile.
  2. Find the 78th percentile.
Solution
  1. The 40th percentile is 37 years.
  2. The 78th percentile is 70 years.

Exercises

Listed are 32 ages for Academy Award winning best actors in order from smallest to largest.

18; 18; 21; 22; 25; 26; 27; 29; 30; 31; 31; 33; 36; 37; 37; 41; 42; 47; 52; 55; 57; 58; 62; 64; 67; 69; 71; 72; 73; 74; 76; 77

  1. Find the percentile of 37.
  2. Find the percentile of 72.

Jesse was ranked 37th in his graduating class of 180 students. At what percentile is Jesse’s ranking?

Solution

Jesse graduated 37th out of a class of 180 students. There are 180 – 37 = 143 students ranked below Jesse. There is one rank of 37.

x = 143 and y = 1. [latex]\frac{x+0.5y}{n}[/latex](100) = [latex]\frac{143+0.5\left(1\right)}{180}[/latex](100) = 79.72. Jesse’s rank of 37 puts him at the 80th percentile.

Exercises

  1. For runners in a race, a low time means a faster run. The winners in a race have the shortest running times. Is it more desirable to have a finish time with a high or a low percentile when running a race?
  2. The 20th percentile of run times in a particular race is 5.2 minutes. Write a sentence interpreting the 20th percentile in the context of the situation.
  3. A bicyclist in the 90th percentile of a bicycle race completed the race in 1 hour and 12 minutes. Is he among the fastest or slowest cyclists in the race? Write a sentence interpreting the 90th percentile in the context of the situation.
  4. For runners in a race, a higher speed means a faster run. Is it more desirable to have a speed with a high or a low percentile when running a race?
  5. The 40th percentile of speeds in a particular race is 7.5 miles per hour. Write a sentence interpreting the 40th percentile in the context of the situation.
Solution

4. For runners in a race, it is more desirable to have a high percentile for speed. A high percentile means a higher speed, which is faster.

5. 40% of runners ran at speeds of 7.5 miles per hour or less (slower). 60% of runners ran at speeds of 7.5 miles per hour or more (faster).

Exercises

On an exam, would it be more desirable to earn a grade with a high or low percentile? Explain.

Exercises

Mina is waiting in line at the Department of Motor Vehicles (DMV). Her wait time of 32 minutes is the 85th percentile of wait times. Is that good or bad? Write a sentence interpreting the 85th percentile in the context of this situation.

Solution

When waiting in line at the DMV, the 85th percentile would be a long wait time compared to the other people waiting. 85% of people had shorter wait times than Mina. In this context, Mina would prefer a wait time corresponding to a lower percentile. 85% of people at the DMV waited 32 minutes or less. 15% of people at the DMV waited 32 minutes or longer.

Exercises

In a survey collecting data about the salaries earned by recent college graduates, Li found that her salary was in the 78th percentile. Should Li be pleased or upset by this result? Explain.

 

Exercises

In a study collecting data about the repair costs of damage to automobiles in a certain type of crash test, a certain model of car had $1,700 in damage and was in the 90th percentile. Should the manufacturer and the consumer be pleased or upset by this result? Explain and write a sentence that interprets the 90th percentile in the context of this problem.

Solution

The manufacturer and the consumer would be upset. This is a large repair cost for the damages, compared to the other cars in the sample. INTERPRETATION: 90% of the crash-tested cars had damage repair costs of $1,700 or less; only 10% had damage repair costs of $1,700 or more.

Exercises

The University of California has two criteria used to set admission standards for freshmen to be admitted to a college in the UC system:

  1. Students’ GPAs and scores on standardized tests (SATs and ACTs) are entered into a formula that calculates an “admissions index” score. The admissions index score is used to set eligibility standards intended to meet the goal of admitting the top 12% of high school students in the state. In this context, what percentile does the top 12% represent?
  2. Students whose GPAs are at or above the 96th percentile of all students at their high school are eligible (called eligible in the local context), even if they are not in the top 12% of all students in the state. What percentage of students from each high school are “eligible in the local context”?

Exercises

Suppose that you are buying a house. You and your realtor have determined that the most expensive house you can afford is the 34th percentile. The 34th percentile of housing prices is $240,000 in the town you want to move to. In this town, can you afford 34% of the houses or 66% of the houses?

Solution

You can afford 34% of houses. 66% of the houses are too expensive for your budget. INTERPRETATION: 34% of houses cost $240,000 or less. 66% of houses cost $240,000 or more.

 

Exercises

Use [link] to calculate the following values:

First quartile = _______

Second quartile = median = 50th percentile = _______

Solution

4

Third quartile = _______

Interquartile range (IQR) = _____ – _____ = _____

Solution

6 – 4 = 2

10th percentile = _______

70th percentile = _______

Solution

6

 

Chapter Review from 2.4

Box plots are a type of graph that can help visually organize data. To graph a box plot, the following data points must be calculated: the minimum value, the first quartile, the median, the third quartile, and the maximum value. Once the box plot is graphed, you can display and compare distributions of data.

Exercises

Sixty-five randomly selected car salespersons were asked the number of cars they generally sell in one week. Fourteen people answered that they generally sell three cars; nineteen generally sell four cars; twelve generally sell five cars; nine generally sell six cars; eleven generally sell seven cars.

Construct a box plot below. Use a ruler to measure and scale accurately.

Looking at your box plot, does it appear that the data are concentrated together, spread out evenly, or concentrated in some areas, but not in others? How can you tell?

Solution

More than 25% of salespersons sell four cars in a typical week. You can see this concentration in the box plot because the first quartile is equal to the median. The top 25% and the bottom 25% are spread out evenly; the whiskers have the same length.
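The five values needed for this box plot can be checked with Python's standard `statistics` module. This is a sketch under one assumption: the `"exclusive"` method of `statistics.quantiles` is used because it positions quartiles with (n + 1), as the percentile formula in this chapter does; other quartile conventions may give slightly different results on other data.

```python
import statistics

# Number of cars sold per week -> number of salespersons giving that answer.
counts = {3: 14, 4: 19, 5: 12, 6: 9, 7: 11}
data = [value for value, f in counts.items() for _ in range(f)]

q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
summary = (min(data), q1, median, q3, max(data))
print(summary)  # (3, 4.0, 4.0, 6.0, 7)
```

The first quartile and the median coincide at four, and each whisker has length one, which matches the description in the solution.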

Chapter Review from 2.5

The mean and the median can be calculated to help you find the “center” of a data set. The mean is the most commonly used measure of center, but the median is generally the better measure when a data set contains several outliers or extreme values. The mode tells you the most frequently occurring datum (or data) in your data set. The mean, median, and mode are extremely helpful when you need to analyze your data, but if your data set consists of ranges that lack specific values, the mean may seem impossible to calculate. However, the mean can be approximated: add the lower boundary to the upper boundary of each interval and divide by two to find its midpoint, multiply each midpoint by the number of values found in the corresponding range, and divide the sum of these products by the total number of data values in the set.

Formula Review

[latex]\mu =\frac{\sum fm}{\sum f}[/latex] Where f = interval frequencies and m = interval midpoints.
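To make the formula concrete, here is a short Python sketch that applies it to the Grade frequency table from the exercises in this section. The variable names are illustrative.

```python
classes = [(49.5, 59.5), (59.5, 69.5), (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
frequency = [2, 3, 8, 12, 5]

# m: midpoint of each interval, (lower boundary + upper boundary) / 2
midpoints = [(low + high) / 2 for low, high in classes]

# Approximate mean = (sum of f * m) / (sum of f)
mean = sum(f * m for f, m in zip(frequency, midpoints)) / sum(frequency)
print(mean)  # 79.5
```

The result is an approximation because every value in an interval is treated as if it sat at the interval's midpoint.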

Exercises

Find the mean for the following frequency tables.

  1. Grade Frequency
    49.5–59.5 2
    59.5–69.5 3
    69.5–79.5 8
    79.5–89.5 12
    89.5–99.5 5
  2. Daily Low Temperature Frequency
    49.5–59.5 53
    59.5–69.5 32
    69.5–79.5 15
    79.5–89.5 1
    89.5–99.5 0
  3. Points per Game Frequency
    49.5–59.5 14
    59.5–69.5 32
    69.5–79.5 15
    79.5–89.5 23
    89.5–99.5 2

    Exercises

    Use the following information to answer the next three exercises:

    The following data show the lengths of boats moored in a marina. The data are ordered from smallest to largest:
    16; 17; 19; 20; 20; 21; 23; 24; 25; 25; 25; 26; 26; 27; 27; 27; 28; 29; 30; 32; 33; 33; 34; 35; 37; 39; 40

    Calculate the mean.

    Solution

    Mean:

    [latex]16 + 17 + 19 + 20 + 20 + 21 + 23 + 24 + 25 + 25 + 25 + 26 + 26 + 27 + 27 + 27 + 28 + 29 + 30 + 32 + 33 + 33 + 34 + 35 + 37 + 39 + 40 = 738[/latex]

    [latex]\frac{738}{27} = 27.33[/latex]

    Identify the median.

    Identify the mode.

    Solution

    The most frequent lengths are 25 and 27, which occur three times. Mode = 25, 27
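The three measures of center for the boat lengths can be checked with Python's standard `statistics` module. This is only a sketch for verifying hand calculations; `multimode` is used because the data set has more than one mode.

```python
import statistics

boats = [16, 17, 19, 20, 20, 21, 23, 24, 25, 25, 25, 26, 26, 27,
         27, 27, 28, 29, 30, 32, 33, 33, 34, 35, 37, 39, 40]

mean = statistics.mean(boats)         # sum of the values divided by their count
median = statistics.median(boats)     # middle value of the ordered list
modes = statistics.multimode(boats)   # every value tied for most frequent

print(round(mean, 2), median, modes)  # 27.33 27 [25, 27]
```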

    Exercises

    Use the following information to answer the next three exercises:

    Sixty-five randomly selected car salespersons were asked the number of cars they generally sell in one week. Fourteen people answered that they generally sell three cars; nineteen generally sell four cars; twelve generally sell five cars; nine generally sell six cars; eleven generally sell seven cars. Calculate the following:

    sample mean = [latex]\overline{x} =[/latex] _______

    median = _______

    Solution

    4

    mode = _______

     

    Chapter Review from 2.6

    Looking at the distribution of data can reveal a lot about the relationship between the mean, the median, and the mode. There are three types of distributions. A right (or positive) skewed distribution has a shape like [link]. A left (or negative) skewed distribution has a shape like [link]. A symmetrical distribution looks like [link].

    Exercises

    Use the following information to answer the next three exercises:

    State whether the data are symmetrical, skewed to the left, or skewed to the right.

    1; 1; 1; 2; 2; 2; 2; 3; 3; 3; 3; 3; 3; 3; 3; 4; 4; 4; 5; 5

    Solution

    The data are symmetrical. The median is 3 and the mean is 2.85. They are close, and the mode lies close to the middle of the data, so the data are symmetrical.

    16; 17; 19; 22; 22; 22; 22; 22; 23

    87; 87; 87; 87; 87; 88; 89; 89; 90; 91

    Solution

    The data are skewed right. The median is 87.5 and the mean is 88.2. Even though they are close, the mode lies to the left of the middle of the data, and there are many more instances of 87 than any other number, so the data are skewed right.
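The mean and median quoted in these solutions can be confirmed with a few lines of Python. The helper name `compare_center` is illustrative, and comparing the two measures is only a rough indicator of skew, not a formal test.

```python
import statistics

def compare_center(data):
    """Return (mean rounded to 2 places, median) for a quick skew check."""
    return round(statistics.mean(data), 2), statistics.median(data)

first = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5]
third = [87, 87, 87, 87, 87, 88, 89, 89, 90, 91]

print(compare_center(first))  # (2.85, 3.0)
print(compare_center(third))  # (88.2, 87.5)
```

In the third data set the mean sits above the median, which is the typical pattern for data skewed to the right.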

     

    Exercises

    When the data are skewed left, what is the typical relationship between the mean and median?

    When the data are symmetrical, what is the typical relationship between the mean and median?

    Solution

    When the data are symmetrical, the mean and median are close or the same.

    What word describes a distribution that has two modes?

    Exercises

    Describe the shape of this distribution.

    This is a histogram which consists of 5 adjacent bars with the x-axis split into intervals of 1 from 3 to 7. The bar heights peak at the first bar and taper lower to the right.
    Solution

    The distribution is skewed right because it looks pulled out to the right.

    Describe the relationship between the mode and the median of this distribution.

    This is a histogram which consists of 5 adjacent bars with the x-axis split into intervals of 1 from 3 to 7. The bar heights peak at the first bar and taper lower to the right. The bar heights from left to right are: 8, 4, 2, 2, 1.

    Describe the relationship between the mean and the median of this distribution.

    This is a histogram which consists of 5 adjacent bars with the x-axis split into intervals of 1 from 3 to 7. The bar heights peak at the first bar and taper lower to the right. The bar heights from left to right are: 8, 4, 2, 2, 1.
    Solution

    The mean is 4.1 and is slightly greater than the median, which is four.

    Describe the shape of this distribution.

    This is a histogram which consists of 5 adjacent bars with the x-axis split into intervals of 1 from 3 to 7. The bar heights peak in the middle and taper down to the right and left.

    Describe the relationship between the mode and the median of this distribution.

    This is a histogram which consists of 5 adjacent bars with the x-axis split into intervals of 1 from 3 to 7. The bar heights peak in the middle and taper down to the right and left.
    Solution

    The mode and the median are the same. In this case, they are both five.

    Are the mean and the median exactly the same in this distribution? Why or why not?

    This is a histogram which consists of 5 adjacent bars with the x-axis split into intervals of 1 from 3 to 7. The bar heights from left to right are: 2, 4, 8, 5, 2.

    Describe the shape of this distribution.

    This is a histogram which consists of 5 adjacent bars over an x-axis split into intervals of 1 from 3 to 7. The bar heights from left to right are: 1, 1, 2, 4, 7.
    Solution

    The distribution is skewed left because it looks pulled out to the left.

    Describe the relationship between the mode and the median of this distribution.

    This is a histogram which consists of 5 adjacent bars over an x-axis split into intervals of 1 from 3 to 7. The bar heights from left to right are: 1, 1, 2, 4, 7.

    Describe the relationship between the mean and the median of this distribution.

    This is a histogram which consists of 5 adjacent bars over an x-axis split into intervals of 1 from 3 to 7. The bar heights from left to right are: 1, 1, 2, 4, 7.
    Solution

    The mean and the median are both six.

    Exercises

    The mean and median for the data are the same.

    3; 4; 5; 5; 6; 6; 6; 6; 7; 7; 7; 7; 7; 7; 7

    Are the data perfectly symmetrical? Why or why not?

     

    Exercises

    Which is the greatest, the mean, the mode, or the median of the data set?

    11; 11; 12; 12; 12; 12; 13; 15; 17; 22; 22; 22

    Solution

    The mode is 12, the median is 12.5, and the mean is 15.1. The mean is the largest.

     

    Exercises

    Which is the least, the mean, the mode, or the median of the data set?

    56; 56; 56; 58; 59; 60; 62; 64; 64; 65; 67

    Of the three measures, which tends to reflect skewing the most, the mean, the mode, or the median? Why?

    Solution

    The mean tends to reflect skewing the most because it is affected the most by outliers.

    In a perfectly symmetrical distribution, when would the mode be different from the mean and median?

     

    Chapter Review from 2.7

    The standard deviation can help you measure the spread of data. Different formulas apply depending on whether you are calculating the standard deviation of a sample or of a population.

    • The standard deviation allows us to compare individual data or classes to the data set mean numerically.
    • [latex]s=\sqrt{\frac{\sum {\left(x-\overline{x}\right)}^{2}}{n-1}}[/latex] or [latex]s=\sqrt{\frac{\sum f{\left(x-\overline{x}\right)}^{2}}{n-1}}[/latex] is the formula for calculating the standard deviation of a sample. To calculate the standard deviation of a population, we would use the population mean, [latex]\mu[/latex], and the formula [latex]\sigma = \sqrt{\frac{\sum {\left(x-\mu \right)}^{2}}{N}}[/latex] or [latex]\sigma = \sqrt{\frac{\sum f{\left(x-\mu \right)}^{2}}{N}}[/latex].

    Formula Review

    Sample Standard Deviation

    • [latex]s=\sqrt{\frac{\Sigma {\left(x-\overline{x}\right)}^{2}}{n-1}}[/latex] or [latex]s=\sqrt{\frac{\Sigma f{\left(x-\overline{x}\right)}^{2}}{n-1}}[/latex]
    • For the sample standard deviation, the denominator is n – 1, that is, the sample size minus 1.

    Population Standard Deviation

    • [latex]\sigma=\sqrt{\frac{\Sigma{\left(x-\mu\right)}^{2}}{N}}[/latex] or [latex]\sigma = \sqrt{\frac{\Sigma f{\left(x-\mu\right)}^{2}}{N}}[/latex]
    • For the population standard deviation, the denominator is N, the number of items in the population.

    Exercises

    Use the following information to answer the next two exercises: The following data are the distances between 20 retail stores and a large distribution center. The distances are in miles.

    29; 37; 38; 40; 58; 67; 68; 69; 76; 86; 87; 95; 96; 96; 99; 106; 112; 127; 145; 150

    Use a graphing calculator or computer to find the standard deviation and round to the nearest tenth.

    Solution

    s = 34.5
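If a graphing calculator is not at hand, Python's standard `statistics` module offers one way to check this answer. In the sketch below, `stdev` divides by n – 1 (sample standard deviation) and `pstdev` divides by N (population standard deviation).

```python
import statistics

distances = [29, 37, 38, 40, 58, 67, 68, 69, 76, 86,
             87, 95, 96, 96, 99, 106, 112, 127, 145, 150]

s = statistics.stdev(distances)       # sample standard deviation (n - 1)
sigma = statistics.pstdev(distances)  # population standard deviation (N)

print(round(s, 1))  # 34.5
```

The sample value is slightly larger than the population value for the same data, because its denominator is smaller.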

    Find the value that is one standard deviation below the mean.

    Exercises

    Two baseball players, Fredo and Karl, on different teams wanted to find out who had the higher batting average when compared to his team. Which baseball player had the higher batting average when compared to his team?

    Baseball Player Batting Average Team Batting Average Team Standard Deviation
    Fredo 0.158 0.166 0.012
    Karl 0.177 0.189 0.015
    Solution

    For Fredo: [latex]z=\frac{0.158-0.166}{0.012}=-0.67[/latex]

    For Karl: [latex]z=\frac{0.177-0.189}{0.015}=-0.8[/latex]

    Fredo’s z-score of –0.67 is higher than Karl’s z-score of –0.8. For batting average, higher values are better, so Fredo has a better batting average compared to his team.
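The z-score comparison above can be written as a small Python function. The name `z_score` is illustrative; it computes (value – mean) / standard deviation.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

fredo = z_score(0.158, 0.166, 0.012)
karl = z_score(0.177, 0.189, 0.015)

print(round(fredo, 2), round(karl, 2))  # -0.67 -0.8
print(fredo > karl)                     # True
```

Because both z-scores are negative, the "higher" one is the one closer to zero, which is why Fredo comes out ahead relative to his team.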

     

    Exercises

    Use [link] to find the value that is three standard deviations:

    • above the mean
    • below the mean

    Exercises

    Find the standard deviation for the following frequency tables using the formula. Check the calculations with the TI 83/84.

    1. Grade Frequency
      49.5–59.5 2
      59.5–69.5 3
      69.5–79.5 8
      79.5–89.5 12
      89.5–99.5 5
    2. Daily Low Temperature Frequency
      49.5–59.5 53
      59.5–69.5 32
      69.5–79.5 15
      79.5–89.5 1
      89.5–99.5 0
    3. Points per Game Frequency
      49.5–59.5 14
      59.5–69.5 32
      69.5–79.5 15
      79.5–89.5 23
      89.5–99.5 2
    Solution
    1. [latex]{s}_{x}=\sqrt{\frac{\sum f{m}^{2}}{n}-{\overline{x}}^{2}}=\sqrt{\frac{193157.5}{30}-{79.5}^{2}}=10.88[/latex]
    2. [latex]{s}_{x}=\sqrt{\frac{\sum f{m}^{2}}{n}-{\overline{x}}^{2}}=\sqrt{\frac{380945.3}{101}-{60.94}^{2}}=7.62[/latex]
    3. [latex]{s}_{x}=\sqrt{\frac{\sum f{m}^{2}}{n}-{\overline{x}}^{2}}=\sqrt{\frac{440051.5}{86}-{70.66}^{2}}=11.14[/latex]
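The shortcut formula used in these solutions can be checked with a short Python sketch, shown here for the Grade table. Note that this formula divides by n rather than n – 1, so a sample standard deviation computed with n – 1 would come out slightly larger. The variable names are illustrative.

```python
from math import sqrt

classes = [(49.5, 59.5), (59.5, 69.5), (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
frequency = [2, 3, 8, 12, 5]

midpoints = [(low + high) / 2 for low, high in classes]
n = sum(frequency)

mean = sum(f * m for f, m in zip(frequency, midpoints)) / n
sum_fm2 = sum(f * m ** 2 for f, m in zip(frequency, midpoints))

# Shortcut form: sqrt( (sum of f*m^2) / n - mean^2 )
sd = sqrt(sum_fm2 / n - mean ** 2)
print(sum_fm2, round(sd, 2))  # 193157.5 10.88
```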

     

    License


    Introductory Statistics Copyright © 2024 by LOUIS: The Louisiana Library Network is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
