
Chapter 2: Descriptive Statistics

2.4 Box Plots

Learning Objectives

By the end of this section, the student should be able to:

  • Display data graphically and interpret box plots.

Box plots (also called box-and-whisker plots or box-whisker plots) give a good graphical image of the concentration of the data. They also show how far the extreme values are from most of the data. A box plot is constructed from five values: the minimum value, the first quartile, the median, the third quartile, and the maximum value. We use these values to compare how close other data values are to them.

To construct a box plot, use a horizontal or vertical number line and a rectangular box. The smallest and largest data values label the endpoints of the axis. The first quartile marks one end of the box and the third quartile marks the other end of the box. Approximately the middle 50 percent of the data fall inside the box. The "whiskers" extend from the ends of the box to the smallest and largest data values. The median, or second quartile, can lie strictly between the first and third quartiles, or it can equal one of them, or both. The box plot gives a good, quick picture of the data.

Note

You may encounter box-and-whisker plots that have dots marking outlier values. In those cases, the whiskers do not extend to the minimum and maximum values.

Consider, again, this dataset: 1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5

The first quartile is two, the median is seven, and the third quartile is nine. The smallest value is one, and the largest value is 11.5. The following image shows the constructed box plot.

 

Horizontal boxplot with minimum at 1, Q1 at 2, Median at 7, Q3 at 9, and maximum at 11.5.
Figure 1. Box Plot for dataset: 1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5

The two whiskers extend from the first quartile to the smallest value and from the third quartile to the largest value. The median is shown with a dashed line.
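
The five values quoted for this dataset can be checked with a short sketch. It assumes the convention that reproduces the quartiles stated in this section: the median splits the sorted data into a lower half and an upper half (the median itself is left out when the number of values is odd), and the first and third quartiles are the medians of those halves. Other quartile conventions exist and can give slightly different values. The function name `five_number_summary` is made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]               # values before the median position
    upper = data[len(data) - half:]   # values after the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

data = [1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5]
print(five_number_summary(data))  # (1, 2, 7.0, 9, 11.5)
```

The result matches the minimum, first quartile, median, third quartile, and maximum used to draw Figure 1.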

Note

It is important to start a box plot with a scaled number line. Otherwise the box plot may not be useful.

Example

The following data are the heights of 40 students in a statistics class.

59; 60; 61; 62; 62; 63; 63; 64; 64; 64; 65; 65; 65; 65; 65; 65; 65; 65; 65; 66; 66; 67; 67; 68; 68; 69; 70; 70; 70; 70; 70; 71; 71; 72; 72; 73; 74; 74; 75; 77

Construct a box plot with the following properties:

  • Minimum value = 59
  • Maximum value = 77
  • [latex]Q_1[/latex]: First quartile = 64.5
  • [latex]Q_2[/latex]: Second quartile or median = 66
  • [latex]Q_3[/latex]: Third quartile = 70
Horizontal boxplot with minimum at 59, Q1 at 64.5, median at 66, Q3 at 70, and largest value at 77.
Figure 2. Box Plot for Heights of 40 Students in Statistics Class
  1. Each quarter has approximately 25% of the data.
  2. The spreads of the four quarters are [latex]64.5 - 59 = 5.5[/latex] (first quarter), [latex]66 - 64.5 = 1.5[/latex] (second quarter), [latex]70 - 66 = 4[/latex] (third quarter), and [latex]77 - 70 = 7[/latex] (fourth quarter). So, the second quarter has the smallest spread and the fourth quarter has the largest spread.
  3. Range = maximum value - minimum value [latex]= 77 - 59 = 18[/latex]
  4. Interquartile Range: [latex]IQR = Q_3 - Q_1 = 70 - 64.5 = 5.5[/latex]
  5. The interval 59–65 contains 19 of the 40 values (47.5%), so it has more data in it than the interval 66 through 70, which contains 12 of the 40 values (30%).
  6. The middle 50% (middle half) of the data has a range of 5.5 inches.
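
The arithmetic behind the spreads, range, and interquartile range can be laid out as a small sketch. It starts from the five values given in this example; the variable names are chosen only for illustration.

```python
# Five-number summary as stated in the example (heights in inches).
minimum, q1, q2, q3, maximum = 59, 64.5, 66, 70, 77

# Spread of each quarter: the distance between neighbouring summary values.
spreads = {
    "first quarter": q1 - minimum,
    "second quarter": q2 - q1,
    "third quarter": q3 - q2,
    "fourth quarter": maximum - q3,
}
data_range = maximum - minimum
iqr = q3 - q1

print(spreads)                            # spreads of 5.5, 1.5, 4 and 7
print(data_range, iqr)                    # 18 5.5
print(min(spreads, key=spreads.get))      # second quarter
print(max(spreads, key=spreads.get))      # fourth quarter
```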

Your Turn!

The following data are the number of pages in 40 books on a shelf. Construct a box plot and state the interquartile range.

136; 140; 178; 190; 205; 215; 217; 218; 232; 234; 240; 255; 270; 275; 290; 301; 303; 315; 317; 318; 326; 333; 343; 349; 360; 369; 377; 388; 391; 392; 398; 400; 402; 405; 408; 422; 429; 450; 475; 512

Hint: You first must calculate the minimum value, maximum value, [latex]Q_1[/latex], [latex]Q_2[/latex], and [latex]Q_3[/latex].

Solution
Horizontal boxplot from first whisker of around 138 to second whisker of around 515, with approximated minimum, Q1, median, Q3, and maximum
Figure 3. Box Plot for Number of Pages in 40 Books on a Shelf

 

[latex]IQR = 158[/latex]
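
One way to arrive at this value is to take the quartiles as the medians of the lower and upper halves of the sorted data, which is the convention that reproduces the quartiles in the earlier examples. With 40 values, the median is the average of the 20th and 21st values, [latex]Q_1[/latex] is the average of the 10th and 11th values, and [latex]Q_3[/latex] is the average of the 30th and 31st values:

[latex]Q_2 = \frac{318 + 326}{2} = 322[/latex]

[latex]Q_1 = \frac{234 + 240}{2} = 237[/latex]

[latex]Q_3 = \frac{392 + 398}{2} = 395[/latex]

[latex]IQR = Q_3 - Q_1 = 395 - 237 = 158[/latex]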

For some sets of data, some of the largest value, smallest value, first quartile, median, and third quartile may be the same. For instance, you might have a data set in which the median and the third quartile are the same. In this case, the diagram would not have a dashed line inside the box displaying the median. The right side of the box would display both the third quartile and the median. For example, if the smallest value and the first quartile were both one, the median and the third quartile were both five, and the largest value was seven, the box plot would look like:

Horizontal boxplot with smallest value and Q1 at 1, Q3 and median at 5, and lone whisker extending from 5 to 7.
Figure 4. Box Plot Showing Repeated Min/Q1 and Median/Q3 Values

In this case, at least 25% of the values are equal to one. Twenty-five percent of the values are between one and five, inclusive. At least 25% of the values are equal to five. The top 25% of the values fall between five and seven, inclusive.
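
To see how such a plot can arise, the sketch below uses a small made-up data set (it is hypothetical, not taken from this section) chosen so that the smallest value equals the first quartile and the median equals the third quartile. Quartiles are again assumed to be the medians of the lower and upper halves.

```python
from statistics import median

# Hypothetical data chosen so that min = Q1 = 1 and median = Q3 = 5.
data = sorted([1, 1, 1, 5, 5, 5, 5, 7])
half = len(data) // 2
summary = {
    "min": data[0],
    "Q1": median(data[:half]),
    "median": median(data),
    "Q3": median(data[len(data) - half:]),
    "max": data[-1],
}

# Report which neighbouring summary values coincide on the plot.
names = list(summary)
coinciding = [(a, b) for a, b in zip(names, names[1:]) if summary[a] == summary[b]]
print(summary)
print(coinciding)  # [('min', 'Q1'), ('median', 'Q3')]
```

A plot of this data set would have no left whisker and no separate median line, matching the shape described for Figure 4.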

Example

Test scores for a college statistics class held during the day are:

99; 56; 78; 55.5; 32; 90; 80; 81; 56; 59; 45; 77; 84.5; 84; 70; 72; 68; 32; 79; 90

Test scores for a college statistics class held during the evening are:

98; 78; 68; 83; 81; 89; 88; 76; 65; 45; 98; 90; 80; 84.5; 85; 79; 78; 98; 90; 79; 81; 25.5

  1. Find the smallest and largest values, the median, and the first and third quartiles for the day class.
  2. Find the smallest and largest values, the median, and the first and third quartiles for the night class.
  3. For the day data set, what percentage of the data is between the smallest value and the first quartile? the first quartile and the median? the median and the third quartile? the third quartile and the largest value? What percentage of the data is between the first quartile and the largest value?
  4. Create a box plot for each set of data. Use one number line for both box plots.
  5. Which box plot has the widest spread for the middle 50% of the data (the data between the first and third quartiles)? What does this mean for that set of data in comparison to the other set of data?
Solution
  1. The values are given below: [latex]\text{Min} = 32[/latex]; [latex]Q_1 = 56[/latex]; [latex]\text{Median} = 74.5[/latex]; [latex]Q_3 = 82.5[/latex]; [latex]\text{Max} = 99[/latex]
  2. The values are given below: [latex]\text{Min} = 25.5[/latex]; [latex]Q_1 = 78[/latex]; [latex]\text{Median} = 81[/latex]; [latex]Q_3 = 89[/latex]; [latex]\text{Max} = 98[/latex]
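  3. Day class: There are six data values ranging from 32 to 56: 30%. There are six data values ranging from 56 to 74.5: 30%. There are five data values ranging from 74.5 to 82.5: 25%. There are five data values ranging from 82.5 to 99: 25%. There are 16 data values between the first quartile, 56, and the largest value, 99: 80%. The two values equal to 56 lie on the first quartile and are counted in every interval that starts or ends there, which is why these percentages are not exactly 25% and 75%.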
  4. Two box plots showing a comparison of the min, Q1, median, Q3, and maximum of the day and night test scores from questions 1 and 2.
    Figure 5. Comparison of Box Plots for Day and Night Test Scores
  5. The first data set has the wider spread for the middle 50% of the data. The IQR for the first data set is greater than the IQR for the second set. This means that there is more variability in the middle 50% of the first data set.

Long Image Description: Two box plots over a number line from 0 to 100. The top plot shows a whisker from 32 to 56, a solid line at 56, a dashed line at 74.5, a solid line at 82.5, and a whisker from 82.5 to 99. The lower plot shows a whisker from 25.5 to 78, solid line at 78, dashed line at 81, solid line at 89, and a whisker from 89 to 98.
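
The counts for the day class and the comparison of the two interquartile ranges can be reproduced with a short sketch. It takes the five-number summaries from the solution as given, and it counts a value that equals a quartile in both neighbouring intervals, as the solution does. The helper name `share_between` is made up for illustration.

```python
day = [99, 56, 78, 55.5, 32, 90, 80, 81, 56, 59,
       45, 77, 84.5, 84, 70, 72, 68, 32, 79, 90]

# Five-number summary for the day class, as given in the solution.
minimum, q1, med, q3, maximum = 32, 56, 74.5, 82.5, 99

def share_between(values, low, high):
    """Count and percentage of values v with low <= v <= high (ends inclusive)."""
    count = sum(1 for v in values if low <= v <= high)
    return count, 100 * count / len(values)

for low, high in [(minimum, q1), (q1, med), (med, q3), (q3, maximum)]:
    print(low, high, share_between(day, low, high))

# Interquartile ranges from the two summaries in the solution.
day_iqr = q3 - q1        # 82.5 - 56 = 26.5
night_iqr = 89 - 78      # 11
print(day_iqr > night_iqr)  # True: the day class has the wider middle 50%
```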

Your Turn!

The following data set shows the heights in inches for the boys in a class of 40 students.

66; 66; 67; 67; 68; 68; 68; 68; 68; 69; 69; 69; 70; 71; 72; 72; 72; 73; 73; 74

The following data set shows the heights in inches for the girls in a class of 40 students.

61; 61; 62; 62; 63; 63; 63; 65; 65; 65; 66; 66; 66; 67; 68; 68; 68; 69; 69; 69

Construct a box plot for each data set and state which box plot has the wider spread for the middle 50% of the data.

Hint: You first must calculate the minimum value, maximum value, [latex]Q_1[/latex], [latex]Q_2[/latex], and [latex]Q_3[/latex] for each data set.

Solution

Two box plots showing a comparison of the min, Q1, median, Q3, and maximum of the heights of boys and girls.
Figure 6. Comparison of Box Plots for Heights of Boys and Girls in Class

 

Long Image Description: Two box plots over a number line from 60 to 76. The top plot showing heights of boys shows a whisker from 66 to 68, a solid line at 68, a dashed line at 69, a solid line at 72, and a whisker from 72 to 74. The lower plot showing heights of girls has a whisker from 61 to 63, solid line at 63, dashed line at 65.5, solid line at 68, and a whisker from 68 to 69.

IQR for the boys = 4

IQR for the girls = 5

The box plot for the heights of the girls has the wider spread for the middle 50% of the data.
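
The two interquartile ranges can be checked by position. Each data set has 20 sorted values, so under the median-of-halves convention assumed throughout this section, the first quartile is the average of the 5th and 6th values and the third quartile is the average of the 15th and 16th values. The function name `iqr_of_20` is made up for illustration.

```python
boys = [66, 66, 67, 67, 68, 68, 68, 68, 68, 69,
        69, 69, 70, 71, 72, 72, 72, 73, 73, 74]
girls = [61, 61, 62, 62, 63, 63, 63, 65, 65, 65,
         66, 66, 66, 67, 68, 68, 68, 69, 69, 69]

def iqr_of_20(sorted_values):
    """IQR for exactly 20 sorted values, located by position."""
    if len(sorted_values) != 20:
        raise ValueError("this sketch only handles 20 values")
    q1 = (sorted_values[4] + sorted_values[5]) / 2     # 5th and 6th values
    q3 = (sorted_values[14] + sorted_values[15]) / 2   # 15th and 16th values
    return q3 - q1

print(iqr_of_20(boys))   # 4.0
print(iqr_of_20(girls))  # 5.0
```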

Example

Graph a box-and-whisker plot for the data values shown.

10; 10; 10; 15; 35; 75; 90; 95; 100; 175; 420; 490; 515; 515; 790

The five numbers used to create a box-and-whisker plot are:

  • Min: 10
  • [latex]Q_1[/latex]: 15
  • Med: 95
  • [latex]Q_3[/latex]: 490
  • Max: 790
Solution

The following graph shows the box-and-whisker plot.

Horizontal boxplot with minimum at 10, Q1 at 15, median at 95, Q3 at 490, and maximum at 790
Figure 7. Box Plot for Data Values Provided

Your Turn!

Graph a box-and-whisker plot for the data values shown.

0; 5; 5; 15; 30; 30; 45; 50; 50; 60; 75; 110; 140; 240; 330

Hint: You first must calculate the minimum value, maximum value, [latex]Q_1[/latex], [latex]Q_2[/latex], and [latex]Q_3[/latex].

Solution

The data are in order from least to greatest. There are 15 values, so the eighth number in order is the median: 50. There are seven data values to the left of the median and seven values to the right. The five values that are used to create the box plot are:

  • Min: 0
  • [latex]Q_1[/latex]: 15
  • Med: 50
  • [latex]Q_3[/latex]: 110
  • Max: 330
Horizontal boxplot with minimum at 0, Q1 not labeled, median at 50, Q3 not labeled, and maximum around 325
Figure 8. Box Plot of Data Values Provided
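
The note earlier in this section mentions box plots that mark outliers with dots. This section does not say how such points are chosen. A widely used convention, assumed here and not taken from this section, flags any value more than 1.5 times the IQR below the first quartile or above the third quartile. The sketch applies that convention to the data set above, using the quartiles from the solution.

```python
data = [0, 5, 5, 15, 30, 30, 45, 50, 50, 60, 75, 110, 140, 240, 330]
q1, q3 = 15, 110                 # quartiles from the solution above

iqr = q3 - q1                    # 95
lower_fence = q1 - 1.5 * iqr     # -127.5 (assumed 1.5 * IQR convention)
upper_fence = q3 + 1.5 * iqr     # 252.5

outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]

print(outliers)                   # [330]
print(min(inside), max(inside))   # 0 240
```

Under this assumed convention, a plot that marks outliers would draw 330 as a dot and end the right whisker at 240, whereas Figure 8 extends the whisker all the way to the maximum.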

Section 2.4 Review

Box plots are a type of graph that can help visually organize data. To graph a box plot, the following values must be calculated: the minimum value, the first quartile, the median, the third quartile, and the maximum value. Once the box plot is graphed, you can display and compare distributions of data.

Section 2.4 Practice

In a survey of 20-year-olds in China, Germany, and the United States, people were asked the number of foreign countries they had visited in their lifetime. The following box plots display the results.

Three box plots showing a comparison of the min, Q1, median, Q3, and maximum of China, Germany, and US number of foreign countries visited
Figure 9. Comparison of Box Plots of 20-year-olds in China, Germany, and the United States Asked the Number of Countries Visited

 

Long Image Description: Three box plots over a number line from 0 to 11. The top plot shows China with a whisker from 0 to 5 and no box. The middle plot shows Germany with a whisker from 0 to 4, a solid line at 4, a dashed line at 8, and a whisker from 8 to 11. The lower plot shows United States with a solid line at 0, a dashed line at 2, a solid line at 5, and a whisker from 5 to 11.

  1. In complete sentences, describe what the shape of each box plot implies about the distribution of the data collected.
  2. Have more Americans or more Germans surveyed been to over eight foreign countries?
  3. Compare the three box plots. What do they imply about the foreign travel of 20-year-old residents of the three countries when compared to each other?

Given the following box plot, answer the questions.

Horizontal boxplot with minimum and Q1 at 0, median at 20, Q3 not labeled, and maximum at 150
Figure 10. Box Plot for Example
  1. Think of an example (in words) where the data might fit into the above box plot. In 2–5 sentences, write down the example.
  2. What does it mean to have the first and second quartiles so close together, while the second to third quartiles are far apart?
Solution
  1. Answers will vary. Possible answer: State University conducted a survey to see how involved its students are in community service. The box plot shows the number of community service hours logged by participants over the past year.
  2. Because the first and second quartiles are close, the data in this quarter is very similar. There is not much variation in the values. The data in the third quarter is much more variable or spread out. This is clear because the second quartile is so far away from the third quartile.

Given the following box plots, answer the questions.

Two box plots comparing Data 1 and Data 2.
Figure 11. Comparison of Box Plots of Data 1 and Data 2

 

Long Image Description: Two box plots over a number line from 0 to 7. The top plot shows Data 1 with a whisker from 0 to 2, a solid line at 2, a dashed line at 4, a solid line at around 5 (not labeled) and a whisker to 7. The lower plot shows Data 2 with a whisker from 0 to around 1.5 (not labeled), a dashed line at 2, a solid line around 2.5 (not labeled) and a whisker to 7.

  1. In complete sentences, explain why each statement is false.
    • Data 1 has more data values above two than Data 2 has above two.
    • The data sets cannot have the same mode.
    • For Data 1, there are more data values below four than there are above four.
  2. For which group, Data 1 or Data 2, is the value of “7” more likely to be an outlier? Explain why in complete sentences.

A survey was conducted of 130 purchasers of new BMW 3 series cars, 130 purchasers of new BMW 5 series cars, and 130 purchasers of new BMW 7 series cars. In it, people were asked the age they were when they purchased their car. The following box plots display the results.

 

Three box plots comparing BMW 3, 5, and 7 series.
Figure 12. Comparison of Box Plots of BMW 3, BMW 5, and BMW 7 Series

 

Long Image Description: This shows three box plots graphed over a number line from 25 to 80. The first whisker on the BMW 3 plot extends from 25 to 30. The box begins at the first quartile, 30, and ends at the third quartile, 41. A vertical, dashed line marks the median at 34. The second whisker extends from the third quartile to 66. The first whisker on the BMW 5 plot extends from 31 to 38. The box begins at the first quartile, 38, and ends at the third quartile, 55. A vertical, dashed line marks the median at 41. The second whisker extends from 55 to 64. The first whisker on the BMW 7 plot extends from 35 to 41. The box begins at the first quartile, 41, and ends at the third quartile, 59. A vertical, dashed line marks the median at 46. The second whisker extends from 59 to 68.

  1. In complete sentences, describe what the shape of each box plot implies about the distribution of the data collected for that car series.
  2. Which group is most likely to have an outlier? Explain how you determined that.
  3. Compare the three box plots. What do they imply about the age of purchasing a BMW from the series when compared to each other?
  4. Look at the BMW 5 series. Which quarter has the smallest spread of data? What is the spread?
  5. Look at the BMW 5 series. Which quarter has the largest spread of data? What is the spread?
  6. Look at the BMW 5 series. Estimate the interquartile range (IQR).
  7. Look at the BMW 5 series. Are there more data in the interval 31 to 38 or in the interval 45 to 55? How do you know this?
  8. Look at the BMW 5 series. Which interval has the fewest data in it: 31-35, 38-41, or 41-64? How do you know this?
Solution
  1. Each box plot is spread out more in the greater values. Each plot is skewed to the right, so the ages of the top 50% of buyers are more variable than the ages of the lower 50%.
  2. The BMW 3 series is most likely to have an outlier. It has the longest whisker.
  3. Comparing the median ages, younger people tend to buy the BMW 3 series, while older people tend to buy the BMW 7 series. However, this is not a rule, because there is so much variability in each data set.
  4. The second quarter has the smallest spread. There seems to be only a three-year difference between the first quartile and the median.
  5. The third quarter has the largest spread. There seems to be approximately a 14-year difference between the median and the third quartile.
  6. IQR ~ 17 years
  7. There is not enough information to tell. Each interval lies within a quarter, so we cannot tell exactly where the data in that quarter is concentrated.
  8. The interval from 31 to 35 years has the fewest data values. Twenty-five percent of the values fall in the interval 38 to 41, and 25% fall between 41 and 64. Since 25% of values fall between 31 and 38, we know that fewer than 25% fall between 31 and 35.

References

Data from West Magazine.

License

Introductory Statistics Copyright © 2024 by LOUIS: The Louisiana Library Network is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.