
Chapter 10: Linear Regression and Correlation

10.2 Scatter Plots

Learning Objectives

By the end of this section, the student should be able to:

  • Construct and interpret scatter plots.

Before we take up the discussion of linear regression and correlation, we need to examine a way to display the relation between two variables [latex]x[/latex] and [latex]y[/latex]. The most common and easiest way is a scatter plot. The following example illustrates a scatter plot.

Example

In Europe and Asia, m-commerce is popular. M-commerce users have special mobile phones that work like electronic wallets as well as provide phone and Internet services. Users can do everything from paying for parking to buying a TV set or soda from a machine to banking to checking sports scores on the Internet. For the years 2000 through 2004, was there a relationship between the year and the number of m-commerce users? Construct a scatter plot. Let [latex]x = \text{the year}[/latex], and let [latex]y = \text{the number of m-commerce users, in millions}[/latex].

The table below shows the data:

Table 1: Number of m-commerce users (in millions) (y-values) by Year (x-values)
[latex]x[/latex] (year) [latex]y[/latex] (# of users)
2000 0.5
2002 20.0
2003 33.0
2004 47.0
Here is the scatter plot showing the number of m-commerce users (in millions) by year. It has four points, plotted at (2000, 0.5), (2002, 20.0), (2003, 33.0), and (2004, 47.0).
Figure 1. Scatter plot for the number of m-commerce users in millions versus the year
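If graphing software is not available, a rough plot can still be drawn on a character grid. The Python snippet below is a minimal sketch that uses only the standard library. The helper name `ascii_scatter` and its grid size are illustrative choices, not something the text prescribes. It places each (x, y) pair from Table 1 so that x increases to the right and y increases upward.

```python
# Table 1 data: year (x) and m-commerce users in millions (y)
years = [2000, 2002, 2003, 2004]
users = [0.5, 20.0, 33.0, 47.0]


def ascii_scatter(xs, ys, width=21, height=10, mark="*"):
    """Return a rough text scatter plot: x increases to the right, y increases upward."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value onto the grid; "or 1" guards against a zero range.
        col = round((x - x_min) / ((x_max - x_min) or 1) * (width - 1))
        row = round((y - y_min) / ((y_max - y_min) or 1) * (height - 1))
        grid[height - 1 - row][col] = mark  # grid row 0 is the top line
    lines = ["|" + "".join(r) for r in grid]  # "|" draws the y-axis
    lines.append("+" + "-" * width)           # bottom line draws the x-axis
    return "\n".join(lines)


plot = ascii_scatter(years, users)
print(plot)
```

The lowest point (2000, 0.5) lands in the bottom-left corner and the highest point (2004, 47.0) lands in the top-right corner, matching Figure 1.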

Example

The following data are real. The percent of declared ethnic minority students at De Anza College for selected years from 1970 to 1995 was:

Table 2: Year and Student Minority Percentage
Year (x) Student Ethnic Minority Percentage (y)
1970 14.13
1973 12.27
1976 14.08
1979 18.16
1982 27.64
1983 28.72
1986 31.86
1989 33.14
1992 45.37
1995 53.1
The independent variable is "Year," while the dependent variable is "Student Ethnic Minority Percentage."
Here is the scatter plot for the data. The points show a strong, curved, upward trend.
Figure 2. Scatter Plot for year versus percent for "Student Ethnic Minority Percentage"
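The description of the trend as "curved" can be checked with simple arithmetic. The Python sketch below is one informal check, not a method from the text. It compares the average change per year over the first five data points (1970 to 1982) with the average change over the last five (1983 to 1995). A straight-line trend would give roughly equal rates, while a clearly larger later rate is consistent with the upward curve in Figure 2. The helper name `average_rate` is hypothetical.

```python
# Table 2 data: year (x) and percent of declared ethnic minority students (y)
years = [1970, 1973, 1976, 1979, 1982, 1983, 1986, 1989, 1992, 1995]
percent = [14.13, 12.27, 14.08, 18.16, 27.64, 28.72, 31.86, 33.14, 45.37, 53.1]


def average_rate(x0, y0, x1, y1):
    """Average change in y per unit of x between two points (the slope between them)."""
    return (y1 - y0) / (x1 - x0)


# Compare the first five points (1970-1982) with the last five (1983-1995).
early = average_rate(years[0], percent[0], years[4], percent[4])
late = average_rate(years[5], percent[5], years[9], percent[9])
print(f"1970-1982: {early:.2f} percentage points per year")  # 1.13
print(f"1983-1995: {late:.2f} percentage points per year")   # 2.03
```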

Your Turn!

Amelia plays basketball for her high school. She wants to improve enough to play at the college level. She notices that the number of points she scores in a game goes up as the number of hours she practices her jump shot each week goes up. She records the following data:

Table 3: Points Scored versus Hours Practicing
x (hours practicing jump shot) y (points scored in a game)
5 15
7 22
9 28
10 31
11 33
12 36

Construct a scatter plot and state whether what Amelia thinks appears to be true.

Solution
This is a scatter plot for the data provided.
Figure 3. Scatter Plot Example

Yes, Amelia’s assumption appears to be correct. The number of points Amelia scores per game goes up when she practices her jump shot more.
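The visual judgment in this solution can be double-checked in code. The Python sketch below uses a hypothetical helper, `rises_with_x`, that sorts the points by hours of practice and checks that each point scores more than the one before it. This check is stricter than reading a scatter plot: an upward trend can still exist when a few points dip. Treat it as a quick check that suits this small data set, not a general test for a relationship.

```python
# Table 3 data: hours practicing the jump shot (x) and points scored (y)
hours = [5, 7, 9, 10, 11, 12]
points = [15, 22, 28, 31, 33, 36]


def rises_with_x(xs, ys):
    """True if, with points sorted by x, each y is larger than the previous y."""
    pairs = sorted(zip(xs, ys))
    return all(y1 > y0 for (_, y0), (_, y1) in zip(pairs, pairs[1:]))


print(rises_with_x(hours, points))  # True
```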

A scatter plot shows the direction of a relationship between the variables. A clear direction happens when there is either:

  • High values of one variable occurring with high values of the other variable, or low values of one variable occurring with low values of the other variable (a positive relationship).
  • High values of one variable occurring with low values of the other variable (a negative relationship).
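One way to turn these two cases into a calculation is to compare every point with the mean of x and the mean of y. For each point, multiply the two deviations from the means. The product is positive when the point is high on both variables or low on both, and negative when it is high on one and low on the other. The sign of the total, which is also the sign of the sample covariance, indicates the direction. The Python sketch below applies this to Amelia's data from Table 3. The helper name `direction` is hypothetical.

```python
# Table 3 data: hours practicing (x) and points scored (y)
hours = [5, 7, 9, 10, 11, 12]
points = [15, 22, 28, 31, 33, 36]


def direction(xs, ys):
    """Classify direction by the sign of the sum of products of deviations from the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # A product is positive when x and y are both above (or both below) their
    # means ("high with high, low with low") and negative for "high with low".
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "no clear direction"


print(direction(hours, points))                # positive
print(direction(hours, [-p for p in points]))  # negative
```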

You can determine the strength of the relationship by looking at the scatter plot and seeing how close the points are to a line, a power function, an exponential function, or some other type of function. For a linear relationship, there is an exception. Consider a scatter plot where all the points fall on a horizontal line, providing a "perfect fit." The horizontal line would in fact show no relationship, because the value of y stays the same no matter what the value of x is.
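For linear patterns, closeness to a line is usually summarized by the Pearson correlation coefficient r, which lies between -1 and 1. Values near -1 or 1 mean the points lie close to a straight line. The Python sketch below computes r from its standard formula (the helper name `pearson_r` is hypothetical). It also shows the horizontal-line exception in numbers: when every y value is the same, y has no variation, the formula would divide by zero, and r is undefined.

```python
import math

# Table 3 data: hours practicing (x) and points scored (y)
hours = [5, 7, 9, 10, 11, 12]
points = [15, 22, 28, 31, 33, 36]


def pearson_r(xs, ys):
    """Pearson correlation coefficient r; undefined if x or y never changes."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        # All points on a horizontal (or vertical) line: r cannot be computed.
        raise ValueError("r is undefined when x or y is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


print(round(pearson_r(hours, points), 3))  # 0.998

try:
    pearson_r(hours, [20] * len(hours))  # every point on the horizontal line y = 20
except ValueError as err:
    print(err)
```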

When you look at a scatter plot, you want to notice the overall pattern and any deviations from the pattern. The following scatter plot examples illustrate these concepts.

Figure 4. (a) A scatter plot with 6 points plotted. The points form a pattern that moves upward to the right, almost in a straight line. (b) A scatter plot with the same 6 points as graph (a), plus a 7th point in the top left corner of the quadrant. This point falls outside the general pattern set by the other 6 points.

Figure 5. (a) A scatter plot with 6 points plotted. The points form a pattern that moves downward to the right, almost in a straight line. (b) A scatter plot of 8 points. These points form a general downward pattern, but they do not align in a tight pattern.

Figure 6. (a) A scatter plot of 7 points in an exponential pattern. The points begin along the x-axis and curve steeply upward toward the right side of the quadrant. (b) A scatter plot with many points scattered everywhere, showing no pattern.

In this chapter, we are interested in scatter plots that show a linear pattern. Linear patterns are quite common. The linear relationship is strong if the points are close to a straight line, except in the case of a horizontal line where there is no relationship. If we think that the points show a linear relationship, we would like to draw a line on the scatter plot. This line can be calculated through a process called linear regression. However, we only calculate a regression line if one of the variables helps to explain or predict the other variable. If [latex]x[/latex] is the independent variable and [latex]y[/latex] the dependent variable, then we can use a regression line to predict [latex]y[/latex] for a given value of [latex]x[/latex].
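The text does not yet give the details of linear regression. As a preview, the Python sketch below uses the standard least-squares formulas for the slope and intercept, applies them to Amelia's data from Table 3 (hours as the independent variable x, points as the dependent variable y), and then predicts y for one value of x. The helper name `least_squares_line` is hypothetical. The prediction at 8 hours stays inside the observed range of 5 to 12 hours, where a fitted line is most trustworthy.

```python
# Table 3 data: hours practicing (x, independent) and points scored (y, dependent)
hours = [5, 7, 9, 10, 11, 12]
points = [15, 22, 28, 31, 33, 36]


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx        # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares_line(hours, points)
print(f"y-hat = {a:.2f} + {b:.2f}x")                     # y-hat = 0.76 + 2.97x
print(f"predicted points for 8 hours: {a + b * 8:.1f}")  # 24.5
```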

Section 10.2 Review

Scatter plots are particularly helpful graphs when we want to see whether there is a linear relationship among data points. They indicate both the direction of the relationship between the x variable and the y variable and the strength of that relationship. We describe a linear relationship between an independent variable and a dependent variable with linear regression, and we measure its strength with the correlation coefficient.

Section 10.2 Practice

Does the scatter plot appear linear? Strong or weak? Positive or negative?

A scatter plot with several points. The points move downward to the right. The points are widely scattered.
Figure 7. Scatter Plot Example

Does the scatter plot appear linear? Strong or weak? Positive or negative?

A scatterplot with several points plotted. The points form a clear pattern, moving upward to the right. The overall pattern can be modeled with a line.
Figure 8. Scatter Plot Example

Does the scatter plot appear linear? Strong or weak? Positive or negative?

This is a scatter plot with several points plotted all over the first quadrant. There is no pattern.
Figure 9. Scatter Plot Example

The Gross Domestic Product Purchasing Power Parity (GDP PPP) is an indication of a country’s currency value compared with that of another country. The table below shows the GDP PPP of Cuba compared with the US dollar. Construct a scatter plot of the data.

Table 4: Cuba's PPP versus Year
Year Cuba’s PPP
1999 1,700
2000 1,700
2002 2,300
2003 2,900
2004 3,000
2005 3,500
2006 4,000
2007 11,000
2008 9,500
2009 9,700
2010 9,900

The following table shows the poverty rates and cell phone usage in the United States. Construct a scatter plot of the data.

Table 5: Year, Poverty Rate, and Cellular Usage per Capita
Year Poverty Rate Cellular Usage per Capita
2003 12.7 54.67
2005 12.6 74.19
2007 12 84.86
2009 12 90.82

Does the higher cost of tuition translate into higher-paying jobs?

The table lists the top ten colleges based on mid-career salary and the associated yearly tuition costs. Construct a scatter plot of the data.

Table 6: School, Midcareer Salary, and Yearly Tuition
School Midcareer Salary (in thousands) Yearly Tuition
Princeton 137 28,540
Harvey Mudd 135 40,133
CalTech 127 39,900
US Naval Academy 122 0
West Point 120 0
MIT 118 42,050
Lehigh University 118 43,220
NYU-Poly 117 39,565
Babson College 117 40,400
Stanford 114 54,506
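Before plotting, decide which variable belongs on each axis. Because the question asks whether tuition leads to higher-paying jobs, one natural reading is to use yearly tuition as x and mid-career salary as y. The Python sketch below (the name `table6` is illustrative) pairs the values from Table 6 and lists them in order of tuition, the same left-to-right order in which the points appear on the scatter plot.

```python
# Table 6 data: (yearly tuition, mid-career salary in thousands) for each school.
# Tuition is treated as x (independent) and salary as y (dependent).
table6 = {
    "Princeton": (28540, 137),
    "Harvey Mudd": (40133, 135),
    "CalTech": (39900, 127),
    "US Naval Academy": (0, 122),
    "West Point": (0, 120),
    "MIT": (42050, 118),
    "Lehigh University": (43220, 118),
    "NYU-Poly": (39565, 117),
    "Babson College": (40400, 117),
    "Stanford": (54506, 114),
}

# Sorting the (x, y) pairs by x lists the points from left to right on the plot.
# The two schools with a listed tuition of 0 are kept as given in the table.
points = sorted(table6.values())
for tuition, salary in points:
    print(f"x = {tuition:>6,}  y = {salary}")
```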

License


Introductory Statistics Copyright © 2024 by LOUIS: The Louisiana Library Network is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.