Chapter 10: Linear Regression and Correlation
Chapter 10 Practice
Bring It Together from 10.6
The average number of people in a family that received welfare for various years is given in [link].
| Year | Welfare family size |
|---|---|
| 1969 | 4.0 |
| 1973 | 3.6 |
| 1975 | 3.2 |
| 1979 | 3.0 |
| 1983 | 3.0 |
| 1988 | 3.0 |
| 1991 | 2.9 |
- Using “year” as the independent variable and “welfare family size” as the dependent variable, draw a scatter plot of the data.
- Calculate the least-squares line. Put the equation in the form of: ŷ = a + bx
- Find the correlation coefficient. Is it significant?
- Pick two years between 1969 and 1991 and find the estimated welfare family sizes.
- Based on the data in [link], is there a linear relationship between the year and the average number of people in a welfare family?
- Using the least-squares line, estimate the welfare family sizes for 1960 and 1995. Does the least-squares line give an accurate estimate for those years? Explain why or why not.
- Are there any outliers in the data?
- What is the estimated average welfare family size for 1986? Does the least squares line give an accurate estimate for that year? Explain why or why not.
- What is the slope of the least squares (best-fit) line? Interpret the slope.
The percent of female wage and salary workers who are paid hourly rates is given in [link] for the years 1979 to 1992.
| Year | Percent of workers paid hourly rates |
|---|---|
| 1979 | 61.2 |
| 1980 | 60.7 |
| 1981 | 61.3 |
| 1982 | 61.3 |
| 1983 | 61.8 |
| 1984 | 61.7 |
| 1985 | 61.8 |
| 1986 | 62.0 |
| 1987 | 62.7 |
| 1990 | 62.8 |
| 1992 | 62.9 |
- Using “year” as the independent variable and “percent” as the dependent variable, draw a scatter plot of the data.
- Does it appear from inspection that there is a relationship between the variables? Why or why not?
- Calculate the least-squares line. Put the equation in the form of: ŷ = a + bx
- Find the correlation coefficient. Is it significant?
- Find the estimated percentages for 1991 and 1988.
- Based on the data, is there a linear relationship between the year and the percent of female wage and salary earners who are paid hourly rates?
- Are there any outliers in the data?
- What is the estimated percent for the year 2050? Does the least-squares line give an accurate estimate for that year? Explain why or why not.
- What is the slope of the least-squares (best-fit) line? Interpret the slope.
Solution
- Check student’s solution.
- yes
- ŷ = −266.8863+0.1656x
- 0.9448; Yes
- 62.8233; 62.3265
- yes
- yes; (1987, 62.7)
- 72.5937; no
- slope = 0.1656.
As the year increases by one, the percent of workers paid hourly rates tends to increase by 0.1656.
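The fitted line and correlation coefficient reported in this solution can be reproduced with a short script. The following is a minimal Python sketch, assuming the usual least-squares formulas b = S_xy / S_xx and a = ȳ − b·x̄; the helper name `fit_line` is hypothetical and not part of the exercise.

```python
from math import sqrt

# Data from the table: year and percent of female workers paid hourly rates.
years = [1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1990, 1992]
percents = [61.2, 60.7, 61.3, 61.3, 61.8, 61.7, 61.8, 62.0, 62.7, 62.8, 62.9]


def fit_line(x, y):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx                 # slope
    a = y_bar - b * x_bar           # intercept
    r = s_xy / sqrt(s_xx * s_yy)    # correlation coefficient
    return a, b, r


a, b, r = fit_line(years, percents)
print(f"y-hat = {a:.4f} + {b:.4f}x, r = {r:.4f}")

# 1991 and 1988 lie inside the observed range; 2050 is an extrapolation.
for year in (1991, 1988, 2050):
    print(year, round(a + b * year, 4))
```

Because the x-values are years in the thousands, rounding the slope to four decimals before substituting shifts each estimate by roughly 0.1. Estimates computed from the unrounded coefficients therefore will not match the listed answers to every digit.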
Use the following information to answer the next two exercises. The cost of a leading liquid laundry detergent in different sizes is given in [link].
| Size (ounces) | Cost ($) | Cost per ounce |
|---|---|---|
| 16 | 3.99 | |
| 32 | 4.99 | |
| 64 | 5.99 | |
| 200 | 10.99 | |
- Using “size” as the independent variable and “cost” as the dependent variable, draw a scatter plot.
- Does it appear from inspection that there is a relationship between the variables? Why or why not?
- Calculate the least-squares line. Put the equation in the form of: ŷ = a + bx
- Find the correlation coefficient. Is it significant?
- If the laundry detergent were sold in a 40-ounce size, find the estimated cost.
- If the laundry detergent were sold in a 90-ounce size, find the estimated cost.
- Does it appear that a line is the best way to fit the data? Why or why not?
- Are there any outliers in the given data?
- Is the least-squares line valid for predicting what a 300-ounce size of the laundry detergent would cost? Why or why not?
- What is the slope of the least-squares (best-fit) line? Interpret the slope.
- Complete [link] for the cost per ounce of the different sizes.
- Using “size” as the independent variable and “cost per ounce” as the dependent variable, draw a scatter plot of the data.
- Does it appear from inspection that there is a relationship between the variables? Why or why not?
- Calculate the least-squares line. Put the equation in the form of: ŷ = a + bx
- Find the correlation coefficient. Is it significant?
- If the laundry detergent were sold in a 40-ounce size, find the estimated cost per ounce.
- If the laundry detergent were sold in a 90-ounce size, find the estimated cost per ounce.
- Does it appear that a line is the best way to fit the data? Why or why not?
- Are there any outliers in the data?
- Is the least-squares line valid for predicting what a 300-ounce size of the laundry detergent would cost per ounce? Why or why not?
- What is the slope of the least-squares (best-fit) line? Interpret the slope.
Solution
- Completed table:
| Size (ounces) | Cost ($) | cents/oz |
|---|---|---|
| 16 | 3.99 | 24.94 |
| 32 | 4.99 | 15.59 |
| 64 | 5.99 | 9.36 |
| 200 | 10.99 | 5.50 |
- Check student’s solution.
- There is a linear relationship for the sizes 16 through 64, but that linear trend does not continue to the 200-oz size.
- ŷ = 20.2368 – 0.0819x
- r = –0.8086
- 40-oz: 16.96 cents/oz
- 90-oz: 12.87 cents/oz
- The relationship is not linear; the least squares line is not appropriate.
- no outliers
- No, you would be extrapolating. The 300-oz size is outside the range of x.
- slope = –0.08194; for each additional ounce in size, the cost per ounce decreases by 0.082 cents.
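The warning about extrapolation can be made concrete by evaluating the fitted line at 300 ounces. The sketch below is an illustration in Python, not part of the original solution. It assumes the cost per ounce is kept unrounded, so the coefficients can differ slightly in the last digits from values computed with the rounded table entries. The function name `predict` is hypothetical.

```python
from math import sqrt

sizes = [16, 32, 64, 200]            # ounces
costs = [3.99, 4.99, 5.99, 10.99]    # dollars

# Cost per ounce in cents (left unrounded here).
cents_per_oz = [100 * c / s for c, s in zip(costs, sizes)]

n = len(sizes)
x_bar = sum(sizes) / n
y_bar = sum(cents_per_oz) / n
s_xx = sum((x - x_bar) ** 2 for x in sizes)
s_yy = sum((y - y_bar) ** 2 for y in cents_per_oz)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, cents_per_oz))

b = s_xy / s_xx
a = y_bar - b * x_bar
r = s_xy / sqrt(s_xx * s_yy)


def predict(size):
    """Estimated cost per ounce (cents) from the fitted line."""
    return a + b * size


for size in (40, 90, 300):
    in_range = min(sizes) <= size <= max(sizes)
    label = "interpolation" if in_range else "extrapolation"
    print(f"{size} oz: {predict(size):.2f} cents/oz ({label})")
```

At 300 ounces the line returns a negative cost per ounce, which is impossible. This shows why the line should not be used outside the observed range of sizes.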
According to a flier by a Prudential Insurance Company representative, the costs of approximate probate fees and taxes for selected net taxable estates are as follows:
| Net Taxable Estate ($) | Approximate Probate Fees and Taxes ($) |
|---|---|
| 600,000 | 30,000 |
| 750,000 | 92,500 |
| 1,000,000 | 203,000 |
| 1,500,000 | 438,000 |
| 2,000,000 | 688,000 |
| 2,500,000 | 1,037,000 |
| 3,000,000 | 1,350,000 |
- Decide which variable should be the independent variable and which should be the dependent variable.
- Draw a scatter plot of the data.
- Does it appear from inspection that there is a relationship between the variables? Why or why not?
- Calculate the least-squares line. Put the equation in the form of: ŷ = a + bx.
- Find the correlation coefficient. Is it significant?
- Find the estimated total cost for a net taxable estate of $1,000,000. Find the cost for $2,500,000.
- Does it appear that a line is the best way to fit the data? Why or why not?
- Are there any outliers in the data?
- Based on these results, what would be the probate fees and taxes for an estate that does not have any assets?
- What is the slope of the least-squares (best-fit) line? Interpret the slope.
The following are advertised sale prices of color televisions at Anderson’s.
| Size (inches) | Sale Price ($) |
|---|---|
| 9 | 147 |
| 20 | 197 |
| 27 | 297 |
| 31 | 447 |
| 35 | 1177 |
| 40 | 2177 |
| 60 | 2497 |
- Decide which variable should be the independent variable and which should be the dependent variable.
- Draw a scatter plot of the data.
- Does it appear from inspection that there is a relationship between the variables? Why or why not?
- Calculate the least-squares line. Put the equation in the form of: ŷ = a + bx
- Find the correlation coefficient. Is it significant?
- Find the estimated sale price for a 32-inch television. Find the estimated sale price for a 50-inch television.
- Does it appear that a line is the best way to fit the data? Why or why not?
- Are there any outliers in the data?
- What is the slope of the least-squares (best-fit) line? Interpret the slope.
Solution
- Size is x, the independent variable, price is y, the dependent variable.
- Check student’s solution.
- The relationship does not appear to be linear.
- ŷ = –745.252 + 54.75569x
- r = 0.8944; yes, it is significant.
- 32-inch: $1006.93, 50-inch: $1992.53
- No, the relationship does not appear to be linear. However, r is significant.
- yes, the 60-inch TV
- For each additional inch, the price increases by $54.76.
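One way to check the claim that r is significant is to compare it with a critical value for n − 2 = 5 degrees of freedom. The sketch below is a hedged illustration in Python. The critical values 0.754 (for r) and 2.571 (for t) are assumptions taken from standard two-tailed 5% tables, not from this exercise. The two comparisons are equivalent ways of running the same test.

```python
from math import sqrt

sizes = [9, 20, 27, 31, 35, 40, 60]                 # inches
prices = [147, 197, 297, 447, 1177, 2177, 2497]     # dollars

n = len(sizes)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n
s_xx = sum((x - x_bar) ** 2 for x in sizes)
s_yy = sum((y - y_bar) ** 2 for y in prices)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices))

r = s_xy / sqrt(s_xx * s_yy)

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom.
df = n - 2
t_stat = r * sqrt(df) / sqrt(1 - r ** 2)

# Assumed critical values for df = 5 at the 5% level (two-tailed),
# taken from standard tables rather than from this exercise.
R_CRITICAL = 0.754
T_CRITICAL = 2.571

significant = abs(r) > R_CRITICAL
print(f"r = {r:.4f}, t = {t_stat:.2f}, df = {df}")
print("significant at the 5% level:", significant)
```

A significant r says only that a linear trend is detectable. It does not show that a line is the best model, which is why the solution still describes the relationship as nonlinear.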
[link] shows the average heights for American boys in 1990.
| Age (years) | Height (cm) |
|---|---|
| birth | 50.8 |
| 2 | 83.8 |
| 3 | 91.4 |
| 5 | 106.6 |
| 7 | 119.3 |
| 10 | 137.1 |
| 14 | 157.5 |
- Decide which variable should be the independent variable and which should be the dependent variable.
- Draw a scatter plot of the data.
- Does it appear from inspection that there is a relationship between the variables? Why or why not?
- Calculate the least-squares line. Put the equation in the form of: ŷ = a + bx
- Find the correlation coefficient. Is it significant?
- Find the estimated average height for a one-year-old. Find the estimated average height for an eleven-year-old.
- Does it appear that a line is the best way to fit the data? Why or why not?
- Are there any outliers in the data?
- Use the least squares line to estimate the average height for a sixty-two-year-old man. Do you think that your answer is reasonable? Why or why not?
- What is the slope of the least-squares (best-fit) line? Interpret the slope.
| State | # letters in name | Year entered the Union | Ranks for entering the Union | Area (square miles) |
|---|---|---|---|---|
| Alabama | 7 | 1819 | 22 | 52,423 |
| Colorado | 8 | 1876 | 38 | 104,100 |
| Hawaii | 6 | 1959 | 50 | 10,932 |
| Iowa | 4 | 1846 | 29 | 56,276 |
| Maryland | 8 | 1788 | 7 | 12,407 |
| Missouri | 8 | 1821 | 24 | 69,709 |
| New Jersey | 9 | 1787 | 3 | 8,722 |
| Ohio | 4 | 1803 | 17 | 44,828 |
| South Carolina | 13 | 1788 | 8 | 32,008 |
| Utah | 4 | 1896 | 45 | 84,904 |
| Wisconsin | 9 | 1848 | 30 | 65,499 |
We are interested in whether there is a relationship between the ranking of a state and the area of the state.
- What are the independent and dependent variables?
- What do you think the scatter plot will look like? Make a scatter plot of the data.
- Does it appear from inspection that there is a relationship between the variables? Why or why not?
- Calculate the least-squares line. Put the equation in the form of: ŷ = a + bx
- Find the correlation coefficient. What does it imply about the significance of the relationship?
- Find the estimated areas for Alabama and for Colorado. Are they close to the actual areas?
- Use the two points in part f to plot the least-squares line on your graph from part b.
- Does it appear that a line is the best way to fit the data? Why or why not?
- Are there any outliers?
- Use the least squares line to estimate the area of a new state that enters the Union. Can the least-squares line be used to predict it? Why or why not?
- Delete “Hawaii” and substitute “Alaska” for it. Alaska is the forty-ninth state, with an area of 656,424 square miles.
- Calculate the new least-squares line.
- Find the estimated area for Alabama. Is it closer to the actual area with this new least-squares line or with the previous one that included Hawaii? Why do you think that’s the case?
- Do you think that, in general, newer states are larger than the original states?
Solution
- Let rank be the independent variable and area be the dependent variable.
- Check student’s solution.
- There appears to be a linear relationship, with one outlier.
- ŷ (area) = 24177.06 + 1010.478x
- r = 0.50047; r is not significant, so the data do not show a significant linear relationship between the variables.
- Alabama: 46,407.576; Colorado: 62,575.224
- Alabama estimate is closer than Colorado estimate.
- If the outlier is removed, there is a linear relationship.
- There is one outlier (Hawaii).
- rank 51: 75711.4; no
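- Revised table:
| State | # letters in name | Year entered the Union | Ranks for entering the Union | Area (square miles) |
|---|---|---|---|---|
| Alabama | 7 | 1819 | 22 | 52,423 |
| Colorado | 8 | 1876 | 38 | 104,100 |
| Alaska | 6 | 1959 | 51 | 656,424 |
| Iowa | 4 | 1846 | 29 | 56,276 |
| Maryland | 8 | 1788 | 7 | 12,407 |
| Missouri | 8 | 1821 | 24 | 69,709 |
| New Jersey | 9 | 1787 | 3 | 8,722 |
| Ohio | 4 | 1803 | 17 | 44,828 |
| South Carolina | 13 | 1788 | 8 | 32,008 |
| Utah | 4 | 1896 | 45 | 84,904 |
| Wisconsin | 9 | 1848 | 30 | 65,499 |
- ŷ = –87065.3 + 7828.532x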
- Alabama: 85,162.404; the prior estimate was closer. Alaska is an outlier.
- yes, with the exception of Hawaii
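The effect of a single extreme point on the fitted line can be checked numerically. The Python sketch below is an illustration under stated assumptions. It flags outliers with the rule of thumb that a residual larger than two residual standard deviations (s = √(SSE / (n − 2))) is a potential outlier. It then refits the line with Alaska in place of Hawaii, using the rank of 51 that appears in the solution table. The helper name `fit_line` is hypothetical.

```python
from math import sqrt

states = ["Alabama", "Colorado", "Hawaii", "Iowa", "Maryland", "Missouri",
          "New Jersey", "Ohio", "South Carolina", "Utah", "Wisconsin"]
ranks = [22, 38, 50, 29, 7, 24, 3, 17, 8, 45, 30]
areas = [52423, 104100, 10932, 56276, 12407, 69709,
         8722, 44828, 32008, 84904, 65499]


def fit_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# Fit with Hawaii included, then flag residuals beyond 2s (assumed rule).
a1, b1 = fit_line(ranks, areas)
residuals = [y - (a1 + b1 * x) for x, y in zip(ranks, areas)]
s = sqrt(sum(e ** 2 for e in residuals) / (len(ranks) - 2))
outliers = [name for name, e in zip(states, residuals) if abs(e) > 2 * s]
print("potential outliers:", outliers)

# Swap Hawaii for Alaska (rank 51 as in the solution table) and refit.
i = states.index("Hawaii")
ranks2, areas2 = ranks.copy(), areas.copy()
ranks2[i], areas2[i] = 51, 656424
a2, b2 = fit_line(ranks2, areas2)

# Compare the two estimates for Alabama (rank 22, actual area 52,423).
est1 = a1 + b1 * 22
est2 = a2 + b2 * 22
print(f"with Hawaii: {est1:.1f}, with Alaska: {est2:.1f}, actual: 52423")
```

Replacing one small state with one very large state changes the slope by nearly a factor of eight, which illustrates how strongly a single point can influence a least-squares line fitted to only eleven observations.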