Chapter 11: The Chi-Square Distribution

11.7 Using Technology for The Chi-Square Distribution

Learning Objectives

By the end of this section, the student should be able to:

  • use technology to solve problems involving The Chi-Square Distribution

This section introduces technology tools that support the statistics concepts taught in this chapter. It covers tools for solving problems involving the chi-square distribution.

Graphing Calculator

As in the other technology sections, the main tool introduced is the TI-83, 83+, 84, or 84+ graphing calculator. The graphing calculator can carry out an entire chi-square hypothesis test, including finding the test statistic and p-value. It can also draw a graph of the shaded p-value region.

Be aware! The graphing calculator cannot determine the hypotheses for the test or which test you need to run, and it will not help you draw the correct conclusion. It only works with what you input into it.

Let's look at the options we have in the graphing calculator for hypothesis testing with the chi-square distribution:

Using the TI-83, 83+, 84, 84+ Calculator for The Chi-Square Distribution

  1. Access [DISTR] by pressing 2nd key, vars key. (Note: DISTR stands for "Distributions")
  2. The functions for the Chi-Square Distribution are:
    • <Χ2pdf()> yields the probability density function value. It is only useful for plotting the chi-square curve, in which case "x" is the variable.
    • <Χ2cdf()> corresponds to P(left bound < Χ2 < right bound). It is used for the Chi-Square Goodness-of-Fit Test and the Test of a Single Variance.

The syntax for them is as follows:

  • Χ2pdf(x, df)
  • Χ2cdf(left bound, right bound, df)

Remember, for continuous distributions:

  • [latex]-\infty[/latex] uses the value -1E99 for the left bound
  • [latex]\infty[/latex] uses the value 1E99 for the right bound
  • The E is entered with [EE], which is found by pressing 2nd key, then the comma button.
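For readers who also work on a computer, the Χ2cdf(left bound, right bound, df) syntax can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the function names `chi2_sf` and `chi2_cdf` are made up for this illustration, df is assumed to be a positive whole number, and only the standard `math` module is used. If SciPy is installed, `scipy.stats.chi2.cdf` and `scipy.stats.chi2.sf` should give the same areas.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(chi-square > x) for a positive whole-number df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: start from the df = 2 tail area, exp(-h)
        a, q = 1.0, math.exp(-h)
    else:
        # Odd df: start from the df = 1 tail area, erfc(sqrt(h))
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Each step adds two degrees of freedom:
    # Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

def chi2_cdf(lower, upper, df):
    """Area between lower and upper, like the calculator's chi-square cdf."""
    return chi2_sf(lower, df) - chi2_sf(upper, df)

# Same idea as entering (3, 1E99, 4) on the calculator
print(round(chi2_cdf(3, 1e99, 4), 4))   # 0.5578
```

As on the calculator, 1e99 and -1e99 stand in for positive and negative infinity.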

Hypothesis Testing with Chi-Square in the TI-83, 83+, 84, or 84+ Graphing Calculator

  1. Access statistics mode by pressing stat key.

  2. Navigate to <TESTS> by using the arrow right key.

On this screen you will find all of the available hypothesis tests. The two that use the chi-square distribution are:

  • <Χ2-Test> is the hypothesis test for independence and homogeneity.
  • <Χ2GOF-Test> is the hypothesis test for goodness-of-fit (TI-84+ only).

Highlight your choice and press enter key.

Notes:

  • The <Χ2-Test> uses matrices. To open the matrix menu, press 2nd key, then the [latex]x^{-1}[/latex] button.
  • The <Χ2GOF-Test> is the hypothesis test for goodness-of-fit, but it is only on the TI-84+ calculator. For this test screen, you will need to put data into two lists in your calculator. Put the observed values (the data) into a first list and the expected values (the values you expect if the null hypothesis is true) into a second list. Press STAT TESTS and Chi2 GOF.
  • Selecting <Calculate> will show you the value of the df, test statistic, and p-value.
  • Selecting <Draw> will show you the shaded region under the chi-square curve with the test statistic and the p-value. When you use <Draw>, make sure that no other equations are highlighted in Y= key and that the plots are turned off (2nd key, [STAT PLOT]).

Legend

  • A blank calculator button represents a button press
  • [ ] represents yellow command or green letter behind a key
  • < > represents items on the screen

 

Let's now look at specific examples of each.

 

The Chi-Square Goodness-of-Fit Test

As with one- and two-sample hypothesis testing for means and proportions, the by-hand method is still shown for the problems below, as it was in 11.2 Goodness-of-Fit Test. This is to emphasize that doing a hypothesis test involves much more than plugging numbers into the calculator; you need to know the process involved. You will see a box that separates the calculator information.

Example

Employers want to know which days of the week employees are absent in a five-day work week. Most employers would like to believe that employees are absent equally during the week. Suppose a random sample of 60 managers were asked on which day of the week they had the highest number of employee absences. The results were distributed as shown in the table below. For the population of employees, do the days for the highest number of absences occur with equal frequencies during a five-day work week? Test at a 5% significance level.

Table 1: Day of the Week Employees Were Most Absent
Monday Tuesday Wednesday Thursday Friday
Number of Absences 15 12 9 9 15

Solution

The null and alternative hypotheses are:

  • [latex]H_0:[/latex] The absent days occur with equal frequencies, that is, they fit a uniform distribution.
  • [latex]H_a:[/latex] The absent days occur with unequal frequencies, that is, they do not fit a uniform distribution.

If the absent days occur with equal frequencies, then, out of 60 absent days (the total in the sample: [latex]15 + 12 + 9 + 9 + 15 = 60[/latex]), there would be 12 absences on Monday, 12 on Tuesday, 12 on Wednesday, 12 on Thursday, and 12 on Friday. These numbers are the expected ([latex]E[/latex]) values. The values in the table are the observed ([latex]O[/latex]) values or data.

This time, calculate the [latex]\chi^2[/latex] test statistic by hand. Make a chart with the following headings and fill in the columns:

  • Expected ([latex]E[/latex]) values (12, 12, 12, 12, 12)
  • Observed ([latex]O[/latex]) values (15, 12, 9, 9, 15)
  • ([latex]O - E[/latex])
  • [latex](O - E)^2[/latex]
  • [latex]\frac{(O - E)^2}{E}[/latex]

Now add (sum) the last column. The sum is three. This is the [latex]\chi^2[/latex] test statistic.

The degrees of freedom are [latex]df = \text{the number of cells} - 1 = 5 - 1 = 4[/latex].

To find the p-value, calculate [latex]P(\chi^2 > 3)[/latex] with 4 degrees of freedom. This test is right-tailed. (Use a computer or calculator to find the p-value. You should get [latex]\text{p-value} = 0.5578[/latex].)

Using the TI-83, 83+, 84, 84+ Calculator

Press 2nd DISTR. Arrow down to [latex]\chi^2 \text{cdf}[/latex]. Press ENTER. Enter (3,10^99,4). Rounded to four decimal places, you should see 0.5578, which is the p-value.
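The by-hand chart for this example can also be checked with a short Python sketch. It is an illustration only, not a calculator feature: the variable names are made up here, and the p-value line uses a closed form for the right-tail area that holds for 4 degrees of freedom specifically.

```python
import math

observed = [15, 12, 9, 9, 15]                     # Monday through Friday
n = sum(observed)                                 # 60 in total
expected = [n / len(observed)] * len(observed)    # 12 per day under H0

# Last column of the by-hand chart: (O - E)^2 / E for each day
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)                       # test statistic
df = len(observed) - 1                            # number of cells minus 1

# Right-tail area, valid for df = 4 only:
# P(chi-square > x) = exp(-x/2) * (1 + x/2)
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(round(chi_sq, 2), df, round(p_value, 4))    # 3.0 4 0.5578
```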

Next, complete a graph like the following one with the proper labeling and shading. (You should shade the right tail.)

This is a blank nonsymmetrical chi-square curve for the test statistic of the days of the week absent.
Figure 1. Chi-Square Curve Example

 

The decision is not to reject the null hypothesis.

Conclusion: At a 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the absent days do not occur with equal frequencies.

Using the TI-83, 83+, 84, 84+ Calculator

The TI-83+ and some TI-84 calculators do not have a built-in goodness-of-fit test; for those models, compute the test statistic with lists, as shown in the calculator instructions for the next example. The newer TI-84 calculators have the test Chi2 GOF in STAT TESTS. To run the test, put the observed values (the data) into a first list and the expected values (the values you expect if the null hypothesis is true) into a second list. Press STAT TESTS and Chi2 GOF. Enter the list names for the Observed list and the Expected list. Enter the degrees of freedom and select Calculate or Draw.

Make sure you clear any lists before you start. To clear a list, go into STAT EDIT and arrow up to the name area of that list. Press CLEAR and then arrow down. The list will be cleared. Alternatively, press STAT and press 4 (for ClrList), enter the list name, and press ENTER.

Example

One study indicates that the number of televisions that American families have is distributed (this is the given distribution for the American population) as in the following table.

Table 2: Number of Televisions and Percent
Number of Televisions Percent
0 10
1 16
2 55
3 11
4+ 8

The table contains expected ([latex]E[/latex]) percentages.

A random sample of 600 families in the far western United States resulted in the data in the following table.

Table 3: Number of Televisions and Frequency
Number of Televisions Frequency
0 66
1 119
2 340
3 60
4+ 15
Total = 600

The table contains observed ([latex]O[/latex]) frequency values.

At the 1% significance level, does it appear that the distribution "number of televisions" of far western United States families is different from the distribution for the American population as a whole?

Solution

This problem asks you to test whether the far western United States families distribution fits the distribution of the American families. This test is always right-tailed.

The first table contains expected percentages. To get expected ([latex]E[/latex]) frequencies, multiply the percentage by 600. The expected frequencies are shown in the following table.

Table 4: Number of Televisions, Percent, and Expected Frequency
Number of Televisions Percent Expected Frequency
0 10 (0.10)(600) = 60
1 16 (0.16)(600) = 96
2 55 (0.55)(600) = 330
3 11 (0.11)(600) = 66
4+ 8 (0.08)(600) = 48

Therefore, the expected frequencies are 60, 96, 330, 66, and 48. In the TI calculators, you can let the calculator do the math. For example, instead of 60, enter 0.10*600.

[latex]H_0[/latex]: The "number of televisions" distribution of far western United States families is the same as the "number of televisions" distribution of the American population.

[latex]H_a[/latex]: The "number of televisions" distribution of far western United States families is different from the "number of televisions" distribution of the American population.

Distribution for the test: [latex]{\chi }_{4}^{2}[/latex] where [latex]df = \text{the number of cells} – 1 = 5 – 1 = 4[/latex].

Note: [latex]df \neq 600 – 1[/latex]
Calculate the test statistic: [latex]\chi^2 = 29.65[/latex]
Graph:
Nonsymmetrical chi-square curve with a mean of 4 and test statistic of 29.65. The area to the right of this is equal to the p-value.
Figure 2. A nonsymmetric chi-square curve with a test statistic of 29.65 and a shaded area is equal to the p-value of 0.000006

 

Probability statement: [latex]\text{p-value} = P(\chi^2 > 29.65) = 0.000006[/latex]

Compare α and the p-value: [latex]\alpha = 0.01[/latex]; [latex]\text{p-value} = 0.000006[/latex]; So, [latex]\alpha > \text{p-value}[/latex].

Make a decision: Since [latex]\alpha > \text{p-value}[/latex], reject [latex]H_0[/latex].

This means you reject the belief that the distribution for the far western states is the same as that of the American population as a whole.

Conclusion: At the 1% significance level, from the data, there is sufficient evidence to conclude that the "number of televisions" distribution for the far western United States is different from the "number of televisions" distribution for the American population as a whole.

Using the TI-83, 83+, 84, 84+ Calculator

Press STAT and ENTER. Make sure to clear lists L1, L2, and L3 if they have data in them. Into L1, put the observed frequencies 66, 119, 340, 60, 15. Into L2, put the expected frequencies .10*600, .16*600, .55*600, .11*600, .08*600. Arrow over to list L3 and up to the name area "L3". Enter (L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5. You should see "sum(". Enter L3 and press ENTER. Rounded to two decimal places, you should see 29.65. Press 2nd DISTR. Press 7 or arrow down to 7:χ2cdf and press ENTER. Enter (29.65,1E99,4). You should see 5.77E-6 = .000006 (rounded to six decimal places), which is the p-value.
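The list arithmetic in these calculator steps (L1 for observed, L2 for expected, L3 for the contributions) can be sketched in Python as follows. The variable names are illustrative, the observed counts are taken from Table 3, and the p-value line again relies on the closed form that is valid only for 4 degrees of freedom.

```python
import math

percents = [10, 16, 55, 11, 8]        # 0, 1, 2, 3, 4+ televisions
observed = [66, 119, 340, 60, 15]     # plays the role of L1
n = sum(observed)                     # 600 families

# Plays the role of L2: expected frequency = percent * sample size
expected = [p / 100 * n for p in percents]

# Plays the role of sum(L3), where L3 = (L1 - L2)^2 / L2
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                # cells minus 1, not 600 - 1

# Right-tail area, valid for df = 4 only
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(round(chi_sq, 2), df, p_value < 0.01)
```

The statistic rounds to 29.65, and the p-value is far below the 1% significance level.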


Example

Suppose you flip two coins 100 times. The results are 20 HH, 27 HT, 30 TH, and 23 TT. Are the coins fair? Test at a 5% significance level.

Solution

This problem can be set up as a goodness-of-fit problem. The sample space for flipping two fair coins is {HH, HT, TH, TT}. Out of 100 flips, you would expect 25 HH, 25 HT, 25 TH, and 25 TT. This is the expected distribution. The question, "Are the coins fair?" is the same as saying, "Does the distribution of the coins (20 HH, 27 HT, 30 TH, 23 TT) fit the expected distribution?"

Random Variable: Let [latex]X =[/latex] the number of heads in one flip of the two coins. X takes on the values 0, 1, 2. (There are 0, 1, or 2 heads in the flip of two coins.) Therefore, the number of cells is three. Since [latex]X =[/latex] the number of heads, the observed frequencies are 20 (for two heads), 57 (for one head), and 23 (for zero heads or both tails). The expected frequencies are 25 (for two heads), 50 (for one head), and 25 (for zero heads or both tails). This test is right-tailed.

[latex]H_0:[/latex] The coins are fair.

[latex]H_a:[/latex] The coins are not fair.

Distribution for the test: [latex]{\chi }_{2}^{2}[/latex] where [latex]df = 3 – 1 = 2[/latex].

Calculate the test statistic: [latex]\chi^2 = 2.14[/latex]

Graph:

Nonsymmetrical chi-square curve with a test statistic of 2.14. The area to the right of this is equal to the p-value.
Figure 3. A nonsymmetric chi-square curve with a test statistic of 2.14 and a shaded area is equal to the p-value of 0.3430.

 

Probability statement: p-value = [latex]P(\chi^2 > 2.14) = 0.3430[/latex]

Compare [latex]\alpha[/latex] and the p-value: Since [latex]\alpha = 0.05[/latex] and [latex]\text{p-value} = 0.3430[/latex], [latex]\alpha \lt \text{p-value}[/latex].

Make a decision: Since [latex]\alpha \lt \text{p-value}[/latex], do not reject [latex]H_0[/latex].

Conclusion: There is insufficient evidence to conclude that the coins are not fair.

Using the TI-83, 83+, 84, 84+ Calculator

Press STAT and ENTER. Make sure you clear lists L1, L2, and L3 if they have data in them. Into L1, put the observed frequencies 20, 57, 23. Into L2, put the expected frequencies 25, 50, 25. Arrow over to list L3 and up to the name area "L3". Enter (L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5. You should see "sum". Enter L3. Rounded to two decimal places, you should see 2.14. Press 2nd DISTR. Arrow down to 7:χ2cdf (or press 7). Press ENTER. Enter (2.14,1E99,2). Rounded to four places, you should see .3430, which is the p-value.
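The step of collapsing the four outcomes into three cells (two heads, one head, zero heads) is easy to get wrong, so here is a small Python sketch of it. The names are illustrative, and the p-value shortcut holds only for 2 degrees of freedom, where the right-tail area is simply exp(-x/2).

```python
import math

counts = {"HH": 20, "HT": 27, "TH": 30, "TT": 23}
flips = sum(counts.values())                       # 100 flips of two coins

# Cells by number of heads: 2 heads, 1 head, 0 heads
observed = [counts["HH"], counts["HT"] + counts["TH"], counts["TT"]]
fair_probs = [0.25, 0.50, 0.25]                    # fair-coin model under H0
expected = [p * flips for p in fair_probs]         # 25, 50, 25

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Right-tail area, valid for df = 2 only
p_value = math.exp(-chi_sq / 2)

print(round(chi_sq, 2), df, round(p_value, 4))     # 2.14 2 0.343
```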


Your Turn!

Students in a social studies class hypothesize that the literacy rates across the world for every region are 82%. The following table shows the actual literacy rates across the world broken down by region. What are the test statistic and the degrees of freedom?

Table 5: MDG Region and Adult Literacy Rate (%)
MDG Region Adult Literacy Rate (%)
Developed Regions 99.0
Commonwealth of Independent States 99.5
Northern Africa 67.3
Sub-Saharan Africa 62.5
Latin America and the Caribbean 91.0
Eastern Asia 93.8
Southern Asia 61.9
South-Eastern Asia 91.9
Western Asia 84.5
Oceania 66.4

Solution

[latex]df = 9[/latex]

[latex]\chi^2 \text{ test statistic} = 26.38[/latex]

Nonsymmetrical chi-square curve (df = 9) with a mean of 9 and test statistic of 26.38. The area to the right of this is equal to the p-value.
Figure 3. A nonsymmetric chi-square curve (df = 9) with a test statistic of 26.38 and a shaded area is equal to the p-value of 0.0018

Using the TI-83, 83+, 84, 84+ Calculator

Press STAT and ENTER. Make sure you clear lists L1, L2, and L3 if they have data in them. Into L1, put the observed values 99, 99.5, 67.3, 62.5, 91, 93.8, 61.9, 91.9, 84.5, 66.4. Into L2, put the expected values 82, 82, 82, 82, 82, 82, 82, 82, 82, 82. Arrow over to list L3 and up to the name area "L3". Enter (L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5. You should see "sum". Enter L3. Rounded to two decimal places, you should see 26.38. Press 2nd DISTR. Arrow down to 7:χ2cdf (or press 7). Press ENTER. Enter (26.38,1E99,9). Rounded to four places, you should see .0018, which is the p-value.


 

Test of Independence

In keeping with our style, the by-hand method is still shown for the problems below, as it was in 11.3 Test of Independence. This is to emphasize that doing a hypothesis test involves much more than plugging numbers into the calculator; you need to know the process involved. You will see a box that separates the calculator information.

Example

In a volunteer group, adults 21 and older volunteer from one to nine hours each week to spend time with a disabled senior citizen. The program recruits among community college students, four-year college students, and nonstudents. The table below shows a sample of the adult volunteers and the number of hours they volunteer per week.

Table 6: Number of Hours Worked per Week by Volunteer Type (Observed)
Type of Volunteer 1–3 Hours 4–6 Hours 7–9 Hours Row Total
Community College Students 111 96 48 255
Four-Year College Students 96 133 61 290
Nonstudents 91 150 53 294
Column Total 298 379 162 839

Note: The table contains observed (O) values (data).

Is the number of hours volunteered independent of the type of volunteer?

Solution

The observed table and the question at the end of the problem, "Is the number of hours volunteered independent of the type of volunteer?" tell you this is a test of independence. The two factors are number of hours volunteered and type of volunteer. This test is always right-tailed.

[latex]H_0:[/latex] The number of hours volunteered is independent of the type of volunteer.

[latex]H_a:[/latex] The number of hours volunteered is dependent on the type of volunteer.

The expected results are in the following table.

Table 7: Number of Hours Worked per Week by Volunteer Type (Expected)
Type of Volunteer 1-3 Hours 4-6 Hours 7-9 Hours
Community College Students 90.57 115.19 49.24
Four-Year College Students 103.00 131.00 56.00
Nonstudents 104.42 132.81 56.77

Note: The table contains expected (E) values.

For example, the calculation for the expected frequency for the top left cell is [latex]E=\frac{\left(\text{row total}\right)\left(\text{column total}\right)}{\text{total number surveyed}}=\frac{\left(255\right)\left(298\right)}{839}=90.57[/latex]
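The same formula fills in every cell of the expected table. The Python sketch below applies it to the observed counts from Table 6; the variable names are chosen for this illustration only.

```python
observed = [
    [111, 96, 48],    # community college students
    [96, 133, 61],    # four-year college students
    [91, 150, 53],    # nonstudents
]

row_totals = [sum(row) for row in observed]          # 255, 290, 294
col_totals = [sum(col) for col in zip(*observed)]    # 298, 379, 162
grand_total = sum(row_totals)                        # 839

# E = (row total)(column total) / (total number surveyed), cell by cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```

Rounded to two decimal places, the printed rows agree with Table 7.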

Calculate the test statistic: [latex]\chi^2 = 12.99[/latex] (You can find this using your calculator or computer.)

Distribution for the test: [latex]{\chi }_{4}^{2}[/latex]

[latex]df = (3 \text{columns} – 1)(3 \text{rows} – 1) = (2)(2) = 4[/latex]

Graph:

Nonsymmetrical chi-square curve with a test statistic of 12.99. The area to the right of this is equal to the p-value.
Figure 4. A nonsymmetric chi-square curve with a test statistic of 12.99 and a shaded area is equal to the p-value of 0.0113.

 

Probability statement: [latex]\text{p-value} = P(\chi^2 > 12.99) = 0.0113[/latex]

Compare [latex]\alpha[/latex] and the p-value: Since no [latex]\alpha[/latex] is given, assume [latex]\alpha = 0.05[/latex]. [latex]\text{p-value} = 0.0113[/latex], so [latex]\alpha > \text{p-value}[/latex].

Make a decision: Since [latex]\alpha > \text{p-value}[/latex], reject [latex]H_0[/latex]. This means that the factors are not independent.

Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude that the number of hours volunteered and the type of volunteer are dependent on one another.

For the example in the expected results table, if there had been another type of volunteer, teenagers, what would the degrees of freedom be?

Using the TI-83, 83+, 84, 84+ Calculator

Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press 3 ENTER 3 ENTER. Enter the observed table values by row from Table 6 (leave out the row and column totals). Press ENTER after each. Press 2nd QUIT. Press STAT and arrow over to TESTS. Arrow down to C:χ2-TEST. Press ENTER. You should see Observed:[A] and Expected:[B]; the calculator computes the expected values itself and stores them in [B]. Arrow down to Calculate. Press ENTER. The test statistic is 12.9909 and the p-value = 0.0113. Do the procedure a second time, but arrow down to Draw instead of Calculate.
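What the χ2-TEST screen does with matrix [A] can be approximated in Python. This is a sketch, not the calculator's actual program: the function name `chi2_independence` is hypothetical, and the p-value line uses the closed form that applies only when df = 4.

```python
import math

def chi2_independence(observed):
    """Return (test statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for r, row in zip(row_totals, observed):
        for c, o in zip(col_totals, row):
            e = r * c / total            # expected count for this cell
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

volunteers = [[111, 96, 48], [96, 133, 61], [91, 150, 53]]
stat, df = chi2_independence(volunteers)

# Right-tail area, valid for df = 4 only
p_value = math.exp(-stat / 2) * (1 + stat / 2)

print(round(stat, 2), df, round(p_value, 4))   # 12.99 4 0.0113
```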

Your Turn!

The Bureau of Labor Statistics gathers data about employment in the United States. A sample is taken to calculate the number of U.S. citizens working in one of several industry sectors over time. The table below shows the results:

Table 8: Industry Sector and Employment by Year
Industry Sector 2000 2010 2020 Total
Nonagriculture wage and salary 13,243 13,044 15,018 41,305
Goods-producing, excluding agriculture 2,457 1,771 1,950 6,178
Services-providing 10,786 11,273 13,068 35,127
Agriculture, forestry, fishing, and hunting 240 214 201 655
Nonagriculture self-employed and unpaid family worker 931 894 972 2,797
Secondary wage and salary jobs in agriculture and private household industries 14 11 11 36
Secondary jobs as a self-employed or unpaid family worker 196 144 152 492
Total 27,867 27,351 31,372 86,590

We want to know if the change in the number of jobs is independent of the change in years. State the null and alternative hypotheses and the degrees of freedom.

Solution

[latex]H_0:[/latex] The number of jobs is independent of the year.

[latex]H_a:[/latex] The number of jobs is dependent on the year.

[latex]df = 12[/latex]

Nonsymmetrical chi-square curve (df = 12) with a test statistic of 227.73. The area to the right of this is equal to the p-value.
Figure 5. A nonsymmetric chi-square curve with a test statistic of 227.73 and a shaded area is equal to the p-value of almost 0.

Using the TI-83, 83+, 84, 84+ Calculator

Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press 7 ENTER 3 ENTER. Enter the table values by row (leave out the totals). Press ENTER after each. Press 2nd QUIT. Press STAT and arrow over to TESTS. Arrow down to C:χ2-TEST. Press ENTER. You should see Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test statistic is 227.73 and the p-value = 5.90E-42, which is essentially 0. Do the procedure a second time but arrow down to Draw instead of Calculate.

 

Test for Homogeneity

As with the problems from 11.4 Test for Homogeneity, the by-hand method is still shown first, and a box separates the calculator information.

Example

Do male and female college students have the same distribution of living arrangements? Use a level of significance of 0.05. Suppose that 250 randomly selected male college students and 300 randomly selected female college students were asked about their living arrangements: dormitory, apartment, with parents, other. The results are shown in the table below. Do male and female college students have the same distribution of living arrangements?

Table 9: Distribution of Living Arrangements for College Males and College Females
Dormitory Apartment With Parents Other
Males 72 84 49 45
Females 91 86 88 35

Solution

[latex]H_0:[/latex] The distribution of living arrangements for male college students is the same as the distribution of living arrangements for female college students.

[latex]H_a:[/latex] The distribution of living arrangements for male college students is not the same as the distribution of living arrangements for female college students.

Degrees of Freedom: [latex]df = \text{number of columns} – 1 = 4 – 1 = 3[/latex]

Distribution for the test: [latex]{\chi }_{3}^{2}[/latex]

Calculate the test statistic: [latex]\chi^2 = 10.1287[/latex] (Use your calculator or computer.)

Probability statement: [latex]\text{p-value} = P(\chi^2 >10.1287) = 0.0175[/latex]

Using the TI-83, 83+, 84, 84+ Calculator

Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press 2 ENTER 4 ENTER. Enter the table values by row. Press ENTER after each. Press 2nd QUIT. Press STAT and arrow over to TESTS. Arrow down to C:[latex]\chi^2[/latex]-TEST. Press ENTER. You should see Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test statistic is 10.1287 and the p-value = 0.0175. Do the procedure a second time but arrow down to Draw instead of calculate.
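The test for homogeneity uses the same expected-count arithmetic as the test of independence, with each row treated as a separate population. The Python sketch below works through the living-arrangements data; the names are illustrative, and the p-value expression is the right-tail area for 3 degrees of freedom only.

```python
import math

males = [72, 84, 49, 45]       # dormitory, apartment, with parents, other
females = [91, 86, 88, 35]
rows = [males, females]

col_totals = [m + f for m, f in zip(males, females)]
total = sum(col_totals)        # 550 students

chi_sq = 0.0
for row in rows:
    n_row = sum(row)           # 250 males or 300 females
    for o, c in zip(row, col_totals):
        e = n_row * c / total  # expected count if the distributions match
        chi_sq += (o - e) ** 2 / e

df = len(col_totals) - 1       # number of columns minus 1

# Right-tail area, valid for df = 3 only
p_value = (math.erfc(math.sqrt(chi_sq / 2))
           + math.sqrt(2 * chi_sq / math.pi) * math.exp(-chi_sq / 2))

print(round(chi_sq, 4), df, round(p_value, 4))
```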

Compare [latex]\alpha[/latex] and the [latex]\text{p-value}[/latex]: [latex]\alpha = 0.05[/latex] and [latex]\text{p-value} = 0.0175[/latex], so [latex]\alpha > \text{p-value}[/latex].

Make a decision: Since [latex]\alpha > \text{p-value}[/latex], reject [latex]H_0[/latex]. This means that the distributions are not the same.

Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude that the distributions of living arrangements for male and female college students are not the same.

Notice that the conclusion is only that the distributions are not the same. We cannot use the test for homogeneity to draw any conclusions about how they differ.

Example

Both before and after a recent earthquake, surveys were conducted asking voters which of the three candidates they planned on voting for in the upcoming city council election. Has there been a change since the earthquake? Use a level of significance of 0.05. The following table shows the results of the survey. Has there been a change in the distribution of voter preferences since the earthquake?

Table 10: Before/After and Candidate Voting
Perez Chung Stevens
Before 167 128 135
After 214 197 225

Solution
[latex]H_0[/latex]: The distribution of voter preferences was the same before and after the earthquake.

[latex]H_a[/latex]: The distribution of voter preferences was not the same before and after the earthquake.

Degrees of Freedom: [latex]df = \text{number of columns} – 1 = 3 – 1 = 2[/latex]

Distribution for the test: [latex]{\chi }_{2}^{2}[/latex]

Calculate the test statistic: [latex]\chi^2 = 3.2603[/latex] (Use your calculator or computer.)

Probability statement: [latex]\text{p-value}=P(\chi^2 > 3.2603) = 0.1959[/latex]

Using the TI-83, 83+, 84, 84+ Calculator

Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press 2 ENTER 3 ENTER. Enter the table values by row. Press ENTER after each. Press 2nd QUIT. Press STAT and arrow over to TESTS. Arrow down to C:[latex]\chi^2[/latex]-TEST. Press ENTER. You should see Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test statistic is 3.2603 and the p-value = 0.1959. Do the procedure a second time but arrow down to Draw instead of calculate.

Compare [latex]\alpha[/latex] and the p-value: [latex]\alpha = 0.05[/latex] and the [latex]\text{p-value} = 0.1959[/latex], so [latex]\alpha \lt \text{p-value}[/latex].

Make a decision: Since [latex]\alpha \lt \text{p-value}[/latex], do not reject [latex]H_0[/latex].

Conclusion: At a 5% level of significance, from the data, there is insufficient evidence to conclude that the distribution of voter preferences was not the same before and after the earthquake.

Your Turn!

Ivy League schools receive many applications, but only some can be accepted. At the schools listed in the table below, two types of applications are accepted: regular and early decision.

Table 11: Application Type and Ivy League School
Application Type Accepted Brown Columbia Cornell Dartmouth Penn Yale
Regular 2,115 1,792 5,306 1,734 2,685 1,245
Early Decision 577 627 1,228 444 1,195 761

We want to know if the number of regular applications accepted follows the same distribution as the number of early applications accepted. State the null and alternative hypotheses, the degrees of freedom and the test statistic, sketch the graph of the p-value, and draw a conclusion about the test of homogeneity.

Solution

[latex]H_0:[/latex] The distribution of regular applications accepted is the same as the distribution of early applications accepted.

[latex]H_a[/latex]: The distribution of regular applications accepted is not the same as the distribution of early applications accepted.

[latex]df = 5[/latex]

[latex]\chi^2 \text{ test statistic} = 430.06[/latex]

Nonsymmetrical chi-square curve with a mean of 5 and a test statistic of 430.06. The area to the right of this is equal to the p-value.
Figure 6. A nonsymmetric chi-square curve with a test statistic of 430.06 and a shaded area is equal to the p-value of almost 0

Using the TI-83, 83+, 84, 84+ Calculator

Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press 2 ENTER 6 ENTER. Enter the table values by row. Press ENTER after each. Press 2nd QUIT. Press STAT and arrow over to TESTS. Arrow down to C:χ2-TEST. Press ENTER. You should see Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test statistic is 430.06 and the p-value = 9.80E-91. Do the procedure a second time but arrow down to Draw instead of Calculate.

 

Test of a Single Variance

As with the examples from 11.6 Test of a Single Variance, the by-hand method is still shown to make sure you know the process involved. You will see a box that separates the calculator information.

Example

With individual lines at its various windows, a post office finds that the standard deviation for normally distributed waiting times for customers on Friday afternoon is 7.2 minutes. The post office experiments with a single, main waiting line and finds that for a random sample of 25 customers, the waiting times for customers have a standard deviation of 3.5 minutes.

With a significance level of 5%, test the claim that a single line causes lower variation among waiting times (shorter waiting times) for customers.

Solution

Since the claim is that a single line causes less variation, this is a test of a single variance. The parameter is the population variance, [latex]\sigma^2[/latex], or the population standard deviation, [latex]\sigma[/latex].

Random Variable: The sample standard deviation, s, is the random variable.

Let [latex]s =[/latex] standard deviation for the waiting times.

  • [latex]H_0: \sigma^2 = 7.2^2[/latex]
  • [latex]H_a: \sigma^2 \lt 7.2^2[/latex]

The word "less" tells you this is a left-tailed test.

Distribution for the test: [latex]{\chi }_{24}^{2}[/latex], where [latex]n =[/latex] the number of customers sampled and [latex]df = n – 1 = 25 – 1 = 24[/latex].

Calculate the test statistic:

[latex]{\chi }^{2}=\frac{\left(n\text{ }-\text{ }1\right){s}^{2}}{{\sigma }^{2}}=\frac{\left(25\text{ }-\text{ }1\right){\left(3.5\right)}^{2}}{{7.2}^{2}}=5.67[/latex] where [latex]n = 25[/latex], [latex]s = 3.5[/latex], and [latex]\sigma = 7.2[/latex].

Graph:

Nonsymmetrical chi-square curve with a test statistic of 5.67. The area to the left of this is equal to the p-value.
Figure 7. A nonsymmetric chi-square curve with a test statistic of 5.67 and a shaded area is equal to the p-value of 0.000042

 

Probability statement: [latex]\text{p-value} = P (\chi^2 \lt 5.67) = 0.000042[/latex]

Compare [latex]\alpha[/latex] and the p-value: [latex]\alpha = 0.05[/latex]; [latex]\text{p-value} = 0.000042[/latex]; [latex]\alpha > \text{p-value}[/latex]

Make a decision: Since [latex]\alpha > \text{p-value}[/latex], reject [latex]H_0[/latex]. This means that you reject [latex]\sigma^2 = 7.22[/latex]. In other words, you do not think the variation in waiting times is 7.2 minutes; you think the variation in waiting times is less.

Conclusion: At a 5% level of significance, the data provide sufficient evidence to conclude that a single line results in lower variation among waiting times. That is, with a single line, the standard deviation of customer waiting times is less than 7.2 minutes.

Using the TI-83, 83+, 84, 84+ Calculator

In 2nd DISTR, use 7: [latex]\chi^2[/latex]cdf. The syntax is (lower, upper, df) for the parameter list. [latex]\chi^2[/latex]cdf(-1E99,5.67,24). The p-value is 0.000042.
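
The same left-tail area can be cross-checked outside the calculator. What follows is a minimal Python sketch, not part of the original solution. It uses only the standard library, and the helper name `chi2_cdf` is a hypothetical choice for illustration. The helper sums the standard power series for the regularized lower incomplete gamma function, which equals the chi-square cumulative probability.

```python
import math

def chi2_cdf(x, df):
    """Return P(chi-square with df degrees of freedom <= x).

    Hypothetical helper (not a library function): sums the power series
    for the regularized lower incomplete gamma function P(df/2, x/2).
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a               # k = 0 term of the series
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)      # next term of the series
        total += term
        k += 1
    # Scale by z**a * exp(-z) / Gamma(a), computed in log space.
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

n, s, sigma0 = 25, 3.5, 7.2           # values from the post office example
df = n - 1
test_stat = df * s**2 / sigma0**2     # (n - 1) s^2 / sigma^2
p_value = chi2_cdf(test_stat, df)     # left-tailed test: area below the statistic

print(f"chi-square = {test_stat:.2f}, df = {df}, p-value = {p_value:.6f}")
```

The printed test statistic and p-value should agree with the calculator values of 5.67 and 0.000042, up to rounding.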

Your Turn!

The FCC conducts broadband speed tests to measure how much data per second passes between a consumer’s computer and the internet. As of August 2012, the standard deviation of Internet speeds across Internet Service Providers (ISPs) was 12.2 percent. Suppose a sample of 15 ISPs is taken, and the standard deviation is 13.2. An analyst claims that the standard deviation of speeds is more than what was reported. State the null and alternative hypotheses, compute the degrees of freedom and the test statistic, sketch the graph of the p-value, and draw a conclusion. Test at the 1% significance level.

Solution

[latex]H_0: \sigma^2 = 12.2^2[/latex]

[latex]H_a: \sigma^2 > 12.2^2[/latex]

[latex]df = 14[/latex]

[latex]\chi^2 \text{ test statistic} = 16.39[/latex]

Figure 8. A nonsymmetric chi-square curve with the test statistic 16.39 marked. The shaded area to the right of the test statistic equals the p-value of 0.2902.

 

The p-value is 0.2902, which is greater than [latex]\alpha = 0.01[/latex], so we do not reject the null hypothesis. There is not enough evidence to suggest that the variance is greater than [latex]12.2^2[/latex].

Using the TI-83, 83+, 84, 84+ Calculator

In 2nd DISTR, use 7: [latex]\chi^2[/latex]cdf. The syntax is (lower, upper, df) for the parameter list. [latex]\chi^2[/latex]cdf(16.39,10^99,14). The p-value is 0.2902.
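
For a right-tailed test, the p-value is the area above the test statistic, which is one minus the cumulative probability. The Python sketch below illustrates this for the ISP problem under the same assumptions as any standard-library check: `chi2_cdf` is a hypothetical helper written for this illustration, not a built-in function.

```python
import math

def chi2_cdf(x, df):
    """Return P(chi-square with df degrees of freedom <= x).

    Hypothetical helper: power series for the regularized lower
    incomplete gamma function P(df/2, x/2).
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

n, s, sigma0 = 15, 13.2, 12.2            # values from the ISP problem
alpha = 0.01
df = n - 1
test_stat = df * s**2 / sigma0**2        # (n - 1) s^2 / sigma^2
p_value = 1.0 - chi2_cdf(test_stat, df)  # right-tailed test: area above the statistic
reject_null = p_value < alpha

print(f"chi-square = {test_stat:.2f}, df = {df}, p-value = {p_value:.4f}")
print("Reject H0" if reject_null else "Do not reject H0")
```

The result should match the calculator output: a test statistic near 16.39 and a p-value near 0.2902, so the null hypothesis is not rejected at the 1% level.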

 

Helpful Videos for the Graphing Calculator

Below are links to helpful videos for using the graphing calculator for the concepts covered on this page:

 

Additional Technology Tools

In addition to the graphing calculator, there are some additional technology tools that can be used for the concepts covered on this page. Below are links to helpful videos for those tools:

 

Section Practice

The columns in the table below contain the Race/Ethnicity of U.S. Public Schools for a recent year, the percentages for the Advanced Placement Examinee Population for that class, and the Overall Student Population. Suppose the right column contains the result of a survey of 1,000 local students from that year who took an AP Exam. Perform a goodness-of-fit test to determine whether the local results follow the distribution of the U.S. AP examinee population, based on ethnicity.

Table 12: Race/Ethnicity and AP Exam Population
Race/Ethnicity | AP Examinee Population | Overall Student Population | Survey Frequency
Asian, Asian American, or Pacific Islander | 10.2% | 5.4% | 113
Black or African American | 8.2% | 14.5% | 94
Hispanic or Latino | 15.5% | 15.9% | 136
American Indian or Alaska Native | 0.6% | 1.2% | 10
White | 59.4% | 61.6% | 604
Not reported/other | 6.1% | 1.4% | 43

Use the Chi-Square Distribution - Solution Sheet on the Introduction to Chapter 11 page. (Round expected frequency to two decimal places.)

Solution
  1. [latex]H_0:[/latex] The local results follow the distribution of the U.S. AP examinee population
  2. [latex]H_a:[/latex] The local results do not follow the distribution of the U.S. AP examinee population
  3. [latex]df = 5[/latex]
  4. chi-square distribution with [latex]df = 5[/latex]
  5. chi-square test statistic = 13.4
  6. p-value = 0.0199
  7. Check student’s solution.
  8. Correct decision (“reject” or “do not reject” the null hypothesis) and appropriate conclusions:

    • Alpha = 0.05
    • Decision: Reject null when [latex]\alpha = 0.05[/latex]
    • Reason for Decision: p-value < alpha
    • Conclusion: When [latex]\alpha = 0.05[/latex], local data do not fit the AP Examinee Distribution
    • Alpha = 0.01
    • Decision: Do not reject null when [latex]\alpha = 0.01[/latex]
    • Reason for Decision: p-value > alpha
    • Conclusion: When [latex]\alpha = 0.01[/latex], there is insufficient evidence to conclude that local data do not follow the distribution of the U.S. AP examinee population.
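
The goodness-of-fit calculation above can be sketched in Python as a check on the arithmetic. This is an illustration only: the expected counts are the AP examinee percentages applied to the 1,000 surveyed students (rounded to two decimal places, as the exercise asks), and `chi2_cdf` is a hypothetical standard-library helper rather than a built-in function.

```python
import math

def chi2_cdf(x, df):
    """Return P(chi-square with df degrees of freedom <= x).

    Hypothetical helper: power series for the regularized lower
    incomplete gamma function P(df/2, x/2).
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

# Survey frequencies and AP examinee percentages, in the row order of Table 12.
observed = [113, 94, 136, 10, 604, 43]
ap_percent = [10.2, 8.2, 15.5, 0.6, 59.4, 6.1]

n = sum(observed)
expected = [round(n * pct / 100, 2) for pct in ap_percent]

# Sum of (observed - expected)^2 / expected over all categories.
test_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                    # number of categories minus 1
p_value = 1.0 - chi2_cdf(test_stat, df)   # goodness-of-fit tests are right-tailed

print(f"chi-square = {test_stat:.1f}, df = {df}, p-value = {p_value:.4f}")
for alpha in (0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```

Because the code does not round the test statistic before computing the tail area, the p-value may differ from 0.0199 in the fourth decimal place. The decisions at both significance levels are unaffected.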

A sample of 212 commercial businesses was surveyed for recycling one commodity; a commodity here means any one type of recyclable material such as plastic or aluminum. The table below shows the business categories in the survey, the sample size of each category, and the number of businesses in each category that recycle one commodity. Based on the study, on average half of the businesses were expected to be recycling one commodity. As a result, the last column shows the expected number of businesses in each category that recycle one commodity. At the 5% significance level, perform a hypothesis test to determine if the observed number of businesses that recycle one commodity follows the uniform distribution of the expected values.

Table 13: Business Type, Frequency, Observed, and Expected
Business type | Number in class | Observed number that recycle one commodity | Expected number that recycle one commodity
Office | 35 | 19 | 17.5
Retail/Wholesale | 48 | 27 | 24
Food/Restaurants | 53 | 35 | 26.5
Manufacturing/Medical | 52 | 21 | 26
Hotel/Mixed | 24 | 9 | 12

Use the Chi-Square Distribution - Solution Sheet on the Introduction to Chapter 11 page. (Round expected frequency to two decimal places.)

A recent debate about where in the United States skiers believe the skiing is best prompted the following survey. Test to see if the best ski area is independent of the level of the skier.

Table 14: US Ski Area and Level of Skier
U.S. Ski Area | Beginner | Intermediate | Advanced
Tahoe | 20 | 30 | 40
Utah | 10 | 30 | 60
Colorado | 10 | 40 | 50

Use the Chi-Square Distribution - Solution Sheet on the Introduction to Chapter 11 page. (Round expected frequency to two decimal places.)

An ice cream maker performs a nationwide survey about favorite flavors of ice cream in different geographic areas of the U.S. Based on the table below, do the numbers suggest that geographic location is independent of favorite ice cream flavors? Test at the 5% significance level.

Table 15: US Region and Flavor Ice Cream
U.S. region/Flavor | Strawberry | Chocolate | Vanilla | Rocky Road | Mint Chocolate Chip | Pistachio | Row total
West | 12 | 21 | 22 | 19 | 15 | 8 | 97
Midwest | 10 | 32 | 22 | 11 | 15 | 6 | 96
East | 8 | 31 | 27 | 8 | 15 | 7 | 96
South | 15 | 28 | 30 | 8 | 15 | 6 | 102
Column Total | 45 | 112 | 101 | 46 | 60 | 27 | 391

Use the Chi-Square Distribution - Solution Sheet on the Introduction to Chapter 11 page. (Round expected frequency to two decimal places.)

The following table provides a recent survey of the youngest online entrepreneurs whose net worth is estimated at one million dollars or more. Their ages range from 17 to 30. Each cell in the table illustrates the number of entrepreneurs who correspond to the specific age group and their net worth. Are the ages and net worth independent? Perform a test of independence at the 5% significance level.

Table 16: Age Group and Net Worth Value
Age Group \ Net Worth Value (in millions of US dollars) | 1–5 | 6–24 | ≥25 | Row Total
17–25 | 8 | 7 | 5 | 20
26–30 | 6 | 5 | 9 | 20
Column Total | 14 | 12 | 14 | 40

Use the Chi-Square Distribution - Solution Sheet on the Introduction to Chapter 11 page. (Round expected frequency to two decimal places.)

Solution
  1. [latex]H_0:[/latex] Age is independent of the youngest online entrepreneurs’ net worth.
  2. [latex]H_a:[/latex] Age is dependent on the net worth of the youngest online entrepreneurs.
  3. [latex]df = 2[/latex]
  4. chi-square distribution with [latex]df = 2[/latex]
  5. test statistic = 1.76
  6. p-value = 0.4144
  7. Check student’s solution.
  8. Correct decision (“reject” or “do not reject” the null hypothesis) and appropriate conclusions:
    • Alpha: 0.05
    • Decision: Do not reject the null hypothesis.
    • Reason for decision: p-value > alpha
    • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that age and net worth for the youngest online entrepreneurs are dependent.
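
The test of independence above can be reproduced with a short Python sketch. It is offered as an illustration under stated assumptions: each expected count is (row total × column total) / grand total, the degrees of freedom are (rows − 1)(columns − 1), and `chi2_cdf` is a hypothetical standard-library helper.

```python
import math

def chi2_cdf(x, df):
    """Return P(chi-square with df degrees of freedom <= x).

    Hypothetical helper: power series for the regularized lower
    incomplete gamma function P(df/2, x/2).
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

# Observed counts from Table 16: rows are age groups, columns are net worth ranges.
table = [[8, 7, 5],    # ages 17-25
         [6, 5, 9]]    # ages 26-30

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

test_stat = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        # Expected count under independence.
        exp = row_totals[i] * col_totals[j] / grand_total
        test_stat += (obs - exp) ** 2 / exp

df = (len(table) - 1) * (len(table[0]) - 1)
p_value = 1.0 - chi2_cdf(test_stat, df)   # tests of independence are right-tailed

print(f"chi-square = {test_stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The output should agree with the solution: a test statistic near 1.76 with 2 degrees of freedom and a p-value near 0.4144.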

In 2007, the United States had 1.5 million homeschooled students, according to the U.S. National Center for Education Statistics. In the table below you can see that parents decide to homeschool their children for different reasons, and some reasons are ranked by parents as more important than others. According to the survey results shown in the table, is the distribution of applicable reasons the same as the distribution of the most important reason? Provide your assessment at the 5% significance level. Did you expect the result you obtained?

Table 17: Homeschooling Reason and Ranking of Reason
Reasons for Homeschooling | Applicable Reason (in thousands of respondents) | Most Important Reason (in thousands of respondents) | Row Total
Concern about the environment of other schools | 1,321 | 309 | 1,630
Dissatisfaction with academic instruction at other schools | 1,096 | 258 | 1,354
To provide religious or moral instruction | 1,257 | 540 | 1,797
Child has special needs, other than physical or mental | 315 | 55 | 370
Nontraditional approach to child’s education | 984 | 99 | 1,083
Other reasons (e.g., finances, travel, family time, etc.) | 485 | 216 | 701
Column Total | 5,458 | 1,477 | 6,935

Use the Chi-Square Distribution - Solution Sheet on the Introduction to Chapter 11 page. (Round expected frequency to two decimal places.)

When looking at energy consumption, we are often interested in detecting trends over time and how they correlate among different countries. The information in the table below shows the average energy use (in units of kg of oil equivalent per capita) in the USA and the joint European Union countries (EU) for the six-year period 2005 to 2010. Do the energy use values in these two areas come from the same distribution? Perform the analysis at the 5% significance level.

Table 18: EU and US Energy Consumption by Year
Year | European Union | United States | Row Total
2010 | 3,413 | 7,164 | 10,577
2009 | 3,302 | 7,057 | 10,359
2008 | 3,505 | 7,488 | 10,993
2007 | 3,537 | 7,758 | 11,295
2006 | 3,595 | 7,697 | 11,292
2005 | 3,613 | 7,847 | 11,460
Column Total | 20,965 | 45,011 | 65,976

Use the Chi-Square Distribution - Solution Sheet on the Introduction to Chapter 11 page. (Round expected frequency to two decimal places.)

Solution
  1. [latex]H_0:[/latex] The distribution of average energy use in the USA is the same as in Europe between 2005 and 2010.
  2. [latex]H_a:[/latex] The distribution of average energy use in the USA is not the same as in Europe between 2005 and 2010.
  3. [latex]df = 5[/latex]
  4. chi-square with [latex]df = 5[/latex]
  5. test statistic = 2.7434
  6. p-value = 0.7395
  7. Check student’s solution.
  8. Correct decision (“reject” or “do not reject” the null hypothesis) and appropriate conclusions:
    • Alpha: 0.05
    • Decision: Do not reject the null hypothesis.
    • Reason for decision: p-value > alpha
    • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the average energy use values in the US and EU are derived from different distributions for the period from 2005 to 2010.

The Insurance Institute for Highway Safety collects safety information about all types of cars every year, and publishes a report of Top Safety Picks among all cars, makes, and models. The table below presents the number of Top Safety Picks in six car categories for the two years 2009 and 2013. Analyze the table data to conclude whether the distribution of cars that earned the Top Safety Picks safety award has remained the same between 2009 and 2013. Derive your results at the 5% significance level.

Table 19: Year and Car Type
Year \ Car Type | Small | Mid-Size | Large | Small SUV | Mid-Size SUV | Large SUV | Row Total
2009 | 12 | 22 | 10 | 10 | 27 | 6 | 87
2013 | 31 | 30 | 19 | 11 | 29 | 4 | 124
Column Total | 43 | 52 | 29 | 21 | 56 | 10 | 211

Use the Chi-Square Distribution - Solution Sheet on the Introduction to Chapter 11 page. (Round expected frequency to two decimal places.)

Is there a difference between the distribution of community college statistics students and the distribution of university statistics students in what technology they use on their homework? Of some randomly selected community college students, 43 used a computer, 102 used a calculator with built in statistics functions, and 65 used a table from the textbook. Of some randomly selected university students, 28 used a computer, 33 used a calculator with built in statistics functions, and 40 used a table from the textbook.

Conduct an appropriate hypothesis test using a 0.05 level of significance.

Use the Chi-Square Distribution - Solution Sheet on the Introduction to Chapter 11 page. (Round expected frequency to two decimal places.)

Solution
  1. [latex]H_0:[/latex] The distribution for technology use is the same for community college students and university students.
  2. [latex]H_a:[/latex] The distribution for technology use is not the same for community college students and university students.
  3. [latex]df=2[/latex]
  4. chi-square with [latex]df = 2[/latex]
  5. test statistic = 7.05
  6. p-value = 0.0294
  7. Check student’s solution.
  8. Correct decision (“reject” or “do not reject” the null hypothesis) and appropriate conclusions:
    • Alpha: 0.05
    • Decision: Reject the null hypothesis.
    • Reason for decision: p-value < alpha
    • Conclusion: There is sufficient evidence to conclude that the distribution of technology used for statistics homework is not the same for statistics students at community colleges and at universities.
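
The test for homogeneity uses the same arithmetic as a test of independence, applied to one row of counts per population. The Python sketch below arranges the counts from the problem statement as a two-row table and wraps the calculation in a function. The names `chi2_cdf` and `chi2_table_test` are hypothetical and introduced here only for illustration.

```python
import math

def chi2_cdf(x, df):
    """Return P(chi-square with df degrees of freedom <= x).

    Hypothetical helper: power series for the regularized lower
    incomplete gamma function P(df/2, x/2).
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_table_test(table):
    """Return (test statistic, df, p-value) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, 1.0 - chi2_cdf(stat, df)

# Columns: computer, calculator with statistics functions, textbook table.
counts = [[43, 102, 65],   # community college students
          [28, 33, 40]]    # university students

test_stat, df, p_value = chi2_table_test(counts)
print(f"chi-square = {test_stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The output should be close to the solution values of 7.05 and 0.0294, so the null hypothesis is rejected at the 0.05 level.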

According to an avid aquarist, the average number of fish in a 20-gallon tank is 10, with a standard deviation of two. His friend, also an aquarist, does not believe that the standard deviation is two. She counts the number of fish in 15 other 20-gallon tanks. Based on the results that follow, do you think that the standard deviation is different from two?

Data: 11; 10; 9; 10; 10; 11; 11; 10; 12; 9; 7; 9; 11; 10; 11

Use the Chi-Square Distribution - Solution Sheet on the Introduction to Chapter 11 page. (Round expected frequency to two decimal places.)

Solution
  1. [latex]H_0: \sigma = 2[/latex]
  2. [latex]H_a: \sigma \neq 2[/latex]
  3. [latex]df = 14[/latex]
  4. chi-square distribution with [latex]df = 14[/latex]
  5. chi-square test statistic = 5.2094
  6. p-value = 0.0346
  7. Check student’s solution.
  8. Correct decision, reason, and conclusion:
    • Alpha = 0.05
    • Decision: Reject the null hypothesis
    • Reason for decision: p-value < alpha
    • Conclusion: There is sufficient evidence to conclude that the standard deviation is different from 2.
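
A two-tailed test of a single variance can be sketched in Python as follows. Two assumptions are made and should be read as such. First, the sample standard deviation is rounded to two decimal places (1.22) before squaring, which reproduces the test statistic of 5.2094 given above; using the unrounded value gives a slightly larger statistic of about 5.23. Second, the two-tailed p-value is taken as twice the smaller tail area, which is one common convention. `chi2_cdf` is a hypothetical standard-library helper.

```python
import math
import statistics

def chi2_cdf(x, df):
    """Return P(chi-square with df degrees of freedom <= x).

    Hypothetical helper: power series for the regularized lower
    incomplete gamma function P(df/2, x/2).
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

data = [11, 10, 9, 10, 10, 11, 11, 10, 12, 9, 7, 9, 11, 10, 11]
sigma0 = 2                                # hypothesized standard deviation

n = len(data)
df = n - 1
s = round(statistics.stdev(data), 2)      # assumption: s rounded to two decimals
test_stat = df * s**2 / sigma0**2         # (n - 1) s^2 / sigma^2

left_tail = chi2_cdf(test_stat, df)
p_value = 2 * min(left_tail, 1.0 - left_tail)   # assumption: double the smaller tail

print(f"s = {s}, chi-square = {test_stat:.4f}, df = {df}, p-value = {p_value:.4f}")
```

Under these assumptions the output should be close to the solution values of 5.2094 and 0.0346.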

The manager of "Frenchies" is concerned that patrons are not consistently receiving the same amount of French fries with each order. The chef claims that the standard deviation for a ten-ounce order of fries is at most 1.5 oz., but the manager thinks that it may be higher. He randomly weighs 49 orders of fries, which yields a mean of 11 oz. and a standard deviation of two oz.

Use the Chi-Square Distribution - Solution Sheet on the Introduction to Chapter 11 page. (Round expected frequency to two decimal places.)

License

Introductory Statistics Copyright © 2024 by LOUIS: The Louisiana Library Network is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.