
Chapter 9: Hypothesis Testing with Two Samples

9.4 Matched or Paired Samples

Learning Objectives

By the end of this section, the student should be able to:

  • Conduct and interpret hypothesis tests for matched or paired samples.

Hypothesis Tests for Matched or Paired Samples

When using a hypothesis test for matched or paired samples, the following characteristics should be present:

  1. Simple random sampling is used.
  2. Sample sizes are often small.
  3. Two measurements (samples) are drawn from the same pair of individuals or objects.
  4. Differences are calculated from the matched or paired samples.
  5. The differences form the sample that is used for the hypothesis test.
  6. Either the matched pairs have differences that come from a population that is normal, or the number of differences is sufficiently large that the distribution of the sample mean of differences is approximately normal.

In a hypothesis test for matched or paired samples, subjects are matched in pairs and differences are calculated. The differences are the data. The population mean for the differences, [latex]\mu_d[/latex], is then tested using a Student's t-test for a single population mean with [latex]n - 1[/latex] degrees of freedom, where [latex]n[/latex] is the number of differences.

The test statistic (t-score) is: [latex]t=\frac{{\overline{x}}_{d}-{\mu }_{d}}{\left(\frac{{s}_{d}}{\sqrt{n}}\right)}[/latex]

Remember, use the probability table found in the Back Matter - Statistics Tables where needed.

Example

A study was conducted to investigate the effectiveness of hypnotism in reducing pain. Results for randomly selected subjects are shown in the table below. A lower score indicates less pain. The "before" value is matched to an "after" value and the differences are calculated. The differences have a normal distribution. Are the sensory measurements, on average, lower after hypnotism? Test at a 5% significance level.

Table 1: Before and After for Effectiveness of Hypnotism in Reducing Pain
Subject A B C D E F G H
Before 6.6 6.5 9.0 10.3 11.3 8.1 6.3 11.6
After 6.8 2.4 7.4 8.5 8.1 6.1 3.4 2.0
Solution

Corresponding "before" and "after" values form matched pairs. (Calculate "after" – "before.")

Table 2: After and Before with Difference Calculated 
After Data Before Data Difference
6.8 6.6 0.2
2.4 6.5 -4.1
7.4 9 -1.6
8.5 10.3 -1.8
8.1 11.3 -3.2
6.1 8.1 -2
3.4 6.3 -2.9
2 11.6 -9.6

The data for the test are the differences: {0.2, –4.1, –1.6, –1.8, –3.2, –2, –2.9, –9.6}

The sample mean and sample standard deviation of the differences are [latex]{\overline{x}}_{d} = -3.13[/latex] and [latex]{s}_{d} = 2.91[/latex].

Verify these values.
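
One way to carry out this check is with a short Python script. This is only a sketch using the standard-library `statistics` module; the variable names are illustrative and not part of the original example.

```python
import statistics

# Data from Table 1 (hypnotism study)
before = [6.6, 6.5, 9.0, 10.3, 11.3, 8.1, 6.3, 11.6]
after = [6.8, 2.4, 7.4, 8.5, 8.1, 6.1, 3.4, 2.0]

# Differences are "after" - "before"; round to one decimal to match the table
diffs = [round(a - b, 1) for a, b in zip(after, before)]

mean_d = statistics.mean(diffs)   # sample mean of the differences
sd_d = statistics.stdev(diffs)    # sample standard deviation (divides by n - 1)

print(diffs)
print(f"mean = {mean_d:.3f}, sd = {sd_d:.2f}")
```

The mean printed to three decimals is -3.125, which the text reports as -3.13.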

Let [latex]{\mu }_{d}[/latex] be the population mean for the differences. We use the subscript [latex]d[/latex] to denote "differences."

Random variable: [latex]{\overline{X}}_{d}[/latex] = the mean difference of the sensory measurements

[latex]H_0: \mu_d \ge 0[/latex]

The null hypothesis states that the mean difference is zero or positive, meaning that the same or more pain is felt after hypnotism; that is, the subjects show no improvement on average. [latex]\mu_d[/latex] is the population mean of the differences.

[latex]H_a: \mu_d \lt 0[/latex]

The alternative hypothesis states that the mean difference is negative, meaning that less pain is felt after hypnotism; that is, the subjects show improvement on average. The score should be lower after hypnotism, so the difference ought to be negative to indicate improvement.

Distribution for the test: The distribution is a Student's t with [latex]df = n - 1 = 8 - 1 = 7[/latex]. Use [latex]t_7[/latex]. (Notice that the test is for a single population mean.)
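
The example does not state the test statistic itself. Substituting the rounded sample values into the formula for the t-score, with [latex]\mu_d = 0[/latex] taken from the boundary of the null hypothesis, gives approximately:

[latex]t=\frac{-3.13-0}{\left(\frac{2.91}{\sqrt{8}}\right)}\approx -3.04[/latex]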

Calculate the p-value using the Student's t-distribution: [latex]\text{p-value} = 0.0095[/latex]
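
If a calculator or table is not at hand, the p-value can be approximated numerically. The sketch below is one possible approach, not the method used in the text: it integrates the Student's t density with Simpson's rule using only the Python standard library. The helper names `t_pdf` and `t_upper_tail` are hypothetical names introduced for this illustration.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with composite Simpson's rule (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3   # 0.5 minus the area between 0 and |t|
    return upper if t >= 0 else 1.0 - upper

# Data from Table 1 (hypnotism study)
before = [6.6, 6.5, 9.0, 10.3, 11.3, 8.1, 6.3, 11.6]
after = [6.8, 2.4, 7.4, 8.5, 8.1, 6.1, 3.4, 2.0]
diffs = [a - b for a, b in zip(after, before)]   # "after" - "before"

n = len(diffs)
# Test statistic with mu_d = 0 from the null hypothesis
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
# Left-tailed test: P(T < t) equals P(T > -t) by symmetry
p_value = t_upper_tail(-t_stat, n - 1)

print(f"t = {t_stat:.3f}, df = {n - 1}, p-value = {p_value:.4f}")
```

Because the test is left-tailed, the sketch uses the symmetry of the t-distribution to convert the lower tail into an upper tail.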

Graph:

Figure 1. Normal distribution curve of the average difference of sensory measurements with values of -3.13 and 0 on the axis. The p-value of 0.0095 is the area in the left tail.

 

[latex]{\overline{X}}_{d}[/latex] is the random variable for the mean of the differences.

The sample mean and sample standard deviation of the differences are:

[latex]{\overline{x}}_{d} = -3.13[/latex]

[latex]{s}_{d} = 2.91[/latex]

Compare α and the p-value: [latex]\alpha = 0.05[/latex] and [latex]\text{p-value} = 0.0095[/latex], so [latex]\alpha > \text{p-value}[/latex].

Make a decision: Since [latex]\alpha > \text{p-value}[/latex], reject [latex]H_0[/latex]. The data support [latex]\mu_d \lt 0[/latex], that is, an improvement.

Conclusion: At a 5% level of significance, from the sample data, there is sufficient evidence to conclude that the sensory measurements, on average, are lower after hypnotism. Hypnotism appears to be effective in reducing pain.

Your Turn!

A study was conducted to investigate how effective a new diet was in lowering cholesterol. Results for the randomly selected subjects are shown in the table. The differences have a normal distribution. Are the subjects’ cholesterol levels lower on average after the diet? Test at the 5% level.

Table 3: Before and After for Effectiveness of New Diet in Lowering Cholesterol
Subject A B C D E F G H I
Before 209 210 205 198 216 217 238 240 222
After 199 207 189 209 217 202 211 223 201
Solution

The p-value is 0.0130, so we can reject the null hypothesis. There is enough evidence to suggest that the diet lowers cholesterol.

Example

A college football coach was interested in whether the college's strength development class increased his players' maximum lift (in pounds) on the bench press exercise. He asked four of his players to participate in a study. The amount of weight they could each lift was recorded before they took the strength development class. After completing the class, the amount of weight they could each lift was again measured. The data are as follows:

Table 4: Prior and After Amount Weight Lifted
Weight (in pounds) Player 1 Player 2 Player 3 Player 4
Amount of weight lifted prior to the class 205 241 338 368
Amount of weight lifted after the class 295 252 330 360

The coach wants to know if the strength development class makes his players stronger, on average.

Record the differences data. Calculate the differences by subtracting the amount of weight lifted prior to the class from the weight lifted after completing the class. The data for the differences are: {90, 11, -8, -8}. Assume the differences have a normal distribution.

Using the differences data, calculate the sample mean and the sample standard deviation.

[latex]{\overline{x}}_{d} = 21.3[/latex], [latex]s_d = 46.7[/latex]

Note: The data given here would indicate that the distribution is actually right-skewed. The difference of 90 may be an extreme outlier. It pulls the sample mean up to 21.3 (positive), while the mean of the other three data values is actually negative.
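
The effect of the largest difference on the sample mean can be checked with a few lines of Python. This is a small illustrative sketch using the standard-library `statistics` module; the variable names are assumptions made for the example.

```python
import statistics

# Differences (after - prior) for the four players
diffs = [90, 11, -8, -8]

mean_all = statistics.mean(diffs)   # mean of all four differences
sd_all = statistics.stdev(diffs)    # sample standard deviation

# Drop the largest difference to see how much it influences the mean
largest = max(diffs)
rest = [d for d in diffs if d != largest]
mean_rest = statistics.mean(rest)

print(f"all four: mean = {mean_all:.2f}, sd = {sd_all:.1f}")
print(f"without {largest}: mean = {mean_rest:.2f}")
```

With only four differences, a single large value dominates both the mean and the standard deviation, which is why the normality assumption deserves scrutiny here.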

Using the difference data, this becomes a test of a single [latex]\underline{\hspace{2cm}}[/latex] (fill in the blank).

Define the random variable: [latex]{\overline{X}}_{d}[/latex] = mean difference in the maximum lift per player.

The distribution for the hypothesis test is [latex]t_3[/latex].

[latex]H_0: \mu_d \le 0, H_a: \mu_d > 0[/latex]

Graph:

Figure 2. Normal distribution curve with values of 0 and 21.3 on the axis. The p-value of 0.2150 is the area in the right tail.

 

Calculate the p-value: The p-value is 0.2150.

Decision: If the level of significance is 5%, the decision is not to reject the null hypothesis, because [latex]\alpha \lt \text{p-value}[/latex].

What is the conclusion?

At a 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the strength development class helped to make the players stronger, on average.

Your Turn!

A new prep class was designed to improve SAT test scores. Four students were selected at random. Their scores on two practice exams were recorded, one before the class and one after. The data are recorded in the table below. Are the scores, on average, higher after the class? Test at a 5% level.

Table 5: Before and After SAT Scores
SAT Scores Student 1 Student 2 Student 3 Student 4
Score before class 1840 1960 1920 2150
Score after class 1920 2160 2200 2100
Solution

The p-value is 0.0874, so we decline to reject the null hypothesis. The data do not support that the class improves SAT scores significantly.

Example

Seven eighth graders at Kennedy Middle School measured how far they could push the shot-put with their dominant (writing) hand and their weaker (non-writing) hand. They thought that they could push equal distances with either hand. The data were collected and recorded in the table below.

Table 6: Dominant versus Weaker Hand Shot-Put Throws
Distance (in feet) using Student 1 Student 2 Student 3 Student 4 Student 5 Student 6 Student 7
Dominant Hand 30 26 34 17 19 26 20
Weaker Hand 28 14 27 18 17 26 16

Conduct a hypothesis test to determine whether the mean difference in distances between the children’s dominant versus weaker hands is significant.

Record the differences data. Calculate the differences by subtracting the distances with the weaker hand from the distances with the dominant hand. The data for the differences are: {2, 12, 7, –1, 2, 0, 4}. The differences have a normal distribution.

Using the differences data, calculate the sample mean and the sample standard deviation. [latex]{\overline{x}}_{d} = 3.71[/latex], [latex]{s}_{d} = 4.5[/latex]

Random variable: [latex]{\overline{X}}_{d}[/latex] = mean difference in the distances between the hands.

Distribution for the hypothesis test: [latex]t_6[/latex]

[latex]H_0: \mu_d = 0[/latex], and [latex]H_a: \mu_d \neq 0[/latex]

Graph:

Figure 3. Normal distribution curve with mean equal to zero for a two-tailed test. Both the right and left tails of the curve are shaded. Each tail represents 1/2(p-value) = 0.0358.

 

Calculate the p-value: The p-value is 0.0716 (using the data directly).

([latex]\text{test statistic} = 2.18[/latex], [latex]\text{p-value} = 0.0719[/latex] using [latex]\left({\overline{x}}_{d}=3.71, {s}_{d}=4.5\right)[/latex])
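
The two p-values above differ slightly because one is computed from the raw differences and the other from the rounded summary statistics. The following Python sketch illustrates that comparison. It is one possible approach under stated assumptions: the tail area is approximated with Simpson's rule rather than read from a table, and the helper names `t_pdf` and `t_upper_tail` are hypothetical names introduced for this illustration.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with composite Simpson's rule (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3   # 0.5 minus the area between 0 and |t|
    return upper if t >= 0 else 1.0 - upper

# Data from Table 6 (shot-put distances in feet)
dominant = [30, 26, 34, 17, 19, 26, 20]
weaker = [28, 14, 27, 18, 17, 26, 16]
diffs = [d - w for d, w in zip(dominant, weaker)]
n = len(diffs)

# Using the data directly
t_data = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
p_data = 2 * t_upper_tail(abs(t_data), n - 1)    # two-tailed: double one tail

# Using the rounded summary statistics from the text
t_rounded = 3.71 / (4.5 / math.sqrt(n))
p_rounded = 2 * t_upper_tail(abs(t_rounded), n - 1)

print(f"from data:    t = {t_data:.3f}, p-value = {p_data:.4f}")
print(f"from summary: t = {t_rounded:.3f}, p-value = {p_rounded:.4f}")
```

Either way, the p-value exceeds 0.05, so rounding does not change the decision in this example.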

Decision: Assume [latex]\alpha = 0.05[/latex]. Since [latex]\alpha \lt \text{p-value}[/latex], do not reject [latex]H_0[/latex].

Conclusion: At the 5% level of significance, from the sample data, there is not sufficient evidence to conclude that there is a difference in how far the children can push the shot-put with their weaker and dominant hands.

Your Turn!

Five ball players think they can throw the same distance with their dominant hand (throwing) and off-hand (catching hand). The data were collected and recorded in the table below. Conduct a hypothesis test to determine whether the mean difference in distances between the dominant and off-hand is significant. Test at the 5% level.

Table 7: Dominant versus Off-hand Players 1-5
Player 1 Player 2 Player 3 Player 4 Player 5
Dominant Hand 120 111 135 140 125
Off-hand 105 109 98 111 99
Solution

The p-value is 0.0230, so we can reject the null hypothesis. The data show that the players do not throw the same distance with their off-hands as they do with their dominant hands.

Section 9.4 Review

A hypothesis test for matched or paired samples (t-test) has these characteristics:

  • Test the differences by subtracting one measurement from the other measurement
  • Random Variable: [latex]{\overline{X}}_{d}[/latex] = mean of the differences
  • Distribution: Student's t-distribution with [latex]n - 1[/latex] degrees of freedom
  • If the number of differences is small (less than 30), the differences must follow a normal distribution.
  • Two samples are drawn from the same set of objects.
  • Samples are dependent.

Formula Review

  • Test Statistic (t-score): [latex]t = \frac{{\overline{x}}_{d}-{\mu }_{d}}{\left(\frac{{s}_{d}}{\sqrt{n}}\right)}[/latex], where:
    • [latex]{\overline{x}}_{d}[/latex] is the mean of the sample differences.
    • [latex]\mu_d[/latex] is the mean of the population differences.
    • [latex]s_d[/latex] is the sample standard deviation of the differences.
    • [latex]n[/latex] is the sample size.

Section 9.4 Practice

A study was conducted to test the effectiveness of a software patch in reducing system failures over a six-month period. Results for randomly selected installations are shown in the table below. The “before” value is matched to an “after” value, and the differences are calculated. The differences have a normal distribution. Test at the 1% significance level.

Table 8: Before and After Installation
Installation A B C D E F G H
Before 3 6 4 2 5 8 2 6
After 1 5 2 0 1 0 2 2
  1. What is the random variable?
  2. State the null and alternative hypotheses.
  3. What is the p-value?
  4. Draw the graph of the p-value.
  5. What conclusion can you draw about the software patch?

A study was conducted to test the effectiveness of a juggling class. Before the class started, six subjects juggled as many balls as they could at once. After the class, the same six subjects juggled as many balls as they could. The differences in the number of balls are calculated. The differences have a normal distribution. Test at the 1% significance level.

Table 9: Before and After Effectiveness of Juggling Class
Subject A B C D E F
Before 3 4 3 2 4 5
After 4 5 6 4 5 7
  1. State the null and alternative hypotheses.
  2. What is the p-value?
  3. What is the sample mean difference?
  4. Draw the graph of the p-value.
  5. What conclusion can you draw about the juggling class?

A doctor wants to know if a blood pressure medication is effective. Six subjects have their blood pressures recorded. After twelve weeks on the medication, the same six subjects have their blood pressure recorded again. For this test, only systolic pressure is of concern. Test at the 1% significance level.

Table 10: Before and After Blood Pressure Medication
Patient A B C D E F
Before 161 162 165 162 166 171
After 158 159 166 160 167 169
  1. State the null and alternative hypotheses.
  2. What is the test statistic?
  3. What is the p-value?
  4. What is the sample mean difference?
  5. What is the conclusion?

Ten individuals went on a low-fat diet for 12 weeks to lower their cholesterol. The data are recorded in the table below. Do you think that their cholesterol levels were significantly lowered?

Table 11: Starting versus Ending Cholesterol Level on Diet
Starting cholesterol level Ending cholesterol level
140 140
220 230
110 120
240 220
200 190
180 150
190 200
360 300
280 300
260 240

Use the Hypothesis Testing with Two Samples - Solution Sheet on the Introduction to Chapter 9 page. (Note: If you are using a Student's t-distribution, including for paired data, you may assume that the underlying population is normally distributed. In a real situation, however, you must first verify that assumption.)

A new AIDS prevention drug was tried on a group of 224 HIV positive patients. Forty-five patients developed AIDS after four years. In a control group of 224 HIV positive patients, 68 developed AIDS after four years. We want to test whether the method of treatment reduces the proportion of patients that develop AIDS after four years or if the proportions of the treated group and the untreated group stay the same.

Let the subscript t = treated patient and ut = untreated patient.

  1. The appropriate hypotheses are:
    • [latex]H_0: p_t \lt p_{ut} \text{ and } H_a: p_t \ge p_{ut}[/latex]
    • [latex]H_0: p_t \le p_{ut} \text{ and } H_a: p_t > p_{ut}[/latex]
    • [latex]H_0: p_t = p_{ut} \text{ and } H_a: p_t \neq p_{ut}[/latex]
    • [latex]H_0: p_t = p_{ut} \text{ and } H_a: p_t \lt p_{ut}[/latex]
  2. If the p-value is 0.0062 what is the conclusion (use [latex]\alpha = 0.05[/latex])?
    • The method has no effect.
    • There is sufficient evidence to conclude that the method reduces the proportion of HIV positive patients who develop AIDS after four years.
    • There is sufficient evidence to conclude that the method increases the proportion of HIV positive patients who develop AIDS after four years.
    • There is insufficient evidence to conclude that the method reduces the proportion of HIV positive patients who develop AIDS after four years.

An experiment is conducted to show that blood pressure can be consciously reduced in people trained in a “biofeedback exercise program.” Six subjects were randomly selected and blood pressure measurements were recorded before and after the training. The difference between blood pressures was calculated (after - before) producing the following results: [latex]{\overline{x}}_{d} = -10.2; s_d = 8.4.[/latex] Using the data, test the hypothesis that the blood pressure has decreased after the training.

  1. The distribution for the test is:
    • [latex]t_5[/latex]
    • [latex]t_6[/latex]
    • [latex]N(-10.2, 8.4)[/latex]
    • [latex]N(-10.2, \frac{8.4}{\sqrt{6}})[/latex]
  2. If [latex]\alpha = 0.05[/latex], the p-value and the conclusion are
    • 0.0014; There is sufficient evidence to conclude that the blood pressure decreased after the training.
    • 0.0014; There is sufficient evidence to conclude that the blood pressure increased after the training.
    • 0.0155; There is sufficient evidence to conclude that the blood pressure decreased after the training.
    • 0.0155; There is sufficient evidence to conclude that the blood pressure increased after the training.

A golf instructor is interested in determining if her new technique for improving players’ golf scores is effective. She takes four new students. She records their 18-hole scores before learning the technique and then after having taken her class. She conducts a hypothesis test. The data are as follows.

Table 12: Mean Score Before and After
Player 1 Player 2 Player 3 Player 4
Mean score before class 83 78 93 87
Mean score after class 80 80 86 86

The correct decision is:

  1. Reject [latex]H_0[/latex].
  2. Do not reject the [latex]H_0[/latex].

A local cancer support group believes that the estimate for new female breast cancer cases in the south is higher in 2013 than in 2012. The group compared the estimates of new female breast cancer cases by southern state in 2012 and in 2013. The results are in the table below.

Table 13: Southern States 2012 versus 2013
Southern States 2012 2013
Alabama 3,450 3,720
Arkansas 2,150 2,280
Florida 15,540 15,710
Georgia 6,970 7,310
Kentucky 3,160 3,300
Louisiana 3,320 3,630
Mississippi 1,990 2,080
North Carolina 7,090 7,430
Oklahoma 2,630 2,690
South Carolina 3,570 3,580
Tennessee 4,680 5,070
Texas 15,050 14,980
Virginia 6,190 6,280
Solution
  • Test: two matched pairs or paired samples (t-test)
  • Random variable: [latex]{\overline{X}}_{d}[/latex]
  • Distribution: [latex]t_{12}[/latex]
  • Hypothesis: [latex]H_0: \mu_d = 0; H_a: \mu_d > 0[/latex]
  • The mean of the differences of new female breast cancer cases in the south between 2013 and 2012 is greater than zero. The estimate for new female breast cancer cases in the south is higher in 2013 than in 2012.
  • Graph: right-tailed
  • p-value: 0.0004
  • Decision: Reject [latex]H_0[/latex]
  • Conclusion: At the 5% level of significance, from the sample data, there is sufficient evidence to conclude that there was a higher estimate of new female breast cancer cases in 2013 than in 2012.
Figure 5. Normal distribution curve with mean equal to zero for a right-tailed test. The shaded region in the right tail represents the p-value of 0.0004.

License

Introductory Statistics Copyright © 2024 by LOUIS: The Louisiana Library Network is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.