Chapter 1: Sampling and Data
Chapter 1 Practice
Practice from 1.2
Studies are often done by pharmaceutical companies to determine the effectiveness of a treatment program. Suppose that a new AIDS antibody drug is currently under study. It is given to patients once the AIDS symptoms have revealed themselves. Of interest is the average (mean) length of time in months patients live once they start the treatment. Two researchers each follow a different set of 40 patients with AIDS from the start of treatment until their deaths. The following data (in months) are collected.
Researcher A: 3; 4; 11; 15; 16; 17; 22; 44; 37; 16; 14; 24; 25; 15; 26; 27; 33; 29; 35; 44; 13; 21; 22; 10; 12; 8; 40; 32; 26; 27; 31; 34; 29; 17; 8; 24; 18; 47; 33; 34
Researcher B: 3; 14; 11; 5; 16; 17; 28; 41; 31; 18; 14; 14; 26; 25; 21; 22; 31; 2; 35; 44; 23; 21; 21; 16; 12; 18; 41; 22; 16; 25; 33; 34; 29; 13; 18; 24; 23; 42; 33; 29
Determine what the following key terms refer to in the example for Researcher A:
- population
- sample
- parameter
- statistic
- variable
Solution
- The population is AIDS patients.
- The sample is the 40 AIDS patients followed by Researcher A.
- The parameter is the average length of time (in months) AIDS patients live after treatment.
- The statistic is the average length of time (in months) AIDS patients from the sample live after treatment.
- The variable is X = the length of time (in months) AIDS patients live after treatment.
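To make the difference between a parameter and a statistic concrete, the sample mean for Researcher A can be computed directly from the listed data. This is a minimal Python sketch; the variable names are illustrative and not part of the exercise.

```python
from statistics import mean

# Survival times in months for Researcher A's 40 patients (copied from the exercise).
researcher_a = [
    3, 4, 11, 15, 16, 17, 22, 44, 37, 16,
    14, 24, 25, 15, 26, 27, 33, 29, 35, 44,
    13, 21, 22, 10, 12, 8, 40, 32, 26, 27,
    31, 34, 29, 17, 8, 24, 18, 47, 33, 34,
]

n = len(researcher_a)             # sample size
sample_mean = mean(researcher_a)  # the statistic: mean survival time in the sample

print(n, sample_mean)  # 40 23.575
```

The parameter, the mean survival time for all AIDS patients on the treatment, remains unknown; the sample mean only estimates it.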
“Number of times per week” is what type of data?
a. qualitative
b. quantitative discrete
c. quantitative continuous
Solution
b
Use the following information to answer the next four exercises.
A study was done to determine the age, number of times per week, and the duration (amount of time) of residents using a local park in San Antonio, Texas. The first house in the neighborhood around the park was selected randomly, and then the resident of every eighth house in the neighborhood around the park was interviewed.
1. The sampling method was
a. simple random
b. systematic
c. stratified
d. cluster
Solution
b
2. “Duration (amount of time)” is what type of data?
a. qualitative
b. quantitative discrete
c. quantitative continuous
Solution
c
3. The colors of the houses around the park are what kind of data?
a. qualitative
b. quantitative discrete
c. quantitative continuous
Solution
a
4. The population is ______________________
Solution
the residents of the houses in the neighborhood around a local park in San Antonio, Texas.
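The systematic sampling scheme described in this study can be sketched in Python. The neighborhood size of 100 houses and the function name `systematic_sample` are assumptions made for illustration, as is the choice to draw the random start from the first eight houses.

```python
import random

def systematic_sample(units, step, rng):
    """Pick a random start among the first `step` units, then take every `step`-th unit."""
    start = rng.randrange(step)   # random starting index: 0 .. step - 1
    return units[start::step]

# Hypothetical neighborhood of 100 houses, numbered 1..100 (an assumption).
houses = list(range(1, 101))
rng = random.Random(0)            # seeded only so the run is reproducible
chosen = systematic_sample(houses, step=8, rng=rng)

print(chosen)
```

Only the starting house is random; every later house is fixed by the step of eight, which is what separates systematic sampling from simple random sampling.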
The table below contains the total number of deaths worldwide as a result of earthquakes from 2000 to 2012.
| Year | Total Number of Deaths |
|---|---|
| 2000 | 231 |
| 2001 | 21,357 |
| 2002 | 11,685 |
| 2003 | 33,819 |
| 2004 | 228,802 |
| 2005 | 88,003 |
| 2006 | 6,605 |
| 2007 | 712 |
| 2008 | 88,011 |
| 2009 | 1,790 |
| 2010 | 320,120 |
| 2011 | 21,953 |
| 2012 | 768 |
| Total | 823,856 |
Use the table to answer the following questions.
- What is the proportion of deaths between 2007 and 2012 (not including the years 2007 and 2012)?
- What percent of deaths occurred before 2001?
- What is the percentage of deaths that occurred in 2003 or after 2010?
- What is the fraction of deaths that happened before 2012?
- What kind of data is the number of deaths?
- Earthquakes are quantified according to the amount of energy they produce (examples are 2.1, 5.0, 6.7). What type of data is that?
- What contributed to the large number of deaths in 2010? In 2004? Explain.
Solution
- 0.5242
- 0.03%
- 6.86%
- [latex]\frac{823,088}{823,856}[/latex]
- quantitative discrete
- quantitative continuous
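- The two years had different causes. In 2004, an underwater earthquake in the Indian Ocean produced a massive tsunami. In 2010, a strong earthquake struck near Port-au-Prince, the densely populated capital of Haiti.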
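The numerical answers above can be checked with a short Python sketch that works from the table values. The helper name `share` is an illustrative choice, not part of the exercise.

```python
# Deaths per year, copied from the table above.
deaths = {
    2000: 231, 2001: 21_357, 2002: 11_685, 2003: 33_819, 2004: 228_802,
    2005: 88_003, 2006: 6_605, 2007: 712, 2008: 88_011, 2009: 1_790,
    2010: 320_120, 2011: 21_953, 2012: 768,
}
total = sum(deaths.values())

def share(condition):
    """Fraction of all deaths in the years for which condition(year) is true."""
    return sum(d for year, d in deaths.items() if condition(year)) / total

between_2007_2012 = share(lambda y: 2007 < y < 2012)   # excludes 2007 and 2012
before_2001 = share(lambda y: y < 2001)
in_2003_or_after_2010 = share(lambda y: y == 2003 or y > 2010)
before_2012_count = sum(d for y, d in deaths.items() if y < 2012)

print(total)                                  # 823856
print(round(between_2007_2012, 4))            # 0.5242
print(round(100 * before_2001, 2))            # 0.03
print(round(100 * in_2003_or_after_2010, 2))  # 6.86
print(before_2012_count)                      # 823088
```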
For the following four exercises, determine the type of sampling used (simple random, stratified, systematic, cluster, or convenience).
1. A group of test subjects is divided into twelve groups; then four of the groups are chosen at random.
Solution
cluster
2. A market researcher polls every tenth person who walks into a store.
Solution
systematic
3. The first 50 people who walk into a sporting event are polled on their television preferences.
Solution
convenience
4. A computer generates 100 random numbers, and 100 people whose names correspond with the numbers on the list are chosen.
Solution
simple random
Use the following information to answer the next seven exercises.
Studies are often done by pharmaceutical companies to determine the effectiveness of a treatment program. Suppose that a new AIDS antibody drug is currently under study. It is given to patients once the AIDS symptoms have revealed themselves. Of interest is the average (mean) length of time in months patients live once starting the treatment. Two researchers each follow a different set of 40 AIDS patients from the start of treatment until their deaths. The following data (in months) are collected.
Researcher A: 3; 4; 11; 15; 16; 17; 22; 44; 37; 16; 14; 24; 25; 15; 26; 27; 33; 29; 35; 44; 13; 21; 22; 10; 12; 8; 40; 32; 26; 27; 31; 34; 29; 17; 8; 24; 18; 47; 33; 34
Researcher B: 3; 14; 11; 5; 16; 17; 28; 41; 31; 18; 14; 14; 26; 25; 21; 22; 31; 2; 35; 44; 23; 21; 21; 16; 12; 18; 41; 22; 16; 25; 33; 34; 29; 13; 18; 24; 23; 42; 33; 29
Complete the tables using the data provided:
Researcher A
| Survival Length (in months) | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| 0.5–6.5 | | | |
| 6.5–12.5 | | | |
| 12.5–18.5 | | | |
| 18.5–24.5 | | | |
| 24.5–30.5 | | | |
| 30.5–36.5 | | | |
| 36.5–42.5 | | | |
| 42.5–48.5 | | | |
Researcher B
| Survival Length (in months) | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| 0.5–6.5 | | | |
| 6.5–12.5 | | | |
| 12.5–18.5 | | | |
| 18.5–24.5 | | | |
| 24.5–30.5 | | | |
| 30.5–36.5 | | | |
| 36.5–45.5 | | | |
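One way to fill in a table like these is to count how many data values fall between each pair of class boundaries. The sketch below does this in Python for Researcher A's data, assuming the eight classes of width 6 that run from 0.5 to 48.5; the variable names are illustrative.

```python
# Survival times in months for Researcher A (copied from the exercise).
researcher_a = [
    3, 4, 11, 15, 16, 17, 22, 44, 37, 16,
    14, 24, 25, 15, 26, 27, 33, 29, 35, 44,
    13, 21, 22, 10, 12, 8, 40, 32, 26, 27,
    31, 34, 29, 17, 8, 24, 18, 47, 33, 34,
]

# Class boundaries 0.5, 6.5, ..., 48.5. The data are whole numbers,
# so no value can land exactly on a boundary.
boundaries = [0.5 + 6 * i for i in range(9)]
classes = list(zip(boundaries, boundaries[1:]))

frequencies = [sum(1 for x in researcher_a if low < x < high) for low, high in classes]

n = len(researcher_a)
running = 0
for (low, high), freq in zip(classes, frequencies):
    running += freq
    # columns: class, frequency, relative frequency, cumulative relative frequency
    print(f"{low} to {high}: {freq:2d}  {freq / n:.3f}  {running / n:.3f}")
```

The same loop works for Researcher B once the data list and the boundaries are swapped for those of the second table.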
1. Determine what the key term data refers to in the above example for Researcher A.
Solution
values for X, such as 3, 4, 11, and so on
2. List two reasons why the data may differ.
Solution
Answers will vary. Sample answer: One reason may be the average age of the individuals in the two samples. Or, perhaps the drug affects men and women differently. If the ratio of men to women isn’t the same in both sample groups, then the data would differ.
3. Can you tell if one researcher is correct and the other one is incorrect? Why?
Solution
No, we do not have enough information to make such a claim.
4. Would you expect the data to be identical? Why or why not?
Solution
No. Although both researchers study the same treatment, they follow different sets of patients, so the data would be expected to differ because of variation from sample to sample.
5. How might the researchers gather random data?
Solution
Take a simple random sample from each group. One way is by assigning a number to each patient and using a random number generator to randomly select patients.
6. Suppose that the first researcher conducted his survey by randomly choosing one state in the nation and then randomly picking 40 patients from that state. What sampling method would that researcher have used?
Solution
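Within the chosen state, he used a simple random sample. Over the whole nation, however, this is not a simple random sample, because 40 patients from different states could never be chosen together. It is a two-stage design: one state (a cluster) is chosen at random, and then a simple random sample of 40 patients is taken within that state.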
7. Suppose that the second researcher conducted his survey by choosing 40 patients he knew. What sampling method would that researcher have used? What concerns would you have about this data set, based upon the data collection method?
Solution
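This would be convenience sampling and is not random. The patients he knows may not be representative of all AIDS patients, so the results could be biased.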
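The approach in exercise 5, assigning a number to each patient and letting a random number generator pick, can be sketched in Python as follows. The roster of 500 patient ID numbers is a hypothetical stand-in, since the exercise does not say how many patients are available.

```python
import random

# Hypothetical patient roster: ID numbers 1..500 (the roster size is an assumption).
patient_ids = list(range(1, 501))

rng = random.Random(42)                 # seeded only so the example is reproducible
sample = rng.sample(patient_ids, k=40)  # 40 distinct IDs; every set of 40 is equally likely

print(sorted(sample))
```

Because `sample` draws without replacement, no patient can be selected twice, which matches how a simple random sample of patients would be taken.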
Use the following data to answer the next five exercises.
Two researchers are gathering data on hours of video games played by school-aged children and young adults. They each randomly sample different groups of 150 students from the same school. They collect the following data.
Researcher A
| Hours Played per Week | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| 0–2 | 26 | 0.17 | 0.17 |
| 2–4 | 30 | 0.20 | 0.37 |
| 4–6 | 49 | 0.33 | 0.70 |
| 6–8 | 25 | 0.17 | 0.87 |
| 8–10 | 12 | 0.08 | 0.95 |
| 10–12 | 8 | 0.05 | 1.00 |
Researcher B
| Hours Played per Week | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| 0–2 | 48 | 0.32 | 0.32 |
| 2–4 | 51 | 0.34 | 0.66 |
| 4–6 | 24 | 0.16 | 0.82 |
| 6–8 | 12 | 0.08 | 0.90 |
| 8–10 | 11 | 0.07 | 0.97 |
| 10–12 | 4 | 0.03 | 1.00 |
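The relative and cumulative relative frequency columns in these two tables follow from the frequency column alone. A minimal Python sketch, with an illustrative function name, reproduces them by dividing each count (and each running total) by the sample size and rounding to two decimal places.

```python
def frequency_columns(frequencies):
    """Return (relative, cumulative relative) frequencies, each rounded to 2 places."""
    n = sum(frequencies)
    relative, cumulative = [], []
    running = 0
    for f in frequencies:
        running += f
        relative.append(round(f / n, 2))
        cumulative.append(round(running / n, 2))
    return relative, cumulative

first_table = [26, 30, 49, 25, 12, 8]    # frequency column of the first table
second_table = [48, 51, 24, 12, 11, 4]   # frequency column of the second table

rel_1, cum_1 = frequency_columns(first_table)
rel_2, cum_2 = frequency_columns(second_table)

print(rel_1, cum_1)
print(rel_2, cum_2)
```

Computing the cumulative column from running counts, rather than by adding the rounded relative frequencies, keeps rounding error from building up down the column.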
1. Give a reason why the data may differ.
Solution
The researchers are studying different groups, so there will be some variation in the data.
2. Would the sample size be large enough if the population is the students in the school?
Solution
Yes, the sample size of 150 would be large enough to reflect a population of one school.
3. Would the sample size be large enough if the population is school-aged children and young adults in the United States?
Solution
There are many school-aged children and young adults in the United States, and the study was done at only one school, so the sample size is not large enough to reflect the population.
4. Researcher A concludes that most students play video games between four and six hours each week. Researcher B concludes that most students play video games between two and four hours each week. Who is correct?
Solution
Even though the specific data support each researcher’s conclusions, the different results suggest that more data need to be collected before the researchers can reach a conclusion.
5. To reward students for participating in the survey, the researchers gave each student a gift card to a video game store. Would this affect the data if students knew about the reward before the study?
Solution
Yes, people who play games more might be more likely to participate, since they would want the gift card more than a student who does not play video games. This would leave out many students who do not play games at all and skew the data.
Use the following data to answer the next five exercises.
A pair of studies was performed to measure the effectiveness of a new software program designed to help stroke patients regain their problem-solving skills. Patients were asked to use the software program twice a day, once in the morning and once in the evening. The studies observed 200 stroke patients recovering over a period of several weeks. The first study collected the data in Table A. The second study collected the data in Table B.
Table A
| Group | Showed improvement | No improvement | Deterioration |
|---|---|---|---|
| Used program | 142 | 43 | 15 |
| Did not use program | 72 | 110 | 18 |
Table B
| Group | Showed improvement | No improvement | Deterioration |
|---|---|---|---|
| Used program | 105 | 74 | 19 |
| Did not use program | 89 | 99 | 12 |
1. Given what you know, which study is correct?
Solution
There is not enough information given to judge if either one is correct or incorrect.
2. The first study was performed by the company that designed the software program. The second study was performed by the American Medical Association. Which study is more reliable?
Solution
The second study is more reliable. The company has an interest in showing results that favor a higher rate of improvement among patients using its software, so its data may be skewed. The American Medical Association has no stake in the success of the software and so should be objective.
3. Both groups that performed the study concluded that the software works. Is this accurate?
Solution
The software program seems to work because the second study shows that more patients improve while using the software than not. Even though the difference is not as large as that in the first study, the results from the second study are likely more reliable and still show improvement.
4. The company takes the two studies as proof that their software causes mental improvement in stroke patients. Is this a fair statement?
Solution
No, the data suggest the two are correlated, but more studies need to be done to prove that using the software causes improvement in stroke patients.
5. Patients who used the software were also a part of an exercise program whereas patients who did not use the software were not. Does this change the validity of the conclusions from the tables?
Solution
Yes, because we cannot tell if the improvement was due to the software or the exercise; the data is confounded, and a reliable conclusion cannot be drawn. New studies should be performed.
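The comparison behind exercise 3 can be made explicit by turning the counts in Table A and Table B into improvement rates. The Python sketch below computes each rate from its own row total instead of assuming 200 patients per group, because the "Used program" row of Table B as printed sums to 198.

```python
# Counts from Table A and Table B: (showed improvement, no improvement, deterioration).
studies = {
    "Table A": {"used": (142, 43, 15), "not used": (72, 110, 18)},
    "Table B": {"used": (105, 74, 19), "not used": (89, 99, 12)},
}

rates = {}
for study, groups in studies.items():
    for group, counts in groups.items():
        improved, row_total = counts[0], sum(counts)
        rates[(study, group)] = improved / row_total
        print(f"{study}, {group}: {improved}/{row_total} = {rates[(study, group)]:.3f}")
```

In both tables the rate is higher for patients who used the program, but the gap is much smaller in Table B. As exercises 4 and 5 point out, a higher rate alone does not show that the software caused the improvement.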
1. Is a sample size of 1,000 a reliable measure for a population of 5,000?
Solution
Yes, 1,000 represents 20% of the population and should be representative, if the sample is chosen at random.
2. Is a sample of 500 volunteers a reliable measure for a population of 2,500?
Solution
No, even though the sample is large enough, the fact that the sample consists of volunteers makes it a self-selected sample, which is not reliable.
A question on a survey reads: “Do you prefer the delicious taste of Brand X or the taste of Brand Y?” Is this a fair question?
Solution
No, the question is creating undue influence by adding the word “delicious” to describe Brand X. The wording may influence responses.
1. Is a sample size of two representative of a population of five?
Solution
No, even though the sample is a large portion of the population, two responses are not enough to justify any conclusions. Because the population is so small, it would be better to include everyone in the population to get the most accurate data.
2. Is it possible for two well-run experiments with similar sample sizes to get different data?
Solution
Yes, there will most likely be a degree of variation between any two studies, even if they are set up and run the same way. Each study may be affected differently by unknown factors such as location, mood of the subjects, or time of year.
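A small simulation illustrates this point. The Python sketch below draws two simple random samples of the same size from one population and compares their means; the population of 5,000 values between 0 and 12 is entirely hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)  # seeded only so the run is reproducible

# Hypothetical population of 5,000 values (an assumed range of 0 to 12).
population = [rng.uniform(0, 12) for _ in range(5000)]

# Two well-run studies: independent simple random samples of the same size.
sample_1 = rng.sample(population, k=150)
sample_2 = rng.sample(population, k=150)

print(round(mean(population), 2), round(mean(sample_1), 2), round(mean(sample_2), 2))
```

Both samples come from the same population by the same method, yet their means will generally differ from each other and from the population mean. This is ordinary sampling variability, not a flaw in either study.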