Chapter 7: Confidence Intervals
7.3 A Population Proportion
Learning Objectives
By the end of this section, the student should be able to:
- Calculate the error bound for a proportion.
- Calculate the confidence interval of [latex]p[/latex].
- Calculate the sample size [latex]n[/latex].
The error bound for a proportion
During an election year, we see articles in the newspaper that state confidence intervals in terms of proportions or percentages. For example, a poll for a particular candidate running for president might show that the candidate has 40% of the vote within three percentage points (if the sample is large enough). Election polls are often calculated with 95% confidence, so the pollsters would be 95% confident that the true proportion of voters who favored the candidate would be between 0.37 and 0.43: [latex](0.40 - 0.03, 0.40 + 0.03)[/latex].
Investors in the stock market are interested in the true proportion of stocks that go up and down each week. Businesses that sell personal computers are interested in the proportion of households in the United States that own personal computers. Confidence intervals can be calculated for the true proportion of stocks that go up or down each week and for the true proportion of households in the United States that own personal computers.
The procedure to find the confidence interval, the sample size, the error bound, and the confidence level for a proportion is similar to that for the population mean, but the formulas are different.
How do you know you are dealing with a proportion problem? First, the underlying distribution is a binomial distribution. (There is no mention of a mean or average.) If [latex]X[/latex] is a binomial random variable, then [latex]X \sim B(n, p)[/latex] where [latex]n[/latex] is the number of trials and [latex]p[/latex] is the probability of a success. To form a proportion, take [latex]X[/latex], the random variable for the number of successes, and divide it by [latex]n[/latex], the number of trials (or the sample size). The random variable [latex]P^{\prime}[/latex] (read "P prime") is that proportion: [latex]{P}^{\prime }=\frac{X}{n}[/latex].
(Sometimes the random variable is denoted as [latex]\hat{P}[/latex], read "P hat".)
When [latex]n[/latex] is large and [latex]p[/latex] is not close to zero or one, we can use the normal distribution to approximate the binomial. Here [latex]q = 1 - p[/latex] is the probability of a failure.
[latex]X\sim N\left(np,\sqrt{npq}\right)[/latex]
If we divide the random variable, the mean, and the standard deviation by [latex]n[/latex], we get a normal distribution of proportions with [latex]{P}^{\prime}[/latex], called the estimated proportion, as the random variable. (Recall that a proportion is the number of successes divided by [latex]n[/latex].)
[latex]\frac{X}{n}={P}^{\prime } \sim N\left(\frac{np}{n},\frac{\sqrt{npq}}{n}\right)[/latex]
Using algebra to simplify: [latex]\frac{np}{n} = p[/latex]
[latex]\frac{\sqrt{npq}}{n}=\sqrt{\frac{pq}{n}}[/latex]
[latex]P^{\prime}[/latex] follows a normal distribution for proportions: [latex]\frac{X}{n}={P}^{\prime } \sim N\left(p,\sqrt{\frac{pq}{n}}\right)[/latex]
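To see the result [latex]P^{\prime} \sim N\left(p,\sqrt{\frac{pq}{n}}\right)[/latex] in action, the Python sketch below simulates many samples and compares the mean and standard deviation of the simulated sample proportions with [latex]p[/latex] and [latex]\sqrt{\frac{pq}{n}}[/latex]. The values [latex]p = 0.4[/latex] and [latex]n = 100[/latex], the 2,000 repetitions, and the helper name simulate_sample_proportions are arbitrary choices made for illustration; the simulated values will be close to, but not exactly equal to, the theoretical ones.

```python
import math
import random
import statistics


def simulate_sample_proportions(n, p, reps, seed=0):
    """Draw `reps` samples of size n from a population with success
    probability p and return the list of sample proportions p' = x / n."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(reps):
        # x counts the successes in n independent trials, so x ~ B(n, p)
        x = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(x / n)
    return proportions


n, p = 100, 0.4  # illustrative values only
props = simulate_sample_proportions(n, p, reps=2000)

print("mean of simulated p' values:", round(statistics.fmean(props), 4))
print("sd of simulated p' values:  ", round(statistics.stdev(props), 4))
print("theoretical sqrt(pq/n):     ", round(math.sqrt(p * (1 - p) / n), 4))
```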
The confidence interval has the form [latex](p^{\prime} - \text{EBP}, p^{\prime} + \text{EBP})[/latex], where EBP is the error bound for the proportion.
[latex]p^{\prime} = \frac{x}{n}[/latex]
[latex]p^{\prime} = \text{the estimated proportion of successes}[/latex] ([latex]p^{\prime}[/latex] is a point estimate for [latex]p[/latex], the true proportion.)
[latex]x = \text{the number of successes}[/latex]
[latex]n = \text{the size of the sample}[/latex]
The error bound for a proportion is [latex]\text{EBP}=\left({z}_{\frac{\alpha }{2}}\right)\left(\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}\right)[/latex] where [latex]q^{\prime} = 1 - p^{\prime}[/latex].
This formula is similar to the error bound formula for a mean, except that the "appropriate standard deviation" is different. For a mean, when the population standard deviation is known, the appropriate standard deviation that we use is [latex]\frac{\sigma }{\sqrt{n}}[/latex]. For a proportion, the appropriate standard deviation is [latex]\sqrt{\frac{pq}{n}}[/latex].
However, in the error bound formula, we use [latex]\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}[/latex] as the standard deviation, instead of [latex]\sqrt{\frac{pq}{n}}[/latex].
In the error bound formula, the sample proportions [latex]p^{\prime}[/latex] and [latex]q^{\prime}[/latex] are estimates of the unknown population proportions [latex]p[/latex] and [latex]q[/latex]; we use them because [latex]p[/latex] and [latex]q[/latex] are not known. Both are calculated from the data: [latex]p^{\prime}[/latex] is the estimated proportion of successes, and [latex]q^{\prime}[/latex] is the estimated proportion of failures.
Confidence Interval for a Proportion
To calculate the confidence interval, you must find [latex]{p}^{\prime}[/latex], [latex]{q}^{\prime}[/latex], and [latex]\text{EBP}[/latex].
[latex]{p}^{\prime }=\frac{x}{n}[/latex]
[latex]{q}^{\prime}= 1 - {p}^{\prime}[/latex]
[latex]\text{EBP}=\left({z}_{\frac{\alpha }{2}}\right)\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}[/latex]. Remember, like we did in Section 7.1, use the probability table found in the Back Matter - Statistics Tables to find the z-score.
The confidence interval for the true binomial population proportion is [latex](p^{\prime} - \text{EBP}, p^{\prime} + \text{EBP})[/latex].
The confidence interval can be used only if the number of successes [latex]n{p}^{\prime}[/latex] and the number of failures [latex]n{q}^{\prime}[/latex] are both greater than five.
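The steps above can be collected into one function. The Python sketch below is an illustration, not a prescribed procedure: the name proportion_ci is hypothetical, it assumes Python 3.8 or later for statistics.NormalDist, and it computes [latex]z_{\frac{\alpha}{2}}[/latex] exactly instead of reading it from a table, so results can differ from hand calculations in the third or fourth decimal place. It refuses to build an interval when [latex]np^{\prime}[/latex] or [latex]nq^{\prime}[/latex] is five or less.

```python
import math
from statistics import NormalDist


def proportion_ci(x, n, cl):
    """Return (p', EBP, lower, upper) for a confidence interval for a
    population proportion, using the normal approximation."""
    p_prime = x / n
    q_prime = 1 - p_prime
    # Condition from the text: n p' and n q' must both exceed five.
    if n * p_prime <= 5 or n * q_prime <= 5:
        raise ValueError("n*p' and n*q' must both be greater than five")
    alpha = 1 - cl
    # z_{alpha/2} has area alpha/2 to its right, i.e. 1 - alpha/2 to its left.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ebp = z * math.sqrt(p_prime * q_prime / n)
    return p_prime, ebp, p_prime - ebp, p_prime + ebp


# Cell phone survey from the example below: 421 successes out of 500, 95% CL.
p_prime, ebp, lower, upper = proportion_ci(421, 500, 0.95)
print(f"p' = {p_prime:.3f}, EBP = {ebp:.3f}, CI = ({lower:.5f}, {upper:.5f})")
```

With these inputs the function should reproduce the interval (0.81003, 0.87397) reported in the example.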
Note
For the normal distribution of proportions, the z-score formula is as follows:
If [latex]{P}^{\prime } \sim N\left(p,\sqrt{\frac{pq}{n}}\right)[/latex] then the z-score formula is [latex]z=\frac{{p}^{\prime }-p}{\sqrt{\frac{pq}{n}}}[/latex]
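The z-score formula in the Note can be evaluated directly. In the Python sketch below, the function name proportion_z_score and the comparison value [latex]p = 0.80[/latex] are hypothetical, chosen only to illustrate the arithmetic. Note that the formula uses the population values [latex]p[/latex] and [latex]q[/latex], not [latex]p^{\prime}[/latex] and [latex]q^{\prime}[/latex].

```python
import math


def proportion_z_score(p_prime, p, n):
    """Number of standard deviations that the sample proportion p' lies
    from p, using the standard deviation sqrt(pq/n) of P'."""
    q = 1 - p
    return (p_prime - p) / math.sqrt(p * q / n)


# Illustrative values only: p' = 0.842 from a sample of n = 500, compared
# with a hypothetical population proportion p = 0.80.
print(round(proportion_z_score(0.842, 0.80, 500), 2))
```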
Example
Suppose that a market research firm is hired to estimate the percent of adults living in a large city who have cell phones. Five hundred randomly selected adult residents in this city are surveyed to determine whether they have cell phones. Of the 500 people surveyed, 421 responded yes - they own cell phones. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of adult residents of this city who have cell phones.
Solution
Let X = the number of people in the sample who have cell phones. X is binomial. [latex]X \sim B\left(500,\frac{421}{500}\right)[/latex].
To calculate the confidence interval, you must find [latex]{p}^{\prime}[/latex], [latex]{q}^{\prime}[/latex], and [latex]\text{EBP}[/latex].
[latex]n = 500[/latex]
[latex]x = \text{the number of successes} = 421[/latex]
[latex]{p}^{\prime }=\frac{x}{n}=\frac{421}{500}=0.842[/latex]
[latex]{p}^{\prime} = 0.842[/latex] is the sample proportion; this is the point estimate of the population proportion.
[latex]{q}^{\prime}= 1 - {p}^{\prime}= 1 - 0.842 = 0.158[/latex]
Since [latex]\text{CL} = 0.95[/latex], then [latex]\alpha = 1 - \text{CL} = 1 - 0.95 = 0.05[/latex] and [latex]\frac{\alpha }{2} = 0.025[/latex].
Then [latex]{z}_{\frac{\alpha }{2}}={z}_{0.025}=1.96[/latex]
[latex]\text{EBP}=\left({z}_{\frac{\alpha }{2}}\right)\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}=\left(1.96\right)\sqrt{\frac{\left(0.842\right)\left(0.158\right)}{500}}=0.032[/latex]
[latex]p^{\prime} - \text{EBP} =0.842 - 0.032 = 0.81[/latex]
[latex]{p}^{\prime }+\text{EBP} = 0.842 + 0.032 = 0.874[/latex]
The confidence interval for the true binomial population proportion is [latex](p^{\prime} - \text{EBP}, p^{\prime} + \text{EBP}) = (0.810, 0.874)[/latex].
Interpretation: We estimate with 95% confidence that between 81% and 87.4% of all adult residents of this city have cell phones.
Explanation of 95% Confidence Level
Ninety-five percent of the confidence intervals constructed in this way would contain the true value for the population proportion of all adult residents of this city who have cell phones. The confidence interval is (0.81003, 0.87397).
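The long-run meaning of the confidence level can be illustrated by simulation. The Python sketch below assumes a hypothetical true proportion [latex]p = 0.84[/latex] (in practice [latex]p[/latex] is unknown), repeats a survey of 500 adults many times, builds the interval [latex]p^{\prime} \pm \text{EBP}[/latex] from each sample, and reports the fraction of intervals that capture [latex]p[/latex]. The helper names wald_interval and coverage are illustrative. The fraction should land near 0.95 but not exactly on it, because the normal approximation and the finite number of repetitions both introduce some error.

```python
import math
import random
from statistics import NormalDist


def wald_interval(x, n, z):
    """(p' - EBP, p' + EBP) with EBP = z * sqrt(p'q'/n)."""
    p_prime = x / n
    ebp = z * math.sqrt(p_prime * (1 - p_prime) / n)
    return p_prime - ebp, p_prime + ebp


def coverage(p, n, cl, reps, seed=0):
    """Fraction of simulated intervals that contain the true proportion p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)
    hits = 0
    for _ in range(reps):
        # One simulated survey: count successes among n respondents.
        x = sum(1 for _ in range(n) if rng.random() < p)
        lower, upper = wald_interval(x, n, z)
        if lower <= p <= upper:
            hits += 1
    return hits / reps


# Hypothetical true proportion 0.84; repeat the survey of n = 500 many times.
result = coverage(p=0.84, n=500, cl=0.95, reps=2000)
print(result)
```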
Your Turn!
Suppose 250 randomly selected people are surveyed to determine if they own a tablet. Of the 250 surveyed, 98 reported owning a tablet. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of people who own tablets.
Solution
(0.3315, 0.4525)
Example
For a class project, a political science student at a large university wants to estimate the percent of students who are registered voters. He surveys 500 students and finds that 300 are registered voters. Compute a 90% confidence interval for the true percent of students who are registered voters, and interpret the confidence interval.
Solution
[latex]x = 300[/latex] and [latex]n = 500[/latex]
[latex]{p}^{\prime }=\frac{x}{n}=\frac{300}{500}=0.600[/latex]
[latex]{q}^{\prime }=1-{p}^{\prime }=1-0.600=0.400[/latex]
Since [latex]\text{CL} = 0.90[/latex], then [latex]\alpha = 1 - \text{CL} = 1 - 0.90 = 0.10[/latex] and [latex]\frac{\alpha }{2} = 0.05[/latex].
[latex]{z}_{\frac{\alpha }{2}} = z_{0.05} = 1.645[/latex]
[latex]\text{EBP}=\left({z}_{\frac{\alpha }{2}}\right)\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}=\left(1.645\right)\sqrt{\frac{\left(0.60\right)\left(0.40\right)}{500}}=0.036[/latex]
[latex]{p}^{\prime } - \text{EBP}=0.60-0.036=0.564[/latex]
[latex]{p}^{\prime }+ \text{EBP}=0.60+0.036=0.636[/latex]
The confidence interval for the true binomial population proportion is [latex](p^{\prime} - \text{EBP}, p^{\prime} + \text{EBP}) = (0.564,0.636)[/latex].
Interpretation
- We estimate with 90% confidence that the true percent of all students who are registered voters is between 56.4% and 63.6%.
- Alternate Wording: We estimate with 90% confidence that between 56.4% and 63.6% of ALL students are registered voters.
Explanation of 90% Confidence Level
Ninety percent of all confidence intervals constructed in this way contain the true value for the population percent of students who are registered voters.
Your Turn!
A student polls her school to see if students in the school district are for or against the new legislation regarding school uniforms. She surveys 600 students and finds that 480 are against the new legislation.
- Compute a 90% confidence interval for the true percent of students who are against the new legislation, and interpret the confidence interval.
- In a sample of 300 students, 68% said they own an iPod and a smartphone. Compute a 97% confidence interval for the true percent of students who own an iPod and a smartphone.
Solution
(0.7731, 0.8269); We estimate with 90% confidence that the true percent of all students in the district who are against the new legislation is between 77.31% and 82.69%.
In this sample of 300 students, 68% own an iPod and a smartphone.
[latex]{p}^{\prime }=0.68[/latex]
[latex]{q}^{\prime }=1 - {p}^{\prime }=1 - 0.68=0.32[/latex]
Since [latex]\text{CL} = 0.97[/latex], we know [latex]\alpha = 1 - 0.97 = 0.03[/latex] and [latex]\frac{\alpha }{2} = 0.015[/latex].
The area to the right of [latex]z_{0.015}[/latex] is 0.015, and the area to the left of [latex]z_{0.015}[/latex] is [latex]1 - 0.015 = 0.985[/latex].
[latex]{z}_{0.015}=2.17[/latex]
[latex]\text{EBP}=\left({z}_{\frac{\alpha }{2}}\right)\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}=2.17\sqrt{\frac{0.68\left(0.32\right)}{300}}\approx 0.0584[/latex]
[latex]p^{\prime} - \text{EBP} = 0.68 - 0.0584 = 0.6216[/latex]
[latex]p^{\prime} + \text{EBP} = 0.68 + 0.0584 = 0.7384[/latex]
We are 97% confident that the true proportion of all students who own an iPod and a smartphone is between 0.6216 and 0.7384.
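Instead of a table, the critical values [latex]z_{\frac{\alpha}{2}}[/latex] used in this section can be computed from the inverse of the standard normal cumulative distribution function. The Python sketch below assumes Python 3.8 or later, and the helper name z_critical is hypothetical. It finds the z-score with area [latex]\frac{\alpha}{2}[/latex] to its right (area [latex]1 - \frac{\alpha}{2}[/latex] to its left), which, after rounding, reproduces 1.645, 1.96, 2.054, and 2.17 for confidence levels of 90%, 95%, 96%, and 97%.

```python
from statistics import NormalDist


def z_critical(cl):
    """z_{alpha/2}: the z-score with area alpha/2 to its right,
    where alpha = 1 - cl."""
    alpha = 1 - cl
    return NormalDist().inv_cdf(1 - alpha / 2)


for cl in (0.90, 0.95, 0.96, 0.97):
    print(f"CL = {cl:.2f}: z = {z_critical(cl):.3f}")
```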
“Plus Four” Confidence Interval for p
There is a certain amount of error introduced into the process of calculating a confidence interval for a proportion. Because we do not know the true proportion for the population, we are forced to use point estimates to calculate the appropriate standard deviation of the sampling distribution. Studies have shown that the resulting estimation of the standard deviation can be flawed.
Fortunately, there is a simple adjustment that allows us to produce more accurate confidence intervals. We simply pretend that we have four additional observations. Two of these observations are successes and two are failures. The new sample size, then, is [latex]n + 4[/latex], and the new count of successes is [latex]x + 2[/latex].
Computer studies have demonstrated the effectiveness of this method. It should be used when the confidence level desired is at least 90% and the sample size is at least ten.
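The plus-four adjustment changes only the inputs to the usual formulas. The Python sketch below (hypothetical helper name plus_four_ci; assumes Python 3.8 or later) adds two successes and two failures and then computes [latex]p^{\prime} \pm \text{EBP}[/latex] from the adjusted counts. It does not enforce the guideline above that the confidence level be at least 90% and the sample size at least ten; that check is left to the user.

```python
import math
from statistics import NormalDist


def plus_four_ci(x, n, cl):
    """Plus-four interval: add two successes and two failures, then apply
    the usual p' +/- EBP formula to the adjusted counts."""
    x_adj, n_adj = x + 2, n + 4
    p_prime = x_adj / n_adj
    q_prime = 1 - p_prime
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)
    ebp = z * math.sqrt(p_prime * q_prime / n_adj)
    return p_prime - ebp, p_prime + ebp


# Smoking example below: 6 of 25 students, 95% confidence.
lower, upper = plus_four_ci(6, 25, 0.95)
print(f"({lower:.3f}, {upper:.3f})")
```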
Example
A random sample of 25 statistics students was asked: “Have you smoked a cigarette in the past week?” Six students reported smoking within the past week. Use the “plus-four” method to find a 95% confidence interval for the true proportion of statistics students who smoke.
Solution
Six students out of 25 reported smoking within the past week, so [latex]x = 6[/latex] and [latex]n = 25[/latex]. Because we are using the “plus-four” method, we will use [latex]x = 6 + 2 = 8[/latex] and [latex]n = 25 + 4 = 29[/latex].
[latex]{p}^{\prime }=\frac{x}{n}=\frac{8}{29}\approx 0.276[/latex]
[latex]{q}^{\prime }=1 - {p}^{\prime }=1 - 0.276=0.724[/latex]
Since [latex]\text{CL} = 0.95[/latex], we know [latex]\alpha = 1 - 0.95 = 0.05[/latex] and [latex]\frac{\alpha}{2} = 0.025[/latex].
[latex]{z}_{0.025}=1.96[/latex]
[latex]\text{EBP}=\left({z}_{\frac{\alpha }{2}}\right)\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}=\left(1.96\right)\sqrt{\frac{0.276\left(0.724\right)}{29}}\approx 0.163[/latex]
[latex]p^{\prime} - \text{EBP} = 0.276 - 0.163 = 0.113[/latex]
[latex]p^{\prime} + \text{EBP} = 0.276 + 0.163 = 0.439[/latex]
We are 95% confident that the true proportion of all statistics students who smoke cigarettes is between 0.113 and 0.439.
Your Turn!
Out of a random sample of 65 freshmen at State University, 31 students have declared a major. Use the “plus-four” method to find a 96% confidence interval for the true proportion of freshmen at State University who have declared a major.
Solution
Using “plus four,” we have [latex]x = 31 + 2 = 33[/latex] and [latex]n = 65 + 4 = 69[/latex].
[latex]{p}^{\prime}=\frac{33}{69}\approx 0.478[/latex]
[latex]{q}^{\prime }=1 - {p}^{\prime }=1 - 0.478=0.522[/latex]
Since [latex]\text{CL} = 0.96[/latex], we know [latex]\alpha = 1 - 0.96 = 0.04[/latex] and [latex]\frac{\alpha }{2} = 0.02[/latex].
[latex]{z}_{0.02}=2.054[/latex]
[latex]\text{EBP}=\left({z}_{\frac{\alpha }{2}}\right)\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}=\left(2.054\right)\left(\sqrt{\frac{\left(0.478\right)\left(0.522\right)}{69}}\right) \approx 0.124[/latex]
[latex]p^{\prime} - \text{EBP} = 0.478 - 0.124 = 0.354[/latex]
[latex]p^{\prime} + \text{EBP} = 0.478 + 0.124 = 0.602[/latex]
We are 96% confident that between 35.4% and 60.2% of all freshmen at State U have declared a major.
Example
The Berkman Center for Internet & Society at Harvard recently conducted a study analyzing the privacy management habits of teen internet users. In a group of 50 teens, 13 reported having more than 500 friends on Facebook. Use the “plus four” method to find a 90% confidence interval for the true proportion of teens who would report having more than 500 Facebook friends.
Solution
Using “plus-four,” we have [latex]x = 13 + 2 = 15[/latex] and [latex]n = 50 + 4 = 54[/latex].
[latex]{p}^{\prime}=\frac{15}{54}\approx 0.278[/latex]
[latex]{q}^{\prime}=1-{p}^{\prime}=1-0.278=0.722[/latex]
Since [latex]\text{CL} = 0.90[/latex], we know [latex]\alpha = 1 - 0.90 = 0.10[/latex] and [latex]\frac{\alpha }{2} = 0.05[/latex].
[latex]{z}_{0.05}=1.645[/latex]
[latex]\text{EBP}=\left({z}_{\frac{\alpha }{2}}\right)\left(\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}\right)=\left(1.645\right)\left(\sqrt{\frac{\left(0.278\right)\left(0.722\right)}{54}}\right)\approx 0.100[/latex]
[latex]p^{\prime} - \text{EBP} = 0.278 - 0.100 = 0.178[/latex]
[latex]p^{\prime} + \text{EBP} = 0.278 + 0.100 = 0.378[/latex]
We are 90% confident that between 17.8% and 37.8% of all teens would report having more than 500 friends on Facebook.
Your Turn!
The Berkman Center Study referenced in the previous example talked to teens in smaller focus groups, but also interviewed additional teens over the phone. When the study was complete, 588 teens had answered the question about their Facebook friends with 159 saying that they have more than 500 friends. Use the “plus-four” method to find a 90% confidence interval for the true proportion of teens that would report having more than 500 Facebook friends based on this larger sample. Compare the results to those in the previous example.
Solution
Using “plus-four,” we have [latex]x = 159 + 2 = 161[/latex] and [latex]n = 588 + 4 = 592[/latex].
[latex]{p}^{\prime }= \frac{161}{592} \approx 0.272[/latex]
[latex]{q}^{\prime }=1 - {p}^{\prime }=1 - 0.272=0.728[/latex]
Since [latex]\text{CL} = 0.90[/latex], we know [latex]\alpha = 1 - 0.90 = 0.10[/latex] and [latex]\frac{\alpha }{2} = 0.05[/latex].
[latex]\text{EBP}=\left({z}_{\frac{\alpha }{2}}\right)\left(\sqrt{\frac{{p}^{\prime}{q}^{\prime}}{n}}\right)=\left(1.645\right)\left(\sqrt{\frac{\left(0.272\right)\left(0.728\right)}{592}}\right)\approx 0.030[/latex]
[latex]p^{\prime} - \text{EBP} = 0.272 - 0.030 = 0.242[/latex]
[latex]p^{\prime} + \text{EBP} = 0.272 + 0.030 = 0.302[/latex]
We are 90% confident that between 24.2% and 30.2% of all teens would report having more than 500 friends on Facebook.
Conclusion: The confidence interval for the larger sample is narrower than the interval from the previous example. At the same confidence level, larger samples generally yield more precise (narrower) confidence intervals than smaller samples, because [latex]n[/latex] appears in the denominator of the error bound. The “plus four” method has a greater impact on the smaller sample. It shifts the point estimate from 0.26 ([latex]\frac{13}{50}[/latex]) to 0.278 ([latex]\frac{15}{54}[/latex]). It has a smaller impact on the EBP, changing it from 0.102 to 0.100. In the larger sample, the point estimate undergoes a smaller shift: from 0.270 [latex](\frac{159}{588})[/latex] to 0.272 [latex](\frac{161}{592})[/latex]. This illustrates that the plus-four method has the greatest impact on smaller samples.
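The numbers quoted in this conclusion can be reproduced with a few lines of Python. In the sketch below, the helper estimate_and_ebp is hypothetical; it computes [latex]p^{\prime}[/latex] and the 90% EBP for the original counts and for the plus-four counts of both samples, using the exact [latex]z_{0.05} \approx 1.6449[/latex] rather than the table value. The small sample should move from 0.260 to 0.278 (EBP 0.102 to 0.100), and the large sample from 0.270 to 0.272, with the EBP staying at about 0.030.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # z_{0.05} for a 90% confidence level


def estimate_and_ebp(x, n, z=Z90):
    """Point estimate p' = x/n and its error bound z * sqrt(p'q'/n)."""
    p_prime = x / n
    return p_prime, z * math.sqrt(p_prime * (1 - p_prime) / n)


for label, x, n in (("small sample", 13, 50), ("large sample", 159, 588)):
    p_std, ebp_std = estimate_and_ebp(x, n)         # standard counts
    p_p4, ebp_p4 = estimate_and_ebp(x + 2, n + 4)   # plus-four counts
    print(f"{label}: p' {p_std:.3f} -> {p_p4:.3f}, "
          f"EBP {ebp_std:.3f} -> {ebp_p4:.3f}")
```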
The Sample Size n
If researchers desire a specific margin of error, then they can use the error bound formula to calculate the required sample size.
The error bound formula for a population proportion is
- [latex]\text{EBP}=\left({z}_{\frac{\alpha }{2}}\right)\left(\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}\right)[/latex]
- Solving for [latex]n[/latex] gives you an equation for the sample size.
- [latex]n = \frac{ \left( z_{\frac{\alpha}{2}} \right)^{2} \left( {p}^{\prime} {q}^{\prime} \right)}{\text{EBP}^{2}}[/latex]
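The sample size formula can be wrapped in a small Python function. This is a sketch that assumes Python 3.8 or later: the helper name sample_size_for_proportion is hypothetical, z is computed with statistics.NormalDist rather than read from a table (so it carries more decimal places than 1.645 or 1.96), and the default [latex]p^{\prime} = 0.5[/latex] follows the conservative choice explained in the example that follows. The result is always rounded up, because rounding down would give a margin of error slightly larger than the one requested.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(ebp, cl, p_prime=0.5):
    """Smallest whole n with n >= z^2 p'q' / EBP^2.
    p' = 0.5 is the conservative default when no estimate is available."""
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)
    n = z ** 2 * p_prime * (1 - p_prime) / ebp ** 2
    return math.ceil(n)  # always round up to the next whole person


print(sample_size_for_proportion(0.03, 0.90))
print(sample_size_for_proportion(0.05, 0.90))
```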
Example
Suppose a mobile phone company wants to determine the current percentage of customers aged 50+ who use text messaging on their cell phones. How many customers aged 50+ should the company survey in order to be 90% confident that the estimated (sample) proportion is within three percentage points of the true population proportion of customers aged 50+ who use text messaging on their cell phones?
Solution
From the problem, we know that [latex]\text{EBP} = 0.03[/latex] [latex](3 \%=0.03)[/latex] and [latex]{z}_{\frac{\alpha }{2}} = z_{0.05} = 1.645[/latex] because the confidence level is 90%.
However, in order to find [latex]n[/latex], we need to know the estimated (sample) proportion [latex]p^{\prime}[/latex]. Remember that [latex]q^{\prime} = 1 - p^{\prime}[/latex]. But we do not know [latex]p^{\prime}[/latex] yet. Since we multiply [latex]p^{\prime}[/latex] and [latex]q^{\prime}[/latex] together, we set them both equal to 0.5 because [latex]p^{\prime} q^{\prime} = (0.5)(0.5) = 0.25[/latex] is the largest possible product. (Try other products: [latex](0.6)(0.4) = 0.24[/latex]; [latex](0.3)(0.7) = 0.21[/latex]; [latex](0.2)(0.8) = 0.16[/latex]; and so on.) The largest possible product gives us the largest [latex]n[/latex], which guarantees a sample large enough that we can be 90% confident that we are within three percentage points of the true population proportion. To calculate the sample size [latex]n[/latex], use the formula and make the substitutions.
[latex]n = \frac{{z}^{2} {p}^{\prime} {q}^{\prime }}{\text{EBP}^{2}}[/latex] gives [latex]n=\frac{{1.645}^{2}\left(0.5\right)\left(0.5\right)}{{0.03}^{2}}=751.7[/latex]
Round the answer to the next higher value. The sample size should be 752 cell phone customers aged 50+ in order to be 90% confident that the estimated (sample) proportion is within three percentage points of the true population proportion of all customers aged 50+ who use text messaging on their cell phones.
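The claim that [latex]p^{\prime} = 0.5[/latex] gives the largest product can be checked numerically. The short Python sketch below evaluates [latex]p^{\prime}q^{\prime} = p^{\prime}(1 - p^{\prime})[/latex] on a grid of candidate values (the grid spacing of 0.1 is an arbitrary choice). Algebraically, [latex]p^{\prime}(1 - p^{\prime}) = 0.25 - (p^{\prime} - 0.5)^{2}[/latex], which can never exceed 0.25.

```python
# Evaluate p'q' = p'(1 - p') on a grid of candidate values of p'.
candidates = [k / 10 for k in range(1, 10)]
products = {p: round(p * (1 - p), 4) for p in candidates}

for p, prod in products.items():
    print(f"p' = {p:.1f}: p'q' = {prod}")

# The candidate with the largest product is the conservative choice.
best = max(products, key=products.get)
print("largest product at p' =", best)
```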
Your Turn!
Suppose an internet marketing company wants to determine the current percentage of customers who click on ads on their smartphones. How many customers should the company survey in order to be 90% confident that the estimated proportion is within five percentage points of the true population proportion of customers who click on ads on their smartphones?
Solution
271 customers should be surveyed.
Videos
Below are helpful videos for the content covered in this Section. Videos are provided from YouTube.
- Finding a Confidence Interval of a Population Proportion
- Lesson: Calculate a Confidence Interval for a Population Proportion
- Calculate a Confidence Interval for a Population Proportion (Voters)
- Calculate a Confidence Interval for a Population Proportion (Basic)
- Calculate a Confidence Interval for a Population Proportion (Plus Four Method)
- Determine a Sample Size of a Population Proportion
Section 7.3 Review
Some statistical measures, like many survey questions, measure qualitative rather than quantitative data. In this case, the population parameter being estimated is a proportion. It is possible to create a confidence interval for the true population proportion following procedures similar to those used in creating confidence intervals for population means. The formulas are slightly different, but they follow the same reasoning.
Let [latex]p^{\prime}[/latex] represent the sample proportion, [latex]\frac{x}{n}[/latex], where [latex]x[/latex] represents the number of successes and [latex]n[/latex] represents the sample size. Let [latex]q^{\prime} = 1 - p^{\prime}[/latex]. Then the confidence interval for a population proportion is given by the following formula: [latex](\text{lower bound, upper bound}) = \left({p}^{\prime}-\text{EBP},{p}^{\prime}+\text{EBP}\right)= \left({p}^{\prime}-z_{\frac{\alpha}{2}}\sqrt{\frac{{p}^{\prime}{q}^{\prime}}{n}}, {p}^{\prime}+z_{\frac{\alpha}{2}}\sqrt{\frac{{p}^{\prime}{q}^{\prime}}{n}}\right)[/latex]
The “plus four” method for calculating confidence intervals is an attempt to balance the error introduced by using estimates of the population proportion when calculating the standard deviation of the sampling distribution. Simply imagine four additional trials in the study; two are successes and two are failures. Calculate [latex]{p}^{\prime }=\frac{x+2}{n+4}[/latex], and proceed to find the confidence interval. When sample sizes are small, this method has been demonstrated to provide more accurate confidence intervals than the standard formula used for larger samples.
Formula Review
- [latex]p^{\prime}= \frac{x}{n}[/latex] where [latex]x[/latex] represents the number of successes and [latex]n[/latex] represents the sample size. The variable [latex]p^{\prime}[/latex] is the sample proportion and serves as the point estimate for the true population proportion.
- [latex]q^{\prime} = 1 - p^{\prime}[/latex]
- [latex]{P}^{\prime }\sim N\left(p,\sqrt{\frac{pq}{n}}\right)[/latex] The random variable [latex]P^{\prime}[/latex] is a binomial count divided by [latex]n[/latex]; when [latex]n[/latex] is large, its distribution can be approximated with the normal distribution shown here.
- [latex]\text{EBP} =[/latex] the error bound for a proportion = [latex]{z}_{\frac{\alpha }{2}}\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}[/latex]
- Confidence interval for a proportion:
- [latex](\text{lower bound, upper bound}) = \left({p}^{\prime}- \text{EBP}, \text{ }{p}^{\prime}+\text{EBP}\right)=\left({p}^{\prime}-z_{\frac{\alpha}{2}}\sqrt{\frac{{p}^{\prime}{q}^{\prime}}{n}},\text{ } {p}^{\prime}+z_{\frac{\alpha}{2}}\sqrt{\frac{{p}^{\prime}{q}^{\prime}}{n}}\right)[/latex]
- [latex]n=\frac{\left({z}_{\frac{\alpha}{2}}\right)^{2}{p}^{\prime}{q}^{\prime}}{\text{EBP}^{2}}[/latex] provides the number of participants needed to estimate the population proportion with confidence [latex]1 - \alpha[/latex] and margin of error [latex]\text{EBP}[/latex].
- Use the normal distribution for a single population proportion [latex]p^{\prime}=\frac{x}{n}[/latex].
- [latex]\text{EBP}=\left({z}_{\frac{\alpha }{2}}\right)\sqrt{\frac{p^{\prime} q^{\prime}}{n}}; p^{\prime} + q^{\prime} =1[/latex]
- The confidence interval has the format [latex](p^{\prime} – \text{EBP}, p^{\prime} + \text{EBP})[/latex].
- [latex]\overline{x}[/latex] is a point estimate for [latex]\mu[/latex]
- [latex]p^{\prime}[/latex] is a point estimate for [latex]p[/latex]
- [latex]s[/latex] is a point estimate for [latex]\sigma[/latex]
Section 7.3 Practice
Marketing companies are interested in knowing the population percent of women who make the majority of household purchasing decisions.
- When designing a study to determine this population proportion, what is the minimum number you would need to survey to be 90% confident that the population proportion is estimated to within 0.05?
- If it were later determined that it was important to be more than 90% confident and a new survey were commissioned, how would it affect the minimum number you need to survey? Why?
Suppose the marketing company did do a survey. They randomly surveyed 200 households and found that in 120 of them, the woman made the majority of the purchasing decisions. We are interested in the population proportion of households where women make the majority of the purchasing decisions.
- Identify the following:
- [latex]x =\underline{\hspace{2cm}}[/latex]
- [latex]n =\underline{\hspace{2cm}}[/latex]
- [latex]p^{\prime} =\underline{\hspace{2cm}}[/latex]
- Define the random variables [latex]X[/latex] and [latex]P^{\prime}[/latex] in words.
- Which distribution should you use for this problem?
- Construct a 95% confidence interval for the population proportion of households where the women make the majority of the purchasing decisions. State the confidence interval, sketch the graph, and calculate the error bound.
- List two difficulties the company might have in obtaining random results, if this survey were done by email.
Of 1,050 randomly selected adults, 360 identified themselves as manual laborers, 280 identified themselves as non-manual wage earners, 250 identified themselves as mid-level managers, and 160 identified themselves as executives. In the survey, 82% of manual laborers preferred trucks, 62% of non-manual wage earners preferred trucks, 54% of mid-level managers preferred trucks, and 26% of executives preferred trucks.
- We are interested in finding the 95% confidence interval for the percent of executives who prefer trucks. Define random variables [latex]X[/latex] and [latex]P^{\prime}[/latex] in words.
- Which distribution should you use for this problem?
- Construct a 95% confidence interval. State the confidence interval, sketch the graph, and calculate the error bound.
- Suppose we want to lower the sampling error. What is one way to accomplish that?
- The sampling error given in the survey is [latex]\pm 2 \%[/latex]. Explain what the [latex]\pm 2 \%[/latex] means.
A poll of 1,200 voters asked what the most significant issue was in the upcoming election. Sixty-five percent answered the economy. We are interested in the population proportion of voters who feel the economy is the most important.
- Define the random variable [latex]X[/latex] in words.
- Define the random variable [latex]P^{\prime}[/latex] in words.
- Which distribution should you use for this problem?
- Construct a 90% confidence interval, and state the confidence interval and the error bound.
- What would happen to the confidence interval if the level of confidence were 95%?
The Ice Chalet offers dozens of different beginning ice-skating classes. All of the class names are put into a bucket. The 5 P.M., Monday night, ages 8 to 12, beginning ice-skating class was picked. In that class were 64 girls and 16 boys. Suppose that we are interested in the true proportion of girls, ages 8 to 12, in all beginning ice-skating classes at the Ice Chalet. Assume that the children in the selected class are a random sample of the population.
- What is being counted?
- In words, define the random variable X.
- Calculate the following:
- [latex]x = \underline{\hspace{2cm}}[/latex]
- [latex]n = \underline{\hspace{2cm}}[/latex]
- [latex]p^{\prime} = \underline{\hspace{2cm}}[/latex]
- State the estimated distribution of [latex]X[/latex]. [latex]X\sim \underline{\hspace{2cm}}[/latex]
- Define a new random variable [latex]P^{\prime}[/latex]. What is [latex]p^{\prime}[/latex] estimating?
- In words, define the random variable [latex]P^{\prime}[/latex].
- State the estimated distribution of [latex]P^{\prime}[/latex].
- Construct a 92% confidence interval for the true proportion of girls in the ages 8 to 12 beginning ice-skating classes at the Ice Chalet.
- How much area is in both tails (combined)?
- How much area is in each tail?
- Calculate the following:
- lower limit
- upper limit
- error bound
- The 92% confidence interval is [latex]\underline{\hspace{2cm}}[/latex].
- Fill in the blanks on the graph with the areas, upper and lower limits of the confidence interval, and the sample proportion.
- In one complete sentence, explain what the interval means.
- Using the same [latex]p^{\prime}[/latex] and level of confidence, suppose that [latex]n[/latex] were increased to 100. Would the error bound become larger or smaller? How do you know?
- Using the same [latex]p^{\prime}[/latex] and [latex]n = 80[/latex], how would the error bound change if the confidence level were increased to 98%? Why?
- If you decrease the allowable error bound, why would the minimum sample size increase (keeping the same level of confidence)?
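The image below may help with confidence intervals. It shows a normal curve with its peak at [latex]\overline{x}[/latex] and the points [latex]\overline{x}-\text{EBM}[/latex] and [latex]\overline{x}+\text{EBM}[/latex] labeled; each unshaded tail has area [latex]\frac{\alpha }{2}[/latex]. For a proportion, read [latex]p^{\prime}[/latex] in place of [latex]\overline{x}[/latex] and [latex]\text{EBP}[/latex] in place of [latex]\text{EBM}[/latex].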

Insurance companies are interested in knowing the population percentage of drivers who always buckle up before riding in a car.
- When designing a study to determine this population proportion, what is the minimum number you would need to survey to be 95% confident that the population proportion is estimated to within 0.03?
- If it were later determined that it was important to be more than 95% confident and a new survey was commissioned, how would that affect the minimum number you would need to survey? Why?
Solution
- 1,068
- The sample size would need to be increased since the critical value increases as the confidence level increases.
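The sample-size answer can be checked with a short Python sketch. It assumes [latex]p^{\prime} = q^{\prime} = 0.5[/latex], the conservative choice when no earlier estimate is available (the problem gives none), and uses the standard library's `statistics.NormalDist` for the critical value instead of a table. `min_sample_size` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def min_sample_size(ebp, cl, p_prime=0.5):
    """Smallest n with z * sqrt(p'q'/n) <= ebp.

    p_prime = 0.5 is an assumed conservative default: it makes p'q' as
    large as possible, so the resulting n works for any true proportion.
    """
    alpha = 1 - cl
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    q_prime = 1 - p_prime
    n = z**2 * p_prime * q_prime / ebp**2
    return math.ceil(n)                       # always round up, never to nearest

n95 = min_sample_size(0.03, 0.95)
n99 = min_sample_size(0.03, 0.99)
print(n95, n99)
```

Raising the confidence level raises [latex]z_{\frac{\alpha}{2}}[/latex], which raises the required [latex]n[/latex], matching the second answer above.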
Suppose that the insurance companies did do a survey. They randomly surveyed 400 drivers and found that 320 claimed they always buckle up. We are interested in the population proportion of drivers who claim they always buckle up.
- Identify the following:
- [latex]x = \underline{\hspace{2cm}}[/latex]
- [latex]n = \underline{\hspace{2cm}}[/latex]
- [latex]p^{\prime} = \underline{\hspace{2cm}}[/latex]
- Define the random variables [latex]X[/latex] and [latex]P^{\prime}[/latex] in words.
- Which distribution should you use for this problem? Explain your choice.
- Construct a 95% confidence interval for the population proportion who claim they always buckle up.
- State the confidence interval.
- Sketch the graph.
- Calculate the error bound.
- If this survey were done by telephone, list three difficulties the companies might have in obtaining random results.
According to a recent survey of 1,200 people, 61% feel that the president is doing an acceptable job. We are interested in the population proportion of people who feel the president is doing an acceptable job.
- Define the random variables [latex]X[/latex] and [latex]P^{\prime}[/latex] in words.
- Which distribution should you use for this problem? Explain your choice.
- Construct a 90% confidence interval for the population proportion of people who feel the president is doing an acceptable job.
- State the confidence interval.
- Sketch the graph.
- Calculate the error bound.
Solution
- [latex]X =[/latex] the number of people who feel that the president is doing an acceptable job; [latex]P^{\prime}[/latex] = the proportion of people in a sample who feel that the president is doing an acceptable job.
- [latex]N\left(0.61,\sqrt{\frac{\left(0.61\right)\left(0.39\right)}{1200}}\right)[/latex]
- Construct a 90% confidence interval
- [latex]\text{CI}: (0.59, 0.63)[/latex]
- Check student’s solution
- [latex]\text{EBP}: 0.02[/latex]
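As a rough check of the 90% interval, the normal-approximation formula [latex]p^{\prime} \pm z_{\frac{\alpha}{2}}\sqrt{\frac{p^{\prime}q^{\prime}}{n}}[/latex] can be evaluated directly. This Python sketch uses an exact critical value (about 1.645) from `statistics.NormalDist` rather than a table; `proportion_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def proportion_ci(p_prime, n, cl):
    """Normal-approximation CI for a population proportion: p' +/- z_{alpha/2} * sqrt(p'q'/n)."""
    alpha = 1 - cl
    z = NormalDist().inv_cdf(1 - alpha / 2)        # critical value z_{alpha/2}
    ebp = z * math.sqrt(p_prime * (1 - p_prime) / n)
    return p_prime - ebp, p_prime + ebp, ebp

lower, upper, ebp = proportion_ci(0.61, 1200, 0.90)
print(f"CI: ({lower:.2f}, {upper:.2f}), EBP = {ebp:.3f}")
```

The unrounded error bound is about 0.023, which rounds to the 0.02 given in the solution.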
A telephone poll of 1,000 adult Americans was reported in an issue of Time Magazine. One of the questions asked was “What is the main problem facing the country?” Twenty percent answered “crime.” We are interested in the population proportion of adult Americans who feel that crime is the main problem.
- Define the random variables [latex]X[/latex] and [latex]P^{\prime}[/latex] in words.
- Which distribution should you use for this problem? Explain your choice.
- Construct a 95% confidence interval for the population proportion of adult Americans who feel that crime is the main problem.
- State the confidence interval.
- Sketch the graph.
- Calculate the error bound.
- Suppose we want to lower the sampling error. What is one way to accomplish that?
- The sampling error given by Yankelovich Partners, Inc. (which conducted the poll) is [latex]\pm 3 \%[/latex]. In one to three complete sentences, explain what the [latex]\pm 3 \%[/latex] represents.
Solution
- [latex]X =[/latex] the number of adult Americans who feel that crime is the main problem; [latex]P^{\prime}[/latex] = the proportion of adult Americans in a sample who feel that crime is the main problem
- Since we are estimating a proportion, given [latex]p^{\prime} = 0.2[/latex] and [latex]n = 1000[/latex], the distribution we should use is [latex]N\left(0.2,\sqrt{\frac{\left(0.2\right)\left(0.8\right)}{1000}}\right)[/latex].
- Construct a 95% confidence interval
- [latex]\text{CI}: (0.18, 0.22)[/latex]
- Check student’s solution.
- [latex]\text{EBP}: 0.02[/latex]
- One way to lower the sampling error is to increase the sample size.
- The stated “[latex]\pm 3 \%[/latex]” represents the maximum error bound. This means that those doing the study are reporting a maximum error of 3%. Thus, they estimate the percentage of adult Americans who feel that crime is the main problem to be between 17% and 23%.
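For the crime poll, a similar sketch compares the error bound computed from [latex]p^{\prime} = 0.2[/latex] and [latex]n = 1000[/latex] with the [latex]\pm 3 \%[/latex] the pollster reported. It assumes the normal approximation and an exact 95% critical value from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

p_prime, n, cl = 0.20, 1000, 0.95
z = NormalDist().inv_cdf(1 - (1 - cl) / 2)          # about 1.96
ebp = z * math.sqrt(p_prime * (1 - p_prime) / n)    # computed error bound

reported = 0.03                                     # the poll's stated +/- 3%
print(f"computed: ({p_prime - ebp:.3f}, {p_prime + ebp:.3f}), EBP = {ebp:.3f}")
print(f"reported: ({p_prime - reported:.2f}, {p_prime + reported:.2f})")
```

The computed bound (about 0.025) is smaller than 0.03, which is consistent with reading the reported figure as a maximum error.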
According to a Field Poll, 79% of California adults (actual results are 400 out of 506 surveyed) feel that “education and our schools” is one of the top issues facing California. We wish to construct a 90% confidence interval for the true proportion of California adults who feel that education and the schools is one of the top issues facing California.
- A point estimate for the true population proportion is [latex]\underline{\hspace{2cm}}[/latex]
- A 90% confidence interval for the population proportion is [latex]\underline{\hspace{2cm}}[/latex].
- The error bound is approximately [latex]\underline{\hspace{2cm}}[/latex].
Five hundred and eleven (511) homes in a certain southern California community are randomly surveyed to determine if they meet minimal earthquake preparedness recommendations. One hundred seventy-three (173) of the homes surveyed met the minimum recommendations for earthquake preparedness, and 338 did not.
- Find the confidence interval at the 90% confidence level for the true population proportion of southern California community homes meeting at least the minimum recommendations for earthquake preparedness.
- The point estimate for the population proportion of homes that do not meet the minimum recommendations for earthquake preparedness is [latex]\underline{\hspace{2cm}}[/latex].
On May 23, 2013, Gallup reported that of the 1,005 people surveyed, 76% of U.S. workers believe that they will continue working past retirement age. The confidence level for this study was reported at 95% with a [latex]\pm 3 \%[/latex] margin of error.
- Determine the estimated proportion from the sample.
- Determine the sample size.
- Identify [latex]\text{CL}[/latex] and [latex]\alpha[/latex].
- Calculate the error bound based on the information provided.
- Compare the error bound in part d to the margin of error reported by Gallup. Explain any differences between the values.
- Create a confidence interval for the results of this study.
- A reporter is covering the release of this study for a local news station. How should she explain the confidence interval to her audience?
A national survey of 1,000 adults was conducted on May 13, 2013 by Rasmussen Reports. It concluded with 95% confidence that 49% to 55% of Americans believe that big-time college sports programs corrupt the process of higher education.
- Find the point estimate and the error bound for this confidence interval.
- Can we (with 95% confidence) conclude that more than half of all American adults believe this?
- Use the point estimate from part a and [latex]n = 1000[/latex] to calculate a 75% confidence interval for the proportion of American adults that believe that major college sports programs corrupt higher education.
- Can we (with 75% confidence) conclude that at least half of all American adults believe this?
Solution
- [latex]p^{\prime} = \frac{(0.55 + 0.49)}{2} = 0.52[/latex]; [latex]\text{EBP} = 0.55 - 0.52 = 0.03[/latex]
- No, the confidence interval includes values less than or equal to 0.50. It is possible that less than half of the population believe this.
- [latex]\text{CL} = 0.75[/latex], so [latex]\alpha = 1 - 0.75 = 0.25[/latex] and [latex]\frac{\alpha }{2}=0.125; {z}_{\frac{\alpha }{2}}=1.150[/latex]. (The area to the right of this [latex]z[/latex] is 0.125, so the area to the left is [latex]1- 0.125 = 0.875[/latex].)
[latex]\text{EBP} = \left(1.150\right)\sqrt{\frac{0.52\left(0.48\right)}{1000}}\approx 0.018[/latex]
[latex](p^{\prime} - \text{EBP}, p^{\prime} + \text{EBP}) = (0.52 - 0.018, 0.52 + 0.018) = (0.502, 0.538)[/latex]
The 75% confidence interval is (0.502, 0.538).
- Yes. This interval lies entirely above 0.50, so we can conclude that at least half of all American adults believe that major sports programs corrupt education, but we do so with only 75% confidence.
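The steps in this solution can be reproduced in Python: the point estimate is the midpoint of the reported interval, the error bound is half its width, and the 75% interval reuses [latex]p^{\prime}[/latex] with a new critical value. This is a sketch that uses `statistics.NormalDist` for [latex]z_{\frac{\alpha}{2}}[/latex] in place of a table lookup.

```python
import math
from statistics import NormalDist

lower95, upper95, n = 0.49, 0.55, 1000
p_prime = (lower95 + upper95) / 2    # point estimate = midpoint of the interval
ebp95 = upper95 - p_prime            # error bound = half the interval width

cl = 0.75
z = NormalDist().inv_cdf(1 - (1 - cl) / 2)            # z_{0.125}, about 1.150
ebp75 = z * math.sqrt(p_prime * (1 - p_prime) / n)
lower75, upper75 = p_prime - ebp75, p_prime + ebp75

print(f"p' = {p_prime:.2f}, EBP (95%) = {ebp95:.2f}")
print(f"75% CI: ({lower75:.3f}, {upper75:.3f})")
```

Because the whole 75% interval sits above 0.50 while the 95% interval does not, the conclusion in part d depends on accepting the lower confidence level.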
Public Policy Polling recently conducted a survey asking adults across the U.S. about music preferences. When asked, 80 of the 571 participants admitted that they have illegally downloaded music.
- Create a 99% confidence interval for the true proportion of American adults who have illegally downloaded music.
- This survey was conducted through automated telephone interviews on May 6 and 7, 2013. The error bound of the survey compensates for sampling error, or natural variability among samples. List some factors that could affect the survey’s outcome that are not covered by the margin of error.
- Without performing any calculations, describe how the confidence interval would change if the confidence level changed from 99% to 90%.
You plan to conduct a survey on your college campus to learn about the political awareness of students. You want to estimate the true proportion of college students on your campus who voted in the 2012 presidential election with 95% confidence and a margin of error no greater than five percent. How many students must you interview?
Solution
[latex]\text{CL} = 0.95[/latex]
[latex]\alpha = 1 - 0.95 = 0.05[/latex]
[latex]\frac{\alpha }{2} = 0.025[/latex]
[latex]{z}_{\frac{\alpha }{2}} = 1.96[/latex].
Use [latex]p^{\prime} = q^{\prime} = 0.5[/latex].
[latex]n=\frac{{z_{\frac{\alpha}{2}}}^2 {p'} {q'}}{EBP^2}= \frac{{1.96}^2\left(0.5\right)\left(0.5\right)}{{0.05}^2}=384.16[/latex]
You need to interview at least 385 students to estimate the proportion to within 5% at 95% confidence.
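A quick check shows why 384.16 is rounded up rather than to the nearest whole number. This Python sketch assumes the table value 1.96 and [latex]p^{\prime} = 0.5[/latex], as in the solution; `ebp_for` is a hypothetical helper that evaluates the error bound for a given sample size.

```python
import math

z, p_prime, target = 1.96, 0.5, 0.05   # table z_{0.025}, conservative p', required EBP

def ebp_for(n):
    """Error bound z * sqrt(p'q'/n) for a sample of size n."""
    return z * math.sqrt(p_prime * (1 - p_prime) / n)

n_exact = z**2 * p_prime * (1 - p_prime) / target**2   # 384.16 before rounding
n_min = math.ceil(n_exact)

for n in (384, 385):
    print(n, round(ebp_for(n), 5), ebp_for(n) <= target)
```

With 384 students the error bound lands just above 0.05, while 385 brings it under, so the minimum is 385.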
In a recent Zogby International Poll, nine of 48 respondents rated the likelihood of a terrorist attack in their community as “likely” or “very likely.” Use the “plus four” method to create a 97% confidence interval for the proportion of American adults who believe that a terrorist attack in their community is likely or very likely. Explain what this confidence interval means in the context of the problem.
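The “plus four” method adds two successes and two failures to the sample, so the adjusted proportion is [latex]\frac{x + 2}{n + 4}[/latex], and then applies the usual normal-approximation formula with [latex]n + 4[/latex] in place of [latex]n[/latex]. The Python sketch below shows that recipe; `plus_four_ci` is a hypothetical helper name, and the critical value comes from `statistics.NormalDist` rather than a table. It computes the interval but leaves the interpretation to the exercise.

```python
import math
from statistics import NormalDist

def plus_four_ci(x, n, cl):
    """'Plus four' CI: add 2 successes and 2 failures, then use the normal formula."""
    p_adj = (x + 2) / (n + 4)                      # adjusted sample proportion
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)     # critical value z_{alpha/2}
    ebp = z * math.sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - ebp, p_adj + ebp

# Counts from the Zogby exercise: x = 9 successes out of n = 48, CL = 0.97
lower, upper = plus_four_ci(9, 48, 0.97)
```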
References
Jensen, Tom. “Democrats, Republicans Divided on Opinion of Music Icons.” Public Policy Polling. Available online at http://www.publicpolicypolling.com/Day2MusicPoll.pdf (accessed July 2, 2013).
Madden, Mary, Amanda Lenhart, Sandra Cortesi, Urs Gasser, Maeve Duggan, Aaron Smith, and Meredith Beaton. “Teens, Social Media, and Privacy.” PewInternet, 2013. Available online at http://www.pewinternet.org/Reports/2013/Teens-Social-Media-And-Privacy.aspx (accessed July 2, 2013).
Prince Survey Research Associates International. “2013 Teen and Privacy Management Survey.” Pew Research Center: Internet and American Life Project. Available online at http://www.pewinternet.org/~/media//Files/Questionnaire/2013/Methods%20and%20Questions_Teens%20and%20Social%20Media.pdf (accessed July 2, 2013).
Saad, Lydia. “Three in Four U.S. Workers Plan to Work Past Retirement Age: Slightly more say they will do this by choice rather than necessity.” Gallup® Economy, 2013. Available online at http://www.gallup.com/poll/162758/three-four-workers-plan-work-past-retirement-age.aspx (accessed July 2, 2013).
The Field Poll. Available online at http://field.com/fieldpollonline/subscribers/ (accessed July 2, 2013).
Zogby. “New SUNYIT/Zogby Analytics Poll: Few Americans Worry about Emergency Situations Occurring in Their Community; Only one in three have an Emergency Plan; 70% Support Infrastructure ‘Investment’ for National Security.” Zogby Analytics, 2013. Available online at http://www.zogbyanalytics.com/news/299-americans-neither-worried-nor-prepared-in-case-of-a-disaster-sunyit-zogby-analytics-poll (accessed July 2, 2013).
“52% Say Big-Time College Athletics Corrupt Education Process.” Rasmussen Reports, 2013. Available online at http://www.rasmussenreports.com/public_content/lifestyle/sports/may_2013/52_say_big_time_college_athletics_corrupt_education_process (accessed July 2, 2013).
Error Bound for a Population Proportion (EBP): the margin of error; depends on the confidence level, the sample size, and the estimated (from the sample) proportion of successes.
Binomial Distribution: a discrete random variable that arises from Bernoulli trials; there are a fixed number, [latex]n[/latex], of independent trials with two outcomes called success and failure, with probabilities [latex]p[/latex] and [latex]q[/latex], respectively. The binomial random variable [latex]X[/latex] is the number of successes in [latex]n[/latex] trials, denoted [latex]X \sim B(n,p)[/latex]. The mean is [latex]\mu = np[/latex] and the standard deviation is [latex]\sigma = \sqrt{npq}[/latex]. The probability of exactly [latex]x[/latex] successes in [latex]n[/latex] trials is [latex]P\left(X=x\right)=\left(\genfrac{}{}{0}{}{n}{x}\right){p}^{x}{q}^{n-x}[/latex].