Chapter 7: Confidence Intervals
Chapter 7 Review
7.1 Chapter Review
In this module, we learned how to calculate the confidence interval for a single population mean where the population standard deviation is known. When estimating a population mean, the margin of error is called the error bound for a population mean (EBM). A confidence interval has the general form:
(lower bound, upper bound) = (point estimate – EBM, point estimate + EBM)
The calculation of EBM depends on the size of the sample and the level of confidence desired. The confidence level is the percent of all possible samples whose confidence intervals can be expected to include the true population parameter. As the confidence level increases, the corresponding EBM increases as well. As the sample size increases, the EBM decreases. By the central limit theorem,
[latex]EBM=z\frac{\sigma }{\sqrt{n}}[/latex]
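As a minimal sketch, the EBM formula can be evaluated directly in Python. It assumes the critical value z is read from a table (for example 1.645 for 90% confidence and 1.96 for 95%); the function name `error_bound_mean` and the values σ = 3 and n = 36 are hypothetical. The three calls check the two relationships described above: a higher confidence level (larger z) widens EBM, and a larger sample narrows it.

```python
import math

def error_bound_mean(z: float, sigma: float, n: int) -> float:
    """EBM = z * sigma / sqrt(n), used when the population standard deviation is known."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return z * sigma / math.sqrt(n)

# Hypothetical example values: sigma = 3, n = 36.
ebm_90 = error_bound_mean(1.645, 3, 36)         # 90% confidence
ebm_95 = error_bound_mean(1.96, 3, 36)          # 95% confidence: larger z, larger EBM
ebm_90_big_n = error_bound_mean(1.645, 3, 144)  # four times the sample size

print(round(ebm_90, 4))                 # 0.8225
print(round(ebm_95, 4))                 # 0.98
print(round(ebm_90 / ebm_90_big_n, 6))  # 2.0: quadrupling n halves EBM
```

Because n sits under a square root, halving the error bound requires four times as many observations.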
Given a confidence interval, you can work backwards to find the error bound (EBM) or the sample mean. To find the error bound, subtract the sample mean from the upper bound of the interval. If you do not know the sample mean, you can find the error bound by calculating half the difference of the upper and lower bounds. To find the sample mean given a confidence interval, subtract the error bound from the upper bound. If the error bound is unknown, then average the upper and lower bounds of the confidence interval to find the sample mean.
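The working-backwards rules amount to taking the half-width and the midpoint of the interval. A small sketch, using a hypothetical interval (67.18, 68.82) and a hypothetical helper name `ci_to_ebm_and_mean`:

```python
from typing import Tuple

def ci_to_ebm_and_mean(lower: float, upper: float) -> Tuple[float, float]:
    """Recover (EBM, sample mean) from a symmetric interval (xbar - EBM, xbar + EBM)."""
    ebm = (upper - lower) / 2    # half the width of the interval
    xbar = (upper + lower) / 2   # midpoint: the average of the two bounds
    return ebm, xbar

ebm, xbar = ci_to_ebm_and_mean(67.18, 68.82)
print(round(ebm, 2), round(xbar, 2))                  # 0.82 68.0

# Cross-check with the other two rules: upper - mean, and upper - EBM.
print(round(68.82 - xbar, 2), round(68.82 - ebm, 2))  # 0.82 68.0
```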
Sometimes researchers know in advance that they want to estimate a population mean within a specific margin of error for a given level of confidence. In that case, solve the EBM formula for n to discover the size of the sample that is needed to achieve this goal:
[latex]n=\frac{{z}^{2}{\sigma}^{2}}{EB{M}^{2}}[/latex]
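A sketch of the sample-size formula in Python. It rounds the result up, the usual convention, because rounding down would leave the error bound slightly larger than the target. The function name and the inputs (z = 1.96, σ = 15, EBM = 2) are hypothetical.

```python
import math

def sample_size_for_mean(z: float, sigma: float, ebm: float) -> int:
    """Smallest whole n with z * sigma / sqrt(n) <= ebm, i.e. z^2 sigma^2 / EBM^2 rounded up."""
    n_exact = (z ** 2) * (sigma ** 2) / (ebm ** 2)
    return math.ceil(n_exact)  # round up so the margin of error is not exceeded

print(sample_size_for_mean(1.96, 15, 2))  # 217 (the formula itself gives 216.09)
```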
Formula Review
[latex]\overline{X}\sim N\left({\mu }_{X},\frac{\sigma }{\sqrt{n}}\right)[/latex] The distribution of sample means is normally distributed with mean equal to the population mean and standard deviation given by the population standard deviation divided by the square root of the sample size.
The general form for a confidence interval for a single population mean, known standard deviation, normal distribution is given by
(lower bound, upper bound) = (point estimate – EBM, point estimate + EBM)
= [latex]\left(\overline{x}-EBM,\overline{x}+EBM\right)[/latex]
= [latex]\left(\overline{x}-z\frac{\sigma }{\sqrt{n}},\overline{x}+z\frac{\sigma }{\sqrt{n}}\right)[/latex]
EBM = [latex]z\frac{\sigma }{\sqrt{n}}[/latex] = the error bound for the mean, or the margin of error for a single population mean; this formula is used when the population standard deviation is known.
CL = confidence level, or the proportion of confidence intervals created that are expected to contain the true population parameter
α = 1 – CL = the proportion of confidence intervals that will not contain the population parameter
[latex]{z}_{\frac{\alpha }{2}}[/latex] = the z-score with the property that the area to the right of the z-score is [latex]\frac{\alpha }{2}[/latex]; this is the z-score used in the calculation of EBM, where α = 1 – CL.
n = [latex]\frac{{z}^{2}{\sigma }^{2}}{EB{M}^{2}}[/latex] = the formula used to determine the sample size (n) needed to achieve a desired margin of error at a given level of confidence
General form of a confidence interval
(lower value, upper value) = (point estimate − error bound, point estimate + error bound)
To find the error bound when you know the confidence interval
error bound = upper value − point estimate OR error bound = [latex]\frac{\text{upper value}-\text{lower value}}{2}[/latex]
Single Population Mean, Known Standard Deviation, Normal Distribution
Use the normal distribution for means when the population standard deviation is known: [latex]EBM={z}_{\frac{\alpha }{2}}\cdot \frac{\sigma }{\sqrt{n}}[/latex]
The confidence interval has the format ([latex]\overline{x}[/latex] − EBM, [latex]\overline{x}[/latex] + EBM).
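The known-σ procedure can be sketched end to end in Python: find z_{α/2} from the confidence level, compute EBM, and form (x̄ − EBM, x̄ + EBM). This assumes Python 3.8+ for `statistics.NormalDist`; the function names, the sample data, and σ = 2 are hypothetical.

```python
import math
from statistics import NormalDist, mean

def z_critical(confidence_level: float) -> float:
    """z_{alpha/2}: the z-score with area alpha/2 to its right, where alpha = 1 - CL."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)  # area to the left is 1 - alpha/2

def z_interval(data, sigma: float, confidence_level: float):
    """(xbar - EBM, xbar + EBM) for a population mean when sigma is known."""
    n = len(data)
    xbar = mean(data)
    ebm = z_critical(confidence_level) * sigma / math.sqrt(n)
    return xbar - ebm, xbar + ebm

for cl in (0.90, 0.95, 0.99):
    print(cl, round(z_critical(cl), 3))  # prints 1.645, then 1.96, then 2.576

sample = [10, 12, 9, 11, 13, 8, 10, 12, 11, 14]  # hypothetical data, mean 11
low, high = z_interval(sample, sigma=2, confidence_level=0.95)  # sigma assumed known
print(round(low, 2), round(high, 2))  # 9.76 12.24
```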
7.2 Chapter Review
In many cases, the researcher does not know the population standard deviation, σ, of the measure being studied. In these cases, it is common to use the sample standard deviation, s, as an estimate of σ. The normal distribution creates accurate confidence intervals when σ is known, but it is not as accurate when s is used as an estimate, because s itself varies from sample to sample. In this case, the Student’s t-distribution, which accounts for that extra variability, produces more accurate intervals. Define a t-score using the following formula:
[latex]t=\frac{\overline{x}-\mu }{\left(\frac{s}{\sqrt{n}}\right)}[/latex],
The t-score follows the Student’s t-distribution with n – 1 degrees of freedom. The confidence interval under this distribution is calculated with EBM = [latex]\left({t}_{\frac{\alpha }{2}}\right)\frac{s}{\sqrt{n}}[/latex] where [latex]{t}_{\frac{\alpha }{2}}[/latex] is the t-score with area to the right equal to [latex]\frac{\alpha }{2}[/latex], s is the sample standard deviation, and n is the sample size. Use a table, calculator, or computer to find [latex]{t}_{\frac{\alpha }{2}}[/latex] for a given α.
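Python's standard library has no Student's t-distribution, so the following is a self-contained sketch of the "computer" route: it integrates the t density numerically with Simpson's rule and inverts the CDF by bisection to find t_{α/2}. The function names are hypothetical, and the search assumes the critical value lies below 1000. In practice, a statistics package such as SciPy (`scipy.stats.t.ppf`) provides this directly.

```python
import math

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t): symmetry about 0 plus Simpson's rule for the area on [0, |t|]."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights 4, 2, 4, ..., 4
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(confidence_level: float, df: int) -> float:
    """t_{alpha/2}: the t-score with area alpha/2 to its right, found by bisection."""
    target = 1 - (1 - confidence_level) / 2  # area to the left of t_{alpha/2}
    lo, hi = 0.0, 1000.0  # assumes the critical value lies below 1000
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.95, 9), 3))  # 2.262 (df = 9, i.e. n = 10)
print(round(t_critical(0.95, 1), 3))  # 12.706
```

Bisection is used because the CDF is increasing, so it is guaranteed to converge without needing derivatives.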
Formula Review
s = the standard deviation of sample values.
[latex]t=\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}[/latex] is the formula for the t-score, which measures how far the sample mean lies from the population mean in units of the estimated standard error [latex]\frac{s}{\sqrt{n}}[/latex]; it follows the Student’s t-distribution
df = n – 1; the degrees of freedom for a Student’s t-distribution where n represents the size of the sample
[latex]T\sim {t}_{df}[/latex]: the random variable, T, has a Student’s t-distribution with df degrees of freedom
[latex]EBM={t}_{\frac{\alpha }{2}}\frac{s}{\sqrt{n}}[/latex] = the error bound for the population mean when the population standard deviation is unknown
[latex]{t}_{\frac{\alpha }{2}}[/latex] is the t-score in the Student’s t-distribution with area to the right equal to [latex]\frac{\alpha }{2}[/latex]
The general form for a confidence interval for a single mean, population standard deviation unknown, Student’s t is given by (lower bound, upper bound)
= (point estimate – EBM, point estimate + EBM)
= [latex]\left(\overline{x}-{t}_{\frac{\alpha }{2}}\frac{s}{\sqrt{n}},\overline{x}+{t}_{\frac{\alpha }{2}}\frac{s}{\sqrt{n}}\right)[/latex]
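A sketch of the σ-unknown interval computed from raw data. It assumes t_{α/2} has already been looked up for df = n − 1 (2.262 is the standard t-table value for 95% confidence with df = 9); the helper name and the sample are hypothetical. Note that `statistics.stdev` divides by n − 1, which matches the sample standard deviation s.

```python
import math
from statistics import mean, stdev

def t_interval(data, t_crit: float):
    """(xbar - EBM, xbar + EBM) with EBM = t_crit * s / sqrt(n), sigma unknown.

    t_crit is t_{alpha/2} for df = n - 1, e.g. looked up in a t-table.
    """
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    ebm = t_crit * s / math.sqrt(n)
    return xbar - ebm, xbar + ebm

sample = [10, 12, 9, 11, 13, 8, 10, 12, 11, 14]  # hypothetical data, n = 10
low, high = t_interval(sample, t_crit=2.262)      # 95% confidence, df = 9
print(round(low, 3), round(high, 3))  # 9.694 12.306
```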
7.3 Chapter Review
Some statistical measures, like many survey questions, measure qualitative rather than quantitative data. In this case, the population parameter being estimated is a proportion. It is possible to create a confidence interval for the true population proportion following procedures similar to those used in creating confidence intervals for population means. The formulas are slightly different, but they follow the same reasoning.
Let p′ represent the sample proportion, x/n, where x represents the number of successes and n represents the sample size. Let q′ = 1 – p′. Then the confidence interval for a population proportion is given by the following formula:
(lower bound, upper bound) [latex]=\left({p}^{\prime}-EBP,{p}^{\prime}+EBP\right)=\left({p}^{\prime}-z\sqrt{\frac{{p}^{\prime}{q}^{\prime}}{n}},{p}^{\prime}+z\sqrt{\frac{{p}^{\prime}{q}^{\prime}}{n}}\right)[/latex]
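A minimal sketch of the proportion interval, assuming Python 3.8+ for `statistics.NormalDist`. The function name and the counts (421 successes out of 500) are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(x: int, n: int, confidence_level: float):
    """(p' - EBP, p' + EBP) with EBP = z_{alpha/2} * sqrt(p'q'/n)."""
    p_hat = x / n          # sample proportion p'
    q_hat = 1 - p_hat      # q' = 1 - p'
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    ebp = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - ebp, p_hat + ebp

low, high = proportion_interval(x=421, n=500, confidence_level=0.95)  # hypothetical counts
print(round(low, 3), round(high, 3))  # 0.81 0.874
```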
The “plus four” method for calculating confidence intervals is an attempt to balance the error introduced by using estimates of the population proportion when calculating the standard deviation of the sampling distribution. Simply imagine four additional trials in the study: two are successes and two are failures. Calculate [latex]{p}^{\prime }=\frac{x+2}{n+4}[/latex], use n + 4 as the sample size, and proceed to find the confidence interval. When sample sizes are small, this method has been demonstrated to provide more accurate confidence intervals than the standard formula used for larger samples.
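The plus-four adjustment changes only the inputs: x becomes x + 2 and n becomes n + 4, after which the usual formula applies. A sketch with a hypothetical helper name and a hypothetical small sample (3 successes in 20 trials):

```python
import math
from statistics import NormalDist

def plus_four_interval(x: int, n: int, confidence_level: float):
    """Plus-four interval: add 2 successes and 2 failures, then apply the usual formula."""
    n_adj = n + 4                 # four imaginary extra trials
    p_adj = (x + 2) / n_adj       # p' = (x + 2) / (n + 4)
    q_adj = 1 - p_adj
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    ebp = z * math.sqrt(p_adj * q_adj / n_adj)
    return p_adj - ebp, p_adj + ebp

low, high = plus_four_interval(x=3, n=20, confidence_level=0.95)
print(round(low, 3), round(high, 3))
```

The interval is centered at 5/24 rather than at the raw 3/20, pulling the estimate slightly toward one half.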
Formula Review
p′ = x / n where x represents the number of successes and n represents the sample size. The variable p′ is the sample proportion and serves as the point estimate for the true population proportion.
q′ = 1 – p′
[latex]{p}^{\prime }\sim N\left(p,\sqrt{\frac{pq}{n}}\right)[/latex] The number of successes, X, has a binomial distribution; for large enough n, the sample proportion p′ = X/n is approximately normally distributed, as shown here.
EBP = the error bound for a proportion = [latex]{z}_{\frac{\alpha }{2}}\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}[/latex]
Confidence interval for a proportion:
(lower bound, upper bound) [latex]=\left({p}^{\prime}-EBP,{p}^{\prime}+EBP\right)=\left({p}^{\prime}-z\sqrt{\frac{{p}^{\prime}{q}^{\prime}}{n}},{p}^{\prime}+z\sqrt{\frac{{p}^{\prime}{q}^{\prime}}{n}}\right)[/latex]
[latex]n=\frac{{z}_{\frac{\alpha}{2}}{}^{2}{p}^{\prime}{q}^{\prime}}{EB{P}^{2}}[/latex] provides the number of participants needed to estimate the population proportion with confidence 1 – α and margin of error EBP.
Use the normal distribution for a single population proportion: [latex]{p}^{\prime }=\frac{x}{n}[/latex]
[latex]EBP=\left({z}_{\frac{\alpha }{2}}\right)\sqrt{\frac{{p}^{\prime }{q}^{\prime }}{n}}[/latex], where [latex]{p}^{\prime }+{q}^{\prime }=1[/latex]
The confidence interval has the format (p′ – EBP, p′ + EBP).
[latex]\overline{x}[/latex] is a point estimate for μ
p′ is a point estimate for p
s is a point estimate for σ
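The sample-size formula for a proportion from the formula review can be sketched the same way as the one for a mean, again rounding up. When no prior estimate of p′ is available, a common conservative choice is p′ = 0.5, because p′q′ is largest there; that default, the function name, and the example inputs are assumptions of this sketch.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(confidence_level: float, ebp: float, p_hat: float = 0.5) -> int:
    """n = z_{alpha/2}^2 * p'q' / EBP^2, rounded up to the next whole participant.

    p_hat = 0.5 maximizes p'q', giving the most conservative (largest) n
    when no prior estimate of the proportion is available.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / ebp ** 2)

print(sample_size_for_proportion(0.95, 0.03))             # 1068
print(sample_size_for_proportion(0.90, 0.05, p_hat=0.8))  # 174
```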
Normal Distribution: a continuous random variable (RV) with pdf [latex]f\left(x\right)=\frac{1}{\sigma \sqrt{2\pi }}{e}^{-\frac{{\left(x-\mu \right)}^{2}}{2{\sigma }^{2}}}[/latex], where μ is the mean of the distribution and σ is the standard deviation; notation: X ~ N(μ,σ). If μ = 0 and σ = 1, the RV is called the standard normal distribution.
Student’s t-Distribution: investigated and reported by William S. Gosset in 1908 and published under the pseudonym Student; the major characteristics of the random variable (RV) are: (1) It is continuous and assumes any real value; (2) The pdf is symmetrical about its mean of zero. However, it is more spread out and flatter at the apex than the normal distribution; (3) It approaches the standard normal distribution as n gets larger. There is a "family" of t-distributions: each representative of the family is completely defined by the number of degrees of freedom, which is one less than the number of data values.
Binomial Distribution: a discrete random variable which arises from Bernoulli trials; there are a fixed number, n, of independent trials with two outcomes called success and failure, with probabilities p and q, respectively. The binomial random variable X is the number of successes in n trials, denoted [latex]X \sim B(n,p)[/latex]. The mean is [latex]\mu = np[/latex] and the standard deviation is [latex]\sigma = \sqrt{npq}[/latex]. The probability of exactly [latex]x[/latex] successes in n trials is [latex]P\left(X=x\right)=\left(\genfrac{}{}{0}{}{n}{x}\right){p}^{x}{q}^{n-x}[/latex].
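The three definitions above can be checked numerically. This sketch (Python 3.8+ for `math.comb` and `statistics.NormalDist`; the parameters n = 10, p = 0.3 and the evaluation points are hypothetical) codes each density or probability formula as written, compares the normal pdf with the standard library, shows the t-distribution's lower peak rising toward the normal's as the degrees of freedom grow, and confirms that the binomial mean and standard deviation equal np and √(npq).

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def t_pdf(t, df):
    """Student's t density with df degrees of freedom."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x q^(n - x) for X ~ B(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Normal: the hand-written formula agrees with the standard library.
print(round(normal_pdf(0), 4))                                         # 0.3989
print(math.isclose(normal_pdf(70, 68, 3), NormalDist(68, 3).pdf(70)))  # True

# Student's t: the peak at 0 is lower than the normal's and rises toward it as df grows.
for df in (1, 5, 30, 1000):
    print(df, round(t_pdf(0, df), 4))

# Binomial: mean and standard deviation computed from the pmf match np and sqrt(npq).
n, p = 10, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(probs))
sd = math.sqrt(sum((x - mean) ** 2 * px for x, px in enumerate(probs)))
print(round(mean, 6), round(n * p, 6))                          # both pairs agree
print(round(sd, 6), round(math.sqrt(n * p * (1 - p)), 6))
```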