Chapter 6: The Normal Distribution and The Central Limit Theorem

6.5 The Normal Approximation to the Binomial

Learning Objectives

By the end of this section, the student should be able to:

  • Approximate the binomial distribution using the normal distribution

 

The binomial formula is cumbersome when the sample size (n) is large, particularly when we consider a range of observations. Consider the following example.

Example

Approximately 15% of the US population smokes cigarettes. A local government believed their community had a lower smoker rate and commissioned a survey of 400 randomly selected individuals. The survey found that only 42 of the 400 participants smoke cigarettes. If the true proportion of smokers in the community was really 15%, what is the probability of observing 42 or fewer smokers in a sample of 400 people?

The computations required for the previous example are tedious and long, and they are nearly impossible without technology: we would have to evaluate the binomial formula for every count from 0 to 42 and add the 43 results. Fortunately, we have already met the binomcdf() function, so with technology this problem is not difficult (try it for yourself: binomcdf(400, 0.15, 42) = 0.0054).
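As a minimal sketch of what binomcdf() computes, the Python below adds up the binomial formula term by term. It assumes Python 3.8 or later (for `math.comb`). The name `binom_cdf` is our own choice, not a calculator or library function.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p): sum the binomial formula for x = 0, 1, ..., k."""
    q = 1 - p
    return sum(comb(n, x) * p**x * q ** (n - x) for x in range(k + 1))


# Smokers example: 42 or fewer smokers out of 400 when p = 0.15
prob = binom_cdf(42, 400, 0.15)
print(round(prob, 4))  # should agree with binomcdf(400, 0.15, 42)
```

The loop evaluates 43 separate terms, which is exactly the tedious work the calculator hides.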

In general, we should avoid such tedious work if an alternative method exists that is faster, easier, and still accurate. In some cases, the normal distribution provides such a method for estimating binomial probabilities. Recall that calculating probabilities of a range of values is much easier in the normal model. We might wonder: is it reasonable to use the normal model in place of the binomial distribution? Surprisingly, yes, if certain conditions are met.

 

Historical Note

Historically, being able to compute binomial probabilities was one of the most important applications of the central limit theorem. Binomial probabilities with a small value for n (say, 20) were displayed in a table in a book. To calculate the probabilities with large values of n, you had to use the binomial formula, which could be very complicated. Using the normal approximation to the binomial distribution simplified the process.

To compute the normal approximation to the binomial distribution, take a simple random sample from a population. You must meet the conditions for a binomial distribution:

  • there is a fixed number n of independent trials
  • each trial has only two possible outcomes, success or failure
  • each trial has the same probability of a success p

Recall that if X is the binomial random variable, then X ~ B(n, p). The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (np > 5 and nq > 5; the approximation is better if they are both greater than or equal to 10). Then the binomial can be approximated by the normal distribution with mean μ = np and standard deviation σ = [latex]\sqrt{npq}[/latex]. Remember that q = 1 – p.
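The conditions and parameters above can be bundled into a small helper. This is a sketch under our own assumptions: the function name `normal_params` is hypothetical, it applies the np > 5 and nq > 5 rule of thumb from this paragraph, and the `tuple[float, float]` annotation needs Python 3.9 or later.

```python
from math import sqrt


def normal_params(n: int, p: float) -> tuple[float, float]:
    """Return (mu, sigma) of the normal curve approximating B(n, p).

    Raises ValueError when np or nq is not greater than 5.
    """
    q = 1 - p
    if n * p <= 5 or n * q <= 5:
        raise ValueError(f"np = {n * p:g} and nq = {n * q:g} must both exceed 5")
    return n * p, sqrt(n * p * q)


mu, sigma = normal_params(400, 0.15)
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")  # mu = 60.00, sigma = 7.14
```

Raising an error, rather than silently returning parameters, makes it harder to use the approximation when the binomial shape is still far from normal.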

 

Binomial Approximation Conditions

Consider the binomial model when the probability of a success is p = 0.10. The following figures show four hollow histograms for simulated samples from the binomial distribution using four different sample sizes: n = 10, 30, 100, 300. What happens to the shape of the distributions as the sample size increases? What distribution does the last histogram resemble?

Four hollow histograms side by side. First (n = 10): tallest bars at 0 to 2, with lower bars to the right. Second (n = 30): peak near 4, with lower bars on either side. Third (n = 100): x values from 0 to 20, bell-shaped. Fourth (n = 300): x values from 10 to 50, bell-shaped.
Figure 5.14. Hollow Histograms for Different Sample Sizes

The distribution appears to transform from a blocky, skewed shape into one that closely resembles the normal distribution in the last hollow histogram.

The binomial distribution with probability of success p is nearly normal when the sample size n is large enough that np and n(1 − p) are both at least 10. The approximating normal distribution has parameters corresponding to the mean and standard deviation of the binomial distribution: µ = np and σ = [latex]\sqrt{np(1-p)}[/latex].
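To connect this rule with the figure, the sketch below computes np and n(1 − p) for the four sample sizes shown there (p = 0.10). The helper name `nearly_normal` is ours, for illustration only.

```python
def nearly_normal(n: int, p: float) -> bool:
    """True when np and n(1 - p) are both at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10


p = 0.10
for n in (10, 30, 100, 300):
    print(f"n = {n:3d}: np = {n * p:5.1f}, n(1 - p) = {n * (1 - p):5.1f}, "
          f"nearly normal? {nearly_normal(n, p)}")
```

Only n = 100 and n = 300 pass, which matches the two bell-shaped histograms in the figure.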

The normal approximation may be used when computing the probability of a range of many possible successes. For instance, we may apply the normal distribution to the setting of the earlier smoking example:

Example

Use the normal approximation to estimate the probability of observing 42 or fewer smokers in a sample of 400, if the true proportion of smokers is p = 0.15.

We already know the binomial model applies, so we verify that both np and n(1 − p) are at least 10:

  • np = 400 × 0.15 = 60
  • n(1 − p) = 400 × 0.85 = 340

With these conditions met, we may use the normal approximation in place of the binomial distribution using the mean and standard deviation from the binomial model:

  • µ = np = 60 and σ = [latex]\sqrt{np(1-p)}[/latex] = 7.14

We want to find the probability of observing 42 or fewer smokers using this model. Use the normal model N(µ = 60, σ = 7.14) and standardize to estimate the probability of observing 42 or fewer smokers. Your answer should be approximately equal to the solution we found in the previous example, 0.0054.

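Compute the Z-score first: Z = (42 − 60) / 7.14 ≈ −2.52. The corresponding left tail area under the standard normal curve is about 0.0059, close to the exact binomial answer of 0.0054.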
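The standardization step can be reproduced in Python. This sketch assumes the standard library's `math.erf` and the identity Φ(z) = ½(1 + erf(z/√2)); the helper name `normal_cdf` is hypothetical.

```python
from math import erf, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(Y <= x) for Y ~ N(mu, sigma), computed with the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


mu, sigma = 60, 7.14
z = (42 - mu) / sigma                       # standardize the cutoff 42
print(round(z, 2))                          # -2.52
print(round(normal_cdf(42, mu, sigma), 4))  # 0.0059
```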

 

The Continuity Correction

The normal approximation to the binomial distribution tends to perform poorly when estimating the probability of a small range of counts, even when the conditions are met.

Suppose we wanted to compute the probability of observing 49, 50, or 51 smokers in 400 when p = 0.15. With such a large sample, we might be tempted to apply the normal approximation and use the range 49 to 51. However, we would find that the binomial solution and the normal approximation notably differ:

  • Binomial: 0.0649
  • Normal: 0.0421

We can identify the cause of this discrepancy in the next figure, which shows the areas representing the binomial probability (outlined) and the normal approximation (shaded). Notice that the area under the normal distribution is 0.5 units too narrow on both sides of the interval.

A bell-shaped curve over an x-axis from 40 to 80, marked in steps of 10. A narrow region around x = 50 is highlighted.
Figure 5.15. Continuity Correction

The normal approximation to the binomial distribution for intervals of values can usually be improved if the cutoff values are modified slightly: the cutoff value for the lower end of a shaded region should be reduced by 0.5, and the cutoff value for the upper end should be increased by 0.5. This is called the continuity correction.

Adding this extra area when applying the normal approximation is most useful when examining a range of observations. In the example above, the revised normal estimate is 0.0633, much closer to the exact value of 0.0649. The correction can also be applied when computing a tail area, but there its benefit usually disappears, since the total interval is typically quite wide. Even so, to get the best approximation we will use it throughout: add 0.5 to x or subtract 0.5 from x (use x + 0.5 or x – 0.5), moving the cutoff 0.5 away from the region when x is included in the event and 0.5 into the region when x is excluded. The number 0.5 is called the continuity correction factor and is used in the following example.
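The three estimates for 49, 50, or 51 smokers can be recomputed side by side. This is a minimal sketch using only the Python standard library (`math.comb` needs Python 3.8+); the variable names are our own, and the printed values should agree with those quoted in this discussion up to rounding in the last digit.

```python
from math import comb, erf, sqrt

n, p = 400, 0.15
mu, sigma = n * p, sqrt(n * p * (1 - p))  # 60 and about 7.14


def phi(x: float) -> float:
    """P(Y <= x) for the approximating normal Y ~ N(mu, sigma)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


# Exact binomial probability of 49, 50, or 51 successes
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(49, 52))
plain = phi(51) - phi(49)          # cutoffs used as they are
corrected = phi(51.5) - phi(48.5)  # each cutoff moved 0.5 outward
print(f"binomial {exact:.4f}, normal {plain:.4f}, corrected {corrected:.4f}")
```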

 

Example

Suppose in a local Kindergarten through 12th grade (K – 12) school district, 53 percent of the population favor a charter school for grades K through 5. A simple random sample of 300 is surveyed.

  1. Find the probability that at least 150 favor a charter school.
  2. Find the probability that at most 160 favor a charter school.
  3. Find the probability that more than 155 favor a charter school.
  4. Find the probability that fewer than 147 favor a charter school.
  5. Find the probability that exactly 175 favor a charter school.

Let X = the number of people who favor a charter school for grades K through 5. X ~ B(n, p) where n = 300 and p = 0.53. Since np = 159 > 5 and nq = 141 > 5, use the normal approximation to the binomial.

Remember, the formulas for the mean and standard deviation are μ = np and σ = [latex]\sqrt{npq}[/latex]. The mean is 159 and the standard deviation is 8.6447. The random variable for the normal distribution is Y: Y ~ N(159, 8.6447).

Recall that in Section 6.2 we learned how to use the normalcdf() function on the graphing calculator:

Using the TI-83+ and TI-84 Calculators for Normal Probabilities

Go into 2nd DISTR.

Press 2:normalcdf.

The syntax is as follows: normalcdf(lower value, upper value, mean, standard deviation)

In some instances:

  • the upper number of the area might be 1E99 (= 10^99). You get 1E99 by pressing 1, the EE key (a 2nd key), and then 99. Or, you can enter 10^99 instead. The number 10^99 is way out in the right tail of the normal curve.
  • the lower number of the area might be –1E99 (= –10^99). The number –10^99 is way out in the left tail of the normal curve.

Use that function for this problem!
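Readers without a TI calculator can mimic normalcdf() in Python. This is a sketch, not the calculator's own algorithm: it relies on `math.erf`, keeps the TI argument order, and uses `math.inf` in place of 1E99 (and `-math.inf` in place of –1E99).

```python
from math import erf, inf, sqrt


def normalcdf(lower: float, upper: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Area under the N(mean, sd) curve between lower and upper."""
    def cdf(x: float) -> float:
        return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

    return cdf(upper) - cdf(lower)


print(round(normalcdf(149.5, inf, 159, 8.6447), 4))  # 0.8641
```

Defaulting `mean` and `sd` to 0 and 1 means a call with only two arguments works on the standard normal curve.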

 

Solution

For part 1, you include 150 so P(X ≥ 150) has normal approximation P(Y ≥ 149.5) = 0.8641.

normalcdf(149.5,10^99,159,8.6447) = 0.8641.

 

Solution

For part 2, you include 160 so P(X ≤ 160) has normal approximation P(Y ≤ 160.5) = 0.5689.

normalcdf(0,160.5,159,8.6447) = 0.5689

 

Solution

For part 3, you exclude 155 so P(X > 155) has normal approximation P(Y > 155.5) = 0.6572.

normalcdf(155.5,10^99,159,8.6447) = 0.6572.

 

Solution

For part 4, you exclude 147 so P(X < 147) has normal approximation P(Y < 146.5) = 0.0741.

normalcdf(0,146.5,159,8.6447) = 0.0741

 

Solution

For part 5, P(X = 175) has normal approximation P(174.5 < Y < 175.5) = 0.0083.

normalcdf(174.5,175.5,159,8.6447) = 0.0083
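The five solutions follow one pattern: move the cutoff 0.5 away from the region when k is included, and 0.5 into the region when k is excluded. Below is a Python sketch of that bookkeeping with hypothetical helper names `area` and `approx`. It uses ±`math.inf` where the solutions use 0 or 10^99, which changes the results only far beyond the fourth decimal place.

```python
from math import erf, inf, sqrt

N, P = 300, 0.53
MU, SIGMA = N * P, sqrt(N * P * (1 - P))  # 159 and about 8.6447


def area(lower: float, upper: float) -> float:
    """Area under N(MU, SIGMA) between lower and upper."""
    def cdf(x: float) -> float:
        return 0.5 * (1 + erf((x - MU) / (SIGMA * sqrt(2))))
    return cdf(upper) - cdf(lower)


def approx(op: str, k: int) -> float:
    """Continuity-corrected normal estimate of P(X op k) for X ~ B(N, P)."""
    bounds = {
        ">=": (k - 0.5, inf),       # k included: start 0.5 below it
        "<=": (-inf, k + 0.5),      # k included: stop 0.5 above it
        ">": (k + 0.5, inf),        # k excluded: start 0.5 above it
        "<": (-inf, k - 0.5),       # k excluded: stop 0.5 below it
        "==": (k - 0.5, k + 0.5),   # a single value becomes a width-1 interval
    }
    return area(*bounds[op])


for op, k in [(">=", 150), ("<=", 160), (">", 155), ("<", 147), ("==", 175)]:
    print(f"P(X {op} {k}) is approximately {approx(op, k):.4f}")
```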

 

Note

Because of calculators and computer software that let you calculate binomial probabilities for large values of n easily, it is not necessary to use the normal approximation to the binomial distribution, provided that you have access to these technology tools. Most school labs have Microsoft Excel, an example of computer software that calculates binomial probabilities. Many students have access to the TI-83 or 84 series calculators, and they easily calculate probabilities for the binomial distribution. If you type in “binomial probability distribution calculation” in an Internet browser, you can find at least one online calculator for the binomial.

For the previous example, the exact probabilities calculated from the binomial distribution with n = 300 and p = 0.53 are listed below. Compare the binomial and normal distribution answers.

P(X ≥ 150) : 1 - binomcdf(300,0.53,149) = 0.8641

P(X ≤ 160) : binomcdf(300,0.53,160) = 0.5684

P(X > 155) : 1 - binomcdf(300,0.53,155) = 0.6576

P(X < 147) : binomcdf(300,0.53,146) = 0.0742

P(X = 175) : binompdf(300,0.53,175) = 0.0083
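The exact binomial values listed above can also be reproduced without a calculator. The sketch assumes Python 3.8 or later for `math.comb`; `pmf` and `cdf` are our own stand-ins for binompdf() and binomcdf(), not library functions.

```python
from math import comb

n, p = 300, 0.53


def pmf(k: int) -> float:
    """Like binompdf(n, p, k): probability of exactly k successes."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def cdf(k: int) -> float:
    """Like binomcdf(n, p, k): probability of at most k successes."""
    return sum(pmf(x) for x in range(k + 1))


exact = {
    "P(X >= 150)": 1 - cdf(149),
    "P(X <= 160)": cdf(160),
    "P(X > 155)": 1 - cdf(155),
    "P(X < 147)": cdf(146),
    "P(X = 175)": pmf(175),
}
for event, value in exact.items():
    print(f"{event}: {value:.4f}")
```

As with the calculator commands, "at least 150" and "more than 155" are found as complements of cumulative sums.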

 

Your Turn!

In a city, 46 percent of the population favor the incumbent, Dawn Morgan, for mayor. A simple random sample of 500 is taken. Using the continuity correction factor, find the probability that at least 250 favor Dawn Morgan for mayor.

 

Solution

0.0401
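A quick Python check of this answer, as a sketch: it applies the continuity correction to the cutoff (249.5, since 250 is included) and uses the standard library's `math.erfc` for the upper-tail area.

```python
from math import erfc, sqrt

n, p = 500, 0.46
mu, sigma = n * p, sqrt(n * p * (1 - p))  # 230 and about 11.1445

# "At least 250" includes 250, so the corrected cutoff is 249.5.
z = (249.5 - mu) / sigma
upper_tail = 0.5 * erfc(z / sqrt(2))      # P(Z > z) for the standard normal
print(round(z, 2), round(upper_tail, 4))  # 1.75 0.0401
```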

 



License


Introductory Statistics Copyright © 2024 by LOUIS: The Louisiana Library Network is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
