Chapter 4: Discrete Random Variables
4.3 Binomial Distribution
Learning Objectives
By the end of this section, you should be able to:
- Identify the components of a binomial experiment
- Use the formulas for a binomial random variable to compute mean, variance, and standard deviation
Binomial Experiments
There are three characteristics of a binomial experiment.
- There is a fixed number of trials. Think of trials as repetitions of an experiment. The letter [latex]n[/latex] denotes the number of trials.
- There are only two possible outcomes, called “success” and “failure,” for each trial. The letter [latex]p[/latex] denotes the probability of a success on one trial, and [latex]q[/latex] denotes the probability of a failure on one trial, so [latex]p + q = 1[/latex].
- The [latex]n[/latex] trials are independent and are repeated using identical conditions. Because the [latex]n[/latex] trials are independent, the outcome of one trial does not help in predicting the outcome of another trial. Another way of saying this is that for each individual trial, the probability, [latex]p[/latex], of a success and probability, [latex]q[/latex], of a failure remain the same.
For example, randomly guessing at a true-false statistics question has only two outcomes. If a success is guessing correctly, then a failure is guessing incorrectly. Suppose Taylor consistently guesses correctly on any statistics true-false question with probability [latex]p = 0.6[/latex]. Then [latex]q = 0.4[/latex]. Consistency in guessing means that for every true-false statistics question Taylor answers, the probability of success ([latex]p = 0.6[/latex]) and the probability of failure ([latex]q = 0.4[/latex]) remain the same, a necessary criterion for a situation to be binomial.
Any experiment that has characteristics two and three and where [latex]n = 1[/latex] is called a Bernoulli Trial (named after Jacob Bernoulli who, in the late 1600s, studied them extensively). A binomial experiment takes place when the number of successes is counted in one or more Bernoulli Trials.
The outcomes of a binomial experiment fit a binomial probability distribution. The random variable [latex]X =[/latex] the number of successes obtained in the [latex]n[/latex] independent trials.
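To see how a binomial random variable arises from Bernoulli trials, the Python sketch below simulates Taylor's true-false guessing: each question is a Bernoulli trial with success probability [latex]p = 0.6[/latex], and [latex]X[/latex] counts the successes in [latex]n[/latex] questions. The number of questions (10), the number of repetitions, the seed, and the helper names are illustrative assumptions, not values from the text.

```python
import random

def bernoulli_trial(p: float, rng: random.Random) -> int:
    """One Bernoulli trial: return 1 (success) with probability p, else 0 (failure)."""
    return 1 if rng.random() < p else 0

def binomial_count(n: int, p: float, rng: random.Random) -> int:
    """Count the successes in n independent Bernoulli trials that share the same p."""
    return sum(bernoulli_trial(p, rng) for _ in range(n))

# Hypothetical setup: Taylor answers n = 10 true-false questions with p = 0.6.
rng = random.Random(2024)  # fixed seed so the run is reproducible
n, p = 10, 0.6
samples = [binomial_count(n, p, rng) for _ in range(10_000)]

print("one run of the experiment:", samples[0])
print("smallest and largest counts seen:", min(samples), max(samples))
print("average number of successes:", sum(samples) / len(samples))
```

Every simulated value of [latex]X[/latex] lies between 0 and [latex]n[/latex], and over many repetitions the average should settle near [latex]np = 6[/latex].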
Notation for the Binomial
We use [latex]B[/latex] to represent the binomial probability distribution, and when [latex]X[/latex] fits the binomial distribution, we write [latex]X \sim B(n,p)[/latex]. Read this as “[latex]X[/latex] is a random variable with a binomial distribution with parameters [latex]n[/latex] and [latex]p[/latex],” which again represent the number of trials and the probability of “success.”
Example
At ABC College, the withdrawal rate from an elementary physics course is 30% for any given term. This implies that, for any given term, 70% of the students stay in the class for the entire term. A “success” could be defined as an individual who withdrew. The random variable [latex]X =[/latex] the number of students who withdraw from the randomly selected elementary physics class.
If 300 students are enrolled in the elementary physics course at the start of a particular term, then [latex]n = 300[/latex], [latex]p = 0.3[/latex], and [latex]X \sim B(300, 0.3)[/latex]. The possible outcomes are [latex]x = 0, 1, \ldots, 300[/latex], and [latex]P(X=x)[/latex] is the probability that [latex]x[/latex] students will withdraw during the term.
Your turn!
Approximately 70% of statistics students do their homework in time for it to be collected and graded. Each student does homework independently. In a statistics class of 50 students, what is the probability that at least 40 will do their homework on time?
a. This is a binomial problem because there is only a success or a [latex]\underline{\hspace{20pt}}[/latex], there are a fixed number of trials, and the probability of a success is 0.70 for each trial.
b. If we are interested in the number of students who do their homework on time, then how do we define [latex]X[/latex]?
c. What values does [latex]x[/latex] take on?
d. What is a “failure,” in words?
e. If [latex]p + q = 1[/latex], then what is [latex]q[/latex]?
f. The words “at least” translate as what kind of inequality in the probability question [latex]P(X \underline{\hspace{20pt}} 40)[/latex]?
Your turn!
Suppose you play a game that you can only either win or lose. The probability that you win any game is 55%, and the probability that you lose is 45%. Each game you play is independent. If you play the game 20 times, write the function that describes the probability that you win 15 of the 20 times.
Experiments That Are Not Binomial
Here are some common experiments that are not binomial:
- Flipping a coin until you get one head. While there are only two outcomes (heads and tails) to each coin flip, and while the probability of getting a head on each flip is consistent, the number of trials will vary.
- Most characteristics about people, such as weight, height, ethnicity, and gender. When we survey a group of people and ask for their weight, we get far more than two responses. When surveying ethnicity, as discussed in Chapter 1, an Other or Unknown category is needed to include all people in our results, specifically people who did not feel they fit into any of the ethnicity categories or who declined to respond. While gender has historically been treated as a binary outcome (male and female), not all people feel they fit those two categories, so we may lose information and reliability by modeling gender as binomial.
- Drawing cards without replacement. While you can specify a fixed number [latex]n[/latex] of trials, and you can set it up as two outcomes (success = red, failure = black), the draws are not independent. If you draw a red card first, you are now slightly more likely to draw a black card than a red one on the second draw.
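The dependence between draws can be checked with exact arithmetic. The Python sketch below assumes a standard 52-card deck with 26 red and 26 black cards and compares the probability of red on the first draw with the probability of red on the second draw, given that the first card was red.

```python
from fractions import Fraction

RED, BLACK = 26, 26  # standard 52-card deck (assumption)

# First draw: red and black are equally likely.
p_red_first = Fraction(RED, RED + BLACK)

# Second draw without replacement, given the first card was red:
# one red card is gone, so 25 red and 26 black cards remain out of 51.
p_red_second = Fraction(RED - 1, RED + BLACK - 1)
p_black_second = Fraction(BLACK, RED + BLACK - 1)

print(p_red_first)     # 1/2
print(p_red_second)    # 25/51
print(p_black_second)  # 26/51
```

Because the probability of a “success” (red) shifts from 1/2 to 25/51 after the first draw, [latex]p[/latex] is not the same on every trial, so the number of red cards drawn is not binomial.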
Example
ABC College has a student advisory committee made up of ten staff members and six students. The committee wishes to choose a chairperson and a recorder by putting the names of all committee members into a box, and two names are drawn without replacement. The first name drawn determines the chairperson and the second name the recorder. Suppose the college wishes to find the probability that the chairperson and recorder are both students. Is this binomial?
Your turn!
Suppose 55% of people pass the state driver’s exam on the first try. You survey fifty randomly chosen adults in the state.
Which of these are binomial problems?
- The number of the adults who have a driver’s license.
- The number of the adults who got their driver’s license on the first try.
- The number of times each adult has taken the driver’s exam.
Binomial Probability Function
Once we have decided we can use the binomial for a given situation, we can use the binomial probability function to find the probability of a specific number of successes, [latex]P(X=x)[/latex]. The binomial PMF is made up of two parts:
First, we need to find out how many different ways we can get x successes in n trials. To do this we can use the “Choose” function, also called the binomial coefficient, written as:
[latex]{}_nC_x = \binom{n}{x} = \frac{n!}{x!(n-x)!}[/latex]
Note: The ! mark is the factorial operator: [latex]n! = n(n-1)(n-2)\cdots(2)(1)[/latex], with [latex]0! = 1[/latex].
The next part gives the probability of any single one of those ways to get x successes in n trials. We can find it with the multiplication rule for independent events: we multiply the probability of success ([latex]p[/latex]) raised to the number of successes ([latex]x[/latex]) by the probability of failure ([latex]q=1-p[/latex]) raised to the number of failures ([latex]n-x[/latex]).
[latex]p^x q^{(n-x)}[/latex]
Each of these ways is equally likely, and we know how many ways there are, so we can now put the two pieces together: we multiply the probability of one way by the number of ways to get the overall probability of x successes in n trials.
[latex]P(X = x) = \frac{n!}{x!(n-x)!} p^x q^{(n-x)}[/latex]
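The two pieces of the formula translate directly into code. The Python sketch below computes the number of ways from factorials and the probability of one way from the multiplication rule; the function names choose and binom_pmf are our own labels, and the values [latex]n = 5[/latex], [latex]p = 0.6[/latex] are a hypothetical version of Taylor's guessing. It also checks that the probabilities for [latex]x = 0, 1, \ldots, n[/latex] add up to 1, as any probability distribution must.

```python
import math

def choose(n: int, x: int) -> int:
    """Binomial coefficient n! / (x! (n - x)!): the number of ways to get x successes."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ B(n, p): (number of ways) * (probability of one way)."""
    if not 0 <= x <= n:
        return 0.0
    q = 1 - p
    one_way = p**x * q**(n - x)  # p^x q^(n - x), by the multiplication rule
    return choose(n, x) * one_way

# Hypothetical example: Taylor answers n = 5 true-false questions with p = 0.6.
n, p = 5, 0.6
for x in range(n + 1):
    print(x, choose(n, x), round(binom_pmf(x, n, p), 4))

# The probabilities over x = 0..n should add to 1 (up to floating-point rounding).
print(sum(binom_pmf(x, n, p) for x in range(n + 1)))
```

The factorial version agrees with Python's built-in math.comb (Python 3.8 and later), which is the more practical choice for large n.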
Unfortunately, the binomial distribution does not have a simple closed-form CDF, but the CDF is just the sum of the PMF values up to and including that point: [latex]P(X \leq x) = \sum_{k=0}^{x} P(X = k)[/latex]. Consider the following example to demonstrate this point.
Example
It has been stated that about 41% of adult workers have a high school diploma but do not pursue any further education. Twenty adult workers are randomly selected.
Let [latex]X =[/latex] the number of workers who have a high school diploma but do not pursue any further education.
Then [latex]X[/latex] takes on the values [latex]0, 1, 2, \ldots, 20[/latex] where [latex]n = 20, p = 0.41[/latex], and [latex]q = 1-0.41 = 0.59[/latex]. Finally, [latex]X \sim B(20,0.41).[/latex]
[Figure: graph of [latex]X \sim B(20, 0.41)[/latex]. The y-axis gives the probability of [latex]x[/latex], where [latex]X =[/latex] the number of workers who have only a high school diploma.]
Find the probability that:
(a) Exactly 12 of them have a high school diploma but do not pursue any further education.
(b) At most 12 of them have a high school diploma but do not pursue any further education.
(c) How many adult workers do you expect to have a high school diploma but not pursue any further education?
Note
To compute binomial probabilities on a graphing calculator, press 2nd DISTR. The syntax for the commands is as follows:
To calculate [latex]P(X = x)[/latex]: binompdf(n, p, x).
If [latex]x[/latex] is left out, the result is the binomial probability table.
To calculate [latex]P(X \leq x)[/latex]: binomcdf(n, p, x).
If [latex]x[/latex] is left out, the result is the cumulative binomial probability table.
For the example problem: After you are in 2nd DISTR, arrow down to binomcdf(. Press ENTER. Enter 20,0.41,12). The result is [latex]P(X \leq 12) = 0.9738[/latex].
If you wanted to instead find [latex]P(X>12)[/latex], use 1 – binomcdf(20,0.41,12).
In Excel, both binomial probabilities are computed using BINOM.DIST(x, n, p, True/False), where False computes [latex]P(X=x)[/latex] and is equivalent to binompdf, and True computes [latex]P(X \leq x)[/latex] and is equivalent to binomcdf.
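If neither a graphing calculator nor Excel is at hand, the binompdf and binomcdf commands can be mimicked in a few lines of Python. This is a sketch using only the standard library; the function names simply echo the calculator's commands (same argument order) and are not part of any Python package. The cumulative function adds up the individual probabilities, which is exactly how the binomial CDF is defined.

```python
import math

def binompdf(n: int, p: float, x: int) -> float:
    """P(X = x) for X ~ B(n, p); same argument order as the calculator command."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binomcdf(n: int, p: float, x: int) -> float:
    """P(X <= x) for X ~ B(n, p): the sum of binompdf(n, p, k) for k = 0..x."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

# Worker example: X ~ B(20, 0.41).
at_most_12 = binomcdf(20, 0.41, 12)
more_than_12 = 1 - at_most_12

print(round(at_most_12, 4))    # 0.9738
print(round(more_than_12, 4))  # 0.0262
```

Summing term by term is fine at this size; for very large n a statistics library would be the more robust choice.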
Your turn!
About 32% of students participate in a community volunteer program outside of school. If 30 students are selected at random, find:
(a) The probability that exactly 14 of them participate in a community volunteer program outside of school. First try plugging into the binomial formula by hand, then check your answer with technology.
(b) The probability that at most 14 of them participate in a community volunteer program outside of school. Rely on technology for this probability.
Expected Value and Standard Deviation
The mean, [latex]\mu[/latex], and variance, [latex]\sigma^2[/latex], for the binomial probability distribution are [latex]\mu = np[/latex] and [latex]\sigma^2 = npq[/latex]. The standard deviation is then [latex]\sigma = \sqrt{npq}[/latex].
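The shortcut formulas can be checked against the general definitions for a discrete random variable, [latex]\mu = \sum x\,P(X=x)[/latex] and [latex]\sigma^2 = \sum (x-\mu)^2 P(X=x)[/latex]. The Python sketch below does this for the worker example, [latex]X \sim B(20, 0.41)[/latex]; the helper name binom_pmf is our own label.

```python
import math

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.41
q = 1 - p

# Shortcut formulas for the binomial distribution.
mu_formula = n * p           # np
var_formula = n * p * q      # npq
sd_formula = math.sqrt(var_formula)

# General definitions for a discrete random variable, summed over x = 0..n.
mu_def = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var_def = sum((x - mu_def) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))

print(round(mu_formula, 4), round(mu_def, 4))    # 8.2 8.2
print(round(var_formula, 4), round(var_def, 4))  # 4.838 4.838
print(round(sd_formula, 4))                      # 2.1995
```

Both routes agree, so in the worker example we expect about [latex]np = 8.2[/latex] of the 20 workers to have a high school diploma but no further education.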
Example
In the 2013 Jerry’s Artarama art supplies catalog, there are 560 pages. Eight of the pages feature signature artists. Suppose we randomly sample 100 pages. Let X = the number of pages that feature signature artists.
- What values does x take on?
- What is the probability distribution? Find the following probabilities:
- the probability that two pages feature signature artists
- the probability that at most six pages feature signature artists
- the probability that more than three pages feature signature artists.
- Using the formulas, calculate the (i) mean and (ii) standard deviation.
Solution
- x = 0, 1, 2, 3, …, n, where n=100
- X ~ B[latex]\left(100,\frac{8}{560}\right)[/latex]
- P(X = 2) = binompdf[latex]\left(100,\frac{8}{560},2\right)[/latex] = 0.2466
- P(X ≤ 6) = binomcdf[latex]\left(100,\frac{8}{560},6\right)[/latex] = 0.9994
- P(X > 3) = 1 – P(X ≤ 3) = 1 – binomcdf[latex]\left(100,\frac{8}{560},3\right)[/latex] = 1 – 0.9443 = 0.0557
- Mean = np = (100)[latex]\left(\frac{8}{560}\right)[/latex] = [latex]\frac{800}{560}[/latex] ≈ 1.4286
- Standard Deviation = [latex]\sqrt{npq}[/latex] = [latex]\sqrt{\left(100\right)\left(\frac{8}{560}\right)\left(\frac{552}{560}\right)}[/latex] ≈ 1.1867
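The values in this solution can be reproduced with a short Python sketch. The helpers below are hypothetical stand-ins for the calculator's binompdf and binomcdf (same argument order), not a library API; [latex]p = \frac{8}{560}[/latex] is used as a floating-point number.

```python
import math

def binompdf(n: int, p: float, x: int) -> float:
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binomcdf(n: int, p: float, x: int) -> float:
    """P(X <= x) for X ~ B(n, p), as a running sum of binompdf."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

n, p = 100, 8 / 560  # 100 sampled pages; 8 of the 560 pages feature signature artists
q = 1 - p

print(round(binompdf(n, p, 2), 4))      # P(X = 2):  0.2466
print(round(binomcdf(n, p, 6), 4))      # P(X <= 6): 0.9994
print(round(1 - binomcdf(n, p, 3), 4))  # P(X > 3):  0.0557
print(round(n * p, 4))                  # mean:      1.4286
print(round(math.sqrt(n * p * q), 4))   # std. dev.: 1.1867
```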
Your turn!
According to a Gallup poll, 60% of American adults prefer saving over spending. Let X = the number of American adults out of a random sample of 50 who prefer saving to spending.
- What is the probability distribution for X?
- Use your calculator to find the following probabilities:
- the probability that 25 adults in the sample prefer saving over spending
- the probability that at most 20 adults prefer saving
- the probability that more than 30 adults prefer saving
- Using the formulas, calculate the (i) mean and (ii) standard deviation of X.
Your turn!
During the 2013 regular NBA season, DeAndre Jordan of the Los Angeles Clippers had the highest field goal completion rate in the league. DeAndre scored with 61.3% of his shots. Suppose you choose a random sample of 80 shots made by DeAndre during the 2013 season. Let X = the number of shots that scored points.
- What is the probability distribution for X?
- Using the formulas, calculate the (i) mean and (ii) standard deviation of X.
- Use your calculator to find the probability that DeAndre scored with 60 of these shots.
- Find the probability that DeAndre scored with more than 50 of these shots.
References
“Access to electricity (% of population),” The World Bank, 2013. Available online at http://data.worldbank.org/indicator/EG.ELC.ACCS.ZS?order=wbapi_data_value_2009%20wbapi_data_value%20wbapi_data_value-first&sort=asc (accessed May 15, 2015).
“Distance Education.” Wikipedia. Available online at http://en.wikipedia.org/wiki/Distance_education (accessed May 15, 2013).
“NBA Statistics – 2013,” ESPN NBA, 2013. Available online at http://espn.go.com/nba/statistics/_/seasontype/2 (accessed May 15, 2013).
Newport, Frank. “Americans Still Enjoy Saving Rather than Spending: Few demographic differences seen in these views other than by income,” GALLUP® Economy, 2013. Available online at http://www.gallup.com/poll/162368/americans-enjoy-saving-rather-spending.aspx (accessed May 15, 2013).
Pryor, John H., Linda DeAngelo, Laura Palucki Blake, Sylvia Hurtado, Serge Tran. The American Freshman: National Norms Fall 2011. Los Angeles: Cooperative Institutional Research Program at the Higher Education Research Institute at UCLA, 2011. Also available online at http://heri.ucla.edu/PDFs/pubs/TFS/Norms/Monographs/TheAmericanFreshman2011.pdf (accessed May 15, 2013).
“The World FactBook,” Central Intelligence Agency. Available online at https://www.cia.gov/library/publications/the-world-factbook/geos/af.html (accessed May 15, 2013).
“What are the key statistics about pancreatic cancer?” American Cancer Society, 2013. Available online at http://www.cancer.org/cancer/pancreaticcancer/detailedguide/pancreatic-cancer-key-statistics (accessed May 15, 2013).
Bernoulli Trials: an experiment with the following characteristics:
1. There are only two possible outcomes called “success” and “failure” for each trial.
2. The probability p of a success is the same for any trial (so the probability q = 1 − p of a failure is the same for any trial).
Binomial Experiment: a statistical experiment that satisfies three conditions:
1. There are a fixed number of trials, n.
2. There are only two possible outcomes, called "success" and "failure," for each trial. The letter p denotes the probability of a success on one trial, and q denotes the probability of a failure on one trial.
3. The n trials are independent and are repeated using identical conditions.
Binomial Probability Distribution: a discrete random variable that arises from Bernoulli trials; there is a fixed number, n, of independent trials with two outcomes, called success and failure, with probabilities p and q, respectively. The binomial random variable X is the number of successes in n trials, denoted [latex]X \sim B(n,p)[/latex]. The mean is [latex]\mu = np[/latex] and the standard deviation is [latex]\sigma = \sqrt{npq}[/latex]. The probability of exactly [latex]x[/latex] successes in n trials is [latex]P(X=x)=\binom{n}{x}p^{x}q^{n-x}[/latex].
Probability Mass Function (PMF): a function that gives the probability that a discrete random variable (X) is exactly equal to some value (x).
Cumulative Distribution Function (CDF): a function that gives the probability that a random variable takes a value less than or equal to x.