Chapter 6: The Normal Distribution and The Central Limit Theorem
6.2 Using the Normal Distribution
Learning Objectives
By the end of this section, the student should be able to:
- Find areas under the normal distribution curve.
- Use the inverse normal to identify values for a given area under the normal curve.
- Compute probabilities and percentiles for real-world applications using the normal distribution.
Finding Normal Probabilities
The shaded area in the following graph indicates the area to the left of x. This area is represented by the probability [latex]P(X \lt x)[/latex]. Normal tables, computers, and calculators provide or calculate the probability [latex]P(X \lt x)[/latex].
The area to the right is then [latex]P(X > x) = 1 - P(X \lt x)[/latex]. Remember, [latex]P(X \lt x) = \text{Area to the left of the vertical line through x}[/latex]. [latex]P(X > x) = 1 - P(X \lt x)= \text{Area to the right of the vertical line through x}[/latex]. [latex]P(X \lt x)[/latex] is the same as [latex]P( X \le x)[/latex] and [latex]P(X > x)[/latex] is the same as [latex]P(X \ge x)[/latex] for continuous distributions.
Your Turn!
If the area to the left of x is 0.012, then what is the area to the right?
Solution
[latex]1 - 0.012 = 0.988[/latex]
There are three main ways to find probabilities associated with the normal distribution:
- Math (via Calculus Integration)
- The Standardizing Process
- Technology
We would like to avoid complicated math if possible, so option 1 is out.
In order to avoid the math, a process called “Standardizing” can be used (option 2). This involves Z-scores, the Standard Normal Distribution and Tables. Although this tried and true process is now somewhat antiquated, it is a great place to start.
There are many technologies such as calculators and various statistical software that let us skip the entire standardizing process and instantaneously provide us with a probability. Although we typically have these at our disposal to use in practice, it is good to understand the process going on behind the scenes to make sure we apply our technology correctly.
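To make the three options concrete, here is a minimal Python sketch (not part of the original text) that computes the same area two ways: by numerically integrating the normal density, as a stand-in for the calculus of option 1, and by asking a library for the cumulative probability, as in option 3. The helper names `normal_pdf` and `area_by_integration` are illustrative assumptions; `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the N(mu, sigma) density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_by_integration(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Approximate the area under the density from a to b (midpoint rule)."""
    width = (b - a) / steps
    heights = (normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

# Option 1 (calculus): integrate from far out in the left tail up to z = 1.
by_integration = area_by_integration(-10, 1)

# Option 3 (technology): ask the library for P(Z < 1) directly.
by_library = NormalDist(mu=0, sigma=1).cdf(1)

print(round(by_integration, 4), round(by_library, 4))
```

The two numbers should agree to several decimal places, which is the sense in which technology "skips" the integration rather than replacing it with something different.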
The Standardizing Process
The standard normal distribution (SND) is the simplest form of the normal distribution you can think of. Recall some important facts from the previous section:
- The mean for the standard normal distribution is zero, and the standard deviation is one.
- If X is a normally distributed random variable and [latex]X \sim N( \mu, \sigma)[/latex], then the z-score is: [latex]z=\frac{x-\mu }{\sigma }[/latex]. The transformation [latex]z = \frac{x-\mu }{\sigma }[/latex] produces the distribution [latex]Z \sim N(0, 1)[/latex]. The value x in the given equation comes from a normal distribution with mean [latex]\mu[/latex] and standard deviation [latex]\sigma[/latex].
- A z-score is measured in units of the standard deviation.
- A z-score tells you how many standard deviations the value x is above (to the right of) or below (to the left of) the mean, [latex]\mu[/latex]. Values of x that are larger than the mean have positive z-scores, and values of x that are smaller than the mean have negative z-scores. If x equals the mean, then x has a z-score of zero.
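The z-score formula can be sketched in a few lines of Python. The function name `z_score` and the values below (a mean of 10 and a standard deviation of 2) are made up purely for illustration and do not come from the text.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical distribution for illustration only: X ~ N(10, 2).
print(z_score(13, mu=10, sigma=2))   # a value above the mean gives a positive z-score
print(z_score(7, mu=10, sigma=2))    # a value below the mean gives a negative z-score
print(z_score(10, mu=10, sigma=2))   # a value equal to the mean gives a z-score of zero
```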
We have the Z table at our disposal with probabilities already calculated and organized. Note that some Z tables (like the one in Figure 6.4) give us the left-tailed CDF, or “less than” probability. For example, using Figure 6.4 below, the area to the left of a Z score of -3.37 is [latex]P(Z \le -3.37) = 0.0004[/latex].
We can then use these CDF values, [latex]P(Z \le z)[/latex], and some probability rules to find greater than [latex]P(Z \ge z) = 1 - P(Z \le z)[/latex] or in between [latex]P(a \le Z \le b) = P(Z \le b) - P(Z \le a)[/latex] probabilities.
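These three rules can be sketched in Python, with the standard library's `NormalDist` playing the role of a left-tail Z table. The function names `left`, `right`, and `between` are illustrative assumptions, not standard terminology.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # the standard normal distribution

def left(z):
    """P(Z <= z): the value a left-tail Z table reports."""
    return Z.cdf(z)

def right(z):
    """P(Z >= z) = 1 - P(Z <= z)."""
    return 1 - Z.cdf(z)

def between(a, b):
    """P(a <= Z <= b) = P(Z <= b) - P(Z <= a)."""
    return Z.cdf(b) - Z.cdf(a)

print(round(left(-3.37), 4))      # left-tail area for z = -3.37
print(round(right(1.0), 4))       # right-tail area for z = 1
print(round(between(-1, 0.5), 4)) # area between z = -1 and z = 0.5
```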
Here is a link to another Z table example: Normal Table (Source: NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, January 3, 2009). This table gives positive z-scores and the area in the table is the area under the normal curve from 0 (the mean) to that z-score. Using this table takes some practice and understanding symmetry and area under the normal curve.
A few examples are provided on the table link (Source: NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, January 3, 2009):
- The area to the left of a Z score of 1.53 can be found by going to 1.5 in the X column, go right to the 0.03 column to find the value 0.43699. Now add 0.5 (for the probability less than zero) to obtain the final result of 0.93699. [latex]P(Z \le 1.53) = 0.93699[/latex].
- The probability that z is less than or equal to -1.53 (also called the area to the left of -1.53) can also be found using this table. Remember that the curve is symmetrical, so [latex]P(Z \ge z) = P(Z \le -z)[/latex]. Since we found [latex]P(Z \le 1.53) = 0.93699[/latex], then [latex]P(Z \ge 1.53) = 1 - 0.93699 = 0.06301 = P(Z \le -1.53)[/latex].
- Finding probabilities (area) between values is also possible. To find the probability that z is between -1 and 0.5, look up the table values for 0.5 and 1: [latex]P(Z \le 0.5) = 0.5 + 0.19146 = 0.69146[/latex] and [latex]P(Z \le -1) = 1 - (0.5 + 0.34134) = 0.15866[/latex]. Then subtract the results ([latex]0.69146 - 0.15866[/latex]) to obtain the result 0.5328.
To use any of these Z tables with a non-standard normal distribution (either the location parameter is not 0 or the scale parameter is not 1), standardize your value by subtracting the mean and dividing the result by the standard deviation. Then look up the value for this standardized value. (Source: NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, January 3, 2009)
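The bookkeeping for a table that reports the area from 0 to z can be sketched as follows. This is a hedged illustration: `table_area` is a hypothetical helper that computes the tabulated quantity with Python's `statistics.NormalDist` instead of looking it up, and the three calculations mirror the NIST examples described above.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)

def table_area(z):
    """Area under the standard normal curve from 0 to z, for z >= 0."""
    return Z.cdf(z) - 0.5

# P(Z <= 1.53): add the 0.5 that lies to the left of zero.
p_left_pos = 0.5 + table_area(1.53)

# P(Z <= -1.53): by symmetry, this equals the area to the right of +1.53.
p_left_neg = 0.5 - table_area(1.53)

# P(-1 <= Z <= 0.5): the area from -1 to 0 equals the area from 0 to 1.
p_between = table_area(1.0) + table_area(0.5)

print(round(p_left_pos, 5), round(p_left_neg, 5), round(p_between, 4))
```

Adding the two table areas in the last step is equivalent to the subtraction of left-tail areas used in the text; both give the same result.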
Example
Use the Z table (Normal Table) to find the following probabilities:
a. [latex]P(Z \le -0.54)[/latex]
Solution
The area between 0 and 0.54 can be found by going to 0.5 in the X column, then right to the 0.04 column, to find the value 0.20540. Adding 0.5 to that value ([latex]0.5 + 0.20540 = 0.7054[/latex]) gives the entire area to the left of 0.54, so [latex]P(Z \le 0.54) = 0.7054[/latex]. By symmetry, the area to the left of -0.54 equals the area to the right of 0.54, so subtract from 1: [latex]P(Z \le -0.54) = 1 - 0.7054 = 0.2946[/latex].
b. [latex]P(Z \ge 1.2)[/latex]
Solution
The area between 1.2 and 0 can be found by going to 1.2 in the X column, go right to the 0.00 column to find the value 0.38493. Since the curve has an area to the right of 0 that equals 0.5, then to the right of just 1.2 must be [latex]0.5 - 0.38493 = 0.11507[/latex].
c. [latex]P(-1.5 \le Z \le 0.84)[/latex]
Solution
The area between 0 and 0.84 is found by going to 0.8 in the X column, then right to the 0.04 column, to find the value 0.29955. The area between -1.5 and 0 is the same as the area between 0 and 1.5, which the table gives as 0.43319. The total area is the sum of the two: [latex]0.29955 + 0.43319 = 0.73274[/latex].
Your Turn
Use the Z table to find the following probabilities:
a. [latex]P(Z \le 1)[/latex]
Solution
0.8413
b. [latex]P(Z \ge 1)[/latex]
Solution
0.1587
c. [latex]P(-1 \le Z \le 1)[/latex]
Solution
0.6826
So far we have seen the idea that we can convert any normal distribution with any mean and standard deviation to the standard normal distribution in units of z-scores. We also have the associated probabilities in our Z table. Essentially, the work has been done for us if we know how to standardize and look up the associated probability in the table. The general process is: [latex]X \sim N(\mu, \sigma) \rightarrow Z \sim N(0, 1) \rightarrow \text{Probability from Z table}[/latex]
This process, while maybe outdated in our technology age, is good for beginners to understand and useful when we do not have access to technology.
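The two-step process can be sketched in Python: standardize first, then look up the standard normal probability. The function name `probability_below` and the numbers used (a mean of 50, a standard deviation of 10, and x = 65) are hypothetical and chosen only for illustration. The sketch also checks that skipping the standardizing step gives the same answer.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)

def probability_below(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma), found by standardizing first."""
    z = (x - mu) / sigma            # step 1: X ~ N(mu, sigma)  ->  Z ~ N(0, 1)
    return STANDARD_NORMAL.cdf(z)   # step 2: Z -> probability (the "table lookup")

# Hypothetical values for illustration only: X ~ N(50, 10) and x = 65, so z = 1.5.
via_standardizing = probability_below(65, mu=50, sigma=10)

# The same probability without standardizing by hand.
direct = NormalDist(mu=50, sigma=10).cdf(65)

print(round(via_standardizing, 4), round(direct, 4))
```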
Example
Height and weight are two measurements used to track a child’s development. The World Health Organization measures child development by comparing the weights of children who are the same height and the same gender. In 2009, weights for all 80 cm girls in the reference population had a mean µ = 10.2 kg and standard deviation σ = 0.8 kg. Weights are normally distributed. X ~ N(10.2, 0.8). Calculate the z-scores that correspond to the following weights, then find the associated probabilities.
a. The probability that a child weighs less than 11 kg
Solution
[latex]\frac{11 - 10.2}{0.8} = 1[/latex]
A child who weighs 11 kg is one standard deviation above the mean of 10.2 kg
[latex]P(Z \le 1) = 0.8413[/latex]
b. The probability that a child weighs more than 7.9 kg
Solution
[latex]\frac{7.9 - 10.2}{0.8} = -2.875[/latex]
A child who weighs 7.9 kg is 2.875 standard deviations below the mean of 10.2 kg
[latex]P(Z \ge -2.88) = 1- P(Z \le -2.88) = 1 - 0.002 = 0.998[/latex]
c. The probability that a child weighs between 11.2 and 12.2 kg
Solution
[latex]z_1 = \frac{11.2 - 10.2}{0.8} = 1.25[/latex] and [latex]z_2 = \frac{12.2 - 10.2}{0.8} = 2.5[/latex]
A child who weighs 11.2 kg is 1.25 standard deviations above the mean of 10.2 kg, and a child who weighs 12.2 kg is 2.5 standard deviations above the mean.
[latex]P( 1.25 \le Z \le 2.5) = P(Z \le 2.5) - P(Z \le1.25) = 0.9938 - 0.8944 = 0.0994[/latex]
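As a rough check on parts b and c, the following Python sketch recomputes them with the standard library's `NormalDist`. It also compares the exact z-score for 7.9 kg with the rounded value -2.88 used for the table lookup; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)
mu, sigma = 10.2, 0.8            # mean and standard deviation from the example (kg)

# Part b: P(X > 7.9), with the exact z-score and with the table's rounded z-score.
z_exact = (7.9 - mu) / sigma     # about -2.875
z_table = -2.88                  # rounded to two decimals for the Z table
p_exact = 1 - Z.cdf(z_exact)
p_table = 1 - Z.cdf(z_table)

# Part c: P(11.2 < X < 12.2), using z-scores of about 1.25 and 2.5.
z1 = (11.2 - mu) / sigma
z2 = (12.2 - mu) / sigma
p_between = Z.cdf(z2) - Z.cdf(z1)

print(round(p_exact, 3), round(p_table, 3), round(p_between, 4))
```

Rounding the z-score to two decimals changes the probability only in the fourth or fifth decimal place here, which is why the table answer is adequate.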
Your Turn!
The golf scores for a school team were normally distributed with a mean of 68 and a standard deviation of three.
a. Find the probability that a randomly selected golfer scored less than 65.
Solution
0.1587
b. Find the probability that a golfer scored between 66 and 70.
Solution
0.4950
Using Technology
Probabilities can be quickly calculated using technology. The following examples use the TI-83+ and TI-84 calculators. The best news about using the TI graphing calculator is that there is no need to standardize. You don’t have to go from [latex]X \sim N(\mu, \sigma) \rightarrow Z \sim N(0, 1)[/latex]; you just use [latex]X \sim N(\mu, \sigma)[/latex]. Although the calculator is a quick method, drawing the image of the shaded region can help you to see if your answer is valid.
Using the TI-83+ and TI-84 Calculators for Normal Probabilities
Go into 2nd DISTR.
Press 2: normalcdf.
The syntax for the instruction is as follows: [latex]\text{normalcdf(lower value, upper value, mean, standard deviation)}[/latex]
In some instances:
- the upper number of the area might be [latex]1\text{E}99[/latex] (which equals [latex]10^{99}[/latex]). You get [latex]1\text{E}99[/latex] by pressing [latex]1[/latex], the [latex]\text{EE}[/latex] key (a 2nd key) and then [latex]99[/latex]. Or, you can enter [latex]10^{99}[/latex] instead. The number [latex]10^{99}[/latex] is way out in the right tail of the normal curve.
- the lower number of the area might be [latex]-1 \text{E} 99[/latex] (which equals [latex]-10^{99}[/latex]). The number [latex]-10^{99}[/latex] is way out in the left tail of the normal curve.
Example
The final exam scores in a statistics class were normally distributed with a mean of 63 and a standard deviation of five.
a. Find the probability that a randomly selected student scored more than 65 on the exam.
Solution
Draw the graph, then find [latex]P(x > 65)[/latex].
[latex]\text{normalcdf}(65, 1 \text{E}99, 63, 5) = 0.3446[/latex]
b. Find the probability that a randomly selected student scored less than 85.
Solution
[latex]\text{normalcdf}(-1 \text{E} 99, 85, 63, 5) = 1[/latex] (rounds to one)
The probability that one student scores less than 85 is approximately one (or 100%).
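For readers without a TI calculator, the same computation can be sketched in Python. The function below is an assumed stand-in that mimics the calculator's argument order; it is not a TI or Python built-in, and it relies on `NormalDist` from the standard `statistics` module.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Part a: P(x > 65), using 1E99 as the "far right tail" upper bound.
p_more_than_65 = normalcdf(65, 1e99, 63, 5)

# Part b: P(x < 85), using -1E99 as the "far left tail" lower bound.
p_less_than_85 = normalcdf(-1e99, 85, 63, 5)

print(round(p_more_than_65, 4), round(p_less_than_85, 4))
```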
Example
There are approximately one billion smartphone users in the world today. In the United States the ages 13 to 55+ of smartphone users approximately follow a normal distribution with approximate mean and standard deviation of 36.9 years and 13.9 years, respectively.
a. Determine the probability that a random smartphone user in the age range 13 to 55+ is between 23 and 64.7 years old.
Solution
[latex]\text{normalcdf}(23, 64.7, 36.9, 13.9) = 0.8186[/latex]
b. Determine the probability that a randomly selected smartphone user in the age range 13 to 55+ is at most 50.8 years old.
Solution
[latex]\text{normalcdf}(-10^{99}, 50.8, 36.9, 13.9) = 0.8413[/latex]
Your Turn!
A personal computer is used for office work at home, research, communication, personal finances, education, entertainment, social networking, and a myriad of other things. Suppose that the average number of hours a household personal computer is used for entertainment is two hours per day. Assume the times for entertainment are normally distributed and the standard deviation for the times is half an hour. Find the probability that a household personal computer is used for entertainment between 1.8 and 2.75 hours per day.
Solution
Let X = the amount of time (in hours) a household personal computer is used for entertainment. [latex]X \sim N(2, 0.5)[/latex] where [latex]\mu = 2[/latex] and [latex]\sigma = 0.5[/latex]. Find [latex]P(1.8 \lt x \lt 2.75)[/latex].
The probability for which you are looking is the area between [latex]x = 1.8[/latex] and [latex]x = 2.75[/latex].
[latex]\text{normalcdf}(1.8, 2.75, 2, 0.5) = 0.5886[/latex]
The probability that a household personal computer is used between 1.8 and 2.75 hours per day for entertainment is 0.5886.
Using the TI-83+ and TI-84 Calculators for Finding Percentiles (Z-scores)
Press 2nd DISTR
Press 3: invNorm
The syntax for the instruction is as follows: [latex]\text{invNorm(area to the left, mean, standard deviation)}[/latex]
Note: A percentile is defined by the area to its left, so read carefully to see whether a problem gives you a right area instead. Remember, [latex]\text{Right Area} = 1 - \text{Left Area}[/latex].
Example
The final exam scores in a statistics class were normally distributed with a mean of 63 and a standard deviation of five.
a. Find the 90th percentile (that is, find the score k that has 90% of the scores below k and 10% of the scores above k).
Solution
Draw a graph and shade the area that corresponds to the 90th percentile. Let [latex]k = \text{the }90^{\text{th}}\text{ percentile}[/latex]. The variable k is located on the x-axis. [latex]P(x \lt k)[/latex] is the area to the left of k. The 90th percentile k separates the exam scores into those that are the same or lower than k and those that are the same or higher. Ninety percent of the test scores are the same or lower than k, and ten percent are the same or higher.
For this problem, [latex]\text{invNorm}(0.90, 63, 5) = 69.4[/latex]
The 90th percentile is 69.4. This means that 90% of the test scores fall at or below 69.4 and 10% fall at or above.
b. Find the 70th percentile (that is, find the score k such that 70% of scores are below k and 30% of the scores are above k).
Solution
[latex]\text{invNorm}(0.70, 63, 5) = 65.6[/latex]
The 70th percentile is 65.6. This means that 70% of the test scores fall at or below 65.6 and 30% fall at or above.
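The percentile lookup can be sketched in Python with the `inv_cdf` method of the standard library's `NormalDist`. The function names `inv_norm` and `inv_norm_from_right` are illustrative assumptions that mirror the calculator's argument order; the second one handles the case where a problem states the area to the right.

```python
from statistics import NormalDist

def inv_norm(area_left, mean=0.0, sd=1.0):
    """Value k with the given area to its left under the N(mean, sd) curve."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(area_left)

def inv_norm_from_right(area_right, mean=0.0, sd=1.0):
    """Same lookup when the problem gives the area to the right of k."""
    return inv_norm(1 - area_right, mean, sd)

p90 = inv_norm(0.90, 63, 5)                  # 90th percentile of the exam scores
p70 = inv_norm(0.70, 63, 5)                  # 70th percentile of the exam scores
top_10 = inv_norm_from_right(0.10, 63, 5)    # score with 10% of scores above it

print(round(p90, 1), round(p70, 1), round(top_10, 1))
```

The score with 10% of scores above it is the 90th percentile, so the first and third values should match.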
Example
Two thousand students took an exam. The scores on the exam have an approximate normal distribution with a mean [latex]\mu = 81 \text{ points}[/latex] and standard deviation [latex]\sigma = 15 \text{ points}[/latex].
a. Calculate the first and third quartile scores for this exam.
Solution
[latex]Q_1 = 25^{\text{th}} \text{percentile} = \text{invNorm}(0.25, 81, 15) = 70.9[/latex]
[latex]Q_3 = 75^{\text{th}} \text{percentile} = \text{invNorm}(0.75, 81, 15) = 91.1[/latex]
b. The middle 50% of the exam scores are between what two values?
Solution
From part a, the middle 50% of the scores are between [latex]Q_1 = 70.9[/latex] and [latex]Q_3 = 91.1[/latex].
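A short Python sketch, assuming the standard library's `NormalDist`, can confirm both parts: it finds the quartiles and then checks that the area between them really is one half. The variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=81, sigma=15)   # exam scores from the example

q1 = scores.inv_cdf(0.25)              # first quartile (25th percentile)
q3 = scores.inv_cdf(0.75)              # third quartile (75th percentile)

# The area between the quartiles should be the middle 50% of the distribution.
middle_area = scores.cdf(q3) - scores.cdf(q1)

print(round(q1, 1), round(q3, 1), round(middle_area, 2))
```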
Example
There are approximately one billion smartphone users in the world today. In the United States the ages 13 to 55+ of smartphone users approximately follow a normal distribution with approximate mean and standard deviation of 36.9 years and 13.9 years respectively. Using this information, answer the following questions (round answers to one decimal place).
a. Calculate the interquartile range (IQR).
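Solution
[latex]Q_1 = \text{invNorm}(0.25, 36.9, 13.9) = 27.5[/latex] and [latex]Q_3 = \text{invNorm}(0.75, 36.9, 13.9) = 46.3[/latex]
[latex]IQR = Q_3 - Q_1 = 46.3 - 27.5 = 18.8[/latex]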
b. Forty percent of the ages that range from 13 to 55+ are at least what age?
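Solution
Find k where [latex]P(x \ge k) = 0.40[/latex]. “At least” means the area 0.40 lies to the right of k, so the area to the left of k is [latex]1 - 0.40 = 0.60[/latex].
[latex]k = \text{invNorm}(0.60, 36.9, 13.9) = 40.4[/latex]
Forty percent of the ages that range from 13 to 55+ are at least 40.4 years.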
Your Turn!
A personal computer is used for office work at home, research, communication, personal finances, education, entertainment, social networking, and a myriad of other things. Suppose that the average number of hours a household personal computer is used for entertainment is two hours per day. Assume the times for entertainment are normally distributed and the standard deviation for the times is half an hour. Find the maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment.
Solution
To find the maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment, find the 25th percentile, k, where [latex]P(x \lt k) = 0.25[/latex].
[latex]\text{invNorm}(0.25, 2, 0.5) = 1.66[/latex]
The maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment is 1.66 hours.
Your Turn
A citrus farmer who grows mandarin oranges finds that the diameters of mandarin oranges harvested on his farm follow a normal distribution with a mean diameter of 5.85 cm and a standard deviation of 0.24 cm. The middle 40% of mandarin oranges from this farm are between ______ and ______.
Solution
The middle area is 0.40, so the two tails together have an area of [latex]1 - 0.40 = 0.60[/latex], and each tail has an area of 0.30.
Find [latex]k_1[/latex], the 30th percentile and [latex]k_2[/latex], the 70th percentile ([latex]0.40 + 0.30 = 0.70[/latex]).
[latex]k_1 = \text{invNorm(0.30, 5.85, 0.24)} = 5.72 \text{ cm}[/latex]
[latex]k_2 = \text{invNorm(0.70, 5.85, 0.24)} = 5.98 \text{ cm}[/latex]
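The tail-splitting argument generalizes to any "middle p%" question. Here is a minimal Python sketch of it; `middle_interval` is a hypothetical helper name, and `NormalDist` is from the standard `statistics` module.

```python
from statistics import NormalDist

def middle_interval(middle_area, mean, sd):
    """Bounds (k1, k2) that enclose the central `middle_area` of N(mean, sd)."""
    tail = (1 - middle_area) / 2          # area left over in each tail
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

# Middle 40% of mandarin orange diameters: 30th and 70th percentiles.
k1, k2 = middle_interval(0.40, mean=5.85, sd=0.24)

print(round(k1, 2), round(k2, 2))
```

Because the normal curve is symmetric, the two bounds sit the same distance on either side of the mean.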
Your Turn
A citrus farmer who grows mandarin oranges finds that the diameters of mandarin oranges harvested on his farm follow a normal distribution with a mean diameter of 5.85 cm and a standard deviation of 0.24 cm.
a. Find the 90th percentile for the diameters of mandarin oranges
Solution
6.16
b. The middle 20% of mandarin oranges from this farm have diameters between _____ and _____.
Solution
Between 5.79 and 5.91
Section 6.2 Review
Summary
The normal distribution, which is continuous, is the most important of all the probability distributions. Its graph is bell-shaped. This bell-shaped curve is used in almost all disciplines. Since it is a continuous probability distribution, the total area under the curve is one. The parameters of the normal are the mean [latex]\mu[/latex] and the standard deviation [latex]\sigma[/latex]. A special normal distribution, called the standard normal distribution, is the distribution of z-scores. Its mean is zero, and its standard deviation is one.
Formula Review
Normal Distribution: [latex]X \sim N(\mu, \sigma)[/latex] where [latex]\mu[/latex] is the mean and [latex]\sigma[/latex] is the standard deviation.
Standard Normal Distribution: [latex]Z \sim N(0,1)[/latex].
Calculator function for probability: [latex]\text{normalcdf} (\text{lower }x \text{ value of the area, upper }x \text{ value of the area, mean, standard deviation})[/latex]
Calculator function for the [latex]k^{\text{th}}[/latex] percentile: [latex]k = \text{invNorm} (\text{area to the left of }k \text{, mean, standard deviation)}[/latex]
Section 6.2 Practice
How would you represent the area to the left of one in a probability statement?
Solution
P(x < 1)
What is the area to the right of one?
Solution
1 – P(x < 1) or P(x > 1)
Is P(x < 1) equal to P(x ≤ 1)? Why?
Solution
Yes, because they are the same in a continuous distribution: P(x = 1) = 0
How would you represent the area to the left of three in a probability statement?
Solution
P(x < 3)
What is the area to the right of three?
Solution
1 – P(x < 3) or P(x > 3)
If the area to the left of x in a normal distribution is 0.123, what is the area to the right of x?
Solution
1 – 0.123 = 0.877
If the area to the right of x in a normal distribution is 0.543, what is the area to the left of x?
Solution
1 – 0.543 = 0.457
Use the following information to answer the next four exercises:
X ~ N(54, 8)
1. Find the probability that x > 56.
Solution
0.4013
2. Find the probability that x < 30.
Solution
0.0013
3. Find the 80th percentile.
Solution
60.73
4. Find the 60th percentile.
Solution
56.03
X ~ N(6, 2)
Find the probability that x is between three and nine.
Solution
0.8664
X ~ N(–3, 4)
Find the probability that x is between one and four.
Solution
0.1186
X ~ N(4, 5)
Find the maximum of x in the bottom quartile.
Solution
0.6276
Use the following information to answer the next three exercises:
The life of Sunshine CD players is normally distributed with a mean of 4.1 years and a standard deviation of 1.3 years. A CD player is guaranteed for three years. We are interested in the length of time a CD player lasts.
1. Find the probability that a CD player will break down during the guarantee period.
a. Sketch the situation. Label and scale the axes. Shade the region corresponding to the probability.
b. P(0 < x < ____________) = ___________ (Use zero for the minimum value of x.)
Solution
a. Check student’s solution.
b. 3, 0.1979
2. Find the probability that a CD player will last between 2.8 and six years.
a. Sketch the situation. Label and scale the axes. Shade the region corresponding to the probability.
b. P(__________ < x < __________) = __________
Solution
a. Check student’s solution
b. 2.8, 6, 0.7694
3. Find the 70th percentile of the distribution for the time a CD player lasts.
a. Sketch the situation. Label and scale the axes. Shade the region corresponding to the lower 70%.
b. P(x < k) = __________ Therefore, k = _________
Solution
a. Check student’s solution.
b. 0.70, 4.78 years
Use the following information to answer the next two exercises:
The patient recovery time from a particular surgical procedure is normally distributed with a mean of 5.3 days and a standard deviation of 2.1 days.
What is the probability of spending more than two days in recovery?
- 0.0580
- 0.8447
- 0.0553
- 0.9420
Solution
0.9420
The 90th percentile for recovery times is?
- 8.89
- 7.07
- 7.99
- 4.32
Solution
7.99
Based upon the given information and numerically justified, would you be surprised if it took less than one minute to find a parking space?
- Yes
- No
- Unable to determine
Solution
Yes
Find the probability that it takes at least eight minutes to find a parking space.
- 0.0001
- 0.9270
- 0.1862
- 0.0668
Solution
0.0668
Seventy percent of the time, it takes more than how many minutes to find a parking space?
- 1.24
- 2.41
- 3.95
- 6.05
Solution
3.95
According to a study done by De Anza students, the height for Asian adult males is normally distributed with an average of 66 inches and a standard deviation of 2.5 inches. Suppose one Asian adult male is randomly chosen. Let X = height of the individual.
- X ~ _____(_____, _____)
- Find the probability that the person is between 65 and 69 inches. Include a sketch of the graph, and write a probability statement.
- Would you expect to meet many Asian adult males over 72 inches? Explain why or why not, and justify your answer numerically.
- The middle 40% of heights fall between what two values? Sketch the graph, and write the probability statement.
Solution
- X ~ N(66, 2.5)
- 0.5404
- No, the probability that an Asian male is over 72 inches tall is 0.0082
IQ is normally distributed with a mean of 100 and a standard deviation of 15. Suppose one individual is randomly chosen. Let X = IQ of an individual.
- X ~ _____(_____, _____)
- Find the probability that the person has an IQ greater than 120. Include a sketch of the graph, and write a probability statement.
- MENSA is an organization whose members have the top 2% of all IQs. Find the minimum IQ needed to qualify for the MENSA organization. Sketch the graph, and write the probability statement.
- The middle 50% of IQs fall between what two values? Sketch the graph and write the probability statement.
Solution
- X ~ N(100, 15)
- The probability that a person has an IQ greater than 120 is 0.0918.
- A person has to have an IQ over 130 to qualify for MENSA.
- The middle 50% of IQ scores falls between 89.88 and 110.12.
The percent of fat calories that a person in America consumes each day is normally distributed with a mean of about 36 and a standard deviation of 10. Suppose that one individual is randomly chosen. Let X = percent of fat calories.
- X ~ _____(_____, _____)
- Find the probability that the percent of fat calories a person consumes is more than 40. Graph the situation. Shade in the area to be determined.
- Find the maximum number for the lower quarter of percent of fat calories. Sketch the graph and write the probability statement.
Solution
- X ~ N(36, 10)
- The probability that a person consumes more than 40% of their calories as fat is 0.3446.
- Approximately 25% of people consume less than 29.26% of their calories as fat.
Suppose that the distance of fly balls hit to the outfield (in baseball) is normally distributed with a mean of 250 feet and a standard deviation of 50 feet.
- If X = distance in feet for a fly ball, then X ~ _____(_____, _____)
- If one fly ball is randomly chosen from this distribution, what is the probability that this ball traveled fewer than 220 feet? Sketch the graph. Scale the horizontal axis X. Shade the region corresponding to the probability. Find the probability.
- Find the 80th percentile of the distribution of fly balls. Sketch the graph, and write the probability statement.
Solution
- X ~ N(250, 50)
- The probability that a fly ball travels less than 220 feet is 0.2743.
- Eighty percent of the fly balls will travel less than 292 feet.
In China, four-year-olds average three hours a day unsupervised. Most of the unsupervised children live in rural areas, considered safe. Suppose that the standard deviation is 1.5 hours and the amount of time spent alone is normally distributed. We randomly select one Chinese four-year-old living in a rural area. We are interested in the amount of time the child spends alone per day.
- In words, define the random variable X.
- X ~ _____(_____,_____)
- Find the probability that the child spends less than one hour per day unsupervised. Sketch the graph, and write the probability statement.
- What percent of the children spend over ten hours per day unsupervised?
- Seventy percent of the children spend at least how long per day unsupervised?
Solution
- X = number of hours that a Chinese four-year-old in a rural area is unsupervised during the day.
- X ~ N(3, 1.5)
- The probability that the child spends less than one hour a day unsupervised is 0.0918.
- The probability that a child spends over ten hours a day unsupervised is less than 0.0001.
- 2.21 hours
In the 1992 presidential election, Alaska’s 40 election districts averaged 1,956.8 votes per district for President Clinton. The standard deviation was 572.3. (There are only 40 election districts in Alaska.) The distribution of the votes per district for President Clinton was bell-shaped. Let X = number of votes for President Clinton for an election district.
- State the approximate distribution of X.
- Is 1,956.8 a population mean or a sample mean? How do you know?
- Find the probability that a randomly selected district had fewer than 1,600 votes for President Clinton. Sketch the graph and write the probability statement.
- Find the probability that a randomly selected district had between 1,800 and 2,000 votes for President Clinton.
- Find the third quartile for votes for President Clinton.
Solution
- X ~ N(1956.8, 572.3)
- This is a population mean, because all election districts are included.
- The probability that a district had less than 1,600 votes for President Clinton is 0.2676.
- 0.1380
- Seventy-five percent of the districts had fewer than 2,340 votes for President Clinton.
Suppose that the duration of a particular type of criminal trial is known to be normally distributed with a mean of 21 days and a standard deviation of 7 days.
- In words, define the random variable X.
- X ~ _____(_____,_____)
- If one of the trials is randomly chosen, find the probability that it lasted at least 24 days. Sketch the graph and write the probability statement.
- Sixty percent of all trials of this type are completed within how many days?
Solution
- X = the number of days a particular type of criminal trial takes
- X ~ N(21, 7)
- The probability that a randomly selected trial will last more than 24 days is 0.3336.
- 22.77
Terri Vogel, an amateur motorcycle racer, averages 129.71 seconds per 2.5 mile lap (in a seven-lap race) with a standard deviation of 2.28 seconds. The distribution of her race times is normally distributed. We are interested in one of her randomly selected laps.
- In words, define the random variable X.
- X ~ _____(_____,_____)
- Find the percent of her laps that are completed in less than 130 seconds.
- The fastest 3% of her laps are under _____.
- The middle 80% of her laps are from _______ seconds to _______ seconds.
Solution
- X = the time (in seconds) for one of Terri Vogel’s randomly selected laps
- X ~ N(129.71, 2.28)
- Terri completes 55.17% of her laps in less than 130 seconds.
- The fastest 3% of her laps are under 125.42 seconds.
- 126.79 and 132.63
Thuy Dau, Ngoc Bui, Sam Su, and Lan Voung conducted a survey as to how long customers at Lucky claimed to wait in the checkout line until their turn. Let X = time in line. The following table displays the ordered real data (in minutes):
| 0.50 | 4.25 | 5 | 6 | 7.25 |
| 1.75 | 4.25 | 5.25 | 6 | 7.25 |
| 2 | 4.25 | 5.25 | 6.25 | 7.25 |
| 2.25 | 4.25 | 5.5 | 6.25 | 7.75 |
| 2.25 | 4.5 | 5.5 | 6.5 | 8 |
| 2.5 | 4.75 | 5.5 | 6.5 | 8.25 |
| 2.75 | 4.75 | 5.75 | 6.5 | 9.5 |
| 3.25 | 4.75 | 5.75 | 6.75 | 9.5 |
| 3.75 | 5 | 6 | 6.75 | 9.75 |
| 3.75 | 5 | 6 | 6.75 | 10.75 |
- Calculate the sample mean and the sample standard deviation.
- Construct a histogram.
- Draw a smooth curve through the midpoints of the tops of the bars.
- In words, describe the shape of your histogram and smooth curve.
- Let the sample mean approximate μ and the sample standard deviation approximate σ. The distribution of X can then be approximated by X ~ _____(_____,_____)
- Use the distribution in part e to calculate the probability that a person will wait fewer than 6.1 minutes.
- Determine the cumulative relative frequency for waiting less than 6.1 minutes.
- Why aren’t the answers to part f and part g exactly the same?
- Why are the answers to part f and part g as close as they are?
- If only ten customers had been surveyed rather than 50, do you think the answers to part f and part g would have been closer together or farther apart? Explain your conclusion.
Solution
- mean = 5.51, s = 2.15
- Check student’s solution.
- Check student’s solution.
- Check student’s solution.
- X ~ N(5.51, 2.15)
- 0.6029
- The cumulative frequency for less than 6.1 minutes is 0.64.
- The answers to part f and part g are not exactly the same, because the normal distribution is only an approximation to the real one.
- The answers to part f and part g are close, because a normal distribution is an excellent approximation when the sample size is greater than 30.
- The approximation would have been less accurate, because the smaller sample size means that the data does not fit the normal curve as well.
Suppose that Ricardo and Anita attend different colleges. Ricardo’s GPA is the same as the average GPA at his school. Anita’s GPA is 0.70 standard deviations above her school average. In complete sentences, explain why each of the following statements may be false.
- Ricardo’s actual GPA is lower than Anita’s actual GPA.
- Ricardo is not passing because his z-score is zero.
- Anita is in the 70th percentile of students at her college.
Solution
- If the average GPA is less at Anita’s school than it is at Ricardo’s, then Ricardo’s actual score could be higher.
- Passing can be defined differently at different schools. Also, since Ricardo’s z-score is 0, his GPA is actually the average for his school, which is typically a passing GPA.
- Anita’s percentile is higher than the 70th percentile.
The table below shows a sample of the maximum capacity (maximum number of spectators) of sports stadiums. The table does not include horse-racing or motor-racing stadiums.
| 40,000 | 40,000 | 45,050 | 45,500 | 46,249 | 48,134 |
| 49,133 | 50,071 | 50,096 | 50,466 | 50,832 | 51,100 |
| 51,500 | 51,900 | 52,000 | 52,132 | 52,200 | 52,530 |
| 52,692 | 53,864 | 54,000 | 55,000 | 55,000 | 55,000 |
| 55,000 | 55,000 | 55,000 | 55,082 | 57,000 | 58,008 |
| 59,680 | 60,000 | 60,000 | 60,492 | 60,580 | 62,380 |
| 62,872 | 64,035 | 65,000 | 65,050 | 65,647 | 66,000 |
| 66,161 | 67,428 | 68,349 | 68,976 | 69,372 | 70,107 |
| 70,585 | 71,594 | 72,000 | 72,922 | 73,379 | 74,500 |
| 75,025 | 76,212 | 78,000 | 80,000 | 80,000 | 82,300 |
- Calculate the sample mean and the sample standard deviation for the maximum capacity of sports stadiums (the data).
- Construct a histogram.
- Draw a smooth curve through the midpoints of the tops of the bars of the histogram.
- In words, describe the shape of your histogram and smooth curve.
- Let the sample mean approximate μ and the sample standard deviation approximate σ. The distribution of X can then be approximated by X ~ _____(_____,_____).
- Use the distribution in part e to calculate the probability that the maximum capacity of sports stadiums is less than 67,000 spectators.
- Determine the cumulative relative frequency that the maximum capacity of sports stadiums is less than 67,000 spectators. Hint: Order the data and count the sports stadiums that have a maximum capacity less than 67,000. Divide by the total number of sports stadiums in the sample.
- Why aren’t the answers to part f and part g exactly the same?
Solution
- mean = 60,136; s = 10,468
- Answers will vary.
- Answers will vary.
- Answers will vary.
- X ~ N(60136, 10468)
- 0.7440
- The cumulative relative frequency is 43/60 = 0.717.
- The answers for part f and part g are not the same, because the normal distribution is only an approximation.
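The comparison between part f and part g can be reproduced with a short script. This is a sketch using only Python’s standard library; the variable names are illustrative, and the data are the 60 capacities from the table in the exercise.

```python
from statistics import NormalDist, mean, stdev

# Maximum capacities from the table in the exercise (60 stadiums).
capacities = [
    40000, 40000, 45050, 45500, 46249, 48134,
    49133, 50071, 50096, 50466, 50832, 51100,
    51500, 51900, 52000, 52132, 52200, 52530,
    52692, 53864, 54000, 55000, 55000, 55000,
    55000, 55000, 55000, 55082, 57000, 58008,
    59680, 60000, 60000, 60492, 60580, 62380,
    62872, 64035, 65000, 65050, 65647, 66000,
    66161, 67428, 68349, 68976, 69372, 70107,
    70585, 71594, 72000, 72922, 73379, 74500,
    75025, 76212, 78000, 80000, 80000, 82300,
]

x_bar = mean(capacities)  # sample mean
s = stdev(capacities)     # sample standard deviation (divides by n - 1)

# Part f: P(X < 67,000) under the normal model X ~ N(x_bar, s).
model = NormalDist(mu=x_bar, sigma=s)
p_model = model.cdf(67000)

# Part g: cumulative relative frequency counted directly from the data.
count_below = sum(1 for c in capacities if c < 67000)
p_data = count_below / len(capacities)

print(f"mean = {x_bar:.0f}, s = {s:.0f}")
print(f"normal model: P(X < 67,000) = {p_model:.4f}")
print(f"data: {count_below}/{len(capacities)} = {p_data:.3f}")
```

The two probabilities differ slightly because the first comes from a smooth model fitted to the data, while the second is a count from the data themselves.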
An expert witness for a paternity lawsuit testifies that the length of a pregnancy is normally distributed with a mean of 280 days and a standard deviation of 13 days. An alleged father was out of the country from 240 to 306 days before the birth of the child, so the pregnancy would have been less than 240 days or more than 306 days long if he was the father. The birth was uncomplicated, and the child needed no medical intervention.
What is the probability that he was NOT the father? What is the probability that he could be the father? Calculate the z-scores first, and then use those to calculate the probability.
Solution
For x = 240, z = (240 – 280)/13 = –3.0769
For x = 306, z = (306 – 280)/13 = 2
P(240 < x < 306) = P(–3.0769 < z < 2) = normalcdf(–3.0769,2,0,1) = 0.9762.
According to the scenario given, this means that there is a 97.62% chance that he is not the father.
To answer the second part of the question, there is a 1 – 0.9762 = 0.0238 = 2.38% chance that he is the father.
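The same calculation can be sketched in Python with the standard-library `statistics.NormalDist` in place of the calculator’s `normalcdf`. The helper `z_score` is a hypothetical name introduced here for illustration.

```python
from statistics import NormalDist

# Length of pregnancy in days, as stated by the expert witness.
pregnancy = NormalDist(mu=280, sigma=13)

def z_score(x, dist):
    """Number of standard deviations that x lies from the mean of dist."""
    return (x - dist.mean) / dist.stdev

z_low = z_score(240, pregnancy)
z_high = z_score(306, pregnancy)

# P(240 < X < 306): the pregnancy lengths that rule him out as the father.
p_not_father = pregnancy.cdf(306) - pregnancy.cdf(240)
p_could_be_father = 1 - p_not_father

print(f"z-scores: {z_low:.4f} and {z_high:.4f}")
print(f"P(not the father) = {p_not_father:.4f}")
print(f"P(could be the father) = {p_could_be_father:.4f}")
```

Working with the original distribution or with the z-scores on the standard normal gives the same area, which is the point of standardizing.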
A NUMMI assembly line, which has been operating since 1984, has built an average of 6,000 cars and trucks a week. Generally, 10% of the cars were defective coming off the assembly line. Suppose we draw a random sample of n = 100 cars. Let X represent the number of defective cars in the sample.
What can we say about X in regard to the 68-95-99.7 empirical rule (one standard deviation, two standard deviations and three standard deviations from the mean are being referred to)? Assume a normal distribution for the defective cars in the sample.
Solution
- n = 100; p = 0.1; q = 0.9
- μ = np = (100)(0.10) = 10
- σ = [latex]\sqrt{npq}[/latex] = [latex]\sqrt{(100)(0.1)(0.9)}[/latex] = 3
- z = ±1: x1 = µ + zσ = 10 + 1(3) = 13 and x2 = µ – zσ = 10 – 1(3) = 7. About 68% of samples of 100 cars will contain between seven and 13 defective cars.
- z = ±2: x1 = µ + zσ = 10 + 2(3) = 16 and x2 = µ – zσ = 10 – 2(3) = 4. About 95% of samples of 100 cars will contain between four and 16 defective cars.
- z = ±3: x1 = µ + zσ = 10 + 3(3) = 19 and x2 = µ – zσ = 10 – 3(3) = 1. About 99.7% of samples of 100 cars will contain between one and 19 defective cars.
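The interval arithmetic above follows one pattern, µ ± kσ for k = 1, 2, 3. A minimal sketch, assuming (as the exercise does) that a normal model is adequate for the number of defective cars:

```python
from math import sqrt

n, p = 100, 0.10
q = 1 - p

mu = n * p               # mean number of defective cars in a sample
sigma = sqrt(n * p * q)  # standard deviation of that count

# Empirical rule: approximate share of values within k standard deviations.
empirical_rule = {1: "68%", 2: "95%", 3: "99.7%"}

intervals = {}
for k, share in empirical_rule.items():
    low, high = mu - k * sigma, mu + k * sigma
    intervals[k] = (low, high)
    print(f"about {share} of samples: between {low:.0f} and {high:.0f} defective cars")
```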
We flip a coin 100 times (n = 100) and note that it comes up heads only 20% (p = 0.20) of the time. The mean and standard deviation of the number of heads are µ = 20 and σ = 4 (verify the mean and standard deviation). Solve the following:
- There is about a 68% chance that the number of heads will be somewhere between ___ and ___.
- There is about a ____ chance that the number of heads will be somewhere between 12 and 28.
- There is about a ____ chance that the number of heads will be somewhere between eight and 32.
Solution
- There is about a 68% chance that the number of heads will be somewhere between 16 and 24. z = ±1: x1 = µ + zσ = 20 + 1(4) = 24 and x2 = µ – zσ = 20 – 1(4) = 16.
- There is about a 95% chance that the number of heads will be somewhere between 12 and 28. For this problem: normalcdf(12,28,20,4) = 0.9545 = 95.45%
- There is about a 99.73% chance that the number of heads will be somewhere between eight and 32. For this problem: normalcdf(8,32,20,4) = 0.9973 = 99.73%.
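The three probabilities can be checked without a calculator. In this sketch the function `normalcdf` is a hypothetical helper written to mirror the calculator call; it is not part of any Python library.

```python
from statistics import NormalDist

# Normal model for the number of heads: mean 20, standard deviation 4.
heads = NormalDist(mu=20, sigma=4)

def normalcdf(lower, upper, dist):
    """Area under the curve of dist between lower and upper."""
    return dist.cdf(upper) - dist.cdf(lower)

p_within_1sd = normalcdf(16, 24, heads)  # one standard deviation from the mean
p_within_2sd = normalcdf(12, 28, heads)  # two standard deviations
p_within_3sd = normalcdf(8, 32, heads)   # three standard deviations

for label, prob in [("16 to 24", p_within_1sd),
                    ("12 to 28", p_within_2sd),
                    ("8 to 32", p_within_3sd)]:
    print(f"P({label} heads) = {prob:.4f}")
```

The exact areas (about 0.6827, 0.9545, and 0.9973) are where the rounded figures 68%, 95%, and 99.7% of the empirical rule come from.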
A $1 scratch-off lotto ticket will be a winner one out of five times. Out of a shipment of n = 190 lotto tickets, find the probability for the lotto tickets that there are
- somewhere between 34 and 54 prizes.
- somewhere between 54 and 64 prizes.
- more than 64 prizes.
Solution
- n = 190; p = [latex]\frac{1}{5}[/latex] = 0.2; q = 0.8
- μ = np = (190)(0.2) = 38
- σ = [latex]\sqrt{npq}[/latex] = [latex]\sqrt{(190)(0.2)(0.8)}[/latex] = 5.5136
- For this problem: P(34 < x < 54) = normalcdf(34,54,38,5.5136) = 0.7641
- For this problem: P(54 < x < 64) = normalcdf(54,64,38,5.5136) = 0.0019
- For this problem: P(x > 64) = normalcdf(64,10^99,38,5.5136) = 0.0000012 (approximately 0)
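As a cross-check, the three probabilities can be computed from a normal model centered at μ = np = 38 with σ = 5.5136. This is a sketch with Python’s standard library; like the solution, it applies no continuity correction.

```python
from math import sqrt
from statistics import NormalDist

n, p = 190, 1 / 5

mu = n * p                      # expected number of winning tickets
sigma = sqrt(n * p * (1 - p))   # standard deviation of that count

prizes = NormalDist(mu=mu, sigma=sigma)

p_34_to_54 = prizes.cdf(54) - prizes.cdf(34)  # between 34 and 54 prizes
p_54_to_64 = prizes.cdf(64) - prizes.cdf(54)  # between 54 and 64 prizes
p_over_64 = 1 - prizes.cdf(64)                # more than 64 prizes

print(f"mu = {mu:.0f}, sigma = {sigma:.4f}")
print(f"P(34 < X < 54) = {p_34_to_54:.4f}")
print(f"P(54 < X < 64) = {p_54_to_64:.4f}")
print(f"P(X > 64) = {p_over_64:.7f}")
```

Computing the right tail as one minus the area to the left avoids having to pick a very large upper bound, which is what the calculator call requires.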
Facebook provides a variety of statistics on its website that detail the growth and popularity of the site.
On average, 28 percent of 18- to 34-year-olds check their Facebook profiles before getting out of bed in the morning. Suppose this percentage follows a normal distribution with a standard deviation of 5 percent.
- Find the probability that the percent of 18- to 34-year-olds who check Facebook before getting out of bed in the morning is at least 30.
- Find the 95th percentile, and express it in a sentence.
Solution
X = the percent of 18- to 34-year-olds who check Facebook before getting out of bed in the morning.
X ~ N(28, 5)
P(x ≥ 30) = 0.3446; normalcdf(30,1EE99,28,5) = 0.3446
invNorm(0.95,28,5) = 36.22. The 95th percentile is 36.22%: there is a 95% chance that the percent of 18- to 34-year-olds who check Facebook before getting out of bed in the morning is at most 36.22%.
P(25 < x < 55) = normalcdf(25,55,28,5) = 0.7257; (0.7257)(400) = 290.28
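Both the probability and the percentile can be reproduced in Python. The sketch below works in percentage points, X ~ N(28, 5), and uses `inv_cdf` from the standard-library `statistics.NormalDist` as the counterpart of the calculator’s `invNorm`.

```python
from statistics import NormalDist

# X = percent who check Facebook before getting out of bed, in percentage points.
percent = NormalDist(mu=28, sigma=5)

# P(X >= 30): area to the right of 30.
p_at_least_30 = 1 - percent.cdf(30)

# 95th percentile: the value with 95% of the area to its left.
percentile_95 = percent.inv_cdf(0.95)

print(f"P(X >= 30) = {p_at_least_30:.4f}")
print(f"95th percentile = {percentile_95:.2f}%")
```

Entering the mean and standard deviation as proportions (0.28 and 0.05) instead would give the same percentile on the proportion scale, 0.3622.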
References
“Naegele’s rule.” Wikipedia. Available online at http://en.wikipedia.org/wiki/Naegele’s_rule (accessed May 14, 2013).
“403: NUMMI.” Chicago Public Media & Ira Glass, 2013. Available online at http://www.thisamericanlife.org/radio-archives/episode/403/nummi (accessed May 14, 2013).
“Scratch-Off Lottery Ticket Playing Tips.” WinAtTheLottery.com, 2013. Available online at http://www.winatthelottery.com/public/department40.cfm (accessed May 14, 2013).
“Smart Phone Users, By The Numbers.” Visual.ly, 2013. Available online at http://visual.ly/smart-phone-users-numbers (accessed May 14, 2013).
“Facebook Statistics.” Statistics Brain. Available online at http://www.statisticbrain.com/facebook-statistics/ (accessed May 14, 2013).
Media Attributions
- Private: Figure 5.12 © Significant Statistics by John Morgan Russell is licensed under a CC BY-SA (Attribution ShareAlike) license
- Private: Figure 5.13 © Significant Statistics by John Morgan Russell is licensed under a CC BY-SA (Attribution ShareAlike) license
- fig-ch06_05_01-1
Standard Normal Distribution: a continuous random variable (RV) with a mean of 0 and a standard deviation of 1, which z-scores follow: X ~ N(0, 1); when X follows the standard normal distribution, it is often noted as Z ~ N(0, 1).
z-score: the linear transformation of the form [latex]z = \frac{x - \mu}{\sigma}[/latex]; if this transformation is applied to any normal distribution X ~ N(μ, σ), the result is the standard normal distribution Z ~ N(0, 1). If this transformation is applied to any specific value x of the RV with mean μ and standard deviation σ, the result is called the z-score of x. The z-score allows us to compare data that are normally distributed but scaled differently.