Chapter 6: The Normal Distribution and The Central Limit Theorem
6.2 Using the Normal Distribution
Learning Objectives
By the end of this section, the student should be able to:
- Find areas under the normal distribution curve.
- Use the inverse normal to identify values for a given area under the normal curve.
- Compute probabilities and percentiles for real-world applications using the normal distribution.
Finding Normal Probabilities
The shaded area in the following graph indicates the area to the left of x. This area is represented by the probability P(X < x). Normal tables, computers, and calculators provide or calculate the probability P(X < x).
The area to the right is then P(X > x) = 1 – P(X < x). Remember, P(X < x) = Area to the left of the vertical line through x. P(X > x) = 1 – P(X < x) = Area to the right of the vertical line through x. P(X < x) is the same as P(X ≤ x) and P(X > x) is the same as P(X ≥ x) for continuous distributions.
Your Turn!
If the area to the left of x is 0.012, then what is the area to the right?
Solution
1 − 0.012 = 0.988
There are 3 main ways we can find probabilities associated with the Normal Distribution. These include:
- Math (via Calculus Integration)
- The Standardizing Process
- Technology
We would like to avoid complicated math if possible, so option 1 is out.
In order to avoid the math, a process called “Standardizing” can be used (option 2). This involves Z-scores, the Standard Normal Distribution and Tables. Although this tried and true process is now somewhat antiquated, it is a great place to start.
There are many technologies such as calculators and various statistical software that let us skip the entire standardizing process and instantaneously provide us with a probability. Although we typically have these at our disposal to use in practice, it is good to understand the process going on behind the scenes to make sure we apply our technology correctly.
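For readers curious about what option 1 involves, the area to the left of x is the integral of the normal density curve up to x. The sketch below approximates that integral numerically with the midpoint rule. The function names `normal_pdf` and `area_left_of`, the lower limit of ten standard deviations below the mean, and the number of steps are all illustrative choices, not part of this text.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_left_of(x, mu=0.0, sigma=1.0, steps=100_000):
    """Approximate P(X < x) by adding up thin rectangles (midpoint rule).

    Assumption: the area more than 10 standard deviations below the
    mean is negligible, so the sum starts there instead of at -infinity.
    """
    lower = mu - 10 * sigma
    if x <= lower:
        return 0.0
    width = (x - lower) / steps
    total = 0.0
    for i in range(steps):
        midpoint = lower + (i + 0.5) * width
        total += normal_pdf(midpoint, mu, sigma)
    return total * width

left = area_left_of(1.0)   # area to the left of z = 1
right = 1 - left           # area to the right of z = 1
print(round(left, 4), round(right, 4))
```

Tables and calculators exist so that nobody has to carry out this kind of summation by hand.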
The Standardizing Process
The standard normal distribution (SND) is the simplest form of the normal distribution you can think of. Recall some important facts from the previous section:
- The mean for the standard normal distribution is zero, and the standard deviation is one.
- If X is a normally distributed random variable and X ~ N(μ, σ), then the z-score is: [latex]z=\frac{x-\mu }{\sigma }[/latex]. This transformation produces the distribution Z ~ N(0, 1). The value x in the given equation comes from a normal distribution with mean μ and standard deviation σ.
- The z-scores are converted to units of the standard deviation.
- A z-score tells you how many standard deviations the value x is above (to the right of) or below (to the left of) the mean, μ. Values of x that are larger than the mean have positive z-scores, and values of x that are smaller than the mean have negative z-scores. If x equals the mean, then x has a z-score of zero.
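The z-score facts above can be checked with a few lines of Python. This is a minimal sketch: the helper name `z_score` and the mean of 5 and standard deviation of 2 are illustrative values chosen only to show the sign behavior.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

mu, sigma = 5, 2             # illustrative mean and standard deviation

above = z_score(11, mu, sigma)   # x larger than the mean -> positive z
below = z_score(1, mu, sigma)    # x smaller than the mean -> negative z
at_mean = z_score(5, mu, sigma)  # x equal to the mean -> z of zero

print(above, below, at_mean)
```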
We have the Z table at our disposal with probabilities already calculated and organized. Note that some Z tables (like the one in Figure 6.4) give us the left-tailed CDF, or “less than” probability. For example, using Figure 6.4 below, the area to the left of a z-score of -3.37 is P(Z ≤ -3.37) = 0.0004.
We can then use these CDF values, P(Z ≤ z), and some probability rules to find greater than [P(Z ≥ z) = 1 – P(Z ≤ z)] or in between [P(a ≤ Z ≤ b) = P(Z ≤ b) – P(Z ≤ a)] probabilities.
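These three rules can be sketched in Python, with `statistics.NormalDist` from the standard library standing in for a left-tailed Z table. The helper names `prob_less`, `prob_greater`, and `prob_between` are hypothetical and used only for illustration.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # the standard normal distribution

def prob_less(z):
    """P(Z <= z): the value a left-tailed Z table reports."""
    return Z.cdf(z)

def prob_greater(z):
    """P(Z >= z) = 1 - P(Z <= z)."""
    return 1 - Z.cdf(z)

def prob_between(a, b):
    """P(a <= Z <= b) = P(Z <= b) - P(Z <= a)."""
    return Z.cdf(b) - Z.cdf(a)

print(round(prob_less(-3.37), 4))     # left-tail area for z = -3.37
print(round(prob_greater(1.0), 4))    # right-tail area for z = 1
print(round(prob_between(-1, 1), 4))  # area between z = -1 and z = 1
```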
Here is a link to another Z table example: Normal Table (Source: NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, January 3, 2009). This table gives positive z-scores and the area in the table is the area under the normal curve from 0 (the mean) to that z-score. Using this table takes some practice and understanding symmetry and area under the normal curve.
A few examples are provided on the table link (Source: NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, January 3, 2009):
- The area to the left of a Z score of 1.53 can be found by going to 1.5 in the X column, go right to the 0.03 column to find the value 0.43699. Now add 0.5 (for the probability less than zero) to obtain the final result of 0.93699. P(Z ≤ 1.53) = 0.93699.
- The probability that z is less than or equal to -1.53 (also called the area to the left of -1.53) can also be found using this table. Remember that the curve is symmetrical, so P(Z ≥ z) = P(Z ≤ -z). Since we found P(Z ≤ 1.53) = 0.93699, then P(Z ≥ 1.53) = 1 – 0.93699 = 0.06301 = P(Z ≤ -1.53).
- Finding probabilities (area) between values is also possible. To find the probability that z is between -1 and 0.5, look up the values for 0.5 (0.5 + 0.19146 = 0.69146) and -1 (1 – (0.5 + 0.34134) = 0.15866). Then subtract the results (0.69146 – 0.15866) to obtain the result 0.5328.
To use any of these Z tables with a non-standard normal distribution (either the location parameter is not 0 or the scale parameter is not 1), standardize your value by subtracting the mean and dividing the result by the standard deviation. Then look up the value for this standardized value. (Source: NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, January 3, 2009)
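The lookups on this second style of table, which reports the area between 0 and z, can be mimicked in Python. In this sketch the hypothetical helper `table_area` plays the role of the printed table, and the three calculations follow the same steps as the bulleted examples.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

def table_area(z):
    """Area under the standard normal curve between 0 and |z|,
    which is what a 0-to-z table lists."""
    return Z.cdf(abs(z)) - 0.5

# P(Z <= 1.53): table value plus the 0.5 that lies below zero
left_of_pos = 0.5 + table_area(1.53)

# P(Z <= -1.53): by symmetry, the same as P(Z >= 1.53)
left_of_neg = 1 - left_of_pos

# P(-1 <= Z <= 0.5): left area at 0.5 minus left area at -1
between = (0.5 + table_area(0.5)) - (1 - (0.5 + table_area(1)))

print(round(table_area(1.53), 5), round(left_of_pos, 5))
print(round(left_of_neg, 5), round(between, 4))
```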
Example
Use the Z table (Normal Table) to find the following probabilities:
a. P(Z ≤ -0.54)
Solution
The area between 0 and 0.54 can be found by going to 0.5 in the X column, then going right to the 0.04 column to find the value 0.20540. If you add 0.5 to that value (0.5 + 0.20540 = 0.7054), you’ll have the area all the way to the left of 0.54: P(Z ≤ 0.54) = 0.7054. By symmetry, the area to the left of -0.54 equals the area to the right of 0.54, so subtract from 1: P(Z ≤ -0.54) = 1 – 0.7054 = 0.2946.
b. P(Z ≥ 1.2)
Solution
The area between 1.2 and 0 can be found by going to 1.2 in the X column, go right to the 0.00 column to find the value 0.38493. Since the curve has an area to the right of 0 that equals 0.5, then to the right of just 1.2 must be 0.5 – 0.38493 = 0.11507.
c. P(-1.5 ≤ Z ≤ 0.84)
Solution
The area between 0 and 0.84 is found by going to 0.8 in the X column, then going right to the 0.04 column to find the value 0.29955. The area between -1.5 and 0 is the same as the area between 0 and 1.5, which the table gives as 0.43319. The total area is the sum of the two: 0.29955 + 0.43319 = 0.73274.
Your Turn
Use the Z table to find the following probabilities:
a. P(Z ≤ 1)
b. P(Z ≥ 1)
c. P(-1 ≤ Z ≤ 1)
So far we have seen the idea that we can convert any normal distribution with any mean and standard deviation to the standard normal distribution in units of z-scores. We also have the associated probabilities in our Z table. Essentially, the work has been done for us if we know how to standardize and look up the associated probability in the table. The general process is: X~N(μ,σ) -> Z~N(0,1) -> Probability from Z table
This process, while maybe outdated in our technology age, is good for beginners to understand and useful when we do not have access to technology.
Example
Height and weight are two measurements used to track a child’s development. The World Health Organization measures child development by comparing the weights of children who are the same height and the same gender. In 2009, weights for all 80 cm girls in the reference population had a mean µ = 10.2 kg and standard deviation σ = 0.8 kg. Weights are normally distributed. X ~ N(10.2, 0.8). Calculate the z-scores that correspond to the following weights, then find the associated probabilities.
a. The probability that a child weighs less than 11 kg
b. The probability that a child weighs more than 7.9 kg
c. The probability that a child weighs between 11.2 and 12.2 kg
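One way to work through this example is sketched below in Python: standardize each weight, then read the probability from the standard normal distribution, exactly as the X~N(μ,σ) -> Z~N(0,1) -> Probability process describes. The helper `z_score` and the variable names are illustrative, and `NormalDist().cdf` stands in for a left-tailed Z table.

```python
from statistics import NormalDist

MU, SIGMA = 10.2, 0.8   # mean and standard deviation of the weights (kg)
Z = NormalDist()        # standard normal, used in place of a Z table

def z_score(x, mu=MU, sigma=SIGMA):
    """Standardize a weight x in kg."""
    return (x - mu) / sigma

# a. P(X < 11): area to the left of the z-score for 11 kg
z_a = z_score(11)
p_a = Z.cdf(z_a)

# b. P(X > 7.9): one minus the area to the left of the z-score for 7.9 kg
z_b = z_score(7.9)
p_b = 1 - Z.cdf(z_b)

# c. P(11.2 < X < 12.2): difference of two left areas
z_c_low, z_c_high = z_score(11.2), z_score(12.2)
p_c = Z.cdf(z_c_high) - Z.cdf(z_c_low)

print(f"a. z = {z_a:.2f}, probability = {p_a:.4f}")
print(f"b. z = {z_b:.3f}, probability = {p_b:.4f}")
print(f"c. z from {z_c_low:.2f} to {z_c_high:.2f}, probability = {p_c:.4f}")
```

With a printed table, each z-score would be rounded to two decimal places before the lookup, so table answers can differ slightly in the last digit.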
Your Turn!
The golf scores for a school team were normally distributed with a mean of 68 and a standard deviation of three.
a. Find the probability that a randomly selected golfer scored less than 65.
Solution
0.1587
b. Find the probability that a golfer scored between 66 and 70.
Solution
0.4950
The “Un-standardizing” Process
Sometimes we may be given a percentile or z-score and want to work backwards through the standardizing process to find a value on the original distribution. You could call this “un-standardizing” or finding a normal quantile.
The process looks like this: Probability in Z table -> Z~N(0,1) -> X~N(μ,σ)
For example, if the mean of a normal distribution is five and the standard deviation is two, what value is three standard deviations above (or to the right of) the mean (z-score = three)? Rearranging the z-score formula, the calculation is as follows: x = μ + (z)(σ) = 5 + (3)(2) = 11
Often we are given a percentile to find on the original distribution. For example, what if we want to know a value on the previous distribution that corresponds to the 90th percentile? We can look up a probability of 0.9 in the Z table and find a corresponding z-score of approximately 1.28.
x = μ + (z)(σ) = 5 + (1.28)(2) = 7.56
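The un-standardizing step can be sketched in Python as follows. The helper name `unstandardize` is illustrative, and `NormalDist().inv_cdf` is used here as a stand-in for reading the Z table backwards.

```python
from statistics import NormalDist

def unstandardize(z, mu, sigma):
    """Convert a z-score back to a value on the original scale: x = mu + z * sigma."""
    return mu + z * sigma

mu, sigma = 5, 2

# Value three standard deviations above the mean
x_three_sd = unstandardize(3, mu, sigma)

# 90th percentile using the table z-score of 1.28
x_90_table = unstandardize(1.28, mu, sigma)

# 90th percentile using an unrounded z-score
z_90 = NormalDist().inv_cdf(0.90)
x_90_exact = unstandardize(z_90, mu, sigma)

print(x_three_sd, round(x_90_table, 2), round(z_90, 4), round(x_90_exact, 3))
```

The small gap between the table-based and unrounded results comes only from rounding z to two decimal places.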
Example
Two thousand students took an exam. The scores on the exam have an approximate normal distribution with a mean μ = 81 points and standard deviation σ = 15 points.
a. Calculate the first and third quartile scores for this exam.
Solution
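Start by finding 0.25 (or a value close to it) in the Z table. Because this table gives the area between 0 and z, an area of 0.25 on each side of the mean marks the quartiles. You can find 0.24537 in the table for a z-score of 0.66, so by symmetry the first quartile has z ≈ -0.66 and the third quartile has z ≈ 0.66. Now using the rearranged z-score formula, Q1: x = μ + (z)(σ) = 81 + (-0.66)(15) = 71.1 and Q3: x = μ + (z)(σ) = 81 + (0.66)(15) = 90.9. Technology, which does not round the z-score, gives slightly more precise values:
Q1 = 25th percentile = invNorm(0.25,81,15) = 70.9
Q3 = 75th percentile = invNorm(0.75,81,15) = 91.1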
b. The middle 50% of the exam scores are between what two values?
Solution
From part a, the middle 50% of the scores are between 70.9 and 91.1.
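As a cross-check on the quartiles, the same percentiles can be computed in Python. This is a sketch that uses `statistics.NormalDist` in place of the calculator; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=81, sigma=15)   # exam scores

q1 = scores.inv_cdf(0.25)   # first quartile (25th percentile)
q3 = scores.inv_cdf(0.75)   # third quartile (75th percentile)

# The middle 50% of scores lies between the two quartiles
print(f"Q1 = {q1:.1f}, Q3 = {q3:.1f}")
print(f"Area between them = {scores.cdf(q3) - scores.cdf(q1):.2f}")
```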
Your Turn
A citrus farmer who grows mandarin oranges finds that the diameters of mandarin oranges harvested on his farm follow a normal distribution with a mean diameter of 5.85 cm and a standard deviation of 0.24 cm.
a. Find the 90th percentile for the diameters of mandarin oranges
b. The middle 20% of mandarin oranges from this farm have diameters between _____ and _____.
Using Technology
Probabilities can be quickly calculated using technology. The following examples use the TI-83+ and TI-84 calculators. The best news about using the TI graphing calculator is that there is no need to standardize: you don’t have to go from X~N(μ,σ) -> Z~N(0,1); you just use X~N(μ,σ). Although the calculator is a quick method, drawing the shaded region can help you check whether your answer is reasonable.
Using the TI-83+ and TI-84 Calculators for Normal Probabilities
Press 2nd DISTR
Press 2:normalcdf
The syntax for the instruction is as follows: normalcdf(lower value, upper value, mean, standard deviation)
In some instances:
- the upper number of the area might be 1E99 (= 10^99). You get 1E99 by pressing 1, the EE key (a 2nd key), and then 99. Or, you can enter 10^99 instead. The number 10^99 is way out in the right tail of the normal curve.
- the lower number of the area might be –1E99 (= –10^99). The number –10^99 is way out in the left tail of the normal curve.
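Readers without a TI calculator can mimic this command in Python. The sketch below defines a hypothetical helper named `normalcdf` with the same argument order as the calculator instruction; it is not a built-in Python function, and it relies on `statistics.NormalDist` from the standard library. The standard normal values used to try it out are illustrative.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper.

    Mirrors the calculator's argument order. Use 1e99 and -1e99
    for bounds that are far out in the right or left tail.
    """
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

left_tail = normalcdf(-1e99, 1)   # P(Z < 1)
right_tail = normalcdf(1, 1e99)   # P(Z > 1)
middle = normalcdf(-1, 1)         # P(-1 < Z < 1)

print(round(left_tail, 4), round(right_tail, 4), round(middle, 4))
```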
Example
The final exam scores in a statistics class were normally distributed with a mean of 63 and a standard deviation of five.
a. Find the probability that a randomly selected student scored more than 65 on the exam.
Solution
Draw the graph, then find P(x > 65).
normalcdf(65, 1E99, 63, 5) = 0.3446.
b. Find the probability that a randomly selected student scored less than 85.
Solution
normalcdf(-1E99, 85, 63, 5) = 1 (rounds to one)
The probability that one student scores less than 85 is approximately one (or 100%).
Example
There are approximately one billion smartphone users in the world today. In the United States the ages 13 to 55+ of smartphone users approximately follow a normal distribution with approximate mean and standard deviation of 36.9 years and 13.9 years, respectively.
a. Determine the probability that a random smartphone user in the age range 13 to 55+ is between 23 and 64.7 years old.
Solution
normalcdf(23, 64.7, 36.9, 13.9) = 0.8186
b. Determine the probability that a randomly selected smartphone user in the age range 13 to 55+ is at most 50.8 years old.
Solution
normalcdf(–1E99, 50.8, 36.9, 13.9) = 0.8413
Your Turn!
A personal computer is used for office work at home, research, communication, personal finances, education, entertainment, social networking, and a myriad of other things. Suppose that the average number of hours a household personal computer is used for entertainment is two hours per day. Assume the times for entertainment are normally distributed and the standard deviation for the times is half an hour. Find the probability that a household personal computer is used for entertainment between 1.8 and 2.75 hours per day.
Solution
Let X = the amount of time (in hours) a household personal computer is used for entertainment. X ~ N(2, 0.5) where μ = 2 and σ = 0.5. Find P(1.8 < x < 2.75).
The probability for which you are looking is the area between x = 1.8 and x = 2.75.
normalcdf(1.8, 2.75, 2, 0.5) = 0.5886
The probability that a household personal computer is used between 1.8 and 2.75 hours per day for entertainment is 0.5886.
Using the TI-83+ and TI-84 Calculators for Finding Percentiles (Z-scores)
Press 2nd DISTR
Press 3:invNorm
The syntax for the instruction is as follows: invNorm(area to the left, mean, standard deviation)
Note: A percentile is defined by the area to its left, so read carefully to see whether the problem gives you a right area instead. Remember, Left Area = 1 – Right Area.
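A Python counterpart to this command is sketched below. The helper names `invNorm` and `invNorm_from_right` are hypothetical, chosen to match the calculator instruction, and both rely on `statistics.NormalDist.inv_cdf`, which requires an area strictly between 0 and 1. The standard normal values are illustrative.

```python
from statistics import NormalDist

def invNorm(area_left, mean=0.0, sd=1.0):
    """Value with the given area to its left under the normal curve."""
    return NormalDist(mean, sd).inv_cdf(area_left)

def invNorm_from_right(area_right, mean=0.0, sd=1.0):
    """Same lookup when the problem states the area to the right:
    convert with Left Area = 1 - Right Area first."""
    return invNorm(1 - area_right, mean, sd)

z_left = invNorm(0.90)              # 90% of the area lies to the left
z_right = invNorm_from_right(0.10)  # 10% of the area lies to the right

print(round(z_left, 4), round(z_right, 4))   # the two values agree
```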
Example
The final exam scores in a statistics class were normally distributed with a mean of 63 and a standard deviation of five.
a. Find the 90th percentile (that is, find the score k that has 90% of the scores below k and 10% of the scores above k).
Solution
Draw a graph and shade the area that corresponds to the 90th percentile. Let k = the 90th percentile. The variable k is located on the x-axis. P(x < k) is the area to the left of k. The 90th percentile k separates the exam scores into those that are the same or lower than k and those that are the same or higher. Ninety percent of the test scores are the same or lower than k, and ten percent are the same or higher.
For this problem, invNorm(0.90, 63, 5) = 69.4
The 90th percentile is 69.4. This means that 90% of the test scores fall at or below 69.4 and 10% fall at or above.
b. Find the 70th percentile (that is, find the score k such that 70% of scores are below k and 30% of the scores are above k).
Solution
invNorm(0.70,63,5) = 65.6
The 70th percentile is 65.6. This means that 70% of the test scores fall at or below 65.6 and 30% fall at or above.
Example
There are approximately one billion smartphone users in the world today. In the United States the ages 13 to 55+ of smartphone users approximately follow a normal distribution with approximate mean and standard deviation of 36.9 years and 13.9 years respectively. Using this information, answer the following questions (round answers to one decimal place).
a. Calculate the interquartile range (IQR).
Solution
b. Forty percent of the ages that range from 13 to 55+ are at least what age?
Solution
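One way to approach both parts is sketched below in Python, using `statistics.NormalDist` in place of invNorm. This is an illustrative calculation under the stated mean and standard deviation; the variable names are assumptions. For part a, the IQR is the third quartile minus the first quartile. For part b, if 40% of ages are at least some age k, then 60% are below k, so k is the 60th percentile.

```python
from statistics import NormalDist

ages = NormalDist(mu=36.9, sigma=13.9)   # smartphone user ages (years)

# a. Interquartile range: Q3 - Q1
q1 = ages.inv_cdf(0.25)
q3 = ages.inv_cdf(0.75)
iqr = q3 - q1

# b. Right area of 0.40 means left area of 1 - 0.40 = 0.60
k = ages.inv_cdf(1 - 0.40)

print(f"Q1 = {q1:.1f}, Q3 = {q3:.1f}, IQR = {iqr:.1f}")
print(f"40% of the ages are at least {k:.1f} years")
```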
Your Turn!
A personal computer is used for office work at home, research, communication, personal finances, education, entertainment, social networking, and a myriad of other things. Suppose that the average number of hours a household personal computer is used for entertainment is two hours per day. Assume the times for entertainment are normally distributed and the standard deviation for the times is half an hour. Find the maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment.
Solution
To find the maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment, find the 25th percentile, k, where P(x < k) = 0.25.
invNorm(0.25,2,0.5) = 1.66
The maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment is 1.66 hours.
Your Turn
A citrus farmer who grows mandarin oranges finds that the diameters of mandarin oranges harvested on his farm follow a normal distribution with a mean diameter of 5.85 cm and a standard deviation of 0.24 cm. The middle 40% of mandarin oranges from this farm are between ______ and ______.
Solution
The middle area = 0.40, so the two tails together have an area of 1 – 0.40 = 0.60.
The tails of the graph of the normal distribution each have an area of 0.60 / 2 = 0.30.
Find k1, the 30th percentile and k2, the 70th percentile (0.40 + 0.30 = 0.70).
k1 = invNorm(0.30,5.85,0.24) = 5.72 cm
k2 = invNorm(0.70,5.85,0.24) = 5.98 cm
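The same steps work for any "middle percent" question, and a general version can be sketched in Python. The helper name `middle_interval` is hypothetical, and `statistics.NormalDist.inv_cdf` stands in for invNorm.

```python
from statistics import NormalDist

def middle_interval(middle_area, mean, sd):
    """Return the two values that bound the central middle_area of a normal curve."""
    tail = (1 - middle_area) / 2          # area left over in each tail
    dist = NormalDist(mean, sd)
    lower = dist.inv_cdf(tail)            # percentile at the left tail area
    upper = dist.inv_cdf(1 - tail)        # percentile at tail + middle area
    return lower, upper

k1, k2 = middle_interval(0.40, 5.85, 0.24)
print(f"{k1:.2f} cm to {k2:.2f} cm")
```

Because the normal curve is symmetric, the two bounds always sit the same distance from the mean.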
References
“Naegele’s rule.” Wikipedia. Available online at http://en.wikipedia.org/wiki/Naegele’s_rule (accessed May 14, 2013).
“403: NUMMI.” Chicago Public Media & Ira Glass, 2013. Available online at http://www.thisamericanlife.org/radio-archives/episode/403/nummi (accessed May 14, 2013).
“Scratch-Off Lottery Ticket Playing Tips.” WinAtTheLottery.com, 2013. Available online at http://www.winatthelottery.com/public/department40.cfm (accessed May 14, 2013).
“Smart Phone Users, By The Numbers.” Visual.ly, 2013. Available online at http://visual.ly/smart-phone-users-numbers (accessed May 14, 2013).
“Facebook Statistics.” Statistics Brain. Available online at http://www.statisticbrain.com/facebook-statistics/ (accessed May 14, 2013).
Media Attributions
- Private: Figure 5.12 © Significant Statistics by John Morgan Russell is licensed under a CC BY-SA (Attribution ShareAlike) license
- Private: Figure 5.13 © Significant Statistics by John Morgan Russell is licensed under a CC BY-SA (Attribution ShareAlike) license
Standard Normal Distribution: a continuous random variable (RV) with a mean of 0 and a standard deviation of 1 which z-scores follow: X ~ N(0, 1); when X follows the standard normal distribution, it is often noted as Z ~ N(0, 1).
z-score: the linear transformation of the form [latex]z=\frac{x-\mu }{\sigma }[/latex]; if this transformation is applied to any normal distribution X ~ N(μ, σ), the result is the standard normal distribution Z ~ N(0, 1). If this transformation is applied to any specific value x of the RV with mean μ and standard deviation σ, the result is called the z-score of x. The z-score allows us to compare data that are normally distributed but scaled differently.
Quantile: points in a distribution that relate to the rank order of values in that distribution.