6.2 Using the Normal Distribution

Jared Eusea; Phyllis Okwan; Rachid Belmasrour; Stephan Patterson; Stephen Andrus

Chapter 6: The Normal Distribution and The Central Limit Theorem

Learning Objectives

By the end of this section, the student should be able to:

Find areas under the normal distribution curve.
Use the inverse normal to identify values for a given area under the normal curve.
Compute probabilities and percentiles for real-world applications using the normal distribution.

Finding Normal Probabilities

The shaded area in the following graph indicates the area to the left of x. This area is represented by the probability P(X < x). Normal tables, computers, and calculators provide or calculate the probability P(X < x).

Diagram showing a bell-shaped curve with uppercase X at the extreme right end of the X axis. The X axis also contains a lowercase x about one-quarter of the way across the X axis from the right. The area under the bell curve to the left of the lowercase x is shaded. The label states: shaded area represents probability P(X < x). — **Figure 6.3.**

The area to the right is then P(X > x) = 1 – P(X < x). Remember, P(X < x) = Area to the left of the vertical line through x. P(X > x) = 1 – P(X < x) = Area to the right of the vertical line through x. P(X < x) is the same as P(X ≤ x) and P(X > x) is the same as P(X ≥ x) for continuous distributions.

Note

If the area to the left is 0.0228, then the area to the right is 1 – 0.0228 = 0.9772.

Your Turn!

If the area to the left of x is 0.012, then what is the area to the right?

Solution

1 − 0.012 = 0.988

There are three main ways to find probabilities associated with the normal distribution:

1. Math (via calculus integration)
2. The standardizing process
3. Technology

We would like to avoid complicated math if possible, so option 1 is out.

In order to avoid the math, a process called “Standardizing” can be used (option 2). This involves Z-scores, the Standard Normal Distribution and Tables. Although this tried and true process is now somewhat antiquated, it is a great place to start.

There are many technologies such as calculators and various statistical software that let us skip the entire standardizing process and instantaneously provide us with a probability. Although we typically have these at our disposal to use in practice, it is good to understand the process going on behind the scenes to make sure we apply our technology correctly.

The Standardizing Process

The standard normal distribution (SND) is the simplest form of the normal distribution you can think of. Recall some important facts from the previous section:

The mean for the standard normal distribution is zero, and the standard deviation is one.
If X is a normally distributed random variable and X ~ N(μ, σ), then the z-score is [latex]z=\frac{x-\mu }{\sigma }[/latex]. This transformation produces the distribution Z ~ N(0, 1). The value x in the equation comes from a normal distribution with mean μ and standard deviation σ.
Z-scores are measured in units of the standard deviation.
A z-score tells you how many standard deviations the value x is above (to the right of) or below (to the left of) the mean, μ. Values of x that are larger than the mean have positive z-scores, and values of x that are smaller than the mean have negative z-scores. If x equals the mean, then x has a z-score of zero.
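The z-score formula above can be sketched in a few lines of Python. The distribution N(5, 2) and the x values are hypothetical, chosen only to show a negative, a zero, and a positive z-score.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical distribution X ~ N(5, 2), used only for illustration.
mu, sigma = 5, 2
for x in (1, 5, 11):
    # x below the mean gives a negative z, x equal to the mean gives 0,
    # and x above the mean gives a positive z.
    print(f"x = {x:>2}  ->  z = {z_score(x, mu, sigma):+.1f}")
```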

We have the Z table at our disposal with probabilities already calculated and organized. Note that some Z tables (like the one in Figure 6.4) give the left-tailed CDF, or “less than” probability. For example, using Figure 6.4 below, the area to the left of a z-score of -3.37 is P(Z ≤ -3.37) = 0.0004.

Z score table that highlights the associated value of 0.0004 with Z value of -3.37. — **Figure 6.4.**

We can then use these CDF values, P(Z ≤ z), and some probability rules to find greater than [P(Z ≥ z) = 1 – P(Z ≤ z)] or in between [P(a ≤ Z ≤ b) = P(Z ≤ b) – P(Z ≤ a)] probabilities.
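As a rough software analogue of a left-tailed Z table, the sketch below uses `NormalDist` from Python's standard `statistics` module as the CDF lookup. The helper names `left_tail`, `right_tail`, and `between` are made up for this illustration; they simply apply the two probability rules stated above.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def left_tail(z):
    """P(Z <= z), the value a left-tailed Z table reports."""
    return Z.cdf(z)

def right_tail(z):
    """P(Z >= z) = 1 - P(Z <= z)."""
    return 1 - Z.cdf(z)

def between(a, b):
    """P(a <= Z <= b) = P(Z <= b) - P(Z <= a)."""
    return Z.cdf(b) - Z.cdf(a)

print(round(left_tail(-3.37), 4))   # compare with the table entry for z = -3.37
print(round(right_tail(1.0), 4))
print(round(between(-1.0, 0.5), 4))
```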

Here is a link to another Z table example: Normal Table (Source: NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, January 3, 2009). This table gives positive z-scores and the area in the table is the area under the normal curve from 0 (the mean) to that z-score. Using this table takes some practice and understanding symmetry and area under the normal curve.

A few examples are provided on the table link (Source: NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, January 3, 2009):

The area to the left of a Z score of 1.53 can be found by going to 1.5 in the X column, go right to the 0.03 column to find the value 0.43699. Now add 0.5 (for the probability less than zero) to obtain the final result of 0.93699. P(Z ≤ 1.53) = 0.93699.
The probability that z is less than or equal to -1.53 (also called the area to the left of -1.53) can also be found using this table. Remember that the curve is symmetrical, so P(Z ≥ z) = P(Z ≤ -z). Since we found P(Z ≤ 1.53) = 0.93699, then P(Z ≥ 1.53) = 1 – 0.93699 = 0.06301 = P(Z ≤ -1.53).
Finding probabilities (area) between values is also possible. To find the probability that z is between -1 and 0.5, look up the values for 0.5 (0.5 + 0.19146 = 0.69146) and -1 (1 – (0.5 + 0.34134) = 0.15866). Then subtract the results (0.69146 – 0.15866) to obtain the result 0.5328.

To use any of these Z tables with a non-standard normal distribution (either the location parameter is not 0 or the scale parameter is not 1), standardize your value by subtracting the mean and dividing the result by the standard deviation. Then look up the value for this standardized value. (Source: NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, January 3, 2009)
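The difference between the two table styles can be made concrete with a short sketch. It assumes a table that lists the area from 0 to z (as the linked Normal Table does) and converts that entry into a left-tail probability using symmetry. The function names are hypothetical, and the table entry is recomputed with `NormalDist` rather than typed in from the table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def area_zero_to(z):
    """Area under the curve between 0 and |z| (what a '0 to z' table lists)."""
    return Z.cdf(abs(z)) - 0.5

def left_area_from_table(z):
    """Convert a '0 to z' table entry into the left-tail probability P(Z <= z)."""
    entry = area_zero_to(z)
    # Half of the area lies left of 0. Add the entry for positive z;
    # by symmetry, subtract it for negative z.
    return 0.5 + entry if z >= 0 else 0.5 - entry

print(round(area_zero_to(1.53), 5))           # table-style entry for z = 1.53
print(round(left_area_from_table(1.53), 5))   # P(Z <= 1.53)
print(round(left_area_from_table(-1.53), 5))  # P(Z <= -1.53)
```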

Example

Use the Z table (Normal Table) to find the following probabilities:

a. P(Z ≤ -0.54)

Solution

The area between 0 and 0.54 can be found by going to 0.5 in the X column, then going right to the 0.04 column to find the value 0.20540. Adding 0.5 to that value (0.5 + 0.20540 = 0.7054) gives the area all the way to the left of 0.54: P(Z ≤ 0.54) = 0.7054. By symmetry, P(Z ≤ -0.54) = P(Z ≥ 0.54), so subtract that result from 1: P(Z ≤ -0.54) = 1 – 0.7054 = 0.2946.

b. P(Z ≥ 1.2)

Solution

The area between 1.2 and 0 can be found by going to 1.2 in the X column, go right to the 0.00 column to find the value 0.38493. Since the curve has an area to the right of 0 that equals 0.5, then to the right of just 1.2 must be 0.5 – 0.38493 = 0.11507.

c. P(-1.5 ≤ Z ≤ 0.84)

Solution

The area between 0 and 0.84 is found by going to 0.8 in the X column, then going right to the 0.04 column to find the value 0.29955. The area between -1.5 and 0 is the same as the area between 0 and 1.5, which the table gives as 0.43319. The total area is the sum of the two pieces: 0.29955 + 0.43319 = 0.73274.

Your Turn

Use the Z table to find the following probabilities:

a. P(Z ≤ 1)

b. P(Z ≥ 1)

c. P(-1 ≤ Z ≤ 1)

So far we have seen the idea that we can convert any normal distribution with any mean and standard deviation to the standard normal distribution in units of z-scores. We also have the associated probabilities in our Z table. Essentially, the work has been done for us if we know how to standardize and look up the associated probability in the table. The general process is: X~N(μ,σ) -> Z~N(0,1) -> Probability from Z table

This process, while maybe outdated in our technology age, is good for beginners to understand and useful when we do not have access to technology.

Example

Height and weight are two measurements used to track a child’s development. The World Health Organization measures child development by comparing the weights of children who are the same height and the same gender. In 2009, weights for all 80 cm girls in the reference population had a mean µ = 10.2 kg and standard deviation σ = 0.8 kg. Weights are normally distributed. X ~ N(10.2, 0.8). Calculate the z-scores that correspond to the following weights, then find the associated probabilities.

a. The probability that a child weighs less than 11 kg

b. The probability that a child weighs more than 7.9 kg

c. The probability that a child weighs between 11.2 and 12.2 kg
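One way to work through parts a to c is sketched below: standardize each weight, then look up the standard normal CDF. Here `NormalDist` from Python's `statistics` module stands in for the Z table, so the results are computed by software and may differ in the last digit from a table lookup that rounds z to two decimals.

```python
from statistics import NormalDist

mu, sigma = 10.2, 0.8   # mean and standard deviation given in the example (kg)
Z = NormalDist()        # standard normal, used in place of the Z table

def z_score(x):
    """Standardize a weight x from N(10.2, 0.8)."""
    return (x - mu) / sigma

# a. P(X < 11): area to the left of the z-score
z_a = z_score(11)
p_a = Z.cdf(z_a)

# b. P(X > 7.9): area to the right, 1 minus the left area
z_b = z_score(7.9)
p_b = 1 - Z.cdf(z_b)

# c. P(11.2 < X < 12.2): difference of two left areas
z_lo, z_hi = z_score(11.2), z_score(12.2)
p_c = Z.cdf(z_hi) - Z.cdf(z_lo)

print(f"a. z = {z_a:.2f}, probability = {p_a:.4f}")
print(f"b. z = {z_b:.3f}, probability = {p_b:.4f}")
print(f"c. z from {z_lo:.2f} to {z_hi:.2f}, probability = {p_c:.4f}")
```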

Your Turn!

The golf scores for a school team were normally distributed with a mean of 68 and a standard deviation of three.

a. Find the probability that a randomly selected golfer scored less than 65.

Solution

0.1587

b. Find the probability that a golfer scored between 66 and 70.

Solution

0.4950

The “Un-standardizing” Process

Sometimes we may be given a percentile or z-score and want to work backwards through the standardizing process to find a value on the original distribution. You could call this “un-standardizing” or finding a normal quantile.

The process looks like this: Probability in Z table -> Z~N(0,1) -> X~N(μ,σ)

For example, if the mean of a normal distribution is five and the standard deviation is two, what value is three standard deviations above (or to the right of) the mean (z-score = three)? Rearranging the z-score formula, the calculation is as follows: x = μ + (z)(σ) = 5 + (3)(2) = 11

Often we are given a percentile to find on the original distribution. For example, what if we want to know a value on the previous distribution that corresponds to the 90th percentile? We can look up a probability of 0.9 in the Z table and find a corresponding z-score of approximately 1.28.

x = μ + (z)(σ) = 5 + (1.28)(2) = 7.56
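The two calculations above can be reproduced with a short sketch. The helper name `unstandardize` is made up for this illustration, and `NormalDist().inv_cdf` plays the role of reading the Z table backwards (from an area to a z-score).

```python
from statistics import NormalDist

def unstandardize(z, mu, sigma):
    """Rearranged z-score formula: x = mu + z * sigma."""
    return mu + z * sigma

mu, sigma = 5, 2  # distribution used in the text

# Three standard deviations above the mean
print(unstandardize(3, mu, sigma))

# 90th percentile: find the z-score with area 0.90 to its left, then un-standardize
z_90 = NormalDist().inv_cdf(0.90)
x_90 = unstandardize(z_90, mu, sigma)
print(round(z_90, 2), round(x_90, 2))
```

Because the software does not round z to two decimals before un-standardizing, later digits can differ slightly from a hand calculation with z = 1.28.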

Example

Two thousand students took an exam. The scores on the exam have an approximate normal distribution with a mean μ = 81 points and standard deviation σ = 15 points.

a. Calculate the first and third quartile scores for this exam.

Solution

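Start by finding 0.25 (or the closest value possible) in the Z table (Normal Table), which gives the area between 0 and z. The table entry 0.24537 corresponds to a z-score of 0.66, so about 25% of the area lies between the mean and z = 0.66 and, by symmetry, another 25% lies between z = -0.66 and the mean. The first quartile is therefore at z ≈ -0.66 and the third quartile at z ≈ 0.66. Using the rearranged z-score formula, x = μ + (z)(σ):

Q₁ ≈ 81 + (-0.66)(15) = 71.1

Q₃ ≈ 81 + (0.66)(15) = 90.9

Using technology instead of the table gives slightly more precise values:

Q₁ = 25^th percentile = invNorm(0.25,81,15) = 70.9

Q₃ = 75^th percentile = invNorm(0.75,81,15) = 91.1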

b. The middle 50% of the exam scores are between what two values?

Solution

From part a, the middle 50% of the scores are between 70.9 and 91.1.
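As a cross-check on this example, the sketch below computes the quartiles directly from N(81, 15) with Python's `statistics.NormalDist` and then confirms that half of the area lies between them. The variable names are illustrative only.

```python
from statistics import NormalDist

scores = NormalDist(mu=81, sigma=15)  # exam score distribution from the example

q1 = scores.inv_cdf(0.25)  # first quartile: 25% of the area to the left
q3 = scores.inv_cdf(0.75)  # third quartile: 75% of the area to the left

# The area between the quartiles should be the middle 50%.
middle = scores.cdf(q3) - scores.cdf(q1)

print(round(q1, 1), round(q3, 1), round(middle, 2))
```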

Your Turn

A citrus farmer who grows mandarin oranges finds that the diameters of mandarin oranges harvested on his farm follow a normal distribution with a mean diameter of 5.85 cm and a standard deviation of 0.24 cm.

a. Find the 90^th percentile for the diameters of mandarin oranges

b. The middle 20% of mandarin oranges from this farm have diameters between _____ and _____.

Using Technology

Probabilities can be quickly calculated using technology. The technology used for the following examples is the TI-83+ and TI-84 family of calculators. The best news about using a TI graphing calculator is that there is no need to standardize. You don’t have to go from X~N(μ,σ) -> Z~N(0,1); you just use X~N(μ,σ). Although the calculator is a quick method, drawing the shaded region can help you check whether your answer is reasonable.

Using the TI-83+ and TI-84 Calculators for Normal Probabilities

Go into 2nd DISTR.

Press 2:normalcdf.

The syntax for the instruction is as follows: normalcdf(lower value, upper value, mean, standard deviation)

In some instances:

the upper number of the area might be 1E99 (= 10⁹⁹). You get 1E99 (= 10⁹⁹) by pressing 1, the EE key (a 2nd key) and then 99. Or, you can enter 10^99 instead. The number 10⁹⁹ is way out in the right tail of the normal curve.
the lower number of the area might be –1E99 (= –10⁹⁹). The number –10⁹⁹ is way out in the left tail of the normal curve.
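For readers without a TI calculator, a rough Python counterpart can be sketched with the standard library. The function below is a hypothetical helper that borrows the calculator's name and argument order; it is not part of any Python library. It is tried here on the golf-score problem from earlier in this section (mean 68, standard deviation 3).

```python
import math
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under N(mean, sd) between lower and upper (mimics the TI argument order)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# A very large magnitude such as 1e99, or math.inf, stands in for an open-ended tail.
print(round(normalcdf(-1e99, 65, 68, 3), 4))      # P(X < 65)
print(round(normalcdf(66, 70, 68, 3), 4))         # P(66 < X < 70)
print(round(normalcdf(-math.inf, math.inf), 4))   # total area under the curve
```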

Example

The final exam scores in a statistics class were normally distributed with a mean of 63 and a standard deviation of five.

a. Find the probability that a randomly selected student scored more than 65 on the exam.

Solution

Draw the graph, then find P(x > 65).

normalcdf(65, 1E99, 63, 5) = 0.3446.

b. Find the probability that a randomly selected student scored less than 85.

Solution

normalcdf(-1E99, 85, 63, 5) = 1 (rounds to one)

The probability that one student scores less than 85 is approximately one (or 100%).

Example

There are approximately one billion smartphone users in the world today. In the United States the ages 13 to 55+ of smartphone users approximately follow a normal distribution with approximate mean and standard deviation of 36.9 years and 13.9 years, respectively.

a. Determine the probability that a random smartphone user in the age range 13 to 55+ is between 23 and 64.7 years old.

Solution

normalcdf(23, 64.7, 36.9, 13.9) = 0.8186

b. Determine the probability that a randomly selected smartphone user in the age range 13 to 55+ is at most 50.8 years old.

Solution

normalcdf(–10⁹⁹, 50.8, 36.9, 13.9) = 0.8413

Your Turn!

A personal computer is used for office work at home, research, communication, personal finances, education, entertainment, social networking, and a myriad of other things. Suppose that the average number of hours a household personal computer is used for entertainment is two hours per day. Assume the times for entertainment are normally distributed and the standard deviation for the times is half an hour. Find the probability that a household personal computer is used for entertainment between 1.8 and 2.75 hours per day.

Solution

Let X = the amount of time (in hours) a household personal computer is used for entertainment. X ~ N(2, 0.5) where μ = 2 and σ = 0.5. Find P(1.8 < x < 2.75).

The probability for which you are looking is the area between x = 1.8 and x = 2.75.

This is a normal distribution curve. The peak of the curve coincides with the point 2 on the horizontal axis. The values 1.8 and 2.75 are also labeled on the x-axis. Vertical lines extend from 1.8 and 2.75 to the curve. The area between the lines is shaded.

normalcdf(1.8, 2.75, 2, 0.5) = 0.5886

The probability that a household personal computer is used between 1.8 and 2.75 hours per day for entertainment is 0.5886.

Using the TI-83+ and TI-84 Calculators for Finding Percentiles (Z-scores)

Press 2nd DISTR

Press 3:invNorm

The syntax for the instruction is as follows: invNorm(area to the left, mean, standard deviation)

Note: Percentiles are defined by the area to the left, so read carefully to see whether the problem gives a right area instead. Remember, Right Area = 1 – Left Area, so Left Area = 1 – Right Area.
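A comparable sketch for the inverse direction, again using Python's standard library. Both function names are hypothetical: `inv_norm` mirrors the calculator's argument order, and `inv_norm_right` converts a right area to a left area first. The N(5, 2) distribution from the “un-standardizing” discussion is reused for the demonstration.

```python
from statistics import NormalDist

def inv_norm(left_area, mean=0.0, sd=1.0):
    """Value with the given area to its LEFT under N(mean, sd)."""
    return NormalDist(mean, sd).inv_cdf(left_area)

def inv_norm_right(right_area, mean=0.0, sd=1.0):
    """Value with the given area to its RIGHT: convert to a left area first."""
    return inv_norm(1 - right_area, mean, sd)

# The 90th percentile has 0.90 to its left, which is the same as 0.10 to its right.
print(round(inv_norm(0.90, 5, 2), 2))
print(round(inv_norm_right(0.10, 5, 2), 2))
```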

Example

The final exam scores in a statistics class were normally distributed with a mean of 63 and a standard deviation of five.

a. Find the 90^th percentile (that is, find the score k that has 90% of the scores below k and 10% of the scores above k).

Solution

Draw a graph and shade the area that corresponds to the 90^th percentile. Let k = the 90^th percentile. The variable k is located on the x-axis. P(x < k) is the area to the left of k. The 90^th percentile k separates the exam scores into those that are the same or lower than k and those that are the same or higher. Ninety percent of the test scores are the same or lower than k, and ten percent are the same or higher.

This is a normal distribution curve. The peak of the curve coincides with the point 63 on the horizontal axis. A point, k, is labeled to the right of 63. A vertical line extends from k to the curve. The area under the curve to the left of k is shaded. This represents the probability that x is less than k: P(x < k) = 0.90

For this problem, invNorm(0.90, 63, 5) = 69.4

The 90^th percentile is 69.4. This means that 90% of the test scores fall at or below 69.4 and 10% fall at or above.

b. Find the 70^th percentile (that is, find the score k such that 70% of scores are below k and 30% of the scores are above k).

Solution

invNorm(0.70,63,5) = 65.6

The 70^th percentile is 65.6. This means that 70% of the test scores fall at or below 65.6 and 30% fall at or above.

Example

There are approximately one billion smartphone users in the world today. In the United States the ages 13 to 55+ of smartphone users approximately follow a normal distribution with approximate mean and standard deviation of 36.9 years and 13.9 years respectively. Using this information, answer the following questions (round answers to one decimal place).

a. Calculate the interquartile range (IQR).

Solution

IQR = Q₃ – Q₁

Calculate Q₃ = 75^th percentile and Q₁ = 25^th percentile.

invNorm(0.75,36.9,13.9) = Q₃ = 46.2754

invNorm(0.25,36.9,13.9) = Q₁ = 27.5246

IQR = Q₃ – Q₁ = 18.7508

b. Forty percent of the ages that range from 13 to 55+ are at least what age?

Solution

Find k where P(x > k) = 0.40 (“At least” translates to “greater than or equal to.”)

0.40 = the area to the right.

Area to the left = 1 – 0.40 = 0.60.

The area to the left of k = 0.60.

invNorm(0.60,36.9,13.9) = 40.4215.

k = 40.42.

Forty percent of the ages that range from 13 to 55+ are at least 40.42 years.
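Both parts of this example can be re-computed with a few lines of Python. This is a sketch using `statistics.NormalDist` in place of invNorm; the variable names are illustrative.

```python
from statistics import NormalDist

ages = NormalDist(mu=36.9, sigma=13.9)  # smartphone user ages from the example

# a. Interquartile range: difference between the 75th and 25th percentiles
q3 = ages.inv_cdf(0.75)
q1 = ages.inv_cdf(0.25)
iqr = q3 - q1

# b. "At least" gives a right area of 0.40, so the left area is 1 - 0.40 = 0.60
k = ages.inv_cdf(1 - 0.40)

print(round(q3, 4), round(q1, 4), round(iqr, 4))
print(round(k, 4))
```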

Your Turn!

A personal computer is used for office work at home, research, communication, personal finances, education, entertainment, social networking, and a myriad of other things. Suppose that the average number of hours a household personal computer is used for entertainment is two hours per day. Assume the times for entertainment are normally distributed and the standard deviation for the times is half an hour. Find the maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment.

Solution

To find the maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment, find the 25^th percentile, k, where P(x < k) = 0.25.

This is a normal distribution curve. The area under the left tail of the curve is shaded. The shaded area shows that the probability that x is less than k is 0.25. It follows that k = 1.66.

invNorm(0.25,2,0.5) = 1.66

The maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment is 1.66 hours.

Your Turn

A citrus farmer who grows mandarin oranges finds that the diameters of mandarin oranges harvested on his farm follow a normal distribution with a mean diameter of 5.85 cm and a standard deviation of 0.24 cm. The middle 40% of mandarin oranges from this farm are between ______ and ______.

Solution

The middle area = 0.40, so the two tails together have an area of 1 – 0.40 = 0.60.

By symmetry, each tail of the graph of the normal distribution has an area of 0.60 / 2 = 0.30.

Find k1, the 30^th percentile and k2, the 70^th percentile (0.40 + 0.30 = 0.70).

k1 = invNorm(0.30,5.85,0.24) = 5.72 cm

k2 = invNorm(0.70,5.85,0.24) = 5.98 cm
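The same tail-splitting reasoning works for any "middle p" question, so it can be wrapped in a small helper. The function name `middle_interval` is hypothetical; the sketch relies only on the symmetry argument used in this solution.

```python
from statistics import NormalDist

def middle_interval(p, mean, sd):
    """Return (low, high) bounding the middle proportion p of N(mean, sd)."""
    tail = (1 - p) / 2                 # area left over in each tail
    dist = NormalDist(mean, sd)
    low = dist.inv_cdf(tail)           # percentile with `tail` to its left
    high = dist.inv_cdf(1 - tail)      # percentile with `tail` to its right
    return low, high

# Middle 40% of mandarin orange diameters (cm)
k1, k2 = middle_interval(0.40, 5.85, 0.24)
print(round(k1, 2), round(k2, 2))
```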

References

“Naegele’s rule.” Wikipedia. Available online at http://en.wikipedia.org/wiki/Naegele’s_rule (accessed May 14, 2013).

“403: NUMMI.” Chicago Public Media & Ira Glass, 2013. Available online at http://www.thisamericanlife.org/radio-archives/episode/403/nummi (accessed May 14, 2013).

“Scratch-Off Lottery Ticket Playing Tips.” WinAtTheLottery.com, 2013. Available online at http://www.winatthelottery.com/public/department40.cfm (accessed May 14, 2013).

“Smart Phone Users, By The Numbers.” Visual.ly, 2013. Available online at http://visual.ly/smart-phone-users-numbers (accessed May 14, 2013).

“Facebook Statistics.” Statistics Brain. Available online at http://www.statisticbrain.com/facebook-statistics/ (accessed May 14, 2013).

Media Attributions

Private: Figure 5.12 © Significant Statistics by John Morgan Russell is licensed under a CC BY-SA (Attribution ShareAlike) license
Private: Figure 5.13 © Significant Statistics by John Morgan Russell is licensed under a CC BY-SA (Attribution ShareAlike) license

License

Icon for the Creative Commons Attribution-ShareAlike 4.0 International License
