Chapter 6: The Normal Distribution and The Central Limit Theorem
6.4 The Central Limit Theorem for Sums
Learning Objectives
By the end of this section, the student should be able to:
- Recognize characteristics of the Central Limit Theorem for Sums.
- Apply and interpret the Central Limit Theorem for Sums, and use it to solve real-world problems.
Suppose X is a random variable with a distribution that may be known or unknown (it can be any distribution) and suppose:
- [latex]\mu_X = \text{the mean of } X[/latex]
- [latex]\sigma_X = \text{the standard deviation of } X[/latex]
If you draw random samples of size n, then as n increases, the random variable ΣX, consisting of sample sums, tends to be normally distributed, and [latex]\Sigma{X} \sim N \left((n)(\mu_X), (\sqrt{n})(\sigma_X) \right)[/latex].
The central limit theorem for sums says that if you keep drawing larger and larger samples and taking their sums, the sums form their own distribution (the sampling distribution of the sum), which approaches a normal distribution as the sample size increases. This normal distribution has a mean equal to the original mean multiplied by the sample size and a standard deviation equal to the original standard deviation multiplied by the square root of the sample size.
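As an illustration (not part of the original text), the following Python sketch simulates the theorem. It assumes an exponential population with rate 1, chosen only because it is strongly skewed; its mean and standard deviation are both 1. The hypothetical helper `simulate_sums` draws many samples of size n, sums each one, and compares the simulated mean and standard deviation of the sums with [latex](n)(\mu_X)[/latex] and [latex](\sqrt{n})(\sigma_X)[/latex]. Because the results depend on the random seed, expect close rather than exact agreement.

```python
import math
import random
import statistics


def simulate_sums(n: int, reps: int, seed: int = 2024) -> list:
    """Return `reps` sample sums, each the total of n draws from an
    exponential distribution with rate 1 (so mu_X = 1 and sigma_X = 1)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


n = 50
mu_x, sigma_x = 1.0, 1.0  # mean and SD of Exponential(rate = 1)
sums = simulate_sums(n, reps=10_000)

sim_mean = statistics.fmean(sums)
sim_sd = statistics.stdev(sums)
print(f"mean of sums: simulated {sim_mean:.2f}, theory {n * mu_x:.2f}")
print(f"SD of sums:   simulated {sim_sd:.2f}, theory {math.sqrt(n) * sigma_x:.2f}")

# Share of simulated sums within one theoretical SD of the theoretical mean;
# for a normal distribution this would be about 0.68.
center, spread = n * mu_x, math.sqrt(n) * sigma_x
within = sum(abs(s - center) <= spread for s in sums) / len(sums)
print(f"share within 1 SD: {within:.3f}")
```

Even though each individual value comes from a skewed population, the sums cluster around 50 with a spread close to [latex]\sqrt{50} \approx 7.07[/latex], and roughly 68% of them fall within one standard deviation, as a normal model predicts.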
The random variable [latex]\Sigma{X}[/latex] has the following z-score associated with it:
[latex]z = \frac{\Sigma x - (n)(\mu_X)}{(\sqrt{n})(\sigma_X)}[/latex]
where:
- [latex]\Sigma{x}[/latex] is one sum (an observed value of [latex]\Sigma{X}[/latex])
- [latex](n)(\mu_X) = \text{the mean of } \Sigma{X}[/latex]
- [latex](\sqrt{n})(\sigma_X) = \text{the standard deviation of } \Sigma{X}[/latex]
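The z-score formula translates directly into a small function. This Python sketch is illustrative; the name `z_for_sum` is hypothetical. It reuses the numbers from the first example in this section ([latex]\mu_X = 90[/latex], [latex]\sigma_X = 15[/latex], [latex]n = 80[/latex]).

```python
import math


def z_for_sum(total: float, n: int, mu_x: float, sigma_x: float) -> float:
    """z-score of one observed sum: (total - n*mu_X) / (sqrt(n) * sigma_X)."""
    mean_of_sums = n * mu_x            # mean of the sums
    sd_of_sums = math.sqrt(n) * sigma_x  # standard deviation of the sums
    return (total - mean_of_sums) / sd_of_sums


z = z_for_sum(7500, n=80, mu_x=90, sigma_x=15)
print(round(z, 4))  # 2.2361: a total of 7,500 lies about 2.24 SDs above 7,200
```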
Using the TI-83, 83+, 84, 84+ Calculator
To find probabilities for sums on the calculator, follow these steps.
2nd DISTR
2: normalcdf()
[latex]\text{normalcdf} \left(\text{lower value of the area, upper value of the area, }(n)(\text{mean}), (\sqrt{n})(\text{standard deviation}) \right)[/latex]
where:
- mean is the mean of the original distribution
- standard deviation is the standard deviation of the original distribution
- sample size = n
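Outside the calculator, the same area can be computed with `statistics.NormalDist`, which is part of Python's standard library (version 3.8 and later). The sketch below is a substitute for `normalcdf`, not the calculator's own routine, and the helper name `prob_sum_between` is hypothetical. It is applied to the example that follows.

```python
import math
from statistics import NormalDist


def prob_sum_between(lower, upper, n, mu_x, sigma_x):
    """Approximate P(lower < sum < upper) for the sum of n values under the
    CLT model N((n)(mu_X), (sqrt n)(sigma_X)); mirrors
    normalcdf(lower, upper, (n)(mean), (sqrt n)(standard deviation))."""
    sums = NormalDist(mu=n * mu_x, sigma=math.sqrt(n) * sigma_x)
    return sums.cdf(upper) - sums.cdf(lower)


# math.inf plays the role of the calculator's 1E99 upper bound.
# Inputs taken from the example below: mean 90, SD 15, n = 80.
p = prob_sum_between(7500, math.inf, n=80, mu_x=90, sigma_x=15)
print(round(p, 4))  # 0.0127
```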
Example
An unknown distribution has a mean of 90 and a standard deviation of 15. A sample of size 80 is drawn randomly from the population.
- Find the probability that the sum of the 80 values (or the total of the 80 values) is more than 7,500.
- Find the sum that is 1.5 standard deviations above the mean of the sums.
Solution
Let X = one value from the original unknown population. The probability question asks you to find a probability for the sum (or total) of 80 values.
[latex]\Sigma{X} = \text{the sum or total of 80 values}[/latex]. Since [latex]\mu_X = 90[/latex], [latex]\sigma_X = 15[/latex], and [latex]n = 80[/latex], [latex]\Sigma X \sim N \left((80)(90), (\sqrt{80})(15) \right)[/latex]
- [latex]\text{mean of the sums} = (n)(\mu_X) = (80)(90) = 7,200[/latex]
- [latex]\text{standard deviation of the sums} = (\sqrt{n})(\sigma _{X}) = (\sqrt{80})(15) \approx 134.16[/latex]
- [latex]\text{sum of 80 values} = \Sigma{X} = 7,500[/latex]
1. Find [latex]P(\Sigma{x} > 7,500)[/latex]
[latex]P(\Sigma{x} > 7,500) = 0.0127[/latex]
[latex]\text{normalcdf(lower value, upper value, mean of sums, stdev of sums)}[/latex]
The parameter list is abbreviated as [latex]\text{lower, upper, }(n)(\mu_X), (\sqrt{n})(\sigma_X)[/latex].
[latex]\text{normalcdf} \left(7500, 1\text{E}99, (80)(90), (\sqrt{80})(15) \right) = 0.0127[/latex]
2. Find [latex]\Sigma{x}[/latex] where [latex]z = 1.5[/latex].
[latex]\Sigma{x} = (n)(\mu_x)+(z)(\sqrt{n})(\sigma_x) = (80)(90) + (1.5)(\sqrt{80})(15) = 7,401.2[/latex]
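Part 2 can also be checked in Python. This sketch, again using `statistics.NormalDist` as a stand-in for the calculator, computes [latex](n)(\mu_X) + (z)(\sqrt{n})(\sigma_X)[/latex] and confirms that the same value is the percentile whose area to the left equals the standard normal area to the left of z = 1.5.

```python
import math
from statistics import NormalDist

n, mu_x, sigma_x = 80, 90, 15
sums = NormalDist(mu=n * mu_x, sigma=math.sqrt(n) * sigma_x)  # N(7200, about 134.16)

# Part 2: the sum that lies z = 1.5 standard deviations above the mean of the sums.
z = 1.5
total = sums.mean + z * sums.stdev
print(round(total, 1))  # 7401.2

# Same value from the percentile side: invNorm at the standard normal
# area to the left of z = 1.5.
check = sums.inv_cdf(NormalDist().cdf(z))
print(round(check, 1))
```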
Your Turn!
An unknown distribution has a mean of 45 and a standard deviation of eight. A sample of size 50 is drawn randomly from the population. Find the probability that the sum of the 50 values is more than 2,400.
Solution
0.0040
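A quick check of the Your Turn answer, under the same assumption that `statistics.NormalDist` stands in for `normalcdf`:

```python
import math
from statistics import NormalDist

# Your Turn inputs: mean 45, SD 8, n = 50.
n, mu_x, sigma_x = 50, 45, 8
sums = NormalDist(mu=n * mu_x, sigma=math.sqrt(n) * sigma_x)  # N(2250, about 56.57)

p = 1 - sums.cdf(2400)  # P(sum of the 50 values > 2,400)
print(f"{p:.4f}")  # 0.0040
```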
Using the TI-83, 83+, 84, 84+ Calculator
To find percentiles for sums on the calculator, follow these steps.
2nd DISTR
3: invNorm()
[latex]k = \text{invNorm} \left(\text{area to the left of }k, (n)(\text{mean}), (\sqrt{n})(\text{standard deviation}) \right)[/latex]
where:
- k is the kth percentile
- mean is the mean of the original distribution
- standard deviation is the standard deviation of the original distribution
- sample size = n
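The Python counterpart of `invNorm` for sums is `NormalDist.inv_cdf` from the standard `statistics` module. In this sketch the helper name `percentile_of_sum` is hypothetical, and the inputs are borrowed from the tablet-user example that follows.

```python
import math
from statistics import NormalDist


def percentile_of_sum(area_left, n, mu_x, sigma_x):
    """Value k with P(sum < k) = area_left under the CLT model for sums;
    mirrors invNorm(area_left, (n)(mean), (sqrt n)(standard deviation))."""
    sums = NormalDist(mu=n * mu_x, sigma=math.sqrt(n) * sigma_x)
    return sums.inv_cdf(area_left)


# Inputs taken from the tablet-user example below: mean 34, SD 15, n = 50.
k80 = percentile_of_sum(0.80, n=50, mu_x=34, sigma_x=15)
print(round(k80, 1))  # 1789.3
```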
Example
In an October 29, 2012, study reported on the Flurry Blog, the mean age of tablet users is 34 years. Suppose the standard deviation is 15 years. The sample size is 50.
- What are the mean and standard deviation for the sum of the ages of tablet users? What is the distribution?
- Find the probability that the sum of the ages is between 1,500 and 1,800 years.
- Find the 80th percentile for the sum of the 50 ages.
Solution
- [latex]\mu_{\Sigma{x}} = n \mu_x = 50(34) = 1,700[/latex] and [latex]\sigma_{\Sigma{x}} = (\sqrt{n}) (\sigma_x) = (\sqrt{50})(15) = 106.07[/latex]
The distribution of the sums is approximately normal by the central limit theorem for sums.
- [latex]P(1500 \lt \Sigma{x} \lt 1800) = \text{normalcdf} \left(1500, 1800, (50)(34), (\sqrt{50})(15) \right) = 0.7974[/latex]
- Let k = the 80th percentile.
[latex]k = \text{invNorm} \left(0.80, (50)(34), (\sqrt{50})(15) \right) = 1,789.3[/latex]
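The first two parts of this example can be reproduced with the following Python sketch, which uses `statistics.NormalDist` as a stand-in for the calculator. Note that [latex](\sqrt{50})(15)[/latex] is about 106.07.

```python
import math
from statistics import NormalDist

n, mu_x, sigma_x = 50, 34, 15
sums = NormalDist(mu=n * mu_x, sigma=math.sqrt(n) * sigma_x)

mean_of_sums = sums.mean                      # (50)(34) = 1700
sd_of_sums = sums.stdev                       # (sqrt 50)(15), about 106.07
p_between = sums.cdf(1800) - sums.cdf(1500)   # normalcdf(1500, 1800, ...)

print(mean_of_sums, round(sd_of_sums, 2), round(p_between, 4))
```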
Your Turn!
In an October 29, 2012, study reported on the Flurry Blog, the mean age of tablet users is 35 years. Suppose the standard deviation is ten years. The sample size is 39.
- What are the mean and standard deviation for the sum of the ages of tablet users? What is the distribution?
- Find the probability that the sum of the ages is between 1,400 and 1,500 years.
- Find the 90th percentile for the sum of the 39 ages.
Solution
- [latex]\mu_{\Sigma{x}} = n \mu_x = (39)(35) = 1,365[/latex] and [latex]\sigma_{\Sigma{x}} = (\sqrt{n}) (\sigma_x) = (\sqrt{39})(10) = 62.4[/latex]
The distribution of the sums is approximately normal by the central limit theorem for sums.
- [latex]P(1400 \lt \Sigma{x} \lt 1500) = \text{normalcdf}(1400, 1500, (39)(35), (\sqrt{39})(10)) = 0.2723[/latex]
- Let k = the 90th percentile.
[latex]k = \text{invNorm} \left(0.90, (39)(35), (\sqrt{39})(10) \right) = 1,445.0[/latex]
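The same approach checks the Your Turn answers; `statistics.NormalDist` is again an assumed stand-in for the calculator.

```python
import math
from statistics import NormalDist

n, mu_x, sigma_x = 39, 35, 10
sums = NormalDist(mu=n * mu_x, sigma=math.sqrt(n) * sigma_x)  # N(1365, about 62.4)

p_between = sums.cdf(1500) - sums.cdf(1400)  # P(1400 < sum < 1500)
k90 = sums.inv_cdf(0.90)                      # 90th percentile of the sum

print(round(sums.stdev, 1), round(p_between, 4), round(k90, 1))
```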
Example
The mean number of minutes for app engagement by a tablet user is 8.2 minutes. Suppose the standard deviation is one minute. Take a sample of size 70.
- What are the mean and standard deviation for the sums?
- Find the 95th percentile for the sum of the sample. Interpret this value in a complete sentence.
- Find the probability that the sum of the sample is at least ten hours.
Solution
- [latex]\mu_{\Sigma{x}} = n \mu_x = 70(8.2) = 574 \text{ minutes}[/latex] and [latex]\sigma_{\Sigma{x}} = (\sqrt{n}) (\sigma_x) = (\sqrt{70})(1) = 8.37 \text{ minutes}[/latex]
- Let k = the 95th percentile.
[latex]k = \text{invNorm}(0.95, (70)(8.2), (\sqrt{70})(1)) = 587.76 \text{ minutes}[/latex]
Ninety-five percent of the sums of app engagement times for samples of 70 users are at most 587.76 minutes.
- Ten hours = 600 minutes.
[latex]P(\Sigma{x} \ge 600) = \text{normalcdf} \left(600, 1\text{E}99, (70)(8.2), (\sqrt{70})(1) \right) = 0.0009[/latex]
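A Python sketch of this example, with all times in minutes and `statistics.NormalDist` standing in for the calculator functions:

```python
import math
from statistics import NormalDist

n, mu_x, sigma_x = 70, 8.2, 1          # minutes
sums = NormalDist(mu=n * mu_x, sigma=math.sqrt(n) * sigma_x)

k95 = sums.inv_cdf(0.95)               # 95th percentile of the total minutes
p_ten_hours = 1 - sums.cdf(10 * 60)    # P(total >= 600 minutes)

print(round(sums.mean, 2), round(sums.stdev, 2), round(k95, 2), round(p_ten_hours, 4))
```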
Your Turn!
The mean number of minutes for app engagement by a tablet user is 8.2 minutes. Suppose the standard deviation is one minute. Take a sample of size 70.
- What is the probability that the sum of the sample is between seven hours and ten hours? What does this mean in context of the problem?
- Find the 84th and 16th percentiles for the sum of the sample. Interpret these values in context.
Solution
- 7 hours = 420 minutes and 10 hours = 600 minutes
[latex]P\left(420\le \Sigma x\le 600\right) = \text{normalcdf} \left(420, 600, (70)(8.2), \sqrt{70}(1) \right)=0.9991[/latex]
This means that, for samples of 70 users, there is about a 99.9% chance that the sum of the usage minutes is between 420 minutes and 600 minutes.
- [latex]\text{invNorm}\left(0.84,\left(70\right)\left(8.2\right),\sqrt{70}\left(1\right)\right)=582.32[/latex]
[latex]\text{invNorm}\left(0.16,\left(70\right)\left(8.2\right),\sqrt{70}\left(1\right)\right)=565.68[/latex]
Since 84% of the sums of app engagement times are at most 582.32 minutes and 16% are at most 565.68 minutes, about 68% of the sums of app engagement times for samples of 70 users are between 565.68 minutes and 582.32 minutes.
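These answers can be reproduced in Python as follows (a sketch using `statistics.NormalDist`). The last line checks that the area between the 16th and 84th percentiles is 0.68.

```python
import math
from statistics import NormalDist

n, mu_x, sigma_x = 70, 8.2, 1
sums = NormalDist(mu=n * mu_x, sigma=math.sqrt(n) * sigma_x)

p_7_to_10_hours = sums.cdf(600) - sums.cdf(420)   # 420 to 600 minutes
k84, k16 = sums.inv_cdf(0.84), sums.inv_cdf(0.16)

print(round(p_7_to_10_hours, 4), round(k84, 2), round(k16, 2))
# 0.84 - 0.16 = 0.68 of the sums fall between k16 and k84,
# which is roughly the mean plus or minus one standard deviation of the sums.
print(round(sums.cdf(k84) - sums.cdf(k16), 2))
```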
Note
It is important to know when to use the central limit theorem and which version to use: the CLT for means or the CLT for sums (totals). If you are asked to find a probability for a mean, use the CLT for means; if you are asked to find a probability for a sum or total, use the CLT for sums. The same rule applies to percentiles for means and sums.
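The link between the two versions of the CLT can be seen numerically. In this Python sketch, the helpers `clt_for_mean` and `clt_for_sum` are hypothetical names, and the inputs reuse the tablet-user figures from earlier in this section. Because a sample mean is below c exactly when the sample sum is below nc, the two models give the same probability, and a percentile of the sum is n times the same percentile of the mean.

```python
import math
from statistics import NormalDist


def clt_for_mean(n, mu_x, sigma_x):
    """CLT model for the sample mean: N(mu_X, sigma_X / sqrt n)."""
    return NormalDist(mu_x, sigma_x / math.sqrt(n))


def clt_for_sum(n, mu_x, sigma_x):
    """CLT model for the sample sum: N(n * mu_X, sqrt(n) * sigma_X)."""
    return NormalDist(n * mu_x, math.sqrt(n) * sigma_x)


n, mu_x, sigma_x = 50, 34, 15   # tablet-user figures from earlier in this section
means = clt_for_mean(n, mu_x, sigma_x)
sums = clt_for_sum(n, mu_x, sigma_x)

# "Sample mean < 36" and "sample sum < 50 * 36 = 1800" describe the same samples,
# so the two models assign them the same probability (up to rounding error).
print(means.cdf(36), sums.cdf(n * 36))

# Percentiles scale the same way: a percentile of the sum is n times
# the same percentile of the mean.
print(n * means.inv_cdf(0.90), sums.inv_cdf(0.90))
```

The practical consequence is that the choice between the two versions is about which quantity the question names; mixing them (for example, using the mean's standard deviation with a total) gives wrong answers.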
Example
A study involving stress is conducted among the students on a college campus. The stress scores follow a uniform distribution with the lowest stress score equal to one and the highest equal to five. A sample of 75 students is taken. Find:
- The probability that the mean stress score for the 75 students is less than two.
- The 90th percentile for the mean stress score for the 75 students.
- The probability that the total of the 75 stress scores is less than 200.
- The 90th percentile for the total stress score for the 75 students.
Let X = one stress score. Problems 1 and 2 ask you to find a probability or a percentile for a mean. Problems 3 and 4 ask you to find a probability or a percentile for a total or sum.
The sample size, n, is equal to 75. Since the individual stress scores follow a uniform distribution, [latex]X \sim U(1, 5)[/latex] where [latex]a = 1[/latex] and [latex]b = 5[/latex].
[latex]\mu_X = \frac{a+b}{2} = \frac{1 + 5}{2} = 3[/latex]
[latex]\sigma_X = \sqrt{\frac{(b-a)^{2}}{12}} = \sqrt{\frac{(5-1)^2}{12}} = 1.15[/latex]
For problems 1 and 2, let [latex]\overline{X} = \text{the mean stress score for the 75 students}[/latex]. Then, [latex]\overline{X} \sim N \left(3, \frac{1.15}{\sqrt{75}}\right)[/latex] where [latex]n = 75[/latex].
1. Find [latex]P(\overline{x} \lt 2)[/latex].
Solution
[latex]P(\overline{x} \lt 2) = 0[/latex]
The probability that the mean stress score is less than two is about zero.
[latex]\text{normalcdf}\left(1, 2, 3, \frac{1.15}{\sqrt{75}}\right) = 0[/latex]
Reminder: The smallest stress score is one.
2. Find the 90th percentile for the mean of 75 stress scores. Draw a graph.
Solution
Let k = the 90th percentile.
Find k, where [latex]P(\overline{x} \lt k) = 0.90[/latex].
[latex]k = 3.2[/latex]
This is a normal distribution curve. The peak of the curve coincides with the point 3 on the horizontal axis. A point, k, is labeled to the right of 3. A vertical line extends from k to the curve. The area under the curve to the left of k is shaded. The shaded area shows that [latex]P(\overline{x} \lt k) = 0.90[/latex]
The 90th percentile for the mean of 75 scores is about 3.2. This tells us that 90% of all the means of 75 stress scores are at most 3.2, and that 10% are at least 3.2.
[latex]\text{invNorm}\left(0.90, 3, \frac{1.15}{\sqrt{75}}\right) = 3.2[/latex]
For problems 3 and 4, let [latex]\Sigma{X} = \text{the sum of the 75 stress scores}[/latex]. Then, [latex]\Sigma{X} \sim N \left[(75)(3), (\sqrt{75})(1.15) \right][/latex]
3. Find [latex]P(\Sigma{x} \lt 200)[/latex]. Draw the graph.
Solution
The mean of the sum of 75 stress scores is [latex](75)(3) = 225[/latex]
The standard deviation of the sum of 75 stress scores is [latex](\sqrt{75})(1.15) = 9.96[/latex]
[latex]P(\Sigma{x} \lt 200) \approx 0.006[/latex]
The probability that the total of 75 scores is less than 200 is about 0.006, which is very small.
[latex]\text{normalcdf}\left(75, 200, (75)(3), (\sqrt{75})(1.15)\right) = 0.006[/latex]
Reminder: The smallest total of 75 stress scores is 75, because the smallest single score is one.
4. Find the 90th percentile for the total of 75 stress scores. Draw a graph.
Solution
Let k = the 90th percentile.
Find k where [latex]P(\Sigma{x} \lt k) = 0.90[/latex]
[latex]k = 237.8[/latex]
This is a normal distribution curve. The peak of the curve coincides with the point 225 on the horizontal axis. A point, k, is labeled to the right of 225. A vertical line extends from k to the curve. The area under the curve to the left of k is shaded. The shaded area shows that [latex]P(\Sigma{x} \lt k) = 0.90[/latex]
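All four answers in this example can be reproduced with the following Python sketch, which uses `statistics.NormalDist` as a stand-in for the calculator. It uses the unrounded [latex]\sigma_X = \sqrt{16/12} \approx 1.1547[/latex] instead of 1.15, so the last digits can differ slightly from the calculator results above. The same lower bounds are used: 1 for the mean and 75 for the sum.

```python
import math
from statistics import NormalDist

a, b, n = 1, 5, 75
mu_x = (a + b) / 2                        # 3.0
sigma_x = math.sqrt((b - a) ** 2 / 12)    # about 1.1547 (rounded to 1.15 in the text)

means = NormalDist(mu_x, sigma_x / math.sqrt(n))      # CLT for means
sums = NormalDist(n * mu_x, math.sqrt(n) * sigma_x)   # CLT for sums

p_mean_below_2 = means.cdf(2) - means.cdf(1)      # normalcdf(1, 2, ...)
k_mean_90 = means.inv_cdf(0.90)                   # 90th percentile of the mean
p_sum_below_200 = sums.cdf(200) - sums.cdf(75)    # normalcdf(75, 200, ...)
k_sum_90 = sums.inv_cdf(0.90)                     # 90th percentile of the sum

print(f"1. P(mean < 2)    = {p_mean_below_2:.4f}")
print(f"2. 90th pct, mean = {k_mean_90:.1f}")
print(f"3. P(sum < 200)   = {p_sum_below_200:.3f}")
print(f"4. 90th pct, sum  = {k_sum_90:.1f}")
```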
Example
In the United States, someone is sexually assaulted every two minutes, on average, according to a number of studies. Suppose the standard deviation is 0.5 minutes and the sample size is 100.
- Find the median, the first quartile, and the third quartile for the sample mean time of sexual assaults in the United States.
- Find the median, the first quartile, and the third quartile for the sum of sample times of sexual assaults in the United States.
- Find the probability that the sample mean time is between 1.75 and 1.85 minutes.
- Find the value that is two standard deviations above the mean of the sample means.
- Find the IQR for the sum of the sample times.
Solution
- We have [latex]\mu_{\overline{x}} = \mu = 2[/latex] and [latex]\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{10} = 0.05[/latex]. Therefore:
- [latex]50^{\text{th}} \text{ percentile} = \mu_{\overline{x}} = \mu = 2[/latex]
- [latex]25^{\text{th}} \text{ percentile} = \text{invNorm}(0.25, 2, 0.05) = 1.97[/latex]
- [latex]75^{\text{th}} \text{ percentile} = \text{invNorm}(0.75, 2, 0.05) = 2.03[/latex]
- We have [latex]\mu_{\Sigma{X}} = n(\mu) = 100(2) = 200[/latex] and [latex]\sigma_{\Sigma{X}} = (\sqrt{n})(\sigma) = (10)(0.5) = 5[/latex]. Therefore:
- [latex]50^{\text{th}} \text{ percentile} = \mu_{\Sigma{X}} = n(\mu) = 100(2) = 200[/latex]
- [latex]25^{\text{th}} \text{ percentile} = \text{invNorm}(0.25, 200, 5) = 196.63[/latex]
- [latex]75^{\text{th}} \text{ percentile} = \text{invNorm}(0.75, 200, 5) = 203.37[/latex]
- [latex]P( 1.75 \lt \overline{x} \lt 1.85) = \text{normalcdf}(1.75, 1.85, 2, 0.05) = 0.0013[/latex]
- Using the z-score equation, [latex]z = \frac{\overline{x} - \mu_{\overline{x}}}{\sigma_{\overline{x}}}[/latex], with [latex]z = 2[/latex] and solving for [latex]\overline{x}[/latex], we have [latex]\overline{x} = z\sigma_{\overline{x}} + \mu_{\overline{x}} = 2(0.05) + 2 = 2.1[/latex].
- The IQR is [latex]75^{\text{th}} \text{ percentile} - 25^{\text{th}} \text{ percentile} = 203.37 - 196.63 = 6.74[/latex]
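A Python sketch of this example, using `statistics.NormalDist` as an assumed stand-in for `invNorm` and `normalcdf`. For a normal model the median equals the mean, so the 50th percentiles are read off directly.

```python
import math
from statistics import NormalDist

mu, sigma, n = 2, 0.5, 100                        # minutes
means = NormalDist(mu, sigma / math.sqrt(n))      # N(2, 0.05)
sums = NormalDist(n * mu, math.sqrt(n) * sigma)   # N(200, 5)

q1_mean, q3_mean = means.inv_cdf(0.25), means.inv_cdf(0.75)
q1_sum, q3_sum = sums.inv_cdf(0.25), sums.inv_cdf(0.75)
p_between = means.cdf(1.85) - means.cdf(1.75)     # P(1.75 < x-bar < 1.85)
two_sd_above = means.mean + 2 * means.stdev       # x-bar = mu + z * sigma_xbar, z = 2
iqr_sum = q3_sum - q1_sum

print("median, Q1, Q3 of the mean:", means.mean, round(q1_mean, 2), round(q3_mean, 2))
print("median, Q1, Q3 of the sum: ", sums.mean, round(q1_sum, 2), round(q3_sum, 2))
print("P(1.75 < x-bar < 1.85):", round(p_between, 4))
print("two SDs above:", round(two_sd_above, 2), " IQR of sum:", round(iqr_sum, 2))
```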
Your Turn!
A study was done about violence against prostitutes and the symptoms of the posttraumatic stress that they developed. The age range of the prostitutes was 14 to 61. The mean age was 30.9 years with a standard deviation of nine years.
- In a sample of 25 prostitutes, what is the probability that the mean age of the prostitutes is less than 35?
- Is it likely that the mean age of the sample group could be more than 50 years? Interpret the results.
- In a sample of 49 prostitutes, what is the probability that the sum of the ages is no less than 1,600?
- Is it likely that the sum of the ages of the 49 prostitutes is at most 1,595? Interpret the results.
- Find the 95th percentile for the sample mean age of 65 prostitutes. Interpret the results.
- Find the 90th percentile for the sum of the ages of 65 prostitutes. Interpret the results.
Solution
- [latex]P(\overline{x} \lt 35) = \text{normalcdf}(-1 \text{E}99, 35, 30.9, 1.8) = 0.9886[/latex]
- [latex]P(\overline{x} > 50) = \text{normalcdf}(50, 1\text{E}99, 30.9, 1.8) \approx 0[/latex]. For this sample group, it is almost impossible for the group’s average age to be more than 50. However, it is still possible for an individual in this group to have an age greater than 50.
- [latex]P(\Sigma{x} \ge 1,600) = \text{normalcdf}(1600, 1 \text{E}99, 1514.10, 63) = 0.0864[/latex]
- [latex]P(\Sigma{x} \le 1,595) = \text{normalcdf}(-1 \text{E}99, 1595, 1514.10, 63) = 0.9005[/latex]. This means that there is a 90% chance that the sum of the ages for the sample group [latex]n = 49[/latex] is at most 1595.
- The [latex]95^{\text{th}} \text{ percentile} = \text{invNorm}(0.95, 30.9, 1.1) = 32.7[/latex]. This indicates that 95% of the sample mean ages for samples of 65 prostitutes are at most 32.7 years.
- The [latex]90^{\text{th}} \text{ percentile} = \text{invNorm}(0.90, 2008.5, 72.56) = 2101.5[/latex]. This indicates that 90% of the sums of ages for samples of 65 prostitutes are at most 2,101.5 years.
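The Your Turn answers can be checked with this Python sketch. The helper names `mean_model` and `sum_model` are hypothetical. Unlike the solutions above, the sketch does not round the standard deviations (for example, it uses [latex]9/\sqrt{65} \approx 1.116[/latex] rather than 1.1), so small differences in the last digit are possible.

```python
import math
from statistics import NormalDist

mu, sigma = 30.9, 9   # population mean and SD of age, in years


def mean_model(n):
    """CLT model for the sample mean: N(mu, sigma / sqrt n)."""
    return NormalDist(mu, sigma / math.sqrt(n))


def sum_model(n):
    """CLT model for the sample sum: N(n * mu, sqrt(n) * sigma)."""
    return NormalDist(n * mu, math.sqrt(n) * sigma)


p_mean_lt_35 = mean_model(25).cdf(35)
p_mean_gt_50 = 1 - mean_model(25).cdf(50)
p_sum_ge_1600 = 1 - sum_model(49).cdf(1600)
p_sum_le_1595 = sum_model(49).cdf(1595)
pct95_mean = mean_model(65).inv_cdf(0.95)
pct90_sum = sum_model(65).inv_cdf(0.90)

print(round(p_mean_lt_35, 4), round(p_mean_gt_50, 4))
print(round(p_sum_ge_1600, 4), round(p_sum_le_1595, 4))
print(round(pct95_mean, 1), round(pct90_sum, 1))
```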
Summary
The central limit theorem tells us that for a population with any distribution, the distribution of the sample sums approaches a normal distribution as the sample size increases. In other words, if the sample size is large enough, the distribution of the sums can be approximated by a normal distribution even if the original population is not normally distributed. Additionally, if the original population has a mean of [latex]\mu_X[/latex] and a standard deviation of [latex]\sigma_X[/latex], the mean of the sums is [latex]n \mu_X[/latex] and the standard deviation is [latex]\left(\sqrt{n}\right)(\sigma_X)[/latex], where [latex]n[/latex] is the sample size.
Formula Review
The Central Limit Theorem for Sums: [latex]\Sigma{X} \sim N[(n)(\mu_X), (\sqrt{n})(\sigma_X)][/latex]
Mean for Sums ([latex]\Sigma{X}[/latex]): [latex](n)(\mu_X)[/latex]
The Central Limit Theorem for Sums z-score: [latex]z\text{ (for the sum)} = \frac{\Sigma x - \left(n\right)\left({\mu }_{X}\right)}{\left(\sqrt{n}\right)\left({\sigma }_{X}\right)}[/latex]
Standard deviation for Sums ([latex]\Sigma{X}[/latex]): [latex]\left(\sqrt{n}\right)(\sigma_X)[/latex]
References
Farago, Peter. “The Truth About Cats and Dogs: Smartphone vs Tablet Usage Differences.” The Flurry Blog, October 29, 2012. Available online at https://www.flurry.com/blog (accessed May 17, 2013).