Chapter 8: Hypothesis Testing with One Sample
8.3 Distribution Needed for Hypothesis Testing
Learning Objectives
By the end of this section, the student should be able to:
- Identify the distribution required to conduct a hypothesis test.
Distributions Required to Conduct a Hypothesis Test
Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. We perform tests of a population mean using a normal distribution or a Student's t-distribution. (Remember, use a Student's t-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.) We perform tests of a population proportion using a normal distribution (usually when the sample size n is large).
If you are testing a single population mean, the distribution for the test is for means:
[latex]\overline{X}\sim N\left({\mu }_{X},\frac{{\sigma }_{X}}{\sqrt{n}}\right)[/latex] or [latex]{t}_{df}[/latex]
The population parameter is μ. The estimated value (point estimate) for μ is [latex]\overline{x}[/latex], the sample mean.
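As a minimal sketch of the normal model for the sample mean, the following Python snippet builds N(μ, σ/√n) for several sample sizes. The population values (μ = 100, σ = 8) and the helper name `sample_mean_distribution` are assumptions chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_distribution(mu, sigma, n):
    """Normal model for the sample mean: N(mu, sigma / sqrt(n))."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))

# Hypothetical population: mu = 100, sigma = 8
for n in (4, 16, 64):
    model = sample_mean_distribution(100, 8, n)
    # The spread of the sample mean shrinks as n grows
    print(n, model.mean, model.stdev)
```

The center stays at μ while the standard deviation of the sample mean is divided by √n, which is why larger samples give more precise point estimates.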
If you are testing a single population proportion, the distribution for the test is for proportions or percentages:
[latex]{P}^{\prime }\sim N\left(p,\sqrt{\frac{p\cdot q}{n}}\right)[/latex]
The population parameter is p. The estimated value (point estimate) for p is p′. p′ = [latex]\frac{x}{n}[/latex] where x is the number of successes and n is the sample size.
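The point estimate and the normal model for P′ can be sketched in a few lines of Python. The counts (43 successes in 100 trials), the hypothesized value p = 0.50, and the helper name `proportion_sampling_distribution` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_sampling_distribution(p, n):
    """Normal model for P' under a hypothesized proportion p: N(p, sqrt(pq/n))."""
    q = 1 - p
    return NormalDist(mu=p, sigma=sqrt(p * q / n))

# Hypothetical data: x successes out of n trials, hypothesized proportion p0
x, n, p0 = 43, 100, 0.50
p_prime = x / n                                  # point estimate of p
model = proportion_sampling_distribution(p0, n)  # distribution used for the test

print("point estimate p':", p_prime)
print("mean of P':", model.mean)
print("standard deviation of P':", model.stdev)
```

Note that the standard deviation is computed from the hypothesized p, not from p′, because the test distribution is the one that holds if the null hypothesis is true.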
Assumptions
When you perform a hypothesis test of a single population mean μ using a Student's t-distribution (often called a t-test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed. You use the sample standard deviation to approximate the population standard deviation. (Note that if the sample size is sufficiently large, a t-test will work even if the population is not approximately normally distributed.)
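To make the role of the sample standard deviation concrete, here is a short sketch that computes the usual one-sample t statistic, t = (x̄ − μ0)/(s/√n), and its degrees of freedom. The data values and the hypothesized mean μ0 = 4 are made up for illustration, and the function name `t_statistic` is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """Return (t, df) for a one-sample t-test of H0: mu = mu0."""
    n = len(sample)
    x_bar = mean(sample)       # point estimate of mu
    s = stdev(sample)          # sample standard deviation (divides by n - 1)
    t = (x_bar - mu0) / (s / sqrt(n))
    return t, n - 1            # degrees of freedom: one less than the sample size

# Hypothetical simple random sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t, df = t_statistic(sample, mu0=4)
print("t =", round(t, 4), "with df =", df)
```

The statistic is then compared against a Student's t-distribution with df degrees of freedom. The sketch does not check the normality assumption, which you would still need to judge from the data or the context.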
When you perform a hypothesis test of a single population mean μ using a normal distribution (often called a z-test), you take a simple random sample from the population. The population you are testing is normally distributed or your sample size is sufficiently large. You know the value of the population standard deviation which, in reality, is rarely known.
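When σ is known, the test statistic can be standardized against the standard normal distribution. The sketch below assumes hypothetical values (x̄ = 52, μ0 = 50, σ = 6, n = 36) and a two-sided alternative; the function name `z_test_two_sided` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(x_bar, mu0, sigma, n):
    """Return (z, two-sided tail probability) for H0: mu = mu0 with known sigma."""
    standard_error = sigma / sqrt(n)          # standard deviation of the sample mean
    z = (x_bar - mu0) / standard_error
    # Probability of a result at least this far from mu0 in either direction
    tail_probability = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, tail_probability

# Hypothetical inputs; sigma is treated as known
z, tail_probability = z_test_two_sided(x_bar=52.0, mu0=50.0, sigma=6.0, n=36)
print("z =", z)
print("two-sided tail probability =", round(tail_probability, 4))
```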
When you perform a hypothesis test of a single population proportion p, you take a simple random sample from the population. You must meet the conditions for a binomial distribution, which are: there is a fixed number n of independent trials, each trial has only two possible outcomes (success or failure), and each trial has the same probability of success p. The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (np > 5 and nq > 5). Then the distribution of the sample (estimated) proportion can be approximated by the normal distribution with μ = p and [latex]\sigma =\sqrt{\frac{pq}{n}}[/latex].
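The np > 5 and nq > 5 rule is easy to check before running a proportion test. This is a minimal sketch; the function name `normal_approximation_ok` and the example inputs are assumptions for illustration.

```python
def normal_approximation_ok(n, p, threshold=5):
    """Check the rule of thumb np > 5 and nq > 5, where q = 1 - p."""
    q = 1 - p
    return n * p > threshold and n * q > threshold

# Hypothetical cases
print(normal_approximation_ok(100, 0.5))   # np = 50 and nq = 50
print(normal_approximation_ok(20, 0.1))    # np = 2, so the rule fails
```

If the check fails, the normal approximation described above may be poor, and the test based on it should not be trusted without a larger sample.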
Remember that q = 1 – p.
Normal Distribution: a continuous random variable (RV) with pdf [latex]f(x)=\frac{1}{\sigma \sqrt{2\pi }}{e}^{-\frac{{\left(x-\mu \right)}^{2}}{2{\sigma }^{2}}}[/latex], where μ is the mean of the distribution and σ is the standard deviation; notation: X ~ N(μ, σ). If μ = 0 and σ = 1, the RV is said to follow the standard normal distribution.
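As a quick check of the density formula, the sketch below evaluates it directly and compares the result with Python's built-in `statistics.NormalDist`. The parameter values (μ = 1, σ = 2) and the evaluation points are arbitrary choices for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the normal density f(x) directly from the formula."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Compare the hand-written formula with the standard library
reference = NormalDist(mu=1.0, sigma=2.0)
for x in (-1.0, 0.0, 2.5):
    print(x, normal_pdf(x, mu=1.0, sigma=2.0), reference.pdf(x))
```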
Student's t-Distribution: investigated and reported by William S. Gosset in 1908 and published under the pseudonym Student. The major characteristics of the random variable (RV) are:
- It is continuous and can assume any real value.
- The pdf is symmetrical about its mean of zero. However, it is more spread out and flatter at the apex than the normal distribution.
- It approaches the standard normal distribution as the sample size n (and therefore the number of degrees of freedom) gets larger.
- There is a "family" of t distributions: every representative of the family is completely defined by the number of degrees of freedom, which is one less than the number of data items.
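The characteristics listed above can be explored numerically. The following sketch evaluates the standard formula for the Student's t density using only the Python standard library and compares it with the standard normal density at the center (x = 0) and in the tail (x = 3). The helper name `t_pdf` and the chosen degrees of freedom are assumptions for illustration.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work on the log scale so that large df values do not overflow
    log_coef = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_coef - (df + 1) / 2 * log(1 + x * x / df))

standard_normal = NormalDist()
for df in (1, 5, 30, 1000):
    # Height at the apex and in the tail for each member of the t family
    print(df, round(t_pdf(0, df), 4), round(t_pdf(3, df), 4))
print("normal", round(standard_normal.pdf(0), 4), round(standard_normal.pdf(3), 4))
```

For small df the t density is lower at the apex and higher in the tails than the normal density, and the two sets of values move closer together as df increases.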
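Standard Deviation: a measure of the typical distance (deviation) of the observations from their mean, equal to the square root of the variance; notation: s for a sample standard deviation and σ for a population standard deviation.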
Binomial Distribution: a discrete random variable (RV) that counts the number of successes in a fixed number (n) of independent Bernoulli trials, each with probability of success p.