Statistics for Management and Economics Study Notes 3
9. Sampling Distributions
9.1 Sampling Distribution of the Mean
Central Limit Theorem: The sampling distribution of the mean of a random sample drawn from any population is approximately normal for a sufficiently large sample size. The larger the sample size, the more closely the sampling distribution of \(\bar X\) will resemble a normal distribution.
\[\mu_{\bar x} = \mu\]
\[\sigma_{\bar x}^2 = \frac{\sigma^2}{n}\]
If X is normal, then \(\bar X\) is normal. If X is nonnormal, then \(\bar X\) is approximately normal for sufficiently large sample sizes. The definition of “sufficiently large” depends on the extent of nonnormality of X.
Standardizing the sample mean:
\[Z = \frac{\bar X - \mu}{\sigma / \sqrt{n}}\]
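As a minimal sketch of how the standardization above is used, the following Python snippet computes an upper-tail probability for the sample mean. The function name and the numbers (\(\mu = 100\), \(\sigma = 15\), \(n = 36\)) are hypothetical and chosen only for illustration; it assumes \(\bar X\) is (approximately) normal.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_above(threshold, mu, sigma, n):
    """Return P(X-bar > threshold) for X-bar ~ N(mu, sigma^2 / n)."""
    standard_error = sigma / sqrt(n)        # sigma / sqrt(n)
    z = (threshold - mu) / standard_error   # standardized threshold
    return 1.0 - NormalDist().cdf(z)        # upper-tail area under N(0, 1)


# Hypothetical inputs, not taken from the notes: here z = (105 - 100) / 2.5 = 2.
p_above = prob_sample_mean_above(threshold=105, mu=100, sigma=15, n=36)
print(round(p_above, 4))
```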
9.2 Sampling Distribution of a Sample Proportion
\(\hat P\) is approximately normally distributed provided that np and n(1 − p) are greater than or equal to 5.
\[E(\hat P) = p\]
\[V(\hat P) = \sigma_{\hat p}^2 = \frac{p(1-p)}{n}\]
Standardizing the sample proportion:
\[Z = \frac{\hat P - p}{\sqrt{p(1-p)/n}}\]
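The same idea applies to the sample proportion. The sketch below checks the rule of thumb from this section before standardizing; the function name and inputs (\(p = 0.5\), \(n = 100\)) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_proportion_above(threshold, p, n):
    """Return P(P-hat > threshold) under the normal approximation."""
    # Rule of thumb from the notes: np and n(1 - p) must both be at least 5.
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation needs np >= 5 and n(1 - p) >= 5")
    standard_error = sqrt(p * (1 - p) / n)
    z = (threshold - p) / standard_error
    return 1.0 - NormalDist().cdf(z)


# Hypothetical inputs: the standard error is 0.05, so z = (0.6 - 0.5) / 0.05 = 2.
p_hat_above = prob_sample_proportion_above(threshold=0.6, p=0.5, n=100)
print(round(p_hat_above, 4))
```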
9.3 Sampling Distribution of the Difference between Two Means
\[E(\bar X_{1} - \bar X_{2}) = \mu_{\bar x_{1} - \bar x_{2}} = \mu_{1} - \mu_{2}\]
\[V(\bar X_{1} - \bar X_{2}) = \sigma_{\bar x_{1} - \bar x_{2}}^2 = \frac{\sigma_{1}^2}{n_{1}} + \frac{\sigma_{2}^2}{n_{2}}\]
Standardizing the difference between two sample means:
\[Z = \frac{(\bar X_{1} - \bar X_{2}) - (\mu_{1} - \mu_{2})}{\sqrt{\frac{\sigma_{1}^2}{n_{1}} + \frac{\sigma_{2}^2}{n_{2}}}}\]
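A short sketch of the two-sample standardization, assuming both population standard deviations are known and the samples are independent. All names and numbers are hypothetical; they are chosen so that the standard error is exactly 5.

```python
from math import sqrt
from statistics import NormalDist


def prob_difference_above(threshold, mu1, mu2, sigma1, sigma2, n1, n2):
    """Return P(X-bar1 - X-bar2 > threshold) for independent samples."""
    expected = mu1 - mu2                                     # E(X-bar1 - X-bar2)
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # sqrt of the variance
    z = (threshold - expected) / standard_error
    return 1.0 - NormalDist().cdf(z)


# Hypothetical inputs: variance = 900/100 + 1600/100 = 25, so z = (0 - 10) / 5 = -2.
p_first_larger = prob_difference_above(
    threshold=0, mu1=60, mu2=50, sigma1=30, sigma2=40, n1=100, n2=100
)
print(round(p_first_larger, 4))
```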
10. Introduction to Estimation
- An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter.
- An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger.
- If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively more efficient.
10.1 Estimating the Population Mean When the Population Standard Deviation is Known
\[\bar x \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\]
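The interval estimator above can be sketched in a few lines of Python. The function name and the inputs (\(\bar x = 50\), \(\sigma = 10\), \(n = 100\)) are hypothetical; `NormalDist().inv_cdf` supplies \(z_{\alpha/2}\).

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical inputs: the standard error is 10 / sqrt(100) = 1.
lcl, ucl = z_interval(x_bar=50, sigma=10, n=100)
print(round(lcl, 2), round(ucl, 2))
```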
10.2 Determining the Sample Size to Estimate \(\mu\)
\[n = (\frac{z_{\alpha/2}\sigma}{B})^2\]
\[B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\]
B stands for the bound on the error of estimation.
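A minimal sketch of the sample-size formula, with hypothetical inputs (\(\sigma = 10\), \(B = 2\), 95% confidence). The result is rounded up because a fractional observation cannot be sampled and rounding down would exceed the bound.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, bound, confidence=0.95):
    """Smallest n such that z_{alpha/2} * sigma / sqrt(n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / bound) ** 2)


# Hypothetical inputs, not taken from the notes.
n_required = sample_size_for_mean(sigma=10, bound=2)
print(n_required)
```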
11. Introduction to Hypothesis Testing
11.1 Concepts of Hypothesis Testing
- The null hypothesis, \(H_{0}\), usually refers to a general statement or default position that there is no relationship between two measured phenomena, or no association among groups.
- The alternative hypothesis (or maintained hypothesis or research hypothesis), \(H_{1}\), is the hypothesis to be accepted if the null hypothesis is rejected.
- A Type I error occurs when we reject a true null hypothesis. Its probability is denoted \(\alpha\).
- A Type II error is defined as not rejecting a false null hypothesis. Its probability is denoted \(\beta\).
- The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed given that the null hypothesis is true.
- If we reject the null hypothesis, we conclude that there is enough statistical evidence to infer that the alternative hypothesis is true.
- If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true.
11.2 Testing the Population Mean When the Population Standard Deviation is Known
- A two-tail test is conducted whenever the alternative hypothesis specifies that the mean is not equal to the value stated in the null hypothesis.
- A one-tail test that focuses on the right tail of the sampling distribution is conducted whenever we want to know whether there is enough evidence to infer that the mean is greater than the quantity specified by the null hypothesis.
- A one-tail test that focuses on the left tail of the sampling distribution is conducted whenever we want to know whether there is enough evidence to infer that the mean is less than the quantity specified by the null hypothesis.
11.2.1 Standardized Test Statistic
\[z = \frac{\bar x - \mu}{\sigma / \sqrt{n}}\]
The rejection region for a two-tail test:
\[z > z_{\alpha / 2}\]
or
\[z < - z_{\alpha / 2}\]
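The two-tail procedure can be sketched as follows, assuming \(\sigma\) is known. The function name and the inputs (\(\bar x = 52\), hypothesized mean 50, \(\sigma = 10\), \(n = 100\)) are hypothetical. The p-value is included alongside the rejection-region decision since both lead to the same conclusion.

```python
from math import sqrt
from statistics import NormalDist


def two_tail_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, p-value, reject?) for H0: mu = mu0."""
    z = (x_bar - mu0) / (sigma / sqrt(n))           # standardized test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # area in both tails
    reject = abs(z) > z_crit                        # z > z_crit or z < -z_crit
    return z, z_crit, p_value, reject


# Hypothetical inputs: z = (52 - 50) / 1 = 2.
z_stat, z_crit, p_value, reject = two_tail_z_test(x_bar=52, mu0=50, sigma=10, n=100)
print(round(z_stat, 2), round(z_crit, 2), round(p_value, 4), reject)
```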
11.2.2 Testing Hypotheses and Confidence Interval Estimators
\[\bar x \pm z_{\alpha / 2}\frac{\sigma}{\sqrt{n}}\]
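We compute the interval estimate and determine whether the hypothesized value of the mean falls into the interval. If it falls outside the interval, we reject the null hypothesis of the corresponding two-tail test.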
11.3 Calculating the Probability of a Type II Error
Example: A random sample of 400 monthly accounts is drawn, for which the sample mean is $178. The accounts are approximately normally distributed with a standard deviation of $65. Can we conclude that the mean is greater than $170 at the \(\alpha\) = 5% significance level?
\(H_{0}\): \(\mu \le 170\)
\(H_{1}\): \(\mu \gt 170\)
\(\frac{\bar x_{L} - 170}{65/\sqrt{400}} = 1.645\)
\(\bar x_{L} = 175.34\)
Therefore, the rejection region is:
\(\bar x \gt 175.34\)
The sample mean was computed to be 178. Because the test statistic (sample mean) is in the rejection region (it is greater than 175.34), we reject the null hypothesis. Thus, there is sufficient evidence to infer that the mean monthly account is greater than $170.
\(\beta = P(\bar X \lt 175.34 \mid H_{0} \text{ is false})\)
Suppose that the true mean account is $180.
\(\beta = P(\bar X \lt 175.34 \mid \mu = 180)\)
\(\beta = P(\frac{\bar X - \mu}{\sigma / \sqrt{n}} < \frac{175.34-180}{65/\sqrt{400}}) = P(Z \lt - 1.43) = 0.0764\)
The probabilities of Type I and Type II errors are inversely related: for a fixed sample size, decreasing \(\alpha\) increases \(\beta\), and vice versa. Unfortunately, there is no simple formula to determine what the significance level should be.
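The \(\beta\) calculation in this example can be reproduced with a short sketch. The function name is hypothetical; the numbers (170, 180, 65, 400) are those of the example. Because the code uses the unrounded critical value rather than table values rounded to two decimals, the result is close to, but not exactly, 0.0764. Repeating the calculation with \(\alpha = 0.01\) shows \(\beta\) rising as \(\alpha\) falls.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_alt, sigma, n, alpha):
    """beta for the right-tail test of H0: mu <= mu0 when the true mean is mu_alt."""
    se = sigma / sqrt(n)                                   # 65 / 20 = 3.25 here
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se    # boundary of the rejection region
    # beta = P(X-bar < x_crit) when X-bar ~ N(mu_alt, se^2)
    return NormalDist(mu_alt, se).cdf(x_crit)


beta_05 = type_ii_error(mu0=170, mu_alt=180, sigma=65, n=400, alpha=0.05)
beta_01 = type_ii_error(mu0=170, mu_alt=180, sigma=65, n=400, alpha=0.01)
print(round(beta_05, 4), round(beta_01, 4))
```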
11.4 Larger Sample Size Equals More Information Equals Better Decisions
11.5 Power of a Test
The power of a test is the probability that it leads us to reject the null hypothesis when it is false. Thus, the power of a test is 1 − β.
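As an illustration of both power and the effect of sample size, the sketch below computes \(1 - \beta\) for the right-tail example of Section 11.3 (\(\mu_{0} = 170\), true mean 180, \(\sigma = 65\), \(\alpha = 0.05\)) at two sample sizes. The function name and the second sample size (1000) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tail(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power (1 - beta) of the right-tail z-test when the true mean is mu_alt."""
    se = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # reject when x-bar > x_crit
    beta = NormalDist(mu_alt, se).cdf(x_crit)             # P(do not reject | mu = mu_alt)
    return 1 - beta


power_400 = power_right_tail(mu0=170, mu_alt=180, sigma=65, n=400)
power_1000 = power_right_tail(mu0=170, mu_alt=180, sigma=65, n=1000)
print(round(power_400, 4), round(power_1000, 4))
```

With \(\alpha\) held fixed, the larger sample yields the higher power, which is the sense in which a larger sample size gives more information.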
12. Inference About a Population
12.1 Inference about a Population Mean When the Population Standard Deviation is Unknown
When the population standard deviation is unknown and the population is normal, the test statistic for testing hypotheses about μ is
\[t = \frac{\bar x - \mu}{s/\sqrt{n}}\]
which is Student t-distributed with ν = n − 1 degrees of freedom.
Confidence Interval Estimator of μ When σ Is Unknown
\[\bar x \pm t_{\alpha/2}\frac{s}{\sqrt{n}}\]
12.2 Inference about a Population Variance
The test statistic used to test hypotheses about \(\sigma^2\) is
\[\chi^2 = \frac{(n-1)s^2}{\sigma^2}\]
which is chi-squared distributed with ν = n − 1 degrees of freedom when the population random variable is normally distributed with variance equal to \(\sigma^2\).
Confidence Interval Estimator of \(\sigma^2\)
Lower confidence limit (LCL) = \(\frac{(n-1)s^2}{\chi_{\alpha /2}^2}\)
Upper confidence limit (UCL) = \(\frac{(n-1)s^2}{\chi_{1-\alpha /2}^2}\)
12.3 Inference about a Population Proportion
\[\hat p = \frac{x}{n}\]
Test Statistic for p
\[z = \frac{\hat P - p}{\sqrt{p(1-p)/n}}\]
which is approximately normal when np and n(1 − p) are greater than or equal to 5.
Confidence Interval Estimator of p
\[\hat p \pm z_{\alpha /2} \sqrt{\hat p (1 - \hat p)/n}\]
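A minimal sketch of this interval estimator, using hypothetical counts (240 successes in 600 trials) chosen so that the standard error is exactly 0.02.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Confidence interval for p based on the sample proportion x / n."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical inputs: p-hat = 0.4 and sqrt(0.4 * 0.6 / 600) = 0.02.
lcl, ucl = proportion_interval(successes=240, n=600)
print(round(lcl, 4), round(ucl, 4))
```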
Sample Size to Estimate a Proportion
\[n = (\frac{z_{\alpha /2}\sqrt{\hat p (1-\hat p)}}{B})^2\]
\[B = z_{\alpha /2} \sqrt{\frac{\hat p (1-\hat p)}{n}}\]
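The sample-size formula for a proportion can be sketched the same way. The planning value \(\hat p = 0.5\) used here is an assumption for illustration (it maximizes \(\hat p(1 - \hat p)\) and so gives the most conservative \(n\)); the bound \(B = 0.03\) is likewise hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_proportion(p_hat, bound, confidence=0.95):
    """Smallest n such that z_{alpha/2} * sqrt(p_hat(1 - p_hat) / n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sqrt(p_hat * (1 - p_hat)) / bound) ** 2)


# Hypothetical planning value and bound, not taken from the notes.
n_required = sample_size_for_proportion(p_hat=0.5, bound=0.03)
print(n_required)
```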
13. Inference about Comparing Two Populations
13.1 Inference about the Difference between two Means: Independent Samples
Sampling Distribution of \(\bar x_{1} - \bar x_{2}\):
\(\bar x_{1} - \bar x_{2}\) is normally distributed if the populations are normal and approximately normal if the populations are nonnormal and the sample sizes are large.
\[E( \bar x_{1} - \bar x_{2} ) = \mu_{1} - \mu_{2}\]
\[V( \bar x_{1} - \bar x_{2} ) = \frac{\sigma_{1}^2}{n_{1}} + \frac{\sigma_{2}^2}{n_{2}}\]
\[Z = \frac{(\bar x_{1} - \bar x_{2}) -(\mu_{1} - \mu_{2})}{\sqrt{\frac{\sigma_{1}^2}{n_{1}} + \frac{\sigma_{2}^2}{n_{2}}}}\]
13.1.1 Test Statistic for \(\mu_{1} - \mu_{2}\) when \(\sigma_{1}^2 = \sigma_{2}^2\)
\[t = \frac{(\bar x_{1} - \bar x_{2}) -(\mu_{1} - \mu_{2})}{\sqrt{s_{p}^2(\frac{1}{n_{1}} + \frac{1}{n_{2}})}}\]
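which is Student t distributed with \(\nu = n_{1} + n_{2} - 2\) degrees of freedom when the populations are normal, and where \(s_{p}^2\) is called the pooled variance estimator: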
\[s_{p}^2 = \frac{(n_{1} -1)s_{1}^2 + (n_{2} -1)s_{2}^2}{n_{1} + n_{2} - 2}\]
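A sketch of the equal-variances statistic computed from summary statistics. The function name and all inputs are hypothetical. The degrees of freedom returned are \(n_{1} + n_{2} - 2\), the denominator of the pooled variance estimator. A critical value or p-value would need a Student t quantile, which Python's standard library does not provide, so the sketch stops at the statistic.

```python
from math import sqrt


def pooled_t_statistic(x_bar1, x_bar2, s1_sq, s2_sq, n1, n2, hypothesized_diff=0.0):
    """Return (t, pooled variance, degrees of freedom) for the equal-variances test."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df   # pooled variance estimator
    se = sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = ((x_bar1 - x_bar2) - hypothesized_diff) / se
    return t, sp_sq, df


# Hypothetical summary statistics: s_p^2 = (10 * 4 + 10 * 6) / 20 = 5.
t_stat, sp_sq, df = pooled_t_statistic(
    x_bar1=12, x_bar2=10, s1_sq=4, s2_sq=6, n1=11, n2=11
)
print(round(t_stat, 4), sp_sq, df)
```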
13.1.2 Confidence Interval Estimator of \(\mu_{1} - \mu_{2}\) when \(\sigma_{1}^2 = \sigma_{2}^2\)
\[(\bar x_{1} - \bar x_{2}) \pm t_{\alpha /2}\sqrt{s_{p}^2(\frac{1}{n_{1}} + \frac{1}{n_{2}})}\]
13.1.3 Test Statistic for \(\mu_{1} - \mu_{2}\) when \(\sigma_{1}^2 \ne \sigma_{2}^2\)
\[t = \frac{(\bar x_{1} - \bar x_{2}) -(\mu_{1} - \mu_{2})}{\sqrt{\frac{s_{1}^2}{n_{1}} + \frac{s_{2}^2}{n_{2}}}}\]
\[\nu = \frac{(s_{1}^2/n_{1} + s_{2}^2/n_{2})^2}{\frac{(s_{1}^2/n_{1})^2}{n_{1}-1} + \frac{(s_{2}^2/n_{2})^2}{n_{2}-1}}\]
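The unequal-variances statistic and its degrees of freedom can be sketched as below. The function name and inputs are hypothetical, chosen so that \(s_{1}^2/n_{1} = 1\) and \(s_{2}^2/n_{2} = 2\). The formula for \(\nu\) generally gives a non-integer value.

```python
from math import sqrt


def unequal_variances_t(x_bar1, x_bar2, s1_sq, s2_sq, n1, n2, hypothesized_diff=0.0):
    """Return (t, nu) for the unequal-variances test of mu1 - mu2."""
    a = s1_sq / n1    # s1^2 / n1
    b = s2_sq / n2    # s2^2 / n2
    t = ((x_bar1 - x_bar2) - hypothesized_diff) / sqrt(a + b)
    nu = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, nu


# Hypothetical summary statistics: a = 1, b = 2, so the standard error is sqrt(3).
t_stat, nu = unequal_variances_t(
    x_bar1=13, x_bar2=10, s1_sq=10, s2_sq=40, n1=10, n2=20
)
print(round(t_stat, 4), round(nu, 2))
```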
13.1.4 Confidence Interval Estimator of \(\mu_{1} - \mu_{2}\) when \(\sigma_{1}^2 \ne \sigma_{2}^2\)
\[(\bar x_{1} - \bar x_{2}) \pm t_{\alpha /2}\sqrt{\frac{s_{1}^2}{n_{1}} + \frac{s_{2}^2}{n_{2}}}\]
13.1.5 Testing the Population Variances
\(H_{0}\): \(\frac{\sigma_{1}^2}{\sigma_{2}^2} = 1\)
\(H_{1}\): \(\frac{\sigma_{1}^2}{\sigma_{2}^2} \ne 1\)
\[F = \frac{s_{1}^2}{s_{2}^2}\]
\(\nu_{1} = n_{1} - 1\) and \(\nu_{2} = n_{2} - 1\). This is a two-tail test so that the rejection region is \(F \gt F_{\alpha/2, \nu_{1},\nu_{2}}\) or \(F \lt F_{1-\alpha/2, \nu_{1},\nu_{2}}\).
Confidence Interval Estimator of \(\sigma_{1}^2/\sigma_{2}^2\)
\[LCL = \frac{s_{1}^2}{s_{2}^2} \frac{1}{F_{\alpha/2,\nu_{1},\nu_{2}}}\]
\[UCL = \frac{s_{1}^2}{s_{2}^2} F_{\alpha/2,\nu_{2},\nu_{1}}\]
13.2 Inference about the Difference between two Means: Matched Pairs Experiment
\(\mu_{D}\) is the mean of the population of differences.
Test Statistic for \(\mu_{D}\)
\[t = \frac{\bar x_{D} - \mu_{D}}{s_{D}/\sqrt{n_{D}}}\]
which is Student t distributed with \(\nu = n_{D} - 1\) degrees of freedom, provided that the differences are normally distributed.
Confidence Interval Estimator of \(\mu_{D}\)
\[\bar x_{D} \pm t_{\alpha/2}\frac{s_{D}}{\sqrt{n_{D}}}\]
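A sketch of the matched pairs statistic computed from raw paired data. The function name and the five data pairs are made up for illustration. `statistics.stdev` uses the \(n_{D} - 1\) divisor, which matches \(s_{D}\).

```python
from math import sqrt
from statistics import mean, stdev


def matched_pairs_t(sample1, sample2, hypothesized_mean_diff=0.0):
    """Return (t, degrees of freedom) for the mean of the paired differences."""
    if len(sample1) != len(sample2):
        raise ValueError("matched pairs require samples of equal length")
    differences = [a - b for a, b in zip(sample1, sample2)]
    n_d = len(differences)
    x_bar_d = mean(differences)     # mean of the differences
    s_d = stdev(differences)        # sample standard deviation of the differences
    t = (x_bar_d - hypothesized_mean_diff) / (s_d / sqrt(n_d))
    return t, n_d - 1


# Made-up paired observations; the differences are 2, 1, 0, 4, 2.
before = [10, 12, 9, 14, 11]
after = [8, 11, 9, 10, 9]
t_stat, df = matched_pairs_t(before, after)
print(round(t_stat, 4), df)
```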
13.3 Inference about the Difference between two Population Proportions
The statistic \(\hat p_{1} - \hat p_{2}\) is approximately normally distributed provided that the sample sizes are large enough so that \(n_{1}p_{1}\), \(n_{1}(1-p_{1})\), \(n_{2}p_{2}\), and \(n_{2}(1-p_{2})\) are all greater than or equal to 5.
\[E(\hat p_{1} - \hat p_{2}) = p_{1} - p_{2}\]
\[V(\hat p_{1} - \hat p_{2}) = \frac{p_{1}(1-p_{1})}{n_{1}} + \frac{p_{2}(1-p_{2})}{n_{2}}\]
\[Z = \frac{(\hat p_{1} - \hat p_{2}) - (p_{1} - p_{2})}{\sqrt{\frac{p_{1}(1-p_{1})}{n_{1}} + \frac{p_{2}(1-p_{2})}{n_{2}}}}\]
\[\hat p_{1} = \frac{x_{1}}{n_{1}}\]
\[\hat p_{2} = \frac{x_{2}}{n_{2}}\]
13.3.1 Test Statistic for \(p_{1} - p_{2}\): Case 1
\(H_{0}\): \(p_{1} - p_{2} = 0\)
\[z = \frac{\hat p_{1} - \hat p_{2}}{\sqrt{\hat p(1-\hat p)(\frac{1}{n_{1}} + \frac{1}{n_{2}})}}\]
\[\hat p = \frac{x_{1} + x_{2}}{n_{1} + n_{2}}\]
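A sketch of the Case 1 statistic, which pools the two samples to estimate the common proportion under \(H_{0}\). The function name and the counts (60 of 200 versus 40 of 200) are hypothetical. The two-tail p-value is shown as one possible use of the statistic.

```python
from math import sqrt
from statistics import NormalDist


def pooled_proportion_z(x1, n1, x2, n2):
    """Return (z, two-tail p-value) for H0: p1 - p2 = 0."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)    # pooled proportion estimate
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p_hat1 - p_hat2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: p-hat1 = 0.3, p-hat2 = 0.2, pooled estimate = 0.25.
z_stat, p_value = pooled_proportion_z(x1=60, n1=200, x2=40, n2=200)
print(round(z_stat, 4), round(p_value, 4))
```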
13.3.2 Test Statistic for \(p_{1} - p_{2}\): Case 2
\(H_{0}\): \(p_{1} - p_{2} = D, D\ne0\)
\[z = \frac{(\hat p_{1} - \hat p_{2}) - D}{\sqrt{\frac{\hat p_{1}(1-\hat p_{1})}{n_{1}} + \frac{\hat p_{2}(1-\hat p_{2})}{n_{2}}}}\]
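For Case 2 the proportions are not pooled, because the null hypothesis does not assert that they are equal. A minimal sketch, with a hypothetical function name, hypothetical counts, and \(D = 0.05\):

```python
from math import sqrt


def unpooled_proportion_z(x1, n1, x2, n2, hypothesized_diff):
    """Return z for H0: p1 - p2 = D with D != 0 (separate variance estimates)."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    se = sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    return ((p_hat1 - p_hat2) - hypothesized_diff) / se


# Hypothetical counts: p-hat1 = 0.3, p-hat2 = 0.2, tested against D = 0.05.
z_case2 = unpooled_proportion_z(x1=60, n1=200, x2=40, n2=200, hypothesized_diff=0.05)
print(round(z_case2, 4))
```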