Statistics for Management and Economics: Study Notes 3
9. Sampling Distributions
9.1 Sampling Distribution of the Mean
Central Limit Theorem: The sampling distribution of the mean of a random sample drawn from any population is approximately normal for a sufficiently large sample size. The larger the sample size, the more closely the sampling distribution of \(\bar{X}\) will resemble a normal distribution.
\[ \mu_{\bar{x}} = \mu \]
\[ \sigma^2_{\bar{x}} = \frac{\sigma^2}{n} \]
If \(X\) is normal, then \(\bar{X}\) is normal. If \(X\) is nonnormal, then \(\bar{X}\) is approximately normal for sufficiently large sample sizes. The definition of “sufficiently large” depends on the extent of nonnormality of \(X\).
Standardizing the sample mean:
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
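As a rough illustration of how the standardized sample mean is used, the sketch below computes an upper-tail probability for \(\bar{X}\). The function name and the numbers (μ = 100, σ = 15, n = 36) are hypothetical and chosen only so that the arithmetic is easy to follow; the sketch assumes that \(\bar{X}\) is normal or approximately normal.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_exceeds(x_bar, mu, sigma, n):
    """Return P(X-bar > x_bar), treating X-bar as Normal(mu, sigma/sqrt(n))."""
    standard_error = sigma / sqrt(n)       # sigma_xbar = sigma / sqrt(n)
    z = (x_bar - mu) / standard_error      # standardized sample mean
    return 1.0 - NormalDist().cdf(z)       # upper-tail area under N(0, 1)


# Hypothetical numbers: sigma_xbar = 15 / 6 = 2.5, so x_bar = 105 gives z = 2.
prob = prob_sample_mean_exceeds(x_bar=105, mu=100, sigma=15, n=36)
print(f"P(X-bar > 105) = {prob:.4f}")
```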
9.2 Sampling Distribution of a Sample Proportion
\(\hat{P}\) is approximately normally distributed provided that \(np\) and \(n(1-p)\) are greater than or equal to 5.
\[ E(\hat{P}) = p \]
\[ V(\hat{P}) = \sigma^2_{\hat{p}} = \frac{p(1-p)}{n} \]
Standardizing the sample proportion:
\[ Z = \frac{\hat{P} - p}{\sqrt{p(1-p)/n}} \]
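The same idea applies to a sample proportion. The following sketch, with a hypothetical function name and made-up inputs (p = 0.5, n = 100), first checks the rule of thumb that np and n(1 − p) are at least 5 and then standardizes \(\hat{P}\).

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_proportion_exceeds(p_hat, p, n):
    """Return P(P-hat > p_hat) under the normal approximation."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("need np >= 5 and n(1 - p) >= 5 for the approximation")
    standard_error = sqrt(p * (1 - p) / n)   # sqrt of V(P-hat)
    z = (p_hat - p) / standard_error         # standardized sample proportion
    return 1.0 - NormalDist().cdf(z)


# Hypothetical numbers: standard error = 0.05, so p_hat = 0.6 gives z = 2.
prob = prob_sample_proportion_exceeds(p_hat=0.6, p=0.5, n=100)
print(f"P(P-hat > 0.6) = {prob:.4f}")
```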
9.3 Sampling Distribution of the Difference between Two Means
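\[ E(\bar{X}_1 - \bar{X}_2) = \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 \]
\[ V(\bar{X}_1 - \bar{X}_2) = \sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]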
Standardizing the difference between two sample means:
\[ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
10. Introduction to Estimation
- An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter.
- An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger.
- If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively more efficient.
10.1 Estimating the Population Mean When the Population Standard Deviation is Known
\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
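A minimal sketch of this interval estimator follows. The function name is an assumption; the inputs (sample mean 178, σ = 65, n = 400) are borrowed from the monthly accounts example in section 11.3, and the 95% confidence level is chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (LCL, UCL) for mu when the population sigma is known."""
    alpha = 1.0 - confidence
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}
    half_width = z_crit * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


lcl, ucl = z_confidence_interval(x_bar=178, sigma=65, n=400)
print(f"95% interval for mu: ({lcl:.2f}, {ucl:.2f})")
```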
10.2 Determining the Sample Size to Estimate μ
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 \]
\[ B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
B stands for the bound on the error of estimation.
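The sample size formula can be sketched as below. Because n must be an integer, the result is rounded up. The function name and the inputs (σ = 65, B = 5, 95% confidence) are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, bound, confidence=0.95):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= bound."""
    alpha = 1.0 - confidence
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return ceil((z_crit * sigma / bound) ** 2)   # round up to a whole observation


# Hypothetical requirement: estimate mu to within 5 units when sigma = 65.
n_required = sample_size_for_mean(sigma=65, bound=5)
print(n_required)
```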
11. Introduction to Hypothesis Testing
11.1 Concepts of Hypothesis Testing
- The null hypothesis, denoted \(H_0\), usually refers to a general statement or default position that there is no relationship between two measured phenomena, or no association among groups.
- The alternative hypothesis (or maintained hypothesis or research hypothesis), denoted \(H_1\), is the hypothesis to be accepted if the null hypothesis is rejected.
- A Type I error occurs when we reject a true null hypothesis. Its probability is denoted \(\alpha\).
- A Type II error is defined as not rejecting a false null hypothesis. Its probability is denoted \(\beta\).
- The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed given that the null hypothesis is true.
- If we reject the null hypothesis, we conclude that there is enough statistical evidence to infer that the alternative hypothesis is true.
- If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true.
11.2 Testing the Population Mean When the Population Standard Deviation is Known
- A two-tail test is conducted whenever the alternative hypothesis specifies that the mean is not equal to the value stated in the null hypothesis.
- A one-tail test that focuses on the right tail of the sampling distribution is conducted whenever we want to know whether there is enough evidence to infer that the mean is greater than the quantity specified by the null hypothesis.
- A one-tail test that focuses on the left tail of the sampling distribution is conducted whenever we want to know whether there is enough evidence to infer that the mean is less than the quantity specified by the null hypothesis.
11.2.1 Standardized Test Statistic
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
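The rejection region for a two-tail test:
\[ z > z_{\alpha/2} \quad \text{or} \quad z < -z_{\alpha/2} \]
For a one-tail test the rejection region is \(z > z_{\alpha}\) (right tail) or \(z < -z_{\alpha}\) (left tail).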
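The test procedure can be sketched in a few lines. The function name and the `tail` argument are assumptions made for this illustration. Rejecting when the p-value is below α is equivalent to checking whether z falls in the rejection region. The inputs come from the monthly accounts example in section 11.3.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, p_value, reject) for a test of mu with sigma known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    elif tail == "right":
        p_value = 1.0 - std_normal.cdf(z)
    elif tail == "left":
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, p_value, p_value < alpha


# H0: mu <= 170 against H1: mu > 170, so a right-tail test.
z, p_value, reject = z_test(x_bar=178, mu0=170, sigma=65, n=400, tail="right")
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```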
11.2.2 Testing Hypotheses and Confidence Interval Estimators
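\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
We compute the interval estimate and determine whether the hypothesized value of the mean falls into the interval. If it does not, a two-tail test at significance level \(\alpha\) rejects the null hypothesis.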
11.3 Calculating the Probability of a Type II Error
Example: A random sample of 400 monthly accounts is drawn, for which the sample mean is $178. The accounts are approximately normally distributed with a standard deviation of $65. Can we conclude at the 5% significance level (α = 0.05) that the mean is greater than $170?
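\[ H_0: \mu \le 170 \]
\[ H_1: \mu > 170 \]
The critical value \(\bar{x}_L\) of the sample mean is the value whose standardized score equals \(z_{0.05} = 1.645\):
\[ \frac{\bar{x}_L - 170}{65/\sqrt{400}} = 1.645 \]
\[ \bar{x}_L = 175.34 \]
Therefore, the rejection region is:
\[ \bar{x} > 175.34 \]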
The sample mean was computed to be 178. Because the test statistic (sample mean) is in the rejection region (it is greater than 175.34), we reject the null hypothesis. Thus, there is sufficient evidence to infer that the mean monthly account is greater than $170.
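\[ \beta = P(\bar{X} < 175.34 \mid H_0 \text{ is false}) \]
To compute β we must specify a value of μ under the alternative. Suppose that the mean account is actually $180.
\[ \beta = P(\bar{X} < 175.34 \mid \mu = 180) \]
\[ \beta = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{175.34 - 180}{65/\sqrt{400}}\right) = P(Z < -1.43) = 0.0764 \]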
This plot illustrates the inverse relationship between the probabilities of Type I and Type II errors. Unfortunately, there is no simple formula to determine what the significance level should be.
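The calculation of β, and the inverse relationship between α and β, can be reproduced numerically. The sketch below uses the numbers from the example; the function name is an assumption. Software carries more decimal places than the z table, so it reports β ≈ 0.076 rather than the rounded table value 0.0764.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_right_tail(mu0, mu_true, sigma, n, alpha):
    """Return (critical x-bar, beta) for a right-tail z-test of H0: mu <= mu0."""
    se = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1.0 - alpha) * se  # reject if x-bar > x_crit
    beta = NormalDist(mu_true, se).cdf(x_crit)             # P(X-bar < x_crit | mu_true)
    return x_crit, beta


x_crit_05, beta_05 = type_ii_error_right_tail(170, 180, 65, 400, alpha=0.05)
x_crit_01, beta_01 = type_ii_error_right_tail(170, 180, 65, 400, alpha=0.01)
print(f"alpha = 0.05: critical value {x_crit_05:.2f}, beta = {beta_05:.4f}")
print(f"alpha = 0.01: critical value {x_crit_01:.2f}, beta = {beta_01:.4f}")
```

Lowering α from 0.05 to 0.01 moves the critical value to the right, which makes β larger.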
11.4 Larger Sample Size Equals More Information Equals Better Decisions
11.5 Power of a Test
The power of a test is the probability that it leads us to reject the null hypothesis when it is false. Thus, the power of a test is \(1 - \beta\).
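To see how sample size affects power, the sketch below evaluates 1 − β for the monthly accounts test (H0: μ ≤ 170, σ = 65, α = 0.05) at the alternative μ = 180. The function name and the sample sizes 100 and 1000 are assumptions added for comparison; only n = 400 appears in the example.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tail(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of the right-tail z-test of H0: mu <= mu0."""
    se = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1.0 - alpha) * se  # reject if x-bar > x_crit
    beta = NormalDist(mu_true, se).cdf(x_crit)             # P(do not reject | mu_true)
    return 1.0 - beta


powers = {n: power_right_tail(170, 180, 65, n) for n in (100, 400, 1000)}
for n, power in powers.items():
    print(f"n = {n:4d}: power = {power:.4f}")
```

With α held fixed, a larger sample shrinks the standard error, so β falls and power rises.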
12. Inference About a Population
12.1 Inference about a Population Mean When the Population Standard Deviation is Unknown
When the population standard deviation is unknown and the population is normal, the test statistic for testing hypotheses about μ is
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
which is Student t-distributed with ν = n − 1 degrees of freedom.
Confidence Interval Estimator of μ When σ Is Unknown
\[ \bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} \]
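A small sketch of the t statistic follows, using a made-up sample of five observations and a hypothetical function name. The Python standard library has no Student t quantile function, so the critical value \(t_{\alpha/2}\) needed for the rejection region or the interval would have to come from a t table or a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for a test of mu = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # s / sqrt(n), with s using n - 1
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1


# Hypothetical data: mean 13, sample variance 2.5.
t_stat, df = one_sample_t([12.0, 15.0, 11.0, 14.0, 13.0], mu0=12.0)
print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```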
12.2 Inference about a Population Variance
The test statistic used to test hypotheses about \(\sigma^2\) is
\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} \]
which is chi-squared distributed with ν = n − 1 degrees of freedom when the population random variable is normally distributed with variance equal to \(\sigma^2\).
Confidence Interval Estimator of \(\sigma^2\)
\[ \text{Lower confidence limit (LCL)} = \frac{(n-1)s^2}{\chi^2_{\alpha/2}} \]
\[ \text{Upper confidence limit (UCL)} = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \]
12.3 Inference about a Population Proportion
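\[ \hat{p} = \frac{x}{n} \]
Test Statistic for p
\[ z = \frac{\hat{P} - p}{\sqrt{p(1-p)/n}} \]
which is approximately normal when np and n(1 − p) are greater than 5.
Confidence Interval Estimator of p
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \]
Sample Size to Estimate a Proportion
\[ n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{B}\right)^2 \]
\[ B = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]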
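The interval estimator and the sample size formula for a proportion can be sketched as follows. The function names and the data (120 successes in 400 trials, a bound of 0.03) are hypothetical. Using \(\hat{p} = 0.5\) as the planning value is a common conservative choice when no prior estimate exists, since it maximizes \(\hat{p}(1-\hat{p})\).

```python
from math import ceil, sqrt
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence=0.95):
    """Return (LCL, UCL) for p from x successes in n trials."""
    p_hat = x / n
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def sample_size_for_proportion(p_guess, bound, confidence=0.95):
    """Smallest n that keeps the error bound at or below `bound`."""
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil((z_crit * sqrt(p_guess * (1 - p_guess)) / bound) ** 2)


lcl, ucl = proportion_confidence_interval(x=120, n=400)
n_required = sample_size_for_proportion(p_guess=0.5, bound=0.03)
print(f"95% interval for p: ({lcl:.4f}, {ucl:.4f}); required n = {n_required}")
```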
13. Inference about Comparing Two Populations
13.1 Inference about the Difference between two Means: Independent Samples
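Sampling Distribution of \(\bar{x}_1 - \bar{x}_2\):
\(\bar{x}_1 - \bar{x}_2\) is normally distributed if the populations are normal and approximately normal if the populations are nonnormal and the sample sizes are large.
\[ E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2 \]
\[ V(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]
\[ Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]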
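13.1.1 Test Statistic for \(\mu_1 - \mu_2\) when \(\sigma_1^2 = \sigma_2^2\)
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]
where \(s_p^2\) is called the pooled variance estimator:
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
The statistic is Student t-distributed with \(\nu = n_1 + n_2 - 2\) degrees of freedom.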
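13.1.2 Confidence Interval Estimator of \(\mu_1 - \mu_2\) when \(\sigma_1^2 = \sigma_2^2\)
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]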
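The equal-variances statistic can be sketched as below, using two small made-up samples and a hypothetical function name. The pooled variance weights each sample variance by its degrees of freedom, and the resulting t statistic has \(n_1 + n_2 - 2\) degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2, diff0=0.0):
    """Return (t, pooled variance, df) for H0: mu1 - mu2 = diff0, equal variances."""
    n1, n2 = len(sample1), len(sample2)
    s2p = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    standard_error = sqrt(s2p * (1.0 / n1 + 1.0 / n2))
    t = (mean(sample1) - mean(sample2) - diff0) / standard_error
    return t, s2p, n1 + n2 - 2


# Hypothetical data: means 12 and 10, sample variances 4 and 20/3.
t_stat, s2p, df = pooled_t([10.0, 12.0, 14.0], [7.0, 9.0, 11.0, 13.0])
print(f"t = {t_stat:.4f}, pooled variance = {s2p:.2f}, df = {df}")
```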
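13.1.3 Test Statistic for \(\mu_1 - \mu_2\) when \(\sigma_1^2 \ne \sigma_2^2\)
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
\[ \nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \]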
13.1.4 Confidence Interval Estimator of \(\mu_1 - \mu_2\) when \(\sigma_1^2 \ne \sigma_2^2\)
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
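For the unequal-variances case, the sketch below computes the t statistic together with the approximate degrees of freedom ν, which is generally not an integer. The function name and the two samples are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def unequal_variances_t(sample1, sample2, diff0=0.0):
    """Return (t, nu) for H0: mu1 - mu2 = diff0 without assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1    # s1^2 / n1
    v2 = variance(sample2) / n2    # s2^2 / n2
    t = (mean(sample1) - mean(sample2) - diff0) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, nu


# Hypothetical data: means 12 and 10, sample variances 4 and 20/3.
t_stat, nu = unequal_variances_t([10.0, 12.0, 14.0], [7.0, 9.0, 11.0, 13.0])
print(f"t = {t_stat:.4f}, nu = {nu:.2f}")
```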
13.1.5 Testing the Population Variances
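\[ H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \]
\[ H_1: \frac{\sigma_1^2}{\sigma_2^2} \ne 1 \]
\[ F = \frac{s_1^2}{s_2^2} \]
\(\nu_1 = n_1 - 1\) and \(\nu_2 = n_2 - 1\). This is a two-tail test so that the rejection region is \(F > F_{\alpha/2,\nu_1,\nu_2}\) or \(F < F_{1-\alpha/2,\nu_1,\nu_2}\).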
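Confidence Interval Estimator of \(\sigma_1^2/\sigma_2^2\)
\[ \text{LCL} = \left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} \qquad \text{UCL} = \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1} \]
Note that the degrees of freedom are reversed in the UCL, because \(1/F_{1-\alpha/2,\nu_1,\nu_2} = F_{\alpha/2,\nu_2,\nu_1}\).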
13.2 Inference about the Difference between two Means: Matched Pairs Experiment
\(\mu_D\) is the mean of the population of differences.
Test Statistic for \(\mu_D\)
\[ t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}} \]
which is Student t distributed with \(\nu = n_D - 1\) degrees of freedom, provided that the differences are normally distributed.
Confidence Interval Estimator of \(\mu_D\)
\[ \bar{x}_D \pm t_{\alpha/2}\frac{s_D}{\sqrt{n_D}} \]
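In a matched pairs experiment the analysis is carried out on the paired differences, which reduces the problem to a one-sample t statistic. A minimal sketch with made-up paired observations and a hypothetical function name:

```python
from math import sqrt
from statistics import mean, stdev


def matched_pairs_t(first, second, mu_d0=0.0):
    """Return (t, df) for H0: mu_D = mu_d0, where D = first - second per pair."""
    if len(first) != len(second):
        raise ValueError("matched pairs need samples of equal length")
    diffs = [a - b for a, b in zip(first, second)]
    n_d = len(diffs)
    t = (mean(diffs) - mu_d0) / (stdev(diffs) / sqrt(n_d))
    return t, n_d - 1


# Hypothetical pairs; the differences are 2, 1, 0 and 3.
t_stat, df = matched_pairs_t([10.0, 12.0, 9.0, 14.0], [8.0, 11.0, 9.0, 11.0])
print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```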
13.3 Inference about the Difference between two Population Proportions
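The statistic \(\hat{p}_1 - \hat{p}_2\) is approximately normally distributed provided that the sample sizes are large enough so that \(n_1p_1\), \(n_1(1-p_1)\), \(n_2p_2\), and \(n_2(1-p_2)\) are all greater than or equal to 5.
\[ E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2 \]
\[ V(\hat{p}_1 - \hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} \]
\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}} \]
\[ \hat{p}_1 = \frac{x_1}{n_1} \qquad \hat{p}_2 = \frac{x_2}{n_2} \]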
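13.3.1 Test Statistic for \(p_1 - p_2\): Case 1
\[ H_0: p_1 - p_2 = 0 \]
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]
\[ \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} \]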
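13.3.2 Test Statistic for \(p_1 - p_2\): Case 2
\[ H_0: p_1 - p_2 = D, \quad D \ne 0 \]
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} \]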
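The two cases differ only in how the standard error is estimated: Case 1 pools the samples because the null hypothesis says the proportions are equal, while Case 2 uses each sample proportion separately. The sketch below, with a hypothetical function name and made-up counts, covers both.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2, diff0=0.0):
    """Return z for H0: p1 - p2 = diff0 (Case 1 if diff0 == 0, otherwise Case 2)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    if diff0 == 0:
        p_pooled = (x1 + x2) / (n1 + n2)   # Case 1: pooled proportion estimate
        se = sqrt(p_pooled * (1 - p_pooled) * (1.0 / n1 + 1.0 / n2))
    else:
        # Case 2: separate variance estimates for each sample
        se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return (p1_hat - p2_hat - diff0) / se


# Hypothetical counts: 60 of 200 versus 40 of 200.
z_case1 = two_proportion_z(60, 200, 40, 200)
z_case2 = two_proportion_z(60, 200, 40, 200, diff0=0.05)
print(f"Case 1: z = {z_case1:.4f}; Case 2 (D = 0.05): z = {z_case2:.4f}")
```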