Requirements for constructing a confidence interval for a population proportion

Confidence Intervals for Population Proportions

Remember that the value of any statistic that estimates the value of a parameter is called a point estimate.

Figure 1.

Here's an example involving proportions: In a recent poll of 200 households, it was found that 152 households had at least one computer. Estimate the proportion of households in the population that have at least one computer.

Figure 2.

This is just a single estimate, so it's probably off from the actual value of the population proportion. Because of this, we're going to create a confidence interval to give a more realistic impression of what the actual population proportion value may be.

There are two requirements for constructing meaningful confidence intervals about a population proportion:

Figure 3.

Now, let's construct a 95% confidence interval to estimate the previous population proportion.

Figure 4.

We're trying to create a 95% confidence interval. That means we have an alpha of 0.05 (5%), which is split into two equal tails of 0.025 (2.5%) each. This 2.5% is the tail area we look up in the z-table to find the z-score we need to plug into the equation. We find a z of 1.96 to plug into the equation.
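
As a rough sketch of the table lookup described above, the same critical value can be computed with Python's standard library instead of a printed z-table. The variable names are illustrative and do not come from the lecture.

```python
from statistics import NormalDist

confidence_level = 0.95
alpha = 1 - confidence_level      # total area in both tails (0.05)
tail_area = alpha / 2             # area in each tail (0.025)

# The z-score that leaves tail_area in the upper tail is the
# (1 - tail_area) quantile of the standard normal distribution.
z_critical = NormalDist().inv_cdf(1 - tail_area)
print(round(z_critical, 2))       # prints 1.96
```

Changing `confidence_level` (for example to 0.90 or 0.99) gives the critical value for other confidence levels.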

Figure 5.

We are 95% confident that the proportion of households in the population with at least one computer is between 0.701 and 0.819.
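
The whole calculation for the household example can be sketched in a few lines of Python. This assumes the equation in the figures is the usual one-sample z-interval, p-hat plus or minus z times sqrt(p-hat(1 - p-hat)/n), which does reproduce the interval stated above. The function name `proportion_confidence_interval` is hypothetical, chosen only for this illustration.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence_level=0.95):
    """Return (p_hat, lower, upper) for a one-sample z-interval.

    Assumes the standard formula p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = successes / n                                   # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # critical value
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z * standard_error
    return p_hat, p_hat - margin_of_error, p_hat + margin_of_error

# Poll from the example: 152 of 200 households have a computer.
p_hat, lower, upper = proportion_confidence_interval(152, 200)
print(f"{p_hat:.3f} {lower:.3f} {upper:.3f}")   # prints 0.760 0.701 0.819
```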

  • In the first simulation, as I understand it, the population is 250 and the sample size is 200, without replacement (meaning we do not put the gumballs back into the machine). How can we have many samples? With a population of 250 and a sample of 200, I think we can only have one sample?

    • I suppose "without replacement" means that we do not put each single gumball back after observing it, but once we have retrieved all 200 gumballs we return them to the machine.

  • Hi guys, at 2:28 Sal mentioned that to get a normally distributed sampling distribution of the sample mean, we need at least 10 successes and 10 failures in each sample. However, in one of the previous exercises, the minimum sample size is said to be 30 if we want a normal distribution. These two seem to contradict each other. Any advice on this?

    • They actually aren't contradicting each other. The sample size needs to be at least 30, so n>=30, and there need to be at least 10 successes and 10 failures in the sample. So, remember that np>=10 and n(1-p)>=10, which means the proportion of successes times the sample size needs to be at least 10 and the proportion of failures times the sample size needs to be at least 10. Multiplying the proportion of successes/failures by the sample size basically gives you the number of successes/failures in a sample. Here's an example to see how they relate:

      Sample size: 50
      Proportion of successes: 0.4
      Proportion of failures: 0.6 (or 1-0.4)

      n(p)=?
      50(0.4)=20

      n(1-p)=?
      50(0.6)=30

      Now look, we can take the number of successes/ failures to find the proportion of successes/failures in the sample:

      20/50= 0.4
      0.4=p

      30/50=0.6
      0.6= 1-p

      So essentially, we first need to check that the sample size is at least 30. If that is met, then we check whether the numbers of successes and failures in the sample are each at least 10. If not, then the sampling distribution would probably not be normal.

  • I can't understand why, in the normal condition, we should expect at least 10 successes and 10 failures. If the precondition for a normal distribution of the sample proportion is np>=5 and n(1-p)>=5 in a sample, why does the number of successes and failures have to be at least 10 in samples? Does that mean we have to take at least 2 samples?

  • What's the normal condition for a non-Bernoulli distribution?

    • Mainly the sample size (n), which should be at least 30. According to the Central Limit Theorem, the sampling distribution of the sample mean is approximately normal as long as the sample size is large enough, whatever the shape of the population distribution.

  • How do you access the gumball simulation?

  • I want to find more information about normal condition. Is there anyone who knows the search word or key word?

  • The independence condition is unintuitive to me. Shouldn't the sample parameters approach the population parameters as the sample proportion approaches 100%? Wouldn't that mean that the only consequence of not meeting the independence condition is that our estimates of the population parameters become more accurate than expected? How is getting "too accurate" estimates ever a problem in real life?

    (Intuitively, if polling ten people produces more accurate results than polling one person ten times, then replacement when sampling can only ever decrease the accuracy of a poll.)

    • You are comparing samples of different size (1 and 10). Indeed, the bigger the sample size, the closer to the population mean the sample mean is expected to be.

      The problem lies elsewhere. Since we calculate our confidence intervals in the number of stddevs from the mean, it is important for the stddev of our sample to be an unbiased estimate of the stddev of the population.

      The stddev of the sample with replacement is such an estimate. But the stddev of the sample without replacement is not, it is actually smaller. So, when we claim with 95% confidence that the population mean is not farther than 2 stddevs away from the sample mean and calculate that distance using the stddev of the sample without replacement, we are falling short, the interval is smaller than it's supposed to be.

      Intuitively, the bigger the sample, the closer we are to the mean, but the less confident we are about how close :)

  • In the 10% rule, when Khan says n < 10% of the population, isn't it supposed to include 10% itself?

  • Thanks for the videos Sal Khan!

  • At 1:06: Why does the margin of error CHANGE? For example, if we want 95% confidence intervals, and we take samples of size n = 10, they would all be the same length for that study; margin of error = (critical value)*(stdev), say (2)(4.5) if we want to cover 2 stdevs from either side of p hat, where stdev = 4.5. Wouldn’t this value of margin of error (2*4.5=9) (the “stems” on either side of p hat) be the SAME for ALL confidence intervals for that study? Thanks!

What are the three conditions for constructing a confidence interval for a proportion?

There are three conditions we need to satisfy before we make a one-sample z-interval to estimate a population proportion. We need to satisfy the random, normal, and independence conditions for these confidence intervals to be valid.

What are the requirements for a confidence interval of a single population proportion?

For a confidence interval for a population proportion, we need to make sure that the following hold:
  • We have a simple random sample of size n from a large population.
  • Our individuals have been chosen independently of one another.
  • There are at least 15 successes and 15 failures in our sample.
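
Of the requirements listed above, only the success/failure count can be checked from the sample data alone; random sampling and independence depend on how the data were collected. The following is a minimal sketch of that count check. The function name and the `minimum_count` parameter are hypothetical, and the threshold is left adjustable because some sources use 10 rather than 15.

```python
def meets_success_failure_condition(successes, n, minimum_count=15):
    """Check that the sample has enough successes and enough failures.

    minimum_count is an assumption of this sketch: 15 follows the list
    above, while other sources use 10.
    """
    failures = n - successes
    return successes >= minimum_count and failures >= minimum_count

# Household poll from the earlier example: 152 successes, 48 failures.
print(meets_success_failure_condition(152, 200))   # prints True

# A sample with only 5 successes out of 200 fails the check.
print(meets_success_failure_condition(5, 200))     # prints False
```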

How to construct confidence interval for population proportion?

To calculate the confidence interval, we must find p′ and q′. Here p′ = 0.842 is the sample proportion; this is the point estimate of the population proportion, and q′ = 1 – p′ = 0.158. Since the requested confidence level is CL = 0.95, α = 1 – CL = 1 – 0.95 = 0.05, so α/2 = 0.025.

What values are necessary to calculate a confidence interval for a proportion?

You should always use Z scores (not t-scores) to compute the confidence interval for a proportion.
