#### All Modules Introduction to Statistics Coursera Quiz Answers

### Introduction and Descriptive Statistics for Exploring Data

Q1. What is an appropriate way to visualize a list of the eye colors of 120 people? Select all that apply.

- **pie chart**
- box plot
- **dot plot**

Q2. According to the histogram of travel times to work from the US 2000 census (Page 6 of “Journey to Work: 2000”), roughly what percentage of commuters travel more than 45 minutes?

**75**

Q3. According to the histogram of travel times to work from the US 2000 census (Page 6 of “Journey to Work: 2000”), approximately what is the median travel time, in minutes (i.e. 50% of commuters have at most that travel time, 50% have at least that travel time)?

**50**

Q4. You want to investigate whether households in California tend to have a higher income than households in Massachusetts. Which summary measure would you use to compare the two states?

- 3rd quartile of household income
- **median household income**
- mean household income

Q5. Suppose all household incomes in California increase by 5%. How does that change the mean household income?

- **the mean household income goes up by 5%**
- the mean household income doesn’t change
- cannot be determined from the information given

Q6. Suppose all household incomes in California increase by 5%. How does that change the median household income?

- cannot be determined from the information given
- **the median household income goes up by 5%**
- the median household income doesn’t change

Q7. Suppose all household incomes in California increase by 5%. How does that change the standard deviation of the household incomes?

- **the standard deviation of the household incomes goes up by 5%**
- the standard deviation of the household incomes doesn’t change
- cannot be determined from the information given

Q8. Suppose all household incomes in California increase by 5%. How does that change the interquartile range of household incomes?

- the interquartile range of the household incomes doesn’t change
- cannot be determined from the information given
- **the interquartile range of the household incomes goes up by 5%**

Q9. Suppose all household incomes in California increase by $5,000. How does that change the mean household income?

- cannot be determined from the information given
- the mean household income doesn’t change
- **the mean household income goes up by $5,000**

Q10. Suppose all household incomes in California increase by $5,000. How does that change the median household income?

- **the median household income goes up by $5,000**
- cannot be determined from the information given
- the median household income doesn’t change

Q11. Suppose all household incomes in California increase by $5,000. How does that change the standard deviation of the household incomes?

- **the standard deviation of the household incomes doesn’t change**
- cannot be determined from the information given
- the standard deviation of the household incomes goes up by $5,000

Q12. Suppose all household incomes in California increase by $5,000. How does that change the interquartile range of household incomes?

- the interquartile range of the household incomes goes up by $5,000
- **the interquartile range of the household incomes doesn’t change**
- cannot be determined from the information given
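The pattern behind Q5 to Q12 can be checked numerically. The sketch below uses a made-up list of incomes (an assumption, not data from the course) and compares the summary measures before and after a 5% raise and a flat $5,000 raise. The helper name `summary` is hypothetical.

```python
import statistics

def summary(values):
    """Return mean, median, population SD, and interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.pstdev(values),
        "iqr": q3 - q1,
    }

# Hypothetical household incomes in dollars (made up for illustration).
incomes = [28_000, 41_000, 55_000, 62_000, 75_000, 98_000, 150_000, 310_000]

base = summary(incomes)
scaled = summary([x * 1.05 for x in incomes])    # every income goes up by 5%
shifted = summary([x + 5_000 for x in incomes])  # every income goes up by $5,000

for name in base:
    print(f"{name:>6}: ratio after x1.05 = {scaled[name] / base[name]:.3f}, "
          f"change after +5000 = {shifted[name] - base[name]:,.0f}")
```

Multiplying by 1.05 scales all four measures by 1.05, while adding $5,000 moves the measures of center (mean, median) and leaves the measures of spread (SD, IQR) unchanged.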

Q13. The median sales price for houses in a certain county during the last year was $342,000. What can we say about the percentage of sales represented by the houses that sold for more than $342,000?

- the houses that sold for more than $342,000 represent more than 50% of all sales
- **the houses that sold for more than $342,000 represent exactly 50% of all sales**
- the houses that sold for more than $342,000 represent less than 50% of all sales

### Producing Data and Sampling

Q1. A news company located next to Times Square in New York wants to get a sense of how people feel about a proposed law on immigration. A reporter steps out of the building and randomly selects 100 people walking there and asks them about the proposed law. What can we say about this sampling plan? Single correct answer.

- it leads to voluntary response bias
- it leads to non-response bias
- **it leads to selection bias**
- it represents a simple random sampling

Q2. A car company wants to get a sense of how satisfied the owners of its new car model are with the quality of that car. It randomly selects 250 numbers from all the vehicle registration numbers that have been issued for this model and contacts the owners of those vehicles. What can we say about this sampling plan?

- **it represents a simple random sampling**
- it leads to selection bias
- it leads to non-response bias
- it leads to voluntary response bias

Q3. An airline wants to do a customer survey in order to improve its service. For one month, it sends an email to a random sample of customers who flew with the airline on the previous day (no customer will be contacted more than once). The email states that the airline would like the customer to fill out a 10-minute survey in order to help the airline improve its service. What can we say about this sampling plan? Single correct answer.

- it represents a simple random sampling
- it leads to selection bias
- **it leads to non-response bias**
- it leads to voluntary response bias

Q4. As in the previous question, an airline wants to do a customer survey in order to improve its service. For one month, it sends an email to a random sample of customers who flew with the airline on the previous day (no customer will be contacted more than once). Again, the email states that the airline would like the customer to fill out a 10-minute survey in order to help the airline improve its service, but this time it states in addition that every respondent will receive a gift card worth $100. What can we say about this sampling plan?

- it represents a simple random sampling
- it leads to selection bias
- **it leads to non-response bias**
- it leads to voluntary response bias

Q5. Some years ago, there were many news reports about the “Paleo diet”. It was claimed that the Paleo Diet would result in weight loss as well as prevention and control of many “diseases of civilization”.

A news channel decides to check this out. It recruits people who have followed the diet for the past year and selects 100 at random. It also recruits people who have not followed the diet and selects 100 at random. It finds that there is more weight loss in the diet group, and that this result is ‘statistically significant’.

Which of the following statements are true?

- This is a randomized controlled experiment.
- **It is possible that the difference in weight loss is due to the placebo effect.**
- If a future carefully run randomized controlled experiment reveals that the paleo diet does not result in weight loss, then we can conclude that the weight loss observed above must be due to the placebo effect.

Q6. A number of competitive female cross country runners suffer from bone loss due to low estrogen levels. Some medical experts conjecture that this can be prevented by taking oral contraceptives, as those contain estrogen. This conjecture is to be tested with an experiment. The goal of the experiment is to find out whether taking an oral contraceptive prevents bone loss in female cross country runners. Which of the following subjects should be recruited in order to do a good experiment? (Pick one of the three.)

- A group of women who are competitive runners and another group of women who are not competitive athletes.
- A group of female runners who are taking oral contraceptives and another group of female runners who are not taking oral contraceptives.
- **A group of female runners who are not taking oral contraceptives, but who are willing to take them if asked by the organizers of the experiment to do so.**

### Probability

Q1. A fair coin is tossed 5 times. What is the probability of getting at most 4 tails?

**$1 - (1/2)^5 = 0.96875$**

Q2. When you roll a pair of dice, a double is when both dice show the same number, e.g. both show ‘1’ or both show ‘4’. What is the chance of a double when you roll a pair of dice?

- 6/15
- **1/6**
- 1/12
- 1/36

Q3. The game Monopoly is played by rolling a pair of dice. If you land in jail, then to get out, you must roll a double on any one of your next three turns, or else pay a fine. What are the chances that you get out of jail without paying a fine?

**$1 - (5/6)^3 = 0.421296$**
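The answers to Q1 to Q3 can be double-checked by brute-force enumeration. This is only a sketch using exact fractions from the Python standard library; the variable names are chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Q1: P(at most 4 tails in 5 tosses), enumerating all 2^5 equally likely outcomes.
tosses = list(product("HT", repeat=5))
p_at_most_4_tails = Fraction(sum(t.count("T") <= 4 for t in tosses), len(tosses))

# Q2: P(double), enumerating all 36 equally likely ordered outcomes of two dice.
rolls = list(product(range(1, 7), repeat=2))
p_double = Fraction(sum(a == b for a, b in rolls), len(rolls))

# Q3: P(at least one double in three independent turns) = 1 - P(no double)^3.
p_out_of_jail = 1 - (1 - p_double) ** 3

print(p_at_most_4_tails)                 # 31/32
print(p_double)                          # 1/6
print(p_out_of_jail)                     # 91/216
print(round(float(p_out_of_jail), 6))    # 0.421296
```

In Q1 and Q3 the complement rule does the work: it is easier to compute the chance of the one excluded outcome and subtract it from 1.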

Q4. 3% of all applicants to the Stanford Medical School are admitted. 70% of all applicants have a GPA of 3.6 or above. Of those who are admitted, 95% have a GPA of 3.6 or above.

What are the chances of being admitted for an applicant whose GPA is 3.6 or above?

**(0.95) (0.03) / (0.7)**

Q5. A multiple-choice exam has 10 questions. Each question has 3 possible answers, of which one is correct. A student knows the correct answers to 4 questions and guesses the answers to the other 6 questions.

It turns out that the student answered the first question correctly. What are the chances that the student was merely guessing?

Q6. There are three boxes on the table: The first box contains 2 quarters, the second box contains 2 nickels, and the last box contains 1 quarter and 1 nickel. You choose a box at random, then you pick a coin at random from the chosen box.

If the coin you picked is a quarter, what’s the chance that the other coin in the box is also a quarter?
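Q4 to Q6 are all applications of Bayes' rule. The sketch below works them out with exact fractions. For Q5 it assumes that the first question is one of the six guessed questions with probability 6/10, which is an interpretation of the question rather than something stated in it. The helper `posterior` is a hypothetical name.

```python
from fractions import Fraction

def posterior(prior, likelihood, evidence):
    """Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Q4: P(admitted | GPA >= 3.6) = P(GPA >= 3.6 | admitted) * P(admitted) / P(GPA >= 3.6)
p_admit_given_gpa = posterior(Fraction(3, 100), Fraction(95, 100), Fraction(70, 100))

# Q5: P(guessing | answer correct). Assumption: P(guessing) = 6/10 for the first question.
p_guess = Fraction(6, 10)
p_correct = Fraction(4, 10) * 1 + p_guess * Fraction(1, 3)  # law of total probability
p_guess_given_correct = posterior(p_guess, Fraction(1, 3), p_correct)

# Q6: P(box with two quarters | picked a quarter)
p_quarter = Fraction(1, 3) * 1 + Fraction(1, 3) * 0 + Fraction(1, 3) * Fraction(1, 2)
p_two_quarters_given_quarter = posterior(Fraction(1, 3), Fraction(1), p_quarter)

print(p_admit_given_gpa)             # 57/1400, roughly 4%
print(p_guess_given_correct)         # 1/3
print(p_two_quarters_given_quarter)  # 2/3
```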

### The Normal Approximation for Data and the Binomial Distribution

Q1. Scores on a certain test follow the normal curve with an average of 1350 and a standard deviation of 120.

What percentage of the test-takers score below 1230? (Use the empirical rule.)

- **16%**
- 34%
- 68%
- 18%

Q2. As in the previous question, scores on a certain test follow the normal curve with an average of 1350 and a standard deviation of 120.

In order to qualify for a certain job, a candidate needs to score in the top 2.5%. What score does she need?

- 1710
- 1470
- 1650
- **1590**

Q3. Recall that the main object in a boxplot is a box that is bounded by the first and the third quartiles. So the length of the box is the difference between the third and the first quartile, which is called the interquartile range. This is a measure of the spread of the data; it is sometimes used as an alternative to the standard deviation.

If the data follow the normal curve, then the interquartile range equals how many standard deviations? (You may use the fact that the z-value of the third quartile is 0.7.)

- 0.7
- 1
- **1.4**
- 2

Q4. A multiple-choice exam has 5 questions. Each question has 4 possible answers, of which one is correct. If a student guesses the answers to all five questions, what are the chances that he gets 2 correct?

Q5. A fair coin is tossed 6 times. What are the chances of getting 2 tails in each of the first 3 and the last 3 tosses?

Q6. A fair coin is tossed 400 times. Approximately what are the chances to get more than 210 tails? (Use the empirical rule and the normal approximation to the binomial distribution.)

- 32%
- **16%**
- 5%
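The questions in this section can be checked with `statistics.NormalDist` and the binomial formula. This is a sketch, not the course's own solution: the quiz uses the empirical rule (68%, 95%, 99.7%), so the exact normal values differ slightly from the rounded quiz answers. The helper `binom_pmf` is a hypothetical name.

```python
from math import comb, sqrt
from statistics import NormalDist

# Q1/Q2: test scores with average 1350 and SD 120.
scores = NormalDist(mu=1350, sigma=120)
share_below_1230 = scores.cdf(1230)      # about 0.159; the empirical rule says 16%
top_cutoff = scores.inv_cdf(0.975)       # about 1585; the empirical rule says 1590

# Q3: IQR of a normal curve in SD units (the quiz rounds the z-value 0.674 to 0.7).
iqr_in_sds = 2 * NormalDist().inv_cdf(0.75)

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Q4: exactly 2 correct out of 5 guesses with a 1/4 chance each.
p_two_correct = binom_pmf(2, 5, 0.25)    # 0.263671875

# Q5: 2 tails in the first 3 tosses AND 2 tails in the last 3 (independent halves).
p_two_and_two = binom_pmf(2, 3, 0.5) ** 2   # 0.140625

# Q6: more than 210 tails in 400 tosses, normal approximation to the binomial.
n, p = 400, 0.5
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))  # mean 200, SD 10
p_more_than_210 = 1 - approx.cdf(210)    # about 0.159, i.e. roughly 16%

print(round(share_below_1230, 3), round(top_cutoff), round(iqr_in_sds, 2))
print(p_two_correct, p_two_and_two, round(p_more_than_210, 3))
```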

### Sampling Distributions and the Central Limit Theorem

Q1. A town has 10,000 registered voters, of whom 6,000 are voting for the Democratic party. A survey organization is taking a sample of 100 registered voters (assume sampling with replacement). The percentage of Democratic voters in the sample will be around _____, give or take ____. (You may use the fact that the standard deviation of 6,000 1s and 4,000 0s is about 0.5)

- 60%, give or take 5%
- 40%, give or take 5%
- 60%, give or take 0.5%
- 40%, give or take 0.5%

Q2. You solicit 100 pledges for a charitable organization. Each pledge is equally likely to be $10, $50, or $100. You may use the fact that the standard deviation of the three amounts $10, $50 and $100 is $37.

What is the expected value of the sum of the 100 pledges?

- **$5333**
- $533
- $3700
- $370

Q3. You solicit 100 pledges for a charitable organization. Each pledge is equally likely to be $10, $50, or $100. You may use the fact that the standard deviation of the three amounts $10, $50 and $100 is $37.

What are the chances that the 100 pledges total more than $5,700?

- **16%**
- 32%
- 5%
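The "around ..., give or take ..." questions in this section (Q1 to Q3) follow from two rules: the standard error of a sample average or percentage is SD/√n, and the standard error of a sum is √n × SD. A minimal sketch under those rules, with variable names chosen for illustration:

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

n = 100

# Q1: percentage of Democratic voters in a sample of 100.
box_mean = 6000 / 10000            # 0.6, the population proportion
box_sd = 0.5                       # rounded SD given in the question
expected_percent = box_mean * 100                # 60
se_percent = box_sd / sqrt(n) * 100              # 5

# Q2: expected value of the sum of 100 pledges.
pledges = [10, 50, 100]
expected_sum = n * mean(pledges)                 # about 5333

# Q3: chance that the sum exceeds $5,700, via the normal approximation.
se_sum = sqrt(n) * pstdev(pledges)               # about 368 (the quiz rounds the SD to 37)
p_more_than_5700 = 1 - NormalDist(expected_sum, se_sum).cdf(5700)

print(expected_percent, se_percent)
print(round(expected_sum), round(se_sum), round(p_more_than_5700, 2))
```

$5,700 is about one standard error above the expected sum, so the empirical rule gives (100% - 68%)/2 = 16%.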

Q4. There are two candidates running for governor in CA and they are said to have roughly equal support from the voters. To get a better idea who is ahead, a company polls 400 of the 20 million registered voters in California. Likewise, there are two candidates running for mayor in Palo Alto who are said to have roughly equal support, and the company polls 400 out of the 20,000 registered voters in Palo Alto. Will the first poll be more accurate, equally accurate, or less accurate than the second poll?

- more accurate
- **equally accurate**
- less accurate

Q5. The average taxable income reported on tax returns for the year 2016 is $45,000, and the standard deviation of the taxable income is $23,000.

Which of the following two statements are true? Both?

- The percentage of taxable incomes that fall below $30,000 can be computed from the above information using a normal approximation.
- **The chances that the sum of 100 randomly selected taxable incomes exceeds $4 million can be computed from the above information using the normal approximation.**

Q6. Questions (a)-(d) below relate to the following situation: Someone tosses a fair coin 100 times.

Question (a): How many tails can she expect to get?

**50**

Q7. Question (b): What is the “give or take” number for the result from Question (a)?

**5**

Q8. Question (c): What are the chances that she gets between 40 and 60 tails?

- 16%
- 68%
- **95%**
- 99.7%

Q9. A large group of people gets together and everyone tosses a coin 100 times.

Question (d): About what percentage of people will get between 40 and 60 tails?

- 16%
- 68%
- **95%**
- 99.7%

### Regression

Q1. Some people believe that musical activity (e.g. playing an instrument) enhances mathematical ability. 100 high school students were selected at random. For each student, musical activity was recorded in hours per week, and mathematical ability was assessed by a test. The correlation coefficient was found to be 0.85.

Does the large correlation coefficient prove that musical activity enhances mathematical ability?

- yes
- **no**

Q2. What would your answer to the previous question be if you learned that all students in the study came from the same grade?

- yes
- **no**

Q3. For a group of commuters commuting to work on a given day, the correlation coefficient between a) time spent waiting at traffic signals, and b) total commuting time, was found to be 0.4. Which of the following statements about the correlation coefficient are true?

- If a commuter’s total commuting time increases by 10 minutes, then he will spend an additional 4 minutes waiting at traffic signals, on average.
- The average commuter spent 40% of the commuting time waiting at traffic signals.
- **The more time a commuter spends commuting to work, the more time he spends waiting at traffic signals, on average.**
- **The more time a commuter spends waiting at traffic signals, the longer the total commuting time, on average.**

Q4. A study followed 1,000 children over time. The scatter plot of heights at age 1 vs. heights at age 2 looks football-shaped with a correlation coefficient r=0.8. Alice’s height at age 1 is at the 80th percentile.

Would you predict her height at age 2 to be below, at, or above the 80th percentile?

- **below**
- at
- above

Q5. In the previous question we learned that in a study of children’s height, the correlation coefficient between height at age 1 vs. height at age 2 is r=0.8.

Predict the z-score of Alice’s height at age 2. (You may use the fact that the z-score of the 80th percentile is z=0.85.)

- **(0.8)(0.85) = 0.68**
- 0.85/0.8 = 1.0625
- not enough information

Q6. Questions (a)-(d) below relate to the following situation: In a biology class, both the midterm scores and the final exam scores have an average of 50 and a standard deviation of 10. The scatterplot looks football-shaped and the correlation coefficient is 0.6.

Claudia would like to know what score her friend Emily got on the final.

Question (a): If you have no information on how Emily did on the midterm, what is your prediction for her score on the final?

- 40
- 44
- **50**
- 56

Q7. Question (b): What is the “give or take” number for your prediction from Question (a)?

**10**

Q8. Now you learn that Emily got exactly the mean score of 50 on the midterm.

Question (c): Given this information, what is your prediction for Emily’s score on the final?

- 40
- 44
- **50**
- 56

Q9. Question (d): What is the “give or take” number for your prediction from Question (c)?

**$10\sqrt{1 - (0.6)^2} = 8$**
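The regression answers in Q4 to Q9 rest on two formulas for football-shaped scatterplots: the predicted z-score of y is r times the z-score of x, and the "give or take" number of the prediction is √(1 − r²) × SD of y. A small sketch, with hypothetical helper names:

```python
from math import sqrt
from statistics import NormalDist

def predict_z(r, z_x):
    """Regression prediction in standard units: z_y = r * z_x."""
    return r * z_x

def rms_error(r, sd_y):
    """Typical size of the prediction error around the regression line."""
    return sqrt(1 - r**2) * sd_y

# Q4/Q5: Alice is at the 80th percentile (z = 0.85) at age 1, and r = 0.8.
z_age2 = predict_z(0.8, 0.85)                      # 0.68
percentile_age2 = NormalDist().cdf(z_age2) * 100   # about 75, so below the 80th

# Q8/Q9: Emily scored the mean (50) on the midterm; both exams have mean 50, SD 10, r = 0.6.
z_midterm = (50 - 50) / 10
predicted_final = 50 + predict_z(0.6, z_midterm) * 10   # 50
give_or_take = rms_error(0.6, 10)                       # 8, up to floating-point rounding

print(round(z_age2, 2), round(percentile_age2))
print(predicted_final, round(give_or_take, 6))
```

Without any midterm information (Q6/Q7) the best prediction is the average, 50, and the give-or-take number is the full SD, 10.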

Q10. A tutoring center advertises its services by stating that students who sign up improve their GPA on tests by 0.5 points on average.

Is this indeed evidence that the tutoring helps or could this be due to the regression effect?

- The improvement proves that the tutoring helps.
- **The improvement could be due to the regression effect.**

Q11. True or false: If an observation with large leverage has a small residual, then it is not influential.

- True
- **False**

### Confidence Intervals

Q1. A random sample of 500 sales prices of recently purchased homes in a county is taken. From that sample, a 90% confidence interval for the average sales price of all homes in the county is computed to be $215,000 +/- $35,000.

Is the following statement true or false?

“About 90% of all home sales in the county have a sales price in the range $215,000 +/- $35,000.”

- true
- **false**

Q2. A random sample of 500 sales prices of recently purchased homes in a county is taken. From that sample, a 90% confidence interval for the average sales price of all homes in the county is computed to be $215,000 +/- $35,000.

Is the following statement true or false?

“There is a 90% chance that the average sales price of all homes in the county is in the range $215,000 +/- $35,000.”

- true
- **false**

Q3. Questions (a) and (b) below relate to the following: Based on a sample of 500 salaries in a large city we want to find a confidence interval for the average salary in that city.

Question (a): Is it possible to do this using the formula “average +/- z SE”? (Keep in mind that the histogram of salaries is not normal but quite skewed.)

- **yes**
- no

Q4. The margin of error for the confidence interval from Question (a), which was based on 500 salaries, turns out to be $5,400. How many salaries do we need to sample in order to shrink the margin of error to about $2,000?
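No answer is marked for this question in the source. One way to reason about it: the margin of error is proportional to 1/√n, so shrinking it by a factor f requires f² times as many observations. The sketch below applies that rule. The function name is hypothetical, and the result is an approximation that assumes the SD estimate stays the same.

```python
def required_sample_size(n_current, moe_current, moe_target):
    """Scale the sample size by the squared ratio of the margins of error."""
    return round(n_current * (moe_current / moe_target) ** 2)

# 500 salaries gave a margin of error of $5,400; the target is $2,000.
print(required_sample_size(500, 5400, 2000))  # 3645
```

Here the factor is 5400/2000 = 2.7, so about 2.7² ≈ 7.3 times as many salaries are needed.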

Q6. You are interested in what the current starting salary for jobs in data science is. You solicit feedback on an online forum about data science and you get 230 replies with salary numbers. Can you use the formula “average +/- z SE” to find a confidence interval for the average starting salary?

- yes
- **no**

### Tests of Significance

Q1. Which of the following statements are true? (Select all that apply.)

- **The p-value depends on the data.**
- If the *p*-value is smaller than 5%, then there is less than a 5% chance that the null hypothesis is true.
- **If the null hypothesis is true, then there is less than a 5% chance to get a p-value that is smaller than 5%.**
- **If a data scientist does many tests, then even if all the null hypotheses are true, a certain proportion will be rejected in error.**

Q2. Read the first five paragraphs of the article “Online daters do better in the marriage stakes” by Regina Nuzzo in Nature News, 2013. (You can find it on the internet.) The main claim of the article is that there is a statistically significant difference in marital outcomes between couples that meet online and couples that meet in other ways. Is this finding of practical relevance?

- yes
- **no**

Q3. A fair coin is tossed 100 times.

Which of the following statements are true? (Select all that apply.)

- **The standard error for the percentage of heads among the 100 tosses is 5%.**
- **The standard error for the percentage of tails among the 100 tosses is 5%.**
- The standard error for the quantity “percentage of heads – percentage of tails” is $\sqrt{0.05^2 + 0.05^2} = 7\%$.

Q4. Is there a relationship between age and insomnia? A random sample of 184 people ages 18-29 was taken, and it was found that 26.1% suffer from insomnia and 73.9% do not. A separate random sample of 811 people ages 30 and over was taken, and it was found that 39.2% suffer from insomnia and 60.8% do not.

Which of the following four test statistics are appropriate for testing whether the prevalence of insomnia is different between the two age groups? (Select all that are.)
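The answer options for this question did not survive in the source. As an illustration only, the sketch below computes one standard statistic for comparing two independent proportions: the two-sample z statistic with a pooled standard error. Whether this matches one of the original options is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(p1, n1, p2, n2):
    """z statistic for H0: both groups have the same proportion (pooled SE)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 26.1% of 184 people aged 18-29 vs. 39.2% of 811 people aged 30 and over.
z = two_sample_z(0.261, 184, 0.392, 811)
p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided p-value

print(round(z, 2), p_value)
```

The statistic comes out at roughly -3.3, which corresponds to a two-sided p-value well below 1%.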

Q5. You want to test whether plain M&Ms really contain 24% blue M&Ms as claimed on the manufacturer’s website. You sample 500 plain M&Ms at random and count the fraction of blue M&Ms.

Which of the following tests is appropriate to address this question?

- **z-test**
- *t*-test
- 2-sample *z*-test
- sign test
- paired-difference test

Q6. A high school principal wants to find out whether the average SAT score of this year’s graduating class is higher than last year’s. She samples 13 students from this year’s graduating class at random and wants to compare their average SAT score to the average SAT score from last year’s graduating class.

- *z*-test
- *t*-test
- 2-sample *z*-test
- sign test
- paired-difference test

Q7. To investigate whether there is a difference in scholastic abilities between first-borns and second-born siblings, 600 families that have at least two children were randomly selected. The scholastic abilities of the first-born and the second-born siblings were assessed with a test and are to be compared.

- *z*-test
- *t*-test
- 2-sample *z*-test
- **sign test**
- paired-difference test

### Resampling

Q1. We want to use the Monte Carlo method to estimate the probability of getting exactly one ace (one spot) in three rolls of a die.

Which of the following is a correct description for doing this?

- To simulate the roll of a die, we draw a number at random (with replacement) from 1,2,3,4,5,6. To simulate the probability in question with B=1000 Monte Carlo simulations, we simulate the roll of a die 3B=3000 times and count the number of times an ace comes up. Then we divide this number by 3B. The resulting proportion is our Monte Carlo estimate.
- **To simulate three rolls of a die, we draw three times a number at random (with replacement) from 1,2,3,4,5,6. If we get the number ‘1’ exactly once, then we label this trial to be a success. We repeat this B=1000 times. The proportion of successes in these 1000 trials is our Monte Carlo estimate of the probability in question.**
- To simulate three rolls of a die, we draw three times a number at random (with replacement) from 1,2,3,4,5,6. We repeat this simulation many times until we get the number ‘1’ exactly once, then we stop. The desired Monte Carlo estimate is 1/(number of repetitions).

Q2. We want to use the Monte Carlo Method to approximate the standard error of our estimate from Question 1.

Which of the following is a correct description for doing this?

- We compute the standard deviation of all the numbers we simulated in Question 1.
- In each of the B=1000 trials we simulated in Question 1, if the trial results in a success (i.e. ‘1’ shows exactly once), then we give that trial the label 1, otherwise the label 0. We compute the standard deviation of these 1000 labels.
- **We repeat the whole Monte Carlo simulation done in Question 1 many times (e.g. 2000 times). Each time we get an estimate of the probability in question. We compute the standard deviation of these 2000 estimates.**
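The two procedures marked above can be sketched in a few lines. The function names are made up for illustration, and the outer loop uses 200 repetitions instead of 2000 only to keep the run short.

```python
import random
import statistics

def one_trial(rng):
    """Roll a die three times; success if '1' shows exactly once."""
    rolls = [rng.randint(1, 6) for _ in range(3)]
    return rolls.count(1) == 1

def monte_carlo_estimate(rng, B=1000):
    """Proportion of successes in B simulated trials (Question 1)."""
    return sum(one_trial(rng) for _ in range(B)) / B

rng = random.Random(0)  # fixed seed so the run is reproducible

estimate = monte_carlo_estimate(rng)

# Question 2: repeat the whole simulation many times and take the SD of the estimates.
estimates = [monte_carlo_estimate(rng) for _ in range(200)]
standard_error = statistics.stdev(estimates)

exact = 3 * (1 / 6) * (5 / 6) ** 2   # binomial formula, about 0.347
print(estimate, standard_error, exact)
```

The estimate should land within a few standard errors of the exact value, and the simulated standard error should be close to √(p(1 − p)/B) ≈ 0.015.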

Q3.

Q4. We want to compute a 90% bootstrap percentile interval for the correlation coefficient based on 32 pairs $(X_1,Y_1),\ldots,(X_{32},Y_{32})$.

Which of the following is a correct description for doing this?
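The answer options are missing from the source, so the following is only a sketch of the usual percentile bootstrap: resample the 32 pairs with replacement (keeping each X with its Y), recompute the correlation each time, and take the 5th and 95th percentiles of the bootstrap correlations. The data are synthetic, and all function names and the percentile indexing convention are assumptions made for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def bootstrap_percentile_interval(pairs, level=0.90, B=2000, seed=0):
    """Percentile interval for the correlation from B bootstrap resamples."""
    rng = random.Random(seed)
    n = len(pairs)
    reps = []
    for _ in range(B):
        # Resample whole pairs with replacement so X and Y stay linked.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        xs, ys = zip(*sample)
        reps.append(pearson_r(xs, ys))
    reps.sort()
    alpha = (1 - level) / 2
    return reps[int(alpha * B)], reps[int((1 - alpha) * B) - 1]

# Synthetic data: 32 positively correlated pairs (made up for illustration).
data_rng = random.Random(1)
xs = [data_rng.gauss(0, 1) for _ in range(32)]
pairs = [(x, 0.7 * x + data_rng.gauss(0, 1)) for x in xs]

low, high = bootstrap_percentile_interval(pairs)
print(low, high)
```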

### Analysis of Categorical Data

Q1. Questions (a)-(d) below relate to the following: Some people suspect that childbirths may not be equally distributed over the seven days of the week because hospital staff (who can influence the time of delivery in some cases) may prefer to work on certain days of the week.

Question (a): Which of the following is the null hypothesis?

- childbirths are more likely on certain days of the week
- **childbirths occur equally likely on the seven days of the week**

Q2. To investigate, you note the day of the week of 300 births that were randomly selected from all births that occurred in New York City last year.

Question (b): What test should you use to test the null hypothesis?

- *z*-test
- **chi-square test for goodness-of-fit**
- chi-square test of independence
- chi-square test of homogeneity

Q3. Question (d): What would be the answer to Question (b) if you wanted to investigate a simpler question, namely whether the percentage of births on weekends is lower than expected?

- *z*-test
- chi-square test for goodness-of-fit
- chi-square test of independence
- chi-square test of homogeneity

Q4. This question and the next one are related to the following context: A food delivery start-up decides to advertise its service by placing ads on web pages. They wonder whether the percentage of viewers who click on the ad changes depending on how often the viewers were shown the ad. They randomly select 100 viewers from among those who were shown the ad once, 135 from among those who were shown the ad twice, and 150 from among those who were shown the ad three times.

Which is the null hypothesis?

- the chance that the user clicks on the ad increases with the number of ads shown
- **the chance that the user clicks on the ad is the same for all three groups**

Q5. In the previous question, which test is appropriate to test the null hypothesis?

- *z*-test
- chi-square test for goodness-of-fit
- chi-square test of independence
- **chi-square test of homogeneity**

Q6. A county wants to check whether the racial composition of the teachers in the county corresponds to that of the population in the county. It samples 500 teachers at random and wants to compare that sample with the census numbers about the racial groups in that county.

Which test would be appropriate?

- *z*-test
- **chi-square test for goodness-of-fit**
- chi-square test of independence
- chi-square test of homogeneity
- none of these

Q7. An airline wants to find out whether there is a connection between the customer’s status in its frequent flyer program and the class of tickets that the customer buys. It samples 1,000 ticket records at random and for each ticket notes the status level (‘none’, ‘silver’, ‘gold’) and the ticket class (‘economy’, ‘business’, ‘first’).

- *z*-test
- chi-square test for goodness-of-fit
- **chi-square test of independence**
- chi-square test of homogeneity
- none of these

Q8. The airline wants to find out whether there is a connection between the customer’s status in its frequent flyer program and the amount that the customer spends on tickets in the following year. It samples 1,000 ticket records at random and for each ticket notes the status level (‘none’, ‘silver’, ‘gold’) and the amount spent on tickets in the following year.

Which test would be appropriate?

- *z*-test
- chi-square test for goodness-of-fit
- chi-square test of independence
- chi-square test of homogeneity
- **none of these**

### One-Way Analysis of Variance

Q1. An online retailer strongly suspects that customers purchase more in the following month if they are shown a company ad more often. To confirm that hunch they randomly select 50 customers who are then sent one ad, 45 customers who are sent two ads, and 52 customers who are sent three ads.

Which is the null hypothesis?

- **the spending means for the three groups are the same**
- the spending means increase with the number of ads

Q2. Based on the description of the experiment in the previous question and the boxplots below, do you think that the assumptions of ANOVA are met?

**Yes**

Q3. Based on the ANOVA table below and the boxplots, what is the conclusion of the analysis?

- There is no statistically significant effect.
- There is sufficient evidence to conclude that the spending means increase with the number of ads.
- **There is sufficient evidence to conclude that the spending means are not equal, but based on this analysis alone we cannot conclude that the spending means increase with the number of ads.**

Q4. Does eye color affect the type of vision correction that patients choose? From a large dataset of patients having vision correction, 70 patients were chosen randomly from those having brown eyes, 70 from those having green eyes, and 70 from those having blue eyes. For each patient, the type of vision correction was coded as follows: glasses=1, contact lenses=2, corrective surgery=3. Those numbers were used for an ANOVA, which resulted in a p-value of 0.5%.

Does the p-value of 0.5% mean that there is strong evidence that eye color has an effect on the type of vision correction that patients choose?

- yes
- **no**

Q5. A clinical trial aims to discern whether twelve interventions against high blood pressure have different effects. The study randomizes 10,000 subjects into twelve groups. Each group is administered one of the twelve interventions. After a month the change in blood pressure is measured for each subject. The ANOVA table gives a p-value of 17%. The investigators also perform pairwise two-sample t-tests for all pairs of treatments and find that two pairs show a statistically significant difference.

Which of the following options describes a valid conclusion?

- **There is not enough evidence to conclude that the twelve treatment means are different.**
- We can conclude that there are differences between the two pairs of treatments that were found to be significant by the two-sample t-tests.

### Multiple Comparisons

Q1. Recall that a “discovery” occurs when a test rejects the null hypothesis. In the medical literature a discovery is called a “positive result”. So a “false positive” is a “false discovery”.

What is the false discovery proportion (FDP) of the procedure that yielded the following results:

**9/(9+36)**
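The results table for this question is missing from the source. Assuming the two numbers in the answer stand for 9 false discoveries and 36 true discoveries, the false discovery proportion is the share of all discoveries that are false. The helper below is a hypothetical illustration of that definition.

```python
def false_discovery_proportion(false_discoveries, true_discoveries):
    """FDP = false discoveries / all discoveries (0 if nothing was discovered)."""
    total = false_discoveries + true_discoveries
    return false_discoveries / total if total else 0.0

# Assumed reading of the answer: 9 false and 36 true discoveries.
print(false_discovery_proportion(9, 36))  # 0.2
```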

Q2. A medical study examines whether there is a significant correlation between any of the 12 lifestyle choices and high blood pressure. It doesn’t find any significant correlation, but upon further examination, the researchers find a highly significant (*p*-value <0.5%) correlation between two of the lifestyle choices. This correlation seems not to have been noticed before.

Which of the following three statements is an appropriate summary of these findings? Select all that apply.

- The correlation between these two lifestyle choices is highly significant and should be reported as such.
- **The seemingly significant correlation was found as a consequence of data snooping and therefore the *p*-value is not valid. The researchers shouldn’t report anything.**
- The seemingly significant correlation was found as a consequence of data snooping and therefore the *p*-value is not valid. However, this could potentially be a significant new finding. The researchers can report it as such, pointing out that they cannot attach a valid *p*-value to this finding. It can serve as a hypothesis for a future study with new data, which would then allow for statistically valid conclusions.

Q3. 1,000 tests were evaluated with the Bonferroni correction. 31 tests had corrected *p*-values smaller than 5%.

Which of the following three statements is an appropriate conclusion?

- If we reject these 31 null hypotheses then we can expect that about 5% of them are rejected in error.
- **This is sufficient evidence to reject all of these 31 null hypotheses because there is only a 5% chance that any of these 31 *p*-values would be this small if the null hypotheses were true.**
- There is a 95% probability that all of these 31 null hypotheses are false.

Q4. 1,000 tests were evaluated with the FDR at the 5% level, which resulted in 31 discoveries.

Which of the following three statements is an appropriate conclusion?

- There is a 95% probability that all of these 31 null hypotheses are false.
- This is sufficient evidence to reject all of these 31 null hypotheses, because there is only a 5% chance that any of these 31 *p*-values would be this small if the null hypothesis were true.
- **If we reject these 31 null hypotheses then we can expect that about 5% of them are rejected in error.**
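The difference between Q3 and Q4 is the difference between controlling the chance of any false rejection (Bonferroni) and controlling the expected proportion of false rejections (FDR). The sketch below contrasts the Bonferroni correction with the Benjamini-Hochberg procedure, a common way to control the FDR. The quiz does not name the FDR procedure, so using Benjamini-Hochberg here is an assumption, and the p-values are made up.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject when the corrected p-value (p times the number of tests) is at most alpha."""
    m = len(p_values)
    return [min(1.0, p * m) <= alpha for p in p_values]

def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with p_(k) <= k/m * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

# Hypothetical p-values from 8 tests (made up for illustration).
p_values = [0.001, 0.008, 0.012, 0.02, 0.04, 0.3, 0.6, 0.9]

print(sum(bonferroni_reject(p_values)))          # 1
print(sum(benjamini_hochberg_reject(p_values)))  # 4
```

Bonferroni is the stricter of the two: it makes fewer discoveries but guards against even a single false one, while the FDR approach accepts that a small share of its discoveries will be false.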
