Introduction to Probability and Data with R: Coursera Quiz Answers for All Weeks
Week 2: Introduction to Probability and Data with R Coursera Quiz Answers
Quiz 1: Practice Quiz
Q1. Which of the following classifications of variable types is false?
- Student height → continuous numerical
- Customer satisfaction: very unsatisfied, unsatisfied, satisfied, very satisfied → ordinal categorical
- Whether a student has previously taken a statistics course → categorical
- Population of each state in the US → continuous numerical
Q2. True or False: If subjects are randomly assigned to treatments, conclusions can be generalized to the population.
Q3. As part of a statistics project, Andrea would like to collect data on household size in her city. To do so, she asks each person in her statistics class for the size of their household, and reports that her sample is a simple random sample. However, this is not a simple random sample. Which of the following best explains why this is not a random sample appropriate for this research question?
- Andrea did not block for any variables that might influence the response.
- Andrea did not use any randomization; she took a convenience sample.
- Andrea did not use a stratified sample.
Q4. Which of the following is not one of the four principles of experimental design?
Q5. True or False: Stratified sampling allows for controlling for possible confounders in the sampling stage, while blocking allows for controlling for such variables during random assignment.
Quiz 2: Week 1 Quiz
Q1. Consider the table below describing a data set of individuals who have registered to volunteer at a public school. Which of the choices below lists categorical variables?
- number of siblings and year born
- name and number of siblings
- phone number and name
- annual income and phone number
Q2. The General Social Survey conducted annually in the United States asks how many friends people have and how they would rate their happiness level (very happy, pretty happy, not too happy). In order to evaluate the relationship between these two variables a researcher calculates the average number of friends for people who categorize themselves as very happy, pretty happy, and not too happy. Which of the following correctly identifies the variables used in the study as explanatory and response?
- explanatory: number of friends; response: very happy, pretty happy, not too happy
- explanatory: happiness level (categorical with 3 levels); response: number of friends
- explanatory: very happy, pretty happy, not too happy; response: number of friends
- explanatory: number of friends; response: happiness level (categorical with 3 levels)
Q3. In a study published in 2011 in The Proceedings of the National Academy of Sciences, researchers randomly assigned 120 elderly men and women who volunteered to be a part of this study (average age mid-60s) to one of two exercise groups. One group walked around a track three times a week; the other did a variety of less aerobic exercises, including yoga and resistance training with bands. After a year, brain scans showed that among the walkers, the hippocampus (part of the brain responsible for forming memories) had increased in volume by about 2% on average; in the others, it had declined by about 1.4%. Which of the following is false?
- The results of this study can be generalized to all elderly.
- A causal link between walking and expansion of the hippocampus can be inferred based on these results.
- The explanatory variable is the type of exercise, and the response variable is the change in volume of the hippocampus.
Q4. An extraneous variable that is related to both the explanatory and response variables, and that prevents us from deducing causal relationships based on observational studies, is called a _ (please use all lowercase letters in your answer).
Q5. For your political science class, you’d like to take a survey from a sample of all the Catholic Church members in your town. Your town is divided into 17 neighborhoods, each with similar socio-economic status distribution and ethnic diversity, and each contains a Catholic Church. Rather than trying to obtain a list of all members of all these churches, you decide to pick 3 churches at random. For these churches, you’ll ask to get a list of all current members and contact 100 members at random. What kind of design have you used?
- systematic sampling
- quota sampling
- stratified sampling
- simple random sampling
- multistage sampling
Q6. In an experiment, what purpose does blocking serve?
- Prevent skewed results.
- Increase sample size.
- Obtain a random sample.
- Control for variables that might influence the response.
Q7. Which of the following is one of the four principles of experimental design?
Week 3: Introduction to Probability and Data with R Coursera Quiz Answers
Quiz 1: Introduction to R and RStudio
Q1. How many variables are included in the arbuthnot data set?
Q2. What command would you use to extract just the counts of girls born?
Q3. Which of the following best describes the number of girls baptised over the years included in this dataset?
- There is initially an increase in the number of girls baptised. This number peaks around 1640 and then after 1640 the number of girls baptised decreases.
- There appears to be no trend in the number of girls baptised from 1629 to 1710
- There is an initial increase in the number of girls baptised but this number appears to level around 1680 and not change after that time point.
- There is initially an increase in the number of girls baptised, which peaks around 1640. After 1640 there is a decrease in the number of girls baptised, but the number begins to increase again in 1660. Overall the trend is an increase in the number of girls baptised.
- The number of girls baptised has decreased over time.
Q4. How many variables are included in the present data set?
Q5. Calculate the total number of births for each year and store these values in a new variable called total in the present dataset. Then, calculate the proportion of boys born each year and store these values in a new variable called prop_boys in the same dataset. Plot these values over time and based on the plot determine if the following statement is true or false: The proportion of boys born in the US has decreased over time.
Q6. Create a new variable called more_boys which contains the value of either TRUE if that year had more boys than girls, or FALSE if that year did not. Based on this variable which of the following statements is true?
- Half of the years there are more boys born, and the other half more girls born.
- Every year there are more girls born than boys.
- Every year there are more boys born than girls.
Q7. Calculate the boy-to-girl ratio each year, and store these values in a new variable called prop_boy_girl in the present dataset. Plot these values over time. Which of the following best describes the trend?
- There is an initial decrease in the boy-to-girl ratio, but this number appears to level off around 1960 and remain constant since then.
- The boy-to-girl ratio has increased over time.
- There is initially an increase in boy-to-girl ratio, which peaks around 1960. After 1960 there is a decrease in the boy-to-girl ratio, but the number begins to increase in the mid 1970s.
- There appears to be no trend in the boy-to-girl ratio from 1940 to 2013.
- There is initially a decrease in the boy-to-girl ratio, and then an increase between 1960 and 1970, followed by a decrease.
Q8. In what year was the total number of births in the U.S. highest?
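The lab does these steps in R with dplyr's mutate(). As an illustration of the arithmetic only, here is a minimal Python sketch, assuming each row of the present data set is a dict with year, boys and girls fields; the function names are hypothetical.

```python
def add_birth_variables(rows):
    """Return copies of the rows with total, prop_boys, more_boys and prop_boy_girl added."""
    result = []
    for row in rows:
        total = row["boys"] + row["girls"]
        result.append({
            **row,
            "total": total,                               # total births that year
            "prop_boys": row["boys"] / total,             # share of births that were boys
            "more_boys": row["boys"] > row["girls"],      # TRUE/FALSE in R terms
            "prop_boy_girl": row["boys"] / row["girls"],  # boy-to-girl ratio
        })
    return result


def year_with_most_births(rows):
    """Year whose total number of births is largest."""
    return max(add_birth_variables(rows), key=lambda r: r["total"])["year"]
```

Plotting prop_boys or prop_boy_girl against year then answers Q5 and Q7, and counting the TRUE values of more_boys answers Q6.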
Week 4: Introduction to Probability and Data with R Coursera Quiz Answers
Quiz 1: Practice Quiz
Q1. Which of the below data sets has the lowest standard deviation? You do not need to calculate the exact standard deviations to answer this question.
- 0, 25, 50, 100, 125, 150, 1000
- 100, 100, 100, 100, 100, 100, 101
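A quick numerical check of the intuition, as a Python sketch using the standard library's statistics.stdev (the sample standard deviation):

```python
from statistics import stdev

# The two candidate data sets from Q1 above.
spread_out = [0, 25, 50, 100, 125, 150, 1000]
nearly_constant = [100, 100, 100, 100, 100, 100, 101]

# Standard deviation measures the typical distance of values from their mean,
# so a set whose values are almost all identical has a very small SD.
for name, data in [("spread_out", spread_out), ("nearly_constant", nearly_constant)]:
    print(name, round(stdev(data), 2))
```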
Q2. True or False: The statistic mean/median (mean divided by median) can be used as a measure of skewness (either right or left). Suppose we are dealing with a distribution where the minimum is 0.5. If this statistic (mean/median) is less than 1, the distribution is most likely left skewed.
Q3. True or False: You are going to collect income data from a right-skewed distribution of incomes of politicians. If you take a large enough sample from that distribution, the sample mean and the sample median will always have the same value.
Q4. True or False: A mosaic plot is useful for visualizing the relationship between a numerical and a categorical variable.
Q5. Does meditation cure insomnia? Researchers randomly divided 400 people into two equal-sized groups. One group meditated daily for 30 minutes, the other group attended a 2-hour information session on insomnia. At the beginning of the study, the average difference in the number of minutes slept between the two groups was about 0. After the study, the average difference was about 32 minutes, and the meditation group had a higher average number of minutes slept. To test whether an average difference of 32 minutes could be attributed to chance, a statistics student decided to conduct a randomization test. She wrote the number of minutes slept by each subject in the study on an index card. She shuffled the cards together very well, and then dealt them into two equal-sized groups. Which of the following best describes the outcome?
- The average difference between the two stacks of cards will be about 32 minutes.
- The average difference between the two stacks of cards will be about 0 minutes.
- If meditation is effective, the average difference between the two stacks of cards will be more than 32 minutes.
Quiz 2: Week 2 Quiz
Q1. Which of the below data sets has the highest standard deviation? You do not need to calculate the exact standard deviations to answer this question.
- 0, 25, 25, 25, 25, 25, 25
- 0, 100, 200, 300, 400, 500, 600
Q2. The distribution of exam scores (ranging from 0 – 100%) where the mean score is 75%, the standard deviation is 12%, and the median is 78% is most likely
- right skewed
- left skewed
Q3. Two distributions (A and B) are shown on the box plot below. Which of the following statements is not supported by the plot?
- Both distributions are roughly symmetric.
- Both distributions are unimodal.
- Median of A is higher than median of B.
- B is more variable than A.
Q4. Which is more affected by extreme observations, the mean or median? And how about the standard deviation or IQR?
- mean, SD
- median, SD
- median, IQR
- mean, IQR
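To see which statistics react to an extreme observation, here is a small Python sketch on made-up example values (not course data): the same list is summarized with and without one very large value. It uses statistics.quantiles with method="inclusive" for the quartiles.

```python
from statistics import mean, median, stdev, quantiles


def summarize(data):
    """Return the mean, median, SD and IQR of a list of numbers."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")  # quartile cut points
    return {"mean": mean(data), "median": median(data),
            "sd": stdev(data), "iqr": q3 - q1}


# Hypothetical example data: replace the largest value with an extreme one.
base = [10, 11, 12, 13, 14, 15, 16, 17]
with_outlier = base[:-1] + [170]

before, after = summarize(base), summarize(with_outlier)
for stat in before:
    print(stat, round(before[stat], 2), "->", round(after[stat], 2))
```

Here the median and IQR do not move at all, while the mean and SD are pulled up sharply by the single extreme value.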
Q5. Phi Delta Kappa (PDK) is an international professional organization for educators that, in collaboration with Gallup, has been conducting polls on the public’s attitudes toward the public schools since 1969. The following was one of the questions on the 2011 poll:
“Most teachers in the nation now belong to unions or associations that bargain over salaries, working conditions, and the like. Has unionization, in your opinion, helped, hurt, or made no difference in the quality of public school education in the United States?”
The respondents’ answers broken down by party affiliation are shown below. Which of the following statements is most justified by these data?
- The results of the survey suggest that opinion on teachers belonging to unions or bargaining associations and political party affiliation appear to be independent.
- The results of the survey suggest a relationship between opinion on teachers belonging to unions or bargaining associations and political party affiliation.
- 14% of Republicans and 58% of Democrats think that teachers belonging to unions or bargaining associations helped the quality of public school education in the United States.
- A histogram or a box plot would be useful for investigating if distribution of opinion on teachers belonging to unions or bargaining associations varies by political party affiliation.
Q6. In 1948, Austin Bradford Hill designed a study to test a new treatment for tuberculosis; at the beginning of the study, there was no evidence of whether it would be any better or worse than bed rest. He randomly assigned some patients who volunteered to be a part of this study to receive the treatment Streptomycin, an antibiotic. The other patients received only bed rest as the control group. Hill then observed the patients’ outcomes: which patients died and which recovered. The results of the study are shown below.
We use the following simulation to test whether there is a difference between the recovery rates under the two treatments: We write “died” on 18 index cards and “survived” on 89 index cards to indicate whether or not a patient died. Next, we shuffle the cards and deal them into two groups of 52 and 55, for control and treatment, respectively. We then calculate the simulated difference between the recovery rates in the Streptomycin and control groups (p̂Streptomycin − p̂Control), and record this value. We repeat this simulation 100 times. The histogram below shows the distribution of the simulated differences between the recovery rates in these 100 simulations.
Which of the following is correct? Choose all that apply (there are multiple correct answers).
- Based on this study we can conclude a causal relationship between Streptomycin and better tuberculosis recovery rate.
- The difference between the survival rates in the control and treatment groups appears to be simply due to chance.
- The alternative hypothesis is that the Streptomycin treatment is more effective than bed rest.
- Streptomycin treatment does not appear to be effective in treating tuberculosis since the observed number of deaths in the treatment group would not be considered unusual based on the simulation results.
- If Streptomycin and bed rest are equally effective in curing tuberculosis, the probability of observing a difference in the recovery rates at least as high as the one observed is 2%.
- The conclusion of this study is generalizable to all tuberculosis patients.
- Hill’s study is observational.
- The alternative hypothesis should be that there is a difference between the recovery rates under the two treatments.
- Streptomycin treatment appears to be effective in treating tuberculosis since the observed difference in recovery rates would be considered unusual based on the simulation results.
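The card-shuffling simulation in Q6 can be sketched in Python. The card counts (18 died, 89 survived), the group sizes (52 control, 55 treatment) and the 100 repetitions come from the question; the function names and the seed parameter are hypothetical. The observed difference comes from the results table, which is not reproduced here, so it is passed in as a parameter.

```python
import random


def simulate_null_differences(n_died=18, n_survived=89, n_control=52,
                              n_treatment=55, reps=100, seed=None):
    """Shuffle-and-deal simulation of p_hat_Streptomycin - p_hat_Control
    under the assumption that treatment makes no difference."""
    rng = random.Random(seed)
    cards = ["died"] * n_died + ["survived"] * n_survived
    diffs = []
    for _ in range(reps):
        rng.shuffle(cards)                          # shuffle all 107 cards
        control = cards[:n_control]                 # first 52 cards: control
        treatment = cards[n_control:n_control + n_treatment]  # next 55: treatment
        p_treat = treatment.count("survived") / n_treatment
        p_control = control.count("survived") / n_control
        diffs.append(p_treat - p_control)
    return diffs


def upper_tail_proportion(diffs, observed_diff):
    """Share of simulated differences at least as large as the observed one."""
    return sum(d >= observed_diff for d in diffs) / len(diffs)
```

The value returned by upper_tail_proportion plays the role of the one-sided p-value read off the histogram.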
Week 5: Introduction to Probability and Data with R Coursera Quiz Answers
Quiz 1: Introduction to Data
Q1. Create a new data frame that includes flights headed to SFO in February, and save this data frame as sfo_feb_flights. How many flights meet these criteria?
Q2. Make a histogram and calculate appropriate summary statistics for arrival delays of sfo_feb_flights. Which of the following is false?
- More than 50% of flights arrive on time or earlier than scheduled.
- The distribution has several extreme values on the right side.
- No flight is delayed more than 2 hours.
- The distribution is right skewed.
- The distribution is unimodal.
Q3. Calculate the median and interquartile range for arr_delay of flights in the sfo_feb_flights data frame, grouped by carrier. Which carrier has the highest IQR of arrival delays?
- Delta and United Airlines
- American Airlines
- JetBlue Airways
- Virgin America
- Frontier Airlines
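The lab does this with dplyr's filter(), group_by() and summarise(). Below is a plain-Python sketch of the same steps, assuming each flight is a dict with dest, month (an integer, 2 for February), carrier and arr_delay fields, and that arr_delay has no missing values. The function names are hypothetical. It uses quantiles(..., method="inclusive"), which should match R's default quantile type 7, so the IQRs should agree with R's IQR().

```python
from collections import defaultdict
from statistics import median, quantiles


def sfo_feb_flights(flights):
    """Keep flights whose destination is SFO and whose month is February."""
    return [f for f in flights if f["dest"] == "SFO" and f["month"] == 2]


def arr_delay_summary_by_carrier(flights):
    """Median and IQR of arr_delay for each carrier."""
    delays = defaultdict(list)
    for f in flights:
        delays[f["carrier"]].append(f["arr_delay"])
    summary = {}
    for carrier, values in delays.items():
        # quantiles() needs at least 2 values per carrier.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        summary[carrier] = {"median": median(values), "iqr": q3 - q1}
    return summary
```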
Q4. Considering the data from all the NYC airports, which month has the highest average departure delay?
Q5. Which month has the highest median departure delay from an NYC airport?
Q6. Is the mean or the median a more reliable measure for deciding which month(s) to avoid flying if you really dislike delayed flights, and why?
- Median would be more reliable as the distribution of delays is symmetric.
- Mean would be more reliable as the distribution of delays is symmetric.
- Median would be more reliable as the distribution of delays is skewed.
- Mean would be more reliable as it gives us the true average.
- Both give us useful information.
Q7. If you were selecting an airport based simply on on-time departure percentage, which NYC airport would you choose to fly out of?
Q8. Mutate the data frame so that it includes a new variable, avg_speed, that contains the average speed traveled by the plane for each journey (in mph). What is the tail number of the plane with the fastest avg_speed? Hint: Average speed can be calculated as distance divided by the number of hours of travel; note that air_time is given in minutes. If you just want to show avg_speed and tailnum and none of the other variables, use the select function at the end of your pipe to select just these two variables with select(avg_speed, tailnum). You can google this tail number to find out more about the aircraft.
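The hint's formula is avg_speed = distance / (air_time / 60). A minimal Python sketch, assuming each flight is a dict with distance (miles), air_time (minutes) and tailnum fields; skipping flights with a missing or zero air_time (for example, cancelled flights) is an assumption of this sketch, and the function names are hypothetical.

```python
def avg_speed(distance_miles, air_time_minutes):
    """Average speed in mph: distance divided by hours in the air."""
    hours = air_time_minutes / 60  # air_time is recorded in minutes
    return distance_miles / hours


def fastest_plane(flights):
    """Return (avg_speed, tailnum) for the flight with the highest average speed."""
    valid = [f for f in flights if f.get("air_time")]  # drop missing/zero air_time
    best = max(valid, key=lambda f: avg_speed(f["distance"], f["air_time"]))
    return avg_speed(best["distance"], best["air_time"]), best["tailnum"]
```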
Q9. Make a scatterplot of avg_speed vs. distance. Which of the following is true about the relationship between average speed and distance?
- The distribution of distances is uniform over 0 to 5000 miles.
- There are no outliers.
- There is an overall positive association between distance and average speed.
- The relationship is linear.
- As distance increases the average speed of flights decreases.
Q10. Suppose you define a flight to be “on time” if it gets to the destination on time or earlier than expected, regardless of any departure delays. Mutate the data frame to create a new variable called arr_type with levels “on time” and “delayed” based on this definition. Also mutate to create a new variable called dep_type with levels “on time” and “delayed”, depending on whether the flight was delayed for fewer than 5 minutes or for 5 minutes or more, respectively. In other words, if arr_delay is 0 minutes or fewer, arr_type is “on time”; if dep_delay is less than 5 minutes, dep_type is “on time”. Then, determine the on-time arrival percentage based on whether the flight departed on time or not. What fraction of flights that were “delayed” departing arrive “on time”? (Enter the answer as a decimal, like 0.xx)
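The two classification rules in Q10 and the final fraction can be sketched in Python. The thresholds (arr_delay of 0 or fewer, dep_delay under 5 minutes) come from the question; the function names and the dict-per-flight layout with dep_delay and arr_delay fields are assumptions, and missing delays are assumed to have been removed already.

```python
def arr_type(arr_delay):
    """'on time' if the flight arrived on time or early (arr_delay <= 0)."""
    return "on time" if arr_delay <= 0 else "delayed"


def dep_type(dep_delay):
    """'on time' if the departure delay was under 5 minutes."""
    return "on time" if dep_delay < 5 else "delayed"


def on_time_arrival_rate(flights, departure_status):
    """Fraction of flights with the given dep_type that arrived 'on time'."""
    group = [f for f in flights if dep_type(f["dep_delay"]) == departure_status]
    on_time = sum(arr_type(f["arr_delay"]) == "on time" for f in group)
    return on_time / len(group)
```

Calling on_time_arrival_rate(flights, "delayed") corresponds to the fraction the question asks for.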
Week 6: Introduction to Probability and Data with R Coursera Quiz Answers
Quiz 1: Practice Quiz
Q1. Shown below are four Venn diagrams. In which of the diagrams does the shaded area represent A and B and C?
Q2. Which of the following is false about probability distributions?
- Each probability should be positive, less than or equal to 1.
- Each probability should be greater than or equal to 0.
- The outcomes listed must be independent.
- The probabilities must total 1.
Q3. Last semester, out of 170 students taking a particular statistics class, 71 students were “majoring” in social sciences and 53 students were majoring in pre-medical studies. There were 6 students who were majoring in both pre-medical studies and social sciences. What is the probability that a randomly chosen student is majoring in social sciences, given that s/he is majoring in pre-medical studies?
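The conditional probability follows from P(A | B) = P(A and B) / P(B). A short Python sketch using exact fractions and the counts from the question:

```python
from fractions import Fraction

total_students = 170
pre_med = 53      # students majoring in pre-medical studies
both = 6          # students majoring in both pre-med and social sciences

# P(social sciences | pre-med) = P(both) / P(pre-med) = (6/170) / (53/170) = 6/53
p_social_given_premed = Fraction(both, total_students) / Fraction(pre_med, total_students)
print(p_social_given_premed, round(float(p_social_given_premed), 4))
```

The 170 cancels, so the answer is simply the number in both groups divided by the number of pre-med students.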
Quiz 2: Week 3 Quiz
Q1. Which of the following states that the proportion of occurrences with a particular outcome converges to the probability of that outcome?
- General addition rule
- Law of averages
- Law of large numbers
- Bayes’ theorem
Q2. Shown below are four Venn diagrams. In which of the diagrams does the shaded area represent A and B but not C?
Q3. Each choice below shows a suggested probability distribution for the method of access to online course materials (desktop computer, laptop computer, tablet, smartphone). Determine which is a proper probability distribution.
- desktop computer: 0.20, laptop computer: 0.20, tablet: 0.20, smartphone: 0.20
- desktop computer: 0.25, laptop computer: 0.35, tablet: 0.15, smartphone: 0.25
- desktop computer: 0.30, laptop computer: 0.40, tablet: 0.35, smartphone: -0.05
- desktop computer: 0.15, laptop computer: 0.50, tablet: 0.30, smartphone: 0.20
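A proper probability distribution needs every probability between 0 and 1 and a total of exactly 1. A Python sketch that checks both rules for the four options (labeled A to D here in the order listed, a labeling added for illustration); math.isclose guards against floating-point rounding in the sum.

```python
import math


def is_proper_distribution(probs):
    """True if every probability is in [0, 1] and the probabilities sum to 1."""
    return all(0 <= p <= 1 for p in probs) and math.isclose(sum(probs), 1.0)


# desktop, laptop, tablet, smartphone
options = {
    "A": [0.20, 0.20, 0.20, 0.20],   # sums to 0.80
    "B": [0.25, 0.35, 0.15, 0.25],   # sums to 1.00
    "C": [0.30, 0.40, 0.35, -0.05],  # sums to 1.00 but has a negative value
    "D": [0.15, 0.50, 0.30, 0.20],   # sums to 1.15
}
for label, probs in options.items():
    print(label, is_proper_distribution(probs))
```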
Q4. Assortative mating is a nonrandom mating pattern where individuals with similar genotypes and/or phenotypes mate with one another more frequently than what would be expected under a random mating pattern. Researchers studying this topic collected data on eye colors of 204 Scandinavian men and their female partners. The table below summarizes the results. For simplicity, assume heterosexual relationships. What is the probability that a randomly chosen couple is comprised of a male and female with blue eyes?
(Reference: Laeng, Bruno, Ronny Mathisen, and Jan-Are Johnsen. “Why do blue-eyed men prefer women with the same eye color?” Behavioral Ecology and Sociobiology 61.3 (2007): 371-384.)
Q5. Which of the following statements is false?
- Two independent events cannot occur at the same time.
- Two complementary outcomes (of the same event) cannot occur at the same time.
- Two disjoint outcomes (of the same event) cannot occur at the same time.
- Two mutually exclusive outcomes (of the same event) cannot occur at the same time.
Week 7: Introduction to Probability and Data with R Coursera Quiz Answers
Quiz 1: Probability
Q1. Fill in the blank: A streak length of 1 means one _ followed by one miss.
Q2. Fill in the blank: A streak length of 0 means one _ which must occur after a miss that ended the preceding streak.
Q3. Which of the following is false about the distribution of Kobe’s streak lengths from the 2009 NBA finals?
- The distribution of Kobe’s streaks is unimodal and right skewed.
- The typical length of a streak is 0 since the median of the distribution is at 0.
- The shortest streak is of length 1.
- The longest streak of baskets is of length 4.
- The IQR of the distribution is 1.
Q4. If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the exercise above?
- Exactly the same
- Somewhat similar
- Totally different
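The lab works in R; here is a Python sketch of the two ideas used in Q1 to Q4: turning a hit/miss sequence into streak lengths, and simulating an independent shooter. Counting a final run of hits that is not closed by a miss as one more streak is an assumed convention, and the shot count (100) and hit probability (0.45) in the example run are assumed values; the function names are hypothetical.

```python
import random
from collections import Counter


def streak_lengths(shots):
    """Split a sequence of 'H'/'M' shots into streak lengths.

    Each miss closes the current streak (possibly of length 0); a final run of
    hits with no closing miss is counted as one more streak (assumed convention).
    """
    lengths, current = [], 0
    for shot in shots:
        if shot == "H":
            current += 1
        else:
            lengths.append(current)
            current = 0
    if current > 0:
        lengths.append(current)
    return lengths


def simulate_independent_shooter(n_shots, p_hit, seed=None):
    """Each shot is a hit with probability p_hit, independent of all earlier shots."""
    rng = random.Random(seed)
    return ["H" if rng.random() < p_hit else "M" for _ in range(n_shots)]


# Two runs of the same hypothetical shooter: the streak distributions should look
# broadly alike but not identical, because each run uses different random draws.
for seed in (1, 2):
    shots = simulate_independent_shooter(n_shots=100, p_hit=0.45, seed=seed)
    print(seed, sorted(Counter(streak_lengths(shots)).items()))
```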
Q5. How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns?
- The distributions look very similar. Therefore, there appears to be evidence for Kobe Bryant’s hot hand.
- The distributions look very similar. Therefore, there doesn’t appear to be evidence for Kobe Bryant’s hot hand.
- The distributions look very different. Therefore, there doesn’t appear to be evidence for Kobe Bryant’s hot hand.
- The distributions look very different. Therefore, there appears to be evidence for Kobe Bryant’s hot hand.
Week 8: Introduction to Probability and Data with R Coursera Quiz Answers
Quiz 1: Practice Quiz
Q1. Heights of 10-year-olds, regardless of gender, closely follow a normal distribution with mean 55 inches and standard deviation 6 inches. Which of the following is true?
- A normal probability plot of heights of a random sample of 500 10-year-olds should show a fairly straight line.
- A 10-year-old who is 65 inches tall would be considered more unusual than a 10-year-old who is 45 inches tall.
- Roughly 95% of 10-year-olds are between 37 and 73 inches tall.
- We would expect more 10-year-olds to be shorter than 55 inches than taller.
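The options can be checked with Python's statistics.NormalDist, using the mean and SD from the question:

```python
from statistics import NormalDist

mu, sigma = 55, 6                 # mean and SD of heights, in inches
heights = NormalDist(mu, sigma)

# Z scores: how many SDs a height lies from the mean.
z_65 = (65 - mu) / sigma
z_45 = (45 - mu) / sigma

# Middle 95% of heights (the 2.5th and 97.5th percentiles).
low, high = heights.inv_cdf(0.025), heights.inv_cdf(0.975)

# Share of 10-year-olds shorter than the mean.
share_below_mean = heights.cdf(mu)

print(round(z_65, 2), round(z_45, 2), round(low, 1), round(high, 1), share_below_mean)
```

65 and 45 inches are equally far from the mean (same absolute Z score), the middle 95% runs from roughly 43 to 67 inches (37 to 73 is about 3 SDs each side), and exactly half the distribution lies below the mean.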
Q2. While it is often assumed that the probabilities of having a boy or a girl are the same, the actual probability of having a boy is slightly higher at 0.51. Suppose a couple plans to have 3 children. What is the probability that exactly 2 of them will be boys?
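This is a binomial probability with n = 3, k = 2 and p = 0.51: C(3, 2) × 0.51² × 0.49. A Python sketch using math.comb:

```python
from math import comb


def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_two_boys = binom_pmf(2, 3, 0.51)  # 3 * 0.51**2 * 0.49
print(round(p_two_boys, 4))
```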
Q3. You are about to take a multi-day tour through a national park which is famous for its wildlife. The tour guide tells you that on any given day there’s a 61% chance that a visitor will see at least one “big game” animal, and a 39% chance they’ll see no big game animals; when the tour guide says “big game”, he refers to either a moose or a bear. The guide assures you that big game sightings on a single day are independent of any other day’s sightings. Given the information from the tour guide, which of the following calculations cannot be performed using a binomial distribution?
- Calculate the probability that over a 5-day trip, you see big game on the first day and on every day after.
- Calculate the probability that you see big game exactly 0 days of an 8-day trip.
- Calculate the probability that you see big game on at least 8 days of a 10-day trip.
- Calculate the probability that you see at least 4 big game animals on the first day of a 5-day trip.
Q4. Your friend is about to begin an introductory chemistry course at his university. The course has collected data from students on their study habits for many years, and the professor reports that study times (in hours) for the final exam closely follow a normal distribution with mean 24 and standard deviation 4. What percentage of students study 34 hours or more?
- Less than 2.5%
- Between 5% and 10%
- Between 15% and 20%
- Between 2.5% and 5%
- Between 30% and 35%
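The tail area above 34 hours can be computed with statistics.NormalDist, using the mean and SD from the question:

```python
from statistics import NormalDist

study_time = NormalDist(mu=24, sigma=4)  # hours
z = (34 - 24) / 4                        # 34 hours is 2.5 SDs above the mean
p_34_or_more = 1 - study_time.cdf(34)    # upper-tail area
print(z, round(p_34_or_more, 4))
```

The upper tail beyond Z = 2.5 is well under 2.5%.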
Q5. Which of the following is false? Hint: It might be useful to sketch the distributions.
- The Z score for the median of a symmetric distribution is approximately 0.
- Calculating percentiles based on the Z table is only appropriate for observations that come from (nearly) normal distributions.
- The Z score for the median of a left skewed distribution is most likely negative.
- Z scores are defined for observations from distributions of any shape and skew.
Q6. About 30% of human twins are identical, and the rest are fraternal. Identical twins are necessarily the same sex; half are males and the other half are females. One-quarter of fraternal twins are both male, one-quarter both female, and one-half are mixes: one male, one female. You have just become a parent of twins and are told they are both girls. Given this information, what is the probability that they are identical?
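Bayes’ theorem gives P(identical | both girls) = P(identical) P(both girls | identical) / P(both girls). A Python sketch with the probabilities from the question:

```python
p_identical = 0.30
p_fraternal = 1 - p_identical
p_girls_given_identical = 0.5    # identical twins: half are female pairs
p_girls_given_fraternal = 0.25   # fraternal twins: one-quarter are both female

# Bayes' theorem: P(identical | both girls)
numerator = p_identical * p_girls_given_identical                    # 0.15
p_both_girls = numerator + p_fraternal * p_girls_given_fraternal     # 0.15 + 0.175
p_identical_given_girls = numerator / p_both_girls
print(round(p_identical_given_girls, 4))
```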
Q7. Which of the following probabilities can be calculated using the normal approximation to the binomial distribution?
- A 2013 Gallup poll reports that 8% of Americans say the situation in Syria is the most important issue affecting the U.S. In a randomly selected group of 75 Americans, what is the probability that more than 10 of them believe the situation in Syria is the most important issue facing the U.S.?
- A clothing store offers store credit cards and only about 17% of the credit card holders are males. If we were to randomly sample 100 store credit card holders to conduct a survey, what is the probability that at most 20 of the sampled individuals would be males?
- A September 2011 Gallup poll suggests that 56% of Americans do not have a great deal of confidence in the mass media to report the news fully, accurately, and fairly. What is the probability that in a random sample of 20 people, 10 or more of them have confidence in the mass media?
- Roughly 20% of Americans smoke. What is the probability that in a random sample of 40 people at least 5 are smokers?
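The normal approximation to the binomial is usually justified by the success-failure condition, np ≥ 10 and n(1 − p) ≥ 10; the threshold of 10 is the common rule of thumb and is assumed here. A Python sketch applying it to the four scenarios (for the media question, the proportion with confidence is 1 − 0.56 = 0.44):

```python
def normal_approx_ok(n, p, threshold=10):
    """Success-failure condition: expect at least `threshold` successes and failures."""
    return n * p >= threshold and n * (1 - p) >= threshold


scenarios = [
    ("Syria poll", 75, 0.08),              # 75 * 0.08 = 6 expected successes
    ("credit card holders", 100, 0.17),    # 17 successes, 83 failures
    ("mass media confidence", 20, 0.44),   # 8.8 successes
    ("smokers", 40, 0.20),                 # 8 successes
]
for name, n, p in scenarios:
    print(name, normal_approx_ok(n, p))
```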
Quiz 2: Week 4 Quiz
Q1. Suppose that scores on a national entrance exam are normally distributed with mean 1000 and standard deviation 100. Which of the following is false?
- Roughly 68% of people have scores between 900 and 1100.
- We would expect the number of people scoring above 1200 to be more than the number of people scoring below 900.
- A normal probability plot of national entrance exam scores of a random sample of 1,000 people should show a straight line.
- A score greater than 1300 is more unusual than a score less than 800.
Q2. A 2005 survey found that 7% of teenagers (ages 13 to 17) suffer from an extreme fear of spiders (arachnophobia). At a summer camp there are 10 teenagers sleeping in each tent. Assume that these 10 teenagers are independent of each other. What is the probability that at least one of them suffers from arachnophobia?
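With independence, the complement rule gives P(at least one) = 1 − P(none) = 1 − 0.93¹⁰. In Python:

```python
p_fear = 0.07
n_teens = 10

# Complement rule: P(at least one) = 1 - P(none), and P(none) = 0.93**10 by independence.
p_at_least_one = 1 - (1 - p_fear) ** n_teens
print(round(p_at_least_one, 4))
```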
Q3. Your roommate loves to eat Chinese food for dinner. He estimates that on any given night, there’s a 30% chance he’ll choose to eat Chinese food. Although he loves Chinese food, he doesn’t like to eat it too much in a short period of time, so on most weeks he eats several different kinds of foods for dinner. Suppose you wanted to calculate the probability that, over the next 7 days, your roommate eats Chinese food at least 3 times. Which of the following is the most accurate statement about calculating this probability?
- Because we know n = 3, k = 7, and p = 0.30, we can use the binomial distribution to calculate the desired probability.
- Because “success” or “failure” have no real meaning in the context of this problem, we cannot use the binomial distribution to calculate the desired probability.
- Because we know n = 7, k = 3, and p = 0.30, we can use the binomial distribution to calculate the desired probability.
- Because we do not know the probabilities of your roommate eating any other types of foods, we cannot use the binomial distribution to calculate the desired probability.
- Because he doesn’t like to eat Chinese food too much in a short period of time, p is not really the same for each trial and so we cannot use the binomial distribution to calculate the desired probability.
Q4. Which of the following, on its own, is the least useful method for assessing if the data follow a normal distribution?
- Check if the points are on a straight line on a normal probability plot.
- Check if the distribution is unimodal and symmetric.
- Check if the mean and median are equal.
- Check if 68% of the data are within 1 SD of the mean, 95% of data are within 2 SDs of the mean, and 99.7% of data are within 3 SDs of the mean.
Q5. Which of the following is true? Hint: It might be useful to sketch the distributions.
- The Z score for the median is approximately 0 if the distribution is bimodal and symmetric.
- The Z score for the median will usually be 0 if the distribution is unimodal and right-skewed.
- The Z score for the mean is undefined if the distribution is bimodal and skewed.
- The Z score for the median is undefined if the distribution is bimodal.
Q6. More than three-quarters of the nation’s colleges and universities now offer online classes, and about 23% of college graduates have taken a course online. 39% of those who have taken a course online believe that online courses provide the same educational value as one taken in person, a view shared by only 27% of those who have not taken an online course. At a coffee shop you overhear a recent college graduate discussing that she doesn’t believe that online courses provide the same educational value as one taken in person. What’s the probability that she has taken an online course before?
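This is a Bayes’ theorem problem: the graduate does not believe online courses offer the same value, so the relevant likelihoods are the complements 1 − 0.39 and 1 − 0.27. A Python sketch with the percentages from the question:

```python
p_online = 0.23                  # graduates who have taken an online course
p_same_given_online = 0.39       # believe "same value", among online takers
p_same_given_not_online = 0.27   # believe "same value", among non-takers

# She does NOT believe online courses give the same value.
p_not_same_given_online = 1 - p_same_given_online          # 0.61
p_not_same_given_not_online = 1 - p_same_given_not_online  # 0.73

# Bayes' theorem: P(online | not same)
numerator = p_online * p_not_same_given_online
denominator = numerator + (1 - p_online) * p_not_same_given_not_online
p_online_given_not_same = numerator / denominator
print(round(p_online_given_not_same, 4))
```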
Q7. One strange phenomenon that sometimes occurs at U.S. airport security gates is that an otherwise law-abiding passenger is caught with a gun in his/her carry-on bag. Usually the passenger claims he/she forgot to remove the handgun from a rarely-used bag before packing it for airline travel. It’s estimated that every day 3,000,000 gun owners fly on domestic U.S. flights. Suppose the probability a gun owner will mistakenly take a gun to the airport is 0.00001. What is the probability that tomorrow more than 35 domestic passengers will accidentally get caught with a gun at the airport? Choose the closest answer.
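The number of passengers caught is binomial with n = 3,000,000 and p = 0.00001, so the mean is 30 and the SD is about 5.48. A Python sketch that computes the exact binomial tail P(X > 35) alongside the normal approximation (without a continuity correction, a choice made here for simplicity); the answer choices are not reproduced here, so compare these values against them.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 3_000_000
p = 0.00001
mean = n * p                     # expected number caught: 30
sd = sqrt(n * p * (1 - p))       # about 5.48

# Exact binomial: P(X > 35) = 1 - P(X <= 35)
p_exact = 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(36))

# Normal approximation, no continuity correction
p_normal = 1 - NormalDist(mean, sd).cdf(35)
print(round(p_exact, 3), round(p_normal, 3))
```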