
# Introduction to Probability and Data with R Coursera Quiz Answers

## Get All Weeks Introduction to Probability and Data with R Coursera Quiz Answers

### Week 2: Introduction to Probability and Data with R Coursera Quiz Answers

#### Quiz 1: Practice Quiz

Q1. Which of the following classifications of variable types is false?

[expand title=View Answer] The population of each state in the US → Continuous numerical [/expand]

Q2. True or False: If subjects are randomly assigned to treatments, conclusions can be generalized to the population.

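[expand title=View Answer] False. Random assignment supports causal conclusions; generalizing to the population requires random sampling. [/expand]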

Q3. As part of a statistics project, Andrea would like to collect data on household size in her city. To do so, she asks each person in her statistics class for the size of their household and reports that her sample is a simple random sample. However, this is not a simple random sample. Which of the following is the best reasoning for why this is not a random sample that is appropriate for this research question?

[expand title=View Answer] Andrea did not use any randomization; she took a convenience sample. [/expand]

Q4. Which of the following is not one of the four principles of experimental design?

[expand title=View Answer] stratify [/expand]

Q5. True or False: Stratified sampling allows for controlling for possible confounders in the sampling stage while blocking allows for controlling for such variables during the random assignment.

[expand title=View Answer] True [/expand]

#### Quiz 2: Week 1 Quiz

Q1. Consider the table below describing a data set of individuals who have registered to volunteer at a public school. Which of the choices below lists categorical variables?

[expand title=View Answer] Name and number of siblings [/expand]

Q2. The General Social Survey conducted annually in the United States asks how many friends people have and how they would rate their happiness level (very happy, pretty happy, not too happy). In order to evaluate the relationship between these two variables a researcher calculates the average number of friends for people who categorize themselves as very happy, pretty happy, and not too happy. Which of the following correctly identifies the variables used in the study as explanatory and response?

[expand title=View Answer]

explanatory: happiness level (categorical with 3 levels)

response: number of friends

[/expand]

Q3. In a study published in 2011 in The Proceedings of the National Academy of Sciences, researchers randomly assigned 120 elderly men and women who volunteered to be a part of this study (average age mid-60s) to one of two exercise groups. One group walked around a track three times a week; the other did a variety of less aerobic exercises, including yoga and resistance training with bands. After a year, brain scans showed that among the walkers, the hippocampus (part of the brain responsible for forming memories) had increased in volume by about 2% on average; in the others, it had declined by about 1.4%. Which of the following is false?

[expand title=View Answer]The results of this study can be generalized to all elderly. [/expand]

Q4. An extraneous variable that is related to the explanatory and response variables and that prevents us from deducing causal relationships based on observational studies is called a _____ (use all lowercase in your answer, please).

[expand title=View Answer] An extraneous variable that is related to the explanatory and response variables and that prevents us from deducing causal relationships based on observational studies is called a confounder. [/expand]

Q5. For your political science class, you’d like to take a survey from a sample of all the Catholic Church members in your town. Your town is divided into 17 neighborhoods, each with similar socio-economic status distribution and ethnic diversity, and each contains a Catholic Church. Rather than trying to obtain a list of all members of all these churches, you decide to pick 3 churches at random. For these churches, you’ll ask to get a list of all current members and contact 100 members at random. What kind of design have you used?

[expand title=View Answer] multistage sampling [/expand]

Q6. In an experiment, what purpose does blocking serve?

[expand title=View Answer] Control for variables that might influence the response. [/expand]

Q7. Which of the following is one of the four principles of experimental design?

[expand title=View Answer] control [/expand]

### Week 3: Introduction to Probability and Data with R Coursera Quiz Answers

#### Quiz 1: Introduction to R and RStudio

Q1. How many variables are included in this data set (data set: Arbuthnot)?

[expand title=View Answer] 3 [/expand]

Q2. What command would you use to extract just the counts of girls born?

[expand title=View Answer] arbuthnot$girls[/expand]

Q3. Which of the following best describes the number of girls baptized over the years included in this dataset?

[expand title=View Answer] There is initially an increase in the number of girls baptized. This number peaks around 1640 and then after 1640, the number of girls baptized decreases. [/expand]

Q4. How many variables are included in this data set (data set: present)?

[expand title=View Answer]3 [/expand]

Q5. Calculate the total number of births for each year and store these values in a new variable called total in the present dataset. Then, calculate the proportion of boys born each year and store these values in a new variable called prop_boys in the same dataset. Plot these values over time and based on the plot determine if the following statement is true or false: The proportion of boys born in the US has decreased over time.

[expand title=View Answer] False [/expand]

Q6. Create a new variable called more_boys which contains the value of either TRUE if that year had more boys than girls, or FALSE if that year did not. Based on this variable which of the following statements is true?

[expand title=View Answer] Every year there are more boys born than girls.[/expand]

Q7. Calculate the boy-to-girl ratio each year, and store these values in a new variable called prop_boy_girl in the present dataset. Plot these values over time. Which of the following best describes the trend?

[expand title=View Answer] There is initially an increase in the boy-to-girl ratio, which peaks around 1960. After 1960 there is a decrease in the boy-to-girl ratio, but the number began to increase in the mid-1970s.[/expand]
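The derived variables in Q5 to Q7 can be sketched in base R as follows. The `births` data frame and its values are made up for illustration (a hypothetical stand-in for the `present` dataset); only the column names `total`, `prop_boys`, `more_boys`, and `prop_boy_girl` come from the questions.

```r
# Hypothetical stand-in for the `present` data; the counts are invented.
births <- data.frame(
  year  = c(2000, 2001, 2002),
  boys  = c(105, 110, 108),
  girls = c(100, 104, 103)
)

# Q5: total births and proportion of boys per year
births$total <- births$boys + births$girls
births$prop_boys <- births$boys / births$total

# Q6: TRUE when a year has more boys than girls
births$more_boys <- births$boys > births$girls

# Q7: boy-to-girl ratio per year
births$prop_boy_girl <- births$boys / births$girls

# Uncomment to plot a derived variable over time:
# plot(births$year, births$prop_boys, type = "l")
print(births)
```

With the real data, the same four assignments would be applied to `present` instead of the toy data frame.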

Q8. In what year did we see the most total number of births in the U.S.?

[expand title=View Answer] 2007 [/expand]

### Week 4: Introduction to Probability and Data with R Coursera Quiz Answers

#### Quiz 1: Practice Quiz

Q1. Which of the below data sets has the lowest standard deviation? You do not need to calculate the exact standard deviations to answer this question.

[expand title=View Answer] 100, 100, 100, 100, 100, 100, 101 [/expand]

Q2. True or False: The statistic mean/median (mean divided by median) can be used as a measure of skewness (either right or left). Suppose we are dealing with a distribution where the minimum is 0.5. If this statistic (mean/median) is less than 1, the distribution is most likely left skewed.

[expand title=View Answer] True[/expand]

Q3. True or False: You are going to collect income data from a right-skewed distribution of the incomes of politicians. If you take a large enough sample from that distribution, the sample mean and the sample median will always have the same value.

[expand title=View Answer] False[/expand]

Q4. True or False: A mosaic plot is useful for visualizing the relationship between a numerical and a categorical variable.

[expand title=View Answer] False[/expand]

Q5. Does meditation cure insomnia? Researchers randomly divided 400 people into two equal-sized groups. One group meditated daily for 30 minutes, the other group attended a 2-hour information session on insomnia. At the beginning of the study, the average difference between the number of minutes slept between the two groups was about 0. After the study, the average difference was about 32 minutes, and the meditation group had a higher average number of minutes slept. To test whether an average difference of 32 minutes could be attributed to chance, a statistics student decided to conduct a randomization test. She wrote the number of minutes slept by each subject in the study on an index card. She shuffled the cards together very well, and then dealt them into two equal-sized groups. Which of the following best describes the outcome?

[expand title=View Answer] The average difference between the two stacks of cards will be about 0 minutes.[/expand]
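The card-shuffling procedure in Q5 can be mimicked with a short simulation. This is only a sketch: the sleep times are randomly generated placeholders, not the study's data, and `shuffle_diff` is a hypothetical helper name.

```r
set.seed(42)

# Placeholder data: 400 made-up sleep times in minutes (not the study's values)
minutes <- rnorm(400, mean = 400, sd = 60)

# Shuffle the "cards", deal them into two equal stacks,
# and return the difference between the stack means.
shuffle_diff <- function(x) {
  shuffled <- sample(x)
  half <- length(x) / 2
  mean(shuffled[1:half]) - mean(shuffled[(half + 1):length(x)])
}

# Repeat the shuffle many times; the differences center near 0
diffs <- replicate(1000, shuffle_diff(minutes))
mean(diffs)
```

Because shuffling breaks any link between group membership and minutes slept, the simulated differences scatter around 0, which is the reference point for judging the observed 32-minute difference.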

#### Quiz 2: Week 2 Quiz

Q1. Which of the below data sets has the highest standard deviation? You do not need to calculate the exact standard deviations to answer this question.

[expand title=View Answer] 0, 100, 200, 300, 400, 500, 600[/expand]

Q2. The distribution of exam scores (ranging from 0 – 100%) where the mean score is 75%, the standard deviation is 12%, and the median is 78% is most likely

[expand title=View Answer] left skewed [/expand]

Q3. Two distributions (A and B) are shown on the box plot below. Which of the following statements is not supported by the plot?

[expand title=View Answer] Both distributions are roughly symmetric. [/expand]

Q4. Which is more affected by extreme observations, the mean or median? And how about the standard deviation or IQR?

[expand title=View Answer] mean, SD [/expand]
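A small illustration of why the mean and SD are less robust than the median and IQR. The two vectors are invented for this sketch and differ only in their largest value.

```r
# Two made-up samples that differ only in the largest observation
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 3, 4, 500)

# Mean and SD react strongly to the extreme value
c(mean_x = mean(x), mean_y = mean(y))
c(sd_x = sd(x), sd_y = sd(y))

# Median and IQR are unchanged
c(median_x = median(x), median_y = median(y))
c(iqr_x = IQR(x), iqr_y = IQR(y))
```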

Q5. Phi Delta Kappa (PDK) is an international professional organization for educators that, in collaboration with Gallup, has been conducting polls on the public’s attitudes toward public schools since 1969. The following was one of the questions on the 2011 poll:

[expand title=View Answer] A histogram or a box plot would be useful for investigating if the distribution of opinions on teachers belonging to unions or bargaining associations varies by political party affiliation. [/expand]

Q6. In 1948, Austin Bradford Hill designed a study to test a new treatment for tuberculosis; at the beginning of the study, there was no evidence whether it would be any better or worse than bed rest. He randomly assigned some patients who volunteered to be a part of this study to receive the treatment of Streptomycin, an antibiotic. The other patients received only bed rest as the control group. Hill then observed the patients’ outcomes: which patients died and which recovered. The results of the study are shown below.

We use the following simulation to test whether there is a difference between the recovery rates under the two treatments: We write “died” on 18 index cards and “survived” on 89 index cards to indicate whether or not a patient died. Next, we shuffle the cards and deal them into two groups of 52 and 55, for control and treatment, respectively. We then calculate the simulated difference between the recovery rates in the Streptomycin and control groups (p̂Streptomycin − p̂Control), and record this value. We repeat this simulation 100 times. The histogram below shows the distribution of the simulated differences between the recovery rates in these 100 simulations.

Which of the following is correct? Choose all that apply (there are multiple correct answers).

[expand title=View Answer]

1. The difference between the survival rates in the control and treatment groups appears to be simply due to chance.

2. The alternative hypothesis is that the Streptomycin treatment is more effective than bed rest.

[/expand]
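The index-card simulation described in Q6 can be sketched in base R. The card counts (18 and 89), the group sizes (52 and 55), and the 100 repetitions come from the question; the seed and the helper name `simulate_diff` are arbitrary choices for this sketch.

```r
set.seed(1948)

# 18 "died" cards and 89 "survived" cards, as in the question
cards <- c(rep("died", 18), rep("survived", 89))

# Shuffle, deal 52 cards to control and 55 to treatment,
# and return p-hat(Streptomycin) - p-hat(control) for survival.
simulate_diff <- function(cards) {
  shuffled <- sample(cards)
  control <- shuffled[1:52]
  treatment <- shuffled[53:107]
  mean(treatment == "survived") - mean(control == "survived")
}

sim_diffs <- replicate(100, simulate_diff(cards))
summary(sim_diffs)

# Uncomment to draw the histogram of simulated differences:
# hist(sim_diffs)
```

The observed difference from the study would then be compared against this simulated distribution to see how unusual it is under the assumption of no treatment effect.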

### Week 5: Introduction to Probability and Data with R Coursera Quiz Answers

#### Quiz 1: Introduction to Data

Q1. Create a new data frame that includes flights headed to SFO in February, and save this data frame as sfo_feb_flights. How many flights meet these criteria?

[expand title=View Answer] 68 [/expand]

Q2. Make a histogram and calculate appropriate summary statistics for arrival delays of sfo_feb_flights. Which of the following is false?

[expand title=View Answer] No flight is delayed more than 2 hours.[/expand]

Q3. Calculate the median and interquartile range for arr_delays of flights in the sfo_feb_flights data frame, grouped by carrier. Which carrier has the highest IQR of arrival delays?

[expand title=View Answer] JetBlue Airways[/expand]

Q4. Considering the data from all the NYC airports, which month has the highest average departure delay?

[expand title=View Answer] March [/expand]

Q5. Which month has the highest median departure delay from an NYC airport?

[expand title=View Answer] January [/expand]

Q6. Is the mean or the median a more reliable measure for deciding which month(s) to avoid flying if you really dislike delayed flights, and why?

[expand title=View Answer] The median would be more reliable as the distribution of delays is skewed. [/expand]

Q7. If you were selecting an airport simply based on on-time departure percentage, which NYC airport would you choose to fly out of?

[expand title=View Answer] JFK [/expand]

Q8. Mutate the data frame so that it includes a new variable that contains the average speed, avg_speed, traveled by the plane for each journey (in mph). What is the tail number of the plane with the fastest avg_speed? Hint: Average speed can be calculated as distance divided by the number of hours of travel, and note that air_time is given in minutes. If you just want to show the avg_speed and tailnum and none of the other variables, use the select function at the end of your pipe to select just these two variables with select(avg_speed, tailnum). You can google this tail number to find out more about the aircraft.

[expand title=View Answer] N666DN [/expand]
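The hint in Q8 (distance divided by hours, with `air_time` in minutes) translates to a one-line calculation. The sketch below uses base R on a tiny invented data frame; the tail numbers and values are hypothetical, and the course itself uses a dplyr pipeline on the real flights data.

```r
# Hypothetical toy data; tail numbers and values are invented.
flights <- data.frame(
  tailnum  = c("N0001", "N0002", "N0003"),
  distance = c(500, 1000, 2500),  # miles
  air_time = c(75, 120, 330)      # minutes
)

# Average speed in mph: convert minutes to hours before dividing
flights$avg_speed <- flights$distance / (flights$air_time / 60)

# Row with the highest average speed, showing only the two columns of interest
fastest <- flights[which.max(flights$avg_speed), c("avg_speed", "tailnum")]
print(fastest)
```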

Q9. Make a scatterplot of avg_speed vs. distance. Which of the following is true about the relationship between average speed and distance?

[expand title=View Answer] As distance increases the average speed of flights decreases. [/expand]

Q10. Suppose you define a flight to be “on time” if it gets to the destination on time or earlier than expected, regardless of any departure delays. Mutate the data frame to create a new variable called arr_type with levels “on time” and “delayed” based on this definition. Also mutate to create a new variable called dep_type with levels “on time” and “delayed” depending on whether the flight was delayed for fewer than 5 minutes or 5 minutes or more, respectively. In other words, if arr_delay is 0 minutes or fewer, arr_type is “on time”. If dep_delay is less than 5 minutes, dep_type is “on time”. Then, determine the on-time arrival percentage based on whether the flight departed on time or not. What fraction of flights that were “delayed” departing arrive “on time”? (Enter the answer in decimal point, like 0.xx)

[expand title=View Answer] 0.791818181818182 (approximately) [/expand]
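The classification rules in Q10 can be sketched with `ifelse()` in base R. The six flights below are invented for illustration, so the resulting fraction is not the quiz answer; only the variable names and thresholds come from the question.

```r
# Hypothetical toy data; delays are in minutes and invented.
flights <- data.frame(
  dep_delay = c(-2, 10, 30, 0, 7, 4),
  arr_delay = c(-5, -1, 25, 3, 0, -10)
)

# arr_type: "on time" when arr_delay is 0 minutes or fewer
flights$arr_type <- ifelse(flights$arr_delay <= 0, "on time", "delayed")

# dep_type: "on time" when dep_delay is less than 5 minutes
flights$dep_type <- ifelse(flights$dep_delay < 5, "on time", "delayed")

# Fraction arriving on time, split by departure type
on_time_rate <- tapply(flights$arr_type == "on time", flights$dep_type, mean)
print(on_time_rate)
```

The entry of `on_time_rate` labelled "delayed" is the fraction the question asks for.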

### Week 6: Introduction to Probability and Data with R Coursera Quiz Answers

#### Quiz 1: Practice Quiz

Q2. Which of the following is false about probability distributions?

[expand title=View Answer] The outcomes listed must be independent. [/expand]

Q3. Last semester, out of 170 students taking a particular statistics class, 71 students were “majoring” in social sciences and 53 students were majoring in pre-medical studies. There were 6 students who were majoring in both pre-medical studies and social sciences. What is the probability that a randomly chosen student is majoring in social sciences, given that s/he is majoring in pre-medical studies?

[expand title=View Answer] 6/53 [/expand]
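The answer follows from the definition of conditional probability, P(A | B) = P(A and B) / P(B). A quick check in R, using the counts from the question (the variable names are arbitrary):

```r
n_students <- 170
n_premed   <- 53   # majoring in pre-medical studies
n_both     <- 6    # majoring in both pre-med and social sciences

# P(social sciences | pre-med) = P(both) / P(pre-med)
p_both   <- n_both / n_students
p_premed <- n_premed / n_students
p_social_given_premed <- p_both / p_premed

p_social_given_premed
```

The class size cancels, leaving 6/53, or about 0.113.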

#### Quiz 2: Week 3 Quiz

Q1. Which of the following states that the proportion of occurrences with a particular outcome converges to the probability of that outcome?

[expand title=View Answer] Law of large numbers [/expand]
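A quick simulation illustrates the law of large numbers. This sketch assumes a fair coin (probability 0.5) and an arbitrary seed; neither comes from the quiz.

```r
set.seed(123)

# 10,000 simulated fair coin flips (1 = heads, 0 = tails)
flips <- rbinom(10000, size = 1, prob = 0.5)

# Running proportion of heads after each flip
running_prop <- cumsum(flips) / seq_along(flips)

# The proportion drifts toward 0.5 as the number of flips grows
running_prop[c(10, 100, 1000, 10000)]
```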

Q2. Shown below are four Venn diagrams. In which of the diagrams does the shaded area represent A and B but not C?

[expand title=View Answer]Diagram D [/expand]

Q3. Each choice below shows a suggested probability distribution for the method of access to online course materials (desktop computer, laptop computer, tablet, smartphone). Determine which is a proper probability distribution.

[expand title=View Answer]desktop computer: 0.20, laptop computer: 0.20, tablet: 0.20, smartphone: 0.20[/expand]

Q4. Assortative mating is a nonrandom mating pattern where individuals with similar genotypes and/or phenotypes mate with one another more frequently than what would be expected under a random mating pattern. Researchers studying this topic collected data on eye colors of 204 Scandinavian men and their female partners. The table below summarizes the results. For simplicity, assume heterosexual relationships. What is the probability that a randomly chosen couple is comprised of a male and female with blue eyes?

(Reference: Laeng, Bruno, Ronny Mathisen, and Jan-Are Johnsen. “Why do blue-eyed men prefer women with the same eye color?.” Behavioral Ecology and Sociobiology 61.3 (2007): 371-384.)

[expand title=View Answer] 78/204 [/expand]

Q5. Which of the following statements is false?

[expand title=View Answer] Two independent events cannot occur at the same time.[/expand]

### Week 7: Introduction to Probability and Data with R Coursera Quiz Answers

#### Quiz 1: Probability

Q1. Fill in the blank: A streak length of 1 means one _____ followed by one miss.

[expand title=View Answer] hit [/expand]

Q2. Fill in the blank: A streak length of 0 means one _____ which must occur after a miss that ended the preceding streak.

[expand title=View Answer] miss [/expand]

Q3. Which of the following is false about the distribution of Kobe’s streak lengths from the 2009 NBA finals?

[expand title=View Answer] The longest streak of baskets is of length 4. [/expand]

Q4. If you were to run the simulation of the independent shooter a second time, how would you expect its streak distribution to compare to the distribution from the exercise above?

[expand title=View Answer] Somewhat similar [/expand]

Q5. How does Kobe Bryant’s distribution of streak lengths compare to the distribution of streak lengths for the simulated shooter? Using this comparison, do you have evidence that the hot hand model fits Kobe’s shooting patterns?

[expand title=View Answer] The distributions look very similar. Therefore, there doesn’t appear to be evidence for Kobe Bryant’s hot hand. [/expand]
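An independent shooter and its streak lengths can be simulated in a few lines of base R. This is a sketch under assumptions: the 45% hit probability, the 133 shots, and the seed are illustrative values not stated on this page, and `streak_lengths` is a hypothetical helper that counts consecutive hits before each miss, matching the streak definition in Q1 and Q2.

```r
set.seed(2009)

# Hypothetical helper: number of consecutive hits ("H") before each miss.
# The sequence is padded with a miss at both ends so the first and last
# streaks are counted as well.
streak_lengths <- function(shots) {
  hits <- c(0, as.integer(shots == "H"), 0)
  misses <- which(hits == 0)
  diff(misses) - 1
}

# Independent shooter: each shot is a hit with probability 0.45 (assumed)
sim_shots <- sample(c("H", "M"), size = 133, replace = TRUE,
                    prob = c(0.45, 0.55))

# Distribution of simulated streak lengths
table(streak_lengths(sim_shots))
```

Tabulating the streaks of the real shot sequence with the same helper would allow a side-by-side comparison of the two distributions.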

### Week 8: Introduction to Probability and Data with R Coursera Quiz Answers

#### Quiz 1: Practice Quiz

Q1. Heights of 10-year-olds, regardless of gender, closely follow a normal distribution with a mean 55 inches and a standard deviation of 6 inches. Which of the following is true?

[expand title=View Answer] Roughly 95% of 10-year-olds are between 37 and 73 inches tall.[/expand]

Q2. While it is often assumed that the probabilities of having a boy or a girl are the same, the actual probability of having a boy is slightly higher at 0.51. Suppose a couple plans to have 3 children. What is the probability that exactly 2 of them will be boys?

[expand title=View Answer] 0.38 [/expand]
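The probability in Q2 is a binomial calculation with n = 3, k = 2, and p = 0.51. A short check in R, both by the formula and with the built-in `dbinom()`:

```r
p_boy <- 0.51

# Binomial formula: choose(3, 2) * p^2 * (1 - p)^1
p_manual <- choose(3, 2) * p_boy^2 * (1 - p_boy)

# Same calculation with the built-in binomial density
p_builtin <- dbinom(2, size = 3, prob = p_boy)

c(manual = p_manual, builtin = p_builtin)
```

Both give about 0.382.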

Q3. You are about to take a multi-day tour through a national park which is famous for its wildlife. The tour guide tells you that on any given day there’s a 61% chance that a visitor will see at least one “big game” animal and a 39% chance they’ll see no big game animals; When the tour guide says “big game”, he refers to either a moose or a bear. The guide assures you that big game sightings on a single day are independent of any other day’s sightings. Given the information from the tour guide, which of the following calculations cannot be performed using a binomial distribution?

[expand title=View Answer] Calculate the probability that you see at least 4 big game animals on the first day of a 5-day trip.[/expand]

Q4. Your friend is about to begin an introductory chemistry course at his university. The course has collected data from students on their study habits for many years, and the professor reports that study times (in hours) for the final exam closely follow a normal distribution with a mean 24 and a standard deviation 4. What percentage of students study 34 hours or more?

[expand title=View Answer] Less than 2.5% [/expand]
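The answer to Q4 can be checked with a Z score and the normal upper tail. The mean, standard deviation, and cutoff come from the question; the variable names are arbitrary.

```r
mu    <- 24   # mean study time in hours
sigma <- 4    # standard deviation in hours

# Z score for 34 hours
z <- (34 - mu) / sigma

# Upper-tail probability P(X >= 34) under the normal model
p_upper <- pnorm(34, mean = mu, sd = sigma, lower.tail = FALSE)

c(z = z, p_upper = p_upper)
```

A Z score of 2.5 lies beyond 2 standard deviations, so the upper tail is smaller than the 2.5% that lies above Z = 2 under the 68-95-99.7 rule.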

Q5. Which of the following is false? Hint: It might be useful to sketch the distributions.

[expand title=View Answer] The Z score for the median of a left-skewed distribution is most likely negative. [/expand]

Q6. About 30% of human twins are identical, and the rest are fraternal. Identical twins are necessarily the same sex, half are males and the other half are females. One-quarter of fraternal twins are both males, one-quarter are both female and one-half are mixed: one male, one female. You have just become a parent of twins and are told they are both girls. Given this information, what is the probability that they are identical?

[expand title=View Answer] 46% [/expand]
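Q6 is an application of Bayes' theorem. A minimal sketch using the probabilities given in the question (variable names are arbitrary):

```r
p_identical <- 0.30
p_fraternal <- 1 - p_identical

# P(both girls | twin type), from the question
p_gg_given_identical <- 0.50
p_gg_given_fraternal <- 0.25

# Bayes' theorem: P(identical | both girls)
numerator   <- p_identical * p_gg_given_identical
denominator <- numerator + p_fraternal * p_gg_given_fraternal
p_identical_given_gg <- numerator / denominator

p_identical_given_gg
```

The calculation is 0.15 / (0.15 + 0.175) = 0.15 / 0.325, or about 0.46.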

Q7. Which of the following probabilities can be calculated using the normal approximation to the binomial distribution?

[expand title=View Answer] A September 2011 Gallup poll suggests that 56% of Americans do not have a great deal of confidence in the mass media to report the news fully, accurately, and fairly. What is the probability that in a random sample of 20 people, 10 or more of them have confidence in the mass media? [/expand]

#### Quiz 2: Week 4 Quiz

Q1. Suppose that scores on a national entrance exam are normally distributed with a mean 1000 and a standard deviation of 100. Which of the following is false?

[expand title=View Answer] A normal probability plot of national entrance exam scores of a random sample of 1,000 people should show a straight line. [/expand]

Q2. A 2005 survey found that 7% of teenagers (ages 13 to 17) suffer from an extreme fear of spiders (arachnophobia). At a summer camp, there are 10 teenagers sleeping in each tent. Assume that these 10 teenagers are independent of each other. What is the probability that at least one of them suffers from arachnophobia?

[expand title=View Answer] 52% [/expand]
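The "at least one" probability in Q2 is easiest to compute through the complement, P(at least one) = 1 − P(none). A quick check in R with the values from the question:

```r
p_fear <- 0.07   # probability a teenager has arachnophobia
n_teen <- 10     # teenagers per tent, assumed independent

# Complement rule: 1 - P(no one in the tent has arachnophobia)
p_at_least_one <- 1 - (1 - p_fear)^n_teen

# Same result via the binomial distribution
p_binom <- 1 - dbinom(0, size = n_teen, prob = p_fear)

c(complement = p_at_least_one, binomial = p_binom)
```

Both approaches give about 0.516.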

Q3. Your roommate loves to eat Chinese food for dinner. He estimates that on any given night, there’s a 30% chance he’ll choose to eat Chinese food. Although he loves Chinese food, he doesn’t like to eat it too much in a short period of time, so on most weeks he eats several different kinds of foods for dinner. Suppose you wanted to calculate the probability that, over the next 7 days, your friend eats Chinese food at least 3 times. Which of the following is the most accurate statement about calculating this probability?

[expand title=View Answer]Because we know n = 7, k = 3, and p = 0.30, we can use the binomial distribution to calculate the desired probability. [/expand]

Q4. Which of the following, on its own, is the least useful method for assessing if the data follow a normal distribution?

[expand title=View Answer] Check if the mean and median are equal.[/expand]

Q5. Which of the following is true? Hint: It might be useful to sketch the distributions.

[expand title=View Answer] The Z score for the median is approximately 0 if the distribution is bimodal and symmetric. [/expand]

Q6. More than three-quarters of the nation’s colleges and universities now offer online classes, and about 23% of college graduates have taken a course online. 39% of those who have taken a course online believe that online courses provide the same educational value as one taken in person, a view shared by only 27% of those who have not taken an online course. At a coffee shop, you overhear a recent college graduate discussing that she doesn’t believe that online courses provide the same educational value as one taken in person. What’s the probability that she has taken an online course before?

[expand title=View Answer] 0.1997 [/expand]
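Q6 asks for a conditional probability, so the joint probability has to be divided by the overall probability of not believing in equal value. A sketch of the Bayes calculation with the percentages from the question (variable names are arbitrary):

```r
p_online     <- 0.23
p_not_online <- 1 - p_online

# P(does NOT believe in equal value | group)
p_no_given_online     <- 1 - 0.39
p_no_given_not_online <- 1 - 0.27

# Joint probability: took an online course AND does not believe
p_joint <- p_online * p_no_given_online

# Total probability of not believing
p_no <- p_joint + p_not_online * p_no_given_not_online

# Bayes' theorem: P(online | does not believe)
p_online_given_no <- p_joint / p_no

c(joint = p_joint, posterior = p_online_given_no)
```

The joint probability is 0.1403 and the total is 0.7024, giving a conditional probability of about 0.1997.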

Q7. One strange phenomenon that sometimes occurs at U.S. airport security gates is that an otherwise law-abiding passenger is caught with a gun in his/her carry-on bag. Usually the passenger claims he/she forgot to remove the handgun from a rarely-used bag before packing it for airline travel. It’s estimated that every day 3,000,000 gun owners fly on domestic U.S. flights. Suppose the probability a gun owner will mistakenly take a gun to the airport is 0.00001. What is the probability that tomorrow more than 35 domestic passengers will accidentally get caught with a gun at the airport? Choose the closest answer.

[expand title=View Answer] 0.18 [/expand]
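With n = 3,000,000 and p = 0.00001, the expected count is np = 30, and the normal approximation to the binomial is a reasonable way to estimate the upper tail. A sketch in R, without a continuity correction (variable names are arbitrary):

```r
n <- 3000000
p <- 0.00001

# Mean and standard deviation of the binomial count
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))

# Normal approximation to P(X > 35)
z <- (35 - mu) / sigma
p_more_than_35 <- pnorm(z, lower.tail = FALSE)

c(mean = mu, sd = sigma, z = z, p = p_more_than_35)
```

The Z score is about 0.91, so roughly 18% of the distribution lies above 35 and about 82% lies below it.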

**Conclusion:**

In conclusion, our journey through the Introduction to Probability and Data with R course has been a fascinating exploration of the fundamental concepts that underpin the world of data analysis and statistics. We’ve delved into the principles of probability, statistical inference, and data visualization, equipping ourselves with essential tools for making sense of the vast sea of information that surrounds us.
