### Get All Weeks Linear Regression in R for Public Health Coursera Quiz Answers

Welcome to Linear Regression in R for Public Health!

Public Health has been defined as “the art and science of preventing disease, prolonging life and promoting health through the organized efforts of society”. Knowing what causes disease and what makes it worse are clearly vital parts of this. This requires the development of statistical models that describe how patient and environmental factors affect our chances of getting ill.

This course will show you how to create such models from scratch, beginning with introducing you to the concept of correlation and linear regression before walking you through importing and examining your data, and then showing you how to fit models. Using the example of respiratory disease, these models will describe how patients and other factors affect outcomes such as lung function.

### Linear Regression in R for Public Health Coursera Quiz Answers

#### Week 1 Quiz Answers

#### Quiz 1: Linear Regression Models: Behind the Headlines

Q1. You have just read a short extract about Loss of Control over eating in pregnancy.

As you can imagine, this short extract does not provide enough information for a full understanding of all the issues; however, it should provide enough information to prompt initial thoughts on this topic of research.

Work through and answer these reflective questions (1-5), which aim to prompt you to think about when and why a linear regression model would be appropriate to use.

What do you think the research question in this study might have been?


Q2. Can you identify the likely outcome variable of interest for this research question?


Q3. Do you think a linear regression model could be a suitable method of analysis for this research question? If so, why?


Q4. What explanatory variables might have been included in the model?


Q5. What was the purpose of this model?

- Evaluation
- **Understanding causes**
- Prediction (diagnosis/prognosis)

#### Quiz 2: Correlations Quiz Answers

Q1. Which of the following plots show negative correlation?

- C
- **A**
- B

Q2. Which plot shows the strongest association?

- **B**
- A

Q3. Is the following relationship linear?

- Yes
- **No**

Q4. Is the following relationship linear?

- **Yes**
- No

Q5. Does the plot show an association between the two variables?

- Yes
- **No**

Q6. What would the Pearson’s correlation coefficient be for the following plot?

- -0.9
- 0.03
- 10
- 0.9
- **0.2**

Q7. What would the Pearson’s correlation coefficient be for the following plot?

- 10
- -0.9
- **0.9**
- 0.03
- 0.2

Q8. What would the Pearson’s correlation coefficient be for the following plot?

- -0.9
- **0.03**
- 10
- 0.9
- 0.2
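To build intuition for Q6 to Q8, it can help to compute Pearson's r on a few points yourself. The sketch below is written in Python purely for illustration (in R, `cor(x, y)` returns the same statistic); the data values are made up and are not the plots from the quiz. It computes r as the covariance of the two variables divided by the product of their standard deviations.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation: covariance of x and y scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data (not the course dataset or the quiz plots)
x = [1, 2, 3, 4, 5, 6]
y_neg = [12, 10, 9, 6, 5, 2]   # falls steadily as x rises: strong negative correlation
y_none = [3, 7, 2, 6, 4, 5]    # no clear pattern: correlation close to 0

print(round(pearson_r(x, y_neg), 2))
print(round(pearson_r(x, y_none), 2))
```

A tight downward cloud gives r near -1, while a shapeless cloud gives r near 0; a value such as 10 is impossible because r is always between -1 and 1.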

Q9. Matyas wants to see if there is an association between gender and systolic blood pressure. He decides that the Pearson’s correlation coefficient would be an appropriate statistic for this. Is Matyas correct?

- Yes
- **No**

Q10. Francesca wants to look at the association between systolic blood pressure and age. Which of the following conditions must be satisfied for her to be able to proceed with Pearson’s correlation? Select all answers that apply.

- **Both variables must be continuous**
- One variable should be known or suspected to cause a change in the other variable
- At least one variable must be continuous
- **Observations must come from a random sample from the population**
- **Both variables should be approximately normally distributed in the population to calculate valid confidence intervals.**

Q11. Andrea calculates the correlation between a person’s sugar consumption over a day and the minutes of sleep achieved the previous night. She gets a value of r = −0.9 (95% confidence interval (-0.96, -0.86), p-value < 0.001). She concludes that increased sleep leads to a reduction in sugar consumption. Is this statement true or false?

- True
- **False**

#### Quiz 3: Spearman Correlation Quiz Answers

Q1. You have now covered two approaches that can be used to estimate the correlation. It is important that you are clear when each of these are appropriate to use. The questions below will help clarify your understanding on which to use and when.

Mo has a sample of data from patients attending his clinic that contains information on heights and weights. He wants to examine whether there is an association between these variables. He decides his first step should be to plot a scatterplot of these variables. His plot looks like this:

Please select the best answer from below. Based on the plot, it is appropriate for Mo to calculate:

- Pearson’s correlation coefficient
- Spearman’s correlation coefficient
- **Either**
- Neither

Q2. Ben is interested in the relationship between daily calorie consumption and maternal BMI. He has a sample of data from a clinic. He wants to know if there is an association between these variables. His first step is to produce a scatterplot of these variables. His plot looks like this:

Based on this plot, is it strictly appropriate for Ben to calculate the:

- Pearson’s correlation coefficient
- Spearman’s correlation coefficient
- Either
- **Neither**

Q3. Pippa wants to establish whether there is a relationship between gestational age and baby’s weight gain in the first 6 months. Her data looks like this:

Based on this plot, is it strictly appropriate for Pippa to calculate the:

- Pearson’s correlation coefficient
- **Spearman’s correlation coefficient**
- Either
- Neither

Q4. Zoe wants to look at the association between pain scores measured on a pain questionnaire (range 0-10) and current age in a sample of arthritis patients. Which of the following conditions must be satisfied for her to be able to proceed with Spearman’s correlation?

Select all answers that apply.

- **There should be a monotonic relationship between the two variables.**
- One variable should be known or at least suspected to cause a change in the other variable.
- Both variables must be continuous.
- **Both variables must be either continuous or ordinal.**
- **Observations must come from a random sample from the population.**
- At least one variable must be continuous.

Q5. Charlie wants to look at the association between pain scores (scale 0-10) and ethnicity in the same sample of arthritis patients. He decides that the Spearman’s correlation coefficient would be an appropriate statistic for this. Is Charlie correct?

- Yes
- **No**

#### Quiz 4: Practice Quiz on Linear Regression Quiz Answers

Q1. This week has covered the basics of linear regression and introduced you to the model equation. The questions below will help you verify that you are clear on how to set up a model and interpret regression coefficients.

Luis wants to perform a linear regression to examine the relationship between the number of failures of a piece of hospital equipment and the age of the equipment. He examines the scatterplot and finds the following:

Is it reasonable for Luis to perform a linear regression in this scenario?

- Yes
- **No**

Q2. Parveen wants to quantify the relationship between age and lung function measured by FEV1. She has looked at the scatterplot and is satisfied that there is a linear relationship between these two variables, so she decides to run a linear regression. Which of these variables should she use as the outcome (or dependent) variable?

- **FEV1**
- AGE
- Neither

Q3. Parveen fits the model and gets the following output:

FEV1= α+ β∗AGE=2.21+(−0.009)∗AGE

How should she interpret α = 2.21?

- α = 2.21 is the age value we would expect for people who have an FEV1 value of 0.
- **α = 2.21 is the FEV1 value we would expect for people who are aged 0 years.**
- α = 2.21 is the average increase in age for every 1 unit increase in FEV1.
- α = 2.21 is the average increase in FEV1 for every one year increase in age.

Q4. How should Parveen interpret β = -0.009?

- β = -0.009 is the age value we would expect for people who have an FEV1 value of 0.
- **β = -0.009 is the average increase in FEV1 for every one year increase in age.**
- β = -0.009 is the FEV1 value we would expect for people who are aged 0 years.
- β = -0.009 is the average increase in age for every 1 unit increase in FEV1.

Q5. She now wants to look at the impact of gender (male/female) on lung function measured by FEV1. She wants to run a linear regression but isn’t sure if she can do this with a variable that isn’t continuous (gender). Is she ok to proceed?

- **Yes**
- No

Q6. Parveen fits the model with FEV1 and gender and gets the following output:

FEV1= α+ β∗gender =1.32+(0.44)∗gender

where gender equals 0 if female and 1 if male.

How does she interpret the β coefficient?

- **β = 0.44 is the average increase in FEV1 for every 1 unit increase in gender.**
- β = 0.44 is the average increase in gender for every 1 unit increase in FEV1.
- β= 0.44 is the FEV1 value when gender equals 0.
- β = 0.44 is the value of gender when FEV1 equals 0.

Q7. If the linear regression equation for the relationship between blood pressure and age is given by:

blood pressure=α+ β*age

where α= 40 and β=0.5

What is the predicted blood pressure for a fifty-year-old?

- **65**
- 2000.5
- 90.5
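The prediction is just the fitted equation evaluated at age 50. A minimal Python sketch, using the α = 40 and β = 0.5 given in the question:

```python
def predict_bp(age, alpha=40.0, beta=0.5):
    """Predicted blood pressure from the simple linear model bp = alpha + beta * age."""
    return alpha + beta * age

print(predict_bp(50))  # 40 + 0.5 * 50
```

The distractors come from misapplying the formula: 40 × 50 + 0.5 = 2000.5, and 40 + 50 + 0.5 = 90.5.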

#### Quiz 5: End of Week Quiz Answers

Q1. When we fit a linear regression model we make strong assumptions about the relationships between variables and about the variance. These assumptions need to be checked if we are to be confident in the estimated model parameters. The questions below will help ascertain that you know what assumptions are made and how to verify them.

Which of these is not assumed when fitting a linear regression model?

- Linearity between outcome and predictor variable.
- Variance of the outcome is constant across values of the predictor variable.
- The outcome variable is normally distributed across values of the predictor variable.
- **The predictor variable is normally distributed across values of the outcome variable.**

Q2. Recall that the residuals are the distance between the observed values and the fitted regression line. If the assumptions of linear regression hold how would we expect the residuals to behave?

- **Residuals ~ Normal (0, σ²)**
- Residuals ~ Normal (0, 1)
- Residuals ~ Uniform (0, 1)
- Residuals ~ Normal (1, 0)

Q3. Parveen previously fitted a linear regression model to quantify the relationship between age and lung function measured by FEV1. After she fitted her linear regression model she decided to assess the validity of the linear regression assumptions. She knew she could do this by assessing the residuals and so produced the following plot known as a QQ plot.

How can she use this plot to see if her residuals satisfy the requirements for a linear regression?

- If the variance is constant across the predictor values the observations will lie on a straight line.
- **If the residuals have a normal distribution then the observations will lie on a straight line.**
- Both options
- Neither option

Q4. Parveen also produced this plot:

How can she use this plot to see if her residuals satisfy the requirements for a linear regression?

- **If there is constant variance across predicted values then there will be an equal scatter of residual values around the mean value of zero.**
- If the residuals are normal then the observations will fall on the y=0 line.
- If there is constant variance then there will be an uneven scatter of residual values around the mean value of zero.
- If there is non-constant variance then there will be an equal scatter of residual values around the mean value of zero.

Q5. Parveen notes that the value of the R2 for her model is 0.0104. What does this tell her?

- **The regression line explains 1.04% of the total variability in her FEV1 data.**
- The residuals explain 1.04% of the total variability in her FEV1 data.
- Neither option.
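R² compares the variability left in the residuals with the total variability of the outcome around its mean: R² = 1 − SS_res / SS_tot. The Python sketch below (illustration only, on five hypothetical points rather than Parveen's data) fits a least-squares line and computes R² from those two sums of squares.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx, beta

def r_squared(y, fitted):
    """Share of the total variability in y (around its mean) captured by the fitted values."""
    my = sum(y) / len(y)
    ss_tot = sum((v - my) ** 2 for v in y)                  # total variability
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))   # variability left in the residuals
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]   # hypothetical predictor values
y = [2, 1, 4, 3, 5]   # hypothetical outcome values
alpha, beta = fit_simple_ols(x, y)
fitted = [alpha + beta * v for v in x]
print(round(r_squared(y, fitted), 2))
```

For these points R² is 0.64, which is also the square of Pearson's r (0.8); that equality holds for any simple linear regression with an intercept. On the same scale, an R² of 0.0104 means the line accounts for about 1.04% of the variability.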

Q6. She decides to add gender into her model with age. She wants to compare if this model explains more of the variability in her data than the model with just age. How could she make this comparison?

- **Compare the adjusted R-squared statistic.**
- Compare the R-squared statistic.
- Plot the residuals against the fitted values for each model and choose the model with the most even scatter of residual values around the mean value of zero.
- Compare the QQ plots for each model and choose the model with best-fitting residuals.
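Plain R² can never fall when a predictor is added, so it always favours the bigger model; adjusted R² penalises each extra predictor: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), for n observations and p predictors. The Python sketch below uses Parveen's R² of 0.0104 for the age-only model, but the sample size (n = 100) and the R² for the age + gender model (0.0110) are hypothetical values chosen only to show the penalty at work. In R, `summary(fit)$adj.r.squared` reports this statistic for an `lm` fit.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 100                                    # hypothetical sample size
age_only = adjusted_r2(0.0104, n, p=1)     # R^2 from the age-only model
age_gender = adjusted_r2(0.0110, n, p=2)   # hypothetical: R^2 rises only slightly
print(round(age_only, 4), round(age_gender, 4))
```

Here R² rises when gender is added, but adjusted R² falls, suggesting gender does not explain enough extra variability to justify the additional parameter.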

Q7. Ching-yi is interested in the relationship between depression scores as measured by the Hospital Anxiety and Depression Scale (HADS) and lung function measured by FEV1. She fits the following model

FEV1= α+ β∗HADS=1.73+(−0.0116)∗HADS

She produces the following plots to assess whether her model satisfies the assumptions of linear regression:

Are the assumptions of linear regression satisfied?

- **No – the plots suggest violation of the assumption that the residuals ~ Normal (0, σ²)**
- Yes – the QQ plot shows that most values lie on the straight line and the drift in the tails doesn’t matter.
- Yes – the histogram looks symmetrical around the value of zero, the QQ plot shows that most values lie on the straight line, and the scatterplot shows equal scatter around the mean value of zero.

#### Week 2 Quiz Answers

#### Quiz 1: Linear Regression Quiz Answers

Q1. You have had the opportunity to run correlation and linear regression models in R and examine the output. The questions below will help you practice interpreting the results before we move on to multiple linear regression.

In the previous Practice with R, you fitted a linear regression model and found the following relationship – MWT1best= α+β∗AGE=616.5+ (−3.10)*AGE

How should you interpret α = 616.5?

- α = 616.5 is the estimated age we would expect for people who have a walking distance of 0 metres.
- α = 616.5 is the average increase in walking distance for every one year increase in age.
- α = 616.5 is the average increase in age for every one metre increase in walking distance.
- **α = 616.5 is the estimated walking distance we would expect for people who are aged 0 years.**

Q2. How should you interpret β = -3.10?

- **β = -3.10 is the average increase in walking distance for every one year increase in age.**
- β = -3.10 is the average increase in age for every one metre increase in walking distance.
- β = -3.10 is the estimated walking distance we would expect for people who are aged 0 years.
- β = -3.10 is the estimated age we would expect for people who have a walking distance of 0 metres.

Q3. The β coefficient had a 95% confidence interval that ranged from -5.74 to -0.47. What does this indicate?

- The 95% confidence interval ranges from -5.74 to -0.47 indicating that the sample estimate has a 95% chance of lying in this range.
- **The 95% confidence interval ranges from -5.74 to -0.47 indicating that there is a 95% chance the population parameter will lie in this range.**

Q4. What does the confidence interval for the β coefficient tell you about the statistical significance of β?

- **The 95% confidence interval does not include 0 so it is a statistically significant result at the 5% significance level.**
- The 95% confidence interval does not include 0 so it’s not a statistically significant result at the 5% significance level.
- Neither of the above, the 95% confidence interval doesn’t tell you anything about statistical significance.
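The link between a 95% confidence interval and significance at the 5% level can be written as a one-line check: for a linear regression coefficient, the interval β̂ ± t·SE excludes 0 exactly when the two-sided t-test of β = 0 gives p < 0.05, because both use the same t distribution. A minimal Python sketch, applied to intervals quoted in these quizzes:

```python
def excludes_zero(lower, upper):
    """True if a 95% CI for a regression coefficient lies entirely on one side of 0,
    i.e. the coefficient is statistically significant at the 5% level (two-sided)."""
    return lower > 0 or upper < 0

print(excludes_zero(-5.74, -0.47))  # AGE and walking distance
print(excludes_zero(-1.9, -0.25))   # PackHistory and walking distance
print(excludes_zero(-50.6, 65.3))   # smoking status and walking distance
```

An interval that straddles 0, like the last one, means we do not reject the null hypothesis; it does not prove there is no relationship.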

Q5. The model produced an R2 value of 0.0529. What does this indicate?

- **That the regression line is explaining 5% of the total variance in the walking distance observations.**
- The residuals explain 5% of the total variability in the age observations.
- The residuals explain 5% of the total variability in the walking distance observations.
- That the regression line is explaining 5% of the total variance in the age observations.

Q6. In Practice with R: Why Spearman’s and Pearson’s may differ slightly, you found that the Pearson’s correlation coefficient between age and walking distance was -0.23 and that the Spearman’s correlation was -0.27. How would you interpret these correlation coefficients?

- **Weak negative correlation**
- Weak positive correlation
- Strong positive correlation
- Strong negative correlation

Q7. The Pearson’s and Spearman’s correlations identified in Practice with R: Why Spearman’s and Pearson’s may differ slightly are slightly different. Think back to the analysis you performed when examining the association between AGE and MWT1BEST. Which of the options below best explains this?

- **A violation of the assumptions of Pearson’s correlation.**
- A discontinuation in the distributions.
- An outlying observation.
- None of the above.

#### Quiz 2: End of Week Quiz

Q1. So far in this course you have learnt different approaches to analysing continuous outcome data using correlation, simple and multiple linear regression, and how to compare models. You have tried these out in R. The quiz below will allow you to practice selecting the correct approach to analysis and further practice your interpretation skills.

Milly wants to examine the relationship between walking distance and BMI in COPD patients. Given the limited information you have, what analysis would you suggest she perform? Choose the single best option.

- Calculate a correlation coefficient
- Run a linear regression model
- **Both**

Q2. Milly also wants to know if there is a relationship between walking distance and smoking status (with categories ‘current’ or ‘ex’-smokers).

Which of the following should Milly calculate?

- Spearman’s correlation
- Pearson’s correlation
- **Linear regression**

Q3. The β coefficient had a 95% confidence interval that ranged from -5.74 to -0.47. What does this indicate?

- The 95% confidence interval ranges from -5.74 to -0.47 indicating that the sample estimate has a 0.95 probability of lying in this range.
- **The 95% confidence interval ranges from -5.74 to -0.47 indicating that the population parameter has a 0.95 probability of lying in this range.**

Q4. Milly decides to use the more detailed assessment of smoking status captured by the variable PackHistory (which records a person’s pack years smoking, where pack years is defined as twenty cigarettes smoked every day for one year) to explore the relationship between walking distance and smoking status.

Milly finds

MWT1best = α + β∗PackHistory = 442.2 − 1.1∗PackHistory

and the corresponding 95% confidence interval for β ranges from -1.9 to -0.25. What does this indicate?

- **The 95% confidence interval ranges from -1.9 to -0.25 indicating a statistically significant result at the 0.05 significance level.**
- The 95% confidence interval ranges from -1.9 to -0.25 indicating a non-statistically significant result at the 0.05 significance level.
- Neither.

Q5. Milly decides to fit the multivariable model with age, FEV1 and smoking pack years as predictors:

MWT1best = α + β1∗AGE + β2∗FEV1 + β3∗PackHistory

Milly is wondering whether this is a reasonable model to fit. Which of these do you think Milly could be concerned about?

- **That smoking PackHistory and FEV1 might be highly correlated.**
- That she has fitted the model with the wrong outcome.
- Both.
- Neither

Q6. How can Milly assess if there is an issue of collinearity in her model?

- Calculate variance inflation factors.
- **Calculate pairwise correlation coefficients**
- Examine the adjusted R2

Q7. Milly has now fitted several models and she wants to pick a final model. What statistic(s) can help her make this decision?

- **Adjusted R2**
- Correlation coefficients
- 95% confidence intervals for each coefficient
- All of the above

Q8. Milly has fitted the following 5 models. Using only the information in the table below, which model do you think Milly should take as her final one?

- Model 1
- Model 2
- Model 3
- Model 4
- **Model 5**

#### Week 3 Quiz Answers

#### Quiz 1: Fitting and interpreting model results

Q1. You should now be very familiar with how to fit multiple linear regression models in R. When multiple predictor variables are included, the interpretation of the regression parameters alters slightly and you need to think about what is being estimated. In the quiz below you will get the opportunity to test your ability to write regression equations and use them to help you interpret results appropriately.

Earlier we fitted the linear regression model that looked at the association between walking distance and COPD severity. copdseverity is a categorical variable that can take the values mild, moderate, severe or very severe. The R output looked like this:

Write out the model in algebraic form:


Q2. α= 458.09. What does this represent?

- α= 458.09 is the estimated average walking distance in all COPD patients.
- **α= 458.09 is the estimated average walking distance in patients with mild COPD.**
- α= 458.09 is the estimated average increase in walking distance in patients with mild COPD compared with all other severity levels of COPD patients.

Q3. β1 = −51.09. What does this represent?

- β1=−51.09 is the estimated mean difference in walking distance in patients with moderate COPD
- β1=−51.09 is the estimated mean difference in walking distance in patients with moderate COPD compared with all other COPD patients.
- **β1=−51.09 is the estimated mean difference in walking distance in patients with moderate COPD compared with mild COPD.**

Q4. β3 = −167.21. What does this represent?

- That very severe patients walked on average 167.21 metres more than mild patients.
- **That very severe patients walked on average 167.21 metres less than mild patients.**
- That very severe patients walked on average 167.21 metres less than moderate patients.

Q5. Tao wants to use the COPD dataset to examine if there is a relationship between walking distance (MWT1Best) and smoking status (smoking). Remember that the variable smoking has two categories representing ‘current’ or ‘ex’ smokers. Which of the following should Tao use to examine this relationship?

- **Linear regression**
- Spearman’s correlation
- Pearson’s correlation

Q6. Tao decides to fit a linear regression model between walking distance (MWT1Best) and smoking status (smoking). He finds the following: MWT1Best = α + β∗smoking = 385.6 + 7.3∗smoking, where smoking = 1 if current and 2 if ex. How should he interpret β?

- **7.3 is the average increase in walking distance for every one unit increase in smoking status.**
- 7.3 is the value of smoking status when walking distance equals 1.
- 7.3 is the walking distance value when smoking status equals 1.
- 7.3 is the average increase in smoking status for every 1 unit increase in walking distance.

Q7. Tao notices that the 95% confidence interval ranges from -50.6 to 65.3. What does this tell Tao?

- **The result is not statistically significant at the 5% level and he should not reject the null hypothesis of no relationship between the variables.**
- The result is statistically significant at the 5% level and he should reject the null hypothesis of no relationship between the variables.
- The result is not statistically significant at the 5% level and he should conclude there is no relationship between variables.

Q8. Tao is interested in estimating the mean walking distance in ex-smokers. He uses the regression equation to do this. What does he estimate the mean to be?

- **400.2**
- 392.9
- 385.6
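Because smoking is coded 1 (current) and 2 (ex), the ex-smoker mean is the intercept plus two lots of β. A minimal Python sketch of Tao's fitted equation:

```python
def mean_walk(smoking):
    """MWT1Best = 385.6 + 7.3 * smoking, with smoking coded 1 = current, 2 = ex."""
    return 385.6 + 7.3 * smoking

print(round(mean_walk(2), 1))  # ex-smokers
print(round(mean_walk(1), 1))  # current smokers
```

The distractor 392.9 is the estimate for current smokers, and 385.6 corresponds to smoking = 0, a code that no patient has. Coding a binary variable as 0/1 instead would make the intercept the mean of the 0 group directly.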

#### Quiz 2: Interpretation of interactions

Q1. Lina wants to investigate the effect of gender and Diabetes on lung function (FEV1). She plans to fit a linear regression model to investigate this. She suspects that the effect on FEV1 of diabetes in females is different from the effect of diabetes in males. She decides to include an interaction term in her model. Which interaction should she include in her model?

- gender*FEV1
- **gender*Diabetes**
- gender Diabetes FEV1
- Diabetes*FEV1

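Q2. Jaz is interested in the effect of gender and Diabetes on walking distance (MWT1best) in COPD patients. He too suspects an interaction between his predictors. He fits the following model:

$$\text{MWT1best} = \alpha + \beta_1\,\text{gender} + \beta_2\,\text{Diabetes} + \beta_3\,\text{gender} \times \text{Diabetes}$$

$$\text{MWT1best} = 403.5 + 24.1\,\text{gender} - 107.5\,\text{Diabetes} + 21.4\,\text{gender} \times \text{Diabetes}$$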

where gender = 0 if female and 1 if male, Diabetes = 0 if absent and 1 if present.

What does α=403.5 represent?

- α=403.5 is the estimated average distance a female with diabetes can walk.
- α=403.5 is the estimated average distance a male without diabetes can walk.
- **α=403.5 is the estimated average distance a female without diabetes can walk.**

Q3. Jaz wants to know the estimated average walking distance for a female with diabetes. Which is the correct answer?

- 403.5+24.1− 107.5+ 21.4= 341.5
- **403.5− 107.5= 296**
- 403.5− 107.5+ 21.4= 317.4
- 403.5+24.1= 427.6
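Each option above is the model evaluated at one combination of gender and Diabetes. The Python sketch below takes Jaz's coefficients as 403.5 (α), 24.1 (gender), −107.5 (Diabetes) and 21.4 (gender × Diabetes), the values and signs used in the answer options, and prints the estimated mean for all four groups.

```python
def mean_walk(gender, diabetes):
    """MWT1best = 403.5 + 24.1*gender - 107.5*Diabetes + 21.4*gender*Diabetes.
    gender: 0 = female, 1 = male; diabetes: 0 = absent, 1 = present."""
    return 403.5 + 24.1 * gender - 107.5 * diabetes + 21.4 * gender * diabetes

for g, gender_label in [(0, "female"), (1, "male")]:
    for d, diabetes_label in [(0, "no diabetes"), (1, "diabetes")]:
        print(gender_label, diabetes_label, round(mean_walk(g, d), 1))
```

The interaction term only contributes when both indicators equal 1, so the estimated effect of diabetes is −107.5 metres in females and −107.5 + 21.4 = −86.1 metres in males.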

#### Week 4 Quiz Answers

#### Quiz 1: Problems with automated approaches Quiz Answers

Q1. Thinking of the COPD dataset that you have been working with. Recall that there are 101 subjects in this dataset. Imagine you plan to fit a linear regression model where your outcome of interest is the HAD score (HAD). You want to better understand the causes of increased HAD scores so decide to include all candidate predictors in your model (the full list of predictors is given in 3.05). What problems might you run into if you fit this model?

- Overfitting
- Collinearity
- **Both overfitting and collinearity**
- Neither overfitting nor collinearity

Q2. You run a linear regression model but you suspect that you have included too many predictor variables for your sample size and have ‘overfitted’ your model. What are the implications on your results? Choose the single best answer.

- Model is explaining random error
- Lack of generalizability
- **Both**
- Neither

Q3. You decide it might be better to use an automated approach to select your predictors and fit a model using a stepwise approach. What are the potential implications on the estimated outputs?

- **Biased p-values**
- **Biased standard errors**
- **Biased regression coefficients**
- You’ll include all important predictors in your model

#### Quiz 2: End of Course Quiz

Q1. Imagine you fit a linear regression model and obtain the following output:

Based on an examination of the p-values, gender should not be included in the model:

- True
- **False**

Q2. Is AGE statistically significantly associated with HAD score (HAD)?

- **Yes, AGE is statistically significantly associated with a decrease in HAD score (HAD)**
- Yes, AGE is statistically significantly associated with an increase in HAD score (HAD)
- No, AGE is not associated with HAD score (HAD)

Q3. The comorbidity regression coefficient is 2.9; this means that the average HAD score is 2.9 units higher for each comorbidity a patient has, after adjusting for AGE, CAT scores (CAT) and gender.

- **True**
- False

Q4. The adjusted R-squared value of 0.318 means:

- That the model explains 3.2% of the total variability in the data
- **That the model explains 32% of the total variability in the data**
- There is 3.2% of the variability left unexplained by the model
- There is 32% of the variability left unexplained by the model

Q5. Look at the following model, which examines CAT score and smoking status as predictors of lung function (FEV1). It includes an interaction term between CAT and smoking status.

FEV1= α + β1*CAT+ β2*smoking+ β3*CAT*smoking

FEV1 =1.9 – 0.02*CAT + 1.05*smoking – 0.05*CAT*smoking

smoking = 0 if ex-smoker, = 1 if current smoker

What does α=1.9 represent?

- The estimated average FEV1 for ex-smokers compared to current smokers adjusted for CAT and CAT*smoking
- **The estimated average FEV1 for ex-smokers with a CAT score of zero.**
- The estimated average FEV1 for ex-smokers with a mean CAT score.

Q6. FEV1= α + β1*CAT+ β2*smoking+ β3*CAT*smoking

FEV1 =1.9 – 0.02*CAT + 1.05*smoking – 0.05*CAT*smoking

smoking = 0 if ex-smoker, = 1 if current smoker

What does the β1 regression coefficient of -0.02 represent?

- This is the estimated average decrease in FEV1 for every unit increase in CAT score.
- **This is the estimated average decrease in FEV1 for every unit increase in CAT score amongst ex-smokers.**
- This is the estimated average decrease in FEV1 for every unit increase in CAT score amongst current smokers.
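With an interaction, the CAT slope depends on smoking status: for ex-smokers (smoking = 0) it is β1 = −0.02, and for current smokers it is β1 + β3 = −0.07. The Python sketch below evaluates the fitted equation from Q5 and Q6 and measures the change in predicted FEV1 for a one-unit rise in CAT within each group.

```python
def fev1(cat, smoking):
    """FEV1 = 1.9 - 0.02*CAT + 1.05*smoking - 0.05*CAT*smoking (smoking: 0 = ex, 1 = current)."""
    return 1.9 - 0.02 * cat + 1.05 * smoking - 0.05 * cat * smoking

def cat_slope(smoking):
    """Change in predicted FEV1 for a one-unit increase in CAT within one smoking group."""
    return fev1(1, smoking) - fev1(0, smoking)

print(round(fev1(0, 0), 2))      # alpha: ex-smoker with CAT = 0
print(round(cat_slope(0), 2))    # ex-smokers
print(round(cat_slope(1), 2))    # current smokers
```

This is why β1 alone describes the CAT effect only amongst ex-smokers, the group coded 0.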

Q7. Which of the following approaches are robust ways to select variables to include in a model when there are many variables to choose from? Select all that apply.

- Include all variables that are known to be associated with the outcome if they are also significant (p<0.05) in your sample dataset
- Include all variables that result in regression coefficients that have a p-value of less than 0.05 in your model
- **Include all variables whose relationship with the outcome you are interested in exploring**
- Include all variables selected by a stepwise procedure
- Include all variables in your dataset without consideration of the number of observations in your dataset, as this is the most unbiased approach to selection.

##### Conclusion:

I hope these Linear Regression in R for Public Health Coursera Quiz Answers help you learn something new from this course. If they helped you, don’t forget to bookmark our site for more Quiz Answers.


Keep Learning!

##### Get All Course Quiz Answers of Statistical Analysis with R for Public Health Specialization

- Introduction to Statistics & Data Analysis in Public Health Quiz Answers
- Linear Regression in R for Public Health Coursera Quiz Answers
- Logistic Regression in R for Public Health Coursera Quiz Answers
- Survival Analysis in R for Public Health Coursera Quiz Answers