# Logistic Regression in R for Public Health Coursera Quiz Answers

### Get All Weeks Logistic Regression in R for Public Health Coursera Quiz Answers

Welcome to Logistic Regression in R for Public Health!

Why logistic regression for public health rather than just logistic regression? Well, every data set has its own considerations, and public health data sets have particular features that need special attention. In a word, they’re messy. Like the other courses in the series, this is a hands-on course that gives you plenty of practice with R on real-life, messy data. The worked example throughout is predicting who has diabetes from a set of patient characteristics.

Additionally, the interpretation of the outputs from the regression model can differ depending on the perspective that you take, and public health doesn’t just take the perspective of an individual patient but must also consider the population angle. That said, much of what is covered in this course is true for logistic regression when applied to any data set, so you will be able to apply the principles of this course to logistic regression more broadly too.


### Week 1 Quiz Answers

#### Quiz 1: Logistic Regression Quiz Answers

Q1. Before learning how to do it, it’s vital you understand when logistic regression can be used. This quiz aims to check you’ve got that bit.

Which of the following outcome variables would be suitable for logistic regression? Please select all answers that apply.

• Height in metres
• HDL cholesterol level in mmol/L
• Diagnosis of anaemia
• Death within one year from the time of diagnosis of cervical cancer
• Cost of treatment
• Systolic blood pressure below 140 mmHg or greater than or equal to 140 mmHg

Q2. For which of the following would logistic regression be an appropriate method of analysis? Please select all answers that apply.

• Investigating the effect of smoking on the probability of being diagnosed with lung cancer in a sample of adults who are cancer-free at the start of the study and are then assessed one year later
• Identifying whether sleeping under a mosquito net reduces the risk of getting malaria compared with not sleeping under a mosquito net
• Investigating the relationship between body mass index (BMI) and systolic blood pressure
• Investigating the change in HDL cholesterol level in patients before and after taking statins
• Estimating how much peak expiratory flow rate (a measure of lung function) increases per cm increase in height
• Comparing one-year survival rates after having a stroke between men and women

#### Quiz 2: End of Week Quiz

Q1. This week has covered the basics of when you can use logistic regression and the nuts and bolts of the underlying mathematics. It’s important to be clear about what is modelled in a logistic regression model so that you can understand the output that R or other statistics software gives you.

Suppose you have an outcome variable with three different values. Which of the following is true? Several options could be correct.

• To run logistic regression, it’s essential to combine two of the values so that the outcome is binary
• The decision to combine two of the categories should be informed by factors such as how many patients have each value, how sensible it is to combine the categories, and what the potential impact on the study’s conclusions would be of combining them
• Combining two of the categories is always the right thing to do because then it would be simple to analyse

Q2. What is the mathematical quantity that is modelled in logistic regression?

• Log odds
• Probability
• Odds
• Log probability
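To see why the log odds are a convenient quantity to model, here is a minimal sketch in plain Python. The course itself works in R, and Python is used here only to keep the example self-contained. The helper names `logit` and `inv_logit` are our own. Probabilities are confined to the range 0 to 1, whereas log odds can take any real value, so they can be set equal to an unbounded linear combination of predictors.

```python
import math

def logit(p: float) -> float:
    """Log odds of a probability p, the quantity a logistic regression models."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    """Convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Probabilities near 0 or 1 map to large negative or positive log odds;
# p = 0.5 maps to log odds of exactly 0.
for p in (0.01, 0.2, 0.5, 0.8, 0.99):
    print(f"p = {p:.2f}  log odds = {logit(p):+.3f}  back to p = {inv_logit(logit(p)):.2f}")
```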

Q3. Suppose that the odds of having diabetes in an individual with BMI under 25 is 0.2 and the odds of having diabetes in individuals with a BMI of 25 or over is 0.6. What is the odds ratio of having diabetes in those with BMI of 25 or over versus those with BMI under 25?

• 0.4
• 0.33
• 0.12
• 3
• 0.8

Q4. The odds of having diabetes are lower in those with normal blood pressure than in those with high blood pressure. What will the odds ratio be when dividing the odds of diabetes in people with normal blood pressure by the odds of diabetes in those with high blood pressure?

• Greater than 1
• Equal to 1
• Negative
• Below 1
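The arithmetic behind Q3 and Q4 can be checked with a short Python sketch. It is illustrative only: `odds_ratio` is a hypothetical helper, and the two odds values are the ones given in Q3. An odds ratio is the odds in one group divided by the odds in the comparison group, so swapping which group goes on top gives the reciprocal.

```python
def odds_ratio(odds_group: float, odds_reference: float) -> float:
    """Odds in the group of interest divided by odds in the reference group."""
    return odds_group / odds_reference

# Odds of diabetes from Q3.
odds_bmi_under_25 = 0.2
odds_bmi_25_plus = 0.6

print(round(odds_ratio(odds_bmi_25_plus, odds_bmi_under_25), 2))  # 3.0
print(round(odds_ratio(odds_bmi_under_25, odds_bmi_25_plus), 2))  # 0.33
```

Whenever the group in the numerator has the lower odds, the ratio falls below 1. It can never be negative, because odds are never negative.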

Q5. If the probability is 0.5, what are the odds?

• 3
• 0.33
• 0.25
• 1
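Converting between probabilities and odds uses odds = p / (1 - p) and p = odds / (1 + odds). The Python sketch below uses `fractions.Fraction` so that the results print exactly. The function names are our own, chosen for illustration.

```python
from fractions import Fraction

def prob_to_odds(p):
    """Odds corresponding to a probability p (0 <= p < 1)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Probability corresponding to non-negative odds."""
    return odds / (1 + odds)

for p in (Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)):
    print(p, "->", prob_to_odds(p))
# 1/2 -> 1
# 1/4 -> 1/3
# 3/4 -> 3
```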

### Week 2 Quiz Answers

#### Quiz 1: Cross Tabulation Quiz Answers

Q1. Now it’s your turn to have a go. It’s always useful to describe the age-sex distribution of your set of patients. The first task is to make age groups from age, allowing for any missing values. To make it manageable, go for just four age groups: under 45, 45-64, 65-74 and 75 or over. Then tabulate age group by itself, followed by a cross-tabulation with gender. Finally, add the overall percentages to this cross-tab.

When you’re ready, to check that you have the right answers, enter the number of females aged under 45 in this box:

`What do you think?`

Q2. Now enter the percentage of all patients who are male and aged 65-74 in this box, rounded to the nearest whole percentage:

`What do you think?`

Q3. Were there any missing values for age and/or gender?

• Yes
• No
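The Q1 workflow is sketched below in Python on a handful of made-up records, not the course data set. It covers banding age, tabulating the bands, cross-tabulating them with gender and adding overall percentages, while keeping missing values visible instead of silently dropping them. In R, base functions such as `cut()`, `table(..., useNA = "ifany")` and `prop.table()` cover the same steps.

```python
from collections import Counter

def age_group(age):
    """Assign one of the four age bands from Q1; a missing age stays visible as 'missing'."""
    if age is None:
        return "missing"
    if age < 45:
        return "under 45"
    if age < 65:
        return "45-64"
    if age < 75:
        return "65-74"
    return "75 or over"

# Hypothetical toy records (age, gender), with None marking a missing value.
patients = [(30, "female"), (50, "male"), (70, "male"), (80, "female"),
            (None, "female"), (44, "male"), (66, "male"), (90, None)]
n = len(patients)

# One-way table of age group.
groups = Counter(age_group(age) for age, _ in patients)
print(groups)

# Cross-tabulation of age group by gender, with percentages of all patients.
crosstab = Counter((age_group(age), gender or "missing") for age, gender in patients)
for (band, gender), count in sorted(crosstab.items()):
    print(f"{band:<10} {gender:<8} {count}  ({100 * count / n:.1f}% of all patients)")
```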

#### Quiz 2: Interpreting Simple Logistic Regression Quiz Answers

Q1. Before you move on to incorporate several predictors in the same model, it’s important you are happy you understand what the output means when there’s just one predictor.

This quiz looks at differences in diabetes status by location. The first question concerns a simple descriptive analysis before the quiz moves on to regression.

Use the course data set and the R commands you’ve learned so far on this course to answer the following. Of those with a recorded diabetes status, what percentage of people from Buckingham have diabetes?

• 16.3%
• 83.6%
• 15.5%
• 84.5%
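The phrase "of those with a recorded diabetes status" fixes the denominator: people with a missing status are excluded before the percentage is calculated. A minimal Python sketch on hypothetical data, where `None` stands for a missing status:

```python
def percent_with_outcome(statuses):
    """Percentage coded 1 among people whose status is recorded (None = missing)."""
    recorded = [s for s in statuses if s is not None]
    if not recorded:
        raise ValueError("no recorded values")
    return 100 * sum(recorded) / len(recorded)

# Hypothetical diabetes indicators for one location: 1 = diabetes, 0 = no diabetes.
statuses = [1, 0, 0, None, 0, 1, 0, None, 0, 0]
print(f"{percent_with_outcome(statuses):.1f}%")  # 25.0%
```

Dividing by all ten records instead would give 20%, which is why the choice of denominator matters.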

Q2. Now fit a logistic regression with “location” as the predictor variable. What is the log odds ratio of having diabetes for people from Louisa compared with those from Buckingham? Give the answer to two decimal places.

• 0.62
• 0.14
• -1.64
• -0.14

Q3. Using the regression results from the previous item, is location a statistically significant predictor of having diabetes in our sample?

• Can’t tell
• False
• True
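For a logistic regression fitted with `glm(..., family = binomial)`, R's `summary()` reports a z value (the estimate divided by its standard error) and a two-sided p value, `Pr(>|z|)`, for each coefficient. The Python sketch below reproduces that Wald calculation. The estimate and standard error in the example are hypothetical and do not come from the course data.

```python
import math

def wald_p_value(coef, se):
    """Two-sided p value for H0: coefficient = 0, using the normal approximation."""
    z = coef / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical estimate and standard error, for illustration only.
estimate, std_error = 0.5, 0.2
print(f"z = {estimate / std_error:.2f}, p = {wald_p_value(estimate, std_error):.4f}")
```

A p value below 0.05 corresponds to the conventional 5% significance threshold.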

Q4. Using the results from the previous item, what is the odds ratio of having diabetes for people from Louisa compared with those from Buckingham? Give the answer to two decimal places.

• -0.87
• -0.14
• 0.14
• 0.87
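Because the model works on the log odds scale, you get an odds ratio by exponentiating the coefficient. In R this is typically `exp(coef(model))`, where `model` stands for your fitted `glm` object. A minimal Python sketch, using a coefficient of -0.14 purely as an example input:

```python
import math

def coef_to_odds_ratio(log_odds_ratio):
    """Odds ratio from a logistic regression coefficient (a log odds ratio)."""
    return math.exp(log_odds_ratio)

# Hypothetical coefficient used only as an example input.
print(round(coef_to_odds_ratio(-0.14), 2))  # 0.87
```

A negative log odds ratio always maps to an odds ratio between 0 and 1, and exponentiation never produces a negative value, so a negative number cannot be an odds ratio.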

### Week 3 Quiz Answers

#### Quiz 1: Running A New Logistic Regression Model Quiz Answers

Q1. Using the same data set as before, now try another model with these predictor variables: age, cholesterol and insurance type.

Enter the odds ratios for each predictor in the boxes below to two decimal places. Check whether each one is considered statistically significant at the conventional 5% threshold.

Enter the odds ratio for age to two decimal places:

`Enter answer here`
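This quiz asks for each predictor's odds ratio and whether it is statistically significant at 5%. One way to get both from a coefficient b and its standard error is the Wald interval exp(b ± 1.96 × SE). The odds ratio is significant at the 5% level when that interval excludes 1. The Python sketch below uses hypothetical predictor names and coefficients, not results from the course data. In R, `confint.default()` on a fitted `glm` gives this Wald interval, while `confint()` gives profile-likelihood intervals, which can differ slightly.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval

def odds_ratio_with_ci(coef, se, z=Z_95):
    """Odds ratio and Wald confidence limits from a log odds coefficient and its standard error."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

def significant_at_5_percent(coef, se):
    """True when the 95% Wald interval for the odds ratio excludes 1."""
    _, lower, upper = odds_ratio_with_ci(coef, se)
    return lower > 1 or upper < 1

# Hypothetical coefficients and standard errors, not output from the course data.
for name, coef, se in [("predictor_a", 0.05, 0.01), ("predictor_b", 0.002, 0.003)]:
    odds, lower, upper = odds_ratio_with_ci(coef, se)
    print(f"{name}: OR {odds:.2f} (95% CI {lower:.2f} to {upper:.2f}), "
          f"significant at 5%: {significant_at_5_percent(coef, se)}")
```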

Q2. Is the odds ratio for age statistically significant?

• Yes
• No

Q3. Enter the odds ratio for cholesterol to two decimal places:

`Enter answer here`

Q4. Is the odds ratio for cholesterol statistically significant?

• Yes
• No

Q5. Enter the odds ratio for insurance 1 (government) to two decimal places:

`Enter answer here`

Q6. Is the odds ratio for insurance 1 (government) statistically significant?

• Yes
• No

Q7. Enter the odds ratio for insurance 2 (private) to two decimal places:

`Enter answer here`

Q8. Is the odds ratio for insurance 2 (private) statistically significant?

• Yes
• No

### Week 4 Quiz Answers

#### Quiz 1: Quiz on R’s Default Output for the Model

Q1. As with any statistical software, R gives you a certain set of information by default, without you having to ask for it. Before you ask R for more information, it’s important to understand the default output.

Answer whether the following statements are true or false.

In the earlier model containing age, cholesterol and insurance, insurance had two degrees of freedom because it had three categories and one was included in the intercept.

• True
• False
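With R's default treatment contrasts, a factor with k levels enters the model as k - 1 indicator columns, one for each non-reference level. The Python sketch below illustrates that coding. It assumes insurance is coded 0, 1 and 2 with 0 as the reference level. The quiz only names levels 1 (government) and 2 (private), so the meaning of level 0 is an assumption, and `treatment_dummies` is a hypothetical helper.

```python
def treatment_dummies(value, levels, reference, prefix):
    """Indicator columns for every level except the reference (treatment coding)."""
    return {f"{prefix}{level}": int(value == level) for level in levels if level != reference}

# Assumed coding: 0 = reference level, 1 = government, 2 = private.
levels = [0, 1, 2]
for value in levels:
    print(value, treatment_dummies(value, levels, reference=0, prefix="insurance"))
# 0 {'insurance1': 0, 'insurance2': 0}
# 1 {'insurance1': 1, 'insurance2': 0}
# 2 {'insurance1': 0, 'insurance2': 1}
```

The reference level has zeros in every indicator column, so its log odds are carried by the intercept. Each remaining column gets its own coefficient, which is why a three-level factor uses two degrees of freedom.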

Q2. For assessing model fit, large values of residual deviance are good.

• True
• False

Q3. The AIC is useful for comparing models, with the “best” model having the smallest AIC value.

• True
• False
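For a model fitted to individual 0/1 outcomes, R's residual deviance equals -2 times the log-likelihood. AIC adds a penalty of 2 per estimated parameter: AIC = -2 log L + 2k. The Python sketch below computes both from observed outcomes and fitted probabilities. The outcomes, both sets of fitted probabilities and the parameter counts are made up for illustration.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under fitted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))

def residual_deviance(y, p):
    """For individual 0/1 data the saturated model has log-likelihood 0, so deviance = -2 log L."""
    return -2 * log_likelihood(y, p)

def aic(y, p, n_params):
    """AIC = -2 log L + 2k, where k counts every estimated coefficient including the intercept."""
    return -2 * log_likelihood(y, p) + 2 * n_params

# Made-up outcomes and fitted probabilities from two hypothetical models.
y = [1, 0, 1, 1, 0]
model_a = [0.8, 0.2, 0.7, 0.9, 0.1]
model_b = [0.6, 0.4, 0.5, 0.6, 0.5]
for name, p, k in [("model_a", model_a, 3), ("model_b", model_b, 2)]:
    print(f"{name}: deviance {residual_deviance(y, p):.3f}, AIC {aic(y, p, k):.3f}")
```

A lower deviance means the fitted probabilities sit closer to the observed outcomes. AIC weighs that improvement against the number of parameters used to achieve it.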

#### Quiz 2: Overfitting and Model Selection

Q1. Logistic regression models should have 100 odds ratios to be robust.

• True
• False

Q2. Logistic regression models with 100 predictors are likely to give more accurate predictions than models with only 20 predictors.

• True
• False

Q3. Categorical variables are in general more likely to cause problems when fitting a logistic regression model than continuous ones that are linearly related to the outcome.

• True
• False

Q4. The notice that the “Algorithm did not converge” is of esoteric importance and can be safely ignored.

• True
• False

Q5. Forwards selection and stepwise selection methods for model selection have been around for years and are therefore worth trying.

• True
• False

Q6. Of the three types of model selection methods discussed in the video (forwards selection, stepwise and backwards elimination), backwards elimination is the “least bad”.

• True
• False

Q7. If using backwards elimination, you need to check that the coefficients for variables remaining in the model don’t change a lot after eliminating non-significant variables.

• True
• False

Q8. When considering which potential predictors to try, reviewing the literature thoroughly and talking to clinical subject matter experts is a good start.

• True
• False

Q9. If your literature review and clinical knowledge suggest including, for example, age as a predictor, but its coefficient has an associated p value of 0.24, age should be dropped from the model.

• True
• False

Q10. Age should always be categorised when used as a predictor because the output will be easier to interpret than including it as a continuous predictor. Three groups are best: young, middle-aged and old.

• True
• False

#### Quiz 3: End of Course Quiz

Q1. You obtain a data set and the relevant meta-data about what all the columns mean, and you run some basic descriptive analyses on each variable in the data set. The research question involves predicting who successfully quit smoking after attending a smoking cessation course.

The outcome variable to be modelled is:

Please select all true options.

• Attendance at the course
• Smoking status at some as yet unspecified time point after the end of the course
• Smoking status at the end of the course
• Smoking status six months after the end of the course
• Binary if in this data set the relevant variable has only two values, still smoking and no longer smoking

Q2. You discover that the outcome variable, called “still_smoking”, has four values:

0 (quit completely), 1 (still smoking, but in reduced quantity), 2 (still smoking, as before) and 9 (not known).

Your sample size is 500 people. Please select all true options.

• You read somewhere that 9 or not known is often entered into databases by default when it means “no” or “zero”. You can therefore assume here that 9 means 0 (quit completely) and go ahead with logistic regression on that basis (after combining values 1 and 2)
• Four values – no point even considering logistic regression here. Best look immediately for something more sophisticated
• If one person has a 2 and two other people have a 9, then your best option is just to drop these three people from the analysis and proceed with logistic regression
• If 90% of people have 9 for this variable, the potential for bias if you just analyse the remaining 10% is uncomfortably high
• It would be reasonable to combine values 1 and 2 as a “failure” for the course, thereby giving a binary outcome if you exclude the 9s

Q3. One of the variables in your data set tells you whether the person completed the course. It’s a binary variable, though 2% of people have a missing value.

Which of these statements is true?

Please select all that apply.

• For the 2% with missing values, you can just take a guess whether they finished the course
• This variable is of great interest and should be considered a potential predictor of the outcome
• This variable is of no relevance because you’re only interested in the success rate of people completing the course
• With only 2% missing values for this variable, you can probably just exclude those people from the analysis

Q4. Concerning odds, probability and logistic regression:

Please select all true options.

• As odds ratios can only be positive, the algorithm models the log odds
• To convert probabilities to odds, you need five pages of algebra or a supercomputer
• Because we are interested in probabilities, if we ask R nicely, it can give us odds ratios between 0 and 1 so that they look like probabilities
• The algorithm models the natural logarithm of the odds of the outcome
• The algorithm models probability directly, and the resulting odds ratios are between 0 and 1

Q5. You have each person’s age in your data set and think it could be a useful predictor to try.

What data preparation would it be useful to do first?

How should you try to model it?

Please select all true options.

• After importing the data set and making sure that the outcome variable is binary, the first thing to do is to include age in the model as a continuous variable
• After importing the data set and making sure that the outcome variable is binary, it would be useful to plot a histogram of age to look for weird values
• After plotting the histogram of age and finding no weird or missing values, the next step is to make three age groups and include age in the model as a categorical variable
• You’ve just read a paper on this subject from another country and they used five age groups. You should therefore do this too
• After plotting the histogram of age and finding no weird or missing values, the next useful step is to plot age against the outcome to gauge the shape of any relation

Q6. Most of the relevant literature finds that age is an important predictor of smoking success.

Please select all true options.

• You could reasonably include age as an a priori predictor in your model and retain it even if p>0.05
• While the literature has found age to be a predictor, it’s of interest to see whether in your data set it is significantly associated with the outcome
• You should still do the basic descriptive analyses, including plotting age against the outcome, before including it in your model
• The last five papers you read on this subject used age as three groups in their models. You should too.

Q7. After the basic plots, you decide to add age as a continuous predictor. You model the log odds of each person quitting smoking completely (still_smoking = 0) after removing the unknowns (9s).

The resulting odds ratio is 1.02, 95% CI 0.98 to 1.04, p=0.38.

Please select all true options.

• There’s no good evidence that age has a linear association with quitting smoking
• It’s possible that there’s some non-linear relation between age and quitting smoking
• The odds of quitting definitely increase with age because the OR is above 1
• With p=0.38, you can conclude that there’s no association between age and quitting smoking
• The odds of quitting increase with age, though the relation is not statistically significant

Q8. Let’s say in your data set you had 5000 people, and 1200 quit smoking after the course. You do all the recommended descriptive analyses. After excluding some variables with high levels of missing and invalid values, you have ten possible predictors.

With regard to choosing the multiple logistic regression model:

• Who needs experts? I’m a statistician, I know what I’m doing, and I’ll fit the model how I think it should be fitted
• It would be reasonable to enter all ten at once to see which ones were significant
• Ten predictors? That’s way too many. Let’s just ask a subject matter expert which ones to consider
• Ten predictors? That’s way too many. Let’s apply an automated procedure such as stepwise selection

Q9. You run the model with all ten predictors, some of them categorical, and R tells you that the algorithm has not converged.

What does this mean and what should you do?

Please select all true options.

• A potential cause is that one of the linear predictors has 5% missing values
• R is trying to be helpful with its warnings, but this time I can ignore it
• The most likely cause is missing values in a linear predictor
• The most likely cause is that there are too many predictors in the model.
• A potential cause is a categorical variable where a sparse category is the reference one
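Sparse or empty combinations of a categorical predictor and the outcome are a common route to convergence problems. When a category has no events, or no non-events, its estimated log odds ratio can drift towards infinity. A quick check before modelling is to cross-tabulate each categorical predictor against the outcome and flag small cells. The Python sketch below does this. The threshold of 5 and the toy data are arbitrary assumptions for illustration.

```python
from collections import Counter

def sparse_cells(categories, outcomes, min_count=5):
    """List (level, outcome, count) cells with fewer than min_count people, including empty cells."""
    counts = Counter(zip(categories, outcomes))
    return [(level, y, counts[(level, y)])
            for level in sorted(set(categories))
            for y in (0, 1)
            if counts[(level, y)] < min_count]

# Hypothetical data: level "c" has only three people, none of whom had the outcome.
categories = ["a"] * 40 + ["b"] * 30 + ["c"] * 3
outcomes = [1] * 10 + [0] * 30 + [1] * 12 + [0] * 18 + [0] * 3
print(sparse_cells(categories, outcomes))
```

If a flagged level is also the reference category, every other level is compared against a poorly estimated baseline. A common remedy is to choose a well-populated reference level instead, for example with `relevel()` in R.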

Q10. You’re now happy with your set of predictors, and it’s time to test the model fit and performance.

Please select all true options.

• If you add variables to the model, the c statistic will either stay the same or increase
• Low values of the R-squared are desirable
• Low values of the p value for the Hosmer-Lemeshow calibration test are desirable
• In medical journals, information on model fit and performance is often omitted. This is because it’s just not that important
• Residual plots are an indication of model fit
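The c statistic, equivalent to the area under the ROC curve, is the probability that a randomly chosen person with the outcome gets a higher predicted probability than a randomly chosen person without it. Ties count as one half. Below is a direct Python sketch of that definition on made-up predictions; `c_statistic` is a hypothetical helper.

```python
def c_statistic(y, p):
    """Share of (event, non-event) pairs where the event has the higher predicted probability."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0   # concordant pair
            elif pe == pn:
                score += 0.5   # tied pair
    return score / (len(events) * len(non_events))

# Made-up outcomes and predicted probabilities.
print(c_statistic([1, 1, 0, 0, 1, 0], [0.9, 0.7, 0.4, 0.2, 0.3, 0.6]))
```

The pairwise loop mirrors the definition directly. For large data sets, a rank-based calculation gives the same value much faster.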
