# Survival Analysis in R for Public Health Quiz Answers

### Get All Weeks Survival Analysis in R for Public Health Quiz Answers

Welcome to Survival Analysis in R for Public Health!

The three earlier courses in this series covered statistical thinking, correlation, linear regression and logistic regression. This one will show you how to run survival, or “time to event”, analysis using the popular and completely free software R. It explains what’s meant by familiar-sounding but deceptive terms like hazard and censoring, which have specific meanings in this context.

You’ll learn how to take a data set from scratch, import it into R, run essential descriptive analyses to get to know the data’s features and quirks, and progress from Kaplan-Meier plots through to multiple Cox regression. You’ll use data simulated from real, messy patient-level data for patients admitted to hospital with heart failure and learn how to explore which factors predict their subsequent mortality.

You’ll learn how to test model assumptions and fit to the data and some simple tricks to get around common problems that real public health data have. There will be mini-quizzes on the videos and the R exercises with feedback along the way to check your understanding.

### Week 1 Quiz Answers

#### Quiz 1: Survival Analysis Variables Quiz Answers

Q1. Survival analysis can be applied to:

• studies that run over many months/years.
• cross-sectional studies.

Q2. Fill in the blank:

We are mainly interested to know __ a particular outcome has occurred for each of the patients involved over the study period.

• when
• whether

Q3. Examples of outcomes survival analysis deals with:

• onset of speech from birth
• time to cancer relapse
• hospital discharge after kidney transplant
• sex at birth
• treatment choice after heart attack

#### Quiz 2: Life tables Quiz Answers

Q1. You’ve just seen the Kaplan-Meier method, used to compute tables of the survival probability. When we plot the results, we end up with a stepped survival curve.

Complete the table below using the Kaplan-Meier method. The bold letters represent a number that needs to be calculated.

`Enter answer here`

Q2. CELL B =

`Enter answer here`

Q3. CELL C =

`Enter answer here`

Q4. CELL D =

`Enter answer here`

Q5. CELL E =

`Enter answer here`

Q6. CELL F =

`Enter answer here`

Q7. CELL G =

`Enter answer here`

Q8. CELL H =

`Enter answer here`
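
The Kaplan-Meier calculation behind a table like this can be sketched in a few lines of base R. The numbers below are hypothetical and are not the quiz's table (which is not reproduced on this page); the column names `n_risk`, `n_event`, `p_survive` and `survival` are illustrative choices, not names used by the course.

```r
# Hypothetical life table (NOT the quiz data): one row per day with a death
life_table <- data.frame(
  time    = c(1, 3, 4),  # days on which at least one death occurred
  n_risk  = c(5, 3, 2),  # patients still under observation just before that day
  n_event = c(1, 1, 1)   # deaths on that day
)

# Conditional probability of surviving each day, given survival up to it
life_table$p_survive <- 1 - life_table$n_event / life_table$n_risk

# Kaplan-Meier estimate: running product of the conditional probabilities
life_table$survival <- cumprod(life_table$p_survive)

print(life_table)
```

Each cell of a Kaplan-Meier table is either a count (at risk, events, censored) or one of these two probabilities, so the same two lines of arithmetic apply to any row.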

#### Quiz 3: Practice in R: Running a KM plot and log-rank test Quiz Answers

Q1. These days there are several R packages available to run survival analysis. First you need to ask R to install and load them:

`install.packages("survival")`

`install.packages("ggplot2")`

`library(survival) # this is the cornerstone command for survival analysis in R`

`library(ggplot2) # newer package that does nice plots`

“Survival” creates a number of useful R objects that can be further manipulated. A key one is “survfit”, which we’ll use shortly.

To run these packages, we of course need some variables to put into them. My preferred way to do this is to turn each column of the data set, which we’ve called “g”, into a variable and tell R what kind of variable it is. For instance, gender is categorical, so type:

`gender <- as.factor(g[,"gender"]) # R calls categorical variables factors`

`fu_time <- g[,"fu_time"] # continuous variable (numeric) `

`death <- g[,"death"] # binary variable (numeric) `

This way, I find subsequent code more intuitive, but coding is often about personal preferences. To run an overall Kaplan-Meier plot, type:

`km_fit <- survfit(Surv(fu_time, death) ~ 1)`

`plot(km_fit)`

This gives a Kaplan-Meier plot for the whole cohort. By default, there are no axis labels or titles, but let’s not worry about that for now. Let’s go through the different parts of the code.

“survfit” fits a simple survival model. In this case there aren’t any predictors, so the model just has the intercept, denoted by “1”. The two arguments used by “Surv” are the follow-up time for each patient and whether they died. In our data, death=1 for people who had died by the end of the follow-up period, and death=0 for those still alive at that time. Technically, those people still alive are censored, because we don’t know when they’ll die (of course everyone does at some point).

The survfit() function produces the Kaplan-Meier estimates of the probability of survival over time that are used by “plot” to produce the Kaplan-Meier curve. These estimates can be seen by typing:

`summary(km_fit, times = c(1:7,30,60,90*(1:10))) `

The “times” argument gives us control over what time periods we want to see. The above code asks for output every day for the first week, then at 30, 60 and 90 days, and then every 90 days thereafter. Here’s the output:

`Call: survfit(formula = Surv(fu_time, death) ~ 1) `

` time n.risk n.event survival std.err lower 95% CI upper 95% CI `

`    1    992      12    0.988 0.00346        0.981        0.995 `

`    2    973       7    0.981 0.00435        0.972        0.989 `

`    3    963       5    0.976 0.00489        0.966        0.985 `

`    4    954       6    0.970 0.00546        0.959        0.980 `

`    5    945       5    0.964 0.00590        0.953        0.976 `

`    6    938       1    0.963 0.00598        0.952        0.975 `

`    7    933       1    0.962 0.00606        0.951        0.974 `

`   30    865      39    0.921 0.00865        0.905        0.939 `

`   60    809      28    0.891 0.01010        0.871        0.911 `

`   90    770      24    0.864 0.01117        0.843        0.887 `

`  180    698      43    0.815 0.01282        0.790        0.841 `

`  270    653      24    0.787 0.01363        0.760        0.814 `

`  360    619      21    0.761 0.01428        0.733        0.789 `

`  450    525      44    0.705 0.01554        0.675        0.736 `

`  540    429      47    0.639 0.01681        0.607        0.673 `

`  630    362      32    0.589 0.01765        0.556        0.625 `

`  720    266      43    0.514 0.01876        0.479        0.552 `

`  810    190      31    0.448 0.01979        0.411        0.488 `

Whereas all but about 1% make it past the first day, at 900 days after a first emergency admission for heart failure, the probability of surviving is just 38%.
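
Rather than reading a figure off the printed table, the estimate at a given time can be pulled out of the object that `summary()` returns. The sketch below uses a five-patient toy data set invented for illustration (not the course data), so the numbers differ from the output above; `km_at` is a hypothetical name.

```r
library(survival)

# Hypothetical toy data (NOT the course data set): five patients
fu_time <- c(1, 2, 3, 4, 5)  # follow-up time in days
death   <- c(1, 0, 1, 1, 0)  # 1 = died, 0 = censored

km_fit <- survfit(Surv(fu_time, death) ~ 1)

# summary() returns a list; $surv, $lower and $upper hold the estimates
# and confidence limits at the requested times
km_at <- summary(km_fit, times = c(1, 3))
data.frame(time = km_at$time, survival = km_at$surv,
           lower = km_at$lower, upper = km_at$upper)
```

With the course data, the same pattern with `times = 900` would return the single row that the sentence above refers to.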

Now let’s extend this by splitting the curve by gender:

`km_gender_fit <- survfit(Surv(fu_time, death) ~ gender) `

`plot(km_gender_fit)`

To compare survival by gender, we can run a logrank test. There are many ways to do this because of different versions for different scenarios, e.g. particular types of censored data, but we’ll just give the standard one:

`survdiff(Surv(fu_time, death) ~ gender, rho=0) `

With rho = 0, which is the default so we don’t need to write this bit, it yields the log-rank or Mantel-Haenszel test. When you run the above, you should get this output:

`survdiff(formula = Surv(fu_time, death) ~ gender, rho = 0) `

`           N Observed Expected (O-E)^2/E (O-E)^2/V `

`gender=1 548      268      271    0.0365     0.082 `

`gender=2 452      224      221    0.0448     0.082 `

` Chisq= 0.1  on 1 degrees of freedom, p= 0.8  `

Recall from the data set documentation earlier in the course that gender=1 is male and gender=2 is female.

What do you think this output means? Choose one of the following options:

• Women live longer than men in this sample
• Men live longer than women in this sample
• Both genders seem to have similar survival rates over time
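
As a check on how to read this output, the p-value can be recovered from the chi-squared statistic with base R alone. This is a minimal sketch that takes the 0.082 printed in the (O-E)^2/V column as the test statistic for two groups; the variable names are illustrative.

```r
chisq <- 0.082  # (O-E)^2/V from the survdiff output above
df    <- 1      # number of groups minus one

# Upper tail of the chi-squared distribution gives the log-rank p-value
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
round(p_value, 1)
```

A p-value this large means the observed and expected deaths in each gender are very close, so the test gives no evidence of a difference in survival.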

### Week 2 Quiz Answers

#### Quiz 1: Hazard function and Ratio Quiz Answers

Q1. In survival analysis, the hazard is:

• Something dangerous
• The probability of surviving at time t having survived up to time t
• The probability of surviving at time t

Q2. The risk set comprises:

• Everyone who is still alive at time t
• The set of patients at time t that are at risk of experiencing the event
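
The risk set is easy to count directly, which can help when checking these definitions. The sketch below uses invented follow-up times and a hypothetical time point `t_star`; dividing the events at that time by the size of the risk set gives a simple discrete-time estimate of the hazard.

```r
# Hypothetical follow-up data (NOT the course data set)
fu_time <- c(2, 5, 5, 8, 12)  # days of follow-up
death   <- c(1, 0, 1, 1, 0)   # 1 = died, 0 = censored

t_star <- 5  # time point of interest

# Risk set: everyone still under observation and event-free just before t_star
in_risk_set <- fu_time >= t_star
n_risk  <- sum(in_risk_set)                       # 4
n_event <- sum(fu_time == t_star & death == 1)    # 1

# Discrete-time hazard estimate at t_star: events among those at risk
n_event / n_risk                                  # 0.25
```

Note that the patient censored at day 5 is still counted in the risk set at day 5, and the patient who died on day 2 is not.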

#### Quiz 2: Simple Cox Model Quiz Answers

Q1. What information is returned by the summary command when you run a Cox model in R? Tick all that apply:

• hazard ratio with confidence interval
• p-value
• how well the model can predict outcome
• expected survival time
• survival probability

Q2. Is this statement true or false?

In a model where age is entered as just one term, age is assumed to have a linear relation with the hazard

• True
• False

Q3. Is this statement true or false?

The default output from “coxph” tells you everything you need to know about the model.

• True
• False
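
To see for yourself what `summary()` reports for a Cox model, you can fit one to simulated data and inspect the pieces of the returned object. This is a sketch under stated assumptions: the data are randomly generated for illustration, and `age`, `time` and `death` are hypothetical stand-ins for the course variables.

```r
library(survival)

# Simulated data (NOT the course data set)
set.seed(1)
n     <- 200
age   <- round(runif(n, 50, 90))
time  <- rexp(n, rate = 0.001 * exp(0.03 * (age - 70)))  # hazard rises with age
death <- rbinom(n, 1, 0.7)                               # 1 = died, 0 = censored

cox <- coxph(Surv(time, death) ~ age)
s   <- summary(cox)

s$conf.int      # hazard ratio, exp(coef), with 95% confidence limits
s$coefficients  # includes the p-value column "Pr(>|z|)"
s$concordance   # concordance (C) and its standard error
```

Entering `age` as a single term, as here, produces one coefficient, which is what the linearity assumption in Q2 refers to.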

### Week 3 Quiz Answers

#### Quiz 1: Multiple Cox Model Quiz Answers

Q1. According to the results of the Cox model that you ran in the previous reading, females are at lower risk of mortality than males following hospital admission for heart failure. Is this true or false?

• True
• False

Q2. According to the results of the Cox model, prior non-attendance is a statistically significant predictor of mortality.

• True
• False

Q3. According to the results of the Cox model, ethnic group 9 (other) has an increased risk compared with ethnic group 1 (white), but the result is not statistically significant.

• True
• False

### Week 4 Quiz Answers

#### Quiz 1: Assessing the proportionality assumption in practice Quiz Answers

Q1. Test the proportionality assumption for gender and enter the resulting p value from the output, rounding to two decimal places if necessary. Different versions of the package give slightly different answers, so we will accept both.

`Enter answer here`

#### Quiz 2: Testing the proportionality assumption with another variable Quiz Answers

Q1. Now it’s your turn. Try the same code but this time check if the assumption holds for COPD (chronic obstructive pulmonary disease), which is a common and important comorbidity with heart failure. Type the p value that you got into the box, rounding it to two decimal places. Different versions of this package can yield slightly different results, so both will be accepted.

`Enter answer here`
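
A common way to test proportionality in the survival package is `cox.zph()`, which is assumed here; the course's exact call and arguments are not shown on this page. The sketch runs on simulated data, so its p-value will not match the quiz answer, and `p_first` is a hypothetical name.

```r
library(survival)

# Simulated data (NOT the course data set)
set.seed(2)
n       <- 300
gender  <- factor(sample(1:2, n, replace = TRUE))
fu_time <- rexp(n, rate = 0.002)
death   <- rbinom(n, 1, 0.6)

fit <- coxph(Surv(fu_time, death) ~ gender)

# Test of the proportional hazards assumption based on scaled Schoenfeld residuals
zph <- cox.zph(fit)
print(zph)

# p-value for the first term; row names and the columns beside "p"
# differ between package versions, so the row is indexed by position
p_first <- zph$table[1, "p"]
round(p_first, 2)
```

A small p-value suggests the hazard ratio changes over time; a large one gives no evidence against proportionality. Swapping `gender` for a COPD indicator gives the check asked for in Quiz 2.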

#### Quiz 3: End-of-Module Assessment Quiz Answers

Q1. Answer true or false to the following set of questions:

In a birth cohort study, people are enrolled into the study at birth and followed up over time to see who gets the outcome of interest, e.g. some disease. Their age in days can be used as the time variable in a time-to-event analysis

• True
• False

Q2. In survival analysis, censoring can occur if individuals drop out of the study and we don’t know whether they have the event of interest, e.g. death. They are handled as neither alive nor dead and are deducted from the number of patients alive.

• True
• False

Q3. Pick the single best answer for the following set of questions.

With a Kaplan-Meier table and plot where death is the outcome of interest, if ten patients are alive on day 20 and then three die and one is censored, all on day 21, what is the proportion of patients under observation who are alive at the start of day 22?

• 7/10, because the censoring doesn’t matter
• 6/10, because the censoring matters and censored patients should be treated as dead
• 6/9, because the censoring matters and censored patients should be treated as neither dead nor alive
• 6/6, because in the calculation, the patients who died are handled first, and then we remove the censored patient from those remaining alive
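
The counting in Q3 can be laid out step by step. This sketch only restates the numbers given in the question and follows the usual Kaplan-Meier convention that deaths on a given day are handled before censorings on that day; the variable names are illustrative.

```r
alive_day20    <- 10
deaths_day21   <- 3
censored_day21 <- 1

# Kaplan-Meier step for day 21: deaths are counted against everyone at risk that day
km_step <- (alive_day20 - deaths_day21) / alive_day20              # 7/10

# Entering day 22: both the dead and the censored have left the risk set
under_observation <- alive_day20 - deaths_day21 - censored_day21   # 6

# Everyone still under observation is, by definition, alive
alive_day22 <- under_observation
alive_day22 / under_observation                                    # 6/6
```

The 7/10 is the factor by which the survival curve drops on day 21, whereas the question asks about the patients still under observation afterwards.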

Q4. What does the following R code produce?

`km_fit <- survfit(Surv(fu_time, death) ~ 1)`

`plot(km_fit)`

• A Kaplan-Meier curve for each variable in the data set
• Kaplan-Meier estimates of the probability of survival over time for all patients in the data set, where “~ 1” means “ignore the censoring”
• Kaplan-Meier estimates of the probability of survival over time for all patients in the data set for all time points until the last patient’s “fu_time”
• A Kaplan-Meier curve and log-rank test

Q5. For the following questions, multiple options may be selected. There may be one or more than one correct answer. Tick all that are correct.

You run the following R code:

`survdiff(Surv(fu_time, death) ~ gender, rho=0)`

After the table of observed and expected counts, it gives you the following output:

`Chisq= 2.9 on 1 degrees of freedom, p= 0.085`

• The null hypothesis tested here is that both genders have the same mean survival time
• The null hypothesis tested here is that, at every given time point, each gender has the same hazard of death
• By convention, we conclude there is insufficient evidence against the null hypothesis
• By convention, we conclude there is sufficient evidence against the null hypothesis
• This shows that women live longer than men in this sample

Q6. You run a Cox regression model on the time to relapse for cancer, with treatment as a binary predictor. The estimated hazard ratio for treatment A relative to treatment B is 1.33. What can you conclude from this study from just these facts?

• The hazard for treatment A is estimated to be a third higher than the hazard for treatment B
• The hazard for treatment A may in fact be lower than the hazard for treatment B, but we can’t tell just from the estimate
• If p<0.05, people on treatment B should all be moved over to treatment A
• If p<0.05 and the hazards are shown to be proportional, people on treatment B should all be moved over to treatment A
• You should test that the hazards are proportional before even interpreting the hazard ratio

Q7. You want to see whether the hazard for death differs by ethnic group using a Cox model. You import a comma-separated file and store the imported data in an R object “g”. Ethnic group takes the values 1 to 20. The column “time” is the follow-up time. Which of the following statements is/are true?

• As “ethnicgroup” takes 20 different values, you can treat it as continuous
• cox <-coxph(Surv(death, time) ~ ethnicgroup, data = g) will do the job
• cox <-coxph(Surv(time, death) ~ ethnicgroup, data = g) will do the job
• cox <-coxph(Surv(time, death) ~ ethnicgroup, data = g) assumes a linear relation between ethnic group and the outcome
• cox <-coxph(Surv(time, death) ~ ethnicgroup) will do the job but only if you make “ethnicgroup” a categorical variable first
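
The difference between a numeric code and a factor shows up in the number of coefficients the model estimates. The sketch below uses simulated data with only four hypothetical ethnic group codes for brevity (the question has 20); `cox_numeric` and `cox_factor` are illustrative names.

```r
library(survival)

# Simulated data (NOT the course data set); four groups instead of 20
set.seed(3)
g <- data.frame(
  time        = rexp(300, rate = 0.01),
  death       = rbinom(300, 1, 0.7),
  ethnicgroup = sample(1:4, 300, replace = TRUE)
)

# Numeric code: one slope, i.e. a linear trend across the group numbers
cox_numeric <- coxph(Surv(time, death) ~ ethnicgroup, data = g)

# Factor: one hazard ratio per group relative to the reference group
cox_factor <- coxph(Surv(time, death) ~ as.factor(ethnicgroup), data = g)

length(coef(cox_numeric))  # 1
length(coef(cox_factor))   # 3
```

With group codes that have no natural order, the single slope from the numeric version has no sensible interpretation, which is why the conversion to a factor matters.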

Q8. You want to fit a multiple Cox model with hospital readmission as the outcome and five columns that could be used as predictors in the data set: age, gender, number of comorbidities, severity of disease on admission (categorical) and type of housing or accommodation (binary). Which of the following tasks are aspects of good statistical practice for the reason given?

• Cross-tabulate age and gender to get to know your sample, including whether these variables have any missing values
• Run a chi-squared test with gender and disease severity to get to know your sample
• Categorise age into three groups so you’ve only two coefficients to interpret in the model
• Categorise age into ten groups because that’s how a previous study did it
• Plot the number of comorbidities against the overall proportion who were readmitted to get a very rough sense of the shape of the relation
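
Descriptive checks like the cross-tabulation mentioned in Q8 need only base R. The sketch below uses a small invented data frame; the age bands are an arbitrary choice made purely to display the table, not a recommendation for how to model age.

```r
# Hypothetical data with some missing values (NOT the course data set)
g <- data.frame(
  age    = c(67, 82, NA, 74, 90, 58, NA, 71),
  gender = factor(c(1, 2, 2, NA, 1, 1, 2, 2))
)

# Band age for display only, so the table stays readable
age_group <- cut(g$age, breaks = c(0, 64, 79, Inf),
                 labels = c("<65", "65-79", "80+"))

# useNA = "ifany" adds an NA row or column whenever values are missing
tab <- table(age_group, g$gender, useNA = "ifany")
tab

# Number of missing values per column
colSums(is.na(g))
```

Without `useNA`, `table()` silently drops missing values, which is the easiest way to overlook them before modelling.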

Q9. Which of the following statements are true regarding missing data?

• Missing data are a fairly common problem with public health databases but are very rarely an issue in clinical trials
• If you had to choose your type of missing data, the “least worst” type is missing at random (MAR)
• Data set documentation is a good place to start if you want to understand why that data set has missing data
• If you have missing values for age and you want to include age in your Cox model, then the best thing is just to exclude the affected patients
• With data that are missing not at random (MNAR), the probability that a value is missing depends partly on things not in your data set

Q10. You run a Cox model using the same outcome and the same set of predictors as used in a previous study published in a good journal, but your hazard ratios, confidence intervals, p-values and indeed some of your conclusions differ from those for that previous study. Which of the following are plausible explanations for the differences?

• Your sample size was four times as large as theirs
• You used RStudio but they used regular R
• Your data were of worse quality
• You set males to be the reference category but they set females to be the reference

##### Survival Analysis in R for Public Health Course Review

We suggest enrolling in the Survival Analysis in R for Public Health course, where you can gain new skills from professionals free of charge; in our experience, it is worth it.

If you get stuck on a quiz or a graded assessment, visit Networking Funda for the Survival Analysis in R for Public Health quiz answers.

##### Conclusion:

I hope these Survival Analysis in R for Public Health quiz answers help you learn something new from this course. If they helped you, don’t forget to bookmark our site for more quiz answers.

This course is intended for audiences of all experiences who are interested in learning about new skills in a business context; there are no prerequisite courses.

Keep Learning!

