Welcome to your comprehensive guide for Practical Machine Learning quiz answers! Whether you’re tackling practice quizzes to reinforce your skills or preparing for graded quizzes to assess your knowledge, this guide is here to help.
Covering all course modules, this resource will equip you with practical knowledge of machine learning concepts, including data preprocessing, feature selection, model building, evaluation, and tuning.
Practical Machine Learning Quiz Answers – Graded Quizzes for All Modules
Practical Machine Learning Week 01 Quiz Answers
Q1. Which of the following are components in building a machine learning algorithm?
Correct Answer: Training and test sets, Collecting data to answer the question
Explanation: Building a machine learning algorithm involves collecting data, splitting it into training and test sets, and using statistical inference or optimization methods to train the model.
Q2. Suppose we build a prediction algorithm on a data set and it is 100% accurate on that data set. Why might the algorithm not work well if we collect a new data set?
Correct Answer: Our algorithm may be overfitting the training data, predicting both the signal and the noise.
Explanation: Overfitting occurs when a model performs exceptionally well on the training data but fails to generalize to new data because it has learned noise and irrelevant patterns.
Q3. What are typical sizes for the training and test sets?
Correct Answer: 80% training set, 20% test set
Explanation: A common practice is to allocate 80% of the data to training and 20% to testing, ensuring the model is trained on sufficient data while leaving enough data for unbiased evaluation.
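As a quick illustration, here is a minimal sketch of an 80/20 split with caret; the data frame df and outcome column y are placeholder names, not part of the quiz:
library(caret)
inTrain = createDataPartition(df$y, p = 0.80, list = FALSE)  # stratified 80% sample
training = df[inTrain, ]
testing = df[-inTrain, ]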
Q4. What are some common error rates for predicting binary variables (i.e., variables with two possible values like yes/no, disease/normal, clicked/didn’t click)?
Correct Answer: Accuracy
Explanation: Accuracy is a commonly used metric for binary classification. Other metrics like sensitivity and specificity are also frequently employed for evaluating model performance.
Q5. Suppose that we have created a machine learning algorithm that predicts whether a link will be clicked with 99% sensitivity and 99% specificity. The rate the link is clicked is 1/1000 of visits to a website. If we predict the link will be clicked on a specific visit, what is the probability it will actually be clicked?
Correct Answer: 9%
Explanation: Using Bayes’ theorem, given the low base rate of 1/1000, even high sensitivity and specificity result in a low positive predictive value. This means only 9% of predicted clicks are actual clicks.
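A short worked check of this calculation in R, using only the numbers given in the question:
sens = 0.99; spec = 0.99; prev = 1/1000
# Positive predictive value = P(clicked | predicted click), via Bayes' theorem
ppv = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
ppv   # approximately 0.09, i.e. about 9%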
Practical Machine Learning Week 02 Quiz Answers
Q1. Load the Alzheimer’s disease data using the commands:
library(AppliedPredictiveModeling)
data(AlzheimerDisease)
Which of the following commands will create non-overlapping training and test sets with about 50% of the observations assigned to each?
Correct Answer:
adData = data.frame(diagnosis, predictors)
trainIndex = createDataPartition(diagnosis, p = 0.50, list = FALSE)
training = adData[trainIndex, ]
testing = adData[-trainIndex, ]
Explanation: The createDataPartition() function creates a split that is balanced with respect to the class variable, and list = FALSE returns the row indices of the split.
Q2. Load the cement data using the commands:
library(AppliedPredictiveModeling)
data(concrete)
library(caret)
set.seed(1000)
inTrain = createDataPartition(mixtures$CompressiveStrength, p = 3/4)[[1]]
training = mixtures[inTrain, ]
testing = mixtures[-inTrain, ]
Make a plot of the outcome (CompressiveStrength) versus the index of the samples. Color by each of the variables in the data set (you may find the cut2() function in the Hmisc package useful for turning continuous covariates into factors). What do you notice in these plots?
Correct Answer: There is a non-random pattern in the plot of the outcome versus index.
Explanation: The outcome variable exhibits a clear trend over the index, indicating non-random variation that could suggest systematic changes in the dataset.
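One way to produce these plots, sketched under the assumption that the training split above is in memory; FlyAsh is just one example covariate to color by:
library(Hmisc)
library(ggplot2)
training$index = seq_len(nrow(training))             # sample index
training$cutFlyAsh = cut2(training$FlyAsh, g = 4)    # bin one continuous covariate into 4 groups
qplot(index, CompressiveStrength, data = training, colour = cutFlyAsh)
# Repeating this for each covariate shows that none of them explains the
# step-like pattern of the outcome against the index.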
Q3. Load the cement data using the commands:
library(AppliedPredictiveModeling)
data(concrete)
library(caret)
set.seed(1000)
inTrain = createDataPartition(mixtures$CompressiveStrength, p = 3/4)[[1]]
training = mixtures[inTrain, ]
testing = mixtures[-inTrain, ]
Make a histogram and confirm the SuperPlasticizer variable is skewed. Normally you might use the log transform to try to make the data more symmetric. Why would that be a poor choice for this variable?
Correct Answer: There are values of zero, so when you take the log() transform those values will be -Inf.
Explanation: In R, log(0) evaluates to -Inf, so a log transform of a variable containing zeros produces values that cannot be used for modeling.
Q4. Load the Alzheimer’s disease data using the commands:
library(caret)
library(AppliedPredictiveModeling)
set.seed(3433)
data(AlzheimerDisease)
adData = data.frame(diagnosis, predictors)
inTrain = createDataPartition(adData$diagnosis, p = 3/4)[[1]]
training = adData[inTrain, ]
testing = adData[-inTrain, ]
Find all the predictor variables in the training set that begin with IL. Perform principal component analysis on these variables with the preProcess() function from the caret package. Calculate the number of principal components needed to capture 90% of the variance. How many are there?
Correct Answer: 7
Explanation: Principal Component Analysis (PCA) reduces the dimensionality of the data by finding linear combinations of predictors. Seven components explain 90% of the variance in this case.
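A minimal sketch of the computation, assuming the training split above is in memory; the exact count depends on the seed and package versions:
ilCols = grep("^IL", names(training), value = TRUE)         # predictors starting with IL
preProc = preProcess(training[, ilCols], method = "pca", thresh = 0.90)
preProc$numComp                                             # components needed for 90% of the variance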
Q5. Load the Alzheimer’s disease data using the commands:
library(caret)
library(AppliedPredictiveModeling)
set.seed(3433)
data(AlzheimerDisease)
adData = data.frame(diagnosis, predictors)
inTrain = createDataPartition(adData$diagnosis, p = 3/4)[[1]]
training = adData[inTrain, ]
testing = adData[-inTrain, ]
Create a training data set consisting of only the predictors with variable names beginning with IL and the diagnosis. Build two predictive models, one using the predictors as they are and one using PCA with principal components explaining 80% of the variance in the predictors. Use method = "glm" in the train() function.
What is the accuracy of each method in the test set? Which is more accurate?
Correct Answer: Non-PCA Accuracy: 0.72, PCA Accuracy: 0.65
Explanation: The Non-PCA model performs better in this scenario, likely because PCA removes some information relevant to the predictive model.
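A sketch of one way to fit and compare the two models with caret; column selection and the thresh option are the key pieces, and exact accuracies vary with package versions:
ilCols = grep("^IL", names(training), value = TRUE)
trainIL = training[, c("diagnosis", ilCols)]
testIL = testing[, c("diagnosis", ilCols)]
# Model on the raw IL predictors
fitRaw = train(diagnosis ~ ., data = trainIL, method = "glm")
confusionMatrix(predict(fitRaw, testIL), testIL$diagnosis)$overall["Accuracy"]
# Model on principal components capturing 80% of the variance
fitPCA = train(diagnosis ~ ., data = trainIL, method = "glm", preProcess = "pca",
               trControl = trainControl(preProcOptions = list(thresh = 0.80)))
confusionMatrix(predict(fitPCA, testIL), testIL$diagnosis)$overall["Accuracy"]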
Practical Machine Learning Week 03 Quiz Answers
Q1. For this quiz, we will be using several R packages. R package versions change over time; the right answers have been checked using the following versions of the packages:
- AppliedPredictiveModeling: v1.1.6
- caret: v6.0.47
- ElemStatLearn: v2012.04-0
- pgmm: v1.1
- rpart: v4.1.8
Load the cell segmentation data from the AppliedPredictiveModeling package using the commands:
library(AppliedPredictiveModeling)
data(segmentationOriginal)
library(caret)
- Subset the data to a training set and testing set based on the Case variable in the data set.
- Set the seed to 125 and fit a CART model with the rpart method using all predictor variables and default caret settings.
- In the final model, what would be the final model prediction for cases with the following variable values?
a. TotalIntench2 = 23,000; FiberWidthCh1 = 10; PerimStatusCh1 = 2
b. TotalIntench2 = 50,000; FiberWidthCh1 = 10; VarIntenCh4 = 100
c. TotalIntench2 = 57,000; FiberWidthCh1 = 8; VarIntenCh4 = 100
d. FiberWidthCh1 = 8; VarIntenCh4 = 100; PerimStatusCh1 = 2
Correct Answer: a. PS
b. WS
c. PS
d. Not possible to predict
Explanation: The CART model divides cases based on predictor splits. The above predictions are based on the splits in the fitted decision tree.
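A sketch of the fit, assuming the Case column marks the official train/test split; plotting the final rpart tree is the easiest way to trace cases a-d through the splits:
inTrain = segmentationOriginal$Case == "Train"
training = segmentationOriginal[inTrain, ]
testing = segmentationOriginal[!inTrain, ]
set.seed(125)
fitCART = train(Class ~ ., data = training, method = "rpart")
plot(fitCART$finalModel, uniform = TRUE)             # draw the tree
text(fitCART$finalModel, use.n = TRUE, cex = 0.8)    # label the splits and leaves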
Q2. If K is small in a K-fold cross-validation, is the bias in the estimate of out-of-sample (test set) accuracy smaller or bigger? Is the variance in the estimate of out-of-sample (test set) accuracy smaller or bigger? Is K large or small in leave-one-out cross-validation?
Correct Answer: The bias is larger, and the variance is smaller. Under leave-one-out cross-validation, K is equal to the sample size.
Explanation: When K is small, each training fold holds back a large share of the data, so models are trained on less data and the accuracy estimate is biased (typically pessimistic); at the same time the estimate varies less from fold to fold. Leave-one-out cross-validation sets K equal to the number of observations, which minimizes this bias but increases the variance of the estimate.
Q3. Load the olive oil data using the commands:
library(pgmm)
data(olive)
olive = olive[, -1]
Fit a classification tree where Area is the outcome variable. Then predict the value of Area for the following data frame using the tree command with all defaults:
newdata = as.data.frame(t(colMeans(olive)))
What is the resulting prediction? Is the resulting prediction strange? Why or why not?
Correct Answer: 2.783. It is strange because Area should be a qualitative variable, but tree is reporting the average value of Area as a numeric variable in the leaf predicted for newdata.
Explanation: Because Area was left as a numeric variable, the model is fit as a regression tree rather than a classification tree, so it returns the mean value of Area in the terminal node instead of a region label.
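A sketch of the fit with caret's rpart method; the pgmm data leave Area as a numeric column, which is exactly what makes the prediction look odd:
library(caret)
fitTree = train(Area ~ ., data = olive, method = "rpart")
newdata = as.data.frame(t(colMeans(olive)))
predict(fitTree, newdata)   # a numeric value near 2.78, not an area label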
Q4. Load the South Africa Heart Disease Data and create training and test sets with the following code:
library(ElemStatLearn)
data(SAheart)
set.seed(8484)
train = sample(1:dim(SAheart)[1], size = dim(SAheart)[1] / 2, replace = FALSE)
trainSA = SAheart[train, ]
testSA = SAheart[-train, ]
Then set the seed to 13234 and fit a logistic regression model (method = "glm", family = "binomial") with Coronary Heart Disease (chd) as the outcome and age at onset, current alcohol consumption, obesity levels, cumulative tobacco, type-A behavior, and low-density lipoprotein cholesterol as predictors.
Calculate the misclassification rate for your model using this function and a prediction on the “response” scale:
missClass = function(values, prediction) {
sum(((prediction > 0.5) * 1) != values) / length(values)
}
What is the misclassification rate on the training set? What is the misclassification rate on the test set?
Correct Answer: Test Set Misclassification: 0.31, Training Set Misclassification: 0.27
Explanation: The misclassification rates are calculated based on whether predicted probabilities exceed the 0.5 threshold, compared to the observed outcomes.
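A sketch of the fit with caret, using the predictor names as they appear in the SAheart data frame; predictions from this numeric-outcome glm are on the response (probability) scale, as the question requires:
set.seed(13234)
fitGLM = train(chd ~ age + alcohol + obesity + tobacco + typea + ldl,
               data = trainSA, method = "glm", family = "binomial")
missClass(trainSA$chd, predict(fitGLM, newdata = trainSA))   # training misclassification
missClass(testSA$chd, predict(fitGLM, newdata = testSA))     # test misclassification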
Q5. Load the vowel.train and vowel.test datasets:
library(ElemStatLearn)
data(vowel.train)
data(vowel.test)
Set the variable y to be a factor variable in both the training and test set. Then set the seed to 33833. Fit a random forest predictor relating the factor variable y to the remaining variables.
Calculate the variable importance using the varImp() function in the caret package. What is the order of variable importance?
Correct Answer: The order of the variables is: x.2, x.1, x.5, x.8, x.6, x.4, x.3, x.9, x.7, x.10
Explanation: Variable importance measures how much each predictor contributes to the model’s accuracy or Gini index in random forests.
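A sketch of the fit; varImp() from caret orders the predictors by their random forest importance, and the exact ordering can shift slightly with package versions:
vowel.train$y = factor(vowel.train$y)
vowel.test$y = factor(vowel.test$y)
set.seed(33833)
fitRF = train(y ~ ., data = vowel.train, method = "rf")
varImp(fitRF)   # importance table, largest first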
Practical Machine Learning Week 04 Quiz Answers
Q1. Load the vowel.train and vowel.test data sets:
library(ElemStatLearn)
data(vowel.train)
data(vowel.test)
Set the variable y to be a factor variable in both the training and test set. Then set the seed to 33833. Fit:
- A random forest predictor relating the factor variable y to the remaining variables.
- A boosted predictor using the "gbm" method.
Fit these both with the train() command in the caret package.
What are the accuracies for the two approaches on the test data set? What is the accuracy among the test set samples where the two methods agree?
Correct Answer: RF Accuracy = 0.6082, GBM Accuracy = 0.5152, Agreement Accuracy = 0.6361
Explanation: The random forest model achieves higher accuracy than the boosted trees model. Among the cases where the predictions agree, the accuracy is slightly higher, indicating agreement correlates with correctness.
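A sketch of one way to compute the three accuracies, assuming the factor conversion and seed from the question have been applied:
set.seed(33833)
fitRF = train(y ~ ., data = vowel.train, method = "rf")
fitGBM = train(y ~ ., data = vowel.train, method = "gbm", verbose = FALSE)
predRF = predict(fitRF, vowel.test)
predGBM = predict(fitGBM, vowel.test)
confusionMatrix(predRF, vowel.test$y)$overall["Accuracy"]    # random forest accuracy
confusionMatrix(predGBM, vowel.test$y)$overall["Accuracy"]   # boosting accuracy
agree = predRF == predGBM
mean(predRF[agree] == vowel.test$y[agree])                   # accuracy where the two models agree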
Q2. Load the Alzheimer’s data using the following commands:
library(caret)
library(gbm)
set.seed(3433)
library(AppliedPredictiveModeling)
data(AlzheimerDisease)
Set the seed to 62433 and predict diagnosis with all the other variables using:
- Random forest ("rf"),
- Boosted trees ("gbm"),
- Linear discriminant analysis ("lda") models.
Stack the predictions together using random forests ("rf").
What is the resulting accuracy on the test set? Is it better or worse than each of the individual predictions?
Correct Answer: Stacked Accuracy: 0.80 is better than all three other methods
Explanation: Stacking combines the strengths of all three models, resulting in improved predictive performance compared to any individual model.
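A sketch of the stacking approach, assuming the usual 3/4 partition on diagnosis for training and testing; the stacked model is trained on the component models' test-set predictions, as in the course example:
adData = data.frame(diagnosis, predictors)
inTrain = createDataPartition(adData$diagnosis, p = 3/4)[[1]]
training = adData[inTrain, ]
testing = adData[-inTrain, ]
set.seed(62433)
fitRF = train(diagnosis ~ ., data = training, method = "rf")
fitGBM = train(diagnosis ~ ., data = training, method = "gbm", verbose = FALSE)
fitLDA = train(diagnosis ~ ., data = training, method = "lda")
predDF = data.frame(rf = predict(fitRF, testing),
                    gbm = predict(fitGBM, testing),
                    lda = predict(fitLDA, testing),
                    diagnosis = testing$diagnosis)
fitStack = train(diagnosis ~ ., data = predDF, method = "rf")
confusionMatrix(predict(fitStack, predDF), predDF$diagnosis)$overall["Accuracy"]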
Q3. Load the concrete data with the commands:
set.seed(3523)
library(AppliedPredictiveModeling)
data(concrete)
inTrain = createDataPartition(concrete$CompressiveStrength, p = 3/4)[[1]]
training = concrete[inTrain, ]
Set the seed to 233 and fit a lasso model to predict CompressiveStrength. Which variable is the last coefficient to be set to zero as the penalty increases?
Correct Answer: Cement
Explanation: The lasso shrinks coefficient estimates toward zero as the penalty increases; Cement is the last coefficient to reach zero, indicating its strong association with compressive strength.
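A sketch of how to inspect the coefficient paths with caret's lasso method (backed by the elasticnet package); the last curve to hit zero as the penalty grows corresponds to Cement:
set.seed(233)
fitLasso = train(CompressiveStrength ~ ., data = training, method = "lasso")
library(elasticnet)
plot.enet(fitLasso$finalModel, xvar = "penalty", use.color = TRUE)   # coefficient paths vs. penalty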
Q4. Load the data on the number of visitors to the instructor's blog using the commands:
library(lubridate) # For year() function below
dat = read.csv("~/Desktop/gaData.csv")
training = dat[year(dat$date) < 2012, ]
testing = dat[(year(dat$date)) > 2011, ]
tstrain = ts(training$visitsTumblr)
Fit a model using the bats() function in the forecast package to the training time series. Then forecast this model for the remaining time points. For how many of the testing points is the true value within the 95% prediction interval bounds?
Correct Answer: 94%
Explanation: The bats() model captures the time-series patterns well, and 94% of the actual values fall within the 95% prediction intervals.
Q5. Load the concrete data with the commands:
set.seed(3523)
library(AppliedPredictiveModeling)
data(concrete)
inTrain = createDataPartition(concrete$CompressiveStrength, p = 3/4)[[1]]
training = concrete[inTrain, ]
Set the seed to 325 and fit a support vector machine (SVM) using the e1071 package to predict CompressiveStrength using the default settings. Predict on the testing set. What is the RMSE?
Correct Answer: 6.72
Explanation: The support vector machine model predicts the compressive strength with an RMSE of 6.72, indicating the average deviation of predictions from the actual values.
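A sketch of the fit with e1071; note the quiz code above only builds the training set, so the testing set is assumed here to be the complementary rows:
library(e1071)
testing = concrete[-inTrain, ]        # assumed test split (complement of inTrain)
set.seed(325)
fitSVM = svm(CompressiveStrength ~ ., data = training)
pred = predict(fitSVM, testing)
sqrt(mean((pred - testing$CompressiveStrength)^2))   # RMSE on the test set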
Frequently Asked Questions (FAQ)
Are the Practical Machine Learning quiz answers accurate?
Yes, these answers are thoroughly reviewed and verified to align with the latest course content and machine learning principles.
Can I use these answers for both practice and graded quizzes?
Absolutely! These answers are designed for both practice quizzes and graded assessments, helping you prepare thoroughly for all evaluations.
Does this guide cover all modules of the course?
Yes, this guide provides answers for every module, ensuring complete coverage of the entire course content.
Will this guide help me apply machine learning concepts practically?
Yes, beyond providing quiz answers, this guide reinforces key topics such as supervised and unsupervised learning, cross-validation, performance metrics, and model optimization for real-world applications.
Conclusion
We hope this guide to Practical Machine Learning Quiz Answers helps you gain confidence in applying machine learning concepts and succeed in your course. Bookmark this page for quick reference and share it with your classmates. Ready to build practical machine learning models and ace your quizzes? Let’s dive in and get started!
Sources: Practical Machine Learning