### All Weeks Applied Machine Learning in Python Coursera Quiz Answers

### Applied Machine Learning in Python Week 01 Quiz Answers

#### Fundamentals of Machine Learning – Intro to SciKit Learn

Q1. Select the option that correctly completes the sentence:

Training a model using labeled data and using this model to predict the labels for new data is known as **__**.

- Unsupervised Learning
- Clustering
- **Supervised Learning**
- Density Estimation

Q2. Select the option that correctly completes the sentence:

Modeling the features of an unlabeled dataset to find hidden structure is known as **__**.

- Classification
- Supervised Learning
- Regression
- **Unsupervised Learning**

Q3. Select the option that correctly completes the sentence:

Training a model using categorically labelled data to predict labels for new data is known as **__**.

- Clustering
- **Classification**
- Regression
- Feature Extraction

Q4. Select the option that correctly completes the sentence:

Training a model using labelled data where the labels are continuous quantities to predict labels for new data is known as **__**.

- **Regression**
- Clustering
- Classification
- Feature Extraction

Q5. Using the data for classes 0, 1, and 2 plotted below, what class would a KNeighborsClassifier classify the new point as for k = 1 and k = 3?

- k=1: Class 1 , k=3: Class 0
- k=1: Class 0 , k=3: Class 2
- k=1: Class 0 , k=3: Class 1
- k=1: Class 2 , k=3: Class 1
- **k=1: Class 1 , k=3: Class 2**
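
The plot for Q5 is not reproduced here, but the way k changes the prediction can be sketched with a small hand-written nearest-neighbor vote. The points, labels, and the `knn_predict` helper below are hypothetical and only illustrate the mechanism; they are not the quiz data or the `KNeighborsClassifier` implementation.

```python
from collections import Counter
import math

def knn_predict(points, labels, query, k):
    """Return the majority label among the k training points nearest to query."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data (not the quiz's plot).
points = [(1.0, 1.0), (2.0, 2.2), (2.2, 1.8), (5.0, 5.0)]
labels = [1, 2, 2, 0]
query = (1.2, 1.2)

pred_k1 = knn_predict(points, labels, query, k=1)  # only the single closest point votes
pred_k3 = knn_predict(points, labels, query, k=3)  # the three closest points vote
print(pred_k1, pred_k3)
```

With k = 1 the single closest point decides the class; with k = 3 two slightly farther points of another class can outvote it.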

Q6. Which of the following is true for the nearest neighbor classifier (Select all that apply):

- **Memorizes the entire training set**
- A higher value of k leads to a more complex decision boundary
- Partitions observations into k clusters where each observation belongs to the cluster with the nearest mean
- Given a data instance to classify, computes the probability of each possible class using a statistical model of the input features

Q7. Why is it important to examine your dataset as a first step in applying machine learning? (Select all that apply):

- **See what type of cleaning or preprocessing still needs to be done**
- **You might notice missing data**
- **Gain insight on what machine learning model might be appropriate, if any**
- **Get a sense for how difficult the problem might be**
- It is not important

Q8. The key purpose of splitting the dataset into training and test sets is:

- To speed up the training process
- To reduce the amount of labelled data needed for evaluating classifier accuracy
- **To estimate how well the learned model will generalize to new data**
- To reduce the number of features we need to consider as input to the learning algorithm

Q9. The purpose of setting the random_state parameter in train_test_split is: (Select all that apply)

- **To make experiments easily reproducible by always using the same partitioning of the data**
- To avoid bias in data splitting
- To avoid predictable splitting of the data
- To split the data into similar subsets so that bias is not introduced into the final results

Q10. Given a dataset with 10,000 observations and 50 features plus one label, what would be the dimensions of X_train, y_train, X_test, and y_test? Assume a train/test split of 75%/25%.

- X_train: (10000, 28)
- y_train: (10000, )
- X_test: (10000, 12)
- y_test: (10000, )

- X_train: (2500, 50)
- y_train: (2500, )
- X_test: (7500, 50)
- y_test: (7500, )

- X_train: (10000, 50)
- y_train: (10000, )
- X_test: (10000, 50)
- y_test: (10000, )

- X_train: (2500, )
- y_train: (2500, 50)
- X_test: (7500, )
- y_test: (7500, 50)

- **X_train: (7500, 50)**
- **y_train: (7500, )**
- **X_test: (2500, 50)**
- **y_test: (2500, )**
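
The shapes in Q10 and the role of a fixed seed in Q9 can be checked with a dependency-free sketch. The `split_indices` helper is hypothetical and only mimics the idea of a seeded shuffle followed by a 75%/25% cut; it is not scikit-learn's `train_test_split`.

```python
import random

def split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle row indices with a fixed seed, then cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]

n_samples, n_features = 10_000, 50
train_idx, test_idx = split_indices(n_samples)

# Splitting happens along rows: X keeps all 50 feature columns, y stays one-dimensional.
shapes = {
    "X_train": (len(train_idx), n_features),
    "y_train": (len(train_idx),),
    "X_test": (len(test_idx), n_features),
    "y_test": (len(test_idx),),
}
print(shapes)

# Reusing the same seed reproduces exactly the same partition.
same_split = split_indices(n_samples, seed=0) == split_indices(n_samples, seed=0)
print(same_split)
```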

### Applied Machine Learning in Python Week 02 Quiz Answers

#### Supervised Machine Learning

Q1. After training a ridge regression model, you find that the training and test set accuracies are 0.98 and 0.54 respectively. Which of the following would be the best choice for the next ridge regression model you train?

- You are overfitting, the next model trained should have a lower value for alpha
- **You are overfitting, the next model trained should have a higher value for alpha**
- You are underfitting, the next model trained should have a lower value for alpha
- You are underfitting, the next model trained should have a higher value for alpha
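
To see why a higher alpha counters overfitting, it may help to look at the closed-form ridge solution for a single feature with no intercept, where the slope is `sum(x*y) / (sum(x*x) + alpha)`. The data and the `ridge_slope` helper below are hypothetical; this is a sketch of the shrinkage effect, not the scikit-learn `Ridge` estimator.

```python
def ridge_slope(xs, ys, alpha):
    """Closed-form ridge solution for a single feature and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

# Hypothetical data lying roughly on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

slopes = {alpha: ridge_slope(xs, ys, alpha) for alpha in (0.0, 1.0, 10.0, 100.0)}
for alpha, slope in slopes.items():
    print(f"alpha={alpha:<6} slope={slope:.3f}")
```

As alpha grows the coefficient is pulled toward zero, which yields a simpler model that fits the training data less tightly.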

Q2. After training a Radial Basis Function (RBF) kernel SVM, you decide to increase the influence of each training point and to simplify the decision surface. Which of the following would be the best choice for the next RBF SVM you train?

- **Decrease C and gamma**
- Increase C and gamma
- Increase C, decrease gamma
- Decrease C, increase gamma
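
The gamma half of this answer follows from the RBF kernel formula `exp(-gamma * distance**2)`: a smaller gamma makes the similarity decay more slowly, so each training point influences a wider region. The sketch below only evaluates that formula (the `rbf_kernel` helper name is an assumption); it does not train an SVM, and the effect of C is not shown.

```python
import math

def rbf_kernel(distance, gamma):
    """RBF similarity between two points separated by the given Euclidean distance."""
    return math.exp(-gamma * distance ** 2)

distance = 2.0
for gamma in (0.01, 0.1, 1.0, 10.0):
    print(f"gamma={gamma:<5} similarity at distance 2: {rbf_kernel(distance, gamma):.4f}")

similarity_small_gamma = rbf_kernel(distance, 0.01)  # still close to 1: wide influence
similarity_large_gamma = rbf_kernel(distance, 10.0)  # practically 0: very local influence
```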

Q3. Which of the following is an example of multiclass classification? (Select all that apply)

- **Classify a set of fruits as apples, oranges, bananas, or lemons**
- Predict whether an article is relevant to one or more topics (e.g. sports, politics, finance, science)
- Predicting both the rating and profit of a soon-to-be-released movie
- Classify a voice recording as an authorized user or not an authorized user.

Q4. Looking at the plot below which shows accuracy scores for different values of a regularization parameter lambda, what value of lambda is the best choice for generalization?

10

Q5. Suppose you are interested in finding a parsimonious model (the model that accomplishes the desired level of prediction with as few predictor variables as possible) to predict housing prices. Which of the following would be the best choice?

- **Lasso Regression**
- Logistic Regression
- Ridge Regression
- Ordinary Least Squares Regression
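
Lasso suits a parsimonious model because its L1 penalty can set coefficients to exactly zero, while ridge only shrinks them. Under the textbook simplification of orthonormal features, the lasso solution is a soft-thresholded copy of the least-squares coefficients and the ridge solution is a uniformly scaled copy. The coefficient values below are hypothetical, and the exact shrinkage in a library depends on how it scales the penalty.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha, snapping to exactly zero inside [-alpha, alpha]."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0

# Hypothetical least-squares coefficients for four features.
ols_coefs = [4.0, -0.3, 0.1, -2.5]
alpha = 0.5

# L1-style penalty: small coefficients become exactly 0, so those features drop out.
lasso_like = [soft_threshold(c, alpha) for c in ols_coefs]
# L2-style penalty: every coefficient shrinks, but none reaches 0.
ridge_like = [c / (1.0 + alpha) for c in ols_coefs]

print(lasso_like)
print(ridge_like)
```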

Q6. Match the plots of SVM margins below to the values of the C parameter that correspond to them.

- **0.1, 1, 10**
- 10, 0.1, 1
- 1, 0.1, 10
- 10, 1, 0.1

Q7. Use Figures A and B below to answer questions 7, 8, 9, and 10.

Looking at the two figures (Figure A, Figure B), determine which linear model each figure corresponds to:

- **Figure A: Ridge Regression, Figure B: Lasso Regression**
- Figure A: Lasso Regression, Figure B: Ridge Regression
- Figure A: Ordinary Least Squares Regression, Figure B: Ridge Regression
- Figure A: Ridge Regression, Figure B: Ordinary Least Squares Regression
- Figure A: Ordinary Least Squares Regression, Figure B: Lasso Regression
- Figure A: Lasso Regression, Figure B: Ordinary Least Squares Regression

Q8. Looking at Figure A and B, what is a value of alpha that optimizes the R2 score for the Ridge Model?

3

Q9. Looking at Figure A and B, what is a value of alpha that optimizes the R2 score for the Lasso Model?

10

Q10. When running a LinearRegression() model with default parameters on the same data that generated Figures A and B the output coefficients are:

| Coefficient | Value |
| --- | --- |
| Coef 0 | -19.5 |
| Coef 1 | 48.8 |
| Coef 2 | 9.7 |
| Coef 3 | 24.6 |
| Coef 4 | 13.2 |
| Coef 5 | 5.1 |

For what value of Coef 3 is R2 score maximized for the Lasso Model?

0

Q11. Which of the following is true of cross-validation? (Select all that apply)

- Increases generalization ability and reduces computational complexity
- **Increases generalization ability and computational complexity**
- **Helps prevent knowledge about the test set from leaking into the model**
- **Fits multiple models on different splits of the data**
- Removes need for training and test sets
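
The "fits multiple models on different splits" part of Q11 can be sketched by generating the index sets for each fold. The `k_fold_indices` helper is a hypothetical, unshuffled stand-in for a library splitter; a real workflow would fit and score a fresh model inside the loop.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, n_folds=3))
for train, validation in folds:
    # One model per fold would be trained on `train` and evaluated on `validation`.
    print("train:", train, "validation:", validation)
```

Each sample is used for validation exactly once, and the extra model fits are where the added computational cost comes from.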

### Applied Machine Learning in Python Week 03 Quiz Answers

#### Evaluation

Q1. A supervised learning model has been built to predict whether someone is infected with a new strain of a virus. The probability of any one person having the virus is 1%. Using accuracy as a metric, what would be a good choice for a baseline accuracy score that the new model would want to outperform?

0.99
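
The 0.99 baseline is what a classifier that always predicts "not infected" would score. A minimal check on a hypothetical population with a 1% positive rate:

```python
# Hypothetical population of 10,000 people with a 1% infection rate.
labels = [1] * 100 + [0] * 9_900

# A dummy classifier that always predicts the majority class (not infected).
predictions = [0] * len(labels)

baseline_accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
# The same dummy classifier finds none of the infected people.
baseline_recall = sum(p == 1 and y == 1 for p, y in zip(predictions, labels)) / sum(labels)
print(baseline_accuracy, baseline_recall)
```

The high accuracy alongside zero recall is why accuracy alone can mislead on imbalanced classes.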

Q2. Given the following confusion matrix:

| | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Condition Positive | 96 | 4 |
| Condition Negative | 8 | 19 |

Compute the accuracy to three decimal places.

0.906

Q3. Given the following confusion matrix:

| | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Condition Positive | 96 | 4 |
| Condition Negative | 8 | 19 |

Compute the precision to three decimal places.

0.923

Q4. Given the following confusion matrix:

| | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Condition Positive | 96 | 4 |
| Condition Negative | 8 | 19 |

Compute the recall to three decimal places.

0.960
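
The answers to Q2 through Q4 follow directly from the four cells of the confusion matrix. A short worked check, using only the counts given in the questions:

```python
# Counts taken from the confusion matrix in Q2 to Q4.
tp, fn = 96, 4   # condition positive row: predicted positive, predicted negative
fp, tn = 8, 19   # condition negative row: predicted positive, predicted negative

accuracy = (tp + tn) / (tp + fn + fp + tn)   # correct predictions / all predictions
precision = tp / (tp + fp)                   # true positives / predicted positives
recall = tp / (tp + fn)                      # true positives / actual positives

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```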

Q5. Using the fitted model `m`, create a precision-recall curve to answer the following question:

For the fitted model `m`, approximately what precision can we expect for a recall of 0.8?

(Use y_test and X_test to compute the precision-recall curve. If you wish to view a plot, you can use `plt.show()`.)

```
print(m)
```

0.6
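
The quiz's model `m` and its test data are not available here, so the 0.6 above cannot be reproduced. The procedure itself can be sketched by hand: sweep a threshold over the decision scores, compute precision and recall at each cutoff, and read off the precision where recall first reaches 0.8. The labels, scores, and `precision_recall_points` helper below are hypothetical.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score used as a cutoff."""
    total_positives = sum(y_true)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(p and y == 1 for p, y in zip(predicted, y_true))
        fp = sum(p and y == 0 for p, y in zip(predicted, y_true))
        points.append((threshold, tp / (tp + fp), tp / total_positives))
    return points

# Hypothetical labels and decision scores (not the quiz's model or data).
y_true = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10]

curve = precision_recall_points(y_true, scores)
# Precision at the first (highest) cutoff whose recall reaches 0.8.
precision_at_recall_08 = next(p for _, p, r in curve if r >= 0.8)
print(round(precision_at_recall_08, 3))
```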

Q6. Given the following models and AUC scores, match each model to its corresponding ROC curve.

- Model 1 test set AUC score: 0.91
- Model 2 test set AUC score: 0.50
- Model 3 test set AUC score: 0.56

- Model 1: Roc 1
- Model 2: Roc 2
- Model 3: Roc 3

- **Model 1: Roc 1**
- **Model 2: Roc 3**
- **Model 3: Roc 2**

- Model 1: Roc 2
- Model 2: Roc 3
- Model 3: Roc 1

- Model 1: Roc 3
- Model 2: Roc 2
- Model 3: Roc 1

- Not enough information is given.
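
The matching in Q6 relies on what AUC measures: the probability that a randomly chosen positive is scored above a randomly chosen negative, so 0.50 corresponds to a curve along the diagonal. A pairwise sketch of that definition, with hypothetical scores and a hand-written `roc_auc` helper:

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, y_true) if y == 1]
    negatives = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

y_true = [1, 1, 1, 0, 0, 0]
good_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]   # ranks most positives above negatives
constant_scores = [0.5] * 6                     # carries no ranking information

auc_good = roc_auc(y_true, good_scores)
auc_constant = roc_auc(y_true, constant_scores)
print(round(auc_good, 3), auc_constant)
```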

Q7. Given the following models and accuracy scores, match each model to its corresponding ROC curve.

- Model 1 test set accuracy: 0.91
- Model 2 test set accuracy: 0.79
- Model 3 test set accuracy: 0.72

- Model 1: Roc 1
- Model 2: Roc 2
- Model 3: Roc 3

- Model 1: Roc 1
- Model 2: Roc 3
- Model 3: Roc 2

- Model 1: Roc 2
- Model 2: Roc 3
- Model 3: Roc 1

- Model 1: Roc 3
- Model 2: Roc 2
- Model 3: Roc 1

- **Not enough information is given.**

Q8. Using the fitted model `m`, what is the micro precision score?

(Use y_test and X_test to compute the precision score.)

`print(m)`

0.744

Q9. Which of the following is true of the R-Squared metric? (Select all that apply)

- A model that always predicts the mean of y would get a negative score
- **A model that always predicts the mean of y would get a score of 0.0**
- The worst possible score is 0.0
- **The best possible score is 1.0**
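
The Q9 options can be checked numerically from the definition R² = 1 - SS_res / SS_tot. The `r_squared` helper and the data are a hand-written illustration, not a library call. Perfect predictions score 1.0, always predicting the mean scores 0.0, and a model worse than the mean scores below zero.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
mean_y = sum(y_true) / len(y_true)

r2_perfect = r_squared(y_true, y_true)                  # exact predictions
r2_mean = r_squared(y_true, [mean_y] * len(y_true))     # always predict the mean
r2_bad = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])        # predictions in reverse order
print(r2_perfect, r2_mean, r2_bad)
```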

Q10. In a future society, a machine is used to predict a crime before it occurs. If you were responsible for tuning this machine, what evaluation metric would you want to maximize to ensure no innocent people (people not about to commit a crime) are imprisoned (where crime is the positive label)?

- Accuracy
- **Precision**
- Recall
- F1
- AUC

Q11. Consider the machine from the previous question. If you were responsible for tuning this machine, what evaluation metric would you want to maximize to ensure all criminals (people about to commit a crime) are imprisoned (where crime is the positive label)?

- Accuracy
- Precision
- **Recall**
- F1
- AUC

Q12. A classifier is trained on an imbalanced multiclass dataset. After looking at the model’s precision scores, you find that the micro averaging is much smaller than the macro averaging score. Which of the following is most likely happening?

- The model is probably misclassifying the infrequent labels more than the frequent labels.
- **The model is probably misclassifying the frequent labels more than the infrequent labels.**
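
The reasoning behind Q12 is that micro averaging weights every instance equally, so errors on the frequent class drag it down, while macro averaging weights every class equally. A sketch on hypothetical data where all the errors fall on instances of the frequent class:

```python
def per_class_precision(y_true, y_pred, label):
    """Precision for one class: correct predictions of label / all predictions of label."""
    predicted = sum(p == label for p in y_pred)
    correct = sum(p == label and t == label for t, p in zip(y_true, y_pred))
    return correct / predicted

# Hypothetical imbalanced data: class "a" is frequent, "b" and "c" are rare.
y_true = ["a"] * 80 + ["b"] * 10 + ["c"] * 10
# Half of the frequent class is mistaken for "b"; the rare classes are all correct.
y_pred = ["a"] * 40 + ["b"] * 40 + ["b"] * 10 + ["c"] * 10

labels = sorted(set(y_true))
# Macro: average the per-class precisions, one vote per class.
macro_precision = sum(per_class_precision(y_true, y_pred, c) for c in labels) / len(labels)
# Micro: pool all predictions, one vote per instance.
micro_precision = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(micro_precision, 3), round(macro_precision, 3))
```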

Q13. Using the already defined RBF SVC model `m`, run a grid search on the parameters C and gamma, for values [0.01, 0.1, 1, 10]. The grid search should find the model that best optimizes for recall. How much better is the recall of this model than the precision? (Compute recall – precision to 3 decimal places)

(Use y_test and X_test to compute precision and recall.)

`print(m)`

0.52

Q14. Using the already defined RBF SVC model `m`, run a grid search on the parameters C and gamma, for values [0.01, 0.1, 1, 10]. The grid search should find the model that best optimizes for precision. How much better is the precision of this model than the recall? (Compute precision – recall to 3 decimal places)

(Use y_test and X_test to compute precision and recall.)

`print(m)`

0.15

### Applied Machine Learning in Python Week 04 Quiz Answers

#### Supervised Machine Learning – Part 2

Q1. Which of the following is an example of clustering?

- **Separate the data into distinct groups by similarity**
- Creating a new representation of the data with fewer features
- Accumulate data into groups based on labels
- Compress elongated clouds of data into more spherical representations

Q2. Which of the following are advantages to using decision trees over other models? (Select all that apply)

- **Trees often require less preprocessing of data**
- Trees are naturally resistant to overfitting
- Decision trees can learn complex statistical models using a variety of kernel functions
- **Trees are easy to interpret and visualize**

Q3. What is the main reason that each tree of a random forest only looks at a random subset of the features when building each node?

- To increase interpretability of the model
- To learn which features are not strong predictors
- To reduce the computational complexity associated with training each of the trees needed for the random forest.
- **To improve generalization by reducing correlation among the trees and making the model more robust to bias.**

Q4. Which of the following supervised machine learning methods are greatly affected by feature scaling? (Select all that apply)

- **KNN**
- **Support Vector Machines**
- Naive Bayes
- Decision Trees
- **Neural Networks**
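
Distance-based methods such as KNN are sensitive to scaling because a feature measured in large units dominates the Euclidean distance. The sketch below uses hypothetical rows and hand-written helpers (`min_max_scale`, `nearest_index`) to show the nearest neighbor of one row changing after min-max scaling.

```python
import math

def min_max_scale(rows):
    """Rescale each column to the range [0, 1] using that column's min and max."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
            for row in rows]

def nearest_index(rows, query_index):
    """Index of the row closest (Euclidean) to rows[query_index], excluding itself."""
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: math.dist(rows[i], rows[query_index]))

# Hypothetical rows: (age in years, income in dollars). Row 0 is the query.
rows = [(25, 50_000), (27, 56_000), (65, 51_000), (40, 90_000)]

neighbor_raw = nearest_index(rows, 0)                    # income differences dominate
neighbor_scaled = nearest_index(min_max_scale(rows), 0)  # both features now comparable
print(neighbor_raw, neighbor_scaled)
```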

Q5. Select which of the following statements are true.

- **For a model that won’t overfit a training set, Naive Bayes would be a better choice than a decision tree.**
- For a fitted model that doesn’t take up a lot of memory, KNN would be a better choice than logistic regression.
- For having an audience interpret the fitted model, a support vector machine would be a better choice than a decision tree.
- **For predicting future sales of a clothing line, Linear regression would be a better choice than a decision tree regressor.**

Q6. Match each of the prediction probabilities decision boundaries visualized below with the model that created them.

- KNN (k=1)
- Decision Tree
- Neural Network

- Neural Network
- Decision Tree
- KNN (k=1)

- KNN (k=1)
- Neural Network
- Decision Tree

- **Neural Network**
- **KNN (k=1)**
- **Decision Tree**

Q7. A decision tree of depth 2 is visualized below. Using the `value` attribute of each leaf, find the accuracy score for the tree of depth 2 and the accuracy score for a tree of depth 1.

What is the improvement in accuracy between the model of depth 1 and the model of depth 2? (i.e. accuracy2 – accuracy1)

0.06745
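
The tree figure is not reproduced here, so the 0.06745 above cannot be re-derived, but the method can be sketched. Each leaf predicts its majority class, so it classifies `max(value)` of its samples correctly, and the depth-1 leaves are the depth-2 leaves merged under their parents. The counts below are hypothetical.

```python
def accuracy_from_leaves(leaf_values):
    """Each leaf predicts its majority class, so it gets max(value) samples right."""
    correct = sum(max(value) for value in leaf_values)
    total = sum(sum(value) for value in leaf_values)
    return correct / total

# Hypothetical per-class sample counts (the `value` attribute); not the quiz's tree.
depth2_leaves = [[38, 2], [2, 8], [12, 3], [3, 32]]
# Depth-1 leaves are the sums of each pair of sibling depth-2 leaves.
depth1_leaves = [[40, 10], [15, 35]]

accuracy1 = accuracy_from_leaves(depth1_leaves)
accuracy2 = accuracy_from_leaves(depth2_leaves)
improvement = accuracy2 - accuracy1
print(accuracy1, accuracy2, round(improvement, 4))
```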

Q8. For the autograded assignment in this module, you will create a classifier to predict whether a given blight ticket will be paid on time (See the module 4 assignment notebook for a more detailed description). Which of the following features should be removed from the training of the model to prevent data leakage? (Select all that apply)

- ticket_issued_date – Date and time the ticket was issued
- grafitti_status – Flag for graffiti violations
- **compliance_detail – More information on why each ticket was marked compliant or non-compliant**
- agency_name – Agency that issued the ticket
- **collection_status – Flag for payments in collections**

Q9. Which of the following might be good ways to help prevent a data leakage situation?

- **If time is a factor, remove any data related to the event of interest that doesn’t take place prior to the event.**
- Ensure that data is preprocessed outside of any cross validation folds.
- **Remove variables that a model in production wouldn’t have access to**
- **Sanity check the model with an unseen validation set**
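
The unselected option in Q9 is the leaky one: preprocessing should happen inside each fold, with statistics learned from the training rows only. A minimal sketch with a hypothetical feature and hand-written scaling helpers:

```python
def fit_min_max(values):
    """Learn scaling parameters from the given values only."""
    return min(values), max(values)

def apply_min_max(values, low, high):
    """Scale values with previously learned parameters."""
    return [(v - low) / (high - low) for v in values]

# Hypothetical single feature, already split into a training fold and a held-out fold.
train_fold = [10.0, 12.0, 15.0, 20.0]
held_out_fold = [18.0, 30.0]

low, high = fit_min_max(train_fold)                        # training rows only
train_scaled = apply_min_max(train_fold, low, high)
held_out_scaled = apply_min_max(held_out_fold, low, high)  # reuse the training parameters
print(held_out_scaled)
```

A held-out value may land outside [0, 1]; that is expected, since the held-out rows played no part in choosing the scale.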

Q10. Given the neural network below, find the correct outputs for the given values of x1 and x2.

The neurons that are shaded have an activation threshold, e.g. the neuron with >1? will be activated and output 1 if the input is greater than 1 and will output 0 otherwise.

| x1 | x2 | output |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
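
The output column above is the XOR of x1 and x2. The network diagram is not reproduced here, so the weights and thresholds below are hypothetical; the sketch only shows that a small network of threshold units can produce this truth table.

```python
def step(value, threshold):
    """Threshold unit: output 1 if the input exceeds the threshold, else 0."""
    return 1 if value > threshold else 0

def tiny_network(x1, x2):
    """Two hidden threshold units feeding one output unit (hypothetical weights)."""
    h_or = step(x1 + x2, 0)        # fires when at least one input is 1
    h_and = step(x1 + x2, 1)       # fires only when both inputs are 1
    return step(h_or - h_and, 0)   # fires when exactly one input is 1

truth_table = {(x1, x2): tiny_network(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
print(truth_table)
```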
