# Art and Science of Machine Learning Coursera Quiz Answers

## Art and Science of Machine Learning Week 01 Quiz Answers

### The Art of ML: Regularization Quiz Answers

Q1. Regularization is useful because it can

• Limit overfitting
• Make models smaller

Q1. If searching among a large number of hyperparameters, you should do a systematic grid search rather than start from random values, so that you are not relying on chance. True or False?

• False
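
One common argument for random search can be sketched in a few lines: for the same number of trials, a grid revisits the same few values of each hyperparameter, while random sampling tries a fresh value every time. The helper names, the candidate values, and the search range below are illustrative assumptions, not part of the course material.

```python
import itertools
import random

def grid_points(values_per_axis, n_params):
    """All combinations of the same candidate values along every axis."""
    return list(itertools.product(values_per_axis, repeat=n_params))

def random_points(n_points, n_params, low, high, seed=0):
    """Independent uniform samples; every trial draws new values on every axis."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(low, high) for _ in range(n_params))
            for _ in range(n_points)]

def distinct_values_on_axis(points, axis):
    """How many different settings of one hyperparameter were actually tried."""
    return len({p[axis] for p in points})

# Hypothetical budget: 9 trials over 2 hyperparameters.
grid = grid_points([0.001, 0.01, 0.1], n_params=2)   # 3 x 3 = 9 trials
rand = random_points(len(grid), n_params=2, low=0.001, high=0.1)

print(distinct_values_on_axis(grid, 0))  # 3
print(distinct_values_on_axis(rand, 0))  # 9
```

If only one of the two hyperparameters matters much, the grid has effectively spent 9 trials testing 3 settings, which is why starting from random values is usually the better use of a fixed budget.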

Q2. It is a good idea to use the training loss itself as the hyperparameter tuning metric. True or False?

• False

Q3. Hyperparameter tuning in Cloud ML Engine involves adding the appropriate TensorFlow function call to your model code. True or False?

• False

Q4. You are creating a model to predict the outcome (final score difference) of a basketball game between Team A and Team B. Your initial model is a neural network with [64, 32] nodes, learning_rate = 0.05, batch_size = 32. The input features include whether the game was played “at home” for Team A, the fraction of the last 7 games that Team A won, the average number of points scored by Team A in its last 7 games, the average score of Team A’s opponents in its last 7 games, etc.

Which of these are hyperparameters to the model?

• The number of nodes in each layer of the DNN
• The learning rate
• The batch size
• The number of layers in the DNN
• The number of previous games that the input features are averaged over

### Learning Rate & Batch Size Quiz Answers

Q1. What is the key reason that we want to penalize models for over-complexity?

• Overly complex models may not generalize to unseen data in real-world scenarios

Q2. If your learning rate is too small, your loss function will:

• Converge very slowly

Q3. If your learning rate is too high, your loss function will:

• Converge rapidly, but not reach the lowest possible error value
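
The effect of the step size can be seen on a toy problem. The sketch below runs plain gradient descent on the one-dimensional loss `w**2`; the loss, the starting point, and the three learning rates are assumptions chosen only to make the behaviour visible.

```python
def gradient_descent(lr, steps, w0=1.0):
    """Minimise loss(w) = w**2 (gradient 2*w); return the loss after each step."""
    w = w0
    losses = []
    for _ in range(steps):
        w -= lr * 2 * w          # standard gradient descent update
        losses.append(w * w)
    return losses

small = gradient_descent(lr=0.001, steps=50)   # tiny steps: loss barely moves
moderate = gradient_descent(lr=0.3, steps=50)  # reasonable steps: loss goes to ~0
large = gradient_descent(lr=1.0, steps=50)     # oversized steps: jumps across the minimum

print(f"lr=0.001 final loss: {small[-1]:.4f}")
print(f"lr=0.3   final loss: {moderate[-1]:.3e}")
print(f"lr=1.0   final loss: {large[-1]:.4f}")
```

On this particular loss, a learning rate of 1.0 makes `w` flip between 1 and -1 forever, so the loss never drops below its starting value, while 0.001 is still far from the minimum after 50 steps.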

Q4. If your batch size is too high, your loss function will:

• Converge slowly

Q5. If your batch size is too low, your loss function will:

• Oscillate wildly
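
One way to see why small batches make the loss jump around is to measure how noisy the averaged gradient is. The sketch below assumes, purely for illustration, that per-example gradients are drawn from a normal distribution with mean 1 and standard deviation 1; the function name is hypothetical.

```python
import random
import statistics

def minibatch_gradient_spread(batch_size, n_batches=2000, seed=0):
    """Std. dev. of the minibatch-mean gradient when per-example gradients ~ N(1, 1)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_batches):
        batch = [rng.gauss(1.0, 1.0) for _ in range(batch_size)]
        means.append(statistics.fmean(batch))
    return statistics.pstdev(means)

for size in (1, 8, 64):
    print(size, round(minibatch_gradient_spread(size), 3))
```

The spread shrinks roughly as one over the square root of the batch size, so a batch of 1 gives a very noisy update direction. The other side of the trade-off is not modelled here: each large-batch step needs more computation, which is why an oversized batch tends to make training slow.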

## Art and Science of Machine Learning Week 02 Quiz Answers

Q1. Which type of regularization is more likely to lead to zero weights?

• L1

Q2. Which type of regularization penalizes large weight values more?

• L2
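
Both answers can be checked with a small sketch. It compares the two penalty terms and the shrinkage each one applies to a single weight, using the standard proximal (closed-form) updates; the function names, the example weights, and the strength of 0.1 are illustrative assumptions.

```python
def l1_penalty(weights):
    """Sum of absolute values: grows linearly with each weight."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of squares: a large weight costs disproportionately more."""
    return sum(w * w for w in weights)

def l1_shrink(w, strength):
    """Soft-thresholding: minimiser of 0.5*(x - w)**2 + strength*abs(x)."""
    if w > strength:
        return w - strength
    if w < -strength:
        return w + strength
    return 0.0  # small weights are set exactly to zero

def l2_shrink(w, strength):
    """Minimiser of 0.5*(x - w)**2 + 0.5*strength*x**2: scales w, never zeroes it."""
    return w / (1.0 + strength)

weights = [3.0, 0.05, -0.02]  # hypothetical weights
print([l1_shrink(w, 0.1) for w in weights])  # small weights become 0.0
print([l2_shrink(w, 0.1) for w in weights])  # every weight shrinks but stays non-zero
print(l1_penalty(weights), l2_penalty(weights))
```

L1 subtracts a constant amount from every weight, so small weights hit zero; L2 shrinks in proportion to the weight, so the largest weights are penalized most and none reach exactly zero.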

Q1. You are training your classification model and are using Logistic Regression. Which is true?

• Your last layer has no weights that can be tuned

### Multi Class Neural Network Quiz Answers

Q1. If you have a classification problem with multiple labels, how does the neural network architecture change?

• Have a logistic layer for each label, and send the outputs of the logistic layer to a softmax layer
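
As a rough sketch of the softmax step, the code below contrasts independent per-label sigmoids with a softmax over the same scores. The logit values are made up for illustration, and the softmax subtracts the maximum logit first, a common trick to avoid overflow.

```python
import math

def sigmoid(z):
    """Logistic function: squashes one score into (0, 1) independently of the others."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Normalise scores into probabilities that sum to 1 (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical per-class scores
per_label = [sigmoid(z) for z in logits]
probs = softmax(logits)

print(sum(per_label))  # generally not 1: the labels are scored independently
print(sum(probs))      # 1 up to floating-point rounding
```

Softmax is the piece that turns the per-class scores into a single probability distribution, which is what a mutually exclusive multi-class problem needs.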

Q2. If you have thousands of classes, computing the cross-entropy loss can be very slow. Which of these is a way to help address that problem?

• Use a noise-contrastive loss function

### Training Neural Network Quiz Answers

Q1. Which of these is a common way that neural network training can fail?

• Gradients can explode if the learning rate is too high
• Entire layers can die with all their weights becoming zero
• Gradients can vanish, making it harder to train networks the deeper they are

Q2. If you see a dead layer (fraction of zero weights close to 1), what is a reasonable thing to try?

• Lower the learning rate

Q3. I am training a classification neural network with 5 hidden layers, sigmoid activation function, and [128, 64, 32, 16, 8] nodes, with learning_rate=0.05 and batch_size=32. I notice from TensorBoard that gradients in the third layer are near-zero. Is this a problem?

• Yes

Q4. I am training a classification neural network with 5 hidden layers, sigmoid activation function, and [128, 64, 32, 16, 8] nodes, with learning_rate=0.05 and batch_size=32. I notice from TensorBoard that gradients in the third layer are near-zero. What would you try to fix this?

• Try using ReLU activation function
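
A back-of-the-envelope sketch shows why ReLU helps. The sigmoid derivative never exceeds 0.25, so the backpropagated signal through 5 sigmoid layers is scaled by at most 0.25**5, whereas the ReLU derivative is 1 for positive inputs. The code ignores the weight matrices and uses made-up pre-activation values, so it is a simplification rather than a full backward pass.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def chained_gradient(activation_grad, pre_activations):
    """Product of activation derivatives along one path through the layers."""
    product = 1.0
    for z in pre_activations:
        product *= activation_grad(z)
    return product

pre_activations = [1.0] * 5  # hypothetical values, one per hidden layer
print(chained_gradient(sigmoid_grad, pre_activations))  # well below 0.001
print(chained_gradient(relu_grad, pre_activations))     # 1.0
```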

## Art and Science of Machine Learning Week 03 Quiz Answers

Q1. What is the benefit of using a pre-canned Estimator?

• It can give us a quick ML model
• Write a Keras model as normal, and use the model_to_estimator function to convert it into an Estimator for train_and_evaluate

Q3. In the model function for a custom estimator, you can customize:

• The set of evaluation metrics
• The loss metric that is optimized
• The optimizer that is used
• The predictions that are returned

Q1. What does the word “embedding” mean in the context of Machine Learning?

• It means converting words into vectors. This allows you to do calculations on them and find similarities between them. Well-trained models with word embeddings have shown a powerful understanding of language.

Q2. Which of these statements are true?

• Embeddings can be used to project data to a lower dimensional representation
• Embeddings learned on one problem can be reused in another problem
• Creating embeddings can be the first step to solving a clustering problem
• Embeddings can be learned directly from the data
• Embeddings learned on one problem can be used as a starting point when training a related problem
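
To make "find similarities" concrete, here is a minimal sketch that treats an embedding as a lookup table from words to vectors and compares vectors by cosine similarity. The three-dimensional vectors are hand-written for illustration; real embeddings are learned from data and usually have many more dimensions.

```python
import math

# Hypothetical embedding table; the values are invented, not learned.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

king, queen, apple = (EMBEDDINGS[w] for w in ("king", "queen", "apple"))
print(cosine_similarity(king, queen))  # close to 1: similar words
print(cosine_similarity(king, apple))  # much smaller: unrelated words
```

Because the table is just a set of vectors, it can be saved and reused as the starting point for a related model, or fed to a clustering algorithm.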