## Get All Weeks Advanced Learning Algorithms Coursera Quiz Answers

### Advanced Learning Algorithms Week 01 Quiz Answers

#### Quiz 1: Neural Networks Intuition Quiz Answers

Q1. Which of these are terms used to refer to components of an artificial neural network?

[expand title=View Answer]

layers

activation function

neurons

[/expand]

Q2. True/False? Neural networks take inspiration from, but do not very accurately mimic, how neurons in a biological brain learn.

[expand title=View Answer] True [/expand]

#### Quiz 2: Neural Network Model Quiz Answers

Q1. For a neural network, here is the formula for calculating the activation of the third neuron in layer 2, given the activation vector from layer 1: $a^{[2]}_{3} = g(\vec{w}^{[2]}_{3} \cdot \vec{a}^{[1]} + b^{[2]}_{3})$. Which of the following are correct statements?

[expand title=View Answer]

1. The activation of layer 2 is determined using the activations from the previous layer.

2. Unit 3 (neuron 3) outputs a single number (a scalar).

[/expand]

Q2. For the binary classification for handwriting recognition, discussed in the lecture, which of the following statements is correct?

[expand title=View Answer]

There is a single unit (neuron) in the output layer.

The output of the model can be interpreted as the probability that the handwritten image is of the number one “1”.

After choosing a threshold, you can convert the neural network’s output into a category of 0 or 1.

[/expand]
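The thresholding step in the last answer can be sketched in a few lines. This is a minimal illustration, not the lecture's code: the function names, the pre-activation value 1.2, and the threshold of 0.5 are all assumptions chosen for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(probability: float, threshold: float = 0.5) -> int:
    """Convert the network's output probability into a hard 0/1 category."""
    return 1 if probability >= threshold else 0


# Hypothetical pre-activation value of the single output unit.
a_out = sigmoid(1.2)          # estimated probability that the image is a "1"
y_hat = predict_label(a_out)  # category obtained after thresholding
print(round(a_out, 3), y_hat)
```

The network itself only produces `a_out`; the 0/1 category is a separate decision made afterwards.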

Q3. For a neural network, what is the expression for calculating the activation of the third neuron in layer 2? Note, this is different from the question that you saw in the lecture video.

[expand title=View Answer] $a^{[2]}_{3} = g(\vec{w}^{[2]}_{3} \cdot \vec{a}^{[1]} + b^{[2]}_{3})$ [/expand]

Q4. For the handwriting recognition task discussed in the lecture, what is the output $a^{[3]}_1$?

[expand title=View Answer] The estimated probability that the input image is of a number 1, a number that ranges from 0 to 1. [/expand]

#### Quiz 3: TensorFlow Implementation Quiz Answers

Q1. For the following code:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(units=25, activation="sigmoid"),
    Dense(units=15, activation="sigmoid"),
    Dense(units=10, activation="sigmoid"),
    Dense(units=1, activation="sigmoid")])
```

This code will define a neural network with how many layers?

[expand title=View Answer] 4 [/expand]

Q2. How do you define the second layer of a neural network that has 4 neurons and a sigmoid activation?

[expand title=View Answer] `Dense(units=4, activation='sigmoid')` [/expand]

Q3. If the input features are temperature (in Celsius) and duration (in minutes), how do you write the code for the first feature vector x shown above?

[expand title=View Answer] x = np.array([[200.0, 17.0]]) [/expand]

#### Quiz 4: Neural network implementation in Python Quiz Answers

Q1. According to the lecture, how do you calculate the activation of the third neuron in the first layer using NumPy?

[expand title=View Answer]

`z1_3 = np.dot(w1_3, x) + b1_3`

`a1_3 = sigmoid(z1_3)`

[/expand]

Q2. According to the lecture, when coding up the numpy array W, where would you place the w parameters for each neuron?

[expand title=View Answer] In the columns of W. [/expand]

Q3. For the code above in the “dense” function that defines a single layer of neurons, how many times does the code go through the “for loop”? Note that W has 2 rows and 3 columns.

[expand title=View Answer] 3 times [/expand]

For each neuron in the layer, there is one column in the NumPy array W. The number of rows of W equals the number of input features fed into that layer. The for loop calculates the activation value for each neuron, so with 3 columns it runs 3 times.
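A minimal sketch of such a `dense` function shows why the loop count equals the number of columns of W. The numeric values and the `sigmoid` helper are illustrative assumptions; only the shape of W (2 rows, 3 columns) comes from the question.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-z))


def dense(a_in, W, b):
    """Compute one layer's activations, one neuron (column of W) at a time."""
    units = W.shape[1]              # number of neurons = number of columns
    a_out = np.zeros(units)
    for j in range(units):          # the loop body runs once per neuron
        w = W[:, j]                 # weights of neuron j live in column j
        z = np.dot(w, a_in) + b[j]
        a_out[j] = sigmoid(z)
    return a_out


# Illustrative values: 2 input features (rows), 3 neurons (columns).
W = np.array([[1.0, -3.0, 5.0],
              [2.0, 4.0, -6.0]])
b = np.array([-1.0, 1.0, 2.0])
a_in = np.array([-2.0, 4.0])

a_out = dense(a_in, W, b)
print(a_out.shape)  # (3,)
```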


### Advanced Learning Algorithms Week 02 Quiz Answers

#### Quiz 1: Neural Network Training Quiz Answers

Q1. Here is some code that you saw in the lecture:

```python
model.compile(loss=BinaryCrossentropy())
```

For which type of task would you use the binary cross entropy loss function?

[expand title=View Answer] binary classification (classification with exactly 2 classes) [/expand]

Q2. In the code that you saw in the lecture, which line of code updates the network parameters in order to reduce the cost?

[expand title=View Answer] `model.fit(X,y,epochs=100)` [/expand]

#### Quiz 2: Activation Functions Quiz Answers

Q1. Which of the following activation functions is the most common choice for the hidden layers of a neural network?

[expand title=View Answer] ReLU (rectified linear unit) [/expand]

Q2. For the task of predicting housing prices, which activation functions could you choose for the output layer? Choose the 2 options that apply.

[expand title=View Answer]

linear

ReLU

[/expand]

Q3. True/False? A neural network with many layers but no activation function (in the hidden layers) is not effective; that’s why we should instead use the linear activation function in every hidden layer.

[expand title=View Answer] False [/expand]
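The reason the statement is false is that stacking layers with a linear activation still yields a single linear function. A small numerical check, with layer shapes and random values chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers with a linear (identity) activation; one column per neuron.
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)   # 3 inputs -> 4 units
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)   # 4 units  -> 2 units
x = rng.normal(size=3)

# Forward pass through both linear layers.
two_layer_output = (x @ W1 + b1) @ W2 + b2

# The same mapping expressed as one linear layer.
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
one_layer_output = x @ W_combined + b_combined

print(np.allclose(two_layer_output, one_layer_output))
```

However many linear layers are stacked, the network is no more expressive than a single linear layer, which is why a non-linear activation such as ReLU is used in the hidden layers.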

#### Quiz 3: Multiclass Classification

Q1. For a multiclass classification task that has 4 possible outputs, the sum of all the activations adds up to 1. For a multiclass classification task that has 3 possible outputs, the sum of all the activations should add up to ….

[expand title=View Answer] 1 [/expand]

The sum of all the softmax activations should add up to 1 whether the number of possible classes is 3, 4, 5 or any other number of classes. One way to see this is that if $e^{z_1}=10$, $e^{z_2}=20$, $e^{z_3}=30$, then the sum $a_1 + a_2 + a_3$ is equal to $\frac{e^{z_1} + e^{z_2} + e^{z_3}}{e^{z_1} + e^{z_2} + e^{z_3}}$, which is 1.
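The same property can be checked numerically. The sketch below uses a hand-written `softmax` helper and made-up logits; both are assumptions for illustration.

```python
import math


def softmax(z):
    """Return the softmax activations for a list of logits z."""
    m = max(z)                                # subtract the max for stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a 3-class and a 4-class problem.
for logits in ([1.0, 2.0, 3.0], [0.5, -1.0, 2.0, 4.0]):
    activations = softmax(logits)
    print(len(activations), round(sum(activations), 12))
```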

Q2. For multiclass classification, the cross entropy loss is used for training the model. If there are 4 possible classes for the output, and for a particular training example, the true class of the example is class 3 (y=3), then what does the cross entropy loss simplify to? [Hint: This loss should get smaller when $a_3$ gets larger.]

[expand title=View Answer] $-\log(a_3)$ [/expand]
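A short derivation of that simplification, writing $a_j$ for the softmax activation of class $j$ and assuming the usual indicator-sum form of the cross entropy loss for 4 classes:

```latex
L(a_1, \dots, a_4, y) = -\sum_{j=1}^{4} \mathbf{1}\{y = j\} \, \log a_j
\quad \Longrightarrow \quad
L \,\big|_{\,y=3} = -\log a_3
```

Only the term for the true class survives, so the loss shrinks toward 0 as $a_3$ approaches 1.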

Q3. For multiclass classification, the recommended way to implement softmax regression is to set from_logits=True in the loss function, and also to define the model’s output layer with…

[expand title=View Answer] a ‘linear’ activation [/expand]
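One motivation for this setup is numerical round-off: computing the loss directly from the logits avoids forming very small probabilities first. The plain-Python sketch below illustrates the idea only; it is not TensorFlow's internal implementation, and the function names, logits, and 0-based class index are assumptions.

```python
import math


def loss_via_softmax(z, y):
    """Naive route: form the softmax activation a_y, then take -log(a_y)."""
    exps = [math.exp(v) for v in z]
    a_y = exps[y] / sum(exps)
    return -math.log(a_y)          # raises ValueError if a_y underflows to 0.0


def loss_from_logits(z, y):
    """Stable route: -log(a_y) = logsumexp(z) - z[y], without forming a_y."""
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
    return log_sum_exp - z[y]


z = [-800.0, 0.0]   # illustrative logits; index 0 is taken as the true class
try:
    naive = loss_via_softmax(z, 0)
except ValueError:
    naive = None    # exp(-800) underflows to 0.0, so log(0.0) fails

stable = loss_from_logits(z, 0)
print(naive, stable)
```

For moderate logits the two routes agree; they diverge only when an activation is too small to represent.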

#### Quiz 4: Additional Neural Network Concepts Quiz Answers

Q1. The Adam optimizer is the recommended optimizer for finding the optimal parameters of the model. How do you use the Adam optimizer in TensorFlow?

[expand title=View Answer] When calling `model.compile`, set `optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3)`. [/expand]

Q2. The lecture covered a different layer type where each single neuron of the layer does not look at all the values of the input vector that is fed into that layer. What is the name of the layer type discussed in the lecture?

[expand title=View Answer] A convolutional layer [/expand]

### Advanced Learning Algorithms Week 03 Quiz Answers

#### Quiz 1: Advice for applying machine learning

Q1. In the context of machine learning, what is a diagnostic?

[expand title=View Answer] A test that you run to gain insight into what is/isn’t working with a learning algorithm. [/expand]

Q2. True/False? It is always true that the better an algorithm does on the training set, the better it will do on generalizing to new data.

[expand title=View Answer]False[/expand]

Q3. For a classification task, suppose you train three different models using three different neural network architectures. Which data do you use to evaluate the three models in order to choose the best one?

[expand title=View Answer] The cross-validation set [/expand]

#### Quiz 2: Bias and Variance Quiz Answers

Q1. If the model’s cross-validation error $J_{cv}$ is much higher than the training error $J_{train}$, this is an indication that the model has…

[expand title=View Answer]high variance[/expand]

Q2. Which of these is the best way to determine whether your model has a high bias (is underfitting the training data)?

[expand title=View Answer] Compare the training error to the baseline level of performance. [/expand]

Q3. You find that your algorithm has a high bias. Which of these seem like good options for improving the algorithm’s performance? Hint: two of these are correct.

[expand title=View Answer]

Decrease the regularization parameter $\lambda$

Collect additional features or add polynomial features

[/expand]

Q4. You find that your algorithm has a training error of 2%, and a cross-validation error of 20% (much higher than the training error). Based on the conclusion you would draw about whether the algorithm has a high bias or high variance problem, which of these seem like good options for improving the algorithm’s performance? Hint: two of these are correct.

[expand title=View Answer]

1. Increase the regularization parameter $\lambda$

2. Collect more training data

[/expand]
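The reasoning behind the last two questions can be sketched as a small diagnostic. The `diagnose` function, the 5-point tolerance, and the 1% baseline error are assumptions for illustration; the question itself only gives the 2% training error and 20% cross-validation error.

```python
def diagnose(baseline_error, train_error, cv_error, gap=0.05):
    """Rough bias/variance diagnosis; `gap` is an assumed tolerance."""
    high_bias = (train_error - baseline_error) > gap      # far from baseline
    high_variance = (cv_error - train_error) > gap        # poor generalization
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "neither"


# Training error 2%, cross-validation error 20%, assumed baseline error 1%.
print(diagnose(baseline_error=0.01, train_error=0.02, cv_error=0.20))
# high variance
```

A high-variance result points toward more training data or a larger $\lambda$; a high-bias result points toward more features or a smaller $\lambda$.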

#### Quiz 3: Machine Learning Development Process Quiz Answers

Q1. Which of these is a way to do error analysis?

[expand title=View Answer] Manually examine a sample of the training examples that the model misclassified in order to identify common traits and trends. [/expand]

Q2. We sometimes take an existing training example and modify it (for example, by rotating an image slightly) to create a new example with the same label. What is this process called?

[expand title=View Answer] Data augmentation [/expand]

Q3. What are two possible ways to perform transfer learning? Hint: two of the four choices are correct.

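[expand title=View Answer]

You can choose to train all parameters of the model, including the output layers, as well as the earlier layers.

You can choose to train just the output layers’ parameters and leave the other parameters of the model fixed.

[/expand]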

### Advanced Learning Algorithms Week 04 Quiz Answers

#### Quiz 1: Decision Trees Quiz Answers

Q1. Based on the decision tree shown in the lecture, if an animal has floppy ears, a round face shape, and has whiskers, does the model predict that it’s a cat or not a cat?

[expand title=View Answer] Cat [/expand]

Q2. Take a decision tree learning to classify between spam and non-spam email. There are 20 training examples at the root node, comprising 10 spam and 10 non-spam emails. If the algorithm can choose from among four features, resulting in four corresponding splits, which would it choose (i.e., which has the highest purity)?

[expand title=View Answer] Left split: 10 of 10 emails are spam. Right split: 0 of 10 emails are spam.[/expand]

#### Quiz 2: Decision tree learning

Q1. Recall that entropy was defined in the lecture as $H(p_1) = -p_1 \log_2(p_1) - p_0 \log_2(p_0)$, where $p_1$ is the fraction of positive examples and $p_0$ the fraction of negative examples.

At a given node of a decision tree, 6 of 10 examples are cats and 4 of 10 are not cats. Which expression calculates the entropy $H(p_1)$ of this group of 10 animals?

[expand title=View Answer] $-(0.6) \log_2(0.6) - (0.4) \log_2(0.4)$ [/expand]
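Evaluating that expression numerically is straightforward. The `entropy` helper below is a sketch that follows the definition in the question and treats $0 \log_2 0$ as 0.

```python
import math


def entropy(p1: float) -> float:
    """Binary entropy H(p1) in bits, with the convention 0 * log2(0) = 0."""
    if p1 in (0.0, 1.0):
        return 0.0
    p0 = 1.0 - p1
    return -p1 * math.log2(p1) - p0 * math.log2(p0)


h = entropy(6 / 10)    # 6 of 10 examples are cats
print(round(h, 3))     # 0.971
```

The value is close to 1 because a 6/4 mix is nearly as impure as an even 5/5 split.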

Q2. Recall that information gain was defined as follows:

$H(p_1^{\text{root}}) - \left( w^{\text{left}} H(p_1^{\text{left}}) + w^{\text{right}} H(p_1^{\text{right}}) \right)$

Before a split, the entropy of a group of 5 cats and 5 non-cats is $H(5/10)$. After splitting on a particular feature, a group of 7 animals (4 of which are cats) has an entropy of $H(4/7)$. The other group of 3 animals (1 is a cat) has an entropy of $H(1/3)$. What is the expression for information gain?

[expand title=View Answer] $H(0.5) - \left( \frac{7}{10} H(4/7) + \frac{3}{10} H(1/3) \right)$ [/expand]
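The weights are the fractions of examples sent to each branch (7 of 10 and 3 of 10). A minimal sketch that computes the gain for this split, with helper names chosen for illustration:

```python
import math


def entropy(p1: float) -> float:
    """Binary entropy H(p1) in bits, with the convention 0 * log2(0) = 0."""
    if p1 in (0.0, 1.0):
        return 0.0
    return -p1 * math.log2(p1) - (1.0 - p1) * math.log2(1.0 - p1)


def information_gain(root, left, right):
    """Each argument is a (num_positive, num_total) pair for that node."""
    n_root = root[1]
    w_left = left[1] / n_root       # fraction of examples in the left branch
    w_right = right[1] / n_root     # fraction of examples in the right branch
    weighted = (w_left * entropy(left[0] / left[1])
                + w_right * entropy(right[0] / right[1]))
    return entropy(root[0] / root[1]) - weighted


# 5 cats of 10 at the root; 4 cats of 7 on the left; 1 cat of 3 on the right.
gain = information_gain(root=(5, 10), left=(4, 7), right=(1, 3))
print(round(gain, 4))
```

The gain is small here because neither branch is much purer than the root.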

Q3. To represent 3 possible values for the ear shape, you can define 3 features for ear shape: pointy ears, floppy ears, and oval ears. For an animal whose ears are not pointy, not floppy, but are oval, how can you represent this information as a feature vector?

[expand title=View Answer] [0, 0, 1] [/expand]
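This is one-hot encoding. A minimal sketch, assuming the feature order pointy, floppy, oval given in the question:

```python
EAR_SHAPES = ["pointy", "floppy", "oval"]   # order taken from the question


def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]


print(one_hot("oval", EAR_SHAPES))  # [0, 0, 1]
```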

Q4. For a continuous-valued feature (such as the weight of the animal), there are 10 animals in the dataset. According to the lecture, what is the recommended way to find the best split for that feature?

[expand title=View Answer] Choose the 9 mid-points between the 10 examples as possible splits, and find the split that gives the highest information gain. [/expand]
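A sketch of that procedure follows. The weights and labels are hypothetical data, the helper names are assumptions, and the code assumes all feature values are distinct so that every candidate split leaves both branches non-empty.

```python
import math


def entropy(p1):
    """Binary entropy in bits, with the convention 0 * log2(0) = 0."""
    if p1 in (0.0, 1.0):
        return 0.0
    return -p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1)


def best_threshold(values, labels):
    """Try every mid-point between sorted values; return candidates and best."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    n = len(pairs)
    h_root = entropy(sum(labels) / n)
    best = None
    for t in candidates:
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        weighted = (len(left) / n * entropy(sum(left) / len(left))
                    + len(right) / n * entropy(sum(right) / len(right)))
        gain = h_root - weighted
        if best is None or gain > best[1]:
            best = (t, gain)
    return candidates, best


# Hypothetical weights of 10 animals; label 1 = cat, 0 = not cat.
weights = [7.2, 8.4, 9.2, 10.2, 11.0, 15.0, 17.0, 18.0, 18.5, 20.0]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

candidates, (threshold, gain) = best_threshold(weights, labels)
print(len(candidates), threshold, gain)
```

With 10 distinct values there are 9 candidate thresholds, matching the answer above.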

Q5. Which of these are commonly used criteria to decide to stop splitting? (Choose two.)

[expand title=View Answer]

1. When the number of examples in a node is below a threshold

2. When the tree has reached a maximum depth

[/expand]

#### Quiz 3: Tree Ensembles Quiz Answers

Q1. For the random forest, how do you build each individual tree so that they are not all identical to each other?

[expand title=View Answer]

Sample the training data with replacement

[/expand]

Q2. You are choosing between a decision tree and a neural network for a classification task where the input $x$ is a 100×100 resolution image. Which would you choose?

[expand title=View Answer]

A neural network, because the input is unstructured data, and neural networks typically work better with unstructured data.

[/expand]

Q3. What does sampling with replacement refer to?

[expand title=View Answer]

Drawing a sequence of examples where, before picking the next example, all previously drawn examples are first put back into the set we are picking from.

[/expand]
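Sampling with replacement can be sketched with Python's standard library. The ten integer "examples" and the fixed seed are stand-ins chosen for illustration.

```python
import random

random.seed(0)                       # fixed seed so the sketch is repeatable
training_set = list(range(10))       # stand-in for 10 training examples

# With replacement: every draw picks from the full set, so repeats can occur
# and some examples may not be drawn at all.
bootstrap_sample = random.choices(training_set, k=len(training_set))

# Without replacement, for contrast: each example appears exactly once.
permutation = random.sample(training_set, k=len(training_set))

print(bootstrap_sample)
print(sorted(permutation) == training_set)
```

In a random forest, each tree would be trained on its own `bootstrap_sample`, which is what makes the trees differ from one another.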
