
### About Neural Networks and Deep Learning Course

In the first course of the Deep Learning Specialization, you will study the foundational concepts of neural networks and deep learning.

By the end, you will be familiar with the significant technological trends driving the rise of deep learning; build, train, and apply fully connected deep neural networks; implement efficient (vectorized) neural networks; identify key parameters in a neural network’s architecture, and apply deep learning to your own applications.


### Neural Networks and Deep Learning Coursera Quiz Answers

### Neural Networks and Deep Learning Week 01 Quiz Answers

#### Week 01: Introduction to Deep Learning

Q1. What does the analogy “AI is the new electricity” refer to?

- Through the “smart grid”, AI is delivering a new wave of electricity.
- AI is powering personal devices in our homes and offices, similar to electricity.
- AI runs on computers and is thus powered by electricity, but it is letting computers do things not possible before.
- **Similar to electricity starting about 100 years ago, AI is transforming multiple industries.**

Q2. Which of these are reasons for Deep Learning recently taking off? (Check the three options that apply.)

- **We have access to a lot more computational power.**
- Neural Networks are a brand new field.
- **We have access to a lot more data.**
- **Deep learning has resulted in significant improvements in important applications such as online advertising, speech recognition, and image recognition.**

Q3. Recall this diagram of iterating over different ML ideas. Which of the statements below are true? (Check all that apply.)

- **Being able to try out ideas quickly allows deep learning engineers to iterate more quickly.**
- **Recent progress in deep learning algorithms has allowed us to train good models faster (even without changing the CPU/GPU hardware).**
- It is faster to train on a big dataset than a small dataset.
- **Faster computation can help speed up how long a team takes to iterate to a good idea.**

Q4. When an experienced deep learning engineer works on a new problem, they can usually use insight from previous problems to train a good model on the first try, without needing to iterate multiple times through different models. True/False?

- **False**
- True

Q5. Which one of these plots represents a ReLU activation function?

- **Figure 1**
- Figure 2
- Figure 3
- Figure 4
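
For reference, ReLU is defined as $g(z) = \max(0, z)$. It outputs zero for negative inputs and passes positive inputs through unchanged, so its plot is flat at zero on the left and a straight 45° line on the right. Below is a minimal NumPy sketch. The function name `relu` is our own choice.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: element-wise max(0, z)."""
    return np.maximum(0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))  # negatives are clipped to 0, positives pass through unchanged
```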

Q6. Images for cat recognition are an example of “structured” data, because they are represented as a structured array in a computer. True/False?

- **False**
- True

Q7. A demographic dataset with statistics on different cities’ population, GDP per capita, and economic growth is an example of “unstructured” data because it contains data coming from different sources. True/False?

- **False**
- True

Q8. Why is an RNN (Recurrent Neural Network) used for machine translation, say translating English to French? (Check all that apply.)

- RNNs represent the recurrent process of Idea->Code->Experiment->Idea->….
- **It can be trained as a supervised learning problem.**
- It is strictly more powerful than a Convolutional Neural Network (CNN).
- **It is applicable when the input/output is a sequence (e.g., a sequence of words).**

Q9. In this diagram which we hand-drew in lecture, what do the horizontal axis (x-axis) and vertical axis (y-axis) represent?

- **x-axis is the amount of data; y-axis (vertical axis) is the performance of the algorithm.**
- x-axis is the amount of data; y-axis is the size of the model you train.
- x-axis is the performance of the algorithm; y-axis (vertical axis) is the amount of data.
- x-axis is the input to the algorithm; y-axis is outputs.

Q10. Assuming the trends described in the previous question’s figure are accurate (and hoping you got the axis labels right), which of the following are true? (Check all that apply.)

- **Increasing the size of a neural network generally does not hurt an algorithm’s performance, and it may help significantly.**
- Decreasing the training set size generally does not hurt an algorithm’s performance, and it may help significantly.
- **Increasing the training set size generally does not hurt an algorithm’s performance, and it may help significantly.**
- Decreasing the size of a neural network generally does not hurt an algorithm’s performance, and it may help significantly.

### Neural Networks and Deep Learning Week 02 Quiz Answers

Q1. What does a neuron compute?

- A neuron computes an activation function followed by a linear function (z = Wx + b)
- A neuron computes a function g that scales the input x linearly (Wx + b)
- A neuron computes the mean of all features before applying the output to an activation function
- **A neuron computes a linear function (z = Wx + b) followed by an activation function**

Q2. Which of these is the “Logistic Loss”?
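
The logistic loss for a single example is $\mathcal{L}(\hat{y}, y) = -\left(y \log \hat{y} + (1 - y) \log(1 - \hat{y})\right)$. Below is a minimal NumPy sketch of it. The function name and the clipping constant `eps` are our own additions; the clipping keeps `np.log` from receiving exactly 0.

```python
import numpy as np

def logistic_loss(y_hat, y, eps=1e-12):
    """Logistic (binary cross-entropy) loss L(y_hat, y), element-wise."""
    # Clip predictions away from 0 and 1 so np.log never sees 0 (our addition).
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(logistic_loss(0.9, 1))  # confident and correct: small loss (about 0.105)
print(logistic_loss(0.9, 0))  # confident and wrong: large loss (about 2.303)
```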

Q3. Suppose img is a (32,32,3) array, representing a 32×32 image with 3 color channels red, green and blue. How do you reshape this into a column vector?

- `x = img.reshape((1,32*32,*3))`
- `x = img.reshape((3,32*32))`
- `x = img.reshape((32*32,3))`
- **`x = img.reshape((32*32*3,1))`**
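
The sketch below checks the candidate reshapes. A random array stands in for the image. Only the last option produces a single column holding all 32·32·3 = 3072 values.

```python
import numpy as np

img = np.random.rand(32, 32, 3)  # random stand-in for a 32x32 RGB image

# Flatten all 32*32*3 = 3072 values into a single column vector.
x = img.reshape((32 * 32 * 3, 1))
print(x.shape)  # (3072, 1)

# The other candidate shapes are not column vectors:
print(img.reshape((32 * 32, 3)).shape)  # (1024, 3)
print(img.reshape((3, 32 * 32)).shape)  # (3, 1024)
```

Writing `img.reshape(-1, 1)` is equivalent here: NumPy infers the 3072 from the array's size.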

Q4. Consider the two following random arrays a and b:

```python
a = np.random.randn(2, 3)  # a.shape = (2, 3)
b = np.random.randn(2, 1)  # b.shape = (2, 1)
c = a + b
```

What will be the shape of c?

- c.shape = (3, 2)
- **c.shape = (2, 3)**
- The computation cannot happen because the sizes don’t match. It’s going to be “Error”!
- c.shape = (2, 1)
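
The sketch below shows the broadcasting behind this answer. The single column of `b` is stretched across the three columns of `a`. This gives the same result as tiling `b` explicitly with `np.tile`.

```python
import numpy as np

a = np.random.randn(2, 3)  # a.shape = (2, 3)
b = np.random.randn(2, 1)  # b.shape = (2, 1)

c = a + b  # b's single column is broadcast across a's 3 columns
print(c.shape)  # (2, 3)

# Broadcasting is equivalent to tiling b explicitly to shape (2, 3):
print(np.allclose(c, a + np.tile(b, (1, 3))))  # True
```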

Q5. Consider the two following random arrays a and b:

```python
a = np.random.randn(4, 3)  # a.shape = (4, 3)
b = np.random.randn(3, 2)  # b.shape = (3, 2)
c = a*b
```

What will be the shape of c?

- c.shape = (4, 3)
- **The computation cannot happen because the sizes don’t match. It’s going to be “Error”!**
- c.shape = (3, 3)
- c.shape = (4, 2)
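
NumPy compares shapes from the trailing dimension backwards. Here 3 and 2 differ and neither is 1, so element-wise `*` cannot broadcast and raises a `ValueError`. The matrix product `np.dot(a, b)` is valid, because the inner dimensions match (3 and 3). The variable `error_raised` is only there for illustration.

```python
import numpy as np

a = np.random.randn(4, 3)  # a.shape = (4, 3)
b = np.random.randn(3, 2)  # b.shape = (3, 2)

error_raised = False
try:
    c = a * b  # element-wise: trailing dims 3 vs 2 cannot broadcast
except ValueError as err:
    error_raised = True
    print("Error:", err)

print(np.dot(a, b).shape)  # (4, 2): matrix product, inner dims 3 and 3 match
```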

Q6. Recall that $X = [x^{(1)}\ x^{(2)}\ \dots\ x^{(m)}]$. What is the dimension of X?

- **(n_x, m)**
- (m, n_x)
- (m, 1)
- (1, m)

Q7. Recall that `np.dot(a,b)` performs a matrix multiplication on a and b, whereas `a*b` performs an element-wise multiplication.

Consider the two following random arrays a and b:

```python
a = np.random.randn(12288, 150)  # a.shape = (12288, 150)
b = np.random.randn(150, 45)     # b.shape = (150, 45)
c = np.dot(a, b)
```

What is the shape of c?

- **c.shape = (12288, 45)**
- c.shape = (12288, 150)
- The computation cannot happen because the sizes don’t match. It’s going to be “Error”!
- c.shape = (150, 150)
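
In a matrix product, the inner dimensions must agree (150 and 150), and the result keeps the outer dimensions (12288 and 45). A quick check:

```python
import numpy as np

a = np.random.randn(12288, 150)  # a.shape = (12288, 150)
b = np.random.randn(150, 45)     # b.shape = (150, 45)

# (12288, 150) x (150, 45): inner dims match, result is (12288, 45)
c = np.dot(a, b)
print(c.shape)  # (12288, 45)
```

For 2-D arrays, `a @ b` computes the same product.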

Q8. Consider the following code snippet:

```python
# a.shape = (3, 4)
# b.shape = (4, 1)
for i in range(3):
    for j in range(4):
        c[i][j] = a[i][j] + b[j]
```

How do you vectorize this?

- c = a + b
- c = a.T + b.T
- c = a.T + b
- **c = a + b.T**
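
The sketch below runs the loop from the question and confirms it matches `a + b.T`. It adds the allocation of `c` that the snippet leaves out. It also indexes `b[j][0]`, which yields a scalar, where `b[j]` would be a length-1 array. After transposing, `b.T` has shape (1, 4) and is broadcast over the 3 rows of `a`.

```python
import numpy as np

a = np.random.randn(3, 4)  # a.shape = (3, 4)
b = np.random.randn(4, 1)  # b.shape = (4, 1)

# Explicit double loop from the question (c must be allocated first).
c_loop = np.zeros((3, 4))
for i in range(3):
    for j in range(4):
        c_loop[i][j] = a[i][j] + b[j][0]

# Vectorized version: b.T is (1, 4) and broadcasts across the 3 rows of a.
c_vec = a + b.T
print(np.allclose(c_loop, c_vec))  # True
```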

Q9. Consider the following code:

```python
a = np.random.randn(3, 3)
b = np.random.randn(3, 1)
c = a*b
```

What will be c? (If you’re not sure, feel free to run this in Python to find out.)

- This will multiply a 3×3 matrix a with a 3×1 vector, thus resulting in a 3×1 vector. That is, c.shape = (3,1).
- **This will invoke broadcasting, so b is copied three times to become (3,3), and `*` is an element-wise product so c.shape will be (3, 3)**
- This will invoke broadcasting, so b is copied three times to become (3, 3), and `*` invokes a matrix multiplication operation of two 3×3 matrices so c.shape will be (3, 3)
- It will lead to an error since you cannot use `*` to operate on these two matrices. You need to instead use np.dot(a,b)
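
The sketch below contrasts broadcasting `*` with the matrix product. Under `a * b`, row `i` of `a` is scaled by the scalar `b[i, 0]`. `np.dot(a, b)` instead returns a (3, 1) result.

```python
import numpy as np

a = np.random.randn(3, 3)
b = np.random.randn(3, 1)

c = a * b  # broadcast b across 3 columns, then multiply element-wise
print(c.shape)  # (3, 3)

# Row i of a is scaled by b[i, 0]; no matrix multiplication happens.
print(np.allclose(c[1], a[1] * b[1, 0]))  # True

print(np.dot(a, b).shape)  # (3, 1): the actual matrix product, for comparison
```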

Q10. Consider the following computation graph.

What is the output J?

- **`J = (a - 1) * (b + c)`**
- `J = (b - 1) * (c + a)`
- `J = (c - 1) * (b + a)`
- `J = a*b + b*c + a*c`

### Neural Networks and Deep Learning Week 03 Quiz Answers

Q1. Which of the following are true? (Check all that apply.)

- X is a matrix in which each row is one training example.
- $a^{[2](12)}$ denotes the activation vector of the 12th layer on the 2nd training example.
- $a^{[2]}_4$ is the activation output of the 2nd layer for the 4th training example.
- **$a^{[2](12)}$ denotes the activation vector of the 2nd layer for the 12th training example.**
- **$a^{[2]}_4$ is the activation output by the 4th neuron of the 2nd layer.**
- **X is a matrix in which each column is one training example.**
- **$a^{[2]}$ denotes the activation vector of the 2nd layer.**

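Q2. The tanh activation is not always better than the sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data, making learning complex for the next layer. True/False?

- **False**
- True

Because tanh outputs lie between -1 and 1, it centers the data, which makes learning simpler, not harder, for the next layer.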

Q3. Which of these is a correct vectorized implementation of forward propagation for layer $l$, where $1 \le l \le L$?

- **$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$, $A^{[l]} = g^{[l]}(Z^{[l]})$**
- $Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]}$, $A^{[l+1]} = g^{[l]}(Z^{[l]})$
- $Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]}$, $A^{[l+1]} = g^{[l+1]}(Z^{[l]})$
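
Below is a minimal sketch of one vectorized forward step, $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ followed by $A^{[l]} = g^{[l]}(Z^{[l]})$. The function name `layer_forward`, the layer sizes, and the choice of tanh are assumptions made for illustration. The bias column is broadcast across the $m$ example columns.

```python
import numpy as np

def layer_forward(A_prev, W, b, g):
    """One vectorized forward step for layer l (hypothetical helper).

    A_prev: activations of layer l-1, shape (n[l-1], m)
    W:      weights of layer l, shape (n[l], n[l-1])
    b:      bias of layer l, shape (n[l], 1), broadcast over the m columns
    g:      activation function g[l]
    """
    Z = np.dot(W, A_prev) + b  # Z[l] = W[l] A[l-1] + b[l]
    A = g(Z)                   # A[l] = g[l](Z[l])
    return Z, A

m = 5                              # number of examples (arbitrary)
A0 = np.random.randn(3, m)         # input layer: n[0] = 3 features
W1 = np.random.randn(4, 3) * 0.01  # n[1] = 4 hidden units
b1 = np.zeros((4, 1))
Z1, A1 = layer_forward(A0, W1, b1, np.tanh)
print(Z1.shape, A1.shape)  # (4, 5) (4, 5)
```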

Q4. You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?

- ReLU
- tanh
- **sigmoid**
- Leaky ReLU

Q5. Consider the following code:

```python
A = np.random.randn(4, 3)
B = np.sum(A, axis=1, keepdims=True)
```

What will be B.shape? (If you’re not sure, feel free to run this in Python to find out.)

- (1, 3)
- **(4, 1)**
- (4, )
- (4, 3)
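
With `axis=1`, NumPy sums across each row. Setting `keepdims=True` keeps the summed axis as size 1, so the result is a (4, 1) column rather than a flat (4,) array. A quick check:

```python
import numpy as np

A = np.random.randn(4, 3)

B = np.sum(A, axis=1, keepdims=True)  # row sums, summed axis kept as size 1
print(B.shape)  # (4, 1)

B_flat = np.sum(A, axis=1)            # without keepdims the axis is dropped
print(B_flat.shape)  # (4,)
```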

Q6. Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?

- The first hidden layer’s neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.
- **Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons.**
- Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have “broken symmetry”.
- Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished “symmetry breaking” as described in lecture.
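
The experiment below demonstrates the symmetry problem. It trains a small 1-hidden-layer network (tanh hidden units, sigmoid output, cross-entropy cost) with gradient descent, starting from all-identical parameters. The function name `train_symmetric`, the toy data, and the hyperparameters are assumptions for illustration. Starting from zeros, every row of `W1` (one row per hidden unit) stays equal to the others. The same holds for any constant initialization such as 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_symmetric(X, Y, n_h=3, init_value=0.0, lr=0.5, iters=100):
    """Gradient descent on a 1-hidden-layer net (hypothetical helper) where
    every weight and bias starts at the same constant init_value."""
    n_x, m = X.shape
    W1 = np.full((n_h, n_x), init_value, dtype=float)
    b1 = np.full((n_h, 1), init_value, dtype=float)
    W2 = np.full((1, n_h), init_value, dtype=float)
    b2 = np.full((1, 1), init_value, dtype=float)
    for _ in range(iters):
        # Forward pass
        A1 = np.tanh(W1 @ X + b1)   # (n_h, m)
        A2 = sigmoid(W2 @ A1 + b2)  # (1, m)
        # Backward pass for the cross-entropy cost
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m
        # Gradient descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 20))
Y = (X[0:1] + X[1:2] > 0).astype(float)  # toy labels, shape (1, 20)

for init in (0.0, 0.5):
    W1 = train_symmetric(X, Y, init_value=init)
    # Each row of W1 is one hidden unit; all rows still equal the first one.
    print(init, np.allclose(W1, W1[0]))  # True for both values
```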

Q7. Logistic regression’s weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”. True/False?

- True
**False**

Q8. You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?

- It doesn’t matter. So long as you initialize the weights randomly gradient descent is not affected by whether the weights are large or small.
- **This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.**
- This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.
- This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.
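
The derivative of tanh is $1 - \tanh^2(z)$, which collapses toward zero as $|z|$ grows. The sketch below evaluates it at a few pre-activation magnitudes. These values are chosen for illustration; weights scaled by 1000 typically push $|z|$ toward the large end of this range.

```python
import numpy as np

def tanh_grad(z):
    """Derivative of tanh at z: 1 - tanh(z)**2."""
    return 1 - np.tanh(z) ** 2

z = np.array([0.01, 1.0, 10.0, 1000.0])
print(np.tanh(z))    # saturates at 1.0 for large z
print(tanh_grad(z))  # about 1.0, 0.42, 8e-9, 0.0: the gradient vanishes
```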

Q9. Consider the following 1 hidden layer neural network:

Which of the following statements are True? (Check all that apply).

- $W^{[1]}$ will have shape (2, 4)
- **$b^{[1]}$ will have shape (4, 1)**
- $b^{[1]}$ will have shape (2, 1)
- **$W^{[1]}$ will have shape (4, 2)**
- $W^{[2]}$ will have shape (4, 1)
- $b^{[2]}$ will have shape (4, 1)
- **$b^{[2]}$ will have shape (1, 1)**
- **$W^{[2]}$ will have shape (1, 4)**

Q10. In the same network as the previous question, what are the dimensions of $Z^{[1]}$ and $A^{[1]}$?

- $Z^{[1]}$ and $A^{[1]}$ are (4,2)
- **$Z^{[1]}$ and $A^{[1]}$ are (4,m)**
- $Z^{[1]}$ and $A^{[1]}$ are (1,4)
- $Z^{[1]}$ and $A^{[1]}$ are (4,1)
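
The sketch below reproduces the shapes from Q9 and Q10. It assumes the network has 2 input features, 4 hidden units, and 1 output unit, as implied by the shapes marked correct above. The example count `m` and the tanh activation are arbitrary choices.

```python
import numpy as np

n_x, n_h, n_y = 2, 4, 1  # layer sizes implied by the answers above (assumption)
m = 10                   # number of training examples (arbitrary)

W1 = np.random.randn(n_h, n_x) * 0.01  # (4, 2)
b1 = np.zeros((n_h, 1))                # (4, 1)
W2 = np.random.randn(n_y, n_h) * 0.01  # (1, 4)
b2 = np.zeros((n_y, 1))                # (1, 1)

X = np.random.randn(n_x, m)            # one column per example
Z1 = W1 @ X + b1                       # (4, m)
A1 = np.tanh(Z1)                       # (4, m)
print(W1.shape, b1.shape, W2.shape, b2.shape, Z1.shape, A1.shape)
```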

### Neural Networks and Deep Learning Week 04 Quiz Answers

Q1. What is the “cache” used for in our implementation of forward propagation and backward propagation?

- It is used to keep track of the hyperparameters that we are searching over, to speed up computation.
- We use it to pass variables computed during backward propagation to the corresponding forward propagation step. It contains useful values for forward propagation to compute activations.
- **We use it to pass variables computed during forward propagation to the corresponding backward propagation step. It contains useful values for backward propagation to compute derivatives.**
- It is used to cache the intermediate values of the cost function during training.
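
The sketch below shows the cache idea for the linear part of one layer. The forward step stores `(A_prev, W, b)`, and the backward step reads them back to compute the gradients. The function names are hypothetical, and the incoming gradient `dZ` is a placeholder of ones.

```python
import numpy as np

def linear_forward(A_prev, W, b):
    """Forward step Z = W A_prev + b; the cache keeps what backprop needs."""
    Z = W @ A_prev + b
    cache = (A_prev, W, b)
    return Z, cache

def linear_backward(dZ, cache):
    """Backward step: reuses the cached forward values to compute gradients."""
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

A_prev = np.random.randn(3, 5)
W, b = np.random.randn(2, 3), np.zeros((2, 1))
Z, cache = linear_forward(A_prev, W, b)
dA_prev, dW, db = linear_backward(np.ones_like(Z), cache)  # placeholder dZ
print(dA_prev.shape, dW.shape, db.shape)  # (3, 5) (2, 3) (2, 1)
```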

Q2. Among the following, which ones are “hyperparameters”? (Check all that apply.)

- **learning rate α**
- weight matrices $W^{[l]}$
- **number of layers $L$ in the neural network**
- **size of the hidden layers $n^{[l]}$**
- activation values $a^{[l]}$
- bias vectors $b^{[l]}$
- **number of iterations**

Q3. Which of the following statements is true?

- **The deeper layers of a neural network are typically computing more complex features of the input than the earlier layers.**
- The earlier layers of a neural network are typically computing more complex features of the input than the deeper layers.

Q4. Vectorization allows you to compute forward propagation in an $L$-layer neural network without an explicit for-loop (or any other explicit iterative loop) over the layers $l = 1, 2, \dots, L$. True/False?

- True
**False**
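
Vectorization removes the loop over training examples, not the loop over layers. Layer $l$ needs $A^{[l-1]}$, so layers must be computed one after another. The sketch below assumes ReLU hidden layers, a sigmoid output, and arbitrary layer sizes. The function name `L_model_forward` is hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def L_model_forward(X, parameters, L):
    """Forward pass: vectorized over examples, but still a loop over layers."""
    A = X
    for l in range(1, L + 1):  # layer l depends on A[l-1], so it must wait for it
        Z = parameters["W" + str(l)] @ A + parameters["b" + str(l)]
        A = sigmoid(Z) if l == L else relu(Z)
    return A

layer_dims = [5, 4, 3, 1]  # arbitrary sizes for illustration
rng = np.random.default_rng(0)
parameters = {}
for l in range(1, len(layer_dims)):
    parameters["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
    parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))

X = rng.standard_normal((5, 8))  # 8 examples processed at once
AL = L_model_forward(X, parameters, L=len(layer_dims) - 1)
print(AL.shape)  # (1, 8)
```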

Q5. Assume we store the values for $n^{[l]}$ in an array called `layer_dims`, as follows: `layer_dims = [n_x, 4, 3, 2, 1]`. So layer 1 has four hidden units, layer 2 has 3 hidden units, and so on. Which of the following for-loops will allow you to initialize the parameters for the model?

- `for i in range(1, len(layer_dims)): parameter['W' + str(i)] = np.random.randn(layer_dims[i-1], layer_dims[i]) * 0.01; parameter['b' + str(i)] = np.random.randn(layer_dims[i], 1) * 0.01`
- `for i in range(1, len(layer_dims)/2): parameter['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i-1]) * 0.01; parameter['b' + str(i)] = np.random.randn(layer_dims[i], 1) * 0.01`
- **`for i in range(1, len(layer_dims)): parameter['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i-1]) * 0.01; parameter['b' + str(i)] = np.random.randn(layer_dims[i], 1) * 0.01`**
- `for i in range(1, len(layer_dims)/2): parameter['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i-1]) * 0.01; parameter['b' + str(i)] = np.random.randn(layer_dims[i-1], 1) * 0.01`
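
The correct loop, wrapped in a function, is sketched below. It prints the resulting shapes, so you can check that $W^{[l]}$ is $(n^{[l]}, n^{[l-1]})$ and $b^{[l]}$ is $(n^{[l]}, 1)$. The function name and `n_x = 5` are hypothetical. The biases are random here only to mirror the quiz option; zeros would also work.

```python
import numpy as np

def initialize_parameters(layer_dims):
    """W[l] has shape (n[l], n[l-1]) and b[l] has shape (n[l], 1)."""
    parameters = {}
    for i in range(1, len(layer_dims)):
        parameters["W" + str(i)] = np.random.randn(layer_dims[i], layer_dims[i - 1]) * 0.01
        parameters["b" + str(i)] = np.random.randn(layer_dims[i], 1) * 0.01
    return parameters

n_x = 5  # hypothetical input size
params = initialize_parameters([n_x, 4, 3, 2, 1])
for name, value in params.items():
    print(name, value.shape)
# W1 (4, 5), b1 (4, 1), W2 (3, 4), b2 (3, 1), W3 (2, 3), b3 (2, 1), W4 (1, 2), b4 (1, 1)
```

The `len(layer_dims)/2` variants fail twice over. In Python 3, `/` returns a float, and `range()` raises a `TypeError` for it. Even with integer division, the loop would skip the later layers.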

Q6. Consider the following neural network. How many layers does this network have?

- **The number of layers L is 4. The number of hidden layers is 3.**
- The number of layers L is 5. The number of hidden layers is 4.
- The number of layers L is 4. The number of hidden layers is 4.
- The number of layers L is 3. The number of hidden layers is 3.

Q7. During forward propagation, in the forward function for a layer $l$ you need to know what the activation function in that layer is (sigmoid, tanh, ReLU, etc.). During backpropagation, the corresponding backward function also needs to know what the activation function for layer $l$ is, since the gradient depends on it. True/False?

- False
**True**
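
Each activation has its own derivative, so the backward step must know which activation the layer used. The sketch below computes $dZ = dA \cdot g'(Z)$ for the three common choices. The function names are hypothetical, and ReLU's derivative is taken as 0 at $z = 0$ by convention.

```python
import numpy as np

def sigmoid_backward(dA, Z):
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)             # sigmoid'(z) = s(z) * (1 - s(z))

def tanh_backward(dA, Z):
    return dA * (1 - np.tanh(Z) ** 2)   # tanh'(z) = 1 - tanh(z)^2

def relu_backward(dA, Z):
    return dA * (Z > 0)                 # relu'(z) = 1 if z > 0 else 0

Z = np.array([[-2.0, 0.5, 3.0]])
dA = np.ones_like(Z)
for backward in (sigmoid_backward, tanh_backward, relu_backward):
    print(backward.__name__, backward(dA, Z))
```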

Q8. There are certain functions with the following properties:

(i) To compute the function using a shallow network circuit, you will need a large network (where we measure size by the number of logic gates in the network), but (ii) to compute it using a deep network circuit, you need only an exponentially smaller network. True/False?

- False
**True**

Q9. Consider the following 2 hidden layer neural network:

Which of the following statements are True? (Check all that apply).

- $W^{[3]}$ will have shape (3, 1)
- **$W^{[2]}$ will have shape (3, 4)**
- $W^{[2]}$ will have shape (3, 1)
- **$b^{[2]}$ will have shape (3, 1)**
- **$W^{[3]}$ will have shape (1, 3)**
- $b^{[1]}$ will have shape (3, 1)
- $b^{[2]}$ will have shape (1, 1)
- **$b^{[3]}$ will have shape (1, 1)**
- $b^{[3]}$ will have shape (3, 1)
- **$W^{[1]}$ will have shape (4, 4)**
- **$b^{[1]}$ will have shape (4, 1)**
- $W^{[1]}$ will have shape (3, 4)

Q10. Whereas the previous question used a specific network, in the general case what is the dimension of $W^{[l]}$, the weight matrix associated with layer $l$?

- $W^{[l]}$ has shape $(n^{[l-1]}, n^{[l]})$
- $W^{[l]}$ has shape $(n^{[l]}, n^{[l+1]})$
- **$W^{[l]}$ has shape $(n^{[l]}, n^{[l-1]})$**
- $W^{[l]}$ has shape $(n^{[l+1]}, n^{[l]})$
