Neural Networks and Deep Learning Coursera Quiz Answers

Welcome to your ultimate guide for Neural Networks and Deep Learning quiz answers! Whether you’re tackling practice quizzes to solidify your understanding or preparing for graded quizzes to test your knowledge, this guide has you covered.

Covering all course modules, this resource will help you master key deep learning concepts, including neural network architecture, forward and backward propagation, activation functions, and optimization techniques.

Neural Networks and Deep Learning Coursera Quiz Answers – Practice & Graded Quizzes for All Modules

Neural Networks and Deep Learning Module 01 Quiz Answers

Q1: What does the analogy “AI is the new electricity” refer to?

Answer: Similar to electricity starting about 100 years ago, AI is transforming multiple industries.

Explanation: Just as electricity revolutionized industries in the past, AI is now reshaping sectors like healthcare, finance, transportation, and more by enabling automation and intelligent decision-making.


Q2: Which of these are reasons for Deep Learning recently taking off? (Check the three options that apply.)

Answer:

  1. We have access to a lot more computational power.
  2. We have access to a lot more data.
  3. Deep learning has resulted in significant improvements in important applications such as online advertising, speech recognition, and image recognition.

Explanation: The combination of powerful GPUs/TPUs, large datasets, and breakthroughs in applications has fueled the rapid adoption and success of deep learning.

Q3: Recall this diagram of iterating over different ML ideas. Which of the statements below are true? (Check all that apply.)

Answer:

  • Being able to try out ideas quickly allows deep learning engineers to iterate more quickly.
  • Recent progress in deep learning algorithms has allowed us to train good models faster (even without changing the CPU/GPU hardware).
  • Faster computation can help speed up how long a team takes to iterate to a good idea.

Explanation: Rapid experimentation, faster algorithms, and computational advancements reduce the time required to optimize machine learning models.

Q4: When an experienced deep learning engineer works on a new problem, they can usually use insight from previous problems to train a good model on the first try, without needing to iterate multiple times through different models.

Answer: False

Explanation: Even experienced engineers often need to iterate and refine their models to achieve optimal performance on new problems.


Q5: Which one of these plots represents a ReLU activation function?

Answer: Figure 1

Explanation: The ReLU (Rectified Linear Unit) activation function outputs 0 for negative inputs and the input value itself for positive inputs, resulting in a plot with a flat line for $x < 0$ and a linear increase for $x > 0$.
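The behaviour described above can be checked with a minimal NumPy sketch. The function name `relu` and the sample inputs are illustrative choices, not part of the quiz.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: 0 for negative inputs, the input itself otherwise."""
    return np.maximum(0, x)

# Illustrative inputs covering negative, zero, and positive values.
xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
ys = relu(xs)
print(ys)  # negative entries become 0, positive entries pass through unchanged
```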


Q6: Images for cat recognition is an example of “structured” data because it is represented as a structured array in a computer.

Answer: False

Explanation: Images are considered unstructured data because, despite being stored as structured arrays, they do not have a predefined schema like a database table.


Q7: A demographic dataset with statistics on different cities’ populations, GDP per capita, and economic growth is an example of “unstructured” data because it contains data coming from different sources.

Answer: False

Explanation: Demographic datasets with defined attributes like population and GDP are examples of structured data, as they are organized in a clear, tabular format.


Q8: Why is an RNN (Recurrent Neural Network) used for machine translation, say translating English to French? (Check all that apply.)

Answer:

  • It can be trained as a supervised learning problem.
  • It is applicable when the input/output is a sequence (e.g., a sequence of words).

Explanation: RNNs are designed to handle sequential data, such as text or time series, and can be trained on labeled datasets in a supervised learning setup.

Q9: In this diagram which we hand-drew in lecture, what do the horizontal axis (x-axis) and vertical axis (y-axis) represent?

Answer:

  • x-axis is the amount of data.
  • y-axis (vertical axis) is the performance of the algorithm.

Explanation: This diagram typically shows how the performance of machine learning algorithms improves with an increase in training data size.

Q10: Assuming the trends described in the previous question’s figure are accurate, which of the following are true? (Check all that apply.)

Answer:

  • Increasing the size of a neural network generally does not hurt an algorithm’s performance, and it may help significantly.
  • Increasing the training set size generally does not hurt an algorithm’s performance, and it may help significantly.

Explanation: Larger training datasets and neural networks generally enhance performance, provided overfitting and computational constraints are managed.

Neural Networks and Deep Learning Module 02 Quiz Answers

Q1: What does a neuron compute?

Answer: A neuron computes a linear function (z = Wx + b) followed by an activation function.

Explanation: In deep learning, a neuron first computes a weighted sum of its inputs plus a bias term ($z = Wx + b$) and then applies an activation function (such as ReLU or sigmoid) to introduce non-linearity.


Q3: Suppose img is a (32,32,3) array, representing a 32×32 image with 3 color channels red, green, and blue. How do you reshape this into a column vector?

Answer: x = img.reshape((32*32*3, 1))

Explanation: To reshape the (32,32,3) image into a column vector, you flatten the dimensions into a single dimension of size $32 \times 32 \times 3 = 3072$ and create a vector with shape (3072, 1).
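As a quick check, the reshape can be run on a placeholder array. The all-zeros image and the name `x_alt` are assumptions for illustration; only the shapes matter here.

```python
import numpy as np

# Placeholder image: the pixel values are irrelevant, only the shape matters.
img = np.zeros((32, 32, 3))

x = img.reshape((32 * 32 * 3, 1))  # explicit size, as in the quiz answer
x_alt = img.reshape(-1, 1)         # -1 lets NumPy infer the 3072 itself

print(x.shape, x_alt.shape)
```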


Q4: Consider the two following random arrays a and b:

a = np.random.randn(2, 3)  # a.shape = (2, 3)
b = np.random.randn(2, 1)  # b.shape = (2, 1)
c = a + b

What will be the shape of c?

Answer: c.shape = (2, 3)

Explanation: Broadcasting copies b (2,1) across the second dimension to match the shape of a (2,3), so the resulting shape of c is (2,3).
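To see the broadcast concretely, here is a small sketch that swaps the random arrays for fixed values (an assumption made so the result is easy to read).

```python
import numpy as np

# Fixed values instead of random ones, so the effect of broadcasting is visible.
a = np.arange(6.0).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
b = np.array([[10.0], [20.0]])    # shape (2, 1)

# b is stretched across the 3 columns of a: row 0 gets +10, row 1 gets +20.
c = a + b
print(c.shape)
print(c)
```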


Q5: Consider the two following random arrays a and b:

a = np.random.randn(4, 3)  # a.shape = (4, 3)
b = np.random.randn(3, 2)  # b.shape = (3, 2)
c = a * b

What will be the shape of c?

Answer: This will lead to an error.

Explanation: Element-wise multiplication requires the dimensions to match or be broadcastable, but (4,3) and (3,2) are not compatible. This will raise a shape mismatch error.
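The failure can be reproduced with a short sketch. The `outcome` variable is an illustrative helper for recording what happened; NumPy raises a `ValueError` when two shapes cannot be broadcast together.

```python
import numpy as np

a = np.random.randn(4, 3)
b = np.random.randn(3, 2)

try:
    c = a * b  # element-wise product: (4, 3) and (3, 2) cannot be broadcast
    outcome = "ok"
except ValueError as err:
    outcome = "error"
    print("Broadcasting failed:", err)
```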


Q6: Recall that $X = [x^{(1)} x^{(2)} \dots x^{(m)}]$. What is the dimension of $X$?

Answer: $(n_x, m)$

Explanation: Here, $n_x$ is the number of features (input size), and $m$ is the number of examples. $X$ is a matrix where each column represents one training example.


Q7: Recall that np.dot(a,b) performs a matrix multiplication on a and b, whereas a * b performs an element-wise multiplication.

a = np.random.randn(12288, 150)  # a.shape = (12288, 150)
b = np.random.randn(150, 45)  # b.shape = (150, 45)
c = np.dot(a, b)

What is the shape of c?

Answer: c.shape = (12288, 45)

Explanation: Matrix multiplication between (12288, 150) and (150, 45) results in a matrix with shape (12288, 45).
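A short sketch confirms the shape rule (m, k) times (k, n) gives (m, n). Using all-ones arrays instead of random values is an assumption made so that one entry can be checked by hand: each output entry is a sum of 150 ones.

```python
import numpy as np

a = np.ones((12288, 150))
b = np.ones((150, 45))

c = np.dot(a, b)  # matrix product
c_at = a @ b      # the @ operator computes the same matrix product

print(c.shape, c[0, 0])
```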


Q8: Consider the following code snippet:

a.shape = (3, 4)
b.shape = (4, 1)
for i in range(3):
    for j in range(4):
        c[i][j] = a[i][j] + b[j]

How do you vectorize this?

Answer: c = a + b.T

Explanation: Broadcasting allows you to perform the operation directly by transposing b (4,1) to (1,4) and adding it to a (3,4).
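The equivalence between the loop and the vectorized form can be verified numerically. This is a sketch with assumed random inputs; `c_loop` and `c_vec` are illustrative names, and `b[j, 0]` is used in the loop to read the scalar out of the (4, 1) column.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((3, 4))
b = rng.standard_normal((4, 1))

# Explicit double loop, mirroring the quiz snippet.
c_loop = np.zeros((3, 4))
for i in range(3):
    for j in range(4):
        c_loop[i][j] = a[i][j] + b[j, 0]

# Vectorized version: b.T has shape (1, 4) and is broadcast over the 3 rows of a.
c_vec = a + b.T

print(np.allclose(c_loop, c_vec))
```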


Q9: Consider the following code:

a = np.random.randn(3, 3)
b = np.random.randn(3, 1)
c = a * b

What will be c?

Answer: This will invoke broadcasting, so b is copied three times to become (3,3), and * is an element-wise product, so c.shape = (3, 3).

Explanation: Broadcasting extends b from (3,1) to match a (3,3). The * operator performs element-wise multiplication.


Q10: Consider the following computation graph. What is the output $J$?

Answer: $J = (a - 1) * (b + c)$

Explanation: The computation graph represents the calculation $J = (a - 1) \times (b + c)$, where $a$, $b$, and $c$ are inputs.

Neural Networks and Deep Learning Module 03 Quiz Answers

Q1: Which of the following are true? (Check all that apply.)

Answer:

  1. $X$ is a matrix in which each column is one training example.
  2. $a^{[2](12)}$ denotes the activation vector of the 2nd layer for the 12th training example.
  3. $a^{[2]}_4$ is the activation output by the 4th neuron of the 2nd layer.
  4. $a^{[2]}$ denotes the activation vector of the 2nd layer.

Explanation: These statements correctly describe how activations and training examples are indexed and stored in neural networks. A square-bracket superscript selects the layer, a round-bracket superscript selects the training example, and a subscript selects the neuron, so $a^{[2]}$ represents the activations for the entire 2nd layer.

Q2: The tanh activation is not always better than the sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data, making learning complex for the next layer. True/False?

Answer: False

Explanation: Because the mean of tanh's output is closer to zero, it centers the data, which makes learning simpler for the next layer, not more complex. This is why tanh usually works better than sigmoid for hidden units.


Q3: Which of these is a correct vectorized implementation of forward propagation for layer $l$, where $1 \leq l \leq L$?

Answer:

  1. $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$
  2. $A^{[l]} = g^{[l]}(Z^{[l]})$

Explanation:

  • $Z^{[l]}$: Computes the linear function for layer $l$.
  • $A^{[l]}$: Applies the activation function $g^{[l]}$ to $Z^{[l]}$. This is the standard forward propagation implementation.
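The two equations for a single layer can be sketched in NumPy as follows. The helper name `layer_forward`, the layer sizes (3 inputs, 4 units), the batch size, and the choice of tanh as the activation are all assumptions for illustration.

```python
import numpy as np

def layer_forward(A_prev, W, b, g):
    """One vectorized forward step: linear part, then the activation g."""
    Z = W @ A_prev + b  # b of shape (n_l, 1) is broadcast across the m columns
    A = g(Z)
    return Z, A

rng = np.random.default_rng(1)
m = 5                                   # assumed number of training examples
A0 = rng.standard_normal((3, m))        # assumed input with 3 features
W1 = rng.standard_normal((4, 3)) * 0.01
b1 = np.zeros((4, 1))

Z1, A1 = layer_forward(A0, W1, b1, np.tanh)
print(Z1.shape, A1.shape)
```

Each column of `Z1` and `A1` corresponds to one training example, which is why no loop over examples is needed.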

Q4: You are building a binary classifier for recognizing cucumbers ($y = 1$) vs. watermelons ($y = 0$). Which one of these activation functions would you recommend using for the output layer?

Answer: Sigmoid

Explanation: Sigmoid is commonly used in binary classification because its output is between 0 and 1, which is suitable for representing probabilities.


Q5: Consider the following code:

A = np.random.randn(4,3)
B = np.sum(A, axis=1, keepdims=True)

What will be B.shape?

Answer: (4, 1)

Explanation:

  • The axis=1 parameter sums across the columns within each row, resulting in one value per row.
  • keepdims=True ensures that the result keeps its 2D structure, so B has shape (4, 1).
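The effect of `keepdims` is easiest to see side by side. This sketch uses fixed values rather than random ones (an assumption for readability); `B_keep` and `B_drop` are illustrative names.

```python
import numpy as np

A = np.arange(12.0).reshape(4, 3)  # rows: [0,1,2], [3,4,5], [6,7,8], [9,10,11]

B_keep = np.sum(A, axis=1, keepdims=True)  # stays 2D: one column of row sums
B_drop = np.sum(A, axis=1)                 # collapses to a 1D array

print(B_keep.shape, B_drop.shape)
```

Keeping the (4, 1) shape avoids rank-1 arrays, which can broadcast in unexpected ways in later computations.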

Q6: Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?

Answer: Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent, each neuron in the layer will compute the same thing as other neurons.

Explanation: Initializing all weights to zero makes the neurons in a layer symmetric: they receive identical gradients and therefore learn identical parameters, so the layer is no more expressive than a single neuron.
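The symmetry can be observed in a small experiment. This is a sketch under stated assumptions: a 2-4-1 network, sigmoid activations in both layers, made-up toy data, a learning rate of 0.5, and 50 gradient descent steps. None of these specifics come from the quiz.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up toy data: 2 features, 4 examples, binary labels.
X = np.array([[1.0, 2.0, -1.0, 0.5],
              [0.5, -1.0, 2.0, 1.5]])
Y = np.array([[1.0, 1.0, 0.0, 1.0]])
m = X.shape[1]

# All-zero initialization of a 2-4-1 network.
W1, b1 = np.zeros((4, 2)), np.zeros((4, 1))
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

for _ in range(50):
    # Forward pass.
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    # Backward pass for the cross-entropy loss.
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    # Gradient descent update.
    W1 -= 0.5 * dW1
    b1 -= 0.5 * db1
    W2 -= 0.5 * dW2
    b2 -= 0.5 * db2

# Every hidden neuron ends up with the same weights as the first one.
rows_identical = np.allclose(W1, W1[0])
print(rows_identical)
```

The weights do move away from zero, but all four rows of `W1` move together, so the hidden units remain copies of each other.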


Q7: Logistic regression’s weights $w$ should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”. True/False?

Answer: False

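Explanation: Logistic regression does not have hidden layers, so initializing $w$ to zeros does not create a symmetry issue. The gradient of each weight depends on the corresponding input feature, so the weights take different values after the first update, and the optimization process (e.g., gradient descent) can still find a solution.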
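A minimal sketch of logistic regression trained from all-zero weights illustrates this. The toy data, the learning rate of 0.5, and the 100 iterations are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up, linearly separable toy data: 2 features, 4 examples.
X = np.array([[1.0, 2.0, -1.0, -2.0],
              [0.5, 1.5, -0.5, -1.5]])
Y = np.array([[1.0, 1.0, 0.0, 0.0]])
m = X.shape[1]

w = np.zeros((2, 1))  # all-zero initialization
b = 0.0

for _ in range(100):
    A = sigmoid(w.T @ X + b)    # predictions, shape (1, m)
    dw = X @ (A - Y).T / m      # depends on X, so the two entries differ
    db = float(np.mean(A - Y))
    w -= 0.5 * dw
    b -= 0.5 * db

predictions = (sigmoid(w.T @ X + b) > 0.5).astype(float)
accuracy = float(np.mean(predictions == Y))
print(accuracy, w.ravel())
```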


Q8: You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?

Answer: This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.

Explanation: When weights are large, the outputs of the tanh function saturate (approach +1 or -1), leading to vanishing gradients and slow convergence.
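The saturation effect follows from the derivative of tanh, which is 1 - tanh(z)^2. The sketch below evaluates it at a few assumed sample points; `tanh_grad` is an illustrative helper name.

```python
import numpy as np

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2

z_small = np.array([0.0, 0.5, 1.0])        # typical pre-activations with small weights
z_large = np.array([10.0, 100.0, 1000.0])  # pre-activations when weights are huge

print(tanh_grad(z_small))  # clearly non-zero gradients
print(tanh_grad(z_large))  # gradients that are essentially zero
```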


Q9: Consider the following 1 hidden layer neural network: Which of the following statements is True? (Check all that apply.)

Answer:

  1. $b^{[1]}$ will have shape (4, 1).
  2. $W^{[1]}$ will have shape (4, 2).
  3. $W^{[2]}$ will have shape (1, 4).

Explanation:

  • $W^{[1]}$ connects the input layer (2 units) to the hidden layer (4 units), so it has shape (4, 2).
  • $b^{[1]}$ has one bias per hidden neuron, so its shape is (4, 1).
  • $W^{[2]}$ connects the hidden layer (4 units) to the output layer (1 unit), so it has shape (1, 4).

Q10: In the same network as the previous question, what are the dimensions of $Z^{[1]}$ and $A^{[1]}$?

Answer: $Z^{[1]}$ and $A^{[1]}$ are $(4, m)$.

Explanation:

  • $Z^{[1]}$: The result of the linear computation in the hidden layer, where the hidden layer has 4 units and there are $m$ training examples.
  • $A^{[1]}$: The activations for the hidden layer, also of shape $(4, m)$.
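These shapes can be confirmed by pushing placeholder data through a network with 2 inputs, 4 hidden units, and 1 output. In this sketch, `m = 7` and the tanh activation are arbitrary assumptions; only the shapes are of interest.

```python
import numpy as np

n_x, n_h, n_y = 2, 4, 1  # layer sizes from the explanation above
m = 7                    # arbitrary number of training examples

X = np.zeros((n_x, m))   # one column per training example
W1, b1 = np.zeros((n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = np.zeros((n_y, n_h)), np.zeros((n_y, 1))

Z1 = W1 @ X + b1         # (4, 2) @ (2, m) -> (4, m)
A1 = np.tanh(Z1)         # element-wise, so the shape is unchanged
Z2 = W2 @ A1 + b2        # (1, 4) @ (4, m) -> (1, m)

print(Z1.shape, A1.shape, Z2.shape)
```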

Neural Networks and Deep Learning Module 04 Quiz Answers

Q1: What is the “cache” used for in our implementation of forward propagation and backward propagation?

Answer: We use it to pass variables computed during forward propagation to the corresponding backward propagation step. It contains useful values for backward propagation to compute derivatives.

Explanation: The “cache” stores intermediate results (like $Z^{[l]}$ and $A^{[l]}$) from forward propagation, which are required during backpropagation to compute gradients efficiently.


Q2: Among the following, which ones are “hyperparameters”? (Check all that apply.)

Answer:

  1. Learning rate $\alpha$
  2. Number of layers $L$ in the neural network
  3. Size of the hidden layers $n^{[l]}$
  4. Number of iterations

Explanation: Hyperparameters are values set before training that control the training process, such as learning rate, network architecture, and the number of training iterations.

Q3: Which of the following statements is true?

Answer: The deeper layers of a neural network are typically computing more complex features of the input than the earlier layers.

Explanation: Early layers of a neural network learn simpler features (like edges in images), while deeper layers combine these features to compute more complex representations.


Q4: Vectorization allows you to compute forward propagation in an $L$-layer neural network without an explicit for-loop (or any other explicit iterative loop) over the layers $l = 1, 2, \dots, L$. True/False?

Answer: False

Explanation: While vectorization eliminates the need for loops within a single layer’s computations, a loop over the layers is still necessary to compute forward propagation across all layers.
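The distinction can be sketched as follows: each layer's computation is a single vectorized expression over all examples, yet the layers themselves are visited in a Python loop. The layer sizes, the ReLU hidden activations, the sigmoid output, and the names `forward`, `weights`, and `biases` are assumptions for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, weights, biases):
    """Forward propagation: vectorized inside each layer, looped across layers."""
    A = X
    L = len(weights)
    for l in range(1, L + 1):                   # explicit loop over the layers
        Z = weights[l - 1] @ A + biases[l - 1]  # all m examples at once
        A = sigmoid(Z) if l == L else relu(Z)
    return A

rng = np.random.default_rng(0)
dims = [5, 4, 3, 1]  # assumed sizes: 5 inputs, two hidden layers, 1 output
weights = [rng.standard_normal((dims[i], dims[i - 1])) * 0.01
           for i in range(1, len(dims))]
biases = [np.zeros((dims[i], 1)) for i in range(1, len(dims))]

X = rng.standard_normal((5, 10))  # 10 made-up training examples
AL = forward(X, weights, biases)
print(AL.shape)
```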


Q5: Assume we store the values for $n^{[l]}$ in an array called layer_dims, as follows: layer_dims = [n_x, 4, 3, 2, 1]. So layer 1 has four hidden units, layer 2 has 3 hidden units, and so on. Which of the following for-loops will allow you to initialize the parameters for the model?

Answer:

for i in range(1, len(layer_dims)):
    parameters['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i-1]) * 0.01
    parameters['b' + str(i)] = np.zeros((layer_dims[i], 1))

Explanation: This loop initializes the $W^{[l]}$ matrices with small random values (scaled by 0.01) and the $b^{[l]}$ vectors with zeros, which is a standard initialization technique.
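Wrapped in a function, the loop can be run and its output shapes inspected. In this sketch the function name `initialize_parameters`, the `seed` argument, and the value `n_x = 5` are hypothetical; a seeded generator stands in for `np.random.randn` so the run is reproducible.

```python
import numpy as np

def initialize_parameters(layer_dims, seed=0):
    """Small random weights and zero biases for every layer after the input."""
    rng = np.random.default_rng(seed)
    parameters = {}
    for i in range(1, len(layer_dims)):
        parameters['W' + str(i)] = rng.standard_normal(
            (layer_dims[i], layer_dims[i - 1])) * 0.01
        parameters['b' + str(i)] = np.zeros((layer_dims[i], 1))
    return parameters

n_x = 5  # hypothetical input size
params = initialize_parameters([n_x, 4, 3, 2, 1])
shapes = {name: value.shape for name, value in params.items()}
print(shapes)
```

Starting the loop at index 1 skips the input layer, which has no parameters of its own.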


Q6: Consider the following neural network.

Answer: The number of layers $L$ is 4. The number of hidden layers is 3.

Explanation: By convention, the number of layers $L$ counts the hidden layers plus the output layer; the input layer is not counted. With 3 hidden layers and 1 output layer, $L = 4$.


Q7: During forward propagation, in the forward function for a layer $l$, you need to know what is the activation function in a layer (Sigmoid, tanh, ReLU, etc.). During backpropagation, the corresponding backward function also needs to know what is the activation function for layer $l$, since the gradient depends on it. True/False?

Answer: True

Explanation: The activation function used in forward propagation affects the gradient computation during backpropagation, as the derivative of the activation function is required to compute the gradients.


Q8: There are certain functions with the following properties:
(i) To compute the function using a shallow network circuit, you will need a large network (where we measure size by the number of logic gates in the network), but (ii) To compute it using a deep network circuit, you need only an exponentially smaller network. True/False?

Answer: True

Explanation: Deep networks can compute certain functions much more efficiently than shallow networks, requiring exponentially fewer units due to their hierarchical structure.


Q9: Consider the following 2 hidden-layer neural network: Which of the following statements is True? (Check all that apply.)

Answer:

  1. $W^{[2]}$ will have shape (3, 4)
  2. $b^{[2]}$ will have shape (3, 1)
  3. $W^{[3]}$ will have shape (1, 3)
  4. $b^{[3]}$ will have shape (1, 1)
  5. $b^{[1]}$ will have shape (4, 1)

Explanation:

  • $W^{[l]}$ has shape $(n^{[l]}, n^{[l-1]})$.
  • $b^{[l]}$ has shape $(n^{[l]}, 1)$.

These dimensions ensure that matrix multiplications and additions during forward propagation are valid.

Q10: In the general case, what is the dimension of $W^{[l]}$, the weight matrix associated with layer $l$?

Answer: $W^{[l]}$ has shape $(n^{[l]}, n^{[l-1]})$

Explanation: Here, $n^{[l]}$ is the number of units in layer $l$, and $n^{[l-1]}$ is the number of units in the previous layer ($l-1$). This shape ensures proper matrix multiplication in forward propagation.

Frequently Asked Questions (FAQ)
Are the Neural Networks and Deep Learning quiz answers accurate?

Yes, these answers are thoroughly verified and aligned with the latest course material on neural networks and deep learning.

Can I use these answers for both practice and graded quizzes?

Absolutely! These answers are suitable for both practice quizzes and graded assessments, ensuring you’re well-prepared for all evaluations.

Does this guide cover all modules in the course?

Yes, this guide provides answers for each module, ensuring comprehensive coverage for the entire course.

Will this guide help me understand deep learning better?

Yes, this guide reinforces fundamental concepts such as loss functions, optimization algorithms, gradient descent, and how neural networks learn, helping you build a strong foundation in deep learning.

Conclusion:

In conclusion, the Neural Networks and Deep Learning Coursera Quiz Answers provide a comprehensive understanding of key concepts and principles in the field of neural networks and deep learning.

These answers not only serve as a valuable resource for learners seeking to solidify their knowledge but also offer insights into solving practical problems using deep learning techniques.

Get all Course Quiz Answers of Deep Learning Specialization

Course 01: Neural Networks and Deep Learning Coursera Quiz Answers

Course 02: Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization Quiz Answers

Course 03: Structuring Machine Learning Projects Coursera Quiz Answers

Course 04: Convolutional Neural Networks Coursera Quiz Answers

Course 05: Sequence Models Coursera Q
