## All Weeks Applied AI with DeepLearning Coursera Quiz Answers

### Applied AI with Deep Learning Week 01 Quiz Answers

#### Quiz : DeepLearning Fundamentals

Q1. What is an example of a vector-vector multiplication?

- 2*3=6
- (1,2,3) * (4,5,6) = (4,10,18)
- (1,2,3) * (4,5,6) = 23
- (1,2,3) * (4,5,6) = 32

Q2. Which equation is correct given the following vectors?

Note: The “%*%” symbol is used to denote the “dot product”

w = (0.3,0.5,0.7)

x = (3,5,7)

- w %*% x = 0.3 * 3 + 0.5 * 5 + 0.7 * 7
- w %*% x = 0.3 + 3 * 0.5 + 5 * 0.7 + 7
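The dot product from Q1 and Q2 can be checked with a few lines of plain Python (an illustrative sketch, not part of the quiz):

```python
# Dot product of two vectors: multiply element-wise, then sum the products.
def dot(a, b):
    assert len(a) == len(b)
    return sum(ai * bi for ai, bi in zip(a, b))

w = (0.3, 0.5, 0.7)
x = (3, 5, 7)

result = dot(w, x)  # 0.3*3 + 0.5*5 + 0.7*7 = 8.3

# The Q1 example: element-wise products (4, 10, 18) sum to 32.
q1 = dot((1, 2, 3), (4, 5, 6))
```

Note how the first option of Q2 expands exactly this sum, and how a vector-vector (dot) product yields a scalar, not another vector.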

Q3. Please use this Wikipedia article to answer the following question:

For which task is an LSTM Neural Network best suited?

- Any type of sequence processing including Text and DNA
- Image Classification
- Time-series analysis

Q4. To train an auto-encoder, what data is shown to the neural network on the right hand side?

- X – The same data as on the left hand side
- y – The actual labeled data

Q5. Which dimension reduction technique is limited because it provides only a linear transformation of the data?

- PCA – Principal Component Analysis
- t-SNE – the t-distributed stochastic neighbor embedding
- Deep Neural Network Auto Encoder

Q6. Which optimization technique is the most commonly used for Neural Network training?

- Gradient Descent
- Monte Carlo
- Grid Search

Q7. What are examples of a one-hot encoded vector?

Reference:

https://en.wikipedia.org/wiki/One-hot

- (0,0,0,0,0,0,1,0,0,0)
- (1,0,0,0,0,0,1,0,0,0)

### Applied AI with Deep Learning Week 02 Quiz Answers

#### Quiz : TensorFlow

Q1. Which statement is correct?

- TensorFlow is a library for arbitrary numerical computation, not limited to Deep Learning
- TensorFlow is a high-level DeepLearning library allowing you to define only Neural Network layers

Q2. What is a TensorFlow placeholder?

- A way to add data to the TensorFlow execution graph at a later stage
- A TensorFlow program expressed as execution graph which can be run by the TensorFlow engine at a later stage

Q3. What data is usually stored in a TensorFlow variable?

- The weight matrix W
- Training data X
- Label data y

Q4. Which statements are correct with respect to an unhealthy value distribution (histogram) of values in the weight matrix?

- A uniform distribution indicates a lack of parameter updates and therefore problems with training
- Values at the far end of the spectrum are indicating over-saturation of the weight matrix
- Most of the values centered very close to zero forces gradients to become very small

Q5. What is the relationship between accuracy and loss in a healthy and working neural network?

- Both values should behave inversely during neural network training
- Accuracy and loss are basically the same measure, so they should behave identically

Q6. Which statements about automatic differentiation in TensorFlow are correct?

- Every operator in TensorFlow has registered the first derivative of its operation as well. Therefore TensorFlow can apply the chain rule of automatic differentiation in order to compute the 1st derivative of any complex function
- TensorFlow has an auto-differentiation engine which can find the 1st derivative of every atomic operation without it being provided
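The chain-rule idea behind Q6 can be illustrated without TensorFlow: if each atomic operation's derivative is known, the derivative of a composite function follows mechanically (a hand-rolled sketch):

```python
import math

# f(x) = sin(x^2), composed from square (derivative 2x) and sin (derivative cos).
def f(x):
    return math.sin(x ** 2)

def f_prime(x):
    # Chain rule: d/dx sin(x^2) = cos(x^2) * 2x
    return math.cos(x ** 2) * 2 * x

x = 0.5
# Finite-difference check that the chain-rule derivative is correct.
numeric = (f(x + 1e-6) - f(x - 1e-6)) / 2e-6
print(abs(numeric - f_prime(x)) < 1e-6)  # True
```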

#### Quiz : TensorFlow 2.x

Q1. What are the most important new features in TensorFlow 2.x?

- Eager Execution
- Keras as official high level API for Deep Learning
- Compatibility with other Deep Learning Frameworks like PyTorch or Chainer

Q2. Which is correct?

- With eager execution a TensorFlow session is not needed anymore
- TensorFlow sessions are still necessary for eager execution

Q3. Which is true?

- Tightly integrating Keras as official high level API into TensorFlow has many advantages
- Tightly integrating Keras as official high level API into TensorFlow changes nothing but the import statements

#### Quiz : Apache SystemML

Q1. What are the most important similarities between SystemML and TensorFlow?

- Both are OpenSource and totally free of charge
- Both are linear algebra computing frameworks using a declarative approach of expressing computations and shipping those to the execution engine

Q2. What are the most important differences between SystemML and TensorFlow?

- Apache SystemML is mostly used in Machine Learning research
- TensorFlow is a very established framework and is slowly becoming the de-facto standard in Deep Learning together with PyTorch
- SystemML takes statistics about your data into account to create a parallel program that runs on a compute cluster that is optimized for the type of data you have

#### Quiz : PyTorch Introduction

Q1. On which objects do PyTorch operations run?

- On pandas dataframes
- On numpy arrays
- On tensors

Q2. What is the size of the tensor created by this operation?

`m_tensor = torch.randn((6, 3, 28, 28))`

- 6
- 6x3x28x28
- 14112
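Without PyTorch at hand, NumPy (used here as a stand-in, since both share the same shape semantics) shows the difference between the tensor's size and its element count:

```python
import numpy as np

m = np.random.randn(6, 3, 28, 28)  # analogous to torch.randn((6, 3, 28, 28))
print(m.shape)  # (6, 3, 28, 28) -> the tensor's size is 6x3x28x28
print(m.size)   # 14112 -> total number of elements (6*3*28*28), not the size
```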

Q3. What is the size of the tensor returned by the following indexing of a tensor of size 5x4x4x4?

`m_tensor = torch.randn((5, 4, 4, 4))`

`m_tensor[0]`

- 5
- 5x4x4
- 4x4x4
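Indexing the first axis drops that axis, which NumPy (again as a stand-in for PyTorch) confirms:

```python
import numpy as np

m = np.random.randn(5, 4, 4, 4)  # analogous to torch.randn((5, 4, 4, 4))
print(m[0].shape)  # (4, 4, 4): indexing removes the first dimension
```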

Q4. Given are the following two tensors:

`x = torch.randn(4, 3)`

`y = torch.randn(3, 4)`

What would be the result of the concatenation of these two tensors?

`z = torch.cat([x, y], 1)`


- Tensor of size 4×2

- `RuntimeError: inconsistent tensor sizes`

- Tensor of size 2×4
- Tensor of size 2×4 with deprecation warning
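The failure mode in Q4 can be reproduced with NumPy's `concatenate` (a stand-in for `torch.cat`; both require matching sizes on all non-concatenated axes):

```python
import numpy as np

x = np.random.randn(4, 3)
y = np.random.randn(3, 4)

try:
    # Concatenating along axis 1 requires both arrays to have the same
    # number of rows -- here 4 vs 3, so this raises an error.
    z = np.concatenate([x, y], axis=1)
    failed = False
except ValueError as e:
    failed = True
    print("concatenation failed:", e)
```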

Q5. Which function(s) can you call to reshape a PyTorch tensor?

- reshape(args)
- view(args)
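Both `reshape(args)` and `view(args)` return the same data in a new shape, as long as the element count stays the same; NumPy's `reshape`, used here as a stand-in for the PyTorch calls, shows the idea:

```python
import numpy as np

t = np.arange(12).reshape(4, 3)  # 12 elements arranged as 4x3
u = t.reshape(2, 6)              # same 12 elements, new shape
print(t.shape, u.shape)          # (4, 3) (2, 6)
```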

Q6. How is the computation graph created with PyTorch?

- PyTorch uses a unique approach of tape-based differentiation, hence a computation graph is not created at all
- Computation graph is defined statically during the model definition
- Computation graph is defined dynamically via autograd.Variable PyTorch components

### Applied AI with Deep Learning Week 03 Quiz Answers

#### Quiz : Anomaly Detection

Q1. Which statements are true with respect to unsupervised machine learning?

- Unsupervised Machine Learning requires a labeled data set in order to train the algorithm
- There is more unlabeled data than labeled data
- There is more labeled data than unlabeled data

Q2. IBM Watson Studio provides a file system underlying the Jupyter notebooks of roughly 100 terabytes for staging. Why is it not a good idea to use this as a permanent data store?

- The staging area is very expensive to use
- This filesystem is volatile, so IBM can reset it at any point in time. Therefore, if data needs to be kept long-term, ObjectStore is the best choice

Q3. What's the input dimensionality of this layer?

(How many columns/features does the input data set have)

`model.add(LSTM(42,input_shape=(23,5),return_sequences=True))`

Q4. How many LSTM cells does this layer have?

`model.add(LSTM(42,input_shape=(23,5),return_sequences=True))`

Q5. How many future time steps does this layer predict?

(Note: this depends on the number of time steps this layer received as input. Since Keras only supports symmetrical input/output lengths, the input length determines the output length)

`model.add(LSTM(42,input_shape=(23,5),return_sequences=True))`
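Questions 3–5 all read off the same layer definition; a plain-Python sketch (no Keras needed) of what each number in `LSTM(42, input_shape=(23, 5), return_sequences=True)` means:

```python
units = 42                     # first argument: number of LSTM cells
timesteps, features = (23, 5)  # input_shape: (time steps, features per step)

# return_sequences=True -> the layer emits one output vector per input time step
output_shape = (timesteps, units)

print(features)      # 5  -> input dimensionality / columns (Q3)
print(units)         # 42 -> number of LSTM cells (Q4)
print(output_shape)  # (23, 42) -> 23 predicted time steps (Q5)
```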

Q6. Which type of neurons are the best fit for time-series data?

- Feed Forward Neural Networks
- LSTM Long-Short-Term-Memory networks

#### Quiz : Sequence Classification with Keras LSTM Network

Q1. What is the main characteristic of the stateful LSTM Network?

- Stateful LSTM uses **batch n**'s last output hidden states and cell states as initial states for **batch n+1**
- Stateful LSTM updates parameters on **batch n** and then initializes hidden states and cell states for **batch n+1**, usually with zeros

Q2. In which case would you prefer stateless over stateful LSTM?

- When sequences in the batches are not related to each other, e.g. represent complete sentences
- Stateful LSTM is always a better choice than the stateless LSTM
- When sequences in batches are related to each other, e.g. one long time series

Q3. Batch-size and training-set size (pick two)

- For a stateful LSTM the batch-size must always be divisible by 8
- The training-set size must be divisible by the batch-size for stateful and stateless LSTM Networks
- The training-set size must be divisible by the batch-size for the stateful LSTM only
- Batch-size can impact (prediction) performance of an LSTM Network

Q4. Loss Function

- Mean Absolute Error (MAE) is always a less effective loss function than Mean Squared Error (MSE)
- Mean Absolute Error (MAE) can be more outlier resistant

Q5. LSTM Cell

- Both are synonyms
- LSTM Hidden State is equivalent to the Cell output

Q6. Given is the formula to compute the number of LSTM layer parameters:

PARAMETERS = 4 * LSTM outputs size * (weights LSTM inputs size + weights LSTM outputs size + 1 bias variable)

Please calculate the parameters for the layer with the output shape (64, 30, 10). This is the first layer after the input layer, which has the shape (64, 30, 1).

Select the single correct answer:

- 840
- 440
- 0
- 480
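Plugging the shapes from Q6 into the given formula (output size 10, input size 1, read off the last dimension of each shape) is a one-liner:

```python
def lstm_params(input_size, output_size):
    # 4 gates, each with input weights, recurrent weights and a bias variable,
    # per the formula: 4 * outputs * (inputs + outputs + 1)
    return 4 * output_size * (input_size + output_size + 1)

print(lstm_params(input_size=1, output_size=10))  # 4 * 10 * (1 + 10 + 1) = 480
```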

#### Quiz : Image Classification

Q1. Which neural network layer types are most commonly used in image classification?

- Convolution
- LSTM
- MaxPooling
- ConvPooling

Q2. What is the purpose of a Dropout layer?

- Improve classification performance on the training set
- Prevent over-fitting on the training set

Q3. Which statements are correct? Please choose all that apply.

- Imagenet is a pre-classified dataset of images
- Resnet50 is a pre-classified dataset of images
- Imagenet is a special Neural Network topology for classifying images
- Resnet50 is a special Neural Network topology for classifying images

#### Quiz : NLP

Q1. Which statement is correct?

- Word2Vec is a special case of an auto-encoder
- Word2Vec uses a Convolutional Neural Network

Q2. What is the closest result given the following word vectors and the given calculation?

cat = (1,2,3,4)

bird = (2,3,4,5)

flying = (1,1,1,1)

swimming = (0,3,5,6)

duck = (2,3,4,6)

bird – flying = ?

- duck
- cat
- swimming
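The arithmetic in Q2 can be done in plain Python: subtract the vectors element-wise, then find the closest word by Euclidean distance (an illustrative sketch):

```python
import math

words = {
    "cat":      (1, 2, 3, 4),
    "swimming": (0, 3, 5, 6),
    "duck":     (2, 3, 4, 6),
}
bird = (2, 3, 4, 5)
flying = (1, 1, 1, 1)

# bird - flying, element-wise
target = tuple(b - f for b, f in zip(bird, flying))  # (1, 2, 3, 4)

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

closest = min(words, key=lambda w: dist(words[w], target))
print(closest)  # cat -- an exact match with the target vector
```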

Q3. Consider a vocabulary size of 1,000,000 words and a 100-dimensional embedding in a ternary (three output classes) classification task – which of the following neural network configurations works best?

i -> number of input layer neurons

h -> number of hidden layer neurons

o -> number of output layer neurons

a -> activation function of output layer (softmax)

- i = 100, h = 1,000,000, o = 3, a = relu
- i = 3, h = 1,000,000, o = 100, a = softmax
- i = 1,000,000, h = 100, o = 3, a = relu
- i = 1,000,000, h = 100, o = 3, a = softmax
- i = 100, h = 1,000,000, o = 3, a = softmax
- i = 3, h = 1,000,000, o = 100, a = relu
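The softmax activation in the output layer turns the three class scores into probabilities that sum to 1; a minimal implementation (an illustrative sketch with example scores):

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability, then normalize.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # three output classes
print(sum(probs))  # 1.0 (up to floating point)
```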

### Applied AI with Deep Learning Week 04 Quiz Answers

#### Quiz : Methods of parallel neural network training

Q1. Which of the following is NOT a way to parallelize neural network training?

- data parallelism
- intra-model parallelism
- inter-model parallelism
- pipelined parallelism
- **model-free parallelism**

Q2. In which time window(s) does the GPU sit idle, waiting for the next data to arrive?

- C and S
- C
- T
- C,T and S
- C and T
- **T and S**
- S

Q3. What is the role of the Parameter Server in Data Parallelism?

- Distributing the data for training
- **Averaging the gradients it receives from the different workers and sending them back**
- Scheduling the workload on the compute cluster

#### Get All Course Quiz Answers of Advanced Data Science with IBM Specialization

- Fundamentals of Scalable Data Science Coursera Quiz Answers
- Advanced Machine Learning and Signal Processing Quiz Answers
- Applied AI with DeepLearning Coursera Quiz Answers
- Advanced Data Science Capstone Coursera Quiz Answers