All Weeks Applied AI with DeepLearning Coursera Quiz Answers
Applied AI with Deep Learning Week 01 Quiz Answers
Quiz : DeepLearning Fundamentals
Q1. What is an example of a vector-vector multiplication?
- (1,2,3) * (4,5,6) = (4,10,18)
- (1,2,3) * (4,5,6) = 23
- (1,2,3) * (4,5,6) = 32
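The options in Q1 can be checked with a short sketch in plain Python. The helper names `elementwise` and `dot` are illustrative only and not part of the course material. Multiplying component by component yields another vector, while summing those products yields the scalar dot product.

```python
def elementwise(a, b):
    """Multiply two equal-length vectors component by component."""
    return [x * y for x, y in zip(a, b)]


def dot(a, b):
    """Dot product: the sum of the component-wise products."""
    return sum(elementwise(a, b))


a = (1, 2, 3)
b = (4, 5, 6)
print(elementwise(a, b))  # [4, 10, 18]
print(dot(a, b))          # 32
```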
Q2. Which equation is correct given the following vectors?
Note: The “%*%” symbol is used to denote the “dot product”
w = (0.3,0.5,0.7)
x = (3,5,7)
- w %*% x = 0.3 * 3 + 0.5 * 5 + 0.7 * 7
- w %*% x = 0.3 + 3 * 0.5 + 5 * 0.7 + 7
Q3. Please use this Wikipedia article to answer the following question:
For which task is an LSTM neural network best suited?
- Any type of sequence processing including Text and DNA
- Image Classification
- Time-series analysis
Q4. To train an auto-encoder, what data is shown to the neural network on the right hand side?
- X – The same data as on the left hand side
- y – The actual labeled data
Q5. Which dimension reduction technique is limited because it provides only a linear transformation of the data?
- PCA – Principal Component Analysis
- t-SNE – the t-distributed stochastic neighbor embedding
- Deep Neural Network Auto Encoder
Q6. Which optimization technique is the most commonly used for Neural Network training?
- Gradient Descent
- Grid Search
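As a rough illustration of what gradient descent does, the sketch below minimizes a one-dimensional toy function f(w) = (w - 3)^2. The function, the learning rate and the step count are assumptions chosen for the example; real neural network training applies the same update to every weight using gradients obtained by backpropagation.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


def grad_f(w):
    """Derivative of the toy loss f(w) = (w - 3)^2."""
    return 2 * (w - 3)


w_min = gradient_descent(grad_f, w0=0.0)
print(w_min)  # very close to the minimum at w = 3
```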
Q7. What are examples of a one-hot encoded vector?
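For reference, a one-hot encoded vector has exactly one component set to 1 and all others set to 0. A minimal sketch, where the helper `one_hot` is a hypothetical name used only for this illustration:

```python
def one_hot(index, num_classes):
    """Return a list with a 1 at position `index` and 0 everywhere else."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    return [1 if i == index else 0 for i in range(num_classes)]


print(one_hot(2, 4))  # [0, 0, 1, 0]
```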
Applied AI with Deep Learning Week 02 Quiz Answers
Quiz : TensorFlow
Q1. Which statement is correct?
- TensorFlow is a library for arbitrary numerical computation not limited to DeepLearning only
- TensorFlow is a high-level DeepLearning library allowing you to define only Neural Network layers
Q2. What is a TensorFlow placeholder?
- A way to add data to the TensorFlow execution graph at a later stage
- A TensorFlow program expressed as execution graph which can be run by the TensorFlow engine at a later stage
Q3. What data is usually stored in a TensorFlow variable?
- The weight matrix W
- Training data X
- Label data y
Q4. Which statements are correct with respect to an unhealthy value distribution (histogram) in the weight matrix?
- A uniform distribution indicates a lack of parameter updates and therefore problems with training
- Values at the far end of the spectrum indicate over-saturation of the weight matrix
- Having most of the values centered very close to zero forces the gradients to become very small
Q5. What is the relationship between accuracy and loss in a healthy and working neural network?
- The two values should behave inversely during neural network training
- accuracy and loss are basically the same measure, so they should behave identically
Q6. Which statements about automatic differentiation in TensorFlow are correct?
- Every operator in TensorFlow also has the first derivative of its operation registered. Therefore TensorFlow can apply the chain rule of automatic differentiation to compute the first derivative of any complex function
- TensorFlow has an automatic differentiation engine which is capable of finding the first derivative of every atomic operation without it being provided
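The idea behind the first statement in Q6 can be sketched in plain Python: each operator records its local derivative, and a backward pass combines them with the chain rule. This is a toy model under simplifying assumptions (scalar values only, hypothetical names `Node`, `add`, `mul`, `sin`, `backward`), not TensorFlow's actual implementation.

```python
import math


class Node:
    """A value in the graph plus the local derivatives w.r.t. its inputs."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input node, local derivative)
        self.grad = 0.0


def add(a, b):
    # d(a+b)/da = 1 and d(a+b)/db = 1
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))


def mul(a, b):
    # d(a*b)/da = b and d(a*b)/db = a
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))


def sin(a):
    # d(sin a)/da = cos a
    return Node(math.sin(a.value), ((a, math.cos(a.value)),))


def backward(output):
    """Apply the chain rule from the output back to every input."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # outputs before their inputs
        for parent, local in node.parents:
            parent.grad += node.grad * local


x = Node(2.0)
y = Node(3.0)
f = add(mul(x, y), sin(x))  # f = x*y + sin(x)
backward(f)
print(x.grad)  # df/dx = y + cos(x)
print(y.grad)  # df/dy = x
```

No derivative is discovered automatically here: every operator has to supply its own, which is the point the first statement makes.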
Quiz : TensorFlow 2.x
Q1. What are the most important new features in TensorFlow 2.x?
- Eager Execution
- Keras as official high level API for Deep Learning
- Compatibility with other Deep Learning Frameworks like PyTorch or Chainer
Q2. Which is correct?
- With eager execution a TensorFlow session is not needed anymore
- TensorFlow sessions are still necessary for eager execution
Q3. Which is true?
- Tightly integrating Keras as the official high-level API into TensorFlow has many advantages
- Tightly integrating Keras as the official high-level API into TensorFlow changes nothing but the import statements
Quiz : Apache SystemML
Q1. What are the most important similarities between SystemML and TensorFlow?
- Both are open source and totally free of charge
- Both are linear algebra computing frameworks using a declarative approach of expressing computations and shipping those to an execution engine
Q2. What are the most important differences between SystemML and TensorFlow?
- Apache SystemML is mostly used in Machine Learning research
- TensorFlow is a very established framework and is slowly becoming the de facto standard in Deep Learning together with PyTorch
- SystemML takes statistics about your data into account to create a parallel program that runs on a compute cluster that is optimized for the type of data you have
Quiz : PyTorch Introduction
Q1. On which objects do PyTorch operations run?
- On pandas dataframes
- On numpy arrays
- On tensors
Q2. What size is the tensor created by this operation?
m_tensor = torch.randn((6, 3, 28, 28))
Q3. What size of tensor does the following indexing return for a tensor of size 5x4x4x4?
m_tensor = torch.randn((5, 4, 4, 4))
Q4. Given are the following two tensors:
x = torch.randn(4, 3)
y = torch.randn(3, 4)
What would be the result of the concatenation of these two tensors?
z = torch.cat([x, y], 1)
- Tensor of size 4×2
- RuntimeError: inconsistent tensor sizes
- Tensor of size 2×4
- Tensor of size 2×4 with deprecation warning
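The shape rule behind Q4 is that concatenation along one dimension requires all other dimensions to match. The sketch below models that rule on plain shape tuples so it runs without PyTorch; `cat_shape` is a hypothetical helper, and it raises `ValueError` where PyTorch itself raises a `RuntimeError`.

```python
def cat_shape(shapes, dim):
    """Shape of a concatenation along `dim`; raise if other dimensions differ."""
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("all shapes must have the same number of dimensions")
        for axis, (a, b) in enumerate(zip(first, shape)):
            if axis != dim and a != b:
                raise ValueError(
                    f"sizes must match except in dimension {dim}: "
                    f"got {a} and {b} in dimension {axis}"
                )
    result = list(first)
    result[dim] = sum(shape[dim] for shape in shapes)
    return tuple(result)


print(cat_shape([(4, 3), (4, 5)], dim=1))  # (4, 8)

try:
    cat_shape([(4, 3), (3, 4)], dim=1)  # the shapes from Q4
except ValueError as err:
    print("error:", err)
```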
Q5. Which function(s) can you call to reshape a PyTorch tensor?
Q6. How is the computation graph created with PyTorch?
- PyTorch uses a unique approach of tape-based differentiation, hence a computation graph is not created at all
- The computation graph is defined statically during the model definition
- The computation graph is defined dynamically via autograd.Variable PyTorch components
Applied AI with Deep Learning Week 03 Quiz Answers
Quiz : Anomaly Detection
Q1. Which statements are true with respect to unsupervised machine learning?
- Unsupervised Machine Learning requires a labeled data set in order to train the algorithm
- There exists more unlabeled data than labeled
- There exists more labeled data than unlabeled
Q2. IBM Watson Studio provides a file system underlying the Jupyter notebooks of roughly 100 terabytes for staging. Why is it not a good idea to use this as a permanent data store?
- The staging area is very expensive to use
- This file system is volatile, so IBM can reset it at any point in time. Therefore, if data needs to be kept long-term, ObjectStore is the best choice
Q3. What’s the input dimensionality of this layer?
(How many columns/features does the input data set have)
Q4. How many LSTM cells does this layer have?
Q5. How many future time steps does this layer predict?
(Note: this depends on the number of time steps this layer received as input. Since Keras only supports symmetrical input/output lengths, the input length determines the output length.)
Q6. Which type of neurons are the best fit for time-series data?
- Feed Forward Neural Networks
- LSTM Long-Short-Term-Memory networks
Quiz : Sequence Classification with Keras LSTM Network
Q1. What is the main characteristic of the stateful LSTM Network?
- A stateful LSTM uses the last hidden states and cell states of batch n as the initial states for batch n+1
- A stateful LSTM updates parameters on batch n and then initializes the hidden states and cell states for batch n+1, usually with zeros
Q2. In which case would you prefer stateless over stateful LSTM?
- When sequences in the batches are not related to each other, e.g. represent complete sentences
- A stateful LSTM is always a better choice than a stateless LSTM
- When sequences in batches are related to each other, e.g. one long time series
Q3. Batch size and training-set size (pick two)
- For a stateful LSTM the batch size must always be divisible by 8
- The training-set size must be divisible by the batch size for both stateful and stateless LSTM networks
- The training-set size must be divisible by the batch size for the stateful LSTM only
- Batch size can impact the (prediction) performance of an LSTM network
Q4. Loss Function
- Mean Absolute Error (MAE) is always a less effective loss function than Mean Squared Error (MSE)
- Mean Absolute Error (MAE) can be more outlier resistant
- Both are synonyms
- LSTM Cell State is its memory
- LSTM Hidden State is equivalent to the Cell output
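For the loss function question (Q4) above, the claim that MAE can be more outlier resistant can be illustrated with made-up numbers. The data below is invented for the example: one badly wrong prediction raises MAE sixfold but MSE more than a hundredfold, because MSE squares each error.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
clean = [1.5, 2.5, 3.5, 4.5]      # every prediction is off by 0.5
outlier = [1.5, 2.5, 3.5, 14.5]   # last prediction is off by 10.5

print(mae(y_true, clean), mse(y_true, clean))      # 0.5 0.25
print(mae(y_true, outlier), mse(y_true, outlier))  # 3.0 27.75
```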
Q6. Given is the formula to compute the number of LSTM layer parameters:
PARAMETERS = 4 * LSTM outputs size * (weights LSTM inputs size + weights LSTM outputs size + 1 bias variable)
Please calculate the number of parameters for the layer with the output shape (64, 30, 10). This is the first layer after the input layer, which has the shape (64, 30, 1).
Select the single correct answer:
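The formula from Q6 can be evaluated directly. The sketch assumes the Keras shape convention (batch size, time steps, features), so the input size is the last entry of the input shape and the output size is the last entry of the output shape; `lstm_parameter_count` is a hypothetical helper name.

```python
def lstm_parameter_count(input_size, output_size):
    """4 gates, each with input weights, recurrent weights and one bias per unit."""
    return 4 * output_size * (input_size + output_size + 1)


# Input shape (64, 30, 1): 1 feature per time step.
# Output shape (64, 30, 10): 10 LSTM units.
print(lstm_parameter_count(input_size=1, output_size=10))  # 480
```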
Quiz : Image Classification
Q1. Which neural network layer types are most commonly used in image classification?
Q2. What is the purpose of a Dropout layer?
- Improve classification performance on the training set
- Prevent over-fitting on the training set
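A minimal sketch of how a Dropout layer behaves, assuming the common "inverted dropout" formulation: during training each activation is zeroed with probability `rate` and the survivors are scaled up, while at inference time the input passes through unchanged. The function below is illustrative and is not the Keras implementation.

```python
import random


def dropout(values, rate, training, rng=random):
    """Inverted dropout on a list of activations."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    # Keep a value with probability `keep` and rescale it, otherwise zero it.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(0)  # seeded only to make the example repeatable
print(dropout(activations, rate=0.5, training=True, rng=rng))
print(dropout(activations, rate=0.5, training=False))  # unchanged at inference
```

Because different units are dropped on every training step, the network cannot rely on any single activation, which is what discourages over-fitting.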
Q3. Which statements are correct? Please choose all that apply.
- Imagenet is a pre-classified dataset of images
- Resnet50 is a pre-classified dataset of images
- Imagenet is a special Neural Network topology for classifying images
- Resnet50 is a special Neural Network topology for classifying images
Quiz : NLP
Q1. Which statement is correct?
- Word2Vec is a special case of an auto-encoder
- Word2Vec uses a Convolutional Neural Network
Q2. What is the closest result given the following word vectors and the given calculation?
cat = (1,2,3,4)
bird = (2,3,4,5)
flying = (1,1,1,1)
swimming = (0,3,5,6)
duck = (2,3,4,6)
bird – flying = ?
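The calculation in Q2 can be done by hand or with a short sketch. The quiz does not say how "closest" is measured, so Euclidean distance is assumed here; `subtract` and `nearest` are illustrative helper names.

```python
import math

vectors = {
    "cat": (1, 2, 3, 4),
    "bird": (2, 3, 4, 5),
    "flying": (1, 1, 1, 1),
    "swimming": (0, 3, 5, 6),
    "duck": (2, 3, 4, 6),
}


def subtract(a, b):
    """Component-wise difference of two vectors."""
    return tuple(x - y for x, y in zip(a, b))


def nearest(target, candidates):
    """Name of the candidate vector with the smallest Euclidean distance."""
    return min(candidates, key=lambda name: math.dist(candidates[name], target))


result = subtract(vectors["bird"], vectors["flying"])
print(result)                    # (1, 2, 3, 4)
print(nearest(result, vectors))  # cat
```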
Q3. Consider a vocabulary size of 1.000.000 words and a 100-dimensional embedding in a ternary (three output classes) classification task. Which of the following neural network configurations works best?
i -> number of input layer neurons
h -> number of hidden layer neurons
o -> number of output layer neurons
a -> activation function of output layer (softmax)
- i =100 , h = 1.000.000, o = 3, a = relu
- i = 3, h = 1.000.000, o = 100, a = softmax
- i = 1.000.000, h = 100, o = 3, a = relu
- i = 1.000.000, h = 100, o = 3, a = softmax
- i =100 , h = 1.000.000, o = 3, a = softmax
- i = 3, h = 1.000.000, o = 100, a = relu
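One reason the output activation matters in Q3 is that softmax turns the three raw output values into class probabilities that sum to 1, which relu does not. A small sketch with invented logits:

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # one score per output class
print(probs)
print(sum(probs))
```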
Applied AI with Deep Learning Week 04 Quiz Answers
Quiz : Methods of parallel neural network training
Q1. Which of the following is NOT a way to parallelize neural network training?
- data parallelism
- intra-model parallelism
- inter-model parallelism
- pipelined parallelism
- model-free parallelism
Q2. In which time window(s) does the GPU sit idle, waiting for the next data to arrive?
- C and S
- C,T and S
- C and T
- T and S
Q3. What is the role of the Parameter Server in Data Parallelism?
- Distributing the data for training
- Averaging the gradients it receives from the different workers and sending them back
- Scheduling the workload on the compute cluster
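The averaging step described in Q3 can be sketched as follows. The gradient values are invented, and a real parameter server would also handle communication with the workers, which is left out here.

```python
def average_gradients(worker_gradients):
    """Element-wise mean of the gradient vectors reported by the workers."""
    num_workers = len(worker_gradients)
    return [sum(parts) / num_workers for parts in zip(*worker_gradients)]


worker_gradients = [
    [1.0, -2.0, 4.0],  # worker 0
    [3.0, 0.0, 0.0],   # worker 1
    [2.0, -4.0, 2.0],  # worker 2
]
print(average_gradients(worker_gradients))  # [2.0, -2.0, 2.0]
```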