Week 01: Quiz Answers
Q1. What is the name of the object used to tokenize sentences?
- Tokenizer
- WordTokenizer
- CharacterTokenizer
- TextTokenizer
Q2. What is the name of the method used to tokenize a list of sentences?
- fit_on_texts(sentences)
- tokenize(sentences)
- tokenize_on_text(sentences)
- fit_to_text(sentences)
Q3. Once you have the corpus tokenized, what’s the method used to encode a list
of sentences to use those tokens?
- texts_to_tokens(sentences)
- text_to_sequences(sentences)
- text_to_tokens(sentences)
- texts_to_sequences(sentences)
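A minimal sketch of the workflow behind Q1–Q3, assuming TensorFlow 2.x with the legacy tf.keras.preprocessing.text.Tokenizer API used in the course (the sample sentences are illustrative, not from the quiz):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = ["I love my dog", "I love my cat"]  # illustrative corpus

tokenizer = Tokenizer()                              # Q1: the Tokenizer object
tokenizer.fit_on_texts(sentences)                    # Q2: builds the word index from the corpus
sequences = tokenizer.texts_to_sequences(sentences)  # Q3: encodes sentences using those tokens

print(tokenizer.word_index)  # {'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}
print(sequences)             # [[1, 2, 3, 4], [1, 2, 3, 5]]
```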
Q4. When initializing the tokenizer, how do you specify a token to use for unknown words?
- out_of_vocab=
- unknown_token=
- oov_token=
- unknown_word=
Q5. If you don’t use a token for out-of-vocabulary words, what happens at encoding?
- The word isn’t encoded, and is replaced by a zero in the sequence
- The word is replaced by the most common token
- The word isn’t encoded, and is skipped in the sequence
- The word isn’t encoded, and the sequencing ends
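A minimal sketch contrasting the two behaviours asked about in Q4 and Q5; the corpus, test sentence, and "<OOV>" string are illustrative:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

corpus = ["I love my dog"]
test = ["I love my manatee"]  # 'manatee' never appears in the corpus

with_oov = Tokenizer(oov_token="<OOV>")   # Q4: oov_token= sets the unknown-word token
with_oov.fit_on_texts(corpus)
print(with_oov.texts_to_sequences(test))  # the unknown word maps to the <OOV> index

without_oov = Tokenizer()                 # Q5: with no OOV token...
without_oov.fit_on_texts(corpus)
print(without_oov.texts_to_sequences(test))  # ...the unknown word is simply skipped
```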
Q6. If you have a number of sequences of different lengths, how do you ensure
that they are understood when fed into a neural network?
- Make sure that they are all the same length using the pad_sequences method of the tokenizer
- Use the pad_sequences object from the tensorflow.keras.preprocessing.sequence namespace
- Specify the input layer of the Neural Network to expect different sizes with dynamic_length
- Process them on the input layer of the Neural Network using the pad_sequences property
Q7. If you have a number of sequences of different lengths, and call pad_sequences
on them, what’s the default result?
- Nothing, they’ll remain unchanged
- They’ll get cropped to the length of the shortest sequence
- They’ll get padded to the length of the longest sequence by adding zeros to the beginning of shorter ones
- They’ll get padded to the length of the longest sequence by adding zeros to the end of shorter ones
Q8. When padding sequences, if you want the padding to be at the end of the
sequence, how do you do it?
- Call the padding method of the pad_sequences object, passing it ‘post’
- Pass padding=’post’ to pad_sequences when initializing it
- Call the padding method of the pad_sequences object, passing it ‘after’
- Pass padding=’after’ to pad_sequences when initializing it
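A minimal sketch of the padding behaviour behind Q6–Q8; note that pad_sequences is a standalone function in tensorflow.keras.preprocessing.sequence rather than a tokenizer method, and the sample sequences are illustrative:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

sequences = [[1, 2, 3, 4], [5, 6], [7, 8, 9, 10, 11]]

# Default behaviour: pad to the length of the longest sequence,
# adding zeros at the *beginning* of shorter ones.
print(pad_sequences(sequences))
# [[ 0  1  2  3  4]
#  [ 0  0  0  5  6]
#  [ 7  8  9 10 11]]

# padding='post' puts the zeros at the end instead (Q8).
print(pad_sequences(sequences, padding="post"))
```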
Week 02: Quiz Answers
Q1. What is the name of the TensorFlow library containing common data that you
can use to train and test neural networks?
- TensorFlow Datasets
- TensorFlow Data
- TensorFlow Data Libraries
- There is no library of common data sets, you have to use your own
Q2. How many reviews are there in the IMDB dataset and how are they split?
- 60,000 records, 50/50 train/test split
- 50,000 records, 80/20 train/test split
- 60,000 records, 80/20 train/test split
- 50,000 records, 50/50 train/test split
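A minimal sketch for Q1–Q3, assuming the tensorflow_datasets package is installed; the IMDB reviews set ships 50,000 labelled reviews split evenly between train and test, with labels encoded as 0/1:

```python
import tensorflow_datasets as tfds

# Load the IMDB reviews dataset as (text, label) pairs.
imdb, info = tfds.load("imdb_reviews", with_info=True, as_supervised=True)
train_data, test_data = imdb["train"], imdb["test"]

print(info.splits["train"].num_examples)  # 25000
print(info.splits["test"].num_examples)   # 25000
```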
Q3. How are the labels for the IMDB dataset encoded?
- Reviews encoded as a number 1-10
- Reviews encoded as a number 0-1
- Reviews encoded as a boolean true/false
- Reviews encoded as a number 1-5
Q4. What is the purpose of the embedding dimension?
- It is the number of dimensions required to encode every word in the corpus
- It is the number of words to encode in the embedding
- It is the number of dimensions for the vector representing the word encoding
- It is the number of letters in the word, denoting the size of the encoding
Q5. When tokenizing a corpus, what does the num_words=n parameter do?
- It specifies the maximum number of words to be tokenized, and picks the most common ‘n’ words
- It specifies the maximum number of words to be tokenized, and stops tokenizing when it reaches n
- It errors out if there are more than n distinct words in the corpus
- It specifies the maximum number of words to be tokenized, and picks the first ‘n’ words that were tokenized
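A minimal sketch of Q5's num_words parameter, with an illustrative corpus: the full word index is still built, but at encoding time only the most common words are kept and the rest fall back to the OOV token (or are dropped if no OOV token was set):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

corpus = ["the cat sat on the mat", "the dog ran", "a dog barked loudly"]

tokenizer = Tokenizer(num_words=5, oov_token="<OOV>")
tokenizer.fit_on_texts(corpus)

print(tokenizer.word_index)                            # every word is indexed
print(tokenizer.texts_to_sequences(["a dog barked"]))  # e.g. [[1, 3, 1]]: rare words map to <OOV>
```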
Q6. To use word embeddings in TensorFlow, in a sequential layer, what is the
name of the class?
- tf.keras.layers.Embedding
- tf.keras.layers.WordEmbedding
- tf.keras.layers.Word2Vector
- tf.keras.layers.Embed
Q7. IMDB Reviews are either positive or negative. What type of loss function should be used in this scenario?
- Categorical crossentropy
- Binary crossentropy
- Adam
- Binary Gradient descent
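A minimal sketch tying Q4, Q6, and Q7 together: the embedding dimension is the size of the vector representing each word, the layer class is tf.keras.layers.Embedding, and a two-class (positive/negative) output uses binary crossentropy. The vocab_size, embedding_dim, and max_length values are illustrative, not from the quiz:

```python
import tensorflow as tf

vocab_size, embedding_dim, max_length = 10000, 16, 120   # illustrative values

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_length,)),                   # padded token sequences
    tf.keras.layers.Embedding(vocab_size, embedding_dim),  # Q6: each token -> a 16-dim vector (Q4)
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),        # single unit for positive/negative
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])  # Q7
model.summary()
```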
Q8. When using IMDB Sub Words dataset, our results in classification were poor.
Why?
- Sequence becomes much more important when dealing with subwords, but we’re ignoring word positions
- The sub words make no sense, so can’t be classified
- Our neural network didn’t have enough layers
- We didn’t train long enough
Week 03: Quiz Answers
Q1. Why does sequence make a large difference when determining the semantics of
language?
- Because the order of words doesn’t matter
- Because the order in which words appear dictate their meaning
- It doesn’t
- Because the order in which words appear dictate their impact on the meaning of the sentence
Q2. How do Recurrent Neural Networks help you understand the impact of
sequence on meaning?
- They look at the whole sentence at a time
- They shuffle the words evenly
- They carry meaning from one cell to the next
- They don’t
Q3. How does an LSTM help understand the meaning when words that qualify each
other aren’t necessarily beside each other in a sentence?
- They shuffle the words randomly
- They load all words into a cell state
- Values from earlier words can be carried to later ones via a cell state
- They don’t
Q4. What Keras layer type allows LSTMs to look forward and backward in a
sentence?
- Bilateral
- Unilateral
- Bothdirection
- Bidirectional
Q5. What’s the output shape of a bidirectional LSTM layer with 64 units?
- (128,1)
- (128,None)
- (None, 64)
- (None, 128)
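A minimal sketch for Q4–Q5: wrapping an LSTM of 64 units in the Bidirectional layer doubles the output features, so the layer reports an output shape of (None, 128). The embedding and sequence-length values are illustrative:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(120,)),                   # illustrative padded sequence length
    tf.keras.layers.Embedding(10000, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()  # the Bidirectional layer's output shape is (None, 128)
```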
Q6. When stacking LSTMs, how do you instruct an LSTM to feed the next one in the
sequence?
- Ensure that return_sequences is set to True only on units that feed to another LSTM
- Ensure that return_sequences is set to True on all units
- Do nothing, TensorFlow handles this automatically
- Ensure that they have the same number of units
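A minimal sketch for Q6: when stacking LSTMs, every LSTM that feeds another LSTM must return its full sequence of outputs via return_sequences=True. Layer sizes are illustrative:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(120,)),
    tf.keras.layers.Embedding(10000, 64),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True)),    # feeds the next LSTM, so True
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)), # last LSTM keeps the default (False)
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()
```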
Q7. If a sentence has 120 tokens in it, and a Conv1D with 128 filters and a kernel
size of 5 is passed over it, what’s the output shape?
- (None, 120, 128)
- (None, 116, 128)
- (None, 116, 124)
- (None, 120, 124)
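A minimal sketch for Q7: with 120 input tokens and a Conv1D of 128 filters and kernel size 5 (default 'valid' padding), the output has 120 − 5 + 1 = 116 steps and 128 channels. The embedding size is illustrative:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(120,)),                        # 120 tokens per (padded) sentence
    tf.keras.layers.Embedding(10000, 16),
    tf.keras.layers.Conv1D(128, 5, activation="relu"),   # 'valid' padding: 120 - 5 + 1 = 116 steps
])
model.summary()  # the Conv1D output shape is (None, 116, 128)
```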
Q8. What’s the best way to avoid overfitting in NLP datasets?
- Use LSTMs
- Use GRUs
- Use Conv1D
- None of the above
Week 04: Quiz Answers
Q1. What is the name of the method used to tokenize a list of sentences?
- tokenize(sentences)
- tokenize_on_text(sentences)
- fit_on_texts(sentences)
- fit_to_text(sentences)
Q2. If a sentence has 120 tokens in it, and a Conv1D with 128 filters and a kernel
size of 5 is passed over it, what’s the output shape?
- (None, 120, 128)
- (None, 120, 124)
- (None, 116, 128)
- (None, 116, 124)
Q3. What is the purpose of the embedding dimension?
- It is the number of words to encode in the embedding
- It is the number of dimensions required to encode every word in the corpus
- It is the number of letters in the word, denoting the size of the encoding
- It is the number of dimensions for the vector representing the word encoding
Q4. IMDB Reviews are either positive or negative. What type of loss function should be used in this scenario?
- Adam
- Binary Gradient descent
- Categorical crossentropy
- Binary crossentropy
Q5. If you have a number of sequences of different lengths, how do you ensure
that they are understood when fed into a neural network?
- Use the pad_sequences object from the tensorflow.keras.preprocessing.sequence namespace
- Make sure that they are all the same length using the pad_sequences method of the tokenizer
- Process them on the input layer of the Neural Network using the pad_sequences property
- Specify the input layer of the Neural Network to expect different sizes with dynamic_length
Q6. When predicting words to generate poetry, the more words you predict, the more
likely the output is to end up as gibberish. Why?
- It doesn’t, the likelihood of gibberish doesn’t change
- Because the probability that each word matches an existing phrase goes down the more words you create
- Because the probability of prediction compounds, and thus increases overall
- Because you are more likely to hit words not in the training set
Q7. What is a major drawback of word-based training for text generation instead
of character-based generation?
- Word based generation is more accurate because there is a larger body of words to draw from
- Character-based generation is more accurate because there are fewer characters to predict
- There is no major drawback, it’s always better to do word-based training
- Because there are far more words in a typical corpus than characters, it is much more memory intensive
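A minimal sketch of the word-by-word generation loop behind Q6–Q7. The helper name and arguments are illustrative, not the course's exact notebook; it assumes a trained word-level model (softmax over the vocabulary) and a fitted Tokenizer. Each predicted word is appended to the seed and fed back in, so prediction errors compound as the text grows, and the output layer needs one unit per word in the (large) vocabulary:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate(model, tokenizer, seed_text, next_words, max_sequence_len):
    """Greedy word-level generation; model and tokenizer are assumed already trained/fitted."""
    for _ in range(next_words):
        # Encode the current seed and pad it to the length the model was trained on.
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding="pre")
        # Pick the most probable next word.
        predicted_id = int(np.argmax(model.predict(token_list, verbose=0), axis=-1)[0])
        # Map the predicted index back to its word (index 0 is reserved for padding).
        for word, index in tokenizer.word_index.items():
            if index == predicted_id:
                seed_text += " " + word
                break
    return seed_text
```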
Q8. How does an LSTM help understand the meaning when words that qualify each
other aren’t necessarily beside each other in a sentence?
- They load all words into a cell state
- They don’t
- They shuffle the words randomly
- Values from earlier words can be carried to later ones via a cell state