Feature Engineering Coursera Quiz Answers

Apache Beam and Cloud Dataflow Quiz Answers

Q1. Which of these accurately describes the relationship between Apache Beam and Cloud Dataflow?

Apache Beam is the API for building data pipelines in Java or Python, and Cloud Dataflow is the implementation and execution framework.

Q2. TRUE or FALSE: The Filter method can be carried out in parallel and autoscaled by the execution framework:

True

Q3. What is the purpose of a Cloud Dataflow connector?

Connectors allow you to output the results of a pipeline to a specific data sink like Bigtable, Google Cloud Storage, flat file, BigQuery, and more…

Q4. Below you’ll find a Cloud Dataflow preprocessing graph. Correctly identify the terms for A, B, and C

A is a data source, B are transformation steps, and C is a data sink

Q5. To run a pipeline you need something called a __

runner

Q6. Your development team is about to execute this code block. What is your team about to do?

We are compiling our Cloud Dataflow pipeline written in Java and are submitting it to the cloud for execution

Q7. TRUE or FALSE: A ParDo acts on all items at once (like a Map in MapReduce)

False
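
To make the per-element behavior behind Q2 and Q7 concrete, here is a minimal pure-Python sketch. It does not use the Apache Beam SDK; `par_do`, `split_words`, `keep_long`, and the two-shard split are hypothetical names chosen for illustration. The point is that each function sees one element at a time, so shards can be processed independently and the results concatenated.

```python
from itertools import chain

def par_do(fn, elements):
    """Apply fn to each element independently; fn may yield zero or more outputs."""
    return list(chain.from_iterable(fn(e) for e in elements))

def split_words(line):
    # One input element (a line) can produce many output elements (words).
    for word in line.split():
        yield word

def keep_long(word):
    # A filter is a per-element function that emits the element or nothing.
    if len(word) > 4:
        yield word

lines = ["apache beam pipeline", "cloud dataflow runner"]

# Processing the whole collection in one go...
whole = par_do(keep_long, par_do(split_words, lines))

# ...gives the same result as processing each shard separately and
# concatenating, which is what lets a framework spread work across workers.
shards = [lines[:1], lines[1:]]
sharded = list(chain.from_iterable(
    par_do(keep_long, par_do(split_words, shard)) for shard in shards
))
```

Because no step needs to see more than one element, the sharded and unsharded runs agree.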

Feature crosses Quiz Answers

Q1. You are building a model to predict the number of points (“margin”) by which Team A will beat Team B in a basketball game. Your input features are (1) whether or not it is a home game for Team A (2) the average number of points Team A scored in its past 7 games and (3) the average number of points Team B scored in its past 7 games. Which of these is a linear model suitable for machine learning?

Ans: margin = b + w1 * is_home_game + w2 * avg_points_A + w3 * avg_points_B; margin = (avg_points_A - avg_points_B); margin = w1 * is_home + w2 * (avg_points_A - avg_points_B)^3

Q2. Feature crosses are more common in modern machine learning because:

Ans: Feature crosses memorize, and that is okay only if you have extremely large datasets.

Q3. The function tf.feature_column.crossed_column requires:

Ans: A list of categorical or bucketized features
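
As a rough illustration of why a cross needs categorical or bucketized inputs, the sketch below crosses two discrete features by joining their values and hashing the pair into a fixed number of buckets. This is plain Python, not the TensorFlow implementation; `cross_features`, the feature values, and the bucket count of 1000 are assumptions made for the example.

```python
import hashlib

def cross_features(values, hash_bucket_size):
    """Map a tuple of discrete feature values to a single bucket id."""
    key = "_X_".join(str(v) for v in values)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size

# Both inputs are discrete: a day-of-week string and an hour-of-day bucket.
bucket_a = cross_features(("Mon", 8), hash_bucket_size=1000)
bucket_b = cross_features(("Mon", 8), hash_bucket_size=1000)
bucket_c = cross_features(("Sat", 8), hash_bucket_size=1000)
```

The same pair always lands in the same bucket, so the model can learn a weight for that combination. A raw floating point input would rarely repeat exactly, which is why it has to be bucketized before it is crossed.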

Q4. You might create an embedding of a feature cross in order to:

Ans: Identify similar sets of inputs for clustering, Reuse weights learned in one problem in another problem, Create a lower-dimensional representation of the input space

Preprocessing and Feature Creation Quiz Answers

Q1. You are training a model to predict how long it will take to sell a house. The list price of the house, with numeric values from 20,000 to 500,000, is one of the inputs to the model. Which of these is a good practice?

Ans: Rescale a real-valued feature like price to a range from 0 to 1
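
A minimal sketch of that rescaling, assuming plain min-max scaling with the 20,000 and 500,000 bounds from the question; `min_max_scale` is a hypothetical helper name.

```python
def min_max_scale(value, lo, hi):
    """Linearly map value from the range [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)

PRICE_MIN, PRICE_MAX = 20_000, 500_000

scaled = [min_max_scale(p, PRICE_MIN, PRICE_MAX)
          for p in (20_000, 260_000, 500_000)]
# scaled is [0.0, 0.5, 1.0]
```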

Q2. Which of these tools are commonly used for data pre-processing? (Select 3 correct responses)

Ans: BigQuery, Apache Beam, TensorFlow

Q3. Which one of these is NOT something you would commonly do in data preprocessing?

Ans: Tune your ML model hyperparameters

Q4. In your TensorFlow model you are calculating the distance between two points on a map as a new feature. How do you ensure the preprocessing you’re doing for model training is also done the exact same way in prediction?

Ans: Wrap features in the training/evaluation input function AND wrap features in the serving input function

Q5. The below code preprocesses the latitude and longitude using feature columns. What is the point of the 38.0 and 42.0 in the column buckets?

Ans: Latitudes between 38 and 42 will be discretized into the specified number of bins.
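
A minimal pure-Python sketch of what such bucketing does, assuming evenly spaced boundaries between 38.0 and 42.0. `make_boundaries`, `bucketize`, and the choice of 5 boundaries are assumptions for illustration and are not taken from the quiz code.

```python
from bisect import bisect_right

def make_boundaries(lo, hi, nbuckets):
    """Return nbuckets evenly spaced boundaries from lo to hi inclusive."""
    step = (hi - lo) / (nbuckets - 1)
    return [lo + i * step for i in range(nbuckets)]

def bucketize(value, boundaries):
    """Index of the bin containing value; out-of-range values fall in the end bins."""
    return bisect_right(boundaries, value)

lat_boundaries = make_boundaries(38.0, 42.0, 5)  # [38.0, 39.0, 40.0, 41.0, 42.0]
bins = [bucketize(lat, lat_boundaries) for lat in (37.5, 38.5, 40.0, 41.9, 43.0)]
# bins is [0, 1, 3, 4, 5]
```

The two numbers only fix where the bins start and end. Latitudes below 38 or above 42 are not rejected; they fall into the first and last bins.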

Q6. What are two advantages of using TensorFlow to preprocess your features instead of building an Apache Beam pipeline? (Select two correct responses)

In TensorFlow the same pipelines can be used in both training and serving
In TensorFlow you will have access to helper APIs to help automatically bucketize and process features instead of writing your own Java or Python code

Q7. What is one key advantage of preprocessing your features using Apache Beam?

Ans: The same code you use to preprocess features in training and evaluation can also be used in serving

Preprocessing with Cloud Dataprep Quiz Answers

Q1. What are some of the advantages to exploring datasets with a UI tool like Cloud Dataprep?

Dataprep uses Dataflow behind-the-scenes and you can create your transformations in a UI tool instead of writing Java or Python
Dataprep has a number of transformation steps available that you can chain together as part of a recipe
Dataprep supports outputting your data into BigQuery, Google Cloud Storage, or flat files

Q2. TRUE or FALSE: You can automatically set up pipelines to run at defined intervals with Cloud Dataprep

Ans: True

Raw Data and Features Quiz Answers

Q1. What are the characteristics of a good feature?

Related to the objective
Have enough examples in the data
Be numeric with meaningful magnitude
Knowable at prediction time

Q2. I want to build a model to predict whether Team A will win its basketball game against Team B. I will train my model on features computed on historical basketball games. One of my features is how many games this season Team A has won. How should I compute this feature?

Ans: Compute num_games_won / num_games_played up to the (N-1)th game in order to train with the label for the Nth game
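
The sketch below shows one way to compute that feature without leaking the outcome of the game being predicted. `win_fraction_before_each_game` and the sample season are hypothetical; using `None` for the first game (which has no history) is an assumption of the example.

```python
def win_fraction_before_each_game(results):
    """For game N, return the fraction of games won among games 1..N-1.

    results: list of booleans in chronological order (True means a win).
    The first game has no history, so its feature is None.
    """
    features = []
    wins = 0
    for played, won in enumerate(results):
        features.append(wins / played if played else None)
        wins += int(won)  # update only after this game's feature is recorded
    return features

season = [True, False, True, True]
features = win_fraction_before_each_game(season)
# features is [None, 1.0, 0.5, 2/3]
```

Each feature value uses only games that finished before the game it is paired with, which matches what is knowable at prediction time.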

Q3. I want to build a model to predict whether Team A will win its basketball game against Team B. Which of these attributes (computed on historical basketball games) are good features? Assume that these features are all computed appropriately without taking into account non-causal data.

How often Team A wins games
How often Team A wins games where its opponent is ranked in the top 10
How many of its last 7 games Team A has won

Representing Features Quiz Answers

Q1. What is one-hot encoding?

Ans: One-hot encoding is a process by which categorical variables are converted into a form that can be provided to neural networks so they do a better job in prediction.
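
A minimal sketch of the encoding itself, in plain Python; `one_hot` and the three-item vocabulary are hypothetical and chosen only for illustration.

```python
def one_hot(value, vocabulary):
    """Return a list with a 1 at the position of value and 0 everywhere else."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if v == value else 0 for v in vocabulary]

vocabulary = ["guard", "forward", "center"]
encoded = one_hot("forward", vocabulary)
# encoded is [0, 1, 0]
```

Each category gets its own input, so the model does not read an artificial ordering or magnitude into the category labels.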

Q2. Which of these offers the best way to encode categorical data that is already indexed, i.e. has integers in [0-N]?

Ans: tf.feature_column.categorical_column_with_identity

Q3. What do you use the tf.feature_column.bucketized_column function for?

Ans: To discretize floating point values into a smaller number of categorical bins

tf.transform Quiz Answers

Q1. What is a common use case where you would use tf.transform instead of a Cloud Dataflow pipeline or regular TensorFlow for preprocessing?

Ans: You want to scale your inputs based on min/max value in the dataset, You need to compute the vocabulary list for categorical columns from your training dataset
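
Both cases need a full pass over the training data before any single row can be transformed. The sketch below mimics that two-phase idea in plain Python. It is not the tf.transform API; `analyze`, `transform`, and the sample rows are hypothetical.

```python
def analyze(rows):
    """Full pass over the training data to collect dataset-level statistics."""
    prices = [r["price"] for r in rows]
    return {
        "price_min": min(prices),
        "price_max": max(prices),
        "city_vocab": sorted({r["city"] for r in rows}),
    }

def transform(row, stats):
    """Per-row step that reuses the saved statistics at training or serving time."""
    span = stats["price_max"] - stats["price_min"]
    return {
        "price_scaled": (row["price"] - stats["price_min"]) / span,
        "city_id": stats["city_vocab"].index(row["city"]),
    }

training_rows = [
    {"price": 100.0, "city": "Oslo"},
    {"price": 300.0, "city": "Lima"},
    {"price": 200.0, "city": "Oslo"},
]
stats = analyze(training_rows)
serving_example = transform({"price": 150.0, "city": "Oslo"}, stats)
# serving_example is {"price_scaled": 0.25, "city_id": 1}
```

The statistics are computed once and then applied row by row, so a row seen at prediction time is scaled with the same min, max, and vocabulary as the training data.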

Q2. The Analyze phase of tf.transform is carried out via:

Ans: A Python Beam pipeline that contains TensorFlow functions

Q3. The Transform phase of tf.transform is carried out via:

Inside a TensorFlow serving input function during prediction
Inside a Beam pipeline for training and in TensorFlow during evaluation
Inside a Beam pipeline for evaluation and in TensorFlow during training
A Beam pipeline while creating a training or evaluation dataset
