Data Engineering and Machine Learning using Spark Quiz Answers

Data Engineering and Machine Learning using Spark Week 01 Quiz Answers

Graded Quiz: Spark for Data Engineering

Q1. Select the option where all four statements about streaming data characteristics are correct.

  • Data is generated in finite, small batches; often originates from more than one source; is often available as a complete data set; requires incremental processing.
  • Data is generated incrementally; often originates from more than one source; is unavailable as a complete data set; requires incremental processing.
  • Data is generated incrementally; often originates from more than one source; is unavailable as a complete data set; requires batch processing.
  • Data is generated continuously; often originates from more than one source; is unavailable as a complete data set; requires incremental processing.
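
To make "incremental processing" concrete, here is a minimal plain-Python sketch (not Spark code). It assumes a hypothetical `sensor_stream` generator standing in for a data source, and it updates a running mean one reading at a time, so the complete data set never has to be available.

```python
from typing import Iterable, Iterator, Tuple


def running_mean(readings: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Yield (count, mean) after each reading without storing the stream."""
    count = 0
    mean = 0.0
    for value in readings:
        count += 1
        # Incremental update: fold the new value into the previous mean.
        mean += (value - mean) / count
        yield count, mean


def sensor_stream() -> Iterator[float]:
    """Hypothetical source; a real stream would have no known end."""
    for value in (20.0, 22.0, 21.0, 25.0):
        yield value


results = list(running_mean(sensor_stream()))
print(results[-1])
```

Each result is available as soon as its reading arrives, which is the property that separates incremental processing from batch processing over a finished data set.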

Q2. Select the data sink option that is not fault-tolerant and that is recommended for debugging only.

  • Console and Memory
  • Files
  • Foreach and ForeachBatch
  • Kafka

Q3. Select the option that best completes the following statement:

Apache Spark Structured Streaming processes a data stream with the Spark SQL engine _______________.

  • Extended SQL APIs
  • Dataset and DataFrame APIs
  • RDD APIs
  • Structured Streaming specific APIs

Q4. Select the website where you can find and download the GraphFrames package.

  • On the sparkpackages.org website
  • On the spark-packages.org website
  • On the Spark.com website
  • On the GraphFrames.com website

Q5. Identify which options correctly describe a directed graph and an undirected graph. (Multiple answers)

  • A directed graph contains edges with a single direction between two vertices, indicating a one-way relationship, illustrated using lines without arrows.
  • Undirected graphs have edges representing a relationship without a direction, illustrated using lines with arrows.
  • Undirected graphs have edges representing a relationship without a direction, illustrated using lines without arrows.
  • A directed graph contains edges with a single direction between two vertices, indicating a one-way relationship, illustrated using lines with arrows.
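
The difference between the two graph types can be sketched in plain Python (not GraphFrames). The helper `build_adjacency` and the sample vertex names are hypothetical; the point is only that an undirected edge is stored in both directions while a directed edge is stored in one.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def build_adjacency(edges: Iterable[Edge], directed: bool) -> Dict[str, Set[str]]:
    """Map each vertex to the set of vertices reachable over one edge."""
    adjacency: Dict[str, Set[str]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        # Register dst as a vertex even if it has no outgoing edges.
        adjacency.setdefault(dst, set())
        if not directed:
            # An undirected edge is a two-way relationship.
            adjacency[dst].add(src)
    return adjacency


sample_edges = [("alice", "bob"), ("bob", "carol")]
directed_graph = build_adjacency(sample_edges, directed=True)
undirected_graph = build_adjacency(sample_edges, directed=False)
print(directed_graph)
print(undirected_graph)
```

In the directed version "carol" has no neighbours, because the only edge touching it points inward; in the undirected version the same edge connects "carol" back to "bob".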

Q6. Select the option that lists the correct order of these ETL workflow items.

Step 1: The first data processing step loads a Parquet file to create a DataFrame with a “Telephone number” column.

Step 2: Data stored in the “Telephone” column is cleaned and transformed into three columns to separate the country code, the area code, and the local phone number.

Step 3: A data processing step creates a second DataFrame with other information, such as age, from a database.

Step 4: These two DataFrames are joined and loaded into the data warehouse for further analysis.

  • Step 4, Step 2, Step 1, Step 3
  • Step 1, Step 3, Step 2, Step 4
  • Step 1, Step 4, Step 3, Step 2
  • Step 1, Step 2, Step 3, Step 4
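
The four workflow items can be sketched in plain Python as an extract, transform, extract, join-and-load sequence. Everything here is an assumption for illustration: in-memory lists stand in for the Parquet file, the database, and the data warehouse, and the telephone format "+country-area-local" is invented for the example.

```python
from typing import Dict, List


def extract_contacts() -> List[Dict[str, str]]:
    """Stand-in for loading the Parquet file with a telephone column."""
    return [
        {"id": "1", "telephone": "+1-415-5550100"},
        {"id": "2", "telephone": "+44-20-79460000"},
    ]


def split_telephone(row: Dict[str, str]) -> Dict[str, str]:
    """Transform one telephone value into three separate columns."""
    country, area, local = row["telephone"].lstrip("+").split("-")
    return {
        "id": row["id"],
        "country_code": country,
        "area_code": area,
        "local_number": local,
    }


def extract_ages() -> Dict[str, int]:
    """Stand-in for reading a second data set from a database."""
    return {"1": 34, "2": 51}


def join_and_load(contacts: List[Dict[str, str]], ages: Dict[str, int],
                  warehouse: List[dict]) -> None:
    """Inner-join on id and append the joined rows to the warehouse."""
    for row in contacts:
        if row["id"] in ages:
            warehouse.append({**row, "age": ages[row["id"]]})


warehouse: List[dict] = []
contacts = [split_telephone(row) for row in extract_contacts()]
join_and_load(contacts, extract_ages(), warehouse)
print(warehouse[0])
```

The join can only run once both inputs exist, which is why the cleaning step and the second extract both come before the final load.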

Q7. Select the answers that define and describe Graph Theory. (Multiple answers)

  • Graph theory for Apache Spark is the study of graphs generated from parametric specifications.
  • The graph is a construct that contains a set of vertices with pairwise edges that connect one vertex to another.
  • The graph is a construct that contains an X, Y, and Z-axis.
  • Graph theory is the mathematical study of modeling pairwise relationships between objects.

Q8. Select the options that define watermarking. (Multiple answers)

  • Updates results after initial data processing.
  • Enables the inclusion of late-arriving data in stream processing
  • Is the process that manages and tags first-arriving data
  • Is the process that manages late data

Q9. Select the statements that are true about using GraphFrames. (Multiple answers)

  • Is ideal for modeling data with connecting relationships and computes relationship strength and direction
  • Provides one DataFrame for graph vertices and one DataFrame for edges that can be used with SparkSQL for analysis
  • Comes with popular built-in graph algorithms for use with the edge and vertex DataFrames
  • Performs motif finding, which searches the graph for structural patterns. Motif finding is supported in GraphFrames with the `find()` method, which uses a domain-specific language (DSL) to specify the search query in terms of edges and vertices.
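
To show what a motif query computes, the plain-Python sketch below searches for the pattern that GraphFrames would express as `(a)-[e]->(b); (b)-[e2]->(a)`, meaning two vertices connected in both directions. It does not use GraphFrames: lists of dictionaries stand in for the vertex and edge DataFrames (keeping the "id", "src", and "dst" column names), and `find_mutual_edges` and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple

# Plain dictionaries stand in for rows of the vertex and edge DataFrames.
vertices: List[Dict[str, str]] = [
    {"id": "a", "name": "Alice"},
    {"id": "b", "name": "Bob"},
    {"id": "c", "name": "Carol"},
]
edges: List[Dict[str, str]] = [
    {"src": "a", "dst": "b", "relationship": "follows"},
    {"src": "b", "dst": "a", "relationship": "follows"},
    {"src": "b", "dst": "c", "relationship": "follows"},
]


def find_mutual_edges(edge_rows: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return every (x, y) where both x -> y and y -> x exist."""
    pairs = {(row["src"], row["dst"]) for row in edge_rows}
    return sorted((x, y) for x, y in pairs if (y, x) in pairs)


names = {vertex["id"]: vertex["name"] for vertex in vertices}
matches = find_mutual_edges(edges)
for x, y in matches:
    print(f"{names[x]} <-> {names[y]}")
```

Each match appears once per direction, because the pattern binds the two vertices in a fixed order.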

Q10. Select the built-in data sources from which Spark can extract data.

  • Parquet
  • JDBC
  • Microsoft Excel
  • Apache ORC

Data Engineering and Machine Learning using Spark Week 02 Quiz Answers

Graded Quiz: SparkML

Q1. Select the best definition of a machine learning system.

  • A machine learning system consists of already trained data models that predict results on previously unseen data.
  • A machine learning system trains data models and uses that information to calculate results on the known data.
  • A machine learning system consists of already trained data models that predict results on known data.
  • A machine learning system applies a specific machine learning algorithm to train data models. After training the model, the system infers or “predicts” results on previously unseen data.

Q2. Which of the following options are true about Spark ML built-in utilities?

  • Spark ML built-in utilities include a statistics package.
  • Spark ML utilities help during the intermediate steps of data processing, cleaning, and building models.
  • Spark ML built-in utilities include a linear algebra package.
  • Spark ML built-in utilities include the Feature module.

Q3. Select the statements that are true about Spark’s support for machine learning data sources.

  • Has standard libraries to support images and LIBSVM data types
  • Supports both feature vector and label column data
  • LIBSVM loads the “libsvm” data files and creates a DataFrame with two columns: the feature vector and the label.
  • Images are not a common data source

Q4. How do you perform supervised machine learning classification on Apache Spark?

  • The Spark ML library provides the spark.ml.classification library for classifications.
  • The Spark ML library provides the spark.classification library for classifications
  • The Spark ML library provides the spark.ml.regression library for regressions
  • The Spark ML library provides the spark.regression.library for regressions

Q5. Select the statements that are true for classification using Apache Spark.

  • Classification is a form of an implicit function approximation where the model predicts real valued outputs for a given input.
  • Classification examples include weather predictions, stock market price predictions, house value estimation, and others.
  • The Spark ML model predicts each object’s target category or “class.”
  • Producing a prediction from a discrete set of possible outcomes from the task is called classification.

Q6. Select the statements that are true about regression using Apache Spark ML.

  • The predicted value is usually a continuous real number, such as a float or integer
  • Examples of regression analysis include weather predictions, stock market price predictions, house value estimation, and others.
  • Examples of regression analysis include predicting a sports tournament winner, heads or tails on a coin toss, and classifying images with a pre-set number of distinct categories
  • Regression is a form of an implicit function approximation where the model predicts real valued outputs for a given input.
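
The contrast between the two tasks can be sketched in plain Python (not Spark ML): a regression model returns a real number, while a classifier returns one label from a fixed set. Both helpers, `fit_line` and `fit_nearest_mean`, and the sample data are hypothetical and chosen only to keep the example small.

```python
from typing import Callable, Dict, List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Least-squares line through (x, y) points; predicts a real value."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_mean(samples: List[Tuple[float, str]]) -> Callable[[float], str]:
    """Nearest-class-mean classifier; predicts one of the known labels."""
    grouped: Dict[str, List[float]] = {}
    for value, label in samples:
        grouped.setdefault(label, []).append(value)
    means = {label: sum(vals) / len(vals) for label, vals in grouped.items()}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


predict_price = fit_line([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
predict_size = fit_nearest_mean(
    [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]
)
print(predict_price(4.0))  # a continuous value
print(predict_size(3.0))   # a label from a discrete set
```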

Q7. Select the answers that correctly fill in the blank. Unsupervised learning _________.

  • Does not require explicit labels mapped to features
  • Requires explicit labels mapped to features
  • Automatically learns patterns and latent spaces in the data
  • Is a subset of machine learning algorithms

Q8. View the following code samples and place the code in the order needed to perform clustering using Spark ML.

#1 Perform predictions on test data

test_data = spark.read.format("libsvm").load("test_data.txt")

predictions = model.transform(test_data)

#2 Create a model and train it

kmeans = KMeans().setK(5)

model = kmeans.fit(data)

#3 Load data

data = spark.read.format("libsvm").load("data.txt")

  • #2, #3, #1
  • #1, #2, #3
  • #3, #1, #2
  • #3, #2, #1
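
The load, train, predict sequence can be tried without a Spark session using the plain-Python sketch below. `TinyKMeans` is a hypothetical one-dimensional stand-in that only mirrors the `fit` and `transform` steps of the snippets above; it is not the Spark ML `KMeans` class, and it uses deterministic initial centers so that the result is repeatable.

```python
from typing import List


class TinyKMeans:
    """One-dimensional k-means with deterministic initial centers."""

    def __init__(self, k: int, iterations: int = 10) -> None:
        self.k = k
        self.iterations = iterations
        self.centers: List[float] = []

    def _nearest(self, point: float) -> int:
        distances = [abs(point - center) for center in self.centers]
        return distances.index(min(distances))

    def fit(self, points: List[float]) -> "TinyKMeans":
        # Start from evenly spaced values of the sorted training data.
        ordered = sorted(points)
        step = max(1, len(ordered) // self.k)
        self.centers = ordered[::step][: self.k]
        for _ in range(self.iterations):
            clusters: List[List[float]] = [[] for _ in self.centers]
            for point in points:
                clusters[self._nearest(point)].append(point)
            # Move each center to the mean of its members, if it has any.
            self.centers = [
                sum(members) / len(members) if members else center
                for members, center in zip(clusters, self.centers)
            ]
        return self

    def transform(self, points: List[float]) -> List[int]:
        return [self._nearest(point) for point in points]


# Load data (in-memory values stand in for the libsvm files).
data = [1.0, 1.2, 0.8, 9.0, 9.4, 8.6]
test_data = [1.1, 8.9]

# Create a model and train it.
model = TinyKMeans(k=2).fit(data)

# Perform predictions on test data.
predictions = model.transform(test_data)
print(predictions)
```

The model cannot be trained before the data exists, and predictions cannot be made before the model exists, which fixes the order of the three steps.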

Q9. Select the answer that correctly fills in the blank. The Spark MLlib provides a clustering library located at _______________

  • (clustering.spark)
  • (spark.clustering)
  • (spark.ml.clustering)
  • (clustering.ml.spark)

Q10. Select the clustering algorithms for which Spark MLlib provides functions.

  • Gaussian Mixture Models
  • k-means
  • Early Dirichlet Allocation
  • Latent Dirichlet Allocation

Team Networking Funda

We are Team Networking Funda, a group of passionate authors and networking enthusiasts committed to sharing our expertise and experiences in networking and team building. With backgrounds in Data Science, Information Technology, Health, and Business Marketing, we bring diverse perspectives and insights to help you navigate the challenges and opportunities of professional networking and teamwork.
