Introduction to Big Data with Spark and Hadoop Quiz Answers

Get quiz answers for all weeks of Introduction to Big Data with Spark and Hadoop

Introduction to Big Data with Spark and Hadoop Week 01 Quiz Answers

Graded Quiz: Introduction to Big Data

Q1. Which of these statements describe big data? Check all that apply.

  • Big Data is relatively consistent and is stored in JSON or XML formats.
  • Big Data is mostly located in storage within Enterprises and Data Centers.
  • Big Data arrives continuously at enormous speed from multiple sources.
  • Data is generated in huge volumes and can be structured, semi-structured, or unstructured.

Q2. Select the options that illustrate the use of big data technologies.

  • Personal assistants including Google, Alexa, and Siri
  • Social media advertising
  • IoT Systems
  • A company’s customer email campaign with clients selected by a manager

Q3. Which of the following capabilities are quantifiable advantages of parallel processing?

  • Parallel processing of Big Data increases processing times compared to linear processing.
  • Parallel processing can process Big Data in a fraction of the time compared to linear processing.
  • You can add and remove execution nodes as and when required, significantly reducing infrastructure costs.
  • Parallel processing processes instructions simultaneously within the same execution nodes, keeping memory and processing requirements low even while processing large volumes of data.

Q4. Select the statement that identifies all the data types associated with Big Data.

  • Structured, semi-structured, and unstructured data are all associated with Big Data.
  • Unstructured data is not associated with Big Data.
  • Only unstructured data is associated with Big Data.
  • Semi-structured data is not associated with Big Data.

Q5. Select the option that includes all the different types of tools required for Big Data processing.

  • Analytics and visualization, business intelligence, cloud service providers, NoSQL databases, and programming languages and their tools.
  • Data technologies, analytics and visualization, NoSQL databases, and programming languages and their tools
  • Analytics and visualization, business intelligence, NoSQL databases, and programming languages and their tools
  • Data technologies, analytics and visualization, business intelligence, cloud service providers, NoSQL databases, and programming languages and their tools

Introduction to Big Data with Spark and Hadoop Week 02 Quiz Answers

Graded Quiz: Introduction to Hadoop

Q1. MapReduce has two tasks, map and reduce. Together, what do these two tasks do?

  • Creates a unique key
  • Aggregates and computes
  • Organizes data
  • Sorts data
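
For context, here is a minimal PySpark sketch (not part of the quiz) that mirrors the map and reduce pattern with RDD operations: the map step emits key-value pairs and the reduce step aggregates and computes the counts per key. The input path is hypothetical.

```python
from pyspark import SparkContext

sc = SparkContext(appName="wordcount-sketch")

# "Map" emits (word, 1) pairs for each word; "reduce" aggregates the counts per key.
counts = (
    sc.textFile("hdfs:///data/input.txt")      # hypothetical input path
      .flatMap(lambda line: line.split())      # map side: split lines into words
      .map(lambda word: (word, 1))             # map side: emit key-value pairs
      .reduceByKey(lambda a, b: a + b)         # reduce side: aggregate and compute
)

print(counts.take(5))
```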

Q2. Which external component is the first stage of operation in the Hadoop ecosystem?

  • Ingest data
  • Store data
  • Access data
  • Process and analyze

Q3. Nodes are single systems that are responsible for storing and processing data. Which node is known as a data node in HDFS?

  • Primary node
  • Secondary node
  • Storage node
  • Name node

Q4. What is the maximum data size Hive can handle?

  • Terabyte
  • Petabyte
  • Gigabyte
  • Unlimited

Q5. Which of the HBase components communicates directly with the client?

  • Region server
  • Zookeeper
  • HMaster
  • Region

Introduction to Big Data with Spark and Hadoop Week 03 Quiz Answers

Graded Quiz: Introduction to Apache Spark

Q1. Apache Spark uses distributed data processing and iterative analysis. Parallel and distributed computing are similar. What is the difference between these computing types?

  • Distributed computing utilizes each processor’s own memory
  • Parallel computing utilizes shared memory
  • Parallel computing works best for small data sets
  • Both computing types are exactly the same

Q2. Which of the following statements defines functional programming?

  • Imperative in nature
  • Shares functionality of JavaScript
  • Follows mathematical function format
  • Emphasizes “What” instead of “How-to”
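
To make the "what" versus "how-to" distinction concrete, here is a small plain-Python illustration (a toy example, not course code) contrasting an imperative loop with a functional, math-like expression.

```python
# Imperative style: spells out *how* to compute the result step by step.
total = 0
for x in [1, 2, 3, 4]:
    total += x * x

# Functional style: declares *what* the result is as composed functions,
# mirroring the mathematical form f(g(x)).
total_fn = sum(map(lambda x: x * x, [1, 2, 3, 4]))

assert total == total_fn
```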

Q3. The three Apache Spark components are data storage, compute interface, and cluster management framework. In which order does the data flow through these components?

  • Data flows from the compute interface to various nodes for distributed tasks, then goes to the Hadoop file system
  • Data flows from the Hadoop file system into the compute interface and then into different nodes for distributed tasks
  • Data flows from a Hadoop file system into different nodes for distributed tasks, but then flows to the APIs
  • Data flows from the API into different nodes for parallel tasks, and then into a Hadoop file system

Q4. DataFrames are comparable to tables in relational databases or a data frame in which programming languages?

  • Java
  • R/Python
  • RDDs
  • NoSQL
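
As a quick illustration (a sketch, not course material), a Spark DataFrame can be built from a local Python collection and behaves much like a data frame in R or a pandas DataFrame in Python:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dataframe-sketch").getOrCreate()

# A Spark DataFrame is a distributed table with named columns.
people = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45)],
    schema=["name", "age"],
)
people.printSchema()
people.show()
```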

Q5. There are three ways to create an RDD in Spark. Which method involves Hadoop-supported file systems like HDFS, Cassandra, or HBase?

  • Apply transformation on existing RDDs.
  • Create an external or local file.
  • Code a functional program.
  • Apply the parallelize function to an existing collection.
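
The three creation methods from this question can be sketched in PySpark as follows; the HDFS path is hypothetical and stands in for any Hadoop-supported source such as HDFS, Cassandra, or HBase.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-creation-sketch").getOrCreate()
sc = spark.sparkContext

# 1. Apply the parallelize function to an existing collection in the driver.
numbers = sc.parallelize([1, 2, 3, 4])

# 2. Reference an external dataset on a Hadoop-supported file system
#    (hypothetical path).
lines = sc.textFile("hdfs:///data/sample.txt")

# 3. Apply a transformation to an existing RDD.
squares = numbers.map(lambda x: x * x)

print(squares.collect())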

Introduction to Big Data with Spark and Hadoop Week 04 Quiz Answers

Graded Quiz: Introduction to DataFrames & SparkSQL

Q1. Select the statements that are true about a Directed Acyclic Graph, known as a DAG.

  • In Apache Spark, RDDs are represented by the vertices of a DAG, while the transformations and actions are represented by directed edges.
  • A DAG is a tabular data structure with rows and columns.
  • A DAG is a data structure with edges and vertices.
  • If a node goes down, Spark replicates the DAG and restores the node.
  • Every new edge of a DAG is obtained from an older vertex.
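
A minimal PySpark sketch of the idea: each RDD is a vertex, each transformation adds a directed edge to a new RDD, and the lineage (the DAG) can be inspected before an action runs it. This is illustrative only; the output of toDebugString() may appear as a bytes object depending on the PySpark version.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dag-sketch").getOrCreate()
sc = spark.sparkContext

words = sc.parallelize(["spark", "hadoop", "spark"])   # an RDD: a vertex in the DAG
pairs = words.map(lambda w: (w, 1))                    # transformation: a directed edge to a new RDD
counts = pairs.reduceByKey(lambda a, b: a + b)

# Show the lineage (the DAG Spark will execute) before running it.
print(counts.toDebugString())

counts.collect()  # the action materializes the DAG
```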

Q2. Which function would you apply to create a dataset from a sequence?

  • toDS()
  • seqDS()
  • Create()
  • DSRdd()

Q3. Which of the following features belong to Tungsten?

  • Manages memory explicitly and does not rely on the JVM object model or garbage collection.
  • Generates virtual function dispatches.
  • Places intermediate data in CPU registers.
  • Prohibits loop unrolling.

Q4. Select the answer that lists the order in which a typical data engineer would perform operations on Apache Spark while adhering to best practices.

  • Analyze, Read, Load, Transform, and Write
  • Read, Analyze, Load, Transform, and Write
  • Read, Analyze, Transform, Load and Write
  • Analyze, Read, Transform, Load and Write
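
A hedged PySpark sketch of that read, analyze, transform, and load/write sequence is shown below; the file paths and the amount column are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("etl-order-sketch").getOrCreate()

# Read
orders = spark.read.csv("hdfs:///data/orders.csv", header=True, inferSchema=True)  # hypothetical path

# Analyze
orders.printSchema()
orders.describe().show()

# Transform
large_orders = orders.filter(col("amount") > 100)  # hypothetical column

# Load / Write
large_orders.write.mode("overwrite").parquet("hdfs:///data/large_orders")
```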

Q5. Based on the lesson content, what data sources does Apache Spark SQL support natively?

  • Hive tables
  • Parquet files
  • JSON files
  • NoSQL databases
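
For reference, a minimal PySpark sketch of reading these natively supported sources (hypothetical paths and table name; Hive access assumes a configured metastore):

```python
from pyspark.sql import SparkSession

# enableHiveSupport() lets Spark SQL query existing Hive tables.
spark = (
    SparkSession.builder
    .appName("sparksql-sources-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

parquet_df = spark.read.parquet("hdfs:///data/events.parquet")  # hypothetical path
json_df = spark.read.json("hdfs:///data/events.json")           # hypothetical path
hive_df = spark.sql("SELECT * FROM sales")                      # hypothetical Hive table
```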

Introduction to Big Data with Spark and Hadoop Week 05 Quiz Answers

Graded Quiz 01: Spark Architecture

Q1. A Spark application has two processes, driver and executor process. Where can the driver process be run? Select all that apply.

  • Another machine as a client to the cluster
  • Driver
  • Cluster node
  • Executor

Q2. The Cluster Manager communicates with a cluster to acquire resources for an application. Which type of cluster manager is recommended for setting up simple clusters?

  • Kubernetes
  • Spark Standalone
  • Apache Mesos
  • Apache Hadoop YARN

Q3. The spark-submit script that is included with Spark to submit applications has several options you can use. Which option will show the available options per cluster manager?

  • `--deploy-mode`
  • `--executor-cores`
  • `./bin/spark-submit --help`
  • `--class <full-class-name>`

Graded Quiz 02: Spark Runtime Environments

Q1. What are some of the advantages of using Spark on IBM Cloud? Select all that apply.

  • Easy to configure local cluster nodes
  • Pre-existing default configuration
  • Better communication for local cluster nodes
  • Enterprise-grade security

Q2. Spark properties have precedence and are merged into a final configuration before running the application. Select the statement that describes the order of precedence.

  • Set configurations in the spark-defaults.conf file, set the spark-submit configuration, and lastly perform programmatic configuration.
  • Perform programmatic configuration, set configurations in the spark-defaults.conf file, and lastly set the spark-submit configuration.
  • Set the spark-submit configuration, set configurations in the spark-defaults.conf file, and lastly perform programmatic configuration.
  • Perform programmatic configuration, set the spark-submit configuration, and lastly set configurations in the spark-defaults.conf file.
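
As a rough illustration of the precedence described above, properties set programmatically on the session builder take effect over spark-submit flags, which in turn override spark-defaults.conf entries. The memory value below is hypothetical.

```python
from pyspark.sql import SparkSession

# Programmatic configuration has the highest precedence in the merged config.
spark = (
    SparkSession.builder
    .appName("config-precedence-sketch")
    .config("spark.executor.memory", "2g")   # hypothetical value
    .getOrCreate()
)

print(spark.conf.get("spark.executor.memory"))
```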

Q3. What is the first command to run when submitting a Spark application to a Kubernetes cluster?

  • `--deploy-mode client`
  • `spark.kubernetes`
  • Set the `--master` option to the Kubernetes API server and port
  • `--conf spark.kubernetes.driver.pod.name`

Introduction to Big Data with Spark and Hadoop Week 06 Quiz Answers

Graded Quiz: Introduction to Monitoring & Tuning

Q1. Select the option that includes the available tabs within the Apache Spark User Interface.

  • Jobs, Stages, Storage, Environment, and SQL
  • Jobs, Stages, Storage, Executor, and SQL
  • Jobs, Stages, Storage, Environment, Executor, and SQL
  • Jobs, Storage, Environment, Executor, and SQL

Q2. Which action triggers job creation and schedules the tasks?

  • The schedule() action
  • The collect() action
  • The create() action
  • The jobs() action
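
A small PySpark sketch of this behavior (illustrative, not course code): transformations are lazy and only record lineage, while an action such as collect() triggers job creation and task scheduling.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("action-sketch").getOrCreate()

rdd = spark.sparkContext.parallelize(range(1000))

# Transformation: lazy, only records lineage; no job is submitted yet.
doubled = rdd.map(lambda x: x * 2)

# Action: triggers job creation, schedules the tasks, and returns results
# to the driver.
results = doubled.collect()
```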

Q3. Syntax, serialization, data validation, and other user errors can occur when running Spark applications. Review the numbered list and select the option that places the steps in the order in which Spark handles application errors.

  1. View the driver event log to locate the cause of an application failure.
  2. If all attempts to run the task fail, Spark reports an error to the driver and the application is terminated.
  3. If a task fails due to an error, Spark can attempt to rerun the task for a set number of retries.
  • 2,1,3
  • 3,2,1
  • 3,1,2
  • 1,2,3

Q4. Select an option to fill in the blank. If a DataFrame is not cached, then different random features would be generated with each action on the DataFrame, because the function _____ is called each time.

  • `cache()`
  • `regenerate()`
  • `random()`
  • `rand()`
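
A hedged sketch of the idea behind this question: without caching, a column built with rand() may be recomputed on each action and can differ between actions, whereas cache() materializes the rows so repeated actions agree.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import rand

spark = SparkSession.builder.appName("rand-cache-sketch").getOrCreate()

# Without cache(), the random column may be recomputed for each action,
# so repeated actions can see different values.
df = spark.range(5).withColumn("feature", rand())
df.show()
df.show()

# cache() keeps the computed rows in memory, so repeated actions agree.
df.cache()
df.show()
df.show()
```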

Q5. Which command specifies the number of executor cores for a Spark standalone cluster per executor process?

  • Use the command `--executor-process-cores` followed by the number of cores.
  • Use the command `--process-executor-cores` followed by the number of cores.
  • Use the command `--per-executor-cores` followed by the number of cores.
  • Use the command `--executor-cores` followed by the number of cores.
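
As a side note (an equivalence I am assuming for illustration, not part of the quiz), the `--executor-cores` flag corresponds to the spark.executor.cores property, which can also be set programmatically; the value below is hypothetical.

```python
from pyspark.sql import SparkSession

# spark.executor.cores controls the number of cores per executor process.
spark = (
    SparkSession.builder
    .appName("executor-cores-sketch")
    .config("spark.executor.cores", "4")   # hypothetical value
    .getOrCreate()
)
```
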
Get all Course Quiz Answers for the IBM Data Engineering Professional Certificate

Introduction to Data Engineering Coursera Quiz Answers

Python for Data Science, AI & Development Coursera Quiz Answers

Introduction to Relational Databases (RDBMS) Quiz Answers

Databases and SQL for Data Science with Python Quiz Answers
