
Hadoop Platform and Application Framework Quiz Answers

All Weeks Hadoop Platform and Application Framework Quiz Answers

Hadoop Platform and Application Framework Week 01 Quiz Answers

Basic Hadoop Stack Quiz Answers

Q1. What does SQOOP stand for?

  • System Quality Object Oriented Process
  • SQL to Hadoop
  • Does not stand for anything specific
  • ‘Sqooping’ the data.

Q2. What is not part of the basic Hadoop Stack ‘Zoo’?

  • Pig
  • Horse
  • Elephant
  • Hive

Q3. What is considered to be part of the Apache Basic Hadoop Modules?

  • HDFS
  • YARN
  • MapReduce
  • Impala

Q4. What are the two major components of the MapReduce layer?

  • TaskManager
  • JobTracker
  • NameNode
  • DataNode

Q5. What does HDFS stand for?

  • Hadoop Data File System
  • Hadoop Distributed File System
  • Hadoop Data File Scalability
  • Hadoop Datanode File Security

Q6. What are the two major types of nodes in HDFS?

  • DataNode
  • BlockNode
  • NameNode
  • RackNode
  • MetaNode

Q7. What is YARN used as an alternative to in Hadoop 2.0 and higher versions of Hadoop?

  • Pig
  • Hive
  • ZooKeeper
  • MapReduce
  • HDFS

Q8. Could you run an existing MapReduce application using YARN?

  • No
  • Yes

Q9. What are the two basic layers comprising the Hadoop Architecture?

  • ZooKeeper and MapReduce
  • HDFS and Hive
  • MapReduce and HDFS
  • Impala and HDFS

Q10. What are Hadoop's advantages over a traditional platform?

  • Scalability
  • Reliability
  • Flexibility
  • Cost

Hadoop Platform and Application Framework Week 02 Quiz Answers

Overview of Hadoop Stack Quiz Answers

Q1. Choose features introduced in Hadoop2 HDFS

  • Multiple DataNodes
  • Heterogeneous storage, including SSD and RAM_DISK
  • Multiple namespaces
  • MapReduce
  • HDFS Federation

Q2. In Hadoop2 HDFS, a namespace can generate block IDs for new blocks without coordinating with other namespaces.

  • True
  • False

Q3. This is a new feature in YARN:

  • High Availability ResourceManager
  • web services REST APIs
  • MapReduce
  • ApplicationMasters

Q4. Apache Tez can run independently of YARN.

  • True
  • False

Q5. In Hadoop2 with YARN

  • ResourceManagers are running on every compute node
  • Each application has its own ApplicationMaster
  • Only MapReduce jobs can be run
  • Each application has its own ResourceManager

Hadoop Execution Environment Quiz Answers

Q1. Apache Spark cannot operate without YARN.

  • False

Q2. Apache Tez can support dynamic DAG changes.

  • True

Q3. Give an example of an execution framework that supports cyclic data flow.

  • Spark

Q4. The Fairshare scheduler can support queues/sub-queues.

  • True

Q5. The Capacity Scheduler can use ACLs to control security.

  • True

Q6. Mark choices that apply for Apache Spark:

  • Can run integrated with YARN
  • Supports in memory computing
  • Can be accessed/used from high-level languages like Java, Scala, Python, and R.

Q7. Which of the following choices apply for Apache Tez?

  • Supports complex directed acyclic graph (DAG) of tasks
  • Supports in memory caching of data
  • Improves resource usage efficiency

Hadoop Applications Quiz Answers

Q1. Check all database/store applications that can run within Hadoop

  • HBase
  • Cassandra

Q2. Name the high-level language that is a main part of Apache Pig.

  • Pig Latin

Q3. Apache Pig can only be run using scripts

  • False

Q4. Check options that are methods of using/accessing Hive.

  • HCatalog
  • Beeline
  • WebHCat

Q5. Check features that apply for HBase.

  • Non-relational distributed database
  • Consistency
  • Compression

Q6. List methods of accessing HBase

  • Apache HBase shell
  • HBase External API
  • HBase API

Hadoop Platform and Application Framework Week 03 Quiz Answers

HDFS Architecture Quiz Answers

Q1. HDFS is strictly POSIX compliant.

  • False

Q2. Which of the following issues may be caused by a lot of small files in HDFS?

  • NameNode memory usage increases significantly
  • Network load decreases
  • The number of map tasks needed to process the same amount of data will be larger.

Q3. Approximately how many 128 MB blocks are needed to store a 10 GB file?

  • 10 GB / 128 MB ≈ 80 blocks

Q5. What is the first step in a write process from an HDFS client?

  • Immediately contact the NameNode

Q6. HDFS NameNode is not rack aware when it places the replica blocks.

  • False

HDFS Performance, Tuning, and Robustness Quiz Answers

Q1. Name the configuration file which holds HDFS tuning parameters

  • hdfs-site.xml

Q2. Name the parameter that controls the replication factor in HDFS:

  • dfs.replication
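
For illustration, a minimal hdfs-site.xml fragment setting this parameter (the value 3 is the stock default, shown only as an example):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>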

Q3. Check answers that apply when replication is lowered

  • HDFS is less robust
  • Less likely to make data local to more workers
  • More space available

Q4. Check answers that apply when the NameNode fails to receive a heartbeat from a DataNode

  • DataNode is marked dead
  • No new I/O is sent to the DataNode that missed the heartbeat check
  • Blocks below replication factor are re-replicated on other DataNodes

Q5. How is data corruption mitigated in HDFS?

  • Checksums are computed on file creation and stored in the HDFS namespace, for verification when data is retrieved.

Accessing HDFS Quiz Answers

Q1. Which of the following are valid access mechanisms for HDFS?

  • Can be accessed via hdfs binary/script
  • Accessed via Java API
  • Accessed via HTTP
  • Mounted as a filesystem using NFS Gateway

Q2. Which of the following is not a valid command to handle data in HDFS?

  • hdfs dfs -mkdir /user/test
  • hdfs dfs -ls /
  • cp -r /user/data /user/test/
  • hdfs fsck /user/test/test.out
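
For reference, the third option is a plain local-shell cp and therefore the invalid one; the HDFS counterpart of that copy would use the dfs -cp subcommand:

hdfs dfs -cp /user/data /user/test/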

Q3. Which of the following commands will give information on the status of DataNodes?

  • hdfs dfs -status datanodes
  • hdfs -status
  • hdfs datanode -status
  • hdfs dfsadmin -report

Q4. Which of the following is not a method in FSDataInputStream?

  • read
  • readFully
  • hflush
  • getPos

Q5. You can only read data in HDFS via HTTP

  • True
  • False

Q6. What are some WebHDFS REST API-related parameters in HDFS?

  • dfs.webhdfs.enabled
  • dfs.blocksize
  • dfs.replication
  • dfs.web.authentication.kerberos.keytab
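
As a sketch of the HTTP access path these parameters enable, a WebHDFS file-status request looks like this (host and port are placeholders, and dfs.webhdfs.enabled must be set to true):

curl -i "http://<namenode-host>:50070/webhdfs/v1/user/test/test.out?op=GETFILESTATUS"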

Hadoop Platform and Application Framework Week 04 Quiz Answers

Lesson 1 Review Quiz Answers

Q1. Which of these kinds of data motivated the Map/Reduce framework?

  • Large number of internet documents that need to be indexed for searching by words

Q2. What is the organizing data structure for map/reduce programs?

  • A list of identification keys and some value associated with that identifier

Q3. In the map/reduce framework, which of these logistics does Map/Reduce handle for the map function?

  • Distribute map to cluster nodes and run map on the data partitions at the same time

Q4. Map/Reduce performs a ‘shuffle’ and grouping. That means it…

  • Shuffles pairs into different partitions according to the key value, and sorts within the partitions by key.
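
A sketch of the rule behind that partitioning step (hash partitioning is the usual default; the partitioner is pluggable in practice):

def partition_for(key, num_reducers):
    # Equal keys always hash to the same value, so all pairs
    # sharing a key land in the same reducer's partition.
    return hash(key) % num_reducers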

Q5. In the word count example, what is the key?

  • The word itself.

Q6. Streaming map/reduce allows mappers and reducers to be written in what languages?

  • All of the above

Q7. The assignment asked you to run with 2 reducers. When you use 2 reducers instead of 1 reducer, what is the difference in global sort order?

  • With 1 reducer, but not 2 reducers, the word counts are in global sort order by word.
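
Since streaming only requires reading stdin and writing stdout, a word-count job can be sketched in Python as follows (file names and the tab-separated record format follow the usual streaming conventions; this is an illustrative sketch, not the course's reference solution):

#!/usr/bin/env python
# mapper.py -- emit one "word<TAB>1" pair per word on stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))

#!/usr/bin/env python
# reducer.py -- input arrives sorted by key, so counts can be
# accumulated until the word changes
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.strip().split("\t")
    if word != current_word:
        if current_word is not None:
            print("%s\t%d" % (current_word, count))
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print("%s\t%d" % (current_word, count))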

Hadoop Platform and Application Framework Week 05 Quiz Answers

Spark Lesson 1 Quiz Answers

Q1. Apache Spark was developed in order to provide solutions to shortcomings of another project, and eventually replace it. What is the name of this project?

  • MapReduce

Q2. Why is Hadoop MapReduce slow for iterative algorithms?

  • It needs to read off disk for every iteration

Q3. What is the most important feature of Apache Spark to speedup iterative algorithms?

  • Caching datasets in memory

Q4. Which other Hadoop project can Spark rely on to provision and manage the cluster of nodes?

  • YARN

Q5. When Spark reads data out of HDFS, what is the process that interfaces directly with HDFS?

  • Executor

Q6. Under which circumstances is it preferable to run Spark in Standalone mode instead of relying on YARN?

  • When you only plan on running Spark jobs

Spark Lesson 2 Quiz Answers

Q1. How can you create an RDD? Mark all that apply

  • Reading from a local file available both on the driver and on the workers
  • Reading from HDFS
  • Applying a transformation to an existing RDD
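
A minimal PySpark sketch of these creation routes (the SparkContext name sc and all paths are illustrative):

# From a Python collection held on the driver
rdd1 = sc.parallelize([1, 2, 3, 4])

# From a file; an hdfs:// path is read from HDFS, while a file://
# path must exist on the driver and on every worker
rdd2 = sc.textFile("hdfs:///user/test/input.txt")

# From a transformation applied to an existing RDD
rdd3 = rdd1.map(lambda x: x * 2)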

Q2. How does Spark make RDDs resilient in case a partition is lost?

  • Tracks the history of each partition and reruns what is needed to restore it

Q3. Which of the following sentences about flatMap and map are true?

  • flatMap accepts a function that returns multiple elements, those elements are then flattened out into a continuous RDD.
  • map transforms elements with a 1 to 1 relationship, 1 input – 1 output
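
A small PySpark illustration of the difference (results shown as comments):

lines = sc.parallelize(["hello world", "foo"])

# map is 1 input -> 1 output, so the result is an RDD of lists
lines.map(lambda line: line.split()).collect()
# [['hello', 'world'], ['foo']]

# flatMap may emit several elements per input and flattens them
# into one continuous RDD
lines.flatMap(lambda line: line.split()).collect()
# ['hello', 'world', 'foo']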

Q4. Check all wide transformations

  • Repartition, even if it triggers a shuffle, can improve performance of your pipeline by balancing the data distribution after a heavy filtering operation

Spark Lesson 3 Quiz Answers

Q1. Check all true statements about the Directed Acyclic Graph Scheduler

  • The DAG is managed by the cluster manager
  • A DAG is used to track dependencies of each partition of each RDD

Q2. Why is building a DAG necessary in Spark but not in MapReduce?

  • Because MapReduce always has the same type of workflow, Spark needs to accommodate diverse workflows.

Q3. What are the differences between an action and a transformation? Mark all that apply

  • A transformation is from worker nodes to worker nodes, an action between worker nodes and the Driver (or a data source like HDFS)
  • A transformation is lazy, an action instead executes immediately.

Q4. Generally, which are good stages to mark an RDD for caching in memory?

  • The first RDD, just after reading from disk, so we avoid reading from disk again.
  • At the start of an iterative algorithm.
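
Both cases in a short PySpark sketch (the path and the loop are illustrative):

# Mark the input RDD for caching right after the expensive read...
data = sc.textFile("hdfs:///user/test/input.txt").cache()

# ...so every pass of the iterative computation reuses the
# in-memory copy instead of re-reading from HDFS
for i in range(10):
    n = data.filter(lambda line: str(i) in line).count()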

Q5. What are good cases for using a broadcast variable? Mark all that apply

  • Copy a small/medium sized RDD for a join
  • Copy a large lookup table to all worker nodes
  • Copy a large configuration dictionary to all worker nodes
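
A minimal PySpark broadcast sketch (the small dict stands in for a large lookup table):

# One read-only copy of the table is shipped to each worker,
# instead of being re-serialized with every task
lookup = sc.broadcast({"a": 1, "b": 2})

sc.parallelize(["a", "b", "a"]).map(lambda k: lookup.value[k]).collect()
# [1, 2, 1]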

Q6. We would like to count the number of invalid entries in this example dataset:

invalid = sc.accumulator(0)
# foreach is an action that returns nothing, so its result is not assigned
sc.parallelize(["3", "23", "S", "99", "TT"]).foreach(count_invalid)

What would be a good implementation of the count_invalid function?

def count_invalid(element):
    try:
        int(element)
    except ValueError:
        # element cannot be parsed as an integer: count it as invalid
        invalid.add(1)
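
For completeness, the accumulated total is read back on the driver once the foreach action has finished (a usage sketch, assuming the snippet above was run):

print(invalid.value)  # 2 -- "S" and "TT" cannot be parsed as ints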
