Hadoop Platform and Application Framework Quiz Answers

Hadoop Platform and Application Framework Week 01 Quiz Answers

Basic Hadoop Stack Quiz Answers

Q1. What does SQOOP stand for?

  • System Quality Object Oriented Process
  • SQL to Hadoop
  • Does not stand for anything specific
  • ‘Sqooping’ the data.

Q2. What is not part of the basic Hadoop Stack ‘Zoo’?

  • Pig
  • Horse
  • Elephant
  • Hive

Q3. What is considered to be part of the Apache Basic Hadoop Modules?

  • HDFS
  • YARN
  • MapReduce
  • Impala

Q4. What are the two major components of the MapReduce layer?

  • TaskManager
  • JobTracker
  • NameNode
  • DataNode

Q5. What does HDFS stand for?

  • Hadoop Data File System
  • Hadoop Distributed File System
  • Hadoop Data File Scalability
  • Hadoop Datanode File Security

Q6. What are the two major types of nodes in HDFS?

  • DataNode
  • BlockNode
  • NameNode
  • RackNode
  • MetaNode

Q7. In Hadoop 2.0 and later versions, what is YARN used as an alternative to?

  • Pig
  • Hive
  • ZooKeeper
  • MapReduce
  • HDFS

Q8. Can you run an existing MapReduce application using YARN?

  • No
  • Yes

Q9. What are the two basic layers that make up the Hadoop architecture?

  • ZooKeeper and MapReduce
  • HDFS and Hive
  • MapReduce and HDFS
  • Impala and HDFS

Q10. What are Hadoop's advantages over a traditional platform?

  • Scalability
  • Reliability
  • Flexibility
  • Cost

Hadoop Platform and Application Framework Week 02 Quiz Answers

Overview of Hadoop Stack Quiz Answers

Q1. Choose the features introduced in Hadoop 2 HDFS:

  • Multiple DataNodes
  • Heterogeneous storage, including SSD and RAM_DISK
  • Multiple namespaces
  • MapReduce
  • HDFS Federation

Q2. In Hadoop 2 HDFS, a namespace can generate block IDs for new blocks without coordinating with the other namespaces.

  • True
  • False

Q3. Which of these is a new feature in YARN?

  • High Availability ResourceManager
  • web services REST APIs
  • MapReduce
  • ApplicationMasters

Q4. Apache Tez can run independently of YARN.

  • True
  • False

Q5. In Hadoop 2 with YARN:

  • ResourceManagers are running on every compute node
  • Each application has its own ApplicationMaster
  • Only MapReduce jobs can be run
  • Each application has its own ResourceManager

Hadoop Execution Environment Quiz Answers

Q1. Apache Spark cannot operate without YARN.

  • False

Q2. Apache Tez can support dynamic DAG changes.

  • True

Q3. Give an example of an execution framework that supports cyclic data flow.

  • Spark

Q4. The Fairshare scheduler can support queues/sub-queues.

  • True

Q5. The Capacity Scheduler can use ACLs to control security.

  • True

Q6. Mark choices that apply for Apache Spark:

  • Can run integrated with YARN
  • Supports in-memory computing
  • Can be accessed/used from high-level languages like Java, Scala, Python, and R.

Q7. Which of the following choices apply for Apache Tez?

  • Supports complex directed acyclic graph (DAG) of tasks
  • Supports in-memory caching of data
  • Improves resource usage efficiency

Hadoop Applications Quiz Answers

Q1. Check all database/store applications that can run within Hadoop:

  • HBase
  • Cassandra

Q2. Name the high-level language that is a main part of Apache Pig.

  • Pig Latin

Q3. Apache Pig can only be run using scripts

  • False

Q4. Check options that are methods of using/accessing Hive.

  • HCatalog
  • Beeline
  • WebHCat

Q5. Check features that apply for HBase.

  • Non-relational distributed database
  • Consistency
  • Compression

Q6. List methods of accessing HBase

  • Apache HBase shell
  • HBase External API
  • HBase API

Hadoop Platform and Application Framework Week 03 Quiz Answers

HDFS Architecture Quiz Answers

Q1. HDFS is strictly POSIX compliant.

  • False

Q2. The following issues may be caused by a lot of small files in HDFS:

  • NameNode memory usage increases significantly
  • Network load increases
  • The number of map tasks needed to process the same amount of data will be larger.

Q3. Approximately how many blocks are needed to store a 10 GB file in HDFS with a block size of 128 MB?

  • 10 GB / 128 MB ≈ 80 blocks

Q4. How much raw HDFS storage does a 1.4 GB file consume with a replication factor of 3?

  • 4.20 GB
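
Both answers come down to simple arithmetic, checked in the short Python sketch below (the Q4 figure assumes the default replication factor of 3):

import math

# Q3: blocks needed for a 10 GB file with a 128 MB block size
file_size_mb = 10 * 1024                         # 10 GB expressed in MB
block_size_mb = 128
print(math.ceil(file_size_mb / block_size_mb))   # 80

# Q4: a 1.4 GB file replicated 3 times consumes 1.4 * 3 = 4.2 GB of raw storage
print(1.4 * 3)                                   # 4.2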

Q5. What is the first step in a write process from an HDFS client?

  • Immediately contact the NameNode

Q6. The HDFS NameNode is not rack-aware when it places replica blocks.

  • False

HDFS Performance, Tuning, and Robustness Quiz Answers

Q1. Name the configuration file which holds HDFS tuning parameters

  • hdfs-site.xml

Q2. Name the parameter that controls the replication factor in HDFS:

  • dfs.replication

Q3. Check the answers that apply when replication is lowered:

  • HDFS is less robust
  • Data is less likely to be local to the workers processing it
  • More usable space is available

Q4. Check the answers that apply when the NameNode fails to receive a heartbeat from a DataNode:

  • The DataNode is marked dead
  • No new I/O is sent to the DataNode that missed the heartbeat check
  • Blocks that fall below their replication factor are re-replicated on other DataNodes

Q5. How is data corruption mitigated in HDFS?

  • Checksums are computed on file creation and stored in the HDFS namespace; they are used to verify the data when it is retrieved.

Accessing HDFS Quiz Answers

Q1. Which of the following are valid access mechanisms for HDFS?

  • Can be accessed via hdfs binary/script
  • Accessed via Java API
  • Accessed via HTTP
  • Mounted as a filesystem using NFS Gateway

Q2. Which of the following is not a valid command to handle data in HDFS?

  • hdfs dfs -mkdir /user/test
  • hdfs dfs -ls /
  • cp -r /user/data /user/test/
  • hdfs fsck /user/test/test.out

Q3. Which of the following commands will give information on the status of DataNodes?

  • hdfs dfs -status datanodes
  • hdfs -status
  • hdfs datanode -status
  • hdfs dfsadmin -report

Q4. Which of the following is not a method in FSDataInputStream?

  • read
  • readFully
  • hflush
  • getPos

Q5. You can only read data in HDFS via HTTP

  • True
  • False

Q6. What are some WebHDFS REST API-related parameters in HDFS?

  • dfs.webhdfs.enabled
  • dfs.blocksize
  • dfs.replication
  • dfs.web.authentication.kerberos.keytab
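
WebHDFS is the HTTP access path referred to in Q5 and Q6, and it supports writes as well as reads, which is why Q5 is False. Below is a minimal read sketch in Python; it assumes dfs.webhdfs.enabled is true, a placeholder NameNode hostname, and the Hadoop 2 default HTTP port:

import requests  # third-party HTTP client (pip install requests)

# OPEN operation of the WebHDFS REST API; the NameNode answers with a
# redirect to a DataNode, which requests follows automatically.
url = "http://namenode.example.com:50070/webhdfs/v1/user/test/test.out"
resp = requests.get(url, params={"op": "OPEN"})
resp.raise_for_status()
print(resp.text)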

Hadoop Platform and Application Framework Week 04 Quiz Answers

Lesson 1 Review Quiz Answers

Q1. Which of these kinds of data motivated the Map/Reduce framework?

  • Large number of internet documents that need to be indexed for searching by words

Q2. What is the organizing data structure for map/reduce programs?

  • A list of identification keys and some value associated with that identifier

Q3. In map/reduce framework, which of these logistics does Map/Reduce do with the map function?

  • Distribute map to cluster nodes, run map on the data partitions at the same time

Q4. Map/Reduce performs a ‘shuffle’ and grouping. That means it…

  • Shuffles pairs into different partitions according to the key value, and sorts within the partitions by key.

Q5. In the word count example, what is the key?

  • The word itself.

Q6. Streaming map/reduce allows mappers and reducers to be written in which languages?

  • All of the above

Q7. The assignment asked you to run with 2 reducers. When you use 2 reducers instead of 1 reducer, what is the difference in global sort order?

  • With 1 reducer, but not 2 reducers, the word counts are in global sort order by word.
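
To make Q4-Q7 concrete, here is a minimal Hadoop Streaming word-count pair in Python (a sketch; Streaming only requires executables that read stdin and write tab-separated key/value pairs to stdout):

# mapper.py -- emit "word<TAB>1" for every word in the input
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(word + "\t1")

# reducer.py -- after the shuffle, input arrives sorted by key, so the
# counts for each word can be summed in a single pass
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word != current_word and current_word is not None:
        print(current_word + "\t" + str(current_count))
        current_count = 0
    current_word = word
    current_count += int(count)
if current_word is not None:
    print(current_word + "\t" + str(current_count))

With one reducer, every key passes through a single sorted stream, which is why the one-reducer output in Q7 is in global sort order; with two reducers, each output file is sorted only within itself.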

Hadoop Platform and Application Framework Week 05 Quiz Answers

Spark Lesson 1 Quiz Answers

Q1. Apache Spark was developed in order to provide solutions to shortcomings of another project, and eventually replace it. What is the name of this project?

  • MapReduce

Q2. Why is Hadoop MapReduce slow for iterative algorithms?

  • It needs to read from disk on every iteration

Q3. What is the most important feature of Apache Spark to speed up iterative algorithms?

  • Caching datasets in memory
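
A brief PySpark sketch of that speedup; the HDFS path and the iteration count are illustrative placeholders:

from pyspark import SparkContext

sc = SparkContext(appName="iterative-cache-demo")

# Parse once and keep the result in memory; without cache(), every
# iteration below would re-read and re-parse the file from disk.
points = (sc.textFile("hdfs:///user/test/points.txt")
            .map(lambda line: [float(x) for x in line.split(",")])
            .cache())

total = 0.0
for _ in range(10):               # each pass reuses the in-memory partitions
    total += points.map(sum).sum()
print(total)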

Q4. Which other Hadoop project can Spark rely on to provision and manage the cluster of nodes?

  • YARN

Q5. When Spark reads data out of HDFS, what is the process that interfaces directly with HDFS?

  • Executor

Q6. Under which circumstances is it preferable to run Spark in Standalone mode instead of relying on YARN?

  • When you only plan on running Spark jobs

Spark Lesson 2 Quiz Answers

Q1. How can you create an RDD? Mark all that apply

  • Reading from a local file available both on the driver and on the workers
  • Reading from HDFS
  • Apply a transformation to an existing RDD
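
In PySpark the three creation paths listed above look roughly like this (file paths are placeholders):

from pyspark import SparkContext

sc = SparkContext(appName="rdd-creation-demo")

local_rdd = sc.textFile("file:///tmp/input.txt")           # local file visible to driver and workers
hdfs_rdd = sc.textFile("hdfs:///user/test/input.txt")      # file stored in HDFS
derived_rdd = hdfs_rdd.filter(lambda line: len(line) > 0)  # transformation of an existing RDD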

Q2. How does Spark make RDDs resilient in case a partition is lost?

  • Spark tracks the history (lineage) of each partition and reruns what is needed to restore it

Q3. Which of the following sentences about flatMap and map are true?

  • flatMap accepts a function that returns multiple elements; those elements are then flattened into a single continuous RDD.
  • map transforms elements with a 1-to-1 relationship: 1 input, 1 output
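
A minimal sketch of the difference:

from pyspark import SparkContext

sc = SparkContext(appName="map-vs-flatmap-demo")
lines = sc.parallelize(["a b", "c"])

print(lines.map(lambda l: l.split()).collect())      # [['a', 'b'], ['c']] -- one output per input
print(lines.flatMap(lambda l: l.split()).collect())  # ['a', 'b', 'c']     -- outputs flattened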

Q4. Check all true statements about wide transformations:

  • Repartition, even if it triggers a shuffle, can improve performance of your pipeline by balancing the data distribution after a heavy filtering operation

Spark Lesson 3 Quiz Answers

Q1. Check all true statements about the Directed Acyclic Graph Scheduler

  • The DAG is managed by the DAG scheduler in the Spark driver
  • A DAG is used to track dependencies of each partition of each RDD

Q2. Why is building a DAG necessary in Spark but not in MapReduce?

  • Because MapReduce always has the same type of workflow, while Spark needs to accommodate diverse workflows.

Q3. What are the differences between an action and a transformation? Mark all that apply

  • A transformation moves data from worker nodes to worker nodes; an action moves data between the worker nodes and the Driver (or a data source like HDFS)
  • A transformation is lazy; an action executes immediately.
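
The laziness half of the answer is easy to demonstrate; in this sketch nothing is computed until the action runs:

from pyspark import SparkContext

sc = SparkContext(appName="lazy-eval-demo")

rdd = sc.parallelize(range(1000))
doubled = rdd.map(lambda x: x * 2)   # transformation: only recorded in the DAG
result = doubled.collect()           # action: triggers execution and returns data to the driver
print(result[:5])                    # [0, 2, 4, 6, 8]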

Q4. Generally, which are good stages to mark an RDD for caching in memory?

  • The first RDD, just after reading from disk, so we avoid reading from disk again.
  • At the start of an iterative algorithm.

Q5. What are good cases for using a broadcast variable? Mark all that apply

  • Copy a small/medium-sized RDD for a join
  • Copy a large lookup table to all worker nodes
  • Copy a large configuration dictionary to all worker nodes
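
A small sketch of a broadcast variable standing in for the lookup-table cases above (the table itself is a toy example):

from pyspark import SparkContext

sc = SparkContext(appName="broadcast-demo")

# Shipped once per worker instead of once per task
states = sc.broadcast({"NY": "New York", "CA": "California"})

codes = sc.parallelize(["NY", "CA", "NY"])
print(codes.map(lambda c: states.value.get(c, "unknown")).collect())
# ['New York', 'California', 'New York']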

Q6. We would like to count the number of invalid entries in this example dataset:

invalid = sc.accumulator(0)   # counter shared across the cluster; tasks may only add to it
sc.parallelize(["3", "23", "S", "99", "TT"]).foreach(count_invalid)

What would be a good implementation of the count_invalid function?

def count_invalid(element):
    # Count elements that cannot be parsed as integers.
    try:
        int(element)
    except ValueError:
        invalid.add(1)
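
After the foreach action has run, the driver reads the total from the accumulator; in this dataset the two non-numeric entries ("S" and "TT") fail the int() conversion:

print(invalid.value)   # 2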

