Introduction to Big Data Quiz Answers – 100% Correct Answers

Quiz 1: Why Big Data and Where Did it Come From

Q1. Which of the following is an example of big data utilized in action today?

  • The Internet
  • Social Media
  • Wi-Fi Networks
  • Individual, Unconnected Hospital Database

Q2. What reasoning was given for the following: why is the “data storage to price ratio” relevant to big data?

  • It isn’t, it was just an arbitrary example on big data usage.
  • Larger storage means easier accessibility to big data for every user because it allows users to download in bulk.
  • Companies can’t afford to own, maintain, and spend the energy to support large data storage unless the cost is sufficiently low.
  • Access to larger storage becomes easier for everyone, which means client-facing services require very large data storage.

Q3. What is the best description of personalized marketing enabled by big data?

  • Being able to use the data from each customer for marketing needs.
  • Marketing to each customer on an individual level and suiting to their needs.
  • Being able to obtain and use customer information for specific groups and utilize them for marketing needs.

Q4. Of the following, which are some examples of personalized marketing related to big data?

  • Facebook revealing posts that cater towards similar interests.
  • A survey that asks your age and markets to you a specific brand.
  • News outlets gathering information from the internet in order to report it to the public.

Q5. What is the workflow for working with big data?

  • Theory -> Models -> Precise Advice
  • Big Data -> Better Models -> Higher Precision
  • Extrapolation -> Understanding -> Reproducing

Q6. Which is the most compelling reason why mobile advertising is related to big data?

  • Mobile advertising in and of itself is always associated with big data.
  • Mobile advertising benefits from data integration with location which requires big data.
  • Mobile advertising allows massive cellular/mobile texting to a wide audience, thus providing large amounts of data.
  • Since almost everyone owns a cell/mobile phone, the mobile advertising market is large and thus requires big data to contain all the information.

Q7. What are the three types of diverse data sources?

  • Machine Data, Map Data, and Social Media
  • Information Networks, Map Data, and People
  • Machine Data, Organizational Data, and People
  • Sensor Data, Organizational Data, and Social Media

Q8. What is an example of machine data?

  • Social Media
  • Weather station sensor output.
  • Sorted data from Amazon regarding customer info.

Q9. What is an example of organizational data?

  • Satellite Data
  • Social Media
  • Disease data from the Centers for Disease Control.

Q10. Of the three data sources, which is the hardest to implement and streamline into a model?

  • People
  • Machine Data
  • Organizational Data

Q11. Which of the following summarizes the process of using data streams?

  • Theory -> Models -> Precise Advice
  • Integration -> Personalization -> Precision
  • Big Data -> Better Models -> Higher Precision
  • Extrapolation -> Understanding -> Reproducing

Q12. Where does the real value of big data often come from?

  • Size of the data.
  • Combining streams of data and analyzing them for new insights.
  • Using the three major data sources: Machines, People, and Organizations.
  • Having data-enabled decisions and actions from the insights of new data.

Q13. What does it mean for a device to be “smart”?

  • Must have a way to interact with the user.
  • Connect with other devices and have knowledge of the environment.
  • Having a specific processing speed in order to keep up with the demands of data processing.

Q14. What does the term “in situ” mean in the context of big data?

  • Accelerometers.
  • In the situation
  • The sensors used in airplanes to measure altitude.
  • Bringing the computation to the location of the data.

Q15. Which of the following are reasons mentioned for why data generated by people are hard to process?

  • Very unstructured data.
  • They cannot be modeled and stored.
  • The velocity of the data is very high.
  • Skilled people to analyze the data are hard to come by.

Q16. What is the purpose of retrieval and storage; pre-processing; and analysis in order to convert multiple data sources into valuable data?

  • To enable ETL methods.
  • Designed to work like the ETL process.
  • To allow scalable analytical solutions to big data.
  • Since the multi-layered process is built into the Neo4j database connection.
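
The three layers named in the question (retrieval and storage; pre-processing; analysis) can be illustrated as a minimal ETL-style pipeline. This is a toy sketch: the inline CSV string and the city/temperature fields are invented stand-ins for a real data source.

```python
import csv
import io

# Stage 1: retrieval and storage -- an inline CSV string stands in for a
# remote source or database (assumption for illustration).
raw = "city,temp\nLa Jolla,18\nSan Diego,n/a\nLa Jolla,20\n"

def retrieve():
    return list(csv.DictReader(io.StringIO(raw)))

def preprocess(rows):
    # Stage 2: pre-processing -- drop rows whose temp field is not numeric
    # and convert the rest to integers.
    return [dict(r, temp=int(r["temp"])) for r in rows if r["temp"].isdigit()]

def analyze(rows):
    # Stage 3: analysis -- a simple aggregate: mean temperature per city.
    totals = {}
    for r in rows:
        totals.setdefault(r["city"], []).append(r["temp"])
    return {city: sum(ts) / len(ts) for city, ts in totals.items()}

result = analyze(preprocess(retrieve()))
print(result)  # → {'La Jolla': 19.0}
```

Each stage only consumes the previous stage's output, which is what lets such pipelines scale out: any layer can be swapped for a distributed implementation without touching the others.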

Q17. Which of the following are benefits for organization-generated data?

  • Higher Sales
  • High Velocity
  • Improved Safety
  • Better Profit Margins
  • Customer Satisfaction

Q18. What are data silos and why are they bad?

  • Highly unstructured data. Bad because it does not provide meaningful results for organizations.
  • Data produced from an organization that is spread out. Bad because it creates unsynchronized and invisible data.
  • A giant centralized database to house all the data production within an organization. Bad because it hinders opportunity for data generation.
  • A giant centralized database to house all the data produced within an organization. Bad because it is hard to maintain as highly structured data.

Q19. Which of the following are the benefits of data integration?

  • Monitoring of data.
  • Adds value to big data.
  • Increase data availability.
  • Unify your data system.
  • Reduce data complexity.
  • Increase data collaboration.

Quiz 2: V for the V’s of Big Data

Q1. Amazon has been collecting review data for a particular product. They have realized that almost 90% of the reviews were mostly a 5/5 rating. However, of the 90%, they realized that 50% of them were customers who did not have proof of purchase or customers who did not post serious reviews about the product. Of the following, which is true about the review data collected in this situation?

  • High Veracity
  • High Volume
  • Low Veracity
  • High Valence
  • Low Valence
  • Low Volume

Q2. As mentioned in the slides, what are the challenges to data with a high valence?

  • Reliability of Data
  • Difficult to Integrate
  • Complex Data Exploration Algorithms

Q3. Which of the following are the 6 V’s in big data?

  • Variety
  • Volume
  • Valence
  • Value
  • Veracity
  • Velocity
  • Vision

Q4. What is the veracity of big data?

  • The size of the data.
  • The connectedness of data.
  • The speed at which data is produced.
  • The abnormality or uncertainties of data.

Q5. What are the challenges of data with high variety?

  • Hard to integrate.
  • The quality of data is low.
  • Hard in utilizing group event detection.
  • Hard to perform emergent behavior analysis.

Q6. Which of the following is the best way to describe why it is crucial to process data in real-time?

  • More accurate.
  • Prevents missed opportunities.
  • More expensive to batch process.
  • Batch processing is an older method that is not as accurate as real-time processing.

Q7. What are the challenges with big data that has high volume?

  • Effectiveness and Cost
  • Storage and Accessibility
  • Speed Increase in Processing
  • Cost, Scalability, and Performance

Quiz 3: Data Science 101

Q1. Which of the following are parts of the 5 P’s of data science and what is the additional P introduced in the slides?

  • People
  • Purpose
  • Product
  • Perception
  • Process
  • Platforms
  • Programmability

Q2. Which of the following are part of the four main categories to acquire, access, and retrieve data?

  • Text Files
  • Web Services
  • Remote Data
  • NoSQL Storage
  • Traditional Databases

Q3. What are the steps required for data analysis?

  • Investigate, Build Model, Evaluate
  • Classification, Regression, Analysis
  • Regression, Evaluate, Classification
  • Select Technique, Build Model, Evaluate
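
The three steps in the last option above (select technique, build model, evaluate) can be walked through end to end. A toy sketch, assuming simple linear regression as the selected technique and made-up ad-spend/sales numbers:

```python
from statistics import mean

# Toy data: ad spend vs. sales (invented), split into train and test sets.
train = [(1, 2.1), (2, 4.0), (3, 6.2), (4, 7.9)]
test = [(5, 10.1), (6, 11.8)]

# 1. Select technique: simple linear regression via least squares.
xs = [x for x, _ in train]
x_bar = mean(xs)
y_bar = mean(y for _, y in train)

# 2. Build model: fit slope and intercept on the training data.
slope = sum((x - x_bar) * (y - y_bar) for x, y in train) / \
        sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar
predict = lambda x: slope * x + intercept

# 3. Evaluate: mean squared error on the held-out test data.
mse = mean((predict(x) - y) ** 2 for x, y in test)
print(round(slope, 2), round(mse, 3))  # → 1.96 0.017
```

Evaluating on data the model never saw is what makes step 3 meaningful; a low training error alone says nothing about generalization.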

Q4. Of the following, which is a technique mentioned in the videos for building a model?

  • Validation
  • Evaluation
  • Analysis
  • Investigation

Q5. What is the first step in finding the right problem to tackle in data science?

  • Define the Problem
  • Define Goals
  • Assess the Situation
  • Ask the Right Questions

Q6. What is the first step in determining a big data strategy?

  • Business Objectives
  • Collect Data
  • Build In-House Expertise
  • Organizational Buy-In

Q7. According to Ilkay, why is exploring data crucial to better modeling?

  • Data exploration…
  • enables a description of data which allows visualization.
  • enables understanding of general trends, correlations, and outliers.
  • leads to data understanding which allows an informed analysis of the data.
  • enables histograms and other graphs as data visualization.

Q8. Why is data science mainly about teamwork?

  • Analytic solutions are required.
  • Engineering solutions are preferred.
  • Exhibition of curiosity is required.
  • Data science requires a variety of expertise in different fields.

Q9. What are the ways to address data quality issues?

  • Remove outliers.
  • Data Wrangling
  • Merge duplicate records.
  • Remove data with missing values.
  • Generate best estimates for invalid values.
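
The cleanup steps listed above (merging duplicates, removing outliers, and generating best estimates for missing values) can be sketched on a toy record set. The temperature readings and the -50..60 °C plausibility range are assumptions for illustration:

```python
from statistics import mean

readings = [
    {"id": 1, "temp": 21.5},
    {"id": 2, "temp": 22.0},
    {"id": 2, "temp": 22.0},   # duplicate record
    {"id": 3, "temp": None},   # missing value
    {"id": 4, "temp": 500.0},  # sensor glitch (outlier)
    {"id": 5, "temp": 20.5},
]

# Merge duplicate records: keep the first occurrence of each id.
seen, deduped = set(), []
for r in readings:
    if r["id"] not in seen:
        seen.add(r["id"])
        deduped.append(r)

# Remove outliers: drop temps outside an assumed plausible range (-50..60).
cleaned = [r for r in deduped if r["temp"] is None or -50 <= r["temp"] <= 60]

# Generate best estimates for missing values: fill with the mean of the rest.
valid = [r["temp"] for r in cleaned if r["temp"] is not None]
fill = round(mean(valid), 2)
cleaned = [dict(r, temp=r["temp"] if r["temp"] is not None else fill)
           for r in cleaned]
print(cleaned)
```

The ordering matters: estimating missing values *after* dropping the outlier keeps the 500.0 glitch from polluting the fill-in mean.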

Q10. What is done to the data in the preparation stage?

  • Build Models
  • Retrieve Data
  • Select Analytical Techniques
  • Identify Data Sets and Query Data
  • Understanding Nature of Data and Preliminary Analysis

Quiz 4: Foundations for Big Data

Q1. Which of the following is the best description of why it is important to learn about the foundations for big data?

  • Foundations stand the test of time.
  • Foundations are all that is required to show a mastery of big data concepts.
  • Foundations allow for the understanding of practical concepts in Hadoop.
  • Foundations help you revisit calculus concepts required in the understanding of big data.

Q2. What is the benefit of a commodity cluster?

  • Enables fault tolerance
  • Prevents network connection failure
  • Prevents individual component failures
  • Much faster than a traditional supercomputer

Q3. What is a way to enable fault tolerance?

  • Distributed Computing
  • System Wide Restart
  • Better LAN Connection
  • Data Parallel Job Restart

Q4. What are the specific benefit(s) to a distributed file system?

  • Large Storage
  • High Concurrency
  • Data Scalability
  • High Fault Tolerance

Q5. Which of the following are general requirements for a programming language in order to support big data models?

  • Handle Fault Tolerance
  • Utilize Map Reduction Methods
  • Support Big Data Operations
  • Enable Adding of More Racks
  • Optimization of Specific Data Types

Quiz 5: Intro to MapReduce

Q1. What does IaaS provide?

  • Hardware Only
  • Software On-Demand
  • Computing Environment

Q2. What does PaaS provide?

  • Hardware Only
  • Computing Environment
  • Software On-Demand

Q3. What does SaaS provide?

  • Hardware Only
  • Computing Environment
  • Software On-Demand

Q4. What are the two key components of HDFS and what are they used for?

  • NameNode for block storage and DataNode for metadata.
  • NameNode for metadata and DataNode for block storage.
  • FASTA for genome sequence and Rasters for geospatial data.

Q5. What is the job of the NameNode?

  • For gene sequencing calculations.
  • Coordinates operations and assigns tasks to DataNodes
  • Listens to DataNodes for block creation, deletion, and replication.
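
The NameNode/DataNode split from the two questions above can be mimicked with plain dictionaries. A toy single-process sketch (the file path, node names, and block IDs are invented; real HDFS does this over a network):

```python
# NameNode: metadata only -- which blocks make up each file,
# and which DataNodes hold replicas of each block.
namenode = {
    "file_to_blocks": {"/logs/day1.txt": ["blk_1", "blk_2"]},
    "block_locations": {"blk_1": ["datanode-a", "datanode-b"],
                        "blk_2": ["datanode-b", "datanode-c"]},
}

# DataNodes: the actual block storage (replication factor 2 here).
datanodes = {
    "datanode-a": {"blk_1": b"hello "},
    "datanode-b": {"blk_1": b"hello ", "blk_2": b"world"},
    "datanode-c": {"blk_2": b"world"},
}

def read_file(path):
    """A client read: ask the NameNode for metadata, then fetch each
    block directly from one of the DataNodes that stores it."""
    data = b""
    for block in namenode["file_to_blocks"][path]:
        node = namenode["block_locations"][block][0]  # pick first replica
        data += datanodes[node][block]
    return data

print(read_file("/logs/day1.txt"))  # → b'hello world'
```

Note that file bytes never pass through the NameNode: it answers the metadata lookup, and the client reads blocks from DataNodes directly. Losing `datanode-a` would still leave `blk_1` readable from `datanode-b`, which is the point of replication.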

Q6. What is the order of the three steps to Map Reduce?

  • Map -> Shuffle and Sort -> Reduce
  • Shuffle and Sort -> Map -> Reduce
  • Map -> Reduce -> Shuffle and Sort
  • Shuffle and Sort -> Reduce -> Map
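
The Map -> Shuffle and Sort -> Reduce sequence from the first option can be sketched in plain Python. This is a single-machine toy word count, not Hadoop itself, but each function mirrors one phase:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle_and_sort(pairs):
    """Shuffle and sort: bring all pairs with the same key together."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)),
                              key=itemgetter(0)):
        yield (key, [v for _, v in group])

def reduce_phase(grouped):
    """Reduce: combine each key's values -- here, sum the counts."""
    for word, counts in grouped:
        yield (word, sum(counts))

lines = ["big data big models", "big precision"]
result = dict(reduce_phase(shuffle_and_sort(map_phase(lines))))
print(result)  # → {'big': 3, 'data': 1, 'models': 1, 'precision': 1}
```

In a real cluster the map and reduce calls run in parallel on many nodes, and the shuffle-and-sort step is the framework-managed network exchange between them; the per-phase logic stays this simple.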

Q7. What is a benefit of using pre-built Hadoop images?

  • Guaranteed hardware support.
  • Fewer software choices to choose from.
  • Quick prototyping, deploying, and validating of projects.
  • Quick prototyping, deploying, and guaranteed bug free.

Q8. What is an example of open-source tools built for Hadoop and what does it do?

  • Giraph, for SQL-like queries.
  • Zookeeper, analyze social graphs.
  • Pig, for real-time and in-memory processing of big data.
  • Zookeeper, a management system for animal-named components.

Q9. What is the difference between low-level interfaces and high-level interfaces?

  • Low level deals with storage and scheduling while high level deals with interactivity.
  • Low level deals with interactivity while high level deals with storage and scheduling.

Q10. Which of the following are problems to look out for when integrating your project with Hadoop?

  • Random Data Access
  • Data Level Parallelism
  • Task Level Parallelism
  • Advanced Algorithms
  • Infrastructure Replacement

Q11. As covered in the slides, which of the following are the major goals of Hadoop?

  • Enable Scalability
  • Handle Fault Tolerance
  • Provide Value for Data
  • Latency Sensitive Tasks
  • Facilitate a Shared Environment
  • Optimized for a Variety of Data Types

Q12. What is the purpose of YARN?

  • Implementation of Map Reduce.
  • Enables large scale data across clusters.
  • Allows various applications to run on the same Hadoop cluster.

Q13. What are the two main components for a data computation framework that were described in the slides?

  • Node Manager and Container
  • Resource Manager and Container
  • Applications Master and Container
  • Node Manager and Applications Master
  • Resource Manager and Node Manager
