All Weeks Big Data Modeling and Management Systems Quiz Answers
Week 1: Quiz Answers
Q1. (Questions 1-3 pertain to the video lecture “Exploring the Relational Data Model of CSV”) What is the approximate population of La Paz County in the state of Arizona for the CENSUS2010POP (column H)? (Choose the best answer.)
Q3. At 2:45 of the video, the instructor creates a filter for all of the counties in California with a population greater than 1,000,000. However, the results also include the entire state of California. This anomalous value might skew our analysis if, for example, we wanted to compute the average population of these results. What additional filter might resolve this problem?
- Add a filter to detect and remove results which do not include the word “County” in column G.
- Add a filter which finds all counties with population greater than 100,000 AND less than 10,000,000 for column H (CENSUS2010POP).
- Add a filter where the value in column E is greater than 1,000,000.
- None of the above
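The first option above, dropping rows whose name lacks the word "County", can be sketched as follows. This is only an illustration: the column names `STNAME` and `CTYNAME` are assumptions about the CSV header, and the rows and populations are made-up placeholders, not values from the real file.

```python
# Hypothetical rows mimicking the census CSV. The column names STNAME and
# CTYNAME are assumed, and the populations are illustrative placeholders.
rows = [
    {"STNAME": "California", "CTYNAME": "California", "CENSUS2010POP": 37_000_000},
    {"STNAME": "California", "CTYNAME": "Los Angeles County", "CENSUS2010POP": 9_800_000},
    {"STNAME": "California", "CTYNAME": "Alpine County", "CENSUS2010POP": 1_200},
    {"STNAME": "Arizona", "CTYNAME": "Maricopa County", "CENSUS2010POP": 3_800_000},
]


def large_counties(rows, state, min_pop):
    """Keep rows for `state` above `min_pop` whose name contains 'County'."""
    return [
        r for r in rows
        if r["STNAME"] == state
        and r["CENSUS2010POP"] > min_pop
        and "County" in r["CTYNAME"]  # drops the state-level summary row
    ]


result = large_counties(rows, "California", 1_000_000)
names = [r["CTYNAME"] for r in result]
print(names)
```

Without the `"County" in r["CTYNAME"]` condition, the state-level row would pass the population filter and inflate any average computed over the results.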
- (163, 118, 79)
- (134, 145, 46)
- (50, 156, 182)
- (100, 123, 149)
Q7. Is this value likely to be land or ocean?
Q8. (Questions 8 and 9 pertain to the video lecture “Exploring the Semistructured Data Model of JSON”) Given a tweet, what path would you most likely enter to obtain a count of the number of followers for a user?
- None of the above
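As a rough illustration of following a path through a tweet, the sketch below assumes the classic Twitter layout in which the author is nested under `user` and the follower count is stored in `followers_count`. The sample tweet and the helper `get_path` are hypothetical.

```python
import json

# Made-up tweet fragment; a real tweet carries many more fields.
raw = '{"text": "hello", "user": {"screen_name": "example_user", "followers_count": 42}}'
tweet = json.loads(raw)


def get_path(obj, path):
    """Follow a slash-separated path such as 'user/followers_count'."""
    for key in path.split("/"):
        obj = obj[key]
    return obj


followers = get_path(tweet, "user/followers_count")
print(followers)
```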
Week 2: Data Models Quiz
- Excel does not enforce many principles of relational data models.
- Excel is a user program and thus cannot run on a server.
- Excel does not allow algorithms for data manipulation.
- Fixed schema of a particular database.
- A tuple that cannot be reduced.
- A column or row of data. Depends on the context.
- One unit of information that cannot be decomposed.
- Find the shortest path from source node to target node.
- Find the best possible path given two or more optimization criteria where neither constraint can be fully optimized simultaneously.
- Find the optimal path that requires going through specific nodes given by the user.
- High density of nodes at a certain location.
- A neighborhood defined by an integer constant K around a specific node. All K+1 nodes belong in another community.
- A dense amount of edge connections between nodes in a community and a few connections across communities.
- Many anomalous neighborhoods within the same vicinity.
- Computers can easily visualize the data with a tree structure.
- It is not always the case that XML and JSON can be represented as trees.
- Trees take advantage of the parent-child relationship of the data for easy navigation.
- They are only useful for XML data, where the tree-like structure is apparent from the tags, whereas JSON does not have a tree-like structure because it contains arrays.
- Enables weighting of the query.
- The ability to normalize vectors allowing probability distributions.
- Enables image searching.
- Results can be ordered by similarity using vector projection.
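Ordering results by similarity, as the last option describes, can be sketched with cosine similarity over term-count vectors. Cosine similarity is one common choice and is an assumption here. The toy documents and query are made up.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)  # a missing term counts as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical mini-corpus.
docs = {
    "d1": "big data modeling",
    "d2": "graph data model",
    "d3": "streaming sensors",
}
query = Counter("big data".split())
vectors = {name: Counter(text.split()) for name, text in docs.items()}

# Most similar document first.
ranked = sorted(vectors, key=lambda n: cosine(query, vectors[n]), reverse=True)
print(ranked)
```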
Q7. For the following questions 7, 8, and 9, suppose a registration website creates data with the following fields for each person registered (note: if the user does not input a value, NULL is stored instead): Name, Date, Address, and Account Number.
Suppose we collect data month by month. Each month, we would have a batch of data containing the fields listed above. At the end of the year, we want to summarize our registrant activities for the entire year, so we would remove redundancies in our data by removing any records with duplicate account numbers from month to month. What type of operation do we use in this scenario?
- Not an Operation
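The scenario in Q7, combining monthly batches and keeping one record per account number, can be sketched as below. The records and the helper `merge_batches` are hypothetical, and keeping the first record seen is just one possible policy.

```python
def merge_batches(batches):
    """Combine batches, keeping the first record seen for each account number."""
    seen = {}
    for batch in batches:
        for record in batch:
            # setdefault stores the record only if the account is new.
            seen.setdefault(record["account"], record)
    return list(seen.values())


# Made-up monthly batches; account 1001 appears in both months.
january = [{"name": "Ann", "account": 1001}, {"name": "Bob", "account": 1002}]
february = [{"name": "Ann", "account": 1001}, {"name": "Cy", "account": 1003}]

merged = merge_batches([january, february])
accounts = [r["account"] for r in merged]
print(accounts)
```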
- Account should have at most n digits.
- If we had n duplicate Account Numbers then we will remove n-1 duplicate fields.
- There are no constraints.
- Account Number should be unique.
Q9. Suppose 100 people sign up for our system and, of those 100 people, 60 did not input an address. The system lists the values as NULL for these empty entries in the address field. Would the data in this situation still have structure?
- No, because the majority of the data do not have a specific field filled, so our originally defined structure is lost.
- Yes, the data has structure because we have placed a structural constraint on the data, so the data will always have the originally defined structure.
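To see why a missing value need not change the shape of a record, here is a small sketch in which every record carries the same four fields and Python's `None` stands in for NULL. The field names follow the list in Q7. The sample values and the helper `make_record` are made up.

```python
FIELDS = ("name", "date", "address", "account")


def make_record(name, date, account, address=None):
    """Build a record that always has every field; a missing address is None."""
    return {"name": name, "date": date, "address": address, "account": account}


records = [
    make_record("Ann", "2020-01-05", 1001, address="1 Main St"),
    make_record("Bob", "2020-01-06", 1002),  # no address supplied
]

# Every record exposes the same fields in the same order.
same_structure = all(tuple(r) == FIELDS for r in records)
missing_addresses = sum(r["address"] is None for r in records)
print(same_structure, missing_addresses)
```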
Week 3: Data Formats and Streaming Data Quiz
- There is a one to one correspondence between formatting data and data modeling. For every model of data, there is only one way to store the data.
- There is always one specific schema for storing model data that is the best and preferred method for the specific data representation.
- The data does not necessarily need to be formatted in a way that represents the data model, so long as the model can be extrapolated from it.
- Calculating results using real time data otherwise known as streaming data.
- Using static data stored from a real time source in order to process and guide the application.
- Utilizing real time data to compute and change the state of an application continuously.
- Using sensors to manipulate the system, such as a smart car being able to drive by itself using sensors to detect road hazards.
- Small time windows for working with data.
- Data is always utilized for streaming the application.
- Data manipulation is near real time.
- Independent computations that do not rely on previous or future data.
- Always unbounded in sequence, in other words, data is not guaranteed to be in order.
- Does not ping the source interactively for a response upon receiving the data.
- Data is unbounded in size but requires only finite time and space to process it.
- The data is unbounded in size and the size determines the time and space of processing the data.
- The data is finite and requires only finite time and space to process the data.
- Data is finite in size and size determines the time and space of processing the data.
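The idea of processing an unbounded stream with only finite time and space per item can be illustrated with a running mean, which stores just a count and the current mean regardless of how many readings arrive. The short generator below stands in for a real, potentially endless source and is purely hypothetical.

```python
class RunningMean:
    """Incremental mean that uses constant memory per stream."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Constant-time update; no past readings are kept.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def sensor_stream():
    """Stand-in for a live source; a real stream might never end."""
    for reading in [10.0, 20.0, 30.0, 40.0]:
        yield reading


stats = RunningMean()
for reading in sensor_stream():
    stats.update(reading)
print(stats.count, stats.mean)
```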
- Accurate and Consistent
- Accurate and Memory Efficient
- Fast and Complex
- Fast and Simple
- A specific method for processing streaming data using special real time processes.
- A specific hardware architecture for a server made specifically for processing real time data.
- A method to process streaming data by utilizing batch processing and real time processing.
- The size and frequency of the streaming data may be too small.
- The size and frequency of the streaming data may be sporadic.
- There may not be data to produce the notion of size and frequency.
- Data lakes house raw data while data warehouses contain pre-formatted data.
- Data lakes contain only files while data warehouses contain only databases.
- Data lakes utilize hierarchical systems while data warehouses use object storage.
- The process where formatted data is given structure when read.
- Another name for data lakes.
- Data is stored as raw data until it is read by an application where the application assigns structure.
- The process where data is pre-formatted prior to being read but the schema is loaded on read.
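The option describing raw data that receives its structure from the reading application can be sketched as follows. The stored lines, the schema, and the helper `read_with_schema` are all hypothetical.

```python
# Raw lines kept exactly as they arrived; nothing is parsed at write time.
raw_store = [
    "1001,Ann,2020-01-05",
    "1002,Bob,2020-01-06",
]


def read_with_schema(lines, schema):
    """Apply field names and types only when the data is read."""
    for line in lines:
        values = line.split(",")
        yield {name: cast(value) for (name, cast), value in zip(schema, values)}


# The reading application decides the structure.
schema = [("account", int), ("name", str), ("date", str)]
rows = list(read_with_schema(raw_store, schema))
print(rows[0])
```

A different application could read the same `raw_store` with a different schema, which is the flexibility this approach trades against up-front validation.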
Q1. The desired characteristics of a BDMS include (select all that apply):
- Narrow range of query sizes
- Continuous data ingestion
- Support for common “Big Data” data types
- Support for ACID
- A full query language
- A flexible semi-structured data model
- it is impossible to have consistency, accuracy, and partial tolerance
- it is necessary to have consistency, accuracy, and partial tolerance
- it is necessary to have consistency, availability, and partition tolerance
- it is impossible to have consistency, availability, and partition tolerance
- The same as ACID.
- To overcome CAP theorem.
- To impose properties on a BDMS in order to guarantee certain results.
- Enables stricter enforcement of ACID type design.
- A special type of data type that can store up to 512 MB of image data.
- A look up table that is stored as a value in the database. Look up table points to actual values in memory.
- A compressed list that is stored within the value of the database.
- A special type of data type that can store hashes that point to multiple attributes.
- Images as values within the database.
- Enables real time data streaming from external sources.
- Support for geospatial data storage and geospatial queries.
- Better equipped for string based search applications.
Q6. What database would be best suited for the following scenario: An app development company is trying to implement a cloud based storage system for their new map-based app. The cloud will manage the longitude and latitude of the data in order to track user location.
- Sorted Sets
- Streaming Video