Exploring and Preparing your Data with BigQuery Week 1 Quiz Answers
Quiz 1: Introduction to Data on Google Cloud
Q1. What is one of the key reasons Google Cloud can scale effectively to query large datasets?
- Users can manually launch and customize as many cloud virtual machines as they need to process larger BigQuery datasets.
- Cloud Storage buckets map to one specific and large hard drive in the cloud.
- Storage and compute power are handled separately and users don’t manage their own data warehouse architecture.
Q2. Select the common reasons why organizations move to the cloud for big data analysis (choose all that apply):
- Querying infrastructure is fully managed
- Autocompletion of common SQL queries
- Reduced cost compared to on-premises
Q3. What are the elastic storage bins called in Cloud Storage?
- Hard drives
Quiz 2: Module 2 Quiz
Q1. What is the best way to see how much data is being processed by your query in the BigQuery Web UI?
- Click on the Query Validator to see how many bytes will be / are processed
- Wait to see the bill in the Google Cloud Platform billing console
- View the table metadata schema to quickly determine the size of the table to be scanned
- Save the query results to a permanent table and then view the size of that table
Q2. What is a common way in SQL to identify duplicate records?
- Sort the records and look for multiple occurrences visually
- Use COUNT(DISTINCT field) in conjunction with a HAVING clause
- Use COUNT(field) in conjunction with a HAVING clause
- Manually export and remove the records in a spreadsheet program
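The COUNT-with-HAVING approach listed above can be sketched as follows. This is a minimal illustration, not BigQuery itself: it uses Python's built-in `sqlite3` module as a stand-in engine, and the table name `orders` and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a BigQuery table.
# The table name and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (2, "bob"), (3, "cy"), (3, "cy"), (3, "cy")],
)

# Group on the field that should be unique, then keep only the groups
# that occur more than once.
duplicates = conn.execute(
    """
    SELECT order_id, COUNT(order_id) AS occurrences
    FROM orders
    GROUP BY order_id
    HAVING COUNT(order_id) > 1
    ORDER BY order_id
    """
).fetchall()

print(duplicates)  # [(2, 2), (3, 3)]
```

The same GROUP BY / HAVING shape should carry over to BigQuery SQL. To check whole-row duplicates, group on every column that defines a record rather than on a single field.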
Q3. What are the common tasks a data analyst must perform?
- Set up hardware
- Build Dataflow Pipelines
Q4. True or False: BigQuery stores the audit logs of each query run against your datasets
Q5. Which of these most accurately describes the goal of BigQuery?
- BigQuery is a petabyte-scale data warehouse. It is optimized for analytic workloads, which equates to a large amount of reading and processing data.
- BigQuery is a highly scalable operational datastore that you can use in place of your relational database back-ends for applications. It is optimized for transactional workloads which equates to a large volume of writing and updating records.
Q6. Which of these three roles benefits from having a background in SQL for processing data?
- Data Analyst
- Data Scientist
- Data Engineer
- All of the above
Quiz 3: Module 3 Quiz
Q1. What is the reason to not use SELECT * to explore your dataset?
- The BigQuery preview data table feature is faster and free to preview records
- Selecting all columns is an expensive operation performance-wise, especially with no filters
- Selecting all columns, even with WHERE clause filters, will scan your entire dataset and incur charges for all bytes processed. This is a pitfall when returning potentially large columns (like very long string fields, e.g. Product Reviews).
- All of the above
Q2. Which one of these SQL clauses cannot operate against an alias that was just defined in your SELECT statement?
- ORDER BY
- GROUP BY
Q3. What is the core principle behind how BigQuery can effectively scale to process billions of rows of data?
- Massive parallel processing of queries across distributed resources
- Solid state hard drives
- Proprietary SQL language
Quiz 4: Module 4 Quiz
Q1. Which of the following statements about BigQuery pricing is FALSE?
- Using a LIMIT clause on your queries could still scan all the records
- Storage costs are automatically reduced after a table has not been modified for a period of time
- Users incur charges for processing data from queries and for storing data in BigQuery managed storage
- All BigQuery jobs incur charges and are billed directly in the Google Cloud Platform billing area
Q2. What is NOT one of the best practices for cost optimizing your queries?
- Use LIMIT queries with WHERE clause filters to limit the amount of data scanned
- Avoid SELECTing all columns in your data, use only what you need
- Filter your data as early as possible so you are not doing work on records that are later filtered out
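The points above about LIMIT and about selecting only the columns you need can be illustrated with a toy cost model. This is a rough sketch under stated assumptions, not BigQuery's actual billing logic: it assumes an unpartitioned, unclustered, column-oriented table, and the column names, row count, and byte sizes are made up.

```python
# Toy model of a column-oriented table: each column is stored separately,
# so a query reads every value of each column it references.
# Row count, column names, and byte sizes are made up for illustration.
ROWS = 1_000_000
BYTES_PER_VALUE = {"order_id": 8, "price": 8, "review_text": 2_000}


def estimated_bytes_scanned(columns, limit=None):
    """Estimate bytes read for a full scan of the given columns.

    `limit` is accepted but deliberately ignored: in this model LIMIT only
    trims the rows returned, not the rows read.
    """
    return sum(BYTES_PER_VALUE[c] * ROWS for c in columns)


select_star = estimated_bytes_scanned(list(BYTES_PER_VALUE))
select_star_limit = estimated_bytes_scanned(list(BYTES_PER_VALUE), limit=10)
two_columns = estimated_bytes_scanned(["order_id", "price"])

print(select_star)        # 2016000000
print(select_star_limit)  # 2016000000 (LIMIT does not reduce the scan)
print(two_columns)        # 16000000
```

In this model, dropping the one wide string column cuts the scan from about 2 GB to 16 MB, while adding LIMIT changes nothing.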
Quiz 5: Module 5 Quiz
Q1. Which of the following statements about Cloud Dataprep is FALSE?
- Running Cloud Dataprep jobs creates new Cloud Dataflow jobs that you can then monitor in Google Cloud Platform
- Accessing Cloud Dataprep column details will reveal the number of unique and duplicative records in a particular column
- Cloud Dataprep loads all of your data into the Transformer, a view where you can quickly explore the quality of your dataset
- Cloud Dataprep uses data transformation Recipes to build an overall data processing Flow
Q2. What is a challenge of having a dataset that is tall (many rows) but thin (few columns)?
- There are too few records to uncover common patterns or trends
- You are limited in the amount of insight you can get from a single record
- You will have the presence of many duplicative records in taller datasets
Q3. True or False: Running Cloud Dataprep jobs automatically creates a new Cloud Spanner job behind-the-scenes to process your data
Q4. True or False: You can automatically output the results of your Cloud Dataprep job into a new BigQuery table
Q5. If you are analyzing your dataset for completeness, what is one question you will always ask?
- Have I been given the complete set of records to analyze or is this a subset?
- What units of measurement are we using for our numerical values?
- What is the correct list of values to compare this string field against?
Q6. What are the key goals of using the query validator in the BigQuery Web UI?
- Autocomplete partially written queries
- View the amount of data you are going to process before running your query
- Get insight into how to fix broken queries