Preparing for the Google Cloud Professional Data Engineer Exam Week 3 Quiz Answers
Designing and Building Data Processing Systems Quiz Answers
Q1. A common assembly of technologies for Data Engineering is:
- Dataproc, Cloud SQL, and Datastore
- Pub/Sub, Dataflow, and BigQuery
Q2. When you prepare using tables that compare different technologies…
- Read from the table up, so that if you see a keyword in a question, you will recognize which associated technology in the heading row is a candidate for the solution.
- You don’t really need to know the characteristics of each technology. So it is safe to ignore tables. They have too much information anyway.
Q3. What are the three kinds of streaming windows discussed?
- Fixed, Sliding, and Sessions.
- Elastic, Average, and Wide-column.
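The correct answer above names Apache Beam's three window types. Below is a minimal Beam (Python SDK) sketch of how each is declared, assuming timestamped elements; the values and durations are illustrative only:

```python
import apache_beam as beam
from apache_beam.transforms import window

with beam.Pipeline() as p:
    # Toy elements with explicit event timestamps (epoch seconds).
    events = (
        p
        | beam.Create([("sensor-1", 1.0), ("sensor-2", 2.0)])
        | beam.Map(lambda kv: window.TimestampedValue(kv, 1700000000))
    )

    # Fixed (tumbling): consecutive, non-overlapping 60-second windows.
    fixed = events | "Fixed" >> beam.WindowInto(window.FixedWindows(60))

    # Sliding: 60-second windows that start every 30 seconds, so they overlap.
    sliding = events | "Sliding" >> beam.WindowInto(
        window.SlidingWindows(size=60, period=30))

    # Sessions: per-key windows that close after a 10-minute gap in activity.
    sessions = events | "Sessions" >> beam.WindowInto(window.Sessions(gap_size=600))
```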
Preparing for the Google Cloud Professional Data Engineer Exam Week 4 Quiz Answers
Operationalizing Machine Learning Models Quiz Answers
Q1. Machine Learning steps include:
- Download data to a spreadsheet, Sort data, Generate insights.
- Collect data, Organize data, and Create a model.
Q2. What distinguishes problems that are appropriate for Machine Learning?
- Hard problems that reduce to complex counting.
- Making decisions that require insight and judgement.
Preparing for the Google Cloud Professional Data Engineer Exam Week 5 Quiz Answers
Preparing for Reliability, Policy, and Security Quiz Answers
Q1. Which one is the recommended best practice for Identity and Access Management?
- Assign roles to groups, then administer group membership.
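As a concrete illustration of that practice, here is what a single group-based IAM binding looks like, sketched as the Python data structure the IAM APIs exchange; the role and group address are hypothetical:

```python
# Grant the role to a group, not to individual users. Day-to-day access
# is then administered by changing group membership, not the policy.
binding = {
    "role": "roles/bigquery.dataViewer",
    "members": ["group:data-analysts@example.com"],  # hypothetical group
}
```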
Q2. What point was made about monitoring and displaying parts of a solution?
- That some services provide their own display and dashboard features, such as TensorBoard for TensorFlow, Google Cloud’s operations suite, and others.
Q3. What is one difference between Failover and Disaster Recovery?
- Failover has very short downtime. Disaster Recovery may incur delays before service is restored.
Q4. What is a coincidental benefit of distributing work as a scaling strategy?
- If a single unit goes out of service, it is a smaller portion of the overall service, so it increases reliability.
Graded Practice Exam Quiz Answers
Q1. Storage of JSON files with occasionally changing schema, for ANSI SQL queries.
- Store in BigQuery. Select “Automatically detect” in the Schema section.
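A sketch of the same setup done programmatically with the google-cloud-bigquery client; the bucket path and table ID are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # the API equivalent of "Automatically detect" in the UI
)
load_job = client.load_table_from_uri(
    "gs://example-bucket/events/*.json",        # placeholder source files
    "example-project.example_dataset.events",   # placeholder destination table
    job_config=job_config,
)
load_job.result()  # block until the load completes
```

Because the schema is inferred at load time, occasional schema changes in the JSON are picked up on the next load rather than requiring a manual schema edit.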
Q2. Low-cost one-way one-time migration of two 100-TB file servers to Google Cloud; data will be frequently accessed and only from Germany.
- Use Transfer Appliance. Transfer to a Cloud Storage Standard bucket.
Q3. Cost-effective backup to Google Cloud of multi-TB databases from another cloud including monthly DR drills.
- Use Storage Transfer Service. Transfer to Cloud Storage Nearline bucket.
Q4. 250,000 devices produce a JSON device status every 10 seconds. How do you capture event data for outlier time series analysis?
- Capture data in Cloud Bigtable. Use the Cloud Bigtable cbt tool to display device outlier data.
Q5. Event data in CSV format to be queried for individual values over time windows. Which storage and schema to minimize query costs?
- Use Cloud Bigtable. Design tall and narrow tables, and use a new row for each single event version.
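To make "tall and narrow" concrete, here is a hedged google-cloud-bigtable sketch in which every event version becomes its own row and the row key groups a device's events together; the project, instance, table, and column-family names are placeholders:

```python
import datetime
from google.cloud import bigtable

client = bigtable.Client(project="example-project")
table = client.instance("example-instance").table("device-events")

def write_event(device_id: str, ts: datetime.datetime, value: bytes) -> None:
    # One row per event version: key = device id + event timestamp.
    row_key = f"{device_id}#{ts.isoformat()}".encode()
    row = table.direct_row(row_key)
    row.set_cell("metrics", "value", value, timestamp=ts)  # single narrow cell
    row.commit()
```

Tall-and-narrow keeps each row small, and a time-range query for one device becomes an efficient contiguous key-range scan.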
Q6. A customer wants to maintain their investment in an existing Apache Spark data pipeline.
- Dataproc
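Dataproc's appeal here is that existing Spark code runs as-is. A minimal PySpark job like the sketch below (paths are placeholders) can be submitted with `gcloud dataproc jobs submit pyspark` without rewriting the pipeline:

```python
from pyspark.sql import SparkSession

# Unmodified Spark code: only the submission target (Dataproc) changes.
spark = SparkSession.builder.appName("existing-pipeline").getOrCreate()
df = spark.read.json("gs://example-bucket/raw/")  # Cloud Storage via the GCS connector
df.groupBy("status").count().show()
spark.stop()
```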
Q7. Host a deep neural network machine learning model on Google Cloud. Run and monitor jobs that could occasionally fail.
- Use AI Platform to host your model. Monitor the status of the Jobs object for ‘failed’ job states.
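A hedged sketch of that monitoring step using the AI Platform (ml v1) REST API through google-api-python-client; the project and job names are placeholders:

```python
from googleapiclient import discovery

ml = discovery.build("ml", "v1")
name = "projects/example-project/jobs/example_training_job"  # placeholder

job = ml.projects().jobs().get(name=name).execute()
if job.get("state") == "FAILED":  # Jobs report states such as FAILED
    print("Job failed:", job.get("errorMessage"))
```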
Q8. Cost-effective way to run non-critical Apache Spark jobs on Dataproc?
- Set up a cluster in standard mode with high-memory machine types. Add 10 additional preemptible worker nodes.
Q9. Promote a Cloud Bigtable solution with a lot of data from development to production and optimize for performance.
- Change your Cloud Bigtable instance type from Development to Production, and set the number of nodes to at least 3. Verify that the storage type is SSD.
Q10. As part of your backup plan, you want to be able to restore snapshots of Compute Engine instances using the fewest steps.
- Use the snapshots to create replacement instances as needed.
Q11. You want to minimize costs to run Google Data Studio reports on BigQuery queries by using prefetch caching.
- Set up the report to use the Owner’s credentials to access the underlying data in BigQuery, and verify that the ‘Enable cache’ checkbox is selected for the report.
Q12. A Data Analyst is concerned that a BigQuery query could be too expensive.
- Use the SELECT clause to limit the amount of data in the query. Partition data by date so the query can be more focused.
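Both recommendations can be checked before spending anything: a BigQuery dry run reports the bytes a query would process, and selecting only the needed columns plus filtering on the partition column drives that number down. A sketch with a placeholder date-partitioned table:

```python
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT user_id, amount                  -- only the columns needed
    FROM `example-project.sales.transactions`
    WHERE transaction_date = '2024-01-15'   -- prune to a single partition
"""
job = client.query(
    sql, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False))
print(f"Query would process {job.total_bytes_processed:,} bytes")
```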
Q13. BigQuery data is stored in external CSV files in Cloud Storage; as the data has increased, the query performance has dropped.
- Import the data into BigQuery for better performance.
Q14. Source data is streamed in bursts and must be transformed before use.
- Use Pub/Sub to buffer the data, and then use Dataflow for ETL.
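A minimal streaming sketch of that shape in Apache Beam: Pub/Sub absorbs the bursts, the pipeline (run on Dataflow) transforms in flight, and BigQuery is a common sink. The subscription and table names are placeholders, and the destination table is assumed to exist:

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | beam.io.ReadFromPubSub(
            subscription="projects/example-project/subscriptions/events")
        | beam.Map(json.loads)  # the ETL step: parse; cleaning/enrichment goes here
        | beam.io.WriteToBigQuery(
            "example-project:example_dataset.events",
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)
    )
```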
Q15. Calculate a running average on streaming data that can arrive late and out of order.
- Use Pub/Sub and Dataflow with Sliding Time Windows.
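Sliding windows supply the "running" part, and Beam's event-time model handles late and out-of-order arrivals. A sketch, assuming each Pub/Sub message body is a number and all names are placeholders:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        # With a timestamp_attribute, windows use the event's own time,
        # so out-of-order delivery does not skew the average.
        | beam.io.ReadFromPubSub(
            subscription="projects/example-project/subscriptions/metrics",
            timestamp_attribute="event_time")
        | beam.Map(lambda msg: float(msg.decode()))
        # Recompute the mean every 30 s over the trailing 5 minutes.
        | beam.WindowInto(window.SlidingWindows(size=300, period=30))
        | beam.CombineGlobally(beam.combiners.MeanCombineFn()).without_defaults()
        | beam.Map(print)
    )
```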
Q16. Testing a Machine Learning model with validation data returns 100% correct answers.
- The model is overfit. There is a problem.
Q17. A client is using a Cloud SQL database to serve infrequently changing lookup tables that host data used by applications. The applications will not modify the tables. As they expand into other geographic regions, they want to ensure good performance. What do you recommend?
- Read replicas
Q18. A client wants to store files from one location and retrieve them from another location. Security requirements are that no one should be able to access the contents of the file while it is hosted in the cloud. What is the best option?
- Client-side encryption
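A sketch of client-side encryption with the `cryptography` library and google-cloud-storage; key management here is deliberately simplified, and the bucket name is a placeholder:

```python
from cryptography.fernet import Fernet
from google.cloud import storage

key = Fernet.generate_key()  # in practice, store and distribute this key securely

# Encrypt locally, then upload: the cloud only ever holds ciphertext.
ciphertext = Fernet(key).encrypt(b"sensitive file contents")
bucket = storage.Client().bucket("example-bucket")
bucket.blob("report.enc").upload_from_string(ciphertext)

# At the other location, download and decrypt with the same key.
plaintext = Fernet(key).decrypt(bucket.blob("report.enc").download_as_bytes())
```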
Q19. Three Google Cloud services commonly used together in data engineering solutions. (Described in this course).
- Pub/Sub, Dataflow, BigQuery
Q20. What is AVRO used for?
- Serialization and de-serialization of data so that it can be transmitted and stored while maintaining an object structure.
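A small fastavro sketch of that round trip: records are written with an embedded schema and read back with their object structure intact (the field names and values are illustrative):

```python
from io import BytesIO
from fastavro import writer, reader, parse_schema

schema = parse_schema({
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "device", "type": "string"},
        {"name": "temp", "type": "float"},
    ],
})

buf = BytesIO()
writer(buf, schema, [{"device": "sensor-1", "temp": 21.5}])  # serialize

buf.seek(0)
for record in reader(buf):  # de-serialize: structure survives the trip
    print(record)           # {'device': 'sensor-1', 'temp': 21.5}
```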
Q21. A company has a new IoT pipeline. Which services will make this design work? Select the services that should be used to replace the icons with the number "1" and the number "2" in the diagram.
- IoT Core, Pub/Sub
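On the ingest side of such a pipeline, device telemetry ends up on a Pub/Sub topic (in the course's design, IoT Core forwards device messages there). A minimal publisher sketch with placeholder names:

```python
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("example-project", "device-telemetry")

payload = json.dumps({"device": "sensor-1", "status": "OK"}).encode()
future = publisher.publish(topic_path, payload)  # returns a future
print("published message id:", future.result())  # blocks until acked
```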
Q22. A company wants to connect cloud applications to an Oracle database in its data center. Requirements are a maximum of 9 Gbps of data and a Service Level Agreement (SLA) of 99%.
- Partner Interconnect
Q23. A client has been developing a pipeline based on PCollections using local programming techniques and is ready to scale up to production. What should they do?
- They should use the Dataflow Cloud Runner.
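Scaling up is mostly a matter of pipeline options; the PCollection code itself does not change. A sketch with placeholder project, region, and bucket values:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",  # was DirectRunner during local development
    project="example-project",
    region="us-central1",
    temp_location="gs://example-bucket/tmp",
)
with beam.Pipeline(options=options) as p:
    # The same PCollection logic developed locally runs unchanged on Dataflow.
    p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * 2)
```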
Q24. A company has migrated their Hadoop cluster to the cloud and is now using Dataproc with the same settings and same methods as in the data center. What would you advise them to do to make better use of the cloud environment?
- Store persistent data off-cluster. Start a cluster for one kind of work then shut it down when it is not processing data.
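The off-cluster advice in code terms: read and write `gs://` paths (the Cloud Storage connector) rather than `hdfs://` paths, so the cluster holds no persistent state and can be deleted between jobs. The paths below are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ephemeral-cluster-job").getOrCreate()

# Persistent data lives in Cloud Storage, not on cluster-local HDFS.
df = spark.read.parquet("gs://example-bucket/warehouse/events/")
df.filter(df.status == "ERROR").write.parquet("gs://example-bucket/out/errors/")
```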
Q25. An application has the following data requirements. 1. It requires strongly consistent transactions. 2. Total data will be less than 500 GB. 3. The data does not need to be streaming or real time. Which data technology would fit these requirements?
- Cloud SQL
Quiz Answers for All Courses in the Preparing for Google Cloud Certification: Cloud Data Engineer Professional Certificate
Google Cloud Big Data and Machine Learning Fundamentals Quiz Answers
Modernizing Data Lakes and Data Warehouses with GCP Quiz Answers
Building Batch Data Pipelines on GCP Coursera Quiz Answers
Building Resilient Streaming Analytics Systems on GCP Coursera Quiz Answers
Smart Analytics, Machine Learning, and AI on GCP Coursera Quiz Answers
Preparing for the Google Cloud Professional Data Engineer Exam Quiz Answers