Getting and Cleaning Data Quiz Answers – Practice & Graded Solution

Q: Are the Getting and Cleaning Data quiz answers accurate?

Yes, these answers are carefully reviewed and verified to ensure they align with the latest course content on data collection and cleaning.

Q: Can I use these answers for both practice and graded quizzes?

Absolutely! These answers are designed for both practice quizzes and graded assessments, ensuring you're fully prepared for all evaluations.

Q: Does this guide cover all modules of the course?

Yes, this guide provides answers for every module, ensuring complete coverage of the entire course content.

Q: Will this guide help me improve my data cleaning skills?

Yes, beyond providing quiz answers, this guide reinforces key data cleaning concepts such as handling missing data, data transformation, and formatting techniques that are critical for preparing data for analysis.

Welcome to your comprehensive guide for Getting and Cleaning Data quiz answers! Whether you’re working through practice quizzes to improve your data cleaning skills or preparing for graded quizzes to test your knowledge, this guide is here to help.

Covering all course modules, this resource will teach you essential data collection, cleaning, and preparation techniques using R, including handling missing values, formatting, and transforming raw data into clean, analyzable formats.

Getting and Cleaning Data Quiz Answers for All Modules

Getting and Cleaning Data Week 01 Quiz Answers

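Q1. The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv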

and load the data into R. The code book, describing the variable names is here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf

How many properties are worth $1,000,000 or more?

Explanation: In the code book, VAL is a property-value category rather than a dollar amount, and code 24 means “$1000000+”. Load the CSV with read.csv(), then count the rows where VAL equals 24, ignoring missing values (VAL is blank for units where property value does not apply, such as rentals).

Answer: 53
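A minimal sketch of the count, assuming the code book's definition of VAL code 24 as “$1000000+” and a data frame read from ss06hid.csv. The helper name count_million_plus() is hypothetical.

```r
# Count housing units whose property-value code VAL equals 24,
# the code book's category for "$1000000+".
count_million_plus <- function(housing) {
  # VAL is NA where property value does not apply (e.g. rentals),
  # so drop those rows instead of letting NA propagate into the sum.
  sum(housing$VAL == 24, na.rm = TRUE)
}
```

For example, after download.file() saves the file as ss06hid.csv: housing <- read.csv("ss06hid.csv"), then count_million_plus(housing).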

Q2. Use the data you loaded from Question 1. Consider the variable FES in the code book. Which of the “tidy data” principles does this variable violate?

Explanation: Tidy data requires each variable to occupy its own column. FES (“Family type and employment status”) combines two variables in a single code: the type of family and the labor-force status of its members.

Answer: Tidy data has one variable per column.

Q3. Download the Excel spreadsheet on the Natural Gas Acquisition Program here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx

Read rows 18-23 and columns 7-15 into R and assign the result to a variable called dat. What is the value of:

sum(dat$Zip*dat$Ext,na.rm=T)

Explanation: Read only rows 18-23 and columns 7-15 of the first sheet, so that row 18 supplies the column names (including Zip and Ext). Then multiply Zip by Ext element-wise and sum the products, skipping missing values.

Answer: 36534720
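A sketch of the read-and-sum step, assuming the readxl package is installed (swapped in here; xlsx::read.xlsx() with rowIndex = 18:23 and colIndex = 7:15 is another route). Columns 7-15 are spreadsheet columns G through O, so the cell range is "G18:O23". Both helper names are hypothetical.

```r
# Read rows 18-23 and columns 7-15 (G through O) of the first sheet.
# readxl uses the first row of the range (row 18) as the column names.
read_ngap_block <- function(path) {
  as.data.frame(readxl::read_excel(path, range = "G18:O23"))
}

# Element-wise product of Zip and Ext, summed while skipping missing values.
zip_ext_sum <- function(dat) {
  sum(dat$Zip * dat$Ext, na.rm = TRUE)
}
```

Download with download.file(url, "ngap.xlsx", mode = "wb") so the binary file is not corrupted on Windows, then call zip_ext_sum(read_ngap_block("ngap.xlsx")).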

Q4. Read the XML data on Baltimore restaurants from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml

How many restaurants have zipcode 21231?

Explanation: This involves parsing XML data to extract restaurant information, followed by filtering to count restaurants with the specific zipcode 21231.

Answer: 127
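A sketch of the parse-and-count step, assuming the XML package and that each restaurant's zip code sits in a <zipcode> element. The helper names extract_zipcodes() and count_zip() are hypothetical.

```r
# Collect the text of every <zipcode> element in the document.
extract_zipcodes <- function(path) {
  doc <- XML::xmlParse(path)
  XML::xpathSApply(doc, "//zipcode", XML::xmlValue)
}

# Count how many extracted zip codes equal the target (compared as text).
count_zip <- function(zips, target) {
  sum(zips == target)
}
```

xmlParse() may not fetch https URLs directly, so download the file first (for example to restaurants.xml) and then call count_zip(extract_zipcodes("restaurants.xml"), "21231").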

Q5. The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv

Using the fread() command, load the data into an R object DT. The following are ways to calculate the average value of the variable pwgtp15 broken down by sex. Using the data.table package, which will deliver the fastest user time?

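Explanation: In data.table, DT[i, j, by] evaluates the expression in j separately for each group defined by by, so DT[,mean(pwgtp15),by=SEX] computes the mean of pwgtp15 for each value of SEX in a single call using data.table's own grouping code. You can time each candidate with system.time(), which reports user time.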

Answer: DT[,mean(pwgtp15),by=SEX]
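A sketch contrasting the data.table idiom with a base-R cross-check, assuming DT was created with data.table::fread() (so data.table is already loaded). The helper names are hypothetical.

```r
# data.table idiom: j = mean(pwgtp15) evaluated once per group of SEX.
mean_by_sex_dt <- function(DT) {
  DT[, mean(pwgtp15), by = SEX]
}

# Base-R cross-check: the same group means as a named vector.
mean_by_sex_base <- function(df) {
  tapply(df$pwgtp15, df$SEX, mean)
}
```

For a rough comparison, wrap each call in system.time(). On a file of this size the differences can be small, so repeating each call several times gives a steadier reading.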

Getting and Cleaning Data Week 02 Quiz Answers

Q1. Register an application with the Github API here:
https://github.com/settings/applications
Access the API to get information on your instructor’s repositories (hint: this is the URL you want: “https://api.github.com/users/jtleek/repos”). Use this data to find the time that the datasharing repo was created. What time was it created?

Explanation: Send a GET request to that URL, parse the JSON response (a list of repositories), find the entry whose name is datasharing, and read its created_at field.

Answer: 2013-11-07T13:25:07Z
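A minimal sketch of the lookup, assuming jsonlite::fromJSON() simplifies the JSON array of repositories into a data frame with name and created_at columns. The helper name repo_created_at() is hypothetical.

```r
# Given the repository list as a data frame (one row per repository),
# return the creation timestamp of the named repository.
repo_created_at <- function(repos, repo_name) {
  repos$created_at[repos$name == repo_name]
}
```

Usage: repos <- jsonlite::fromJSON("https://api.github.com/users/jtleek/repos"), then repo_created_at(repos, "datasharing"). The GitHub API paginates this list (30 repositories per page by default), so if datasharing is missing, add ?per_page=100 to the URL.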

Q2. The sqldf package allows for execution of SQL commands on R data frames. We will use the sqldf package to practice the queries we might send with the dbSendQuery command in RMySQL.
Download the American Community Survey data and load it into an R object called acs:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv

Which of the following commands will select only the data for the probability weights pwgtp1 with ages less than 50?

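Explanation: The query has to select only the pwgtp1 column (select pwgtp1) and keep only rows where age is under 50 (where AGEP < 50). Writing “and pwgtp1” in the where clause filters on that column's values instead of selecting it, and select * returns every column.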

Answer: sqldf("select pwgtp1 from acs where AGEP < 50")
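A sketch of the selection, assuming the sqldf package, which by default looks up acs in the calling environment (here the function's own frame). A base-R version is included for cross-checking; both helper names are hypothetical.

```r
# Keep only the pwgtp1 column for respondents younger than 50.
select_pwgtp1_sql <- function(acs) {
  sqldf::sqldf("select pwgtp1 from acs where AGEP < 50")
}

# Base-R equivalent. which() drops rows where AGEP is NA,
# matching how a SQL where clause treats NULL.
select_pwgtp1_base <- function(acs) {
  acs[which(acs$AGEP < 50), "pwgtp1", drop = FALSE]
}
```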

Q3. Using the same data frame you created in the previous problem, what is the equivalent function to unique(acs$AGEP)?

Explanation: The task asks for the equivalent of the R unique() function using SQL queries with sqldf.

Answer: sqldf("select distinct AGEP from acs")

Q4. How many characters are in the 10th, 20th, 30th, and 100th lines of HTML from this page:
http://biostat.jhsph.edu/~jleek/contact.html

Explanation: This requires downloading the HTML content, extracting specific lines, and calculating the number of characters in each line.

Answer: 45 31 7 25
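A base-R sketch: read the page into a character vector with readLines(), then count characters with nchar(). The helper name line_lengths() is hypothetical.

```r
# Character counts of selected lines (defaults: lines 10, 20, 30 and 100).
line_lengths <- function(lines, which_lines = c(10, 20, 30, 100)) {
  nchar(lines[which_lines])
}
```

Usage: con <- url("http://biostat.jhsph.edu/~jleek/contact.html"); html <- readLines(con); close(con); line_lengths(html).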

Q5. Read this dataset into R and report the sum of the numbers in the fourth of the nine columns:
https://d396qusza40orc.cloudfront.net/getdata%2Fwksst8110.for

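Explanation: The file is fixed-width: after a few header lines, each row holds a week label followed by four pairs of SST and SSTA columns (nine columns in total). Read it with read.fwf(), skipping the header lines, and sum the fourth column.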

Answer: 32426.7
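A sketch with base R's read.fwf(). It assumes four header lines and field widths of 12, 7, 4, 9, 4, 9, 4, 9, 4 characters. These widths are assumptions to check against the raw text, because any shift in the field boundaries changes the parsed numbers. The helper name is hypothetical.

```r
# Assumed layout: 4 header lines, a 12-character week field, then
# four SST/SSTA pairs (7+4 characters for the first, 9+4 for the rest).
sum_fourth_column <- function(path) {
  sst <- read.fwf(path, skip = 4,
                  widths = c(12, 7, 4, 9, 4, 9, 4, 9, 4))
  sum(sst[, 4])
}
```

Download the file first (for example to wksst8110.for) and call sum_fourth_column("wksst8110.for").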

Getting and Cleaning Data Week 03 Quiz Answers

Q1. The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv
and load the data into R. The code book, describing the variable names is here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf
Create a logical vector that identifies the households on greater than 10 acres who sold more than $10,000 worth of agriculture products. Assign that logical vector to the variable agricultureLogical. Apply the which() function like this to identify the rows of the data frame where the logical vector is TRUE.
which(agricultureLogical)
What are the first 3 values that result?

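Explanation: In the code book, ACR == 3 means a house on ten or more acres and AGS == 6 means $10,000+ in sales of agriculture products. Combine the two conditions with & to build agricultureLogical. which() then returns the row numbers where it is TRUE and skips rows where the result is NA.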

Answer: 125, 238, 262
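A minimal sketch of the logical vector and which() step, using the ACR and AGS codes from the code book. The helper name agriculture_rows() is hypothetical.

```r
# ACR == 3: house on ten or more acres; AGS == 6: $10000+ in agriculture sales.
agriculture_rows <- function(housing) {
  agricultureLogical <- housing$ACR == 3 & housing$AGS == 6
  # which() returns only the positions that are TRUE, ignoring NA
  which(agricultureLogical)
}
```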

Q2. Using the jpeg package, read in the following picture of your instructor into R:
https://d396qusza40orc.cloudfront.net/getdata%2Fjeff.jpg
Use the parameter native=TRUE. What are the 30th and 80th quantiles of the resulting data? (Some Linux systems may produce an answer 638 different for the 30th quantile)

Explanation: This task involves reading in an image and calculating specific quantiles of the image data using R.

Answer: -15259150 -10575416
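A sketch of the quantile step. It assumes the image was read with jpeg::readJPEG(native = TRUE), which returns an integer matrix of packed pixel colours. The helper name is hypothetical.

```r
# 30th and 80th percentiles of the packed pixel values.
image_quantiles <- function(img) {
  quantile(img, probs = c(0.3, 0.8))
}
```

Usage: download.file(url, "jeff.jpg", mode = "wb"); img <- jpeg::readJPEG("jeff.jpg", native = TRUE); image_quantiles(img).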

Q3. Load the Gross Domestic Product data for the 190 ranked countries in this data set:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv
Load the educational data from this data set:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv
Match the data based on the country shortcode. How many of the IDs match? Sort the data frame in descending order by GDP rank (so United States is last). What is the 13th country in the resulting data frame?

Explanation: This task involves matching two datasets on a common field and sorting the resulting data by GDP rank to determine a specific country’s position.

Answer: 189 matches, 13th country is St. Kitts and Nevis
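A sketch of the merge and sort, using base R's merge() and order(). The reader assumes GDP.csv has four title lines followed by a blank line that read.csv() uses as the header. It also assumes nrows = 190 captures exactly the ranked countries, with code, rank, name and GDP in columns 1, 2, 4 and 5. Check these assumptions against the file. Both helper names are hypothetical.

```r
# Hypothetical reader for GDP.csv under the layout assumptions above.
read_gdp <- function(path) {
  raw <- read.csv(path, skip = 4, nrows = 190, stringsAsFactors = FALSE)
  gdp <- raw[, c(1, 2, 4, 5)]
  names(gdp) <- c("CountryCode", "Rank", "Economy", "GDP")
  gdp
}

# Inner join on the shortcode, then sort by rank from largest to smallest
# so the top-ranked economy (rank 1) ends up last.
merge_and_sort <- function(gdp, edu) {
  merged <- merge(gdp, edu, by = "CountryCode")
  merged[order(merged$Rank, decreasing = TRUE), ]
}
```

Usage: sorted <- merge_and_sort(read_gdp("GDP.csv"), read.csv("EDSTATS_Country.csv")). Then nrow(sorted) gives the number of matches and sorted[13, "Economy"] gives the 13th country.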

Q4. What is the average GDP ranking for the “High income: OECD” and “High income: nonOECD” group?

Explanation: This question requires calculating the average GDP ranking for two income groups based on the provided data.

Answer: 32.96667, 91.91304
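A sketch of the group averages. It assumes a merged data frame with a Rank column (GDP ranking) and the Income.Group column from EDSTATS_Country.csv. The helper name is hypothetical.

```r
# Mean GDP rank for the selected income groups, returned as a named vector.
mean_rank_by_group <- function(merged,
                               groups = c("High income: OECD", "High income: nonOECD")) {
  sub <- merged[merged$Income.Group %in% groups, ]
  # as.character() avoids empty factor levels showing up as NA groups
  tapply(sub$Rank, as.character(sub$Income.Group), mean)
}
```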

Q5. Cut the GDP ranking into 5 separate quantile groups. Make a table versus Income.Group. How many countries are Lower middle income but among the 38 nations with the highest GDP?

Explanation: This task involves dividing the GDP ranking into quantiles and creating a contingency table with the Income.Group variable to identify countries that are classified as “Lower middle income” but fall within the top 38 by GDP.

Answer: 5
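A sketch using base-R quantile() breaks with cut() and table(). It assumes a merged data frame with Rank and Income.Group columns, and the helper name is hypothetical. Group 1 holds the best (lowest-numbered) ranks.

```r
# Five quantile-based rank groups cross-tabulated against Income.Group;
# returns the "Lower middle income" count in the best-ranked group.
lower_middle_in_top_group <- function(merged) {
  breaks <- quantile(merged$Rank, probs = seq(0, 1, by = 0.2))
  rank_group <- cut(merged$Rank, breaks = breaks, include.lowest = TRUE)
  tab <- table(rank_group, merged$Income.Group)
  tab[1, "Lower middle income"]
}
```

A direct cross-check is sum(merged$Rank <= 38 & merged$Income.Group == "Lower middle income").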

Getting and Cleaning Data Week 04 Quiz Answers

Q1. The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv
and load the data into R. The code book, describing the variable names is here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf
Apply strsplit() to split all the names of the data frame on the characters “wgtp”. What is the value of the 123rd element of the resulting list?

Explanation: strsplit() returns a list with one character vector per column name. The 123rd name is wgtp15. Because “wgtp” occurs at the very start of that name, splitting on it leaves an empty string before the remaining “15”.

Answer: "" "15"
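A minimal sketch showing why the element starts with an empty string: strsplit() puts an empty piece before a match at the very start of a string. The helper name is hypothetical.

```r
# One character vector per column name, split wherever "wgtp" occurs.
split_names <- function(df) {
  strsplit(names(df), "wgtp")
}
```

With the housing data, split_names(housing)[[123]] returns the pieces of the 123rd column name.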

Q2. Load the Gross Domestic Product data for the 190 ranked countries in this data set:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv
Remove the commas from the GDP numbers in millions of dollars and average them. What is the average?

Explanation: This question involves cleaning the GDP data (removing commas) and then calculating the average GDP value for the countries.

Answer: 377652.4
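A sketch of the cleaning step. It assumes the GDP column was read as text, because the thousands separators keep read.csv() from parsing it as a number. The helper name is hypothetical.

```r
# Remove commas, convert to numbers (surrounding spaces are tolerated), average.
mean_gdp <- function(gdp_text) {
  mean(as.numeric(gsub(",", "", gdp_text)))
}
```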

Q3. In the data set from Question 2, what is a regular expression that would allow you to count the number of countries whose name begins with “United”? Assume that the variable with the country names in it is named countryNames. How many countries begin with United?

Explanation: This task uses the grep() function with a regular expression to match country names that start with “United.”

Answer: grep("^United",countryNames), 3
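A minimal sketch of turning the grep() result into a count. grep() returns the indices of matching elements, and "^" anchors the pattern at the start of the name. The helper name is hypothetical.

```r
# Number of country names that begin with "United".
count_united <- function(countryNames) {
  length(grep("^United", countryNames))
}
```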

Q4. Load the Gross Domestic Product data for the 190 ranked countries in this data set:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv
Load the educational data from this data set:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv
Match the data based on the country shortcode. Of the countries for which the end of the fiscal year is available, how many end in June?

Explanation: After merging the two datasets on the country shortcode, the fiscal-year information sits in the Special.Notes column of the educational data. Count the notes that contain “Fiscal year end: June”.

Answer: 13
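A sketch of the count. It assumes the merged data frame keeps the Special Notes column from EDSTATS_Country.csv, which read.csv() renames to Special.Notes. The helper name is hypothetical.

```r
# fixed = TRUE matches the phrase literally; grepl() returns FALSE for NA notes.
count_june_fiscal <- function(merged) {
  sum(grepl("Fiscal year end: June", merged$Special.Notes, fixed = TRUE))
}
```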

Q5. You can use the quantmod (http://www.quantmod.com/) package to get historical stock prices for publicly traded companies on the NASDAQ and NYSE. Use the following code to download data on Amazon’s stock price and get the times the data was sampled:

library(quantmod)
amzn = getSymbols("AMZN",auto.assign=FALSE)
sampleTimes = index(amzn)

How many values were collected in 2012? How many values were collected on Mondays in 2012?

Explanation: This question involves using the quantmod package to download stock price data for Amazon and then filtering the data for specific conditions (the year 2012 and Mondays).

Answer: 250, 47
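A base-R sketch of both counts. It uses as.POSIXlt(), whose wday field numbers weekdays from 0 (Sunday), so the check does not depend on locale day names. The helper name is hypothetical.

```r
# Count sample dates in 2012, and how many of those fall on a Monday.
count_2012 <- function(sampleTimes) {
  lt <- as.POSIXlt(sampleTimes)
  in2012 <- lt$year + 1900 == 2012   # POSIXlt years are offsets from 1900
  monday <- lt$wday == 1             # 0 = Sunday, 1 = Monday
  c(total = sum(in2012), mondays = sum(in2012 & monday))
}
```

Usage: count_2012(sampleTimes), with sampleTimes created by the quantmod code above.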

Conclusion

We hope this guide to Getting and Cleaning Data Quiz Answers helps you build a strong foundation in data cleaning and succeed in your course. Bookmark this page for easy access and share it with your classmates. Ready to master data preparation techniques and ace your quizzes? Let’s get started!

Sources: Getting and Cleaning Data
