Process Data from Dirty to Clean Week 01 Quiz Answers
Practice Quiz-1 Answers
L2 Maintaining data integrity:
Q1. Which process do data analysts use to make data more organized and easier to read?
- Data transfer
- Data manipulation
- Data uniformity
- Data replication
Q2. Fill in the blank: The degree to which data conforms to certain business rules or constraints determines the data’s _____.
Q3. Which of the following is an example of invalid data?
- A mandatory value that has been left blank
- Values for two customers with the same first initial but different last names
- A string data type containing more than one word
- A value that equals the last number in a data range
Practice Quiz-2 Answers
L3 Connect data to objectives:
Q1. Fill in the blank: Data being used for analysis should align with _____ and help answer stakeholder questions.
- project limitations
- current trends
- obsolete projects
- business objectives
Q2. Before analysis, a company collects data from countries that use different date formats. Which of the following updates would improve the data integrity?
- Remove data in an unfamiliar date format
- Change all of the dates to the same format
- Leave the dates in their current formats
- Organize the data by country
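Here is a minimal Python sketch of what "change all of the dates to the same format" can look like in practice. It assumes each record's source country is known. The country codes and format strings below are hypothetical. Every date is rewritten as ISO 8601 (YYYY-MM-DD). Knowing the source format matters because a value like 03/04/2023 is ambiguous on its own.

```python
from datetime import datetime

# Hypothetical mapping from a country code to the date format used in that country's data.
SOURCE_FORMATS = {
    "US": "%m/%d/%Y",  # month/day/year
    "UK": "%d/%m/%Y",  # day/month/year
}

def to_iso(date_text: str, country: str) -> str:
    """Parse a date written in the country's format and return it as YYYY-MM-DD."""
    parsed = datetime.strptime(date_text, SOURCE_FORMATS[country])
    return parsed.strftime("%Y-%m-%d")

rows = [("03/14/2023", "US"), ("14/03/2023", "UK")]
standardized = [to_iso(text, country) for text, country in rows]
print(standardized)  # ['2023-03-14', '2023-03-14']
```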
Q3. When should data analysts think about modifying a business objective? Select all that apply.
- When the data doesn’t align with the original objective
- When they find a row of duplicate data
- When the analysis is taking longer than expected
- When there is not enough data to meet the objective
Practice Quiz-3 Answers
L4 When to stop collecting data:
Q1. What should an analyst do if they do not have the data needed to meet a business objective? Select all that apply.
- Continue with the analysis using data from less reliable sources.
- Perform the analysis by finding and using proxy data from other datasets.
- Create and use hypothetical data that aligns with analysis predictions.
- Gather related data on a small scale and request additional time to find more complete data.
Q2. Which of the following are limitations that might lead to insufficient data? Select all that apply.
- Duplicate data
- Data from a single source
- Outdated data
- Data that updates continually
Q3. How can a data analyst eliminate the sampling bias of a population for a study about the most popular ice cream flavors?
- Random sampling
- Job-based sampling
- Geographical sampling
- Gender sampling
Q4. A data analyst wants to find out how many people in Utah have swimming pools. It’s unlikely that they can survey every Utah resident. Instead, they survey enough people to be representative of the population. This describes what data analytics concept?
- Margin of error
- Statistical significance
- Confidence level
Practice Quiz-4 Answers
L5 Testing your data:
Q1. A research team runs an experiment to determine if a new security system is more effective than the previous version. What type of results are required for the experiment to be statistically significant?
- Results that are real and not caused by random chance
- Results that are hypothetical and in need of more testing
- Results that are inaccurate and should be ignored
- Results that are unlikely to occur again
Q2. In order to have a high confidence level in a customer survey, what should the sample size accurately reflect?
- The predictions of stakeholders
- The most valuable members of the population
- The trends from other customer surveys
- The entire population
Q3. A data analyst determines an appropriate sample size for a survey. They can check their work by making sure the confidence level percentage plus the margin of error percentage add up to 100%.
Practice Quiz-5 Answers
L6 Consider the margin of error:
Q1. Fill in the blank: Margin of error is the _____ amount that the sample results are expected to differ from those of the actual population.
Q2. In a survey about a new cleaning product, 75% of respondents report they would buy the product again. The margin of error for the survey is 5%. Based on the margin of error, what percentage range reflects the population’s true response?
- Between 70% and 80%
- Between 75% and 80%
- Between 73% and 78%
- Between 70% and 75%
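The range in Q2 comes from subtracting the margin of error from the sample result and adding it back. A minimal Python sketch of that arithmetic:

```python
def margin_of_error_range(sample_pct: float, margin_pct: float) -> tuple[float, float]:
    """Return the (low, high) range implied by a sample result and its margin of error."""
    return (sample_pct - margin_pct, sample_pct + margin_pct)

low, high = margin_of_error_range(75, 5)
print(f"Between {low}% and {high}%")  # Between 70% and 80%
```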
Process Data from Dirty to Clean Weekly Challenge 1 Answers
Q1. Which of the following conditions are necessary to ensure data integrity? Select all that apply.
- Statistical power
Q2. What is one potential problem associated with data manipulation that analysts must be aware of?
- Data manipulation can help organize a dataset.
- Data manipulation can separate a dataset among different locations.
- Data manipulation can make a dataset easier to read.
- Data manipulation can introduce errors.
Q3. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Based on the available data, an analyst will be able to determine which country was the most populous from 2016 to 2017.
Q4. A data analyst is given a dataset for analysis.
Which of the following has duplicate data?
- Data for Valando on 2/18/2014
- Data for Valando on 1/1/2014
- Data for Symteco on 5/20/2014
- Data for Symteco on 2/21/2014
Q5. A data analyst is working on a project about the global supply chain. They have a dataset with lots of relevant data from Europe and Asia. However, they decide to generate new data that represents all continents. What type of insufficient data does this scenario describe?
- Data that keeps updating
- Data that’s outdated
- Data that’s geographically limited
- Data from only one source
Q6. A car manufacturer wants to learn more about the brand preferences of electric car owners. There are millions of electric car owners in the world. Who should the company survey?
- A sample of car owners who most recently bought an electric car
- A sample of all electric car owners
- A sample of car owners who have owned more than one electric car
- The entire population of electric car owners
Q7. Fill in the blank: Sampling bias in data collection happens when a sample isn’t representative of _____.
- a dataset about the population
- the population most affected by the data
- a subset of the population
- the population as a whole
Q8. Which of the following processes helps ensure a close alignment of data and business objectives?
- Completing data replication
- Transferring data multiple times
- Having data update automatically during analysis
- Maintaining data integrity
Process Data from Dirty to Clean Week 02 Quiz Answers
Practice Quiz-1 Answers
L2 Recognize clean vs. dirty data:
Q1. Describe the difference between a null and a zero in a dataset.
- A null signifies invalid data. A zero is missing data.
- A null indicates that a value does not exist. A zero is a numerical response.
- A null represents a value of zero. A zero represents an empty cell.
- A null represents a number with no significance. A zero represents the number zero.
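A quick way to see why the null/zero distinction matters is to aggregate a column that contains both. The sketch below uses SQLite through Python's built-in `sqlite3` module, with a hypothetical `survey` table. SQL aggregate functions skip NULLs but include zeros, so the two kinds of values produce different counts and averages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (respondent TEXT, purchases INTEGER)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [("A", 0), ("B", None), ("C", 10)],  # A answered "zero"; B never answered (NULL)
)

# COUNT(*) counts rows; COUNT(column) and AVG(column) skip NULLs but include zeros.
total_rows, answered, average = conn.execute(
    "SELECT COUNT(*), COUNT(purchases), AVG(purchases) FROM survey"
).fetchone()
print(total_rows, answered, average)  # 3 2 5.0
```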
Q2. What are the most common processes and procedures handled by data engineers? Select all that apply.
- Giving data a reliable infrastructure
- Developing, maintaining, and testing systems
- Verifying results of data analysis
- Transforming data into a useful format for analysis
Q3. What are the most common processes and procedures handled by data warehousing specialists? Select all that apply.
- Ensuring data is properly cleaned
- Ensuring data is available
- Ensuring data is backed up to prevent loss
- Ensuring data is secure
Q4. A data analyst is cleaning a dataset. They want to confirm that exactly three characters are present in each cell of a certain spreadsheet column. Which tool can they use?
- Character count
- Field length
Practice Quiz-3 Answers
L4 Cleaning data in spreadsheets:
Process Data from Dirty to Clean Weekly Challenge 2 Answers
Q1. Which of the following terms describes dirty data? Select all that apply.
Q2. Field length is a spreadsheet tool for determining if a field has been duplicated.
Q3. A data analyst notices that the customer in row 2 shares the same Customer ID as the customer in row 6. What does this scenario describe?
| 1 | Last name | First name | Middle initial | Customer ID |
- Duplicate data
- Mislabeled data
- Inconsistent data
- Obsolete data
Q4. Fill in the blank: Conditional formatting is a spreadsheet tool that changes how _____ appear when values meet a specific condition.
Q5. A data analyst uses the SPLIT function to divide a text string around a specified character and put each fragment into a new, separate cell. What is the specified character separating each item called?
Q6. For a function to work properly, data analysts must follow each function’s predetermined structure. What is this structure called?
Q7. You are working with the following selection of a spreadsheet:
| 2 | Sally Stewart | 9912 School St. North Wales, PA 19454 |
| 3 | Lorenzo Price | 8621 Glendale Dr. Burlington, MA 01803 |
| 4 | Stella Moss | 372 W. Addison Street Brandon, FL 33510 |
| 5 | Paul Casey | 9069 E. Brickyard Road Chattanooga, TN 37421 |
In order to extract the five-digit postal code from Burlington, MA, what is the correct function?
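In Google Sheets and Excel, RIGHT returns a given number of characters from the end of a text string. That works here because each address ends with a five-digit postal code. The minimal Python sketch below mirrors the idea with a negative slice. It assumes the addresses are stored as text, so a leading zero (as in 01803) is preserved.

```python
addresses = {
    "Sally Stewart": "9912 School St. North Wales, PA 19454",
    "Lorenzo Price": "8621 Glendale Dr. Burlington, MA 01803",
}

def postal_code(address: str, digits: int = 5) -> str:
    """Return the last `digits` characters, like a spreadsheet RIGHT(text, digits)."""
    return address[-digits:]

print(postal_code(addresses["Lorenzo Price"]))  # 01803
```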
Q8. A data analyst in a human resources department is working with the following selection of a spreadsheet:
| 1 | Year Hired | Last 4 of SS# | Department | Employee ID |
They want to create employee identification numbers (IDs) in column D. The IDs should include the year hired plus the last four digits of the employee’s Social Security Number (SS#). What function will create the ID 20093208 for the employee in row 5?
Q9. An analyst is cleaning a new dataset containing 500 rows. They want to make sure the data in cells B2 through B300 does not contain a number greater than 50. Which of the following COUNTIF function syntaxes could be used to answer this question? Select all that apply.
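A spreadsheet formula such as =COUNTIF(B2:B300,">50") counts the cells in the range whose value is greater than 50. A result of 0 confirms that no value exceeds the limit. The Python sketch below is a rough analog. The list is a hypothetical stand-in for the column's values.

```python
# Hypothetical stand-in for the values in cells B2 through B300.
column_b = [12, 48, 50, 7, 51, 33]

def count_if_greater(values, threshold):
    """Count values above threshold, similar to =COUNTIF(range, ">threshold")."""
    return sum(1 for v in values if v > threshold)

print(count_if_greater(column_b, 50))  # 1 (only 51 exceeds 50; 50 itself does not)
```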
Q10. The V in VLOOKUP stands for what?
Q11. Fill in the blank: Data mapping is the process of _____ fields from one data source to another.
Q12. Describe the relationship between a primary key and a foreign key.
- A primary key references a row in which each value is unique. A foreign key is a column within a table that is a primary key in another table.
- A primary key is a field within a table that is a foreign key in another table. A foreign key references a column in which each value is unique.
- A primary key references a column in a table in which each value is unique. A foreign key is a field within a table that is a primary key in another table.
- A primary key references a field within a table that is a foreign key in another table. A foreign key references a row in which each value is unique.
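To make the relationship concrete, here is a minimal sketch using SQLite via Python's `sqlite3` module. The `customers` and `orders` tables and their values are hypothetical. `customer_id` is the primary key of `customers`, so every value must be unique. In `orders`, the same column is a foreign key that must point at an existing customer. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

# Primary key: each customer_id in this column must be unique.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
# Foreign key: orders.customer_id must match a customer_id in customers.
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(customer_id))"
)

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # accepted: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # rejected: no customer 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```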
Process Data from Dirty to Clean Week 03 Quiz Answers
Practice Quiz-1 Answers
L2 More about SQL:
Q1. Which of the following are benefits of using SQL? Select all that apply.
- SQL can also be used to create web apps.
- SQL offers powerful tools for cleaning data.
- SQL can be adapted and used for multiple database programs.
- SQL can handle huge amounts of data.
Q2. Which of the following tasks can data analysts do using both spreadsheets and SQL? Select all that apply.
- Perform arithmetic
- Process huge amounts of data efficiently
- Use formulas
- Join data
Q3. SQL is a language used to communicate with databases. Like most languages, SQL has dialects. How should data analysts approach SQL dialects? Select all that apply.
- SQL dialects don’t change often, so data analysts should pick one and master it.
- SQL dialects apply to different database programs, so data analysts should first master Standard SQL.
- SQL dialects vary company by company, so data analysts should learn the dialect their company uses.
- SQL has different dialects, and data analysts must learn all of them.
Practice Quiz-2 Answers
L3 Learn basic SQL queries:
Q1. Which of the following SQL functions can data analysts use to clean string variables? Select all that apply.
Q2. You are working with a database of information about middle school students. The student_data table contains the name and eight-digit identification (ID) number for each student. The first four digits of each ID number correspond to the student’s graduation year. For example, 20267482 indicates the student will graduate in 2026.
The identification number is stored as a string in the id_number column. How do you complete this query to return the name of all students who will graduate in 2026?
SELECT name FROM student_data WHERE
- SUBSTR(id_number, 4, 1) = '2026'
- SUBSTR(id_number, 1, 4) = '2026'
- SUBSTR = '2026' (id_number, 4, 1)
- SUBSTR = '2026' (id_number, 1, 4)
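The sketch below runs the graduation-year filter in SQLite through Python's `sqlite3` module. The student names, and every ID except 20267482, are hypothetical. SUBSTR(text, start, length) counts from position 1, so characters 1 through 4 hold the year. Other SQL dialects may differ slightly in edge cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_data (name TEXT, id_number TEXT)")
conn.executemany(
    "INSERT INTO student_data VALUES (?, ?)",
    [("Avery", "20267482"), ("Blake", "20271135"), ("Casey", "20260019")],
)

# SUBSTR(text, start, length) is 1-based: characters 1-4 hold the graduation year.
rows = conn.execute(
    "SELECT name FROM student_data WHERE SUBSTR(id_number, 1, 4) = '2026' ORDER BY name"
).fetchall()
names = [name for (name,) in rows]
print(names)  # ['Avery', 'Casey']
```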
Q3. A data analyst wants to confirm that all of the text strings in a table are the correct length. How would they complete the following query to return any routes greater than 10 characters long?
SELECT route FROM US_roads_data WHERE
- LENGTH = (route) < 10
- LENGTH(route) > 10
- LENGTH = (route) > 10
- LENGTH(route) < 10
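Here is a minimal SQLite sketch of the length check, run from Python with a few hypothetical routes. The same pattern with LENGTH(state) > 2 covers the state-code check in Weekly Challenge 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE US_roads_data (route TEXT)")
conn.executemany(
    "INSERT INTO US_roads_data VALUES (?)",
    [("I-95",), ("US-1",), ("Route 66 Historic",)],
)

# LENGTH(route) counts characters; "> 10" flags routes longer than 10 characters.
too_long = [r for (r,) in conn.execute(
    "SELECT route FROM US_roads_data WHERE LENGTH(route) > 10"
)]
print(too_long)  # ['Route 66 Historic']
```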
Process Data from Dirty to Clean Weekly Challenge 3 Answers
Q1. Data analysts choose SQL for which of the following reasons? Select all that apply.
- SQL is a programming language that can also create web apps
- SQL is a powerful software program
- SQL is a well-known standard in the professional community
- SQL can handle huge amounts of data
Q2. In which of the following situations would a data analyst use spreadsheets instead of SQL? Select all that apply.
- When visually inspecting data
- When working with a dataset with more than 1,000,000 rows
- When working with a small dataset
- When using a language to interact with multiple database programs
Q3. A data analyst creates many new tables in their company’s database. When the project is complete, the analyst wants to remove the tables so they don’t clutter the database. What SQL commands can they use to delete the tables?
- INSERT INTO
- CREATE TABLE IF NOT EXISTS
- DROP TABLE IF EXISTS
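A small SQLite sketch of cleaning up a work table. The table name `scratch_results` is hypothetical. Because of IF EXISTS, running the same DROP twice is harmless: the second call does nothing instead of raising an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS scratch_results (id INTEGER)")  # hypothetical work table

# IF EXISTS makes the drop safe to rerun: the second call is a no-op, not an error.
conn.execute("DROP TABLE IF EXISTS scratch_results")
conn.execute("DROP TABLE IF EXISTS scratch_results")

remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'scratch_results'"
).fetchall()
print(remaining)  # []
```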
Q4. A data analyst is cleaning customer data for an online retail company. They are working with the following section of a database:
The analyst wants to find out if the state data is consistent and if any text strings contain more than two characters. What is the correct SQL clause to use to find any text strings containing more than two characters?
- WHERE(state) > 2
- DISTINCT(state) > 2
- SUBSTR(state) > 2
- LENGTH(state) > 2
Q5. Fill in the blank: The _____ function counts the number of characters a string contains.
Q6. In SQL databases, what data type refers to a number that contains a decimal?
Q7. Fill in the blank: In SQL databases, the _____ function can be used to convert data from one datatype to another.
Q8. Fill in the blank: The _____ function can be used to return non-null values in a list.
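COALESCE reads its arguments left to right and returns the first one that is not NULL. A typical use is falling back from one column to another. This is a minimal SQLite sketch with a hypothetical `contacts` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, mobile TEXT, work_phone TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [("Kim", None, "555-0100"), ("Lee", "555-0199", "555-0101"), ("Ray", None, None)],
)

# COALESCE returns its first non-null argument, reading left to right.
rows = conn.execute(
    "SELECT name, COALESCE(mobile, work_phone, 'no phone') FROM contacts ORDER BY name"
).fetchall()
print(rows)  # [('Kim', '555-0100'), ('Lee', '555-0199'), ('Ray', 'no phone')]
```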
Process Data from Dirty to Clean Week 04 Quiz Answers
Practice Quiz-1 Answers
L2 Manually cleaning data:
Q1. Making sure data is properly verified is an important part of the data-cleaning process. Which of the following tasks are involved in this verification? Select all that apply.
- Rechecking the data-cleaning effort
- Providing a list of updates to stakeholders
- Manually fixing any errors data analysts find
- Comparing the original purpose of the project to the findings
Q2. An analyst has just finished cleaning a dataset. Before analysis, why might the analyst want to revisit the business problem? Select all that apply.
- To confirm that the data is capable of meeting project objectives
- To consider whether the data can help solve the business problem
- To select which data points to include in analysis
- To schedule a meeting with stakeholders
Q3. A data analyst is cleaning a dataset with inconsistent formats and repeated cases. They use the TRIM function to remove extra spaces from string variables. What other tools can they use for data cleaning? Select all that apply.
- Import data
- Remove duplicates
- Pivot table
- Protect sheet
Practice Quiz-2 Answers
L3 Documenting cleaning results:
Q1. Fill in the blank: While cleaning data, documentation is used to track _____. Select all that apply.
Q2. Why is it important for a data analyst to document the evolution of a dataset? Select all that apply.
- To inform other users of changes
- To determine the quality of the data
- To identify best practices in the collection of data
- To recover data-cleaning errors
Practice Quiz-3 Answers
L4 Documenting the cleaning process:
Q1. Which of the following data errors can be eliminated by documenting the data-cleaning process? Select all that apply.
- Human error in data entry
- System issues
- Flawed processes
- Premature feedback
Q2. Documenting data-cleaning makes it possible to achieve what goals? Select all that apply.
- Demonstrate to project stakeholders that you are accountable
- Be transparent about your process
- Visualize the results of your data analysis
- Keep team members on the same page
Process Data from Dirty to Clean Weekly Challenge 4 Answers
Q1. The data collected for an analysis project has just been cleaned. What are the next steps for a data analyst? Select all that apply.
Q2. A data analyst is in the verification step. They consider the business problem, the goal, and the data involved in their analytics project. What scenario does this describe?
- Reporting on the data
- Seeing the big picture
- Considering the stakeholders
- Visualizing the data
Q3. Which function removes leading, trailing, and repeated spaces in data?
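In Google Sheets and Excel, TRIM removes leading and trailing spaces and reduces repeated spaces between words to a single space. In SQL (SQLite here), TRIM only strips characters from the ends of the string. The sketch below contrasts the two on a hypothetical string. The spreadsheet behavior is approximated in Python.

```python
import sqlite3

raw = "  indoor   paint  sample "

# Spreadsheet-style TRIM: drop leading/trailing spaces and collapse runs of spaces.
# (str.split() with no argument also treats tabs and newlines as separators.)
sheet_style = " ".join(raw.split())

# SQL TRIM (shown with SQLite) only strips the ends; the inner runs of spaces stay.
conn = sqlite3.connect(":memory:")
(sql_style,) = conn.execute("SELECT TRIM(?)", (raw,)).fetchone()

print(repr(sheet_style))  # 'indoor paint sample'
print(repr(sql_style))    # 'indoor   paint  sample'
```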
Q4. A data analyst uses the COUNTA function to count which of the following?
- The total number of headers in a specific range.
- The total number of values within a specified range.
- The total number of entries in a changelog.
- The specific numbers in a dataset.
Q5. A WHEN statement considers one or more conditions and returns a value as soon as that condition is met.
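In SQL, this kind of conditional logic is written as a CASE expression. Each WHEN ... THEN pair is checked in order, and the value for the first matching condition is returned. ELSE supplies a default. Below is a minimal SQLite sketch that fixes hypothetical misspelled names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("Tony",), ("Tnoy",), ("Cecilia",), ("Cecelia",)],
)

# CASE checks each WHEN in order and returns the THEN value of the first match.
rows = conn.execute(
    """
    SELECT CASE
               WHEN customer_name = 'Tnoy' THEN 'Tony'
               WHEN customer_name = 'Cecelia' THEN 'Cecilia'
               ELSE customer_name
           END AS cleaned_name
    FROM customers
    """
).fetchall()
cleaned = sorted(name for (name,) in rows)
print(cleaned)  # ['Cecilia', 'Cecilia', 'Tony', 'Tony']
```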
Q6. What is the process of tracking changes, additions, deletions, and errors during data cleaning?
Q7. Fill in the blank: A changelog contains a _____ list of modifications made to a project.
Q8. Reviewing version history is an effective way to view a changelog in SQL.
Process Data from Dirty to Clean Week 05 Quiz Answers
Practice Quiz-1 Answers
Process Data from Dirty to Clean Course Challenge Answers
Q1. You are a data analyst at a small analytics company. Your company is hosting a project kick-off meeting with a new client, Meer-Kitty Interior Design. The agenda includes reviewing their goals for the year, answering any questions, and discussing their available data.
Meer-Kitty Interior Design About Us Page.pdf
Meer-Kitty Interior Design Business Plan.pdf
Meer-Kitty Interior Design has two goals. They want to expand their online audience, which means getting their company and brand known by as many people as possible. They also want to launch a line of high-quality indoor paint to be sold in-store and online. You decide to consider the data about indoor paint first.
Kitty Survey Feedback – Meer-Kitty survey feedback.csv
You are pleased to find that the available data is aligned to the business objective. However, you do some research about confidence level for this type of survey and learn that you need at least 120 unique responses for the survey results to be useful. Therefore, the dataset has two limitations: First, there are only 40 responses; second, a Meer-Kitty superfan, User 588, completed the survey 11 times.
As the survey has too few responses and numerous duplicates that are skewing results, what are your options? Select all that apply.
- Repeat the survey in order to create a new, improved dataset.
- Locate another dataset about indoor paint.
- Remove the duplicates from the data and proceed with analysis.
- Talk with stakeholders and ask for more time.
Q2. During the meeting, you also learn that Meer-Kitty videos are hosted on their website. For each product offered, there is an accompanying video for customers to learn more. So, more views for a video suggests greater consumer interest.
Your goal is to identify which videos are most popular, so Meer-Kitty knows what topics to explore in the future. Unfortunately, Meer-Kitty has just three months of data available because they only recently launched the videos on their site.
Without enough data to identify long-term trends about the video subjects that people prefer, what should you do?
- Find an alternate data source that will still enable you to meet your objective.
- Watch the videos and use your gut instinct to identify which are most successful.
- Tell the client you’re sorry, but there is no way to meet their objective.
- Move ahead with the data you have to determine the top video subjects.
Q3. Now that you’ve identified some limitations with Meer-Kitty’s data, you want to communicate your concerns to stakeholders. In addition to insufficient video trend data, your main concern with the indoor paint survey is that the data isn’t representative of the population as a whole.
Clearly, one particular respondent, the superfan, is overrepresented.
When surveying people for Meer-Kitty in the future, what are some best practices you can use to address some of the issues associated with sampling bias? Select all that apply.
- Increase sample size
- Use data that keeps updating
- Use data from only one source
- Use random sampling
Q4. The stakeholders understand your concerns and agree to repeat the indoor paint survey. In a few weeks, you have a much better dataset with more than 150 responses and no duplicates.
Kitty Survey Feedback – New Meer-Kitty survey feedback.csv
You notice that questions 4 and 5 are dependent on the respondent’s answer to question 3. So, you need to determine how many people answered Yes to question 3, then compare that to responses to questions 4 and 5. That way, you will know if questions 4 and 5 have any nulls.
You decide to use a spreadsheet tool that changes how cells appear when they contain the word Yes. Which tool do you use?
- Data validation
- Conditional formatting
Q5. You continue cleaning the data. You use tools such as remove duplicates and COUNTIF to ensure the dataset is complete, correct, and relevant to the problem you’re trying to solve. Then, you complete the verification and reporting processes to share the details of your data-cleaning effort with your team.
While reviewing, your team notes one aspect of data cleaning that would improve the dataset even more. They point out that the new survey also has a new question in Column G: “What are your favorite indoor paint colors?” This was a free-response question, so respondents typed in their answers. Some people included multiple different colors of paint. In order to determine which colors are most popular, it will be necessary to put each color in its own cell.
What spreadsheet function enables you to put each of the colors in Column G into a new, separate cell?
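In Google Sheets, SPLIT(text, delimiter) divides a string around the delimiter and puts each piece in its own cell. Once each color sits on its own, the colors can be tallied. The Python sketch below assumes respondents separated colors with commas. The answers are hypothetical, and the extra trimming and lowercasing are added cleaning steps, not part of SPLIT itself.

```python
from collections import Counter

# Hypothetical free-response answers from Column G, with commas as the delimiter.
answers = ["Sage, Navy", "navy", "Ivory,Sage , Navy", "Terracotta"]

def split_colors(answer: str, delimiter: str = ",") -> list[str]:
    """Split one response on the delimiter (like SPLIT), then tidy each fragment."""
    return [part.strip().lower() for part in answer.split(delimiter) if part.strip()]

counts = Counter(color for answer in answers for color in split_colors(answer))
print(counts.most_common(2))  # [('navy', 3), ('sage', 2)]
```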
Q6. You’ve completed this program and are interviewing for a junior data scientist position. The job is at B.Spoke Market Research, a company that analyzes market conditions using customer surveys and other research methods. The detailed job description can be found below:
C4 B.Spoke Market Research Job Description.pdf
So far, you’ve had a phone interview with a recruiter and you’ve secured a second interview with the B.Spoke team. The recruiter’s email can be found below:
C4 S2 Email from Recruiter.pdf
You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Jodie Choi, the data science lead. After welcoming you, the behavioral interview begins.
For your first question, your interviewer wants to learn about your experience with spreadsheets. She says: Sometimes the team needs data that is stored in different spreadsheets. So, we use a spreadsheet function to find the information we need.
There is a spreadsheet function that searches for a value in the first column of a given range and returns the value of a specified cell in the row in which it is found. It is called SEARCH.
Q7. Next, your interviewer wants to know more about your understanding of tools that work in both spreadsheets and SQL. She explains that the data her team receives from customer surveys sometimes has many duplicate entries.
She says: Spreadsheets have a great tool for that called remove duplicates. In SQL, you can include DISTINCT to do the same thing. In which part of the SQL statement do you include DISTINCT?
- The FROM statement
- The WHERE statement
- The UPDATE statement
- The SELECT statement
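DISTINCT is written immediately after SELECT and removes duplicate rows from the result. Here is a minimal SQLite sketch with hypothetical survey rows, where one respondent submitted the same answer three times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey_responses (customer_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO survey_responses VALUES (?, ?)",
    [(588, 5), (588, 5), (588, 5), (101, 3), (102, 4)],
)

# DISTINCT goes right after SELECT and removes duplicate result rows.
rows = conn.execute(
    "SELECT DISTINCT customer_id, rating FROM survey_responses ORDER BY customer_id"
).fetchall()
print(rows)  # [(101, 3), (102, 4), (588, 5)]
```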
Q8. Now, your interviewer explains that the data team usually works with very large amounts of customer survey data. After receiving the data, they import it into a SQL table. But sometimes, the new dataset imports incorrectly and they need to change the format.
She asks: What function would you use to convert data in a SQL table from one datatype to another?
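CAST(expression AS type) converts a value to another data type. Type names vary by dialect, for example INTEGER in SQLite and INT64 in BigQuery. This minimal SQLite sketch assumes a hypothetical import in which numeric scores landed in a text column. Sorting them as text gives the wrong order; casting them fixes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical import where numeric survey scores were stored as TEXT.
conn.execute("CREATE TABLE raw_import (score_text TEXT)")
conn.executemany("INSERT INTO raw_import VALUES (?)", [("7",), ("10",), ("9",)])

# Sorting text compares characters, so '10' comes before '7'.
as_text = [v for (v,) in conn.execute("SELECT score_text FROM raw_import ORDER BY score_text")]
# CAST converts each value to INTEGER, giving numeric order.
as_int = [v for (v,) in conn.execute(
    "SELECT CAST(score_text AS INTEGER) AS score FROM raw_import ORDER BY score"
)]
print(as_text)  # ['10', '7', '9']
print(as_int)   # [7, 9, 10]
```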
Q9. Next, your interviewer explains that one of their clients is an online retailer that needs to create product numbers for a vast inventory. Her team does this by combining the text strings for product number, manufacturing date, and color.
She asks: Which SQL function would you use to add strings together to create new text strings?
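In BigQuery and many other dialects, strings are joined with CONCAT. Only recent SQLite releases (3.44 and later) include concat(), so this sketch uses the standard || operator, which does the same job. The product values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_number TEXT, mfg_date TEXT, color TEXT)")
conn.execute("INSERT INTO products VALUES ('P1042', '20230115', 'RED')")

# SQLite's || operator joins strings; in dialects such as BigQuery this could be
# written as CONCAT(product_number, mfg_date, color).
(code,) = conn.execute(
    "SELECT product_number || mfg_date || color FROM products"
).fetchone()
print(code)  # P104220230115RED
```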
Q10. For your final question, your interviewer explains that her team often comes across data with extra spaces.
She asks: Which function would enable you to eliminate those extra spaces? You respond: To eliminate extra spaces for consistency, use the TRIM function.