### Exploratory Data Analysis for Machine Learning Week 01 Quiz Answers

#### Quiz 01: Check for Understanding

Q1. (True/False) Machine Learning is a subset of Artificial Intelligence

- False
- **True**

Q2. (True/False) Deep Learning is a subset of Machine Learning

- False
- **True**

Q3. (True/False) Machine Learning consists in programming computers to learn from real-time human interactions

- **False**
- True

Q4. (True/False) Machine Learning is the same as Artificial Intelligence

- False
- True

#### Quiz 02: Check for Understanding

Q1. (True/False) AI Winters happened mostly due to the lack of understanding behind the theory of neural networks

- True
- False

Q2. Most applications that use computer vision use models that were trained using this discipline:

- Machine Learning
- Artificial Intelligence
- Deep Learning

Q3. In the Machine Learning Workflow, the main goal of the Data Exploration and Preprocessing step is to:

- Identify what data is best suited to find a solution to your business problem
- Determine how to clean your data such that you can use it to train a model

#### Module 1 Quiz

Q1. Assume you have a data set that summarizes a marketing campaign with information related to prospective customers. The data set contains 100 observations with several columns that summarize information about the prospective customer. It also has a column that flags whether the prospect responded or not.

In this example, “Yes” or “No” are the possible values of the:

- label
- features
- target
- example

Q2. Assume you have a data set that summarizes a marketing campaign with information related to prospective customers. The data set contains 100 observations with several columns that summarize information about the prospective customer. It also has a column that flags whether the prospect responded or not.

In this context, observation is a synonym of:

- label
- features
- target
- example

Q3. Assume you have a data set that summarizes a marketing campaign with information related to prospective customers. The data set contains 100 observations with several columns that summarize information about the prospective customer. It also has a column that flags whether the prospect responded or not.

A machine learning model that predicts response is using the column Responded as a:

- label
- features
- target
- example

#### Quiz 03: Check for Understanding

Q1. Which statement about the Pandas read_csv function is TRUE?

- It can only read comma-delimited data.
- It can read both tab-delimited and space-delimited data.
- It reads data into a 2-dimensional NumPy array.
- It allows only one argument: the name of the file.

Q2. Which of the following is an example of a file type that uses JavaScript Object Notation (JSON) formatting?

- Python (.py files)
- JavaScript (.js files)
- SQL Database (.sql files)
- Jupyter/iPython (.ipynb files)

Q3. The data below appears in ‘data.txt’, and Pandas has been imported. Which Python command will read it correctly into a Pandas DataFrame?

63.03 22.55 39.61 40.48 98.67 -0.25 AB

39.06 10.06 25.02 29 114.41 4.56 AB

68.83 22.22 50.09 46.61 105.99 -3.53 AB

- pandas.read_csv('data.txt')
- pandas.read_csv('data.txt', header=None, sep=' ')
- pandas.read_csv('data.txt', delim_whitespace=True)
- pandas.read_csv('data.txt', header=0, delim_whitespace=True)
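As a rough illustration of how `read_csv` treats the whitespace-delimited sample above, the sketch below loads the same three rows from an in-memory string instead of `data.txt`. Passing `header=None` is an assumption made here so that all three rows are kept as data. The `sep=r"\s+"` argument is the regular-expression spelling of whitespace splitting; recent pandas releases deprecate `delim_whitespace=True` in favor of it.

```python
import io

import pandas as pd

# The three sample rows from the question, held in memory instead of data.txt.
raw = """63.03 22.55 39.61 40.48 98.67 -0.25 AB
39.06 10.06 25.02 29 114.41 4.56 AB
68.83 22.22 50.09 46.61 105.99 -3.53 AB
"""

# Default call: no commas are found, so each line lands in a single column
# and the first line is consumed as the header.
default_df = pd.read_csv(io.StringIO(raw))
print(default_df.shape)  # (2, 1)

# Split on runs of whitespace and treat every line as data (header=None is
# an assumption of this sketch, not something stated in the quiz).
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
print(df.shape)  # (3, 7)
```

Without `header=None`, the first data row would be used as column names, which is worth checking whenever a file has no header line.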

#### Quiz 05: Check for Understanding

Q1. (True/False) Outliers must be very extreme to noticeably impact the fit of a statistical model.

- True
- False

Q2. (True/False) Outliers should always be replaced, since they never contain useful information about the data.

- True
- False

Q3. Which residual-based approach to identifying outliers compares running a model with all data to running the same model, but dropping a single observation?

- Standardized residuals
- Unstandardized residuals
- Externally-studentized residuals
- Abnormally-studentized residuals
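To make the leave-one-out idea in this question concrete, here is a minimal NumPy sketch of externally studentized residuals for a simple linear regression. The data, the function name, and the injected outlier are all hypothetical. The code uses the standard closed-form shortcut based on the hat matrix rather than literally refitting the model once per observation; both give the same values.

```python
import numpy as np


def externally_studentized_residuals(x, y):
    """Residuals scaled by an error estimate that excludes each point in turn."""
    X = np.column_stack([np.ones(len(x)), x])  # design matrix with intercept
    n, p = X.shape
    hat = X @ np.linalg.inv(X.T @ X) @ X.T     # hat (projection) matrix
    leverage = np.diag(hat)
    resid = y - hat @ y                        # ordinary residuals
    sse = np.sum(resid ** 2)
    # Error variance of the model fitted WITHOUT observation i.
    s2_without_i = (sse - resid ** 2 / (1 - leverage)) / (n - p - 1)
    return resid / np.sqrt(s2_without_i * (1 - leverage))


# Hypothetical data: a nearly linear trend with one injected outlier at index 5.
x = np.arange(10, dtype=float)
y = 2 * x + 1 + 0.1 * np.sin(x)
y[5] += 10

t = externally_studentized_residuals(x, y)
most_extreme = int(np.argmax(np.abs(t)))
print(most_extreme, abs(t[most_extreme]) > 3)
```

Because the outlier is left out when its own error scale is estimated, it cannot inflate that estimate and mask itself, which is the reason this variant is often preferred for outlier screening.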

#### Quiz 06: Check for Understanding

Q1. From the options listed below, select the option that is NOT a valid exploratory data approach to visually confirm whether your data is ready for modeling or if it needs further cleaning or data processing:

- Create a panel plot that shows distributions for the dependent variable and scatter plots for all independent variables
- Train a model and identify the observations with the largest residuals
- Create visualizations for scatter plots, histograms, box plots, and hexbin plots
- Create a correlation heatmap to confirm the sign and magnitude of correlation across your features.

Q2. These are two of the most common libraries for data visualization:

- matplotlib and seaborn
- scipy and seaborn
- numpy and matplotlib
- scipy and numpy

Q3. (True/False) You can use the pandas library to create plots.

- True
- False

#### Quiz 07: Check for Understanding

Q1. (True/False) Classification models require that input features be scaled.

- True
- False

Q2. (True/False) Feature scaling allows better interpretation of distance-based approaches.

- True
- False

Q3. (True/False) Feature scaling reduces distortions caused by variables with different scales.

- True
- False
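The questions above touch on why scale matters for distance-based methods. The sketch below, using made-up ages and incomes, measures how much of the squared Euclidean distance between two rows comes from the income column before and after standardizing each column. The helper name and the numbers are illustrative assumptions only.

```python
import numpy as np

# Hypothetical customers: columns are age (years) and income (dollars).
data = np.array([
    [25.0, 50_000.0],
    [60.0, 50_500.0],
    [26.0, 52_000.0],
])


def income_share_of_distance(points, i, j):
    """Fraction of the squared Euclidean distance between rows i and j
    that is contributed by the income column (column 1)."""
    squared_diffs = (points[i] - points[j]) ** 2
    return squared_diffs[1] / squared_diffs.sum()


# Standardize each column to mean 0 and standard deviation 1.
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

raw_share = income_share_of_distance(data, 0, 1)
scaled_share = income_share_of_distance(standardized, 0, 1)
print(round(raw_share, 3), round(scaled_share, 3))
```

On the raw data the 35-year age gap is almost invisible next to a 500-dollar income gap; after scaling, both features contribute on comparable terms.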

#### Module 2 Quiz

Q1. Which of the following statements about cloud data access using Pandas is TRUE?

- With read_csv, the online file must be comma-delimited.
- The read_csv function can read data directly from a website or URL.
- With read_csv, the destination file must have column names in the first row.
- A remote destination file must be downloaded locally before it can be read by Pandas.

Q2. In which case below is it most plausible to conclude that an observation includes an outlier for one of the features?

- One feature has a deleted residual value above 3.
- The observation includes the maximum target value.
- The observation is missing values for several of the features.
- One feature has an internally-studentized residual value above 3.

Q3. Which of these approaches to feature engineering will be impacted LEAST by extreme values?

- RobustScaler
- MinMaxScaler
- LabelBinarizer
- OneHotEncoder

Q4. Which of these approaches to feature engineering will be impacted MOST by extreme values?

- RobustScaler
- MinMaxScaler
- LabelBinarizer
- OneHotEncoder

Q5. (True/False) RobustScaler adapts MinMaxScaler to account for outliers.

- True
- False

Q6. (True/False) StandardScaler requires data that are normally distributed.

- True
- False

Q7. (True/False) Any features that have been transformed by StandardScaler, MinMaxScaler, or RobustScaler will take values in (0,1)

- True
- False
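The scaler questions above can be explored with a small sketch. The functions below are hand-written NumPy versions of the formulas usually associated with the default settings of scikit-learn's StandardScaler, MinMaxScaler, and RobustScaler; they are not the library code, and the sample values are invented to include one extreme point.

```python
import numpy as np

# Hypothetical feature with one extreme value.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])


def standard_scale(values):
    """Center on the mean and divide by the standard deviation."""
    return (values - values.mean()) / values.std()


def min_max_scale(values):
    """Map the minimum to 0 and the maximum to 1."""
    return (values - values.min()) / (values.max() - values.min())


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, q3 = np.percentile(values, [25, 75])
    return (values - np.median(values)) / (q3 - q1)


standard = standard_scale(x)
min_max = min_max_scale(x)
robust = robust_scale(x)

print(standard.round(3))
print(min_max.round(3))
print(robust.round(3))
```

In this sketch the min-max version squeezes the four ordinary values close to 0 because the extreme value defines the range, while the median and interquartile range used by the robust version are unaffected by it. The standardized and robust outputs also fall outside the unit interval.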

Q8. Which of the following assertions describes a good reason for using scatter plots to complement calculating the correlation coefficient between two variables?

- A scatter plot helps you visualize whether outliers are inflating or deflating a correlation coefficient
- A scatter plot will help you identify if the correlation is positive or negative
- A scatter plot takes into account both the Spearman and the Pearson correlation coefficients in a single step.
- It is computationally more efficient to produce a scatter plot first and then compute a correlation coefficient.
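As a small numeric companion to this question, the sketch below computes the Pearson correlation for an invented data set with and without a single extreme point. The values are hypothetical and chosen so that the ten ordinary points are nearly uncorrelated.

```python
import numpy as np

# Hypothetical data: ten points with almost no linear relationship.
x = np.arange(1, 11, dtype=float)
y = np.array([5, 3, 6, 2, 7, 4, 6, 3, 5, 4], dtype=float)

r_without_outlier = np.corrcoef(x, y)[0, 1]

# Add one extreme observation far from the rest of the cloud.
x_with = np.append(x, 50.0)
y_with = np.append(y, 50.0)

r_with_outlier = np.corrcoef(x_with, y_with)[0, 1]

print(round(r_without_outlier, 3), round(r_with_outlier, 3))
```

A single point is enough to move the coefficient from roughly zero to nearly one, and the coefficient alone gives no hint that this is happening. A scatter plot of the same data would show one isolated point driving the result.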

### Exploratory Data Analysis for Machine Learning Week 02 Quiz Answers

#### Quiz 01: Check for Understanding

Q1. (True/False) In general, the population parameters are unknown

- True.
- False.

Q2. (True/False) Parametric models have a **finite** number of parameters.

- True.
- False.

Q3. The most common way of estimating parameters in a parametric model is:

- using the maximum likelihood estimation
- using the central limit theorem
- extrapolating a non-parametric model
- extrapolating Bayesian statistics

#### Quiz 02: Check for Understanding

Q1. A p-value is

- the smallest significance level at which the null hypothesis would be rejected
- the probability of the null hypothesis being true
- the probability of the null hypothesis being false
- the smallest significance level at which the null hypothesis is accepted
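For a concrete feel of what a p-value measures, here is a minimal sketch of an exact two-sided binomial test for a fair coin, using only the standard library. The scenario (60 heads in 100 flips) and the function name are assumptions made for illustration.

```python
from math import comb


def two_sided_binomial_p_value(heads, flips):
    """Exact two-sided p-value for H0: the coin is fair (p = 0.5).

    Under H0 the distribution is symmetric, so the two-sided p-value is
    twice the probability of a result at least as far from flips / 2.
    """
    k = max(heads, flips - heads)
    upper_tail = sum(comb(flips, i) for i in range(k, flips + 1)) / 2 ** flips
    return min(1.0, 2 * upper_tail)


p_value = two_sided_binomial_p_value(60, 100)
print(round(p_value, 4))

# The same data lead to different decisions at different significance levels.
reject_at_10_percent = p_value <= 0.10
reject_at_5_percent = p_value <= 0.05
print(reject_at_10_percent, reject_at_5_percent)
```

The null hypothesis is rejected at any significance level at or above the p-value and not rejected below it. The p-value is computed assuming the null hypothesis holds, so it is not the probability that the null hypothesis is true.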

Q2. A Type 1 Error is defined as:

- Saying the null hypothesis is false, when it is actually true
- Saying the null hypothesis is true, when it is actually false

Q3. You find through a graph that there is a strong correlation between Net Promoter Score and the visual time that customers spend on a website. Select the TRUE assertion:

- There is an underlying factor that explains this correlation, but manipulating the time that customers spend on a website may not affect the Net Promoter Score they will give to the company
- To boost the Net Promoter Score of a business, you need to increase the time that customers spend on a website.

Q4. (True/False) If you reject the null hypothesis, it means that the alternate hypothesis is true.

- True
- False

#### Module 3 Quiz

Q1. Type of Statistics in which the posterior probability is the updated belief on the probability of an event happening given the prior and the data observed.

- Classical Statistics
- Frequentist Statistics
- Bayesian Statistics
- Descriptive Statistics
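The prior-to-posterior update described in this question can be sketched with a Beta prior on a success probability, which has a simple closed-form update for binomial data. The prior, the observed counts, and the function name below are hypothetical choices for illustration.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    With a Beta(a, b) prior on a success probability, the posterior is
    Beta(a + successes, b + failures).
    """
    return prior_a + successes, prior_b + failures


# Assumed starting point: a uniform Beta(1, 1) prior, then 7 successes
# and 3 failures are observed.
post_a, post_b = update_beta(1, 1, successes=7, failures=3)

prior_mean = 1 / (1 + 1)
posterior_mean = post_a / (post_a + post_b)
print(post_a, post_b, round(posterior_mean, 3))
```

The posterior mean sits between the prior mean of 0.5 and the observed rate of 0.7, showing how the prior belief is revised by the data instead of being replaced by it.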

Q2. (True/False) On a given hypothesis test, you obtain a p-value of 0.051. This can be interpreted as approaching significance or almost significant.

- True
- False

Q3. (True/False) When you reject the null hypothesis, then the alternate hypothesis must be true.

- True
- False
