### Unsupervised Machine Learning Week 01 Quiz Answers

### Quiz 01: Introduction to Unsupervised Learning

Q1. Which statement about unsupervised algorithms is TRUE?

- Unsupervised algorithms are relevant when we have outcomes we are trying to predict.
- Unsupervised algorithms are relevant when we don’t have the outcomes we are trying to predict and when we want to break down our data set into smaller groups.
- Unsupervised algorithms are typically used to forecast time related patterns like stock market trends or sales forecasts.
- Unsupervised algorithms are relevant in cases that require explainability, for example comparing parameters from one model to another.

Q2. Which of these options is NOT an example of Unsupervised Learning?

- Segmenting customers into different groups.
- Reducing the size of a data set without losing too much information from our original data set.
- Explaining the relationship between an individual’s income and the price they pay for a car.
- Grouping observations together to find similar patterns across them.

Q3. What is one of the real-world solutions to fix the problems of the curse of dimensionality?

- Increase the size of the data set
- Use more computational power
- Reduce the dimension of the data set.
- Balance the classes of a data set

Q4. Which of the following examples is NOT a common use case of clustering in the real world?

- Anomaly detection.
- Customer segmentation
- Determining risk factors and prevention factors for diseases such as osteoporosis.
- Improve supervised learning.

Q5. Which statement is a common use of Dimension Reduction in the real world?

- Image tracking
- Explaining the relation between the amount of alcohol consumption and diabetes.
- Deep Learning
- Predicting whether a customer will return to a store to make a major purchase.

### Quiz 02: K Means Clustering

Q1. (True/False) Is the following statement True or False?

*“We initialize our K-means algorithm by taking 2 random points and these points are going to act as the centroids.”*

- True
- False

Q2. Which of the following statements best describes the iterative part of the K-means algorithm?

- The k-means algorithm assigns a number of clusters at random.
- The k-means algorithm adjusts the centroids to the new mean of each cluster, and then it keeps repeating this process until no example is assigned to another cluster.
- The k-means algorithm iteratively deletes outliers.
- The k-means algorithm iteratively calculates the distance from each point to the centroid of each cluster.
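
The iterative loop described in the options above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function name `kmeans`, the `max_iter` cap, and the sample points are assumptions made for the example. It alternates between assigning points to the nearest centroid and moving each centroid to the mean of its cluster, and it stops once no point changes cluster.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Hypothetical helper: alternate assignment and centroid update until stable."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean).
        new_labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            for p in points
        ]
        # Stop when no point was assigned to a different cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

With these sample points the two left-hand points end up in one cluster and the two right-hand points in the other.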

Q3. (True/False) Is the following statement True or False?

*“The problem with the K-means algorithm is that it is sensitive to the choice of the initial points, so different initial configurations may yield different results.”*

- False
- True

Q4. Which statement best describes the “smarter initialization” of K-means clusters?

- “Draw a line between the data points to create 2 big clusters.”
- “After we find our centroids, we calculate the distance between all our data points.”
- “Pick one point at random as the initial point; for the second pick, instead of choosing randomly, we prioritize by assigning a probability based on the distance.”
- “We start by having two centroids as far as possible between each other.”

Q5. What happens to our second cluster centroid when we use the probability formula?

- When we use the probability formula, we put less weight on the points that are far away. So, our second cluster centroid is likely going to be closer.
- When we use the probability formula, we put more weight on the points that are far away. So, our second cluster centroid is likely going to be more distant.
- When we use the probability formula, we put more weight on the lighter centroids, because it will take more computational power to draw our clusters. So, the second cluster centroid is likely going to be less distant.
- When we use the probability formula, we put less weight on the points that are far away. So, our second cluster centroid is likely going to be more distant.
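
The distance-based weighting discussed above can be sketched as follows. The quiz does not spell out the formula, so this example assumes the usual k-means++ rule, where each point is picked with probability proportional to its squared distance from the already chosen centroid. The function name and the sample points are hypothetical.

```python
import random


def second_centroid_probabilities(points, first):
    """Hypothetical helper: probability of each point becoming the second centroid.

    Assumes the k-means++ weighting: probability proportional to squared distance.
    """
    squared = [sum((a - b) ** 2 for a, b in zip(p, first)) for p in points]
    total = sum(squared)
    return [d / total for d in squared]


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
first = points[0]  # pretend this point was drawn at random as the first centroid
probs = second_centroid_probabilities(points, first)  # weights 0, 1/26, 25/26

# Draw the second centroid using those weights (seeded for repeatability).
rng = random.Random(0)
second = rng.choices(points, weights=probs, k=1)[0]
```

The farthest point carries most of the probability mass, so the second centroid is likely, though not guaranteed, to be far from the first.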

### Quiz 03: End of Module

Q1. (True/False) The K-means clustering algorithm relies on finding cluster centers to group data points based on minimizing the sum of squared errors between each data point and its cluster centroid.

- False
- True

Q2. What’s the name of the default initialization for K-means?

- K-means optimal.
- K-means ++
- K-means inertia
- K-means sum of square error

Q3. What is the implication of a small standard deviation of the clusters?

- A small standard deviation of the clusters defines the size of the clusters.
- The standard deviation of the clusters defines how tightly the points are grouped around each centroid. With a small standard deviation, the points will be closer to the centroids.
- The standard deviation of the clusters defines how tightly the points are grouped around each centroid. With a small standard deviation, we can’t find any centroids.
- A small standard deviation of the clusters means that the centroids are not close enough to each other.

Q4. After we plot our elbow and we find the inflection point, what does that point indicate to us?

- The ideal number of clusters.
- The data points we need to form a cluster
- How we can reduce our number of clusters.
- Whether we need to remove outliers.

Q5. (True/False) We can use K-means to reduce the size of high-quality images by just keeping the important information and grouping the colors with the right number of clusters.

- False
- True

Q6. What is one of the most suitable ways to choose K when the number of clusters is unclear?

- You can start by choosing a random number of clusters.
- By evaluating Clustering performance such as Inertia and Distortion.
- By increasing the number of clusters calculating the square root.
- You can start by using a k nearest neighbor method.

Q7. Which statement best describes the formula for Inertia?

- The sum of squared distances from each point (xi) to its cluster (ck).
- The sum of *a* squared plus *b* squared equals *c* squared.
- Average of squared distance from each point (xi) to its cluster.
- Average of the distance from each point to its cluster.

Q8. *Which statement correctly describes the use of distortion and inertia?*

- When the sum of the points equals a prime number, use inertia, and when the sum of the points equals an even number, use distortion.
- When we can calculate a number of clusters higher than 10, we use distortion; when we calculate a number of clusters smaller than 10, we use inertia.
- When outliers are a concern use inertia, otherwise use distortion.
- When the similarity of the points in the cluster is more important, you should use distortion, and if you are more concerned that clusters have similar numbers of points, then you should use inertia.
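
As a rough illustration of the two metrics named in Q7 and Q8, the sketch below treats inertia as the sum of squared distances from each point to its assigned centroid and distortion as the average of those squared distances. The function names, labels, and sample data are assumptions for the example only.

```python
def squared_distance(p, c):
    """Squared Euclidean distance between a point and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[k]) for p, k in zip(points, labels))


def distortion(points, labels, centroids):
    """Average squared distance from each point to its assigned centroid."""
    return inertia(points, labels, centroids) / len(points)


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]                    # cluster index of each point
centroids = [(1.0, 0.0), (11.0, 0.0)]    # mean of each cluster

total = inertia(points, labels, centroids)       # 1 + 1 + 1 + 1 = 4.0
average = distortion(points, labels, centroids)  # 4.0 / 4 = 1.0
```

Because distortion divides by the number of points, it does not grow just because more points are added, whereas inertia does.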

Q9. Select the approach that can help you find the cluster with best inertia

- Compute the resulting inertia or distortion, keep the results, and see which one of the different initializations of configurations lead to the best inertia or distortion. As an example of this, the best inertia result is the **highest** value.
- Compute the resulting inertia or distortion, keep the results, and see which one of the different initializations of configurations lead to the best inertia or distortion. As an example of this, the best inertia result is the **lowest** value.
- Compute the resulting inertia or distortion, keep the results, and see which one of the different initializations of configurations lead to the best inertia or distortion. As an example of this, the best inertia result is the **average** value.
- Compute the resulting inertia or distortion, keep the results, and see which one of the different initializations of configurations lead to the best inertia or distortion. As an example of this, the best inertia result is the **median** value.

Q10. Which method is commonly used to select the right number of clusters?

- The elbow method.
- The ROC curve.
- The perfect Square Method
- The Sum of Square Method

### Unsupervised Machine Learning Week 02 Quiz Answers

### Quiz 01: Distance Metrics

*Q1. (True/False) Is the following statement true or false?*

*“Our choice of Distance Metric will be extremely important when discussing our clustering algorithms and to clustering success.”*

- True
- False

Q2. *What is the other name we can give to the L2 distance?*

- Hamming Distance
- Euclidean Distance
- Manhattan Distance
- Mahalanobis Distance

Q3. *Which of the following statements is a business case for the use of the Manhattan distance (L1)?*

- We use it in business cases where there is very high dimensionality.
- We use it in business cases where there is low dimensionality.
- We use it in business cases where the dimensionality is unknown.
- We use it in business cases with outliers.

Q4. *What is the key feature of the Cosine Distance?*

- The size of the curve.
- The Cosine Distance takes into account the angle between 2 points.
- It is sensitive to the size of the data set.
- It is not sensitive to the size of the data set.

Q5. *Which of the following statements is an example of a business case where we can use the Cosine Distance?*

- Cosine is useful for coordinate based measurements.
- Cosine is better for data such as text where location of occurrence is less important.
- Cosine distance is more sensitive to the curse of dimensionality
- Cosine distance is less sensitive to the curse of dimensionality

Q6. *Which distance metric is useful when we have text documents and we want to group similar topics together?*

- Manhattan Distance
- Euclidean
- Jaccard
- Mahalanobis Distance
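
The metrics named in this quiz can be compared side by side with a small standard-library sketch. The function names and the sample vectors and word sets are illustrative assumptions. Jaccard distance is computed here on sets of words, which is one common way to apply it to text.

```python
import math


def euclidean(u, v):
    """L2 distance: straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def jaccard_distance(a, b):
    """1 minus the share of items the two sets have in common."""
    return 1.0 - len(a & b) / len(a | b)


l2 = euclidean((3.0, 4.0), (0.0, 0.0))                 # 5.0
l1 = manhattan((3.0, 4.0), (0.0, 0.0))                 # 7.0
same_direction = cosine_distance((1.0, 0.0), (2.0, 0.0))   # 0.0: length is ignored
orthogonal = cosine_distance((1.0, 0.0), (0.0, 1.0))       # 1.0
words = jaccard_distance({"a", "b", "c"}, {"b", "c", "d"})  # 1 - 2/4 = 0.5
```

The cosine example shows why the metric depends on the angle only: two vectors pointing the same way have distance 0 regardless of their lengths.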

### Quiz 02: Clustering Algorithms

Q1. (True/False) Hierarchical Agglomerative Clustering algorithm will try to continuously split out, and merge new clusters successively until it reaches a level of convergence.

- True
- False

Q2. Why do we need a stopping criterion when we are using HAC?

- The algorithm will turn our data into small clusters.
- The algorithm will turn our data into just one cluster.
- The algorithm will not start working if we don’t assign a number of clusters.
- The stopping criterion ensures centroids are calculated correctly.

Q3. (True/False) Is the following statement a TRUE or FALSE explanation of the key operation of the DBSCAN algorithm?

“A key part of this algorithm is that it truly finds clusters of data rather than partitioning it, works better when we have noise in our data set, and properly finds the outliers…”

- False
- True

Q4. According to the DBSCAN required inputs, which statement describes the n_clu input?

- Function to calculate distance.
- Radius of local neighborhood.
- Determines density threshold (for fixed Ɛ) (the minimum number of points for a particular point to be considered a core point of a cluster).
- The maximum number of observations for a particular point to be considered a core point of a cluster.

Q5. How do we define the core points when we use the DBSCAN algorithm?

- A point that has more than n_clu neighbors in its Ɛ-neighborhood.
- An Ɛ-neighbor point that has fewer than n_clu neighbors itself.
- A point that has no points in its Ɛ-neighborhood.
- A point that has the same amount of n_clu neighbors within and outside the Ɛ-neighborhood.
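
The core-point idea from Q4 and Q5 can be sketched in a few lines. This example follows the quiz wording, where a core point has more than `n_clu` neighbors within radius Ɛ (called `eps` below) and the point itself is not counted. Libraries may use a different convention; scikit-learn's `min_samples`, for instance, counts the point itself and uses "at least". The function name and sample points are hypothetical.

```python
import math


def core_point_indices(points, eps, n_clu):
    """Hypothetical helper: indices of points with more than n_clu neighbors within eps.

    Assumption: a point is not counted as its own neighbor.
    """
    cores = []
    for i, p in enumerate(points):
        neighbors = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= eps
        )
        if neighbors > n_clu:
            cores.append(i)
    return cores


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
cores = core_point_indices(points, eps=1.5, n_clu=2)
```

Each corner of the unit square has three neighbors within 1.5, so all four are core points, while the isolated point at (10, 10) has none.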

### Quiz 03: Comparing Clustering Algorithms

Q1. *Which of the following statements is a characteristic of the K-means algorithm?*

- It is limited to using the Euclidean distance within its formulation.
- To determine the number of clusters we use the elbow method.
- Can be slow to calculate as the number of observations increases.
- It’s limited to using the Ward distance within its formulation.

Q2. *Which of the following statements is a characteristic of the DBSCAN algorithm?*

- Can handle tons of data and weird shapes.
- Finds uneven cluster sizes (one is big, some are tiny).
- It performs well when finding many clusters.
- It performs well when finding few clusters.

Q3. *Which of the following statements is a characteristic of the Hierarchical Clustering (Ward) algorithm?*

- If we use a mini batch to find our centroids and clusters this will find our clusters fairly quickly.
- It offers a lot of distance metrics and linkage options.
- Too small epsilon (too many clusters) is not trustworthy.
- Too large epsilon (too few clusters) is not trustworthy.

Q4. *Which of the following statements is a characteristic of the Mean Shift algorithm?*

- It does not require us to set the number of clusters; the number of clusters will be determined for us.
- Bad with non-spherical cluster shapes.
- You need to decide the number of clusters on your own, choosing the numbers directly or the minimum distance threshold.
- Good with non-spherical cluster shapes.

### Quiz 04: End of Module

Q1. When using DBSCAN, how does the algorithm determine that a cluster is complete and it is time to move to a different point of the data set and potentially start a new cluster?

- When the algorithm requires you to change the input.
- When the algorithm forms a new cluster using the outliers.
- When no point is left unvisited by the chain reaction.
- When the solution converges to a single cluster.

Q2. Which of the following statements correctly defines the strengths of the DBSCAN algorithm?

- No need to specify the number of clusters (cf. K-means), allows for noise, and can handle arbitrary-shaped clusters.
- Does well with different densities, works with just one parameter, the n_clu defines itself.
- The algorithm will find the outliers first, draw regular shapes, works faster than other algorithms.
- The algorithm is computationally intensive, it is sensitive to outliers, and it requires few hyperparameters to be tuned.

Q3. Which of the following statements correctly defines the weaknesses of the DBSCAN algorithm?

- The clusters it finds might not be trustworthy, it needs noisy data to work, and it can’t handle subgroups.
- It needs two parameters as input, finding appropriate values of Ɛ and n_clu can be difficult, and it does not do well with clusters of different density.
- The algorithm will find the outliers first, it draws regular shapes, and it works faster than other algorithms.
- The algorithm is computationally intensive, it is sensitive to outliers, and it requires too many hyperparameters to be tuned.

Q4. (True/false) Using the Single Linkage method with HAC helps you ensure a clear separation between clusters.

- False
- True

Q5. (True/false) Does complete linkage refer to the maximum pairwise distance between clusters?

- True
- False

Q6. Which of the following measure methods computes the inertia and picks the pair that is going to ultimately minimize the inertia value?

- Single linkage
- Average linkage
- Ward linkage
- Complete linkage
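
To make the linkage options concrete, here is a small sketch that measures the distance between two clusters in three ways: minimum pairwise distance (single), maximum pairwise distance (complete), and mean pairwise distance (average). The function name and the sample clusters are assumptions for illustration. Ward linkage is left out because it compares the increase in inertia after a merge rather than pairwise distances.

```python
import math


def linkage_distances(cluster_a, cluster_b):
    """Hypothetical helper: single, complete, and average linkage between two clusters."""
    pairwise = [math.dist(p, q) for p in cluster_a for q in cluster_b]
    return {
        "single": min(pairwise),                    # closest pair of points
        "complete": max(pairwise),                  # farthest pair of points
        "average": sum(pairwise) / len(pairwise),   # mean over all pairs
    }


cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]
distances = linkage_distances(cluster_a, cluster_b)
# Pairwise distances are 3, 5, 2, 4, so single = 2, complete = 5, average = 3.5.
```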

### Unsupervised Machine Learning Week 03 Quiz Answers

### Quiz 01: Dimensionality Reduction

Q1. Select the option that best completes the following sentence:

For data with many features, principal components analysis

- identifies which features can be safely discarded
- reduces the number of features without losing any information.
- establishes a minimum number of viable features for use in the analysis.
- generates new features that are linear combinations of the original features.
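
The idea of a new feature built as a linear combination of the original features can be illustrated with a minimal standard-library sketch. It finds only the first principal component, using power iteration on the sample covariance matrix; this is one simple approach chosen for the example, not necessarily how a given library does it. The function name, the iteration count, and the sample data are assumptions.

```python
import math


def first_principal_component(rows, iterations=100):
    """Hypothetical helper: leading direction and projected scores via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    # Assumes the start vector is not orthogonal to the leading direction.
    w = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    # Each score is a linear combination of the (centered) original features.
    scores = [sum(c[j] * w[j] for j in range(d)) for c in centered]
    return w, scores


rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # points on the line y = 2x
weights, scores = first_principal_component(rows)
```

For this data the weights point along the direction (1, 2), so a single new feature `weights[0] * x + weights[1] * y` captures all of the variation in the two original features.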

Q2. What is the main difference between kernel PCA and linear PCA?

- The objective of linear PCA is to decrease the dimensionality of the space whereas the objective of Kernel PCA is to increase the dimensionality of the space.
- Kernel PCA tends to uncover non-linear structure within the dataset by increasing the dimensionality of the space thanks to the kernel trick.
- Kernel PCA and Linear PCA are both linear dimensionality reduction algorithms, but they use a different optimization method.
- Kernel PCA tends to preserve the geometric distances between the points while reducing the dimensionality of the space.

Q3. (True/False) Multi-Dimensional Scaling (MDS) focuses on maintaining the geometric distances between points.

- True
- False

### Quiz 02: Non-Negative Matrix Factorization

Q1. (True/False) In some applications, NMF can make for more human interpretable latent features.

- True
- False

Q2. Which of the following set of features is the least adapted to NMF?

- Word Count of the different words present in a text.
- Pixel color values of an image.
- Spectral decomposition of an audio file.
- Monthly returns of a set of stock portfolios.

Q3. (True/False) The NMF can produce different outputs depending on its initialization.

- True
- False

Q4. Which option is the dense representation of the matrix below?

[(1, 1, 2), (1, 2, 3), (3, 4, 1), (2, 4, 4), (4, 3, 1)]

- [[2 0 0 0], [3 0 0 0], [0 0 0 1], [0 4 1 0]]
- [[0 0 0 1], [0 2 0 0], [0 0 0 3], [0 4 1 0]]
- [[1 0 0 0], [0 3 0 0], [0 2 0 0], [0 0 4 2]]
- [[0 0 0 2], [0 3 4 0], [0 0 0 0], [0 0 1 0]]

### Quiz 03: End of Module

Q1. When we use the DBSCAN algorithm, how do we know that our cluster is complete and it is time to move to a different point of the data set and potentially start a new cluster?

- When the algorithm required us to change the input.
- When the algorithm forms a new cluster using the outliers.
- When no point is left unvisited by the chain reaction.
- When the solution converges to a single cluster.

Q2. Which of the following statements correctly defines the strengths of the DBSCAN algorithm?

- No need to specify the number of clusters (cf. K-means), allows for noise, and can handle arbitrary-shaped clusters.
- Does well with different densities, works with just one parameter, the n_clu defines itself.
- The algorithm will find the outliers first, draw regular shapes, works faster than other algorithms.
- The algorithm is computationally intensive, it is sensitive to outliers, and it requires few hyperparameters to be tuned.

Q3. Which of the following statements correctly defines the weaknesses of the DBSCAN algorithm?

- The clusters it finds might not be trustworthy, it needs noisy data to work, and it can’t handle subgroups.
- It needs two parameters as input, finding appropriate values of Ɛ and n_clu can be difficult, and it does not do well with clusters of different density.
- The algorithm will find the outliers first, it draws regular shapes, and it works faster than other algorithms.
- The algorithm is computationally intensive, it is sensitive to outliers, and it requires too many hyperparameters to be tuned.

Q4. (True/False) Using the Single Linkage method with HAC helps you ensure a clear separation between clusters.

- False
- True

Q5. (True/false) Does complete linkage refer to the maximum pairwise distance between clusters?

- True
- False

Q6. Which of the following measure methods computes the inertia and picks the pair that is going to ultimately minimize the inertia value?

- Single linkage
- Average linkage
- Ward linkage
- Complete linkage
