## Process Mining: Data Science in Action Coursera Quiz Answers

Q1. The four V’s of big data are Volume, Velocity, Variety and Veracity.

Which of these four V’s is applicable when we talk about the problem that you cannot be sure that the data is fully accurate?

- Volume
- **Veracity**
- Velocity
- Variety

Q2. When we talk about replay, we mean the process where…

- **we start from both a process model and a collection of observed behavior, e.g. traces, and compare these.**
- we start from a process model and generate behavior, e.g. traces.
- we start from event data and generate a process model, e.g. a Petri net.
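Replay connects the model and the data: observed traces are played on the model to see where they fit and where they deviate (going from model to behavior is play-out, going from event data to a model is play-in). Below is a minimal token-based replay sketch in Python. The two-transition Petri net `NET` and the helper `token_replay` are hypothetical, and the fitness formula is the common token-based one, 1/2(1 - m/c) + 1/2(1 - r/p), where p, c, m and r count produced, consumed, missing and remaining tokens. It illustrates the idea only; it is not how any particular tool implements replay.

```python
from collections import Counter

# Hypothetical toy Petri net (an assumption for illustration): each transition
# maps to (input places, output places). Structure: start -> a -> p1 -> b -> end.
NET = {
    "a": (["start"], ["p1"]),
    "b": (["p1"], ["end"]),
}


def token_replay(trace, net, initial="start", final="end"):
    """Replay a trace on the net and return a token-based fitness in [0, 1]."""
    marking = Counter({initial: 1})
    produced, consumed, missing = 1, 0, 0  # the initial token counts as produced
    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            if marking[place] == 0:
                missing += 1  # token not there: record it as missing and create it
                marking[place] += 1
            marking[place] -= 1
            consumed += 1
        for place in outputs:
            marking[place] += 1
            produced += 1
    # At the end, consume the token expected in the final place.
    if marking[final] == 0:
        missing += 1
        marking[final] += 1
    marking[final] -= 1
    consumed += 1
    remaining = sum(marking.values())  # tokens left behind anywhere in the net
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


print(token_replay(["a", "b"], NET))  # → 1.0 (trace fits the model)
print(token_replay(["b"], NET))       # → 0.5 (one missing, one remaining token)
```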

Q3. We would like to learn the influence of someone’s weight and drinking behavior on their smoking behavior. What are the response and predictor variables?

- Variable weight is the response variable and drinking and smoker are the predictor variables.
- Variable drinker is the response variable and weight and smoker are the predictor variables.
- **Variable smoker is the response variable and drinker and weight are the predictor variables.**
- Variables drinker and weight are the response variables and smoker is the predictor variable.
- Variables drinker and smoker are the response variables and weight is the predictor variable.
- Variables smoker and weight are the response variables and drinking is the predictor variable.

Q4. There are two types of learning: supervised and unsupervised. Which of the following statements are true for **unsupervised** learning?

- The goal is to explain a response variable in terms of the predictor variables.
- **An example is the detection of patterns in the data.**
- The data is labeled such that for each element its class is known.
- **An example is to cluster similar data together.**
- An example is classification of data, e.g. learning a decision tree.

Q5. Consider a node in a decision tree with 100 instances of type A and 50 of type B. What is the entropy of this node?
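The entropy of a node is E = -Σ p_i log2(p_i), where p_i is the fraction of instances in class i. With 100 instances of A and 50 of B, p_A = 2/3 and p_B = 1/3, which gives roughly 0.9183 (the same starting value that appears in Q6). A minimal Python check, where the helper name `entropy` is just an illustrative choice:

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a node, given the number of instances per class."""
    total = sum(counts)
    # Classes with zero instances contribute nothing (0 * log 0 is taken as 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


print(round(entropy([100, 50]), 4))  # → 0.9183
```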

Q6. Consider the two decision trees depicted below (a tree with just one node, and a tree where this node is split based on the age attribute). Does it make sense to split the tree?

- **Yes, since the entropy of the entire tree goes from 0.9183 to 0.7453.**
- No, since the entropy of the entire tree goes from 0.9183 to 0.7453.
- Yes, since the entropy of the entire tree goes from 0.9183 to 1.1716.
- Yes, since the entropy of the entire tree goes from 0.9183 to 1.7453.
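Whether a split makes sense comes down to comparing the entropy of the root with the entropy of the split tree, i.e. the child entropies weighted by the share of instances in each child. If that weighted entropy drops (positive information gain), the split helps. The quiz figure is not reproduced here, so the child counts in `children` below are hypothetical and will not give the 0.7453 from the answer; the helper names are also illustrative.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of one node, given its class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def split_entropy(children):
    """Entropy of the split tree: child entropies weighted by child size."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * entropy(child) for child in children)


root = [100, 50]                 # 100 instances of A, 50 of B
children = [[60, 10], [40, 40]]  # hypothetical split, NOT the one in the quiz figure
before = entropy(root)
after = split_entropy(children)
print(f"before={before:.4f} after={after:.4f} gain={before - after:.4f}")
```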

Q7. What is the formula to calculate the **support** of the rule X ⇒ Y, given that:

$N$ is the number of instances

$N_X$ is the number of instances covering X

$N_{X \land Y} = N_{X \cup Y}$ is the number of instances covering both X and Y
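With these definitions, the support of X ⇒ Y is the fraction of all instances that cover both X and Y, $N_{X \land Y} / N$, while the confidence divides by $N_X$ instead, $N_{X \land Y} / N_X$. A small Python sketch; the counts are hypothetical and only there to show the arithmetic:

```python
def support(n_xy, n):
    """Fraction of all N instances that cover both X and Y."""
    return n_xy / n


def confidence(n_xy, n_x):
    """Fraction of the instances covering X that also cover Y."""
    return n_xy / n_x


# Hypothetical counts, for illustration only.
N, N_X, N_XY = 200, 50, 40
print(support(N_XY, N))       # → 0.2
print(confidence(N_XY, N_X))  # → 0.8
```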

Q8. Assume a data set with two variables that we would like to cluster using k-means with k=3. See the following centroids. Which of these could be the end result of applying k-means?
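When k-means has converged, two conditions hold at the same time: every point is assigned to its nearest centroid, and every centroid is the mean of the points assigned to it. A set of centroids that violates this cannot be a final k-means result. A minimal Python check of that property; the function names and sample points are hypothetical since the quiz figure is not available, and treating an empty cluster as invalid is a simplifying assumption:

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for each point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def is_kmeans_fixed_point(points, centroids, tol=1e-9):
    """True if each centroid equals the mean of the points assigned to it."""
    labels = assign(points, centroids)
    for j, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            return False  # assumption: an empty cluster is not a valid end result
        mean = [sum(coord) / len(members) for coord in zip(*members)]
        if math.dist(mean, c) > tol:
            return False  # centroid would still move in the next iteration
    return True


# Hypothetical 2-D data forming three well-separated groups.
points = [(0, 0), (0, 2), (10, 0), (10, 2), (20, 0), (20, 2)]
print(is_kmeans_fixed_point(points, [(0, 1), (10, 1), (20, 1)]))  # → True
print(is_kmeans_fixed_point(points, [(0, 0), (10, 1), (20, 1)]))  # → False
```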

Q9. Given the classification provided below, what is the corresponding recall?

- 0.8741
- 0.8310
- 0.9091
- **0.1690**
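Recall is the fraction of the actually positive instances that the classifier labels as positive, TP / (TP + FN); precision instead divides by TP + FP. The classification from the quiz is not reproduced here, so the counts below are hypothetical and do not reproduce the quiz answer:

```python
def recall(tp, fn):
    """Share of the actual positive instances that were predicted positive."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of the predicted positive instances that are actually positive."""
    return tp / (tp + fp)


# Hypothetical confusion-matrix counts, not taken from the quiz.
tp, fn, fp = 30, 10, 20
print(recall(tp, fn))     # → 0.75
print(precision(tp, fp))  # → 0.6
```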

Q10. Please check the statements that are true for k-fold cross validation.

- **The learning algorithm can only use k-1 data sets during each of the runs to learn the model from.**
- The learning algorithm is applied k-1 times on different combinations of training and test data sets.
- **Within a run, the quality of the model learned by the algorithm is evaluated on the one data set not used for learning the model.**
- **The data set is split into k smaller data sets.**
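The procedure behind k-fold cross validation can be sketched as follows: the data is split into k folds, and in each of the k runs the model is learned on k-1 folds and evaluated on the remaining one. The helper `k_fold_splits` is hypothetical and assigns items to folds round-robin without shuffling, a simplifying assumption; libraries such as scikit-learn ship their own utility for this (`KFold`).

```python
def k_fold_splits(n, k):
    """Yield (train, test) index lists for k-fold cross validation over n items."""
    indices = list(range(n))
    folds = [indices[i::k] for i in range(k)]  # k disjoint, roughly equal folds
    for i in range(k):  # k runs in total, one per fold
        test = folds[i]  # evaluate on the one fold left out
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test  # learn on the other k-1 folds


for train, test in k_fold_splits(10, 5):
    print(len(train), "train /", len(test), "test:", test)
```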
