All Weeks Code Free Data Science Coursera Quiz Answers
Code Free Data Science Week 01 Quiz Answers
Big Data Quiz Answers
Q1. Over what percentage (X%) of data was created in the last 2 years?
Q2. Data generated is growing at an exponential rate
Q3. The number of smart connected devices in the world has reached over 50?
Module 1 quiz Answers
Q1. How many Terabytes are in a Petabyte?
Q2. Big Data is fueling Data Science
Q3. What types of data are consumed in order to derive value from Big Data?
Q4. Which one of the V’s below does NOT describe one of the 4 major characteristics of Big Data?
Q5. Descriptive Analytics enables faster decision automation than Prescriptive Analytics
Code Free Data Science Week 02 Quiz Answers
Install KNIME Quiz Answers
Q1. I have successfully downloaded and installed KNIME Analytics Platform
Q2. In order to download KNIME you should consult the following website
Q3. KNIME Nodes indicate their status by utilizing
- red, yellow and green light
- red, white and blue light
- yellow, purple and green light
Exploring KNIME Answers
Q1. KNIME Analytics Platform requires programming
Q2. Workflow editor in KNIME enables us to include documentation with the analytics
Q3. If we are not sure what a certain node does, we can look it up in the
- Node Repository Panel
- Node Description Panel
- Console Panel
- Workflow Coach Panel
Node Operations Quiz Answers
Q1. In order to bring a node into the workflow editor from the node repository you can (select all that apply)
- double-click the node
- drag the node
- right click on the node
- hover over the node
Q2. The node needs to be _________ before it is executed
Q3. When you use a file reader node you can see the file after you
- configure the node
- execute the node
Filtering Data Quiz Answers
Q1. KNIME supports all of the following data types except
- Date or Time
Q2. KNIME indicates the missing value in the data set with the following character
Q3. Column filter in KNIME can be used to
- Add columns to the existing data set
- Exclude columns from the data set
Q4. Column filter can select columns based on
- Name or Type
Q5. Row Filter in KNIME can include or exclude rows based on
- Attribute value test
- Row number
- Row length
- Row ID
Filtering Workflow Assignment Solution
Q. Create a KNIME workflow that reads the provided autos data set. Exclude the columns ‘normalized’ and ‘bore’. Filter the rows on the price column and keep only the instances describing cars priced below $10,000. Write the result out to a CSV file and report back the number of rows.
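The same read, column filter, row filter, write sequence can be sketched outside KNIME in plain Python. This is only an illustration of the steps, not the course solution: the sample rows are made up, the helper names `filter_autos` and `to_csv` are hypothetical, and it assumes a missing price shows up as a non-numeric marker such as `?`.

```python
import csv
import io

# Hypothetical sample standing in for the autos data set (not the real file).
SAMPLE = """make,normalized,bore,price
audi,164,3.19,13950
honda,101,2.91,7295
mazda,?,3.03,6795
bmw,192,3.50,?
"""

def filter_autos(text, drop=("normalized", "bore"), max_price=10000.0):
    """Drop the given columns and keep rows whose price is below max_price."""
    reader = csv.DictReader(io.StringIO(text))
    kept = []
    for row in reader:
        try:
            price = float(row["price"])
        except ValueError:
            continue  # assumed missing-value marker: a non-numeric price never passes
        if price < max_price:
            kept.append({k: v for k, v in row.items() if k not in drop})
    return kept

def to_csv(rows):
    """Serialize the filtered rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

rows = filter_autos(SAMPLE)
print(len(rows))     # the row count the assignment asks you to report
print(to_csv(rows))
```

On the real file the count will differ; the point is that the row count is taken after both filters have been applied.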
Code Free Data Science Week 03 Quiz Answers
Rule Engine Quiz Answers
Q1. Rule engine node in KNIME takes a list of user-defined rules and tries to match them to each row in the input table
Q2. Each rule in Rule Engine Node is represented by a
Q3. Rules in the Rule Engine Node consist of 2 main parts
- antecedent and consequent
- question and answer
- start and stop
Q4. In the Rule Engine Node, the outcome of a rule is indicated by
Q5. If no rule matches, the outcome is a missing value unless a default value is specified.
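The behaviour these questions describe can be mimicked in a few lines of Python. This is a rough sketch of the idea, not KNIME's implementation: the `apply_rules` helper, the (antecedent, consequent) tuple format and the sample rows are all hypothetical, and it assumes rules are tried in order with the first match deciding the outcome.

```python
def apply_rules(row, rules, default=None):
    """Return the consequent of the first rule whose antecedent matches the row."""
    for antecedent, consequent in rules:
        if antecedent(row):
            return consequent
    return default  # None stands in for a missing value

# Each rule pairs an antecedent (a test on the row) with a consequent (the outcome).
rules = [
    (lambda r: r["price"] < 10000, "budget"),
    (lambda r: r["price"] < 30000, "mid-range"),
]

print(apply_rules({"price": 7295}, rules))
print(apply_rules({"price": 45000}, rules))
print(apply_rules({"price": 45000}, rules, default="premium"))
```

The second call matches no rule and so yields the stand-in for a missing value; the third supplies a default instead.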
Module 3 Assignment Answers
Q1. Read in the Balloons Data Set from the UCI Data Repository at https://archive.ics.uci.edu/ml/datasets/balloons.
Download: Yellow-small.data
This file has 5 columns: Color, Size, Act, Age and Inflated (True/False)
1. Rename the columns accordingly.
2. Add the following classification column and name it Class
IF Color=yellow AND Size=small => Class=inflated
ELSE Class= not inflated
3. Add a final column called “Full sentence” that provides the information as
“inflated is T”
“not inflated is F”
Where “inflated/not inflated” comes from the “Class” column and “T/F” from the “Inflated (True/False)” column.
Question: How many rows are there with Class = inflated?
Hint: you can use the Rule Engine and String Manipulation nodes to complete this assignment
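The three steps above can be sketched in plain Python to check the logic before building the workflow. The rows below are made up and are not the contents of the real file, the `classify` helper is hypothetical, and comparing values case-insensitively is an assumption, since the file may store them in a different case.

```python
# Hypothetical rows in the five-column layout described above (not the real file).
COLUMNS = ["Color", "Size", "Act", "Age", "Inflated"]
raw_rows = [
    ["yellow", "small", "stretch", "adult", "T"],
    ["yellow", "large", "dip", "child", "F"],
    ["purple", "small", "stretch", "adult", "F"],
]

def classify(row):
    """Step 2: IF Color=yellow AND Size=small => inflated, ELSE not inflated."""
    if row["Color"].lower() == "yellow" and row["Size"].lower() == "small":
        return "inflated"
    return "not inflated"

table = []
for values in raw_rows:
    row = dict(zip(COLUMNS, values))   # step 1: name the columns
    row["Class"] = classify(row)       # step 2: rule-based Class column
    # step 3: join the Class and Inflated values into one string
    row["Full sentence"] = f'{row["Class"]} is {row["Inflated"]}'
    table.append(row)

inflated_count = sum(1 for row in table if row["Class"] == "inflated")
print(inflated_count)
```

Step 2 corresponds to what the Rule Engine node does and step 3 to a string join in the String Manipulation node.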
Q2. Read the adult data set. Download adult.data from https://archive.ics.uci.edu/ml/machine-learning-databases/adult/
What is the highest number of hours of work per week for people who have a Masters degree and are 20-40 years old?
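A filter-then-aggregate query like this one can be sketched in Python as follows. The rows are invented for illustration, and several things are assumptions to verify against the file: that it has no header row, that fields are separated by a comma and a space, that age, education and hours-per-week sit at positions 0, 3 and 12, and that the 20-40 range is inclusive. The `max_hours` helper is hypothetical.

```python
import csv
import io

# Hypothetical rows in the assumed adult.data layout (15 fields, no header).
SAMPLE = """28, Private, 100000, Masters, 14, Never-married, Sales, Not-in-family, White, Female, 0, 0, 45, United-States, <=50K
35, Private, 100000, Masters, 14, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 60, United-States, >50K
52, Private, 100000, Masters, 14, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 80, United-States, >50K
30, Private, 100000, Bachelors, 13, Never-married, Sales, Not-in-family, White, Male, 0, 0, 70, United-States, <=50K
"""

AGE, EDUCATION, HOURS = 0, 3, 12  # assumed field positions

def max_hours(text, education="Masters", min_age=20, max_age=40):
    """Highest hours-per-week among rows matching the education and age range."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    hours = [
        int(fields[HOURS])
        for fields in reader
        if fields  # skip blank lines
        and fields[EDUCATION] == education
        and min_age <= int(fields[AGE]) <= max_age
    ]
    return max(hours) if hours else None

print(max_hours(SAMPLE))
```

In KNIME the equivalent is two row filters (education, then age range) followed by an aggregation that takes the maximum of the hours column.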
Code Free Data Science Week 04 Quiz Answers
Decision Trees Quiz Answers
Q1. Decision Trees are (mark all that apply)
- Robust to missing and noisy data
- Can learn non-linear relationships from data
- Have an inductive bias towards shorter trees
- Are an unsupervised type of machine learning
Q2. Decision Trees can only have one Root Node
Q3. In decision trees each leaf node assigns
- test on an attribute
- a classification
- corresponding attribute value
Q4. Decision Trees use Information gain to calculate which attribute to split on. Information gain is measured in
Q5. What does a Decision Tree use to prevent overfitting?
- divide and conquer
- self selection
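Information gain is the drop in entropy produced by a split; when entropy is computed with base-2 logarithms, the result is expressed in bits. A minimal sketch with made-up labels (the function names are hypothetical, not part of KNIME):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits (log base 2)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    remainder = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - remainder

parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A split that separates the two classes completely gains the full 1 bit of the parent's entropy, while a split that leaves both children as mixed as the parent gains nothing. A tree learner picks the attribute with the highest gain at each node.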
Decision Tree Assignment Answers
Q1. Read the Iris data set from the UCI repository. Train a Decision Tree on 75% of the data and use the remaining 25% for testing. Set the parameters of the Decision Tree Learner node to utilize: gain ratio with no pruning and minimum of 4 attributes per leaf node. What is the overall model accuracy? (hint: answer should be in decimal point format)
Q2. Use the wine.data data set from the UCI ML Repository. Split the data into 80%/20% training and testing sets. Train a Decision Tree to recognize the class to which each wine belongs. Evaluate the Decision Tree on the wine test set and measure its performance. In particular, report back how many false negatives there are for class 2.
- More than 10
- More than 100
- Less than 3
- More than 3 but less than 10
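Both assignment questions come down to reading a confusion matrix: overall accuracy is the share of test rows predicted correctly, and the false negatives for a class are the rows that truly belong to that class but were predicted as something else. A small sketch with invented labels (not results from the Iris or wine data; the function names are hypothetical):

```python
from collections import Counter

def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the true class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

def false_negatives(actual, predicted, target):
    """Instances whose true class is target but were predicted as another class."""
    return sum(1 for a, p in zip(actual, predicted) if a == target and p != target)

def confusion_matrix(actual, predicted):
    """Counts keyed by (actual, predicted) pairs."""
    return Counter(zip(actual, predicted))

# Hypothetical test-set labels and predictions.
actual = [1, 1, 2, 2, 2, 2, 3, 3]
predicted = [1, 2, 2, 1, 2, 3, 3, 3]

print(accuracy(actual, predicted))
print(false_negatives(actual, predicted, target=2))
print(confusion_matrix(actual, predicted))
```

In a confusion matrix laid out with actual classes as rows, the false negatives for class 2 are the off-diagonal entries of the class 2 row.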
Clustering Assignment Answers
Q1. Use the wine.data data set. Partition the data into 80/20% training and test data sets. Utilize the K-means clustering algorithm to train the model to produce 3 clusters. Use color, shape and scatter plot nodes to visualize results. Visualizing the cluster assignments, how many instances have ended up in cluster_0? Hint: use the Number To String node on the class attribute.
Q2. Use the clustering algorithm on the Iris data set and partition the data into 90/10% training and test sets. Using the Cluster Assigner node, were any Iris-versicolor instances clustered across several clusters? How about Iris-virginica? Which classes were “spread” across multiple cluster assignments? If you change the split to 85 or 90% for training, how does it influence the “assignments” of classes to clusters? What happens if you increase the number of clusters? Mark all that apply.
- Iris-versicolor was “spread” across several clusters
- Iris-virginica demonstrated more “spread” across clusters as % training/testing was changed
- Increased number of clusters produced less diversity in class-to-cluster assignments
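The idea of checking whether a class is “spread” across clusters can be sketched in plain Python: fit k-means on training points, assign each test point to its nearest centroid, then count (class, cluster) pairs. This is a simplified illustration and not KNIME's implementation: the 2-D points and labels are invented, the helper names are hypothetical, and the initial centroids are chosen by hand to keep the run deterministic.

```python
from collections import Counter

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))

def k_means(points, centroids, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        assignments = [nearest(p, centroids) for p in points]
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its previous centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids

# Hypothetical training points and labelled test points (not Iris or wine data).
train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (9.0, 1.0), (9.2, 1.2)]
test = [((1.1, 0.9), "A"), ((5.1, 5.1), "B"), ((7.4, 2.6), "B"), ((9.1, 0.9), "C")]

centroids = k_means(train, centroids=train[::2])  # assumed initial centroids
spread = Counter((label, f"cluster_{nearest(p, centroids)}") for p, label in test)
print(spread)
```

In this toy data, class B lands in two different clusters while A and C each stay in one, which is the kind of “spread” the question asks you to look for in the cluster assignments.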