Machine Learning: Regression Coursera Quiz Answers

Get All Weeks Machine Learning: Regression Coursera Quiz Answers

Week 1: Machine Learning: Regression Quiz Answers

Quiz 1: Simple Linear Regression

Question 1: Assume you fit a regression model to predict house prices from square feet, based on a training data set consisting of houses with square feet in the range of 1000 to 2000. In which interval would we expect predictions to do best?

Answer: [1000, 2000]

Question 2: In a simple regression model, if you increase the input value by 1 then you expect the output to change by:

Answer: The value of the slope parameter

Question 3: Two people present you with fits of their simple regression model for predicting house prices from square feet. You discover that the estimated intercept and slopes are exactly the same. This necessarily implies that these two people fit their models on exactly the same data set.

Answer: False

Question 4: You have a data set consisting of the sales prices of houses in your neighborhood, with each sale time-stamped by the month and year in which the house sold. You want to predict the average value of houses in your neighborhood over time, so you fit a simple regression model with average house price as the output and the time index (in months) as the input. Based on 10 months of data, the estimated intercept is $4569 and the estimated slope is 143 ($/month). If you extrapolate this trend forward in time, at which time index (in months) do you predict that your neighborhood’s value will have doubled relative to the value at month (index) 10? (Assume months are 0-indexed, round to the nearest month).

Answer: 52
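The arithmetic behind this answer, as a quick check (all numbers come from the question itself):

```python
# Worked check for Question 4: when does the trend line double the month-10 value?
intercept, slope = 4569.0, 143.0        # $ and $/month, from the question

value_at_10 = intercept + slope * 10    # 4569 + 1430 = 5999
target = 2 * value_at_10                # 11998

# Solve intercept + slope * t = target for t.
t = (target - intercept) / slope        # 7429 / 143 ≈ 51.95
print(round(t))                         # -> 52
```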

Question 5: Your friend in the U.S. gives you a simple regression fit for predicting house prices from square feet. The estimated intercept is -44850 and the estimated slope is 280.76. You believe that your housing market behaves very similarly, but houses are measured in square meters. To make predictions for inputs in square meters, what intercept must you use? Hint: there are 0.092903 square meters in 1 square foot. You do not need to round your answer.

(Note: the next quiz question will ask for the slope of the new model.)

Answer: -44850 (the intercept is unchanged by the change of input units)

Question 6: Your friend in the U.S. gives you a simple regression fit for predicting house prices from square feet. The estimated intercept is -44850 and the estimated slope is 280.76. You believe that your housing market behaves very similarly, but houses are measured in square meters. To make predictions for inputs in square meters, what slope must you use? Hint: there are 0.092903 square meters in 1 square foot. You do not need to round your answer.

Answer: 3022
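A quick sketch of the unit-conversion logic behind Questions 5 and 6: rescaling the input leaves the intercept unchanged and divides the slope by the conversion factor (all values come from the question):

```python
# Converting a price-per-square-foot model to square meters.
intercept_sqft, slope_sqft = -44850.0, 280.76
SQM_PER_SQFT = 0.092903                  # from the hint

# price = intercept + slope_sqft * sqft, and sqft = sqm / SQM_PER_SQFT, so:
intercept_sqm = intercept_sqft           # unchanged: -44850
slope_sqm = slope_sqft / SQM_PER_SQFT    # ≈ 3022.08
print(intercept_sqm, slope_sqm)
```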

Question 7: Consider the following data set, and the regression line fitted on this data:

[Figure: scatter plot of the data with the fitted regression line (dashed) and bold/labeled points]

Which bold/labeled point, if removed, will have the largest effect on the fitted regression line (dashed)?

Answer: d

Quiz 2: Fitting a simple linear regression model on housing data

Question 1: Using your slope and intercept from predicting prices from square feet, what is the predicted price for a house with 2650 sqft? Use American-style decimals without comma separators (e.g. 300000.34), and round your answer to 2 decimal places. Do not include the dollar sign.

Answer: 700074.85

Question 2: Using the learned slope and intercept from the squarefeet model, what is the RSS for the simple linear regression using squarefeet to predict prices on TRAINING data?

Answer: Between 1.1e+15 and 1.3e+15

Question 3: According to the inverse regression function and the regression slope and intercept from predicting prices from square-feet, what is the estimated square-feet for a house costing $800,000? You do not need to round your answer.

Answer: 3004
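For reference, a minimal sketch of the closed-form simple linear regression fit, prediction, and inverse prediction this assignment asks for (a numpy version under our own variable names, not the notebook's exact code):

```python
import numpy as np

def simple_linear_regression(x, y):
    """Closed-form least-squares fit of y ≈ intercept + slope * x."""
    slope = ((x * y).mean() - x.mean() * y.mean()) / ((x ** 2).mean() - x.mean() ** 2)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

def predict(x, intercept, slope):
    return intercept + slope * x

def inverse_predict(y, intercept, slope):
    """Estimated input for a given output (the inverse regression of Question 3)."""
    return (y - intercept) / slope

# With the sqft model learned on the training data:
# predict(2650, intercept, slope)             # Question 1
# inverse_predict(800000, intercept, slope)   # Question 3
```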

Question 4: Which of the two models (square feet or bedrooms) has lower RSS on TEST data?

Answer: Model 1 (Square feet)

Week 2: Machine Learning: Regression Quiz Answers

Quiz 1: Multiple Regression

Question 1: Which of the following is NOT a linear regression model? Hint: remember that a linear regression model is always linear in the parameters, but may use non-linear features.

Answer: y = w₀w₁ + log(w₁)x

Question 2: Your estimated model for predicting house prices has a large positive weight on ‘square feet living’. This implies that if we remove the feature ‘square feet living’ and refit the model, the new predictive performance will be worse than before.

Answer: False

Question 3: Complete the following: Your estimated model for predicting house prices has a positive weight on ‘square feet living’. You then add ‘lot size’ to the model and re-estimate the feature weights. The new weight on ‘square feet living’ [_________] be positive.

Answer: might

Question 4: If you double the value of a given feature (i.e. a specific column of the feature matrix), what happens to the least-squares estimated coefficients for every other feature? (assume you have no other feature that depends on the doubled feature i.e. no interaction terms).

Answer: They stay the same

Question 5: Gradient descent/ascent is…

Answer: An algorithm for minimizing/maximizing a function

Question 6: Gradient descent/ascent allows us to…

Answer: Estimate model parameters from data

Question 7: Which of the following statements about step-size in gradient descent is/are TRUE (select all that apply)

Answer:
1. If the step size is too small (but not zero), gradient descent may take a very long time to converge.
2. If the step size is too large, gradient descent may not converge.

Question 8: Let’s analyze how many computations are required to fit a multiple linear regression model using the closed-form solution, based on a data set with 50 observations and 10 features. In the videos, we said that computing the inverse of the 10×10 matrix HᵀH was on the order of D³ operations. Let’s focus on forming this matrix prior to inversion. How many multiplications are required to form the matrix HᵀH?

Please enter a number below.

Answer: 5000

Question 9: More generally, if you have D features and N observations, what is the total complexity of computing (HᵀH)⁻¹?

Answer: O(ND² + D³)
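A hedged numpy sketch of the closed-form solution these two questions analyze. Each of the D×D entries of HᵀH is a dot product of two length-N columns, so forming it takes N·D² multiplications (50 × 10 × 10 = 5000 here), and inverting or solving the resulting D×D system is O(D³):

```python
import numpy as np

def closed_form_weights(H, y):
    """Least-squares weights w = (H^T H)^(-1) H^T y for an N x D feature matrix H."""
    HtH = H.T @ H                          # forming this takes N * D^2 multiplications
    return np.linalg.solve(HtH, H.T @ y)   # solving the D x D system is O(D^3)

N, D = 50, 10
print(N * D * D)                           # -> 5000
```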

Quiz 2: Exploring different multiple regression models for house price prediction


Question 1: What is the mean value (arithmetic average) of the ‘bedrooms_squared’ feature on TEST data? (round to 2 decimal places)

Answer: 12.45

Question 2: What is the mean value (arithmetic average) of the ‘bed_bath_rooms’ feature on TEST data? (round to 2 decimal places)

Answer: 7.50

Question 3: What is the mean value (arithmetic average) of the ‘log_sqft_living’ feature on TEST data? (round to 2 decimal places)

Answer: 7.55

Question 4: What is the mean value (arithmetic average) of the ‘lat_plus_long’ feature on TEST data? (round to 2 decimal places)

Answer: 74.65

Question 5: What is the sign (positive or negative) for the coefficient/weight for ‘bathrooms’ in model 1?

Answer: Positive (+)

Question 6: What is the sign (positive or negative) for the coefficient/weight for ‘bathrooms’ in model 2?

Answer: Negative (-)

Question 7: Which model (1, 2 or 3) has lowest RSS on TRAINING Data?

Answer: Model 3

Question 8: Which model (1, 2 or 3) has lowest RSS on TESTING Data?

Answer: Model 2

Quiz 3: Implementing gradient descent for multiple regression

Question 1: What is the value of the weight for sqft_living from your gradient descent predicting house prices (model 1)? Round your answer to 1 decimal place.

Answer: 281.9
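For context, a minimal sketch of the gradient descent loop this assignment implements (feature-matrix form; the step size, tolerance, and initial weights are notebook inputs, not values we are asserting):

```python
import numpy as np

def regression_gradient_descent(H, y, initial_weights, step_size, tolerance):
    """Minimize RSS(w) = ||y - Hw||^2 by gradient descent.

    The gradient of the RSS is 2 H^T (Hw - y); we stop once its
    magnitude drops below the supplied tolerance.
    """
    w = np.array(initial_weights, dtype=float)
    while True:
        gradient = 2 * H.T @ (H @ w - y)
        w -= step_size * gradient
        if np.linalg.norm(gradient) < tolerance:
            return w
```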

Question 2: What is the predicted price for the 1st house in the TEST data set for model 1 (round to nearest dollar)?

Answer: 356134

Question 3: What is the predicted price for the 1st house in the TEST data set for model 2 (round to nearest dollar)?

Answer: 366651

Question 4: Which estimate was closer to the true price for the 1st house on the TEST data set, model 1 or model 2?

Answer: Model 1

Question 5: Which model (1 or 2) has lowest RSS on all of the TEST data?

Answer: Model 2

Week 3: Machine Learning: Regression Quiz Answers

Quiz 1: Assessing Performance

Question 1: If the features of Model 1 are a strict subset of those in Model 2, the TRAINING error of the two models can never be the same.

Answer: False

Question 2: If the features of Model 1 are a strict subset of those in Model 2, which model will USUALLY have lowest TRAINING error?

Answer: Model 2

Question 3: If the features of Model 1 are a strict subset of those in Model 2, which model will USUALLY have lowest TEST error?

Answer: It’s impossible to tell with only this information

Question 4: If the features of Model 1 are a strict subset of those in Model 2, which model will USUALLY have lower BIAS?

Answer: Model 2

Question 5: Which of the following plots of model complexity vs. RSS is most likely from TRAINING data (for a fixed data set)?

Answer: c

[Figure: candidate plots of model complexity vs. RSS, referenced by Questions 5 and 6]

Question 6: Which of the following plots of model complexity vs. RSS is most likely from TEST data (for a fixed data set)?

Answer: a

Question 7: It is always optimal to add more features to a regression model.

Answer: False

Question 8: A simple model with few parameters is most likely to suffer from:

Answer: High Bias

Question 9: A complex model with many parameters is most likely to suffer from:

Answer: High Variance

Question 10: A model with many parameters that fits training data very well but does poorly on test data is considered to be

Answer: Overfitted

Question 11: A common process for selecting a parameter like the optimal polynomial degree is:

Answer: Minimizing validation error

Question 12: Selecting model complexity on test data (choose all that apply):

Answer:
1. Provides an overly optimistic assessment of the performance of the resulting model
2. Should never be done

Question 13: Which of the following statements is true (select all that apply): For a fixed model complexity, in the limit of an infinite amount of training data,

Answer: Variance goes to 0

Quiz 2: Exploring the bias-variance tradeoff


Question 1: Is the sign (positive or negative) for power_15 the same in all four models?

Answer: No, it is not the same in all four models

Question 2: The plotted fitted lines all look the same in all four plots.

Answer: False

Question 3: Which degree (1, 2, …, 15) had the lowest RSS on Validation data?

Answer: 6

Question 4: What is the RSS on TEST data for the model with the degree selected from Validation data? (Make sure you got the correct degree from the previous question)

Answer: Between 1.2e+14 and 1.3e+14

Week 4: Machine Learning: Regression Quiz Answers

Quiz 1: Ridge Regression

Question 1: Which of the following is NOT a valid measure of overfitting?

Answer: Sum of parameters (w₁ + w₂ + … + wₙ)

Question 2: In ridge regression, choosing a large penalty strength λ tends to lead to a model with (choose all that apply):

Answer:
1. High bias
2. Low variance

Question 4: In ridge regression using unnormalized features, if you double the value of a given feature (i.e., a specific column of the feature matrix), what happens to the estimated coefficients for every other feature? They:

Answer: Impossible to tell from the information provided

Question 5: If we only have a small number of observations, K-fold cross validation provides a better estimate of the generalization error than the validation set method.

Answer: True

Question 6: 10-fold cross validation is more computationally intensive than leave-one-out (LOO) cross validation.

Answer: False

Question 7: Assume you have a training dataset consisting of N observations and D features. You use the closed-form solution to fit a multiple linear regression model using ridge regression. To choose the penalty strength λ, you run leave-one-out (LOO) cross validation searching over L values of λ. Let Cost(N, D) be the computational cost of running ridge regression with N data points and D features. Assume the prediction cost is negligible compared to the computational cost of training the model. Which of the following represents the computational cost of your LOO cross validation procedure?

Answer: L·N·Cost(N−1, D)

Question 8: Assume you have a training dataset consisting of 1 million observations. Suppose running the closed-form solution to fit a multiple linear regression model using ridge regression on this data takes 1 second. Suppose you want to choose the penalty strength λ by searching over 100 possible values. How long will it take to run leave-one-out (LOO) cross-validation for this selection task?

Answer: About 3 years

Question 9: Assume you have a training dataset consisting of 1 million observations. Suppose running the closed-form solution to fit a multiple linear regression model using ridge regression on this data takes 1 second. Suppose you want to choose the penalty strength λ by searching over 100 possible values. If you only want to spend about 1 hour to select λ, what value of k should you use for k-fold cross-validation?

Answer: k=36
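The rough arithmetic behind the last three answers, taking the questions' premise that one ridge fit on roughly a million points costs about 1 second:

```python
SECONDS_PER_FIT = 1.0          # given: one closed-form ridge fit takes ~1 second
N, L = 1_000_000, 100          # observations and candidate lambda values

# LOO cross validation: L * N fits of Cost(N-1, D) each.
loo_seconds = L * N * SECONDS_PER_FIT
print(loo_seconds / (3600 * 24 * 365))   # ≈ 3.17 -> "about 3 years"

# k-fold: L * k fits; to stay within one hour we need 100 * k <= 3600 seconds.
print(3600 / (L * SECONDS_PER_FIT))      # -> 36.0, i.e. k = 36
```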

Quiz 2: Observing effects of L2 penalty in polynomial regression

Question 1: We first fit a 15th order polynomial model using the ‘sqft_living’ column of the ‘sales’ data frame, with a tiny L2 penalty applied.

What is the absolute value of the learned coefficient of feature power_1? Round your answer to the nearest whole integer. Example: 29

Answer: Between 70 and 150

Question 2: Next, we split the sales data frame into four subsets (set_1, set_2, set_3, set_4) and fit a 15th order polynomial model using each of the subsets.

For the models learned in each of these training sets, what is the smallest value you learned for the coefficient of feature power_1? Choose the range that contains this value.

Answer: Between -1000 and -100

Question 3: This question refers to the same models as the previous question.

For the models learned in each of these training sets, what is the largest value you learned for the coefficient of feature power_1? Choose the range that contains this value.

Answer: Between 1000 and 10000

Question 4: Using the same 4 subsets (set_1, set_2, set_3, set_4), we train 15th order polynomial models again, but this time we apply a large L2 penalty.

For the models learned with the high level of regularization in each of these training sets, what is the smallest value you learned for the coefficient of feature power_1? Round your answer to 2 decimal places, and use American-style decimals. Example: 2.11

Answer: Between 1.8 and 2.1

Question 5: This question refers to the same models as the previous question.

For the models learned with the high level of regularization in each of these training sets, what is the largest value you learned for the coefficient of feature power_1? Round your answer to 2 decimal places, and use American-style decimals. Example: 2.11

Answer: Between 2.3 and 2.8

Question 6: This question refers to the section “selecting an L2 penalty via cross-validation”.

What is the best value for the L2 penalty according to 10-fold cross-validation? Round your answer to 2 decimal places.

Answer: 1000

Question 7: Using the best L2 penalty found above, train a model using all training data. Which of the following ranges contains the RSS on the TEST data of the model you learn with this L2 penalty?

Answer: Between 8e13 and 4e14

Quiz 3: Implementing ridge regression via gradient descent

Question 1: We run ridge regression to learn the weights of a simple model that has a single feature (sqft_living), once with l2_penalty=0.0 and once with l2_penalty=1e11.

What is the value of the coefficient for sqft_living that you learned with no regularization, rounded to 1 decimal place? Use American-style decimals (e.g. 30.5)

Answer: 263.0
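A sketch of the ridge gradient step behind this quiz (the L2 term is conventionally not applied to the intercept; the actual penalties, step sizes, and iteration counts come from the notebook, not from us):

```python
import numpy as np

def ridge_gradient_step(H, y, w, step_size, l2_penalty):
    """One gradient step on the ridge cost ||y - Hw||^2 + l2_penalty * ||w||^2.

    Feature 0 is assumed to be the constant, so it is left unpenalized.
    """
    gradient = 2 * H.T @ (H @ w - y) + 2 * l2_penalty * w
    gradient[0] -= 2 * l2_penalty * w[0]   # do not regularize the intercept
    return w - step_size * gradient
```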

Question 2: This question refers to the same model as the previous question.

What is the value of the coefficient for sqft_living that you learned with high regularization (l2_penalty=1e11)? Use American-style decimals (e.g. 30.5) and round your answer to 1 decimal place.

Answer: 124.6

Question 3: This question refers to the same model as the previous question.

Comparing the lines you fit with no regularization versus high regularization (l2_penalty=1e11), which one is steeper?

Answer: Line fit with no regularization (l2_penalty=0)

Question 4: This question refers to the same model as the previous question.

Using the weights learned with no regularization (l2_penalty=0), make predictions for the TEST data. In which of the following ranges does the TEST error (RSS) fall?

Answer: Between 2e14 and 5e14

Question 5: We run ridge regression to learn the weights of a model that has two features (sqft_living, sqft_living15), once with l2_penalty=0.0 and once with l2_penalty=1e11.

What is the value of the coefficient for sqft_living that you learned with no regularization, rounded to 1 decimal place? Use American-style decimals (e.g. 30.5).

Answer: 243.1

Question 6: This question refers to the same model as the previous question.

What is the value of the coefficient for sqft_living that you learned with high regularization (l2_penalty=1e11)? Use American-style decimals (e.g. 30.5) and round your answer to 1 decimal place.

Answer: 91.5

Question 7: This question refers to the same model as the previous question.

Using high_penalty_weights, make predictions for the TEST data. In which of the following ranges does the TEST error (RSS) fall?

Answer: Between 4e14 and 8e14

Question 8: This question refers to the same model as the previous question.

Predict the price of the first house in the test set using the weights learned with no regularization. Do the same using the weights learned with high regularization. Which weights make a better prediction for the first house in the test set?

Answer: The weights learned with high regularization (l2_penalty=1e11)

Week 5: Machine Learning: Regression Quiz Answers

Quiz 1: Feature Selection and Lasso

Question 1: The best fit model of size 5 (i.e., with 5 features) always contains the set of features from the best fit model of size 4.

Answer: False

Question 2: Given 20 potential features, how many models do you have to evaluate in the all subsets algorithm?

Answer: 1048576

Question 3: Given 20 potential features, how many models do you have to evaluate if you are running the forward stepwise greedy algorithm? Assume you run the algorithm all the way to the full feature set.

Answer: 210
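The counting behind Questions 2 and 3: all subsets of 20 features yields 2^20 candidate models, while forward stepwise fits 20 + 19 + … + 1 models on its way to the full feature set:

```python
D = 20
print(2 ** D)                   # -> 1048576 models for all subsets
print(sum(range(1, D + 1)))     # -> 210 models for forward stepwise
```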

Question 4: Which of the plots could correspond to a lasso coefficient path? Select ALL that apply.

Hint: notice λ = ∞ at the bottom right of the plots. How should coefficients behave eventually as λ goes to infinity?

Answer: The plots in which every coefficient path eventually reaches exactly zero as λ → ∞ (the original answer was given as an image).

Question 5: Which of the following statements about coordinate descent is true? (Select all that apply.)

Answer: To test the convergence of coordinate descent, look at the size of the maximum step you take as you cycle through coordinates.

Question 6: Using normalized features, the ordinary least squares coordinate descent update for feature j has the form (with ρⱼ defined as in the videos):

Answer: ŵⱼ = ρⱼ

Question 7: Using normalized features, the ridge regression coordinate descent update for feature j has the form (with ρⱼ defined as in the videos):

Answer: ŵⱼ = ρⱼ / (λ + 1)
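For comparison with the two updates above, the lasso coordinate descent update (the subject of Quiz 3 below) soft-thresholds ρⱼ. A minimal sketch assuming normalized features, with the intercept left unpenalized as in the course:

```python
def lasso_coordinate_update(rho_j, l1_penalty, is_intercept=False):
    """Soft-thresholding update for one coordinate with normalized features."""
    if is_intercept:
        return rho_j                       # the intercept is not penalized
    if rho_j < -l1_penalty / 2.0:
        return rho_j + l1_penalty / 2.0
    elif rho_j > l1_penalty / 2.0:
        return rho_j - l1_penalty / 2.0
    return 0.0                             # the coordinate is set exactly to zero
```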

Quiz 2: Using LASSO to select features

Question 1: We learn weights on the entire house dataset, using an L1 penalty of 1e10 (or 5e2, if using scikit-learn). Some features are transformations of inputs; see the reading.

Which of the following features have been chosen by LASSO, i.e. which features were assigned nonzero weights? (Choose all that apply)

Answer:
1. sqft_living
2. grade

Question 2: We split the house sales dataset into training set, test set, and validation set and choose the l1_penalty that minimizes the error on the validation set.

In which of the following ranges does the best l1_penalty fall?

Answer: Between 0 and 100

Question 3: Using the best value of l1_penalty as mentioned in the previous question, how many nonzero weights do you have?

Answer: 18

Question 4: We explore a wide range of l1_penalty values to find a narrow region of l1_penalty values where models are likely to have the desired number of non-zero weights (max_nonzeros=7).

What value did you find for l1_penalty_max?

If you are using Turi Create, enter your answer in simple decimals without commas (e.g. 1131000000), rounded to nearest millions.

If you are using scikit-learn, enter your answer in simple decimals without commas (e.g. 4313), rounded to nearest integer.

Answer: 3792690190.73

Question 5: We then explore the narrow range of l1_penalty values between l1_penalty_min and l1_penalty_max.

What value of l1_penalty in our narrow range has the lowest RSS on the VALIDATION set and has sparsity equal to max_nonzeros?

If you are using Turi Create, enter your answer in simple decimals without commas (e.g. 1131000000), rounded to nearest millions.

If you are using scikit-learn, enter your answer in simple decimals without commas (e.g. 4342), rounded to nearest integer.

Answer: 3448968612.16

Question 6: Consider the model learned with the l1_penalty found in the previous question. Which of the following features have non-zero coefficients? (Choose all that apply)

Answer:
1. bathrooms
2. sqft_living

Quiz 3: Implementing LASSO using coordinate descent

Question 1: From the section “Effect of L1 penalty”: Consider the simple model with 2 features trained on the entire sales dataset.

Which of the following values of l1_penalty would not set w[1] to zero, but would set w[2] to zero, if we were to take a coordinate gradient step in that coordinate? (Select all that apply)

Answer: 1.64e8 and 1.73e8

Question 2: Refer to the same model as the previous question.

Which of the following values of l1_penalty would set both w[1] and w[2] to zero, if we were to take a coordinate gradient step in that coordinate? (Select all that apply)

Answer: 1.9e8 and 2.3e8

Question 3: From the section “Cyclical coordinate descent”: Using the simple model (with 2 features), we run our implementation of LASSO coordinate descent on the normalized sales dataset. We apply an L1 penalty of 1e7 and tolerance of 1.0.

Which of the following ranges contains the RSS of the learned model on the normalized dataset?

Answer: Between 1e15 and 3e15

Question 4: Refer to the same model as the previous question.

Which of the following features were assigned a zero weight at convergence?

Answer: bedrooms

Question 5: In the section “Evaluating LASSO fit with more features”, we split the data into training and test sets and learn weights with varying degrees of L1 penalty. The model now has 13 features.

In the model trained with l1_penalty=1e7, which of the following features have non-zero weight? (Select all that apply)

Answer:
1. waterfront
2. sqft_living
3. constant

Question 6: This question refers to the same model as the previous question.

In the model trained with l1_penalty=1e8, which of the following features have non-zero weight? (Select all that apply)

Answer: constant

Question 7: This question refers to the same model as the previous question.

In the model trained with l1_penalty=1e4, which of the following features have non-zero weight? (Select all that apply)

Answer:
1. constant
2. sqft_living
3. grade
4. waterfront
5. sqft_basement

Question 8: In the section “Evaluating each of the learned models on the test data”, we evaluate three models on the test data. The three models were trained with same set of features but different L1 penalties.

Which of the three models gives the lowest RSS on the TEST data?

Answer: The model trained with l1_penalty=1e4

Week 6: Machine Learning: Regression Quiz Answers

Quiz 1: Nearest Neighbors & Kernel Regression

Question 1: Which of the following datasets is best suited to nearest neighbor or kernel regression? Choose all that apply.

Answer:
1. A dataset with two features whose observations are evenly scattered throughout the input space
2. A dataset with many observations

Question 2: Which of the following is the most significant advantage of k-nearest neighbor regression (for k>1) over 1-nearest neighbor regression?

Answer: Better copes with noise in the data

Question 3: To obtain a fit with low variance using kernel regression, we should choose the kernel to have:

Answer: Large bandwidth λ

Question 4: In k-nearest neighbor regression and kernel regression, the complexity of functions that can be represented grows as we get more data.

Answer: True

Question 5: Parametric regression and 1-nearest neighbor regression will converge to the same solution as we collect more and more noiseless observations.

Answer: False

Question 6: Suppose you are creating a website to help shoppers pick houses. Every time a user of your website visits the webpage for a specific house, you want to compute a prediction of the house value. You are using 1-NN to make the prediction and have 100,000 houses in the dataset, with each house having 100 features. Computing the distance between two houses using all the features takes about 10 microseconds. Assuming the cost of all other operations involved (e.g., fetching data, etc.) is negligible, about how long will it take to make a prediction using the brute-force method described in the videos?

Answer: 1 second

Question 7: For the housing website described in the previous question, you learn that you need predictions within 50 milliseconds. To accomplish this, you decide to reduce the number of features in your nearest neighbor comparisons. How many features can you use?

Answer: 5 features
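The timing arithmetic behind the last two answers, using the question's figure of 10 microseconds per 100-feature distance (about 0.1 μs per feature):

```python
houses = 100_000
us_per_distance = 10.0                    # 100 features -> 10 microseconds

print(houses * us_per_distance / 1e6)     # -> 1.0 second for brute-force 1-NN

us_per_feature = us_per_distance / 100    # ~0.1 microseconds per feature
budget_us = 50_000                        # a 50 millisecond budget
print(budget_us / (houses * us_per_feature))   # -> 5.0 features
```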

Quiz 2: Predicting house prices using k-nearest neighbors regression

Question 1: From the section “Compute a single distance”: we take our query house to be the first house of the test set.

What is the Euclidean distance between the query house and the 10th house of the training set? Enter your answer in American-style decimals (e.g. 0.044) rounded to 3 decimal places.

Answer: 0.060

Question 2: From the section “Compute multiple distances”: we take our query house to be the first house of the test set.

Among the first 10 training houses, which house is the closest to the query house? Enter the 0-based index of the closest house.

Answer: 8
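A sketch of the vectorized distance computation this section asks for (numpy; `features_train` and `features_test` follow the assignment's naming, but the implementation here is our own):

```python
import numpy as np

def compute_distances(features_train, features_query):
    """Euclidean distance from one query row to every training row."""
    diff = features_train - features_query       # broadcasts over rows
    return np.sqrt(np.sum(diff ** 2, axis=1))

# e.g. distances = compute_distances(features_train, features_test[0])
# int(np.argmin(distances)) is the index of the closest training house.
```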

Question 3: From the section “Perform 1-nearest neighbor regression”:

Take the query house to be the third house of the test set (features_test[2]). What is the (0-based) index of the house in the training set that is closest to this query house?

Answer: 382

Question 4: From the section “Perform 1-nearest neighbor regression”:

Take the query house to be the third house of the test set (features_test[2]). What is the predicted value of the query house based on 1-nearest neighbor regression? Enter your answer in simple decimals without comma separators (e.g. 300000), rounded to the nearest whole number.

Answer: 249000

Question 5: From the section “Perform k-nearest neighbor regression”:

Take the query house to be the third house of the test set (features_test[2]). Which of the following is NOT part of the 4 training houses closest to the query house? (Note that all indices are 0-based.)

Answer: training house with index 2818

Question 6: From the section “Perform k-nearest neighbor regression”:

Take the query house to be the third house of the test set (features_test[2]). Predict the value of the query house by the simple averaging method. Enter your answer in simple decimals without comma separators (e.g. 241242), rounded to the nearest whole number.

Answer: 413988

Question 7: From the section “Perform k-nearest neighbor regression”: Make predictions for the first 10 houses using k-nearest neighbors with k=10.

What is the index of the house in this query set that has the lowest predicted value? Enter an index between 0 and 9.

Answer: 6

Question 8: From the section “Perform k-nearest neighbor regression”: We use a validation set to find the best k value, i.e. the one that minimizes the RSS on the validation set.

If we perform k-nearest neighbors with the optimal k found above, what is the RSS on the TEST data? Choose the range that contains this value.

Answer: Between 8e13 and 2e14

Machine Learning: Regression Course Review

In our experience, we suggest you enroll in the Machine Learning: Regression course and gain some new skills from professionals, completely free, and we assure you it will be worth it.

If you are stuck anywhere on a quiz or a graded assessment in Machine Learning: Regression, just visit Networking Funda to get the Machine Learning: Regression Coursera Quiz Answers.

Get All Course Quiz Answers of Machine Learning Specialization

Machine Learning Foundations: A Case Study Approach Quiz Answer

Machine Learning: Regression Coursera Quiz Answers

Machine Learning: Classification Coursera Quiz Answers

Machine Learning: Clustering & Retrieval Quiz Answers

Team Networking Funda

We are Team Networking Funda, a group of passionate authors and networking enthusiasts committed to sharing our expertise and experiences in networking and team building. With backgrounds in Data Science, Information Technology, Health, and Business Marketing, we bring diverse perspectives and insights to help you navigate the challenges and opportunities of professional networking and teamwork.
