## Get All Weeks' Machine Learning: Regression Coursera Quiz Answers


### Week 1: Machine Learning: Regression Quiz Answers

#### Quiz 1: Simple Linear Regression

Question 1: Assume you fit a regression model to predict house prices from square feet based on a training data set consisting of houses with square feet in the range from 1000 to 2000. In which interval would we expect predictions to do best?

[expand title=View Answer] [1000, 2000] [/expand]

Question 2: In a simple regression model, if you increase the input value by 1 then you expect the output to change by:

[expand title=View Answer] The value of the slope parameter [/expand]

Question 3: Two people present you with fits of their simple regression model for predicting house prices from square feet. You discover that the estimated intercept and slopes are exactly the same. This necessarily implies that these two people fit their models on *exactly* the same data set.

[expand title=View Answer] False[/expand]

Question 4: You have a data set consisting of the sales prices of houses in your neighborhood, with each sale time-stamped by the month and year in which the house sold. You want to predict the average value of houses in your neighborhood over time, so you fit a simple regression model with average house price as the output and the time index (in months) as the input. Based on 10 months of data, the estimated intercept is $4569 and the estimated slope is 143 ($/month). If you extrapolate this trend forward in time, at which time index (in months) do you predict that your neighborhood’s value will have doubled **relative to the value at month (index) 10**? (Assume months are 0-indexed, round to the nearest month).

[expand title=View Answer] 52 [/expand]
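The extrapolation in Question 4 is a single linear solve. A minimal sketch in Python; the names `predict`, `target`, and `t_double` are ours, chosen for illustration:

```python
# Trend from the question: average price(t) = intercept + slope * t,
# where t is the 0-indexed month.
intercept = 4569.0
slope = 143.0  # dollars per month

def predict(t):
    """Predicted average house price at month index t."""
    return intercept + slope * t

# Doubling is measured relative to the value at month 10, not month 0.
target = 2 * predict(10)

# Solve intercept + slope * t = target for t, then round to the nearest month.
t_double = (target - intercept) / slope
print(round(t_double))  # 52
```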

Question 5: Your friend in the U.S. gives you a simple regression fit for predicting house prices from square feet. The estimated intercept is -44850 and the estimated slope is 280.76. You believe that your housing market behaves very similarly, but houses are measured in square meters. To make predictions for inputs in square meters, what **intercept** must you use? Hint: there are 0.092903 square meters in 1 square foot. You do not need to round your answer.

(Note: the next quiz question will ask for the slope of the new model.)

[expand title=View Answer] -44850 [/expand]

Question 6: Your friend in the U.S. gives you a simple regression fit for predicting house prices from square feet. The estimated intercept is -44850 and the estimated slope is 280.76. You believe that your housing market behaves very similarly, but houses are measured in square meters. To make predictions for inputs in square meters, what **slope** must you use? Hint: there are 0.092903 square meters in 1 square foot. You do not need to round your answer.

[expand title=View Answer] 3022.08 (280.76 / 0.092903 ≈ 3022.077) [/expand]
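Questions 5 and 6 follow from substituting square feet = square meters / 0.092903 into the fitted line: the intercept is unchanged and the slope is divided by 0.092903. A short check, with variable names of our own choosing:

```python
SQM_PER_SQFT = 0.092903  # square meters in one square foot (from the hint)

intercept_sqft = -44850.0
slope_sqft = 280.76      # dollars per square foot

# sqft = sqm / SQM_PER_SQFT, so price = intercept + (slope / SQM_PER_SQFT) * sqm.
intercept_sqm = intercept_sqft         # unchanged: zero area is zero in either unit
slope_sqm = slope_sqft / SQM_PER_SQFT  # dollars per square meter

print(intercept_sqm)        # -44850.0
print(round(slope_sqm, 2))  # 3022.08
```

Both models should give the same price for the same house, which is an easy sanity check: 1000 sqft is 92.903 square meters.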

Question 7: Consider the following data set, and the regression line fitted on this data:

Which bold/labeled point, if removed, will have the largest effect on the fitted regression line (dashed)?

[expand title=View Answer] d[/expand]

#### Quiz 2: Fitting a simple linear regression model on housing data

Question 1: Using your Slope and Intercept from predicting prices from square feet, what is the predicted price for a house with 2650 sqft? Use American-style decimals without comma separators (e.g. 300000.34), and round your answer to 2 decimal places. Do not include the dollar sign.

[expand title=View Answer] 700074.85 [/expand]

Question 2: **Using the learned slope and intercept from the squarefeet model, what is the RSS for the simple linear regression using squarefeet to predict prices on TRAINING data?**

[expand title=View Answer]Between 1.1e+15 and 1.3e+15[/expand]

Question 3: According to the inverse regression function and the regression slope and intercept from predicting prices from square-feet, what is the estimated square-feet for a house costing $800,000? You do not need to round your answer.

[expand title=View Answer] 3004 [/expand]
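The closed-form estimates, the RSS, and the inverse regression used in this quiz can be sketched with NumPy as below. The function names are illustrative and may not match the assignment's starter code, and the data are synthetic, so the printed numbers will not reproduce the answers above.

```python
import numpy as np

def simple_linear_regression(x, y):
    """Closed-form least-squares estimates for y = intercept + slope * x."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope

def get_regression_predictions(x, intercept, slope):
    return intercept + slope * x

def get_residual_sum_of_squares(x, y, intercept, slope):
    residuals = y - get_regression_predictions(x, intercept, slope)
    return np.sum(residuals ** 2)

def inverse_regression_predictions(price, intercept, slope):
    """Invert price = intercept + slope * sqft to estimate sqft from a price."""
    return (price - intercept) / slope

# Synthetic stand-in for the training data (NOT the course data set).
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, size=200)
price = -45000 + 280 * sqft + rng.normal(0, 20000, size=200)

intercept, slope = simple_linear_regression(sqft, price)
print(get_regression_predictions(2650, intercept, slope))        # price for 2650 sqft
print(get_residual_sum_of_squares(sqft, price, intercept, slope))
print(inverse_regression_predictions(800000, intercept, slope))  # sqft for $800,000
```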

Question 4: Which of the two models (square feet or bedrooms) has lower RSS on TEST data?

[expand title=View Answer] Model 1 (Square feet) [/expand]

### Week 2: Machine Learning: Regression Quiz Answers

#### Quiz 1: Multiple Regression

Question 1: Which of the following is **NOT** a **linear** regression model? *Hint: remember that a linear regression model is always linear in the parameters, but may use non-linear features.*

[expand title=View Answer] $y = w_0 w_1 + \log(w_1) x$ [/expand]

Question 2: Your estimated model for predicting house prices has a large positive weight on ‘square feet living’. This implies that if we remove the feature ‘square feet living’ and refit the model, the new predictive performance will be **worse** than before.

[expand title=View Answer] False [/expand]

Question 3: *Complete the following:* Your estimated model for predicting house prices has a positive weight on ‘square feet living’. You then add ‘lot size’ to the model and re-estimate the feature weights. The new weight on ‘square feet living’ [_________] be positive.

[expand title=View Answer] might [/expand]

Question 4: If you double the value of a given feature (i.e. a specific column of the feature matrix), what happens to the least-squares estimated coefficients for every **other** feature? (Assume no other feature depends on the doubled feature, i.e. there are no interaction terms.)

[expand title=View Answer] They stay the same[/expand]
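This can be checked numerically for ordinary least squares: doubling one column only rescales that feature's coefficient (it halves), leaving the fitted values and every other coefficient unchanged. A sketch on synthetic data, assuming NumPy and a full-rank design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
# Design matrix H: a constant column plus two features (synthetic, illustration only).
H = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
y = H @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.5, size=N)

w, *_ = np.linalg.lstsq(H, y, rcond=None)

H_doubled = H.copy()
H_doubled[:, 1] *= 2.0  # double feature 1
w_doubled, *_ = np.linalg.lstsq(H_doubled, y, rcond=None)

print(w)
print(w_doubled)  # w[1] is halved; w[0] and w[2] match up to round-off
```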

Question 5: Gradient descent/ascent is…

[expand title=View Answer] An algorithm for minimizing/maximizing a function [/expand]

Question 6: Gradient descent/ascent allows us to…

[expand title=View Answer] Estimate model parameters from data [/expand]

Question 7: Which of the following statements about step-size in gradient descent is/are **TRUE** (select all that apply)

[expand title=View Answer]

1. If the step size is too small (but not zero), gradient descent may take a very long time to converge.

2. If the step size is too large, gradient descent may not converge.

[/expand]
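To see these step-size effects, here is a small gradient descent sketch that minimizes RSS for multiple regression. It is an illustration on synthetic data, not the assignment's implementation; the divergence threshold (1e12) and the iteration cap are arbitrary choices.

```python
import numpy as np

def regression_gradient_descent(H, y, initial_weights, step_size, tolerance, max_iter=10_000):
    """Minimize RSS(w) = ||y - H w||^2 with fixed-step gradient descent.

    Returns (weights, converged). "Converged" means the gradient norm fell
    below `tolerance` within `max_iter` iterations.
    """
    w = np.array(initial_weights, dtype=float)
    for _ in range(max_iter):
        errors = H @ w - y            # predictions minus observed outputs
        gradient = 2 * H.T @ errors   # gradient of RSS with respect to w
        if np.linalg.norm(gradient) < tolerance:
            return w, True
        w = w - step_size * gradient
        if np.linalg.norm(w) > 1e12:  # crude divergence check (arbitrary threshold)
            return w, False
    return w, False

# Synthetic data: y is roughly 1 + 2x plus a little noise.
rng = np.random.default_rng(2)
x = rng.normal(size=50)
H = np.column_stack([np.ones(50), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=50)

results = {}
for step in (1e-8, 1e-3, 1e-1):
    results[step] = regression_gradient_descent(H, y, [0.0, 0.0], step, tolerance=1e-3)
    print(step, results[step][1])
```

On data like this, the tiny step should still be far from the optimum after 10,000 iterations, the moderate step should converge, and the large step should blow up.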

Question 8: Let’s analyze how many computations are required to fit a multiple linear regression model *using the closed-form solution* based on a data set with 50 observations and 10 features. In the videos, we said that computing the inverse of the 10×10 matrix $H^T H$ was on the order of $D^3$ operations. Let’s focus on forming this matrix **prior** to inversion. How many multiplications are required to form the matrix $H^T H$?


[expand title=View Answer] 5000 [/expand]

Question 9: More generally, if you have $D$ features and $N$ observations, what is the total complexity of computing $(H^T H)^{-1}$?

[expand title=View Answer] $O(ND^2 + D^3)$ [/expand]
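The count of 5000 comes from the fact that each of the $D \times D$ entries of $H^T H$ is a dot product of two length-$N$ columns, so forming it takes $N D^2 = 50 \cdot 10^2 = 5000$ multiplications; adding the $O(D^3)$ inversion gives $O(ND^2 + D^3)$. A naive loop that counts the multiplications, on a synthetic matrix (illustration only):

```python
import numpy as np

def count_multiplications_HtH(N, D):
    """Multiplications in the naive computation of H^T H for an N x D matrix H:
    D*D entries, each a length-N dot product."""
    return D * D * N

def naive_HtH(H):
    """Form H^T H entry by entry, counting scalar multiplications."""
    N, D = H.shape
    G = np.zeros((D, D))
    mults = 0
    for i in range(D):
        for j in range(D):
            for n in range(N):
                G[i, j] += H[n, i] * H[n, j]
                mults += 1
    return G, mults

H = np.random.default_rng(0).normal(size=(50, 10))
G, mults = naive_HtH(H)
print(mults, count_multiplications_HtH(50, 10))  # 5000 5000
```

Exploiting the symmetry of $H^T H$ would roughly halve the work, but computing all $D^2$ entries matches the 5000 given above.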

#### Quiz 2: Exploring different multiple regression models for house price prediction

Question 1: What is the mean value (arithmetic average) of the ‘bedrooms_squared’ feature on TEST data? (round to 2 decimal places)

[expand title=View Answer] 12.45 [/expand]

Question 2: What is the mean value (arithmetic average) of the ‘bed_bath_rooms’ feature on TEST data? (round to 2 decimal places)

[expand title=View Answer] 7.50 [/expand]

Question 3: What is the mean value (arithmetic average) of the ‘log_sqft_living’ feature on TEST data? (round to 2 decimal places)

[expand title=View Answer] 7.55 [/expand]

Question 4: What is the mean value (arithmetic average) of the ‘lat_plus_long’ feature on TEST data? (round to 2 decimal places)

[expand title=View Answer] -74.65 [/expand]
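The four derived features in this quiz can be built column-wise and then averaged. The definitions below (bedrooms squared, bedrooms times bathrooms, natural log of living area, latitude plus longitude) are our assumption about the assignment rather than something stated on this page, and the two houses are made up. For Seattle-area houses like those in the course data, latitude is about 47 and longitude about -122, so ‘lat_plus_long’ comes out negative.

```python
import numpy as np

def add_derived_features(bedrooms, bathrooms, sqft_living, lat, lon):
    """Derived features (definitions assumed, not taken from this page)."""
    return {
        "bedrooms_squared": bedrooms ** 2,
        "bed_bath_rooms": bedrooms * bathrooms,
        "log_sqft_living": np.log(sqft_living),
        "lat_plus_long": lat + lon,
    }

# Two made-up houses, only to show the computation and the averaging step.
toy = add_derived_features(
    bedrooms=np.array([3.0, 4.0]),
    bathrooms=np.array([2.0, 2.5]),
    sqft_living=np.array([1800.0, 2500.0]),
    lat=np.array([47.5, 47.75]),
    lon=np.array([-122.25, -122.0]),
)
means = {name: round(float(values.mean()), 2) for name, values in toy.items()}
print(means)
```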

Question 5: **What is the sign (positive or negative) for the coefficient/weight for ‘bathrooms’ in model 1?**

[expand title=View Answer] Positive (+) [/expand]

Question 6: **What is the sign (positive or negative) for the coefficient/weight for ‘bathrooms’ in model 2?**

[expand title=View Answer] Negative (-)[/expand]

Question 7: **Which model (1, 2 or 3) has lowest RSS on TRAINING Data?**

[expand title=View Answer] Model 3 [/expand]

Question 8: **Which model (1, 2 or 3) has lowest RSS on TESTING Data?**

[expand title=View Answer] Model 2 [/expand]

#### Quiz 3: Implementing gradient descent for multiple regression

Question 1: What is the value of the weight for sqft_living from your gradient descent predicting house prices (model 1)? Round your answer to 1 decimal place.

[expand title=View Answer] 281.9 [/expand]

Question 2: What is the predicted price for the 1st house in the TEST data set for model 1 (round to nearest dollar)?

[expand title=View Answer] 356134 [/expand]

Question 3: What is the predicted price for the 1st house in the TEST data set for model 2 (round to nearest dollar)?

[expand title=View Answer] 366651 [/expand]

Question 4: **Which estimate was closer to the true price for the 1st house on the TEST data set, model 1 or model 2?**

[expand title=View Answer] Model 1 [/expand]

Question 5: **Which model (1 or 2) has lowest RSS on all of the TEST data?**

[expand title=View Answer] Model 2 [/expand]

### Week 3: Machine Learning: Regression Quiz Answers

#### Quiz 1: Assessing Performance

Question 1: If the features of Model 1 are a strict subset of those in Model 2, the TRAINING error of the two models can **never** be the same.

[expand title=View Answer] False [/expand]

Question 2: If the features of Model 1 are a strict subset of those in Model 2, which model will USUALLY have lowest TRAINING error?

[expand title=View Answer] Model 2 [/expand]

Question 3: If the features of Model 1 are a strict subset of those in Model 2, which model will USUALLY have lowest TEST error?

[expand title=View Answer] It’s impossible to tell with only this information[/expand]

Question 4: If the features of Model 1 are a strict subset of those in Model 2, which model will USUALLY have lower BIAS?

[expand title=View Answer]Model 2 [/expand]

Question 5: Which of the following plots of model complexity vs. RSS is most likely from TRAINING data (for a fixed data set)?

[expand title=View Answer] c [/expand]

Question 6: Which of the following plots of model complexity vs. RSS is most likely from TEST data (for a fixed data set)?

[expand title=View Answer]a [/expand]

Question 7: It is **always** optimal to add more features to a regression model.

[expand title=View Answer] False [/expand]

Question 8: A simple model with few parameters is most likely to suffer from:

[expand title=View Answer] High Bias[/expand]

Question 9: A complex model with many parameters is most likely to suffer from:

[expand title=View Answer] High Variance [/expand]

Question 10: A model with many parameters that fits training data very well but does poorly on test data is considered to be

[expand title=View Answer] overfitted [/expand]

Question 11: A common process for selecting a parameter like the optimal polynomial degree is:

[expand title=View Answer] Minimizing validation error [/expand]
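Selecting a polynomial degree by minimizing validation error can be sketched as follows: fit each candidate degree on training data only, score it on a separate validation set, and keep the best. The data are synthetic (a noisy quadratic) and the degree range 1 to 6 is an arbitrary choice for the example; a held-out test set would then be used once, to assess the chosen model.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic data: a quadratic curve plus small Gaussian noise."""
    x = rng.uniform(-3, 3, size=n)
    y = x ** 2 + rng.normal(scale=0.1, size=n)
    return x, y

x_train, y_train = make_data(100)
x_valid, y_valid = make_data(50)

def validation_rss(degree):
    coeffs = np.polyfit(x_train, y_train, degree)      # fit on TRAINING data only
    residuals = y_valid - np.polyval(coeffs, x_valid)  # score on VALIDATION data
    return np.sum(residuals ** 2)

rss_by_degree = {d: validation_rss(d) for d in range(1, 7)}
best_degree = min(rss_by_degree, key=rss_by_degree.get)
print(best_degree, rss_by_degree[best_degree])
```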

Question 12: Selecting model complexity on test data (choose all that apply):

[expand title=View Answer]

1. Provides an overly optimistic assessment of performance of the resulting model

2. Should never be done

[/expand]

Question 13: Which of the following statements is true (select all that apply): For a **fixed model complexity**, in the limit of an infinite amount of training data,

[expand title=View Answer] Variance goes to 0 [/expand]

#### Quiz 2: Exploring the bias-variance tradeoff

Question 1: Is the sign (positive or negative) for power_15 the same in all four models?

[expand title=View Answer] No, it is not the same in all four models [/expand]

Question 2: The plotted fitted lines all look the same in all four plots

[expand title=View Answer] False [/expand]

Question 3: Which degree (1, 2, …, 15) had the lowest RSS on Validation data?

[expand title=View Answer]6 [/expand]

Question 4: What is the RSS on TEST data for the model with the degree selected from Validation data? (Make sure you got the correct degree from the previous question)

[expand title=View Answer] Between 1.2e+14 and 1.3e+14 [/expand]

### Week 4: Machine Learning: Regression Quiz Answers

#### Quiz 1: Ridge Regression

Question 1: Which of the following is NOT a valid measure of overfitting?

[expand title=View Answer] Sum of parameters ($w_1 + w_2 + \dots + w_n$) [/expand]

Question 2: In ridge regression, choosing a large penalty strength $\lambda$ tends to lead to a model with (choose all that apply):

[expand title=View Answer]

1. High bias

2. Low variance

[/expand]

Question 4: In ridge regression using unnormalized features, if you double the value of a given feature (i.e., a specific column of the feature matrix), what happens to the estimated coefficients for every other feature? They:

[expand title=View Answer]Impossible to tell from the information provided [/expand]
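For ridge regression the closed-form solution is $\hat{w} = (H^T H + \lambda I)^{-1} H^T y$. The sketch below, on synthetic and deliberately correlated features, shows why the answer is "impossible to tell": doubling a feature lowers its effective penalty, so other coefficients can shift, unlike the least-squares case. This version penalizes every coefficient, including the constant, which is a simplification on our part.

```python
import numpy as np

def ridge_closed_form(H, y, l2_penalty):
    """Minimize ||y - H w||^2 + l2_penalty * ||w||^2 (all coefficients penalized)."""
    D = H.shape[1]
    return np.linalg.solve(H.T @ H + l2_penalty * np.eye(D), H.T @ y)

# Synthetic, deliberately correlated features (illustration only).
rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + 0.5 * rng.normal(size=100)
H = np.column_stack([np.ones(100), x1, x2])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.3, size=100)

w = ridge_closed_form(H, y, l2_penalty=10.0)

H_doubled = H.copy()
H_doubled[:, 1] *= 2.0  # double feature x1
w_doubled = ridge_closed_form(H_doubled, y, l2_penalty=10.0)

print(w)
print(w_doubled)  # the x2 coefficient generally moves too
```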

Question 5: If we only have a small number of observations, K-fold cross validation provides a better estimate of the generalization error than the validation set method.

[expand title=View Answer] True [/expand]

Question 6: 10-fold cross validation is more computationally intensive than leave-one-out (LOO) cross validation.

[expand title=View Answer]False [/expand]

Question 7: Assume you have a training dataset consisting of $N$ observations and $D$ features. You use the closed-form solution to fit a multiple linear regression model using ridge regression. To choose the penalty strength $\lambda$, you run leave-one-out (LOO) cross validation searching over $L$ values of $\lambda$. Let $\text{Cost}(N, D)$ be the computational cost of running ridge regression with $N$ data points and $D$ features. Assume the prediction cost is negligible compared to the computational cost of training the model. Which of the following represents the computational cost of your LOO cross validation procedure?

[expand title=View Answer] $L N \cdot \text{Cost}(N-1, D)$ [/expand]

Question 8: Assume you have a training dataset consisting of 1 million observations. Suppose running the closed-form solution to fit a multiple linear regression model using ridge regression on this data takes 1 second. Suppose you want to choose the penalty strength $\lambda$ by searching over 100 possible values. How long will it take to run leave-one-out (LOO) cross-validation for this selection task?

[expand title=View Answer]About 3 years [/expand]

Question 9: Assume you have a training dataset consisting of 1 million observations. Suppose running the closed-form solution to fit a multiple linear regression model using ridge regression on this data takes 1 second. Suppose you want to choose the penalty strength $\lambda$ by searching over 100 possible values. If you only want to spend about 1 hour to select $\lambda$, what value of k should you use for k-fold cross-validation?

[expand title=View Answer] k=36 [/expand]
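The timings in Questions 8 and 9 are back-of-the-envelope arithmetic, assuming each fit takes about 1 second whether it uses all of the data or slightly less:

```python
N = 1_000_000          # training observations
num_lambdas = 100      # candidate penalty values
seconds_per_fit = 1    # from the question; assumed unchanged for slightly smaller training sets

# LOO: one fit per held-out observation, for every lambda.
loo_seconds = N * num_lambdas * seconds_per_fit
loo_years = loo_seconds / (365 * 24 * 3600)
print(round(loo_years, 2))  # 3.17

# k-fold: k fits per lambda, so k * num_lambdas * seconds_per_fit <= budget.
budget_seconds = 3600  # one hour
k = budget_seconds // (num_lambdas * seconds_per_fit)
print(k)  # 36
```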

#### Quiz 2: Observing effects of L2 penalty in polynomial regression

Question 1: We first fit a 15th order polynomial model using the ‘sqft_living’ column of the ‘sales’ data frame, with a tiny L2 penalty applied.

What is the absolute value of the learned coefficient of feature power_1? Round your answer to the nearest integer. Example: 29

[expand title=View Answer] Between 70 and 150 [/expand]

Question 2: Next, we split the sales data frame into four subsets (set_1, set_2, set_3, set_4) and fit a 15th order polynomial model using each of the subsets.

For the models learned in each of these training sets, what is the smallest value you learned for the coefficient of feature power_1? Choose the range that contains this value.

[expand title=View Answer] Between -1000 and -100 [/expand]

Question 3: This question refers to the same models as the previous question.

For the models learned in each of these training sets, what is the largest value you learned for the coefficient of feature power_1? Choose the range that contains this value.

[expand title=View Answer]Between 1000 and 10000 [/expand]

Question 4: Using the same 4 subsets (set_1, set_2, set_3, set_4), we train 15th order polynomial models again, but this time we apply a large L2 penalty.

For the models learned with the high level of regularization in each of these training sets, what is the smallest value you learned for the coefficient of feature power_1? Round your answer to 2 decimal places, and use American-style decimals. Example: 2.11

[expand title=View Answer] Between 1.8 and 2.1 [/expand]

Question 5: This question refers to the same models as the previous question.

For the models learned with the high level of regularization in each of these training sets, what is the largest value you learned for the coefficient of feature power_1? Round your answer to 2 decimal places, and use American-style decimals. Example: 2.11

[expand title=View Answer] Between 2.3 and 2.8 [/expand]

Question 6: This question refers to the section “selecting an L2 penalty via cross-validation”.

What is the best value for the L2 penalty according to 10-fold cross-validation? Round your answer to 2 decimal places.

[expand title=View Answer] 1000 [/expand]

Question 7: Using the best L2 penalty found above, train a model using all training data. Which of the following ranges contains the RSS on the TEST data of the model you learn with this L2 penalty?

[expand title=View Answer]Between 8e13 and 4e14 [/expand]

#### Quiz 3: Implementing ridge regression via gradient descent

Question 1: We run ridge regression to learn the weights of a simple model that has a single feature (sqft_living), once with l2_penalty=0.0 and once with l2_penalty=1e11.

What is the value of the coefficient for sqft_living that you learned with no regularization, rounded to 1 decimal place? Use American-style decimals (e.g. 30.5)

[expand title=View Answer] 263.0 [/expand]

Question 2: This question refers to the same model as the previous question.

What is the value of the coefficient for sqft_living that you learned with high regularization (l2_penalty=1e11)? Use American-style decimals (e.g. 30.5) and round your answer to 1 decimal place.

[expand title=View Answer] 124.6 [/expand]

Question 3: This question refers to the same model as the previous question.

Comparing the lines you fit with no regularization versus high regularization (l2_penalty=1e11), which one is steeper?

[expand title=View Answer] Line fit with no regularization (l2_penalty=0) [/expand]

Question 4: This question refers to the same model as the previous question.

Using the weights learned with no regularization (l2_penalty=0), make predictions for the TEST data. In which of the following ranges does the TEST error (RSS) fall?

[expand title=View Answer]Between 2e14 and 5e14[/expand]

Question 5: We run ridge regression to learn the weights of a model that has two features (sqft_living, sqft_living15), once with l2_penalty=0.0 and once with l2_penalty=1e11.

What is the value of the coefficient for sqft_living that you learned with no regularization, rounded to 1 decimal place? Use American-style decimals (e.g. 30.5).

[expand title=View Answer] 243.1 [/expand]

Question 6: This question refers to the same model as the previous question.

What is the value of the coefficient for sqft_living that you learned with high regularization (l2_penalty=1e11)? Use American-style decimals (e.g. 30.5) and round your answer to 1 decimal place.

[expand title=View Answer] 91.5 [/expand]

Question 7: This question refers to the same model as the previous question.

Using high_penalty_weights, make predictions for the TEST data. In which of the following ranges does the TEST error (RSS) fall?

[expand title=View Answer] Between 4e14 and 8e14 [/expand]

Question 8: This question refers to the same model as the previous question.

Predict the price of the first house in the test set using the weights learned with no regularization. Do the same using the weights learned with high regularization. Which weights make a better prediction for the first house in the test set?

[expand title=View Answer] The weights learned with high regularization (l2_penalty=1e11) [/expand]

### Week 5: Machine Learning: Regression Quiz Answers

#### Quiz 1: Feature Selection and Lasso

Question 1: The best fit model of size 5 (i.e., with 5 features) always contains the set of features from the best fit model of size 4.

[expand title=View Answer] False [/expand]

Question 2: Given 20 potential features, how many models do you have to evaluate in the all subsets algorithm?

[expand title=View Answer] 1048576 [/expand]

Question 3: Given 20 potential features, how many models do you have to evaluate if you are running the forward stepwise greedy algorithm? Assume you run the algorithm all the way to the full feature set.

[expand title=View Answer] 210 [/expand]
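The counts in Questions 2 and 3 come from the $2^{20}$ possible subsets and from forward stepwise evaluating 20 candidate models, then 19, and so on down to 1. A quick check (this count leaves out the empty starting model; including it would add 1):

```python
from math import comb

D = 20
all_subsets = sum(comb(D, k) for k in range(D + 1))    # every subset, incl. the empty one
forward_stepwise = sum(D - step for step in range(D))  # 20 + 19 + ... + 1
print(all_subsets, forward_stepwise)  # 1048576 210
```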

Question 4: Which of the plots could correspond to a lasso coefficient path? Select ALL that apply.

Hint: notice $\lambda = \infty$ in the bottom right of the plots. How should coefficients behave eventually as $\lambda$ goes to infinity?


Question 5: Which of the following statements about coordinate descent is true? (Select all that apply.)

[expand title=View Answer] To test the convergence of coordinate descent, look at the size of the maximum step you take as you cycle through coordinates. [/expand]

Question 6: Using normalized features, the __ordinary least squares__ coordinate descent update for feature j has the form (with $\rho_j$ defined as in the videos):

[expand title=View Answer] $\hat{w}_j = \rho_j$ [/expand]

Question 7: Using normalized features, the __ridge regression__ coordinate descent update for feature j has the form (with $\rho_j$ defined as in the videos):

[expand title=View Answer] $\hat{w}_j = \rho_j / (\lambda + 1)$ [/expand]
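Both updates follow from setting the partial derivative of the cost to zero when each feature column has unit norm. The sketch below plugs them into cyclical coordinate descent and, on synthetic normalized data, should agree with the closed-form least-squares and ridge solutions. The helper names are illustrative, and the intercept gets no special treatment here.

```python
import numpy as np

def compute_rho(H, y, w, j):
    """rho_j = sum_i h_j(x_i) * (y_i - prediction_i computed without feature j)."""
    prediction_without_j = H @ w - H[:, j] * w[j]
    return H[:, j] @ (y - prediction_without_j)

def ols_update(rho_j):
    # With sum_i h_j(x_i)^2 = 1, setting -2*rho_j + 2*w_j = 0 gives w_j = rho_j.
    return rho_j

def ridge_update(rho_j, l2_penalty):
    # Adding l2_penalty * w_j^2 gives -2*rho_j + 2*w_j + 2*l2_penalty*w_j = 0.
    return rho_j / (l2_penalty + 1)

def cyclical_coordinate_descent(H, y, update, sweeps=200):
    w = np.zeros(H.shape[1])
    for _ in range(sweeps):
        for j in range(H.shape[1]):
            w[j] = update(compute_rho(H, y, w, j))
    return w

# Synthetic data with each column scaled to unit 2-norm ("normalized features").
rng = np.random.default_rng(4)
H = rng.normal(size=(50, 3))
H = H / np.linalg.norm(H, axis=0)
y = H @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=50)

w_ols = cyclical_coordinate_descent(H, y, ols_update)
w_ridge = cyclical_coordinate_descent(H, y, lambda rho: ridge_update(rho, 0.5))
print(w_ols, w_ridge)
```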

#### Quiz 2: Using LASSO to select features

Question 1: We learn weights on the entire house dataset, using an L1 penalty of 1e10 (or 5e2, if using scikit-learn). Some features are transformations of inputs; see the reading.

Which of the following features have been chosen by LASSO, i.e. which features were assigned nonzero weights? (Choose all that apply)

[expand title=View Answer]

sqft_living

grade

[/expand]

Question 2: We split the house sales dataset into training set, test set, and validation set and choose the l1_penalty that minimizes the error on the validation set.

In which of the following ranges does the best l1_penalty fall?

[expand title=View Answer]Between 0 and 100 [/expand]

Question 3: Using the best value of l1_penalty as mentioned in the previous question, how many nonzero weights do you have?

[expand title=View Answer] 18 [/expand]

Question 4: We explore a wide range of l1_penalty values to find a narrow region of l1_penalty values where models are likely to have the desired number of non-zero weights (max_nonzeros=7).

What value did you find for l1_penalty_max?

__If you are using Turi Create,__ enter your answer in simple decimals without commas (e.g. 1131000000), rounded to the nearest million.

__If you are using scikit-learn__, enter your answer in simple decimals without commas (e.g. 4313), rounded to the nearest integer.

[expand title=View Answer] 3792690190.73 (Turi Create; 3793000000 when rounded to the nearest million) [/expand]

Question 5: We then explore the narrow range of l1_penalty values between l1_penalty_min and l1_penalty_max.

What value of l1_penalty in our narrow range has the lowest RSS on the VALIDATION set and has sparsity __equal__ to max_nonzeros?

__If you are using Turi Create__, enter your answer in simple decimals without commas (e.g. 1131000000), rounded to the nearest million.

__If you are using scikit-learn,__ enter your answer in simple decimals without commas (e.g. 4342), rounded to the nearest integer.

[expand title=View Answer] 3448968612.16 (Turi Create; 3449000000 when rounded to the nearest million) [/expand]

Question 6: Consider the model learned with the l1_penalty found in the previous question. Which of the following features have non-zero coefficients? (Choose all that apply)

[expand title=View Answer]

1. bathrooms

2. sqft_living

[/expand]

#### Quiz 3: Implementing LASSO using coordinate descent

Question 1: From the section “Effect of L1 penalty”: Consider the simple model with 2 features trained on the entire sales dataset.

Which of the following values of l1_penalty __would not__ set w[1] to zero, but __would__ set w[2] to zero, if we were to take a coordinate gradient step in that coordinate? (Select all that apply)

[expand title=View Answer]

1.64e8

1.73e8

[/expand]

Question 2: Refer to the same model as the previous question.

Which of the following values of l1_penalty would set __both__ w[1] and w[2] to zero, if we were to take a coordinate gradient step in that coordinate? (Select all that apply)

[expand title=View Answer]

1.9e8

2.3e8

[/expand]
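Questions 1 and 2 rest on the lasso soft-thresholding update (cost = RSS + $\lambda \lVert w \rVert_1$, normalized features): a coordinate step sets $w_j$ to zero exactly when $\lambda \ge 2|\rho_j|$. A minimal sketch; the actual $\rho_1$ and $\rho_2$ come from the normalized sales data in the assignment, which this page does not reproduce, so the numbers in the example are hypothetical.

```python
def lasso_update(rho_j, l1_penalty):
    """Soft-thresholding coordinate update for a normalized feature.

    Minimizes RSS + l1_penalty * ||w||_1 in coordinate j, which sets
    w_j = 0 whenever -l1_penalty/2 <= rho_j <= l1_penalty/2.
    """
    if rho_j < -l1_penalty / 2:
        return rho_j + l1_penalty / 2
    if rho_j > l1_penalty / 2:
        return rho_j - l1_penalty / 2
    return 0.0

def is_zeroed(rho_j, l1_penalty):
    """True when one coordinate step with this penalty would set w_j to zero."""
    return lasso_update(rho_j, l1_penalty) == 0.0

# Hypothetical rho values: a penalty of 17 zeroes a weight with rho = 8 but not rho = 9.
print(is_zeroed(8.0, 17.0), is_zeroed(9.0, 17.0))  # True False
```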

Question 3: From the section “Cyclical coordinate descent”: Using the simple model (with 2 features), we run our implementation of LASSO coordinate descent on the normalized sales dataset. We apply an L1 penalty of 1e7 and tolerance of 1.0.

Which of the following ranges contains the RSS of the learned model on the normalized dataset?

[expand title=View Answer]Between 1e15 and 3e15 [/expand]

Question 4: Refer to the same model as the previous question.

Which of the following features were assigned a zero weight at convergence?

[expand title=View Answer] bedrooms [/expand]

Question 5: In the section “Evaluating LASSO fit with more features”, we split the data into training and test sets and learn weights with varying L1 penalties. The model now has 13 features.

In the model trained with l1_penalty=1e7, which of the following features has non-zero weight? (Select all that apply)

[expand title=View Answer]

1. waterfront

2. sqft_living

3. constant

[/expand]

Question 6: This question refers to the same model as the previous question.

In the model trained with l1_penalty=1e8, which of the following features has non-zero weight? (Select all that apply)

[expand title=View Answer]constant[/expand]

Question 7: This question refers to the same model as the previous question.

In the model trained with l1_penalty=1e4, which of the following features has non-zero weight? (Select all that apply)

[expand title=View Answer]

constant

sqft_living

grade

waterfront

sqft_basement

[/expand]

Question 8: In the section “Evaluating each of the learned models on the test data”, we evaluate three models on the test data. The three models were trained with the same set of features but different L1 penalties.

Which of the three models gives the lowest RSS on the TEST data?

[expand title=View Answer] The model trained with l1_penalty=1e4 [/expand]

### Week 6: Machine Learning: Regression Quiz Answers

#### Quiz 1: Nearest Neighbors & Kernel Regression

Question 1: Which of the following datasets is best suited to nearest neighbor or kernel regression? Choose all that apply.

[expand title=View Answer]

1. A dataset with two features whose observations are evenly scattered throughout the input space

2. A dataset with many observations

[/expand]

Question 2: Which of the following is the most significant advantage of k-nearest neighbor regression (for k>1) over 1-nearest neighbor regression?

[expand title=View Answer] Better copes with noise in the data [/expand]

Question 3: To obtain a fit with low variance using kernel regression, we should choose the kernel to have:

[expand title=View Answer] Large bandwidth λ [/expand]

Question 4: In k-nearest neighbor regression and kernel regression, the complexity of functions that can be represented grows as we get more data.

[expand title=View Answer] True [/expand]

Question 5: Parametric regression and 1-nearest neighbor regression will converge to the same solution as we collect more and more noiseless observations.

[expand title=View Answer] False[/expand]

Question 6: Suppose you are creating a website to help shoppers pick houses. Every time a user of your website visits the webpage for a specific house, you want to compute a prediction of the house value. You are using 1-NN to make the prediction and have 100,000 houses in the dataset, with each house having 100 features. Computing the distance between two houses using all the features takes about 10 microseconds. Assuming the cost of all other operations involved (e.g., fetching data, etc.) is negligible, about how long will it take to make a prediction using the brute-force method described in the videos?

[expand title=View Answer] 1 second [/expand]

Question 7: For the housing website described in the previous question, you learn that you need predictions within 50 milliseconds. To accomplish this, you decide to reduce the number of features in your nearest neighbor comparisons. How many features can you use?

[expand title=View Answer] 5 features [/expand]
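The brute-force estimate in Questions 6 and 7 is one distance computation per house in the data set. Working in whole microseconds keeps the arithmetic exact; the assumption that distance cost scales linearly with the number of features is ours, though it is the one the answer of 5 features implies.

```python
num_houses = 100_000
num_features = 100
microseconds_per_distance = 10  # using all 100 features

# Brute-force 1-NN compares the query against every house once.
query_microseconds = num_houses * microseconds_per_distance  # 1_000_000 us = 1 s

# Assume distance cost is proportional to the number of features used.
budget_microseconds = 50_000  # 50 milliseconds
max_features = budget_microseconds * num_features // query_microseconds
print(query_microseconds, max_features)  # 1000000 5
```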

#### Quiz 2: Predicting house prices using k-nearest neighbors regression

Question 1: From the section “Compute a single distance”: we take our query house to be the first house of the test set.

What is the Euclidean distance between the query house and the 10th house of the training set? Enter your answer in American-style decimals (e.g. 0.044) rounded to 3 decimal places.

[expand title=View Answer] 0.060 [/expand]

Question 2: From the section “Compute multiple distances”: we take our query house to be the first house of the test set.

Among the first 10 training houses, which house is the closest to the query house? Enter the 0-based index of the closest house.

[expand title=View Answer] 8 [/expand]

Question 3: From the section “Perform 1-nearest neighbor regression”:

Take the query house to be the third house of the test set (features_test[2]). What is the (0-based) index of the house in the training set that is closest to this query house?

[expand title=View Answer]382 [/expand]

Question 4: From the section “Perform 1-nearest neighbor regression”:

Take the query house to be the third house of the test set (features_test[2]). What is the predicted value of the query house based on 1-nearest neighbor regression? Enter your answer in simple decimals without comma separators (e.g. 300000), rounded to nearest whole number.

[expand title=View Answer] 249000 [/expand]

Question 5: From the section “Perform k-nearest neighbor regression”:

Take the query house to be the third house of the test set (features_test[2]). Which of the following is NOT part of the 4 training houses closest to the query house? (Note that all indices are 0-based.)

[expand title=View Answer] training house with index 2818 [/expand]

Question 6: From the section “Perform k-nearest neighbor regression”:

Take the query house to be the third house of the test set (features_test[2]). Predict the value of the query house by the simple averaging method. Enter your answer in simple decimals without comma separators (e.g. 241242), rounded to nearest whole number.

[expand title=View Answer] 413988 [/expand]
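The distance, nearest-neighbor, and averaging steps in this quiz can be sketched with NumPy as below. The function names are illustrative rather than taken from the assignment, the four training houses are made up, and the features are assumed to be already normalized (an assumption; this page does not say how they were scaled).

```python
import numpy as np

def compute_distances(features_train, features_query):
    """Euclidean distance from one query row to every training row."""
    diff = features_train - features_query  # broadcasts the query across rows
    return np.sqrt(np.sum(diff ** 2, axis=1))

def k_nearest_neighbors(k, features_train, features_query):
    """0-based indices of the k training rows closest to the query."""
    distances = compute_distances(features_train, features_query)
    return np.argsort(distances)[:k]

def predict_output_of_query(k, features_train, output_train, features_query):
    """Simple average of the k nearest neighbors' outputs."""
    neighbors = k_nearest_neighbors(k, features_train, features_query)
    return np.mean(output_train[neighbors])

# Made-up, already-normalized toy data: 4 training houses with 2 features each.
features_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
output_train = np.array([100.0, 200.0, 300.0, 400.0])
query = np.array([0.1, 0.0])

print(compute_distances(features_train, query))
print(k_nearest_neighbors(1, features_train, query))
print(predict_output_of_query(2, features_train, output_train, query))
```

Predicting for several houses is a loop over query rows, and choosing k repeats the predictions on the validation set for each candidate k, keeping the one with the lowest RSS.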

Question 7: From the section “Perform k-nearest neighbor regression”: Make predictions for the first 10 houses using k-nearest neighbors with k=10.

What is the index of the house in this query set that has the lowest predicted value? Enter an index between 0 and 9.

[expand title=View Answer] 6 [/expand]

Question 8: From the section “Perform k-nearest neighbor regression”: We use a validation set to find the best k value, i.e. one that minimizes the RSS on validation set.

If we perform k-nearest neighbors with optimal k found above, what is the RSS on the TEST data? Choose the range that contains this value.

[expand title=View Answer] Between 8e13 and 2e14 [/expand]

##### Machine Learning: Regression Course Review

In our experience, this course is worth taking: enroll in Machine Learning: Regression and gain new skills from professionals completely free.

If you get stuck on a quiz or a graded assessment, visit Networking Funda for Machine Learning: Regression Coursera Quiz Answers.

##### Get All Course Quiz Answers of Machine Learning Specialization

Machine Learning Foundations: A Case Study Approach Quiz Answer

Machine Learning: Regression Coursera Quiz Answers

Machine Learning: Classification Coursera Quiz Answers

Machine Learning: Clustering & Retrieval Quiz Answers