Get All Weeks Machine Learning: Regression Coursera Quiz Answers
Week 1: Machine Learning: Regression Quiz Answers
Quiz 1: Simple Linear Regression
Question 1: Assume you fit a regression model to predict house prices from square feet based on a training data set consisting of houses with square feet in the range of 1000 to 2000. In which interval would we expect predictions to do best?
- [0, 1000]
- [1000, 2000]
- [2000, 3000]
Question 2: In a simple regression model, if you increase the input value by 1 then you expect the output to change by:
- Also 1
- The value of the slope parameter
- The value of the intercept parameter
- Impossible to tell
Question 3: Two people present you with fits of their simple regression model for predicting house prices from square feet. You discover that the estimated intercept and slopes are exactly the same. This necessarily implies that these two people fit their models on exactly the same data set.
- True
- False
Question 4: You have a data set consisting of the sales prices of houses in your neighborhood, with each sale time-stamped by the month and year in which the house sold. You want to predict the average value of houses in your neighborhood over time, so you fit a simple regression model with average house price as the output and the time index (in months) as the input. Based on 10 months of data, the estimated intercept is $4569 and the estimated slope is 143 ($/month). If you extrapolate this trend forward in time, at which time index (in months) do you predict that your neighborhood’s value will have doubled relative to the value at month (index) 10? (Assume months are 0-indexed, round to the nearest month).
Answer: 52
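Worked reasoning for this answer: the predicted value at month 10 is $4569 + 143 \cdot 10 = 5999$, so the neighborhood has doubled when the prediction reaches $2 \cdot 5999 = 11998$:

$$
4569 + 143t = 11998 \;\Rightarrow\; t = \frac{7429}{143} \approx 51.95 \;\Rightarrow\; t \approx 52.
$$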
Question 5: Your friend in the U.S. gives you a simple regression fit for predicting house prices from square feet. The estimated intercept is -44850 and the estimated slope is 280.76. You believe that your housing market behaves very similarly, but houses are measured in square meters. To make predictions for inputs in square meters, what intercept must you use? Hint: there are 0.092903 square meters in 1 square foot. You do not need to round your answer.
(Note: the next quiz question will ask for the slope of the new model.)
Answer: -44850
Question 6: Your friend in the U.S. gives you a simple regression fit for predicting house prices from square feet. The estimated intercept is -44850 and the estimated slope is 280.76. You believe that your housing market behaves very similarly, but houses are measured in square meters. To make predictions for inputs in square meters, what slope must you use? Hint: there are 0.092903 square meters in 1 square foot. You do not need to round your answer.
Answer: 3022
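Worked reasoning for Questions 5 and 6: since $x_{\text{ft}} = x_{\text{m}} / 0.092903$, the fitted line $y = -44850 + 280.76\,x_{\text{ft}}$ becomes

$$
y = -44850 + \frac{280.76}{0.092903}\,x_{\text{m}} \approx -44850 + 3022\,x_{\text{m}},
$$

so the intercept is unchanged and only the slope is rescaled.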
Question 7: Consider the following data set, and the regression line fitted on this data:
Which bold/labeled point, if removed, will have the largest effect on the fitted regression line (dashed)?
- a
- b
- c
- d
Quiz 2: Fitting a simple linear regression model on housing data
Question 1: Using your slope and intercept from predicting prices from square feet, what is the predicted price for a house with 2650 sqft? Use American-style decimals without comma separators (e.g. 300000.34), round your answer to 2 decimal places, and do not include the dollar sign.
Answer: 700074.85
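For reference, a minimal numpy sketch of the closed-form simple linear regression fit this assignment uses (the names `sqft` and `price` are assumed training arrays; your learned values depend on the data split):

```python
import numpy as np

def simple_linear_regression(x, y):
    # Closed-form least-squares estimates for a single-feature model.
    slope = (np.mean(x * y) - np.mean(x) * np.mean(y)) / \
            (np.mean(x * x) - np.mean(x) ** 2)
    intercept = np.mean(y) - slope * np.mean(x)
    return intercept, slope

# Usage (sqft and price are assumed numpy arrays of the training data):
# intercept, slope = simple_linear_regression(sqft, price)
# predicted_price = intercept + slope * 2650
```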
Question 2: Using the learned slope and intercept from the squarefeet model, what is the RSS for the simple linear regression using squarefeet to predict prices on TRAINING data?
- Between 5e+12 and 5.2e+12
- Between 1.1e+14 and 1.3e+14
- Between 1.1e+15 and 1.3e+15
- Between 3.3e+15 and 3.5e+15
Question 3: According to the inverse regression function and the regression slope and intercept from predicting prices from square-feet, what is the estimated square-feet for a house costing $800,000? You do not need to round your answer.
Answer: 3004
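The inverse regression function simply solves the fitted line for the input; a minimal sketch, reusing the intercept and slope learned above:

```python
def inverse_regression_prediction(output, intercept, slope):
    # Invert output = intercept + slope * input to recover the input.
    return (output - intercept) / slope

# Usage: estimated_sqft = inverse_regression_prediction(800000, intercept, slope)
```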
Question 4: Which of the two models (square feet or bedrooms) has lower RSS on TEST data?
- Model 1 (Square feet)
- Model 2 (Bedrooms)
Week 2: Machine Learning: Regression Quiz Answers
Quiz 1: Multiple Regression
Question 1: Which of the following is NOT a linear regression model? Hint: remember that a linear regression model is always linear in the parameters, but may use non-linear features.
- $y = w_0 + w_1 x$
- $y = w_0 + w_1 x^2$
- $y = w_0 + w_1 \log(x)$
- $y = w_0 w_1 + \log(w_1)\,x$
Question 2: Your estimated model for predicting house prices has a large positive weight on ‘square feet living’. This implies that if we remove the feature ‘square feet living’ and refit the model, the new predictive performance will be worse than before.
- True
- False
Question 3: Complete the following: Your estimated model for predicting house prices has a positive weight on ‘square feet living’. You then add ‘lot size’ to the model and re-estimate the feature weights. The new weight on ‘square feet living’ [_________] be positive.
- will not
- will definitely
- might
Question 4: If you double the value of a given feature (i.e. a specific column of the feature matrix), what happens to the least-squares estimated coefficients for every other feature? (assume you have no other feature that depends on the doubled feature i.e. no interaction terms).
- They double
- They halve
- They stay the same
- It is impossible to tell from the information provided
Question 5: Gradient descent/ascent is…
- A model for predicting a continuous variable
- An algorithm for minimizing/maximizing a function
- A theoretical statistical result
- An approximation to simple linear regression
- A modeling technique in machine learning
Question 6: Gradient descent/ascent allows us to…
- Predict a value based on a fitted function
- Estimate model parameters from data
- Assess performance of a model on test data
Question 7: Which of the following statements about step-size in gradient descent is/are TRUE (select all that apply)
- It’s important to choose a very small step-size
- The step-size doesn’t matter
- If the step-size is too large gradient descent may not converge
- If the step size is too small (but not zero) gradient descent may take a very long time to converge
Question 8: Let’s analyze how many computations are required to fit a multiple linear regression model using the closed-form solution based on a data set with 50 observations and 10 features. In the videos, we said that computing the inverse of the 10×10 matrix $H^T H$ was on the order of $D^3$ operations. Let’s focus on forming this matrix prior to inversion. How many multiplications are required to form the matrix $H^T H$?
Please enter a number below.
Answer: 5000
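Worked reasoning: $H$ is $N \times D = 50 \times 10$, so $H^T H$ has $D^2 = 100$ entries, and each entry is an inner product of two length-$N$ columns costing $N = 50$ multiplications:

$$
D^2 \cdot N = 10^2 \cdot 50 = 5000.
$$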
Question 9: More generally, if you have $D$ features and $N$ observations, what is the total complexity of computing $(H^T H)^{-1}$?
- $O(D^3)$
- $O(ND^3)$
- $O(ND^2 + D^3)$
- $O(ND^2)$
- $O(N^2D + D^3)$
- $O(N^2D)$
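Worked reasoning: forming $H^T H$ costs $O(ND^2)$ multiplications (as counted in the previous question), and inverting the resulting $D \times D$ matrix costs $O(D^3)$, for a total of $O(ND^2 + D^3)$.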
Quiz 2: Exploring different multiple regression models for house price prediction
Question 1: What is the mean value (arithmetic average) of the ‘bedrooms_squared’ feature on TEST data? (round to 2 decimal places)
Answer: 12.45
Question 2: What is the mean value (arithmetic average) of the ‘bed_bath_rooms’ feature on TEST data? (round to 2 decimal places)
Answer: 7.50
Question 3: What is the mean value (arithmetic average) of the ‘log_sqft_living’ feature on TEST data? (round to 2 decimal places)
Answer: 7.55
Question 4: What is the mean value (arithmetic average) of the ‘lat_plus_long’ feature on TEST data? (round to 2 decimal places)
Answer: -74.65
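For reference, the four features in Questions 1-4 are simple transformations of the raw inputs; a minimal sketch, assuming the test data sits in a pandas DataFrame with the course's column names:

```python
import numpy as np

def add_transformed_features(df):
    # df is assumed to be a pandas DataFrame with the course's column names.
    df['bedrooms_squared'] = df['bedrooms'] ** 2
    df['bed_bath_rooms'] = df['bedrooms'] * df['bathrooms']
    df['log_sqft_living'] = np.log(df['sqft_living'])
    df['lat_plus_long'] = df['lat'] + df['long']
    return df

# Usage: add_transformed_features(test)[['bedrooms_squared', 'bed_bath_rooms',
#     'log_sqft_living', 'lat_plus_long']].mean().round(2)
```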
Question 5: What is the sign (positive or negative) for the coefficient/weight for ‘bathrooms’ in model 1?
- Positive (+)
- Negative (-)
Question 6: What is the sign (positive or negative) for the coefficient/weight for ‘bathrooms’ in model 2?
- Positive (+)
- Negative (-)
Question 7: Which model (1, 2 or 3) has lowest RSS on TRAINING Data?
- Model 1
- Model 2
- Model 3
Question 8: Which model (1, 2 or 3) has lowest RSS on TESTING Data?
- Model 1
- Model 2
- Model 3
Quiz 3: Implementing gradient descent for multiple regression
Question 1: What is the value of the weight for sqft_living from your gradient descent predicting house prices (model 1)? Round your answer to 1 decimal place.
Answer: 281.9
Question 2: What is the predicted price for the 1st house in the TEST data set for model 1 (round to nearest dollar)?
Answer: 356134
Question 3: What is the predicted price for the 1st house in the TEST data set for model 2 (round to nearest dollar)?
Answer: 366651
Question 4: Which estimate was closer to the true price for the 1st house on the TEST data set, model 1 or model 2?
- Model 1
- Model 2
Question 5: Which model (1 or 2) has lowest RSS on all of the TEST data?
- Model 1
- Model 2
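For reference, a minimal numpy sketch of the gradient descent loop this quiz's assignment implements: the RSS gradient is $2H^T(Hw - y)$, and iteration stops once the gradient magnitude drops below a tolerance. Treat this as a sketch of the approach, not the canonical solution:

```python
import numpy as np

def regression_gradient_descent(feature_matrix, output, initial_weights,
                                step_size, tolerance):
    weights = np.array(initial_weights, dtype=float)
    while True:
        errors = feature_matrix.dot(weights) - output   # predictions minus truth
        gradient = 2 * feature_matrix.T.dot(errors)     # gradient of RSS
        weights -= step_size * gradient
        if np.sqrt(np.sum(gradient ** 2)) < tolerance:  # converged
            return weights
```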
Week 3: Machine Learning: Regression Quiz Answers
Quiz 1: Assessing Performance
Question 1: If the features of Model 1 are a strict subset of those in Model 2, the TRAINING error of the two models can never be the same.
- True
- False
Question 2: If the features of Model 1 are a strict subset of those in Model 2, which model will USUALLY have lowest TRAINING error?
- Model 1
- Model 2
- It’s impossible to tell with only this information
Question 3: If the features of Model 1 are a strict subset of those in Model 2, which model will USUALLY have lowest TEST error?
- Model 1
- Model 2
- It’s impossible to tell with only this information
Question 4: If the features of Model 1 are a strict subset of those in Model 2, which model will USUALLY have lower BIAS?
- Model 1
- Model 2
- It’s impossible to tell with only this information
Question 5: Which of the following plots of model complexity vs. RSS is most likely from TRAINING data (for a fixed data set)?
- a
- b
- c
- d
Question 6: Which of the following plots of model complexity vs. RSS is most likely from TEST data (for a fixed data set)?
- a
- b
- c
- d
Question 7: It is always optimal to add more features to a regression model.
- True
- False
Question 8: A simple model with few parameters is most likely to suffer from:
- High Bias
- High Variance
Question 9: A complex model with many parameters is most likely to suffer from:
- High Bias
- High Variance
Question 10: A model with many parameters that fits training data very well but does poorly on test data is considered to be
- accurate
- biased
- overfitted
- poorly estimated
Question 11: A common process for selecting a parameter like the optimal polynomial degree is:
- Bootstrapping
- Model estimation
- Multiple regression
- Minimizing test error
- Minimizing validation error
Question 12: Selecting model complexity on test data (choose all that apply):
- Allows you to avoid issues of overfitting to training data
- Provides an overly optimistic assessment of performance of the resulting model
- Is computationally inefficient
- Should never be done
Question 13: Which of the following statements is true (select all that apply): For a fixed model complexity, in the limit of an infinite amount of training data,
- The noise goes to 0
- Bias goes to 0
- Variance goes to 0
- Training error goes to 0
- Generalization error goes to 0
Quiz 2: Exploring the bias-variance tradeoff
Question 1: Is the sign (positive or negative) for power_15 the same in all four models?
- Yes, it is the same in all four models
- No, it is not the same in all four models
Question 2: The plotted fitted lines all look the same in all four plots
- True
- False
Question 3: Which degree (1, 2, …, 15) had the lowest RSS on Validation data?
Answer: 6
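The selection behind this answer fits one model per degree and keeps the degree with the lowest validation RSS; a minimal sketch using numpy's polynomial helpers in place of the course's tooling:

```python
import numpy as np

def best_polynomial_degree(train_x, train_y, valid_x, valid_y, max_degree=15):
    best_degree, best_rss = None, np.inf
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(train_x, train_y, degree)      # fit on training data
        residuals = np.polyval(coeffs, valid_x) - valid_y  # validation errors
        rss = np.sum(residuals ** 2)
        if rss < best_rss:
            best_degree, best_rss = degree, rss
    return best_degree
```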
Question 4: What is the RSS on TEST data for the model with the degree selected from Validation data? (Make sure you got the correct degree from the previous question)
- Between 1.2e+14 and 1.3e+14
- Between 1.7e+14 and 1.8e+14
- Between 3.4e+13 and 3.5e+13
- Between 5.6e+15 and 5.7e+17
Week 4: Machine Learning: Regression Quiz Answers
Quiz 1: Ridge Regression
Question 1: Which of the following is NOT a valid measure of overfitting?
- Sum of parameters ($w_1 + w_2 + \dots + w_n$)
- Sum of squares of parameters ($w_1^2 + w_2^2 + \dots + w_n^2$)
- Range of parameters, i.e., difference between maximum and minimum parameters
- Sum of absolute values of parameters ($|w_1| + |w_2| + \dots + |w_n|$)
Question 2: In ridge regression, choosing a large penalty strength $\lambda$ tends to lead to a model with (choose all that apply):
- High bias
- Low bias
- High variance
- Low variance
Question 3: Which of the following plots best characterizes the trend of bias, variance, and generalization error (all plotted over $\lambda$)?
Answer: The plot in which bias increases with $\lambda$, variance decreases with $\lambda$, and generalization error first decreases and then increases (the original answer is an image).
Question 4: In ridge regression using unnormalized features, if you double the value of a given feature (i.e., a specific column of the feature matrix), what happens to the estimated coefficients for every other feature? They:
- Double
- Halve
- Stay the same
- Impossible to tell from the information provided
Question 5: If we only have a small number of observations, K-fold cross validation provides a better estimate of the generalization error than the validation set method.
- True
- False
Question 6: 10-fold cross validation is more computationally intensive than leave-one-out (LOO) cross validation.
- True
- False
Question 7: Assume you have a training dataset consisting of $N$ observations and $D$ features. You use the closed-form solution to fit a multiple linear regression model using ridge regression. To choose the penalty strength $\lambda$, you run leave-one-out (LOO) cross validation searching over $L$ values of $\lambda$. Let $\text{Cost}(N,D)$ be the computational cost of running ridge regression with $N$ data points and $D$ features. Assume the prediction cost is negligible compared to the computational cost of training the model. Which of the following represents the computational cost of your LOO cross validation procedure?
- $LN \cdot \text{Cost}(N,D)$
- $LN \cdot \text{Cost}(N-1,D)$
- $LD \cdot \text{Cost}(N-1,D)$
- $LD \cdot \text{Cost}(N,D)$
- $L \cdot \text{Cost}(N-1,D)$
- $L \cdot \text{Cost}(N,D)$
Question 8: Assume you have a training dataset consisting of 1 million observations. Suppose running the closed-form solution to fit a multiple linear regression model using ridge regression on this data takes 1 second. Suppose you want to choose the penalty strength $\lambda$ by searching over 100 possible values. How long will it take to run leave-one-out (LOO) cross-validation for this selection task?
- About 3 hours
- About 3 days
- About 3 years
- About 3 decades
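Worked reasoning: LOO requires one fit per held-out observation per candidate $\lambda$:

$$
100 \times 10^6 \times 1\,\text{s} = 10^8\,\text{s} \approx 3.2\ \text{years}.
$$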
Question 9: Assume you have a training dataset consisting of 1 million observations. Suppose running the closed-form solution to fit a multiple linear regression model using ridge regression on this data takes 1 second. Suppose you want to choose the penalty strength $\lambda$ by searching over 100 possible values. If you only want to spend about 1 hour to select $\lambda$, what value of k should you use for k-fold cross-validation?
- k=6
- k=36
- k=600
- k=3600
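Worked reasoning: k-fold requires $100 \times k$ fits of roughly 1 second each, so the one-hour budget gives

$$
100\,k \times 1\,\text{s} \le 3600\,\text{s} \;\Rightarrow\; k \le 36.
$$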
Quiz 2: Observing effects of L2 penalty in polynomial regression
Question 1: We first fit a 15th order polynomial model using the ‘sqft_living’ column of the ‘sales’ data frame, with a tiny L2 penalty applied.
What is the absolute value of the learned coefficient of feature power_1? Round your answer to the nearest whole integer. Example: 29
Answer: Between 70 and 150
Question 2: Next, we split the sales data frame into four subsets (set_1, set_2, set_3, set_4) and fit a 15th order polynomial model using each of the subsets.
For the models learned on each of these training sets, what is the smallest value learned for the coefficient of feature power_1? Choose the range that contains this value.
- Between -10000 and -1000
- Between -1000 and -100
- Between -100 and 0
- Between 0 and 100
- Between 100 and 1000
- Between 1000 and 10000
Question 3: This question refers to the same models as the previous question.
For the models learned on each of these training sets, what is the largest value learned for the coefficient of feature power_1? Choose the range that contains this value.
- Between -10000 and -1000
- Between -1000 and -100
- Between -100 and 0
- Between 0 and 100
- Between 100 and 1000
- Between 1000 and 10000
Question 4: Using the same 4 subsets (set_1, set_2, set_3, set_4), we train 15th order polynomial models again, but this time we apply a large L2 penalty.
For the models learned with the high level of regularization in each of these training sets, what is the smallest value learned for the coefficient of feature power_1? Round your answer to 2 decimal places, and use American-style decimals. Example: 2.11
Answer: Between 1.8 and 2.1
Question 5: This question refers to the same models as the previous question.
For the models learned with the high level of regularization in each of these training sets, what is the largest value learned for the coefficient of feature power_1? Round your answer to 2 decimal places, and use American-style decimals. Example: 2.11
Answer: Between 2.3 and 2.8
Question 6: This question refers to the section “selecting an L2 penalty via cross-validation”.
What is the best value for the L2 penalty according to 10-fold cross-validation? Round your answer to 2 decimal places.
Answer: 1000
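A minimal sketch of the k-fold procedure behind this answer, using scikit-learn's Ridge in place of the course's tooling (X and y are assumed training arrays; the fold boundaries mirror the assignment's segmentation):

```python
import numpy as np
from sklearn.linear_model import Ridge

def k_fold_average_rss(k, l2_penalty, X, y):
    n = len(y)
    total_rss = 0.0
    for i in range(k):
        start, end = (n * i) // k, (n * (i + 1)) // k      # i-th validation segment
        X_val, y_val = X[start:end], y[start:end]
        X_train = np.concatenate([X[:start], X[end:]])
        y_train = np.concatenate([y[:start], y[end:]])
        model = Ridge(alpha=l2_penalty).fit(X_train, y_train)
        total_rss += np.sum((model.predict(X_val) - y_val) ** 2)
    return total_rss / k
```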
Question 7: Using the best L2 penalty found above, train a model using all training data. Which of the following ranges contains the RSS on the TEST data of the model you learn with this L2 penalty?
- Between 8e13 and 4e14
- Between 4e14 and 6e14
- Between 6e14 and 8e14
- Between 8e14 and 1e15
- Between 1e15 and 3e15
Quiz 3: Implementing ridge regression via gradient descent
Question 1: We run ridge regression to learn the weights of a simple model that has a single feature (sqft_living), once with l2_penalty=0.0 and once with l2_penalty=1e11.
What is the value of the coefficient for sqft_living that you learned with no regularization, rounded to 1 decimal place? Use American-style decimals (e.g. 30.5)
Answer: 263.0
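For reference, a minimal sketch of the ridge gradient step this assignment implements: the L2 penalty adds $2\lambda w_j$ to each feature's RSS derivative, and the intercept (index 0) is left unpenalized. A sketch of the approach, not the canonical solution:

```python
import numpy as np

def ridge_gradient_step(feature_matrix, output, weights, step_size, l2_penalty):
    errors = feature_matrix.dot(weights) - output
    gradient = 2 * feature_matrix.T.dot(errors)   # RSS part of the gradient
    gradient[1:] += 2 * l2_penalty * weights[1:]  # L2 part; intercept not penalized
    return weights - step_size * gradient
```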
Question 2: This question refers to the same model as the previous question.
What is the value of the coefficient for sqft_living that you learned with high regularization (l2_penalty=1e11)? Use American-style decimals (e.g. 30.5) and round your answer to 1 decimal place.
Answer: 124.6
Question 3: This question refers to the same model as the previous question.
Comparing the lines you fit with no regularization versus high regularization (l2_penalty=1e11), which one is steeper?
- Line fit with no regularization (l2_penalty=0)
- Line fit with high regularization (l2_penalty=1e11)
Question 4: This question refers to the same model as the previous question.
Using the weights learned with no regularization (l2_penalty=0), make predictions for the TEST data. In which of the following ranges does the TEST error (RSS) fall?
- Between 8e13 and 2e14
- Between 2e14 and 5e14
- Between 5e14 and 8e14
- Between 8e14 and 1e15
- Between 1e15 and 3e15
Question 5: We run ridge regression to learn the weights of a model that has two features (sqft_living, sqft_living15), once with l2_penalty=0.0 and once with l2_penalty=1e11.
What is the value of the coefficient for sqft_living that you learned with no regularization, rounded to 1 decimal place? Use American-style decimals (e.g. 30.5).
Answer: 243.1
Question 6: This question refers to the same model as the previous question.
What is the value of the coefficient for sqft_living that you learned with high regularization (l2_penalty=1e11)? Use American-style decimals (e.g. 30.5) and round your answer to 1 decimal place.
Answer: 91.5
Question 7: This question refers to the same model as the previous question.
Using high_penalty_weights, make predictions for the TEST data. In which of the following ranges does the TEST error (RSS) fall?
- Between 8e13 and 2e14
- Between 2e14 and 4e14
- Between 4e14 and 8e14
- Between 8e14 and 1e15
- Between 1e15 and 3e15
Question 8: This question refers to the same model as the previous question.
Predict the price of the first house in the test set using the weights learned with no regularization. Do the same using the weights learned with high regularization. Which weights make a better prediction for the first house in the test set?
- The weights learned with no regularization (l2_penalty=0)
- The weights learned with high regularization (l2_penalty=1e11)
Week 5: Machine Learning: Regression Quiz Answers
Quiz 1: Feature Selection and Lasso
Question 1: The best fit model of size 5 (i.e., with 5 features) always contains the set of features from the best fit model of size 4.
- True
- False
Question 2: Given 20 potential features, how many models do you have to evaluate in the all subsets algorithm?
Answer: 1048576
Question 3: Given 20 potential features, how many models do you have to evaluate if you are running the forward stepwise greedy algorithm? Assume you run the algorithm all the way to the full feature set.
Answer: 210
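Worked reasoning for Questions 2 and 3: all subsets evaluates every subset of the 20 features, while forward stepwise evaluates a shrinking pool of candidates at each step:

$$
2^{20} = 1{,}048{,}576, \qquad 20 + 19 + \dots + 1 = \frac{20 \cdot 21}{2} = 210.
$$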
Question 4: Which of the plots could correspond to a lasso coefficient path? Select ALL that apply.
Hint: notice $\lambda = \infty$ in the bottom right of the plots. How should coefficients behave eventually as $\lambda$ goes to infinity?
Answer: The valid coefficient paths are those in which every coefficient eventually shrinks to exactly 0 as $\lambda \to \infty$ (the original answer is an image).
Question 5: Which of the following statements about coordinate descent is true? (Select all that apply.)
- A small enough step size should be chosen to guarantee convergence.
- To test the convergence of coordinate descent, look at the size of the maximum step you take as you cycle through coordinates.
- Coordinate descent cannot be used to optimize the ordinary least squares objective.
- Coordinate descent is always less efficient than gradient descent, but is often easier to implement.
Question 6: Using normalized features, the ordinary least squares coordinate descent update for feature j has the form (with $\rho_j$ defined as in the videos):
Answer: $\hat{w}_j = \rho_j$
Question 7: Using normalized features, the ridge regression coordinate descent update for feature j has the form (with $\rho_j$ defined as in the videos):
Answer: $\hat{w}_j = \rho_j/(\lambda + 1)$
Quiz 2: Using LASSO to select features
Question 1: We learn weights on the entire house dataset, using an L1 penalty of 1e10 (or 5e2, if using scikit-learn). Some features are transformations of inputs; see the reading.
Which of the following features have been chosen by LASSO, i.e. which features were assigned nonzero weights? (Choose all that apply)
- yr_renovated
- waterfront
- sqft_living
- grade
- floors
Question 2: We split the house sales dataset into training set, test set, and validation set and choose the l1_penalty that minimizes the error on the validation set.
In which of the following ranges does the best l1_penalty fall?
- Between 0 and 100
- Between 100 and 1000
- Between 1000 and 10000
- Between 10000 and 100000
- Greater than 100000
Question 3: Using the best value of l1_penalty as mentioned in the previous question, how many nonzero weights do you have?
Answer: 18
Question 4: We explore a wide range of l1_penalty values to find a narrow region of l1_penalty values where models are likely to have the desired number of non-zero weights (max_nonzeros=7).
What value did you find for l1_penalty_max?
If you are using Turi Create, enter your answer in simple decimals without commas (e.g. 1131000000), rounded to nearest millions.
If you are using scikit-learn, enter your answer in simple decimals without commas (e.g. 4313), rounded to nearest integer.
Answer: 3792690190.73
Question 5: We then explore the narrow range of l1_penalty values between l1_penalty_min and l1_penalty_max.
What value of l1_penalty in our narrow range has the lowest RSS on the VALIDATION set and has sparsity equal to max_nonzeros?
If you are using Turi Create, enter your answer in simple decimals without commas (e.g. 1131000000), rounded to nearest millions.
If you are using scikit-learn, enter your answer in simple decimals without commas (e.g. 4342), rounded to nearest integer.
Answer: 3448968612.16
Question 6: Consider the model learned with the l1_penalty found in the previous question. Which of the following features have non-zero coefficients? (Choose all that apply)
- sqft_living
- bedrooms_square
- sqft_lot_sqrt
- bathrooms
- floors
Quiz 3: Implementing LASSO using coordinate descent
Question 1: From the section “Effect of L1 penalty”: Consider the simple model with 2 features trained on the entire sales dataset.
Which of the following values of l1_penalty would not set w[1] to zero, but would set w[2] to zero, if we were to take a coordinate gradient step in that coordinate? (Select all that apply)
- 1.4e8
- 1.64e8
- 1.73e8
- 1.9e8
- 2.3e8
Question 2: Refer to the same model as the previous question.
Which of the following values of l1_penalty would set both w[1] and w[2] to zero, if we were to take a coordinate gradient step in that coordinate? (Select all that apply)
- 1.4e8
- 1.64e8
- 1.73e8
- 1.9e8
- 2.3e8
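The rule behind Questions 1 and 2: with normalized features, the lasso coordinate update soft-thresholds $\rho_j$, setting $w_j$ to zero exactly when

$$
-\frac{\lambda}{2} \le \rho_j \le \frac{\lambda}{2} \quad\Longleftrightarrow\quad \lambda \ge 2|\rho_j|,
$$

so an l1_penalty zeroes w[2] but not w[1] precisely when it is at least $2|\rho_2|$ but smaller than $2|\rho_1|$.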
Question 3: From the section “Cyclical coordinate descent”: Using the simple model (with 2 features), we run our implementation of LASSO coordinate descent on the normalized sales dataset. We apply an L1 penalty of 1e7 and tolerance of 1.0.
Which of the following ranges contains the RSS of the learned model on the normalized dataset?
- Between 8e13 and 2e14
- Between 2e14 and 5e14
- Between 5e14 and 8e14
- Between 8e14 and 1e15
- Between 1e15 and 3e15
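For reference, a minimal sketch of the single-coordinate update this section's implementation cycles through (features are assumed normalized; the intercept at index 0 is not regularized):

```python
import numpy as np

def lasso_coordinate_descent_step(i, feature_matrix, output, weights, l1_penalty):
    prediction = feature_matrix.dot(weights)
    # rho_i: correlation of feature i with the residual that excludes feature i.
    rho_i = (feature_matrix[:, i] *
             (output - prediction + weights[i] * feature_matrix[:, i])).sum()
    if i == 0:                            # intercept: no soft-thresholding
        return rho_i
    elif rho_i < -l1_penalty / 2.0:
        return rho_i + l1_penalty / 2.0
    elif rho_i > l1_penalty / 2.0:
        return rho_i - l1_penalty / 2.0
    else:
        return 0.0
```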
Question 4: Refer to the same model as the previous question.
Which of the following features were assigned a zero weight at convergence?
- constant
- sqft_living
- bedrooms
Question 5: In the section “Evaluating LASSO fit with more features”, we split the data into training and test sets and learn weights with varying degree of L1 penalties. The model now has 13 features.
In the model trained with l1_penalty=1e7, which of the following features have non-zero weight? (Select all that apply)
- constant
- sqft_living
- grade
- waterfront
- sqft_basement
Question 6: This question refers to the same model as the previous question.
In the model trained with l1_penalty=1e8, which of the following features have non-zero weight? (Select all that apply)
- constant
- sqft_living
- grade
- waterfront
- sqft_basement
Question 7: This question refers to the same model as the previous question.
In the model trained with l1_penalty=1e4, which of the following features have non-zero weight? (Select all that apply)
- constant
- sqft_living
- grade
- waterfront
- sqft_basement
Question 8: In the section “Evaluating each of the learned models on the test data”, we evaluate three models on the test data. The three models were trained with same set of features but different L1 penalties.
Which of the three models gives the lowest RSS on the TEST data?
- The model trained with 1e4
- The model trained with 1e7
- The model trained with 1e8
Week 6: Machine Learning: Regression Quiz Answers
Quiz 1: Nearest Neighbors & Kernel Regression
Question 1: Which of the following datasets is best suited to nearest neighbor or kernel regression? Choose all that apply.
- A dataset with many features
- A dataset with two features whose observations are evenly scattered throughout the input space
- A dataset with many observations
- A dataset with only a few observations
Question 2: Which of the following is the most significant advantage of k-nearest neighbor regression (for k>1) over 1-nearest neighbor regression?
- Removes discontinuities in the fit
- Better handles boundaries and regions with few observations
- Better copes with noise in the data
Question 3: To obtain a fit with low variance using kernel regression, we should choose the kernel to have:
- Small bandwidth λ
- Large bandwidth λ
Question 4: In k-nearest neighbor regression and kernel regression, the complexity of functions that can be represented grows as we get more data.
- True
- False
Question 5: Parametric regression and 1-nearest neighbor regression will converge to the same solution as we collect more and more noiseless observations.
- True
- False
Question 6: Suppose you are creating a website to help shoppers pick houses. Every time a user of your website visits the webpage for a specific house, you want to compute a prediction of the house value. You are using 1-NN to make the prediction and have 100,000 houses in the dataset, with each house having 100 features. Computing the distance between two houses using all the features takes about 10 microseconds. Assuming the cost of all other operations involved (e.g., fetching data, etc.) is negligible, about how long will it take to make a prediction using the brute-force method described in the videos?
- 10 milliseconds
- 100 milliseconds
- 1 second
- 10 seconds
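Worked reasoning: a brute-force 1-NN query computes one distance per house in the dataset:

$$
100{,}000 \times 10\,\mu\text{s} = 1\,\text{s}.
$$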
Question 7: For the housing website described in the previous question, you learn that you need predictions within 50 milliseconds. To accomplish this, you decide to reduce the number of features in your nearest neighbor comparisons. How many features can you use?
- 1 feature
- 5 features
- 10 features
- 20 features
- 50 features
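Worked reasoning: distance cost scales linearly with the number of features, so each feature costs about $10\,\mu\text{s}/100 = 0.1\,\mu\text{s}$ per distance. With $f$ features a query takes $100{,}000 \times 0.1 f\,\mu\text{s} = 0.01 f$ seconds, and $0.01 f \le 0.05$ gives $f \le 5$.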
Quiz 2: Predicting house prices using k-nearest neighbors regression
Question 1: From the section “Compute a single distance”: we take our query house to be the first house of the test set.
What is the Euclidean distance between the query house and the 10th house of the training set? Enter your answer in American-style decimals (e.g. 0.044) rounded to 3 decimal places.
Answer: 0.060
Question 2: From the section “Compute multiple distances”: we take our query house to be the first house of the test set.
Among the first 10 training houses, which house is the closest to the query house? Enter the 0-based index of the closest house.
Answer: 8
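A minimal numpy sketch of the distance computations in these sections (features_train and features_test are assumed to be the assignment's normalized feature matrices):

```python
import numpy as np

def euclidean_distance(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def compute_distances(feature_instances, query):
    # One distance per row, vectorized.
    return np.sqrt(np.sum((feature_instances - query) ** 2, axis=1))

# Usage (Questions 1 and 2):
# euclidean_distance(features_test[0], features_train[9])
# np.argmin(compute_distances(features_train[:10], features_test[0]))
```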
Question 3: From the section “Perform 1-nearest neighbor regression”:
Take the query house to be the third house of the test set (features_test[2]). What is the (0-based) index of the house in the training set that is closest to this query house?
Answer: 382
Question 4: From the section “Perform 1-nearest neighbor regression”:
Take the query house to be the third house of the test set (features_test[2]). What is the predicted value of the query house based on 1-nearest neighbor regression? Enter your answer in simple decimals without comma separators (e.g. 300000), rounded to nearest whole number.
Answer: 249000
Question 5: From the section “Perform k-nearest neighbor regression”:
Take the query house to be the third house of the test set (features_test[2]). Which of the following is NOT part of the 4 training houses closest to the query house? (Note that all indices are 0-based.)
- training house with index 382
- training house with index 1149
- training house with index 2818
- training house with index 3142
- training house with index 4087
Question 6: From the section “Perform k-nearest neighbor regression”:
Take the query house to be the third house of the test set (features_test[2]). Predict the value of the query house by the simple averaging method. Enter your answer in simple decimals without comma separators (e.g. 241242), rounded to nearest whole number.
Answer: 413988
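The simple averaging prediction used here, as a minimal sketch (output_train is the assumed array of training prices):

```python
import numpy as np

def predict_knn_average(k, features_train, output_train, query):
    distances = np.sqrt(np.sum((features_train - query) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]     # indices of the k closest houses
    return output_train[nearest].mean()

# Usage: predict_knn_average(4, features_train, output_train, features_test[2])
```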
Question 7: From the section “Perform k-nearest neighbor regression”: Make predictions for the first 10 houses using k-nearest neighbors with k=10.
What is the index of the house in this query set that has the lowest predicted value? Enter an index between 0 and 9.
Answer: 6
Question 8: From the section “Perform k-nearest neighbor regression”: We use a validation set to find the best k value, i.e., one that minimizes the RSS on the validation set.
If we perform k-nearest neighbors with optimal k found above, what is the RSS on the TEST data? Choose the range that contains this value.
- Between 8e13 and 2e14
- Between 2e14 and 5e14
- Between 5e14 and 8e14
- Between 8e14 and 1e15
- Between 1e15 and 3e15
Machine Learning: Regression Course Review
In our experience, the Machine Learning: Regression course is worth enrolling in: you can pick up new skills from professionals, completely free.
If you get stuck anywhere on a quiz or a graded assessment, just visit Networking Funda to get the Machine Learning: Regression Coursera quiz answers.
Get All Course Quiz Answers of Machine Learning Specialization
Machine Learning Foundations: A Case Study Approach Quiz Answer
Machine Learning: Regression Coursera Quiz Answers
Machine Learning: Classification Coursera Quiz Answers
Machine Learning: Clustering & Retrieval Quiz Answers