Machine Learning: Regression Coursera Quiz Answers (All Weeks)
Week 1: Machine Learning: Regression Quiz Answers
Quiz 1: Simple Linear Regression
Question 1: Assume you fit a regression model to predict house prices from square feet based on a training data set consisting of houses with square feet in the range of 1000 and 2000. In which interval would we expect predictions to do best?
Question 2: In a simple regression model, if you increase the input value by 1 then you expect the output to change by:
Question 3: Two people present you with fits of their simple regression model for predicting house prices from square feet. You discover that the estimated intercept and slopes are exactly the same. This necessarily implies that these two people fit their models on exactly the same data set.
Question 4: You have a data set consisting of the sales prices of houses in your neighborhood, with each sale time-stamped by the month and year in which the house sold. You want to predict the average value of houses in your neighborhood over time, so you fit a simple regression model with average house price as the output and the time index (in months) as the input. Based on 10 months of data, the estimated intercept is $4569 and the estimated slope is 143 ($/month). If you extrapolate this trend forward in time, at which time index (in months) do you predict that your neighborhood’s value will have doubled relative to the value at month (index) 10? (Assume months are 0-indexed, round to the nearest month).
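The extrapolation in Question 4 can be checked with a few lines of arithmetic. This is a minimal sketch using only the intercept and slope quoted in the question; the variable and function names are illustrative, not part of the course material.

```python
# Values quoted in the question; names are illustrative.
intercept = 4569.0   # estimated value at month index 0, in dollars
slope = 143.0        # estimated change in dollars per month

def predicted_value(month):
    """Simple regression prediction: intercept + slope * month."""
    return intercept + slope * month

value_at_10 = predicted_value(10)
target = 2 * value_at_10

# Solve intercept + slope * t = target for the time index t.
doubling_month = (target - intercept) / slope
print(value_at_10, target, round(doubling_month))
```

Under these numbers the predicted value at month 10 is 5999, so the doubled value is 11998, which the fitted line reaches at roughly month index 52.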
Question 5: Your friend in the U.S. gives you a simple regression fit for predicting house prices from square feet. The estimated intercept is -44850 and the estimated slope is 280.76. You believe that your housing market behaves very similarly, but houses are measured in square meters. To make predictions for inputs in square meters, what intercept must you use? Hint: there are 0.092903 square meters in 1 square foot. You do not need to round your answer.
(Note: the next quiz question will ask for the slope of the new model.)
Question 6: Your friend in the U.S. gives you a simple regression fit for predicting house prices from square feet. The estimated intercept is -44850 and the estimated slope is 280.76. You believe that your housing market behaves very similarly, but houses are measured in square meters. To make predictions for inputs in square meters, what slope must you use? Hint: there are 0.092903 square meters in 1 square foot. You do not need to round your answer.
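The unit change in Questions 5 and 6 only rescales the input, so the intercept is unchanged and the slope is divided by the conversion factor. A minimal sketch, using the numbers from the questions and illustrative names:

```python
SQM_PER_SQFT = 0.092903  # from the hint in the question

intercept_sqft = -44850.0
slope_sqft = 280.76

# price = intercept + slope_sqft * sqft, and sqft = sqm / SQM_PER_SQFT,
# so price = intercept + (slope_sqft / SQM_PER_SQFT) * sqm.
intercept_sqm = intercept_sqft
slope_sqm = slope_sqft / SQM_PER_SQFT

def predict_from_sqft(sqft):
    return intercept_sqft + slope_sqft * sqft

def predict_from_sqm(sqm):
    return intercept_sqm + slope_sqm * sqm

# Sanity check: the same house described in either unit gets the same price.
example_sqft = 2000.0
example_sqm = example_sqft * SQM_PER_SQFT
difference = abs(predict_from_sqft(example_sqft) - predict_from_sqm(example_sqm))
print(intercept_sqm, slope_sqm, difference)
```

The new slope comes out a little above 3022 dollars per square meter, and the two prediction functions agree up to floating-point error.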
Question 7: Consider the following data set, and the regression line fitted on this data:
Which bold/labeled point, if removed, will have the largest effect on the fitted regression line (dashed)?
Quiz 2: Fitting a simple linear regression model on housing data
Question 1: Using your Slope and Intercept from predicting prices from square feet, what is the predicted price for a house with 2650 sqft? Use American-style decimals without comma separators (e.g. 300000.34), and round your answer to 2 decimal places. Do not include the dollar sign. You do not need to round your answer.
Question 2: Using the learned slope and intercept from the squarefeet model, what is the RSS for the simple linear regression using squarefeet to predict prices on TRAINING data?
Question 3: According to the inverse regression function and the regression slope and intercept from predicting prices from square-feet, what is the estimated square-feet for a house costing $800,000? You do not need to round your answer.
Question 4: Which of the two models (square feet or bedrooms) has lower RSS on TEST data?
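The quantities this quiz asks about (a prediction, the RSS, and the inverse prediction) can be sketched in plain Python. This is a minimal sketch, not the course's notebook code: the function names are hypothetical and the toy data is made up so that the fit is exact. The real answers depend on the housing data set used in the assignment.

```python
def simple_linear_regression(x, y):
    """Closed-form least-squares fit; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(x, intercept, slope):
    return intercept + slope * x

def rss(x, y, intercept, slope):
    """Residual sum of squares of the fitted line on (x, y)."""
    return sum((yi - predict(xi, intercept, slope)) ** 2 for xi, yi in zip(x, y))

def inverse_predict(y, intercept, slope):
    """Input value whose prediction equals y."""
    return (y - intercept) / slope

# Toy data (made up): price = 50000 + 200 * sqft exactly.
toy_sqft = [1000.0, 1500.0, 2000.0, 2500.0]
toy_price = [250000.0, 350000.0, 450000.0, 550000.0]
intercept, slope = simple_linear_regression(toy_sqft, toy_price)
print(predict(2650, intercept, slope))
print(rss(toy_sqft, toy_price, intercept, slope))
print(inverse_predict(800000, intercept, slope))
```

Comparing two models on TEST data, as in Question 4, amounts to calling `rss` with the test inputs and outputs for each fitted model and picking the smaller value.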
Week 2: Machine Learning: Regression Quiz Answers
Quiz 1: Multiple Regression
Question 1: Which of the following is NOT a linear regression model. Hint: remember that a linear regression model is always linear in the parameters, but may use non-linear features.
Question 2: Your estimated model for predicting house prices has a large positive weight on ‘square feet living’. This implies that if we remove the feature ‘square feet living’ and refit the model, the new predictive performance will be worse than before.
Question 3: Complete the following: Your estimated model for predicting house prices has a positive weight on ‘square feet living’. You then add ‘lot size’ to the model and re-estimate the feature weights. The new weight on ‘square feet living’ [_________] be positive.
Question 4: If you double the value of a given feature (i.e. a specific column of the feature matrix), what happens to the least-squares estimated coefficients for every other feature? (assume you have no other feature that depends on the doubled feature i.e. no interaction terms).
Question 5: Gradient descent/ascent is…
Question 6: Gradient descent/ascent allows us to…
Question 7: Which of the following statements about step-size in gradient descent is/are TRUE (select all that apply)
Answer: 2. If the step-size is too large, gradient descent may not converge
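The effect of the step size can be seen on a one-dimensional toy problem. The sketch below is an illustration only: it minimizes f(w) = w^2, which is an assumption chosen for simplicity and not the regression cost from the assignment.

```python
def gradient_descent(step_size, w0=1.0, iterations=50):
    """Minimize f(w) = w**2 (gradient 2*w) from w0 with a fixed step size."""
    w = w0
    for _ in range(iterations):
        w = w - step_size * 2 * w
    return w

small = gradient_descent(step_size=0.1)   # each step multiplies w by 0.8
large = gradient_descent(step_size=1.1)   # each step multiplies w by -1.2
print(small, large)
```

With the small step the iterate shrinks toward the minimum at 0, while with the large step it overshoots and grows in magnitude on every iteration.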
Question 8: Let’s analyze how many computations are required to fit a multiple linear regression model using the closed-form solution based on a data set with 50 observations and 10 features. In the videos, we said that computing the inverse of the 10×10 matrix $H^T H$ was on the order of $D^3$ operations. Let’s focus on forming this matrix prior to inversion. How many multiplications are required to form the matrix $H^T H$?
Please enter a number below.
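The multiplication count in Question 8 can be checked by counting the inner-loop operations of a naive matrix product. This sketch assumes the straightforward triple loop; it does not exploit the symmetry of H^T H, which would reduce the count. The function name is illustrative.

```python
def count_multiplications_hth(n_observations, n_features):
    """Count scalar multiplications in a naive computation of H^T H."""
    count = 0
    for _row in range(n_features):              # row of H^T H
        for _col in range(n_features):          # column of H^T H
            for _term in range(n_observations):  # terms of the inner product
                count += 1
    return count

naive = count_multiplications_hth(n_observations=50, n_features=10)
print(naive)
```

Each of the D x D entries is an inner product of two length-N columns, so the naive count is N * D^2, which is 5000 for N = 50 and D = 10.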
Question 9: More generally, if you have $D$ features and $N$ observations what is the total complexity of computing $(H^T H)^{-1}$?
Quiz 2: Exploring different multiple regression models for house price prediction
Question 1: What is the mean value (arithmetic average) of the ‘bedrooms_squared’ feature on TEST data? (round to 2 decimal places)
Question 2: What is the mean value (arithmetic average) of the ‘bed_bath_rooms’ feature on TEST data? (round to 2 decimal places)
Question 3: What is the mean value (arithmetic average) of the ‘log_sqft_living’ feature on TEST data? (round to 2 decimal places)
Question 4: What is the mean value (arithmetic average) of the ‘lat_plus_long’ feature on TEST data? (round to 2 decimal places)
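The four derived features named in Questions 1 to 4 can be built and averaged as follows. This is a sketch under assumptions: the definitions are inferred from the feature names (square of bedrooms, bedrooms times bathrooms, log of sqft_living, lat plus long), the input column names are assumed, and the two rows are made-up toy data rather than the course's TEST set.

```python
import math

def add_derived_features(row):
    """Return a copy of row with the four derived features added."""
    out = dict(row)
    out["bedrooms_squared"] = row["bedrooms"] ** 2
    out["bed_bath_rooms"] = row["bedrooms"] * row["bathrooms"]
    out["log_sqft_living"] = math.log(row["sqft_living"])
    out["lat_plus_long"] = row["lat"] + row["long"]
    return out

def column_mean(rows, name):
    """Arithmetic average of one column over a list of row dicts."""
    return sum(r[name] for r in rows) / len(rows)

# Toy rows (made up), standing in for the TEST data.
toy_test = [
    {"bedrooms": 3, "bathrooms": 2.0, "sqft_living": 1800, "lat": 47.5, "long": -122.25},
    {"bedrooms": 4, "bathrooms": 2.5, "sqft_living": 2400, "lat": 47.75, "long": -122.0},
]
toy_features = [add_derived_features(r) for r in toy_test]
for name in ("bedrooms_squared", "bed_bath_rooms", "log_sqft_living", "lat_plus_long"):
    print(name, round(column_mean(toy_features, name), 2))
```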
Question 5: What is the sign (positive or negative) for the coefficient/weight for ‘bathrooms’ in model 1?
Question 6: What is the sign (positive or negative) for the coefficient/weight for ‘bathrooms’ in model 2?
Question 7: Which model (1, 2 or 3) has lowest RSS on TRAINING Data?
Question 8: Which model (1, 2 or 3) has lowest RSS on TESTING Data?
Quiz 3: Implementing gradient descent for multiple regression
Question 1: What is the value of the weight for sqft_living from your gradient descent predicting house prices (model 1)? Round your answer to 1 decimal place.
Question 2: What is the predicted price for the 1st house in the TEST data set for model 1 (round to nearest dollar)?
Question 3: What is the predicted price for the 1st house in the TEST data set for model 2 (round to nearest dollar)?
Question 4: Which estimate was closer to the true price for the 1st house on the TEST data set, model 1 or model 2?
Question 5: Which model (1 or 2) has lowest RSS on all of the TEST data?
Week 3: Machine Learning: Regression Quiz Answers
Quiz 1: Assessing Performance
Question 1: If the features of Model 1 are a strict subset of those in Model 2, the TRAINING error of the two models can never be the same.
Question 2: If the features of Model 1 are a strict subset of those in Model 2, which model will USUALLY have lowest TRAINING error?
Question 3: If the features of Model 1 are a strict subset of those in Model 2, which model will USUALLY have lowest TEST error?
Question 4: If the features of Model 1 are a strict subset of those in Model 2, which model will USUALLY have lower BIAS?
Question 5: Which of the following plots of model complexity vs. RSS is most likely from TRAINING data (for a fixed data set)?
Question 6: Which of the following plots of model complexity vs. RSS is most likely from TEST data (for a fixed data set)?
Question 7: It is always optimal to add more features to a regression model.
Question 8: A simple model with few parameters is most likely to suffer from:
Question 9: A complex model with many parameters is most likely to suffer from:
Question 10: A model with many parameters that fits training data very well but does poorly on test data is considered to be
Question 11: A common process for selecting a parameter like the optimal polynomial degree is:
Question 12: Selecting model complexity on test data (choose all that apply):
Answer: 2. Should never be done
Question 13: Which of the following statements is true (select all that apply): For a fixed model complexity, in the limit of an infinite amount of training data,
Quiz 2: Exploring the bias-variance tradeoff
Question 1: Is the sign (positive or negative) for power_15 the same in all four models?
Question 2: The plotted fitted lines all look the same in all four plots
Question 3: Which degree (1, 2, …, 15) had the lowest RSS on Validation data?
Question 4: What is the RSS on TEST data for the model with the degree selected from Validation data? (Make sure you got the correct degree from the previous question)
Week 4: Machine Learning: Regression Quiz Answers
Quiz 1: Ridge Regression
Question 1: Which of the following is NOT a valid measure of overfitting?
Question 2: In ridge regression, choosing a large penalty strength $\lambda$ tends to lead to a model with (choose all that apply):
Answer: 2. Low variance
Question 4: In ridge regression using unnormalized features, if you double the value of a given feature (i.e., a specific column of the feature matrix), what happens to the estimated coefficients for every other feature? They:
Question 5: If we only have a small number of observations, K-fold cross validation provides a better estimate of the generalization error than the validation set method.
Question 6: 10-fold cross validation is more computationally intensive than leave-one-out (LOO) cross validation.
Question 7: Assume you have a training dataset consisting of $N$ observations and $D$ features. You use the closed-form solution to fit a multiple linear regression model using ridge regression. To choose the penalty strength $\lambda$, you run leave-one-out (LOO) cross validation searching over $L$ values of $\lambda$. Let Cost(N,D) be the computational cost of running ridge regression with $N$ data points and $D$ features. Assume the prediction cost is negligible compared to the computational cost of training the model. Which of the following represents the computational cost of your LOO cross validation procedure?
Question 8: Assume you have a training dataset consisting of 1 million observations. Suppose running the closed-form solution to fit a multiple linear regression model using ridge regression on this data takes 1 second. Suppose you want to choose the penalty strength $\lambda$ by searching over 100 possible values. How long will it take to run leave-one-out (LOO) cross-validation for this selection task?
Question 9: Assume you have a training dataset consisting of 1 million observations. Suppose running the closed-form solution to fit a multiple linear regression model using ridge regression on this data takes 1 second. Suppose you want to choose the penalty strength $\lambda$ by searching over 100 possible values. If you only want to spend about 1 hour to select $\lambda$, what value of k should you use for k-fold cross-validation?
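The timing arithmetic in Questions 8 and 9 follows from counting model fits: cross-validation trains one model per fold for each candidate penalty. The sketch below assumes every fit costs about the same 1 second as a fit on the full data, which is an approximation (each fold trains on slightly less data); the names are illustrative.

```python
SECONDS_PER_FIT = 1        # from the question
N_OBSERVATIONS = 1_000_000  # from the question
N_LAMBDAS = 100            # from the question

def cross_validation_seconds(n_folds, n_lambdas=N_LAMBDAS, seconds_per_fit=SECONDS_PER_FIT):
    """One model fit per fold per candidate penalty value."""
    return n_folds * n_lambdas * seconds_per_fit

# Leave-one-out uses as many folds as there are observations.
loo_seconds = cross_validation_seconds(N_OBSERVATIONS)
loo_years = loo_seconds / (365 * 24 * 3600)

# Largest k whose total cost fits in a one-hour budget.
budget_seconds = 3600
max_folds = budget_seconds // (N_LAMBDAS * SECONDS_PER_FIT)
print(loo_seconds, round(loo_years, 2), max_folds)
```

Under this assumption LOO needs 10^8 seconds, a little over 3 years, while a one-hour budget allows about 36 folds.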
Quiz 2: Observing effects of L2 penalty in polynomial regression
Question 1: We first fit a 15th order polynomial model using the ‘sqft_living’ column of the ‘sales’ data frame, with a tiny L2 penalty applied.
What is the absolute value of the learned coefficient of feature power_1? Round your answer to the nearest whole integer. Example: 29
Question 2: Next, we split the sales data frame into four subsets (set_1, set_2, set_3, set_4) and fit a 15th order polynomial model using each of the subsets.
For the models learned in each of these training sets, what is the smallest value you learned for the coefficient of feature power_1? Choose the range that contains this value.
Question 3: This question refers to the same models as the previous question.
For the models learned in each of these training sets, what is the largest value you learned for the coefficient of feature power_1? Choose the range that contains this value.
Question 4: Using the same 4 subsets (set_1, set_2, set_3, set_4), we train 15th order polynomial models again, but this time we apply a large L2 penalty.
For the models learned with the high level of regularization in each of these training sets, what is the smallest value you learned for the coefficient of feature power_1? Round your answer to 2 decimal places, and use American-style decimals. Example: 2.11
Question 5: This question refers to the same models as the previous question.
For the models learned with the high level of regularization in each of these training sets, what is the largest value you learned for the coefficient of feature power_1? Round your answer to 2 decimal places, and use American-style decimals. Example: 2.11
Question 6: This question refers to the section “selecting an L2 penalty via cross-validation”.
What is the best value for the L2 penalty according to 10-fold validation? Round your answer to 2 decimal places.
Question 7: Using the best L2 penalty found above, train a model using all training data. Which of the following ranges contains the RSS on the TEST data of the model you learn with this L2 penalty?
Quiz 3: Implementing ridge regression via gradient descent
Question 1: We run ridge regression to learn the weights of a simple model that has a single feature (sqft_living), once with l2_penalty=0.0 and once with l2_penalty=1e11.
What is the value of the coefficient for sqft_living that you learned with no regularization, rounded to 1 decimal place? Use American-style decimals (e.g. 30.5)
Question 2: This question refers to the same model as the previous question.
What is the value of the coefficient for sqft_living that you learned with high regularization (l2_penalty=1e11)? Use American-style decimals (e.g. 30.5) and round your answer to 1 decimal place.
Answer: 124.6
Question 3: This question refers to the same model as the previous question.
Comparing the lines you fit with no regularization versus high regularization (l2_penalty=1e11), which one is steeper?
Question 4: This question refers to the same model as the previous question.
Using the weights learned with no regularization (l2_penalty=0), make predictions for the TEST data. In which of the following ranges does the TEST error (RSS) fall?
Question 5: We run ridge regression to learn the weights of a model that has two features (sqft_living, sqft_living15), once with l2_penalty=0.0 and once with l2_penalty=1e11.
What is the value of the coefficient for sqft_living that you learned with no regularization, rounded to 1 decimal place? Use American-style decimals (e.g. 30.5).
Question 6: This question refers to the same model as the previous question.
What is the value of the coefficient for sqft_living that you learned with high regularization (l2_penalty=1e11)? Use American-style decimals (e.g. 30.5) and round your answer to 1 decimal place.
Question 7: This question refers to the same model as the previous question.
Using high_penalty_weights, make predictions for the TEST data. In which of the following ranges does the TEST error (RSS) fall?
Question 8: This question refers to the same model as the previous question.
Predict the price of the first house in the test set using the weights learned with no regularization. Do the same using the weights learned with high regularization. Which weights make a better prediction for the first house in the test set?
Week 5: Machine Learning: Regression Quiz Answers
Quiz 1: Feature Selection and Lasso
Question 1: The best fit model of size 5 (i.e., with 5 features) always contains the set of features from best fit model of size 4.
Question 2: Given 20 potential features, how many models do you have to evaluate in the all subsets algorithm?
Question 3: Given 20 potential features, how many models do you have to evaluate if you are running the forward stepwise greedy algorithm? Assume you run the algorithm all the way to the full feature set.
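The model counts in Questions 2 and 3 can be computed directly. This is a minimal sketch with illustrative function names; it assumes the all-subsets count includes the model with no features, and that forward stepwise tries every remaining feature at each step.

```python
def all_subsets_model_count(n_features):
    """Every feature is either in or out of the model."""
    return 2 ** n_features

def forward_stepwise_model_count(n_features):
    """Step 1 tries n_features models, step 2 tries n_features - 1, and so on."""
    return sum(range(1, n_features + 1))

print(all_subsets_model_count(20))
print(forward_stepwise_model_count(20))
```

For 20 features this gives 1,048,576 models for all subsets versus 210 for the greedy procedure, which is why the greedy search is so much cheaper.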
Question 4: Which of the plots could correspond to a lasso coefficient path? Select ALL that apply.
Hint: notice $\lambda=\infty$ in the bottom right of the plots. How should coefficients behave eventually as $\lambda$ goes to infinity?
Question 5: Which of the following statements about coordinate descent is true? (Select all that apply.)
Question 6: Using normalized features, the ordinary least squares coordinate descent update for feature j has the form (with $\rho_j$ defined as in the videos):
Question 7: Using normalized features, the ridge regression coordinate descent update for feature j has the form (with $\rho_j$ defined as in the videos):
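The two coordinate descent updates can be sketched in code. This is a sketch under stated assumptions, not the course's reference solution: it assumes each feature column has unit norm, that rho_j is the correlation of feature j with the residual computed without feature j, and that the ridge cost is RSS plus l2_penalty times the squared norm of the weights. Setting the partial derivative to zero then gives w_j = rho_j for least squares and w_j = rho_j / (1 + l2_penalty) for ridge. All function names are hypothetical.

```python
def compute_rho(features, output, weights, j):
    """rho_j = sum_i h_j(x_i) * (y_i - prediction_i computed without feature j)."""
    rho = 0.0
    for row, y in zip(features, output):
        prediction_without_j = sum(
            w * h for k, (w, h) in enumerate(zip(weights, row)) if k != j
        )
        rho += row[j] * (y - prediction_without_j)
    return rho

def ols_update(rho_j):
    """Least-squares coordinate update with unit-norm features."""
    return rho_j

def ridge_update(rho_j, l2_penalty):
    """Ridge coordinate update with unit-norm features (assumed cost: RSS + l2_penalty * ||w||^2)."""
    return rho_j / (1.0 + l2_penalty)

# Toy data (made up): two unit-norm feature columns.
toy_features = [[1.0, 0.0], [0.0, 1.0]]
toy_output = [3.0, 4.0]
rho_0 = compute_rho(toy_features, toy_output, weights=[0.0, 0.0], j=0)
print(ols_update(rho_0), ridge_update(rho_0, l2_penalty=1.0))
```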
Quiz 2: Using LASSO to select features
Question 1: We learn weights on the entire house dataset, using an L1 penalty of 1e10 (or 5e2, if using scikit-learn). Some features are transformations of inputs; see the reading.
Which of the following features have been chosen by LASSO, i.e. which features were assigned nonzero weights? (Choose all that apply)
Answer: grad
Question 2: We split the house sales dataset into training set, test set, and validation set and choose the l1_penalty that minimizes the error on the validation set.
In which of the following ranges does the best l1_penalty fall?
Question 3: Using the best value of l1_penalty as mentioned in the previous question, how many nonzero weights do you have?
Question 4: We explore a wide range of l1_penalty values to find a narrow region of l1_penalty values where models are likely to have the desired number of non-zero weights (max_nonzeros=7).
What value did you find for l1_penalty_max?
If you are using Turi Create, enter your answer in simple decimals without commas (e.g. 1131000000), rounded to nearest millions.
If you are using scikit-learn, enter your answer in simple decimals without commas (e.g. 4313), rounded to nearest integer.
Question 5: We then explore the narrow range of l1_penalty values between l1_penalty_min and l1_penalty_max.
What value of l1_penalty in our narrow range has the lowest RSS on the VALIDATION set and has sparsity equal to max_nonzeros?
If you are using Turi Create, enter your answer in simple decimals without commas (e.g. 1131000000), rounded to nearest millions.
If you are using scikit-learn, enter your answer in simple decimals without commas (e.g. 4342), rounded to nearest integer.
Question 6: Consider the model learned with the l1_penalty found in the previous question. Which of the following features have non-zero coefficients? (Choose all that apply)
Answer: 2. sqft_living
Quiz 3: Implementing LASSO using coordinate descent
Question 1: From the section “Effect of L1 penalty”: Consider the simple model with 2 features trained on the entire sales dataset.
Which of the following values of l1_penalty would not set w[1] to zero, but would set w[2] to zero, if we were to take a coordinate gradient step in that coordinate? (Select all that apply)
Answer: 1.73e8
Question 2: Refer to the same model as the previous question.
Which of the following values of l1_penalty would set both w[1] and w[2] to zero, if we were to take a coordinate gradient step in that coordinate? (Select all that apply)
Answer: 2.3e8
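Whether a single coordinate step sets a weight to zero can be read off the soft-thresholding rule. The sketch below assumes normalized features and a cost of RSS plus l1_penalty times the L1 norm of the weights, so a weight is set to zero exactly when its rho value lies within l1_penalty / 2 of zero. The rho values and candidate penalties here are made-up toy numbers, not the ones from the assignment, and the function names are hypothetical.

```python
def lasso_update(rho_j, l1_penalty):
    """Soft-thresholding update for one non-intercept coordinate."""
    half = l1_penalty / 2.0
    if rho_j < -half:
        return rho_j + half
    if rho_j > half:
        return rho_j - half
    return 0.0

def penalties_zeroing(rho_j, candidates):
    """Candidate penalties for which the update sets the weight to zero."""
    return [p for p in candidates if lasso_update(rho_j, p) == 0.0]

# Toy values (made up): rho for w[1] and w[2], and three candidate penalties.
toy_rho_1, toy_rho_2 = 8.0, 5.0
toy_candidates = [4.0, 12.0, 20.0]
zero_w1 = penalties_zeroing(toy_rho_1, toy_candidates)
zero_w2 = penalties_zeroing(toy_rho_2, toy_candidates)
only_w2 = [p for p in zero_w2 if p not in zero_w1]
print(zero_w1, zero_w2, only_w2)
```

In this toy setup the penalty 12 zeroes w[2] but not w[1], and the penalty 20 zeroes both, which mirrors the two kinds of question asked in this quiz.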
Question 3: From the section “Cyclical coordinate descent”: Using the simple model (with 2 features), we run our implementation of LASSO coordinate descent on the normalized sales dataset. We apply an L1 penalty of 1e7 and tolerance of 1.0.
Which of the following ranges contains the RSS of the learned model on the normalized dataset?
Question 4: Refer to the same model as the previous question.
Which of the following features were assigned a zero weight at convergence?
Question 5: In the section “Evaluating LASSO fit with more features”, we split the data into training and test sets and learn weights with varying degree of L1 penalties. The model now has 13 features.
In the model trained with l1_penalty=1e7, which of the following features has non-zero weight? (Select all that apply)
Answer: 2. sqft_living
3. constant
Question 6: This question refers to the same model as the previous question.
In the model trained with l1_penalty=1e8, which of the following features has non-zero weight? (Select all that apply)
Question 7: This question refers to the same model as the previous question.
In the model trained with l1_penalty=1e4, which of the following features has non-zero weight? (Select all that apply)
Answer: sqft_living
grade
waterfront
sqft_basement
Question 8: In the section “Evaluating each of the learned models on the test data”, we evaluate three models on the test data. The three models were trained with same set of features but different L1 penalties.
Which of the three models gives the lowest RSS on the TEST data?
Week 6: Machine Learning: Regression Quiz Answers
Quiz 1: Nearest Neighbors & Kernel Regression
Question 1: Which of the following datasets is best suited to nearest neighbor or kernel regression? Choose all that apply.
Answer: 2. A dataset with many observations
Question 2: Which of the following is the most significant advantage of k-nearest neighbor regression (for k>1) over 1-nearest neighbor regression?
Question 3: To obtain a fit with low variance using kernel regression, we should choose the kernel to have:
Question 4: In k-nearest neighbor regression and kernel regression, the complexity of functions that can be represented grows as we get more data.
Question 5: Parametric regression and 1-nearest neighbor regression will converge to the same solution as we collect more and more noiseless observations.
Question 6: Suppose you are creating a website to help shoppers pick houses. Every time a user of your website visits the webpage for a specific house, you want to compute a prediction of the house value. You are using 1-NN to make the prediction and have 100,000 houses in the dataset, with each house having 100 features. Computing the distance between two houses using all the features takes about 10 microseconds. Assuming the cost of all other operations involved (e.g., fetching data, etc.) is negligible, about how long will it take to make a prediction using the brute-force method described in the videos?
Question 7: For the housing website described in the previous question, you learn that you need predictions within 50 milliseconds. To accomplish this, you decide to reduce the number of features in your nearest neighbor comparisons. How many features can you use?
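The timing estimates in Questions 6 and 7 reduce to counting distance computations. This sketch works in integer nanoseconds to avoid rounding noise and assumes the cost of one distance computation scales linearly with the number of features; that linear scaling is an assumption, and the names are illustrative.

```python
N_HOUSES = 100_000             # from the question
N_FEATURES = 100               # from the question
NANOS_PER_DISTANCE = 10_000    # 10 microseconds, from the question

# Brute-force 1-NN computes one distance per house in the data set.
query_nanos = N_HOUSES * NANOS_PER_DISTANCE
query_seconds = query_nanos / 1_000_000_000

# Assumed linear scaling: cost per feature of a single distance computation.
nanos_per_feature = NANOS_PER_DISTANCE // N_FEATURES

# Largest feature count whose full scan fits within 50 milliseconds.
budget_nanos = 50_000_000
max_features = budget_nanos // (N_HOUSES * nanos_per_feature)
print(query_seconds, max_features)
```

Under these assumptions a single prediction takes about 1 second with all 100 features, and the 50 millisecond budget leaves room for about 5 features.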
Quiz 2: Predicting house prices using k-nearest neighbors regression
Question 1: From the section “Compute a single distance”: we take our query house to be the first house of the test set.
What is the Euclidean distance between the query house and the 10th house of the training set? Enter your answer in American-style decimals (e.g. 0.044) rounded to 3 decimal places.
Question 2: From the section “Compute multiple distances”: we take our query house to be the first house of the test set.
Among the first 10 training houses, which house is the closest to the query house? Enter the 0-based index of the closest house.
Question 3: From the section “Perform 1-nearest neighbor regression”:
Take the query house to be the third house of the test set (features_test[2]). What is the (0-based) index of the house in the training set that is closest to this query house?
Question 4: From the section “Perform 1-nearest neighbor regression”:
Take the query house to be the third house of the test set (features_test[2]). What is the predicted value of the query house based on 1-nearest neighbor regression? Enter your answer in simple decimals without comma separators (e.g. 300000), rounded to nearest whole number.
Question 5: From the section “Perform k-nearest neighbor regression”:
Take the query house to be the third house of the test set (features_test[2]). Which of the following is NOT part of the 4 training houses closest to the query house? (Note that all indices are 0-based.)
Question 6: From the section “Perform k-nearest neighbor regression”:
Take the query house to be the third house of the test set (features_test[2]). Predict the value of the query house by the simple averaging method. Enter your answer in simple decimals without comma separators (e.g. 241242), rounded to nearest whole number.
Question 7: From the section “Perform k-nearest neighbor regression”: Make predictions for the first 10 houses using k-nearest neighbors with k=10.
What is the index of the house in this query set that has the lowest predicted value? Enter an index between 0 and 9.
Question 8: From the section “Perform k-nearest neighbor regression”: We use a validation set to find the best k value, i.e. one that minimizes the RSS on validation set.
If we perform k-nearest neighbors with optimal k found above, what is the RSS on the TEST data? Choose the range that contains this value.
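The steps this quiz walks through (a single Euclidean distance, the index of the closest house, and a k-nearest-neighbor prediction by simple averaging) can be sketched in plain Python. This is a minimal sketch, not the assignment's code: the function names and the `features_train` / `output_train` arguments are assumptions, the toy data is made up, and feature normalization is left out.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_nearest_neighbors(k, features_train, query):
    """0-based indices of the k training rows closest to the query."""
    distances = [euclidean_distance(row, query) for row in features_train]
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    return order[:k]

def predict_knn(k, features_train, output_train, query):
    """Average the outputs of the k nearest training rows."""
    neighbors = k_nearest_neighbors(k, features_train, query)
    return sum(output_train[i] for i in neighbors) / k

# Toy data (made up).
toy_features_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
toy_output_train = [100.0, 200.0, 300.0, 1000.0]
toy_query = [0.2, 0.0]

print(k_nearest_neighbors(1, toy_features_train, toy_query))
print(predict_knn(2, toy_features_train, toy_output_train, toy_query))
```

Choosing k on a validation set, as in Question 8, would mean calling `predict_knn` for each candidate k over the validation rows and keeping the k with the smallest RSS.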
Machine Learning: Regression Course Review
We suggest enrolling in the Machine Learning: Regression course to gain new skills from professionals. It is completely free, and in our experience it is worth it.
If you are stuck on a quiz or a graded assessment, visit Networking Funda to get the Machine Learning: Regression Coursera Quiz Answers.