Mathematics for Machine Learning: Linear Algebra Quiz Answers


In this course on Linear Algebra, we look at what linear algebra is and how it relates to vectors and matrices. Then we look at what vectors and matrices are and how to work with them, including the knotty problem of eigenvalues and eigenvectors, and how to use these to solve problems. Finally, we look at how to use these ideas to do fun things with datasets – like how to rotate images of faces and how to extract eigenvectors to see how the PageRank algorithm works.

Since we’re aiming at data-driven applications, we’ll be implementing some of these ideas in code, not just on pencil and paper. Towards the end of the course, you’ll write code blocks and encounter Jupyter notebooks in Python, but don’t worry, these will be quite short, focused on the concepts, and will guide you through even if you’ve not coded before. At the end of this course, you will have an intuitive understanding of vectors and matrices that will help you bridge the gap between linear algebra problems and how to apply these concepts to machine learning.

Enroll on Coursera

Mathematics for Machine Learning: Linear Algebra Coursera Quiz Answers

Week 1 Quiz Answers

Quiz 1: Exploring parameter space

Q1. In this exercise, we shall see how it is often convenient to use vectors in machine learning. These could be in the form of data itself, or model parameters, and so on.

The purpose of this exercise is to set the scene for Linear Algebra and the rest of the maths we will cover in the specialization. If this is confusing right now – stick with us! We’ll build up your skills throughout the rest of the course. For this reason we’ve set a low pass mark for this quiz, but even if you don’t pass in one go, reading the feedback from a wrong answer can often give more insight than guessing a correct answer!

* * *

The problem we shall focus on in this exercise is the distribution of heights in a population.

If we do a survey of the heights of people in a population, we may get a distribution like this:

This histogram indicates how likely it is for anyone in the survey to be in a particular height range. (6 ft is around 183 cm)

This histogram can also be represented by a vector, i.e. a list of numbers. In this case, we record the frequency of people with heights in little groups at 2.5 cm intervals, i.e. between 150 cm and 152.5 cm, between 152.5 cm and 155 cm, and so on. We can define this as the vector f with components,

\mathbf{f} = \begin{bmatrix} f_{150.0,152.5} \\ f_{152.5,155.0} \\ f_{155.0,157.5} \\ f_{157.5,160.0} \\ f_{160.0,162.5} \\ \vdots \end{bmatrix}

These vector components are then the sizes of each bar in the histogram.
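To make this concrete, here is a short Python sketch that bins a small, made-up sample of heights into 2.5 cm intervals and normalises the counts into a frequency vector like f. The sample data is our own, not the course's:

```python
# Bin a small, made-up sample of heights (cm) into 2.5 cm intervals,
# then normalise the counts so the frequencies sum to 1.
heights = [151.0, 153.2, 154.8, 156.1, 158.7, 159.9, 161.4, 154.0, 157.3, 162.2]
start, width, n_bins = 150.0, 2.5, 5

counts = [0] * n_bins
for h in heights:
    i = int((h - start) // width)   # index of the bin containing h
    if 0 <= i < n_bins:
        counts[i] += 1

f = [c / len(heights) for c in counts]  # the frequency vector
```

Each component of f is the height of one histogram bar, exactly as described above.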

Of the following statements, select all that you think are true.

  • None of the other statements.
  • There are at least 10 elements in the frequency vector, f.
  • No one in the world is less than 160 cm tall.
  • If another sample was taken under the same conditions, the frequencies would be exactly the same.
  • If another sample was taken under the same conditions, the frequencies should be broadly similar.

Q2. One of the tasks of machine learning is to fit a model to data in order to represent the underlying distribution.

For the heights of a population, a model we may use to predict frequencies is the Normal (or Gaussian) distribution. This is a model for a bell-shaped curve, which looks like this

It has the slightly complicated equation,

g(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),

the exact form of which is unimportant, except that it depends on two parameters: the mean, μ, where the curve is centred, and the standard deviation, σ, which is the characteristic width of the bell curve (measured from the mean).
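Despite looking complicated, the formula translates directly into a few lines of code. A minimal sketch (the function name g is our own choice):

```python
import math

def g(x, mu, sigma):
    """Normal (Gaussian) density with mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

# The curve peaks at x = mu and is symmetric about the mean.
peak = g(160, 160, 6)
```

Note the 1/σ out front: increasing σ both widens the bell and lowers its peak.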

We can put these two parameters in a vector,

\mathbf{p} = \begin{bmatrix} \mu \\ \sigma \end{bmatrix}.

Pick the parameter vector p which best describes the distribution pictured.

  • p = [143, 167]
  • p = [155, 12]
  • p = [167, 24]
  • p = [155, 3]
  • p = [167, 12]

Q3. Pick the Normal distribution that corresponds most closely to the parameter vector p = [3, 3].

Q4. A model allows us to predict the data in a distribution. In our example we can start with a parameter vector p and convert it to a vector of expected frequencies g_p, for example,

\mathbf{g}_\mathbf{p} = \begin{bmatrix} g_{150.0,152.5} \\ g_{152.5,155.0} \\ g_{155.0,157.5} \\ g_{157.5,160.0} \\ g_{160.0,162.5} \\ \vdots \end{bmatrix}

A model is only considered good if it fits the measured data well. Some specific values for the parameters will be better than others for a model. We need a way to fit a model’s parameters to the data and to quantify how good that fit is.

One way of doing so is to calculate the “residuals”, which is the difference between the measured data and the modelled prediction for each histogram bin.

This is illustrated below. The model is shown in pink, the measured data in orange, and where they overlap in green. The heights of the pink and orange bars are the residuals.

A better fit would have as much overlap as possible, reducing the residuals as much as possible.

How could the model be improved to give the best fit to the data?

  • Keep the mean, μ, approximately the same.
  • Keep the standard deviation, σ, approximately the same.
  • Increase the mean, μ.
  • Increase the standard deviation, σ.
  • Decrease the standard deviation, σ.
  • Decrease the mean, μ.

Q5. The performance of a model can be quantified in a single number. One measure we can use is the Sum of Squared Residuals, SSR. Here we take all of the residuals (the differences between the measured and predicted data), square them and add them together.

In the language of vectors we can write this as

\mathrm{SSR}(\mathbf{p}) = |\mathbf{f} - \mathbf{g}_\mathbf{p}|^2,

which will be explained further on in this course.

Use the following code block to play with parameters of a model, and try to get the best fit to the data.

```python
# Play with values of μ and σ to find the best fit.
μ = 160 ; σ = 6
p = [μ, σ]
histogram(p)
```

Find a set of parameters with a fit SSR ≤ 0.00051.

Input your fitted parameters into the code block below.

```python
# Replace μ and σ with values that minimise the SSR.
p = [μ, σ]
```
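If you want to reproduce the idea outside the course’s notebook, here is a sketch. The synthetic data, bin centres and helper names below are our own assumptions, not the notebook’s actual environment:

```python
import math

def gaussian(x, mu, sigma):
    # Normal density evaluated at x.
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def ssr(f, centres, mu, sigma):
    """Sum of squared residuals SSR(p) = |f - g_p|^2 between data and model."""
    width = centres[1] - centres[0]
    g_p = [gaussian(x, mu, sigma) * width for x in centres]  # predicted frequencies
    return sum((fi - gi) ** 2 for fi, gi in zip(f, g_p))

# Synthetic "measured" data drawn exactly from a bell curve with μ = 178, σ = 10.
centres = [150.0 + 2.5 * i + 1.25 for i in range(24)]
f = [gaussian(x, 178, 10) * 2.5 for x in centres]

# The true parameters give a (near-)zero SSR; wrong parameters score worse.
best = ssr(f, centres, 178, 10)
worse = ssr(f, centres, 160, 6)
```

Tuning μ and σ by hand, as the quiz asks, is just searching for the parameter pair that drives this number down.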

Q6. Since each parameter vector p represents a different bell curve, each with its own value for the sum of squared residuals, SSR, we can draw the surface of SSR values over the space spanned by p, i.e. over μ and σ in this example.

Here is an illustration of this surface for our data.

Every point on this surface represents the SSR of a choice of parameters, with some bell curves performing better at representing the data than others.

We can take a ‘top-down’ view of the surface, and view it as a contour map, where each of the contours (in green here) represents a constant value of the SSR.

The goal in machine learning is to find the parameter set where the model fits the data as well as it possibly can. This translates into finding the lowest point, the global minimum, in this space.

Select all true statements below.

  • None of the other statements.
  • You get the same model by following along a contour line.
  • At the minimum of the surface, the model exactly matches the measured data.
  • Each point on the surface represents a set of parameters p = [μ, σ].
  • Moving at right angles to contour lines in the parameter space has a greater effect on the fit than moving in other directions.

Q7. Often we can’t see the whole parameter space, so instead of just picking the lowest point, we have to make educated guesses about where better points will be.

We can define another vector, Δp, in the same space as p that tells us what change can be made to p to get a better fit.

For example, a model with parameters p′ = p + Δp will produce a better fit to the data, if we can find a suitable Δp.

The second course in this specialisation will detail how to calculate these changes in parameters, Δp.

Given the following contour map,

What Δp will give the best improvement in the model?

  • Δp = [−2, −2]
  • Δp = [2, 2]
  • Δp = [−2, 2]
  • Δp = [2, −2]
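Although computing Δp properly is the subject of the second course, you can already estimate the downhill direction numerically. The sketch below uses finite differences on a simple bowl-shaped stand-in for the SSR surface; the cost function is our own toy example, not the course’s:

```python
def downhill_direction(cost, p, h=1e-6):
    """Finite-difference estimate of the negative gradient of a cost function:
    the direction in parameter space that reduces the cost fastest."""
    grad = []
    for i in range(len(p)):
        p_up = list(p); p_up[i] += h
        p_dn = list(p); p_dn[i] -= h
        grad.append((cost(p_up) - cost(p_dn)) / (2 * h))
    return [-g for g in grad]

# Toy SSR-like surface: a bowl with its minimum at (μ, σ) = (167, 12).
cost = lambda p: (p[0] - 167) ** 2 + (p[1] - 12) ** 2

# From (165, 14), the best small step Δp points towards the minimum.
delta_p = downhill_direction(cost, [165, 14])
```

The returned Δp is perpendicular to the local contour line, which is exactly why moving at right angles to contours changes the fit fastest.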

Quiz 2: Solving some simultaneous equations

Q1. In this quiz you’ll be reminded of how to solve linear simultaneous equations as a way to practice some basic linear algebra. Some of the ideas presented here will be relevant later in the course.

Solving simultaneous equations is the process of finding the values of the variables (here xx and yy) that satisfy the system of equations. Let’s start with the simplest type of simultaneous equation, where we already know all but one of the variables:

3x − y = 2

x = 4

Substitute the value of xx into the first equation to find yy, then select the correct values of xx and yy below.

  • x = 4, y = 2
  • x = 4, y = 10
  • x = 4, y = 14
  • x = 4, y = −10

Q2. The first goal when solving simple simultaneous equations should be to isolate one of the variables. For example, try taking the second equation away from the first to solve the following pair of equations:

3x − 2y = 7

2x − 2y = 2

What value did you find for xx? Now substitute xx into one of the equations to find yy, and select the correct pair below:

  • x = 3, y = 1
  • x = 7, y = 7
  • x = 5, y = 4
  • x = 1, y = −4

Q3. This method is called elimination, and you can use it even when the coefficients, the numbers in front of x and y, aren’t the same.

For example, to solve the following equations try multiplying both sides of the first equation by 2, then solve using the same method as the last question.

3x − 2y = 4

6x + 3y = 15

Select the correct values of x and y below:

  • x = 4, y = −2
  • x = 3, y = 1
  • x = 1, y = 2
  • x = 2, y = 1

Q4. A very similar technique can be used to find the inverse of a matrix, which you will learn about in week three of this course.

There is also the substitution method, where we rearrange one of the equations to the form x = ay + b or y = cx + d and then substitute x or y into the other equation. Use any method you’d like to solve the following simultaneous equations:

−2x + 2y = 20

5x + 3y = 6

Select the correct values of x and y below:

  • x = 5, y = 15
  • x = −5, y = 5
  • x = −3, y = 7
  • x = 3, y = 13

Q5. Systems of simultaneous equations can have more than two unknown variables. Below there is a system with three: x, y and z. First try to find one of the variables by elimination or substitution, which will lead to two equations and two unknown variables. Continue the process to find all of the variables.

Which values of x, y and z solve the following equations?

3x − 2y + z = 7

x + y + z = 2

3x − 2y − z = 3

Before you move on, you might like to think about how many equations you would need to uniquely determine four, five, or more variables. Are there any other rules for how the equations have to be related? In week two of this course you will learn about linear independence, which is very closely related to this.

  • x = 1, y = −1, z = −2
  • x = −1, y = −3, z = 4
  • x = 2, y = −2, z = 2
  • x = 1, y = −1, z = 2
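Whichever candidate you arrive at, you can check it mechanically by substituting it back into all three equations. A minimal sketch:

```python
# Verify a candidate solution of the three simultaneous equations by substitution.
def satisfies(x, y, z):
    return (3*x - 2*y + z == 7 and
            x + y + z == 2 and
            3*x - 2*y - z == 3)

ok = satisfies(1, -1, 2)       # a candidate that works
not_ok = satisfies(2, -2, 2)   # a candidate that fails the first equation
```

A true solution must satisfy every equation simultaneously; failing even one rules it out.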

Quiz 3: Doing some vector operations

Q1. The aim of this quiz is to familiarise yourself with vectors and some basic vector operations.

For the following questions, the vectors a, b, c, d and e refer to those in this diagram:

The sides of each square on the grid are of length 1. What is the numerical representation of the vector a?

  • [1, 2]
  • [2, 1]
  • [2, 2]
  • [1, 1]

Q2. Which vector in the diagram corresponds to [−1, 2]?

  • Vector a
  • Vector b
  • Vector c
  • Vector d

Q3. What vector is 2c?

Please select all correct answers.

  • e
  • [−2, 2]
  • a
  • [2, 2]

Q4. What vector is −b?

Please select all correct answers.

  • [−1, 2]
  • e
  • d
  • [−2, 1]

Q5. In the previous videos you saw that vectors can be added by placing them start-to-end. For example, the following diagram represents the sum of two new vectors, u + v:

The sides of each square on the grid are still of length 1. Which of the following equations does the diagram represent?

  • [1, 2] + [0, 1] = [2, 2]
  • [1, 1] + [1, 0] = [2, 1]
  • [1, 2] + [1, 0] = [2, 2]
  • [2, 1] + [0, 1] = [2, 2]

Q6. Let’s return to our vectors defined by the diagram below:

What is the vector b + e?

  • [−1, −1]
  • [2, −1]
  • [1, 3]
  • [−1, 2]

Q7. What is the vector d − b?

  • [−2, 4]
  • [4, −2]
  • [−4, 2]
  • [2, −4]
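All of these questions come down to component-wise arithmetic. A sketch with example vectors of our own choosing (the diagram’s actual vectors are not reproduced here):

```python
# Component-wise vector operations in 2D: addition, subtraction, scaling.
def add(u, v):
    return [ui + vi for ui, vi in zip(u, v)]

def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]

def scale(k, u):
    return [k * ui for ui in u]

u, v = [1, 2], [1, 0]
u_plus_v = add(u, v)     # place v at the end of u
u_minus_v = sub(u, v)    # add the reversed vector -v
double_u = scale(2, u)   # twice as long, same direction
```

Negating a vector (like −b above) is just `scale(-1, b)`: same length, opposite direction.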

Week 2 Quiz Answers

Quiz 1: Dot product of vectors

Q1. As we have seen in the lecture videos, the dot product of vectors has a lot of applications. Here, you will complete some exercises involving the dot product.

We have seen that the size of a vector with two components is calculated using Pythagoras’ theorem; for example, the following diagram shows how we calculate the size of the orange vector

\mathbf{r} = \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}:

In fact, this definition can be extended to any number of dimensions; the size of a vector is the square root of the sum of the squares of its components. Using this information, what is the size of the vector

\mathbf{s} = \begin{bmatrix} 1 \\ 3 \\ 4 \\ 2 \end{bmatrix}?


  • |s| = 30
  • |s| = √30
  • |s| = √10
  • |s| = 10
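The n-dimensional size formula is one line of code; here it is checked on the vector s above (a sketch in plain Python using the standard math module):

```python
import math

def size(v):
    """Size (norm) of a vector: square root of the sum of squared components."""
    return math.sqrt(sum(vi ** 2 for vi in v))

s_size = size([1, 3, 4, 2])   # sqrt(1 + 9 + 16 + 4) = sqrt(30)
```

In two dimensions this reduces to the familiar Pythagoras result, e.g. size([3, 4]) is 5.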

Q2. Remember the definition of the dot product from the videos. For two n-component vectors,

\mathbf{a}\cdot\mathbf{b} = a_1 b_1 + a_2 b_2 + \dots + a_n b_n.

What is the dot product of the vectors

\mathbf{r} = \begin{bmatrix} -5 \\ 3 \\ 2 \\ 8 \end{bmatrix} \quad \text{and} \quad \mathbf{s} = \begin{bmatrix} 1 \\ 2 \\ -1 \\ 0 \end{bmatrix}?

  • r·s = [−4, 5, 1, 9]
  • r·s = 1
  • r·s = [−5, 6, −2, 0]
  • r·s = −1
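The definition translates to a single line of Python; note that the result is one number, not a vector:

```python
# Dot product: sum of products of corresponding components.
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

result = dot([-5, 3, 2, 8], [1, 2, -1, 0])   # (-5)(1) + (3)(2) + (2)(-1) + (8)(0)
```

Working through the terms: −5 + 6 − 2 + 0.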

Q3. The lectures introduced the idea of projecting one vector onto another. The following diagram shows the projection of s onto r when the vectors are in two dimensions:

Remember that the scalar projection is the size of the green vector. If the angle between s and r is greater than π/2, the projection will also have a minus sign.

We can do projection in any number of dimensions. Consider two vectors with three components, r =

What is the scalar projection of s onto r?

  • −2
  • 2
  • −1/2
  • 1/2

Q4. Remember that in the projection diagram, the vector projection is the green vector:

Let r =

What is the vector projection of s onto r?

  • [6, 4, 0]
  • [6/5, −8/5, 0]
  • [30, −20, 0]
  • [6, −8, 0]
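Both kinds of projection follow directly from the dot product. A sketch with example vectors of our own choosing, since the quiz’s exact r and s are not reproduced above:

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def scalar_projection(s, r):
    """Signed length of the shadow of s on r: (s·r) / |r|."""
    return dot(s, r) / math.sqrt(dot(r, r))

def vector_projection(s, r):
    """Scalar projection times the unit vector along r, i.e. (s·r / r·r) r."""
    k = dot(s, r) / dot(r, r)
    return [k * ri for ri in r]

r, s = [3, -4, 0], [2, -1, 0]        # hypothetical example vectors
sp = scalar_projection(s, r)         # a number
vp = vector_projection(s, r)         # a vector along r
```

For these vectors s·r = 10 and |r| = 5, so the scalar projection is 2 and the vector projection is 2/5 of r.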

Q5. Let

\mathbf{a} = \begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 0 \\ 5 \\ 12 \end{bmatrix}.

Which is larger, |a + b| or |a| + |b|?

  • |a + b| > |a| + |b|
  • |a + b| = |a| + |b|
  • |a + b| < |a| + |b|
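You can check this numerically; the triangle inequality says |a + b| ≤ |a| + |b|, with equality only when the vectors point in the same direction:

```python
import math

def size(v):
    # Square root of the sum of squared components.
    return math.sqrt(sum(vi ** 2 for vi in v))

a, b = [3, 0, 4], [0, 5, 12]
lhs = size([ai + bi for ai, bi in zip(a, b)])   # |a + b| = sqrt(290)
rhs = size(a) + size(b)                         # |a| + |b| = 5 + 13
```

Here |a| = 5 and |b| = 13, so the sum of sizes is 18, while the size of the sum is √290 ≈ 17.03.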

Q6. Which of the following statements about dot products are correct?

  • The order of vectors in the dot product is important, so that s·r ≠ r·s.
  • The size of a vector is equal to the square root of the dot product of the vector with itself.
  • The scalar projection of s onto r is always the same as the scalar projection of r onto s.
  • The vector projection of s onto r is equal to the scalar projection of s onto r multiplied by a vector of unit length that points in the same direction as r.
  • We can find the angle between two vectors using the dot product.
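On the last statement: the angle θ between two vectors satisfies cos θ = (a·b) / (|a||b|). A sketch:

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def angle(a, b):
    """Angle between two vectors, in radians, via cos θ = a·b / (|a||b|)."""
    c = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
    c = max(-1.0, min(1.0, c))   # guard against floating-point rounding
    return math.acos(c)

right_angle = angle([1, 0], [0, 1])   # perpendicular: dot product is 0, angle is π/2
no_angle = angle([1, 1], [2, 2])      # parallel: angle is 0
```

A zero dot product means perpendicular vectors, which is why the dot product doubles as an orthogonality test.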
Mathematics for Machine Learning: Linear Algebra Course Review

In our experience, we suggest you enroll in the Mathematics for Machine Learning: Linear Algebra course and gain some new skills from professionals completely free; we assure you it will be worth it.

You can take the Mathematics for Machine Learning: Linear Algebra course for free, and if you get stuck anywhere on a quiz or a graded assessment, just visit Networking Funda to get the Mathematics for Machine Learning: Linear Algebra Quiz Answers.

Conclusion:

I hope these Mathematics for Machine Learning: Linear Algebra Quiz Answers help you learn something new from this course. If they helped you, then don’t forget to bookmark our site for more Quiz Answers.

This course is intended for audiences of all experience levels who are interested in learning new skills; there are no prerequisite courses.

Keep Learning!
