Mathematics for Machine Learning: Linear Algebra Quiz Answers


Mathematics for Machine Learning: Linear Algebra Week 01 Quiz Answers

Quiz 1: Exploring parameter space

Q1. In this exercise, we shall see how it is often convenient to use vectors in machine learning. These could be in the form of data itself, model parameters, and so on.

The purpose of this exercise is to set the scene for Linear Algebra and the rest of the maths we will cover in the specialization. If this is confusing right now – stick with us! We’ll build up your skills throughout the rest of the course. For this reason, we’ve set a low pass mark for this quiz, but even if you don’t pass in one go, reading the feedback from a wrong answer can often give more insight than guessing a correct answer!

* * *

The problem we shall focus on in this exercise is the distribution of heights in a population.

If we do a survey of the heights of people in a population, we may get a distribution like this:

This histogram indicates how likely it is for anyone in the survey to be in a particular height range. (6 ft is around 183 cm)

This histogram can also be represented by a vector, i.e. a list of numbers. In this case, we record the frequency of people with heights in little groups at 2.5 cm intervals, i.e. between 150 cm and 152.5 cm, between 152.5 cm and 155 cm, and so on. We can define this as the vector $\mathbf{f}$ with components,

$$\mathbf{f} = \begin{bmatrix} f_{150.0,152.5} \\ f_{152.5,155.0} \\ f_{155.0,157.5} \\ f_{157.5,160.0} \\ f_{160.0,162.5} \\ \vdots \end{bmatrix}$$

These vector components are then the sizes of each bar in the histogram.
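As a rough illustration of how such a vector could be built, the sketch below bins a made-up sample of heights with NumPy. The sample itself, the 150 cm to 190 cm range, and the normalisation by sample size are all assumptions for illustration, not the survey data behind the histogram.

```python
import numpy as np

# Hypothetical survey: 1,000 heights in cm drawn from a made-up population.
rng = np.random.default_rng(seed=0)
heights = rng.normal(loc=170.0, scale=8.0, size=1000)

# Bin edges every 2.5 cm: 150.0, 152.5, ..., 190.0 (assumed range).
edges = np.arange(150.0, 190.0 + 2.5, 2.5)

# counts[i] is the number of heights with edges[i] <= height < edges[i + 1]
# (the last bin also includes its right edge). Heights outside the range are ignored.
counts, _ = np.histogram(heights, bins=edges)

# Normalise by the sample size so each component is a relative frequency.
f = counts / heights.size

print(len(f))   # number of components in the frequency vector
print(f[:5])    # the first five components
```

Each entry of `f` plays the role of one bar height, so changing the bin width changes the length of the vector.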

Of the following statements, select all that you think are true.

[expand title=View Answer]
1. There are at least 10 elements in the frequency vector, $\mathbf{f}$.
2. If another sample was taken under the same conditions, the frequencies should be broadly similar.
[/expand]

Q2. One of the tasks of machine learning is to fit a model to data in order to represent the underlying distribution.

For the heights of a population, a model we may use to predict frequencies is the Normal (or Gaussian) distribution. This is a model for a bell-shaped curve, which looks like this

It has the slightly complicated equation,

$$g(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$

the exact form of which is unimportant, except that it is dependent on two parameters, the mean, $\mu$, where the curve is centred, and the standard deviation, $\sigma$, which is the characteristic width of the bell curve (measured from the mean).

We can put these two parameters in a vector, $\mathbf{p} = \begin{bmatrix} \mu \\ \sigma \end{bmatrix}$.
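To make the role of the two parameters concrete, here is a minimal sketch of the Normal density as a function of a parameter vector. The function name `gaussian` and the values in `p` are illustrative assumptions, not part of the quiz.

```python
import math

def gaussian(x, p):
    """Evaluate the Normal density g(x) for a parameter vector p = [mu, sigma]."""
    mu, sigma = p
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

p = [160.0, 6.0]               # illustrative values only
peak = gaussian(160.0, p)      # density at the mean, where the curve is highest
one_sd = gaussian(166.0, p)    # density one standard deviation above the mean
```

Increasing the second component of `p` lowers `peak` and widens the curve, while changing the first component slides the curve along the height axis.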

Pick the parameter vector $\mathbf{p}$ which best describes the distribution pictured.

[expand title=View Answer] $\mathbf{p} = \begin{bmatrix} 155 \\ 12 \end{bmatrix}$ [/expand]

Q3. Pick the Normal distribution that corresponds most closely to the parameter vector $\mathbf{p} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$.

Q4. A model allows us to predict the data in a distribution. In our example we can start with a parameter vector $\mathbf{p}$ and convert it to a vector of expected frequencies $\mathbf{g}_\mathbf{p}$, for example,

$$\mathbf{g}_\mathbf{p} = \begin{bmatrix} g_{150.0,152.5} \\ g_{152.5,155.0} \\ g_{155.0,157.5} \\ g_{157.5,160.0} \\ g_{160.0,162.5} \\ \vdots \end{bmatrix}$$

A model is only considered good if it fits the measured data well. Some specific values for the parameters will be better than others for a model. We need a way to fit a model’s parameters to data and quantify how good that fit is.

One way of doing so is to calculate the “residuals”, which are the differences between the measured data and the modelled prediction for each histogram bin.

This is illustrated below. The model is shown in pink, the measured data is shown in orange and where they overlap is shown in green. The heights of the pink and orange bars are the residuals.

A better fit would have as much overlap as it can, reducing the residuals as much as possible.
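The residuals and the overlap can be computed bin by bin. The sketch below uses two invented five-bin vectors, not the data in the figure, purely to show the arithmetic.

```python
import numpy as np

# Invented relative frequencies for five bins (not the quiz data).
f = np.array([0.05, 0.20, 0.40, 0.25, 0.10])      # measured
g_p = np.array([0.10, 0.20, 0.30, 0.30, 0.10])    # modelled

# Residual per bin: positive where the data exceed the model, negative otherwise.
residuals = f - g_p

# Overlap per bin: the part of each bar shared by data and model.
overlap = np.minimum(f, g_p)

print(residuals)
print(overlap)
```

In each bin the overlap plus the size of the residual equals the taller of the two bars, which is why shrinking the residuals increases the overlap.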

How could the model be improved to give the best fit to the data?

[expand title=View Answer]
1. Keep the mean, $\mu$, approximately the same.
2. Increase the standard deviation, $\sigma$.
[/expand]

Q5. The performance of a model can be quantified in a single number. One measure we can use is the Sum of Squared Residuals, $\mathrm{SSR}$. Here we take all of the residuals (the difference between the measured and predicted data), square them and add them together.

In the language of vectors we can write this as $\mathrm{SSR}(\mathbf{p}) = |\mathbf{f}-\mathbf{g}_\mathbf{p}|^2$, which will be explained further on in this course.

Use the following code block to play with parameters of a model, and try to get the best fit to the data.

    # Play with values of μ and σ to find the best fit.
    μ = 160 ; σ = 6
    p = [μ, σ]
    histogram(p)

Find a set of parameters with a fit $\mathrm{SSR} \le 0.00051$.

Input your fitted parameters into the code block below.

[expand title=View Answer]
1

3
[/expand]
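In vector terms, the SSR from Q5 is the squared length of the residual vector. A minimal sketch follows, using invented five-bin vectors rather than the quiz data; the helper name `ssr` is an assumption for illustration and is not the course's `histogram` function.

```python
import numpy as np

def ssr(f, g_p):
    """Sum of squared residuals |f - g_p|^2 for two equal-length vectors."""
    r = np.asarray(f, dtype=float) - np.asarray(g_p, dtype=float)
    return float(r @ r)   # dot product of the residual vector with itself

f = [0.05, 0.20, 0.40, 0.25, 0.10]      # invented measured frequencies
g_p = [0.10, 0.20, 0.30, 0.30, 0.10]    # invented model prediction
value = ssr(f, g_p)
print(value)
```

A perfect fit would give an SSR of zero, and any mismatch in any bin makes it larger.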

Q6. Since each parameter vector $\mathbf{p}$ represents a different bell curve, each with its own value for the sum of squared residuals, $\mathrm{SSR}$, we can draw the surface of $\mathrm{SSR}$ values over the space spanned by $\mathbf{p}$, such as $\mu$ and $\sigma$ in this example.

Here is an illustration of this surface for our data.

Every point on this surface represents the SSR of a choice of parameters, with some bell curves performing better at representing the data than others.

We can take a ‘top-down’ view of the surface, and view it as a contour map, where each of the contours (in green here) represents a constant value for the $\mathrm{SSR}$.

The goal in machine learning is to find the parameter set where the model fits the data as well as it possibly can. This translates into finding the lowest point, the global minimum, in this space.
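When the parameter space is small, one way to picture this is to evaluate the SSR on a grid of parameter values and pick the lowest cell. The sketch below does that under several assumptions: the "measured" data are synthetic (generated from known parameters so the result can be checked), and the expected frequency in each bin is approximated by the density at the bin centre times the bin width.

```python
import numpy as np

BIN_WIDTH = 2.5
centres = np.arange(150.0, 190.0, BIN_WIDTH) + BIN_WIDTH / 2   # bin midpoints

def expected_frequencies(p):
    """Approximate g_p: Normal density at each bin centre times the bin width."""
    mu, sigma = p
    density = np.exp(-((centres - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    return density * BIN_WIDTH

def ssr(f, p):
    """Sum of squared residuals between data f and the model with parameters p."""
    r = f - expected_frequencies(p)
    return float(r @ r)

# Synthetic "measured" data generated from known parameters (an assumption).
f = expected_frequencies([165.0, 7.0])

# Evaluate the SSR over a grid of candidate means and standard deviations.
mus = np.arange(155.0, 175.5, 0.5)
sigmas = np.arange(3.0, 12.5, 0.5)
surface = np.array([[ssr(f, (mu, sigma)) for sigma in sigmas] for mu in mus])

# The lowest cell of the grid is the best-fitting parameter pair on that grid.
i, j = np.unravel_index(np.argmin(surface), surface.shape)
best_p = (float(mus[i]), float(sigmas[j]))
print(best_p)
```

A grid search like this scales poorly as the number of parameters grows, which is one reason for the step-by-step approach described in Q7.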

Select all true statements below.

[expand title=View Answer]
You get the same model by following along a contour line.

At the minimum of the surface, the model exactly matches the measured data.

Each point on the surface represents a set of parameters $\mathbf{p} = \begin{bmatrix} \mu \\ \sigma \end{bmatrix}$.
[/expand]

Moving at right angles to contour lines in the parameter space will have a greater effect on the fit than moving in other directions.

Q7. Often we can’t see the whole parameter space, so instead of just picking the lowest point, we have to make educated guesses where better points will be.

We can define another vector, $\Delta\mathbf{p}$, in the same space as $\mathbf{p}$ that tells us what change can be made to $\mathbf{p}$ to get a better fit.

For example, a model with parameters $\mathbf{p}' = \mathbf{p} + \Delta\mathbf{p}$ will produce a better fit to data if we can find a suitable $\Delta\mathbf{p}$.

The second course in this specialization will detail how to calculate these changes in parameters, $\Delta\mathbf{p}$.
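As a hedged preview, one common way to choose $\Delta\mathbf{p}$ is to step against a numerical estimate of the gradient of the SSR. This is only a sketch under assumptions (synthetic data, a bin-centre approximation for the model, a unit-length step); it is not necessarily the method the second course teaches.

```python
import numpy as np

BIN_WIDTH = 2.5
centres = np.arange(150.0, 190.0, BIN_WIDTH) + BIN_WIDTH / 2   # bin midpoints

def expected_frequencies(p):
    """Approximate g_p: Normal density at each bin centre times the bin width."""
    mu, sigma = p
    density = np.exp(-((centres - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    return density * BIN_WIDTH

def ssr(f, p):
    r = f - expected_frequencies(p)
    return float(r @ r)

def numerical_gradient(f, p, h=1e-4):
    """Central-difference estimate of the gradient of the SSR with respect to p."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for k in range(p.size):
        step = np.zeros_like(p)
        step[k] = h
        grad[k] = (ssr(f, p + step) - ssr(f, p - step)) / (2 * h)
    return grad

f = expected_frequencies([165.0, 7.0])   # synthetic data with a known best fit
p = np.array([160.0, 6.0])               # starting guess

grad = numerical_gradient(f, p)
delta_p = -grad / np.linalg.norm(grad)   # unit-length step pointing downhill
p_new = p + delta_p

print(ssr(f, p), ssr(f, p_new))          # SSR before and after the step
```

The step points at right angles to the local contour line, and repeating it with a suitable step length is the basic idea behind gradient descent.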

Given the following contour map,

What $\Delta\mathbf{p}$ will give the best improvement in the model?

[expand title=View Answer]
$\Delta\mathbf{p} = \begin{bmatrix} 2 \\ -2 \end{bmatrix}$
[/expand]

Quiz 2: Solving some simultaneous equations

Q1. In this quiz, you’ll be reminded of how to solve linear simultaneous equations as a way to practice some basic linear algebra. Some of the ideas presented here will be relevant later in the course.

Solving simultaneous equations is the process of finding the values of the variables (here $x$ and $y$) that satisfy the system of equations. Let’s start with the simplest type of simultaneous equation, where we already know all but one of the variables:

$$3x - y = 2$$

$$x = 4$$

Substitute the value of $x$ into the first equation to find $y$, then select the correct values of $x$ and $y$ below.

[expand title=View Answer] $x = 4$, $y = 10$ [/expand]
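The substitution for Q1 can be written out in one line:

```latex
3(4) - y = 2 \;\Rightarrow\; 12 - y = 2 \;\Rightarrow\; y = 10
```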

Q2. The first goal when solving simple simultaneous equations should be to isolate one of the variables. For example, try taking the second equation away from the first to solve the following pair of equations:

$$3x - 2y = 7$$

$$2x - 2y = 2$$

What value did you find for $x$? Now substitute $x$ into one of the equations to find $y$, and select the correct pair below:

[expand title=View Answer] $x = 5$, $y = 4$ [/expand]

Q3. This method is called elimination, and you can use it even when the coefficients, the numbers in front of $x$ and $y$, aren’t the same.

For example, to solve the following equations try multiplying both sides of the first equation by 2, then solve using the same method as the last question.

$$3x - 2y = 4$$

$$6x + 3y = 15$$

Select the correct values of $x$ and $y$ below:

[expand title=View Answer] $x = 2$, $y = 1$ [/expand]
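The elimination steps for Q3 can be traced in code. This is a small sketch using exact fractions; the row layout and variable names are choices made for this example.

```python
from fractions import Fraction

# Each row holds the coefficients [a, b, c] of a*x + b*y = c.
row1 = [Fraction(3), Fraction(-2), Fraction(4)]    # 3x - 2y = 4
row2 = [Fraction(6), Fraction(3), Fraction(15)]    # 6x + 3y = 15

# Scale the first equation so its x coefficient matches the second.
factor = row2[0] / row1[0]
scaled = [factor * value for value in row1]        # 6x - 4y = 8

# Subtracting the scaled row from the second eliminates x.
diff = [b - a for a, b in zip(scaled, row2)]       # 0x + 7y = 7
y = diff[2] / diff[1]

# Back-substitute into the first equation to recover x.
x = (row1[2] - row1[1] * y) / row1[0]

print(x, y)
```

Using `Fraction` keeps every intermediate value exact, which avoids rounding noise when the coefficients do not divide evenly.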

Q4. A very similar technique can be used to find the inverse of a matrix, which you will learn about in week three of this course.

There is also the substitution method, where we rearrange one of the equations to the form $x = ay+b$ or $y = cx+d$ and then substitute $x$ or $y$ into the other equation. Use any method you’d like to solve the following simultaneous equations:

$$-2x + 2y = 20$$

$$5x + 3y = 6$$

Select the correct values of $x$ and $y$ below:

[expand title=View Answer] $x = -3$, $y = 7$ [/expand]
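One way to carry out the substitution for Q4 is to rearrange the first equation for $y$ and substitute into the second:

```latex
\begin{aligned}
-2x + 2y = 20 \;&\Rightarrow\; y = x + 10 \\
5x + 3(x + 10) = 6 \;&\Rightarrow\; 8x = -24 \;\Rightarrow\; x = -3 \\
y = x + 10 \;&\Rightarrow\; y = 7
\end{aligned}
```

Checking against the second equation gives $5(-3) + 3(7) = 6$, as required.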

Q5. Systems of simultaneous equations can have more than two unknown variables. Below there is a system with three: $x$, $y$ and $z$. First try to find one of the variables by elimination or substitution, which will lead to two equations and two unknown variables. Continue the process to find all of the variables.

Which values of $x$, $y$ and $z$ solve the following equations?

$$3x - 2y + z = 7$$

$$x + y + z = 2$$

$$3x - 2y - z = 3$$

Before you move on you might like to think about how many equations you would need to uniquely determine four, five, or more variables. Are there any other rules for how the equations have to be related? In week two of this course you will learn about linear independence, which is very closely related to this.

[expand title=View Answer] $x = 1$, $y = -1$, $z = 2$ [/expand]
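For larger systems it is convenient to check a hand calculation numerically. A minimal sketch with NumPy, writing the three equations of Q5 as a matrix `A` of coefficients and a vector `b` of right-hand sides (names chosen for this example):

```python
import numpy as np

# Coefficients of x, y, z in each equation, one equation per row.
A = np.array([[3.0, -2.0, 1.0],
              [1.0, 1.0, 1.0],
              [3.0, -2.0, -1.0]])
b = np.array([7.0, 2.0, 3.0])   # right-hand sides

solution = np.linalg.solve(A, b)
x, y, z = solution

# Three independent equations are needed to pin down three unknowns.
rank = np.linalg.matrix_rank(A)

print(solution, rank)
```

If two of the equations were multiples of each other, the rank would drop below three and `np.linalg.solve` would raise an error instead of returning a unique solution.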

Quiz 3: Doing some vector operations

Q1. The aim of this quiz is to familiarise yourself with vectors and some basic vector operations.

For the following questions, the vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, $\mathbf{d}$ and $\mathbf{e}$ refer to those in this diagram:

The sides of each square on the grid are of length 1. What is the numerical representation of the vector $\mathbf{a}$?

[expand title=View Answer] $\begin{bmatrix} 2 \\ 2 \end{bmatrix}$ [/expand]

Q2. Which vector in the diagram corresponds to $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$?

[expand title=View Answer] Vector $\mathbf{b}$ [/expand]

Q3. What vector is $2\mathbf{c}$?

Please select all correct answers.

[expand title=View Answer]
$\mathbf{a}$

$\begin{bmatrix} 2 \\ 2 \end{bmatrix}$
[/expand]

Q4. What vector is $-\mathbf{b}$?

Please select all correct answers.

[expand title=View Answer]
$\mathbf{e}$

$\mathbf{d}$

$\begin{bmatrix} -2 \\ 1 \end{bmatrix}$
[/expand]
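Scalar multiplication and negation both act on each component separately. The sketch below uses an illustrative vector `v`, chosen for this example rather than read off the diagram.

```python
import numpy as np

v = np.array([1, 1])   # illustrative vector, not taken from the diagram

doubled = 2 * v        # same direction, twice the length
negated = -v           # same length, opposite direction

print(doubled, negated)
```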

Q5. In the previous videos you saw that vectors can be added by placing them start-to-end. For example, the following diagram represents the sum of two new vectors, $\mathbf{u} + \mathbf{v}$:

The sides of each square on the grid are still of length 1. Which of the following equations does the diagram represent?

[12]

[01]

[22]

[12]+[01]=[22]

[11]

[10]

[21]

[11]+[10]=[21]

[12]

[10]

[22]

[12]+[10]=[22]

[21]

[01]

[22]

[21]+[01]=[22]

Q6. Let’s return to our vectors defined by the diagram below:

What is the vector $\mathbf{b}+\mathbf{e}$?

[expand title=View Answer] $\begin{bmatrix} -1 \\ -1 \end{bmatrix}$ [/expand]

Q7. What is the vector $\mathbf{d} - \mathbf{b}$?

[expand title=View Answer] $\begin{bmatrix} -4 \\ 2 \end{bmatrix}$ [/expand]
