Bayesian Statistics: From Concept to Data Analysis Coursera Quiz Answers


Lesson 1

Q1. If you randomly guess on this question, you have a .25 probability of being correct. Which probabilistic paradigm from Lesson 1 does this argument best demonstrate?

Classical
Frequentist
Bayesian
None of the above

Q2. On a multiple choice test, you do not know the answer to a question with three alternatives. One of the options, however, contains a keyword which the professor used disproportionately often during lecture. Rather than randomly guessing, you select the option containing the keyword, supposing you have a better than 1/3 chance of being correct.

Which probabilistic paradigm from Lesson 1 does this argument best demonstrate?

Classical
Frequentist
Bayesian

Q3. On average, one in three students at your school participates in extracurricular activities. You conclude that the probability that a randomly selected student from your school participates is 1/3.

Which probabilistic paradigm from Lesson 1 does this argument best demonstrate?

Classical
Frequentist
Bayesian

Q4. For Questions 4-6, consider the following scenario:

Your friend offers a bet that she can beat you in a game of chess. If you win, she owes you $5, but if she wins, you owe her $3.

Suppose she is 100% confident that she will beat you. What is her expected return for this game? (Report your answer without the $ symbol.)

Q5. Chess:

Suppose she is only 50% confident that she will beat you (her personal probability of winning is p=0.5). What is her expected return now? (Report your answer without the $ symbol.)

Q6. Chess:

Now assuming your friend will only agree to fair bets (expected return of $0), find her personal probability that she will win. Report your answer as a simplified fraction.

Hint: Use the expected return of her proposed bet.
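The expected-return arithmetic for Questions 4-6 can be sketched as follows. The course suggests R or Excel; Python is used here only as an illustrative substitute, and the helper names `expected_return` and `fair_probability` are hypothetical. From your friend's point of view she gains $3 when she wins and loses $5 when you win.

```python
from fractions import Fraction

# Friend's payoffs from the bet described above
WIN_PAYOFF = 3    # she wins: you owe her $3
LOSS_PAYOFF = -5  # you win: she owes you $5

def expected_return(p_win):
    """Her expected return given her personal probability p_win of winning."""
    return p_win * WIN_PAYOFF + (1 - p_win) * LOSS_PAYOFF

def fair_probability():
    """Solve 3p - 5(1 - p) = 0 for p, i.e. p = 5 / (3 + 5)."""
    return Fraction(-LOSS_PAYOFF, WIN_PAYOFF - LOSS_PAYOFF)

print(expected_return(1))               # Q4: 100% confident
print(expected_return(Fraction(1, 2)))  # Q5: p = 0.5
print(fair_probability())               # Q6: probability that makes the bet fair
```

Using `Fraction` keeps the Q6 answer as an exact simplified fraction, which is the format the question asks for.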


Q7. For Questions 7-8, consider the following “Dutch book” scenario:

Suppose your friend offers a pair of bets:

(i) if it rains or is overcast tomorrow, you pay him $4, otherwise he pays you $6;

(ii) if it is sunny you pay him $5, otherwise he pays you $5.

Suppose rain, overcast, and sunny are the only events in consideration. If you make both bets simultaneously, this is called a “Dutch book,” as you are guaranteed to win money. How much do you win regardless of the outcome? (Report your answer without the $ symbol.)

Q8. Dutch book:

Apparently your friend doesn’t understand the laws of probability. Let’s examine the bets he offered.

For bet (i) to be fair, his probability that it rains or is overcast must be .6 (you can verify this by calculating his expected return and setting it equal to $0).
For bet (ii) to be fair, his probability that it will be sunny must be .5.

This results in a “Dutch book” because your friend’s probabilities are not coherent. They do not add up to 1. What do they add up to?
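A minimal sketch of the Dutch book arithmetic in Questions 7-8, checking your payoff under every weather outcome and recovering the probabilities implied by each bet being fair to your friend. The function names are hypothetical, and Python stands in for the R/Excel tools the course mentions.

```python
from fractions import Fraction

OUTCOMES = ["rain", "overcast", "sunny"]

def bet_i(outcome):
    # (i) rain or overcast: you pay $4; otherwise (sunny) he pays you $6
    return -4 if outcome in ("rain", "overcast") else 6

def bet_ii(outcome):
    # (ii) sunny: you pay $5; otherwise he pays you $5
    return -5 if outcome == "sunny" else 5

def fair_prob(gain, loss):
    """Probability q solving q*gain - (1 - q)*loss = 0, i.e. q = loss / (gain + loss)."""
    return Fraction(loss, gain + loss)

# Your total winnings from taking both bets, for each possible outcome
total_payoffs = {o: bet_i(o) + bet_ii(o) for o in OUTCOMES}

# Friend's side of (i): he gains 4 if rain/overcast, loses 6 otherwise
p_rain_or_overcast = fair_prob(4, 6)
# Friend's side of (ii): he gains 5 if sunny, loses 5 otherwise
p_sunny = fair_prob(5, 5)

print(total_payoffs)
print(p_rain_or_overcast, p_sunny, p_rain_or_overcast + p_sunny)
```

The payoff is the same for every outcome, which is exactly what makes it a Dutch book; the implied probabilities sum to more than 1, which is the incoherence.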

Lesson 2

Q1. For Questions 1-4, refer to the following table regarding passengers of the famous Titanic, which tragically sank on its maiden voyage in 1912. The table organizes passenger/crew counts according to the class of their ticket and whether they survived.

[Image not reproduced: table of Titanic passenger/crew counts by ticket class and survival.]

If we randomly select a person’s name from the complete list of passengers and crew, what is the probability that this person travelled in 1st class? Round your answer to two decimal places.

Q2. Titanic:

What is the probability that a (randomly selected) person survived? Round your answer to two decimal places.

Q3. Titanic:

What is the probability that a (randomly selected) person survived, given that they were in 1st class? Round your answer to two decimal places.

Q4. Titanic:

True/False: The events concerning class and survival are statistically independent.

True
False
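The Titanic table is missing from this page, so the sketch below ASSUMES it matches the class-by-survival counts in R's built-in `Titanic` dataset (summed over age and sex); substitute the quiz table's counts if they differ. It shows how Questions 1-4 reduce to marginal and conditional proportions, and how independence is checked by comparing \(P(\text{survived} \mid \text{1st})\) with \(P(\text{survived})\).

```python
# ASSUMPTION: counts taken from R's datasets::Titanic, aggregated over age and sex.
COUNTS = {
    # class: (survived, did_not_survive)
    "1st": (203, 122),
    "2nd": (118, 167),
    "3rd": (178, 528),
    "crew": (212, 673),
}

total = sum(s + d for s, d in COUNTS.values())
p_first = sum(COUNTS["1st"]) / total                          # Q1: marginal P(1st)
p_survived = sum(s for s, _ in COUNTS.values()) / total       # Q2: marginal P(survived)
p_survived_given_first = COUNTS["1st"][0] / sum(COUNTS["1st"])  # Q3: conditional

# Q4: independence would require P(survived | 1st) == P(survived)
independent = abs(p_survived_given_first - p_survived) < 1e-9

print(round(p_first, 2), round(p_survived, 2), round(p_survived_given_first, 2), independent)
```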

Q5. For Questions 5-9, consider the following scenario:

You have three bags, labeled A, B, and C. Bag A contains two red marbles and three blue marbles. Bag B contains five red marbles and one blue marble. Bag C contains three red marbles only.

If you select from bag B, what is the probability that you will draw a red marble? Express the exact answer as a simplified fraction.


Q6. Marbles:

If you randomly select one of the three bags with equal probability (so that \(P(A)=P(B)=P(C)=1/3\)) and then randomly draw a marble from that bag, what is the probability that the marble will be blue? Round your answer to two decimal places.

Hint: This is the marginal probability \(P(\text{blue})\). You can obtain this using the law of total probability (which appears in the denominator in Bayes’ theorem). It is \(P(\text{blue}) = P(\text{blue} \cap A) + P(\text{blue} \cap B) + P(\text{blue} \cap C) = P(\text{blue} \mid A)\cdot P(A) + P(\text{blue} \mid B)\cdot P(B) + P(\text{blue} \mid C) \cdot P(C)\).

Q7. Marbles:

Suppose a bag is randomly selected (again, with equal probability), but you do not know which it is. You randomly draw a marble and observe that it is blue. What is the probability that the bag you selected this marble from is A? That is, find \(P(A \mid \text{blue})\). Round your answer to two decimal places.

Q8. Marbles:

Suppose a bag is randomly selected (again, with equal probability), but you do not know which it is. You randomly draw a marble and observe that it is blue. What is the probability that the bag you selected from is C? That is, find \(P(C \mid \text{blue})\). Round your answer to two decimal places.

Q9. Marbles:

Suppose a bag is randomly selected (again, with equal probability), but you do not know which it is. You randomly draw a marble and observe that it is red. What is the probability that the bag you selected from is C? That is, find \(P(C \mid \text{red})\). Round your answer to two decimal places.
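A small sketch of the marble calculations in Questions 5-9: the law of total probability gives the marginal, and Bayes' theorem divides the joint probability by it. Exact fractions avoid rounding until the final answer; the helper names are hypothetical.

```python
from fractions import Fraction

# Bag contents from the scenario: (red, blue)
BAGS = {"A": (2, 3), "B": (5, 1), "C": (3, 0)}
PRIOR = {bag: Fraction(1, 3) for bag in BAGS}  # each bag equally likely

def p_color_given_bag(color, bag):
    red, blue = BAGS[bag]
    return Fraction(red if color == "red" else blue, red + blue)

def marginal(color):
    # Law of total probability: sum over bags of P(color | bag) P(bag)
    return sum(p_color_given_bag(color, b) * PRIOR[b] for b in BAGS)

def posterior(bag, color):
    # Bayes' theorem: P(bag | color) = P(color | bag) P(bag) / P(color)
    return p_color_given_bag(color, bag) * PRIOR[bag] / marginal(color)

print(p_color_given_bag("red", "B"))  # Q5
print(float(marginal("blue")))        # Q6
print(float(posterior("A", "blue")))  # Q7
print(float(posterior("C", "blue")))  # Q8
print(float(posterior("C", "red")))   # Q9
```

Q8 comes out as exactly 0 because bag C contains no blue marbles, so observing blue rules it out.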

Lesson 3.1

Q1. When using random variable notation, big X denotes ________.

a random variable
a conditional probability
distributed as
a realization of a random variable
the expectation of a random variable
approximately equal to

Q2. When using random variable notation, little x denotes ________.

a random variable
a conditional probability
distributed as
a realization of a random variable
the expectation of a random variable
approximately equal to

Q3. When using random variable notation, X ~ denotes ________.

a random variable
a conditional probability
distributed as
a realization of a random variable
the expectation of a random variable
approximately equal to

Q4. What is the value of \(f(x) = -5 I_{\{x>2\}}(x) + x I_{\{x < -1\}}(x)\) when \(x=3\)?

Q5. What is the value of \(f(x) = -5 I_{\{x>2\}}(x) + x I_{\{x < -1\}}(x)\) when \(x=0\)?
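The indicator notation in Questions 4-5 can be read directly as code: each indicator switches its term on (1) or off (0). A minimal sketch with a hypothetical `indicator` helper:

```python
def indicator(condition):
    """Indicator function: 1 if the condition holds, else 0."""
    return 1 if condition else 0

def f(x):
    # f(x) = -5 * I_{x > 2}(x) + x * I_{x < -1}(x)
    return -5 * indicator(x > 2) + x * indicator(x < -1)

print(f(3), f(0))
```

At \(x=0\) neither condition holds, so both terms vanish.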

Q6. Which of the following scenarios could we appropriately model using a Bernoulli random variable?

Predicting the number of wins in a series of three games against a single opponent (ties count as losses)
Predicting the weight of a typical hockey player
Predicting whether your hockey team wins its next game (tie counts as a loss)
Predicting the number of goals scored in a hockey match

Q7. Calculate the expected value of the following random variable: \(X\) takes on values \(\{0, 1, 2, 3\}\) with corresponding probabilities \(\{0.5, 0.2, 0.2, 0.1\}\). Round your answer to one decimal place.
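The expected value of a discrete random variable is the probability-weighted sum \(E(X) = \sum_x x\,P(X=x)\). A short sketch:

```python
values = [0, 1, 2, 3]
probs = [0.5, 0.2, 0.2, 0.1]

# A valid PMF must sum to 1 (allowing for floating-point error)
assert abs(sum(probs) - 1) < 1e-12

expected = sum(x * p for x, p in zip(values, probs))
print(round(expected, 1))
```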

Q8. Which of the following scenarios could we appropriately model using a binomial random variable (with n > 1)?

Predicting whether your hockey team wins its next game (tie counts as a loss)
Predicting the number of goals scored in a hockey match
Predicting the number of wins in a series of three games against a single opponent (ties count as losses)
Predicting the weight of a typical hockey player

Q9. Suppose \(X \sim \text{Binomial}(3, 0.2)\). Calculate \(P(X=0)\). Round your answer to two decimal places.

Q10. Suppose \(X \sim \text{Binomial}(3, 0.2)\). Calculate \(P(X \le 2)\). Round your answer to two decimal places.
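A sketch of the binomial probabilities in Questions 9-10, using the PMF \(P(X=k) = {n \choose k} p^k (1-p)^{n-k}\) and summing it for the CDF. It relies on `math.comb` (Python 3.8+); the helper names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k), summing the PMF from 0 to k."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))

print(round(binom_pmf(0, 3, 0.2), 2))  # Q9
print(round(binom_cdf(2, 3, 0.2), 2))  # Q10
```

Equivalently, \(P(X \le 2) = 1 - P(X = 3) = 1 - 0.2^3\).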

Lesson 3.2-3.3

Q1. If continuous random variable \(X\) has probability density function (PDF) \(f(x)\), what is the interpretation of the following integral: \(\int_{-2}^5 f(x)\, dx\)?

\(P(X \le -2 \cap X \ge 5)\)
\(P(X \ge -2 \cap X \le 5)\)
\(P(X \ge -2 \cup X \le 5)\)
\(P(X \le -2 \cap X \le 5)\)

Q2. If \(X \sim \text{Uniform}(0,1)\), then what is the value of \(P(-3 < X < 0.2)\)?

Q3. If \(X \sim \text{Exponential}(5)\), find the expected value \(E(X)\). (Round your answer to one decimal place.)

Q4. Which of the following scenarios could we most appropriately model using an exponentially distributed random variable?

The hours of service until all light bulbs in a batch of 5000 fail
The probability of a light bulb failure before 100 hours in service
The number of failed lightbulbs in a batch of 5000 after 100 hours in service
The lifetime in hours of a particular lightbulb

Q5. If \(X \sim \text{Uniform}(2,6)\), which of the following is the PDF of \(X\)?

[Image not reproduced: answer options showing candidate PDFs for \(X\).]

Q6. If \(X \sim \text{Uniform}(2,6)\), what is \(P(2 < X \le 3)\)? Round your answer to two decimal places.
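A sketch of the uniform and exponential calculations in Questions 2, 3, and 6. It assumes the course's rate parameterization, so \(X \sim \text{Exponential}(\lambda)\) has mean \(1/\lambda\); the helper names are hypothetical.

```python
def uniform_cdf(x, a, b):
    """CDF of Uniform(a, b): 0 below a, 1 above b, linear in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def uniform_prob(lo, hi, a, b):
    # P(lo < X <= hi) = F(hi) - F(lo); endpoints carry no mass for a continuous X
    return uniform_cdf(hi, a, b) - uniform_cdf(lo, a, b)

def exponential_mean(rate):
    # ASSUMPTION: rate parameterization, so E(X) = 1 / rate
    return 1 / rate

print(uniform_prob(-3, 0.2, 0, 1))  # Q2: the part of (-3, 0.2) below 0 has no mass
print(exponential_mean(5))          # Q3
print(uniform_prob(2, 3, 2, 6))     # Q6
```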

Q7. If \(X \sim \text{N}(0,1)\), which of the following is the PDF of \(X\)?

[Image not reproduced: answer options showing candidate PDFs for \(X\).]

Q8. If \(X \sim \text{N}(2, 1)\), what is the expected value of \(-5X\)? This is denoted as \(E(-5X)\).

Q9. Let \(X \sim \text{N}(1, 1)\) and \(Y \sim \text{N}(4, 3^2)\). What is the value of \(E(X+Y)\)?

Q10. The normal distribution is also linear in the sense that if \(X \sim \text{N}(\mu, \sigma^2)\), then for any real constants \(a \ne 0\) and \(b\), the random variable \(Y = aX + b\) is distributed \(\text{N}(a\mu + b, a^2\sigma^2)\).

Using this fact, what is the distribution of \(Z = \frac{X-\mu}{\sigma}\)?

\(\text{N}(\mu, \sigma)\)
\(\text{N}(\mu, \sigma^2)\)
\(\text{N}(\mu / \sigma, 1)\)
\(\text{N}(0, 1)\)
\(\text{N}(1, \sigma^2)\)

Q11. Which of the following random variables would yield the highest value of \(P(-1 < X < 1)\)?

Hint: Random variables with larger variance are more dispersed.

\(X \sim \text{N}(0, 0.1)\)
\(X \sim \text{N}(0, 1)\)
\(X \sim \text{N}(0, 10)\)
\(X \sim \text{N}(0, 100)\)
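The hint in Q11 can be checked numerically with the standard library's `statistics.NormalDist`. This sketch assumes, as the course does, that the second argument of \(\text{N}(\cdot,\cdot)\) is the variance, so the standard deviation passed to `NormalDist` is its square root.

```python
from math import sqrt
from statistics import NormalDist

def prob_within_one(variance):
    """P(-1 < X < 1) for X ~ N(0, variance), where N(mean, variance) is the course's notation."""
    dist = NormalDist(mu=0, sigma=sqrt(variance))
    return dist.cdf(1) - dist.cdf(-1)

probs = {v: prob_within_one(v) for v in [0.1, 1, 10, 100]}
for v, p in probs.items():
    print(f"N(0, {v}): {p:.4f}")
```

The probability falls steadily as the variance grows, because more of the mass spreads outside \((-1, 1)\).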

Module 1 Honors

Q1. Which of the following (possibly more than one) must be true if random variable \(X\) is continuous with PDF \(f(x)\)?

\(f(x)\) is a continuous function
\(f(x)\) is an increasing function of \(x\)
\(X \ge 0\) always
\(\int_{-\infty}^\infty f(x)\, dx = 1\)
\(\lim_{x \to \infty} f(x) = \infty\)
\(f(x) \ge 0\) always

Q2. If \(X \sim \text{Exp}(3)\), what is the value of \(P(X > 1/3)\)? Round your answer to two decimal places.

Q3. Suppose \(X \sim \text{Uniform}(0,2)\) and \(Y \sim \text{Uniform}(8,10)\). What is the value of \(E(4X + Y)\)?
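A sketch of Questions 2-3, assuming the rate parameterization of the exponential, so \(P(X > x) = e^{-\lambda x}\), and using linearity of expectation for \(E(4X + Y) = 4E(X) + E(Y)\). The helper names are hypothetical.

```python
from math import exp

def exponential_survival(x, rate):
    # For X ~ Exp(rate) and x >= 0: P(X > x) = exp(-rate * x)
    return exp(-rate * x)

def uniform_mean(a, b):
    # E(X) for X ~ Uniform(a, b) is the midpoint of the interval
    return (a + b) / 2

p_tail = exponential_survival(1 / 3, 3)                       # Q2: equals e^{-1}
e_4x_plus_y = 4 * uniform_mean(0, 2) + uniform_mean(8, 10)    # Q3: linearity
print(round(p_tail, 2), e_4x_plus_y)
```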

Q4. For Questions 4-7, consider the following:

Suppose \(X \sim \text{N}(1, 5^2)\) and \(Y \sim \text{N}(-2, 3^2)\) and that \(X\) and \(Y\) are independent. We have \(Z = X + Y \sim \text{N}(\mu, \sigma^2)\) because the sum of independent normal random variables also follows a normal distribution.

What is the value of \(\mu\)?

Q5. Adding normals:

What is the value of \(\sigma^2\)?

Hint: If two random variables are independent, the variance of their sum is the sum of their variances.

Q6. Adding normals:

If random variables \(X\) and \(Y\) are not independent, we still have \(E(X+Y) = E(X) + E(Y)\), but now \(\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)\), where \(\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])]\) is called the covariance between \(X\) and \(Y\).

A convenient formula for calculating variance was given in the supplementary material: \(\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2\). Which of the following is an analogous expression for the covariance of \(X\) and \(Y\)?

Hint: Expand the terms inside the expectation in the definition of \(\text{Cov}(X,Y)\) and recall that \(E(X)\) and \(E(Y)\) are just constants.

\(E[Y^2] - (E[Y])^2\)
\(E[X^2] - (E[X])^2 + E[Y^2] - (E[Y])^2\)
\((E[X^2] - (E[X])^2) \cdot (E[Y^2] - (E[Y])^2)\)
\(E(XY) - E(X)E(Y)\)

Q7. Adding normals:

Consider again \(X \sim \text{N}(1, 5^2)\) and \(Y \sim \text{N}(-2, 3^2)\), but this time \(X\) and \(Y\) are not independent. Then \(Z = X+Y\) is still normally distributed with the same mean found in Question 4. What is the variance of \(Z\) if \(E(XY) = -5\)?

Hint: Use the formulas introduced in Question 6.
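The moment rules from Questions 4-7 can be applied step by step as below: means add, variances add when the variables are independent, and a covariance term appears when they are not, with \(\text{Cov}(X,Y) = E(XY) - E(X)E(Y)\).

```python
# Moments from the scenario: X ~ N(1, 5^2), Y ~ N(-2, 3^2)
mean_x, var_x = 1, 5**2
mean_y, var_y = -2, 3**2

mu = mean_x + mean_y               # Q4: E(X + Y) = E(X) + E(Y)
var_indep = var_x + var_y          # Q5: independent case, Cov(X, Y) = 0

e_xy = -5                          # Q7: given E(XY)
cov = e_xy - mean_x * mean_y       # Q6 formula: Cov(X, Y) = E(XY) - E(X)E(Y)
var_dep = var_x + var_y + 2 * cov  # Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

print(mu, var_indep, cov, var_dep)
```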

Q8. Free point:

1) Use the definition of conditional probability to show that for events \(A\) and \(B\), we have \(P(A \cap B) = P(B \mid A)P(A) = P(A \mid B)P(B)\).

2) Show that the two expressions for independence \(P(A \mid B) = P(A)\) and \(P(A \cap B) = P(A) P(B)\) are equivalent.
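One way to write out both short proofs, assuming \(P(A) > 0\) and \(P(B) > 0\) so that the conditional probabilities are defined (a sketch, not the course's official solution):

```latex
% (1) By definition, P(A | B) = P(A \cap B) / P(B) and P(B | A) = P(A \cap B) / P(A).
P(A \mid B) = \frac{P(A \cap B)}{P(B)} \;\Rightarrow\; P(A \cap B) = P(A \mid B)\,P(B),
\qquad
P(B \mid A) = \frac{P(A \cap B)}{P(A)} \;\Rightarrow\; P(A \cap B) = P(B \mid A)\,P(A).

% (2) (=>) If P(A | B) = P(A), then by (1):
P(A \cap B) = P(A \mid B)\,P(B) = P(A)\,P(B).
% (<=) If P(A \cap B) = P(A) P(B), then:
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A).
```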


Week 02: Statistical Inference Quiz Answers

Lesson 4

Q1. For Questions 1-3, consider the following scenario:

In the example from Lesson 4.1 of flipping a coin 100 times, suppose instead that you observe 47 heads and 53 tails.

Report the value of \(\hat{p}\), the MLE (maximum likelihood estimate) of the probability of obtaining heads.

Q2. Coin flip:

Using the central limit theorem as an approximation, and following the example of Lesson 4.1, construct a 95% confidence interval for \(p\), the probability of obtaining heads.

Report the lower end of this interval and round your answer to two decimal places.

Q3. Coin flip:

Report the upper end of this interval and round your answer to two decimal places.
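A sketch of the CLT-based interval from Questions 1-3: \(\hat{p} \pm z_{0.975}\sqrt{\hat{p}(1-\hat{p})/n}\). The 0.975 normal quantile (about 1.96) comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

n, heads = 100, 47
p_hat = heads / n                     # Q1: the MLE is the sample proportion
z = NormalDist().inv_cdf(0.975)       # about 1.96 for a 95% interval
se = sqrt(p_hat * (1 - p_hat) / n)    # CLT-based standard error
lower, upper = p_hat - z * se, p_hat + z * se
print(p_hat, round(lower, 2), round(upper, 2))  # Q1, Q2, Q3
```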

Q4. The likelihood function for parameter \(\theta\) with data \(\mathbf{y}\) is based on which of the following?

\(P(\theta \mid \mathbf{y})\)
\(P(\mathbf{y} \mid \theta)\)
\(P(\theta)\)
\(P(\mathbf{y})\)
None of the above.

Q5. Recall from Lesson 4.4 that if \(X_1,\ldots,X_n \overset{\text{iid}}{\sim} \text{Exponential}(\lambda)\) (iid means independent and identically distributed), then the MLE for \(\lambda\) is \(1/\bar{x}\), where \(\bar{x}\) is the sample mean. Suppose we observe the following data: \(X_1 = 2.0,\ X_2=2.5,\ X_3=4.1,\ X_4=1.8,\ X_5=4.0\).

Calculate the MLE for \(\lambda\). Round your answer to two decimal places.

Q6. It turns out that the sample mean \(\bar{x}\) is involved in the MLE calculation for several models. In fact, if the data are independent and identically distributed from a Bernoulli(\(p\)), Poisson(\(\lambda\)), or Normal(\(\mu\), \(\sigma^2\)), then \(\bar{x}\) is the MLE for \(p\), \(\lambda\), and \(\mu\), respectively.

Suppose we observe \(n=4\) data points from a normal distribution with unknown mean \(\mu\). The data are \(\mathbf{x} = \{-1.2, 0.5, 0.8, -0.3\}\).

What is the MLE for \(\mu\)? Round your answer to two decimal places.
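Both MLEs in Questions 5-6 come down to the sample mean, as stated above. A minimal sketch with `statistics.mean`:

```python
from statistics import mean

exp_data = [2.0, 2.5, 4.1, 1.8, 4.0]
lambda_hat = 1 / mean(exp_data)   # Q5: exponential MLE is 1 / sample mean

normal_data = [-1.2, 0.5, 0.8, -0.3]
mu_hat = mean(normal_data)        # Q6: MLE of a normal mean is the sample mean

print(round(lambda_hat, 2), round(mu_hat, 2))
```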

Lesson 5.1-5.2

Q1. For Questions 1-5, consider the following scenario:

You are trying to ascertain your American colleague’s political preferences. To do so, you design a questionnaire with five yes/no questions relating to current issues. The questions are all worded so that a “yes” response indicates a conservative viewpoint.

Let \(\theta\) be the unknown political viewpoint of your colleague, which we will assume can only take values \(\theta=\text{conservative}\) or \(\theta=\text{liberal}\). You have no reason to believe that your colleague leans one way or the other, so you assign the prior \(P(\theta=\text{conservative}) = 0.5\).

Assume the five questions are independent and let \(Y\) count the number of “yes” responses. If your colleague is conservative, then the probability of a “yes” response on any given question is 0.8. If your colleague is liberal, the probability of a “no” response on any given question is 0.7.

What is an appropriate likelihood for this scenario?

\(f(y \mid \theta) = {5 \choose y} 0.2^y 0.8^{5-y}\)
\(f(y \mid \theta) = {5 \choose y} 0.8^y 0.2^{5-y} I_{\{\theta=\text{conservative}\}} + {5 \choose y} 0.3^y 0.7^{5-y} I_{\{\theta=\text{liberal}\}}\)
\(f(y \mid \theta) = \theta^y e^{-\theta}/y!\)
\(f(y \mid \theta) = {5 \choose y} 0.8^y 0.2^{5-y}\)
\(f(y \mid \theta) = {5 \choose y} 0.3^y 0.7^{5-y} I_{\{\theta=\text{conservative}\}} + {5 \choose y} 0.8^y 0.2^{5-y} I_{\{\theta=\text{liberal}\}}\)

Q2. Political preferences:

Suppose you ask your colleague the five questions and he answers “no” to all of them. What is the MLE for \(\theta\)?

\(\hat\theta=\text{conservative}\)
\(\hat\theta=\text{liberal}\)
None of the above. The MLE is a number.

Q3. Political preferences:

Recall that Bayes’ theorem gives \(f(\theta \mid y) = \frac{f(y\mid \theta) f(\theta)}{\sum_\theta f(y\mid \theta) f(\theta)}\). What is the corresponding expression for this problem?

\(f(\theta \mid y) = \frac{ {5 \choose y} 0.8^y 0.2^{5-y} (0.5) I_{\{\theta=\text{conservative}\}} + {5 \choose y} 0.3^y 0.7^{5-y} (0.5) I_{\{\theta=\text{liberal}\}} }{ {5 \choose y} 0.8^y 0.2^{5-y} (0.5) + {5 \choose y} 0.3^y 0.7^{5-y} (0.5) }\)
\(f(\theta \mid y) = \frac{ {5 \choose y} 0.8^y 0.2^{5-y} (0.5) + {5 \choose y} 0.3^y 0.7^{5-y} (0.5) }{ {5 \choose y} 0.8^y 0.2^{5-y} (0.5) + {5 \choose y} 0.3^y 0.7^{5-y} (0.5) }\)
\(f(\theta \mid y) = \frac{ \theta^y e^{-\theta} (0.5)/y! }{ 0.8^y e^{-.8} (0.5)/y! + 0.3^y e^{-.3} (0.5)/y! }\)
\(f(\theta \mid y) = \frac{ {5 \choose y} 0.8^y 0.2^{5-y} (0.5)^2 }{ {5 \choose y} 0.8^y 0.2^{5-y} (0.5) + {5 \choose y} 0.3^y 0.7^{5-y} (0.5) }\)
\(f(\theta \mid y) = \frac{ {5 \choose y} 0.8^y 0.2^{5-y} (0.2) I_{\{\theta=\text{conservative}\}} + {5 \choose y} 0.3^y 0.7^{5-y} (0.7) I_{\{\theta=\text{liberal}\}} }{ {5 \choose y} 0.8^y 0.2^{5-y} (0.2) + {5 \choose y} 0.3^y 0.7^{5-y} (0.7) }\)

Q4. Political preferences:

Evaluate the expression in Question 3 for \(y=0\) and report the posterior probability that your colleague is conservative, given that he responded “no” to all of the questions. Round your answer to three decimal places.

Q5. Political preferences:

Evaluate the expression in Question 3 for \(y=0\) and report the posterior probability that your colleague is liberal, given that he responded “no” to all of the questions. Round your answer to three decimal places.
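A sketch of the discrete Bayes update for Questions 1-5: binomial likelihood under each viewpoint, the 0.5/0.5 prior, and normalization by the sum over both values of \(\theta\). The MLE is simply the \(\theta\) with the larger likelihood. Names are hypothetical.

```python
from math import comb

P_YES = {"conservative": 0.8, "liberal": 0.3}  # liberal: P("no") = 0.7
PRIOR = {"conservative": 0.5, "liberal": 0.5}
N_QUESTIONS = 5

def likelihood(y, theta):
    p = P_YES[theta]
    return comb(N_QUESTIONS, y) * p**y * (1 - p) ** (N_QUESTIONS - y)

def posterior(y):
    joint = {t: likelihood(y, t) * PRIOR[t] for t in PRIOR}
    total = sum(joint.values())          # denominator of Bayes' theorem
    return {t: j / total for t, j in joint.items()}

mle = max(PRIOR, key=lambda t: likelihood(0, t))  # Q2: all answers "no" (y = 0)
post = posterior(0)                               # Q4, Q5
print(mle, round(post["conservative"], 3), round(post["liberal"], 3))
```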

Q6. For Questions 6-9, consider again the loaded coin example from the lesson.

Recall that your brother has a fair coin which comes up heads 50% of the time and a loaded coin which comes up heads 70% of the time.

Suppose now that he has a third coin which comes up tails 70% of the time. Again, you don’t know which coin your brother has brought you, so you are going to test it by flipping it 4 times, where \(X\) counts the number of heads. Let \(\theta\) identify the coin so that there are three possibilities: \(\theta=\text{fair}\), \(\theta=\text{loaded favoring heads}\), and \(\theta=\text{loaded favoring tails}\).

Suppose the prior is now \(P(\theta=\text{fair}) = 0.4\), \(P(\theta=\text{loaded heads}) = 0.3\), and \(P(\theta=\text{loaded tails}) = 0.3\). Our prior probability that the coin is loaded is still 0.6, but we do not know which loaded coin it is, so we split the probability evenly between the two options.

What is the form of the likelihood now that we have three options?

\(f(x \mid \theta) = {4 \choose x} 0.5^x 0.5^{4-x} I_{\{\theta=\text{fair}\}} + {4 \choose x} 0.7^x 0.3^{4-x} I_{\{\theta=\text{loaded heads}\}} + {4 \choose x} 0.3^x 0.7^{4-x} I_{\{\theta=\text{loaded tails}\}}\)
\(f(x \mid \theta) = {4 \choose x} \left[ 0.5^4 (0.4) I_{\{\theta=\text{fair}\}} + 0.7^x 0.3^{4-x} (0.3) I_{\{\theta=\text{loaded heads}\}} + 0.3^x 0.7^{4-x} (0.3) I_{\{\theta=\text{loaded tails}\}} \right]\)
\(f(x \mid \theta) = {4 \choose x} \left[ 0.5^4 (0.4) I_{\{\theta=\text{fair}\}} + 0.3^x 0.7^{4-x} (0.3) I_{\{\theta=\text{loaded heads}\}} + 0.7^x 0.3^{4-x} (0.3) I_{\{\theta=\text{loaded tails}\}} \right]\)
\(f(x \mid \theta) = {4 \choose x} 0.5^x 0.5^{4-x} I_{\{\theta=\text{fair}\}} + {4 \choose x} 0.3^x 0.7^{4-x} I_{\{\theta=\text{loaded heads}\}} + {4 \choose x} 0.7^x 0.3^{4-x} I_{\{\theta=\text{loaded tails}\}}\)

Q7. Loaded coins:

Suppose you flip the coin four times and it comes up heads twice. What is the MLE for \(\theta\)?

\(\hat\theta = \text{fair}\)
\(\hat\theta = \text{loaded heads}\)
\(\hat\theta = \text{loaded tails}\)
None of the above. The MLE is a number.

Q8. Loaded coins:

Suppose you flip the coin four times and it comes up heads twice. What is the posterior probability that this is the fair coin? Round your answer to two decimal places.

Q9. Loaded coins:

Suppose you flip the coin four times and it comes up heads twice. What is the posterior probability that this is a loaded coin (favoring either heads or tails)? Round your answer to two decimal places.

Hint: \(P(\theta=\text{fair} \mid X=2) = 1 - P(\theta=\text{loaded} \mid X=2)\), so you can use your answer from the previous question rather than repeat the calculation from Bayes’ theorem (both approaches yield the same answer).
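A sketch of the three-coin calculations in Questions 7-9 with \(X = 2\) heads in 4 flips: the MLE maximizes the likelihood alone, while the posterior weights each likelihood by the prior 0.4/0.3/0.3. Names are hypothetical.

```python
from math import comb

P_HEADS = {"fair": 0.5, "loaded heads": 0.7, "loaded tails": 0.3}
PRIOR = {"fair": 0.4, "loaded heads": 0.3, "loaded tails": 0.3}
N_FLIPS = 4

def likelihood(x, theta):
    p = P_HEADS[theta]
    return comb(N_FLIPS, x) * p**x * (1 - p) ** (N_FLIPS - x)

x = 2
mle = max(P_HEADS, key=lambda t: likelihood(x, t))   # Q7
joint = {t: likelihood(x, t) * PRIOR[t] for t in PRIOR}
total = sum(joint.values())
p_fair = joint["fair"] / total                        # Q8
p_loaded = 1 - p_fair                                 # Q9, using the hint
print(mle, round(p_fair, 2), round(p_loaded, 2))
```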

Lesson 5.3-5.4

Q1. We use the continuous version of Bayes’ theorem if:

\(\theta\) is continuous
\(Y\) is continuous
\(f(y \mid \theta)\) is continuous
All of the above
None of the above

Q2. Consider the coin-flipping example from the lesson. Recall that the likelihood for this experiment was Bernoulli with unknown probability of heads, i.e., \(f(y \mid \theta) = \theta^y(1-\theta)^{1-y} I_{\{0 \le \theta \le 1\}}\), and we started with a uniform prior on the interval \([0,1]\).

After the first flip resulted in heads \((Y_1=1)\), the posterior for \(\theta\) became \(f(\theta \mid Y_1=1) = 2\theta I_{\{0 \le \theta \le 1\}}\).

Now use this posterior as your prior for \(\theta\) before the next (second) flip. Which of the following represents the posterior PDF for \(\theta\) after the second flip also results in heads \((Y_2 = 1)\)?

\(f(\theta \mid Y_2=1) = \frac{ \theta (1-\theta) \cdot 2\theta }{ \int_0^1 \theta (1-\theta) \cdot 2\theta \, d\theta } I_{\{0 \le \theta \le 1\}}\)
\(f(\theta \mid Y_2=1) = \frac{ (1-\theta) \cdot 2\theta }{ \int_0^1 (1-\theta) \cdot 2\theta \, d\theta } I_{\{0 \le \theta \le 1\}}\)
\(f(\theta \mid Y_2=1) = \frac{ \theta \cdot 2\theta }{ \int_0^1 \theta \cdot 2\theta \, d\theta } I_{\{0 \le \theta \le 1\}}\)

Q3. Consider again the coin-flipping example from the lesson. Recall that we used a Uniform(0,1) prior for \(\theta\). Which of the following is a correct interpretation of \(P(0.3 < \theta < 0.9) = 0.6\)?

(0.3, 0.9) is a 60% credible interval for \(\theta\) before observing any data.
(0.3, 0.9) is a 60% credible interval for \(\theta\) after observing \(Y=1\).
(0.3, 0.9) is a 60% confidence interval for \(\theta\).
The posterior probability that \(\theta \in (0.3, 0.9)\) is 0.6.

Q4. Consider again the coin-flipping example from the lesson. Recall that the posterior PDF for \(\theta\), after observing \(Y=1\), was \(f(\theta \mid Y=1) = 2\theta I_{\{0 \le \theta \le 1\}}\). Which of the following is a correct interpretation of \(P(0.3 < \theta < 0.9 \mid Y=1) = \int_{0.3}^{0.9} 2\theta\, d\theta = 0.72\)?

(0.3, 0.9) is a 72% credible interval for \(\theta\) before observing any data.
(0.3, 0.9) is a 72% credible interval for \(\theta\) after observing \(Y=1\).
(0.3, 0.9) is a 72% confidence interval for \(\theta\).
The prior probability that \(\theta \in (0.3, 0.9)\) is 0.72.
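The densities in Questions 2-4 are all polynomials in \(\theta\), so their integrals can be computed exactly. This sketch uses a hypothetical `integrate_power` helper to reproduce the prior probability 0.6, the posterior probability 0.72, and the normalizing constant for the second-flip posterior \(\theta \cdot 2\theta\).

```python
from fractions import Fraction

def integrate_power(coef, power, lo, hi):
    """Exact integral of coef * theta**power over [lo, hi]."""
    return coef * (hi ** (power + 1) - lo ** (power + 1)) / (power + 1)

lo, hi = Fraction(3, 10), Fraction(9, 10)
prior_prob = integrate_power(1, 0, lo, hi)      # Q3: Uniform(0,1) prior, density 1
posterior_prob = integrate_power(2, 1, lo, hi)  # Q4: posterior density 2*theta

# Q2: unnormalized posterior after a second head is theta * 2*theta = 2*theta^2
norm_const = integrate_power(2, 2, Fraction(0), Fraction(1))
print(prior_prob, posterior_prob, norm_const)
```

Dividing \(2\theta^2\) by its integral \(2/3\) gives the normalized second-flip posterior \(3\theta^2\) on \([0,1]\).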

Q5. Which two quantiles are required to capture the middle 90% of a distribution (thus producing a 90% equal-tailed interval)?

0 and .9
.10 and .90
.05 and .95
.025 and .975

Q6. Suppose you collect measurements to perform inference about a population mean \(\theta\). Your posterior distribution after observing data is \(\theta \mid \mathbf{y} \sim \text{N}(0,1)\).

Report the upper end of a 95% equal-tailed interval for \(\theta\). Round your answer to two decimal places.
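An equal-tailed interval puts half of the excluded probability in each tail, so a 95% interval uses the 0.025 and 0.975 quantiles (the same idea gives 0.05 and 0.95 for the 90% interval in Q5). A sketch with `statistics.NormalDist`:

```python
from statistics import NormalDist

posterior = NormalDist(mu=0, sigma=1)  # theta | y ~ N(0, 1)
level = 0.95
tail = (1 - level) / 2                 # 0.025 in each tail
lower, upper = posterior.inv_cdf(tail), posterior.inv_cdf(1 - tail)
print(round(lower, 2), round(upper, 2))
```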

Q7. What does “HPD interval” stand for?

Highest partial density interval
Highest precision density interval
Highest point distance interval
Highest posterior density interval

Q8. Each of the following graphs depicts a 50% credible interval from a posterior distribution. Which of the intervals represents the HPD interval?

50% interval: \(\theta \in (0.326, 0.674)\)

50% interval: \(\theta \in (0.500, 1.000)\)

50% interval: \(\theta \in (0.196, 0.567)\)

50% interval: \(\theta \in (0.400, 0.756)\)

[Images not reproduced: posterior density graphs for each interval.]

Module 2 Honors

Q1. Although the likelihood function is not always a product of \(f(y_i \mid \theta)\) for \(i = 1, 2, \ldots, n\), this product form is convenient mathematically. What assumption about the observations \(y\) allows us to multiply their individual likelihood components?

Q2. One nice property of MLEs is that they are transformation invariant. That is, if \(\hat{\theta}\) is the MLE for \(\theta\), then the MLE for \(g(\theta)\) is \(g(\hat{\theta})\) for any function \(g(\cdot)\).

Suppose you conduct 25 Bernoulli trials and observe 10 successes. What is the MLE for the odds of success? Round your answer to two decimal places.

Q3. For Questions 3-4, recall the scenario from Lesson 5 in which your brother brings you a coin which may be fair (probability of heads 0.5) or loaded (probability of heads 0.7).

Another sibling wants to place bets on whether the coin is loaded. If the coin is actually loaded, she will pay you $1. If it is not loaded, you will pay her $z.

Using your prior probability of 0.6 that the coin is loaded and assuming a fair game, determine the amount \(z\) that would make the bet fair (with prior expectation $0). Round your answer to one decimal place.

Q4. Before taking the bet, you agree to flip the coin once. It lands heads. Your sister argues that this is evidence for the loaded coin (in which case she pays you $1) and demands you increase \(z\) to 2.

Should you accept this new bet? Base your answer on your updated (posterior) probability that the coin is loaded.

Yes, your posterior expected payoff is now less than $0.
Yes, your posterior expected payoff is now greater than $0.
No, your posterior expected payoff is now less than $0.
No, your posterior expected payoff is now greater than $0.
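A sketch of Module 2 Honors Questions 2-4 in exact arithmetic. It uses the invariance property for the odds \(p/(1-p)\), solves \(0.6 \cdot 1 - 0.4 \cdot z = 0\) for the fair \(z\), and then updates \(P(\text{loaded})\) after one head using \(P(\text{heads} \mid \text{loaded}) = 0.7\) and \(P(\text{heads} \mid \text{fair}) = 0.5\) from the Lesson 5 scenario.

```python
from fractions import Fraction

# Q2: MLE of p is 10/25; by invariance the MLE of the odds p/(1-p) is p_hat/(1-p_hat)
p_hat = Fraction(10, 25)
odds_hat = p_hat / (1 - p_hat)

# Q3: receive $1 if loaded (prob 0.6), pay $z otherwise; fair when 0.6*1 - 0.4*z = 0
prior_loaded = Fraction(6, 10)
z_fair = prior_loaded / (1 - prior_loaded)

# Q4: posterior P(loaded | heads) by Bayes' theorem, then expected payoff with z = 2
num = prior_loaded * Fraction(7, 10)
post_loaded = num / (num + (1 - prior_loaded) * Fraction(1, 2))
expected_payoff = post_loaded * 1 - (1 - post_loaded) * 2

print(float(odds_hat), float(z_fair), float(post_loaded), float(expected_payoff))
```

A positive posterior expected payoff means the new bet with \(z = 2\) still favors you.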

Week 03: Priors and Models for Discrete Data Quiz Answers

Lesson 6

Q1. For Questions 1-2, consider the following experiment:

Suppose you are trying to calibrate a thermometer by testing the temperature it reads when water begins to boil. Because of natural variation, you take several measurements (experiments) to estimate \(\theta\), the mean temperature reading for this thermometer at the boiling point.

You know that at sea level, water should boil at 100 degrees Celsius, so you use a precise prior with \(P(\theta = 100) = 1\). You then observe the following five measurements: 94.6, 95.4, 96.2, 94.9, 95.9.

What will the posterior for \(\theta\) look like?

Most posterior probability will be concentrated near the sample mean of 95.4 degrees Celsius.
Most posterior probability will be spread between the sample mean of 95.4 degrees Celsius and the prior mean of 100 degrees Celsius.
The posterior will be \(\theta = 100\) with probability 1, regardless of the data.
None of the above.

Q2. Thermometer:

Suppose you believe before the experiments that the thermometer is biased high, so that on average it would read 105 degrees Celsius, and you are 95% confident that the average would be between 100 and 110.

Which of the following prior PDFs most accurately reflects this prior belief?

[Image not reproduced: answer options showing candidate prior PDFs.]

Q3. Recall that for positive integer \(n\), the gamma function has the following property: \(\Gamma(n) = (n-1)!\).

What is the value of \(\Gamma(6)\)?

Q4. Find the value of the normalizing constant, cc, which will cause the following integral to evaluate to 1.

\(\int_0^1 c \cdot z^3 (1-z)^1\, dz\).

Hint: Notice that this is proportional to a beta density. We only need to find the values of the parameters \(\alpha\) and \(\beta\) and plug those into the usual normalizing constant for a beta density.

\(\frac{\Gamma(1)}{\Gamma(z)\Gamma(1-z)} = \frac{0!}{(z-1)!\,1!}\)
\(\frac{\Gamma(3+1)}{\Gamma(3)\Gamma(1)} = \frac{3!}{2!\,0!} = 3\)
\(\frac{\Gamma(4+2)}{\Gamma(4)\Gamma(2)} = \frac{5!}{3!\,1!} = 20\)
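A sketch of Questions 3-4 using `math.gamma`. The kernel \(z^3(1-z)^1\) matches a Beta(\(\alpha\), \(\beta\)) density with \(\alpha - 1 = 3\) and \(\beta - 1 = 1\), so the constant is \(\Gamma(\alpha+\beta)/(\Gamma(\alpha)\Gamma(\beta))\). The helper name is hypothetical.

```python
from math import gamma

def beta_normalizing_constant(alpha, beta):
    """Constant c making c * z**(alpha-1) * (1-z)**(beta-1) integrate to 1 on [0, 1]."""
    return gamma(alpha + beta) / (gamma(alpha) * gamma(beta))

# Gamma(n) = (n - 1)! for positive integers, so Gamma(6) = 5!
c = beta_normalizing_constant(4, 2)
print(gamma(6), c)
```

A quick numerical integration of \(c \cdot z^3(1-z)\) over \([0,1]\) is a useful sanity check that the constant is right.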

Q5. Consider the coin-flipping example from Lesson 5. The likelihood for each coin flip was Bernoulli with probability of heads \(\theta\), or \(f(y \mid \theta) = \theta^y (1-\theta)^{1-y}\) for \(y=0\) or \(y=1\), and we used a uniform prior on \(\theta\).

Recall that if we had observed \(Y_1=0\) instead of \(Y_1=1\), the posterior distribution for \(\theta\) would have been \(f(\theta \mid Y_1=0) = 2(1-\theta) I_{\{0 \le \theta \le 1\}}\). Which of the following is the correct expression for the posterior predictive distribution for the next flip \(Y_2 \mid Y_1=0\)?

\(f(y_2 \mid Y_1=0) = \int_0^1 \theta^{y_2} (1-\theta)^{1-y_2}\, 2(1-\theta)\, d\theta\) for \(y_2 = 0\) or \(y_2 = 1\)
\(f(y_2 \mid Y_1=0) = \int_0^1 2(1-\theta)\, d\theta\) for \(y_2 = 0\) or \(y_2 = 1\)
\(f(y_2 \mid Y_1=0) = \int_0^1 \theta^{y_2} (1-\theta)^{1-y_2}\, d\theta\) for \(y_2 = 0\) or \(y_2 = 1\)
\(f(y_2 \mid Y_1=0) = \int_0^1 2\theta^{y_2} (1-\theta)^{1-y_2}\, d\theta\) for \(y_2 = 0\) or \(y_2 = 1\)
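The posterior predictive integral (likelihood times posterior, integrated over \(\theta\)) can be evaluated exactly with the identity \(\int_0^1 \theta^a (1-\theta)^b\, d\theta = a!\,b!/(a+b+1)!\) for nonnegative integers \(a, b\). A sketch with hypothetical helper names:

```python
from fractions import Fraction
from math import factorial

def beta_integral(a, b):
    """Exact integral of theta**a * (1 - theta)**b over [0, 1], integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def predictive(y2):
    # f(y2 | Y1=0) = integral of theta^y2 (1-theta)^(1-y2) * 2(1-theta) d theta
    #             = 2 * integral of theta^y2 (1-theta)^(2-y2) d theta
    return 2 * beta_integral(y2, 2 - y2)

print(predictive(0), predictive(1))
```

The two predictive probabilities sum to 1, as they must for a distribution over \(y_2 \in \{0, 1\}\).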

Q6. The prior predictive distribution for \(X\) when \(\theta\) is continuous is given by \(\int f(x \mid \theta) \cdot f(\theta)\, d\theta\). The analogous expression when \(\theta\) is discrete is \(\sum_{\theta} f(x \mid \theta) \cdot f(\theta)\), adding over all possible values of \(\theta\).

Let’s return to the example of your brother’s loaded coin from Lesson 5. Recall that he has a fair coin where heads comes up on average 50% of the time (\(p=0.5\)) and a loaded coin (\(p=0.7\)). If we flip the coin five times, the likelihood is binomial: \(f(x \mid p) = {5 \choose x} p^x (1-p)^{5-x}\), where \(X\) counts the number of heads.

Suppose you are confident, but not sure, that he has brought you the loaded coin, so that your prior is \(f(p) = 0.9 I_{\{p=0.7\}} + 0.1 I_{\{p=0.5\}}\). Which of the following expressions gives the prior predictive distribution of \(X\)?

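$f(x) = \binom{5}{x} (.7)^x (.3)^{5-x} + \binom{5}{x} (.5)^x (.5)^{5-x}$
$f(x) = \binom{5}{x} (.7)^x (.3)^{5-x} (.1) + \binom{5}{x} (.5)^x (.5)^{5-x} (.9)$
$f(x) = \binom{5}{x} (.7)^x (.3)^{5-x} (.5) + \binom{5}{x} (.5)^x (.5)^{5-x} (.5)$
$f(x) = \binom{5}{x} (.7)^x (.3)^{5-x} (.9) + \binom{5}{x} (.5)^x (.5)^{5-x} (.1)$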
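The discrete prior predictive is a weighted sum of the two binomial PMFs, with weights taken from the prior on $p$. As a sketch in Python (standard library only, Python 3.8+ for `math.comb`; the names are illustrative), you can tabulate $f(x)$ for $x = 0, \ldots, 5$ and check that it sums to 1.

```python
from math import comb

# Prior on p: 0.9 on the loaded coin (p = 0.7), 0.1 on the fair coin (p = 0.5).
prior = {0.7: 0.9, 0.5: 0.1}

def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def prior_predictive(x: int, n: int = 5) -> float:
    # f(x) = sum over possible p of f(x | p) * f(p)
    return sum(binomial_pmf(x, n, p) * weight for p, weight in prior.items())

for x in range(6):
    print(x, round(prior_predictive(x), 4))
print("total:", round(sum(prior_predictive(x) for x in range(6)), 10))
```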

Lesson 7

Q1. For Questions 1-5, consider the example of flipping a coin with unknown probability of heads ($\theta$):

Suppose we use a Bernoulli likelihood for each coin flip, i.e., $f(y_i \mid \theta) = \theta^{y_i} (1-\theta)^{1-y_i} I_{\{0 \le \theta \le 1\}}$ for $y_i = 0$ or $y_i = 1$, and a uniform prior for $\theta$.

What is the posterior distribution for $\theta$ if we observe the following sequence: (T, T, T, T) where H denotes heads ($Y = 1$) and T denotes tails ($Y = 0$)?

Uniform(0,4)
Beta(1,4)
Beta(0, 4)
Beta(1, 5)
Beta(4,0)

Q2. Coin flip:

Which of the following graphs depicts the posterior PDF of $\theta$ if we observe the sequence (T, T, T, T)? (You may want to use R or Excel to plot the posterior.)

Q3. Coin flip:

What is the maximum likelihood estimate (MLE) of $\theta$ if we observe the sequence (T, T, T, T)?

Q4. Coin flip:

What is the posterior mean estimate of $\theta$ if we observe the sequence (T, T, T, T)? Round your answer to two decimal places.

Q5. Coin flip:

Use R or Excel to find the posterior probability that $\theta < 0.5$ if we observe the sequence (T, T, T, T). Round your answer to two decimal places.
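With a uniform prior, i.e. Beta(1, 1), observing 0 heads and 4 tails gives a Beta(1 + 0, 1 + 4) posterior. The quiz asks for R or Excel (where `pbeta` or `BETA.DIST` would do the job); the sketch below is a Python equivalent using only the standard library. The helper `beta_cdf_int` is our own and relies on the identity $P(\text{Beta}(a, b) \le x) = P(\text{Binomial}(a + b - 1, x) \ge a)$, which holds for positive integer $a$ and $b$.

```python
from math import comb

def beta_cdf_int(x: float, a: int, b: int) -> float:
    """P(theta <= x) for theta ~ Beta(a, b) with positive integer a and b,
    using P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1))

flips = ["T", "T", "T", "T"]
heads = flips.count("H")
tails = flips.count("T")

# Uniform prior = Beta(1, 1); the conjugate update adds heads to alpha and tails to beta.
a_post, b_post = 1 + heads, 1 + tails

mle = heads / len(flips)                  # sample proportion of heads
post_mean = a_post / (a_post + b_post)    # mean of Beta(a_post, b_post)
p_below_half = beta_cdf_int(0.5, a_post, b_post)

print(f"posterior: Beta({a_post}, {b_post})")
print(f"MLE = {mle}, posterior mean = {post_mean:.2f}, P(theta < 0.5) = {p_below_half:.2f}")
```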

Q6. For Questions 6-9, consider the following scenario:

An engineer wants to assess the reliability of a new chemical refinement process by measuring $\theta$, the proportion of samples that fail a battery of tests. These tests are expensive, and the budget only allows 20 tests on randomly selected samples. Assuming each test is independent, she assigns a binomial likelihood where $X$ counts the samples which fail. Historically, new processes pass about half of the time, so she assigns a Beta(2,2) prior for $\theta$ (prior mean 0.5 and prior sample size 4). The outcome of the tests is 6 fails and 14 passes.

What is the posterior distribution for \thetaθ?

Beta(6,14)
Beta(14,6)
Beta(8,16)
Beta(6, 20)
Beta(16,8)

Q7. Chemical refinement:

Use R or Excel to calculate the upper end of an equal-tailed 95% credible interval for $\theta$. Round your answer to two decimal places.

Q8. Chemical refinement:

The engineer tells you that the process is considered promising and can proceed to another phase of testing if we are 90% sure that the failure rate is less than .35.

Calculate the posterior probability $P(\theta < .35 \mid x)$. In your role as the statistician, would you say that this new chemical should pass?

Yes, $P(\theta < .35 \mid x) \ge 0.9$.
No, $P(\theta < .35 \mid x) < 0.9$.

Q9. Chemical refinement:

It is discovered that the budget will allow five more samples to be tested. These tests are conducted and none of them fail.

Calculate the new posterior probability $P(\theta < .35 \mid x_1, x_2)$. In your role as the statistician, would you say that this new chemical should pass (with the same requirement as in the previous question)?

Hint: You can use the posterior from the previous analysis as the prior for this analysis. Assuming independence of tests, this yields the same posterior as the analysis in which we begin with the Beta(2,2) prior and use all 25 tests as the data.

Yes, $P(\theta < .35 \mid x_1, x_2) \ge 0.9$.
No, $P(\theta < .35 \mid x_1, x_2) < 0.9$.
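The conjugate update adds failures to the first beta parameter and passes to the second, so 6 fails and 14 passes turn Beta(2, 2) into Beta(8, 16), and five further passes give Beta(8, 21). A standard-library Python sketch of Questions 7-9 follows; the quiz itself expects R or Excel. `beta_cdf_int` and `beta_quantile_int` are hypothetical helpers (integer parameters only), and bisection is simply one easy way to invert a CDF.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """P(theta <= x) for theta ~ Beta(a, b), positive integers a and b,
    via P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1))

def beta_quantile_int(q, a, b, tol=1e-12):
    """Invert beta_cdf_int by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Q6: Beta(2, 2) prior, 6 fails and 14 passes out of 20 tests
a1, b1 = 2 + 6, 2 + 14
upper_95 = beta_quantile_int(0.975, a1, b1)   # Q7
p_first = beta_cdf_int(0.35, a1, b1)          # Q8

# Q9: five more tests, none failing
a2, b2 = a1 + 0, b1 + 5
p_second = beta_cdf_int(0.35, a2, b2)

print(f"Beta({a1}, {b1}): upper 95% end {upper_95:.2f}, P(theta < .35) = {p_first:.2f}")
print(f"Beta({a2}, {b2}): P(theta < .35) = {p_second:.2f}")
print("meets the 90% rule:", p_first >= 0.9, p_second >= 0.9)
```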

Q10. Let $X \mid \theta \sim \text{Binomial}(9, \theta)$ and assume a $\text{Beta}(\alpha, \beta)$ prior for $\theta$. If your prior guess (prior expectation) for $\theta$ is 0.4 and you wish to use a prior effective sample size of 5, what values of $\alpha$ and $\beta$ should you use?

$\alpha = 4$, $\beta = 10$
$\alpha = 2$, $\beta = 3$
$\alpha = 4$, $\beta = 6$
$\alpha = 2$, $\beta = 5$
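Since a Beta($\alpha$, $\beta$) prior has mean $\alpha/(\alpha+\beta)$ and effective sample size $\alpha+\beta$, the parameters follow from $\alpha = (\text{prior mean}) \times (\text{ESS})$ and $\beta = \text{ESS} - \alpha$. A tiny Python sketch of that arithmetic (the function name is our own):

```python
def beta_from_mean_and_ess(prior_mean: float, ess: float):
    """Return (alpha, beta) for a Beta prior with the given mean and effective sample size.

    For Beta(alpha, beta): mean = alpha / (alpha + beta) and ESS = alpha + beta.
    """
    alpha = prior_mean * ess
    beta = ess - alpha
    return alpha, beta

alpha, beta = beta_from_mean_and_ess(0.4, 5)
print(alpha, beta)
```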

Lesson 8

Q1. For Questions 1-8, consider the chocolate chip cookie example from the lesson.

As in the lesson, we use a Poisson likelihood to model the number of chips per cookie, and a conjugate gamma prior on $\lambda$, the expected number of chips per cookie. Suppose your prior expectation for $\lambda$ is 8.

The conjugate prior with mean 8 and effective sample size of 2 is Gamma(a,2). Find the value of a.

Q2. Cookies:

The conjugate prior with mean 8 and standard deviation 1 is Gamma(a,8). Find the value of a.

Q3. Cookies:

Suppose you are not very confident in your prior guess of 8, so you want to use a prior effective sample size of 1/100 cookies. Then the conjugate prior is Gamma(a,0.01). Find the value of a. Round your answer to two decimal places.
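For a Gamma(a, b) prior in the shape-rate form used here, the mean is a/b, the variance is a/b², and (as Questions 1 and 3 state) the effective sample size is b. Solving those relations gives the Python sketch below; the function names are illustrative, not from the course.

```python
def gamma_from_mean_and_ess(prior_mean, ess):
    """Gamma(a, b) (rate b) with mean a / b and effective sample size b."""
    return prior_mean * ess, ess

def gamma_from_mean_and_sd(prior_mean, prior_sd):
    """Gamma(a, b) (rate b) with mean a / b and variance a / b**2."""
    b = prior_mean / prior_sd ** 2
    return prior_mean * b, b

print(gamma_from_mean_and_ess(8, 2))      # Question 1
print(gamma_from_mean_and_sd(8, 1))       # Question 2
print(gamma_from_mean_and_ess(8, 0.01))   # Question 3
```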

Q4. Cookies:

Suppose you decide on the prior Gamma(8, 1), which has prior mean 8 and effective sample size of one cookie.

We collect data, sampling five cookies and counting the chips in each. We find 9, 12, 10, 15, and 13 chips.

What is the posterior distribution for λ?

Gamma(5, 59)
Gamma(1, 8)
Gamma(6, 67)
Gamma(59, 5)
Gamma(8, 1)
Gamma(67, 6)

Q5. Cookies:

Continuing the previous question, which of the following graphs shows the prior density (dotted line) and posterior density (solid line) of λ?

Q6. Cookies:

Continuing Question 4, what is the posterior mean for λ? Round your answer to one decimal place.

Q7. Cookies:

Continuing Question 4, use R or Excel to find the lower end of a 90% equal-tailed credible interval for λ. Round your answer to one decimal place.

Q8. Cookies:

Continuing Question 4, suppose that in addition to the five cookies reported, we observe an additional ten cookies with 109 total chips. What is the new posterior distribution for λ, the expected number of chips per cookie?

Hint: You can either use the posterior from the previous analysis as the prior here, or you can start with the original Gamma(8,1) prior and update with all fifteen cookies. The result will be the same.

Gamma(11, 109)
Gamma(10, 109)
Gamma(16, 176)
Gamma(109, 10)
Gamma(176, 16)
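The Poisson-gamma update adds the total number of chips to the shape and the number of cookies to the rate. The sketch below reproduces Questions 4, 6, 7, and 8 in Python with only the standard library, in place of R or Excel. `gamma_cdf` (a power series for the regularized lower incomplete gamma function) and `gamma_quantile` (bisection) are our own helpers, not library functions.

```python
import math

def gamma_cdf(x, shape, rate):
    """P(X <= x) for X ~ Gamma(shape, rate), via the power series of the
    regularized lower incomplete gamma function (fine for moderate values)."""
    if x <= 0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total)

def gamma_quantile(q, shape, rate, tol=1e-12):
    """Invert gamma_cdf by bisection."""
    lo, hi = 0.0, shape / rate
    while gamma_cdf(hi, shape, rate) < q:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, rate) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

chips = [9, 12, 10, 15, 13]
a_prior, b_prior = 8, 1                 # Gamma(8, 1): mean 8, ESS of one cookie

a_post = a_prior + sum(chips)           # shape + total chips
b_post = b_prior + len(chips)           # rate + number of cookies
post_mean = a_post / b_post
lower_90 = gamma_quantile(0.05, a_post, b_post)

# Q8: ten more cookies with 109 chips in total, updated two equivalent ways
a_seq, b_seq = a_post + 109, b_post + 10
a_all, b_all = a_prior + sum(chips) + 109, b_prior + len(chips) + 10

print(f"Gamma({a_post}, {b_post}), mean {post_mean:.1f}, 90% lower end {lower_90:.1f}")
print(f"sequential: Gamma({a_seq}, {b_seq}); all at once: Gamma({a_all}, {b_all})")
```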

Q9. For Questions 9-10, consider the following scenario:

A retailer notices that a certain type of customer tends to call their customer service hotline more often than other customers, so they begin keeping track. They decide a Poisson process model is appropriate for counting calls, with calling rate $\theta$ calls per customer per day.

The model for the total number of calls is then $Y \sim \text{Poisson}(n \cdot t \cdot \theta)$ where $n$ is the number of customers in the group and $t$ is the number of days. That is, if we observe the calls from a group with 24 customers for 5 days, the expected number of calls would be $24 \cdot 5 \cdot \theta = 120 \cdot \theta$.

The likelihood for $Y$ is then $f(y \mid \theta) = \frac{(nt\theta)^y e^{-nt\theta}}{y!} \propto \theta^y e^{-nt\theta}$.

This model also has a conjugate gamma prior $\theta \sim \text{Gamma}(a, b)$ which has density (PDF) $f(\theta) = \frac{b^a}{\Gamma(a)} \theta^{a-1} e^{-b\theta} \propto \theta^{a-1} e^{-b\theta}$.

Following the same procedure outlined in the lesson, find the posterior distribution for $\theta$.

$\text{Gamma}(a + y, b + nt)$
$\text{Gamma}(a + 1, b + y)$
$\text{Gamma}(a + y - 1, b + 1)$
$\text{Gamma}(y, nt)$
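Following the lesson's procedure, multiply the likelihood kernel by the prior kernel and collect the powers of $\theta$ and the terms in the exponent:

```latex
f(\theta \mid y) \propto f(y \mid \theta)\, f(\theta)
  \propto \theta^{y} e^{-nt\theta} \cdot \theta^{a-1} e^{-b\theta}
  = \theta^{(a+y)-1} e^{-(b+nt)\theta}
```

The right-hand side is the kernel of a $\text{Gamma}(a + y, b + nt)$ density.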

Q10. Poisson process:

On average, the retailer receives 0.01 calls per customer per day. To give this group the benefit of the doubt, they set the prior mean for $\theta$ at 0.01 with standard deviation 0.5. This yields a $\text{Gamma}(\frac{1}{2500}, \frac{1}{25})$ prior for $\theta$.

Suppose there are $n = 24$ customers in this particular group of interest, and the retailer monitors calls from these customers for $t = 5$ days. They observe a total of $y = 6$ calls from this group.

The following graph shows the resulting $\text{Gamma}(6.0004, 120.04)$ posterior for $\theta$, the calling rate for this group. The vertical dashed line shows the average calling rate of 0.01.

Does this posterior inference for $\theta$ suggest that the group has a higher calling rate than the average of 0.01 calls per customer per day?

Yes, the posterior mean for $\theta$ is twice the average of 0.01.
Yes, most of the posterior mass (probability) is concentrated on values of $\theta$ greater than 0.01.
No, the posterior mean is exactly 0.01.
No, most of the posterior mass (probability) is concentrated on values of $\theta$ less than 0.01.
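The numbers in Question 10 can be checked with a short standard-library Python sketch (the course uses R or Excel). It confirms that the $\text{Gamma}(\frac{1}{2500}, \frac{1}{25})$ prior has mean 0.01 and standard deviation 0.5, forms the $\text{Gamma}(a + y, b + nt)$ posterior, and computes the posterior probability that $\theta$ exceeds 0.01. `gamma_cdf` is our own series-based helper, not a library function.

```python
import math

def gamma_cdf(x, shape, rate):
    """P(X <= x) for X ~ Gamma(shape, rate), via the power series of the
    regularized lower incomplete gamma function (fine for moderate values)."""
    if x <= 0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total)

a_prior, b_prior = 1 / 2500, 1 / 25
prior_mean = a_prior / b_prior                 # a / b
prior_sd = math.sqrt(a_prior) / b_prior        # sqrt(a) / b

n, t, y = 24, 5, 6
a_post, b_post = a_prior + y, b_prior + n * t  # Gamma(a + y, b + nt)
post_mean = a_post / b_post
p_above = 1 - gamma_cdf(0.01, a_post, b_post)  # P(theta > 0.01 | y)

print(f"prior mean {prior_mean:.3f}, prior sd {prior_sd:.2f}")
print(f"posterior Gamma({a_post:.4f}, {b_post:.2f}), mean {post_mean:.4f}, P(theta > 0.01) = {p_above:.4f}")
```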

Module 3 Honors

Q1. Identify which of the following conditions (possibly more than one) must be true for the sum of $n$ Bernoulli random variables (with success probability $p$) to follow a binomial distribution.

the sum must exceed $n$
$p$ must be the same for each of the Bernoulli random variables
$p$ must be less than .5
the sum must be greater than zero
each Bernoulli random variable is independent of all others

Q2. For Questions 2-4, consider the following:

In Lesson 6.3 we found the prior predictive distribution for a Bernoulli trial under a uniform prior on the success probability $\theta$. We now derive the prior predictive distribution when the prior is any conjugate beta distribution.

There are two straightforward ways to do this. The first approach is the same as in the lesson. The marginal distribution of $y$ is $f(y) = \int_0^1 f(y \mid \theta) f(\theta)\, d\theta$. Now $f(\theta)$ is a beta PDF, but the same principles apply: we can move constants out of the integral and find a new normalizing constant to make the integral evaluate to 1.

Another approach is to notice that we can write Bayes’ theorem as $f(\theta \mid y) = \frac{f(y \mid \theta) f(\theta)}{f(y)}$. If we multiply both sides by $f(y)$ and divide both sides by $f(\theta \mid y)$, then we get $f(y) = \frac{f(y \mid \theta) f(\theta)}{f(\theta \mid y)}$, where $f(\theta)$ is the beta prior PDF and $f(\theta \mid y)$ is the updated beta posterior PDF.

Both approaches yield the same answer. What is the prior predictive distribution $f(y)$ for this model when the prior for $\theta$ is $\text{Beta}(a, b)$?

$f(y) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \theta^{a-1} (1-\theta)^{b-1}$ for $y = 0, 1$
$f(y) = \frac{\Gamma(a+y)}{\Gamma(a)} \cdot \frac{\Gamma(b+1-y)}{\Gamma(b)}$ for $y = 0, 1$
$f(y) = \frac{\Gamma(a+b+1)}{\Gamma(a+y)\Gamma(b+1-y)} \theta^{a+y-1} (1-\theta)^{b+1-y}$ for $y = 0, 1$
$f(y) = \frac{\Gamma(a+b)}{\Gamma(a+b+1)} \theta^y (1-\theta)^{1-y}$ for $y = 0, 1$
$f(y) = \frac{\Gamma(a+b)}{\Gamma(a+b+1)} \cdot \frac{\Gamma(a+y)}{\Gamma(a)} \cdot \frac{\Gamma(b+1-y)}{\Gamma(b)}$ for $y = 0, 1$

Q3. Beta-Bernoulli predictive distribution:

Now suppose the prior for $\theta$ is $\text{Beta}(2,2)$. What is the prior predictive probability that $y^* = 1$ for a new observation $y^*$? Round your answer to one decimal place.

Q4. Beta-Bernoulli predictive distribution:

After specifying our $\text{Beta}(2,2)$ prior for $\theta$, we observe 10 Bernoulli trials, 3 of which are successes.

What is the posterior predictive probability that $y^* = 1$ for the next (11th) observation $y^*$? Round your answer to two decimal places.
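For $y = 1$ the last expression in the Q2 options simplifies to $a/(a+b)$, and for $y = 0$ to $b/(a+b)$, so the beta-Bernoulli predictive probability of a success is just the beta mean. The Python sketch below (standard library; `beta_bernoulli_predictive` is a hypothetical name) evaluates it for the Beta(2, 2) prior and for the Beta(2 + 3, 2 + 7) posterior after 3 successes in 10 trials.

```python
import math

def beta_bernoulli_predictive(y: int, a: float, b: float) -> float:
    """P(Y = y), y in {0, 1}, when theta ~ Beta(a, b) and Y | theta ~ Bernoulli(theta)."""
    return (math.gamma(a + b) / math.gamma(a + b + 1)
            * math.gamma(a + y) / math.gamma(a)
            * math.gamma(b + 1 - y) / math.gamma(b))

prior_pred = beta_bernoulli_predictive(1, 2, 2)                 # Beta(2, 2) prior
post_pred = beta_bernoulli_predictive(1, 2 + 3, 2 + (10 - 3))  # Beta(5, 9) posterior
print(round(prior_pred, 1), round(post_pred, 2))
```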

Week 04: Models for Continuous Data Quiz Answers

Lesson 09 Quiz Answers

Q1. For Questions 1-3, refer to the bus waiting time example from the lesson.

Recall that we used the conjugate gamma prior for $\lambda$, the arrival rate in busses per minute. Suppose our prior belief about this rate is that it should have mean 1/20 arrivals per minute with standard deviation 1/5. Then the prior is $\text{Gamma}(a, b)$ with $a = 1/16$.

Find the value of $b$. Round your answer to two decimal places.

Q2. Bus waiting times:

Suppose that we wish to use a prior with the same mean (1/20), but with effective sample size of one arrival. Then the prior for $\lambda$ is $\text{Gamma}(1, 20)$.

In addition to the original $Y_1 = 12$, we observe the waiting times for four additional busses: $Y_2 = 15$, $Y_3 = 8$, $Y_4 = 13.5$, $Y_5 = 25$.

Recall that with multiple (independent) observations, the posterior for $\lambda$ is $\text{Gamma}(\alpha, \beta)$ where $\alpha = a + n$ and $\beta = b + \sum y_i$.

What is the posterior mean for $\lambda$? Round your answer to two decimal places.

Q3. Bus waiting times:

Continuing Question 2, use R or Excel to find the posterior probability that $\lambda < 1/10$. Round your answer to two decimal places.
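For the bus example, matching a Gamma(a, b) prior to mean 1/20 and standard deviation 1/5 gives $b = \text{mean}/\text{sd}^2$ and $a = \text{mean} \times b$, and each exponential waiting time adds 1 to the shape and its value to the rate. A Python sketch of Questions 1-3 follows (standard library only, in place of R or Excel; `gamma_cdf` is our own helper).

```python
import math

def gamma_cdf(x, shape, rate):
    """P(X <= x) for X ~ Gamma(shape, rate), via the power series of the
    regularized lower incomplete gamma function (fine for moderate values)."""
    if x <= 0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total)

# Q1: Gamma(a, b) with mean a / b = 1/20 and sd sqrt(a) / b = 1/5
prior_mean, prior_sd = 1 / 20, 1 / 5
b = prior_mean / prior_sd ** 2
a = prior_mean * b

# Q2-Q3: Gamma(1, 20) prior updated with five exponential waiting times
waits = [12, 15, 8, 13.5, 25]
alpha = 1 + len(waits)
beta = 20 + sum(waits)
post_mean = alpha / beta
p_below = gamma_cdf(1 / 10, alpha, beta)

print(f"a = {a:.4f}, b = {b:.2f}")
print(f"posterior Gamma({alpha}, {beta}), mean {post_mean:.2f}, P(lambda < 0.1) = {p_below:.2f}")
```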

Q4. For Questions 4-10, consider the following earthquake data:

The United States Geological Survey maintains a list of significant earthquakes worldwide. We will model the rate of earthquakes of magnitude 4.0+ in the state of California during 2015. An iid exponential model on the waiting time between significant earthquakes is appropriate if we assume:

earthquake events are independent,
the rate at which earthquakes occur does not change during the year, and
the earthquake hazard rate does not change (i.e., the probability of an earthquake happening tomorrow is constant regardless of whether the previous earthquake was yesterday or 100 days ago).

Let $Y_i$ denote the waiting time in days between the $i$th earthquake and the following earthquake. Our model is $Y_i \overset{\text{iid}}{\sim} \text{Exponential}(\lambda)$ where the expected waiting time between earthquakes is $E(Y) = 1/\lambda$ days.

Assume the conjugate prior $\lambda \sim \text{Gamma}(a, b)$. Suppose our prior expectation for $\lambda$ is 1/30, and we wish to use a prior effective sample size of one interval between earthquakes.

What is the value of $a$?

Q5. Earthquake data:

What is the value of $b$?

Q6. Earthquake data:

The significant earthquakes of magnitude 4.0+ in the state of California during 2015 occurred on the following dates (http://earthquake.usgs.gov/earthquakes/browse/significant.php?year=2015):

January 4, January 20, January 28, May 22, July 21, July 25, August 17, September 16, December 30.

Recall that we are modeling the waiting times between earthquakes in days. Which of the following is our data vector?

y = (16, 8, 114, 60, 4, 23, 30, 105)
y = (3, 16, 8, 114, 60, 4, 23, 30, 105)
y = (0, 0, 4, 2, 0, 1, 1, 3)
y = (3, 16, 8, 114, 60, 4, 23, 30, 105, 1)

Q7. Earthquake data:

The posterior distribution is $\lambda \mid \mathbf{y} \sim \text{Gamma}(\alpha, \beta)$. What is the value of $\alpha$?

Q8. Earthquake data:

The posterior distribution is $\lambda \mid \mathbf{y} \sim \text{Gamma}(\alpha, \beta)$. What is the value of $\beta$?

Q9. Earthquake data:

Use R or Excel to calculate the upper end of the 95% equal-tailed credible interval for $\lambda$, the rate of major earthquakes in events per day. Round your answer to two decimal places.

Q10. Earthquake data:

The posterior predictive density for a new waiting time $y^*$ in days is:

$f(y^* \mid \mathbf{y}) = \int f(y^* \mid \lambda) \cdot f(\lambda \mid \mathbf{y})\, d\lambda = \frac{\beta^\alpha\, \Gamma(\alpha + 1)}{(\beta + y^*)^{\alpha + 1}\, \Gamma(\alpha)} I_{\{y^* \ge 0\}} = \frac{\beta^\alpha\, \alpha}{(\beta + y^*)^{\alpha + 1}} I_{\{y^* \ge 0\}}$

where $f(\lambda \mid \mathbf{y})$ is the $\text{Gamma}(\alpha, \beta)$ posterior found earlier. Use R or Excel to evaluate this posterior predictive PDF.

Which of the following graphs shows the posterior predictive distribution for $y^*$?
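The earthquake questions can be reproduced in Python with the standard library: `datetime.date` computes the day counts between the listed dates, the Gamma(1, 30) prior (mean 1/30, effective sample size of one interval) is updated with the waiting times, and a hypothetical `posterior_predictive_pdf` evaluates the predictive density above. This is a sketch in place of R or Excel; `gamma_cdf` and `gamma_quantile` are our own helpers, not library functions.

```python
import math
from datetime import date

def gamma_cdf(x, shape, rate):
    """P(X <= x) for X ~ Gamma(shape, rate), via the power series of the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total)

def gamma_quantile(q, shape, rate, tol=1e-12):
    """Invert gamma_cdf by bisection."""
    lo, hi = 0.0, shape / rate
    while gamma_cdf(hi, shape, rate) < q:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, rate) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def posterior_predictive_pdf(y_star, alpha, beta):
    """beta^alpha * alpha / (beta + y_star)^(alpha + 1) for y_star >= 0, rewritten as
    (alpha / (beta + y_star)) * (beta / (beta + y_star))^alpha to avoid huge powers."""
    if y_star < 0:
        return 0.0
    return (alpha / (beta + y_star)) * (beta / (beta + y_star)) ** alpha

# Q6: waiting times in days between consecutive 2015 events
quake_dates = [date(2015, 1, 4), date(2015, 1, 20), date(2015, 1, 28),
               date(2015, 5, 22), date(2015, 7, 21), date(2015, 7, 25),
               date(2015, 8, 17), date(2015, 9, 16), date(2015, 12, 30)]
waits = [(later - earlier).days for earlier, later in zip(quake_dates, quake_dates[1:])]

# Q4-Q5: prior mean a / b = 1/30 with an effective sample size of one interval
a_prior, b_prior = 1, 30

# Q7-Q8: each waiting time adds 1 to the shape and its length to the rate
alpha = a_prior + len(waits)
beta = b_prior + sum(waits)

# Q9: upper end of the 95% equal-tailed credible interval for lambda
upper_95 = gamma_quantile(0.975, alpha, beta)

print("waits:", waits)
print(f"posterior Gamma({alpha}, {beta}), 95% upper end {upper_95:.2f}")
for y_star in (0, 30, 60, 120, 240):   # Q10: a few points of the predictive PDF
    print(y_star, round(posterior_predictive_pdf(y_star, alpha, beta), 5))
```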

Lesson 10

Q1. For Questions 1-6, consider the thermometer calibration problem from the quiz in Lesson 6.

Suppose you are trying to calibrate a thermometer by testing the temperature it reads when water begins to boil. Because of natural variation, you take $n$ independent measurements (experiments) to estimate $\theta$, the mean temperature reading for this thermometer at the boiling point. Assume a normal likelihood for these data, with mean $\theta$ and known variance $\sigma^2 = 0.25$ (which corresponds to a standard deviation of 0.5 degrees Celsius).

Suppose your prior for $\theta$ is (conveniently) the conjugate normal. You know that at sea level, water should boil at 100 degrees Celsius, so you set the prior mean at $m_0 = 100$.

$Y_i \mid \theta \overset{\text{iid}}{\sim} \text{N}(\theta, 100)$; $\theta \sim \text{N}(0.25, s_0^2)$
$Y_i \mid \theta \overset{\text{iid}}{\sim} \text{N}(\theta, 0.25)$; $\theta \sim \text{N}(100, s_0^2)$
$Y_i \mid \theta, \sigma^2 \overset{\text{iid}}{\sim} \text{N}(\theta, \sigma^2)$; $\sigma^2 \sim \text{Inverse-Gamma}(100, s_0^2)$
$Y_i \mid \theta \overset{\text{iid}}{\sim} \text{N}(100, 0.25)$; $\theta \sim \text{N}(\theta, s_0^2)$
$Y_i \mid \sigma^2 \overset{\text{iid}}{\sim} \text{N}(100, \sigma^2)$; $\sigma^2 \sim \text{Inverse-Gamma}(0.25, s_0^2)$

Q2. Thermometer calibration:

You decide you want the prior to be equivalent (in effective sample size) to one measurement.

What value should you select for $s_0^2$, the prior variance of $\theta$? Round your answer to two decimal places.

Q3. Thermometer calibration:

You collect the following $n = 5$ measurements: (94.6, 95.4, 96.2, 94.9, 95.9).

What is the posterior distribution for $\theta$?

$\text{N}(96.17, 24)$
$\text{N}(95.41, 0.042)$
$\text{N}(100, 0.250)$
$\text{N}(96.17, 0.042)$
$\text{N}(95.41, 0.250)$
$\text{N}(95.41, 24)$

Q4. Thermometer calibration:

Use R or Excel to find the upper end of a 95% equal-tailed credible interval for $\theta$.

Q5. Thermometer calibration:

After collecting these data, is it reasonable to conclude that the thermometer is biased toward low values?

Yes, we have $P(\theta < 100 \mid \mathbf{y}) > 0.9999$.
Yes, we have $P(\theta > 100 \mid \mathbf{y}) > 0.9999$.
No, we have $P(\theta < 100 \mid \mathbf{y}) < 0.0001$.
No, we have $P(\theta = 100 \mid \mathbf{y}) = 0$.

Q6. Thermometer calibration:

What is the posterior predictive distribution of a single future observation $Y^*$?

$\text{N}(96.17, 0.292)$
$\text{N}(100, 0.50)$
$\text{N}(95.41, 0.292)$
$\text{N}(96.17, 0.042)$
$\text{N}(95.41, 0.50)$
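With known variance $\sigma^2 = 0.25$, a prior worth one measurement has $s_0^2 = \sigma^2 / 1 = 0.25$, and the conjugate normal update adds precisions. The Python sketch below uses `statistics.NormalDist` (Python 3.8+) instead of R or Excel to get the posterior, its 97.5% quantile, $P(\theta < 100 \mid \mathbf{y})$, and the posterior predictive for Questions 2-6.

```python
from statistics import NormalDist, fmean

y = [94.6, 95.4, 96.2, 94.9, 95.9]
sigma2 = 0.25          # known measurement variance
m0 = 100.0             # prior mean: boiling point at sea level
s0_sq = sigma2 / 1     # prior variance worth one measurement (Q2)

n = len(y)
ybar = fmean(y)

# Conjugate normal update: precisions add; the mean is a precision-weighted average.
post_var = 1 / (1 / s0_sq + n / sigma2)
post_mean = post_var * (m0 / s0_sq + n * ybar / sigma2)
posterior = NormalDist(post_mean, post_var ** 0.5)

upper_95 = posterior.inv_cdf(0.975)   # Q4
p_below_100 = posterior.cdf(100)      # Q5

# Posterior predictive for one new reading adds the measurement variance (Q6).
predictive = NormalDist(post_mean, (post_var + sigma2) ** 0.5)

print(f"posterior N({post_mean:.2f}, {post_var:.3f})")
print(f"95% upper end: {upper_95:.2f}, P(theta < 100 | y) = {p_below_100:.6f}")
print(f"predictive N({predictive.mean:.2f}, {predictive.variance:.3f})")
```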

Q7. For Questions 7-10, consider the following scenario:

Your friend moves from city A to city B and is delighted to find her favorite restaurant chain at her new location. After several meals, however, she suspects that the restaurant in city B is less generous. She decides to investigate.

She orders the main dish on 30 randomly selected days throughout the year and records each meal’s weight in grams. You still live in city A, so you assist by performing the same experiment at your restaurant. Assume that the dishes are served on identical plates (measurements subtract the plate’s weight), and that your scale and your friend’s scale are consistent.

The following histogram shows the 30 measurements from Restaurant B taken by your friend.

Is it reasonable to assume that these data are normally distributed?

Yes, the distribution appears to follow a bell-shaped curve.
Yes, the data are tightly clustered around a single number.
No, the first bar to the left of the peak is not equal in height to the first bar to the right of the peak.
No, there appear to be a few extreme observations (outliers).

Q8. Restaurants:

Your friend investigates the three observations above 700 grams and discovers that she had ordered the incorrect meal on those dates. She removes these observations from the data set and proceeds with the analysis using $n = 27$.

She assumes a normal likelihood for the data with unknown mean $\mu$ and unknown variance $\sigma^2$. She uses the model presented in Lesson 10.2 where, conditional on $\sigma^2$, the prior for $\mu$ is normal with mean $m$ and variance $\sigma^2 / w$. Next, the marginal prior for $\sigma^2$ is $\text{Inverse-Gamma}(a, b)$.

Your friend’s prior guess on the mean dish weight is 500 grams, so we set $m = 500$. She is not very confident with this guess, so we set the prior effective sample size $w = 0.1$. Finally, she sets $a = 3$ and $b = 200$.

We can learn more about this inverse-gamma prior by simulating draws from it. If a random variable $X$ follows a $\text{Gamma}(a, b)$ distribution, then $\frac{1}{X}$ follows an $\text{Inverse-Gamma}(a, b)$ distribution. Hence, we can simulate draws from a gamma distribution and take their reciprocals, which will be draws from an inverse-gamma.

To simulate 1000 draws in R (replace a and b with their actual values):

z <- rgamma(n=1000, shape=a, rate=b)
x <- 1/z

To simulate one draw in Excel (replace a and b with their actual values):

= 1 / GAMMA.INV( RAND(), a, 1/b )

where probability=RAND(), alpha=a, and beta=1/b. Then copy this formula to obtain multiple draws.

Simulate a large number of draws (at least 300) from the prior for $\sigma^2$ and report your approximate prior mean from these draws. It does not need to be exact.
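The same simulation can be written in Python with the standard library. Note that `random.gammavariate(alpha, beta)` takes a scale parameter, so rate b becomes scale 1/b, mirroring the `1/b` in the Excel formula. The seed and the number of draws are arbitrary choices; the exact inverse-gamma mean b/(a − 1) = 100 (valid for a > 1) is computed alongside for comparison.

```python
import random
from statistics import fmean

random.seed(42)           # arbitrary seed for reproducibility
a, b = 3, 200             # Inverse-Gamma(a, b) prior for sigma^2
n_draws = 100_000

# random.gammavariate(alpha, beta) uses beta as a SCALE, so rate b -> scale 1/b.
# The reciprocal of a Gamma(a, rate=b) draw is an Inverse-Gamma(a, b) draw.
sigma2_draws = [1 / random.gammavariate(a, 1 / b) for _ in range(n_draws)]

approx_mean = fmean(sigma2_draws)
exact_mean = b / (a - 1)  # inverse-gamma mean, valid for a > 1
print(round(approx_mean, 1), exact_mean)
```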

Q9. Restaurants:

With the $n = 27$ data points, your friend calculates the sample mean $\bar{y} = 609.7$ and sample variance $s^2 = \frac{1}{n-1} \sum (y_i - \bar{y})^2 = 401.8$.

Using the update formulas from Lesson 10.2, she calculates the following posterior distributions:

$\sigma^2 \mid \mathbf{y} \sim \text{Inverse-Gamma}(a', b')$

$\mu \mid \sigma^2, \mathbf{y} \sim \text{N}\left(m', \frac{\sigma^2}{w+n}\right)$

where

$a' = a + \frac{n}{2} = 3 + \frac{27}{2} = 16.5$

$b' = b + \frac{n-1}{2} s^2 + \frac{wn}{2(w+n)} (\bar{y} - m)^2 = 200 + \frac{27-1}{2}(401.8) + \frac{0.1 \cdot 27}{2(0.1+27)} (609.7 - 500)^2 = 6022.9$

$m' = \frac{n\bar{y} + wm}{w + n} = \frac{27 \cdot 609.7 + 0.1 \cdot 500}{0.1 + 27} = 609.3$

$w = 0.1$, and $w + n = 27.1$.

To simulate draws from this posterior, begin by drawing values for $\sigma^2$ from its posterior using the method from the preceding question. Then, plug these values for $\sigma^2$ into the posterior for $\mu$ and draw from that normal distribution.

To simulate 1000 draws in R:

z <- rgamma(1000, shape=16.5, rate=6022.9)
sig2 <- 1/z
mu <- rnorm(1000, mean=609.3, sd=sqrt(sig2/27.1))

To simulate one draw in Excel, enter

= 1 / GAMMA.INV( RAND(), 16.5, 1/6022.9 )

into a cell (A1, for example) as the draw for $\sigma^2$. Then draw

= NORM.INV( RAND(), 609.3, SQRT(A1/27.1) )

where probability=RAND(), mean=609.3, standard_dev=SQRT(A1/27.1), and A1 is the reference to the cell containing the draw for $\sigma^2$. Then copy these formulas to obtain multiple draws.

We can use these simulated draws to help us approximate inferences for $\mu$ and $\sigma^2$. For example, we can obtain a 95% equal-tailed credible interval for $\mu$ by calculating the quantiles/percentiles of the simulated values.

In R:

quantile(x=mu, probs=c(0.025, 0.975))

In Excel:

= PERCENTILE.INC( A1:A500, 0.025 )
= PERCENTILE.INC( A1:A500, 0.975 )

where array=A1:A500 (or the cells where you have stored samples of $\mu$) and k=0.025 or 0.975.

Perform the posterior simulation described above and compute your approximate 95% equal-tailed credible interval for $\mu$. Based on your simulation, which of the following appears to be the actual interval?

(245, 619)
(582, 637)
(602, 617)
(608, 610)
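A Python translation of the R recipe above (standard library only; the seed and the 20,000 draws are arbitrary choices) recomputes a', b', and m' from the update formulas, draws $\sigma^2$ from its inverse-gamma posterior and then $\mu$ given $\sigma^2$, and reads off simple empirical 2.5% and 97.5% quantiles.

```python
import math
import random

random.seed(1)                     # arbitrary seed
n, ybar, s2 = 27, 609.7, 401.8     # Restaurant B summary statistics
m, w, a, b = 500, 0.1, 3, 200      # prior settings from Question 8

# Update formulas from Lesson 10.2
a_post = a + n / 2
b_post = b + (n - 1) / 2 * s2 + w * n / (2 * (w + n)) * (ybar - m) ** 2
m_post = (n * ybar + w * m) / (w + n)

n_draws = 20_000
mu_draws = []
for _ in range(n_draws):
    # sigma^2 | y ~ Inverse-Gamma(a_post, b_post): reciprocal of a gamma draw (scale = 1/rate)
    sig2 = 1 / random.gammavariate(a_post, 1 / b_post)
    # mu | sigma^2, y ~ N(m_post, sigma^2 / (w + n)); random.gauss takes a standard deviation
    mu_draws.append(random.gauss(m_post, math.sqrt(sig2 / (w + n))))

mu_draws.sort()
lower = mu_draws[int(0.025 * n_draws)]      # simple empirical quantiles
upper = mu_draws[int(0.975 * n_draws) - 1]
print(f"a' = {a_post}, b' = {b_post:.1f}, m' = {m_post:.1f}")
print(f"approximate 95% interval for mu: ({lower:.0f}, {upper:.0f})")
```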

Q10. Restaurants:

You complete your experiment at Restaurant A with $n = 30$ data points, which appear to be normally distributed. You calculate the sample mean $\bar{y} = 622.8$ and sample variance $s^2 = \frac{1}{n-1} \sum (y_i - \bar{y})^2 = 403.1$.

Repeat the analysis from Question 9 using the same priors and draw samples from the posterior distribution of $\sigma_A^2$ and $\mu_A$ (where the $A$ denotes that these parameters are for Restaurant A).

Treating the data from Restaurant A as independent from Restaurant B, we can now attempt to answer your friend’s original question: is restaurant A more generous? To do so, we can compute posterior probabilities of hypotheses like $\mu_A > \mu_B$. This is a simple task if we have simulated draws for $\mu_A$ and $\mu_B$. For $i = 1, \ldots, N$ (the number of simulations drawn for each parameter), make the comparison $\mu_A > \mu_B$ using the $i$th draw for $\mu_A$ and $\mu_B$. Then count how many of these return a TRUE value and divide by $N$, the total number of simulations.

In R (using 1000 simulated values):

sum( muA > muB ) / 1000

or, equivalently,

mean( muA > muB )

In Excel (for one value):

= IF(A1 > B1, 1, 0)

where the first argument is the logical test which compares the value of cell A1 with that of B1, 1=value_if_true, and 0=value_if_false. Copy this formula to compare all \mu_AμA, \mu_BμB pairs. This will yield a column of binary (0 or 1) values, which you can sum or average to approximate the posterior probability.

Would you conclude that the main dish from restaurant A weighs more than the main dish from restaurant B on average?

Yes, the posterior probability that $\mu_A > \mu_B$ is at least 0.95.
Yes, the posterior probability that $\mu_A > \mu_B$ is less than 0.05.
No, the posterior probability that $\mu_A > \mu_B$ is at least 0.95.
No, the posterior probability that $\mu_A > \mu_B$ is less than 0.05.
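The comparison can be sketched in Python by wrapping the Question 9 simulation in a function (`draw_mu` is a hypothetical helper using the same priors m = 500, w = 0.1, a = 3, b = 200) and applying it to both restaurants' summary statistics. The fraction of paired draws with $\mu_A > \mu_B$ approximates the posterior probability; the seed and number of draws are arbitrary.

```python
import math
import random

def draw_mu(n, ybar, s2, n_draws, rng, m=500, w=0.1, a=3, b=200):
    """Hypothetical helper: posterior draws of mu for the normal model of Lesson 10.2."""
    a_post = a + n / 2
    b_post = b + (n - 1) / 2 * s2 + w * n / (2 * (w + n)) * (ybar - m) ** 2
    m_post = (n * ybar + w * m) / (w + n)
    draws = []
    for _ in range(n_draws):
        sig2 = 1 / rng.gammavariate(a_post, 1 / b_post)   # inverse-gamma draw
        draws.append(rng.gauss(m_post, math.sqrt(sig2 / (w + n))))
    return draws

rng = random.Random(2024)     # arbitrary seed
n_draws = 20_000
mu_B = draw_mu(27, 609.7, 401.8, n_draws, rng)   # Restaurant B (friend)
mu_A = draw_mu(30, 622.8, 403.1, n_draws, rng)   # Restaurant A (you)

# Fraction of paired draws with mu_A > mu_B approximates P(mu_A > mu_B | data).
prob_A_heavier = sum(x > y for x, y in zip(mu_A, mu_B)) / n_draws
print(round(prob_A_heavier, 3))
```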

Lesson 11

Q1. Suppose we flip a coin five times to estimate $\theta$, the probability of obtaining heads. We use a Bernoulli likelihood for the data and a non-informative (and improper) Beta(0,0) prior for $\theta$. We observe the following sequence: (H, H, H, T, H).

Because we observed at least one H and at least one T, the posterior is proper. What is the posterior distribution for $\theta$?

Beta(4.5, 1.5)
Beta(1.5, 4.5)
Beta(5,2)
Beta(4,1)
Beta(1,4)
Beta(2,5)

Q2. Continuing the previous question, what is the posterior mean for $\theta$? Round your answer to one decimal place.
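With a Beta(0, 0) prior, the conjugate update simply counts successes and failures, so (H, H, H, T, H) gives Beta(4, 1). In Python (names are illustrative):

```python
flips = ["H", "H", "H", "T", "H"]
heads = flips.count("H")
tails = flips.count("T")

# Beta(0, 0) prior: the posterior is Beta(0 + heads, 0 + tails),
# which is proper only when both counts are positive.
alpha, beta = 0 + heads, 0 + tails
post_mean = alpha / (alpha + beta)
print(f"Beta({alpha}, {beta}), posterior mean {post_mean:.1f}")
```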

Q3. Consider again the thermometer calibration problem from Lesson 10.

Assume a normal likelihood with unknown mean $\theta$ and known variance $\sigma^2 = 0.25$. Now use the non-informative (and improper) flat prior for $\theta$ across all real numbers. This is equivalent to a conjugate normal prior with variance equal to $\infty$.

You collect the following $n = 5$ measurements: (94.6, 95.4, 96.2, 94.9, 95.9). What is the posterior distribution for $\theta$?

$\text{N}(96.0, 0.25^2)$
$\text{N}(96.0, 0.05^2)$
$\text{N}(95.4, 0.05)$
$\text{N}(95.4, 0.25)$
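The flat prior can be seen as the limit of the conjugate normal prior as $s_0^2 \to \infty$. The Python sketch below (standard library; `normal_posterior` is our own helper, and the prior mean of 100 is carried over from Lesson 10 only to show that it stops mattering) shows the posterior approaching $\text{N}(\bar{y}, \sigma^2/n)$.

```python
from statistics import fmean

y = [94.6, 95.4, 96.2, 94.9, 95.9]
sigma2 = 0.25
m0 = 100.0                 # Lesson 10 prior mean, used only to show it washes out
n, ybar = len(y), fmean(y)

def normal_posterior(s0_sq):
    """Conjugate normal posterior (mean, variance) for theta with prior N(m0, s0_sq)."""
    var = 1 / (1 / s0_sq + n / sigma2)
    mean = var * (m0 / s0_sq + n * ybar / sigma2)
    return mean, var

for s0_sq in (0.25, 100.0, 1e8):
    mean, var = normal_posterior(s0_sq)
    print(f"s0^2 = {s0_sq:g}: N({mean:.3f}, {var:.4f})")

# Flat-prior limit: N(ybar, sigma^2 / n)
flat_mean, flat_var = ybar, sigma2 / n
print(f"flat prior: N({flat_mean:.1f}, {flat_var:.2f})")
```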

Q4. Which of the following graphs shows the Jeffreys prior for a Bernoulli/binomial success probability $p$?

Hint: The Jeffreys prior in this case is Beta(1/2, 1/2).

Q5. Scientist A studies the probability of a certain outcome of an experiment and calls it $\theta$. To be non-informative, he assumes a Uniform(0,1) prior for $\theta$.

Scientist B studies the same outcome of the same experiment using the same data, but wishes to model the odds $\phi = \frac{\theta}{1 - \theta}$. Scientist B places a uniform distribution on $\phi$. If she reports her inferences in terms of the probability $\theta$, will they be equivalent to the inferences made by Scientist A?

Yes, they both used uniform priors.
Yes, they used the Jeffreys prior.
No, they are using different parameterizations.
No, they did not use the Jeffreys prior.
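A change of variables shows what a flat prior on the odds implies for $\theta$. With $\phi = \theta / (1 - \theta)$ taking values in $(0, \infty)$:

```latex
\phi = \frac{\theta}{1-\theta}, \qquad \frac{d\phi}{d\theta} = \frac{1}{(1-\theta)^2}

f_\phi(\phi) \propto 1
\;\Longrightarrow\;
f_\theta(\theta) = f_\phi\!\left(\frac{\theta}{1-\theta}\right) \left|\frac{d\phi}{d\theta}\right|
\propto \frac{1}{(1-\theta)^2}, \qquad 0 < \theta < 1
```

The induced density on $\theta$ is not constant (and is improper on $(0, 1)$), so it is not the Uniform(0,1) prior used by Scientist A.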

Module 4 Honors

Q1. Consider again the golf data from the regression quiz for Questions 1-4.

The data are found at http://www.stat.ufl.edu/~winner/data/pgalpga2008.dat and consist of season statistics for individual golfers on the United States LPGA and PGA tours. The first column reports each player’s average driving distance in yards. The second column reports the percentage of the player’s drives that finish in the fairway, measuring their accuracy. The third and final column has a 1 to denote a female golfer (on the LPGA tour), and a 2 to denote a male golfer (on the PGA tour).

Now consider a multiple regression on the full data set, including both female and male golfers. Modify the third variable to be a 0 if the golfer is female and 1 if the golfer is male and fit the following regression:

$E(y) = b_0 + b_1 x_1 + b_2 x_2$

where $x_1$ is the average driving distance and $x_2$ is the indicator that the golfer is male.

What is the posterior mean estimate of $b_0$? Round your answer to the nearest whole number.

Q2. Golf data:

The posterior mean estimates of the other two coefficients are $\hat{b}_1 = -0.323$ and $\hat{b}_2 = 8.94$. What is the interpretation of $\hat{b}_1$?

Holding all else constant, each additional yard of distance is associated with a 0.323 decrease in drive accuracy percentage.
Holding all else constant, each additional yard of distance is associated with a 0.323 increase in drive accuracy percentage.
Holding all else constant, being male is associated with a 0.323 increase in drive accuracy percentage.
Holding all else constant, being male is associated with a 0.323 decrease in drive accuracy percentage.

Q3. Golf data:

The standard error for $b_1$ (which we can think of as marginal posterior standard deviation in this case) is roughly $1/10$ times the magnitude of the posterior mean estimate $\hat{b}_1 = -0.323$. In other words, the posterior mean is more than 10 posterior standard deviations from 0. What does this suggest?

The posterior probability that $b_1 < 0$ is very low, suggesting a negative relationship between driving distance and accuracy.
The posterior probability that $b_1 < 0$ is about 0.5, suggesting no evidence for an association between driving distance and accuracy.
The posterior probability that $b_1 < 0$ is very high, suggesting a negative relationship between driving distance and accuracy.

Q4. Golf data:

The estimated value of $b_2$ would typically be interpreted to mean that holding all else constant (for a fixed driving distance), golfers on the PGA tour are about 9% more accurate with their drives on average than golfers on the LPGA tour. However, if you explore the data, you will find that the PGA tour golfers’ average drives are 40+ yards longer than LPGA tour golfers’ average drives, and that the LPGA tour golfers are actually more accurate on average. Thus $b_2$, while a vital component of the model, is actually a correction for the discrepancy in driving distances. Although model fitting can be easy (especially with software), interpreting the results requires a thoughtful approach.

It would also be prudent to check that the model fits the data well. One of the primary tools in regression analysis is the residual plot. Residuals are defined as the observed values $y$ minus their predicted values $\hat{y}$. Patterns in the plot of $\hat{y}$ versus residuals, for example, can indicate an inadequacy in the model. These plots are easy to produce.

In R:

plot(fitted(mod), residuals(mod))

where “mod” is the model object fitted with the lm() command.

In Excel, residual plots are available as an output option in the regression dialogue box.

Fit the regression and examine the residual plots. Which of the following statements most accurately describes the residual plots for this analysis?

The residuals appear to be more spread apart for smaller predicted values $\hat{y}$. There are no outliers (extreme observations).
The residuals appear to be random and lack any patterns or trends. There are no outliers (extreme observations).
The residuals appear to exhibit a curved trend. There is at least one outlier (extreme observation) that we may want to investigate.
The residuals appear to be random and lack any patterns or trends. However, there is at least one outlier (extreme observation) that we may want to investigate.

Get All Course Quiz Answers of Bayesian Statistics Specialization

Bayesian Statistics: From Concept to Data Analysis Quiz Answers

Bayesian Statistics: Techniques and Models Quiz Answers

Bayesian Statistics: Mixture Models Coursera Quiz Answers

Bayesian Statistics: Time Series Analysis Quiz Answer

Team Networking Funda
