Bayesian Statistics: Techniques and Models Quiz Answers

This is the second of a two-course sequence introducing the fundamentals of Bayesian statistics. It builds on the course Bayesian Statistics: From Concept to Data Analysis, which introduces Bayesian methods through use of simple conjugate models. Real-world data often require more sophisticated models to reach realistic conclusions.

This course aims to expand our “Bayesian toolbox” with more general models, and computational techniques to fit them. In particular, we will introduce Markov chain Monte Carlo (MCMC) methods, which allow sampling from posterior distributions that have no analytical solution. We will use the open-source, freely available software R (some experience is assumed, e.g., completing the previous course in R) and JAGS (no experience required). We will learn how to construct, fit, assess, and compare Bayesian statistical models to answer scientific questions involving continuous, binary, and count data.

This course combines lecture videos, computer demonstrations, readings, exercises, and discussion boards to create an active learning experience. The lectures provide some of the basic mathematical development, explanations of the statistical modeling process, and a few basic modeling techniques commonly used by statisticians. Computer demonstrations provide concrete, practical walkthroughs. Completion of this course will give you access to a wide range of Bayesian analytical tools, customizable to your data.

Lesson 1

Q1. Which objective of statistical modeling is best illustrated by the following example?

You fit a linear regression of monthly stock values for your company. You use the estimates and recent stock history to calculate a forecast of the stock’s value for the next three months.

  • Quantify uncertainty
  • Inference
  • Hypothesis testing
  • Prediction

Q2. Which objective of statistical modeling is best illustrated by the following example?

A biologist proposes a treatment to decrease genetic variation in plant size. She conducts an experiment and asks you (the statistician) to analyze the data to conclude whether a 10% decrease in variation has occurred.

  • Quantify uncertainty
  • Inference
  • Hypothesis testing
  • Prediction

Q3. Which objective of statistical modeling is best illustrated by the following example?

The same biologist from the previous question asks you how many experiments would be necessary to have a 95% chance at detecting a 10% decrease in plant variation.

  • Quantify uncertainty
  • Inference
  • Hypothesis testing
  • Prediction

Q4. Which of the following scenarios best illustrates the statistical modeling objective of inference?

  • A social scientist collects data and detects positive correlation between sleep deprivation and traffic accidents.
  • A natural language processing algorithm analyzes the first four words of a sentence and provides words to complete the sentence.
  • A venture capitalist uses data about several companies to build a model and makes recommendations about which company to invest in next based on growth forecasts.
  • A model inputs academic performance of 1000 students and predicts which student will be valedictorian after another year of school.

Q5. Which step in the statistical modeling cycle was not followed in the following scenario?

Susan gathers data recording heights of children and fits a linear regression predicting height from age. To her surprise, the model does not predict well the heights for ages 14-17 (because the growth rate changes with age), both for children included in the original data as well as other children outside the model training data.

  • Fit the model
  • Plan and properly collect relevant data
  • Use the model
  • Explore the data

Q6. Which of the following is a possible consequence of failure to plan and properly collect relevant data?

  • You may not be able to visually explore the data.
  • Your selected model will not be able to fit the data.
  • You will not produce enough data to make conclusions with a sufficient degree of confidence.
  • Your analysis may produce incomplete or misleading results.

Q7. For Questions 7 and 8, consider the following:

Xie operates a bakery and wants to use a statistical model to determine how many loaves of bread he should bake each day in preparation for weekday lunch hours. He decides to fit a Poisson model to count the demand for bread. He selects two weeks which have typical business, and for those two weeks, counts how many loaves are sold during the lunch hour each day. He fits the model, which estimates that the daily demand averages 22.3 loaves.

Over the next month, Xie bakes 23 loaves each day, but is disappointed to find that on most days he has excess bread and on a few days (usually Mondays), he runs out of loaves early.

Which of the following steps of the modeling process did Xie skip?

  • Understand the problem
  • Postulate a model
  • Fit the model
  • Check the model and iterate
  • Use the model

Q8. What might you recommend Xie do next to fix this omission and improve his predictive performance?

  • Abandon his statistical modeling initiative.
  • Collect three more weeks of data from his bakery and other bakeries throughout the city. Re-fit the same model to the extra data and follow the results based on more data.
  • Plot daily demand and model predictions against the day of the week to check for patterns that may account for the extra variability. Fit and check a new model which accounts for this.
  • Trust the current model and continue to produce 23 loaves daily, since in the long-run average, his error is zero.

Lesson 2

Q1. Which of the following is one major difference between the frequentist and Bayesian approach to modeling data?

  • The frequentist paradigm treats the data as fixed while the Bayesian paradigm considers data to be random.
  • Frequentist models require a guess of parameter values to initialize models while Bayesian models require initial distributions for the parameters.
  • Frequentist models are deterministic (don’t use probability) while Bayesian models are stochastic (based on probability).
  • Frequentists treat the unknown parameters as fixed (constant) while Bayesians treat unknown parameters as random variables.

Q2. Suppose we have a statistical model with unknown parameter $\theta$, and we assume a normal prior $\theta \sim \text{N}(\mu_0, \sigma_0^2)$, where $\mu_0$ is the prior mean and $\sigma_0^2$ is the prior variance. What does increasing $\sigma_0^2$ say about our prior beliefs about $\theta$?

  • Increasing the variance of the prior widens the range of what we think $\theta$ might be, indicating greater confidence in our prior mean guess $\mu_0$.
  • Increasing the variance of the prior narrows the range of what we think $\theta$ might be, indicating greater confidence in our prior mean guess $\mu_0$.
  • Increasing the variance of the prior narrows the range of what we think $\theta$ might be, indicating less confidence in our prior mean guess $\mu_0$.
  • Increasing the variance of the prior widens the range of what we think $\theta$ might be, indicating less confidence in our prior mean guess $\mu_0$.

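Q3. In the lesson, we presented Bayes’ theorem for the case where parameters are continuous. What is the correct expression for the posterior distribution of $\theta$ if it is discrete (takes on only specific values)?

  • $p(\theta) = \int p(\theta \mid y) \cdot p(y) \, dy$
  • $p(\theta \mid y) = \frac{p(y \mid \theta) \cdot p(\theta)}{\int p(y \mid \theta) \cdot p(\theta) \, d\theta}$
  • $p(\theta_j \mid y) = \frac{p(y \mid \theta_j) \cdot p(\theta_j)}{\sum_j p(y \mid \theta_j) \cdot p(\theta_j)}$
  • $p(\theta) = \sum_j p(\theta \mid y_j) \cdot p(y_j)$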
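
As a small illustration of Bayes' theorem for a discrete parameter, the sketch below multiplies a prior by a likelihood over a finite set of candidate values and normalizes by the sum. The three candidate values, the uniform prior, and the data (7 successes in 10 Bernoulli trials) are hypothetical and chosen only for illustration.

```r
# Hypothetical discrete parameter: theta takes one of three values
theta <- c(0.25, 0.50, 0.75)
prior <- rep(1 / 3, 3)                      # assumed uniform prior p(theta_j)

# Hypothetical data: 7 successes in 10 Bernoulli trials
lik <- dbinom(7, size = 10, prob = theta)   # p(y | theta_j)

# Posterior: likelihood times prior, normalized by the sum over all theta_j
posterior <- lik * prior / sum(lik * prior)
print(round(posterior, 3))
```

The only change from the continuous case is that the normalizing constant is a sum over the possible values rather than an integral.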

Q4. For Questions 4 and 5, refer to the following scenario.

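In the quiz for Lesson 1, we described Xie’s model for predicting demand for bread at his bakery. During the lunch hour on a given day, the number of orders (the response variable) follows a Poisson distribution. All days have the same mean (expected number of orders). Xie is a Bayesian, so he selects a conjugate gamma prior for the mean with shape $3$ and rate $1/15$. He collects data on Monday through Friday for two weeks.

Which of the following hierarchical models represents this scenario?

  • $y_i \mid \mu \overset{\text{iid}}{\sim} \text{N}(\mu, 1.0^2)$ for $i = 1, \ldots, 10$; $\mu \sim \text{N}(3, 15^2)$
  • $y_i \mid \lambda_i \overset{\text{ind}}{\sim} \text{Pois}(\lambda_i)$ for $i = 1, \ldots, 10$; $\lambda_i \mid \alpha \sim \text{Gamma}(\alpha, 1/15)$; $\alpha \sim \text{Gamma}(3.0, 1.0)$
  • $y_i \mid \lambda \overset{\text{iid}}{\sim} \text{Pois}(\lambda)$ for $i = 1, \ldots, 10$; $\lambda \mid \mu \sim \text{Gamma}(\mu, 1/15)$; $\mu \sim \text{N}(3, 1.0^2)$
  • $y_i \mid \lambda \overset{\text{iid}}{\sim} \text{Pois}(\lambda)$ for $i = 1, \ldots, 10$; $\lambda \sim \text{Gamma}(3, 1/15)$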

Q5. Which of the following graphical depictions represents the model from Xie’s scenario?

[The four answer options were images of graphical models, which are not reproduced in this text.]

Q6. Graphical representations of models generally do not identify the distributions of the variables (nodes), but they do reveal the structure of dependence among the variables.

Identify which of the following hierarchical models is depicted in the graphical representation below.

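  • $x_{i,j} \mid \alpha_j, \beta \overset{\text{ind}}{\sim} \text{Gamma}(\alpha_j, \beta)$, $i = 1, \ldots, n$, $j = 1, \ldots, m$; $\beta \sim \text{Exp}(b_0)$; $\alpha_j \mid \phi \overset{\text{iid}}{\sim} \text{Exp}(\phi)$, $j = 1, \ldots, m$; $\phi \sim \text{Exp}(r_0)$
  • $x_{i,j} \mid \alpha, \beta \overset{\text{iid}}{\sim} \text{Gamma}(\alpha, \beta)$, $i = 1, \ldots, n$, $j = 1, \ldots, m$; $\beta \sim \text{Exp}(b_0)$; $\alpha \sim \text{Exp}(a_0)$; $\phi \sim \text{Exp}(r_0)$
  • $x_{i,j} \mid \alpha_j, \beta \overset{\text{ind}}{\sim} \text{Gamma}(\alpha_j, \beta)$, $i = 1, \ldots, n$, $j = 1, \ldots, m$; $\beta \sim \text{Exp}(b_0)$; $\alpha_j \sim \text{Exp}(a_0)$, $j = 1, \ldots, m$; $\phi \sim \text{Exp}(r_0)$
  • $x_{i,j} \mid \alpha_i, \beta_j \overset{\text{ind}}{\sim} \text{Gamma}(\alpha_i, \beta_j)$, $i = 1, \ldots, n$, $j = 1, \ldots, m$; $\beta_j \mid \phi \overset{\text{iid}}{\sim} \text{Exp}(\phi)$, $j = 1, \ldots, m$; $\alpha_i \mid \phi \overset{\text{iid}}{\sim} \text{Exp}(\phi)$, $i = 1, \ldots, n$; $\phi \sim \text{Exp}(r_0)$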

Q7. Consider the following model for a binary outcome $y$:


where $\theta_i$ is the probability of success on trial $i$. What is the expression for the joint distribution of all variables, written as $p(y_1, \ldots, y_6, \theta_1, \ldots, \theta_6, \alpha)$ and denoted by $p(\cdots)$? You may ignore the indicator functions specifying the valid ranges of the variables (although the expressions are technically incorrect without them).

The PMF for a Bernoulli random variable is $f_y(y \mid \theta) = \theta^{y} (1-\theta)^{1-y}$ for $y=0$ or $y=1$ and $0 < \theta < 1$.

The PDF for a Beta random variable is $f_\theta(\theta \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}$ where $\Gamma()$ is the gamma function, $0 < \theta < 1$ and $\alpha, \beta > 0$.

The PDF for an exponential random variable is $f_\alpha(\alpha \mid \lambda) = \lambda \exp(-\lambda \alpha)$ for $\lambda, \alpha > 0$.

  • $p(\cdots) = \prod_{i=1}^{6} \left[ \theta_i^{y_i} (1-\theta_i)^{1-y_i} \frac{\Gamma(\alpha + b_0)}{\Gamma(\alpha) \Gamma(b_0)} \theta_i^{\alpha-1} (1-\theta_i)^{b_0-1} \, r_0 \exp(-r_0 \alpha) \right]$
  • $p(\cdots) = \prod_{i=1}^{6} \left[ \theta_i^{y_i} (1-\theta_i)^{1-y_i} \right] \cdot \frac{\Gamma(\alpha + b_0)}{\Gamma(\alpha) \Gamma(b_0)} \theta^{\alpha-1} (1-\theta)^{b_0-1} \cdot r_0 \exp(-r_0 \alpha)$
  • $p(\cdots) = \prod_{i=1}^{6} \left[ \theta_i^{y_i} (1-\theta_i)^{1-y_i} \frac{\Gamma(\alpha + b_0)}{\Gamma(\alpha) \Gamma(b_0)} \theta_i^{\alpha-1} (1-\theta_i)^{b_0-1} \right]$
  • $p(\cdots) = \prod_{i=1}^{6} \left[ \theta_i^{y_i} (1-\theta_i)^{1-y_i} \frac{\Gamma(\alpha + b_0)}{\Gamma(\alpha) \Gamma(b_0)} \theta_i^{\alpha-1} (1-\theta_i)^{b_0-1} \right] \cdot r_0 \exp(-r_0 \alpha)$

Q8. In a Bayesian model, let $y$ denote all the data and $\theta$ denote all the parameters. Which of the following statements about the relationship between the joint distribution of all variables $p(y, \theta) = p(\cdots)$ and the posterior distribution $p(\theta \mid y)$ is true?

  • They are proportional to each other so that $p(y, \theta) = c \cdot p(\theta \mid y)$ where $c$ is a constant number that doesn’t involve $\theta$ at all.
  • The joint distribution $p(y, \theta)$ is equal to the posterior distribution times a function $f(\theta)$ which contains the modification (update) of the prior.
  • Neither is sufficient alone; they are both necessary to make inferences about $\theta$.
  • They are actually equal to each other so that $p(y, \theta) = p(\theta \mid y)$.

Lesson 3

Q1. If a random variable $X$ follows a standard uniform distribution ($X \sim \text{Unif}(0,1)$), then the PDF of $X$ is $p(x) = 1$ for $0 \le x \le 1$.

We can use Monte Carlo simulation of $X$ to approximate the following integral: $\int_0^1 x^2 \, dx = \int_0^1 x^2 \cdot 1 \, dx = \int_0^1 x^2 \cdot p(x) \, dx = \text{E}(X^2)$.

If we simulate 1000 independent samples from the standard uniform distribution and call them $x_i^*$ for $i = 1, \ldots, 1000$, which of the following calculations will approximate the integral above?

  • $\left( \frac{1}{1000} \sum_{i=1}^{1000} x_i^* \right)^2$
  • $\frac{1}{1000} \sum_{i=1}^{1000} (x_i^* - \bar{x^*})^2$ where $\bar{x^*}$ is the calculated average of the $x_i^*$ samples.
  • $\frac{1}{1000} \sum_{i=1}^{1000} {x_i^*}^2$
  • $\frac{1}{1000} \sum_{i=1}^{1000} x_i^*$
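
The identity $\int_0^1 x^2 \, dx = \text{E}(X^2)$ stated in this question can be checked numerically. The sketch below averages the squared uniform draws and compares the result with the closed-form value $1/3$; the seed is arbitrary.

```r
set.seed(42)                 # arbitrary seed for reproducibility
m <- 1000
x_star <- runif(m)           # x_i^* ~ Unif(0, 1)

# E(X^2) is estimated by the average of the squared samples
mc_estimate <- mean(x_star^2)
exact <- 1 / 3               # closed form of the integral of x^2 over [0, 1]
print(c(monte_carlo = mc_estimate, exact = exact))
```

With only 1000 draws the two numbers agree to roughly two decimal places; more draws tighten the agreement.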

Q2. Suppose we simulate 1000 samples from a $\text{Unif}(0, \pi)$ distribution (which has PDF $p(x) = \frac{1}{\pi}$ for $0 \le x \le \pi$) and call the samples $x_i^*$ for $i = 1, \ldots, 1000$.

If we use these samples to calculate $\frac{1}{1000} \sum_{i=1}^{1000} \sin(x_i^*)$, what integral are we approximating?

  • $\int_{-\infty}^{\infty} \sin(x) \, dx$
  • $\int_0^1 \frac{\sin(x)}{\pi} \, dx$
  • $\int_0^1 \sin(x) \, dx$
  • $\int_0^\pi \frac{\sin(x)}{\pi} \, dx$

Q3. Suppose random variables $X$ and $Y$ have a joint probability distribution $p(X, Y)$. Suppose we simulate 1000 samples from this distribution, which gives us 1000 $(x_i^*, y_i^*)$ pairs.

If we count how many of these pairs satisfy the condition $x_i^* < y_i^*$ and divide the result by 1000, what quantity are we approximating via Monte Carlo simulation?

  • Pr[X<Y]
  • E(XY)
  • Pr[X<E(Y)]
  • Pr[E(X)<E(Y)]

Q4. If we simulate 100 samples from a $\text{Gamma}(2, 1)$ distribution, what is the approximate distribution of the sample average $\bar{x^*} = \frac{1}{100} \sum_{i=1}^{100} x_i^*$?

Hint: the mean and variance of a $\text{Gamma}(a, b)$ random variable are $a/b$ and $a/b^2$ respectively.

  • Gamma(2,0.01)
  • N(2,2)
  • N(2,0.02)
  • Gamma(2,1)
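
The hint combines with the central limit theorem: an average of 100 draws has mean $a/b$ and variance $(a/b^2)/100$. The sketch below repeats the experiment many times to check both moments empirically; the number of repetitions and the seed are arbitrary choices.

```r
set.seed(1)                            # arbitrary seed
n_rep <- 5000                          # number of repeated experiments (arbitrary)

# Each replicate: average of 100 draws from Gamma(shape = 2, rate = 1)
xbar <- replicate(n_rep, mean(rgamma(100, shape = 2, rate = 1)))

# Compare with the CLT values: mean a/b = 2, variance (a/b^2)/100 = 0.02
print(c(mean = mean(xbar), var = var(xbar)))
```

A histogram of `xbar` (for example `hist(xbar)`) should also look close to a bell curve, even though individual gamma draws are skewed.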

Q5. For Questions 5 and 6, consider the following scenario:

Laura keeps record of her loan applications and performs a Bayesian analysis of her success rate $\theta$. Her analysis yields a $\text{Beta}(5,3)$ posterior distribution for $\theta$.

The posterior mean for $\theta$ is equal to $\frac{5}{5+3} = 0.625$. However, Laura likes to think in terms of the odds of succeeding, defined as $\frac{\theta}{1 - \theta}$, the probability of success divided by the probability of failure.

Use R to simulate a large number of samples (more than 10,000) from the posterior distribution for $\theta$ and use these samples to approximate the posterior mean for Laura’s odds of success ($\text{E}\left(\frac{\theta}{1-\theta}\right)$).

Report your answer to at least one decimal place.

Q6. Laura also wants to know the posterior probability that her odds of success on loan applications are greater than 1.0 (in other words, better than 50:50 odds).

Use your Monte Carlo sample from the distribution of $\theta$ to approximate the probability that $\frac{\theta}{1-\theta}$ is greater than 1.0.

Report your answer to at least two decimal places.
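
One way to set up the simulation for Questions 5 and 6 is sketched below: draw from the Beta(5, 3) posterior, transform each draw to the odds scale, and then summarize the transformed draws. The sample size and seed are arbitrary, and the variable names are only illustrative.

```r
set.seed(2)                                  # arbitrary seed
m <- 100000                                  # well above the 10,000 requested
theta <- rbeta(m, shape1 = 5, shape2 = 3)    # draws from the Beta(5, 3) posterior

# Sanity check against the posterior mean quoted in the question
print(mean(theta))                           # should be near 5 / (5 + 3) = 0.625

# Transform every draw, then summarize the transformed draws
odds <- theta / (1 - theta)
post_mean_odds <- mean(odds)                 # approximates E(theta / (1 - theta))
prob_odds_gt_1 <- mean(odds > 1)             # approximates Pr(odds > 1)
print(c(post_mean_odds = post_mean_odds, prob_odds_gt_1 = prob_odds_gt_1))
```

Note that the mean of the odds is computed from the transformed draws; plugging the posterior mean of $\theta$ into the odds formula would generally give a different number.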

Q7. Use a (large) Monte Carlo sample to approximate the 0.3 quantile of the standard normal distribution ($\text{N}(0,1)$), the number such that the probability of being less than it is 0.3.

Use the $\texttt{quantile}$ function in R. You can of course check your answer using the $\texttt{qnorm}$ function.

Report your answer to at least two decimal places.
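
A minimal sketch of the comparison described in this question, with an arbitrary seed and sample size:

```r
set.seed(3)                          # arbitrary seed
z <- rnorm(100000)                   # Monte Carlo sample from N(0, 1)

mc_q <- quantile(z, probs = 0.3)     # empirical 0.3 quantile of the draws
exact_q <- qnorm(0.3)                # theoretical 0.3 quantile
print(c(monte_carlo = unname(mc_q), exact = exact_q))
```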

Q8. To measure how accurate our Monte Carlo approximations are, we can use the central limit theorem. If the number of samples drawn $m$ is large, then the Monte Carlo sample mean $\bar{\theta^*}$ used to estimate $\text{E}(\theta)$ approximately follows a normal distribution with mean $\text{E}(\theta)$ and variance $\text{Var}(\theta) / m$. If we substitute the sample variance for $\text{Var}(\theta)$, we can get a rough estimate of our Monte Carlo standard error (or standard deviation).

Suppose we have 100 samples from our posterior distribution for $\theta$, called $\theta_i^*$, and that the sample variance of these draws is 5.2. A rough estimate of our Monte Carlo standard error would then be $\sqrt{5.2 / 100} \approx 0.228$. So our estimate $\bar{\theta^*}$ is probably within about $0.456$ (two standard errors) of the true $\text{E}(\theta)$.

What does the standard error of our Monte Carlo estimate become if we increase our sample size to 5,000? Assume that the sample variance of the draws is still 5.2.

Report your answer to at least three decimal places.
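
The standard error calculation in this question is a one-line formula. The helper below, whose name `mc_se` is made up for this sketch, reproduces the worked value for 100 samples and can be reused for any other sample size.

```r
# Rough Monte Carlo standard error: sqrt(sample variance / number of draws)
mc_se <- function(sample_var, m) sqrt(sample_var / m)

se_100 <- mc_se(5.2, 100)
print(round(se_100, 3))        # 0.228, the value worked out in the question

# The same helper applies to the larger sample size asked about
se_5000 <- mc_se(5.2, 5000)
print(se_5000)
```

Because the standard error scales with $1/\sqrt{m}$, multiplying the sample size by 50 shrinks the error by a factor of $\sqrt{50}$, not 50.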

Week 01: Markov chains

Q1. All but one of the following scenarios describes a valid Markov chain. Which one is not a Markov chain?

  • Suppose you have a special savings account which accrues interest according to the following rules: the total amount deposited in a given month will earn $10(1/2)^{(r-1)}$% interest in the $r$th month after the deposit. For example, if the deposits in January total \$100, then you will earn \$10 interest in January, \$5 interest at the end of February, \$2.50 in March, etc. In addition to the interest from January, if you deposit \$80 in February, you will earn an additional \$8 at the end of February, \$4 at the end of March, and so forth. The total amount of money deposited in a given month follows a gamma distribution. Let $X_t$ be the total dollars in your account, including all deposits and interest up to the end of month $t$.
  • While driving through a city with square blocks, you roll a six-sided die each time you come to an intersection. If the die shows 1, 2, 3, or 4, then you turn left. If the die shows 5 or 6, you turn right. Each time you reach an intersection, you report your coordinates $X_t$.
  • Three friends take turns playing chess with the following rules: the player who sits out the current round plays against the winner in the next round. Player A, who has 0.7 probability of winning any game regardless of opponent, keeps track of whether he plays in game $t$ with an indicator variable $X_t$.
  • At any given hour, the number of customers entering a grocery store follows a Poisson distribution. The number of customers in the store who leave during that hour also follows a Poisson distribution (only up to as many people are in the store). A clerk reports the total number of customers in the store $X_t$ at the end of hour $t$.

Q2. Which of the following gives the transition probability matrix for the chess example in the previous question? The first row and column correspond to $X=0$ (player A not playing) while the second row and column correspond to $X=1$ (player A playing).

  • (0 1 0.3 0.7)
  • (0 0.3 1 0.7)
  • (0.7 0.3 0 1)
  • (0.3 0.7 0 1)

Q3. Continuing the chess example, suppose that the first game is between Players B and C. What is the probability that Player A will play in Game 4? Round your answer to two decimal places.

Q4. Which of the following is the stationary distribution for $X$ in the chess example?

  • ( .750, .250 )
  • ( .231, .769 )
  • ( 0.0, 1.0 )
  • ( .250, .750 )
  • ( .769, .231 )

Q5. If the players draw from the stationary distribution in Question 4 to decide whether Player A participates in Game 1, what is the probability that Player A will participate in Game 4? Round your answer to two decimal places.
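
The chess questions can be explored numerically. The sketch below builds a transition matrix from one reading of the rules in Question 1 (a player who sits out always plays next, and Player A stays in the next game only by winning, with probability 0.7), then computes multi-step transition probabilities and the stationary distribution. The helper name `mat_power` is made up for this example.

```r
# Rows are the current state, columns the next state.
# State order: X = 0 (A sits out), X = 1 (A plays).
P <- matrix(c(0.0, 1.0,
              0.3, 0.7), nrow = 2, byrow = TRUE)

# k-step transition probabilities by repeated matrix multiplication
mat_power <- function(M, k) {
  out <- diag(nrow(M))
  for (i in seq_len(k)) out <- out %*% M
  out
}
P3 <- mat_power(P, 3)        # Game 1 to Game 4 is three transitions
print(P3)

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# rescaled so that its entries sum to 1
ev <- eigen(t(P))
v <- Re(ev$vectors[, which.min(abs(ev$values - 1))])
stationary <- v / sum(v)
print(round(stationary, 3))

# Starting from the stationary distribution leaves it unchanged
print(stationary %*% P3)
```

Row 1 of `P3` corresponds to starting with Player A sitting out, and the last line illustrates why a chain started from its stationary distribution has the same distribution at every later step.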

Week 02: MCMC

Q1. For Questions 1 through 3, consider the following model for data that take on values between 0 and 1:


where $\alpha$ and $\beta$ are independent a priori. Which of the following gives the full conditional density for $\alpha$ up to proportionality?

  • $p(\alpha \mid \beta, x) \propto \frac{\Gamma(\alpha + \beta)^n}{\Gamma(\alpha)^n} \left[ \prod_{i=1}^n x_i \right]^{\alpha - 1} \alpha^{a-1} e^{-b\alpha} I_{(0 < \alpha < 1)}$
  • $p(\alpha \mid \beta, x) \propto \frac{\Gamma(\alpha + \beta)^n}{\Gamma(\alpha)^n} \left[ \prod_{i=1}^n x_i \right]^{\alpha - 1} \alpha^{a-1} e^{-b\alpha} I_{(\alpha > 0)}$
  • $p(\alpha \mid \beta, x) \propto \left[ \prod_{i=1}^n x_i \right]^{\alpha - 1} \alpha^{a-1} e^{-b\alpha} I_{(\alpha > 0)}$
  • $p(\alpha \mid \beta, x) \propto \frac{\Gamma(\alpha + \beta)^n}{\Gamma(\alpha)^n \Gamma(\beta)^n} \left[ \prod_{i=1}^n x_i \right]^{\alpha - 1} \left[ \prod_{i=1}^n (1 - x_i) \right]^{\beta - 1} \alpha^{a-1} e^{-b\alpha} \beta^{r-1} e^{-s\beta} I_{(0 < \alpha < 1)} I_{(0 < \beta < 1)}$

Q2. Suppose we want posterior samples for $\alpha$ from the model in Question 1. What is our best option?

  • The full conditional for $\alpha$ is not a proper distribution (it doesn’t integrate to 1), so we cannot sample from it.
  • The full conditional for $\alpha$ is proportional to a common distribution which we can sample directly, so we can draw from that.
  • The joint posterior for $\alpha$ and $\beta$ is a common probability distribution which we can sample directly. Thus we can draw Monte Carlo samples for both parameters and keep the samples for $\alpha$.
  • The full conditional for $\alpha$ is not proportional to any common probability distribution, and the marginal posterior for $\beta$ is not any easier, so we will have to resort to a Metropolis-Hastings sampler.

Q3. If we elect to use a Metropolis-Hastings algorithm to draw posterior samples for $\alpha$, the Metropolis-Hastings candidate acceptance ratio is computed using the full conditional for $\alpha$ as


where $\alpha^*$ is a candidate value drawn from proposal distribution $q(\alpha^* \mid \alpha)$. Suppose that instead of the full conditional for $\alpha$, we use the full joint posterior distribution of $\alpha$ and $\beta$ and simply plug in the current (or known) value of $\beta$. What is the Metropolis-Hastings ratio in this case?

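  • $\frac{{\alpha^*}^{a-1} e^{-b\alpha^*} q(\alpha^* \mid \alpha) I_{(\alpha^* > 0)}}{\alpha^{a-1} e^{-b\alpha} q(\alpha \mid \alpha^*) I_{(\alpha > 0)}}$
  • $\frac{\Gamma(\alpha^* + \beta)^n \left[ \prod_{i=1}^n x_i \right]^{\alpha^* - 1} \left[ \prod_{i=1}^n (1 - x_i) \right]^{\beta - 1} {\alpha^*}^{a-1} e^{-b\alpha^*} \beta^{r-1} e^{-s\beta} q(\alpha \mid \alpha^*) I_{(0 < \alpha^*)} I_{(0 < \beta)}}{\Gamma(\alpha^*)^n \Gamma(\beta)^n q(\alpha^* \mid \alpha)}$
  • $\frac{\Gamma(\alpha)^n \Gamma(\alpha^* + \beta)^n \left[ \prod_{i=1}^n x_i \right]^{\alpha^*} {\alpha^*}^{a-1} e^{-b\alpha^*} q(\alpha^* \mid \alpha) I_{(\alpha^* > 0)}}{\Gamma(\alpha^*)^n \Gamma(\alpha + \beta)^n \left[ \prod_{i=1}^n x_i \right]^{\alpha} \alpha^{a-1} e^{-b\alpha} q(\alpha \mid \alpha^*) I_{(\alpha > 0)}}$
  • $\frac{\Gamma(\alpha)^n \Gamma(\alpha^* + \beta)^n \left[ \prod_{i=1}^n x_i \right]^{\alpha^*} q(\alpha^* \mid \alpha) I_{(\alpha^* > 0)}}{\Gamma(\alpha^*)^n \Gamma(\alpha + \beta)^n \left[ \prod_{i=1}^n x_i \right]^{\alpha} q(\alpha \mid \alpha^*) I_{(\alpha > 0)}}$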

Q4. For Questions 4 and 5, re-run the Metropolis-Hastings algorithm from Lesson 4 to draw posterior samples from the model for mean company personnel growth for six new companies: (-0.2, -1.5, -5.3, 0.3, -0.8, -2.2). Use the same prior as in the lesson.

Below are four possible values for the standard deviation of the normal proposal distribution in the algorithm. Which one yields the best sampling results?

  • 0.5
  • 1.5
  • 3.0
  • 4.0

Q5. Report the posterior mean point estimate for $\mu$, the mean growth, using these six data points. Round your answer to two decimal places.
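
For Questions 4 and 5, a random-walk Metropolis-Hastings sampler can be sketched as follows. The text here does not restate the lesson's model, so the sketch assumes $y_i \mid \mu \sim \text{N}(\mu, 1)$ with a standard Cauchy, that is $t(0, 1, 1)$, prior on $\mu$; substitute the lesson's prior if it differs. The function names `lg` and `mh` are illustrative, and the proposal standard deviation of 1.5 is simply one of the four candidate values, used without claiming it is the best.

```r
# Log of the unnormalized posterior g(mu).
# ASSUMPTION (not stated in this text): y_i | mu ~ N(mu, 1), mu ~ t(0, 1, 1).
lg <- function(mu, n, ybar) {
  n * (ybar * mu - mu^2 / 2) - log(1 + mu^2)
}

# Random-walk Metropolis-Hastings with a normal proposal
mh <- function(n, ybar, n_iter, mu_init, cand_sd) {
  mu_out <- numeric(n_iter)
  accpt <- 0
  mu_now <- mu_init
  lg_now <- lg(mu_now, n, ybar)
  for (i in seq_len(n_iter)) {
    mu_cand <- rnorm(1, mean = mu_now, sd = cand_sd)   # symmetric proposal
    lg_cand <- lg(mu_cand, n, ybar)
    # Proposal densities cancel for a symmetric proposal,
    # so the log acceptance ratio is just the difference in log g
    if (log(runif(1)) < lg_cand - lg_now) {
      mu_now <- mu_cand
      lg_now <- lg_cand
      accpt <- accpt + 1
    }
    mu_out[i] <- mu_now
  }
  list(mu = mu_out, accpt = accpt / n_iter)
}

y <- c(-0.2, -1.5, -5.3, 0.3, -0.8, -2.2)   # data from Question 4
set.seed(43)                                # arbitrary seed
post <- mh(n = length(y), ybar = mean(y), n_iter = 5000,
           mu_init = 0, cand_sd = 1.5)

print(post$accpt)                 # acceptance rate for this proposal sd
print(mean(post$mu[-(1:500)]))    # posterior mean after discarding burn-in
```

Re-running `mh` with each candidate value of `cand_sd` and comparing acceptance rates and trace plots (for example `plot(post$mu, type = "l")`) is one way to judge which proposal mixes best.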

Week 03: Common models and multiple factor ANOVA

Q1. For Questions 1 and 2, consider the Anscombe data from the $\texttt{car}$ package in R which we analyzed in the quizzes for Lesson 7.

In the original model, we used normal priors for the three regression coefficients. Here we will consider using Laplace priors centered on 0. The parameterization used in JAGS for the Laplace (double exponential) distribution has an inverse scale parameter $\tau$. This is related to the variance $v$ of the prior in the following way: $v = 2 / \tau^2$. Suppose we want the Laplace prior to have variance $v = 2$. What value of $\tau$ should we use in the JAGS code?
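
The relation $v = 2 / \tau^2$ can be inverted and checked by simulation. Base R has no Laplace sampler, so the sketch below relies on the fact that the difference of two independent exponential draws with rate $\tau$ follows a Laplace distribution centered on 0 with inverse scale $\tau$. The seed and sample size are arbitrary.

```r
set.seed(4)                  # arbitrary seed
v <- 2                       # target prior variance
tau <- sqrt(2 / v)           # solve v = 2 / tau^2 for tau

# Difference of two independent Exp(tau) draws is Laplace(0, inverse scale tau)
n_draws <- 200000
draws <- rexp(n_draws, rate = tau) - rexp(n_draws, rate = tau)

print(c(tau = tau, simulated_var = var(draws)))
```

The simulated variance should land close to the target `v`, which confirms the direction of the conversion before the value goes into JAGS code.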

Q2. When using an informative variable selection prior like the Laplace, we typically center and scale the data:


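library("car")  # loads the Anscombe data
data("Anscombe")
Xc = scale(Anscombe, center=TRUE, scale=TRUE)  # center and scale every column
data_jags = as.list(data.frame(Xc))  # named list of columns, as expected by JAGS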

Because we subtracted the mean from all (continuous) variables including the response, this is a rare case where we do not need an intercept. Fit the model in JAGS using the Laplace prior with variance 2 for each of the three coefficients, and an inverse gamma prior for the observation variance with effective sample size 1 and prior guess 1.

How do the inferences for the coefficients compare to the original model fit in the quiz for Lesson 7 (besides that their scale has changed due to scaling the data)?

  • The inferences are essentially unchanged. The first two coefficients (for income and percentage youth) are significantly positive and the percent urban coefficient is still negative.
  • The inferences are similar, with one exception. The first two coefficients (for income and percentage youth) are significantly positive and the percent urban coefficient’s posterior looks like the Laplace prior, with a spike near 0. This indicates that the percent urban “effect” is very weak.
  • The inferences are vastly different. The marginal posterior for all three coefficients look like their Laplace priors, with a spike near 0. This indicates that the “effect” associated with each covariate is very weak.
  • Inexplicably, the signs of all coefficients have changed (from positive to negative and from negative to positive).

Q3. Consider an ANOVA model for subjects’ responses to three experimental factor variables related to a proposed health supplement: dose, frequency, and physical activity. Dose has two levels: 100mg and 200mg. Frequency has three levels: “daily,” “twice-weekly,” and “weekly.” Physical activity has two levels: “low” and “high.” If these are the only covariates available and we assume that responses are iid normally distributed, what is the maximum number of parameters we could potentially use to uniquely describe the mean response?

Q4. If we have both categorical and continuous covariates, then it is common to use the linear model parameterization instead of the cell means model. If it is unclear how to set it up, you can use the $\texttt{model.matrix}$ function in R as we have in the lessons.

Suppose that in addition to the experimental factors in the previous question, we have two continuous covariates: weight in kg and resting heart rate in beats per minute. If we use 100mg dose, daily frequency, and low physical activity as the baseline group, which of the following gives the linear model parameterization for an additive model with no interactions?

  • $\text{E}(y_i) = \mu_{g_i} + \beta_1 \text{weight}_i + \beta_2 \text{heart}_i$ for $g_i \in \{ 1, 2, \ldots, 7 \}$
  • $\text{E}(y_i) = \mu_{g_i} + \beta_1 \text{weight}_i + \beta_2 \text{heart}_i$ for $g_i \in \{ 1, 2, \ldots, 12 \}$
  • $\text{E}(y_i) = \beta_0 + \beta_1 I_{(\text{dose}_i = 100)} + \beta_2 I_{(\text{freq}_i = \text{daily})} + \beta_3 I_{(\text{phys}_i = \text{low})} + \beta_4 \text{weight}_i + \beta_5 \text{heart}_i$
  • $\text{E}(y_i) = \beta_0 + \beta_1 I_{(\text{dose}_i = 200)} + \beta_2 I_{(\text{freq}_i = \text{twice weekly})} + \beta_3 I_{(\text{freq}_i = \text{weekly})} + \beta_4 I_{(\text{phys}_i = \text{high})} + \beta_5 \text{weight}_i + \beta_6 \text{heart}_i$

Q5. The reading in this honors section describes an analysis of the warp breaks data. Of the models fit, we concluded that the full cell means model was most appropriate. However, we did not assess whether constant observation variance across all groups was appropriate. Re-fit the model with a separate variance for each group. For each variance, use an Inverse-Gamma(1/2, 1/2) prior, corresponding to prior sample size 1 and prior guess 1 for each variance.

Report the DIC value for this model, rounded to the nearest whole number.

Week 04: Predictive distributions and mixture models

Q1. Consider the Poisson process model we fit in the quiz for Lesson 10 which estimates calling rates of a retailer’s customers. The data are attached below.


CSV File

Re-fit the model and use your posterior samples to simulate predictions of the number of calls by a new 29 year old customer from Group 2 whose account is active for 30 days. What is the probability that this new customer calls at least three times during this period? Round your answer to two decimal places.

Q2. Suppose we fit a single component normal distribution to the data whose histogram is shown below.

If we use a noninformative prior for $\mu$ and $\sigma^2$ and plot the fit distribution evaluated at the posterior means (in blue), what would the fit look like? Is this model appropriate for these data?

  • The single normal fit ignores the smaller component, fitting the cluster of points with most data. Consequently, the model places almost no probability in the region of the smaller component.
  • The single normal fit accommodates the bi-modality in the data, but fails to capture the imbalance in the two components. It is not appropriate.
  • A single normal distribution does not allow bi-modality. Consequently, the fit places a lot of probability in a region with no data. It is not appropriate.
  • The single normal fit nicely captures the features of the data. It is appropriate.

Q3. Which of the following histograms shows data that might require a mixture model to fit?

[The four answer options were histogram images, which are not reproduced in this text.]

Q4. The Dirichlet distribution with parameters $\alpha_1 = \alpha_2 = \ldots = \alpha_K = 1$ is uniform over its support, the values for which the random vector contains a valid set of probabilities. If $\theta$ contains five probabilities corresponding to five categories and has a $\text{Dirichlet}(1,1,1,1,1)$ prior, what is the effective sample size of this prior?

Hint: If $\theta$ has a $\text{Dirichlet}(\alpha_1, \alpha_2, \ldots, \alpha_K)$ prior, and the counts of multinomial data in each category are $x_1, x_2, \ldots, x_K$, then the posterior of $\theta$ is $\text{Dirichlet}(\alpha_1 + x_1, \alpha_2 + x_2, \ldots, \alpha_K + x_K)$. The data sample size is clearly $\sum_{k=1}^K x_k$.
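
The conjugate update in the hint amounts to adding the observed counts to the prior parameters. The sketch below applies it to made-up counts (the vector `x` is hypothetical) so that the roles of the prior parameters and the data counts can be compared side by side.

```r
alpha_prior <- rep(1, 5)          # Dirichlet(1, 1, 1, 1, 1) prior
x <- c(3, 0, 7, 2, 8)             # hypothetical multinomial counts

# Conjugate update from the hint: add the counts to the prior parameters
alpha_post <- alpha_prior + x

# The prior parameters enter the update exactly like counts do,
# so their total can be compared directly with the data sample size
print(c(sum_prior = sum(alpha_prior), n_data = sum(x), sum_post = sum(alpha_post)))

# Posterior mean of each category probability
print(round(alpha_post / sum(alpha_post), 3))
```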

Q5. Recall that in the Bayesian formulation of a mixture model, it is often convenient to introduce latent variables $z_i$ which indicate “population” membership of $y_i$ (the “population” may or may not have meaning in the context of the data). One possible hierarchical formulation is given by:


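where $f_j(y \mid \theta)$ is a probability density for $y$ for mixture component $j$ and $w = (w_1, w_2, \ldots, w_J)$ is a vector of prior probabilities of membership.

What is the full conditional distribution for $z_i$?

  • $\Pr(z_i = j \mid \cdots) = \frac{f_j(y_i \mid \theta)}{\sum_{\ell=1}^J f_\ell(y_i \mid \theta)}, \quad j = 1, \ldots, J$
  • $\Pr(z_i = j \mid \cdots) = \frac{f_j(y_i \mid \theta) w_j}{\sum_{\ell=1}^J f_\ell(y_i \mid \theta) w_\ell}, \quad j = 1, \ldots, J$
  • $\Pr(z_i = j \mid \cdots) = w_j, \quad j = 1, \ldots, J$
  • $\Pr(z_i = j \mid \cdots) = \frac{w_j^{I_{(z_i = j)}} (1 - w_j)^{1 - I_{(z_i = j)}}}{\sum_{\ell=1}^J w_j^{I_{(z_i = j)}} (1 - w_j)^{1 - I_{(z_i = j)}}}, \quad j = 1, \ldots, J$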
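
To make the latent-variable idea concrete, the sketch below applies Bayes' rule to a single observation: each component's prior membership probability is weighted by that component's density at $y_i$, and the results are normalized to sum to 1. It assumes that $z_i$ has prior probabilities $w$ and that $y_i$ given $z_i = j$ has density $f_j$, as described in the question; the two normal components, their parameters, and the observation are hypothetical.

```r
set.seed(5)                       # arbitrary seed
w <- c(0.4, 0.6)                  # hypothetical prior membership probabilities
mu <- c(-1, 2)                    # hypothetical component means
sigma <- c(1, 1)                  # hypothetical component standard deviations
y_i <- 0.5                        # a single hypothetical observation

# Prior weight times component density, for each component j
unnorm <- w * dnorm(y_i, mean = mu, sd = sigma)

# Normalize over the components to get membership probabilities
prob_z <- unnorm / sum(unnorm)
print(round(prob_z, 3))

# One Gibbs-style draw of the membership indicator z_i
z_i <- sample(seq_along(w), size = 1, prob = prob_z)
print(z_i)
```

In a full sampler this step would be repeated for every observation at every iteration, using the current values of the component parameters.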