class: center, middle, inverse, title-slide # Deterministic computational approaches ### Math 315: Bayesian Statistics --- class: middle, clear ## 1. MAP estimation ## 2. Numerical integration ## 3. Bayesian CLT --- # One-parameter example .large[ .content-box-blue[ Does storage in a high-quality freezer improve the taste of hamburgers to justify the extra effort/cost associated with purchasing new equipment? ] ] .large[ - Suppose 16 consumers have been recruited by a fast food chain to compare two types of ground beef patty on the basis of flavor - All patties kept frozen for 8 months + One set (16 patties) in a high-quality freezer, maintains `\(0\pm 1^\circ\)` F + Other set in a freezer where temperature varies between 0 and 15 `\(^\circ\)`F ] --- # One-parameter example .large[ - In test kitchen, patties thawed and prepared by single chef - Each consumer served the patties in a random order - Neither consumers nor servers know which patty is which (double blind) - `\(Y =\)` 13 out of 16 consumers preferred the more-expensive patty ] --- .left-column[ .large[ <br> .bold[Notation] <br> <br> .bold[Assumption] <br> .bold[Likelihood] <br> .bold[Prior] <br> .bold[Posterior] ] ] .right-column[ .large[ `\(Y_i =\)` # prefer the more-expensive patty `\(\theta =\)` prob. selects more-expensive patty <br> Consumers independent; `\(\theta\)` is constant <br> `\(Y|\theta \sim \text{Binomial}(n, \theta)\)` <br> `\(\theta \sim \text{Beta}(a, b) \qquad \qquad \qquad\)` (conjugate prior) <br> `\(\theta | Y \sim \text{Beta}(a + Y, b + N)\)` ] ] --- class: inverse # Your turn .Large[ Let's pretend we didn't know that the posterior followed a beta distribution... Discuss with your neighbor how you might summarize the posterior distribution in this example ] --- # Point estimation .large[ .content-box-blue[ .bold[MAP estimate] the value of `\(\theta\)` that maximizes the posterior distribution ] ] --- # Point estimation .left-column[ .large[ <br> .bolder[When?] <br> <br> .bolder[Benefit?] <br> <br> <br> .bolder[Drawback?] ] ] .large[ .right-column[ Sometimes you don’t need an entire posterior distribution and a single point estimate will do - Example: prediction in machine learning Very fast computation - Closed-form solution via calculus - Fast optimization algorithms<br> (e.g. Newton's method) *No uncertainty expressed!* ] ] --- class: inverse # Your turn .Large[ Write down an integral of `\(p(\theta | Y)\)` that you could solve to find the following posterior summaries: 1. Posterior mean 2. Posterior variance 3. Posterior probability that `\(\theta \in [\ell, u]\)` ] --- # Numeric integration .left-column[ .large[ <br> .bolder[When?] <br> <br> <br> .bolder[Benefit?] <br> .bolder[Drawback?] ] ] .large[ .right-column[ Many posterior summaries of interest are integrals over the posterior These are *p*-dimensional integrals that we usually can’t solve analytically - Example: posterior mean, SD, probability "Easy" computation - `adaptIntegrate()` in R Choice of quadrature methods depends on location and shape of the posterior Only feasible for small number of parameters ] ] --- class: inverse # Your turn .Large[ Discuss with your neighbor: What does the Central Limit Theorem (CLT) tell us? ] --- class: inverse # Your turn .Large[ Discuss with your neighbor: Think back to Math 275, do you remember the asymptotic distribution of the MLE? ] --- # Bayesian CLT <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-1-1.svg" width="80%" style="display: block; margin: auto;" /> --- # Bayesian CLT <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-2-1.svg" width="80%" style="display: block; margin: auto;" /> --- # Bayesian CLT <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-3-1.svg" width="80%" style="display: block; margin: auto;" /> --- # Bayesian CLT <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-4-1.svg" width="80%" style="display: block; margin: auto;" /> --- # Bayesian CLT <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-5-1.svg" width="80%" style="display: block; margin: auto;" /> --- # Bayesian CLT .left-column[ .large[ <br> .bolder[When?] <br> .bolder[Benefit?] <br><br> <br> .bolder[Drawback?] ] ] .large[ .right-column[ For large data sets with a small number of parameters evoking the Bayes CLT is probably the best approach The approximate posterior can be computing using standard software Can handle larger number of parameters Posterior must be unimodal Accuracy issues — only works for large data sets Posterior can have positive density outside of the parameter space Sensitive to starting value ] ] --- # Two-parameter example .large[ How can we model survival times? Data available on the survival time (in weeks) of 20 rats that were exposed to a high level of radiation ] <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-6-1.svg" width="100%" style="display: block; margin: auto;" /> --- # Two-parameter example .large[ Proposed model: $$ `\begin{aligned} Y_i | \alpha, \beta & \overset{\rm iid}{\sim} \text{Gamma}(\alpha, \beta)\\ \alpha &\sim \text{Gamma}(5, 1)\\ \beta &\sim \text{Gamma}(2, 1) \end{aligned}` $$ <br> <br> Recall: If `\(X \sim \text{Gamma}(a, b)\)`, then it has PDF $$ f(x | a, b) = \dfrac{b^a}{\Gamma(a)} x^{a-1} e^{-bx} $$ ] --- class: inverse # Your turn .large[ Since you don't recognize the form of the joint posterior, you decide to go after the marginal posteriors via integration. For example `$$p(\beta | y_1, \ldots, y_n) = \int_0^\infty p(\alpha, \beta | y_1, \ldots, y_n) d\alpha$$` .bolder[Take a minute and try to evaluate this integral.] Compare notes with your neighbor. ] --- # Multivariate normal PDF .large[ Let `\(\boldsymbol{X}\)` denote a `\(p \times 1\)` random vector, `\(\boldsymbol{\mu}\)` denote the mean vector, and `\(\boldsymbol{\Sigma}\)` denote the covariance matrix. <br> `$$f({\bf X}) = (2\pi)^{p/2} |\boldsymbol{\Sigma}|^{1/2} \exp\left[ -\frac{1}{2} ({\bf X} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} ({\bf X} - \boldsymbol{\mu}) \right]$$` <br> - Contours of log PDF are elliptical - All conditional and marginal distributions are also normal ] --- <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-7-1.svg" width="85%" style="display: block; margin: auto;" /> --- <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-8-1.svg" width="85%" style="display: block; margin: auto;" /> --- ```r lgamma_post <- function(theta, data) { # extract parameters alpha <- theta[1] beta <- theta[2] # calculate log likelihood loglike <- 0 for(i in 1:length(data)) { loglike <- loglike + dgamma(data[i], alpha, beta, log = TRUE) } # calculate log posterior post <- loglike + dgamma(alpha, 5, 1, log = TRUE) + dgamma(alpha, 2, 1, log = TRUE) # return log posterior value return(post) } library(LearnBayes) gamma_fit <- laplace( lgamma_post, # log posterior density function mode = c(8, 1), # initial guess for MAP estimate data = times # observed data ) ``` --- ```r gamma_fit ``` ``` ## $mode ## [1] 4.85553691 0.04280149 ## ## $var ## [,1] [,2] ## [1,] 1.49675294 0.0131818444 ## [2,] 0.01318184 0.0001349361 ## ## $int ## [1] -110.4008 ## ## $converge ## [1] TRUE ``` - `\(\widehat{\boldsymbol{\theta}}_{MAP}=\)` `mode` - `\(\widehat{\boldsymbol{\Sigma}}_{MAP}=\)` `var` --- .large[ Posterior variance of each parameter is on the diagonal of `\(\widehat{\boldsymbol{\Sigma}}_{MAP}=\)` `var` Credible intervals can be calculated using the normal distribution: `$$\widehat{\boldsymbol{\theta}}_{i} \pm q \cdot \widehat{\boldsymbol{\Sigma}}_{ii}$$` ```r post_sd <- sqrt(diag(gamma_fit$var)) gamma_fit$mode - qnorm(0.975) * post_sd ``` ``` ## [1] 2.45768062 0.02003415 ``` ```r gamma_fit$mode + qnorm(0.975) * post_sd ``` ``` ## [1] 7.25339320 0.06556882 ``` ] --- # Recap .large[ Just to put everything in one place, here is a table summarizing the posterior distribution for this example: <br> <table> <thead> <tr> <th style="text-align:right;"> MAP </th> <th style="text-align:right;"> SD </th> <th style="text-align:right;"> lower </th> <th style="text-align:right;"> upper </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 4.86 </td> <td style="text-align:right;"> 1.22 </td> <td style="text-align:right;"> 2.46 </td> <td style="text-align:right;"> 7.25 </td> </tr> <tr> <td style="text-align:right;"> 0.04 </td> <td style="text-align:right;"> 0.01 </td> <td style="text-align:right;"> 0.02 </td> <td style="text-align:right;"> 0.07 </td> </tr> </tbody> </table> ] --- <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-13-1.svg" width="85%" style="display: block; margin: auto;" /> --- # Simulation If you wanted to estimate the mean survival time, the MAP estimate is `\(\widehat{\alpha}/\widehat{\beta}\)` If you want more than a point estimate, simulate from the MVN distribution ```r library(MASS) posterior_samples <- mvrnorm(1000, gamma_fit$mode, gamma_fit$var) mean_survival <- posterior_samples[,1] / posterior_samples[,2] ``` <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-15-1.svg" style="display: block; margin: auto;" /> --- # Another two-parameter example .large[ How can we model cancer mortality data? Data available for 20 cities in Missouri on stomach cancer mortality <br> Variable | Description :--------:|------------------ `y` | number of cancer deaths `n` | number at risk between 45 and 64 ] .footnote[ Adapted from *Bayesian Computation with R* and Tsutakawa et al (1985) ] --- # Likelihood .large[ These data display more variation than the binomial model allows (i.e. they are overdispersed) A beta-binomial model is a better idea for the likelihood: $$ f(y_i | \eta, K) = \dbinom{n_i}{y_i} \dfrac{\text{Beta}(K \eta + y_i,\ K(1-\eta) + (n_i - y_i))}{\text{Beta}(K \eta,\ K(1-\eta))} $$ where - `\(\eta \in (0, 1)\)` is the mean - `\(K > 0\)` is the precision - `\(\text{Beta}(a, b) = \dfrac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}\)` ] --- # The posterior .large[ The researchers used a vague prior `$$\pi(\eta, K) \propto \dfrac{1}{\eta (1-\eta)}\cdot \dfrac{1}{(1+K)^2}$$` <br> So the posterior, `\(p(\eta, K | {\bf y})\)`, is proportional to $$ \dfrac{1}{\eta (1-\eta)}\cdot \dfrac{1}{(1+K)^2} \prod_{i=1}^{20} \dfrac{\text{Beta}(K \eta + y_i,\ K(1-\eta) + (n_i - y_i))}{\text{Beta}(K \eta,\ K(1-\eta))} $$ ] --- ### Log posterior density ```r lbetabinom_post0 <- function(theta, data) { eta <- theta[1] K <- theta[2] y <- data[,1] n <- data[,2] N <- length(y) lpost <- 0 for(i in 1:N) { lpost <- lpost + lbeta(K * eta + y[i], K * (1 - eta) + n[i] - y[i]) } lpost <- lpost - N * lbeta(K * eta, K * (1 - eta)) lpost <- lpost - log(eta) - log(1 - eta) - 2 * log(1 + K) return(lpost) } ``` .footnote[ `lbeta()` function provides the natural log of the beta function ] --- <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-18-1.svg" width="85%" style="display: block; margin: auto;" /> --- class: inverse # Your turn .Large[ How can you transform the following parameters to be real valued; that is, what function of the parameter will have range `\((-\infty, \infty)\)`? 1. `\(\eta \in (0, 1)\)` 2. `\(K > 0\)` Discuss these transformations with a neighbor. ] --- ```r lbetabinom_post <- function(theta, data) { * theta1 <- theta[1] * theta2 <- theta[2] * eta <- exp(theta1) / (1 + exp(theta1)) * K <- exp(theta2) y <- data[,1] n <- data[,2] N <- length(y) lpost <- 0 for(i in 1:N) { lpost <- lpost + lbeta(K * eta + y[i], K * (1 - eta) + n[i] - y[i]) } lpost <- lpost - N * lbeta(K * eta, K * (1 - eta)) lpost <- lpost - log(eta) - log(1 - eta) - 2 * log(1 + K) return(lpost) } ``` --- <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-20-1.svg" width="85%" style="display: block; margin: auto;" /> --- <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-22-1.svg" width="85%" style="display: block; margin: auto;" /> --- <img src="12-bayesian-clt_files/figure-html/unnamed-chunk-23-1.svg" width="85%" style="display: block; margin: auto;" />