Specifying noninformative/objective priors

class: center, middle, inverse, title-slide

# Specifying noninformative/objective priors
### Math 315: Bayesian Statistics

---

class: middle, clear

## 1. Jeffreys' prior

## 2. Introduction to multiparameter models

---

# Jeffreys' prior
.large[
- A transformation invariant prior distribution for `$\theta$` is proportional to the square root of the Fisher information

`\begin{align*}
\pi(\theta) &\propto I(\theta)^{1/2}\\
I(\theta) &= -E \left[ \frac{\partial^2 \log(f(y | \theta))}{\partial\theta} \right]
\end{align*}`

<br>

- Puts more weight in regions of parameter space where the
data are expected to be more informative

<br>

- Review the change of variables formula from calculus!
 ]

---

.Large[
Jeffreys' prior for `$Y|\theta \sim \text{Binomial}(n, \theta)$`
]

???

Note that in this case the prior is inversely proportional to the standard deviation. Why does this make sense?

We see that the data has the least effect on the posterior when the true `$\theta = 1/2$` , and has the greatest effect near the extremes, `$\theta = 0 \text{ or } 1$`.

The Jeffreys prior compensates for this by placing more mass near the extremes of the range, where the data has the strongest effect. We could get the same effect by (for example) setting `$p(\theta) \propto 1 / Var(\theta)$` instead of `$p(\theta) \propto 1 / Var(theta)^{1/2}$` . However, the former prior is not invariant under reparameterization, as we would prefer.

---
class: inverse

# Your turn

.Large[
Let `$Y|\mu \sim \mathcal{N}(\mu, \sigma^2)$` where `$\sigma^2$` is known.

Derive the Jeffreys' prior for `$\mu$`.
]

---
class:inverse, center, middle

# Multiparameter models

---

.left-column[
## Example
]

.right-column[
.large[

- Partial census data for the Dobe area !Kung San, a foraging population

- Compiled from Nancy Howell's interviews
]

<img src="figs/howell-cover.jpg" width="50%" style="display: block; margin: auto;" />
]

---

.left-column[
## Example
]

.right-column[

.large[
- Interest is in analyzing the average height of an adult
]

]

---
class: middle, inverse

# Noninformative analysis

---

# Sampling from the joint posterior

.large[
**Target:** `$p(\mu |y_1, \ldots, y_m)$`

**1.** Sample `$\sigma^2$` from `$\sigma^2 | y_1, \ldots, y_m \sim \text{InverseGamma}\left( \frac{n-1}{2},\ \frac{(n-1)s^2}{2} \right)$`

```r
n <- nrow(adults)
s2 <- var(adults$height)
sigma2s <- 1 / rgamma(1e4, (n-1)/2, (n-2) * s2 / 2) 
```

**2.** Sample `$\mu$` from `$\mu | \sigma^2, y_1, \ldots, y_m \sim \mathcal{N}\left( \bar{y},\ \frac{\sigma^2}{n} \right)$`

```r
ybar <- mean(adults$height)
mus <- rnorm(1e4, ybar, sqrt(sigma2s))
```
]

---
class: clear

```r
post_samples <- data.frame(mu = mus, sigma2 = sigma2s)
ggplot(data = post_samples) +
  geom_point(mapping = aes(x = mu, y = sigma2), alpha = 0.1) +
  theme_minimal() +
  labs(x = expression(mu), y = expression(sigma^2))
```

---
class: clear

```r
ggplot(data = post_samples) +
  geom_point(mapping = aes(x = mu, y = sigma2), alpha = 0.1) +
  geom_density_2d(mapping = aes(x = mu, y = sigma2), color = "orange", size = 0.8) +
  theme_minimal() +
  labs(x = expression(mu), y = expression(sigma^2))
```

---
class: clear

```r
ggplot(data = post_samples) +
  geom_density_2d(mapping = aes(x = mu, y = sigma2), color = "orange") +
  theme_minimal() +
  labs(x = expression(mu), y = expression(sigma^2))
```

---
class: clear

---

## Marginal posteriors

.pull-left[

`$\boldsymbol \mu$`

![](09-intro-multiparameter_files/figure-html/unnamed-chunk-12-1.svg)

```r
quantile(mus, probs = c(0.05, 0.9))
```

```
##       5%      90% 
## 142.0017 164.5319
```

]

.pull-right[

`$\boldsymbol \sigma$`

![](09-intro-multiparameter_files/figure-html/unnamed-chunk-14-1.svg)

```r
quantile(sigma2s, probs = c(0.05, 0.9))
```

```
##      5%     90% 
## 53.0038 66.2231
```

```r
quantile(sqrt(sigma2s), probs = c(0.05, 0.9))
```

```
##       5%      90% 
## 7.280371 8.137758
```
]

---
class: middle, inverse

# Informative analysis

---

# McElreath's analysis

.pull-left[

### Model

`$y_i \sim \mathcal{N}(\mu, \sigma)$`

`$\mu \sim \mathcal{N}(178,\ 20)$`

`$\sigma \sim \text{Unif}(0, 50)$`

]

.pull-right[

![](09-intro-multiparameter_files/figure-html/unnamed-chunk-17-1.svg)

.code70[

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     8.2   157.2   178.2   178.5   199.4   358.4
```
]

]

---

# Joint posterior via grid approximation

```r
# Create grid over the coordinate plane
param_grid <- expand.grid(
  mu = seq(from = 118, to = 238, length.out = 1000), 
  sigma = seq(from = 0, to = 50, length.out = 1000)
)

# Calculate joint log prior for each point on grid
logprior <- dnorm(param_grid[i, "mu"], 178, 20, log = TRUE) +
      dunif(param_grid[i, "sigma"], 0, 50, log = TRUE)

# Calculate log likelihood for each point on grid
ll <- numeric(length = nrow(param_grid))
for(i in 1:nrow(param_grid)) {
  ll[i] <- sum(dnorm(Howell1$height, mean = param_grid[i, "mu"], 
                     sd = param_grid[i, "sigma"], log = TRUE))
}

# Calculate log posterior, then exponentiate
logposterior <- logprior + ll
unstd_posterior <- exp(logposterior - max(logposterior)) # numeric stability
posterior <- unstd_posterior / sum(unstd_posterior)
```

---

# Joint posterior via grid approximation

```r
posterior_draws <- dplyr::sample_n(
  param_grid, size = 1e4, replace = TRUE, weight = posterior
)
```

---

# Marginal posteriors via grid approximation

.pull-left[

### `$\mu$`

![](09-intro-multiparameter_files/figure-html/unnamed-chunk-22-1.svg)

```r
quantile(posterior_draws$mu, probs = c(0.05, 0.95))
```

```
##       5%      95% 
## 136.2583 140.2222
```

]

.pull-right[

### `$\sigma$`

![](09-intro-multiparameter_files/figure-html/unnamed-chunk-24-1.svg)

```r
quantile(posterior_draws$sigma, probs = c(0.05, 0.95))
```

```
##       5%      95% 
## 26.32633 29.07908
```

<br>

]

---

# Vectorization in R

The grid approximation on a 1000 `$\times$` 1000 grid took about 33 seconds

Vectorizing the log-likelihood calculation improves speed to 14 seconds

```r
# vectorized log likelihood function
llnorm <- function(x, mu, sigma) { 
  sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# use all of the x vector for each mu-sigma combo
llnorm <- Vectorize(llnorm, vectorize.args = c("mu", "sigma"))

# Calculate log likelihood for each point on grid
ll <- llnorm(x = Howell1$height, mu = param_grid$mu, sigma = param_grid$sigma)

# Everything else is the same...
```