Specifying conjugate priors

class: center, middle, inverse, title-slide

# Specifying conjugate priors
### Math 315: Bayesian Statistics

---

class: middle, clear

## 1. Data augmentation priors

## 2. Using domain expertise

---
class: inverse

# Your turn

.Large[
Suppose that you are asked to fit a regression model to explore what proportion of Carls like to study on the first floor of the Libe.

What (frequentist) model would you choose?

What subjective decisions went into that choice?
]

---

# Conjugate priors

.large[
.content-box-blue[
A prior is **conjugate** if the posterior is a member of the
same parametric family
]

Examples we have seen:

- Binomial likelihood, Beta prior

- Poisson likelihood, Gamma prior

- Exponential likelihood, Gamma prior

Here's a [longer list of conjugate priors](https:
//en.wikipedia.org/wiki/Conjugate_prior)

]

---

# Beta-binomial model for a proportion

.large[
A fundamental task in statistics is to estimate a proportion
using a series of trials:

- What is the success probability of a new cancer treatment?

- What proportion of voters support my candidate?

- What proportion of the population has a rare gene?
]

--
.large[
**Target:** estimate of the true proportion, `$\theta$`

* obtain the posterior

* calculate a 95% interval

* test whether `$\theta$` equals some predetermined value `$\theta_0$`
]

---

# Beta-binomial model for a proportion

.left-column[

## Assumption

## Likelihood

## Prior

## Posterior

]

.right-column[

.large[
we have *n* independent and identically distributed (iid) success/failure trials

`$Y|\theta \sim \text{Binomial}(n, \theta)$`

`$\theta \in [0, 1]$`, so Beta(*a*, *b*) is a good choice

`$\theta |Y \sim \text{Beta}(a + Y, b + n − Y)$`
]

]

---

# How do we set a and b?

.large[The Beta distribution is **very** flexible, so we need to choose wisely.]

---
class: inverse

# Your turn
.large[
When (in terms of *n*, *a*, and *b*) is the `$\widehat{\theta}$` close to the sample mean, *Y*/*n*?

When  is the `$\widehat{\theta}$` shrunk towards the prior mean, `$a/(a+b)$`?
]

---

# Example

.left-column[

## Setting

## Data

## Likelihood

## Prior

## Posterior

]

.right-column[

.large[
Assessing proportion of U.S. transportation industry workers who use drugs on the job.

RS of size `$n=10$` taken; `$Y = 2$` positive tests

`$Y|\theta \sim \text{Binomial}(n=10, \theta)$`

Based on prior testing, `$\text{Beta}(a=3, b=23)$`

`$\theta |Y \sim \text{Beta}(5, 31)$`
]

]

---

.pull-left[

# Example

.large[

- Posterior `$\propto$` Prior `$\times$` Likelihood

- Posterior can't have density anywhere prior density is 0

- If prior "dominates" the posterior, then the data have very little to contribute to the analysis!

]
]

.pull-right[

]

---

# A "weak" prior

.left-column[

## Setting

## Data

## Likelihood

## Prior

## Posterior

]

.right-column[

.large[
Assessing proportion of U.S. transportation industry workers who use drugs on the job.

RS of size `$n=10$` taken; `$Y = 2$` positive tests

`$Y|\theta \sim \text{Binomial}(n=10, \theta)$`

No prior info so analyst sets `$\text{Beta}(a=1/10, b=1/10)$`

`$\theta |Y \sim \text{Beta}(2.1, 10.1)$`
]

]

---

.pull-left[

# A "weak" prior

.large[
- Likelihood dominates the prior

- The data almost entirely determine the posterior
]
]

.pull-right[

]

---

# A "flat" prior

.left-column[

## Setting

## Data

## Likelihood

## Prior

## Posterior

]

.right-column[

.large[
Assessing proportion of U.S. transportation industry workers who use drugs on the job.

RS of size `$n=10$` taken; `$Y = 2$` positive tests

`$Y|\theta \sim \text{Binomial}(n=10, \theta)$`

No prior info so analyst sets `$\text{Beta}(a=1, b=1)$`

`$\theta |Y \sim \text{Beta}(3, 11)$`
]

]

---

.pull-left[

# A "flat" prior

.large[
- Likelihood dominates the prior

- The data almost entirely determine the posterior
]
]

.pull-right[

]

---
class: center, middle

# Using domain expertise

---

# Example

.large[
Through prior elicitation you determine the following information:

- The researchers *expect* the probability of success, `$\theta$`, to be 2/3

- The researchers believe there is only a 5% chance that `$\theta > 0.9$`
]

---

```r
# Load BB so that we get access to BBsolve()
library(BB)

# define function for the mean of Beta(a, b) dsn
beta.mean <- function(x) x[1] / (x[1] + x[2])

# set up a system of equations to solve
# 2/3 and 0.9 are the elicited values
beta.solve <- function(x){
 obj1 <- beta.mean(x) - 2 / 3 # mean - desired value
 obj2 <- qbeta(0.95, x[1], x[2]) - 0.9 # quantile - desired value
 return(c(obj1, obj2))
 }

# BBsolve(<intial guess>, <system of eqns as function>)
beta.params <- BBsolve(c(1,1), beta.solve, quiet = TRUE)
beta.params$par
```

```
## [1] 5.268468 2.634234
```

???

You could check whether you found the right beta

`qbeta(0.95, a, b) = 0.9`

Check a / (a + b)

---

# The prior predictive distribution

.content-box-blue[
Simulate data implied by your prior specification to see if it seems reasonable
]

```r
prior_thetas <- rbeta(1e5, shape1 = 5.268, shape2 = 2.634)
prior_preds <- rbinom(1e5, size = 17, prob = prior_thetas)
```

???

A predictive distribution that integrates over unknown parameters but is not conditional on observed data is called a prior predictive distribution. **Such a distribution can be useful in evaluating if a prior distribution for `$\theta$` actually translates into reasonable prior beliefs for observable data.**

---

# Sensitivity analysis
.large[
.content-box-blue[
Any time informative priors are used you should conduct a
**sensitivity analysis**
]

- Compare the posterior for several priors

- Describe how the posterior changes

- Example: [https:
//www.ncbi.nlm.nih.gov/pubmed/19010642](https:
//www.ncbi.nlm.nih.gov/pubmed/19010642)
]

---

.footnote[
.small[

Source: [https://www.ncbi.nlm.nih.gov/pubmed/19010642](https:
//www.ncbi.nlm.nih.gov/pubmed/19010642)
]
]

---

# Poisson-gamma model for a rate

.large[
A fundamental task in statistics is to estimate a rate of occurrence:

- Number of virus attacks per day on a computer network

- Number of Ebola cases per day

- Number of diseased trees per square mile in a forest
]

--
.large[
**Target:** estimate of the true rate, `$\lambda$`

* Make observations over time period of length *N* OR make observations over region of area *N*

* Observe `$Y = 0, 1, 2, \ldots$` events

* Obtain the posterior, calculate a 95% interval, test a claim about `$\lambda$` 
]

---

# Poisson-gamma model for a rate

.left-column[

## Assumption

## Likelihood

## Prior

## Posterior

]

.right-column[

.large[
Constant rate of occurrence over unit of time/space

`$Y|\lambda \sim \text{Poisson}(N\lambda)$`

`$\lambda > 0$`, so Gamma(*a*, *b*) is a good choice

`$\lambda |Y \sim \text{Gamma}(a + Y, b + N)$`
]

]

---
class: inverse
# Your turn

.large[
`$$\widehat{\lambda} = E(\lambda | Y) = \dfrac{a + Y}{b+N} = (1-w) \dfrac{a}{b} + w \dfrac{Y}{n}$$`

1. Find an expression for `$w$`.

2. Describe `$Y/N$` and `$a/b$` in words.

3. If we view Gamma(*a*, *b*) as a data augmentation prior, what do *a* and *b* represent?

4. When (in terms of *N*, *a*, and *b*) is  `$\widehat{\lambda}$` close to `$Y/N$`? How does this help you set a "weak" prior?

5. When (in terms of *N*, *a*, and *b*) is  `$\widehat{\lambda}$` shrunk towards `$a/b$`?
]