Generalized linear models

class: center, middle, inverse, title-slide

# Generalized linear models
### Math 315: Bayesian Statistics

---

# Arthritis clinical trial

.large[
A double-blind clinical trial investigated a new treatment for rheumatoid arthritis

We'll focus on a subset of the variables:

Variable | Description
----------|---------------------
`Better`    | whether the drug improved symptoms <br> 1 = yes, 0 = no
`Treatment` | Placebo or Treated
`Sex`       | Male or Female
`Age`       | Age in years
]

---
class: inverse

# Your turn

.Large[
1. What distribution should we use for the response variable, `Better`?

2. How can you transform a linear combination of the predictor to take on values between 0 and 1?

`$$\eta_i = \beta_0 + \beta_1 {\tt Treatment} + \beta_2 {\tt Sex} + \beta_3 {\tt Age}$$`
]

---

# Data preparation

.large[Load and manipulate data]

```r
arthritis <- read.csv("http://aloy.rbind.io/data/arthritis.csv")
```

.large[JAGS needs numeric variables, so convert factors to indicators]

```r
X <- cbind(intercept = 1, arthritis[,2:4])
X$Treatment <- as.numeric(X$Treatment == "Treated")
X$Sex <- as.numeric(X$Sex == "Male")

dat_list <- list(Y = arthritis$Better, X = X, 
                 n = nrow(arthritis), p = ncol(X))
```

---

# Model fitting

```r
# Logistic regression specification
model_string <- textConnection("model{
for(i in 1:n){
  Y[i] ~ dbern(q[i])
  logit(q[i]) <- inprod(X[i,],beta[])
}

for(j in 1:p){beta[j] ~ dnorm(0,0.001)}
}")

# Compile model
logistic_model <- jags.model(model_string, data = dat_list, 
                             n.chains = 3, quiet = TRUE)

# Burn in samples
update(logistic_model, 2000, progress.bar="none")

# Posterior sampling
post_samples <- coda.samples(logistic_model, variable.names = "beta", 
                             n.iter = 50000, thin = 30, 
                             progress.bar="none")
```

---

# Results

.large[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> Mean </th>
   <th style="text-align:right;"> SD </th>
   <th style="text-align:right;"> 2.5% </th>
   <th style="text-align:right;"> 97.5% </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> intercept </td>
   <td style="text-align:right;"> -3.23 </td>
   <td style="text-align:right;"> 1.21 </td>
   <td style="text-align:right;"> -5.75 </td>
   <td style="text-align:right;"> -1.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Treatment </td>
   <td style="text-align:right;"> 1.87 </td>
   <td style="text-align:right;"> 0.56 </td>
   <td style="text-align:right;"> 0.82 </td>
   <td style="text-align:right;"> 3.02 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sex </td>
   <td style="text-align:right;"> -1.61 </td>
   <td style="text-align:right;"> 0.62 </td>
   <td style="text-align:right;"> -2.87 </td>
   <td style="text-align:right;"> -0.44 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Age </td>
   <td style="text-align:right;"> 0.05 </td>
   <td style="text-align:right;"> 0.02 </td>
   <td style="text-align:right;"> 0.01 </td>
   <td style="text-align:right;"> 0.10 </td>
  </tr>
</tbody>
</table>

<br>

.content-box-yellow[
**Your turn**

Provide an interpretation of the `Treatment` coefficient in context.
]

]

---

# Puffin nesting sites

.large[
Four variables where considered in trying to describe the nesting frequency of the common puffin in a 3m &#215; 3m grid of plots.

Variable | Description
---------|--------------------
`nests`  | number of nests (burrows) per 9 m<sup>2</sup> quadrat
`grass`    | grass cover percentage
`soil`     | mean soil depth in cm
`angle`    | angle of slope in degrees
`distance` | distance from cliff edge in meters
]

.footnotesize[Source: Nettleship, D. N. (1972). Breeding success of the Common Puffin (Fratercula arctica L.) on different habitats at Great Island, Newfoundland. *Ecological Monographs*, 42(2), 239-268.]

---
class: inverse

# Your turn

.Large[
1. What distribution should we use for the response variable, `nests`?

2. How can you transform a linear combination of the predictor to take on values in `$(0, \infty)$`?

`$$\lambda_i = \beta_0 + \beta_1 {\tt soil}_i + \beta_2 {\tt distance}_i$$`
]

---

# Data preparation

.large[Load and manipulate data]

```r
puffin <- read.table(
  "http://www.markirwin.net/stat149/Data/puffin.txt", header = TRUE
)

colnames(puffin)[1] <- "nests"
puffin <- subset(puffin, nests > 0)
```

.large[Data list for JAGS]

```r
X <- cbind(intercept = 1, puffin[,c("soil", "distance")])

dat_list2 <- list(Y = puffin$nests, X = X, 
                 n = nrow(puffin), p = ncol(X))
```

---

# Model fitting

```r
# Logistic regression specification
puffin_string <- textConnection("model{
for(i in 1:n){
  Y[i] ~ dpois(lambda[i])
  log(lambda[i]) <- inprod(X[i,], beta[])
}

for(j in 1:p){beta[j] ~ dnorm(0,0.001)}
}")

# Compile model
puffin_model <- jags.model(puffin_string, data = dat_list2, 
                           n.chains = 3, quiet = TRUE)

# Burn in samples
update(puffin_model, 2000, progress.bar="none")

# Posterior sampling
puffin_samples <- coda.samples(puffin_model, variable.names = "beta", 
                               n.iter = 100000, thin = 30, 
                               progress.bar="none")
```

---

# Results

.large[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> Mean </th>
   <th style="text-align:right;"> SD </th>
   <th style="text-align:right;"> 2.5% </th>
   <th style="text-align:right;"> 97.5% </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> intercept </td>
   <td style="text-align:right;"> 2.2954 </td>
   <td style="text-align:right;"> 0.3470 </td>
   <td style="text-align:right;"> 1.6096 </td>
   <td style="text-align:right;"> 2.9703 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> soil </td>
   <td style="text-align:right;"> 0.0223 </td>
   <td style="text-align:right;"> 0.0088 </td>
   <td style="text-align:right;"> 0.0052 </td>
   <td style="text-align:right;"> 0.0394 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> distance </td>
   <td style="text-align:right;"> -0.0361 </td>
   <td style="text-align:right;"> 0.0056 </td>
   <td style="text-align:right;"> -0.0470 </td>
   <td style="text-align:right;"> -0.0251 </td>
  </tr>
</tbody>
</table>

<br>

.content-box-yellow[
**Your turn**

Provide an interpretation of the `distance` coefficient in context.
]

]