class: center, middle, inverse, title-slide # Generalized linear models ### Math 315: Bayesian Statistics --- # Arthritis clinical trial .large[ A double-blind clinical trial investigated a new treatment for rheumatoid arthritis We'll focus on a subset of the variables: Variable | Description ----------|--------------------- `Better` | whether the drug improved symptoms <br> 1 = yes, 0 = no `Treatment` | Placebo or Treated `Sex` | Male or Female `Age` | Age in years ] --- class: inverse # Your turn .Large[ 1. What distribution should we use for the response variable, `Better`? 2. How can you transform a linear combination of the predictor to take on values between 0 and 1? `$$\eta_i = \beta_0 + \beta_1 {\tt Treatment} + \beta_2 {\tt Sex} + \beta_3 {\tt Age}$$` ] --- # Data preparation .large[Load and manipulate data] ```r arthritis <- read.csv("http://aloy.rbind.io/data/arthritis.csv") ``` .large[JAGS needs numeric variables, so convert factors to indicators] ```r X <- cbind(intercept = 1, arthritis[,2:4]) X$Treatment <- as.numeric(X$Treatment == "Treated") X$Sex <- as.numeric(X$Sex == "Male") dat_list <- list(Y = arthritis$Better, X = X, n = nrow(arthritis), p = ncol(X)) ``` --- # Model fitting ```r # Logistic regression specification model_string <- textConnection("model{ for(i in 1:n){ Y[i] ~ dbern(q[i]) logit(q[i]) <- inprod(X[i,],beta[]) } for(j in 1:p){beta[j] ~ dnorm(0,0.001)} }") # Compile model logistic_model <- jags.model(model_string, data = dat_list, n.chains = 3, quiet = TRUE) # Burn in samples update(logistic_model, 2000, progress.bar="none") # Posterior sampling post_samples <- coda.samples(logistic_model, variable.names = "beta", n.iter = 50000, thin = 30, progress.bar="none") ``` --- # Results .large[ <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> SD </th> <th style="text-align:right;"> 2.5% </th> <th style="text-align:right;"> 97.5% </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> intercept </td> <td style="text-align:right;"> -3.23 </td> <td style="text-align:right;"> 1.21 </td> <td style="text-align:right;"> -5.75 </td> <td style="text-align:right;"> -1.00 </td> </tr> <tr> <td style="text-align:left;"> Treatment </td> <td style="text-align:right;"> 1.87 </td> <td style="text-align:right;"> 0.56 </td> <td style="text-align:right;"> 0.82 </td> <td style="text-align:right;"> 3.02 </td> </tr> <tr> <td style="text-align:left;"> Sex </td> <td style="text-align:right;"> -1.61 </td> <td style="text-align:right;"> 0.62 </td> <td style="text-align:right;"> -2.87 </td> <td style="text-align:right;"> -0.44 </td> </tr> <tr> <td style="text-align:left;"> Age </td> <td style="text-align:right;"> 0.05 </td> <td style="text-align:right;"> 0.02 </td> <td style="text-align:right;"> 0.01 </td> <td style="text-align:right;"> 0.10 </td> </tr> </tbody> </table> <br> .content-box-yellow[ **Your turn** Provide an interpretation of the `Treatment` coefficient in context. ] ] --- # Puffin nesting sites .large[ Four variables where considered in trying to describe the nesting frequency of the common puffin in a 3m × 3m grid of plots. Variable | Description ---------|-------------------- `nests` | number of nests (burrows) per 9 m<sup>2</sup> quadrat `grass` | grass cover percentage `soil` | mean soil depth in cm `angle` | angle of slope in degrees `distance` | distance from cliff edge in meters ] <br> <br> .footnotesize[Source: Nettleship, D. N. (1972). Breeding success of the Common Puffin (Fratercula arctica L.) on different habitats at Great Island, Newfoundland. *Ecological Monographs*, 42(2), 239-268.] --- class: inverse # Your turn .Large[ 1. What distribution should we use for the response variable, `nests`? 2. How can you transform a linear combination of the predictor to take on values in `\((0, \infty)\)`? `$$\lambda_i = \beta_0 + \beta_1 {\tt soil}_i + \beta_2 {\tt distance}_i$$` ] --- # Data preparation .large[Load and manipulate data] ```r puffin <- read.table( "http://www.markirwin.net/stat149/Data/puffin.txt", header = TRUE ) colnames(puffin)[1] <- "nests" puffin <- subset(puffin, nests > 0) ``` .large[Data list for JAGS] ```r X <- cbind(intercept = 1, puffin[,c("soil", "distance")]) dat_list2 <- list(Y = puffin$nests, X = X, n = nrow(puffin), p = ncol(X)) ``` --- # Model fitting ```r # Logistic regression specification puffin_string <- textConnection("model{ for(i in 1:n){ Y[i] ~ dpois(lambda[i]) log(lambda[i]) <- inprod(X[i,], beta[]) } for(j in 1:p){beta[j] ~ dnorm(0,0.001)} }") # Compile model puffin_model <- jags.model(puffin_string, data = dat_list2, n.chains = 3, quiet = TRUE) # Burn in samples update(puffin_model, 2000, progress.bar="none") # Posterior sampling puffin_samples <- coda.samples(puffin_model, variable.names = "beta", n.iter = 100000, thin = 30, progress.bar="none") ``` --- # Results .large[ <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> SD </th> <th style="text-align:right;"> 2.5% </th> <th style="text-align:right;"> 97.5% </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> intercept </td> <td style="text-align:right;"> 2.2954 </td> <td style="text-align:right;"> 0.3470 </td> <td style="text-align:right;"> 1.6096 </td> <td style="text-align:right;"> 2.9703 </td> </tr> <tr> <td style="text-align:left;"> soil </td> <td style="text-align:right;"> 0.0223 </td> <td style="text-align:right;"> 0.0088 </td> <td style="text-align:right;"> 0.0052 </td> <td style="text-align:right;"> 0.0394 </td> </tr> <tr> <td style="text-align:left;"> distance </td> <td style="text-align:right;"> -0.0361 </td> <td style="text-align:right;"> 0.0056 </td> <td style="text-align:right;"> -0.0470 </td> <td style="text-align:right;"> -0.0251 </td> </tr> </tbody> </table> <br> .content-box-yellow[ **Your turn** Provide an interpretation of the `distance` coefficient in context. ] ]