Projects

My recent projects fall mainly into three categories: statistical graphics and visualization, statistics education, and R package development.

Statistical Graphics and Visualization

Graphical Model Diagnostics

As humans, we can easily find trends and patterns in visualizations; in fact, we can find patterns where automation fails! Visual inference lets us harness the power of our eyes for inferential procedures, which I have found very useful in model checking.

  • My motivation to use visual inference for model checking came from seemingly suspicious residual plots for linear mixed-effects models. After further investigation, I realized that the patterns in these residual plots were artifacts of the model-fitting procedure, not indications of a model deficiency. This led to my working paper outlining how visual inference can protect us from detecting “insignificant” patterns and help us overcome issues introduced by the breakdown of asymptotic results and by boundary issues.

  • We often use quantile-quantile (Q-Q) plots to graphically assess distributional assumptions during the modeling process, but the “standard” Q-Q plot is flawed: we are asked to judge vertical distances, but we are inclined to judge orthogonal distances. One of my projects investigated different designs for Q-Q plots and found that detrending Q-Q plots while maintaining the aspect ratio yields a more powerful visualization. Further, by using visual inference we can create lineup tests of Q-Q plots that are more powerful than classical normality tests. This project led to a paper in The American Statistician, though there is still more to explore in the realm of graphical distributional assessment.
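
To make the detrending idea concrete, here is a minimal sketch in Python, used purely for illustration. It assumes a standard normal reference distribution, standardizes the sample by its own mean and standard deviation, and uses the plotting positions (i - 0.5)/n. The function name detrended_qq and these choices are assumptions made for the example, not necessarily the design studied in the paper.

```python
from statistics import NormalDist, mean, stdev


def detrended_qq(sample):
    """Return (theoretical quantile, deviation) pairs for a detrended normal Q-Q plot.

    The deviation is the standardized sample quantile minus the theoretical
    quantile, so points that would fall on the usual reference line y = x
    fall on the horizontal line y = 0 instead.
    """
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    # Standardize and sort to get the empirical quantiles.
    z = sorted((x - m) / s for x in sample)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n for i = 1..n, written with a 0-based index.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return [(t, zi - t) for t, zi in zip(theoretical, z)]


points = detrended_qq([4.1, 5.3, 2.2, 6.8, 5.0, 3.9, 4.4, 7.5])
for t, d in points:
    print(f"{t:7.3f} {d:7.3f}")
```

When plotting these pairs, "maintaining the aspect ratio" would mean drawing both axes on the same unit scale, so that one unit of deviation spans the same distance on screen as one unit of theoretical quantile.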

Election Monitoring

During Fall 2015 and Winter 2016 I worked with Henry Ward on an Independent Study using R to track the polls for the 2016 presidential primary season. Henry created a Shiny app to display aggregated poll results. Check out the app; all of the code is available on GitHub!

Exploring the Soul of Communities

Can data help us explore and expose the soul of the community? This was the challenge posed by the 2013 Data Exposition. The Knight Foundation, in cooperation with Gallup, furnished data from 43,000 people over three years (2008–2010) in 26 communities, which I explored with my colleagues Karsten Maurer and Dave Osthus in an effort to discover variables associated with community attachment. Our analysis focused on four cities that stood out after our initial exploration of the data set: State College, PA; Detroit, MI; Milledgeville, GA; and Biloxi, MS. We used survey-weighted binned scatterplots to graphically explore the association between an individual’s community attachment and perceived economic outlook. Our Data Expo poster presents our findings as a collection of “short stories”.

Exploring What Corn Belt Farmers are Saying

During my first few months at Lawrence I worked with a team of social scientists at Iowa State University to explore farmer perspectives on agriculture and weather variability across the Corn Belt. This resulted in a statistical atlas presenting the results of a survey of 5,000 farmers from 22 watersheds in 11 Corn Belt states. Specifically, we investigated

  • beliefs about climate change,
  • attitudes toward potential climate change adaptation and mitigation actions,
  • concerns about climate-related threats to farm operations,
  • perceived capacity to deal with the predicted impacts of climate change,
  • and recent experience with extreme weather events.

All of the graphics found in the atlas were rendered in R.

Statistics Education

Shiny Apps

During the summer of 2015 I worked with Alex Damisch, a math major here at Lawrence and now a graduate student at the University of Minnesota, to develop a number of Shiny apps that could be used in introductory statistics courses. Some of the apps we developed are similar to those in StatKey, though they do not animate the construction of the bootstrap distribution, so they are not a replacement for StatKey. The rationale behind the StatKey-like apps is to help motivate students to learn R while improving the graphics for reporting.

Procedure                    Parameter(s)
One-sample bootstrap         mean; median; standard deviation
Two-sample bootstrap         difference in mean/median; ratio of mean/median/standard deviation
Two-sample permutation test  difference in means
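
For readers who want to see the resampling behind two of the procedures in the table, here is a minimal sketch in Python, used for illustration only; it is not the code behind the Shiny apps. The function names, the default of 2000 replicates, the fixed seeds, and the add-one p-value adjustment are all assumptions made for the example.

```python
import random
from statistics import mean


def bootstrap_distribution(sample, stat=mean, reps=2000, seed=1):
    """One-sample bootstrap: resample with replacement and recompute the statistic."""
    rng = random.Random(seed)
    n = len(sample)
    return [stat(rng.choices(sample, k=n)) for _ in range(reps)]


def permutation_test(group_a, group_b, reps=2000, seed=1):
    """Two-sample permutation test for a difference in means (two-sided).

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        # Shuffling the pooled data reassigns group labels at random.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return observed, (extreme + 1) / (reps + 1)


boot_means = bootstrap_distribution([12, 15, 9, 20, 14, 11, 17])
obs, p = permutation_test([12, 15, 9, 20], [14, 11, 17, 8])
print(len(boot_means), obs, p)
```

Passing statistics.median or statistics.stdev as stat gives the other one-sample rows of the table.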

The code for all of the apps can be found on GitHub.

R Packages

lmeresampler

The lme4 and nlme packages have made fitting nested linear mixed-effects models in R quite easy. Using the functionality of these packages, we can easily fit our model by maximum likelihood or restricted maximum likelihood and conduct inference using our parametric toolkit. In practice, the assumptions of our model are often violated to a degree that leads to biased estimators and incorrect standard errors. In these situations, resampling methods such as the bootstrap can be used to obtain consistent estimators of the bias and standard errors for inference. lmeresampler provides an easy way to bootstrap nested linear mixed-effects models using the parametric, residual, cases, CGR (semi-parametric), or random effects block (REB) bootstrap, for models fit using either lme4 or nlme. The output from lmeresampler is compatible with the boot package.
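
As a rough illustration of one of these schemes, here is a sketch of a cases bootstrap for two-level data, written in Python with no model fitting involved. It is not lmeresampler's implementation or API: the function names, the dictionary layout of the data, the resample_within flag, and the use of a grand mean in place of a refitted model are all assumptions made to keep the example short.

```python
import random
from statistics import mean, stdev


def cases_bootstrap(groups, stat, reps=1000, resample_within=False, seed=1):
    """Cases bootstrap for two-level data stored as {group id: [observations]}.

    Groups are resampled with replacement; observations inside each drawn
    group are optionally resampled with replacement as well.
    """
    rng = random.Random(seed)
    ids = list(groups)
    replicates = []
    for _ in range(reps):
        drawn = rng.choices(ids, k=len(ids))
        resampled = []
        for g in drawn:
            obs = groups[g]
            if resample_within:
                obs = rng.choices(obs, k=len(obs))
            resampled.append(list(obs))
        # In a real analysis the model would be refit to the resampled data here.
        replicates.append(stat(resampled))
    return replicates


def grand_mean(clusters):
    """Stand-in statistic: the mean of all observations across clusters."""
    return mean(y for cluster in clusters for y in cluster)


data = {"school_a": [61, 64, 70], "school_b": [55, 58, 52], "school_c": [77, 73, 80]}
reps = cases_bootstrap(data, grand_mean)
print("bootstrap standard error:", stdev(reps))
```

The standard deviation of the replicates serves as the bootstrap estimate of the standard error for the chosen statistic.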

The stable release of lmeresampler is available on CRAN.

lmeresampler is still under development. You can follow its development on GitHub.

HLMdiag

Up to now, diagnostics for mixed and hierarchical models have required much programming by the analyst, especially if one desires influence diagnostics. To help fill this gap, the R package HLMdiag:

  • Provides convenience functions for residual analysis.
    • Allows the analyst to obtain residuals estimated by least squares (LS) or empirical Bayes (EB).
    • Allows the analyst to obtain different residual quantities (e.g. marginal, conditional, BLUPs for the two-level model).
  • Implements influence analysis.
    • Leverage
    • Deletion diagnostics — Cook’s distance, MDFFITS, covariance ratio & trace

The stable release of HLMdiag is available on CRAN.

HLMdiag is still under development, and there are still improvements to be made! You can follow the development of the HLMdiag package on GitHub.