Recent Posts

Goin' to Carolina in my mind (or on my hard drive)

Out-of-memory processing of North Carolina’s voter file with DuckDB and Apache Arrow

Oh, I'm sure it's probably nothing

How we do (or don’t) think about null values and why the polyglot push makes it all the more important

Update: grouped data quality check PR merged to dbt-utils

After a prior post on the merits of grouped data quality checks, I demo my newly merged implementation for dbt

Using databases with Shiny

Key issues when adding persistent storage to a Shiny application, featuring {golem} app development and Digital Ocean serving

How to Make R Markdown Snow

Much like ice sculpting, applying power tools to absolutely frivolous pursuits

Talks

Data (error) Generating Process

Interrogating the data generating process to devise better data quality tests.

Operationalizing Column-Name Contracts with dbtplyr

An exploration of how data producers and consumers can use column names as interfaces, configurations, and code to improve data quality and discoverability. The second half of the talk demonstrates how to implement these ideas with my dbtplyr dbt package.

Column Names as Contracts

Exploring the benefits of using controlled vocabularies to encode metadata in column names, and demonstrations of implementing this approach with the convo R package or dbt extensions of SQL.

oRganization: Design patterns for internal packages

An overview of the unique design challenges and opportunities when building R packages for use inside of a single organization versus open-source. By using the jobs-to-be-done framework, this talk explores how internal packages can be better teammates by following specific design patterns for API design, testing, documentation, and more.

projmgr: Managing the human dependencies of your project

A lightning talk on key features of the projmgr package

Projects

dbtplyr

dbt package bringing dplyr semantics to SQL

convo

R package for managing controlled vocabularies

satRday Chicago Conference Organizer

Speaker & Sponsor lead for 2019 and 2020

Rtistic

Hackathon-in-a-box templates for custom Rmd and ggplot2 themes

projmgr

R package providing project management interface to GitHub

Publications

97 Things Every Data Engineer Should Know: Collective Wisdom from the Experts

Contributed six chapters on topics spanning data design, development, validation, and democratization

R Markdown Cookbook

This cookbook contains tips and tricks to help you get the most out of R Markdown. Topics include the automated generation of content (diagrams, text), customizing format (Pandoc, HTML, and LaTeX templates), workflow improvements (modularizing child documents, cross-referencing code chunks, chunk caching), modifying rendering behavior with hooks, and using alternative language engines.