Programming with dplyr: Part 01, introduction


Like for [others], Hadley Wickham’s dplyr, and more generally the tidyverse approach, has considerably changed the way I do data analyses. Most notably, the pipe (coming from magrittr by Stefan Milton Bache, see here) has crept into nearly every analysis I do.

That is, every analysis except for functions and other non-interactive stuff. In those programming contexts, the dplyr way does not work out of the box, due to its non-standard evaluation, or NSE for short.

So, this post is about programming with dplyr, or more precisely, using the recently introduced tidyeval approach (recently adopted by widely used R libraries, that is).

To understand the use case, consider the following example. Say we count the frequencies of some groups and want to add the proportions of these counts.

library(tidyverse)
mtcars %>% 
  group_by(cyl) %>% 
  summarise(n = n())
## # A tibble: 3 x 2
##     cyl     n
##   <dbl> <int>
## 1     4    11
## 2     6     7
## 3     8    14

Or, shorter: mtcars %>% count(cyl).

Now, let’s add a column with the proportion of the count column (n).

mtcars %>% 
  count(cyl) %>% 
  mutate(prop = n / sum(n))
## # A tibble: 3 x 3
##     cyl     n    prop
##   <dbl> <int>   <dbl>
## 1     4    11 0.34375
## 2     6     7 0.21875
## 3     8    14 0.43750

For extending this game to more than one variable, see this post.

Now, let’s assume we would like to put the ‘add a proportion column to my dataframe’ step into a function. How to proceed?

We might think this approach should work:

add_prop <- function(df, count_var, group_var){
  df %>% 
    ungroup() %>% 
    group_by(group_var) %>% 
    mutate(prop =  count_var / sum(n()))
}

mtcars %>% 
  count(cyl) %>% 
  add_prop(count_var = n, group_var = cyl)

However, this does not work.
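To see the failure without stopping the session, here is a minimal sketch that wraps the call in tryCatch(). The name add_prop_naive is my own, hypothetical label for the function above; everything else is unchanged. I assume a fresh session in which no object called cyl exists outside the data frame. The exact wording of the error message depends on the dplyr version, so I do not quote it here.

```r
library(dplyr)

# Hypothetical name: the naive function from above, unchanged apart from the name
add_prop_naive <- function(df, count_var, group_var){
  df %>% 
    ungroup() %>% 
    group_by(group_var) %>% 
    mutate(prop = count_var / sum(n()))
}

# tryCatch() turns the error into a value we can inspect
result <- tryCatch(
  mtcars %>% 
    count(cyl) %>% 
    add_prop_naive(count_var = n, group_var = cyl),
  error = function(e) e
)

# Did the call fail?
inherits(result, "error")

# What did R complain about?
conditionMessage(result)
```

The call already fails at group_by(): there is no column named group_var, so R falls back on the function argument and tries to evaluate cyl outside of the data frame, where no such object exists.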

The dplyr ‘verbs’ quote their input: they capture the expression we type instead of evaluating it right away. Inside our function, group_by() therefore looks for a column that is literally called group_var, not for cyl. That means we must quote the parameters ourselves first, before we can hand them over to dplyr. quo basically means “hey R, don’t evaluate this expression yet. Just read it, understand the name of the expression, and wait until I tell you. And R, understand that you got an expression, not a simple text string”. The last sentence tells us (or R) that we do not want a text string, but the quotation (or ‘citation’, if you like) of an expression to be evaluated in the future.

mtcars %>% 
  count(cyl) %>% 
  add_prop(count_var = quo(n), group_var = quo(cyl))

The important point is quo.

quo(mtcars$cyl)
## <quosure: global>
## ~mtcars$cyl
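To make the ‘wait until I tell you’ part concrete, here is a small sketch. It assumes the rlang package is installed (dplyr depends on it) and uses its helpers is_quosure(), quo_get_expr() and eval_tidy(). The names q and cyl_values are only for illustration.

```r
library(rlang)

# quo() captures the expression together with its environment;
# nothing is evaluated at this point
q <- quo(mtcars$cyl)

# A quosure is a quoted expression, not a text string
is_quosure(q)
is.character(q)

# Look at the captured expression without evaluating it
quo_get_expr(q)

# Now we tell R to evaluate: eval_tidy() returns the actual values
cyl_values <- eval_tidy(q)
head(cyl_values)
```

Until eval_tidy() is called, R only stores the expression; the column values are looked up at that moment, not when quo() is called.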

But there’s a second step we need to take. Now that quo has defined a quoted expression, the dplyr verbs must not quote their input variables again, because they are already quoted. That is, we need to tell R now: “Hey, do not quote, evaluate! We have taken care of the quoting before”. This is what the !! operator (‘bang-bang’) does: it unquotes its argument.

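add_prop <- function(df, count_var, group_var){
  # count_var arrives as a quosure; !! unquotes it inside mutate()
  # group_var is accepted but not used here: the proportion is taken over all rows
  df %>% 
    mutate(prop =  (!!count_var) / sum(!!count_var))
}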


mtcars %>% 
  count(cyl) %>% 
  add_prop(count_var = quo(n), group_var = quo(cyl)) 
## # A tibble: 3 x 3
##     cyl     n    prop
##   <dbl> <int>   <dbl>
## 1     4    11 0.34375
## 2     6     7 0.21875
## 3     8    14 0.43750

Huzzah!
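The same quo and !! pattern also works for the grouping variable. Here is a minimal sketch, assuming we want counting and the proportion in one step. The function name count_prop is hypothetical and not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: counts the rows per group
# and adds the proportion of each count
count_prop <- function(df, group_var){
  df %>% 
    count(!!group_var) %>%       # unquote the quoted grouping variable
    mutate(prop = n / sum(n))    # n is the column created by count()
}

res <- mtcars %>% 
  count_prop(group_var = quo(cyl))

res
```

The proportions should match the ones from the interactive version above, because count() sees cyl once the quosure is unquoted.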
