Convert list to dataframe


A handy function for iterating is purrr::map. It takes a function and applies it to each element of a given vector. This vector can be a data frame - which is a list, technically - or some other sort of list (normal atomic vectors are fine, too).
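As a quick illustration of the paragraph above, here is a minimal sketch of map applied to an atomic vector and to a dataframe. It uses the built-in mtcars data rather than the data from this post; the object names squares and col_means are made up for the example.

```r
library(purrr)

# map() over an atomic vector: one list element per input element
squares <- map(1:3, function(x) x^2)
str(squares)

# map() over a dataframe: the function is applied to each column,
# and the result is a list named after the columns
col_means <- map(mtcars[c("mpg", "hp")], mean)
str(col_means)
```

In both cases the result is a list, never a vector or a dataframe.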

However, purrr::map is designed to return lists (not dataframes). As an example, take mosaic::favstats, which gives you some "favorite" summary statistics for a single variable:

library(mosaic)
library(tidyverse)

data(tips, package = "reshape2")
favstats(tips$tip)
##  min Q1 median     Q3 max     mean       sd   n missing
##    1  2    2.9 3.5625  10 2.998279 1.383638 244       0

Note that favstats does not accept several columns/variables at once; only one variable at a time is permitted.

Ok, let’s apply favstats to each numeric column of our dataframe:

tips %>% 
  select_if(is.numeric) %>% 
  map(mosaic::favstats)
## $total_bill
##   min      Q1 median      Q3   max     mean       sd   n missing
##  3.07 13.3475 17.795 24.1275 50.81 19.78594 8.902412 244       0
## 
## $tip
##  min Q1 median     Q3 max     mean       sd   n missing
##    1  2    2.9 3.5625  10 2.998279 1.383638 244       0
## 
## $size
##  min Q1 median Q3 max     mean        sd   n missing
##    1  2      2  3   6 2.569672 0.9510998 244       0

Quite nice, but we are given back a list with several elements; each element is a one-row dataframe that should become a "row" of our to-be dataframe. How can we change this list into a regular dataframe (tibble)?

This problem can be solved with some sort of repeated rbind. rbind binds rows together, hence the name. But rbind expects the objects to bind as separate arguments, not wrapped up in a single list. Now do.call comes to the rescue: it calls a function and hands over each element of a list as a separate argument. Effectively, do.call does something like: rbind(my_list[[1]], my_list[[2]], my_list[[3]], ...) (this is pseudo code).
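To make the do.call idea concrete, here is a small sketch in base R. The list my_list and its values are toy data invented for this illustration; they are not taken from the tips data.

```r
# Three one-row dataframes in a named list (toy data, made up for illustration)
my_list <- list(
  a = data.frame(min = 1, max = 5),
  b = data.frame(min = 2, max = 8),
  c = data.frame(min = 0, max = 3)
)

# do.call() passes each list element as a separate (named) argument to rbind()
by_do_call <- do.call(rbind, my_list)

# the same call written out by hand
by_hand <- rbind(a = my_list[[1]], b = my_list[[2]], c = my_list[[3]])

by_do_call
all.equal(by_do_call, by_hand)
```

Writing the call out by hand only works when you know the number of elements in advance; do.call works for a list of any length.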

tips %>% 
  select_if(is.numeric) %>% 
  map(mosaic::favstats) %>% 
  do.call(rbind, .)
##             min      Q1 median      Q3   max      mean        sd   n
## total_bill 3.07 13.3475 17.795 24.1275 50.81 19.785943 8.9024120 244
## tip        1.00  2.0000  2.900  3.5625 10.00  2.998279 1.3836382 244
## size       1.00  2.0000  2.000  3.0000  6.00  2.569672 0.9510998 244
##            missing
## total_bill       0
## tip              0
## size             0

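Note the little dot in do.call(rbind, .): it is the placeholder for the list coming through the pipe, so the list ends up as the second argument of do.call.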
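If you would rather have the variable names in a regular column than in the row names, one possible alternative is dplyr::bind_rows with its .id argument. The following is only a sketch under some assumptions: my_stats is a hypothetical stand-in for mosaic::favstats so that the example runs without mosaic, and the built-in mtcars data replaces tips.

```r
library(purrr)
library(dplyr)

# Hypothetical helper standing in for mosaic::favstats:
# returns a one-row dataframe of summary statistics for one variable
my_stats <- function(x) {
  data.frame(min = min(x), median = median(x), max = max(x))
}

stats_tbl <- mtcars[c("mpg", "hp", "wt")] %>%
  map(my_stats) %>%                 # named list of one-row dataframes
  bind_rows(.id = "variable")       # list names go into column "variable"

stats_tbl
```

The do.call(rbind, .) approach needs nothing beyond base R, whereas bind_rows keeps the variable names as ordinary data, which tends to be more convenient for further work in the tidyverse.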

Now be happy with the dataframe :-)
