# Checking for NA with dplyr

## October 16, 2016

Often we want to check a data frame for missing values (NAs). There are of course many ways to do so; dplyr provides a rather nice one.

library(readr)
extra_file <- "https://raw.github.com/sebastiansauer/Daten_Unterricht/master/extra.csv"
extra_df <- read_csv(extra_file)  # read the survey data into a data frame

Note that extra_df is a data frame consisting of survey items regarding extraversion and related behavior.

When a data frame has many columns, it helps to have a quick way to check them all at once. Here, we have 25 columns. That is not enormous, but it is enough to make the point, so let’s stick with it for now.

library(dplyr)

extra_df %>%
  select_if(function(x) any(is.na(x))) %>%
  summarise_each(funs(sum(is.na(.)))) -> extra_NA


So, what have we done? The select_if part chooses every column that contains at least one NA, i.e. every column for which any(is.na(x)) is TRUE. Then, for each of those columns, summarise_each sums up the number of NAs. Note that each column is summarized to a single value, which is why we use a summarise function. Finally, the resulting data frame (dplyr always aims at giving back a data frame) is stored in a new variable, extra_NA, for further processing.
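In newer dplyr versions (1.0.0 and later), summarise_each and funs are deprecated, and the same idea is usually written with across() and where(). Here is a minimal sketch of that style. It assumes dplyr 1.0.0 or later is installed, and toy_df is a made-up stand-in for extra_df so the snippet runs without downloading anything.

```r
library(dplyr)

# Hypothetical toy data standing in for extra_df (not the survey data)
toy_df <- data.frame(
  a = c(1, NA, 3),
  b = c("x", "y", "z"),
  c = c(NA, NA, 2)
)

toy_NA <- toy_df %>%
  # keep only the columns that contain at least one NA
  select(where(~ any(is.na(.x)))) %>%
  # count the NAs in each remaining column
  summarise(across(everything(), ~ sum(is.na(.x))))

toy_NA
```

Column b has no missing values, so it is dropped by the select step; the result is a one-row data frame with one NA count per remaining column.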

Now, let’s see:

# library(pander)  # for printing tables in markdown
library(knitr)

kable(extra_NA)

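| code | i6 | i9 | i12 | Facebook | Kater | Alter | Geschlecht | extro_one_item | Minuten | Messe | Party | Kunden | Beschreibung | Aussagen | i26 | extra_mw |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 82 | 1 | 1 | 1 | 73 | 12 | 3 | 3 | 4 | 37 | 4 | 16 | 49 | 117 | 121 | 3 | 3 |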
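For comparison, one of the many other ways is plain base R: is.na() applied to a data frame gives a logical matrix, and colSums() counts the TRUE values per column. The sketch below again uses a made-up toy_df rather than the survey data, and returns a named vector instead of a data frame.

```r
# Hypothetical toy data standing in for extra_df (not the survey data)
toy_df <- data.frame(
  a = c(1, NA, 3),
  b = c("x", "y", "z"),
  c = c(NA, NA, 2)
)

# number of NAs in every column
na_counts <- colSums(is.na(toy_df))

# keep only the columns that actually have missing values
na_counts_nonzero <- na_counts[na_counts > 0]
na_counts_nonzero
```

This is shorter, but the dplyr version keeps everything in a data frame, which is handy if further dplyr steps follow.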
