Often, we want to check for missing values (NAs). There are of course many ways to do so. dplyr provides a quite nice one.
First, let’s load some data:
library(readr)
extra_file <- "https://raw.github.com/sebastiansauer/Daten_Unterricht/master/extra.csv"
extra_df <- read_csv(extra_file)
extra is a data frame consisting of survey items regarding extraversion and related behavior.
If the data frame is large (many columns), it is helpful to have a quick way to check every column at once. Here, we have 25 columns. That is not enormous, but OK, let’s stick with that for now.
library(dplyr)

extra_df %>%
  select_if(function(x) any(is.na(x))) %>%
  summarise_each(funs(sum(is.na(.)))) -> extra_NA
So, what have we done?
The select_if part chooses every column for which any(is.na(x)) is true (TRUE), that is, every column containing at least one NA. Then we take those columns and, for each of them, we sum up (summarise_each) the number of NAs. Note that each column is summarized to a single value; that’s why we use summarise. And finally, the resulting data frame (dplyr always aims at giving back a data frame) is stored in a new variable for further processing.
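To make the two steps concrete, here is a minimal sketch on a small made-up data frame. It assumes a recent dplyr version (1.0.0 or later), where across() and where() are the suggested replacements for the older select_if(), summarise_each() and funs() helpers. The names toy_df and toy_NA and the values in them are invented for illustration and are not part of the extra data.

```r
library(dplyr)

# Hypothetical toy data (not the survey data): columns a and c contain NAs
toy_df <- tibble(
  a = c(1, NA, 3),
  b = c("x", "y", "z"),
  c = c(NA, NA, 1)
)

toy_NA <- toy_df %>%
  # step 1: keep only the columns that contain at least one NA
  select(where(~ any(is.na(.x)))) %>%
  # step 2: collapse each remaining column to its number of NAs
  summarise(across(everything(), ~ sum(is.na(.x))))

toy_NA

# Base R cross-check: NA count for every column, including those without NAs
colSums(is.na(toy_df))
```

The result is again a one-row data frame, holding only the columns a and c, since b has no missing values.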
Now, let’s see:
# library(pander)  # for printing tables in markdown
library(knitr)
kable(extra_NA)