Three ways to dichotomize a variable


Dichotomizing means taking a variable with more than two distinct values and transforming it so that the output variable has exactly two. It is sometimes loosely called dummy coding, although dummy coding more generally turns a categorical variable into one or more 0/1 indicator variables.

Note that dichotomizing can be understood as consisting of two different aspects: recoding and cutting. Recoding means that value “a” becomes value “b”, and so on. Cutting means that a “rope” of numbers is cut into several shorter “ropes” (that’s why it is called cutting).
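As a minimal base R sketch of the two ideas (the vectors and the `lookup` mapping are made up for illustration and are not part of the data used in this post):

```r
# Recoding: each individual value is mapped to a new value.
x <- c("a", "c", "a")
lookup <- c(a = "b", c = "d")   # hypothetical mapping: "a" -> "b", "c" -> "d"
recoded <- unname(lookup[x])

# Cutting: a numeric range is sliced into intervals at the break points.
y <- c(0, 1, 3, 7, 12)
sliced <- cut(y, breaks = c(-Inf, 0, Inf), labels = c("no", "yes"))

print(recoded)
print(sliced)
```

Recoding works value by value, so it suits variables with few distinct values; cutting works on ranges, so it suits variables with many.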

Several ways of achieving this exist in R. Here we discuss three.

First, let’s load some data.

library(AER)
data(Affairs)

We will define a new variable called “halodrie”. A “halodrie” is someone who likes having affairs (German parlance). The variable should have two values, i.e., “yes” and “no”.

Using {car}

library(car)

Affairs$halodrie <- car::recode(Affairs$affairs, "0 = 'no'; 1:12 = 'yes'")

head(Affairs$halodrie)
## [1] "no" "no" "no" "no" "no" "no"
table(Affairs$halodrie)
## 
##  no yes 
## 451 150

A convenient feature of this function is that it accepts ranges written with a colon (e.g., 1:12). Note that the whole recode specification (all values to be recoded) must be passed as a single string in quotation marks, with the individual rules separated by semicolons.
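The range syntax of car::recode also accepts the special keywords lo and hi for the smallest and largest value, which saves looking up the maximum first. A small sketch on a made-up vector (x is hypothetical, not the Affairs data):

```r
library(car)

x <- c(0, 1, 3, 12)   # hypothetical counts

# "1:hi" covers everything from 1 up to the largest value in x
res <- car::recode(x, "0 = 'no'; 1:hi = 'yes'")
print(res)
```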

Using {dplyr}

library(dplyr)

Affairs$halodrie <- 
  dplyr::recode(Affairs$affairs, `0` = "no", .default = "yes")

Affairs$halodrie %>% head
## [1] "no" "no" "no" "no" "no" "no"
table(Affairs$halodrie)
## 
##  no yes 
## 451 150

This function does not accept ranges written with a colon; each value to be recoded has to be named individually (or caught by .default). I assume the reason is that the author (Hadley Wickham) argues that a given function should do only one thing. Here, the function reassigns the values of a variable, not (much) more, not less.
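If a range is needed while staying within {dplyr}, one option is case_when() combined with between(). This is a sketch of an alternative, not something recode() itself does; x is a made-up vector standing in for Affairs$affairs:

```r
library(dplyr)

x <- c(0, 1, 2, 3, 7, 12, 0)   # hypothetical counts

res <- dplyr::case_when(
  x == 0                   ~ "no",
  dplyr::between(x, 1, 12) ~ "yes"   # closed range 1..12
)
# Values matching neither condition would become NA.
print(res)
```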

Using base::cut

Affairs$halodrie <- cut(Affairs$affairs, breaks = c(-Inf, 0, +Inf), labels = c("no", "yes"))

table(Affairs$halodrie)
## 
##  no yes 
## 451 150

Of note, when a continuous variable is “cut”, the outer boundaries must be specified as cutting points, too. So cutting into two halves does not take one cutting point but three: the actual cutting point, plus a lower bound (the smallest value in the sample or something smaller, even -Inf) and an upper bound (the largest value in the sample or something larger, even Inf). Be aware that the intervals are open on the left by default, so if the lower bound is exactly the sample minimum, that value ends up as NA unless include.lowest = TRUE is set. Using -Inf avoids the problem.
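The effect of the outer cutting points can be checked on a small made-up vector (x and the labels “low”/“high” are hypothetical, chosen only for illustration):

```r
x <- c(0, 1, 3, 12)   # hypothetical values; we want to cut at 3

# Lower break equals the minimum: intervals are (0,3] and (3,12],
# so the value 0 falls outside and becomes NA.
dropped <- cut(x, breaks = c(0, 3, 12), labels = c("low", "high"))

# Fix 1: close the first interval on the left.
closed <- cut(x, breaks = c(0, 3, 12), labels = c("low", "high"),
              include.lowest = TRUE)

# Fix 2: use infinite outer breaks, so no value can fall outside.
infinite <- cut(x, breaks = c(-Inf, 3, Inf), labels = c("low", "high"))

print(dropped)
print(closed)
print(infinite)
```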

Summary

For beginners, I would recommend car::recode. It provides both recoding and cutting in one function, and hence may be easier to apply at the start. It is also quite flexible. Assume you are interested in marriages lasting from 5 to 15 years:

Affairs$years5_15 <- car::recode(Affairs$yearsmarried, "0:4 =  'no'; 5:15 = 'yes'; else = 'no'")

head(Affairs$years5_15)
## [1] "yes" "no"  "yes" "yes" "no"  "no"
table(Affairs$years5_15)
## 
##  no yes 
## 245 356
