Simple way to separate train and test sample in R

For statistical modeling, it is typical to separate a train sample from a test sample. The training sample is used to build (“train”) the model, whereas the test sample is used to gauge the predictive quality of the model.

There are many ways to split a dataset into a train sample and a test sample. One quite simple, tidyverse-oriented way is the following.

First, load the tidyverse. Next, load some data; here we use the Affairs dataset from the AER package, which must be installed.

library(tidyverse)
data(Affairs, package = "AER")

Then, create an index vector of the length of your train sample, say 80% of the total sample size.

set.seed(42)  # fix the random seed so the split is reproducible
index <- sample(seq_len(nrow(Affairs)), size = trunc(.8 * nrow(Affairs)))
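As a quick sanity check on that draw, here is a minimal base-R sketch that repeats it with the sizes stored in variables. The names `n` and `train_size` are introduced here for illustration only, and `n` is set to 601, the number of rows in Affairs used in this post.

```r
# n = 601 is the number of rows in Affairs, as used in the post;
# n and train_size are illustrative names, not part of the original code.
n <- 601
train_size <- trunc(.8 * n)   # 480.8 truncated to 480

set.seed(42)
index <- sample(seq_len(n), size = train_size)

length(index)          # 480
anyDuplicated(index)   # 0: sample() draws without replacement by default
range(index)           # every value lies between 1 and n
```

Because `sample()` draws without replacement unless told otherwise, no row number can appear twice in the index.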

Put simply, we draw 480 cases (.8 * 601 = 480.8, truncated to 480) at random from the dataset and note their row numbers. The train sample consists of the rows with those numbers:

a_train <- Affairs %>%
  filter(row_number() %in% index)

The test sample is the complement of the train sample, obtained by negating the same filter:

a_test <- Affairs %>%
  filter(!(row_number() %in% index))
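To confirm that the two filters really partition the data, the following sketch runs the same steps on a stand-in data frame. This is only an illustration under assumptions: `toy`, `toy_train`, and `toy_test` are hypothetical names, and `toy` has 601 rows to mirror Affairs so that the example runs without the AER package.

```r
library(dplyr)

# Hypothetical stand-in for Affairs: 601 rows identified by an id column
toy <- data.frame(id = 1:601)

set.seed(42)
index <- sample(seq_len(nrow(toy)), size = trunc(.8 * nrow(toy)))

# Same filters as in the post, applied to the stand-in data
toy_train <- toy %>% filter(row_number() %in% index)
toy_test  <- toy %>% filter(!(row_number() %in% index))

nrow(toy_train)                                  # 480
nrow(toy_test)                                   # 121
length(intersect(toy_train$id, toy_test$id))     # 0: no row is in both sets
setequal(c(toy_train$id, toy_test$id), toy$id)   # TRUE: every row is in one set
```

Note that `filter()` keeps the rows in their original order, so both samples stay sorted as in the source data rather than in the order the index was drawn.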
