regr-tree03

statlearning
trees
tidymodels
string
mtcars
Published

May 17, 2023

library(tidymodels)

Exercise

Fit a simple predictive model based on a decision tree!

Model formula: am ~ . (dataset mtcars)

Report the model performance (ROC AUC).

Hints:

  • Tune all parameters (that the engine offers).
  • Create a tuning grid with 5 values per tuning parameter.
  • Use \(v=2\)-fold cross-validation (because the sample is so small).
  • Follow the usual guidelines.

Solution

Setup

library(tidymodels)
data(mtcars)
library(tictoc)  # timing

For classification, tidymodels requires a nominal outcome variable (a factor), not a numeric one:

mtcars <-
  mtcars %>% 
  mutate(am = factor(am))
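
As a quick check of what this conversion produces, the following base-R sketch inspects the factor levels on a separate copy (the name `am_factor` is made up for illustration). One point worth keeping in mind: yardstick treats the first factor level as the event of interest by default, so with this coding `0` (automatic transmission) would be the event unless specified otherwise.

```r
# Work on a copy so the mtcars object used in the solution is not touched
am_factor <- factor(datasets::mtcars$am)

levels(am_factor)  # level order of the new factor; the first level comes first
table(am_factor)   # number of cars per class
```
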

Split the data

d_split <- initial_split(mtcars)
d_train <- training(d_split)
d_test <- testing(d_split)
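
To make the resulting sample sizes concrete, here is a minimal base-R sketch of a 75/25 split, which is the default proportion of `initial_split()`. The seed value and the names `train_sketch` and `test_sketch` are assumptions for illustration only; the solution above sets no seed, so its split (and all results further down) will differ from run to run.

```r
set.seed(42)  # hypothetical seed, only to make this sketch reproducible

n <- nrow(datasets::mtcars)                      # total number of rows
n_train <- floor(0.75 * n)                       # size of the training set
train_idx <- sample(seq_len(n), size = n_train)  # random row indices

train_sketch <- datasets::mtcars[train_idx, ]
test_sketch  <- datasets::mtcars[-train_idx, ]

c(train = nrow(train_sketch), test = nrow(test_sketch))
```

With so few rows in the test set, every single misclassification shifts the test metrics noticeably.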

Model(s)

mod_tree <-
  decision_tree(mode = "classification",
                cost_complexity = tune(),
                tree_depth = tune(),
                min_n = tune())

Recipe(s)

rec1 <- 
  recipe(am ~ ., data = d_train)

Resampling

rsmpl <- vfold_cv(d_train, v = 2)

Workflow

wf1 <-
  workflow() %>%  
  add_recipe(rec1) %>% 
  add_model(mod_tree)

Tuning/Fitting

Tuning grid:

tune_grid <- grid_regular(extract_parameter_set_dials(mod_tree), levels = 5)
tune_grid
cost_complexity tree_depth min_n
0.0000000 1 2
0.0000000 1 2
0.0000032 1 2
0.0005623 1 2
0.1000000 1 2
0.0000000 4 2
0.0000000 4 2
0.0000032 4 2
0.0005623 4 2
0.1000000 4 2
0.0000000 8 2
0.0000000 8 2
0.0000032 8 2
0.0005623 8 2
0.1000000 8 2
0.0000000 11 2
0.0000000 11 2
0.0000032 11 2
0.0005623 11 2
0.1000000 11 2
0.0000000 15 2
0.0000000 15 2
0.0000032 15 2
0.0005623 15 2
0.1000000 15 2
0.0000000 1 11
0.0000000 1 11
0.0000032 1 11
0.0005623 1 11
0.1000000 1 11
0.0000000 4 11
0.0000000 4 11
0.0000032 4 11
0.0005623 4 11
0.1000000 4 11
0.0000000 8 11
0.0000000 8 11
0.0000032 8 11
0.0005623 8 11
0.1000000 8 11
0.0000000 11 11
0.0000000 11 11
0.0000032 11 11
0.0005623 11 11
0.1000000 11 11
0.0000000 15 11
0.0000000 15 11
0.0000032 15 11
0.0005623 15 11
0.1000000 15 11
0.0000000 1 21
0.0000000 1 21
0.0000032 1 21
0.0005623 1 21
0.1000000 1 21
0.0000000 4 21
0.0000000 4 21
0.0000032 4 21
0.0005623 4 21
0.1000000 4 21
0.0000000 8 21
0.0000000 8 21
0.0000032 8 21
0.0005623 8 21
0.1000000 8 21
0.0000000 11 21
0.0000000 11 21
0.0000032 11 21
0.0005623 11 21
0.1000000 11 21
0.0000000 15 21
0.0000000 15 21
0.0000032 15 21
0.0005623 15 21
0.1000000 15 21
0.0000000 1 30
0.0000000 1 30
0.0000032 1 30
0.0005623 1 30
0.1000000 1 30
0.0000000 4 30
0.0000000 4 30
0.0000032 4 30
0.0005623 4 30
0.1000000 4 30
0.0000000 8 30
0.0000000 8 30
0.0000032 8 30
0.0005623 8 30
0.1000000 8 30
0.0000000 11 30
0.0000000 11 30
0.0000032 11 30
0.0005623 11 30
0.1000000 11 30
0.0000000 15 30
0.0000000 15 30
0.0000032 15 30
0.0005623 15 30
0.1000000 15 30
0.0000000 1 40
0.0000000 1 40
0.0000032 1 40
0.0005623 1 40
0.1000000 1 40
0.0000000 4 40
0.0000000 4 40
0.0000032 4 40
0.0005623 4 40
0.1000000 4 40
0.0000000 8 40
0.0000000 8 40
0.0000032 8 40
0.0005623 8 40
0.1000000 8 40
0.0000000 11 40
0.0000000 11 40
0.0000032 11 40
0.0005623 11 40
0.1000000 11 40
0.0000000 15 40
0.0000000 15 40
0.0000032 15 40
0.0005623 15 40
0.1000000 15 40
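
The grid printed above can be approximated in base R, which may help to see where its values come from. This is only a sketch: the ranges used here (a log10 scale from -10 to -1 for `cost_complexity`, 1 to 15 for `tree_depth`, 2 to 40 for `min_n`) and the truncation to integers are assumptions inferred from the printed values, not a reimplementation of `grid_regular()`. The name `grid_sketch` is made up.

```r
# Five equally spaced values per parameter
cost_complexity <- 10^seq(-10, -1, length.out = 5)       # spaced on the log10 scale
tree_depth      <- as.integer(seq(1, 15, length.out = 5)) # truncated to integers
min_n           <- as.integer(seq(2, 40, length.out = 5)) # truncated to integers

# Full factorial combination of the three parameters
grid_sketch <- expand.grid(
  cost_complexity = cost_complexity,
  tree_depth = tree_depth,
  min_n = min_n
)

nrow(grid_sketch)   # number of candidate models
head(grid_sketch)
```

The two smallest `cost_complexity` values are both displayed as 0.0000000 in the printed grid because of rounding; they are distinct values. Each candidate is fitted once per fold, so the tuning run below fits every row of the grid twice.
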
tic()
fit1 <-
  tune_grid(object = wf1,
            grid = tune_grid,
            metrics = metric_set(roc_auc),
            resamples = rsmpl)
toc()
24.798 sec elapsed

Best candidate

autoplot(fit1)

show_best(fit1)
cost_complexity tree_depth min_n .metric .estimator mean n std_err .config
0 1 11 roc_auc binary 0.8055556 2 0.0277778 pre0_mod002_post0
0 1 21 roc_auc binary 0.8055556 2 0.0277778 pre0_mod003_post0
0 1 30 roc_auc binary 0.8055556 2 0.0277778 pre0_mod004_post0
0 1 40 roc_auc binary 0.8055556 2 0.0277778 pre0_mod005_post0
0 4 11 roc_auc binary 0.8055556 2 0.0277778 pre0_mod007_post0

Finalize

wf1_finalized <-
  wf1 %>% 
  finalize_workflow(select_best(fit1))

Last Fit

final_fit <- 
  last_fit(object = wf1_finalized, d_split)

collect_metrics(final_fit)
.metric .estimator .estimate .config
accuracy binary 0.8750000 pre0_mod0_post0
roc_auc binary 0.8750000 pre0_mod0_post0
brier_class binary 0.1157407 pre0_mod0_post0
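
For readers who want to see what the ROC AUC measures, here is a small base-R sketch that computes it from ranks (the Mann-Whitney formulation): the share of event/non-event pairs in which the event case receives the higher predicted probability, with ties counted as one half. The function `auc_rank()` and the data are hypothetical and made up for illustration; they are not the predictions of the model above, and yardstick's own implementation may differ in detail.

```r
# Rank-based (Mann-Whitney) estimate of the area under the ROC curve.
# truth: logical vector, TRUE marks the event
# score: predicted probability of the event
auc_rank <- function(truth, score) {
  n_pos <- sum(truth)
  n_neg <- sum(!truth)
  r <- rank(score)  # tied scores receive their average rank
  (sum(r[truth]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical test set with 8 cases (made-up numbers)
truth <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
score <- c(0.90, 0.80, 0.30, 0.40, 0.20, 0.10, 0.10, 0.05)

auc_sketch <- auc_rank(truth, score)
auc_sketch
```

With a test set this small, the metrics can only take a few distinct values, so the estimates reported above should be read with caution.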
