library(tidymodels)
data(mtcars)
library(tictoc) # timing
library(doParallel) # use multiple cores
tidymodels-tree2
Categories:
- statlearning
- trees
- tidymodels
- speed
- string
Exercise
Fit the following simple model:
- Decision tree
Model formula: am ~ . (dataset mtcars)
The goal is to speed up fitting (and to reduce resource consumption). Use the following method:
- Use multiple processor cores
Hints:
- Tune all parameters (that the engine offers).
- Use defaults unless stated otherwise.
- Use \(v=2\)-fold cross-validation (because the sample is so small).
- Follow the usual guidelines.
Solution
Setup
For classification, tidymodels requires a nominal outcome variable (a factor), not a numeric one:
mtcars <-
  mtcars %>%
  mutate(am = factor(am))
Split the data
set.seed(42)
d_split <- initial_split(mtcars)
d_train <- training(d_split)
d_test <- testing(d_split)
Model(s)
mod_tree <-
  decision_tree(mode = "classification",
                cost_complexity = tune(),
                tree_depth = tune(),
                min_n = tune())
Recipe(s)
rec_plain <-
  recipe(am ~ ., data = d_train)
Resampling
set.seed(42)
rsmpl <- vfold_cv(d_train, v = 2)
Workflows
wf_tree <-
  workflow() %>%
  add_recipe(rec_plain) %>%
  add_model(mod_tree)
Tuning/Fitting
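Tuning grid:
# Note: this object shares its name with the function tune_grid();
# R still finds the function when tune_grid(...) is called.
tune_grid <- grid_regular(extract_parameter_set_dials(mod_tree), levels = 5)
tune_grid
| cost_complexity | tree_depth | min_n |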
|---|---|---|
| 0.0000000 | 1 | 2 |
| 0.0000000 | 1 | 2 |
| 0.0000032 | 1 | 2 |
| 0.0005623 | 1 | 2 |
| 0.1000000 | 1 | 2 |
| 0.0000000 | 4 | 2 |
| 0.0000000 | 4 | 2 |
| 0.0000032 | 4 | 2 |
| 0.0005623 | 4 | 2 |
| 0.1000000 | 4 | 2 |
| 0.0000000 | 8 | 2 |
| 0.0000000 | 8 | 2 |
| 0.0000032 | 8 | 2 |
| 0.0005623 | 8 | 2 |
| 0.1000000 | 8 | 2 |
| 0.0000000 | 11 | 2 |
| 0.0000000 | 11 | 2 |
| 0.0000032 | 11 | 2 |
| 0.0005623 | 11 | 2 |
| 0.1000000 | 11 | 2 |
| 0.0000000 | 15 | 2 |
| 0.0000000 | 15 | 2 |
| 0.0000032 | 15 | 2 |
| 0.0005623 | 15 | 2 |
| 0.1000000 | 15 | 2 |
| 0.0000000 | 1 | 11 |
| 0.0000000 | 1 | 11 |
| 0.0000032 | 1 | 11 |
| 0.0005623 | 1 | 11 |
| 0.1000000 | 1 | 11 |
| 0.0000000 | 4 | 11 |
| 0.0000000 | 4 | 11 |
| 0.0000032 | 4 | 11 |
| 0.0005623 | 4 | 11 |
| 0.1000000 | 4 | 11 |
| 0.0000000 | 8 | 11 |
| 0.0000000 | 8 | 11 |
| 0.0000032 | 8 | 11 |
| 0.0005623 | 8 | 11 |
| 0.1000000 | 8 | 11 |
| 0.0000000 | 11 | 11 |
| 0.0000000 | 11 | 11 |
| 0.0000032 | 11 | 11 |
| 0.0005623 | 11 | 11 |
| 0.1000000 | 11 | 11 |
| 0.0000000 | 15 | 11 |
| 0.0000000 | 15 | 11 |
| 0.0000032 | 15 | 11 |
| 0.0005623 | 15 | 11 |
| 0.1000000 | 15 | 11 |
| 0.0000000 | 1 | 21 |
| 0.0000000 | 1 | 21 |
| 0.0000032 | 1 | 21 |
| 0.0005623 | 1 | 21 |
| 0.1000000 | 1 | 21 |
| 0.0000000 | 4 | 21 |
| 0.0000000 | 4 | 21 |
| 0.0000032 | 4 | 21 |
| 0.0005623 | 4 | 21 |
| 0.1000000 | 4 | 21 |
| 0.0000000 | 8 | 21 |
| 0.0000000 | 8 | 21 |
| 0.0000032 | 8 | 21 |
| 0.0005623 | 8 | 21 |
| 0.1000000 | 8 | 21 |
| 0.0000000 | 11 | 21 |
| 0.0000000 | 11 | 21 |
| 0.0000032 | 11 | 21 |
| 0.0005623 | 11 | 21 |
| 0.1000000 | 11 | 21 |
| 0.0000000 | 15 | 21 |
| 0.0000000 | 15 | 21 |
| 0.0000032 | 15 | 21 |
| 0.0005623 | 15 | 21 |
| 0.1000000 | 15 | 21 |
| 0.0000000 | 1 | 30 |
| 0.0000000 | 1 | 30 |
| 0.0000032 | 1 | 30 |
| 0.0005623 | 1 | 30 |
| 0.1000000 | 1 | 30 |
| 0.0000000 | 4 | 30 |
| 0.0000000 | 4 | 30 |
| 0.0000032 | 4 | 30 |
| 0.0005623 | 4 | 30 |
| 0.1000000 | 4 | 30 |
| 0.0000000 | 8 | 30 |
| 0.0000000 | 8 | 30 |
| 0.0000032 | 8 | 30 |
| 0.0005623 | 8 | 30 |
| 0.1000000 | 8 | 30 |
| 0.0000000 | 11 | 30 |
| 0.0000000 | 11 | 30 |
| 0.0000032 | 11 | 30 |
| 0.0005623 | 11 | 30 |
| 0.1000000 | 11 | 30 |
| 0.0000000 | 15 | 30 |
| 0.0000000 | 15 | 30 |
| 0.0000032 | 15 | 30 |
| 0.0005623 | 15 | 30 |
| 0.1000000 | 15 | 30 |
| 0.0000000 | 1 | 40 |
| 0.0000000 | 1 | 40 |
| 0.0000032 | 1 | 40 |
| 0.0005623 | 1 | 40 |
| 0.1000000 | 1 | 40 |
| 0.0000000 | 4 | 40 |
| 0.0000000 | 4 | 40 |
| 0.0000032 | 4 | 40 |
| 0.0005623 | 4 | 40 |
| 0.1000000 | 4 | 40 |
| 0.0000000 | 8 | 40 |
| 0.0000000 | 8 | 40 |
| 0.0000032 | 8 | 40 |
| 0.0005623 | 8 | 40 |
| 0.1000000 | 8 | 40 |
| 0.0000000 | 11 | 40 |
| 0.0000000 | 11 | 40 |
| 0.0000032 | 11 | 40 |
| 0.0005623 | 11 | 40 |
| 0.1000000 | 11 | 40 |
| 0.0000000 | 15 | 40 |
| 0.0000000 | 15 | 40 |
| 0.0000032 | 15 | 40 |
| 0.0005623 | 15 | 40 |
| 0.1000000 | 15 | 40 |
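The grid above has 5 levels for each of the 3 tuning parameters, so 5^3 = 125 candidates, and each candidate is fitted on 2 folds. The following base-R sketch rebuilds such a grid by hand to make the arithmetic explicit. It assumes (read off the table, not stated in the text) that cost_complexity is spaced evenly on a log10 scale from 1e-10 to 1e-1 and that the integer parameters are truncated; `grid_by_hand` and the other names are made up for illustration.

```r
# Rebuild the regular tuning grid by hand (base R only).
# Assumption: ranges and spacing are read off the table above.
cost_complexity <- 10^seq(-10, -1, length.out = 5)         # log10-spaced
tree_depth      <- as.integer(seq(1, 15, length.out = 5))  # 1, 4, 8, 11, 15
min_n           <- as.integer(seq(2, 40, length.out = 5))  # 2, 11, 21, 30, 40

# expand.grid() varies the first column fastest, as in the table above
grid_by_hand <- expand.grid(
  cost_complexity = cost_complexity,
  tree_depth      = tree_depth,
  min_n           = min_n
)

n_candidates <- nrow(grid_by_hand)      # 5 * 5 * 5 = 125
n_folds      <- 2
n_fits       <- n_candidates * n_folds  # 250 model fits in total

# The two smallest cost_complexity values are tiny but not zero;
# rounded to seven decimals both are displayed as 0.0000000.
format(cost_complexity, scientific = TRUE, digits = 3)
```

These 250 fits are independent of each other, which is why the tuning step is a natural candidate for parallel processing.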
Without parallelization
tic()
fit_tree <-
  tune_grid(object = wf_tree,
            grid = tune_grid,
            metrics = metric_set(roc_auc),
            resamples = rsmpl)
toc()
28.681 sec elapsed
About 45 sec on my machine (4-core MacBook Pro 2020); timings vary between runs, and the run shown above took about 29 sec.
With parallelization
How many CPU cores does my computer have?
parallel::detectCores(logical = FALSE)
[1] 4
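With logical = FALSE, detectCores() reports physical cores; with logical = TRUE it reports hardware threads, which can be a larger number. A minimal sketch of deriving a worker count from this follows. The name `n_workers` and the choice to leave one core free are assumptions for illustration, not part of the exercise.

```r
library(parallel)

n_physical <- detectCores(logical = FALSE)  # physical cores
n_logical  <- detectCores(logical = TRUE)   # hardware threads

# detectCores() can return NA on platforms where the count is unknown,
# so fall back to a single worker in that case.
# Leaving one core free keeps the machine responsive (a common choice).
n_workers <- if (is.na(n_physical)) 1L else max(1L, n_physical - 1L)
n_workers
```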
Start parallel processing:
cl <- makePSOCKcluster(4) # create one cluster with 4 worker processes
registerDoParallel(cl)
tic()
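fit_tree2 <-
  tune_grid(object = wf_tree,
            grid = tune_grid,
            metrics = metric_set(roc_auc),
            resamples = rsmpl)
toc()
27.109 sec elapsed
About 17 seconds on my machine, considerably faster than the roughly 45 seconds without parallelization. (In the run shown above the difference was small: about 27 vs. 29 seconds.)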
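How much parallelization helps depends on how the work per task compares with the overhead of starting workers and moving data to them. The sketch below uses only the parallel package and a made-up toy task (`slow_square`, standing in for one model fit) to compare a sequential and a parallel run. It also shows shutting the cluster down afterwards.

```r
library(parallel)

# Toy task standing in for one model fit (hypothetical, not from the exercise)
slow_square <- function(i) {
  Sys.sleep(0.2)
  i^2
}

# Sequential baseline
t_seq <- system.time(res_seq <- lapply(1:4, slow_square))

# Parallel run on 2 workers; cluster start-up is included in the timing
t_par <- system.time({
  cl_demo <- makePSOCKcluster(2)
  res_par <- parLapply(cl_demo, 1:4, slow_square)
  stopCluster(cl_demo)  # release the worker processes when done
})

identical(res_seq, res_par)  # both approaches give the same results
t_seq["elapsed"]
t_par["elapsed"]
```

For the cluster created in this post, the corresponding clean-up would be stopCluster(cl); foreach::registerDoSEQ() then switches foreach back to sequential execution.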