---
exname: tidymodels-tree2
expoints: 1
extype: string
exsolution: NA
categories:
- statlearning
- trees
- tidymodels
- speed
- string
date: '2023-11-08'
slug: tidymodels-tree2
title: tidymodels-tree2
---

# Exercise

Fit the following simple model:

1. Decision tree

Model formula: `am ~ .` (data set `mtcars`)

The goal is to reduce the time (and the resource consumption) needed for fitting. Use the following method:

- Use several processor cores

Notes:

- Tune all parameters that the engine offers.
- Use the defaults where nothing else is specified.
- Run a $v = 2$-fold cross-validation (because the sample is so small).
- Follow the [usual guidelines](https://datenwerk.netlify.app/hinweise).

</br></br></br></br></br></br></br></br></br></br>

# Solution

## Setup

```{r}
library(tidymodels)
data(mtcars)
library(tictoc)      # timing
library(doParallel)  # use several cores
```

For classification, tidymodels requires a nominal outcome variable, not a numeric one:

```{r}
mtcars <- mtcars %>%
  mutate(am = factor(am))
```

## Splitting the data

```{r}
set.seed(42)
d_split <- initial_split(mtcars)
d_train <- training(d_split)
d_test <- testing(d_split)
```

## Model(s)

```{r}
mod_tree <- decision_tree(
  mode = "classification",
  cost_complexity = tune(),
  tree_depth = tune(),
  min_n = tune()
)
```

## Recipe(s)

```{r}
rec_plain <- recipe(am ~ ., data = d_train)
```

## Resampling

```{r}
set.seed(42)
rsmpl <- vfold_cv(d_train, v = 2)
```

## Workflows

```{r}
wf_tree <- workflow() %>%
  add_recipe(rec_plain) %>%
  add_model(mod_tree)
```

## Tuning/fitting

Tuning grid:

```{r}
tune_grid <- grid_regular(extract_parameter_set_dials(mod_tree), levels = 5)
tune_grid
```

## Without parallelization

```{r}
tic()
fit_tree <- tune_grid(
  object = wf_tree,
  grid = tune_grid,
  metrics = metric_set(roc_auc),
  resamples = rsmpl
)
toc()
```

About 45 seconds on my machine (4-core MacBook Pro, 2020).

## With parallelization

How many (physical) CPU cores does my computer have?

```{r}
parallel::detectCores(logical = FALSE)
```

Start parallel processing:

```{r}
cl <- makePSOCKcluster(4)  # create a cluster with 4 workers
registerDoParallel(cl)
```

```{r}
tic()
fit_tree2 <- tune_grid(
  object = wf_tree,
  grid = tune_grid,
  metrics = metric_set(roc_auc),
  resamples = rsmpl
)
toc()
```

About 17 seconds, clearly faster!
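
## Stopping the cluster

After tuning, the worker processes can be shut down so they do not keep holding memory and CPU. A minimal sketch, assuming the cluster object `cl` created above:

```{r}
# assumption: `cl` is the PSOCK cluster registered above
stopCluster(cl)   # shut down the 4 worker processes
registerDoSEQ()   # fall back to sequential execution for subsequent code
```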