tidymodels-vorlage

Categories: tidymodels, statlearning, template, string

Published: May 17, 2023

Task

Write a prototypical analysis of a predictive model that can serve as a template for analyses of this kind.

Notes:

Fit a model.
Tune at least one parameter of the model.
Use cross-validation.
Use default values unless stated otherwise.
Set the random seed to 42.
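The seed requirement matters because both the train/test split and the cross-validation folds are drawn at random. A minimal base-R sketch of what fixing the seed buys: two draws made right after `set.seed(42)` are identical, so a rerun reproduces the same split and the same folds.

```r
# Draw the same random sample twice; resetting the seed makes the draws identical
set.seed(42)
draw_1 <- sample(1:344, size = 5)

set.seed(42)
draw_2 <- sample(1:344, size = 5)

identical(draw_1, draw_2)  # TRUE
```

This is also why the solution calls `set.seed(42)` again right before each random step rather than only once at the top.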

Solution

# 2023-05-08


# Setup:
library(tidymodels)

── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom        1.0.5     ✔ recipes      1.0.8
✔ dials        1.2.0     ✔ rsample      1.2.0
✔ dplyr        1.1.3     ✔ tibble       3.2.1
✔ ggplot2      3.4.4     ✔ tidyr        1.3.0
✔ infer        1.0.5     ✔ tune         1.1.2
✔ modeldata    1.2.0     ✔ workflows    1.1.3
✔ parsnip      1.1.1     ✔ workflowsets 1.0.1
✔ purrr        1.0.2     ✔ yardstick    1.2.0

── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ purrr::discard() masks scales::discard()
✖ dplyr::filter()  masks stats::filter()
✖ dplyr::lag()     masks stats::lag()
✖ recipes::step()  masks stats::step()
• Dig deeper into tidy modeling with R at https://www.tmwr.org

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ readr     2.1.4
✔ lubridate 1.9.3     ✔ stringr   1.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ readr::col_factor() masks scales::col_factor()
✖ purrr::discard()    masks scales::discard()
✖ dplyr::filter()     masks stats::filter()
✖ stringr::fixed()    masks recipes::fixed()
✖ dplyr::lag()        masks stats::lag()
✖ readr::spec()       masks yardstick::spec()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tictoc)  # timing
library(baguette)  # bagged trees


# Data:
d_path <- "https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv"
d <- read_csv(d_path)

Rows: 344 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): species, island, sex
dbl (6): rownames, bill_length_mm, bill_depth_mm, flipper_length_mm, body_ma...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
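Because the formula used later (`body_mass_g ~ .`) feeds every column into the model, it is worth checking for missing values first. A minimal sketch that counts `NA`s per column; to run offline it uses the copy of the penguin data shipped in the modeldata package (`modeldata::penguins`) instead of the CSV URL, which is an assumption: that copy has no `rownames` column. Dropping rows with a missing outcome is shown only as an option; the solution keeps all rows.

```r
library(tidyr)  # drop_na()

d_offline <- modeldata::penguins

# Number of missing values per column
colSums(is.na(d_offline))

# Optional (not used in the solution): drop rows whose outcome is missing
d_complete_outcome <- drop_na(d_offline, body_mass_g)
```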

set.seed(42)
d_split <- initial_split(d)
d_train <- training(d_split)
d_test <- testing(d_split)
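With default settings, `initial_split()` puts 3/4 of the rows into the training set (`prop = 3/4`) and the rest into the test set. A minimal sketch that checks the proportion; it uses `modeldata::penguins` instead of the CSV so that it runs offline (an assumption; that copy also has 344 rows but no `rownames` column).

```r
library(rsample)

d_offline <- modeldata::penguins

set.seed(42)
split_demo <- initial_split(d_offline)  # default prop = 3/4

nrow(training(split_demo))  # about 3/4 of the rows
nrow(testing(split_demo))   # the remaining rows
```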


# model:
mod_bag <-
  bag_tree(mode = "regression",
           cost_complexity = tune())
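`tune()` only marks `cost_complexity` as a placeholder; the candidate values are generated later from the parameter's default range in the dials package. A quick look at that range (stored on the log10 scale) explains why the candidates in the tuning results further down span values from about 1e-10 to 1e-3:

```r
library(dials)

cc <- cost_complexity()
cc  # prints the parameter and its range on the transformed scale

# Range on the log10 scale: -10 to -1, i.e. 1e-10 to 0.1 on the original scale
unlist(cc$range)
```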


# cv:
set.seed(42)
rsmpl <- vfold_cv(d_train)
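`vfold_cv()` creates 10 folds by default (`v = 10`), which is why the column `n` in the tuning results below equals 10. A minimal sketch that confirms the fold count and shows that each fold's analysis part holds roughly 9/10 of the training rows; it uses `modeldata::penguins` instead of the CSV so that it runs offline (an assumption).

```r
library(rsample)

set.seed(42)
split_demo <- initial_split(modeldata::penguins)
train_demo <- training(split_demo)

set.seed(42)
folds_demo <- vfold_cv(train_demo)  # default v = 10

nrow(folds_demo)  # one row per fold

# Rows used for fitting in each fold
sapply(folds_demo$splits, function(s) nrow(analysis(s)))
```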


# recipe:
rec1_plain <- recipe(body_mass_g ~  ., data = d_train)
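The formula `body_mass_g ~ .` makes every other column a predictor, including the row index `rownames` from the CSV. As a variant that is *not* part of the solution (and would change the results shown below), the index can stay in the data but be excluded from modeling by giving it an ID role. The sketch mimics the CSV layout by adding a `rownames` column to `modeldata::penguins` with `tibble::rowid_to_column()`; both the offline data and that added column are assumptions.

```r
library(tidymodels)

# Mimic the CSV layout: add a row index column named "rownames"
d_demo <- tibble::rowid_to_column(modeldata::penguins, var = "rownames")

rec_demo <-
  recipe(body_mass_g ~ ., data = d_demo) %>%
  update_role(rownames, new_role = "id")  # keep the column, but not as a predictor

summary(rec_demo)  # one row per variable, with its role
```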


# workflow:
wf1 <-
  workflow() %>% 
  add_model(mod_bag) %>% 
  add_recipe(rec1_plain)
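A workflow bundles recipe and model, so the tuning functions can see which arguments are still marked with `tune()`. A sketch that rebuilds the same kind of workflow on `modeldata::penguins` (offline data, an assumption) and lists its tunable parameters; only `cost_complexity` appears.

```r
library(tidymodels)
library(baguette)  # registers the engines for bag_tree()

d_demo <- modeldata::penguins

mod_demo <- bag_tree(mode = "regression", cost_complexity = tune())

wf_demo <-
  workflow() %>%
  add_model(mod_demo) %>%
  add_recipe(recipe(body_mass_g ~ ., data = d_demo))

# Parameters still marked with tune()
params_demo <- extract_parameter_set_dials(wf_demo)
params_demo$id
```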


# tuning:
tic()
wf1_fit <-
  wf1 %>% 
  tune_grid(
    resamples = rsmpl)
toc()

35.202 sec elapsed
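Without a `grid` argument, `tune_grid()` generates 10 candidate values itself (hence the `.config` labels up to `Model10`), and without `metrics` it uses RMSE and R² for regression. A faster sketch that spells out both choices; the 3-level grid, the 3 folds, the offline `modeldata::penguins` data and dropping incomplete rows are assumptions made only to keep the run short, not recommendations.

```r
library(tidymodels)
library(baguette)

d_demo <- tidyr::drop_na(modeldata::penguins)  # simplification for the sketch

set.seed(42)
split_demo <- initial_split(d_demo)
set.seed(42)
folds_demo <- vfold_cv(training(split_demo), v = 3)

wf_demo <-
  workflow() %>%
  add_model(bag_tree(mode = "regression", cost_complexity = tune())) %>%
  add_recipe(recipe(body_mass_g ~ ., data = training(split_demo)))

# Explicit grid: 3 values spread evenly over the default log10 range
grid_demo <- grid_regular(cost_complexity(), levels = 3)

set.seed(42)
res_demo <-
  wf_demo %>%
  tune_grid(
    resamples = folds_demo,
    grid = grid_demo,
    metrics = metric_set(rmse, rsq)
  )

# Naming the metric avoids the "No value of `metric` was given" warning
show_best(res_demo, metric = "rmse")
```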

# best candidate:
show_best(wf1_fit)

Warning: No value of `metric` was given; metric 'rmse' will be used.

# A tibble: 5 × 7
  cost_complexity .metric .estimator  mean     n std_err .config              
            <dbl> <chr>   <chr>      <dbl> <int>   <dbl> <chr>                
1        2.10e- 3 rmse    standard    302.    10    13.1 Preprocessor1_Model09
2        1.72e-10 rmse    standard    305.    10    18.4 Preprocessor1_Model10
3        1.71e- 9 rmse    standard    306.    10    17.3 Preprocessor1_Model07
4        3.34e- 5 rmse    standard    311.    10    18.1 Preprocessor1_Model04
5        7.33e- 9 rmse    standard    316.    10    15.1 Preprocessor1_Model03
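In this table, `mean` is the average of the 10 per-fold RMSE values and `std_err` their standard error, i.e. the standard deviation divided by the square root of `n`. A base-R sketch of that arithmetic with hypothetical fold values (made up, not taken from the run above):

```r
# Hypothetical RMSE values from 10 folds (illustration only)
fold_rmse <- c(290, 300, 310, 295, 305, 298, 302, 308, 292, 300)

n <- length(fold_rmse)
mean_rmse <- mean(fold_rmse)          # the "mean" column
std_err   <- sd(fold_rmse) / sqrt(n)  # the "std_err" column

mean_rmse  # 300
```

Read this way, the first four candidates above (mean RMSE 302 to 311) lie within one standard error (13.1) of the best one, so the differences between them are small relative to the fold-to-fold variation.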

# finalize wf:
wf1_final <-
  wf1 %>% 
  finalize_workflow(select_best(wf1_fit))

Warning: No value of `metric` was given; metric 'rmse' will be used.
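`finalize_workflow()` replaces the `tune()` placeholder with the chosen value. A sketch that types the value in by hand (0.0021, the best candidate shown above, rounded) instead of calling `select_best()`, so it runs without tuning; it uses `modeldata::penguins` as offline data (an assumption). After finalizing, no parameter is left to tune.

```r
library(tidymodels)
library(baguette)

d_demo <- modeldata::penguins

wf_demo <-
  workflow() %>%
  add_model(bag_tree(mode = "regression", cost_complexity = tune())) %>%
  add_recipe(recipe(body_mass_g ~ ., data = d_demo))

# One-row tibble with the chosen value, shaped like the output of select_best()
best_demo <- tibble(cost_complexity = 0.0021)

wf_demo_final <- finalize_workflow(wf_demo, best_demo)

nrow(extract_parameter_set_dials(wf_demo))        # parameters marked with tune()
nrow(extract_parameter_set_dials(wf_demo_final))  # none left after finalizing
```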

wf1_fit_final <-
  wf1_final %>% 
  last_fit(d_split)
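`last_fit()` fits the finalized workflow once on the whole training set and evaluates it once on the test set. Besides the metrics, the result holds the test-set predictions and the fitted workflow, which can predict new data. A sketch on `modeldata::penguins` with complete rows only and a hand-set `cost_complexity`; both are assumptions to keep it short and self-contained.

```r
library(tidymodels)
library(baguette)

d_demo <- tidyr::drop_na(modeldata::penguins)

set.seed(42)
split_demo <- initial_split(d_demo)

wf_demo_final <-
  workflow() %>%
  add_model(bag_tree(mode = "regression", cost_complexity = 0.0021)) %>%
  add_recipe(recipe(body_mass_g ~ ., data = training(split_demo)))

set.seed(42)
fit_demo <- last_fit(wf_demo_final, split_demo)

# One prediction per test-set row
preds_demo <- collect_predictions(fit_demo)

# The fitted workflow can predict new data, here the first three test rows
fitted_wf <- extract_workflow(fit_demo)
predict(fitted_wf, new_data = head(testing(split_demo), 3))
```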


# Model performance on the test set:
collect_metrics(wf1_fit_final)

# A tibble: 2 × 4
  .metric .estimator .estimate .config             
  <chr>   <chr>          <dbl> <chr>               
1 rmse    standard     326.    Preprocessor1_Model1
2 rsq     standard       0.847 Preprocessor1_Model1
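The two test-set metrics are the root mean squared error (in grams, the unit of `body_mass_g`) and yardstick's R², which is the squared correlation between observed and predicted values. A sketch with made-up numbers that computes both by hand and compares them with yardstick:

```r
library(yardstick)

# Hypothetical observed and predicted body masses in grams (illustration only)
truth    <- c(3750, 3800, 3250, 4500, 5200)
estimate <- c(3900, 3700, 3400, 4300, 5000)

rmse_by_hand <- sqrt(mean((truth - estimate)^2))
rsq_by_hand  <- cor(truth, estimate)^2

rmse_vec(truth, estimate)  # equals rmse_by_hand
rsq_vec(truth, estimate)   # equals rsq_by_hand
```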
