

May 17, 2023


Can the number of bikes currently rented out by a bike-sharing provider be predicted from the temperature?

In this exercise we investigate this question.

You can download the data from the UCI website.

We use the day data set.

Fit a decision tree with the number of currently rented bikes as the dependent variable (DV) and the current temperature as the independent variable (IV).

Tune the tree's cp (cost-complexity) parameter.

Report the MSE!



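library(tidymodels)  # attaches rsample, recipes, parsnip, tune, workflows, yardstick, purrr

d <- read.csv("/Users/sebastiansaueruser/datasets/Bike-Sharing-Dataset/day.csv")
glimpse(d)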
Rows: 731
Columns: 16
$ instant    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
$ dteday     <chr> "2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "20…
$ season     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ yr         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ mnth       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ holiday    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
$ weekday    <int> 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4,…
$ workingday <int> 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1,…
$ weathersit <int> 2, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2,…
$ temp       <dbl> 0.3441670, 0.3634780, 0.1963640, 0.2000000, 0.2269570, 0.20…
$ atemp      <dbl> 0.3636250, 0.3537390, 0.1894050, 0.2121220, 0.2292700, 0.23…
$ hum        <dbl> 0.805833, 0.696087, 0.437273, 0.590435, 0.436957, 0.518261,…
$ windspeed  <dbl> 0.1604460, 0.2485390, 0.2483090, 0.1602960, 0.1869000, 0.08…
$ casual     <int> 331, 131, 120, 108, 82, 88, 148, 68, 54, 41, 43, 25, 38, 54…
$ registered <int> 654, 670, 1229, 1454, 1518, 1518, 1362, 891, 768, 1280, 122…
$ cnt        <int> 985, 801, 1349, 1562, 1600, 1606, 1510, 959, 822, 1321, 126…

Data split

d_split <- initial_split(d, strata = cnt)

d_train <- training(d_split)
d_test <- testing(d_split)
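
To make the split step more tangible, here is a minimal sketch of what `initial_split()` does with its default proportion of 3/4. It uses a made-up data frame (`toy`, with random values) as a stand-in for `day.csv`, so the numbers are illustrative only and not the results of this exercise.

```r
library(rsample)

set.seed(42)
# Hypothetical stand-in data with the same number of rows as day.csv
toy <- data.frame(
  temp = runif(731),
  cnt  = rpois(731, lambda = 4500)
)

# Stratifying on the numeric outcome bins it by quantiles before sampling,
# so training and test set have a similar distribution of cnt
toy_split <- initial_split(toy, strata = cnt)  # default: prop = 3/4
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Every row ends up in exactly one of the two sets
nrow(toy_train) + nrow(toy_test) == nrow(toy)

# Share of rows in the training set (close to 0.75)
nrow(toy_train) / nrow(toy)
```

Only the training set is used for resampling and tuning; the test set is held back for the final performance estimate.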

Define recipe

rec1 <- 
  recipe(cnt ~ temp, data = d_train)  # define the recipe on the training data only

Define model

m1 <-
  decision_tree(cost_complexity = tune(),
                mode = "regression")
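
The effect of the cost-complexity parameter can be illustrated directly with `rpart`, the default engine behind `decision_tree()`, where `cost_complexity` corresponds to the `cp` argument. The following sketch uses the built-in `mtcars` data as a stand-in (not the bike data), and the two `cp` values and the helper `n_leaves()` are arbitrary choices for illustration.

```r
library(rpart)

# Helper (hypothetical name): count the terminal nodes (leaves) of an rpart tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# A small cp keeps almost every split that improves the fit;
# a large cp keeps only splits that improve the fit substantially
tree_small_cp <- rpart(mpg ~ wt, data = mtcars, method = "anova", cp = 0.0001)
tree_large_cp <- rpart(mpg ~ wt, data = mtcars, method = "anova", cp = 0.5)

n_leaves(tree_small_cp)
n_leaves(tree_large_cp)
```

A larger `cp` never yields a bigger tree than a smaller one, which is why tuning it trades off flexibility against overfitting.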

Define Resamples

rsmpl <- vfold_cv(d_train)


wf1 <-
  workflow() %>% 
  add_model(m1) %>% 
  add_recipe(rec1)

fit1 <- tune_grid(
  object = wf1, 
  resamples = rsmpl)
7.197 sec elapsed
# Tuning results
# 10-fold cross-validation 
# A tibble: 10 × 4
   splits           id     .metrics          .notes          
   <list>           <chr>  <list>            <list>          
 1 <split [492/55]> Fold01 <tibble [20 × 5]> <tibble [0 × 3]>
 2 <split [492/55]> Fold02 <tibble [20 × 5]> <tibble [0 × 3]>
 3 <split [492/55]> Fold03 <tibble [20 × 5]> <tibble [0 × 3]>
 4 <split [492/55]> Fold04 <tibble [20 × 5]> <tibble [0 × 3]>
 5 <split [492/55]> Fold05 <tibble [20 × 5]> <tibble [0 × 3]>
 6 <split [492/55]> Fold06 <tibble [20 × 5]> <tibble [0 × 3]>
 7 <split [492/55]> Fold07 <tibble [20 × 5]> <tibble [0 × 3]>
 8 <split [493/54]> Fold08 <tibble [20 × 5]> <tibble [0 × 3]>
 9 <split [493/54]> Fold09 <tibble [20 × 5]> <tibble [0 × 3]>
10 <split [493/54]> Fold10 <tibble [20 × 5]> <tibble [0 × 3]>
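
Each fold above holds 20 metric rows, which matches the default of 10 candidate values times two metrics (rmse and rsq). As a sketch of how the candidates and the metric could be set explicitly instead, the following example uses the built-in `mtcars` data as a stand-in; the object names (`folds`, `spec`, `wf`, `cp_grid`, `res`), the grid size of 5, and the 5 folds are assumptions made for illustration.

```r
library(tidymodels)

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

spec <- decision_tree(cost_complexity = tune(), mode = "regression")

wf <-
  workflow() %>%
  add_model(spec) %>%
  add_formula(mpg ~ wt)

# Regular grid of 5 cost-complexity values over the default (log10) range
cp_grid <- grid_regular(cost_complexity(), levels = 5)

res <- tune_grid(
  object    = wf,
  resamples = folds,
  grid      = cp_grid,
  metrics   = metric_set(rmse)  # compute RMSE only
)

# Naming the metric avoids the "No value of `metric` was given" warning
show_best(res, metric = "rmse", n = 3)
```

With an explicit grid the tuned values are reproducible across runs, whereas the default grid is generated anew for each call.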

Best candidate

show_best(fit1)
Warning: No value of `metric` was given; metric 'rmse' will be used.
# A tibble: 5 × 7
  cost_complexity .metric .estimator  mean     n std_err .config              
            <dbl> <chr>   <chr>      <dbl> <int>   <dbl> <chr>                
1      0.0208     rmse    standard   1478.    10    34.7 Preprocessor1_Model09
2      0.00220    rmse    standard   1538.    10    36.4 Preprocessor1_Model01
3      0.000306   rmse    standard   1556.    10    40.3 Preprocessor1_Model07
4      0.00000175 rmse    standard   1558.    10    39.7 Preprocessor1_Model02
5      0.0000194  rmse    standard   1558.    10    39.7 Preprocessor1_Model03
wf1_best <-
  wf1 %>% 
  finalize_workflow(parameters = select_best(fit1))
Warning: No value of `metric` was given; metric 'rmse' will be used.

Last Fit

fit_testsample <- last_fit(wf1_best, d_split)

Model performance (metrics) in test set

fit_testsample %>% collect_metrics()
# A tibble: 2 × 4
  .metric .estimator .estimate .config             
  <chr>   <chr>          <dbl> <chr>               
1 rmse    standard    1430.    Preprocessor1_Model1
2 rsq     standard       0.473 Preprocessor1_Model1
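# Note: the first row of collect_metrics() is the RMSE, not the MSE;
# the MSE is the square of this value.
rmse_test <- fit_testsample %>% collect_metrics() %>% pluck(3, 1)
rmse_test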
[1] 1430.304

Solution: 1430.3042213 (this is the RMSE in the test set; squaring it gives the MSE)
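
Since the task asks for the MSE while `collect_metrics()` reports a metric labelled rmse, it may help to see how the two relate. The following base R sketch uses four made-up values (`truth`, `pred`) that have nothing to do with the bike data.

```r
# Hypothetical observed and predicted values
truth <- c(100, 200, 300, 400)
pred  <- c(110, 190, 320, 380)

# MSE: mean of the squared prediction errors
mse <- mean((truth - pred)^2)

# RMSE: square root of the MSE, on the same scale as the outcome
rmse <- sqrt(mse)

mse
rmse
```

The RMSE is usually easier to interpret because it is expressed in the units of the outcome, here the number of bikes.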


  • statlearning
  • tidymodels
  • num