Kapitel 3 Zentrale Ergebnisse

3.1 Cave

Dieses Kapitel berichtet das Wie des Darstellens von Ergebnissen. Es geht nicht darum, warum man welches Vorgehen wählt und welches Vorgehen am sinnvollsten ist.

Konsultieren Sie für letzteres das Statistikbuch Ihres Vertrauens :-)

3.2 Vorbereitung

3.2.1 R-Pakete

In diesem Kapitel benötigen wir folgende R-Pakete:

library(tidyverse)  # Datenjudo
library(easystats)  # make stasts easy again
library(knitr)  # Tabellen schick
library(moderndive)  # Regressionsausgabe schick
library(rstanarm)  # Bayes

easystats ist ein Metapaket⁷, ein R-Paket also, das mehrere R-Pakete beinhaltet. Hier findet sich ein Überblick. Das Paket ist noch nicht auf CRAN, aber es kann wie folgt problemlos installiert werden:

install.packages("easystats", repos = "https://easystats.r-universe.dev")

3.2.2 Daten

data_url <- "https://raw.githubusercontent.com/sebastiansauer/modar/master/datasets/extra.csv"
extra <- read_csv(data_url)

3.3 Relevante Variablen

Damit es einfach bleibt, begrenzen wir uns auf ein paar Variablen.

Sagen wir, das sind die Variablen, die uns interessieren:

extra_corr_names <- 
extra %>% 
  select(n_facebook_friends, n_hangover, age, extra_single_item, n_party, extra_mean) %>% 
  names()

extra_corr_names
#> [1] "n_facebook_friends" "n_hangover"        
#> [3] "age"                "extra_single_item" 
#> [5] "n_party"            "extra_mean"

3.4 Deskriptive Ergebnisse darstellen

Sie können deskriptive Ergebnisse (Ihrer relevanten Variablen) z.B. so darstellen.

extra %>% 
  select(any_of(extra_corr_names)) %>% 
  describe_distribution() %>% 
  kable(digits = 2)

Variable	Mean	SD	IQR	Min	Max	Skewness	Kurtosis	n	n_Missing
n_facebook_friends	532.61	3704.48	300.0	0.0	96055	25.67	662.76	671	155
n_hangover	9.47	30.72	9.0	0.0	738	17.54	399.53	800	26
age	25.50	5.75	6.0	18.0	54	1.81	4.39	813	13
extra_single_item	2.79	0.86	1.0	1.0	4	-0.27	-0.60	816	10
n_party	17.38	19.32	19.0	0.0	150	3.27	16.10	793	33
extra_mean	2.89	0.45	0.6	1.2	4	-0.43	-0.11	822	4

3.5 Korrelationen darstellen

In einer Umfrage erhebt man häufig mehrere Variablen, ein Teil davon oft Konstrukte. Es bietet sich in einem ersten Schritt an, die Korrelationen dieser Variablen untereinander darzustellen.

3.5.1 Korrelationsmatrix

extra %>% 
  select(any_of(extra_corr_names)) %>%  
  correlation() %>% 
  summary() %>% 
  kable(digits = 2)

Parameter	extra_mean	n_party	extra_single_item	age	n_hangover
n_facebook_friends	0.05	0.08	0.07	-0.03	0.13
n_hangover	0.06	0.34	-0.02	-0.06	NA
age	-0.14	-0.21	-0.03	NA	NA
extra_single_item	0.57	0.06	NA	NA	NA
n_party	0.25	NA	NA	NA	NA

Sie möchten das Ergebnis als normalen R-Dataframe? Sie haben keine Lust auf dieses Rumgetue, sondern wollen das lieber als selber gerade ziehen. Also gut:

cor_results <- 
extra %>% 
  select(any_of(extra_corr_names)) %>%  
  correlation() %>% 
  summary()

cor_results
#> # Correlation Matrix (pearson-method)
#> 
#> Parameter          | extra_mean |  n_party | extra_single_item |   age | n_hangover
#> -----------------------------------------------------------------------------------
#> n_facebook_friends |       0.05 |     0.08 |              0.07 | -0.03 |      0.13*
#> n_hangover         |       0.06 |  0.34*** |             -0.02 | -0.06 |           
#> age                |   -0.14*** | -0.21*** |             -0.03 |       |           
#> extra_single_item  |    0.57*** |     0.06 |                   |       |           
#> n_party            |    0.25*** |          |                   |       |           
#> 
#> p-value adjustment method: Holm (1979)

Man kann sich die Korrelationsmatrix auch in der Bayes-Geschmacksrichtung ausgeben lassen:

extra %>% 
  select(any_of(extra_corr_names)) %>%  
  correlation(bayesian = TRUE) %>% 
  summary() %>% 
  kable(digits = 2)

Parameter	extra_mean	n_party	extra_single_item	age	n_hangover
n_facebook_friends	0.05	0.08	0.07	-0.03	0.13
n_hangover	0.05	0.33	-0.02	-0.06	NA
age	-0.14	-0.21	-0.03	NA	NA
extra_single_item	0.56	0.06	NA	NA	NA
n_party	0.25	NA	NA	NA	NA

3.5.2 Korrelationsmatrizen visualisieren

Viele R-Pakete bieten sich an. Nehmen wir easystats.

extra %>% 
  select(any_of(extra_corr_names)) %>%  
  correlation() %>% 
  summary() %>% 
  plot() +
  labs(title = "Korrelationsmatrix, boa")

3.6 Regressionsergebnisse

3.6.1 Frequentistisch

lm1 <- lm(extra_mean ~ n_facebook_friends + n_hangover, data = extra)

lm1_tab <- 
get_regression_table(lm1) 

lm1_tab %>% 
  kable(digits = 2)

term	estimate	std_error	statistic	p_value	lower_ci	upper_ci
intercept	2.88	0.02	147.79	0.00	2.85	2.92
n_facebook_friends	0.00	0.00	0.66	0.51	0.00	0.00
n_hangover	0.00	0.00	4.08	0.00	0.00	0.01

3.6.2 Bayesianisch

Für Bayes gibt es keine ganz so komfortable Lösung. Aber keine Panik, es sollte nur wenige Minuten kosten.

lm2 <- stan_glm(extra_mean ~ n_facebook_friends + n_hangover, 
                refresh = 0,
                data = extra)

Ergebnisse, d.h. die Koeffizienten:

parameters(lm2)
#> Parameter          |   Median |        95% CI |     pd | % in ROPE |  Rhat |     ESS |                     Prior
#> ----------------------------------------------------------------------------------------------------------------
#> (Intercept)        |     2.88 | [ 2.85, 2.92] |   100% |        0% | 1.000 | 4948.00 |     Normal (2.93 +- 1.10)
#> n_facebook_friends | 3.12e-06 | [ 0.00, 0.00] | 75.58% |      100% | 1.000 | 5300.00 | Normal (0.00 +- 2.95e-04)
#> n_hangover         | 4.55e-03 | [ 0.00, 0.01] |   100% |      100% | 0.999 | 4881.00 |     Normal (0.00 +- 0.07)

Und als Tabelle:

parameters(lm2) %>% 
  kable(digits = 2)

Parameter	Median	CI	CI_low	CI_high	pd	ROPE_Percentage	Rhat	ESS	Prior_Distribution	Prior_Location	Prior_Scale
(Intercept)	2.88	0.95	2.85	2.92	1.00	0	1	4948.37	normal	2.93	1.10
n_facebook_friends	0.00	0.95	0.00	0.00	0.76	1	1	5299.88	normal	0.00	0.00
n_hangover	0.00	0.95	0.00	0.01	1.00	1	1	4881.09	normal	0.00	0.07

Dazu noch \(R^2\):

r2_bayes(lm2)
#> # Bayesian R2 with Compatibility Interval
#> 
#>   Conditional R2: 0.028 (95% CI [0.008, 0.054])

Das schreiben Sie entweder als Fußnote unter die Tabelle oder erwähnen es im Text.

3.6.3 Reportr

Hier ist noch ein experimentelles Feature: Es gibt Ihnen den Text aus, um die Ergebnisse Ihrer Bayes-Analyse zu berichten.

report(lm2)

We fitted a Bayesian linear model (estimated using MCMC sampling with 4 chains of 2000 iterations and a warmup of 1000) to predict extra_mean with n_facebook_friends and n_hangover (formula: extra_mean ~ n_facebook_friends + n_hangover). Priors over parameters were set as normal (mean = 0.00, SD = 2.95e-04) and normal (mean = 0.00, SD = 0.07) distributions. The model’s explanatory power is weak (R2 = 0.03, 95% CI [7.66e-03, 0.05], adj. R2 = 0.01). The model’s intercept, corresponding to n_facebook_friends = 0 and n_hangover = 0, is at 2.88 (95% CI [2.85, 2.92]). Within this model:

The effect of n facebook friends (Median = 3.12e-06, 95% CI [-5.98e-06, 1.18e-05]) has a 75.58% probability of being positive (> 0), 0.00% of being significant (> 0.02), and 0.00% of being large (> 0.13). The estimation successfully converged (Rhat = 1.000) and the indices are reliable (ESS = 5300)
The effect of n hangover (Median = 4.55e-03, 95% CI [2.38e-03, 6.74e-03]) has a 100.00% probability of being positive (> 0), 0.00% of being significant (> 0.02), and 0.00% of being large (> 0.13). The estimation successfully converged (Rhat = 0.999) and the indices are reliable (ESS = 4881)

Following the Sequential Effect eXistence and sIgnificance Testing (SEXIT) framework, we report the median of the posterior distribution and its 95% CI (Highest Density Interval), along the probability of direction (pd), the probability of significance and the probability of being large. The thresholds beyond which the effect is considered as significant (i.e., non-negligible) and large are |0.02| and |0.13| (corresponding respectively to 0.05 and 0.30 of the outcome’s SD). Convergence and stability of the Bayesian sampling has been assessed using R-hat, which should be below 1.01 (Vehtari et al., 2019), and Effective Sample Size (ESS), which should be greater than 1000 (Burkner, 2017).

Hier wird das sog. SEXIT-Framework verwendet.

3.7 Tabellen von R nach Word

3.7.1 Rmd-Dokument als Word-Dokument ausgeben lassen.

Vielleicht die einfachste Möglichkeit: Erstellen Sie Ihre Analyse nicht als .R-Daten, sondern als .Rmd-Datei. Dann “knittern” Sie die Analyse als Word-Dokument. Fertig!

Um eine schicke Tabelle zu bekommen, nutzen Sie die den Befehl kable(). Den wenden Sie auf einen Dataframe an, der als schicke Tabelle auf die Welt kommen soll.

3.7.2 Rmd-Dokument copy-pasten

Sie können auch Folgendes tun: Erstellen Sie Ihre Analyse als Rmd-Datei; knittern Sie das Dokument. Sie erhalten dann ein HTML-Dokument, aus dem Sie problemlos copy-pasten können (von HTML nach Word).

3.7.3 Ich will kein Rmd!

Wenn Sie (partout) nicht mit Rmd-Dateien arbeiten wollen, hilft Ihnen das R-Paket flextable:

library(flextable)

meine_flex_tab <- flextable(lm1_tab)

meine_flex_tab

term	estimate	std_error	statistic	p_value	lower_ci	upper_ci
intercept	2.884	0.020	147.789	0.000	2.846	2.922
n_facebook_friends	0.000	0.000	0.655	0.513	0.000	0.000
n_hangover	0.005	0.001	4.085	0.000	0.002	0.007

Dieses Objekt können Sie dann als .docx abspeichern:

save_as_docx("Tabelle 1 " = meine_flex_tab, path = "meine_tab.docx")

2 Daten aufräumen

4 Fallstudie Popup-Stores