Plotting skewed distributions


Let’s plot some skewed stuff, ahem, distributions!

Actually, the point I initially wanted to make is this: with skewed distributions, don’t use means. Or at least be very aware that (arithmetic) means can be grossly misleading. But for today, let’s focus on drawing skewed distributions.
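A minimal base-R illustration of the point about means uses the exponential distribution with `rate = 0.1`, the same parameter used further below. Its mean is 1/rate = 10. Its median is only log(2)/rate, about 6.93. About 63 % of the probability mass (1 - exp(-1)) lies below the mean, so the mean is not a "typical" value. The seed and sample size are arbitrary choices for this sketch.

```r
# Exponential distribution with rate = 0.1 (right-skewed)
rate <- 0.1

# Theoretical mean and median
theo_mean   <- 1 / rate                 # mean of an exponential is 1 / rate
theo_median <- qexp(0.5, rate = rate)   # median is log(2) / rate

# Share of the probability mass below the mean: 1 - exp(-1)
pexp(theo_mean, rate = rate)

# The same picture from random draws (seed chosen arbitrarily)
set.seed(42)
draws <- rexp(n = 10000, rate = rate)
c(mean = mean(draws), median = median(draws))
```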

Some packages:

library(tidyverse)
library(fGarch) # for dsnorm() and rsnorm(), the skewed normal distribution

Some skewed distributions include:

  • “polluted” normal distributions, i.e., mixtures of two normals
  • Exponential distributions
  • Gamma distributions
  • Beta distributions
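Base R has no built-in function for a "polluted" normal, so here is a small sketch of one. The helper names `d_polluted_normal()` and `r_polluted_normal()` are hypothetical, and so are the default parameters: 10 % pollution from N(5, 1) mixed into N(0, 1). With probability 1 - p a value comes from the first normal, otherwise from the second. The density is the correspondingly weighted sum of the two normal densities.

```r
# Hypothetical helpers (not from any package): a two-component normal mixture.
# Density: (1 - p) * N(mean1, sd1) + p * N(mean2, sd2)
d_polluted_normal <- function(x, p = 0.1, mean1 = 0, sd1 = 1, mean2 = 5, sd2 = 1) {
  (1 - p) * dnorm(x, mean1, sd1) + p * dnorm(x, mean2, sd2)
}

# Random draws: each value comes from the "pollution" component with probability p
r_polluted_normal <- function(n, p = 0.1, mean1 = 0, sd1 = 1, mean2 = 5, sd2 = 1) {
  polluted <- runif(n) < p   # TRUE for draws from the second component
  ifelse(polluted, rnorm(n, mean2, sd2), rnorm(n, mean1, sd1))
}

# Sanity check: the mixture density integrates to (approximately) 1
integrate(d_polluted_normal, -Inf, Inf)

# The long right tail pulls the mean above the median
set.seed(1)
x <- r_polluted_normal(10000)
c(mean = mean(x), median = median(x))
```

Because `d_polluted_normal()` is an ordinary vectorised density function, it should also work as `fun =` in `stat_function()`, like the built-in densities.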

One way to visualize them is to draw their density curves, i.e., their functional (analytical) form:

tibble(
  x = seq(-10, 10, .05)   # only sets the range of the x axis
) %>% 
  ggplot +
  aes(x) +
  # Beta(4, 2) is skewed (Beta(4, 4) would be symmetric); it lives on [0, 1],
  # so evaluate it at more points than the default n = 101
  stat_function(fun = dbeta, args = list(shape1 = 4, shape2 = 2), color = "red", n = 1001) +
  stat_function(fun = dexp, args = list(rate = .10), color = "green") +
  stat_function(fun = dsnorm, args = list(mean = 0, sd = 3, xi = 7.5), color = "blue") +
  stat_function(fun = dgamma, args = list(shape = 2, scale = 2), color = "orange") +
  coord_cartesian(ylim = c(0, .5))

Plot: density curves of the Beta (red), exponential (green), skewed normal (blue) and Gamma (orange) distributions.
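The curves also show the usual ordering for a right-skewed distribution: mode < median < mean. Below is a small base-R check for the Gamma curve (shape = 2, scale = 2). Its mode is (shape - 1) × scale = 2 and its mean is shape × scale = 4. The grid is an arbitrary choice for this sketch.

```r
# Gamma distribution with shape = 2, scale = 2, as in the plot above
shape <- 2
scale <- 2

# Locate the peak (mode) of the density numerically on a fine grid
grid <- seq(0, 30, by = 0.001)
dens <- dgamma(grid, shape = shape, scale = scale)
mode_numeric <- grid[which.max(dens)]

c(mode   = mode_numeric,                               # analytically (shape - 1) * scale
  median = qgamma(0.5, shape = shape, scale = scale),  # lies between mode and mean
  mean   = shape * scale)                              # analytically shape * scale
```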

Second, we could draw random samples from each distribution. Then we get not smooth curves but more “realistic”, zigzag histograms (or density plots).

set.seed(42)  # make the random draws reproducible
df <- tibble(
  skewed_normal = rsnorm(n = 1000, mean = 0, sd = 18, xi = 130),
  exp_distrib = rexp(n = 1000, rate = .1),
  gamma_distrib = rgamma(n = 1000, shape = 2, scale = 2),
  beta_distrib = rbeta(n = 1000, shape1 = 4, shape2 = 2)
)
# One color per distribution, matching the density curves above
mypal <- c(beta_distrib = "red", exp_distrib = "green",
           skewed_normal = "blue", gamma_distrib = "orange")

df %>% 
  gather(key = distribution, value = value) %>% 
  ggplot +
  aes(x = value) +
  geom_histogram(aes(fill = distribution), bins = 30) +
  facet_wrap(~distribution) +
  scale_fill_manual(values = mypal) +
  labs(title = "Histogram of random draws from different distributions")
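What makes the histograms "zigzag" is sampling noise in the bin counts. A base-R sketch compares the observed counts of Gamma draws with the counts the smooth density predicts (n times the probability of each bin). Note that `hist()` chooses "pretty" break points, so its bins will not match those of `geom_histogram()` exactly. The seed is an arbitrary choice.

```r
# Random draws from the Gamma distribution used above
set.seed(123)
n <- 1000
draws <- rgamma(n, shape = 2, scale = 2)

# Bin the draws without plotting (about 30 bins, like geom_histogram's default)
h <- hist(draws, breaks = 30, plot = FALSE)

# Counts expected in each bin from the smooth density: n * P(bin)
expected <- n * diff(pgamma(h$breaks, shape = 2, scale = 2))

# Observed vs expected counts for the first few bins
head(data.frame(lower    = head(h$breaks, -1),
                observed = h$counts,
                expected = round(expected, 1)))
```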

Note that the support of the Beta distribution is [0, 1]. That’s why its histogram shows up as a narrow red bar, while the other distributions spread out much more widely. Zooming in on the Beta draws:

df %>% 
  gather(key = distribution, value = value) %>% 
  dplyr::filter(distribution == "beta_distrib") %>% 
  ggplot +
  aes(x = value) +
  geom_histogram(fill = "red", binwidth = 0.025) +  # explicit bin width on the [0, 1] scale
  labs(title = "Histogram of random draws from a Beta distribution")

Plot: histogram of random draws from a Beta distribution.
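Beta(4, 2), as used for the random draws, is skewed to the left: the mass piles up near 1 and the tail points towards 0. So the ordering flips to mean < median < mode. The mean is a / (a + b) and the mode is (a - 1) / (a + b - 2) for a, b > 1. The seed below is an arbitrary choice.

```r
# Beta distribution with shape1 = 4, shape2 = 2, as in the random draws above
a <- 4
b <- 2

beta_mean   <- a / (a + b)              # 2/3
beta_mode   <- (a - 1) / (a + b - 2)    # 3/4 (valid for a, b > 1)
beta_median <- qbeta(0.5, shape1 = a, shape2 = b)

c(mean = beta_mean, median = beta_median, mode = beta_mode)

# All draws stay inside the support [0, 1]
set.seed(2017)
draws <- rbeta(1000, shape1 = a, shape2 = b)
range(draws)
```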
