# Shading multiple areas under normal curve

## August 30, 2016

When plotting a normal curve, it is often helpful to color (or shade) some segments. For example, often we might want to indicate whether an absolute value is greater than 2.

How can we achieve this with ggplot2? Here is one way.

First, load packages and define some constants. Specifically, we define mean, sd, and start/end (z-) value of the area we want to shade. And your favorite color is defined.

library(ggplot2)
library(dplyr)

##
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
##
##     filter, lag

## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union

mean.1 <-0
sd.1 <- 1
zstart <- 2
zend <- 3
zcritical <- 1.65

my_col <- "#00998a"


Next, we build a sequence from 3sd left to 3sd right to the mean. Along this sequence (for each value) we will compute the density of the normal curve. The data will be used for plotting the curve and the shaded area(s).

x <- seq(from = mean.1 - 3*sd.1, to = mean.1 + 3*sd.1, by = .01)

MyDF <- data.frame(x = x, y = dnorm(x, mean = mean.1, sd = sd.1))


shade_curve <- function(MyDF, zstart, zend, fill = "red", alpha = .5){
geom_area(data = subset(MyDF, x >= mean.1 + zstart*sd.1
& x < mean.1 + zend*sd.1),
aes(y=y), fill = fill, color = NA, alpha = alpha)
}


Now we plot it:

p1a <- ggplot(MyDF, aes(x = x, y = y)) + geom_line() +
shade_curve(MyDF = MyDF, zstart = -1, zend = 1, fill = my_col, alpha = .3) +
shade_curve(MyDF = MyDF, zstart = 1, zend = 2, fill = my_col, alpha = .5) +
shade_curve(MyDF = MyDF, zstart = -2, zend = -1, fill = my_col, alpha = .5) +
shade_curve(MyDF = MyDF, zstart = 2, zend = 6, fill = my_col, alpha = .7) +
shade_curve(MyDF = MyDF, zstart = -3, zend = -2, fill = my_col, alpha = .7) +
scale_x_continuous(breaks = -3:3) +
scale_y_continuous(breaks = NULL) +
theme_classic() +
ylab("") + xlab("")

p1a


OK. Another nice feature would be have printed the cumulative percentages for each shaded segment.

For that purpose, let’s add a variable with the cumulative density.

MyDF %>%
mutate(y_cdf = cumsum(y)) -> MyDF


But we are only interested in some quantiles. So let’s filter these and kick out the rest.

MyDF %>%
filter(x %in% c(-3, -2.58, -2, -1.65, -1, -.5, 0, .5, 1, 1.65, 2, 2.58, 3)) -> MyDF_filtered


Now, let’s add the cumulative percentages for some quantiles of interest.

p1a + geom_text(data = MyDF_filtered,
aes(x = x, y = y + .1, label = paste(round(y_cdf, 0),"%")),
check_overlap = TRUE) +
geom_segment(data = MyDF_filtered,
aes(x = x, xend = x, y = 0, yend = y), linetype = "dashed")


### New bar stacking with ggplot 2.2.0

Recently, ggplot2 2.2.0 was released. Among other news, stacking bar plot was improved. Here is a short demonstration.Load libraries`...… Continue reading

#### Crashkurs zur Erstellung von Barplots für Umfrage-Daten

Published on November 13, 2016

#### Some thoughts (and simulation) on overfitting

Published on November 12, 2016