I kept wondering how to plot two R plots side by side (ie., in one “row”) in a .Rmd chunk. Here’s a way, well, actually a number of ways, some good, some … not.

library(tidyverse)
library(gridExtra)
library(grid)
library(png)
library(downloader)
library(grDevices)


data(mtcars)

Plots from ggplot

Say you have two plots from ggplot2, and you would like to put them next to each other, side by side (not underneath each other):


ggplot(mtcars) +
  aes(x = hp, y = mpg) +
  geom_point() -> p1

ggplot(mtcars) +
  aes(x = factor(cyl), y = mpg) +
  geom_boxplot() +
  geom_smooth(aes(group = 1), se = FALSE) -> p2

grid.arrange(p1, p2, ncol = 2)

plot of chunk p-test

So, grid.arrange is the key.
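As a side note, grid.arrange() also accepts relative column widths, which is handy when one plot deserves more space. A small sketch (re-creating the two plots so the chunk stands alone):

```r
library(ggplot2)
library(gridExtra)

p1 <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# widths = c(2, 1): the scatter plot gets two thirds of the row
grid.arrange(p1, p2, ncol = 2, widths = c(2, 1))
```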

Plots from png files

comb2pngs <- function(imgs, bottom_text = NULL){
  img1 <-  grid::rasterGrob(as.raster(readPNG(imgs[1])),
                            interpolate = FALSE)
  img2 <-  grid::rasterGrob(as.raster(readPNG(imgs[2])),
                            interpolate = FALSE)
  grid.arrange(img1, img2, ncol = 2, bottom = bottom_text)
}

The code of this function was inspired by code from Ben from this SO post.

Now, let’s load two pngs and then call the function above.

png1_path <- "https://sebastiansauer.github.io/images/2016-08-30-03.png"
png2_path <- "https://sebastiansauer.github.io/images/2016-08-31-01.png"


png1_dest <- "https://sebastiansauer.github.io/images/2017-10-12/img1.png"
png2_dest <- "https://sebastiansauer.github.io/images/2017-10-12/img2.png"


#download(png1_path, destfile = png1_dest)
#download(png2_path, destfile = png2_dest)

comb2pngs(c(png1_dest, png2_dest))

plot of chunk unnamed-chunk-3

This works, it produces two plots from png files side by side.

Two plots side by side, the knitr way. Does not work.

But what about the standard knitr way?

knitr::include_graphics(c(png1_dest,png2_dest))

<img src="https://sebastiansauer.github.io/images/2017-10-12/img1.png" title="plot of chunk unnamed-chunk-4" alt="plot of chunk unnamed-chunk-4" width="30%" style="display: block; margin: auto;" /><img src="https://sebastiansauer.github.io/images/2017-10-12/img2.png" title="plot of chunk unnamed-chunk-4" alt="plot of chunk unnamed-chunk-4" width="30%" style="display: block; margin: auto;" />

Does not work.

Maybe with only one value for out.width?

knitr::include_graphics(c(png1_dest, png2_dest))

plot of chunk unnamed-chunk-5plot of chunk unnamed-chunk-5

Nope. Does not work either, despite some saying so.

Maybe two times include_graphics?

imgs <- c(png1_dest, png2_dest)
imgs
#> [1] "https://sebastiansauer.github.io/images/2017-10-12/img1.png"
#> [2] "https://sebastiansauer.github.io/images/2017-10-12/img2.png"

knitr::include_graphics(png1_dest);  knitr::include_graphics(png2_dest)

plot of chunk unnamed-chunk-6plot of chunk unnamed-chunk-6

An insight into why include_graphics fails

No avail. Looking at the html code in the md-file produced by the knitr call shows one interesting point: all these versions of include_graphics produce the same code. And all have this style="display: block; margin: auto;" part in it, which obviously creates the problem. I am unsure how to convince include_graphics to drop this argument. I tried some versions of the chunk option fig.show = "hold", but to no avail.
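For reference, the kind of chunk setup I experimented with looked roughly like the following sketch; the chunk header is shown as a comment, since it lives in the .Rmd source (and, again, it did not remove the centering style in my setup):

```r
# Sketch of the chunk options tried (assumption: these should place the
# images side by side). The chunk header in the .Rmd would read:
# ```{r, fig.show = "hold", out.width = "50%", fig.align = "default"}
png1_dest <- "https://sebastiansauer.github.io/images/2017-10-12/img1.png"
png2_dest <- "https://sebastiansauer.github.io/images/2017-10-12/img2.png"
knitr::include_graphics(c(png1_dest, png2_dest))
```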

Plain markdown works

Try this code: ![](https://sebastiansauer.github.io/images/2017-10-12/img1.png){ width=30% } ![](https://sebastiansauer.github.io/images/2017-10-12/img2.png){ width=40% }. The two ![]() commands need not appear in one row. However, no new paragraph may separate them (no blank line in between; otherwise the images will appear one below the other).


Works. But the markdown way does not give the full comfort and power of knitr chunks. So, that’s not quite perfect.

Conclusion

A partial solution is there, but it’s not optimal. There will most probably be alternatives, for example using plain html or LaTeX. But it’s a pity that the include_graphics call does not work as expected (by me).
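For completeness, the plain-html alternative mentioned above could look roughly like this sketch (untested across output formats; the width values are arbitrary, and raw html of course only survives in html output):

```html
<!-- Two images in one row; html output only -->
<p>
  <img src="https://sebastiansauer.github.io/images/2017-10-12/img1.png" width="45%" />
  <img src="https://sebastiansauer.github.io/images/2017-10-12/img2.png" width="45%" />
</p>
```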

There is the idea that the alt-right German party AfD is followed by those who are deprived of chances, those fearing to fall down the social ladder, and so on. Let’s test this hypothesis. No, I am not thinking of hypothesis testing, p-values, and stuff. Rather, let’s color a map of German election districts (Wahlkreise) according to whether the area is poor AND the AfD gained a lot of votes (and vice versa: the area is rich AND the AfD gained relatively few votes). More specifically, let’s look at unemployment ratios and incomes in the different election areas of the country and compare those figures to the AfD election results.

In a similar vein, one could put forward the “unemployment-AfD-accordance hypothesis”: the unemployment rank order of a given area is equal (or similar) to its AfD vote rank order. Let’s test this hypothesis in this post.

Packages

library(sf)
library(stringr)
library(tidyverse)
library(readxl)
library(magrittr)
library(huxtable)
library(broom)

Geo data

In this post, we try to map election areas of Germany (“Wahlkreise”) to three types of statistical data:

  • AfD election results
  • unemployment ratio and/or
  • income

:attention: The election areas are not identical to the administrative districts (as far as I know, at least not completely identical). So we will need to get some special geo data. This geo data is available here and via the other links on that page.

Download and unzip the data; store them in an appropriate folder. Adjust the path to your needs:

my_path_wahlkreise <- "~/Documents/datasets/geo_maps/btw17_geometrie_wahlkreise_shp/Geometrie_Wahlkreise_19DBT.shp"
file.exists(my_path_wahlkreise)
## [1] TRUE
wahlkreise_shp <- st_read(my_path_wahlkreise)
## Reading layer `Geometrie_Wahlkreise_19DBT' from data source `/Users/sebastiansauer/Documents/datasets/geo_maps/btw17_geometrie_wahlkreise_shp/Geometrie_Wahlkreise_19DBT.shp' using driver `ESRI Shapefile'
## Simple feature collection with 299 features and 4 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 280387.7 ymin: 5235855 xmax: 921025.5 ymax: 6101444
## epsg (SRID):    NA
## proj4string:    +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs
glimpse(wahlkreise_shp)
## Observations: 299
## Variables: 5
## $ WKR_NR    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1...
## $ LAND_NR   <fctr> 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 01, 13, 13,...
## $ LAND_NAME <fctr> Schleswig-Holstein, Schleswig-Holstein, Schleswig-H...
## $ WKR_NAME  <fctr> Flensburg – Schleswig, Nordfriesland – Dithmarschen...
## $ geometry  <simple_feature> MULTIPOLYGON (((543474.9057..., MULTIPOLY...
wahlkreise_shp %>%
  ggplot() +
  geom_sf()

plot of chunk unnamed-chunk-3

That was easy, right? The sf package fits nicely with the tidyverse, hence not much to learn in that regard. I am not saying that geo data is simple, quite the contrary. But luckily, the R functions fit into a well-known schema.

For instance, let’s remove the axis labels and let’s fill the country with some different color. Hm, what’s the color of Germany? Grey? Black (color of the leading party)? Black-red-gold?

wahlkreise_shp %>%
  ggplot() +
  geom_sf(fill = "grey40") +
  theme_void()

plot of chunk unnamed-chunk-4

Ok, that’s complete. How many of those “Wahlkreise” do we have? 299, it appears (that’s the number of rows in the dataframe). That’s not exactly the number of administrative districts in Germany, which is 401.

Unemployment ratios

These data can as well be fetched from the same site as above. As mentioned, we need to make sure that we have the statistics according to the election areas, not the administrative areas.

unemp_file <- "~/Documents/datasets/Strukturdaten_De/btw17_Strukturdaten-utf8.csv"

file.exists(unemp_file)
## [1] TRUE
unemp_de_raw <- read_delim(unemp_file,
    ";", escape_double = FALSE,
    locale = locale(decimal_mark = ",",
        grouping_mark = "."),
    trim_ws = TRUE,
    skip = 8)  # skip the first 8 rows
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   Land = col_character(),
##   `Wahlkreis-Nr.` = col_integer(),
##   `Wahlkreis-Name` = col_character(),
##   `Gemeinden am 31.12.2015 (Anzahl)` = col_integer(),
##   `Verfügbares Einkommen der privaten Haushalte 2014 (€ je Einwohner)` = col_integer(),
##   `Bruttoinlandsprodukt 2014 (€ je Einwohner)` = col_integer(),
##   `Absolventen/Abgänger beruflicher Schulen 2015` = col_character(),
##   `Sozialversicherungspflichtig Beschäftigte am 30.06.2016 - Land- und Forstwirtschaft, Fischerei (%)` = col_character(),
##   `Sozialversicherungspflichtig Beschäftigte am 30.06.2016 - Produzierendes Gewerbe (%)` = col_character(),
##   `Sozialversicherungspflichtig Beschäftigte am 30.06.2016 - Handel, Gastgewerbe, Verkehr (%)` = col_character(),
##   `Sozialversicherungspflichtig Beschäftigte am 30.06.2016 - Öffentliche und private Dienstl eister (%)` = col_character(),
##   Fußnoten = col_character()
## )
## See spec(...) for full column specifications.
#glimpse(unemp_de_raw)

Jeez, we need to do some cleansing before we can work with this dataset.

unemp_names <- names(unemp_de_raw)

unemp_de <- unemp_de_raw

names(unemp_de) <- paste0("V",1:ncol(unemp_de))

The important columns are:

unemp_de <- unemp_de %>%
  rename(state = V1,
         area_nr = V2,
         area_name = V3,
         for_prop = V8,
         pop_move = V11,
         pop_migr_background = V19,
         income = V26,
         unemp = V47)  # total, as to March 2017
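As a side note, the unemp_names vector saved above makes it possible to check which original (German) column a generic V-name stands for. A small hypothetical helper (lookup_vname is my own name, not from any package):

```r
# Map a generic V-name such as "V47" back to the original column name.
# Sketch; relies only on base R.
lookup_vname <- function(v, names_vec) {
  idx <- as.integer(sub("^V", "", v))  # "V47" -> 47
  names_vec[idx]
}
# e.g. lookup_vname("V47", unemp_names) returns the original (long) name
# of the unemployment column
```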

AfD election results

Again, we can access the data from the same source, the Bundeswahlleiter, here. I have prepared the column names and the data structure to render the file more accessible to machine parsing. Data points were not altered. You can access my version of the file here.

elec_file <- "~/Documents/datasets/Strukturdaten_De/btw17_election_results.csv"
file.exists(elec_file)
## [1] TRUE
elec_results <- read_csv(elec_file)
## Parsed with column specification:
## cols(
##   .default = col_integer(),
##   Gebiet = col_character(),
##   Waehler_Erststimmen_vorlaeufig = col_double(),
##   Waehler_Zweitsimmen_vorlaeufig = col_double(),
##   Waehler_ungueltige_Zweitstimmen_vorlauefig = col_double(),
##   CDU_Erststimmen_Vorperiode = col_character(),
##   CDU_Zweitstimmen_Vorperiode = col_double(),
##   Gruene3 = col_double(),
##   Gruene4 = col_double(),
##   AfD3 = col_double(),
##   AfD4 = col_double(),
##   Piraten4 = col_double(),
##   Vorperiode__3 = col_double(),
##   Vorläufig__4 = col_double(),
##   Vorperiode__18 = col_character(),
##   Vorperiode__20 = col_character(),
##   Vorläufig__22 = col_character(),
##   Vorperiode__22 = col_character(),
##   Vorperiode__23 = col_character(),
##   Vorperiode__24 = col_character(),
##   Vorperiode__25 = col_character()
##   # ... with 46 more columns
## )
## See spec(...) for full column specifications.

For each party, four values are reported:

  1. primary vote, present election
  2. primary vote, previous election
  3. secondary vote, present election
  4. secondary vote, previous election

The secondary vote refers to the party; that’s what we are interested in (column 46). The primary vote refers to the candidate of that area; the primary vote may also be of similar interest, but that’s a slightly different question, as it taps more into the approval of a person rather than of a party (of course, there’s a lot of overlap between both in this situation).

# names(elec_results)
afd_prop <- elec_results %>%
  select(1, 2, 46, 18) %>%
  rename(afd_votes = AfD3,
         area_nr = Nr,
         area_name = Gebiet,
         votes_total = Waehler_gueltige_Zweitstimmen_vorlauefig) %>%
  mutate(afd_prop = afd_votes / votes_total) %>%
  na.omit

In the previous step, we have selected the columns of interest, changed their name (shorter, English), and have computed the proportion of (valid) secondary votes in favor of the AfD.

Match unemployment and income to AfD votes for each Wahlkreis

wahlkreise_shp %>%
  left_join(unemp_de, by = c("WKR_NR" = "area_nr")) %>%
  left_join(afd_prop, by = "area_name") -> chloro_data
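After joins like these, it can pay to check for unmatched keys, since they silently become NA. A toy sketch of the effect (made-up data, not the election data):

```r
library(dplyr)

# Key "B" exists on the left only, so its afd_prop comes back as NA:
left  <- tibble(area_name = c("A", "B", "C"))
right <- tibble(area_name = c("A", "C"), afd_prop = c(0.1, 0.2))

joined <- left_join(left, right, by = "area_name")
sum(is.na(joined$afd_prop))
## [1] 1
```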

Plot geo map with AfD votes

chloro_data %>%
  ggplot() +
  geom_sf(aes(fill = afd_prop)) -> p1
p1

plot of chunk unnamed-chunk-11

We might want to play with the fill color, or clean up the map (remove the axes etc.).

p1 + scale_fill_distiller(palette = "Spectral") +
  theme_void()

plot of chunk unnamed-chunk-12

Geo map (of election areas) with unemployment

chloro_data %>%
  ggplot() +
  geom_sf(aes(fill = unemp)) +
  scale_fill_distiller(palette = "Spectral") +
  theme_void() -> p2
p2

plot of chunk unnamed-chunk-13

Concordance of AfD results and unemployment/income

Let’s compute the percent rank for each of the variables of interest (AfD votes, unemployment ratio, and income). Then we can compute the concordance for each pair by simply computing the difference (or maybe the absolute difference). After that, we plot this “concordance variable” as fill color on the map.
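To make the rank-difference logic concrete, here is a minimal toy example with made-up numbers (not the real data):

```r
library(dplyr)

toy <- tibble(
  area  = c("A", "B", "C", "D", "E"),
  afd   = c(0.10, 0.25, 0.15, 0.30, 0.20),
  unemp = c(4, 9, 5, 6, 8)
)

toy %>%
  mutate(afd_rank   = percent_rank(afd),
         unemp_rank = percent_rank(unemp),
         diff       = afd_rank - unemp_rank)
# Area D ends up with diff = 0.5 (much more AfD than its unemployment rank
# suggests); areas A and C sit at 0, in line with the accordance hypothesis.
```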

chloro_data %>%
  mutate(afd_rank = percent_rank(afd_prop),
         unemp_rank = percent_rank(unemp),
         income_rank = percent_rank(income)) %>%
  mutate(afd_income_diff = subtract(afd_rank, income_rank),
         afd_unemp_diff = subtract(afd_rank, unemp_rank)) -> chloro_data

Let’s check the first ranks for each of the variables of interest. AfD ranks first:

chloro_data %>%
  as.data.frame %>%
  select(area_name, afd_rank, afd_prop, unemp_rank, income_rank) %>%
  arrange(-afd_rank) %>%
  slice(1:5)
## # A tibble: 5 x 5
##                          area_name  afd_rank  afd_prop unemp_rank
##                              <chr>     <dbl>     <dbl>      <dbl>
## 1 Sächsische Schweiz-Osterzgebirge 1.0000000 0.3546391  0.5637584
## 2                          Görlitz 0.9966443 0.3288791  0.9463087
## 3                           Meißen 0.9932886 0.3287724  0.7114094
## 4                        Bautzen I 0.9899329 0.3276397  0.6476510
## 5                    Mittelsachsen 0.9865772 0.3124584  0.5872483
## # ... with 1 more variables: income_rank <dbl>

Saxony leads. Unemployment “top” places:

chloro_data %>%
  as.data.frame %>%
  select(area_name, afd_prop, unemp_rank, income_rank) %>%
  arrange(-unemp_rank) %>%
  slice(1:5)
## # A tibble: 5 x 4
##                                    area_name  afd_prop unemp_rank
##                                        <chr>     <dbl>      <dbl>
## 1                              Gelsenkirchen 0.1703539  1.0000000
## 2                                 Duisburg I 0.1132661  0.9932886
## 3                                Duisburg II 0.1542772  0.9932886
## 4 Vorpommern-Rügen – Vorpommern-Greifswald I 0.1964088  0.9832215
## 5                                   Essen II 0.1496040  0.9832215
## # ... with 1 more variables: income_rank <dbl>

The Ruhrpott is ahead of this sad pack. And the lowest unemployment ranks are at:

chloro_data %>%
  as.data.frame %>%
  select(area_name, afd_prop, unemp_rank, income_rank) %>%
  arrange(unemp_rank) %>%
  slice(1:5)
## # A tibble: 5 x 4
##            area_name  afd_prop  unemp_rank income_rank
##                <chr>     <dbl>       <dbl>       <dbl>
## 1 Erding – Ebersberg 0.1191024 0.000000000   0.9496644
## 2           Freising 0.1356325 0.003355705   0.7751678
## 3         Donau-Ries 0.1470312 0.003355705   0.8255034
## 4         Ingolstadt 0.1513400 0.010067114   0.5973154
## 5            Neu-Ulm 0.1511712 0.010067114   0.8456376

And finally income, bottom 5 and top 5:

chloro_data %>%
  as.data.frame %>%
  select(area_name, afd_prop, unemp_rank, income_rank) %>%
  arrange(income_rank) %>%
  slice(c(1:5, 294:299))
## # A tibble: 11 x 4
##                             area_name   afd_prop unemp_rank income_rank
##                                 <chr>      <dbl>      <dbl>       <dbl>
##  1                      Gelsenkirchen 0.17035387 1.00000000 0.000000000
##  2                          Leipzig I 0.20813820 0.79530201 0.003355705
##  3                         Leipzig II 0.15977744 0.79530201 0.003355705
##  4                              Halle 0.17787876 0.92281879 0.010067114
##  5                         Duisburg I 0.11326607 0.99328859 0.013422819
##  6 Bad Tölz-Wolfratshausen – Miesbach 0.11688174 0.04026846 0.983221477
##  7                        Main-Taunus 0.10320408 0.20805369 0.986577181
##  8                         Hochtaunus 0.11155599 0.23825503 0.989932886
##  9      Starnberg – Landsberg am Lech 0.09900204 0.04026846 0.993288591
## 10                          Heilbronn 0.16416233 0.26174497 0.996644295
## 11                       München-Land 0.09399476 0.03355705 1.000000000

Now plot.

chloro_data %>%
  ggplot() +
  geom_sf(aes(fill = afd_unemp_diff)) +
  scale_fill_gradient2() +
  theme_void() -> p3
p3

plot of chunk unnamed-chunk-19

The fill color denotes the difference between the unemployment rank of a given area and its AfD vote rank. For example, if area X has an unemployment rank of .5 (50%), half of the areas in the country have a lower unemployment ratio and half a higher one (the median position). Similarly, an AfD vote rank of .5 indicates the median position. The difference of these two figures is zero, indicating accordance, or a close match. Thus, figures around zero denote accordance. An AfD vote rank of 1 (100%) marks the area with the best AfD result (the most votes); similar reasoning applies to income and unemployment ratio.

Hence, numbers greater than zero indicate that the AfD scored better than would be expected by the accordance-hypothesis.

Similarly, numbers smaller than zero indicate that the AfD scored worse than would be expected by the accordance-hypothesis.

Areas with (near) white filling provide some support for the accordance hypothesis. There are areas of this type, but they are not the majority. The vast majority of areas showed too much or too little AfD support relative to their unemployment ratio.

This reasoning shows that the AfD received better results in the southern and middle areas of Germany than would be expected by the accordance hypothesis. In contrast, the poorer northern areas voted for the AfD much less often than would be expected by the accordance hypothesis.

Let’s look at the areas with minimal and maximal dis-accordance, out of curiosity.

chloro_data %>%
  as.data.frame %>%
  select(area_name, afd_unemp_diff, unemp, afd_prop) %>%
  arrange(afd_unemp_diff) %>%
  slice(c(1:5, 295:299)) %>% hux %>%
  add_colnames
area_name afd_unemp_diff unemp afd_prop
Essen III -0.85 11.90 0.08
Berlin-Friedrichshain-Kreuzberg – Prenzlauer Berg Ost -0.83 9.40 0.06
Köln II -0.80 8.50 0.05
Kiel -0.78 8.80 0.07
Bremen I -0.77 9.80 0.08
Deggendorf 0.71 3.60 0.19
Rottal-Inn 0.76 3.10 0.17
Donau-Ries 0.77 2.20 0.15
Neu-Ulm 0.78 2.50 0.15
Ingolstadt 0.78 2.50 0.15

And areas with high accordance (diff score close to zero):

chloro_data %>%
  as.data.frame %>%
  select(area_name, afd_unemp_diff, unemp, afd_prop) %>%
  arrange(afd_unemp_diff) %>%
  filter(afd_unemp_diff > -0.05, afd_unemp_diff < .05) %>%
  hux %>%
  add_colnames
area_name afd_unemp_diff unemp afd_prop
Rostock – Landkreis Rostock II -0.04 8.90 0.16
Offenbach -0.03 6.60 0.12
Berlin-Lichtenberg -0.03 9.40 0.17
Rhein-Sieg-Kreis I -0.03 5.20 0.10
Mosel/Rhein-Hunsrück -0.03 4.10 0.09
Berlin-Treptow-Köpenick -0.02 9.40 0.17
Herzogtum Lauenburg – Stormarn-Süd -0.02 4.80 0.10
Helmstedt – Wolfsburg -0.01 5.80 0.11
Oberbergischer Kreis -0.01 5.50 0.11
Kreuznach -0.01 6.40 0.12
Herford – Minden-Lübbecke II -0.01 5.70 0.11
Mansfeld -0.01 10.20 0.24
Siegen-Wittgenstein 0.00 5.40 0.11
Minden-Lübbecke I 0.01 5.30 0.11
Heidelberg 0.01 4.30 0.09
Leipzig II 0.01 8.30 0.16
Neuwied 0.02 5.30 0.11
Osterholz – Verden 0.03 4.40 0.10
Prignitz – Ostprignitz-Ruppin – Havelland I 0.03 8.90 0.19
Dessau – Wittenberg 0.04 9.10 0.20
Elbe-Elster – Oberspreewald-Lausitz II 0.04 9.60 0.25

A similar story holds for income.

chloro_data %>%
  ggplot() +
  geom_sf(aes(fill = afd_income_diff)) +
  scale_fill_gradient2() +
  theme_void() -> p4
p4

plot of chunk unnamed-chunk-22

The map shows a clear pattern: the eastern parts of Germany are far more AfD-oriented than their income rank would predict (diff scores above zero, blue color). However, for some areas across the rest of the country, the reverse pattern holds: a lot of areas are rich and do not vote for the AfD (reddish color, diff score below zero). And, thirdly, a lot of areas support the accordance hypothesis (white color, diff score around zero).

A simpler map

Maybe we should simplify the map: let’s only distinguish three types of areas: too much AfD in comparison to the unemployment, too little AfD for the unemployment, or AfD on par with unemployment. Maybe the picture is more clear-cut then.

chloro_data %>%
  select(afd_unemp_diff) %>%
  mutate(afd_unemp_diff_3g = cut_interval(afd_unemp_diff, n = 3,
         labels = c("AFD < Arbeitslosigkeit",
                    "AFD = Arbeitslosigkeit",
                    "AFD > Arbeitslosigkeit"))) %>%
  ggplot() +
  geom_sf(aes(fill = afd_unemp_diff_3g)) +
  labs(fill = "") +
  theme_void()

plot of chunk unnamed-chunk-23

“AfD density”

In a similar vein, we could compute the ratio of AfD votes to unemployment. That would give us some measure of covariability. Note that afd_prop is a proportion (0 to 1) while unemp is a percentage, so the absolute value of this ratio is arbitrary; only comparisons across areas are meaningful. Let’s see.

library(viridis)
chloro_data %>%
  mutate(afd_dens = afd_prop / unemp) %>%
  ggplot +
  geom_sf(aes(fill = afd_dens)) +
  theme_void() +
  scale_fill_viridis()

plot of chunk unnamed-chunk-24

The diagram shows that, in relation to unemployment, the AfD votes are strongest in central Bavaria (Oberbayern). Don’t forget that this measure is an indication of co-occurrence, not of absolute AfD votes.

Correlation of unemployment and AfD votes

A simple, straightforward, and well-known approach to assessing association strength is Pearson’s correlation coefficient. Oldie but goldie. Let’s depict it.

chloro_data %>%
  select(unemp, afd_prop, income, area_name) %>%
  ggplot +
  aes(x = unemp, y = afd_prop) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

plot of chunk unnamed-chunk-25

The correlation itself is

chloro_data %>%
  select(unemp, afd_prop, income, area_name) %>%
  as.data.frame %>%
  do(tidy(cor.test(.$afd_prop, .$unemp)))
##    estimate statistic     p.value parameter  conf.low conf.high
## 1 0.1834224    3.2156 0.001445273       297 0.0714793 0.2908024
##                                 method alternative
## 1 Pearson's product-moment correlation   two.sided

This is not strong, but certainly more than mere noise (and the p-value is below some magic number).

Regression residuals of predicting AfD votes by unemployment

Let’s predict the AfD vote score taking unemployment as a predictor. Then let’s plot the residuals to see how good the prediction is, ie., how close (or rather, far) the association of unemployment and AfD voting is.

lm1 <- lm(afd_prop ~ unemp, data = chloro_data)

chloro_data %>%
  mutate(afd_lm1 = lm(afd_prop ~ unemp, data = .)$residuals) -> chloro_data

And now plot the residuals:

chloro_data %>%
  select(afd_lm1) %>%
  ggplot() +
  geom_sf(aes(fill = afd_lm1)) +
  scale_fill_gradient2() +
  theme_void()

plot of chunk unnamed-chunk-28

Interesting! This model shows a clear-cut picture: the eastern part is too “AfD” for its unemployment ratio (some parts of south-eastern Bavaria, too); the west is less AfD-ic than would be expected by its unemployment. The remaining (middle and southern) parts by and large show the AfD levels that would be expected from their unemployment.

Conclusion

The regression model provides a quite clear-cut picture, much more so than the rank difference unveiled. The difference in information may be due to the fact that the rank difference is of ordinal level only, and hence omits information compared to the regression approach. The story of the data may thus be summarized in simple words: the higher the unemployment ratio, the higher the AfD ratio. However, this is only part of the story. Unemployment explains a rather small fraction of the AfD votes. Yet, given the multitude of potential influences on voting behavior, a correlation coefficient of .18 is not negligible, but rather substantial.

A choropleth map is a geographic map in which statistical information is mapped to certain areas. Let’s plot such a choropleth map in this post.

Packages

library(sf)
library(stringr)
library(tidyverse)
library(readxl)

Geo data

The best place to get German geo data is the “Bundesamt für Kartografie und Geodäsie (BKG)”. One may basically use the data for any purpose unless it is against the law. I downloaded the data on 2017-10-09. More specifically, we are looking at the “Verwaltungsgebiete” (vg), that is, the administrative areas of the country, ie., counties, states etc.

Look for the “Verwaltungsgebiete”. On this page you’ll get the geo data scaled 1:2,500,000. This material includes an area map of the whole state (hence sta, see below) of Germany.

It’s quite a bit of data. Download it, unzip it, and adapt your path variables accordingly. The data is stored as shapefiles, a standard format for open geo data.

my_path_vg2500_sta <- "~/Documents/datasets/geo_maps/vg2500/vg2500_sta.shp"
file.exists(my_path_vg2500_sta)
#> [1] TRUE
vg2500_sta <- st_read(my_path_vg2500_sta)
#> Reading layer `vg2500_sta' from data source `/Users/sebastiansauer/Documents/datasets/geo_maps/vg2500/vg2500_sta.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 3 features and 4 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 280353.1 ymin: 5235878 xmax: 921021.2 ymax: 6104656
#> epsg (SRID):    NA
#> proj4string:    +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs
vg2500_sta %>%
  slice(2:3) %>%  # row 1 contains coastal "water" areas
  ggplot() +
  geom_sf()

plot of chunk unnamed-chunk-3

That was easy, right? The sf package fits nicely with the tidyverse, hence not much to learn in that regard. I am not saying that geo data is simple, quite the contrary. But luckily, the R functions fit into a well-known schema.

For instance, let’s remove the axis labels and let’s fill the country with some different color. Hm, what’s the color of Germany? Grey? Black (color of the leading party)? Black-red-gold?

vg2500_sta %>%
  slice(2:3) %>%
  ggplot() +
  geom_sf(fill = "grey40") +
  theme_void()

plot of chunk unnamed-chunk-4

Now, let’s have a look at the more hi-res data. There are a number of files. We will first use the files with borders, or, geometrically, lines:

my_path_vg250_L <- "~/Documents/datasets/geo_maps/vg250/vg250_kompakt/VG250_L.shp"  # L as in lines
vg_250_L <- st_read(my_path_vg250_L)
#> Reading layer `VG250_L' from data source `/Users/sebastiansauer/Documents/datasets/geo_maps/vg250/vg250_kompakt/VG250_L.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 36292 features and 3 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6106244
#> epsg (SRID):    NA
#> proj4string:    +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs
head(vg_250_L)
#> Simple feature collection with 6 features and 3 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 312391.4 ymin: 5271478 xmax: 897074.9 ymax: 5657429
#> epsg (SRID):    NA
#> proj4string:    +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs
#>   AGZ RDG GM5                       geometry
#> 1   1   1   0 LINESTRING (439205.44865438...
#> 2   1   1   0 LINESTRING (316806.88365514...
#> 3   1   1   0 LINESTRING (390438.74675088...
#> 4   1   1   0 LINESTRING (392662.34014679...
#> 5   1   1   0 LINESTRING (826537.76462522...
#> 6   1   1   0 LINESTRING (896720.92151288...

st_read comes as a friend, smooth and hassle-free. All geo data is stored in one column: geometry. The other columns provide district data. The most important one for us at this point is AGZ, meaning type of border. Value 1 is the national border. All right then, let’s draw the country borders.

vg_250_L %>%
  filter(AGZ == 1) %>%
  ggplot +
  geom_sf()

plot of chunk unnamed-chunk-6

Note that this time we have lines, not areas, hence we cannot fill the country with some color.

Now, let’s look at the larger Verwaltungsgebiete of Germany, they are:

  • Land (state; lan)
  • Regierungsbezirk (governmental district, rbz)
  • Kreis (county, krs)

Same principle as before:

my_path_vg2500_lan <- "~/Documents/datasets/geo_maps/vg2500/vg2500_lan.shp"
file.exists(my_path_vg2500_sta)
#> [1] TRUE
vg2500_lan <- st_read(my_path_vg2500_lan)
#> Reading layer `vg2500_lan' from data source `/Users/sebastiansauer/Documents/datasets/geo_maps/vg2500/vg2500_lan.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 16 features and 4 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 280353.1 ymin: 5235878 xmax: 921021.2 ymax: 6101335
#> epsg (SRID):    NA
#> proj4string:    +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs
head(vg2500_lan)
#> Simple feature collection with 6 features and 4 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 280353.1 ymin: 5423990 xmax: 674168.3 ymax: 5979709
#> epsg (SRID):    NA
#> proj4string:    +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs
#>   ADE RS         RS_0                 GEN                       geometry
#> 1   2 02 020000000000             Hamburg MULTIPOLYGON (((578594.2157...
#> 2   2 03 030000000000       Niedersachsen MULTIPOLYGON (((354762.3478...
#> 3   2 04 040000000000              Bremen MULTIPOLYGON (((468599.9059...
#> 4   2 05 050000000000 Nordrhein-Westfalen MULTIPOLYGON (((477387.5570...
#> 5   2 06 060000000000              Hessen MULTIPOLYGON (((546606.7604...
#> 6   2 07 070000000000     Rheinland-Pfalz MULTIPOLYGON (((418854.5347...
vg2500_lan %>%
  ggplot() +
  geom_sf()

plot of chunk unnamed-chunk-8

my_path_vg2500_rbz <- "~/Documents/datasets/geo_maps/vg2500/vg2500_rbz.shp"
file.exists(my_path_vg2500_sta)
#> [1] TRUE
vg2500_rbz <- st_read(my_path_vg2500_rbz)
#> Reading layer `vg2500_rbz' from data source `/Users/sebastiansauer/Documents/datasets/geo_maps/vg2500/vg2500_rbz.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 19 features and 4 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 280353.1 ymin: 5235878 xmax: 855553.1 ymax: 5820162
#> epsg (SRID):    NA
#> proj4string:    +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs
head(vg2500_rbz)
#> Simple feature collection with 6 features and 4 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 280353.1 ymin: 5471359 xmax: 553250.7 ymax: 5820162
#> epsg (SRID):    NA
#> proj4string:    +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs
#>   ADE  RS         RS_0        GEN                       geometry
#> 1   3 051 051000000000 Düsseldorf MULTIPOLYGON (((308582.1649...
#> 2   3 053 053000000000       Köln MULTIPOLYGON (((384170.0176...
#> 3   3 055 055000000000    Münster MULTIPOLYGON (((415321.4839...
#> 4   3 057 057000000000    Detmold MULTIPOLYGON (((477387.5570...
#> 5   3 059 059000000000   Arnsberg MULTIPOLYGON (((397616.4384...
#> 6   3 064 064000000000  Darmstadt MULTIPOLYGON (((506866.1477...
vg2500_rbz %>%
  ggplot() +
  geom_sf()

plot of chunk unnamed-chunk-10

Hmm, this one looks weird; some areas appear to be missing in action.

vg2500_rbz %>% pull(GEN)
#>  [1] Düsseldorf    Köln          Münster       Detmold       Arnsberg     
#>  [6] Darmstadt     Gießen        Kassel        Stuttgart     Karlsruhe    
#> [11] Freiburg      Tübingen      Oberbayern    Niederbayern  Oberpfalz    
#> [16] Oberfranken   Mittelfranken Unterfranken  Schwaben     
#> 19 Levels: Arnsberg Darmstadt Detmold Düsseldorf Freiburg ... Unterfranken

Only 19 Regierungsbezirke (governmental districts) are reported, and no values from the northern or eastern parts of the country. The reason: only North Rhine-Westphalia, Hesse, Baden-Württemberg, and Bavaria still have this administrative level.

my_path_vg2500_krs <- "~/Documents/datasets/geo_maps/vg2500/vg2500_krs.shp"
file.exists(my_path_vg2500_krs)
#> [1] TRUE
vg2500_krs <- st_read(my_path_vg2500_krs)
#> Reading layer `vg2500_krs' from data source `/Users/sebastiansauer/Documents/datasets/geo_maps/vg2500/vg2500_krs.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 401 features and 4 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 280353.1 ymin: 5235878 xmax: 921021.2 ymax: 6101335
#> epsg (SRID):    NA
#> proj4string:    +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs
head(vg2500_krs)
#> Simple feature collection with 6 features and 4 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 478836.8 ymin: 5913378 xmax: 629246.8 ymax: 6075267
#> epsg (SRID):    NA
#> proj4string:    +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs
#>   ADE    RS         RS_0                 GEN
#> 1   4 01001 010010000000           Flensburg
#> 2   4 01002 010020000000                Kiel
#> 3   4 01003 010030000000              Lübeck
#> 4   4 01004 010040000000          Neumünster
#> 5   4 01051 010510000000        Dithmarschen
#> 6   4 01053 010530000000 Herzogtum Lauenburg
#>                         geometry
#> 1 MULTIPOLYGON (((531470.9568...
#> 2 MULTIPOLYGON (((577310.2796...
#> 3 MULTIPOLYGON (((624204.4378...
#> 4 MULTIPOLYGON (((567602.4930...
#> 5 MULTIPOLYGON (((479551.7420...
#> 6 MULTIPOLYGON (((616195.6437...
vg2500_krs %>%
  ggplot() +
  geom_sf()

plot of chunk unnamed-chunk-13

Ok, that’s complete. How many of those “Kreise” do we have? Well, 401, that’s the number of rows we have in the data frame.

Unemployment rates

These data can also be fetched from official sources, in this case the "Bundesämter für Statistik". We consider here the unemployment rates of German counties for 2016, as provided by this agency.

unemp_file <- "~/Documents/datasets/Strukturdaten_de/GUEST_9559782118415_Arbeitslosenquote_2016.csv"
file.exists(unemp_file)
#> [1] TRUE

unemp_de_raw <- read_csv2(unemp_file)
unemp_de_raw %>%
  mutate(Wert = as.numeric(Wert)) %>%
  separate(Name, into = c("Name", "vg_type"), sep = ",") -> unemp_de
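A note on `read_csv2()`: it assumes semicolon-separated fields and a decimal comma, the typical format of German CSV exports, which is why it is used above instead of `read_csv()`. A minimal sketch with made-up inline data:

```r
library(readr)

# read_csv2() parses semicolon separators and decimal commas:
df <- read_csv2("Name;Wert\nNürnberg;6,6")
df$Wert  # parsed as the numeric 6.6, not the string "6,6"
```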

unemp_de %>%
  filter(Name == "Nürnberg")
#> # A tibble: 1 x 4
#>   Schluessel     Name vg_type  Wert
#>        <chr>    <chr>   <chr> <dbl>
#> 1      09564 Nürnberg    <NA>   6.6

Match unemployment to geo data

Presumably, we will not have a perfect match, but let's see how good or bad it is out of the box.

vg2500_krs %>%
  mutate(GEN = as.character(GEN)) %>%
  left_join(unemp_de, by = c("GEN" = "Name")) -> vg2500_krs_unemp
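Counties whose names find no partner in the unemployment table come back with `NA` in `Wert` after the left join, so counting those `NA`s is a quick match-quality check. A toy sketch with made-up names (the real check would run on `vg2500_krs_unemp`):

```r
library(dplyr)

# Toy data: one county matches, one does not
geo  <- tibble(GEN  = c("Nürnberg", "Fürth"))
rate <- tibble(Name = "Nürnberg", Wert = 6.6)

joined <- left_join(geo, rate, by = c("GEN" = "Name"))
sum(is.na(joined$Wert))  # number of unmatched counties
```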

Let’s plot the unemployment rates per administrative area.

vg2500_krs_unemp %>%
  ggplot() +
  geom_sf(aes(fill = Wert)) +
  labs(fill = "unemployment\nrate") +
  scale_fill_distiller(palette = "RdYlGn") +
  theme_void()

plot of chunk unnamed-chunk-16

This appeared to work pretty well, but I have not checked the details yet. For the state in the top right, Mecklenburg-Vorpommern, no data were made available by the agency.

Let’s draw a map of Bavaria, a state of Germany, in this post.

Packages

library(tidyverse)
library(maptools)
library(sf)
library(RColorBrewer)
library(ggmap)
library(viridis)
library(stringr)

Data

Let’s get the data first. Basically, we need two data files:

  • the shape file, i.e., the geographic details of state borders and points of interest
  • the semantic information for points of interest, e.g., town names

Shape file

The shape file can be downloaded from this source: http://www.metaspatial.net/download/plz.tar.gz

This site also looks great to get geospatial data.

These data are in the public domain, as stated here. Download and unpack the data, and let’s assume they are stored in a path called my_path. In my case, for instance:

# replace the following path with your path:
my_path <- "/Users/sebastiansauer/Documents/Datensaetze/plz/"
my_shape <- "post_pl.shp"

shape_dest <- paste0(my_path, my_shape)
file.exists(shape_dest)
#> [1] TRUE

Parse the shape data:

de_shape <- sf::st_read(shape_dest)  # "de" is for Germany
#> Reading layer `post_pl' from data source `/Users/sebastiansauer/Documents/Datensaetze/plz/post_pl.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 8270 features and 3 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 5.866286 ymin: 47.2736 xmax: 15.04863 ymax: 55.05826
#> epsg (SRID):    NA
#> proj4string:    NA

Check the result:

head(de_shape)
#> Simple feature collection with 6 features and 3 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 13.69598 ymin: 51.02433 xmax: 13.91312 ymax: 51.14665
#> epsg (SRID):    NA
#> proj4string:    NA
#>   PLZ99 PLZ99_N PLZORT99                       geometry
#> 1 01067    1067  Dresden MULTIPOLYGON (((13.7189358 ...
#> 2 01069    1069  Dresden MULTIPOLYGON (((13.74983892...
#> 3 01097    1097  Dresden MULTIPOLYGON (((13.727583 5...
#> 4 01099    1099  Dresden MULTIPOLYGON (((13.82015592...
#> 5 01109    1109  Dresden MULTIPOLYGON (((13.75953588...
#> 6 01127    1127  Dresden MULTIPOLYGON (((13.72007952...
de_shape$PLZORT99 %>% head
#> [1] Dresden Dresden Dresden Dresden Dresden Dresden
#> 6359 Levels: Aach Aachen Aalen Aarbergen Abenberg Abensberg ... \xdfxheim

Note that the file contains semantic data, too (zip code, town names).

There are some encoding problems. More often than not, text data comes in the wrong encoding. A blind guess: the data was stored on a Windows machine, hence in Latin-1. My system assumes UTF-8, so a culture clash is to be expected. Let’s try to fix it.

de_shape$PLZORT99 <- as.character(de_shape$PLZORT99)
Encoding(de_shape$PLZORT99) <- "latin1"
de_shape$PLZORT99[90:100]  # slice() works on data frames, not vectors

Ok, fixed.
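The core of the fix is `Encoding<-`, which does not change the bytes but declares how R should interpret them. A minimal sketch with a made-up town name:

```r
# "München" stored as raw Latin-1 bytes ("\xfc" is Latin-1 for "ü")
x <- "M\xfcnchen"
Encoding(x) <- "latin1"  # declare the bytes' true encoding
x                        # now displays correctly on a UTF-8 system
```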

Shape semantics

“Shape semantics” is a rather fancy word for the county/community/town names. We can access the data here.

Download and unzip the data, then move it to my_path. The German word for “zip code” is “PLZ”, so that’s why I call it my_plz:

my_plz <- "PLZ_Ort_Long_Lat_Land_DE.txt"
de_plz <- paste0(my_path, my_plz)
file.exists(de_plz)
#> [1] TRUE

plz_df <- read_tsv(file = de_plz, col_names = FALSE)  # tab separated data

In the help file from the downloaded data, we find some explanation as to the columns:

plz_df <- plz_df %>%
  rename(country_code = X1,
         postal_code = X2,
         place_name = X3,
         state = X4,
         state_code = X5,
         county = X6,
         county_code = X7,
         community = X8,
         community_code = X9,
         lat = X10,
         long = X11,
         accuracy = X12)  # accuracy of lat/lng from 1=estimated to 6=centroid


glimpse(plz_df)
#> Observations: 16,481
#> Variables: 12
#> $ country_code   <chr> "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE",...
#> $ postal_code    <chr> "01945", "01945", "01945", "01945", "01945", "0...
#> $ place_name     <chr> "Hohenbocka", "Lindenau", "Schwarzbach", "Grüne...
#> $ state          <chr> "Brandenburg", "Brandenburg", "Brandenburg", "B...
#> $ state_code     <chr> "BB", "BB", "BB", "BB", "BB", "BB", "BB", "BB",...
#> $ county         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
#> $ county_code    <chr> "00", "00", "00", "00", "00", "00", "00", "00",...
#> $ community      <chr> "Landkreis Oberspreewald-Lausitz", "Landkreis O...
#> $ community_code <int> 12066, 12066, 12066, 12066, 12066, 12066, 12066...
#> $ lat            <dbl> 51.4, 51.4, 51.5, 51.4, 51.4, 51.5, 51.4, 51.4,...
#> $ long           <dbl> 14.0, 13.7, 13.9, 14.0, 13.9, 13.9, 13.9, 13.7,...
#> $ accuracy       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...

Test it

Now, enough of grey theory! Unfortunately, I don’t know any nice Hamlet citation here, which would certainly be impressive.

my_col <- sample(1:7, nrow(de_shape), replace = TRUE)  # one colour index per polygon
plot(de_shape[c("PLZ99", "geometry")],
     col = brewer.pal(7, "Greens")[my_col], border = FALSE)

plot of chunk unnamed-chunk-8
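For context: `brewer.pal()` simply returns a vector of hex colour codes, and the sampled indices above pick one of them per polygon. A quick look at what it produces:

```r
library(RColorBrewer)

# Seven hex colours from the sequential "Greens" palette, light to dark
pal <- brewer.pal(7, "Greens")
length(pal)  # 7
```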

Filter Bavaria

plz_bavaria <- plz_df %>%
  filter(state == "Bayern")

plz_bavaria_vec <- plz_bavaria$postal_code

Easy; ~2259 zip codes in Bavaria.

Now let’s filter the shape file.

my_rows <- de_shape$PLZ99 %in% plz_bavaria_vec
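The `%in%` operator yields a logical vector marking which zip codes belong to the target set; that vector then subsets the shape rows. A toy sketch with made-up zip codes:

```r
zips   <- c("80331", "01067", "90402")
target <- c("80331", "90402")

keep <- zips %in% target  # TRUE FALSE TRUE
zips[keep]                # subsetting with the logical vector
```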

And plot Bavaria:

bav_data <- de_shape[my_rows, c("PLZ99", "geometry")]
my_col <- sample(1:7, nrow(bav_data), replace = TRUE)  # one colour index per polygon

plot(bav_data,
     col = brewer.pal(7, "Oranges")[my_col], border = FALSE)

plot of chunk unnamed-chunk-11

Now plot with ggplot2


bav_data$PLZ2 <- str_extract(bav_data$PLZ99, "\\d\\d")
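The regex `"\\d\\d"` keeps only the first two digits of each zip code, i.e., the coarse postal region. A quick sketch with made-up zip codes:

```r
library(stringr)

# str_extract() returns the first match of the pattern in each string:
str_extract(c("80331", "97070"), "\\d\\d")  # "80" "97"
```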


ggplot(data = bav_data) +
  geom_sf(aes(fill = PLZ2)) +
  scale_fill_viridis_d() +
  guides(fill = FALSE) -> p_bavaria
p_bavaria

plot of chunk unnamed-chunk-12

Sample some counties

FWIW, let’s sample some counties, and fill them in a distinct color.


bav_data$PLZ_sample <- sample(c(0,1),
                              size = nrow(bav_data),
                              prob = c(.99, .01),
                              replace = TRUE)
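With `prob = c(.99, .01)`, roughly 1% of the polygons receive a 1 and get highlighted. A small sketch of that sampling step; the `set.seed()` call is my addition to make the draw reproducible:

```r
set.seed(42)  # reproducible draw (any seed works)
x <- sample(c(0, 1), size = 1000, prob = c(.99, .01), replace = TRUE)
mean(x)  # close to 0.01
```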


ggplot(data = bav_data) +
  geom_sf(aes(fill = PLZ_sample)) +
  scale_fill_viridis() +
  guides(fill = FALSE) -> p_bavaria2
p_bavaria2

plot of chunk unnamed-chunk-13

Let’s define our own palette:

bav_data$PLZ_sample <- sample(c(0,1),
                              size = nrow(bav_data),
                              prob = c(.99, .01),
                              replace = TRUE) %>% factor
my_pal <- c("grey80", "firebrick")
ggplot(data = bav_data) +
  geom_sf(aes(fill = PLZ_sample)) +
  scale_fill_manual(values = my_pal) +
  guides(fill = FALSE) -> p_bavaria3
p_bavaria3

plot of chunk unnamed-chunk-14

Bigger visual chunks

Let’s cut the polygons into bigger chunks, say, according to the first digit of the zip code.

bav_data$PLZ1 <- str_extract(bav_data$PLZ99, "\\d")

ggplot(data = bav_data) +
  geom_sf(aes(fill = PLZ1)) +
  scale_fill_viridis_d() -> p_bavaria4
p_bavaria4

plot of chunk unnamed-chunk-15

Happy map painting!

Here you will find a selection of scientific conferences in 2018 from business psychology (Wirtschaftspsychologie) and adjacent fields.

National conferences (in the DACH region)

International conferences