Density plots with ggjoy, evolution of Catalan vote

A few days ago I discovered the ggjoy package, an extension for ggplot2, on Twitter. Thanks to Ilya Kashnitsky, here I'm playing with ggjoy using, as usual, Catalan election data. As a former geologist, I find density plots a lot of fun: I spent several years working with grain-size density plots and the sedimentological concepts associated with them. They are easily applicable to data science, as are other techniques geologists have used for a century (read the comment about the map at the end of the article).

1 Methods

First, the packages used:

library(tidyverse)
library(forcats)
library(ggjoy)
library(viridis)
library(grid)
library(gridExtra)

Now the data. I'm using the results of all the Catalan regional elections (1980-2015), processed from Idescat data.

#Data
df <- read_csv("Catalan Parliament 1980-2015.csv")
glimpse(df)
## Observations: 10,382
## Variables: 28
## $ INE           <chr> "25001", "25001", "25001", "25001", "25001", "25...
## $ Municipality  <chr> "Abella de la Conca", "Abella de la Conca", "Abe...
## $ County        <chr> "Pallars Jussà", "Pallars Jussà", "Pallars Jussà...
## $ Vegueria      <chr> "Alt Pirineu", "Alt Pirineu", "Alt Pirineu", "Al...
## $ Year          <int> 1980, 1984, 1988, 1992, 1995, 1999, 2003, 2006, ...
## $ CiU_votes     <int> 50, 64, 40, 46, 49, 56, 53, 37, 57, 47, NA, 411,...
## $ PSC_votes     <int> 4, 0, 0, 3, 6, 12, 6, 8, 7, 4, 11, 407, 726, 732...
## $ PP_votes      <int> 0, 0, 0, 3, 4, 6, 0, 3, 0, 0, 5, 6, 91, 57, 83, ...
## $ CSQP_votes    <int> 1, 0, 1, 0, 1, 0, 6, 8, 8, 7, 3, 389, 214, 199, ...
## $ ERC_votes     <int> 2, 3, 2, 3, 7, 6, 12, 15, 6, 17, NA, 81, 49, 44,...
## $ Cs_votes      <int> NA, NA, NA, NA, NA, NA, NA, 0, 0, 0, 9, NA, NA, ...
## $ CUP_votes     <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, 4, 18, NA, N...
## $ JxSí_votes    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 57, NA, ...
## $ Other_votes   <int> 19, 3, 4, 5, 3, 3, 0, 0, 5, 2, 4, 315, 90, 264, ...
## $ Total_votes   <int> 76, 70, 47, 60, 70, 83, 77, 71, 83, 81, 107, 160...
## $ Indy_votes    <dbl> 52, 67, 42, 49, 56, 62, 65, 52, 63, 68, 75, 492,...
## $ Unio_votes    <int> 5, 0, 1, 6, 11, 18, 12, 19, 15, 11, 28, 802, 103...
## $ CiU_percent   <dbl> 65.79, 91.43, 85.11, 76.67, 70.00, 67.47, 68.83,...
## $ PSC_percent   <dbl> 5.26, 0.00, 0.00, 5.00, 8.57, 14.46, 7.79, 11.27...
## $ PP_percent    <dbl> 0.00, 0.00, 0.00, 5.00, 5.71, 7.23, 0.00, 4.23, ...
## $ CSQP_percent  <dbl> 1.32, 0.00, 2.13, 0.00, 1.43, 0.00, 7.79, 11.27,...
## $ ERC_percent   <dbl> 2.63, 4.29, 4.26, 5.00, 10.00, 7.23, 15.58, 21.1...
## $ Cs_percent    <dbl> NA, NA, NA, NA, NA, NA, NA, 0.00, 0.00, 0.00, 8....
## $ CUP_percent   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 4.94, 16.82,...
## $ JxSí_percent  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 53.27, N...
## $ Other_percent <dbl> 25.00, 4.29, 8.51, 8.33, 4.29, 3.61, 0.00, 0.00,...
## $ Indy_percent  <dbl> 68.42, 95.71, 89.36, 81.67, 80.00, 74.70, 84.42,...
## $ Unio_percent  <dbl> 6.58, 0.00, 2.13, 10.00, 15.71, 21.69, 15.58, 26...
head(df)
## # A tibble: 6 x 28
##     INE       Municipality        County    Vegueria  Year CiU_votes
##   <chr>              <chr>         <chr>       <chr> <int>     <int>
## 1 25001 Abella de la Conca Pallars Jussà Alt Pirineu  1980        50
## 2 25001 Abella de la Conca Pallars Jussà Alt Pirineu  1984        64
## 3 25001 Abella de la Conca Pallars Jussà Alt Pirineu  1988        40
## 4 25001 Abella de la Conca Pallars Jussà Alt Pirineu  1992        46
## 5 25001 Abella de la Conca Pallars Jussà Alt Pirineu  1995        49
## 6 25001 Abella de la Conca Pallars Jussà Alt Pirineu  1999        56
## # ... with 22 more variables: PSC_votes <int>, PP_votes <int>,
## #   CSQP_votes <int>, ERC_votes <int>, Cs_votes <int>, CUP_votes <int>,
## #   JxSí_votes <int>, Other_votes <int>, Total_votes <int>,
## #   Indy_votes <dbl>, Unio_votes <int>, CiU_percent <dbl>,
## #   PSC_percent <dbl>, PP_percent <dbl>, CSQP_percent <dbl>,
## #   ERC_percent <dbl>, Cs_percent <dbl>, CUP_percent <dbl>,
## #   JxSí_percent <dbl>, Other_percent <dbl>, Indy_percent <dbl>,
## #   Unio_percent <dbl>

Next, I generate the plots for the main parties. Here is the code for one party; the same code is repeated for the others.

#CiU
p1 <- df %>% select(Municipality, INE, Year, CiU_percent) %>% 
  # drop the elections in which the party did not run
  filter(!is.na(CiU_percent)) %>% 
  mutate(
    Municipality = as.factor(Municipality),
    # a discrete Year gives one ridge per election
    Year = as.factor(Year)
  ) %>% 
  # fct_rev() reverses the levels so that 1980 appears at the top
  ggplot(aes(x = CiU_percent, y = fct_rev(Year))) +
  geom_joy(aes(fill = Year)) +
  scale_fill_viridis(discrete = TRUE, option = "D", direction = -1, 
                     begin = .1, end = .9) +
  labs(x = "Percent Vote",
       y = "Election's Year",
       title = "CiU's vote in Catalan elections",
       subtitle = "Analysis unit: municipalities",
       caption = "Marc Belzunces (@marcbeldata)") +
  theme_minimal() +
  # the fill only encodes Year, which the y axis already shows
  theme(legend.position = "none")
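Because the same pipeline is repeated for every party, it could be wrapped in a function. The sketch below is a hypothetical helper, `plot_party()`, that is not part of the original post: it takes the name of a `*_percent` column and a party label and applies the same styling as `p1`. It uses the `.data` pronoun, which assumes ggplot2 3.0 or later. Note that ggjoy has since been superseded by ggridges, where `geom_density_ridges()` takes the place of `geom_joy()`.

```r
library(tidyverse)
library(forcats)
library(ggjoy)
library(viridis)

# Hypothetical helper (not from the original script): builds the joyplot for
# one percent-vote column, e.g. "CiU_percent", styled like p1 above.
plot_party <- function(data, col, party) {
  data %>%
    # drop the elections in which the party did not run
    filter(!is.na(.data[[col]])) %>%
    mutate(Year = as.factor(Year)) %>%
    ggplot(aes(x = .data[[col]], y = fct_rev(Year))) +
    geom_joy(aes(fill = Year)) +
    scale_fill_viridis(discrete = TRUE, option = "D", direction = -1,
                       begin = .1, end = .9) +
    labs(x = "Percent Vote",
         y = "Election's Year",
         title = paste0(party, "'s vote in Catalan elections"),
         subtitle = "Analysis unit: municipalities",
         caption = "Marc Belzunces (@marcbeldata)") +
    theme_minimal() +
    theme(legend.position = "none")
}
```

For example, `plot_party(df, "CiU_percent", "CiU")` should give the same plot as `p1`, and a list of such plots can be passed to `grid.arrange(grobs = plots, ncol = 2)` instead of naming each object.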

The result:

grid.arrange(p1, p2, p3, p4, p5, p6, ncol = 2)


Interpreting the density plots is beyond the scope of this exercise, but a few words: the data are the percent vote for every political party in each of the 947 municipalities of Catalonia. Most municipalities are small, below 10,000 inhabitants, and many are below 2,500. The capital, Barcelona, has 1.6 million inhabitants, and the surrounding cities around 100,000-200,000. The problem, then, is that Barcelona has the same weight as a municipality of 1,000 inhabitants. This exercise would make more sense with census tract data, since tracts tend to contain similar numbers of people (typically 1,000-2,000 voters), but I don't have these data for all the years, and Idescat does not publish them openly.
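One possible way to soften the Barcelona problem without tract data, not used in the post, is to weight each municipality by the number of votes cast (`Total_votes`) when estimating the density. The sketch below uses base R's `density()`, whose `weights` argument expects weights that sum to 1. The four municipalities and their vote counts are made-up toy values, chosen only to show the effect.

```r
# Toy, illustrative values (not from the dataset): percent vote for one party
# in four municipalities, and the number of votes cast in each.
pct   <- c(65, 70, 72, 30)
votes <- c(80, 120, 60, 900000)  # the last one plays the role of a big city

# Unweighted: every municipality counts once, as in the joyplots above
d_unw <- density(pct, bw = 5)

# Weighted: each municipality counts in proportion to its votes cast.
# A fixed bw keeps the two curves directly comparable.
d_w <- density(pct, bw = 5, weights = votes / sum(votes))

# x position of the highest point of each density
peak <- function(d) d$x[which.max(d$y)]
peak(d_unw)  # inside the 65-72 cluster of small municipalities
peak(d_w)    # close to 30, the value of the large municipality
```

In ggplot2, `geom_density()` accepts a `weight` aesthetic (for example `aes(weight = Total_votes)`); whether `geom_joy()` honours it is not assumed here.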

In recent years, Catalan politics has focused on the independence of Catalonia. Here are the results for pro-independence and unionist parties:

grid.arrange(p7, p8, ncol = 2)
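The bloc columns come precomputed in the dataset. The first rows of the `glimpse()` output are consistent with `Indy_votes` = CiU + ERC + CUP + JxSí and `Unio_votes` = PSC + PP + CSQP + Cs (for Abella de la Conca in 1980, 50 + 2 = 52 and 4 + 0 + 1 = 5). The sketch below rebuilds them under that assumption; the grouping is inferred from the output, not taken from the author's script, and `add_blocs()` is a hypothetical name.

```r
# Hypothetical helper: rebuild the bloc columns from per-party vote counts.
# Assumed grouping (inferred from the glimpse() output, not confirmed):
#   Indy = CiU + ERC + CUP + JxSí      Unio = PSC + PP + CSQP + Cs
add_blocs <- function(df) {
  # A regex avoids typing the accented "JxSí" column name literally
  indy <- grep("^(CiU|ERC|CUP|JxS[^_]*)_votes$", names(df))
  unio <- grep("^(PSC|PP|CSQP|Cs)_votes$", names(df))
  # NA means the party did not stand that year, so it counts as zero votes
  df$Indy_votes <- rowSums(df[indy], na.rm = TRUE)
  df$Unio_votes <- rowSums(df[unio], na.rm = TRUE)
  # Share of all votes cast, rounded like the other *_percent columns
  df$Indy_percent <- round(100 * df$Indy_votes / df$Total_votes, 2)
  df$Unio_percent <- round(100 * df$Unio_votes / df$Total_votes, 2)
  df
}
```

Applied to the two Abella de la Conca rows shown above for 1980 and 2015, this gives 52 and 75 pro-independence votes, matching the `Indy_votes` column.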


All the code used in this post is in my GitHub Gist.

Written on July 20, 2017