Density plots with ggjoy: Indy vs Unionist vote in the 2015 Catalan elections

This is my second post about ggjoy, using Catalan political data to play with, as usual. In the first post I compared the evolution of the Catalan vote across all elections. Now I’m focusing on the latest (2015) elections, looking at the Indy and Unionist vote in each municipality, shown as the distribution of the vote across its census tracts.

1 Methods

First, as always, I load the packages and the data:

library(tidyverse)
library(readxl)
library(forcats)
library(ggjoy)
library(stringr)

#Data obtained and processed from Idescat (not public)
df <- read_excel("27S Seccions Censals Provisional TREBALL.xlsx")
glimpse(df)
## Observations: 5,048
## Variables: 44
## $ Codi_INE     <dbl> 800101001, 800101002, 800101003, 800101004, 80010...
## $ INEmuni      <dbl> 8001, 8001, 8001, 8001, 8001, 8001, 8001, 8002, 8...
## $ SeccioCensal <chr> "Districte 1 Secció 1", "Districte 1 Secció 2", "...
## $ MuniNO       <chr> "Abrera", "Abrera", "Abrera", "Abrera", "Abrera",...
## $ ComCod       <dbl> 11, 11, 11, 11, 11, 11, 11, 7, 21, 21, 21, 21, 24...
## $ Comarca      <chr> "Baix Llobregat", "Baix Llobregat", "Baix Llobreg...
## $ Cens         <dbl> 1156, 1241, 570, 1297, 1356, 1837, 1371, 203, 195...
## $ Votants      <dbl> 930, 972, 477, 1071, 1029, 1465, 1129, 164, 1639,...
## $ Participacio <dbl> 80.45, 78.32, 83.68, 82.58, 75.88, 79.75, 82.35, ...
## $ Abstencio    <dbl> 226, 269, 93, 226, 327, 372, 242, 39, 319, 328, 2...
## $ AbsCent      <dbl> 19.55, 21.68, 16.32, 17.42, 24.12, 20.25, 17.65, ...
## $ Nuls         <dbl> 7, 0, 3, 3, 2, 8, 5, 0, 7, 2, 0, 6, 0, 4, 0, 3, 2...
## $ NulsCent     <dbl> 0.61, 0.00, 0.53, 0.23, 0.15, 0.44, 0.36, 0.00, 0...
## $ Blancs       <dbl> 2, 6, 1, 8, 8, 7, 8, 1, 14, 5, 2, 7, 0, 7, 2, 5, ...
## $ BlancCent    <dbl> 0.17, 0.48, 0.18, 0.62, 0.59, 0.38, 0.58, 0.49, 0...
## $ VotsCand     <dbl> 921, 966, 473, 1060, 1019, 1450, 1116, 163, 1618,...
## $ Valids       <dbl> 923, 972, 474, 1068, 1027, 1457, 1124, 164, 1632,...
## $ Indy         <dbl> 319, 239, 189, 388, 242, 448, 313, 139, 1103, 786...
## $ IndyCent     <dbl> 34.56, 24.59, 39.87, 36.33, 23.56, 30.74, 27.85, ...
## $ Altres       <dbl> 602, 727, 284, 672, 777, 1002, 803, 24, 515, 683,...
## $ AltresCent   <dbl> 65.44, 75.41, 60.13, 63.67, 76.44, 69.26, 72.15, ...
## $ Partit1r     <chr> "JxSí", "Cs", "JxSí", "JxSí", "PSC", "Cs", "Cs", ...
## $ JxSi         <dbl> 249, 187, 158, 298, 174, 363, 256, 121, 967, 686,...
## $ Cs           <dbl> 201, 227, 114, 277, 246, 399, 307, 4, 169, 273, 1...
## $ PSC          <dbl> 175, 219, 74, 146, 247, 223, 191, 0, 119, 97, 76,...
## $ CSQP         <dbl> 106, 149, 52, 138, 143, 170, 180, 5, 78, 84, 35, ...
## $ PP           <dbl> 84, 90, 28, 70, 100, 157, 90, 4, 89, 164, 95, 133...
## $ CUP          <dbl> 70, 52, 31, 90, 68, 85, 57, 18, 136, 100, 121, 11...
## $ UDC          <dbl> 18, 20, 7, 26, 18, 21, 21, 10, 55, 52, 53, 47, 6,...
## $ Pacma        <dbl> 14, 16, 4, 6, 10, 27, 5, 0, 2, 10, 10, 8, 0, 7, 1...
## $ Recortes     <dbl> 4, 6, 5, 9, 13, 5, 9, 1, 3, 3, 1, 1, 0, 3, 2, 3, ...
## $ Guanyem      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Pirates      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ JxSicent     <dbl> 26.98, 19.24, 33.33, 27.90, 16.94, 24.91, 22.78, ...
## $ CsCent       <dbl> 21.78, 23.35, 24.05, 25.94, 23.95, 27.39, 27.31, ...
## $ PSCcent      <dbl> 18.96, 22.53, 15.61, 13.67, 24.05, 15.31, 16.99, ...
## $ CSQPcent     <dbl> 11.48, 15.33, 10.97, 12.92, 13.92, 11.67, 16.01, ...
## $ Ppcent       <dbl> 9.10, 9.26, 5.91, 6.55, 9.74, 10.78, 8.01, 2.44, ...
## $ CUPcent      <dbl> 7.58, 5.35, 6.54, 8.43, 6.62, 5.83, 5.07, 10.98, ...
## $ UDCcent      <dbl> 1.95, 2.06, 1.48, 2.43, 1.75, 1.44, 1.87, 6.10, 3...
## $ PacCent      <dbl> 1.52, 1.65, 0.84, 0.56, 0.97, 1.85, 0.44, 0.00, 0...
## $ RecCent      <dbl> 0.43, 0.62, 1.05, 0.84, 1.27, 0.34, 0.80, 0.61, 0...
## $ GuanyCent    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ PiratCent    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...

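Now I select the percentage columns and a few others, and generate three more variables: two alternative definitions of the Unionist vote (UnioCent1 adds Cs, PSC, CSQP, PP and UDC; UnioCent2 adds only Cs, PSC and PP) and a federalist share (FedeCent, CSQP plus UDC):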

df2 <- df %>% select(
  Codi_INE,
  MuniNO,
  Comarca,
  Cens,
  Votants,
  Participacio,
  contains("Cent")
  ) %>% 
  mutate(
   UnioCent1 = CsCent+PSCcent+CSQPcent+Ppcent+UDCcent,
   UnioCent2 = CsCent+PSCcent+Ppcent,
   FedeCent = CSQPcent+UDCcent
  )
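One detail that makes this select() work: contains() matches case-insensitively by default (ignore.case = TRUE), which is why columns with a lowercase suffix such as JxSicent, PSCcent or Ppcent are picked up along with IndyCent. The sketch below checks this on a made-up two-tract table (all values are invented for illustration) and reproduces the three derived shares.

```r
library(dplyr)

# Hypothetical toy data: two census tracts with invented percentages
toy <- tibble(
  MuniNO   = c("A", "A"),
  Cens     = c(1000, 1200),
  IndyCent = c(40, 55),
  CsCent   = c(20, 15),
  PSCcent  = c(15, 10),
  CSQPcent = c(10, 8),
  Ppcent   = c(8, 5),
  UDCcent  = c(2, 3)
)

# Default contains() ignores case, so "PSCcent", "Ppcent", etc. are kept
toy2 <- toy %>%
  select(MuniNO, Cens, contains("Cent")) %>%
  mutate(
    UnioCent1 = CsCent + PSCcent + CSQPcent + Ppcent + UDCcent,
    UnioCent2 = CsCent + PSCcent + Ppcent,
    FedeCent  = CSQPcent + UDCcent
  )

# With ignore.case = FALSE only the columns spelled "Cent" survive
strict <- names(select(toy, contains("Cent", ignore.case = FALSE)))
```

Here strict is just IndyCent and CsCent, so relying on the case-insensitive default is what keeps the lowercase-suffixed party columns in df2.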

Selection of the most populated municipalities

Catalonia currently has 948 municipalities, far too many to show in a single plot, so I select the 100 most populated ones, measured by their number of registered voters (Cens). Since each observation is a census tract, I first aggregate Cens by municipality and then keep the top 100. I also compute, for each municipality, the difference between the median Indy and the median Unionist vote across its census tracts (Dif), which I use later to order the plot.

#SELECTING 100 MOST POPULATED MUNICIPALITIES
#Finding them
muni <- df2 %>% group_by(MuniNO) %>% 
  summarise(
    Cens = sum(Cens),
    Indy = median(IndyCent),
    Unio = median(UnioCent1)
  ) %>% 
  mutate(
    Dif = Indy-Unio
  ) %>% 
  arrange(desc(Cens)) %>% 
  top_n(100, wt = Cens)

#Filtering data (selection of census tracts of the 100 municipalities)
df2 <- df2 %>% filter(MuniNO %in% muni$MuniNO) 

df2 %>% group_by(MuniNO) %>% 
  summarise(voters = sum(Cens)) %>% 
  arrange(desc(voters))
## # A tibble: 100 x 2
##                         MuniNO  voters
##                          <chr>   <dbl>
##  1                   Barcelona 1141101
##  2 Hospitalet de Llobregat, l'  174741
##  3                    Badalona  155938
##  4                    Terrassa  153136
##  5                    Sabadell  152499
##  6                      Lleida   92161
##  7                   Tarragona   90140
##  8                      Mataró   86201
##  9    Santa Coloma de Gramenet   78759
## 10                        Reus   70160
## # ... with 90 more rows
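The same aggregate-then-filter pattern can be traced on a hypothetical set of six census tracts in three invented municipalities (keeping the top 2 instead of the top 100). It is a minimal sketch of the steps above, not the post's data.

```r
library(dplyr)

# Hypothetical toy data: census tracts from three invented municipalities
tracts <- tibble(
  MuniNO    = c("A", "A", "A", "B", "B", "C"),
  Cens      = c(1000, 1100, 900, 800, 700, 300),
  IndyCent  = c(30, 40, 50, 60, 70, 80),
  UnioCent1 = c(65, 55, 45, 35, 25, 15)
)

# Sum registered voters and take the tract medians per municipality
muni_toy <- tracts %>%
  group_by(MuniNO) %>%
  summarise(
    Cens = sum(Cens),
    Indy = median(IndyCent),
    Unio = median(UnioCent1)
  ) %>%
  mutate(Dif = Indy - Unio) %>%
  arrange(desc(Cens)) %>%
  top_n(2, wt = Cens)   # 2 largest here; 100 in the post

# Keep only the tracts that belong to the selected municipalities
tracts_top <- tracts %>% filter(MuniNO %in% muni_toy$MuniNO)
```

Two design points worth keeping in mind: the medians are unweighted, so a small tract counts as much as a large one when computing Dif; and in dplyr 1.0.0 and later, top_n() is superseded by slice_max(), which does the same job here.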

Ordering factor data and changing data structure

By default, factor levels are ordered alphabetically. Instead, I want the plot to show the municipalities ordered by the difference between their Indy and Unionist vote (median IndyCent minus median UnioCent1), so I reorder the factor levels by the previously computed difference (Dif).

#ORDERING FACTORS
#Ordering by Dif (descending), breaking ties alphabetically by name
df3 <- left_join(muni, df2, by = "MuniNO") %>% 
  arrange(desc(Dif), MuniNO)

#Transform into factor MuniNO, with factor order from Dif order
lev <- unique(df3$MuniNO)
df3$MuniNO <- factor(df3$MuniNO, levels = lev)
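The reordering logic can be checked on a hypothetical four-municipality summary (names and Dif values are invented). This sketch uses base R's order() for the sort, and it also shows why the plot wraps MuniNO in fct_rev(): ggplot2 draws the first level of a discrete y axis at the bottom, so reversing the levels puts the municipality with the largest Dif at the top.

```r
# Hypothetical municipality-level summary with invented Dif values
muni_toy <- data.frame(
  MuniNO = c("Alpha", "Beta", "Gamma", "Delta"),
  Dif    = c(-15, 35, 35, 10),
  stringsAsFactors = FALSE
)

# Sort by Dif (descending), breaking ties alphabetically by name
ord <- muni_toy[order(-muni_toy$Dif, muni_toy$MuniNO), ]

# Use that order as the factor levels instead of the alphabetical default
muni_fac <- factor(ord$MuniNO, levels = unique(ord$MuniNO))

# Base R equivalent of forcats::fct_rev(): reverse the level order
muni_rev <- factor(muni_fac, levels = rev(levels(muni_fac)))
```

The default alphabetical levels would be Alpha, Beta, Delta, Gamma; muni_fac has Beta, Gamma, Delta, Alpha, and muni_rev is the reverse of that.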

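To keep the code shorter, I reshape the data into long format with gather(): the IndyCent and UnioCent1 columns become a single Percentage column, labelled by Option, so a single geom_joy() call mapping fill to Option draws both distributions instead of two separate geom_joy() layers.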

#Changing data structure for plot
df3 <- df3 %>% select(MuniNO, IndyCent, UnioCent1) %>% 
  gather(IndyCent, UnioCent1, key = "Option", value = "Percentage") %>% 
  arrange(MuniNO, Option)
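On a hypothetical three-tract table (invented values), the reshape turns each tract row into two rows, one per option. The sketch also runs tidyr's newer pivot_longer() (tidyr 1.0.0 and later), which supersedes gather() and produces the same long table.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per census tract, invented values
wide <- tibble(
  MuniNO    = factor(c("A", "A", "B")),
  IndyCent  = c(30, 40, 60),
  UnioCent1 = c(65, 55, 35)
)

# gather(): the two column names become the values of Option
long <- wide %>%
  gather(IndyCent, UnioCent1, key = "Option", value = "Percentage") %>%
  arrange(MuniNO, Option)

# pivot_longer(): the current tidyr equivalent of the call above
long2 <- wide %>%
  pivot_longer(c(IndyCent, UnioCent1),
               names_to = "Option", values_to = "Percentage") %>%
  arrange(MuniNO, Option)
```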

Plot

And, finally, the plot:

#Indy vs Unionist
df3 %>%
  ggplot(aes(y = MuniNO %>% fct_rev())) +
  geom_joy(aes(x = Percentage, fill = Option), alpha = 0.5) +
  scale_fill_manual(values = c("royalblue4", "firebrick"),
                    labels = c("Indy", "Unionist")) +
  labs(x = "Vote (%)",
       y = "Municipality",
       title = "Indy vs Unionist vote in the 100 most populated municipalities in 2015 Catalan elections",
       subtitle = "Analysis unit: census tract (n = 3,730) | Ordered by difference Indy-Unionist vote",
       caption = "Marc Belzunces (@marcbeldata) | Data source: Gencat") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "top", legend.title = element_blank())


Update 23/07/2017: ggjoy’s author, Claus Wilke, published in the official documentation a nice code example based on a previous plot of mine, which clearly improves it visually. I’m using it here. First, a trick to alternate light and dark fill shades between neighbouring municipalities, and then the plot with some other improvements:

#Trick for color representation
df3 <- df3 %>% mutate(
  Alt = (as.numeric(as.factor(MuniNO)) - 1) %% 2 + 1,
  Rep = str_c(Option, Alt)
)
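To see what the trick produces, here is a base R sketch on a hypothetical long table with four invented municipalities (already ordered as factor levels) and the two options. Alt follows the parity of each municipality's position in the factor levels, not its alphabetical position, and Rep combines the option name with that shade index.

```r
# Hypothetical long data: four municipalities (ordered factor) x two options
muni_levels <- c("Beta", "Gamma", "Delta", "Alpha")
toy <- data.frame(
  MuniNO = factor(rep(muni_levels, each = 2), levels = muni_levels),
  Option = rep(c("IndyCent", "UnioCent1"), times = 4),
  stringsAsFactors = FALSE
)

# Alt alternates 1, 2, 1, 2, ... along the factor levels
toy$Alt <- (as.numeric(as.factor(toy$MuniNO)) - 1) %% 2 + 1

# Rep = option + shade index (paste0() behaves like stringr::str_c() here)
toy$Rep <- paste0(toy$Option, toy$Alt)
```

The result is four fill groups, IndyCent1, IndyCent2, UnioCent11 and UnioCent12, so neighbouring ridges of the same option get different shades.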

#ggjoy example code version
df3 %>%
  ggplot(aes(y = MuniNO %>% fct_rev())) +
  geom_joy(aes(x = Percentage, fill = Rep), alpha = 0.7, color = "grey90") +
  scale_fill_manual(breaks = c("IndyCent1", "UnioCent11"),
                    labels = c(IndyCent1 = "Indy", UnioCent11 = "Unionist"),
                    values = c("#ff0000", "#ff8080", "#0000ff", "#8080ff")
                    ) +
  labs(x = "Vote (%)",
       y = "Municipality",
       title = "Indy vs Unionist vote in the 100 most populated municipalities in 2015 Catalan elections",
       subtitle = "Analysis unit: census tract (n = 3,730) | Ordered by difference Indy-Unionist vote",
       caption = "Marc Belzunces (@marcbeldata) | Data source: Gencat") +
  scale_y_discrete(expand = c(0.01, 0)) +
  scale_x_continuous(limits = c(0, 100), expand = c(0.01, 0)) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "top",
        legend.title = element_blank(),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5)
        )
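Because Rep is a character column, scale_fill_manual() assigns an unnamed values vector to its groups in sorted order. The sketch below makes that mapping explicit: Indy tracts get the red shades and Unionist tracts the blue ones (the reverse of the first plot). Naming the vector is an optional alternative, assumed here for clarity, that can be passed as values = explicit_map without depending on level order.

```r
# The four fill groups produced by the Alt/Rep trick (in arbitrary order)
rep_groups <- c("UnioCent12", "IndyCent1", "UnioCent11", "IndyCent2")

# Unnamed values are matched to the groups in sorted order ...
fill_values  <- c("#ff0000", "#ff8080", "#0000ff", "#8080ff")
implicit_map <- setNames(fill_values, sort(unique(rep_groups)))

# ... so this named vector encodes the same mapping explicitly
explicit_map <- c(IndyCent1  = "#ff0000", IndyCent2  = "#ff8080",
                  UnioCent11 = "#0000ff", UnioCent12 = "#8080ff")
```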


The ggjoy package is really powerful!

Written on July 22, 2017