Language shift between Catalan and Spanish linguistic groups in the Balearic Islands (2010)

I analyze data from the Modular Survey of Social Habits (EMHS 2010) of the Government of the Balearic Islands. This survey contains linguistic variables focused on Catalan and Spanish, and it is one of the few surveys in the Catalan-speaking area that includes this kind of information. The data show that both groups, Catalan native speakers (CatNS) and Spanish native speakers (EspNS), transmit their group language with high loyalty. As Catalan sociolinguists have pointed out, some EspNS shift language and transmit Catalan to their children. The data confirm this, but the figures are very small. In addition, CatNS also shift to Spanish on a comparable scale, a fact that, at least publicly, had not been pointed out before. The main driver of language shift seems to be the language spoken with the partner in linguistically mixed couples, with clear differences between the Catalan and Spanish linguistic groups. In conclusion, in the Balearic Islands the exogenous linguistic group (EspNS) behaves much like the endogenous linguistic group (CatNS) in language transmission. EspNS behave very differently from the other exogenous linguistic groups, which show clearly less linguistic loyalty and more language shift. The survey does not include any variable on birth rates, a key factor for determining which linguistic groups are growing demographically and which are not. R code to reproduce the analysis is provided.

1 Selection of variables and observations

The survey contains 298 variables for 3,510 observations (interviews), and the codebook is available on the survey's webpage. We first download the data. Then we select the geographic, gender, age and linguistic variables, and keep only the individuals with children, since we are going to analyze the transmission of the native language from parents to children.

#Read the data from Internet
fileUrl <- "http://ibestat.caib.es/ibfiles/content/files/microdades/EMHS/HIA.csv"
data <- read.csv(fileUrl)
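#Keep the linguistic variables (names containing "ling")
language <- data[ ,grepl("ling", names(data))]
#Add the geographic, gender and age columns (selected by position)
data <- cbind(data[,c(4:6,8:9)], language)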
cat("Number of samples:",dim(data)[1])
## Number of samples: 3510
#Keep only those who have children (ling_4g == 5: no children)
data <- data[data$ling_4g != 5,]
cat("Number of samples:",dim(data)[1])
## Number of samples: 2838
cat("Number of variables:",dim(data)[2])
## Number of variables: 34

The resulting dataset contains 34 variables and about 81% (2,838 of 3,510) of the original observations.

2 Language shift in Catalan native speakers (CatNS)

First, we focus on the CatNS. I define a CatNS as someone who speaks Catalan with both parents, or with their only parent in a one-parent family. I then look at which language they speak to their children.
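The definition above can be sketched as a single classification function. The same function also covers the analogous groups used later for EspNS, bilinguals and OtherNS. This is a minimal sketch, not the author's code. The function name native_group() is hypothetical, and the codes are inferred from the filters used in this post (1 = Catalan, 2 = Spanish, 4 = other language, 5 = no such parent). Bilinguals are restricted here to respondents with two present parents.

```r
# Hypothetical helper (not part of the original analysis): classify respondents
# by the language spoken with each parent (ling_4a, ling_4b).
# Assumed codes, inferred from the filters in this post:
# 1 = Catalan, 2 = Spanish, 4 = other language, 5 = no such parent.
native_group <- function(lang_a, lang_b) {
  group <- rep(NA_character_, length(lang_a))
  # Same language with both parents, or with the only parent
  same_or_single <- function(code) {
    (lang_a == code & lang_b == code) |
      (lang_a == code & lang_b == 5) |
      (lang_a == 5 & lang_b == code)
  }
  # which() drops NA comparisons, so missing codes stay NA
  group[which(same_or_single(1))] <- "CatNS"
  group[which(same_or_single(2))] <- "EspNS"
  group[which(lang_a == 4 & lang_b == 4)] <- "OtherNS"
  # Different languages with two present parents
  group[which(lang_a != lang_b & lang_a != 5 & lang_b != 5)] <- "Bilingual"
  group
}

# Toy example with made-up codes
native_group(lang_a = c(1, 1, 5, 2, 1, 4, 5),
             lang_b = c(1, 5, 2, 2, 2, 4, 5))
```

On the survey data, table(native_group(data$ling_4a, data$ling_4b), useNA = "ifany") would give the size of each group in one call.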

#Speaks Catalan with both parents
catalan.both <- data[data$ling_4a == 1 & data$ling_4b == 1,]
#Have only 1 parent
catalan.mother <- data[data$ling_4a == 1 & data$ling_4b == 5,]
catalan.father <- data[data$ling_4a == 5 & data$ling_4b == 1,]
#Total Catalan native speakers
catalan <- rbind(catalan.both, catalan.mother)
catalan <- rbind(catalan, catalan.father)
#Speaks Catalan to descendants
cat.transmission.cat <- catalan[catalan$ling_4g ==1,]
#Speaks Spanish to descendants
cat.transmission.esp <- catalan[catalan$ling_4g ==2,]
#Coef Catalan
general.cat.cat <- round((dim(cat.transmission.cat)[1]/dim(catalan)[1])*100,2)
#Coef Spanish
general.cat.esp <- round((dim(cat.transmission.esp)[1]/dim(catalan)[1])*100,2)
#Table
table <- data.frame(matrix(ncol=3, nrow=1))
table[1,] <- paste(c("Balearic Islands",general.cat.cat, general.cat.esp))
names(table) <- c("Catalan native speakers", "Catalan transmission", "Spanish transmission")
cat("Number of samples:", dim(catalan)[1])
## Number of samples: 1017
cat("Number of samples (Catalan transmission):", dim(cat.transmission.cat)[1])
## Number of samples (Catalan transmission): 927
cat("Number of samples (Spanish transmission):", dim(cat.transmission.esp)[1])
## Number of samples (Spanish transmission): 35
table
##   Catalan native speakers Catalan transmission Spanish transmission
## 1        Balearic Islands                91.15                 3.44

So, in the Balearic Islands, 91% (927 of 1,017) of CatNS transmit their language, while 3% (35 of 1,017) shift language and use Spanish with their children.

What drives language shift among CatNS? Let’s see which language they speak with their partner:

lang.couple <- as.data.frame(summary(as.factor(cat.transmission.esp$ling_4f)))
row.names(lang.couple) <- c("Catalan", "Spanish", "Other language", "Doesn't have a couple")
names(lang.couple) <- "Language spoken with the couple (frequency)"
lang.couple
##                       Language spoken with the couple (frequency)
## Catalan                                                         2
## Spanish                                                        30
## Other language                                                  1
## Doesn't have a couple                                           2

So the main reason (86%, 30 of 35) is that these CatNS speak Spanish with their partner and use the same language with their children. We don’t know whether Spanish is the partner's native language, but I assume so (a linguistically mixed couple). Only 6% (2 of 35) of the language shift occurs in couples who speak Catalan with each other, presumably both CatNS. Note, however, that the sample is very small (n = 35).
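The code above assigns row names by position. That silently mislabels rows whenever a category is missing from a subset. Here, the author's labels imply that code 3 (Catalan and Spanish) does not occur among the 35 CatNS who transmit Spanish. The sketch below is a more robust alternative: it fixes the factor levels explicitly and computes counts and shares in one step. The code-to-label mapping is an assumption inferred from the labels used in this post. The input vector is rebuilt from the frequencies printed above, not read from the survey.

```r
# Assumed coding of ling_4f (language spoken with the partner), inferred from
# the labels used in this post: 1 = Catalan, 2 = Spanish,
# 3 = Catalan and Spanish, 4 = other, 5 = no partner.
partner_labels <- c("Catalan", "Spanish", "Catalan and Spanish",
                    "Other", "No partner")

# Fixing the levels keeps every category, even those with zero counts
tabulate_partner <- function(codes) {
  counts <- table(factor(codes, levels = 1:5, labels = partner_labels))
  data.frame(n = as.vector(counts),
             pct = round(100 * as.vector(prop.table(counts)), 1),
             row.names = partner_labels)
}

# Codes rebuilt from the frequencies printed above for the 35 CatNS
# who transmit Spanish (2 Catalan, 30 Spanish, 1 other, 2 no partner)
shift_codes <- rep(c(1, 2, 4, 5), times = c(2, 30, 1, 2))
tabulate_partner(shift_codes)
```

With the survey data, tabulate_partner(cat.transmission.esp$ling_4f) would replace the three lines that build lang.couple.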

The next question is whether CatNS in linguistically mixed couples transmit Catalan or Spanish more often. Let’s look at the language spoken with the partner among CatNS who transmit Catalan:

lang.couple2 <- as.data.frame(summary(as.factor(cat.transmission.cat$ling_4f)))
row.names(lang.couple2) <- c("Catalan", "Spanish", "Catalan and Spanish", "Other", "Doesn't have a couple")
names(lang.couple2) <- "Language spoken with the couple (frequency)"
lang.couple2
##                       Language spoken with the couple (frequency)
## Catalan                                                       818
## Spanish                                                        55
## Catalan and Spanish                                             6
## Other                                                           4
## Doesn't have a couple                                          44

So a total of 85 CatNS speak Spanish with their partner and transmit either Catalan or Spanish. Of these, 55 (65%) transmit Catalan and 30 (35%) transmit Spanish. Mixed couples that include a CatNS in the Balearic Islands clearly favor Catalan.
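The sketch below puts a number on "clearly" with an exact binomial test, comparing the 55 vs 30 split against an even 50/50 split. It assumes the respondents behave like a simple random sample, so any survey weights and design effects are ignored. The 50/50 benchmark is a modelling choice, not something the survey provides.

```r
# CatNS who speak Spanish with their partner and transmit
# Catalan (55) or Spanish (30), from the frequency tables above
to_catalan <- 55
to_spanish <- 30
n_mixed <- to_catalan + to_spanish

# Share of each transmitted language, in percent
round(100 * c(Catalan = to_catalan, Spanish = to_spanish) / n_mixed, 1)

# Exact binomial test of H0: Catalan and Spanish equally likely,
# assuming simple random sampling
mixed_test <- binom.test(to_catalan, n_mixed, p = 0.5)
mixed_test$p.value
mixed_test$conf.int  # 95% interval for the share transmitting Catalan
```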

Going back to the 35 CatNS who transmit Spanish to their children (including the few who are not in a mixed couple), how are they distributed by gender?

gender <- as.data.frame(summary(as.factor(cat.transmission.esp$sexo)))
row.names(gender) <- c("Men", "Women")
names(gender) <- "Number"
gender
##       Number
## Men       13
## Women     22

So language shift in the CatNS group seems to affect more women than men (22 vs 13).

A methodological warning: the codebook says the gender variable takes the value 1 for men and 6 for women, but the dataset contains 1s and 2s. I assume 1 = men and 2 = women.
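The sketch below makes the assumed coding explicit (1 = men, 2 = women) and checks whether 22 women vs 13 men could plausibly come from an even split. The 50/50 benchmark is an assumption. The proper reference would be the sex ratio among all CatNS with children, which this sketch does not compute. With these counts the exact binomial test does not reject an even split at the usual 5% level, so the gender difference should be read with caution.

```r
# Assumed coding of sexo, following the text above: 1 = men, 2 = women.
# Any other value (e.g. the codebook's 6) becomes NA, making mismatches visible.
recode_sex <- function(code) {
  factor(code, levels = c(1, 2), labels = c("Men", "Women"))
}

# Rebuilt from the counts printed above for the 35 CatNS who transmit Spanish
sex <- recode_sex(rep(c(1, 2), times = c(13, 22)))
counts <- table(sex)
counts

# Exact binomial test of H0: women and men equally represented
sex_test <- binom.test(counts[["Women"]], sum(counts), p = 0.5)
sex_test$p.value
```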

2.1 Are there any differences between generations?

The next question is whether language shift is more common now than in the past. I compare two age groups: 40 years or younger, and older than 40.

#Younger than 40 years
cat.cat.young <- cat.transmission.cat[cat.transmission.cat$fnac_edad <= 40,]
cat.esp.young <- cat.transmission.esp[cat.transmission.esp$fnac_edad <= 40,]
general.cat.cat.young <- round((dim(cat.cat.young)[1]/dim(catalan[catalan$fnac_edad <= 40,])[1])*100,2)
general.cat.esp.young <- round((dim(cat.esp.young)[1]/dim(catalan[catalan$fnac_edad <= 40,])[1])*100,2)
#Older than 40 years
cat.cat.old <- cat.transmission.cat[cat.transmission.cat$fnac_edad > 40,]
cat.esp.old <- cat.transmission.esp[cat.transmission.esp$fnac_edad > 40,]
general.cat.cat.old <- round((dim(cat.cat.old)[1]/dim(catalan[catalan$fnac_edad > 40,])[1])*100,2)
general.cat.esp.old <- round((dim(cat.esp.old)[1]/dim(catalan[catalan$fnac_edad > 40,])[1])*100,2)
#Table
table[2,] <- paste(c("Balearic Islands <= 40 years", general.cat.cat.young, general.cat.esp.young))
table[3,] <- paste(c("Balearic Islands > 40 years", general.cat.cat.old, general.cat.esp.old))
table
##        Catalan native speakers Catalan transmission Spanish transmission
## 1             Balearic Islands                91.15                 3.44
## 2 Balearic Islands <= 40 years                84.13                 7.38
## 3  Balearic Islands > 40 years                 93.7                 2.01

Language shift seems to be increasing. Linguistic loyalty among CatNS falls from 94% in the older group to 84% in the younger one, about ten percentage points, while transmission of Spanish rises from 2% to 7%.

Is this because younger CatNS use Spanish more often with their partner?

cat("CatNS use Spanish in couple (40 years or younger):",nrow(catalan[catalan$fnac_edad <= 40 & catalan$ling_4f == 2,])/nrow(catalan[catalan$fnac_edad <= 40,])*100,"%")
## CatNS use Spanish in couple (40 years or younger): 15.12915 %
cat("CatNS use Spanish in couple (older than 40 years):",nrow(catalan[catalan$fnac_edad > 40 & catalan$ling_4f == 2,])/nrow(catalan[catalan$fnac_edad > 40,])*100,"%")
## CatNS use Spanish in couple (older than 40 years): 8.176944 %

It seems so: 8% of CatNS in the older group use Spanish with their partner, compared with 15% in the younger group. The proportion has nearly doubled.

My interpretation is that linguistically mixed couples are becoming more common in the Balearic Islands, as could be expected. The Spanish linguistic group in the Balearic Islands comes from a recent, massive immigration process (over the last 50 years from Spain, and over the last 15 from Latin America). In this context, EspNS partners tend to keep Spanish (as we will see later), while CatNS are increasingly switching to Spanish.
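The same percentage calculation is repeated for the age groups above and again for islands and counties below. Each time, the number of respondents who transmit a language is divided by the size of the subgroup. The sketch below is a reusable helper for that calculation. transmission_by() is a hypothetical name, and it assumes the ling_4g codes 1 = Catalan and 2 = Spanish used in the filters above. Missing values are counted in the denominator but not in the numerator.

```r
# Hypothetical helper: percentage of respondents in each level of `group`
# who speak Catalan (code 1) or Spanish (code 2) to their children (ling_4g)
transmission_by <- function(df, group) {
  shares <- sapply(split(df$ling_4g, group), function(g) {
    c(Catalan = 100 * sum(g == 1, na.rm = TRUE) / length(g),
      Spanish = 100 * sum(g == 2, na.rm = TRUE) / length(g))
  })
  # One row per group, one column per transmitted language
  round(t(shares), 2)
}

# Toy data (made up): four respondents aged 40 or younger, four older
toy <- data.frame(ling_4g   = c(1, 1, 2, 1, 3, 1, 2, 2),
                  fnac_edad = c(30, 35, 38, 40, 45, 50, 60, 70))
age_group <- ifelse(toy$fnac_edad <= 40, "<= 40", "> 40")
transmission_by(toy, age_group)
```

On the survey data, transmission_by(catalan, catalan$isla) should give the island rows of the table, provided there are no missing values in the variables involved.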

2.2 Is there any difference by island?

Next, let’s check whether there are differences between the islands of the archipelago. We return to the full CatNS group (all ages).

island <- unique(data$isla)

for (i in island) {
  cat.cat <- cat.transmission.cat[cat.transmission.cat$isla == i,]
  cat.esp <- cat.transmission.esp[cat.transmission.esp$isla == i,]
  cat.cat <- round((dim(cat.cat)[1]/dim(catalan[catalan$isla ==i,])[1])*100,2)
  cat.esp <- round((dim(cat.esp)[1]/dim(catalan[catalan$isla ==i,])[1])*100,2)
  table[3+i,] <- paste(c(i, cat.cat, cat.esp))
  }
table$`Catalan native speakers`[table$`Catalan native speakers` == 1] <- "Mallorca Island"
table$`Catalan native speakers`[table$`Catalan native speakers` == 2] <- "Menorca Island"
table$`Catalan native speakers`[table$`Catalan native speakers` == 3] <- "Eivissa Island"
table$`Catalan native speakers`[table$`Catalan native speakers` == 4] <- "Formentera Island"
table
##        Catalan native speakers Catalan transmission Spanish transmission
## 1             Balearic Islands                91.15                 3.44
## 2 Balearic Islands <= 40 years                84.13                 7.38
## 3  Balearic Islands > 40 years                 93.7                 2.01
## 4              Mallorca Island                89.61                 4.13
## 5               Menorca Island                97.12                 1.44
## 6               Eivissa Island                93.33                 1.33
## 7            Formentera Island                94.23                 1.92

Language shift to Spanish among CatNS seems to affect Mallorca, the most populated island of the archipelago, more than the other islands (4.13% vs 1.33% to 1.92%).

2.3 Is there any difference by County in Mallorca?

The survey also records counties (comarques). Mallorca, with 7 counties, is the only island where the county does not coincide with the whole island. Let’s see:
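The seven code-to-name replacements below, and the island ones above, can also be written as a named lookup vector. This is a sketch of an alternative, not the original code. label_codes() is a hypothetical helper, and values without a label, such as the "Balearic Islands" rows, are kept unchanged.

```r
# Mallorca county codes and the names used in this post
county_names <- c("40701" = "Pla County",
                  "40702" = "Raiguer County",
                  "40703" = "Nord County",
                  "40704" = "Tramuntana",
                  "40705" = "Sud County",
                  "40706" = "Llevant County",
                  "40707" = "Badia de Palma County")

# Hypothetical helper: replace codes by names, keeping unmatched values as-is
label_codes <- function(values, lookup) {
  values <- as.character(values)
  labelled <- unname(lookup[values])
  ifelse(is.na(labelled), values, labelled)
}

label_codes(c("Balearic Islands", "40701", "40707"), county_names)
```

Applied to the table, a single call on the first column (the "Catalan native speakers" column) would replace the seven assignment lines.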

#The seven county codes of Mallorca (relabelled below)
county <- 40701:40707

for (i in county) {
  cat.cat <- cat.transmission.cat[cat.transmission.cat$c_nuts4 == i,]
  cat.esp <- cat.transmission.esp[cat.transmission.esp$c_nuts4 == i,]
  cat.cat <- round((dim(cat.cat)[1]/dim(catalan[catalan$c_nuts4 == i,])[1])*100,2)
  cat.esp <- round((dim(cat.esp)[1]/dim(catalan[catalan$c_nuts4 == i,])[1])*100,2)
  table[nrow(table)+1,] <- paste(c(i, cat.cat, cat.esp))
  }
table$`Catalan native speakers`[table$`Catalan native speakers` == 40701] <- "Pla County"
table$`Catalan native speakers`[table$`Catalan native speakers` == 40702] <- "Raiguer County"
table$`Catalan native speakers`[table$`Catalan native speakers` == 40703] <- "Nord County"
table$`Catalan native speakers`[table$`Catalan native speakers` == 40704] <- "Tramuntana"
table$`Catalan native speakers`[table$`Catalan native speakers` == 40705] <- "Sud County"
table$`Catalan native speakers`[table$`Catalan native speakers` == 40706] <- "Llevant County"
table$`Catalan native speakers`[table$`Catalan native speakers` == 40707] <- "Badia de Palma County"
table
##         Catalan native speakers Catalan transmission Spanish transmission
## 1              Balearic Islands                91.15                 3.44
## 2  Balearic Islands <= 40 years                84.13                 7.38
## 3   Balearic Islands > 40 years                 93.7                 2.01
## 4               Mallorca Island                89.61                 4.13
## 5                Menorca Island                97.12                 1.44
## 6                Eivissa Island                93.33                 1.33
## 7             Formentera Island                94.23                 1.92
## 8                    Pla County                79.38                 3.09
## 9                Raiguer County                91.74                 7.44
## 10                  Nord County                97.87                 1.06
## 11                   Tramuntana                97.03                    0
## 12                   Sud County                90.76                 5.04
## 13               Llevant County                91.26                 2.91
## 14        Badia de Palma County                80.17                 7.76

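The problem here is that segmenting the data by county shrinks the subsamples, so these figures are less reliable. With that caveat, Catalan loyalty among CatNS is lowest in Badia de Palma (80%), the most populated county and home to the capital, Palma, and in Pla (79%), in the middle of the island. Transmission of Spanish, however, is highest in Badia de Palma (7.76%) and Raiguer (7.44%). In Pla, most of the gap corresponds to answers other than Catalan or Spanish alone, not to a shift to Spanish (3.09%).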
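One way to see how much precision is lost when segmenting is to attach a confidence interval to each percentage. The sketch below uses base R's exact binomial interval from binom.test(), assuming simple random sampling and ignoring any survey weights. The first call uses the overall CatNS figures from above (35 of 1,017). The second uses hypothetical county-sized counts (3 of 40), purely for illustration.

```r
# Percentage with an exact (Clopper-Pearson) confidence interval,
# assuming simple random sampling (survey weights are ignored)
share_ci <- function(x, n, conf.level = 0.95) {
  ci <- binom.test(x, n, conf.level = conf.level)$conf.int
  round(100 * c(pct = x / n, lower = ci[1], upper = ci[2]), 2)
}

# Overall CatNS transmitting Spanish (from the tables above)
overall <- share_ci(35, 1017)
# Hypothetical small subsample, for illustration only
small <- share_ci(3, 40)

overall
small
```

The much wider interval for the small subsample shows why county-level differences of a few percentage points should not be over-interpreted.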

2.4 Conclusion in the CatNS group

Linguistic loyalty is very high among CatNS (91%), while linguistic shift to Spanish is low (3%), although it seems to be increasing over time. The main reason for the shift to Spanish is the formation of mixed couples, although mixed couples favor Catalan over Spanish when transmitting a language to their children.

Unfortunately, we don’t have data on birth rates, so we can’t say whether the CatNS group in the Balearic Islands is growing or shrinking for demographic reasons.

3 Transmission of languages in Spanish native speakers (EspNS)

In this section I apply to the EspNS group the same analysis performed for the CatNS group.

#Speaks Spanish with both parents
spanish.both <- data[data$ling_4a == 2 & data$ling_4b == 2,]
#Have only 1 parent
spanish.mother <- data[data$ling_4a == 2 & data$ling_4b == 5,]
spanish.father <- data[data$ling_4a == 5 & data$ling_4b == 2,]
#Total Spanish native speakers
spanish <- rbind(spanish.both, spanish.mother)
spanish <- rbind(spanish, spanish.father)
#Speaks Catalan to descendants
esp.transmission.cat <- spanish[spanish$ling_4g ==1,]
#Speaks Spanish to descendants
esp.transmission.esp <- spanish[spanish$ling_4g ==2,]
#Coef Catalan
general.esp.cat <- round((dim(esp.transmission.cat)[1]/dim(spanish)[1])*100,2)
#Coef Spanish
general.esp.esp <- round((dim(esp.transmission.esp)[1]/dim(spanish)[1])*100,2)
#Table
table2 <- data.frame(matrix(ncol=3, nrow=1))
table2[1,] <- paste(c("Balearic Islands",general.esp.cat, general.esp.esp))
names(table2) <- c("Spanish native speakers", "Catalan transmission", "Spanish transmission")
cat("Number of samples:", dim(spanish)[1])
## Number of samples: 891
cat("Number of samples (Catalan transmission):", dim(esp.transmission.cat)[1])
## Number of samples (Catalan transmission): 56
cat("Number of samples (Spanish transmission):", dim(esp.transmission.esp)[1])
## Number of samples (Spanish transmission): 775
table2
##   Spanish native speakers Catalan transmission Spanish transmission
## 1        Balearic Islands                 6.29                86.98

This group is smaller than the CatNS group (891 vs 1,017 members). Like CatNS, EspNS are highly loyal in language transmission, though slightly less so (EspNS 87%, CatNS 91%). Language shift is somewhat higher among EspNS: 6%, compared with 3%. Overall, the differences are small (about 4 and 3 percentage points). Whether they go beyond sampling error would need a formal test.
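The sketch below runs a two-sample proportion test with base R's prop.test() on the counts printed above, comparing CatNS and EspNS. It treats both groups as independent simple random samples and ignores any survey weights or design effects, so it only gives a rough indication. Under that assumption, both gaps come out with p-values below 0.05. This suggests the differences are small but not obviously just sampling noise.

```r
# Counts from the tables above
n_groups <- c(CatNS = 1017, EspNS = 891)
loyal    <- c(CatNS = 927,  EspNS = 775)  # transmit their own language
shifted  <- c(CatNS = 35,   EspNS = 56)   # transmit the other language

# Two-sample tests for equal proportions (with continuity correction),
# assuming independent simple random samples
loyalty_test <- prop.test(loyal, n_groups)
shift_test   <- prop.test(shifted, n_groups)

round(100 * loyalty_test$estimate, 2)
loyalty_test$p.value
round(100 * shift_test$estimate, 2)
shift_test$p.value
```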

As before, why do 56 EspNS switch from Spanish to Catalan when speaking to their children? Let’s see which language they speak with their partner:

lang.couple <- as.data.frame(summary(as.factor(esp.transmission.cat$ling_4f)))
row.names(lang.couple) <- c("Catalan", "Spanish", "Catalan and Spanish", "Doesn't have a couple")
names(lang.couple) <- "Language spoken with the couple"
lang.couple
##                       Language spoken with the couple
## Catalan                                            26
## Spanish                                            23
## Catalan and Spanish                                 2
## Doesn't have a couple                               5

Here we find something new. Some EspNS transmit Catalan to their children because they speak Catalan with their partner (26). About the same number (23) do so while speaking Spanish with their partner, presumably in couples where both members are EspNS. Catalan sociolinguists have pointed to this as evidence of EspNS support for the Catalan language. But we must admit that the figures are anecdotal, at least in the Balearic Islands: this support for Catalan affects only about 2.6% of EspNS (23 of 891).

EspNS are clearly less likely than CatNS to speak their partner’s native language (compare the figures for both groups), but when they do, they clearly tend to transmit Catalan to their children:

#Mixed couple 
lang.couple2 <- as.data.frame(summary(as.factor(esp.transmission.esp$ling_4f)))
row.names(lang.couple2) <- c("Catalan", "Spanish", "Catalan and Spanish", "Other", "Doesn't have a couple")
names(lang.couple2) <- "Language spoken with the couple (frequency)"
lang.couple2
##                       Language spoken with the couple (frequency)
## Catalan                                                         6
## Spanish                                                       738
## Catalan and Spanish                                             4
## Other                                                           4
## Doesn't have a couple                                          23

We have 6 EspNS (19%) who speak Catalan with their partner and Spanish with their children, and 26 (81%) who speak Catalan with both their partner and their children.

In conclusion, language shift among EspNS is very low. About half of it is explained by the language spoken with the partner, and the other half by support for the Catalan language. Both figures are really low.

And what about gender?

gender <- as.data.frame(summary(as.factor(esp.transmission.cat$sexo)))
row.names(gender) <- c("Men", "Women")
names(gender) <- "Number"
gender
##       Number
## Men       24
## Women     32

Again, more women than men (32 vs 24), but the figures are closer.

3.1 Are there any differences between generations?

As before, I compare two age groups: 40 years or younger, and older than 40.

#Younger than 40 years
esp.cat.young <- esp.transmission.cat[esp.transmission.cat$fnac_edad <= 40,]
esp.esp.young <- esp.transmission.esp[esp.transmission.esp$fnac_edad <= 40,]
general.esp.cat.young <- round((dim(esp.cat.young)[1]/dim(spanish[spanish$fnac_edad <= 40,])[1])*100,2)
general.esp.esp.young <- round((dim(esp.esp.young)[1]/dim(spanish[spanish$fnac_edad <= 40,])[1])*100,2)
#Older than 40 years
esp.cat.old <- esp.transmission.cat[esp.transmission.cat$fnac_edad > 40,]
esp.esp.old <- esp.transmission.esp[esp.transmission.esp$fnac_edad > 40,]
general.esp.cat.old <- round((dim(esp.cat.old)[1]/dim(spanish[spanish$fnac_edad > 40,])[1])*100,2)
general.esp.esp.old <- round((dim(esp.esp.old)[1]/dim(spanish[spanish$fnac_edad > 40,])[1])*100,2)
#Table
table2[2,] <- paste(c("Balearic Islands <= 40 years", general.esp.cat.young, general.esp.esp.young))
table2[3,] <- paste(c("Balearic Islands > 40 years", general.esp.cat.old, general.esp.esp.old))
table2
##        Spanish native speakers Catalan transmission Spanish transmission
## 1             Balearic Islands                 6.29                86.98
## 2 Balearic Islands <= 40 years                 7.29                83.17
## 3  Balearic Islands > 40 years                 5.48                90.06

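As in the CatNS group, language shift among EspNS seems to be increasing. Catalan transmission rises from 5.48% in the older group to 7.29% in the younger one, a smaller change than among CatNS.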

Are EspNS using Catalan more often with their partner?

cat("EspNS use Catalan in couple (40 years or younger):",nrow(spanish[spanish$fnac_edad <= 40 & spanish$ling_4f == 1,])/nrow(spanish[spanish$fnac_edad <= 40,])*100,"%")
## EspNS use Catalan in couple (40 years or younger): 4.020101 %
cat("EspNS use Catalan in couple (older than 40 years):",nrow(spanish[spanish$fnac_edad > 40 & spanish$ling_4f == 1,])/nrow(spanish[spanish$fnac_edad > 40,])*100,"%")
## EspNS use Catalan in couple (older than 40 years): 4.056795 %

No. Here we find another difference from the CatNS group. Younger CatNS use Spanish with their partner more often than older ones, whereas the (anecdotal) use of Catalan with the partner among EspNS is stable across age groups, at about 4%.

3.2 Is there any difference by island or county?

As before, first by island:

#Islands
island <- unique(data$isla)

for (i in island) {
  esp.cat <- esp.transmission.cat[esp.transmission.cat$isla == i,]
  esp.esp <- esp.transmission.esp[esp.transmission.esp$isla == i,]
  esp.cat <- round((dim(esp.cat)[1]/dim(spanish[spanish$isla == i,])[1])*100,2)
  esp.esp <- round((dim(esp.esp)[1]/dim(spanish[spanish$isla == i,])[1])*100,2)
  table2[3+i,] <- paste(c(i, esp.cat, esp.esp))
  }
table2$`Spanish native speakers`[table2$`Spanish native speakers` == 1] <- "Mallorca Island"
table2$`Spanish native speakers`[table2$`Spanish native speakers` == 2] <- "Menorca Island"
table2$`Spanish native speakers`[table2$`Spanish native speakers` == 3] <- "Eivissa Island"
table2$`Spanish native speakers`[table2$`Spanish native speakers` == 4] <- "Formentera Island"
table2
##        Spanish native speakers Catalan transmission Spanish transmission
## 1             Balearic Islands                 6.29                86.98
## 2 Balearic Islands <= 40 years                 7.29                83.17
## 3  Balearic Islands > 40 years                 5.48                90.06
## 4              Mallorca Island                 7.08                84.97
## 5               Menorca Island                 4.23                95.77
## 6               Eivissa Island                 1.89                94.34
## 7            Formentera Island                 9.09                86.36

And by county (comarca):

#Counties
#The seven county codes of Mallorca (relabelled below)
county <- 40701:40707

for (i in county) {
  esp.cat <- esp.transmission.cat[esp.transmission.cat$c_nuts4 == i,]
  esp.esp <- esp.transmission.esp[esp.transmission.esp$c_nuts4 == i,]
  esp.cat <- round((dim(esp.cat)[1]/dim(spanish[spanish$c_nuts4 == i,])[1])*100,2)
  esp.esp <- round((dim(esp.esp)[1]/dim(spanish[spanish$c_nuts4 == i,])[1])*100,2)
  table2[nrow(table2)+1,] <- paste(c(i, esp.cat, esp.esp))
  }
table2$`Spanish native speakers`[table2$`Spanish native speakers` == 40701] <- "Pla County"
table2$`Spanish native speakers`[table2$`Spanish native speakers` == 40702] <- "Raiguer County"
table2$`Spanish native speakers`[table2$`Spanish native speakers` == 40703] <- "Nord County"
table2$`Spanish native speakers`[table2$`Spanish native speakers` == 40704] <- "Tramuntana"
table2$`Spanish native speakers`[table2$`Spanish native speakers` == 40705] <- "Sud County"
table2$`Spanish native speakers`[table2$`Spanish native speakers` == 40706] <- "Llevant County"
table2$`Spanish native speakers`[table2$`Spanish native speakers` == 40707] <- "Badia de Palma County"
table2
##         Spanish native speakers Catalan transmission Spanish transmission
## 1              Balearic Islands                 6.29                86.98
## 2  Balearic Islands <= 40 years                 7.29                83.17
## 3   Balearic Islands > 40 years                 5.48                90.06
## 4               Mallorca Island                 7.08                84.97
## 5                Menorca Island                 4.23                95.77
## 6                Eivissa Island                 1.89                94.34
## 7             Formentera Island                 9.09                86.36
## 8                    Pla County                   25                65.62
## 9                Raiguer County                13.33                85.19
## 10                  Nord County                 3.12                93.75
## 11                   Tramuntana                18.42                68.42
## 12                   Sud County                 6.82                93.18
## 13               Llevant County                 4.29                84.29
## 14        Badia de Palma County                 2.09                86.61

Language shift in the EspNS group seems to be higher in the islands and counties with more rural areas (and less EspNS immigration), such as Pla (25%) and Tramuntana (18.42%), although the county subsamples are small.

4 Comparing language shift in CatNS and EspNS groups

Finally, we compare the figures for the two main linguistic groups in the Balearic Islands:

final.table <- cbind(table, table2)
final.table <- final.table[,-4]
final.table <- final.table[,c(1:2,5,3:4)]
names(final.table) <- c("", "CatNS loyalty", "EspNS loyalty", "CatNS shift", "EspNS shift")
final.table
##                      CatNS loyalty EspNS loyalty CatNS shift EspNS shift
## 1 Balearic Islands         91.15         86.98        3.44       6.29
## 2  B.I <= 40 years         84.13         83.17        7.38       7.29
## 3   B.I > 40 years         93.70        90.06        2.01        5.48
## 4  Mallorca Island         89.61         84.97        4.13       7.08
## 5   Menorca Island         97.12         95.77        1.44       4.23
## 6   Eivissa Island         93.33         94.34        1.33       1.89
## 7 Formentera Island        94.23         86.36        1.92       9.09
## 8       Pla County         79.38         65.62        3.09      25.00
## 9   Raiguer County         91.74         85.19        7.44      13.33
## 10     Nord County         97.87         93.75        1.06       3.12
## 11      Tramuntana         97.03         68.42           0      18.42
## 12      Sud County         90.76         93.18        5.04       6.82
## 13  Llevant County         91.26         84.29        2.91       4.29
## 14    Palma County         80.17         86.61        7.76       2.09

5 The role of bilinguals and other native speakers in Catalan/Spanish competition

Here I analyze the role of bilinguals: those who speak a different language with each parent. One-parent families should be excluded. Note, however, that the filter below only requires the two parental languages to differ, so respondents with a single parent (code 5) are also counted. As there are fewer observations, we can’t segment the data reliably, so we only analyze the general figures.

Since parents usually do not speak several languages to the same child, bilinguals have to choose one language to transmit to the next generation. Do bilinguals favor Catalan or Spanish?

#Bilingual with parents
bilingual <- data[data$ling_4a !=  data$ling_4b,]
#Speaks Catalan to descendants
bil.transmission.cat <- bilingual[bilingual$ling_4g ==1,]
#Speaks Spanish to descendants
bil.transmission.esp <- bilingual[bilingual$ling_4g ==2,]
#Coef Catalan
general.bil.cat <- round((dim(bil.transmission.cat)[1]/dim(bilingual)[1])*100,2)
#Coef Spanish
general.bil.esp <- round((dim(bil.transmission.esp)[1]/dim(bilingual)[1])*100,2)
#Table
table3 <- data.frame(matrix(ncol=3, nrow=1))
table3[1,] <- paste(c("Balearic Island",general.bil.cat, general.bil.esp))
names(table3) <- c("Bilingual native speakers", "Catalan transmission", "Spanish transmission")
cat("Number of samples:", dim(bilingual)[1])
## Number of samples: 295
cat("Number of samples (Catalan transmission):", dim(bil.transmission.cat)[1])
## Number of samples (Catalan transmission): 172
cat("Number of samples (Spanish transmission):", dim(bil.transmission.esp)[1])
## Number of samples (Spanish transmission): 79
table3
##   Bilingual native speakers Catalan transmission Spanish transmission
## 1           Balearic Island                58.31                26.78

Clearly Catalan: 58% vs 27%.
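The sketch below is a filter that matches the stated definition of bilinguals. It excludes one-parent families (code 5, as in the CatNS and EspNS filters) as well as respondents who report the same language for both parents. It is shown on made-up toy data because the corrected counts cannot be derived from the figures printed here. is_bilingual() is a hypothetical helper.

```r
# Hypothetical helper: TRUE for respondents who speak a different language
# with each parent, with both parents present (code 5 = no such parent, assumed)
is_bilingual <- function(lang_a, lang_b) {
  !is.na(lang_a) & !is.na(lang_b) &
    lang_a != lang_b & lang_a != 5 & lang_b != 5
}

# Made-up toy data
toy <- data.frame(ling_4a = c(1, 1, 2, 1, 4, 5),
                  ling_4b = c(2, 5, 1, 1, 1, 2))
toy[is_bilingual(toy$ling_4a, toy$ling_4b), ]

# The filter used above also keeps the one-parent rows (2 and 6)
toy[toy$ling_4a != toy$ling_4b, ]
```

On the survey, subsetting data with is_bilingual(data$ling_4a, data$ling_4b) would apply the stricter definition, and the figures in this section would then need to be recomputed.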

And what about native speakers of other languages (OtherNS), those who speak another language with both parents?

#Other native speakers
other <- data[data$ling_4a == 4 &  data$ling_4b == 4,]
#Other to children
other.transmission.other <- other[other$ling_4g ==4,]
#Catalan to children
other.transmission.cat <- other[other$ling_4g ==1,]
#Spanish to children
other.transmission.esp <- other[other$ling_4g == 2,]
#Coef Other
general.other.other <- round((dim(other.transmission.other)[1]/dim(other)[1])*100,2)
#Coef Catalan
general.other.cat <- round((dim(other.transmission.cat)[1]/dim(other)[1])*100,2)
#Coef Spanish
general.other.esp <- round((dim(other.transmission.esp)[1]/dim(other)[1])*100,2)
table4 <- data.frame(matrix(ncol=4, nrow=1))
table4[1,] <- paste(c("Balearic Island",general.other.other,general.other.cat, general.other.esp))
names(table4) <- c("Other native speakers", "Other transmission", "Catalan transmission", "Spanish transmission")
cat("Number of samples:", dim(other)[1])
## Number of samples: 254
cat("Number of samples (Other transmission):", dim(other.transmission.other)[1])
## Number of samples (Other transmission): 171
cat("Number of samples (Catalan transmission):", dim(other.transmission.cat)[1])
## Number of samples (Catalan transmission): 49
cat("Number of samples (Spanish transmission):", dim(other.transmission.esp)[1])
## Number of samples (Spanish transmission): 32
table4
##            Transmission:   Other     Catalan   Spanish
## 1       Balearic Island    67.32      19.29      12.6

OtherNS mostly transmit their own language (67%), followed by Catalan (19%) and Spanish (13%). So when OtherNS shift language, they choose Catalan and Spanish in similar proportions, with some preference for Catalan (49 vs 32 cases).
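The sketch below checks whether 49 vs 32 reflects a real preference for Catalan or could be chance. It is a rough check with an exact binomial test against an even split, again assuming simple random sampling. With these counts the p-value is above 0.05, consistent with reading the two shifts as being of similar magnitude.

```r
# OtherNS who shift to Catalan (49) or to Spanish (32), from the output above
to_catalan <- 49
to_spanish <- 32

# Share of Catalan among those who shift, in percent
round(100 * to_catalan / (to_catalan + to_spanish), 1)

# Exact binomial test of H0: Catalan and Spanish equally likely,
# assuming simple random sampling
other_test <- binom.test(to_catalan, to_catalan + to_spanish, p = 0.5)
other_test$p.value
```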

Since we are in the Catalan-speaking area, it is interesting to note the differences between the exogenous linguistic groups. EspNS are, by far, more loyal to their native language than OtherNS (87% vs 67%). Indeed, EspNS behave basically like the endogenous linguistic group (CatNS). Let’s compare the general figures:

total.table <- cbind(table[1,], table2[1,], table3, table4)
names(total.table) <- c("", "CatNS loyalty", "CatNS shift", "", "EspNS shift", "EspNS loyalty", "", "Bilingual to Catalan", "Bilingual to Spanish", "", "OtherNS loyalty", "OtherNS to Catalan", "OtherNS to Spanish")
#Drop the repeated label columns and put the loyalty figures first
total.table <- total.table[,c(-4,-7,-10)]
total.table <- total.table[,c(1:2,5,8, 3:4, 9:10, 6:7)]
#Transpose so that each indicator becomes a row
total.table <- as.data.frame(t(total.table))
row.labels <- rownames(total.table)
total.table$language <- row.labels
rownames(total.table) <- NULL
total.table <- total.table[-1,]
total.table <- total.table[,c(2,1)]
names(total.table) <- c("Language shift", "Percentage")
total.table
##           Language shift Percentage
## 2          CatNS loyalty      91.15
## 3          EspNS loyalty      86.98
## 4        OtherNS loyalty      67.32
## 5            CatNS shift       3.44
## 6            EspNS shift       6.29
## 7     OtherNS to Catalan      19.29
## 8     OtherNS to Spanish       12.6
## 9   Bilingual to Catalan      58.31
## 10  Bilingual to Spanish      26.78

That’s all. For any comments, please use Twitter (@marcbeldata).

Written on April 29, 2016