Includes a Google Colab notebook (https://github.com/somos-ubb/Lyrics_Gender_Violence/blob/main/code/BETO/model.ipynb) used to generate a model adapted to our purpose, gender-based violence against women. The base model is BETO, the Spanish BERT model available at https://github.com/dccuchile/beto.
It includes two versions of GBV_dataset: GBV_dataset1000.csv, a corpus of 1,000 song lyrics labeled as {0: without gender-based violence; 1: with gender-based violence}, and GBV_dataset1400_.csv, a corpus of 1,400 song lyrics. Construction drew on newly collected examples and on the following previous work, relabeled by an expert in gender-based approaches:
- GBV Spanish Corpus, available in the Corpus folder of https://github.com/somos-ubb/Lyrics_Gender_Violence [1]
- Augmented DataSet available at https://github.com/somos-ubb/DataAugmentation [2]
- Sexism in the lyrics of the most-listened-to songs in Spain, available at https://github.com/mscasanova/SexismInLyrics [3]. From this source, we selected lyrics with content related to sexual harassment, rape, sexual assault, and physical violence, among others, against women.
[1] Calbullanca Viluñir, R., Segura Navarrete, A., Vidal-Castro, C., & Martínez-Araneda, C. (2024). Corpus of song lyrics in Spanish labeled for gender-based violence against women (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.13370289
[2] Gutiérrez, R., Segura Navarrete, A. A., Martínez-Araneda, C., & Vidal-Castro, C. (2024). Augmented DataSet [Data set]. Zenodo. https://doi.org/10.5281/zenodo.12802358
[3] Casanovas-Buliart, L., Álvarez-Cueva, P., & Castillo, C. (2024). Evolution over 62 years: an analysis of sexism in the lyrics of the most-listened-to songs in Spain. Cogent Arts & Humanities, 11(1). https://doi.org/10.1080/23311983.2024.2436723
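As a quick sanity check on the 0/1 label scheme described above, the following minimal sketch counts how many rows carry each label. The column names `lyrics` and `label` are assumptions for illustration only (the actual CSV headers are not documented here), and the sample rows are made-up placeholders, not data from the corpus.

```python
import csv
import io
from collections import Counter

# Label scheme as stated in this README.
LABEL_NAMES = {
    0: "without gender-based violence",
    1: "with gender-based violence",
}


def count_labels(csv_file, label_column="label"):
    """Count rows per label in an open CSV file object.

    `label_column` is an assumed header name; adjust it to match the real file.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or label_column not in reader.fieldnames:
        raise KeyError(f"column {label_column!r} not found in {reader.fieldnames}")

    counts = Counter()
    for row in reader:
        counts[int(row[label_column])] += 1

    # Fail loudly if the file contains labels outside the documented scheme.
    unknown = set(counts) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unexpected label values: {sorted(unknown)}")
    return counts


# Tiny made-up sample standing in for the real CSV (hypothetical headers).
sample = io.StringIO(
    "lyrics,label\n"
    "placeholder text a,0\n"
    "placeholder text b,1\n"
    "placeholder text c,0\n"
)
sample_counts = count_labels(sample)
for label, name in LABEL_NAMES.items():
    print(f"{label} ({name}): {sample_counts[label]}")
```

To run the same check on a downloaded file, open it with `open("GBV_dataset1000.csv", newline="", encoding="utf-8")` and pass the file object to `count_labels`, setting `label_column` to the file's actual header. The UTF-8 encoding is also an assumption.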
Calbullanca Viluñir, R., Segura-Navarrete, A., Vidal-Castro, C., & Martínez-Araneda, C. (2024). Corpus of Song Lyrics in Spanish Labeled for Gender-Based Violence against Women (Version 1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.13370289
Segura-Navarrete, A., Martínez-Araneda, C., Quintana-Reyes, C., Vidal-Castro, C., & Gómez-Meneses, P. (2026). somos-ubb/Lyrics_Gender_Violence: Gender-based violence DataSet (GBV_dataset1400) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.18157160
date-updated: January 5, 2026
