The Social Media Hate Speech Barometer

Making-of

Sebastian Sauer, Alexander Piazza, Sigurd Schacht

University of Ansbach

July 14, 2023

Project objective

To provide applied researchers with a template for classifying hate speech

Definition of hate speech¹

flowchart LR
subgraph ways[acting]
  direction LR
  speech
  writing
  behavior
end
subgraph hurt[that aims to hurt]
  direction LR
  attacks
  lang[pejorative language]
end
subgraph targets[people]
  direction LR
  individuals
  groups
end
subgraph who[based on who they are]
  direction LR
  race
  gender
  etc[etc.]
end
ways --> hurt --> targets --> who

Hate speech as a menace to society

flowchart LR
sm[Social Media] --> hs[hate speech]
hs --> dem[destabilizes democracy]
hs --> civ[threatens civil rights]
hs --> mh[harms mental health]
hs --> ph[harms psychosocial health]

Calvert (1997), Cinelli et al. (2021), Castaño-Pulgarín et al. (2021), Chan, Ghose, and Seamans (2016)

Machine learning for detecting hate speech? See Hartvigsen et al. (2022) and Velankar, Patil, and Joshi (2022)

Overview of design ideas

Project management via `targets`
Machine learning via `tidymodels` in R
Collaboration platform via GitHub

1. `targets` detects outdated computational objects and rebuilds only those
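A minimal `_targets.R` sketch of how `targets` tracks which objects are outdated. Assumptions: the input file `data/posts.csv` and the three steps are hypothetical placeholders, not the project's actual pipeline.

```r
# _targets.R: a minimal sketch of a targets pipeline.
# Assumption: the file path and the steps below are hypothetical placeholders.
library(targets)

# Packages loaded before each target is built
tar_option_set(packages = c("rsample"))

# Each tar_target() names an object and the command that builds it;
# targets infers the dependencies from the symbols used in each command.
list(
  tar_target(data_file, "data/posts.csv", format = "file"),  # input file, tracked by its hash
  tar_target(d, read.csv(data_file)),                        # outdated whenever data_file changes
  tar_target(d_split, initial_split(d, prop = 0.8))          # outdated whenever d changes
)
```

`tar_make()` builds the pipeline. Afterwards, `tar_outdated()` lists the targets that would be rebuilt, and a new `tar_make()` reruns only those; `tar_read(d_split)` loads a stored result.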

2. Standard machine learning pipeline

flowchart LR
prepData[Import and <br>prepare data]  --> folds
subgraph folds[for each fold do]
  subgraph tune[for each tune value do]
    direction TB
    prep2[prepare data] --> fit[fit model] --> predict[predict on<br>assessment data]
  end
end
folds --> fit2[fit best candidate<br>on train data]
fit2 -->pred2[predict on test data]
pred2 --> performance[assess<br>performance]
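To make the nested loops of the diagram concrete, here is a hand-rolled sketch in base R (deliberately not tidymodels). Assumptions: `mtcars` as toy data, `mpg` as outcome, the polynomial degree of `hp` as the only tuning parameter, z-scoring as the "prepare data" step, RMSE as metric, and 5 folds. The key point is that preprocessing is learned inside each fold, from the analysis part only.

```r
# Hand-rolled sketch of the pipeline above, base R only.
# Assumptions: mtcars as toy data, mpg as outcome, polynomial degree of hp
# as the only tuning parameter, RMSE as metric, 5 folds.
set.seed(42)

# Import and prepare data: hold out 25% of the rows as test data
n <- nrow(mtcars)
test_idx <- sample(n, size = round(0.25 * n))
d_train <- mtcars[-test_idx, ]
d_test  <- mtcars[test_idx, ]

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Prepare data: learn centering and scaling on `train` only, apply to `new`
prep_hp <- function(train, new) {
  new$hp_z <- (new$hp - mean(train$hp)) / sd(train$hp)
  new
}

k <- 5
fold_id <- sample(rep(seq_len(k), length.out = nrow(d_train)))
degrees <- 1:3  # candidate tuning values

# Outer loop over folds, inner loop over tuning values (as in the diagram);
# the result has one row per tuning value and one column per fold
rmse_by_fold <- sapply(seq_len(k), function(f) {
  analysis   <- d_train[fold_id != f, ]
  assessment <- d_train[fold_id == f, ]
  sapply(degrees, function(deg) {
    fit <- lm(mpg ~ poly(hp_z, deg), data = prep_hp(analysis, analysis))
    rmse(assessment$mpg, predict(fit, newdata = prep_hp(analysis, assessment)))
  })
})
cv_rmse <- rowMeans(rmse_by_fold)  # mean assessment RMSE per tuning value

# Fit the best candidate on the full training data, predict on the test data once
best_deg  <- degrees[which.min(cv_rmse)]
final_fit <- lm(mpg ~ poly(hp_z, best_deg), data = prep_hp(d_train, d_train))
test_rmse <- rmse(d_test$mpg, predict(final_fit, newdata = prep_hp(d_train, d_test)))
```

In tidymodels, `tune_grid()` runs the two inner loops, and `last_fit()` performs the final fit and the single prediction on the test data.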

2. `library(tidymodels)`

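library(tidymodels)  # also requires the ranger package for the engine below

# Classification needs a factor outcome; in mtcars, `am` is stored as 0/1
d <- mtcars
d$am <- factor(d$am)

set.seed(42)  # reproducible split, folds, and grid
d_split <- initial_split(d, strata = am)
d_train <- training(d_split)

ranger_recipe <- recipe(formula = am ~ ., data = d_train)

ranger_spec <- 
  rand_forest(mtry = tune(), min_n = tune(), trees = 1000) %>% 
  set_mode("classification") %>% 
  set_engine("ranger")

rsmpl <- vfold_cv(d_train, v = 5, strata = am)  # 5 folds, as the data set is tiny

ranger_workflow <- workflow() %>% 
  add_recipe(ranger_recipe) %>% 
  add_model(ranger_spec)

ranger_tune <- tune_grid(ranger_workflow, 
                         resamples = rsmpl, 
                         grid = 10)  # 10 candidate points, an arbitrary choice

# Fit the best candidate on the full training data, then assess it on the test data
best_params <- select_best(ranger_tune, metric = "roc_auc")
ranger_fit <- last_fit(finalize_workflow(ranger_workflow, best_params), d_split)
collect_metrics(ranger_fit)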
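For social media posts, the recipe must first turn raw text into numeric features. A minimal sketch, assuming the `textrecipes` package (a tidymodels extension) and a hypothetical toy data set with neutral placeholder texts; tokenizing plus tf-idf weighting is one common choice, not necessarily the project's.

```r
# Sketch of a text recipe for post-level classification.
# Assumptions: the textrecipes package; toy placeholder texts and labels;
# max_tokens = 5 chosen arbitrarily.
library(tidymodels)
library(textrecipes)

d_text <- tibble(
  text = c("first example post", "second example post",
           "another short post", "yet another example"),
  hate = factor(c("yes", "no", "no", "yes"))
)

text_recipe <- recipe(hate ~ text, data = d_text) %>%
  step_tokenize(text) %>%                    # split each post into word tokens
  step_tokenfilter(text, max_tokens = 5) %>% # keep only the most frequent tokens
  step_tfidf(text)                           # one tf-idf column per kept token

# Estimate the steps on the data and return the processed training data
baked <- bake(prep(text_recipe), new_data = NULL)
```

In the workflow above, a recipe of this kind would take the place of `ranger_recipe`.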

2. Machine learning pipeline

3. Collaborate via Git and GitHub

That’s all folks! 👋

QR code with a link to the slides

Sebastian Sauer

sebastian.sauer(at)hs-ansbach.de

https://sebastiansauer.github.io/hate-speech-barometer/talks/bmt-2023

Technical details

Date of last update: 2023-07-14 08:01:26.

 setting  value
 version  R version 4.2.1 (2022-06-23)
 os       macOS Big Sur ... 10.16
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Madrid
 date     2023-07-14
 pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
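A table like the one above can be printed with the `sessioninfo` package; that it was generated this way is an assumption.

```r
# Assumption: the sessioninfo package is installed; the table above resembles
# the platform part of its output.
library(sessioninfo)

platform <- platform_info()  # R version, OS, locale, time zone, date, pandoc
print(platform)
```

`session_info()` prints the same platform block followed by the versions of all loaded packages.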

References

Calvert, C. 1997. “Hate Speech and Its Harms: A Communication Theory Perspective.” Journal of Communication 47 (1): 4–19. https://doi.org/10.1111/j.1460-2466.1997.tb02690.x.

Castaño-Pulgarín, Sergio Andrés, Natalia Suárez-Betancur, Luz Magnolia Tilano Vega, and Harvey Mauricio Herrera López. 2021. “Internet, Social Media and Online Hate Speech. Systematic Review.” Aggression and Violent Behavior 58 (May): 101608. https://doi.org/10.1016/j.avb.2021.101608.

Chan, Jason, Anindya Ghose, and Robert Seamans. 2016. “The Internet and Racial Hate Crime: Offline Spillovers from Online Access.” Management Information Systems Quarterly 40 (2): 381–403. https://aisel.aisnet.org/misq/vol40/iss2/8.

Cinelli, Matteo, Andraž Pelicon, Igor Mozetič, Walter Quattrociocchi, Petra Kralj Novak, and Fabiana Zollo. 2021. “Dynamics of Online Hate and Misinformation.” Scientific Reports 11 (1): 22083. https://doi.org/10.1038/s41598-021-01487-w.

Hartvigsen, Thomas, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, and Ece Kamar. 2022. “ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection.” July 14, 2022. https://doi.org/10.48550/arXiv.2203.09509.

Velankar, Abhishek, Hrushikesh Patil, and Raviraj Joshi. 2022. “A Review of Challenges in Machine Learning Based Automated Hate Speech Detection.” September 12, 2022. https://doi.org/10.48550/arXiv.2209.05294.