
Role of NLP in Analysing Hate Speech


_themessier

July 17, 2023

  1. Role of NLP in Analysing Hate Speech
    Sarah Masud, [email protected], IIITD
    Visiting Researcher @TUM


  2. Introduction and Motivation


  3. Outline
    ● Introduction and Motivation
    ● How can NLP help in understanding hate? (our contributions)
    ○ Detection of Hate
    ○ Diffusion of Hate
    ○ Mitigation of Hate
    ● Open Questions and Challenges
    Disclaimer: Subsequent content contains extreme
    language (verbatim from social media), which does
    not reflect my opinions or those of my
    collaborators. Reader’s discretion is advised.


  4. Hatred is an age-old problem
    “I will surely kill thee”
    Story of Cain and Abel
    Fig 1: List of Extremist/Controversial SubReddits [1]
    Fig 2: YouTube Video Incitement to Violence and Hate Crime [2]
    Fig 3, 4: Twitter hate speech [3]
    Fig 5: Rwanda Genocide, 1994 [5]
    [1]: Wiki
    [2]: YouTube
    [3], [4]: Anti-Semitic Schooling
    [5]: Radio and Rwanda, Image


  5. Internet platforms’ policies w.r.t. curbing hate
    Moderated
    ● Twitter
    ● Facebook
    ● Instagram
    ● YouTube
    Semi-Moderated
    ● Reddit
    Unmoderated
    ● Gab
    ● 4chan
    ● BitChute
    ● Parler
    ● StormFront
    ● Anonymity has led to an increase in
    anti-social behaviour [1], hate speech
    being one form of it.
    ● Hate can be studied at a macroscopic as
    well as a microscopic level. [2]
    ● It exists in various mediums.
    [1]: Suler, John, CyberPsychology & Behavior, 2004
    [2]: Luke Munn, Humanities and Social Sciences Communications, Article 53


  6. Definition of Hate Speech
    ● Hate is subjective, temporal and cultural in
    nature.
    ● The UN defines hate speech as “any kind of
    communication that attacks or uses
    pejorative or discriminatory language
    with reference to a person or a group on
    the basis of who they are.” [1]
    ● Social media users need to be sensitised.
    [1]: UN hate
    [2]: Pyramid of Hate
    Fig 1: Pyramid of Hate [2]


  7. Workflow for Analysing and Mitigating Hate Speech
    [1]: Tanmoy and Sarah, Nipping in the bud: detection, diffusion and mitigation of hate speech on social media, ACM SIGWEB Winter, Invited Publication
    Our Contributions
    so far


  8. Questions we ask
    ● Question: Does the spread of hate depend on the topic under consideration?
    ○ Takeaway: Yes, topical information drives hate.
    ○ Takeaway: Additionally, exogenous signals are as important as endogenous (in-platform)
    signals in influencing the spread of hate.
    ● Question: Is there a middle ground to help users transition from extreme hate to non-hate?
    ○ Takeaway: The way to curb hate speech is more speech.
    ○ Takeaway: Free speech and equal opportunity of speech are not the same.
    ● Question: How do different endogenous signals help in the detection of hate?
    ○ Takeaway: Context matters in determining hatefulness.
    ○ Takeaway: A user’s recent history around a tweet captures similar psycho-linguistic patterns.


  9. Spread of Hate


  10. Hate is the New Infodemic: A Topic-aware
    Modeling of Hate Speech Diffusion on Twitter
    Sarah Masud, Subhabrata Dutta, Sakshi Makkar , Chhavi Jain , Vikram Goyal , Amitava Das , Tanmoy Chakraborty
    Published at ICDE 2021


  11. Literature Overview: Hate Analysis
    [1]: Ribeiro et al., WebSci’18
    [2]: Mathew et al., WebSci '19
    Fig 1: Belief Propagation to determine hatefulness of users [1]
    Fig 2: Repost DAG [2]
    ● Source: Gab, as it promotes “free speech”.
    ● User and Network Level Features.
    ● They curated their own list of hateful lexicons.
    ● Initial hateful users were enlisted based on
    hate lexicon mapping of users.
    Fig 3: Difference in hateful and non-hateful cascades [2]


  12. Limitations of Existing Diffusion Analysis
    ● Only exploratory analysis.
    ● Considers hate and non-hate to be separate groups. [1]
    ● Generic information cascade models do not take content into account, only who follows whom. [2, 3]
    Motivation
    ● How do different topics lead to the generation and spread of hate speech in a user network?
    ● How does a hateful tweet diffuse via retweets?
    [1]: Mathew et al., WebSci '19
    [2]: Wang et al., ICDM’17
    [3]: Yang et al., IJCAI’19


  13. Proposed Hate Diffusion Specific Dataset
    ● Crawled a large-scale Twitter dataset.
    ○ Timeline
    ○ Follow network (2-hops)
    ○ Metadata
    ● Manually annotated a total of 17k tweets (k=0.58).
    ● Trained a Hate Detection model for our dataset.
    ● Additionally crawled online news articles (600k).
    [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
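The agreement score above (k = 0.58) is Cohen's kappa. A minimal pure-Python sketch of how such agreement is computed (the two label sequences below are illustrative toys, not from the dataset):

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    n = len(ann_a)
    # Observed agreement: fraction of items both annotators labelled the same.
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Expected chance agreement, from each annotator's label marginals.
    ca, cb = Counter(ann_a), Counter(ann_b)
    p_e = sum(ca[l] * cb[l] for l in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example: two annotators labelling 10 tweets as hate (1) / non-hate (0).
a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(round(cohens_kappa(a, b), 2))  # → 0.6
```

Values near 0.58, as reported, indicate moderate agreement, which is common for subjective tasks such as hate annotation.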


  14. Hate Diffusion Specific Dataset
    Fig 1. #tag level information of RETINA [1]
    [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021


  15. Some Interesting observations
    Fig 1: Hatefulness of different users towards different hashtags in RETINA [1]
    Fig 2: Retweet cascades for hateful and non-hate tweets in RETINA [1]
    ● Different users show varying tendencies to engage with hateful content depending on the topic.
    ● Hate speech spreads faster, within a shorter period.
    [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021


  16. Problem Statement
    Given a hateful tweet and associated signals, predict whether a given user (a follower
    account) will retweet the hateful tweet within a given time window. [1]
    [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
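As a toy illustration of this formulation only — the features and weights below are invented stand-ins, not RETINA's learned model — the task reduces to binary classification over tweet-, user- and exogenous-level signals:

```python
from dataclasses import dataclass

@dataclass
class RetweetInstance:
    topic_hate_affinity: float    # hypothetical: tweet's topical hate signal
    follower_hate_history: float  # hypothetical: follower's past hatefulness
    exo_news_signal: float        # hypothetical: exogenous (news) signal
    hours_since_post: float       # position of the prediction window

def predict_retweet(x: RetweetInstance, threshold: float = 0.5) -> int:
    """Toy linear scorer standing in for the learned model; 1 = will retweet."""
    score = (0.4 * x.topic_hate_affinity
             + 0.3 * x.follower_hate_history
             + 0.3 * x.exo_news_signal
             - 0.1 * x.hours_since_post / 24)
    return int(score >= threshold)

print(predict_retweet(RetweetInstance(0.9, 0.8, 0.7, 2.0)))   # likely retweet
print(predict_retweet(RetweetInstance(0.1, 0.1, 0.1, 24.0)))  # likely not
```

The actual model replaces the hand-set weights with learned attention over these signal families.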


  17. Proposed Model: RETINA
    Fig 1: Exogenous Attention Mechanism [1]
    [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021


  18. Proposed Model: RETINA
    Fig 1: Exogenous Attention Mechanism [1] Fig 2: Static Retweet Prediction [1]
    Fig 3: Dynamic Retweet Prediction [1]
    [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
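The exogenous attention in Fig 1 attends over news-article representations when predicting retweets. The generic scaled dot-product form below is a simplified stand-in for it, not the paper's exact parameterisation:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def exo_attention(query, news_vecs):
    """Scaled dot-product attention: a tweet-derived query attends over
    exogenous news embeddings and returns the weighted context vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, v)) / math.sqrt(d) for v in news_vecs]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, news_vecs)) for i in range(d)]

query = [1.0, 0.0]               # toy 2-d tweet representation
news = [[1.0, 0.0], [0.0, 1.0]]  # toy news-article embeddings
ctx = exo_attention(query, news) # context leans towards the first article
```

The resulting context vector is what the static and dynamic retweet predictors consume alongside the endogenous features.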


  19. Experimental Results: RETINA
    Fig 1: Baseline Comparisons [1]
    Fig 2: Behaviour of cascade for different baselines.
    Darker bars are hate [1].
    [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
    (No exogenous signal used)


  20. Mitigation of Hate


  21. Proactively Reducing the Hate Intensity of Online
    Posts via Hate Speech Normalization
    Sarah Masud, Manjot Bedi, Mohammad Aflah Khan, Md Shad Akhtar, Tanmoy Chakraborty
    Accepted at KDD 2022


  22. Hate Intensity
    ● Intensity/severity captures the
    explicitness of hate speech.
    ● High-intensity hate is more likely to contain
    offensive lexicons, offensive spans, direct
    attacks and mentions of the target entity.
    Consuming coffee is bad, I hate it! (the world
    can live with this opinion)
    Let’s bomb every coffee shop and kill all
    coffee makers (this is a threat)
    Fig 1: Pyramid of Hate [1]
    [1]: Pyramid of Hate


  23. Literature Overview: Intervention
    during Tweet creation
    ● 200k users identified in the study; 50% randomly assigned to the
    control group.
    ● H1: Are prompted users less likely to post the current offensive content?
    ● H2: Are prompted users less likely to post offensive content in the future?
    [1]: Katsaros et al., ICWSM ‘22
    Fig 1: User behaviour statistics as part of the intervention study [1]
    Fig 2: Twitter reply test for offensive replies [1]


  24. NACL Dataset
    ● Hateful samples collected from existing Hate Speech datasets.
    ● Manually annotated for Hate intensity and hateful spans.
    ● Hate Intensity is marked on a scale of 1-10.
    ● Manual generation of the normalised counterpart and its intensity. (k = 0.88)
    Fig 1: Original and Normalised Intensity Distribution [1]
    Fig 2: Dataset Stats [1]
    [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022


  25. Motivation & Evidence
    ● Reducing intensity is a stepping stone towards non-hate.
    ● It does not force users to change their sentiment or opinion.
    ● It demonstrably leads to less virality.
    Fig 1: Difference in predicted
    number of comments per set per
    iteration. [1]
    [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022


  26. Problem Statement
    For a given hate sample t, our objective is to obtain its normalized (sensitised) form t′ such
    that the intensity of hatred ϕ is reduced while the meaning is still conveyed: [1]
    ϕ(t′) < ϕ(t)
    Fig 1: Example of an original high-intensity vs. a normalised sentence [1]
    [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022


  27. Proposed Method: NACL- Neural hAte speeCh normaLizer
    Hate Intensity
    Prediction (HIP)
    Hate Span
    Prediction (HSI)
    Hate Intensity
    Reduction (HIR)
    Fig 1: Flowchart of NACL [1]
    [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
    Flow: Extremely Hateful Input (ORIGINAL) → HATE NORMALIZATION →
    Less Hateful Input (SUGGESTIVE) → User’s Choice
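A toy, lexicon-based stand-in for the three modules — the real HIP, HSI and HIR are learned neural components, and the phrase list, replacements and intensity scores below are purely illustrative — run on the coffee example from the earlier slide:

```python
import re

# Illustrative phrase lexicon; NACL *learns* spans and rewrites instead.
OFFENSIVE = {"bomb every": "criticise every", "kill all": "boycott all"}

def predict_spans(text):  # toy stand-in for Hate Span Prediction (HSI)
    return [m.span() for p in OFFENSIVE for m in re.finditer(p, text.lower())]

def predict_intensity(text):  # toy stand-in for Hate Intensity Prediction (HIP)
    return 1 + 3 * sum(p in text.lower() for p in OFFENSIVE)

def reduce_intensity(text):  # toy stand-in for Hate Intensity Reduction (HIR)
    for p, r in OFFENSIVE.items():
        text = re.sub(p, r, text, flags=re.IGNORECASE)
    return text

t = "Let's bomb every coffee shop and kill all coffee makers"
t_norm = reduce_intensity(t)
assert predict_intensity(t_norm) < predict_intensity(t)  # phi(t') < phi(t)
```

The assertion mirrors the problem statement: the suggested rewrite must score strictly lower in intensity than the original, with the user free to accept or reject it.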


  28. Hate Intensity
    Reduction
    Overall Loss
    Reward
    Fig 1: Hate Normalization Framework [1]
    [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022


  29. Hate Intensity Reduction (HIR)
    Fig 1: Hate Intensity Reduction Module [1]
    [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022


  30. Tool: Detects Hateful spans and suggests changes as you type
    Fig 1: Snapshot of NACL tool [1]
    [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022


  31. Detection of Hate


  32. Revisiting Hate Speech Benchmarks: From
    Data Curation to System Deployment
    Atharva Kulkarni, Sarah Masud, Vikram Goyal , Tanmoy Chakraborty
    KDD’23


  33. Literature Overview: Hate Dataset
    Dataset Source & Language (Modality) Year Labels Annotation
    Waseem & Hovy [1] Twitter, English, Texts 2016 R,S,N 16k, E, k = 0.84
    Davidson et al. [2] Twitter, English, Texts 2017 H,O,N 25k, C, k = 0.92
    Wulczyn et al. [3] Wikipedia comments, English, Texts 2017 PA, N 100k, C, k = 0.45
    Gibert et al. [5] Stormfront, English, Texts 2018 H,N 10k, k = 0.62
    Founta et al. [4] Twitter, English, Texts 2018 H,A,SM,N 70k, C, k = ?
    Albadi et al. [6] Twitter, Arabic, Texts 2018 H, N 6k, C, k = 0.81
    R- Racism
    S- Sexism
    H- Hate
    PA- Personal
    Attack
    A- Abuse
    SM- Spam
    O- Offensive
    L- Religion
    N- Neither
    I- Implicit
    E- Explicit
    [1]: Waseem & Hovy, NAACL’16
    [2]: Davidson et al., WebSci’17
    [3]: Wulczyn et al., WWW’17
    [4]: Founta et al., WebSci’18
    [5]: Gibert et al., ALW2’18
    [6]: Albadi et al., ANLP’20
    E- Internal Experts
    C- Crowd Sourced


  34. Dataset Source & Language (Modality) Year Labels Annotation
    Mathur et al. [1] Twitter, Hinglish, Texts 2018 H, O, N 3k, E, k = 0.83
    Rizwan et al. [3] Twitter, Urdu (Roman Urdu), Texts 2020 A, S, L, P,N 10k, E, k=?
    Gomez et al. [4] Twitter, English, Memes 2020 H, N 150k, C, k = ?
    ElSherief et al. [11] Twitter, English, Texts 2021 I,E,N
    Literature Overview: Hate Dataset
    [1]: Mathur et al., AAAI’20
    [3]: Rizwan et al., EMNLP’19
    [4]: Gomez et al., WACV’20
    ● HASOC [5], Jigsaw Kaggle [6], SemEval [7], FB
    Hate-Meme Challenge [8],
    ● WOAH [9], CONSTRAINT [10]
    [5]: HASOC
    [6]: Jigsaw Kaggle
    [7]: SemEval
    [8]: FB Hate-Meme
    [9]: WOAH
    [10]: CONSTRAINT
    [11]: ElSherief et al., EMNLP’21
    E- Internal Experts
    C- Crowd Sourced
    R- Racism
    S- Sexism
    H- Hate
    PA- Personal
    Attack
    A- Abuse
    SM- Spam
    O- Offensive
    L- Religion
    N- Neither
    I- Implicit
    E- Explicit


  35. Literature Overview: Hate Detection
    ● N-gram Tf-idf + LR/SVM [1,2]
    ● Glove + CNN, RNN [3]
    ● Transformer based
    ○ Zero , Few Shot [4]
    ○ Fine-tuning [5]
    ○ HateBERT [6]
    ● Generation for classification [7,11]
    ● Multimodality
    ○ Images [8]
    ○ Historical Context [9]
    ○ Network and Neighbours [10]
    ○ News, Trends, Prompts [11]
    [1]: Waseem & Hovy, NAACL’16
    [2]: Davidson et al., WebSci’17
    [3]: Badjatiya et al., WWW’17
    [4]: Pelicon et al., EACL Hackashop’21
    [5]: Timer et al., EMNLP’21
    [6]: Caselli et al., WOAH’21
    [7]: Ke-Li et al.
    [8]: Kiela et al., NeurIPS’20
    [9]: Qian et al., NAACL’19
    [10]: Mehdi et al., IJCA’20, Vol 13
    [11]: Badr et al.
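The classical pipeline cited in [1, 2] — word n-gram TF-IDF features fed to a linear classifier — can be sketched with scikit-learn. The four training texts are an invented toy set, far too small for real use:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus: 1 = hateful, 0 = non-hateful.
texts = [
    "i hate this group, they should all disappear",
    "what a lovely day at the park",
    "these people are vermin and deserve harm",
    "enjoyed a great coffee this morning",
]
labels = [1, 0, 1, 0]

# Unigram + bigram TF-IDF, then logistic regression, as in the early baselines.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["they are vermin and deserve harm"]))
```

Such lexical baselines are exactly what the later slides criticise: they key on slurs and surface n-grams, and miss implicit or contextual hate.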


  36. Limitations of Existing Datasets
    ● A myopic approach to hate speech datasets using hate lexicons. [1, 2]
    ● Hate speech in the real world goes beyond hateful slurs. [3]
    ● Limited study in the Hinglish context.
    Motivation
    ● Can we curate a large-scale Hinglish dataset encompassing different geographies?
    ● Can we model contextual information into the detection of hate?
    [1]: Waseem & Hovy, NAACL’16
    [2]: Davidson et al., WebSci’17
    [3]: ElSherief et al., EMNLP’21


  37. GOTHate Dataset Curation
    ● Curated from 3 different
    geographies (India, USA, UK)
    ● Intermixing of events like Trump
    Visit and NCR protests
    ● Neutral seeding
    ● Tweets present in English,
    Hindi and Hinglish.
    ○ 3k tweets in pure Devanagari
    ● We additionally collected
    timelines of users and their 1st
    hop follower-followee network.
    Fig 1: Dataset Sample of GOTHate [1]
    [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023


  38. GOTHate Annotation Process
    Fig 1: 2-phased Annotation Mode [1]
    Fig 2: Overview of
    Annotation Guideline [1]
    ● Phase I: k = 0.80
    ● Phase II: k = 0.70
    [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023


  39. GOTHate Dataset Statistics
    Fig 1: Dataset Stats [1]
    ● 50k tweets.
    ● 3k hateful.
    ● Delhi Riots related topics
    garner maximum hate.
    ● NT does have as much
    hate as expected.
    [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023


  40. Yet Another Hate Speech Dataset
    Fig 1: Intra-class JS Distance of different HS datasets [1]
    Observations:
    O1: JS distance (H-P=0.087) and (N-P=0.063) are
    lower than other pairs.
    O2: In the proposed dataset, the hate class is closer to
    the neutral class than to the offensive class.
    O3: All HS datasets have lower divergence.
    Reasons:
    O1: Cause and product of provocative disagreement
    in human annotation.
    O2: Due to neutral hate seeding and lack of lexicon
    for curation.
    O3: Curation from real-world interactions leads to
    fuzzy classification of hate.
    [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023
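The Jensen–Shannon distances in Fig 1 compare class-level word distributions. A minimal pure-Python sketch with toy distributions (not the paper's actual vocabularies or reported values):

```python
import math

def js_distance(p, q):
    """Jensen–Shannon distance (base 2) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):  # Kullback–Leibler divergence, skipping zero-probability terms
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

# Toy unigram distributions over a shared 3-word vocabulary.
hate    = [0.5, 0.3, 0.2]
neutral = [0.4, 0.3, 0.3]
print(round(js_distance(hate, neutral), 3))  # → 0.106
```

Small distances like H-P = 0.087 and N-P = 0.063 indicate heavily overlapping class vocabularies, which is exactly why lexicon-driven detection struggles on this dataset.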


  41. Proposed Method
    [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023


  42. Experiments and Ablation
    Fig 1: Baseline and Ablation
    Comparison[1]
    [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023


  43. Proposed Pipeline
    [1]: Kulkarni et al., Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment, KDD 2023


  44. Open Questions


  45. Open Challenges: Major Themes
    1. Large volume of hate
    2. Data collection from multiple online sources
    3. Data labeling w.r.t. annotation bias and labeling errors
    4. Modeling dynamic context from multiple endo-/exogenous sources
    5. Modeling subtext/implied statements
    6. Modeling the multilingual, multimodal/cultural aspects of hate


  46. My work
    sara-02.github.io


  47. Research @LCS2
    ● Dialog
    ● LLM/Representation Learning
    ● Hate Speech & Harmful memes
    ● Fake news
    ● Opinion Mining


  48. LCS2 Publications
    lcs2.in


  49. Q&A
    Thank you
