Language Agnostic Model: Detecting Islamophobic Content on Social Media
Date
2021
Authors
Khan, Heena
Publisher
Middle Tennessee State University
Abstract
Islamophobia, or anti-Muslim racism, is a dominant yet neglected form of racism today.
The last few years have seen a tremendous increase in Islamophobic hate speech on social
media throughout the world. This kind of hate speech promotes violence and discrimination
against the Muslim community. Despite an abundance of literature on hate speech detection
on social media, there are very few papers on Islamophobia detection. To encourage more
studies on identifying online Islamophobia, we introduce the first public dataset for the
classification of Islamophobic content on social media. Past work has focused on first
building word embeddings in the target language, which limits its application to new
languages. We use the Google Neural Machine Translator (NMT) to identify and translate
non-English text to English, making the system language agnostic. We can therefore use
already available pre-trained word embeddings instead of training our own models and word
embeddings in each language. We experimented with different word-embedding and classifier
pairs to assess whether translated English data gives accuracy comparable to that of an
originally English dataset. Our best-performing model, an SVM with TF-IDF features,
achieved a 10-fold cross-validation accuracy of 95.56 percent, followed by the BERT model
with 94.66 percent, on the translated data. This accuracy is close to that obtained on the
untranslated English dataset and far better than that obtained on the untranslated Hindi
dataset.
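The translate-then-classify idea described in the abstract can be sketched roughly as follows. This is only an illustrative outline using the Python standard library, not the thesis's implementation: `translate_to_english` is a hypothetical hook where a machine-translation service would be plugged in (no real translation API is called here), and a simple TF-IDF nearest-centroid classifier stands in for the SVM and BERT models that the abstract reports. The function names and the interleaved fold assignment are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def translate_to_english(text, translator=None):
    # Hypothetical hook: `translator` is any callable mapping text to English.
    # With no translator supplied, the text is passed through unchanged.
    return translator(text) if translator else text


def tokenize(text):
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency over the training documents.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    # L2-normalised TF-IDF vector as a sparse dict; unseen tokens are dropped.
    tf = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def train_centroids(docs, labels):
    # Stand-in classifier (NOT an SVM): one summed TF-IDF vector per class.
    idf = fit_idf(docs)
    centroids = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for tok, weight in tfidf(doc, idf).items():
            centroids[label][tok] += weight
    return idf, centroids


def predict(model, doc):
    idf, centroids = model
    vec = tfidf(doc, idf)

    def score(label):
        # Cosine similarity between the document and the class centroid.
        c = centroids[label]
        norm = math.sqrt(sum(v * v for v in c.values())) or 1.0
        return sum(w * c.get(tok, 0.0) for tok, w in vec.items()) / norm

    return max(sorted(centroids), key=score)


def k_fold_accuracy(docs, labels, k=10, translator=None):
    # Translate everything first, then evaluate with k interleaved folds.
    docs = [translate_to_english(d, translator) for d in docs]
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, len(docs), k))
        train_docs = [d for i, d in enumerate(docs) if i not in test_idx]
        train_labels = [y for i, y in enumerate(labels) if i not in test_idx]
        model = train_centroids(train_docs, train_labels)
        correct += sum(predict(model, docs[i]) == labels[i] for i in test_idx)
    # Fraction of held-out documents classified correctly across all folds.
    return correct / len(docs)
```

Because translation happens before vectorisation, the same English-only feature pipeline serves every input language, which is the design choice the abstract describes. Swapping the stand-in classifier for an SVM or a fine-tuned BERT model would leave the surrounding evaluation loop unchanged.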
Keywords
Dataset,
Islamophobia,
Natural Language Processing,
Sentiment Analysis,
Social Media,
Text Classification,
Computer science