Language Agnostic Model: Detecting Islamophobic Content on Social Media

Khan, Heena

dc.contributor.advisor	Phillips, Joshua L.
dc.contributor.author	Khan, Heena
dc.contributor.committeemember	Barbosa, Sal
dc.contributor.committeemember	Li, Cen
dc.date.accessioned	2021-04-14T01:11:58Z
dc.date.available	2021-04-14T01:11:58Z
dc.date.issued	2021
dc.date.updated	2021-04-14T01:11:58Z
dc.description.abstract	Islamophobia, or anti-Muslim racism, is a dominant yet neglected form of racism today. In recent years, Islamophobic hate speech on social media has increased sharply worldwide. This kind of hate speech promotes violence and discrimination against the Muslim community. Despite an abundance of literature on hate speech detection on social media, very few papers address Islamophobia detection. To encourage more studies on identifying online Islamophobia, we introduce the first public dataset for the classification of Islamophobic content on social media. Past work has focused on first building word embeddings in the target language, which limits its application to new languages. We use Google Neural Machine Translation (NMT) to detect non-English text and translate it into English, making the system language agnostic. We can therefore use existing pre-trained word embeddings instead of training models and word embeddings separately for each language. We experimented with different word-embedding and classifier pairs to assess whether translated English data yields accuracy comparable to that of an untranslated English dataset. Our best-performing model, an SVM with TF-IDF features, achieved a 10-fold cross-validation accuracy of 95.56 percent on the translated data, followed by a BERT model at 94.66 percent. This accuracy is close to that obtained on the untranslated English dataset and far better than that obtained on the untranslated Hindi dataset.
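The best-performing pipeline described in the abstract (translate to English, TF-IDF features, SVM classifier, 10-fold accuracy) can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions and is not the thesis code, which ships in the attached zip. `translate_to_english` is a hypothetical placeholder for the Google NMT step and returns its input unchanged. The SVM here is a linear SVM trained with the Pegasos sub-gradient method without a bias term, which may differ from the SVM used in the thesis. Folds are assigned by index modulo k rather than stratified. The class names `SimpleTfidf` and `PegasosSVM` and the toy posts are illustrative inventions.

```python
import math
import re
from collections import Counter


def translate_to_english(text: str) -> str:
    # Hypothetical stand-in for a machine-translation call (e.g. Google NMT);
    # a real pipeline would detect the language and translate non-English posts.
    return text


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase and keep runs of ASCII letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


class SimpleTfidf:
    """Minimal TF-IDF: raw counts x smoothed IDF, then L2 normalisation."""

    def fit(self, docs: list[str]) -> "SimpleTfidf":
        n = len(docs)
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))
        # Smoothed IDF, log((1 + n) / (1 + df)) + 1, is always >= 1.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        return self

    def transform(self, doc: str) -> dict[str, float]:
        # Terms not seen during fit() are ignored.
        counts = Counter(t for t in tokenize(doc) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {t: v / norm for t, v in vec.items()} if norm else vec


class PegasosSVM:
    """Linear SVM trained with Pegasos sub-gradient steps (no bias term)."""

    def __init__(self, lam: float = 0.01, epochs: int = 20):
        self.lam, self.epochs = lam, epochs

    def score(self, x: dict[str, float]) -> float:
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def fit(self, X: list[dict[str, float]], y: list[int]) -> "PegasosSVM":
        self.w: dict[str, float] = {}
        t = 0
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                t += 1
                sign = 1 if label == 1 else -1  # labels 1/0 -> +1/-1
                eta = 1.0 / (self.lam * t)
                violates = sign * self.score(x) < 1  # hinge loss is active
                # Shrink w (L2 regulariser step), then add the hinge-loss term.
                self.w = {f: v * (1.0 - eta * self.lam) for f, v in self.w.items()}
                if violates:
                    for f, v in x.items():
                        self.w[f] = self.w.get(f, 0.0) + eta * sign * v
        return self

    def predict(self, x: dict[str, float]) -> int:
        return 1 if self.score(x) > 0 else 0


def cross_val_accuracy(docs: list[str], labels: list[int], k: int = 10) -> float:
    """Mean accuracy over k folds; fold i holds the documents with index % k == i."""
    if len(docs) < k:
        raise ValueError("need at least k documents")
    docs = [translate_to_english(d) for d in docs]
    fold_scores = []
    for fold in range(k):
        train = [i for i in range(len(docs)) if i % k != fold]
        test = [i for i in range(len(docs)) if i % k == fold]
        # Fit vocabulary and IDF on the training fold only, to avoid leakage.
        vec = SimpleTfidf().fit([docs[i] for i in train])
        clf = PegasosSVM().fit([vec.transform(docs[i]) for i in train],
                               [labels[i] for i in train])
        hits = sum(clf.predict(vec.transform(docs[i])) == labels[i] for i in test)
        fold_scores.append(hits / len(test))
    return sum(fold_scores) / k


if __name__ == "__main__":
    # Synthetic placeholder posts: label 1 = Islamophobic, 0 = not.
    docs = ["slur threat", "greeting recipe", "slur insult", "greeting weather",
            "threat slur", "recipe greeting", "insult slur slur", "weather greeting news"]
    labels = [1, 0, 1, 0, 1, 0, 1, 0]
    print(cross_val_accuracy(docs, labels, k=4))  # prints 1.0 on this separable toy set
```

Fitting the TF-IDF vocabulary inside each fold, rather than on the whole corpus, keeps held-out posts from leaking into the IDF weights. In practice, library components such as scikit-learn's `TfidfVectorizer`, `LinearSVC` and `cross_val_score(..., cv=10)` would replace the hand-rolled classes.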
dc.description.degree	M.S.
dc.identifier.uri	https://jewlscholar.mtsu.edu/handle/mtsu/6394
dc.language.rfc3066	en
dc.publisher	Middle Tennessee State University
dc.source.uri	http://dissertations.umi.com/mtsu:11389
dc.subject	Dataset
dc.subject	Islamophobia
dc.subject	Natural Language Processing
dc.subject	Sentiment Analysis
dc.subject	Social Media
dc.subject	Text Classification
dc.subject	Computer science
dc.thesis.degreelevel	masters
dc.title	Language Agnostic Model: Detecting Islamophobic Content on Social Media

Files

Original bundle

Name:: Khan_mtsu_0170N_506/Language-agnostic-model-Detecting-Islamophobic-content-on-Social-Media-master.zip
Size:: 10.34 MB
Format:: Unknown data format

Name:: Khan_mtsu_0170N_11389.pdf
Size:: 1.24 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 0 B
Format:: Item-specific license agreed to upon submission

Collections

Masters Theses