Text Summarization and Sentiment Analysis of Drug Reviews: A Transfer Learning Approach

Abuka, Gloria
Middle Tennessee State University
Transfer learning is a machine learning method in which a model trained on a specific or general task (source domain) is reused as the starting point for a model on a similar task (target domain). It is an important concept in Natural Language Processing because it can produce strong results from small datasets. Text summarization produces a concise and meaningful version of a longer text, while sentiment analysis identifies the polarity expressed in the text. News and scientific articles have been used in text summarization models over the years, but drug reviews have received considerably less attention. This study proposes a text summarization and sentiment analysis method based on the transformer architecture for the 10 most useful reviews for 500 different drugs from a dataset of drug reviews. We manually created human summaries for the drug reviews and compared the performance of fine-tuned Text-to-Text Transfer Transformer (T5) and Pre-training with Extracted Gap-sentences for Abstractive Summarization (PEGASUS) models with that of a Long Short-Term Memory (LSTM) model. Additionally, we assessed the impact of various preprocessing steps on the ROUGE scores. We also fine-tuned the Bidirectional Encoder Representations from Transformers (BERT) model for sentiment analysis and compared it with an LSTM model. Our T5-Base model had the best results, with average ROUGE-1, ROUGE-2, and ROUGE-L scores of 50.31, 29.14, and 40.06, respectively, while the BERT model achieved an accuracy of 84% on the sentiment analysis task. We also evaluated our fine-tuned summarization models on a dataset of BBC news summaries, achieving average ROUGE-1, ROUGE-2, and ROUGE-L scores of 72.20, 63.59, and 57.42, respectively. Our models outperformed two previous works, which reported ROUGE-1, ROUGE-2, and ROUGE-L scores of 47.0, 33.0, and 42.0, and of 47.30, 26.50, and 36.10, respectively.
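To make the reported metrics concrete, the following is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L F1 scores can be computed for one generated summary against one human reference. It is an illustration only: the abstract does not state which ROUGE implementation or settings the study used, the whitespace tokenization and lowercasing are simplifying assumptions (standard toolkits may add stemming and other normalization), and the function names and example sentences are hypothetical.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, cand_total, ref_total):
    """F1 from an overlap count and the candidate/reference totals."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum counts
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return _f1(_lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Hypothetical example pair, not taken from the study's dataset.
    generated = "the drug reduced my pain quickly"
    human = "the drug reduced pain"
    print(round(rouge_n(generated, human, 1), 3))  # 0.8
    print(round(rouge_n(generated, human, 2), 3))  # 0.5
    print(round(rouge_l(generated, human), 3))     # 0.8
```

These functions return values between 0 and 1; figures such as 50.31 in the abstract correspond to the same kind of score multiplied by 100 and averaged over all evaluated summaries.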
LSTM, PEGASUS, Sentiment Analysis, T5, Text Summarization, Computer science