Text Summarization and Sentiment Analysis of Drug Reviews: A Transfer Learning Approach

dc.contributor.advisor Ranganathan, Jaishree
dc.contributor.author Abuka, Gloria
dc.contributor.committeemember Dong, Zhijiang
dc.contributor.committeemember Gu, Yi
dc.date.accessioned 2023-04-25T16:06:29Z
dc.date.available 2023-04-25T16:06:29Z
dc.date.issued 2023
dc.date.updated 2023-04-25T16:06:29Z
dc.description.abstract Transfer learning is a machine learning method where a model that has been trained on a specific or general task (source domain) is reused as a starting point for a similar task in a new model (target domain). This is an important concept in the Natural Language Processing field because of its ability to produce remarkable results from small datasets. Text summarization produces a concise and meaningful form of text from a larger one, while sentiment analysis distinguishes the polarity present in the text. News and scientific articles have been used in text summarization models over the years, but drug reviews have received considerably less attention. This study proposes a text summarization and sentiment analysis method based on the transformer architecture for the 10 most useful reviews for 500 different drugs from a dataset of drug reviews. We created human summaries for the drug reviews manually and compared the performance of fine-tuned Text-to-Text Transfer Transformer (T5) and Pre-training with Extracted Gap-sentences for Abstractive Summarization (PEGASUS) models with that of a Long Short-Term Memory (LSTM) model. Additionally, we assessed the impact of various preprocessing steps on the ROUGE scores. We also fine-tuned the Bidirectional Encoder Representations from Transformers (BERT) model for sentiment analysis and compared it with an LSTM model. Our T5-Base model had the best results, with average ROUGE1, ROUGE2, and ROUGEL scores of 50.31, 29.14, and 40.06, respectively, while the BERT model achieved an accuracy of 84% for the sentiment analysis task. We evaluated our fine-tuned models on a dataset of BBC news summaries for text summarization and achieved average ROUGE1, ROUGE2, and ROUGEL scores of 72.20, 63.59, and 57.42, respectively. Our models outperformed two previous works, which had ROUGE1, ROUGE2, and ROUGEL scores of 47.0, 33.0, and 42.0 and of 47.30, 26.50, and 36.10, respectively.
dc.description.degree M.S.
dc.identifier.uri https://jewlscholar.mtsu.edu/handle/mtsu/6884
dc.language.rfc3066 en
dc.publisher Middle Tennessee State University
dc.source.uri http://dissertations.umi.com/mtsu:11682
dc.subject LSTM
dc.subject PEGASUS
dc.subject Sentiment Analysis
dc.subject T5
dc.subject Text Summarization
dc.subject Computer science
dc.thesis.degreelevel masters
dc.title Text Summarization and Sentiment Analysis of Drug Reviews: A Transfer Learning Approach
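Note on the metrics named in the abstract: ROUGE1 and ROUGE2 measure unigram and bigram overlap between a generated summary and a reference summary, and ROUGEL is based on their longest common subsequence. The following is a minimal sketch of these F-measures, not the evaluation code used in the thesis. It assumes lowercased whitespace tokenization with no stemming, and the function names and example sentences are hypothetical. It returns values in the range 0 to 1, whereas the abstract reports scores on what appears to be a 0 to 100 scale.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision (overlap/cand) and recall (overlap/ref)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F-measure using clipped n-gram overlap.

    Assumption: simple lowercased whitespace tokenization, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Sentence-level ROUGE-L F-measure based on the LCS length."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Hypothetical summaries, for illustration only.
    generated = "the drug reduced my pain quickly"
    human = "the drug reduced pain"
    print(rouge_n(generated, human, 1))
    print(rouge_n(generated, human, 2))
    print(rouge_l(generated, human))
```

Published implementations typically add stemming and a more careful tokenizer, so scores from this sketch will not match theirs exactly.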
Files
Original bundle
Name: Abuka_mtsu_0170N_11682.pdf
Size: 485.86 KB
Format: Adobe Portable Document Format