Extending LDA functionality using cosine similarity in tracking the COVID-19 Publications

Osekowsky, Jessica
Middle Tennessee State University
Data is being created at an alarming rate, and it is becoming unrealistic to extract important information in a timely fashion without machine learning techniques. The COVID-19 pandemic is one instance in which the medical community came together and generated a large amount of data in a short period of time to better understand the issues at hand. In this research, a new process called LDASine was developed that extends the Latent Dirichlet Allocation (LDA) methodology with cosine similarity to track topic changes over time. Two experiments were conducted to test the viability of LDASine. The first experiment determined which number of topics produced the most distinct topics for each of three time periods. The second experiment associated topics from different time periods and analyzed the changes between those topics using the LDASine process. The results of the experiments demonstrated the viability of LDASine as a process for analyzing how topics change over time and for determining which number of topics produces distinct topics for a given span of time.
Cosine similarity, LDA, Topic modeling, Topics, Computer science
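
The abstract describes associating topics from different time periods by their similarity. The record does not give the details of the LDASine process, so the following is only a minimal sketch of the general idea, assuming each LDA topic is represented as a vector of word weights over a shared vocabulary. The function names, the 0.5 threshold, and the toy distributions are hypothetical and are not taken from the thesis.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def match_topics(earlier, later, threshold=0.5):
    """Pair each earlier topic with its most similar later topic.

    Returns a list of (earlier_index, later_index_or_None, similarity).
    The later index is None when the best score falls below `threshold`
    (a hypothetical cutoff, chosen here only for illustration).
    Assumes `later` contains at least one topic.
    """
    matches = []
    for i, topic in enumerate(earlier):
        scores = [cosine_similarity(topic, other) for other in later]
        best = max(range(len(scores)), key=scores.__getitem__)
        matched = best if scores[best] >= threshold else None
        matches.append((i, matched, scores[best]))
    return matches


# Made-up topic-word distributions over a shared four-word vocabulary.
period_1 = [
    [0.70, 0.20, 0.05, 0.05],
    [0.05, 0.05, 0.20, 0.70],
]
period_2 = [
    [0.10, 0.10, 0.10, 0.70],
    [0.60, 0.30, 0.05, 0.05],
    [0.25, 0.25, 0.25, 0.25],
]

for earlier_idx, later_idx, score in match_topics(period_1, period_2):
    print(f"period 1 topic {earlier_idx} -> period 2 topic {later_idx} "
          f"(cosine similarity {score:.3f})")
```

In practice the rows would come from a fitted topic model for each time period, aligned to the same vocabulary before comparison. The same pairwise scores could also be computed within a single period as one possible indicator of how distinct its topics are, although whether the thesis measures distinctness this way is not stated in this record.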