Towards Better Recommendation Explainability Evaluation for Conversational Recommender Systems

May, Joseph Andrew
Middle Tennessee State University
This study focuses on Conversational Recommender Systems (CRS) and proposes a method for classifying recommendations as good or bad. Traditional conversational recommendation metrics such as BLEU, ROUGE, and METEOR are not sophisticated enough to assess recommendation quality, so a shift toward different metrics is needed. Eight quality factors (length, readability, repetition, word importance, polarity, subjectivity, grammar, and feature appearance) are proposed as more relevant, explainable, and impactful metrics for assessing conversational recommendation quality. To that end, three neural networks are created using GPT-2, GPT-Neo, and T5 as base models; each embeds a conversational recommendation and feeds that embedding, together with the eight quality factors, into a linear residual network architecture that classifies the recommendation. The GPT-Neo model achieves the highest average prediction accuracy at 83%, GPT-2 averages 78%, and T5 74%. Individual Conditional Expectation (ICE) analysis shows that grammar, feature appearance, and repetition are the most impactful quality factors, and a Shapley value analysis shows that each factor can push the predictions of all three models toward either the good or the bad class. The eight quality factors assess recommendation quality more meaningfully, accurately, and contextually than current standard methods.
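The abstract describes concatenating a language-model embedding of a conversational recommendation with the eight scalar quality factors and passing the result through a linear residual network that outputs a good/bad classification. A minimal NumPy sketch of that shape of architecture follows; the dimensions, weight initialization, and single residual block are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # Linear residual block: output = input + W2(ReLU(W1 x)).
    return x + relu(x @ w1) @ w2

# Hypothetical sizes: a 768-d embedding stands in for the GPT-2/GPT-Neo/T5
# representation of the recommendation text; the 8 factors are length,
# readability, repetition, word importance, polarity, subjectivity,
# grammar, and feature appearance, each reduced to a scalar score.
emb = rng.normal(size=768)
factors = rng.normal(size=8)
x = np.concatenate([emb, factors])  # 776-d classifier input

d = x.shape[0]
w1 = rng.normal(scale=0.01, size=(d, d))
w2 = rng.normal(scale=0.01, size=(d, d))
w_out = rng.normal(scale=0.01, size=(d, 2))  # logits for the bad/good classes

h = residual_block(x, w1, w2)
logits = h @ w_out
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over two classes
label = ["bad", "good"][int(probs.argmax())]
```

In the study, such a classifier is trained per base model, after which ICE and Shapley value analyses probe how each of the eight factor inputs moves the predicted class.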
AI Explainability, CRS, ICE Analysis, LLM, SHAP Analysis, Computer science