An Artificial Intelligence-based service to automatize the INFN CNAF User Support

Ronchieri, Elisabetta; Barbetti, Matteo; Trashaj, Alberto; Pellegrino, Carmelo; Cesini, Daniele; Fornari, Federico; Lattanzio, Daniele; Morganti, Lucia; Pascolini, Alessandro; Rendina, Andrea; Giugliano, Carmen

doi:10.22323/1.458.0005

Abstract

The INFN CNAF User Support unit acts as the first-contact interface between the users and the
CNAF data center that provides computing resources to over 60 scientific communities in the fields
of Particle, Nuclear and Astro-particle Physics, Cosmology and Medicine. Since its duties span
from repetitive tasks to supporting complex scientific-computing workflows, there is room for
automation mechanisms that rely on modern Artificial Intelligence (AI) techniques, which have
recently been shown to cope successfully with Natural Language Processing (NLP) problems. In
particular, some user requests cannot be addressed directly without the intervention of one of the
other specialized INFN CNAF units that act as a second level of support. In these cases,
automatic AI-based labeling can be exploited to promptly notify the relevant units of the
pending requests. Over the many years of activity of the User Support group, several thousand
e-mail messages from users, written in either Italian or English, have been received. This collection
of e-mails provides an ideal sample for training Machine Learning (ML) models and validating
them against newly arriving user requests. These messages can be organized in threads including
user requests together with the corresponding solutions, as well as the messages of the involved
second-level support unit, which are implicitly labelled by the recipient list of the e-mail. In
this study, we have applied a set of Machine Learning classification models, such as k-Nearest
Neighbors, Random Forest, Extreme Gradient Boosting, and Feed-forward Neural Network, to
features extracted through NLP techniques, with the aim of automating the e-mail labeling. The
performance of the defined models has been compared by considering various feature extraction
techniques, such as Bag of Words, Term Frequency-Inverse Document Frequency (TF-IDF), Bag of
𝑛-Grams, and Word Embedding. Ongoing developments aim to use the best-performing model in
combination with Large Language Models (e.g., GPT-3.5, Llama 2) to build an AI-powered Digital
User Support Assistant, which will be designed to receive text via e-mail and provide a reply based
on the acquired knowledge base. A first prototype has been implemented in Python using
several ML/AI libraries, among them nltk, scikit-learn, and LangChain. A group of
User Supporters has been involved in testing and validation. In conclusion, our study not only
showcases the technical prowess of AI in enhancing the INFN CNAF User Support activities, but
also emphasizes the broader considerations of user satisfaction, scalability, and future readiness.
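
As an illustration of the labeling step described above, the following is a minimal sketch of TF-IDF feature extraction combined with a k-Nearest Neighbors classifier. It is not the authors' implementation: the prototype is reported to use nltk and scikit-learn, whereas this sketch uses only the Python standard library to stay self-contained. The class name KnnEmailLabeler, the helper functions, the unit labels ("storage", "batch"), and the example messages are all hypothetical and invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase the text and split on any non-alphanumeric character.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def fit_idf(docs_tokens):
    # Smoothed inverse document frequency computed over the training messages.
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in doc_freq.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector scaled to unit length; unseen terms are ignored.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    # Dot product of two unit-length sparse vectors equals cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


class KnnEmailLabeler:
    """Hypothetical k-NN labeler over TF-IDF features (illustrative only)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, texts, labels):
        tokens = [tokenize(t) for t in texts]
        self.idf = fit_idf(tokens)
        self.vectors = [tfidf_vector(t, self.idf) for t in tokens]
        self.labels = list(labels)
        return self

    def predict(self, text):
        query = tfidf_vector(tokenize(text), self.idf)
        scored = [(cosine(query, v), lab) for v, lab in zip(self.vectors, self.labels)]
        # Keep the k most similar training messages and take a majority vote.
        nearest = sorted(scored, key=lambda p: p[0], reverse=True)[: self.k]
        votes = Counter(lab for _, lab in nearest)
        return votes.most_common(1)[0][0]


# Invented bilingual training messages with assumed second-level unit labels.
TRAINING = [
    ("I cannot read my files on the storage area, the disk quota is exceeded", "storage"),
    ("Non riesco a scrivere sul disco, la quota dello storage è piena", "storage"),
    ("Please increase the disk quota for our experiment storage", "storage"),
    ("My batch jobs are stuck in the queue and never start", "batch"),
    ("I job sottomessi restano in coda da ore", "batch"),
    ("The job submission fails with an error on the batch queue", "batch"),
]

labeler = KnnEmailLabeler(k=3).fit(
    [text for text, _ in TRAINING], [label for _, label in TRAINING]
)

if __name__ == "__main__":
    print(labeler.predict("jobs waiting in the batch queue since yesterday"))
    print(labeler.predict("la quota del disco è esaurita"))
```

In a realistic setting the labels would come from the recipient list of each thread, as described in the abstract, and the hand-written helpers would presumably be replaced by library vectorizers and classifiers.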