Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning

Link:

https://ieeexplore.ieee.org/document/9956245

Abstract:

Artificial Intelligence, particularly through recent advancements in deep learning (DL), has achieved exceptional performance in many tasks in fields such as natural language processing and computer vision. For certain high-stakes domains, in addition to desirable performance metrics, a high level of interpretability is often required for AI to be reliably utilized. Unfortunately, the black-box nature of DL models prevents researchers from providing explicative descriptions of a DL model’s reasoning process and decisions. In this work, we propose a novel framework utilizing Adversarial Inverse Reinforcement Learning that can provide global explanations for decisions made by a Reinforcement Learning model and capture intuitive tendencies that the model follows by summarizing the model’s decision-making process.

Citation:

Yuansheng Xie, Soroush Vosoughi, Saeed Hassanpour, “Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning”, International Conference on Pattern Recognition (ICPR), Montreal, Quebec, Canada, 2022.
