Improving Generalization in Deep Reinforcement Learning via SLC Weightings

Improving Generalization in Deep Reinforcement Learning via Spatially Localized Confidence Weightings
From the July 2023 Melbourne Machine Learning and AI Meetup: www.meetup.com/machine-learning-ai-meetup/
Talk Description: Deep reinforcement learning agents have achieved human-level performance in a variety of tasks. However, these agents often generalize poorly, especially when presented with unseen environments. We propose a novel approach to enhance the generalization performance of reinforcement learning agents by leveraging the human inductive bias of inattentional blindness.
Our proposed method, Spatially Localized Confidence (SLC), divides the latent representation of the input observation into multiple patches and assigns each patch a confidence weighting that reflects how relevant the patch's local information is judged to be for determining the global policy and value functions. This approach mimics human inattentional blindness and allows agents to focus on critical patches while ignoring irrelevant ones.
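To make the patch-weighting idea concrete, here is a minimal NumPy sketch. It is not the speaker's implementation; everything beyond the description above is an assumption made for illustration: each spatial cell of the latent feature map is treated as one patch, a single shared linear head produces per-patch policy logits, a per-patch value estimate and a confidence logit, the confidences are normalized with a softmax over patches, and the global policy and value are confidence-weighted sums. The names `slc_head`, `w_policy`, `w_value` and `w_conf` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slc_head(latent, w_policy, w_value, w_conf):
    """Confidence-weighted pooling of per-patch predictions (illustrative only).

    latent:   (H, W, C) latent feature map; each spatial cell is one patch
    w_policy: (C, A) shared weights giving per-patch action logits
    w_value:  (C,)   shared weights giving a per-patch value estimate
    w_conf:   (C,)   shared weights giving a per-patch confidence logit
    Returns (policy probabilities (A,), value, confidence map (H, W)).
    """
    h, w, c = latent.shape
    patches = latent.reshape(h * w, c)      # (P, C) with P = H * W
    local_logits = patches @ w_policy       # (P, A)
    local_values = patches @ w_value        # (P,)
    conf = softmax(patches @ w_conf)        # (P,), sums to 1 over patches

    # Assumption: global outputs are confidence-weighted sums of local ones.
    logits = conf @ local_logits            # (A,)
    value = float(conf @ local_values)
    return softmax(logits), value, conf.reshape(h, w)


rng = np.random.default_rng(0)
H, W, C, A = 4, 4, 8, 15                    # Procgen exposes 15 discrete actions
latent = rng.normal(size=(H, W, C))
policy, value, conf = slc_head(
    latent,
    rng.normal(size=(C, A)),
    rng.normal(size=C),
    rng.normal(size=C),
)
```

Because the same weights are applied to every patch, the head treats a feature the same way wherever it appears in the frame, which is one way to read the locality and translation-invariance biases mentioned in the talk. A patch with a very low confidence logit contributes almost nothing to the final policy or value.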
We evaluate our method within the Procgen Benchmark and observe a 15% improvement in mean normalized scores compared to the IMPALA baseline. By incorporating global context through a vision transformer encoder, the enhanced SLC+GC model achieves a 34% relative improvement over IMPALA. Our method demonstrates the potential benefits of leveraging inattentional blindness, locality, and translation invariance within decision-making as inductive biases to promote generalization capabilities in deep reinforcement learning agents.
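For readers unfamiliar with the metric, the arithmetic behind a "relative improvement in mean normalized score" can be sketched as follows. Procgen normalizes a raw episode return R per game as (R - R_min) / (R_max - R_min). The function names and all numbers below are placeholders chosen only to show the calculation; they are not scores reported in the talk.

```python
def normalized_score(raw, r_min, r_max):
    """Map a raw return onto [0, 1] using per-game min/max constants."""
    return (raw - r_min) / (r_max - r_min)


def relative_improvement(new, baseline):
    """Fractional gain of `new` over `baseline`."""
    return (new - baseline) / baseline


# Placeholder values, NOT results from the talk.
baseline_mean = 0.50   # hypothetical mean normalized score of a baseline
model_mean = 0.67      # hypothetical mean normalized score of a new model

print(f"{relative_improvement(model_mean, baseline_mean):.0%}")  # prints 34%
```

A relative improvement is measured against the baseline's own score, so the same percentage corresponds to a smaller absolute gain when the baseline score is low.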
Speaker Bio: Axel Ahmer is currently completing his master's degree in artificial intelligence at RMIT, after a double undergraduate degree in mech eng and maths at the University of Adelaide. He's a casual TA at RMIT teaching first-year comp sci and higher-level AI. He loves neural networks to the max, and is particularly interested in how they are applied to sequential decision-making tasks and meta-cognition. He spends his free time playing chess, making weird websites, and going for very short runs.