Abstract
Motivation
Using a gradient-based attribution mask, we investigate the causes of degraded generalization in challenging environments by examining the salient regions across the consecutive stacked frames that form the RL input. Based on this analysis, we empirically identify two phenomena as key causes of the performance degradation: (i) what we refer to as imbalanced saliency and (ii) observational overfitting [1].
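This kind of analysis can be approximated with a simple saliency probe. The PyTorch sketch below is a minimal, hypothetical implementation: `critic` is assumed to return a scalar Q-value per sample, and the 95% quantile threshold and frame count are placeholder choices rather than the paper's exact attribution procedure.

```python
import torch

def per_frame_saliency(critic, obs, action, num_frames=3, quantile=0.95):
    """Gradient-based attribution over a stacked observation.

    obs: (B, K*C, H, W) tensor of K stacked frames with C channels each.
    Returns a binary saliency mask and, per sample, the fraction of salient
    pixels falling in each of the K frames (an imbalance indicator).
    """
    obs = obs.clone().requires_grad_(True)
    q = critic(obs, action)                            # scalar Q per sample
    grad = torch.autograd.grad(q.sum(), obs)[0].abs()  # |dQ / d obs|
    # Keep only the top (1 - quantile) fraction of gradient magnitudes.
    thresh = torch.quantile(grad.flatten(1), quantile, dim=1)
    mask = (grad >= thresh.view(-1, 1, 1, 1)).float()
    B, KC, H, W = mask.shape
    per_frame = mask.view(B, num_frames, KC // num_frames, H, W).sum(dim=(2, 3, 4))
    per_frame = per_frame / per_frame.sum(dim=1, keepdim=True).clamp_min(1e-8)
    return mask, per_frame
```

A strongly skewed `per_frame` distribution (most salient pixels concentrated in one frame of the stack) is what the imbalanced saliency phenomenon refers to.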
Method
1. Feature-Level Frame Stack
To alleviate the imbalanced saliency problem, we modify the encoder structure from an image-level frame stack to a feature-level frame stack.
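A minimal PyTorch sketch of the idea follows; the convolutional layer sizes are placeholders, not the paper's actual architecture. A shared CNN encodes each frame of the stack independently and the resulting feature maps are stacked along the channel dimension, whereas an image-level frame stack concatenates the raw frames channel-wise before a single encoding pass.

```python
import torch
import torch.nn as nn

class FeatureLevelFrameStackEncoder(nn.Module):
    """Encode each frame with a shared CNN, then stack per-frame features
    (instead of stacking raw frames channel-wise before one joint encoding,
    as in an image-level frame stack)."""

    def __init__(self, in_channels=3, num_frames=3):
        super().__init__()
        self.num_frames = num_frames
        self.cnn = nn.Sequential(            # shared per-frame encoder
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
        )

    def forward(self, obs):                  # obs: (B, K*C, H, W)
        B, KC, H, W = obs.shape
        frames = obs.view(B * self.num_frames, KC // self.num_frames, H, W)
        feats = self.cnn(frames)             # (B*K, D, h, w)
        _, D, h, w = feats.shape
        # Stack per-frame features along the channel dimension.
        return feats.view(B, self.num_frames * D, h, w)
```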
2. Shifted Random Overlay Augmentation
To alleviate the observational overfitting problem and make the encoder robust to dynamic backgrounds, we propose a new data augmentation called shifted random overlay.
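SVEA's random overlay blends an observation with a random natural image (e.g., from the Places dataset) using a fixed mixing coefficient. The sketch below shows one plausible way to make the overlay "shifted": the same distractor image is blended into every frame of the stack, but rolled by an increasing spatial offset for later frames so the overlaid background appears to move. The shift schedule, `max_shift`, and `alpha` are illustrative assumptions, not the exact scheme from the paper.

```python
import torch

def shifted_random_overlay(obs, overlay, alpha=0.5, max_shift=4, num_frames=3):
    """Blend each stacked frame with the same distractor image, rolling the
    distractor by a growing random offset across consecutive frames so the
    overlaid background mimics a dynamic scene.

    obs:     (B, K*C, H, W) stacked observation in [0, 1]
    overlay: (B, C, H, W)   random natural images (e.g., from Places)
    """
    B, KC, H, W = obs.shape
    C = KC // num_frames
    dx = torch.randint(-max_shift, max_shift + 1, (1,)).item()
    dy = torch.randint(-max_shift, max_shift + 1, (1,)).item()
    out = []
    for k in range(num_frames):
        frame = obs[:, k * C:(k + 1) * C]
        # Shift the overlay further for each later frame in the stack.
        shifted = torch.roll(overlay, shifts=(k * dy, k * dx), dims=(2, 3))
        out.append(alpha * frame + (1 - alpha) * shifted)
    return torch.cat(out, dim=1)
```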
SimGRL
Using SVEA [2] as the baseline algorithm, we propose SimGRL, which adopts the two techniques above.
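Putting the pieces together, a simplified, hypothetical version of the critic update looks as follows. It follows SVEA's data-augmented Q objective (a weighted sum of losses on clean and augmented observations sharing the same target), with the augmented branch using the `shifted_random_overlay` sketch above and the critic's encoder assumed to be the feature-level frame stack; actor sampling, entropy terms, and other SAC details are omitted, and all names are illustrative rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def svea_style_critic_loss(critic, critic_target, actor, batch, overlay_images,
                           gamma=0.99, svea_alpha=0.5, svea_beta=0.5):
    """SVEA-style critic loss with shifted random overlay on the current
    observation; `critic` is assumed to embed observations with a
    feature-level frame stack encoder."""
    obs, action, reward, next_obs, not_done = batch

    with torch.no_grad():
        next_action = actor(next_obs)                  # policy output (simplified)
        target_q = torch.min(*critic_target(next_obs, next_action))
        target_q = reward + not_done * gamma * target_q

    obs_aug = shifted_random_overlay(obs, overlay_images)

    q1, q2 = critic(obs, action)                       # clean branch
    q1_aug, q2_aug = critic(obs_aug, action)           # augmented branch

    loss = svea_alpha * (F.mse_loss(q1, target_q) + F.mse_loss(q2, target_q)) \
         + svea_beta * (F.mse_loss(q1_aug, target_q) + F.mse_loss(q2_aug, target_q))
    return loss
```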
Results
DMControl-GB
DistractingCS
Robotic Manipulation
Demonstrations
DMControl-GB
DistractingCS
Each test shows the results for intensity levels $\in \{0.05, 0.1, 0.15, 0.2, 0.3\}$.
Robotic Manipulation
Reference
[1] Song et al. “Observational Overfitting in Reinforcement Learning.” ICLR (2020).
[2] Hansen et al. “Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation.” NeurIPS (2021).
Citation
@inproceedings{songsimple,
  title={A Simple Framework for Generalization in Visual RL under Dynamic Scene Perturbations},
  author={Song, Wonil and Choi, Hyesong and Sohn, Kwanghoon and Min, Dongbo},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024}
}