- Computer-vision based.
- Trained via Reinforcement Learning.
- Simulated in PyBullet.
In this work, an agent is trained to fly a drone along a railway using a semi-supervised reinforcement learning algorithm. In many tasks it is trivial to design a suboptimal policy that solves the problem, while designing an optimal one is very hard: here, simple rules suffice to make the drone follow the railway, yet the optimal trajectory around obstacles along the flight path is far from obvious. To this end, we present an Evidentially Supervised Soft Actor-Critic (e2SAC) algorithm. The algorithm learns to dynamically trade off between imitating the suboptimal policy in regions of the environment it is uncertain about (situations it has not encountered before) and exploiting its own well-explored value function. It requires no hand-tuned trade-off schedule; the trade-off is driven by a measure of the agent's epistemic uncertainty.
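The gating idea above can be sketched in a few lines. This is a minimal illustration, not the paper's exact formulation: `evidence` stands in for a per-state pseudo-count of similar past experience (the evidential uncertainty estimate), and the actor objective blends a standard SAC term with a behaviour-cloning term toward the suboptimal rule-based policy. All names and the specific gating function are assumptions for illustration.

```python
def tradeoff_weight(evidence: float) -> float:
    """Map per-state evidence (a pseudo-count of similar past experience)
    to a supervision weight in [0, 1]. Low evidence -> high epistemic
    uncertainty -> lean on the suboptimal demonstrator; high evidence ->
    trust the agent's own value function. Hypothetical gating rule."""
    return 1.0 / (1.0 + evidence)

def blended_actor_loss(q_value: float, log_prob: float,
                       sup_log_prob: float, evidence: float,
                       alpha: float = 0.2) -> float:
    """SAC-style actor objective blended with an imitation term toward
    the suboptimal supervisor policy, weighted by uncertainty."""
    w = tradeoff_weight(evidence)
    sac_term = alpha * log_prob - q_value  # standard SAC actor loss
    sup_term = -sup_log_prob               # imitate the supervisor
    return (1.0 - w) * sac_term + w * sup_term
```

With zero evidence the loss reduces entirely to the imitation term; as evidence grows, the weight decays smoothly toward pure SAC, which is what removes the need for a hand-tuned schedule.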
This algorithm enables the RL agent to learn to fly the drone from high-dimensional image observations entirely in simulation, with domain randomisation applied throughout. The publication for this work is not yet out, but early experiments show promising sample efficiency across a wide range of tasks.
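Domain randomisation of the kind mentioned above is typically implemented by re-sampling scene and dynamics parameters at the start of each episode. Below is a hedged sketch of such a sampler; the specific parameters and ranges are illustrative assumptions, not the project's actual configuration. In PyBullet, the sampled values would then be applied via calls such as `p.changeDynamics` (mass) and `p.changeVisualShape` (textures).

```python
import random

def sample_randomisation(seed=None):
    """Sample one episode's domain-randomisation parameters for a
    simulated railway scene (illustrative ranges, not the project's)."""
    rng = random.Random(seed)
    return {
        "light_direction": [rng.uniform(-1.0, 1.0) for _ in range(3)],
        "track_texture_id": rng.randrange(8),     # pick one of 8 textures
        "camera_fov_deg": rng.uniform(55.0, 75.0),
        "mass_scale": rng.uniform(0.9, 1.1),      # perturb drone mass
        "latency_steps": rng.randrange(0, 3),     # action delay in sim steps
    }
```

Randomising appearance (lighting, textures, camera intrinsics) pushes the image-based policy to generalise beyond any single rendered scene, while randomising dynamics (mass, latency) hedges against the sim-to-real gap.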