Employing robots in the real world to perform a large variety of tasks remains a great challenge for current perception, planning and control algorithms. Various specialized representations, e.g. for mapping or localization, have been proposed; they are typically used in fixed pipelines that fuse perception, planning and control. Such approaches are usually highly interpretable: humans can reason about the prediction uncertainty of the system, or about which additional measurements are necessary to improve the predictions. On the other hand, such static frameworks do not allow the robot to learn from experience or adapt to changing task requirements.
Learning-based approaches have found great success in domains where large amounts of labelled data are available. Many problems in robotics, however, do not fall into this regime of easily obtainable training data. Instead, it is often only possible to provide a few kinesthetic demonstrations, or to rely entirely on self-supervised or reinforcement learning. While the longstanding motivation behind such learning approaches is to enable robots to improve by learning from their own experience, current state-of-the-art reinforcement learning (RL) algorithms, even model-based ones, require such extensive amounts of interaction samples that, in most cases, simulators are necessary to provide a risk-free environment that runs orders of magnitude faster than real time. With regard to accountable AI, many of these approaches are not human-interpretable: they may achieve high performance on certain tasks, but it remains an open research question how the training setup must be designed to guarantee performance across all metrics on the tasks of interest.
Whether control policies are learned through reinforcement learning or feedback control laws are optimized, most approaches use simulators to validate the algorithms before deploying them on the real system. A problem inherent to such techniques is the disparity between the simulated and the real world, i.e., the sim2real gap. Various methods have been proposed to overcome this issue, such as domain randomization and domain adaptation.
In this work, we approach the problem of visuomotor control from a different angle. Instead of learning a separate model or policy in a simulator, we use the simulator itself as the model from which we derive controllers and estimate the state of the system of interest. Simulators, such as physics engines, already encode our understanding of the world through the laws of physics and generalize well to a wide variety of application scenarios. Most quantities, such as the geometry of objects, are human-interpretable and can even be verified through a variety of specialized tools. We propose to design the simulator from the ground up to be invertible, i.e., such that we can estimate the simulation settings from observations of the real system.
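To make the notion of an invertible simulator concrete, the following minimal sketch (our illustration, not the system described here; the model, parameter values, and names are illustrative assumptions) uses JAX to differentiate a toy point-mass simulator with respect to its physical parameters, so that an unknown mass and friction coefficient can be recovered from an observed trajectory by gradient descent:

```python
# Hypothetical sketch: "inverting" a differentiable simulator by
# estimating its parameters from observations of the real system.
# Assumed model: a 1-D point mass under constant force and viscous friction.
import jax
import jax.numpy as jnp

DT, STEPS = 0.01, 100  # integration step and horizon (assumed values)

def simulate(params, x0, v0, force):
    """Roll out the point mass with semi-implicit Euler integration."""
    mass, friction = params

    def step(state, _):
        x, v = state
        a = (force - friction * v) / mass  # Newton's second law with viscous friction
        v = v + DT * a
        x = x + DT * v
        return (x, v), x                   # record positions as "observations"

    _, xs = jax.lax.scan(step, (x0, v0), None, length=STEPS)
    return xs

def loss(params, observed):
    """Squared error between simulated and observed trajectories."""
    return jnp.mean((simulate(params, 0.0, 0.0, 1.0) - observed) ** 2)

# Synthetic stand-in for real-world observations, generated from
# ground-truth parameters that the optimizer does not see.
true_params = jnp.array([2.0, 0.5])        # mass, friction
observed = simulate(true_params, 0.0, 0.0, 1.0)

# Gradient descent on the simulation parameters recovers them.
params = jnp.array([1.0, 0.1])             # initial guess
grad_fn = jax.jit(jax.grad(loss))
for _ in range(500):
    params = params - 0.5 * grad_fn(params, observed)  # step size tuned to this toy problem
print(params)                               # should approach [2.0, 0.5]
```

In a realistic setting the forward model would include contact dynamics, actuation, and noisy sensing rather than a smooth toy system, and gradient descent is only one way to perform the inversion; the point of the sketch is merely that a simulator exposing derivatives of its outputs with respect to its settings can be fit to real observations directly.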