Schumacher, P., Geijtenbeek, T., Caggiano, V., Schmitt, S., Martius, G., & Häufle, D. (2023). Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models. Preprint.
@unpublished{schumacher2023walking,
author = {Schumacher, Pierre and Geijtenbeek, Thomas and Caggiano, Vittorio and Schmitt, Syn and Martius, Georg and Häufle, Daniel},
year = {2023},
month = aug,
title = {Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models},
note = {Preprint.},
doi = {10.13140/RG.2.2.33187.22569/1}
}
Humans excel at robust bipedal walking in complex natural environments. In each step, they adequately tune the interaction of biomechanical muscle dynamics and neuronal signals to be robust against uncertainties in ground conditions. However, it is still not fully understood how the nervous system resolves the musculoskeletal redundancy to solve the multi-objective control problem of stability, robustness, and energy efficiency. In computer simulations, energy minimization has been shown to be a successful optimization target, reproducing natural walking with trajectory optimization or reflex-based control methods. However, these methods consider one particular motion at a time, and the resulting controllers have limited ability to compensate for perturbations. In robotics, reinforcement learning (RL) methods have recently achieved highly stable and efficient locomotion on quadruped systems, but generating human-like walking with bipedal biomechanical models has required extensive use of expert data sets. This strong reliance on demonstrations often results in brittle policies and limits the application to new behaviors, especially considering the potential variety of movements for high-dimensional musculoskeletal models in 3D. Achieving natural locomotion with RL without sacrificing its remarkable robustness might pave the way for a novel approach to studying human walking in complex natural environments.
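The multi-objective control problem described above maps naturally onto a shaped RL reward. Below is a minimal Python sketch of such an objective; the quantities and weights (forward velocity, a quadratic activation cost as an energy proxy, an alive bonus) are illustrative assumptions, not the paper's actual reward.

import numpy as np

def walking_reward(forward_velocity, muscle_activations, is_alive,
                   w_vel=1.0, w_energy=0.05, w_alive=0.1):
    """Hypothetical multi-objective walking reward. Weights are
    illustrative, not taken from the paper."""
    # Reward forward progress.
    velocity_term = w_vel * forward_velocity
    # Quadratic activation cost as a crude proxy for metabolic energy.
    energy_term = -w_energy * float(np.sum(np.square(muscle_activations)))
    # Small bonus for not having fallen.
    alive_term = w_alive * float(is_alive)
    return velocity_term + energy_term + alive_term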
Refereed conference proceedings
Schumacher, P., Haeufle, D., Büchler, D., Schmitt, S., & Martius, G. (2023). DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems. The Eleventh International Conference on Learning Representations. https://openreview.net/forum?id=C-xa_D3oTj6
@inproceedings{schumacher2023deprl,
title = {{DEP}-{RL}: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems},
author = {Schumacher, Pierre and Haeufle, Daniel and B{\"u}chler, Dieter and Schmitt, Syn and Martius, Georg},
booktitle = {The Eleventh International Conference on Learning Representations},
year = {2023},
url = {https://openreview.net/forum?id=C-xa_D3oTj6},
file = {schumacher2023deprl.pdf}
}
Muscle-actuated organisms are capable of learning an unparalleled diversity of dexterous movements despite their vast number of muscles.
Reinforcement learning (RL) on large musculoskeletal models, however, has not been able to show similar performance.
We conjecture that ineffective exploration in large overactuated action spaces is a key problem.
This is supported by the finding that common exploration noise strategies are inadequate in synthetic examples of overactuated systems.
We identify differential extrinsic plasticity (DEP), a method from the domain of self-organization, as being able to induce state-space covering exploration within seconds of interaction.
By integrating DEP into RL, we achieve fast learning of reaching and locomotion in musculoskeletal systems, outperforming current approaches on all considered tasks in both sample efficiency and robustness.
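For context, DEP (differential extrinsic plasticity) is a self-organizing control rule: a linear feedback controller whose gain matrix is continuously re-estimated from correlations between recent state changes, producing coordinated, body-coupled exploration instead of independent per-actuator noise. The following is a heavily simplified Python sketch under stated assumptions (states are raw sensor values, the sensor-to-motor model M is taken as identity, and the full method's normalization, delays, and bias terms are omitted):

import numpy as np

class DEPExplorer:
    """Simplified differential extrinsic plasticity (DEP) controller.

    Maintains a linear policy a = tanh(kappa * C @ x) whose matrix C is
    re-learned every step from correlations of consecutive state
    differences. Omits the smoothing, normalization, and sensor-to-motor
    model M of the full method (M = I is assumed here).
    """

    def __init__(self, dim, kappa=10.0, tau=0.9):
        self.C = np.zeros((dim, dim))
        self.kappa = kappa      # feedback gain
        self.tau = tau          # exponential averaging factor
        self.prev_x = None
        self.prev_dx = None

    def act(self, x):
        x = np.asarray(x, dtype=float)
        if self.prev_x is not None:
            dx = x - self.prev_x
            if self.prev_dx is not None:
                # Differential Hebbian update: correlate the current
                # state change with the previous one.
                self.C = self.tau * self.C + (1 - self.tau) * np.outer(dx, self.prev_dx)
            self.prev_dx = dx
        self.prev_x = x
        return np.tanh(self.kappa * self.C @ x)

Roughly, the paper interleaves actions from such a controller with the RL policy's own actions during data collection, so exploration respects the coupled muscle-body dynamics.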
Wochner, I., Schumacher, P., Martius, G., Büchler, D., Schmitt, S., & Haeufle, D. (2022). Learning with Muscles: Benefits for Data-Efficiency and Robustness in Anthropomorphic Tasks. 6th Annual Conference on Robot Learning. https://openreview.net/forum?id=Xo3eOibXCQ8
@inproceedings{wochner2022learning,
title = {Learning with Muscles: Benefits for Data-Efficiency and Robustness in Anthropomorphic Tasks},
author = {Wochner, Isabell and Schumacher, Pierre and Martius, Georg and B{\"u}chler, Dieter and Schmitt, Syn and Haeufle, Daniel},
booktitle = {6th Annual Conference on Robot Learning},
year = {2022},
file = {wochner2022learning.pdf},
url = {https://openreview.net/forum?id=Xo3eOibXCQ8}
}
Humans are able to outperform robots in terms of robustness, versatility, and learning of new tasks in a wide variety of movements. We hypothesize that highly nonlinear muscle dynamics play a large role in providing inherent stability, which is favorable to learning. While recent advances have been made in applying modern learning techniques to muscle-actuated systems both in simulation and in robotics, so far no detailed analysis has been performed to show the benefits of muscles in this setting. Our study closes this gap by investigating core robotics challenges and comparing the performance of different actuator morphologies in terms of data-efficiency, hyperparameter sensitivity, and robustness.
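One concrete mechanism behind the hypothesized stability is that muscles low-pass filter control signals before they become force. A minimal sketch of first-order activation dynamics, a standard ingredient of Hill-type muscle models (the time constants are illustrative; the actuator models compared in the paper are more detailed):

def step_activation(a, u, dt, tau_act=0.01, tau_deact=0.04):
    """First-order muscle activation dynamics (Hill-type models).

    The activation a chases the neural excitation u with asymmetric
    time constants: activation is faster than deactivation. This
    smooths abrupt control signals before they are turned into force.
    """
    u = min(max(u, 0.0), 1.0)          # excitations live in [0, 1]
    tau = tau_act if u > a else tau_deact
    return a + dt * (u - a) / tau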
Workshops
Schneider, J., Schumacher, P., Haeufle, D. F. B., Schölkopf, B., & Büchler, D. (2023). Investigating the Impact of Action Representations in Policy Gradient Algorithms. Workshop on effective Representations, Abstractions, and Priors for Robot Learning (RAP4Robots) at ICRA 2023.
@misc{schneider2023,
author = {Schneider, Jan and Schumacher, Pierre and Haeufle, Daniel F. B. and Schölkopf, Bernhard and Büchler, Dieter},
title = {Investigating the Impact of Action Representations in Policy Gradient Algorithms},
year = {2023},
file = {schneider2023.pdf},
note = {Workshop on effective Representations, Abstractions, and Priors for Robot Learning (RAP4Robots) at ICRA 2023}
}
Reinforcement learning (RL) is a versatile framework for learning to solve complex real-world tasks. However, influences on the learning performance of RL algorithms are often poorly understood in practice. We discuss different analysis techniques and assess their effectiveness for investigating the impact of action representations in RL. Our experiments demonstrate that the action representation can significantly influence the learning performance on popular RL benchmark tasks. The analysis results indicate that some of the performance differences can be attributed to changes in the complexity of the optimization landscape. Finally, we discuss open challenges of analysis techniques for RL algorithms.
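As a concrete illustration of what an action representation is: the same torque-driven joint can be commanded directly or through a fixed PD loop tracking policy-chosen target positions, and this choice alone reshapes the optimization landscape the policy gradient sees. A hedged sketch with illustrative names and gains (not taken from the paper):

def torque_action(policy_output):
    """Policy output is interpreted directly as joint torque."""
    return policy_output

def position_action(policy_output, q, q_dot, kp=50.0, kd=2.0):
    """Policy output is a target joint position; a fixed PD loop
    converts it to torque. Same actuator, different landscape."""
    return kp * (policy_output - q) - kd * q_dot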