Qualitative Adaptive Reward Learning With Success Failure Maps: Applied to Humanoid Robot Walking

John Nassour; Vincent Hugel; Fethi Ben Ouezdou; Gordon Cheng

doi:10.1109/TNNLS.2012.2224370

Journal article, IEEE Transactions on Neural Networks and Learning Systems, Year: 2013


John Nassour

Role: Author

Institute for Cognitive Systems

Vincent Hugel

Role: Author
PersonId: 18091
IdHAL: vincent-hugel
ORCID: 0000-0003-3675-4894
IdRef: 162162723

Laboratoire Conception de Systèmes Mécaniques et Robotiques - EA 7398

Fethi Ben Ouezdou

Role: Author

Laboratoire d'Ingénierie des Systèmes de Versailles

Gordon Cheng

Role: Author

Institute for Cognitive Systems

Abstract

In the human brain, rewards are encoded in a flexible and adaptive way after each novel stimulus. Neurons of the orbitofrontal cortex are the key reward structure of the brain. Neurobiological studies show that the anterior cingulate cortex is primarily responsible for avoiding repeated mistakes. Depending on the vigilance threshold, which denotes the tolerance to risk, we can differentiate between a learning mechanism that takes risks and one that avoids them. The tolerance to risk plays an important role in such a learning mechanism. Results have shown differences in learning capacity between risk-taking and risk-averse behaviors. These neurological properties provide promising inspiration for reward-based robot learning. In this paper, we propose a learning mechanism that is able to learn from negative and positive feedback with adaptive reward coding. It is composed of two phases: evaluation and decision making. In the evaluation phase, we use a Kohonen self-organizing map technique to represent success and failure. Decision making is based on an early warning mechanism that helps avoid repeating past mistakes. The attitude toward risk is modulated in order to gain experience of both success and failure. The success map is learned with an adaptive reward that qualifies the learned task in order to optimize efficiency. Our approach is presented with an implementation on the NAO humanoid robot, controlled by a bioinspired neural controller based on a central pattern generator. The learning system adapts the oscillation frequency and the motor neuron gain in pitch and roll in order to walk on flat and sloped terrain, and to switch between them.
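The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `TinySOM`, the function `early_warning`, the 1-D map layout, the learning rate, the neighbourhood radius, and in particular the warning rule (warn when a candidate is closer to the failure map than to the success map by more than the vigilance value) are all assumptions made for illustration. The adaptive reward used to train the success map is not modelled here.

```python
import math
import random


class TinySOM:
    """A minimal 1-D Kohonen self-organizing map (hypothetical helper)."""

    def __init__(self, n_units, dim, seed=0):
        rng = random.Random(seed)
        # One weight vector per unit, initialised uniformly in [0, 1).
        self.weights = [[rng.random() for _ in range(dim)]
                        for _ in range(n_units)]

    def best_unit(self, x):
        """Return (index, distance) of the unit closest to x."""
        dists = [math.dist(w, x) for w in self.weights]
        idx = min(range(len(dists)), key=dists.__getitem__)
        return idx, dists[idx]

    def train(self, x, lr=0.5, radius=1.0):
        """Pull the winning unit and its grid neighbours toward x."""
        winner, _ = self.best_unit(x)
        for i, w in enumerate(self.weights):
            # Gaussian neighbourhood measured on the 1-D unit grid.
            h = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
            for k in range(len(w)):
                w[k] += lr * h * (x[k] - w[k])


def early_warning(x, success_map, failure_map, vigilance):
    """Return True when candidate x looks too close to past failures.

    Assumed rule (not taken from the paper): warn when x is closer to
    the failure map than to the success map by more than `vigilance`.
    A large `vigilance` tolerates more risk; a small one averts risk.
    """
    _, d_success = success_map.best_unit(x)
    _, d_failure = failure_map.best_unit(x)
    return d_success - d_failure > vigilance
```

In use, each trial's parameter vector (for example a normalised oscillation frequency and motor neuron gain, again an assumed encoding) would be fed to `success_map.train` or `failure_map.train` depending on the outcome, and `early_warning` would be consulted before a new candidate is tried on the robot. Raising `vigilance` makes the sketch behave in a risk-taking way, while lowering it makes it risk-averse.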

Keywords

Experience-based learning mechanism; humanoid learning; humanoid robot walking; neurorobotics

Domains

Automatic Control / Robotics; Machine Learning [cs.LG]

Main file

06365318.pdf (752.13 KB)

Origin: Files produced by the author(s)

Contributor: Claire Dune

https://univ-tln.hal.science/hal-01723809

Submitted on: Tuesday, 18 February 2020, 18:07:55

Last modified on: Saturday, 24 February 2024, 03:31:59

Dates and versions

hal-01723809, version 1 (18-02-2020)

Identifiers

HAL Id: hal-01723809, version 1
DOI: 10.1109/TNNLS.2012.2224370

Cite

John Nassour, Vincent Hugel, Fethi Ben Ouezdou, Gordon Cheng. Qualitative Adaptive Reward Learning With Success Failure Maps: Applied to Humanoid Robot Walking. IEEE Transactions on Neural Networks and Learning Systems, 2013, 24 (1), pp.81-93. ⟨10.1109/TNNLS.2012.2224370⟩. ⟨hal-01723809⟩


Collections

UNIV-TLN UVSQ LISV TDS-MACS COSMER GS-SPORT-HUMAN-MOVEMENT

112 views

235 downloads
