Generation and control of locomotion patterns for biped robots by using central pattern generators

This paper presents an efficient closed-loop locomotion control system for biped robots that operates in the joint space. The robot's joints are directly driven by control signals generated by a central pattern generator (CPG) network. A genetic algorithm is applied to find an optimal combination of internal parameters of the CPG given a desired straight-line walking speed. Feedback signals generated by the robot's inertial and force sensors are fed directly into the CPG in order to automatically adjust the locomotion pattern over uneven terrain and to deal with external perturbations in real time. Omnidirectional motion is achieved by controlling the pelvis motion. The performance of the proposed control system has been assessed through simulation experiments on a NAO humanoid robot.


I. INTRODUCTION
During the last decades, biped locomotion has mostly been tackled as an inverse kinematics problem, aiming to generate a dynamic locomotion pattern by calculating trajectories for the robot's arms and legs in Cartesian space under the constraint that the robot keeps its dynamic balance while walking. This is a valid solution widely used in humanoid robots. However, animals and humans neither compute Cartesian space trajectories nor require precise models of their body or the environment, since their complex nervous system is able to automatically learn motion patterns by controlling extensor and flexor movements and then adapt them according to internal changes or external environmental conditions.
Many studies show the presence of specialized networks of neurons able to generate the rhythmic patterns of animals and humans, such as walking, running and swimming. These networks are called central pattern generators (CPGs). The term central indicates that sensory feedback is not necessary for the generation of rhythmic signals. CPGs are modelled as networks of neurons capable of generating stable and periodic signals controlled through a set of constant parameters. In the case of vertebrates, these networks are located in the central nervous system within the spinal cord. The output signals from these CPGs are sent to the muscles through the peripheral nervous system. High-level commands are sent to the different CPGs by the brain through the spinal cord. These commands do not generate the periodic signal by themselves, since the oscillation is autonomously generated within the CPG in the spinal cord.
Many works on CPG-based locomotion control of legged robots and other types of robots have been proposed ([1], [2]). CPG networks have mainly been used for controlling the robot gait either in the robot's task-space or in its joint-space. Biped locomotion is a complex problem since it involves the inherent instability of humanoid robots. Therefore, it is important to develop an appropriate control scheme capable of generating stable motions, and CPGs have proven to be an appropriate model for solving this problem. Thus, the robotics community has shown an increasing interest in locomotor central pattern generators, since these networks are able to generate complex high-dimensional signals for controlling coordinated periodic movements from simple input signals.
Within the task-space approach, a CPG network that generates the stepping and propulsive motion for locomotion control of a biped robot was proposed in [3]. The feedback pathways for propulsive motion were obtained through a gradient method, using the pelvis angular velocity in the sagittal and coronal planes as inputs in order to generate a feedback signal that controls the trajectory of the legs in the walking direction. However, only results on flat terrain were reported. Alternatively, a control system that generates the motion of a biped robot in the task-space by using nonlinear oscillators was presented in [4]. These movements are modulated through the signals provided by touch sensors. Later, in [5], the same authors extended their previous work in order to control the turning behaviour of the biped robot. In [6], a method was proposed to generate a walking pattern and stabilize it based on coupled oscillators without real-time computation of the zero moment point (ZMP). In [7], a CPG is utilized to describe and modulate the trajectory of the robot's center of gravity and, as a result, the trajectories of its limbs in the workspace. Experiments show that the robot is able to walk on both flat and inclined terrain with slopes of ±10 degrees. In [8], a pattern generator system for biped locomotion based on CPG networks is proposed. The system operates in the task-space. The authors claim that the robot can walk on flat and inclined terrain with slopes of ±7 degrees.
Regarding the joint-space approach, a CPG implemented with coupled nonlinear oscillators was proposed in [9] in order to control the biped locomotion of a humanoid robot. The system is able to learn an arbitrary signal in a supervised framework. It can modulate some parameters and allows the introduction of feedback signals provided by the robot's sensors. However, well-defined locomotion patterns must be specified in advance.
In [10], the signals for the robot's joints are generated by using coupled oscillator models based on sensory information about the location of the center of pressure and its velocity. However, only results on flat terrain were reported.
In turn, a feedback mechanism for phase regulation by using load sensory information was proposed in [11]. The signals for the motors are specified in the joint-space through mathematical formulations that define the angular displacement, with the parameters that characterize the system's behaviour being hand-tuned. Later, in [12], the same authors proposed a multiobjective staged evolutionary algorithm in order to find the parameters that characterize the open-loop behaviour of the system. However, the genetic algorithm used a reduced number of individuals, and a hand-tuned gait was included as an individual in an otherwise random initial population, thus biasing the final convergence. As a result, there is no guarantee that the algorithm ends up exploring the whole search space or that it finds all feasible solutions. In addition, the control system was only tested on flat and sloped terrain with a maximum ascending slope of 4 degrees and a maximum descending slope of 2.5 degrees.
In [13], a control scheme for qualitative adaptive reward learning with success failure maps applied to humanoid robot walking was proposed. However, that technique does not ensure a stable interaction with the floor, since the robot tends to drag its feet when walking, which is likely to lead to falls on uneven terrain. The authors present results with the NAO walking on slopes of ±10 degrees.
Table I summarizes the most representative control schemes for locomotion control of biped robots that have successfully been tested on small-size humanoid robots. The proposed technique belongs to the joint-space category, as the CPG output signals directly drive the angular position of the robot's joints. It yields results comparable to those reported in [7] in terms of walking speeds and types of terrain, although the latter is a task-space approach that requires solving the inverse kinematics, which limits the response time to unexpected events and may end up compromising the robot's safety. The proposed CPG guarantees that the open-loop control system generates a locomotion pattern that correctly interacts with the floor. In addition, it allows a straightforward modulation of the locomotion patterns through sensory feedback in order to cope with uneven terrain and transitions between different types of ground, and eases the introduction of additional feedback controllers to deal with external perturbations.
This paper is organized as follows. Section II describes the control system. Experimental results are presented and discussed in Section III. Finally, conclusions and future work are given in Section IV.

II. CPG-BASED CONTROL SYSTEM
This section describes a CPG network and the associated methodology to automatically estimate the configuration parameters of the system in order to generate well-characterized straight-line locomotion patterns. The locomotion pattern is automatically obtained with a genetic algorithm by evaluating the locomotion performance with different combinations of parameters through dynamics simulations [14]. Some feedback strategies are presented in order to walk continuously on various types of terrain and to deal with external perturbations.

A. CPG network and neuron's model
The CPG utilized in this work is based on a network of 4 interconnected neurons with mutual inhibition previously proposed by Matsuoka [15]. The topology of that CPG is shown in Fig. 1. That network has been chosen as it generates oscillatory output signals in phase, in anti-phase and with phase differences of π/2 and 3π/2 radians. These phase differences are sufficient to control the robot's movement directly in the joint space, as shown in [10]. In the present work, however, that network directly drives the robot's joints instead of the phase oscillators used in [10]. The interconnection weights between the neurons of that CPG, which have been set according to [3], are shown in Table II. Figure 2 shows the output signal of each neuron of the CPG network.
The CPG's neurons are defined according to the well-known Matsuoka neuron model, equations (1) and (2). The external input u_e affects the amplitude of the neuron's output signal. The frequency of the output signal is determined by the time constants τ and τ'. The set of parameters must satisfy some requirements in order to yield stable oscillations ([15], [16]). Term f_i is a feedback variable that can be used to control the output amplitude and to synchronize the output signals with a periodic input signal. Parameter w_ij represents the bidirectional interconnection weight between two neurons. Those interconnection weights determine the phase difference among the output signals generated by the CPG. When a network of neurons is set, they all oscillate together according to their internal parameters and the network interconnections, converging to specific patterns and limit cycles. Variable N represents the number of neurons that constitute the CPG (N = 4 in this work).
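For reference, Matsuoka's model is commonly written in the form below. The notation is an assumption made here for illustration (u_i and v_i for the membrane and adaptation states of neuron i, y_i for its output, β for the adaptation gain); the sign convention for f_i and the exact layout of (1) and (2) may differ from this sketch.

```latex
\begin{align}
\tau \,\dot{u}_i &= -u_i \;-\; \sum_{j=1}^{N} w_{ij}\, y_j \;-\; \beta\, v_i \;+\; u_e \;+\; f_i, \\
\tau' \,\dot{v}_i &= -v_i + y_i, \\
y_i &= \max(0,\, u_i).
\end{align}
```

With this sign convention, a positive w_ij makes neuron j inhibit neuron i. An active neuron therefore suppresses its partner until its own adaptation state v_i builds up and lets the partner take over, which is what produces the alternating rhythm.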
Parameter K_f has been introduced as proposed in [17] in order to modulate the frequency of the output signal. The time constants in (1) and (2) are thus reformulated in terms of K_f, with τ_o and τ'_o denoting the original time constants. The internal parameters that determine the behaviour of each neuron are summarized in Table III. The CPG generates stable oscillations provided those parameters satisfy some requirements ([15], [16]).
In this work, the proposed control system has been tested on the NAO platform [18], which is a small-size humanoid robot with 21 degrees of freedom, 56 cm tall and weighing 4.8 kg. Notwithstanding, the same control system can easily be adapted to other humanoid robots with a similar kinematic structure.
The locomotion control of humanoid robots in the joint space must control the pitch and roll motion of the robot's different joints from the output signals generated by the CPG. In this work, the controllers proposed in [10] have been used to determine the angle in radians of the corresponding joints of the NAO robot, as given in (3). Those controllers depend on 10 internal parameters: 4 biases (bias1, ..., bias4) and 6 gains (a, b, c, d, e, f). Parameter ξ controls the stride length. Both the stride length and the locomotion frequency, which is controlled through the value of K_f, determine the robot's walking velocity. By taking into account the relationship between locomotion frequency and stride length in the human gait, which has been studied in [19], Table IV shows the pairs (K_f, ξ) that have experimentally been chosen in this work for 5 reference velocities of the NAO robot. The remaining joints have experimentally been set to the constant values shown in Table V in order to yield a stable upright position.
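The effect of the frequency gain can be illustrated with a small numerical experiment. The sketch below integrates a Matsuoka network in its commonly stated form with a forward-Euler scheme and measures the oscillation period. Several things in it are assumptions rather than values from this paper: K_f is taken to scale both time constants multiplicatively (τ = K_f τ_o, τ' = K_f τ'_o), the feedback terms f_i are set to zero, the adaptation gain is called beta, and a two-neuron network with illustrative weights replaces the four-neuron network of Table II.

```python
import numpy as np


def simulate_matsuoka(weights, k_f=1.0, tau_o=0.1, tau_p_o=0.2, beta=2.5,
                      u_e=1.0, duration=20.0, dt=1e-3):
    """Forward-Euler integration of a Matsuoka network (no feedback f_i).

    Returns the time vector and the rectified outputs y (steps x neurons).
    """
    weights = np.asarray(weights, dtype=float)
    n = weights.shape[0]
    # ASSUMPTION: K_f multiplies both original time constants.
    tau, tau_p = k_f * tau_o, k_f * tau_p_o
    u = np.zeros(n)
    u[0] = 0.1  # small asymmetry so the network leaves its unstable equilibrium
    v = np.zeros(n)
    steps = int(round(duration / dt))
    t = np.arange(steps) * dt
    y_log = np.empty((steps, n))
    for k in range(steps):
        y = np.maximum(0.0, u)           # neuron outputs
        y_log[k] = y
        du = (-u - weights @ y - beta * v + u_e) / tau
        dv = (-v + y) / tau_p
        u = u + dt * du
        v = v + dt * dv
    return t, y_log


def mean_period(t, y):
    """Average time between successive onsets (output leaving zero)."""
    onsets = t[1:][(y[:-1] == 0.0) & (y[1:] > 0.0)]
    return float(np.mean(np.diff(onsets)))


# Two neurons with mutual inhibition (illustrative weights, not Table II).
W = np.array([[0.0, 2.5],
              [2.5, 0.0]])
t1, y1 = simulate_matsuoka(W, k_f=1.0)
t2, y2 = simulate_matsuoka(W, k_f=2.0)
half = len(t1) // 2                      # discard the initial transient
p1 = mean_period(t1[half:], y1[half:, 0])
p2 = mean_period(t2[half:], y2[half:, 0])
print(f"period with K_f=1: {p1:.3f} s, with K_f=2: {p2:.3f} s")
```

Under the multiplicative assumption, scaling both time constants by the same factor only rescales time, so the period grows in proportion to K_f while the shape and amplitude of the signals are preserved. If the reformulation instead divides the time constants by K_f, the effect on the frequency is inverted.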

B. Estimation of CPG parameters through evolutionary computation
The genetic algorithm (GA) proposed in [20] has been applied in order to estimate the best combination of all internal parameters of the locomotion controllers specified in the previous section. In the present work, the input parameter of the GA is the required straight-line velocity. The chromosome structure is composed of 10 traits associated with the respective gains and biases that constitute the internal parameters of the locomotion controllers: (a, b, c, d, e, f) and (bias1, bias2, bias3, bias4). Table VI shows the allowed intervals for those parameters, which constitute the GA's search space. Those limits were determined experimentally by taking into account the range of variation of the optimum solutions found by the GA after an extensive set of executions.
The GA's fitness function evaluates each individual of the current population at the end of a constant simulation period (30 seconds in this work) in which the robot is allowed to walk using the Webots real-time simulator. In particular, the fitness function that is maximized in order to rank the individuals evaluated by the GA in each generation is the average of four terms. The first term applies a Gaussian function to the difference between the required straight-line velocity and the velocity reached at the end of the simulation period by the evaluated individual. The second term applies a Gaussian function to the difference between the distance that the robot should travel in a straight line at the required velocity by the end of the simulation period and the final distance traveled by the evaluated individual. That term is maximized if the robot follows a straight path during the whole simulation period. The third term corresponds to the deviation distance with respect to the straight-line path at the end of the simulation period. That deviation is negated in order to be maximized. This term is maximized when the robot reaches the desired destination along the straight path at the end of the simulation period. The fourth term is the percentage of time within the simulation period that the robot's ZMP stability margin is above a given threshold. That term is maximized when the robot's stability is optimal during the various motion stages (both single-support and double-support modes).
In order to obtain acceptable locomotion patterns, two restrictions were imposed on the solutions yielded by the GA. The first restriction prevents solutions with large torso inclinations. In particular, solutions with a torso inclination above 16 degrees were rejected in this work. With lower thresholds, the GA hardly found valid solutions, whereas higher thresholds led to unnatural bent postures while walking.
The second restriction is associated with the ground clearance. Specifically, it is required that the swing foot be parallel to the floor, with its sole more than 1 cm above the floor most of the time. That guarantees a correct interaction between the robot and the floor, as well as the avoidance of small obstacles.
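The four-term fitness described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the Gaussian widths sigma_v and sigma_d, the units (cm and cm/s), the use of a fraction in [0, 1] for the stability term and all function and argument names are assumptions. Since the scaling of the individual terms is not specified, the values produced here are not comparable with fitness values reported elsewhere in this paper.

```python
import math


def gaussian(x, sigma):
    """Unnormalised Gaussian: equals 1 at x = 0 and decays towards 0."""
    return math.exp(-0.5 * (x / sigma) ** 2)


def fitness(v_req, v_end, dist_end, deviation, stable_fraction,
            sim_time=30.0, sigma_v=1.0, sigma_d=10.0):
    """Average of the four terms described in the text.

    v_req, v_end     required and reached straight-line velocity [cm/s]
    dist_end         distance travelled at the end of the run [cm]
    deviation        lateral distance to the straight path at the end [cm]
    stable_fraction  share of the run with ZMP margin above threshold, in [0, 1]
    sigma_v, sigma_d hypothetical Gaussian widths (not given in the paper)
    """
    term_speed = gaussian(v_req - v_end, sigma_v)
    term_dist = gaussian(v_req * sim_time - dist_end, sigma_d)
    term_dev = -abs(deviation)          # negated so that maximising reduces it
    term_zmp = stable_fraction
    return (term_speed + term_dist + term_dev + term_zmp) / 4.0


# A run that matches the target exactly versus a slower, drifting run.
print(fitness(5.0, 5.0, 150.0, 0.0, 1.0))
print(fitness(5.0, 4.0, 120.0, 3.0, 0.8))
```

In this sketch the deviation term is unbounded below while the other three terms lie in [0, 1], so a large lateral drift dominates the score; a practical implementation would likely normalise it.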
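The two restrictions can be expressed as a simple acceptance test applied to each candidate after its simulation run. The sketch below is illustrative only: the function name, the sampled inputs and the interpretation of "most of the time" as more than half of the swing-phase samples are assumptions, and the parallelism check on the swing foot is omitted for brevity.

```python
def is_acceptable(torso_deg, swing_sole_height_cm,
                  max_torso_deg=16.0, min_clearance_cm=1.0, min_fraction=0.5):
    """Return True if a simulated gait satisfies both restrictions.

    torso_deg             torso inclination samples over the run [degrees]
    swing_sole_height_cm  sole height of the swing foot, sampled during
                          swing phases only [cm]
    min_fraction          ASSUMPTION: share of samples that must clear the floor
    """
    if not torso_deg or not swing_sole_height_cm:
        return False
    # Restriction 1: reject large torso inclinations.
    if max(abs(angle) for angle in torso_deg) > max_torso_deg:
        return False
    # Restriction 2: the swing sole must clear the floor most of the time.
    cleared = sum(h > min_clearance_cm for h in swing_sole_height_cm)
    return cleared / len(swing_sole_height_cm) > min_fraction


print(is_acceptable([3.0, -5.0, 8.0], [1.5, 2.0, 0.5]))
print(is_acceptable([3.0, 17.0], [1.5, 2.0]))
```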

C. Feedback strategies
Some feedback pathways have been introduced in the CPG described above in order to adjust the locomotion pattern in real time.
1) Posture controller: The posture controller keeps the robot's trunk in an upright position by using information provided by the robot's gyrometer and accelerometer. The trunk inclination in the sagittal plane can be controlled by changing the value of parameter bias1 in (3). This parameter is set proportionally to the difference between the reference inclination, θ, and the current trunk inclination estimated from the sensors, θ̂, as well as to the derivative of that difference, both in radians, with bias1_0 denoting the original bias1 parameter.
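The description above corresponds to a proportional-derivative correction of bias1. A minimal sketch is given below, under the assumption that the correction is added to the original value bias1_0; the gain names k_p and k_d, the class name and the finite-difference derivative are illustrative choices and are not taken from this paper.

```python
class PostureController:
    """PD-style adjustment of bias1 from the trunk inclination error."""

    def __init__(self, bias1_0, k_p, k_d, dt):
        self.bias1_0 = bias1_0   # original bias1 found by the GA
        self.k_p = k_p           # hypothetical proportional gain
        self.k_d = k_d           # hypothetical derivative gain
        self.dt = dt             # control period [s]
        self._prev_error = None

    def update(self, theta_ref, theta_est):
        """Return the new bias1 given reference and estimated inclination [rad]."""
        error = theta_ref - theta_est
        if self._prev_error is None:
            d_error = 0.0        # no derivative available on the first call
        else:
            d_error = (error - self._prev_error) / self.dt
        self._prev_error = error
        # ASSUMPTION: the correction is additive on top of bias1_0.
        return self.bias1_0 + self.k_p * error + self.k_d * d_error


ctrl = PostureController(bias1_0=0.2, k_p=0.5, k_d=0.1, dt=0.01)
print(ctrl.update(0.0, 0.0))    # upright trunk: bias1 stays at bias1_0
print(ctrl.update(0.0, -0.1))   # trunk tilted: bias1 is corrected
```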
2) Stepping controller: It regulates the interaction between the robot's feet and the ground by synchronizing the output signals generated by the CPG with the real-time interaction between the robot and the floor, using the measures provided by the force sensors located in the soles of the robot's feet. Such synchronization is performed by taking advantage of the entrainment property of neural oscillators. Thus, the frequency of the generated locomotion pattern is adjusted according to the current interaction between the soles and the floor. This allows the control system to compensate for both external perturbations and mismatches related to the robot's mechanical parts. Furthermore, if the stride length is set to zero, this controller guarantees correct stepping in place.
Let L_f, L_b, L_l and L_r be the force measures corresponding to the four force sensors located at the front, back, left and right positions of the left foot, respectively. Likewise, let R_f, R_b, R_l and R_r be the corresponding force measures for the right foot. The stepping controller is defined in terms of these measures, with f_1, f_2, f_3 and f_4 being the feedback inputs corresponding to the respective 4 neurons of the CPG (1).
3) Stride length controller: It modulates the stride length ξ by taking into account the stability margin along the sagittal plane, µ_X, which is measured in centimetres. The goal is to lower the stride length whenever the stability margin is reduced in order to recover stability. The stride length is redefined accordingly, with κ being a threshold that has experimentally been set to 3 cm and ξ_0 the original stride length.
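One simple rule that behaves as described, shortening the stride only when the stability margin drops below the threshold κ, is sketched below. The linear scaling is purely an assumption chosen for illustration and should not be read as the expression used by the stride length controller; the function and argument names are likewise hypothetical.

```python
def stride_length(xi_0, mu_x, kappa=3.0):
    """Scale the nominal stride xi_0 by the sagittal stability margin.

    mu_x and kappa are in centimetres. ASSUMED rule: the stride is left
    untouched while mu_x >= kappa and shrinks linearly to zero as the
    margin vanishes.
    """
    scale = min(1.0, max(0.0, mu_x / kappa))
    return xi_0 * scale


print(stride_length(2.0, 5.0))   # comfortable margin: nominal stride
print(stride_length(2.0, 1.5))   # reduced margin: shorter stride
```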

D. Omnidirectional controller
In real applications, the robot must be able to explore its workspace by changing its walking direction at any moment. In particular, a joint located in the robot's pelvis is used to control the walking direction in order to describe a circular motion in either the clockwise or counterclockwise direction. That joint in the NAO is referred to as HipYawPitch. Its angle in radians is determined by a controller driven by y_1 and y_3, the corresponding CPG output signals, and k_5, a variable whose magnitude is inversely proportional to the curvature radius and whose sign determines whether the direction of circular motion is clockwise (negative sign) or counterclockwise (positive sign).

E. Phase resetting controller
Phase resetting is a fast and simple feedback strategy that has also been used to change the phase of the locomotion pattern generated by the control system in order to recover the robot's balance whenever an external perturbation is applied to the robot's body. This effective feedback strategy is suitable for humanoid robots with reduced computational capability since it does not require complex data processing [21].
The closed-loop system for locomotion control of biped robots with phase resetting must detect the external force applied to the robot's body through the fast analysis and tracking of the measures provided by the robot's sensors. Once the external perturbation is detected, the system must react by activating the phase resetting mechanism in order to quickly recover balance.
This controller synchronizes the neurons' output signals in order to shift the current phase of the locomotion pattern generated by the system to a desired phase given an external event or stimulus, such as an external force applied to the robot's body. The aim of this mechanism is to generate a force in the direction opposite to that of the external perturbation by changing the phase of the current locomotion pattern, so that balance is recovered quickly.
The information provided by the 3-axis accelerometer is used to detect the instant at which the external force is applied to the robot's body and also to estimate its magnitude and direction. According to the current phase of the generated locomotion pattern and the external force applied to the robot, the phase resetting controller must react by changing the current phase of the locomotion pattern to another phase that allows the robot to recover its balance. The phase change is effective after ∆t seconds.
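The detection and timing logic described above can be sketched as follows. The detection rule used here, a threshold on the deviation between the measured acceleration and the acceleration expected for unperturbed walking, is an assumption for illustration, as are the function names and the threshold value. The response time of 40 ms and the sampling time of 1.7 ms are the values reported in the experimental section.

```python
import math


def detect_perturbation(acc, nominal, threshold):
    """Index of the first sample that deviates from the nominal profile.

    acc, nominal  sequences of (ax, ay, az) tuples [m/s^2]
    threshold     hypothetical detection threshold [m/s^2]
    Returns None if no sample exceeds the threshold.
    """
    for k, (measured, expected) in enumerate(zip(acc, nominal)):
        if math.dist(measured, expected) > threshold:
            return k
    return None


def reset_sample(detect_idx, delta_t=0.040, sample_time=0.0017):
    """Sample index at which the phase change becomes effective."""
    return detect_idx + math.ceil(delta_t / sample_time)


nominal = [(0.0, 0.0, 9.81)] * 5
measured = list(nominal)
measured[3] = (4.0, 0.0, 9.81)           # simulated push along x
idx = detect_perturbation(measured, nominal, threshold=2.0)
print(idx, reset_sample(idx))
```

The difference vector between the measured and the expected acceleration at the detection sample also gives a rough estimate of the direction of the push, which is what determines the target phase of the reset.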

III. EXPERIMENTAL RESULTS
The proposed locomotion control system has been tested on a NAO biped robot in simulation on the Webots simulator. For determining the gains and biases of the locomotion controllers for any given velocity, the GA interacts with the Webots simulator in order to evaluate the different individuals belonging to every generation. A total of 12,000 individuals were evaluated for every generation in order to cover a wide range of possible solutions within the search space. Each individual requires the simulation of the robot while walking in a straight line during the evaluation period, which was set to 30 seconds in this work.
The proposed system has been evaluated on 5 reference velocities: 1, 3, 5, 7 and 9 cm/s, which span the same speed range as tested in [11]. For each reference velocity, the GA was executed 50 times in order to find the best combination of internal parameters of the locomotion controllers. The GA stops whenever either the fitness function does not significantly vary for a predefined number of generations (3 generations with a fitness variation below 0.001 in this work) or a maximum number of generations is reached (8 generations in this work). Only the solutions whose fitness values were above a predefined threshold (2.4 in this work) were selected and the median of their corresponding parameters computed. Table VII shows the median values of those parameters for the 5 tested velocities. A total of 7 solutions had a fitness value above the predefined threshold for 1, 3 and 7 cm/s, whereas 4 solutions passed that threshold for 5 and 9 cm/s. Those solutions represent optimal locomotion patterns for the given reference velocities. Intermediate velocities can be obtained without changing the selected pattern by slightly modifying the values of the stride length, ξ, and/or the frequency gain, K_f, as shown in Fig. 4 and Fig. 6.
The proposed control scheme was evaluated in simulation using a workspace that consists of an ascending 10-degree slope, followed by a flat surface and a final descending 10-degree slope. The robot started and stopped walking on the flat surface on both sides of the slope. Figure 3 contains a sequence of snapshots showing the performance of the system while successfully traversing the workspace at a velocity of 5 cm/s. The feedback gains that successfully deal with that environment at that speed were heuristically found in simulation. Future work will aim at automatically finding those feedback gains according to the available sensory information in order to deal with increasingly challenging environments.
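The stopping rule of the GA and the median-based selection of the final parameters, as described above for the GA runs, can be sketched as follows. The function names and data layout are assumptions, and the stagnation criterion is interpreted here as three consecutive generation-to-generation changes of the best fitness below 0.001, which is one possible reading of the text.

```python
from statistics import median


def should_stop(best_fitness_per_gen, max_generations=8, patience=3, tol=1e-3):
    """Stop on stagnation or when the generation budget is exhausted."""
    n = len(best_fitness_per_gen)
    if n >= max_generations:
        return True
    if n <= patience:
        return False
    recent = best_fitness_per_gen[-(patience + 1):]
    # ASSUMPTION: stagnation means `patience` consecutive changes below tol.
    return all(abs(b - a) < tol for a, b in zip(recent, recent[1:]))


def median_parameters(solutions, threshold=2.4):
    """Per-parameter median over the solutions whose fitness exceeds threshold.

    solutions: list of (fitness, parameter_list) pairs, one per GA execution.
    Returns None when no solution passes the threshold.
    """
    selected = [params for fit, params in solutions if fit > threshold]
    if not selected:
        return None
    return [median(column) for column in zip(*selected)]


runs = [(2.5, [1.0, 10.0]), (2.6, [3.0, 30.0]),
        (2.0, [100.0, 100.0]), (2.7, [2.0, 20.0])]
print(should_stop([1.0, 2.5, 2.5004, 2.5005, 2.5006]))
print(median_parameters(runs))
```

Taking the median per parameter rather than the single best run makes the selected gait less sensitive to an individual execution that scored well by chance in simulation.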

A. Step length modulation
Using the parameters found by the GA for the velocity of 5 cm/s, variable ξ was modulated to change the straight-line velocity on-line. Figure 4 shows the relation between variable ξ and the robot's measured straight-line velocity as ξ is modulated. This provides a way of controlling the velocity of the locomotion pattern in real time with the proposed control scheme. Figure 5 presents the footsteps generated by the robot when variable ξ is used to modify the walking velocity on-line. In the plot, the footsteps are colored blue and red for the robot's left and right soles, respectively.

B. Frequency modulation
Variable K_f can be modulated to change the frequency of the locomotion pattern and, therefore, its velocity. Figure 6 shows the relation between variable K_f and the robot's measured straight-line velocity as K_f is modulated. The set of parameters used were those obtained for the locomotion pattern at 5 cm/s.

C. Omnidirectional locomotion experiment
Figure 7 shows an example of a circular motion in the counterclockwise direction described by the robot using the omnidirectional controller and the optimal parameters found for the walking velocity of 5 cm/s. The stride length ξ in that case was set to zero in order to be able to turn in place.
Fig. 6. Velocity modulation by varying parameter K_f (axes: velocity [cm/s] and K_f).

D. Phase resetting experiment
In this section, a simple experiment is presented to show the suitability of the phase resetting controller for fast recovery of balance in biped robots. In the experiment described below, the external force was considered to be applied to the robot's head along a known direction defined manually. This force guarantees that the robot will fall down when the feedback mechanism is not activated. Therefore, it has been used to test the system operating in both open and closed loop. The simulator allows the definition of the exact point on the robot's body at which the force is applied, as well as its desired magnitude and direction. The external force was also applied at a known phase of the locomotion pattern and at the same point on the robot's body in order to test the system under the same dynamic conditions.
Fig. 8. Accelerometer measures, in m/s^2. In the plots, the red line represents the system response when there is no external force applied to the robot's body. Thus, the robot is just walking. The blue line represents the behaviour when the external force is applied to the robot's head and the phase resetting controller is not activated. Finally, the green line represents the behaviour when the phase resetting controller is activated and the external force is applied to the robot's head.
The external force was applied at the instant in which the robot is standing on a single foot (right foot at the highest position and left foot supporting the robot's full weight). This pose was chosen as an example to validate that the control system is able to deal with unstable situations. Figure 10 represents the instant at which the external force is applied to the robot's head while the robot is standing on its left foot.
The locomotion pattern was generated by means of the proposed CPG-joint-space control scheme, with the parameters found for the straight-line locomotion pattern at a walking speed of 5 cm/s. In the experiment, the controller's response time (∆t) was set to 40 ms. However, this time could be smaller according to the desired system response.
Figure 8 represents the measures provided by the robot's accelerometer for 3 possible situations, namely, the system response in open loop without any external force applied to the robot's head (red), the system response in open loop with the external force applied to the robot's head (blue) and, finally, the system response in closed loop with the external force applied to the robot's head (green). The sampling time was set to 1.7 ms. The information provided by the robot's accelerometer was used in order to determine the instant in which the external force is applied to the robot's body and thus the instant at which the phase resetting controller is activated. The effect of the phase resetting mechanism on the output signals generated by the CPG network is shown in Fig. 9, and can also be appreciated in the plots shown in Fig. 8. These figures show the fast and stable response produced by the system. The external force is detected by the system in sample number 3064. The feedback mechanism is activated at that instant. After the controller's response time (40 ms), the system compensates for the external force applied to the humanoid robot's head through a fast motion that generates a force in the opposite direction. This minimizes the effect of the external perturbation and allows the robot to recover balance quickly.
Sequences of snapshots showing the performance of the robot when phase resetting is off and on are shown in Fig. 10 and Fig. 11, respectively. These experiments have shown that the closed-loop response is fast and effective, which makes this system suitable for humanoid robots with reduced processing capabilities. This system can also deal with larger forces than those tackled by other control strategies.
Experimental results showing the behaviour of the overall system in the simulated workspace can be found on the companion website (see footnote 1).

IV. CONCLUSIONS
The proposed system belongs to the joint-space category, as the CPG output signals drive the angular position of the robot's joints through a set of controllers whose optimal configuration of internal parameters is computed through an evolutionary GA given a desired straight-line walking speed. The proposed CPG guarantees that the open-loop control system generates a locomotion pattern that correctly interacts with the floor. It also allows a straightforward modulation of the locomotion patterns through sensory feedback so that the robot can cope with uneven terrain and transitions between different types of ground, and eases the introduction of additional feedback controllers to deal with external perturbations. This is a very important feature because it enables the system to be improved incrementally by adding controllers so that more complicated situations can be coped with. The performance of the proposed control system has been assessed through simulation experiments on a NAO humanoid robot, showing the effectiveness of the proposed approach, although it can also be applied to other families of humanoid robots with a similar kinematic structure.
Future work will include a rigorous study of feedback controllers in order to cope with more complex types of terrain and external perturbations. Furthermore, a rigorous study of the variation of the internal parameters of the locomotion controllers (gains and biases) will be conducted with the final aim of establishing mathematical models that allow the system to automatically determine optimal parameters for any required velocity and direction, without executing the GA-based optimization process for every new speed. Finally, it is necessary to define feasible strategies to automatically compute the feedback gains based on sensory information about the environment in order to be able to cope with increasingly challenging real environments.
1 Companion website: https://youtu.be/Pl71G04ujws

Fig. 5. Footsteps obtained by varying parameter ξ from 0.676 to 2.176. The set of parameters used were those found for the locomotion pattern at 5 cm/s.

Fig. 9. Output signals of the 4-neuron CPG network shown in Fig. 1. The plots represent the system's response without (top) and with (bottom) the proposed phase resetting mechanism.

TABLE I. CPG-BASED LOCOMOTION CONTROL SYSTEMS TESTED ON SMALL SIZE HUMANOID ROBOTS

TABLE II. CPG'S INTERCONNECTION WEIGHTS