Application of the FACS in the Design and Construction of a Mechatronic Head with Realistic Appearance

David Loza, Samuel Marcos, Eduardo Zalama, Jaime Gómez García-Bermejo and José Luis González

Abstract: The growing demand for service robots requires better and more natural human-machine interaction. Given that an important part of human communication is non-verbal, it is necessary to endow robots with gestural communication capabilities similar to those of humans. This paper describes the design and construction of a realistic mechatronic head with high gesture capacity. The proposed design is based on human anatomy, and the facial expressions are defined through the Facial Action Coding System (FACS). The paper shows the implementation details of the mechatronic head and the way a set of servomotors can generate the basic action units of the FACS as well as the basic and more complex emotional gestures.

I. INTRODUCTION
Human Machine Interaction (HMI) can be understood as: "two powerful computers (human and equipment) that try to communicate with each other through an interface with a very limited narrow band" [29]. HMI has to deal not only with the design of efficient interfaces, but also with the minimization of the barrier between the human cognitive model of the tasks to be accomplished and the computer's understanding of these tasks.
For this reason, in the field of robotics, there is a growing interest in the development of devices that include social functionality and interaction methods similar to those of human communication. This interaction strategy is based on the consideration of the robot (the machine) as a member of a group in which one or more activities involve humans. Robots with large perception capabilities, equipped with advanced communication interfaces based on natural language, expressivity and gesture recognition, make communication easier, not only for people with low technological skills, but also for people with decreased attention capability.

II. HUMAN-COMPUTER INTERACTION
In the field of HMI, a personified agent is defined as an entity with corporeal representation which can physically interact with the environment through its body. The anthropomorphism of an agent can be defined in terms of behavior and appearance. In terms of behavior, robot anthropomorphism involves the identification of certain human attributes by users [6] (e.g. personality, knowledge or emotion). In terms of appearance, the anthropomorphism of an agent is defined as the resemblance of its shape to human morphology [19]. It should be noted that anthropomorphism depends not only on the physical features of the agent, but also on the way it communicates with humans.

David Loza: davidlozaing@hotmail.com, Universidad de Valladolid. Samuel Marcos: sammar@cartif.es, Fundación Cartif. Eduardo Zalama: ezalama@eis.uva.es, Universidad de Valladolid. Jaime Gómez: jaigom@eis.uva.es, Universidad de Valladolid. José Luis González: joseluis@seguritron.com, Seguritron Robotics.
Several aspects should be considered in the design of robots able to communicate with humans. The robot's appearance has a clear influence on the empathy humans feel for it. However, there is no consensus in the literature about whether robot anthropomorphism impacts positively or negatively on the interaction with users. In [20], it is shown that agents with a higher degree of anthropomorphism in appearance are more credible and are thus preferred by users. However, in [25] and [18], it is shown that less anthropomorphic agents lead to more positive perceptions by users and are preferred for interaction.
There is also no consensus about the degree of realism required for the agent to express emotions convincingly. In a study conducted more than a decade ago [8], the need to use more sophisticated ways to express emotions in virtual environments was shown. In [28], the direct relationship between the expressive abilities of an agent and its ability to interact with its environment is presented. The results of [15] show that the use of agents with little expressiveness was the main barrier to effective interaction. Moreover, [21] shows that even very basic drawings of faces were able to create an impression on the user's personality. According to [7], the face is a highly expressive element that humans tend to interpret in face-to-face communication. For this reason, the use of faces with a high degree of realism and expressiveness may unintentionally produce different messages that may confuse the receiver. So, in this case, the more realistic interaction may hinder communication. According to [3], there is evidence that the use of exaggerated images (e.g. cartoons) can express emotions more accurately than more realistic faces.
However, in most anthropomorphic robots, the use of an unrealistic or cartoonish appearance has been justified in terms of avoiding what is known as the "uncanny valley" postulated by Mori [17]. A robot with simplified appearance can lead to friendly interfaces (but with lower communication skills). In fact, during the last decade, the implications of the uncanny valley have been discussed extensively. Moreover, different studies show that user preferences in terms of appearance and behavior are not universal. These preferences may differ depending on cultural and psychological aspects [14] [31] [4], as well as on age or gender (men vs. women, elderly vs. young people).
Initially, there was a tendency to develop robots with social skills and great expressivity using simple motorization or illuminated faces [22]. One of the pioneers in this field is the Kismet robot [5], a social robot developed at the MIT AI Lab. This robot is equipped with an unrealistic face with large expressive eyes, eyebrows and mouth. Another relevant example is the Waseda robot WE-4R [13], which has a simple head but can output rich emotional expressions and behavior, not only with the face, but also with the waist, arms and neck. However, although these robots are highly expressive, they lack relevant emotional cues on a detailed level. It is necessary to incorporate movements that mimic human gestures in greater detail, as is evident in the development of robots with humanoid appearance, like the android HRP-4C [26] presented by the National Institute of Advanced Industrial Science and Technology (AIST) of Japan, and the DER2 and DER3 projects constructed by Osaka University.
In other approaches, [27] uses pneumatic actuators to display facial expressions, [12] presents an interesting emotional architecture, and [16] tries to imitate the muscle structures of the human face. However, they either do not have a natural appearance or are very hard to implement.
The proposed approach shares similar objectives, but follows biomechanical principles of the human face based on action units. It is aimed at obtaining effective and visually realistic results, providing the user who is interacting with the robot with the same amount of visual information as that of a human face. The anatomical features of a real face are taken into account in the design of the robot. The paper addresses the analysis and selection of those muscles whose action results in recognizable gestures, as well as the implementation of these actions using a set of servomotors that confers realism and simplicity on the head.

III. HUMAN EXPRESSION
The human face is one of the most complex areas of the human anatomy, given the large number of muscles and their movement combinations. Therefore, the integration of all the different features and details of a human face into a mechanical model is a very difficult task. Thus, all robotic approaches have to assume several simplifications, in order to both reduce the number of facial components and simplify their behavior.
The design of a mechanical system that emulates a human face involves a careful selection of the muscles to be implemented and their allowed movements. A general scheme of the facial muscles is presented in Figure 1. The most significant facial muscles involved in facial expression are listed in Table I.
The analysis of the muscular anatomy allows information to be obtained concerning the relationship between muscular contraction and the subsequent deformations on the face surface. However, from the point of view of social interaction, it is also necessary to determine what the facial expressions represent and how they are interpreted by humans. To this end, researchers in social psychology have explored the development of standard systems to encode and parameterize facial movements, with the aim of relating emotions and facial expressions. One of the most widespread systems is the Facial Action Coding System (FACS) [9]. As a final goal, the FACS seeks to recognize and describe the operation of the so-called Action Units (AU). These units represent the minimum muscular activity units that produce momentary changes in facial appearance. Action units can be generated, described and recognized, and their proper combination can describe any global feature of the face.
The FACS describes more than 60 action units that faces can perform. However, the implementation of all these actions would result in an extremely complex, hardly parametrizable and impractical mechatronic head. Therefore, it is necessary to assume some simplifications by selecting those muscles and AUs corresponding to the most significant human expressions. Following [10] and our previous work with realistic animated avatars [24], a significant reduction in the number of AUs can be adopted while convincingly emulating the six universal emotional expressions described in the FACS: disgust, sadness, anger, happiness, fear and surprise. To be precise, these expressions can be produced using 17 AUs.

Fig. 2: Group of universal complex expressions. From left to right, top to bottom: disgust, sadness, anger, happiness, fear and surprise [30].
For the present work, this reduction in the number of AUs and related muscles has been considered. Only those AUs with the greatest influence on the visual perception of emotional expression have been kept. Based on the 17 significant action units, the face muscles involved in the generation of such micro-expressions have been selected. In this way, we aim to emulate the face's muscular action from the point of view of both the external visual result and the anatomical and physiognomical considerations. As described in the following sections, each AU has been emulated by using one or more properly placed servos, depending on the anatomical position of the emulated muscle (or muscle group). The selected units, their associated actions, corresponding muscles and the emotional expression in which they are activated are shown in Table II.
The robot is equipped with two cameras (in the eyes) for the visual tracking of users and moving objects. Tracking requires eight more action units, related to the movement of the eyes and the neck. Moreover, several studies have shown that these movements are critical for proper communication and perception of some emotional expressions [1], [2], [11]. For example, an expression of anger is much more recognizable when the face gesture is complemented by a lowering of the head (through a neck turn) and a raising of the eyes. Finally, we have included the AU related to eye blinking. This is a movement that humans perform continuously and unconsciously. Blinking gives the robot a natural look even in a resting position. Therefore, the 9 action units reported in Table III have been added to those considered in Table II.

IV. MECHANICAL DESIGN OF THE ROBOTIC HEAD UPON THE FACS
This section describes the mechanical design of the robot. The muscles associated with each servomotor and their selection and restrictions are specified. Some other mechanisms, such as those used in the eyes, are also explained. The final implementation is also described.

A. Location of servo motors
The servomotors are set to represent one or more muscles of the human face. The different AUs and related muscles are shown in Table II. The proposed locations for the different servomotors are shown in Figure 3. Table IV identifies the actuator number (according to Figure 3) and the most significant muscle associated with each actuator, as well as the area where its effect can be seen. This servomotor distribution provides 23 degrees of freedom. The main characteristics of the servomotors used are listed in Table V.
Each servomotor is controlled using pulse width modulation (PWM).
The mechanism of the eyes has been designed upon the human anatomy (see Figure 4).

B. Eye mechanism
The mechanical approach is shown in [...] inferior rectus, lateral rectus, medial rectus, superior oblique and inferior oblique muscles and allow eye movements to be performed.
The eyeballs are about 25 mm in diameter. Two cameras have been integrated into the eyeballs to provide the head with visual perception and tracking.

C. Other mechanisms
The implementation of the mouth area is not easy. The action of the orbicular muscle cannot be simulated using a single servo, given the ellipsoidal nature of this muscle. The proposed approach consists of using six servomotors (four on top and two at the bottom), which simulate the most significant insertions of other muscles into the orbicularis oris. Moreover, the ellipsoidal shape of the orbicular muscle will be reinforced by means of the silicone skin we plan to add in the near future. The location of the emulated insertions can be seen in Figure 5.
The side and front views of the final implementation are shown in Figure 6. The silicone skin will complete the feeling of realism of the mechatronic head.

V. CONTROL OF EXPRESSIONS
The control system coordinates the action of the different servomotors. It has three major modules: the control unit, the action units module and the emotion manager module. Figure 7 shows a scheme of the whole system.

A. Control unit
The control system is structured into a hardware component and a software component. The hardware component is an SSC32 controller card that can control up to 32 servos with a resolution of 1 µs, over a pulse-width range of 0.50 ms to 2.50 ms. The software component manages the movement of the servos according to the orders received from the top levels. It is also in charge of verifying that the servos stay within the limits of proper operation, taking into account the maximum and minimum displacement and the maximum speeds of movement. The controller modulates the pulse width subject to the different limits and parameters (which can be modified through a configuration file).
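The limit checking described above can be sketched as follows. This is a minimal illustration, not the authors' software: the `ServoLimits` structure, the `build_command` helper and the per-servo values are hypothetical, and the command string is assumed to follow the `#<channel> P<pulse> S<speed>` form of the SSC-32 manual, which should be verified against the card's documentation.

```python
from dataclasses import dataclass

# Card-level figures quoted in the text for the SSC32.
PULSE_MIN_US = 500    # 0.50 ms
PULSE_MAX_US = 2500   # 2.50 ms
MAX_CHANNELS = 32


@dataclass
class ServoLimits:
    """Per-servo limits, e.g. loaded from a configuration file (hypothetical)."""
    min_us: int
    max_us: int
    max_speed_us_per_s: int


def clamp(value: int, low: int, high: int) -> int:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def build_command(channel: int, pulse_us: int, speed_us_per_s: int,
                  limits: ServoLimits) -> str:
    """Return a move command with displacement and speed limits enforced."""
    if not 0 <= channel < MAX_CHANNELS:
        raise ValueError(f"channel {channel} outside 0..{MAX_CHANNELS - 1}")
    # Apply the servo-specific limits first, then the card's absolute range.
    pulse = clamp(pulse_us, limits.min_us, limits.max_us)
    pulse = clamp(pulse, PULSE_MIN_US, PULSE_MAX_US)
    speed = clamp(speed_us_per_s, 1, limits.max_speed_us_per_s)
    # Assumed SSC-32 command syntax, terminated by a carriage return.
    return f"#{channel} P{pulse} S{speed}\r"


if __name__ == "__main__":
    # Hypothetical limits for one servo; the request exceeds both of them.
    brow = ServoLimits(min_us=900, max_us=2100, max_speed_us_per_s=1000)
    print(repr(build_command(3, 3000, 5000, brow)))
```

Keeping the clamping in one function means that higher-level modules can request any position without risking a command outside the mechanical range of a servo.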

B. Module of action units
This module contains all AUs coded as different movements of the servos. This coding is based on the analysis described in Section II. Table VI shows the action units associated with each servo or group of servos.

This module also allows the intensity level of each AU and its running time to be selected. Moreover, the module implements the basic interaction restrictions between the movements of the different servomotors.
In the FACS, the intensity of each AU is encoded by a series of thresholds that are notated as A, B, C, D and E. A represents the lowest level (the action unit is almost imperceptible) and E represents the highest level (the action unit is well defined). The threshold ranges are not equally sized, ranges C and D being the widest ones. To change the position of the servomotor according to the chosen intensity, a so-called intensity factor, $K_i$, is provided. The relationship between $K_i$ and $i$ is given by Equation 1, where $i$ is the intensity, coded 1 to 5 (1 being the A and 5 the E intensity level, respectively) and $K_i$ is a parameter between 0 and 1, corresponding to the servomotor movement percentage. Figure 8 (a) describes the intensity according to the FACS, while Figure 8 (b) shows the curve described by the set of values that the intensity factor $K_i$ can take, as well as the coding used in the FACS intensities.
Figure 9 shows an example of the position of a servomotor as a function of time. The execution time $t_0$ is the time taken to complete an action unit and return to the neutral position. This time depends on the emotion that is running. Expressions such as anger and surprise will usually have a lower activation time $t_1$ than expressions of happiness and sadness. Equation 2 expresses the analytical function shown in Figure 9. The activation time $t_1$ is the time taken by the AU to reach maximum intensity and the fall time $t_2$ is that taken by the AU to return to the resting position.
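The intensity and timing scheme can be illustrated with a short sketch. It is not the paper's Equation 1 or Equation 2: the values in `K_BY_LEVEL` are hypothetical and only respect the stated constraints (between 0 and 1, increasing with the level, with C and D covering the widest ranges), and the profile assumes a linear rise during $t_1$ followed by a linear fall during $t_2$, so that $t_0 = t_1 + t_2$.

```python
# Hypothetical intensity factors K_i for the FACS levels A..E (i = 1..5).
# Range widths: A 0.05, B 0.10, C 0.30, D 0.40, E 0.15 (C and D widest).
K_BY_LEVEL = {1: 0.05, 2: 0.15, 3: 0.45, 4: 0.85, 5: 1.0}


def servo_fraction(t: float, level: int, t1: float, t2: float) -> float:
    """Fraction of full servo travel at time t (seconds) for one AU.

    Assumed shape: linear rise from 0 to K_i during the activation time t1,
    then linear fall back to 0 during the fall time t2.
    """
    if t1 <= 0 or t2 <= 0:
        raise ValueError("t1 and t2 must be positive")
    k = K_BY_LEVEL[level]
    if t <= 0 or t >= t1 + t2:
        return 0.0                      # neutral position
    if t <= t1:
        return k * t / t1               # activation phase
    return k * (t1 + t2 - t) / t2       # fall phase


if __name__ == "__main__":
    # Sample a level-D action unit with a fast rise and a slower fall.
    for step in range(7):
        t = step * 0.25
        print(f"t={t:.2f}s fraction={servo_fraction(t, 4, 0.5, 1.0):.3f}")
```

A short $t_1$ with the same $t_2$ would give the abrupt onset the text associates with anger and surprise, while a longer $t_1$ suits happiness and sadness.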

C. Emotion manager module
The emotion manager module is responsible for creating complex expressions: disgust, sadness, anger, happiness, fear and surprise.
The manager indicates the AUs that must be activated to perform each emotional expression (Figure 11). Each emotional expression $j$ is determined by a pair of vectors, $I_j$ and $T_j$, given by Equation 3,
where $Ki_j$ is the desired strength of the current emotion, $i_{AUi}$ is the intensity of each action unit, $Kt_j$ is the desired running time, and $t_{AUi}$ is the activation time period of each action unit.
Figure 11 shows the way the combination of action units allows complex expressions to be performed. For example, the expression of disgust is generated by combining the action units AU4 + AU5 + AU10L, and the expression of anger by combining the action units AU7 + AU10L + AU15L + AU62.
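A minimal sketch of how the emotion manager could assemble the two vectors is given below. The AU combinations for disgust and anger are the ones quoted in the text; everything else is an assumption made for illustration: the function name, the per-AU intensities and times, and in particular the reading of Equation 3 as an element-wise scaling, i.e. $I_j = Ki_j \cdot i_{AUi}$ and $T_j = Kt_j \cdot t_{AUi}$.

```python
from typing import Dict, List, Tuple

# AU combinations quoted in the text for two of the six expressions.
EXPRESSIONS: Dict[str, List[str]] = {
    "disgust": ["AU4", "AU5", "AU10L"],
    "anger": ["AU7", "AU10L", "AU15L", "AU62"],
}


def expression_vectors(
    name: str,
    strength: float,               # Ki_j, desired strength of the emotion
    time_scale: float,             # Kt_j, desired running time factor
    au_intensity: Dict[str, float],
    au_time: Dict[str, float],
) -> Tuple[List[float], List[float]]:
    """Return (I_j, T_j) for one expression under the assumed scaling."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    if time_scale <= 0.0:
        raise ValueError("time_scale must be positive")
    aus = EXPRESSIONS[name]
    intensities = [strength * au_intensity[au] for au in aus]
    times = [time_scale * au_time[au] for au in aus]
    return intensities, times


if __name__ == "__main__":
    # Hypothetical per-AU intensities (0..1) and activation times (seconds).
    intensity = {"AU4": 1.0, "AU5": 0.5, "AU10L": 0.25}
    activation = {"AU4": 0.4, "AU5": 0.4, "AU10L": 0.6}
    i_vec, t_vec = expression_vectors("disgust", 0.5, 2.0, intensity, activation)
    for au, i_val, t_val in zip(EXPRESSIONS["disgust"], i_vec, t_vec):
        print(au, i_val, t_val)
```

Separating the strength and time factors from the per-AU values lets the same expression definition be replayed at different intensities and speeds.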

VI. INTEGRATION IN A ROBOTIC ARCHITECTURE
In general, robotic systems are complex. Providing modularity to the system structure helps to deal with this complexity. In this way, the whole system can be divided into small components with a well-defined level of abstraction.
The mechatronic head has been developed upon this philosophy. We have used ROS (Robot Operating System) to implement modular hardware abstraction, low-level interprocess message handling, and package management. Thus the head can receive messages from or send messages to other modules, either locally or remotely. We have added a joystick for direct control and also a random behavior mode so that the head does not stay static. This flexibility gives important advantages. For example, it allows some components to be replaced by others with the same interface but different implementations (e.g. changing the joystick type). It also allows new features to be added. Specifically, we have added a number of features previously developed by us, such as the visual tracking of objects and faces, facial expression recognition, speech synthesis, and others. All these modules can run locally or remotely. In this way, the system can work autonomously or by using a remote computer (if required). Figure 12 shows the architecture used.

VII. CONCLUSIONS AND FUTURE WORK
The present article describes the construction of a mechatronic head with a realistic appearance, whose structure and movements have been designed upon human anatomy and the Facial Action Coding System.
The proposed mechanical structure, the number of degrees of freedom and some characteristics of the mechanical elements have been analyzed, along with the required mechanical approaches.
The proposed system is able to generate a wide range of expressions of the human face. With this aim, we have considered the set of micromovements or AUs described in the FACS. The corresponding microexpressions are controlled and combined by a control system which encodes each action unit as an arrangement of servo movements (determining the intensity and speed as parameters).
With this scheme, and with reference to the FACS, the robot is able to generate a wide range of different complex expressions through the combination of action units.
The robot is integrated into a more complex social architecture that includes robot emotion, visual tracking, user emotion recognition and a dialogue system [23].

Fig. 6: Front and side views of the constructed head.

Fig. 9: Example of servo activity through time.

TABLE I: Relevant muscles of the human face.

TABLE II: Muscles to be represented, along with their emotional expressions.

TABLE III: Action units added.

TABLE IV: Muscles associated with the different servomotors.

TABLE VI: Action units associated with servos.