Children's responses and opinion on three bots that motivate, educate and play

Rosemarijn Looije, Mark A. Neerincx, Vincent de Lange

Social robots may help children in their daily health-care related activities, such as diabetic children's adherence to their diet and exercise. Based on a domain and literature study, we specified three support roles with corresponding bot behaviors: motivator, educator and buddy. These behaviors, such as showing attentiveness, could be implemented well in a physical character (the iCat robot), somewhat less well in a virtual character, and least well in a text interface. Twenty children aged eight to nine participated in a controlled experiment to evaluate the bots. They valued the support roles positively, in particular the buddy role. Objective and subjective data showed that they highly appreciated both the physical and the virtual character (more than the text interface). Furthermore, the children interacted faster with the characters than with the text interface. These results indicate a clear added value of robots compared to conventional text interfaces.


I. INTRODUCTION
Information and communication technology (ICT) in home, school and health settings has changed dramatically in the last two decades. In education, for example, the situation has changed from a single, hardly used computer per classroom to computer use in every school subject and homework that has to be done on the computer. This use can be extended from school homework to physical exercise, which might help to counter the increasing number of children suffering from obesity and diabetes. ICT can thus aid in doing exercises [1][2][3][4][5][6], giving social support [7,8], and helping with lifestyle change [9][10][11][12]. Research on persuasive technology [13] and affective computing [14] provides (partial) solutions, e.g. for the realization of social behavior, such as social talk and turn-taking [2][3][4][5], and of empathic behavior, such as attentiveness and giving compliments [7,8], [9], [6,10,12]. This research comprises more conventional text-based supporting technologies [6,9,12] and more innovative character-based virtual [1,2] or physical [3][4][5][7,8,10] "robots". The media equation [15] states that technology is appreciated more when it exhibits social behavior and is physically present. Consequently, one would expect physical characters to be appreciated more than virtual characters and text interfaces. This is confirmed by research comparing virtual with physical characters, in which all results favor the physical character [4][16][17][18][19]. Children react to, and interact with, physical characters differently from adults. Tanaka [20] found that children, after 27 lessons, interact with a physical character as if it were a peer instead of a toy. This may be caused by their tendency to strongly anthropomorphize the character. Draper [21] studied physical characters in the education of children and found that a human teacher teaches best, but that a physical character is better than an audio tape of the lesson.
The paragraph above summarizes research on persuasive technology, affective computing, and virtual and physical characters. However, more research is needed to better understand the added value of robots compared to conventional text interfaces. First, a stronger theoretical foundation from psychology, pedagogy, persuasive technology and affective computing is needed to improve the development of a motivating and educating social companion. Second, a stronger empirical foundation is needed, in which the different user interfaces are evaluated in a comparative experiment with children.
In this paper, we address this by comparing a text interface, a virtual character and a physical character that all implement the roles of educator, motivator, and (game) buddy as far as their dialogue and appearance characteristics allow. Our general hypothesis is that a physical character is better at fulfilling these roles than a text interface and a virtual character. We focus on the user experience [22]: how the children respond to, and enjoy, the interaction with the different interfaces.

II. DESIGN OF THREE BOTS FOR YOUNG DIABETICS
We chose the iCat from Philips (Fig. 1), in both physical and virtual form, to implement the behaviors for the roles concerned. This character was previously used in an experiment with older adults [10,23], in which participants evaluated five different interfaces: a text interface, a social and a non-social virtual character, and a social and a non-social physical character. User preference for the different assistants was measured on several factors, such as empathy, trust, and acceptance. The results indicated that socially intelligent characters are rated as more empathetic than a text interface and a non-social character. Moreover, the virtual character was appreciated more than the physical character on both the trustworthiness and the empathy dimensions [10]. Notwithstanding these positive results for the virtual character, half of the users indicated that they preferred the text interface, while the other half preferred a social character. A possible explanation is the anxiety that older adults have towards characters [24].

A. Media equation
People tend to respond to information and communication technology as if it were social [15]; this is called the media equation. The more a device supports this tendency, the more people will like to use it. Furthermore, a physical character has a greater social facilitation effect [4][16][17][18] (i.e., people tend to perform simple tasks better in the presence of others [25]) than a virtual character [16]. Both the tendency to like social devices and the social facilitation effect support the idea that a social physical character is preferred as a personal assistant. Therefore, we distinguish three bots in this study:
• Conventional text interface
• Virtual robot (virtual iCat)
• Physical robot (physical iCat)

B. Design of a prototype for children
The social characters and text interface developed for adults were taken as the starting point for the design of the prototype for children. The existing prototype was adapted for use by children and made more automated, because children require a different approach to both the design and the evaluation of an interface. During the design phase, special attention should be given to the interests and cognitive abilities of children, which differ from those of adults and influence their interaction with the computer [26]. We looked specifically at the cognitive, physical, and affective characteristics of children aged 8-9. Children of this age are linguistically skilled and start performing several tasks independently. For example, children with diabetes start administering insulin and counting carbohydrates themselves.
Regarding cognitive development, interfaces for children should be visually oriented with not too much text and, just as for adults, immediate feedback is needed to keep the interaction natural and non-irritating. Regarding physical development, Chiasson and Gutwin [26] propose that interfaces for children should be tangible, such as the physical iCat, and that interfaces need not be cuddly in order to be engaging. Finally, research in affective computing shows that children like to be in control of the interaction with technology and that they stay engaged and motivated when they are occasionally offered entertaining events [26]. Engagement and motivation can be stimulated by challenging and fun games, e.g. implemented in a (game) buddy [6,27]. The (game) buddy ensures that users keep using the assistant, because it is fun [22].
In the evaluation phase, subjective measures are often used to obtain the user's opinion about the tested interface. The opinion of children is important, because adults do not always understand what children want and why [28]. Surveying young children is not easy: the children should be able to interpret all the questions correctly and make a considered choice between the answers. Another problem for the analysis is that children tend to give extreme ratings to all the products they rate [28].

C. Diabetic children
In previous research, a domain analysis of adults with diabetes was performed. We extended this analysis to children, using diabetes as a case study. A diabetes nurse, a play therapist, a patient who had acquired diabetes at a young age, and a game developer were interviewed. This analysis yielded insights into the differences and similarities between adults and children with diabetes and their use of computer technology. Both adults and children need an educator who teaches them more about diabetes, because the chronically ill have little knowledge about their disease [12] and therefore do not understand why they have to comply with certain advice. Furthermore, there is a need for a buddy that is a companion in coping with the disease. In addition, children needed help with counting carbohydrates and with keeping track of when to take their medication. An important remark was that the use of the device should be fun and challenging to improve engagement and motivation. Eventually, diabetic children could be one of the first "serious" user groups of the envisioned personal assistant. Eating, physical exercise, and their joint effect on energy consumption are important issues for such children and are therefore "core" elements of our study on robot assistance.

III. DESIGN OF THREE ROLES FOR THE BOTS
Based on the knowledge we gathered about diabetic children and their needs, we developed a scenario that includes personal assistance. Based on this scenario, we chose three roles to be implemented in the prototype: educator, motivator, and game buddy. An extra advantage of implementing the motivator and educator roles is that the results can be compared with those of the motivator and educator roles in the experiment with a personal assistant for older adults [10]. That experiment showed that these roles are appreciated when implemented in a social robot. We implemented the roles in the same three bots as in [10]: a chatbot, a virtual robot, and a physical robot. In contrast to the chatbot, the robots can express emotions through their face and voice.

A. Motivator
Both the motivator and the educator are based on motivational interviewing, which uses questions to increase a person's knowledge about his or her behavior and disease (in our case diabetes), thereby increasing the motivation to change. A therapist who applies motivational interviewing successfully should be empathetic [29] and trustworthy [30]. Motivational interviewing has been applied successfully in a text-based personal assistant for the chronically ill, the HealthBuddy® [9,11]. We divide the properties of motivational interviewing into two roles, the motivator and the educator. The motivator role implements the properties that concern how things are said and done, while the educator role focuses on what is said and done. This means that the motivator role looks at ways to make the assistant appear empathetic and trustworthy. To make the assistant appear empathetic, we identified skills with related behaviors that could be implemented. We implemented behaviors for three skills: reflective listening, positive regard, and attentiveness. The virtual and physical iCat can implement behaviors for all three skills, while the text interface can only implement behaviors for positive regard. The implemented reflective listening behaviors are reacting positively or negatively according to the event and asking questions when something is not understood. The implemented positive regard behaviors are giving compliments when something is done correctly and not punishing when something is done wrong. The behaviors for the last skill, attentiveness, are looking at the user, showing an active listening expression, and occasionally nodding. It is very difficult to find behaviors that make an assistant appear trustworthy; trust in an application develops over time, but it can be stimulated. To foster trust, the dialog, mainly its form and content, can be made acceptable to the user.
This can be done, for example, by taking the user's vocabulary into account. Another way to gain trust, proposed by the play therapist, is to make the user comfortable (e.g. by letting the user play a game).
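The mapping from empathy skills to behaviors, together with the restriction that the text interface can only show positive regard, fits in a small lookup table. The following Python sketch is illustrative only. The prototype was written in C#, and the names `Bot`, `SKILL_BEHAVIORS`, `SKILLS_PER_BOT` and `available_behaviors` are hypothetical.

```python
from enum import Enum


class Bot(Enum):
    TEXT = "text interface"
    VIRTUAL_ICAT = "virtual iCat"
    PHYSICAL_ICAT = "physical iCat"


# Empathy skills of the motivator and the behaviors listed for each skill.
SKILL_BEHAVIORS = {
    "reflective listening": [
        "react positively or negatively according to the event",
        "ask a question when something is not understood",
    ],
    "positive regard": [
        "give a compliment when something is done correctly",
        "do not punish when something is done wrong",
    ],
    "attentiveness": [
        "look at the user",
        "show an active listening expression",
        "nod occasionally",
    ],
}

# Skills each bot can express: the text interface has no face, gaze or voice,
# so only positive regard is available to it.
SKILLS_PER_BOT = {
    Bot.TEXT: {"positive regard"},
    Bot.VIRTUAL_ICAT: set(SKILL_BEHAVIORS),
    Bot.PHYSICAL_ICAT: set(SKILL_BEHAVIORS),
}


def available_behaviors(bot: Bot) -> list[str]:
    """Return the motivator behaviors that the given bot can display."""
    return [
        behavior
        for skill, behaviors in SKILL_BEHAVIORS.items()
        if skill in SKILLS_PER_BOT[bot]
        for behavior in behaviors
    ]
```

For example, `available_behaviors(Bot.TEXT)` yields only the two positive-regard behaviors, while both iCats get all seven. Keeping the behaviors as data rather than as code branches makes it easy to compare the bots and to add a skill later.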

B. Educator
Motivational interviewing aims to increase the patient's knowledge through education. We implemented this as a quiz: educational videos on nutrition and/or exercise, each followed by a multiple-choice question about the video, to increase the user's knowledge of the subject. The educator uses behaviors from the motivator to appear empathetic and trustworthy. It listens to what the user says, is happy when the user answers a question correctly, and, when the answer is incorrect, simply explains the reason for the correct answer. The educator behavior was the same for the physical and virtual iCat and for the text interface.
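The educator's feedback rule can be sketched as follows: praise a correct answer, and for a wrong answer only give the reason for the correct one. This is a Python sketch under stated assumptions. `QuizItem`, `educator_feedback`, the reply texts and the example item are hypothetical, because the paper does not give the actual quiz content.

```python
from dataclasses import dataclass


@dataclass
class QuizItem:
    video: str          # educational video shown before the question
    question: str
    choices: list[str]
    correct: int        # index of the correct choice
    reason: str         # explanation of why the correct choice is right


def educator_feedback(item: QuizItem, answer: int) -> tuple[bool, str]:
    """Praise a correct answer; for a wrong answer only give the reason
    for the correct one, without any reprimand (positive regard)."""
    if answer == item.correct:
        return True, "Well done, that is right!"
    right = item.choices[item.correct]
    return False, f"The right answer is '{right}'. {item.reason}"


# Hypothetical example item; the actual quiz content is not given in the paper.
item = QuizItem(
    video="drinks.mp4",
    question="Which of these drinks contains the least sugar?",
    choices=["cola", "water", "lemonade"],
    correct=1,
    reason="Water contains no sugar at all.",
)
```

Because every bot calls the same function, this structure mirrors the statement that the educator behavior was identical across interfaces. Only the way the reply is rendered (text, or speech with facial expression) would differ.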

C. Game Buddy
The game buddy role was chosen because an assistant for children needs a fun activity. Children need to stay engaged, and alongside the serious tasks a personal assistant can offer them, some entertaining functionality is necessary. A first prerequisite for the game buddy was to offer a familiar two-player game that was not too difficult, did not take long, and was fun for a while. In previous research with the game of tic-tac-toe [31], children found it fun to play with the iCat. Therefore, we decided to use tic-tac-toe in our prototype. Furthermore, we based the personality of the game buddy on the personality that was preferred in the research of Verhaegh [31]: moderately expressive. An algorithm adapted the level of the game to the user, making it harder if the user won and easier if the user lost. The outcome of the previous game was stored in a user profile. In this way, we tried to keep the game challenging. The personal assistant in the game buddy role was empathetic towards the user (using the motivator behaviors, which differed between the robots and the text interface; see Section III.A): it gave compliments and was not overly enthusiastic when it won a game. The personal assistant commented on the game with compliments ("nice move"), neutral remarks ("now we are equal"), and congratulatory remarks ("congratulations you won"). The comments took three factors into account: who made the last move, whether the situation was advantageous for the user, and whether the game was in an end state. Besides being complimentary, the assistant was also attentive: it asked the user whether he/she would like to start and which symbol he/she preferred to use, and it looked at the game board when the user's attention was there. Furthermore, the assistant did not cheat and left the user in control.
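The level adaptation and the choice of comment can be sketched as two small functions. This is a Python sketch, not the authors' algorithm, which the paper does not specify. It rests on several assumptions: an integer level between the hypothetical bounds `MIN_LEVEL` and `MAX_LEVEL`, a draw that leaves the level unchanged, and the modest end-of-game remark "good game", which is invented for illustration. How a level maps onto the quality of the robot's moves is left out.

```python
from dataclasses import dataclass
from typing import Optional

MIN_LEVEL, MAX_LEVEL = 1, 5  # hypothetical range of difficulty levels


@dataclass
class Profile:
    """Part of the user profile that remembers the game level."""
    level: int = 3  # hypothetical starting level
    last_outcome: Optional[str] = None


def adapt_level(profile: Profile, outcome: str) -> int:
    """Store the outcome ('user_won', 'user_lost' or 'draw') and adapt the level:
    harder after a user win, easier after a user loss, unchanged after a draw."""
    profile.last_outcome = outcome
    if outcome == "user_won":
        profile.level = min(MAX_LEVEL, profile.level + 1)
    elif outcome == "user_lost":
        profile.level = max(MIN_LEVEL, profile.level - 1)
    return profile.level


def choose_comment(last_mover: str, advantage: str,
                   outcome: Optional[str]) -> Optional[str]:
    """Pick a remark from the three factors: who moved last ('user'/'robot'),
    who has the advantage ('user'/'even'/'robot'), and the end state
    (None while the game is still running)."""
    if outcome == "user_won":
        return "congratulations you won"
    if outcome is not None:
        return "good game"  # hypothetical modest remark: never gloat about a robot win
    if last_mover == "user" and advantage == "user":
        return "nice move"
    if advantage == "even":
        return "now we are equal"
    return None  # no remark in the remaining situations
```

Clamping the level at both ends keeps a long winning or losing streak from pushing the game out of its playable range.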

IV. MULTI AGENT STRUCTURE
We implemented the prototype using distributed agents that complied with the FIPA standards [32]. Each role was implemented in its own agent, so that the structure was modular. This modularity makes it possible to extend or adapt the system without changing the whole system. Furthermore, the use of agents makes the system easy to distribute. Fig. 3 gives an overview of the implemented agents. The agents are implemented in JADE.net [33] using C#, because the communication framework was already implemented in C#. The motivator is implemented in the dialogue agent (the central agent), which decides when to use which text and which expression. The dialogue agent also poses the quiz questions and handles the answers. The tic-tac-toe agent implements the game buddy and decides when to make which move. Finally, the quiz agent implements the educator role by starting the movies. The touch screen agent displays the movies and the tic-tac-toe board and sends the user's tic-tac-toe moves back to the tic-tac-toe agent.
The text, touch-screen, and iCat agents receive information from, and send information to, the environment. The text agent represents the text interface, and the iCat agent represents the iCat. Within the iCat agent, a module handles the text input from the speech recognition, which is performed by the experimenter. The last agent is the personal profile agent, which holds information about the user, such as age, gender, and games lost and won. This information can be used to adapt the dialogue, the game, and the quiz.
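The message flow between the touch screen, tic-tac-toe and dialogue agents can be sketched in plain Python. This is not JADE.net code and does not use its API. It is an in-process stand-in that mimics FIPA-ACL-style messages (sender, receiver, performative, content). The agent names, the message contents and the "lowest free cell" move strategy are hypothetical placeholders, and the quiz, text, iCat and profile agents are omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    receiver: str
    performative: str  # FIPA-ACL style communicative act, e.g. "inform", "request"
    content: dict = field(default_factory=dict)


class Platform:
    """In-process stand-in for an agent platform: delivers messages by agent name."""

    def __init__(self) -> None:
        self.agents: dict = {}
        self.queue: deque = deque()
        self.log: list = []

    def register(self, agent: "Agent") -> None:
        self.agents[agent.name] = agent
        agent.platform = self

    def post(self, msg: Message) -> None:
        self.queue.append(msg)

    def run(self) -> None:
        """Deliver queued messages until none are left."""
        while self.queue:
            msg = self.queue.popleft()
            self.log.append((msg.sender, msg.receiver, msg.performative))
            self.agents[msg.receiver].handle(msg)


class Agent:
    def __init__(self, name: str) -> None:
        self.name = name
        self.platform = None

    def send(self, receiver: str, performative: str, **content) -> None:
        self.platform.post(Message(self.name, receiver, performative, content))

    def handle(self, msg: Message) -> None:
        pass  # default: ignore messages


class TouchScreenAgent(Agent):
    """Shows the board and forwards the user's moves to the game agent."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.shown: list = []

    def user_taps(self, cell: int) -> None:
        self.send("tictactoe", "inform", move=cell)

    def handle(self, msg: Message) -> None:
        if msg.performative == "request" and "show" in msg.content:
            self.shown.append(msg.content["show"])


class TicTacToeAgent(Agent):
    """Game buddy: answers each user move with its own move."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.taken: set = set()

    def handle(self, msg: Message) -> None:
        if msg.performative != "inform" or "move" not in msg.content:
            return
        self.taken.add(msg.content["move"])
        free = [c for c in range(9) if c not in self.taken]
        if not free:
            return
        reply = free[0]  # placeholder strategy: lowest free cell
        self.taken.add(reply)
        self.send("touchscreen", "request", show=reply)
        self.send("dialogue", "inform", user_move=msg.content["move"], robot_move=reply)


class DialogueAgent(Agent):
    """Central agent; here it only records game events it could comment on."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.events: list = []

    def handle(self, msg: Message) -> None:
        self.events.append(msg.content)


platform = Platform()
screen = TouchScreenAgent("touchscreen")
game = TicTacToeAgent("tictactoe")
dialogue = DialogueAgent("dialogue")
for agent in (screen, game, dialogue):
    platform.register(agent)

screen.user_taps(4)  # the child taps the centre cell
platform.run()
```

After `platform.run()`, the touch screen shows the robot's reply in cell 0, and the dialogue agent has received both moves, so it could choose a remark. Because each agent knows the others only by name, one agent can be replaced (for example, a better game strategy) without touching the rest. This is the modularity argument made above.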

A. Wizard of Oz
The participants thought they were using a completely autonomous assistant, but the experimenter (the "wizard") simulated the speech-to-text conversion. The agents, the text interface, and the iCat were implemented such that the whole interaction between participant and personal assistant was autonomous, except for speech recognition, which was simulated by a person in another room (the so-called Wizard of Oz method).
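One way to keep the wizard replaceable is to let the iCat's input module depend only on a recognizer interface, with the wizard as one implementation of it. The Python sketch below illustrates that idea under stated assumptions. The names `SpeechRecognizer`, `WizardRecognizer` and `ICatSpeechInput` are hypothetical, and the text normalization step is illustrative.

```python
from collections import deque
from typing import Protocol


class SpeechRecognizer(Protocol):
    """Anything that can deliver the child's next spoken utterance as text."""

    def next_utterance(self) -> str: ...


class WizardRecognizer:
    """Wizard of Oz stand-in: returns what the hidden experimenter typed."""

    def __init__(self) -> None:
        self._typed: deque = deque()

    def wizard_types(self, text: str) -> None:
        self._typed.append(text)

    def next_utterance(self) -> str:
        return self._typed.popleft() if self._typed else ""


class ICatSpeechInput:
    """Input module of the iCat agent; it only depends on the recognizer interface."""

    def __init__(self, recognizer: SpeechRecognizer) -> None:
        self.recognizer = recognizer

    def read(self) -> str:
        # Normalize the transcript before the dialogue agent interprets it.
        return self.recognizer.next_utterance().strip().lower()


wizard = WizardRecognizer()
icat_input = ICatSpeechInput(wizard)
wizard.wizard_types("  Yes, I want to start ")
```

Here `icat_input.read()` returns `"yes, i want to start"`, and an empty string once nothing is left to read. An automatic recognizer exposing the same `next_utterance` method could later replace the wizard without changes elsewhere. This is also why the wizard's time can reasonably be subtracted in the efficiency measure.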

V. EVALUATION
The three bots (chatbot, virtual robot, and physical robot) were implemented using the predetermined roles and agents and were then evaluated. In this evaluation, we tested whether the participants perceived the bots as empathetic, trustworthy, and fun, among other things. Furthermore, we objectively measured positive and negative utterances and the time spent interacting with the robot. Based on the literature about social actors and on previous research, our hypotheses were: (H1) The robots will be evaluated as more empathetic than the chatbot. (H2) Children will trust the physical robot most and the chatbot least. (H3) The physical robot is most attractive. (H4) The interaction will be faster with the robots.

A. Method
Participants: Twenty-four non-diabetic children took part in the experiment, which lasted about an hour and a quarter, and were rewarded with a book token. The data of twenty children were usable; the remaining data were excluded because they were incomplete or, in one case, because the child had a neurodevelopmental disorder. The twenty children were all third graders (i.e., group 5 of primary school in the Netherlands), aged 8-9 (M age = 8.40, SD = 0.50). Setting: The experiment was conducted in a room that resembled a living room. On a table stood a touchscreen and either the iCat or, instead of the iCat, a keyboard and computer screen (Fig. 2). Experimental design: A within-subject design was used for iCat vs. text interface, and a between-subject design for physical vs. virtual iCat. Thus, all children used both the text interface and an iCat, with the order of use counterbalanced. Furthermore, the children who used the virtual iCat talked and played a game with the physical iCat at the end, to obtain additional information on their preferences for a virtual or a physical robot.

Table 2 (excerpt)
Engagement
  Would you like to use the robot/chatbot again?
  ife2 Would you like to play another game with the robot/chatbot some time?
  ife3 Would you like to play another quiz with the robot/chatbot some time?
  ife4 Would you like to talk some more with the robot/chatbot some time?
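The paper states that the order of iCat and text interface was counterbalanced within subjects and that embodiment (virtual vs. physical iCat) varied between subjects. It does not say how children were allocated. The Python sketch below shows one hypothetical allocation scheme that rotates participants through all four embodiment-by-order cells. The participant IDs 1 to 24 are illustrative.

```python
from itertools import cycle, product

EMBODIMENTS = ("virtual iCat", "physical iCat")  # between-subject factor
ORDERS = ("iCat first", "text first")            # counterbalanced within-subject order


def assign(participant_ids):
    """Rotate participants through every embodiment x order cell.

    Returns, per participant, the order in which the two interfaces are used.
    """
    cells = cycle(product(EMBODIMENTS, ORDERS))
    plan = {}
    for pid in participant_ids:
        embodiment, order = next(cells)
        if order == "iCat first":
            plan[pid] = (embodiment, "text interface")
        else:
            plan[pid] = ("text interface", embodiment)
    return plan


plan = assign(range(1, 25))  # hypothetical IDs for the 24 children who took part
```

With 24 children, each of the four sequences occurs six times, and every child uses the text interface exactly once. Exclusions after the session can still unbalance the cells. That is a limitation of any fixed allocation, not of this particular scheme.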

Measures:
We kept the number of questions to a minimum to keep the duration of the experiment reasonable. Fun: The six questions regarding subjective fun (Table 2) were asked using a smiley-o-meter [28], a five-point Likert scale that uses smileys to represent the answers. We also counted the numbers of positive and negative utterances and subtracted the number of negative utterances from the number of positive utterances as a measure of observed fun. The utterances we counted are enumerated in Table 1.
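The two fun measures reduce to a mean scale score and a difference of counts. This Python sketch assumes the smiley labels commonly used for the Smileyometer, which the paper does not list. It also uses generic "positive"/"negative" utterance codes in place of the categories of Table 1.

```python
from collections import Counter
from statistics import mean

# Five-point smiley-o-meter mapped to scores; the labels are those commonly used
# for the Smileyometer and are an assumption for this study.
SMILEY_SCALE = {"awful": 1, "not very good": 2, "good": 3,
                "really good": 4, "brilliant": 5}


def subjective_fun(answers: list[str]) -> float:
    """Mean smiley-o-meter score over a child's fun questions."""
    return mean(SMILEY_SCALE[a] for a in answers)


def observed_fun(utterance_codes: list[str]) -> int:
    """Observed fun = number of positive minus number of negative utterances."""
    counts = Counter(utterance_codes)
    return counts["positive"] - counts["negative"]
```

For instance, codes `["positive", "negative", "positive", "neutral"]` give an observed fun of 1. Utterances coded as neither positive nor negative do not affect the score.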
Acceptance: Five questions about acceptance were asked (Table 3). They were all posed on a five-point Likert scale, with the scale labels adapted to each question; for example, "Do you understand the robot?" has the scale labels "Never", "Sometimes", "Always". Empathy: Four questions about empathy were asked (Table 3), posed on a five-point Likert scale in the same way as the acceptance questions. Trust: Three questions about trust were asked (Table 3), posed on a five-point Likert scale similar to that of the acceptance and empathy questions.

Table 3
Acceptance
  Do you understand the robot/chatbot?
  ia4 Which interface did you find easiest to use?
  ia5 Which interface did you prefer?
Empathy
  ie1 Do you find the robot friendly?
  ie2 Do you think the robot understands you?
  ie3 Do you think the robot tells the truth?
  ie4 Do you find the robot is curious about you?
Trust
  iv1 Do you think the robot tells the truth?
  iv2 Would you answer honestly to the robot's questions?
  iv3 Do you think the robot would tell your secrets to someone else?
Health intention
  hi1 How many times a day would you like to eat fruit?
  hi2 How many lollipops do you think you should be allowed to eat a day?
Efficiency: Efficiency was calculated from the time of interaction with the interface. Because the virtual and physical iCat conditions required extra time for the simulated "speech recognition", this time had to be subtracted; in the future, speech recognition will be done automatically rather than by hand as in this experiment. We therefore calculated efficiency as the total interaction time minus the wizard time, which was around 6% of the total time. Learning effect: The learning effect is related to the accuracy and completeness of the tasks. It was therefore measured by the number of correctly answered quiz questions. Health intention: Health intention is of interest in relation to the motivational interviewing (lifestyle change) approach we took. Therefore, we asked questions about the attitude towards nutrition before the experiment and after the use of each assistant. The questions (Table 3) were based on the theory of reasoned action [34].
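The efficiency and learning measures come down to simple arithmetic. The Python sketch below uses hypothetical times: 600 s of total interaction, of which 36 s was wizard time. That gives 564 s of interaction time and a wizard share of 6%, matching the proportion reported above. The function names are illustrative.

```python
def efficiency_time(total_s: float, wizard_s: float = 0.0) -> float:
    """Interaction time used as the efficiency measure: total time minus the
    time the wizard spent transcribing speech (zero for the text interface)."""
    if not 0 <= wizard_s <= total_s:
        raise ValueError("wizard time must lie between 0 and the total time")
    return total_s - wizard_s


def wizard_share(total_s: float, wizard_s: float) -> float:
    """Fraction of the total time taken up by the simulated speech recognition."""
    return wizard_s / total_s


def learning_score(answers: list[int], key: list[int]) -> int:
    """Number of correctly answered quiz questions."""
    return sum(a == k for a, k in zip(answers, key))
```

For the text interface no wizard time is subtracted, so `efficiency_time(480)` is simply 480 s.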

Procedure:
Participants were told that they were taking part in an experiment to evaluate personal assistants for children, that they would work with a number of interfaces, and that they would fill in some questionnaires about what they thought of the interfaces. They used the bots one after the other. First, they answered a question about their health intention, and before using each interface they answered a question about expected fun. They were told that the interaction would start when they heard a beep. The interaction followed a structured dialog led by the interface: the bots asked questions and the participants were expected to answer them. The dialog was structured so that all participants would experience more or less the same interaction, which made the results comparable. In each condition, the dialog consisted of three parts or tasks representing the three roles: motivator, educator, and game buddy. First the assistant introduced itself (talking task/motivator), then a video was shown followed by a quiz question (video quiz task/educator), and finally one or two tic-tac-toe games were played (game task/game buddy). After the interaction, the children were asked the five remaining questions on experienced fun and the questions about trust, health intention (two after the first interface and three after the second), perceived empathy, and three of the acceptance questions (ia1-ia3). At the end, the children were asked for which roles or applications they would use the iCat, and questions ia4-ia5.
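The fixed order of a session can be written down as data, which makes it easy to check that every child follows the same script. This Python sketch encodes the procedure described above. The function name `session_plan` and the wording of the steps are hypothetical. It does not model the extra session with the physical iCat for the virtual-iCat group.

```python
# The three tasks, in the fixed order of the structured dialog, with their roles.
TASKS = (
    ("talking", "motivator"),
    ("video quiz", "educator"),
    ("tic-tac-toe", "game buddy"),
)


def session_plan(interfaces: list[str]) -> list[str]:
    """Ordered steps for one child who uses the given interfaces in turn."""
    steps = ["health intention question"]
    for i, interface in enumerate(interfaces):
        steps.append(f"{interface}: expected fun question (ifx1)")
        steps.append(f"{interface}: beep, interaction starts")
        steps += [f"{interface}: {task} task ({role})" for task, role in TASKS]
        n_health = 2 if i == 0 else 3  # health-intention questions after each interface
        steps.append(
            f"{interface}: questions on fun, trust, {n_health} on health intention, "
            "empathy and acceptance (ia1-ia3)"
        )
    steps.append("final questions: roles/applications for the iCat, ia4-ia5")
    return steps


plan = session_plan(["text interface", "physical iCat"])
```

For two interfaces, the plan has 14 steps: one opening question, six steps per interface, and the closing questions.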

Fun:
The question about the fun expectation (ifx1) resulted in a significant difference between the physical iCat (mean = 4.6 out of 5) and the text interface (mean = 4.0 out of 5) (Mann-Whitney U (1,8) = 20.5, Z = 2.06, p < 0.05). In addition, we compared the indicated value of fun per task within and between interfaces (ife2-4). The game with the physical iCat was valued as significantly more fun (mean = 4.7 out of 5) than the quiz with the physical iCat (mean = 3.

Acceptance: Both acceptance questions asked at the end of the experiment, about ease of use (ia4) and preference (ia5), showed significant differences between the interfaces. The iCats were found easier to use than the text interface (Chi-Square (1,19) = 5.0, df = 1, p < 0.03). The physical and virtual robots were found easiest to use by 70% and 80% of the children, respectively. Similar results were found for preference: about 70% favored the iCats and 30% the text interface (Chi-Square (1,19) = 4.1, df = 1, p < 0.05). The majority of the children stated that the iCat was more fun; the reasons they gave are summarized in Table 4. The children who performed their tasks with the virtual iCat were also given the opportunity to use the physical iCat and were asked which of the three interfaces they preferred. The physical iCat appeared to be the most fun to work with: it was favored by 80% of these children, because it was real. Additional comments were that its eyebrows and mouth could move. The remaining three acceptance questions did not yield significant differences. All interfaces were rated high on acceptance, scoring 4.3, 4.5, and 4.4 out of 5 for the text interface, virtual iCat, and physical iCat, respectively. This indicates that all interfaces were well accepted.
Empathy: All three interfaces scored high on the empathy questions, ranging from 4.0 to 4.2 out of 5: 4.2 for the physical iCat, 4.0 for the virtual iCat, and 4.1 for the text interface. All interfaces were thus perceived as empathetic, and there were no significant differences between them.
Trust: The children rated all three interfaces high on trust: 4.1 out of 5 for the physical iCat and the text interface, and 4.3 out of 5 for the virtual iCat. Again, there were no significant differences between the interfaces.
Efficiency: For the efficiency of the interfaces, we looked at the duration of the complete interaction. The interaction times with both the virtual and the physical iCat differed significantly from the interaction time with the text interface (Table 5). A comparison between the physical and the virtual iCat did not reveal any significant difference.
Learning effect: About 85% of the children answered the question correctly when it was posed before the movie containing the relevant information. This shows that the children were already knowledgeable about the topic. On average, the children answered 8.3 out of 10 questions correctly. Thus, no learning effects could be found.

VII. CONCLUSION AND DISCUSSION
The experimental set-up, in which only speech recognition was simulated, worked well, and the physical and virtual robots were highly appreciated. We realized bots that could have meaningful and pleasant dialogues with children in their three roles. The interaction with the robots was significantly faster than with the chatbot, and the physical robot was the most fun to interact with. The game buddy role was important for the children's engagement with the personal assistant. In contrast with the experiment with older adults [10], no significant differences were found for empathy. This can be explained by the high ratings the children gave to all three interfaces (a "ceiling effect"). Overall, the proposed type of support for personal healthcare was well accepted by the children.
This study compared three interfaces with their "natural" dialogue styles: a text-based chatbot and two speech-based robots. One could say that we compared text to speech. We argue that a text interface for the characters would have been unnatural, because their appearance strongly suggests that they can speak and listen. Correspondingly, speech dialogues are uncommon for graphical, direct-manipulation displays (windows).
In the short term, no significant differences in motivation and education were observed between the personal assistants. Therefore, a long-term experiment should be conducted, in which engagement will play a larger role because children will have to keep using the personal assistant for a longer period. Long-term effects of artificial agents in healthcare interventions are discussed in, e.g., Marsella, Lewis Johnson, and LaBore [35] (education about cancer), Bickmore and Picard [36] (motivation to exercise), and Brave, Nass, and Hutchinson [37] (social support). These papers show the relevance of the educator, motivator and buddy roles for user support. Their long-term results suggest that virtual characters that exhibit affect are more enjoyable, more trustworthy, more supportive, and better educators than no virtual character or a virtual character without affective abilities. Furthermore, learning results were better, and the participants were more willing to continue working with the social character. This literature focused only on adults. We would like to explore the long-term effects on children and the effects of a physical character compared with a virtual character. In the healthcare domain, we are looking into children with, e.g., obesity, diabetes, and coeliac disease. These children have to adapt their diet to stay healthy and cannot eat the same as most children (e.g., a diabetic child has to keep track of his/her sugar intake). A buddy that helps them cope with being different could be appreciated. Furthermore, the buddy could help educate them about their condition and motivate them to follow the physician's advice.
In the future, the game buddy role should be extended to offer multiple games. Furthermore, the dialog agent should be able to handle more diverse interactions, and preferably even conversations that the programmer did not anticipate. As expected, the results showed that the quiz was valued as less fun than the game. Fun is very important for keeping children engaged, as we learned from the educational game developer during the domain analysis. In the future, we would like to explore other educational methods that may be more fun to use (this might eventually lead to a game educator).
In general, the children rated the interface properties high, which resulted in few significant differences in the subjective measures. The objective measures also showed a preference for the robots: the interaction with them was faster and exhibited more social behavior. The children were excited about participating in the experiment and using the iCat. These results indicate that the iCat attracts attention and can therefore have positive effects on motivating and educating children while being a buddy, which is important when applying the robot in the healthcare domain. Thus, the motivator and educator roles that we developed are appropriate for both older adults (see [23]) and children, and the iCat is a good platform to implement and test such roles for both user groups.