Intrinsic motivation in virtual assistant interaction for fostering spontaneous interactions

With the growing utility of today's conversational virtual assistants, the importance of user motivation in human-artificial intelligence interactions is becoming increasingly apparent. However, previous studies in this and related fields, such as human-computer interaction, scarcely discussed intrinsic motivation (the motivation to interact with the assistant for fun); they either treated motivation as an inseparable concept or focused on non-intrinsic motivation (the motivation to interact with the assistant for utilitarian purposes). The current study aims to cover intrinsic motivation by taking an affective engineering approach. A novel motivation model is proposed, in which intrinsic motivation is affected by two factors that derive from user interactions with virtual assistants: expectation of capability and uncertainty. We conducted experiments in which these two factors were manipulated while participants believed they were interacting with the smart speaker "Amazon Echo". Intrinsic motivation was measured both by questionnaires and by covertly monitoring a five-minute free-choice period in the experimenter's absence, during which the participants could decide for themselves whether to interact with the virtual assistants. Results of the first experiment showed that high expectation engenders more intrinsically motivated interaction than low expectation does. However, the results did not support our hypothesis that expectation and uncertainty have an interaction effect on intrinsic motivation. We therefore revised our hypothetical model of action selection and conducted a verification experiment on the effects of uncertainty. Results of the verification experiment showed that reducing uncertainty encourages more interactions and shifts the motivation behind these interactions from non-intrinsic to intrinsic.


Introduction
Virtual assistants (also termed 'voice assistants') in today's market serve users through conversation. The competition among Amazon Alexa, Google Assistant, and similar systems involves more than their usefulness: developers want to build likeable virtual characters, not mere autonomous toolboxes. Recent research has shifted to topics such as affective experience [1] and hedonic benefits [2] during interactions. From the perspective of affective engineering, these aspects of virtual assistants are inseparable from user motivation. Unfortunately, user motivation, although generally recognized as an important factor in determining the attractiveness of contemporary and future virtual assistants [3], has been little studied in this context. Here, one must also distinguish between different types of motivation: intrinsic motivation is derived from the expectation of enjoyment through taking an action, whereas extrinsic motivation is oriented toward the consequences of the action [4]. Between extrinsic and intrinsic motivation there is another category, epistemic motivation, derived from the need to gain knowledge or hone skills, with the hope of benefiting from progress in the long run [4]. In the field of human-AI interaction and related fields such as human-computer interaction (HCI) and human-robot interaction (HRI; though only robots that interact with users as conversational agents are of interest here), previous studies have scarcely discussed intrinsic motivation. Those studies either treated motivation as an inseparable concept or focused on non-intrinsic motivation, e.g. the motivation to comply with tasks (extrinsic) [5,6] or to gather information for pragmatic usage (extrinsic or epistemic) [1]. Although some studies addressed concepts related to intrinsic motivation (such as interest and curiosity), the factors affecting intrinsic motivation remained unstudied [7].
Intrinsic motivation should be the next focal issue in virtual assistant interaction (and indeed in HRI generally) because it is crucial to the perceived smartness and likeability of future smart products. In the fields of education and consumer psychology, intrinsic motivation has been found to be related to higher levels of enjoyment, performance, and immersion in an activity [4,8,9].
Unfortunately, virtual assistants are typically treated as autonomous tools by adult users, not as potential friends. A study found that, while children are willing to befriend virtual assistants found in household smart products, adults often are not [10].
The main objective of the present research is to obtain knowledge about intrinsic motivation and intrinsically motivated interactions in virtual assistant interactions. We wish to learn under what circumstances the user would interact with a virtual assistant "just for fun." Eventually, we wish to design the virtual assistant's interaction strategy in order to foster spontaneous interactions, thus improving product engagement.
In everyday life, users interact with virtual assistants of their own free will. In laboratory studies, by contrast, the experimenter often assigns interactive tasks to the user (participant). In such tasking scenarios, the participant is explicitly told to interact, so the interactions are not based on free will. To distinguish these two cases, we introduced the concept of 'spontaneous interaction' into our study and specifically into our experimental design. We defined a 'spontaneous interaction' as any interaction initiated by the user of their own will; if the user is told to act (or is told to achieve a goal and acts accordingly), the action is not spontaneous. The motivation behind spontaneous interactions is best discussed by relating action selection to motivation in the context of generalized user-product interaction. We have constructed a theoretical model of the process by which intrinsic motivation is formed. On the basis of this model, we have put forward and tested hypotheses concerning the relationship between motivation, expectation, and uncertainty.
In the following sections of Chapter 2, we will first introduce how expectation affects the action selection process (Section 2.1). We will then discuss the effects of expectation and uncertainty on intrinsic motivation in Section 2.2. In Sections 2.3 and 2.4, we will propose hypotheses on the effects of these two factors.
To verify the hypotheses, we conducted a two-by-two experiment, which will be introduced in Chapter 3, Method. As will be discussed in Chapters 4 and 5, Results and Discussion, unexpected results were obtained, leading us to partially revise our model and propose another hypothesis on uncertainty. The verification experiment on uncertainty's effects will be introduced in Chapter 6, and its Results and Discussion will be presented in Chapters 7 and 8. In the final chapter, Chapter 9, General Discussion, we will summarize our findings on the mechanism of intrinsic motivation and propose suggestions for designing better virtual assistants.

Expectation based action selection
In a simplified setting, the interaction process consists of four phases: 1) expectation, in which the user estimates the consequences of the intended action; 2) action; 3) observation, in which the user discovers the actual consequences of the action; and 4) learning, in which the user assesses the differences between expectation and observation. Expectation can be updated by learning. When the observation is unexpected, the user tends to re-explore the product and gather information about what caused the unexpected result [11]. Each time the user learns about the product, the expectation is updated, which in turn might change future action selection strategies. Learning is an ongoing process. In fact, in user-product interaction the user can only interpret observations and form strategies accordingly; in principle, they can never acquire all the information available [11]. This is because the interface is not transparent, which prevents the user from knowing what is happening inside the machine. Nevertheless, the user can explore a product until they gather enough knowledge to establish habits that they stick to. For instance, one can use a computer without necessarily knowing how the hardware works, thanks to the graphical user interface.
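To make this four-phase cycle concrete, the following toy sketch simulates it in Python; all names, the learning rate, and the numbers are illustrative choices of ours, not part of any product or of the cited models.

```python
import random

class Expectation:
    """Toy scalar expectation of a product's capability (illustrative)."""
    def __init__(self, mean, lr=0.3):
        self.mean, self.lr = mean, lr

    def update(self, observed):
        # 4) learning: move the expectation toward the observed outcome
        self.mean += self.lr * (observed - self.mean)

def act_and_observe(true_capability, noise=0.1):
    # 2) action and 3) observation: the user only sees a noisy outcome,
    # never the product's internal state (the interface is not transparent)
    return true_capability + random.gauss(0.0, noise)

expectation = Expectation(mean=0.2)              # 1) initial expectation
for _ in range(10):
    outcome = act_and_observe(true_capability=0.8)
    expectation.update(outcome)                  # expectation is revised
print(f"learned expectation: {expectation.mean:.2f}")   # approaches 0.8
```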
Regarding user motivation in virtual assistant interactions, a crucial aspect is the expectation of the assistant's capability. The user might ask themselves: 'How smart is it, after all?' or 'Where is its limit?' Accordingly, when the user has an intention ('I need to book a flight'), an expectation of the interaction result is formed before asking. If the user holds an expectation (of the assistant's capability) lower than what the intention would require, they are unlikely to initiate an interaction for booking ('I'd rather book it from my smartphone app!'). In a recent survey-based study, many of the interviewees mentioned that using a virtual assistant sometimes made it slower to reach the goal [7]. The user's motivation to interact is clearly vulnerable to a capability expectation that is low with respect to the difficulty of the intended task.
While the aforementioned situations involve only extrinsic motivation (pragmatic intentions), we argue that intrinsic motivation can be damaged equally. When the user's intention is to seek enjoyment or reduce boredom, which by definition is intrinsically motivated, such motivation can likewise be damaged if the expectation of entertainment capability is low; in such a situation, the user simply 'won't bother' to interact for fun.
Unlike video games, which serve solely entertainment purposes, a virtual assistant carries both pragmatic and entertainment functions. Capability expectations of a virtual assistant are likely formed throughout daily usage, where expectations of pragmatic and entertainment capabilities are not independent of each other. In other words, both intrinsic and non-intrinsic motivation are affected by the capability expectation. This statement is based on a mapping between motivation type and interaction type. Interactions that are enjoyment-oriented are intrinsically motivated (by the definition of intrinsic motivation [4]). Interactions whose goal is to complete tasks or raise productivity are extrinsically motivated (by the definition of extrinsic motivation [4]), hence non-intrinsically motivated. Interactions whose goal is to gather information or learn are driven by epistemic motivation and are therefore also non-intrinsically motivated [12]. Now that we know motivation and action selection are affected by expectation, the next question is: what factors affect the forming of expectations and the learning process?

Expectation and uncertainty in virtual assistant interactions
In a previous study addressing expectation-based sensory perception (of physical properties such as weight), a computational model was proposed and validated, explaining how a human's posterior perception is formed from prior expectation and information gained through surprising observations. The bias between prior expectation and posterior perception (termed the 'expectation effect') is a function of uncertainty, prediction error, and noise [13]. According to this model, the forming of posterior perception follows a Bayesian estimator [13,14,15].
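As a minimal illustration of such an estimator, the sketch below implements a textbook conjugate Gaussian update. It is consistent in spirit with the cited model [13] but is not that paper's exact formulation; the function and variable names are ours.

```python
def posterior_capability(prior_mean, prior_sd, observed, noise_sd):
    """Conjugate Gaussian update of perceived capability (illustrative).

    The prior-to-posterior shift (the 'expectation effect') grows with
    prior uncertainty (prior_sd), shrinks with observation noise
    (noise_sd), and is proportional to the prediction error.
    """
    gain = prior_sd**2 / (prior_sd**2 + noise_sd**2)       # Bayesian weight
    post_mean = prior_mean + gain * (observed - prior_mean)
    post_var = (prior_sd**2 * noise_sd**2) / (prior_sd**2 + noise_sd**2)
    return post_mean, post_var**0.5

# An observation below the expected capability shifts the estimate left:
print(posterior_capability(prior_mean=5.0, prior_sd=1.0,
                           observed=3.0, noise_sd=1.0))
# -> (4.0, 0.707...): 'the assistant is not as capable as I thought'
```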
Although this model addressed sensory perception, the knowledge is transferable to virtual assistant interaction. Here we borrowed Yanagisawa's model [13] to understand expectation and perception of the virtual assistant's capability. In Fig 1, the horizontal axis represents the assistant's capability. The observation mean indicates that the assistant is capable of tasks easier than the difficulty represented by the mean and incapable of tasks beyond that difficulty (to its right).
Similarly, the prior expectation mean indicates the expected capability before observation. The capability expectation is learned through previous interactions. The user compares the assistant's expected capability against the expected 'task difficulty', i.e. the minimal capability required to cope with the user's intention. If the assistant fails 'rather easy' tasks, which are expected to be within its capability, the observation will be to the left of the expectation mean, and the posterior perception of capability will shift to the left ('The assistant is not as capable as I thought.') Conversely, if the user observes that the assistant copes with difficult tasks, which are expected to be beyond its capability, the posterior perception of capability will shift to the right ('The assistant is more capable than I thought.') Coping with tasks that are expected to be within the capability, or failing tasks that are expected to be beyond it, does not affect the mean of the posterior.
In the long run, posterior perceptions update the user's experience, thus affecting the prior expectation in the future. In virtual assistant interaction, inconsistent performance by the virtual assistant gives rise to uncertainty. For instance, consider the following scenario: a virtual assistant complies with a task; just moments later, it fails to comply with an apparently similar task. Observing such inconsistent performance results in a flattened posterior distribution compared with when uncertainty is small. In other words, the user becomes uncertain about the virtual assistant's capability.
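A toy calculation with made-up observation histories illustrates the flattening: inconsistent outcomes inflate the spread of the user's capability estimate, leaving a flatter prior for future interactions. The numbers below are illustrative only.

```python
import statistics

# Two hypothetical observation histories of apparent capability on
# similar tasks (numbers are made up): consistent vs inconsistent.
consistent   = [0.60, 0.62, 0.58, 0.61, 0.59]
inconsistent = [0.90, 0.20, 0.85, 0.15, 0.80]  # copes, fails, copes, ...

for label, history in (("consistent", consistent),
                       ("inconsistent", inconsistent)):
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    print(f"{label:12s} mean={mean:.2f} spread={spread:.2f}")
# The inconsistent history yields a far larger spread (~0.37 vs ~0.02):
# the user's next prior over capability is flatter, i.e. more uncertain.
```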
So far, we can conclude that the virtual assistant's capability is perceived in consideration of 'task difficulty' and that uncertainty weakens the confidence that the user places in their expectation. What, then, do expectation and uncertainty have to do with intrinsic motivation? In the following paragraphs, we will discuss and propose hypotheses on their effects on intrinsic motivation.

Hypothesis on expectation
From here on, we will use 'high' versus 'low' to describe capability expectation: high expectation means the assistant is expected by the user to be able to cope with difficult tasks or satisfy complex intentions.
Here, high versus low, as well as difficult versus easy, are discussed in relative terms.
Considering what the virtual assistants of the present time are capable of, a real-world example of an 'easy' task would be 'Set a 3-minute timer.' Conversely, a 'difficult' (if not impossible) task would be 'Wake me up at 4 a.m. if Team A still has a chance to advance', when the chance is actually dependent on another game that ends at 3:30 a.m.
As noted in Section 2.1, the user's motivation to interact with a virtual assistant is limited by the expected capability. The higher the expectation, the more rational it is for the user to believe that their intention will be satisfied through interaction. The lower the expectation, the more likely the user will 'not bother' to interact. We therefore hypothesized that high expectation has positive effects on both intrinsic and extrinsic motivation to further interact with the assistant.

Hypothesis on uncertainty
As we discussed in Section 2.2, uncertainty can be perceived through observing inconsistent performances by the assistant. From here on, we will use 'small' versus 'large' to describe uncertainty. The uncertainty is small (if not non-existent) when the assistant's performance is always consistent in response to the same sort of task. Conversely, uncertainty is perceived when the assistant copes with rather difficult tasks and later fails at easier tasks of the same genre. As with expectation, large versus small uncertainty is discussed in relative terms. Another way to interpret uncertainty is to ask: 'How capable is the assistant, after all?' The more difficult it is to judge the capability, the flatter the prior distribution (as a result of larger uncertainty). In our experiment, we introduced large uncertainty by deliberately making the virtual assistant perform in an inconsistent manner.
We hypothesized that uncertainty's effect on motivation depends on the degree of expectation. If the user holds high expectation of an assistant, the intrinsic motivation to further interact will be greater if small uncertainty has been perceived during previous interactions. If the expectation is low, we hypothesized that uncertainty works in the opposite way. Fig 2 illustrates the reasoning behind the hypothesis. Assume that the 'task difficulty' of an intention is indicated by the vertical dashed line. When the expectation is low, the prior mean lies to the left of the dashed line. In this situation, large uncertainty (the flatter distribution) portends a higher probability that the assistant can satisfy the intention than small uncertainty does. Conversely, when the expectation is high, large uncertainty portends a higher probability that the assistant is unable to satisfy the intention.

Fig 2a. Hypothesis on uncertainty.
A supposedly difficult task 'intersects' with the large uncertainty prior distribution at higher probability than when uncertainty is small. When the expectation is low, it is more rational to expect a large-uncertainty assistant to perform better than it usually does.

Fig 2b. Hypothesis on uncertainty.
A supposedly easy task 'intersects' with the large uncertainty prior distribution at higher probability than when uncertainty is small. When the expectation is high, it is more rational to expect that a large-uncertainty assistant might fail to cope with easy tasks.
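The crossover behind Figs 2a and 2b can be reproduced with a short calculation: the probability that capability exceeds task difficulty under a normal prior. The sketch below uses illustrative numbers on an arbitrary 0-10 capability scale of our own choosing.

```python
from math import erf, sqrt

def p_cope(difficulty, mean, sd):
    """P(capability >= difficulty) under a normal prior N(mean, sd**2)."""
    z = (difficulty - mean) / (sd * sqrt(2))
    return 0.5 * (1.0 - erf(z))

# Low expectation (mean 3) facing a difficult task (7): the flatter prior
# gives a better chance that the assistant copes (Fig 2a).
print(p_cope(7, 3, 1.0))       # small uncertainty: ~0.00003
print(p_cope(7, 3, 3.0))       # large uncertainty: ~0.09

# High expectation (mean 8) facing an easy task (2): the flatter prior
# gives a higher chance that the assistant fails (Fig 2b).
print(1 - p_cope(2, 8, 1.0))   # small uncertainty: ~1e-9
print(1 - p_cope(2, 8, 3.0))   # large uncertainty: ~0.02
```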
To sum up, we hypothesized a positive effect of expectation on motivation, and an interaction effect of expectation and uncertainty on motivation. To verify these hypotheses, we designed a two-by-two experiment, in which expectation was manipulated within-group while uncertainty was manipulated between-group. In the next chapter, we will explain how we created small versus large uncertainty and high versus low expectation in the experimental settings. We will also explain why expectation is better manipulated within-group than between-group (at the end of Section 3.2, Manipulation of variables).

Experiment design
To verify the hypotheses, we conducted an experiment using Echo Dot and Echo Plus, the smart speakers by Amazon (Fig 3). In the experiment, the virtual assistants had different wake words (names): 'Echo' and 'Alexa'. Participants were told that they were two inherently different assistants; no additional information, such as capability details, was provided throughout the experiment (see Section 3.4 Implementation and S4 Appendix). The experiment involved two task sections, followed by a 'free-choice period', during which the participant could interact with the virtual assistants or decide not to. While the participants believed they were interacting with the virtual assistants during the task sections, it was actually the experimenter covertly simulating autonomous responses using the Wizard of Oz method [16]. In the ensuing free-choice period, however, the Wizard of Oz manipulation was not used, so the participant interacted with the real Amazon Echo assistants (which still had different wake words, 'Alexa' and 'Echo'). The purpose of the task sections was to shape the participant's prior expectation and uncertainty regarding the two assistants, which formed the basis of their action selections once they moved on to the free-choice period. The purpose of the free-choice period was to covertly observe any spontaneous interaction initiated by the participant, which served as the observed behavior measurement of intrinsic motivation [17].
Since we had two variables, uncertainty and expectation, we used a two-by-two design. Participants were randomly assigned to either the small-uncertainty group or the large-uncertainty group. For each participant, there were two task sections. During each section, the participant was asked to interact with a different assistant and carry out the interactive tasks designated in a different task list.
The two task lists differed in overall task difficulty. As we will elaborate in Section 3.2, we designed the tasks and their responses such that, after the two task sections, the participant should conclude that one assistant is more capable than the other. In other words, by the time the participant got the chance to interact freely (in the free-choice period), we had 'sculpted' a high-expectation assistant and a low-expectation assistant. Finally, comparing the two groups: participants in the large-uncertainty group should place less confidence in their expectations (whether low or high) than those in the small-uncertainty group. In the following paragraphs, we will explain how uncertainty and expectation were manipulated.

Manipulation of variables
The Wizard of Oz method allowed us to manipulate the response correctness by the virtual assistants, regardless of Amazon Echo's real capability. Responses during the task sections were audio files generated using the Alexa Developer Console prior to the data collection phase. During the task sections, we streamed the audio files via Bluetooth from the smart speaker.
Task lists and response manipulation details are shown in Tables S1 and S2. The tasks involved 'coin flip', 'dice roll', 'get random number', 'judge whether a number is odd or even', 'smart home light bulb maneuver', 'converting units', 'taking notes', and combinations of them. These tasks were selected because: 1) they are common functions of real virtual assistants; 2) their responses are time-invariant (unlike weather inquiries); and 3) their responses are relatively short.
In order to introduce uncertainty, we made the virtual assistants in the large-uncertainty group respond in an inconsistent manner. For instance, we made the assistant cope successfully with the task 'Get a random number between 1 and 6'. Then, for the next task, 'Get two random numbers between 1 and 6', we made the assistant fail (see tasks 4 and 5 in Table S1A and their response manipulation in Table S2A). We assume these two tasks to be of equal difficulty; we therefore consider this an 'inconsistent performance'.
In order to differentiate high versus low capability expectation, we made the high-expectation assistant capable of accomplishing a list of difficult tasks, while the low-expectation assistant was made capable of easy tasks but began to struggle as task difficulty escalated. Difficulty was added gradually: the easy task list (for the low-expectation assistant) mainly consisted of single tasks, whereas the difficult task list consisted of successive tasks (do A and then do B), storing variables, calculations using variables, etc. Indeed, as a recent survey-based study on 'voice assistant' usage revealed, today's voice assistants still cannot 'remember context', nor can they 'answer several questions at once' [7]. The capability 'ceiling' of the low-expectation assistant is at 'successive tasks'. For instance, the low-expectation assistant was made able to 'Flip a coin' and 'Roll a dice' separately but could not cope with 'Flip a coin and then roll a dice'. The high-expectation assistant, however, was made able to cope with 'Roll a dice and flip that many coins', which even involves calculations using variables. In the experiment, the wake words of the low- and high-expectation assistants were 'Echo' and 'Alexa', respectively.
In sum, in this experiment, uncertainty was manipulated between-group, while expectation was manipulated within-group. Participants from the large-uncertainty group experienced the 'large uncertainty-low expectation' and 'large uncertainty-high expectation' conditions. Participants from the small-uncertainty group experienced the 'small uncertainty-low expectation' and 'small uncertainty-high expectation' conditions.
The reason that expectation was manipulated within-group has to do with unwanted 'learning' effects. If we had used a between-group design for expectation, the two task lists would have needed equally difficult tasks for each group. Meanwhile, the two assistants would still have needed to show different responses to those virtually identical tasks in order to manipulate uncertainty. In that case, participants would expect the same interaction outcomes before the second task section (since they could read the task lists before interacting; see Section 3.4 and S4 Appendix), only to be surprised by a different 'can-do' and 'cannot-do' profile from the second assistant. This would eventually leave participants with different capability expectations of the two assistants after experiencing both task sections; we therefore concluded that expectation must be manipulated within-group.

Measurement
Intrinsic motivation was assessed by self-reported and observed behavior measurements. For self-reported measurement, we used questionnaires taken partially from the Intrinsic Motivation Inventory [18] to appraise five aspects of user attitudes toward the virtual assistant: intrinsic motivation of the user, smartness of the assistant, comprehensibility (the extent to which the virtual assistant's thought process is understandable), trust, and human-likeness (see Table S3). Additional questions were borrowed or modified from previous studies on motivation and technology acceptance [19,20]. For observed behavior measurement, we applied the free-choice paradigm, in which participants are left alone in the experiment room and made to believe that they are no longer under observation. During this free-choice period, participants could initiate interactions at their own discretion or do nothing at all [17,21]. The number of interactions within the free-choice period (which lasted five minutes) served as an indicator of intrinsic motivation at the behavioral level.

Implementation
The experiment was conducted indoors. The smart speakers Echo Dot and Echo Plus were placed on a table, next to a Philips Hue light bulb mounted in a socket (Fig 3, lower left corner). The experimental setting is shown from above in Fig 4.

Results
We used two-way ANOVA to analyze the effects of expectation and uncertainty on self-reported intrinsic motivation and the other sub-scales (see Table S3). The questionnaire scores can be found in the supplementary spreadsheet.

Fig 11. Accumulated numbers of interactions by uncertainty and virtual assistant in charge.
Note that the participants could freely choose to interact with either assistant or to avoid any interaction.

The observed interactions can be classified according to the type of motivation. We classified each interaction as either intrinsically motivated or non-intrinsically motivated. Recent survey-based studies on the motivation for using 'voice assistants' found that users use them for fun or to avoid boredom; such activities include 'asking for a joke, taking a quiz, reading a poem out loud, and playing social games with friends', etc. [1,2,7]. These activities were classified as 'motivated by hedonic benefits' in that literature (as opposed to 'utilitarian benefits' [1]). Although different terminologies were used, it is clear that the above-mentioned activities are driven by the need for enjoyment or pleasure and should therefore be classified as intrinsically motivated. The intrinsically motivated interactions we observed during the free-choice period included pastime and play activities: game requests, music requests, chatting (e.g. asking the assistant's favorite color), and trivia questions.
Non-intrinsically motivated interactions encompass epistemically and extrinsically motivated interactions. Extrinsically motivated interactions are goal-oriented or efficiency-oriented, pragmatic, utilitarian uses of the assistants [1,7]. Such usages are 'useful and convenient' for the user [2].
Epistemically motivated interactions were not explicitly discussed in the above-mentioned literature.
However, Weber et al. [7] classified interactions that serve to 'test the conversational agents' understanding of language and to push it to its limits' as 'trolling', which falls into neither entertainment nor utilitarian uses. Friston et al. [12] related extrinsic and epistemic motivation to 'exploitation and exploration', respectively. Exploitation means executing a pragmatic action that fulfills goals directly ('I want the virtual assistant to do this for me'); exploration discloses information that enables pragmatic action in the long run ('What can I do with the assistant?').
In our experiment, we observed epistemically motivated interactions such as test and trial activities, which included attempting unaccomplished tasks from the task sections and exploring the assistants' capability using consecutive challenges (e.g. "My name is…" followed by "Do you know who I am?").
We argue that these activities are driven by the need to understand the virtual assistants' capability and limitations.
We also observed extrinsically motivated interactions, for instance, searching for restaurants ('Find a sushi place near me'). We argue that participants initiated such interactions to fulfill pragmatic goals.
So far, we have used specific, content-based criteria to classify our observations. For the sake of reproducibility, in case a replication observes activities beyond our classification, we propose the following generalized criteria. An interaction is intrinsically motivated if: 1) the outcome of the interaction does not convey new knowledge or information regarding the assistant's capability or limitations; and 2) the interaction does not increase productivity or efficiency.
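Encoded as a decision rule, the generalized criteria read as follows; this is a sketch of ours, and the boolean inputs stand for a human coder's judgment of an observed interaction.

```python
def classify_motivation(adds_capability_knowledge: bool,
                        raises_productivity: bool) -> str:
    """Generalized classification criteria (illustrative sketch).

    An interaction is intrinsically motivated only if it neither conveys
    new knowledge about the assistant's capability or limitations
    (criterion 1) nor increases productivity or efficiency (criterion 2).
    """
    if adds_capability_knowledge:
        return "non-intrinsic (epistemic)"
    if raises_productivity:
        return "non-intrinsic (extrinsic)"
    return "intrinsic"

print(classify_motivation(False, False))  # e.g. asking for a joke
print(classify_motivation(True, False))   # e.g. capability probing
print(classify_motivation(False, True))   # e.g. restaurant search
```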
In light of the above classification criteria, under high-expectation conditions, participants in the small-uncertainty group engaged mainly in intrinsically motivated interactions, whereas those in the large-uncertainty group leaned toward non-intrinsically motivated ones. We made box plots of the numbers of interactions by motivation type and uncertainty group under both expectation conditions (Figs 14 and 15). In Fig 14, a reversed proportion of motivation types can be observed. The median numbers of intrinsically motivated interactions for the large- and small-uncertainty groups were two and four, respectively. On the other hand, the median number of non-intrinsically motivated interactions for the large-uncertainty group was two, while for the small-uncertainty group it was one.

We used a paired sample t-test to compare the numbers of interactions by motivation type. For the small-uncertainty group, t(4)=2.132, p=0.002, indicating more engagement caused by intrinsic motivation than by non-intrinsic motivation. For the large-uncertainty group, only three participants engaged in interaction at all, which left the test underpowered and non-significant: t(2)=2.919, p=0.211. Nevertheless, those three participants initiated more interactions out of non-intrinsic than intrinsic motivation. We were unable to make the same comparison under low-expectation conditions, because only three participants engaged in interaction, of whom only one was from the large-uncertainty group.
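For replication, the paired comparison can be computed as in the following sketch, here using SciPy and hypothetical per-participant counts rather than our actual data.

```python
from scipy import stats

# Hypothetical per-participant interaction counts (NOT the study data),
# paired by participant: intrinsic vs non-intrinsic motivation.
intrinsic     = [4, 5, 3, 4, 6]
non_intrinsic = [1, 2, 0, 1, 1]

t, p = stats.ttest_rel(intrinsic, non_intrinsic)
print(f"t({len(intrinsic) - 1})={t:.3f}, p={p:.3f}")
```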

Discussion
The self-reported results, taken together, did not support our hypothesis of an interaction effect. However, we made unexpected findings in the observed behavior measurement (see Fig 11). During the free-choice period, participants from both uncertainty groups preferred to interact with the high-expectation virtual assistant (41 interactions) over the low-expectation one (14 interactions). Moreover, participants in the small-uncertainty group had more interactions (33) than participants in the large-uncertainty group (22).
In interpreting Figs 14 and 15, it must be borne in mind that all the participants were given the same amount of time during the free-choice period. The two types of motivation must have been competing against each other. We inferred that, after experiencing large uncertainty, participants tended to act out of non-intrinsic motivation. Conversely, if the uncertainty was small, participants tended to act out of intrinsic motivation.
The reversed proportion of motivation types seen in Fig 14 was the result of different action-selection strategies.
According to active inference theory [12], when a precise goal is extant, extrinsic and epistemic values dominate action selection situationally. (Friston et al. used the term 'intrinsic' as a synonym of 'epistemic'.) While extrinsic value is an action's pragmatic potential to reach the goal, epistemic value is defined as an action's potential to enable pragmatic actions in the long run. According to the theory, uncertainty 'about the state of the world' affects the action selection strategy. When uncertainty is extant, epistemic value dominates action selection; in other words, humans always prioritize reducing uncertainty through exploration. Conversely, when there is no uncertainty, one tends to act out of extrinsic motivation to fulfill precise goals. In the latter situation, action policies no longer differ in epistemic value, since uncertainty cannot be further reduced; therefore, extrinsic value dominates action selection. On the other hand, when there is no precise goal, the authors argued, epistemic value can dominate action selection. Fig 16 illustrates this revised model of action selection.
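The switch between exploration and exploitation can be caricatured with a simple additive score; this is our illustration of the idea, not Friston et al.'s actual expected-free-energy formulation, and all values are made up.

```python
def action_value(extrinsic, uncertainty_reduction, goal_precision):
    """Caricature of action scoring: extrinsic value counts insofar as a
    precise goal exists; epistemic value counts insofar as uncertainty
    remains to be reduced."""
    return goal_precision * extrinsic + uncertainty_reduction

# Uncertainty remains: the exploratory action scores higher,
# even though a precise goal exists.
print(action_value(extrinsic=1.0, uncertainty_reduction=0.0,
                   goal_precision=1.0))   # exploit -> 1.0
print(action_value(extrinsic=0.2, uncertainty_reduction=1.5,
                   goal_precision=1.0))   # explore -> 1.7

# No uncertainty left: epistemic values are all zero, exploitation wins.
print(action_value(1.0, 0.0, 1.0))        # exploit -> 1.0
print(action_value(0.2, 0.0, 1.0))        # explore -> 0.2
```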

Verification experiment of uncertainty's effects
In order to verify the revised model, we created two levels of uncertainty, while encouraging relatively high expectation under both conditions. Instead of faking inconsistent performance, we proposed a method to reduce uncertainty from the baseline level.

Manipulation of uncertainty
Consider a scenario where the virtual assistant appears unable to cope and produces an ambiguous response such as: 'Sorry, I don't know.' Since the response does not disclose the causes of the failed task ('What went wrong?'), it can be difficult for the user to adjust their strategies and utterances in future interactions. In some cases, the virtual assistant is able to recognize the user's utterances, but the task is beyond its capability. An example is 'doing multiple tasks at once': Alexa never hints at this limitation (it cannot do successive tasks); the user has to figure it out through trial and error. We argue that uncertainty about the (expected) capability can be reduced by disclosing the causes of failed tasks, that is, by issuing informative responses instead of ambiguous ones.
In this experiment, the only variable was uncertainty, with two levels, small and large; we therefore manipulated uncertainty within-group. All participants were asked to interact with the small-uncertainty assistant in one of the task sections and with the large-uncertainty assistant in the other (order counterbalanced). The wake words were 'Alexa' and 'Echo' for the small- and large-uncertainty assistants, respectively. The experimental procedures were the same as in the first experiment.
The small- and large-uncertainty task lists are documented in Table S6. In the large-uncertainty task list, the virtual assistant issued only ambiguous responses when 'unable' to carry out a task. Conversely, in the small-uncertainty task list, informative responses were issued. For instance, when asked to 'calculate 1000 US dollars minus 70,000 Japanese Yen', we prompted an informative response: 'Sorry, I can't calculate between different currencies.' Since the responses by the two assistants do not differ in terms of the 'can-do' and 'cannot-do' profile, we argue that the expectations induced by the two task sections should be identical.

Implementation
The experimental setup and procedures, including instructions, were identical to those in the first experiment (see 3.4 Implementation and S4 Appendix). The questionnaires used can be found in Table S7. To counterbalance order effects, half the participants started in the large-uncertainty condition and half in the small. The experimental procedures are detailed in Fig S8. Ten students from the University of Tokyo (4 female, 6 male, ages ranging from 21 to 27; experiment conducted in July 2019) participated in the experiment. One set of data had to be discarded because the devices lost their internet connection during the task sections. All participants were new to virtual assistants, and none of them had participated in Experiment 1. Participants were rewarded with 1000 Japanese Yen for participating. They were informed of the monetary reward when recruited and received the reward after the experiment. The study protocol was approved by the Ethics Committee of the Graduate School of Engineering, the University of Tokyo. All participants provided written informed consent prior to their participation in this study.

Results
We examined the content of interactions during the five-minute free-choice period. The complete protocols can be found in Table S9. Eight of the ten participants engaged in spontaneous interactions during the free-choice period. Only two participants made conversation with the large-uncertainty virtual assistant; the other participants only interacted with the small-uncertainty virtual assistant. As in Experiment 1 (and as we had predicted), the conversation contents could be sorted into categories.
Testing and making trials were regarded as evidence of non-intrinsic motivation; playing and pastime activities were regarded as evidence of intrinsic motivation. We excluded conversations that were interrupted or received no response. The numbers of interactions of both kinds are presented in Fig 18.

Fig 18. Accumulated number of interactions by motivation type.
Intrinsically motivated interaction occurred 45 times, while non-intrinsically motivated interaction occurred 11 times. The large-uncertainty virtual assistant did not draw the attention of six of the nine interacting participants at all.
We used a paired sample t-test to compare the numbers of interactions targeting the large- and small-uncertainty virtual assistants. The results showed that participants interacted with the small-uncertainty virtual assistant more than with the large-uncertainty one. The difference was statistically significant: t(8)=3.1724, p=0.007.
Next, we used a one-tailed paired t-test to compare the numbers of intrinsically versus non-intrinsically motivated interactions targeting the small-uncertainty virtual assistant. The results showed that participants had more intrinsically motivated interactions than non-intrinsically motivated ones. The difference was statistically significant: t(8)=3.5228, p=0.004. The results are shown in the box plot in Fig 19. For completeness, we note that the mean number of interactions with the small-uncertainty virtual assistant was 5.6667, with a standard deviation of 4.8218.

Fig 19. Average number of interactions by uncertainty and motivation type.
On the other hand, when the interaction partner was the large-uncertainty virtual assistant, we could not conclude which type of motivation caused more interaction: the p-value was 0.2971. However, the total number of interactions was considerably smaller (mean=0.5556, standard deviation=1.0138).
We used a paired sample t-test to compare self-reported intrinsic motivation, smartness, and the other sub-scales between the two uncertainty conditions (see Table S7).

Discussion
In this case, our hypothesized model (Fig 17) was validated by the experimental findings at the behavioral level. We successfully fostered intrinsically motivated interaction by issuing informative responses that explained the reasons for failure. As shown in Fig 18, intrinsically motivated interactions far outnumbered non-intrinsically motivated ones. We can conclude that, under high-expectation conditions, reducing uncertainty is an effective way to foster intrinsically motivated interaction. When the user expects the virtual assistant to cope but it fails, uncertainty immediately rises because of the unexpectedness of such behavior. However, an informative response soon afterwards serves to decrease this uncertainty, since it removes the user's uneasiness about the hidden causes of the unexpected event. For a short period, and as long as the user's attention is still directed at the interaction, there is no more uncertainty to reduce. As we expected, such absence of uncertainty is important in determining whether intrinsic or epistemic value dominates action selection, and thus whether interactions are spontaneous.

Mechanism of intrinsically motivated interaction
We constructed a theory to discuss user motivation and spontaneous interactions in the context of conversing with virtual assistants. Our theory withstood experimental examination: we proved that when the user has high expectations of capability, intrinsically motivated interaction can be fostered by reducing uncertainty (more specifically, by making the virtual assistant issue an informative response whenever it is unable to satisfy the user's intention).
The mechanism behind our findings is best explained by focusing attention on three interlocked aspects: motivation, expectation, and action selection. Motivation depends on the expectation of capability: it will result in spontaneous interaction only when the user anticipates that the virtual assistant will successfully cope with the stated intention.
We emphasize that user motivation should be divided into two categories: intrinsic and non-intrinsic.
Intrinsic motivation is derived from the expected enjoyment or pleasure of the action itself. Non-intrinsic motivation is derived from the need to achieve an external goal and should be further divided into epistemic and extrinsic motivation.
In general, if the user is exploring whether the assistant is capable of doing something, the action should be considered epistemically motivated. If the user is certain about the capability and the interaction promises pragmatic value in fulfilling the user's separable goals, it should be seen as extrinsically motivated. Borrowing terminology from active inference theory, the former case counts as exploration, which enables future pragmatic actions; the latter counts as exploitation, which fulfills the goals directly [12].
In the context of virtual assistant interaction, intrinsically motivated interactions involve activities such as pastime and play. Epistemically motivated interactions involve testing and making trials ('What can I do with the assistant?'). Extrinsically motivated interactions involve pragmatic usages, to gain information or to have some 'chores' done by the assistants instead of the users themselves, thus saving time [1]. For clarity, using the virtual assistant as a learning tool counts as extrinsic usage, since the separable goal is to acquire knowledge.
Spontaneous interactions can be intrinsic as well as non-intrinsic. To ask when actions are intrinsically motivated is to ask how they are selected. Action selection, according to reversal theory [22], depends on the user's state of mind. The human mind can operate in two meta-motivational modes: the goal-oriented telic mode and the activity-oriented paratelic mode. Intrinsically motivated interaction is most likely to occur in the paratelic mode. The mind shifts between these two modes situationally; such shifts are termed 'reversals'. Our work shows that an informative response can facilitate a reversal from the telic to the paratelic mode.
Issuing an informative response is merely one of many methods for fostering intrinsically motivated interaction; it works by reducing uncertainty, so that the user is placed in a comfortable condition (the paratelic mode), free from pressure. But one might still ask why reducing uncertainty is the key to intrinsic motivation.
Imagine being trapped in a jungle and facing dire survival concerns. Clearly, actions are not likely to be taken for enjoyment of the activity itself. One must explore the environment, in other words, reduce uncertainty, before deciding which way to go and where to find food. Interestingly, the human mind can actually derive pleasure from resolving uncertainty; designers and artists have long used this trick to add novelty to their work. In the theory of aesthetic valence, it is pointed out that optimum novelty requires that a work initially be perceived as a chunk of incomprehensible information (which introduces uncertainty) [23]. Artists and designers weave hints and cues that facilitate sense-making into the novel work, so that it does not take observers too long to arrive at an understanding of it. The pleasure perceived when the work finally makes sense is termed the 'second reward' [23], as opposed to the first reward from simply receiving the information. The informative response strategy works the same way.
An informative response would most likely start with 'Sorry, I can't / I don't know…'. The user is 'triggered' by the 'Sorry', knowing that their intention is not to be satisfied. From there, the informative part explains 'what went wrong' or 'where the limit was', helping the user to make sense of the failure. It not only offers consolation but also provides the user with hints for future interaction. For instance, after hearing that the assistant cannot perform multiple tasks at once, the user might adjust their interaction strategy to break down a complex intention into single tasks. Indeed, recent studies on explainable AI have stressed the importance of clarifying the assistant's limitations so that users understand what interaction choices they have [7,24,25]. Studies on the relationship between physiological arousal and hedonic states have also stated that reducing uncertainty "makes the world enjoyable" [26,27]. Certainly, this does not mean that, from the hedonic point of view, an informative failure response would be better than a coping response. As we observed in the experiments, high performance by the virtual assistant leads to high expectation of capability, resulting in more intrinsically motivated interactions. Nevertheless, if we wish to apply this knowledge to foster intrinsically motivated interaction, it is effective to create a second reward when the user's intention cannot be satisfied, by informing the user of the assistant's limitations after the inevitable 'Sorry'.

Application to virtual assistant design
Our experimental findings are transferable to practical virtual assistant design. Modern virtual assistants recognize users' intentions by filling slots. Slots are variables embedded in applications to serve the user's requests. They are 'filled' by picking up key information from user utterances. For instance, a flight booking application typically expects to fill slots such as 'destination', 'date', etc. For the Amazon Echo series, developers can define intents and slots in order to capture meaningful information from user utterances. For each intent, and under each slot, there can be multiple corresponding utterances, which are spoken phrases with a high likelihood of being used to convey the intent [28].
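A schematic intent definition might look as follows; the intent name, slots, and sample utterances are hypothetical and only loosely follow the shape of a real Alexa skill's interaction model [28], rather than reproducing any deployed skill.

```python
# Schematic interaction model (hypothetical names; loosely modeled on
# the structure of an Alexa skill's intent schema [28]).
interaction_model = {
    "intents": [
        {
            "name": "BookFlightIntent",
            "slots": [
                {"name": "destination", "type": "AMAZON.City"},
                {"name": "date", "type": "AMAZON.DATE"},
            ],
            # Sample utterances; {braces} mark where slot values appear.
            "samples": [
                "book a flight to {destination}",
                "book a flight to {destination} on {date}",
            ],
        }
    ]
}
```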
If the assistant picks up an utterance that is understandable but not implementable, it should issue an informative response. To achieve this, developers should collect utterances that are recognized, but with which the assistant cannot yet cope. Next, they need to categorize the reasons for this failure and try to obtain clusters of such reasons. Each cluster should then be addressed by a set of (at least one) informative responses explaining to the user that the virtual assistant cannot cope with the intention yet.
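The resulting mapping can be as simple as a lookup from failure cluster to informative response, as in the following sketch; the cluster labels and response wordings are illustrative assumptions of ours.

```python
# Sketch of the cluster-to-response mapping described above
# (cluster labels and wordings are illustrative).
INFORMATIVE_RESPONSES = {
    "multiple_tasks":   "Sorry, I'm not able to do two things at once.",
    "cross_currency":   "Sorry, I can't calculate between different currencies.",
    "long_term_memory": "Sorry, I can't remember things across conversations.",
}
AMBIGUOUS_FALLBACK = "Sorry, I don't know."

def respond_to_failure(failure_cluster: str) -> str:
    # Prefer an informative response that discloses the cause of failure;
    # fall back to the ambiguous response only for unclustered failures.
    return INFORMATIVE_RESPONSES.get(failure_cluster, AMBIGUOUS_FALLBACK)

print(respond_to_failure("multiple_tasks"))
```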
Addressing ambiguous responses cluster by cluster is not an exhaustive method, but it does not take an exhaustive method to reduce uncertainty. For instance, telling users "I'm not able to do two things at once" actually forestalls a very large set of problems that would be impossible to list exhaustively.

Limitations
A major limitation of this study is that the manipulation of expectation and uncertainty was implemented only in the task sections, in which all activities were extrinsically motivated. In addition to our current findings, we wish to investigate the results of such manipulation when it is performed during intrinsically motivated or intrinsically motivating activities. In real-life interaction scenarios, the learning process is experienced through tasking as well as through playing. Moreover, one study found that, in the context of gaming, the uncertainty of success brings suspense, which in turn leads to enjoyment [29].
Indeed, with the current findings we cannot predict how uncertainty during intrinsically motivated interaction will further affect ensuing action selection, because the user's acceptance of uncertainty is likely different in the paratelic mode than in the telic mode.