Designing Agents as if People Mattered
Thomas Erickson
User Experience Architects' Office, Apple Computer
(now at) snowfall@acm.org
One of Apple Computer's buildings used to have an advanced energy management
system. Among its many features was the ability to make sure lights were
not left on when no one was around. It did this by automatically turning
the lights off after a certain interval, during times when people weren't
expected to be around. I overheard the following dialog between a father
and his six-year-old daughter, one Saturday evening at Apple. The energy
management system had just noticed that the lights were on during 'off hours,'
and so it turned them off.
- Daughter: Who turned out the lights?
- Father: The computer turned off the lights.
- Daughter: (pause) Did you turn off the lights?
- Father: No, I told you, the computer turned off the lights.
- (someone else manually turns the lights back on)
- Daughter: Make the computer turn off the lights again!
- Father: (with irony in his voice) It will in a few minutes.
I like this vignette. It illustrates a number of the themes we're going
to be exploring in this chapter. It is evident that the child is struggling
to understand what is going on. She clearly had a model of how the world
worked: people initiate actions; computers don't. But the world didn't behave
as expected. Even after double checking to make sure Dad really didn't turn
off the lights, she still assumed that ultimately he was in control: surely
he could make the computer turn off the lights again. In this, too, she
was mistaken. One wonders how the little girl revised her model of the world
to account for the apparently capricious, uncontrollable, but semi-predictable
behavior of the computer. The computer as weather? The computer as demigod?
Just like the little girl, we all strive to make sense of our world. We
move through life with sets of beliefs and expectations about how things
work. We try to understand what is happening. We make up stories about how
things work. We try to change things. We make predictions about what will
happen next. The degree to which we succeed in doing these things is the
degree to which we feel comfortable and in control of our world.
As the opening vignette illustrates, we have no guarantees that technology
will behave in accordance with our expectations and wishes. We may suddenly
find ourselves in the dark, wondering what on earth happened. The goal of
this chapter is to explore ways of preventing this. The central theme is
that we need to focus not just on inventing new technologies, not just on
making them smarter, but on designing technologies so that they fit gracefully
into our lives.
Agents are a case in point. As this volume illustrates, a lot of work is
being directed at the development of agents. Researchers are exploring ways
to make agents smarter, to allow them to learn by observing us, to make
them appear more lifelike. However, relatively little work is being focused
on how people might actually experience agents, and on how agents might
be designed so that we feel comfortable with them.
What Does "Agent" Mean?
To begin, let's take a look at the concept of agent. "Agent"
is the locus of considerable confusion. Much of this confusion is due to
the fact that agent has two different meanings that are often conflated.
One way in which the word is used is to designate an autonomous or semi-autonomous
computer program. An agent is a program that is, to some degree, capable
of initiating actions, forming its own goals, constructing plans of action,
communicating with other agents, and responding appropriately to events--all
without being directly controlled by a human. This sense of agent implies
the existence of particular functional capacities often referred to as intelligence,
adaptivity, or responsiveness. To discuss agents in this sense of the term,
I will use the phrase adaptive functionality.
The second meaning of agent is connected with what is portrayed to the user.
Here, agent is used to describe programs which appear to have
the characteristics of an animate being, often a human. This is what I will
call the agent metaphor. The agent metaphor suggests a particular
model of what the program is, how it relates to the user, and its capabilities
and functions. Examples of the agent metaphor include the bow-tied human
figure depicted in Apple's Knowledge Navigator video (Apple 1987), the digital
butler envisioned by Negroponte (this volume), and the Personal Digital
Parrot described by Ball and his colleagues (this volume).
Now, of course, these two meanings of agent often go together. A common
scenario is that of a program which intercepts incoming communications and
schedules meetings based on a set of rules derived from its understanding
of its user's schedule, tasks, and responsibilities. Such a program might
be portrayed using the metaphor of an electronic secretary, and would of
course require adaptive functionality to learn and appropriately apply the
rules. But it is important to recognize that the metaphor and functionality
can be decoupled. The adaptive functionality that allows the 'agent' to
perform its task need not be portrayed as a talking head or animated character:
it could, for example, be presented as a smart, publicly accessible calendar.
Thus, someone wanting to schedule a meeting could log on to it and directly
schedule a meeting in an available slot. The rules would still be present,
but rather than being portrayed through an agent which handled the scheduling,
they would be reflected in which (if any) calendar slots were made available
to the person seeking the meeting.
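To make the decoupling concrete, consider the following minimal sketch (in Python; the slot names and rules are hypothetical, invented purely for illustration, not drawn from any actual system). The same rule set drives both portrayals: an 'electronic secretary' that answers scheduling requests in dialogue, and a smart calendar that simply filters which slots it displays.

    # One hypothetical rule set, two portrayals of the same adaptive functionality.
    BUSY = {"Mon 9:00", "Mon 10:00", "Tue 14:00"}   # owner's existing commitments
    PROTECTED = {"Fri 13:00"}                       # slots the rules never offer
    ALL_SLOTS = ["Mon 9:00", "Mon 10:00", "Tue 14:00", "Wed 11:00", "Fri 13:00"]

    def available(slot):
        """The 'adaptive functionality': rules inferred from the owner's schedule."""
        return slot not in BUSY and slot not in PROTECTED

    def secretary_reply(requested_slot):
        """Portrayal 1: an electronic secretary that answers in dialogue."""
        if available(requested_slot):
            return f"Yes, {requested_slot} works; I have pencilled it in."
        alternative = next(s for s in ALL_SLOTS if available(s))
        return f"Sorry, that time is taken. How about {alternative}?"

    def calendar_view():
        """Portrayal 2: a smart, publicly accessible calendar showing open slots."""
        return [s for s in ALL_SLOTS if available(s)]

    print(secretary_reply("Tue 14:00"))   # agent-style portrayal
    print(calendar_view())                # calendar portrayal of the same rules

Either way the rules are identical; only their portrayal to the user differs.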
It is important to distinguish between these two meanings of agent because
each gives rise to different problems. Adaptive functionality raises a number
of design issues--as we saw in the opening vignette--that are independent
of how it is portrayed to users. Programs that take initiative, attempt
to act intelligently (sometimes failing), and change their behavior over
time fall outside our range of experience with computer programs. Likewise,
the agent metaphor has its own set of problems that are distinct from those
caused by adaptive functionality. Portraying a program as a human or animal
raises a variety of expectations that designers have not had to deal with
in the past.
In this chapter we will explore the difficulties surrounding adaptive functionality,
and the agent metaphor, respectively. In the first case, I describe the
three basic problems that computer researchers and developers will have
to address, regardless of whether or not they use an agent metaphor. In
the second case, I discuss how people react to the agent metaphor, and consider
the implications of these reactions for designing agents. Finally, we look
beyond the surface of the agent metaphor and note that it suggests a very
different conceptual model for human computer interfaces. This, in turn,
has implications for when and how the agent metaphor should be used. Throughout
the chapter, the ultimate concern is with how to design agents that interact
gracefully with people. What good are agents? When should functionality--adaptive
or not--be portrayed through the agent metaphor? What benefits does depicting
something as an agent bring, and what sort of drawbacks? While there are
no absolute answers, an understanding of some of the tradeoffs, as well
as issues that require further research, can only aid us as we move into
the future.
Adaptive Functionality: Three Design Issues
Whether our future is filled with agents or not, there is no question
that there will be lots of adaptive functionality. Consider just a few of
the things brewing in university and industry laboratories:
- After observing its user performing the same set of actions over and
over again, a computer system offers to produce a system-generated program
to complete the task (Cypher 1991).
- An adaptive phone book keeps track of which numbers are retrieved;
it then uses that information to increase the accessibility of frequently
retrieved numbers (Greenberg and Witten 1985).
- A "learning personal assistant" fits new appointments into
the busy calendar of its user, according to rules inferred by observing
previous scheduling behavior (Mitchell, et al. 1994).
- A multi-user database notices that over time certain seemingly unrelated
bibliographic records--call them X and Y--are frequently retrieved in the
same search session. It uses that information to increase the probability
that Y is retrieved whenever X is specified, and vice versa (Belew 1989).
- A full text database allows its users to type in questions in plain
English. It interprets the input, and returns a list of results ordered
in terms of their relevance. Users can select an item, and tell it to 'find
more like that one' (Dow Jones & Co. 1989).
- A variety of recognition systems transform handwriting, speech, gestures,
drawings, or other forms of human communication from fuzzy, analog representations
into structured, digital representations.
In general, systems with adaptive functionality are doing three things:
- noticing: trying to detect potentially relevant events
- interpreting: trying to recognize the events (generally, this means
mapping the external event into an element in the system's 'vocabulary')
by applying a set of recognition rules
- responding: acting on the interpreted events by using a set of action
rules, either by taking some action that affects the user, or by altering
their own rules (i.e. learning)
Thus, a speech recognition system tries to notice sounds that
may correspond to words, tries to interpret each sound by matching
it to a word in its vocabulary (using rules about phonetics and what the
user is likely to be saying at the moment), and then responds
by doing an action that corresponds to the word it recognized, reporting
an error if it couldn't interpret the word, or adjusting its recognition
rules if it is being trained.
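The division of labor can be shown schematically. The following toy example (Python; the vocabulary and rules are invented, and real recognizers are of course statistical rather than exact-match) merely illustrates how noticing, interpreting, and responding fit together:

    # Schematic notice/interpret/respond loop for a toy 'speech' recognizer
    # that works on typed strings rather than sound.
    VOCABULARY = {"open": "open_document", "close": "close_document"}

    def notice(signal):
        """Noticing: is the incoming event worth attending to at all?"""
        return signal is not None and signal.strip() != ""

    def interpret(signal):
        """Interpreting: map the event onto the system's vocabulary (or fail)."""
        return VOCABULARY.get(signal.strip().lower())   # recognition rule: exact match

    def respond(action, signal, training=False):
        """Responding: act, report an error, or alter the rules (i.e. learn)."""
        if action is not None:
            return f"performing {action}"
        if training:
            VOCABULARY[signal.strip().lower()] = f"user_defined_{signal.strip().lower()}"
            return "added a new recognition rule"
        return "error: could not interpret input"

    def handle(signal, training=False):
        if not notice(signal):
            return "ignored"
        return respond(interpret(signal), signal, training)

    print(handle("Open"))                  # -> performing open_document
    print(handle("print"))                 # -> error: could not interpret input
    print(handle("print", training=True))  # -> added a new recognition rule

Each of the three steps can fail independently, which is why the failures discussed below need to be considered separately.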
Such adaptive functionality holds great promise for making computer systems
more responsive, personal, and proactive. However, while such functionality
is necessary for enhancing our systems, it is not sufficient. Adaptive functionality
does no good if it is not, or cannot be, used; it may do harm if it confuses
its users, interferes with their work practices, or has unanticipated effects.
Notice that there are many chances for adaptive functionality to fail. The
system may fail to notice a relevant event (or may mistakenly notice an
irrelevant event). It may misinterpret an event that has been noticed. Or
it may respond incorrectly to an event that it has correctly noticed and
interpreted (that is, the system does everything right, but the rules that
it has for responding to the event don't match what the user expects). These
failures are important to consider because they have a big impact on the
user's experience. Let's take a closer look at some of the design issues
which are raised by adaptive functionality.
Understanding: What Happened and Why?
Consider an intelligent tutoring system that is teaching introductory
physics to a teenager. Suppose the system notices that the student learns
best when information is presented as diagrams, and adapts its presentation
appropriately. But even as the system is watching for events, interpreting
them, and adjusting its actions, so is the student watching the system,
and trying to interpret what the system is doing. Suppose that after a while
the student notices that the presentation consists of diagrams rather than
equations: it is likely that the student will wonder why: 'Does the system
think I'm stupid? If I start to do better will it present me with equations
again?' There is no guarantee that the student's interpretations will correspond
with the system's. How can such potentially negative misunderstandings on
the user's part be minimized?
Control: How can I Change It?
If the system makes an error--either because it has failed in noticing or
interpreting, or because its actions are not in line with the user's
wishes or expectations--what should the design response be? In most circumstances
the user ought to be given a way to take control of the system and to undo
what the adaptive functionality has wrought. But how is this to be done?
The problem is not simply one of providing an undo capability. That works
well for today's graphic user interfaces where users initiate all actions
and the "undo" command can be invoked when a mistake is made.
However, with adaptive functionality, the difficulty is that the user did
not initiate the action. This leads to several problems.
First, since the user didn't initiate the change, it may not be clear how
to undo it. Thus, the student who wants the teaching system to continue
presenting equations will have no idea what to do, or even where to look,
to make the system return to its earlier behavior. This is complicated by
the fact that it may take the user a while to notice that the system has
changed in an undesirable way, by which time clues about what actually happened
may have vanished.
Second, there may be a mismatch between the user's description of what has
happened, and the system's description of its action. What the user notices
may only be a side effect of the system's action. Users may need assistance
in discovering what the relevant action was in the first place, and it is
an open question whether the system will be able to provide it. If the
tutoring system shifted to content which just happened to consist
of diagrams, a student searching for a way to modify the style
of presentation may be baffled. If the energy management system describes
its action as shutting off a particular power subsystem, a user searching
for a way to control the lights on the fourth floor may have difficulty.
All of this presupposes that the users understand that the system can be
controlled in the first place. What kind of model of the system is necessary
to make this clear? The model would need not only to indicate which aspects
of the system can be controlled, but also to provide an obvious representation
and set of methods for exercising that control.
Prediction: Will it Do What I Expect?
Prediction goes hand in hand with understanding and trying to control
what is happening. Let's take a close look at an actual example of adaptive
functionality, found in a program called DowQuest (Dow Jones and Co. 1989).
DowQuest is a commercially available system with a basic, command line interface,
but very sophisticated functionality. It provides access to the full text
of the last 6 to 12 months of over 350 news sources, and permits users to
retrieve information via relevance feedback (Stanfill and Kahle 1986).
Rather than using a sophisticated query language, DowQuest allows users
to type in a sentence (e.g. 'Tell me about the eruption of the Alaskan volcano'),
get a list of articles, and then say--in essence--'find more like that one.'
Figures 1 and 2 show two phases of the process of constructing a query.
In Figure 1 the user has entered a question and pressed return. DowQuest
does not try to interpret the meaning of the question; in the example shown,
the system will drop out the words "tell," "me," "about,"
"the," and "of," and use the lower frequency words to
search the database. Next the system returns the titles of the 16 most 'relevant'
articles, where relevance is defined by a sophisticated statistical algorithm
based on a variety of features over which the user has no control (and often
no knowledge). While this list frequently contains articles relevant to
the user's question, it also usually contains items which appear--to the
user--to be irrelevant. At this point, the user has the option of reading
the articles retrieved or continuing to the second phase of the query process.
- tell me about the eruption of the alaskan volcano
- DOWQUEST STARTER LIST HEADLINE PAGE 1 OF 4
- 1 OCS: BILL SEEKS TO IMPOSE BROAD LIMITS ON INTERIOR . . .
- INSIDE ENERGY, 11/27/98 (935 words)
- 2 Alaska Volcano Spews Ash, Causes Tremors
- DOW JONES NEWS SERVICE, 01/09/90 (241)
- 3 Air Transport: Volcanic Ash Cloud Shuts Down All Four . . .
- AVIATION WEEK and SPACE TECHNOLOGY, 01/01/90 (742)
- 4 Volcanic Explosions Stall Air Traffic in Anchorage
- WASHINGTON POST: A SECTION, 01/04/90 (679)
- * * * * *
Figure 1. The first phase of DowQuest interaction: the
user types in a 'natural language' query and the system searches the database
using the non-'noise' words in the query and returns a list of titles of
the 'most relevant' articles.
In phase 2 of the process (figure 2) the user tells the system which articles
are good examples of what is wanted. The user may specify an entire article
or may open an article and specify particular paragraphs within it. The
system takes the full text of the selections, drops out the high frequency
noise words, and uses a limited number of the most informative words for
use in the new query. It then returns a new list of the 16 'most relevant'
items. This second, relevance feedback phase may be repeated as many times
as desired.
- search 2 4 3
- DOWQUEST SECOND SEARCH HEADLINE PAGE 1 OF 4
- 1 Air Transport: Volcanic Ash Cloud Shuts Down All Four . . .
- AVIATION WEEK and SPACE TECHNOLOGY, 01/01/90 (742 words)
- 2 Alaska Volcano Spews Ash, Causes Tremors
- DOW JONES NEWS SERVICE, 01/09/90 (241)
- 3 Volcanic Explosions Stall Air Traffic in Anchorage
- WASHINGTON POST: A SECTION, 01/04/90 (679)
- 4 Alaska's Redoubt Volcano Gushes Ash, Possibly Lava
- DOW JONES NEWS SERVICE, 01/03/90 (364)
- * * * * *
Figure 2. The second phase of DowQuest interaction: the
user instructs the database to find more articles like 2, 3 and 4, and the
system returns a new set of relevant articles. (Note that the first three,
'most relevant' articles are those that were fed back--an article is most
'like' itself; the fourth article is new.)
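The mechanics behind these two phases can be suggested with a small sketch (Python). This is emphatically not Dow Jones's actual algorithm, which was a far more sophisticated statistical method; the sketch only illustrates the general idea of dropping 'noise' words and ranking articles by how many informative terms they share with the query or with the fed-back examples.

    from collections import Counter

    # A toy stand-in for DowQuest-style retrieval: drop high-frequency 'noise'
    # words, then rank articles by overlap with the remaining informative terms.
    NOISE = {"tell", "me", "about", "the", "of", "a", "in", "and", "to", "on"}

    def informative_terms(text, limit=10):
        words = [w.lower().strip(".,!?") for w in text.split()]
        kept = [w for w in words if w and w not in NOISE]
        return {w for w, _ in Counter(kept).most_common(limit)}

    def rank(query_terms, articles):
        """Order articles by shared terms ('relevance'); drop zero-score items."""
        scored = sorted(((len(query_terms & informative_terms(body)), title)
                         for title, body in articles), reverse=True)
        return [title for score, title in scored if score > 0]

    articles = [
        ("Alaska Volcano Spews Ash", "the alaskan volcano spews ash and causes tremors"),
        ("Interior Dept. Budget",    "bill seeks to impose broad limits on interior")]

    # Phase 1: the query itself supplies the informative terms.
    query = informative_terms("tell me about the eruption of the alaskan volcano")
    print(rank(query, articles))

    # Phase 2 (relevance feedback): feeding an article back widens the term set.
    # It also shows why an example article always returns as 'most relevant':
    # nothing overlaps with an article's terms better than the article itself.
    query |= informative_terms(articles[0][1])
    print(rank(query, articles))
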
New users generally had high expectations of DowQuest: it seemed quite intelligent.
However, their understanding of what the system was doing was quite different
from what the system was actually doing. The system appeared to understand
plain English; but in reality it made no effort to understand the question
that was typed in--it just used a statistical algorithm. Similarly, the
system appeared to be able to 'find more items like this one;' but again,
it had no understanding of what an item was like--it just used statistics.
These differences were important because they led to expectations that could
not be met.
Users' expectations were usually dashed when, in response to the first phase
of the first query, DowQuest returned a list of articles containing many
obviously irrelevant items. When this happened some users concluded that
the system was 'no good,' and never tried it again. While reactions like
this may seem hasty and extreme, they are not uncharacteristic of busy people
who do not love technology for its own sake. Furthermore, such a reaction
is perfectly appropriate in the case of a conventional application: a spreadsheet
that adds incorrectly should be rejected. Users who had expected DowQuest
to be intelligent could plainly see that it was not. They did not see it
as a semi-intelligent system that they had control over, and that would
do better as they worked with it. This was quite ironic, as the second stage
of the process, relevance feedback, was the most powerful and helpful aspect
of the system.
Only a few users gave up after the first phase. However, efforts to understand
what was going on, and to predict what would happen continued to influence
their behavior. In the second phase of a DowQuest query, when users requested
the system to retrieve more articles 'like that one,' the resulting list
of articles was ordered by 'relevance.' While no computer scientist would
be surprised to find that an article is most relevant to itself, some ordinary
users lacked this insight: when they looked at the new list of articles
and discovered that the first, most relevant article was the one they had
used as an example, they assumed that there was nothing else relevant available
and did not inspect the rest of the list. Obviously, a system with any intelligence
at all would not show them articles that they had already seen if it had
anything new.
DowQuest is a very compelling system. It holds out the promise of freeing
users from having to grapple with arcane query languages. But, as is usually
the case with adaptive functionality, it doesn't work perfectly. Here we've
seen how users have tried to understand how the system works (it's smart!),
and how their expectations have shaped their use of the system.
How can designers address these problems? One approach is to provide users
with a more accurate model of what is going on. Malone, Grant, and Lai (this
volume) advocate this sort of approach with their dictum of 'glass boxes,
not black boxes,' suggesting that an agent's rules be made visible to and modifiable
by users. This is certainly a valid approach, but it is not likely to always
work. After all, the statistical algorithm which computes the 'relevance'
of stories is sufficiently complex that describing it would probably be
futile, if not counterproductive, and allowing users to tinker with its
parameters would probably lead to disaster. In the case of DowQuest, perhaps
the aim should not be to give users an accurate picture of what is going
on. One approach might be to encourage users to accept results that seem
to be of low quality, so that they will use the system long enough to benefit
from its sophistication. Another approach might be to construct a 'fictional'
model of what the system is doing, something that will set up the right
expectations, but without exposing them to the full complexity of the system's
behavior. See Erickson and Salomon (1991) and Erickson (1996) for a discussion
of other issues in this task domain, and a glimpse of one type of design solution.
Understanding how to portray a system which exhibits partially intelligent
behavior is a general problem. Few will dispute that, for the foreseeable
future, intelligent systems will fall short of the breadth and flexibility
which characterize human-level intelligence. But how can the semi-intelligence
of computer systems be portrayed? People have little if any experience with
systems which are extremely (or even just somewhat) intelligent in one narrow
domain, and utterly stupid in another, so appropriate metaphors or analogies
are not easy to find. Excellent performance in one domain or instance is
likely to lead to expectations of similar performance everywhere. How can
these expectations be controlled?
The Agent Metaphor: Reactions and Expectations
In this section we turn to the agent metaphor and the expectations
it raises. Why should adaptive functionality be portrayed as an agent? What
is gained by having a character appear on the screen, whether it be a bow-tied
human visage, an animated animal character, or just a provocatively named
dialog box? Is it somehow easier or more natural to have a back and forth
dialog with an agent than to fill in a form that elicits the same information?
Most discussions that advance the cause of agents focus on the adaptive
functionality that they promise--however, as we've already argued, adaptive
functionality need not be embodied in the agent metaphor. So let's turn
to the question: what good are agents as ways of portraying
functionality? When designers decide to invoke the agent metaphor, what
benefits and costs does it bring with it?
First it must be acknowledged that in spite of the popularity of the agent
metaphor there is remarkably little research on how people react to agents.
The vast bulk of work has been focused either on the development of adaptive
functionality itself, or on issues having to do with making agents appear
more lifelike: how to animate them, how to make them better conversants,
and so on. In this section, we'll look at three strands of research that
shed some light on the experience of interacting with agents.
Guides
The Guides project involved the design of an interface to a CD
ROM based encyclopedia (Salomon, Oren, and Kreitman 1989; Oren, et al. 1990).
The intent of the design was to encourage students to explore the contents
of the encyclopedia. The designers wanted to create a halfway point between
directed searching and random browsing by providing a set of travel guides,
each of which was biased towards a particular type of information.
The interface used stereotypic characters such as a settler woman, an Indian,
and an inventor (the CD-ROM subset of the encyclopedia covered early American
history). The guides were represented by icons that depicted the guide's
role--no attempt was made to reify the guide, either by giving it a realistic
looking picture or by providing information such as a name or personal history.
As users browsed through stories in the encyclopedia, each guide would create
a list of articles that were related to the article being looked at and
were in line with its interests. When clicked on, the guide would display
its 'suggestions.' Thus, if the user were reading an article about the gold
rush, the Indian guide might suggest articles about treaty violations, whereas
the inventor guide might suggest an article about machines for extracting
gold.
The system was implemented and was then tested on high school students.
The students had a variety of reactions. They tended to assume that the
guides, though presented as stock characters, embodied particular individuals.
For example, since many of the articles in the encyclopedia were biographies,
users would assume that the first biography suggested by a guide was its
own. If the inventor guide first suggested an article on Samuel Morse, users
often assumed that Morse was now their guide. Students also wondered if
they were seeing the article from the guide's point of view (they weren't).
And they sometimes assumed that guides had specific reasons for suggesting
each story and wanted to know what they were (in line with users' general
wish to understand what adaptive functionality is actually doing).
In some cases the students also became emotionally engaged with the guides.
Oren, et al. (1990) report some interesting examples of this: "the
preacher guide brought one student to the Illinois history article and she
could not figure out why. The student actually got angry and did not want
to continue with the guide. She felt the guide had betrayed her." While
anecdotes of users getting angry with their machines are common, stories
about users getting angry with one interface component are much less so.
In another case, a bug in the software caused the guide to disappear. Oren,
et al., write: "One student interpreted this as 'the guide got mad,
he disappeared.' He wanted to know 'if I go back and take his next choice,
will he come back and stay with me?'" Here the tables are turned. The
user infers that the guide is angry. While no controlled experiment is available,
it is hard to believe that the user would have made such an inference if
the suggested articles had been presented in a floating window that had vanished.
While this evidence is anecdotal, it is nevertheless interesting and relevant.
Here we again see users engaged in the effort to understand, control, and
predict the consequences of adaptive functionality. What is particularly
interesting is how these efforts are shaped by the agent metaphor. The students
are trying to understand the guides by particularizing them, and thinking
about their points of view. One student wants to control his guide (the
one that 'got mad and disappeared') by being more agreeable, suspecting
that the guide will come back if his recommendations are followed. All of
this happens in spite of the rudimentary level of the guides' portrayals.
Computers as Social Actors
Nass and his colleagues at Stanford have carried out an extensive
research program on the tendency of people to use their knowledge of people
and social rules to make judgments about computers. Two aspects of their
results are interesting in relation to the agent metaphor. First, they show
that very small cues can trigger people's readiness to apply social rules
to computers. For example, simply having a computer use a human voice is
sufficient to cause people to apply social rules to the computer (Nass and
Steuer 1993). This suggests that the agent metaphor may be invoked very
easily--human visages with animated facial expressions, and so forth, are
not necessary. This is in accord with the finding from the Guides study,
in which stereotypic pictures and role labels triggered attributions of
individual points of view and emotional behavior. The second aspect of
interest is the finding that people do, indeed, apply social rules when
making judgments about machines.
Let's look at an example. One social rule is that if person B praises person
A, a third person will perceive the praise as more meaningful and valid
than if person A praises himself. Nass, Steuer, and Tauber (1994) designed
an experiment to show that this social rule holds when A and B are replaced
with computers. The experiment went something like this (it has been considerably
simplified for expository purposes):
- in part 1, a person went through a computer-based tutorial on a topic
- in part 2, the person was given a computer-based test on the material covered
- in part 3, the computer critiqued the effectiveness of the tutorial in part 1.
The experimental manipulation was that in one condition, parts 1, 2, and
3 were all done on computer A (i.e. computer A praised itself), whereas
in the second condition computer A was used for giving the tutorial and
computer B was used to give the test and critique the tutorial (i.e., B
praised A). Afterwards, the human participants in the study were asked to
critique the tutorial themselves. The result was that their ratings were
much more favorable when computer B had praised A's tutorial, than when
computer A had praised itself. That is, they were more influenced by B's
praise of A than by A's praise of itself.
The finding that people are willing to apply their social heuristics to computers
is surprising, particularly since the cues that trigger the application
of the social rules are so minimal. In the above experiment, the only cue
was voice. There was no attempt to portray the tutorial as an agent or personal
learning assistant. No animation, no picture, no verbal invocation of a
teacher role, just a voice that read out a fact each time the user clicked
a button. This finding appears to be quite general. Nass and colleagues
are engaged in showing that a wide variety of social rules are applied to
computers given the presence of certain cues: to date, these range from
rules about politeness, to gender biases, to attributions about expertise
(Nass, Steuer, and Tauber 1994; Nass and Steuer 1993).
While this research is important and interesting, there is a tendency to
take it a bit too far. The finding that people apply social rules to interpret
the behavior of computers is sometimes generalized to the claim that individuals'
interactions with computers are fundamentally social (e.g., Nass, Steuer,
and Tauber 1994; Ball, et al., this volume). I think that this is incorrect.
It is one thing for people to apply social heuristics to machines; it is
quite another to assume that this amounts to social interaction, or to suggest
that the ability to support social interaction between humans and machines
is now within reach. Interaction is a two way street: just as people act
on and respond to computers, so computers act on and respond to people.
Interaction is a partnership. But social interaction relies on deep knowledge,
complex chains of inferences and subtle patterns of actions and responses
on the part of all participants (see, for example, Goffman 1967). Computers
lack the knowledge, the inferential ability, and the subtlety of perception
and response necessary to be even marginally competent social partners.
Does this mean that this research should be disregarded? Certainly not.
If anything, the willingness of people to apply social rules to entities
that can't hold up their end of an anticipated social interaction raises
more problems for designers.
Faces
Thus far we have looked at cases where rather minimal portrayals
of agents have evoked surprising reactions. For an interesting contrast,
let's move to the other end of the spectrum and examine work on extremely
realistic portrayals of agents.
One of the more famous examples of a highly realistic agent is "Phil",
an agent played by a human actor in the Knowledge Navigator video tape (Apple
Computer 1987). During the video, Phil interacts via natural language, and
uses vocal inflection, direction of gaze, and facial expressions to support
the interaction. While, as noted in the previous section, the intelligence
and subtlety necessary to support such interaction is far beyond the capacities
of today's software and hardware, it is possible to create portrayals of
agents which synchronize lip movements with their speech and make limited
use of gaze and facial expression (e.g. Walker, Sproull, and Subramani 1994;
Takeuchi and Naito 1995).
Walker, Sproull, and Subramani (1994) report on a controlled study of human
responses to two versions of a synthesized talking face that was used to
administer a questionnaire. One group simply filled in a textual questionnaire
presented on the computer. Two other groups listened while a synthesized talking
face (a different face for each group) read each question aloud, and then typed their
answers on the computer. Compared to people who simply filled out the questionnaire,
those who answered the questions delivered by the synthesized faces spent
more time, wrote more comments, and made fewer errors. People who interacted
with the faces seemed more engaged by the experience.
Of particular interest was the difference between people's responses to
the two synthesized faces. The faces differed only in their expression:
one face was stern, the other was more neutral. Although the difference
in expression was extremely subtle--the only difference was that the inner
portion of the eyebrows was pulled inward and downward--it did make a difference.
People who answered questions delivered by the stern face spent more time,
wrote more comments, and made fewer errors. Interestingly enough, they also
liked the experience and the face less.
Is the Agent Metaphor Worth the Trouble?
So far it looks like the agent metaphor is more trouble than it's worth.
Designers who use the agent metaphor have to worry about new issues like
emotion and point of view and politeness and other social rules and--if
they put a realistic face on the screen--whether people like
the face's expression! Perhaps the agent metaphor should be avoided.
I think there are several reasons not to give up on agents. First, it is
too soon to give up on the agent metaphor. The difficulties noted above
are problems for designers--not necessarily for users. They may very well
be solvable. We simply don't know enough about how people react to agents.
Far more research is needed on how people experience agents. Second, the
research by Nass and his colleagues suggests that we may not have much of
a choice. Very simple cues like voice may be sufficient to invoke the agent
metaphor. Perhaps our only choice is to try to control expectations, to
modulate the degree to which the agent metaphor is manifested. It's not
clear. The third reason is that I believe the agent metaphor brings some
clear advantages with it.
The Agent Conceptual Model
We've discussed the two meanings of agent--adaptive functionality and the
agent metaphor--and some of the new problems they raise. In this section
I want to look below the surface of the agent metaphor at its most fundamental
characteristics. The agent metaphor brings with it a new conceptual model,
one that is quite different from that which underlies today's graphic user
interfaces. It is at this level that the agent metaphor has the most to
offer. To begin with, let's look at the conceptual model that underlies
today's interfaces, and then we'll consider the agent conceptual model in
relation to it.
The Object-Action Conceptual Model
Today's graphic interfaces use a variety of different metaphors. The canonical
example is the desktop metaphor, in which common interface components such
as folders, documents, and the trash can, can be laid out on the computer
screen in a manner analogous to laying items out on a desktop. However,
I don't think the details of the metaphors--folders, trash cans, etc.--are
what is most important. Rather, it is the conceptual model that underlies
them.
The underlying conceptual model of today's graphical user interfaces has
to do with objects and actions. That is, graphic user interface elements
are portrayed as objects on which particular actions may be done. The power
of this object-action conceptual model is rooted in the fact that users
know many things about objects. Some of the general knowledge that is most
relevant to the objects found in graphic user interfaces includes the following:
- objects are visible
- objects are passive
- objects have locations
- objects may contain things
This knowledge translates into general expectations. An object has a particular
appearance. Objects may be moved from one location to another. Because objects
are passive, if users wish to move them, they must do so themselves. Objects
that contain things may be opened, their contents inspected or changed,
and then closed again.
Graphic user interfaces succeed in being easy to use because these expectations
are usually met by any component of the interface. When users encounter
an object--even if they have absolutely no idea what it is--they know that
it is likely that they can move it, open it, and close it. Furthermore,
they know that clicking and dragging will move or stretch the object, and
that double clicking will open it. They know that if they open it up and
find text or graphics inside it, they will be able to edit the contents
in familiar ways, and close it in the usual way. Because this general knowledge
is applicable to anything users see in the interface, they will always be
able to experiment with any new object they encounter, regardless of whether
they recognize it.
The Agent Conceptual Model
The agent metaphor is based on a conceptual model that is different
from the object-action conceptual model. Rather than passive objects that
are acted upon, the agent metaphor's basic components (agents, of course)
have a degree of animacy and thus can respond to events. We'll call this
the responsive agent conceptual model.
Consider some of the general knowledge people have about agents:
- agents can notice things
- agents can carry out actions
- agents can know things
- agents can go places
This knowledge translates into expectations for agents that differ from
those for objects. Since agents can notice things and carry out actions,
in contrast to inanimate objects where these attributes don't apply, the
responsive agent conceptual model is well suited to representing aspects
of a system which respond to events. The sorts of things an agent might
notice, and the ways in which it might respond, are a function of its particular
portrayal.
Another basic difference is that while objects can contain things, agents
know things, and, as a corollary, can learn things. Thus, the agent conceptual
model is suitable for representing systems which acquire, contain, and manage
knowledge. What sort of things are agents expected to learn or know? That
depends on the way in which the agent is portrayed. To paraphrase Laurel
(1990), one might expect an agent portrayed as a dog to fetch the electronic
newspaper, but one would not expect it to have a point of view on its contents.
A 'stupid' agent might only know a few simple things that it is taught,
and might be unable to offer explanations for its actions beyond citing
its rules; a more intelligent agent might be able to learn by example, and
construct rationales for its actions. Note that more intelligence or knowledge
is not necessarily better: what is important is the match between the agent's
abilities and the user's expectations. Ironically, the agent metaphor may
be particularly useful not because agents can represent intelligence, but
because agents can represent very low levels of intelligence.
Another difference between the object-action and agent conceptual models is that
agents can go places. Users expect objects to stay where they're put; agents,
on the other hand, are capable of moving about. Where can agents go? That
depends both on the particular portrayal of the agent, as well as on the
spatial metaphor of the interface. At the very least, an agent is well suited
for representing a process that can log onto a remote computer, retrieve
information, and download it to its user's machine. Another consequence
of an agent's ability to go places is that it need not be visible to be
useful or active. The agent may be present 'off stage,' able to be summoned
by the user when interaction is required, but able to carry out its instructions
in the background.
Objects and Agents
These arguments about the differences between the object and agent conceptual
models could be ignored. After all, interface components ignore many properties
of the real things on which they are based. For example, 'Folder objects'
in graphic user interfaces can be deeply nested, one inside another inside
another inside another, unlike their real world counterparts. Yet in spite
of this departure from our knowledge of real-world objects, the metaphor works
well. Perhaps we could simply integrate adaptive functionality into what
were formerly passive, unintelligent objects. It's easy to conceive of an
interface folder that is 'smart,' or that can 'notice' particular kinds
of documents and 'grab' them, or that can 'migrate' from a desktop machine
to a portable when it is time to go home. However, the drawback of such
a design tack is that it undermines the object-action conceptual model.
If that tack were pursued, users wouldn't know as much about what they see
on the screen. If they encounter a new object, what will it do? Perhaps
it will just sit there, or perhaps it will wake up and do something. Perhaps
double clicking will open it, or perhaps double clicking will start it running
around, doing things.
I believe that there is much to be said for maintaining the separation between
the object and agent conceptual models. It becomes a nice way of dividing
up the computational world. That is, objects and agents can be used in the
same interface, but they are clearly distinguished from one another. Objects
stay what they are: nice, safe, predictable things that just sit there and
hold things. Agents become the repositories for adaptive functionality.
They can notice things, use rules to interpret them, and take actions based
on their interpretations. Ideally, a few consistent methods can be defined
to provide the users with the knowledge and control they need. That is,
just as there are consistent ways of moving, opening, and closing objects,
so can there be consistent ways of finding out what an agent will notice,
what actions it will carry out, what it knows, and where it is. Such methods
get us a good deal of the way to providing users with the understanding,
control, and prediction they need when interacting with adaptive systems.
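One way to imagine such consistent methods is sketched below (Python; the class and method names are invented for illustration, not a proposal from any existing system). Objects keep their familiar passive repertoire, while every agent, whatever it does, answers the same four questions about what it will notice, what it will do, what it knows, and where it is.

    class Thing:
        """An interface object: passive, visible, located, possibly a container."""
        def __init__(self, name, location):
            self.name, self.location, self.contents = name, location, []
        def move_to(self, location): self.location = location
        def open(self):  return list(self.contents)
        def close(self): pass

    class Agent:
        """An agent: it can notice, act, know, and go places--and can say so."""
        def __init__(self, name, triggers, actions, knowledge, location):
            self.name, self.location = name, location
            self._triggers, self._actions, self._knowledge = triggers, actions, knowledge
        # Four consistent methods supporting understanding, prediction, and control:
        def what_will_you_notice(self): return list(self._triggers)
        def what_will_you_do(self):     return list(self._actions)
        def what_do_you_know(self):     return dict(self._knowledge)
        def where_are_you(self):        return self.location

    clipper = Agent("news clipper",
                    triggers=["a new article mentioning 'volcano'"],
                    actions=["file a copy in the Volcano folder"],
                    knowledge={"topic": "volcanoes"},
                    location="off stage, on the news server")
    print(clipper.what_will_you_notice())

The particulars do not matter; what matters is that, as with moving and opening objects, the same small set of questions can be put to any agent a user encounters.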
There is a risk of overemphasizing the importance of metaphors and conceptual
models. Normally, people are not aware of the conceptual model, the metaphor,
or even individual components of the interface. Rather, they are absorbed
in their work, accomplishing their actions with the kind of unreflective
flow that characterizes expert performance. It is only when there are problems--the
lights go out, the search agent brings back worthless material, the encyclopedia
guide vanishes--that we begin to reflect and analyze and diagnose.
But this is why metaphors and conceptual models are particularly important
for adaptive functionality. For the foreseeable future, it will fall short
of perfection. After all, even humans make errors doing these sorts of tasks,
and adaptive functionality is immeasurably distant from human competence.
As a consequence, systems will adapt imperfectly, initiate actions when
they ought not, and act in ways that seem far from intelligent.
Concluding Remarks
In this chapter we've explored a number of problems that are important to
consider when designing agents. First we noted that there are two distinct
senses of agent: the metaphor that is presented to the user, and the adaptive
functionality that underlies it. Each gives rise to particular problems.
The agent metaphor brings a number of expectations that are new to user
interface design. And adaptive functionality raises a number of other issues
that are independent of how the functionality is portrayed.
The chief challenge in designing agents, or any other portrayal of adaptive
systems, is to minimize the impact of errors and to enable people to step
in and set things right as easily and naturally as possible. We've discussed
two approaches to this. One is to make sure that adaptive systems are designed
to enable users to understand what they're doing, and predict and control
what they may do in the future. Here we've suggested that the agent conceptual
model may provide a good starting point, providing general mechanisms for
accessing and controlling agents. Second, since the agent metaphor can create
a wide variety of expectations, we need to learn more about how portrayals
of agents shape users' expectations and then use that knowledge to adjust
(which usually means lower) people's expectations. Research which focuses
on the portrayal of adaptive functionality, rather than on the functionality
itself, is a crucial need if we wish to design agents that interact gracefully
with their users.
Acknowledgments
Gitta Salomon contributed to the analysis of the DowQuest system.
A number of the findings about the use of DowQuest are from an unpublished
manuscript by Meier, et al. (1990), carried out as a project for a Cognitive
Engineering class under the supervision of Don Norman, with Salomon and
myself as outside advisors. The paper benefited from the comments of Stephanie
Houde, Gitta Salomon, and three anonymous reviewers.
References
Apple Computer. 1987. The Knowledge Navigator. (Videotape)
Belew, R. K. 1989. Adaptive Information Retrieval: Using a Connectionist
Representation to Retrieve and Learn about Documents. In Proceedings
of SIGIR. Cambridge, MA: ACM Press, pp 11-20.
Cypher, A. 1991. EAGER: Programming Repetitive Tasks by Example. Human
Factors in Computing Systems: the Proceedings of CHI '91, pp 33-39.
New York: ACM Press.
Dow Jones and Company, Inc. 1989. Dow Jones News/Retrieval User's Guide.
Erickson, T. 1996. Feedback and Portrayal in Human Computer Interface
Design. Dialogue and Instruction, eds. R. J. Beun, M. Baker and M. Reiner.
Heidelberg: Springer-Verlag, in press.
Erickson, T., and Salomon, G. 1991. Designing a Desktop Information System:
Observations and Issues. Human Factors in Computing Systems: the Proceedings
of CHI '91. New York: ACM Press.
Goffman, E. 1967. Interaction Ritual. New York: Anchor Books.
Greenberg, S., and Witten, I. 1985. Adaptive Personalized Interfaces--A
Question of Viability. Behaviour and Information Technology, 4(1): 31-45.
Laurel, B. 1990. Interface Agents: Metaphors with Character. The Art
of Human-Computer Interface Design, ed. B. Laurel. Addison-Wesley,
pp 355-365.
Meier, E.; Minjarez, F.; Page, P.; Robertson, M.; and Roggenstroh, E. Personal
communication, 1990.
Mitchell, T.; Caruana, R.; Freitag, D.; McDermott, J.; and Zabowski, D.
1994. Experience with a Learning Personal Assistant. Communications
of the ACM, 37(7): 80-91.
Nass, C., and Steuer, J. 1993. Anthropomorphism, Agency, and Ethopoeia:
Computers as Social Actors. Human Communication Research, 19(4): 504-527.
Nass, C.; Steuer, J.; and Tauber, E. R. 1994. Computers are Social Actors.
Human Factors in Computing Systems: CHI '94 Conference Proceedings.
New York: ACM Press.
Oren, T.; Salomon, G.; Kreitman, K.; and Don, A. 1990. Guides: Characterizing
the Interface. The Art of Human-Computer Interface Design, ed. B. Laurel.
Addison-Wesley, pp 367-381.
Salomon, G.; Oren, T.; and Kreitman, K. 1989. Using Guides to Explore Multimedia
Databases. The Proceedings of the Twenty-Second Annual Hawaii International
Conference on System Sciences.
Stanfill, C., and Kahle, B. 1986. Parallel Free-text Search on the Connection
Machine System. Communications of the ACM, 29(12): 1229-1239.
Takeuchi, A., and Naito, T. 1995. Situated Facial Displays: Towards Social
Interaction. Human Factors in Computing Systems: CHI '95 Conference
Proceedings. New York: ACM Press.
Walker, J.; Sproull, L.; and Subramani, R. 1994. Using a Human Face in an
Interface. Human Factors in Computing Systems: CHI '94 Conference Proceedings.
New York: ACM Press.
© Copyright 1997 by Thomas Erickson.
All Rights Reserved.