Gestures Are Not “Natural”

I’m sitting in Helsinki Airport on my way home from NordiCHI. It’s been a great conference and I’ve had a lot of fun exploring Helsinki too. Despite a very grey start to the week, the sun eventually came out and illuminated the bright colours of Helsinki’s beautiful architecture.

In this post I’m going to explain why I think gesture interaction is not “natural” or “intuitive”. A few talks this week justified using gestures on the grounds that they are “natural”, and I don’t think that’s really true. There are many practical realities that mean this isn’t the case, and so long as people keep thinking of gestures as being “natural”, we won’t overcome those issues. None of what I’m saying is new; we’ve known it for years. Heck, Don Norman said the same thing (about “natural user interfaces”) and made many of the same points. This post is inspired by a discussion over coffee at NordiCHI!

Why Gesture Interaction Isn’t “Natural”

The Midas Touch Problem

In gesture interaction, the Midas Touch problem is that any sensed movements may be treated as input. Since gestures are “natural” movements which we perform in everyday life, that means that everyday movements may be treated as gestures! Obviously this is undesirable. If I’m being sensed by one or more interfaces in the surrounding environment then I don’t want hand movements I use in conversation, for example, to be treated as input to some interface.

Many solutions exist for addressing the Midas Touch problem, including clutch actions. A clutch action is one which begins or ends part of an interaction. A familiar example from speech input is saying “OK Google” to activate voice input for Google Now or Google Glass. In gesture interaction, a clutch may be a particular gesture (often called an activation gesture) or body pose (like the Teapot gesture in StrikeAPose). Other alternatives include activation zones or using some other input modality as a clutch, like pressing a button or using a voice command.
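To make the idea concrete, here’s a minimal sketch (in Python, with hypothetical pose and gesture names, not any particular system’s API) of how a clutch might gate gesture input: movements are ignored until an activation pose engages the clutch, and ignored again once it disengages.

```python
# Minimal sketch of a clutch for gesture input (hypothetical event names).
# Gestures are only forwarded to the application while the clutch is engaged;
# everyday movement before or after that is ignored.

class GestureClutch:
    def __init__(self, activation_pose="teapot", timeout_frames=90):
        self.activation_pose = activation_pose
        self.timeout_frames = timeout_frames  # disengage after inactivity
        self.engaged = False
        self.idle_frames = 0

    def on_frame(self, pose, gesture):
        """pose: body pose detected this frame; gesture: detected gesture or None."""
        if not self.engaged:
            # Everyday movement is ignored until the user explicitly clutches in.
            if pose == self.activation_pose:
                self.engaged = True
                self.idle_frames = 0
            return None

        if gesture is None:
            self.idle_frames += 1
            if self.idle_frames > self.timeout_frames:
                self.engaged = False  # clutch out after a period of no input
            return None

        self.idle_frames = 0
        return gesture  # only now is the movement treated as input
```

The point of the sketch is simply that the activation pose is an extra, interaction-specific action: nothing about it is part of how we move “naturally”.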

Regardless of how you address the Midas Touch problem, you’re moving further away from something people do “naturally”. Users have to perform some action that exists only for the sake of interacting with the system.

The Address Problem

In their Making Sense of Sensing Systems paper, Bellotti et al. (2002) described the problem of how to address a sensing system. Users need to be able to direct their input towards the system they intend to interact with; that is, they must be able to address the desired interface. This is more of a problem in environments with more than one sensing interface. Given that the HCI community is using gestures in more and more contexts, it’s reasonable to assume that we’ll eventually have many gesture interfaces in our environments. We need to be able to address interfaces to avoid our movements accidentally affecting others (another variant of the Midas Touch problem).

In conversation and other human interactions, we typically address each other using cues such as body language or by making eye contact. This isn’t necessarily possible with sensing interfaces, as implicitly detecting the intention to interact is challenging. Instead we’ll need more explicit interaction techniques to help users address interfaces. As with the Midas Touch problem, this is not “natural”.
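As a rough illustration (a made-up example, not a technique from the Bellotti et al. paper), one explicit approach would be to deliver a recognised gesture only to the interface the user appears to be addressing, for example based on which way they are facing:

```python
# Hypothetical sketch: routing a gesture only to the interface the user addresses.
# "Addressing" is approximated here by whether the user's facing direction is
# within an angular threshold of an interface's known bearing.

def addressed_interface(user_heading_deg, interfaces, threshold_deg=20.0):
    """interfaces: dict of name -> bearing (degrees) from the user's position."""
    best, best_err = None, threshold_deg
    for name, bearing in interfaces.items():
        err = abs((user_heading_deg - bearing + 180) % 360 - 180)
        if err <= best_err:
            best, best_err = name, err
    return best  # None if the user isn't clearly addressing any interface

# Example: a swipe is only delivered if one interface is clearly addressed.
target = addressed_interface(user_heading_deg=42.0,
                             interfaces={"tv": 40.0, "thermostat": 160.0})
if target is not None:
    print(f"deliver swipe to {target}")
```

Even in this toy version, the user has to do something deliberate (turn towards the thing they want to control) before their movement counts, which is exactly the kind of explicitness that stops the interaction being “natural”.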

The Sensing Problem

Unlike the gestures we make (unintentionally or intentionally) in everyday life, gestures used as input need to meet certain conditions. Depending on the sensing method, users have to perform a gesture in a way that will be understood by the system. For example, if users must move their hand across their body from right to left, then this movement must be done in a way that can be recognised. This may involve directly facing a gesture sensor, slowing the movement down, moving in a perfectly horizontal line or exaggerating aspects of the gesture. In trying to make gestures understood by sensors, users perform more rigid and forced movements. Again, this is not “natural”.
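To illustrate why the movements end up so rigid, here’s a made-up swipe recogniser (not any particular system’s) whose thresholds everyday arm movements rarely satisfy:

```python
# Sketch of why gestures become rigid: a hypothetical right-to-left swipe
# recogniser that only accepts movements meeting strict geometric constraints.

def is_right_to_left_swipe(path, min_length=0.30, max_vertical_drift=0.05,
                           max_duration_s=1.0):
    """path: list of (x, y, t) hand positions in metres/seconds, oldest first."""
    if len(path) < 2:
        return False
    (x0, y0, t0), (x1, y1, t1) = path[0], path[-1]
    horizontal = x0 - x1                      # must travel right to left
    vertical = abs(y1 - y0)                   # must stay nearly horizontal
    duration = t1 - t0                        # must be completed quickly enough
    return (horizontal >= min_length and
            vertical <= max_vertical_drift and
            0 < duration <= max_duration_s)

# A relaxed movement with a little arc or hesitation fails these checks, so
# users learn to straighten and exaggerate the gesture instead.
print(is_right_to_left_swipe([(0.4, 1.2, 0.0), (0.05, 1.21, 0.6)]))  # True
print(is_right_to_left_swipe([(0.4, 1.2, 0.0), (0.2, 1.35, 0.6)]))   # False
```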

Implications for HCI

Although there are more reasons that gesture interaction should not be considered a “natural” or “intuitive” input modality, I think these are the three most important ones. All result in users performing hand or body movements which are very specific to interaction, much like how we speak to computers differently than we speak to other people. I think speech input is another modality which is considered “natural” but which suffers from similar problems.

I’m not sure if we’ll ever solve these problems from the computer side of gesture interaction. It would be nice, but it’s asking for a lot. Instead, we should embrace the fact that gestures are not “natural” and do what we’re good at in HCI: finding solutions for overcoming problems with technology. We need to design interaction techniques which acknowledge the unnatural aspects of gesture input in order for gestures to be usable outside of our lab studies and in an intelligent world filled with sensing interfaces and devices.