Gestures With Touch

I’ve always seen gestures as an alternative interaction technique, available when others like speech or touch are unavailable or less convenient. For example, gestures could be used to browse recipes without touching your tablet and getting it messy, or could be used for short ‘micro-interactions’ where gestures from a distance are better than approaching and touching something.

Recently, two papers at UIST ’14 looked at using gestures alongside touch, rather than instead of it. I really like this idea and I’m going to give a short overview of those papers here. Combining hand gestures with other interaction techniques isn’t new, though: an early and notable example from 1980 was Put That There, where users interacted using voice and gesture together.

In Air+Touch, Chen and others look at how fingers may move and gesture over touchscreens while also providing touch input. They grouped interactions into three types: gestures happening before touch, gestures happening between touches and gestures happening after touch. They also identified various finger movements which can be used over touchscreens but which are distinct from incidental movements. These include circular paths, sharp turns and jumps into a higher than normal space over the screen. In Air+Touch, users gestured and touched with one finger. This lets users provide more expressive input than touch alone provides.

In contrast to this unimodal input (meaning one hand rather than one input modality, in this case), Song and others looked at bimodal input. They focused on gestures in the wider space around the device, using the non-touching hand for gestures. As users interacted with mobile devices using touch with one hand, the other hand could gesture nearby to access other functionality. For example, they describe how users may browse maps with touch, while using gestures to zoom in or out of the map.

While these papers take different approaches to combining touch and gesture, they have some similarities. Touch can be used to help segment input. Rather than detecting gestures at all times, interfaces can just look for gestures which occur around touch events; touch is implicitly used as a clutch mechanism. Clutching helps avoid accidental input and saves power, as gesture sensing doesn’t need to happen all the time.
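To make the clutch idea concrete, here is a minimal sketch of segmenting gesture input around touch events. It isn’t how either paper implements it; the Event type, the function name and the half-second window are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str         # "touch" or "gesture"
    timestamp: float  # seconds


def gestures_near_touch(events, window=0.5):
    """Keep only gestures within `window` seconds of some touch event.

    The window length is an arbitrary, assumed value.
    """
    touch_times = [e.timestamp for e in events if e.kind == "touch"]
    return [
        e for e in events
        if e.kind == "gesture"
        and any(abs(e.timestamp - t) <= window for t in touch_times)
    ]


events = [
    Event("gesture", 0.2),  # far from any touch: ignored
    Event("touch", 2.0),
    Event("gesture", 2.3),  # shortly after a touch: kept
    Event("gesture", 5.0),  # far from any touch: ignored
]
kept = gestures_near_touch(events)
```

Only the gesture at 2.3 seconds survives, because it is the only one close to a touch. A real system would more likely switch the gesture sensor on and off around touches, which is where the power saving comes from.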

Both also demonstrate using gestures for easier context switching and secondary tasks. Users may gesture with their other hand to switch map mode while browsing, or may lift their finger between swipes to change map mode. Gestures are mostly used for discrete secondary input rather than as continuous primary input, although the latter is certainly possible. There are similarities between these concepts and AD-Binning from Hasan and others, where users accessed content with around-device gestures while interacting with that content using touch with their other hand.


Gestures Are Not “Natural”

I’m sitting in Helsinki Airport on my way home from NordiCHI. It’s been a great conference and I’ve had a lot of fun exploring Helsinki too. Despite a very grey start to the week, the sun eventually came out and illuminated the bright colours of Helsinki’s beautiful architecture.

In this post I’m going to explain why I think gesture interaction is not “natural” or “intuitive”. A few talks this week justified using gestures because they were “natural” and I think that’s not really true. There are many practical realities that mean this isn’t the case and so long as people keep thinking of gestures as being “natural”, we won’t overcome those issues. Nothing I’m saying here is new; we’ve known it for years. Heck, Don Norman said the same thing (about “natural user interfaces”) and made many of the same points. This post is inspired by a discussion over coffee at NordiCHI!

Why Gesture Interaction Isn’t “Natural”

The Midas Touch Problem

In gesture interaction, the Midas Touch problem is that any sensed movements may be treated as input. Since gestures are “natural” movements which we perform in everyday life, that means that everyday movements may be treated as gestures! Obviously this is undesirable. If I’m being sensed by one or more interfaces in the surrounding environment then I don’t want hand movements I use in conversation, for example, to be treated as input to some interface.

Many solutions exist for addressing the Midas Touch problem, including clutch actions. A clutch action is one which begins or ends part of an interaction. A familiar example from speech input is saying “OK Google” to activate voice input for Google Now or Google Glass. In gesture interaction, a clutch may be a particular gesture (often called an activation gesture) or body pose (like the Teapot gesture in StrikeAPose). Other alternatives include activation zones or using some other input modality as a clutch, like pressing a button or using a voice command.
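As a rough illustration of a clutch, the sketch below ignores every sensed movement until an activation gesture has been seen, and stops listening again after a release gesture. The class, the gesture names and the idea of a separate release gesture are assumptions for this example, not taken from any particular system.

```python
class ClutchedRecogniser:
    """Ignore movements until an activation gesture engages the clutch."""

    def __init__(self, activation="activate", release="release"):
        # Both gesture names are hypothetical labels from some recogniser
        self.activation = activation
        self.release = release
        self.engaged = False

    def handle(self, movement):
        """Return the movement if it should count as input, else None."""
        if movement == self.activation:
            self.engaged = True
            return None
        if movement == self.release:
            self.engaged = False
            return None
        return movement if self.engaged else None


recogniser = ClutchedRecogniser()
stream = ["wave", "activate", "swipe_left", "release", "wave"]
accepted = [m for m in stream if recogniser.handle(m) is not None]
```

Here both waves are ignored as everyday movement and only the swipe between the two clutch gestures is treated as input.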

Regardless of how you address the Midas Touch problem, you’re moving further away from something people do “naturally”. Users have to perform some action which is specific to interacting with the interface.

The Address Problem

In their Making Sense of Sensing Systems paper, Bellotti et al. (2002) described the problem of how to address a sensing system. Users need to be able to direct their input towards the system they intend to interact with; that is, they must be able to address the desired interface. This is more of a problem in environments with more than one sensing interface. Given that the HCI community is using gestures in more and more contexts, it’s reasonable to assume that we’ll eventually have many gesture interfaces in our environments. We need to be able to address interfaces to avoid our movements accidentally affecting others (another variant of the Midas Touch problem).

In conversation and other human interactions, we typically address each other using cues such as body language or by making eye contact. This isn’t necessarily possible in interaction, as detecting intention to interact implicitly is challenging. Instead we’ll need more explicit interaction techniques to help users address interfaces. As with the Midas Touch problem, this is not “natural”.

The Sensing Problem

Unlike the gestures we make (unintentionally or intentionally) in everyday life, gestures for gesture input need to meet certain conditions. Depending on the sensing method, users have to perform a gesture that will be understood by the system. For example, if users must move their hand across their body from right to left then this movement must be done in a way that can be recognised. This may involve directly facing a gesture sensor, slowing the movement down, moving in a perfectly horizontal line or exaggerating aspects of the gesture. In trying to make gestures understood by sensors, users perform more rigid and forced movements. Again, this is not “natural”.

Implications for HCI

Although there are more reasons that gesture interaction should not be considered a “natural” or “intuitive” input modality, I think these are the three most important ones. All result in users performing hand or body movements which are very specific to interaction; much like how we speak to computers differently than we speak to other people. I think speech input is another modality which is considered “natural” but which suffers similar problems.

I’m not sure if we’ll ever solve these problems from the computer side of gesture interaction. It would be nice, but it’s asking for a lot. Instead, we should embrace the fact that gestures are not “natural” and do what we’re good at in HCI: finding solutions for overcoming problems with technology. We need to design interaction techniques which acknowledge the unnatural aspects of gesture input in order for gestures to be usable outside of our lab studies and in an intelligent world filled with sensing interfaces and devices.

NordiCHI ’14: Beyond the Switch and Nokia

At the moment I’m in Helsinki for NordiCHI 2014. Yesterday I was taking part in the Beyond the Switch workshop, where we discussed interactive lighting and interaction – both implicit and explicit – with light sources.

It was interesting to learn more about how people interact with light and hear about how others are using light in their own research and products. I’m hoping that I also brought an interesting point of view, as an “outsider” in this community. In our research we use interactive light as an output modality, exploiting the increasing connectivity of “smart” light sources. As new and existing light sources are developed with interactivity in mind, I think we’ll start to see others using light in new ways too.

In the first half of the workshop we presented our position papers and identified some interesting topics and challenges which arose in these presentations. After some discussion – and lots of post-it notes – we arranged these topics into themes. Two of the bigger themes which emerged in the workshop were (in my own words): semantics and “natural” interaction with light.

Lots of questions which emerged from the discussion were about what light actually means and how we can use light to represent information. This was relevant to what we do in our research as we use interactive light to encode information about gesture interaction. I think an interesting area for future research would be to understand what properties of light best represent what types of information and what makes a “good” interactive light encoding. There is already research which has started to look at some of these design challenges although more is needed.

In the second half of the workshop we thought more about “natural” interaction. I put quotes around the word natural because it, along with “intuitive”, is a bad word in HCI (according to Steve Brewster, anyway)! As the workshop was about interaction beyond the switch, however, there was a lot of interest in how else we could interact with light. We split up into teams and each focused on different aspects of interaction with light. My team looked at explicit control of light, whilst the other focused on implicit interaction with light. Overall, it was a fun workshop. Lots of really cool demos of Philips Hue, too!

Earlier today I went to visit Vuokko and give a talk about my PhD research, at Nokia Technologies in Otaniemi. After two years of working with and being funded by Nokia, it was nice to finally visit them in Finland! My slides from my talk are available here.

Now that my workshop and presentation at Nokia are finished, I have the rest of the week to enjoy the conference. I’m looking forward to exploring Helsinki more. I’ve walked quite a lot since I got here – mostly at night – and it’s been fun to see the city. Tomorrow morning is the start of the main conference program, starting with a keynote by Don Norman.

Mobile HCI ’14 Poster

This time next week I’ll be boarding a plane to fly to Toronto for Mobile HCI! I’ll be presenting a poster there on above-device gesture design and I’m also participating in the doctoral consortium. I’ve set up a page to accompany my poster and demonstrate our above-device gestures: see here. My poster is also finished, printed and ready to go!

Mobile HCI ’14 Poster

Posters, SICSA HCI and Second Year Viva


June has gotten off to an exciting start! I won a poster presentation competition, got a poster paper into Mobile HCI ’14 and arranged my 2nd year viva, which is the annual progress review for my PhD research. Finishing and submitting my report felt a little anticlimactic compared to last year; I suppose the first year review is much more important and by this stage it’s more of a checkpoint. Lately I’ve been writing and research planning a lot so I’m looking forward to getting back to actually doing research. Designing, making, all those fun things that make HCI awesome!

Yesterday was the SICSA HCI yearly meetup, which was good fun. This year it was hosted by the University of Dundee so travelling to Dundee was nice. I spent a lot of time there as a kid, especially around the university campus, so it was cool to go back there and see how everything has changed. Highlights of the day included keynotes from Miguel Nacenta and David Flatla. Miguel presented some really cool research and David was so damn entertaining!

We also had a few posters and a demo from our group. I won the poster presentation competition, which was a nice surprise. My poster (below) gave a general overview of my PhD research and showed off a couple of projects.

SICSA HCI ’14 Poster

Winning the poster competition must have been a good omen because I returned home to a notification from Mobile HCI saying my poster was accepted. That paper and poster are about two gesture design studies from the start of my PhD. I’ll post more about them another time.

Mobile HCI Doctoral Consortium

I’ve been accepted for the doctoral consortium at Mobile HCI ’14! I’m looking forward to it – it’ll be great to get a grilling from others and will help with my thesis, something which becomes increasingly daunting knowing that I’m halfway through my three years of PhD research.

TEDx Demos


A couple of days ago our group showed off some of our research during TEDx Glasgow University. It was a fun experience. We don’t often get to engage with a non-academic audience so it was refreshing to chat about future technology with non-computing scientists. I came away from the demo session feeling inspired and with some fresh ideas about where to take my research.

I presented a gesture interface for mobile phones, using around-device gestures as input. People seemed to find this modality particularly attractive for use in the kitchen, when hands are often full, wet or messy. I was also showing how wearables could be used alongside mobile phones. People seemed to enjoy the novelty of this, although understandably there was some doubt about having to wear another accessory (alongside fashion items like watches or bracelets). I definitely feel that future wearables need to be designed with fashion in mind so people want to wear them as accessories first and interfaces second.

Inevitably someone mentioned interfaces found in Minority Report and Iron Man – I hope HCI finds ways to inspire people’s imaginations about interfaces of the future in the same way that Hollywood has.

PyQT: QPixmap and threads

I’ve been working with PyQT lately and got stuck on a seemingly simple problem: updating the UI from another thread. Having never used PyQT before it wasn’t obvious what the solution was and any Stack Overflow results I found gave incomplete code samples. I’m hoping this post helps give pointers for anyone searching for the same things I did.

This particular example is very contrived but it’s the only solution I could find for updating an image with QPixmap objects in a multithreaded interface, overcoming the “QPixmap: It is not safe to use pixmaps outside the GUI thread” error message. I think part of my problem was that I wasn’t using QThreads in my threaded code and I wasn’t willing to refactor a large codebase just to improve PyQT integration.

First, another thread calls someFunctionCalledFromAnotherThread, which uses PyQT’s signal mechanism to pass events across threads. This function creates a LoadImageThread with the filename and desired size as arguments, connects the thread’s showImage signal to the showImage function, then starts the thread.

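from PyQt4 import QtCore, QtGui  # PyQt4 assumed, given the old-style signals

def someFunctionCalledFromAnotherThread(self):
  # Route the worker's signal to showImage, then start the worker
  thread = LoadImageThread(file="test.png", w=512, h=512)
  self.connect(thread, QtCore.SIGNAL("showImage(QString, int, int)"), self.showImage)
  thread.start()

def showImage(self, filename, w, h):
  # Runs on the GUI thread, so creating a QPixmap is safe here
  pixmap = QtGui.QPixmap(filename).scaled(w, h)
  self.image.setPixmap(pixmap)  # self.image is assumed to be a QLabel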

LoadImageThread then does nothing other than emit the showImage signal we connected above, passing the thread arguments back. This means showImage will be executed on the GUI thread, avoiding those nasty QPixmap errors. Note the __del__ function below; that prevents the thread from being destroyed while it is still running.

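class LoadImageThread(QtCore.QThread):
  def __init__(self, file, w, h):
    QtCore.QThread.__init__(self)
    self.file = file
    self.w = w
    self.h = h

  def __del__(self):
    # Block until run() has finished so the thread isn't destroyed mid-run
    self.wait()

  def run(self):
    # Hand the arguments back to the GUI thread; no pixmap work happens here
    self.emit(QtCore.SIGNAL('showImage(QString, int, int)'), self.file, self.w, self.h)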

There we have it – a stupid and contrived solution to a stupid problem.
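The underlying pattern doesn’t depend on Qt: a worker thread only posts a request, and the thread that owns the UI does the actual work. Here is a minimal sketch of that hand-off using only the standard library, where a queue stands in for Qt delivering the signal through the GUI thread’s event loop. All of the names here (requests, shown, worker, show_image) are made up for the example.

```python
import queue
import threading

requests = queue.Queue()
shown = []  # stands in for the widget that displays the image


def worker(filename, w, h):
    # Runs on a background thread: only posts a request, never touches the "UI"
    requests.put((filename, w, h, threading.current_thread().name))


def show_image(filename, w, h):
    # Runs on the owning thread, like showImage running on the GUI thread
    shown.append((filename, w, h, threading.current_thread().name))


t = threading.Thread(target=worker, args=("test.png", 512, 512), name="loader")
t.start()
t.join()

# The owning thread takes the request off the queue and does the work itself
filename, w, h, sender = requests.get(timeout=1)
show_image(filename, w, h)
```

The request is created on the “loader” thread but show_image runs on whichever thread drains the queue, which is the same division of labour as LoadImageThread and showImage.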

Network Messages in Pure Data

Pure Data is great for generating sound and is used quite often in HCI for this reason. I’ve previously written about how it can be used for creating tactons as well. This post shows a simple patch which receives messages from a local socket and parses the input. As a Pure Data newbie I found documentation to be pretty poor so I’m hoping this helps others see how to easily integrate pd~ with other programs using sockets.


Source: networklistener.pd

First the tcpserver object creates a TCP server which listens on the given port number (34567 here). The first outlet has incoming messages; the second has connection status. The bytes2any object takes a bytestream and creates a message from it. As an example of how to parse information from these messages, the unpack object here parses three floats from messages. This patch has four outlets: the first three are the parsed floats, the fourth is connection status (True when a socket is connected to the server).
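For the other end of the socket, here is a minimal sketch of a Python client which could send three numbers to a patch like this. It assumes the patch accepts space-separated ASCII text, which I haven’t confirmed against bytes2any’s documentation; whether a terminator (such as a semicolon) is needed depends on how the patch is set up, so it is left as a parameter. The function names are my own.

```python
import socket


def format_floats(values, terminator=""):
    """Encode numbers as space-separated ASCII text (assumed message format)."""
    text = " ".join(str(float(v)) for v in values) + terminator
    return text.encode("ascii")


def send_floats(values, host="127.0.0.1", port=34567, terminator=""):
    """Open a TCP connection to the patch, send one message and close."""
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(format_floats(values, terminator))
```

With the patch running, calling send_floats([0.5, 1, 2]) would connect to port 34567 on the local machine and send the text “0.5 1.0 2.0”.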

PyOpenNI and OpenCV

In my last post I gave an example of how to use OpenKinect to get a depth stream which can be manipulated with OpenCV in Python. This example uses PyOpenNI instead, which is more powerful as it exposes useful OpenNI functionality.

The gist of this example is that PyOpenNI abstracts the depth map behind a DepthMap object and for OpenCV we need to turn that into a numpy array. As DepthMap can be iterated over, the most obvious solution is to just call numpy.asarray with the depth map; however, I’ve found that to be far too slow for practical use.

A much faster way is to use the older raw depth stream functions from PyOpenNI’s DepthGenerator class. These functions – get_raw_depth_map and get_raw_depth_map_8 – return byte strings of 16 and 8 bits per pixel, respectively. Numpy can create an array from these byte strings significantly faster than just calling numpy.asarray with the DepthMap object.
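To see the conversion on its own, without a Kinect attached, here is a small sketch using a synthetic byte string in place of the output of get_raw_depth_map. It uses np.frombuffer, the current NumPy spelling for reading binary data, and assumes a 640x480 frame in the machine’s native byte order; the helper name and the scaling to 8 bits are my own choices.

```python
import numpy as np

WIDTH, HEIGHT = 640, 480


def depth_from_raw(raw, bits=16, width=WIDTH, height=HEIGHT):
    """View a raw depth byte string as a (height, width) array without copying."""
    dtype = np.uint16 if bits == 16 else np.uint8
    return np.frombuffer(raw, dtype=dtype).reshape(height, width)


# Synthetic stand-in for get_raw_depth_map(): 2 bytes per pixel
values = (np.arange(WIDTH * HEIGHT, dtype=np.uint32) % 4096).astype(np.uint16)
raw16 = values.tobytes()
frame16 = depth_from_raw(raw16, bits=16)

# Scale to 8 bits for display (an arbitrary choice of scaling)
frame8 = (frame16.astype(np.float32) / frame16.max() * 255).astype(np.uint8)
```

This is fast because NumPy reinterprets the existing bytes rather than visiting each pixel in Python, which is what iterating over a DepthMap has to do.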

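from openni import *
import numpy as np
import cv2

# Initialise OpenNI
context = Context()
context.init()

# Create a depth generator to access the depth stream
depth = DepthGenerator()
depth.create(context)
depth.set_resolution_preset(RES_VGA)
depth.fps = 30

# Start Kinect and wait for the first frame
context.start_generating_all()
context.wait_any_update_all()

# Create array from the raw depth map string (8 bits per pixel, 640x480)
frame = np.fromstring(depth.get_raw_depth_map_8(), "uint8").reshape(480, 640)

# Render in OpenCV; waitKey lets the window draw and waits for a key press
cv2.imshow("image", frame)
cv2.waitKey(0)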

In this example a single frame is taken from the Kinect depth stream and turned into an OpenCV-compatible numpy array. This could be useful as a starting point for a Kinect finger tracker because it allows OpenNI’s skeleton and hand tracking functionality to be used alongside OpenCV. As an example, OpenNI could be used to track current hand position so that OpenCV knows which region of the depth map (or rgb stream – this code is easily modifiable to use that instead) contains a hand. Computer vision techniques could then be used to look for fingers in that region – removing the need to try and segment using colour when looking for hands in a normal rgb image.
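As a sketch of that last idea, the function below cuts out the region of a depth frame around a hand position. It uses a synthetic frame and hard-coded coordinates; in practice the position would come from OpenNI’s hand tracker and would need projecting into pixel coordinates first, which isn’t shown. The function name and the 40 pixel half-size are assumptions.

```python
import numpy as np


def crop_around(frame, x, y, half_size=40):
    """Return the square region of `frame` centred on (x, y), clipped to the image."""
    height, width = frame.shape[:2]
    top, bottom = max(0, y - half_size), min(height, y + half_size)
    left, right = max(0, x - half_size), min(width, x + half_size)
    return frame[top:bottom, left:right]


# Synthetic 8-bit depth frame in place of a real Kinect frame
frame = np.zeros((480, 640), dtype=np.uint8)
frame[200:240, 300:340] = 200  # pretend this bright blob is a hand

hand = crop_around(frame, x=320, y=220)  # hand position assumed from a tracker
corner = crop_around(frame, x=10, y=10)  # clipped at the image edge
```

The cropped region is a view into the original array, so it can be passed straight to OpenCV functions for whatever finger detection comes next.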