Synthesising Speech in Python

There’s a Scottish company called CereProc who do some of the best speech synthesis in the world. They excel in regional accents, especially difficult Scottish ones! I’ve been using their CereVoice Cloud SDK in some recent projects (like Speek). In this post I’m going to share a wee Python script and an Android class for using their cloud API to generate synthesised speech. To use these, you’ll need to create a (free) account over on CereProc’s developer site and then add your auth credentials to the code.

Downloading Speech in Python

Call the download() function with the message you wish to synthesise, optionally specifying which voice and file format to use and what to name the output file.
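As a rough illustration, here's a minimal sketch of what a download() helper along these lines might look like, using the requests library. The endpoint URL, parameter names and voice names below are placeholder assumptions rather than the real CereVoice Cloud API details, so check the documentation on CereProc's developer site and substitute your own credentials.

```python
import requests

# Placeholder endpoint and credentials -- replace these with the real values
# from CereProc's developer documentation and your own account.
CEREVOICE_CLOUD_URL = "https://example.cereproc.com/rest/speak"
ACCOUNT_ID = "your-account-id"
PASSWORD = "your-password"

def download(message, voice="Heather", audio_format="mp3", filename="speech"):
    """Request synthesised speech from the cloud API and save it to a file."""
    params = {
        "accountID": ACCOUNT_ID,   # parameter names here are assumptions
        "password": PASSWORD,
        "voice": voice,
        "audioFormat": audio_format,
        "text": message,
    }
    response = requests.get(CEREVOICE_CLOUD_URL, params=params)
    response.raise_for_status()

    out_path = "{}.{}".format(filename, audio_format)
    with open(out_path, "wb") as f:
        f.write(response.content)
    return out_path

if __name__ == "__main__":
    download("Hello from Glasgow!", voice="Stuart", audio_format="wav")
```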

Downloading and Playing Speech in Android

Create a CereCloudPlayer object and use its play method to request, download, and play the message you wish to synthesise.

Pure Data Patches

I’ve started uploading and documenting some Pure Data patches which I’ve used for generating Earcons and Tactons – hopefully they’ll be useful to someone. Check them out here.

ICMI ’14 Highlights

Last week I was in Istanbul for ICMI ’14, the International Conference on Multimodal Interaction. ICMI is where signal processing and machine learning meet human-computer interaction, with the aim of finding ways to use and improve multimodal interaction.

Ask two people to define “multimodal interaction” and you’ll get two different answers. From my (HCI) perspective, it is interaction with technology using a variety of human capabilities, such as our perceptual abilities (seeing, hearing, feeling) and motor control abilities (speaking, gesturing, touching). In one of this year’s keynotes, Yvonne Rogers said we should design multimodal interfaces because we also experience the world using many modalities.

In this post I’m going to recap what I thought were the most interesting papers at the conference this year. There are also some photos of the sights, because why not?

Gesture Heatmaps: Understanding Gesture Performance with Colorful Visualizations

by Radu-Daniel Vatavu, Lisa Anthony and Jacob O. Wobbrock

Vatavu et al. presented a poster on Gesture Heatmaps, which are visualisations of how users perform touch-stroke gestures. The heatmaps represent characteristics of a performance, such as stroke speed and distance error from a gesture template, and can be used to summarise performances across users, for example to identify problematic gestures or to understand which parts of a gesture users find difficult. Something I particularly liked about this paper was the way they used these visualisations to create confusion matrices, showing where and why gestures were misclassified.

CrossMotion: Fusing Device and Image Motion for User Identification, Tracking and Device Association

by Andrew D. Wilson and Hrvoje Benko

Wilson and Benko found that device acceleration (from accelerometers) was highly correlated with image acceleration (from a Kinect, in this case). This means that fusing acceleration data from these two sources can be used to identify a particular person in an image, even if their mobile device isn’t visible (for example, phone in pocket). Some advantages of using this approach are that users can be found in an image from their device movement alone (simplifying identification) and devices can be identified and tracked, even without direct line of sight.
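To make the idea concrete, here's a toy sketch (my own, not the authors' implementation) of the core matching step: given the phone's accelerometer magnitude and an image-space acceleration magnitude for each tracked person, pick the person whose signal correlates most strongly with the device's. It assumes the signals have already been aligned and resampled to a common rate.

```python
import numpy as np

def match_device_to_body(device_accel, body_accels):
    """Return the index of the tracked person whose image-space acceleration
    correlates most strongly with the device's accelerometer signal.

    device_accel: 1-D array of acceleration magnitudes from the phone.
    body_accels: list of 1-D arrays of image-space acceleration magnitudes,
                 one per tracked person, aligned and resampled to match.
    """
    correlations = [np.corrcoef(device_accel, body)[0, 1] for body in body_accels]
    return int(np.argmax(correlations))

# Toy usage: person 1 moves like the device, person 0 doesn't.
t = np.linspace(0, 10, 300)
device = np.abs(np.sin(2 * t)) + 0.05 * np.random.randn(300)
people = [0.5 * np.abs(np.cos(3 * t)),
          np.abs(np.sin(2 * t)) + 0.1 * np.random.randn(300)]
print(match_device_to_body(device, people))  # -> 1
```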

SoundFLEX: Designing Audio to Guide Interactions with Shape-Retaining Deformable Interfaces

by Koray Tahiroğlu, Thomas Svedström, Valtteri Wikström, Simon Overstall, Johan Kildal and Teemu Ahmaniemi

Tahiroğlu et al. looked at how audio cues could be used to guide interactions with a deformable interface. They found that sound was an effective way of encouraging users to deform devices and some of their designs were particularly effective for guiding users to specific deformations. Based on these findings, they recommend using sound to help users discover deformations. Koray had a cool demo at the conference, which is the first time I’ve tried a deformable device prototype. Pretty neat idea.

Gestures With Touch

I’ve always seen gestures as an alternative interaction technique, available when others like speech or touch are unavailable or less convenient. For example, gestures could be used to browse recipes without touching your tablet and getting it messy, or could be used for short ‘micro-interactions’ where gestures from a distance are better than approaching and touching something.

Recently, two papers at UIST ’14 looked at using gestures alongside touch, rather than instead of it. I really like this idea and I’m going to give a short overview of those papers here. Combining hand gestures with other interaction techniques isn’t new, though; an early and notable example from 1980 was Put That There, where users interacted using voice and gesture together.

In Air+Touch, Chen and others looked at how fingers may move and gesture over touchscreens while also providing touch input. They grouped interactions into three types: gestures happening before touch, gestures happening between touches, and gestures happening after touch. They also identified various finger movements which can be used over touchscreens but which are distinct from incidental movements. These include circular paths, sharp turns and jumps into a higher-than-normal space over the screen. In Air+Touch, users gestured and touched with a single finger, letting them provide more expressive input than touch alone.

In contrast to this single-handed input (one hand, that is, rather than one input modality) is the two-handed input that Song and others looked at. They focused on gestures in the wider space around the device, made with the hand that isn’t touching the screen. While users interacted with the mobile device using touch with one hand, the other hand could gesture nearby to access additional functionality. For example, they described how users might browse a map with touch while using gestures to zoom in and out.

While each of these papers takes a different approach to combining touch and gesture, they share some similarities. Touch can be used to help segment input: rather than detecting gestures at all times, interfaces can just look for gestures which occur around touch events, so touch is implicitly used as a clutch mechanism. Clutching helps avoid accidental input and saves power, as gesture sensing doesn’t need to happen all the time.
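As a simple illustration of this clutching idea (my own sketch, not code from either paper), a recogniser could ignore sensed movement entirely unless it arrives within a short window after a touch event. Handling the gestures-before-touch case from Air+Touch would additionally require buffering candidate gestures until a touch arrives, and the window length here is an arbitrary choice.

```python
import time

GESTURE_WINDOW_S = 1.5  # arbitrary: how long after a touch we accept gestures

class TouchClutchedRecognizer:
    """Treats touch as a clutch: mid-air movement only counts as a gesture
    if it occurs shortly after a touch event."""

    def __init__(self):
        self._last_touch = None

    def on_touch(self):
        # Called by the touch input handler for every touch event.
        self._last_touch = time.monotonic()

    def on_candidate_gesture(self, gesture):
        # Ignore movement entirely until the clutch has been engaged.
        if self._last_touch is None:
            return None
        # Accept the gesture only inside the post-touch window.
        if time.monotonic() - self._last_touch <= GESTURE_WINDOW_S:
            return gesture
        # Otherwise treat it as incidental movement.
        return None
```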

Both also demonstrate using gestures for easier context switching and secondary tasks. Users may gesture with their other hand to switch map mode while browsing, or may lift their finger between swipes to do the same. Gestures are mostly used for discrete secondary input rather than as continuous primary input, although the latter is certainly possible. There are similarities between these concepts and AD-Binning from Hasan and others, who used around-device gestures for accessing content while interacting with that content using touch with the other hand.

Gestures Are Not “Natural”

I’m sitting in Helsinki Airport on my way home from NordiCHI. It’s been a great conference and I’ve had a lot of fun exploring Helsinki too. Despite a very grey start to the week, the sun eventually came out and illuminated the bright colours of Helsinki’s beautiful architecture.

In this post I’m going to explain why I think gesture interaction is not “natural” or “intuitive”. A few talks this week justified using gestures because they were “natural”, and I don’t think that’s really true. There are many practical realities that mean this isn’t the case, and so long as people keep thinking of gestures as “natural”, we won’t overcome those issues. None of what I’m saying is new; we’ve known it for years. Heck, Don Norman said the same thing (about “natural user interfaces”) and made many of the same points. This post is inspired by a discussion over coffee at NordiCHI!

Why Gesture Interaction Isn’t “Natural”

The Midas Touch Problem

In gesture interaction, the Midas Touch problem is that any sensed movements may be treated as input. Since gestures are “natural” movements which we perform in everyday life, that means that everyday movements may be treated as gestures! Obviously this is undesirable. If I’m being sensed by one or more interfaces in the surrounding environment then I don’t want hand movements I use in conversation, for example, to be treated as input to some interface.

Many solutions exist for addressing the Midas Touch problem, including clutch actions. A clutch action is one which begins or ends part of an interaction. A familiar example from speech input is saying “OK Google” to activate voice input for Google Now or Google Glass. In gesture interaction, a clutch may be a particular gesture (often called an activation gesture) or body pose (like the Teapot gesture in StrikeAPose). Other alternatives include activation zones, or using some other input modality as a clutch, like pressing a button or giving a voice command.
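For illustration, here's a minimal sketch (my own, with made-up gesture names and timings) of how an activation-gesture clutch might work: everyday movement is ignored until the user performs a dedicated activation gesture, and input deactivates again after a period of inactivity.

```python
import time

ACTIVATION_GESTURE = "wave"   # illustrative name, not from any real system
SESSION_TIMEOUT_S = 5.0       # arbitrary inactivity timeout

class ClutchedGestureInput:
    """Ignores gestures until an activation gesture engages the clutch."""

    def __init__(self):
        self._active_until = 0.0

    def handle(self, gesture_name):
        now = time.monotonic()
        if gesture_name == ACTIVATION_GESTURE:
            # Clutch engaged: start (or extend) an interaction session.
            self._active_until = now + SESSION_TIMEOUT_S
            return "activated"
        if now <= self._active_until:
            # Inside an active session: treat the gesture as a command
            # and keep the session alive.
            self._active_until = now + SESSION_TIMEOUT_S
            return "command: " + gesture_name
        # Outside a session: everyday movement, not input.
        return "ignored"
```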

Regardless of how you address the Midas Touch problem, you’re moving further away from something people do “naturally”: users have to perform some action that exists only for the sake of interacting with the system.

The Address Problem

In their Making Sense of Sensing Systems paper, Bellotti et al. (2002) described the problem of how to address a sensing system. Users need to be able to direct their input towards the system they intend to interact with; that is, they must be able to address the desired interface. This is more of a problem in environments with more than one sensing interface. Given that the HCI community is using gestures in more and more contexts, it’s reasonable to assume that we’ll eventually have many gesture interfaces in our environments. We need to be able to address interfaces to avoid our movements accidentally affecting others (another variant of the Midas Touch problem).

In conversation and other human interactions, we typically address each other using cues such as body language or eye contact. This isn’t necessarily possible with sensing interfaces, as implicitly detecting the intention to interact is challenging. Instead, we’ll need more explicit interaction techniques to help users address interfaces. As with the Midas Touch problem, this is not “natural”.

The Sensing Problem

Unlike the gestures we make (unintentionally or intentionally) in everyday life, gestures for gesture input need to meet certain conditions. Depending on the sensing method, users have to perform a gesture that will be understood by the system. For example, if users must move their hand across their body from right to left then this movement must be done in a way that can be recognised. This may involve directly facing a gesture sensor, slowing the movement down, moving in a perfectly horizontal line or exaggerating aspects of the gesture. In trying to make gestures understood by sensors, users perform more rigid and forced movements. Again, this is not “natural”.

Implications for HCI

Although there are more reasons why gesture interaction should not be considered a “natural” or “intuitive” input modality, I think these are the three most important ones. All of them result in users performing hand or body movements which are very specific to interaction, much like how we speak to computers differently than we speak to other people. I think speech input is another modality which is considered “natural” but suffers from similar problems.

I’m not sure if we’ll ever solve these problems from the computer side of gesture interaction. It would be nice, but it’s asking for a lot. Instead, we should embrace the fact that gestures are not “natural” and do what we’re good at in HCI: finding solutions for overcoming problems with technology. We need to design interaction techniques which acknowledge the unnatural aspects of gesture input in order for gestures to be usable outside of our lab studies and in an intelligent world filled with sensing interfaces and devices.

NordiCHI ’14: Beyond the Switch and Nokia

At the moment I’m in Helsinki for NordiCHI 2014. Yesterday I was taking part in the Beyond the Switch workshop, where we discussed interactive lighting and interaction – both implicit and explicit – with light sources.

It was interesting to learn more about how people interact with light and hear about how others are using light in their own research and products. I’m hoping that I also brought an interesting point of view, as an “outsider” in this community. In our research we use interactive light as an output modality, exploiting the increasing connectivity of “smart” light sources. As new and existing light sources are developed with interactivity in mind, I think we’ll start to see others using light in new ways too.

In the first half of the workshop we presented our position papers and identified some interesting topics and challenges which arose in these presentations. After some discussion – and lots of post-it notes – we arranged these topics into themes. Two of the bigger themes which emerged in the workshop were (in my own words): semantics and “natural” interaction with light.

Many of the questions which emerged from the discussion were about what light actually means and how we can use light to represent information. This was relevant to our own research, as we use interactive light to encode information about gesture interaction. I think an interesting area for future research would be understanding which properties of light best represent which types of information, and what makes a “good” interactive light encoding. There is already research which has started to look at some of these design challenges, although more is needed.

In the second half of the workshop we thought more about “natural” interaction. I put quotes around the word natural because it, along with “intuitive”, is a bad word in HCI (according to Steve Brewster, anyway)! As the workshop was about interaction beyond the switch, however, there was a lot of interest in how else we could interact with light. We split up into teams and each focused on different aspects of interaction with light. My team looked at explicit control of light, whilst the other focused on implicit interaction with light. Overall, it was a fun workshop. Lots of really cool demos of Philips Hue, too!

Earlier today I went to visit Vuokko and give a talk about my PhD research at Nokia Technologies in Otaniemi. After two years of working with and being funded by Nokia, it was nice to finally visit them in Finland! The slides from my talk are available here.

Now that my workshop and presentation at Nokia are finished, I have the rest of the week to enjoy the conference. I’m looking forward to exploring Helsinki more. I’ve walked quite a lot since I got here – mostly at night – and it’s been fun to see the city. Tomorrow morning is the start of the main conference program, starting with a keynote by Don Norman.

Mobile HCI ’14: Why would I use around-device gestures?

Toronto is a fantastic city, which has made this conference so enjoyable.

At the Mobile HCI poster session I had some fantastic discussions with some great people. There’s been a lot of around-device interaction research presented at the conference this week and a lot of people who I spoke to when presenting my poster asked: why would I want to do this?

That’s a very important question, and the reason it gets asked can give some insight into when around-device gestures may or may not be useful. A lot of people said that if they were already holding their phone, they would just use the touchscreen to provide input. Others said they would raise the device to their mouth for speech input, or would even use the device itself to perform a gesture (e.g. shaking it).

In our poster and its accompanying paper, we focused on above-device gestures. We looked at a particular area of the around-device space – directly over the device – as we think this is where users are most likely to benefit from using gestures. People typically keep their phones on flat surfaces: Pohl et al. found this in their around-device device paper [link], Wiese et al. [link] found it in their CHI ’13 study, and Dey et al. [link] found it three years ago. As such, gestures are very likely to be performed over a phone lying on a surface.

Enjoying some local pilsner to wrap up the conference!

So, why would we want to gesture over our phones? My favourite example, and one which really seems to resonate with people, is using gestures to read recipes while cooking in the kitchen. Wet and messy hands, the risks of food contamination, the need for multitasking – these are all inherent parts of preparing food which can motivate using gestures to interact with mobile devices. Gestures would let me move through recipes on my phone while cooking, without having to first wash my hands. Gestures would let me answer calls while I multitask in the kitchen, without having to stop what I’m doing. Gestures would let me dismiss interruptions while I wash the dishes afterwards, without having to dry my hands.

This is just one scenario where we envisage above-device gestures being useful. Gestures are attractive for a variety of reasons in this context: touch input is inconvenient (I need to wash my hands first); touch input requires more engagement (I need to stop what I’m doing to focus); and touch input is unavailable (I need to dry my hands). I think the answer to why we would want to use these gestures is that they let us interact when other input is inconvenient. Our phones are nearby on surfaces so let’s interact with them while they’re there.

In summary, our work focuses on gestures above the device as this is where we see them being most commonly used. There are many reasons people would want to use around-device gestures but we think the most compelling ones motivate using above-device gestures.

Mobile HCI ’14: “Are you comfortable doing that?”

OCAD University, who are one of the Mobile HCI ’14 hosts, have some fantastic architecture on campus.

One of my favourite talks from the third day of Mobile HCI ’14 was Ahlstrom et al.’s paper on the social acceptability of around-device gestures [link]. In short: they asked users if they were comfortable doing around-device gestures. I think this is a timely topic because we’re now seeing around-device interfaces added to commercial smartphones: Samsung’s Galaxy S4 had hover gestures over the display, and Google’s Project Tango added depth sensors to the smartphone form factor. Now that we’ve established ways of detecting around-device gestures, I feel it’s time to look at what around-device gestures should be and whether users are willing to use them.

In Ahlstrom’s paper, which was presented excellently by Pourang Irani, they ran three studies looking at different aspects of the social acceptability of around-device gestures. They looked mainly at gesture mechanics: gesture size, gesture duration, position relative to the device and distance from it. When asked if they were comfortable doing gestures, users were most happy to gesture near the device (biased towards the side of their dominant hand) and found shorter interactions more acceptable.

They also looked at how spectators perceived these gestures, by opportunistically asking onlookers what they thought of someone who was using gestures nearby. What surprised me was that spectators found around-device gestures acceptable in a wider variety of social situations than the users in the first studies did. Does seeing other people perform gestures make that kind of gesture input seem more acceptable?

Tonight I presented my poster [paper link] on our design studies for above-device gesture design. There were some similarities between our work and Ahlstrom’s; purely by coincidence, we both asked users if they were comfortable and willing to use certain gestures. However, we focused on what the gestures were, whereas they focused on other aspects of gesturing (e.g. gesture duration).

In our poster and paper we present design recommendations for creating around-device interactions which users think are more usable and more acceptable. I think the next big step for around-device research is looking at how to map potential gestures to actions and identifying ways of making around-device input better. My PhD research is focusing on the output side of things, looking at how we can design feedback to help users as they gesture using the space near devices. If you saw my poster tonight or had a chat with me, there’s more about the research in our poster here; tonight was fun so thanks for stopping by!

Mobile HCI ’14: Using Ordinary Surfaces for Interaction

Mobile HCI ’14 day one: a wee bit of Toronto and Henning Pohl’s idea of around-device devices.

Today was the first day of the papers program at Mobile HCI ’14, and amongst the great talks was one I particularly liked on the idea of “around-device devices” by Pohl et al. [link]. I’ve written before about around-device interaction, above-device interaction, and how the space around mobile devices can be used for gesturing. What’s novel about around-device devices, however, is that interaction in the around-device space is not limited to free-hand gestures relative to the device; instead, nearby objects can become potential inputs in the user interface. One of the motivations for using nearby objects for interaction is that mobile devices are very commonly kept on surfaces – tables, desks, kitchen worktops – which are also used for storing objects. In the post title I call these ordinary surfaces to distinguish the idea from interactive surfaces.

The example Henning Pohl gives in the paper title is “my coffee mug is a volume dial”. I think this example captures the idea of around-device devices well: mugs, being cylindrical objects, afford certain interactions – in this case, being turned. There’s implicit physical feedback from interacting with a tangible object, which could make interaction easier. Using nearby objects also provides many of the benefits that around-device gestures give: a larger interaction space, unoccluded content on the device screen, the potential for more expressive input, and so on.

Exploring Toronto during today’s lunch break. Looking out across downtown from the Blue Jays’ stadium.

Another interesting paper from today was about Toffee, by Xiao et al. [link]. Sticking with the around-device interaction theme, they looked at whether piezo actuators could be used to localise taps and knocks on surrounding table surfaces. Like around-device devices, this is another way of making use of nearby ordinary surfaces for input. They found that taps could be localised most reliably when made with harder objects, like touch styluses or knuckles; softer points, like fingertips, were more difficult to localise. Because of the characteristics of the tap localisation approach, Toffee would be ideal for radial input around devices.

I like both of these papers because they push the around-device interaction space a little beyond mid-air free-hand gestures, in both cases using ordinary surfaces as part of the interaction. I know this has been done before with interfaces like SideSight and Qian Qin’s Dynamic Ambient Lighting for Mobile Devices, but I think it’s important that others are exploring this space further.

Mobile HCI ’14 Poster

This time next week I’ll be boarding a plane to fly to Toronto for Mobile HCI! I’ll be presenting a poster there on above-device gesture design and I’m also participating in the doctoral consortium. I’ve set up a page to accompany my poster and demonstrate our above-device gestures: see here. My poster is also finished, printed and ready to go!

Mobile HCI ’14 Poster