OpenKinect Python and OpenCV

I’ve spent the past day or so messing around with Kinect and OSX, trying to find a nice combination of libraries and drivers which works well – a more difficult task than you’d imagine! Along the way I’ve found that a lot of these libraries have poor or no documentation.

Here I’m sharing a little example of how I got OpenKinect and OpenCV working together in Python. The Python wrapper for OpenKinect gives depth data as a numpy array which conveniently is the datatype used in the cv2 module.

import freenect
import cv2
import numpy as np

"""
Grabs a depth map from the Kinect sensor and creates an image from it.
"""
def getDepthMap():	
	depth, timestamp = freenect.sync_get_depth()

	np.clip(depth, 0, 2**10 - 1, depth)
	depth >>= 2
	depth = depth.astype(np.uint8)

	return depth

while True:
	depth = getDepthMap()

	blur = cv2.GaussianBlur(depth, (5, 5), 0)

	cv2.imshow('image', blur)
	cv2.waitKey(10)

Here the getDepthMap function takes the depth map from the Kinect sensor, clips the array so that the maximum depth is 1023 (effectively removing distance objects and noise) and turns it into an 8 bit array (which OpenCV can render as grayscale). The array returned from getDepthMap can be used like a grayscale OpenCV image – to demonstrate I apply a Gaussian blur. Finally, imshow renders the image in a window and waitKey is there to make sure image updates actually show.

This is by no means a comprehensive guide to using freenect and OpenCV together but hopefully it’s useful to someone as a starting point!

Heuristics for smarter Leap Motion tracking

Leap Motion is quite easy to develop for but I find that hand and finger tracking can be tricky – especially for more precise gesture control. I’ve been doing some development with Leap that requires precise positioning (~mm accuracy needed) with multiple fingers. In this post I run through a few strategies I use to try and improve my Leap applications.

1) Track hand and finger identifiers

Each Finger and Hand object has a unique identifier which should remain consistent between frames, so long as the object continues to be tracked. If you can make a reasonable assumption about which finger(s) the user is intending to use for interaction, you can look for a specific identifier to find the finger(s) of interest, ignoring anything else in the frame. I find that when extending their index finger for interaction, most users tend to hold their thumb out unwittingly; which Leap detects, of course.

I use a simple heuristic to try address this issue: pick the finger with the smallest z-coordinate (furthest forward) and track its identifier in future frames. When I want to track multiple fingers for interaction, I make sure those fingers are attached to the hand whose identifier I’m tracking. This rejects other unattached fingers in the field of view, e.g. the cylindrical lights in my office at university! (Seriously…)

2) Check finger lengths

Sometimes Leap sees your knuckles in a closed fist and reports them as fingers. The looser the grip, the worse the problem. Checking that the length of each finger is above a threshold length can help you reject misclassified knuckles and clenched fingers. A threshold of 2 cm or thereabouts should be fine; any longer and you risk rejecting small fingers or fingers which are fully extended but Leap just isn’t tracking properly.

3) Look at direction vectors for unlikely values

The biggest problem I have with Leap is that it picks up fingers in a closed fist. Looking at the debug visualiser, I tend to see fingers pointing upwards through the fist, rather than an accurate representation of a finger in a clenched fist. These “phantom” fingers tend to have extreme y values in the direction vector (> 0.6). Through observation of some of my users I’ve also noticed that when using a certain hand pose for input, “inactive” fingers are left to hang down; which Leap picks up perfectly. These fingers typically point downwards, with a high y value in the direction vector (< -0.6).

A heuristic I use to reject “phantom” fingers and fingers the user probably doesn’t want to use for input is to take the absolute y value from the direction vector and ignore the finger if this value exceeds a threshold. I use a threshold around 0.4. A limitation of this heuristic is that the user has to keep their hand quite flat. To address this, you could instead measure finger direction relative to the hand orientation and then apply the threshold.

Summary

Sometimes Leap gives crap data and we also can’t expect our users to keep a perfectly clenched fist. A few simple checks on each frame update can help us reject fingers which either don’t exist or weren’t intended to be part of the interaction.

Leaping into motion

leapmotionMy Leap Motion arrived today. I’ve been working with a development version of the hardware for a couple of months now but I was excited to see what improvements the retail version brought. Other than the obvious aesthetic differences (I’m a fan of the brushed aluminium look!), the retail version has a wider field of view. While this doesn’t affect use of the device much, it does mean that it’s less likely to lose sight of your hands when you move too far away whilst close to the desk. Based on the diagnostic visualiser, it also seems to be more accurate and deals with hand rotation better (although this is likely a software rather than hardware improvement).

Having tried some of the apps in the Airspace app store I’m surprised at the large variety of pointing and selection mechanisms being used by developers. Using the finger as a pointer to control the cursor seems easy enough (although it quickly becomes fatiguing after extended periods!), and most developers seem consistent in how they implement this. Getting the control:display gain correct is going to be a key challenge, as some apps require only the most subtle of finger movements to send the cursor flying accidentally. Smoothing the cursor position (e.g. using a Kalman filter) would help to deal with cursor jitter and unexpected movement.

There are a variety of selection methods being used, including finger poke and tap (two gestures recognised by the base SDK), dwell, and opening the hand. The latter selection gesture is used in the Jazz Hands game and works surprisingly well. While pointing to control a cursor, the user can make a selection by opening their hand up. While there is a little inadvertent movement of the cursor as a result of the gesture, it doesn’t require any direct movement of the pointing finger.

I’m not a fan of the poke and tap gestures but that might just be because I’ve yet to try a good implementation of them. I’ve implemented my own recogniser for tap gestures a couple of weeks ago and know how challenging it can be to get it feeling “right”.

Dwell selection is generally very accurate and I think that this is going to be one of the best selection methods for Leap going forward. This is largely because it allows the user to leap into an app (a terrible pun, I know) without having to figure out how to make selections. The immediacy of the visual feedback makes it obvious that by remaining over a button, something is taking place.

I’m excited about the future of Leap and other gesture interfaces because they offer the potential to create new ways of interacting with computers. It’s not the future, but it’s a step in the right direction.

Balanced Latin Squares in C# and R

For a recent experiment I used the method presented here to generate a balanced latin square for selecting participant condition orders. Rather than do this manually, I wrote a couple of functions to do this automatically for squares of any size. Here’s an implementation in both C# and R!

C#

public static int[,] GetLatinSquare(int n)
{
    // 1. Create initial square.
    int[,] latinSquare = new int[n, n];

    // 2. Initialise first row.
    latinSquare[0, 0] = 1;
    latinSquare[0, 1] = 2;

    for (int i = 2, j = 3, k = 0; i < n; i++)
    {
        if (i % 2 == 1)
            latinSquare[0, i] = j++;
        else
            latinSquare[0, i] = n - (k++);
    }

    // 3. Initialise first column.
    for (int i = 1; i <= n; i++)
    {
        latinSquare[i - 1, 0] = i;
    }

    // 4. Fill in the rest of the square.
    for (int row = 1; row < n; row++)
    {
        for (int col = 1; col < n; col++)
        {
            latinSquare[row, col] = (latinSquare[row - 1, col] + 1) % n;

            if (latinSquare[row, col] == 0)
                latinSquare[row, col] = n;
        }
    }

    return latinSquare;
}

R

LatinSquare <- function(n) {
  # 1. Create initial square.
  sq <- matrix(0, n, n)

  # 2. Initialise first row.
  sq[1, 1] <- 1
  sq[1, 2] <- 2

  j <- 3
  k <- 0

  for (i in 3:n) {
    if (i %% 2 == 0) {
      sq[1, i] <- j
      j <- j + 1
    } else {
      sq[1, i] <- n - k
      k <- k + 1
    }
  }

  # 3. Initialise first column.
  for (i in 2:(n+1)) {
    sq[i - 1, 1] <- i - 1
  }

  # 4. Fill in the rest of the square.
  for (row in 2:n) {
    for (col in 2:n) {
      sq[row, col] <- (sq[row - 1, col] + 1) %% n

      if (sq[row, col] == 0) {
        sq[row, col] = n
      }
    }
  }

  return (sq)
}

 

Method Profiling in Android

I’ve recently been using the Android implementation of OpenCV for real-time computer vision on mobile devices. Computer vision is computationally expensive – especially when you’re working with a camera stream in real-time. In trying to speed up my object tracking algorithm I used Android’s method profiler to analyse the time spent in each function, hoping to identify potential areas for optimisation. This makes an interesting little case study and example of how to use Android’s profiling tools.

How do I enable profiling?
Traceview is part of the Eclipse ADT. Whilst in the DDMS perspective, method profiling can be enabled by selecting a debuggable process and clicking the button circled below. To stop profiling, click the button again. After the profiler is stopped, a Traceview window will appear.


Interpreting Traceview output


The image above was my first method trace, capturing around seven seconds of execution and thousands of method invocations. Each row in the trace corresponds to a method (ordered by CPU usage by default). Selecting a row expands that method, showing all methods invoked from within that method. Again, these are ordered by their CPU usage.

Optimisation using profile data
Using the above example, we can see that my object tracking algorithm spends most of its time waiting for four methods to return: Imgproc.pyrDown, MainActivity.blobUpdate, Imgproc.cvtColor and VideoCapture.retrieve. The pyrDown method downsamples an image matrix whilst applying a Gaussian blur filter. The blobUpdate method is a callback I use to give updates on a tracked object. The cvtColor method converts the values in a matrix to those of another colour space. The retrieve method captures a frame from the device camera.

The latter two methods are crucial to my object tracking algorithm, as I need to call retrieve to get images from the camera and cvtColor is used to convert from RGB to HSV colour space, as it is better to perform colour thresholding this way. The former two, however, can potentially be optimised.

From this trace I’ve already identified a redundant yet expensive method call: pyrDown. 30% of the time spent in the processFrame method is spent waiting for pyrDown to return. I was using this function to downsample an image from the camera to 240×320, as a smaller image can be processed faster. Instead, this call can be eliminated by requesting 240×320 images from the camera.

In the blobUpdate method I send updates about the location of the tracked object and its size. I maintain a short history of these readings and use dynamic time warping to detect gesture input. By expanding the trace for this method I see that my gesture classification function is taking the most time to execute. As dynamic time warping, by design, finds alignments between sequences of different lengths, I can reduce the frequency of checking for gestures. By only checking for gestures in every second call of blobUpdate, I effectively half the amount of time spent checking for gestures. This still maintains a high recognition rate by virtue of dynamic time warping’s resilience to differences in alignment length.

Conclusion
The case study in this post demonstrates how method profiling can be used to identify potential areas for optimisation; something which can be particularly beneficial in a computationally expensive application. By profiling a few seconds of execution of a computer vision algorithm I was able to capture data about thousands of method invocations. From the trace data I identified a redundant method call which accounted for 30% of my algorithm’s execution time and identified an optimisation to the second most expensive method call.

Multimodal Android Development Part 1

This post is the first of two which gives a brief introduction to creating multimodal interactions in Android applications. I’ll briefly cover some of the SDK features available to you as an Android developer which you can use to create richer interactions in your apps. Example code will be quite concise because I assume you have at least a basic knowledge of Android development. Feel free to leave any comments suggesting how I can better explain these concepts, or to let me know if I’ve made any mistakes or omissions.

What is “multimodal” interaction?

Multimodal interaction, put simply, is interaction involving more than one modality (e.g. multiple senses). For example, an application may provide a combination of visual and haptic (touch) feedback. These types of interaction design provide a number of benefits, for example allowing those with sensory impairment to interact using other senses, or allowing interaction in contexts where one sense may be otherwise occupied.

One of the most ubiquitous examples of a multimodal interaction is the way in which mobile phones combine visual, audible and haptic feedback to inform users of a new text, phone call, etc. This combination of modalities is particularly useful when your phone is, say, in your pocket. Obviously you can’t see the phone, but you will probably feel the phone vibrate or hear your ringtone as new notifications appear.

Haptic feedback in Android

Most handheld Android devices have some sort of rotation motor in them allowing simple haptic feedback. Although not common in tablets (largely due to size constraints), all modern Android phones will have tactile feedback available. You can control the phone vibrator through the Vibrator class. Note that in order to use this, your Manifest must request the following permission: android.permission.VIBRATE

/* Request the device's vibrator service. Remember to check
 * for null return value, in case this isn't available. */
Vibrator vibrator = (Vibrator) getSystemService(Context.VIBRATOR_SERVICE);

/* Two ways to control the vibrator:
 *  1. Turn on for a specific time
 *  2. Provide a vibration pattern */

/* 1. Vibrate for 200ms */
vibrator.vibrate(200);

/* 2. Vibrate for 200ms, pause for 100ms, vibrate for 300ms. */
long[] pattern = new long[] {0, 200, 100, 300};

/* Perform this pattern once only (repeat := -1). */
vibrator.vibrate(pattern, -1);

/* Vibrate for 200ms, followed by indefinite repeat of
 * 100ms pause followed by 300ms vibrate. Setting
 * repeat := 2 tells the vibrator to repeat at offset
 * 2 into the vibration pattern. */
vibrator.vibrate(pattern, 2);

 

Touchscreen gestures

Using touchscreen gestures to interact with applications can be fun, efficient and useful when users may be unable to select a particular action on the screen. For example, it can be difficult to select a button on-screen when running or walking. A touch gesture, however, is a lot easier and requires less precision from the user. The disadvantage with touch gestures is that if not used sparingly, there may be too much for the user to remember!
Creating a set of gestures for your application is simple: create a gesture library on an Android Virtual Device using the Gesture Builder application (available on the AVD by default) and add a GestureOverlayView to your activity layout. In your activity, you just have to load the gesture library from your resources and implement an OnGesturePerformedListener.

 

private GestureLibrary mLibrary;

public void onCreate(Bundle savedInstanceState) {
  ...
  /* 1. Load gesture library from the res/raw/gestures file */
  mLibrary = GestureLibraries.fromRawResource(this, R.raw.gestures);

  if (!mLibrary.load())
    /* Error: unable to load from resources! */
    ...

  /* 2. Find reference to the gesture overlay view */
  GestureOverlayView gov = (GestureOverlayView) findViewById(R.id.gestureOverlay);

  /* 3. Register callback for gesture input */
  gov.addOnGesturePerformedListener(this);
}

The callback method for gesture performance receives a Gesture as an argument. This can be used to obtain a list of predictions: which gestures in your library that Android thought the gesture was. With these predictions, you can use the prediction score (or contextual information) to determine which gesture the user was most likely to have performed. I find it useful to define a threshold for gesture acceptance, so that you can reject erroneous or inaccurate gestures. The best way to choose this threshold value is through trial and error: see what works for you and your gestures.

private static final double ACCEPTANCE_THRESHOLD = 10.0;

public void onGesturePerformed(GestureOverlayView overlay, Gesture gesture) {
  /* 1. Get list of gesture predictions */
  ArrayList predictions = mLibrary.recognize(gesture);

  if (predictions.size() > 0) {
    /* 2. Find highest scoring prediction */
    Prediction bestPrediction = predictions.get(0);

    for (int i = 1; i < predictions.size(); i++) {
      Prediction p = predictions.get(i);
      if (p.score > bestPrediction.score)
        bestPrediction = p;
    }

    /* 3. Decide if we'll accept this gesture */
    if (bestPrediction.score > ACCEPTANCE_THRESHOLD)
      gestureAccepted(bestPrediction.name);
  }
}

private void gestureAccepted(String gestureName) {
  /* Respond appropriately to the gesture name */
  ...
}

 

Saving map images in Android

Recently I’ve been working on a little Android project and wanted to save thumbnail images of a map within the application. This post is just sharing how to do exactly that. Nothing too complicated. 

public class MyMapActivity extends MapActivity {
    private MapView mapView;

    ...

    private Bitmap getMapImage() {
        /* Position map for output */
        MapController mc = mapView.getController();
        mc.setCenter(SOME_POINT);
        mc.setZoom(16);

        /* Capture drawing cache as bitmap */
        mapView.setDrawingCacheEnabled(true);
        Bitmap bmp = Bitmap.createBitmap(mapView.getDrawingCache());
        mapView.setDrawingCacheEnabled(false);

        return bmp;
    }

    private void saveMapImage() {
        String filename = "foo.png";
        File f = new File(getExternalFilesDir(null), filename);
        FileOutputStream out = new FileOutputStream(f);

        Bitmap bmp = getMapImage();

        bmp.compress(Bitmap.CompressFormat.PNG, 100, out);

        out.close();
    }
}

In the getMapImage method, we’re telling the map controller to move to a particular point (this may not matter to you, you may just want to take the image as it appears) and zooming in to show a sufficient level of detail. Then a Bitmap is created from the map view’s drawing cache. The saveMapImage method is just an example of how you may want to save an image to the application’s external file directory.

Left-recursion in Parsec

Lately I’ve been using the Parsec library for Haskell to write a parser and interpreter for a university assignment. Right-recursive grammars are trivial to parse with combinatorial parsers; tail recursion and backtracking make this simple. However, implementing a left-recursive grammar will often result in an infinite loop, as is the case in Parsec when using basic parsers. 

Parsec does support left-recursion however. Unsatisfied with the lack of good tutorials when I googled for advice, I decided to write this. Hopefully it helps someone. If I can make this better or easier to understand, please let me know!

Left recursive parsing can be achieved in Parsec using chainl1.

chainl1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a

As an example of how to use chainl1, I’ll demonstrate its use in parsing basic integer addition and subtraction expressions.

First we’ll need an abstract syntax tree to represent an integer expression. This can represent a single integer constant, or an addition / subtraction operation which involves two integer expressions.

data IntExp = Const Int
            | Add IntExp IntExp
            | Sub IntExp IntExp

If addition and subtraction were to be right-associative, we’d parse the left operand as a constant, and attempt to parse the right operand as another integer expression. Upon failing, we’d backtrack and instead attempt to parse an integer constant. Reversing this approach to make the expressions left-associative would cause infinite recursion; we’d attempt to parse the left operand as an integer expression, which attempts to parse the left operand as an integer expression, which tries to… you get the point.

Instead we use chainl1 with two parsers; one to parse an integer constant, and another which parses a symbol and determines if the expression is an addition or subtraction.

parseIntExp :: Parser IntExp
parseIntExp =
  chainl1 parseConstant parseOperation

parseOperation :: Parser (IntExp -> IntExp -> IntExp)
parseOperation =
  do spaces
     symbol <- char '+' <|> char '-'
     spaces
     case symbol of
       '+' -> return Add
       '-' -> return Sub

parseConstant :: Parser IntExp
parseConstant =
  do xs <- many1 digit
     return $ Const (read xs :: Int)

Here, parseOperation returns either the Add or Sub tag of IntExp. Using GHCi, you can confirm the type of Add as:

Add :: IntExp -> IntExp -> IntExp

So, we have a parser which will parse a constant and a parser which will parse a symbol and determine what type of operation an expression is. In parseIntExp, chainl1 is the glue which brings these together. This is what allows left-associative parsing without infinitely recursing.

A complete code sample is available here. The abstract syntax tree has been created as an instance of the Show typeclass to print in a more readable format, which shows that the grammar is indeed left-associative.

ghci>  run parseIntExp "2 + 3 - 4"
((2+3)-4)

Finding maximum zero submatrix with C#

After spending most of this evening attempting to solve this problem, it only feels right that I share my solution.

Suppose you have a binary matrix and you wish to find the largest zero submatrix, i.e. the largest rectangle of zeroes in the matrix (see below, highlighted in orange).

1 1 0 0 0 1 0
1 0 0 0 0 1 1
1 1 0 1 0 1 1
1 1 1 0 1 1 0

The brute-force approach to this problem isn’t particularly efficient, with complexity O(n2m2). It involves looking at every position in the matrix, taking that position as an arbitrary point of the submatrix (e.g. the top-left point) and then testing all possible rectangles which originate at that point (e.g. all possible bottom-right points).

This algorithm finds the largest zero submatrix by looking at each position in turn and attempting to “grow” the submatrix up, to the left, and to the right. A dynamic programming approach is used and it keeps track of where the value 1 occurs. Array d contains the previous row where a 1 was found for each column. Array d1 contains the position of the left borders, and d2 contains the position of the right borders.

A complete code snippet for this program is available here. This has code to output the results and shows how to make use of the values calculated in this function.

static void MaxSubmatrix(int[,] matrix)
{
 int n = matrix.GetLength(0); // Number of rows
 int m = matrix.GetLength(1); // Number of columns

 int maxArea = -1, tempArea = -1;

 // Top-left corner (x1, y1); bottom-right corner (x2, y2)
 int x1 = 0, y1 = 0, x2 = 0, y2 = 0;

 // Maximum row containing a 1 in this column
 int[] d = new int[m];

 // Initialize array to -1
 for (int i = 0; i < m; i++)
 {
  d[i] = -1;
 }

 // Furthest left column for rectangle
 int[] d1 = new int[m];

 // Furthest right column for rectangle
 int[] d2 = new int[m];

 Stack stack = new Stack();

 // Work down from top row, searching for largest rectangle
 for (int i = 0; i < n; i++)
 {
  // 1. Determine previous row to contain a '1'
  for (int j = 0; j < m; j++)
  {
   if (matrix[i,j] == 1)
   {
    d[j] = i;
   }
  }

  stack.Clear();

  // 2. Determine the left border positions
  for (int j = 0; j < m; j++)
  {
   while (stack.Count > 0 && d[stack.Peek()] <= d[j])
   {
    stack.Pop();
   }

   // If stack is empty, use -1; i.e. all the way to the left
   d1[j] = (stack.Count == 0) ? -1 : stack.Peek();

   stack.Push(j);
  }

  stack.Clear();

  // 3. Determine the right border positions
  for (int j = m - 1; j >= 0; j--)
  {
   while (stack.Count > 0 && d[stack.Peek()] <= d[j])
   {
    stack.Pop();
   }

   d2[j] = (stack.Count == 0) ?  m : stack.Peek();

   stack.Push(j);
  }

  // 4. See if we've found a new maximum submatrix
  for (int j = 0; j < m; j++)
  {
   // (i - d[j]) := current row - last row in this column to contain a 1
   // (d2[j] - d1[j] - 1) := right border - left border - 1
   tempArea = (i - d[j]) * (d2[j] - d1[j] - 1);

   if (tempArea > maxArea)
   {
    maxArea = tempArea;

    // Top left
    x1 = d1[j] + 1;
    y1 = d[j] + 1;

    // Bottom right
    x2 = d2[j] - 1;
    y2 = i;
   }
  }
 }
}