PyOpenNI and OpenCV

In my last post I gave an example of how to use OpenKinect to get a depth stream which can be manipulated with OpenCV in Python. This example uses PyOpenNI instead, which is more powerful as it exposes useful OpenNI functionality.

The gist of this example is that PyOpenNI abstracts the depth map behind a DepthMap object and for OpenCV we need to turn that into a numpy array. As DepthMap can be iterated over, the most obvious solution is to just call numpy.asarray with the depth map; however, I’ve found that to be far too slow for practical use.

A much faster way is to use the older raw depth stream functions from PyOpenNI’s DepthGenerator class. These functions – get_raw_depth_map and get_raw_depth_map_8 – return byte strings of 16 and 8 bits per pixel, respectively. Numpy can create an array from these byte strings significantly faster than just calling numpy.asarray with the DepthMap object.
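As a minimal sketch of that conversion, the snippet below builds an array from a 16-bit byte string with numpy.frombuffer. It fabricates a byte string in place of the value returned by get_raw_depth_map so that it runs without a Kinect attached. The 640×480 frame size, the native byte order and the helper name raw_depth_to_array are my assumptions rather than anything PyOpenNI guarantees.

```python
import numpy as np

# Assumed frame size; a real application should ask the generator for its resolution.
WIDTH, HEIGHT = 640, 480


def raw_depth_to_array(raw, width=WIDTH, height=HEIGHT):
    """Interpret a 16-bit-per-pixel byte string as a (height, width) array."""
    depth = np.frombuffer(raw, dtype=np.uint16)
    # frombuffer on a bytes object gives a read-only view; call .copy()
    # on the result if the array needs to be modified in place.
    return depth.reshape(height, width)


# Stand-in for the byte string returned by get_raw_depth_map().
fake_raw = (np.arange(WIDTH * HEIGHT) % 2048).astype(np.uint16).tobytes()
depth = raw_depth_to_array(fake_raw)
```

Because frombuffer reinterprets the existing bytes instead of visiting each pixel from Python, it avoids the per-element overhead of iterating over the DepthMap object.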

In this example a single frame is taken from the Kinect depth stream and turned into an OpenCV-compatible numpy array. This could be useful as a starting point for a Kinect finger tracker because it allows OpenNI’s skeleton and hand tracking functionality to be used alongside OpenCV. As an example, OpenNI could be used to track the current hand position so that OpenCV knows which region of the depth map (or RGB stream – this code is easily modifiable to use that instead) contains a hand. Computer vision techniques could then be used to look for fingers in that region – removing the need to segment by colour when looking for hands in a normal RGB image.
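To illustrate the idea of restricting OpenCV to the region around a hand, here is a small sketch that crops a square window out of a depth array. The crop_region helper, the window size and the hand position are all hypothetical, and the position is assumed to have already been converted to pixel coordinates.

```python
import numpy as np


def crop_region(image, cx, cy, half_size):
    """Return the square window around (cx, cy), clamped to the image bounds."""
    height, width = image.shape[:2]
    x0, x1 = max(cx - half_size, 0), min(cx + half_size, width)
    y0, y1 = max(cy - half_size, 0), min(cy + half_size, height)
    return image[y0:y1, x0:x1]


# Blank frame standing in for a real depth map.
depth = np.zeros((480, 640), dtype=np.uint16)
hand_x, hand_y = 600, 20  # hypothetical hand position in pixels
hand_region = crop_region(depth, hand_x, hand_y, half_size=50)
```

The slice is a view onto the original array, so no pixel data is copied and any finger detection only has to look at the smaller window.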

OpenKinect Python and OpenCV

I’ve spent the past day or so messing around with Kinect and OSX, trying to find a nice combination of libraries and drivers which works well – a more difficult task than you’d imagine! Along the way I’ve found that a lot of these libraries have poor or no documentation.

Here I’m sharing a little example of how I got OpenKinect and OpenCV working together in Python. The Python wrapper for OpenKinect gives depth data as a numpy array which conveniently is the datatype used in the cv2 module.

Here the getDepthMap function takes the depth map from the Kinect sensor, clips the array so that the maximum depth is 1023 (effectively removing distant objects and noise) and turns it into an 8 bit array (which OpenCV can render as grayscale). The array returned from getDepthMap can be used like a grayscale OpenCV image – to demonstrate I apply a Gaussian blur. Finally, imshow renders the image in a window and waitKey is there to make sure image updates actually show.
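The clipping and 8 bit conversion can be sketched with numpy alone. This is an approximation of the step described above, not the original listing: depth_to_gray is a hypothetical helper, and a hand-made array stands in for the frame that would come from the sensor.

```python
import numpy as np


def depth_to_gray(depth):
    """Clip depth values to 10 bits, then scale them down to 8 bits."""
    clipped = np.clip(depth, 0, 2**10 - 1)  # anything beyond 1023 becomes 1023
    return (clipped >> 2).astype(np.uint8)  # 0..1023 maps onto 0..255


# Hand-made values standing in for a depth frame from the sensor.
fake_depth = np.array([[0, 512, 1023, 2047]], dtype=np.uint16)
gray = depth_to_gray(fake_depth)
```

With a real frame, the returned array could then be passed to something like cv2.GaussianBlur(gray, (5, 5), 0) and cv2.imshow, where the 5×5 kernel is just an illustrative choice.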

This is by no means a comprehensive guide to using freenect and OpenCV together but hopefully it’s useful to someone as a starting point!

Method Profiling in Android

I’ve recently been using the Android implementation of OpenCV for real-time computer vision on mobile devices. Computer vision is computationally expensive – especially when you’re working with a camera stream in real-time. In trying to speed up my object tracking algorithm I used Android’s method profiler to analyse the time spent in each function, hoping to identify potential areas for optimisation. This makes an interesting little case study and example of how to use Android’s profiling tools.

How do I enable profiling?
Traceview is part of the Eclipse ADT. Whilst in the DDMS perspective, method profiling can be enabled by selecting a debuggable process and clicking the button circled below. To stop profiling, click the button again. After the profiler is stopped, a Traceview window will appear.

Interpreting Traceview output

The image above was my first method trace, capturing around seven seconds of execution and thousands of method invocations. Each row in the trace corresponds to a method (ordered by CPU usage by default). Selecting a row expands that method, showing all methods invoked from within that method. Again, these are ordered by their CPU usage.

Optimisation using profile data
Using the above example, we can see that my object tracking algorithm spends most of its time waiting for four methods to return: Imgproc.pyrDown, MainActivity.blobUpdate, Imgproc.cvtColor and VideoCapture.retrieve. The pyrDown method downsamples an image matrix whilst applying a Gaussian blur filter. The blobUpdate method is a callback I use to give updates on a tracked object. The cvtColor method converts the values in a matrix to those of another colour space. The retrieve method captures a frame from the device camera.

The latter two methods are crucial to my object tracking algorithm: I need to call retrieve to get images from the camera, and cvtColor converts each frame from RGB to HSV colour space, where colour thresholding tends to be more reliable because hue is separated from brightness. The former two, however, can potentially be optimised.

From this trace I’ve already identified a redundant yet expensive method call: pyrDown. 30% of the time spent in the processFrame method is spent waiting for pyrDown to return. I was using this function to downsample an image from the camera to 240×320, as a smaller image can be processed faster. Instead, this call can be eliminated by requesting 240×320 images from the camera.

In the blobUpdate method I send updates about the location of the tracked object and its size. I maintain a short history of these readings and use dynamic time warping to detect gesture input. By expanding the trace for this method I see that my gesture classification function is taking the most time to execute. As dynamic time warping, by design, finds alignments between sequences of different lengths, I can reduce the frequency of checking for gestures. By only checking for gestures in every second call of blobUpdate, I effectively halve the amount of time spent checking for gestures. This still maintains a high recognition rate by virtue of dynamic time warping’s resilience to differences in alignment length.
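The idea of checking on every second update can be sketched as follows. This is written in Python for brevity rather than in the Java of the app, it is not the code that was profiled, and it simplifies each reading to a single number. The GestureChecker class, its threshold and its history size are hypothetical; dtw_distance is the textbook dynamic time warping recurrence.

```python
from collections import deque


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j - 1]
                                    cost[i][j - 1],      # repeat a[i - 1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[len(a)][len(b)]


class GestureChecker:
    """Keeps a short history of readings and classifies on every second update."""

    def __init__(self, template, threshold, history_size=8):
        self.template = template
        self.threshold = threshold
        self.history = deque(maxlen=history_size)
        self.calls = 0

    def blob_update(self, reading):
        self.history.append(reading)
        self.calls += 1
        if self.calls % 2 != 0:
            return None  # skip the expensive comparison on odd-numbered calls
        return dtw_distance(list(self.history), self.template) <= self.threshold


checker = GestureChecker(template=[1, 2, 3], threshold=0.5)
results = [checker.blob_update(r) for r in [1, 2, 2, 3]]
```

The history still receives every reading, so nothing is lost between checks; only the comparison against the template is skipped, and the warping absorbs the extra sample in the sequence.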

The case study in this post demonstrates how method profiling can be used to identify potential areas for optimisation; something which can be particularly beneficial in a computationally expensive application. By profiling a few seconds of execution of a computer vision algorithm I was able to capture data about thousands of method invocations. From the trace data I identified a redundant method call which accounted for 30% of my algorithm’s execution time and identified an optimisation to the second most expensive method call.