PyOpenNI and OpenCV

In my last post I gave an example of how to use OpenKinect to get a depth stream which can be manipulated with OpenCV in Python. This example uses PyOpenNI instead, which is more powerful as it exposes useful OpenNI functionality.

The gist of this example is that PyOpenNI abstracts the depth map behind a DepthMap object and for OpenCV we need to turn that into a numpy array. As DepthMap can be iterated over, the most obvious solution is to just call numpy.asarray with the depth map; however, I’ve found that to be far too slow for practical use.

A much faster way is to use the older raw depth stream functions from PyOpenNI’s DepthGenerator class. These functions – get_raw_depth_map and get_raw_depth_map_8 – return byte strings of 16 and 8 bits per pixel, respectively. Numpy can create an array from these byte strings significantly faster than just calling numpy.asarray with the DepthMap object.

from openni import *
import numpy as np
import cv2

# Initialise OpenNI
context = Context()
context.init()

# Create a depth generator to access the depth stream
depth = DepthGenerator()
depth.create(context)
depth.set_resolution_preset(RES_VGA)
depth.fps = 30

# Start Kinect
context.start_generating_all()
context.wait_any_update_all()

# Create array from the raw depth map string
frame = np.fromstring(depth.get_raw_depth_map_8(), "uint8").reshape(480, 640)

# Render in OpenCV
cv2.imshow("image", frame)

In this example a single frame is taken from the Kinect depth stream and turned into an OpenCV-compatible numpy array. This could be useful as a starting point for a Kinect finger tracker because it allows OpenNI’s skeleton and hand tracking functionality to be used alongside OpenCV. As an example, OpenNI could be used to track current hand position so that OpenCV knows which region of the depth map (or rgb stream – this code is easily modifiable to use that instead) contains a hand. Computer vision techniques could then be used to look for fingers in that region – removing the need to try and segment using colour when looking for hands in a normal rgb image.