How many methods can I use to acquire depth data?

I am a newbie in robotics. As far as I know, there are generally two ways to acquire depth information about a scene. One is the stereo vision method, which uses two cameras. The other is RGBD sensors such as the Kinect and PrimeSense. Both methods seem to be in use currently, but I do not know what their advantages are over each other.

I think the Kinect would be a perfect replacement for stereo vision if we ignore its expense. So I have two questions here:

  1. Are there advantages of binocular methods over the Kinect besides expense?
  2. As far as I know, both methods are limited to a fairly short detection range. In real-world applications, we sometimes also need depth data at a distance. Is there a method to acquire or estimate depth information at a far distance (not considering laser-based detectors)?

Furthermore, my application may be a small flight vehicle. Which method and equipment should I choose? Will a traditional binocular camera be too slow for my application?

4 Answers

  • Actually, Kinect sensors are similar to stereo cameras. They are made up of two devices, an IR emitter and an IR receiver. The emitter projects a light-dot pattern of known shape and size, while the other device receives the actual reflected dots. From the deformation of the pattern it is possible to recover depth, and the math behind it, as far as I know, is the same as stereo triangulation with known correspondences (see this video as an example).

    The main disadvantages of the Kinect when compared to stereo-camera systems are:

    1. it does not work outdoors
    2. it has a range of a few meters (less than 8m).

    Stereo cameras, on the contrary, work outdoors. Their range depends on the baseline, i.e. the distance between the cameras: the larger it is, the farther the points that can be triangulated. However, most commercial stereo cameras, with baselines of a few cm, have a useful range of only a few meters (see for example this commercial stereo device). You can build a stereo rig yourself to match the accuracy you need, but if you want good accuracy the device will not be cheap compared to a Kinect (the cost depends mostly on the cameras you plan to buy), and it will cost you time to set up. Finally, if you have to triangulate very distant points, then you probably need to use a Structure from Motion algorithm (see Wikipedia for a quick definition).
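The triangulation the answer above describes can be sketched in a few lines. This is a minimal illustration of the rectified-stereo relation Z = f·B/d; the focal length and baselines below are assumed values for illustration, not taken from any real camera:

```python
# Minimal sketch of triangulation for a rectified stereo pair:
# depth Z = f * B / d, where f is the focal length in pixels,
# B the baseline in meters, and d the disparity in pixels.
# All numbers below are assumed, not taken from a real camera.

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point from its disparity between two rectified cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return focal_px * baseline_m / disparity_px

f = 700.0  # assumed focal length in pixels
# The same 7-pixel disparity corresponds to a much farther point
# when the baseline is wider:
print(depth_from_disparity(f, 0.06, 7.0))   # 6 cm baseline  -> 6 m
print(depth_from_disparity(f, 0.30, 7.0))   # 30 cm baseline -> 30 m
```

At a fixed disparity, depth scales linearly with the baseline, which is why building your own wider rig extends the usable range.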

  • I know you said "not considering laser-based detectors," but I am assuming you are discounting them due to price? There is a LIDAR unit available from SparkFun for less than $100 which claims to have a range of 40 m. That's about half the price of a Kinect sensor. AFAIK, this unit only measures range to a single point; I'm not sure how easy it would be to turn it into a rastering scanner.
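A rastering scanner of the kind the answer above speculates about would sweep the single-point unit with pan/tilt servos and convert each (pan, tilt, range) reading into a 3D point. A hypothetical sketch of just the geometry, with made-up angles and a fixed range standing in for real sensor reads:

```python
import math

# Hypothetical sketch of using a single-point rangefinder as a rastering
# scanner: sweep it with pan/tilt servos and convert each (pan, tilt,
# range) reading to a 3D point. A real driver would read the sensor
# where the fixed 10 m range is used below.

def reading_to_point(pan_rad, tilt_rad, range_m):
    """Convert one scan reading to Cartesian (x, y, z) in the scanner frame."""
    x = range_m * math.cos(tilt_rad) * math.cos(pan_rad)
    y = range_m * math.cos(tilt_rad) * math.sin(pan_rad)
    z = range_m * math.sin(tilt_rad)
    return (x, y, z)

# Sweep a small grid of pan/tilt angles (degrees), 10 degrees apart:
cloud = [reading_to_point(math.radians(p), math.radians(t), 10.0)
         for p in range(-30, 31, 10)    # 7 pan steps
         for t in range(-10, 11, 10)]   # 3 tilt steps
print(len(cloud))  # 21 points
```

The hard part in practice is not this math but synchronizing servo positions with range readings and keeping the scan rate useful on a moving vehicle.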

  • For gathering depth information, and for mapping in general, there is no perfect solution yet. You need to have a good idea of the application before choosing a sensor. Try to narrow down your options. Are you going to use it indoors or outdoors? How fast do you need the depth measurements (you can use a single camera, but it may take around 10 seconds to generate a good point cloud)? What is the range? How accurate does it need to be (lidar)? What resolution do you need (e.g. 320x240)? What are your processing capabilities? Weight limits? Are you going to have vibration (no lidar then)?

    The configuration of a stereo-vision sensor directly affects its range and accuracy. You can increase the distance between the two cameras to get more reliable measurements at larger distances. It's not too difficult to make your own stereo-vision sensor. The only tricky part is the two levels of calibration: one for undistorting the recorded images of each camera, and the other for computing the geometric transformation between the two cameras. You can also add more cameras to your system. For example, there is a Point Grey sensor that uses 3 cameras based on the same principle as stereo vision.

    Time-of-flight cameras are generally smaller and lighter. They are faster to process and give more information (e.g. color). They are not good options for outdoor use or for applications requiring long ranges. The emitted light might reflect more than once (this does not happen as often as with sonar sensors) and give inaccurate measurements.

    Speaking of sonars, you might want to look at side-scan sonars.
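As a small illustration of the time-of-flight principle the answer above mentions: range is recovered from the round-trip travel time of the emitted light, range = c·t/2, and a multipath return travels a longer path and therefore reads too far. The pulse times below are invented for illustration:

```python
# Sketch of the time-of-flight principle: a ToF camera measures the
# round-trip time t of emitted light for each pixel, giving
# range = c * t / 2. The pulse times below are invented for illustration.

C = 299_792_458.0  # speed of light in m/s

def tof_range(round_trip_s):
    """Range implied by a round-trip travel time, in meters."""
    return C * round_trip_s / 2.0

direct = tof_range(33.4e-9)    # direct return, about 5 m
bounced = tof_range(40.0e-9)   # multipath return: longer path, reads too far
print(direct, bounced)
```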

  • I'm not familiar with the internals of the Kinect, but don't these RGBD devices essentially use stereo vision internally?

    Regardless, stereo vision needs some software to give you RGBD values (there are most likely libraries already available for free), whereas those devices have that software already written for their particular technology. In the end, the data you get is similar.

    Regarding your second question, the farther apart the cameras are, the greater the depth range you can perceive. So if you go with two cameras, the advantage is that you can always adjust their separation based on how far your target is. The Xtion, Kinect and similar technologies are optimized for a very particular scenario (recognizing people in front of a TV within a couple of meters), so of course they are only suitable if your conditions are similar.
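To put a rough number on how baseline affects depth accuracy: for a rectified pair with z = f·B/d, a disparity error of Δd produces a depth error of roughly z²·Δd/(f·B), so widening the baseline proportionally shrinks the error at a given depth. The focal length and disparity error below are assumed values:

```python
# Sketch of how baseline affects stereo depth accuracy. For a rectified
# pair, z = f * B / d, so a disparity error delta_d maps to a depth
# error of about z**2 * delta_d / (f * B). All numbers are assumed.

def depth_error(z_m, focal_px, baseline_m, disparity_err_px=0.5):
    """Approximate depth uncertainty at distance z for a given rig."""
    return z_m ** 2 * disparity_err_px / (focal_px * baseline_m)

f = 700.0  # assumed focal length in pixels
print(depth_error(5.0, f, 0.06))  # ~0.30 m error at 5 m, 6 cm baseline
print(depth_error(5.0, f, 0.30))  # ~0.06 m error at 5 m, 30 cm baseline
```

The quadratic growth with z is why short-baseline consumer devices degrade quickly past a few meters.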
