Ivan Kirigin
{ Computer Vision Specialist }

{ Research} Advised by Sanjiv Singh at the Robotics Institute, Carnegie Mellon University

{Navigation Without Odometry}

Navigation Without Odometry
[note: current and ongoing work]

Based on the successful visual servoing result in work related to RHEX, the next step is a higher level use of matching templates. In the visual servoing task, a template that was matched would yield a control based on the motion that would move the template closer and to the center of the image. This would yield a “follow-me” behavior, where the robot would move in the direction of a target.

In a homing task, these templates are used as landmarks for navigation. If it were possible to navigate based only on the location of a template in the image, all metric estimates of the location of the robot would not be needed. This means that the robot could operate in an environment without GPS, and would not rely on expensive computation and sensors needed to solve the Simultaneous Localization and Mapping (SLAM) problem.

One potential method for navigation is bearings-only-navigation, as described in the paper “Angle-Based Methods for Mobile Robot Navigation: Reaching the Entire Plane”, by Bekris, Argyros, & Kavraki. The basic idea is that given a series of landmarks, the difference between the angles between two landmarks at a start and goal location can yield direction which should be followed at the start location to reach the goal.

For our purposes, finding the location of a template corresponding to a world landmark which was seen before will yield an angle measure change to that landmark. If templates are recorded on an initial traversal, they can be used to find matches with currently seen images to reach any point on the initial traversal. Given the initial sequence, the algorithm should be able to return to a start location.

I designed a data-logging system with two very wide-angle cameras oriented at 45 degrees to the left and right with a fairly cheap 6 degree of freedom inertial measurement unit attached to the left camera. This was used to take an image sequence for the initial traversal, and also for the reverse traversal for an off-line demonstration of the algorithm. The following is the initial sequence, followed by the return sequence. In each the left camera is the image on the left, and the right is on the right. Both have been reduced in quality for the web.


View a video sequence: f_low.mov


View a video sequence: b_low.mov

The method for identifying revisited landmarks is based on the SIFT object recognition framework, described in another section of this site, here. To summarize, sections or subsections of images in the initial traversal will be templates against which the currently seen image on the return traversal will be compared. The first qualitative analysis of these sequences can be found by using the entire image as a template, and seeing how it compares to the other images. Here, every 20th image from each camera in the initial traversal is compared to each 20th image in the reverse traversal. This means that there are 164 images to compare, because each sequence is composed of 801 images from a given camera. The results are displayed in the following graph, where a positive match is white, and negative is black. The axes are labeled such that “R” & “L” refer to the right and left camera image respectively, and “for” & “back” refer to the forward and reverse (or ‘backward’) image sequence respectively.

Clearly there is a strong correlation along the downward diagonal. This means that in a given sequence, there are a large number of matches from the same camera for images slightly offset in time. This makes sense: an image of a given scene will very strongly correlate with an image of the same scene from a few meters away. In addition, there are many matches corresponding to the stereo pairs, making the top left and bottom right of the graph tri-diagonal. This is expected, yet not relevant for now.

The key to a successful navigation will be correspondences from the forward and backward sequences, and these are strongest along the upward diagonal from the bottom left to top right. This means that the right image from the end of the forward traversal corresponds to the left image from the reverse traversal, which makes a great deal of intuitive sense. Depending on the path followed, this could change, but can be used to formulate an algorithm for navigation:

• Build a stack from every n-th image in the initial sequence for each camera. (this can be a dynamic value based upon something like the information gain from adding a new image M away)

• Compare the current left image in the reverse traversal to the top of the right stack. Repeat for the right image and left stack.

• If both match, find an instruction for navigation and go to the next image pair in the reverse traversal

• If only one matches:

o If the past 3 comparisons have had only one image pair match, pop both stacks
o Else, find an instruction for navigation and go to the next image pair in the reverse traversal

• If none match, pop both stacks

This is demonstrated in the following video, where the reverse traversal left and right images are displayed on the top from left to right, and the forward traversal left and right and images are displayed on the bottom right to left. The reverse traversal images are sampled at every 20 frames in this rough test. The top does not advance until there is at least one image match. The green dots indicate matched points from the reverse traversal right camera (top right) and the forward traversal left camera (bottom right). The red dots indicate matched points from the reverse traversal left camera (top left) and the forward traversal right camera (bottom left). The sequence is completed in entirety, meaning for each step in the reverse sequence, at least one image matched a previously seen image.


View a video sequence: comp.mov

My current work invovles taking this information, and outputting a navigation command. This will be compared to using subsections of images, manually and automatically selected, rather than the entire image, in addition to sampling the forward sequence at irregular intervals based on feature tracking data. This can be evaluated quantitatively in an offline demonstration by comparing how the navigation commands compare to the actual path taken. Ideally the former should follow the derivative of the latter. In a live test, success manifests itself by the robot reaching its goal.

Back to top

Professional
{ Research }
{ I-Kbot }