Navigation
Without Odometry
[note: current and ongoing work]
Based
on the successful visual servoing result in work related to RHEX,
the next step is a higher level use of matching templates. In the
visual servoing
task, a template that was matched would yield a control based on
the motion that would move the template closer and to the center
of the image. This would yield a “follow-me” behavior,
where the robot would move in the direction of a target.
In
a homing task, these templates are used as landmarks for navigation.
If it were possible to navigate based only on the location
of a template in the
image, all metric estimates of the location of the robot would not be needed.
This means that the robot could operate in an environment without GPS, and
would not rely on expensive computation and sensors needed to solve the Simultaneous
Localization and Mapping (SLAM) problem.
For
our purposes, finding the location of a template corresponding
to a world landmark which was seen before will yield an angle
measure change to that
landmark. If templates are recorded on an initial traversal, they
can be used to find
matches with currently seen images to reach any point on the initial
traversal. Given the initial sequence, the algorithm should
be able to return to a start
location.
I designed a data-logging system with two very wide-angle cameras oriented
at 45 degrees to the left and right with a fairly cheap 6 degree of
freedom inertial measurement unit attached to the left camera. This
was used to take
an image sequence for the initial traversal, and also for the reverse
traversal for an off-line demonstration of the algorithm. The following
is the initial
sequence, followed by the return sequence. In each the left camera
is the image on the left, and the right is on the right. Both have
been reduced
in quality
for the web.
The
method for identifying revisited landmarks is based on the
SIFT object recognition
framework, described in another section of this
site, here. To summarize,
sections or subsections of images in the initial traversal
will be templates
against which the currently seen image on the return traversal
will be compared. The first qualitative analysis of these sequences
can
be found by using the entire image as a template, and seeing
how it compares to the other images. Here, every 20th image
from each
camera in the initial traversal is compared to each 20th image
in the reverse traversal. This means that there are 164 images
to compare,
because each sequence is composed of 801 images from a given
camera. The results are displayed in the following graph, where
a positive
match is white, and negative is black. The axes are labeled
such that “R” & “L” refer to the
right and left camera image respectively, and “for” & “back” refer
to the forward and reverse (or ‘backward’) image
sequence respectively.
Clearly
there is a strong correlation along the downward diagonal. This
means that in a given sequence, there are a large number of
matches from the same camera for images slightly offset in time.
This makes sense: an image of a given scene will very strongly
correlate with an image of the same scene from a few meters
away. In addition,
there are many matches corresponding to the stereo pairs, making
the top left and bottom right of the graph tri-diagonal. This
is expected, yet not relevant for now.
The key to a successful navigation will be correspondences from
the forward and backward sequences, and these are strongest along
the
upward diagonal from the bottom left to top right. This means that
the right image from the end of the forward traversal corresponds
to the left image from the reverse traversal, which makes a great
deal of intuitive sense. Depending on the path followed, this could
change, but can be used to formulate an algorithm for navigation:
• Build a stack from every n-th image in the initial sequence for each
camera. (this can be a dynamic value based upon something like
the information gain from adding a new image M away)
• Compare the current left image in the reverse traversal to the top
of the right stack. Repeat for the right image and left stack.
• If both match, find an instruction for navigation and go to the next
image pair in the reverse traversal
• If
only one matches:
o If
the past 3 comparisons have had only one image pair match,
pop both stacks o Else, find an instruction for navigation
and go to the next image pair in the reverse traversal
• If
none match, pop both stacks
This is demonstrated in the following video, where the reverse traversal
left and right images are displayed on the top from left to right,
and the forward traversal left and right and images are displayed
on the bottom right to left. The reverse traversal images are sampled
at every 20 frames in this rough test. The top does not advance until
there is at least one image match. The green dots indicate matched
points from the reverse traversal right camera (top right) and the
forward traversal left camera (bottom right). The red dots indicate
matched points from the reverse traversal left camera (top left)
and the forward traversal right camera (bottom left). The sequence
is completed in entirety, meaning for each step in the reverse sequence,
at least one image matched a previously seen image.
My
current work invovles taking
this information, and outputting a navigation command. This will
be compared to using subsections of images, manually
and automatically selected, rather than the entire image, in addition
to sampling the forward sequence at irregular intervals based on
feature tracking data. This can be evaluated quantitatively in an
offline demonstration by comparing how the navigation commands compare
to the actual path taken. Ideally the former should follow the derivative
of the latter. In a live test, success manifests itself by the robot
reaching its goal.