GraphSLAM: why are constraints imposed twice in the information matrix?

I was watching Sebastian Thrun's video course on AI for robotics (freely available on udacity.com). In his final chapter on GraphSLAM, he illustrates how to set up the system of equations for the mean path locations $x_i$ and landmark locations $L_j$.

To set up the matrix system, he imposes each robot motion and landmark measurement constraint twice. For example, if a robot motion command is to move from $x_1$ by 5 units to the right (reaching $x_2$), I understand this constraint as

$$-x_2+x_1= -5$$

However, he also imposes the negative of this equation $$x_2-x_1=5$$ as a constraint and superimposes it onto a different row of the system, and I'm not sure why. In his video course, he briefly mentions that the matrix we're assembling is known as the "information matrix", but I have no idea why the information matrix is assembled in this specific way.
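To make my confusion concrete, here is how I understand the lectures' update for this single motion constraint, written as a minimal Python/NumPy sketch (the names `omega` and `xi` are just my own labels for the information matrix and vector):

```python
import numpy as np

# State ordering: [x1, x2].
omega = np.zeros((2, 2))
xi = np.zeros(2)

# The constraint "move right by 5" enters the x1 row as -x2 + x1 = -5 ...
omega[0, 0] += 1
omega[0, 1] -= 1
xi[0] -= 5

# ... and enters the x2 row again, negated, as x2 - x1 = 5:
omega[1, 1] += 1
omega[1, 0] -= 1
xi[1] += 5
```

The same constraint therefore lands in both the $x_1$ row and the $x_2$ row with flipped signs, and I don't see where that doubling comes from.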

So, I tried to read his book Probabilistic Robotics, and all I can gather is that these equations come from obtaining the minimizer of the negative log posterior probability incorporating the motion commands, measurements, and map correspondences, which results in a quadratic function of the unknown variables $L_j$ and $x_i$. Since it is quadratic (and the motion/measurement models are linear), the minimum is obtained by solving a linear system of equations.

But why is each constraint imposed twice, once as a positive quantity and again as the negative of the same equation? It's not immediately obvious to me from the form of the negative log posterior probability (i.e., the quadratic function) that the constraints must be imposed twice. Why is the "information matrix" assembled this way? Does this structure also hold when the motion and measurement models are nonlinear?

Any help would be greatly appreciated.

1 Answer

  • After painstakingly trying to find someone on the internet with more experience on this subject to help me out (to no avail), I finally gave up and decided to take matters into my own hands and figure it out myself! As it turns out, the constraints are imposed twice as a direct result of applying the chain rule for derivatives when setting the gradient of the negative log posterior belief function equal to zero (which is equivalent to finding the maximum of the belief).

    Unfortunately, there's no easy way to demonstrate this other than going through the math one step at a time.

    Problem Setup

    To help explain, let me set up an example to work with. For simplicity, let's assume that the robot moves in only one direction (in this case, the x-direction). In one dimension, the covariance matrices for motion and sensor data are simply the variances $\sigma^2_{motion}$ and $\sigma^2_{sensor}$. Again, for simplicity, let's assume that $\sigma^2_{motion}=\sigma^2_{sensor}=1$.

    Now, let's assume that the robot starts at the point $x_0=0$ and then executes two motion commands in the following order:

    1. Move forward by 10 units ($x_1 = x_0 + 10$)
    2. Move forward by 14 units ($x_2 = x_1 + 14$)

    Let's also assume that the robot world only contains one landmark $L_0$ which lies somewhere in the 1D world of the robot's motion. Suppose that the robot senses the following distances to the landmark from each of the three positions $x_0, x_1, x_2$:

    1. At $x_0$: The robot sensed Landmark $L_0$ at a distance of 9 units ($L_0-x_0=9$)
    2. At $x_1$: The robot sensed Landmark $L_0$ at a distance of 8 units ($L_0-x_1=8$)
    3. At $x_2$: The robot sensed Landmark $L_0$ at a distance of 21 units ($L_0-x_2=21$)

    (These numbers may look a little strange, but just take them as a given for this exercise).
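    In code, the setup above amounts to the following data (a small Python sketch; the variable names are my own, and the later snippets hardcode the same numbers):

    ```python
    # Known prior: the robot starts at x0 = 0.
    x0_prior = 0.0

    # Motion commands: (from_pose, to_pose, commanded displacement)
    motions = [(0, 1, 10.0), (1, 2, 14.0)]

    # Landmark measurements: (pose_index, measured distance to L0)
    measurements = [(0, 9.0), (1, 8.0), (2, 21.0)]

    # Assumed noise: sigma^2_motion = sigma^2_sensor = 1
    sigma2 = 1.0
    ```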

    Belief Function

    Each of the relative motion and measurement constraints contributes a Gaussian factor to the "posterior belief" function. With the information assumed above, we can write the belief function as the following product of Gaussians:

    $$Belief = C e^{-\frac{(x_0-0)^2}{2\sigma^2}}e^{-\frac{(x_1-x_0-10)^2}{2\sigma^2}}e^{-\frac{(x_2-x_1-14)^2}{2\sigma^2}} * e^{-\frac{(L_0-x_0-9)^2}{2\sigma^2}}e^{-\frac{(L_0-x_1-8)^2}{2\sigma^2}}e^{-\frac{(L_0-x_2-21)^2}{2\sigma^2}}$$

    Note that $C$ is a constant, but we won't really need to know the exact value of $C$. Recall that we assume all the variances $\sigma^2=1$, so we obtain

    $$Belief = C e^{-\frac{(x_0-0)^2}{2}}e^{-\frac{(x_1-x_0-10)^2}{2}}e^{-\frac{(x_2-x_1-14)^2}{2}} * e^{-\frac{(L_0-x_0-9)^2}{2}}e^{-\frac{(L_0-x_1-8)^2}{2}}e^{-\frac{(L_0-x_2-21)^2}{2}}$$
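    As a quick numerical aid, here is this belief (up to the unknown constant $C$) as a Python function (a sketch; `belief_up_to_C` is my own name):

    ```python
    import numpy as np

    def belief_up_to_C(x0, x1, x2, L0):
        # One residual per Gaussian factor, with all variances equal to 1.
        residuals = [
            x0 - 0,        # prior on the start position
            x1 - x0 - 10,  # first motion constraint
            x2 - x1 - 14,  # second motion constraint
            L0 - x0 - 9,   # measurement from x0
            L0 - x1 - 8,   # measurement from x1
            L0 - x2 - 21,  # measurement from x2
        ]
        return np.exp(-0.5 * sum(r**2 for r in residuals))
    ```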

    Negative Log Posterior

    Our main goal is to find the values of $x_0,x_1,x_2,L_0$ that maximize this function. However, we can make some transformations to the "belief" function that let us find the maximum very easily. First, finding the maximum of the $Belief$ is equivalent to finding the maximum of $log(Belief)$, and taking the logarithm lets us exploit the properties of logarithms to obtain:

    $$log(Belief)= log(C) - \frac{1}{2}(x_0-0)^2-\frac{1}{2}(x_1-x_0-10)^2-\frac{1}{2}(x_2-x_1-14)^2 -\frac{1}{2}(L_0-x_0-9)^2-\frac{1}{2}(L_0-x_1-8)^2-\frac{1}{2}(L_0-x_2-21)^2$$

    Also, finding the maximum of a function $f(x)$ is equivalent to finding the minimum of the function $-f(x)$. So we can restate this problem as finding the minimum of

    $$F\equiv-log(Belief)= -log(C) + \frac{1}{2}(x_0-0)^2+\frac{1}{2}(x_1-x_0-10)^2+\frac{1}{2}(x_2-x_1-14)^2 +\frac{1}{2}(L_0-x_0-9)^2+\frac{1}{2}(L_0-x_1-8)^2+\frac{1}{2}(L_0-x_2-21)^2$$
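    Since $-log(C)$ is constant, minimizing $F$ amounts to minimizing half the sum of squared residuals, so a generic optimizer should land on the same answer as the linear solve worked out below (a sketch using `scipy.optimize.minimize`; the zero starting guess is arbitrary):

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def F(v):
        x0, x1, x2, L0 = v
        residuals = np.array([
            x0 - 0, x1 - x0 - 10, x2 - x1 - 14,
            L0 - x0 - 9, L0 - x1 - 8, L0 - x2 - 21,
        ])
        return 0.5 * np.sum(residuals**2)

    result = minimize(F, np.zeros(4))
    print(result.x)  # should agree with the linear solve below
    ```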

    Optimization

    To find the minimum, we take the partial derivative of the $F$ function with respect to each of the variables $x_0, x_1, x_2,$ and $L_0$ and set each one equal to zero:

    $F_{x_0}= (x_0 - 0) - (x_1 - x_0 - 10) - (L_0-x_0-9) = 0$
    $F_{x_1}= (x_1 - x_0 - 10) - (x_2-x_1-14)- (L_0-x_1-8) = 0$
    $F_{x_2}= (x_2 - x_1 - 14) - (L_0-x_2-21) = 0$
    $F_{L_0}= (L_0-x_0-9) + (L_0-x_1-8)+ (L_0-x_2-21) = 0$

    Notice that the first and second equations impose the first relative motion constraint $x_1=x_0+10$ twice: in the first equation with a negative sign and in the second equation with a positive sign, both as a result of the chain rule for derivatives. Similarly, the second and third equations contain the second relative motion constraint, again with opposite signs from the chain rule. The same argument applies to the measurement constraints in their corresponding equations. There's no inherent explanation for why it MUST necessarily work out this way... It just happens to have this structure in the end after working out the math. You may also notice that the initial position constraint $(x_0-0)$ is imposed only once, because its quadratic term $\frac{1}{2}(x_0-0)^2$ involves a single variable, so it cannot appear in the partial derivative of $F$ with respect to any variable other than $x_0$.
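    To see the doubling concretely as matrix entries, here is a sketch (Python/NumPy, my own variable names) that collects the four gradient equations above into the linear system $\Omega \mu = \xi$ and solves it:

    ```python
    import numpy as np

    # State ordering: [x0, x1, x2, L0].
    omega = np.zeros((4, 4))
    xi = np.zeros(4)

    def add_constraint(i, j, d):
        # Impose "state[j] - state[i] = d". It lands in BOTH row i and row j
        # with opposite signs -- exactly the doubling from the chain rule.
        omega[i, i] += 1; omega[i, j] -= 1; xi[i] -= d
        omega[j, j] += 1; omega[j, i] -= 1; xi[j] += d

    omega[0, 0] += 1          # prior x0 = 0 appears only once
    add_constraint(0, 1, 10)  # motion: x1 - x0 = 10
    add_constraint(1, 2, 14)  # motion: x2 - x1 = 14
    add_constraint(0, 3, 9)   # measurement: L0 - x0 = 9
    add_constraint(1, 3, 8)   # measurement: L0 - x1 = 8
    add_constraint(2, 3, 21)  # measurement: L0 - x2 = 21

    mu = np.linalg.solve(omega, xi)
    print(mu)  # [0.  3.25  6.  15.75] -- estimates for x0, x1, x2, L0
    ```

    Row $i$ of this system is exactly the equation $F_{x_i}=0$ above, which is why every constraint between two variables contributes to two rows.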

    Conclusion

    It was not apparent to me just by looking at the structure of the belief function that the gradient takes on this form without working through the details explicitly. Of course, I've made a number of simplifying assumptions along the way, including avoiding the problem of "data association" and assuming linear expressions for the motion and measurement constraints. In the more general version of GraphSLAM, we do not necessarily assume this and the algorithm becomes more complicated. But in the linear case (that is, with linear motion and measurement constraints), the gradient of the negative log posterior belief function leads to imposing each motion and measurement constraint twice (once with a positive sign, once with a negative sign), each weighted by the corresponding inverse covariance. There appears to be no inherent or more fundamental reason why it must necessarily work out this way based upon higher principles... It's just a bunch of trivial calculus that happens to work out in this structured way.

