Ceres - Rotation manifold optimization

November 2021

I spent some time getting familiar with rotations and how Lie groups help to provide a rigorous and clear framework to deal with them in the context of optimization.
I chose Ceres -- offering a mature optimization framework -- as a starting point, and the publications by Joan Sola to get a better intuition about the theoretical background on Lie theory and quaternions in the context of rotation estimation.

I learned the hard way that the two are not 1:1 compatible. Some blog posts and the diploma thesis by Christoph Hertzberg finally helped to fill in the gaps left by both Ceres and Sola's explanations.

 

On this page I share my key findings and details about the gaps to be filled when combining the ready-made optimization framework Ceres with the Lie group intuition that Sola provides in his work. A very simple rotation optimization accompanying this article is implemented in my GitHub repository.

Recommended Ceres Differentiation: Automatic

Ceres offers multiple methods to compute jacobians (and gradients) for optimization problems: they can be implemented from an analytical formula, but Ceres can also compute numeric jacobians (mainly for debugging and compatibility) as well as fast automatic jacobians, requiring nothing but the cost function itself.

Furthermore it offers utilities for optimization on manifolds via LocalParameterization, where the derivatives may map between non-trivial tangent spaces. In particular for the rotation group SO(3), Ceres offers off-the-shelf functions with one little catch: one is bound to use automatic derivatives. To get going fast, this is definitely the first choice; however, my measurements (last paragraph) suggest giving analytic derivatives a try when performance is important. Getting the analytical gradient right can come with headaches, as many sources -- including all of the above -- use different conventions. The next paragraphs aim to help with exactly this issue.

The analytical rotation jacobian in Ceres

To get a better understanding, I deliberately went down the road of attempting a clean implementation of an analytical jacobian for a simple reprojection error and quaternion rotation. However, I hit a wall when it came to verifying whether it was correct. After some reading I realized that part of the challenge was due to the various conventions used by different authors: each source left gaps to be filled by another source, but also introduced another convention. So let me give a walkthrough of the math of computing the jacobian while indicating where other sources take other paths.

On Rotation transformations

Let's start high-level with the definition of our problem, where we can already see possible divergence in approaches. One formulation found very often is the action of the SE(3) group (1) and its inverse (2).

There are many ways to interpret this formula. Some authors transform from a local camera system to a global one, some the other way around. Some apply it to the coordinate system, some inversely to points in the system.

One frequent formulation (Hartley & Zisserman's Multiple View Geometry, among others) is (3) and its corresponding inverse. For the optimization we are interested in transforming points in world coordinates to points in the camera system (not the system itself), so we stick with (3). A bonus of this representation: t equals the camera's position in world coordinates. To focus on rotations in this exercise we simplify by keeping t constant and make use of the reformulation (4).
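Written out in the conventions just fixed, the four variants read (my reconstruction; t' denotes the generic translation of (1), which differs from the camera position t in (3)):

```latex
y = R\,x + t'                       \tag{1}
x = R^{\top}(y - t')                \tag{2}
y = R\,(x - t)                      \tag{3}
y = R\,x', \qquad x' := x - t       \tag{4}
```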

Note that the blog uses the inverse of (3); my GitHub code uses (4).


More Conventions

All formulas displayed are subject to various conventions that are far from standardized.

The order of quaternion coefficients differs even between Eigen and Ceres, leading Ceres to provide an extra EigenQuaternionParameterization.

Some authors write out the rotation matrix as above, some use the equality w² + v_x² = 1 - v_y² - v_z², leading to different diagonal entries and a different jacobian dR(q)*p / dq.

The exponential map, too, usually comes with a factor 0.5 (Eigen, Sola), s.t. the norm of theta corresponds to the rotation angle in radians -- Ceres omits this factor. And last but not least, q can also be multiplied from the left, leading to yet another parametrization (Sola).

 

In this blog we follow the conventions used in the formulas, i.e. we order quaternion coefficients the Eigen way and define Exp_q the Ceres way, s.t. everything can easily be transformed into code using Ceres utilities, as in my example code.
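In formulas, the two exponential-map conventions side by side (my reconstruction, quaternions written as (scalar, vector)):

```latex
% Eigen/Sola: factor 0.5, so \|\theta\| is the rotation angle in radians
\mathrm{Exp}(\theta) = \Bigl(\cos\tfrac{\|\theta\|}{2},\;
                             \sin\tfrac{\|\theta\|}{2}\,\tfrac{\theta}{\|\theta\|}\Bigr)

% Ceres: no factor 0.5, i.e. \|\theta\| corresponds to half the rotation angle,
% and the increment is multiplied from the left
\mathrm{Exp}(\theta) = \Bigl(\cos\|\theta\|,\;
                             \sin\|\theta\|\,\tfrac{\theta}{\|\theta\|}\Bigr),
\qquad \mathrm{Exp}_q(\theta) = \mathrm{Exp}(\theta) \otimes q
```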

Note: Despite having to make a choice of rotation parametrization, leading to different jacobians dR*p / dq, the chained jacobians dR*p / dtheta will be equal again.

On Parametrizations and Quaternions

Having fixed the conventions we can now (over)parametrize the rotation by unit(!) quaternions (5), and unit quaternions themselves through the exponential map (6).

 

Rotating a point via quaternion multiplication performs exactly the same calculation as multiplying it by the rotation matrix (5)*.

Finally, we are interested in finding a jacobian for the action of R on x, so R*x is what we will be looking at.

 

The exponential map Exp_q finally offers a local parametrization for all unit quaternions. Using it allows optimizing in a Euclidean space with the same dimension as the manifold of unit quaternions.

This local parametrization goes by different names. Sola uses a circled "plus" (+), Hertzberg and Ceres a "box plus" [+]. They are the same as Exp_q (6) -- potentially up to a scaling factor -- and arguably provide more intuition when writing formulas for derivatives. Consult Sola for a beautiful derivation of the exponential map for quaternions.

 

Now that we have made our explicit and implicit choices of conventions -- e.g. quaternion multiplication is done the "standard" Hamilton way -- let's get going with the actual jacobian.


The Jacobians And Ceres

Let's start with the straightforward but calculation-heavy dR/dq. Using the above conventions we get the jacobian matrix (7). Confirmation is left to the reader*.

 

We will look differently at the jacobians of the exponential maps. The whole idea of a LocalParameterization is that it changes with the point of interest: by design we demand that Exp_q(0) = q. Since for each q we choose the corresponding local parametrization, we only need to evaluate the jacobian at 0. For this we use the well-known Taylor expansion (8) and the matrix formulation of the quaternion product (9), yielding a simple expression for dExp_q / dtheta (10).
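Written out (my reconstruction, q = (w, v) in (scalar, vector) block notation, [v]_x the skew-symmetric cross-product matrix):

```latex
% (8) Taylor expansion around 0
\mathrm{Exp}(\theta) = (1,\ \theta) + O(\|\theta\|^2)

% (9) quaternion product as a matrix-vector product
p \otimes q = Q_R(q)\,p, \qquad
Q_R(q) = \begin{pmatrix} w & -v^{\top} \\ v & w I_3 - [v]_{\times} \end{pmatrix}

% (10) evaluate at 0: the last three columns of Q_R(q)
\left.\frac{\partial\,\mathrm{Exp}(\theta)\otimes q}{\partial \theta}\right|_{\theta=0}
  = \begin{pmatrix} -v^{\top} \\ w I_3 - [v]_{\times} \end{pmatrix}
```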

 

With those two jacobians you are now ready to go in Ceres. But be careful if somewhere along the way you decide to use a different convention ;)

 

 

*For the mathematically inclined: It is a very rewarding derivation with lots of symmetries to discover (some already indicated). Even more beauty (and pages of symbols) can be found when showing that dRp/dtheta is equal no matter the choice of parametrization of R via q. For the coder: Implementing both parametrizations is a nice cross-check that there is no typo in the implementation.


Lessons from alternative approaches

The main difficulty is getting the indexing right; here we use a row-major layout to roll out the matrix for the jacobian computation. To add some more salt, also note that Exp_R applies R on the left. Apart from being a nice brain exercise, there is nothing here that has to be aligned with Ceres; it does, however, align with Sola's convention.

What about direct use of Rotation Matrices?

Instead of parametrizing rotations via quaternions, we can also directly optimize over all 9 rotation matrix parameters. Of course, we view them as points on a manifold whose dimension can be reduced to 3 using a local parametrization, the same way as done above for quaternions. Expectations of any performance/stability improvement are quite low; in fact, since we are now dealing with more than twice as many parameters, leading to more operations and memory consumption, we would expect worse performance/stability.

 

On the left you see the key formulas for that approach, and in my GitHub repository you can find the implementation and a simple test.

The results indicate that there is no noticeable difference between the approaches. I did not run a full-fledged test suite with lots of repetitions, different systems, etc., so these observations should be taken with a grain of salt.


Analytic vs. automatic derivatives

Going down this road of testing different approaches with the ready-made analytic jacobians, I just had to compare ... and was surprised by the results.

Again I expected no noticeable difference, but in fact I found that the analytic derivative is consistently 10x faster than the automatic one. Results had comparable quality with either approach, in most cases improving accuracy from 0.1 to 1e-10 and sometimes failing to optimize, i.e. not getting far beyond the initial accuracy.

Here too I ran the optimizations on a dummy scene and did not conduct a full-fledged test suite. However, the definite take-away is that using analytic instead of automatic derivatives is still something to look out for and test in a real-world scenario, despite the usual recommendation to use automatic derivatives.

Singularities

One interesting property of the local parametrization approach is that by centering the exponential map around zero, values stay as far away from singularities as possible. Imagine e.g. a 2D rotation: the singularity would lie on the opposite side of the circle.

So no matter the current location on the manifold, the local parametrization is practically unaffected by singularity effects.

Does the optimization actually stay in the manifold?

Mathematically speaking, each product R * R yields another rotation matrix, and the same may be assumed for a single numeric operation. During optimization, however, this operation happens quite a lot -- is a correction via normalization required?

In my experiment I used double precision and ran over 100 times. Using either the quaternion or the matrix parametrization, the values never left the manifold, i.e. the norm/determinant stayed at 1 over all iterations.

Lie Style Jacobian via "Hacking Ceres"

While Ceres provides the framework to use Lie theory and local parametrizations, it is a bit surprising to find that the split outlined above is necessary, in particular since the Lie-style jacobian dR(theta)*p/dtheta = -[R*p]_x is so simple.

Fan Zheng outlines how to trick Ceres by implementing the complete jacobian in the cost function. While this preserves the mathematically beautiful and simple formulation, the code logic will make others think twice. I implemented a variant for the above problem and found that also in terms of performance there is no significant gain -- in fact it is on average 50% slower on my machine.