Last post I mentioned adding a “pseudo” receiver dynamics mode to RTKLIB that gives most of the benefit of the receiver dynamics option but with far fewer computations. Since then I have realized that a better alternative is to optimize the existing dynamics mode to use fewer computations rather than replace it with something new. That is what I will discuss in this post.

First let’s take a quick look at how the receiver dynamics feature works.

There are normally two steps for each time sample in the kalman filter calculations. In the first step, the kalman filter predicts the receiver position for the current time sample based on the receiver position of the previous sample, the time elapsed since that sample, and the known dynamics of the receiver. It also estimates the confidence of that prediction in terms of variances. In the second step, these predictions are updated based on the input measurements from the receiver. Confidence levels (variances) are also estimated for the measurements, and the relative weighting between prediction and measurement used in this update is determined by the variances.

It is the first step that we are interested in here. It is referred to as the time update of the kalman filter in the documentation. It is also sometimes referred to as the state prediction.

If receiver dynamics is set to off, then this first step is skipped. Instead of predicting the position, RTKLIB simply uses the coarse position derived from the pseudorange measurements. This position will generally be much less accurate than the predicted position, which is why enabling receiver dynamics is desirable.

If receiver dynamics is set to on, then RTKLIB will add six states to the kalman filter: three for receiver velocity (x, y, z) and three for acceleration. This is in addition to the existing three states for receiver position (x, y, z). The kalman filter then uses these estimates of the receiver’s velocity and acceleration to predict where the receiver will be one sample later. These states are updated in a similar way to the other states, but note that, unlike most of the states, they do not have direct measurements associated with them. Since, in typical RTKLIB applications, the time between samples is short enough that the receiver’s velocity changes little over one sample, these predictions should be quite accurate. The confidence of these position predictions is also updated based on the length of time to the next sample and the acceleration characteristics of the receiver defined by the configuration file input parameters “stats-prnaccelh” and “stats-prnaccelv”. The longer the time between samples and the larger the possible accelerations of the receiver, the less confident the kalman filter will be in its estimates. Less confidence means higher variances.
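For reference, this is roughly how those settings appear in an RTKLIB configuration file. The values below are just placeholders, and the exact option names should be double-checked against your RTKLIB version:

```
pos1-dynamics      =on        # enable receiver dynamics (adds vel/acc states)
stats-prnaccelh    =10.0      # process noise for horizontal accel (m/s^2)
stats-prnaccelv    =10.0      # process noise for vertical accel (m/s^2)
```

Larger process-noise values tell the filter to trust its predictions less, which is appropriate for rovers that can accelerate hard.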

For more details on exactly how this is done let’s look at Appendix E in the user manual. In section E.7 we find the matrix used for the time update of the kalman filter.

The updated state vector x(k+1), one sample time later than x(k), is calculated by multiplying this matrix by x(k). In other words:

x(k+1) = F * x(k)

In this case F is a square matrix, n×n, where n is the number of states. The exact number of states depends on the input parameters, but for my example, with GPS, GLONASS, SBAS, and dynamics enabled, n is equal to 98. This is one state for each satellite plus the nine for the receiver mentioned above. Multiplying an n×m matrix by an m×p matrix, assuming no shortcuts are taken, requires n×m×p multiplications; in this case, 98×98×1 = 9604.

Let’s take a closer look at F, the state-transition matrix. First, notice that it is almost equal to the identity matrix, a matrix with all 1s on the diagonal and zeros elsewhere (I is the notation for the identity matrix here). Only the single 3×3 term multiplied by τ is off the diagonal. If it weren’t for this term, x(k+1) would simply equal x(k), since multiplying a matrix by I is equivalent to multiplying a scalar by one. This makes sense since we are only updating the position states … all the satellite bias states should remain constant with time.

So what is the τ term doing here? Let’s look at the first row of the F matrix. Replacing the I notation with actual numbers, we can see that it is equal to {1 0 0 τ 0 0 …}, where the remaining 92 elements are all zero. Here τ is the time difference between samples. Since the first three states are position in the x, y, z axes and the next three states are velocity in the x, y, z axes, the first three rows of F correspond to the equations

posx(k+1) = posx(k) + velx(k) * τ

posy(k+1) = posy(k) + vely(k) * τ

posz(k+1) = posz(k) + velz(k) * τ

So what could appear to be a somewhat intimidating equation in matrix form turns out to be quite simple and familiar when reduced to its parts.

Since the rest of F is the identity matrix, the equations for all the other states will be even simpler and will just be equal to their previous value.

In actuality, the version of the F matrix quoted here from the documentation is not quite right and probably reflects an earlier version of the code, since it only adds velocity states and not acceleration states. The correct F matrix to match the code is:
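In block form (my reconstruction from the description in the text, since the original figure isn’t reproduced here), the corrected state-transition matrix would be:

```latex
F =
\begin{bmatrix}
I_3 & \tau I_3 & 0        & 0 \\
0   & I_3      & \tau I_3 & 0 \\
0   & 0        & I_3      & 0 \\
0   & 0        & 0        & I_{n-9}
\end{bmatrix}
```

where the first three block rows correspond to the position, velocity, and acceleration states, and the identity block I_{n-9} covers all the remaining (satellite bias) states.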

The only difference is that we’ve added the acceleration states in addition to the velocity states and update the velocities with the accelerations just as we update the positions with the velocities.

Since only the first nine states appear in the off-diagonal terms of the equation, there really is no need to multiply the full matrices to update the state vector. We can calculate the first nine states, which requires 9×9×1 multiplications, and replace the rest of the operation with a matrix copy, which is equivalent to multiplying by the identity matrix but requires only one copy per element instead of 98 floating point multiplies. This is what I did in the code.
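A sketch of what that substitution might look like (illustrative code with hypothetical names, not the exact demo3 source), using the same indexing trick as the covariance code later in the post: copy the whole state vector, then update only the six states F actually changes.

```c
#include <string.h>

/* Sketch of the optimized time update of the state vector (illustrative,
   not the exact demo3 code). States 0-2 are position, 3-5 velocity,
   6-8 acceleration; all remaining states are left unchanged by F. */
void time_update_x(double *xp, const double *x, int nx, double tt)
{
    memcpy(xp, x, sizeof(double) * nx);   /* xp = I*x for every state */
    for (int i = 0; i < 6; i++)
        xp[i] += x[i + 3] * tt;           /* pos += vel*tt, vel += acc*tt */
}
```

Exploiting the structure fully like this needs only six multiplies, even fewer than the 9×9 sub-matrix multiply, at the cost of hard-coding the layout of the first nine states.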

Percentage-wise, the computational savings are large since we have reduced the floating point multiplies from 98×98 to 9×9. However, the number of multiplies in this step is not large to start with, so the absolute savings are small.

The real opportunity is in the calculation of the covariance matrix P, which is a similar operation but much more computationally intensive because the covariance matrix is 98×98 elements, unlike the state vector, which is only 98×1 elements. The larger size is because it contains the covariances (or correlations) between each state and every other state. We use a similar operation and the same state-transition matrix F, but the equation becomes

P(k+1) = F * P(k) * F’

where P is the covariance matrix and F’ is the transpose of the F matrix. Again, because F is very close to the identity matrix, we can replace most of the multiply with a copy, then directly calculate the few elements that change with the state update. It is a little more complicated here because, to realize a significant savings, we can’t get away with just multiplying a sub-matrix as we did with the state vector. Instead we directly compute the elements that need to be modified. To calculate F*P, the original code looked like this:

matmul("NN",rtk->nx,rtk->nx,rtk->nx,1.0,F,rtk->P,0.0,FP);

Since F and P are both 98×98 element matrices, this call requires 98×98×98 = 941,192 floating point multiplies per sample time. I replaced the above matrix multiplication with the following lines of code:

/* P=F*P, only calc non-zero off-diagonal terms to save time */
matcpy(FP,rtk->P,rtk->nx,rtk->nx);
for (j=0;j<rtk->nx;j++)
    for (i=0;i<6;i++)
        FP[i+j*rtk->nx]+=rtk->P[i+3+j*rtk->nx]*tt;

This gives an answer identical to the full multiply but requires only 98×6 = 588 floating point multiplies. A similar substitution for the FP * F’ operation saves nearly another million multiplies every sample.
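Putting both halves together, a sketch of the complete P(k+1) = F*P*F’ update might look like this (again illustrative code with hypothetical names, not the exact demo3 source). Multiplying by F on the left adds τ times rows 3..8 into rows 0..5; multiplying by F’ on the right does the same to the columns:

```c
#include <string.h>

/* Sketch of P(k+1) = F*P*F' exploiting the sparse structure of F
   (illustrative, not the exact demo3 code). Matrices are column-major
   nx x nx, as in RTKLIB; requires nx >= 9. */
void time_update_P(double *Pp, const double *P, int nx, double tt)
{
    double *FP = Pp;                      /* build F*P in the output buffer */
    memcpy(FP, P, sizeof(double) * nx * nx);
    /* FP = F*P: rows 0..5 pick up tt times rows 3..8 */
    for (int j = 0; j < nx; j++)
        for (int i = 0; i < 6; i++)
            FP[i + j * nx] += P[i + 3 + j * nx] * tt;
    /* Pp = FP*F': columns 0..5 pick up tt times columns 3..8; ascending j
       is safe in place because column j+3 is read before it is written */
    for (int j = 0; j < 6; j++)
        for (int i = 0; i < nx; i++)
            Pp[i + j * nx] += FP[i + (j + 3) * nx] * tt;
}
```

Both loops touch only 6×nx elements each, so the cost stays proportional to nx rather than nx³.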

While it will vary from system to system, I was able to roughly estimate how much time is saved in my particular case by adding a call to the Windows timeGetTime() function before and after the call to the state update function. Before the modification, the state update was taking roughly 15 msec, which was about 50% of the total computation time for each epoch. With the change, the time was too small to measure. I also verified that the solution results after this change are identical to the previous results, which is expected since we haven’t changed anything except the way we do the calculation.

I had previously uploaded my “pseudo” dynamics feature to the demo3 code in my GitHub repository, but with this change I felt there was no longer any value in that feature, so I have removed it and replaced it with this one.

Dear Tim,

Excellent article as usual.

I have designed an EKF for orientation estimation using a 9-DOF inertial measurement unit (a very bad name, as it has nothing to do with degrees of freedom, but anyway: a MEMS accelerometer, a gyroscope, and a magnetometer). Nothing special, there is extensive literature on it; I have just added some states to estimate the most important systematic errors (biases) of the sensors.

Now I would like to do some experiments on implementing a tight (or semi-tight) INS coupling to RTKLIB. My idea is to keep the two Kalman filters running in parallel: the usual way to do orientation estimation is by quaternion algebra, which requires an EKF to handle the nonlinearities; however, converting the RTKLIB KF to an EKF is probably not computationally feasible given the huge number of states.

Instead I would like to establish a closed-loop feedback between the two KFs: the inertial EKF feeds linear accelerations in the Earth frame to the RTKLIB KF and in exchange receives velocity observations back, helping it distinguish accelerations from orientation changes (by their nature, without external observations all these algorithms get confused when the sensor is under constant acceleration).

May I kindly ask you if you find any flaw in my approach?

Do you think it will provide a significant performance benefit? These sensors are absolutely in the ultra-cheap ($10) price range.

Many thanks in advance,

Marton


Hi Marton. I don’t think I can directly answer your question, but I can offer a few comments:

– RTKLIB does use an extended kalman filter (EKF), so it may be easier than you suggest to integrate the additional sensors into the main kalman filter. See section E.7 of appendix E in the user manual for the details.

– I would be concerned that coupling the two feedback systems together as you describe could lead to stability problems. If it is possible to separate the responses of the two loops by making one of them low bandwidth and the other higher bandwidth, that will help. I think, though, that a single kalman filter with additional states added for the sensors would be a more optimal solution if it’s possible.

– Regarding the computational load, RTKLIB’s internal implementation of matrix multiplication does not look like it is designed to take advantage of the sparse nature of some of the matrices as I demonstrated in the example in this post. It does, though, have the option of being compiled with the LAPACK linear algebra library. I don’t know much about this library but if it can automatically optimize the multiplies similar to what I did manually in the post, that may be a significant opportunity to free up MIPS that can then be used to process the additional sensor states. Alternatively, using any other available linear algebra library for matrix multiplication would be an easy change to RTKLIB.

– As to the potential performance benefit, I think that is going to be very application dependent and should not be taken for granted. It will be most helpful in very uncontrolled environments where complete loss of satellite signals for extended times is possible. In better controlled environments where outages are shorter and don’t involve all the satellites at once, the advantage seems less obvious. Assuming receiver dynamics are enabled and reasonably well optimized for the specific rover, then the value of the sensors will depend on how their noise levels compare to the prediction errors of the kalman filter.

– In general, additional CPU MIPS are quite cheap, and from a cost/benefit perspective, I suspect that if you believe your solution is constrained by available MIPS, then increasing the power of the CPU will provide more benefit per dollar than adding sensors. If you are not constrained by MIPS but still need additional performance, then sensors might make sense, but again it’s important to do the analysis first.

Just my thoughts, hope they help.


Just for your information:

I did a test with LAPACK; it takes 12.79 ms/epoch, slightly less than your 15 ms. LAPACK is apparently not well designed for sparse matrix computation.

Test environment:

Ubuntu 16.04, x86_64

– Processing using rnx2rtkp in kinematic mode without dynamics: processing 1 epoch of data takes 1.42 ms

– Processing using rnx2rtkp in kinematic mode with dynamics:

– No LAPACK lib used: 14.2 ms/ep

– LAPACK lib used: 12.79 ms/ep

– With the optimized code (calculating only the non-zero elements in the matrix): 1.78 ms/ep


Hello,

Great blog, thank you for your effort!

For the Android users, here you can find the RTKGPS+ 2.4.3 based on the newest demo3 code:

mega.nz/#!Es8AnbCJ!kuudjjljijok_U2urA08oPra4nj_OLO3oNiaLcO42Q0 (RtkGps+DEMO3.apk)

The app is for x86 (Intel) CPUs only because, for some reason that I couldn’t identify, the 2.4.3 builds do not run well on ARM.

Maybe rtklibexplorer will also find a cure for this issue 😉
