This article compares the performance of QuickMCL and AMCL when it comes to computational resources. It does not compare them for speed or quality of localisation, but these should be identical if a parameter set is used that is supported by both programs (e.g. likelihood field model).
Test methodology
The tests were made with a fixed number of particles, effectively disabling KLD sampling (Fox, 2003). This is needed to get any sort of useful results, since otherwise the results will not be reproducible.
The test consisted of playing back a ROS bag file with a recorded scenario and measuring the CPU usage & memory usage at the end. This was repeated 5 times per parameter value tested and the average was taken.
The CPU (Intel Core i7 i7‑8550U) was set to run in performance mode to minimise effects of CPU frequency scaling and optimisations (-O3) were enabled for GCC 5.4.0‑6ubuntu1~16.04.11.
For memory usage unique set size was measured, which is the amount of memory would be freed should the program exit at that point. This gives a fairer measurement than resident set size, virtual set size or similar measurements, since it will ignore shared memory-mapped resources such as the C standard library, but it will include unique libraries used by the program.
The tests were run with the bag file played back at an accelerated rate, after verifying that this did not change the results as long as the CPU load was below 100%, which was carefully monitored.
The code to re-create these results is available on github.
First, let us look at CPU performance
As can be seen, AMCL has quadratic complexity! Turns out this is from a poor algorithm choice in the resampler where it uses a linear array of the sum of particle weights, resulting in a complexity of \(O(m\cdot n)\) where m is the number of new particles and n is the number of old particles.
As a contrast, QuickMCL uses a binary tree for this, resulting in a complexity of \(O(m\cdot\log n)\)
Also of interest is that the conversion from LaserScan
to PointCloud2
is essentially free, but it should be noted that there is some overhead in running it as two programs. Thus it is only recommended to use external laser processing if you need the point cloud for something else anyway, or if you want to do more advanced processing, such as merging data from multiple lasers.
The overhead is even larger if we include the resource usage of both programs (not shown in this graph).
Also of note is that switching from the default float
to double
for pose components is essentially free when it comes to CPU usage.
Memory usage
Here, both implementations at least scale linearly, but AMCL does use significantly more memory, both for the baseline and per particle.
AMCL does have some structure members that are no longer in use (or only used in certain situations) as well as cases of inefficient memory layout resulting in unnecessary padding, but it seems strange it would cause such a huge difference.
Thus it isn’t clear what AMCL is doing that is causing it as even with internal laser processing, QuickMCL doesn’t even come close to it.
Also of note is that switching from the default float
to double
for pose components is essentially free for memory usage as well.
Conclusions
There are three main points to draw from this study:
- QuickMCL handily beats AMCL when it comes to resource usage, both for CPU and memory. AMCL is, of course, better tested and has more features (beam model, extra odometry models, etc).
- It might be worth making double precision the default for QuickMCL since it doesn’t appear that the difference between single and double precision is very large, at least on modern x86 CPUs.
- External laser processing only make sense if you need the point cloud for something else, or need to do more advanced processing (such as merging data from multiple lasers). Otherwise, you are better off using the internal laser processing.
Bibliography
Dieter Fox. Adapting the Sample Size in Particle Filters Through KLD-Sampling. The International Journal of Robotics Research, 22(12):985–1003, 12 2003. doi:10.1177/0278364903022012001. ↩