
Resolve "(core): image iterator performs badly"

Flavien BRIDAULT requested to merge 779-core-image-iterator-performs-badly into dev

Description

The image iterator was optimized to match STL iterator performance as much as possible.

Several modifications were made:

  • the iterator no longer locks the image. Locking added an overhead of around 200%, even in release mode.
  • inlining is now forced, which improves performance in debug by around 50%.
  • the out-of-bounds assertions are no longer enabled by default. They can still be enabled by defining SIGHT_DEBUG_ITERATOR before including ImageIterator.hpp (it may also be necessary to temporarily disable the PCH in the current target if the file is already included in another file).
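
To illustrate the last two points, here is a minimal sketch of how forced inlining and opt-in bounds checks could be combined. It is not the actual ImageIterator.hpp code: only the SIGHT_DEBUG_ITERATOR name comes from this merge request, while checked_iterator_sketch and the SKETCH_* macros are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>

// Bounds checks are compiled in only when SIGHT_DEBUG_ITERATOR is defined
// before this point. SKETCH_ITERATOR_ASSERT is a hypothetical helper macro.
#ifdef SIGHT_DEBUG_ITERATOR
#define SKETCH_ITERATOR_ASSERT(cond) assert(cond)
#else
#define SKETCH_ITERATOR_ASSERT(cond) static_cast<void>(sizeof(cond))
#endif

// Compiler-specific forced inlining (assumed spelling, not taken from Sight).
#if defined(_MSC_VER)
#define SKETCH_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_FORCE_INLINE inline
#endif

// Hypothetical iterator over a contiguous buffer of `size` elements.
template<typename T>
class checked_iterator_sketch
{
public:
    checked_iterator_sketch(T* begin, std::size_t size) :
        m_current(begin), m_begin(begin), m_end(begin + size)
    {
    }

    SKETCH_FORCE_INLINE T& operator*() const
    {
        // Evaluated only when SIGHT_DEBUG_ITERATOR is defined.
        SKETCH_ITERATOR_ASSERT(m_current >= m_begin && m_current < m_end);
        return *m_current;
    }

    SKETCH_FORCE_INLINE checked_iterator_sketch& operator++()
    {
        ++m_current;
        return *this;
    }

private:
    T* m_current;
    T* m_begin;
    T* m_end;
};
```

A translation unit that wants the checks would put `#define SIGHT_DEBUG_ITERATOR` before the include. This is also why a precompiled header that already contains the file has to be disabled: otherwise the definition comes too late.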

Also, unrelated to performance, the design of the iterator was improved to handle constness. It no longer requires an extra template parameter for this, which makes the code easier to read.
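
One common way to get this kind of design is to carry constness in the element type itself, so that iterator<T> and iterator<const T> are the mutable and constant variants. The sketch below assumes C++17 and uses a hypothetical pixel_iterator_sketch class. It shows the general technique, not necessarily the one used in ImageIterator.hpp.

```cpp
#include <type_traits>

// Hypothetical iterator: constness comes from T itself, so no extra
// "is_const" template parameter is needed.
template<typename T>
class pixel_iterator_sketch
{
public:
    using value_type = std::remove_const_t<T>;
    using reference  = T&;

    explicit pixel_iterator_sketch(T* ptr) : m_ptr(ptr)
    {
    }

    // Allows iterator<U> -> iterator<const U>, never the reverse.
    template<typename U, typename = std::enable_if_t<std::is_same_v<const U, T>>>
    pixel_iterator_sketch(const pixel_iterator_sketch<U>& other) : m_ptr(other.get())
    {
    }

    reference operator*() const
    {
        return *m_ptr;
    }

    pixel_iterator_sketch& operator++()
    {
        ++m_ptr;
        return *this;
    }

    bool operator!=(const pixel_iterator_sketch& other) const
    {
        return m_ptr != other.m_ptr;
    }

    T* get() const
    {
        return m_ptr;
    }

private:
    T* m_ptr;
};
```

With this layout, `pixel_iterator_sketch<const int>` yields `const int&` on dereference, and a mutable iterator converts implicitly to the constant one.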

Closes #779 (closed)

How to test it?

You can profile some code of your choice before/after, with FW_PROFILE.

Some results

A test was added in ImageTest to benchmark and compare performance with the STL. We run two tests, one doing a std::for_each loop and the second doing a std::copy. An image of size 200x100x100 is created, and we compare the performance with a std::vector of the same size.
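
For reference, the STL side of such a benchmark could look roughly like the sketch below. The actual test lives in ImageTest and may differ. The element type (std::uint16_t), the single-run timing, and the names time_ms, stl_baseline_result and run_stl_baseline are all assumptions made for the example.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Runs `f` once and returns the elapsed wall-clock time in milliseconds.
template<typename F>
double time_ms(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct stl_baseline_result
{
    double for_each_ms;
    double copy_ms;
    std::uint64_t checksum; // consumed by the caller so the loops are not optimized away
};

// Times std::for_each and std::copy on vectors with as many elements as a
// 200x100x100 image (the pixel type is an assumption).
inline stl_baseline_result run_stl_baseline(
    std::size_t width  = 200,
    std::size_t height = 100,
    std::size_t depth  = 100)
{
    const std::size_t count = width * height * depth;
    std::vector<std::uint16_t> source(count);
    std::vector<std::uint16_t> target(count);

    stl_baseline_result result {};
    std::uint16_t value = 0;

    result.for_each_ms = time_ms(
        [&]
        {
            std::for_each(source.begin(), source.end(), [&value](std::uint16_t& v) { v = value++; });
        });

    result.copy_ms = time_ms(
        [&]
        {
            std::copy(source.begin(), source.end(), target.begin());
        });

    for(const auto v : target)
    {
        result.checksum += v;
    }

    return result;
}
```

The image side of the comparison would run the same two algorithms over the image iterators instead of the vector iterators.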

Here is a summary of the results I get on my laptop (Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz):

                for_each Debug  for_each Release  copy Debug  copy Release
STL             20.407          2.436             1.647       1.138
before (clang)  20.586          2.463             24.682      7.971
after (clang)   12.687          2.42              9.291       5.512
after (gcc)     8.78            -                 8.93        -

Debug performance improved significantly in both tests, which was our main goal. Surprisingly, in Debug our iterator is even faster than the STL in the std::for_each test.

We also do not manage to match STL performance for std::copy. This is expected, because the STL optimizes the loop into a call to memmove. At the moment, we have found no way to make the STL apply the same optimization to our iterator. It seems we would need to specialize std::__niter(), but our attempt failed, and it does not appear to be meant as an extension point anyway. However, we do not consider this a real issue, since the developer can still call std::memcpy directly and get the same performance.

Finally, for the std::for_each test, performance in Release mode was already satisfactory before these changes and did not improve: we are at the same level as the STL.

Edited by Flavien BRIDAULT
