Skip to content

enh(core): mesh iterator performs badly

Flavien BRIDAULT requested to merge 625-core-mesh-iterator-performs-badly into dev

Description

Eventually, the data::Mesh class was severely modified. All the deprecated API was removed. The iterator is a completely new one, that is common to the data::Array, data::Image and data::Mesh classes. The only difference in the class Mesh is that you can iterate over multiples attributes at will, using the zip_range() function. It just creates a boost::zip_iterator, which is indeed the same concept as the zip() built-in function in Python.

Our iterators can loop over a single array or multiple arrays, thanks to boost::zip_iterator.

Example to iterate over points:

   data::Mesh::sptr mesh = data::Mesh::New();
   mesh->resize(25, 33, data::Mesh::CellType::TRIANGLE);

   float p[3] = {12.f, 16.f, 18.f};

   for (auto& pt = mesh->range<data::iterator::point::xyz>())
   {
       pt.x = p[0];
       pt.y = p[1];
       pt.z = p[2];
   }

Example to iterate over cells:

    data::Mesh::sptr mesh = data::Mesh::New();
    mesh->resize(25, 33, data::Mesh::CellType::TRIANGLE);

    auto itrPt = mesh->begin<data::iterator::point::xyz>();
    float p[3];

    for(const auto& cell : mesh->range<data::iterator::cell::triangle>())
    {
        for(size_t i = 0 ; i < 3 ; ++i)
        {
            const auto pIdx = cell.pt[i];

            auto& pointItr(itrPt + pIdx);
            p[0] = pointItr->x;
            p[1] = pointItr->y;
            p[2] = pointItr->z;
        }
    }

The iterators are compatible with all STL algorithm functions, for example std::copy.

    void copyPoints(const data::Mesh& origin, const data::Mesh& dest)
    {
        SIGHT_ASSERT("Meshes must have the same number of points",
                   origin.getNumberOfPoints() == dest.getNumberOfPoints());

        auto origIt = origin.begin< data::Mesh::iterator::xyz >();
        auto origEnd = origin.end< data::Mesh::iterator::xyz >();
        auto destIt = dest.begin< data::Mesh::iterator::xyz >();
        std::copy(origIt, origEnd, dest);
    }

Last but not least, it is also possible to get an iterator over multiple attributes using the zip_range() function. Coupled with C++17 structured bindings, this makes such loops fairly elegant.

    uint32_t count = 0;
    for(auto&& [p, n, c, uv] : mesh->zip_range<point::xyz, point::nxyz, point::rgba, point::uv>())
    {
        p.x = static_cast<float>(3 * count);
        p.y = static_cast<float>(3 * count + 1);
        p.z = static_cast<float>(3 * count + 2);

        n.nx = static_cast<float>(3 * count + 1);
        n.ny = static_cast<float>(3 * count + 2);
        n.nz = static_cast<float>(3 * count + 3);

        c.r = static_cast<std::uint8_t>(4 * count);
        c.g = static_cast<std::uint8_t>(4 * count + 1);
        c.b = static_cast<std::uint8_t>(4 * count + 2);
        c.a = static_cast<std::uint8_t>(4 * count + 3);

        uv.u = static_cast<float>(2 * count);
        uv.v = static_cast<float>(2 * count + 1);
        ++count;
    }

Using boost::combine you can even realize awesome loops over multiples meshes and multiples attributes !

    const auto range1 = mesh->czip_range<point::xyz, point::rgba, point::nxyz>();
    const auto range2 = mesh2->czip_range<point::xyz, point::rgba, point::nxyz>();

    for(const auto& [orig, cur] : boost::combine(range1, range2))
    {
        const auto& [pt1, c1, n1] = orig;
        const auto& [pt2, c2, n2] = cur;

        CPPUNIT_ASSERT_EQUAL(pt1.x, pt2.x);
        CPPUNIT_ASSERT_EQUAL(pt1.y, pt2.y);
        CPPUNIT_ASSERT_EQUAL(pt1.z, pt2.z);

        CPPUNIT_ASSERT_EQUAL(c1.r, c2.r);
        CPPUNIT_ASSERT_EQUAL(c1.g, c2.g);
        CPPUNIT_ASSERT_EQUAL(c1.b, c2.b);
        CPPUNIT_ASSERT_EQUAL(c1.a, c2.a);

        CPPUNIT_ASSERT_EQUAL(n1.nx, n2.nx);
        CPPUNIT_ASSERT_EQUAL(n1.ny, n2.ny);
        CPPUNIT_ASSERT_EQUAL(n1.nz, n2.nz);
    }

Performances are discussed in the results section below.

Other changes

The array, image and mesh APIs have been cleaned from all the deprecated stuff. Consistency has been improved. For instance, all setters concerning the number of elements or type of arrays were removed, in favor of a single resize() function. Indeed we try to avoid this state-machine-like behavior we add before where we were supposed to call the correct setters before allocating memory.

An important change is also that it is no longer possible to mix different cell types in a mesh. We agreed we no longer have this need, and this simplified a lot the internal code.

data::Mesh::adjustAllocatedMemory was renamed into shrinkToFit(), but I am opened to any other proposition. A truncate() method has been added to reduce the number of points and cells without reallocating (for dynamic point clouds for instance). This is a better way to express this need, rather than setting directly the number of points and cells. This way, we can at least assert that the number of points and cells requested is lower than the allocated size.

All functions returning a number of something were renamed from the VTK-ish getNumberOfXX() to the simpler numXX().

Closes #625 (closed)

How to test it?

Of course, all tests run, so you may just test some different applications. Load some meshes and even some images.

Some results

Test std::for_each with a loop of 1000 iterations over 655360 points, results in ms.

Pseudo-code used for the inner loop:

p.x          = p.x > 2.f ? 5.f : 1.f;
p.y          = 2.f;
p.z          = 3.f;
n.nx         = -1.f;
n.ny         = -1.f;
n.nz         = 1.f;
Gcc Debug Clang Debug MSVC Debug Gcc Debug -Og Clang Debug -Og MSVC Debug /Ox Gcc Release MSVC Release
old iterator 7497 5869 3850 465 4312 574 210 326
new iterator 24515 17506 70046 321 11646 191 140 187

In this test, we can see that the new iterator is really bad in pure debug mode. This could be explained by the fact that boost::zip_iterator has a lot of template code, that is not inlined in this configuration. However, in optimized debug builds, the new iterator performs better than the previous one, by an average of 30%, except on Clang where it is actually way worse. This is because Clang optimized debug mode is really poorly optimized. Last, in release mode, the same 30% can be seen on GCC and even better a 45% on MSVC.

The situation is not perfect, we would have preferred to have a full debug experience with good performances but that seems to be an impossible trade. The only way would have been to redevelop the boost::zip_iterator, force the inlining everywhere, and cross our fingers. This did not seem wise.

Thus, we propose instead a new switch in sight_add_target() called FAST_DEBUG which allows enabling the optimized debug build per target. This switch is meant to be used in targets where speed is crucial, for image and mesh processing. Developers can temporarily remove the optimization on a target if a raw debug experience is needed. This should be better than our previous attempt with the debug optimized build, where the whole build was affected. Developers were not very happy to have to do a rebuild of sight when they needed a raw debug experience with only a small piece of code.

Last, ExRealSense has been fixed. Closes #754 (closed).

Edited by Flavien BRIDAULT

Merge request reports