Why are Boolean Operations so slooo...ooow? Part 2


(continued)

So, I dove into the code and found out that the constructors were plain initializers of internal fields (e.g. double, integer, pointer). All three constructors look similar. Unwinding the stacks revealed the root cause – objects were created with the new[] operator (e.g. ptr = (void*) (new IntPolyh_Triangle [N])), which was called for a *huge* number of objects. Look at this:



Each IntPolyh_MaillageAffinage constructor creates arrays of 10 000 points, 20 000 triangles, and 30 000 edges for each face. Are they all really used? With the debugger I stepped through all the places where these arrays are filled in, and what did I find? Very often they are filled with fewer than 100 elements. A few dozen effectively used while tens of thousands are allocated?! Unbelievable!
Two additional observations:
1. The initialized elements of the arrays are never read before being overwritten, so the constructor work is wasted.
2. The effective number of elements can easily be calculated upfront (e.g. n * m).
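
To make the picture concrete, here is a minimal sketch of the pattern (my own simplified stand-ins, not the actual OCCT code): new[] default-constructs every element of a large fixed-size array, even though only a small fraction is ever used.

    // Simplified stand-in for IntPolyh_Triangle: a constructor that just
    // initializes plain fields, multiplied by tens of thousands of calls.
    class MyTriangle
    {
    public:
      MyTriangle() : myP1 (-1), myP2 (-1), myP3 (-1) {}
    private:
      int myP1, myP2, myP3;
    };

    int main()
    {
      const int N = 20000;                    // allocated capacity per face
      MyTriangle* arr = new MyTriangle[N];    // N constructor calls up front
      const int used = 100;                   // typical effective usage
      for (int i = 0; i < used; ++i)
        arr[i] = MyTriangle();                // initial values are overwritten anyway
      delete[] arr;
      return 0;
    }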

So, I looked into all the IntPolyh classes to check whether this is a common usage model, so that I could easily fix it with deferred initialization using a known number of elements. As usual, life is not as simple as it sometimes seems. Some classes (e.g. IntPolyh_ArrayOfEdges) assumed that the number of effective elements can grow over time, and this feature was really used during mesh refinement. Moreover, I found that many classes implement the same array pattern – one counter for allocated elements and another for effectively used ones. However, we already saw above how ineffective that strategy can be :-(.

IntPolyh contains 7 IntPolyh_Array* classes that implement this pattern, each duplicating essentially the same code.

So, I went and created a single generic class, IntPolyh_DynamicArray, which allocates memory with Standard::Allocate() (so as not to call new[] and constructors) and can grow over time if the previously allocated memory is not enough. All 7 classes became instances of this template, which significantly reduces the amount of code to maintain.
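
The actual class ships with the patch; below is only a rough sketch of the idea, under names of my own and with std::malloc()/std::free() standing in for Standard::Allocate()/Standard::Free(), to show how raw allocation plus an "allocated vs. used" pair replaces new[].

    #include <cstdlib>   // std::malloc/std::free, stand-ins for Standard::Allocate/Free
    #include <cstring>   // std::memcpy

    // Minimal sketch of a generic, growable array that never calls new[] or
    // element constructors. It assumes trivially copyable items (plain
    // double/int/pointer fields), which is the case in IntPolyh.
    template <class Item>
    class DynamicArray
    {
    public:
      DynamicArray() : myData (0), myAllocated (0), myUsed (0) {}
      ~DynamicArray() { std::free (myData); }

      // Deferred initialization: reserve capacity only when the effective
      // number of elements is known; grows if the previous block is too small.
      void Init (int theCapacity)
      {
        if (theCapacity <= myAllocated)
          return;
        Item* aNew = (Item*) std::malloc (theCapacity * sizeof (Item));
        if (myUsed > 0)
          std::memcpy (aNew, myData, myUsed * sizeof (Item)); // keep used elements
        std::free (myData);
        myData      = aNew;
        myAllocated = theCapacity;
      }

      Item&       operator[] (int i)       { return myData[i]; }
      const Item& operator[] (int i) const { return myData[i]; }

      int  NbItems () const       { return myUsed; }
      void SetNbItems (int theN)  { myUsed = theN; }

    private:
      Item* myData;
      int   myAllocated;   // number of allocated elements
      int   myUsed;        // number of effectively used elements
    };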

Next, I deferred initialization of these arrays until the number of elements to fill them with is known (e.g. in IntPolyh_MaillageAffinage::FillArrayOfPnt()).
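
As a usage illustration only (reusing the DynamicArray sketch above; Point, samplePoint and the function shape are my placeholders, not the actual FillArrayOfPnt() code), the idea is to size the array right before filling it, once the sampling grid dimensions are known:

    struct Point { double X, Y, Z; };          // stand-in for IntPolyh_Point

    static Point samplePoint (int i, int j)    // stand-in for surface sampling
    {
      Point aP = { double (i), double (j), 0.0 };
      return aP;
    }

    void fillArrayOfPnt (DynamicArray<Point>& thePoints, int theNbU, int theNbV)
    {
      const int aNbPoints = theNbU * theNbV;   // effective count known upfront (n * m)
      thePoints.Init (aNbPoints);              // allocate exactly what is needed
      thePoints.SetNbItems (aNbPoints);

      for (int i = 0; i < theNbU; ++i)
        for (int j = 0; j < theNbV; ++j)
          thePoints[i * theNbV + j] = samplePoint (i, j);
    }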

There are other easy improvements possible, such as inlining all the relevant methods of _Point, _Edge, etc. I did not do this for now and leave it up to the Open CASCADE team.
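
For what it's worth, that change is trivial: keeping one-line accessors in the header (or in an .lxx include, as OCCT headers usually do) lets the compiler inline them in the hot loops. A hypothetical example, not the actual IntPolyh code:

    // Stand-in for IntPolyh_Edge: trivial getters/setters defined in the
    // header become candidates for inlining instead of out-of-line calls.
    class Edge
    {
    public:
      Edge() : myFirstPoint (-1), mySecondPoint (-1) {}
      int  FirstPoint () const          { return myFirstPoint; }
      void SetFirstPoint (int theIndex) { myFirstPoint = theIndex; }
    private:
      int myFirstPoint, mySecondPoint;
    };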

After these modifications, the time attributed to TKGeomAlgo.dll (measured with Amplifier) decreased by as much as 5x-18x (from 2 to 7.28 sec vs the original 36.07 sec)!!! The overall speed-up was about 3.5x (see the screenshot below).




So, this was a relatively ‘low-hanging fruit’ to pick, and it gave impressive results. I believe there is more that can be done to improve, and I encourage the OCC folks to do more, up to and including multi-threading. I don’t know the BOP internals, but I would check whether running face-face intersections in parallel threads is feasible.

However, this was not the end of my own research.

(to be continued)


9 comments

  1. There's a little problem with the last screenshot: it does NOT "open" when clicked.

    Now, in my opinion: your findings and tests are really nice. I hope you will also provide links to patches that improve or fix code in OCC, like the one you're currently working on in this article.

    You're not only giving hope to OCC users, you're also demonstrating what a nice piece of software you got in your hands from Intel. You've convinced me, I'll subscribe :)

  2. Really nice job. But will the OCC team take your method into the new OCC version?

  3. Thanks! Well, I hope that the OCC team will take them and run them through their acceptance testing, and if they reveal no regressions, they will be integrated. Of course, it's up to them to decide on timing, so obviously no promises.
    I'm glad to see OCC folks here on the blog and their insightful comments, this makes me hope that the fixes won't be left unattended.

  4. What happened to the patch you submitted, as you described in http://www.opencascade.org/org/forum/thread_15003/?forum=3 ?

    Was it rejected? It would seem this would be a useful optimisation. The code for 6.5.0 still contains the fixed-size arrays:
    http://git.dev.opencascade.org/gitweb/?p=occt.git;a=blob;f=src/IntPolyh/IntPolyh_MaillageAffinage.cxx;h=17a85e8c4bba78ec0adf50ea631998091c0f4101;hb=HEAD

  5. I am not certain whether there was any detailed feedback from the team, so the patch remained without follow-up :-(.
    A vector-like (i.e. dynamically growing) container could also be used, e.g. NCollection_Vector.

    Replies
    1. I found the patch on sourceforge, but I admit that it is beyond me to apply it with my current knowledge. Have you considered contributing the improvements to the community edition, OCE? I for one would be interested in evaluating them to speed up my project - it's called Shapesmith (http://shapesmith.net)

  6. Benjamin, to provide a full-fledged patch one needs a more thoroughly tested solution. I did not and do not have time for that, nor do I have a test case database for BOPs.
    My intention was to provide a proof of concept showing that with the tools (Intel Amplifier) you can identify, root-cause, and address the bottlenecks. This seems not to have been encouraging enough for the OCC team to pursue - likely because it falls below their priority line.

  7. Well, thank you Roman for your posts! Very interesting!
