Why are Boolean Operations so slooo...ooow? Part 2


(continued)

So, I dove into the code and found out that the constructors were plain initializers of internal fields (e.g. double, integer, pointer). All three constructors look similar. Unwinding the stacks revealed the root cause – objects were created with the new[] operator (e.g. ptr = (void*) (new IntPolyh_Triangle [N])), which was called for a *huge* number of objects. Look at this:



Each IntPolyh_MaillageAffinage constructor creates arrays of 10 000 points, 20 000 triangles, and 30 000 edges for each face. Are they all really used? With the debugger I stepped through all the places where these arrays are filled in, and what did I find? Very often they are filled with fewer than 100 elements. A few dozen effectively used while tens of thousands are allocated?! Unbelievable!
Two additional observations:
1. The initialized elements of the arrays are never read before being overwritten, so the constructor work is wasted.
2. The effective number of elements can easily be calculated upfront (e.g. n * m).
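
To make the picture concrete, here is a minimal sketch of the pattern (my own simplified stand-ins, not the actual OCCT code): new[] default-constructs every element of a large fixed-size array, even though only a small fraction is ever used.

    // Simplified stand-in for IntPolyh_Triangle: a constructor that just
    // initializes plain fields, multiplied by tens of thousands of calls.
    class MyTriangle
    {
    public:
      MyTriangle() : myP1 (-1), myP2 (-1), myP3 (-1) {}
    private:
      int myP1, myP2, myP3;
    };

    int main()
    {
      const int N = 20000;                    // allocated capacity per face
      MyTriangle* arr = new MyTriangle[N];    // N constructor calls up front
      const int used = 100;                   // typical effective usage
      for (int i = 0; i < used; ++i)
        arr[i] = MyTriangle();                // initial values are overwritten anyway
      delete[] arr;
      return 0;
    }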

So, I looked into all the IntPolyh classes to check whether this is a common usage model, so that I could easily fix it with deferred initialization using a known number of elements. As usual, life is not as simple as it sometimes seems. Some classes (e.g. IntPolyh_ArrayOfEdges) assumed that the number of effective elements can grow over time, and this feature was really used during mesh refinement. Moreover, I found that many classes implement the same array pattern – one counter for allocated elements and another for effectively used ones. However, we already saw above how ineffective that strategy can be :-(.

IntPolyh contains 7 IntPolyh_Array* classes that implement this pattern, each duplicating essentially the same code.

So, I went and created a single generic class, IntPolyh_DynamicArray, which allocates memory with Standard::Allocate() (so as not to call new[] and constructors) and can grow over time if the previously allocated memory is not enough. All 7 classes became instances of this template, which significantly reduces the amount of code to maintain.
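
The actual class ships with the patch; below is only a rough sketch of the idea, under names of my own and with std::malloc()/std::free() standing in for Standard::Allocate()/Standard::Free(), to show how raw allocation plus an "allocated vs. used" pair replaces new[].

    #include <cstdlib>   // std::malloc/std::free, stand-ins for Standard::Allocate/Free
    #include <cstring>   // std::memcpy

    // Minimal sketch of a generic, growable array that never calls new[] or
    // element constructors. It assumes trivially copyable items (plain
    // double/int/pointer fields), which is the case in IntPolyh.
    template <class Item>
    class DynamicArray
    {
    public:
      DynamicArray() : myData (0), myAllocated (0), myUsed (0) {}
      ~DynamicArray() { std::free (myData); }

      // Deferred initialization: reserve capacity only when the effective
      // number of elements is known; grows if the previous block is too small.
      void Init (int theCapacity)
      {
        if (theCapacity <= myAllocated)
          return;
        Item* aNew = (Item*) std::malloc (theCapacity * sizeof (Item));
        if (myUsed > 0)
          std::memcpy (aNew, myData, myUsed * sizeof (Item)); // keep used elements
        std::free (myData);
        myData      = aNew;
        myAllocated = theCapacity;
      }

      Item&       operator[] (int i)       { return myData[i]; }
      const Item& operator[] (int i) const { return myData[i]; }

      int  NbItems () const       { return myUsed; }
      void SetNbItems (int theN)  { myUsed = theN; }

    private:
      Item* myData;
      int   myAllocated;   // number of allocated elements
      int   myUsed;        // number of effectively used elements
    };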

Next, I deferred initialization of these arrays until the number of elements to fill them with is known (e.g. in IntPolyh_MaillageAffinage::FillArrayOfPnt()).
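
As a usage illustration only (reusing the DynamicArray sketch above; Point, samplePoint and the function shape are my placeholders, not the actual FillArrayOfPnt() code), the idea is to size the array right before filling it, once the sampling grid dimensions are known:

    struct Point { double X, Y, Z; };          // stand-in for IntPolyh_Point

    static Point samplePoint (int i, int j)    // stand-in for surface sampling
    {
      Point aP = { double (i), double (j), 0.0 };
      return aP;
    }

    void fillArrayOfPnt (DynamicArray<Point>& thePoints, int theNbU, int theNbV)
    {
      const int aNbPoints = theNbU * theNbV;   // effective count known upfront (n * m)
      thePoints.Init (aNbPoints);              // allocate exactly what is needed
      thePoints.SetNbItems (aNbPoints);

      for (int i = 0; i < theNbU; ++i)
        for (int j = 0; j < theNbV; ++j)
          thePoints[i * theNbV + j] = samplePoint (i, j);
    }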

There are other easy improvements possible, such as inlining all the relevant methods of _Point, _Edge, etc. I did not do this for now and leave it up to the Open CASCADE team.
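
For what it's worth, that change is trivial: keeping one-line accessors in the header (or in an .lxx include, as OCCT headers usually do) lets the compiler inline them in the hot loops. A hypothetical example, not the actual IntPolyh code:

    // Stand-in for IntPolyh_Edge: trivial getters/setters defined in the
    // header become candidates for inlining instead of out-of-line calls.
    class Edge
    {
    public:
      Edge() : myFirstPoint (-1), mySecondPoint (-1) {}
      int  FirstPoint () const          { return myFirstPoint; }
      void SetFirstPoint (int theIndex) { myFirstPoint = theIndex; }
    private:
      int myFirstPoint, mySecondPoint;
    };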

After these modifications, the time attributed to TKGeomAlgo.dll (measured with Amplifier) decreased by as much as 5x-18x (from 2 to 7.28 sec vs the original 36.07 sec)!!! The overall speed-up was about 3.5x (see the screenshot below).




So, this was a relatively ‘low-hanging fruit’ to pick, and it gave impressive results. I believe there is more that can be done to improve, and I encourage the OCC folks to do more, up to and including multi-threading. I don’t know the BOP internals, but I would check whether running face-face intersections in parallel threads is feasible.

However, this was not the end of my own research.

(to be continued)


9 comments

  1. There's a little problem with the last screenshot: it does NOT "open" when clicked.

    Now, in my opinion: your findings and tests are really nice. I hope you will also provide links to patches that improve or fix code in OCC, like the one you're currently working on in this article.

    You're not only giving hope to OCC users, you're also demonstrating what a nice piece of software you got in your hands from Intel. You've convinced me, I'll subscribe :)

  2. Really nice job. But will the OCC team take your method into the new OCC version?

  3. Thanks! Well, I hope that the OCC team will take them and run them through their acceptance testing, and if they reveal no regressions, they will be integrated. Of course, it's up to them to decide on timing, so obviously no promises.
    I'm glad to see OCC folks here on the blog and their insightful comments, this makes me hope that the fixes won't be left unattended.

  4. What happened to the patch you submitted, as you described in http://www.opencascade.org/org/forum/thread_15003/?forum=3 ?

    Was it rejected? It would seem this would be a useful optimisation. The code for 6.5.0 still contains the fixed-size arrays:
    http://git.dev.opencascade.org/gitweb/?p=occt.git;a=blob;f=src/IntPolyh/IntPolyh_MaillageAffinage.cxx;h=17a85e8c4bba78ec0adf50ea631998091c0f4101;hb=HEAD

  5. I am not certain whether there was any detailed feedback from the team, so the patch remained without follow-up :-(.
    A vector-like (i.e. dynamically growing) container could also be used, e.g. NCollection_Vector.

    Replies
    1. I found the patch on sourceforge, but I admit that it is beyond me to apply it with my current knowledge. Have you considered contributing the improvements to the community edition, OCE? I for one would be interested in evaluating them to speed up my project - it's called Shapesmith (http://shapesmith.net)

  6. Benjamin, to provide a full-fledged patch one needs a more thoroughly tested solution. I did not and do not have time for that, nor do I have a test case database for BOPs.
    My intention was to provide a proof of concept showing that with the tools (Intel Amplifier) you can identify, root-cause, and address the bottlenecks. This seems not to have been encouraging enough for the OCC team to pursue - likely because it falls below their priority line.

  7. Well, thank you Roman for your posts! Very interesting!
