So, I dove into the code and found that the constructors were plain initializers of internal fields (e.g. double, integer, pointer). All three constructors look similar. Unwinding the stacks revealed the root cause – objects were created with the new[] operator (e.g. ptr = (void*)(new IntPolyh_Triangle[N])), which was invoked for a *huge* number of elements. Look at this:
Each IntPolyh_MaillageAffinage constructor creates arrays of 10 000 points, 20 000 triangles, and 30 000 edges for each face. Are they all really used? With the debugger I stepped through all the places where these arrays are filled in, and what did I find? Very often they are filled with fewer than 100 elements. A few dozen effectively used while allocating tens of thousands?! Unbelievable!
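To make the cost concrete, here is a minimal sketch (MockTriangle is a made-up stand-in for IntPolyh_Triangle, and plain malloc stands in for Standard::Allocate): new[] runs the constructor for every one of the N elements, while raw allocation reserves the memory without constructing anything.

```cpp
#include <cstdlib>

// Hypothetical stand-in for IntPolyh_Triangle: a constructor that just
// zero-initializes plain fields, like the real ones in IntPolyh.
static int g_ctor_calls = 0;
struct MockTriangle {
    int    p1, p2, p3;
    double fleche;
    MockTriangle() : p1(-1), p2(-1), p3(-1), fleche(0.0) { ++g_ctor_calls; }
};

// The pattern found in the code: new[] calls the constructor n times,
// even if only a handful of elements will ever be used.
int ctorCallsWithNewArray(int n) {
    g_ctor_calls = 0;
    MockTriangle* a = new MockTriangle[n];
    int calls = g_ctor_calls;
    delete[] a;
    return calls;
}

// Raw allocation (what an allocator such as Standard::Allocate
// effectively does): memory is reserved, but no constructors run
// until elements are actually needed.
int ctorCallsWithRawAlloc(int n) {
    g_ctor_calls = 0;
    void* p = std::malloc(n * sizeof(MockTriangle));
    int calls = g_ctor_calls;
    std::free(p);
    return calls;
}
```

With 20 000 triangles per face, the first path pays for 20 000 constructor calls per constructed object even when fewer than 100 elements end up used; the second pays for none.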
Two additional observations:
1. The initialized elements in the arrays are never read; they are always overwritten first.
2. The effective number of elements can easily be calculated upfront (e.g. n * m).
So, I looked into all the IntPolyh classes to check whether this was a common usage model, so that I could easily fix it with deferred initialization using a particular number of elements. As usual, life is not as simple as it sometimes seems. Some classes (e.g. IntPolyh_ArrayOfEdges) implied that the number of effectively used elements can grow over time, and this feature was really used during mesh refinement. Moreover, I found that many classes implement the same array pattern – a number of allocated elements plus a number of effectively used ones. However, we already saw above how ineffective that strategy can be :-(.
IntPolyh contains 7 IntPolyh_Array* classes that implement this pattern, each as a separate copy of essentially the same code.
So, I went and created a single generic class IntPolyh_DynamicArray, which allocates memory with Standard::Allocate() (in order not to call new and the constructors) and can grow over time if the previously allocated memory is not enough. All 7 classes became instances of this template, which significantly reduces the amount of code to maintain.
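A minimal sketch of the idea (assumed names, not the actual IntPolyh_DynamicArray code; plain malloc stands in for Standard::Allocate): raw memory is sized to the requested number of elements, the allocated and effectively used counts are tracked separately, and the buffer grows only when capacity is exhausted. This works because the element types are plain structs of scalar fields, so no constructors need to run at allocation time.

```cpp
#include <cstdlib>
#include <cstring>

// Sketch of the pattern behind IntPolyh_DynamicArray (names assumed).
// T must be trivially copyable, so raw memory plus memcpy-based growth
// is enough and no constructors are called on allocation.
template <typename T>
class DynamicArray {
    T*  myData;
    int myAllocated;  // number of allocated elements (capacity)
    int myUsed;       // number of effectively used elements
public:
    DynamicArray() : myData(0), myAllocated(0), myUsed(0) {}
    ~DynamicArray() { std::free(myData); }

    // Deferred initialization: called once the effective size is known
    // (e.g. the n * m computed upfront).
    void Init(int theN) {
        std::free(myData);
        myData = static_cast<T*>(std::malloc(theN * sizeof(T)));
        myAllocated = theN;
        myUsed = 0;
    }

    // Append an element, growing the raw buffer if needed -- this is
    // what mesh refinement relies on.
    void Append(const T& theValue) {
        if (myUsed == myAllocated) {
            int aNewCap = myAllocated ? 2 * myAllocated : 8;
            T* aNewData = static_cast<T*>(std::malloc(aNewCap * sizeof(T)));
            std::memcpy(aNewData, myData, myUsed * sizeof(T));
            std::free(myData);
            myData = aNewData;
            myAllocated = aNewCap;
        }
        myData[myUsed++] = theValue;
    }

    int NbItems() const { return myUsed; }
    const T& Value(int theIndex) const { return myData[theIndex]; }
};
```

Both requirements from above are satisfied: deferred sizing via Init() for the common case, and growth via Append() for IntPolyh_ArrayOfEdges-style refinement.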
Next, I deferred initialization of these arrays to the point where the number of elements to fill them with is known (e.g. IntPolyh_MaillageAffinage::FillArrayOfPnt()).
There are other easy improvements possible, such as inlining all relevant methods of _Point, _Edge, etc. I did not do this for now and leave it up to the Open CASCADE team.
After these modifications the time attributed to TKGeomAlgo.dll (measured with Amplifier) decreased by as much as 5x–18x (from the original 36.07s down to 2–7.28s)!!! The overall speed-up was about 3.5x (see the screenshot below).
So, this was a relatively ‘low-hanging fruit’ to pick. And it gave such impressive results. I believe there is more that can be done to improve, and I encourage the OCC folks to do more, up to and including multi-threading. I don’t know the BOP internals, but I would check whether running the face-face intersections in parallel threads is feasible.
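Purely as a thought experiment (I have no idea whether BOP's data structures allow it, and intersectFaces() below is a made-up placeholder, not any real OCC routine), the generic shape of such a parallelization could look like this: if each face-face pair is independent, the pairs can simply be dispatched to worker threads.

```cpp
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical sketch of parallel pairwise face intersection.
// Face and Intersection are toy types; intersectFaces() stands in
// for the real (and presumably expensive) geometric routine.
struct Face { int id; };
struct Intersection { int f1, f2; };

Intersection intersectFaces(Face a, Face b) {
    // placeholder for the real geometric work
    Intersection r = { a.id, b.id };
    return r;
}

// Dispatch every face pair to a worker thread; this only makes sense
// if the real intersection routine has no shared mutable state.
std::vector<Intersection> intersectAllPairs(const std::vector<Face>& faces) {
    std::vector<std::future<Intersection> > jobs;
    for (std::size_t i = 0; i < faces.size(); ++i)
        for (std::size_t j = i + 1; j < faces.size(); ++j)
            jobs.push_back(std::async(std::launch::async,
                                      intersectFaces, faces[i], faces[j]));
    std::vector<Intersection> results;
    for (std::size_t k = 0; k < jobs.size(); ++k)
        results.push_back(jobs[k].get());
    return results;
}
```

Whether the intersection code is thread-safe (shared caches, global state, allocator contention) is exactly the kind of question only someone who knows the BOP internals can answer.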
However, this was not the end of my own research.
(to be continued)