The great convenience of standard collections for a developer is that they free him/her from extra efforts on managing underlying memory allocations.
It is so convenient to just write
{ std::vector<int> v (n); for (int i = 0; i < n; ++i) v[i] = i; }
or likewise for OCC fixed size array:
{ TColStd_Array1OfReal a (0, n-1); for (int i = 0; i < n; ++i) a (i) = sqrt (double (i)); }
and be done. Who cares where the memory for the underlying elements gets allocated? Not a big deal, right?
Well, it can become a big deal if you have to deal with numerous allocations/deallocations that become hotspots.
I was recently improving thread-safety of a few core Open CASCADE algorithms: approximation, intersection, etc. (see #23952), and observed an interesting pattern that motivated this blog post.
Some OCC developer(s) realized in the past that memory allocations can be expensive in hot loops and worked around this by allocating in static memory. Here is an excerpt from ApproxInt_ImpPrmSvSurfaces::Compute() (ApproxInt_ImpPrmSvSurfaces.gxx):
static math_Vector BornInf (1, 2), BornSup (1, 2), F (1, 1), X (1, 2), Tolerance (1, 2);
static math_Matrix D (1, 1, 1, 2);
This results in a single allocation (upon first entry into this function) that persists until program termination.
Obviously, that does not work in a multi-threaded environment. Two or more threads concurrently calling the same method would create data races, and the results would be unpredictable (wrong and/or unstable results, sporadic crashes, etc.).
Substituting with local variables
math_Vector BornInf (1, 2), BornSup (1, 2), F (1, 1), X (1, 2), Tolerance (1, 2);
math_Matrix D (1, 1, 1, 2);
would resolve the reentrancy problem but would create another one: continuous memory allocation/deallocation would undermine performance, as each constructor and destructor would call the new[] and delete[] operators.
The good news is that Open CASCADE provides constructors for the math_* and TCollection_Array* classes that accept an extra parameter: the address of already allocated memory. In this case they simply reuse that memory and only provide access to it; nor do they attempt to free it upon their own destruction. This makes it possible to allocate data on the stack (at essentially zero performance cost) and have the respective objects use it, while preserving source compatibility with the rest of the code.
Here is a fix that uses this approach:
Standard_Real aBornInf[2], aBornSup[2], aF[1], aX[2], aTolerance[2];
math_Vector BornInf (aBornInf, 1, 2), BornSup (aBornSup, 1, 2), F (aF, 1, 1), X (aX, 1, 2), Tolerance (aTolerance, 1, 2);
Standard_Real aD[1][2];
math_Matrix D (aD, 1, 1, 1, 2);
You might want to take advantage of this simple technique whenever you deal with small objects and know the required sizes in advance.