(continued...)
Data races
Data races are one of the most frequent problems that pop up when transitioning from sequential to parallel programming. When you develop sequential code you usually make totally different assumptions about how your code will execute. It may require some imagination to understand how your code may behave in a multi-threaded environment and to anticipate which problems may occur.
Data races may result in truly unpredictable application behavior – from occasionally correct results, to non-reproducible incorrect results, to spontaneous crashes. If you observe such behavior, it may well be due to a data race in your code or in a third-party library. To catch it you can use Intel Parallel Inspector, which is able to identify various memory and threading errors. Here is how it looks when catching an error in Open CASCADE:
Open CASCADE contains quite a few places prone to data races if used in parallel applications as is. I first mentioned data races in Open CASCADE when experimenting with IGES translation last year (search the forum threads on that). Working on the ACIS importer revealed a few more. The most frequent examples are the following:
a). Returning a const reference to a static variable defined inside a function. For instance:
const gp_Dir& gp::DX()
{
  static gp_Dir gp_DX(1,0,0);
  return gp_DX;
}
This works perfectly fine in a single-threaded application but may create a problem in a parallel one. When two threads call gp::DX() simultaneously for the first time, there is a concurrent write access to the gp_DX variable and the result is unpredictable. The fix in this case is to move the definition outside of the function, so that the variable is initialized at load time:
static gp_Dir gp_DX(1,0,0);

const gp_Dir& gp::DX()
{
  return gp_DX;
}
b). Use of static variables defined at file scope.
These cases range from forced use, when a variable cannot be part of an argument list or a class member, to ugly usages that I can only attribute to a misunderstanding of C++. The fixes varied accordingly – from using Standard_Mutex where it was unavoidable to creating automatic variables on the stack.
An example of the first case is GeomConvert_ApproxCurve.cxx, which defines a C-style function that must use data defined outside of it (e.g. data belonging to the class that calls it). The signature of this function is prescribed elsewhere, and the function itself is passed as an argument to another class.
static Handle(Adaptor3d_HCurve) fonct = NULL;
static Standard_Real StartEndSav[2];

extern "C" void myEval3d (Standard_Integer* Dimension, // Dimension
                          Standard_Real*    StartEnd,  // StartEnd[2]
                          Standard_Real*    Param,     // Parameter at which evaluation is done
                          Standard_Integer* Order,     // Derivative request
                          Standard_Real*    Result,    // Result[Dimension]
                          Standard_Integer* ErrorCode) // Error code
{
  ...
  StartEndSav[0] = StartEnd[0];
  StartEndSav[1] = StartEnd[1];
  ...
}
In this case I had to protect fonct and StartEndSav with a Standard_Mutex in the caller class to prevent the data race. Reading the Open CASCADE 6.3.1 Release Notes, I noticed that this issue has been addressed: the OCC team made deeper changes, introducing a class (instead of a C function) that handles the data transfer without involving synchronization. This is good, and I will be able to roll back my fixes.
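For illustration, here is a minimal sketch of that mutex-based workaround. The class and method names are hypothetical; only Standard_Mutex and its Sentry RAII helper come from Open CASCADE:

#include <Standard_Mutex.hxx>
#include <Adaptor3d_HCurve.hxx>

static Standard_Mutex theEval3dMutex; // guards fonct and StartEndSav

void MyApproxCaller::Perform (const Handle(Adaptor3d_HCurve)& theCurve) // hypothetical caller
{
  // Sentry locks the mutex on construction and unlocks it on destruction,
  // so only one thread at a time touches the file-scope statics.
  Standard_Mutex::Sentry aSentry (theEval3dMutex);
  fonct = theCurve; // writes to the shared statics are now serialized
  // ... run the approximation algorithm that calls myEval3d() ...
}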
c). The special case of BSplCLib, BSplSLib and PLib.
This case was already discussed on this blog last December in the context of speeding up Boolean Operations. Let me briefly recap the issue. For B-Spline calculations, version 6.3.0 (and earlier) uses an auxiliary array of doubles allocated on the heap; it is reallocated if the newly requested size is greater than the previously allocated one, or simply reused if the already allocated buffer suffices. This is done to avoid frequent allocation/deallocation, which would create a bottleneck, but it is obviously not thread-safe.
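To make the hazard concrete, a minimal sketch of that grow-and-reuse pattern could look like this (the names are illustrative, not the actual Open CASCADE code):

#include <cstddef>

// Illustrative sketch of a file-scope buffer grown on demand and reused across calls.
static double* theBuffer     = 0;
static size_t  theBufferSize = 0;

static double* GetWorkBuffer (const size_t theSize)
{
  if (theSize > theBufferSize)
  {
    delete [] theBuffer;                  // free the old, smaller buffer
    theBuffer     = new double [theSize]; // unsynchronized write - the data race
    theBufferSize = theSize;
  }
  return theBuffer; // two threads resizing concurrently corrupt each other's data
}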
Discussing it back then, Andrey Betenev suggested using automatic variables (allocated on the stack), since there is a limit on the maximum B-Spline degree. Digging into the code, I realized that although this would work in 99+% of cases, there can still be cases where any maximum threshold is exceeded. This is because the number of poles in a B-Spline is unlimited (and there are curves featuring more than 8192 poles!). So I implemented an improved alternative that uses an automatic buffer if the requested size is below a threshold and falls back to the heap otherwise.
template <class T> class BSplCLib_DataBuffer
{
public:
  // 8K * sizeof (double) = 64K
  static const size_t MAX_ARRAY_SIZE = 8192;

  BSplCLib_DataBuffer (const size_t theSize) : myHeap (0), myP (myAuto)
  {
    Allocate (theSize);
  }

  BSplCLib_DataBuffer () : myHeap (0), myP (myAuto) {}

  virtual ~BSplCLib_DataBuffer()
  {
    Deallocate();
  }

  void Allocate (const size_t theSize)
  {
    Deallocate();
    if (theSize > MAX_ARRAY_SIZE)
      myP = myHeap = new T [theSize]; // large request - use the heap
    else
      myP = myAuto;                   // small request - use the automatic buffer
  }

  operator T* () const
  {
    return myP;
  }

protected:
  // copying is forbidden (declared protected, never used)
  BSplCLib_DataBuffer (const BSplCLib_DataBuffer&) : myHeap (0), myP (myAuto) {}
  BSplCLib_DataBuffer& operator= (const BSplCLib_DataBuffer&) { return *this; }

  void Deallocate()
  {
    if (myHeap)
    {
      delete [] myHeap; // must match the new T [] allocation
      myHeap = myP = 0;
    }
  }

  T  myAuto [MAX_ARRAY_SIZE];
  T* myHeap;
  T* myP;
};
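A hypothetical usage inside a B-Spline routine might then look as follows (the function name and its contents are illustrative only):

// Illustrative sketch; EvalSomething and theNbPoles are hypothetical names.
void EvalSomething (const size_t theNbPoles)
{
  // stays on the stack for small sizes, transparently falls back to the heap otherwise
  BSplCLib_DataBuffer<double> aBuffer (theNbPoles);
  double* anArray = aBuffer; // implicit conversion via operator T*()
  for (size_t i = 0; i < theNbPoles; ++i)
    anArray[i] = 0.;
  // ... perform the actual B-Spline computation on anArray ...
}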
(to be continued...)