This is a short continuation of the series on parallelism and Open CASCADE that I started last year. This post is about just one small group of classes, but they are fundamental enough to deserve a separate mention.
The classes in question are Geom_BSplineCurve, Geom_BSplineSurface, Geom2d_BSplineCurve, Geom_BezierCurve, Geom_BezierSurface and Geom2d_BezierCurve. They implement B-Spline and Bezier curves and surfaces.
Supposedly to speed up the calculation of points and derivatives, they use a cache that stores the data computed for the most recently evaluated B-Spline segment. In 2008, when I was working on Extrema performance improvements, I ran some experiments to measure how efficient this cache is. I tried several different workloads (Boolean operations, curve projections, IGES and STEP imports, and several others), but the cache miss rate was ~50%-75%. That is, in more than half of the cases the cache had to be flushed and recalculated. This made me wonder whether the technique makes sense at all if even such computation-intensive algorithms do not really take advantage of it. Of course, to claim that, I would have had to conduct more thorough experiments, which I did not have time for. So I just left the issue alone. (But if anyone has some insights, I would be interested to hear them.)
The issue popped up again a year later (December 2009 or early January 2010), when I was preparing a public release of the ACIS converter of CAD Exchanger. Until then the converter had been parallelized and took advantage of multi-core machines by performing translation in multi-threaded mode. However, on some models I was getting sporadic crashes, which made me wonder about the root cause. As the symptoms looked like data races, I launched Intel Parallel Inspector to check.
Imagine my surprise when I saw that the root cause was ... Geom_BSplineCurve::D0()! My first impression was that Inspector was probably being overzealous about OCC and was plain wrong, as D0() just calculates a point on the curve. Moreover, it's a const method. How can there be a data race between two const methods? But after looking deeper at the call stacks and the source code, and applying some brain power, I realized that Inspector was right. Yep! D0() first casts the const this pointer to a non-const pointer and then calls a non-const method:
Geom_BSplineCurve * MyCurve = (Geom_BSplineCurve *) this ;
MyCurve->ValidateCache(NewU) ;
So, what does that mean? It means that when the same B-Spline curve is used in two or more threads, they read and write the cache without synchronization, overwriting each other's data and creating a data race. Obviously, the value returned by D0() can then be totally wrong, which leads to subsequent crashes in the algorithm that calls it (e.g. projection of a 3D curve on a B-Spline surface). As this situation (simultaneous use of the same curve or surface object in different threads) is rather rare, crashes did not always happen.
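To make the pattern easier to see in isolation, here is a minimal sketch. It is not Open CASCADE code: ToyCurve, its slope table and its one-span cache are hypothetical stand-ins that I made up for illustration, and only the cast-and-validate trick mirrors the snippet above.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy class, not Open CASCADE code: a piecewise-linear "curve"
// with one slope per unit-length span, plus a cache for the last span used.
class ToyCurve {
public:
    explicit ToyCurve(std::vector<double> slopes) : mySlopes(std::move(slopes)) {}

    // Looks read-only to the caller, yet rewrites the cache on a span change.
    double D0(double u) const {
        ToyCurve* self = const_cast<ToyCurve*>(this);  // same trick as in D0() above
        self->ValidateCache(u);                        // unsynchronized write
        // If another thread calls D0() now, mySpan and mySlope may already
        // belong to a different span by the time they are read here.
        return mySlope * (u - static_cast<double>(mySpan));
    }

private:
    void ValidateCache(double u) {
        const int span = static_cast<int>(std::floor(u));
        if (span != mySpan) {  // cache miss: recompute data for the new span
            mySpan = span;
            mySlope = mySlopes.at(static_cast<std::size_t>(span));
        }
    }

    std::vector<double> mySlopes;  // immutable "geometry"
    int mySpan = -1;               // cached span index (-1: nothing cached yet)
    double mySlope = 0.0;          // cached data for mySpan
};
```

In a single thread this is perfectly fine. With two threads evaluating different spans of the same object, one thread can read mySlope from one span and mySpan from another, which is exactly the kind of garbage value described above.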
I did not have time to fully solve the problem before that release, so I simply had to disable parallelism in the ACIS converter. Now I'm returning to this issue, and I'm going to copy the objects before they are used in multiple threads (this will add some overhead, but it is a 'cold path', as such cases should still be rare).
So if you are considering using Open CASCADE in multiple threads, beware of this potential issue. Make sure that access to shared B-Splines is protected (e.g. by synchronization) or that each thread works on its own independent copy.
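Both options can be sketched on a toy example. Everything below (CachedCurve, EvaluateWithCopies, LockedCurve) is hypothetical and written only for illustration; it is not OCC's API. The cache fields are declared mutable here, which is the more idiomatic spelling of the const cast and has exactly the same thread-safety problem. In Open CASCADE itself, Geom_Geometry::Copy() is the call I would expect to use to get an independent copy.

```cpp
#include <cmath>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical toy curve with a hidden one-span cache; D0() is const but not
// thread-safe, because it updates the mutable cache fields.
class CachedCurve {
public:
    explicit CachedCurve(std::vector<double> slopes) : mySlopes(std::move(slopes)) {}
    double D0(double u) const {
        const int span = static_cast<int>(std::floor(u));
        if (span != mySpan) {  // cache miss
            mySpan = span;
            mySlope = mySlopes.at(static_cast<std::size_t>(span));
        }
        return mySlope * (u - static_cast<double>(mySpan));
    }
private:
    std::vector<double> mySlopes;
    mutable int mySpan = -1;
    mutable double mySlope = 0.0;
};

// Option 1: one independent copy per thread. The copies are made in the
// launching thread, before any worker starts, so copying itself cannot race.
std::vector<double> EvaluateWithCopies(const CachedCurve& shared,
                                       const std::vector<double>& params,
                                       std::size_t nbThreads) {
    std::vector<double> results(params.size());
    std::vector<CachedCurve> copies(nbThreads, shared);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nbThreads; ++t) {
        workers.emplace_back([&, t] {
            // Each worker touches only its own copy and its own result slots.
            for (std::size_t i = t; i < params.size(); i += nbThreads)
                results[i] = copies[t].D0(params[i]);
        });
    }
    for (std::thread& w : workers) w.join();
    return results;
}

// Option 2: one shared object, with every evaluation serialized by a mutex.
class LockedCurve {
public:
    explicit LockedCurve(CachedCurve curve) : myCurve(std::move(curve)) {}
    double D0(double u) const {
        std::lock_guard<std::mutex> lock(myMutex);
        return myCurve.D0(u);
    }
private:
    CachedCurve myCurve;
    mutable std::mutex myMutex;
};
```

Copies cost memory and some setup time but let the threads run without contention. The mutex keeps a single object but serializes every evaluation, which can easily eat up the gain from parallelism if D0() is called in a tight loop.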