Extreme fight with Extrema. Part 3


However, this did not fully solve the problem of caching results. As I mentioned earlier, B-Splines are split into C2 intervals which are processed individually. For example, one B-Spline had 20 C2 intervals, so caching only worked within each of them, and every time a new subrange of the other edge's range was requested, all data were recalculated. So, I extended the caching capabilities to support lists of caches: every time a new subrange of the 1st edge's range is used, points along subranges of the 2nd edge's range are retrieved from the cache. After this modification, D0() is called only 475K times, which is about 570 times fewer than the initial baseline (273.2M). CPU time was reduced by more than 2x. Yes, I did it !
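The idea can be sketched in plain C++ (this is not the actual Extrema code; the class name, the double-pair key and the sampling formula are mine). A map keyed by the subrange bounds stores the sampled points, so repeated queries with the same subrange skip recomputation entirely. Exact floating-point keys are acceptable here because the same subrange bounds recur bit-for-bit:

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch: cache sampled "points" (here just parameter values)
// per subrange. Real code would also key on the C2-interval index.
class SampleCache
{
public:
  int recomputations = 0; // instrumentation, like the counter in the article

  const std::vector<double>& samplesFor (double theFirst, double theLast)
  {
    auto aKey = std::make_pair (theFirst, theLast);
    auto anIt = myCache.find (aKey);
    if (anIt != myCache.end())
      return anIt->second; // cache hit: no D0() evaluations at all

    ++recomputations;
    std::vector<double> aSamples;
    const int aNbPoints = 32; // default number of sample points per interval
    for (int i = 0; i < aNbPoints; ++i)
      aSamples.push_back (theFirst + (theLast - theFirst) * i / (aNbPoints - 1));
    return myCache.emplace (aKey, std::move (aSamples)).first->second;
  }

private:
  std::map<std::pair<double, double>, std::vector<double>> myCache;
};
```

With such a cache, only the first request for each subrange pays the sampling cost; all later requests are lookups.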

* New targets *
So, what remains ? Looking at new hotspots below, you can see that Extrema_ECCOfExtCC::Perform() remained among top 3.

What's inside ? Its top 3 hotspots (the 1st and 2nd are shown below; the 3rd is similar to the 2nd) are all connected with access to the array of doubles TbDist2 (containing square distances between sample points along two curves).

I suspect these are due to data latency, e.g. cache misses – when referenced data are not available in the processor cache and have to be loaded from RAM, which is time-consuming. When some data are loaded from memory, adjacent memory is also loaded by the processor (it assumes you may need some of it and fetches it in advance). With TbDist2 this does not quite help – for each element [i, j] eight neighbors are checked ([i-1,j-1], [i-1,j]…[i+1,j+1]), and data for most of them are not adjacent in memory, so there are likely many misses here. Due to millions of calls (which, as you correctly guess, did not change after the above modifications), this results in a hotspot. I am not sure anything can be done about this, and I will probably have to talk to my Intel colleagues to check what can be done in a case like this. Perhaps the only possible solution lies upstream, in reducing the number of subranges edges are split into. As of today, after brief emails with the OCC team, I have no clue whether or how this can be done. So, it will be up to them to continue in that direction.
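To illustrate why the neighbor accesses likely miss the cache, here is a minimal sketch (the function names and array sizes are mine, not OCC's). In a row-major 2D array flattened into one buffer, the three same-row neighbors of [i, j] are adjacent doubles, but the neighbors in rows i-1 and i+1 sit a full row of bytes away – almost certainly on different cache lines:

```cpp
#include <cstddef>

// Row-major flattening: element [i][j] of a theNbCols-wide 2D array lives at
// offset i * theNbCols + j in the flat buffer.
inline std::size_t rowMajorIndex (std::size_t i, std::size_t j, std::size_t theNbCols)
{
  return i * theNbCols + j;
}

// Byte distance between elements [i1][j1] and [i2][j2] of a flat array of
// doubles. Same-row neighbors are sizeof(double) bytes apart; neighbors in
// adjacent rows are a whole row apart, hence the likely cache misses.
inline std::size_t byteDistance (std::size_t i1, std::size_t j1,
                                 std::size_t i2, std::size_t j2,
                                 std::size_t theNbCols)
{
  const std::size_t anOfs1 = rowMajorIndex (i1, j1, theNbCols);
  const std::size_t anOfs2 = rowMajorIndex (i2, j2, theNbCols);
  return (anOfs1 > anOfs2 ? anOfs1 - anOfs2 : anOfs2 - anOfs1) * sizeof (double);
}
```

With, say, 640 columns, [i+1][j] is 640 * 8 = 5120 bytes away from [i][j], while a typical cache line is only 64 bytes.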

Two other hotspots – PLib::EvalPolynomial() and BSplCLib::CacheD1() – are connected with computing the first derivative of a B-Spline curve. Geom_BSplineCurve::D1() is called when finding a root of the function that minimizes the distance between two curves. BSplCLib::Bohm() is called 2.4M times and PLib::EvalPolynomial() – 30M times. I did not find a way to cache these calculations (if you see one, please let me know), and probably the solution is also upstream.

* Next steps *
So, this is where I am now, and frankly I am about to pause here ;-). I sent the modifications to OCC for validation on the regression database (keep your fingers crossed ;-)). If they are OK, I'm going to make similar modifications for the 2D case to ensure consistency, and perhaps do some other polishing in Extrema for easier future maintenance.

Of course, there is a possibility to try multi-threading on BOPs, but I am not yet up to this, as it will require a deeper understanding of the algorithms and their concurrency possibilities. We'll see…

Meanwhile I'll try to post the fixes on SourceForge if anyone would like to try them. As usual, any feedback will be appreciated, especially if you witness performance improvements or degradations.


Please rate this post using the voting buttons below.

After the first article I received a few more test cases, and even more – a set of modifications in BOP for performance improvements. They were made by one of OCC's customers, registered here as Solar Angel. Per his estimations, the speed up in Cut (the Common part seems unaffected) was from a few minutes to a matter of seconds. Cool! I was excited to get these fixes and to get in touch with the guy again after several years !

Extreme fight with Extrema. Part 2


The first such finding was sqrt(), which was among the top 5 hotspots. As it is a system function, I did not notice it until I pressed a button on the Amplifier toolbar (by default, Amplifier attributes system functions' time to their callers).

After unwinding its stacks I found the reason. Extrema used gp_Pnt::Distance() (or sometimes gp_Vec::Magnitude()) everywhere, both to compute exact distances and to find out if one point is closer than another. Well, such abuse has a price – 6.5% of total CPU time. Without sacrificing any correctness, distances can be replaced with square distances. Here is a hint for you – if you have multiple calculations, try to use SquareDistance() or SquareModulus() wherever possible instead of their counterparts invoking sqrt(). After making these modifications, I managed to save about 5-6% (I did not eliminate all the calls).
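The trick rests on sqrt() being monotonic: comparing squared distances gives exactly the same answer as comparing distances. A minimal self-contained sketch (a plain struct standing in for gp_Pnt; the names are mine):

```cpp
#include <cmath>

struct Pnt3 { double x, y, z; };

// Square distance: enough for comparisons, no sqrt needed.
inline double squareDistance (const Pnt3& a, const Pnt3& b)
{
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

// Exact distance: call sqrt only when the actual value is really required.
inline double distance (const Pnt3& a, const Pnt3& b)
{
  return std::sqrt (squareDistance (a, b));
}

// "Is a closer to p than b?" needs no sqrt at all: sqrt is monotonically
// increasing, so comparing the squares gives the same verdict.
inline bool isCloser (const Pnt3& p, const Pnt3& a, const Pnt3& b)
{
  return squareDistance (p, a) < squareDistance (p, b);
}
```

Only code that reports a distance to the user needs the sqrt; all the "which candidate is nearer" logic can stay in squared space.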

Another, finer improvement relates to Geom_BSplineCurve. As we already noticed and will see in more detail below, points along B-Spline curves were computed intensively. This involves Geom_BSplineCurve::IsCacheValid() (which checks if a local cache can be used for faster computations). It accounted for 6.25% of the time, and looking at the source and assembly code helped me understand why:

The reason is the division (the fdiv instruction) by spanlenghtcache used to normalize an original range into [0, 1). Actually, it could easily be avoided, and I simplified the code, also eliminating multiple branches to increase readability (and possibly performance, as branching is expensive). Compare:

This gave about a 60% time reduction in this function. Well, floating point operations are expensive, and even if subtraction is not as expensive as division, it still takes time.
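The same trick can be sketched outside of OCC (the names here are mine, not the actual Geom_BSplineCurve fields): instead of normalizing the parameter with a division on every validity check, compare it directly against the cached span bounds; and when the normalized value itself is needed, multiply by a reciprocal computed once per span:

```cpp
// Instead of checking  (u - spanStart) / spanLength  against [0, 1) on every
// call (an fdiv each time), compare u directly against the span bounds...
inline bool isCacheValid (double theU, double theSpanStart, double theSpanLength)
{
  return theU >= theSpanStart && theU < theSpanStart + theSpanLength;
}

// ...and when the normalized value is actually required, multiply by a
// reciprocal computed once when the span cache is (re)built.
struct SpanCache
{
  double start;
  double invLength; // 1.0 / spanLength, one division per span, not per call

  double normalized (double theU) const { return (theU - start) * invLength; }
};
```

The division still happens, but once per span instead of once per evaluation, which matters when D0()/D1() are called millions of times per span.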

*Calculations of B-Spline curve points*
Unwinding the stacks of the top hotspot PLib::NoDerivativeEvalPolynomial() helped locate the place it was called from – Geom_BSplineCurve::D0() – which in turn was called from Extrema (remember the tons of calls mentioned above ?). I put in a counter and measured that the number of calls was 272.3 million ! Wow !

As nothing in the method itself promised any improvements, I had to make modifications upstream. In Extrema.


Until now, the Extrema architecture for the curve-curve case was very inflexible. It always redid all computations, flushing out the results of any previous call. On the other hand, point-surface and curve-surface have some caching, so I wonder why the developers missed it for this case.

Anyway, I extended the API to allow first setting each curve and/or its ranges and performing calculations later. To take advantage of that, I also had to modify IntTools_BeanBeanIntersector::ComputeUsingExtrema() to set the curve for theRange2 (specified as its argument) only once. This was a first step and reduced the number of calls to Geom_BSplineCurve::D0() / PLib::NoDerivativeEvalPolynomial() to 8.6M.

(To be continued)

Extreme fight with Extrema. Part 1

Those who regularly follow my blog likely remember that after the recent speed up of Boolean Operations (see Why are Boolean Operations so slooo...ooow?) there remained a test case from Pawel K that was almost unaffected. So I accepted the challenge and spent more time on it. Many thanks to Pawel for providing these models. Though they appeared to be very peculiar (tiny ellipse arcs and huge B-Splines of 300+ poles and 40+ knots), they allowed me to detect a problem with wider impact.

Bottom line first: the achieved speed up is 2.3x (from 113s to 49s). Not bad, but not quite a breakthrough, because elapsed time is still on the order of dozens of seconds. There are some (not obvious) further directions for optimization, but they will likely be even more time-consuming, so I am not yet ready to commit to that (moreover, I have already spent half of my vacation ;-)).
But what is more important is that the modifications open a door to speeding up other OCC algorithms, as what was modified is the core component – Extrema. Today's story is about the changes in it…

So, as usual, I used Intel Parallel Amplifier (by the way, we just RTM'ed – Released to Manufacturing – the version for public beta, which should be available in January) to understand the hotspots.

They were totally different from those in the cases I dealt with in my previous experiment, and it promptly became obvious they related to edge-to-edge intersections (see the stack on the right).

Looking at IntTools_BeanBeanIntersector::ComputeUsingExtrema(), it became clear what was happening.

Each pair of edges which is a candidate for intersection (e.g. if their bounding boxes intersect) is analyzed. Each edge is split into a certain number of ranges (e.g. an ellipse – into 40) and then these ranges are analyzed against the other edge's ranges. And though inside ComputeUsingExtrema() the 2nd range does not change, the Extrema object (used to calculate the smallest distance) is created from scratch and initialized with it every time. Diving further into Extrema with the debugger, I found out that B-Splines are split into C2 intervals (of which there were about 20 on Pawel's model!). Along each curve interval, a few sample points (32 by default) are taken. Distances are measured on the fly and candidates are further passed to math_Function* for precise calculations. Inside it, the number of sample points is sometimes magically increased by 2x (just for more reliable sampling, ah ?). So, tons and tons of calculations without any attempt to reuse what has already been calculated before.
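Stripped of the geometry, the wasteful pattern reduces to this shape (all names here are mine, not OCC's): the object depending only on the unchanged second range is rebuilt inside the loop over the first range's subranges, when it could be built once outside:

```cpp
#include <vector>

// A stand-in for the Extrema object: its construction samples a fixed range,
// which is the expensive, loop-invariant work.
struct RangeSampler
{
  static int constructions; // counts how many times the sampling was redone
  std::vector<double> samples;

  RangeSampler (double theFirst, double theLast)
  {
    ++constructions;
    for (int i = 0; i < 32; ++i) // 32 sample points, as in the article
      samples.push_back (theFirst + (theLast - theFirst) * i / 31);
  }
};
int RangeSampler::constructions = 0;

// Before: the sampler for the *unchanged* 2nd range is rebuilt for every
// subrange of the 1st range.
inline void processNaive (int theNbSubranges)
{
  for (int i = 0; i < theNbSubranges; ++i)
  {
    RangeSampler aSampler (0.0, 1.0); // invariant work redone each iteration
    (void) aSampler;                  // ...then used for the distance checks
  }
}

// After: the invariant sampler is hoisted out of the loop and reused.
inline void processCached (int theNbSubranges)
{
  RangeSampler aSampler (0.0, 1.0);   // built once
  for (int i = 0; i < theNbSubranges; ++i)
    (void) aSampler;                  // reused for every subrange
}
```

With 40 subranges per edge, the naive form pays the sampling cost 40 times; the hoisted form pays it once.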

The main idea for optimization was obvious – cache and reuse. This required some changes in the Extrema API (as it did not support that), which I eventually made. It is currently limited to 3D curves only; similar improvements for the other cases are deferred until the OCC folks (to whom I sent the fixes today) can confirm there are no regressions.

But before proceeding to the API redesign, I decided to make finer improvements, to make sure tiny bottlenecks did not fall off the radar amid larger-scale improvements.

(To be continued).


Sexy background

If you want to add a special touch to your application, here are a couple of hints on ‘personalizing’ your 3D view.

*Adding an image as background*
Use V3d_View::SetBackgroundImage(), which accepts a filename of a gif, bmp or xwd image and a placement option (centered, stretched or tiled) defined by the Aspect_FillMethod enumeration. Calling it with Aspect_FM_NONE erases the image.
Here’s an original image and its use with the Aspect_FM_STRETCH option:

You may want to extend the pre-built options, for instance adding an image to the bottom-right corner for your company logo.

* Gradient background *
This is a frequently used background style in CAD applications. Open CASCADE does not offer a direct API to set it, but it can be implemented. The following is an excerpt from my extension of DRAWEXE:

static void UpdateGradientBackground (const Handle(Visual3d_Layer)& theLayer,
                                      const Quantity_Color& theTopColor,
                                      const Quantity_Color& theBottomColor)
{
  int aWidth = ..., aHeight = ...; //e.g. QWidget::width() and height() for Qt-based apps
  theLayer->Clear(); //make sure we draw on a clean layer
  theLayer->SetViewport (aWidth, aHeight);
  //now draw a polygon using top and bottom colors
  //remember that the default boundary is [-1,1;-1,1] and the origin is in the bottom left corner

  //check position for the middle color - if the transition should be non-uniform then
  //additional points should be inserted and the technique changes - 2 polygons instead of 1
  theLayer->SetColor (theTopColor);
  theLayer->AddVertex (-1, 1);
  theLayer->AddVertex (1, 1);

  theLayer->SetColor (theBottomColor);
  theLayer->AddVertex (1, -1);
  theLayer->AddVertex (-1, -1);
}

static int VSetBgColor (Draw_Interpretor& di, Standard_Integer argc, const char** argv)
{
  Handle(V3d_View) V3dView = ViewerTest::CurrentView();
  if (V3dView.IsNull()) return 1;

  static Handle(Visual3d_Layer) aLayer;

  if (argc == 4) {
    if (!aLayer.IsNull()) {
      //switch to a single color mode
      aLayer->Destroy(); //explicit destruction is required as the destructor
                         //will not be called (one reference remains in Visual3d_ViewManager)
    }
    V3dView->SetBackgroundColor (Quantity_Color (atof(argv[1]), atof(argv[2]), atof(argv[3]), Quantity_TOC_RGB));
  } else if (argc == 7) {
    Quantity_Color aTopColor (atof(argv[1]), atof(argv[2]), atof(argv[3]), Quantity_TOC_RGB);
    Quantity_Color aBottomColor (atof(argv[4]), atof(argv[5]), atof(argv[6]), Quantity_TOC_RGB);
    if (aLayer.IsNull()) {
      Standard_Boolean aSizeDependant = Standard_True; //each window to have a particular mapped layer?
      aLayer = new Visual3d_Layer (V3dView->Viewer()->Viewer(),
                                   Aspect_TOL_UNDERLAY, aSizeDependant);
    }
    UpdateGradientBackground (aLayer, aTopColor, aBottomColor);
  } else {
    di << "Usage : " << argv[0] << " {color(R G B) | top_color(R G B) bottom_color(R G B)} \n";
    return 1;
  }
  return 0;
}

Here’s an example view:



License to kill. License to use

The famous James Bond 007 had a license to kill; the "00" designation in his code number meant he had a sanction to apply deadly force. In order to use any software you also need a license. Let me repeat: *any* software, even software you can download free of charge with a couple of mouse clicks.

When accepting a license you become bound by the legal agreement under which the software is made available. Tell me honestly, how many of you, and how often, read the license agreement shown on the install screen that makes you select the radio button "Yes, I accept this license" before clicking Next? I bet very few, and I'm not a role model either ;-).

The fact that you can freely download some software does not imply you can use it, and distribute your software based on it, as you wish. The most famous example is GPL'ed software (General Public License), which is known to be 'viral', i.e. it makes your own software GPL'ed. This at least applies to GPL version 2; version 3 comes with more sophisticated terms which I have not fully studied yet.

So, what about the Open CASCADE license ? Among the most important aspects, I'd underline that it's quite permissive and allows you to use OCC in your project (open or closed source, free of charge or for a fee), which can be distributed under your own proprietary license. Like most other Open Source licenses, it requires that you include a copy of the license in your distribution. Any modifications to the Open CASCADE software you might make must be made available in source code, under the same license, to anyone.

The summary on the site says it is "LGPL-like". I must confess this was my own suggested wording, which we put up with the web team when I worked at Open CASCADE, to contrast it with GPL. It was based on my knowledge of the subject at the time. Working at Intel, where licensing issues are explained as part of mandatory trainings, I now see this is not exactly the case. LGPL is still quite viral (though much less than GPL) and is not too welcome in commercial applications. The OCC Public License has quite a different focus.

I believe the OCC license is somewhat weird and had better be changed. There are several existing, widely recognized and adopted Open Source licenses. Any time a company comes out with its own 'open source' or 'public' license, it creates a headache for potential users and their lawyers, who have to read it and understand the implied rights and obligations. For average people (even native English speakers), legal vocabulary is all Greek. Take the Tax Code of your country, open it and start reading on an arbitrary page. I bet you will have to re-read each paragraph several times before you get an idea of what it is about, to say nothing of the tiny details where, as we know, the devil is. So, if you really want to give your software to the Open Source world, you had better choose something existing.

In this regard, Intel's 2005 move to discontinue its own (and, by the way, recognized and approved) Open Source license was very symbolic. The motivation ? Exactly that: to stop license proliferation and ease the adoption of Intel's software. Look at what other successful people and companies do – learn and do the same ;-).

So, my modest suggestion to the Open CASCADE company is to consider favoring some well-recognized Open Source license and to migrate to it. This will ease OCC adoption and benefit all parties.

We can continue discussion in comments. So feel free to throw in your ideas !


Why are Boolean Operations so slooo...ooow ? Part 3


For my experiments I used my own patched version of Open CASCADE 6.3.0 that accumulated all the modifications for prototyping multi-threading I mentioned earlier on the forum. One set of modifications in it is thread-safe versions of BSplSLib, BSplCLib and PLib. The original 6.3.0 uses static arrays which are re-used by subsequent calls to ::D0(), D1(), etc., which is obviously not thread-safe (or even reentrant), so a few months back I changed them to use local variables which were allocated/destroyed upon every relevant *Lib function call. This was just fine for IGES import scenarios, but working on BOPs I noticed that my code revealed performance regressions relative to 6.3.0. See the image below:

So it was obvious that constant allocations/destructions were not an option. I checked how many times allocation was performed on a BOP test case, and it was 11.7 million; the allocated buffers were from 2 to 264 doubles long. Well, I had to return to the previous approach (i.e. re-allocation only when the newly requested buffer exceeded the previously allocated one). But how to ensure reentrancy ? The answer was quite obvious – TLS, or Thread-Local Storage. That is, each thread has its own buffer, and it's only used by that thread. So, I wrote a class (let's tentatively name it TLSManager) that contains a map (hash-table) of {Standard_ThreadId, set of buffers} and returns a requested buffer depending on the id of the thread from which the buffer was requested.
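In modern C++ the same idea can be sketched without any explicit map (the function name is mine; this is not the TLSManager code): the `thread_local` keyword gives each thread its own buffer automatically, and the grow-only policy means re-allocation happens only when a larger size is requested:

```cpp
#include <cstddef>
#include <vector>

// A per-thread reusable buffer: thread_local replaces the explicit
// {thread id -> buffers} map described above. Each thread gets its own
// vector, so no locking is needed at all.
inline double* threadBuffer (std::size_t theSize)
{
  thread_local std::vector<double> aBuffer;
  if (aBuffer.size() < theSize)
    aBuffer.resize (theSize); // grow-only, like the re-allocation policy above
  return aBuffer.data();
}
```

Within one thread, a smaller request after a larger one reuses the same storage; a different thread transparently gets a different buffer.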

Another obvious problem popped up – TLSManager must be thread-safe, but using Standard_Mutex to protect it would be overkill. There is a solution to this typical problem – a read-write lock, i.e. an object that allows multiple concurrent read-accesses to a shared resource (in our case, a map with buffer sets) and exclusive write-access (when a new set of buffers is created for a new thread). Well, I went and re-invented the wheel, and added a class OSD_ReadWriteLock. Doing this (like when adding other thread classes in OSD a while ago), I once again thought that OCC should rather bring in Boost (www.boost.org) than keep re-inventing its own wheels. Salome does use Boost, so OCC obviously can too. For instance, Boost also offers a template for TLS - http://www.boost.org/doc/libs/1_37_0/doc/html/thread/thread_local_storage.html.
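For readers on current compilers: the read-write lock is now in the standard library as std::shared_mutex (C++17). Here is a hedged sketch of the TLSManager's protected map using it (class and method names are mine, not OSD_ReadWriteLock's API): lookups take the shared lock, and only the first request from a new thread takes the exclusive one:

```cpp
#include <mutex>
#include <shared_mutex>
#include <unordered_map>
#include <vector>

// Sketch of a thread-safe {thread id -> buffer set} map. Many threads can
// look up their buffers concurrently (shared lock); inserting an entry for a
// brand-new thread takes the exclusive lock. References into an
// unordered_map stay valid across inserts, so returning them is safe.
class BufferRegistry
{
public:
  std::vector<double>& buffersFor (unsigned long theThreadId)
  {
    {
      std::shared_lock<std::shared_mutex> aReadLock (myMutex); // readers in parallel
      auto anIt = myBuffers.find (theThreadId);
      if (anIt != myBuffers.end())
        return anIt->second;
    }
    std::unique_lock<std::shared_mutex> aWriteLock (myMutex);  // exclusive writer
    return myBuffers[theThreadId]; // creates an empty buffer set on first use
  }

private:
  std::shared_mutex myMutex;
  std::unordered_map<unsigned long, std::vector<double>> myBuffers;
};
```

After a thread's first call, every subsequent call from it is a read-locked lookup, so the steady-state overhead is minimal – which matches the observation below that the RWL overhead was unnoticeable.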

Once the RWL class had been written, I created a simple test case to check it, and used Intel Parallel Inspector for that. Inspector identifies memory errors (leaks, uninitialized memory use, etc.) and threading errors (data races, deadlocks, etc.). The thread checker is one of a kind (there is no other such software as far as I know), but its overhead is substantial.

Well, that session with Inspector was something! It reported data races, as if my RWL simultaneously read and wrote the same memory (an object member).

I spent several hours and felt completely dumb, looking at my one-page code, trying to root-cause the errors and beating my head against the keyboard and anything else around. A crazy Friday evening at home! When I gave up, I recompiled my unit test to use Qt's QReadWriteLock, and what ? The same data races!

That made me doubt even further; I wrote down all the behavior scenarios on a sheet of paper and reaffirmed that there could not be any data races – everything was protected. It's just a plain crazy false positive (i.e. a report of a problem that does not actually exist)! I knew that Inspector had false positive issues, but I could not imagine it would beat me that much. So, I am looking forward to talking to my Intel colleagues about that. (Make a note: if you download Inspector and notice the same problem, you might want to recall my case – perhaps it will be some unfixed false positive ;-)).

OK, now confident in my RWL, I went and tried TLSManager with RWL in BSplSLib. And ? The regression was gone ! The small overhead of RWL use (instead of the former shared static buffers) turned out to be unnoticeable. Excellent!

With all those modifications, the overall speed up vs 6.3.0 was about 4x on the Open CASCADE test case. I tried a simpler model sent by forum participants, and it revealed a 20x speed up. Very small cases running in a fraction of a second did not reveal a substantial speed up. Thus, in general, we can roughly project speed ups in the 3x-10x range on average.

There is still something left to do, e.g. to design TLSManager in CDL and migrate PLib and BSplCLib to it. I will do this as time permits, hopefully soon, while my memory is fresh.

And, by the way, if you want to try out your models with the new version, please feel free to send me the models via email or a download link. Those who are eager to get the fixes, just let me know ;-).

Looking back, I think the time spent on this was worth it. I do hope these findings will inspire the OCC team to dig further and find more room for improvement, beyond BOP. I hope my colleagues at Intel will appreciate the 14 bug reports and enhancement requests I compiled during these days, and that by the commercial release the tools will be even better than they are today. I was able to learn something new in the depths of the OCC Modeling algorithms, and this was good. Folks from the Community will benefit via future OCC releases that will hopefully include my modifications.

I will continue to prepare OCC test cases for app testing, and if there is anything interesting, I will share it with you.

Let me add a few more words on performance. Being at Intel, I now view it a bit differently than when I worked at a software development company. Guys, times are changing (or have already changed, if you like). The free lunch, when your app would run faster with every newly released processor, is over. The megahertz era is over. To make your application run faster, you must make it multi-threaded and scalable. Performance is not just your app running faster. Higher performance means more features. Look at the spell checker in MS Word – it checks as you type your document. That's just because it's fast enough and because it runs in a parallel thread.
If you want to stay competitive in the market, you must parallelize. There is no other way. It's challenging, but fortunately there are tools to help with that. And I am happy that I relate to them in some way. Go and try the Intel tools (www.intel.com/go/parallel).

Endorsement ? Well, maybe. Sincere ? Absolutely ! (I practice what I preach ;-) )

Good luck !


Why are Boolean Operations so slooo...ooow ? Part 2


So, I dove into the code and found out that the constructors were plain initializers of internal fields (e.g. double, integer, pointer). All three constructors look similar. Unwinding the stacks revealed the root cause – objects were created using the new[] operator (e.g. ptr = (void*) (new IntPolyh_Triangle [N])), which was called to create a *huge* number of objects. Look at this:

Each IntPolyh_MaillageAffinage constructor creates arrays of 10 000 points, 20 000 triangles, and 30 000 edges for each face. Are they all really used ? With the debugger I stepped through all the places where these arrays are filled in, and what did I find ? Very often they are filled with fewer than 100 elements. A few dozen effectively used while allocating dozens of thousands ?! Unbelievable !
Two additional observations:
1. the initialized elements in the array are never read before being overwritten;
2. the effective number of elements can easily be calculated upfront (e.g. n * m).
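These two observations are exactly what makes deferred initialization safe. A self-contained sketch (a plain struct standing in for IntPolyh_Triangle; names are mine): `new[]` runs every constructor up front, while raw allocation plus placement-new constructs only the elements actually appended:

```cpp
#include <cstddef>
#include <new>

// Stand-in for IntPolyh_Triangle: its constructor just initializes fields.
struct Triangle
{
  static int constructed; // counts constructor calls
  double a = 0.0, b = 0.0, c = 0.0;
  Triangle() { ++constructed; }
};
int Triangle::constructed = 0;

// "new Triangle[20000]" would run 20 000 constructors even if ~100 elements
// are ever used. Deferred initialization: grab raw memory up front and
// construct elements one by one, on demand. (A sketch: bounds checks and
// growth are omitted here.)
class TriangleArray
{
public:
  explicit TriangleArray (std::size_t theCapacity)
  : myStorage (::operator new (theCapacity * sizeof (Triangle))), myUsed (0) {}

  Triangle& append()
  {
    // placement-new constructs exactly one element when it is needed
    return *new (static_cast<Triangle*> (myStorage) + myUsed++) Triangle();
  }

  std::size_t used() const { return myUsed; }

  ~TriangleArray()
  {
    Triangle* aData = static_cast<Triangle*> (myStorage);
    for (std::size_t i = 0; i < myUsed; ++i) aData[i].~Triangle();
    ::operator delete (myStorage);
  }

private:
  void*       myStorage;
  std::size_t myUsed;
};
```

Filling 100 of 20 000 slots now costs 100 constructor calls instead of 20 000, which is precisely the waste the profile exposed.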

So, I looked into all the IntPolyh classes to check that this is a common usage model, so that I could easily fix it with deferred initialization with a particular number of elements. As usual, life is not as simple as it sometimes seems. Some classes (e.g. IntPolyh_ArrayOfEdges) implied that the number of effectively used elements can grow over time, and this feature was really used during mesh refinement. Moreover, I found that many classes implement the same array pattern – there is a number of allocated elements and a number effectively used. However, we already saw above how ineffectively that strategy could be used :-(.

IntPolyh contains 7 classes IntPolyh_Array* that implement this pattern, via code duplication.

So, I went and created a single generic class, IntPolyh_DynamicArray, which allocates memory with Standard::Allocate() (in order not to call new[] and the constructors) and can grow over time if the previously allocated memory is not enough. All 7 classes became instances of this template, which significantly reduces the amount of code to maintain.
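A hedged sketch of such a template (the name follows the article; the growth policy and all details are mine, and malloc stands in for Standard::Allocate()). No element constructors run on allocation, and growth happens only when capacity is exhausted; memcpy on growth is fine for the plain, trivially copyable structs IntPolyh stores:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Generic growable array in the spirit of IntPolyh_DynamicArray: raw memory,
// a number of allocated elements (capacity) and a number effectively used.
template <typename T>
class DynamicArray
{
public:
  DynamicArray() : myData (nullptr), myCapacity (0), myUsed (0) {}
  ~DynamicArray() { std::free (myData); }

  void append (const T& theValue)
  {
    if (myUsed == myCapacity)
      grow (myCapacity == 0 ? 16 : myCapacity * 2); // grow only when needed
    myData[myUsed++] = theValue;
  }

  std::size_t size()     const { return myUsed; }
  std::size_t capacity() const { return myCapacity; }
  const T& operator[] (std::size_t i) const { return myData[i]; }

private:
  void grow (std::size_t theNewCapacity)
  {
    T* aNewData = static_cast<T*> (std::malloc (theNewCapacity * sizeof (T)));
    if (myData != nullptr)
      std::memcpy (aNewData, myData, myUsed * sizeof (T)); // keep used elements
    std::free (myData);
    myData = aNewData;
    myCapacity = theNewCapacity;
  }

  T*          myData;
  std::size_t myCapacity;
  std::size_t myUsed;
};
```

Instantiating this once per element type replaces seven near-identical hand-written classes with one.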

Next, I implemented deferred initialization of these arrays at the points where the number of elements to fill them with is known (e.g. IntPolyh_MaillageAffinage::FillArrayOfPnt()).

There are other possible easy improvements, such as inlining all relevant methods of _Point, _Edge, etc. I did not do this and leave it up to the Open CASCADE team.

After these modifications, the time attributed to TKGeomAlgo.dll (measured with Amplifier) decreased by as much as 5x-18x (from 2 to 7.28s vs the original 36.07s) !!! The overall speed up was about 3.5x (see the screenshot below).

So, this was relatively low-hanging fruit to pick. And it gave such impressive results. I believe there is more that can be done to improve, and I encourage the OCC folks to do more, up to and including multi-threading. I don't know the BOP internals, but I would check whether running face-face intersections in parallel threads is feasible.

However, this was not the end of my own research.

(to be continued)

Why are Boolean Operations so slooo...ooow ? Part 1

Open CASCADE Boolean Operations (BOPs) have frequently been claimed to be slow. Has anyone tried to find out why ?

As you probably remember, I recently mentioned in another post that at Intel we have decided to integrate Open CASCADE into our application testing database. So I took on a challenge to create a few test cases to regularly check Intel Parallel Amplifier and Inspector (part of new Intel Parallel Studio).

In addition to my recent test cases with IGES import, which had been prototyped to run in multi-threaded mode, this time I proceeded to Boolean Operations (BRepAlgoAPI). I requested a few models on the forum, but replies were surprisingly not numerous :-(. Anyway, I am thankful to Evgeny L, Prasad G, Pawel K, as well as to Igor F for their examples.

The bottom line. On relatively complex models, the overall achieved speed up was from 4x (100+ faces in a model) to 20x (several dozen faces). Examples of reduced CPU time – from 80s to 20s, from 30s to 1.4s. (Disclaimer: after this article was drafted last week-end, I experimented with another set of models sent by Pawel Kowalski. They revealed bottlenecks different from those mentioned below, and therefore the described improvements do not affect them. I'll continue my experiments as time permits and will hopefully post further findings.)

* Story *
So let us follow the steps which have been made.

I focused on the BopTools_DSFiller class, which is central to Boolean Operations (BOP): it prepares the models by intersecting them, so that later on Fuse, Common, and Cut just take its results and reconstruct the requested combination.

As a first test case, I took two models provided by my former colleagues at OCC who participate in the Intel Parallel Studio beta program. These were two solids of 130+ faces each, and BopTools_DSFiller::Perform() took 67s of CPU time.

I installed the latest build of Intel Parallel Amplifier (reminder: the public Beta will be available in early January, and you can already subscribe here – www.intel.com/go/parallel). The only applicable analysis type was ‘Hotspot Analysis’, which identifies the most CPU-consuming functions. Amplifier also offers ‘Concurrency Analysis’ and ‘Waits & Locks Analysis’, but these were irrelevant, as BOPs currently run in a single thread only, while those analyses are tailored to multi-threaded apps.

* First findings *
The top functions that Amplifier reported were located in TKGeomAlgo.dll and related to the IntPolyh package. Not surprising, as BOPs are based on mesh intersection, and IntPolyh creates those meshes.
The top 3 functions – the constructors of IntPolyh_Triangle, _StartPoint, and _Edge – altogether took almost 20 seconds (see the image below).

(to be continued)


Adding colors and names to your application. Part 3


* Visualization of the shapes *

The XDE framework provides functionality to display contents in a 3D viewer with the help of XCAFPrs_AISObject, which eventually inherits AIS_InteractiveObject and thus can be used in the usual manner.

Since XDE is OCAF-based, you should couple it with AIS in the OCAF-specific way, i.e. by associating a driver (a TPrsStd_Driver descendant) that creates an interactive object. In this case the driver is XCAFPrs_Driver, which creates an instance of XCAFPrs_AISObject. Below is a code skeleton:

TDF_Label anAccess = aDoc->GetData()->Root();
Handle(TPrsStd_AISViewer) anAISViewer;
if (!TPrsStd_AISViewer::Find (anAccess, anAISViewer)) {
  Handle(V3d_Viewer) aViewer = ...;
  anAISViewer = TPrsStd_AISViewer::New (anAccess, aViewer);
}

// collect sequence of labels to display
Handle(XCAFDoc_ShapeTool) aShapeTool = XCAFDoc_DocumentTool::ShapeTool (aDoc->Main());
TDF_LabelSequence seq;
aShapeTool->GetFreeShapes (seq);

// set presentations and show
for (Standard_Integer i = 1; i <= seq.Length(); i++) {
  Handle(TPrsStd_AISPresentation) prs;
  if (!seq.Value(i).FindAttribute (TPrsStd_AISPresentation::GetID(), prs)) {
    prs = TPrsStd_AISPresentation::Set (seq.Value(i), XCAFPrs_Driver::GetID());
  }
}

Here’s a 3D viewer screenshot.

In order to enable the display of names, XCAFPrs::SetViewNameMode() must be called with Standard_True (before display). Below is an example of a 3D view with name display turned on:

Note that displayed text labels adversely impact performance, and in the case of numerous displayed labels, your viewer can become significantly less responsive.

* Non-OCAF based documents *

If for any reason you don’t use OCAF but want to exchange attributes with IGES and STEP, you will have to do this on your own, directly accessing the objects representing file entities. Look at the source code of the STEPCAFControl and IGESCAFControl packages to replicate XDE behavior.

(The end)


Adding colors and names to your application. Part 2


OCC_UT_MyXDEApp could have inherited XCAFApp_Application, and then redefining Formats() and ResourcesName() would not have been required. But XCAFApp_Application’s constructor has been declared private (instead of protected), which disables inheritance :-(. Perhaps the OCC folks could correct that.

The document is created in a straightforward way:

Handle(TDocStd_Application) anApp = OCC_UT_MyXDEApp::GetApplication();
Handle(TDocStd_Document) aDoc;
anApp->NewDocument ("XmlXCAF", aDoc);

The screenshot below shows the structure of a new document created with the above application.

Note that XCAFDoc_DocumentTool has been assigned to the label 0:1:1:1 and it created a sub-tree with several pre-defined labels (XCAFDoc_ShapeTool, _ColorTool, etc).

OCC provides persistence for XDE-specific attributes in all three supported formats (see StdResource) – standard textual, xml and binary. Some attributes (e.g. XCAFDoc_MaterialTool) are not stored in a file and are re-created during the read process.

TCollection_ExtendedString anErrorMsg;
anApp->SaveAs (aDoc, "C:\\Dev\\3dmodels\\sampledoc.xml", anErrorMsg);
if (anErrorMsg.Length()) {
  std::cout << "Error occurred - " << TCollection_AsciiString (anErrorMsg).ToCString() << std::endl;
}

Here is the resulting XML file:

* Import /export with IGES and STEP *

To read/write colors and names from/to IGES or STEP you have to use classes {IGES,STEP}CAFControl_{Reader,Writer}, e.g. IGESCAFControl_Reader or STEPCAFControl_Writer.

IGESCAFControl_Reader aReader;
IFSelect_ReturnStatus aStatus = aReader.ReadFile (aFileName);
if (aStatus == IFSelect_RetDone) {
  Standard_Boolean aRes = aReader.Transfer (aDoc);
}

Here is the document content after importing a sample file:

(to be continued)

Adding colors and names to your application. Part 1

If you have to exchange data with other applications via IGES or STEP (or perhaps other formats, if you are a commercial client of the Open CASCADE company), you might want to enrich your application with metadata in addition to geometry. We will consider names and colors, which are often asked about on the Open CASCADE forum.

OCC provides a ready-to-use framework – called XDE (eXtended Data Exchange) – which is based on OCAF. XDE offers a pre-defined document sub-structure to store colors, names, layers as well as other attributes (see XDE User’s Guide for details). This is done through a set of attributes defined in the XCAFDoc package that provides API to access data.

*Basic definitions*

Let's start with the assumption that your application uses OCAF for data description. In order to make your OCAF document XDE-compliant, you need to add the XCAFDoc_DocumentTool attribute to a label of your choice. It will add other required attributes to the sublabels. The easiest way is to extend your application class deriving from TDocStd_Application, for example as follows:

class OCC_UT_MyXDEApp : public TDocStd_Application
{
public:
  //singleton pattern
  Standard_EXPORT static const Handle(OCC_UT_MyXDEApp)& GetApplication();

  virtual void Formats (TColStd_SequenceOfExtendedString& theFormats)
  { XCAFApp_Application::GetApplication()->Formats (theFormats); }

  virtual Standard_CString ResourcesName()
  { return XCAFApp_Application::GetApplication()->ResourcesName(); }

  Standard_EXPORT virtual void InitDocument (const Handle(TDocStd_Document)& theDoc) const;

protected:
  OCC_UT_MyXDEApp() {}
};



const Handle(OCC_UT_MyXDEApp)& OCC_UT_MyXDEApp::GetApplication()
{
  static Handle(OCC_UT_MyXDEApp) anApp = new OCC_UT_MyXDEApp;
  return anApp;
}

void OCC_UT_MyXDEApp::InitDocument (const Handle(TDocStd_Document)& theDoc) const
{
  //create a child of the main label and put XCAFDoc_DocumentTool there
  //(i.e. one level below compared to default XDE)
  TDF_Label aL = theDoc->Main().FindChild (1, Standard_True); //0:1:1
  XCAFDoc_DocumentTool::Set (aL.FindChild (1, Standard_True), Standard_False); //0:1:1:1
}

(to be continued)


Sharpening your tools

Let me share with you a couple of simple things that could make your life a bit easier if you use Microsoft Visual Studio for development on Open CASCADE.

1. Highlighting Open CASCADE class names in the editor

This will improve your code readability and, what is more important, can serve as an additional quick check that you correctly spelled an OCC class name before you get a compiler error ;-).

All you need to do is copy the file UserType.dat shipped with Open CASCADE (tools subdirectory) to where devenv.exe is located. For instance, on my laptop it's here - c:\Program Files\Microsoft Visual Studio 8\Common7\IDE\. If you already have some UserType.dat, just concatenate the contents.

2. More convenient display of Open CASCADE data types in debugger

Have you ever been tired of clicking through a deeply nested tree in the Watch window to dig down from your surface just to check its reference count ? If yes, this can be for you.

As you might know, the debugger allows you to display complex types (e.g. classes) not only as the default '{…}'. To do that, you can describe display rules by modifying the autoexp.dat file.
I have found one and extended it with a few frequently used OCC types. Go download it here and insert it into autoexp.dat, located inside the Visual Studio hierarchy. For instance, for VS2005 it is here - c:\Program Files\Microsoft Visual Studio 8\Common7\Packages\Debugger\Autoexp.dat.
Compare these two screenshots. The one below (with the modified autoexp.dat) displays everything the one above does, but in a more concise form, without forcing you to expand the tree.

OCC team: feel free to include this file or its extended version into Open CASCADE release ;-).

There is yet another piece of functionality – skipping stepping into a function in the debugger, so that when you press F11, it bypasses some internal functions you don't want to spend your time in. This is an undocumented technique described here. I did not try it, but you may want to. If you succeed, please share your settings with us; this will likely be helpful for several people.

There are likely more tips than these. If anyone is willing to share his/her experience, please send me an email at roman.lygin@gmail.com and I can post it.


Building from scratch. Part 2


First, do you know where the WOK name comes from ? It's not just Workshop Organization Kit, as named in the documentation. In Chinese cuisine, a wok is a deep convex pan in which a chef cooks food (I'm sure you have seen one, right ?) The name was given by a former Matra Datavision employee of Chinese origin who initiated it. WOK was used in MDTV in a multi-site, multi-team, cross-platform environment which intensively worked on development of one product bundle (Euclid Quantum) that contained many components (including its foundation – CAS.CADE). The source base was huge, release cycles were asynchronous, and this could not fit into one source control repository that would be permanently built. Take into account the state-of-the-art hardware of that time (slow Sun workstations, networks, etc). So the development environment had to reflect that, and WOK really enabled that dispersed and hierarchical environment. You could take a piece of software and put it into your local workbench, or create a tree of workbenches, and develop.

Time passed, Euclid Quantum dissolved, CAS.CADE transformed into Open CASCADE, but WOK is still alive though not developed anymore. It’s pure legacy, and current investment into it would unlikely be justified. WOK still does its job enabling cross-platform and multi-team software development but it’s way too heavy for that. It’s like if you used a truck to drive to your office one mile away from your home.

In terms of build time, WOK does not stand any competition with VS, which builds in parallel threads. Building the Data Exchange module in Release mode on a quad-core Xeon processor took VS 13 minutes, where WOK was crunching for 32.

My personal view on development environments is that you should borrow what is available outside. Don't re-invent the wheel when others produce good ones you can freely use. There are options available out there, with wide adoption and active evolution. Go and grab some. In our team at Intel we use SVN and an SCons-based build system (of course, with custom tweaks to set up a whole infrastructure around it – promotion process, unit and integration tests, build rotation, etc). I heard Google's Chrome also uses SCons. There are also cmake, qmake (Qt-tailored) and likely many other decent options.

Migration has a cost, but if you count the long-term benefits, you may change your mind. My humble opinion.

Building from scratch. Part 1

As I said in my previous post, we decided to integrate OCC into our team's app test database at Intel. So I committed to enabling that integration and proceeded with the task.

In order to get correct data when profiling an application (including with Intel Parallel Amplifier and Inspector), it must be compiled with the /Zi option and linked with the /DEBUG option to enable symbol information and store it in a pdb. For performance tuning it should also be compiled with /O2 (maximize speed), as software vendors usually do. So, I needed all three. This is not a default combination offered by Visual Studio (/Zi and /DEBUG come in the Debug configuration with /Od, and Release comes with /O2 but without /Zi and /DEBUG). OCC ships *.vcproj files with those conventions.

So I had to manually update all *.vcproj files to set /Zi and /DEBUG for the Release config. This was the first challenge, as OCC does not provide a single VS solution (*.sln) where I could select all projects and modify them only once (fortunately, Visual Studio allows multiple selection of projects to change settings at once). So I had to reopen each solution one by one. I was lucky, as setting /Zi and /DEBUG could be done at the project file level, and each source file (*.cxx) inherited these settings. This is not the case for the optimization parameter (/O2), which is set in the OCC-shipped *.vcproj files on an individual basis, for each source file. I believe this was done by oversight in some scripts generating *.vcproj from WOK. But it did not make my life easier two months ago when I initially had to set /Od for Release configurations (I had to 'grep' the project files and remove the individual settings then). OCC folks in charge of the build environment, and those who tweak OCC files, might want to take note.
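For reference, the flag combination described above boils down to something like the following compiler and linker options (a sketch only; the file names are hypothetical, and in practice the options live in the *.vcproj project settings rather than on a command line):

```
cl /c /O2 /Zi /Fd"MyToolkit.pdb" MySource.cxx
link /DLL /DEBUG /PDB:"MyToolkit.pdb" MySource.obj
```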

Rebuilding OCC in MSVS (I am currently using 2005, by the way) was straightforward and seamless. I just had to reopen a new solution every few minutes and click Build Solution. The fact that there is no single global solution is a bit irritating though. I remember in the past we refrained from creating one because VS98 simply could not deal with it (running out of memory or something like that). Is this still a problem in newer VS (2003 and beyond)? Did anyone try ?
What was a bit annoying is the number of warnings, sometimes really ugly (like an intentional semicolon after some 'if' in Foundation Classes, or constant arithmetic overflow in Modeling Algorithms). Right now in our team at Intel we are also fighting warnings and have set an aggressive goal to treat warnings as errors (we are not there yet, however). Every code promotion will be rejected if it reveals warnings. Though it adds some overhead (and some developers complain), I believe this is beneficial for the product. So, if the OCC team adopts that behavior sometime, we may see OCC compiled with 0 warnings ;-).

A few words on WOK. As I have to use my patched version of OCC 6.3.0 (to enable multi-threading and use some other modifications), I had to start with WOK. There are modifications in the CDL extractor (which generates *.hxx from *.cdl), and in addition there are several new classes (with *.cdl definitions). Well, WOK is not something an average developer would (and should!) understand. It's a beast of its own. Two months ago I spent a good deal of time installing it on my laptop, and even this time, when migrating to a lab workstation, I had to be very careful setting up the environment in multiple *.edl and *.bat files. Unless you have spent several months or years with it, any mistake can cost you hours to figure out what went wrong. I would seriously recommend that you not use it unless you really have to and understand what you are doing. WOK, however, was a great idea in the 1990s, and let me shed some light on it.

(to be continued)


Open CASCADE inside Intel, or ... Intel inside

Just finished the meeting with my Intel colleagues where we decided that Open CASCADE would be integrated into an internal application database for testing our software products. I am a program manager in the team that is developing new products – Intel Parallel Amplifier and Parallel Inspector, which will be part of Intel Parallel Studio (www.intel.com/go/parallel). They are approaching a public Beta program (feel free to sign up, by the way!) and are currently in the hands of a few of our Beta customers. Guess who is among those few ?

Why did we choose Open CASCADE software for app testing ? Because it helps make Intel products better ! It has already allowed us to find several bugs before we released the first Beta, and we are still discovering more with it. Although it's single-threaded today, there are steps toward multi-threading. I am also going to use the multi-threading framework prototype for testing in parallel mode (mentioned on the forum in October).

I really enjoy situations like this – when you can combine interests of several parties. It’s good for Intel, it’s good for Open CASCADE, it’s good for me ;-). I love to be a ‘deal maker’, and you ?

Disclaimer: Whenever I talk about Intel, this represents my sole personal opinion, it does not reflect any official position of the company. That’s a policy. I had to say it. I did.


Open CASCADE Handles. Let's handle'em. Part 3


3. Bypass DownCast() in critical places.

Handle(Standard_Transient) aTransient = new OCC_UT_Id(1);

a. for (i = 0; i < aNbCycles; i++) {
     anId = Handle(OCC_UT_Id)::DownCast (aTransient)->Id();
   }

b. for (i = 0; i < aNbCycles; i++) {
     Handle(OCC_UT_Id)& anIdHTmp = *((Handle(OCC_UT_Id)*) &aTransient);
     anId = anIdHTmp->Id();
   }

c. for (i = 0; i < aNbCycles; i++) {
     OCC_UT_Id* anIdPTmp = (OCC_UT_Id*)aTransient.Access();
     anId = anIdPTmp->Id();
   }

a. takes 1.75s; b. and c. take ~0.04s, i.e. they are 40+ times faster (in some runs 100+) !

You will find b. used in BRep_Tool methods, by the way.

However, beware of cases with potential problems. Never use a direct cast unless you can reliably check that it is safe to do so. For instance, the following code throws an exception instead of returning a null handle:
TopoDS_Face aFace;
TopLoc_Location aLoc;
Handle(Geom_Surface) aSurf = BRep_Tool::Surface (aFace, aLoc);

These were my insights. Anybody to share others ? I’d love to hear.

As Handles are the most widely used classes, their implementation must be flawless. In this regard, I wonder if the use of UndefinedHandleAddress (equal to 0xfefd0000, or 0xfefdfefdfefd0000 on 64-bit platforms – see Handle_Standard_Transient.hxx) to denote a null handle makes any sense. Perhaps plain zero would be enough, and this would save on comparison operators. Is someone willing to experiment ? OCC folks ?

*Multi-threading considerations*
In conclusion, let me add that Handle is not (yet?) thread-safe. You need to protect a handle instance with a critical section (e.g. with Standard_Mutex) to use it concurrently. Otherwise you may have a data race (concurrent access to unprotected data) and may end up with a broken reference counter and, consequently, memory leaks, access violations, or other headaches.

Well, that was it, folks. So, how useful was it ? Please post your comments and tell me what you think. This is important for me. Thanks.


Open CASCADE Handles. Let's handle'em. Part 2


*Standard_Transient and MMgt_TShared*
If you are attentive enough, you will likely notice that most handle classes inherit MMgt_TShared, while it does not add any value over Standard_Transient. The comments in the cdl file claim it supports reference counting. Well, the reference counter is inside Standard_Transient ! I dove into the source code of Open CASCADE 3.0 (released in 1999, the first open source release) and it is the same there. So, it seems this change was made more than ten years ago, but the orphan MMgt_TShared is still alive. I am now using Standard_Transient only and have advised the OCC folks to remove MMgt_TShared eventually. So, you might want to migrate to Standard_Transient as your base class as well.

*Cyclic dependencies*
The underlying objects won't get destroyed if you have a cyclic dependency (i.e. objects referring to each other), and you will end up with a memory leak. For instance, the following class is prone to such a cyclic dependency:

class OCC_UT_Employee : public Standard_Transient
{
public:
  OCC_UT_Employee (const Handle(OCC_UT_Employee)& theBoss) : myBoss (theBoss) {}
  const Handle(OCC_UT_Employee)& Boss() const { return myBoss; }
  void AddDirectReport (const Handle(OCC_UT_Employee)& theReport);

private:
  Handle(OCC_UT_Employee) myBoss;
  TColStd_ListOfTransient myDirectReports;
};

As the Foundation Classes User's Guide suggests, you either have to use pointers in some cases (e.g. OCC_UT_Employee* myBoss) or nullify some references first (myBoss.Nullify()).

*Handle overhead*
As with anything in this world, you have to pay for the convenience of using a handle. Any assignment or deallocation leads to the execution of several instructions (comparisons, function calls, branching, etc). Keep that in mind. If you are using handles in performance-critical places, think how you can optimize. Some hints off the top of my head:

1. DownCast() to the expected type once, and then use the result inside a cycle.
Compare two implementations:

static void CheckEmployee (const Handle(Standard_Transient)& theEmployee)
{
  for (int i = 0; i < aNbCycles; i++) {
    Handle(OCC_UT_Employee) anEmp = Handle(OCC_UT_Employee)::DownCast (theEmployee);
  }
}

static void CheckEmployee (const Handle(Standard_Transient)& theEmployee)
{
  Handle(OCC_UT_Employee) anEmployee = Handle(OCC_UT_Employee)::DownCast (theEmployee);
  for (int i = 0; i < aNbCycles; i++) {
    Handle(OCC_UT_Employee) anEmp = anEmployee;
  }
}

With aNbCycles equal to 10 million, on my laptop, the former takes 1.6 secs and the latter 0.42 secs – a 3.8x speed-up.

2. Avoid copies
Make your class methods return const Handle()& whenever possible, and assign to the same type in the caller rather than to a plain Handle(). Consider the following two cases:

a. Imagine you initially defined the Boss() method in the OCC_UT_Employee class as follows:
Handle(OCC_UT_Employee) Boss() const { return myBoss; }

and use it as follows:
static void CheckBoss (const Handle(Standard_Transient)& theEmployee)
{
  Handle(OCC_UT_Employee) anEmployee = Handle(OCC_UT_Employee)::DownCast (theEmployee);
  for (int i = 0; i < aNbCycles; i++) {
    Handle(OCC_UT_Employee) aBoss = anEmployee->Boss();
  }
}

b. Now you switch to const Handle()&:

const Handle(OCC_UT_Employee)& Boss() const { return myBoss; }

static void CheckBoss (const Handle(Standard_Transient)& theEmployee)
{
  Handle(OCC_UT_Employee) anEmployee = Handle(OCC_UT_Employee)::DownCast (theEmployee);
  for (int i = 0; i < aNbCycles; i++) {
    const Handle(OCC_UT_Employee)& aBoss = anEmployee->Boss();
  }
}

The speed-up is from 0.26s to 0.03s, or 8.5x ! However, admittedly, Open CASCADE code itself is contaminated with these issues. OCC folks: if you are reading this, please take note ;-).

(to be continued)

Open CASCADE Handles. Let's handle'em. Part 1

Let me start the first article with something simple yet important if you want to develop on Open CASCADE.
You have likely noticed that many classes inherit Standard_Transient, directly or via other ancestors – e.g. Geom_Surface or AIS_InteractiveObject – and that in code they are used as Handle(Geom_Surface).

Open CASCADE tries to never use pointers (at least in API). Whenever it needs a shared object it uses a handle.

A handle is a well-known concept, often referred to as a smart pointer. The Boost library (www.boost.org), for instance, has several smart pointer classes, and intrusive_ptr seems to be the closest equivalent to OCC's Handle. Qt (www.trolltech.com) has QPointer.

A handle provides you with a mechanism to automatically refer to an object without the headache of destruction. The underlying object (a Standard_Transient descendant) will get destroyed when the last handle pointing to it is destroyed.
In addition, a handle provides safe type recognition and downcasting. Read the Foundation Classes User's Guide for more details.

Unlike Boost or Qt, which define smart pointers with templates, Open CASCADE uses two parallel explicit class hierarchies: one deriving from Standard_Transient and another from Handle_Standard_Transient. When the CDL extractor generates a header file, or when you use the macro DEFINE_STANDARD_HANDLE, you end up with a new handle class. I believe the choice in favor of a hierarchy of Handles was made for historical reasons, when old compilers in the 1990s did not support templates well. Presumably, for the same reason, the TCollection classes are also based on #defines (until NCollection, which is template-based, appeared in 2002).

For the sake of truth, I must note that there is yet another parallel hierarchy – for so-called persistent classes inheriting Standard_Persistent. As they are almost never used directly, we can omit them in this article; everything said here applies to those classes as well.

To create your handle class you need to use the following macros defined in Standard_DefineHandle.hxx: DEFINE_STANDARD_HANDLE and DEFINE_STANDARD_RTTI in the header, and IMPLEMENT_STANDARD_HANDLE and IMPLEMENT_STANDARD_RTTIEXT in the source file.

For instance :

class OCC_UT_Id : public Standard_Transient
{
public:
  OCC_UT_Id() : myId (0) {}
  OCC_UT_Id (const Standard_Integer theId) : myId (theId) {}
  Standard_Integer Id() const { return myId; }
  DEFINE_STANDARD_RTTI(OCC_UT_Id)

private:
  Standard_Integer myId;
};

DEFINE_STANDARD_HANDLE(OCC_UT_Id, Standard_Transient)


The most curious folks might have noticed that DEFINE_STANDARD_RTTI does not really need an argument, as it translates into method declarations inside the class, but it's kept for better consistency ;-).

Until 6.3.0, the Open CASCADE CDL extractor generated explicit code for a handle (e.g. Handle_Geom_Surface.hxx). I have recently sent a modified version thereof to the folks in the company, so that it now generates a macro-using header, making the code cleaner. May it eat its own food.

Another note. On the forum I saw people using the macro IMPLEMENT_STANDARD_RTTI along with IMPLEMENT_STANDARD_TYPE, IMPLEMENT_STANDARD_SUPERTYPE, IMPLEMENT_STANDARD_SUPERTYPE_ARRAY, etc., trying to imitate the CDL extractor (what it puts into the drv subdirectory). Don't bother with that, folks. IMPLEMENT_STANDARD_RTTIEXT will do all the work for you, and your code will be cleaner.

(to be continued)


My first blog post. Introduction

This is my first experience as a blogger. Ever. No idea if it will work out or not. Let's see together.

I had been working at Open CASCADE for 7 years until 2004, before moving to Intel, where I am now. Those were fantastic years that gave me invaluable experience in software development and project management, customer relationships, and working across multiple geos and cultures.

My path started in 1997 as a software engineer in the CAD Data Exchange team, and this proved to be lucky, as it allowed me to learn much of the OCC code – data exchange employs multiple OCC algorithms. If you ever happened to look into the source code of the IGES, STEP, or Shape Healing modules, you have probably noticed either my full name there (e.g. in IGESToBRep_IGESBoundary.cxx) or the acronym 'rln' (e.g. in ShapeFix_Wire.cxx). Every Open CASCADE employee has an acronym, widely used internally (I remember meeting minutes containing something like "Attendees: ABV, PDN, GKA, SMH, SZV, ..."). RLN was mine. By the way, it was funny not to find this system at Intel, which is overwhelmed with TLAs (Three Letter Acronyms); people use first names here (sometimes with the first letter of a last name – for example, Roman S – if there are several Romans in context).

I'm still in quite close touch with the Open CASCADE team; I am friends with many guys there. Even at Intel, from time to time I dive into OCC to check what is new there and take a chance to develop something (a few weeks ago I prototyped a framework for multi-threading in OCC).

So why a blog ? What is it for ?

- To share my inspiration. Sometimes I cannot resist replying when caught by some question on the forum, or breaking for lunch in the middle of some started prototype. Can you imagine a crazy guy who would develop a corporate product at home because his job did not imply doing that in working hours ? Well, that was me in 2001, when I switched to full-time management. Despite all its deficiencies, I believe OCC is a great product, but substantially undervalued, imho. It does not have the recognition it could have.
- To share knowledge and help others. There are people struggling on the forum, asking the same questions again and again, which I could answer in a matter of seconds. Reference documentation is not always enough; sometimes an overview would be a better option.

Is there anything in it for me ? First off, to learn more about who uses OCC and how. Visiting customers in the past, I saw impressive OCC-based apps that made me ready to shout 'how did you do that ?'. So, if people would like to share their tricks, I will be happy to learn along with others. Software skills are important to maintain.

I am thinking of putting my notes into the form of short (or not so short) articles that would be interesting for the community. So, if you already have some particular interest, please advise what you would like to see.

A few clarifications:
- This blog is not supposed to be a one-way channel. Comments are encouraged and appreciated. Opinions and experiences are welcome. If anyone is ready to add his/her own article, that will be great.
- I don't know all the tiny details of OCC (and I doubt there is a single person who could meet that expectation). There are shadow areas which I am not ready to comment on upfront. But I can dig in or share some insights as needed. Again, if there are people with deeper knowledge, their contribution will be invaluable.
- And yes, this blog is my personal initiative, in no way is it sponsored by Open CASCADE or any other parties.

Thank you.