Extreme fight with Extrema. Part 3


However, this did not fully solve the problem of caching results. As I mentioned earlier, B-Splines are split into C2 intervals which are processed individually. For example, one B-Spline had 20 C2 intervals, so caching only worked within each of them, and every time a new subrange of the other edge's range was used, all data were recalculated. So I extended the caching capabilities to support lists of caches: every time a new subrange of the 1st edge's range is used, points along subranges of the 2nd edge's range are retrieved from the cache. After this modification, D0() is called only 475K times, which is about 570 times fewer than the initial baseline (273.2M). CPU time was reduced by more than 2x. Yes, I did it!
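The idea of a list of caches can be sketched roughly as follows. This is only an illustration and not the actual Extrema code: SampleCache, Point3 and the evaluator callback are hypothetical names, and the callback stands in for an expensive evaluation such as Geom_BSplineCurve::D0().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical 3D point; stands in for gp_Pnt.
struct Point3 { double x, y, z; };

// Keeps one vector of sample points per parameter subrange [first, last],
// so that points along a given subrange are evaluated only once.
class SampleCache
{
public:
  typedef std::function<Point3 (double)> Evaluator;
  typedef std::map<std::pair<double, double>, std::vector<Point3> > CacheMap;

  SampleCache (const Evaluator& theEval, std::size_t theNbSamples)
  : myEval (theEval), myNbSamples (theNbSamples), myNbEvaluations (0)
  {
    assert (theNbSamples >= 2);
  }

  // Returns sample points for [theFirst, theLast], computing them on first use only.
  const std::vector<Point3>& Points (double theFirst, double theLast)
  {
    const std::pair<double, double> aKey (theFirst, theLast);
    CacheMap::iterator anIt = myCaches.find (aKey);
    if (anIt != myCaches.end())
      return anIt->second; // cache hit: no curve evaluation at all

    std::vector<Point3>& aPnts = myCaches[aKey];
    aPnts.reserve (myNbSamples);
    const double aStep = (theLast - theFirst) / double (myNbSamples - 1);
    for (std::size_t i = 0; i < myNbSamples; ++i) {
      aPnts.push_back (myEval (theFirst + aStep * double (i)));
      ++myNbEvaluations;
    }
    return aPnts;
  }

  // Number of evaluator calls made so far (a counter like the one used for D0()).
  std::size_t NbEvaluations() const { return myNbEvaluations; }

private:
  Evaluator   myEval;
  std::size_t myNbSamples;
  std::size_t myNbEvaluations;
  CacheMap    myCaches;
};
```

Note that the sketch keys the cache on exact range bounds, so it only helps when the very same subranges are requested again, which is the situation described above.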

* New targets *
So, what remains? Looking at the new hotspots below, you can see that Extrema_ECCOfExtCC::Perform() remained among the top 3.

What's inside? Its top 3 hotspots (the 1st and 2nd are shown below, and the 3rd is similar to the 2nd) are connected with access to the array of doubles TbDist2 (containing square distances between sample points along the two curves).

I suspect these are due to data latency, i.e. cache misses, when the referenced data are not available in the processor cache and have to be loaded from RAM, which is time consuming. When some data are loaded from memory, the adjacent memory is also loaded by the processor (it assumes that you may need it and does this in advance). With TbDist2 this does not help much, because for each element [i, j] eight neighbors are checked ([i-1,j-1], [i-1,j]…[i+1,j+1]). Data for most of them are not adjacent in memory, and therefore there are likely many misses here. With millions of calls (whose number, as you correctly guess, did not change after the above modifications), this results in a hotspot. I am not sure if anything can be done about this, and I will probably have to talk to my Intel colleagues to check what can be done in a case like this. Perhaps the only possible solution lies upstream, in reducing the number of subranges the edges are split into. Today, after brief emails with the OCC team, I have no clue if or how this can be done. So it will be up to them to continue in that direction.

Two other hotspots – PLib::EvalPolynomial() and BSplCLib::CacheD1() – are connected with computing the first derivative of a B-Spline curve. Geom_BSplineCurve::D1() is called when finding a root of the function that minimizes the distance between two curves. BSplCLib::Bohm() is called 2.4M times and PLib::EvalPolynomial() 30M times. I did not find a way to cache these calculations (if you see one, please let me know), and the solution probably also lies upstream.
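To make the memory layout argument more tangible, here is a small sketch. It assumes a row-major array of doubles (the actual layout of TbDist2 may differ), and both function names are hypothetical. With 64 columns, the neighbors in rows i-1 and i+1 are 512 bytes away from [i, j], i.e. well beyond a typical 64-byte cache line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Distance in bytes between element [i, j] and its neighbor [i + di, j + dj]
// in a row-major array of doubles with theNbCols columns.
inline long NeighborOffsetBytes (long theNbCols, int di, int dj)
{
  return (di * theNbCols + dj) * long (sizeof (double));
}

// Returns true if [i, j] is a strict local minimum among its eight neighbors.
// [i, j] must be an interior element of the array.
inline bool IsLocalMinimum (const std::vector<double>& theDist2,
                            std::size_t theNbCols, std::size_t i, std::size_t j)
{
  assert (i >= 1 && j >= 1 && j + 1 < theNbCols);
  assert ((i + 2) * theNbCols <= theDist2.size());
  const double aVal = theDist2[i * theNbCols + j];
  for (int di = -1; di <= 1; ++di) {
    for (int dj = -1; dj <= 1; ++dj) {
      if (di == 0 && dj == 0)
        continue;
      // neighbors with di != 0 live a whole row (theNbCols doubles) away
      const std::ptrdiff_t anIdx = (std::ptrdiff_t (i) + di) * std::ptrdiff_t (theNbCols)
                                 + std::ptrdiff_t (j) + dj;
      if (theDist2[std::size_t (anIdx)] <= aVal)
        return false;
    }
  }
  return true;
}
```

Only the two neighbors in the same row ([i, j-1] and [i, j+1]) are guaranteed to be close in memory; the other six belong to different rows.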

* Next steps *
So, this is where I am now, and frankly I am about to pause here ;-). I sent the modifications to OCC for validation on the regression database (keep your fingers crossed ;-)). If they are OK, I'm going to make similar modifications for the 2D case to ensure consistency, and perhaps do some other polishing in Extrema for easier future maintenance.

Of course, there is a possibility to try multi-threading on BOPs, but I am not yet up to this, as it will require a deeper understanding of the algorithms and their concurrency possibilities. We'll see…

Meanwhile I'll try to post the fixes on SourceForge in case anyone would like to try them. As usual, any feedback will be appreciated, especially if you witness performance improvements or degradation.



After the first article I received a few more test cases and, even better, a set of modifications in BOP for performance improvements. They were made by one of the OCC customers, registered here as Solar Angel. By his estimate, the speed up in Cut (Common seems not to be affected) was from a few minutes to a matter of seconds. Cool! I was excited to get these fixes and to get in touch with the guy again after several years!

Extreme fight with Extrema. Part 2


The first one of such was sqrt(), which was among the top 5 hotspots. As it is a system function, I did not notice it until I pressed a button on the Amplifier toolbar (the fact is that by default Amplifier attributes the time of system functions to their callers).

After unwinding its stacks I found the reason. Extrema used gp_Pnt::Distance() (or sometimes gp_Vec::Magnitude()) everywhere, both to compute exact distances and to find out if one point is closer than another. Well, such abuse had a price: 6.5% of total CPU time. Without sacrificing any correctness, distances can be replaced with square distances when they are only compared. This is a hint for you: if you have multiple calculations, try to use SquareDistance() or SquareModulus() wherever possible instead of their counterparts that invoke sqrt(). With these modifications I managed to save about 5-6% (I did not eliminate all the calls).
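Here is a minimal sketch of the hint, using a hypothetical Point3 struct and free functions instead of gp_Pnt so that it is self-contained. Since sqrt() is monotonic, comparing square distances selects the same closest point, and sqrt() is called only once at the end.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 3D point; stands in for gp_Pnt.
struct Point3 { double x, y, z; };

inline double SquareDistance (const Point3& a, const Point3& b)
{
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

// Returns the index of the point closest to theRef;
// theMinDist receives the exact (not squared) distance to it.
inline std::size_t ClosestPoint (const std::vector<Point3>& thePnts,
                                 const Point3& theRef, double& theMinDist)
{
  assert (!thePnts.empty());
  std::size_t aBest = 0;
  double aBestSq = SquareDistance (thePnts[0], theRef);
  for (std::size_t i = 1; i < thePnts.size(); ++i) {
    const double aSq = SquareDistance (thePnts[i], theRef); // no sqrt() inside the loop
    if (aSq < aBestSq) {
      aBestSq = aSq;
      aBest = i;
    }
  }
  theMinDist = std::sqrt (aBestSq); // the only sqrt() call
  return aBest;
}
```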

Another finer improvement is related to Geom_BSplineCurve. As we already noticed and will see more precisely below, points along B-Spline curves were calculated intensively. This involves Geom_BSplineCurve::IsCacheValid() (which checks if a local cache can be used for faster computations). It took 6.25% of the time, and looking at the source and assembly code helped to understand why:

The reason is the division (the fdiv instruction) by spinlenghtcache to normalize the original range into [0, 1). Actually it could easily be avoided, so I simplified the code, also eliminating multiple branches to increase readability (and possibly performance, as branching is expensive). Compare:

This gave about 60% time reduction of this function. Well, floating point operations are expensive, and even if subtraction is not as expensive as division, it still takes time.
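The kind of change involved can be sketched as follows. This is not the actual IsCacheValid() code; the function and parameter names are made up for illustration. Assuming a positive span length, checking that a normalized parameter falls into [0, 1) can be replaced by comparing the offset with the span length directly, which needs a subtraction but no division.

```cpp
#include <cassert>

// Variant with normalization: one floating point division per call.
inline bool InSpanWithDivision (double theParam, double theSpanStart, double theSpanLength)
{
  assert (theSpanLength > 0.0);
  const double aNorm = (theParam - theSpanStart) / theSpanLength;
  return aNorm >= 0.0 && aNorm < 1.0;
}

// Variant without division: compare the offset with the span length directly.
inline bool InSpan (double theParam, double theSpanStart, double theSpanLength)
{
  assert (theSpanLength > 0.0);
  const double aDelta = theParam - theSpanStart;
  return aDelta >= 0.0 && aDelta < theSpanLength;
}
```

The two variants can differ by rounding for parameters extremely close to the upper bound of the span, which for a cache validity check would at worst cause an unnecessary cache rebuild.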

*Calculations of B-Spline curve points*
Unwinding the stacks of the top hotspot PLib::NoDerivativeEvalPolynomial() helped locate the place where it was called from – Geom_BSplineCurve::D0() – which in turn was called from Extrema (remember the tons of calls inside it mentioned above?). I put in a counter and measured that the number of calls was 272.3 million! Wow!

As nothing in the method itself promised any improvements, I had to make modifications upstream. In Extrema.


Until now the Extrema architecture for the curve-curve case was very inflexible. It always redid all computations, flushing out the results of any previous call. On the other hand, the point-surface and curve-surface cases have some caching, so I wonder why the developers missed it for this case.

Anyway, I extended the API to be able to first set each curve and/or its range and later perform the calculations. To take advantage of that, I also had to modify IntTools_BeanBeanIntersector::ComputeUsingExtrema() to set the curve for theRange2 (specified as its argument) only once. This was the first step, and it reduced the number of calls to Geom_BSplineCurve::D0() / PLib::NoDerivativeEvalPolynomial() to 8.6M.
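The "set first, perform later" idea can be illustrated with a toy class. Everything here is hypothetical (MinDistanceTool, SetCurve(), Perform()), and the "curves" are reduced to scalar functions to keep the sketch short; it only shows how an unchanged second curve is sampled once instead of on every iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical stand-in for a curve-curve extrema tool.
class MinDistanceTool
{
public:
  typedef std::function<double (double)> Curve;
  enum { NbSamples = 32 };

  MinDistanceTool() : myNbSamplings (0) {}

  // Sets the curve number theIndex (0 or 1) with its range and samples it right away.
  void SetCurve (int theIndex, const Curve& theCurve, double theFirst, double theLast)
  {
    assert (theIndex == 0 || theIndex == 1);
    std::vector<double>& aVals = mySamples[theIndex];
    aVals.clear();
    const double aStep = (theLast - theFirst) / double (NbSamples - 1);
    for (int i = 0; i < NbSamples; ++i)
      aVals.push_back (theCurve (theFirst + aStep * i));
    ++myNbSamplings;
  }

  // Uses whatever samples are currently set; nothing is resampled here.
  double Perform() const
  {
    assert (!mySamples[0].empty() && !mySamples[1].empty());
    double aMin = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < mySamples[0].size(); ++i)
      for (std::size_t j = 0; j < mySamples[1].size(); ++j)
        aMin = std::min (aMin, std::fabs (mySamples[0][i] - mySamples[1][j]));
    return aMin;
  }

  // How many times a curve has been sampled so far.
  int NbSamplings() const { return myNbSamplings; }

private:
  std::vector<double> mySamples[2];
  int                 myNbSamplings;
};
```

With such an interface the caller sets the second curve once before its loop over the first edge's ranges and only calls SetCurve (0, ...) and Perform() inside the loop, so 40 ranges cost 41 samplings rather than 80.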

(To be continued)

Extreme fight with Extrema. Part 1

Those who regularly follow my blog likely remember that after the recent speed up of Boolean Operations (see Why are Boolean Operations so slooo...ooow?) there remained a test case from Pawel K that was almost not affected. So I accepted the challenge and spent more time on this. Many thanks to Pawel for providing these models. Though they appeared to be very peculiar (with tiny ellipse arcs and huge B-Splines of 300+ poles and 40+ knots), they allowed me to detect a problem with a wider impact.

Bottom line first: the achieved speed up is 2.3x (from 113s to 49s). Not bad, but not quite a breakthrough, because the elapsed time is still on the order of tens of seconds. There are some (not obvious) further directions for optimization, but they will likely be even more time-consuming, so I am not yet ready to commit to that (especially as I have already spent half of my vacation ;-)).
But what is more important is that the modifications open the door to speeding up other OCC algorithms, since what was modified is a core component, Extrema. Today's story is about the changes in it…

So, as usual, I used Intel Parallel Amplifier (by the way, we just RTM'ed, Released to Manufacturing, the version for public beta which should be available in January) to understand hotspots.

They were totally different from those in the cases I dealt with in my previous experiment, and it promptly became obvious that they were related to edge-to-edge intersections (see the stack on the right).

Looking at IntTools_BeanBeanIntersector::ComputeUsingExtrema(), it became clear what was happening.

Each pair of edges which is a candidate for intersection (e.g. if their bounding boxes intersect) is analyzed. Each edge is split into a certain number of ranges (e.g. an ellipse into 40) and then these ranges are analyzed against the other edge's ranges. And though inside ComputeUsingExtrema() the 2nd range does not change, the Extrema object (used to calculate the smallest distance) is created from scratch and initialized with it every time. Diving further into Extrema with the debugger, I found out that B-Splines are split into C2 intervals (of which there were about 20 on Pawel's model!). Along each curve interval, a few sample points (32 by default) are taken. Distances are measured on the fly and candidates are passed on to math_Function* for precise calculations. Inside it the number of sample points is sometimes magically increased by 2x (just for more reliable sampling, eh?). So, tons and tons of calculations without any attempt to reuse what has already been calculated before.

The main idea for optimization was obvious – cache and reuse. This would require some changes in the Extrema API (as it did not support that), which I eventually made. It is currently limited to 3D curves only; similar improvements for other cases are deferred until the OCC folks (to whom I sent the fixes today) can confirm there are no regressions.

But before proceeding to the API redesign, I decided to make finer improvements first, to make sure tiny bottlenecks do not drop off the radar due to larger-scale improvements.

(To be continued).


Sexy background

If you want to add a special touch to your application, here are a couple of hints on ‘personalizing’ your 3D view.

*Adding an image as background*
Use V3d_View::SetBackgroundImage(), which accepts a file name of a gif, bmp or xwd image and a placement option (centered, stretched or tiled) defined by the Aspect_FillMethod enumeration. Calling it with Aspect_FM_NONE erases the image.
Here’s an original image and its use with the Aspect_FM_STRETCH option:

You may want to extend pre-built options, and for instance add an image to a bottom-right corner to add your company logo.

* Gradient background *
This is a frequently used background style in CAD applications. Open CASCADE does not offer a direct API to set it, but it can be implemented. The following is an excerpt from my extension of DRAWEXE:

static void UpdateGradientBackground (const Handle(Visual3d_Layer)& theLayer,
                                      const Quantity_Color& theTopColor,
                                      const Quantity_Color& theBottomColor)
{
  int aWidth = ..., aHeight = ...; //e.g. QWidget::width() and height() for Qt-based apps
  theLayer->Clear(); //make sure we draw on a clean layer
  theLayer->SetViewport (aWidth, aHeight);
  //now draw a polygon using top and bottom colors
  //remember that default boundary is [-1,1;-1,1] and origin is in the left bottom corner

  //check position for the middle color - if transition should be non-uniform then
  //additional points should be inserted and the technique changes - 2 polygons instead of 1
  theLayer->SetColor (theTopColor);
  theLayer->AddVertex (-1,1);
  theLayer->AddVertex (1,1);

  theLayer->SetColor (theBottomColor);
  theLayer->AddVertex (1,-1);
  theLayer->AddVertex (-1,-1);
}

static int VSetBgColor (Draw_Interpretor& di, Standard_Integer argc, const char** argv)
{
  Handle(V3d_View) V3dView = ViewerTest::CurrentView();
  if ( V3dView.IsNull() ) return 1;

  static Handle(Visual3d_Layer) aLayer;

  if (argc == 4) {
    if (!aLayer.IsNull()) {
      //switch to a single color mode
      aLayer->Destroy(); //explicit destruction is required as destructor
                         // will not be called (one reference remains in Visual3d_ViewManager)
      aLayer.Nullify();  //forget the destroyed layer so that a new one is created next time
    }
    V3dView->SetBackgroundColor (Quantity_Color (atof(argv[1]), atof(argv[2]), atof(argv[3]), Quantity_TOC_RGB));
  } else if (argc == 7) {
    Quantity_Color aTopColor (atof(argv[1]), atof(argv[2]), atof(argv[3]), Quantity_TOC_RGB);
    Quantity_Color aBottomColor (atof(argv[4]), atof(argv[5]), atof(argv[6]), Quantity_TOC_RGB);
    if (aLayer.IsNull()) {
      Standard_Boolean aSizeDependant = Standard_True; //each window to have particular mapped layer?
      aLayer = new Visual3d_Layer (V3dView->Viewer()->Viewer(),
                                   Aspect_TOL_UNDERLAY, aSizeDependant);
    }
    UpdateGradientBackground(aLayer, aTopColor, aBottomColor);
  } else {
    di << "Usage : " << argv[0] << " {color(R G B) | top_color(R G B) bottom_color(R G B)} \n";
    return 1;
  }

  return 0;
}

Here’s an example view:



License to kill. License to use

The famous James Bond 007 had a license to kill; the "00" designation in his code number meant he was sanctioned to use deadly force. In order to use any software you also need a license. Let me repeat, *any* software, even software which you can download free of charge with a couple of mouse clicks.

When accepting a license you become bound by the legal agreement under which the software is made available. Tell me honestly, how many of you read, and how often, the license agreement shown on the install screen that makes you select the radio-button "Yes, I accept this license" before clicking Next? I bet very few, and I’m not a role model either ;-).

The fact that you can freely download some software does not imply you can use it and distribute your software based on it as you wish. The most famous example is GPL'ed software (General Public License), which is known to be 'viral', i.e. making your own software GPL'ed. This at least applies to GPL version 2; version 3 comes with some more sophisticated terms which I have not fully studied yet.

So, what about the Open CASCADE license? Among the most important aspects, I'd underline that it's quite permissive and allows you to use OCC in your project (open or closed source, free of charge or for a fee), which can be distributed under your own proprietary license. Like most other Open Source licenses, it requires that you include a copy of the license in your distribution. All modifications you might make to the Open CASCADE software must be made available in source code, under the same license, to anyone.

The summary on the site says it is "LGPL-like". I must confess this was my own suggested wording, which we put there with the web team when I worked at Open CASCADE. We put that note to contrast it with GPL. It was based on my knowledge of the subject at that time. Working at Intel, where licensing issues are explained as part of mandatory trainings, I now see that this is not exactly the case. LGPL is still quite viral (though much less than GPL) and is not too welcome in commercial applications. The OCC Public License has a quite different focus.

I believe the OCC license is somewhat weird and had better be changed. There are several existing, widely recognized and adopted Open Source licenses. Any time a company comes out with its own 'open source' or 'public' license, it creates a headache for potential users and their companies' lawyers, who have to read it and understand the implied rights and obligations. For average people (even native English speakers), legal vocabulary is Greek. Take the Tax Code of your country, open it and start reading on an arbitrary page. I bet you will have to re-read each paragraph several times before you get an idea of what it is about, not to mention the tiny details where, as we know, the devil is. So, if you really want to give your software to the Open Source world, you had better choose something existing.

In this regard, Intel's move in 2005 was very symbolic to discontinue its own (and, by the way, recognized and approved) Open Source license. Motivation ? Exactly that, stop license proliferation and ease Intel’s software adoption. Look what other successful people or companies do – learn and do the same ;-).

So, my modest suggestion to the Open CASCADE company is to consider favoring some well recognized Open Source license and to migrate to it. This will ease OCC adoption and will benefit all parties.

We can continue discussion in comments. So feel free to throw in your ideas !


Why are Boolean Operations so slooo...ooow ? Part 3


For my experiments I used my own patched version of Open CASCADE 6.3.0 that accumulated all the modifications for prototyping multi-threading I mentioned earlier on the forum. One set of modifications in it is thread-safe versions of BSplSLib, BSplCLib and PLib. The original 6.3.0 uses static arrays which are reused by further calls to ::D0(), D1(), etc., which is obviously not thread-safe (nor even reentrant), so a few months back I changed them to use local variables which would be allocated/destroyed upon every relevant *Lib function call. This was just fine for IGES import scenarios, but working on BOPs I noticed that my code showed performance regressions compared to 6.3.0. See the image below:

So it was obvious that constant allocations/destructions were not an option. I checked how many times reallocation (or actually allocation) was used on a BOP test case, and it was 11.7 million; the allocated buffers were from 2 to 264 doubles long. Well, I had to return to the previous approach (i.e. re-allocation only when a newly requested buffer exceeds the previously allocated one). But how to ensure reentrancy? The answer was quite obvious – TLS, or Thread-Local Storage. That is, each thread has its own buffer, and it's only used by that thread. So, I wrote a class (let’s tentatively name it TLSManager) that contains a map (hash-table) of {Standard_ThreadId, set of buffers}, and returns a requested buffer depending on the id of the thread from which the buffer was requested.
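For comparison, with a C++11 compiler the "one growing buffer per thread" idea could be written with the thread_local keyword. This is a different technique from the map-based TLSManager described above (which had to work with the compilers of that time), and the function name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a per-thread scratch buffer of at least theSize doubles.
// Memory is re-allocated only when a larger buffer than ever before is requested.
inline double* ThreadLocalBuffer (std::size_t theSize)
{
  assert (theSize > 0);
  thread_local std::vector<double> aBuffer; // one instance per thread
  if (aBuffer.size() < theSize)
    aBuffer.resize (theSize);
  return aBuffer.data();
}
```

Since every thread gets its own vector, no locking is needed at all, and repeated requests for buffers of 2 to 264 doubles settle on a single allocation per thread.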

Another obvious problem popped up – TLSManager must be thread-safe, but using Standard_Mutex to protect it would be overkill. There is a solution to this typical problem, which is a read-write lock, i.e. an object that allows multiple concurrent read accesses to a shared resource (in our case a map with buffer sets) and exclusive write access (when a new set of buffers is created for a new thread). Well, I went and re-invented the wheel, and added a class OSD_ReadWriteLock. Doing this (like when adding other thread classes in OSD a while ago) I once again thought that OCC should rather bring in Boost (www.boost.org) than re-design its own wheels. Salome does use Boost, so OCC obviously can too. For instance, Boost also offers a template for TLS - http://www.boost.org/doc/libs/1_37_0/doc/html/thread/thread_local_storage.html.
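A rough equivalent of this design in today's standard C++ (C++17) might look as follows. This is not the actual TLSManager or OSD_ReadWriteLock code: BufferManager is a hypothetical name, std::shared_mutex plays the role of the read-write lock, and a single buffer per thread is kept instead of a set of buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hands out one growing buffer per thread; the map is protected by a read-write lock.
class BufferManager
{
public:
  // Returns the calling thread's buffer, grown to at least theSize doubles.
  std::vector<double>& Buffer (std::size_t theSize)
  {
    const std::thread::id anId = std::this_thread::get_id();
    std::vector<double>* aBuf = 0;
    {
      // shared (read) access: many threads can look up their buffers concurrently
      std::shared_lock<std::shared_mutex> aReadLock (myMutex);
      Map::iterator anIt = myBuffers.find (anId);
      if (anIt != myBuffers.end())
        aBuf = &anIt->second;
    }
    if (!aBuf) {
      // exclusive (write) access: happens only once per thread
      std::unique_lock<std::shared_mutex> aWriteLock (myMutex);
      aBuf = &myBuffers[anId];
    }
    assert (aBuf != 0);
    // only the owning thread ever touches its own vector, so no lock is held here;
    // references to unordered_map elements stay valid when other elements are inserted
    if (aBuf->size() < theSize)
      aBuf->resize (theSize);
    return *aBuf;
  }

  // Number of threads that have requested a buffer so far.
  std::size_t NbThreads() const
  {
    std::shared_lock<std::shared_mutex> aReadLock (myMutex);
    return myBuffers.size();
  }

private:
  typedef std::unordered_map<std::thread::id, std::vector<double> > Map;
  mutable std::shared_mutex myMutex;
  Map                       myBuffers;
};
```

The write lock is taken only the first time a thread asks for a buffer; all later requests go through the shared lock and do not block each other.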

Once the RWL class had been written, I created a simple test case to check it and used Intel Parallel Inspector for that. Inspector allows you to identify memory errors (leaks, uninitialized memory use, etc) and thread errors (data races, deadlocks, etc). The thread checker is one of a kind (there is no other such software as far as I know), but its overhead is substantial.

Well, this session with Inspector was something! It reported data races, as if my RWL were simultaneously reading from and writing to the same memory (an object member).

I spent several hours and felt like a complete fool looking at my one-page code, trying to root-cause the errors and beating my head against the keyboard and anything else around. Crazy Friday evening at home! When I gave up, I recompiled my unit test to use Qt’s QReadWriteLock, and what? Same data races!

That made me doubt even further; I wrote down all the behavior scenarios on a sheet of paper and reaffirmed that there cannot be any data races, everything was protected. It’s just a plain crazy false positive (i.e. a report on a problem that actually does not exist)! I know that Inspector has false positive issues, but I could not imagine it would beat me that much. So, I am looking forward to talking to my Intel colleagues about that. (Make a note for yourself: when you download Inspector and notice the same problem, you might want to recall my case – perhaps it will be some unfixed false positive ;-)).

OK, now being confident in my RWL, I went and tried TLSManager with RWL in BSplSLib. And? The regression is gone! The small overhead of using RWL (instead of the former shared static buffers) became actually unnoticeable. Excellent!

With all those modifications, the overall speed up vs 6.3.0 was about 4x on the Open CASCADE test case. I tried a simpler model sent by forum participants, and it revealed a 20x speed up. Very small cases running in a fraction of a second did not reveal a substantial speed up. Thus, in general we can roughly project a speed up in the 3x-10x range on average.

There is still something to do, e.g. to design TLSManager in CDL and migrate PLib and BSplCLib to it. I will do this as time permits, hopefully sooner while my memory is fresh.

And, by the way, if you want to try out your models with the new version, please feel free to send me the models via email or a download link. Those who are eager to get the fixes, just let me know ;-)

Looking back, I think that the time spent on it was worth it. I do hope that these findings will inspire the OCC team to dig further and to find further room for improvement, beyond BOP. I hope that my colleagues at Intel will appreciate the 14 bug reports and enhancement requests I compiled during these days, and that by the commercial release the tools will be even better than they are today. I was able to learn something new in the depths of OCC Modeling algorithms, and this was good. Folks from the Community will benefit via future OCC releases that will hopefully include my modifications.

I will continue to prepare OCC test cases for app testing, and if there is anything interesting, I will share with you.

Let me add a few more words on performance. Being at Intel I now view it a bit differently than when working at a software development company. Guys, times are changing (or have already changed, if you want). The free lunch, when your app would run faster just with every newly released processor, is over. The megahertz era is over. To make your application run faster you must make it multi-threaded and scalable. Performance is not just that your app runs faster. Higher performance means more features. Look at the spell checker in MS Word – it checks as you type your document. It’s just because it’s fast enough and because it runs in a parallel thread.
If you want to stay competitive in the market, you must go parallel. No other way. It’s challenging, but fortunately there are tools to help with that. And I am happy that I somehow relate to them. Go and try the Intel tools (www.intel.com/go/parallel).

Endorsement? Well, maybe. Sincere? Absolutely! (I practice what I preach ;-) )

Good luck !


Why are Boolean Operations so slooo...ooow ? Part 2


So, I dove into the code and found out that the constructors were plain initializers of internal fields (e.g. double, integer, pointer). All three constructors look similar. Unwinding the stacks revealed the root cause – objects were created using the new[] operator (e.g. ptr = (void*) (new IntPolyh_Triangle [N]) ), which called the constructor for a *huge* number of elements. Look at this:

Each IntPolyh_MaillageAffinage constructor creates arrays of 10 000 points, 20 000 triangles, and 30 000 edges for each face. Are they all really used? With the debugger I went through all the steps where these arrays are filled in, and what did I find? Very often they are filled in with fewer than 100 elements. A few dozen effectively used while allocating for tens of thousands?! Unbelievable!
Two additional observations:
1. initialized elements in the array have never been read and have always been rewritten.
2. effective number of elements can be easily calculated upfront (e.g. n * m)

So, I looked into all IntPolyh classes to make sure that this is a common usage model, so that I could easily fix it with deferred initialization with a particular number of elements. As usual, life is not as simple as it sometimes seems. Some classes (e.g. IntPolyh_ArrayOfEdges) implied that the number of effectively used elements can grow over time, and this feature was really used during mesh refinement. Moreover, I found that many classes implement the same array pattern, with a number of allocated elements and a number of effectively used ones. However, we already saw above how inefficiently that strategy could be used :-(.

IntPolyh contains 7 IntPolyh_Array* classes that implement this pattern, each one duplicating the same code.

So, I went and created a single generic class IntPolyh_DynamicArray which allocates memory with Standard::Allocate() (in order not to call new[] and the constructors) and can grow over time if the previously allocated memory is not enough. All 7 classes became instances of this template, which significantly reduces the amount of code to maintain.

Next, I made deferred initialization of these arrays when the number of elements to fill them in with is known (e.g. IntPolyh_MaillageAffinage::FillArrayOfPnt()).
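The approach can be sketched like this. It is not the actual IntPolyh_DynamicArray: the class below is a hypothetical simplification that uses std::realloc() in place of Standard::Allocate() and is restricted to trivially copyable element types, so that no constructors need to run at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

// Growable array on raw memory: allocating N elements runs no element constructors.
template <class T>
class DynamicArray
{
  static_assert (std::is_trivially_copyable<T>::value,
                 "elements are moved as raw bytes by realloc()");
public:
  DynamicArray() : myData (0), mySize (0), myCapacity (0) {}
  ~DynamicArray() { std::free (myData); }

  // Deferred initialization: reserve memory once the number of elements is known.
  void Reserve (std::size_t theCapacity)
  {
    if (theCapacity <= myCapacity)
      return;
    void* aPtr = std::realloc (myData, theCapacity * sizeof (T));
    if (!aPtr)
      throw std::bad_alloc();
    myData = static_cast<T*> (aPtr);
    myCapacity = theCapacity;
  }

  // Appends an element, doubling the capacity when the array is full.
  void Append (const T& theValue)
  {
    if (mySize == myCapacity)
      Reserve (myCapacity ? 2 * myCapacity : 16);
    std::memcpy (myData + mySize, &theValue, sizeof (T));
    ++mySize;
  }

  T& operator[] (std::size_t i) { assert (i < mySize); return myData[i]; }
  const T& operator[] (std::size_t i) const { assert (i < mySize); return myData[i]; }

  std::size_t Size() const { return mySize; }         // effectively used elements
  std::size_t Capacity() const { return myCapacity; } // allocated elements

private:
  DynamicArray (const DynamicArray&);            // copying is not needed in this sketch
  DynamicArray& operator= (const DynamicArray&);

  T*          myData;
  std::size_t mySize;
  std::size_t myCapacity;
};
```

A caller that knows the element count up front (e.g. n * m) calls Reserve() once and then Append() for each element; growth by doubling only kicks in for the cases, such as mesh refinement, where the count turns out to be larger.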

There are other possible easy improvements to be made such as inlining all relevant methods of _Point, _Edge, etc. I did not do this right now and leave this up to the Open CASCADE team.

After these modifications the time attributed to TKGeomAlgo.dll (measured with Amplifier) decreased by 5x-18x (to between 2s and 7.28s, versus the original 36.07s)!!! The overall speed up was about 3.5x (see the screenshot below).

So, this was relatively ‘low hanging fruit’ to pick, and it gave such impressive results. I believe there is more that can be done to improve, and I encourage the OCC folks to do more, up to and including multi-threading. I don’t know BOP internals, but I would check if running face-face intersections in parallel threads is feasible.

However, this was not the end of my own research.

(to be continued)

Why are Boolean Operations so slooo...ooow ? Part 1

Open CASCADE Boolean Operations (BOPs) have frequently been claimed to be slow. Has anyone tried to find out why?

As you probably remember, I recently mentioned in another post that at Intel we have decided to integrate Open CASCADE into our application testing database. So I took on a challenge to create a few test cases to regularly check Intel Parallel Amplifier and Inspector (part of new Intel Parallel Studio).

In addition to my recent test cases with IGES import which has been prototyped to run in multi-threading mode, this time I have proceeded to Boolean Operations (BRepAlgoAPI). I requested a few models on the forum but replies were surprisingly not numerous :-(. Anyway, I am thankful to Evgeny L, Prasad G, Pawel K, as well as to Igor F for their examples.

The bottom line. On relatively complex models, the overall achieved speed up was from 4x (100+ faces in a model) to 20x (several dozen faces). Examples of reduced CPU time: from 80s to 20s, from 30s to 1.4s. (Disclaimer: after this article had been drafted last weekend, I experimented with another set of models sent by Pawel Kowalski. They revealed bottlenecks different from those mentioned below, and therefore the described improvements do not affect them. I’ll be continuing my experiments as time permits and will hopefully post further findings.)

* Story *
So let us follow the steps which have been made.

I have focused on the BopTools_DSFiller class, which is central to the Boolean Operations (BOP), as it prepares the models by intersecting them so that later on fuse, common, and cut just take its results and reconstruct the requested combination.

As a first test case, I took two models provided by my former colleagues at OCC who participated in the Intel Parallel Studio beta program. These were two solids of 130+ faces each, and BopTools_DSFiller::Perform() took 67s of CPU time.

I installed the latest build of Intel Parallel Amplifier (reminder: public Beta will be available in early January and you can subscribe already now here – www.intel.com/go/parallel). The only applicable analysis type was ‘Hotspot Analysis’ which identifies most CPU-consuming functions. Amplifier also offers ‘Concurrency Analysis’ and ‘Waits & Locks Analysis’ but these were irrelevant as BOPs currently run in single thread only, while they are tailored to multi-threaded apps.

* First findings *
The top functions that Amplifier reported were located in TKGeomAlgo.dll and related to the IntPolyh package. Not surprising, as BOPs are based on mesh intersection and IntPolyh creates those meshes.
The top 3 functions – constructors of IntPolyh_Triangle, _StartPoint, and _Edge – altogether took almost 20 seconds (see the image below).

(to be continued)


Adding colors and names to your application. Part 3


* Visualization of the shapes*

The XDE framework provides functionality to display contents in 3D viewer with the help of XCAFPrs_AISObject, which eventually inherits AIS_InteractiveObject and thus can be used in a usual manner.

Since XDE is OCAF-based you should couple it with AIS in OCAF-specific way, i.e. associating a driver (TPrsStd_Driver descendant) that creates an interactive object. In this case the driver is XCAFPrs_Driver that creates an instance of XCAFPrs_AISObject. Below is a code skeleton:

TDF_Label anAccess = aDoc->GetData()->Root();
Handle(TPrsStd_AISViewer) anAISViewer;
if (!TPrsStd_AISViewer::Find (anAccess, anAISViewer)) {
  Handle(V3d_Viewer) aViewer = ...;
  anAISViewer = TPrsStd_AISViewer::New (anAccess, aViewer);
}

// collect sequence of labels to display
Handle(XCAFDoc_ShapeTool) aShapeTool = XCAFDoc_DocumentTool::ShapeTool (aDoc->Main());
TDF_LabelSequence seq;
aShapeTool->GetFreeShapes (seq);

// set presentations and show
for ( Standard_Integer i=1; i <= seq.Length(); i++ ) {
  Handle(TPrsStd_AISPresentation) prs;
  if ( ! seq.Value(i).FindAttribute ( TPrsStd_AISPresentation::GetID(), prs ) ) {
    prs = TPrsStd_AISPresentation::Set(seq.Value(i),XCAFPrs_Driver::GetID());
  }
  prs->Display (Standard_True);
}

Here’s a 3D viewer screenshot.

In order to enable the display of names, XCAFPrs::SetViewNameMode() must be called with Standard_True (before display). Below is an example of a 3D view with names display turned on:

Note that displayed text labels adversely impact performance, and in the case of numerous displayed labels, your viewer can become significantly less responsive.

* Non-OCAF based documents *

If for any reason you don’t use OCAF and want to exchange attributes with IGES and STEP, you will have to do this on your own directly accessing objects representing file entities. Look at source code of STEPCAFControl and IGESCAFControl packages to copy XDE behavior.

(The end)


Adding colors and names to your application. Part 2


OCC_UT_MyXDEApp could have inherited XCAFApp_Application, and then the redefinition of Formats() and ResourcesName() would not have been required. But XCAFApp_Application’s constructor has been declared private (instead of protected), which prevents inheritance :-(. Perhaps the OCC folks could correct that.

The document is created in a straightforward way:

Handle(TDocStd_Application) anApp = OCC_UT_MyXDEApp::GetApplication();
Handle(TDocStd_Document) aDoc;
anApp->NewDocument ("XmlXCAF", aDoc);

The screenshot below shows the structure of a new document created with above application.

Note that XCAFDoc_DocumentTool has been assigned to the label 0:1:1:1 and it created a sub-tree with several pre-defined labels (XCAFDoc_ShapeTool, _ColorTool, etc).

OCC provides persistence for XDE-specific attributes in all three supported formats (see StdResource) – standard textual, xml and binary. Some attributes (e.g. XCAFDoc_MaterialTool) are not stored in a file and are re-created during reading.

TCollection_ExtendedString anErrorMsg;
anApp->SaveAs (aDoc, "C:\\Dev\\3dmodels\\sampledoc.xml", anErrorMsg);
if (anErrorMsg.Length()) {
  std::cout << "Error occurred - " << TCollection_AsciiString (anErrorMsg).ToCString() << std::endl;
}


Here is a resulting XML file:

* Import /export with IGES and STEP *

To read/write colors and names from/to IGES or STEP you have to use classes {IGES,STEP}CAFControl_{Reader,Writer}, e.g. IGESCAFControl_Reader or STEPCAFControl_Writer.

IGESCAFControl_Reader aReader;
IFSelect_ReturnStatus aStatus = aReader.ReadFile (aFileName);
if (aStatus == IFSelect_RetDone) {
  Standard_Boolean aRes = aReader.Transfer (aDoc);
}

Here is a document content after import of a sample file:

(to be continued)

Adding colors and names to your application. Part 1

If you have to exchange data with other applications via IGES or STEP (or perhaps other formats, if you are a commercial client of the Open CASCADE company), you might want to enrich your application with meta data in addition to geometry. We will consider names and colors which are often asked about on the Open CASCADE forum.

OCC provides a ready-to-use framework – called XDE (eXtended Data Exchange) – which is based on OCAF. XDE offers a pre-defined document sub-structure to store colors, names, layers as well as other attributes (see XDE User’s Guide for details). This is done through a set of attributes defined in the XCAFDoc package that provides API to access data.

*Basic definitions*

Let’s start with the assumption that your application uses OCAF for data description. In order to make your OCAF document XDE-compliant, you need to add the XCAFDoc_DocumentTool attribute to your label of choice. It will add other required attributes to the sublabels. The easiest way is to extend your application class deriving from TDocStd_Application, for example as follows:

class OCC_UT_MyXDEApp : public TDocStd_Application
{
public:
  //singleton pattern
  Standard_EXPORT static const Handle(OCC_UT_MyXDEApp)& GetApplication();

  virtual void Formats (TColStd_SequenceOfExtendedString& theFormats)
  { XCAFApp_Application::GetApplication()->Formats (theFormats); }

  virtual Standard_CString ResourcesName()
  { return XCAFApp_Application::GetApplication()->ResourcesName(); }

  Standard_EXPORT virtual void InitDocument (const Handle(TDocStd_Document)& theDoc) const;

protected:
  OCC_UT_MyXDEApp() {}
};

const Handle(OCC_UT_MyXDEApp)& OCC_UT_MyXDEApp::GetApplication()
{
  static Handle(OCC_UT_MyXDEApp) anApp = new OCC_UT_MyXDEApp;
  return anApp;
}

void OCC_UT_MyXDEApp::InitDocument (const Handle(TDocStd_Document)& theDoc) const
{
  //create a child of the main label and put XCAFDoc_DocumentTool there (i.e.
  //one level below compared to default XDE)
  TDF_Label aL = theDoc->Main().FindChild (1, Standard_True); //0:1:1
  XCAFDoc_DocumentTool::Set (aL.FindChild (1, Standard_True), Standard_False); //0:1:1:1
}

(to be continued)