2013-06-02

Significant side effects in OCAF/XDE 6.6.0

This is a heads-up for those who deal with OCAF documents and migrating to 6.6.0.

Issue: You may observe sporadic crashes especially when dealing with AIS_InteractiveContext and/or XDE documents.

Root-cause: The root-cause of the issues is side effects of the fix #23523. It has changed the order of destruction of the document contents.
In 6.5.4 and earlier, the destructor of TDocStd_Document let the TDF_Data destructor be called what caused the following order:
- the attributes got destructed in the reversed order in the tree (i.e. first destructed were the attributes at the deepest label 0:x1:x2:...:xn, last - at root label 0:)
- the attributes got destructed while still attached to their labels (i.e. Label() in destructor still returned a valid label)

With 6.6.0 the TDocStd_Document destructor explicitly calls TDocStd_Document::Destroy() which does the opposite:
- it calls TDF_Label::ForgetAllAttributes() which removes the attributes in the direct order, from the root to the deepest label
- it first detaches the attributes' label prior to nullifying the attribute, so Label() in destructor will now always return 0.

This new behavior has at least two side effects that affected my case:
1. If the OCAF document has an attached AIS_InteractiveContext, then a crash happens upon document destruction. The root-cause is that TPrsStd_AISViewer attached to a root label contains a handle of AIS_InteractiveContext and TPrsStd_AISPresentations (at sub-labels) contain AIS_InteractiveObject's. Each of the latter has a *pointer* (not a handle, to avoid circular dependencies) to that AIS_IC context. Now when the document is destroyed in a direct order, then AIS_IC gets destroyed first (as part of TPrsStd_AISViewer destruction). So all pointers in AIS_IO's become dangling. Then during destruction of TPrsStd_AISPresentations, the method AISErase() access those dangling pointers, leading to a crash.

The same will happen in similar cases, when you have a resource on "top" labels pointed to from the "deeper" labels.
The work-around for the AISViewer was to ensure another handle to AIS_IC outside of the document which would keep the object (i.e. refcount > 0) while the document gets destroyed.

2. In XDE, XCAFDoc_DocumentTool maintains a global map storing labels containing this attribute. This design already had a flaw and at some point I had addressed it with fix 23593. That time I made a note that a better fix would get rid of a global map.
With 6.6.0 behavior XCAFDoc_DocumentTool::Destroy() is no longer effective - as explained above, upon destruction the label is already null, so the global map keeps on containing orphan labels. Depending on memory allocation, a similar label node address can be created and an access to a map would incorrectly retrieve a mapped value for that label, leading to a crash.
The new fix (24007) resolves this by getting rid of global map and keeping self-contained XDE document.  It now just stores a reference (tree-node attribute) to uniquely find a document tool label  inside the document. The fix works fine for the documents stored with older versions of XDE.

Take-aways:
1. You currently cannot rely on the order in which the document gets destroyed.
2. You should avoid accessing the resources stored at other labels during attributes destruction. Perhaps for some cases, this would require tweaks outside the OCAF doc (like in the case of AIS_IO described above).
3. Deallocation of resources by attributes should not happen in their destructors. Instead this should be done in a redefined method TDF_Label::BeforeRemoval(). So, the first implementation of 23953 was suboptimial anyway.

Recommendations/requests to OCC:
1. My personal point is that previous "bottom-up" destruction order is more practical than current "top-down". Like in C++, destruction order should be opposite to creation one, and in most applications you populate the document from top to bottom.
2. I believe OCAF must document and maintain a guarantee of the destruction policy, so that the user's applications can rely on this guarantee. Is this something OCC team can plan to address ?

Thank you in advance,
Roman