Significant side effects in OCAF/XDE 6.6.0

by - 12:04

This is a heads-up for those who deal with OCAF documents and migrating to 6.6.0.

Issue: You may observe sporadic crashes especially when dealing with AIS_InteractiveContext and/or XDE documents.

Root-cause: The root-cause of the issues is side effects of the fix #23523. It has changed the order of destruction of the document contents.
In 6.5.4 and earlier, the destructor of TDocStd_Document let the TDF_Data destructor be called what caused the following order:
- the attributes got destructed in the reversed order in the tree (i.e. first destructed were the attributes at the deepest label 0:x1:x2:...:xn, last - at root label 0:)
- the attributes got destructed while still attached to their labels (i.e. Label() in destructor still returned a valid label)

With 6.6.0 the TDocStd_Document destructor explicitly calls TDocStd_Document::Destroy() which does the opposite:
- it calls TDF_Label::ForgetAllAttributes() which removes the attributes in the direct order, from the root to the deepest label
- it first detaches the attributes' label prior to nullifying the attribute, so Label() in destructor will now always return 0.

This new behavior has at least two side effects that affected my case:
1. If the OCAF document has an attached AIS_InteractiveContext, then a crash happens upon document destruction. The root-cause is that TPrsStd_AISViewer attached to a root label contains a handle of AIS_InteractiveContext and TPrsStd_AISPresentations (at sub-labels) contain AIS_InteractiveObject's. Each of the latter has a *pointer* (not a handle, to avoid circular dependencies) to that AIS_IC context. Now when the document is destroyed in a direct order, then AIS_IC gets destroyed first (as part of TPrsStd_AISViewer destruction). So all pointers in AIS_IO's become dangling. Then during destruction of TPrsStd_AISPresentations, the method AISErase() access those dangling pointers, leading to a crash.

The same will happen in similar cases, when you have a resource on "top" labels pointed to from the "deeper" labels.
The work-around for the AISViewer was to ensure another handle to AIS_IC outside of the document which would keep the object (i.e. refcount > 0) while the document gets destroyed.

2. In XDE, XCAFDoc_DocumentTool maintains a global map storing labels containing this attribute. This design already had a flaw and at some point I had addressed it with fix 23593. That time I made a note that a better fix would get rid of a global map.
With 6.6.0 behavior XCAFDoc_DocumentTool::Destroy() is no longer effective - as explained above, upon destruction the label is already null, so the global map keeps on containing orphan labels. Depending on memory allocation, a similar label node address can be created and an access to a map would incorrectly retrieve a mapped value for that label, leading to a crash.
The new fix (24007) resolves this by getting rid of global map and keeping self-contained XDE document.  It now just stores a reference (tree-node attribute) to uniquely find a document tool label  inside the document. The fix works fine for the documents stored with older versions of XDE.

Take-aways:
1. You currently cannot rely on the order in which the document gets destroyed.
2. You should avoid accessing the resources stored at other labels during attributes destruction. Perhaps for some cases, this would require tweaks outside the OCAF doc (like in the case of AIS_IO described above).
3. Deallocation of resources by attributes should not happen in their destructors. Instead this should be done in a redefined method TDF_Label::BeforeRemoval(). So, the first implementation of 23953 was suboptimial anyway.

Recommendations/requests to OCC:
1. My personal point is that previous "bottom-up" destruction order is more practical than current "top-down". Like in C++, destruction order should be opposite to creation one, and in most applications you populate the document from top to bottom.
2. I believe OCAF must document and maintain a guarantee of the destruction policy, so that the user's applications can rely on this guarantee. Is this something OCC team can plan to address ?

Thank you in advance,
Roman

You May Also Like

5 comments

  1. Hello Roman,

    1. In my opinion, explicit "top-down" destruction is best choice. When we have simple data structures where attributes do not depend on each other, the destruction order can be arbitrary. In complex data structures (see e.g. TObj) it is more logical to put "parent" object at the root of the hierarchy of labels where "children" are stored. Then destroying parent before children allows it to destroy its data in manageable way.

    Note that complex data structures (like XCAF) have no definite order of creation of attributes, thus neither "bottom-up" nor "top-down" destruction can be considered as following creation order.

    2. We can start from documenting current behavior, if not yet done. You are welcome to contribute.

    Besides, note that the fix #23523 that you mentioned as the root-cause of this issue has been made to fix a problem of the same kind (crash on destruction) caused in turn by fix on #23489.

    Andrey

    ReplyDelete
  2. Hi Andrey,
    Thanks for offering your perspective. Indeed, in arbitrary case there is no predefined order about what refers what. It can be on case by case basis. However in order to put the resource management on user's shoulder OCAF should come with a documented policy so that the user can design respectively. Previous de-facto order existed for years, and even if not explicitly it formed expectations. Now the convention changed, again implicitly.

    The policy should come from and be enforced by the OCAF owner(s), the person/team in charge of maintaining code, doing code reviews and taking design decisions. It cannot be a contributor's community choice. So at this point, I view my part already performed via drawing your attention to the issue. Hopefully you will follow up by drafting the documentation, which I and others could review, contribute, etc.

    The problem *right now* is that the mainstream scenario of OCAF usage (bundling with AIS) has been compromised and users' codes became vulnerable. So right now users worldwide will have to tweak their codes to work-around changed destruction order convention. Please consider the following pseudo-code.


    - create OCAF document d
    - create AIS_IC ic
    - attach TPrsStd_AISViewer with ic (this will store a handle to ic inside and put viewer on root label)
    - create TPrsStd_AISPresentation and display an object via it (this will create a pointer inside AIS_IO to ic)
    - ic.Nullify(); //now refcount==1
    - close and nullify d

    This will cause an exception inside TPrsStd_AISViewer::AISErase() which is called by its BeforeForget(), as by this time ic will already be destroyed.

    To work-around that the users will have to tweak their source codes to hold a handle to ic during d's destruction. This may be not trivial as documents could be self-contained without any extra explicit AIS_IC. Example - the document is stored in low-level data model classes, while ic is added in visualization part which can be unloaded by the time data models gets destroyed.

    So you might want to address this scenario as part of OCC with the new convention.

    Thanks again,
    Roman

    ReplyDelete
  3. So how do you fix this issue? Should I downgrade to 6.5.4

    thanks

    ReplyDelete
  4. You can defer upgrade and wait until OCC comes with the fix for AIS (see recently opened 24047 http://tracker.dev.opencascade.org/view.php?id=24047), or you can work-around that issue by adding an extra handle to AIS_IC as explained in the post.

    ReplyDelete