This project is a static heap analysis technique based on the combination of a shape analysis style abstract heap model and points-to analysis style transfer functions. The overall design of this hybrid analysis and some preliminary experimental numbers can be in this paper. The interprocedural analysis is based loosely on the approach an approach described here. The use/mod and identity definitions are an updated implementation of the approach described here. These papers provide a general guide of how/what is in the source code, however the source code is likely to differ in non-trivial ways as there has been a fair bit of change as we have been merging our private codebases to produce the version that is available here.

The project also contains a simple intermediate representation (IR) and (a currently very hacked up) translator from .Net bytecode to the IR. This IR is loosely based on the LLVM IR.

We have licensed the code quite permissively (and are willing to discuss alternative licenses) to enable it to be used by both research, open source, and closed source applications. However, we ask that if you use this code you provide the appropriate ackowledgements and/or citations. We are happy to collaborate in making use of it or in providing advice on what can (and cannot) be easily done with it. Finally, we note this is an active research project, so there are some non-trivial limitations on what it can do and (at least for the near future) is expected to be subject to regular changes.

Sample Results
The analysis computes, for every reachable method in the program, a set of pre/post sbtract heap states. Below we show a sample pre/post state for the IndexDoc method in the LUIndex program.

The pre state shows the heap state at the entry of the method. We will refer to the papers for a more detailed explanation of all the information in the figures but we note that the analysis has discovered a simple points-to graph (or storage shape graph) structure that is a natural representation of the actual data structure organization in the program. Further it has discovered that most of the pointer sets (represented by the edges) do not contain aliasing pointers (are grey). The notable exceptions are the SegmentInfo dir fields and the shared strings that appear in one of the dictionaries (edges are shaded orange).


The post state for the method (shown below) is more complex as a number of new data structures have been created (nodes with the HI@ or HIE@ tags). Further the analysis has shaded nodes that may be used (but not modified) green and nodes that may be both used and have pointer fields modified red (or purple if only scalar fields are modified, for example the SegmentInfos object has a counter field use/modified but the pointer to the segs list is only read from). Again the analysis is able to infer that overall the amount of sharing is minimal and that the majority of the heap is structured via ownership relations. In particular the RamDirectory structure, where the RamDirectory node has an injective edge to the node representing the set of dictionary objects that the files fields refer to (e.g., each RamDirectory owns the corresponding dictionary). And similarly each dictionary owns the set of RamFile objects stored in it, each of which own their respective byte[] lists (and contents).


Last edited Jul 22, 2011 at 4:25 AM by DefaultSteve, version 6


No comments yet.