One month ago, my diploma thesis has been accepted and I got my Bachelor of Science in Computer Science.
I made my diploma thesis at Lawrence Berkeley National Laboratory, Berkeley, California. I was in the team responsible of the developmenet of the ATLAS Software for the LHC in Cern. The title of my thesis is Inlining Assistance for large-scale object-oriented applications
The goal of this project was to create a C++ analyzer to find the best functions and call sites to inline. The input of the analyzer is a call graph generated by CallGrind of the Valgrind project.
The functions and call sites to inline are computed using a heuristic, called the temperature. This heuristic is based on the cost of calling the given function, the frequency of calls and the size of the function. The cost of calling a function is based on the number of parameters, the virtuality of the function and the shared object the function is located in.
The analyzer is also able to find clusters of call sites. A cluster is a set of hot call sites related to each other. It can also finds the functions that should be moved from one library to the other or the function that should not be virtual by testing the use of each function in a class hierarchy.
To achieve this project, it has been necessary to study in details how a function is called on the Linux platform. The inlining optimization has also been studied to know what were the advantages and the problems of this technique.
To retrieve the information about the sizes and the virtuality of the function, it has been necessary to read the shared libraries and executables files. For that, we used libelf. The virtuality of a function is calculated by reading each virtual table and searching for the function in the virtual tables content.
The graph manipulation is made by the Boost Graph Library. As it was an advanced library, it has helped me improving my skills in specific topics like templates, traits or Template Metaprogramming.
The analyzer is able to run on the Linux platform on any program that has been compiled using gcc.
We have tested the analyzer on the CLang compiler and have been able to obtain these results :
Cluster 1 and 2 indicates that we have inlined the first, respectively the second, cluster. Callsites_5 indicates that the first five call sites have been inlined and Functions_3 indicates that we inlined the first three functions. As we can see, the results are not really impressive, but we have only done inlining. Moreover, in some large applications, even little savings are really interesting. For example, in the ATLAS software, 1% of saving represents 100K$/year.
Here is the base outline of my work:
- Analysis : Function Call, Inlining, Compiler Inlining
- Information Extraction : Graph manipulation, parsing object files
- Information analysis : Heuristc, library issues, clustering, virtual hierarchy issues
- Tests : Tests on CLang, Tests on ATLAS
- Performance analysis : Performance of the analyzer
- Tools : Application used during the project