Memory consumption of the Analysis Framework during the build phase

mfasel · July 16, 2020, 4:07pm

Dear experts,

the analysis framework currently consumes a huge amount of memory during the build, so much that the process gets killed on our Ubuntu machines with 16 GB RAM. Would it be possible to reorganize the Framework library so that the analysis framework is separate from the core library and can easily be removed from the build where it is not needed and resources are scarce? And is there any way to reduce the memory consumption of the Analysis Framework build?

Thanks in advance!

Best regards

Markus

mfasel · August 3, 2020, 9:19am

Any chance that the memory consumption during the build can be reduced? This does not only affect O2, which itself can be problematic if the system does not have a very large amount of memory (see the description above), but also packages depending on it. Web applications like the QC GUI nowadays run on platforms like OpenStack or Docker where resources are scarce, and since the QC GUI relies on O2 simply through the QualityControl dependency, we can no longer build new versions of it on our web platform because of the O2 build memory consumption. @bvonhall @eulisse.

pezzi · August 8, 2020, 6:39pm

Hi @mfasel, you can select the number of parallel compilation jobs by passing -j number_of_jobs to aliBuild; see aliBuild build --help. Try reducing the number of jobs to save some memory.
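As a rough illustration of how a job count could be chosen, here is a minimal Python sketch. It assumes the peak memory of a single compilation job is known or has been measured. The function name safe_job_count, the reserve_gb margin and the 3 GB per-job figure in the usage line are hypothetical and not part of aliBuild.

```python
import os


def safe_job_count(available_gb, peak_gb_per_job, cpu_count=None, reserve_gb=1.0):
    """Return the largest job count whose worst-case memory use fits in RAM.

    All parameters are illustrative assumptions:
    available_gb    -- RAM the build may use
    peak_gb_per_job -- measured or estimated peak memory of one compiler job
    reserve_gb      -- margin left for the OS and the build tool itself
    """
    if peak_gb_per_job <= 0:
        raise ValueError("peak_gb_per_job must be positive")
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Number of jobs that fit in memory if every job hits its peak at once.
    by_memory = int((available_gb - reserve_gb) // peak_gb_per_job)
    # Never go below one job and never above the number of cores.
    return max(1, min(cpu_count, by_memory))


# Hypothetical figures: 16 GB machine, 3 GB per job, 8 cores.
print(safe_job_count(available_gb=16, peak_gb_per_job=3, cpu_count=8))  # prints 5
```

The result would then be passed as the -j value. The estimate is only as good as the per-job peak, and when a single job needs more memory than the machine has, the function still returns 1 even though the build would fail.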

mfasel · August 9, 2020, 9:03pm

Hi @pezzi,

thanks for the comment! I am aware of the option. The problem is that there are a couple of classes in the analysis framework for which the compilation consumes 2-3 GB, at least on our Ubuntu 18.04 system, and when resources are scarce, e.g. in virtual machines, even compiling with -j1 won’t help.

Cheers

Markus

eulisse · August 10, 2020, 6:39am

PRs build fine in 12GB of RAM on CentOS 7 VMs. While this is not ideal and I have a few ideas on how to improve it, I would be interested in more details on what takes memory on Ubuntu (you say “a couple of classes”, can you elaborate?) and which compiler you are using (is this our 7.3 or Ubuntu’s?). I’ve seen in the past std::tuple and various boost bits being extremely heavyweight with GCC, so we are now trying to limit their usage, but more details about your case would help.

mfasel · August 10, 2020, 8:05am

Hi @eulisse,

Thanks for your reply!

Our machines run Ubuntu 18.04 with the gcc shipped by Ubuntu:

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright © 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The machines have 16 GB RAM. I cannot yet tell which classes are responsible because top only shows me the cc process; I will need to check which process belongs to which file. Currently we can build O2 on these machines only if we limit the number of build processes to 1-2. We also have an Ubuntu 18.04 VM on CERN OpenStack for our instance of the QCGUI which has 4 GB of memory, and there O2 (which is in the QCG dependency list; why is a separate question, as the QCG doesn’t make use of it) does not build at all, even if I reduce to 1 process. I did not run into such issues on macOS, where O2 builds with significantly less memory.
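One possible way to map the compiler processes to source files is sketched below in Python. It assumes a Linux /proc filesystem and that the GCC compiler proper runs as cc1plus (or cc1) with the source file on its command line. The function names are hypothetical helpers, not part of any existing tool.

```python
import os

SOURCE_SUFFIXES = (".cxx", ".cpp", ".cc", ".c")


def source_file_from_cmdline(argv):
    """Return the first argument that looks like a C/C++ source file, or None."""
    for arg in argv[1:]:
        if arg.endswith(SOURCE_SUFFIXES):
            return arg
    return None


def parse_rss_kb(status_text):
    """Extract the resident set size in kB from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0  # e.g. kernel threads have no VmRSS entry


def compiler_processes(proc_root="/proc", names=("cc1plus", "cc1")):
    """List (rss_kb, pid, source_file) for running compilers, largest first."""
    if not os.path.isdir(proc_root):
        return []
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as f:
                argv = [a.decode(errors="replace") for a in f.read().split(b"\0") if a]
            with open(os.path.join(base, "status")) as f:
                rss_kb = parse_rss_kb(f.read())
        except OSError:
            continue  # process exited or is not readable
        if argv and os.path.basename(argv[0]) in names:
            found.append((rss_kb, int(entry), source_file_from_cmdline(argv)))
    return sorted(found, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    for rss_kb, pid, source in compiler_processes():
        print(f"{rss_kb / 1024:8.0f} MB  pid {pid}  {source}")
```

Run repeatedly while the build is in progress, this gives a snapshot of current (not peak) resident memory per translation unit, which should be enough to spot the heaviest files.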

Cheers

Markus

mfasel · August 10, 2020, 5:32pm

A class which seems to consume a lot of memory during the build is AODReaderHelpers.

DEBUG:qcg:O2:qcg: [  7%] Building CXX object Framework/Core/CMakeFiles/O2lib-Framework.dir/src/AODReaderHelpers.cxx.o
DEBUG:qcg:O2:qcg: virtual memory exhausted: Cannot allocate memory
DEBUG:qcg:O2:qcg: Framework/Core/CMakeFiles/O2lib-Framework.dir/build.make:91: recipe for target 'Framework/Core/CMakeFiles/O2lib-Framework.dir/src/AODReaderHelpers.cxx.o' failed

In general it would also be good to think about factorizing the product. Why, for example, do I have to worry about the analysis framework on the FLP?

Cheers

Markus

eulisse · August 10, 2020, 5:53pm

Because that’s how CMake works when you have multiple projects. You cannot build a “partial project” and use it from another one, as far as I know. Either we need to consolidate into one project or we need to change the configuration tool. Alternatively, if you have CVMFS in your container you could try the new alibuild mode where you source the QCGUI init.sh and then build QCGUI directly with cmake / make, reusing externals from the CVMFS installation.

wiechula · August 13, 2020, 12:29pm

Dear @eulisse,

is this new build mode documented somewhere? Can it also be used for o2? I think you mentioned something like this in the TPC GPU meeting.

Cheers,
Jens

eulisse · August 13, 2020, 1:05pm

I added the PR url in the chat and sent a mail to WP3. The docs are at:

I am waiting for @mpuccio to green light it before we merge.

mfasel · August 18, 2020, 8:19am

One more thing: it seems we are now running the analysis tutorial DPL workflows during the build phase, and this takes a huge amount of wall time and CPU time. Actually, I suspect that they don’t terminate by themselves, so the build takes forever. Any chance to drop this again?

pkonopka · October 21, 2020, 7:10am

Hi, I am bumping this old thread because I can’t compile O2 with -j5 on my CC7 machine with 8 GB RAM anymore. It fails with an internal compiler error when building the analysis code, and I observed very high memory consumption at that point:

[ 94%] Building CXX object Analysis/Tasks/PWGLF/CMakeFiles/O2exe-analysis-cascadeproducer.dir/cascadeproducer.cxx.o
[ 94%] Linking CXX executable ../../../stage/bin/o2-analysis-hf-task-dplus
[ 94%] Built target O2exe-analysis-hf-task-dplus
Scanning dependencies of target O2exe-analysis-cascadeconsumer
[ 94%] Building CXX object Analysis/Tasks/PWGLF/CMakeFiles/O2exe-analysis-cascadeconsumer.dir/cascadeconsumer.cxx.o
c++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
Analysis/Tasks/PWGLF/CMakeFiles/O2exe-analysis-cascadefinder.dir/build.make:81: recipe for target 'Analysis/Tasks/PWGLF/CMakeFiles/O2exe-analysis-cascadefinder.dir/cascadefinder.cxx.o' failed
gmake[2]: *** [Analysis/Tasks/PWGLF/CMakeFiles/O2exe-analysis-cascadefinder.dir/cascadefinder.cxx.o] Error 4

Even two jobs were too much, though I managed to compile it with -j 1.

Is anyone else experiencing similar problems recently?

mfasel · October 21, 2020, 7:48am

Hi, on the Ubuntu 18.04 node with 16 GB of memory I can only compile O2 with “-j 1”; already with 2 processes the compilation runs out of memory. Classes which seem to consume a lot of memory are the AOD reader helpers and several of the analysis tasks.

Cheers

Markus

rehlersi · October 21, 2020, 9:06am

I’m running into the same sort of issue. The build has managed to lock up our cluster build machine more than once. The only workarounds that I’ve found are to:

1. Run the entire build with -j 1 or -j 2.
2. Slightly more efficient: hack the O2 build recipe, and hope that I’ve reduced the core count enough (either 1 or 2).

Both of these workarounds make it difficult to iterate on code. Can anything be done?

Thanks,
Raymond

bvonhall · November 3, 2020, 7:34am

Dear all,

I have created a ticket to follow up this issue: https://alice.its.cern.ch/jira/browse/O2-1805

Cheers,
Barth