Very slow CMake O2 installation on macOS

I’d say today, after a quick announcement during WP3. We’ll see if something is holding us back from migrating, but I don’t think so.

Coming back to a question I had a couple of posts above.

Take a build (sub) directory after an aliBuild --defaults o2 build O2, with no module loaded, and execute only the install part :

> module list
No Modulefiles Currently Loaded.
> cd ~/alice/sw/BUILD/1ada7dc013827f1d5181e49c0fad2a1786e78826/O2/Detectors/MUON/MCH
> time cmake -P ./cmake_install.cmake > /dev/null     
cmake -P ./cmake_install.cmake > /dev/null  0.61s user 1.48s system 93% cpu 2.243 total

Now do the same thing but with modules loaded :

> module list
Currently Loaded Modulefiles:
 1) alice             5) GEANT3/v2-5-5       9) protobuf/v3.0.2-1     13) lhapdf/v6.2.1-alice1-1           17) FairRoot/run3-3  21) HepMC3/v3.0.0+git_d43693ce0e-1  25) arrow/v0.9.0-alice1-1
 2) BASE/1.0          6) GEANT4/v10.3.3-3   10) pythia6/428-alice1-2  14) pythia/v8230-4                   18) Vc/1.3.3-1       22) Monitoring/v1.7.1-1             26) O2/run3-1
 3) GSL/v1.16-1       7) vgm/v4-4-7         11) boost/v1.66.0-2       15) nanomsg/v1.0.0+git_c52f1bedca-1  19) GSL/v1.16-3      23) Configuration/v1.4.1-1
 4) ROOT/v6-12-06-1   8) GEANT4_VMC/v3-5-7  12) yaml-cpp/v0.5.2-3     16) DDS/run3-1                       20) ROOT/v6-12-06-2  24) ms_gsl/1-2
❯ time cmake -P ./cmake_install.cmake > /dev/null
cmake -P ./cmake_install.cmake > /dev/null  4.37s user 13.73s system 98% cpu 18.322 total

i.e. the install (of only xx libs in this case; actually a non-install, as they are up to date) is about 8 times slower!

Why?

CMake invokes shell scripts to do parts of the build.

Can you check if time bash -c echo foo is slow when running under alienv enter?

Indeed, even a simple bash invocation is already slower:

❯ module list
No Modulefiles Currently Loaded.
~/alice/sw/BUILD/1ada7dc013827f1d5181e49c0fad2a1786e78826/O2/Detectors/MUON/MCH
❯ time bash -c echo foo
bash -c echo foo  0.02s user 0.04s system 92% cpu 0.065 total

❯ module load alice O2
~/alice/sw/BUILD/1ada7dc013827f1d5181e49c0fad2a1786e78826/O2/Detectors/MUON/MCH
❯ time bash -c echo foo
bash -c echo foo  0.08s user 0.26s system 96% cpu 0.356 total

I suspect you have something in your bashrc / .bash_profile / .profile which is doing a lot of path lookups. Can you check? Also check with time bash --norc --noprofile -c echo foo if it’s better.

Actually tried already (and my regular shell is zsh btw) :wink: but it’s not helping

❯ module list
No Modulefiles Currently Loaded.
~/alice/sw/BUILD/1ada7dc013827f1d5181e49c0fad2a1786e78826/O2/Detectors/MUON/MCH
❯ time bash --norc --noprofile -c echo foo
bash --norc --noprofile -c echo foo  0.03s user 0.04s system 89% cpu 0.081 total
~/alice/sw/BUILD/1ada7dc013827f1d5181e49c0fad2a1786e78826/O2/Detectors/MUON/MCH
❯ module load alice O2
~/alice/sw/BUILD/1ada7dc013827f1d5181e49c0fad2a1786e78826/O2/Detectors/MUON/MCH
❯ time bash --norc --noprofile -c echo foo
bash --norc --noprofile -c echo foo  0.08s user 0.27s system 93% cpu 0.374 total
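A single `time` measurement of such a short command is noisy, so it may help to spawn the shell several times and time the whole loop. The following is a minimal sketch, assuming bash is on the PATH; the function name `spawn_shells` is made up for illustration.

```shell
# spawn_shells N: start N throwaway bash processes, skipping all startup files.
# (Hypothetical helper, not part of alienv or aliBuild.)
spawn_shells() {
  n=$1
  i=0
  while [ "$i" -lt "$n" ]; do
    # Quote the inline script so that "foo" is really an argument to echo.
    bash --norc --noprofile -c 'echo foo' > /dev/null || return 1
    i=$((i + 1))
  done
}

# Usage, in bash: compare the totals with and without the modules loaded.
#   time spawn_shells 20
```

Note the quotes around `echo foo`: without them, `foo` becomes `$0` of the inline script rather than an argument to `echo`. The process startup cost being measured is the same either way.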

To stay on topic:

On @laphecet’s problem (probably deserving a new topic): it might also be that we force some variables not to be cached by CMake (in CMakeCache.txt), and the buckets machinery has to go through the whole $LD_LIBRARY_PATH as well.

One more observation (before moving to a new topic or a JIRA issue :wink: ). If I unset DYLD_LIBRARY_PATH, the problem seems to be gone …

~/alice/o2-dev/O2  mch-simu*
❯ module purge && time bash -c echo toto
bash -c echo toto  0.02s user 0.03s system 92% cpu 0.054 total
❯ module load alice O2 && time bash -c echo toto
bash -c echo toto  0.07s user 0.20s system 98% cpu 0.279 total
❯ module load alice O2 && unset DYLD_LIBRARY_PATH && time bash -c echo toto
bash -c echo toto  0.02s user 0.02s system 88% cpu 0.043 total
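To put a number on how many directories are involved, one can count the entries of the search-path variables before and after loading the modules. This is only a sketch; `count_path_entries` is a hypothetical helper name, not an existing tool.

```shell
# count_path_entries LIST: print the number of non-empty, colon-separated
# entries in LIST. (Hypothetical helper for illustration.)
count_path_entries() {
  # grep -c exits non-zero when it counts 0 lines, hence the "|| true".
  printf '%s\n' "$1" | tr ':' '\n' | grep -c . || true
}

# Report both variables; ${VAR:-} avoids an error if the variable is unset.
echo "DYLD_LIBRARY_PATH entries: $(count_path_entries "${DYLD_LIBRARY_PATH:-}")"
echo "LD_LIBRARY_PATH entries:   $(count_path_entries "${LD_LIBRARY_PATH:-}")"
```

Running it once after `module purge` and once after `module load alice O2` should show how much the list grows.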

For the record, I notice the very same behaviour :confused:

I think we just have too many entries in the LD_LIBRARY_PATH and the library lookup goes wild on Mac. One solution would be to have one place where we create a symlink forest with all the libraries in the dependency chain…
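A symlink forest of this kind could be sketched roughly as follows. This is an illustration under assumptions, not the actual alidist/aliBuild implementation: the function name `make_symlink_forest` is invented, the path entries are assumed to be absolute, and only files named `*.dylib` or `*.so` directly inside each directory are considered. It is written for bash.

```shell
# make_symlink_forest FOREST_DIR PATH_LIST
# Create FOREST_DIR and fill it with symlinks to every *.dylib / *.so found
# in the colon-separated PATH_LIST. (Hypothetical helper for illustration.)
make_symlink_forest() {
  forest=$1
  path_list=$2
  mkdir -p "$forest" || return 1
  printf '%s\n' "$path_list" | tr ':' '\n' | while IFS= read -r dir; do
    [ -d "$dir" ] || continue            # skip empty or missing entries
    for lib in "$dir"/*.dylib "$dir"/*.so; do
      [ -e "$lib" ] || continue          # skip glob patterns that matched nothing
      target="$forest/$(basename "$lib")"
      # First occurrence wins, mirroring the order of the search path.
      [ -e "$target" ] || [ -L "$target" ] || ln -s "$lib" "$target"
    done
  done
}

# Usage (the forest location is an arbitrary example):
#   make_symlink_forest "$HOME/alice/sw/forest" "$DYLD_LIBRARY_PATH"
#   export DYLD_LIBRARY_PATH="$HOME/alice/sw/forest"
```

The search path then contains a single directory instead of one per package. The "first occurrence wins" rule is a design choice here: it keeps the same precedence as the original path order when two packages ship a library with the same name.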

The problem is not LD_LIBRARY_PATH but DYLD_LIBRARY_PATH.
Which raises the question: why do we need DYLD_LIBRARY_PATH at all?

That’s what macOS uses as its first lookup; LD_LIBRARY_PATH is a fallback to it.

In principle we should:

  • not use DYLD_LIBRARY_PATH
  • not disable SIP
  • use rpath properly

I guess we still need it for some weird reasons (like gSystem->Load() in ROOT).

I disagree on DYLD_LIBRARY_PATH / rpath. I think we should have only one entry point for it, but using rpath might result in even more subtle issues. We should however reduce the number of paths we have in DYLD_LIBRARY_PATH.

Well, the point is that asking Mac users to disable SIP is not ideal (but we can live with that). As for reducing the number of paths, I fully agree. The issue with gSystem->Load() still stands, so we might not even have the option of getting rid of those variables.

The alidist side of things is now merged for o2-dev-fairroot. On the AliceO2 side, a PR is approved and pending. So, in principle, all the code pieces are there. @dberzano has the plan on how to apply it :slight_smile:

Edit: hmm, that’s strange. I clicked the reply button on Giulio’s post from 6 days ago, but now my post is sorted at the very end, which puts it out of context…

You should select the portion of code of relevance as I just did and hit the Quote button that pops up :slight_smile:

Ahh, thx :slight_smile:

What about a read-only union fs? Anyway, symlink forest or union fs view, this will definitely invalidate any burned-in RPATH ^^

BTW, @eulisse, you seem to strongly oppose the use of rpath. Could you elaborate on why that is? At a glance, I can only see advantages to it for us.