LLVM Error in the TPC reconstruction workflow

Hello,
Is there any update on this issue? I am getting the same error message.

Hi,

I get the same error that @cimordas has when I run a qc workflow using o2-qc while being connected to the o2-running machine via ssh. Adding the option -b to prevent the debug GUI from opening solves the problem.
However, not every workflow that opens a debug GUI is affected in my case. The o2-tpc-reco-workflow, for example, also runs without the -b option, but it does not open the debug GUI even though it usually does.

Another interesting thing is that opening the debug GUI works just fine when I am physically sitting at the o2 machine. It fails only when I am connected to it via ssh -Y user@machine, which is unfortunate when working from home.
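As a stopgap, the -b workaround can be applied automatically for remote sessions. The following is only a minimal Python sketch under stated assumptions: the helper name with_batch_flag is hypothetical, and it assumes that sshd sets SSH_CONNECTION for remote logins and that an empty DISPLAY means no GUI can be opened. It does not fix the underlying LLVM clash.

```python
import os


def with_batch_flag(argv, env=None):
    """Return a copy of argv with -b appended when no usable local display is expected."""
    env = os.environ if env is None else env
    remote = "SSH_CONNECTION" in env      # set by sshd for remote logins
    no_display = not env.get("DISPLAY")   # no X display configured at all
    if (remote or no_display) and "-b" not in argv:
        return list(argv) + ["-b"]
    return list(argv)
```

The returned list could then be passed to subprocess.run to start the workflow.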

I uploaded the output of

o2-tpc-reco-workflow --infile tpcdigits.root --output-type tracks | LD_DEBUG=files LD_DEBUG_OUTPUT=ld_QC_debug_files.txt o2-qc --config json://home/tklemenz/AliSoftware/QualityControl/Modules/TPC/run/tpcQCPID_direct.json

here.

Thanks for having a look.

Cheers,
Thomas

Hi Cindy,

I've just tried; for me the 3 commands you used work fine (on Ubuntu).

Cheers,
 Ruben

Could you please provide what:

LD_DEBUG=files

tells you?

Can you please run with:

LD_DEBUG=files o2-tpc-workflow

and tell me what you get?

Hi @eulisse,
I stumbled upon this problem again, this time when running our TPC raw data monitor. I followed the suggestion to use LD_DEBUG=files and examined the output in more detail. What I found is (condensed):

file=libAfterImage.so.0 [0];  needed by /home/wiechula/software/alicesw/sw/slc7_x86-64/ROOT/v6-20-02-alice3-1/lib/libASImage.so.6.20.02
...
file=/lib64/libGLX.so.0 [0];  needed by /lib64/libAfterImage.so.0
...
file=libGLX_system.so.0 [0];  dynamically loaded by /lib64/libGLX.so.0
...
file=/usr/lib64/dri/tls/swrast_dri.so [0];  dynamically loaded by /lib64/libGLX_system.so.0
...
file=libLLVM-6.0-rhel.so [0];  needed by /usr/lib64/dri/swrast_dri.so

So it seems libLLVM from the system is pulled in via libAfterImage from the system, which is required by ROOT.
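The chain above can also be extracted mechanically from a large LD_DEBUG=files log. The following is a minimal Python sketch, assuming the "file=X [0];  needed by Y" and "dynamically loaded by Y" line format shown above. The names parse_edges and chain_to are hypothetical, and libraries are matched by file name only, which is a simplification.

```python
import os
import re

# Matches lines such as
#   file=libGLX.so.0 [0];  needed by /usr/lib/libGL.so.1
# Real LD_DEBUG output prefixes each line with a PID, so we search, not match.
EDGE = re.compile(r"file=(\S+) \[\d+\];\s+(?:needed|dynamically loaded) by (\S+)")


def parse_edges(log_text):
    """Map each library file name to the file name of the first object that requested it."""
    parent = {}
    for line in log_text.splitlines():
        m = EDGE.search(line)
        if m:
            child = os.path.basename(m.group(1))
            requester = os.path.basename(m.group(2))
            parent.setdefault(child, requester)
    return parent


def chain_to(log_text, prefix="libLLVM"):
    """Return the request chain ending at the first library whose name starts with prefix."""
    parent = parse_edges(log_text)
    target = next((lib for lib in parent if lib.startswith(prefix)), None)
    if target is None:
        return []
    chain, seen = [target], {target}
    # Walk back from the target to whoever requested it, guarding against cycles.
    while chain[-1] in parent and parent[chain[-1]] not in seen:
        chain.append(parent[chain[-1]])
        seen.add(chain[-1])
    return list(reversed(chain))
```

Note that LD_DEBUG_OUTPUT writes one file per process with the PID appended to the given name, so each of those files would have to be read and passed to chain_to separately.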

I then added -Dbuiltin_afterimage=ON in root.sh and recompiled.
At least for me this solved the problem. I can again run my code without the error. Also I don’t see any reference to the system llvm in the LD_DEBUG=files output any longer.

Hello,

Sorry, my previous message got deleted while I tried to format it better.
I have the same issue with different workflows:

o2-calibration-data-generator-workflow
o2-calibration-lhc-clockphase-workflow
o2-calibration-ccdb-populator-workflow

the suggestion by Jens:

diff --git a/root.sh b/root.sh
index ad09b51..9a66915 100644
--- a/root.sh
+++ b/root.sh
@@ -141,6 +141,7 @@ cmake $SOURCEDIR
       -Dshadowpw=OFF                                                                   \
       -Dvdt=ON                                                                         \
       -Dbuiltin_vdt=ON                                                                 \
+      -Dbuiltin_afterimage=ON                                                          \
       ${ALIEN_RUNTIME_REVISION:+-Dmonalisa=ON}                                         \
       -Dkrb5=OFF                                                                       \
       -Dgviz=OFF

did not work for me, only using “-b” does, but then I don’t have the GUI.

Any suggestion?

Chiara

I would like to attach the output of

LD_DEBUG=files o2-calibration-data-generator-workflow --lanes 10 --mean-latency 100000 --max-timeframes 500

but it tells me that “new users cannot upload files”. Anything I should do?

Chiara

Can you zip the data and put it to CERNbox?

Here it is:

https://cernbox.cern.ch/index.php/s/90AdcZEuuLs2x9K

Is it normal that no attachment can be added?

Chiara

I think the trace is:

file=libglfw.so.3 [0];  needed by /home/zampolli/SOFT/alibuild/ali-o2/sw/ubuntu1804_x86-64/DebugGUI/v0.1.0-34bc77ae9c-2/lib/libO2DebugGUI.so
...
file=libGL.so.1 [0];  dynamically loaded by /usr/lib/x86_64-linux-gnu/libglfw.so.3
...
file=libGLX.so.0 [0];  needed by /usr/lib/x86_64-linux-gnu/libGL.so.1
...
file=libGLX_mesa.so.0 [0];  dynamically loaded by /usr/lib/x86_64-linux-gnu/libGLX.so.0
...
file=/usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so [0];  dynamically loaded by /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0
...
file=libLLVM-9.so.1 [0];  needed by /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so

So it seems that glfw from the system is the problem.
I am not sure if it works, but you could try to comment out the lines

prefer_system: "(?!osx)"
prefer_system_check: |
  printf "#if ! __has_include(<GLFW/glfw3.h>)\n#error \"GLFW not found, checking if we can build it.\"\n#endif\n" | cc -xc++ -std=c++17 - -c -o /dev/null || \
  { printf "#if __has_include(<GL/gl.h>) && __has_include(<X11/extensions/XInput2.h>) && __has_include(<X11/X.h>)\n#error \"OpenGL is found. We build GLFW ourselves.\"\n#endif\n" | cc -xc++ -std=c++17 - -c -o /dev/null && echo "OpenGL not found. This is a dummy package."; }

in glfw.sh in alidist and recompile.

@eulisse could say more I think.

Hi @wiechula,

I first tried adding -Dbuiltin_afterimage=ON to root.sh, but it did not fix the problem.
Then I commented out the lines you suggested in glfw.sh and rebuilt everything. Indeed, glfw and the debug GUI were built by alibuild, but I still get the following error when running
LD_DEBUG=files LD_DEBUG_OUTPUT=ld_debug_files.txt o2-tpc-reco-workflow --infile tpcdigits.root --output-type clusters,tracks

: CommandLine Error: Option 'help-list' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

The output of above command can be found here.

Hope that helps.
Cheers,
Thomas

Hi @tklemenz,
it seems the problem now is:

file=libGLU.so.1 [0];  needed by /home/tklemenz/AliSoftware/sw/ubuntu1804_x86-64/O2/v1.2.0-1/lib/libO2GPUTracking.so

Can you try to comment out line 25

set(GPUCA_EVENT_DISPLAY ON)

in the file

GPUTracking/CMakeLists.txt

and recompile O2?

I think the actual issue is that the radeon driver uses LLVM and interferes with the ROOT cling version. In principle we protected against this, but apparently there is still something which does not work correctly. For the moment I would just use -b.

Hi @wiechula,

I did as suggested but the same error still appears.

The output of

LD_DEBUG=files LD_DEBUG_OUTPUT=ld_debug_files.txt o2-tpc-reco-workflow --infile tpcdigits.root  --output-type clusters,tracks

can be found here.

Cheers,
Thomas

Hi @tklemenz,

now I see again

file=libglfw.so.3 [0];  needed by /home/tklemenz/AliSoftware/sw/ubuntu1804_x86-64/DebugGUI/v0.1.0-34bc77ae9c-4/lib/libO2DebugGUI.so

are you sure glfw built by alibuild is used?

I did all the steps described above:

-Dbuiltin_afterimage=ON

in root.sh, comment out

prefer_system: "(?!osx)"
prefer_system_check: |
  printf "#if ! __has_include(<GLFW/glfw3.h>)\n#error \"GLFW not found, checking if we can build it.\"\n#endif\n" | cc -xc++ -std=c++17 - -c -o /dev/null || \
  { printf "#if __has_include(<GL/gl.h>) && __has_include(<X11/extensions/XInput2.h>) && __has_include(<X11/X.h>)\n#error \"OpenGL is found. We build GLFW ourselves.\"\n#endif\n" | cc -xc++ -std=c++17 - -c -o /dev/null && echo "OpenGL not found. This is a dummy package."; }

in glfw.sh, and comment out

set(GPUCA_EVENT_DISPLAY ON)

in GPUTracking/CMakeLists.txt and recompiled.

Is there a way to check which glfw is used?
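One possible way to check this from the LD_DEBUG=files output already collected is to list every absolute path in the log whose file name starts with libglfw and to test whether it lies under the aliBuild software directory. The following is a minimal Python sketch. The helper names find_library_paths and is_from_alibuild are hypothetical, and sw_root stands for whatever directory your aliBuild installation uses.

```python
import os
import re

# An absolute path token preceded by whitespace, '=' or the start of the text.
# LD_DEBUG prints resolved paths this way, e.g. after "needed by",
# "dynamically loaded by" and "calling init:".
ABS_PATH = re.compile(r"(?<![^\s=])/[^\s;]+")


def find_library_paths(log_text, name):
    """Return the sorted set of absolute paths in the log whose file name starts with name."""
    found = set()
    for token in ABS_PATH.findall(log_text):
        if os.path.basename(token).startswith(name):
            found.add(token)
    return sorted(found)


def is_from_alibuild(path, sw_root):
    """True if path lies inside sw_root; both are assumed to be absolute paths."""
    sw_root = os.path.normpath(sw_root)
    return os.path.commonpath([os.path.normpath(path), sw_root]) == sw_root
```

If find_library_paths(log_text, "libglfw") returns only paths under /usr/lib, the system glfw is still the one being loaded.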

Dear all,

I was wondering whether anyone has found a solution to this problem by now. I never managed to run the DebugGUI with an AMD graphics card installed. I opened a Jira ticket in the past, but it was never resolved. I think the underlying problem is still that the radeon drivers and ROOT use conflicting LLVM versions.

I have tried with the newest 5.12 kernel (and thus radeon drivers), aliBuild 1.8.2 and the latest alidist recipes. I still cannot run, e.g., the o2-sim-digitizer-workflow without the -b flag. I would really like to use the DebugGUI, but I have only managed to run it on my old laptop with Intel graphics. My operating system is Ubuntu 20.04.

I have also tried to do the steps described here, but no luck. I have uploaded the LD_DEBUG files to cernbox.

Some help would be greatly appreciated.

Cheers,
Ole

I think ROOT solved the issue at some point (maybe already in 6.22). Can you try bumping the version in your alidist?

@eulisse, in https://github.com/alisw/root the latest tag is 6-20, so I tried version 6.24 from the official root-project/root repo instead.
I recompiled with that one, but then I cannot even run a simulation:

o2-sim -n 10 -g pythia8 --skipModules ZDC
[INFO] This is o2-sim version 1.2.0 (d0b54880b1)
[INFO] Built by ALIBUILD:1.8.2, ALIDIST-REV:f4e5ce19b9e181c7711573dd7184e2479dfda8e4 on OS:Linux-5.4.0-51-generic
[INFO] Running with 4 sim workers
[INFO] CREATING SIM SHARED MEM SEGMENT FOR 4 WORKERS
Spawning particle server on PID 98576; Redirect output to o2sim_serverlog
Spawning sim worker 0 on PID 98578; Redirect output to o2sim_workerlog0
Spawning hit merger on PID 98579; Redirect output to o2sim_mergerlog
[INFO] Merger process 98579 returned
[INFO] Simulation process took 0.341745 s
o2-sim: /home/oschmidt/alice/sw/SOURCES/ROOT/v6-24-00/v6-24-00/interpreter/cling/lib/Interpreter/CIFactory.cpp:617: {anonymous}::collectModuleMaps(clang::CompilerInstance&, llvm::SmallVectorImpl<std::__cxx11::basic_string<char> >&)::<lambda(llvm::StringRef, const string&, const string&, std::string&, bool, bool)>: Assertion `llvm::sys::fs::exists(originalLoc.str())' failed.
Aborted (core dumped)