Cannot run QC task anymore

Dear Experts,

I have installed O2 and the QualityControl software from scratch several times. Every package compiles just fine, but I end up with a segmentation violation whenever I run QC tasks.
I ran the following command:

o2-tpc-file-reader --tpc-track-reader '--infile tpctracks.root' --input-type tracks | o2-qc --config json://…/Modules/TPC/run/tpcQCTracks_sampled.json

I uploaded the stack trace to my CERNBox:
https://cernbox.cern.ch/s/iEwEaEPNrAd1NBl

In the meantime I figured out that running without the graphical interface works and publishes the MOs correctly (as far as I can tell) to the QCG:
o2-tpc-file-reader -b --tpc-track-reader '--infile tpctracks.root' --input-type tracks | o2-qc -b --config json://home/max/alice/QualityControl/Modules/TPC/run/tpcQCTracks_sampled.json

Thanks for the help & best,
Max

Hi,

Are you running it remotely, e.g. via ssh?

The debug GUI does not work in that case.
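
A minimal sketch of how one could pick the batch flag automatically, assuming that an SSH session or a missing display means the debug GUI should be skipped. The helper name batch_flag is hypothetical and not part of O2; the environment variables it checks are the standard ones set by ssh, X11 and Wayland.

```shell
# Hypothetical helper (not part of O2): print "-b" when a GUI is unlikely to work.
# Assumption: an SSH session, or no X11/Wayland display, means "run in batch mode".
batch_flag() {
  if [ -n "${SSH_CONNECTION:-}" ]; then
    # Remote session: the debug GUI is reported not to work here.
    echo "-b"
  elif [ -z "${DISPLAY:-}" ] && [ -z "${WAYLAND_DISPLAY:-}" ]; then
    # No display server is advertised in the environment.
    echo "-b"
  fi
}

# Possible use with the commands from this thread (the unquoted expansion is
# intentional, so that an empty result adds no argument):
#   o2-qc $(batch_flag) --config <your config>
```

In a piped workflow every executable would need the flag, as in the -b command quoted earlier in the thread.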

Cheers,
Barth

Dear Barth,

Thanks for having a look. No, I run everything on my local machine. For now I can develop the new code without the GUI. However, I would like to avoid running into trouble later down the line, since clearly something is not working as intended.

Best,
Max

Hi,

Does it also happen with o2-qc-run-basic?

Have you followed these instructions, i.e. installed glfw-devel?

Cheers,
Barth

Hi Barth,

Yes, I followed the instructions and installed libglfw3-dev (I hope this is what you mean) on my Ubuntu 22.04 system. It is also part of the prerequisites for AliPhysics, which runs just fine for me, but to be sure I also ran sudo apt install libglfw3 libglfw3-dev and recompiled everything afterwards. I checked the o2-qc-run-basic command, and it also throws a segfault.

Best,
Max

Hi,

Ok, then I don’t know. Ubuntu is supported on a best-effort basis for O2.
Perhaps @eulisse has an idea?

Cheers,

Your graphics driver is trying to pick up the O2 LLVM, which we use for other things, and the two are incompatible. I am afraid the best option is to switch off the GUI using -b.

Could you also tell me what:

nm /home/max/alice/sw/ubuntu2204_x86-64/arrow/v11.0.0-alice1-12/lib/libgandiva.so.1100 | grep llvm | grep PMTopLevelManager | grep schedulePass

tells you?

Dear Giulio,

Thanks for your comment. I am a bit puzzled why it works for all my colleagues who use Ubuntu systems, but as long as this is not some underlying issue and only impacts the GUI, it should be fine.
I executed the command you suggested and the output is the following:
nm /home/max/alice/sw/ubuntu2204_x86-64/arrow/v11.0.0-alice1-12/lib/libgandiva.so.1100 | grep llvm | grep PMTopLevelManager | grep schedulePass
0000000001faf340 t _ZN4llvm17PMTopLevelManager12schedulePassEPNS_4PassE
0000000001faf340 t _ZN4llvm17PMTopLevelManager12schedulePassEPNS_4PassE.localalias

Best,
Max

It depends on the graphics card driver, so if your colleagues have a different card or driver version, it may behave differently. It only affects the GUI. I will do another pass to see if I can hide the offending symbols better (although nm already shows them as private, so I am puzzled).
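
For reading the nm output, a minimal sketch: in nm's listing a lowercase t marks a local (non-exported) text symbol, while an uppercase T or W marks a global or weak one that other libraries could bind to. The helper name exported_llvm_symbols is hypothetical; it only filters text that nm has already printed.

```shell
# Hypothetical helper: read `nm` output on stdin and print the names of
# globally visible (T) or weak (W) symbols whose mangled name mentions llvm.
# Local symbols (lowercase type letter, e.g. t) are skipped.
exported_llvm_symbols() {
  awk 'NF == 3 && $2 ~ /^[TW]$/ && $3 ~ /llvm/ { print $3 }'
}

# Possible use:
#   nm <path to libgandiva> | exported_llvm_symbols
```

For the two lines posted above, both of type t, this prints nothing, which matches the observation that the symbols already look private.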

I am having a somewhat opposite problem, also on Ubuntu. My GLFW or GLX seems to pick up the system LLVM, but then LLVM complains that the use-dbg-addr option is registered twice:

[QualityControl/latest] ~/alice $> o2-testworkflows-diamond-workflow
: CommandLine Error: Option 'use-dbg-addr' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
Aborted (core dumped)

The LD_DEBUG and gdb outputs show that two different copies of libLLVM are indeed loaded, the second one being pulled in through GLFW.

[QualityControl/latest] ~/alice $> LD_DEBUG=libs o2-testworkflows-diamond-workflow --run 2> ld_debug.log
Aborted (core dumped)
[QualityControl/latest] ~/alice $> grep 'calling init' ld_debug.log | grep LLVM
   1072998:	calling init: /home/pkonopka/alice/sw/ubuntu2204_x86-64/Clang/v15.0.7-6/lib/libLLVM-15.so
   1072998:	calling init: /lib/x86_64-linux-gnu/libLLVM-15.so.1
Program received signal SIGABRT, Aborted.
0x00007ffff2c969fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff2c969fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff2c42476 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff2c287f3 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fffe56393db in llvm::report_fatal_error(llvm::Twine const&, bool) () from /lib/x86_64-linux-gnu/libLLVM-15.so.1
#4  0x00007fffe5639226 in llvm::report_fatal_error(char const*, bool) () from /lib/x86_64-linux-gnu/libLLVM-15.so.1
#5  0x00007fffe5620e6e in ?? () from /lib/x86_64-linux-gnu/libLLVM-15.so.1
#6  0x00007fffe56127bb in llvm::cl::Option::addArgument() () from /lib/x86_64-linux-gnu/libLLVM-15.so.1
#7  0x00007fffe5549844 in ?? () from /lib/x86_64-linux-gnu/libLLVM-15.so.1
#8  0x00007ffff7fc947e in ?? () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff7fc9568 in ?? () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff2d74ce5 in _dl_catch_exception () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x00007ffff7fd0ff6 in ?? () from /lib64/ld-linux-x86-64.so.2
#12 0x00007ffff2d74c88 in _dl_catch_exception () from /lib/x86_64-linux-gnu/libc.so.6
#13 0x00007ffff7fd134e in ?? () from /lib64/ld-linux-x86-64.so.2
#14 0x00007ffff2c9063c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x00007ffff2d74c88 in _dl_catch_exception () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x00007ffff2d74d53 in _dl_catch_error () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x00007ffff2c9012e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x00007ffff2c906c8 in dlopen () from /lib/x86_64-linux-gnu/libc.so.6
#19 0x00007fffed89e3da in glLabelObjectEXT () from /lib/x86_64-linux-gnu/libGLX_mesa.so.0
#20 0x00007fffed89e4cd in glLabelObjectEXT () from /lib/x86_64-linux-gnu/libGLX_mesa.so.0
#21 0x00007fffed87c495 in ?? () from /lib/x86_64-linux-gnu/libGLX_mesa.so.0
#22 0x00007fffed892705 in ?? () from /lib/x86_64-linux-gnu/libGLX_mesa.so.0
#23 0x00007fffed884331 in ?? () from /lib/x86_64-linux-gnu/libGLX_mesa.so.0
#24 0x00007fffed880dfc in ?? () from /lib/x86_64-linux-gnu/libGLX_mesa.so.0
#25 0x00007ffff2a96fc3 in extensionSupportedGLX (extension=0x7ffff2aba0ff "GLX_EXT_swap_control")
    at /jenkins/workspace/DailyBuilds/DailyO2-ubuntu2204/daily-tags.I3UajISd5Y/SOURCES/GLFW/3.3.2/3.3.2/src/glx_context.c:210
#26 _glfwInitGLX () at /jenkins/workspace/DailyBuilds/DailyO2-ubuntu2204/daily-tags.I3UajISd5Y/SOURCES/GLFW/3.3.2/3.3.2/src/glx_context.c:359
#27 0x00007ffff2a92e4d in _glfwPlatformCreateWindow (window=window@entry=0x79f0a0, wndconfig=wndconfig@entry=0x7ffffffe2360, 
    ctxconfig=ctxconfig@entry=0x7ffffffe22d0, fbconfig=fbconfig@entry=0x7ffffffe2310)
    at /jenkins/workspace/DailyBuilds/DailyO2-ubuntu2204/daily-tags.I3UajISd5Y/SOURCES/GLFW/3.3.2/3.3.2/src/x11_window.c:1981
#28 0x00007ffff2a8c0ef in glfwCreateWindow (width=1280, height=720, title=<optimized out>, monitor=0x0, share=<optimized out>)
    at /jenkins/workspace/DailyBuilds/DailyO2-ubuntu2204/daily-tags.I3UajISd5Y/SOURCES/GLFW/3.3.2/3.3.2/src/window.c:216
#29 0x00007fffedc7cfa3 in o2::framework::initGUI (name=0x7ffff7b2eb60 "O2 Framework debug GUI", error_callback=<optimized out>)
    at /home/pkonopka/alice/sw/SOURCES/DebugGUI/v0.8.0/0/DebugGUI/src/DebugGUI.cxx:42
#30 0x00007fffedf6b535 in ImGUIDebugGUI::initGUI (this=0x77e6b0, windowTitle=0x7ffff7b2eb60 "O2 Framework debug GUI", registry_=...)
    at /home/pkonopka/alice/sw/SOURCES/O2/dev/0/Framework/GUISupport/src/Plugin.cxx:76
#31 0x00007ffff7997876 in runStateMachine (workflow=..., workflowInfo=..., previousDataProcessorInfos=..., commandInfo=..., driverControl=..., driverInfo=..., 
    driverConfig=..., metricsInfos=..., varmap=..., driverServices=..., frameworkId=...)
    at /home/pkonopka/alice/sw/SOURCES/O2/dev/0/Framework/Core/src/runDataProcessing.cxx:1338
#32 0x00007ffff79aac62 in doMain (argc=1, argv=0x7fffffff70a8, workflow=..., channelPolicies=..., completionPolicies=..., dispatchPolicies=..., 
    resourcePolicies=..., callbacksPolicies=..., sendingPolicies=..., currentWorkflowOptions=..., configContext=...)
    at /home/pkonopka/alice/sw/SOURCES/O2/dev/0/Framework/Core/src/runDataProcessing.cxx:3029
#33 0x0000000000495324 in mainNoCatch (argc=1, argv=0x7fffffff70a8)
    at /home/pkonopka/alice/sw/SOURCES/O2/dev/0/Framework/Core/include/Framework/runDataProcessing.h:218
#34 0x00000000004956df in main (argc=1, argv=0x7fffffff70a8) at /home/pkonopka/alice/sw/SOURCES/O2/dev/0/Framework/Core/include/Framework/runDataProcessing.h:243
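
A minimal sketch for repeating the double-load check on another machine, assuming a log produced with LD_DEBUG=libs as above. The helper name loaded_llvm_libs is hypothetical; it only post-processes the "calling init" lines that the dynamic loader writes.

```shell
# Hypothetical helper: read an LD_DEBUG=libs log on stdin and list the
# distinct libLLVM files that the dynamic loader initialised.
loaded_llvm_libs() {
  grep 'calling init' | grep -o '/[^[:space:]]*libLLVM[^[:space:]]*' | sort -u
}

# Possible use with the log file from this post:
#   loaded_llvm_libs < ld_debug.log
```

More than one line of output means that several copies of libLLVM ended up in the same process, which is the situation shown in the log above.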

Has anyone encountered a similar issue and managed to solve it?