LLVM Error in the TPC reconstruction workflow

Dear all,

I am currently trying to run the simulations for the TPC. My alidist, O2 and QC are all up-to-date from yesterday evening and I am working on Ubuntu 18.04.
In a fresh terminal, I enter the QC environment using

alienv enter O2Suite/latest-o2

(I also tried loading it in a fresh terminal with

eval `alienv -w /home/cindy/aliceO2/sw/ load QualityControl/latest-o2`

with the same results)
Then in an empty folder, I run consecutively

o2-sim -m TPC [ITS PIPE] -n 100
o2-sim-digitizer-workflow -b
o2-tpc-reco-workflow --infile tpcdigits.root --output-type clusters,tracks

The first two commands work without problems, but the third one exits with the error

: CommandLine Error: Option 'help-list' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

Thank you for your help,
Best,
Cindy

There is a workaround: start workflows with the -b option. So for now we are fine.
It would still be interesting to know why this error occurs, though.

Cheers,
Thomas

I observed the same, and to me it seems that there is a problem with library loading. It looks like some LLVM library is loaded twice, once from the system and once from alibuild:

#0  0x00007f0622288420 in llvm::report_fatal_error(char const*, bool)@plt () from /home/wiechula/software/alicesw/sw/slc7_x86-64/Clang/v9.0.0-10/lib/../lib/libLLVMSupport.so.9
#1  0x00007f06222c9680 in (anonymous namespace)::CommandLineParser::addOption(llvm::cl::Option*, llvm::cl::SubCommand*) () from /home/wiechula/software/alicesw/sw/slc7_x86-64/Clang/v9.0.0-10/lib/../lib/libLLVMSupport.so.9
#2  0x00007f06222c97bd in llvm::cl::Option::addArgument() () from /home/wiechula/software/alicesw/sw/slc7_x86-64/Clang/v9.0.0-10/lib/../lib/libLLVMSupport.so.9
#3  0x00007f060b94fb61 in __static_initialization_and_destruction_0(int, int) [clone .constprop.362] () from /usr/lib64/libLLVM-6.0-rhel.so

And this part is in the debug GUI. That’s why the batch mode masks this problem.

Can you please run with

LD_DEBUG=1 the-failing-workflow

and provide me the logs?

LD_DEBUG=1  o2-tpc-reco-workflow --infile tpcdigits.root  --output-type clusters,tracks 
warning: debug option `1' unknown; try LD_DEBUG=help

LD_DEBUG=help  o2-tpc-reco-workflow --infile tpcdigits.root  --output-type clusters,tracks 
Valid options for the LD_DEBUG environment variable are:

  libs        display library search paths
  reloc       display relocation processing
  files       display progress for input file
  symbols     display symbol table processing
  bindings    display information about symbol binding
  versions    display version dependencies
  scopes      display scope information
  all         all previous options combined
  statistics  display relocation statistics
  unused      determined unused DSOs
  help        display this help message and exit

To direct the debugging output into a file instead of standard output
a filename can be specified using the LD_DEBUG_OUTPUT environment variable.

sorry, I meant LD_DEBUG=files

Ok, I ran with

LD_DEBUG=files LD_DEBUG_OUTPUT=ld_debug_files.txt o2-tpc-reco-workflow --infile tpcdigits.root  --output-type clusters,tracks

and put the output here:
https://cernbox.cern.ch/index.php/s/qoS2V8he6GfTwME
It created many files, I guess one for each process.
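
Since LD_DEBUG_OUTPUT appends the PID to the given file name, there is one log per process. A quick way to see which LLVM libraries were pulled in is to collect every libLLVM name or path mentioned across those logs. This is only a minimal sketch: the function name list_llvm_libs is made up for illustration, and the pattern assumes library paths contain no spaces.

```shell
# Hypothetical helper: print each distinct libLLVM* file name or path
# mentioned in the given LD_DEBUG=files logs.
list_llvm_libs() {
    grep -ho '[^ =]*libLLVM[^ ;]*' "$@" | sort -u
}

# Example (assumes the per-process logs are in the current directory):
# list_llvm_libs ld_debug_files.txt.*
```

If the result lists both a library under the alibuild sw directory and one under /usr/lib or /usr/lib64, the logs reference two different LLVM builds.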

Thanks for looking.

Hi,
I ran into the same problem with the latest checkout / alidist.
o2-sim-digitizer-workflow is affected as well.

The full LD_DEBUG output can be found at:
https://cernbox.cern.ch/index.php/s/29DFbmGRyXxZRgK

I get several symbol lookup errors there:

[O2/latest-dev-o2] ~/alice/sim/10 $> cat ld_debug_reco | grep -i error
3257: /home/johannes/alice/sw/ubuntu1804_x86-64/ROOT/v6-18-04-alice1-1/lib/libCore.so.6.18: error: symbol lookup error: undefined symbol: usedToIdentifyRootClingByDlSym (fatal)
3257: /home/johannes/alice/sw/ubuntu1804_x86-64/ROOT/v6-18-04-alice1-1/lib/libCore.so.6.18: error: symbol lookup error: undefined symbol: usedToIdentifyRootClingByDlSym (fatal)
3257: /home/johannes/alice/sw/ubuntu1804_x86-64/ROOT/v6-18-04-alice1-1/lib/libCore.so.6.18: error: symbol lookup error: undefined symbol: usedToIdentifyStaticRoot (fatal)
3257: /home/johannes/alice/sw/ubuntu1804_x86-64/ROOT/v6-18-04-alice1-1/lib/libCling.so: error: symbol lookup error: undefined symbol: usedToIdentifyRootClingByDlSym (fatal)
3257: /home/johannes/alice/sw/ubuntu1804_x86-64/ROOT/v6-18-04-alice1-1/lib/libCling.so: error: symbol lookup error: undefined symbol: __dso_handle (fatal)
3257: o2-tpc-reco-workflow: error: symbol lookup error: undefined symbol: __dso_handle (fatal)
3257: /home/johannes/alice/sw/ubuntu1804_x86-64/ROOT/v6-18-04-alice1-1/lib/libCling.so: error: symbol lookup error: undefined symbol: _ZN4TMVA10IPruneToolD0Ev (fatal)
3257: o2-tpc-reco-workflow: error: symbol lookup error: undefined symbol: _ZN4TMVA10IPruneToolD0Ev (fatal)
3257: /home/johannes/alice/sw/ubuntu1804_x86-64/ROOT/v6-18-04-alice1-1/lib/libCling.so: error: symbol lookup error: undefined symbol: _ZN4TMVA10IPruneToolD1Ev (fatal)
3257: o2-tpc-reco-workflow: error: symbol lookup error: undefined symbol: _ZN4TMVA10IPruneToolD1Ev (fatal)
3257: /home/johannes/alice/sw/ubuntu1804_x86-64/ROOT/v6-18-04-alice1-1/lib/libCling.so: error: symbol lookup error: undefined symbol: _ZN4TMVA10IPruneToolD2Ev (fatal)
3257: o2-tpc-reco-workflow: error: symbol lookup error: undefined symbol: _ZN4TMVA10IPruneToolD2Ev (fatal)
3257: /home/johannes/alice/sw/ubuntu1804_x86-64/ROOT/v6-18-04-alice1-1/lib/libCling.so: error: symbol lookup error: undefined symbol: atexit (fatal)
3257: o2-tpc-reco-workflow: error: symbol lookup error: undefined symbol: atexit (fatal)
3257: /home/johannes/alice/sw/ubuntu1804_x86-64/ROOT/v6-18-04-alice1-1/lib/libCling.so: error: symbol lookup error: undefined symbol: at_quick_exit (fatal)
3257: o2-tpc-reco-workflow: error: symbol lookup error: undefined symbol: at_quick_exit (fatal)
3257: /home/johannes/alice/sw/ubuntu1804_x86-64/ROOT/v6-18-04-alice1-1/lib/libCore.so.6.18: error: symbol lookup error: undefined symbol: usedToIdentifyRootClingByDlSym (fatal)
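
The grep output above repeats the same few symbols many times. To get an overview, the undefined symbols can be counted per name. A minimal sketch, assuming the log lines have the "undefined symbol: NAME (fatal)" shape shown above; summarize_lookup_errors is a made-up name:

```shell
# Hypothetical helper: count how often each undefined symbol is reported
# in an LD_DEBUG log, most frequent first.
summarize_lookup_errors() {
    grep 'symbol lookup error' "$1" \
        | sed 's/.*undefined symbol: \([^ ]*\).*/\1/' \
        | sort | uniq -c | sort -rn
}

# Example:
# summarize_lookup_errors ld_debug_reco
```

Mangled C++ names such as _ZN4TMVA10IPruneToolD0Ev can be made readable by piping the result through c++filt, if binutils is installed.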

Hello,
is there any update on the issue? I am having the same error message.

Hi,

I get the same error that @cimordas has when I run a qc workflow using o2-qc while being connected to the o2-running machine via ssh. Adding the option -b to prevent the debug GUI from opening solves the problem.
However, not every workflow that opens a debug GUI is affected in my case. The o2-tpc-reco-workflow, for example, also works without the -b option, but it still does not open the debug GUI even though it usually does.

Another interesting thing is that opening the debug GUI works just fine when I am physically sitting at the o2 machine. It fails only when I am connected to it via ssh -Y user@machine, which is unfortunate when working from home.
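
Until the cause is understood, the -b workaround could be applied automatically for remote sessions. A minimal sketch, assuming an OpenSSH server (which sets SSH_CONNECTION in the session environment) and that the workflow accepts -b as described in this thread; run_workflow is a made-up name:

```shell
# Hypothetical wrapper: append -b (batch mode, no debug GUI) when the
# command is started from within an SSH session, otherwise run it unchanged.
run_workflow() {
    if [ -n "${SSH_CONNECTION:-}" ]; then
        "$@" -b
    else
        "$@"
    fi
}

# Example:
# run_workflow o2-tpc-reco-workflow --infile tpcdigits.root --output-type tracks
```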

I uploaded the output of

o2-tpc-reco-workflow --infile tpcdigits.root --output-type tracks | LD_DEBUG=files LD_DEBUG_OUTPUT=ld_QC_debug_files.txt o2-qc --config json://home/tklemenz/AliSoftware/QualityControl/Modules/TPC/run/tpcQCPID_direct.json

here.

Thanks for having a look.

Cheers,
Thomas

Hi Cindy,

I've just tried; for me the 3 commands you used work fine (on Ubuntu).

Cheers,
 Ruben

Could you please provide what:

LD_DEBUG=files

tells you?

Can you please run with:

LD_DEBUG=files o2-tpc-workflow

and tell me what you get?

Hi @eulisse,
I stumbled upon this problem again, this time when running our TPC raw data monitor. I followed the suggestion to use LD_DEBUG=files and examined the output in more detail. What I found is (condensed):

file=libAfterImage.so.0 [0];  needed by /home/wiechula/software/alicesw/sw/slc7_x86-64/ROOT/v6-20-02-alice3-1/lib/libASImage.so.6.20.02
...
file=/lib64/libGLX.so.0 [0];  needed by /lib64/libAfterImage.so.0
...
file=libGLX_system.so.0 [0];  dynamically loaded by /lib64/libGLX.so.0
...
file=/usr/lib64/dri/tls/swrast_dri.so [0];  dynamically loaded by /lib64/libGLX_system.so.0
...
file=libLLVM-6.0-rhel.so [0];  needed by /usr/lib64/dri/swrast_dri.so

So it seems libLLVM from the system is pulled in via libAfterImage from the system, which is required by ROOT.

I then added -Dbuiltin_afterimage=ON in root.sh and recompiled.
At least for me this solved the problem. I can again run my code without the error. Also I don’t see any reference to the system llvm in the LD_DEBUG=files output any longer.
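
The condensed chain in this post was put together by reading the log. It can also be extracted mechanically by walking the "needed by" and "dynamically loaded by" lines backwards from the LLVM library. What follows is only a sketch: trace_loader_chain is a made-up name, libraries are matched by file name only, and paths are assumed to contain no spaces.

```shell
# Hypothetical helper: starting from the first loaded file whose name contains
# $2, print who loaded it, then who loaded that library, and so on.
# $1: one log file written with LD_DEBUG=files
trace_loader_chain() {
    awk -v target="$2" '
        # Strip the directory part so "libX.so" and "/lib64/libX.so" compare equal.
        function base(p) { sub(/.*\//, "", p); return p }
        /file=.*(needed by|dynamically loaded by) / {
            line = $0
            sub(/.*file=/, "", line)          # also drops any PID prefix
            child = line
            sub(/ .*/, "", child)
            parent = line
            sub(/.*(needed by|dynamically loaded by) /, "", parent)
            sub(/ .*/, "", parent)
            if (!(base(child) in loaded_by)) loaded_by[base(child)] = parent
            if (start == "" && index(child, target) > 0) start = base(child)
        }
        END {
            if (start == "") exit 1           # target never appeared in the log
            lib = start
            print lib
            while ((lib in loaded_by) && !(lib in seen)) {
                seen[lib] = 1
                print "  <- " loaded_by[lib]
                lib = base(loaded_by[lib])
            }
        }
    ' "$1"
}

# Example (the PID suffix depends on the run):
# trace_loader_chain ld_debug_files.txt.<pid> libLLVM
```

The last line printed is the first library in the chain for which the log records no loader, which is a candidate for where the system dependency enters.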

Hello,

Sorry, my previous message got deleted while I tried to format it better.
I have the same issue with different workflows:

o2-calibration-data-generator-workflow
o2-calibration-lhc-clockphase-workflow
o2-calibration-ccdb-populator-workflow

The suggestion by Jens:

diff --git a/root.sh b/root.sh
index ad09b51…9a66915 100644
--- a/root.sh
+++ b/root.sh
@@ -141,6 +141,7 @@ cmake $SOURCEDIR
       -Dshadowpw=OFF                                                                   \
       -Dvdt=ON                                                                         \
       -Dbuiltin_vdt=ON                                                                 \
+      -Dbuiltin_afterimage=ON                                                          \
       ${ALIEN_RUNTIME_REVISION:+-Dmonalisa=ON}                                         \
       -Dkrb5=OFF                                                                       \
       -Dgviz=OFF

did not work for me, only using “-b” does, but then I don’t have the GUI.

Any suggestion?

Chiara

I would like to attach the output of

LD_DEBUG=files o2-calibration-data-generator-workflow --lanes 10 --mean-latency 100000 --max-timeframes 500

but it tells me that “new users cannot upload files”. Anything I should do?

Chiara

Can you zip the data and put it on CERNbox?

Here it is:

https://cernbox.cern.ch/index.php/s/90AdcZEuuLs2x9K

Is it normal that no attachment can be added?

Chiara

I think the trace is:

file=libglfw.so.3 [0];  needed by /home/zampolli/SOFT/alibuild/ali-o2/sw/ubuntu1804_x86-64/DebugGUI/v0.1.0-34bc77ae9c-2/lib/libO2DebugGUI.so
...
file=libGL.so.1 [0];  dynamically loaded by /usr/lib/x86_64-linux-gnu/libglfw.so.3
...
file=libGLX.so.0 [0];  needed by /usr/lib/x86_64-linux-gnu/libGL.so.1
...
file=libGLX_mesa.so.0 [0];  dynamically loaded by /usr/lib/x86_64-linux-gnu/libGLX.so.0
...
file=/usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so [0];  dynamically loaded by /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0
...
file=libLLVM-9.so.1 [0];  needed by /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so

So it seems that glfw from the system is the problem.
Not sure if it works, but you could try to comment out the lines

prefer_system: "(?!osx)"
prefer_system_check: |
  printf "#if ! __has_include(<GLFW/glfw3.h>)\n#error \"GLFW not found, checking if we can build it.\"\n#endif\n" | cc -xc++ -std=c++17 - -c -o /dev/null || \
  { printf "#if __has_include(<GL/gl.h>) && __has_include(<X11/extensions/XInput2.h>) && __has_include(<X11/X.h>)\n#error \"OpenGL is found. We build GLFW ourselves.\"\n#endif\n" | cc -xc++ -std=c++17 - -c -o /dev/null && echo "OpenGL not found. This is a dummy package."; }

in glfw.sh in alidist and recompile.
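
For reference, the commenting-out can be scripted. A minimal sketch, assuming prefer_system and prefer_system_check sit at the start of a line in glfw.sh as quoted above and that the check body is indented; comment_out_prefer_system is a made-up name, and the result is printed rather than written back so it can be reviewed first:

```shell
# Hypothetical helper: print an alidist recipe with the prefer_system and
# prefer_system_check entries (and their indented continuation lines)
# commented out. The input file is not modified.
comment_out_prefer_system() {
    awk '
        /^prefer_system(_check)?:/ { in_block = 1; print "#" $0; next }
        in_block && /^[ \t]/       { print "#" $0; next }
                                   { in_block = 0; print }
    ' "$1"
}

# Example: write the result to a new file and review it before replacing glfw.sh.
# comment_out_prefer_system glfw.sh > glfw.sh.new
```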

@eulisse could say more I think.

Hi @wiechula,

I first tried adding -Dbuiltin_afterimage=ON to root.sh, but it did not fix the problem.
Then I commented out the lines you suggested in glfw.sh and rebuilt everything. Indeed, glfw and the debug GUI were built by alibuild, but I still get the following error when running
LD_DEBUG=files LD_DEBUG_OUTPUT=ld_debug_files.txt o2-tpc-reco-workflow --infile tpcdigits.root --output-type clusters,tracks

: CommandLine Error: Option 'help-list' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

The output of the above command can be found here.

Hope that helps.
Cheers,
Thomas