Error starting o2-qc

Hi,

We observe a problem when we try to start QC for the TPC pedestal.
We are running the QC process on epn160 and get the following output:

Loading QualityControl/v1.25.0-1
Loading requirement: BASE/1.0 asio/v1.19.1-2 GCC-Toolchain/v10.2.0-alice2-3 ZeroMQ/v4.3.3-6 lzma/v5.2.3-6 zlib/v1.2.8-8 OpenSSL/v1.0.2o-9 libpng/v1.6.34-9 sqlite/v3.15.0-2 libffi/v3.2.1-2
FreeType/v2.10.1-8 Python/v3.6.10-12 Python-modules/1.0-13 boost/v1.75.0-10 libjalienO2/0.1.3-5 Ppconsul/v0.2.2-2 Configuration/v2.6.2-2 fmt/7.1.0-10 Clang/v11.0.0-15 Common-O2/v1.6.0-10
protobuf/v3.14.0-8 lz4/v1.9.3-9 arrow/v1.0.0-21 GSL/v1.16-8 libxml2/v2.9.3-8 ROOT/v6-24-02-5 c-ares/v1.17.1-5 ofi/v1.7.1-8 FairLogger/v1.9.1-7 libInfoLogger/v2.1.1-2 re2/2019-09-01-11
grpc/v1.34.0-alice1-6 asiofi/v0.5.1-2 DDS/3.5.16-3 FairMQ/v1.4.40-1 Control-OCCPlugin/v0.26.1-1 GLFW/3.3.2-10 FairRoot/v18.4.1-65 Vc/1.4.1-11 Monitoring/v3.8.7-2 ms_gsl/3.1.0-5
libuv/v1.40.0-10 DebugGUI/v0.5.6-6 FFTW3/v3.3.9-4 O2/nightly-20210812-2 VecGeom/89a05d148cc708d4efc2e7b0eb6e2118d2610057-33 yaml-cpp/yaml-cpp-0.6.2-15
/opt/alisw/el8/O2/nightly-20210812-2/lib/libO2FrameworkFoundation.so(_ZN2o29framework13runtime_errorEPKc+0x6f)[0x7fe6c7d1b38f]
/opt/alisw/el8/O2/nightly-20210812-2/lib/libO2Framework.so(+0x15e39a)[0x7fe6c824f39a]
/opt/alisw/el8/O2/nightly-20210812-2/lib/libO2Framework.so(_Z6doMainiPPcRKSt6vectorIN2o29framework17DataProcessorSpecESaIS4_EERKS1_INS3_26ChannelConfigurationPolicyESaIS9_EERKS1_INS3_16CompletionPolicyESaISE_EERKS1_INS3_14DispatchPolicyESaISJ_EERKS1_INS3_14ResourcePolicyESaISO_EERKS1_INS3_15ConfigParamSpecESaIST_EERNS3_13ConfigContextE+0x3097)[0x7fe6c8482e97]
o2-qc[0x413741]
o2-qc[0x40dbac]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7fe6c159e7b3]
o2-qc[0x40dc9e]
[ERROR] invalid workflow in o2-qc: Empty workflow provided while running in batch mode.

The command we are running:
DISPLAY=0 o2-qc --config json://home/epn/runcontrol/tpc/tpcQCCalDetPublisher_Pedestal.json -b --run

We are not sure how to fix this error.
May we ask for your help?

Thanks
Robert

I observed the same error and backtrace while investigating another problem today, and the exception was in fact raised in QC (although the backtrace does not suggest so).

Most likely this printout also does not show where the real problem is, so there might be something wrong with catching/printing exceptions. @bvonhall could you please have a look? I would expect to have it correctly printed here or somewhere higher by DPL (@eulisse ?)

@rmunzer In the meantime you can try running it with:

 DISPLAY=0 gdb --args o2-qc --config json://home/epn/runcontrol/tpc/tpcQCCalDetPublisher_Pedestal.json -b --run

Then:

catch throw
start
bt
continue # the first exception is probably another one and is dealt with by DPL
bt

This might help you track down the line of code that is failing, until we print the exception correctly again.
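The interactive session above could also be run non-interactively. The following is only a sketch: the helper name `gdb_batch_command` is hypothetical, and it assumes the standard gdb options `-batch`, `-ex` and `--args`. It only builds and prints the command line; it does not launch gdb.

```python
import shlex


def gdb_batch_command(program_args, gdb_commands):
    """Build a gdb command line that runs the given gdb commands in batch mode.

    Hypothetical helper: it only assembles the argument list, nothing is executed.
    """
    cmd = ["gdb", "-batch"]
    for gdb_cmd in gdb_commands:
        cmd += ["-ex", gdb_cmd]  # one -ex per gdb command, executed in order
    return cmd + ["--args"] + list(program_args)


# gdb commands from the post above, in the same order
GDB_COMMANDS = ["catch throw", "start", "bt", "continue", "bt"]

# o2-qc invocation quoted in the thread
O2_QC = [
    "o2-qc",
    "--config",
    "json://home/epn/runcontrol/tpc/tpcQCCalDetPublisher_Pedestal.json",
    "-b",
    "--run",
]

if __name__ == "__main__":
    # Print a shell-quoted command that can be pasted into a terminal
    print(shlex.join(gdb_batch_command(O2_QC, GDB_COMMANDS)))
```

The printed command can be prefixed with DISPLAY=0 and its output redirected to a file, which gives a log similar to the one attached in the reply.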

Hi Piotr,

thank you very much.
I attach the output from gdb
gdb.log (4.9 KB)

Hi,

I’ll have a look later today.
Cheers

Hi,
Could you provide me with /home/epn/runcontrol/tpc/tpcQCCalDetPublisher_Pedestal.json ?

It seems that the crash occurs during the parsing of the file:

#0  0x00007fe2e5b50b5e in __cxxabiv1::__cxa_throw (obj=obj@entry=0x677870, tinfo=tinfo@entry=0x4215d8 <typeinfo for boost::wrapexcept<boost::property_tree::ptree_bad_path>>,
    dest=dest@entry=0x415880 <boost::wrapexcept<boost::property_tree::ptree_bad_path>::~wrapexcept()>) at ../../../../gcc/libstdc++-v3/libsupc++/eh_throw.cc:78
#1  0x000000000041993a in boost::throw_exception<boost::property_tree::ptree_bad_path> (loc=..., e=...)
    at /mnt/mesos/sandbox/sandbox/jenkins/workspace/build-any-ibsw201562831/slc8_x86-64/boost/v1.75.0-10/include/boost/throw_exception.hpp:171
#2  boost::property_tree::basic_ptree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::get_child (this=<optimized out>, path=...)
    at /mnt/mesos/sandbox/sandbox/jenkins/workspace/build-any-ibsw201562831/slc8_x86-64/boost/v1.75.0-10/include/boost/property_tree/detail/ptree_implementation.hpp:576
#3  0x00007fe2ebb7913a in o2::configuration::backends::JsonBackend::getRecursive (this=this@entry=0x679380, path=...)
    at /mnt/mesos/sandbox/sandbox/jenkins/workspace/build-any-ibsw201562781/SOURCES/Configuration/v2.6.2/v2.6.2/src/Backends/Json/JsonBackend.cxx:68
#4  0x0000000000411b85 in defineDataProcessing (config=...)
    at /mnt/mesos/sandbox/sandbox/jenkins/workspace/build-any-ibsw201562831/SOURCES/QualityControl/v1.25.0/v1.25.0/Framework/src/runQC.cxx:164
#5  0x0000000000413648 in mainNoCatch (argc=5, argv=0x7ffc45237268)
    at /mnt/mesos/sandbox/sandbox/jenkins/workspace/build-any-ibsw201562831/slc8_x86-64/O2/nightly-20210812-2/include/Framework/runDataProcessing.h:187
#6  0x000000000040dbac in main (argc=5, argv=0x7ffc45237268)
    at /mnt/mesos/sandbox/sandbox/jenkins/workspace/build-any-ibsw201562831/slc8_x86-64/O2/nightly-20210812-2/include/Framework/runDataProcessing.h:208

Hi Barth,

the json file is the following:

{
  "qc": {
    "config": {
      "database": {
        "implementation": "CCDB",
        "host": "ccdb-test.cern.ch:8080",
        "username": "not_applicable",
        "password": "not_applicable",
        "name": "not_applicable"
      },
      "Activity": {
        "number": "42",
        "type": "2"
      },
      "monitoring": {
        "url": "infologger:///debug?qc"
      },
      "consul": {
        "url": "http://consul-test.cern.ch:8500"
      },
      "conditionDB": {
        "url": "ccdb-test.cern.ch:8080"
      }
    },
    "postprocessing": {
      "PadCalibration": {
        "active": "true",
        "className": "o2::quality_control_modules::tpc::CalDetPublisher",
        "moduleName": "QcTPC",
        "detectorName": "TPC",
        "valid_outputCalPadMaps_comment" : [ "CE", "Pulser" ],
        "outputCalPadMaps": [
        ],
        "outputCalPads_comment" : [ "Put all CalPad objects you want to look at in the list. The name has to be the same one as on the CCDB.",
                                    "valid outputCalPads: 'Pedestal', 'Noise'" ],
        "outputCalPads": [
          "Pedestal",
          "Noise"
        ],
        "timestamps_comment": [ "Put the timestamp of the corresponding file you want to look for in the timestamps array.",
                                "You can either put a timestamp for every object or leave the array empty to take the latest file from the CCDB.",
                                "An empty array to get the the latest version will be the main use case.",
                                "The array is mapped to the output objects sequentially",
                                "If you want to pick the latest file in the CCDB manually, you can use -1."
        ],
        "timestamps": [
        ],
        "lookupMetaData_comment": [ "With this array you can filter your search via meta data.",
                                    "The array is mapped sequentially to the output objects.",
                                    "If you leave only one entry in the array this is used for all objects in outputCalPadMaps and outputCalPads.",
                                    "If you want no meta data simply remove 'keys' and 'values' completely and leave only {}",
                                    "Every entry above (outputCalPads.size() + outputCalPadMaps.size()) is ignored.",
                                    "The keys and values that are set by default are only there to serve as an example."
        ],
        "lookupMetaData": [
          {
          }
        ],
        "storeMetaData_comment": "For how-to, see 'lookupMetaData_comment'.",
        "storeMetaData": [
          {
          }
        ],
        "histogramRanges_comment" : [ "nBins", "min", "max" ],
        "histogramRanges": [
          { "Pedestals" : [ "240", "0", "120" ] },
          { "Noise" : [ "200", "0", "2" ] },
          { "PulserQtot" : [ "600", "0", "300" ] },
          { "PulserT0" : [ "100", "239", "240" ] },
          { "PulserWidth" : [ "100", "0", "1" ] },
          { "CEQtot" : [ "600", "0", "300" ] },
          { "CET0" : [ "200", "400", "500" ] },
          { "CEWidth" : [ "100", "0", "1" ] }
        ],
        "checkZSCalibration": {
          "check": "false",
          "initRefCalibTimestamp": "-1",
          "initRefPedestalTimestamp": "-1",
          "initRefNoiseTimestamp": "-1"
        },
        "initTrigger": [
          "once"
        ],
        "updateTrigger_comment": "To trigger on a specific file being updated, use e.g. 'newobject:ccdb:TPC/Calib/Noise'",
        "updateTrigger": [
          "newobject:ccdb:TPC/Calib/Noise"
        ],
        "stopTrigger_comment": [ "To keep the task running until it is stopped manually set the trigger on the update of a non-existing object, e.g. 'newobject:ccdb:TPC/ThisDoesNotExist'",
                                 "There will be a end of run trigger implemented so the above workaround can be abandoned later." ],
        "stopTrigger": [
          "newobject:ccdb:TPC/ThisDoesNotExist"
        ]
      }
    }
  }
}

There is no dataSamplingPolicies node in the config file. It is mandatory to have one. There is most probably a message in the console before the crash.
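A quick pre-flight check of a config file for mandatory nodes could look like the sketch below. It is not part of QC: the helper names are hypothetical, the lookup only mimics the dotted-path behaviour of the `get_child` call seen in the backtrace, and it assumes that `dataSamplingPolicies` is expected at the top level of the file, next to `qc`.

```python
import json


def get_child(tree, dotted_path):
    """Walk a dotted path through nested dicts; raise KeyError if a node is missing."""
    node = tree
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(dotted_path)
        node = node[key]
    return node


def missing_nodes(tree, required_paths):
    """Return the required dotted paths that cannot be resolved in the tree."""
    missing = []
    for path in required_paths:
        try:
            get_child(tree, path)
        except KeyError:
            missing.append(path)
    return missing


# Assumption: these are the nodes to check, with dataSamplingPolicies at top level
REQUIRED = ["qc.config", "dataSamplingPolicies"]

# Trimmed-down stand-in for the posted file (it has "qc" but no "dataSamplingPolicies")
EXAMPLE = json.loads('{"qc": {"config": {}, "postprocessing": {}}}')

if __name__ == "__main__":
    print(missing_nodes(EXAMPLE, REQUIRED))
```

For the real file, the dict passed to `missing_nodes` would come from `json.load` on the opened config file.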

My bet is that it provokes a fatal in some component(s), but it does not stop the driver because runQC.cxx simply catches the fatal and returns an empty WorkflowSpec.

I will reconsider how to handle that.
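The failure mode described here, where a caught error turns into an empty workflow and the real cause is lost, can be illustrated with a small sketch. This is a Python stand-in, not the code in runQC.cxx or in any fix: all function and class names are hypothetical, and a plain list plays the role of the WorkflowSpec.

```python
class WorkflowBuildError(RuntimeError):
    """Hypothetical error type for a workflow that cannot be built."""


def build_workflow(config):
    """Stand-in for the real workflow construction; needs a mandatory node."""
    if "dataSamplingPolicies" not in config:
        raise KeyError("dataSamplingPolicies")
    return ["qc-task"]


def define_workflow_swallowing(config):
    """Catches the error and returns an empty workflow.

    The caller then only sees an empty workflow, not the missing node.
    """
    try:
        return build_workflow(config)
    except KeyError:
        return []


def define_workflow_strict(config):
    """Re-raises with context so that the caller can stop and report the cause."""
    try:
        return build_workflow(config)
    except KeyError as err:
        raise WorkflowBuildError(
            f"cannot build workflow: missing node {err.args[0]!r}"
        ) from err


if __name__ == "__main__":
    print(define_workflow_swallowing({}))
    try:
        define_workflow_strict({})
    except WorkflowBuildError as err:
        print(err)
```

In the first variant the only visible symptom is the later complaint about an empty workflow; in the second the missing node is named at the point where the build fails.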

@pkonopka Make sure that we stop the driver if we cannot build the workflow by Barthelemy · Pull Request #815 · AliceO2Gro

I also think that QC should survive not having “dataSamplingPolicies” in config files. I will fix this.