Running QC task exhibits extremely short cycles

Dear all,

today I updated my O2 and QualityControl, and now I see strange behaviour when running a QC task that I did not observe before: the cycles are extremely short. The task runs at a rate of more than a hundred cycles per second, although cycleDurationSeconds is set to 10 in the json file. Up to now this worked fine, but now the option seems to be ignored. I tried increasing the value to 10000, but the behaviour does not change at all.
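One quick sanity check is to confirm which value of cycleDurationSeconds the configuration file being read actually contains. The Python sketch below is only an illustration: the helper name find_key and the embedded example configuration are hypothetical, and the real tpcQCPID.json may be laid out differently. Only the key name cycleDurationSeconds is taken from the description above.

```python
import json

# Hypothetical stand-in for the configuration file; the real tpcQCPID.json
# may be structured differently. Only the key name comes from the post.
EXAMPLE_CONFIG = """
{
  "qc": {
    "tasks": {
      "ExampleTask": {
        "cycleDurationSeconds": "10"
      }
    }
  }
}
"""


def find_key(node, key, path=""):
    """Yield (path, value) for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}/{name}"
            if name == key:
                yield child, value
            # Keep descending, the key may also appear deeper in the tree.
            yield from find_key(value, key, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_key(value, key, f"{path}/{index}")


if __name__ == "__main__":
    config = json.loads(EXAMPLE_CONFIG)
    for where, value in find_key(config, "cycleDurationSeconds"):
        print(f"{where} = {value}")
```

To inspect a real file, the string passed to json.loads would be replaced by the contents of the json file that the executable loads.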

I must admit that we still have the json file and the executable in QualityControl/Framework. We started looking into porting this to Modules/TPC, but this is not working yet. Anyhow, up to now the QC task was running fine without such issues.

An example of such a TPC QC task is available in the O2 dev and QualityControl master branches: the executable is QualityControl/Framework/src/runTPCQCPID.cxx and the json file is QualityControl/Framework/tpcQCPID.json.

Note that the task does in principle still run, and the histograms even get published on the qcg-test webpage, but the behaviour of the cycles is quite strange.

Thanks for any help!

Best regards,
Stefan

Dear Stefan,
I won’t be able to look into it before Thursday as I am away from CERN. However, we can already try to understand a bit better what is going on.

  1. Do you see the same behaviour when using o2-qc-run-basic ?
  2. What lines tell you that there are so many cycles ?
  3. Could you give me the exact line you are running ?

My guess is that you see some spurious output or that our output is misleading.
Cheers,
Barth

Dear Barth,

here is some more information:

  1. Running o2-qc-run-basic does not exhibit this behaviour; it runs normally.
  2. Here are some lines of output, the time is just about a second after the start of the first cycle:

[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.447700 startOfCycle
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.450051 monitorData: 4762
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.450262 endOfCycle
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.450279 cycle 235
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.450282 startOfCycle
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.452610 monitorData: 4762
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.452849 endOfCycle
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.452867 cycle 236
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.452870 startOfCycle
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.455249 monitorData: 4762
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.455685 endOfCycle
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.455711 cycle 237
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.455715 startOfCycle
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.458137 monitorData: 4762
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.458477 endOfCycle
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.458498 cycle 238
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.458502 startOfCycle
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.460872 monitorData: 4762
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.461118 endOfCycle

  3. I am running: o2-qc-run-tpcpid
    I do this in a separate directory, where I first ran a small simulation to get some TPC tracks. If you would like to try this, you can find the commands to run in this JIRA ticket: ATO-478, in one of the first comments by Jens. This task was running normally before the latest updates of O2 and QualityControl.

One additional piece of information: something is already published on the qcg-test webpage after a few seconds, in qc/MISC/TPCQCPID. So cycles must be running and publishing after a time much shorter than the 10 seconds specified in the json file.
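As a rough cross-check, the cycle length can be estimated from the timestamps of the "cycle N" entries in the log excerpt quoted above. The Python sketch below is only an illustration: the function names are made up for this example, and the regular expression assumes exactly the log format shown in the excerpt.

```python
import re
from datetime import datetime

# The "cycle N" entries copied from the log excerpt, one entry per line.
LOG = """\
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.450279 cycle 235
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.452867 cycle 236
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.455711 cycle 237
[9475:QC-TASK-RUNNER-TPCQCPID]: 2019-11-12 11:58:54.458498 cycle 238
"""

# Matches "<date> <time> cycle <n>" anywhere in the text.
CYCLE_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) cycle (\d+)")


def cycle_times(text):
    """Return (cycle_number, timestamp) for every 'cycle N' entry."""
    return [
        (int(number), datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"))
        for stamp, number in CYCLE_RE.findall(text)
    ]


def mean_cycle_seconds(text):
    """Average time per cycle between the first and last 'cycle N' entry."""
    entries = cycle_times(text)
    if len(entries) < 2:
        raise ValueError("need at least two 'cycle N' entries")
    (first_n, first_t), (last_n, last_t) = entries[0], entries[-1]
    return (last_t - first_t).total_seconds() / (last_n - first_n)


if __name__ == "__main__":
    mean = mean_cycle_seconds(LOG)
    print(f"mean cycle length: {mean * 1e3:.2f} ms")
    print(f"about {1 / mean:.0f} cycles per second")
```

For the four entries above this gives a mean cycle length of a few milliseconds, which corresponds to several hundred cycles per second instead of one cycle every 10 seconds.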

Cheers,
Stefan

Dear Barth,

did you already find some time to look into this issue?

Yesterday, I updated my O2 and QC and this time also the alidist (which I actually had not done the last time), but the issue persists in the same way as described above.

Cheers,
Stefan

Dear Stefan,
I apologize, I have not yet been able to reproduce this issue. Apparently there are compilation issues in the stack. I am working on it.
Cheers,

Dear Barth,

a small update on this issue: Jens suggested changing the publishing mode of the RootTreeReader in the run executable from Loop to Single. This does change the behaviour: now only cycle 0 is finished (still in a very short time) and the output is also published on the qcg-test webpage. Afterwards, cycle 1 is started and never stops (and no further cycles are started). Is this expected when setting the publishing mode to Single? Maybe @wiechula can also comment on this?

If you want to have a look at the code, you can find it in my own branch on github, both the runTPCQCTracks.cxx and the corresponding tpcQCTracks.json file. Please note that both are still in QualityControl/Framework/. We have started to look into moving them to Modules/TPC/, but this has not been successful so far.

EDIT: To be clear: in the version on github, the publishing mode is still Loop, see line 88 of the runTPCQCTracks.cxx

Cheers,
Stefan

Dear Stefan,

I am back on a stable build.

Reproduction of the problem
I am following the instructions from Jens and will come back to you if I am stuck.

Single vs Loop in RootTreeReader
I have never used RootTreeReader but I see that Single will make it serve the data only once. As a result, the monitoring and the QC are just waiting for the next piece of data. However, they should still cycle and thus I have created a bug report for Piotr: https://alice.its.cern.ch/jira/browse/QC-257

Cheers,
Barth

Hi,

I am able to reproduce the problem.

For the record, here is what I did:

alienv enter QualityControl/latest
mkdir simdata
cd simdata
o2-sim -m TPC -n 100 
o2-sim-digitizer-workflow -b
o2-tpc-reco-workflow --infile tpcdigits.root  --output-type clusters,tracks #needed ? 
o2-qc-run-tpcpid

I have to look into it now. In the worst case I will need Piotr (@pkonopka) and he will be back next week. He is the expert for the Data Sampling and DPL.

Cheers,

Hi,

I tried to remove the lines adding the QC in runTPCQCPID.cxx and “piping” the QC instead and it seems to work.

  1. Remove these lines
  std::string filename = "tpcQCPID.json";
  const std::string qcConfigurationSource = std::string("json://") + getenv("QUALITYCONTROL_ROOT") + "/etc/" + filename;
  LOG(INFO) << "Using config file '" << qcConfigurationSource << "'";
  
  // Generation of Data Sampling infrastructure
  DataSampling::GenerateInfrastructure(specs, qcConfigurationSource);
  
  // Generation of the QC topology (one task, one checker in this case)
  o2::quality_control::generateRemoteInfrastructure(specs, qcConfigurationSource);
  2. Run with
    o2-qc-run-tpcpid | o2-qc --config json://${QUALITYCONTROL_ROOT}/etc/tpcQCPID.json

The cycles are 10 seconds long as expected. Could you give it a try ?

Even if it works, I propose to keep the jira ticket open (https://alice.its.cern.ch/jira/browse/QC-258) because I would like Piotr to have a look. I don’t understand what is wrong with your proposed code.
Meanwhile you can work with the solution I described above. It is the normal way of running QC now anyway.

Cheers,
Barth

Hi Barth,

thanks a lot for your detailed checks and for looking into this. I agree with everything you wrote. Indeed, we were also planning to split the file loading part and the QC part as you did.
We are in principle waiting for Matthias to come up with a generic solution for the file reader. But for the moment we can do as you propose since this is also the proper way.

Cheers,
Jens

Hello,
see the solution in https://alice.its.cern.ch/jira/browse/QC-258
I will prepare a PR with the fix shortly.
Best, Piotr

This is a result of the same root cause that produced the problem of the very short cycles. I confirmed it works as it should.