QC checker with multiple data sources - monitoring objects not published

aferrero · July 27, 2022, 6:35pm

Dear QC experts,

I am trying to implement a check with two data sources (one task and one post-processing task). The basic idea is to determine the quality from one plot published by the task, and to beautify the plots published by the post-processing task according to the quality flag.

Here is the relevant part of my configuration:

    "tasks": {
      "MCHDigits": {
        "active": "true",
        "className": "o2::quality_control_modules::muonchambers::PhysicsTaskDigits",
        "moduleName": "QcMuonChambers",
        "detectorName": "MCH",
        "cycleDurationSeconds": "30",
        "maxNumberCycles": "-1",
        "dataSource": {
          "type": "direct",
          "query" : "digits:MCH/DIGITS"
        },
        "taskParameters" : {
          "Diagnostic" : "false",
          "OnCycle" : "true"
        },
        "saveObjectsToFile": "qc-mch-digits.root",
        "location": "remote"
      }
    },
    "postprocessing": {
      "MCHDigitsPP": {
        "active": "true",
        "className": "o2::quality_control_modules::muonchambers::PostProcessingDigits",
        "moduleName": "QcMuonChambers",
        "detectorName": "MCH",
        "customization": [
          {
            "name": "ElecHistoPath",
            "value": "MCH/MO/MCHDigits"
          },
          {
            "name": "ElecHistoName",
            "value": "Occupancy_Elec"
          }
        ],
        "dataSources": [
          {
            "type": "repository",
            "path": "MCH/MO/MCHDigits",
            "names": [
              "Occupancy_Elec"
            ],
            "reductorName": "o2::quality_control_modules::muonchambers::TH2ElecMapReductor",
            "moduleName": "QcMuonChambers"
          }
        ],
        "initTrigger": [
          "userorcontrol"
        ],
        "updateTrigger": [
          "newobject:qcdb:MCH/MO/MCHDigits/Occupancy_Elec"
        ],
        "stopTrigger": [
          "userorcontrol"
        ]
      }
    },
    "checks": {
      "QcCheckMCHDigitsPP": {
        "active": "true",
        "className": "o2::quality_control_modules::muonchambers::PhysicsCheck",
        "moduleName": "QcMuonChambers",
        "detectorName": "MCH",
        "policy": "OnAny",
        "checkParameters": {
          "MinOccupancy": "0.01",
          "MaxOccupancy": "10",
          "MinGoodFraction": "0.9",
          "OccupancyPlotScaleMin": "0.0001",
          "OccupancyPlotScaleMax": "1",
          "Verbose": "false"
        },
        "dataSource": [
          {
            "type": "Task",
            "name": "MCHDigits",
            "MOs" : "all"
          },
          {
            "type": "PostProcessing",
            "name": "MCHDigitsPP",
            "MOs" : "all"
          }
        ]
      }
    }

In the output I see that the checker is processing the objects from the Task, but is not publishing anything and probably not receiving anything from the PostProcessing:

[218762:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 20:27:59.370833     CheckRunner qc-check-MCH-QcCheckMCHDigitsPP received an array with 10 entries from MCHDigits
[218762:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 20:27:59.631752     Trying 1 checks for 10 monitor objects
[218762:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 20:28:00.672480     Check 'QcCheckMCHDigitsPP', quality 'Quality: Good (level 1)'
[218762:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 20:28:00.672523     Storing 1 QualityObjects
[218762:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 20:28:00.672556     Storing quality object qc/MCH/QO/QcCheckMCHDigitsPP (QcCheckMCHDigitsPP)
[218762:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 20:28:00.685095     Storing 0 MonitorObjects
[218762:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 20:28:00.685111     Sending 1 quality objects

If I remove the “Task” data source in the checker, the monitoring objects from the PostProcessing are received and published as expected:

[207648:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 19:55:00.020056     CheckRunner qc-check-MCH-QcCheckMCHDigitsPP received an array with 4 entries from MCHDigitsPP
[207648:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 19:55:00.020161     Trying 1 checks for 4 monitor objects
[207648:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 19:55:00.020201     Check 'QcCheckMCHDigitsPP', quality 'Quality: Null (level 10)'
[207648:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 19:55:00.020407     Storing 1 QualityObjects
[207648:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 19:55:00.020451     Storing quality object qc/MCH/QO/QcCheckMCHDigitsPP (QcCheckMCHDigitsPP)
[207648:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 19:55:00.029346     Storing 4 MonitorObjects
[207648:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 19:55:00.029442     Storing MonitorObject qc/MCH/MO/MCHDigitsPP/MeanRate
[207648:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 19:55:00.041491     Storing MonitorObject qc/MCH/MO/MCHDigitsPP/MeanRateOnCycle
[207648:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 19:55:00.052996     Storing MonitorObject qc/MCH/MO/MCHDigitsPP/Rate_ST12
[207648:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 19:55:00.063457     Storing MonitorObject qc/MCH/MO/MCHDigitsPP/Rate_ST345
[207648:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-27 19:55:00.074972     Sending 1 quality objects

Is this the expected behavior? From the documentation I get the impression that multiple data sources for the checks are supported, or at least one can find an example here:

    {
      "qc" : {
        "config" : { ... },
        "tasks" : { ... },

        "checks": {
          "CheckName": {
            "active": "true",
            "className": "o2::quality_control_modules::skeleton::SkeletonCheck",
            "moduleName": "QcSkeleton",
            "policy": "OnAny",
            "dataSource": [{
              "type": "Task",
              "name": "TaskName"
            },
            {
              "type": "Task",
              "name": "QcTask",
              "MOs": ["example", "other"]
            }]
          },
          "QcCheck": {
             ...
          }
        }
      }
    }

Thanks a lot in advance!

pkonopka · July 28, 2022, 9:13am

Hi, indeed multiple data sources are supported and even regularly tested in CI. That being said, you are probably the first person to try applying a check on results of a QC task and a post-processing task.

One possibility is that the time domains of MCHDigits and MCHDigitsPP are so different that the DPL framework drops the latter. You see that the objects from the post-processing task are not processed, but do you see any log showing that the PP task actually publishes them? Could you perhaps upload the full logs?

If you do not see the above, perhaps the post-processing task never publishes the objects because it is configured to trigger when MCH/MO/MCHDigits/Occupancy_Elec is stored. This might never happen if the check runner waits for the post-processing objects in order to store all of them in one go (in that case it would be good to understand why). Could you please also upload the output of your DPL command with --dump at the end?

Lastly, perhaps I am missing something, but did you consider creating these post-processing objects inside the main QC task? Perhaps it would be easier this way.

aferrero · July 28, 2022, 9:52am

Hi Piotr,

I will provide all the logs and DPL dumps a bit later, but let me first clarify the last point…

We are actually trying to move as many plots as possible into the post-processing, in an attempt to reduce the load on the QC mergers. Currently we fill and publish most of the plots in the tasks, which means that all those plots are generated by the EPNs and then merged at each cycle. In many cases, such plots are different ways of plotting the same information, particularly in the case of the occupancies/rates and the efficiencies.

Therefore the idea is to have one single plot published by the tasks, containing all the information (which already exists, see for example https://ali-qcg.cern.ch/?page=objectView&objectName=qc/MCH/MO/QcTaskMCHDigits/Occupancy_Elec), and let the PP do the job of averaging the values and/or plotting them in detector coordinates, with a single merged plot as input.

The same plot will also be used as input for generating the trends.

Does this make sense?

pkonopka · July 28, 2022, 9:58am

Hi, that absolutely makes sense, thanks for the explanation!

aferrero · July 28, 2022, 1:00pm

Here is a cernbox folder with the full QC configuration file, the full log and the DPL dump: CERNBox

Let me know if you need anything else…

Thanks!

pkonopka · July 29, 2022, 1:35pm

Hm, I see that the MCHDigitsPP task objects are actually reaching the check sink:

[65095:PP-TASK-RUNNER-MCHDigitsPP]: 2022-07-28 14:12:45.110180     Checking triggers of the task 'MCHDigitsPP'
[65096:PP-TASK-RUNNER-MCHTrendRates]: 2022-07-28 14:12:45.104351     Checking triggers of the task 'MCHTrendRates'
[65097:qc-check-sink-QC_MCHDigitsPP-mo_0]: 2022-07-28 14:12:45.105170     Storing MonitorObject qc/MCH/MO/MCHDigitsPP/Expert/ST3/DE600/Rate_XY_B_600
[65097:qc-check-sink-QC_MCHDigitsPP-mo_0]: 2022-07-28 14:12:45.138184     Storing MonitorObject qc/MCH/MO/MCHDigitsPP/Expert/ST3/DE601/Rate_XY_B_601
[65097:qc-check-sink-QC_MCHDigitsPP-mo_0]: 2022-07-28 14:12:45.170595     Storing MonitorObject qc/MCH/MO/MCHDigitsPP/Expert/ST3/DE602/Rate_XY_B_602
[65097:qc-check-sink-QC_MCHDigitsPP-mo_0]: 2022-07-28 14:12:45.198906     Storing MonitorObject qc/MCH/MO/MCHDigitsPP/Expert/ST3/DE603/Rate_XY_B_603
[65097:qc-check-sink-QC_MCHDigitsPP-mo_0]: 2022-07-28 14:12:45.226696     Storing MonitorObject qc/MCH/MO/MCHDigitsPP/Expert/ST3/DE604/Rate_XY_B_604
[65097:qc-check-sink-QC_MCHDigitsPP-mo_0]: 2022-07-28 14:12:45.253149     Storing MonitorObject qc/MCH/MO/MCHDigitsPP/Expert/ST3/DE605/Rate_XY_B_605

Then it went quite silent because qc-task-MCH-MCHDigits received an End Of Stream message, while MCHDigitsPP was configured to trigger only when the former produces new objects.

As documented here:

The beautify function is called after the check function if there is a single dataSource of type Task in the configuration of the check. If there is more than one, the beautify() is not called in this check.

you should not expect that beautify() is called for the results of the post-processing task. This design decision was due to potential problems of having two copies of the same objects which are beautified in different ways in different processes.

However, it is weird that these objects do not reach the check runner at all; indeed, I can see only these:

[65106:qc-check-MCH-QcCheckMCHDigitsPP]: 2022-07-28 14:12:36.074852     CheckRunner qc-check-MCH-QcCheckMCHDigitsPP received an array with 10 entries from MCHDigits

This looks like a bug in QC.

However, due to the previously mentioned limitation (which I forgot about before), you will probably have to approach the problem differently. Would it work for you to access the check result in the post-processing task? It is done e.g. in TrendingTask like this:

    } else if (dataSource.type == "repository-quality") {
      auto qo = qcdb.retrieveQO(dataSource.path + "/" + dataSource.name, t.timestamp, t.activity);
      if (qo) {
        mReductors[dataSource.name]->update(qo.get());
      }

then you could create beautified plots directly in your post-processing task, based on the result you retrieve from the QCDB.
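As a rough illustration of that second step, here is a minimal sketch of how a post-processing task might turn the retrieved quality into a plot decoration. Everything in it is an assumption made for the example: `Quality`, `PlotStyle` and `styleForQuality` are hypothetical stand-ins, not QC framework classes, and the real task would apply the style to its ROOT histograms after calling `retrieveQO` as shown above.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-ins for illustration only; the real task would use the
// QC framework's quality classes and style a ROOT histogram instead.
enum class Quality { Null, Good, Medium, Bad };

struct PlotStyle {
  std::string lineColor;
  std::string message;
};

// Map the quality retrieved from the QCDB to a plot decoration.
// A missing quality object (e.g. the check has not run yet) is treated as Null.
PlotStyle styleForQuality(const std::optional<Quality>& quality)
{
  switch (quality.value_or(Quality::Null)) {
    case Quality::Good:
      return { "green", "Quality: Good" };
    case Quality::Medium:
      return { "orange", "Quality: Medium" };
    case Quality::Bad:
      return { "red", "Quality: Bad" };
    case Quality::Null:
      break;
  }
  return { "black", "Quality: Null" };
}
```

Handling the "no quality object yet" case explicitly matters here, since the post-processing task may be triggered before the check has stored its first result.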

aferrero · September 8, 2022, 12:05pm

Hi @pkonopka!

Sorry for the long silence…

In the tests I am doing, the beautify method is actually called by the Checker even if the dataSource type is set to PostProcessing (but the checker has only one data source). Is this supposed to change in the future?

Piotr Konopka:

However, due to the previously mentioned limitation (which I forgot about before), you will probably have to approach the problem differently. Would it work for you to access the check result in the post-processing task? It is done e.g. in TrendingTask like this:

    } else if (dataSource.type == "repository-quality") {
      auto qo = qcdb.retrieveQO(dataSource.path + "/" + dataSource.name, t.timestamp, t.activity);
      if (qo) {
        mReductors[dataSource.name]->update(qo.get());
      }

then you could create beautified plots directly in your post-processing task, based on the result you retrieve from the QCDB.

Yes, this would work, thanks!

pkonopka · September 12, 2022, 7:13am

Sorry, I think the documentation was actually unclear. beautify() can be called if the check has exactly one dataSource, of any type. I will fix the doc.

This would mean that the behaviour is not supposed to change in the future.
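
To summarise the rule as clarified in this thread, here is a small sketch. It only models the condition described above and is not the framework's actual implementation; `DataSource` and `beautifyIsCalled` are hypothetical names introduced for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of one entry in a check's "dataSource" list.
struct DataSource {
  std::string type; // e.g. "Task" or "PostProcessing"
  std::string name;
};

// Rule as clarified in the thread: beautify() may run only when the check
// has exactly one data source, regardless of that source's type.
bool beautifyIsCalled(const std::vector<DataSource>& sources)
{
  return sources.size() == 1;
}
```

With the configuration from the first post (both MCHDigits and MCHDigitsPP listed as sources) this condition is false, whereas a check with MCHDigitsPP as its only source satisfies it, which matches what was observed in the tests.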