After having recently updated the O2 processing chain that we use for the MCH commissioning, I think I observe a different behavior with respect to the code from June 2020 that I was using beforehand.
For the MCH commissioning we analyze data that was previously written to disk using readout.exe. The data is read out via a custom DPL device that loads HB frames from the input file and pushes them as DPL messages. The messages are received by QC (at 100% sampling rate) and processed.
The DPL messages are pushed much faster than they are digested by QC, so some back-pressure builds up quite quickly.
Using an O2 version from before June 2020, the DPL source was apparently slowing down to cope with the sink speed, and all messages were received and processed by QC.
With the current O2 code however it seems that DPL messages get dropped in order to reduce the back-pressure, instead of slowing down the source.
I have not done any detailed benchmark so far, but I could try to prepare a minimal example if this is an unexpected behavior.
If instead this is a known feature, is there an option to disable the DPL message dropping, so that all messages are transferred from source to sink?
Thanks a lot in advance.
I suspect you’re hitting the same problem as we (Philippe and myself) hit, and that is described in https://alice.its.cern.ch/jira/browse/O2-1924 where Philippe actually prepared a minimal reproducer.
So either there’s indeed a bug somewhere, or we (MCH people) are consistently misusing the DPL framework somehow.
Thanks for pointing me to the Jira ticket. I tried to find similar discussions before posting this, but I missed that one…
My impression is that with the current behavior it is impossible to process 100% of the DPL messages, because in a DPL chain most of the consumers are typically slower than the producers… so for me it looks like a bug.
However, mixing the order of DPL messages and dropping some of them are two different issues IMHO. Mixing is somehow OK, as long as we consider that each TF is independent of the others. On the other hand, skipping messages means we completely miss some data…
My 2 cents: it should be, at the very least, configurable. In ZeroMQ, one would switch between subscriber-publisher and push-pull to have one behaviour or the other (i.e. non-blocking vs blocking).
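To make the blocking vs non-blocking distinction concrete, here is a minimal sketch in plain Python. It is neither DPL nor ZeroMQ code: the bounded `queue.Queue` only stands in for a channel with a finite buffer, and the function name `run` and its parameters (`drop_when_full`, `capacity`, `consumer_delay`) are made up for illustration. With a blocking `put` the fast producer is slowed down to the consumer's pace and nothing is lost; with `put_nowait` the producer never waits and messages are discarded whenever the buffer is full.

```python
import queue
import threading
import time


def run(n_messages, drop_when_full, capacity=4, consumer_delay=0.001):
    """Push n_messages through a bounded queue to a deliberately slow consumer.

    Returns (received, dropped): the messages the consumer saw, in order,
    and the number of messages the producer had to discard.
    """
    q = queue.Queue(maxsize=capacity)  # finite buffer between source and sink
    received = []
    dropped = 0

    def consumer():
        while True:
            msg = q.get()
            if msg is None:  # sentinel: producer is done
                return
            time.sleep(consumer_delay)  # simulate slow processing
            received.append(msg)

    worker = threading.Thread(target=consumer)
    worker.start()

    for i in range(n_messages):
        if drop_when_full:
            try:
                q.put_nowait(i)  # non-blocking: give up if the buffer is full
            except queue.Full:
                dropped += 1
        else:
            q.put(i)  # blocking: wait for free space (back-pressure)

    q.put(None)  # the sentinel is always sent with a blocking put
    worker.join()
    return received, dropped
```

For example, `run(50, drop_when_full=False)` delivers all 50 messages in order, while `run(50, drop_when_full=True)` typically delivers only a fraction of them and reports the rest as dropped; the exact split depends on timing.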
@laphecet of course in the case we want to compare the output of a given chain with a reference file, then the order in which the messages are processed is also important. I think we would ideally need the three options:
- 100% messages processed sequentially for debugging and cross-checks
- 100% messages in random order for data analysis, where the order of processing of the TFs should not matter as long as at the end we are sure to have analyzed 100% of the data
- DPL message dropping for online processing, if back-pressure cannot be avoided otherwise
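To check which of these three regimes a given chain actually ends up in, one could compare the identifiers of the messages pushed by the source with those seen by the sink. The helper below is only a sketch of that bookkeeping: `classify_delivery` and its labels are hypothetical names, and it assumes each message carries a unique identifier (for instance a TF or HB frame counter) that can be logged on both ends.

```python
def classify_delivery(sent_ids, received_ids):
    """Compare what the source sent with what the sink received.

    Returns (label, missing), where label is one of
      "sequential"         - everything received, in the original order
      "complete-unordered" - everything received, but in a different order
      "lossy"              - the two sets of messages do not match
                             (duplicates or unexpected ids also land here)
    and missing is the sorted list of ids that never reached the sink.
    """
    sent = list(sent_ids)
    received = list(received_ids)
    missing = sorted(set(sent) - set(received))

    if received == sent:
        label = "sequential"
    elif sorted(received) == sorted(sent):
        label = "complete-unordered"
    else:
        label = "lossy"
    return label, missing


# Example: four messages sent, two of them never arrive.
print(classify_delivery([0, 1, 2, 3], [0, 3]))  # ('lossy', [1, 2])
```

The first label corresponds to the debugging and cross-check case, the second to offline analysis where TFs are independent, and the third to what is acceptable only for online processing under back-pressure.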
@bvonhall I absolutely agree with you…