Questions about the processing of pedestal data for MCH

aferrero · February 16, 2021, 8:30am

Dear experts,

MCH plans to regularly collect pedestal data that will be used to assess the status of the readout and to identify dead and/or noisy channels that need to be masked for physics data taking.

The pedestal data consist of large, fixed-size events collected at very low trigger rates, of the order of a few Hz or a few tenths of Hz. The raw data have to be stored as-is, and cannot be converted into compressed TimeFrames because the information about the individual ADC samples would be lost. Hence my first question:

Will it be possible to save raw, non-compressed TimeFrames when collecting MCH pedestal data?

The pedestal data will then be processed to compute the mean and RMS of the pedestal values for each readout channel. This information will be used to assess whether the front-end electronics is working properly, and to identify dead/noisy channels that need to be masked for physics data taking.
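For reference, the per-channel quantities can be written as below. This assumes that "RMS" denotes the spread of the pedestal samples around their mean (the usual meaning for pedestal noise); $x_{c,i}$ is the $i$-th ADC sample recorded for channel $c$ and $N_c$ the number of such samples, notation introduced here only for illustration:

```latex
\mu_c = \frac{1}{N_c}\sum_{i=1}^{N_c} x_{c,i},
\qquad
\sigma_c = \sqrt{\frac{1}{N_c}\sum_{i=1}^{N_c}\left(x_{c,i}-\mu_c\right)^2}
         = \sqrt{\frac{1}{N_c}\sum_{i=1}^{N_c} x_{c,i}^2 \;-\; \mu_c^2}
```

The second form means that only $N_c$, $\sum_i x_{c,i}$ and $\sum_i x_{c,i}^2$ have to be kept per channel while the data are streamed; if those sums are kept in floating point, the subtraction can lose precision, and the first form (or an online method such as Welford's algorithm) is numerically safer.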

The extraction of the channel-by-channel pedestal mean/RMS values cannot be parallelized in the usual way, because all the data of a given channel must be processed by a single process. It is therefore not possible to parallelize the computation by dispatching different TimeFrames to different computing nodes.
It would, however, be possible to achieve some parallelization by dispatching the data from different CRU links to different nodes, making sure that each node always gets the data from the same link. So my second question:

Will it be possible to parallelize the processing such that each node gets all the data from a given CRU link?

The result of the pedestal processing will be some histograms that show the overall status of the readout and possibly highlight the portions that do not work properly, plus a table of readout channels to be masked, with three columns like this:

LinkID    BoardID    ChannelID

This table should be stored both in the CCDB (so that the reconstruction knows which channels were disabled at a given time) and in the DCS DB (the table of disabled channels will be read each time the front-end is configured for physics data taking).
I guess that the O2 processing will write the table into the CCDB, and then the table will be replicated into the DCS DB.

What is the current status of this CCDB <-> DCS DB replication? When do you think we could foresee some first tests?

When this table is read by the MCH WinccOA software, we will need to perform some mapping that converts the LinkID values into (CRU, LINK) pairs, since this is how we address the front-end electronics in ALF-FRED. My understanding is that this mapping will be maintained by the central system, as it represents the actual wiring of the CRU optical links.

Is that really the case? Do we already know where this mapping will be stored, and in which form? At the beginning we can use our own mapping for the tests, but it would be nice to get prepared for the final solution…

The next doubt I have is about the preferred way to implement the processing of pedestal data, which is both a data quality check task and a calibration task. One possibility would be to include all the code in QC; however, my understanding is that people prefer to have a dedicated O2 workflow for calibration tasks. We could then put all the processing into a dedicated DPL workflow, and simply send the channel-by-channel values to the QC for visualization.

What is your recommendation regarding the implementation?

Finally, is there an example that shows how to write a table into the CCDB from an O2 workflow and/or a QC task?

Thanks for your patience in reading this long message!

Ping @costaf @bvonhall @pkonopka @rkuriako @vmcb @laphecet @ppillot @seperrin @bond as they probably have some words to say… please feel free to ping other relevant people that I forgot!

pkonopka · February 17, 2021, 9:32am

Hi!
I can’t say much about the calibration-related questions, but as for the QC:

Yes, I would recommend a separate O2 workflow.
QC allows applying Checks to objects coming from external tasks (see QualityControl/doc/Advanced.md at master · AliceO2Group/QualityControl · GitHub). It is not clear to me how we are going to execute it in the production setup, but we will find a way.

If you don’t need Checks, you should still be able to use the same mechanism just to store objects in the QCDB. I would not advise storing them by hand with CcdbAPI, because the directory structure in the QCDB might change, and then you would have to follow that change in your code.

Not much, I suppose. You could take inspiration from the Digit Task of FT0, which uses a TTree to store some events (if TTree is the direction you want to take for storing tables).

zampolli · February 17, 2021, 9:59am

Hello,

I will let @bond answer in more detail on the status of the CCDB <–> DCS DB link. In general, the plan is that the DB will be able to read from the CCDB, and that the detector DCS machines will also be able to send the configurations as zmq messages, to be picked up by a dedicated proxy (still to be written).

As for examples of writing to the CCDB, there are several available, e.g. in TOF:

The CCDB objects are prepared and then sent to a ccdb-populator, which takes care of uploading them to the CCDB. Note that QC should not upload any calibration to the CCDB (I am stressing this since there have been some misunderstandings about it).
One further comment: we are changing the interface to the ccdb-populator right now, due to some limitations of the current one, but if you upload only one object, it will be fine.

Chiara

shahoian · February 17, 2021, 10:24am

Hi @aferrero

For (1): do you need to store uncompressed data routinely, or are you considering processing them online? For the regular processing task the latter should be preferred. If you need to store these data, I think it is better to define a dedicated CTF format for them.

For the processing itself, the options are either to run the processing on the FLPs (which guarantees that all the data of the same channel are seen by the same processor) or to use a special DataDistribution topology which allows sending the data of a certain FLP, or of a group of its STFs, to the same EPN.
The ITS threshold scan has a similar requirement (the data from multiple injections of the signal into the same group of channels need to be processed on the same device). You may contact Markus Keil from ITS; I think they have already discussed such a topology with @gneskovi .

Concerning the calibration output: this should be done from O2 rather than from QC. Why don’t you use a dedicated object with a vector of Link/BoardID/ChannelID to store this table? (@pkonopka, I had not noticed that FT0 stores trees in the CCDB; I would avoid doing this, since memory-based trees are a source of problems.) For the storage from O2 to the CCDB there are plenty of examples, just grep for storeAsTFileAny. Note that the CCDB objects which require aggregation of data from different processors (EPNs) are supposed to be produced via this mechanism: AliceO2/Detectors/Calibration at dev · AliceO2Group/AliceO2 · GitHub, ending up in the CCDBPopulator (not yet fully functional).

laphecet · February 17, 2021, 10:39am

From O2 side you have the TOF example in AliceO2/Detectors/TOF/calibration at dev · AliceO2Group/AliceO2 · GitHub

One key idea being that the upload to the CCDB per se is handled by the ccdb-populator device, not by your device (see e.g. what I’ve started for DCS->CCDB for MCH : AliceO2/Detectors/MUON/MCH/Conditions at dev · AliceO2Group/AliceO2 · GitHub)

gneskovi · February 17, 2021, 9:48pm

Hi @aferrero

Exactly. For these types of calibration runs you’ll need to specify how many EPNs you require, and define sets of FEE IDs to be sent to each EPN node. The data will be available in the form of TFs for your calibration workflow.

aferrero · February 18, 2021, 8:19am

Dear all,

thanks a lot for the prompt and detailed input! I will start drafting an O2 workflow for the pedestal calibration, based on the information I got here.

@gneskovi the dispatching of data based on the FEEID seems the right way to go for us.

aferrero · February 22, 2021, 7:29am

Ok, got it. Concerning the actual table storage format, the key point is that it should be easy to load in WinccOA. @rkuriako @laphecet any idea of a convenient format that can be easily handled both in O2 and in WinccOA?

aferrero · February 24, 2021, 2:58pm

@laphecet I started to have a look at the MCH and TOF examples, and at the ccdb-populator device. However, it is not clear to me how the merging of the outputs from the various EPNs will be handled. The CCDB populator seems to be mostly an interface to the CCDB API. Am I missing something?

shahoian · February 24, 2021, 3:06pm

The aggregation of inputs from different EPNs will be done with input/output proxies, but at the moment this does not work as expected; there might be more news next week.

aferrero · February 24, 2021, 3:21pm

@shahoian ok, got it. Does this mean that I can take the AliceO2/Detectors/TOF/calibration/src/LHCClockCalibrator.cxx code as a starting point, and it should be more or less compatible with the EPN merging?

One big difference I see with respect to the calibration documentation and the TOF example is that in the use case I am describing here we will not have TF-dependent calibration values, but a single table of bad channels that is the result of the processing and merging of several TFs from special pedestal runs. Such a table will be updated periodically by taking new pedestal data, probably every (few) fills.

So we do not have “slots”, and we also do not have “input calibration data”. Instead, we have workflows that take RAW data as input and generate a table when the end of the input data is reached. Several equivalent workflows will run on different EPNs, getting TimeFrames from specific FEE IDs and generating (sub)tables of bad channels from those FEE IDs at the end of the processing. Then, the individual tables from the various EPNs need to be merged into a single table before being written to the CCDB.

Looking at the existing code I was not able to find an example similar to what I am describing, but maybe I should dig further. Otherwise, is this documentation also valid for what I am describing?

Thanks!

zampolli · February 24, 2021, 3:44pm

Ciao Andrea,

TOF also has the ChannelCalibration, which creates a single calibration only at the end of the processing; see:

AliceO2Group/AliceO2/blob/dev/Detectors/TOF/calibration/testWorkflow/README.md

# TOF calibration workflows

## DCS DP processing:

Local example workflow with a local CCDB (running on port 8080):

This will read the list of DPs to be associated to TOF from CCDB (remove
`--use-ccdb-to-configure` if you don't want this and prefer to use hardcoded
aliases). You can also specify the path of the CCDB with `--ccdb-path`.
You can also run in verbose mode (`--use-verbose-mode`).

```shell
o2-calibration-tof-dcs-sim-workflow --max-timeframes 3 --delta-fraction 0.5 -b |
o2-calibration-tof-dcs-workflow --use-ccdb-to-configure -b |
o2-calibration-ccdb-populator-workflow --ccdb-path="http://localhost:8080" -b
```

It is different, in the sense that it receives the calibration data at every TF.
Your use case simply needs to merge the outputs of the different workflows at the end of the processing, right? So it should be even simpler, because you do not have the TimeSlot complication.

Chiara

aferrero · February 24, 2021, 4:10pm

Thanks Chiara, I am now also looking into the TOF channel calibrator. However, I still have an issue with the flow of the calibration data. Quoting from the O2 documentation:

“The calibration flow of O2 foresees that every calibration device (expected to all run on one single aggregation node) will receive the TimeFrames with calibration input from every EPN in an asynchronous way. The calibration device will have to process the TFs in time intervals (TimeSlots) which allow to create CCDB entries with the needed granularity and update frequency (defined by the calibration device itself).”

If I understand correctly, there is ONE SINGLE calibration device that gets data from workflows running on the various EPNs, each workflow processing a subset of the TimeFrames. The merging is performed on a time basis, creating “slots” that combine a number of TFs together.

In our case the merging we have to perform is not time-based but topology-based: each EPN sends data from a different portion of the MCH system, and the data will arrive at the calibrator asynchronously.

Probably in this case my calibrator should collect all the “chunks” from the various EPNs, do the merging and send the result to CCDB when all the expected chunks have been received. One would then need to set a reasonable timeout if some pieces are not arriving.

I am wondering if there is already a mechanism in place for this, or if we should implement the merging and timeout logic from scratch…

Thanks!

shahoian · February 24, 2021, 6:58pm

@aferrero as Chiara wrote, the time slot may be also a single container limited only by the run duration.
On every EPN you will process multiple TFs of the same FEEID and send the result of the processing for aggregation, right? What triggers the sending from a given EPN?
The messages from different EPNs will arrive under some TFID (even if you don’t care which) with similar DataOrigin/DataDescription and the SubSpec being e.g. the FEEID. Then, if your aggregator subscribes to these inputs with a wildcarded SubSpec, it should be irrelevant that some messages may have the same TFID: they will all fit into the definition of an infinite time slot. The only thing I am not sure about is how the end of run (or end_of_stream in the DPL sense) can be communicated to the aggregator, so that it sends its output to the CCDB populator.

zampolli · February 24, 2021, 9:45pm

Hello @shahoian ,

Your last question about the end of stream is general, right? Because when one simulates the EPNs data going to the aggregator, this is not an issue.

Is it the same as here: https://alice.its.cern.ch/jira/browse/O2-2075?

Chiara

shahoian · February 24, 2021, 10:07pm

@zampolli Yes, it is general, thanks for pointing me to this ticket.

aferrero · February 25, 2021, 9:45am

Yes!

The end of the pedestal run. At the moment I am subscribing to CallbackService::Id::Stop for this.

Good point…

Thanks for the link, I will follow the discussion…

An alternative approach, which would probably be more similar to what is already done for other detectors, would be to only perform the RAW decoding on the various EPNs, and send to the calibrator some special “fat” digits containing the individual ADC samples. The computation of the mean and RMS of the pedestals would then be performed by the calibrator itself, which would receive blocks of digits (one block per TF) from the various processors. The order in which the digits arrive is not important, so we can still consider a single “infinite” time slot.

The calibrator would then send the output to the database when the end of stream is reached, hence I guess the discussion in the JIRA ticket mentioned by Chiara is relevant also for our case, right?

I will wait for @laphecet and @ppillot to step into the discussion, to see which approach they prefer.

Thanks!

shahoian · February 25, 2021, 10:07am

This should be OK, provided the total digit data rate is not too high.

aferrero · February 25, 2021, 10:15am

The instantaneous rate will be quite high, since each individual channel in the system will send a digit with up to 20 ADC samples, but we will take data at very low trigger rate (few Hz) so the average data rate can be kept low.

shahoian · February 25, 2021, 10:30am

OK, then there should be no problem sending digits directly to the aggregator, especially in a pedestal run, where MCH is the single user (?) of the resources.