Snapshots in QC?

jgcn · April 27, 2020, 6:39am

Dear Experts

A question was raised in a recent MFT meeting. For some of the
histograms we would like to have ‘snapshots’ taken at regular
intervals. (The length of the time interval should depend on the
running conditions: pp, Pb-Pb, …)

The idea is the following. I start from my understanding that QC
keeps accumulating the data in our histograms from the beginning to
the end of a run. In a given chip we have ~0.5 M pixels.
If one pixel dies in the middle of a (several hours) long run we
may not notice it and later we will have no tool to know from which
moment it stopped working. Thus, for the pixel maps, we would like to
force their storing and resetting at fixed intervals. This is what I
called a snapshot (sorry if I use the term incorrectly, please advise
on the right naming convention for such a thing).
If we have snapshots say every second hour, later on we can create
easily the cumulative maps of a whole run (or period), but we can
also have more time granularity to detect problems at the pixel level
and account for them later on when simulating our data.
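
A minimal sketch of the idea above, assuming each snapshot is a small 2D list of per-pixel hit counts that was reset at the start of its interval. The names `cumulative_map` and `first_silent_snapshot` are hypothetical and not part of QC; the sketch only shows that the whole-run map can be rebuilt by summing snapshots, and that the snapshots let one bracket the moment a pixel went silent. A low-occupancy pixel may also record zero hits by chance, so a real check would need a statistical criterion.

```python
from typing import List, Optional

Snapshot = List[List[int]]  # hit counts per pixel, indexed [row][col]


def cumulative_map(snapshots: List[Snapshot]) -> Snapshot:
    """Sum per-interval snapshots pixel by pixel to rebuild the whole-run map."""
    rows, cols = len(snapshots[0]), len(snapshots[0][0])
    total = [[0] * cols for _ in range(rows)]
    for snap in snapshots:
        for r in range(rows):
            for c in range(cols):
                total[r][c] += snap[r][c]
    return total


def first_silent_snapshot(
    snapshots: List[Snapshot], row: int, col: int
) -> Optional[int]:
    """Index of the first snapshot from which the pixel never fires again.

    Returns None if the pixel still has hits in the last snapshot.
    """
    last_active = -1
    for i, snap in enumerate(snapshots):
        if snap[row][col] > 0:
            last_active = i
    candidate = last_active + 1
    return candidate if candidate < len(snapshots) else None


# Three snapshots of a toy 2x2 "chip"; pixel (0, 1) stops firing after the first.
snapshots = [
    [[3, 2], [1, 4]],
    [[2, 0], [1, 5]],
    [[4, 0], [2, 3]],
]
run_map = cumulative_map(snapshots)
dead_from = first_silent_snapshot(snapshots, 0, 1)
```

With two-hour snapshots, `dead_from` would localize the failure to a two-hour window, which the cumulative map alone cannot do.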

The question is whether such a feature exists, or is foreseen, in QC for a subset of the histograms.

any comments are welcomed

thanks a lot and have a nice day

guillermo

bvonhall · April 27, 2020, 7:36am

Dear Guillermo,

Great question !

Let me see if I understand correctly. Some plots need to be reset from time to time, such as the pixels map, otherwise one would never see when a pixel dies in the middle of a run.

First of all, we keep a “snapshot” of the objects in the CCDB every hour or so (for those folders that we keep). Of course, this does not solve the problem that having a lot of data in your plot makes it difficult to identify that there is a problem (imagine for the shifter during a run).

You could therefore reset your histograms every cycle (or every now and then) in your Task.

Finally there is the post-processing to create a trending plot that would trend the hits every X minutes or hours or every cycle.

With that in place, you can see during a run already that the number of hits has dropped to zero at a given moment.

Does that answer your question ?
Cheers,
Barth

jgcn · April 27, 2020, 11:43am

Hi Barth

thanks for your answer. I need three clarifications
Below I have in mind standard running conditions during data taking periods for physics analyses.

(1)
“we keep a “snapshot” of the objects in the CCDB every hour or so”
Is this for all objects and all of them will be permanently stored in the DB?
or are all the snapshots discarded at the end of the run and only the final object is stored?
(2)
“You could therefore reset your histograms every cycle (or every now and then) in your Task.”
Where is the length of the cycle defined? We would need to reset the histograms depending on general conditions (e.g. pp vs Pb-Pb) but maybe also on the ‘location’ of the chips (those closest to the beam pipe may get more counts than those elsewhere, for example). Can these cycles then be used ‘selectively’? That is, do we have full access to our histograms to reset them as/if we wish?
(3)
I do not understand whether resetting at the end of a cycle automatically triggers the permanent storing of the object in the DB (before the resetting of course) or whether a specific command should be issued.

thanks a lot

guillermo

bvonhall · April 27, 2020, 11:58am

Dear Guillermo,

(1)
It depends. We asked in the past each detector to say whether they wanted their objects to be stored for short-term (till end of run), mid-term (a month), long-term and forever. Replies were mixed and we need a bit of everything. It will be configurable.

(2)
The cycle’s duration is defined in the qc config file:

        "cycleDurationSeconds": "10",

In production we assume a cycle of 60 seconds. It is simply the period during which we give data to the Task (which updates the histos) until, at the end of the cycle, we push the new version of the histograms to the checkers.
For your use case it seems that you will have to determine yourself in the task when is a good time to reset the histograms, independently.
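
As an illustration of deciding the reset time inside the task, independently of the cycle length, here is a small framework-independent sketch. `SnapshotResetter` is a hypothetical helper, not a QC class; it assumes the task can read a clock at the end of each cycle and keeps one resetter per group of histograms, so inner and outer chips can use different intervals. Where exactly the actual histogram reset should be called relative to publication is not addressed here.

```python
class SnapshotResetter:
    """Tells the caller when a histogram's snapshot interval has elapsed."""

    def __init__(self, interval_seconds: float, start_time: float = 0.0):
        if interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")
        self.interval_seconds = interval_seconds
        self.last_reset = start_time

    def should_reset(self, now: float) -> bool:
        """Return True (and restart the interval) once the interval has passed."""
        if now - self.last_reset >= self.interval_seconds:
            self.last_reset = now
            return True
        return False


# Hypothetical setup: 60 s cycles, inner chips reset every 2 minutes,
# outer chips every 4 minutes (short values chosen to keep the example small).
inner = SnapshotResetter(interval_seconds=120)
outer = SnapshotResetter(interval_seconds=240)

cycle_ends = [60, 120, 180, 240]  # seconds since start of run
inner_resets = [inner.should_reset(t) for t in cycle_ends]
outer_resets = [outer.should_reset(t) for t in cycle_ends]
```

In a real task the interval could be chosen from the running conditions (pp vs Pb-Pb), and the reset would be performed wherever `should_reset` returns True.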

(3)
It does not. At the end of the cycle we push the histograms forward to the checkers, which store them in the database. Then the database, independently, cleans up these many histograms by keeping only 1 per hour, or 1 per run, depending on what the detector people want.

Cheers,

jgcn · April 27, 2020, 1:29pm

Hi Barth

thanks a lot. Things are clearer now. Point (3) is a bit worrisome (if I understand it correctly).
One would need a mechanism to synchronize the resetting with the storage in the database once per hour (for the selected histograms, the others are fine once per run).

Otherwise I can imagine that the resetting happens just before the storing and then we get a reset (empty) histogram stored permanently …

Could it be possible to add somewhere some command to be sent to the DB that triggers the storage or to get from the database some flag when the histos are stored so that we use the next end of cycle to reset?
Or do you intend by design that the database performs this operation completely independently of the QC?

have a nice day

guillermo

bvonhall · April 27, 2020, 1:54pm

Hi Guillermo,

Things in O2 are asynchronous and we need to keep it this way I fear. However, I don’t think that it will prevent us from doing what you need.

Your task updates the histograms. You programmed it to reset them, maybe according to some criteria, probably in EndOfCycle. At the end of a cycle, the framework calls your EndOfCycle and then pushes the histograms to the checker, which checks them and stores them in the database. Each time, we get a new version.

Then, the post-processing, based on your criteria (every cycle / minute / custom !) will pick up the latest versions and will update the trending plot.

A script that has no knowledge of anything but the database goes through the versions of the objects (there are very many, as we publish one every cycle) and removes some based on a number of policies. At the moment we have one policy that removes all but 1 per hour and one policy that keeps only 1 per run. We can have more. We could have one that uses metadata that you would have set in your task when publishing the object to decide whether or not we must remove the object. The policies are applied on folders in the database.
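
To make the policy idea concrete, here is a rough sketch of a "1 per hour" rule combined with a metadata flag. It is not the actual cleanup script: the `Version` type, the metadata key `keep`, and the choice to retain the first version of each hour are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Version(NamedTuple):
    timestamp: float          # seconds since start of run (assumed unit)
    metadata: Dict[str, str]  # metadata set by the task when publishing


def one_per_hour(versions: List[Version], keep_key: str = "keep") -> List[Version]:
    """Keep the first version in each hour, plus any version flagged in metadata."""
    kept: List[Version] = []
    seen_hours = set()
    for version in sorted(versions, key=lambda v: v.timestamp):
        hour = int(version.timestamp // 3600)
        flagged = version.metadata.get(keep_key) == "true"
        if flagged or hour not in seen_hours:
            kept.append(version)
        seen_hours.add(hour)
    return kept


# The task flags the last version before a reset, so it survives the cleanup.
versions = [
    Version(0, {}),
    Version(60, {}),
    Version(3000, {"keep": "true"}),  # hypothetical flag set before resetting
    Version(3600, {}),
    Version(3660, {}),
]
kept_timestamps = [v.timestamp for v in one_per_hour(versions)]
```

The point of the flag is that the task, which knows when it resets, can mark the full pre-reset histogram without the database script needing to be synchronized with the task.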

So, all in all, I don’t think that we will encounter problems due to the asynchronicity.
Cheers,
Barth

jgcn · April 27, 2020, 2:01pm

Hi Barth

thanks a lot for the explanations!

We will then come back to this later on when real data taking is closer (and our system is more developed) to discuss the implementation details.

have a nice day
guillermo