Various questions regarding the QC framework

aferrero · August 1, 2019, 4:28pm

I have started to do more detailed experiments with the QC framework for the muon chambers, with the aim of preparing some diagnostics tools for the commissioning.

While developing the code, some questions came to my mind:

the QC analysis code needs to know the status of the electronics, and in particular which FECs and/or links are enabled/disabled, in order not to process data that is known to be missing or corrupted. I guess that such information will be stored in some database that is shared with the DSC, as it also needs to know which FECs have to be configured and which ones should be skipped/disabled.
Is this mechanism already implemented and available in QC?
Most likely, some QC tasks will only be run for specific data taking conditions. For example, in our case the analysis of the pedestal values only makes sense for special “calibration” runs. Is there a mechanism in QC to retrieve the run conditions, and adapt the analysis accordingly?
During the code development I have re-named few times some histograms, and now they all appear in the web interface. Is there a simple way to remove the obsolete ones?
Some of my histograms are 2D, and I have tried to set the draw option to “colz” in the beautify method of the checker code, however this setting does not seem to propagate to the web display, where the histogram is still shown with the default draw option. Is this expected?

Thanks a lot in advance!

Cheers, Andrea

pkonopka · August 2, 2019, 12:44pm

Hello Andrea,
Thank you for the feedback!

As for now, we don’t have such a mechanism in QC. We will have to think if it is possible to somehow get this data from DCS and how to do it. I have created a JIRA issue and added you as a watcher.

We were planning to make possible to access CCDB data as Tasks or Checks inputs, though this is not in place yet. You can follow this JIRA issue.
@eulisse I see that there is a Lifetime::Condition present in DPL, is that already usable?

AFAIK, there is no user interface for that. Could you @bvonhall confirm when you are back?

Could you please confirm that it works locally, for example by saving the histogram as an image on drive and checking if the option is applied then? If it is, then probably it is a bug in QCG indeed.

Best regards,
Piotr

eulisse · August 5, 2019, 8:16am

Lifetime::Condition should work, albeit in a non optimised way. If someone could try it out with a real workload and provide feedback, I can try to address the issues which come up.
Simple test at:

github.com

AliceO2Group/AliceO2/blob/dev/Framework/Core/test/test_SimpleCondition.cxx

// Copyright CERN and copyright holders of ALICE O2. This software is
// distributed under the terms of the GNU General Public License v3 (GPL
// Version 3), copied verbatim in the file "COPYING".
//
// See http://alice-o2.web.cern.ch/license for full licensing information.
//
// In applying this license CERN does not waive the privileges and immunities
// granted to it by virtue of its status as an Intergovernmental Organization
// or submit itself to any jurisdiction.
#include "Framework/DataRefUtils.h"
#include "Framework/AlgorithmSpec.h"
#include "Framework/ServiceRegistry.h"
#include "Framework/runDataProcessing.h"
#include <Monitoring/Monitoring.h>
#include "Framework/ControlService.h"
#include "Framework/CallbackService.h"
#include "Framework/Logger.h"

using namespace o2::framework;
using DataHeader = o2::header::DataHeader;

This file has been truncated. show original

very minimal documentation at:

http://aliceo2group.github.io/AliceO2/d8/dd2/structo2_1_1framework_1_1LifetimeHelpers.html#a8f715bc5c1ad1298b8cb5afdad3abb25

let me know if this can get you started.

aferrero · August 5, 2019, 9:04am

As far as I understand, making a real test will involve few additional things, in particular a DB that should be queried, a way to add/change the information about running conditions in the DB, etc… How much of this already exists in AliECS? What about the CCDB?

eulisse · August 5, 2019, 9:15am

There is a test instance from @grigoras which you can use, but at the moment I cannot find the url.

pkonopka · August 5, 2019, 3:01pm

Also, the FLP suite installs a CCDB instance on the machine.

aferrero · August 5, 2019, 9:36pm

@pkonopka Does it mean that one can configure, access and modify a local instance of the CCDB? Are there any instructions for this?

@grigoras could you maybe provide me some guidelines on how to get started? We could also sit together if you are at CERN…

Thanks!

grigoras · August 6, 2019, 6:26am

Ciao Andrea,

I’m not at CERN in the next couple of weeks. Until then you have two options to run your own CCDB instance:

a simple, file system-based service: https://docs.google.com/document/d/1_GM6yY7ejVEIRi1y8Ooc9ongrGgZyCiks6Ca0OAEav8/edit
the full fledged, SQL-based version like the FLP installation for which the instructions to install being here: https://alice.its.cern.ch/jira/browse/O2-752

Alternatively you can simply point to the test instance at ccdb-test.cern.ch:8080 .

Cheers,

.costin

aferrero · August 6, 2019, 12:13pm

@grigoras @pkonopka @eulisse thanks for the clarifications! I also had some further discussion with the DCS team, and I now understand that the DCS will use an ORACLE database to store the informations, including the FE electronics configuration.

However, I still do not have a clear picture of where the various bits of information will be stored, and who will talk with what… let me try to give a simple example to explain my doubts.

In the MCH case we have a large number of front-end chips (SAMPA) that can be configured individually. Some of those chips (or even individual channels within one chip) might be disabled for various reasons. This means that somewhere we will need to store some sort of “chip enable” mask, that the DCS will use to determine which chips must be configured and which must be skipped. The obvious place for such information seems to be the ORACLE DCS database.

However, when running the QC over the collected data we also need to know which FE chips are disabled and therefore do not send valid data, so that the corresponding channels can be safely ignored in the QC analysis. Is there an existing communication channel between the QC and the DCS database? Is it foreseen in the future? Or should the information be duplicated in some way to be accessible by the QC?

Thanks!

pkonopka · August 6, 2019, 2:36pm

That database instance was indeed designated for QC and QC GUI, so I am not really sure myself if using it for other purposes would break something, for example in the GUI. Again, I have to let @bvonhall answer.

At the moment we don’t have a way to access the DCS database in QC. As you clearly gave a valid reason to have it, we will try to figure out something.

@eulisse What do you think about having something similar to the aforementioned Lifetime::Condition, but Lifetime::DCS or Lifetime::OracleDB, which would query their database? I can give it a try, though not straight away.

@aferrero Out of curiosity and concerns about the impact on performance - Is that kind of data from DCS expected to change frequently? Is it enough to load it once, at the beginning of a run? What would be the size of such a “chip enable” mask?

aferrero · August 6, 2019, 2:43pm

I guess this information will only change at the beginning of a run. We have a bit more than 30 000 chips, and the information could be arranged as bit masks, so the size might be of the order of 5kB if I am not wrong…

aferrero · August 7, 2019, 6:49am

@grigoras @pkonopka @eulisse @bvonhall
I would have another DB-related question, while we are at it…

At least in the MCH case, we foresee different running modes (pedestals, physics triggered, physics continuous). Each running mode requires a different FE chip configuration, as well as a different analysis path in QC.

In other words, the information about the running mode should be accessed both by the DCS (ton properly configure the electronics) and the QC (to adapt the analysis code path).

However, I guess neither the DCS nor the QC are in charge of setting the running mode, they will just read the value. Also, to me such information does not seem to belong to the DCS database.

Is there already an agreement on how to handle this? Are there similar requirements from other detectors?

Thanks!

eulisse · August 7, 2019, 7:42am

@pkonopka yes, I was also thinking about this. Lifetime::DB? Then DCS or ORACLE could be part of the configuration. In principle all the bits (apart from the caching) should be there. You can write your own fetchFromOracleDB mimicking fetchFromCCDBCache and then modify DeviceSpecHelpers.cxx to take care of the new kind of lifetime.

pkonopka · August 7, 2019, 9:01am

Hmm, I can’t say for DCS, but for QC, wouldn’t we just need to run different workflows depending on running mode? AFAIK, that will be the case for the main processing as well.
Workflows for data taking QC and e.g. pedestal runs QC could consist of different QC Tasks and Checks, or the same ones, but with different configuration parameters. They would be started and stopped by a shifter together with the main processing or calibration workflows. Please someone correct me if this is not the plan.

bvonhall · August 12, 2019, 7:16am

The GUI shows what is stored in the database. In the newest version of the QCG, there is an “online” mode that shows only the tasks currently running and the objects currently being published. This option will already clarify the view for the user. As for the deletion, we are still thinking what is the best way as it can lead to disasters.

bvonhall · August 12, 2019, 7:20am

The QC uses the CCDB technology for its repository. In production, it will be a different instance than the CCDB where conditions and calibrations are stored.

Today, users can either use the central instance called ccdb-test or they can install a local instance. If the user installs an FLP using ansible they will have a local instance. There are instructions, provided by Costin, on how to install locally the CCDB. Then one could just update the URL in the config file.

bvonhall · August 12, 2019, 7:20am

I would expect the Control (or CCML at large) to be in charge of that.

sheckel · November 4, 2019, 4:17pm

Dear all,

we have a few questions concerning the QC publishing and found two of them discussed already here, i.e. questions 3 and 4. So we would like to come back to these points:

Deleting outdated histograms in the QCG: If we understand correctly, outdated histograms cannot be deleted by the user, but there should be an “online” mode only showing the currently produced and updated histograms. How can the “online” mode be activated? Will it be possible to delete single histograms from the QCG webpage in future?
Drawing options (like “colz”) are not propagated to the histograms published on the webpage: We observe the same behaviour. We tried both SetDrawOption(“colz”) and SetOption(“colz”) within an O2 QC task (during initialization of the histograms or alternatively in the function, where tracks are processed and the histograms filled), but they both get ignored. Other changes of the histogram actually do work, e.g. GetYaxis()->SetTitleOffset(). If instead of a QC workflow we use a simple root macro to run the O2 QC task and fill and draw the histograms, the SetOption works fine, so it indeed seems to be an issue of the QCG.

We also would like to add one question:

On the qcg webpage, the histograms are added in exactly the opposite order as they are initialized and filled. Is there a specific reason for this behaviour and can it be changed?

Thanks a lot for any help!

Cheers,
Stefan (for the TPC-QC team)

bvonhall · November 5, 2019, 1:55pm

Dear Stefan,

Thank you for your message. I will try to answer your questions.

Objects deletion
A more complex topic than I thought. I amend slightly what I said previously.

A new version of each object is published at each cycle (e.g. 10s, 1m, …). To avoid filling up the test database, i.e. ccdb-test, which is shared by everyone, we have a cleanup script. It removes objects older than 1 day but keeps at least 1 per hour (if I remember correctly). Different parameters can be used for different folders. Algorithm to be refined as we gain experience.
Objects can be deleted in ccdb-test or your local instance using the REST api. You could use curl per instance, as described in Costin’s slides. Please be careful as there are no restrictions for the time being.
The production CCDB will not allow for deletion by users via the REST api. To be seen what the procedure will be.

Online mode

I proposed it earlier in this thread as an idea to have “less noise” in the GUI. Its real purpose is to get a look at what is currently ongoing at P2 or in a lab.
The online mode is not enabled in qcg-test as it does not belong to P2 or a lab.
We could look into enabling it but it would require to open at least 1 extra service to the outside world.
Alternatively, one could install the GUI locally.

QCG

Drawing options: This is probably a bug, I have created a ticket here.
Order: It is certainly fixable. I’ll let Adam and George comment in this ticket.

Best regards,
Barth

sheckel · November 6, 2019, 10:48am

Dear Barth,

thanks a lot for your replies! I have a few more questions to those.

Objects deletion

If I understand correctly, in principle old objects should be deleted automatically from the webpage latest after one day, if they are not updated again. That is actually not, what we observe. We see, that they stay on the webpage, even if not used for longer time anymore. Maybe the cleanup script just always keeps the last version of all plots? In any case, I will try the cleaning using the REST api.

Online mode

If the deletion of unused histograms works (either by the cleanup script or by ourselfes), I think the online mode is not needed for the qcg-test.

QCG

Thanks for opening the JIRA tickets, I will comment there.

Best regards,
Stefan