Customizable command line options for PostProcessingTask?

tklemenz · February 10, 2021, 2:57pm

Hello experts,

I have a question concerning postprocessing tasks. Is it possible to add command line options to hand some parameters to a post processing task? I know there is a mechanism with customize functions for workflows, where a ConfigContext is handed around. Tbh I don’t fully understand how this is implemented.

But the question is if there is something similar one can use for postprocessing tasks. I know there is a configure function but I do not know what exactly it does or how it works and I think it cannot add command line options. As far as I understand they are implemented in runPostProcessing.cxx and cannot be customized easily for individual tasks. Could someone give me some insight?

Thanks a lot!
Cheers,
Thomas

pkonopka · February 10, 2021, 3:33pm

Hi,
customize and ConfigContext are only relevant to DPL, but Post-processing tasks can run with and without it, so it is not quite possible to access command line options.

However, the configure method gives you your task name and a full configuration tree which is taken from the configuration file you use. So you can add some parameters of your choice in your task configuration structure.

You can see how it is done for other tasks. For example, this is how Trending Task reads the configuration from such a ptree. Here is an example file which it reads.
Cheers

tklemenz · February 10, 2021, 4:02pm

Ah, thanks for the hint. So I would basically need to make MyTaskConfig.cxx where I read the configuration from the file and after calling the configure function in MyTask.cxx one can access the config values.

I actually have another question now. I don’t know if it is better to make a new thread. Just tell me to put it in a new one if I should do so.
The task I am using is made to look at a number of objects in the CCDB. It fetches them, makes nice canvases which show the data and publishes the canvases to the QCG.
I would ideally want the task to update every time either one of the objects is changed in the CCDB and if one is changed it should only update the plots for that object. I know there is an option to trigger update upon one object changing with e.g. newobject:qcdb:qc/TST/MO/QcTask/example but is it possible to put multiple objects in the update trigger and then check within the task which object actually triggered the update?

Cheers,
Thomas

pkonopka · February 10, 2021, 4:18pm

You don’t necessarily have to make a structure for the configuration, you can read the ptree directly in your task. This is just how I did it to have cleaner code.

You should be able to put multiple triggers in the update list, for example:

        "updateTrigger": [
          "newobject:qcdb:qc/TST/MO/QcTask/example",
          "newobject:qcdb:qc/TST/MO/QcTask/example2",
          "newobject:qcdb:qc/TST/MO/QcTask/example3"
        ],

Then it will trigger if any of the three were updated. However, I think it will tell you only that NewObject triggered, but not which one.
Perhaps it may work on directories as well, but I am not sure, I haven’t tried it this way.

tklemenz · February 10, 2021, 4:20pm

Ok, I see. Since the file reading part for my purpose won’t be that long this might actually be nicer.

tklemenz · February 10, 2021, 4:25pm

Do you know what would happen if I have e.g. 5 files in the trigger list, then 3 of them are updated within a few seconds but my update function, which itself takes a few seconds to run, runs over all objects every time? Does it trigger 3 times? I guess it will, right?
So the question basically is: Will triggers which arrive while the update is running be lost in the void or will they stack up in a queue?

pkonopka · February 10, 2021, 4:29pm

Hmm, yes, I think so. I am not sure yet how to approach your use case, I will get back to you tomorrow.

tklemenz · February 10, 2021, 4:29pm

Thanks a lot Piotr, this was already very helpful!

pkonopka · February 12, 2021, 8:07am

I thought a bit more about this. I have a few ideas, but none are ideal:

1. If you know that the objects of your interest are updated in regular time intervals, you could just trigger the task in similar intervals and always download the objects. If it is not very often, then I would say it is fine.
2. You can trigger the task in time intervals, but you check for the object updates yourself. The NewObject trigger code could help you with that - you could use it to create a bunch of triggers yourself and then manage them in your own way inside the task.
3. We rework the NewObject trigger to accept multiple objects, for example separated with ‘;’. However, I am not sure yet how to then pass the information about which object update actually caused the trigger, at least without changing a lot of code.
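
To illustrate the second idea, here is a minimal sketch of the bookkeeping the task could do itself. It does not use the NewObject trigger code. `ObjectChangeTracker` and its lookup callback are hypothetical names, and how the task obtains a "last modified" timestamp for an object path is left as an assumption behind that callback.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: remembers the last timestamp seen for each object path
// and reports which objects changed since the previous check.
class ObjectChangeTracker
{
 public:
  // ASSUMPTION: the task supplies a function returning the current
  // "last modified" timestamp of the object stored under a given path.
  using TimestampLookup = std::function<std::uint64_t(const std::string&)>;

  ObjectChangeTracker(std::vector<std::string> paths, TimestampLookup lookup)
    : mPaths(std::move(paths)), mLookup(std::move(lookup))
  {
  }

  // Returns the paths whose timestamp differs from the one seen last time.
  // On the very first call every path is reported, since nothing was seen yet.
  std::vector<std::string> changedSinceLastCheck()
  {
    std::vector<std::string> changed;
    for (const auto& path : mPaths) {
      const std::uint64_t current = mLookup(path);
      const auto it = mLastSeen.find(path);
      if (it == mLastSeen.end() || it->second != current) {
        changed.push_back(path);
        mLastSeen[path] = current;
      }
    }
    return changed;
  }

 private:
  std::vector<std::string> mPaths;
  TimestampLookup mLookup;
  std::map<std::string, std::uint64_t> mLastSeen;
};
```

The task would be triggered periodically, call `changedSinceLastCheck()` in its update method and redraw only the canvases of the returned paths. Several updates that arrive while a slow update is running are then simply picked up together at the next check.
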

Personally, I would advocate for the 2nd, because I already have a lot on my plate, so I can’t promise I could do 3. soon. What do you think?

tklemenz · February 12, 2021, 8:12am

Option 2 sounds fine I think. Maybe even option 1 would be sufficient for some objects.

Thanks for your help!