We are preparing some tests for the TPC QC for the upcoming Milestone Week 5 (MW5). We have already successfully tested locally (on data replay) a full reconstruction chain with one QC task each on the raw data and during reconstruction, plus several QC tasks at the final stage after reconstruction.
During MW5, the synchronous reconstruction of the TPC data will run in parallel on multiple EPNs, and in the end we would like to have the combined (merged) output of all the QC tasks. Here we have two cases and related questions:
The tasks running before or during reconstruction will have to run on all EPNs. Is it already possible to merge the output of those tasks, e.g. on a dedicated QC node?
Concerning the other tasks, we would like to ask whether it is already possible not to run them on the EPNs, but instead to collect and merge the reconstruction output from all EPNs and then run these tasks only once on a QC node?
Would you run on the stored output, i.e. files? In that case I think it is trivial, as you could even do it by hand. And of course, not running something just means removing it from the config. If you mean a proper post-processing task, we have not done that yet at P2.
Thanks for the link to the docu. I wasn’t aware of that part.
The options described above are basically:
1. Run QC tasks on multiple nodes locally, merge their outputs on a remote machine and then publish to QCG. This is covered by the docu you provided, as far as I understand (but a few questions will probably come later).
2. Collect data samples (e.g. TPC tracks) from multiple nodes (running e.g. online reconstruction) on a remote machine, run the QC task once on that machine, and then publish the QC output to QCG.
We need to think about whether option 2 is even needed: the QC task output has to be mergeable anyway for option 1, so option 2 could probably also be achieved via option 1.
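For the record, option 1 corresponds roughly to the local/remote task setup described in the QC documentation. A minimal sketch of such a task configuration is below; the machine names, port, and task/class names are purely illustrative, and the exact keys should be cross-checked against the current QC docu:

```json
{
  "qc": {
    "tasks": {
      "TPCTracks": {
        "active": "true",
        "className": "o2::quality_control_modules::tpc::TracksTask",
        "moduleName": "QcTPC",
        "detectorName": "TPC",
        "cycleDurationSeconds": "60",
        "dataSource": { "type": "dataSamplingPolicy", "name": "tpc-tracks" },
        "location": "local",
        "localMachines": [ "epn001", "epn002" ],
        "remoteMachine": "qc-node-01",
        "remotePort": "30432"
      }
    }
  }
}
```

With "location" set to "local", the task instances run on the listed machines and their MonitorObjects are sent to the remote machine for merging and publication.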
For option 1 described above:
Would it in general be possible to split a QC task into two parts, so that part A runs on the local nodes, which send MOs to the remote machine, and the remote machine then performs part B of the task on the MOs before publishing?
This would, for example, be convenient for objects whose merging needs further information from other MOs to be done correctly.
At the moment, it is not possible. We have to discuss with Piotr and see the implications. You could possibly achieve the same with post-processing, i.e. a task that would get all the sub-objects and merge them “smartly” (and we would trash the sub-objects and keep only the merged one).
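To illustrate why some objects cannot be merged in isolation, here is a toy sketch (plain Python, not QC framework code): an MO holding a per-node average can only be combined correctly if the per-node entry counts, i.e. information from another MO, are also available. That is exactly the kind of extra input a "smart" post-processing merge could use:

```python
def naive_merge(averages):
    # Wrong in general: an unweighted mean of per-node means
    # ignores how many entries each node contributed.
    return sum(averages) / len(averages)

def smart_merge(averages, counts):
    # Correct: weight each node's average by its entry count,
    # which here plays the role of the "other MO".
    total = sum(counts)
    return sum(a * c for a, c in zip(averages, counts)) / total

# Two EPNs with very different statistics:
averages = [10.0, 20.0]   # mean value per node
counts = [1000, 10]       # entries per node (separate object)

print(naive_merge(averages))          # 15.0
print(smart_merge(averages, counts))  # ~10.1
```

The naive merge is biased toward the low-statistics node; the weighted merge reproduces the value one would get from processing all entries in a single task.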