We are using QC v0.26.1 (commit 19a915e32b9c04904df19b9ae87de894f24d7197) and alidist (commit 17838c60e2ce56e24179eff8e11e74a75bcb66d2, FairMQ v1.4.18) for ITS commissioning.
-rw-r--r-- 1 its its 32 Sep 3 20:23 sem.fmq_11930162_mtx
-rw-r--r-- 1 its its 104 Sep 3 20:23 fmq_11930162_cv
-rw-r--r-- 1 its its 655360 Sep 4 12:33 fmq_11930162_mng
-rw-r--r-- 1 its its 2000000000 Sep 4 12:37 fmq_11930162_main
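The four files in this listing share one numeric ID (11930162) and differ only in their suffix. When checking several FLPs for leftover segments, it can help to group such a listing by that ID and see each segment's size at a glance. Below is a minimal Python sketch. `group_segments` is a hypothetical helper and is not part of FairMQ or QC. The sketch assumes the input is standard `ls -l` long-format output with 9 whitespace-separated fields.

```python
import re
from collections import defaultdict

# Hypothetical pattern for FairMQ-style segment names such as
# "fmq_11930162_main" or "sem.fmq_11930162_mtx" (as seen in the listing).
SEGMENT_RE = re.compile(r"^(?:sem\.)?fmq_(\d+)_(\w+)$")


def group_segments(listing: str) -> dict[str, dict[str, int]]:
    """Parse `ls -l`-style lines and return {id: {suffix: size_in_bytes}}."""
    groups: dict[str, dict[str, int]] = defaultdict(dict)
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # skip "total N" lines and anything not in long format
        size, name = fields[4], fields[-1]
        match = SEGMENT_RE.match(name)
        if match and size.isdigit():
            seg_id, suffix = match.groups()
            groups[seg_id][suffix] = int(size)
    return dict(groups)


listing = """\
-rw-r--r-- 1 its its 32 Sep 3 20:23 sem.fmq_11930162_mtx
-rw-r--r-- 1 its its 104 Sep 3 20:23 fmq_11930162_cv
-rw-r--r-- 1 its its 655360 Sep 4 12:33 fmq_11930162_mng
-rw-r--r-- 1 its its 2000000000 Sep 4 12:37 fmq_11930162_main
"""
print(group_segments(listing))
# {'11930162': {'mtx': 32, 'cv': 104, 'mng': 655360, 'main': 2000000000}}
```

If the output shows more than one ID on a machine, that may point to segments left behind by an earlier run.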
Thank you for these two log files. However, I must say that they tell me very little, aside from suggesting that the problem is related to shared memory.
I see that you are using quite old software versions. Would you consider updating them, or is this not an option now? @eulisse Were there any fixes in DPL<->FairMQ in the last few months that could have fixed this issue?
@jian Does the problem occur every time or only occasionally? Did it work before?
It occurs only occasionally; the chance is about 1 in 8.
We reverted to these old versions because we are still using RDHv4 for ITS commissioning (Readout v1.3.10-1). This workflow has been processing data since Sep. 02, and the problem has been present throughout.
The current workflow for shifters is: readout.exe (raw data replay) → StfB → raw-proxy → qc-its. We are running QC on 5 FLPs: 4 local + 1 “remote”. The problem occurs randomly on these machines. We do clean the shm by running:
before starting the workflow, and I can see that the segments are removed.
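The exact cleanup command is not preserved in this post. The following is only an illustration of the same idea, not the command used here. It assumes that the segments are files under `/dev/shm`, which is where POSIX shared-memory objects and named semaphores (`sem.*`) normally appear on Linux. It also assumes that no FairMQ device is still attached to them. `remove_fmq_segments` is a hypothetical helper, and its name pattern is inferred from the listing above.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical pattern inferred from the listing: "fmq_<id>_<suffix>" files
# plus the "sem.fmq_<id>_<suffix>" named semaphore.
SEGMENT_RE = re.compile(r"^(?:sem\.)?fmq_\d+_\w+$")


def remove_fmq_segments(shm_dir: str = "/dev/shm", dry_run: bool = True) -> list[str]:
    """Hypothetical cleanup helper: return matching file names, deleting them
    unless dry_run is True. Only use it when no device holds the segments."""
    matched = []
    for entry in sorted(Path(shm_dir).iterdir()):
        if entry.is_file() and SEGMENT_RE.match(entry.name):
            if not dry_run:
                entry.unlink()
            matched.append(entry.name)
    return matched


# Demonstration on a temporary directory instead of the real /dev/shm.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("fmq_11930162_main", "sem.fmq_11930162_mtx", "unrelated.txt"):
        (Path(tmp) / name).touch()
    print(remove_fmq_segments(tmp, dry_run=False))  # ['fmq_11930162_main', 'sem.fmq_11930162_mtx']
    print(sorted(p.name for p in Path(tmp).iterdir()))  # ['unrelated.txt']
```

Defaulting to `dry_run=True` lets a shifter first check which files would be removed before anything is deleted.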
We will update to the latest versions as soon as we run the detector with RDHv6 (hopefully in a couple of weeks).
For ITS QC task development, we keep up with the latest software versions. I faced a similar problem in July and August, although I don't remember the details clearly. It seemed that the proxy thread occasionally could not be set up when starting QC, but no crash happened. We will make sure to gather the crash information.