Reconstruction using the o2-sim, o2-sim-digitizer-workflow and o2-tpc-reco-workflow executables

Hi,

I’m trying to use the o2-sim tools to run the full chain up to reconstruction. I found documentation for o2-sim in AliceO2/doc/DetectorSimulation.md (the section on o2-sim-digitizer-workflow is blank, though) and for o2-tpc-reco-workflow in AliceO2/Detectors/TPC/workflow/Readme.md.

I am running into some issues along the way.

I am running this on an O2 installation set up with alidock on a macOS Mojave machine.
Here are the commands I run:

  • o2-sim -n 5 -g pythia8 -e TGeant4 -m TPC,ITS

This step seems to run without issues. The only message is a warning about LLVM symbols being “exposed” to CLING (see the attached log_o2-sim.txt).

  • o2-sim-digitizer-workflow -b

Here, again, the process seems to work fine. I do not get any error message apart from the same LLVM symbol warning. One thing, though: contrary to what is announced in AliceO2/Detectors/TPC/workflow (dev branch of AliceO2Group/AliceO2 on GitHub), I do not get any tpcdigits.root output.
Again, the log is attached as log_o2-digitizer.txt.

  • o2-tpc-reco-workflow -b

This time, although the executable seems to start fine, it soon stalls after

[11233:tpc-tracker]: [12:02:13][INFO] DEVICE: Running...

with the following block of messages repeating every minute, each time with an updated timestamp:

[11157:internal-dpl-clock]: [12:03:13][INFO] from_internal-dpl-clock_to_tpc-digit-reader[0]: in: 0 (0 MB) out: 0 (0 MB)
[11251:tpc-track-writer]: [12:03:13][INFO] from_tpc-tracker_to_tpc-track-writer[0]: in: 0 (0 MB) out: 0 (0 MB)
[11195:tpc-clusterer]: [12:03:13][INFO] from_tpc-clusterer_to_tpc-cluster-decoder[0]: in: 0 (0 MB) out: 0 (0 MB)
[11195:tpc-clusterer]: [12:03:13][INFO]  from_tpc-digit-reader_to_tpc-clusterer[0]: in: 0 (0 MB) out: 0 (0 MB)
[11215:tpc-cluster-decoder]: [12:03:13][INFO]  from_tpc-cluster-decoder_to_tpc-tracker[0]: in: 0 (0 MB) out: 0 (0 MB)
[11215:tpc-cluster-decoder]: [12:03:13][INFO] from_tpc-clusterer_to_tpc-cluster-decoder[0]: in: 0 (0 MB) out: 0 (0 MB)
[11233:tpc-tracker]: [12:03:13][INFO] from_tpc-cluster-decoder_to_tpc-tracker[0]: in: 0 (0 MB) out: 0 (0 MB)
[11233:tpc-tracker]: [12:03:13][INFO]  from_tpc-tracker_to_tpc-track-writer[0]: in: 0 (0 MB) out: 0 (0 MB)

From then on it keeps printing these lines every minute and nothing else happens until I stop the process. Again, the log is attached as log_o2-tpc-reco-workflow.txt.

I suspect the previous step, “o2-sim-digitizer-workflow -b”, doesn’t actually give me the right outputs.
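One way to test that suspicion is to check, between steps, that each expected output file exists and is non-empty before launching the next workflow. A minimal bash sketch: `check_outputs` is a hypothetical helper, not part of O2, and the file names are the ones mentioned in this thread (o2sim.root from o2-sim, tpcdigits.root as announced in the TPC workflow README).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of O2): report every expected output file
# that is missing or empty. Returns 0 only if all files exist and are non-empty.
check_outputs() {
  local status=0
  local f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Example chain (O2 commands commented out so the sketch runs without O2):
# o2-sim -n 5 -g pythia8 -e TGeant3 -m TPC ITS
# check_outputs o2sim.root || exit 1
# o2-sim-digitizer-workflow -b
# check_outputs tpcdigits.root || exit 1
# o2-tpc-reco-workflow -b
```

Stopping at the first missing file makes it obvious which step failed, instead of discovering it only when a later workflow waits forever for input.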

Does anyone have an idea of what I am doing wrong?

Best wishes,
Aimeric

log_o2-sim.txt (975 Bytes)
log_o2-digitizer.txt (32.5 KB)
log_o2-tpc-reco-workflow.txt (47.3 KB)

Hi @alandou
Most probably your TPC reco flow finishes its job but then keeps waiting for extra input. Normally, Ctrl-C will stop it.
Proper termination of the flows was fixed in the last couple of days; could you update O2 and retry?

Here you can find a script that runs a simulation test with all detectors, plus the reco flows for some of them.
Cheers,
Ruben

I tried again with an up-to-date version of O2, but I still have the same issue when running the TPC reco workflow.

I also tried to run the bash script you linked.
The simulation part ran without issues and finished in about a minute. The digitizer workflow, however, is taking a very long time: it has been running for more than 30 min with no visible progress (the script is apparently stuck at “Running digitization”). Is it supposed to take this long?

Best,
Aimeric

Out of interest: does it work with Geant3? I have seen problems similar to yours with Geant4, and this might be a real issue.

Hi @swenzel

I tried exactly the same 3 commands, replacing Geant4 with Geant3.
I get the same issue: the executable runs up to
[16793:tpc-tracker]: [17:05:06][INFO] DEVICE: Running...
then stalls, printing the same output every minute:

[16717:internal-dpl-clock]: [17:06:06][INFO] from_internal-dpl-clock_to_tpc-digit-reader[0]: in: 0 (0 MB) out: 0 (0 MB)
[16811:tpc-track-writer]: [17:06:06][INFO] from_tpc-tracker_to_tpc-track-writer[0]: in: 0 (0 MB) out: 0 (0 MB)
[16755:tpc-clusterer]: [17:06:06][INFO] from_tpc-clusterer_to_tpc-cluster-decoder[0]: in: 0 (0 MB) out: 0 (0 MB)
[16755:tpc-clusterer]: [17:06:06][INFO]    from_tpc-digit-reader_to_tpc-clusterer[0]: in: 0 (0 MB) out: 0 (0 MB)
[16775:tpc-cluster-decoder]: [17:06:06][INFO]   from_tpc-cluster-decoder_to_tpc-tracker[0]: in: 0 (0 MB) out: 0 (0 MB)
[16775:tpc-cluster-decoder]: [17:06:06][INFO] from_tpc-clusterer_to_tpc-cluster-decoder[0]: in: 0 (0 MB) out: 0 (0 MB)
[16793:tpc-tracker]: [17:06:06][INFO] from_tpc-cluster-decoder_to_tpc-tracker[0]: in: 0 (0 MB) out: 0 (0 MB)
[16793:tpc-tracker]: [17:06:06][INFO]    from_tpc-tracker_to_tpc-track-writer[0]: in: 0 (0 MB) out: 0 (0 MB)

@alandou your simulation is empty because of the “,” in TPC,ITS: for o2-sim the modules must be separated by spaces.
Conversely, the “,” is needed for the detector list in o2-sim-digitizer-workflow. This is indeed somewhat confusing…

I have to admit that after looking at o2-sim --help I had no clue how to separate them. I ended up mimicking the syntax shown for --skipModules in the o2-sim documentation (AliceO2/doc/DetectorSimulation.md).
This is indeed a bit confusing. Any chance it could be mentioned in DetectorSimulation.md, or even better in the o2-sim --help output?

Looking back at the o2sim.root file I created with the ITS,TPC syntax, there is indeed no “ITSHit” or “TPCHitsShiftedSector##” branch in the tree, while they are present with the ITS TPC syntax.

Now, replacing that “,” has moved the issue to the o2-sim-digitizer-workflow -b step. I get the same kind of log as I got with o2-tpc-reco-workflow -b before:

[18811:TPCDigitizer7]: [18:26:00][INFO] from_TPCDigitizer7_to_TPCDigitWriter[0]: in: 0 (0 MB) out: 0 (0 MB)

with variations (e.g. Simreader or ITSDigitizer instead of TPCDigitizer). I stopped the process after it had been running for more than 10 min; the full log is attached again. Since not all of the numbers shown are 0 (see log), I don’t really know whether the process is supposed to take this long or whether something is simply not working right.

PS: I thought it might be because I didn’t declare the detectors, but the log shows that the digitizer detects them on its own:

Digitizer Detector Detection
[INFO] TPC is in grp? yes; is skipped? no
[INFO] TPC: Channel 0 will supply ROMode
[INFO] ITS is in grp? yes; is skipped? no
[INFO] MFT is in grp? no; is skipped? no
[INFO] TOF is in grp? no; is skipped? no
[INFO] FT0 is in grp? no; is skipped? no
[INFO] EMC is in grp? no; is skipped? no
[INFO] HMP is in grp? no; is skipped? no
[INFO] ZDC is in grp? no; is skipped? no
[INFO] TRD is in grp? no; is skipped? no
[INFO] MCH is in grp? no; is skipped? no
[INFO] MID is in grp? no; is skipped? no
[INFO] FDD is in grp? no; is skipped? no
[INFO] PHS is in grp? no; is skipped? no
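To see at a glance which detectors the digitizer picked up from the GRP and will not skip, the log can be filtered instead of read line by line. A minimal bash/awk sketch: `detectors_in_grp` is a hypothetical helper, and it assumes the log keeps exactly the “X is in grp? yes; is skipped? no” wording shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of O2): print the detectors that the digitizer
# log reports as present in the GRP and not skipped, one per line.
# Relies on lines like: "[INFO] TPC is in grp? yes; is skipped? no"
detectors_in_grp() {
  awk '/is in grp[?] yes/ && !/is skipped[?] yes/ {
         # The detector name is the word just before the first "is".
         for (i = 2; i <= NF; i++) if ($i == "is") { print $(i - 1); break }
       }' "$1"
}

# Usage with the attached log:
# detectors_in_grp log_digitizer_syntaxCorrected.txt
```

With the detection block above, this would list TPC and ITS, matching the modules passed to o2-sim.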

Best,
Aimeric

log_digitizer_syntaxCorrected.txt (285.0 KB)

I could not reproduce your problem with "o2-sim -n 5 -g pythia8 -e TGeant4 -m TPC ITS", even using 8 TPC lanes as you do, but from your log I see that TPC lane 5 (of the 8 digitizers created) gets stuck. Could you try digitizing the same data with
"o2-sim-digitizer-workflow --tpc-lanes 4 -b"?

For @swenzel: I see messages like " HAVE DIGIT DATA FOR SECTOR -1 ON CHANNEL ..." (which is OK) and " HAVE DIGIT DATA FOR SECTOR -2 ON CHANNEL ..." (not OK; what does -2 mean?), followed by " CHANNEL ... DONE" for all channels 0-7 except 5, which apparently gets stuck. The "... DATA FOR SECTOR -2" messages show up for channels 4, 6, 7, and in fact for all channels 0-7 except 5.

Here is the log of the o2-sim-digitizer-workflow --tpc-lanes 4 -b command, run in the same folder where I ran o2-sim-digitizer-workflow -b in my previous message.

A short question about that: does running o2-sim-digitizer-workflow more than once in the same folder affect the output? I don’t know whether it modifies some of the files produced by o2-sim, or even looks for files created by a previous o2-sim-digitizer-workflow run and changes its behavior accordingly.

Best,
Aimeric

log_lane_restrictions.txt (484.2 KB)

Hi @alandou
OK, now all 4 channels report CHANNEL ... DONE, so apparently the earlier problem was in synchronizing the job between the lanes (we have seen this before, but I thought it was solved).

But on exit it produces *** Error in `o2-sim-digitizer-workflow': double free or corruption (!prev): 0x0000000036671440 ***.
@swenzel, you have a separate ticket on this issue, right?

@alandou to be sure the problem you see is not due to G4 (we have seen some bogus times assigned to tracks, which may cause problems), could you try the same with G3?

You can rerun the digitization over the same input data; it does rewrite the o2sim-grp.root file, but this should not cause any problem.

Attached is the log of
o2-sim-digitizer-workflow --tpc-lanes 4 -b
done after
o2-sim -n 5 -g pythia8 -e TGeant3 -m TPC ITS

log_lane_restriction_Geant3.txt (189.8 KB)

On a side note, I found something strange, and I do not know whether it is intended: when running these o2-sim commands in a directory whose name is too long, the simulation stops with an error.
I tested this with folders named by concatenating digits so that the length is easy to count: “12345”, “12345678”, “1234567890123”, etc. Inside a folder named 12345678901234567 (17 characters) or shorter, it works. With 18 characters or more (123456789012345678, …) it fails. Other characters, like “_” or other special characters, might change that; I didn’t test it.

Name too long: test with name=1234567890123456789
[O2/latest] ~/MyWorkDir/O2_work/Simulation_tests/12345678901234567890 $> o2-sim -n 5 -g pythia8 -e TGeant4 -m TPC ITS
[INFO] Running with 8 sim workers
[INFO] CREATING SIM SHARED MEM SEGMENT FOR 8 WORKERS
Spawning particle server on PID 19436; Redirect output to serverlog
Spawning sim worker 0 on PID 19438; Redirect output to workerlog0
Spawning hit merger on PID 19439; Redirect output to mergerlog
[INFO] Merger process 19439 returned
[INFO] Simulation process took 0.746956 s
Error in <UnknownClass::InitInterpreter()>: LLVM SYMBOLS ARE EXPOSED TO CLING! This will cause problems; please hide them or dlopen() them after the call to TROOT::InitInterpreter()!
Error in <TFile::TFile>: file o2sim.root does not exist
[O2/latest] ~/MyWorkDir/O2_work/Simulation_tests/12345678901234567890 $> ls
mergerlog o2simtopology_19435.json primary-get_2500519435 serverlog workerlog0

OK, the G3 test converged; please don’t use G4 at the moment, since we have seen other problems there.
Is this weird dependence on the path length seen with G3 or G4? I usually run simulations with quite long paths and have never seen any dependence on their length.

Cheers,
Ruben

  • Digitizer running forever:
    I tried this set of commands:
o2-sim -n 5 -g pythia8 -e TGeant3 -m TPC ITS
o2-sim-digitizer-workflow -b

The second command still ends up running in what looks like a loop, printing the same set of log lines every minute, each with in: 0 (0 MB) out: 0 (0 MB).
I attach the log, though it looks like the same issue.
log_Geant3_try_with_correct_o2-sim_syntax.txt (287.6 KB)

  • Path length:

Concerning the path length: the behavior is the same regardless of the options I tried. Geant3 vs Geant4 made no difference, and neither did changing the detectors involved or the number of events.
Also, this issue seems to be tied to the total path length, not just the folder name length: the same folder name that failed in my previous post works fine when created directly in my home directory (“~/”, where I end up when using alidock) rather than deeper in my folder tree. Searching for the limit again in “~/”, I found a new one: 123456789012345678901234567890123456789012345678901
i.e. a folder name of 51 digits. With it, o2-sim -n 5 -g pythia8 -e TGeant3 -m TPC ITS works fine; anything longer fails.
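Since the failure seems to depend on the total length of the run directory's path rather than on the folder name alone, a wrapper script could warn before launching o2-sim. A minimal bash sketch: `path_too_long` is a hypothetical helper, and the 80-character default is a placeholder, not a documented O2 limit; the real threshold has to be measured on each setup, as done above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of O2): succeed (status 0) when the
# given directory path (default: current directory) is longer than "limit" characters.
path_too_long() {
  local limit=$1
  local dir=${2:-$PWD}
  [ "${#dir}" -gt "$limit" ]
}

# Placeholder threshold: determine the actual value empirically on your machine.
MAX_PATH_LEN=${MAX_PATH_LEN:-80}

if path_too_long "$MAX_PATH_LEN"; then
  echo "warning: run directory path is ${#PWD} characters long; o2-sim may fail here" >&2
fi
```

Measuring the whole `$PWD` rather than the last folder name matches the observation that the same folder name works in `~/` but fails deeper in the tree.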

Quick EDIT:
@shahoian I just launched a modified version of your runtest.sh script, adding “-e Geant3” to the o2-sim line. It is now running and has got past the “Running digitization” stage (it is now at “Running ITS reco flow”). I will give it some time and post an update.

About your runtest.sh script, @shahoian: it has been an hour now with no change; the “ITS reco flow” is still running. Unless it is supposed to take this long, there is an issue somewhere.