O2-sim does not run

I am trying to run Pythia8 simulations with o2-sim, but it gets stuck at the very beginning. If I type only o2-sim without any further options, it does the same.
It creates all the usual files, including the ROOT files, but it does not run the simulation; it just hangs. Here is what I see in the terminal:

prompt$ o2-sim
[INFO] This is o2-sim version 1.2.0 (110107d065)
[INFO] Built by ALIBUILD:v1.10.2, ALIDIST-REV:275cc6137fb0550ca656bc895f91ddfad3259d77 on OS:Linux-4.4.0-19041-Microsoft
[INFO] BINDING TO ADDRESS ipc:///tmp/o2sim-notifications-11997 type pub
[INFO] Running with 4 sim workers
[INFO] CREATING SIM SHARED MEM SEGMENT FOR 4 WORKERS
Spawning particle server on PID 12000; Redirect output to o2sim_serverlog
Spawning sim worker 0 on PID 12002; Redirect output to o2sim_workerlog0
Spawning hit merger on PID 12003; Redirect output to o2sim_mergerlog

and it stops here forever. The log files are uploaded here:
https://slokos.web.cern.ch/slokos/sim_error/

O2 is fresh; I built it yesterday. BTW, an older version on another computer works without problems, and I had no problems on this machine before either. The system is WSL2 (Ubuntu 20.04).

Thanks in advance for any ideas, Sandor

I think you should at least specify the number of events you’d like to simulate (e.g. -n 10). Does it work then?

Hi @swenzel , thanks for the answer.
No, unfortunately. I tried adding the usual settings (-g, -e, etc.) one by one, but no luck. The minimal o2-sim -n 10 doesn’t work either. The full command I tried to run in the first place was

o2-sim -e TGeant3 -m FV0 FT0 ITS PIPE MFT -j 2 -n 5000 -g pythia8pp --field 0 --configKeyValues "Diamond.width[2]=6.;GeneratorPythia8.config=/path/to/pythia.conf"

Ok. Then I would need to see the log files o2sim_[serverlog|workerlog|mergerlog] please.

They are at the link above. I re-generated them with the full command.

I see [20:39:28][ERROR] Some error/interrupt occurred on socket during receive
which indicates some communication problem.
Could you try to clean up /tmp/ and /dev/shm, or possibly reboot?
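
Before deleting anything, it may help to see what is actually left over. The following is only a minimal sketch: the function name list_o2sim_leftovers is made up for illustration, and the "o2sim" name prefix is an assumption taken from the socket name in the terminal output above (ipc:///tmp/o2sim-notifications-...). Shared memory segments or other leftovers may well use different names.

```shell
#!/bin/sh
# List leftover entries whose names start with "o2sim" in the given directories.
# The "o2sim" prefix is an assumption based on the socket name printed by o2-sim.
list_o2sim_leftovers() {
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        # -H follows a symlink given on the command line
        # (for example when /dev/shm is a link to /run/shm).
        find -H "$dir" -maxdepth 1 -name 'o2sim*' 2>/dev/null
    done
}

# Usage: inspect first, then remove only what you recognise.
# list_o2sim_leftovers /tmp /dev/shm
```

If the function prints nothing, no entries with that prefix are present in the listed directories.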

Hm, I did as you suggested, and now it seems that I no longer have the permissions I need. I have to run sudo alienv enter instead of alienv enter, otherwise I get error messages like

Module ERROR: couldn't create error file for command: permission denied 
In '/home/lokos/alice/sw/MODULES/ubuntu2004_x86-64/Python/v3.6.10-2' 

and similar ones. This is strange. Should I rebuild O2 to get rid of this?

But the original problem is still present, even though I no longer have an error in the worker log.

/tmp and /dev/shm are empty now.

The software should work without root rights evidently. Something seems to be wrong with your permissions or setup. At this moment, I have no further idea I am afraid.

Is this something run in a container/virtual machine or just a plain Ubuntu server?

Yes, it is pretty strange. The problem with root permission appeared when I cleaned up /tmp and /dev/shm. I will try to rebuild and I’ll let you know the outcome.

I use Ubuntu 20.04 on WSL2 (Windows Subsystem for Linux). It’s a kind of virtualised solution, but not like Oracle's VirtualBox. I have had no problems with ALICE software related to this specific platform before.

Thanks!

I did a full rebuild, but the problem appears to be the same. I no longer need to be root, which is how it should be, but the socket problem is still present. /tmp contains only a single directory with a single file called apt.conf, which has one line: DIR "/"; This doesn’t seem to be problematic. /dev/shm is a link to /run/shm, which is empty.
You might have further ideas, but maybe it is a platform-specific issue.

You may try not to use the /tmp folder. For this, you would need to

a) replace all /tmp inside the o2sim_parallel.cxx file
b) replace /tmp inside o2simtopology_template.json

(all within the run folder of O2). Recompile and check if it helps.
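
A rough sketch of how the replacement in steps a) and b) could be scripted. The function name replace_tmp_prefix and the target directory are hypothetical, and the file locations in the usage comment simply assume the two files sit in the run folder as described above. The replacement directory must not contain the characters | or &, since they are special to sed here.

```shell
#!/bin/sh
# Replace every occurrence of "/tmp" in a file with another directory.
# $1 = file to edit, $2 = replacement directory (must already exist to be useful)
replace_tmp_prefix() {
    file=$1
    newdir=$2
    tmpcopy="$file.new"
    # Use '|' as the sed delimiter because the paths themselves contain '/'.
    sed "s|/tmp|$newdir|g" "$file" > "$tmpcopy" && mv "$tmpcopy" "$file"
}

# Usage (paths are assumptions, run from the O2 source directory):
# replace_tmp_prefix run/o2sim_parallel.cxx "$HOME/o2tmp"
# replace_tmp_prefix run/o2simtopology_template.json "$HOME/o2tmp"
```

It is worth reviewing the resulting diff before recompiling, since a plain textual replacement would also touch any longer path that merely starts with /tmp.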

No luck. I changed all /tmp paths to /temp, which I had created in advance with the same permissions as /tmp. The socket problem is still there.

Another thing that worries me is that I get a WARNING when I enter O2/latest or QualityControl/latest:

WARNING: not updating modulefiles

Could this mean that somehow my installation is not complete?

My suspicion is that there is some sort of write or permission problem with the file system that you mount inside the container. Similar problems sometimes exist on AFS, but this is pure speculation and I am afraid it will be hard to hunt down. Maybe using a standard Linux OS will make things easier.
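
One cheap way to probe this suspicion is to try creating a file in each directory involved. This is only a sketch under assumptions: can_write_dir is a made-up helper name, and the list of directories is a guess at what is relevant here. It tests regular files only, so it cannot rule out a problem that is specific to sockets or shared memory.

```shell
#!/bin/sh
# Return 0 if the current user can create and remove a file in directory $1.
can_write_dir() {
    dir=$1
    probe="$dir/.write_probe_$$"
    # Actually creating a file is a more direct test than reading permission
    # bits. The subshell keeps a failed redirection from aborting the script.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Directories chosen for illustration; adjust to the working directory in use.
for d in /tmp /dev/shm "$PWD"; do
    if can_write_dir "$d"; then
        echo "$d: writable"
    else
        echo "$d: NOT writable"
    fi
done
```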

Yes, that is absolutely possible. I have had no problems with WSL2 so far, but it might be causing the problem at a very general level.

I will try to do a complete rebuild from scratch, not just of O2. I will also look around in Windows to see whether anything there restricts WSL2.

Thanks for the replies. I will report back if there is any solution or progress.

I saw this WARNING: not updating modulefiles long ago, when O2 had been compiled as root; in that case the user does not have the permissions needed to set up the environment properly. Worth checking.
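
A quick way to check for this is to look for entries in the software directory that are not owned by the current user. This is a sketch only: list_foreign_owned is a hypothetical helper, and the path in the usage comment is assumed from the Module ERROR message earlier in the thread.

```shell
#!/bin/sh
# Print entries under directory $1 that are NOT owned by the current user.
list_foreign_owned() {
    # id -u gives the numeric user ID, which find -user accepts.
    find "$1" ! -user "$(id -u)" 2>/dev/null
}

# Usage (path is an assumption taken from the error message above):
# list_foreign_owned "$HOME/alice/sw" | head
```

If this lists root-owned files, restoring ownership (for example with chown -R) or rebuilding as the normal user would be the obvious things to try.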

Thanks, so this is another sign that something is wrong with the permissions.

Indeed, alienv tries to copy the modulefiles to a central place when it runs. If that fails, you get the errors you see. There is an environment variable that prevents this (NO_REFRESH), but I think at least one user still needs to have done the installation correctly.