BUSY WAITING SIZE in workerlog0 of o2-sim


I am running a simulation of 50000 Pythia8 events with o2-sim -m PIPE ITS MFT -j16 -e TGeant3 -g pythia8 -n 50000 on alidock.

For at least the first couple of thousand events it used 100% of all CPU cores. Now, a bit past midway at about event #35000, the CPU load is consistently at 50%. workerlog0 contains several lines of BUSY WAITING SIZE; the first occurrence appeared at event #8861.

Since Detector.h says that "this should ideally never happen", I am reporting it here. I can share the logs if necessary.

A couple of weeks ago I ran a similar simulation with the full ALICE setup in an Ubuntu 18.04 Docker container. It showed the same slowdown midway through, but I did not investigate at the time. Looking back at those logs, BUSY WAITING SIZE is there too, starting at event 17233.

By the way, workerlog0 contains the combined output of all workers. Is there any way to tell the individual workers apart in the logs?


That’s typically an indication that the IO process becomes too slow to pick up the hits from the workers, which in turn means that the shared memory segments remain full.
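To illustrate the mechanism with a toy Python sketch (this is not O2 code, just an analogy): each worker tries to publish hits into a bounded shared buffer, and when the single IO consumer drains it too slowly, the workers can do nothing but retry, i.e. busy-wait, which is exactly what the BUSY WAITING SIZE message signals.

```python
import queue
import threading
import time

# Stand-in for the shared-memory segment: a small bounded buffer.
buf = queue.Queue(maxsize=4)

def worker(n_hits):
    """Producer: publish n_hits items, busy-waiting while the buffer is full."""
    for i in range(n_hits):
        while True:
            try:
                buf.put_nowait(i)  # try to publish a "hit"
                break
            except queue.Full:
                # analogue of the BUSY WAITING SIZE situation
                time.sleep(0.001)

def io_process(n_total, out):
    """Consumer: deliberately slow, so producers end up waiting on it."""
    for _ in range(n_total):
        out.append(buf.get())
        time.sleep(0.002)

collected = []
t_io = threading.Thread(target=io_process, args=(20, collected))
workers = [threading.Thread(target=worker, args=(10,)) for _ in range(2)]
t_io.start()
for w in workers:
    w.start()
for w in workers:
    w.join()
t_io.join()
print(len(collected))  # 20
```

In the real setup the fix is either to speed up the consumer (faster filesystem) or to reduce the pressure on it, which is what the suggestions below aim at.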
I can’t say why this happens in your case. Maybe some critical file size is reached and the filesystem inside Docker becomes the bottleneck. There are some parameters we might tune, but maybe you could just split your production into smaller ones (and write a tool to combine them)?
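A minimal sketch of the splitting idea, assuming o2-sim's --seed and -o (output prefix) options behave as in recent versions (please check o2-sim --help for your build; the sim_part prefix is just an illustrative name):

```python
import shlex
import subprocess

def split_commands(total_events, n_runs, dry_run=True):
    """Build one o2-sim command per sub-production, each with its own
    seed and output prefix, and optionally execute them in sequence."""
    per_run = total_events // n_runs
    cmds = []
    for i in range(1, n_runs + 1):
        cmd = ("o2-sim -m PIPE ITS MFT -j16 -e TGeant3 -g pythia8 "
               f"-n {per_run} --seed {i} -o sim_part{i}")
        cmds.append(cmd)
        if not dry_run:
            subprocess.run(shlex.split(cmd), check=True)
    return cmds

# Dry run: print the commands instead of executing them.
for c in split_commands(50000, 5):
    print(c)
```

With distinct seeds the sub-productions stay statistically independent, and each output file stays well below whatever critical size might be tripping up the Docker filesystem.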

As a side note: are you sure that no additional material is needed for MFT, such as the magnet, absorber, etc.? These will likely put more stress on the workers and less on IO.

Hi Sandro,

thanks for your reply and suggestions. I will keep splitting larger productions in mind. For now, these productions are enough: one set of MC data with the full ALICE setup and another with just PIPE, ITS and MFT for quicker assessment.