Debugging shared memory issues

I’m trying to insert a new device into an existing workflow and get a message like:

[40812:mch-digits-filtering]: [09:04:47][ERROR] Exception caught: shmem: could not create a message of size 41943040, alignment: 64, free memory: 253162208

Are there any tools to debug the shmem usage (e.g. the amount requested per device, or something along those lines)?

I do not know of any debug tools for shmem usage (they would actually be interesting!), but in this case a moderate increase of the shmem size might help, since the reported free memory is already larger than the requested message. You can do this by adding the following option to your workflow:

--shm-segment-size <value>

where you replace <value> with some reasonable number in bytes.
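
As a rough illustration of the suggestion above, here is a small Python sketch that pulls the numbers out of the error line quoted in the question and builds the option string with a value in bytes. The helper names (`parse_shmem_error`, `segment_size_option`) are made up for this example, the regular expression only assumes the message format shown above, and the 4 GiB value is an arbitrary example, not a recommendation.

```python
import re

# Pattern for the error text quoted in the question; the field names
# mirror the wording of that message and are otherwise an assumption.
SHMEM_ERROR = re.compile(
    r"could not create a message of size (?P<size>\d+), "
    r"alignment: (?P<alignment>\d+), free memory: (?P<free>\d+)"
)

MIB = 1024 * 1024
GIB = 1024 * MIB


def parse_shmem_error(line):
    """Return requested size, alignment and free memory in bytes, or None."""
    match = SHMEM_ERROR.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def segment_size_option(gib):
    """Build the --shm-segment-size option for a size given in GiB."""
    return f"--shm-segment-size {int(gib * GIB)}"


line = (
    "[40812:mch-digits-filtering]: [09:04:47][ERROR] Exception caught: "
    "shmem: could not create a message of size 41943040, alignment: 64, "
    "free memory: 253162208"
)
info = parse_shmem_error(line)
print(f"requested {info['size'] / MIB:.1f} MiB, free {info['free'] / MIB:.1f} MiB")
# prints: requested 40.0 MiB, free 241.4 MiB
print(segment_size_option(4))
# prints: --shm-segment-size 4294967296
```

Note that the free-memory figure is a total. It says nothing about the largest contiguous free block, which might be (this is a guess, not something confirmed in this thread) why a 40 MiB message could not be created with about 241 MiB free.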

Hi Stefan,

Thanks for your answer.
Indeed, in this case increasing the allocated shared memory helped (I was using the default of 2 GB). But increasing that number “until it works”, while very pragmatic and possibly even a valid solution in the end ;), does not seem very future-proof.
But OK, if there is no (better) tool for shared memory monitoring, so be it.

Hi Laurent,

I am happy it helped in this case, but I totally agree that this is just a workaround, and it would be nice to have a tool to analyse the shmem usage. Actually, there may very well be such a tool that I am simply not aware of. Maybe someone with more expertise could comment?

Hi, there is very little available for monitoring messages in the SHM.
There is a simple tool called fairmq-shmmonitor, provided by FairMQ. I usually run it with the -i -v options.
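
For scripting, a minimal Python sketch of calling the monitor could look like the following. It only uses the tool name and the -i -v options mentioned above. The function names are made up for this example, and any further arguments the tool may need are not covered here.

```python
import shutil
import subprocess


def shmmonitor_command(extra_args=()):
    """Command line for fairmq-shmmonitor with the -i -v options from this post."""
    return ["fairmq-shmmonitor", "-i", "-v", *extra_args]


def run_shmmonitor(extra_args=()):
    """Run the monitor if it is on PATH; return its exit code, or None if missing."""
    cmd = shmmonitor_command(extra_args)
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found on PATH (is the FairMQ environment loaded?)")
        return None
    return subprocess.run(cmd).returncode


# Only show the command here; call run_shmmonitor() to actually start the tool.
print(" ".join(shmmonitor_command()))
```

Calling `run_shmmonitor()` from a shell where the FairMQ environment is loaded should start the tool in the same way as typing the command by hand.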

I have been asking for such a tool for quite a while, e.g. in the PDP SRC meetings: Alice Weekly Meeting: Software for Hardware Accelerators / PDP Run Coordination / Full System Test (13 October 2021), on Indico.

The problem is:
In order to investigate messages in the SHM, FairMQ must be compiled in debug mode, so with the default software it is not possible at all.
But even then, to check the messages one needs to parse the O2 headers inside the FMQ messages. FMQ has an API (when built in debug mode) to iterate over all messages in the SHM, but one then has to hack together some code oneself to parse the O2 headers inside.