How to get a core dump for a DPL workflow?

I’m getting a crash in a workflow, and the backtrace shown is not enough to find out what the problem is. It turns out I don't get any core dump that I could use to investigate further.
What should I do to get a proper core dump? (It seems that a while ago we did get core dumps when DPL crashed, right?)

I’m on a Mac (M1), if that makes a difference for the “recipe” to use…


@eulisse any hint on this one ?

I assume you have the correct ulimit set?

I need to check. IIRC, DPL does not touch any system settings related to that, so it’s up to your system configuration. I will check whether that’s actually the case.

Here are my limits. Compared to the defaults on my system, I increased the number of file descriptors (otherwise some workflows would simply fail) and set the core file size to unlimited (it is zero by default).

📦qc🚀~/alice/qc/qc-async$ ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8176
-c: core file size (blocks)         unlimited
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       2666
-n: file descriptors                1024
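For reference, the values shown by `ulimit` are per-process resource limits that child processes inherit, so raising them in the shell before launching the workflow should carry over to the processes DPL forks. As a minimal sketch (not something this thread says DPL does, and with hypothetical function names), a process can also raise its own soft core-size limit up to the hard limit using the POSIX `getrlimit`/`setrlimit` calls. Note that this only controls the allowed core size; it does not change `kern.coredump` or where the core file is written.

```cpp
#include <sys/resource.h>
#include <cstdio>

// Hypothetical helper: raise the soft core-file size limit of the current
// process up to its hard limit, which an unprivileged process may always do.
// Child processes started afterwards inherit the new limit.
// Returns true on success.
bool raiseCoreLimitToHard() {
  struct rlimit lim;
  if (getrlimit(RLIMIT_CORE, &lim) != 0) {
    std::perror("getrlimit(RLIMIT_CORE)");
    return false;
  }
  lim.rlim_cur = lim.rlim_max;  // soft limit := hard limit
  if (setrlimit(RLIMIT_CORE, &lim) != 0) {
    std::perror("setrlimit(RLIMIT_CORE)");
    return false;
  }
  return true;
}

// Hypothetical helper: return the current soft limit. RLIM_INFINITY is what
// `ulimit -c` reports as "unlimited"; 0 means no core file is written.
rlim_t currentCoreSoftLimit() {
  struct rlimit lim;
  if (getrlimit(RLIMIT_CORE, &lim) != 0) {
    return 0;
  }
  return lim.rlim_cur;
}
```

Calling `raiseCoreLimitToHard()` early in `main` is roughly equivalent to running `ulimit -c unlimited` in the launching shell, provided the hard limit is itself unlimited.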

But even with those limits I get zero-size core dumps…

What does sysctl kern.coredump say?

📦qc🚀~/alice/qc/qc-async$ sysctl kern.coredump
kern.coredump: 1

And your user can write to the current working directory of the process, correct?

I think so; some files do get written there, e.g.:

-rw-r--r--   1 laurent staff          0 jan 13 14:50 core_dump_57189
-rw-r--r--   1 laurent staff         98 jan 13 14:50 encountered_exceptions_list
-rw-r--r--   1 laurent staff       1230 jan 13 14:50 reco_ASYNC.log

For the record, what I want the core dump for right now is to find out where this very generic exception comes from :wink:

$ cat encountered_exceptions_list
reco_ASYNC.log:[FATAL] error while setting up workflow in o2-qc: unordered_map::at: key not found

I will look into the core file issue, but IMHO you would be better off simply attaching the debugger and setting a breakpoint on __cxa_throw.
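Why the breakpoint helps: `std::unordered_map::at` throws `std::out_of_range` when the key is missing, and "unordered_map::at: key not found" is the message libc++ (the standard library used on macOS) attaches to it. By the time a top-level handler logs the `[FATAL]` line, the stack of the throw site has already been unwound, so neither that log nor a core dump taken afterwards shows where the lookup happened. Every C++ `throw` goes through `__cxa_throw`, so stopping there shows the original call stack. With lldb this would be, for example, `lldb -p <pid>` followed by `breakpoint set -n __cxa_throw`, `continue`, and `bt` once it stops. The sketch below only reproduces the failure mode; `lookupOrThrow` and `throwsOutOfRange` are hypothetical stand-ins, not the actual QC or DPL code.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a lookup done during workflow setup.
// at() throws std::out_of_range if the key is absent; a breakpoint on
// __cxa_throw would stop exactly here, with the caller still on the stack.
int lookupOrThrow(const std::unordered_map<std::string, int>& table,
                  const std::string& key) {
  return table.at(key);
}

// Returns true if looking up `key` throws std::out_of_range.
// Once caught, only the exception message survives, not the throw site.
bool throwsOutOfRange(const std::unordered_map<std::string, int>& table,
                      const std::string& key) {
  try {
    (void)lookupOrThrow(table, key);
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}
```

A breakpoint on `__cxa_throw` also fires for exceptions that are thrown and then handled normally, so in a busy process it may stop several times before reaching the interesting one.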