Dear all,
I’m trying to run an acquisition on the MID FLP installed at CERN.
Readout.exe works fine when writing the output to a file, but when I try to use the StfBuilder I get this error message:
terminate called without an active exception
96654 Aborted StfBuilder --session default --id stf_builder-0 --transport shmem --detector MID --dpl-channel-name dpl-chan --channel-config "name=dpl-chan,type=push,method=bind,address=ipc:///tmp/stf-builder-dpl-pipe-0,transport=shmem,rateLogging=5" --channel-config "name=readout,type=pull,method=connect,address=ipc:///tmp/readout-pipe-0,transport=shmem,rateLogging=5" --detector-rdh 6 --verbosity veryhigh
The config file I use is the following:
# test config to run readout-stbf out of the box with data emulator
# dummy data source
[equipment-emulator-1]
enabled=0
name=emulator-1
equipmentType=cruEmulator
memoryPoolNumberOfPages=1800
memoryPoolPageSize=1M
numberOfLinks=4
PayloadSize=8000
# define a (disabled) CRU equipment for CRU end point #0
[equipment-rorc-1]
enabled=1
equipmentType=rorc
cardId=#1
dataSource=Fee
memoryBankName=bank-o2
memoryPoolNumberOfPages=1800
memoryPoolPageSize=1M
rdhUseFirstInPageEnabled=1
linkMask=10,11
firmwareCheckEnabled=0
# monitor counters
[consumer-stats]
consumerType=stats
monitoringEnabled=0
monitoringUpdatePeriod=5
monitoringURI=influxdb-udp://alio2-cr1-flp159:8088
# record data to file (disabled)
[consumer-rec]
enabled=0
consumerType=fileRecorder
fileName=/tmp/data.raw
# allow data sampling to take data
[consumer-data-sampling]
enabled=0
consumerType=DataSampling
# send data to stfb
[consumer-StfBuilder]
enabled = 1
consumerType = FairMQChannel
sessionName = default
fmq-transport = shmem
fmq-name = readout
fmq-type = push
# fmq-address = ipc:///tmp/flp-readout-pipe-0
fmq-address = ipc:///tmp/readout-pipe-0
memoryBankName = bank-o2
unmanagedMemorySize = 2G
memoryPoolNumberOfPages = 200
memoryPoolPageSize = 1M
disableSending=0
# matching config for the test receiver
# [receiver-fmq]
# decodingMode=stfHbf
# channelAddress=ipc:///tmp/flp-readout-pipe-0
# channelType=pull
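Since the two ends of the channel are configured in two different places (the [consumer-StfBuilder] section above and the --channel-config option of StfBuilder), a quick consistency check can rule out a plain mismatch. The following is only a minimal sketch: the helper names are made up for illustration, and the rule that name, address and transport must be identical while the types must be a push/pull pair is my assumption about what "matching" means here.

```python
import configparser

# Relevant part of the Readout configuration, copied from the file above.
READOUT_CFG = """
[consumer-StfBuilder]
enabled = 1
consumerType = FairMQChannel
fmq-transport = shmem
fmq-name = readout
fmq-type = push
fmq-address = ipc:///tmp/readout-pipe-0
"""

# The "readout" channel as passed to StfBuilder via --channel-config.
STFB_CHANNEL = ("name=readout,type=pull,method=connect,"
                "address=ipc:///tmp/readout-pipe-0,transport=shmem,rateLogging=5")


def parse_channel_config(text):
    """Turn 'key=value,key=value' into a dict (hypothetical helper)."""
    return dict(item.split("=", 1) for item in text.split(","))


def check_channel_match(readout_cfg, channel_cfg, section="consumer-StfBuilder"):
    """Return a list of human-readable mismatches (empty if none found)."""
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.read_string(readout_cfg)
    sec = parser[section]
    chan = parse_channel_config(channel_cfg)

    problems = []
    # Assumption: these three settings must be identical on both sides.
    for ini_key, chan_key in (("fmq-name", "name"),
                              ("fmq-address", "address"),
                              ("fmq-transport", "transport")):
        if sec.get(ini_key) != chan.get(chan_key):
            problems.append(f"{ini_key}={sec.get(ini_key)} vs {chan_key}={chan.get(chan_key)}")
    # Assumption: one side pushes and the other pulls.
    if {sec.get("fmq-type"), chan.get("type")} != {"push", "pull"}:
        problems.append("channel types are not a push/pull pair")
    return problems


print(check_channel_match(READOUT_CFG, STFB_CHANNEL) or "channel settings look consistent")
```

For the settings posted here the check finds no mismatch, so the crash is unlikely to come from the channel definition itself.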
The data distribution version I’m using is the one shipped with the flp-suite, namely: v0.7.6.
Note that a similar configuration is working on other FLPs (not located at CERN): one with the same software version, and the other with an older version installed (0.7.3).
Any suggestion is welcome.
Thanks in advance!
Best regards,
Diego
It’s a shot in the dark, but I have recently seen this exception in connection with shared memory problems. Could you try increasing the shared memory segment size using --shm-segment-size 10000000000 (10 GB instead of the default 2 GB)? Alternatively, you may try the new --no-IPC option to disable shared memory.
Hi @swenzel,
ok, I think I figured it out.
The readout.exe is compiled with FairMQ v1.4.20 while the DataDistribution part comes with FairMQ v1.4.18. I re-installed DataDistribution with aliBuild (instead of the RPMs) with an updated version of alidist and the issue is gone.
Not sure how to ensure that all of the packages needed by Readout and DataDistribution are in sync, though…
This is a problem similar to what I posted here:
Anyways, thanks for the suggestions.
Cheers,
Diego
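One way to catch this kind of mismatch early would be to compare the FairMQ version each component was built against before starting the chain. This is only a sketch: the function name is made up, and how the versions are collected (package metadata, build logs, by hand) is left open. The two values below are the ones quoted in this thread.

```python
def find_version_mismatch(built_against):
    """Group components by FairMQ version; return {} if they all agree."""
    by_version = {}
    for component, version in built_against.items():
        by_version.setdefault(version, []).append(component)
    return by_version if len(by_version) > 1 else {}


# Versions as reported in this thread.
versions = {"Readout": "v1.4.20", "DataDistribution": "v1.4.18"}

for version, components in sorted(find_version_mismatch(versions).items()):
    print(f"FairMQ {version}: {', '.join(components)}")
```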
Before each release of the FLP Suite, we run integration tests to identify issues like this and prevent them from reaching end users, so I’d like to understand whether there’s something to be improved in the test procedure.
The latest release of FLP Suite (v0.8.0) ships Readout 1.4.0-4, but I see that you also have Readout 1.4.4-3 (which, as indicated, compiles against FairMQ v1.4.20).
Was Readout installed manually on this machine?
Dear Vasco,
I did not install the software myself, but it might be that the needed Readout version was not tagged yet and the Readout package was updated later on (I guess via yum).
So maybe this is the problem.
Thanks,
cheers,
Diego
Dear all,
While the crash is solved, I now have another issue.
Readout does not seem to recognise that the data come from the MID detector. I usually define the dataspec MID/RAWDATA. However, the DPL proxy gives me the warning message:
[195868:readout-proxy]: [15:46:14][WARN] Some input data are not matched by filter rules
[195868:readout-proxy]: FLP/DISTSUBTIMEFRAME/0
[195868:readout-proxy]: NIL/RAWDATA/12
[195868:readout-proxy]: NIL/RAWDATA/11
meaning that somewhere in the chain (Readout? StfBuilder?) the data source is not identified. Is this a known issue? How can I tell the system that the data source is MID?
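To make the warning concrete: the proxy filter is defined for MID/RAWDATA, while the incoming data are labelled with origin NIL, so nothing matches. The snippet below is a toy model of that comparison only. It is not how DPL implements its matching, and the function name is made up for illustration.

```python
def unmatched_inputs(filter_spec, inputs):
    """Return the inputs whose origin/description differ from the filter."""
    origin, description = filter_spec.split("/")[:2]
    return [item for item in inputs
            if item.split("/")[:2] != [origin, description]]


# Labels taken from the warning message above.
inputs = ["FLP/DISTSUBTIMEFRAME/0", "NIL/RAWDATA/12", "NIL/RAWDATA/11"]

print(unmatched_inputs("MID/RAWDATA", inputs))
```

With the MID/RAWDATA filter all three inputs are reported as unmatched, which is what the warning shows. The two RAWDATA inputs would match if the origin were filled as MID instead of NIL.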
Dear @gneskovi,
AFAIU, in the past the data source for the StfBuilder was specified with the --detector option.
However, if we have RDH v6, this option is ignored, and the data source is taken directly from the RDH.
I guess that it is up to each detector to correctly fill the data source in the RDH when the UL is implemented, but I’m not sure what to do if the UL is not implemented. In this case, it seems that the data source is invalid.
Is there a way to force the StfBuilder to use the data source specified with the --detector option?
Hi @dstocco,
Do you know what value is used in the RDH?
Using the RDH value was intentional, but I could add fallback to the configuration parameter if this is invalid…
@costaf : Can the Detector/System ID be specified during configuration of CRU to have the correct value in the RDHv6?
Hello, which CRU firmware version has been used for the tests?
There is a way to configure the SYSTEM ID for the streaming detector without UL.
It is not yet implemented in the software, so I have to check which register has to be written.
Ciao @costaf,
the FW version is: e7687156 (you updated it last week).
However, before inserting the correct value (I need to check what it is for MID), do you know whether the electronics configuration might need to be reloaded after the modification?
We have a problem with the mini-PC used to configure the electronics: it seems to be offline, so we cannot reconfigure the electronics remotely.
In that case I’d prefer to stick with the current configuration until September and use a hack instead (if I tell StfBuilder that the RDH version is v5 instead of v6 it seems to work…)
Hello,
No, writing this register will not change the clock of the CRU, so the link should stay up.
I am not in favor of hacking the system; otherwise we will progress with the testing and at some point realize that everything was working only because we were tweaking something.
For MID the magic number is 37 (0x25)
How many links do you use?
For EP0, the raw commands to load the correct system ID on each link are the following:
# LINK 0
roc-reg-write --i=PCIEADD --address=0x00640004 --ch=2 --val=0x250000
# LINK 1
roc-reg-write --i=PCIEADD --address=0x00642004 --ch=2 --val=0x250000
# LINK 2
roc-reg-write --i=PCIEADD --address=0x00644004 --ch=2 --val=0x250000
# LINK 3
roc-reg-write --i=PCIEADD --address=0x00646004 --ch=2 --val=0x250000
# LINK 4
roc-reg-write --i=PCIEADD --address=0x00648004 --ch=2 --val=0x250000
# LINK 5
roc-reg-write --i=PCIEADD --address=0x0064a004 --ch=2 --val=0x250000
# LINK 6
roc-reg-write --i=PCIEADD --address=0x0064c004 --ch=2 --val=0x250000
# LINK 7
roc-reg-write --i=PCIEADD --address=0x0064e004 --ch=2 --val=0x250000
# LINK 8
roc-reg-write --i=PCIEADD --address=0x00650004 --ch=2 --val=0x250000
# LINK 9
roc-reg-write --i=PCIEADD --address=0x00652004 --ch=2 --val=0x250000
# LINK 10
roc-reg-write --i=PCIEADD --address=0x00654004 --ch=2 --val=0x250000
# LINK 11
roc-reg-write --i=PCIEADD --address=0x00656004 --ch=2 --val=0x250000
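The commands above follow a regular pattern, so they can be generated rather than typed by hand. The sketch below only reproduces that pattern: the base address 0x00640004, the step of 0x2000 per link and the placement of the system ID in bits 16 and above (0x25 << 16 = 0x250000) are read off the listed commands, not taken from the firmware documentation, and the function name is made up. PCIEADD is kept as the placeholder for the card address.

```python
BASE_ADDRESS = 0x00640004   # register address for link 0, as listed above
LINK_STRIDE = 0x2000        # address step between consecutive links (inferred)
MID_SYSTEM_ID = 0x25        # 37 decimal, the value quoted for MID


def system_id_command(link, system_id=MID_SYSTEM_ID, card="PCIEADD", channel=2):
    """Build the roc-reg-write command line for one link (0..11 on EP0)."""
    if not 0 <= link <= 11:
        raise ValueError("only links 0..11 appear in the list above")
    address = BASE_ADDRESS + link * LINK_STRIDE
    value = system_id << 16  # the listed value 0x250000 is 0x25 shifted by 16 bits
    return (f"roc-reg-write --i={card} --address=0x{address:08x} "
            f"--ch={channel} --val=0x{value:x}")


# Print the commands for the two links used in this setup.
for link in (10, 11):
    print(f"# LINK {link}")
    print(system_id_command(link))
```

For links 10 and 11 this gives addresses 0x00654004 and 0x00656004, i.e. the last two entries of the list.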
Ciao @costaf,
well, indeed the hack is not working as expected (it complains about the RDH size).
But if the proper configuration does not bring the link down, let’s use it.
I use 2 links (10 and 11) on CRU #1 (3b:00.0, EP 0).
So, if I get it correctly the command is:
# LINK 10
roc-reg-write --i=PCIEADD --address=0x00654004 --ch=2 --val=0x250000
# LINK 11
roc-reg-write --i=PCIEADD --address=0x00656004 --ch=2 --val=0x250000
Quick question to be sure: when I write --ch=2, does it mean CRU ID #1, or should I do something else to target that CRU?
[2020-07-15 17:42:43.538][ **E** ] READOUT INTERFACE: wrong number of HBFrames in the header.header_cnt=256 msg_length=449 total_occurrences=1
[2020-07-15 17:42:43.539][ **E** ] READOUT INTERFACE: Error when accessing the RDH: RDH size is too small. size=12
[2020-07-15 17:42:43.561][ **E** ] READOUT INTERFACE: TF ID non-contiguous increase! (6) -> (11) readout.exe sent messages with non-monotonic TF id!
SubTimeFrames will be incomplete! Total occurrences: 0
Can you dump some data locally to disk with readout and give me the path to the file?
RDH size = 12 is really wrong, but I doubt it is coming from the CRU, as it should chop the data correctly… Is there some data processing in the middle?
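Once a dump is available, a first sanity check could be to decode the start of the first RDH and look at the header version, header size and source ID. The sketch below assumes the RDH v6 layout as I understand it (little-endian; byte 0 = header version, byte 1 = header size, bytes 2-3 = FEE ID, byte 4 = priority bit, byte 5 = source ID) and assumes that the dumped file starts directly with an RDH, without extra recorder headers. The function names are made up, and the sample bytes are synthetic, built only to exercise the code.

```python
import struct

EXPECTED_RDH_SIZE = 64   # bytes; expected header size for RDH v6 (assumption)
MID_SOURCE_ID = 0x25     # system ID quoted for MID in this thread


def read_rdh_start(buf):
    """Decode the first 6 bytes of an RDH (layout assumed as described above)."""
    version, header_size, fee_id, _priority, source_id = struct.unpack_from("<BBHBB", buf, 0)
    return {"version": version, "header_size": header_size,
            "fee_id": fee_id, "source_id": source_id}


def check_rdh(fields):
    """Return a list of suspicious values found in the decoded fields."""
    problems = []
    if fields["header_size"] != EXPECTED_RDH_SIZE:
        problems.append(f"header size {fields['header_size']} != {EXPECTED_RDH_SIZE}")
    if fields["version"] >= 6 and fields["source_id"] != MID_SOURCE_ID:
        problems.append(f"source ID 0x{fields['source_id']:02x} is not MID")
    return problems


# Synthetic example: an RDH v6 start with source ID 0, padded to 64 bytes.
sample = bytes([6, 64, 0x34, 0x12, 0, 0x00, 0, 0]) + bytes(56)
fields = read_rdh_start(sample)
print(fields, check_rdh(fields))
```

On a real dump, the same function could be fed the first 64 bytes read from the file in binary mode.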