Complete links missing from data with 2nd CRU

Hello CRU team,

We have 2 CRUs in FLP2 (each CRU shows up as two PCIe endpoints in the listing below). Each CRU supports 2 x 9 = 18 links. CRU 0 works well, but we see some strange behavior with the second CRU. Both CRUs run identical FW.

===========================================================================================
  #   Type   PCI Addr   Vendor ID   Device ID   Serial   FW Version      Card ID          
-------------------------------------------------------------------------------------------
  0   CRU    04:00.0    0x1172      0xe001      0        0-0-4a3df76b    00010002-00000000
  1   CRU    05:00.0    0x1172      0xe001      0        0-0-4a3df76b    00010002-00000000
  2   CRU    83:00.0    0x1172      0xe001      0        0-0-4a3df76b    00010002-00000000
  3   CRU    84:00.0    0x1172      0xe001      0        0-0-4a3df76b    00010002-00000000
===========================================================================================

We use identical scripts to take data; only the PCIe address and the readout configuration file were modified accordingly. What we see is the following:

  • Data from random links “disappears”, i.e. it is not in the data written to file.
  • Sometimes data from a complete EP (endpoint) is missing.
  • Sometimes there is no SYNC at all in the data.

Cheers,
Torsten

To clarify what I mean by data from random links “disappearing”: I start the readout and expect data from links 0-8 and links 12-20 (2 x 9). In the first readout I see data from links 0, 1, 2, 3, 4, 5, 6, 8 and links 12-20, so data from link 7 is completely missing, with no packets carrying that ID. The next time I read out, I see the same, but with a different link missing: now link 8 is missing and link 7 is there.
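To make this check repeatable from run to run, the set of link IDs found in a file can be compared against the expected set. The sketch below is a hypothetical helper (the names `parse_link_ranges` and `missing_links` are illustrative, not part of readout), assuming the observed link IDs have already been extracted from the recorded data; it only reports which expected links never appear.

```python
def parse_link_ranges(spec: str) -> set[int]:
    """Parse a spec like "0-8,12-20" into a set of link IDs (hypothetical helper)."""
    links: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            links.update(range(lo, hi + 1))  # inclusive range
        else:
            links.add(int(part))
    return links


def missing_links(expected: set[int], observed: set[int]) -> list[int]:
    """Return the expected link IDs that never appear in the observed data, sorted."""
    return sorted(expected - observed)


if __name__ == "__main__":
    expected = parse_link_ranges("0-8,12-20")  # 2 x 9 links, as described above
    # Links seen in the first readout described above (link 7 absent)
    observed = parse_link_ranges("0-6,8,12-20")
    print(missing_links(expected, observed))  # -> [7]
```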

Ciao,
thx for reporting this.
I’ll check if I can reproduce the issue in the lab.
Keep you posted.
Cheers

Ciao Torsten, how big is the file you are recording on disk?
I was doing some tests, dumping 50 MB to disk and using 11 links per endpoint.

This is my config:

[bank-a1]
type=MemoryMappedFile
size=2G
numaNode=1

[bank-a2]
type=MemoryMappedFile
size=2G
numaNode=2

[bank-b1]
type=MemoryMappedFile
size=2G
numaNode=1

[bank-b2]
type=MemoryMappedFile
size=2G
numaNode=2

[equipment-rorc-1]
equipmentType=rorc
enabled=1
cardId=21:00.0
generatorEnabled=0
rdhCheckEnabled=0
rdhDumpEnabled=0
memoryPoolNumberOfPages=1000
memoryPoolPageSize=2M
linkMask=0-11
memoryBankName=bank-a1
blockAlign=2M

The second equipment is identical to the first.
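As a sanity check on a configuration like this, the memory pool requested by an equipment (memoryPoolNumberOfPages x memoryPoolPageSize) should fit in the bank it points to, and linkMask=0-11 selects 12 links. The sketch below does this arithmetic with Python's standard `configparser`. It is an illustration only: treating K/M/G as binary multiples and the helper names `parse_size`, `count_links` and `check_equipment` are assumptions, not readout's own code.

```python
import configparser

# Assumption: K/M/G suffixes are binary multiples (1M = 1024**2 bytes).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(value: str) -> int:
    """Convert a size string such as '2M' or '2G' to bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def count_links(mask: str) -> int:
    """Count the links selected by a mask such as '0-11' or '0-5,8'."""
    total = 0
    for part in mask.split(","):
        lo, _, hi = part.strip().partition("-")
        total += int(hi or lo) - int(lo) + 1
    return total


# Trimmed copy of the config above (only the keys used here)
CONFIG = """
[bank-a1]
type=MemoryMappedFile
size=2G
numaNode=1

[equipment-rorc-1]
equipmentType=rorc
memoryPoolNumberOfPages=1000
memoryPoolPageSize=2M
linkMask=0-11
memoryBankName=bank-a1
"""


def check_equipment(cfg, name):
    """Return (pool bytes, bank bytes, number of links) for one equipment section."""
    eq = cfg[name]
    pool = int(eq["memoryPoolNumberOfPages"]) * parse_size(eq["memoryPoolPageSize"])
    bank = parse_size(cfg[eq["memoryBankName"]]["size"])
    return pool, bank, count_links(eq["linkMask"])


if __name__ == "__main__":
    cfg = configparser.ConfigParser()
    cfg.read_string(CONFIG)
    pool, bank, links = check_equipment(cfg, "equipment-rorc-1")
    print(f"pool={pool // 1024**2} MiB, bank={bank // 1024**2} MiB, fits={pool <= bank}, links={links}")
```

Under the binary-unit assumption this prints a 2000 MiB pool against a 2048 MiB bank, so the pool fits, with 12 links selected.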

With this setup I do indeed miss one link from time to time.

However, when I increased the file size to 100 MB, I could see the remaining link.
I am using firmware 2.5.0 and scan the file created by readout like this:

 python eventDump.py /tmp/data.raw | grep " 0)" | grep "       0x1        0"
   0)   0x112233   0x112244  0x2776cec        0x1        0x0 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x1 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x2 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x3 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x4 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x5 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x6 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x7 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x8 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x9 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x0 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x1 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x2 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x3 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x4 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x5 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x6 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x7 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x8 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0x9 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0xa 0x1ee02000     0x12ec 0x1ea04003
   0)   0x112233   0x112244  0x2776cec        0x1        0xa 0x1ee02000     0x12ec 0x1ea04003

This way I can count the link IDs on each endpoint.
All link IDs are printed at a constant pace (set by the time the program needs to go through the different superpages and check the data), but the last one comes very late.
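Instead of reading the grep output by eye, the link IDs can be tallied automatically. The sketch below parses text in the format printed by eventDump.py above and counts how often each link ID appears. The column layout is an assumption inferred only from the listing above: the first field is an index such as "0)", the 5th field (0x1 in every line) is the value the second grep selects on and is used here as a grouping key, and the 6th field (0x0 to 0xa) is taken to be the link ID. The script name `count_link_ids.py` and its function are hypothetical.

```python
import sys
from collections import Counter, defaultdict


def count_link_ids(lines, group_col=4, link_col=5, only_index="0)"):
    """Tally link IDs per group from eventDump.py-style text lines.

    Assumptions (based only on the listing above): fields are whitespace-separated,
    the first field is an index such as "0)", group_col is the field the second
    grep filtered on, and link_col holds the link ID in hex (0-based indices).
    """
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) <= max(group_col, link_col):
            continue  # too short to be a data line
        if only_index is not None and fields[0] != only_index:
            continue  # mirror the original grep " 0)"
        try:
            link_id = int(fields[link_col], 16)  # accepts "0x7" style values
        except ValueError:
            continue  # header or other non-numeric line
        counts[fields[group_col]][link_id] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python eventDump.py /tmp/data.raw > dump.txt && python count_link_ids.py dump.txt
    with open(sys.argv[1]) as dump:
        for group, counter in sorted(count_link_ids(dump).items()):
            print(group, dict(sorted(counter.items())))
```

A link whose ID never shows up in the counts, or shows up far less often than the others, is then easy to spot.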

I will check with Sylvain whether the allocation of the memory blocks behaves strangely, but I suspect your link is not lost. Could you try to increase the size of the file you store on disk and check whether you find the missing link?

Thx

Hi Pippo,

I will try to test it tomorrow; we were busy with other things today. I was already aware of the general issue with the file size defined in the file recorder: we ran into this problem before when I forgot to adapt it for more links. At the moment we run with 20 links and the superpage (SP) size for each link is set to 1M, so we set the file size in the file recorder to twice the total size, 40M. But the data is still missing. I will increase it further tomorrow.
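The sizing rule used here (file size = pages per link x number of links x superpage size) can be written as a small helper. This is only a sketch: it assumes all superpages have the same size and that every link delivers its pages before the file size limit is reached, which may not hold if one link's data arrives very late, as observed above; in that case a larger safety factor is needed. The helper names and the binary interpretation of "1M" are assumptions.

```python
# Assumption: K/M/G suffixes are binary multiples (1M = 1024**2 bytes).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(value):
    """Convert a size string such as '1M' to bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def min_file_size(n_links, superpage_size, pages_per_link=2):
    """Bytes needed to hold pages_per_link superpages from every link."""
    return n_links * parse_size(superpage_size) * pages_per_link


def format_mib(n_bytes):
    """Format a byte count as whole MiB with an 'M' suffix."""
    return f"{n_bytes // 1024**2}M"


if __name__ == "__main__":
    # 20 links, 1M superpages, two pages per link -> 40M, as in the setup above
    print(format_mib(min_file_size(20, "1M")))  # -> 40M
    # A larger safety factor, e.g. five pages per link
    print(format_mib(min_file_size(20, "1M", pages_per_link=5)))  # -> 100M
```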

Cheers,
Torsten