Hello CRU team,
During data taking we spotted another potential bug. Data taking seemed to work last week, but this week we saw artifacts in the data. To rule out a mixed-up state in the CRU, we rebooted the host and started fresh. The data corruption is still there.
What we have done so far: the data stream is decoded into the ADC values of the individual SAMPA channels. One can then look at the values over time (the baseline). They have to stay within a certain range for each SAMPA channel, so the baseline can be used as a “digital” fingerprint. What we see is a jump in this baseline.
Below is the time sequence (each line = one time-bin) for 16 ADC channels. In time-bin 4941 there is a sudden jump, meaning data is lost somewhere.
4938 : 52 61 70 76 83 73 82 83 73 68 79 69 78 86 76 98
4939 : 55 63 71 77 84 74 84 84 72 69 80 65 79 85 76 98
4940 : 55 60 71 76 84 74 82 85 74 69 80 67 80 85 78 98
4941 : 54 61 70 99 55 61 70 76 83 73 84 84 72 69 79 67 <-- Jump in baseline
4942 : 79 84 76 98 52 61 72 77 83 73 83 85 74 69 80 70
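A check like this could be automated. Below is a minimal sketch, not part of our decoder: the function names and the threshold of 10 ADC counts are assumptions chosen for this example. It flags the first time-bin whose values differ strongly from the previous one, and then estimates by how many channel positions the pattern has moved.

```python
# Time-bin -> 16 ADC values, copied from the listing above.
ROWS = {
    4938: [52, 61, 70, 76, 83, 73, 82, 83, 73, 68, 79, 69, 78, 86, 76, 98],
    4939: [55, 63, 71, 77, 84, 74, 84, 84, 72, 69, 80, 65, 79, 85, 76, 98],
    4940: [55, 60, 71, 76, 84, 74, 82, 85, 74, 69, 80, 67, 80, 85, 78, 98],
    4941: [54, 61, 70, 99, 55, 61, 70, 76, 83, 73, 84, 84, 72, 69, 79, 67],
    4942: [79, 84, 76, 98, 52, 61, 72, 77, 83, 73, 83, 85, 74, 69, 80, 70],
}


def first_jump(rows, threshold=10):
    """Return the first time-bin in which any channel differs from the
    previous time-bin by more than `threshold` ADC counts (assumed value)."""
    bins = sorted(rows)
    for prev, cur in zip(bins, bins[1:]):
        worst = max(abs(a - b) for a, b in zip(rows[prev], rows[cur]))
        if worst > threshold:
            return cur
    return None


def best_shift(row, reference):
    """Return the cyclic shift s that minimises the summed absolute
    difference between row[i] and reference[(i - s) % n]."""
    n = len(reference)

    def cost(s):
        return sum(abs(row[i] - reference[(i - s) % n]) for i in range(n))

    return min(range(n), key=cost)


print(first_jump(ROWS))                    # first time-bin with a jump
print(best_shift(ROWS[4942], ROWS[4940]))  # shift of the pattern after the jump
```

For the rows above this reports time-bin 4941 and a shift of 4 positions. If nothing else is wrong, that would fit 12 missing ADC values (modulo 16).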
I traced this back to the GBT frames in the data. The first column is the packet number, the second column the file position (in bytes), and the third column the GBT frame number (counted from the SYNC). What follows is the GBT frame and, in brackets, the decoded ADC values around the jump in the data.
00000080 00663200 00039687 : 00000082.7a0a20fa.0807a0a0.052200d8 : 0003 0002 0010 0002
00000080 00663216 00039688 : 000010a2.f81288da.00af21a2.a6182a78 : 0015 0002 000e 0002
00000080 00663232 00039689 : 0000f088.f0f80870.82272fa0.07a202e5 : 0002 0003 0016 0001 (98 54)
00000080 00663248 00039690 : 0000f80a.70f88050.08050f2a.a5288a3c : 001d 0001 0006 0002 (61 70)
00000081 00663616 00039691 : 0000f088.daf8007a.0aaf2fa0.078202ef : 0003 0003 0017 0001 (99 55)
00000081 00663632 00039692 : 0000f80a.78f808da.2a263f2a.a5288a3c : 001d 0001 0006 0002
00000081 00663648 00039693 : 0000f0a8.5af2a0d2.88a3ef28.87828872 : 000c 0002 0013 0002
00000081 00663664 00039694 : 0000e082.72e88836.08ad8e28.8eb28258 : 0009 0002 0014 0002
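For reference, the listing above can be parsed with a few lines of Python. This is only a sketch under the column layout described above. The names `FrameLine`, `parse_frame_line` and `position_steps` are made up for this example, and the four decoded columns after the frame are ignored.

```python
from typing import NamedTuple, Tuple

DUMP = """\
00000080 00663200 00039687 : 00000082.7a0a20fa.0807a0a0.052200d8 : 0003 0002 0010 0002
00000080 00663216 00039688 : 000010a2.f81288da.00af21a2.a6182a78 : 0015 0002 000e 0002
00000080 00663232 00039689 : 0000f088.f0f80870.82272fa0.07a202e5 : 0002 0003 0016 0001 (98 54)
00000080 00663248 00039690 : 0000f80a.70f88050.08050f2a.a5288a3c : 001d 0001 0006 0002 (61 70)
00000081 00663616 00039691 : 0000f088.daf8007a.0aaf2fa0.078202ef : 0003 0003 0017 0001 (99 55)
00000081 00663632 00039692 : 0000f80a.78f808da.2a263f2a.a5288a3c : 001d 0001 0006 0002
00000081 00663648 00039693 : 0000f0a8.5af2a0d2.88a3ef28.87828872 : 000c 0002 0013 0002
00000081 00663664 00039694 : 0000e082.72e88836.08ad8e28.8eb28258 : 0009 0002 0014 0002
"""


class FrameLine(NamedTuple):
    packet: int              # packet number (column 1)
    position: int            # file position in bytes (column 2)
    frame: int               # GBT frame number since SYNC (column 3)
    words: Tuple[int, ...]   # the four 32-bit words of the GBT frame, as printed


def parse_frame_line(line: str) -> FrameLine:
    """Split one dump line into packet, position, frame number and frame words."""
    head, gbt = [part.strip() for part in line.split(" : ")[:2]]
    packet, position, frame = (int(x) for x in head.split())
    words = tuple(int(w, 16) for w in gbt.split("."))
    return FrameLine(packet, position, frame, words)


def position_steps(frames):
    """Byte distance between the start of consecutive frames."""
    return [b.position - a.position for a, b in zip(frames, frames[1:])]


frames = [parse_frame_line(line) for line in DUMP.splitlines()]
print(position_steps(frames))
```

Within a packet the frames are 16 bytes apart. The single larger step (368 bytes) sits at the boundary between packet 80 and packet 81, while the frame numbers in the third column keep counting without a gap.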
I cross-checked this against the raw 32-bit data dumped by the CRU to file. This is the end of packet number 80. The position here is simply the line number in the hex dump, so each number corresponds to one 32-bit word and counting starts at 1.
The last GBT frame starts with 0xA5288A3C (line 165813, i.e. byte position 165812 * 4 = 663248). This is consistent with the frame dump above.
End of packet 80
165809 07A202E5
165810 82272FA0
165811 F0F80870
165812 0000F088
165813 A5288A3C
165814 08050F2A
165815 70F88050
165816 0000F80A
165817 00000000
165818 00000000
165819 00000000
165820 00000000
165821 00000000
And this is the start of packet number 81. The first GBT frame starts with 0x078202EF (line 165905, i.e. byte position 165904 * 4 = 663616).
Start of packet 81
165885 00000000
165886 00000000
165887 00000000
165888 00000000
165889 1EA04003
165890 000012EC
165891 1EE02000
165892 00000003
165893 A3F86AE6
165894 00000000
165895 00112244
165896 00112233
165897 00000000
165898 00000003
165899 00112255
165900 00112233
165901 00000000
165902 0000EC00
165903 00112266
165904 00112233
165905 078202EF
165906 0AAF2FA0
165907 DAF8007A
165908 0000F088
165909 A5288A3C
165910 2A263F2A
All of these values match, and they are consistent with the dump of the extracted frames.
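The cross-check between the hex dump and the frame dump boils down to two small conversions, sketched here in Python. The function names are hypothetical. The word order (first word in the file printed last in the frame dump) is simply what the two listings above show.

```python
def byte_offset(line_number: int) -> int:
    """Byte position of a 32-bit word, given its 1-based hex-dump line number."""
    return (line_number - 1) * 4


def frame_from_words(words) -> str:
    """Format four consecutive 32-bit words from the file the way the frame
    dump prints them: last word in the file first, separated by dots."""
    return ".".join(f"{w:08x}" for w in reversed(words))


# Last frame of packet 80 starts on hex-dump line 165813.
print(byte_offset(165813))  # 663248

# First frame of packet 81 starts on hex-dump line 165905.
print(byte_offset(165905))  # 663616
print(frame_from_words([0x078202EF, 0x0AAF2FA0, 0xDAF8007A, 0x0000F088]))
# 0000f088.daf8007a.0aaf2fa0.078202ef
```

The printed frame is identical to GBT frame 39691 in the frame dump, so the decoder reads the file correctly and the jump is already present in the raw data.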
So it seems that data is being lost somewhere. This happens randomly, on every link. I tried single links (also different ones): sometimes it works, sometimes it does not. With several links the behavior is the same. The position where it happens is also kind of erratic. In this case it happened at packets 80/81 (single-link readout), but I have seen it at many different positions.
Cheers,
Torsten