Could you try repeating the test with the ramdisk mounted?
I am running with 6 links in continuous mode and checking the data … I don’t see errors (it has been running for several minutes).
You dump 16 MB to disk, which is very little data over a very short time … so I am surprised you get errors with a single link. I am wondering whether writing to disk has an effect.
By the way, in the new release of bench-dma we fixed the file-size issue.
Alright, I did a test with the ramdisk. The first event was fine; the second one showed scrambling. I guess that was to be expected. Once the data is in memory, writing to disk should not be the issue; that is buffered by the OS.
So, I did more tests, this time with 12 links. Packet loss now happens almost immediately, visible both in the data and in the packet headers. The packet number jumps frequently on all links.
Ciao,
I am not sure what you mean by the first event and the second one … anyway, I think it is time for me to debug your system a bit to see whether I find anything strange.
Today I am a bit stuck with other stuff … I’ll contact you later today or tomorrow to see how to run a few things on your machines.
With 12 links the situation is indeed close to the limit, and I would expect to see errors immediately when writing data to disk. Let me discuss the issue with the other CRU/O2 colleagues.
Well, whenever we take data with the roc-bench-dma tool, the data is dumped to disk. I call that an event; you can call it a run or whatever. So I take data once, and the data looks good. I take data again, and the data looks bad.
I don’t write data to disk; I used the ramdisk for the test. I reserved a 2 GB buffer and took 16 MB (bytes option), so we should be fine.
Again, why would writing to disk be an issue? The data is in memory (DMA) and is stored to disk from there. The host memory acts as a buffer, and we don’t send enough data to overflow it. Disk writes are usually handled by the OS, so why should there be any loss when writing from memory to disk?
You will need the new software to check the data online … it has been tagged, and the RPM should be available for installation soon.
During data taking you can check whether the CRU drops packets … that would explain why you see entire DMA pages missing from your stream.
I have done many tests … and indeed, if I check the data online or store it on disk, I can collect data correctly, without errors, only with a few links (max 4) … but if I disable the error check and monitor the DROPPED counter, I can run with 9 links and the CRU never drops data.
To read the DROPPED packet counter, read this register during data taking (the counter is cleared when RUN_ENABLE is 0).
Is there a way to test with the software we currently have installed? I want to repeat exactly the same test first and write data to disk. The only difference from testing with FECs would be that the data is sourced from the DDG instead of from the FECs.
So, how do I set the data path to the DDG and control it?
So I ran a check with the DDG and I see the same behavior: packets are lost. I can see it in the packet counter in the RDH and also in the raw data. Below is the output from both.
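A jump in the RDH packet counter can be spotted mechanically. The sketch below is a minimal Python illustration, assuming the pages have already been decoded into (link ID, packet counter) pairs, that the counter increments by one per DMA page on each link, and that it wraps at 256 (an 8-bit field is an assumption here). The function name and input format are hypothetical, not part of any CRU or O2 tool.

```python
from collections import defaultdict


def find_packet_gaps(pages, modulus=256):
    """Flag jumps in the per-link RDH packet counter (hypothetical helper).

    `pages` is a sequence of (link_id, packet_counter) tuples in the order the
    DMA pages were read. ASSUMPTION: the counter increments by one per page on
    each link and wraps at `modulus`; gaps of `modulus` pages or more cannot be
    told apart from smaller ones.
    """
    last_seen = {}
    gaps = defaultdict(list)
    for position, (link, counter) in enumerate(pages):
        if link in last_seen:
            expected = (last_seen[link] + 1) % modulus
            if counter != expected:
                missing = (counter - expected) % modulus
                # (position in stream, expected counter, seen counter, pages missing)
                gaps[link].append((position, expected, counter, missing))
        last_seen[link] = counter
    return dict(gaps)


# Hypothetical (link_id, counter) pairs; link 1 skips counters 21 and 22.
pages = [(0, 10), (1, 20), (0, 11), (1, 23), (0, 12), (1, 24)]
print(find_packet_gaps(pages))  # {1: [(3, 21, 23, 2)]}
```

Tracking the last counter per link keeps interleaved links from being mistaken for jumps, and a wrap from 255 to 0 is not flagged.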
Ciao,
Yes, I was expecting that. 8 links plus dumping data to disk won’t work.
I have just finished discussing this with the other O2 colleagues … we should move to readout.exe.
I will write an email to you and Johannes to see how to coordinate the work next week.
Readout should dump data to disk more efficiently.
Would it work for you?
Sure, we can move to the new readout. I would still like to understand why dumping to disk won’t work. You reserve a DMA buffer, right? That is located in host memory, and from there you dump the data to disk. The buffer is 1 GB and the actual data only 16 MB, so there is plenty of space in host memory. Once the data is there, it should not matter how long the disk takes to store it. No more data is coming in, so nothing in memory overflows or gets overwritten.
Ciao,
roc-bench-dma doesn’t work like that.
The current version dumps every single SUPERPAGE to disk as soon as it is filled … that defeats the purpose of the buffer in memory.
The data is not kept in memory and written out at the end. This is why writing to disk is very inefficient.
Since readout is ready, we decided to use roc-bench-dma for other purposes and move on to readout, which actually does what you described.
Data is stored in memory while a second thread dumps it to disk … so if there is enough buffer space in RAM, writing to disk will not slow down acquisition.
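The decoupling described here is a classic producer/consumer pattern. Below is a minimal Python sketch of the idea, not readout’s actual code: a simulated DMA side fills superpages into a bounded in-memory queue while a second thread drains them to a sink. All names, the superpage size, and the buffer depth are hypothetical. One difference from the real system is labeled in the comments: here a full buffer makes the producer wait, whereas a real CRU with no free buffer would drop packets.

```python
import io
import queue
import threading


def dma_producer(buffer_q, n_superpages, superpage_size):
    """Stand-in for the DMA side: fills superpages and queues them in memory.

    Simulated data only. In the real system a full buffer means the CRU has
    nowhere to write and drops packets; here put() simply blocks instead.
    """
    for i in range(n_superpages):
        superpage = bytes([i % 256]) * superpage_size
        buffer_q.put(superpage)
    buffer_q.put(None)  # sentinel: acquisition finished


def disk_writer(buffer_q, sink):
    """Second thread: drains queued superpages to the sink at its own pace."""
    while True:
        superpage = buffer_q.get()
        if superpage is None:
            break
        sink.write(superpage)


def acquire(sink, n_superpages=16, superpage_size=1 << 20, buffer_superpages=1024):
    """Run acquisition and writing concurrently, decoupled by a bounded queue."""
    buffer_q = queue.Queue(maxsize=buffer_superpages)
    writer = threading.Thread(target=disk_writer, args=(buffer_q, sink))
    writer.start()
    dma_producer(buffer_q, n_superpages, superpage_size)
    writer.join()
    return sink


sink = acquire(io.BytesIO(), n_superpages=16, superpage_size=1024)
print(len(sink.getvalue()))  # 16384
```

Writing each superpage inline in the acquisition loop would instead stall the producer for every write, which is the behavior described above for the current roc-bench-dma.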
Thanks for the explanation. That actually makes a lot of sense (the explanation, not the implementation). Next time, it would be nice to declare this type of tool unsuitable for data taking right away, because the way it is implemented, it is. A single TPC link sends 500 MB/s, and that alone pushes the tool past its limits. Instead, it is advertised in the CRU Test GIT as the readout tool one should use. That could have saved the two of us a lot of time and debugging effort.
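To put the 500 MB/s per link in perspective, a quick back-of-the-envelope estimate of how long an in-memory buffer lasts helps. This Python sketch assumes every link sends continuously at the quoted rate, uses decimal units, and treats the disk drain rate as a free parameter; the 2 GB/s drain in the second example is purely hypothetical, not a measured figure.

```python
import math


def seconds_until_full(buffer_bytes, n_links, link_rate, drain_rate=0.0):
    """Time until an in-memory buffer fills, given input and drain rates.

    All rates are in bytes per second. ASSUMES every link sends continuously at
    `link_rate` and the writer drains at a constant `drain_rate`.
    """
    net_rate = n_links * link_rate - drain_rate
    if net_rate <= 0:
        return math.inf  # the writer keeps up; the buffer never fills
    return buffer_bytes / net_rate


GB, MB = 1e9, 1e6  # decimal units
print(seconds_until_full(1 * GB, 1, 500 * MB))            # 2.0
print(seconds_until_full(1 * GB, 12, 500 * MB, 2 * GB))   # 0.25
```

With no draining, a 1 GB buffer lasts 2 s for one link; with 12 links the input is 6 GB/s, so a writer that cannot keep up fills the buffer in a fraction of a second.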
So, let’s see how the new tool performs. Is there an RPM for it?
Well, I apologize for that, but it took me time to figure it out.
The previous developer left the group quite some time ago, and the new one just joined the collaboration and is still getting used to the code and to past decisions (which were taken when there was not much to test).
So far it had been working fine on the different test systems, and on my machine I can store data from a couple of links without errors, but indeed we quickly hit its limits when adding more links.
I have contacted Johannes; the readout in the latest available RPM should be OK.
I will run tests to see how much data we can dump to disk before dropping data.
No worries, no hard feelings. At least we found the issue and understand it, so we can happily move to the new readout.
Concerning dropped data, we’re not planning to take huge amounts of data for the noise testing. 1 MB per link is already quite a bit and plenty to check the noise. So if you buffer the data in host memory first, there should be no issues at all.
On the current test machine (no tuning), I can read data from 12 links without errors, up to 30 DMA pages per link.
That means:
30 × 8 KB = 240 KB per link
240 KB × 12 = 2880 KB ≈ 2.8 MB
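The same arithmetic can be written down in a few lines of Python, which also shows how far 30 pages per link is from the roughly 1 MB per link mentioned for noise testing. It takes the 8 KB DMA page size quoted above and assumes 1 MB = 1024 KB; the function name is illustrative only.

```python
import math

DMA_PAGE_KB = 8  # DMA page size quoted above


def capture_kb(pages_per_link, n_links, page_kb=DMA_PAGE_KB):
    """Return (KB per link, total KB) for a capture with a fixed page count."""
    per_link = pages_per_link * page_kb
    return per_link, per_link * n_links


per_link_kb, total_kb = capture_kb(30, 12)
print(per_link_kb, total_kb)          # 240 2880
print(total_kb / 1024)                # 2.8125 (MB, taking 1 MB = 1024 KB)

# Pages per link needed for the ~1 MB/link mentioned for noise testing
print(math.ceil(1024 / DMA_PAGE_KB))  # 128
```

At 30 pages, each link delivers about a quarter of the 1 MB target; reaching it would take about 128 pages per link under the same assumptions.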
Would that be enough for the time being to do some more complete tests on your electronics?
We are looking into how to configure the machine for better memory usage.