tried to run a TPC readout. Used the scripts provided under
cru-tests/DETECTORS/tpc-readout/ from the Git master (8d22c1d887…)
Two things i noticed.
The execution of the standalone-startup.py which is executed by the script takes ages. Timed it, was 44 seconds. I remember this was much faster in the past, maybe 1 second max.
The script fails. Apparently because i don’t work as root (which i still think is a bad idea). Output is below.
[talt@flp2 tpc-readout]$ source tpc-set.sh 04:00.0
Starting TTC PON calibration
Starting GBT calibration
Checking link status
/home/talt/cru/cru-tests/DETECTORS/tpc-readout
rm -vf /tmp/output.bin
roc-setup-hugetlbfs
File '/etc/flpprototype.d/hugepages-2MiB.conf' not found, allocating default amount of 2 MiB hugepages: 128
/opt/alisw/el7/ReadoutCard/v0.8.8-1/bin/roc-setup-hugetlbfs: line 24: /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages: Permission denied
File '/etc/flpprototype.d/hugepages-1GiB.conf' found, allocating configured amount of 1 GiB hugepages: 16
/opt/alisw/el7/ReadoutCard/v0.8.8-1/bin/roc-setup-hugetlbfs: line 33: /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages: Permission denied
Adding 'pda' group
/opt/alisw/el7/ReadoutCard/v0.8.8-1/bin/roc-setup-hugetlbfs: line 39: /usr/sbin/groupadd: Permission denied
Creating hugetlbfs mounts
hugeadm:ERROR: Mounts can only be created by root
Setting permissions on hugeltbfs mounts
chgrp: changing group of '/var/lib/hugetlbfs/global/pagesize-1GB/readout-bank-a2': Operation not permitted
chgrp: changing group of '/var/lib/hugetlbfs/global/pagesize-1GB': Operation not permitted
chgrp: changing group of '/var/lib/hugetlbfs/global/pagesize-2MB': Operation not permitted
chmod: changing permissions of '/var/lib/hugetlbfs/global/pagesize-1GB': Operation not permitted
chmod: changing permissions of '/var/lib/hugetlbfs/global/pagesize-1GB/readout-bank-a2': Operation not permitted
chmod: changing permissions of '/var/lib/hugetlbfs/global/pagesize-2MB': Operation not permitted
Hugepages:
Size Minimum Current Maximum Default
2097152 0 0 0 *
1073741824 0 0 0
Use 'echo [number] > /sys/kernel/mm/hugepages/hugepages-[size]/nr_hugepages' to allocate hugepages manually
Or set a number in the following conf files and run the script again:
echo [number] > /etc/flpprototype.d/hugepages-2MiB.conf
echo [number] > /etc/flpprototype.d/hugepages-1GiB.conf
build date 0x20180726
verinfo0=0x20202020
verinfo1=0x1ee1ed72
verinfo2=0x7670
verinfo3=0x56494e46
Altera chip ID=0x0054018810d20309
roc-reg-write --id=04:00.0 --channel=0 --address=0xc00 --value=0
infoLoggerD not available, falling back to stdout logging
0x0
roc-bench-dma --verbose --id=04:00.0 --buffer-size=128Mi --superpage-size=2Mi --links="0-31" --no-errorche --bytes=4096Mi --to-file-bin=/tmp/output.bin --loopback=NONE --generator=0
infoLoggerD not available, falling back to stdout logging
2018-08-17 14:46:09.355792 DMA channel: 0
2018-08-17 14:46:09.356396 IOMMU enabled
Error: Permission denied
/build/workarea/sw/20155875/1/SOURCES/ReadoutCard/v0.8.8/v0.8.8/src/InterprocessLock.h(47): Throw in function AliceO2::roc::Interprocess::Lock::Lock(const boost::filesystem::path&, const string&, bool)
Dynamic exception type: boost::exception_detail::clone_impl<AliceO2::roc::LockException>
std::exception::what: Permission denied
[AliceO2::roc::ErrorInfo::_NamedMutexName*] = AliceO2_RoC_Pda_Mutex
[Possible cause] = Invalid lock file path
[AliceO2::roc::ErrorInfo::_SharedLockFile*] = /dev/shm/AliceO2_RoC_Pda.lock
[Error message] = Permission denied
Traceback (most recent call last):
File "tpc-readout-test.py", line 93, in <module>
p.wait()
AttributeError: 'int' object has no attribute 'wait'
[talt@flp2 tpc-readout]$ groups
z2 pda
[talt@flp2 tpc-readout]$
Ciao,
all the scripts from cru-tests should not need root credentials.
I will have a look … which tag version are you using?
We had many modifications I think it would make sense to updgrade firmware and software to the latest release (that currently is develop) … let me know.
I tried with the master branch and then with the development branch. Both fail with the same issue. Can try to figure out which firmware version is installed. Unfortunately the roc-list-cards doesn’t give me that information because of the know mutex issues. So can try to delete the mutex by hand to get it working…
I checked the sof file you have in the server … it should be the tag_v1.0.0 … we are already at tagv1.5.0 … I think we should load v1.5.0 … but I would really like to try the develop branch as it it the last one used to test the firmware and software.
We will tag as soon Olivier is back
Alright, will try the new firmware. Another thing i saw is the execution speed of the tools. Not sure if it is an issue with the machine or the software in combination with the machine. As you can see from the output, it took a whooping 1 minute and 27 seconds to run the tool. Same with the python stuff. And it uses 100% CPU…
Alright, will try with the latest version of the FW. Downloaded it just now. Short remark here. In the folder under v1.5.0 is the file called cru-v2.sof. Would it be possible to add the tag also to the file-name. Files can be copied from folders to other locations, so having all the information in the filename (tag + maybe data) is really helpful
O.k. So apparently we are using our own firmware which Lars compiled. It is based on the one from the master branch and he changed the number of links to 20. Will first try your firmware, then checkout the latest develop, change the links to 20 (if it is not already in yet) and compile/update.
Alight, “dev” firmware is in the card. After the reboot the host seems to be back to normal speed again, tools run at normal speed again. Readout for TPC still doesn’t work and complains about no access rights and that it requires root privileges.
This issue appeared again after sending SCA ADC commands for about 12 hours (might or might not be a causality there).
It was taking 1.5 seconds to send a couple of SCA commands (read GPIOs and a couple of I2C transactions).
After digging a bit with perf trace I found that the time was spent in the PCI DMA driver:
After reinitializing the DMA driver, everything is back to normal speed again
-bash-4.2$ sudo modprobe -r uio_pci_dma
-bash-4.2$ sudo modprobe uio_pci_dma
I will have our Driver expert have a look at it.
What do you do exactly with your software?
What type of command is it? Do you run it in a loop for several hours?
I have a FEC in the lab … can I try to reproduce it?
And last but not least, what is the O2 software version you are using?
SO I can install the same setup here in the lab.