Packages missing after system-configuration pull and re-running o2-flp-setup deploy

Hello O2 FLP team,

For the TPC setup at P2 we have been using o2-flp-setup for some time to configure the machines in the requested and recommended way. Back then we used the latest dev branch to deploy everything, which worked without any problems.
In the meantime we have tried to update everything several times and ran into problems with the installed packages.
After pulling the system-configuration git repository and re-running o2-flp-setup deploy with our TPC inventory, the newest Ansible playbooks are applied. During this process all “old” O2 packages are removed and the latest versions should be installed. However, the readout packages, as well as ALF, are no longer present on our FLP machines afterwards. Furthermore, there are now ECS problems on our head node.
Last time things only worked again after a complete rebuild. It’s rather inconvenient to always rebuild the complete setup just to get the latest version. Would it be possible to fix this issue or give advice on how to prevent it in the future? In case you would like to debug, the TPC setup is currently still in the broken state.

Best regards,
Johannes

Hi Johannes,

Normally, updates are possible from version n to version n+1. In fact, we have this as part of our FLP Suite CI and also actively test this scenario in the integration tests before releasing a new version. If you upgrade from an older version directly to the latest, you might indeed run into problems.

The specific symptom you describe, missing packages, is strange.
How can we have a look?

Cheers,
Vasco

Hello Vasco,

The login node of the TPC P2 setup is reachable via alice-tpc-test.cern.ch with CERN user credentials for everyone in the alice-tpc-cru egroup.

I added most of the user accounts that were used for debugging in the past to the sudoers list.

The o2-flp-setup deploy was executed from the login node as the tpc user. I can add ssh keys to the tpc user, in which case just email me your public key, or you can use “sudo su - tpc” to change to the tpc user. The login node itself is not included in the configuration. The inventory is located in /home/tpc/inventory/

The previous configuration was done with the dev branch, which was marked 0.11.0-beta; after the release it should now be 0.11.0.

Best regards,
Johannes

Hello Johannes,

I will try to debug the issues noticed with o2-flp-setup and AliECS. I have subscribed to the alice-tpc-cru e-group and am waiting for approval.

Do you have the log from the o2-flp-setup installation?

Kind Regards,
Miltiadis

Subscription request is approved.

Hello Christian,

Thank you for your approval. I am still unable to ssh to alice-tpc-test.cern.ch.

Kind Regards,
Miltiadis

Hello Miltiadis,

It takes a while until the CERN LDAP servers are updated, so access is usually delayed by 1-2 h. I will add you to the sudoers list as well, so you have full access.

Unfortunately I don’t have any complete logs. You can run o2-flp-setup deploy again as the tpc user and check the output; it should be the same as during the other run.

Best regards,
Johannes

Hello Johannes,

Thank you very much. I will try again in a bit.

Kind Regards,
Miltiadis

Hello Johannes,

I fixed your deployment. I noticed that Ansible 2.10.2 was installed, and with that version there is an issue that prevents the deployment from completing; at least that was the case on my first retry. I re-installed Ansible 2.9.13 and the deployment was successful. I also ran a small test from the head node:

eval `aliswmod load coconut`
coconut e c -w readout-dataflow -e '{"hosts":["flp5"], "user": "root"}' 

I set the user to root because of the following error reported by readout:

cannot open file /home/flp/readout_stfb_emu.cfg

The file is present:

[root@flp5 ~]# ls -la /home/flp/readout_stfb_emu.cfg
-rw-r--r-- 1 flp root 1250 Aug 10 21:23 /home/flp/readout_stfb_emu.cfg

Kind Regards,
Miltiadis

Hello Miltiadis,

Thank you very much. Everything seems to work fine now. Lars just took some data with the newest firmware / Readout from flp-suite 0.11.

One remark on the Ansible version:
We encountered an issue with a quite old version, I think it was 2.7, which was missing the “varnames” plugin and therefore failed as well (with a more obvious error message). This was the reason I updated Ansible and got the newest version installed.

It could be a good idea to add a check for the Ansible version to o2-flp-setup and give a warning / error message if an incompatible version is used.
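
A minimal sketch of what such a check could look like, in shell. This is not taken from o2-flp-setup: the function names are hypothetical, and the accepted range (2.9.x) is an assumption inferred only from this thread, where 2.9.13 worked while 2.7 and 2.10.2 failed. It relies on the version sort of GNU sort (sort -V).

```shell
#!/usr/bin/env bash
# Sketch of a pre-flight Ansible version check (NOT part of o2-flp-setup).
# The accepted range is an assumption based on this thread only:
# 2.9.13 worked, 2.7 and 2.10.2 failed.
MIN_VERSION="2.9.0"   # assumed lower bound (inclusive)
MAX_VERSION="2.10.0"  # assumed upper bound (exclusive)

# version_lt A B: succeeds if version A sorts strictly before version B.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# check_ansible_version VERSION: prints a verdict and returns 0 only if
# VERSION lies inside [MIN_VERSION, MAX_VERSION).
check_ansible_version() {
    local v="$1"
    if version_lt "$v" "$MIN_VERSION" || ! version_lt "$v" "$MAX_VERSION"; then
        echo "ERROR: Ansible $v is outside the tested range [$MIN_VERSION, $MAX_VERSION)"
        return 1
    fi
    echo "OK: Ansible $v"
}

# Possible use before deploying (assumes the first line of the output looks
# like "ansible 2.9.13", as printed by the 2.9 and 2.10 releases):
#   check_ansible_version "$(ansible --version | awk 'NR==1 {print $2}')" || exit 1
```

Aborting before the playbooks run would avoid ending up with a half-applied deployment where the old packages are already removed.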

Best regards,
Johannes