FLP Suite Update - Process stuck at "Gathering Facts"

Hi @divia

I thought I had left the passphrase blank.
I retried, but got the same result.

[root@flpmid ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:Js3UpNPrwor8LKJAfUKzXH07soikCrsDGAc6SNxhFh8 root@flpmid
The key's randomart image is:
+---[RSA 2048]----+
|. .=oE    .      |
|.oo.. o  =       |
|+. o o .+.o      |
|= = +  +....     |
|.= * ...So.      |
|+ o + .+o..      |
|+. . . .o .      |
|=....o . .       |
|=+ .oo+          |
+----[SHA256]-----+
[root@flpmid ~]# service sshd restart
Redirecting to /bin/systemctl restart sshd.service
[root@flpmid ~]# o2-flp-setup deploy --head flpmid --flps flpmid --debug
2020/11/16 12:43:36 target flpmid is unreachable

Running: ansible-playbook /root/.local/share/o2-flp-setup/system-configuration/ansible/flp-multinode.yml -i /tmp/ansible_flp_multinode_inventory020799327 -u root --skip-tags dev,post-installation,trigger,readout-autoconf

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
 [WARNING]: Unhandled error in Python interpreter discovery for host flpmid:
Failed to connect to the host via ssh: Permission denied (publickey,gssapi-
keyex,gssapi-with-mic,password).

fatal: [flpmid]: UNREACHABLE! => {
    "changed": false, 
    "unreachable": true
}

MSG:

Data could not be sent to remote host "flpmid". Make sure this host can be reached over ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).



PLAY RECAP *********************************************************************
flpmid                     : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0  

Well, passphrase-protected SSH RSA (or DSA) keys are overkill for our setups.

There is no need to restart the sshd service: it reads ~/.ssh/authorized_keys at every login attempt.

I suspect there are leftovers in ~/.ssh. Try the following:

mv ~/.ssh ~/sshOld
mkdir ~/.ssh
ssh-keygen …
<< EMPTY PASSPHRASE, just press RETURN, do not give anything - not even a space >>
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh flpmid true

If everything worked, the last command should complete without errors and without asking for a password. If you get this far, the FLP suite should now install without problems.
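The same recipe can also be run non-interactively. The sketch below is not part of the FLP suite: reset_ssh_keys and check_passwordless_ssh are hypothetical names, it assumes OpenSSH's ssh-keygen and ssh, and, like the mv above, it moves the old directory aside rather than deleting it. Passing -N "" gives the empty passphrase without any typing, and BatchMode=yes makes ssh fail instead of falling back to a password prompt, which mimics the non-interactive way Ansible has to connect.

```shell
# Sketch only: hypothetical helpers, not part of o2-flp-setup.
# Source this file, then call the functions.

# Recreate an SSH key pair with an EMPTY passphrase in directory $1
# (default: ~/.ssh) and authorize it for logins to this same account.
reset_ssh_keys() {
    dir="${1:-$HOME/.ssh}"

    # Keep the old directory as a timestamped backup instead of deleting it.
    if [ -e "$dir" ]; then
        mv "$dir" "$dir.old.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    mkdir -m 700 "$dir" || return 1

    # -N "" = empty passphrase, -f = no file-name prompt, -q = quiet.
    ssh-keygen -q -t rsa -b 2048 -N "" -f "$dir/id_rsa" || return 1

    # Authorize the new public key and keep the file private.
    cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
    chmod 600 "$dir/authorized_keys"
}

# Succeed only if "ssh HOST true" works without any password prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "${1:-flpmid}" true
}
```

Usage, as root on the FLP:

reset_ssh_keys && check_passwordless_ssh flpmid && echo "password-less ssh OK"

If the key files look fine but check_passwordless_ssh still fails, the cause is more likely on the sshd side than in ~/.ssh.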

Hi @divia

I ran everything as you described.
But I am still asked for root@flpmid's password,
and neither the SSH password nor the root password is accepted (permission denied)…

Could you run the following as root, in ~/:

ls -al .ssh
cat .ssh/id_rsa.pub
cat .ssh/authorized_keys

Please send the output to me directly (Roberto.Divia@cern.ch).
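The three commands above are mainly looking for two things: whether the public key really appears as a line of authorized_keys, and whether sshd's StrictModes check (enabled by default) would reject the files because they are writable by group or others. A minimal sketch of those two checks follows; check_ssh_dir is a hypothetical name, and stat -c %a assumes GNU coreutils (as on CentOS 7) or BusyBox.

```shell
# Sketch only: hypothetical helper mirroring what "ls -al .ssh",
# "cat .ssh/id_rsa.pub" and "cat .ssh/authorized_keys" are used to check.

# Print one line per problem found in SSH directory $1 (default ~/.ssh);
# return non-zero if any problem was found.
check_ssh_dir() {
    dir="${1:-$HOME/.ssh}"
    problems=0

    # 1. The public key must appear verbatim as a line of authorized_keys.
    if [ ! -f "$dir/id_rsa.pub" ] || [ ! -f "$dir/authorized_keys" ]; then
        echo "missing id_rsa.pub or authorized_keys in $dir"
        problems=1
    elif ! grep -qxF "$(cat "$dir/id_rsa.pub")" "$dir/authorized_keys"; then
        echo "id_rsa.pub is not listed in authorized_keys"
        problems=1
    fi

    # 2. With StrictModes (the sshd default), group/world-writable
    #    ~/.ssh or authorized_keys cause the key to be ignored.
    for f in "$dir" "$dir/authorized_keys"; do
        [ -e "$f" ] || continue
        mode=$(stat -c %a "$f")
        # Octal arithmetic: any write bit for group (020) or others (002)?
        if [ $(( 0$mode & 022 )) -ne 0 ]; then
            echo "$f is writable by group or others (mode $mode)"
            problems=1
        fi
    done

    return "$problems"
}
```

Usage, as root:

check_ssh_dir && echo "~/.ssh looks fine"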

After some investigation it turned out that sshd was configured to forbid root logins: the only way to work as root on the machine was to log in as another user and then use su (or sudo). This setting has now been reverted; I hope it will work now…
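This situation can be spotted directly by looking at the PermitRootLogin keyword in the sshd configuration. The helper below is a hypothetical sketch: it assumes whitespace-separated "Keyword value" lines, reads only the global part of the file (it stops at the first Match block) and, as current OpenSSH documents for sshd_config, takes the first value it finds. On the live machine, sshd -T (run as root) prints the effective value, including compiled-in defaults.

```shell
# Sketch only: hypothetical helper, not part of o2-flp-setup.

# Print the PermitRootLogin value set in an sshd_config-style file
# (default /etc/ssh/sshd_config), or "unset" if it is not set globally.
# Keywords are matched case-insensitively; commented lines never match.
permit_root_login() {
    awk '
        tolower($1) == "match"           { exit }    # global section ends here
        tolower($1) == "permitrootlogin" { print $2; found = 1; exit }
        END                              { if (!found) print "unset" }
    ' "${1:-/etc/ssh/sshd_config}"
}
```

A result of "no" would explain the "Permission denied" seen earlier. After changing the file to PermitRootLogin yes (or prohibit-password, which allows key-based root logins only), sshd must re-read its configuration, for example:

systemctl reload sshd

Unlike authorized_keys, sshd_config is only read when the daemon starts or reloads.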

@divia Thank you. The deployment no longer gets stuck at “Gathering Facts”.

Unfortunately I am still getting errors, although the deployment does run through to the end.

TASK [infologger-gui : Stop obsolete systemd unit] *****************************
fatal: [flpmid]: FAILED! => {
    "changed": false
}

MSG:

Could not find the requested service ilg: host

...ignoring
TASK [control-gui : Stop obsolete systemd unit] ********************************
fatal: [flpmid]: FAILED! => {
    "changed": false
}

MSG:

Could not find the requested service cog: host

...ignoring
TASK [flp-readout.exe-config : Store readout configuration file to consul] *****
fatal: [flpmid]: FAILED! => {
    "changed": true,
    "cmd": "sh -c \"set -o pipefail;unset http_proxy; /opt/alisw/el7/coconut/v0.17.1-1/bin/coconut configuration import readout dummy-readout-flpmid /home/flp/readout.cfg -f cfg --config_endpoint consul://flpmid:8500 --endpoint flpmid:47102 --no-versioning\"\n",
    "delta": "0:00:00.042015",
    "end": "2020-11-16 17:50:46.760378",
    "rc": 1,
    "start": "2020-11-16 17:50:46.718363"
}

STDOUT:




STDERR:

FATAL import <component> <entry> <file_path>: command finished with error error=Get "http://flpmid:8500/v1/kv/?keys=": dial tcp: lookup flpmid on 8.8.8.8:53: no such host


MSG:

non-zero return code

Kind regards
Rene

Hello Renè,

Some of the messages you report are not real errors. They come from cleanup actions that stop things that might have been running but were not, so the failure is not fatal. For example, the “Stop obsolete systemd unit” tasks try to stop services (ilg, cog) that could have been running but are not, which is why Ansible prints “...ignoring” after them.

What really counts is the last line of the procedure, the PLAY RECAP summary: it will report fewer errors than were printed during the run.

Other errors may be consequences of earlier ones. Again, the last line of the procedure has the final word.
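Checking that last line can also be automated. A minimal sketch, assuming the Ansible output has been saved to a file (the file name deploy.log and the function recap_ok are hypothetical): it succeeds only if every host listed after PLAY RECAP has failed=0 and unreachable=0. In recent Ansible versions, failures marked “...ignoring” are counted under ignored= rather than failed=, so they do not trip this check.

```shell
# Sketch only: hypothetical helper, not part of o2-flp-setup.

# Read Ansible output on stdin; succeed only if a PLAY RECAP is present
# and every host in it reports failed=0 and unreachable=0.
recap_ok() {
    awk '
        /^PLAY RECAP/ { in_recap = 1; next }
        in_recap && / : / {                       # host lines: "name : ok=N ..."
            seen = 1
            for (i = 1; i <= NF; i++) {
                n = split($i, kv, "=")
                if (n == 2 && (kv[1] == "failed" || kv[1] == "unreachable") && kv[2] + 0 > 0)
                    bad = 1
            }
        }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}
```

Usage:

o2-flp-setup deploy --head flpmid --flps flpmid 2>&1 | tee deploy.log
recap_ok < deploy.log && echo "deployment OK"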

Do you know which version of the FLP suite was previously installed on your machine, and which version you are trying to install?

What I would do now is reboot the FLP, retry the installation and see what happens. Just check the final line of the output.

If it still fails, I can connect tomorrow and have a look, if this is OK with you.

@divia

Thank you so much for your help.

The suite appears to have been installed correctly.

I reran the deployment this morning and still got the same errors. After a reboot I ran the deployment once more (with the option --modules readout-autoconf):

TASK [flp-readout.exe-config : Store readout configuration file to consul] *****
fatal: [flpmid]: FAILED! => {
    "changed": true, 
    "cmd": "sh -c \"set -o pipefail;unset http_proxy; /opt/alisw/el7/coconut/v0.17.1-1/bin/coconut configuration import readout readout-cru-flpmid /home/flp/readout-cru.cfg -f cfg --config_endpoint consul://flpmid:8500 --endpoint flpmid:47102 --no-versioning\"\n", 
    "delta": "0:00:00.217831", 
    "end": "2020-11-17 10:40:03.043521", 
    "rc": 1, 
    "start": "2020-11-17 10:40:02.825690"
}

STDOUT:




STDERR:

FATAL import <component> <entry> <file_path>: command finished with error error=Get "http://flpmid:8500/v1/kv/?keys=": dial tcp: lookup flpmid on 8.8.8.8:53: no such host


MSG:

non-zero return code


PLAY RECAP *********************************************************************
flpmid                     : ok=7    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0  

Kind regards
Rene

Also, the terminal output has become much shorter:

Running: ansible-playbook /root/.local/share/o2-flp-setup/system-configuration/ansible/flp-multinode.yml -i /tmp/ansible_flp_multinode_inventory065953203 -u root -t readout-autoconf --skip-tags dev,post-installation,trigger

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
ok: [flpmid]

PLAY [head] ********************************************************************

TASK [Gathering Facts] *********************************************************
ok: [flpmid]

PLAY [flps] ********************************************************************

TASK [Gathering Facts] *********************************************************
ok: [flpmid]

TASK [flp-readout.exe-config : Auto-detect device-optimized readout parameters] ***
ok: [flpmid]

TASK [flp-readout.exe-config : Install a configuration file for readout with auto-detected devices] ***
ok: [flpmid]

TASK [flp-readout.exe-config : Check if coconut exist] *************************
ok: [flpmid]

TASK [flp-readout.exe-config : Set consul endpoint] ****************************
ok: [flpmid]

TASK [flp-readout.exe-config : Store readout configuration file to consul] *****
fatal: [flpmid]: FAILED! => {
    "changed": true, 
    "cmd": "sh -c \"set -o pipefail;unset http_proxy; /opt/alisw/el7/coconut/v0.17.1-1/bin/coconut configuration import readout readout-cru-flpmid /home/flp/readout-cru.cfg -f cfg --config_endpoint consul://flpmid:8500 --endpoint flpmid:47102 --no-versioning\"\n", 
    "delta": "0:00:00.217831", 
    "end": "2020-11-17 10:40:03.043521", 
    "rc": 1, 
    "start": "2020-11-17 10:40:02.825690"
}

STDOUT:




STDERR:

FATAL import <component> <entry> <file_path>: command finished with error error=Get "http://flpmid:8500/v1/kv/?keys=": dial tcp: lookup flpmid on 8.8.8.8:53: no such host


MSG:

non-zero return code


PLAY RECAP *********************************************************************
flpmid                     : ok=7    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

To summarize what has been done offline:

  • The FLP needed a proper TCP/IP configuration (hostname, domain name, DNS server; in the end we used Google’s DNS while waiting for an in-house server we could use).
  • The SSH daemon had to be configured to accept SSH logins from root.
  • SELinux had to be disabled.
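The first and last points can be checked before rerunning the deployment. A minimal sketch, assuming glibc's getent and the standard RHEL/CentOS /etc/selinux/config layout; name_resolves and selinux_config_mode are hypothetical names. getent hosts goes through the same name-service order most programs use, so a host name that fails here will likely fail the same way as the "lookup flpmid on 8.8.8.8:53: no such host" error earlier in the thread. The SELinux function reads the mode configured for the next boot; the current runtime mode is what getenforce reports.

```shell
# Sketch only: hypothetical pre-flight helpers, not part of o2-flp-setup.

# Succeed if host name $1 (default: this machine's name) resolves through
# the system resolver, in the order set in /etc/nsswitch.conf
# (typically /etc/hosts first, then DNS).
name_resolves() {
    getent hosts "${1:-$(hostname)}" >/dev/null
}

# Print the SELinux mode configured for the next boot ("enforcing",
# "permissive" or "disabled"), or "absent" if the config file is missing.
selinux_config_mode() {
    cfg="${1:-/etc/selinux/config}"
    if [ ! -r "$cfg" ]; then
        echo absent
        return 0
    fi
    # Matches SELINUX=... but not SELINUXTYPE=... or commented lines;
    # if there are several SELINUX= lines, this sketch uses the last one.
    sed -n 's/^[[:space:]]*SELINUX=\([[:alpha:]]*\).*/\1/p' "$cfg" | tail -n 1
}
```

Usage on the FLP:

name_resolves flpmid || echo "flpmid does not resolve"
selinux_config_mode

Note that setenforce 0 only switches SELinux to permissive until the next boot; SELINUX=disabled in /etc/selinux/config takes effect after a reboot.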

In the end the installation completed without hangs and without errors.