FLP Suite Update - Process stuck at "Gathering Facts"

Dear all,
I am trying to update the flp-suite to v0.11.0.

I am running as root, because running as the flp user gives Python and permission errors:

[flp@localhost ~]$ o2-flp-setup deploy --head localhost --flps localhost --debug

Could not copy SSH key. You will be prompted for the password for user root.

Running: ansible-playbook /home/flp/.local/share/o2-flp-setup/system-configuration/ansible/flp-multinode.yml -i /tmp/flp/ansible_flp_multinode_inventory282574984 -u root --ask-pass --skip-tags dev,post-installation,trigger,readout-autoconf
SSH password: 

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
 [WARNING]: Unhandled error in Python interpreter discovery for host localhost:
Invalid/incorrect password: Permission denied, please try again.

fatal: [localhost]: UNREACHABLE! => {
    "changed": false, 
    "unreachable": true
}

MSG:

Invalid/incorrect password: Permission denied, please try again.


PLAY RECAP *********************************************************************
localhost                  : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0 

I updated the suite using:

o2-flp-setup checkout flp-suite-v0.11.0

Then I ran:

o2-flp-setup deploy --head localhost --flps localhost --debug

which returns the following:

[root@localhost ~]# o2-flp-setup deploy --head localhost --flps localhost --debug

Could not copy SSH key. You will be prompted for the password for user root.

Running: ansible-playbook /root/.local/share/o2-flp-setup/system-configuration/ansible/flp-multinode.yml -i /tmp/ansible_flp_multinode_inventory999081640 -u root --ask-pass --skip-tags dev,post-installation,trigger,readout-autoconf
SSH password: 

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
^C
[root@localhost ~]#  [ERROR]: User interrupted execution

The process continues to get stuck at “Gathering Facts”.
I have tried to reboot the flp and run again, but I get the same issue(s).

@rmonteve Do not use “localhost” but rather the output of hostname -s
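
The advice above can be turned into a quick pre-flight check. The following is only a minimal sketch: the function name `is_usable_hostname` and the list of rejected names are assumptions made for illustration, not part of o2-flp-setup.

```shell
#!/bin/sh
# Return 0 if NAME looks usable as a deployment target, 1 otherwise.
# The rejected names (empty, localhost, localhost.*) are an assumption
# based on the advice in this thread, not an o2-flp-setup rule.
is_usable_hostname() {
    case "$1" in
        ""|localhost|localhost.*) return 1 ;;
        *) return 0 ;;
    esac
}

# Check the short hostname of the current machine.
name="$(hostname -s)"
if is_usable_hostname "$name"; then
    echo "OK: would deploy with --head $name --flps $name"
else
    echo "WARNING: short hostname is '$name'; rename the machine before deploying"
fi
```

If the check passes, the same name can be passed to the deploy command shown earlier in place of “localhost”.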

Thank you for your reply, Roberto.

I believe the hostname is localhost.

[flp@localhost ~]$ hostname -s
localhost

Would the (only) solution be to change the computer's hostname?

Kind regards
Rene

Does the machine have a network connection? How do you reach it, and which name do you use?

I connect to it over SSH (yes, it has a network connection).
E.g. flp@154.xxx.xxx.xxx
Here “flp” is the actual username, not a placeholder.

And the 154.XXX… has no hostname? Strange. Is it DHCP or fixed IP address?

I believe it was set up with a fixed IP.

We have had issues with the suite in the past. It was previously suggested that the hostname was causing issues (which we managed to resolve).

Would I be correct to assume that we should ideally alter the hostname? We have tried to put this off, as it may have far-reaching repercussions on the rest of the testbench system.

Do you have any ideas for alternative solutions?

Kind regards
Rene

OK, try to use the IP address instead of “localhost”. You should also try to do, from the flp, ssh 154.XXX and see if it works.

To ease the installation you can also create an SSH key and authorize it:

ssh-keygen -t dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Hi,

As Roberto suggested, using the IP address might solve the issue. But naming a machine localhost is asking for trouble. I would still recommend renaming the machine and redeploying.

Cheers,
Vasco

I second that. Using IP addresses instead of hostnames can lead to funny results, and this holds for many packages, not only for the FLP suite.

So it seems it would be best if I were to rename and redeploy, then see what happens.

Thank you for your help so far, Roberto and Vasco.

I’ll speak with my team to see when it will be possible for me to do this.
I will probably be able to report back on the redeployment on Monday.

Kind regards
Rene

Please do and let us know how it went. Which group is this FLP for?

Hello Roberto and Vasco

@divia this FLP is for the ALICE-MID User Logic group.

Since we last spoke, I managed to change the hostname from ‘localhost’ to ‘flpmid’.

[root@flpmid ~]# hostname
flpmid

Unfortunately I am still getting the same problem (the deployment not going past ‘gathering facts’).

[root@flpmid ~]# o2-flp-setup deploy --head flpmid --flps flpmid --debug

Could not copy SSH key. You will be prompted for the password for user root.

Running: ansible-playbook /root/.local/share/o2-flp-setup/system-configuration/ansible/flp-multinode.yml -i /tmp/ansible_flp_multinode_inventory780842597 -u root --ask-pass --skip-tags dev,post-installation,trigger,readout-autoconf
SSH password: 

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
^C
[root@flpmid ~]#  [ERROR]: User interrupted execution

Additionally, I re-cloned and reinstalled the suite, but it still gets stuck.

Is this potentially an SSH configuration (sshd_config) issue?

Kind regards
Rene

OK, well done for setting the hostname.

Let us go step by step. First: ssh. Can you please try to do:

ssh root@flpmid hostname

and post the output?

@divia
Please see the command and output below

[root@flpmid ~]# ssh root@flpmid hostname
Enter passphrase for key '/root/.ssh/id_rsa': 
flpmid

Good. Next step: allow passwordless ssh. This is optional, but it will make life easier. As root do:

ssh-keygen -t dsa
<< answer with the default values by simply pressing RETURN every time >>
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Now the ssh root@flpmid hostname command should run passwordless.
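
One way to confirm that the login really is passwordless, without risking a hidden prompt, is to run ssh in batch mode. This is a minimal sketch: the helper name `check_passwordless_ssh` is hypothetical, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
#!/bin/sh
# Try a non-interactive SSH login to TARGET and report the result.
# BatchMode=yes makes ssh fail instead of prompting for a password
# or a key passphrase, so a prompt can no longer hide as a "hang".
check_passwordless_ssh() {
    target="$1"
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" hostname >/dev/null 2>&1; then
        echo "passwordless SSH to $target works"
    else
        echo "passwordless SSH to $target FAILED"
        return 1
    fi
}
```

For the machine in this thread the call would be `check_passwordless_ssh root@flpmid`. If it reports a failure, the deployment is likely to stop at the same point, since Ansible needs a working SSH login before it can gather facts.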

Does this have to be DSA, or can we stick with RSA?

You can stick with RSA, no problems. But it must be password-less.

If you want to keep a password-based RSA key, then we will have to use a trick.

I ran the following:

[root@flpmid ~]# ssh-keygen -t rsa -b 4096
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:oixyejW958RLDaJGtHzj6f7VT1J/3sVyg1qvQI5A380 root@flpmid
The key's randomart image is:
+---[RSA 4096]----+
|                 |
|                 |
|    .  .         |
|   o .. . . o    |
|    +.=.S. o E.  |
|   oo=.*.o+. ..o |
|. o.=.o.+.ooooo.*|
| +.o ..o.o  ++.+=|
|..   .o++  . .o.o|
+----[SHA256]-----+

But when I try to ssh:

[root@flpmid ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@flpmid ~]# ssh root@flpmid hostname
root@flpmid's password: 
Permission denied, please try again.
root@flpmid's password: 
Permission denied, please try again.
root@flpmid's password: 
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

I believe that there should be no issues, but now I get:

[root@flpmid ~]# o2-flp-setup deploy --head flpmid --flps flpmid --debug
2020/11/16 12:29:10 target flpmid is unreachable

Running: ansible-playbook /root/.local/share/o2-flp-setup/system-configuration/ansible/flp-multinode.yml -i /tmp/ansible_flp_multinode_inventory759807307 -u root --skip-tags dev,post-installation,trigger,readout-autoconf

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
 [WARNING]: Unhandled error in Python interpreter discovery for host flpmid:
Failed to connect to the host via ssh: Permission denied (publickey,gssapi-
keyex,gssapi-with-mic,password).

fatal: [flpmid]: UNREACHABLE! => {
    "changed": false, 
    "unreachable": true
}

MSG:

Data could not be sent to remote host "flpmid". Make sure this host can be reached over ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).



PLAY RECAP *********************************************************************
flpmid                     : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   

Again: you are setting up a password-based RSA key. Try not to set a password…

When ssh-keygen says "Enter passphrase (empty for no passphrase): " just press return, do not enter any text. You will not lose any security (after all, you are just authorizing root to SSH into the same machine, which is not a security threat at all).

If you really want to set a non-empty password, then we will have to use a trick. A password-less RSA key would be much easier.
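
The key generation can also be done non-interactively, which avoids typing a passphrase by accident. The following is a minimal sketch under a few assumptions: the function name `setup_passwordless_key` is hypothetical, the key type and size follow the RSA 4096 choice used earlier in the thread, and the `chmod` calls are a precaution because sshd usually ignores keys when these files are writable by other users (its `StrictModes` setting).

```shell
#!/bin/sh
# Create a passphrase-less RSA key pair in SSH_DIR and authorize it for
# logins to the same account. SSH_DIR defaults to ~/.ssh.
setup_passwordless_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    key="$ssh_dir/id_rsa"

    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

    # Refuse to overwrite an existing key; move it aside by hand if intended.
    if [ -e "$key" ]; then
        echo "refusing to overwrite existing key $key" >&2
        return 1
    fi

    # -N "" sets an empty passphrase; -q suppresses the fingerprint and randomart.
    ssh-keygen -q -t rsa -b 4096 -N "" -f "$key" || return 1

    # Authorize the new public key and restrict the file permissions.
    cat "$key.pub" >> "$ssh_dir/authorized_keys" || return 1
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Called without arguments as root, this operates on /root/.ssh. Because the function refuses to overwrite an existing key, a passphrase-protected /root/.ssh/id_rsa like the one in this thread would have to be moved aside first.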