Using institutional HTCondor cluster with O2/O2DPG/O2Physics? (Connection failure to CCDB)

maogino · January 25, 2023, 7:44am

Hi,

When I run O2 workflows from a shell script interactively, they work as usual. However, when I submit the same script as a batch job to the HTCondor cluster we maintain for my lab, it fails to connect to CCDB. It should be technically possible to run O2 under HTCondor, as we do on the Grid.

Is anyone using HTCondor locally to execute O2 workflows? If so, would you share some tips?

Cas · January 26, 2023, 11:01am

Do you pass the proper environment to HTCondor? I remember struggling a lot with this when I tried setting this up. HTCondor does not simply transfer the environment of alienv when you set getenv = true in your submit file. I am unsure as to why it does not, but I found a workaround.

First I print the environment of alienv into a .sh file which I then tell HTCondor to transfer. This file should be sourced in your executable.

I named the file that prints the environment “prepare_env.sh”, and it looks like this:

```
#!/bin/bash

export ALIBUILD_WORK_DIR=<YourAliBuildWorkDir>
eval `/usr/local/bin/alienv shell-helper`
export PATH=${PATH}

# o2_env.sh
echo "#!/bin/bash" > o2_env.sh
echo "" >> o2_env.sh
alienv printenv O2/latest-dev-o2 >> o2_env.sh
sed -i "s|test 0;||g" o2_env.sh
chmod a+x o2_env.sh
```

It produces a file with the environment paths that should be transferred with HTCondor.
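
As a minimal sketch of the job side, assuming o2_env.sh has been transferred into the job's working directory (for example with transfer_input_files in the submit file), the executable could source it before starting the workflow. The wrapper name run_job.sh, the function load_o2_env and the workflow script name are hypothetical and only serve as illustration.

```shell
#!/bin/bash
# run_job.sh: hypothetical wrapper used as the HTCondor executable.
# Assumption: o2_env.sh sits in the job's working directory.

load_o2_env() {
    # Path to the environment file; defaults to ./o2_env.sh
    local env_file="${1:-./o2_env.sh}"
    if [ ! -r "$env_file" ]; then
        echo "Environment file not found: $env_file" >&2
        return 1
    fi
    # Source (rather than execute) the file so that its exports
    # stay in the shell that later runs the workflow.
    . "$env_file"
}

# Example use inside the job (uncomment and adapt):
# load_o2_env ./o2_env.sh || exit 1
# ./my_o2_workflow.sh   # hypothetical workflow script
```

Sourcing matters here: running o2_env.sh as a separate process would set the variables only in that child process, and the workflow would still see the bare HTCondor environment.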

I hope this helps, but I am unsure as to whether this is actually your problem.

swenzel · January 26, 2023, 1:57pm

I believe the problem is that on the HTCondor nodes, your Alien GRID certificate (or token) is not set up or available. Access to CCDB requires it to be present.

On our GRID nodes this is usually the case automatically but on a local HTCondor farm probably not.

There are techniques to export your ALIEN Grid certificate as environment variables or files to the worker nodes. Maybe @grigoras can comment?

swenzel · January 27, 2023, 1:47pm

To unblock you with this, I think that one of the following approaches might work:

You initialize a GRID token on a machine where your GRID certificate is installed (typically you should have a file ~/.globus/usercert.pem). This can be done by typing alien.py ls which demands a password. Upon successful execution, there will be files tokencert_XXX.pem and tokenkey_XXX.pem somewhere under /tmp/.
(Option 1) – You simply copy/sync those files to the HTCondor nodes where the job will run. Whether this is possible depends on your setup, for example on whether a shared file system is available.
(Option 2) – You export the content of these files as environment variables within your script submitted to HTCondor (not discussing security here):
```
export JALIEN_TOKEN_CERT='CONTENT OF TOKENCERT FILE'
export JALIEN_TOKEN_KEY='CONTENT OF TOKENKEY FILE'
```

With these methods, a simple call to alien-token-info should succeed and show the token information.
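
A minimal sketch of Option 2, assuming the two token files are readable on the machine where the variables are filled in. The function name export_token_content is hypothetical, and the file paths are passed in explicitly because the actual file names under /tmp/ depend on your system.

```shell
#!/bin/bash
# Hypothetical helper: load the token file contents into the
# JALIEN_TOKEN_CERT / JALIEN_TOKEN_KEY environment variables.

export_token_content() {
    local cert_file="$1" key_file="$2"
    if [ ! -r "$cert_file" ] || [ ! -r "$key_file" ]; then
        echo "Token files not readable: $cert_file $key_file" >&2
        return 1
    fi
    # Command substitution drops trailing newlines; the PEM body is kept.
    JALIEN_TOKEN_CERT="$(cat "$cert_file")"
    JALIEN_TOKEN_KEY="$(cat "$key_file")"
    export JALIEN_TOKEN_CERT JALIEN_TOKEN_KEY
}

# Example use (paths are placeholders, adapt to your system):
# export_token_content /path/to/tokencert.pem /path/to/tokenkey.pem || exit 1
# if command -v alien-token-info >/dev/null 2>&1; then
#     alien-token-info || exit 1   # stop early if the token is not usable
# fi
```

Running alien-token-info at the top of the job, as in the commented lines, makes an authentication problem show up immediately instead of later as a CCDB connection failure.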

grigoras · January 27, 2023, 2:22pm

Sandro is right: with your full Grid certificate, any alien.py connection will first create these token*.pem files in your tmp directory.

In addition to that, you can also call the alien.py token command to print the same content on screen. To improve security, try something like alien.py token -v 2 to give the generated identity only a 2-day validity. It is better to keep the validity very short and, if needed, overwrite the tokens tomorrow with another fresh, short-lived pair.

Then indeed, either take the two respective contents and export them in the environment of the job, or set those environment variables to the path to those token files somewhere in your own space on the shared file system (ideally all nodes would see the same location).
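
A minimal sketch of the path-based variant, assuming a directory on the shared file system that all worker nodes can see. The function name stage_token_files, the destination file names and the example directory are hypothetical; the permissions are tightened because the token files act as credentials.

```shell
#!/bin/bash
# Hypothetical helper: copy the token pair into a private directory
# on the shared file system and point the variables at the copies.

stage_token_files() {
    local cert_src="$1" key_src="$2" dest_dir="$3"
    # Create the directory and make it accessible to the owner only.
    mkdir -p "$dest_dir" && chmod 700 "$dest_dir" || return 1
    cp "$cert_src" "$dest_dir/tokencert.pem" || return 1
    cp "$key_src" "$dest_dir/tokenkey.pem" || return 1
    chmod 600 "$dest_dir/tokencert.pem" "$dest_dir/tokenkey.pem" || return 1
    export JALIEN_TOKEN_CERT="$dest_dir/tokencert.pem"
    export JALIEN_TOKEN_KEY="$dest_dir/tokenkey.pem"
}

# Example use on the submit machine (paths are placeholders):
# stage_token_files /path/to/tokencert.pem /path/to/tokenkey.pem "$HOME/.o2tokens"
```

The staging step runs where the token was created; the job script itself then only needs the two export lines pointing at the same shared location.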

maogino · January 30, 2023, 8:02am

Thank you for your suggestions and comments, @Cas, @swenzel and @grigoras!

Casper Arie Van Veen:

I named the file that prints the environment “prepare_env.sh” and it looks like this

```
#!/bin/bash

export ALIBUILD_WORK_DIR=<YourAliBuildWorkDir>
eval `/usr/local/bin/alienv shell-helper`
export PATH=${PATH}

# o2_env.sh
echo "#!/bin/bash" > o2_env.sh
echo "" >> o2_env.sh
alienv printenv O2/latest-dev-o2 >> o2_env.sh
sed -i "s|test 0;||g" o2_env.sh
chmod a+x o2_env.sh
```

It produces a file with the environment paths that should be transferred with HTCondor.

I have used shell-helper and printenv in my shell script. I am sorry for the missing information.

I had assumed that sharing /home (and thus ~/.globus/usercert.pem) over NFS would be enough for authentication, and did not consider /tmp. That is probably the root cause.
I will try your suggestions.

alien.py token will be very helpful: alien-token-init puts the tokens of all users in the same /tmp directory, which confuses people (students especially) trying to find their own token. Thanks!