Errors when trying to create an environment

Hi O2 FLP team,

I am trying to start a readout with our FLP test setup and the ECS software.
The FLP suite appears to be set up correctly: I can see the expected settings in Consul, and coconut info looks okay:

    flp6[workflows]$ coconut info
    instance name:      AliECS instance
    endpoint:           127.0.0.1:47102
    core version:       AliECS 0.15.0 revision c4b7aa1
    framework id:       4e35bc46-7a3a-4f94-9792-f50eefcf7818-0000
    environments count: 0
    active tasks count: 0
    global state:       CONNECTED

    flp6[workflows]$ coconut repo list
    Git repositories used as configuration sources:

      ID |                REPOSITORY                 | DEFAULT |    DEFAULT REVISION
    -----+-------------------------------------------+---------+-------------------------
       0 | github.com/AliceO2Group/ControlWorkflows/ |         | flp-suite-v0.9.0
       1 | github.com/larks/ControlWorkflows/        | YES     | tpc-test-sector-v0.9.0

    Global default revision: flp-suite-v0.9.0

We use the machine flp6 as the head node, and have FECs connected to CRUs in the five machines flp0-flp4.
I forked the O2 ControlWorkflows repo, branched off the flp-suite-v0.9.0 branch, and added a template that defines a readout configuration for our setup (only one CRU per machine for now):
workflow template
readout config

When I try to create an environment with the ECS GUI, I get the following error:
    Request to server failed (504 Gateway Timeout): 13 INTERNAL: cannot create new environment: transition canceled with error: transition unsuccessful: ok: false, trigger: DEVICE_ERROR, event: CONFIGURE, state: ERROR

If I try to create an environment with coconut, I get the following error (also with the official O2 templates):

    flp6[workflows]$ coconut environment create -w readout --extra-vars hosts=["flp0"] readout_cfg_uri=file:/tools/ecs/tpc-ecs-support-files/rdo-simple-cru.cfg
    FATAL create:      command finished with error error=rpc error: code = Internal desc = cannot create new environment: cannot load workflow template: invalid character 'l' in literal false (expecting 'a')
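A plausible cause of this coconut error, offered as an assumption and not confirmed anywhere in this thread, is shell quoting. The message is exactly what Go's encoding/json package reports when it is handed [flp0], i.e. the hosts list with its inner double quotes removed: the parser sees f, expects the literal false, and stops at the l. Assuming a bash shell, the unquoted argument hosts=["flp0"] goes through quote removal before coconut ever receives it, which would also explain why the official templates fail in the same way. The snippet below only demonstrates the quote removal; it does not call coconut.

```shell
#!/usr/bin/env bash
# Unquoted: bash strips the double quotes before the program sees the argument.
# (The brackets are a glob pattern too, but they stay literal unless a file
# such as "hosts=f" happens to exist in the current directory.)
unquoted=$(printf '%s' hosts=["flp0"])

# Single-quoted: the value is passed through verbatim, so it is still valid JSON.
quoted=$(printf '%s' 'hosts=["flp0"]')

echo "unquoted: $unquoted"   # unquoted: hosts=[flp0]
echo "quoted:   $quoted"     # quoted:   hosts=["flp0"]
```

If this is indeed the cause, single-quoting that one argument, i.e. --extra-vars 'hosts=["flp0"]' with the rest of the command unchanged, should let the JSON list reach AliECS intact.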

Any idea what I am missing/doing wrong?

Hi Lars,

It looks like the CONFIGURE transition is failing.

Do you see any error messages in the InfoLogger GUI? There should be a link to it on the top-level web page that lists all the GUIs.
That might tell us more about what’s wrong.

Cheers,
Vasco

Hi Vasco,

Thanks for the hint! Seems like my brain is on vacation…
In the GUI it was indeed picking up the wrong readout config file.
After I added the readout_cfg_uri variable, the environment moved to CONFIGURED.
Thanks a lot!

Another related question:
I have used coconut to set the default revision of my repo to a branch other than “flp-suite-v0.9.0”, but in the AliECS GUI the default revision for this repo is still “flp-suite-v0.9.0”.
Where does the AliECS GUI get the revision from? Does it use the global default instead of the per-repo one? Do I need to push this change somehow so that the GUI sees it?

Hi Lars,

I will let my colleagues confirm tomorrow, but I think what you need to do is run

    coconut repo default-revision your-default-revision

on the head node as described here.

Let me know if it helped.

Cheers,
Vasco

Expanding on Vasco’s answer, you can check the current defaults with coconut repo list. The global default should only apply when a per-repo default isn’t defined; if the global default overrides a per-repo default, that is a bug.
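The precedence rule described here (a repo's own default revision first, the global default revision only as a fallback) can be checked against the coconut repo list output. Below is a minimal sketch of a hypothetical helper, effective_default_revision (a made-up name, not part of coconut), assuming the table layout shown in this thread stays the same. It prints the repository marked DEFAULT and the revision that should apply to it, which gives something concrete to compare with what the GUI pre-selects.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of coconut). Reads `coconut repo list` output on
# stdin, assuming the table layout shown in this thread, and prints the repo
# marked DEFAULT followed by the revision that should apply to it: the repo's
# own default revision if one is set, otherwise the global default revision.
effective_default_revision() {
  awk -F'|' '
    /^[ \t]*Global default revision:/ {
      sub(/^[^:]*:[ \t]*/, "")          # keep only the value after the colon
      sub(/[ \t]+$/, "")
      global = $0
      next
    }
    NF >= 4 && $3 ~ /YES/ {             # the row whose DEFAULT column says YES
      repo = $2; rev = $4
      gsub(/[ \t]/, "", repo)
      gsub(/[ \t]/, "", rev)
    }
    END {
      if (repo == "") exit 1            # no default repo found
      out = (rev != "") ? rev : global  # per-repo revision wins over global
      print repo, out
    }'
}
```

Used as coconut repo list | effective_default_revision, the listing from the first post would yield github.com/larks/ControlWorkflows/ tpc-test-sector-v0.9.0, i.e. the revision the GUI would be expected to pre-select.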

To guarantee that coconut env create pulls the workflow template you expect, use coconut repo default-revision <repo index> <branch name> to set a specific branch as the default for that repo, and coconut repo default <repo index> to make your own repo the default one.
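The two commands above can be wrapped in a small shell function. This is a hypothetical convenience helper (set_repo_defaults is a made-up name, not a coconut feature) that uses only the coconut subcommands quoted in this reply, stops at the first command that fails, and prints the repository table afterwards so the DEFAULT and DEFAULT REVISION columns can be checked.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of coconut): make repo <index> the default
# configuration source, pin its default revision to <branch>, then list the
# repos so the result can be verified. Returns the exit status of the first
# coconut command that fails.
set_repo_defaults() {
  local repo_index="$1" branch="$2"
  coconut repo default "$repo_index" || return
  coconut repo default-revision "$repo_index" "$branch" || return
  coconut repo list
}

# Example for the setup in this thread (repo 1 = github.com/larks/ControlWorkflows/):
#   set_repo_defaults 1 tpc-test-sector-v0.9.0
```

Both settings are applied together because, as described above, the default revision only takes effect for the repo that coconut env create actually uses.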

Cheers,

Hi Teo, Vasco,

Yes, that is exactly where my confusion lies.
As you can see in my first post, I set my own repo as the default and changed its default revision; here is the listing again:

    flp6[workflows]$ coconut repo list
    Git repositories used as configuration sources:

      ID |                REPOSITORY                 | DEFAULT |    DEFAULT REVISION
    -----+-------------------------------------------+---------+-------------------------
       0 | github.com/AliceO2Group/ControlWorkflows/ |         | flp-suite-v0.9.0
       1 | github.com/larks/ControlWorkflows/        | YES     | tpc-test-sector-v0.9.0

    Global default revision: flp-suite-v0.9.0

However, if I open a new instance of the ECS GUI, the default revision is set to the global one:

It is only after I switch to a different repo in the drop-down menu and then back to my own repo (which is set as the default) that the correct default revision appears (please see the following screen capture).


I see the same behavior in both Firefox and Chromium.

Hi @lbratrud,

I can confirm that the behavior you are seeing is incorrect: it is a bug on the AliECS GUI side, independent of the browser.

I raised a ticket for this and added you as a watcher: https://alice.its.cern.ch/jira/browse/OGUI-660. The fix will be released in the coming days.

Have a nice day,
George

Hi @graduta,
Thanks for the update and confirmation :slight_smile:

Hi @lbratrud,

We have released a new version of the AliECS GUI (1.6.12). Among other new features, it includes a fix for the bug you reported.

Thank you for the report and let me know if it works as expected!

Have a nice day,
George