AliECS workflow and task templates documentation

lbratrud · September 10, 2020, 8:53am

Dear experts,

Is there any more user-friendly or in-depth documentation for the Control Workflows?
For example, what the different options in the YAML files do, or how to access a variable set in the parent workflow template from a task template that the workflow launches.

I see that work has begun on a handbook; I guess it will be filled with this kind of information?
Is there already a development fork or branch of it that one could follow?

malexis · September 15, 2020, 8:55am

Dear Lars,

There are some more in-depth examples in the master branch of the ControlWorkflows repository: a workflow template with comments explaining the different values, and likewise a task template. The handbook will contain this kind of information. Unfortunately, there is no branch for it at the moment.

Kind Regards,
Miltiadis

dstocco · February 4, 2021, 1:49pm

Dear all,
I’d also like to use AliECS instead of my custom scripts to launch the acquisition.
However, I cannot find a clear tutorial on how to do it.
For example, I’ve had a look at the main repository page (GitHub - AliceO2Group/Control: AliECS - The ALICE Experiment Control System), but it does not explain how to launch AliECS.
From searching around, I gather that I need to open Firefox and connect to localhost:8080, but I’m not sure about it.
Is there a basic tutorial that I can follow?
Thanks in advance,
best regards,
Diego

malexis · February 4, 2021, 2:06pm

Dear @dstocco,

Do you want to launch the AliECS on a machine where the FLP Suite is installed?

Kind Regards,
Miltiadis

dstocco · February 4, 2021, 2:17pm

Dear @malexis ,
yes, I want to launch AliECS on 3 FLPs (one of which is at P2) where the latest FLP Suite 0.14 is installed.
The idea is to launch a standard acquisition, with readout.exe + StfBuilder + possibly QC.

Actually, for the moment the QC part is inside O2, so being able to launch the DPL proxy plus my devices would be great… but just running the QC would already be OK.

Thanks!
Cheers,
Diego

malexis · February 4, 2021, 2:46pm

Dear @dstocco,

Okay, since the machines are installed with FLP Suite 0.14, the AliECS components are present and running. To use AliECS, go to the machine serving as the head node, where you can verify that the o2-aliecs-core service is running with systemctl status o2-aliecs-core.
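If you want this check inside a script run on the head node before creating environments, it can be wrapped in a small function. A minimal bash sketch: the function name aliecs_core_running is hypothetical, and it relies only on the standard systemd command systemctl is-active --quiet, which prints nothing and exits with status 0 when the unit is active.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the AliECS core service is active.
# `systemctl is-active --quiet UNIT` prints nothing and exits 0 when UNIT is active.
aliecs_core_running() {
  if systemctl is-active --quiet o2-aliecs-core; then
    echo "o2-aliecs-core is active"
    return 0
  fi
  echo "o2-aliecs-core is NOT active; check: systemctl status o2-aliecs-core" >&2
  return 1
}

# Example usage on the head node:
#   aliecs_core_running || exit 1
```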

Now if the machines (FLPs) are all part of the cluster, then you can launch environments in two different ways.

The first is through the AliECS GUI: visit headnode:8080 (replacing headnode with the hostname of your head node) and you should see the AliECS GUI page.

If you go to the + Create tab, you will see a page that lets you deploy specific tasks to the selected FLPs. Note that the FLPs must be part of the head node's cluster in order to be accessible.

You should see three workflows; more information on the different configurations, and on workflows in general, can be found here.

The second way to deploy environments is through the coconut CLI tool. You can ssh to the head node and execute the following commands (using the readout-dataflow workflow as an example):

module load coconut
coconut environment create -w readout-dataflow@flp-suite-v0.14.0 -e '{"hosts":["flp1","flp2","flp3"]}'
coconut environment control [environment id] -e start   # start a new run
coconut environment control [environment id] -e stop    # stop the run

For more information, you can always execute coconut --help
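When the host list changes often, it can help to build the -e argument from a list of hostnames instead of hand-writing the JSON. A minimal bash sketch: the helper name hosts_json is hypothetical, it only reproduces the {"hosts":[...]} shape used in the command above, and it assumes hostnames contain no quotes or backslashes (it does not escape them).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print {"hosts":["h1","h2",...]} for the given hostnames.
# Assumes hostnames contain no characters that need JSON escaping.
hosts_json() {
  local out='{"hosts":['
  local sep=''
  local h
  for h in "$@"; do
    out+="${sep}\"${h}\""   # append "host", preceded by a comma after the first one
    sep=','
  done
  out+=']}'
  printf '%s\n' "$out"
}

# Example usage (same workflow and revision as above):
#   coconut environment create -w readout-dataflow@flp-suite-v0.14.0 \
#     -e "$(hosts_json flp1 flp2 flp3)"
```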

Let me know if I can help you with something.

Kind Regards,
Miltiadis

dstocco · February 4, 2021, 3:05pm

Hi @malexis ,
I checked that the o2-aliecs-core is indeed running.
I then tried the GUI option. However, the hostname headnode cannot be resolved, so headnode:8080 does not work.
I therefore tried localhost:8080, and there I can see the page you posted. But if I try to create the readout-dataflow workflow, I get the error message:

Request to server failed (403 Forbidden): Control is not locked

Note also that under Revision I only have master and do not see v0.14.0 (as if the latest version had not been fetched?).

I installed the flp-suite as root on that machine and deployed to the machine itself. I can launch readout.exe as user flp, so the permissions should be ok.

Cheers,
Diego

dstocco · February 4, 2021, 3:09pm

Hi again,
to avoid misunderstanding, I’m not using the three FLPs at the same time. I deploy the FLP suite to each one of them separately (they are two independent test-benches + the official MID FLP at P2).
So, at least for the test-benches, the --head and --flps coincide.
Not sure about the FLP at P2…I’ll first play with the test-benches and then move to P2.
Cheers,
Diego

divia · February 4, 2021, 3:11pm

At Point 2 we have alio2-cr1-flp182…186 together with alio2-cr1-flp182 as headnode.

malexis · February 4, 2021, 3:38pm

Hi @dstocco ,

Yes, the Request to server failed (403 Forbidden): Control is not locked error means that you need to click on the lock icon first.

Regarding flp-suite-v0.14.0 not being there, you can hit the refresh button and then try changing the revision. The other way is to use coconut from the machine (since it is standalone and used as both FLP and head node), by executing:

module load coconut
coconut repo refresh
coconut repo default-revision 0 master
coconut repo list # to validate the changes
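The three coconut steps above can be grouped into one shell function, so that a failed refresh stops the sequence before the default revision is changed. A minimal sketch: the function name set_default_revision is hypothetical, it uses only the coconut subcommands shown above, and it assumes coconut is already available in the current shell (i.e. module load coconut has been run).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the coconut commands shown above.
# Arguments: repo id (default 0) and revision (default master).
set_default_revision() {
  local repo_id="${1:-0}"
  local revision="${2:-master}"
  # Refresh first; && stops at the first failing step.
  coconut repo refresh &&
    coconut repo default-revision "$repo_id" "$revision" &&
    coconut repo list   # print the repos to validate the change
}

# Example usage:
#   module load coconut
#   set_default_revision 0 master
```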

Cheers,
Miltiadis

dstocco · February 4, 2021, 4:12pm

Hi @malexis ,
it works now, thanks.
For my information, where are the workflows stored? So far the acquisition seems to be running, but the output is neither stored nor analysed. If I can access the workflow directory, I guess I can add my tasks as well…
Thanks!
Cheers,
Diego

malexis · February 4, 2021, 4:21pm

Hi @Diego,

The workflows are templates, which are stored in the github.com/AliceO2Group/ControlWorkflows repository. You can always fork it and create your own templates based on the tasks that you want to execute. Once this is done, you have to point AliECS to your repository:

module load coconut
coconut repo add github.com/miltalex/ControlWorkflows # my fork
coconut repo list # note the id of the repo added above, let's say id=1
coconut repo default 1 # make the repo with id 1 the default
coconut repo refresh
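Because the repo id has to be read from the coconut repo list output, this sequence is hard to automate fully. A minimal sketch that splits it into two hypothetical helper functions around that manual step, using only the coconut subcommands shown above; the fork URL is whatever you pass in.

```shell
#!/usr/bin/env bash
# Step 1 (hypothetical helper): register a fork and list repos so its id can be read.
add_workflow_fork() {
  local repo_url="$1"   # e.g. github.com/<user>/ControlWorkflows
  [ -n "$repo_url" ] || { echo "usage: add_workflow_fork <repo-url>" >&2; return 2; }
  coconut repo add "$repo_url" && coconut repo list
}

# Step 2 (hypothetical helper): make the repo with the given id the default, then refresh.
make_default_repo() {
  local repo_id="$1"    # id read from the `coconut repo list` output
  [ -n "$repo_id" ] || { echo "usage: make_default_repo <repo-id>" >&2; return 2; }
  coconut repo default "$repo_id" && coconut repo refresh
}
```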

To access the output, go to the InfoLogger browser (infobrowser) at localhost:8081. For the rest of the components, such as monitoring, QC and Configuration, visit localhost and you should see a directory page linking to them.
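To quickly see which of the web interfaces mentioned in this thread respond on a given machine, a small curl loop can help. A minimal sketch: the helper name check_web_uis is hypothetical, ports 8080 (AliECS GUI) and 8081 (InfoLogger browser) come from this thread, port 80 is an assumption for the plain localhost page, and it only reports the HTTP status code without interpreting it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the HTTP status code returned on each port of HOST.
# 8080 and 8081 are from this thread; 80 is assumed for the plain "localhost" page.
check_web_uis() {
  local host="${1:-localhost}"
  local port code
  for port in 8080 8081 80; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code,
    # --max-time: give up after 5 s. curl reports 000 when no response arrives.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://${host}:${port}/") || true
    printf '%s:%s -> %s\n' "$host" "$port" "$code"
  done
}

# Example usage:
#   check_web_uis localhost
```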

Cheers,
Miltiadis

dstocco · February 4, 2021, 4:30pm

Hi @malexis ,
OK, but isn’t there a local cache I can modify to test my workflow? It is a bit of overkill to have to push to my fork and reload every time I want to modify my tasks.
Or should I move to coconut in this case?

Note that the default StfBuilder is configured with detector = TPC, so I guess one has to modify the workflows anyway…

malexis · February 4, 2021, 4:57pm

Hi @Diego,

The local cache can be found at /var/lib/o2/aliecs/repos. The problem with modifying the local cache directly is that the permissions will change immediately, which may result in a permission denied error. Also, every time AliECS deploys a workflow, it checks out the most recent revision of the branch, so local edits may not survive.

The easiest way is to fork, add the repo once through coconut, and then refresh after every change. And yes, you do need to change the task configuration.
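The edit, push and refresh loop described here can be shortened to a single command. A minimal sketch, assuming it runs inside a local clone of the fork with a configured git remote, and that the fork has already been added and made the default through coconut; the function name push_and_refresh is hypothetical, and it only runs git push followed by coconut repo refresh.

```shell
#!/usr/bin/env bash
# Hypothetical helper for iterating on a forked ControlWorkflows repository:
# push local commits, then let AliECS fetch the new revision.
# Assumes the current directory is the local clone and coconut is available.
push_and_refresh() {
  git push || { echo "git push failed; not refreshing" >&2; return 1; }
  coconut repo refresh
}

# Example usage after committing a task template change:
#   git commit -am "Set detector to MID in StfBuilder task"
#   push_and_refresh
```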

Kind Regards,
Miltiadis

dstocco · February 4, 2021, 5:02pm

Hi @malexis ,
ok, thanks for the detailed explanation. I’ll go for the fork then.

Thanks!
Cheers,
Diego

malexis · February 4, 2021, 5:04pm

Hi @dstocco ,

Great, let me know if I can help you with anything else.

Kind regards,
Miltiadis

dstocco · February 5, 2021, 8:05am

Dear @malexis ,
I tried your suggestion and forked the repository.
I refreshed with coconut and set the repository as default. From the WebUI I can select my repository and the desired branch.
However, if I try to create a workflow from my fork I get:

Request to server failed (504 Gateway Timeout): 4 DEADLINE_EXCEEDED: Deadline exceeded

Is there any step I’m missing?
Thanks in advance,
cheers,
Diego

malexis · February 5, 2021, 9:07am

Dear @dstocco ,

The DEADLINE_EXCEEDED error means that there was an issue while creating the environment. To get past it, I need a bit more information from the InfoLogger about the errors that occurred during deployment.

Kind Regards,
Miltiadis

dstocco · February 5, 2021, 9:29am

Hi @malexis ,
Ah OK, so there might be something wrong with my tasks. For the moment I just changed the detector name to MID, so the issue might be that the MID part is not yet in QC for this version (I have to check further).

Just for my education: is there a way to export the errors from the infologger?

malexis · February 5, 2021, 9:34am

Hello @dstocco ,

Unfortunately, I am not familiar with all the options InfoLogger provides; I usually use it through the browser. However, the InfoLogger experts (@sy-c) may have more information.

Cheers,
Miltiadis