QC workflow writing to local CCDB: Internal server error 500

Dear experts,

I am trying to run a local CCDB and QCG on a remote slc7 machine, where I do not have root permissions.

I can start the CCDB and QCG without any problem. However, when I run a QC workflow e.g.

o2-qc-run-tpctrackreader -b | o2-qc -b --config json:/${QUALITYCONTROL_ROOT}/etc/tpcQCPID_sampled.json

writing to the local CCDB I get the following errors from the QC checker:

[33943:QC-CHECK-RUNNER-QcCheck]: <!doctype html><html lang="en"><head><title>HTTP Status 500 – Internal Server Error</title><style type="text/css">h1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} h2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} h3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} body {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} b {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} p {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;} a {color:black;} a.name {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 500 – Internal Server Error</h1><hr class="line" /><p><b>Type</b> Exception Report</p><p><b>Message</b> Servlet execution threw an exception</p><p><b>Description</b> The server encountered an unexpected condition that prevented it from fulfilling the request.</p><p><b>Exception</b></p><pre>javax.servlet.ServletException: Servlet execution threw an exception
[33943:QC-CHECK-RUNNER-QcCheck]: </pre><p><b>Root Cause</b></p><pre>java.lang.NoClassDefFoundError: Could not initialize class ch.alice.o2.ccdb.servlets.LocalObjectWithVersion
[33943:QC-CHECK-RUNNER-QcCheck]: 	ch.alice.o2.ccdb.servlets.Local.doPost(Local.java:473)
[33943:QC-CHECK-RUNNER-QcCheck]: 	javax.servlet.http.HttpServlet.service(HttpServlet.java:660)
[33943:QC-CHECK-RUNNER-QcCheck]: 	javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
[33943:QC-CHECK-RUNNER-QcCheck]: </pre><p><b>Note</b> The full stack trace of the root cause is available in the server logs.

The checker used is the SkeletonCheck, which is just there to be a Checker, nothing more.
For the full run log see here.

When I open a browser and go to the local CCDB, I can see the full tree of histograms as I would expect, but as soon as I click on a histogram I also get an internal server error 500.

Finally, also in the terminal output of the QCG I see an error code 500 as soon as I open the QCG in the web browser.

Trace: Error: Non-2xx status code: 500
    at ClientRequest.requestHandler (/home/tklemenz/AliSoftware/sw/slc7_x86-64/qcg/v1.6.10-1/node_modules/@aliceo2/qc/lib/CCDBConnector.js:95:18)
    at Object.onceWrapper (events.js:300:26)
    at ClientRequest.emit (events.js:210:5)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:583:27)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:115:17)
    at Socket.socketOnData (_http_client.js:456:22)
    at Socket.emit (events.js:210:5)
    at addChunk (_stream_readable.js:308:12)
    at readableAddChunk (_stream_readable.js:289:11)
    at Socket.Readable.push (_stream_readable.js:223:10)
    at Log.trace (/home/tklemenz/AliSoftware/sw/slc7_x86-64/qcg/v1.6.10-1/node_modules/@aliceo2/web-ui/Backend/log/Log.js:113:13)
    at errorHandler (/home/tklemenz/AliSoftware/sw/slc7_x86-64/qcg/v1.6.10-1/node_modules/@aliceo2/qc/lib/api.js:276:11)
    at /home/tklemenz/AliSoftware/sw/slc7_x86-64/qcg/v1.6.10-1/node_modules/@aliceo2/qc/lib/api.js:36:21
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
2020-03-12T15:46:25.787Z error: [QualityControl] Non-2xx status code: 500
2020-03-12T15:46:25.797Z debug: [WebSocket] ID 0 Processing "filter"
2020-03-12T15:46:25.797Z debug: [WebSocket] ID 0 Sent filter/200
2020-03-12T15:46:25.805Z debug: [HTTP] Page was not found: /favicon.ico

Full QCG terminal output.
There is also a red box popping up on the qcg web page stating:

Failed to retrieve list of objects due to undefined

On my local machine (ubuntu 18.04) this works with absolutely no problem so I am wondering if this may be caused by some restrictions given by the root user. Could this be or does anyone have any other idea?

I mentioned this on another topic already but I wanted to bring this up again because the focus there was more on another thing that turned out not to be the problem, so I put this issue here again with a bit more information.

I will take the liberty of pinging @pkonopka and @bvonhall since I hope you can help me here.

Edit: Publishing on the public qcg-test page works from the remote machine. Only the local CCDB/QCG is the problem.

Thanks a lot!
Cheers,
Thomas


Hello Thomas,

To me it seems that the local CCDB has a problem. Ponging @grigoras as expert of CCDB.

Cheers,
Barth

Hi,

Can you try QCG again?

For debugging the local one I would need its service logs; it’s not clear to me why it fails to initialize the class. I assume a restart didn’t cure it and the problem can be reproduced?

Cheers,

.costin

Hi Costin,
Writing to the CCDB from the QC is already failing, so this is not related to QCG.
Cheers,
Barth

Hello Costin,

I already tried this many times and always get the same outcome. Btw: how/where would I get the service logs from QCG? Are these files that are always created?

Cheers,
Thomas

For the QCG logs, I’ll let @awegrzyn and @graduta answer.

As I said, the QCG is not the culprit here.

Yes, I understood that, and it is also what I thought, since the problem already shows up in the CCDB. Still, knowing how to get the logs would be valuable. Maybe I just can’t find them.

I was referring to restarting the local ccdb instance, not the o2 processes.

No idea if and where the ccdb service logs are stored in this case. Could you do something like this to look them up?

ps auwx | grep local.jar
ls -l /proc//fd/1

In case the output is redirected to a file it should show up here, please send it to me if so.

Otherwise some pstree | grep java to see if the output is piped through another tool, maybe you find it somewhere downstream.

Cheers,

.costin

If you run qcg as systemd unit you can see the logs: journalctl -xe -u o2-qcg
If you run it in the console (./qcg), then all logs are displayed in the console (I believe this is your case).

Hi Costin,
thanks for the fast reply!

Me too. :)

I feel like there are no log files. Please correct me if I’m wrong.
I have the CCDB instance running and also the workflow that should write to it. Here are the outputs of the commands you gave:

[tklemenz@tpc-login ~]$ ps auwx | grep local.jar
tklemenz  7180  0.0  0.0 112712   992 pts/80   S+   14:29   0:00 grep --color=auto local.jar
tklemenz 48408  1.0  0.3 40390660 523536 pts/41 Sl  14:15   0:08 java -jar /home/tklemenz/local.jar

There is no folder /proc/fd.

And finally

[tklemenz@tpc-login ~]$ pstree | grep java
        |-java---60*[{java}]
        |-screen---bash---java---90*[{java}]

which tbh does not enlighten me very much but looks at least as if there is nothing written anywhere.
I will investigate that.

Cheers,
Thomas

Hi Adam,

ok, if ./qcg gives me all its output already in the console and there is not more then I am fine with that.

Cheers,
Thomas

Oops, the damn markup ate part of my reply. Please run:
ls -l /proc/48408/fd/
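
For reference, the same lookup can be scripted so the PID does not have to be copied by hand. This is only a sketch: the helper name `stdout_target` is made up here, and it assumes a Linux `/proc` filesystem.

```shell
#!/bin/sh
# Print where a process's stdout (fd 1) points, using the Linux /proc filesystem.
# stdout_target is a hypothetical helper name, not part of any CCDB tooling.
stdout_target() {
    pid="$1"
    fd="/proc/${pid}/fd/1"
    # -e follows the symlink and fails for pipes/sockets, so also test -L.
    if [ ! -e "$fd" ] && [ ! -L "$fd" ]; then
        echo "no such process or no /proc entry: ${pid}" >&2
        return 1
    fi
    # A regular file here means stdout is redirected to a log;
    # /dev/pts/N means a terminal, pipe:[...] means it is piped elsewhere.
    readlink "$fd"
}

# Example: inspect the current shell itself.
stdout_target "$$" || true
```

For the CCDB instance, something like `stdout_target "$(pgrep -f local.jar | head -n 1)"` should then give the same information as the `ls -l` above, restricted to stdout.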

Cheers,

.costin

Oh, ok. Adding the process ID in between seems kind of obvious, I should have figured this out myself…

Indeed, there is a link to a log file in /proc/48408/fd/. However, that’s apmon.log, which looks like this:

Mar 13, 2020 2:15:54 PM apmon.ApMon initialize
INFO: Initializing destination addresses & ports:
Mar 13, 2020 2:15:54 PM apmon.ApMon arrayInit
INFO: adding destination: 127.0.0.1:8884
Mar 13, 2020 2:15:54 PM apmon.ApMon setJobMonitoring
INFO: Disabling job monitoring...
Mar 13, 2020 2:15:54 PM apmon.ApMon setSysMonitoring
INFO: Disabling system monitoring...
Mar 13, 2020 2:15:54 PM apmon.ApMon setGenMonitoring
INFO: Setting general information monitoring to false
Mar 13, 2020 2:15:54 PM apmon.ApMon setJobMonitoring
INFO: Enabling job monitoring, time interval 60 s
Mar 13, 2020 2:15:55 PM apmon.BkThread run
INFO: [Starting background thread...]
Mar 13, 2020 2:17:55 PM apmon.BkThread sendOneJobInfo
WARNING: Unable to read job Disk Usage info for 48408 : java.lang.NumberFormatException: empty String (empty String)
Mar 13, 2020 2:18:55 PM apmon.BkThread sendOneJobInfo
WARNING: Job 48408 does not exist
Mar 13, 2020 2:18:55 PM apmon.BkThread sendJobInfo
WARNING: There are not jobs to be monitored, not sending job monitoring information...

The last two lines then appear every 60 seconds. To me this does not look very related. I would have linked to the full log instead of posting most of it, but the cernbox is incredibly slow atm.

I don’t know what apmon is and what it does with port 8884. Additional info I should probably add: I have no instance of QCG running since this problem should not be related as Barth already pointed out. QCG would be configured to communicate on port 8081 if that is any kind of useful information.

Cheers,
Thomas

This is the internal monitoring stuff, nothing relevant so far. Let’s try something else:
kill 48408
export TOMCAT_DEBUG=3
java -jar /home/tklemenz/local.jar 2>&1 | tee local.log

And then execute something and normally the log should show the same exceptions with hopefully more details than before.
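
Once `local.log` exists, the interesting part is usually the first failure rather than the repeated one: a `NoClassDefFoundError: Could not initialize class ...` normally means the static initializer of that class already failed earlier, typically with an `ExceptionInInitializerError` that carries the real cause. A small sketch for pulling those lines out, where the function name `first_init_failure` is just illustrative:

```shell
#!/bin/sh
# Show the earliest class-loading/initialization failure in a Java log, with context.
# first_init_failure is a hypothetical helper; local.log is the file written by tee above.
first_init_failure() {
    logfile="$1"
    # -n: line numbers, -m 1: stop after the first match, -A 10: trailing context lines.
    # UnsupportedClassVersionError is included in case the jar needs a newer Java.
    grep -n -m 1 -A 10 -E \
        'ExceptionInInitializerError|NoClassDefFoundError|UnsupportedClassVersionError' \
        "$logfile"
}

if [ -f local.log ]; then
    first_init_failure local.log || echo "no class-initialization errors found in local.log"
fi
```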

Cheers,

.costin

I did that and the output looks rather verbose now.

Cheers,
Thomas

Any idea yet what might be the problem, @grigoras?

Cheers,
Thomas

Hi Thomas,

Can you please make sure that your java -version is 11+? From the message I assume that you are running with an older version.
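
A quick way to check this from a script, as a sketch: the helper below (the name `java_major` is made up for illustration) extracts the major version from the version string, covering both the old `1.8.0_201` scheme and the newer `11.0.3` one.

```shell
#!/bin/sh
# Extract the Java major version from a version string.
# Old scheme: "1.8.0_201" is Java 8; new scheme (Java 9+): "11.0.3" is Java 11.
# java_major is a hypothetical helper name.
java_major() {
    v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;   # drop the legacy "1." prefix
    esac
    # keep only the leading run of digits
    echo "$v" | sed 's/^\([0-9]*\).*/\1/'
}

# java -version prints to stderr; the version is the quoted string on the first line.
if command -v java >/dev/null 2>&1; then
    raw="$(java -version 2>&1 | head -n 1 | sed 's/.*"\(.*\)".*/\1/')"
    major="$(java_major "$raw")"
    echo "Java major version: ${major}"
fi
```

Anything below 11 reported here would match the assumption above.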

Cheers,

.costin

Oh, ok. java -version gives me

openjdk version "1.8.0_201"
OpenJDK Runtime Environment (build 1.8.0_201-b09)
OpenJDK 64-Bit Server VM (build 25.201-b09, mixed mode)

So I assume 1.8.0 is what is installed on the machine.
I will make sure to get the latest java version installed.

Thanks!
Cheers,
Thomas

If you’re on CC7 then something like this should do:

yum install java-11-openjdk-headless

update-alternatives --config java

and choose the newer version from it (2 in the output below).

On Ubuntu replace the first command with apt install openjdk-11-jdk.

Cheers,

.costin

sudo update-alternatives --config java

There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*  1           /usr/lib/jvm/jre-1.8.0-oracle.x86_64/bin/java
 + 2           java-11-openjdk.x86_64 (/usr/lib/jvm/java-11-openjdk-11.0.3.7-0.el7_6.x86_64/bin/java)

Enter to keep the current selection[+], or type selection number: 2

Thanks for the info. However, I am not root on that machine (it is CC7). I’ll find a workaround.
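
One possible route without root, sketched under assumptions: a JDK 11 archive unpacked somewhere in the home directory can be called directly, without touching `update-alternatives`. The directory `$HOME/jdk-11` and the names `USER_JDK` and `pick_java` below are placeholders, not something from this setup.

```shell
#!/bin/sh
# Prefer a JDK unpacked in user space over the system-wide java.
# USER_JDK and pick_java are hypothetical names; adjust the path to wherever the JDK was unpacked.
USER_JDK="${USER_JDK:-$HOME/jdk-11}"

pick_java() {
    if [ -x "${USER_JDK}/bin/java" ]; then
        echo "${USER_JDK}/bin/java"
    else
        # fall back to whatever java is first on PATH
        command -v java
    fi
}

JAVA_BIN="$(pick_java)" || echo "no java found" >&2
echo "would start CCDB with: ${JAVA_BIN:-<none>} -jar /home/tklemenz/local.jar"
```

Calling the binary by its full path keeps the system default untouched, so nothing else on the shared machine is affected.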

Cheers,
Thomas