Testdata location for boost tests

laphecet · April 23, 2018, 1:55pm

Dear all,

I was wondering if there is a recommended way of dealing with test data (i.e. input files that are required by some tests) for Boost tests in O2?

For example:

- where should I put the test data?
- how should I reference it in the test code?

so that I don’t run into basic file path issues when running the tests.

Thanks

swenzel · April 23, 2018, 2:42pm

I would say it depends on the file format. If it is just small text files that you are using in the tests, one could put them directly in the test directories.

For binary files (ROOT files) the problem is of course different…

Can you generate the test data on the fly? Then one test/process could generate it at test time and another test could open the file directly. CMake supports the notion of test dependencies and ensures the right order of execution. (Via such a mechanism one could also think about downloading the test data from somewhere; the same is possible at CMake time, of course.)

laphecet · April 23, 2018, 3:12pm

The file I’m considering right now is a JSON file.

It contains test positions for the MCH mapping with the expected result of a “PadByPosition”-like call. For instance :

    {
        "de": 100,
        "bending": "true",
        "x": 38.78732515590371,
        "y": 21.294016168598937,
        "dsid": 89,
        "dsch": 17,
        "px": 38.745,
        "py": 21.21,
        "sx": 0.63,
        "sy": 0.42
    },
    {
        "de": 100,
        "bending": "false",
        "x": 12.08335841285345,
        "y": -42.853158086135245,
        "isoutside": "true"
    },
    ...

The size basically depends on the number of test positions in the file. I’m currently using 10 test positions per detection element (we have 156 detection elements), and the uglified version of the JSON is 158 KB, which I’d consider small enough, but ideally I’d want more test positions…

Note that I cannot generate the test positions from O2. They are actually generated from the AliRoot mapping (from alo to be precise, which uses AliRoot) in order to get reference values to test the O2 implementation(s)…

eulisse · April 23, 2018, 8:07pm

How about storing them in a separate branch (not as dangling commits, because those would be garbage-collected) and using git cat-file to extract them on the fly? Assuming they do not change much, this could be rather efficient, thanks to the compression in the Git object store.

sbinet · April 23, 2018, 8:36pm

What about git-lfs?

eulisse · April 24, 2018, 7:45am

I think that you have to pay after the first GB, no?

dberzano · April 24, 2018, 8:26am

I tend not to favour any variation from plain Git. We had this discussion already in ALICE and, as @eulisse said, it does not come for free. There are free alternatives, but all of them are implementation-dependent and would require us to maintain our own services.

I think the separate-branch approach is a smart solution!

laphecet · April 24, 2018, 8:33am

OK, git-lfs is not free, so let’s forget about it.

As for the git cat-file solution, it might be smart, but I’d need a pedestrian explanation to fully understand how to use it.

dberzano · April 24, 2018, 8:46am

So basically during the build phase, in the recipe, you have $SOURCEDIR set to the source directory (by aliBuild), which is a Git directory. First off you need to make sure this directory knows the branch with test files, call it testfiles:

    git -C "$SOURCEDIR" fetch origin testfiles

Then you would get the file you want with:

    git show origin/testfiles:PATH/TO/FILE/YOU/want.txt

Actually, I haven’t used git cat-file in my example; as usual with Git, there are a thousand ways to do the same thing.

I’d prepare the test files dir in the recipe, maybe, and I’d keep Git commands outside of CMake.
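For anyone wanting the pedestrian version spelled out, here is a minimal sketch of the branch-based approach. It builds a throwaway repository in place of the real $SOURCEDIR, so it can be run anywhere Git is installed. The file name data/positions.json is made up for illustration, and because the sketch has no remote, the local branch name testfiles stands in for origin/testfiles.

```shell
# Throwaway repository standing in for $SOURCEDIR (assumption: the real
# one is prepared by aliBuild and already has an "origin" remote).
SOURCEDIR=$(mktemp -d)
git -C "$SOURCEDIR" init -q
git -C "$SOURCEDIR" config user.email "tester@example.invalid"
git -C "$SOURCEDIR" config user.name "Tester"
git -C "$SOURCEDIR" commit -q --allow-empty -m "Initial commit"

# Put one (hypothetical) data file on a dedicated branch called "testfiles",
# then go back to the branch we started from.
git -C "$SOURCEDIR" checkout -q -b testfiles
mkdir -p "$SOURCEDIR/data"
printf '{"de": 100, "bending": "true"}\n' > "$SOURCEDIR/data/positions.json"
git -C "$SOURCEDIR" add data/positions.json
git -C "$SOURCEDIR" commit -q -m "Add test positions"
git -C "$SOURCEDIR" checkout -q -

# Extract the file without checking the branch out, in two equivalent ways.
# With a real remote this would be origin/testfiles:... after a fetch.
TESTDATA_DIR=$(mktemp -d)
git -C "$SOURCEDIR" show testfiles:data/positions.json \
    > "$TESTDATA_DIR/via-show.json"
git -C "$SOURCEDIR" cat-file -p testfiles:data/positions.json \
    > "$TESTDATA_DIR/via-cat-file.json"
```

In a recipe only the last two commands would matter; everything before them just sets up something to read from. Both write the same bytes, so the choice between git show and git cat-file -p is a matter of taste here.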

laphecet · April 24, 2018, 10:14am

OK, I guess I see. But then it means I would somehow require the use of the recipe, i.e. plain CMake usage would be more complicated than it is now…

laphecet · April 26, 2018, 12:02pm

For the record, the discussion above only addresses the first of the two questions I had:

- where should I put the test data?
- how should I reference it in the test code?

Unless there are objections, I would go with a simple solution (for the first question): committing the test data file “as is” in Git. It’s 158 KB. Not small, but not large either, and it contains enough information to offer a reasonable testing ground.

For the second point, the only reliable way I’ve found is to pass the file name as an argument to the test program (see SegmentationLong.cxx hasTestPosFile in https://github.com/AliceO2Group/AliceO2/pull/1025). The other option would have been to infer the test data file location from the executable path, but that would be somewhat fragile.

dberzano · April 26, 2018, 2:42pm

Hi Laurent,

I agree with your solution of shipping a mere 158 KB with the code.
Having it as an argument is the best option (better than inferring it or relying on environment variables, IMHO).

If you agree, please mark this as a solution!

d.