We are wondering how exactly the seed in the random sampling condition of dataSamplingPolicies is meant to be used.
In our case we have a task that sends data from multiple EPNs to the merger on the QC node, which runs the remote QC task on that data. Each EPN therefore has its own config file, but the files are identical on all EPNs, which means the seed in dataSamplingPolicies is also the same everywhere.
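For reference, a policy of roughly this shape is what we mean (field names taken from the DataSampling documentation; the query, fraction, and seed values here are made up for illustration):

```json
"dataSamplingPolicies": [
  {
    "id": "example-policy",
    "active": "true",
    "query": "data:TST/RAWDATA/0",
    "samplingConditions": [
      {
        "condition": "random",
        "fraction": "0.1",
        "seed": "1234"
      }
    ],
    "blocking": "false"
  }
]
```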
The main question is: is this intended to be used with the same seed on every machine, or should each machine have a different one?
Can it lead to “bursts” of data with equal seeds on each machine?
If the seeds should be randomized so that each machine gets a different one, is there an easy way to do so, e.g. seed = 0 resulting in a random seed? After checking the code, I don't think this is currently the case.
I also checked the DataSampling documentation but didn't find anything addressing our question.
Just for clarity: if a Task runs remotely, there are no mergers involved; the data simply reach the remote QC task via proxies. Mergers are used to merge MonitorObjects, which is not needed in this case.
If you use the same seed, you will always get the same pseudo-random selection decisions for the same timeframeIDs (which are taken from DataProcessingHeader::startTime).
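To illustrate the point (this is a hypothetical sketch, not the actual O2 implementation: the function name `shouldSample` and the mixing scheme are made up), any sampling decision that depends only on the seed and the timeframe ID is fully deterministic, so two machines configured with the same seed will accept or reject exactly the same TF IDs:

```cpp
#include <cstdint>
#include <functional>

// Hypothetical sketch: a pseudo-random accept/reject decision derived only
// from (seed, timeframeID). Because no other state is involved, the result
// is identical on every machine that uses the same seed.
bool shouldSample(uint64_t seed, uint64_t timeframeID, double fraction)
{
  // mix the seed and the TF ID into one pseudo-random value
  uint64_t mixed = std::hash<uint64_t>{}(seed ^ (timeframeID * 0x9E3779B97F4A7C15ULL));
  // accept roughly `fraction` of all timeframe IDs
  return (mixed % 10000) < static_cast<uint64_t>(fraction * 10000);
}
```

Calling `shouldSample(1234, 42, 0.1)` on two different machines always yields the same decision, which is exactly the situation described above.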
As far as I understand, EPNs receive different TFs (round-robin or some similar scheme), so the same seed cannot cause any bursts of data there. Bursts could happen on FLPs, which work on STFs with the same ID in parallel.
That being said, we can add the proposed feature of picking a random seed when seed == 0, if you need it. It should be quick.
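The proposed behaviour could look roughly like this (a sketch only; the function name `effectiveSeed` is made up and the eventual implementation may differ):

```cpp
#include <cstdint>
#include <random>

// Hypothetical sketch of the proposed feature: treat seed == 0 as
// "generate a random seed on this machine", otherwise keep the
// configured value as-is.
uint64_t effectiveSeed(uint64_t configuredSeed)
{
  if (configuredSeed == 0) {
    // non-deterministic source, so each machine (and each run) gets
    // a different seed
    std::random_device rd;
    return (static_cast<uint64_t>(rd()) << 32) | rd();
  }
  return configuredSeed;
}
```

With this, leaving `"seed": "0"` in a shared config would give every EPN its own independent sampling sequence, while any non-zero value keeps today's deterministic behaviour.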