We are currently working on the pre-clustering and clustering for the MCH detectors, and we have some really basic questions regarding the definition of the data structures that need to go out from each step and be compatible with DPL messages.
For example, a pre-cluster in our case would basically correspond to a list of associated digits, something like this in a kind of pseudo-code:
#include <vector>

struct MCHDigit
{
  int time, amplitude, detid, padid;
};
struct MCHPreCluster
{
  std::vector<MCHDigit> digits; // digits associated to this pre-cluster
};
Would such a MCHPreCluster structure be compatible with DPL messages? In the DPL documentation I read this:
“If the message is of a known messageable type, you can automatically get the content of the message by passing type T as template argument. The actual operation depends on the properties of the type. Not all types are supported”
Otherwise, are there already some default base types for pre-clusters and clusters defined in the O2 framework?
A vector of such MCHPreCluster objects can be messaged using ROOT serialization, which comes with overhead. From the point of view of both messaging and process memory management, it is better to avoid vectors of vectors. Instead, you can split the data into two flat vectors, e.g.
struct MCHPreClusterRef {
  o2::dataformats::RangeRefComp<4> ref; // 4 bits for the entry count: up to (1<<4)-1 = 15 digits per precluster, use 5 if needed
  // + other data members if needed; if there are none, you can simply derive from RangeRefComp
};
std::vector<MCHDigit> digits;
std::vector<MCHPreClusterRef> preclusterRefs;
// How to access the content of a precluster (obviously, the digits of a single precluster
// should be stored contiguously in the digits vector).
for (size_t i = 0; i < preclusterRefs.size(); i++) {
  const auto& pcr = preclusterRefs[i].ref; // the range is the "ref" member of MCHPreClusterRef
  std::cout << "precluster " << i << " of " << pcr.getEntries() << " digits\n";
  for (int d = 0; d < pcr.getEntries(); d++) {
    const auto& dig = digits[pcr.getFirstEntry() + d];
    // print individual fields, since MCHDigit has no operator<<
    std::cout << "digit " << d << " pad " << dig.padid << " amplitude " << dig.amplitude << '\n';
  }
}
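To make the idea of "offset plus count in one word" concrete without depending on the O2 headers, here is a minimal self-contained sketch. PackedRange and totalAmplitude are hypothetical names invented for this illustration, and the bit layout shown is an assumption: it is not the actual implementation of o2::dataformats::RangeRefComp, only a stand-in exposing the same two getters used above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for o2::dataformats::RangeRefComp<NBitsN>, NOT the O2 class.
// It packs the index of the first entry and the number of entries into one 32-bit word.
// The layout (count in the low bits, offset in the high bits) is an assumption.
template <int NBitsN>
class PackedRange
{
 public:
  static constexpr std::uint32_t MaxEntries = (1u << NBitsN) - 1;      // 15 for NBitsN = 4
  static constexpr std::uint32_t MaxFirst = (1u << (32 - NBitsN)) - 1; // largest storable offset

  PackedRange(std::uint32_t first, std::uint32_t n) : mWord((first << NBitsN) | n)
  {
    // both values must fit in their bit fields
    assert(first <= MaxFirst && n <= MaxEntries);
  }
  std::uint32_t getFirstEntry() const { return mWord >> NBitsN; }
  std::uint32_t getEntries() const { return mWord & MaxEntries; }

 private:
  std::uint32_t mWord;
};

struct MCHDigit {
  int time, amplitude, detid, padid;
};

// Sum the amplitudes of the digits belonging to one precluster.
// The digits of a precluster are assumed to be contiguous in "digits".
inline int totalAmplitude(const std::vector<MCHDigit>& digits, const PackedRange<4>& ref)
{
  int sum = 0;
  for (std::uint32_t d = 0; d < ref.getEntries(); ++d) {
    sum += digits[ref.getFirstEntry() + d].amplitude;
  }
  return sum;
}
```

With two preclusters of 3 and 4 digits, the references would be PackedRange<4>(0, 3) and PackedRange<4>(3, 4), and both flat vectors can be shipped as plain contiguous buffers.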
Thanks for the example! Let me see if I understood the logic, and if there could be an even simpler solution…
Assuming that there are no digits in common between pre-clusters, we could have a single std::vector<MCHDigit> containing all the digits ordered by their association to pre-clusters, and a second std::vector<int> with the size of each pre-cluster.
For example, if I have 2 pre-clusters, one with 3 digits and the second with 4 digits, I would create a MCHDigit vector with 7 entries, and a second vector of integers with two entries respectively equal to 3 and 4.
Does this make sense?
Yes, this will also work, provided you don’t need random access to the preclusters (i.e. to get the digits of precluster J you need to compute the offset of their start by summing the sizes of preclusters 0 to J-1). The RangeRefComp stores both the number of entries (e.g. digits) and the offset of the first digit.
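If only the sizes are sent, the receiving side can recover the start offsets once with a running sum and then has random access again. A minimal sketch, where offsetsFromSizes is a hypothetical helper name and not part of O2:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Turn per-precluster sizes into start offsets (exclusive prefix sum).
// offsets[j] is the index in the flat digit vector of the first digit of precluster j.
inline std::vector<std::size_t> offsetsFromSizes(const std::vector<int>& sizes)
{
  std::vector<std::size_t> offsets;
  offsets.reserve(sizes.size());
  std::size_t next = 0;
  for (int n : sizes) {
    offsets.push_back(next); // this precluster starts where the previous one ended
    next += static_cast<std::size_t>(n);
  }
  return offsets;
}
```

For the example with two preclusters of 3 and 4 digits, the sizes {3, 4} give the offsets {0, 3}. The cost is one linear pass over the sizes per message.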
In fact, if you start with an existing vector of digits, you can avoid recreating it sorted in cluster attachment order: instead, you can introduce a vector of indices of the digits used by the clusters and refer to this vector with RangeRefComp, i.e. extending my previous example:
std::vector<int> digIdx; // indices of unordered digits used in clusters
//...
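for (size_t i = 0; i < preclusterRefs.size(); i++) {
  const auto& pcr = preclusterRefs[i].ref; // the range now refers to entries of digIdx
  std::cout << "precluster " << i << " of " << pcr.getEntries() << " digits\n";
  for (int d = 0; d < pcr.getEntries(); d++) {
    const auto& dig = digits[digIdx[pcr.getFirstEntry() + d]];
    // print individual fields, since MCHDigit has no operator<<
    std::cout << "digit " << d << " pad " << dig.padid << " amplitude " << dig.amplitude << '\n';
  }
}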
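One possible way to fill digIdx and the ranges is sketched below. It assumes that the pre-clustering yields, for each input digit, the index of its precluster or -1 if the digit is not used. That per-digit label vector, the Range struct (a plain pair standing in for RangeRefComp) and the function name groupByPrecluster are all hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain (first, count) pair, standing in for RangeRefComp.
struct Range {
  int first;
  int entries;
};

// label[i] is the precluster index assigned to digit i, or -1 if the digit is unused
// (assumed input; all non-negative labels must be smaller than nPreclusters).
// Fills digIdx with digit indices grouped by precluster and returns one Range per precluster.
inline std::vector<Range> groupByPrecluster(const std::vector<int>& label, int nPreclusters, std::vector<int>& digIdx)
{
  std::vector<Range> ranges(nPreclusters, Range{0, 0});
  // first pass: count the digits of each precluster
  for (int l : label) {
    if (l >= 0) {
      ++ranges[l].entries;
    }
  }
  // running sum of the counts gives the start of each precluster in digIdx
  int offset = 0;
  for (auto& r : ranges) {
    r.first = offset;
    offset += r.entries;
  }
  // second pass: write each used digit index into the slot range of its precluster
  digIdx.assign(offset, -1);
  std::vector<int> filled(nPreclusters, 0);
  for (std::size_t i = 0; i < label.size(); ++i) {
    const int l = label[i];
    if (l >= 0) {
      digIdx[ranges[l].first + filled[l]++] = static_cast<int>(i);
    }
  }
  return ranges;
}
```

The digit vector itself is never reordered or copied: only the index vector and the ranges are built, and unused digits simply do not appear in digIdx.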
PS: Note that sizeof(RangeRefComp) is the same as sizeof(int). If you use RangeRefComp<4>, you still have 28 bits for the offset of the first digit (in a TF you will anyway have fewer than 2^28 = 268435456 digits).
In our case, the output vector of digits will likely be different (smaller) than the input one, because we want to use the pre-clusterization to filter out bad digits - for example those that are clearly originating from noise. Then we are free to order the output the way we want.
Sending the sizes and not the offsets has the advantage that there will be no hard cut on the total number of digits.
Anyway, thanks for the explanations, now I really have a better picture of how to handle this!