Container - Dual-key Map #12
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
The ECS system specifically requires a map where
A = Entity
B = Component Type
C = std::any <- component itself
where each entry can be accessed by following
DKM[A] returns {(B,C), (B,C), ... ,(B,C)} // Every entry with first key of A
DKM[B] returns {(A,C), (A,C), ... ,(A,C)} // Every entry with Second key of B
DKM[(A,B)] returns C // Single specific entry that A and B both exist in
The case of indexing for A key and B key only is O(n), where-as with A,B is it worse case O(n). We can actually improve the times of indexing by keeping count of total entries of A and total entries of B. Since that way we can exit out of the loop once we collect all the possible outputs.
For this the current best solution is to store the data in following way
Class DualkeyMap{
std::vector<A*> AKeyList;
std::vector<B*> BKeyList;
std::vector<C*> CValueList;
struct DualkeyHash {
size_t AHash, BHash;
}
std::vector FlatHash;
}
this allows us to only compare hashes of A and B which is pretty fast. If we are indexing for only key A. Then the indexing algorithm only checks for AHash inside vector. We are storing A, B, and C as pointers since vectors may reallocate. We want to make sure when we return reference to A, B, and C that the reference is never invalidated. This does mean that we can return std::span for single key queries, and then cache them for future calls.
std::vector's references WILL be nullified if the vector's vector::capacity is above size. Reallocation will happen. For sake of usability of a map, we need to somehow ensure that
A. References are not nullified
B. Memory usage is minimally affected
Initially using raw pointers or a light pointer wrapper like that in Corrade::Pointer came to mind, however I'm still not sure.
The main benefit of the dual key map is the speed that it can index and process things. Additionally it is tweaked with settings to make allocations less. Vector's allow reservation for memory. Which means we can reserve specific blocks of memory. Currently each entry of A, B, C and FlatHash. If we assume the best which is pointer, pointer, pointer, 2 x size_t
8 * 5 bytes just for indexing and managing so 40 bytes absolute minimum. Preferably of course we would like to reduce this value. One way to do so could be to store the hashes of A and B with the actual pointers to data
std::pair<size_t, uint64_t> or struct version. This way we can remove 1 vector out of the equation.
Class DualkeyMap{
std::vector<std::pair<size_t hash,A*>> AKeyList;
std::vector<std::pair<size_t hash,B*>> BKeyList;
std::vector<C*> CValueList;
}
regarding my last comment on references being nullified can be tested with the following
Also yes here auto is a string pointer not reference, however even if you swap the place of & to auto & same segmentation fault will happen! Since the issue is that the data is no longer where it was.
As stated before by storing the pointers on the vector rather than the raw data we can fix this issue. This will run as expected. While this is a solution. I believe there might be better STL for this specific need.
Current situation means that per element inserted we are going to use at minimum 32 more bytes to know where they are and who they are. This can be forgiven for 100,000 entries this only adds 3.2mb overhead (in the example above it is only quarter).
I have considered possibility of merging all 3 vectors into 1. The reason I am not fond of it as of now is that, with this fragmentation. We can additionally look for specific areas. For example checking "DoesKeyExist" which can be used to quickly index 1st or 2nd key entries. Additionally since 3 entries are all seperately allocated while allocation count is increased (Allocation at worst will happen 6 times. Vector growth 3 times, and 3 A, B, and C initialization) Of course by introducing a size growth factor we can make the allocations be less. I am currently divided between using a factor a constant value. With a factor it while uses more memory makes allocations be logarithmic. While with static sizes, allocations are more frequent but more memory efficient.
This requires further testing to be concluded. Hopefully before 23rd of january 2026 I will come with a consensus and also finish implementation.
Post implementation incorporating this to ECS system will be trivial.
5846c5bf34is the first iteration that works. I still need to add a proper deletion system. Right now I'm leaning into using a tombstoning mechanism rather than to nuke specific entries in the vector and causing an expensive moving situation.
4cc10ddc21with that I have finished implementing it. Thanks to mosra at magnum graphics for their input.I've also integrated DKM into the ECS system, which improved runtime by a lot.