Container - Dual-key Map #12

Closed
opened 2026-01-16 00:38:26 +01:00 by cat · 6 comments
Owner

The ECS system specifically requires a map where

A = Entity
B = Component Type
C = std::any (the component itself)

where each entry can be accessed as follows:

DKM[A] returns {(B,C), (B,C), ..., (B,C)} // every entry whose first key is A
DKM[B] returns {(A,C), (A,C), ..., (A,C)} // every entry whose second key is B
DKM[(A,B)] returns C // the single entry keyed by both A and B

Indexing by the A key alone or the B key alone is O(n), and indexing by the (A,B) pair is also O(n) in the worst case. We can improve the practical cost of indexing by keeping a count of the total entries under each A and under each B: that way the scan can exit the loop early, once all possible matches have been collected.
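The early-exit idea can be sketched as follows. This is a minimal illustration under assumed names (`Entry`, `CountedIndex`, `find_by_a` are all hypothetical), using only the hash pairs rather than the full map:

```c++
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: a flat list of (AHash, BHash) entries plus per-key
// counts that let a single-key scan stop as soon as every match is found.
struct Entry { std::size_t a_hash, b_hash; };

struct CountedIndex {
    std::vector<Entry> entries;
    std::unordered_map<std::size_t, std::size_t> a_counts;  // AHash -> #entries

    void insert(std::size_t a_hash, std::size_t b_hash) {
        entries.push_back({a_hash, b_hash});
        ++a_counts[a_hash];
    }

    // Positions of every entry whose first key hashes to a_hash.
    // Still O(n) worst case, but exits early once all matches are collected.
    std::vector<std::size_t> find_by_a(std::size_t a_hash) const {
        std::vector<std::size_t> out;
        auto it = a_counts.find(a_hash);
        if (it == a_counts.end()) return out;   // key absent: no scan needed
        const std::size_t total = it->second;
        for (std::size_t i = 0; i < entries.size() && out.size() < total; ++i)
            if (entries[i].a_hash == a_hash) out.push_back(i);
        return out;
    }
};
```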

For this, the current best solution is to store the data in the following way:

```c++
class DualkeyMap {
    std::vector<A*> AKeyList;
    std::vector<B*> BKeyList;
    std::vector<C*> CValueList;
    struct DualkeyHash {
        size_t AHash, BHash;
    };
    std::vector<DualkeyHash> FlatHash;
};
```

This allows us to compare only the hashes of A and B, which is fast. If we are indexing by key A alone, the lookup only scans for AHash inside the vector. We store A, B, and C as pointers since vectors may reallocate: we want any reference to A, B, or C that we hand out to never be invalidated. This also means we can return a std::span for single-key queries, and cache it for future calls.

cat added this to the Tourmaline Engine - Basics project 2026-01-16 00:38:26 +01:00
cat moved this to In Progress in Tourmaline Engine - Basics on 2026-01-16 00:38:29 +01:00
cat added the Kind/Enhancement, Kind/Feature and Priority/Critical labels 2026-01-16 00:38:59 +01:00
cat pinned this 2026-01-16 00:39:03 +01:00
Author
Owner

References into a std::vector WILL be invalidated when an insertion pushes its size past its vector::capacity: reallocation will happen. For the sake of the map's usability, we need to somehow ensure that
A. references are not invalidated, and
B. memory usage is minimally affected.
Initially, raw pointers or a light pointer wrapper like the one in Corrade::Pointer came to mind, however I'm still not sure.

The main benefit of the dual-key map is the speed at which it can index and process entries. It can also be tuned to allocate less: vectors allow reserving memory, which means we can reserve specific blocks up front. Currently each entry costs one slot in each of A, B, C and FlatHash. If we assume the best case of pointer, pointer, pointer, and 2 x size_t, that is 8 * 5 bytes just for indexing and bookkeeping, so 40 bytes at an absolute minimum. Preferably, of course, we would like to reduce this value. One way to do so could be to store the hashes of A and B together with the actual pointers to the data, as a std::pair of hash and pointer (or a struct version). This way we can remove one vector from the equation:

```c++
class DualkeyMap {
    std::vector<std::pair<size_t /*hash*/, A*>> AKeyList;
    std::vector<std::pair<size_t /*hash*/, B*>> BKeyList;
    std::vector<C*> CValueList;
};
```
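The per-entry arithmetic can be sanity-checked with sizeof, assuming a typical 64-bit target where pointers and size_t are 8 bytes. One observation worth noting (my own, not from the thread): under that assumption the merged layout still costs the same 40 bytes per entry; what it removes is one vector's worth of separate allocation and capacity slack, not per-entry bytes.

```c++
#include <cstddef>
#include <utility>

// Back-of-the-envelope check of per-entry bookkeeping cost on a 64-bit target.
struct DualkeyHash { std::size_t a_hash, b_hash; };

// Original layout: three pointers (A*, B*, C*) plus one DualkeyHash entry.
constexpr std::size_t original_overhead =
    3 * sizeof(void*) + sizeof(DualkeyHash);              // 3*8 + 16 = 40 bytes

// Merged layout: two (hash, pointer) pairs plus the C* pointer.
constexpr std::size_t merged_overhead =
    2 * sizeof(std::pair<std::size_t, void*>) + sizeof(void*);  // 2*16 + 8 = 40 bytes

static_assert(sizeof(void*) == 8, "arithmetic above assumes a 64-bit target");
static_assert(original_overhead == 40);
```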

Author
Owner

Regarding my last comment: the reference invalidation can be demonstrated with the following.

```c++
#include <print>
#include <string>
#include <vector>

int main() {
  std::vector<std::string> container{};
  container.emplace_back("Test");
  auto t1 = &container[0];
  container.emplace_back("Test1");  // may reallocate, leaving t1 dangling
  auto t2 = &container[1];
  container.emplace_back("Test2");  // may reallocate, leaving t1 and t2 dangling
  auto t3 = &container[2];
  container.emplace_back("Test3");
  auto t4 = &container[3];
  // t1..t3 likely point into freed storage here
  std::print("t1 = {}\nt2 = {}\nt3 = {}\nt4 = {}\n", *t1, *t2, *t3, *t4);

  return 0;
}
```

Also, yes, here auto deduces a string pointer, not a reference; but even if you change the & so that it reads auto &, the same crash will happen, since the issue is that the data is no longer where it was. (Strictly speaking this is undefined behavior, so a segmentation fault is just the likely outcome.)

Author
Owner
```c++
#include <print>
#include <string>
#include <vector>

int main() {
  std::vector<std::string *> container{};
  container.emplace_back(new std::string("Test"));
  auto &t1 = *container[0];
  container.emplace_back(new std::string("Test1"));
  auto &t2 = *container[1];
  container.emplace_back(new std::string("Test2"));
  auto &t3 = *container[2];
  container.emplace_back(new std::string("Test3"));
  auto &t4 = *container[3];
  // The vector may still reallocate, but it only moves the pointers;
  // the strings themselves stay put, so the references remain valid.
  std::print("t1 = {}\nt2 = {}\nt3 = {}\nt4 = {}\n", t1, t2, t3, t4);

  for (auto *p : container) delete p;  // the raw news above must be freed
  return 0;
}
```

As stated before, by storing pointers in the vector rather than the raw data we can fix this issue; this version runs as expected. While this is a solution, I believe there might be a better STL container for this specific need.

The current situation means that for every element inserted we use at minimum 32 more bytes just to know where it is and what it is. That can be forgiven: for 100,000 entries it only adds 3.2 MB of overhead (in the example above it is only a quarter of that, a single 8-byte pointer per element).
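On the "better STL" question, one standard candidate worth noting (my suggestion, not something the thread settled on) is std::deque: insertion at either end invalidates iterators but never references or pointers to existing elements, so elements could be stored by value without the extra pointer indirection. A minimal demonstration:

```c++
#include <cassert>
#include <deque>
#include <string>

// std::deque::emplace_back never invalidates references to existing
// elements (only iterators), so a pointer taken early stays valid
// even after the container grows many internal blocks.
std::string* grow_and_keep(std::deque<std::string>& dq) {
  dq.emplace_back("first");
  std::string* stable = &dq.front();  // stays valid across later growth
  for (int i = 0; i < 1000; ++i)      // force many internal allocations
    dq.emplace_back("filler");
  return stable;
}
```

The trade-off is that deque storage is chunked rather than contiguous, so it cannot back a std::span the way a vector can.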

Author
Owner

I have considered the possibility of merging all 3 vectors into 1. The reason I am not fond of it as of now is that the current split lets us probe specific areas independently, for example a "DoesKeyExist" check that can quickly index first- or second-key entries. The downside is that because the 3 arrays are allocated separately, the allocation count goes up: at worst an insert triggers 6 allocations (3 vector growths, plus the initialization of A, B, and C). Of course, by introducing a size growth factor we can make allocations rarer. I am currently torn between a growth factor and a constant increment: a factor uses more memory but makes the number of allocations logarithmic, while constant steps are more memory-efficient but allocate more frequently.
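The trade-off above can be made concrete with some illustrative arithmetic (function names and the starting capacities are my own assumptions):

```c++
#include <cassert>
#include <cstddef>

// Number of reallocations needed to reach capacity n with a doubling
// growth factor versus a fixed increment, from an assumed initial size.
std::size_t reallocs_doubling(std::size_t n, std::size_t initial) {
  std::size_t cap = initial, count = 0;
  while (cap < n) { cap *= 2; ++count; }      // logarithmic in n
  return count;
}

std::size_t reallocs_fixed(std::size_t n, std::size_t chunk) {
  std::size_t cap = chunk, count = 0;
  while (cap < n) { cap += chunk; ++count; }  // linear in n
  return count;
}
```

For 100,000 entries, doubling from a capacity of 16 reallocates 13 times (ending at 131,072 slots of slack-prone capacity), while growing by a fixed 1,024 reallocates 97 times but overshoots by at most one chunk.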

This requires further testing before I can conclude. Hopefully before the 23rd of January 2026 I will reach a consensus and also finish the implementation.

Once implemented, incorporating this into the ECS system will be trivial.

Author
Owner

5846c5bf34 (https://git.thenight.club/cat/Tourmaline-Engine/commit/5846c5bf348f0bde7b2777f8089d464da2d35e30)
is the first iteration that works. I still need to add a proper deletion system. Right now I'm leaning toward a tombstoning mechanism rather than erasing specific entries from the vector and causing an expensive shuffle of everything behind them.
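The tombstoning idea could be sketched like this (a hypothetical illustration, not the engine's actual code; `TombstoneStore` and its members are made-up names, with `int` standing in for real entries):

```c++
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical sketch of tombstone deletion: dead slots are flagged and
// skipped on lookup instead of erased, so no elements are moved. A later
// compaction pass (not shown) could reclaim the dead slots in one go.
struct TombstoneStore {
  std::vector<int> values;   // stand-in for the real entries
  std::vector<bool> dead;    // parallel tombstone flags

  std::size_t insert(int v) {
    values.push_back(v);
    dead.push_back(false);
    return values.size() - 1;
  }

  void erase(std::size_t i) { dead[i] = true; }  // O(1), nothing moves

  std::optional<int> get(std::size_t i) const {
    if (i >= values.size() || dead[i]) return std::nullopt;
    return values[i];
  }
};
```

The cost is that dead slots keep their memory until compaction, so a store with many deletions trades space for the O(1) erase.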

Author
Owner

4cc10ddc21 (https://git.thenight.club/cat/Tourmaline-Engine/commit/4cc10ddc2130dd3b6dc4737356e683e611a0b9b5) finishes the implementation. Thanks to mosra at magnum graphics for their input.

I've also integrated DKM into the ECS system, which improved runtime by a lot.

cat closed this issue 2026-01-28 13:37:43 +01:00
cat moved this to Done in Tourmaline Engine - Basics on 2026-01-28 13:37:52 +01:00