
1.4 Optimizing Components

WARNING

This post has been translated by artificial intelligence and needs proofreading.

The Essence of the Problem

As I previously wrote here, naively using EntityId as an index into component storage is not very efficient, even though it is the fastest access method (excluding resizes, of course). In practice, the amount of allocated but unused memory will tend toward 100% as the number of entity types grows, and I am not the first to think about this.

The most obvious solution would be to store components in a dictionary: Dictionary<entityId(int), data(TComponent)>. Simple - yes; O(1) - yes; efficient - no: the components end up scattered across the heap, with plenty of "heap jumps" and cache misses. Not great.
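
For illustration, a minimal sketch of what such a dictionary-based storage could look like (my own example, not code from any particular framework):

csharp
// Naive dictionary-based storage: O(1) on average, but access goes through
// hashing and bucket indirection instead of a plain linear walk, and the
// iteration order has nothing to do with entity order.
class DictionaryStorage<TComponent>
{
    private readonly Dictionary<int, TComponent> components = new();

    public int Count => components.Count;

    public void Add(int entityId, in TComponent value) => components[entityId] = value;
    public bool Has(int entityId) => components.ContainsKey(entityId);
    public void Remove(int entityId) => components.Remove(entityId);

    // The indexer returns a copy of the struct, so there is no cheap
    // read-modify-write by reference here either.
    public TComponent Get(int entityId) => components[entityId];
}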

Expected Result

Here, it is probably worth describing what solution we should strive for and what we want to achieve in the end.

So, the expected result:

  1. Access to elements in O(1) for both reading and writing.
  2. Linear arrangement of elements in memory.
  3. Minimal overhead on element insertion/deletion.

Desired API:

Don't forget about CRUD

csharp
interface IStorage<T>
{
    int Count { get; }

    ReadOnlySpan<T> All();
    ReadOnlySpan<int> AllEntities();

    // [C]reate
    void Add(in int entityIid, in T value);

    // [R]ead
    bool Has(in int entityIid);

    // [R]ead/[U]pdate
    ref T Ref(in int entityIid);

    // [D]elete
    void Remove(in int entityIid);
}

Additionally, Count, All(), and AllEntities() are included because iterating over the existing data is common practice - just remember that they need to be exposed.

In all my examples, int is used as the entity's internal identifier (entityIid); other frameworks may use a different data type, but that is not essential.

What is essential is that the full EntityId is not used here, and there are no checks for world membership or for whether the entity is alive - those checks are performed at a higher level. This part of the system is a so-called hot spot, or hot path: a very heavily loaded piece of code where even the most innocuous check can noticeably degrade performance - the fastest code is the code that isn't there.
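
To make the intended usage concrete, here is a rough sketch of how a system might drive this API (the Position component, the entityIid value, and the helper method are made up for the example; assume some IStorage<T> implementation exists):

csharp
struct Position { public float X, Y; }

// entityIid has already been validated at a higher level
void Example(IStorage<Position> positions, int entityIid)
{
    positions.Add(entityIid, new Position { X = 1f, Y = 2f }); // [C]reate

    if (positions.Has(entityIid))                              // [R]ead
        positions.Ref(entityIid).X += 10f;                     // [R]ead/[U]pdate by reference

    positions.Remove(entityIid);                               // [D]elete

    // iteration over everything the storage currently holds
    foreach (var position in positions.All()) { /* ... */ }
}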

As a starting point, let's build on a basic "sparse" linear implementation:

csharp
class Storage<T> : IStorage<T>
{
    private bool[]? has;
    private T[]? data;

    public int Count { get; private set; }
    public ReadOnlySpan<T> All() => throw new NotImplementedException();
    public ReadOnlySpan<int> AllEntities() => throw new NotImplementedException();

    public Storage() => Resize(32);
    
    public void Add(in int entityIid, in T value)
    {
        if (entityIid >= has!.Length) Resize((entityIid / 32 + 1) * 32);

        data![entityIid] = value;
        has![entityIid] = true;
        Count++;
    }

    public bool Has(in int entityIid) => entityIid < has!.Length && has[entityIid];

    public ref T Ref(in int entityIid) => ref data![entityIid];

    public void Remove(in int entityIid)
    {
        if (entityIid >= has!.Length) return;

        has[entityIid] = false;
        Count--;
    }

    private void Resize(in int size)
    {
        if (has == null)
        {
            has = new bool[size];
            data = new T[size];
            return;
        }

        var initialSize = has!.Length;
        if (initialSize >= size) return;
        
        Array.Resize(ref has, size);
        Array.Resize(ref data, size);
    }
}

As you can see, there is no efficient way to implement All() and AllEntities() on top of this layout: both options - going through an IEnumerator or allocating a new array on every call - are unacceptable, so we won't dwell on them for now.
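
For illustration, the two obvious (and rejected) options would look roughly like this if added to the class above (my sketch, only to show why neither fits the ReadOnlySpan-based API):

csharp
// Option 1: allocate and copy on every call - produces garbage every time.
public T[] AllByAllocation()
{
    var result = new T[Count];
    var write = 0;
    for (var i = 0; i < has!.Length; i++)
        if (has[i]) result[write++] = data![i];
    return result;
}

// Option 2: lazy enumeration - no big allocation, but enumerator overhead
// on a hot path and no way to expose the result as ReadOnlySpan<T>.
public IEnumerable<T> AllByEnumeration()
{
    for (var i = 0; i < has!.Length; i++)
        if (has![i]) yield return data![i];
}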

In the current implementation, entityIid itself is the index into the data array, which is very convenient but still inefficient - the gaps remain.

To get rid of the gaps, you need to shift the array elements to the left, something like this:

[Animation: the elements of data are shifted left so that they are packed without gaps]

Compact - yes, but something broke - also yes: now entityIid does not point to the correct element in the data array. In simple mathematical terms:

Was: c_e = data[i_e]

Became: c_e = data[i_c], where i_c = f(i_e), or simply c_e = data[f(i_e)]

What is f()? Where did it come from?

f() is a function that transforms entityIid into an internal index within the data array. This is exactly what was done with the elements in the animation above: we took the index that matched entityIid and replaced it with another one, so that the elements are packed tightly in the array.

In reality, it is not feasible to express this as a pure function, simply because these changes depend on user code and are not deterministic; but it can be expressed as a mapping that stores the pair entityIid -> index.

You can use Dictionary here!

You can, but we won’t because there’s a better option.

Sparse Set

Explained here and here, so I won't go into much depth.

In a nutshell - it's a clever mapping with addition, reading, writing, and deletion in O(1) but with certain limitations, namely:

  • It can only store values in the range [0 : n), where n is the maximum number of elements.
  • The sparseness moves from our data array into the sparse index array itself (which is acceptable in principle).
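
For example, after adding entities 5, 2, and 9 to an empty set: dense = [5, 2, 9], data holds their components in the same order, sparse[5] = 0, sparse[2] = 1, sparse[9] = 2, and every other slot of sparse stays "empty" (-1 in the implementation below).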

How it looks schematically:

[Diagram: sparse maps entityIid to an index in dense, while dense and data hold the tightly packed entries]

We got rid of the gaps in the component storage and the components now lie linearly in memory. Is that all? - No, not yet.

Assuming we have figured out how it all fits together and how the pieces are interconnected, let's look at the code. In general, it will look like this:

csharp
class Storage<T> : IStorage<T>
{
    private int[]? sparse;
    private int[]? dense;
    private T[]? data;

    public int Count { get; private set; }
    public ReadOnlySpan<T> All() => new(data, 0, Count);

    public Storage() => Resize(32, 32);

    public bool Has(in int entityIid) => sparse![entityIid] != -1;

    public ref T Ref(in int entityIid) => ref data![sparse![entityIid]];

    private void Resize(in int sparseSize, in int dataSize)
    {
        if (sparse == null)
        {
            sparse = new int[sparseSize];
            dense = new int[sparseSize];
            Array.Fill(dense, -1);
            Array.Fill(sparse, -1);
            data = new T[dataSize];
            return;
        }
        
        var initialSparseSize = sparse!.Length;
        if (initialSparseSize < sparseSize)
        {
            Array.Resize(ref sparse, sparseSize);
            Array.Resize(ref dense, sparseSize);
            Array.Fill(sparse, -1, initialSparseSize, sparseSize - initialSparseSize);
            Array.Fill(dense, -1, initialSparseSize, sparseSize - initialSparseSize);
        }

        var initialDataSize = data!.Length;
        if (initialDataSize < dataSize)
            Array.Resize(ref data, dataSize);
    }

    public ReadOnlySpan<int> AllEntities() => throw new NotImplementedException(); // <-- ignoring for now
    public void Add(in int entityIid, in T value) => throw new NotImplementedException(); // <-- solved below
    public void Remove(in int entityIid) => throw new NotImplementedException(); // <-- also solved below
}

This code assumes that the components are already in place and tightly packed. The AllEntities() method signature is questionable, because the identifiers will never be tightly packed in sparse. Addition and deletion are dealt with below.

I had doubts about the economy and efficiency of this approach, so I armed myself with a calculator and Google Sheets.

There is a hypothesis, and now we will test it

With low component utilization, a large number of entities, and small component sizes, we do not save any memory, and we also add extra operations that complicate reading and writing.

Well, this needs to be checked. Let's define the following values and ranges:

  1. N: Total number of entities - let's say 1000.
  2. C: Component size in bytes (sizeof; assume the alignment is 8, the minimum component size is 8, and the step is also 8).
  3. U: Utilization in the range [0:1] with a step of 0.1 (10%) - the percentage of all entities that have this component, i.e., have a value.

For simple linear storage, memory consumption is calculated as (C + 1) * N. U is not taken into account because we potentially allocate memory for all possible entities.

In reality, since we are obsessed with optimization, for linear implementation, memory consumption can be reduced using a bitmask, and the formula would look like C * N + ceil(N / 8) - i.e., 1 byte for every 8 entities.

For storage with a sparse set, it will be sizeof(int) * 2 * N + C * N * U.
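
For example, with N = 1000, C = 8, and U = 0.3: the bitmask-based linear storage takes 8 * 1000 + ceil(1000 / 8) = 8125 bytes, while the sparse-set storage takes 4 * 2 * 1000 + 8 * 1000 * 0.3 = 10400 bytes - the first row of the table below.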

The table with the calculations and the chart, along with the conclusions, can be viewed here. In the table below, the values are memory consumption in bytes: the vertical axis is the component size, the horizontal axis is the utilization.

| Size | LINEAR | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
|------|--------|-----|-----|-----|-----|-----|-----|-----|-----|-----|---|
| 8 | 8125 | 8800 | 9600 | 10400 | 11200 | 12000 | 12800 | 13600 | 14400 | 15200 | 16000 |
| 16 | 16125 | 9600 | 11200 | 12800 | 14400 | 16000 | 17600 | 19200 | 20800 | 22400 | 24000 |
| 24 | 24125 | 10400 | 12800 | 15200 | 17600 | 20000 | 22400 | 24800 | 27200 | 29600 | 32000 |
| 32 | 32125 | 11200 | 14400 | 17600 | 20800 | 24000 | 27200 | 30400 | 33600 | 36800 | 40000 |
| 40 | 40125 | 12000 | 16000 | 20000 | 24000 | 28000 | 32000 | 36000 | 40000 | 44000 | 48000 |
| 48 | 48125 | 12800 | 17600 | 22400 | 27200 | 32000 | 36800 | 41600 | 46400 | 51200 | 56000 |
| 56 | 56125 | 13600 | 19200 | 24800 | 30400 | 36000 | 41600 | 47200 | 52800 | 58400 | 64000 |
| 64 | 64125 | 14400 | 20800 | 27200 | 33600 | 40000 | 46400 | 52800 | 59200 | 65600 | 72000 |
| 72 | 72125 | 15200 | 22400 | 29600 | 36800 | 44000 | 51200 | 58400 | 65600 | 72800 | 80000 |
| 80 | 80125 | 16000 | 24000 | 32000 | 40000 | 48000 | 56000 | 64000 | 72000 | 80000 | 88000 |
| 88 | 88125 | 16800 | 25600 | 34400 | 43200 | 52000 | 60800 | 69600 | 78400 | 87200 | 96000 |
| 96 | 96125 | 17600 | 27200 | 36800 | 46400 | 56000 | 65600 | 75200 | 84800 | 94400 | 104000 |
| 104 | 104125 | 18400 | 28800 | 39200 | 49600 | 60000 | 70400 | 80800 | 91200 | 101600 | 112000 |
| 112 | 112125 | 19200 | 30400 | 41600 | 52800 | 64000 | 75200 | 86400 | 97600 | 108800 | 120000 |
| 120 | 120125 | 20000 | 32000 | 44000 | 56000 | 68000 | 80000 | 92000 | 104000 | 116000 | 128000 |
| 128 | 128125 | 20800 | 33600 | 46400 | 59200 | 72000 | 84800 | 97600 | 110400 | 123200 | 136000 |

One can notice that the use of sparse sets in terms of memory consumption is not advantageous in two cases:

  • Very small components.
  • Very dense utilization.

Do not get carried away with splitting components too finely in sparse-set based ECS frameworks - you will just waste memory for the sake of a false optimization.

Addition / Deletion

The methods of addition and deletion are closely related, so, as always, a bit of theory.

There are several memory utilization strategies for linear storage such as arrays, and each of them needs to be understood in order to choose a suitable one.

Let me clarify right away that there is no ideal strategy - each has its pros and cons.

List Approach

Here, in general, everything is simple:

csharp
public void Add(in int entityIid, in T value)
{
    // make sure we fit into both sparse/dense and data
    Resize((entityIid / 32 + 1) * 32, data!.Length == Count + 1 ? data.Length + 32 : data.Length);

    sparse![entityIid] = Count; // set the indices
    dense![Count] = entityIid;
    data![Count] = value; // write the data
    Count++; // increment the counter
}
csharp
public void Remove(in int entityIid)
{
    var arrayIndex = sparse![entityIid];
    Array.Copy(data!, arrayIndex + 1, data!, arrayIndex, Count - arrayIndex - 1);

    for (var i = arrayIndex; i < Count - 1; i++)
    {
        sparse![dense![i + 1]] = i;
        dense![i] = dense![i + 1];
    }

    Count--;
    sparse![entityIid] = -1;
    dense![Count] = -1;
}

Let's now see how all this looks schematically:

Addition

[Animation: adding a component - sparse receives the new index, dense and data grow at the end]

Deletion

[Animation: removing a component - the tails of dense and data are shifted left and sparse is re-pointed]

If Add() is more or less fine, then in Remove() we have a loop: in the worst case, removing the first added component requires adjusting all the indices - O(n), in short, which does not meet our requirements.

Is it fixable?

Yes, it's fixable, let's move on.

Remove and Swap Back

A very interesting mechanism for removing array elements, charming in its simplicity.

Its essence is that instead of shifting all elements to the left, the removed element is simply replaced by the last one. This allows for much faster removal of elements by index because only one element is copied. As a result, we have removal in constant time (not to be confused with constant complexity).

This is how it looks in code:

Such things are very convenient to implement as extension methods.

csharp
    public static void RemoveAndSwapBack<T>(in T[] array, in int index, ref int count)
        => array[index] = array[--count];

Well, that's basically it. The vacated last slot can additionally be cleared if the array elements are reference types or structs containing references, so that the garbage collector can collect them. But that is a topic for another article.
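
If the element type does hold references, that could look roughly like this (a small sketch of the idea, not the version used in the measurements below):

csharp
public static void RemoveAndSwapBack<T>(in T[] array, in int index, ref int count)
{
    array[index] = array[--count];
    array[count] = default!; // release the reference(s) held in the vacated slot
}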

Schematically it looks like this:

[Animation: the removed element is overwritten by the last element of the array]

Cool and beautiful, but there are also negative sides:

  1. The order of the stored elements is disrupted. In general this is not a big problem, for example when iterating over a single component type. But as we discussed earlier, access goes through the entity id mapped to an index in the array, which makes it closer to random access by index and leads to a lot of cache misses (and hurts performance accordingly). Here's a good video on this topic.

Some ECS frameworks have optimizations aimed at ordering memory so that iteration always proceeds from left to right. This significantly speeds up iteration but requires extra time for the reordering.

But let's get back to implementations and see how it will look in the context of component storage.

csharp
public void Remove(in int entityIid)
{
    var arrayIndex = sparse![entityIid];
    var lastEntityIid = dense![--Count]; // take the last entity
    data![arrayIndex] = data![Count];    // move its data into the freed slot
    dense![arrayIndex] = lastEntityIid;  // and update both mappings
    sparse[lastEntityIid] = arrayIndex;

    dense![Count] = -1;
    sparse![entityIid] = -1;
}

Compared to what we had before: no loop, and a constant number of operations that does not depend on the size of the storage or on the index of the deleted element.

But who am I not to measure all of this:
| Type | Method | Count | Mean | Error | StdDev | Median |
|------|--------|-------|------|-------|--------|--------|
| RemoveAndSwapBack | Reverse | 10000 | 41.68 us | 0.825 us | 1.209 us | 41.79 us |
| RemoveAndSwapBack | Linear | 10000 | 42.82 us | 0.851 us | 1.399 us | 42.83 us |
| RemoveAndSwapBack | Random | 10000 | 45.59 us | 0.903 us | 1.075 us | 45.50 us |
| List | Reverse | 10000 | 61.86 us | 1.353 us | 3.725 us | 61.48 us |
| RemoveAndSwapBack | Reverse | 100000 | 290.98 us | 5.730 us | 8.217 us | 286.89 us |
| RemoveAndSwapBack | Linear | 100000 | 295.25 us | 5.882 us | 7.002 us | 295.54 us |
| RemoveAndSwapBack | Random | 100000 | 409.35 us | 7.702 us | 7.565 us | 405.67 us |
| List | Reverse | 100000 | 440.00 us | 8.429 us | 7.472 us | 437.75 us |
| RemoveAndSwapBack | Reverse | 250000 | 687.89 us | 3.690 us | 3.081 us | 687.25 us |
| RemoveAndSwapBack | Linear | 250000 | 691.51 us | 3.748 us | 3.322 us | 691.50 us |
| List | Reverse | 250000 | 1,059.04 us | 16.636 us | 15.561 us | 1,057.12 us |
| RemoveAndSwapBack | Random | 250000 | 1,064.65 us | 19.698 us | 36.512 us | 1,048.54 us |
| List | Random | 10000 | 28,061.13 us | 490.183 us | 434.534 us | 27,916.44 us |
| List | Linear | 10000 | 57,214.52 us | 686.687 us | 642.328 us | 57,218.58 us |
| List | Random | 100000 | 2,898,043.15 us | 57,450.454 us | 72,656.430 us | 2,859,386.54 us |
| List | Linear | 100000 | 5,824,036.29 us | 73,614.984 us | 61,471.846 us | 5,812,698.46 us |
| List | Random | 250000 | 17,908,060.10 us | 249,017.326 us | 232,930.961 us | 17,889,613.81 us |
| List | Linear | 250000 | 36,972,363.97 us | 472,965.859 us | 442,412.558 us | 36,850,371.92 us |

As expected, yes, it is much faster, and it is clear, in principle, why. But here you can make a number of interesting observations:

  • Random() is faster than Linear() in the case of List, and the fastest was Reverse().

This is because, when iterating linearly from start to end, the amount of copied memory is at its maximum: we always delete the first element and copy the entire "tail". In the Random case we copy, on average, half of the remaining length. And in the Reverse case, deleting the last element copies just that one element (this case could even be handled separately by simply decrementing Count). A rough estimate of these copy volumes is given after this list.

  • Random() is slower than Linear() in the case of RemoveAndSwapBack, but Reverse() is faster.

Since our operation is atomic and there is no need to manipulate the "tail", everything comes down to the speed of accessing array elements. With linear iteration, the prefetcher is utilized to the fullest (you can read about this here). With random access, cache misses happen more often, which is evident in the benchmark. Reverse() is faster because there is no need to jump between the current index and the last one, thus avoiding cache misses.
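
A rough estimate for the List case (my own back-of-the-envelope arithmetic, not part of the original benchmark): deleting all N elements from the front shifts roughly (N - 1) + (N - 2) + ... ≈ N² / 2 elements in total, i.e. about 50 million copies for N = 10,000, while random deletion shifts about half of the remaining tail each time, roughly N² / 4 ≈ 25 million copies - which lines up with the roughly 2x gap between ListLinear and ListRandom in the table.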

Can it be even faster?

In theory, yes, but only through deferred compaction of the storages, which would preserve the order of the stored components as much as possible.

As for how to further speed up the in-place version - I do not know =)
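
As a thought experiment, deferred compaction could look roughly like this on top of the sparse/dense/data fields from the storage above (purely my sketch of the idea; the RemoveDeferred/Compact names are made up, and Count temporarily includes the holes until Compact() runs):

csharp
// Remove() only marks a hole - O(1), order of the survivors is untouched.
public void RemoveDeferred(in int entityIid)
{
    var arrayIndex = sparse![entityIid];
    dense![arrayIndex] = -1;
    sparse[entityIid] = -1;
}

// One O(n) pass shifts live elements left, preserving their order
// and fixing up the sparse/dense mapping along the way.
public void Compact()
{
    var write = 0;
    for (var read = 0; read < Count; read++)
    {
        var entityIid = dense![read];
        if (entityIid == -1) continue; // skip holes
        data![write] = data![read];
        dense[write] = entityIid;
        sparse![entityIid] = write;
        write++;
    }
    Count = write;
}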

Conclusions

Sparse sets are not an ideal solution, at least not for all cases.

Compacting in place is better done via RemoveAndSwapBack.

The benchmark code can be found here, and the storage code here.