Popular Caching Algorithms: LFU Cache

Hello, dear reader!

I would like to start today’s post with a standard interview question: “What achievement are you most proud of at work?” This question may surprise or even offend, because nowadays it is not enough to simply do your job well – you are also expected to be proud of it. But don’t let the interviewer catch you off guard. If the room is spacious enough, it is even better to get up from the chair: your one-man show should light a fire in the interviewer’s eyes. Have you recently wrapped some method in a try/catch? Don’t be shy – you have already protected the company from an exceptional situation! Did you fix another bug a month ago? It was not just a fix, it was saving the company from potential bankruptcy! The main thing is not to overdo it, because the interviewer most likely goes to interviews quite often too and has already “saved the company from bankruptcy” more than once. And, of course, it’s better to prepare for this question in advance.

But that was the standard question – let’s return to a non-standard one. In the previous post we already got acquainted with caching and the LRU eviction algorithm. Consider the following example:

var cache = new LruCache<string, int>(2);

for (int i = 0; i < 100; ++i)
{
    cache.Add("x", 1);
}
cache.Add("y", 3);
cache.Add("z", 1);

if (cache.TryGet("x", out var x))
{
    Console.WriteLine($"Element 'x' is equal {x}");
}
else
{
    Console.WriteLine($"Element 'x' was not found");
}

In the console we will see "Element 'x' was not found". The element x is evicted even though it was accessed a hundred times more often than the others, because the LRU strategy does not take access frequency into account. Although this example is quite contrived, it illustrates the limitation of LRU well. In this post we will consider a strategy that does take the frequency of accesses into account. Let’s go!

LFU (least frequently used) is the strategy of discarding the element that has been used the least, i.e. the element whose key has been accessed (by the add and get methods) the smallest number of times. If there are several such elements, the one that has not been accessed for the longest time is discarded.
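
As a preview, here is how the example from the beginning of the post behaves with the LfuCache<string, int> that we will build below (a small sketch; the comments simply describe the behavior expected from the LFU rule):

var cache = new LfuCache<string, int>(2);

for (int i = 0; i < 100; ++i)
{
    cache.Add("x", 1); // the access frequency of "x" grows with every call
}
cache.Add("y", 3); // "y" has frequency 1
cache.Add("z", 1); // the cache is full: "y", the least frequently used element, is evicted – not "x"

if (cache.TryGet("x", out var x))
{
    Console.WriteLine($"Element 'x' is equal {x}"); // this branch is now taken
}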

The complete implementation of the LFU cache can, as usual, be found in the coding-interview project on GitHub.

Now back to the implementation. Let’s start with the simplest option: a wrapper around an associative array without any eviction strategy:

using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

public class LfuCache<TKey, TValue>
{
    public LfuCache()
    {
        _cache = new Dictionary<TKey, TValue>();
    }

    public void Add(TKey key, TValue value)
    {
        _cache[key] = value;
    }

    public bool TryGet(TKey key, [MaybeNullWhen(false)] out TValue value)
    {
        return _cache.TryGetValue(key, out value);
    }

    public bool Remove(TKey key)
    {
        return _cache.Remove(key);
    }

    public void Clear()
    {
        _cache.Clear();
    }

    private readonly Dictionary<TKey, TValue> _cache;
}

Now let’s limit the cache to a maximum allowed number of elements (capacity) and introduce a common access counter (totalAccessSteps), which is incremented on every access. For each cached element we also need to store the step at which it was last accessed (accessStepsByKey) and the number of accesses to it (frequenciesByKey). Looking ahead, note that this implementation is not optimal; we will improve it below.

public class LfuCache<TKey, TValue>
{
    public LfuCache(int capacity)
    {
        if (capacity <= 0)
        {
            var exMessage = $"must be a positive value";
            throw new ArgumentOutOfRangeException(nameof(capacity), capacity, exMessage);
        }

        _capacity = capacity;
        _cache = new Dictionary<TKey, TValue>();
        _frequenciesByKey = new Dictionary<TKey, int>();
        _accessStepsByKey = new Dictionary<TKey, int>();
        _totalAccessSteps = 0;
    }

    // ...

    private readonly Dictionary<TKey, TValue> _cache;
    private readonly Dictionary<TKey, int> _frequenciesByKey;
    private readonly Dictionary<TKey, int> _accessStepsByKey;
    private readonly int _capacity;
    private int _totalAccessSteps;
}

Let’s update the most obvious cache operations – removing an element and clearing the cache:

public bool Remove(TKey key)
{
    _accessStepsByKey.Remove(key);
    _frequenciesByKey.Remove(key);
    return _cache.Remove(key);
}

public void Clear()
{
    _cache.Clear();
    _frequenciesByKey.Clear();
    _accessStepsByKey.Clear();
    _totalAccessSteps = 0;
}

Now consider the operation of adding a new element to the cache. Three cases are possible:

  1. If this element exists in the cache, then update its value, access frequency and time;
  2. If the cache contains fewer elements than its capacity, then simply save this new element;
  3. If the cache already contains the maximum allowed number of elements, then first evict the element that has been used the least frequently; if several elements share the lowest frequency, evict the one that was accessed the earliest. Then add the new element.

The Add method covers all three cases:

public void Add(TKey key, TValue value)
{
    if (_cache.ContainsKey(key))
    {
        Update(key, value);
    }
    else if (_cache.Count < _capacity)
    {
        AddNew(key, value);
    }
    else
    {
        Evict();
        AddNew(key, value);
    }

    ++_totalAccessSteps;
}

private void AddNew(TKey key, TValue value)
{
    _cache[key] = value;
    _accessStepsByKey[key] = _totalAccessSteps;
    _frequenciesByKey[key] = 1;
}

private void Update(TKey key, TValue value)
{
    _cache[key] = value;
    _accessStepsByKey[key] = _totalAccessSteps;
    _frequenciesByKey[key] = _frequenciesByKey[key] + 1;
}

private void Evict()
{
    var minFrequency = _frequenciesByKey.Values.Min();
    var minFrequencyKeys = _frequenciesByKey
        .Where((x) => x.Value == minFrequency)
        .Select((x) => x.Key);
    var lfuItemKey = minFrequencyKeys
        .Aggregate((l, r) => _accessStepsByKey[l] < _accessStepsByKey[r] ? l : r);

    Remove(lfuItemKey);
}

Accessing an element by a key should also update its access step and frequency, so we reuse the Update method implemented above:

public bool TryGet(TKey key, [MaybeNullWhen(false)] out TValue value)
{
    if (_cache.TryGetValue(key, out value))
    {
        Update(key, value);
        // reading an element is also an access, so the common counter advances here as well
        ++_totalAccessSteps;
        return true;
    }

    value = default!;
    return false;
}

As noted above, this implementation is not optimal. The main problem is the Evict method, which searches for the element to evict by scanning every stored key, i.e. in linear time. Well, let’s optimize it!

We will divide the elements into groups according to the frequency of use. Within each group, the elements are ordered by the time they entered the group, from earlier to later. Frequency groups are arranged vertically and the elements of each group horizontally (squares indicate frequency groups, and circles indicate elements, with the key written on the left and the value on the right):
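
For example, if the keys "a" and "b" have each been accessed twice and the key "c" only once (the keys and values here are made up purely for illustration), the structure looks like this:

frequency 1: (c, 7)
frequency 2: (a, 1) -> (b, 4)

The eviction candidate is always the first element of the first (lowest-frequency) group, here (c, 7); within the frequency-2 group, "a" comes before "b" because it reached that frequency earlier.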

Let’s see how this structure changes under various operations. The clear operation is obvious, so we will consider only the remaining ones (in the figures, the removal of a node or group is shown in red and an addition in blue); a short textual trace of a few of these operations is given right after the list:

[Figure: an initial state example]
  • The operation of removing an element from the cache. In this case, we remove the element from the frequency group. If the element is the last in the group, then we delete this frequency group too:
[Figure: the removal of an element]
[Figure: the removal of the last element in a group]
  • The operation of adding a new element when the cache contains fewer elements than the maximum allowed. In this case, the new element is added to the end of the list of the group with access frequency equal to one:
[Figure: adding a new element]
  • The operation of adding a new element when the cache already contains the maximum allowed number of elements. In this case, the first element in the list of the group with the lowest access frequency is removed: among the least frequently used elements, it is the one that was accessed the earliest. After the removal, the number of elements is below the maximum, and we are back to the case described in the previous bullet:
[Figure: adding a new element with eviction]
  • The operation of updating an existing element. The new value can be anything, since values do not affect the ordering of elements; what matters is that the frequency grows by one, so the element moves to the end of the next frequency group. Note that if the element was the last one in its old group, that group is removed, and if the next group does not exist yet, it is created:
[Figure: updating an element]
[Figure: updating an element and its frequency group]
  • The operation of getting an element by a key. This operation is equivalent to updating an existing element while keeping its original value.
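
Here is the promised trace, continuing the illustrative state shown earlier (the keys and values are still made up, and the cache capacity is assumed to be 3):

Initial state:
    frequency 1: (c, 7)
    frequency 2: (a, 1) -> (b, 4)

TryGet("c"): the frequency of "c" becomes 2, so it moves to the end of the next group, and the now-empty group 1 is removed:
    frequency 2: (a, 1) -> (b, 4) -> (c, 7)

Add("d", 9): the cache is full, so the first element of the lowest-frequency group, (a, 1), is evicted, and "d" goes into a new group with frequency 1:
    frequency 1: (d, 9)
    frequency 2: (b, 4) -> (c, 7)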

It remains to learn how to quickly find an element’s list node by its key. To do this, we will keep, for each key, a reference to the corresponding list node. To quickly find a frequency group, we will also keep, for each frequency, a reference to the corresponding group node.

internal class LfuItem<TKey, TValue>
{
    public LfuItem(TKey key, TValue value, int frequency)
    {
        Key = key;
        Value = value;
        Frequency = frequency;
    }

    public TKey Key { get; }
    public TValue Value { get; }
    public int Frequency { get; }
}

internal class LfuFrequencyGroup<TKey, TValue>
{
    public LfuFrequencyGroup(int frequency)
    {
        Frequency = frequency;
        Items = new LinkedList<LfuItem<TKey, TValue>>();
    }

    public int Frequency { get; }
    public LinkedList<LfuItem<TKey, TValue>> Items { get; }
}

public class LfuCache<TKey, TValue>
{
    public LfuCache(int capacity)
    {
        if (capacity <= 0)
        {
            var exMessage = $"must be a positive value";
            throw new ArgumentOutOfRangeException(nameof(capacity), capacity, exMessage);
        }

        _capacity = capacity;
        _itemNodesByKey = new Dictionary<TKey, LinkedListNode<LfuItem<TKey, TValue>>>();
        _frequencyGroupsByFrequency = new Dictionary<
            int,
            LinkedListNode<LfuFrequencyGroup<TKey, TValue>>
        >();
        _frequencyGroups = new LinkedList<LfuFrequencyGroup<TKey, TValue>>();
    }

    // ...

    private readonly Dictionary<TKey, LinkedListNode<LfuItem<TKey, TValue>>> _itemNodesByKey;
    private readonly Dictionary<
        int,
        LinkedListNode<LfuFrequencyGroup<TKey, TValue>>
    > _frequencyGroupsByFrequency;
    private readonly LinkedList<LfuFrequencyGroup<TKey, TValue>> _frequencyGroups;
    private readonly int _capacity;
}

Let’s rewrite the methods for clearing and removing an element from the cache:

public bool Remove(TKey key)
{
    if (_itemNodesByKey.TryGetValue(key, out var itemNode))
    {
        var frequencyGroupNode = _frequencyGroupsByFrequency[itemNode.Value.Frequency];
        frequencyGroupNode.Value.Items.Remove(itemNode);
        if (frequencyGroupNode.Value.Items.Count == 0)
        {
            _frequencyGroupsByFrequency.Remove(itemNode.Value.Frequency);
            _frequencyGroups.Remove(frequencyGroupNode);
        }
        _itemNodesByKey.Remove(key);
        return true;
    }

    return false;
}

public void Clear()
{
    _itemNodesByKey.Clear();
    _frequencyGroups.Clear();
    _frequencyGroupsByFrequency.Clear();
}

The methods for adding an element (note that the main Add method is almost unchanged – only the access counter is gone):

public void Add(TKey key, TValue value)
{
    if (_itemNodesByKey.ContainsKey(key))
    {
        Update(key, value);
    }
    else if (_itemNodesByKey.Count < _capacity)
    {
        AddNew(key, value);
    }
    else
    {
        Evict();
        AddNew(key, value);
    }
}

private void AddNew(TKey key, TValue value)
{
    var lfuItem = new LfuItem<TKey, TValue>(key, value, 1);

    if (!_frequencyGroupsByFrequency.TryGetValue(1, out var frequencyGroupNode))
    {
        frequencyGroupNode = _frequencyGroups
            .AddFirst(new LfuFrequencyGroup<TKey, TValue>(1));
        _frequencyGroupsByFrequency.Add(1, frequencyGroupNode);
    }

    _itemNodesByKey[key] = frequencyGroupNode.Value.Items.AddLast(lfuItem);
}

private void Update(TKey key, TValue value)
{
    var itemNode = _itemNodesByKey[key];
    var frequency = itemNode.Value.Frequency;
    var frequencyNode = _frequencyGroupsByFrequency[itemNode.Value.Frequency];
    var nextFrequency = frequency + 1;

    if (!_frequencyGroupsByFrequency.TryGetValue(
        nextFrequency,
        out var nextFrequencyNode
    ))
    {
        nextFrequencyNode = _frequencyGroups.AddAfter(
            frequencyNode,
            new LfuFrequencyGroup<TKey, TValue>(nextFrequency)
        );
        _frequencyGroupsByFrequency[nextFrequency] = nextFrequencyNode;
    }
    Remove(key);

    var nextItemNode = nextFrequencyNode.Value.Items.AddLast(new LfuItem<TKey, TValue>(
        key,
        value,
        nextFrequency
    ));
    _itemNodesByKey[key] = nextItemNode;
}

private void Evict()
{
    var lfuItemNode = _frequencyGroups.First.Value.Items.First;
    Remove(lfuItemNode.Value.Key);
}

And the method for getting an element (the TryGet method has barely changed):

public bool TryGet(TKey key, [MaybeNullWhen(false)] out TValue value)
{
    if (_itemNodesByKey.TryGetValue(key, out var itemNode))
    {
        value = itemNode.Value.Value;
        Update(key, value);
        return true;
    }

    value = default!;
    return false;
}
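
Finally, a quick check of the tie-breaking rule (a minimal usage sketch with made-up keys and values): when several elements share the lowest frequency, the one that entered its frequency group first is evicted:

var cache = new LfuCache<string, int>(2);

cache.Add("a", 1); // frequency of "a" is 1
cache.Add("b", 2); // frequency of "b" is 1
cache.Add("c", 3); // "a" and "b" tie on frequency; "a" is older, so "a" is evicted

Console.WriteLine(cache.TryGet("a", out _)); // False
Console.WriteLine(cache.TryGet("b", out _)); // True, and the frequency of "b" becomes 2
Console.WriteLine(cache.TryGet("c", out _)); // True, and the frequency of "c" becomes 2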

This completes the implementation of the LFU cache. It’s time to say goodbye. I’d also like to invite you, dear reader, to check out the other reviews of coding-interview problems. See you soon!
