Popular Caching Algorithms: LRU Cache
Hello dear reader!
A technical interview is a process of solving interesting algorithmic problems that will most likely never be useful to you or your interviewer in your professional careers, except at the next technical interview. Nevertheless, such tasks remain popular interview material, because they allow an interviewer to evaluate a candidate’s skill in finding constraints, to follow his or her thought process, and also to measure the speed of writing for-loops in a notepad. One popular kind of task is implementing a cache.
A cache is a software or hardware data store with high-speed access. For example, a cache is often used to save calculation results in order to avoid repeating the calculations, or to save the results of requests to some storage with a lower access speed. In any case, since a cache only duplicates data or results that can be obtained again, it typically holds its contents only temporarily. Though not as temporarily as the recruiter who promised to call back, but did not.
Typically, the cache interface can be represented with the following set of operations:
- Add an element by a key;
- Get an element by a key;
- Remove an element by a key;
- Clear the cache.
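These operations can be sketched as a small C# interface (the names ICache and DictionaryCache are illustrative, not from the article; the class implemented below exposes the same operations without declaring an interface):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

public interface ICache<TKey, TValue>
{
    void Add(TKey key, TValue value);                               // add (or update) an element by key
    bool TryGet(TKey key, [MaybeNullWhen(false)] out TValue value); // get an element by key
    bool Remove(TKey key);                                          // remove an element by key
    void Clear();                                                   // clear the cache
}

// A trivial unbounded implementation, just to show the contract in action.
public class DictionaryCache<TKey, TValue> : ICache<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, TValue> _store = new();

    public void Add(TKey key, TValue value) => _store[key] = value;
    public bool TryGet(TKey key, [MaybeNullWhen(false)] out TValue value) => _store.TryGetValue(key, out value);
    public bool Remove(TKey key) => _store.Remove(key);
    public void Clear() => _store.Clear();
}

class InterfaceDemo
{
    static void Main()
    {
        ICache<string, int> cache = new DictionaryCache<string, int>();
        cache.Add("answer", 42);
        Console.WriteLine(cache.TryGet("answer", out var v) ? v : -1); // prints "42"
        cache.Remove("answer");
        Console.WriteLine(cache.TryGet("answer", out _)); // prints "False"
    }
}
```

The dictionary-backed implementation is unbounded; the rest of the article is about bounding it.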
At the same time, since we can add elements to the cache indefinitely only in theory, the cache should also define a strategy for discarding elements. For example, the strategy determines which element is discarded when the maximum allowed number of elements is exceeded.
LRU (least recently used) is the strategy of discarding the element that has gone the longest without being used, that is, the element whose key has not been accessed for the longest time (via the operations of adding and getting a value).
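As a standalone illustration of the rule, here is a small trace that records the last access step of each key (in the spirit of the first implementation below); the key with the smallest step is the one LRU would discard:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class LruTrace
{
    static void Main()
    {
        // Last access step per key; the smallest step marks the LRU victim.
        var lastAccess = new Dictionary<string, int>();
        var step = 0;
        void Touch(string key) => lastAccess[key] = step++;

        Touch("a"); Touch("b"); Touch("c");
        Touch("a"); // "a" is used again, so it is no longer the oldest

        var victim = lastAccess.Aggregate((l, r) => l.Value < r.Value ? l : r).Key;
        Console.WriteLine(victim); // prints "b": it has gone the longest without use
    }
}
```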
The complete implementation of the LRU cache in C#, as usual, can be found in the coding-interview project on GitHub.
Let’s start with the simplest case, when we have no discarding strategy at all. In other words, we add a wrapper around the standard implementation of an associative array, which usually comes with the standard library of your favorite programming language:
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

public class LruCache<TKey, TValue>
{
    public LruCache()
    {
        _cache = new Dictionary<TKey, TValue>();
    }

    public void Add(TKey key, TValue value)
    {
        _cache[key] = value;
    }

    public bool TryGet(TKey key, [MaybeNullWhen(false)] out TValue value)
    {
        return _cache.TryGetValue(key, out value);
    }

    public bool Remove(TKey key)
    {
        return _cache.Remove(key);
    }

    public void Clear()
    {
        _cache.Clear();
    }

    private readonly Dictionary<TKey, TValue> _cache;
}
Now let’s limit our cache to some maximum allowed number of elements (capacity). Let’s also create a common access counter (totalAccessSteps), which is incremented each time we access an element, and store the access step of each saved element (accessStepsByKey), which is needed to determine the element that has not been used the longest. Looking ahead, note that this implementation is not optimal; we will improve it below.
public class LruCache<TKey, TValue>
{
    public LruCache(int capacity)
    {
        if (capacity <= 0)
        {
            var exMessage = "must be a positive value";
            throw new ArgumentOutOfRangeException(nameof(capacity), capacity, exMessage);
        }
        _capacity = capacity;
        _cache = new Dictionary<TKey, TValue>();
        _accessStepsByKey = new Dictionary<TKey, int>();
        _totalAccessSteps = 0;
    }

    // ...

    private readonly int _capacity;
    private readonly Dictionary<TKey, TValue> _cache;
    private readonly Dictionary<TKey, int> _accessStepsByKey;
    private int _totalAccessSteps;
}
Next, let’s update the most obvious cache operations: removing an element and clearing the cache:
public bool Remove(TKey key)
{
    _accessStepsByKey.Remove(key);
    return _cache.Remove(key);
}

public void Clear()
{
    _accessStepsByKey.Clear();
    _cache.Clear();
    _totalAccessSteps = 0;
}
Consider the operation of adding a new element to the cache. Three cases may occur:
- If this element exists in the cache, then update its value and access time;
- If the cache contains fewer elements than its capacity, then simply save this new element;
- If the cache already contains the maximum allowed number of elements, then delete the element that has not been used the longest and save this new element.
Note that updating the value of an element is, in this case, equivalent to deleting the element and adding it again with the new value, since we need to change its access time anyway. So we get the following implementation of adding an element:
public void Add(TKey key, TValue value)
{
    if (_cache.ContainsKey(key))
    {
        Update(key, value);
    }
    else if (_cache.Count < _capacity)
    {
        AddNew(key, value);
    }
    else
    {
        Evict();
        AddNew(key, value);
    }
    ++_totalAccessSteps;
}

private void AddNew(TKey key, TValue value)
{
    _cache[key] = value;
    _accessStepsByKey[key] = _totalAccessSteps;
}

private void Update(TKey key, TValue value)
{
    Remove(key);
    AddNew(key, value);
}
private void Evict()
{
    // Requires 'using System.Linq;'. A brute-force O(n) search
    // for the key with the smallest (earliest) access step.
    var minKey = _accessStepsByKey.Aggregate((l, r) => l.Value < r.Value ? l : r).Key;
    Remove(minKey);
}
The operation of accessing an element by a key should also update the element’s access time, so we reuse the Update method implemented above:
public bool TryGet(TKey key, [MaybeNullWhen(false)] out TValue value)
{
    if (_cache.TryGetValue(key, out value))
    {
        Update(key, value);
        ++_totalAccessSteps;
        return true;
    }
    value = default!;
    return false;
}
As noted above, this implementation is not optimal, which of course the interviewer immediately notices, sarcastically reporting that the book does it differently. The main problem is the Evict method, which searches for the element with the earliest access step by brute force in O(n) time. Well then, why not optimize?
Since operations on the cache are performed sequentially, we can arrange the items in a list in the order they were accessed, from earliest to latest (the key is written on the left and the value on the right):
Let’s see how the list changes under various operations. Clearing the cache and removing a particular element are obvious, so we consider the remaining operations (hereinafter, red shows the removal of a node and blue the addition):
- The operation of adding a new element when the cache contains fewer elements than the maximum allowed. In this case, the new element is appended to the end of the list, since its time of addition is necessarily later than the access time of any other element:
- The operation of adding a new element when the cache already contains the maximum number of elements. In this case, the first element of the list is removed, since its access time is earlier than that of any other element. After the removal, the list contains fewer elements than the maximum allowed, which is exactly the case described in the previous paragraph.
- The operation of updating an existing element and the operation of getting a value by key. Both are equivalent to removing the element and appending it to the end of the list. The new value of the element can be anything (in the example below we kept the original value), since it does not affect the ordering of elements:
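The three transformations above can be traced with a plain LinkedList (a standalone sketch with placeholder keys; values are omitted for brevity):

```csharp
using System;
using System.Collections.Generic;

class ListEvolutionDemo
{
    static void Main()
    {
        const int capacity = 3;
        var items = new LinkedList<string>();

        // Below capacity: a new element is simply appended to the end.
        items.AddLast("a"); items.AddLast("b"); items.AddLast("c"); // a, b, c

        // At capacity: drop the head (the least recently used), then append.
        if (items.Count == capacity) items.RemoveFirst();
        items.AddLast("d"); // b, c, d

        // Update or read of an existing element: move its node to the end.
        var node = items.Find("c")!;
        items.Remove(node);
        items.AddLast(node); // b, d, c

        Console.WriteLine(string.Join(",", items)); // prints "b,d,c"
    }
}
```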
It remains to learn how to quickly find an element in the list by its key. To do this, for each key we will store a reference to the corresponding list node.
internal class LruItem<TKey, TValue>
{
    public LruItem(TKey key, TValue value)
    {
        Key = key;
        Value = value;
    }

    public TKey Key { get; }
    public TValue Value { get; }
}

public class LruCache<TKey, TValue>
{
    public LruCache(int capacity)
    {
        if (capacity <= 0)
        {
            var exMessage = "must be a positive value";
            throw new ArgumentOutOfRangeException(nameof(capacity), capacity, exMessage);
        }
        _capacity = capacity;
        _itemNodesByKey = new Dictionary<TKey, LinkedListNode<LruItem<TKey, TValue>>>();
        _items = new LinkedList<LruItem<TKey, TValue>>();
    }

    // ...

    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<LruItem<TKey, TValue>>> _itemNodesByKey;
    private readonly LinkedList<LruItem<TKey, TValue>> _items;
}
Let’s rewrite the methods for clearing and removing an element from the cache:
public bool Remove(TKey key)
{
    if (_itemNodesByKey.TryGetValue(key, out var node))
    {
        _items.Remove(node);
        return _itemNodesByKey.Remove(key);
    }
    return false;
}

public void Clear()
{
    _items.Clear();
    _itemNodesByKey.Clear();
}
The methods for adding an element (note that the structure of the main Add method remains the same):
public void Add(TKey key, TValue value)
{
    if (_itemNodesByKey.ContainsKey(key))
    {
        Update(key, value);
    }
    else if (_itemNodesByKey.Count < _capacity)
    {
        AddNew(key, value);
    }
    else
    {
        Evict();
        AddNew(key, value);
    }
}

private void AddNew(TKey key, TValue value)
{
    var item = new LruItem<TKey, TValue>(key, value);
    _itemNodesByKey[key] = _items.AddLast(item);
}

private void Update(TKey key, TValue value)
{
    Remove(key);
    AddNew(key, value);
}

private void Evict()
{
    // Evict is only called when the cache is full, so the list is never empty here.
    var minKey = _items.First!.Value.Key;
    Remove(minKey);
}
And the method for getting an element (the implementation of the TryGet method has hardly changed):
public bool TryGet(TKey key, [MaybeNullWhen(false)] out TValue value)
{
    if (_itemNodesByKey.TryGetValue(key, out var node))
    {
        value = node.Value.Value;
        Update(key, value);
        return true;
    }
    value = default!;
    return false;
}
Here we are, our cache is ready! Now you can safely go for an interview, and I need to say goodbye. See you soon!