Counting sort is one such algorithm, and this post provides a step-by-step guide to its implementation.

This post was originally published on my personal website, where you can also find a slideshow that visually explains the algorithm.

Counting sort can be applied only when a precondition holds: each of the n input elements must be an integer in a finite range, or must have an integer key in that range.

Let's assume for now that the range is [0,k].

The basic idea is that the number of times each key occurs in the input array (i.e. its frequency) can be used to determine the position of the elements in the sorted output array.

How is this possible?

Let's suppose that there is an input element with key equal to c, with c <= k. If we know that the input array contains m elements with key < c, then the first element with key c goes at index m of the sorted output array (any further elements with the same key follow at m+1, m+2 and so on).

An example can help to understand better. If the input is [1,1,2,4,3,0,1,2,3,1], there are five keys less than 2. So the first 2 will go in position 5 in the output [0,1,1,1,1,2,2,3,3,4].
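
The rule can be checked with a few lines of C++. This is only a sketch: the helper name `FirstOutputIndex` is made up for illustration, and it counts the smaller keys with a plain linear scan instead of a precomputed frequency table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the original post): returns the index that
// the first element with key c takes in the sorted output, i.e. the number of
// input keys strictly less than c.
std::size_t FirstOutputIndex(const std::vector<int>& keys, int c) {
  assert(c >= 0);  // the post assumes keys in the range [0, k]
  return static_cast<std::size_t>(std::count_if(
      keys.begin(), keys.end(), [c](int key) { return key < c; }));
}
```

For the input `[1,1,2,4,3,0,1,2,3,1]`, `FirstOutputIndex(keys, 2)` evaluates to 5, which matches the position of the first 2 in the sorted output.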

The frequency of each key can be computed using a **bookkeeping** array with size equal to k+1 initialized with zeros. Since the keys in the input array are in the range [0,k], we can iterate over it and use the keys as indices for the bookkeeping array.

During the iteration we increment the elements of the bookkeeping indexed by the keys. At the end, each element of the bookkeeping array will store the frequency of the key with the corresponding index.
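
As a minimal sketch of this counting step, assuming the maximum key k is passed in by the caller (the function name `CountFrequencies` is illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a bookkeeping array of size k + 1 where element i holds the number
// of occurrences of key i in the input.
std::vector<int> CountFrequencies(const std::vector<int>& keys, int k) {
  assert(k >= 0);
  std::vector<int> bookkeeping(static_cast<std::size_t>(k) + 1, 0);
  for (int key : keys) {
    assert(key >= 0 && key <= k);  // precondition: keys lie in [0, k]
    ++bookkeeping[static_cast<std::size_t>(key)];
  }
  return bookkeeping;
}
```

With the input `[1,1,2,4,3,0,1,2,3,1]` and k = 4, the result is `[1,4,2,2,1]`: one 0, four 1s, two 2s, two 3s and one 4.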

If there are only integer keys without attached values, it is trivial to build the sorted output. We can just iterate through the **bookkeeping** array and fill the sorted output with the occurrences of each key.

```
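#include <algorithm>
#include <vector>
using namespace std;

void CountingSort(vector<int>& keys) {
    if (keys.empty()) return;  // max_element of an empty range cannot be dereferenced
    int max_key = *max_element(keys.begin(), keys.end());
    // bookkeeping[i] is the number of occurrences of key i
    vector<int> bookkeeping(max_key + 1);
    for (int key : keys) bookkeeping[key]++;
    // write each key back as many times as it occurs
    for (int i = 0, k = 0; i <= max_key; ++i) {
        for (int j = 0; j < bookkeeping[i]; ++j) {
            keys[k++] = i;
        }
    }
}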
```

The generic case where the input elements are key-value pairs is more complicated. It is necessary to precompute the position of each input element in the sorted output.

The easiest way to accomplish this is to build another array. We can use the bookkeeping array to build an array that tells us where the next occurrence of a key goes in the sorted output.

We can call this array **next_index** because its i-th element represents the position of the next input element with key i in the sorted output. We can build the next_index array taking into account two things:

- the output position of the first element with key i corresponds to the number of elements with key < i
- the bookkeeping subarray with indices in the range [0,i-1] contains the frequencies of all the elements with key < i

So the i-th element of the next_index array corresponds to the cumulative sum of the bookkeeping array up to the index i-1. Notice how the bookkeeping and the next_index arrays have the same size.
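
The cumulative sum described above can be sketched on its own as follows. The function name `BuildNextIndex` is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds next_index from the frequencies: next_index[i] is the sum of
// bookkeeping[0..i-1], i.e. the number of elements with key < i.
std::vector<int> BuildNextIndex(const std::vector<int>& bookkeeping) {
  assert(!bookkeeping.empty());  // size is k + 1, so there is at least one slot
  std::vector<int> next_index(bookkeeping.size(), 0);
  for (std::size_t i = 1; i < next_index.size(); ++i) {
    next_index[i] = next_index[i - 1] + bookkeeping[i - 1];
  }
  return next_index;
}
```

For the frequencies `[1,4,2,2,1]` of the earlier example, the result is `[0,1,5,7,9]`: the first 2 goes to position 5, the first 3 to position 7 and the 4 to position 9.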

The last step is now using the next_index array to build the sorted output.

We can now iterate through the input array and use each key as an index for the next_index array to get its position in the output array.

Once the current element is placed in its output position, the next_index element indexed by its key is incremented. This preserves the invariant that the i-th element of next_index represents the position of the next input element with key i in the sorted output.

Once this iteration is completed, the output array holds the sorted elements. We can return it, as in the code below, or copy it back to the input array.

```
#include <algorithm>
#include <string>
#include <utility>
#include <vector>
using namespace std;

vector<pair<int, string>> CountingSort(const vector<pair<int, string>>& items) {
    if (items.empty()) return {};
    int max_key = max_element(items.begin(), items.end(),
                              [](auto const& x, auto const& y) {
                                  return x.first < y.first;
                              })->first;
    // bookkeeping[i] corresponds to the number of entries with key equal to i
    vector<int> bookkeeping(max_key + 1, 0);
    for (const auto& item : items) {
        bookkeeping[item.first]++;
    }
    // next_index[i] corresponds to the number of entries with key less than i
    vector<int> next_index(max_key + 1, 0);
    for (size_t i = 1; i < next_index.size(); ++i) {
        next_index[i] = next_index[i - 1] + bookkeeping[i - 1];
    }
    vector<pair<int, string>> output(items.size());
    for (const auto& item : items) output[next_index[item.first]++] = item;
    return output;
}
```

Instead of creating a separate next_index array, we can use the bookkeeping array directly, saving some memory. To understand how, let's consider that:

- the position (plus 1) of the last element with key i corresponds to the number of elements with key <= i
- the bookkeeping subarray with indices in the range [0,i] contains the frequencies of the elements with key <= i

So the cumulative sum of the bookkeeping array up to the index i corresponds to the position (plus 1) of the last input element with key i in the sorted output. The cumulative sum of the bookkeeping array can be computed in place.

We can then iterate backwards over the input array and use each key as an index for the bookkeeping array to get its position in the output array. Iterating backwards keeps the sort stable, because the last occurrence of a key in the input lands in the last slot reserved for that key.

Before the current element is placed in its output position, the bookkeeping element indexed by its key is decremented. This preserves the invariant that the i-th element of the bookkeeping array represents the position (plus 1) of the last not-yet-placed input element with key i in the sorted output.

```
#include <algorithm>
#include <numeric>
#include <string>
#include <utility>
#include <vector>
using namespace std;

vector<pair<int, string>> CountingSort(const vector<pair<int, string>>& items) {
    if (items.empty()) return {};
    int max_key = max_element(items.begin(), items.end(),
                              [](auto const& x, auto const& y) {
                                  return x.first < y.first;
                              })->first;
    vector<int> bookkeeping(max_key + 1, 0);
    // count keys frequency
    for (const auto& item : items) {
        bookkeeping[item.first]++;
    }
    // at the end each element is the position (plus 1) of the last element with that key
    std::partial_sum(bookkeeping.begin(), bookkeeping.end(), bookkeeping.begin());
    vector<pair<int, string>> output(items.size());
    // build sorted output iterating backward
    for (auto it = items.crbegin(); it != items.crend(); ++it) {
        output[--bookkeeping[it->first]] = *it;
    }
    return output;
}
```

So far we have assumed that all the keys are non-negative and lie in a range between 0 and some maximum value k.

Actually this is not a prerequisite. We can apply counting sort to any integer range of keys. The trick is to find the minimum key and store its frequency at index 0, so that a key c is counted at index c - min.
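
A minimal sketch of this trick for plain integer keys is shown below. The function name `CountingSortWithOffset` is an assumption, and the code assumes that the difference between the largest and the smallest key fits in an `int` and is small enough to allocate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counting sort for keys in an arbitrary integer range [min_key, max_key].
// Each key is shifted by min_key so that the smallest key maps to index 0.
void CountingSortWithOffset(std::vector<int>& keys) {
  if (keys.empty()) return;
  const auto min_max = std::minmax_element(keys.begin(), keys.end());
  const int min_key = *min_max.first;
  const int max_key = *min_max.second;
  std::vector<std::size_t> bookkeeping(
      static_cast<std::size_t>(max_key - min_key) + 1, 0);
  for (int key : keys) ++bookkeeping[static_cast<std::size_t>(key - min_key)];
  // Write each key back as many times as it occurs, undoing the shift.
  std::size_t k = 0;
  for (std::size_t i = 0; i < bookkeeping.size(); ++i) {
    for (std::size_t j = 0; j < bookkeeping[i]; ++j) {
      keys[k++] = min_key + static_cast<int>(i);
    }
  }
  assert(k == keys.size());  // every element was written back exactly once
}
```

For example, the input `[-3, 5, 0, -3, 2]` uses a bookkeeping array of 9 slots (keys -3 to 5) and is sorted into `[-3, -3, 0, 2, 5]`.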

An important property of counting sort is that it is **stable**: elements with the same key appear in the output array in the same order as they do in the input array.

Regarding the asymptotic analysis, we have that:

- **time complexity** is O(n+k), to iterate through both the input array and the bookkeeping array
- **space complexity** is O(k) to store the bookkeeping array, plus O(n) for the output array in the key-value versions

Usually the number of items to be sorted is not asymptotically different from the number of keys those items can take on. In those cases k becomes O(n) and the time complexity of the whole algorithm is O(n). This is not always the case, however (e.g. input = [1e10,1,2,4,3,0,1,2,3,1]).

```
def countingSort(input):
    if not input:
        return []
    maxKey = max(input, key=lambda item: item[0])[0]
    bookkeepingLength = maxKey + 1
    bookkeeping = [0] * bookkeepingLength
    # Count keys frequency
    for item in input:
        bookkeeping[item[0]] += 1
    # at the end each element is the position (plus 1)
    # of the last element with that key
    for i in range(1, bookkeepingLength):
        bookkeeping[i] += bookkeeping[i-1]
    output = [0] * len(input)
    # build sorted output iterating backward
    i = len(input) - 1
    while i >= 0:
        item = input[i]
        bookkeeping[item[0]] -= 1
        position = bookkeeping[item[0]]
        output[position] = item
        i -= 1
    return output
```

```
using System.Linq;

class CountingSort {
    private static int[] PartialSum(int[] input)
    {
        for (int i = 1; i < input.Length; i++)
        {
            input[i] = input[i] + input[i - 1];
        }
        return input;
    }
    // named Sort because a C# member cannot have the same name as its enclosing class
    public static (int, string)[] Sort((int, string)[] items)
    {
        if (items.Length == 0) return items;
        int max_key = items.Max(t => t.Item1);
        var bookkeeping = new int[max_key + 1];
        //count keys frequency
        foreach (var item in items) {
            bookkeeping[item.Item1]++;
        }
        //at the end each element is the position (plus 1) of the last element with that key
        bookkeeping = PartialSum(bookkeeping);
        var output = new (int, string)[items.Length];
        //build sorted output iterating backward
        for (int i = items.Length - 1; i >= 0; i--) {
            output[--bookkeeping[items[i].Item1]] = items[i];
        }
        return output;
    }
}
```

Counting sort is a powerful and extremely efficient algorithm. Its basic idea is simple, but the implementation can be tricky and requires attention.

This post provided a step by step guide for implementing the counting sort algorithm.

The implementation of the algorithm in multiple programming languages (C++, C# and Python) is available at my GitHub repository.

If you liked this post, follow me on Twitter to get more related content daily!

The data inside the ring buffer are delimited by two pointers that are adjusted when new data are generated or existing data are consumed. In particular, the tail pointer advances when new data are added and the head pointer advances when old data are consumed. If one of the pointers reaches the end of the buffer, it wraps around to the beginning.

Ring buffers are often used as fixed-sized queues in embedded systems, where static data storage methods are preferred. A common use case is when data are generated and consumed at different rates, so that the most recent data are always consumed.

This post presents a Ring Buffer implemented in C++ using templates.

The article was originally published on my personal website.

The data structure provides an API to put elements into the buffer and get elements from it, to check whether the buffer is full or empty, and to query its size and capacity.

```
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>

template <class T>
class RingBuffer {
    using DataPtr = std::unique_ptr<T[], std::function<void(T*)>>;
public:
    RingBuffer(size_t size);
    ~RingBuffer();
    RingBuffer(const RingBuffer& src);
    RingBuffer& operator=(const RingBuffer& rhs);
    RingBuffer(RingBuffer&& rhs);
    RingBuffer& operator=(RingBuffer&& rhs);
    bool empty() const;
    bool full() const;
    void put(const T& item);
    T get();
    size_t capacity() const;
    size_t size() const;
private:
    std::mutex mMutex;
    size_t mHead = 0;
    size_t mTail = 0;
    size_t mCapacity;
    // releases the raw memory without calling any element destructor
    std::function<void(T*)> deleter = [](T *m){ operator delete(m);};
    DataPtr mData;
};
```

The implementation has the following features:

- the data structure is movable, copyable and assignable;
- only used elements are instantiated, using the placement new operator;
- the memory is managed through the std::unique_ptr smart pointer to simplify the definition of the destructor and of the move operations;
- the data structure is thread-safe.

The constructor allocates the raw memory for the Ring Buffer using the operator new and sets the buffer capacity. The new operator allocates the memory necessary to store a number of elements equal to the buffer capacity.

Actually, the allocated memory is one slot more than the requested capacity in order to detect the full/empty states of the buffer with the head/tail pointers, without any additional logic and member variables.

```
RingBuffer(size_t size) :
    mCapacity(size+1),
    mData(static_cast<T*>(operator new ((size+1)*sizeof(T))), deleter)
{
}
```

Since the elements are instantiated using placement new, every element remaining in the buffer must be destroyed manually by calling its destructor. This is quite interesting, because placement new is probably one of the few cases where it makes sense to call a destructor explicitly.

```
~RingBuffer()
{
    if (mData != nullptr)
    {
        // Destroy all elements in buffer
        for (std::size_t i = mHead; i != mTail; i = (i + 1) % mCapacity)
        {
            mData[i].~T();
        }
    }
}
```

The raw memory for the ring buffer is implicitly deallocated by the std::unique_ptr smart pointer that invokes the deleter function specified as argument.

The use of a custom deleter function is necessary because the smart pointer is declared to hold an array of elements of type T, while the constructor allocated raw memory with operator new. Without the custom deleter, the smart pointer would call delete[] on that memory, causing undefined behavior.
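
The interplay between raw allocation, placement new, explicit destruction and the custom deleter can be sketched in isolation. The `Tracked` type and the `ConstructOneInRawStorage` function are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <new>

// Illustrative element type that counts how many instances are alive.
struct Tracked {
  static int alive;
  int value;
  explicit Tracked(int v) : value(v) { ++alive; }
  Tracked(const Tracked& other) : value(other.value) { ++alive; }
  ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Allocates raw storage for `slots` objects, constructs a single element with
// placement new, destroys it explicitly and returns its value. The raw memory
// is released by the custom deleter when `storage` goes out of scope.
int ConstructOneInRawStorage(std::size_t slots, int value) {
  assert(slots > 0);
  std::function<void(Tracked*)> deleter = [](Tracked* p) { operator delete(p); };
  std::unique_ptr<Tracked[], std::function<void(Tracked*)>> storage(
      static_cast<Tracked*>(operator new(slots * sizeof(Tracked))), deleter);

  new (storage.get()) Tracked(value);  // construct only the slot in use
  assert(Tracked::alive == 1);         // the other slots hold no objects
  const int result = storage[0].value;
  storage[0].~Tracked();               // explicit destructor call
  return result;
}
```

After a call such as `ConstructOneInRawStorage(4, 42)`, the counter `Tracked::alive` is back to 0: the only constructed element was destroyed by hand, and the deleter released the memory without touching the slots that never held an object.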

The head pointer identifies the slot holding the oldest element in the buffer, while the tail pointer identifies the slot where the next produced element will be stored.

Accordingly, the empty state is detected by checking that the head and tail pointers are equal, while the full state is detected by checking that advancing the tail pointer by one (modulo the capacity) makes it equal to the head pointer.

```
bool empty() const
{
    //if head and tail are equal the container is empty
    return (mHead == mTail);
}
bool full() const
{
    //if advancing the tail by one would reach the head the container is full
    return (mHead == ((mTail+1)%mCapacity));
}
```

The size of the Ring Buffer corresponds to the number of stored elements, while its capacity corresponds to the maximum number of elements that can be stored.

```
size_t capacity() const
{
    //one slot is always kept free, so the usable capacity is mCapacity - 1
    return mCapacity - 1;
}
size_t size() const
{
    return (mTail >= mHead) ? (mTail - mHead) : (mCapacity - mHead + mTail);
}
```
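
To see the index arithmetic at work without the rest of the class, the same checks can be written as free functions. The names below are hypothetical, and `slots` stands for the number of allocated slots, that is the requested capacity plus one.

```cpp
#include <cassert>
#include <cstddef>

// Free-function versions of the empty, full and size computations.
bool RingEmpty(std::size_t head, std::size_t tail) { return head == tail; }

bool RingFull(std::size_t head, std::size_t tail, std::size_t slots) {
  assert(slots > 0 && head < slots && tail < slots);
  return head == (tail + 1) % slots;
}

std::size_t RingSize(std::size_t head, std::size_t tail, std::size_t slots) {
  assert(slots > 0 && head < slots && tail < slots);
  // When the tail has wrapped around, count the elements up to the end of
  // the storage plus those at the beginning.
  return (tail >= head) ? (tail - head) : (slots - head + tail);
}
```

With 4 slots (requested capacity 3), head = 2 and tail = 1 describe a full buffer that has wrapped around: `RingSize(2, 1, 4)` is 3 and `RingFull(2, 1, 4)` is true.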

Adding and removing elements from the ring buffer requires modifying the head and tail pointers. A new element is inserted at the current tail location, and the tail is then advanced by one (modulo the buffer capacity).

If the buffer is full, it is also necessary to advance the head pointer, discarding the oldest element, in order to preserve the conditions used to verify the empty and full states. When removing an element, the element at the current head location is returned, and the head is then advanced by one (modulo the buffer capacity).

If the buffer is empty, a default-constructed value is returned.

```
void put(const T& item)
{
    std::lock_guard<std::mutex> lock(mMutex);
    if (full())
    {
        //destroy the oldest element before its slot is dropped
        mData[mHead].~T();
        mHead = (mHead+1) % mCapacity;
    }
    new(mData.get() + mTail) T(item);
    mTail = (mTail+1) % mCapacity;
}
T get()
{
    std::lock_guard<std::mutex> lock(mMutex);
    if(empty())
    {
        return T();
    }
    //Read mData and advance the head
    auto ret = mData[mHead];
    mData[mHead].~T();
    mHead = (mHead+1) % mCapacity;
    return ret;
}
```
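
The overwrite-when-full behaviour is easier to follow on a stripped-down version. The `SimpleRing` class below is a hypothetical, single-threaded stand-in that stores ints in a `std::vector`, so it involves neither placement new nor a mutex. It only traces how the head and tail indices move.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class SimpleRing {
 public:
  // One extra slot is allocated, as in the constructor shown earlier.
  explicit SimpleRing(std::size_t capacity) : slots_(capacity + 1) {
    assert(capacity > 0);
  }
  bool empty() const { return head_ == tail_; }
  bool full() const { return head_ == (tail_ + 1) % slots_.size(); }
  void put(int item) {
    if (full()) head_ = (head_ + 1) % slots_.size();  // drop the oldest element
    slots_[tail_] = item;
    tail_ = (tail_ + 1) % slots_.size();
  }
  int get() {
    if (empty()) return int();  // default value when nothing is stored
    const int item = slots_[head_];
    head_ = (head_ + 1) % slots_.size();
    return item;
  }

 private:
  std::vector<int> slots_;
  std::size_t head_ = 0;
  std::size_t tail_ = 0;
};
```

With a capacity of 3, putting 1, 2, 3 and 4 overwrites the oldest element, so three consecutive calls to `get()` return 2, 3 and 4, and a fourth call returns 0 because the buffer is empty.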

The complete Ring Buffer implementation can be found in my GitHub repository, including unit tests written with the Google C++ test framework.


A modulo-n integer sequence is just a (potentially infinite) integer sequence that wraps to 0 when the next value would be n. For example, $$ 0,1,2,3,4,0,1,2,3,4,...$$ is a modulo-5 integer sequence.

Modulo-n integer sequences are widely used in programming. For instance, they are necessary in any array-backed data structure (e.g. ring buffers or fixed size queues) in order to reuse elements when the end of the array is reached.

In this post, we will briefly review three different ways to create such sequences, examining their pros and cons.

Let's assume that the current value of the sequence is stored in an integer variable called *counter* (written `cnt` in the code snippets).
The traditional way to generate the next value is to increase *counter*, divide the result by *n* and then keep the remainder using the modulo operator.

```
cnt = (cnt + 1) % n;
```

This solution is easy and readable, but not so performant. Indeed, the integer modulo operator is expensive and requires far more clock cycles than other operations like addition or subtraction.

A second way is to replace the modulo operator with a comparison: we increment *counter* and reset it to 0 when it reaches *n*.

```
cnt = (cnt + 1);
if (cnt >= n) cnt = 0;
```

This approach is usually more efficient than the previous one, even though the performance benefit is completely platform dependent.
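
To check that the two methods produce the same values, they can be wrapped in small functions and compared. This is only a sketch: the function names are made up for this example, and it says nothing about which variant is faster on a given platform.

```cpp
#include <cassert>
#include <vector>

// First method: increment and take the remainder.
unsigned NextByModulo(unsigned cnt, unsigned n) {
  assert(n > 0);
  return (cnt + 1) % n;
}

// Second method: increment and reset with a comparison.
unsigned NextByCompare(unsigned cnt, unsigned n) {
  assert(n > 0);
  cnt = cnt + 1;
  if (cnt >= n) cnt = 0;
  return cnt;
}

// Generates the first `length` values of a modulo-n sequence starting at 0.
// `next` is one of the two step functions above.
std::vector<unsigned> Sequence(unsigned (*next)(unsigned, unsigned), unsigned n,
                               unsigned length) {
  std::vector<unsigned> values;
  unsigned cnt = 0;
  for (unsigned i = 0; i < length; ++i) {
    values.push_back(cnt);
    cnt = next(cnt, n);
  }
  return values;
}
```

For n = 5, both `Sequence(NextByModulo, 5, 7)` and `Sequence(NextByCompare, 5, 7)` yield 0, 1, 2, 3, 4, 0, 1.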

If n is a power of two (i.e. n = 2^{m}), it is possible to create a modulo-n sequence in a very efficient way using the *bitwise and* operator.

Under this special condition indeed

```
cnt = (cnt + 1) % n;
```

becomes equivalent to

```
cnt = (cnt + 1) & (n - 1);
```

But why does this approach work?

When n = 2^{m}, we have that:

- the modulo operator keeps only the least significant *m* bits of the *counter* variable
- n-1 is a number whose last *m* bits are all equal to one (e.g. n = 64 = 1000000 in binary, n-1 = 63 = 111111)

So if you consider the truth table of the *bitwise and* operator, it becomes clear how it can be used to implement the modulo operator in this particular case: and-ing with n-1 keeps the last *m* bits and clears all the others.
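
The equivalence can also be verified by brute force for a given n. In the sketch below the mask is n - 1, and the helper names `IsPowerOfTwo`, `NextByMask` and `MaskMatchesModulo` are assumptions introduced for illustration.

```cpp
#include <cassert>

// True when n is a power of two, i.e. exactly one bit is set.
bool IsPowerOfTwo(unsigned n) { return n != 0 && (n & (n - 1)) == 0; }

// Masking step: valid only when n is a power of two.
unsigned NextByMask(unsigned cnt, unsigned n) {
  assert(IsPowerOfTwo(n));
  return (cnt + 1) & (n - 1);
}

// Checks that masking and the modulo operator agree for every counter value
// in the range [0, n).
bool MaskMatchesModulo(unsigned n) {
  for (unsigned cnt = 0; cnt < n; ++cnt) {
    if (NextByMask(cnt, n) != (cnt + 1) % n) return false;
  }
  return true;
}
```

For n = 64 the check succeeds, and `NextByMask(63, 64)` wraps to 0 because 64 & 63 has no bits in common.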

This third method is very efficient because the *bitwise and* operator is really fast. The downside is that the method is not applicable to all use cases and it requires some binary math knowledge to be understood.

The following example shows how to generate a modulo-64 sequence using the *bitwise and* operator:

```
cnt = (cnt + 1) & 0x3F; // n = 64, m = 6
```

We reviewed three different methods of generating modulo-n sequences of integers.

I would generally suggest using one of the first two methods, since they are more readable and applicable without any restriction.

However, if you need very performant code, you could also consider the third option. For example, if you are implementing a circular buffer, you might consider setting its size to a power of two to speed up access to the buffer.

Since I used C++ as reference language, in the code snippets the modulo operator has been represented by *%* and the *bitwise and* operator by *&*.

