Solving the Advent of Code 2021 calendar using C in under half-a-second

30 May 2022

Since being introduced to the world of Advent of Code just prior to the 2020 calendar starting, I subsequently spent the majority of 2021 completing (and documenting) the 2015, 2016 and 2017 calendars. For the 2021 calendar I decided that it would be interesting (and challenging) to complete the calendar in C, with an initial solution written in Python to meet the daily aspect of the challenge. Additionally, with C being such a performant language, I also wanted to set the goal of ensuring that the entire C calendar was solvable on a single CPU core in under a half-a-second (inspired by this blog post). In this post I wish to discuss how I went about achieving this goal (spoiler 410315 μs (0.410 s) on average) and the hurdles I faced along the way.

The rules

A month prior to the calendar commencing, I decided to lay out some rules that I would follow whilst completing each solution in C.

The runtime benchmark would be taken from execution on my current MacBook Pro, running an Intel Core i7-9750H CPU @ 2.60GHz.
Each solution written would follow the C11 standard, and compile using the GCC compiler with the following flags -std=gnu11 -march=native -Ofast -Wall -Werror.
No third-party libraries (data-structures and algorithms) could be used within a solution, making each solution self-contained. I did make one exception, that being the inclusion of a dynamic array implementation, which eased variable input sizes.
Each solution would clean up any memory allocation it had requested. Sure the OS would clean up these allocations once the executable terminates, but in constrained memory environments (i.e. microcontrollers) this would be a necessity, and a possible next project 😉.
Each day’s solution parts would be computational self-reliant, meaning that computation completed by part one could not be re-used in part two. This in effect made the task of keeping to under half-a-second even harder, as performance gains found in re-using previous state was prohibited.

As discussed in the introduction I did not complete each day’s solution in C from the offset. Instead, I opted to initially complete each solution in Python (a more forgiving dynamic language) first and then set out to complete the C equivalent. This allowed me to review possible strategies in a batteries included language that provides rapid development first, and then port these over to C.

Things I learnt

The Advent of Code 2021 calendar had some very tricky problems in it. Some of which required a lot of thinking and research to reduce down to an adequate runtime to meet the half-a-second target. Over the course of a couple of months I managed to finally achieve this goal, reaching an average runtime of 410315 μs (0.410 s) 🎉. Below are key learnings I took away from the experience of initially programming in C, and then trying to make them more performant.

C has very unforgiving error messages, runtime Segmentation faults are not easy to debug. When solving the 2017 calendar in Rust (another system-level language) I think I took for granted how well that language aids the developer in tracking down and correcting bugs.
Debugging using tooling such as gdb was very useful, but as I had not invested the time in integrating this within an IDE, using the terminal for such tasks was rather painful. This meant that I did opt for one too many getc(stdin)and printf statement to track execution throughout the course of the calendar.
Valgrind is your friend! When handling cleaning up after each solution having the ability to profile the heap memory usage throughout execution using Valgrind was a life-saver.
Due to some data-structures required to adequately solve some days, I unintentionally learnt a lot about how Hash tables and Priority queues are built from first principles. Although this was interesting, it did take away from focusing on the problem at hand. This made me envious of languages such as C++ and Python which already include tried and true implementations of such structures.
On top of the C11 standard I also wished to follow a uniform coding style. This lead me to employ use of the clang-format tool and follow the style laid out in How to C in 2016, for example using types found in stdint.h.
Parsing input using sscanf is not as bad as I feared it would be. In fact, for some solutions I actually found it easier to parse the input in the C version than I did the Python equivalent 🤯.
Using the C macro system to help reduce boilerplate required to bootstrap each solution was very useful. Being able to reuse behaviour within the AOC_MAIN macro aided greatly in making each solutions’ implementation very concise to the problem.
It took a little while to get used to handling memory allocations on the Stack vs. the Heap - another concept that higher-level languages with more abstract memory models hide from you. However, after the initial learning curve, it became an interesting decision to make on a per-solution basis what the best option was.

The solutions

Throughout the course of December, and the subsequent months when revising the C solutions, I documented how each problem was tackled. Below are notes on interesting aspects of each solution, and techniques used to increase performance for the end goal.

Day 1: Sonar Sweep

Solution: Python, C runtime: 732 μs

The first day provided me with a good use case for Python’s list comprehensions, and allowed me to make use of the zip function to apply the desired windows. For the C solution I opted to use conventional for loops. It gave me my first experience of the dynamic array implementation, allowing me to initially parse the input measurements into a form I wanted to use for the computation.

Day 2: Dive!

Solution: Python, C runtime: 444 μs

The Python version was small enough to fit in each parts’ solution functions. In hindsight, I would have opted to use an exhaustive match case statement, over if statements. The C solution required my first struct to store the course instructions, and followed a simplified mutable pattern to the Python variant. It also provided me with my first experience using scanff to pluck out the desired input.

Day 3: Binary Diagnostic

Solution: Python, C runtime: 761 μs

I had a lot of fun using bitwise operations and the handy Counter data-structure in the Python solution. Finding the bit_length method within the standard library was also very useful. For the C solution I followed a similar approach, however, for determining the bit width I thought it would be fun to drop-down into some assembly. Having targeted the x86 architecture I was able to use the bsrl CPU operation to determine the bit length.

Day 4: Giant Squid

Solution: Python, C runtime: 2715 μs

The Python solution made use of any and all iterable functions to clearly map intent within the code. This again used the powerful zip function, this time to transpose the lists board rows. For the C solution I tried to accommodate for variable size input, resulting in dynamic allocation of the boards. This lead to quite a lot of cleanup being required after the computation step. I did find a goto to be useful however, to apply the required cleanup operation before returning. Typically, I would short-circuit and return early in the looping construct if the condition had been met. This means leaning heavily on the garbage collector etc. in other languages to free any reclaimable memory.

Day 5: Hydrothermal Venture

Solution: Python, C runtime: 1799 μs

This was another interesting problem and allowed me to use Python’s zip_longest and range to determine the intermediate steps of the given lines, with the Counter data structure recording overlaps. I also took advantage of True values being counted as 1 within a Python sum function call. For the C solution I used a variable-length grid to store the counts in, opting to check for +grid[x][y] == 2 to ensure we only counted coordinate overlaps a single time. With the inclusion of variable length arrays I started to explore the use of Stack and Heap allocated memory within different scenarios.

Day 6: Lanternfish

Solution: Python, C runtime: 61 μs

Having spent some time determining the pattern cycle, the resulting solutions modelled in Python and C were very similar and not very complex. Fishes with a lifetime of 0 will have a timer of 8 on the next iteration (aka. newborns). Fishes on the current generation with a timer of 7 today will have a timer of 6 on the next day. So, the number of fishes that are reset today (timer of 0) must be added to the one with a timer of 7.

Day 7: The Treachery of Whales

Solution: Python, C runtime: 11267 μs

This was another Python solution that leaned heavily on list comprehensions and the standard library, using triangle numbers to keep the formula simple. As expected there was a lot more C code required for this one; even though it follows the Python solution, the lack of list comprehensions required a lot of state manipulation.

Day 8: Seven Segment Search

Solution: Python, C runtime: 351 μs

This problem was one of my favourite from the 2021 calendar. I had a lot of fun using sets, match statements and the exploring Python’s type-hinting capabilities. The general solution I went with was that based on the characteristics of one and four you can determine the decoded output. For the C solution, instead of sets I simply counted the pattern similarities, which did not look to be too much of a performance bottleneck. As we knew the expected input values I was able to bake these in to the struct used, greatly simplifying parsing the input. Not making everything dynamic has its wins!

Day 9: Smoke Basin

Solution: Python, C runtime: 1011 μs

This was the first map-based problem of the calendar. I opted to use Breadth-first search (BFS) for this solution and found that in the case of the Python solution being able to use negative index to access the end of the array was very handy. For the C version I used BFS too, but for the visited set I opted to use a 2D array of booleans instead - this was a tradeoff between memory and lookup time.

Day 10: Syntax Scoring

Solution: Python, C runtime: 436 μs

This problem all revolved around picking the right data-structure, in this case a Stack. Although, there were some additional extras at the end (sorting and median calculation) to add another level of complexity. This was very easy to map out and solve within Python, opting to store the score weights within a Dictionary. Fortunately within the C implementation the dynamic array I used had the ability to be used as an efficient Stack. I was additionally able to sort this array using the C standard library’s qsort function. As the dynamic array is internally stored as a regular C array (with some offset metadata) these functions can be used without any additional consideration required.

Day 11: Dumbo Octopus

Solution: Python, C runtime: 250 μs

This was another grid-based problem, where-by treating the grid as a Dictionary within Python made it easy to iterate over and apply the neighbouring checks. In the C version I opted to use a bounded multidimensional array to represent the grid. In this problem I also found how elegant it was to parse integer values using character offsets (i.e. 'number as char' - '0');

Day 12: Passage Pathing

Solution: Python, C runtime: 5340 μs

This problem lent itself well to use BFS again, supplying a predicate function to determine if the cave was explorable. The default value capabilities within Python Dictionaries (i.e. defaultdict(list)) is super useful! For the C version I used an id function to translate the path identifiers into deterministic integers (instead of using a hash function). With this, I employed a recursive approach to solving the problem at hand, using a boolean flag small_cave_seen_twice for switching between the two parts. I am not a huge fan of this approach, however it gets the job done.

Day 13: Transparent Origami

Solution: Python, C runtime: 3667 μs

The Python version again leaned heavily on computed sets and list comprehension. It also showed how a functional reduction step could be used to model the problem. In the C version as space is always getting smaller (due to the folds) we can continue to use the same paper_t structure, but update the width/height that we are concerned with at the time.

Day 14: Extended Polymerization

Solution: Python, C runtime: 67 μs

This was a fun Dynamic programming problem. Like always, I opted to first go down the naive implementation route for part 1, just to be bit by that decision in part 2. Having realised that I do not actually care about creating the exact final string, I was able to break the problem down recursively using memorisation (the @cache decorator is great) and Counter data-structure. For the C version I opted to use a bottom-up (tabulation) approach instead, storing pairs as indexed array values; using a similar method to how you would rows/columns in a single dimensional array. Modelling this problem in this means was made easier thanks to a couple of small macro function I made to handle the index conversion.

Day 15: Chiton

Solution: Python, C runtime: 16857 μs

This was another grid traversal problem. I initially used BFS but then opted to add a Priority queue and use Dijkstra’s algorithm for better performance. Within the Python version I did find treating a standard list as a Priority queue (using heapq methods) felt a little odd; it seemed as thought it could have been abstracted into a different data-structure entirely. I also enjoyed working out the formula to calculate the scaled risk levels. For the C version I decided to follow a similar path, expect this time I was required to build my own heap-based Priority queue implementation. Instead of storing the risk levels as x/y tuples like in the Python version, I opted to store them using a single-dimensional array representation.

Day 16: Packet Decoder

Solution: Python, C runtime: 102 μs

This proved to be a very wordy problem - but like always, I am amazed at how much thought goes into a problem and how much can be packed into such a small input!. For the Python variant I opted for a recursive approach, which returned the resulting reduction and pointer to the consumed sub-expression. In the C version I created the concept of a bitstream which stored the transformed hexadecimal to binary character stream, along with a mutating position index pointer. This meant that when I called parse_uint it would update the bitstream to the next available position in the stream (as if we were consuming that part of the stream). I preferred the approach in the C version, building up the tree with sub-packet structures allowed me to construct the tree a single time and call either calc_version_total or eval based on the part. This is in contrast to the Python version, which I was a little lazy with and side-carted solving the initial parts’ version total.

Day 17: Trick Shot

Solution: Python, C runtime: 1679 μs

I opted for brute-force to solve this problem, finding Python’s product and range combination resulting in very concise code to produce the velocity. Fortunately I modelled this solution in a way that would map well to a C equivalent. The key difference of course is the omission of list comprehensions, which means lots of looping! Having a look through the subreddit after solving this problem there looks to be a non-brute means in which to solve this problem using some equations of motion.

Day 18: Snailfish

Solution: Python, C runtime: 9027 μs

This problem essentially boiled down to a binary tree. I opted to mix it up with the Python solution and use an object-oriented approach. This provided a good use-case for the match statement with pattern matching to construct each Number recursively. Along with this, implementing the __add__ and __radd__ magic methods allowed me to handle the resulting abstraction as if they were true numbers. For the C version I stored the tree in a single dimensional array, keeping track of the depth of each node. This made it easy to perform the split and explode operations. I was also interested in using the memmove function to insert and remove numbers from the tree in a performant manner. I found handling raw memory like this is so powerful but scary at the same time.

Day 19: Beacon Scanner

Solution: Python, C runtime: 116330 μs

This was without question one of the hardest Advent of Code problems I have encountered! Fortunately I had experimented with 3D matrices and matrix transformations whilst working on my Rubik Cube solving project mid-last year, so had a feeling to experiment with this when initially reading the problem. For my Python solution iteratively mapping successive scanners based on orientation although naive and slow did eventually come to a result. I enjoyed storing the orientations as small lambda functions to make it trivial to apply rotations per scanner beacon. Implementing this problem in C was a whole other ball game. I knew my naive approach was not going to cut it, even in a system-level language such as C. So instead I resorted to reviewing how some cleaver people on the Advent of Code subreddit were able to achieve it. Thanks to some key insight I was able to use this to construct my own implementation which heavily relies on linked-lists.

Day 20: Trench Map

Solution: Python, C runtime: 11983 μs

This was yet another 2D matrix problem. For the Python version I opted to build the image using a Dictionary. Using a Dictionary in this way provides an easy means to navigate through and expand the map going forward. For the C solution I decided to instead use a fixed upper-bound multidimensional array, using an accompanying size value to limit the size per enhancement step.

Day 21: Dirac Dice

Solution: Python, C runtime: 29 μs

This was another fun Dynamic programming problem, in which I decided to again use memorisation (and the @cache decorator) to solve the problem using Python. For the C solution another bottom-up tabulation strategy was employed, as this felt simpler than trying to implement function call memorisation in C. To make the solution performant as well as space considerate I based my solution on ideas discussed on the subreddit.

Day 22: Reactor Reboot

Solution: Python, C runtime: 50313 μs

Instead of actually having to go about building up this 3D space and slice up cuboids I was able to employ a neat negation trick. To achieve this I added all intersections to the resulting cuboid listing, along with the cuboid itself if it was on. Intersections with positive region generated negative ones and intersections with negative regions generated positive ones. This canceled out any existing geometry the new cuboid was intersecting. This solution mapped well to both Python and C variants, and met my performance needs.

Day 23: Amphipod

Solution: Python, C runtime: 146356 μs

This was another head scratcher of a day, with many intermediate rule states to map. For the Python solution I employed Dijkstra’s algorithm which although slow (takes over 10 seconds to complete) was very readable. I found that modeling the next valid states as generators was very concise in Python. For the C variant I knew I would need to do some better state pruning to meet the desired runtime constraints. Again, this lead me down an Advent of Code subreddit rabbit hole, in which, along with pruning superfluous moves I also employed the A* algorithm instead. The lower-bound heuristic used calculated the cost required if every pod could move through each other to be placed in their correct room. For this solution to work I ended up having to build my own Hash table and Priority queue tailored to this solution. It took a while to develop, but the resulting speed gains were worth it.

Day 24: Arithmetic Logic Unit

Solution: Python, C runtime: 82 μs

This problem was all related to the Stack data-structure. After spending far too long going through the assembly code I finally figured out the pattern! Both Python and C solutions follow the same idea, using a higher-order function to pass in either the maximum or minimum accumulation functions.

Day 25: Sea Cucumber

Solution: Python, C runtime: 28627 μs

For the final day I was able to use a Dictionary again to model the growing 2D matrix. For the C version I used a fixed-size multidimensional array with a high enough size upper-bound to complete the computation. This shows off once again how useful Python’s list comprehensions are, coupled with functions such as count found in the itertools library.