Anagram Solver in Clojure
This past week discussion around solving anagrams was brought up - I thought it would be interesting to devise a couple of methods for computational achieving such a task. Providing an anagram string and a dictionary file, I wished to be returned all possible matching words from the dictionary.
I decided this expectational exercise was a great opportunity to use Clojure.
Whilst thinking of ways to tackle this problem, I knew of initially two different means of storing an effective dictionary for anagram lookup.
I was able to generalise into a higher-order function the process of taking each word within the supplied dictionary file, and adding it into the initial
dictionary data-structure using the
So as to not incur generation performance penalties each time this function was invoked, I decided to memoize the function.
You will notice that I added some constraints upon the words that we wished to include from the supplied dictionary - this was to simplify the large English dictionary provided with Unix platforms such as Mac OS
The first solution used a simple hash-map, devising the key based on the alphabetical sorting of the supplied word. This meant that each matching anagram would generate the same unique key and then be stored within the hash-maps Set value.
The above solution provided a simple design, along with quick lookup times. However, I felt that a better suited data-structure could help ease memory requirements. Using a Trie data-structure allowed us to more effectively store the computed keys for the lookup dictionary. Clojure provides an efficient way in which to interact with nested associated data-structures using the familiar hash-map. Instead of the key being converted back into a sorted string, it was maintained as a vector and used as a sequence of keys which formed the path to the matching anagram words.
Hash-Map using Prime Numbers
After some Internet research I noticed a different way in which we could generate unique word keys, regardless of character ordering.
Taking advantage of the fact that prime numbers are ‘multiplicatively unique’, I created a character to prime number lookup function.
The product of each characters corresponding prime number resulted in a unique integer value been generated.
I was then able to use a more efficient
int-map dictionary data-structure, providing better lookup and storage costs than the initial implementation.
Finally, I wished to extend on the initial problem and provide the ability to include sub-string anagrams in the result. I achieved this by creating a small function which took a word and created its power-set (set of all the possible subsets). With this implementation I was then able to create a higher-order function which took the desired finder implementation and fetched both full and sub-string anagrams from the lookup dictionary.