# Closest Parkrun Alphabet Challenge using Pandas

The Parkrun Alphabet is an [unofficial challenge](https://blog.parkrun.com/uk/2018/07/18/the-parkrun-alphabet/) that sees runners complete a Parkrun at locations starting with each letter of the English alphabet.
I am a big fan of Parkrun and wanted to work out how feasible it would be for me to complete the challenge based on the closest _tourist_ locations to my _local_ weekly run.
I also thought this would be a great opportunity to explore [pandas](https://pandas.pydata.org/) and work with [DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

## The Dataset

My first job was to build a dataset of all the current Parkrun events and their locations.
Fortunately, the official Parkrun website provides this dataset indirectly, by way of the OpenStreetMap [Features](https://wiki.openstreetmap.org/wiki/Features) used in its interactive map.

In [7]:
import requests
import json

events = json.loads(requests.get("https://images.parkrun.com/events.json").content)

For posterity I stored a local copy of this dataset; as it is not an official dataset, and is more an implementation detail of another feature, there is a high likelihood it could change.

In [8]:
with open("closest-parkrun-alphabet-challenge.json", "w") as file:
    file.write(json.dumps(events, indent=4))
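
Should the upstream file change or disappear, the stored copy can be read back in place of the network request. The following is a minimal sketch of that round trip; the helper names `save_events` and `load_events` are hypothetical, and the sample payload only mimics the nesting used in this notebook (`events` → `features`) rather than the full dataset.

```python
import json
import tempfile
from pathlib import Path


def save_events(events, path):
    """Write the events payload to disk as indented JSON."""
    with open(path, "w") as file:
        json.dump(events, file, indent=4)


def load_events(path):
    """Read a previously saved events payload back from disk."""
    with open(path) as file:
        return json.load(file)


# Tiny stand-in payload (assumption: same nesting as the real dataset).
sample = {
    "events": {
        "features": [
            {"geometry": {"coordinates": [-0.335791, 51.410992]}},
        ]
    }
}

# Round-trip through a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as directory:
    snapshot = Path(directory) / "events.json"
    save_events(sample, snapshot)
    restored = load_events(snapshot)
```

In the notebook itself, `load_events("closest-parkrun-alphabet-challenge.json")` would stand in for the `requests.get` call.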

This dataset provides me with the required Parkrun event names and location coordinates (longitude and latitude).
Based on a supplied local Parkrun event I should be able to determine the closest event per-letter of the English alphabet to complete the challenge.

## Calculating Distances using the Haversine Formula

To calculate the distance between two different events I will use the Haversine formula.
This formula calculates the shortest distance over the earth's surface – giving an 'as-the-crow-flies' distance between the two points.
Although this will not factor in actual travel considerations (such as roads, traffic etc.) it is a _good enough_ metric to solve the problem.
There are [many](https://nathanrooy.github.io/posts/2016-09-07/haversine-with-python/) [other](https://en.wikipedia.org/wiki/Haversine_formula) [resources](https://www.movable-type.co.uk/scripts/latlong.html) which go into detail on how this formula works; instead of re-implementing it I have decided to use an [existing library](https://pypi.org/project/haversine/).
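
For reference, the formula itself is compact. The following is a minimal sketch using only the standard library, not the library's own implementation; the function name `haversine_miles` and the constant `EARTH_RADIUS_MILES` (an approximate mean Earth radius) are my own assumptions. Points are given as (latitude, longitude) in degrees.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate mean radius of the Earth in miles (assumption).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(point_a, point_b):
    """Great-circle distance in miles between two (latitude, longitude) points."""
    lat_a, lon_a = map(radians, point_a)
    lat_b, lon_b = map(radians, point_b)

    delta_lat = lat_b - lat_a
    delta_lon = lon_b - lon_a

    # Haversine of the central angle between the two points.
    h = sin(delta_lat / 2) ** 2 + cos(lat_a) * cos(lat_b) * sin(delta_lon / 2) ** 2

    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


# Bushy Park and Wimbledon Common, as (latitude, longitude).
distance = haversine_miles((51.410992, -0.335791), (51.442078, -0.232215))
```

For these two points the sketch gives a distance of roughly 4.95 miles.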

In [9]:
!pip install haversine

Collecting haversine
  Downloading haversine-2.8.0-py2.py3-none-any.whl (7.7 kB)
Installing collected packages: haversine
Successfully installed haversine-2.8.0


In [10]:
[event_a, event_b, *_] = events["events"]["features"]

event_a["geometry"]["coordinates"]

[-0.335791, 51.410992]

The library I am using expects coordinates in the opposite order (latitude, longitude) to the one the dataset provides (longitude, latitude).
As such, I will apply a simple transformation over the dataset before using it in the distance calculation.

In [11]:
from haversine import haversine

def flip(coords):
    x, y = coords
    return y, x

haversine(flip(event_a["geometry"]["coordinates"]), flip(event_b["geometry"]["coordinates"]), unit="mi")

4.952173093357963

## Putting it all together with Pandas

Now that we have the core building blocks in place, we can go about solving the problem using the pandas library.

In [12]:
import pandas as pd

frame = pd.json_normalize(events["events"]["features"])

With the normalised dataset now imported into a DataFrame, I will go about applying some initial transformations to prepare the data for use.
First, as the imported dataset includes both adult and junior events, I filter it down to adult events, which are the only ones I wish to consider for this problem.

In [13]:
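ADULT_PARKRUN = 1
# Copy the filtered rows so later column assignments do not act on a slice of the original frame.
frame = frame[frame["properties.seriesid"] == ADULT_PARKRUN].copy()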

The event names look to conform to lower-case, English alphabet characters (even in the case of international Parkrun events).
We can produce a new column from this source to group each of the events alphabetically by their first character going forward.

In [14]:
frame["letter"] = frame["properties.eventname"].str[0]
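
The lower-case assumption can be checked rather than eyeballed. The following is a minimal sketch on a small stand-in frame; the last two event names are made up purely to show what a failing row looks like, and in the notebook the same check would run against `frame` instead of `sample`.

```python
import pandas as pd

# Stand-in data: two real event names from the dataset, two made-up offenders.
sample = pd.DataFrame(
    {"properties.eventname": ["bushy", "wimbledon", "örebro", "5kpark"]}
)

sample["letter"] = sample["properties.eventname"].str[0]

# True where the first character is a lower-case English letter.
is_english_lower = sample["letter"].str.fullmatch("[a-z]")

# Event names that would not fit into an a-z grouping.
unexpected = sample.loc[~is_english_lower, "properties.eventname"].tolist()
```

An empty `unexpected` list would confirm that every event can be grouped by its first character without any further cleaning.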

As discussed before, the final piece of dataset preparation is to ensure the coordinates are supplied to the Haversine formula in the expected order (latitude, longitude).

In [15]:
frame["geometry.coordinates"] = frame["geometry.coordinates"].apply(flip)

In [16]:
frame

| | id | type | geometry.type | geometry.coordinates | properties.eventname | properties.EventLongName | properties.EventShortName | properties.LocalisedEventLongName | properties.countrycode | properties.seriesid | properties.EventLocation | letter |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Feature | Point | (51.410992, -0.335791) | bushy | Bushy parkrun | Bushy Park | | 97 | 1 | Bushy Park, Teddington | b |
| 1 | 2 | Feature | Point | (51.442078, -0.232215) | wimbledon | Wimbledon Common parkrun | Wimbledon Common | | 97 | 1 | Wimbledon Common | w |
| 2 | 3 | Feature | Point | (51.307648, -0.184225) | banstead | Banstead Woods parkrun | Banstead Woods | | 97 | 1 | Banstead Woods, Coulsdon | b |
| 3 | 4 | Feature | Point | (51.451962, -0.292886) | richmond | Richmond parkrun | Richmond Park | | 97 | 1 | Richmond Park, Richmond upon Thames | r |
| 4 | 5 | Feature | Point | (53.808582, -1.560059) | woodhousemoor | Woodhouse Moor parkrun | Woodhouse Moor | | 97 | 1 | Woodhouse Moor, Leeds | w |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2389 | 3331 | Feature | Point | (-29.684661, 30.472345) | lynnfield | Lynnfield parkrun | Lynnfield | | 85 | 1 | Lynnfield Park | l |
| 2390 | 3332 | Feature | Point | (-25.670437, 29.455077) | lavendermill | Lavender Mill parkrun | Lavender Mill | | 85 | 1 | Lavender Mill | l |
| 2391 | 3333 | Feature | Point | (-25.654647, 28.179186) | onderstepoortcampus | Onderstepoort Campus parkrun | Onderstepoort Campus | | 85 | 1 | Onderstepoort Campus | o |
| 2392 | 3336 | Feature | Point | (42.100364, -76.806334) | lackawannarailtrail | Lackawanna Rail Trail parkrun | Lackawanna Rail Trail | | 98 | 1 | Lackawanna Rail Trail | l |


We can now find the local Parkrun event within the DataFrame.

In [17]:
local_parkrun = frame.loc[frame["properties.EventShortName"] == "Wimbledon Common"].iloc[0]
local_parkrun

id                                                          2
type                                                  Feature
geometry.type                                           Point
geometry.coordinates                   (51.442078, -0.232215)
properties.eventname                                wimbledon
properties.EventLongName             Wimbledon Common parkrun
properties.EventShortName                    Wimbledon Common
properties.LocalisedEventLongName                        None
properties.countrycode                                     97
properties.seriesid                                         1
properties.EventLocation                     Wimbledon Common
letter                                                      w
Name: 1, dtype: object

We can then determine each _tourist_ event's distance from the local event.

In [18]:
frame["distance"] = frame.apply(lambda parkrun: haversine(parkrun["geometry.coordinates"], local_parkrun["geometry.coordinates"], unit='mi'), axis=1)

challenge = frame.sort_values(['letter', 'distance'], ascending=True).groupby('letter').apply(lambda parkruns: parkruns.head(2))

challenge[['properties.EventShortName', 'distance']]

| letter | | properties.EventShortName | distance |
| --- | --- | --- | --- |
| a | 275 | Ally Pally | 11.633086 |
| a | 864 | Aldenham | 14.678486 |
| b | 0 | Bushy Park | 4.952173 |
| b | 69 | Brockwell | 5.505712 |
| c | 1317 | Clapham Common | 3.610917 |
| c | 344 | Crane Park | 6.023641 |
| d | 302 | Dulwich | 6.590564 |
| d | 888 | Dartford Heath | 18.18634 |
| e | 462 | East Grinstead | 23.870382 |
| e | 2162 | Edenbrook Country | 29.585593 |
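
To judge feasibility it also helps to reduce this to the single closest event per letter and to list any letters with no event at all. The following is a minimal sketch on a stand-in frame built from six of the rows above. It uses `groupby(...).head(1)` in place of the `apply` call, which is a swapped-in technique that keeps the original flat index, and the variable names are my own.

```python
import string

import pandas as pd

# Stand-in data: six rows taken from the results above, deliberately unsorted.
sample = pd.DataFrame(
    {
        "letter": ["b", "a", "c", "a", "c", "b"],
        "properties.EventShortName": [
            "Brockwell",
            "Aldenham",
            "Crane Park",
            "Ally Pally",
            "Clapham Common",
            "Bushy Park",
        ],
        "distance": [5.505712, 14.678486, 6.023641, 11.633086, 3.610917, 4.952173],
    }
)

# Sort by distance, keep the first (closest) row of each letter group,
# then order the result alphabetically for reading.
closest = (
    sample.sort_values("distance")
    .groupby("letter")
    .head(1)
    .sort_values("letter")
)

# Letters of the English alphabet with no event in the data.
missing = sorted(set(string.ascii_lowercase) - set(closest["letter"]))
```

Run against the full `frame`, `missing` would show which letters cannot be completed from the dataset at all, regardless of distance.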
