Edd Mann Developer

Determining your closest Parkrun Alphabet Challenge using Python and pandas

The Parkrun Alphabet is an unofficial challenge that sees runners complete a Parkrun at locations starting with each letter of the English alphabet. I am a big fan of Parkrun and wanted to work out how feasible it would be for me to complete the challenge, based on the closest tourist locations to my local weekly run. I also thought this would be a great opportunity to explore pandas and work with DataFrames.

This article was originally written as a Jupyter Notebook, which can be downloaded here.

The Dataset

My first job was to build a dataset of all the current Parkrun events and their locations. Fortunately, the official Parkrun website provides this dataset indirectly, by way of the GeoJSON features that back its OpenStreetMap-based interactive map.

import json
import requests

# The interactive map's (unofficial) GeoJSON feed of all events
response = requests.get("https://images.parkrun.com/events.json")
response.raise_for_status()
events = response.json()

For posterity, I stored a local copy of this dataset: as it is not an official dataset, but rather an implementation detail of another feature, there is a high likelihood that it could change.

with open("closest-parkrun-alphabet-challenge.json", "w") as file:
    json.dump(events, file, indent=4)
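On later runs it can be handy to read this snapshot back instead of hitting the endpoint again. A minimal sketch, using a hypothetical load_events helper that takes the fetching step as a callable so it is not tied to requests:

```python
import json
from pathlib import Path


def load_events(cache_path, fetch):
    """Hypothetical helper: return events from the JSON snapshot at cache_path,
    calling fetch() and saving its result first if the snapshot does not exist yet.

    fetch is any zero-argument callable that returns the parsed events.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Reuse the saved snapshot rather than downloading again
        with cache_path.open() as file:
            return json.load(file)
    events = fetch()
    # Persist the freshly fetched events for future runs
    with cache_path.open("w") as file:
        json.dump(events, file, indent=4)
    return events
```

For example, load_events("closest-parkrun-alphabet-challenge.json", lambda: requests.get("https://images.parkrun.com/events.json").json()) would download the feed once and reuse the saved copy afterwards.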

This dataset provides me with the required Parkrun event names and location coordinates (longitude and latitude). Based on a supplied local Parkrun event, I should be able to determine the closest event for each letter of the English alphabet.
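For orientation, here is a trimmed, hand-written sample of what one entry in events["events"]["features"] appears to look like, based on the fields used in this article and the Wimbledon Common record shown later; the real payload carries more properties. Note that the coordinates follow the GeoJSON [longitude, latitude] order.

```python
# A hand-written, trimmed sample of the assumed events.json shape (not the real payload)
sample = {
    "events": {
        "features": [
            {
                "id": 2,
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [-0.232215, 51.442078]},
                "properties": {
                    "eventname": "wimbledon",
                    "EventShortName": "Wimbledon Common",
                    "seriesid": 1,
                },
            }
        ]
    }
}

for feature in sample["events"]["features"]:
    # GeoJSON stores a point as [longitude, latitude]
    longitude, latitude = feature["geometry"]["coordinates"]
    name = feature["properties"]["EventShortName"]
    print(f"{name}: lat={latitude}, lon={longitude}")
# prints: Wimbledon Common: lat=51.442078, lon=-0.232215
```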

Calculating Distances using the Haversine Formula

To calculate the distance between two events I will use the Haversine formula. This formula calculates the great-circle distance between two points on a sphere, i.e. the shortest distance over the Earth’s surface, giving an ‘as-the-crow-flies’ distance between the two points. Although this does not factor in actual travel considerations (such as roads and traffic), it is a good enough metric for this problem. There are many other resources that go into detail on how the formula works, so instead of re-implementing it I have decided to use an existing library.
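Although the article relies on a library, the formula itself is short. As a rough sketch (not the haversine library's actual implementation), the standard haversine formula can be written with the math module alone, assuming a mean Earth radius of 3,958.8 miles:

```python
import math

# Assumed mean Earth radius in miles; libraries may use a slightly different constant
EARTH_RADIUS_MI = 3958.8


def haversine_miles(point_a, point_b):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    d_lat = lat2 - lat1
    d_lon = lon2 - lon1
    # Haversine of the central angle between the two points
    h = math.sin(d_lat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(d_lon / 2) ** 2
    # Invert the haversine to recover the angle, then scale by the radius
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))


# Bushy Park and Wimbledon Common, using coordinates that appear in this article
print(haversine_miles((51.410992, -0.335791), (51.442078, -0.232215)))  # roughly 4.95
```

Implementations typically differ only in the radius they assume, so results may vary slightly in the last decimal places.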

!pip install haversine
[event_a, event_b, *_] = events["events"]["features"]

event_a["geometry"]["coordinates"]

# [-0.335791, 51.410992]

The haversine library expects each point as a (latitude, longitude) tuple, which is the opposite order to the [longitude, latitude] pairs provided by the dataset (the standard GeoJSON ordering). As such, I will apply a simple transformation to the coordinates before using them in the distance calculation.

from haversine import haversine

def flip(coords):
    x, y = coords
    return y, x

haversine(flip(event_a["geometry"]["coordinates"]), flip(event_b["geometry"]["coordinates"]), unit="mi")

# 4.952173093357963
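To see why the flip matters, the sketch below uses a small from-scratch haversine function (an illustrative stand-in for the library, not its actual implementation, with an assumed 3,958.8-mile radius) and compares the Bushy Park to Wimbledon Common distance with and without flipping. Without the flip, each longitude is interpreted as a latitude, and because the swapped values are still valid coordinates the result is silently wrong rather than an error.

```python
import math


def haversine_miles(point_a, point_b, radius=3958.8):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*point_a, *point_b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))


def flip(coords):
    """Swap a GeoJSON (longitude, latitude) pair into (latitude, longitude)."""
    x, y = coords
    return y, x


bushy = [-0.335791, 51.410992]      # as stored in the dataset: [longitude, latitude]
wimbledon = [-0.232215, 51.442078]

correct = haversine_miles(flip(bushy), flip(wimbledon))
unflipped = haversine_miles(bushy, wimbledon)  # longitude wrongly treated as latitude
print(f"flipped: {correct:.2f} mi, unflipped: {unflipped:.2f} mi")
```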

Putting it all together with pandas

Now that we have the core building blocks in place, we can solve the problem using the pandas library.

import pandas as pd

frame = pd.json_normalize(events["events"]["features"])
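pd.json_normalize flattens nested dictionaries into dot-separated column names, which is where names such as properties.eventname and geometry.coordinates come from. Lists are not expanded, so each coordinate pair stays together in a single cell. A minimal sketch on one hand-written, trimmed feature (illustrative values taken from the Wimbledon Common record shown later):

```python
import pandas as pd

# One hand-written, trimmed feature in the assumed events.json shape (illustrative only)
features = [
    {
        "id": 2,
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-0.232215, 51.442078]},
        "properties": {"eventname": "wimbledon", "seriesid": 1},
    }
]

sample_frame = pd.json_normalize(features)

# Nested keys become dot-separated columns, e.g. "properties.eventname";
# the coordinates list is kept intact in a single "geometry.coordinates" cell
print(list(sample_frame.columns))
print(sample_frame.loc[0, "geometry.coordinates"])
```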

With the normalised dataset imported into a DataFrame, I will apply some initial transformations to prepare the data. First, as the dataset includes both adult and junior events, I filter it down to adult events only, as these are the only ones relevant to this problem.

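# seriesid used for adult events in this dataset
ADULT_PARKRUN = 1
# .copy() guards against pandas' SettingWithCopyWarning when columns are added below
frame = frame[frame["properties.seriesid"] == ADULT_PARKRUN].copy()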

The event names appear to consist of lower-case English alphabet characters (even for international Parkrun events). From this column we can derive a new letter column holding each event's first character, which we will use to group the events alphabetically.

frame["letter"] = frame["properties.eventname"].str[0]
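The grouping relies on every event name starting with a lower-case a-z character. A quick way to check that assumption is to flag any names that do not match. The sketch below runs it on a tiny hypothetical Series (only "wimbledon" comes from the dataset); on the real data you would pass frame["properties.eventname"] instead, and an empty result would confirm the observation.

```python
import pandas as pd

# Hypothetical names for illustration; only "wimbledon" is taken from the dataset
names = pd.Series(["wimbledon", "Example", "éxample"])

# First character of each name, as used for the letter column
letters = names.str[0]

# str.match anchors at the start of the string, so this flags names
# whose first character is not a lower-case a-z letter
unexpected = names[~names.str.match(r"[a-z]")]
print(unexpected.tolist())  # ['Example', 'éxample']
```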

As discussed earlier, the final piece of preparation is to ensure the coordinates are supplied to the Haversine formula in the expected (latitude, longitude) order.

frame["geometry.coordinates"] = frame["geometry.coordinates"].apply(flip)

We can now find the local Parkrun event within the DataFrame.

local_parkrun = frame.loc[frame["properties.EventShortName"] == "Wimbledon Common"].iloc[0]
id                                                          2
type                                                  Feature
geometry.type                                           Point
geometry.coordinates                   (51.442078, -0.232215)
properties.eventname                                wimbledon
properties.EventLongName             Wimbledon Common parkrun
properties.EventShortName                    Wimbledon Common
properties.LocalisedEventLongName                        None
properties.countrycode                                     97
properties.seriesid                                         1
properties.EventLocation                     Wimbledon Common
letter                                                      w
Name: 1, dtype: object
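The .iloc[0] lookup raises a bare IndexError if the short name is misspelled, and would silently pick the first row if several events ever shared a name. A hypothetical helper, find_event, that fails with a clearer message could look like this (sketched on a one-row frame whose coordinates are already flipped):

```python
import pandas as pd


def find_event(frame, short_name):
    """Hypothetical helper: return the single row whose properties.EventShortName
    equals short_name, raising a descriptive ValueError otherwise."""
    matches = frame.loc[frame["properties.EventShortName"] == short_name]
    if matches.empty:
        raise ValueError(f"No event named {short_name!r}")
    if len(matches) > 1:
        raise ValueError(f"{len(matches)} events named {short_name!r}")
    return matches.iloc[0]


# Tiny illustrative frame; coordinates are (latitude, longitude)
events_frame = pd.DataFrame({
    "properties.EventShortName": ["Wimbledon Common"],
    "geometry.coordinates": [(51.442078, -0.232215)],
})

local_parkrun = find_event(events_frame, "Wimbledon Common")
print(local_parkrun["geometry.coordinates"])
```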

Finally, we can calculate each tourist event's distance from the local event, and keep the two closest events for each letter.

# Distance in miles from every event to the local event
frame["distance"] = frame.apply(lambda parkrun: haversine(parkrun["geometry.coordinates"], local_parkrun["geometry.coordinates"], unit='mi'), axis=1)

# Sort by letter then distance, and keep the two closest events for each letter
challenge = frame.sort_values(['letter', 'distance'], ascending=True).groupby('letter').apply(lambda parkruns: parkruns.head(2))

challenge[['properties.EventShortName', 'distance']]
              properties.EventShortName    distance
letter
a      275                   Ally Pally   11.633086
       864                     Aldenham   14.678486
b      0                     Bushy Park    4.952173
       69                     Brockwell    5.505712
c      1317              Clapham Common    3.610917
       344                   Crane Park    6.023641
d      302                      Dulwich    6.590564
       888               Dartford Heath   18.186340
e      462               East Grinstead   23.870382
       2162           Edenbrook Country   29.585593
f      562                Fulham Palace    2.181071
       20                 Finsbury Park   10.390485
g      177                  Gunnersbury    4.729258
       343                    Gladstone    7.940967
h      1637                    Hanworth    7.156085
       65               Hampstead Heath    8.508784
i      1755            Ifield Mill Pond   23.123788
       1876       Itchen Valley Country   59.060942
j      1544                 Jersey Farm   23.143607
       900                       Jersey  177.952723
k      68                      Kingston    3.497859
       1625                     Kingdom   25.778223
l      176                        Lloyd    8.459577
       2272  Lordship Recreation Ground   12.234581
m      301                     Mile End   10.021387
       1292                 Mole Valley   14.049230
n      122                      Nonsuch    5.883647
       566              Northala Fields    9.252729
o      50                 Old Deer Park    3.569700
       524                     Osterley    6.028906
p      668                  Peckham Rye    7.443820
       234                       Pymmes   14.130422
q      481              Queen Elizabeth   46.581142
       309             Queen’s, Belfast  320.843908
r      3                  Richmond Park    2.700374
       15               Roundshaw Downs    7.902912
s      2108                    Southall    7.536143
       199                South Norwood    8.075157
t      972               Tooting Common    3.729324
       2340       Thames Path, Woolwich   13.809657
u      390                  Upton Court   15.324948
       1644                    Uckfield   35.385475
v      1309               Victoria Dock   11.653173
       169                   Valentines   15.691955
w      1               Wimbledon Common    0.000000
       277              Wormwood Scrubs    5.419166
y      2297   Yarborough Leisure Centre  125.414614
       87                          York  176.070534
z      1905                  Zuiderpark  198.046322
       1779                 Ziegelwiese  523.953296
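Two possible refinements, sketched on a hypothetical two-row version of the prepared frame (using the Wimbledon Common and Bushy Park coordinates that appear earlier). First, the row-wise apply can be swapped for a vectorised NumPy implementation of the same haversine formula, which computes every distance in one pass; the 3,958.8-mile radius is an assumed constant, so values may differ marginally from the library's. Second, if only the single closest event per letter is needed, groupby(...).idxmin() selects it directly, and a set difference against string.ascii_lowercase lists the letters with no adult event at all (the results table above has no row for x).

```python
import string

import numpy as np
import pandas as pd

EARTH_RADIUS_MI = 3958.8  # assumed mean Earth radius in miles


def haversine_miles_vectorised(lat, lon, origin_lat, origin_lon):
    """Great-circle distances in miles from one origin to arrays of (lat, lon) in degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    origin_lat, origin_lon = np.radians(origin_lat), np.radians(origin_lon)
    h = (np.sin((lat - origin_lat) / 2) ** 2
         + np.cos(lat) * np.cos(origin_lat) * np.sin((lon - origin_lon) / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * np.arcsin(np.sqrt(h))


# Hypothetical miniature of the prepared frame; coordinates are (latitude, longitude)
frame = pd.DataFrame({
    "properties.EventShortName": ["Wimbledon Common", "Bushy Park"],
    "letter": ["w", "b"],
    "geometry.coordinates": [(51.442078, -0.232215), (51.410992, -0.335791)],
})
local = (51.442078, -0.232215)

# Split the coordinate tuples into latitude and longitude arrays, then compute all distances at once
lat, lon = np.array(frame["geometry.coordinates"].tolist()).T
frame["distance"] = haversine_miles_vectorised(lat, lon, *local)

# Closest event per letter: idxmin returns the row label of the minimum distance in each group
closest = frame.loc[frame.groupby("letter")["distance"].idxmin()]

# Letters with no event in the frame at all
missing = sorted(set(string.ascii_lowercase) - set(frame["letter"]))

print(closest[["properties.EventShortName", "distance"]])
print(missing)
```

Relatedly, frame.sort_values(["letter", "distance"]).groupby("letter").head(2) returns the same two rows per letter without apply, although it keeps the original row index rather than a letter-keyed one.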