Determining your closest Parkrun Alphabet Challenge using Python and pandas
The Parkrun Alphabet is an unofficial challenge that sees runners complete a Parkrun at a location starting with each letter of the English alphabet. I am a big fan of Parkrun and wanted to work out how feasible it would be for me to complete the challenge based on the closest tourist locations to my local weekly run. I also thought this would be a great opportunity to explore pandas and work with DataFrames.
This post was originally written as a Jupyter Notebook, which can be downloaded here.
The Dataset
My first job was to build a dataset of all the current Parkrun events and their locations. Fortunately, the official Parkrun website provides this dataset indirectly: its interactive (OpenStreetMap-based) events map is driven by a feed of GeoJSON-style features.
import requests
import json
events = json.loads(requests.get("https://images.parkrun.com/events.json").content)
For posterity, I stored a local copy of this dataset. As it is not an official dataset but rather an implementation detail of another feature, there is a high likelihood it will change at some point.
with open("closest-parkrun-alphabet-challenge.json", "w") as file:
    file.write(json.dumps(events, indent=4))
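If I need to re-run this analysis later, the stored copy can be loaded back instead of hitting the endpoint again. A minimal sketch, reusing the same filename:

with open("closest-parkrun-alphabet-challenge.json") as file:
    events = json.load(file)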
This dataset provides the necessary Parkrun event names and location coordinates (longitude and latitude). Based on a supplied local Parkrun event, I should be able to determine the closest event per letter of the English alphabet to complete the challenge.
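For reference, each entry in events["events"]["features"] is a GeoJSON-style feature along the lines of the sketch below. The field names are the ones used later in this post; the values are illustrative and other fields may be present:

{
    "id": 1,
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-0.335791, 51.410992]},
    "properties": {
        "eventname": "bushy",
        "EventShortName": "Bushy Park",
        "EventLongName": "Bushy Park parkrun",
        "seriesid": 1
        # ...plus a few more properties (country code, location, etc.)
    }
}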
Calculating Distances using the Haversine Formula
To calculate the distance between two different events, I will use the Haversine formula. This formula calculates the shortest distance over the Earth’s surface, giving an ‘as-the-crow-flies’ distance between two points. Although this does not account for actual travel considerations (such as roads, traffic, etc.), it is a good enough metric to solve the problem. There are many other resources that explain how this formula works. Instead of re-implementing it, I have decided to use an existing library.
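For reference, writing the two latitudes as $\varphi_1, \varphi_2$, the two longitudes as $\lambda_1, \lambda_2$ (all in radians) and the sphere's radius as $r$, the formula gives the great-circle distance as

$$ d = 2r \arcsin\!\left(\sqrt{\sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right) $$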
!pip install haversine
[event_a, event_b, *_] = events["events"]["features"]
event_a["geometry"]["coordinates"]
# [-0.335791, 51.410992]
The library I am using appears to require the coordinates to be positioned in the opposite order (latitude, longitude) to what the dataset provides (longitude, latitude). As such, I will apply a simple transformation over the dataset before using it in the distance calculation.
from haversine import haversine
def flip(coords):
    # Swap (longitude, latitude) into (latitude, longitude)
    x, y = coords
    return y, x
haversine(flip(event_a["geometry"]["coordinates"]), flip(event_b["geometry"]["coordinates"]), unit="mi")
# 4.952173093357963
Putting it all together with pandas
Now that we have the core building blocks in place, we can solve the problem using the pandas library.
import pandas as pd
frame = pd.json_normalize(events["events"]["features"])
With the normalised dataset now imported into a DataFrame, I will apply some initial transformations to prepare the data for use. The first of these is filtering out junior events, as I only want to consider adult events.
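Before filtering, the assumption about which series ID corresponds to the adult 5k events can be checked by peeking at an example event name per series ID (a quick, optional sanity check; junior events typically carry "junior" in their long name):

frame.groupby("properties.seriesid")["properties.EventLongName"].first()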
ADULT_PARKRUN = 1
frame = frame[frame["properties.seriesid"] == ADULT_PARKRUN]
The event names appear to consist solely of lowercase English alphabet characters (even for international Parkrun events). We can use this to create a new column that groups events alphabetically by their first character.
frame["letter"] = frame["properties.eventname"].str[0]
As discussed before, the final dataset transformation ensures that the coordinates are supplied to the Haversine formula in the expected order.
frame["geometry.coordinates"] = frame["geometry.coordinates"].apply(flip)
We can now locate the local Parkrun event within the DataFrame.
local_parkrun = frame.loc[frame["properties.EventShortName"] == "Wimbledon Common"].iloc[0]
id 2
type Feature
geometry.type Point
geometry.coordinates (51.442078, -0.232215)
properties.eventname wimbledon
properties.EventLongName Wimbledon Common parkrun
properties.EventShortName Wimbledon Common
properties.LocalisedEventLongName None
properties.countrycode 97
properties.seriesid 1
properties.EventLocation Wimbledon Common
letter w
Name: 1, dtype: object
Finally, we determine the distance of each tourist event from the local event, then sort the events by distance within each letter and keep the two closest candidates per letter.
frame["distance"] = frame.apply(lambda parkrun: haversine(parkrun["geometry.coordinates"], local_parkrun["geometry.coordinates"], unit='mi'), axis=1)
challenge = frame.sort_values(['letter', 'distance'], ascending=True).groupby('letter').apply(lambda parkruns: parkruns.head(2))
challenge[['properties.EventShortName', 'distance']]
properties.EventShortName distance
letter
a 275 Ally Pally 11.633086
864 Aldenham 14.678486
b 0 Bushy Park 4.952173
69 Brockwell 5.505712
c 1317 Clapham Common 3.610917
344 Crane Park 6.023641
d 302 Dulwich 6.590564
888 Dartford Heath 18.186340
e 462 East Grinstead 23.870382
2162 Edenbrook Country 29.585593
f 562 Fulham Palace 2.181071
20 Finsbury Park 10.390485
g 177 Gunnersbury 4.729258
343 Gladstone 7.940967
h 1637 Hanworth 7.156085
65 Hampstead Heath 8.508784
i 1755 Ifield Mill Pond 23.123788
1876 Itchen Valley Country 59.060942
j 1544 Jersey Farm 23.143607
900 Jersey 177.952723
k 68 Kingston 3.497859
1625 Kingdom 25.778223
l 176 Lloyd 8.459577
2272 Lordship Recreation Ground 12.234581
m 301 Mile End 10.021387
1292 Mole Valley 14.049230
n 122 Nonsuch 5.883647
566 Northala Fields 9.252729
o 50 Old Deer Park 3.569700
524 Osterley 6.028906
p 668 Peckham Rye 7.443820
234 Pymmes 14.130422
q 481 Queen Elizabeth 46.581142
309 Queen's, Belfast 320.843908
r 3 Richmond Park 2.700374
15 Roundshaw Downs 7.902912
s 2108 Southall 7.536143
199 South Norwood 8.075157
t 972 Tooting Common 3.729324
2340 Thames Path, Woolwich 13.809657
u 390 Upton Court 15.324948
1644 Uckfield 35.385475
v 1309 Victoria Dock 11.653173
169 Valentines 15.691955
w 1 Wimbledon Common 0.000000
277 Wormwood Scrubs 5.419166
y 2297 Yarborough Leisure Centre 125.414614
87 York 176.070534
z 1905 Zuiderpark 198.046322
1779 Ziegelwiese 523.953296
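If you want to take the shortlist with you, it can also be written out to a file as a small optional step (the filename here is just an example):

challenge[['properties.EventShortName', 'distance']].to_csv("alphabet-challenge.csv")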
This approach provides an efficient way to determine the closest Parkrun events for completing the Alphabet Challenge using Python and pandas. Happy running!