2022 was an odd year. I started running in 2021, but only in 2022 did I begin to compete seriously as an amateur. I took part in 12 official races: nine of them were 10 km races and three were half-marathons (21 km each). Thanks to the gadgets I used to record my results, I ended the year with 98 unique records covering all the official races and all the training sessions. With that in mind, what can we do with all this saved data? The answer is simple: analyze and visualize it with data analytics. Below, I share some exciting insights I gathered by analyzing my running history.
First, I had to extract the data from the Nike Run Club app using third-party software. This approach is covered in the blog post “Export Data From Nike Run Club in GPX File Type for Data Analytics”.
By loading the GPX file into a Jupyter Notebook and using Python as the programming language, I could read the file and get some basic information, such as the total number of tracks (runs) and points I recorded.
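The snippet below is a minimal sketch of that step. It assumes the gpxpy library and a hypothetical file name, activities.gpx; the actual code is in the GitHub repository linked at the end of this post.

```python
# Minimal sketch: parse the exported GPX file with gpxpy and count
# tracks, segments, and points. The file name is hypothetical.
import gpxpy

with open('activities.gpx', 'r') as gpx_file:
    gpx = gpxpy.parse(gpx_file)

print(f'Tracks: {len(gpx.tracks)}')
print(f'Segments in first track: {len(gpx.tracks[0].segments)}')
print(f'Points: {gpx.get_points_no()}')
```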
By reading the GPX file, I could see that I had 98 tracks and 1 segment, but I still wanted more information, such as ‘Longitude’, ‘Latitude’, ‘Altitude’, ‘Time’, and ‘Speed’.
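Continuing from the parsed gpx object above, this is one possible way to collect those fields into a pandas DataFrame. The column names and the helper calls are my assumptions for illustration, not necessarily the exact code in the repository.

```python
import pandas as pd

# Sketch: loop through every track, segment, and point and collect the
# fields of interest into a pandas DataFrame.
rows = []
for track in gpx.tracks:
    for segment in track.segments:
        for i, point in enumerate(segment.points):
            previous = segment.points[i - 1] if i > 0 else None
            rows.append({
                'Track': track.name,
                'Longitude': point.longitude,
                'Latitude': point.latitude,
                'Altitude': point.elevation,
                'Time': point.time,
                # speed_between() and distance_3d() estimate speed (m/s)
                # and distance (m) relative to the previous point.
                'Speed': previous.speed_between(point) if previous else 0.0,
                'Distance': previous.distance_3d(point) if previous else 0.0,
            })

df = pd.DataFrame(rows)
print(df.head())
```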
Looping through all tracks and segments, I got individual information on all 98 tracks (0-97). Next, let’s see how many meters above or below sea level I ran this year:
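A quick check on the ‘Altitude’ column of the DataFrame built above gives the answer:

```python
# Sketch: lowest and highest elevation across all recorded points.
print(f"Lowest point:  {df['Altitude'].min():.0f} m")
print(f"Highest point: {df['Altitude'].max():.0f} m")
```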
This means that the lowest point I ran at was 21 meters above sea level, while the highest was at 631 meters.
An interesting thing about GPX files is that they are XML files in plain-text format, which can be seen by printing the raw content of the GPX file I have been analyzing here:
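For example, gpxpy can serialize the parsed document back to XML, so printing the first few hundred characters is enough to show the raw markup (a small illustrative sketch):

```python
# Sketch: a GPX file is just XML text, so the first few hundred characters
# of the serialized document show the raw markup.
print(gpx.to_xml()[:400])
```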
From this point, the idea is to gather all the information and normalize the speed column. Normalization is a process in which a column’s values are rescaled so that all columns in a dataset share the same scale. After that, the dataset information is printed together with the name of each route:
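One possible approach is min-max normalization with pandas; grouping by track name, as below, is my interpretation of printing the dataset information with the name of each route.

```python
# Sketch: min-max normalization rescales the Speed column to the 0-1 range.
speed = df['Speed']
df['Speed_norm'] = (speed - speed.min()) / (speed.max() - speed.min())

# Dataset overview, plus a per-route (per-track) summary of normalized speed.
df.info()
print(df.groupby('Track')['Speed_norm'].describe())
```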
The next step is to compute the mean of the values along the requested axis. Applied to a Series object, the mean() method returns a scalar value: the mean of that column.
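For instance, using the normalized speed column from the sketch above:

```python
# Sketch: mean() on a pandas Series returns a single scalar value.
mean_speed = df['Speed_norm'].mean()
print(f'Mean normalized speed: {mean_speed:.3f}')
```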
Finally, I thought about converting the GPX data into a CSV file (which can be opened in Excel) to show that possibility, but instead I decided to perform another analysis: finding out how many kilometers I ran in each month of the year:
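Here is a sketch of that aggregation, using the per-point distances collected earlier. The CSV export would simply be df.to_csv('runs_2022.csv'), with a hypothetical file name.

```python
import matplotlib.pyplot as plt

# Sketch: sum the per-point distances (metres) per calendar month,
# convert to kilometres, and plot the result as a bar chart.
df['Month'] = pd.to_datetime(df['Time'], utc=True).dt.strftime('%Y-%m')
monthly_km = df.groupby('Month')['Distance'].sum() / 1000

monthly_km.plot(kind='bar', title='Kilometers per month (2022)')
plt.ylabel('Distance (km)')
plt.tight_layout()
plt.show()
```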
By analyzing the graph above, I learned that, in addition to the information acquired earlier, I ran 120 km in October 2022 and did not run at all in February 2022.
Additionally, all the code involved in this analysis is available in the following GitHub repository: https://github.com/brunorsreis/running2022