SNCF Open Data train station attendance
Passengers per french train station in 2018
Data
Libraries
The following libraries are imported:
- pandas and numpy for data processing
- plotly.colors to use a specific color scale
- plotly.graph_object for data visualization
import pandas as pd
import numpy as np
import plotly.colors
import plotly.graph_objects as go
Processing
1. Reading csv files
df_frequentation = pd.read_csv('data/frequentation-gares.csv', sep=';')
df_gares = pd.read_csv('data/referentiel-gares-voyageurs.csv', sep=';')
Sample data from df_frequentation
Nom de la gare | Code UIC complet | Code postal | Segmentation DRG 2018 | Total Voyageurs 2018 | Total Voyageurs + Non voyageurs 2018 | Total Voyageurs 2017 | Total Voyageurs + Non voyageurs 2017 | Total Voyageurs 2016 | Total Voyageurs + Non voyageurs 2016 | Total Voyageurs 2015 | Total Voyageurs + Non voyageurs 2015 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Abancourt | 87313759 | 60220 | c | 40228 | 40228 | 43760 | 43760 | 41096 | 41096.551614 | 39720 | 39720 |
1 | Agay | 87757559 | 83530 | c | 15093 | 15093 | 14154 | 14154 | 19240 | 19240.514370 | 19121 | 19121 |
2 | Agde | 87781278 | 34300 | a | 588297 | 735372 | 697091 | 871364 | 660656 | 825820.929253 | 662516 | 828146 |
3 | Agonac | 87595157 | 24460 | c | 1492 | 1492 | 1583 | 1583 | 1134 | 1134.699996 | 1127 | 1127 |
4 | Aigrefeuille Le Thou | 87485193 | 17290 | c | 18670 | 18670 | 14513 | 14513 | 266 | 266.157144 | 0 | 0 |
Sample data from df_gares
Code plate-forme | Intitulé gare | Intitulé fronton de gare | Gare DRG | Gare étrangère | Agence gare | Région SNCF | Unité gare | UT | Nbre plateformes | … | Longitude WGS84 | Latitude WGS84 | Code UIC | TVS | Segment DRG | Niveau de service | SOP | RG | Date fin validité plateforme | WGS 84 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 00007-1 | Bourg-Madame | Bourg-Madame | True | False | Agence Grand Sud | REGION LANGUEDOC-ROUSSILLON | UG Languedoc Roussillon | BOURG MADAME GARE | 1 | … | 1.948670 | 42.432407 | 87784876 | BMD | c | 1.0 | NaN | GARES C LANGUEDOC ROUSSILLON | NaN | 42.4324069,1.9486704 |
1 | 00014-1 | Bolquère - Eyne | Bolquère - Eyne | True | False | Agence Grand Sud | REGION LANGUEDOC-ROUSSILLON | UG Languedoc Roussillon | BOLQUERE EYNE GARE | 1 | … | 2.087559 | 42.497873 | 87784801 | BQE | c | 1.0 | NaN | GARES C LANGUEDOC ROUSSILLON | NaN | 42.4978734,2.0875591 |
2 | 00015-1 | Mont-Louis - La Cabanasse | Mont-Louis - La Cabanasse | True | False | Agence Grand Sud | REGION LANGUEDOC-ROUSSILLON | UG Languedoc Roussillon | MONT LOUIS LA CABANASSE GARE | 1 | … | 2.113138 | 42.502090 | 87784793 | MTC | c | 1.0 | NaN | GARES C LANGUEDOC ROUSSILLON | NaN | 42.5020902,2.1131379 |
3 | 00020-1 | Thuès les Bains | Thuès les Bains | True | False | Agence Grand Sud | REGION LANGUEDOC-ROUSSILLON | UG Languedoc Roussillon | THUES LES BAINS GARE | 1 | … | 2.249094 | 42.528801 | 87784744 | THB | c | 1.0 | NaN | GARES C LANGUEDOC ROUSSILLON | NaN | 42.5288009,2.249094 |
2. Merging dataframes
The UIC Code is a unique ID for train stations. However, the column names are different in both files, so it’s mandatory so specify the left_on
and right_on
arguments.
df = df_gares.merge(
right=df_frequentation,
left_on='Code UIC',
right_on='Code UIC complet',
how='inner')
3. Filtering
In order to avoid keeping small train stations, I chose to filter out stations with attendance below 1000 passengers in 2018. For visualization purpose, I added a column holding the square root of the number of passengers per station
df = df[df['Total Voyageurs 2018'] > 1000]
4. Adding a category column
By using pandas.cut
data can be split into categories according to total number of passengers. This will allow to plot with a different color for each category.
df['category'] = pd.cut(df['Total Voyageurs 2018'], bins=[1e4, 1e5, 1e6, 1e7, np.inf])
Visualization
Plotly is a handy tool when it comes to creating interactive graphs and plots, that you can embed in other websites.
1. Scatter Mapbox
Data contain latitude and longitude: these will be used to plot train stations on the map. The size of the bubbles will depend on the square root of the number of passengers in 2018. A different trace is added for each of the categories defined above. Finally, information shown on mouse-hovering is defined using hovertemplate
.
fig = go.Figure()
colors = plotly.colors.sequential.Viridis
for i, cat in enumerate(df.category.cat.categories):
df_sub = df[df.category == cat]
fig.add_trace(go.Scattermapbox(
lat=df_sub['Latitude WGS84'],
lon=df_sub['Longitude WGS84'],
text=df_sub['Intitulé gare'],
marker=dict(
color=colors[2*i+1],
size=np.sqrt(df_sub['Total Voyageurs 2018 sqrt']),
sizemin=1,
sizeref=15,
sizemode='area',
opacity=.8,
),
meta=df_sub['Total Voyageurs 2018'],
hovertemplate="%{text}" + "<br>" + "Passengers: %{meta}",
name=f'> {cat.left:1.0e} passengers',
))
2. Layout
The last step is adding the background map, the title, margins around the plot, and the initial position & zoom.
fig.update_layout(
mapbox_style="open-street-map",
title='Passengers per french train station in 2018',
margin={'l': 0, 'r': 0, 't': 50, 'b': 0},
mapbox=dict(
center={'lon': 2.39, 'lat': 47.09},
zoom=4
),
)