Airport names come in all sorts of forms, and the name itself often refers to a famous person. Airport codes can be either ICAO or IATA; both are frequently used.
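As a rough illustration of the two code systems, the sketch below tells them apart by shape alone. It assumes IATA codes are three letters and ICAO codes are four letters. The function name `classify_airport_code` is hypothetical, and the check says nothing about whether a code is actually assigned to an airport.

```python
import re

# Assumed format rules: IATA codes are three letters, ICAO codes are four letters.
# These patterns check the shape only, not whether the code is in use.
IATA_PATTERN = re.compile(r"[A-Z]{3}")
ICAO_PATTERN = re.compile(r"[A-Z]{4}")


def classify_airport_code(code: str) -> str:
    """Return "IATA", "ICAO", or "unknown" based on the shape of the code."""
    code = code.strip().upper()
    if IATA_PATTERN.fullmatch(code):
        return "IATA"
    if ICAO_PATTERN.fullmatch(code):
        return "ICAO"
    return "unknown"
```

For example, `classify_airport_code("AMS")` gives `"IATA"` and `classify_airport_code("EHAM")` gives `"ICAO"`; both codes refer to Amsterdam Schiphol.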
Our dataset comes from OpenFlights and requires some data cleaning before use. The following actions are performed:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# this tells Jupyter to embed matplotlib plots in the notebook
%matplotlib notebook
pd.options.display.max_rows = 1000
pd.options.display.max_colwidth = 1000
df = pd.read_csv("airports.csv", index_col="Id")
# Convert feet to meters (1 ft = 0.3048 m)
df["Altitude"] = df["Altitude"] * 0.3048
# Remove the word "Airport" from the name (literal match, not a regex)
df['Name'] = df['Name'].str.replace('Airport', '', regex=False)
# Remove leading and trailing whitespace
df['Name'] = df['Name'].str.strip()
# Add a column with length of the name
df['Name_length'] = df['Name'].str.len()
# Remove rows whose name contains "Duplicate"
df.query('Name.str.contains("Duplicate") == False', inplace=True)
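To see what these cleaning steps do, here is a minimal sketch that applies them to a small made-up frame. The rows are hypothetical and not taken from `airports.csv`, the helper name `clean_airports` is an assumption, and the sketch uses the exact factor of 0.3048 m per foot and a boolean mask instead of `query`.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from airports.csv
sample = pd.DataFrame(
    {
        "Name": ["Schiphol Airport", "Heathrow Airport", "Duplicate Heathrow Airport"],
        "Altitude": [-11, 83, 83],  # feet
    }
)


def clean_airports(frame: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a copy, leaving the input untouched."""
    out = frame.copy()
    out["Altitude"] = out["Altitude"] * 0.3048  # feet to meters
    # Drop the literal word "Airport", then trim surrounding whitespace
    out["Name"] = out["Name"].str.replace("Airport", "", regex=False).str.strip()
    out["Name_length"] = out["Name"].str.len()
    # Keep only rows whose name does not contain "Duplicate"
    return out[~out["Name"].str.contains("Duplicate", na=False)]


cleaned = clean_airports(sample)
```

Working on a copy keeps the raw frame available for comparison, which helps when checking that a cleaning step did not remove more than intended.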
This is a sample of the data we have available.
df.head()
There is a great variation in name length, ranging from 2 to 57 characters.
# Summary statistics of the name length
print(df['Name_length'].describe())
print()
# Airport with the longest name
print(df.loc[df['Name_length'].idxmax(), ['Name', 'ICAO', 'Country', 'Name_length']])
print()
# Airport with the shortest name
print(df.loc[df['Name_length'].idxmin(), ['Name', 'ICAO', 'Country', 'Name_length']])
df.describe()
The top 10 countries by number of airports.
#df.plot(x='Country')
df.groupby('Country').count().sort_values(by=['Name'], ascending=False).head(10)[['Name']]
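A similar ranking can be sketched with `value_counts`, shown here on a small made-up frame. The data and the helper name `top_countries` are hypothetical. Note one difference: `value_counts` counts rows per country, whereas the `groupby(...).count()` approach counts non-null names per country.

```python
import pandas as pd

# Hypothetical data for illustration; the real frame comes from airports.csv
toy = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D", "E", "F"],
        "Country": ["US", "US", "US", "NL", "NL", "FR"],
    }
)


def top_countries(frame: pd.DataFrame, n: int = 10) -> pd.Series:
    """Count rows per country and return the n largest counts."""
    # value_counts sorts in descending order by default
    return frame["Country"].value_counts().head(n)


top = top_countries(toy, n=2)
```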