Airport names

Airport names come in many forms, and the name itself often honors a famous person. Airports are also identified by codes, which can be either ICAO or IATA. Both are used frequently.

  • ICAO codes are used for "official" purposes such as air traffic control; for example, flight plans use ICAO codes for airports and airline flight identification.
  • IATA codes are mainly used for ticketing; for example, travel itineraries use IATA codes for airports and IATA flight numbers.
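
The two code systems can usually be told apart by shape: IATA airport codes have three letters and ICAO airport codes have four. The sketch below illustrates this with a hypothetical helper, classify_airport_code, which is not part of the dataset or of any library. It only checks the shape of the string and does not confirm that the code is actually assigned to an airport.

```python
import re


def classify_airport_code(code: str) -> str:
    """Guess the code system from the shape of the string alone."""
    code = code.strip().upper()
    if re.fullmatch(r"[A-Z]{4}", code):
        return "ICAO"
    if re.fullmatch(r"[A-Z]{3}", code):
        return "IATA"
    return "unknown"


# Goroka appears in the dataset with IATA code GKA and ICAO code AYGA
print(classify_airport_code("GKA"))
print(classify_airport_code("AYGA"))
```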

Data source

Our dataset comes from OpenFlights and requires some data cleaning before use. The following actions are performed:

  • Convert the altitude from feet to meters
  • Remove the 'Airport' at the end of the airport name because this was not used consistently
  • Remove any spaces at the start or end of each name
  • Remove lines that are duplicates (indicated by "Duplicate" in the dataset)
In [50]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# this tells Jupyter to embed matplotlib plots in the notebook
%matplotlib notebook

pd.options.display.max_rows = 1000
pd.options.display.max_colwidth = 1000

df = pd.read_csv("airports.csv", index_col="Id")

# Convert feet to meters (3.28 is a rounded factor; 1 m is about 3.28084 ft)
df["Altitude"] = df["Altitude"] / 3.28

# Remove the word "Airport" from the name (literal match, not a regex)
df['Name'] = df['Name'].str.replace('Airport', '', regex=False)

# Remove leading and trailing whitespace
df['Name'] = df['Name'].str.strip()

# Add a column with the length of the name
df['Name_length'] = df['Name'].str.len()

# Drop rows whose name contains "Duplicate"
df = df[~df['Name'].str.contains("Duplicate", na=False)]
Out[50]:
Id  Name                                 City          Country           IATA  ICAO   Latitude   Longitude     Altitude  Timezone  DST  Tz database time zone  Type     Source       Name_length
1   Goroka                               Goroka        Papua New Guinea  GKA   AYGA  -6.081690  145.391998  1610.365854        10  U    Pacific/Port_Moresby   airport  OurAirports            7
2   Madang                               Madang        Papua New Guinea  MAG   AYMD  -5.207080  145.789001     6.097561        10  U    Pacific/Port_Moresby   airport  OurAirports            7
3   Mount Hagen Kagamuga                 Mount Hagen   Papua New Guinea  HGU   AYMH  -5.826790  144.296005  1642.682927        10  U    Pacific/Port_Moresby   airport  OurAirports           21
4   Nadzab                               Nadzab        Papua New Guinea  LAE   AYNZ  -6.569803  146.725977    72.865854        10  U    Pacific/Port_Moresby   airport  OurAirports            7
5   Port Moresby Jacksons International  Port Moresby  Papua New Guinea  POM   AYPY  -9.443380  147.220001    44.512195        10  U    Pacific/Port_Moresby   airport  OurAirports           36
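
The four cleaning steps listed under Data source can also be sketched as a single function. This is a minimal sketch under stated assumptions, not the notebook's own code: clean_airports is a hypothetical helper, the toy rows are invented for illustration, the conversion uses the exact definition 1 ft = 0.3048 m rather than the rounded factor 3.28, and the regular expression removes 'Airport' only when it is the last word of the name, so the results can differ slightly from the cell above.

```python
import pandas as pd

METERS_PER_FOOT = 0.3048  # exact by definition of the international foot


def clean_airports(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps to a copy of the input frame."""
    df = raw.copy()
    # Feet to meters
    df["Altitude"] = df["Altitude"] * METERS_PER_FOOT
    # Remove "Airport" only at the end of the name, then trim whitespace
    df["Name"] = (
        df["Name"]
        .str.replace(r"\s*Airport\s*$", "", regex=True)
        .str.strip()
    )
    df["Name_length"] = df["Name"].str.len()
    # Drop rows flagged as duplicates
    keep = ~df["Name"].str.contains("Duplicate", na=False)
    return df[keep].copy()


# Invented toy rows, only to exercise the function
sample = pd.DataFrame(
    {
        "Name": ["Goroka Airport", " Airport Road Strip ", "Madang Airport Duplicate"],
        "Altitude": [1000.0, 100.0, 20.0],
    }
)
cleaned = clean_airports(sample)
print(cleaned)
```

Anchoring the pattern with $ keeps a name such as "Airport Road Strip" intact, whereas a plain literal replace would also remove the word from the start or middle of a name.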

Data fields

This is a sample of the data we have available.

In [58]:
df.head()
Out[58]:
Id  Name                                 City          Country           IATA  ICAO   Latitude   Longitude     Altitude  Timezone  DST  Tz database time zone  Type     Source       Name_length
1   Goroka                               Goroka        Papua New Guinea  GKA   AYGA  -6.081690  145.391998  1610.365854        10  U    Pacific/Port_Moresby   airport  OurAirports            6
2   Madang                               Madang        Papua New Guinea  MAG   AYMD  -5.207080  145.789001     6.097561        10  U    Pacific/Port_Moresby   airport  OurAirports            6
3   Mount Hagen Kagamuga                 Mount Hagen   Papua New Guinea  HGU   AYMH  -5.826790  144.296005  1642.682927        10  U    Pacific/Port_Moresby   airport  OurAirports           20
4   Nadzab                               Nadzab        Papua New Guinea  LAE   AYNZ  -6.569803  146.725977    72.865854        10  U    Pacific/Port_Moresby   airport  OurAirports            6
5   Port Moresby Jacksons International  Port Moresby  Papua New Guinea  POM   AYPY  -9.443380  147.220001    44.512195        10  U    Pacific/Port_Moresby   airport  OurAirports           35

Name length

There is a great variation in name length, ranging from 2 to 57 characters.

In [62]:
print(df['Name_length'].describe())
print("")
print(df.loc[df['Name'].map(len).idxmax(), ['Name','ICAO', 'Country', 'Name_length']])
print("")
print(df.loc[df['Name'].map(len).idxmin(), ['Name','ICAO', 'Country', 'Name_length']])
count    7695.000000
mean       15.073164
std         8.768695
min         2.000000
25%         8.000000
50%        13.000000
75%        20.000000
max        57.000000
Name: Name_length, dtype: float64

Name           Guarulhos - Governador André Franco Montoro International
ICAO                                                                SBGR
Country                                                           Brazil
Name_length                                                           57
Name: 2564, dtype: object

Name              Wa
ICAO            DGLW
Country        Ghana
Name_length        2
Name: 250, dtype: object
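
Since the Name_length column already exists, the extremes can also be looked up with nlargest and nsmallest instead of recomputing the lengths. The snippet below is a small sketch on a toy frame; the three names are taken from this notebook, and the variable names toy, longest and shortest are only illustrative.

```python
import pandas as pd

# Toy frame built from three names that appear in this notebook
toy = pd.DataFrame(
    {
        "Name": [
            "Wa",
            "Goroka",
            "Guarulhos - Governador André Franco Montoro International",
        ],
        "ICAO": ["DGLW", "AYGA", "SBGR"],
    }
)
toy["Name_length"] = toy["Name"].str.len()

# One row each for the longest and the shortest name
longest = toy.nlargest(1, "Name_length")
shortest = toy.nsmallest(1, "Name_length")
print(longest[["Name", "ICAO", "Name_length"]])
print(shortest[["Name", "ICAO", "Name_length"]])
```

Passing a larger n returns the top or bottom few names, which is handy for spotting outliers rather than a single extreme.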
In [25]:
df.describe()
Out[25]:
          Latitude    Longitude     Altitude  Name_length
count  7695.000000  7695.000000  7695.000000  7695.000000
mean     25.799139    -1.381275   309.799006    22.057960
std      28.399319    86.530619   496.649707     8.444517
min     -90.000000  -179.876999  -385.975610     5.000000
25%       6.864085   -78.965199    19.207317    16.000000
50%      34.085300     6.387140   107.317073    20.000000
75%      47.236500    56.036150   366.920732    27.000000
max      82.517799   179.951004  4412.195122    65.000000

Location of airports across the world

The ten countries with the most airports:

In [63]:
# Count airports per country and keep the ten largest counts
df.groupby('Country')[['Name']].count().sort_values(by='Name', ascending=False).head(10)
Out[63]:
                Name
Country
United States   1512
Canada           430
Australia        334
Russia           264
Brazil           264
Germany          248
China            241
France           217
United Kingdom   167
India            148
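
The same ranking can be obtained more directly with value_counts, which counts the rows per country and sorts the result in descending order. The snippet below is a sketch on an invented toy frame, not on the OpenFlights data; the names toy and top are only illustrative.

```python
import pandas as pd

# Invented toy rows: three airports in Canada, one in Brazil
toy = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Country": ["Canada", "Brazil", "Canada", "Canada"],
    }
)

# Counts per country, largest first; head(10) keeps at most ten rows
top = toy["Country"].value_counts().head(10)
print(top)
```

The result is a Series indexed by country rather than a one-column DataFrame, which is usually enough for a quick ranking or a bar plot.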