Interactive data visualization in Python: Geopandas


Overview

Within this notebook, we will cover:

  1. Browser-based interactive maps of point-based data

  2. Geopandas

Prerequisites

Concepts

Importance

Notes

Cartopy Intro

Required

Projections and Features

Pandas

Required

Tabular Datasets

  • Time to learn: 20 minutes


All of the graphics we have generated so far in the class have been static. In other words, they exist “as-is” … there is no way to interact with them. While this is fine, and even preferable, for traditional publication figures and websites, it would be nice to be able to produce dynamic figures … which one can zoom into/out of, pan around … similar to, say, Google Maps.


We have previously displayed tropical cyclone locations on a static map, using Cartopy, Matplotlib, and (in your HW1 assignment) Pandas. Now, let’s make an interactive map … for that, we will leverage the Geopandas Python package.

Imports

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from cartopy import crs as ccrs
from cartopy import feature as cfeature
import geopandas as gpd
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas
/tmp/ipykernel_4158155/2440184564.py:6: DeprecationWarning: Shapely 2.0 is installed, but because PyGEOS is also installed, GeoPandas still uses PyGEOS by default. However, starting with version 0.14, the default will switch to Shapely. To force to use Shapely 2.0 now, you can either uninstall PyGEOS or set the environment variable USE_PYGEOS=0. You can do this before starting the Python process, or in your code before importing geopandas:

import os
os.environ['USE_PYGEOS'] = '0'
import geopandas

In the next release, GeoPandas will switch to using Shapely by default, even if PyGEOS is installed. If you only have PyGEOS installed to get speed-ups, this switch should be smooth. However, if you are using PyGEOS directly (calling PyGEOS functions on geometries from GeoPandas), this will then stop working and you are encouraged to migrate from PyGEOS to Shapely 2.0 (https://shapely.readthedocs.io/en/latest/migration_pygeos.html).
  import geopandas as gpd
Note: The previous cell includes code lines that deal with the warning message you likely see above.

Read in the iBTrACS database

iBTrACS, aka the International Best Track Archive for Climate Stewardship, is a continually-updated database of worldwide tropical cyclones.

The iBTrACS datasets have over 100 columns. Let’s focus on just a subset of columns that have most relevance to the North Atlantic/Gulf of Mexico/Caribbean Sea basins.

keepCols = ['SEASON', 'NAME', 'ISO_TIME', 'LAT', 'LON', 'USA_STATUS', 'USA_WIND', 'USA_PRES']

Open a pandas Dataframe from a URL pointing to the North Atlantic iBTrACS dataset.

df = pd.read_csv("https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r00/access/csv/ibtracs.NA.list.v04r00.csv",low_memory=False, skiprows = [1], usecols=keepCols)
Note: We specified three additional options to Pandas' read_csv function. You can explore the full suite of options via the Pandas API documentation.
df
SEASON NAME ISO_TIME LAT LON USA_STATUS USA_WIND USA_PRES
0 1851 NOT_NAMED 1851-06-23 12:00:00 26.1000 -90.4000
1 1851 NOT_NAMED 1851-06-23 15:00:00 26.2001 -90.6999
2 1851 NOT_NAMED 1851-06-23 18:00:00 26.3000 -91.0000
3 1851 NOT_NAMED 1851-06-23 21:00:00 26.3999 -91.3001
4 1851 NOT_NAMED 1851-06-24 00:00:00 26.5000 -91.6000
... ... ... ... ... ... ... ... ...
126953 2023 SEAN 2023-10-14 12:00:00 15.8000 -43.9000 29 1007
126954 2023 SEAN 2023-10-14 15:00:00 16.0650 -44.2092 29 1007
126955 2023 SEAN 2023-10-14 18:00:00 16.3000 -44.5000 29 1007
126956 2023 SEAN 2023-10-14 21:00:00 16.5076 -44.7974 29 1007
126957 2023 SEAN 2023-10-15 00:00:00 16.7000 -45.1000 29 1008

126958 rows × 8 columns

Create a new Dataframe consisting only of tropical cyclone Franklin (2023).

dfFranklin = df.query('NAME == "FRANKLIN" & SEASON == 2023')
Note: Pandas query function allows for database-like queries on a Dataset. For more info, check out the documentation for the df.query function.
dfFranklin
SEASON NAME ISO_TIME LAT LON USA_STATUS USA_WIND USA_PRES
125903 2023 FRANKLIN 2023-08-18 06:00:00 11.6000 -53.5000 DB 15
125904 2023 FRANKLIN 2023-08-18 09:00:00 11.5900 -53.8225 DB 15
125905 2023 FRANKLIN 2023-08-18 12:00:00 11.6000 -54.2000 DB 15
125906 2023 FRANKLIN 2023-08-18 15:00:00 11.6425 -54.6700 DB 17
125907 2023 FRANKLIN 2023-08-18 18:00:00 11.7000 -55.2000 DB 20 1008
... ... ... ... ... ... ... ... ...
126015 2023 FRANKLIN 2023-09-01 06:00:00 37.4000 -57.9000 HU 70 983
126016 2023 FRANKLIN 2023-09-01 09:00:00 37.8122 -57.2549 HU 67 984
126017 2023 FRANKLIN 2023-09-01 12:00:00 38.2000 -56.6000 HU 65 985
126018 2023 FRANKLIN 2023-09-01 15:00:00 38.5994 -55.7943 HU 67 982
126019 2023 FRANKLIN 2023-09-01 18:00:00 39.0000 -54.9000 EX 70 979

117 rows × 8 columns

Make our Pandas Dataframe geo-aware. To do this, we create a Geopandas Dataframe. It adds a Geometry column, which may consist of shapes or points. The TC locations are points, so that’s what we’ll use to instantiate the Geometry column.

gdf = gpd.GeoDataFrame(dfFranklin,geometry=gpd.points_from_xy(dfFranklin.LON,dfFranklin.LAT))
gdf
SEASON NAME ISO_TIME LAT LON USA_STATUS USA_WIND USA_PRES geometry
125903 2023 FRANKLIN 2023-08-18 06:00:00 11.6000 -53.5000 DB 15 POINT (-53.50000 11.60000)
125904 2023 FRANKLIN 2023-08-18 09:00:00 11.5900 -53.8225 DB 15 POINT (-53.82250 11.59000)
125905 2023 FRANKLIN 2023-08-18 12:00:00 11.6000 -54.2000 DB 15 POINT (-54.20000 11.60000)
125906 2023 FRANKLIN 2023-08-18 15:00:00 11.6425 -54.6700 DB 17 POINT (-54.67000 11.64250)
125907 2023 FRANKLIN 2023-08-18 18:00:00 11.7000 -55.2000 DB 20 1008 POINT (-55.20000 11.70000)
... ... ... ... ... ... ... ... ... ...
126015 2023 FRANKLIN 2023-09-01 06:00:00 37.4000 -57.9000 HU 70 983 POINT (-57.90000 37.40000)
126016 2023 FRANKLIN 2023-09-01 09:00:00 37.8122 -57.2549 HU 67 984 POINT (-57.25490 37.81220)
126017 2023 FRANKLIN 2023-09-01 12:00:00 38.2000 -56.6000 HU 65 985 POINT (-56.60000 38.20000)
126018 2023 FRANKLIN 2023-09-01 15:00:00 38.5994 -55.7943 HU 67 982 POINT (-55.79430 38.59940)
126019 2023 FRANKLIN 2023-09-01 18:00:00 39.0000 -54.9000 EX 70 979 POINT (-54.90000 39.00000)

117 rows × 9 columns

Note that geometry appears as a new column (Series).

We can interactively explore this Dataframe as a map in the browser!

gdf.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook

Well … we have an interactive frame … and it looks like there’s a track in there … but where is the interactive map??

We still have a little more work to do:

While the points certainly look like latitude and longitudes, we need to explicitly assign a projection to the Dataframe before we can view it on a map. One way is to assign a coordinate reference system code, via EPSG … in this case, EPSG 4326.

Note the arguments to set_crs:

  1. epsg = 4326: Assign the specified CRS

  2. inplace = True: The gdf object is updated without the need to assign a new dataframe object

  3. allow_override = True: If a CRS had previously been applied, override with the EPSG value specified.

gdf.set_crs(epsg=4326, inplace=True, allow_override=True)
SEASON NAME ISO_TIME LAT LON USA_STATUS USA_WIND USA_PRES geometry
125903 2023 FRANKLIN 2023-08-18 06:00:00 11.6000 -53.5000 DB 15 POINT (-53.50000 11.60000)
125904 2023 FRANKLIN 2023-08-18 09:00:00 11.5900 -53.8225 DB 15 POINT (-53.82250 11.59000)
125905 2023 FRANKLIN 2023-08-18 12:00:00 11.6000 -54.2000 DB 15 POINT (-54.20000 11.60000)
125906 2023 FRANKLIN 2023-08-18 15:00:00 11.6425 -54.6700 DB 17 POINT (-54.67000 11.64250)
125907 2023 FRANKLIN 2023-08-18 18:00:00 11.7000 -55.2000 DB 20 1008 POINT (-55.20000 11.70000)
... ... ... ... ... ... ... ... ... ...
126015 2023 FRANKLIN 2023-09-01 06:00:00 37.4000 -57.9000 HU 70 983 POINT (-57.90000 37.40000)
126016 2023 FRANKLIN 2023-09-01 09:00:00 37.8122 -57.2549 HU 67 984 POINT (-57.25490 37.81220)
126017 2023 FRANKLIN 2023-09-01 12:00:00 38.2000 -56.6000 HU 65 985 POINT (-56.60000 38.20000)
126018 2023 FRANKLIN 2023-09-01 15:00:00 38.5994 -55.7943 HU 67 982 POINT (-55.79430 38.59940)
126019 2023 FRANKLIN 2023-09-01 18:00:00 39.0000 -54.9000 EX 70 979 POINT (-54.90000 39.00000)

117 rows × 9 columns

Now, let’s try the explore function again!

gdf.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook

We can pan, zoom, and hover over each point … hovering shows the values of all the columns in the Dataframe.

Now, let’s select just one column from the Dataframe and explore once again.

gdf.explore(column='USA_WIND')
Make this Notebook Trusted to load map: File -> Trust Notebook

Notice that the values aren’t in numerical order … they’re being treated as strings. Let’s explicitly set the USA_WIND column to be an integer (a 16-bit size is fine).

gdf['USA_WIND'] = gdf['USA_WIND'].astype('int16')
gdf.explore('USA_WIND')
Make this Notebook Trusted to load map: File -> Trust Notebook

By default, passing in one column of numerical values will color-code each value!

Things to try

On your own, create Jupyter notebooks and do the following:

  1. Select another storm from the historical record and plot its track.

  2. Instead of just one specific named storm, create a Dataframe including all storms from a particular year … or multiple years. Then plot the tracks of all storms from that year. Can you find a way to clearly show what track goes with each storm?

  3. Examine the Geopandas website and experiment with different visualizations, either from the iBTrACS database or another dataset of interest to you.