Interactive data visualization in Python: GeoPandas
Prerequisites
Concepts | Importance | Notes
---|---|---
Projections and Features | Required |
Tabular Datasets | Required |
Time to learn: 20 minutes
All of the graphics we have generated so far in the class have been static. In other words, they exist “as-is” … there is no way to interact with them. While this is fine, and even preferable, for traditional publication figures and websites, it would be nice to be able to produce dynamic figures … which one can zoom into/out of, pan around … similar to, say, Google Maps.
We have previously displayed tropical cyclone locations on a static map, using Cartopy, Matplotlib, and (in your HW1 assignment) Pandas. Now, let’s make an interactive map … for that, we will leverage the Geopandas Python package.
Imports
import os
os.environ['USE_PYGEOS'] = '0'  # must be set before GeoPandas is imported

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from cartopy import crs as ccrs
from cartopy import feature as cfeature
import geopandas as gpd
Read in the IBTrACS database
IBTrACS, aka the International Best Track Archive for Climate Stewardship, is a continually updated database of worldwide tropical cyclones.
The IBTrACS datasets have over 100 columns. Let’s focus on just a subset of columns that are most relevant to the North Atlantic/Gulf of Mexico/Caribbean Sea basins.
keepCols = ['SEASON', 'NAME', 'ISO_TIME', 'LAT', 'LON', 'USA_STATUS', 'USA_WIND', 'USA_PRES']
Open a Pandas DataFrame from a URL pointing to the North Atlantic IBTrACS dataset.
df = pd.read_csv("https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r00/access/csv/ibtracs.NA.list.v04r00.csv",low_memory=False, skiprows = [1], usecols=keepCols)
Note the additional arguments passed to the read_csv function. You can explore the full suite of options via the Pandas API documentation.
df
  | SEASON | NAME | ISO_TIME | LAT | LON | USA_STATUS | USA_WIND | USA_PRES
---|---|---|---|---|---|---|---|---|
0 | 1851 | NOT_NAMED | 1851-06-23 12:00:00 | 26.1000 | -90.4000 | |||
1 | 1851 | NOT_NAMED | 1851-06-23 15:00:00 | 26.2001 | -90.6999 | |||
2 | 1851 | NOT_NAMED | 1851-06-23 18:00:00 | 26.3000 | -91.0000 | |||
3 | 1851 | NOT_NAMED | 1851-06-23 21:00:00 | 26.3999 | -91.3001 | |||
4 | 1851 | NOT_NAMED | 1851-06-24 00:00:00 | 26.5000 | -91.6000 | |||
... | ... | ... | ... | ... | ... | ... | ... | ... |
126953 | 2023 | SEAN | 2023-10-14 12:00:00 | 15.8000 | -43.9000 | 29 | 1007 | |
126954 | 2023 | SEAN | 2023-10-14 15:00:00 | 16.0650 | -44.2092 | 29 | 1007 | |
126955 | 2023 | SEAN | 2023-10-14 18:00:00 | 16.3000 | -44.5000 | 29 | 1007 | |
126956 | 2023 | SEAN | 2023-10-14 21:00:00 | 16.5076 | -44.7974 | 29 | 1007 | |
126957 | 2023 | SEAN | 2023-10-15 00:00:00 | 16.7000 | -45.1000 | 29 | 1008 |
126958 rows × 8 columns
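As an optional aside, here’s a minimal sketch of layering one more read_csv option onto the call above: parse_dates converts the ISO_TIME strings into true datetimes at read time. The df_times name is just a hypothetical, throwaway variable, so the df used in the rest of the lesson is untouched.
# Optional sketch: same read as above, but parse ISO_TIME as datetimes.
# df_times is a hypothetical name; the lesson's df is left unchanged.
url = ("https://www.ncei.noaa.gov/data/international-best-track-archive-for-"
       "climate-stewardship-ibtracs/v04r00/access/csv/ibtracs.NA.list.v04r00.csv")
df_times = pd.read_csv(url, low_memory=False, skiprows=[1], usecols=keepCols,
                       parse_dates=['ISO_TIME'])
df_times.dtypes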
Create a new DataFrame consisting only of tropical cyclone Franklin (2023).
dfFranklin = df.query('NAME == "FRANKLIN" & SEASON == 2023')
The query function allows for database-like queries on a DataFrame. For more info, check out the documentation for the df.query function.
dfFranklin
  | SEASON | NAME | ISO_TIME | LAT | LON | USA_STATUS | USA_WIND | USA_PRES
---|---|---|---|---|---|---|---|---|
125903 | 2023 | FRANKLIN | 2023-08-18 06:00:00 | 11.6000 | -53.5000 | DB | 15 | |
125904 | 2023 | FRANKLIN | 2023-08-18 09:00:00 | 11.5900 | -53.8225 | DB | 15 | |
125905 | 2023 | FRANKLIN | 2023-08-18 12:00:00 | 11.6000 | -54.2000 | DB | 15 | |
125906 | 2023 | FRANKLIN | 2023-08-18 15:00:00 | 11.6425 | -54.6700 | DB | 17 | |
125907 | 2023 | FRANKLIN | 2023-08-18 18:00:00 | 11.7000 | -55.2000 | DB | 20 | 1008 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
126015 | 2023 | FRANKLIN | 2023-09-01 06:00:00 | 37.4000 | -57.9000 | HU | 70 | 983 |
126016 | 2023 | FRANKLIN | 2023-09-01 09:00:00 | 37.8122 | -57.2549 | HU | 67 | 984 |
126017 | 2023 | FRANKLIN | 2023-09-01 12:00:00 | 38.2000 | -56.6000 | HU | 65 | 985 |
126018 | 2023 | FRANKLIN | 2023-09-01 15:00:00 | 38.5994 | -55.7943 | HU | 67 | 982 |
126019 | 2023 | FRANKLIN | 2023-09-01 18:00:00 | 39.0000 | -54.9000 | EX | 70 | 979 |
117 rows × 8 columns
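As a side note, query can also pull values from Python variables by prefixing them with @ inside the query string. Here’s a small sketch; the storm name and season are arbitrary placeholders, not part of the lesson’s workflow.
# Sketch of df.query with @-prefixed Python variables (arbitrary example values).
storm_name = "IDALIA"
storm_season = 2023
dfStorm = df.query('NAME == @storm_name & SEASON == @storm_season')
dfStorm.head()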
Make our Pandas DataFrame geo-aware. To do this, we create a GeoPandas GeoDataFrame. It adds a geometry column, which may consist of shapes or points. The TC locations are points, so that’s what we’ll use to populate the geometry column.
gdf = gpd.GeoDataFrame(dfFranklin,geometry=gpd.points_from_xy(dfFranklin.LON,dfFranklin.LAT))
gdf
  | SEASON | NAME | ISO_TIME | LAT | LON | USA_STATUS | USA_WIND | USA_PRES | geometry
---|---|---|---|---|---|---|---|---|---|
125903 | 2023 | FRANKLIN | 2023-08-18 06:00:00 | 11.6000 | -53.5000 | DB | 15 | POINT (-53.50000 11.60000) | |
125904 | 2023 | FRANKLIN | 2023-08-18 09:00:00 | 11.5900 | -53.8225 | DB | 15 | POINT (-53.82250 11.59000) | |
125905 | 2023 | FRANKLIN | 2023-08-18 12:00:00 | 11.6000 | -54.2000 | DB | 15 | POINT (-54.20000 11.60000) | |
125906 | 2023 | FRANKLIN | 2023-08-18 15:00:00 | 11.6425 | -54.6700 | DB | 17 | POINT (-54.67000 11.64250) | |
125907 | 2023 | FRANKLIN | 2023-08-18 18:00:00 | 11.7000 | -55.2000 | DB | 20 | 1008 | POINT (-55.20000 11.70000) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
126015 | 2023 | FRANKLIN | 2023-09-01 06:00:00 | 37.4000 | -57.9000 | HU | 70 | 983 | POINT (-57.90000 37.40000) |
126016 | 2023 | FRANKLIN | 2023-09-01 09:00:00 | 37.8122 | -57.2549 | HU | 67 | 984 | POINT (-57.25490 37.81220) |
126017 | 2023 | FRANKLIN | 2023-09-01 12:00:00 | 38.2000 | -56.6000 | HU | 65 | 985 | POINT (-56.60000 38.20000) |
126018 | 2023 | FRANKLIN | 2023-09-01 15:00:00 | 38.5994 | -55.7943 | HU | 67 | 982 | POINT (-55.79430 38.59940) |
126019 | 2023 | FRANKLIN | 2023-09-01 18:00:00 | 39.0000 | -54.9000 | EX | 70 | 979 | POINT (-54.90000 39.00000) |
117 rows × 9 columns
Note that geometry appears as a new column (Series).
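If you’d like to poke at the new column, here’s a quick optional sketch using two standard GeoPandas attributes: the geometry Series itself, and the overall bounding box of the points.
# Optional checks on the new geometry column.
gdf.geometry.head()   # the Series of shapely Point objects
gdf.total_bounds      # bounding box as [minx, miny, maxx, maxy]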
We can interactively explore this DataFrame as a map in the browser!
gdf.explore()
Well … we have an interactive frame … and it looks like there’s a track in there … but where is the interactive map??
We still have a little more work to do:
While the points certainly look like latitudes and longitudes, we need to explicitly assign a coordinate reference system (CRS) to the DataFrame before we can view it on a map. One way is to assign a CRS via its EPSG code … in this case, EPSG 4326.
Note the arguments to set_crs:
- epsg=4326: Assign the specified CRS
- inplace=True: The gdf object is updated in place, without the need to assign a new DataFrame object
- allow_override=True: If a CRS had previously been applied, override it with the EPSG value specified
gdf.set_crs(epsg=4326, inplace=True, allow_override=True)
  | SEASON | NAME | ISO_TIME | LAT | LON | USA_STATUS | USA_WIND | USA_PRES | geometry
---|---|---|---|---|---|---|---|---|---|
125903 | 2023 | FRANKLIN | 2023-08-18 06:00:00 | 11.6000 | -53.5000 | DB | 15 | POINT (-53.50000 11.60000) | |
125904 | 2023 | FRANKLIN | 2023-08-18 09:00:00 | 11.5900 | -53.8225 | DB | 15 | POINT (-53.82250 11.59000) | |
125905 | 2023 | FRANKLIN | 2023-08-18 12:00:00 | 11.6000 | -54.2000 | DB | 15 | POINT (-54.20000 11.60000) | |
125906 | 2023 | FRANKLIN | 2023-08-18 15:00:00 | 11.6425 | -54.6700 | DB | 17 | POINT (-54.67000 11.64250) | |
125907 | 2023 | FRANKLIN | 2023-08-18 18:00:00 | 11.7000 | -55.2000 | DB | 20 | 1008 | POINT (-55.20000 11.70000) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
126015 | 2023 | FRANKLIN | 2023-09-01 06:00:00 | 37.4000 | -57.9000 | HU | 70 | 983 | POINT (-57.90000 37.40000) |
126016 | 2023 | FRANKLIN | 2023-09-01 09:00:00 | 37.8122 | -57.2549 | HU | 67 | 984 | POINT (-57.25490 37.81220) |
126017 | 2023 | FRANKLIN | 2023-09-01 12:00:00 | 38.2000 | -56.6000 | HU | 65 | 985 | POINT (-56.60000 38.20000) |
126018 | 2023 | FRANKLIN | 2023-09-01 15:00:00 | 38.5994 | -55.7943 | HU | 67 | 982 | POINT (-55.79430 38.59940) |
126019 | 2023 | FRANKLIN | 2023-09-01 18:00:00 | 39.0000 | -54.9000 | EX | 70 | 979 | POINT (-54.90000 39.00000) |
117 rows × 9 columns
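We can confirm the assignment by inspecting the crs attribute. As an aside, the same CRS could also have been supplied when the GeoDataFrame was first constructed; the sketch below shows that variant (gdf2 is just a hypothetical, throwaway name).
# Confirm the CRS now attached to the GeoDataFrame.
print(gdf.crs)

# Equivalent construction-time assignment (gdf2 is a throwaway name).
gdf2 = gpd.GeoDataFrame(
    dfFranklin,
    geometry=gpd.points_from_xy(dfFranklin.LON, dfFranklin.LAT),
    crs="EPSG:4326",
)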
Now, let’s try the explore function again!
gdf.explore()
We can pan, zoom, and hover over each point … hovering shows the values of all the columns in the DataFrame.
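If the full-column tooltip feels noisy, explore also accepts tooltip (and popup) keyword arguments that take a list of column names. Here’s a minimal sketch; the choice of columns is arbitrary.
# Trim the hover tooltip to a few columns (the column choice here is arbitrary);
# popup=True shows the full record when a point is clicked.
gdf.explore(tooltip=['NAME', 'ISO_TIME', 'USA_WIND'], popup=True)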
Now, let’s select just one column from the Dataframe and explore once again.
gdf.explore(column='USA_WIND')
Notice that the values aren’t in numerical order … they’re being treated as strings. Let’s explicitly set the USA_WIND column to be an integer (a 16-bit size is fine).
gdf['USA_WIND'] = gdf['USA_WIND'].astype('int16')
gdf.explore('USA_WIND')
By default, passing in one column of numerical values will color-code each value!
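The look of the color-coding can be adjusted as well: explore accepts a Matplotlib colormap name via cmap and a legend toggle via legend. A quick sketch, with an arbitrary colormap choice:
# Style the color-coded track: arbitrary colormap, legend left on.
gdf.explore(column='USA_WIND', cmap='plasma', legend=True)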
Things to try
On your own, create Jupyter notebooks and do the following:
1. Select another storm from the historical record and plot its track.
2. Instead of just one specific named storm, create a DataFrame including all storms from a particular year … or multiple years. Then plot the tracks of all storms from that year. Can you find a way to clearly show which track goes with each storm? (A starting sketch follows this list.)
3. Examine the GeoPandas website and experiment with different visualizations, either from the IBTrACS database or another dataset of interest to you.
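As a nudge for the second item, here is a minimal sketch that assumes an arbitrarily chosen season; coloring by the NAME column is one simple way to tell the tracks apart.
# A starting point for exercise 2: all storms from one (arbitrary) season,
# color-coded by storm name so each track is distinguishable.
season = 2020
dfYear = df.query('SEASON == @season')
gdfYear = gpd.GeoDataFrame(
    dfYear,
    geometry=gpd.points_from_xy(dfYear.LON, dfYear.LAT),
    crs="EPSG:4326",
)
gdfYear.explore(column='NAME')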