Interactive data visualization in Python: Geopandas#


Overview#

Within this notebook, we will cover:

  1. Browser-based interactive maps of point-based data

  2. Geopandas

Prerequisites#

| Concepts | Importance | Notes |
| --- | --- | --- |
| Cartopy Intro | Required | Projections and Features |
| Pandas | Required | Tabular Datasets |

  • Time to learn: 20 minutes


All of the graphics we have generated so far in the class have been static. In other words, they exist “as-is”: there is no way to interact with them. While this is fine, and even preferable, for traditional publication figures and websites, it would be nice to be able to produce dynamic figures that one can zoom and pan, similar to, say, Google Maps.


We have previously displayed real-time NYS Mesonet (NYSM) observations on a static map, using Pandas, Cartopy, and Matplotlib. Now, let’s make an interactive map. For that, we will use the Geopandas Python package.

Imports#

import matplotlib.pyplot as plt
import pandas as pd
from cartopy import crs as ccrs
from cartopy import feature as cfeature
import geopandas as gpd

Read in the most recent set of NYSM observations#

nysm_latest = pd.read_csv('https://www.atmos.albany.edu/products/nysm/nysm_latest.csv')
nysm_latest
| | station | time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | ... | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | lat | lon | elevation | name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ADDI | 2025-05-06 15:35:00 | 17.6 | 17.3 | 69.5 | 0.0 | 5.03 | 0.0 | 3.9 | 6.3 | ... | 13.8 | 12.6 | 11.3 | 0.59 | 0.37 | 0.43 | 42.040360 | -77.237260 | 507.6140 | Addison |
| 1 | ANDE | 2025-05-06 15:35:00 | 17.4 | 16.9 | 83.0 | 0.0 | 0.44 | 0.0 | 2.6 | 4.7 | ... | 14.4 | 13.3 | 12.0 | 0.28 | 0.15 | 0.11 | 42.182270 | -74.801390 | 518.2820 | Andes |
| 2 | BATA | 2025-05-06 15:35:00 | 16.8 | 16.5 | 72.3 | 0.0 | 18.23 | 0.0 | 2.9 | 4.5 | ... | 14.3 | 12.4 | 11.3 | 0.34 | 0.31 | 0.30 | 43.019940 | -78.135660 | 276.1200 | Batavia |
| 3 | BEAC | 2025-05-06 15:35:00 | 18.6 | 18.0 | 92.0 | 0.0 | 18.78 | 0.0 | 1.8 | 4.3 | ... | 15.7 | 14.9 | 14.7 | 0.46 | 0.38 | 0.37 | 41.528750 | -73.945270 | 90.1598 | Beacon |
| 4 | BELD | 2025-05-06 15:35:00 | 15.4 | 15.1 | 96.8 | 0.0 | 17.33 | 0.0 | 2.1 | 3.0 | ... | 13.7 | 13.4 | 11.9 | 0.53 | 0.46 | 0.40 | 42.223220 | -75.668520 | 470.3700 | Belden |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 122 | WFMB | 2025-05-06 15:35:00 | 16.7 | 16.0 | 79.9 | 0.0 | 1.60 | 0.0 | 1.6 | 3.6 | ... | 13.0 | 13.0 | 10.2 | 0.30 | 0.25 | 0.28 | 44.393236 | -73.858829 | 614.5990 | Whiteface Mountain Base |
| 123 | WGAT | 2025-05-06 15:35:00 | 16.4 | 16.3 | 83.0 | 0.0 | 0.92 | 0.0 | 0.8 | 2.2 | ... | 14.9 | 12.8 | 11.5 | 0.17 | 0.27 | 0.10 | 43.532408 | -75.158597 | 442.9660 | Woodgate |
| 124 | WHIT | 2025-05-06 15:35:00 | 18.1 | 17.3 | 91.1 | 0.0 | 10.47 | 0.0 | 1.2 | 2.0 | ... | 14.0 | 12.1 | 10.4 | 0.61 | 0.52 | 0.47 | 43.485073 | -73.423071 | 36.5638 | Whitehall |
| 125 | WOLC | 2025-05-06 15:35:00 | 19.9 | 19.5 | 75.0 | 0.0 | 2.21 | 0.0 | 3.9 | 5.8 | ... | 17.0 | 14.4 | 14.1 | 0.24 | 0.07 | 0.12 | 43.228680 | -76.842610 | 121.2190 | Wolcott |
| 126 | YORK | 2025-05-06 15:35:00 | 19.4 | 18.7 | 62.1 | 0.0 | 4.37 | 0.0 | 2.4 | 4.4 | ... | 14.9 | 13.4 | 12.4 | 0.29 | 0.30 | 0.38 | 42.855040 | -77.847760 | 177.9420 | York |

127 rows × 34 columns

Next, we make our Pandas Dataframe geo-aware by creating a Geopandas GeoDataFrame from it. A GeoDataFrame adds a geometry column, which may consist of shapes or points. The NYSM locations are points, so we build the geometry column with points_from_xy, passing x (longitude) first and y (latitude) second.

gdf = gpd.GeoDataFrame(nysm_latest,geometry=gpd.points_from_xy(nysm_latest.lon,nysm_latest.lat))
gdf
| | station | time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | ... | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | lat | lon | elevation | name | geometry |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ADDI | 2025-05-06 15:35:00 | 17.6 | 17.3 | 69.5 | 0.0 | 5.03 | 0.0 | 3.9 | 6.3 | ... | 12.6 | 11.3 | 0.59 | 0.37 | 0.43 | 42.040360 | -77.237260 | 507.6140 | Addison | POINT (-77.23726 42.04036) |
| 1 | ANDE | 2025-05-06 15:35:00 | 17.4 | 16.9 | 83.0 | 0.0 | 0.44 | 0.0 | 2.6 | 4.7 | ... | 13.3 | 12.0 | 0.28 | 0.15 | 0.11 | 42.182270 | -74.801390 | 518.2820 | Andes | POINT (-74.80139 42.18227) |
| 2 | BATA | 2025-05-06 15:35:00 | 16.8 | 16.5 | 72.3 | 0.0 | 18.23 | 0.0 | 2.9 | 4.5 | ... | 12.4 | 11.3 | 0.34 | 0.31 | 0.30 | 43.019940 | -78.135660 | 276.1200 | Batavia | POINT (-78.13566 43.01994) |
| 3 | BEAC | 2025-05-06 15:35:00 | 18.6 | 18.0 | 92.0 | 0.0 | 18.78 | 0.0 | 1.8 | 4.3 | ... | 14.9 | 14.7 | 0.46 | 0.38 | 0.37 | 41.528750 | -73.945270 | 90.1598 | Beacon | POINT (-73.94527 41.52875) |
| 4 | BELD | 2025-05-06 15:35:00 | 15.4 | 15.1 | 96.8 | 0.0 | 17.33 | 0.0 | 2.1 | 3.0 | ... | 13.4 | 11.9 | 0.53 | 0.46 | 0.40 | 42.223220 | -75.668520 | 470.3700 | Belden | POINT (-75.66852 42.22322) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 122 | WFMB | 2025-05-06 15:35:00 | 16.7 | 16.0 | 79.9 | 0.0 | 1.60 | 0.0 | 1.6 | 3.6 | ... | 13.0 | 10.2 | 0.30 | 0.25 | 0.28 | 44.393236 | -73.858829 | 614.5990 | Whiteface Mountain Base | POINT (-73.85883 44.39324) |
| 123 | WGAT | 2025-05-06 15:35:00 | 16.4 | 16.3 | 83.0 | 0.0 | 0.92 | 0.0 | 0.8 | 2.2 | ... | 12.8 | 11.5 | 0.17 | 0.27 | 0.10 | 43.532408 | -75.158597 | 442.9660 | Woodgate | POINT (-75.1586 43.53241) |
| 124 | WHIT | 2025-05-06 15:35:00 | 18.1 | 17.3 | 91.1 | 0.0 | 10.47 | 0.0 | 1.2 | 2.0 | ... | 12.1 | 10.4 | 0.61 | 0.52 | 0.47 | 43.485073 | -73.423071 | 36.5638 | Whitehall | POINT (-73.42307 43.48507) |
| 125 | WOLC | 2025-05-06 15:35:00 | 19.9 | 19.5 | 75.0 | 0.0 | 2.21 | 0.0 | 3.9 | 5.8 | ... | 14.4 | 14.1 | 0.24 | 0.07 | 0.12 | 43.228680 | -76.842610 | 121.2190 | Wolcott | POINT (-76.84261 43.22868) |
| 126 | YORK | 2025-05-06 15:35:00 | 19.4 | 18.7 | 62.1 | 0.0 | 4.37 | 0.0 | 2.4 | 4.4 | ... | 13.4 | 12.4 | 0.29 | 0.30 | 0.38 | 42.855040 | -77.847760 | 177.9420 | York | POINT (-77.84776 42.85504) |

127 rows × 35 columns

Note that geometry appears as a new column (Series).
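To see the new column in isolation, here is a minimal sketch on a hypothetical two-station table (the `stations` and `demo_gdf` names are illustrative only, with coordinates taken from the NYSM output above). It checks that x holds longitude and y holds latitude, and that no CRS has been attached yet.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical two-station table; coordinates taken from the NYSM output above
stations = pd.DataFrame(
    {
        "station": ["ADDI", "ANDE"],
        "lat": [42.04036, 42.18227],
        "lon": [-77.23726, -74.80139],
    }
)

# points_from_xy expects x (longitude) first, then y (latitude)
demo_gdf = gpd.GeoDataFrame(
    stations, geometry=gpd.points_from_xy(stations.lon, stations.lat)
)

print(type(demo_gdf.geometry).__name__)  # the geometry column is a GeoSeries
print(demo_gdf.geometry.x.tolist())      # longitudes
print(demo_gdf.geometry.y.tolist())      # latitudes
print(demo_gdf.crs)                      # None: no CRS has been assigned yet
```

Swapping the two arguments would still run without error, but it would place the stations at the wrong location, so it is worth checking the order once.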

We can interactively explore this Dataframe as a map in the browser!

gdf.explore()

Well … we have an interactive frame … and it looks like New York State … but where is the interactive map??

We still have a little more work to do:

While the points certainly look like latitudes and longitudes, Geopandas does not know that until we explicitly assign a coordinate reference system (CRS) to the Dataframe, and we need one before the points can be drawn on a map. One way is to assign a CRS by its EPSG code: in this case, EPSG 4326, which denotes WGS 84 latitude/longitude coordinates.

Note the arguments to set_crs:

  1. epsg = 4326: Assign the specified CRS

  2. inplace = True: The gdf object is updated without the need to assign a new dataframe object

  3. allow_override = True: If a CRS had previously been applied, override with the EPSG value specified.
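As a rough illustration of these arguments, the sketch below uses a single hypothetical point (the `pts`, `labeled`, and `web_mercator` names are made up for this example). It assumes a reasonably recent Geopandas, and contrasts `set_crs`, which only labels the coordinates, with `to_crs`, which transforms them.

```python
import geopandas as gpd

# Hypothetical single point, given as (lon, lat)
pts = gpd.GeoDataFrame(
    {"name": ["demo"]}, geometry=gpd.points_from_xy([-73.8], [42.7])
)
print(pts.crs)  # None: nothing assigned yet

# Without inplace=True, set_crs returns a new GeoDataFrame and leaves pts alone
labeled = pts.set_crs(epsg=4326)
print(labeled.crs.to_epsg())       # EPSG code of the assigned CRS
print(labeled.geometry.x.iloc[0])  # coordinates are untouched

# Assigning a different CRS to a frame that already has one is refused
# unless allow_override=True is passed
try:
    labeled.set_crs(epsg=3857)
    refused = False
except ValueError:
    refused = True
print("refused without allow_override:", refused)

# to_crs, by contrast, actually transforms the coordinates (here to metres)
web_mercator = labeled.to_crs(epsg=3857)
print(web_mercator.geometry.x.iloc[0])
```

In the notebook, `allow_override=True` mainly makes the cell safe to re-run after a CRS has already been set.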

gdf.set_crs(epsg=4326, inplace=True, allow_override=True)

Now, let’s try the explore function again!

gdf.explore()

We can pan, zoom, and hover over each point … hovering shows the values of all the columns in the Dataframe.

Now, let’s select just one column from the Dataframe and explore once again.

gdf.explore(column='temp_2m [degC]')

By default, passing in one column of numerical values will color-code each point by its value!

Explore the most recent worldwide METARs#

In this particular dataset, missing values appear as either -9999.00 or -9999.0. We will pass both as a list via the na_values argument in our call to read_csv.

worldMetar = pd.read_csv("https://www.atmos.albany.edu/products/metarCSV/world_metar_latest.csv", sep=r'\s+', na_values=['-9999.00','-9999.0'])
worldMetar.rename(columns = {'SLAT':'lat','SLON':'lon'},inplace=True)
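The effect of `na_values` can be checked offline. Here is a minimal sketch that reads a hypothetical three-row excerpt in the same whitespace-delimited layout (the station IDs `STNA`, `STNB`, `STNC` and all values are invented for illustration) from an in-memory string instead of the URL.

```python
import io
import pandas as pd

# Hypothetical excerpt in the same whitespace-delimited layout as the real file
sample = io.StringIO(
    "STN   SLAT      SLON      TMPC\n"
    "STNA  48.35     -122.65   11.1\n"
    "STNB  -9999.00  -9999.00  15.0\n"
    "STNC  45.92     -66.60    -9999.0\n"
)

# Both spellings of the missing-value flag are converted to NaN on read
metar = pd.read_csv(sample, sep=r"\s+", na_values=["-9999.00", "-9999.0"])
metar = metar.rename(columns={"SLAT": "lat", "SLON": "lon"})

print(metar)
print(metar.isna().sum())  # count of missing values per column
```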

Examine the dataframe

worldMetar
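| | STN | YYMMDD/HHMM | lat | lon | SELV | TMPC | DWPC | RELH | PMSL | SPED | GUMS | DRCT | P01M |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | NUW | 250506/1500 | 48.35 | -122.65 | 14.0 | 11.1 | 7.8 | 80.10 | 1020.3 | 0.00 | NaN | 0.0 | NaN |
| 1 | PAGA | 250506/1500 | 64.73 | -156.93 | 46.0 | 1.1 | 0.0 | 92.36 | 1005.8 | 0.00 | NaN | 0.0 | NaN |
| 2 | PAKN | 250506/1500 | 58.68 | -156.65 | 15.0 | 1.7 | 0.0 | 88.47 | 1000.6 | 3.09 | NaN | 360.0 | NaN |
| 3 | CACQ | 250506/1500 | 47.00 | -65.45 | 34.0 | 19.0 | 2.0 | 32.14 | 1028.1 | 6.69 | NaN | 230.0 | NaN |
| 4 | CAFC | 250506/1500 | 45.92 | -66.60 | 35.0 | 14.0 | 4.0 | 50.91 | 1029.5 | 3.60 | NaN | 160.0 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4335 | LHNY | 250506/1500 | NaN | NaN | NaN | 15.0 | 8.0 | 62.92 | NaN | 5.15 | NaN | 360.0 | NaN |
| 4336 | SUPU | 250506/1500 | NaN | NaN | NaN | 24.0 | 20.0 | 78.34 | NaN | 4.63 | NaN | 50.0 | NaN |
| 4337 | EGOM | 250506/1500 | NaN | NaN | NaN | 15.0 | 4.0 | 47.72 | NaN | 4.63 | NaN | 270.0 | NaN |
| 4338 | EGOP | 250506/1500 | NaN | NaN | NaN | 14.0 | 9.0 | 71.83 | NaN | 7.21 | NaN | 260.0 | NaN |
| 4339 | EGQA | 250506/1500 | NaN | NaN | NaN | 13.0 | 4.0 | 54.34 | NaN | 1.54 | NaN | 90.0 | NaN |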

4340 rows × 13 columns

Several stations at the end of the Dataframe have latitudes and longitudes (and elevations) of -9999.00 in the file, which were set to NaN when we read it. These are stations whose coordinates are not set in the underlying data source. We need to eliminate them in order for the geolocation to work properly.

# Drop stations with no coordinates, then renumber the rows from 0
worldMetar = worldMetar.dropna(subset=['lat','lon']).reset_index(drop=True)
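The `subset` argument matters here: only rows missing a coordinate are removed, while rows missing some other variable are kept. A small sketch on a hypothetical three-station table (the `obs` and `located` names and all values are invented) shows the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical table: station B has no coordinates, station C has no temperature
obs = pd.DataFrame(
    {
        "STN": ["A", "B", "C"],
        "lat": [48.35, np.nan, 45.92],
        "lon": [-122.65, np.nan, -66.60],
        "TMPC": [11.1, 15.0, np.nan],
    }
)

# Only lat/lon are checked, so station C survives despite its missing TMPC
located = obs.dropna(subset=["lat", "lon"]).reset_index(drop=True)
print(located)
```

Calling `dropna()` with no `subset` would also discard station C, which is usually not what we want for a map of station locations.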

Now, we can proceed with creating the Geopandas dataframe.

gdfWorldMetar = gpd.GeoDataFrame(worldMetar,geometry=gpd.points_from_xy(worldMetar.lon,worldMetar.lat))
gdfWorldMetar
| | STN | YYMMDD/HHMM | lat | lon | SELV | TMPC | DWPC | RELH | PMSL | SPED | GUMS | DRCT | P01M | geometry |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | NUW | 250506/1500 | 48.35 | -122.65 | 14.0 | 11.1 | 7.8 | 80.10 | 1020.3 | 0.00 | NaN | 0.0 | NaN | POINT (-122.65 48.35) |
| 1 | PAGA | 250506/1500 | 64.73 | -156.93 | 46.0 | 1.1 | 0.0 | 92.36 | 1005.8 | 0.00 | NaN | 0.0 | NaN | POINT (-156.93 64.73) |
| 2 | PAKN | 250506/1500 | 58.68 | -156.65 | 15.0 | 1.7 | 0.0 | 88.47 | 1000.6 | 3.09 | NaN | 360.0 | NaN | POINT (-156.65 58.68) |
| 3 | CACQ | 250506/1500 | 47.00 | -65.45 | 34.0 | 19.0 | 2.0 | 32.14 | 1028.1 | 6.69 | NaN | 230.0 | NaN | POINT (-65.45 47) |
| 4 | CAFC | 250506/1500 | 45.92 | -66.60 | 35.0 | 14.0 | 4.0 | 50.91 | 1029.5 | 3.60 | NaN | 160.0 | NaN | POINT (-66.6 45.92) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4239 | MGQZ | 250506/1500 | 14.87 | -91.50 | 2371.0 | 16.0 | 12.0 | 77.14 | NaN | 0.00 | NaN | 0.0 | NaN | POINT (-91.5 14.87) |
| 4240 | MGCB | 250506/1500 | 15.47 | -90.41 | 1323.0 | 24.0 | 20.0 | 78.34 | NaN | 4.12 | NaN | 210.0 | NaN | POINT (-90.41 15.47) |
| 4241 | MGZA | 250506/1500 | 14.96 | -89.54 | 193.0 | 30.0 | 23.0 | 66.15 | NaN | 0.00 | NaN | 0.0 | NaN | POINT (-89.54 14.96) |
| 4242 | MGES | 250506/1500 | 14.57 | -89.33 | 949.0 | 27.0 | 20.0 | 65.54 | NaN | 0.00 | NaN | 0.0 | NaN | POINT (-89.33 14.57) |
| 4243 | UWOO | 250506/1500 | 51.80 | 55.46 | 118.0 | 20.0 | 9.0 | 49.10 | NaN | 3.09 | NaN | 30.0 | NaN | POINT (55.46 51.8) |

4244 rows × 14 columns

gdfWorldMetar.set_crs(epsg=4326, inplace=True, allow_override=True)

gdfWorldMetar.explore()
Note: Most of the non-US data is not available until approximately 15 minutes past each hour. If you do not see many worldwide stations, try re-reading the data file after that point in the hour.

Create a color-coded map from one variable in the dataset.#

gdfWorldMetar.explore(column='TMPC')
Note that the color scale is unaffected by any location whose temperature value is missing: GeoPandas "knows" to exclude NaN from the range of valid values.

Plot hourly precipitation

gdfWorldMetar.explore(column='P01M')
With few exceptions, hourly precipitation is only provided by US stations. For this variable, a better visualization would exclude all the missing values. (In the US, a missing value denotes no precipitation, while a trace is reported as 0.00.)

gdfWorldMetar = gdfWorldMetar.dropna(subset=['P01M']).reset_index(drop=True)
gdfWorldMetar.explore(column='P01M')
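Note that the cell above overwrites gdfWorldMetar, so stations without a precipitation report are no longer available for later maps. One alternative, sketched here on hypothetical values (the `obs`, `reported`, `trace`, and `measurable` names are illustrative), is to keep the full table and store the filtered rows under a new name, which also makes it easy to separate trace reports from measurable amounts.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly precipitation values (mm); NaN = nothing reported
obs = pd.DataFrame(
    {"STN": ["A", "B", "C", "D"], "P01M": [np.nan, 0.0, 2.5, np.nan]}
)

# Keep the full table and put the filtered rows in a new variable
reported = obs[obs["P01M"].notna()].reset_index(drop=True)

trace = reported[reported["P01M"] == 0.0]      # trace amounts, coded as 0.00
measurable = reported[reported["P01M"] > 0.0]  # measurable precipitation

print(len(obs), len(reported), len(trace), len(measurable))
```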

References#

  1. GeoPandas

  2. EPSG