Interactive data visualization in Python: Geopandas


Overview

Within this notebook, we will cover:

  1. Browser-based interactive maps of point-based data

  2. Geopandas

Prerequisites

Concepts

Importance

Notes

Cartopy Intro

Required

Projections and Features

Pandas

Required

Tabular Datasets

  • Time to learn: 20 minutes


All of the graphics we have generated so far in the class have been static. In other words, they exist “as-is” … there is no way to interact with them. While this is fine, and even preferable, for traditional publication figures and websites, it would be nice to be able to produce dynamic figures … which one can zoom into/out of, pan around … similar to, say, Google Maps.


We have previously displayed real-time NYS Mesonet observations on a static map, using Pandas, Cartopy, and Matplotlib. Now, let’s make an interactive map … for that, we will leverage the Geopandas Python package.

Imports

import matplotlib.pyplot as plt
import pandas as pd
from cartopy import crs as ccrs
from cartopy import feature as cfeature
import geopandas as gpd

Read in the most recent set of NYSM observations

nysm_latest = pd.read_csv('https://www.atmos.albany.edu/products/nysm/nysm_latest.csv')
nysm_latest
station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] ... soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3] lat lon elevation name
0 ADDI 2024-04-30 01:15:00 17.9 19.4 81.0 0.0 0.0 0.0 0.1 0.5 ... 13.7 10.2 8.9 0.52 0.45 0.44 42.040360 -77.237260 507.6140 Addison
1 ANDE 2024-04-30 01:15:00 18.7 19.4 68.2 0.0 0.0 0.0 0.2 1.2 ... 16.9 12.4 9.5 0.22 0.11 0.09 42.182270 -74.801390 518.2820 Andes
2 BATA 2024-04-30 01:15:00 12.0 12.4 83.0 0.0 0.0 0.0 2.2 2.9 ... 13.5 11.2 9.7 0.27 0.25 0.27 43.019940 -78.135660 276.1200 Batavia
3 BEAC 2024-04-30 01:15:00 18.7 18.6 66.8 0.0 0.0 0.0 0.8 1.5 ... 16.3 13.5 12.1 0.29 0.24 0.19 41.528750 -73.945270 90.1598 Beacon
4 BELD 2024-04-30 01:15:00 20.6 21.6 58.8 0.0 0.0 0.0 0.0 0.3 ... 12.4 10.3 8.6 0.32 0.35 0.40 42.223220 -75.668520 470.3700 Belden
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 WFMB 2024-04-30 01:15:00 4.7 5.5 89.2 0.0 0.0 0.0 0.7 1.6 ... 10.8 11.6 8.0 0.28 0.23 0.26 44.393236 -73.858829 614.5990 Whiteface Mountain Base
122 WGAT 2024-04-30 01:15:00 13.9 14.4 96.1 0.0 0.0 0.0 0.3 1.3 ... 14.1 11.7 9.0 0.17 0.28 0.10 43.532408 -75.158597 442.9660 Woodgate
123 WHIT 2024-04-30 01:15:00 10.6 10.8 67.3 0.0 0.0 0.0 1.9 3.3 ... 12.1 10.6 8.5 0.40 0.48 0.45 43.485073 -73.423071 36.5638 Whitehall
124 WOLC 2024-04-30 01:15:00 11.4 11.6 84.7 0.0 0.0 0.0 1.8 2.3 ... 15.3 14.1 12.3 0.15 0.04 0.10 43.228680 -76.842610 121.2190 Wolcott
125 YORK 2024-04-30 01:15:00 14.3 14.5 80.6 0.0 0.0 0.0 1.1 1.4 ... 13.0 11.3 9.9 0.25 0.28 0.33 42.855040 -77.847760 177.9420 York

126 rows × 34 columns

Make our Pandas Dataframe geo-aware. To do this, we create a Geopandas Dataframe. It adds a Geometry column, which may consist of shapes or points. The NYSM locations are points, so that’s what we’ll use to instantiate the Geometry column.

gdf = gpd.GeoDataFrame(nysm_latest,geometry=gpd.points_from_xy(nysm_latest.lon,nysm_latest.lat))
gdf
station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] ... soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3] lat lon elevation name geometry
0 ADDI 2024-04-30 01:15:00 17.9 19.4 81.0 0.0 0.0 0.0 0.1 0.5 ... 10.2 8.9 0.52 0.45 0.44 42.040360 -77.237260 507.6140 Addison POINT (-77.23726 42.04036)
1 ANDE 2024-04-30 01:15:00 18.7 19.4 68.2 0.0 0.0 0.0 0.2 1.2 ... 12.4 9.5 0.22 0.11 0.09 42.182270 -74.801390 518.2820 Andes POINT (-74.80139 42.18227)
2 BATA 2024-04-30 01:15:00 12.0 12.4 83.0 0.0 0.0 0.0 2.2 2.9 ... 11.2 9.7 0.27 0.25 0.27 43.019940 -78.135660 276.1200 Batavia POINT (-78.13566 43.01994)
3 BEAC 2024-04-30 01:15:00 18.7 18.6 66.8 0.0 0.0 0.0 0.8 1.5 ... 13.5 12.1 0.29 0.24 0.19 41.528750 -73.945270 90.1598 Beacon POINT (-73.94527 41.52875)
4 BELD 2024-04-30 01:15:00 20.6 21.6 58.8 0.0 0.0 0.0 0.0 0.3 ... 10.3 8.6 0.32 0.35 0.40 42.223220 -75.668520 470.3700 Belden POINT (-75.66852 42.22322)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 WFMB 2024-04-30 01:15:00 4.7 5.5 89.2 0.0 0.0 0.0 0.7 1.6 ... 11.6 8.0 0.28 0.23 0.26 44.393236 -73.858829 614.5990 Whiteface Mountain Base POINT (-73.85883 44.39324)
122 WGAT 2024-04-30 01:15:00 13.9 14.4 96.1 0.0 0.0 0.0 0.3 1.3 ... 11.7 9.0 0.17 0.28 0.10 43.532408 -75.158597 442.9660 Woodgate POINT (-75.15860 43.53241)
123 WHIT 2024-04-30 01:15:00 10.6 10.8 67.3 0.0 0.0 0.0 1.9 3.3 ... 10.6 8.5 0.40 0.48 0.45 43.485073 -73.423071 36.5638 Whitehall POINT (-73.42307 43.48507)
124 WOLC 2024-04-30 01:15:00 11.4 11.6 84.7 0.0 0.0 0.0 1.8 2.3 ... 14.1 12.3 0.15 0.04 0.10 43.228680 -76.842610 121.2190 Wolcott POINT (-76.84261 43.22868)
125 YORK 2024-04-30 01:15:00 14.3 14.5 80.6 0.0 0.0 0.0 1.1 1.4 ... 11.3 9.9 0.25 0.28 0.33 42.855040 -77.847760 177.9420 York POINT (-77.84776 42.85504)

126 rows × 35 columns

Note that geometry appears as a new column (Series).

We can interactively explore this Dataframe as a map in the browser!

gdf.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook

Well … we have an interactive frame … and it looks like New York State … but where is the interactive map??

We still have a little more work to do:

While the points certainly look like latitude and longitudes, we need to explicitly assign a projection to the Dataframe before we can view it on a map. One way is to assign a coordinate reference system code, via EPSG … in this case, EPSG 4326.

Note the arguments to set_crs:

  1. epsg = 4326: Assign the specified CRS

  2. inplace = True: The gdf object is updated without the need to assign a new dataframe object

  3. allow_override = True: If a CRS had previously been applied, override with the EPSG value specified.

gdf.set_crs(epsg=4326, inplace=True, allow_override=True)
station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] ... soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3] lat lon elevation name geometry
0 ADDI 2024-04-30 01:15:00 17.9 19.4 81.0 0.0 0.0 0.0 0.1 0.5 ... 10.2 8.9 0.52 0.45 0.44 42.040360 -77.237260 507.6140 Addison POINT (-77.23726 42.04036)
1 ANDE 2024-04-30 01:15:00 18.7 19.4 68.2 0.0 0.0 0.0 0.2 1.2 ... 12.4 9.5 0.22 0.11 0.09 42.182270 -74.801390 518.2820 Andes POINT (-74.80139 42.18227)
2 BATA 2024-04-30 01:15:00 12.0 12.4 83.0 0.0 0.0 0.0 2.2 2.9 ... 11.2 9.7 0.27 0.25 0.27 43.019940 -78.135660 276.1200 Batavia POINT (-78.13566 43.01994)
3 BEAC 2024-04-30 01:15:00 18.7 18.6 66.8 0.0 0.0 0.0 0.8 1.5 ... 13.5 12.1 0.29 0.24 0.19 41.528750 -73.945270 90.1598 Beacon POINT (-73.94527 41.52875)
4 BELD 2024-04-30 01:15:00 20.6 21.6 58.8 0.0 0.0 0.0 0.0 0.3 ... 10.3 8.6 0.32 0.35 0.40 42.223220 -75.668520 470.3700 Belden POINT (-75.66852 42.22322)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 WFMB 2024-04-30 01:15:00 4.7 5.5 89.2 0.0 0.0 0.0 0.7 1.6 ... 11.6 8.0 0.28 0.23 0.26 44.393236 -73.858829 614.5990 Whiteface Mountain Base POINT (-73.85883 44.39324)
122 WGAT 2024-04-30 01:15:00 13.9 14.4 96.1 0.0 0.0 0.0 0.3 1.3 ... 11.7 9.0 0.17 0.28 0.10 43.532408 -75.158597 442.9660 Woodgate POINT (-75.15860 43.53241)
123 WHIT 2024-04-30 01:15:00 10.6 10.8 67.3 0.0 0.0 0.0 1.9 3.3 ... 10.6 8.5 0.40 0.48 0.45 43.485073 -73.423071 36.5638 Whitehall POINT (-73.42307 43.48507)
124 WOLC 2024-04-30 01:15:00 11.4 11.6 84.7 0.0 0.0 0.0 1.8 2.3 ... 14.1 12.3 0.15 0.04 0.10 43.228680 -76.842610 121.2190 Wolcott POINT (-76.84261 43.22868)
125 YORK 2024-04-30 01:15:00 14.3 14.5 80.6 0.0 0.0 0.0 1.1 1.4 ... 11.3 9.9 0.25 0.28 0.33 42.855040 -77.847760 177.9420 York POINT (-77.84776 42.85504)

126 rows × 35 columns

Now, let’s try the explore function again!

gdf.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook

We can pan, zoom, and hover over each point … hovering shows the values of all the columns in the Dataframe.

Now, let’s select just one column from the Dataframe and explore once again.

gdf.explore(column='temp_2m [degC]')
Make this Notebook Trusted to load map: File -> Trust Notebook

By default, passing in one column of numerical values will color-code each value!

Explore the most recent worldwide METARs

In this particular dataset, missing values are either -9999.00 or -9999.0. We will pass these as a list via the na_values argument in our call to read_csv
worldMetar = pd.read_csv("https://www.atmos.albany.edu/products/metarCSV/world_metar_latest.csv", sep='\s+',na_values=['9999.00','-9999.0'])
worldMetar.rename(columns = {'SLAT':'lat','SLON':'lon'},inplace=True)

Examine the dataframe

worldMetar
STN YYMMDD/HHMM lat lon SELV TMPC DWPC RELH PMSL SPED GUMS DRCT P01M
0 DYS 240430/0100 32.43 -99.85 545.0 27.7 10.4 33.93 1004.7 6.18 NaN 120.0 NaN
1 NUW 240430/0100 48.35 -122.65 14.0 10.6 3.3 60.60 1018.7 4.63 NaN 220.0 NaN
2 NYL 240430/0100 32.65 -114.62 65.0 33.3 1.7 13.49 1005.4 5.15 NaN 220.0 NaN
3 PALU 240430/0100 68.88 -166.13 3.0 -7.6 -9.5 86.23 1025.4 6.69 NaN 90.0 NaN
4 PATC 240430/0100 65.57 -167.92 83.0 -6.8 -7.1 97.72 1021.5 10.30 NaN 20.0 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
4216 00U 240430/0100 NaN NaN NaN 17.0 0.0 31.56 NaN 2.57 NaN 150.0 NaN
4217 10U 240430/0100 NaN NaN NaN 10.0 -7.0 29.52 NaN 9.78 12.87 310.0 NaN
4218 UBBY 240430/0100 NaN NaN NaN 15.0 11.0 76.98 NaN 4.12 NaN 100.0 NaN
4219 8V7 240430/0100 NaN NaN NaN 22.0 -3.0 18.55 NaN 2.06 NaN 150.0 NaN
4220 MZJ 240430/0100 NaN NaN NaN 31.0 -6.0 8.70 NaN 4.12 NaN 310.0 NaN

4221 rows × 13 columns

We have several stations at the end of the Dataframe whose latitudes and longitudes (and elevations) are -9999.00 and thus set to NaN when we created the Dataframe. These denote stations whose coordinates are not set in the underlying data source. We need to eliminate them in order for the geolocation to work properly.

lat, lon = worldMetar.lat, worldMetar.lon
worldMetar = worldMetar.dropna(subset=['lat','lon']).reset_index(drop=True)

Now, we can proceed with creating the Geopandas dataframe.

gdfWorldMetar = gpd.GeoDataFrame(worldMetar,geometry=gpd.points_from_xy(worldMetar.lon,worldMetar.lat))
gdfWorldMetar
STN YYMMDD/HHMM lat lon SELV TMPC DWPC RELH PMSL SPED GUMS DRCT P01M geometry
0 DYS 240430/0100 32.43 -99.85 545.0 27.7 10.4 33.93 1004.7 6.18 NaN 120.0 NaN POINT (-99.85000 32.43000)
1 NUW 240430/0100 48.35 -122.65 14.0 10.6 3.3 60.60 1018.7 4.63 NaN 220.0 NaN POINT (-122.65000 48.35000)
2 NYL 240430/0100 32.65 -114.62 65.0 33.3 1.7 13.49 1005.4 5.15 NaN 220.0 NaN POINT (-114.62000 32.65000)
3 PALU 240430/0100 68.88 -166.13 3.0 -7.6 -9.5 86.23 1025.4 6.69 NaN 90.0 NaN POINT (-166.13000 68.88000)
4 PATC 240430/0100 65.57 -167.92 83.0 -6.8 -7.1 97.72 1021.5 10.30 NaN 20.0 NaN POINT (-167.92000 65.57000)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4118 GVNP 240430/0100 14.95 -23.48 70.0 23.0 18.0 73.44 NaN 11.84 NaN 20.0 NaN POINT (-23.48000 14.95000)
4119 GVSV 240430/0100 16.83 -25.07 20.0 23.0 17.0 68.94 NaN 13.38 NaN 50.0 NaN POINT (-25.07000 16.83000)
4120 OPST 240430/0100 32.53 74.37 247.0 14.0 11.0 82.12 NaN 0.00 NaN 0.0 NaN POINT (74.37000 32.53000)
4121 EDAH 240430/0100 53.87 14.15 28.0 11.0 6.0 71.26 NaN 4.12 NaN 50.0 NaN POINT (14.15000 53.87000)
4122 FYOG 240430/0100 -28.58 44.58 4.0 16.0 13.0 82.37 NaN 2.06 NaN 90.0 NaN POINT (44.58000 -28.58000)

4123 rows × 14 columns

gdfWorldMetar.set_crs(epsg=4326, inplace=True, allow_override=True)
STN YYMMDD/HHMM lat lon SELV TMPC DWPC RELH PMSL SPED GUMS DRCT P01M geometry
0 DYS 240430/0100 32.43 -99.85 545.0 27.7 10.4 33.93 1004.7 6.18 NaN 120.0 NaN POINT (-99.85000 32.43000)
1 NUW 240430/0100 48.35 -122.65 14.0 10.6 3.3 60.60 1018.7 4.63 NaN 220.0 NaN POINT (-122.65000 48.35000)
2 NYL 240430/0100 32.65 -114.62 65.0 33.3 1.7 13.49 1005.4 5.15 NaN 220.0 NaN POINT (-114.62000 32.65000)
3 PALU 240430/0100 68.88 -166.13 3.0 -7.6 -9.5 86.23 1025.4 6.69 NaN 90.0 NaN POINT (-166.13000 68.88000)
4 PATC 240430/0100 65.57 -167.92 83.0 -6.8 -7.1 97.72 1021.5 10.30 NaN 20.0 NaN POINT (-167.92000 65.57000)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4118 GVNP 240430/0100 14.95 -23.48 70.0 23.0 18.0 73.44 NaN 11.84 NaN 20.0 NaN POINT (-23.48000 14.95000)
4119 GVSV 240430/0100 16.83 -25.07 20.0 23.0 17.0 68.94 NaN 13.38 NaN 50.0 NaN POINT (-25.07000 16.83000)
4120 OPST 240430/0100 32.53 74.37 247.0 14.0 11.0 82.12 NaN 0.00 NaN 0.0 NaN POINT (74.37000 32.53000)
4121 EDAH 240430/0100 53.87 14.15 28.0 11.0 6.0 71.26 NaN 4.12 NaN 50.0 NaN POINT (14.15000 53.87000)
4122 FYOG 240430/0100 -28.58 44.58 4.0 16.0 13.0 82.37 NaN 2.06 NaN 90.0 NaN POINT (44.58000 -28.58000)

4123 rows × 14 columns

gdfWorldMetar.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook
Note: Most of the non-US data is not available until approximately 15 minutes past each hour. If you do not see many worldwide stations, try re-reading the data file after that point in the hour.
Exercise: Create a color-coded map from one variable in the dataset.
# Write your code here
gdfWorldMetar.explore(column='TMPC')
Make this Notebook Trusted to load map: File -> Trust Notebook
Note that the color scale is unaffected by any location whose temperature value was missing ... GeoPandas "knows" to exlucde NaN from the range of valid values.

Plot hourly precipitation

gdfWorldMetar.explore(column='P01M')
Make this Notebook Trusted to load map: File -> Trust Notebook
With few exceptions, hourly precip is only provided by US stations. For this variable, a better visualization would exclude all the missing values (in the US, a missing value denotes no precip, while a trace = 0.00)
gdfWorldMetar = gdfWorldMetar.dropna(subset=['P01M']).reset_index(drop=True)
gdfWorldMetar.explore(column='P01M')
Make this Notebook Trusted to load map: File -> Trust Notebook

References

  1. GeoPandas

  2. EPSG