Interactive data visualization in Python: Geopandas


Overview

Within this notebook, we will cover:

  1. Browser-based interactive maps of point-based data

  2. Geopandas

Prerequisites

Concepts

Importance

Notes

Cartopy Intro

Required

Projections and Features

Pandas

Required

Tabular Datasets

  • Time to learn: 20 minutes


All of the graphics we have generated so far in the class have been static. In other words, they exist “as-is” … there is no way to interact with them. While this is fine, and even preferable, for traditional publication figures and websites, it would be nice to be able to produce dynamic figures … which one can zoom into/out of, pan around … similar to, say, Google Maps.


We have previously displayed real-time NYS Mesonet observations on a static map, using Pandas, Cartopy, and Matplotlib. Now, let’s make an interactive map … for that, we will leverage the Geopandas Python package.

Imports

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from cartopy import crs as ccrs
from cartopy import feature as cfeature
import geopandas as gpd
ERROR 1: PROJ: proj_create_from_database: Open of /knight/anaconda_aug22/envs/aug22_env/share/proj failed

Read in the most recent set of NYSM observations

nysm_latest = pd.read_csv('https://www.atmos.albany.edu/products/nysm/nysm_latest.csv')
nysm_latest
station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] ... soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3] lat lon elevation name
0 ADDI 2023-04-21 00:50:00 12.4 14.8 61.1 0.0 0.0 0.0 0.2 0.6 ... 11.7 9.6 8.9 0.37 0.37 0.43 42.040360 -77.237260 507.6140 Addison
1 ANDE 2023-04-21 00:50:00 10.6 11.1 65.0 0.0 0.0 0.0 0.3 0.6 ... 10.0 8.5 8.2 0.23 0.14 0.14 42.182270 -74.801390 518.2820 Andes
2 BATA 2023-04-21 00:50:00 7.2 7.5 64.8 0.0 0.0 0.0 2.7 4.1 ... 10.6 9.0 8.6 0.23 0.24 0.26 43.019940 -78.135660 276.1200 Batavia
3 BEAC 2023-04-21 00:50:00 15.6 15.8 40.3 0.0 0.0 0.0 1.1 1.6 ... 11.4 10.3 10.2 0.23 0.22 0.19 41.528750 -73.945270 90.1598 Beacon
4 BELD 2023-04-21 00:50:00 14.2 15.1 44.9 0.0 0.0 0.0 0.1 0.8 ... 9.5 8.5 8.2 0.31 0.37 0.40 42.223220 -75.668520 470.3700 Belden
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 WFMB 2023-04-21 00:50:00 2.8 2.7 66.7 0.0 0.0 0.0 0.4 0.7 ... 9.6 7.3 6.9 0.25 0.22 0.25 44.393236 -73.858829 614.5990 Whiteface Mountain Base
122 WGAT 2023-04-21 00:50:00 6.5 7.3 71.9 0.0 0.0 0.0 0.0 0.0 ... 10.2 8.2 6.7 0.15 0.27 0.09 43.532408 -75.158597 442.9660 Woodgate
123 WHIT 2023-04-21 00:50:00 8.8 9.0 52.5 0.0 0.0 0.0 3.0 4.9 ... 10.9 9.2 8.4 0.46 0.52 0.50 43.485073 -73.423071 36.5638 Whitehall
124 WOLC 2023-04-21 00:50:00 8.5 8.6 59.4 0.0 0.0 0.0 0.6 1.3 ... 12.4 12.0 10.8 0.17 0.02 0.10 43.228680 -76.842610 121.2190 Wolcott
125 YORK 2023-04-21 00:50:00 10.9 10.8 59.0 0.0 0.0 0.0 2.5 3.9 ... 11.9 10.5 9.6 0.22 0.25 0.30 42.855040 -77.847760 177.9420 York

126 rows × 34 columns

Make our Pandas Dataframe geo-aware. To do this, we create a Geopandas Dataframe. It adds a Geometry column, which may consist of shapes or points. The NYSM locations are points, so that’s what we’ll use to instantiate the Geometry column.

gdf = gpd.GeoDataFrame(nysm_latest,geometry=gpd.points_from_xy(nysm_latest.lon,nysm_latest.lat))
gdf
station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] ... soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3] lat lon elevation name geometry
0 ADDI 2023-04-21 00:50:00 12.4 14.8 61.1 0.0 0.0 0.0 0.2 0.6 ... 9.6 8.9 0.37 0.37 0.43 42.040360 -77.237260 507.6140 Addison POINT (-77.23726 42.04036)
1 ANDE 2023-04-21 00:50:00 10.6 11.1 65.0 0.0 0.0 0.0 0.3 0.6 ... 8.5 8.2 0.23 0.14 0.14 42.182270 -74.801390 518.2820 Andes POINT (-74.80139 42.18227)
2 BATA 2023-04-21 00:50:00 7.2 7.5 64.8 0.0 0.0 0.0 2.7 4.1 ... 9.0 8.6 0.23 0.24 0.26 43.019940 -78.135660 276.1200 Batavia POINT (-78.13566 43.01994)
3 BEAC 2023-04-21 00:50:00 15.6 15.8 40.3 0.0 0.0 0.0 1.1 1.6 ... 10.3 10.2 0.23 0.22 0.19 41.528750 -73.945270 90.1598 Beacon POINT (-73.94527 41.52875)
4 BELD 2023-04-21 00:50:00 14.2 15.1 44.9 0.0 0.0 0.0 0.1 0.8 ... 8.5 8.2 0.31 0.37 0.40 42.223220 -75.668520 470.3700 Belden POINT (-75.66852 42.22322)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 WFMB 2023-04-21 00:50:00 2.8 2.7 66.7 0.0 0.0 0.0 0.4 0.7 ... 7.3 6.9 0.25 0.22 0.25 44.393236 -73.858829 614.5990 Whiteface Mountain Base POINT (-73.85883 44.39324)
122 WGAT 2023-04-21 00:50:00 6.5 7.3 71.9 0.0 0.0 0.0 0.0 0.0 ... 8.2 6.7 0.15 0.27 0.09 43.532408 -75.158597 442.9660 Woodgate POINT (-75.15860 43.53241)
123 WHIT 2023-04-21 00:50:00 8.8 9.0 52.5 0.0 0.0 0.0 3.0 4.9 ... 9.2 8.4 0.46 0.52 0.50 43.485073 -73.423071 36.5638 Whitehall POINT (-73.42307 43.48507)
124 WOLC 2023-04-21 00:50:00 8.5 8.6 59.4 0.0 0.0 0.0 0.6 1.3 ... 12.0 10.8 0.17 0.02 0.10 43.228680 -76.842610 121.2190 Wolcott POINT (-76.84261 43.22868)
125 YORK 2023-04-21 00:50:00 10.9 10.8 59.0 0.0 0.0 0.0 2.5 3.9 ... 10.5 9.6 0.22 0.25 0.30 42.855040 -77.847760 177.9420 York POINT (-77.84776 42.85504)

126 rows × 35 columns

Note that geometry appears as a new column (Series).

We can interactively explore this Dataframe as a map in the browser!

gdf.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook

Well … we have an interactive frame … and it looks like New York State … but where is the interactive map??

We still have a little more work to do:

While the points certainly look like latitude and longitudes, we need to explicitly assign a projection to the Dataframe before we can view it on a map. One way is to assign a coordinate reference system code, via EPSG … in this case, EPSG 4326.

Note the arguments to set_crs:

  1. epsg = 4326: Assign the specified CRS

  2. inplace = True: The gdf object is updated without the need to assign a new dataframe object

  3. allow_override = True: If a CRS had previously been applied, override with the EPSG value specified.

gdf.set_crs(epsg=4326, inplace=True, allow_override=True)
station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] ... soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3] lat lon elevation name geometry
0 ADDI 2023-04-21 00:50:00 12.4 14.8 61.1 0.0 0.0 0.0 0.2 0.6 ... 9.6 8.9 0.37 0.37 0.43 42.040360 -77.237260 507.6140 Addison POINT (-77.23726 42.04036)
1 ANDE 2023-04-21 00:50:00 10.6 11.1 65.0 0.0 0.0 0.0 0.3 0.6 ... 8.5 8.2 0.23 0.14 0.14 42.182270 -74.801390 518.2820 Andes POINT (-74.80139 42.18227)
2 BATA 2023-04-21 00:50:00 7.2 7.5 64.8 0.0 0.0 0.0 2.7 4.1 ... 9.0 8.6 0.23 0.24 0.26 43.019940 -78.135660 276.1200 Batavia POINT (-78.13566 43.01994)
3 BEAC 2023-04-21 00:50:00 15.6 15.8 40.3 0.0 0.0 0.0 1.1 1.6 ... 10.3 10.2 0.23 0.22 0.19 41.528750 -73.945270 90.1598 Beacon POINT (-73.94527 41.52875)
4 BELD 2023-04-21 00:50:00 14.2 15.1 44.9 0.0 0.0 0.0 0.1 0.8 ... 8.5 8.2 0.31 0.37 0.40 42.223220 -75.668520 470.3700 Belden POINT (-75.66852 42.22322)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 WFMB 2023-04-21 00:50:00 2.8 2.7 66.7 0.0 0.0 0.0 0.4 0.7 ... 7.3 6.9 0.25 0.22 0.25 44.393236 -73.858829 614.5990 Whiteface Mountain Base POINT (-73.85883 44.39324)
122 WGAT 2023-04-21 00:50:00 6.5 7.3 71.9 0.0 0.0 0.0 0.0 0.0 ... 8.2 6.7 0.15 0.27 0.09 43.532408 -75.158597 442.9660 Woodgate POINT (-75.15860 43.53241)
123 WHIT 2023-04-21 00:50:00 8.8 9.0 52.5 0.0 0.0 0.0 3.0 4.9 ... 9.2 8.4 0.46 0.52 0.50 43.485073 -73.423071 36.5638 Whitehall POINT (-73.42307 43.48507)
124 WOLC 2023-04-21 00:50:00 8.5 8.6 59.4 0.0 0.0 0.0 0.6 1.3 ... 12.0 10.8 0.17 0.02 0.10 43.228680 -76.842610 121.2190 Wolcott POINT (-76.84261 43.22868)
125 YORK 2023-04-21 00:50:00 10.9 10.8 59.0 0.0 0.0 0.0 2.5 3.9 ... 10.5 9.6 0.22 0.25 0.30 42.855040 -77.847760 177.9420 York POINT (-77.84776 42.85504)

126 rows × 35 columns

Now, let’s try the explore function again!

gdf.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook

We can pan, zoom, and hover over each point … hovering shows the values of all the columns in the Dataframe.

Now, let’s select just one column from the Dataframe and explore once again.

gdf.explore(column='temp_2m [degC]')
Make this Notebook Trusted to load map: File -> Trust Notebook

By default, passing in one column of numerical values will color-code each value!

Explore the most recent worldwide METARs

worldMetar = pd.read_csv("https://www.atmos.albany.edu/products/metarCSV/world_metar_latest.csv", sep='\s+')
worldMetar.rename(columns = {'SLAT':'lat','SLON':'lon'},inplace=True)

Examine the dataframe

worldMetar
STN YYMMDD/HHMM lat lon SELV TMPC DWPC RELH PMSL SPED GUMS DRCT P01M
0 DYS 230421/0000 32.43 -99.85 545.0 23.5 0.5 21.89 1008.0 6.18 -9999.00 30.0 -9999.00
1 NUW 230421/0000 48.35 -122.65 14.0 8.9 3.3 67.92 1019.7 15.44 23.17 150.0 0.00
2 NYL 230421/0000 32.65 -114.62 65.0 30.0 -3.9 10.80 1016.4 3.09 -9999.00 330.0 -9999.00
3 PATC 230421/0000 65.57 -167.92 83.0 -5.0 -12.3 56.57 1033.3 1.54 -9999.00 260.0 -9999.00
4 PAIM 230421/0000 66.00 -153.70 389.0 3.1 -7.6 45.32 1032.8 1.54 -9999.00 -9999.0 -9999.00
... ... ... ... ... ... ... ... ... ... ... ... ... ...
3635 1KM 230421/0000 -9999.00 -9999.00 -9999.0 -1.2 -4.5 78.26 1025.1 13.38 16.99 10.0 -9999.00
3636 1MW 230421/0000 -9999.00 -9999.00 -9999.0 9.4 -14.5 16.94 1018.1 11.84 15.96 310.0 -9999.00
3637 6S2 230421/0000 -9999.00 -9999.00 -9999.0 9.0 9.0 100.00 -9999.0 1.54 6.69 -9999.0 0.76
3638 PAN 230421/0000 -9999.00 -9999.00 -9999.0 19.0 -14.0 9.47 -9999.0 5.15 9.78 280.0 -9999.00
3639 TYL 230421/0000 -9999.00 -9999.00 -9999.0 16.0 -16.0 9.71 -9999.0 1.54 -9999.00 300.0 -9999.00

3640 rows × 13 columns

We have several stations at the end of the Dataframe whose latitudes and longitudes (and elevations) are -9999.00. These denote stations whose coordinates are not set in the underlying data source. We need to eliminate them in order for the geolocation to work properly.

lat, lon = worldMetar.lat, worldMetar.lon
worldMetar = worldMetar[(lat>-900) | (lon>-900)].reset_index(drop=True)
worldMetar
STN YYMMDD/HHMM lat lon SELV TMPC DWPC RELH PMSL SPED GUMS DRCT P01M
0 DYS 230421/0000 32.43 -99.85 545.0 23.5 0.5 21.89 1008.0 6.18 -9999.00 30.0 -9999.0
1 NUW 230421/0000 48.35 -122.65 14.0 8.9 3.3 67.92 1019.7 15.44 23.17 150.0 0.0
2 NYL 230421/0000 32.65 -114.62 65.0 30.0 -3.9 10.80 1016.4 3.09 -9999.00 330.0 -9999.0
3 PATC 230421/0000 65.57 -167.92 83.0 -5.0 -12.3 56.57 1033.3 1.54 -9999.00 260.0 -9999.0
4 PAIM 230421/0000 66.00 -153.70 389.0 3.1 -7.6 45.32 1032.8 1.54 -9999.00 -9999.0 -9999.0
... ... ... ... ... ... ... ... ... ... ... ... ... ...
3535 W81 230421/0000 37.18 -78.10 128.0 24.2 7.9 35.27 -9999.0 2.06 -9999.00 170.0 -9999.0
3536 HRF 230421/0000 46.25 -114.12 1110.0 9.0 -9.0 27.04 -9999.0 6.69 -9999.00 290.0 -9999.0
3537 YBSU 230421/0000 -26.60 153.09 5.0 25.0 15.0 53.80 -9999.0 8.75 -9999.00 180.0 -9999.0
3538 LTBU 230421/0000 41.14 27.92 175.0 9.0 9.0 100.00 -9999.0 1.03 -9999.00 -9999.0 -9999.0
3539 ETIH 230421/0000 49.22 11.84 443.0 4.2 2.9 91.24 1014.5 0.00 -9999.00 0.0 -9999.0

3540 rows × 13 columns

Now, we can proceed with creating the Geopandas dataframe.

gdfWorldMetar = gpd.GeoDataFrame(worldMetar,geometry=gpd.points_from_xy(worldMetar.lon,worldMetar.lat))
gdfWorldMetar
STN YYMMDD/HHMM lat lon SELV TMPC DWPC RELH PMSL SPED GUMS DRCT P01M geometry
0 DYS 230421/0000 32.43 -99.85 545.0 23.5 0.5 21.89 1008.0 6.18 -9999.00 30.0 -9999.0 POINT (-99.85000 32.43000)
1 NUW 230421/0000 48.35 -122.65 14.0 8.9 3.3 67.92 1019.7 15.44 23.17 150.0 0.0 POINT (-122.65000 48.35000)
2 NYL 230421/0000 32.65 -114.62 65.0 30.0 -3.9 10.80 1016.4 3.09 -9999.00 330.0 -9999.0 POINT (-114.62000 32.65000)
3 PATC 230421/0000 65.57 -167.92 83.0 -5.0 -12.3 56.57 1033.3 1.54 -9999.00 260.0 -9999.0 POINT (-167.92000 65.57000)
4 PAIM 230421/0000 66.00 -153.70 389.0 3.1 -7.6 45.32 1032.8 1.54 -9999.00 -9999.0 -9999.0 POINT (-153.70000 66.00000)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3535 W81 230421/0000 37.18 -78.10 128.0 24.2 7.9 35.27 -9999.0 2.06 -9999.00 170.0 -9999.0 POINT (-78.10000 37.18000)
3536 HRF 230421/0000 46.25 -114.12 1110.0 9.0 -9.0 27.04 -9999.0 6.69 -9999.00 290.0 -9999.0 POINT (-114.12000 46.25000)
3537 YBSU 230421/0000 -26.60 153.09 5.0 25.0 15.0 53.80 -9999.0 8.75 -9999.00 180.0 -9999.0 POINT (153.09000 -26.60000)
3538 LTBU 230421/0000 41.14 27.92 175.0 9.0 9.0 100.00 -9999.0 1.03 -9999.00 -9999.0 -9999.0 POINT (27.92000 41.14000)
3539 ETIH 230421/0000 49.22 11.84 443.0 4.2 2.9 91.24 1014.5 0.00 -9999.00 0.0 -9999.0 POINT (11.84000 49.22000)

3540 rows × 14 columns

gdfWorldMetar.set_crs(epsg=4326, inplace=True, allow_override=True)
STN YYMMDD/HHMM lat lon SELV TMPC DWPC RELH PMSL SPED GUMS DRCT P01M geometry
0 DYS 230421/0000 32.43 -99.85 545.0 23.5 0.5 21.89 1008.0 6.18 -9999.00 30.0 -9999.0 POINT (-99.85000 32.43000)
1 NUW 230421/0000 48.35 -122.65 14.0 8.9 3.3 67.92 1019.7 15.44 23.17 150.0 0.0 POINT (-122.65000 48.35000)
2 NYL 230421/0000 32.65 -114.62 65.0 30.0 -3.9 10.80 1016.4 3.09 -9999.00 330.0 -9999.0 POINT (-114.62000 32.65000)
3 PATC 230421/0000 65.57 -167.92 83.0 -5.0 -12.3 56.57 1033.3 1.54 -9999.00 260.0 -9999.0 POINT (-167.92000 65.57000)
4 PAIM 230421/0000 66.00 -153.70 389.0 3.1 -7.6 45.32 1032.8 1.54 -9999.00 -9999.0 -9999.0 POINT (-153.70000 66.00000)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3535 W81 230421/0000 37.18 -78.10 128.0 24.2 7.9 35.27 -9999.0 2.06 -9999.00 170.0 -9999.0 POINT (-78.10000 37.18000)
3536 HRF 230421/0000 46.25 -114.12 1110.0 9.0 -9.0 27.04 -9999.0 6.69 -9999.00 290.0 -9999.0 POINT (-114.12000 46.25000)
3537 YBSU 230421/0000 -26.60 153.09 5.0 25.0 15.0 53.80 -9999.0 8.75 -9999.00 180.0 -9999.0 POINT (153.09000 -26.60000)
3538 LTBU 230421/0000 41.14 27.92 175.0 9.0 9.0 100.00 -9999.0 1.03 -9999.00 -9999.0 -9999.0 POINT (27.92000 41.14000)
3539 ETIH 230421/0000 49.22 11.84 443.0 4.2 2.9 91.24 1014.5 0.00 -9999.00 0.0 -9999.0 POINT (11.84000 49.22000)

3540 rows × 14 columns

gdfWorldMetar.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook
Note: Most of the non-US data is not available until approximately 15 minutes past each hour. If you do not see many worldwide stations, try re-reading the data file after that point in the hour.
Exercise: Create a color-coded map from one variable in the dataset.
# Write your code here
gdfWorldMetar.explore(column='TMPC')
Make this Notebook Trusted to load map: File -> Trust Notebook
Were you surprised by your result? How might you make it look better?

Follow a similar procedure as when we eliminated rows with latitudes or longitudes that equaled -9999.0. However, let’s save some time by applying the test on the GeoPandas Dataframe, rather than on the original Pandas Dataframe.

var = gdfWorldMetar.TMPC
gdfWorldMetar = gdfWorldMetar[(var>-900)].reset_index(drop=True)
gdfWorldMetar.explore(column='TMPC')
Make this Notebook Trusted to load map: File -> Trust Notebook