Interactive data visualization in Python: Geopandas
Contents
Interactive data visualization in Python: Geopandas¶
Prerequisites¶
Concepts |
Importance |
Notes |
---|---|---|
Required |
Projections and Features |
|
Required |
Tabular Datasets |
Time to learn: 20 minutes
All of the graphics we have generated so far in the class have been static. In other words, they exist “as-is” … there is no way to interact with them. While this is fine, and even preferable, for traditional publication figures and websites, it would be nice to be able to produce dynamic figures … which one can zoom into/out of, pan around … similar to, say, Google Maps.
We have previously displayed real-time NYS Mesonet observations on a static map, using Pandas, Cartopy, and Matplotlib. Now, let’s make an interactive map … for that, we will leverage the Geopandas Python package.
Imports¶
import matplotlib.pyplot as plt
import pandas as pd
from cartopy import crs as ccrs
from cartopy import feature as cfeature
import geopandas as gpd
Read in the most recent set of NYSM observations¶
nysm_latest = pd.read_csv('https://www.atmos.albany.edu/products/nysm/nysm_latest.csv')
nysm_latest
station | time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | ... | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | lat | lon | elevation | name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ADDI | 2024-04-30 01:15:00 | 17.9 | 19.4 | 81.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.5 | ... | 13.7 | 10.2 | 8.9 | 0.52 | 0.45 | 0.44 | 42.040360 | -77.237260 | 507.6140 | Addison |
1 | ANDE | 2024-04-30 01:15:00 | 18.7 | 19.4 | 68.2 | 0.0 | 0.0 | 0.0 | 0.2 | 1.2 | ... | 16.9 | 12.4 | 9.5 | 0.22 | 0.11 | 0.09 | 42.182270 | -74.801390 | 518.2820 | Andes |
2 | BATA | 2024-04-30 01:15:00 | 12.0 | 12.4 | 83.0 | 0.0 | 0.0 | 0.0 | 2.2 | 2.9 | ... | 13.5 | 11.2 | 9.7 | 0.27 | 0.25 | 0.27 | 43.019940 | -78.135660 | 276.1200 | Batavia |
3 | BEAC | 2024-04-30 01:15:00 | 18.7 | 18.6 | 66.8 | 0.0 | 0.0 | 0.0 | 0.8 | 1.5 | ... | 16.3 | 13.5 | 12.1 | 0.29 | 0.24 | 0.19 | 41.528750 | -73.945270 | 90.1598 | Beacon |
4 | BELD | 2024-04-30 01:15:00 | 20.6 | 21.6 | 58.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.3 | ... | 12.4 | 10.3 | 8.6 | 0.32 | 0.35 | 0.40 | 42.223220 | -75.668520 | 470.3700 | Belden |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
121 | WFMB | 2024-04-30 01:15:00 | 4.7 | 5.5 | 89.2 | 0.0 | 0.0 | 0.0 | 0.7 | 1.6 | ... | 10.8 | 11.6 | 8.0 | 0.28 | 0.23 | 0.26 | 44.393236 | -73.858829 | 614.5990 | Whiteface Mountain Base |
122 | WGAT | 2024-04-30 01:15:00 | 13.9 | 14.4 | 96.1 | 0.0 | 0.0 | 0.0 | 0.3 | 1.3 | ... | 14.1 | 11.7 | 9.0 | 0.17 | 0.28 | 0.10 | 43.532408 | -75.158597 | 442.9660 | Woodgate |
123 | WHIT | 2024-04-30 01:15:00 | 10.6 | 10.8 | 67.3 | 0.0 | 0.0 | 0.0 | 1.9 | 3.3 | ... | 12.1 | 10.6 | 8.5 | 0.40 | 0.48 | 0.45 | 43.485073 | -73.423071 | 36.5638 | Whitehall |
124 | WOLC | 2024-04-30 01:15:00 | 11.4 | 11.6 | 84.7 | 0.0 | 0.0 | 0.0 | 1.8 | 2.3 | ... | 15.3 | 14.1 | 12.3 | 0.15 | 0.04 | 0.10 | 43.228680 | -76.842610 | 121.2190 | Wolcott |
125 | YORK | 2024-04-30 01:15:00 | 14.3 | 14.5 | 80.6 | 0.0 | 0.0 | 0.0 | 1.1 | 1.4 | ... | 13.0 | 11.3 | 9.9 | 0.25 | 0.28 | 0.33 | 42.855040 | -77.847760 | 177.9420 | York |
126 rows × 34 columns
Make our Pandas Dataframe
geo-aware. To do this, we create a Geopandas Dataframe. It adds a Geometry
column, which may consist of shapes or points. The NYSM locations are points, so that’s what we’ll use to instantiate the Geometry column.
gdf = gpd.GeoDataFrame(nysm_latest,geometry=gpd.points_from_xy(nysm_latest.lon,nysm_latest.lat))
gdf
station | time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | ... | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | lat | lon | elevation | name | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ADDI | 2024-04-30 01:15:00 | 17.9 | 19.4 | 81.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.5 | ... | 10.2 | 8.9 | 0.52 | 0.45 | 0.44 | 42.040360 | -77.237260 | 507.6140 | Addison | POINT (-77.23726 42.04036) |
1 | ANDE | 2024-04-30 01:15:00 | 18.7 | 19.4 | 68.2 | 0.0 | 0.0 | 0.0 | 0.2 | 1.2 | ... | 12.4 | 9.5 | 0.22 | 0.11 | 0.09 | 42.182270 | -74.801390 | 518.2820 | Andes | POINT (-74.80139 42.18227) |
2 | BATA | 2024-04-30 01:15:00 | 12.0 | 12.4 | 83.0 | 0.0 | 0.0 | 0.0 | 2.2 | 2.9 | ... | 11.2 | 9.7 | 0.27 | 0.25 | 0.27 | 43.019940 | -78.135660 | 276.1200 | Batavia | POINT (-78.13566 43.01994) |
3 | BEAC | 2024-04-30 01:15:00 | 18.7 | 18.6 | 66.8 | 0.0 | 0.0 | 0.0 | 0.8 | 1.5 | ... | 13.5 | 12.1 | 0.29 | 0.24 | 0.19 | 41.528750 | -73.945270 | 90.1598 | Beacon | POINT (-73.94527 41.52875) |
4 | BELD | 2024-04-30 01:15:00 | 20.6 | 21.6 | 58.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.3 | ... | 10.3 | 8.6 | 0.32 | 0.35 | 0.40 | 42.223220 | -75.668520 | 470.3700 | Belden | POINT (-75.66852 42.22322) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
121 | WFMB | 2024-04-30 01:15:00 | 4.7 | 5.5 | 89.2 | 0.0 | 0.0 | 0.0 | 0.7 | 1.6 | ... | 11.6 | 8.0 | 0.28 | 0.23 | 0.26 | 44.393236 | -73.858829 | 614.5990 | Whiteface Mountain Base | POINT (-73.85883 44.39324) |
122 | WGAT | 2024-04-30 01:15:00 | 13.9 | 14.4 | 96.1 | 0.0 | 0.0 | 0.0 | 0.3 | 1.3 | ... | 11.7 | 9.0 | 0.17 | 0.28 | 0.10 | 43.532408 | -75.158597 | 442.9660 | Woodgate | POINT (-75.15860 43.53241) |
123 | WHIT | 2024-04-30 01:15:00 | 10.6 | 10.8 | 67.3 | 0.0 | 0.0 | 0.0 | 1.9 | 3.3 | ... | 10.6 | 8.5 | 0.40 | 0.48 | 0.45 | 43.485073 | -73.423071 | 36.5638 | Whitehall | POINT (-73.42307 43.48507) |
124 | WOLC | 2024-04-30 01:15:00 | 11.4 | 11.6 | 84.7 | 0.0 | 0.0 | 0.0 | 1.8 | 2.3 | ... | 14.1 | 12.3 | 0.15 | 0.04 | 0.10 | 43.228680 | -76.842610 | 121.2190 | Wolcott | POINT (-76.84261 43.22868) |
125 | YORK | 2024-04-30 01:15:00 | 14.3 | 14.5 | 80.6 | 0.0 | 0.0 | 0.0 | 1.1 | 1.4 | ... | 11.3 | 9.9 | 0.25 | 0.28 | 0.33 | 42.855040 | -77.847760 | 177.9420 | York | POINT (-77.84776 42.85504) |
126 rows × 35 columns
Note that geometry
appears as a new column (Series
).
We can interactively explore
this Dataframe as a map in the browser!
gdf.explore()
Well … we have an interactive frame … and it looks like New York State … but where is the interactive map??
We still have a little more work to do:
While the points certainly look like latitude and longitudes, we need to explicitly assign a projection to the Dataframe before we can view it on a map. One way is to assign a coordinate reference system code, via EPSG … in this case, EPSG 4326.
Note the arguments to set_crs
:
epsg = 4326
: Assign the specified CRSinplace = True
: Thegdf
object is updated without the need to assign a new dataframe objectallow_override = True
: If a CRS had previously been applied, override with the EPSG value specified.
gdf.set_crs(epsg=4326, inplace=True, allow_override=True)
station | time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | ... | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | lat | lon | elevation | name | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ADDI | 2024-04-30 01:15:00 | 17.9 | 19.4 | 81.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.5 | ... | 10.2 | 8.9 | 0.52 | 0.45 | 0.44 | 42.040360 | -77.237260 | 507.6140 | Addison | POINT (-77.23726 42.04036) |
1 | ANDE | 2024-04-30 01:15:00 | 18.7 | 19.4 | 68.2 | 0.0 | 0.0 | 0.0 | 0.2 | 1.2 | ... | 12.4 | 9.5 | 0.22 | 0.11 | 0.09 | 42.182270 | -74.801390 | 518.2820 | Andes | POINT (-74.80139 42.18227) |
2 | BATA | 2024-04-30 01:15:00 | 12.0 | 12.4 | 83.0 | 0.0 | 0.0 | 0.0 | 2.2 | 2.9 | ... | 11.2 | 9.7 | 0.27 | 0.25 | 0.27 | 43.019940 | -78.135660 | 276.1200 | Batavia | POINT (-78.13566 43.01994) |
3 | BEAC | 2024-04-30 01:15:00 | 18.7 | 18.6 | 66.8 | 0.0 | 0.0 | 0.0 | 0.8 | 1.5 | ... | 13.5 | 12.1 | 0.29 | 0.24 | 0.19 | 41.528750 | -73.945270 | 90.1598 | Beacon | POINT (-73.94527 41.52875) |
4 | BELD | 2024-04-30 01:15:00 | 20.6 | 21.6 | 58.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.3 | ... | 10.3 | 8.6 | 0.32 | 0.35 | 0.40 | 42.223220 | -75.668520 | 470.3700 | Belden | POINT (-75.66852 42.22322) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
121 | WFMB | 2024-04-30 01:15:00 | 4.7 | 5.5 | 89.2 | 0.0 | 0.0 | 0.0 | 0.7 | 1.6 | ... | 11.6 | 8.0 | 0.28 | 0.23 | 0.26 | 44.393236 | -73.858829 | 614.5990 | Whiteface Mountain Base | POINT (-73.85883 44.39324) |
122 | WGAT | 2024-04-30 01:15:00 | 13.9 | 14.4 | 96.1 | 0.0 | 0.0 | 0.0 | 0.3 | 1.3 | ... | 11.7 | 9.0 | 0.17 | 0.28 | 0.10 | 43.532408 | -75.158597 | 442.9660 | Woodgate | POINT (-75.15860 43.53241) |
123 | WHIT | 2024-04-30 01:15:00 | 10.6 | 10.8 | 67.3 | 0.0 | 0.0 | 0.0 | 1.9 | 3.3 | ... | 10.6 | 8.5 | 0.40 | 0.48 | 0.45 | 43.485073 | -73.423071 | 36.5638 | Whitehall | POINT (-73.42307 43.48507) |
124 | WOLC | 2024-04-30 01:15:00 | 11.4 | 11.6 | 84.7 | 0.0 | 0.0 | 0.0 | 1.8 | 2.3 | ... | 14.1 | 12.3 | 0.15 | 0.04 | 0.10 | 43.228680 | -76.842610 | 121.2190 | Wolcott | POINT (-76.84261 43.22868) |
125 | YORK | 2024-04-30 01:15:00 | 14.3 | 14.5 | 80.6 | 0.0 | 0.0 | 0.0 | 1.1 | 1.4 | ... | 11.3 | 9.9 | 0.25 | 0.28 | 0.33 | 42.855040 | -77.847760 | 177.9420 | York | POINT (-77.84776 42.85504) |
126 rows × 35 columns
Now, let’s try the explore
function again!
gdf.explore()
We can pan, zoom, and hover over each point … hovering shows the values of all the columns in the Dataframe
.
Now, let’s select just one column from the Dataframe and explore once again.
gdf.explore(column='temp_2m [degC]')
By default, passing in one column of numerical values will color-code each value!
Explore the most recent worldwide METARs¶
na_values
argument in our call to read_csv
worldMetar = pd.read_csv("https://www.atmos.albany.edu/products/metarCSV/world_metar_latest.csv", sep='\s+',na_values=['9999.00','-9999.0'])
worldMetar.rename(columns = {'SLAT':'lat','SLON':'lon'},inplace=True)
Examine the dataframe
worldMetar
STN | YYMMDD/HHMM | lat | lon | SELV | TMPC | DWPC | RELH | PMSL | SPED | GUMS | DRCT | P01M | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | DYS | 240430/0100 | 32.43 | -99.85 | 545.0 | 27.7 | 10.4 | 33.93 | 1004.7 | 6.18 | NaN | 120.0 | NaN |
1 | NUW | 240430/0100 | 48.35 | -122.65 | 14.0 | 10.6 | 3.3 | 60.60 | 1018.7 | 4.63 | NaN | 220.0 | NaN |
2 | NYL | 240430/0100 | 32.65 | -114.62 | 65.0 | 33.3 | 1.7 | 13.49 | 1005.4 | 5.15 | NaN | 220.0 | NaN |
3 | PALU | 240430/0100 | 68.88 | -166.13 | 3.0 | -7.6 | -9.5 | 86.23 | 1025.4 | 6.69 | NaN | 90.0 | NaN |
4 | PATC | 240430/0100 | 65.57 | -167.92 | 83.0 | -6.8 | -7.1 | 97.72 | 1021.5 | 10.30 | NaN | 20.0 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4216 | 00U | 240430/0100 | NaN | NaN | NaN | 17.0 | 0.0 | 31.56 | NaN | 2.57 | NaN | 150.0 | NaN |
4217 | 10U | 240430/0100 | NaN | NaN | NaN | 10.0 | -7.0 | 29.52 | NaN | 9.78 | 12.87 | 310.0 | NaN |
4218 | UBBY | 240430/0100 | NaN | NaN | NaN | 15.0 | 11.0 | 76.98 | NaN | 4.12 | NaN | 100.0 | NaN |
4219 | 8V7 | 240430/0100 | NaN | NaN | NaN | 22.0 | -3.0 | 18.55 | NaN | 2.06 | NaN | 150.0 | NaN |
4220 | MZJ | 240430/0100 | NaN | NaN | NaN | 31.0 | -6.0 | 8.70 | NaN | 4.12 | NaN | 310.0 | NaN |
4221 rows × 13 columns
We have several stations at the end of the Dataframe
whose latitudes and longitudes (and elevations) are -9999.00 and thus set to NaN when we created the Dataframe
. These denote stations whose coordinates are not set in the underlying data source. We need to eliminate them in order for the geolocation to work properly.
lat, lon = worldMetar.lat, worldMetar.lon
worldMetar = worldMetar.dropna(subset=['lat','lon']).reset_index(drop=True)
Now, we can proceed with creating the Geopandas dataframe.
gdfWorldMetar = gpd.GeoDataFrame(worldMetar,geometry=gpd.points_from_xy(worldMetar.lon,worldMetar.lat))
gdfWorldMetar
STN | YYMMDD/HHMM | lat | lon | SELV | TMPC | DWPC | RELH | PMSL | SPED | GUMS | DRCT | P01M | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | DYS | 240430/0100 | 32.43 | -99.85 | 545.0 | 27.7 | 10.4 | 33.93 | 1004.7 | 6.18 | NaN | 120.0 | NaN | POINT (-99.85000 32.43000) |
1 | NUW | 240430/0100 | 48.35 | -122.65 | 14.0 | 10.6 | 3.3 | 60.60 | 1018.7 | 4.63 | NaN | 220.0 | NaN | POINT (-122.65000 48.35000) |
2 | NYL | 240430/0100 | 32.65 | -114.62 | 65.0 | 33.3 | 1.7 | 13.49 | 1005.4 | 5.15 | NaN | 220.0 | NaN | POINT (-114.62000 32.65000) |
3 | PALU | 240430/0100 | 68.88 | -166.13 | 3.0 | -7.6 | -9.5 | 86.23 | 1025.4 | 6.69 | NaN | 90.0 | NaN | POINT (-166.13000 68.88000) |
4 | PATC | 240430/0100 | 65.57 | -167.92 | 83.0 | -6.8 | -7.1 | 97.72 | 1021.5 | 10.30 | NaN | 20.0 | NaN | POINT (-167.92000 65.57000) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4118 | GVNP | 240430/0100 | 14.95 | -23.48 | 70.0 | 23.0 | 18.0 | 73.44 | NaN | 11.84 | NaN | 20.0 | NaN | POINT (-23.48000 14.95000) |
4119 | GVSV | 240430/0100 | 16.83 | -25.07 | 20.0 | 23.0 | 17.0 | 68.94 | NaN | 13.38 | NaN | 50.0 | NaN | POINT (-25.07000 16.83000) |
4120 | OPST | 240430/0100 | 32.53 | 74.37 | 247.0 | 14.0 | 11.0 | 82.12 | NaN | 0.00 | NaN | 0.0 | NaN | POINT (74.37000 32.53000) |
4121 | EDAH | 240430/0100 | 53.87 | 14.15 | 28.0 | 11.0 | 6.0 | 71.26 | NaN | 4.12 | NaN | 50.0 | NaN | POINT (14.15000 53.87000) |
4122 | FYOG | 240430/0100 | -28.58 | 44.58 | 4.0 | 16.0 | 13.0 | 82.37 | NaN | 2.06 | NaN | 90.0 | NaN | POINT (44.58000 -28.58000) |
4123 rows × 14 columns
gdfWorldMetar.set_crs(epsg=4326, inplace=True, allow_override=True)
STN | YYMMDD/HHMM | lat | lon | SELV | TMPC | DWPC | RELH | PMSL | SPED | GUMS | DRCT | P01M | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | DYS | 240430/0100 | 32.43 | -99.85 | 545.0 | 27.7 | 10.4 | 33.93 | 1004.7 | 6.18 | NaN | 120.0 | NaN | POINT (-99.85000 32.43000) |
1 | NUW | 240430/0100 | 48.35 | -122.65 | 14.0 | 10.6 | 3.3 | 60.60 | 1018.7 | 4.63 | NaN | 220.0 | NaN | POINT (-122.65000 48.35000) |
2 | NYL | 240430/0100 | 32.65 | -114.62 | 65.0 | 33.3 | 1.7 | 13.49 | 1005.4 | 5.15 | NaN | 220.0 | NaN | POINT (-114.62000 32.65000) |
3 | PALU | 240430/0100 | 68.88 | -166.13 | 3.0 | -7.6 | -9.5 | 86.23 | 1025.4 | 6.69 | NaN | 90.0 | NaN | POINT (-166.13000 68.88000) |
4 | PATC | 240430/0100 | 65.57 | -167.92 | 83.0 | -6.8 | -7.1 | 97.72 | 1021.5 | 10.30 | NaN | 20.0 | NaN | POINT (-167.92000 65.57000) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4118 | GVNP | 240430/0100 | 14.95 | -23.48 | 70.0 | 23.0 | 18.0 | 73.44 | NaN | 11.84 | NaN | 20.0 | NaN | POINT (-23.48000 14.95000) |
4119 | GVSV | 240430/0100 | 16.83 | -25.07 | 20.0 | 23.0 | 17.0 | 68.94 | NaN | 13.38 | NaN | 50.0 | NaN | POINT (-25.07000 16.83000) |
4120 | OPST | 240430/0100 | 32.53 | 74.37 | 247.0 | 14.0 | 11.0 | 82.12 | NaN | 0.00 | NaN | 0.0 | NaN | POINT (74.37000 32.53000) |
4121 | EDAH | 240430/0100 | 53.87 | 14.15 | 28.0 | 11.0 | 6.0 | 71.26 | NaN | 4.12 | NaN | 50.0 | NaN | POINT (14.15000 53.87000) |
4122 | FYOG | 240430/0100 | -28.58 | 44.58 | 4.0 | 16.0 | 13.0 | 82.37 | NaN | 2.06 | NaN | 90.0 | NaN | POINT (44.58000 -28.58000) |
4123 rows × 14 columns
gdfWorldMetar.explore()
# Write your code here
gdfWorldMetar.explore(column='TMPC')
NaN
from the range of valid values.Plot hourly precipitation
gdfWorldMetar.explore(column='P01M')
gdfWorldMetar = gdfWorldMetar.dropna(subset=['P01M']).reset_index(drop=True)
gdfWorldMetar.explore(column='P01M')