Interactive visualization of worldwide METAR data: GeoViews

Overview:

  1. Create an interactive visualization of worldwide METAR data using the Holoviz ecosystem

Prerequisites

Concepts      Importance    Notes
Pandas        Necessary
Contextily    Helpful

  • Time to learn: 30 minutes


Imports

from datetime import datetime
import numpy as np
import pandas as pd
import geoviews as gv
import geoviews.feature as gf
from geoviews import opts
import geoviews.tile_sources as gts
from cartopy import crs as ccrs

Create an interactive visualization of worldwide surface meteorological (METAR) data using the Holoviz ecosystem

Holoviz is a suite of open-source Python libraries designed for interactive data analysis and visualization via the browser (including the Jupyter notebook). In this notebook, we will use the GeoViews package, which is part of Holoviz.

The Holoviz tools build on Bokeh, which uses JavaScript to make plots interactive in the browser. GeoViews makes both Bokeh and Matplotlib available as plotting backends via an extension.

gv.extension('bokeh', 'matplotlib')

Use Pandas to read in the file containing the most recent METAR data.

Since this file has latitude and longitude columns, we can pass the DataFrame directly to GeoViews (i.e., there is no need to use GeoPandas).

# Format of the strings in the 'YYMMDD/HHMM' column (e.g., 231024/1800)
timeFormat = "%y%m%d/%H%M"

# Columns are whitespace-delimited. date_format (pandas >= 2.0) replaces the
# deprecated date_parser argument and tells read_csv how to parse the time column.
df = pd.read_csv('/spare11/atm533/data/world_metar_latest.csv',
                 parse_dates=['YYMMDD/HHMM'], date_format=timeFormat, sep=r'\s+')
df
       STN          YYMMDD/HHMM      SLAT      SLON     SELV   TMPC   DWPC    RELH     PMSL  SPED      GUMS   DRCT     P01M
   0   DYS  2023-10-24 18:00:00     32.43    -99.85    545.0   23.7   18.8   74.02   1007.9  6.18  -9999.00  180.0  -9999.0
   1   NUW  2023-10-24 18:00:00     48.35   -122.65     14.0    9.4    5.6   77.14   1012.4  3.60  -9999.00  110.0  -9999.0
   2   NYL  2023-10-24 18:00:00     32.65   -114.62     65.0   25.0   11.7   43.38   1008.8  1.54  -9999.00   70.0  -9999.0
   3  PALU  2023-10-24 18:00:00     68.88   -166.13      3.0    4.6    2.3   85.02   1020.3  8.24     19.56  190.0  -9999.0
   4  PAEI  2023-10-24 18:00:00     64.67   -147.10    167.0  -12.9  -14.8   85.64   1034.4  0.00  -9999.00    0.0  -9999.0
 ...   ...                  ...       ...       ...      ...    ...    ...     ...      ...   ...       ...    ...      ...
4219   MZJ  2023-10-24 18:00:00  -9999.00  -9999.00  -9999.0   25.0   10.0   38.74  -9999.0  2.57  -9999.00  160.0  -9999.0
4220  OEPS  2023-10-24 18:00:00  -9999.00  -9999.00  -9999.0   28.5    5.9   23.85   1015.8  2.57  -9999.00  140.0  -9999.0
4221  PAAD  2023-10-24 18:00:00  -9999.00  -9999.00  -9999.0    1.2   -1.7   80.99  -9999.0  9.78     13.90  240.0  -9999.0
4222  LTFO  2023-10-24 18:00:00  -9999.00  -9999.00  -9999.0   20.0   13.0   64.04  -9999.0  3.60  -9999.00  180.0  -9999.0
4223  ORBI  2023-10-24 18:00:00  -9999.00  -9999.00  -9999.0   22.0   13.0   56.63  -9999.0  3.09  -9999.00  150.0  -9999.0

4224 rows × 13 columns
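A quick way to check the time format and the whitespace separator is to parse a tiny in-memory sample before reading the full file. This sketch assumes the raw YYMMDD/HHMM strings look like 231024/1800 (inferred from timeFormat and the parsed output above), uses only a subset of the real columns, and relies on the date_format argument available in pandas 2.0 and later; the sample rows are illustrative only.

```python
import io

import pandas as pd

# Illustrative sample in the whitespace-delimited layout assumed for the METAR file;
# only a subset of the real columns is included.
sample = io.StringIO(
    "STN YYMMDD/HHMM SLAT SLON TMPC\n"
    "DYS 231024/1800 32.43 -99.85 23.7\n"
    "MZJ 231024/1800 -9999.00 -9999.00 25.0\n"
)

# sep=r"\s+" splits on any run of whitespace; date_format is applied to the
# columns listed in parse_dates (pandas >= 2.0).
df_sample = pd.read_csv(
    sample,
    sep=r"\s+",
    parse_dates=["YYMMDD/HHMM"],
    date_format="%y%m%d/%H%M",
)

print(df_sample.dtypes)
print(df_sample)
```

The time column should come back as a datetime64 dtype; if it stays as object, the format string does not match the data.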

In this dataset, missing values are set to -9999.0. Let’s replace any instance of this value with np.nan throughout the DataFrame.

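# Replace the -9999.0 missing-value sentinel with NaN in every column
df.replace(-9999.0, np.nan, inplace=True)
# Optional shorthand for np.nan
NaN = np.nan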

Let’s also remove any rows whose latitudes or longitudes are missing.

# Keep only rows where both latitude and longitude are present
df = df.query('SLAT.notnull() & SLON.notnull()')
df
       STN          YYMMDD/HHMM   SLAT     SLON   SELV   TMPC   DWPC    RELH    PMSL  SPED   GUMS   DRCT  P01M
   0   DYS  2023-10-24 18:00:00  32.43   -99.85  545.0   23.7   18.8   74.02  1007.9  6.18    NaN  180.0   NaN
   1   NUW  2023-10-24 18:00:00  48.35  -122.65   14.0    9.4    5.6   77.14  1012.4  3.60    NaN  110.0   NaN
   2   NYL  2023-10-24 18:00:00  32.65  -114.62   65.0   25.0   11.7   43.38  1008.8  1.54    NaN   70.0   NaN
   3  PALU  2023-10-24 18:00:00  68.88  -166.13    3.0    4.6    2.3   85.02  1020.3  8.24  19.56  190.0   NaN
   4  PAEI  2023-10-24 18:00:00  64.67  -147.10  167.0  -12.9  -14.8   85.64  1034.4  0.00    NaN    0.0   NaN
 ...   ...                  ...    ...      ...    ...    ...    ...     ...     ...   ...    ...    ...   ...
4139  GVSV  2023-10-24 18:00:00  16.83   -25.07   20.0   29.0   20.0   58.32     NaN  9.27    NaN   40.0   NaN
4140  OPST  2023-10-24 18:00:00  32.53    74.37  247.0   21.0   16.0   73.09     NaN  0.00    NaN    0.0   NaN
4141  MHGS  2023-10-24 18:00:00  14.57   -88.60  913.0   28.0   22.0   69.90     NaN  3.09    NaN  360.0   NaN
4142  EDAH  2023-10-24 18:00:00  53.87    14.15   28.0   11.0   11.0  100.00     NaN  3.60    NaN  110.0   NaN
4143   QAZ  2023-10-24 18:00:00  16.98     7.98  492.0   32.0   -4.0    9.56  1012.1  3.09    NaN  100.0   NaN

4144 rows × 13 columns
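The two cleaning steps can be tried on a small hypothetical DataFrame that uses the same -9999.0 sentinel. Because a station needs both coordinates to be plotted, this sketch combines the two conditions with &; dropna(subset=...) is an equivalent alternative. The | version is included only for contrast: it also keeps a row that has a latitude but no longitude. The station IDs and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stations: BBB has no coordinates, CCC has only a latitude
df_demo = pd.DataFrame({
    "STN": ["AAA", "BBB", "CCC"],
    "SLAT": [32.43, -9999.0, 10.0],
    "SLON": [-99.85, -9999.0, -9999.0],
    "TMPC": [23.7, 25.0, -9999.0],
})

# Replace the missing-value sentinel with NaN in every column
df_demo = df_demo.replace(-9999.0, np.nan)

# Keep rows where BOTH latitude and longitude are present
both = df_demo[df_demo.SLAT.notnull() & df_demo.SLON.notnull()]

# Equivalent, and arguably more readable
both_alt = df_demo.dropna(subset=["SLAT", "SLON"])

# For contrast: "|" also keeps CCC, which has a latitude but no longitude
either = df_demo[df_demo.SLAT.notnull() | df_demo.SLON.notnull()]

print(both.STN.tolist())    # ['AAA']
print(either.STN.tolist())  # ['AAA', 'CCC']
```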

df.TMPC.describe()
count    4058.000000
mean       15.996057
std        10.373377
min       -23.300000
25%        10.000000
50%        18.400000
75%        23.900000
max        40.000000
Name: TMPC, dtype: float64
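Note that count is 4058 even though the filtered DataFrame has 4144 rows: describe skips NaN values, so 4144 − 4058 = 86 stations in the filtered data report no temperature. The same behavior on a tiny hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical temperatures with one missing report
tmpc = pd.Series([23.7, np.nan, 4.6], name="TMPC")

stats = tmpc.describe()
print(stats["count"])     # 2.0 -> NaN is not counted
print(tmpc.isna().sum())  # 1   -> number of missing reports
print(len(tmpc))          # 3   -> total rows, including the missing one
```

Applied to the real data, df.TMPC.isna().sum() gives the number of stations without a temperature directly.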

Create a GeoViews Points object.

We pass it three arguments:

  1. The Pandas DataFrame

  2. A list of the key dimensions: the longitude and latitude columns, in that order

  3. A list of the value dimensions: the columns whose values we want to carry along with each point

df_points = gv.Points(df, ['SLON','SLAT'],['STN','TMPC','PMSL','DWPC','SPED','P01M'])

Visualize the GeoViews Points object

df_points

Notice that the x- and y-axes span the range of longitudes and latitudes in the DataFrame. Also notice the toolbar to the right of the plot; mouse over each tool to see its function. By default, the Pan tool is active: click and drag to move around.

Next, activate the Box Zoom tool (the single magnifying glass). Click and drag from upper-left to lower-right to create a box. The plot will automatically adjust to the new zoomed-in view.

If you have a mouse, you can also try the Wheel Zoom tool, which is just below the Box Zoom tool.

The Reset tool (i.e., the bottom-most icon) resets the plot to the full domain.

Georeference the points by overlaying them on a background map (a set of raster tiles).

The Holoviz suite uses the * operator to overlay layers on the same plot. Here, we first specify the OpenStreetMap (OSM) tile source and then overlay our Points object on it. We also set options that do the following:

  1. Specify the width and height of the plot

  2. Set the size of the points

  3. Color each point by a variable: in this case, 2 m temperature in Celsius

  4. Add a colorbar

  5. Add the hover tool to the Toolbar

(gv.tile_sources.OSM * df_points).opts(
    opts.Points(frame_width=800, frame_height=600, size=8, color='TMPC',
                colorbar=True, tools=['hover']))

Note that the hover tool is active; it appears as the icon just below the Reset tool. Mouse over any point to see a readout of all the columns that we included when creating the df_points GeoViews object.

Make a Labels plot

Next, let’s make a map that displays the values of a particular column at each station, rather than just the station locations. To do this, we create a GeoViews Labels object, which takes arguments similar to Points.

df_labels = gv.Labels(df, ['SLON','SLAT'],['TMPC'])

Plot just the labels.

df_labels

Now overlay a background map tile (CartoLight this time), as we did for Points, and set the color and size of the text labels.

figure = (gv.tile_sources.CartoLight * df_labels).opts(
    opts.Labels(frame_height=800,frame_width=800,text_color='purple',text_font_size='10pt'))
figure
Note: If you pan beyond the bounds of the map (e.g., past 180° longitude), the background tiles repeat but the data do not reappear.
Things to try:
  1. Combine the worldwide METAR and NYSM Dataframes and visualize the combined Dataframe.
  2. Plot a variable other than temperature.

Summary

  • The Holoviz set of Python packages provides interactive visualization of datasets in the Jupyter notebook.

  • Data from Pandas DataFrames can be easily displayed and georeferenced with GeoViews Points and Labels objects.

  • As with Contextily, GeoViews lets you add background tile-served maps as an additional layer.

What’s Next?

In the next notebook, we will use GeoViews to interactively browse gridded datasets.

References:

  1. Holoviz

  2. GeoViews