http://xarray.pydata.org/en/stable/_static/dataset-diagram-logo.png

Xarray 3: Datasets and Plotting


Overview

  1. The Xarray Dataset

  2. Open and select a variable from the ERA-5 dataset

  3. Convert geopotential to height

  4. Create a contour plot

Prerequisites

Concepts

Importance

Notes

Xarray Lessons 1-2

Necessary

  • Time to learn: 15 minutes

Imports

import xarray as xr
import numpy as np
from metpy.units import units
import metpy.calc as mpcalc
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
ERROR 1: PROJ: proj_create_from_database: Open of /knight/anaconda_aug22/envs/aug22_env/share/proj failed

Work with an Xarray Dataset

An Xarray Dataset consists of one or more DataArrays. Let’s open the same NetCDF file as in the first Xarray notebook, but as a Dataset.

ds = xr.open_dataset('/spare11/atm533/data/2012103000_z500_era5.nc')

Let’s see what this Dataset looks like.

ds
<xarray.Dataset>
Dimensions:    (longitude: 1440, latitude: 721, time: 1)
Coordinates:
  * longitude  (longitude) float32 0.0 0.25 0.5 0.75 ... 359.0 359.2 359.5 359.8
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * time       (time) datetime64[ns] 2012-10-30
Data variables:
    z          (time, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.6
    history:      2020-09-29 00:40:14 GMT by grib_to_netcdf-2.16.0: /opt/ecmw...

It’s represented somewhat similarly to a DataArray, but now we have support for multiple data variables. In this case,

  1. There is just one data variable: z (Geopotential)

  2. The coordinate variables are longitude, latitude, and time

  3. The data variable has three dimensions (time x latitude x longitude).

  4. Although time is a dimension, there is only one time value on that coordinate.

  5. The Dataset itself has attributes. The attributes of the coordinate and data variables are separately accessible.

Select a data variable from the Dataset.

Z = ds['z'] # Similar to selecting a Series in Pandas

We can programmatically retrieve various attributes of the data variable that we browsed interactively above.

Z
<xarray.DataArray 'z' (time: 1, latitude: 721, longitude: 1440)>
[1038240 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 0.0 0.25 0.5 0.75 ... 359.0 359.2 359.5 359.8
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * time       (time) datetime64[ns] 2012-10-30
Attributes:
    units:          m**2 s**-2
    long_name:      Geopotential
    standard_name:  geopotential
Z.shape
(1, 721, 1440)
Z.dims
('time', 'latitude', 'longitude')

We see that Z is a 3-dimensional DataArray, with time as the first dimension.

Z.units
'm**2 s**-2'

Convert geopotential to height

Create a contour plot of gridded data

We will use Cartopy and Matplotlib to plot our 500 hPa height contours. Matplotlib has two contour methods:

  1. Line contours: ax.contour

  2. Filled contours: ax.contourf

We will draw contour lines in meters, with a contour interval of 60 m.

As we’ve done before, let’s first define some variables relevant to Cartopy.

cLon = -80
cLat = 35
lonW = -100
lonE = -60
latS = 20
latN = 50
proj_map = ccrs.LambertConformal(central_longitude=cLon, central_latitude=cLat)
proj_data = ccrs.PlateCarree() # Our data is lat-lon; thus its native projection is Plate Carree.
res = '50m'

Now define the range of our contour values and a contour interval. 60 m is standard for 500 hPa.

minVal = 4680
maxVal = 6060
cint = 60
cintervals = np.arange(minVal, maxVal, cint)
cintervals
array([4680, 4740, 4800, 4860, 4920, 4980, 5040, 5100, 5160, 5220, 5280,
       5340, 5400, 5460, 5520, 5580, 5640, 5700, 5760, 5820, 5880, 5940,
       6000])

Matplotlib’s contour methods require three arrays to be passed to them … x- and y- arrays (longitude and latitude in our case), and a 2-d array (corresponding to x- and y-) of our data variable. So we need to extract the latitude and longitude coordinate variables from our DataArray. We’ll also extract the third coordinate value, time.

lats = ds.latitude 
lons = ds.longitude
times = ds.time

We’re set to make our map. After creating our figure, setting the bounds of our map domain, and adding cartographic features, we will plot the contours. We’ll assign the output of the contour method to an object, which will then be used to label the contour lines.

fig = plt.figure(figsize=(18,12))
ax = plt.subplot(1,1,1,projection=proj_map)
ax.set_extent ([lonW,lonE,latS,latN])
ax.add_feature(cfeature.COASTLINE.with_scale(res))
ax.add_feature(cfeature.STATES.with_scale(res))
CL = ax.contour(lons,lats,HGHT,cintervals,transform=proj_data,linewidths=1.25,colors='green')
ax.clabel(CL, inline_spacing=0.2, fontsize=11, fmt='%.0f');
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [15], in <cell line: 6>()
      4 ax.add_feature(cfeature.COASTLINE.with_scale(res))
      5 ax.add_feature(cfeature.STATES.with_scale(res))
----> 6 CL = ax.contour(lons,lats,HGHT,cintervals,transform=proj_data,linewidths=1.25,colors='green')
      7 ax.clabel(CL, inline_spacing=0.2, fontsize=11, fmt='%.0f')

File /knight/anaconda_aug22/envs/aug22_env/lib/python3.10/site-packages/cartopy/mpl/geoaxes.py:318, in _add_transform.<locals>.wrapper(self, *args, **kwargs)
    313     raise ValueError(f'Invalid transform: Spherical {func.__name__} '
    314                      'is not supported - consider using '
    315                      'PlateCarree/RotatedPole.')
    317 kwargs['transform'] = transform
--> 318 return func(self, *args, **kwargs)

File /knight/anaconda_aug22/envs/aug22_env/lib/python3.10/site-packages/cartopy/mpl/geoaxes.py:362, in _add_transform_first.<locals>.wrapper(self, *args, **kwargs)
    360     # Use the new points as the input arguments
    361     args = (x, y, z) + args[3:]
--> 362 return func(self, *args, **kwargs)

File /knight/anaconda_aug22/envs/aug22_env/lib/python3.10/site-packages/cartopy/mpl/geoaxes.py:1614, in GeoAxes.contour(self, *args, **kwargs)
   1592 @_add_transform
   1593 @_add_transform_first
   1594 def contour(self, *args, **kwargs):
   1595     """
   1596     Add the "transform" keyword to :func:`~matplotlib.pyplot.contour`.
   1597 
   (...)
   1612 
   1613     """
-> 1614     result = super().contour(*args, **kwargs)
   1616     # We need to compute the dataLim correctly for contours.
   1617     bboxes = [col.get_datalim(self.transData)
   1618               for col in result.collections
   1619               if col.get_paths()]

File /knight/anaconda_aug22/envs/aug22_env/lib/python3.10/site-packages/matplotlib/__init__.py:1414, in _preprocess_data.<locals>.inner(ax, data, *args, **kwargs)
   1411 @functools.wraps(func)
   1412 def inner(ax, *args, data=None, **kwargs):
   1413     if data is None:
-> 1414         return func(ax, *map(sanitize_sequence, args), **kwargs)
   1416     bound = new_sig.bind(ax, *args, **kwargs)
   1417     auto_label = (bound.arguments.get(label_namer)
   1418                   or bound.kwargs.get(label_namer))

File /knight/anaconda_aug22/envs/aug22_env/lib/python3.10/site-packages/matplotlib/axes/_axes.py:6303, in Axes.contour(self, *args, **kwargs)
   6294 """
   6295 Plot contour lines.
   6296 
   (...)
   6300 %(contour_doc)s
   6301 """
   6302 kwargs['filled'] = False
-> 6303 contours = mcontour.QuadContourSet(self, *args, **kwargs)
   6304 self._request_autoscale_view()
   6305 return contours

File /knight/anaconda_aug22/envs/aug22_env/lib/python3.10/site-packages/matplotlib/contour.py:812, in ContourSet.__init__(self, ax, levels, filled, linewidths, linestyles, hatches, alpha, origin, extent, cmap, colors, norm, vmin, vmax, extend, antialiased, nchunk, locator, transform, *args, **kwargs)
    808     self.origin = mpl.rcParams['image.origin']
    810 self._transform = transform
--> 812 kwargs = self._process_args(*args, **kwargs)
    813 self._process_levels()
    815 self._extend_min = self.extend in ['min', 'both']

File /knight/anaconda_aug22/envs/aug22_env/lib/python3.10/site-packages/matplotlib/contour.py:1446, in QuadContourSet._process_args(self, corner_mask, *args, **kwargs)
   1443     corner_mask = mpl.rcParams['contour.corner_mask']
   1444 self._corner_mask = corner_mask
-> 1446 x, y, z = self._contour_args(args, kwargs)
   1448 _mask = ma.getmask(z)
   1449 if _mask is ma.nomask or not _mask.any():

File /knight/anaconda_aug22/envs/aug22_env/lib/python3.10/site-packages/matplotlib/contour.py:1485, in QuadContourSet._contour_args(self, args, kwargs)
   1483     args = args[1:]
   1484 elif Nargs <= 4:
-> 1485     x, y, z = self._check_xyz(args[:3], kwargs)
   1486     args = args[3:]
   1487 else:

File /knight/anaconda_aug22/envs/aug22_env/lib/python3.10/site-packages/matplotlib/contour.py:1513, in QuadContourSet._check_xyz(self, args, kwargs)
   1510 z = ma.asarray(args[2], dtype=np.float64)
   1512 if z.ndim != 2:
-> 1513     raise TypeError(f"Input z must be 2D, not {z.ndim}D")
   1514 if z.shape[0] < 2 or z.shape[1] < 2:
   1515     raise TypeError(f"Input z must be at least a (2, 2) shaped array, "
   1516                     f"but has shape {z.shape}")

TypeError: Input z must be 2D, not 3D
../../_images/03_Xarray_Plotting_38_1.png

Didn't work! The end of the traceback reads:

TypeError: Input z must be 2D, not 3D

In other words, our geopotential height array is not two dimensional in x and y ... recall that it also has *time* as its first dimension! Even though there is only one time value in this dimension, we need to eliminate this dimension from Z.

We’ll learn other ways to abstract out dimensions other than x and y (remember, there could be additional dimensions in our Datasets or DataArrays, such as vertical levels or ensemble members) in later notebooks, but for now, we can simply select the first (and only) element from the time dimension.

HGHT = HGHT[0,:,:] # Select the first element of the leading (time) dimension, and all lons and lats.
HGHT
<xarray.DataArray (latitude: 721, longitude: 1440)>
<Quantity([[5212.1777 5212.1777 5212.1777 ... 5212.1777 5212.1777 5212.1777]
 [5209.8657 5209.8657 5209.8823 ... 5209.8154 5209.832  5209.832 ]
 [5207.554  5207.5874 5207.604  ... 5207.4873 5207.504  5207.5376]
 ...
 [4974.5474 4974.5474 4974.564  ... 4974.497  4974.497  4974.514 ]
 [4970.8623 4970.8623 4970.8623 ... 4970.845  4970.845  4970.845 ]
 [4967.161  4967.161  4967.161  ... 4967.161  4967.161  4967.161 ]], 'meter')>
Coordinates:
  * longitude  (longitude) float32 0.0 0.25 0.5 0.75 ... 359.0 359.2 359.5 359.8
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
    time       datetime64[ns] 2012-10-30

Now try again.

fig = plt.figure(figsize=(18,12))
ax = plt.subplot(1,1,1,projection=proj_map)
ax.set_extent ([lonW,lonE,latS,latN])
ax.add_feature(cfeature.COASTLINE.with_scale(res))
ax.add_feature(cfeature.STATES.with_scale(res))
CL = ax.contour(lons,lats,HGHT,cintervals,transform=proj_data,linewidths=1.25,colors='green')
ax.clabel(CL, inline_spacing=0.2, fontsize=11, fmt='%.0f');
../../_images/03_Xarray_Plotting_45_0.png

There we have it! There are definitely some things we can improve about this plot (such as the lack of an informative title, and the fact that our contour labels seem not to be appearing on every contour line), but we’ll get to that soon!


Summary

  • An Xarray Dataset consists of one or more DataArrays.

  • MetPy can perform unit-aware calculations and conversions on Xarray objects.

  • Since Xarray DataArrays often have more than x- and y- dimensions, care must be taken to reduce dimensionality for the purposes of Matplotlib functions such as contour and contourf.

What’s Next?

In the next notebook, we’ll work with multiple Datasets.