Xarray 3: Datasets and Plotting
Contents

Xarray 3: Datasets and Plotting¶
Overview¶
The Xarray
Dataset
Open and select a variable from the ERA-5 dataset
Convert geopotential to height
Create a contour plot
Imports¶
import xarray as xr
import numpy as np
from metpy.units import units
import metpy.calc as mpcalc
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
Work with an Xarray Dataset
¶
An Xarray Dataset consists of one or more DataArray
s. Let’s open the same NetCDF file as in the first Xarray notebook, but as a Dataset
.
ds = xr.open_dataset('/spare11/atm533/data/2012103000_z500_era5.nc')
Let’s see what this Dataset looks like.
ds
<xarray.Dataset> Dimensions: (longitude: 1440, latitude: 721, time: 1) Coordinates: * longitude (longitude) float32 0.0 0.25 0.5 0.75 ... 359.0 359.2 359.5 359.8 * latitude (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0 * time (time) datetime64[ns] 2012-10-30 Data variables: z (time, latitude, longitude) float32 ... Attributes: Conventions: CF-1.6 history: 2020-09-29 00:40:14 GMT by grib_to_netcdf-2.16.0: /opt/ecmw...
It’s represented somewhat similarly to a DataArray, but now we have support for multiple data variables. In this case,
There is just one data variable: z (Geopotential)
The coordinate variables are longitude, latitude, and time
The data variable has three dimensions (time x latitude x longitude).
Although time is a dimension, there is only one time value on that coordinate.
The Dataset itself has attributes. The attributes of the coordinate and data variables are separately accessible.
Select a data variable from the Dataset.¶
Z = ds['z'] # Similar to selecting a Series in Pandas
We can programmatically retrieve various attributes of the data variable that we browsed interactively above.
Z
<xarray.DataArray 'z' (time: 1, latitude: 721, longitude: 1440)> [1038240 values with dtype=float32] Coordinates: * longitude (longitude) float32 0.0 0.25 0.5 0.75 ... 359.0 359.2 359.5 359.8 * latitude (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0 * time (time) datetime64[ns] 2012-10-30 Attributes: units: m**2 s**-2 long_name: Geopotential standard_name: geopotential
Z.shape
(1, 721, 1440)
Z.dims
('time', 'latitude', 'longitude')
We see that Z is a 3-dimensional DataArray, with time as the first dimension.
Z.units
'm**2 s**-2'
Convert geopotential to height¶
Print out the representation of our data variable. We can see it’s a DataArray
.¶
print (Z) # Text-based, as opposed to HTML-prettified output
<xarray.DataArray 'z' (time: 1, latitude: 721, longitude: 1440)>
[1038240 values with dtype=float32]
Coordinates:
* longitude (longitude) float32 0.0 0.25 0.5 0.75 ... 359.0 359.2 359.5 359.8
* latitude (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
* time (time) datetime64[ns] 2012-10-30
Attributes:
units: m**2 s**-2
long_name: Geopotential
standard_name: geopotential
The units are already part of the data variable. We can use MetPy
’s units
and calc
libraries to convert geopotential to height. We’ll assign a new object to the output of the function.
HGHT = mpcalc.geopotential_to_height(Z)
Print out its representation. Note that the array values have changed, and there is a unit attached to the DataArray
.¶
HGHT
<xarray.DataArray (time: 1, latitude: 721, longitude: 1440)> <Quantity([[[5212.1777 5212.1777 5212.1777 ... 5212.1777 5212.1777 5212.1777] [5209.8657 5209.8657 5209.8823 ... 5209.8154 5209.832 5209.832 ] [5207.554 5207.5874 5207.604 ... 5207.4873 5207.504 5207.5376] ... [4974.5474 4974.5474 4974.564 ... 4974.497 4974.497 4974.514 ] [4970.8623 4970.8623 4970.8623 ... 4970.845 4970.845 4970.845 ] [4967.161 4967.161 4967.161 ... 4967.161 4967.161 4967.161 ]]], 'meter')> Coordinates: * longitude (longitude) float32 0.0 0.25 0.5 0.75 ... 359.0 359.2 359.5 359.8 * latitude (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0 * time (time) datetime64[ns] 2012-10-30
Now we can make a map with contours of 500 hPa height!
Create a contour plot of gridded data¶
We will use Cartopy and Matplotlib to plot our 500 hPa height contours. Matplotlib has two contour methods:
Line contours: ax.contour
Filled contours: ax.contourf
We will draw contour lines in meters, with a contour interval of 60 m.¶
As we’ve done before, let’s first define some variables relevant to Cartopy.¶
cLon = -80
cLat = 35
lonW = -100
lonE = -60
latS = 20
latN = 50
proj_map = ccrs.LambertConformal(central_longitude=cLon, central_latitude=cLat)
proj_data = ccrs.PlateCarree() # Our data is lat-lon; thus its native projection is Plate Carree.
res = '50m'
Now define the range of our contour values and a contour interval. 60 m is standard for 500 hPa.
minVal = 4680
maxVal = 6060
cint = 60
cintervals = np.arange(minVal, maxVal, cint)
cintervals
array([4680, 4740, 4800, 4860, 4920, 4980, 5040, 5100, 5160, 5220, 5280,
5340, 5400, 5460, 5520, 5580, 5640, 5700, 5760, 5820, 5880, 5940,
6000])
Matplotlib’s contour methods require three arrays to be passed to them … x- and y- arrays (longitude and latitude in our case), and a 2-d array (corresponding to x- and y-) of our data variable. So we need to extract the latitude and longitude coordinate variables from our DataArray. We’ll also extract the third coordinate value, time.
lats = ds.latitude
lons = ds.longitude
times = ds.time
We’re set to make our map. After creating our figure, setting the bounds of our map domain, and adding cartographic features, we will plot the contours. We’ll assign the output of the contour method to an object, which will then be used to label the contour lines.
fig = plt.figure(figsize=(18,12))
ax = plt.subplot(1,1,1,projection=proj_map)
ax.set_extent ([lonW,lonE,latS,latN])
ax.add_feature(cfeature.COASTLINE.with_scale(res))
ax.add_feature(cfeature.STATES.with_scale(res))
CL = ax.contour(lons,lats,HGHT,cintervals,transform=proj_data,linewidths=1.25,colors='green')
ax.clabel(CL, inline_spacing=0.2, fontsize=11, fmt='%.0f');
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[15], line 6
4 ax.add_feature(cfeature.COASTLINE.with_scale(res))
5 ax.add_feature(cfeature.STATES.with_scale(res))
----> 6 CL = ax.contour(lons,lats,HGHT,cintervals,transform=proj_data,linewidths=1.25,colors='green')
7 ax.clabel(CL, inline_spacing=0.2, fontsize=11, fmt='%.0f');
File /knight/mamba_aug23/envs/aug23_env/lib/python3.11/site-packages/cartopy/mpl/geoaxes.py:318, in _add_transform.<locals>.wrapper(self, *args, **kwargs)
313 raise ValueError(f'Invalid transform: Spherical {func.__name__} '
314 'is not supported - consider using '
315 'PlateCarree/RotatedPole.')
317 kwargs['transform'] = transform
--> 318 return func(self, *args, **kwargs)
File /knight/mamba_aug23/envs/aug23_env/lib/python3.11/site-packages/cartopy/mpl/geoaxes.py:362, in _add_transform_first.<locals>.wrapper(self, *args, **kwargs)
360 # Use the new points as the input arguments
361 args = (x, y, z) + args[3:]
--> 362 return func(self, *args, **kwargs)
File /knight/mamba_aug23/envs/aug23_env/lib/python3.11/site-packages/cartopy/mpl/geoaxes.py:1614, in GeoAxes.contour(self, *args, **kwargs)
1592 @_add_transform
1593 @_add_transform_first
1594 def contour(self, *args, **kwargs):
1595 """
1596 Add the "transform" keyword to :func:`~matplotlib.pyplot.contour`.
1597
(...)
1612
1613 """
-> 1614 result = super().contour(*args, **kwargs)
1616 # We need to compute the dataLim correctly for contours.
1617 bboxes = [col.get_datalim(self.transData)
1618 for col in result.collections
1619 if col.get_paths()]
File /knight/mamba_aug23/envs/aug23_env/lib/python3.11/site-packages/matplotlib/__init__.py:1442, in _preprocess_data.<locals>.inner(ax, data, *args, **kwargs)
1439 @functools.wraps(func)
1440 def inner(ax, *args, data=None, **kwargs):
1441 if data is None:
-> 1442 return func(ax, *map(sanitize_sequence, args), **kwargs)
1444 bound = new_sig.bind(ax, *args, **kwargs)
1445 auto_label = (bound.arguments.get(label_namer)
1446 or bound.kwargs.get(label_namer))
File /knight/mamba_aug23/envs/aug23_env/lib/python3.11/site-packages/matplotlib/axes/_axes.py:6451, in Axes.contour(self, *args, **kwargs)
6442 """
6443 Plot contour lines.
6444
(...)
6448 %(contour_doc)s
6449 """
6450 kwargs['filled'] = False
-> 6451 contours = mcontour.QuadContourSet(self, *args, **kwargs)
6452 self._request_autoscale_view()
6453 return contours
File /knight/mamba_aug23/envs/aug23_env/lib/python3.11/site-packages/matplotlib/contour.py:769, in ContourSet.__init__(self, ax, levels, filled, linewidths, linestyles, hatches, alpha, origin, extent, cmap, colors, norm, vmin, vmax, extend, antialiased, nchunk, locator, transform, negative_linestyles, *args, **kwargs)
765 if self.negative_linestyles is None:
766 self.negative_linestyles = \
767 mpl.rcParams['contour.negative_linestyle']
--> 769 kwargs = self._process_args(*args, **kwargs)
770 self._process_levels()
772 self._extend_min = self.extend in ['min', 'both']
File /knight/mamba_aug23/envs/aug23_env/lib/python3.11/site-packages/matplotlib/contour.py:1411, in QuadContourSet._process_args(self, corner_mask, algorithm, *args, **kwargs)
1408 corner_mask = mpl.rcParams['contour.corner_mask']
1409 self._corner_mask = corner_mask
-> 1411 x, y, z = self._contour_args(args, kwargs)
1413 contour_generator = contourpy.contour_generator(
1414 x, y, z, name=self._algorithm, corner_mask=self._corner_mask,
1415 line_type=contourpy.LineType.SeparateCode,
1416 fill_type=contourpy.FillType.OuterCode,
1417 chunk_size=self.nchunk)
1419 t = self.get_transform()
File /knight/mamba_aug23/envs/aug23_env/lib/python3.11/site-packages/matplotlib/contour.py:1450, in QuadContourSet._contour_args(self, args, kwargs)
1448 elif nargs <= 4:
1449 x, y, z_orig, *args = args
-> 1450 x, y, z = self._check_xyz(x, y, z_orig, kwargs)
1451 else:
1452 raise _api.nargs_error(fn, takes="from 1 to 4", given=nargs)
File /knight/mamba_aug23/envs/aug23_env/lib/python3.11/site-packages/matplotlib/contour.py:1475, in QuadContourSet._check_xyz(self, x, y, z, kwargs)
1472 z = ma.asarray(z)
1474 if z.ndim != 2:
-> 1475 raise TypeError(f"Input z must be 2D, not {z.ndim}D")
1476 if z.shape[0] < 2 or z.shape[1] < 2:
1477 raise TypeError(f"Input z must be at least a (2, 2) shaped array, "
1478 f"but has shape {z.shape}")
TypeError: Input z must be 2D, not 3D

Didn't work! The end of the traceback reads:
TypeError: Input z must be 2D, not 3D
In other words, our geopotential height array is not two dimensional in x and y ... recall that it also has *time* as its first dimension! Even though there is only one time value in this dimension, we need to eliminate this dimension from Z.
We’ll learn other ways to abstract out dimensions other than x and y (remember, there could be additional dimensions in our Datasets or DataArrays, such as vertical levels or ensemble members) in later notebooks, but for now, we can simply select the first (and only) element from the time dimension.
HGHT = HGHT[0,:,:] # Select the first element of the leading (time) dimension, and all lons and lats.
HGHT
<xarray.DataArray (latitude: 721, longitude: 1440)> <Quantity([[5212.1777 5212.1777 5212.1777 ... 5212.1777 5212.1777 5212.1777] [5209.8657 5209.8657 5209.8823 ... 5209.8154 5209.832 5209.832 ] [5207.554 5207.5874 5207.604 ... 5207.4873 5207.504 5207.5376] ... [4974.5474 4974.5474 4974.564 ... 4974.497 4974.497 4974.514 ] [4970.8623 4970.8623 4970.8623 ... 4970.845 4970.845 4970.845 ] [4967.161 4967.161 4967.161 ... 4967.161 4967.161 4967.161 ]], 'meter')> Coordinates: * longitude (longitude) float32 0.0 0.25 0.5 0.75 ... 359.0 359.2 359.5 359.8 * latitude (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0 time datetime64[ns] 2012-10-30
Now try again.
fig = plt.figure(figsize=(18,12))
ax = plt.subplot(1,1,1,projection=proj_map)
ax.set_extent ([lonW,lonE,latS,latN])
ax.add_feature(cfeature.COASTLINE.with_scale(res))
ax.add_feature(cfeature.STATES.with_scale(res))
CL = ax.contour(lons,lats,HGHT,cintervals,transform=proj_data,linewidths=1.25,colors='green')
ax.clabel(CL, inline_spacing=0.2, fontsize=11, fmt='%.0f');

There we have it! There are definitely some things we can improve about this plot (such as the lack of an informative title, and the fact that our contour labels seem not to be appearing on every contour line), but we’ll get to that soon!¶
Summary¶
An Xarray
Dataset
consists of one or moreDataArrays
.MetPy can perform unit-aware calculations and conversions on Xarray objects.
Since Xarray
DataArray
s often have more than x- and y- dimensions, care must be taken to reduce dimensionality for the purposes of Matplotlib functions such ascontour
andcontourf
.
What’s Next?¶
In the next notebook, we’ll work with multiple Dataset
s.