http://xarray.pydata.org/en/stable/_static/dataset-diagram-logo.png

Xarray 4: Remote access

Overview

  1. Work with a multiple-file Xarray Dataset hosted on NCAR’s Research Data Archive

  2. Subset the Dataset along its dimensions

  3. Perform unit conversion

  4. Create a well-labeled multi-parameter contour plot of gridded ERA-5 reanalysis data

Prerequisites

Concepts

Importance

Notes

Xarray Lessons 1-3

Necessary

  • Time to learn: 30 minutes

Imports

import xarray as xr
import pandas as pd
import numpy as np
from datetime import datetime as dt
from metpy.units import units
from metpy import calc as mpcalc
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import requests
from pathlib import Path

Work with the ERA-5 Dataset hosted on NCAR’s Remote Data Archive

Datasets are ever-growing in terms of size and sheer number. It is not “remotely” possible to download every dataset one might want to use to your computer! Fortunately, although Xarray provides easy access to files stored “locally” on disk, it can also give you access to files stored on remotely-located servers via different types of data access protocols. One such protocol is called OpenDAP. It makes files available via a specially-crafted web link, or URL. This is how we will access ERA-5 datasets stored on the ERA-5 RDA’s THREDDS server data catalog.

Besides OpenDAP, datasets are increasingly served via cloud providers, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. We will explore how Xarray can access these datasets in a future lesson.

Construct OpenDAP URL’s for the ERA-5 datasets served by the RDA server.

Note: RDA follows a particular convention in the directory and file names for the ERA-5 dataset. Single-level fields are stored in one monthly file, with a naming convention that includes the string sfc. Fields on multiple levels, such as pressure (isobaric) levels, are stored in daily files. For isobaric levels, the naming convention includes the string pl. In this case, we will retrieve sea-level pressure (SLP) and geopotential on isobaric surfaces, so we will be both of these two strings in our URLs.
ds_urlp = 'https://thredds.rda.ucar.edu/thredds/dodsC/files/g/ds633.0/e5.oper.an.sfc/201210/e5.oper.an.sfc.128_151_msl.ll025sc.2012100100_2012103123.nc'
ds_urlz = 'https://thredds.rda.ucar.edu/thredds/dodsC/files/g/ds633.0/e5.oper.an.pl/201210/e5.oper.an.pl.128_129_z.ll025sc.2012103000_2012103023.nc'

Now open the two files as separate Datasets. (avoid using open_mfdataset as subsetting will cause failures down the line)

Note: As we will see in future lessons, Xarray provides a method, open_mfdataset, that allows for one Dataset to be created from multiple sources. However, this may lead to strange errors after we subset the data later in this notebook. This problem seems specific to using open_mfdataset on file served by THREDDS.
dsSLP = xr.open_dataset(ds_urlp)
dsZ = xr.open_dataset(ds_urlz)

Let’s see what one of these Datasets looks like.

dsZ
<xarray.Dataset>
Dimensions:    (latitude: 721, level: 37, longitude: 1440, time: 24)
Coordinates:
  * latitude   (latitude) float64 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * level      (level) float64 1.0 2.0 3.0 5.0 7.0 ... 925.0 950.0 975.0 1e+03
  * longitude  (longitude) float64 0.0 0.25 0.5 0.75 ... 359.0 359.2 359.5 359.8
  * time       (time) datetime64[ns] 2012-10-30 ... 2012-10-30T23:00:00
Data variables:
    utc_date   (time) int32 ...
    Z          (time, level, latitude, longitude) float32 ...
Attributes:
    _NCProperties:                   version=1|netcdflibversion=4.6.1|hdf5lib...
    DATA_SOURCE:                     ECMWF: https://cds.climate.copernicus.eu...
    NETCDF_CONVERSION:               CISL RDA: Conversion from ECMWF GRIB 1 d...
    NETCDF_VERSION:                  4.6.1
    CONVERSION_PLATFORM:             Linux casper21 3.10.0-693.21.1.el7.x86_6...
    CONVERSION_DATE:                 Mon Aug 26 17:48:10 MDT 2019
    Conventions:                     CF-1.6
    NETCDF_COMPRESSION:              NCO: Precision-preserving compression to...
    history:                         Mon Aug 26 17:48:45 2019: ncks -4 --ppc ...
    NCO:                             netCDF Operators version 4.7.4 (http://n...
    DODS_EXTRA.Unlimited_Dimension:  time
dsSLP
<xarray.Dataset>
Dimensions:    (latitude: 721, longitude: 1440, time: 744)
Coordinates:
  * latitude   (latitude) float64 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * longitude  (longitude) float64 0.0 0.25 0.5 0.75 ... 359.0 359.2 359.5 359.8
  * time       (time) datetime64[ns] 2012-10-01 ... 2012-10-31T23:00:00
Data variables:
    utc_date   (time) int32 ...
    MSL        (time, latitude, longitude) float32 ...
Attributes:
    _NCProperties:                   version=1|netcdflibversion=4.6.1|hdf5lib...
    DATA_SOURCE:                     ECMWF: https://cds.climate.copernicus.eu...
    NETCDF_CONVERSION:               CISL RDA: Conversion from ECMWF GRIB1 da...
    NETCDF_VERSION:                  4.6.1
    CONVERSION_PLATFORM:             Linux casper21 3.10.0-693.21.1.el7.x86_6...
    CONVERSION_DATE:                 Mon Aug 26 16:01:50 MDT 2019
    Conventions:                     CF-1.6
    NETCDF_COMPRESSION:              NCO: Precision-preserving compression to...
    history:                         Mon Aug 26 16:02:17 2019: ncks -4 --ppc ...
    NCO:                             netCDF Operators version 4.7.4 (http://n...
    DODS_EXTRA.Unlimited_Dimension:  time
dsSize = dsZ.nbytes / 1e9
print ("Size of dataset = %.3f GB" % dsSize)
Size of dataset = 3.688 GB
That's a fairly large dataset! However, Xarray uses what's called lazy loading, in that the actual data transfer only occurs when we actually need the data to perform computations or visualizations. By subsetting the data so we only get what we want, the actual data transfer will be much smaller.

Subset the Dataset along its dimensions

We noticed in the previous notebook that our contour labels were not appearing with every contour line. This is because we passed the entire horizontal extent (all latitudes and longitudes) to the ax.contour method. Since our intent is to plot only over a regional subset, we will use the sel method on the latitude and longitude dimensions as well as time and isobaric surface.

We’ll also use Datetime and string methods to more dynamically assign various dimensional specifications, as well as aid in figure-labeling later on.

# Areal extent
lonW = -100
lonE = -60
latS = 20
latN = 50
latRange = np.arange(latS-5,latN+5,.25) # expand the data range a bit beyond the plot range
lonRange = np.arange((lonW-5+360),(lonE+5+360),.25) # Need to match longitude values to those of the coordinate variable

# Vertical level specificaton
vlevel = 500
levelStr = str(vlevel)

# Date/Time specification
Year = 2012
Month = 10
Day = 30
Hour = 0
Minute = 0
dateTime = dt(Year,Month,Day, Hour, Minute)
timeStr = dateTime.strftime("%Y-%m-%d %H%M UTC")

# Data variable selection

SLP = dsSLP['MSL'].sel(time=dateTime,latitude=latRange,longitude=lonRange)
Z = dsZ['Z'].sel(time=dateTime,latitude=latRange,longitude=lonRange,level=vlevel)

Let’s look at some of the attributes

Z.shape
(160, 200)
Z.dims
('latitude', 'longitude')
Z.units
'm**2 s**-2'
SLP.shape
(160, 200)
SLP.dims
('latitude', 'longitude')
SLP.units
'Pa'

Define our subsetted coordinate arrays of lat and lon. Pull them from any of the DataArrays. We’ll need to pass these into the contouring functions later on.

lats = SLP.latitude
lons = SLP.longitude
lats
<xarray.DataArray 'latitude' (latitude: 160)>
array([15.  , 15.25, 15.5 , 15.75, 16.  , 16.25, 16.5 , 16.75, 17.  , 17.25,
       17.5 , 17.75, 18.  , 18.25, 18.5 , 18.75, 19.  , 19.25, 19.5 , 19.75,
       20.  , 20.25, 20.5 , 20.75, 21.  , 21.25, 21.5 , 21.75, 22.  , 22.25,
       22.5 , 22.75, 23.  , 23.25, 23.5 , 23.75, 24.  , 24.25, 24.5 , 24.75,
       25.  , 25.25, 25.5 , 25.75, 26.  , 26.25, 26.5 , 26.75, 27.  , 27.25,
       27.5 , 27.75, 28.  , 28.25, 28.5 , 28.75, 29.  , 29.25, 29.5 , 29.75,
       30.  , 30.25, 30.5 , 30.75, 31.  , 31.25, 31.5 , 31.75, 32.  , 32.25,
       32.5 , 32.75, 33.  , 33.25, 33.5 , 33.75, 34.  , 34.25, 34.5 , 34.75,
       35.  , 35.25, 35.5 , 35.75, 36.  , 36.25, 36.5 , 36.75, 37.  , 37.25,
       37.5 , 37.75, 38.  , 38.25, 38.5 , 38.75, 39.  , 39.25, 39.5 , 39.75,
       40.  , 40.25, 40.5 , 40.75, 41.  , 41.25, 41.5 , 41.75, 42.  , 42.25,
       42.5 , 42.75, 43.  , 43.25, 43.5 , 43.75, 44.  , 44.25, 44.5 , 44.75,
       45.  , 45.25, 45.5 , 45.75, 46.  , 46.25, 46.5 , 46.75, 47.  , 47.25,
       47.5 , 47.75, 48.  , 48.25, 48.5 , 48.75, 49.  , 49.25, 49.5 , 49.75,
       50.  , 50.25, 50.5 , 50.75, 51.  , 51.25, 51.5 , 51.75, 52.  , 52.25,
       52.5 , 52.75, 53.  , 53.25, 53.5 , 53.75, 54.  , 54.25, 54.5 , 54.75])
Coordinates:
  * latitude  (latitude) float64 15.0 15.25 15.5 15.75 ... 54.0 54.25 54.5 54.75
    time      datetime64[ns] 2012-10-30
Attributes:
    long_name:    latitude
    short_name:   lat
    units:        degrees_north
    qualifier:    Gaussian
    _ChunkSizes:  721
lons
<xarray.DataArray 'longitude' (longitude: 200)>
array([255.  , 255.25, 255.5 , 255.75, 256.  , 256.25, 256.5 , 256.75, 257.  ,
       257.25, 257.5 , 257.75, 258.  , 258.25, 258.5 , 258.75, 259.  , 259.25,
       259.5 , 259.75, 260.  , 260.25, 260.5 , 260.75, 261.  , 261.25, 261.5 ,
       261.75, 262.  , 262.25, 262.5 , 262.75, 263.  , 263.25, 263.5 , 263.75,
       264.  , 264.25, 264.5 , 264.75, 265.  , 265.25, 265.5 , 265.75, 266.  ,
       266.25, 266.5 , 266.75, 267.  , 267.25, 267.5 , 267.75, 268.  , 268.25,
       268.5 , 268.75, 269.  , 269.25, 269.5 , 269.75, 270.  , 270.25, 270.5 ,
       270.75, 271.  , 271.25, 271.5 , 271.75, 272.  , 272.25, 272.5 , 272.75,
       273.  , 273.25, 273.5 , 273.75, 274.  , 274.25, 274.5 , 274.75, 275.  ,
       275.25, 275.5 , 275.75, 276.  , 276.25, 276.5 , 276.75, 277.  , 277.25,
       277.5 , 277.75, 278.  , 278.25, 278.5 , 278.75, 279.  , 279.25, 279.5 ,
       279.75, 280.  , 280.25, 280.5 , 280.75, 281.  , 281.25, 281.5 , 281.75,
       282.  , 282.25, 282.5 , 282.75, 283.  , 283.25, 283.5 , 283.75, 284.  ,
       284.25, 284.5 , 284.75, 285.  , 285.25, 285.5 , 285.75, 286.  , 286.25,
       286.5 , 286.75, 287.  , 287.25, 287.5 , 287.75, 288.  , 288.25, 288.5 ,
       288.75, 289.  , 289.25, 289.5 , 289.75, 290.  , 290.25, 290.5 , 290.75,
       291.  , 291.25, 291.5 , 291.75, 292.  , 292.25, 292.5 , 292.75, 293.  ,
       293.25, 293.5 , 293.75, 294.  , 294.25, 294.5 , 294.75, 295.  , 295.25,
       295.5 , 295.75, 296.  , 296.25, 296.5 , 296.75, 297.  , 297.25, 297.5 ,
       297.75, 298.  , 298.25, 298.5 , 298.75, 299.  , 299.25, 299.5 , 299.75,
       300.  , 300.25, 300.5 , 300.75, 301.  , 301.25, 301.5 , 301.75, 302.  ,
       302.25, 302.5 , 302.75, 303.  , 303.25, 303.5 , 303.75, 304.  , 304.25,
       304.5 , 304.75])
Coordinates:
  * longitude  (longitude) float64 255.0 255.2 255.5 255.8 ... 304.2 304.5 304.8
    time       datetime64[ns] 2012-10-30
Attributes:
    long_name:    longitude
    short_name:   lon
    units:        degrees_east
    _ChunkSizes:  1440
SLP
<xarray.DataArray 'MSL' (latitude: 160, longitude: 200)>
[32000 values with dtype=float32]
Coordinates:
  * latitude   (latitude) float64 15.0 15.25 15.5 15.75 ... 54.25 54.5 54.75
  * longitude  (longitude) float64 255.0 255.2 255.5 255.8 ... 304.2 304.5 304.8
    time       datetime64[ns] 2012-10-30
Attributes: (12/15)
    long_name:                     Mean sea level pressure
    short_name:                    msl
    units:                         Pa
    original_format:               WMO GRIB 1 with ECMWF local table
    ecmwf_local_table:             128
    ecmwf_parameter:               151
    ...                            ...
    rda_dataset:                   ds633.0
    rda_dataset_url:               https:/rda.ucar.edu/datasets/ds633.0/
    rda_dataset_doi:               DOI: 10.5065/BH6N-5N20
    rda_dataset_group:             ERA5 atmospheric surface analysis [netCDF4]
    number_of_significant_digits:  7
    _ChunkSizes:                   [ 27 139 277]

Perform unit conversions

Traditionally, we plot SLP in units of hectopascals (hPa) and geopotential heights in units of decameters. The SLP units conversion is a bit more straightforward than for Geopotential to Height. We take the DataArray and apply MetPy’s unit conversion method.

SLP = SLP.metpy.convert_units('hPa')

Next, convert geopotential to height, as we did in the previous notebook. In the same line of code, convert the resulting units from meters to decameters.

HGHT = mpcalc.geopotential_to_height(Z).metpy.convert_units('dam')

Create a well-labeled multi-parameter contour plot of gridded ERA-5 reanalysis data

We will make contour lines of SLP in hPa, contour interval = 4, and filled contours of geopotential height in decameters, contour interval = 6.

As we’ve done before, let’s first define some variables relevant to Cartopy. Recall that we already defined the areal extent up above when we did the data subsetting.

cLon = -80
cLat = 35

proj_map = ccrs.LambertConformal(central_longitude=cLon, central_latitude=cLat)
proj_data = ccrs.PlateCarree() # Our data is lat-lon; thus its native projection is Plate Carree.
res = '50m'

Now define the range of our contour values and a contour interval. 4 hPa is standard for SLP.

minVal = 900
maxVal = 1080
cint = 4
SLPcintervals = np.arange(minVal, maxVal, cint)
SLPcintervals
array([ 900,  904,  908,  912,  916,  920,  924,  928,  932,  936,  940,
        944,  948,  952,  956,  960,  964,  968,  972,  976,  980,  984,
        988,  992,  996, 1000, 1004, 1008, 1012, 1016, 1020, 1024, 1028,
       1032, 1036, 1040, 1044, 1048, 1052, 1056, 1060, 1064, 1068, 1072,
       1076])
minVal = 468
maxVal = 606
cint = 6
HGHTcintervals = np.arange(minVal, maxVal, cint)
HGHTcintervals
array([468, 474, 480, 486, 492, 498, 504, 510, 516, 522, 528, 534, 540,
       546, 552, 558, 564, 570, 576, 582, 588, 594, 600])

Create a meaningful title string.

tl1 = "ERA-5 " + levelStr + " hPa geopotential heights (color fill, dam) and SLP (lines, hPa)"
tl2 = str('Valid at: '+ timeStr)
title_line = (tl1 + '\n' + tl2 + '\n')

Plot the map.

fig = plt.figure(figsize=(18,12))
ax = plt.subplot(1,1,1,projection=proj_map)
ax.set_extent ([lonW,lonE,latS,latN])
ax.add_feature(cfeature.COASTLINE.with_scale(res))
ax.add_feature(cfeature.STATES.with_scale(res))
CF = ax.contourf(lons,lats,HGHT, levels=HGHTcintervals,transform=proj_data,cmap=plt.get_cmap('coolwarm'))
cbar = plt.colorbar(CF,shrink=0.5)
cbar.ax.tick_params(labelsize=16)
cbar.ax.set_ylabel("Height (dam)",fontsize=16)

CL = ax.contour(lons,lats,SLP,SLPcintervals,transform=proj_data,linewidths=1.25,colors='green')
ax.clabel(CL, inline_spacing=0.2, fontsize=11, fmt='%.0f')
title = ax.set_title(title_line,fontsize=16)
../../_images/042a2c9f693fb722da38013940f0f5feec4d1ed51f95e45c0a902237c43440db.png

We’re missing the outer longitudes at higher latitudes. This is a consequence of reprojecting our data from PlateCarree to Lambert Conformal. We could do one of two things to resolve this:

  1. Re-subset our original datset by extending the longitudinal ranges

  2. Slightly constrain the map plotting region

Let’s use the latter method. The determination of how many degrees longitude to constrain is a process of trial and error; you want to avoid any blank areas in your map, but not restrict your map domain so much that you lose features at the edge of your map.

constrainLon = 3.3 # trial and error!

fig = plt.figure(figsize=(18,12))
ax = plt.subplot(1,1,1,projection=proj_map)
ax.set_extent ([lonW+constrainLon,lonE-constrainLon,latS,latN])
ax.add_feature(cfeature.COASTLINE.with_scale(res))
ax.add_feature(cfeature.STATES.with_scale(res))

CF = ax.contourf(lons,lats,HGHT, levels=HGHTcintervals,transform=proj_data,cmap=plt.get_cmap('coolwarm'))
cbar = plt.colorbar(CF,shrink=0.5)
cbar.ax.tick_params(labelsize=16)
cbar.ax.set_ylabel("Height (dam)",fontsize=16)

CL = ax.contour(lons,lats,SLP,SLPcintervals,transform=proj_data,linewidths=1.25,colors='green')
ax.clabel(CL, inline_spacing=0.2, fontsize=11, fmt='%.0f')
title = ax.set_title(title_line,fontsize=16)
../../_images/838897425de16938783a88871c73ca2f9a0db967aad3d881e9fe5d842ff981c2.png

Summary

  • Via the OpenDAP protocol, Xarray can open remotely-served datasets, such as the ERA-5 from the RDA THREDDS server.

  • Subsetting our data request results in less data being transferred, as well as more consistent contour labels.

What’s Next?

We’ll continue to work with the ERA-5 repository, constructing time series and performing more complicated diagnostic calculations.