{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Xarray 4: Remote access" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "1. Work with a multiple-file Xarray `Dataset` hosted on NCAR's Research Data Archive\n", "2. Subset the Dataset along its dimensions\n", "3. Perform unit conversion\n", "4. Create a well-labeled multi-parameter contour plot of gridded ERA-5 reanalysis data\n", "\n", "## Prerequisites\n", "\n", "| Concepts | Importance | Notes |\n", "| --- | --- | --- |\n", "| Xarray Lessons 1-3| Necessary | |\n", "\n", "* **Time to learn**: 30 minutes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import xarray as xr\n", "import pandas as pd\n", "import numpy as np\n", "from datetime import datetime as dt\n", "from metpy.units import units\n", "from metpy import calc as mpcalc\n", "import cartopy.crs as ccrs\n", "import cartopy.feature as cfeature\n", "import matplotlib.pyplot as plt\n", "import requests\n", "from pathlib import Path" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Work with the ERA-5 Dataset hosted on NCAR's [Remote Data Archive](https://rda.ucar.edu/datasets/ds633.0/#description)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Datasets are ever-growing in terms of size and sheer number. It is not \"remotely\" possible to download every dataset one might want to use to your computer! Fortunately, although Xarray provides easy access to files stored \"locally\" on disk, it can also give you access to files stored on remotely-located servers via different types of *data access protocols*. One such protocol is called **OpenDAP**. It makes files available via a specially-crafted web link, or *URL*. This is how we will access ERA-5 datasets stored on the ERA-5 [RDA's THREDDS server](https://thredds.rda.ucar.edu/thredds/catalog/files/g/ds633.0/catalog.html) data catalog." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Besides OpenDAP, datasets are increasingly served via cloud providers, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. We will explore how Xarray can access these datasets in a future lesson.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Construct OpenDAP URL's for the ERA-5 datasets served by the RDA server." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Note: RDA follows a particular convention in the directory and file names for the ERA-5 dataset. Single-level fields are stored in one monthly file, with a naming convention that includes the string sfc. Fields on multiple levels, such as pressure (isobaric) levels, are stored in daily files. For isobaric levels, the naming convention includes the string pl. In this case, we will retrieve sea-level pressure (SLP) and geopotential on isobaric surfaces, so we will be both of these two strings in our URLs.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds_urlp = 'https://thredds.rda.ucar.edu/thredds/dodsC/files/g/ds633.0/e5.oper.an.sfc/201210/e5.oper.an.sfc.128_151_msl.ll025sc.2012100100_2012103123.nc'\n", "ds_urlz = 'https://thredds.rda.ucar.edu/thredds/dodsC/files/g/ds633.0/e5.oper.an.pl/201210/e5.oper.an.pl.128_129_z.ll025sc.2012103000_2012103023.nc'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now open the two files as separate Datasets. (avoid using open_mfdataset as subsetting will cause failures down the line)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Note: As we will see in future lessons, Xarray provides a method, open_mfdataset, that allows for one Dataset to be created from multiple sources. However, this may lead to strange errors after we subset the data later in this notebook. This problem seems specific to using open_mfdataset on file served by THREDDS.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dsSLP = xr.open_dataset(ds_urlp)\n", "dsZ = xr.open_dataset(ds_urlz)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \n", "Let's see what one of these Datasets looks like." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dsZ" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dsSLP" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dsSize = dsZ.nbytes / 1e9\n", "print (\"Size of dataset = %.3f GB\" % dsSize)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
That's a fairly large dataset! However, Xarray uses what's called lazy loading, in that the actual data transfer only occurs when we actually need the data to perform computations or visualizations. By subsetting the data so we only get what we want, the actual data transfer will be much smaller.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Subset the Dataset along its dimensions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We noticed in the previous notebook that our contour labels were not appearing with every contour line. This is because we passed the entire horizontal extent (all latitudes and longitudes) to the `ax.contour` method. Since our intent is to plot only over a regional subset, we will use the `sel` method on the latitude and longitude dimensions as well as time and isobaric surface.\n", "\n", "We'll also use Datetime and string methods to more dynamically assign various dimensional specifications, as well as aid in figure-labeling later on." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Areal extent\n", "lonW = -100\n", "lonE = -60\n", "latS = 20\n", "latN = 50\n", "latRange = np.arange(latS-5,latN+5,.25) # expand the data range a bit beyond the plot range\n", "lonRange = np.arange((lonW-5+360),(lonE+5+360),.25) # Need to match longitude values to those of the coordinate variable\n", "\n", "# Vertical level specificaton\n", "vlevel = 500\n", "levelStr = str(vlevel)\n", "\n", "# Date/Time specification\n", "Year = 2012\n", "Month = 10\n", "Day = 30\n", "Hour = 0\n", "Minute = 0\n", "dateTime = dt(Year,Month,Day, Hour, Minute)\n", "timeStr = dateTime.strftime(\"%Y-%m-%d %H%M UTC\")\n", "\n", "# Data variable selection\n", "\n", "SLP = dsSLP['MSL'].sel(time=dateTime,latitude=latRange,longitude=lonRange)\n", "Z = dsZ['Z'].sel(time=dateTime,latitude=latRange,longitude=lonRange,level=vlevel)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's look at some of the attributes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Z.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Z.dims" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Z.units" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "SLP.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "SLP.dims" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "SLP.units" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define our subsetted coordinate arrays of lat and lon. Pull them from any of the DataArrays. We'll need to pass these into the contouring functions later on." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lats = SLP.latitude\n", "lons = SLP.longitude" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lats" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lons" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "SLP" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Perform unit conversions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Traditionally, we plot SLP in units of hectopascals (hPa) and geopotential heights in units of decameters. The SLP units conversion is a bit more straightforward than for Geopotential to Height. We take the DataArray and apply [MetPy's unit conversion method](https://unidata.github.io/MetPy/latest/tutorials/xarray_tutorial.html#units)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "SLP = SLP.metpy.convert_units('hPa')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, convert geopotential to height, as we did in the previous notebook. In the same line of code, convert the resulting units from meters to decameters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "HGHT = mpcalc.geopotential_to_height(Z).metpy.convert_units('dam')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create a well-labeled multi-parameter contour plot of gridded ERA-5 reanalysis data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will make contour lines of SLP in hPa, contour interval = 4, and filled contours of geopotential height in decameters, contour interval = 6." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we've done before, let's first define some variables relevant to Cartopy. Recall that we already defined the areal extent up above when we did the data subsetting." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cLon = -80\n", "cLat = 35\n", "\n", "proj_map = ccrs.LambertConformal(central_longitude=cLon, central_latitude=cLat)\n", "proj_data = ccrs.PlateCarree() # Our data is lat-lon; thus its native projection is Plate Carree.\n", "res = '50m'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now define the range of our contour values and a contour interval. 4 hPa is standard for SLP." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "minVal = 900\n", "maxVal = 1080\n", "cint = 4\n", "SLPcintervals = np.arange(minVal, maxVal, cint)\n", "SLPcintervals" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "minVal = 468\n", "maxVal = 606\n", "cint = 6\n", "HGHTcintervals = np.arange(minVal, maxVal, cint)\n", "HGHTcintervals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a meaningful title string." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tl1 = \"ERA-5 \" + levelStr + \" hPa geopotential heights (color fill, dam) and SLP (lines, hPa)\"\n", "tl2 = str('Valid at: '+ timeStr)\n", "title_line = (tl1 + '\\n' + tl2 + '\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot the map." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = plt.figure(figsize=(18,12))\n", "ax = plt.subplot(1,1,1,projection=proj_map)\n", "ax.set_extent ([lonW,lonE,latS,latN])\n", "ax.add_feature(cfeature.COASTLINE.with_scale(res))\n", "ax.add_feature(cfeature.STATES.with_scale(res))\n", "CF = ax.contourf(lons,lats,HGHT, levels=HGHTcintervals,transform=proj_data,cmap=plt.get_cmap('coolwarm'))\n", "cbar = plt.colorbar(CF,shrink=0.5)\n", "cbar.ax.tick_params(labelsize=16)\n", "cbar.ax.set_ylabel(\"Height (dam)\",fontsize=16)\n", "\n", "CL = ax.contour(lons,lats,SLP,SLPcintervals,transform=proj_data,linewidths=1.25,colors='green')\n", "ax.clabel(CL, inline_spacing=0.2, fontsize=11, fmt='%.0f')\n", "title = ax.set_title(title_line,fontsize=16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're missing the outer longitudes at higher latitudes. This is a consequence of reprojecting our data from PlateCarree to Lambert Conformal. We could do one of two things to resolve this:\n", "\n", "1. Re-subset our original datset by extending the longitudinal ranges\n", "2. Slightly constrain the map plotting region\n", "\n", "Let's use the latter method. The determination of how many degrees longitude to constrain is a process of trial and error; you want to avoid any blank areas in your map, but not restrict your map domain so much that you lose features at the edge of your map." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "constrainLon = 3.3 # trial and error!\n", "\n", "fig = plt.figure(figsize=(18,12))\n", "ax = plt.subplot(1,1,1,projection=proj_map)\n", "ax.set_extent ([lonW+constrainLon,lonE-constrainLon,latS,latN])\n", "ax.add_feature(cfeature.COASTLINE.with_scale(res))\n", "ax.add_feature(cfeature.STATES.with_scale(res))\n", "\n", "CF = ax.contourf(lons,lats,HGHT, levels=HGHTcintervals,transform=proj_data,cmap=plt.get_cmap('coolwarm'))\n", "cbar = plt.colorbar(CF,shrink=0.5)\n", "cbar.ax.tick_params(labelsize=16)\n", "cbar.ax.set_ylabel(\"Height (dam)\",fontsize=16)\n", "\n", "CL = ax.contour(lons,lats,SLP,SLPcintervals,transform=proj_data,linewidths=1.25,colors='green')\n", "ax.clabel(CL, inline_spacing=0.2, fontsize=11, fmt='%.0f')\n", "title = ax.set_title(title_line,fontsize=16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Summary\n", "* Via the OpenDAP protocol, Xarray can open remotely-served datasets, such as the ERA-5 from the RDA THREDDS server.\n", "* Subsetting our data request results in less data being transferred, as well as more consistent contour labels.\n", "\n", "### What's Next?\n", "We'll continue to work with the ERA-5 repository, constructing time series and performing more complicated diagnostic calculations.\n", "\n", "## Resources and References\n", "1. [About ERA-5](https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5)\n", "1. [RDA's ERA-5 Documentation](https://rda.ucar.edu/datasets/ds633.0)\n", "1. [Data Table for ERA-5 Surface Variables](https://rda.ucar.edu/OS/web/datasets/ds633.0/docs/ds633.0.e5.oper.an.sfc.grib1.table.web.txt)\n", "1. [Data Table for ERA-5 Pressure Levels Variables](https://rda.ucar.edu/OS/web/datasets/ds633.0/docs/ds633.0.e5.oper.an.pl.grib1.table.web.txt)\n", "1. [The THREDDS Data Server](https://www.unidata.ucar.edu/software/tds/current/)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 August 2023 Environment", "language": "python", "name": "aug23" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 4 }