{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## 04_Xarray: Overlays from a cloud-served ERA5 archive" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "1. Work with a cloud-served ERA5 archive\n", "2. Subset the Dataset along its dimensions\n", "3. Perform unit conversions\n", "4. Create a well-labeled multi-parameter contour plot of gridded ERA5 data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import xarray as xr\n", "import pandas as pd\n", "import numpy as np\n", "from datetime import datetime as dt\n", "from metpy.units import units\n", "import metpy.calc as mpcalc\n", "import cartopy.crs as ccrs\n", "import cartopy.feature as cfeature\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Select the region, time, and (if applicable) vertical level(s) of interest." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Areal extent\n", "\n", "lonW = -100\n", "lonE = -60\n", "latS = 25\n", "latN = 55\n", "cLat, cLon = (latS + latN)/2, (lonW + lonE)/2\n", "\n", "# Recall that in ERA5, longitudes run between 0 and 360, not -180 and 180\n", "if (lonW < 0 ):\n", " lonW = lonW + 360\n", "if (lonE < 0 ):\n", " lonE = lonE + 360\n", " \n", "expand = 1\n", "latRange = np.arange(latS - expand,latN + expand,.25) # expand the data range a bit beyond the plot range\n", "lonRange = np.arange((lonW - expand),(lonE + expand),.25) # Need to match longitude values to those of the coordinate variable\n", "\n", "# Vertical level specificaton\n", "plevel = 500\n", "levelStr = str(plevel)\n", "\n", "# Date/Time specification\n", "Year = 1998\n", "Month = 5\n", "Day = 31\n", "Hour = 18\n", "Minute = 0\n", "dateTime = dt(Year,Month,Day, Hour, Minute)\n", "timeStr = dateTime.strftime(\"%Y-%m-%d %H%M UTC\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Work with a cloud-served ERA5 archive\n", "\n", "A team at [Google Research & Cloud](https://research.google/) are making parts of the [ECMWF Reanalysis version 5](https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5) (aka **ERA-5**) accessible in a [Analysis Ready, Cloud Optimized](https://www.frontiersin.org/articles/10.3389/fclim.2021.782909/full) (aka **ARCO**) format.\n", "\n", "Access the [ERA-5 ARCO](https://weatherbench2.readthedocs.io/en/latest/) catalog\n", "\n", "The ERA5 archive runs from 1/1/1959 through 1/10/2023. For dates subsequent to the end-date, we'll instead load a local archive." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### If the requested date is later than 1/10/2023, read in the dataset from a locally-stored archive." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endDate = dt(2023,1,10)\n", "if (dateTime <= endDate):\n", "\n", " cloud_source = True\n", " ds = xr.open_dataset(\n", " 'gs://weatherbench2/datasets/era5/1959-2023_01_10-wb13-6h-1440x721.zarr', \n", " chunks={'time': 48},\n", " consolidated=True,\n", " engine='zarr'\n", ")\n", "\n", "else: \n", " import glob, os\n", " cloud_source = False\n", " input_directory = '/free/ktyle/era5'\n", " files = glob.glob(os.path.join(input_directory,'*.nc'))\n", " ds = xr.open_mfdataset(files)\n", "\n", "print(f'size: {ds.nbytes / (1024 ** 4)} TB')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cloud-hosted dataset is a **big** (40+ TB) file! But Xarray is just reading only enough of the file for us to get a look at the dimensions, coordinate and data variables, and other metadata. We call this *lazy loading*, as opposed to *eager loading*, which we will do only after we have subset the dataset over time, lat-lon, and vertical level." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examine the dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Pay attention to the names of the data and coordinate variables! The locally-stored dataset differs from the cloud-served one in all but latitude and longitude.\n", "
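" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A quick way to see which names apply to the source you actually loaded is to list the Dataset's data variables and coordinates. The rename shown below is only an optional sketch (not part of the original workflow): the local-file names (`z`, `msl`, `valid_time`, `pressure_level`) are the ones used later in this notebook, so leave it commented out if you plan to run the remaining cells as written, since they already branch on `cloud_source`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Inspect the variable and coordinate names of whichever dataset was loaded\n", "print(sorted(ds.data_vars))\n", "print(list(ds.coords))\n", "\n", "# Optional sketch: rename the local variables/coordinates to the cloud archive's names\n", "# if not cloud_source:\n", "#     ds = ds.rename({'z': 'geopotential', 'msl': 'mean_sea_level_pressure',\n", "#                     'valid_time': 'time', 'pressure_level': 'level'})\n", "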
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data variables selection and subsetting\n", "We'll retrieve two data variables (i.e. `DataArray`s), geopotential and sea-level pressure, from the `Dataset`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if (cloud_source):\n", " z = ds['geopotential'].sel(time=dateTime,level=plevel,latitude=latRange,longitude=lonRange)\n", " slp = ds['mean_sea_level_pressure'].sel(time=dateTime,latitude=latRange,longitude=lonRange)\n", "else:\n", " z = ds['z'].sel(valid_time=dateTime,pressure_level=plevel,latitude=latRange,longitude=lonRange)\n", " slp = ds['msl'].sel(valid_time=dateTime,latitude=latRange,longitude=lonRange) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define our subsetted coordinate arrays of lat and lon. Pull them from either of the two DataArrays. We'll need to pass these into the contouring functions later on." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lats = z.latitude\n", "lons = z.longitude" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Perform unit conversions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convert geopotential in `m**2/s**2` to geopotential height in decameters (`dam`), and Pascals to hectoPascals (`hPa`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We take the DataArrays and apply [MetPy's unit conversion methods](https://unidata.github.io/MetPy/latest/tutorials/xarray_tutorial.html#units)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "slp = slp.metpy.convert_units('hPa')\n", "z = mpcalc.geopotential_to_height(z).metpy.convert_units('dam')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
These conversions involve actual calculations, so in that cell we are no longer *lazy loading*. It is the first point where bytes of anything other than metadata actually get transferred ... and by then we have already subset the original, much larger dataset down to something much smaller.\n", "
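" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to be explicit about when values are pulled into memory, you can call `.load()` (or `.compute()`) on the subsetted arrays yourself. This is just a sketch: after the subsetting and unit conversions above, the arrays are small, so the call is cheap, and it is essentially a no-op if the values are already in memory." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Explicitly (eagerly) load the subsetted, unit-converted arrays into memory\n", "z = z.load()\n", "slp = slp.load()\n", "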
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "slp.nbytes / 1e6, z.nbytes / 1e6 # size in MB for the subsetted DataArrays" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "z" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a well-labeled multi-parameter contour plot of gridded ERA5 reanalysis data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will make contour fills of 500 hPa height in decameters, with a contour interval of 6 dam, and contour lines of SLP in hPa, contour interval = 4." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we've done before, let's first define some variables relevant to Cartopy. Recall that we already defined the areal extent up above when we did the data subsetting." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "proj_map = ccrs.LambertConformal(central_longitude=cLon, central_latitude=cLat)\n", "proj_data = ccrs.PlateCarree() # Our data is lat-lon; thus its native projection is Plate Carree.\n", "res = '50m'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Now define the range of our contour values and a contour interval. 60 m is standard for 500 hPa." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "minVal = 474\n", "maxVal = 606\n", "cint = 6\n", "Zcintervals = np.arange(minVal, maxVal, cint)\n", "Zcintervals" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "minVal = 900\n", "maxVal = 1080\n", "cint = 4\n", "SLPcintervals = np.arange(minVal, maxVal, cint)\n", "SLPcintervals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot the map, with filled contours of 500 hPa geopotential heights, and contour lines of SLP." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a meaningful title string." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tl1 = f\"ERA5 {levelStr} hPa heights (dam, filled contours) and SLP (lines, hPa)\"\n", "tl2 = f\"Valid at: {timeStr}\"\n", "title_line = (tl1 + '\\n' + tl2 + '\\n')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "proj_map = ccrs.LambertConformal(central_longitude=cLon, central_latitude=cLat)\n", "proj_data = ccrs.PlateCarree()\n", "res = '50m'\n", "fig = plt.figure(figsize=(18,12))\n", "ax = plt.subplot(1,1,1,projection=proj_map)\n", "ax.set_extent ([lonW,lonE,latS,latN])\n", "ax.add_feature(cfeature.COASTLINE.with_scale(res))\n", "ax.add_feature(cfeature.STATES.with_scale(res))\n", "CF = ax.contourf(lons,lats,z, levels=Zcintervals,transform=proj_data,cmap=plt.get_cmap('coolwarm'))\n", "cbar = plt.colorbar(CF,shrink=0.5)\n", "cbar.ax.tick_params(labelsize=16)\n", "cbar.ax.set_ylabel(\"Height (dam)\",fontsize=16)\n", "\n", "CL = ax.contour(lons,lats,slp,SLPcintervals,transform=proj_data,linewidths=1.25,colors='green')\n", "ax.clabel(CL, inline_spacing=0.2, fontsize=11, fmt='%.0f')\n", "title = plt.title(title_line,fontsize=16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Constrain the map plotting region to eliminate blank space on the east & west edges of the domain." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "constrainLon = 7 # trial and error!\n", "\n", "proj_map = ccrs.LambertConformal(central_longitude=cLon, central_latitude=cLat)\n", "proj_data = ccrs.PlateCarree()\n", "res = '50m'\n", "fig = plt.figure(figsize=(18,12))\n", "ax = plt.subplot(1,1,1,projection=proj_map)\n", "ax.set_extent ([lonW+constrainLon,lonE-constrainLon,latS,latN])\n", "ax.add_feature(cfeature.COASTLINE.with_scale(res))\n", "ax.add_feature(cfeature.STATES.with_scale(res))\n", "CF = ax.contourf(lons,lats,z,levels=Zcintervals,transform=proj_data,cmap=plt.get_cmap('coolwarm'))\n", "cbar = plt.colorbar(CF,shrink=0.5)\n", "cbar.ax.tick_params(labelsize=16)\n", "cbar.ax.set_ylabel(\"Height (dam)\",fontsize=16)\n", "\n", "CL = ax.contour(lons,lats,slp,SLPcintervals,transform=proj_data,linewidths=1.25,colors='green')\n", "ax.clabel(CL, inline_spacing=0.2, fontsize=11, fmt='%.0f')\n", "title = plt.title(title_line,fontsize=16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Things to try:\n", "1. Change the date and time\n", "2. Change the region\n", "3. Use a different map projection for your plot\n", "4. Select and overlay different variables" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 Jan. 2025 Environment", "language": "python", "name": "jan25" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.8" } }, "nbformat": 4, "nbformat_minor": 4 }