{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "adc8fe25-feea-4121-8743-e786349e0ab3",
   "metadata": {},
   "source": [
    "<img src=\"../../images/xarray_aws_zarr.png\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "clinical-console",
   "metadata": {},
   "source": [
    "# Plotting HRRR 2-meter temperatures"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a32df8af-ecc4-4db7-a211-277eb46e9133",
   "metadata": {},
   "source": [
    "## Overview\n",
    "1. Access archived HRRR data hosted on AWS in Zarr format\n",
    "2. Visualize one of the variables (2m temperature) at an analysis time"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ec4a6c8-6042-4e91-920f-1e709d14a8b8",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "| Concepts | Importance | Notes |\n",
    "| --- | --- | --- |\n",
    "| Xarray Lessons 1-9| Necessary | |\n",
    "\n",
    "* **Time to learn**: 30 minutes\n",
    "***"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "860a2761-73dc-4fa3-9f09-8336802f2c55",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fluid-transfer",
   "metadata": {},
   "outputs": [],
   "source": [
    "import xarray as xr\n",
    "import s3fs\n",
    "import metpy\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import cartopy.crs as ccrs\n",
    "import cartopy.feature as cfeature"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eec3a703-f324-4f30-9790-303b5a794611",
   "metadata": {},
   "source": [
    "## What is Zarr?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b42dfcf9-821c-46fe-afdb-5a8bb48ba5bb",
   "metadata": {},
   "source": [
    "So far we have used Xarray to work with gridded datasets in NetCDF and GRIB formats. Zarr is a relatively new data format. It is particularly relevant in the following two scenarios:\n",
    "1. Datasets that are stored in what's called *object store*. This is a commonly-used storage method for cloud providers, such as Amazon, Google, and Microsoft.\n",
    "2. Datasets that are typically too large to load into memory all at once."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eafa1fa1-25ef-44ac-af09-4b4eff4eacff",
   "metadata": {},
   "source": [
    "The [Pangeo](https://pangeo.io) project specifically recommends [Zarr as the Xarray-amenable data format of choice in the cloud](https://pangeo.io/data.html):\n",
    ">\n",
    ">\"Our current preference for storing multidimensional array data in the cloud is the Zarr format. Zarr is a new storage format which, thanks to its simple yet well-designed specification, makes large datasets easily accessible to distributed computing. In Zarr datasets, the arrays are divided into chunks and compressed. These individual chunks can be stored as files on a filesystem or as objects in a cloud storage bucket. The metadata are stored in lightweight .json files. Zarr works well on both local filesystems and cloud-based object stores. Existing datasets can easily be converted to zarr via xarray’s zarr functions.\"\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "parallel-strike",
   "metadata": {},
   "source": [
    "## Access archived HRRR data hosted on AWS in Zarr format</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "004b16ec-c133-4662-aa88-a43bd8f69ab4",
   "metadata": {},
   "source": [
    "For a number of years, the [Mesowest](https://mesowest.utah.edu/) group at the University of Utah has hosted an archive of data from NCEP's High Resolution Rapid Refresh model. This data, originally in GRIB-2 format, has been converted into Zarr and is freely available \"in the cloud\", on [Amazon Web Service's Simple Storage Service](https://aws.amazon.com/s3/), otherwise known as **S3**. Data is stored in S3 in a manner akin to (but different from) a Linux filesystem, using a [*bucket* and *object*](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html#CoreConcepts) model."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "behavioral-apparatus",
   "metadata": {},
   "source": [
    "To interactively browse the contents of this archive, go to this link: [HRRRZarr File Browser on AWS](https://hrrrzarr.s3.amazonaws.com/index.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "927ac8c8-36d7-4b70-9773-b5b5c628b842",
   "metadata": {},
   "source": [
    "To access Zarr-formatted data stored in an S3 bucket, we follow a 3-step process:\n",
    "1. Create URL(s) pointing to the bucket and object(s) that contain the data we want\n",
    "1. Create *map(s)* to the object(s) with the **s3fs** library's `S3Map` method\n",
    "1. Pass the *map(s)* to Xarray's `open_dataset` or `open_mfdataset` methods, and specify `zarr` as the format, via the `engine` argument."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3af78d85-d544-4b26-9183-241209d05e86",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "A quirk in how these grids were converted from GRIB2 to Zarr means that the dimension variables are defined one directory up from where the data variables are. Thus, our strategy is to use Xarray's <code>open_mfdataset</code> method and pass in two AWS S3 file references to these two corresponding directories.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52dded5b-f813-4177-9400-f800905f4000",
   "metadata": {},
   "source": [
    "Create the URLs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc62deb0-cfd2-4720-ab84-2e0bbece8c3a",
   "metadata": {},
   "outputs": [],
   "source": [
    "date = '20210214'\n",
    "hour = '18'\n",
    "var = 'TMP'\n",
    "level = '2m_above_ground'\n",
    "url1 = 's3://hrrrzarr/sfc/' + date + '/' + date + '_' + hour + 'z_anl.zarr/' + level + '/' + var + '/' + level\n",
    "url2 = 's3://hrrrzarr/sfc/' + date + '/' + date + '_' + hour + 'z_anl.zarr/' + level + '/' + var\n",
    "print(url1)\n",
    "print(url2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a286eda-418b-4b95-a0de-fceb6feb393e",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    In this case, <b>hrrrzarr</b> is the S3 <i>bucket</i> name. <b>2m_above_ground</b> and <b>TMP</b> are both <i>objects</i> within the <b>bucket</b>. The former object has the 2-meter temperature array, while the latter contains the coordinate arrays of the spatial dimensions of 2m temperature (i.e., x and y).\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d661c992-5db9-40af-8ad4-fac7b28dad2f",
   "metadata": {},
   "source": [
    "Create the S3 maps from the S3 object store."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "530efd16-4779-4ace-9e71-0bbf6119056e",
   "metadata": {},
   "outputs": [],
   "source": [
    "fs = s3fs.S3FileSystem(anon=True)\n",
    "file1 = s3fs.S3Map(url1, s3=fs)\n",
    "file2 = s3fs.S3Map(url2, s3=fs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d76f2f8b-059c-43bc-9bc6-0f7070c254c6",
   "metadata": {},
   "source": [
    "Use Xarray's `open_mfdataset` to create a `Dataset` from these two S3 objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23db3344-cf7e-48c2-b907-039a195dc73d",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = xr.open_mfdataset([file1,file2], engine='zarr')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "processed-importance",
   "metadata": {},
   "source": [
    "Examine the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ceramic-letter",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e45ee292-a90b-4716-894e-96932027f5ca",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "The projection information for the HRRR was not readily found in the Zarr representation, so we will need to define the relevant parameters explicitly from other sources, shown below.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "worldwide-thumbnail",
   "metadata": {},
   "source": [
    "### HRRR Grid Navigation: \n",
    "     PROJECTION:          LCC                 \n",
    "     ANGLES:                38.5   -97.5    38.5\n",
    "     GRID SIZE:             1799    1059\n",
    "     LL CORNER:            21.1381 -122.7195\n",
    "     UR CORNER:            47.8422  -60.9168"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "embedded-lafayette",
   "metadata": {},
   "outputs": [],
   "source": [
    "lon1 = -97.5\n",
    "lat1 = 38.5\n",
    "slat = 38.5\n",
    "projData= ccrs.LambertConformal(central_longitude=lon1,\n",
    "                             central_latitude=lat1,\n",
    "                             standard_parallels=[slat,slat],globe=ccrs.Globe(semimajor_axis=6371229,\n",
    "                                        semiminor_axis=6371229))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16efe72d-53ea-4f24-a5cf-ca8b46a7a2d5",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\"><b>Note: </b>\n",
    "    The HRRR's projection assumes a <i>spherical earth</i>, whose semi-major/minor axes are both equal to 6371.229 km. We therefore need to explicitly define a <code>Globe</code> in Cartopy with these values.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "typical-canvas",
   "metadata": {},
   "source": [
    "Examine the dataset's coordinate variables. Each x- and y- value represents distance in meters from the central latitude and longitude."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "rotary-logic",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds.coords"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "entertaining-telescope",
   "metadata": {},
   "source": [
    "Create an object pointing to the dataset's data variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "interior-springer",
   "metadata": {},
   "outputs": [],
   "source": [
    "airTemp = ds.TMP"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sustainable-divorce",
   "metadata": {},
   "source": [
    "When we examine the object, we see that it is a special type of `DataArray` ... a *Dask* array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aerial-weekly",
   "metadata": {},
   "outputs": [],
   "source": [
    "airTemp"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8b42a80-95ed-4fcf-bd6a-f3af8ce90025",
   "metadata": {},
   "source": [
    "When we examine the object, we see that it is a special type of `DataArray` ... a *[DaskArray](https://docs.dask.org/en/latest/array.html)*."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "449b8917-17a8-4e38-9f8f-72f7f053289e",
   "metadata": {},
   "source": [
    "## Sidetrip: Dask"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58fb366d-2386-4592-ba66-71d68b116028",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">Dask is a Python library that is especially well-suited for handling very large datasets (especially those that are too large to fit into RAM) and is nicely integrated with Xarray. We're going to defer a detailed exploration of Dask for now. But suffice it to say that when we use <code>open_mfdataset</code>, the resulting objects are <i>Dask</i> objects.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ideal-component",
   "metadata": {},
   "source": [
    "MetPy supports Dask arrays, and so performing a unit conversion is straightforward."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "distinct-fraction",
   "metadata": {},
   "outputs": [],
   "source": [
    "airTemp = airTemp.metpy.convert_units('degC')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "social-pension",
   "metadata": {},
   "source": [
    "Verify that the object has the unit change"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "modular-receiver",
   "metadata": {},
   "outputs": [],
   "source": [
    "airTemp"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "pacific-alliance",
   "metadata": {},
   "source": [
    "Similar to what we did for datasets whose projection-related coordinates were latitude and longitude, we define objects pointing to x and y now, so we can pass them to the plotting functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "universal-syndrome",
   "metadata": {},
   "outputs": [],
   "source": [
    "x = airTemp.projection_x_coordinate\n",
    "y = airTemp.projection_y_coordinate"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "split-trail",
   "metadata": {},
   "source": [
    "## Visualize 2m temperature at an analysis time"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "loved-variance",
   "metadata": {},
   "source": [
    "First, just use Xarray's `plot` function to get a quick look to verify that things look right."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "pressed-combining",
   "metadata": {
    "tags": [
     "remove-stderr"
    ]
   },
   "outputs": [],
   "source": [
    "airTemp.plot(figsize=(11,8.5))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a17cea46-493a-478d-a196-682949137591",
   "metadata": {},
   "source": [
    "To facilitate the bounds of the contour intervals, obtain the min and max values from this `DataArray`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aabfb20e-1386-488d-8158-358ce21e6ef0",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "    A Dask array is even more <i>lazy</i> in terms of its data loading than a basic <code>DataArray</code> in Xarray. If we want to perform a computation on this array, e.g. calculate the mean, min, or max, note that we don't get a result straightaway ... we get another Dask array.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee181450-e313-454e-804d-a9ee0f593506",
   "metadata": {},
   "outputs": [],
   "source": [
    "airTemp.min()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26bd8a76-12d5-49a6-90eb-2f818cb96a32",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "With Dask arrays, applying the min and max functions doesn't actually do the computation ... instead, it is creating a <i>task graph</i> which describes how the computations would be launched. You will need to call Dask's <code>compute</code> function to actually trigger the computation.   \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "portable-compensation",
   "metadata": {
    "tags": [
     "remove-stderr"
    ]
   },
   "outputs": [],
   "source": [
    "minTemp = airTemp.min().compute()\n",
    "maxTemp = airTemp.max().compute()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e9ec0a48-be58-47f3-a4f4-deb7687af67f",
   "metadata": {},
   "outputs": [],
   "source": [
    "minTemp.values, maxTemp.values"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "honest-chair",
   "metadata": {},
   "source": [
    "Based on the min and max, define a range of values used for contouring. Let's invoke NumPy's `floor` and `ceil`(ing) functions so these values conform to whatever variable we are contouring."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "treated-freedom",
   "metadata": {},
   "outputs": [],
   "source": [
    "fint = np.arange(np.floor(minTemp.values),np.ceil(maxTemp.values) + 2, 2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22a3e565-e388-4430-8cf5-f40d72ad5054",
   "metadata": {},
   "outputs": [],
   "source": [
    "fint"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cdd75f32-faad-47c2-bb8b-a32430b5daae",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    " For a single map, setting the contour fill values as we did above is appropriate. But if you were producing a series of maps that span a range of times, a consistent (and thus wider) range of values would be better.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "utility-peripheral",
   "metadata": {},
   "source": [
    "## Plot the map\n",
    "We'll define the plot extent to nicely encompass the HRRR's spatial domain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "spectacular-spirit",
   "metadata": {
    "tags": [
     "remove-stderr"
    ]
   },
   "outputs": [],
   "source": [
    "latN = 50.4\n",
    "latS = 24.25\n",
    "lonW = -123.8\n",
    "lonE = -71.2\n",
    "\n",
    "res = '50m'\n",
    "\n",
    "fig = plt.figure(figsize=(18,12))\n",
    "ax = plt.subplot(1,1,1,projection=projData)\n",
    "ax.set_extent ([lonW,lonE,latS,latN],crs=ccrs.PlateCarree())\n",
    "ax.add_feature(cfeature.COASTLINE.with_scale(res))\n",
    "ax.add_feature(cfeature.STATES.with_scale(res))\n",
    "\n",
    "# Add the title\n",
    "tl1 = str('HRRR 2m temperature ($^\\circ$C)')\n",
    "tl2 = str('Analysis valid at: '+ hour + '00 UTC ' + date  )\n",
    "plt.title(tl1+'\\n'+tl2,fontsize=16)\n",
    "\n",
    "# Contour fill\n",
    "CF = ax.contourf(x,y,airTemp,levels=fint,cmap=plt.get_cmap('coolwarm'))\n",
    "# Make a colorbar for the ContourSet returned by the contourf call.\n",
    "cbar = fig.colorbar(CF,shrink=0.5)\n",
    "cbar.set_label(r'2m Temperature ($^\\circ$C)', size='large')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20d36311-c750-4368-b266-70dbd6e3fec7",
   "metadata": {},
   "source": [
    "---\n",
    "## Summary\n",
    "* Xarray can access datasets in Zarr format, which is ideal for a cloud-based object store system such as S3.\n",
    "* Xarray and MetPy both support *Dask*, a library that is particularly well-suited for very large datasets.\n",
    "\n",
    "## What's next?\n",
    "On your own, browse the [**hrrrzarr**](https://hrrrzarr.s3.amazonaws.com/index.html) S3 bucket. Try making maps for different variables and/or different times.\n",
    "\n",
    "## Resources and References\n",
    "\n",
    "1. [HRRR in Zarr format](https://mesowest.utah.edu/html/hrrr/)\n",
    "1. [NCEP's HRRR S3 archive (GRIB format)](https://registry.opendata.aws/noaa-hrrr-pds/)\n",
    "1. [What is *object store*?](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html)\n",
    "1. [Xarray's Dask implementation](http://xarray.pydata.org/en/stable/user-guide/dask.html)"
   ]
  }
 ],
 "metadata": {
  "execution": {
   "allow_errors": false
  },
  "kernelspec": {
   "display_name": "Python 3 August 2022 Environment",
   "language": "python",
   "name": "aug22"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}