{
"cells": [
{
"cell_type": "markdown",
"id": "clinical-console",
"metadata": {},
"source": [
"# Plotting HRRR 2-meter temperatures"
]
},
{
"cell_type": "markdown",
"id": "a32df8af-ecc4-4db7-a211-277eb46e9133",
"metadata": {},
"source": [
"## Overview\n",
"1. Access archived HRRR data hosted on AWS in Zarr format\n",
"2. Visualize one of the variables (2m temperature) at an analysis time"
]
},
{
"cell_type": "markdown",
"id": "1ec4a6c8-6042-4e91-920f-1e709d14a8b8",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"| Concepts | Importance | Notes |\n",
"| --- | --- | --- |\n",
"| Xarray Lessons 1-9 | Necessary | |\n",
"\n",
"* **Time to learn**: 30 minutes\n",
"***"
]
},
{
"cell_type": "markdown",
"id": "860a2761-73dc-4fa3-9f09-8336802f2c55",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fluid-transfer",
"metadata": {},
"outputs": [],
"source": [
"import xarray as xr\n",
"import s3fs\n",
"import metpy\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import cartopy.crs as ccrs\n",
"import cartopy.feature as cfeature"
]
},
{
"cell_type": "markdown",
"id": "eec3a703-f324-4f30-9790-303b5a794611",
"metadata": {},
"source": [
"## What is Zarr?"
]
},
{
"cell_type": "markdown",
"id": "b42dfcf9-821c-46fe-afdb-5a8bb48ba5bb",
"metadata": {},
"source": [
"So far, we have used Xarray to work with gridded datasets in NetCDF and GRIB formats. Zarr is a relatively new data format that is particularly relevant in two scenarios:\n",
"1. Datasets stored in *object storage*, the storage model commonly used by cloud providers such as Amazon, Google, and Microsoft.\n",
"2. Datasets too large to load into memory all at once."
]
},
{
"cell_type": "markdown",
"id": "eafa1fa1-25ef-44ac-af09-4b4eff4eacff",
"metadata": {},
"source": [
"The [Pangeo](https://pangeo.io) project specifically recommends [Zarr as the Xarray-amenable data format of choice in the cloud](https://pangeo.io/data.html):\n",
">\n",
">\"Our current preference for storing multidimensional array data in the cloud is the Zarr format. Zarr is a new storage format which, thanks to its simple yet well-designed specification, makes large datasets easily accessible to distributed computing. In Zarr datasets, the arrays are divided into chunks and compressed. These individual chunks can be stored as files on a filesystem or as objects in a cloud storage bucket. The metadata are stored in lightweight .json files. Zarr works well on both local filesystems and cloud-based object stores. Existing datasets can easily be converted to zarr via xarray’s zarr functions.\"\n"
]
},
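{
"cell_type": "markdown",
"id": "zarr-local-sketch",
"metadata": {},
"source": [
"As the quote notes, Xarray can write Zarr as well as read it. A minimal sketch, using a small made-up `Dataset` and an illustrative local path `example.zarr`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "zarr-local-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"# Write a tiny in-memory Dataset to a local Zarr store, then read it back\n",
"ds_demo = xr.Dataset({'t2m': ('x', np.arange(5.0))})\n",
"ds_demo.to_zarr('example.zarr', mode='w')\n",
"xr.open_zarr('example.zarr')"
]
},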
{
"cell_type": "markdown",
"id": "parallel-strike",
"metadata": {},
"source": [
"## Access archived HRRR data hosted on AWS in Zarr format"
]
},
{
"cell_type": "markdown",
"id": "004b16ec-c133-4662-aa88-a43bd8f69ab4",
"metadata": {},
"source": [
"For a number of years, the [Mesowest](https://mesowest.utah.edu/) group at the University of Utah has hosted an archive of data from NCEP's High Resolution Rapid Refresh model. This data, originally in GRIB-2 format, has been converted into Zarr and is freely available \"in the cloud\", on [Amazon Web Service's Simple Storage Service](https://aws.amazon.com/s3/), otherwise known as **S3**. Data is stored in S3 in a manner akin to (but different from) a Linux filesystem, using a [*bucket* and *object*](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html#CoreConcepts) model."
]
},
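{
"cell_type": "markdown",
"id": "s3fs-browse-sketch",
"metadata": {},
"source": [
"We can also browse the bucket programmatically with the **s3fs** library; since the bucket is public, anonymous access suffices. A quick sketch (the `sfc` directory name here is an assumption based on the archive's layout):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "s3fs-browse-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"# List objects in the public hrrrzarr bucket; no AWS credentials required\n",
"fs = s3fs.S3FileSystem(anon=True)\n",
"fs.ls('hrrrzarr/sfc')[:10]"
]
},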
{
"cell_type": "markdown",
"id": "behavioral-apparatus",
"metadata": {},
"source": [
"To interactively browse the contents of this archive, go to this link: [HRRRZarr File Browser on AWS](https://hrrrzarr.s3.amazonaws.com/index.html)"
]
},
{
"cell_type": "markdown",
"id": "927ac8c8-36d7-4b70-9773-b5b5c628b842",
"metadata": {},
"source": [
"To access Zarr-formatted data stored in an S3 bucket, we follow a 3-step process:\n",
"1. Create URL(s) pointing to the bucket and object(s) that contain the data we want\n",
"1. Create *map(s)* to the object(s) with the **s3fs** library's `S3Map` method\n",
"1. Pass the *map(s)* to Xarray's `open_dataset` or `open_mfdataset` methods, specifying `zarr` as the format via the `engine` argument."
]
},
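{
"cell_type": "markdown",
"id": "zarr-access-sketch",
"metadata": {},
"source": [
"A minimal sketch of these three steps, assuming we want the 2m temperature (`TMP` at `2m_above_ground`) from the 00z analysis on a particular date; the date and variable path are illustrative, following the directory layout of the `hrrrzarr` archive:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "zarr-access-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"# 1. URLs pointing to the metadata and data directories for one variable\n",
"url1 = 's3://hrrrzarr/sfc/20210601/20210601_00z_anl.zarr/2m_above_ground/TMP'\n",
"url2 = url1 + '/2m_above_ground'\n",
"# 2. Create maps to the objects with s3fs (public bucket, so connect anonymously)\n",
"fs = s3fs.S3FileSystem(anon=True)\n",
"file1 = s3fs.S3Map(url1, s3=fs)\n",
"file2 = s3fs.S3Map(url2, s3=fs)\n",
"# 3. Pass the maps to Xarray, specifying zarr as the engine\n",
"ds = xr.open_mfdataset([file1, file2], engine='zarr')\n",
"ds"
]
},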
{
"cell_type": "markdown",
"id": "3af78d85-d544-4b26-9183-241209d05e86",
"metadata": {},
"source": [
"In the HRRR Zarr archive, each variable's metadata and its chunked data arrays live in two separate directories, so we use the `open_mfdataset` method and pass in two AWS S3 file references to these two corresponding directories.\n",
"\n",
"HRRR's map projection assumes a spherical earth, so when we plot we construct a `Globe` in Cartopy with these values.\n",
"\n",
"Because we opened the dataset with `open_mfdataset`, the resulting objects are Dask objects, and each variable is a chunked `DataArray` in Xarray. If we want to perform a computation on this array, e.g. calculate the mean, min, or max, note that we don't get a result straightaway ... we get another Dask array. We call Dask's `compute` function to actually trigger the computation."
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}