Rasters 1: Intro to raster data

Overview

  1. Access real-time NWP data from Unidata’s THREDDS server

  2. Use Xarray’s plot.imshow to display gridded data as an image

  3. Use Xarray’s open_rasterio method to open and view high-resolution elevation data

  4. Close data objects when you are no longer using them.

Prerequisites

Concepts

Importance

Notes

Xarray

Necessary

  • Time to learn: 15 minutes


Imports

import xarray as xr
from datetime import datetime, timedelta

Access real-time NWP data from Unidata’s THREDDS server

Parse the current time and choose a recent hour when we might expect the model run to be complete. There is normally a ~90 minute lag, but we’ll give it a bit more than that.

now = datetime.now()
year = now.year
month = now.month
day = now.day
hour = now.hour
minute = now.minute
print (year, month, day, hour,minute)
2023 1 19 17 32
# The previous hour's data is typically available shortly after the half-hour, but give it a little extra time 
if (minute < 45): 
    hrDelta = 3
else:
    hrDelta = 2
runTime = now - timedelta(hours=hrDelta)
runTimeStr = runTime.strftime('%Y%m%d %H00 UTC')
modelDate = runTime.strftime('%Y%m%d')
modelHour = runTime.strftime('%H')
modelDay = runTime.strftime('%D')
print (modelDay)
print (modelHour)
print (runTimeStr)
01/19/23
14
20230119 1400 UTC

Create an object pointing to the dataset.

data_url = 'https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/HRRR/CONUS_2p5km/HRRR_CONUS_2p5km_'+ modelDate + '_' + modelHour + '00.grib2'
data_url
'https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/HRRR/CONUS_2p5km/HRRR_CONUS_2p5km_20230119_1400.grib2'
ds = xr.open_dataset(data_url)
xrdSize = ds.nbytes / 1e9
print ("Size of dataset = %.3f GB" % xrdSize)
Size of dataset = 18.159 GB
ds
<xarray.Dataset>
Dimensions:                                                                                                              (
                                                                                                                          x: 2145,
                                                                                                                          y: 1377,
                                                                                                                          time: 19,
                                                                                                                          time1: 19,
                                                                                                                          time2: 18,
                                                                                                                          ...
                                                                                                                          height_above_ground_layer2_bounds_1: 2,
                                                                                                                          pressure_difference_layer2_bounds_1: 2,
                                                                                                                          height_above_ground_layer3_bounds_1: 2,
                                                                                                                          isobaric_layer_bounds_1: 2,
                                                                                                                          sigma_layer_bounds_1: 2,
                                                                                                                          pressure_difference_layer3_bounds_1: 2)
Coordinates: (12/22)
  * x                                                                                                                    (x) float32 ...
  * y                                                                                                                    (y) float32 ...
    reftime                                                                                                              datetime64[ns] ...
  * time                                                                                                                 (time) datetime64[ns] ...
  * time1                                                                                                                (time1) datetime64[ns] ...
  * time2                                                                                                                (time2) datetime64[ns] ...
    ...                                                                                                                   ...
  * height_above_ground3                                                                                                 (height_above_ground3) float32 ...
  * isobaric_layer                                                                                                       (isobaric_layer) float32 ...
  * sigma_layer                                                                                                          (sigma_layer) float32 ...
  * pressure_difference_layer3                                                                                           (pressure_difference_layer3) float32 ...
  * isobaric                                                                                                             (isobaric) float32 ...
  * isobaric1                                                                                                            (isobaric1) float32 ...
Dimensions without coordinates: time1_bounds_1, time2_bounds_1,
                                pressure_difference_layer_bounds_1,
                                pressure_difference_layer1_bounds_1,
                                height_above_ground_layer_bounds_1,
                                height_above_ground_layer1_bounds_1,
                                height_above_ground_layer2_bounds_1,
                                pressure_difference_layer2_bounds_1,
                                height_above_ground_layer3_bounds_1,
                                isobaric_layer_bounds_1, sigma_layer_bounds_1,
                                pressure_difference_layer3_bounds_1
Data variables: (12/68)
    LambertConformal_Projection                                                                                          int32 ...
    time1_bounds                                                                                                         (time1, time1_bounds_1) datetime64[ns] ...
    time2_bounds                                                                                                         (time2, time2_bounds_1) datetime64[ns] ...
    pressure_difference_layer_bounds                                                                                     (pressure_difference_layer, pressure_difference_layer_bounds_1) float32 ...
    pressure_difference_layer1_bounds                                                                                    (pressure_difference_layer1, pressure_difference_layer1_bounds_1) float32 ...
    height_above_ground_layer_bounds                                                                                     (height_above_ground_layer, height_above_ground_layer_bounds_1) float32 ...
    ...                                                                                                                   ...
    u-component_of_wind_isobaric                                                                                         (time, isobaric1, y, x) float32 ...
    u-component_of_wind_height_above_ground                                                                              (time, height_above_ground2, y, x) float32 ...
    u-component_storm_motion_height_above_ground_layer                                                                   (time, height_above_ground_layer1, y, x) float32 ...
    v-component_of_wind_isobaric                                                                                         (time, isobaric1, y, x) float32 ...
    v-component_of_wind_height_above_ground                                                                              (time, height_above_ground2, y, x) float32 ...
    v-component_storm_motion_height_above_ground_layer                                                                   (time, height_above_ground_layer1, y, x) float32 ...
Attributes:
    Originating_or_generating_Center:                                        ...
    Originating_or_generating_Subcenter:                                     ...
    GRIB_table_version:                                                      ...
    Type_of_generating_process:                                              ...
    Analysis_or_forecast_generating_process_identifier_defined_by_originating...
    file_format:                                                             ...
    Conventions:                                                             ...
    history:                                                                 ...
    featureType:                                                             ...
    _CoordSysBuilder:                                                        ...
    EXTRA_DIMENSION.reftime:                                                 ...
Tip: This Dataset, while stored in its native (GRIB2) format, presents itself to Xarray as NetCDF. The THREDDS server automatically translates GRIB to NetCDF.
Notice as well that its horizontal dimensions, 2145 x 1377, differs from the HRRR datasets served via cloud providers such as AWS or Google Cloud (1799 x 1099)! The former is a bit higher in horizontal resolution (2.5 km vs 3 km).

Note that the horizontal dimensions of this Dataset are not longitude/latitude. Instead, they represent distances in km from a central origin.

ds.x
<xarray.DataArray 'x' (x: 2145)>
array([-2763.217 , -2760.6772, -2758.1377, ...,  2676.8267,  2679.3662,
        2681.906 ], dtype=float32)
Coordinates:
  * x        (x) float32 -2.763e+03 -2.761e+03 ... 2.679e+03 2.682e+03
    reftime  datetime64[ns] 2023-01-19T14:00:00
Attributes:
    standard_name:        projection_x_coordinate
    units:                km
    _CoordinateAxisType:  GeoX
ds.y
<xarray.DataArray 'y' (y: 1377)>
array([-263.79062, -261.25092, -258.7112 , ..., 3225.7612 , 3228.3008 ,
       3230.8406 ], dtype=float32)
Coordinates:
  * y        (y) float32 -263.8 -261.3 -258.7 ... 3.226e+03 3.228e+03 3.231e+03
    reftime  datetime64[ns] 2023-01-19T14:00:00
Attributes:
    standard_name:        projection_y_coordinate
    units:                km
    _CoordinateAxisType:  GeoY

Use Xarray’s plot.imshow to display gridded data as an image

One of the output grids is surface pressure. We can visualize it as a proxy for surface elevation.

ds.Pressure_surface
<xarray.DataArray 'Pressure_surface' (time: 19, y: 1377, x: 2145)>
[56119635 values with dtype=float32]
Coordinates:
  * x        (x) float32 -2.763e+03 -2.761e+03 ... 2.679e+03 2.682e+03
  * y        (y) float32 -263.8 -261.3 -258.7 ... 3.226e+03 3.228e+03 3.231e+03
    reftime  datetime64[ns] 2023-01-19T14:00:00
  * time     (time) datetime64[ns] 2023-01-19T14:00:00 ... 2023-01-20T08:00:00
Attributes:
    long_name:                       Pressure @ Ground or water surface
    units:                           Pa
    abbreviation:                    PRES
    grid_mapping:                    LambertConformal_Projection
    Grib_Variable_Id:                VAR_0-3-0_L1
    Grib2_Parameter:                 [0 3 0]
    Grib2_Parameter_Discipline:      Meteorological products
    Grib2_Parameter_Category:        Mass
    Grib2_Parameter_Name:            Pressure
    Grib2_Level_Type:                1
    Grib2_Level_Desc:                Ground or water surface
    Grib2_Generating_Process_Type:   Forecast
    Grib2_Statistical_Process_Type:  UnknownStatType--1

Use Xarray to generate an image, as opposed to a filled contour plot, for the first time in the Dataset.

ds.Pressure_surface.isel(time1=0).plot.imshow(figsize=(15,10))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [10], in <cell line: 1>()
----> 1 ds.Pressure_surface.isel(time1=0).plot.imshow(figsize=(15,10))

File /knight/anaconda_aug22/envs/aug22_env/lib/python3.10/site-packages/xarray/core/dataarray.py:1291, in DataArray.isel(self, indexers, drop, missing_dims, **indexers_kwargs)
   1286     return self._from_temp_dataset(ds)
   1288 # Much faster algorithm for when all indexers are ints, slices, one-dimensional
   1289 # lists, or zero or one-dimensional np.ndarray's
-> 1291 variable = self._variable.isel(indexers, missing_dims=missing_dims)
   1292 indexes, index_variables = isel_indexes(self.xindexes, indexers)
   1294 coords = {}

File /knight/anaconda_aug22/envs/aug22_env/lib/python3.10/site-packages/xarray/core/variable.py:1223, in Variable.isel(self, indexers, missing_dims, **indexers_kwargs)
   1199 """Return a new array indexed along the specified dimension(s).
   1200 
   1201 Parameters
   (...)
   1219     indexer, in which case the data will be a copy.
   1220 """
   1221 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "isel")
-> 1223 indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims)
   1225 key = tuple(indexers.get(dim, slice(None)) for dim in self.dims)
   1226 return self[key]

File /knight/anaconda_aug22/envs/aug22_env/lib/python3.10/site-packages/xarray/core/utils.py:850, in drop_dims_from_indexers(indexers, dims, missing_dims)
    848     invalid = indexers.keys() - set(dims)
    849     if invalid:
--> 850         raise ValueError(
    851             f"Dimensions {invalid} do not exist. Expected one or more of {dims}"
    852         )
    854     return indexers
    856 elif missing_dims == "warn":
    857 
    858     # don't modify input

ValueError: Dimensions {'time1'} do not exist. Expected one or more of ('time', 'y', 'x')
WARNING: A quirk in how a THREDDS data server interprets a GRIB dataset means that the coordinate variable names may change unpredictabily from run to run. In this case, the time dimension was named time1. When you run this notebook, the dimension name might also be time1, but it could be something else ... such as time or time2 Modify the cell above if this occurs.
Tip: Rather than creating filled contours, we have used Xarray's Matplotlib interface to create an image map of surface pressure. Its resolution is the same as the dataset ... in this case, approximately 2.5 km. It fits the definition of what is called raster data.
Note: But there exist other rasterized representations of elevation ... at even higher resolutions ... that are in a particular image format (tiff). Since we never expect these elevations to change (except on very long time scales), we can use these static datasets to complement our visualizations.

Use Xarray’s open_rasterio method to open high-resolution elevation data

Natural Earth, the same source that Cartopy utilizes for many of the background features we have used, also provides raster datasets for download. The medium and high-resolution manually-shaded relief files can be accessed from their location in the course data directory.

ds50m = xr.open_rasterio('/spare11/atm533/data/raster/MSR_50M/MSR_50M.tif')
ds10m = xr.open_rasterio('/spare11/atm533/data/raster/US_MSR_10M/US_MSR.tif')
/tmp/ipykernel_1021647/2329449947.py:1: DeprecationWarning: open_rasterio is Deprecated in favor of rioxarray. For information about transitioning, see: https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html
  ds50m = xr.open_rasterio('/spare11/atm533/data/raster/MSR_50M/MSR_50M.tif')
/tmp/ipykernel_1021647/2329449947.py:2: DeprecationWarning: open_rasterio is Deprecated in favor of rioxarray. For information about transitioning, see: https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html
  ds10m = xr.open_rasterio('/spare11/atm533/data/raster/US_MSR_10M/US_MSR.tif')

Examine these Datasets

ds50m
<xarray.DataArray (band: 1, y: 5400, x: 10800)>
[58320000 values with dtype=uint8]
Coordinates:
  * band     (band) int64 1
  * y        (y) float64 89.98 89.95 89.92 89.88 ... -89.88 -89.92 -89.95 -89.98
  * x        (x) float64 -180.0 -179.9 -179.9 -179.9 ... 179.9 179.9 179.9 180.0
Attributes:
    transform:               (0.03333333333333, 0.0, -179.99999999999997, 0.0...
    crs:                     +init=epsg:4326
    res:                     (0.03333333333333, 0.03333333333333)
    is_tiled:                0
    nodatavals:              (nan,)
    scales:                  (1.0,)
    offsets:                 (0.0,)
    AREA_OR_POINT:           Area
    TIFFTAG_DATETIME:        2015:01:13 09:19:11
    TIFFTAG_RESOLUTIONUNIT:  2 (pixels/inch)
    TIFFTAG_SOFTWARE:        Adobe Photoshop CC 2014 (Macintosh)
    TIFFTAG_XRESOLUTION:     342.85699
    TIFFTAG_YRESOLUTION:     342.85699
ds10m
<xarray.DataArray (band: 1, y: 9493, x: 12578)>
[119402954 values with dtype=uint8]
Coordinates:
  * band     (band) int64 1
  * y        (y) float64 7.56e+06 7.559e+06 7.558e+06 ... 6.678e+05 6.671e+05
  * x        (x) float64 -1.492e+07 -1.492e+07 ... -5.789e+06 -5.788e+06
Attributes:
    transform:               (726.0672549487241, 0.0, -14920200.0, 0.0, -726....
    crs:                     +init=epsg:3857
    res:                     (726.0672549487241, 726.1428270486623)
    is_tiled:                0
    nodatavals:              (nan,)
    scales:                  (1.0,)
    offsets:                 (0.0,)
    AREA_OR_POINT:           Area
    TIFFTAG_DATETIME:        2015:12:30 08:08:29
    TIFFTAG_RESOLUTIONUNIT:  2 (pixels/inch)
    TIFFTAG_SOFTWARE:        Adobe Photoshop CC 2015 (Macintosh)
    TIFFTAG_XRESOLUTION:     350
    TIFFTAG_YRESOLUTION:     350

We now have two Xarray DataArray objects. They are three-dimensional (band, y, and x), with the new dimension called band. There is only one band in each of these data arrays, so we will just pass its one index (0) directly to imshow. Note the horizontal dimemsions … in both cases, quite a bit higher than the already high-resolution HRRR grid we viewed above. What parts of the world do they represent?

ds50m.sel(band=1).plot.imshow(figsize=(15,10))
<matplotlib.image.AxesImage at 0x14c696e8c7c0>
../../_images/01RasterIntro_32_1.png
ds10m.sel(band=1).plot.imshow(figsize=(15,10))
<matplotlib.image.AxesImage at 0x14c69674f820>
../../_images/01RasterIntro_33_1.png
Tip: Notice the x- and y-axis labels. What do you think they represent? Look at the attributes for both of these data arrays for more info. Particularly notice the `crs` (Coordinate Reference System) attributes.
We'll explore these georeferenced TIFF (tagged image file format) images in more detail in subsequent notebooks!
ds10m.crs
'+init=epsg:3857'

Close the file handles so they don’t stay resident in memory.

ds.close()
ds10m.close()
ds50m.close()

Summary

  • We can access real-time NWP data via the Unidata THREDDS server

  • THREDDS automatically translates data in supported formats (such as GRIB) into NetCDF.

  • Gridded data arrays are a type of raster data.

  • Matplotlib’s imshow method displays raster data as an image.

  • A frequently-used image format is tiff.

  • TIFFS can be geo-referenced, making them very useful in map-based visualizations.

What’s Next?

In the next notebook, we’ll use our geo-referenced TIFF as a map layer and overlay NWP data on top of it.