{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 04_Geoviews_Datashader_NLDN" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview:\n", "1. Read in a multi-million row dataframe containing National Lightning Detection Network data from August 2021\n", "1. Use `Datashader` to create a rasterized representation of the dataframe\n", "1. Create an interactive, georeferenced visualization of NLDN data using the [Holoviz](https://holoviz.org) ecosystem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "| Concepts | Importance | Notes |\n", "| --- | --- | --- |\n", "| Pandas | Necessary | |\n", "| Geoviews 1-2 | Necessary| |\n", "\n", "* **Time to learn**: 30 minutes\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import datashader as ds\n", "import pandas as pd\n", "from colorcet import fire\n", "from datashader import transfer_functions as tf\n", "import geoviews as gv\n", "from geoviews import opts\n", "import holoviews as hv\n", "from holoviews.element.tiles import OSM\n", "from holoviews.operation.datashader import datashade,rasterize" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hv.extension('bokeh')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read in a multi-million row dataframe containing National Lightning Detection Network data from August 2021" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we load in all of the detected cloud-to-ground flashes from August 2021. The data was originally derived from a Pandas-friendly text file, with over 50 million lines (one per flash, cloud-to-cloud as well as cloud-to-ground). 
While we could use Pandas `read_csv` to create a `DataFrame` from it, here we read in a version of the dataset stored in [Feather](https://arrow.apache.org/docs/python/feather.html) format." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%time df = pd.read_feather('/spare11/atm533/data/202108_NLDN_CG.ftr')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Typically, it takes less than 30 seconds to load this Feather-formatted file, and loading requires at most 10 GB of system memory (about 2.7 GB once loaded). That is pretty significant, but in comparison, loading the same dataset in ASCII format takes close to 10 minutes and 60 GB of RAM!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Did you know? The NLDN was originally developed in our department in the 1980s! It is now operated by Vaisala.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot flash locations using Pandas' Matplotlib Scatterplot interface." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.plot.scatter('Lon','Lat',figsize=(11,8));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The plot takes a while to appear, and is essentially useless due to the thousands upon thousands of overlapping points. It seems impossible to make a useful plot of a month's worth (tens of millions of flashes during the warm season) of lightning strikes ... until we leverage the `Datashader` library, which is part of `Holoviz`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use `Datashader` to create a rasterized representation of the dataframe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we will directly use `Datashader` methods to produce a visualization. We set up a Datashader `Canvas` object of x by y pixels, and then *aggregate* each point into a raster image of the specified x by y dimensions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we use Datashader's `transfer_functions` methods to set a colormap and a background." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "agg = ds.Canvas(800,600).points(df,'Lon','Lat')\n", "tf.set_background(tf.shade(agg, cmap=fire),\"black\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What previously took a minute or so and provided only the coarsest amount of detail, now takes place in seconds and the detail is such that we can clearly see the implied geography!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, as cool (or, in the case of this colormap, hot) as the above graphic is, it has no interactivity nor georeferencing (technically, a `Datashader` object has no concept of things like `axes` or `colorbar`s. 
\n", "\n", "Holoviews and Geoviews to the rescue!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create an interactive, georeferenced visualization of NLDN data using the [Holoviz](https://holoviz.org) ecosystem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, create a Geoviews dataset from the Pandas dataframe" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gdf = gv.Dataset(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we create two Geoviews objects; one for a background map, and the other, NLDN flash locations derived from GeoViews' representation of the underlying Dataframe. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "map_tiles = OSM().opts(alpha=0.9,width=900, height=700)\n", "points = gv.Points(gdf, ['Lon','Lat'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Rasterize (and datashade) the points." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "flashes = rasterize(points, x_sampling=.001, y_sampling=.001, cmap=fire, alpha=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Note: \n", "
    \n", "
  1. `x_sampling` and `y_sampling` set the finest resolution at which the points are sampled, in the coordinate units of the dataset. Here, we sample no more finely than every thousandth of a degree.\n", "
  2. We specify the `alpha` argument as a value from 0 to 255 here, rather than as a floating-point value from 0 to 1. A value below 255 keeps the layer partly transparent, so the underlying map background stays visible.
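
As a rough illustration of the two settings in this note, the sketch below uses plain Python arithmetic only; it does not call Datashader or Holoviews. The helper names and the 60-degree longitude span are hypothetical, chosen purely for illustration.

```python
import math

def max_bins(span_degrees, sampling_degrees):
    # Upper bound on the number of bins across a coordinate span when
    # no bin is allowed to be narrower than sampling_degrees.
    return math.ceil(span_degrees / sampling_degrees)

def alpha_to_byte(alpha_fraction):
    # Convert an opacity given as a fraction (0 to 1) to the 0 to 255 scale.
    return round(alpha_fraction * 255)

def byte_to_alpha(alpha_byte):
    # Convert an opacity on the 0 to 255 scale back to a fraction (0 to 1).
    return alpha_byte / 255

# Hypothetical 60-degree longitude span, sampled no finer than 0.001 degree.
print(max_bins(60, 0.001))

# The alpha=100 used above, expressed as a fraction of full opacity.
print(round(byte_to_alpha(100), 2))
```

In other words, a sampling interval of 0.001 degree caps how much extra detail zooming in can reveal, and an alpha of 100 corresponds to a bit under 40 percent opacity.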
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add in some options\n", "We'll add a colorbar, a log scale, and the hover tool." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "flashes = flashes.opts(colorbar=True,logz=True,width=900,height=700,tools=['hover'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create the final graphic\n", "Overlay the rasterized flashes with the OpenStreetMaps background" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "flashmap = map_tiles * flashes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## View and interact with the map" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "flashmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
TIP: Turn off the hover tool before panning and zooming, then turn it back on once you reach your desired zoom level and location.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how quickly the map updates! The rasterized image's detail changes with zoom just as you are used to with other web-based mapping tools, such as Google Maps!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Things to try: \n", "Zoom in all the way so you can see the UAlbany uptown campus on the map. You will see cloud-to-ground lightning strikes very close to the ETEC building.\n", " \n", "Examine the Pandas dataframe: subset latitude and longitude so you can determine when these strikes occurred.
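
One possible way to approach the second suggestion is sketched below with a tiny made-up dataframe rather than the real NLDN data. The `Lon` and `Lat` column names follow the dataframe used in this notebook; the `Time` column name, the sample values, and the bounding box are all hypothetical, so adjust them to match what you actually see in `df`.

```python
import pandas as pd

# Made-up sample rows; 'Time' is a hypothetical column name.
sample = pd.DataFrame({
    'Time': pd.to_datetime(['2021-08-01 12:00', '2021-08-02 18:30', '2021-08-03 06:15']),
    'Lon': [-73.82, -75.10, -73.83],
    'Lat': [42.69, 40.00, 42.68],
})

# Hypothetical bounding box (in degrees) around a point of interest.
lon_min, lon_max = -73.85, -73.80
lat_min, lat_max = 42.67, 42.70

# Build a boolean mask for rows inside the box (bounds are inclusive).
in_box = sample['Lon'].between(lon_min, lon_max) & sample['Lat'].between(lat_min, lat_max)

# Keep only the matching rows; their Time values tell us when the flashes occurred.
nearby = sample[in_box]
print(nearby)
```

The same masking pattern should work on the full dataframe, although on tens of millions of rows it may take a few seconds.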
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Summary\n", "* We have successfully and efficiently loaded, visualized, and interacted with a very large dataset!\n", "* Nevertheless, the memory footprint remains large enough so that it is not practical to have several users working with this dataset simultaneously.\n", "\n", "### What's Next?\n", "Take a look at [hvPlot](https://hvplot.holoviz.org/), a higher-level interface to Holoviz packages such as Geoviews and Datashader." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### References:\n", "1. [National Lightning Detection Network](https://www.vaisala.com/en/products/national-lightning-detection-network-nldn)\n", "1. [Datashader](https://datashader.org/getting_started/Introduction.html)\n", "1. [Large Data in Holoviews](http://holoviews.org/user_guide/Large_Data.html)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 August 2023 Environment", "language": "python", "name": "aug23" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 4 }