{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Shapefiles 1: Intro to Shapefiles" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview:\n", "1. Read in the state/province Natural Earth shapefile\n", "2. Explore the shapefile using GeoPandas\n", "3. Plot the shapefile on a map and overlay with other Natural Earth shapefiles" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "| Concepts | Importance | Notes |\n", "| --- | --- | --- |\n", "| Pandas| Necessary | |\n", "| Cartopy| Necessary | |\n", "\n", "* **Time to learn**: 30 minutes\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is a **shapefile**?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From [Wikipedia](https://en.wikipedia.org/wiki/Shapefile):\n", ">The shapefile format is a geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products. The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Rasters versus Shapefiles: By virtue of being a *vector data format*, shapefiles do not becoming fuzzy/pixelated as one zooms in on maps that display features defined by shapefiles. This is in contrast to raster-based data. See the first reference at the end of the notebook for a nice explanation.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Note: We've already been using shapefiles, whenever we've added various Natural Earth backgrounds with Cartopy's feature. We'll examine shapefiles in more detail here, using Cartopy's shapereader.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import cartopy.crs as ccrs\n", "import cartopy.feature as cfeature\n", "import geopandas as gp\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import cartopy.io.shapereader as shpreader\n", "from metpy.plots import USCOUNTIES\n", "import glob\n", "import os" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read in a shapefile" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use Cartopy's `shpreader.natural_earth` method to read in the highest (10m) resolution states and provinces shapefile. \n", "It will be automatically downloaded if not present, and is cached by default in a hidden subdirectory in your home directory." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "res = '10m'\n", "category = 'cultural'\n", "name = 'admin_1_states_provinces'\n", "shpfilename = shpreader.natural_earth(res, category, name)\n", "shpfilename" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What files matching this particular political boundary classification do we have?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path = os.path.dirname(shpfilename)\n", "glob.glob(path + \"/ne_10m_admin_1_states_provinces.*\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have five files with different suffixes. The first three **must** be present (see the second reference at the end of the notebook). Here, we will read in the *.shp* file, using a Pandas-like library, **Geopandas**, which creates a geo-referenced Pandas `Dataframe`. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explore the shapefile using [Geopandas](https://geopandas.org/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display all columns in the dataframe. Then create the dataframe and take a look at the first and last five rows (scroll to see all 84 columns)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pd.set_option('display.max_columns', None)\n", "df = gp.read_file(shpfilename)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Examine the column names more concisely." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Tip: Countries are in the DataFrame's admin column. We can chain together Pandas` methods such as sort_values and unique to show all of the countries included in the shapefile.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.admin.sort_values().unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's display all rows that are in the USA." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.loc[df.admin == 'United States of America']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Tip: The name column corresponds to the states.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examine the *Geometry* column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When Geopandas creates a dataframe from a Shapefile, a key column that gets added is the `geometry` one. Here we can see that each of the states have Geometry features consisting of *Polygons* or *Multipolygons*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.loc[df.name == 'Colorado']['geometry']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.loc[df.name == 'New York']['geometry']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Question: Notice that Colorado's geometry is POLYGON while New York is MULTIPOLYGON. Can you think of a reason for the distinction? (Hint: are any islands part of the state?)\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make a Geopandas dataframe for New York State " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dfNYS = df.loc[df.name == 'New York']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with any Pandas Dataframe, we can directly call its `plot` method for a quick view." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dfNYS.plot(figsize=(12,9))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can create a GeoPandas series consisting only of the Geometry column." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "geoNY = df.loc[df.name == 'New York']['geometry']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "geoNY" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Examine the `values` attribute of this Geopandas geometry object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "geoNY.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Note: The object begins with shapely. It stems from the Shapely library. Although we did not explicitly list this library in our list of imports, it was imported as a result of the import cartopy.io.shapereader as shpreader statement.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we take just the first (and only element) of this array, what do we get?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nysPoly = geoNY.values[0]\n", "nysPoly" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot the shapefile" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now add this object to a Cartopy-referenced Matplotlib plot using Matplotlib's `add_geometries` method.\n", "First we'll comment out all of the Natural Earth shapefiles we typically use. This will produce just a map of NYS." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "fig = plt.figure(figsize=(15,10))\n", "ax = plt.axes(projection=ccrs.PlateCarree())\n", "plt.title('NYS')\n", "ax.add_geometries(nysPoly, ccrs.PlateCarree(),\n", " edgecolor='black', facecolor='tab:blue', alpha=0.5)\n", "ax.set_extent([-80, -72,40.5,45.2], ccrs.PlateCarree());" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add additional features to the plot\n", "Now, add some of Cartopy's Natural Earth features to the plot. We should see that with the exception of the counties (which defaults to a slightly lower resolution (20m) than the others) everything should line up nicely. We'll slightly expand the map extent so we'll see neighboring state/province outlines." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "res = '10m'\n", "fig = plt.figure(figsize=(15,10))\n", "ax = plt.axes(projection=ccrs.PlateCarree())\n", "\n", "plt.title('NYS')\n", "ax.coastlines(resolution=res)\n", "ax.add_geometries(nysPoly, ccrs.PlateCarree(),\n", " edgecolor='black', facecolor='tab:blue', alpha=0.5)\n", "ax.add_feature (cfeature.LAKES.with_scale(res), alpha = 0.5)\n", "ax.add_feature (cfeature.STATES.with_scale(res))\n", "ax.add_feature(USCOUNTIES,edgecolor='grey',linewidth=1 );\n", "ax.set_extent([-85, -67,37.5,49.5], ccrs.PlateCarree());" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Tip: The USCOUNTIES shapefile also has a 5m version. To use it, append .with_scale('5m') to USCOUNTIES.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Summary\n", "* Matplotlib and Cartopy, via the Shapely library, provides support for reading and displaying shapefiles. \n", "* Shapefiles are a form of vector graphics; unlike rasters, they do not degrade as one zooms into them.\n", "* The Geopandas library treats shapefiles as geo-referenced Pandas dataframes.\n", "\n", "### What's Next?\n", "We'll return to cloud-optimized geotiffs, specifically those that are tile-based, with more detail automatically displayed as one zooms in to a particular region." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### References:\n", "1. https://www.gislounge.com/geodatabases-explored-vector-and-raster-data/\n", "1. https://www.earthdatascience.org/workshops/gis-open-source-python/intro-vector-data-python/\n", "2. https://wiki.openstreetmap.org/wiki/Shapefiles" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 August 2023 Environment", "language": "python", "name": "aug23" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 4 }