{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\"pandas
\n", "\n", "# Pandas Notebook 7: Merging DataFrames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook will combine into one dataframe observations from two separate data sources: the NYS Mesonet, and ASOS sites located in New York. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "1. Examine the NYSM and ASOS-derived datafiles for a particular day.\n", "1. Remove (*drop*) unwanted columns from a DataFrame.\n", "1. Use a dictionary to rename columns.\n", "1. Merge columns from one DataFrame into another and re-index.\n", "1. Add columns to a DataFrame\n", "1. Concatenate one DataFrame into another.\n", "1. Analyze the merged DataFrame.\n", "\n", "## Prerequisites\n", "\n", "| Concepts | Importance | Notes |\n", "| --- | --- | --- |\n", "| Pandas notebooks 1-7 | Necessary | Intro to `dict` |\n", "\n", "* **Time to learn**: 30 minutes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Examine the NYSM and ASOS-derived datafiles for a particular day" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read in NYSM 5-minute data from 8/4/2020 (during the passage of Hurricane Isaias through the region) and also read in the NYSM site table so we can extract latitude, longitude, and elevation. Also read in a table that has hourly METAR observations from ASOS sites in New York State. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "nysm_data_file = '/spare11/atm533/data/nysm_data_20200804.csv'\n", "nysm_sites_file = '/spare11/atm533/data/nysm_sites.csv'\n", "nyasos_data_file = '/spare11/atm533/data/nyasos_data_20200804.txt'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "nysm_data = pd.read_csv(nysm_data_file)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "nysm_sites = pd.read_csv(nysm_sites_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Examine the ASOS data. Note the following differences in this dataset's format as compared to how the NYSM data is structured:\n", "1. Its columns are named differently from those of the NYSM\n", "1. Its columns are separated using whitespace, not tabs\n", "1. Missing values are denoted by -9999.0, not null strings\n", "1. The date/time is formatted differently." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "STN YYMMDD/HHMM SLAT SLON SELV TMPC DWPC RELH PMSL SPED GUMS DRCT P01M\n", "ALB 200804/0000 42.75 -73.80 89.00 25.00 15.60 55.91 1016.50 2.06 -9999.00 10.00 -9999.00\n", "ART 200804/0000 44.00 -76.02 99.00 22.80 16.70 68.47 1015.50 0.00 -9999.00 0.00 -9999.00\n", "BGM 200804/0000 42.22 -75.98 497.00 23.90 15.00 57.47 1016.40 2.57 -9999.00 160.00 -9999.00\n", "BUF 200804/0000 42.93 -78.73 215.00 23.30 17.80 71.22 1015.00 3.60 -9999.00 150.00 0.00\n", "DKK 200804/0000 42.50 -79.28 203.00 20.60 18.30 86.66 1015.10 3.09 -9999.00 150.00 0.00\n", "DSV 200804/0000 42.57 -77.72 209.00 21.70 18.90 84.10 1016.30 4.12 -9999.00 330.00 0.25\n", "ELM 200804/0000 42.17 -76.90 291.00 27.80 15.00 45.60 1014.30 2.57 -9999.00 180.00 -9999.00\n", "ELZ 200804/0000 42.11 -77.99 647.00 22.20 15.00 63.70 1016.40 2.06 -9999.00 160.00 -9999.00\n", "FOK 200804/0000 40.85 -72.63 20.00 25.60 22.80 84.53 1018.00 2.57 -9999.00 220.00 -9999.00\n" ] } ], "source": [ "! head $nyasos_data_file" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "nyasos_data = pd.read_csv(nyasos_data_file,delim_whitespace=True,na_values=[-9999.0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look at the first couple of rows in all three DataFrames." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
stationtimetemp_2m [degC]temp_9m [degC]relative_humidity [percent]precip_incremental [mm]precip_local [mm]precip_max_intensity [mm/min]avg_wind_speed_prop [m/s]max_wind_speed_prop [m/s]...snow_depth [cm]frozen_soil_05cm [bit]frozen_soil_25cm [bit]frozen_soil_50cm [bit]soil_temp_05cm [degC]soil_temp_25cm [degC]soil_temp_50cm [degC]soil_moisture_05cm [m^3/m^3]soil_moisture_25cm [m^3/m^3]soil_moisture_50cm [m^3/m^3]
0ADDI2020-08-04 00:00:00 UTC23.123.963.70.00.00.01.63.0...NaN0.00.00.022.620.319.10.070.20.26
1ADDI2020-08-04 00:05:00 UTC23.424.161.40.00.00.02.23.1...NaN0.00.00.022.620.319.20.080.20.26
\n", "

2 rows × 30 columns

\n", "
" ], "text/plain": [ " station time temp_2m [degC] temp_9m [degC] \\\n", "0 ADDI 2020-08-04 00:00:00 UTC 23.1 23.9 \n", "1 ADDI 2020-08-04 00:05:00 UTC 23.4 24.1 \n", "\n", " relative_humidity [percent] precip_incremental [mm] precip_local [mm] \\\n", "0 63.7 0.0 0.0 \n", "1 61.4 0.0 0.0 \n", "\n", " precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] \\\n", "0 0.0 1.6 \n", "1 0.0 2.2 \n", "\n", " max_wind_speed_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] \\\n", "0 3.0 ... NaN 0.0 \n", "1 3.1 ... NaN 0.0 \n", "\n", " frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] \\\n", "0 0.0 0.0 22.6 \n", "1 0.0 0.0 22.6 \n", "\n", " soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] \\\n", "0 20.3 19.1 0.07 \n", "1 20.3 19.2 0.08 \n", "\n", " soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3] \n", "0 0.2 0.26 \n", "1 0.2 0.26 \n", "\n", "[2 rows x 30 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nysm_data.head(2)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
stidnumbernamelatlonelevationcountynearest_citystatedistance_from_town [km]direction_from_town [degrees]climate_divisionclimate_division_namewfocommissioneddecommissioned
0ADDI107Addison42.04036-77.23726507.614SteubenAddisonNY6.9S1Western PlateauBGM2016-08-10 18:15:00 UTCNaN
1ANDE111Andes42.18227-74.80139518.282DelawareAndesNY1.5WSW2Eastern PlateauBGM2016-08-04 15:55:00 UTCNaN
\n", "
" ], "text/plain": [ " stid number name lat lon elevation county \\\n", "0 ADDI 107 Addison 42.04036 -77.23726 507.614 Steuben \n", "1 ANDE 111 Andes 42.18227 -74.80139 518.282 Delaware \n", "\n", " nearest_city state distance_from_town [km] direction_from_town [degrees] \\\n", "0 Addison NY 6.9 S \n", "1 Andes NY 1.5 WSW \n", "\n", " climate_division climate_division_name wfo commissioned \\\n", "0 1 Western Plateau BGM 2016-08-10 18:15:00 UTC \n", "1 2 Eastern Plateau BGM 2016-08-04 15:55:00 UTC \n", "\n", " decommissioned \n", "0 NaN \n", "1 NaN " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nysm_sites.head(2)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STNYYMMDD/HHMMSLATSLONSELVTMPCDWPCRELHPMSLSPEDGUMSDRCTP01M
0ALB200804/000042.75-73.8089.025.015.655.911016.52.06NaN10.0NaN
1ART200804/000044.00-76.0299.022.816.768.471015.50.00NaN0.0NaN
\n", "
" ], "text/plain": [ " STN YYMMDD/HHMM SLAT SLON SELV TMPC DWPC RELH PMSL SPED \\\n", "0 ALB 200804/0000 42.75 -73.80 89.0 25.0 15.6 55.91 1016.5 2.06 \n", "1 ART 200804/0000 44.00 -76.02 99.0 22.8 16.7 68.47 1015.5 0.00 \n", "\n", " GUMS DRCT P01M \n", "0 NaN 10.0 NaN \n", "1 NaN 0.0 NaN " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nyasos_data.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Remove (*drop*) unwanted columns from a DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our merged data file will contain only a subset of the columns from each data source. First, let's remove the unwanted columns from the NYSM file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `inplace` so the Dataframe object is updated, and specify that we're referring to columns, not the default axis, which is row indices." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "nysm_data.drop(['temp_9m [degC]','precip_incremental [mm]','precip_local [mm]','precip_max_intensity [mm/min]',\n", " 'avg_wind_speed_prop [m/s]', 'max_wind_speed_prop [m/s]','wind_speed_stddev_prop [m/s]', 'wind_direction_prop [degrees]',\n", " 'wind_direction_stddev_prop [degrees]','wind_speed_stddev_sonic [m/s]','wind_direction_stddev_sonic [degrees]', 'solar_insolation [W/m^2]','snow_depth [cm]', 'frozen_soil_05cm [bit]',\n", " 'frozen_soil_25cm [bit]', 'frozen_soil_50cm [bit]',\n", " 'soil_temp_05cm [degC]', 'soil_temp_25cm [degC]',\n", " 'soil_temp_50cm [degC]', 'soil_moisture_05cm [m^3/m^3]',\n", " 'soil_moisture_25cm [m^3/m^3]', 'soil_moisture_50cm [m^3/m^3]'],inplace=True,axis='columns')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Examine what columns remain post-drop." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['station', 'time', 'temp_2m [degC]', 'relative_humidity [percent]',\n", " 'avg_wind_speed_sonic [m/s]', 'max_wind_speed_sonic [m/s]',\n", " 'wind_direction_sonic [degrees]', 'station_pressure [mbar]'],\n", " dtype='object')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nysm_data.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, drop any unwanted columns from the ASOS data file." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "nyasos_data.drop(['P01M'],axis='columns',inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Examine the columns remaining in the ASOS data file" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['STN', 'YYMMDD/HHMM', 'SLAT', 'SLON', 'SELV', 'TMPC', 'DWPC', 'RELH',\n", " 'PMSL', 'SPED', 'GUMS', 'DRCT'],\n", " dtype='object')" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nyasos_data.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Examine the columns in the NYSM Sites file" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['stid', 'number', 'name', 'lat', 'lon', 'elevation', 'county',\n", " 'nearest_city', 'state', 'distance_from_town [km]',\n", " 'direction_from_town [degrees]', 'climate_division',\n", " 'climate_division_name', 'wfo', 'commissioned', 'decommissioned'],\n", " dtype='object')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nysm_sites.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Use a dictionary to rename columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each dataframe has varying column names. Let's standardize by creating a `dictionary` that will map current column names to common (and in some cases, much shorter) names." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "column_mapping = {'station' : 'STID',\n", " 'time': 'TIME',\n", " 'temp_2m [degC]': 'TMPC',\n", " 'relative_humidity [percent]': 'RELH',\n", " 'precip_incremental [mm]': 'PRCP',\n", " 'precip_local [mm]': 'PTOT',\n", " 'avg_wind_speed_sonic [m/s]': 'SPED',\n", " 'max_wind_speed_sonic [m/s]': 'GUMS',\n", " 'wind_direction_sonic [degrees]': 'DRCT', \n", " 'station_pressure [mbar]': 'PRES',\n", " 'stid': 'STID',\n", " 'name': 'NAME',\n", " 'lat': 'SLAT',\n", " 'lon': 'SLON',\n", " 'elevation': 'SELV',\n", " 'STN': 'STID',\n", " 'YYMMDD/HHMM': 'TIME'}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each of the three Dataframes, rename the columns according to our dictionary. Then examine each Dataframe to see how they look." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "nysm_data.rename(columns=column_mapping, inplace=True)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STIDTIMETMPCRELHSPEDGUMSDRCTPRES
0ADDI2020-08-04 00:00:00 UTC23.163.71.83.0175.0957.00
1ADDI2020-08-04 00:05:00 UTC23.461.42.43.6169.0957.03
2ADDI2020-08-04 00:10:00 UTC23.162.22.43.7167.0956.98
3ADDI2020-08-04 00:15:00 UTC23.262.62.64.3170.0957.01
4ADDI2020-08-04 00:20:00 UTC23.062.82.03.7172.0956.97
\n", "
" ], "text/plain": [ " STID TIME TMPC RELH SPED GUMS DRCT PRES\n", "0 ADDI 2020-08-04 00:00:00 UTC 23.1 63.7 1.8 3.0 175.0 957.00\n", "1 ADDI 2020-08-04 00:05:00 UTC 23.4 61.4 2.4 3.6 169.0 957.03\n", "2 ADDI 2020-08-04 00:10:00 UTC 23.1 62.2 2.4 3.7 167.0 956.98\n", "3 ADDI 2020-08-04 00:15:00 UTC 23.2 62.6 2.6 4.3 170.0 957.01\n", "4 ADDI 2020-08-04 00:20:00 UTC 23.0 62.8 2.0 3.7 172.0 956.97" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nysm_data.head()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "nysm_sites.rename(columns=column_mapping, inplace=True)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STIDnumberNAMESLATSLONSELVcountynearest_citystatedistance_from_town [km]direction_from_town [degrees]climate_divisionclimate_division_namewfocommissioneddecommissioned
0ADDI107Addison42.04036-77.23726507.6140SteubenAddisonNY6.9S1Western PlateauBGM2016-08-10 18:15:00 UTCNaN
1ANDE111Andes42.18227-74.80139518.2820DelawareAndesNY1.5WSW2Eastern PlateauBGM2016-08-04 15:55:00 UTCNaN
2BATA24Batavia43.01994-78.13566276.1200GeneseeBataviaNY4.9ENE9Great LakesBUF2016-02-18 18:40:00 UTCNaN
3BEAC76Beacon41.52875-73.9452790.1598DutchessBeaconNY3.3NE5Hudson ValleyALY2016-08-22 16:45:00 UTCNaN
4BELD90Belden42.22322-75.66852470.3700BroomeBeldenNY2.2NNE2Eastern PlateauBGM2015-11-30 20:20:00 UTCNaN
\n", "
" ], "text/plain": [ " STID number NAME SLAT SLON SELV county nearest_city \\\n", "0 ADDI 107 Addison 42.04036 -77.23726 507.6140 Steuben Addison \n", "1 ANDE 111 Andes 42.18227 -74.80139 518.2820 Delaware Andes \n", "2 BATA 24 Batavia 43.01994 -78.13566 276.1200 Genesee Batavia \n", "3 BEAC 76 Beacon 41.52875 -73.94527 90.1598 Dutchess Beacon \n", "4 BELD 90 Belden 42.22322 -75.66852 470.3700 Broome Belden \n", "\n", " state distance_from_town [km] direction_from_town [degrees] \\\n", "0 NY 6.9 S \n", "1 NY 1.5 WSW \n", "2 NY 4.9 ENE \n", "3 NY 3.3 NE \n", "4 NY 2.2 NNE \n", "\n", " climate_division climate_division_name wfo commissioned \\\n", "0 1 Western Plateau BGM 2016-08-10 18:15:00 UTC \n", "1 2 Eastern Plateau BGM 2016-08-04 15:55:00 UTC \n", "2 9 Great Lakes BUF 2016-02-18 18:40:00 UTC \n", "3 5 Hudson Valley ALY 2016-08-22 16:45:00 UTC \n", "4 2 Eastern Plateau BGM 2015-11-30 20:20:00 UTC \n", "\n", " decommissioned \n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nysm_sites.head()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "nyasos_data.rename(columns=column_mapping, inplace=True)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STIDTIMESLATSLONSELVTMPCDWPCRELHPMSLSPEDGUMSDRCT
0ALB200804/000042.75-73.8089.025.015.655.911016.52.06NaN10.0
1ART200804/000044.00-76.0299.022.816.768.471015.50.00NaN0.0
2BGM200804/000042.22-75.98497.023.915.057.471016.42.57NaN160.0
3BUF200804/000042.93-78.73215.023.317.871.221015.03.60NaN150.0
4DKK200804/000042.50-79.28203.020.618.386.661015.13.09NaN150.0
\n", "
" ], "text/plain": [ " STID TIME SLAT SLON SELV TMPC DWPC RELH PMSL SPED \\\n", "0 ALB 200804/0000 42.75 -73.80 89.0 25.0 15.6 55.91 1016.5 2.06 \n", "1 ART 200804/0000 44.00 -76.02 99.0 22.8 16.7 68.47 1015.5 0.00 \n", "2 BGM 200804/0000 42.22 -75.98 497.0 23.9 15.0 57.47 1016.4 2.57 \n", "3 BUF 200804/0000 42.93 -78.73 215.0 23.3 17.8 71.22 1015.0 3.60 \n", "4 DKK 200804/0000 42.50 -79.28 203.0 20.6 18.3 86.66 1015.1 3.09 \n", "\n", " GUMS DRCT \n", "0 NaN 10.0 \n", "1 NaN 0.0 \n", "2 NaN 160.0 \n", "3 NaN 150.0 \n", "4 NaN 150.0 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nyasos_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Merge columns from one DataFrame into another, and then re-index." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `merge` to add the latitude, longitude, and elevation from the NYSM Sites dataframe to the NYSM Data one.\n", "To do this, we take a subset of the nysm_sites dataframe containing the ones we want to merge, plus the station id column, since `merge` requires a common column for both dataframes. Note also that `merge` does not have an `inplace` option so we just redefine the `nysm_data` object." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "nysm_data = pd.merge(nysm_data,nysm_sites[['STID','SLAT','SLON','SELV']])" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STIDTIMETMPCRELHSPEDGUMSDRCTPRESSLATSLONSELV
0ADDI2020-08-04 00:00:00 UTC23.163.71.83.0175.0957.0042.04036-77.23726507.614
1ADDI2020-08-04 00:05:00 UTC23.461.42.43.6169.0957.0342.04036-77.23726507.614
2ADDI2020-08-04 00:10:00 UTC23.162.22.43.7167.0956.9842.04036-77.23726507.614
3ADDI2020-08-04 00:15:00 UTC23.262.62.64.3170.0957.0142.04036-77.23726507.614
4ADDI2020-08-04 00:20:00 UTC23.062.82.03.7172.0956.9742.04036-77.23726507.614
\n", "
" ], "text/plain": [ " STID TIME TMPC RELH SPED GUMS DRCT PRES \\\n", "0 ADDI 2020-08-04 00:00:00 UTC 23.1 63.7 1.8 3.0 175.0 957.00 \n", "1 ADDI 2020-08-04 00:05:00 UTC 23.4 61.4 2.4 3.6 169.0 957.03 \n", "2 ADDI 2020-08-04 00:10:00 UTC 23.1 62.2 2.4 3.7 167.0 956.98 \n", "3 ADDI 2020-08-04 00:15:00 UTC 23.2 62.6 2.6 4.3 170.0 957.01 \n", "4 ADDI 2020-08-04 00:20:00 UTC 23.0 62.8 2.0 3.7 172.0 956.97 \n", "\n", " SLAT SLON SELV \n", "0 42.04036 -77.23726 507.614 \n", "1 42.04036 -77.23726 507.614 \n", "2 42.04036 -77.23726 507.614 \n", "3 42.04036 -77.23726 507.614 \n", "4 42.04036 -77.23726 507.614 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nysm_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Make the NYSM and ASOS dataframes *multi-indexed* according to *site* and *time*. Convert the date/time from string to Datetime objects. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First set the indexes" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "nysm_data.set_index(['STID', 'TIME'], inplace = True)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TMPCRELHSPEDGUMSDRCTPRESSLATSLONSELV
STIDTIME
ADDI2020-08-04 00:00:00 UTC23.163.71.83.0175.0957.0042.04036-77.23726507.614
2020-08-04 00:05:00 UTC23.461.42.43.6169.0957.0342.04036-77.23726507.614
2020-08-04 00:10:00 UTC23.162.22.43.7167.0956.9842.04036-77.23726507.614
2020-08-04 00:15:00 UTC23.262.62.64.3170.0957.0142.04036-77.23726507.614
2020-08-04 00:20:00 UTC23.062.82.03.7172.0956.9742.04036-77.23726507.614
\n", "
" ], "text/plain": [ " TMPC RELH SPED GUMS DRCT PRES SLAT \\\n", "STID TIME \n", "ADDI 2020-08-04 00:00:00 UTC 23.1 63.7 1.8 3.0 175.0 957.00 42.04036 \n", " 2020-08-04 00:05:00 UTC 23.4 61.4 2.4 3.6 169.0 957.03 42.04036 \n", " 2020-08-04 00:10:00 UTC 23.1 62.2 2.4 3.7 167.0 956.98 42.04036 \n", " 2020-08-04 00:15:00 UTC 23.2 62.6 2.6 4.3 170.0 957.01 42.04036 \n", " 2020-08-04 00:20:00 UTC 23.0 62.8 2.0 3.7 172.0 956.97 42.04036 \n", "\n", " SLON SELV \n", "STID TIME \n", "ADDI 2020-08-04 00:00:00 UTC -77.23726 507.614 \n", " 2020-08-04 00:05:00 UTC -77.23726 507.614 \n", " 2020-08-04 00:10:00 UTC -77.23726 507.614 \n", " 2020-08-04 00:15:00 UTC -77.23726 507.614 \n", " 2020-08-04 00:20:00 UTC -77.23726 507.614 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nysm_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The NYSM date/time strings can be automatically parsed into Datetime objects, recognized as in the UTC time zone." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_3772866/1677587967.py:1: FutureWarning: inplace is deprecated and will be removed in a future version.\n", " nysm_data.index.set_levels([nysm_data.index.levels[0], pd.to_datetime(nysm_data.index.levels[1])],inplace=True)\n" ] } ], "source": [ "nysm_data.index.set_levels([nysm_data.index.levels[0], pd.to_datetime(nysm_data.index.levels[1])],inplace=True)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TMPCRELHSPEDGUMSDRCTPRESSLATSLONSELV
STIDTIME
ADDI2020-08-04 00:00:00+00:0023.163.71.83.0175.0957.0042.04036-77.23726507.614
2020-08-04 00:05:00+00:0023.461.42.43.6169.0957.0342.04036-77.23726507.614
2020-08-04 00:10:00+00:0023.162.22.43.7167.0956.9842.04036-77.23726507.614
2020-08-04 00:15:00+00:0023.262.62.64.3170.0957.0142.04036-77.23726507.614
2020-08-04 00:20:00+00:0023.062.82.03.7172.0956.9742.04036-77.23726507.614
\n", "
" ], "text/plain": [ " TMPC RELH SPED GUMS DRCT PRES \\\n", "STID TIME \n", "ADDI 2020-08-04 00:00:00+00:00 23.1 63.7 1.8 3.0 175.0 957.00 \n", " 2020-08-04 00:05:00+00:00 23.4 61.4 2.4 3.6 169.0 957.03 \n", " 2020-08-04 00:10:00+00:00 23.1 62.2 2.4 3.7 167.0 956.98 \n", " 2020-08-04 00:15:00+00:00 23.2 62.6 2.6 4.3 170.0 957.01 \n", " 2020-08-04 00:20:00+00:00 23.0 62.8 2.0 3.7 172.0 956.97 \n", "\n", " SLAT SLON SELV \n", "STID TIME \n", "ADDI 2020-08-04 00:00:00+00:00 42.04036 -77.23726 507.614 \n", " 2020-08-04 00:05:00+00:00 42.04036 -77.23726 507.614 \n", " 2020-08-04 00:10:00+00:00 42.04036 -77.23726 507.614 \n", " 2020-08-04 00:15:00+00:00 42.04036 -77.23726 507.614 \n", " 2020-08-04 00:20:00+00:00 42.04036 -77.23726 507.614 " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nysm_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ASOS data set needs to have its station id and date/times sorted in order for the multi-indexing to work properly." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "nyasos_data.sort_values(['STID','TIME'],inplace=True)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "nyasos_data.set_index(['STID', 'TIME'], inplace = True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convert the date/time strings. Here, we need to specify how the string is formatted." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_3772866/889004120.py:1: FutureWarning: inplace is deprecated and will be removed in a future version.\n", " nyasos_data.index.set_levels([nyasos_data.index.levels[0], pd.to_datetime(nyasos_data.index.levels[1],format=\"%y%m%d/%H%M\",utc=True)],inplace=True)\n" ] } ], "source": [ "nyasos_data.index.set_levels([nyasos_data.index.levels[0], pd.to_datetime(nyasos_data.index.levels[1],format=\"%y%m%d/%H%M\",utc=True)],inplace=True)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SLATSLONSELVTMPCDWPCRELHPMSLSPEDGUMSDRCT
STIDTIME
ALB2020-08-04 00:00:00+00:0042.75-73.8089.025.015.655.911016.52.06NaN10.0
2020-08-04 01:00:00+00:0042.75-73.8089.023.916.161.661016.61.54NaN60.0
2020-08-04 02:00:00+00:0042.75-73.8089.021.716.773.221017.00.00NaN0.0
2020-08-04 03:00:00+00:0042.75-73.8089.021.716.773.221017.00.00NaN0.0
2020-08-04 04:00:00+00:0042.75-73.8089.022.216.771.011017.00.00NaN0.0
....................................
SYR2020-08-04 19:00:00+00:0043.12-76.12124.021.119.490.021008.24.12NaN70.0
2020-08-04 20:00:00+00:0043.12-76.12124.021.119.490.021006.52.57NaN60.0
2020-08-04 21:00:00+00:0043.12-76.12124.021.118.987.261006.13.09NaN50.0
2020-08-04 22:00:00+00:0043.12-76.12124.022.818.978.651006.52.06NaN270.0
2020-08-04 23:00:00+00:0043.12-76.12124.022.818.978.651006.91.54NaN280.0
\n", "

837 rows × 10 columns

\n", "
" ], "text/plain": [ " SLAT SLON SELV TMPC DWPC RELH \\\n", "STID TIME \n", "ALB 2020-08-04 00:00:00+00:00 42.75 -73.80 89.0 25.0 15.6 55.91 \n", " 2020-08-04 01:00:00+00:00 42.75 -73.80 89.0 23.9 16.1 61.66 \n", " 2020-08-04 02:00:00+00:00 42.75 -73.80 89.0 21.7 16.7 73.22 \n", " 2020-08-04 03:00:00+00:00 42.75 -73.80 89.0 21.7 16.7 73.22 \n", " 2020-08-04 04:00:00+00:00 42.75 -73.80 89.0 22.2 16.7 71.01 \n", "... ... ... ... ... ... ... \n", "SYR 2020-08-04 19:00:00+00:00 43.12 -76.12 124.0 21.1 19.4 90.02 \n", " 2020-08-04 20:00:00+00:00 43.12 -76.12 124.0 21.1 19.4 90.02 \n", " 2020-08-04 21:00:00+00:00 43.12 -76.12 124.0 21.1 18.9 87.26 \n", " 2020-08-04 22:00:00+00:00 43.12 -76.12 124.0 22.8 18.9 78.65 \n", " 2020-08-04 23:00:00+00:00 43.12 -76.12 124.0 22.8 18.9 78.65 \n", "\n", " PMSL SPED GUMS DRCT \n", "STID TIME \n", "ALB 2020-08-04 00:00:00+00:00 1016.5 2.06 NaN 10.0 \n", " 2020-08-04 01:00:00+00:00 1016.6 1.54 NaN 60.0 \n", " 2020-08-04 02:00:00+00:00 1017.0 0.00 NaN 0.0 \n", " 2020-08-04 03:00:00+00:00 1017.0 0.00 NaN 0.0 \n", " 2020-08-04 04:00:00+00:00 1017.0 0.00 NaN 0.0 \n", "... ... ... ... ... \n", "SYR 2020-08-04 19:00:00+00:00 1008.2 4.12 NaN 70.0 \n", " 2020-08-04 20:00:00+00:00 1006.5 2.57 NaN 60.0 \n", " 2020-08-04 21:00:00+00:00 1006.1 3.09 NaN 50.0 \n", " 2020-08-04 22:00:00+00:00 1006.5 2.06 NaN 270.0 \n", " 2020-08-04 23:00:00+00:00 1006.9 1.54 NaN 280.0 \n", "\n", "[837 rows x 10 columns]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nyasos_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Add columns to a DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The NYSM data does not contain dewpoint nor sea-level pressure yet. Calculate and create columns for dewpoint and sea-level pressure." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "tmpc = nysm_data['TMPC']\n", "rh = nysm_data['RELH']\n", "pres = nysm_data['PRES']\n", "elev = nysm_data['SELV']" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "sensorHeight = .5\n", "# Reduce station pressure to SLP. Source: https://www.sandhurstweather.org.uk/barometric.pdf \n", "nysm_data['PMSL'] = pres/np.exp(-1*(elev+sensorHeight)/((tmpc+273.15) * 29.263))" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "nysm_data['DWPC'] = 243.04*(np.log(rh/100)+((17.625*tmpc)/(243.04+tmpc)))/(17.625-np.log(rh/100)-((17.625*tmpc)/(243.04+tmpc)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have calculated sea-level pressure, let's drop station pressure from the NYSM DataFrame since it does not appear in the ASOS one." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "nysm_data.drop('PRES',inplace=True,axis='columns')" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TMPCRELHSPEDGUMSDRCTSLATSLONSELVPMSLDWPC
STIDTIME
ADDI2020-08-04 00:00:00+00:0023.163.71.83.0175.042.04036-77.23726507.6141014.76774515.846049
2020-08-04 00:05:00+00:0023.461.42.43.6169.042.04036-77.23726507.6141014.73938615.554540
2020-08-04 00:10:00+00:0023.162.22.43.7167.042.04036-77.23726507.6141014.74653815.473742
2020-08-04 00:15:00+00:0023.262.62.64.3170.042.04036-77.23726507.6141014.75827815.668212
2020-08-04 00:20:00+00:0023.062.82.03.7172.042.04036-77.23726507.6141014.75601715.529141
....................................
YORK2020-08-04 23:35:00+00:0021.789.71.32.3246.042.85504-77.84776177.9421009.39110019.933345
2020-08-04 23:40:00+00:0021.790.01.01.6237.042.85504-77.84776177.9421009.45235419.987260
2020-08-04 23:45:00+00:0021.790.91.22.2222.042.85504-77.84776177.9421009.57486120.148065
2020-08-04 23:50:00+00:0021.890.81.12.1231.042.85504-77.84776177.9421009.63924420.229084
2020-08-04 23:55:00+00:0021.790.00.81.9217.042.85504-77.84776177.9421009.68716019.987260
\n", "

36288 rows × 10 columns

\n", "
" ], "text/plain": [ " TMPC RELH SPED GUMS DRCT SLAT \\\n", "STID TIME \n", "ADDI 2020-08-04 00:00:00+00:00 23.1 63.7 1.8 3.0 175.0 42.04036 \n", " 2020-08-04 00:05:00+00:00 23.4 61.4 2.4 3.6 169.0 42.04036 \n", " 2020-08-04 00:10:00+00:00 23.1 62.2 2.4 3.7 167.0 42.04036 \n", " 2020-08-04 00:15:00+00:00 23.2 62.6 2.6 4.3 170.0 42.04036 \n", " 2020-08-04 00:20:00+00:00 23.0 62.8 2.0 3.7 172.0 42.04036 \n", "... ... ... ... ... ... ... \n", "YORK 2020-08-04 23:35:00+00:00 21.7 89.7 1.3 2.3 246.0 42.85504 \n", " 2020-08-04 23:40:00+00:00 21.7 90.0 1.0 1.6 237.0 42.85504 \n", " 2020-08-04 23:45:00+00:00 21.7 90.9 1.2 2.2 222.0 42.85504 \n", " 2020-08-04 23:50:00+00:00 21.8 90.8 1.1 2.1 231.0 42.85504 \n", " 2020-08-04 23:55:00+00:00 21.7 90.0 0.8 1.9 217.0 42.85504 \n", "\n", " SLON SELV PMSL DWPC \n", "STID TIME \n", "ADDI 2020-08-04 00:00:00+00:00 -77.23726 507.614 1014.767745 15.846049 \n", " 2020-08-04 00:05:00+00:00 -77.23726 507.614 1014.739386 15.554540 \n", " 2020-08-04 00:10:00+00:00 -77.23726 507.614 1014.746538 15.473742 \n", " 2020-08-04 00:15:00+00:00 -77.23726 507.614 1014.758278 15.668212 \n", " 2020-08-04 00:20:00+00:00 -77.23726 507.614 1014.756017 15.529141 \n", "... ... ... ... ... \n", "YORK 2020-08-04 23:35:00+00:00 -77.84776 177.942 1009.391100 19.933345 \n", " 2020-08-04 23:40:00+00:00 -77.84776 177.942 1009.452354 19.987260 \n", " 2020-08-04 23:45:00+00:00 -77.84776 177.942 1009.574861 20.148065 \n", " 2020-08-04 23:50:00+00:00 -77.84776 177.942 1009.639244 20.229084 \n", " 2020-08-04 23:55:00+00:00 -77.84776 177.942 1009.687160 19.987260 \n", "\n", "[36288 rows x 10 columns]" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nysm_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Concatenate one DataFrame into another" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, the two dataframes can be merged. We'll use the `concat` method to \"glue\" the rows from the ASOS table on to the NYSM one." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "nymerge_data = pd.concat([nysm_data,nyasos_data])" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TMPCRELHSPEDGUMSDRCTSLATSLONSELVPMSLDWPC
STIDTIME
ADDI2020-08-04 00:00:00+00:0023.163.701.803.0175.042.04036-77.23726507.6141014.76774515.846049
2020-08-04 00:05:00+00:0023.461.402.403.6169.042.04036-77.23726507.6141014.73938615.554540
2020-08-04 00:10:00+00:0023.162.202.403.7167.042.04036-77.23726507.6141014.74653815.473742
2020-08-04 00:15:00+00:0023.262.602.604.3170.042.04036-77.23726507.6141014.75827815.668212
2020-08-04 00:20:00+00:0023.062.802.003.7172.042.04036-77.23726507.6141014.75601715.529141
....................................
SYR2020-08-04 19:00:00+00:0021.190.024.12NaN70.043.12000-76.12000124.0001008.20000019.400000
2020-08-04 20:00:00+00:0021.190.022.57NaN60.043.12000-76.12000124.0001006.50000019.400000
2020-08-04 21:00:00+00:0021.187.263.09NaN50.043.12000-76.12000124.0001006.10000018.900000
2020-08-04 22:00:00+00:0022.878.652.06NaN270.043.12000-76.12000124.0001006.50000018.900000
2020-08-04 23:00:00+00:0022.878.651.54NaN280.043.12000-76.12000124.0001006.90000018.900000
\n", "

37125 rows × 10 columns

\n", "
" ], "text/plain": [ " TMPC RELH SPED GUMS DRCT SLAT \\\n", "STID TIME \n", "ADDI 2020-08-04 00:00:00+00:00 23.1 63.70 1.80 3.0 175.0 42.04036 \n", " 2020-08-04 00:05:00+00:00 23.4 61.40 2.40 3.6 169.0 42.04036 \n", " 2020-08-04 00:10:00+00:00 23.1 62.20 2.40 3.7 167.0 42.04036 \n", " 2020-08-04 00:15:00+00:00 23.2 62.60 2.60 4.3 170.0 42.04036 \n", " 2020-08-04 00:20:00+00:00 23.0 62.80 2.00 3.7 172.0 42.04036 \n", "... ... ... ... ... ... ... \n", "SYR 2020-08-04 19:00:00+00:00 21.1 90.02 4.12 NaN 70.0 43.12000 \n", " 2020-08-04 20:00:00+00:00 21.1 90.02 2.57 NaN 60.0 43.12000 \n", " 2020-08-04 21:00:00+00:00 21.1 87.26 3.09 NaN 50.0 43.12000 \n", " 2020-08-04 22:00:00+00:00 22.8 78.65 2.06 NaN 270.0 43.12000 \n", " 2020-08-04 23:00:00+00:00 22.8 78.65 1.54 NaN 280.0 43.12000 \n", "\n", " SLON SELV PMSL DWPC \n", "STID TIME \n", "ADDI 2020-08-04 00:00:00+00:00 -77.23726 507.614 1014.767745 15.846049 \n", " 2020-08-04 00:05:00+00:00 -77.23726 507.614 1014.739386 15.554540 \n", " 2020-08-04 00:10:00+00:00 -77.23726 507.614 1014.746538 15.473742 \n", " 2020-08-04 00:15:00+00:00 -77.23726 507.614 1014.758278 15.668212 \n", " 2020-08-04 00:20:00+00:00 -77.23726 507.614 1014.756017 15.529141 \n", "... ... ... ... ... \n", "SYR 2020-08-04 19:00:00+00:00 -76.12000 124.000 1008.200000 19.400000 \n", " 2020-08-04 20:00:00+00:00 -76.12000 124.000 1006.500000 19.400000 \n", " 2020-08-04 21:00:00+00:00 -76.12000 124.000 1006.100000 18.900000 \n", " 2020-08-04 22:00:00+00:00 -76.12000 124.000 1006.500000 18.900000 \n", " 2020-08-04 23:00:00+00:00 -76.12000 124.000 1006.900000 18.900000 \n", "\n", "[37125 rows x 10 columns]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nymerge_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's rearrange the columns into a more logical order." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "colOrder = ['SLAT','SLON','SELV','TMPC','DWPC','RELH','PMSL','DRCT','SPED','GUMS']" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "nymerge_data = nymerge_data.reindex(columns=colOrder)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Note: Whenever you combine DataFrames, it's a very good idea to re-sort the indices. Although the new DataFrame may \"look\" correct, its indices may not be in what Pandas calls lexical order. We'll use Pandas sort_index for this purpose.
" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "nymerge_data.sort_index(ascending=True, inplace=True)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SLATSLONSELVTMPCDWPCRELHPMSLDRCTSPEDGUMS
STIDTIME
ADDI2020-08-04 00:00:00+00:0042.04036-77.23726507.61423.115.84604963.71014.767745175.01.83.0
2020-08-04 00:05:00+00:0042.04036-77.23726507.61423.415.55454061.41014.739386169.02.43.6
2020-08-04 00:10:00+00:0042.04036-77.23726507.61423.115.47374262.21014.746538167.02.43.7
2020-08-04 00:15:00+00:0042.04036-77.23726507.61423.215.66821262.61014.758278170.02.64.3
2020-08-04 00:20:00+00:0042.04036-77.23726507.61423.015.52914162.81014.756017172.02.03.7
....................................
YORK2020-08-04 23:35:00+00:0042.85504-77.84776177.94221.719.93334589.71009.391100246.01.32.3
2020-08-04 23:40:00+00:0042.85504-77.84776177.94221.719.98726090.01009.452354237.01.01.6
2020-08-04 23:45:00+00:0042.85504-77.84776177.94221.720.14806590.91009.574861222.01.22.2
2020-08-04 23:50:00+00:0042.85504-77.84776177.94221.820.22908490.81009.639244231.01.12.1
2020-08-04 23:55:00+00:0042.85504-77.84776177.94221.719.98726090.01009.687160217.00.81.9
\n", "

37125 rows × 10 columns

\n", "
" ], "text/plain": [ " SLAT SLON SELV TMPC DWPC \\\n", "STID TIME \n", "ADDI 2020-08-04 00:00:00+00:00 42.04036 -77.23726 507.614 23.1 15.846049 \n", " 2020-08-04 00:05:00+00:00 42.04036 -77.23726 507.614 23.4 15.554540 \n", " 2020-08-04 00:10:00+00:00 42.04036 -77.23726 507.614 23.1 15.473742 \n", " 2020-08-04 00:15:00+00:00 42.04036 -77.23726 507.614 23.2 15.668212 \n", " 2020-08-04 00:20:00+00:00 42.04036 -77.23726 507.614 23.0 15.529141 \n", "... ... ... ... ... ... \n", "YORK 2020-08-04 23:35:00+00:00 42.85504 -77.84776 177.942 21.7 19.933345 \n", " 2020-08-04 23:40:00+00:00 42.85504 -77.84776 177.942 21.7 19.987260 \n", " 2020-08-04 23:45:00+00:00 42.85504 -77.84776 177.942 21.7 20.148065 \n", " 2020-08-04 23:50:00+00:00 42.85504 -77.84776 177.942 21.8 20.229084 \n", " 2020-08-04 23:55:00+00:00 42.85504 -77.84776 177.942 21.7 19.987260 \n", "\n", " RELH PMSL DRCT SPED GUMS \n", "STID TIME \n", "ADDI 2020-08-04 00:00:00+00:00 63.7 1014.767745 175.0 1.8 3.0 \n", " 2020-08-04 00:05:00+00:00 61.4 1014.739386 169.0 2.4 3.6 \n", " 2020-08-04 00:10:00+00:00 62.2 1014.746538 167.0 2.4 3.7 \n", " 2020-08-04 00:15:00+00:00 62.6 1014.758278 170.0 2.6 4.3 \n", " 2020-08-04 00:20:00+00:00 62.8 1014.756017 172.0 2.0 3.7 \n", "... ... ... ... ... ... \n", "YORK 2020-08-04 23:35:00+00:00 89.7 1009.391100 246.0 1.3 2.3 \n", " 2020-08-04 23:40:00+00:00 90.0 1009.452354 237.0 1.0 1.6 \n", " 2020-08-04 23:45:00+00:00 90.9 1009.574861 222.0 1.2 2.2 \n", " 2020-08-04 23:50:00+00:00 90.8 1009.639244 231.0 1.1 2.1 \n", " 2020-08-04 23:55:00+00:00 90.0 1009.687160 217.0 0.8 1.9 \n", "\n", "[37125 rows x 10 columns]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nymerge_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Analyze the merged DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we have what we wanted ... one Dataframe that contains the same variables for both datasets. We can make selections on the multi-index Dataframe as we've done previously." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get some general information about the `DataFrame`:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "MultiIndex: 37125 entries, ('ADDI', Timestamp('2020-08-04 00:00:00+0000', tz='UTC')) to ('YORK', Timestamp('2020-08-04 23:55:00+0000', tz='UTC'))\n", "Data columns (total 10 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 SLAT 37125 non-null float64\n", " 1 SLON 37125 non-null float64\n", " 2 SELV 37125 non-null float64\n", " 3 TMPC 37125 non-null float64\n", " 4 DWPC 36141 non-null float64\n", " 5 RELH 36141 non-null float64\n", " 6 PMSL 37033 non-null float64\n", " 7 DRCT 37079 non-null float64\n", " 8 SPED 37123 non-null float64\n", " 9 GUMS 36415 non-null float64\n", "dtypes: float64(10)\n", "memory usage: 3.0+ MB\n" ] } ], "source": [ "nymerge_data.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Select one station" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SLATSLONSELVTMPCDWPCRELHPMSLDRCTSPEDGUMS
TIME
2020-08-04 00:00:00+00:0043.12-77.67169.021.120.696.971015.770.03.60NaN
2020-08-04 01:00:00+00:0043.12-77.67169.021.120.093.441015.550.02.06NaN
2020-08-04 02:00:00+00:0043.12-77.67169.020.019.496.341016.090.04.12NaN
2020-08-04 03:00:00+00:0043.12-77.67169.020.019.496.341015.9240.01.54NaN
2020-08-04 04:00:00+00:0043.12-77.67169.020.019.496.341015.4100.01.54NaN
2020-08-04 05:00:00+00:0043.12-77.67169.020.019.496.341014.7130.02.06NaN
2020-08-04 06:00:00+00:0043.12-77.67169.020.019.496.341014.1230.01.54NaN
2020-08-04 07:00:00+00:0043.12-77.67169.019.418.996.931013.40.00.00NaN
2020-08-04 08:00:00+00:0043.12-77.67169.019.418.996.931013.30.00.00NaN
2020-08-04 09:00:00+00:0043.12-77.67169.019.419.4100.001012.80.00.00NaN
2020-08-04 10:00:00+00:0043.12-77.67169.020.019.496.341012.60.00.00NaN
2020-08-04 11:00:00+00:0043.12-77.67169.019.418.996.931012.4280.01.54NaN
2020-08-04 12:00:00+00:0043.12-77.67169.020.619.492.831012.00.00.00NaN
2020-08-04 13:00:00+00:0043.12-77.67169.021.120.093.441011.90.00.00NaN
2020-08-04 14:00:00+00:0043.12-77.67169.021.120.093.441011.70.00.00NaN
2020-08-04 15:00:00+00:0043.12-77.67169.021.718.984.101011.40.00.00NaN
2020-08-04 16:00:00+00:0043.12-77.67169.021.720.090.061010.9170.02.57NaN
2020-08-04 17:00:00+00:0043.12-77.67169.021.720.693.471010.3150.02.06NaN
2020-08-04 18:00:00+00:0043.12-77.67169.022.220.087.351009.60.00.00NaN
2020-08-04 19:00:00+00:0043.12-77.67169.023.320.081.711008.810.02.06NaN
2020-08-04 20:00:00+00:0043.12-77.67169.025.020.073.781007.520.04.12NaN
2020-08-04 21:00:00+00:0043.12-77.67169.021.721.196.391008.3320.03.60NaN
2020-08-04 22:00:00+00:0043.12-77.67169.022.220.690.661008.7340.03.09NaN
2020-08-04 23:00:00+00:0043.12-77.67169.022.819.481.141009.0320.03.09NaN
\n", "
" ], "text/plain": [ " SLAT SLON SELV TMPC DWPC RELH PMSL \\\n", "TIME \n", "2020-08-04 00:00:00+00:00 43.12 -77.67 169.0 21.1 20.6 96.97 1015.7 \n", "2020-08-04 01:00:00+00:00 43.12 -77.67 169.0 21.1 20.0 93.44 1015.5 \n", "2020-08-04 02:00:00+00:00 43.12 -77.67 169.0 20.0 19.4 96.34 1016.0 \n", "2020-08-04 03:00:00+00:00 43.12 -77.67 169.0 20.0 19.4 96.34 1015.9 \n", "2020-08-04 04:00:00+00:00 43.12 -77.67 169.0 20.0 19.4 96.34 1015.4 \n", "2020-08-04 05:00:00+00:00 43.12 -77.67 169.0 20.0 19.4 96.34 1014.7 \n", "2020-08-04 06:00:00+00:00 43.12 -77.67 169.0 20.0 19.4 96.34 1014.1 \n", "2020-08-04 07:00:00+00:00 43.12 -77.67 169.0 19.4 18.9 96.93 1013.4 \n", "2020-08-04 08:00:00+00:00 43.12 -77.67 169.0 19.4 18.9 96.93 1013.3 \n", "2020-08-04 09:00:00+00:00 43.12 -77.67 169.0 19.4 19.4 100.00 1012.8 \n", "2020-08-04 10:00:00+00:00 43.12 -77.67 169.0 20.0 19.4 96.34 1012.6 \n", "2020-08-04 11:00:00+00:00 43.12 -77.67 169.0 19.4 18.9 96.93 1012.4 \n", "2020-08-04 12:00:00+00:00 43.12 -77.67 169.0 20.6 19.4 92.83 1012.0 \n", "2020-08-04 13:00:00+00:00 43.12 -77.67 169.0 21.1 20.0 93.44 1011.9 \n", "2020-08-04 14:00:00+00:00 43.12 -77.67 169.0 21.1 20.0 93.44 1011.7 \n", "2020-08-04 15:00:00+00:00 43.12 -77.67 169.0 21.7 18.9 84.10 1011.4 \n", "2020-08-04 16:00:00+00:00 43.12 -77.67 169.0 21.7 20.0 90.06 1010.9 \n", "2020-08-04 17:00:00+00:00 43.12 -77.67 169.0 21.7 20.6 93.47 1010.3 \n", "2020-08-04 18:00:00+00:00 43.12 -77.67 169.0 22.2 20.0 87.35 1009.6 \n", "2020-08-04 19:00:00+00:00 43.12 -77.67 169.0 23.3 20.0 81.71 1008.8 \n", "2020-08-04 20:00:00+00:00 43.12 -77.67 169.0 25.0 20.0 73.78 1007.5 \n", "2020-08-04 21:00:00+00:00 43.12 -77.67 169.0 21.7 21.1 96.39 1008.3 \n", "2020-08-04 22:00:00+00:00 43.12 -77.67 169.0 22.2 20.6 90.66 1008.7 \n", "2020-08-04 23:00:00+00:00 43.12 -77.67 169.0 22.8 19.4 81.14 1009.0 \n", "\n", " DRCT SPED GUMS \n", "TIME \n", "2020-08-04 00:00:00+00:00 70.0 3.60 NaN \n", "2020-08-04 01:00:00+00:00 50.0 2.06 NaN \n", "2020-08-04 02:00:00+00:00 90.0 4.12 NaN \n", "2020-08-04 03:00:00+00:00 240.0 1.54 NaN \n", "2020-08-04 04:00:00+00:00 100.0 1.54 NaN \n", "2020-08-04 05:00:00+00:00 130.0 2.06 NaN \n", "2020-08-04 06:00:00+00:00 230.0 1.54 NaN \n", "2020-08-04 07:00:00+00:00 0.0 0.00 NaN \n", "2020-08-04 08:00:00+00:00 0.0 0.00 NaN \n", "2020-08-04 09:00:00+00:00 0.0 0.00 NaN \n", "2020-08-04 10:00:00+00:00 0.0 0.00 NaN \n", "2020-08-04 11:00:00+00:00 280.0 1.54 NaN \n", "2020-08-04 12:00:00+00:00 0.0 0.00 NaN \n", "2020-08-04 13:00:00+00:00 0.0 0.00 NaN \n", "2020-08-04 14:00:00+00:00 0.0 0.00 NaN \n", "2020-08-04 15:00:00+00:00 0.0 0.00 NaN \n", "2020-08-04 16:00:00+00:00 170.0 2.57 NaN \n", "2020-08-04 17:00:00+00:00 150.0 2.06 NaN \n", "2020-08-04 18:00:00+00:00 0.0 0.00 NaN \n", "2020-08-04 19:00:00+00:00 10.0 2.06 NaN \n", "2020-08-04 20:00:00+00:00 20.0 4.12 NaN \n", "2020-08-04 21:00:00+00:00 320.0 3.60 NaN \n", "2020-08-04 22:00:00+00:00 340.0 3.09 NaN \n", "2020-08-04 23:00:00+00:00 320.0 3.09 NaN " ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nymerge_data.loc['ROC']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Select multiple stations and multiple times." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TMPCDWPC
STIDTIME
ALB2020-08-04 18:00:00+00:0019.419.400000
VOOR2020-08-04 17:45:00+00:0020.219.988491
2020-08-04 18:00:00+00:0020.320.071947
\n", "
" ], "text/plain": [ " TMPC DWPC\n", "STID TIME \n", "ALB 2020-08-04 18:00:00+00:00 19.4 19.400000\n", "VOOR 2020-08-04 17:45:00+00:00 20.2 19.988491\n", " 2020-08-04 18:00:00+00:00 20.3 20.071947" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nymerge_data.loc[(('ALB','VOOR'),('2020-08-04 17:45:00 UTC','2020-08-04 18:00:00 UTC')),('TMPC','DWPC')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the *cross-section* method to select one index value and then display all values of the other index for selected columns." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TMPCDWPCRELH
STID
ADDI20.019.42579296.50
ALB19.419.400000100.00
ANDE19.018.57839197.40
ART21.120.00000093.44
BATA23.219.51769279.80
............
WFMB17.6NaNNaN
WGAT19.318.84451197.20
WHIT21.020.58858497.50
WOLC21.1NaNNaN
YORK22.820.31594585.90
\n", "

161 rows × 3 columns

\n", "
" ], "text/plain": [ " TMPC DWPC RELH\n", "STID \n", "ADDI 20.0 19.425792 96.50\n", "ALB 19.4 19.400000 100.00\n", "ANDE 19.0 18.578391 97.40\n", "ART 21.1 20.000000 93.44\n", "BATA 23.2 19.517692 79.80\n", "... ... ... ...\n", "WFMB 17.6 NaN NaN\n", "WGAT 19.3 18.844511 97.20\n", "WHIT 21.0 20.588584 97.50\n", "WOLC 21.1 NaN NaN\n", "YORK 22.8 20.315945 85.90\n", "\n", "[161 rows x 3 columns]" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nymerge_data.xs('2020-08-04 18:00:00 UTC',level='TIME')[['TMPC','DWPC','RELH']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Tip: We could, of course, take this merged DataFrame and create a station plot with it ... which would show not only the NYSM data, but also the tradtional ASOS network of stations!
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Summary\n", "* Pandas can efficiently combine datasets with different attributes into a single `DataFrame`.\n", "* Pay attention to the row indexing in the resulting merged `DataFrame`.\n", "\n", "### What's Next?\n", "In the next notebook, we will explore Pandas' `groupby` function.\n", "\n", "## Resources and References\n", "1. [MetPy Monday Episode 94](https://www.youtube.com/watch?v=ncpYohRYG3I&list=PLQut5OXpV-0ir4IdllSt1iEZKTwFBa7kO&index=85&t=4s)\n", "1. [MetPy Monday Episode 97](https://www.youtube.com/watch?v=rj2ZEAIbg1k&list=PLQut5OXpV-0ir4IdllSt1iEZKTwFBa7kO&index=88&ab_channel=Unidata)\n", "1. [MetPy Monday Episode 98](https://www.youtube.com/watch?v=slUGaLyLJX0&list=PLQut5OXpV-0ir4IdllSt1iEZKTwFBa7kO&index=89&ab_channel=Unidata)\n", "1. [National Weather Service Automated Surface Observing Systems (ASOS)](https://www.weather.gov/asos/)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 August 2022 Environment", "language": "python", "name": "aug22" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.5" } }, "nbformat": 4, "nbformat_minor": 4 }