Contents

Pandas 3: Indexing, Subsetting, Conditionals

Contents

pandas Logo

Pandas 3: Indexing, Subsetting, Conditionals


Overview

In this notebook, we’ll work with Pandas DataFrame and Series objects to do the following:

  1. Specify a particular column to use as the row index name

  2. Select various rows, columns, or both

  3. Filter a dataset based on specified conditions

  4. Calculate and output some summary information, using data from multiple files

We’ll once again use NYS Mesonet data from 2 September 2021 at 0200 UTC.

Prerequisites

Concepts

Importance

Notes

Pandas

Necessary

Intro

  • Time to learn: 30 minutes

Imports

import pandas as pd
import numpy as np

Create and inspect the relevant DataFrames.

nysm_sites = pd.read_csv('/spare11/atm533/data/nysm_sites.csv')
nysm_data = pd.read_csv('/spare11/atm533/data/nysm_data_2021090202.csv')
nysm_data
station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
0 ADDI 2021-09-02 02:00:00 UTC 13.3 13.4 92.5 0.00 0.00 0.00 2.6 5.8 ... NaN 0.0 0.0 0.0 19.9 20.3 19.9 0.52 0.44 0.44
1 ANDE 2021-09-02 02:00:00 UTC 14.0 13.7 100.0 0.28 10.67 0.00 1.9 2.6 ... NaN 0.0 0.0 0.0 19.1 19.1 19.3 0.25 0.21 0.14
2 BATA 2021-09-02 02:00:00 UTC 14.8 16.3 77.5 0.00 0.00 0.00 1.9 2.3 ... NaN 0.0 0.0 0.0 20.5 21.5 21.3 0.25 0.21 0.22
3 BEAC 2021-09-02 02:00:00 UTC 16.0 15.9 98.6 1.93 37.53 0.48 3.2 6.5 ... NaN 0.0 0.0 0.0 18.4 19.3 19.8 0.51 0.36 0.37
4 BELD 2021-09-02 02:00:00 UTC 14.3 14.5 94.9 0.00 1.16 0.00 2.6 4.8 ... NaN 0.0 0.0 0.0 19.7 20.1 20.2 0.50 0.43 0.41
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 WFMB 2021-09-02 02:00:00 UTC 12.9 13.6 75.0 0.00 0.00 0.00 0.8 1.5 ... NaN 0.0 0.0 0.0 18.8 19.9 19.9 0.24 0.18 0.20
122 WGAT 2021-09-02 02:00:00 UTC 13.8 13.8 79.5 0.00 0.00 0.00 1.4 4.0 ... NaN 0.0 0.0 0.0 18.8 20.2 20.7 0.16 0.25 0.08
123 WHIT 2021-09-02 02:00:00 UTC 15.7 15.8 95.9 0.00 0.00 0.00 1.4 2.6 ... NaN 0.0 0.0 0.0 19.3 20.3 20.1 0.28 0.47 0.46
124 WOLC 2021-09-02 02:00:00 UTC 14.0 16.6 84.6 0.00 0.00 0.00 0.4 0.9 ... NaN 0.0 0.0 0.0 21.9 23.5 24.2 0.18 0.03 0.07
125 YORK 2021-09-02 02:00:00 UTC 12.0 13.9 96.0 0.00 0.00 0.00 0.0 0.3 ... NaN 0.0 0.0 0.0 20.5 21.8 21.9 0.13 0.24 0.24

126 rows × 30 columns

nysm_sites
stid number name lat lon elevation county nearest_city state distance_from_town [km] direction_from_town [degrees] climate_division climate_division_name wfo commissioned decommissioned
0 ADDI 107 Addison 42.040360 -77.237260 507.6140 Steuben Addison NY 6.9 S 1 Western Plateau BGM 2016-08-10 18:15:00 UTC NaN
1 ANDE 111 Andes 42.182270 -74.801390 518.2820 Delaware Andes NY 1.5 WSW 2 Eastern Plateau BGM 2016-08-04 15:55:00 UTC NaN
2 BATA 24 Batavia 43.019940 -78.135660 276.1200 Genesee Batavia NY 4.9 ENE 9 Great Lakes BUF 2016-02-18 18:40:00 UTC NaN
3 BEAC 76 Beacon 41.528750 -73.945270 90.1598 Dutchess Beacon NY 3.3 NE 5 Hudson Valley ALY 2016-08-22 16:45:00 UTC NaN
4 BELD 90 Belden 42.223220 -75.668520 470.3700 Broome Belden NY 2.2 NNE 2 Eastern Plateau BGM 2015-11-30 20:20:00 UTC NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 WFMB 14 Whiteface Mountain Base 44.393236 -73.858829 614.5990 Essex Wilmington NY 3.5 W 3 Northern Plateau BTV 2016-01-29 20:55:00 UTC NaN
122 WGAT 123 Woodgate 43.532408 -75.158597 442.9660 Oneida Woodgate NY 1.4 NNW 3 Northern Plateau BGM 2016-08-29 18:20:00 UTC NaN
123 WHIT 10 Whitehall 43.485073 -73.423071 36.5638 Washington Whitehall NY 8.0 S 7 Champlain Valley ALY 2015-08-26 20:30:00 UTC NaN
124 WOLC 79 Wolcott 43.228680 -76.842610 121.2190 Wayne Wolcott NY 2.4 WNW 9 Great Lakes BUF 2016-03-09 18:10:00 UTC NaN
125 YORK 99 York 42.855040 -77.847760 177.9420 Livingston York NY 3.6 ESE 10 Central Lakes BUF 2016-08-09 17:55:00 UTC NaN

126 rows × 16 columns

Select a row index name from a particular column

For the nysm_data DataFrame, let’s set the station column as the row index, in place of the default index. The inplace argument makes it so the DataFrame object is updated with that change.

nysm_data.set_index('station', inplace=True)
Question: How do we discover the possible arguments for the set_index function?
Answer: We can follow any function (aka "method") with a `?` to read documentation on it.
nysm_data.set_index?

One of the arguments to set_index is drop. If true, it will drop (i.e., delete) the corresponding column from the DataFrame. Its default is True, so we don’t need to include it if that’s what we want.

Let’s look at the first couple rows in the DataFrame to verify that the index got changed.

nysm_data.head(2)
time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] wind_speed_stddev_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
station
ADDI 2021-09-02 02:00:00 UTC 13.3 13.4 92.5 0.00 0.00 0.0 2.6 5.8 0.9 ... NaN 0.0 0.0 0.0 19.9 20.3 19.9 0.52 0.44 0.44
ANDE 2021-09-02 02:00:00 UTC 14.0 13.7 100.0 0.28 10.67 0.0 1.9 2.6 0.4 ... NaN 0.0 0.0 0.0 19.1 19.1 19.3 0.25 0.21 0.14

2 rows × 29 columns

Success! Now the station name is the row index.

Selecting rows and/or columns from a DataFrame using the loc and iloc operators (full documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html)

Let’s say we just want to select a few of the NYS Mesonet sites for analysis. We can select rows / columns either by name or index number. First, let’s select by name.

The loc operator can accept a single row/column name, or a list of names. Try a single site first. As loc is a type of Python list or array index operator, we place the arguments inside brackets [].

# Display data for the Manhattan (MANH) site.
nysm_data.loc['MANH']
time                                     2021-09-02 02:00:00 UTC
temp_2m [degC]                                              18.7
temp_9m [degC]                                               NaN
relative_humidity [percent]                                100.0
precip_incremental [mm]                                     4.64
precip_local [mm]                                         111.03
precip_max_intensity [mm/min]                               1.22
avg_wind_speed_prop [m/s]                                    5.4
max_wind_speed_prop [m/s]                                    9.5
wind_speed_stddev_prop [m/s]                                 2.0
wind_direction_prop [degrees]                              352.0
wind_direction_stddev_prop [degrees]                        24.0
avg_wind_speed_sonic [m/s]                                   6.6
max_wind_speed_sonic [m/s]                                  12.5
wind_speed_stddev_sonic [m/s]                                2.2
wind_direction_sonic [degrees]                             352.0
wind_direction_stddev_sonic [degrees]                       18.0
solar_insolation [W/m^2]                                       0
station_pressure [mbar]                                    989.3
snow_depth [cm]                                              NaN
frozen_soil_05cm [bit]                                       NaN
frozen_soil_25cm [bit]                                       NaN
frozen_soil_50cm [bit]                                       NaN
soil_temp_05cm [degC]                                        NaN
soil_temp_25cm [degC]                                        NaN
soil_temp_50cm [degC]                                        NaN
soil_moisture_05cm [m^3/m^3]                                 NaN
soil_moisture_25cm [m^3/m^3]                                 NaN
soil_moisture_50cm [m^3/m^3]                                 NaN
Name: MANH, dtype: object

Multiple sites can be passed as a Python list … with its own set of brackets []! This list can be in any order. Once we select more than one row, the output is now a DataFrame … and will be nicely formatted in the notebook.

# Display data for the Rush (RUSH), Bronx (BRON) and Voorheeseville (VOOR) sites.
nysm_data.loc[['RUSH','BRON','VOOR']]
time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] wind_speed_stddev_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
station
RUSH 2021-09-02 02:00:00 UTC 11.3 12.9 98.3 0.00 0.00 0.00 0.0 0.0 0.0 ... NaN 0.0 0.0 0.0 20.5 21.3 21.3 0.26 0.25 0.25
BRON 2021-09-02 02:00:00 UTC 18.0 17.8 99.0 6.86 110.35 1.84 6.8 11.5 2.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
VOOR 2021-09-02 02:00:00 UTC 14.9 14.9 98.9 0.26 11.67 0.00 2.2 3.6 0.6 ... NaN 0.0 0.0 0.0 20.2 21.1 21.1 0.25 0.33 0.19

3 rows × 29 columns

We can also select by row number () using the iloc operator. Manhattan is in row 66. Recall that Python uses zero-based indexing, so the first row is element 0.

nysm_data.iloc[65]
time                                     2021-09-02 02:00:00 UTC
temp_2m [degC]                                              18.7
temp_9m [degC]                                               NaN
relative_humidity [percent]                                100.0
precip_incremental [mm]                                     4.64
precip_local [mm]                                         111.03
precip_max_intensity [mm/min]                               1.22
avg_wind_speed_prop [m/s]                                    5.4
max_wind_speed_prop [m/s]                                    9.5
wind_speed_stddev_prop [m/s]                                 2.0
wind_direction_prop [degrees]                              352.0
wind_direction_stddev_prop [degrees]                        24.0
avg_wind_speed_sonic [m/s]                                   6.6
max_wind_speed_sonic [m/s]                                  12.5
wind_speed_stddev_sonic [m/s]                                2.2
wind_direction_sonic [degrees]                             352.0
wind_direction_stddev_sonic [degrees]                       18.0
solar_insolation [W/m^2]                                       0
station_pressure [mbar]                                    989.3
snow_depth [cm]                                              NaN
frozen_soil_05cm [bit]                                       NaN
frozen_soil_25cm [bit]                                       NaN
frozen_soil_50cm [bit]                                       NaN
soil_temp_05cm [degC]                                        NaN
soil_temp_25cm [degC]                                        NaN
soil_temp_50cm [degC]                                        NaN
soil_moisture_05cm [m^3/m^3]                                 NaN
soil_moisture_25cm [m^3/m^3]                                 NaN
soil_moisture_50cm [m^3/m^3]                                 NaN
Name: MANH, dtype: object

We can pass in a list to iloc too:

# Display data for the 1st, 32nd, and 88th row in the DataFrame.
nysm_data.iloc[[0,31,87]]
time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] wind_speed_stddev_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
station
ADDI 2021-09-02 02:00:00 UTC 13.3 13.4 92.5 0.0 0.0 0.0 2.6 5.8 0.9 ... NaN 0.0 0.0 0.0 19.9 20.3 19.9 0.52 0.44 0.44
COPE 2021-09-02 02:00:00 UTC 12.2 12.3 94.4 0.0 0.0 0.0 0.8 1.7 0.4 ... NaN 0.0 0.0 NaN 19.2 19.5 NaN 0.45 0.26 NaN
REDF 2021-09-02 02:00:00 UTC 12.4 12.5 82.1 0.0 0.0 0.0 1.8 3.2 0.6 ... NaN 0.0 0.0 0.0 18.9 19.2 18.9 0.41 0.41 0.29

3 rows × 29 columns

Now let’s select columns. To do so, we first select rows (in this case, we’ll select all rows, as indicated by :), and then continue with a list of columns.

nysm_data.loc[:,['time','temp_2m [degC]','precip_incremental [mm]']]
time temp_2m [degC] precip_incremental [mm]
station
ADDI 2021-09-02 02:00:00 UTC 13.3 0.00
ANDE 2021-09-02 02:00:00 UTC 14.0 0.28
BATA 2021-09-02 02:00:00 UTC 14.8 0.00
BEAC 2021-09-02 02:00:00 UTC 16.0 1.93
BELD 2021-09-02 02:00:00 UTC 14.3 0.00
... ... ... ...
WFMB 2021-09-02 02:00:00 UTC 12.9 0.00
WGAT 2021-09-02 02:00:00 UTC 13.8 0.00
WHIT 2021-09-02 02:00:00 UTC 15.7 0.00
WOLC 2021-09-02 02:00:00 UTC 14.0 0.00
YORK 2021-09-02 02:00:00 UTC 12.0 0.00

126 rows × 3 columns

We can specify specific rows and columns

nysm_data.loc[['BATA','COLD','QUEE'],['time','temp_2m [degC]','precip_incremental [mm]']]
time temp_2m [degC] precip_incremental [mm]
station
BATA 2021-09-02 02:00:00 UTC 14.8 0.00
COLD 2021-09-02 02:00:00 UTC 14.0 0.00
QUEE 2021-09-02 02:00:00 UTC 19.0 11.95

We can use index slicing with iloc; note that the last element is excluded … like typical Python slicing behavior.

nysm_data.iloc[0:4] # Will display first four rows of DataFrame
time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] wind_speed_stddev_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
station
ADDI 2021-09-02 02:00:00 UTC 13.3 13.4 92.5 0.00 0.00 0.00 2.6 5.8 0.9 ... NaN 0.0 0.0 0.0 19.9 20.3 19.9 0.52 0.44 0.44
ANDE 2021-09-02 02:00:00 UTC 14.0 13.7 100.0 0.28 10.67 0.00 1.9 2.6 0.4 ... NaN 0.0 0.0 0.0 19.1 19.1 19.3 0.25 0.21 0.14
BATA 2021-09-02 02:00:00 UTC 14.8 16.3 77.5 0.00 0.00 0.00 1.9 2.3 0.2 ... NaN 0.0 0.0 0.0 20.5 21.5 21.3 0.25 0.21 0.22
BEAC 2021-09-02 02:00:00 UTC 16.0 15.9 98.6 1.93 37.53 0.48 3.2 6.5 1.4 ... NaN 0.0 0.0 0.0 18.4 19.3 19.8 0.51 0.36 0.37

4 rows × 29 columns

As a convenience, one can also display a specified number of rows without using the loc or iloc operators.

# Display the first four rows (indexes 0, 1, 2, 3)
nysm_data[:4]
time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] wind_speed_stddev_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
station
ADDI 2021-09-02 02:00:00 UTC 13.3 13.4 92.5 0.00 0.00 0.00 2.6 5.8 0.9 ... NaN 0.0 0.0 0.0 19.9 20.3 19.9 0.52 0.44 0.44
ANDE 2021-09-02 02:00:00 UTC 14.0 13.7 100.0 0.28 10.67 0.00 1.9 2.6 0.4 ... NaN 0.0 0.0 0.0 19.1 19.1 19.3 0.25 0.21 0.14
BATA 2021-09-02 02:00:00 UTC 14.8 16.3 77.5 0.00 0.00 0.00 1.9 2.3 0.2 ... NaN 0.0 0.0 0.0 20.5 21.5 21.3 0.25 0.21 0.22
BEAC 2021-09-02 02:00:00 UTC 16.0 15.9 98.6 1.93 37.53 0.48 3.2 6.5 1.4 ... NaN 0.0 0.0 0.0 18.4 19.3 19.8 0.51 0.36 0.37

4 rows × 29 columns

Question:This next cell will fail; why?
# This will fail; uncomment to see the error message
#nysm_data[2]
Answer:Because the *row index names* are now the four-character site ids!

We can, however, pass in a slice: in this case, rows 2 and 3 (remember, slices in Python exclude the stop value, by default).

nysm_data[2:4]
time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] wind_speed_stddev_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
station
BATA 2021-09-02 02:00:00 UTC 14.8 16.3 77.5 0.00 0.00 0.00 1.9 2.3 0.2 ... NaN 0.0 0.0 0.0 20.5 21.5 21.3 0.25 0.21 0.22
BEAC 2021-09-02 02:00:00 UTC 16.0 15.9 98.6 1.93 37.53 0.48 3.2 6.5 1.4 ... NaN 0.0 0.0 0.0 18.4 19.3 19.8 0.51 0.36 0.37

2 rows × 29 columns

Tip: In general, rely on loc and iloc when you need to perform selections.

When you select a single column from a DataFrame, a Series is returned.

Select the second column from the DataFrame (the first column would be referenced as 0 in iloc) and assign the returned Series to its own object name. Inspect this Series object.

tmpc = nysm_data.iloc[:,1]
tmpc
station
ADDI    13.3
ANDE    14.0
BATA    14.8
BEAC    16.0
BELD    14.3
        ... 
WFMB    12.9
WGAT    13.8
WHIT    15.7
WOLC    14.0
YORK    12.0
Name: temp_2m [degC], Length: 126, dtype: float64

We can employ selection operations, such as loc and iloc to Series as well as DataFrames.

Use loc to select a particular row index:

tmpc.loc['BATA']
14.8

This next line is equivalent:

tmpc['BATA']
14.8

Use iloc:

tmpc.iloc[2]
14.8

Referencing via the array index is equivalent:

tmpc[2]
14.8

Slice a Series with iloc:

tmpc.iloc[0:4]
station
ADDI    13.3
ANDE    14.0
BATA    14.8
BEAC    16.0
Name: temp_2m [degC], dtype: float64

Equivalently, slice on the array indices:

tmpc[0:4]
station
ADDI    13.3
ANDE    14.0
BATA    14.8
BEAC    16.0
Name: temp_2m [degC], dtype: float64
Note: In a departure from typical Python behavior regarding slices, slicing index names on a Series does not exclude the stop value:

Here, ‘BEAC’ is included in the returned Series:

tmpc.loc['ANDE':'BEAC']
station
ANDE    14.0
BATA    14.8
BEAC    16.0
Name: temp_2m [degC], dtype: float64
# Equivalent to previous cell
tmpc['ANDE':'BEAC']
station
ANDE    14.0
BATA    14.8
BEAC    16.0
Name: temp_2m [degC], dtype: float64

Select particular rows by expressing conditions:

Applying a conditional to a Series produces a series of booleans (i.e., True/False values … aka a truth table)

# What stations had a temperature of at least a certain threshold value?
thresh = 15
tmpc >= thresh
station
ADDI    False
ANDE    False
BATA    False
BEAC     True
BELD    False
        ...  
WFMB    False
WGAT    False
WHIT     True
WOLC    False
YORK    False
Name: temp_2m [degC], Length: 126, dtype: bool

We pass in this Boolean series to the series it was derived from, in order to produce another series containing the actual values that meet the specified condition.

tmpc[tmpc >= thresh]
station
BEAC    16.0
BERK    15.4
BKLN    19.9
BREW    16.2
BRON    18.0
COPA    15.6
DEPO    16.0
DOVE    16.0
ELMI    15.2
ESSX    15.1
FAYE    15.7
GFAL    16.0
GROT    15.1
HFAL    15.9
JORD    15.7
KIND    16.7
MANH    18.7
ONTA    18.2
OTIS    15.7
PENN    15.2
QUEE    19.0
REDH    16.6
SCHO    16.4
SOME    16.5
SOUT    19.3
SPRA    15.1
STAT    18.5
STEP    15.7
STON    20.3
SUFF    16.5
TICO    15.5
WALL    16.2
WANT    24.1
WARW    16.2
WHIT    15.7
Name: temp_2m [degC], dtype: float64
Tip: Until you get used to it, the way the above two cells are coded is a bit non-intuitive. It may make more sense if we instead set the Boolean series to its own object name:
condition = tmpc >= thresh
tmpc[condition]
station
BEAC    16.0
BERK    15.4
BKLN    19.9
BREW    16.2
BRON    18.0
COPA    15.6
DEPO    16.0
DOVE    16.0
ELMI    15.2
ESSX    15.1
FAYE    15.7
GFAL    16.0
GROT    15.1
HFAL    15.9
JORD    15.7
KIND    16.7
MANH    18.7
ONTA    18.2
OTIS    15.7
PENN    15.2
QUEE    19.0
REDH    16.6
SCHO    16.4
SOME    16.5
SOUT    19.3
SPRA    15.1
STAT    18.5
STEP    15.7
STON    20.3
SUFF    16.5
TICO    15.5
WALL    16.2
WANT    24.1
WARW    16.2
WHIT    15.7
Name: temp_2m [degC], dtype: float64

We can also apply conditions to the entire DataFrame.

Show a subset of the original DataFrame containing only those sites that recorded precip during the five-minute period between NYSM observations:

prcpMeasurable = nysm_data['precip_incremental [mm]' ] > 0.0 
prcpMeasurable
station
ADDI    False
ANDE     True
BATA    False
BEAC     True
BELD    False
        ...  
WFMB    False
WGAT    False
WHIT    False
WOLC    False
YORK    False
Name: precip_incremental [mm], Length: 126, dtype: bool
nysm_data[prcpMeasurable]
time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] wind_speed_stddev_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
station
ANDE 2021-09-02 02:00:00 UTC 14.0 13.7 100.0 0.28 10.67 0.00 1.9 2.6 0.4 ... NaN 0.0 0.0 0.0 19.1 19.1 19.3 0.25 0.21 0.14
BEAC 2021-09-02 02:00:00 UTC 16.0 15.9 98.6 1.93 37.53 0.48 3.2 6.5 1.4 ... NaN 0.0 0.0 0.0 18.4 19.3 19.8 0.51 0.36 0.37
BKLN 2021-09-02 02:00:00 UTC 19.9 20.0 97.9 3.21 75.19 0.93 3.4 9.5 2.3 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
BREW 2021-09-02 02:00:00 UTC 16.2 16.5 99.7 1.01 26.89 0.27 4.9 9.8 1.7 ... NaN 0.0 0.0 0.0 20.3 20.3 20.3 0.41 0.33 0.35
BRON 2021-09-02 02:00:00 UTC 18.0 17.8 99.0 6.86 110.35 1.84 6.8 11.5 2.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
CLAR 2021-09-02 02:00:00 UTC 13.9 14.0 97.9 0.52 13.89 0.11 0.8 2.0 0.5 ... NaN 0.0 0.0 0.0 17.9 19.1 19.3 0.22 0.22 0.15
DEPO 2021-09-02 02:00:00 UTC 16.0 16.1 90.1 0.05 3.17 0.00 2.5 5.8 1.5 ... NaN 0.0 0.0 NaN 19.9 20.7 NaN 0.17 0.27 NaN
DOVE 2021-09-02 02:00:00 UTC 16.0 NaN 95.2 1.35 24.69 0.36 2.1 4.1 0.8 ... NaN 0.0 0.0 0.0 19.6 21.5 21.5 0.39 0.22 0.15
DUAN 2021-09-02 02:00:00 UTC 13.2 13.1 100.0 0.36 6.11 0.00 0.9 2.5 0.5 ... NaN 0.0 0.0 0.0 19.5 20.2 20.1 0.55 0.38 0.42
ELDR 2021-09-02 02:00:00 UTC 14.9 NaN 98.6 0.64 16.78 0.16 0.9 2.3 0.4 ... NaN 0.0 0.0 0.0 19.1 20.1 20.7 0.38 0.52 0.51
HARP 2021-09-02 02:00:00 UTC 13.4 13.6 96.6 0.18 10.54 0.00 3.1 5.7 1.2 ... NaN 0.0 0.0 0.0 18.6 19.3 19.0 0.31 0.28 0.52
HFAL 2021-09-02 02:00:00 UTC 15.9 15.9 99.5 0.86 21.89 0.20 4.3 7.5 1.3 ... NaN 0.0 0.0 0.0 19.7 21.1 22.1 0.47 0.35 0.35
KIND 2021-09-02 02:00:00 UTC 16.7 16.7 98.6 0.73 9.75 0.19 2.3 5.1 1.0 ... NaN 0.0 0.0 0.0 21.3 21.9 21.7 0.46 0.41 0.44
MANH 2021-09-02 02:00:00 UTC 18.7 NaN 100.0 4.64 111.03 1.22 5.4 9.5 2.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
MEDU 2021-09-02 02:00:00 UTC 14.2 14.2 94.4 0.56 10.39 0.15 3.1 5.4 1.1 ... NaN 0.0 0.0 0.0 17.7 19.1 18.8 0.39 0.37 0.38
NBRA 2021-09-02 02:00:00 UTC 14.8 14.9 94.0 0.27 8.95 0.00 1.4 3.2 0.8 ... NaN 0.0 0.0 0.0 18.9 20.7 20.5 0.44 0.43 0.50
OTIS 2021-09-02 02:00:00 UTC 15.7 15.6 98.8 0.88 28.49 0.19 1.1 2.6 0.6 ... NaN 0.0 0.0 NaN 18.4 20.4 NaN 0.34 0.28 NaN
QUEE 2021-09-02 02:00:00 UTC 19.0 NaN 98.0 11.95 63.43 2.68 10.0 16.5 2.5 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
REDH 2021-09-02 02:00:00 UTC 16.6 16.6 97.8 0.82 19.73 0.24 3.7 6.6 1.2 ... NaN 0.0 0.0 0.0 21.9 23.9 24.3 0.28 0.25 0.30
ROXB 2021-09-02 02:00:00 UTC 14.0 14.0 98.2 0.60 11.85 0.14 3.0 5.3 0.9 ... NaN 0.0 0.0 NaN 17.2 17.3 NaN 0.55 0.52 NaN
SCHA 2021-09-02 02:00:00 UTC 14.9 15.1 98.1 0.53 10.01 0.12 0.5 1.4 0.4 ... NaN 0.0 0.0 0.0 21.5 22.1 22.4 0.22 0.14 0.05
SCHO 2021-09-02 02:00:00 UTC 16.4 16.2 99.9 0.55 9.07 0.18 1.6 4.5 1.0 ... NaN 0.0 0.0 0.0 21.1 21.9 21.9 0.33 0.19 0.14
SCHU 2021-09-02 02:00:00 UTC 14.8 14.7 96.3 0.07 2.75 0.00 1.3 2.4 0.4 ... NaN 0.0 0.0 0.0 21.2 22.4 22.1 0.07 0.12 0.08
SOME 2021-09-02 02:00:00 UTC 16.5 16.4 98.3 1.75 46.43 0.44 2.6 6.0 1.4 ... NaN 0.0 0.0 0.0 20.3 21.7 21.3 0.69 0.42 0.62
STAT 2021-09-02 02:00:00 UTC 18.5 18.5 97.6 0.39 93.40 0.00 6.1 9.4 1.6 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
STEP 2021-09-02 02:00:00 UTC 15.7 15.6 93.0 0.73 8.54 0.21 3.1 5.9 1.1 ... NaN 0.0 0.0 0.0 19.4 19.9 19.7 0.32 0.33 0.31
SUFF 2021-09-02 02:00:00 UTC 16.5 16.2 98.6 0.65 66.08 0.16 4.2 12.1 2.0 ... NaN 0.0 0.0 0.0 20.3 20.1 21.7 0.47 0.34 0.28
TANN 2021-09-02 02:00:00 UTC 13.1 13.4 98.9 0.80 17.90 0.17 0.8 3.5 0.7 ... NaN 0.0 0.0 0.0 18.0 18.4 18.4 0.59 0.49 0.42
VOOR 2021-09-02 02:00:00 UTC 14.9 14.9 98.9 0.26 11.67 0.00 2.2 3.6 0.6 ... NaN 0.0 0.0 0.0 20.2 21.1 21.1 0.25 0.33 0.19
WALL 2021-09-02 02:00:00 UTC 16.2 16.2 98.4 1.77 24.48 0.45 3.3 5.9 1.3 ... NaN 0.0 0.0 0.0 19.9 21.9 21.5 0.57 0.42 0.44
WANT 2021-09-02 02:00:00 UTC 24.1 24.0 98.7 0.06 3.47 0.00 8.0 13.3 1.9 ... NaN 0.0 0.0 0.0 23.7 24.4 24.3 0.22 0.05 0.05
WARW 2021-09-02 02:00:00 UTC 16.2 16.2 100.0 1.35 46.92 0.29 2.7 4.4 0.6 ... NaN 0.0 0.0 0.0 18.2 21.1 21.8 0.43 0.32 0.31
WBOU 2021-09-02 02:00:00 UTC 14.4 14.2 99.1 0.52 19.27 0.11 3.6 5.8 0.9 ... NaN 0.0 0.0 0.0 17.5 20.5 21.3 0.51 0.36 0.34

33 rows × 29 columns

Of those rows that met the conditon, display the 2-m temperatures.

nysm_data.loc[prcpMeasurable,'temp_2m [degC]']
station
ANDE    14.0
BEAC    16.0
BKLN    19.9
BREW    16.2
BRON    18.0
CLAR    13.9
DEPO    16.0
DOVE    16.0
DUAN    13.2
ELDR    14.9
HARP    13.4
HFAL    15.9
KIND    16.7
MANH    18.7
MEDU    14.2
NBRA    14.8
OTIS    15.7
QUEE    19.0
REDH    16.6
ROXB    14.0
SCHA    14.9
SCHO    16.4
SCHU    14.8
SOME    16.5
STAT    18.5
STEP    15.7
SUFF    16.5
TANN    13.1
VOOR    14.9
WALL    16.2
WANT    24.1
WARW    16.2
WBOU    14.4
Name: temp_2m [degC], dtype: float64

Now filter based on two conditions:

WarmWet = (nysm_data['precip_incremental [mm]'] > 0.0) & (nysm_data['temp_2m [degC]'] > thresh)
WarmWet
station
ADDI    False
ANDE    False
BATA    False
BEAC     True
BELD    False
        ...  
WFMB    False
WGAT    False
WHIT    False
WOLC    False
YORK    False
Length: 126, dtype: bool
nysm_data.loc[WarmWet]
time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] wind_speed_stddev_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
station
BEAC 2021-09-02 02:00:00 UTC 16.0 15.9 98.6 1.93 37.53 0.48 3.2 6.5 1.4 ... NaN 0.0 0.0 0.0 18.4 19.3 19.8 0.51 0.36 0.37
BKLN 2021-09-02 02:00:00 UTC 19.9 20.0 97.9 3.21 75.19 0.93 3.4 9.5 2.3 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
BREW 2021-09-02 02:00:00 UTC 16.2 16.5 99.7 1.01 26.89 0.27 4.9 9.8 1.7 ... NaN 0.0 0.0 0.0 20.3 20.3 20.3 0.41 0.33 0.35
BRON 2021-09-02 02:00:00 UTC 18.0 17.8 99.0 6.86 110.35 1.84 6.8 11.5 2.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
DEPO 2021-09-02 02:00:00 UTC 16.0 16.1 90.1 0.05 3.17 0.00 2.5 5.8 1.5 ... NaN 0.0 0.0 NaN 19.9 20.7 NaN 0.17 0.27 NaN
DOVE 2021-09-02 02:00:00 UTC 16.0 NaN 95.2 1.35 24.69 0.36 2.1 4.1 0.8 ... NaN 0.0 0.0 0.0 19.6 21.5 21.5 0.39 0.22 0.15
HFAL 2021-09-02 02:00:00 UTC 15.9 15.9 99.5 0.86 21.89 0.20 4.3 7.5 1.3 ... NaN 0.0 0.0 0.0 19.7 21.1 22.1 0.47 0.35 0.35
KIND 2021-09-02 02:00:00 UTC 16.7 16.7 98.6 0.73 9.75 0.19 2.3 5.1 1.0 ... NaN 0.0 0.0 0.0 21.3 21.9 21.7 0.46 0.41 0.44
MANH 2021-09-02 02:00:00 UTC 18.7 NaN 100.0 4.64 111.03 1.22 5.4 9.5 2.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
OTIS 2021-09-02 02:00:00 UTC 15.7 15.6 98.8 0.88 28.49 0.19 1.1 2.6 0.6 ... NaN 0.0 0.0 NaN 18.4 20.4 NaN 0.34 0.28 NaN
QUEE 2021-09-02 02:00:00 UTC 19.0 NaN 98.0 11.95 63.43 2.68 10.0 16.5 2.5 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
REDH 2021-09-02 02:00:00 UTC 16.6 16.6 97.8 0.82 19.73 0.24 3.7 6.6 1.2 ... NaN 0.0 0.0 0.0 21.9 23.9 24.3 0.28 0.25 0.30
SCHO 2021-09-02 02:00:00 UTC 16.4 16.2 99.9 0.55 9.07 0.18 1.6 4.5 1.0 ... NaN 0.0 0.0 0.0 21.1 21.9 21.9 0.33 0.19 0.14
SOME 2021-09-02 02:00:00 UTC 16.5 16.4 98.3 1.75 46.43 0.44 2.6 6.0 1.4 ... NaN 0.0 0.0 0.0 20.3 21.7 21.3 0.69 0.42 0.62
STAT 2021-09-02 02:00:00 UTC 18.5 18.5 97.6 0.39 93.40 0.00 6.1 9.4 1.6 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
STEP 2021-09-02 02:00:00 UTC 15.7 15.6 93.0 0.73 8.54 0.21 3.1 5.9 1.1 ... NaN 0.0 0.0 0.0 19.4 19.9 19.7 0.32 0.33 0.31
SUFF 2021-09-02 02:00:00 UTC 16.5 16.2 98.6 0.65 66.08 0.16 4.2 12.1 2.0 ... NaN 0.0 0.0 0.0 20.3 20.1 21.7 0.47 0.34 0.28
WALL 2021-09-02 02:00:00 UTC 16.2 16.2 98.4 1.77 24.48 0.45 3.3 5.9 1.3 ... NaN 0.0 0.0 0.0 19.9 21.9 21.5 0.57 0.42 0.44
WANT 2021-09-02 02:00:00 UTC 24.1 24.0 98.7 0.06 3.47 0.00 8.0 13.3 1.9 ... NaN 0.0 0.0 0.0 23.7 24.4 24.3 0.22 0.05 0.05
WARW 2021-09-02 02:00:00 UTC 16.2 16.2 100.0 1.35 46.92 0.29 2.7 4.4 0.6 ... NaN 0.0 0.0 0.0 18.2 21.1 21.8 0.43 0.32 0.31

20 rows × 29 columns

Question: What if you wanted to produce some text output for each time, expressing certain extreme values for a variety of parameters, and reference it to a column in another DataFrame?
Answer: Use the idxmax / idxmin method to display the row label of the maximum / minimum value in a Series.

Let’s create Series objects from the NYS Mesonet Station Info table, as well as from the data table.

stid = nysm_sites['stid']
lats = nysm_sites['lat']
lons = nysm_sites['lon']

time = nysm_data['time']
tmpc = nysm_data['temp_2m [degC]']
rh   = nysm_data['relative_humidity [percent]']
pres = nysm_data['station_pressure [mbar]']
wspd = nysm_data['max_wind_speed_prop [m/s]']
drct = nysm_data['wind_direction_prop [degrees]']
pinc = nysm_data['precip_incremental [mm]']
ptot = nysm_data['precip_local [mm]']
pint = nysm_data['precip_max_intensity [mm/min]']

What is the maximum 2m temperature of all stations in the DataFrame?

tmpcMax = tmpc.max()
tmpcMax
24.1

In what row # occurs the maximum temperature? We might think to use the idxmax method …

siteTmpcMax = tmpc.idxmax()
siteTmpcMax
'WANT'

That works, but it outputs the row index name (recall that when we read in the data file, we specified that the site ID column woudl be used for the row index name).

Let’s use the argmax method to get the row index number.

indxTmpcMax = tmpc.argmax()
indxTmpcMax
114

We can do the same for another extrema … e.g., minimum 2m temperature.

tmpcMin = tmpc.min()
siteTmpcMin = tmpc.idxmin()
indxTmpcMin = tmpc.argmin()

Knowing the row index #, we can get the corresponding full station name from the NYSM station table:

siteNameTmpcMax = nysm_sites['name'].iloc[indxTmpcMax]
siteNameTmpcMin = nysm_sites['name'].iloc[indxTmpcMin]
print ("Warmest NYSM Site: %s (%s), %.1f" % (siteNameTmpcMax,siteTmpcMax, tmpcMax))
print ("Coldest NYSM Site: %s (%s), %.1f" % (siteNameTmpcMin,siteTmpcMin, tmpcMin))
Warmest NYSM Site: Wantagh (WANT), 24.1
Coldest NYSM Site: Edwards (EDWA), 10.3
Exercise: find and output the highest and lowest sea-level pressure from the dataset.

Consider: would you want to just find the max/min station pressure values, and then convert those two values to SLP? Or would it be better to first apply the SLP reduction to the entire station pressure Series?

Solution: It's possible that the max/min station pressure values might not also correspond to the stations with max/min SLP, since there is elevation dependence.

So, first create a series of the station pressure values, and then apply the SLP reduction formula. This requires that we retrieve the elevation data from the site file as well.

pres = nysm_data['station_pressure [mbar]']
elev = nysm_sites['elevation']

Now, it should be as straightforward as passing these two Series to the SLP reduction formula … shouldn’t it?

Employ the SLP reduction equation that we used in the previous notebook:

sensorHeight = .5
# Reduce station pressure to SLP. Source: https://www.sandhurstweather.org.uk/barometric.pdf 
slp = pres/np.exp(-1*(elev+sensorHeight)/((tmpc+273.15) * 29.263))

Looks like it worked! Let’s see what we get for max/min values:

print(slp.max(), slp.min())
nan nan

Uh-oh … nan signifies missing data! Let’s look at what our slp Series object looks like:

slp
0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
        ..
WFMB   NaN
WGAT   NaN
WHIT   NaN
WOLC   NaN
YORK   NaN
Length: 252, dtype: float64

Look at the length of the series! It’s 252 … which is 126 times 2!

Let’s just look at all the rows of the series first …

pd.set_option('display.max_rows',None)
slp
0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
5      NaN
6      NaN
7      NaN
8      NaN
9      NaN
10     NaN
11     NaN
12     NaN
13     NaN
14     NaN
15     NaN
16     NaN
17     NaN
18     NaN
19     NaN
20     NaN
21     NaN
22     NaN
23     NaN
24     NaN
25     NaN
26     NaN
27     NaN
28     NaN
29     NaN
30     NaN
31     NaN
32     NaN
33     NaN
34     NaN
35     NaN
36     NaN
37     NaN
38     NaN
39     NaN
40     NaN
41     NaN
42     NaN
43     NaN
44     NaN
45     NaN
46     NaN
47     NaN
48     NaN
49     NaN
50     NaN
51     NaN
52     NaN
53     NaN
54     NaN
55     NaN
56     NaN
57     NaN
58     NaN
59     NaN
60     NaN
61     NaN
62     NaN
63     NaN
64     NaN
65     NaN
66     NaN
67     NaN
68     NaN
69     NaN
70     NaN
71     NaN
72     NaN
73     NaN
74     NaN
75     NaN
76     NaN
77     NaN
78     NaN
79     NaN
80     NaN
81     NaN
82     NaN
83     NaN
84     NaN
85     NaN
86     NaN
87     NaN
88     NaN
89     NaN
90     NaN
91     NaN
92     NaN
93     NaN
94     NaN
95     NaN
96     NaN
97     NaN
98     NaN
99     NaN
100    NaN
101    NaN
102    NaN
103    NaN
104    NaN
105    NaN
106    NaN
107    NaN
108    NaN
109    NaN
110    NaN
111    NaN
112    NaN
113    NaN
114    NaN
115    NaN
116    NaN
117    NaN
118    NaN
119    NaN
120    NaN
121    NaN
122    NaN
123    NaN
124    NaN
125    NaN
ADDI   NaN
ANDE   NaN
BATA   NaN
BEAC   NaN
BELD   NaN
BELL   NaN
BELM   NaN
BERK   NaN
BING   NaN
BKLN   NaN
BRAN   NaN
BREW   NaN
BROC   NaN
BRON   NaN
BROO   NaN
BSPA   NaN
BUFF   NaN
BURD   NaN
BURT   NaN
CAMD   NaN
CAPE   NaN
CHAZ   NaN
CHES   NaN
CINC   NaN
CLAR   NaN
CLIF   NaN
CLYM   NaN
COBL   NaN
COHO   NaN
COLD   NaN
COPA   NaN
COPE   NaN
CROG   NaN
CSQR   NaN
DELE   NaN
DEPO   NaN
DOVE   NaN
DUAN   NaN
EAUR   NaN
EDIN   NaN
EDWA   NaN
ELDR   NaN
ELLE   NaN
ELMI   NaN
ESSX   NaN
FAYE   NaN
FRED   NaN
GABR   NaN
GFAL   NaN
GFLD   NaN
GROT   NaN
GROV   NaN
HAMM   NaN
HARP   NaN
HARR   NaN
HART   NaN
HERK   NaN
HFAL   NaN
ILAK   NaN
JOHN   NaN
JORD   NaN
KIND   NaN
LAUR   NaN
LOUI   NaN
MALO   NaN
MANH   NaN
MEDI   NaN
MEDU   NaN
MORR   NaN
NBRA   NaN
NEWC   NaN
NHUD   NaN
OLDF   NaN
OLEA   NaN
ONTA   NaN
OPPE   NaN
OSCE   NaN
OSWE   NaN
OTIS   NaN
OWEG   NaN
PENN   NaN
PHIL   NaN
PISE   NaN
POTS   NaN
QUEE   NaN
RAND   NaN
RAQU   NaN
REDF   NaN
REDH   NaN
ROXB   NaN
RUSH   NaN
SARA   NaN
SBRI   NaN
SCHA   NaN
SCHO   NaN
SCHU   NaN
SCIP   NaN
SHER   NaN
SOME   NaN
SOUT   NaN
SPRA   NaN
SPRI   NaN
STAT   NaN
STEP   NaN
STON   NaN
SUFF   NaN
TANN   NaN
TICO   NaN
TULL   NaN
TUPP   NaN
TYRO   NaN
VOOR   NaN
WALL   NaN
WALT   NaN
WANT   NaN
WARS   NaN
WARW   NaN
WATE   NaN
WBOU   NaN
WELL   NaN
WEST   NaN
WFMB   NaN
WGAT   NaN
WHIT   NaN
WOLC   NaN
YORK   NaN
dtype: float64

Missing (i.e. NaN) values everywhere, and indeed, we have double the number of rows that we thought we’d have!

Let’s take a look at our elev and pres Series objects.

# go back to default # of rows displayed
pd.set_option('display.max_rows',10)
elev
0      507.6140
1      518.2820
2      276.1200
3       90.1598
4      470.3700
         ...   
121    614.5990
122    442.9660
123     36.5638
124    121.2190
125    177.9420
Name: elevation, Length: 126, dtype: float64
pres
station
ADDI     951.47
ANDE     947.40
BATA     979.93
BEAC     994.69
BELD     954.02
         ...   
WFMB     941.26
WGAT     959.46
WHIT    1006.29
WOLC     996.90
YORK     991.17
Name: station_pressure [mbar], Length: 126, dtype: float64

The issue is that our elev and pres series have different row indexes! In order for calculations to work when applied to different Series objects, their indexes must correspond!

Recall that we set the row index for the nysm_data DataFrame to station id, but kept the default for nysm_sites.

Since tmpc and pres both have their row index set to station id, let’s set elev’s index to that as well.

elev.index = pres.index
elev
station
ADDI    507.6140
ANDE    518.2820
BATA    276.1200
BEAC     90.1598
BELD    470.3700
          ...   
WFMB    614.5990
WGAT    442.9660
WHIT     36.5638
WOLC    121.2190
YORK    177.9420
Name: elevation, Length: 126, dtype: float64

Now, repeat the SLP reduction calculation.

slp = pres/np.exp(-1*(elev+sensorHeight)/((tmpc+273.15) * 29.263))
slpMax = slp.max()
slpMin = slp.min()
print (slpMax, slpMin)
1014.0598722184151 998.940242497739

With the row indexes common among all three Series in the equation, it works!

We can now find the corresponding row numbers and summarize as we did for max/min temperature.

siteSlpMax = slp.idxmax()
indxSlpMax = slp.argmax()
siteSlpMin = slp.idxmin()
indxSlpMin = slp.argmin()
siteNameSlpMax = nysm_sites['name'].iloc[indxSlpMax]
siteNameSlpMin = nysm_sites['name'].iloc[indxSlpMin]
print ("Highest SLP NYSM Site: %s (%s), %.1f" % (siteNameSlpMax,siteSlpMax, slpMax))
print ("Lowest  SLP NYSM Site: %s (%s), %.1f" % (siteNameSlpMin,siteSlpMin, slpMin))
Highest SLP NYSM Site: Clymer (CLYM), 1014.1
Lowest  SLP NYSM Site: Brooklyn (BKLN), 998.9
Tip: Be mindful of how you define row indexes when working with multiple `DataFrame`s!

Summary

  • Use the loc and iloc methods to select rows, columns, or combinations of the two in a Pandas DataFrame.

  • These two methods can also be used for Series.

  • Setting conditions on particular rows or columns produce True/False values, which can then be passed into a DataFrame or Series for more meaningful output.

  • The idxmax / idxmin and argmax / argminfunctions can be combined to provide meaningful output from multiple DataFrames.

What’s Next?

In the next notebook, we will explore how Pandas lets us efficiently work with data that are date- and time-based.

Resources and References

  1. Selecting in Pandas

  2. Pandas User Guide