Pandas 3: Indexing, Subsetting, Conditionals
Contents
Pandas 3: Indexing, Subsetting, Conditionals¶
Overview¶
In this notebook, we’ll work with Pandas DataFrame
and Series
objects to do the following:
Specify a particular column to use as the row index name
Select various rows, columns, or both
Filter a dataset based on specified conditions
Calculate and output some summary information, using data from multiple files
We’ll once again use NYS Mesonet data from 2 September 2021 at 0200 UTC.¶
Imports¶
import pandas as pd
import numpy as np
Create and inspect the relevant DataFrame
s.
nysm_sites = pd.read_csv('/spare11/atm533/data/nysm_sites.csv')
nysm_data = pd.read_csv('/spare11/atm533/data/nysm_data_2021090202.csv')
nysm_data
station | time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ADDI | 2021-09-02 02:00:00 UTC | 13.3 | 13.4 | 92.5 | 0.00 | 0.00 | 0.00 | 2.6 | 5.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.9 | 20.3 | 19.9 | 0.52 | 0.44 | 0.44 |
1 | ANDE | 2021-09-02 02:00:00 UTC | 14.0 | 13.7 | 100.0 | 0.28 | 10.67 | 0.00 | 1.9 | 2.6 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.1 | 19.1 | 19.3 | 0.25 | 0.21 | 0.14 |
2 | BATA | 2021-09-02 02:00:00 UTC | 14.8 | 16.3 | 77.5 | 0.00 | 0.00 | 0.00 | 1.9 | 2.3 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.5 | 21.5 | 21.3 | 0.25 | 0.21 | 0.22 |
3 | BEAC | 2021-09-02 02:00:00 UTC | 16.0 | 15.9 | 98.6 | 1.93 | 37.53 | 0.48 | 3.2 | 6.5 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.4 | 19.3 | 19.8 | 0.51 | 0.36 | 0.37 |
4 | BELD | 2021-09-02 02:00:00 UTC | 14.3 | 14.5 | 94.9 | 0.00 | 1.16 | 0.00 | 2.6 | 4.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.7 | 20.1 | 20.2 | 0.50 | 0.43 | 0.41 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
121 | WFMB | 2021-09-02 02:00:00 UTC | 12.9 | 13.6 | 75.0 | 0.00 | 0.00 | 0.00 | 0.8 | 1.5 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.8 | 19.9 | 19.9 | 0.24 | 0.18 | 0.20 |
122 | WGAT | 2021-09-02 02:00:00 UTC | 13.8 | 13.8 | 79.5 | 0.00 | 0.00 | 0.00 | 1.4 | 4.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.8 | 20.2 | 20.7 | 0.16 | 0.25 | 0.08 |
123 | WHIT | 2021-09-02 02:00:00 UTC | 15.7 | 15.8 | 95.9 | 0.00 | 0.00 | 0.00 | 1.4 | 2.6 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.3 | 20.3 | 20.1 | 0.28 | 0.47 | 0.46 |
124 | WOLC | 2021-09-02 02:00:00 UTC | 14.0 | 16.6 | 84.6 | 0.00 | 0.00 | 0.00 | 0.4 | 0.9 | ... | NaN | 0.0 | 0.0 | 0.0 | 21.9 | 23.5 | 24.2 | 0.18 | 0.03 | 0.07 |
125 | YORK | 2021-09-02 02:00:00 UTC | 12.0 | 13.9 | 96.0 | 0.00 | 0.00 | 0.00 | 0.0 | 0.3 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.5 | 21.8 | 21.9 | 0.13 | 0.24 | 0.24 |
126 rows × 30 columns
nysm_sites
stid | number | name | lat | lon | elevation | county | nearest_city | state | distance_from_town [km] | direction_from_town [degrees] | climate_division | climate_division_name | wfo | commissioned | decommissioned | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ADDI | 107 | Addison | 42.040360 | -77.237260 | 507.6140 | Steuben | Addison | NY | 6.9 | S | 1 | Western Plateau | BGM | 2016-08-10 18:15:00 UTC | NaN |
1 | ANDE | 111 | Andes | 42.182270 | -74.801390 | 518.2820 | Delaware | Andes | NY | 1.5 | WSW | 2 | Eastern Plateau | BGM | 2016-08-04 15:55:00 UTC | NaN |
2 | BATA | 24 | Batavia | 43.019940 | -78.135660 | 276.1200 | Genesee | Batavia | NY | 4.9 | ENE | 9 | Great Lakes | BUF | 2016-02-18 18:40:00 UTC | NaN |
3 | BEAC | 76 | Beacon | 41.528750 | -73.945270 | 90.1598 | Dutchess | Beacon | NY | 3.3 | NE | 5 | Hudson Valley | ALY | 2016-08-22 16:45:00 UTC | NaN |
4 | BELD | 90 | Belden | 42.223220 | -75.668520 | 470.3700 | Broome | Belden | NY | 2.2 | NNE | 2 | Eastern Plateau | BGM | 2015-11-30 20:20:00 UTC | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
121 | WFMB | 14 | Whiteface Mountain Base | 44.393236 | -73.858829 | 614.5990 | Essex | Wilmington | NY | 3.5 | W | 3 | Northern Plateau | BTV | 2016-01-29 20:55:00 UTC | NaN |
122 | WGAT | 123 | Woodgate | 43.532408 | -75.158597 | 442.9660 | Oneida | Woodgate | NY | 1.4 | NNW | 3 | Northern Plateau | BGM | 2016-08-29 18:20:00 UTC | NaN |
123 | WHIT | 10 | Whitehall | 43.485073 | -73.423071 | 36.5638 | Washington | Whitehall | NY | 8.0 | S | 7 | Champlain Valley | ALY | 2015-08-26 20:30:00 UTC | NaN |
124 | WOLC | 79 | Wolcott | 43.228680 | -76.842610 | 121.2190 | Wayne | Wolcott | NY | 2.4 | WNW | 9 | Great Lakes | BUF | 2016-03-09 18:10:00 UTC | NaN |
125 | YORK | 99 | York | 42.855040 | -77.847760 | 177.9420 | Livingston | York | NY | 3.6 | ESE | 10 | Central Lakes | BUF | 2016-08-09 17:55:00 UTC | NaN |
126 rows × 16 columns
Select a row index name from a particular column¶
For the nysm_data
DataFrame, let’s set the station column as the row index
, in place of the default index. The inplace
argument makes it so the DataFrame
object is updated with that change.
nysm_data.set_index('station', inplace=True)
set_index
function? nysm_data.set_index?
One of the arguments to set_index
is drop
. If true, it will drop (i.e., delete) the corresponding column from the DataFrame
. Its default is True, so we don’t need to include it if that’s what we want.
Let’s look at the first couple rows in the DataFrame
to verify that the index got changed.
nysm_data.head(2)
time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | wind_speed_stddev_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
station | |||||||||||||||||||||
ADDI | 2021-09-02 02:00:00 UTC | 13.3 | 13.4 | 92.5 | 0.00 | 0.00 | 0.0 | 2.6 | 5.8 | 0.9 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.9 | 20.3 | 19.9 | 0.52 | 0.44 | 0.44 |
ANDE | 2021-09-02 02:00:00 UTC | 14.0 | 13.7 | 100.0 | 0.28 | 10.67 | 0.0 | 1.9 | 2.6 | 0.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.1 | 19.1 | 19.3 | 0.25 | 0.21 | 0.14 |
2 rows × 29 columns
Success! Now the station name is the row index.
Selecting rows and/or columns from a DataFrame
using the loc
and iloc
operators (full documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html)¶
Let’s say we just want to select a few of the NYS Mesonet sites for analysis. We can select rows / columns either by name or index number. First, let’s select by name.¶
The loc
operator can accept a single row/column name, or a list of names. Try a single site first. As loc
is a type of Python list or array index operator, we place the arguments inside brackets []
.
# Display data for the Manhattan (MANH) site.
nysm_data.loc['MANH']
time 2021-09-02 02:00:00 UTC
temp_2m [degC] 18.7
temp_9m [degC] NaN
relative_humidity [percent] 100.0
precip_incremental [mm] 4.64
precip_local [mm] 111.03
precip_max_intensity [mm/min] 1.22
avg_wind_speed_prop [m/s] 5.4
max_wind_speed_prop [m/s] 9.5
wind_speed_stddev_prop [m/s] 2.0
wind_direction_prop [degrees] 352.0
wind_direction_stddev_prop [degrees] 24.0
avg_wind_speed_sonic [m/s] 6.6
max_wind_speed_sonic [m/s] 12.5
wind_speed_stddev_sonic [m/s] 2.2
wind_direction_sonic [degrees] 352.0
wind_direction_stddev_sonic [degrees] 18.0
solar_insolation [W/m^2] 0
station_pressure [mbar] 989.3
snow_depth [cm] NaN
frozen_soil_05cm [bit] NaN
frozen_soil_25cm [bit] NaN
frozen_soil_50cm [bit] NaN
soil_temp_05cm [degC] NaN
soil_temp_25cm [degC] NaN
soil_temp_50cm [degC] NaN
soil_moisture_05cm [m^3/m^3] NaN
soil_moisture_25cm [m^3/m^3] NaN
soil_moisture_50cm [m^3/m^3] NaN
Name: MANH, dtype: object
Multiple sites can be passed as a Python list
… with its own set of brackets []
! This list can be in any order. Once we select more than one row, the output is now a DataFrame
… and will be nicely formatted in the notebook.¶
# Display data for the Rush (RUSH), Bronx (BRON) and Voorheeseville (VOOR) sites.
nysm_data.loc[['RUSH','BRON','VOOR']]
time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | wind_speed_stddev_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
station | |||||||||||||||||||||
RUSH | 2021-09-02 02:00:00 UTC | 11.3 | 12.9 | 98.3 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.5 | 21.3 | 21.3 | 0.26 | 0.25 | 0.25 |
BRON | 2021-09-02 02:00:00 UTC | 18.0 | 17.8 | 99.0 | 6.86 | 110.35 | 1.84 | 6.8 | 11.5 | 2.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
VOOR | 2021-09-02 02:00:00 UTC | 14.9 | 14.9 | 98.9 | 0.26 | 11.67 | 0.00 | 2.2 | 3.6 | 0.6 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.2 | 21.1 | 21.1 | 0.25 | 0.33 | 0.19 |
3 rows × 29 columns
We can also select by row number () using the iloc
operator. Manhattan is in row 66. Recall that Python uses zero-based indexing, so the first row is element 0.
nysm_data.iloc[65]
time 2021-09-02 02:00:00 UTC
temp_2m [degC] 18.7
temp_9m [degC] NaN
relative_humidity [percent] 100.0
precip_incremental [mm] 4.64
precip_local [mm] 111.03
precip_max_intensity [mm/min] 1.22
avg_wind_speed_prop [m/s] 5.4
max_wind_speed_prop [m/s] 9.5
wind_speed_stddev_prop [m/s] 2.0
wind_direction_prop [degrees] 352.0
wind_direction_stddev_prop [degrees] 24.0
avg_wind_speed_sonic [m/s] 6.6
max_wind_speed_sonic [m/s] 12.5
wind_speed_stddev_sonic [m/s] 2.2
wind_direction_sonic [degrees] 352.0
wind_direction_stddev_sonic [degrees] 18.0
solar_insolation [W/m^2] 0
station_pressure [mbar] 989.3
snow_depth [cm] NaN
frozen_soil_05cm [bit] NaN
frozen_soil_25cm [bit] NaN
frozen_soil_50cm [bit] NaN
soil_temp_05cm [degC] NaN
soil_temp_25cm [degC] NaN
soil_temp_50cm [degC] NaN
soil_moisture_05cm [m^3/m^3] NaN
soil_moisture_25cm [m^3/m^3] NaN
soil_moisture_50cm [m^3/m^3] NaN
Name: MANH, dtype: object
We can pass in a list to iloc
too:¶
# Display data for the 1st, 32nd, and 88th row in the DataFrame.
nysm_data.iloc[[0,31,87]]
time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | wind_speed_stddev_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
station | |||||||||||||||||||||
ADDI | 2021-09-02 02:00:00 UTC | 13.3 | 13.4 | 92.5 | 0.0 | 0.0 | 0.0 | 2.6 | 5.8 | 0.9 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.9 | 20.3 | 19.9 | 0.52 | 0.44 | 0.44 |
COPE | 2021-09-02 02:00:00 UTC | 12.2 | 12.3 | 94.4 | 0.0 | 0.0 | 0.0 | 0.8 | 1.7 | 0.4 | ... | NaN | 0.0 | 0.0 | NaN | 19.2 | 19.5 | NaN | 0.45 | 0.26 | NaN |
REDF | 2021-09-02 02:00:00 UTC | 12.4 | 12.5 | 82.1 | 0.0 | 0.0 | 0.0 | 1.8 | 3.2 | 0.6 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.9 | 19.2 | 18.9 | 0.41 | 0.41 | 0.29 |
3 rows × 29 columns
Now let’s select columns. To do so, we first select rows (in this case, we’ll select all rows, as indicated by :
), and then continue with a list of columns.¶
nysm_data.loc[:,['time','temp_2m [degC]','precip_incremental [mm]']]
time | temp_2m [degC] | precip_incremental [mm] | |
---|---|---|---|
station | |||
ADDI | 2021-09-02 02:00:00 UTC | 13.3 | 0.00 |
ANDE | 2021-09-02 02:00:00 UTC | 14.0 | 0.28 |
BATA | 2021-09-02 02:00:00 UTC | 14.8 | 0.00 |
BEAC | 2021-09-02 02:00:00 UTC | 16.0 | 1.93 |
BELD | 2021-09-02 02:00:00 UTC | 14.3 | 0.00 |
... | ... | ... | ... |
WFMB | 2021-09-02 02:00:00 UTC | 12.9 | 0.00 |
WGAT | 2021-09-02 02:00:00 UTC | 13.8 | 0.00 |
WHIT | 2021-09-02 02:00:00 UTC | 15.7 | 0.00 |
WOLC | 2021-09-02 02:00:00 UTC | 14.0 | 0.00 |
YORK | 2021-09-02 02:00:00 UTC | 12.0 | 0.00 |
126 rows × 3 columns
We can specify specific rows and columns¶
nysm_data.loc[['BATA','COLD','QUEE'],['time','temp_2m [degC]','precip_incremental [mm]']]
time | temp_2m [degC] | precip_incremental [mm] | |
---|---|---|---|
station | |||
BATA | 2021-09-02 02:00:00 UTC | 14.8 | 0.00 |
COLD | 2021-09-02 02:00:00 UTC | 14.0 | 0.00 |
QUEE | 2021-09-02 02:00:00 UTC | 19.0 | 11.95 |
We can use index slicing with iloc
; note that the last element is excluded … like typical Python slicing behavior.¶
nysm_data.iloc[0:4] # Will display first four rows of DataFrame
time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | wind_speed_stddev_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
station | |||||||||||||||||||||
ADDI | 2021-09-02 02:00:00 UTC | 13.3 | 13.4 | 92.5 | 0.00 | 0.00 | 0.00 | 2.6 | 5.8 | 0.9 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.9 | 20.3 | 19.9 | 0.52 | 0.44 | 0.44 |
ANDE | 2021-09-02 02:00:00 UTC | 14.0 | 13.7 | 100.0 | 0.28 | 10.67 | 0.00 | 1.9 | 2.6 | 0.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.1 | 19.1 | 19.3 | 0.25 | 0.21 | 0.14 |
BATA | 2021-09-02 02:00:00 UTC | 14.8 | 16.3 | 77.5 | 0.00 | 0.00 | 0.00 | 1.9 | 2.3 | 0.2 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.5 | 21.5 | 21.3 | 0.25 | 0.21 | 0.22 |
BEAC | 2021-09-02 02:00:00 UTC | 16.0 | 15.9 | 98.6 | 1.93 | 37.53 | 0.48 | 3.2 | 6.5 | 1.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.4 | 19.3 | 19.8 | 0.51 | 0.36 | 0.37 |
4 rows × 29 columns
As a convenience, one can also display a specified number of rows without using the loc
or iloc
operators.¶
# Display the first four rows (indexes 0, 1, 2, 3)
nysm_data[:4]
time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | wind_speed_stddev_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
station | |||||||||||||||||||||
ADDI | 2021-09-02 02:00:00 UTC | 13.3 | 13.4 | 92.5 | 0.00 | 0.00 | 0.00 | 2.6 | 5.8 | 0.9 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.9 | 20.3 | 19.9 | 0.52 | 0.44 | 0.44 |
ANDE | 2021-09-02 02:00:00 UTC | 14.0 | 13.7 | 100.0 | 0.28 | 10.67 | 0.00 | 1.9 | 2.6 | 0.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.1 | 19.1 | 19.3 | 0.25 | 0.21 | 0.14 |
BATA | 2021-09-02 02:00:00 UTC | 14.8 | 16.3 | 77.5 | 0.00 | 0.00 | 0.00 | 1.9 | 2.3 | 0.2 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.5 | 21.5 | 21.3 | 0.25 | 0.21 | 0.22 |
BEAC | 2021-09-02 02:00:00 UTC | 16.0 | 15.9 | 98.6 | 1.93 | 37.53 | 0.48 | 3.2 | 6.5 | 1.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.4 | 19.3 | 19.8 | 0.51 | 0.36 | 0.37 |
4 rows × 29 columns
# This will fail; uncomment to see the error message
#nysm_data[2]
We can, however, pass in a slice: in this case, rows 2 and 3 (remember, slices in Python exclude the stop value, by default).¶
nysm_data[2:4]
time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | wind_speed_stddev_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
station | |||||||||||||||||||||
BATA | 2021-09-02 02:00:00 UTC | 14.8 | 16.3 | 77.5 | 0.00 | 0.00 | 0.00 | 1.9 | 2.3 | 0.2 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.5 | 21.5 | 21.3 | 0.25 | 0.21 | 0.22 |
BEAC | 2021-09-02 02:00:00 UTC | 16.0 | 15.9 | 98.6 | 1.93 | 37.53 | 0.48 | 3.2 | 6.5 | 1.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.4 | 19.3 | 19.8 | 0.51 | 0.36 | 0.37 |
2 rows × 29 columns
loc
and iloc
when you need to perform selections.When you select a single column from a DataFrame
, a Series
is returned.¶
Select the second column from the DataFrame
(the first column would be referenced as 0 in iloc
) and assign the returned Series
to its own object name. Inspect this Series
object.
tmpc = nysm_data.iloc[:,1]
tmpc
station
ADDI 13.3
ANDE 14.0
BATA 14.8
BEAC 16.0
BELD 14.3
...
WFMB 12.9
WGAT 13.8
WHIT 15.7
WOLC 14.0
YORK 12.0
Name: temp_2m [degC], Length: 126, dtype: float64
We can employ selection operations, such as loc
and iloc
to Series
as well as DataFrames
.¶
Use loc
to select a particular row index:
tmpc.loc['BATA']
14.8
This next line is equivalent:
tmpc['BATA']
14.8
Use iloc
:
tmpc.iloc[2]
14.8
Referencing via the array index is equivalent:
tmpc[2]
14.8
Slice a Series
with iloc
:
tmpc.iloc[0:4]
station
ADDI 13.3
ANDE 14.0
BATA 14.8
BEAC 16.0
Name: temp_2m [degC], dtype: float64
Equivalently, slice on the array indices:
tmpc[0:4]
station
ADDI 13.3
ANDE 14.0
BATA 14.8
BEAC 16.0
Name: temp_2m [degC], dtype: float64
Series
does not exclude the stop value:Here, ‘BEAC’ is included in the returned Series
:
tmpc.loc['ANDE':'BEAC']
station
ANDE 14.0
BATA 14.8
BEAC 16.0
Name: temp_2m [degC], dtype: float64
# Equivalent to previous cell
tmpc['ANDE':'BEAC']
station
ANDE 14.0
BATA 14.8
BEAC 16.0
Name: temp_2m [degC], dtype: float64
Select particular rows by expressing conditions:¶
Applying a conditional to a Series
produces a series of booleans (i.e., True
/False
values … aka a truth table)¶
# What stations had a temperature of at least a certain threshold value?
thresh = 15
tmpc >= thresh
station
ADDI False
ANDE False
BATA False
BEAC True
BELD False
...
WFMB False
WGAT False
WHIT True
WOLC False
YORK False
Name: temp_2m [degC], Length: 126, dtype: bool
We pass in this Boolean series to the series it was derived from, in order to produce another series containing the actual values that meet the specified condition.
tmpc[tmpc >= thresh]
station
BEAC 16.0
BERK 15.4
BKLN 19.9
BREW 16.2
BRON 18.0
COPA 15.6
DEPO 16.0
DOVE 16.0
ELMI 15.2
ESSX 15.1
FAYE 15.7
GFAL 16.0
GROT 15.1
HFAL 15.9
JORD 15.7
KIND 16.7
MANH 18.7
ONTA 18.2
OTIS 15.7
PENN 15.2
QUEE 19.0
REDH 16.6
SCHO 16.4
SOME 16.5
SOUT 19.3
SPRA 15.1
STAT 18.5
STEP 15.7
STON 20.3
SUFF 16.5
TICO 15.5
WALL 16.2
WANT 24.1
WARW 16.2
WHIT 15.7
Name: temp_2m [degC], dtype: float64
condition = tmpc >= thresh
tmpc[condition]
station
BEAC 16.0
BERK 15.4
BKLN 19.9
BREW 16.2
BRON 18.0
COPA 15.6
DEPO 16.0
DOVE 16.0
ELMI 15.2
ESSX 15.1
FAYE 15.7
GFAL 16.0
GROT 15.1
HFAL 15.9
JORD 15.7
KIND 16.7
MANH 18.7
ONTA 18.2
OTIS 15.7
PENN 15.2
QUEE 19.0
REDH 16.6
SCHO 16.4
SOME 16.5
SOUT 19.3
SPRA 15.1
STAT 18.5
STEP 15.7
STON 20.3
SUFF 16.5
TICO 15.5
WALL 16.2
WANT 24.1
WARW 16.2
WHIT 15.7
Name: temp_2m [degC], dtype: float64
We can also apply conditions to the entire DataFrame
.¶
Show a subset of the original DataFrame
containing only those sites that recorded precip during the five-minute period between NYSM observations:
prcpMeasurable = nysm_data['precip_incremental [mm]' ] > 0.0
prcpMeasurable
station
ADDI False
ANDE True
BATA False
BEAC True
BELD False
...
WFMB False
WGAT False
WHIT False
WOLC False
YORK False
Name: precip_incremental [mm], Length: 126, dtype: bool
nysm_data[prcpMeasurable]
time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | wind_speed_stddev_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
station | |||||||||||||||||||||
ANDE | 2021-09-02 02:00:00 UTC | 14.0 | 13.7 | 100.0 | 0.28 | 10.67 | 0.00 | 1.9 | 2.6 | 0.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.1 | 19.1 | 19.3 | 0.25 | 0.21 | 0.14 |
BEAC | 2021-09-02 02:00:00 UTC | 16.0 | 15.9 | 98.6 | 1.93 | 37.53 | 0.48 | 3.2 | 6.5 | 1.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.4 | 19.3 | 19.8 | 0.51 | 0.36 | 0.37 |
BKLN | 2021-09-02 02:00:00 UTC | 19.9 | 20.0 | 97.9 | 3.21 | 75.19 | 0.93 | 3.4 | 9.5 | 2.3 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
BREW | 2021-09-02 02:00:00 UTC | 16.2 | 16.5 | 99.7 | 1.01 | 26.89 | 0.27 | 4.9 | 9.8 | 1.7 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.3 | 20.3 | 20.3 | 0.41 | 0.33 | 0.35 |
BRON | 2021-09-02 02:00:00 UTC | 18.0 | 17.8 | 99.0 | 6.86 | 110.35 | 1.84 | 6.8 | 11.5 | 2.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
CLAR | 2021-09-02 02:00:00 UTC | 13.9 | 14.0 | 97.9 | 0.52 | 13.89 | 0.11 | 0.8 | 2.0 | 0.5 | ... | NaN | 0.0 | 0.0 | 0.0 | 17.9 | 19.1 | 19.3 | 0.22 | 0.22 | 0.15 |
DEPO | 2021-09-02 02:00:00 UTC | 16.0 | 16.1 | 90.1 | 0.05 | 3.17 | 0.00 | 2.5 | 5.8 | 1.5 | ... | NaN | 0.0 | 0.0 | NaN | 19.9 | 20.7 | NaN | 0.17 | 0.27 | NaN |
DOVE | 2021-09-02 02:00:00 UTC | 16.0 | NaN | 95.2 | 1.35 | 24.69 | 0.36 | 2.1 | 4.1 | 0.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.6 | 21.5 | 21.5 | 0.39 | 0.22 | 0.15 |
DUAN | 2021-09-02 02:00:00 UTC | 13.2 | 13.1 | 100.0 | 0.36 | 6.11 | 0.00 | 0.9 | 2.5 | 0.5 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.5 | 20.2 | 20.1 | 0.55 | 0.38 | 0.42 |
ELDR | 2021-09-02 02:00:00 UTC | 14.9 | NaN | 98.6 | 0.64 | 16.78 | 0.16 | 0.9 | 2.3 | 0.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.1 | 20.1 | 20.7 | 0.38 | 0.52 | 0.51 |
HARP | 2021-09-02 02:00:00 UTC | 13.4 | 13.6 | 96.6 | 0.18 | 10.54 | 0.00 | 3.1 | 5.7 | 1.2 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.6 | 19.3 | 19.0 | 0.31 | 0.28 | 0.52 |
HFAL | 2021-09-02 02:00:00 UTC | 15.9 | 15.9 | 99.5 | 0.86 | 21.89 | 0.20 | 4.3 | 7.5 | 1.3 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.7 | 21.1 | 22.1 | 0.47 | 0.35 | 0.35 |
KIND | 2021-09-02 02:00:00 UTC | 16.7 | 16.7 | 98.6 | 0.73 | 9.75 | 0.19 | 2.3 | 5.1 | 1.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 21.3 | 21.9 | 21.7 | 0.46 | 0.41 | 0.44 |
MANH | 2021-09-02 02:00:00 UTC | 18.7 | NaN | 100.0 | 4.64 | 111.03 | 1.22 | 5.4 | 9.5 | 2.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
MEDU | 2021-09-02 02:00:00 UTC | 14.2 | 14.2 | 94.4 | 0.56 | 10.39 | 0.15 | 3.1 | 5.4 | 1.1 | ... | NaN | 0.0 | 0.0 | 0.0 | 17.7 | 19.1 | 18.8 | 0.39 | 0.37 | 0.38 |
NBRA | 2021-09-02 02:00:00 UTC | 14.8 | 14.9 | 94.0 | 0.27 | 8.95 | 0.00 | 1.4 | 3.2 | 0.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.9 | 20.7 | 20.5 | 0.44 | 0.43 | 0.50 |
OTIS | 2021-09-02 02:00:00 UTC | 15.7 | 15.6 | 98.8 | 0.88 | 28.49 | 0.19 | 1.1 | 2.6 | 0.6 | ... | NaN | 0.0 | 0.0 | NaN | 18.4 | 20.4 | NaN | 0.34 | 0.28 | NaN |
QUEE | 2021-09-02 02:00:00 UTC | 19.0 | NaN | 98.0 | 11.95 | 63.43 | 2.68 | 10.0 | 16.5 | 2.5 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
REDH | 2021-09-02 02:00:00 UTC | 16.6 | 16.6 | 97.8 | 0.82 | 19.73 | 0.24 | 3.7 | 6.6 | 1.2 | ... | NaN | 0.0 | 0.0 | 0.0 | 21.9 | 23.9 | 24.3 | 0.28 | 0.25 | 0.30 |
ROXB | 2021-09-02 02:00:00 UTC | 14.0 | 14.0 | 98.2 | 0.60 | 11.85 | 0.14 | 3.0 | 5.3 | 0.9 | ... | NaN | 0.0 | 0.0 | NaN | 17.2 | 17.3 | NaN | 0.55 | 0.52 | NaN |
SCHA | 2021-09-02 02:00:00 UTC | 14.9 | 15.1 | 98.1 | 0.53 | 10.01 | 0.12 | 0.5 | 1.4 | 0.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 21.5 | 22.1 | 22.4 | 0.22 | 0.14 | 0.05 |
SCHO | 2021-09-02 02:00:00 UTC | 16.4 | 16.2 | 99.9 | 0.55 | 9.07 | 0.18 | 1.6 | 4.5 | 1.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 21.1 | 21.9 | 21.9 | 0.33 | 0.19 | 0.14 |
SCHU | 2021-09-02 02:00:00 UTC | 14.8 | 14.7 | 96.3 | 0.07 | 2.75 | 0.00 | 1.3 | 2.4 | 0.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 21.2 | 22.4 | 22.1 | 0.07 | 0.12 | 0.08 |
SOME | 2021-09-02 02:00:00 UTC | 16.5 | 16.4 | 98.3 | 1.75 | 46.43 | 0.44 | 2.6 | 6.0 | 1.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.3 | 21.7 | 21.3 | 0.69 | 0.42 | 0.62 |
STAT | 2021-09-02 02:00:00 UTC | 18.5 | 18.5 | 97.6 | 0.39 | 93.40 | 0.00 | 6.1 | 9.4 | 1.6 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
STEP | 2021-09-02 02:00:00 UTC | 15.7 | 15.6 | 93.0 | 0.73 | 8.54 | 0.21 | 3.1 | 5.9 | 1.1 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.4 | 19.9 | 19.7 | 0.32 | 0.33 | 0.31 |
SUFF | 2021-09-02 02:00:00 UTC | 16.5 | 16.2 | 98.6 | 0.65 | 66.08 | 0.16 | 4.2 | 12.1 | 2.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.3 | 20.1 | 21.7 | 0.47 | 0.34 | 0.28 |
TANN | 2021-09-02 02:00:00 UTC | 13.1 | 13.4 | 98.9 | 0.80 | 17.90 | 0.17 | 0.8 | 3.5 | 0.7 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.0 | 18.4 | 18.4 | 0.59 | 0.49 | 0.42 |
VOOR | 2021-09-02 02:00:00 UTC | 14.9 | 14.9 | 98.9 | 0.26 | 11.67 | 0.00 | 2.2 | 3.6 | 0.6 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.2 | 21.1 | 21.1 | 0.25 | 0.33 | 0.19 |
WALL | 2021-09-02 02:00:00 UTC | 16.2 | 16.2 | 98.4 | 1.77 | 24.48 | 0.45 | 3.3 | 5.9 | 1.3 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.9 | 21.9 | 21.5 | 0.57 | 0.42 | 0.44 |
WANT | 2021-09-02 02:00:00 UTC | 24.1 | 24.0 | 98.7 | 0.06 | 3.47 | 0.00 | 8.0 | 13.3 | 1.9 | ... | NaN | 0.0 | 0.0 | 0.0 | 23.7 | 24.4 | 24.3 | 0.22 | 0.05 | 0.05 |
WARW | 2021-09-02 02:00:00 UTC | 16.2 | 16.2 | 100.0 | 1.35 | 46.92 | 0.29 | 2.7 | 4.4 | 0.6 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.2 | 21.1 | 21.8 | 0.43 | 0.32 | 0.31 |
WBOU | 2021-09-02 02:00:00 UTC | 14.4 | 14.2 | 99.1 | 0.52 | 19.27 | 0.11 | 3.6 | 5.8 | 0.9 | ... | NaN | 0.0 | 0.0 | 0.0 | 17.5 | 20.5 | 21.3 | 0.51 | 0.36 | 0.34 |
33 rows × 29 columns
Of those rows that met the conditon, display the 2-m temperatures.¶
nysm_data.loc[prcpMeasurable,'temp_2m [degC]']
station
ANDE 14.0
BEAC 16.0
BKLN 19.9
BREW 16.2
BRON 18.0
CLAR 13.9
DEPO 16.0
DOVE 16.0
DUAN 13.2
ELDR 14.9
HARP 13.4
HFAL 15.9
KIND 16.7
MANH 18.7
MEDU 14.2
NBRA 14.8
OTIS 15.7
QUEE 19.0
REDH 16.6
ROXB 14.0
SCHA 14.9
SCHO 16.4
SCHU 14.8
SOME 16.5
STAT 18.5
STEP 15.7
SUFF 16.5
TANN 13.1
VOOR 14.9
WALL 16.2
WANT 24.1
WARW 16.2
WBOU 14.4
Name: temp_2m [degC], dtype: float64
Now filter based on two conditions:¶
WarmWet = (nysm_data['precip_incremental [mm]'] > 0.0) & (nysm_data['temp_2m [degC]'] > thresh)
WarmWet
station
ADDI False
ANDE False
BATA False
BEAC True
BELD False
...
WFMB False
WGAT False
WHIT False
WOLC False
YORK False
Length: 126, dtype: bool
nysm_data.loc[WarmWet]
time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | wind_speed_stddev_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
station | |||||||||||||||||||||
BEAC | 2021-09-02 02:00:00 UTC | 16.0 | 15.9 | 98.6 | 1.93 | 37.53 | 0.48 | 3.2 | 6.5 | 1.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.4 | 19.3 | 19.8 | 0.51 | 0.36 | 0.37 |
BKLN | 2021-09-02 02:00:00 UTC | 19.9 | 20.0 | 97.9 | 3.21 | 75.19 | 0.93 | 3.4 | 9.5 | 2.3 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
BREW | 2021-09-02 02:00:00 UTC | 16.2 | 16.5 | 99.7 | 1.01 | 26.89 | 0.27 | 4.9 | 9.8 | 1.7 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.3 | 20.3 | 20.3 | 0.41 | 0.33 | 0.35 |
BRON | 2021-09-02 02:00:00 UTC | 18.0 | 17.8 | 99.0 | 6.86 | 110.35 | 1.84 | 6.8 | 11.5 | 2.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
DEPO | 2021-09-02 02:00:00 UTC | 16.0 | 16.1 | 90.1 | 0.05 | 3.17 | 0.00 | 2.5 | 5.8 | 1.5 | ... | NaN | 0.0 | 0.0 | NaN | 19.9 | 20.7 | NaN | 0.17 | 0.27 | NaN |
DOVE | 2021-09-02 02:00:00 UTC | 16.0 | NaN | 95.2 | 1.35 | 24.69 | 0.36 | 2.1 | 4.1 | 0.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.6 | 21.5 | 21.5 | 0.39 | 0.22 | 0.15 |
HFAL | 2021-09-02 02:00:00 UTC | 15.9 | 15.9 | 99.5 | 0.86 | 21.89 | 0.20 | 4.3 | 7.5 | 1.3 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.7 | 21.1 | 22.1 | 0.47 | 0.35 | 0.35 |
KIND | 2021-09-02 02:00:00 UTC | 16.7 | 16.7 | 98.6 | 0.73 | 9.75 | 0.19 | 2.3 | 5.1 | 1.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 21.3 | 21.9 | 21.7 | 0.46 | 0.41 | 0.44 |
MANH | 2021-09-02 02:00:00 UTC | 18.7 | NaN | 100.0 | 4.64 | 111.03 | 1.22 | 5.4 | 9.5 | 2.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
OTIS | 2021-09-02 02:00:00 UTC | 15.7 | 15.6 | 98.8 | 0.88 | 28.49 | 0.19 | 1.1 | 2.6 | 0.6 | ... | NaN | 0.0 | 0.0 | NaN | 18.4 | 20.4 | NaN | 0.34 | 0.28 | NaN |
QUEE | 2021-09-02 02:00:00 UTC | 19.0 | NaN | 98.0 | 11.95 | 63.43 | 2.68 | 10.0 | 16.5 | 2.5 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
REDH | 2021-09-02 02:00:00 UTC | 16.6 | 16.6 | 97.8 | 0.82 | 19.73 | 0.24 | 3.7 | 6.6 | 1.2 | ... | NaN | 0.0 | 0.0 | 0.0 | 21.9 | 23.9 | 24.3 | 0.28 | 0.25 | 0.30 |
SCHO | 2021-09-02 02:00:00 UTC | 16.4 | 16.2 | 99.9 | 0.55 | 9.07 | 0.18 | 1.6 | 4.5 | 1.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 21.1 | 21.9 | 21.9 | 0.33 | 0.19 | 0.14 |
SOME | 2021-09-02 02:00:00 UTC | 16.5 | 16.4 | 98.3 | 1.75 | 46.43 | 0.44 | 2.6 | 6.0 | 1.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.3 | 21.7 | 21.3 | 0.69 | 0.42 | 0.62 |
STAT | 2021-09-02 02:00:00 UTC | 18.5 | 18.5 | 97.6 | 0.39 | 93.40 | 0.00 | 6.1 | 9.4 | 1.6 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
STEP | 2021-09-02 02:00:00 UTC | 15.7 | 15.6 | 93.0 | 0.73 | 8.54 | 0.21 | 3.1 | 5.9 | 1.1 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.4 | 19.9 | 19.7 | 0.32 | 0.33 | 0.31 |
SUFF | 2021-09-02 02:00:00 UTC | 16.5 | 16.2 | 98.6 | 0.65 | 66.08 | 0.16 | 4.2 | 12.1 | 2.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 20.3 | 20.1 | 21.7 | 0.47 | 0.34 | 0.28 |
WALL | 2021-09-02 02:00:00 UTC | 16.2 | 16.2 | 98.4 | 1.77 | 24.48 | 0.45 | 3.3 | 5.9 | 1.3 | ... | NaN | 0.0 | 0.0 | 0.0 | 19.9 | 21.9 | 21.5 | 0.57 | 0.42 | 0.44 |
WANT | 2021-09-02 02:00:00 UTC | 24.1 | 24.0 | 98.7 | 0.06 | 3.47 | 0.00 | 8.0 | 13.3 | 1.9 | ... | NaN | 0.0 | 0.0 | 0.0 | 23.7 | 24.4 | 24.3 | 0.22 | 0.05 | 0.05 |
WARW | 2021-09-02 02:00:00 UTC | 16.2 | 16.2 | 100.0 | 1.35 | 46.92 | 0.29 | 2.7 | 4.4 | 0.6 | ... | NaN | 0.0 | 0.0 | 0.0 | 18.2 | 21.1 | 21.8 | 0.43 | 0.32 | 0.31 |
20 rows × 29 columns
DataFrame
?idxmax
/ idxmin
method to display the row label of the maximum / minimum value in a Series
.Let’s create Series
objects from the NYS Mesonet Station Info table, as well as from the data table.
stid = nysm_sites['stid']
lats = nysm_sites['lat']
lons = nysm_sites['lon']
time = nysm_data['time']
tmpc = nysm_data['temp_2m [degC]']
rh = nysm_data['relative_humidity [percent]']
pres = nysm_data['station_pressure [mbar]']
wspd = nysm_data['max_wind_speed_prop [m/s]']
drct = nysm_data['wind_direction_prop [degrees]']
pinc = nysm_data['precip_incremental [mm]']
ptot = nysm_data['precip_local [mm]']
pint = nysm_data['precip_max_intensity [mm/min]']
What is the maximum 2m temperature of all stations in the DataFrame
?
tmpcMax = tmpc.max()
tmpcMax
24.1
In what row # occurs the maximum temperature? We might think to use the idxmax
method …
siteTmpcMax = tmpc.idxmax()
siteTmpcMax
'WANT'
That works, but it outputs the row index name (recall that when we read in the data file, we specified that the site ID column woudl be used for the row index name).¶
Let’s use the argmax
method to get the row index number.
indxTmpcMax = tmpc.argmax()
indxTmpcMax
114
We can do the same for another extrema … e.g., minimum 2m temperature.
tmpcMin = tmpc.min()
siteTmpcMin = tmpc.idxmin()
indxTmpcMin = tmpc.argmin()
Knowing the row index #, we can get the corresponding full station name from the NYSM station table:¶
siteNameTmpcMax = nysm_sites['name'].iloc[indxTmpcMax]
siteNameTmpcMin = nysm_sites['name'].iloc[indxTmpcMin]
print ("Warmest NYSM Site: %s (%s), %.1f" % (siteNameTmpcMax,siteTmpcMax, tmpcMax))
print ("Coldest NYSM Site: %s (%s), %.1f" % (siteNameTmpcMin,siteTmpcMin, tmpcMin))
Warmest NYSM Site: Wantagh (WANT), 24.1
Coldest NYSM Site: Edwards (EDWA), 10.3
Consider: would you want to just find the max/min station pressure values, and then convert those two values to SLP? Or would it be better to first apply the SLP reduction to the entire station pressure Series
?
So, first create a series of the station pressure values, and then apply the SLP reduction formula. This requires that we retrieve the elevation data from the site file as well.
pres = nysm_data['station_pressure [mbar]']
elev = nysm_sites['elevation']
Now, it should be as straightforward as passing these two Series
to the SLP reduction formula … shouldn’t it?¶
Employ the SLP reduction equation that we used in the previous notebook:
sensorHeight = .5
# Reduce station pressure to SLP. Source: https://www.sandhurstweather.org.uk/barometric.pdf
slp = pres/np.exp(-1*(elev+sensorHeight)/((tmpc+273.15) * 29.263))
Looks like it worked! Let’s see what we get for max/min values:
print(slp.max(), slp.min())
nan nan
Uh-oh … nan
signifies missing data! Let’s look at what our slp Series
object looks like:¶
slp
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
..
WFMB NaN
WGAT NaN
WHIT NaN
WOLC NaN
YORK NaN
Length: 252, dtype: float64
Look at the length of the series! It’s 252 … which is 126 times 2!¶
Let’s just look at all the rows of the series first …
pd.set_option('display.max_rows',None)
slp
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
18 NaN
19 NaN
20 NaN
21 NaN
22 NaN
23 NaN
24 NaN
25 NaN
26 NaN
27 NaN
28 NaN
29 NaN
30 NaN
31 NaN
32 NaN
33 NaN
34 NaN
35 NaN
36 NaN
37 NaN
38 NaN
39 NaN
40 NaN
41 NaN
42 NaN
43 NaN
44 NaN
45 NaN
46 NaN
47 NaN
48 NaN
49 NaN
50 NaN
51 NaN
52 NaN
53 NaN
54 NaN
55 NaN
56 NaN
57 NaN
58 NaN
59 NaN
60 NaN
61 NaN
62 NaN
63 NaN
64 NaN
65 NaN
66 NaN
67 NaN
68 NaN
69 NaN
70 NaN
71 NaN
72 NaN
73 NaN
74 NaN
75 NaN
76 NaN
77 NaN
78 NaN
79 NaN
80 NaN
81 NaN
82 NaN
83 NaN
84 NaN
85 NaN
86 NaN
87 NaN
88 NaN
89 NaN
90 NaN
91 NaN
92 NaN
93 NaN
94 NaN
95 NaN
96 NaN
97 NaN
98 NaN
99 NaN
100 NaN
101 NaN
102 NaN
103 NaN
104 NaN
105 NaN
106 NaN
107 NaN
108 NaN
109 NaN
110 NaN
111 NaN
112 NaN
113 NaN
114 NaN
115 NaN
116 NaN
117 NaN
118 NaN
119 NaN
120 NaN
121 NaN
122 NaN
123 NaN
124 NaN
125 NaN
ADDI NaN
ANDE NaN
BATA NaN
BEAC NaN
BELD NaN
BELL NaN
BELM NaN
BERK NaN
BING NaN
BKLN NaN
BRAN NaN
BREW NaN
BROC NaN
BRON NaN
BROO NaN
BSPA NaN
BUFF NaN
BURD NaN
BURT NaN
CAMD NaN
CAPE NaN
CHAZ NaN
CHES NaN
CINC NaN
CLAR NaN
CLIF NaN
CLYM NaN
COBL NaN
COHO NaN
COLD NaN
COPA NaN
COPE NaN
CROG NaN
CSQR NaN
DELE NaN
DEPO NaN
DOVE NaN
DUAN NaN
EAUR NaN
EDIN NaN
EDWA NaN
ELDR NaN
ELLE NaN
ELMI NaN
ESSX NaN
FAYE NaN
FRED NaN
GABR NaN
GFAL NaN
GFLD NaN
GROT NaN
GROV NaN
HAMM NaN
HARP NaN
HARR NaN
HART NaN
HERK NaN
HFAL NaN
ILAK NaN
JOHN NaN
JORD NaN
KIND NaN
LAUR NaN
LOUI NaN
MALO NaN
MANH NaN
MEDI NaN
MEDU NaN
MORR NaN
NBRA NaN
NEWC NaN
NHUD NaN
OLDF NaN
OLEA NaN
ONTA NaN
OPPE NaN
OSCE NaN
OSWE NaN
OTIS NaN
OWEG NaN
PENN NaN
PHIL NaN
PISE NaN
POTS NaN
QUEE NaN
RAND NaN
RAQU NaN
REDF NaN
REDH NaN
ROXB NaN
RUSH NaN
SARA NaN
SBRI NaN
SCHA NaN
SCHO NaN
SCHU NaN
SCIP NaN
SHER NaN
SOME NaN
SOUT NaN
SPRA NaN
SPRI NaN
STAT NaN
STEP NaN
STON NaN
SUFF NaN
TANN NaN
TICO NaN
TULL NaN
TUPP NaN
TYRO NaN
VOOR NaN
WALL NaN
WALT NaN
WANT NaN
WARS NaN
WARW NaN
WATE NaN
WBOU NaN
WELL NaN
WEST NaN
WFMB NaN
WGAT NaN
WHIT NaN
WOLC NaN
YORK NaN
dtype: float64
Missing (i.e. NaN
) values everywhere, and indeed, we have double the number of rows that we thought we’d have!¶
Let’s take a look at our elev and pres Series objects.
# go back to default # of rows displayed
pd.set_option('display.max_rows',10)
elev
0 507.6140
1 518.2820
2 276.1200
3 90.1598
4 470.3700
...
121 614.5990
122 442.9660
123 36.5638
124 121.2190
125 177.9420
Name: elevation, Length: 126, dtype: float64
pres
station
ADDI 951.47
ANDE 947.40
BATA 979.93
BEAC 994.69
BELD 954.02
...
WFMB 941.26
WGAT 959.46
WHIT 1006.29
WOLC 996.90
YORK 991.17
Name: station_pressure [mbar], Length: 126, dtype: float64
The issue is that our elev and pres series have different row indexes! In order for calculations to work when applied to different Series objects, their indexes must correspond!¶
Recall that we set the row index for the nysm_data
DataFrame to station id, but kept the default for nysm_sites
.
Since tmpc
and pres
both have their row index set to station id, let’s set elev
’s index to that as well.¶
elev.index = pres.index
elev
station
ADDI 507.6140
ANDE 518.2820
BATA 276.1200
BEAC 90.1598
BELD 470.3700
...
WFMB 614.5990
WGAT 442.9660
WHIT 36.5638
WOLC 121.2190
YORK 177.9420
Name: elevation, Length: 126, dtype: float64
Now, repeat the SLP reduction calculation.¶
slp = pres/np.exp(-1*(elev+sensorHeight)/((tmpc+273.15) * 29.263))
slpMax = slp.max()
slpMin = slp.min()
print (slpMax, slpMin)
1014.0598722184151 998.940242497739
With the row indexes common among all three Series in the equation, it works!¶
We can now find the corresponding row numbers and summarize as we did for max/min temperature.
siteSlpMax = slp.idxmax()
indxSlpMax = slp.argmax()
siteSlpMin = slp.idxmin()
indxSlpMin = slp.argmin()
siteNameSlpMax = nysm_sites['name'].iloc[indxSlpMax]
siteNameSlpMin = nysm_sites['name'].iloc[indxSlpMin]
print ("Highest SLP NYSM Site: %s (%s), %.1f" % (siteNameSlpMax,siteSlpMax, slpMax))
print ("Lowest SLP NYSM Site: %s (%s), %.1f" % (siteNameSlpMin,siteSlpMin, slpMin))
Highest SLP NYSM Site: Clymer (CLYM), 1014.1
Lowest SLP NYSM Site: Brooklyn (BKLN), 998.9
Summary¶
Use the
loc
andiloc
methods to select rows, columns, or combinations of the two in a PandasDataFrame
.These two methods can also be used for
Series
.Setting conditions on particular rows or columns produce True/False values, which can then be passed into a
DataFrame
orSeries
for more meaningful output.The
idxmax
/idxmin
andargmax
/argmin
functions can be combined to provide meaningful output from multipleDataFrames
.
What’s Next?¶
In the next notebook, we will explore how Pandas lets us efficiently work with data that are date- and time-based.