Pandas 8: Multi-index DataFrames¶

In many instances, tabular data may best be represented with more than a single row index. For the case of NYSM hourly data files, which have 126 stations each with 13 sets of 5-minute obs, we definitely want to take advantage of using Multi-index Dataframes.¶

Overview¶

Open an hour’s worth of NYSM data using a single row index dataframe
Review selection and conditional selection methodology
Create a dataframe with station ID and date/time as the two row indices
Use selection and conditions on a multi-index dataframe
Briefly define tuples as opposed to lists in Python
Work with the time index as a Datetime object

Prerequisites¶

Concepts	Importance	Notes
Pandas notebooks 1-8	Necessary

Time to learn: 15 minutes

Imports¶

import pandas as pd
from datetime import datetime 

Open an hour’s worth of NYSM data using a single row index dataframe¶

nysm_data_file = '/spare11/atm533/data/nysm_data_2021090202a.csv'

df = pd.read_csv(nysm_data_file)
# Look at the first few rows
df.head(3)

	station	time	temp_2m [degC]	temp_9m [degC]	relative_humidity [percent]	avg_wind_speed_prop [m/s]	max_wind_speed_prop [m/s]	...	snow_depth [cm]	soil_temp_05cm [degC]	soil_temp_25cm [degC]	soil_temp_50cm [degC]	soil_moisture_05cm [m^3/m^3]	soil_moisture_25cm [m^3/m^3]	soil_moisture_50cm [m^3/m^3]
0	ADDI	2021-09-02 02:00:00 UTC	13.3	13.4	92.5	2.6	5.8	...	NaN	19.9	20.3	19.9	0.52	0.44	0.44
1	ADDI	2021-09-02 02:05:00 UTC	13.2	13.4	92.8	2.7	4.3	...	NaN	19.9	20.3	19.9	0.52	0.44	0.44
2	ADDI	2021-09-02 02:10:00 UTC	13.1	13.3	92.8	2.1	3.4	...	NaN	19.9	20.3	19.9	0.52	0.44	0.44

3 rows × 30 columns

Remind ourselves that the default DataFrame’s index is a special type of Python object, called a RangeIndex.

df.index

RangeIndex(start=0, stop=1512, step=1)

Review selection and conditional selection methodology¶

Here we have multiple stations and multiple times. We can select rows/columns, set conditions, and make further selections based on those conditions as we did in the 03_Pandas_IndexSubsetsConditonals notebook.

df[df['station']=='VOOR'] # Voorheesville, NY

	station	time	temp_2m [degC]	temp_9m [degC]	relative_humidity [percent]	precip_incremental [mm]	precip_local [mm]	precip_max_intensity [mm/min]	avg_wind_speed_prop [m/s]	max_wind_speed_prop [m/s]	...	snow_depth [cm]	soil_temp_05cm [degC]	soil_temp_25cm [degC]	soil_temp_50cm [degC]	soil_moisture_05cm [m^3/m^3]	soil_moisture_25cm [m^3/m^3]	soil_moisture_50cm [m^3/m^3]
1332	VOOR	2021-09-02 02:00:00 UTC	14.9	14.9	98.9	0.26	11.67	0.00	2.2	3.6	...	NaN	20.2	21.1	21.1	0.25	0.33	0.19
1333	VOOR	2021-09-02 02:05:00 UTC	14.8	14.8	99.0	0.27	11.94	0.00	3.0	4.6	...	NaN	20.1	21.1	21.1	0.25	0.33	0.19
1334	VOOR	2021-09-02 02:10:00 UTC	14.7	14.7	98.9	0.39	12.33	0.11	3.2	5.1	...	NaN	20.1	21.1	21.1	0.25	0.33	0.19
1335	VOOR	2021-09-02 02:15:00 UTC	14.6	14.6	98.9	0.80	13.13	0.18	3.6	5.6	...	NaN	20.1	21.1	21.1	0.26	0.33	0.19
1336	VOOR	2021-09-02 02:20:00 UTC	14.6	14.6	98.9	0.88	14.01	0.18	2.9	6.4	...	NaN	20.1	21.1	21.1	0.26	0.33	0.19
1337	VOOR	2021-09-02 02:25:00 UTC	14.5	14.5	98.9	0.66	14.67	0.15	2.9	5.8	...	NaN	20.1	21.1	21.1	0.27	0.33	0.19
1338	VOOR	2021-09-02 02:30:00 UTC	14.5	14.5	99.0	0.45	15.12	0.00	2.5	4.6	...	NaN	20.1	21.1	21.1	0.27	0.34	0.19
1339	VOOR	2021-09-02 02:35:00 UTC	14.4	14.5	99.0	0.55	15.67	0.13	2.6	3.8	...	NaN	20.0	21.1	21.1	0.28	0.34	0.19
1340	VOOR	2021-09-02 02:40:00 UTC	14.4	14.4	99.0	0.99	16.66	0.25	2.2	4.3	...	NaN	19.9	21.1	21.1	0.28	0.34	0.19
1341	VOOR	2021-09-02 02:45:00 UTC	14.3	14.4	98.9	1.51	18.17	0.39	1.7	3.5	...	NaN	19.9	21.1	21.1	0.29	0.34	0.19
1342	VOOR	2021-09-02 02:50:00 UTC	14.2	14.3	98.8	1.30	19.47	0.31	1.5	3.4	...	NaN	19.9	20.9	21.1	0.29	0.35	0.19
1343	VOOR	2021-09-02 02:55:00 UTC	14.2	14.3	98.4	0.77	20.24	0.19	2.1	3.7	...	NaN	19.9	20.9	21.1	0.29	0.35	0.19

12 rows × 30 columns

Create a second dataframe that first contains just VOOR rows, then further subset by choosing one time.

df2 = df[df['station']=='VOOR']
df2[df2['time'] == '2021-09-02 02:50:00 UTC'] # we haven't recast this as a Datetime object ... it's just a string

	station	time	temp_2m [degC]	temp_9m [degC]	relative_humidity [percent]	precip_incremental [mm]	precip_local [mm]	precip_max_intensity [mm/min]	avg_wind_speed_prop [m/s]	max_wind_speed_prop [m/s]	...	snow_depth [cm]	frozen_soil_05cm [bit]	frozen_soil_25cm [bit]	frozen_soil_50cm [bit]	soil_temp_05cm [degC]	soil_temp_25cm [degC]	soil_temp_50cm [degC]	soil_moisture_05cm [m^3/m^3]	soil_moisture_25cm [m^3/m^3]	soil_moisture_50cm [m^3/m^3]
1342	VOOR	2021-09-02 02:50:00 UTC	14.2	14.3	98.8	1.3	19.47	0.31	1.5	3.4	...	NaN	0.0	0.0	0.0	19.9	20.9	21.1	0.29	0.35	0.19

1 rows × 30 columns

However, there is a better way! Since our DataFrame consists of one station after another, each associated with a number of discrete times, let’s create a multi-indexed DataFrame, that has station as its outer row index and time as its inner.

Create a dataframe with station ID and date/time as the two row indices¶

df.set_index(['station', 'time'], inplace = True)

Note: inplace means that the df object gets re-created with its new indexes. It's convenient, but if at any point we wish to re-run cells beyond this point, it is better to start from the beginning to ensure that the df object conforms to whatever code cell operates on it.

df

		temp_2m [degC]	temp_9m [degC]	relative_humidity [percent]	precip_incremental [mm]	precip_local [mm]	precip_max_intensity [mm/min]	avg_wind_speed_prop [m/s]	max_wind_speed_prop [m/s]	wind_speed_stddev_prop [m/s]	wind_direction_prop [degrees]	...	snow_depth [cm]	frozen_soil_05cm [bit]	frozen_soil_25cm [bit]	frozen_soil_50cm [bit]	soil_temp_05cm [degC]	soil_temp_25cm [degC]	soil_temp_50cm [degC]	soil_moisture_05cm [m^3/m^3]	soil_moisture_25cm [m^3/m^3]	soil_moisture_50cm [m^3/m^3]
station	time
ADDI	2021-09-02 02:00:00 UTC	13.3	13.4	92.5	0.0	0.0	0.0	2.6	5.8	0.9	344.0	...	NaN	0.0	0.0	0.0	19.9	20.3	19.9	0.52	0.44	0.44
	2021-09-02 02:05:00 UTC	13.2	13.4	92.8	0.0	0.0	0.0	2.7	4.3	0.6	340.0	...	NaN	0.0	0.0	0.0	19.9	20.3	19.9	0.52	0.44	0.44
	2021-09-02 02:10:00 UTC	13.1	13.3	92.8	0.0	0.0	0.0	2.1	3.4	0.5	336.0	...	NaN	0.0	0.0	0.0	19.9	20.3	19.9	0.52	0.44	0.44
	2021-09-02 02:15:00 UTC	13.0	13.2	92.9	0.0	0.0	0.0	2.4	4.1	0.6	342.0	...	NaN	0.0	0.0	0.0	19.9	20.3	19.9	0.52	0.44	0.44
	2021-09-02 02:20:00 UTC	12.9	13.1	93.0	0.0	0.0	0.0	2.3	4.6	0.9	338.0	...	NaN	0.0	0.0	0.0	19.9	20.3	19.9	0.52	0.44	0.44
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
YORK	2021-09-02 02:35:00 UTC	11.7	12.8	97.1	0.0	0.0	0.0	0.2	0.6	0.2	136.0	...	NaN	0.0	0.0	0.0	20.3	21.7	21.9	0.13	0.24	0.24
	2021-09-02 02:40:00 UTC	11.6	12.5	97.0	0.0	0.0	0.0	0.6	0.8	0.1	131.0	...	NaN	0.0	0.0	0.0	20.3	21.7	21.9	0.13	0.24	0.24
	2021-09-02 02:45:00 UTC	11.6	12.5	97.2	0.0	0.0	0.0	0.1	0.7	0.2	165.0	...	NaN	0.0	0.0	0.0	20.3	21.7	21.9	0.13	0.24	0.24
	2021-09-02 02:50:00 UTC	11.5	12.7	97.3	0.0	0.0	0.0	0.7	0.8	0.1	193.0	...	NaN	0.0	0.0	0.0	20.2	21.7	21.9	0.13	0.24	0.24
	2021-09-02 02:55:00 UTC	11.5	12.4	97.4	0.0	0.0	0.0	0.5	0.7	0.1	168.0	...	NaN	0.0	0.0	0.0	20.1	21.7	21.9	0.13	0.24	0.24

1512 rows × 28 columns

df.index

MultiIndex([('ADDI', '2021-09-02 02:00:00 UTC'),
            ('ADDI', '2021-09-02 02:05:00 UTC'),
            ('ADDI', '2021-09-02 02:10:00 UTC'),
            ('ADDI', '2021-09-02 02:15:00 UTC'),
            ('ADDI', '2021-09-02 02:20:00 UTC'),
            ('ADDI', '2021-09-02 02:25:00 UTC'),
            ('ADDI', '2021-09-02 02:30:00 UTC'),
            ('ADDI', '2021-09-02 02:35:00 UTC'),
            ('ADDI', '2021-09-02 02:40:00 UTC'),
            ('ADDI', '2021-09-02 02:45:00 UTC'),
            ...
            ('YORK', '2021-09-02 02:10:00 UTC'),
            ('YORK', '2021-09-02 02:15:00 UTC'),
            ('YORK', '2021-09-02 02:20:00 UTC'),
            ('YORK', '2021-09-02 02:25:00 UTC'),
            ('YORK', '2021-09-02 02:30:00 UTC'),
            ('YORK', '2021-09-02 02:35:00 UTC'),
            ('YORK', '2021-09-02 02:40:00 UTC'),
            ('YORK', '2021-09-02 02:45:00 UTC'),
            ('YORK', '2021-09-02 02:50:00 UTC'),
            ('YORK', '2021-09-02 02:55:00 UTC')],
           names=['station', 'time'], length=1512)

The date/time index is currently a string. Let’s do this nifty trick to convert it to a datetime object.

df.index = df.index.set_levels([df.index.levels[0], pd.to_datetime(df.index.levels[1])])

#Examine first three row indices
df.index[:3]

MultiIndex([('ADDI', '2021-09-02 02:00:00+00:00'),
            ('ADDI', '2021-09-02 02:05:00+00:00'),
            ('ADDI', '2021-09-02 02:10:00+00:00')],
           names=['station', 'time'])

Use selection and conditions on a multi-index dataframe¶

Select a column in the usual way.

df.loc['VOOR']

	temp_2m [degC]	temp_9m [degC]	relative_humidity [percent]	precip_incremental [mm]	precip_local [mm]	precip_max_intensity [mm/min]	avg_wind_speed_prop [m/s]	max_wind_speed_prop [m/s]	wind_speed_stddev_prop [m/s]	wind_direction_prop [degrees]	...	snow_depth [cm]	frozen_soil_05cm [bit]	frozen_soil_25cm [bit]	frozen_soil_50cm [bit]	soil_temp_05cm [degC]	soil_temp_25cm [degC]	soil_temp_50cm [degC]	soil_moisture_05cm [m^3/m^3]	soil_moisture_25cm [m^3/m^3]	soil_moisture_50cm [m^3/m^3]
time
2021-09-02 02:00:00+00:00	14.9	14.9	98.9	0.26	11.67	0.00	2.2	3.6	0.6	301.0	...	NaN	0.0	0.0	0.0	20.2	21.1	21.1	0.25	0.33	0.19
2021-09-02 02:05:00+00:00	14.8	14.8	99.0	0.27	11.94	0.00	3.0	4.6	0.8	299.0	...	NaN	0.0	0.0	0.0	20.1	21.1	21.1	0.25	0.33	0.19
2021-09-02 02:10:00+00:00	14.7	14.7	98.9	0.39	12.33	0.11	3.2	5.1	0.8	301.0	...	NaN	0.0	0.0	0.0	20.1	21.1	21.1	0.25	0.33	0.19
2021-09-02 02:15:00+00:00	14.6	14.6	98.9	0.80	13.13	0.18	3.6	5.6	0.7	299.0	...	NaN	0.0	0.0	0.0	20.1	21.1	21.1	0.26	0.33	0.19
2021-09-02 02:20:00+00:00	14.6	14.6	98.9	0.88	14.01	0.18	2.9	6.4	0.9	300.0	...	NaN	0.0	0.0	0.0	20.1	21.1	21.1	0.26	0.33	0.19
2021-09-02 02:25:00+00:00	14.5	14.5	98.9	0.66	14.67	0.15	2.9	5.8	0.9	298.0	...	NaN	0.0	0.0	0.0	20.1	21.1	21.1	0.27	0.33	0.19
2021-09-02 02:30:00+00:00	14.5	14.5	99.0	0.45	15.12	0.00	2.5	4.6	0.8	301.0	...	NaN	0.0	0.0	0.0	20.1	21.1	21.1	0.27	0.34	0.19
2021-09-02 02:35:00+00:00	14.4	14.5	99.0	0.55	15.67	0.13	2.6	3.8	0.5	292.0	...	NaN	0.0	0.0	0.0	20.0	21.1	21.1	0.28	0.34	0.19
2021-09-02 02:40:00+00:00	14.4	14.4	99.0	0.99	16.66	0.25	2.2	4.3	0.8	298.0	...	NaN	0.0	0.0	0.0	19.9	21.1	21.1	0.28	0.34	0.19
2021-09-02 02:45:00+00:00	14.3	14.4	98.9	1.51	18.17	0.39	1.7	3.5	0.7	290.0	...	NaN	0.0	0.0	0.0	19.9	21.1	21.1	0.29	0.34	0.19
2021-09-02 02:50:00+00:00	14.2	14.3	98.8	1.30	19.47	0.31	1.5	3.4	0.8	307.0	...	NaN	0.0	0.0	0.0	19.9	20.9	21.1	0.29	0.35	0.19
2021-09-02 02:55:00+00:00	14.2	14.3	98.4	0.77	20.24	0.19	2.1	3.7	0.8	301.0	...	NaN	0.0	0.0	0.0	19.9	20.9	21.1	0.29	0.35	0.19

12 rows × 28 columns

Use loc to select not only the station id, but also a specific time: we pass in a tuple to the loc method

Briefly define tuples as opposed to lists in Python¶

A tuple is a core Python class, similar to but distinct from a list

Tuples are enclosed in parentheses (); lists are enclosed in brackets []
Elements in a tuple are immutable (i.e. can’t be changed), but list elements can be changed.

a = (4,7,9) # a tuple
print (a)
print(a[2])
# this next line won't work; comment it out to see:
# a[2] = 8

(4, 7, 9)
9

b = [4, 7, 9]
print (b)
print (b[2])
# this will work:
b[2] = 8
print(b)

[4, 7, 9]
9
[4, 7, 8]

Tip: Some Pandas methods accept tuples as their arguments, although lists will usually work too. When in doubt, append a ? to the method call to see the documentation.

What can be confusing is that just because something is enclosed in brackets, that doesn’t mean it’s a Python list object. In terms of Pandas, the DataFrame’s loc method typically expects a string that’s enclosed in brackets.

Tip: Note: we can pass in the requested time as a string ... it will work!

df.loc[('VOOR','2021-09-02 02:45:00 UTC')]

temp_2m [degC]                            14.30
temp_9m [degC]                            14.40
relative_humidity [percent]               98.90
precip_incremental [mm]                    1.51
precip_local [mm]                         18.17
precip_max_intensity [mm/min]              0.39
avg_wind_speed_prop [m/s]                  1.70
max_wind_speed_prop [m/s]                  3.50
wind_speed_stddev_prop [m/s]               0.70
wind_direction_prop [degrees]            290.00
wind_direction_stddev_prop [degrees]      21.00
avg_wind_speed_sonic [m/s]                 1.90
max_wind_speed_sonic [m/s]                 3.60
wind_speed_stddev_sonic [m/s]              0.70
wind_direction_sonic [degrees]           294.00
wind_direction_stddev_sonic [degrees]     20.00
solar_insolation [W/m^2]                   0.00
station_pressure [mbar]                  994.87
snow_depth [cm]                             NaN
frozen_soil_05cm [bit]                     0.00
frozen_soil_25cm [bit]                     0.00
frozen_soil_50cm [bit]                     0.00
soil_temp_05cm [degC]                     19.90
soil_temp_25cm [degC]                     21.10
soil_temp_50cm [degC]                     21.10
soil_moisture_05cm [m^3/m^3]               0.29
soil_moisture_25cm [m^3/m^3]               0.34
soil_moisture_50cm [m^3/m^3]               0.19
Name: (VOOR, 2021-09-02 02:45:00+00:00), dtype: float64

As above, but also select a particular column:

df.loc[('VOOR','2021-09-02 02:45:00 UTC'),'temp_2m [degC]' ]

14.3

Taking advantage of multi-indexing allows us to write selection criteria that is more intuitive than how we’ve previously defined and utilized conditonal statements in our selection criteria (as in our cell earlier in the notebook with nested data frame object names). It also executes quicker!

Pass in multiple columns (either as a list or a tuple) … get back a DataFrame whose columns are the ones we selected

df.loc[('VOOR','2021-09-02 02:45:00 UTC'),('temp_2m [degC]','precip_incremental [mm]')]

temp_2m [degC]             14.30
precip_incremental [mm]     1.51
Name: (VOOR, 2021-09-02 02:45:00+00:00), dtype: float64

Pass in multiple indexes and multiple columns also returns DataFrames:

Two outer and one inner index:

df.loc[(('VOOR','KIND'),'2021-09-02 02:45:00 UTC'),('temp_2m [degC]','precip_incremental [mm]')]

		temp_2m [degC]	precip_incremental [mm]
station	time
VOOR	2021-09-02 02:45:00+00:00	14.3	1.51
KIND	2021-09-02 02:45:00+00:00	16.4	0.75

Two outer and two inner:

df.loc[(('VOOR','KIND'),('2021-09-02 02:45:00 UTC','2021-09-02 02:55:00 UTC')),('temp_2m [degC]','precip_incremental [mm]')]

		temp_2m [degC]	precip_incremental [mm]
station	time
VOOR	2021-09-02 02:45:00+00:00	14.3	1.51
VOOR	2021-09-02 02:55:00+00:00	14.2	0.77
KIND	2021-09-02 02:45:00+00:00	16.4	0.75
KIND	2021-09-02 02:55:00+00:00	16.4	0.56

One outer and two inner:

df.loc[('VOOR',('2021-09-02 02:45:00 UTC','2021-09-02 02:55:00 UTC')),('temp_2m [degC]','precip_incremental [mm]')]

		temp_2m [degC]	precip_incremental [mm]
station	time
VOOR	2021-09-02 02:45:00+00:00	14.3	1.51
VOOR	2021-09-02 02:55:00+00:00	14.2	0.77

A more efficient way to get all stations at a particular time is via Pandas’ xs DataFrame method.

# Pass in one index value and what index it belongs to
df.xs('2021-09-02 02:45:00 UTC', level='time')

	temp_2m [degC]	temp_9m [degC]	relative_humidity [percent]	precip_incremental [mm]	precip_local [mm]	precip_max_intensity [mm/min]	avg_wind_speed_prop [m/s]	max_wind_speed_prop [m/s]	wind_speed_stddev_prop [m/s]	wind_direction_prop [degrees]	...	snow_depth [cm]	frozen_soil_05cm [bit]	frozen_soil_25cm [bit]	frozen_soil_50cm [bit]	soil_temp_05cm [degC]	soil_temp_25cm [degC]	soil_temp_50cm [degC]	soil_moisture_05cm [m^3/m^3]	soil_moisture_25cm [m^3/m^3]	soil_moisture_50cm [m^3/m^3]
station
ADDI	12.8	13.0	92.3	0.00	0.00	0.00	2.3	5.0	0.9	346.0	...	NaN	0.0	0.0	0.0	19.9	20.3	19.9	0.52	0.44	0.44
ANDE	14.2	14.0	99.7	0.16	13.74	0.00	3.1	6.0	0.9	344.0	...	NaN	0.0	0.0	0.0	19.1	19.1	19.2	0.28	0.22	0.14
BATA	14.1	16.2	77.0	0.00	0.00	0.00	2.4	2.6	0.1	309.0	...	NaN	0.0	0.0	0.0	19.9	21.5	21.3	0.25	0.21	0.22
BEAC	15.8	15.6	98.9	1.01	56.83	0.25	3.4	5.5	0.9	35.0	...	NaN	0.0	0.0	0.0	17.9	18.9	19.5	0.51	0.36	0.37
BELD	14.1	14.2	93.8	0.00	1.16	0.00	3.4	6.5	1.2	5.0	...	NaN	0.0	0.0	0.0	19.7	20.1	20.2	0.50	0.43	0.41
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
WFMB	13.0	13.6	74.2	0.00	0.00	0.00	0.4	2.0	0.4	260.0	...	NaN	0.0	0.0	0.0	18.3	19.9	19.9	0.25	0.18	0.20
WGAT	13.5	13.4	80.4	0.00	0.00	0.00	0.9	3.3	0.7	22.0	...	NaN	0.0	0.0	0.0	18.4	20.1	20.7	0.15	0.25	0.08
WHIT	16.3	16.3	93.2	0.00	0.00	0.00	1.6	3.1	0.5	28.0	...	NaN	0.0	0.0	0.0	19.1	20.3	20.1	0.28	0.47	0.46
WOLC	13.6	16.3	86.1	0.00	0.00	0.00	0.6	0.9	0.1	317.0	...	NaN	0.0	0.0	0.0	21.5	23.5	24.2	0.18	0.03	0.07
YORK	11.6	12.5	97.2	0.00	0.00	0.00	0.1	0.7	0.2	165.0	...	NaN	0.0	0.0	0.0	20.3	21.7	21.9	0.13	0.24	0.24

126 rows × 28 columns

Include just a list of columns (a tuple won’t work here)

df.xs('2021-09-02 02:45:00 UTC', level = 'time')[['temp_2m [degC]','temp_9m [degC]']]

	temp_2m [degC]	temp_9m [degC]
station
ADDI	12.8	13.0
ANDE	14.2	14.0
BATA	14.1	16.2
BEAC	15.8	15.6
BELD	14.1	14.2
...	...	...
WFMB	13.0	13.6
WGAT	13.5	13.4
WHIT	16.3	16.3
WOLC	13.6	16.3
YORK	11.6	12.5

126 rows × 2 columns

Work with the time index as a Datetime object:¶

Pandas’ handling of datetime objects is incredibly powerful (see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html) . Below, we select 3 times using the date_range method.

timeRange = pd.date_range('2021-09-02 02:25', periods=3, freq='5min')

df.loc[('MANH',timeRange.values),('max_wind_speed_prop [m/s]', 'max_wind_speed_sonic [m/s]')]

/tmp/ipykernel_4158000/1874800629.py:1: FutureWarning: Indexing a timezone-aware DatetimeIndex with a timezone-naive datetime is deprecated and will raise KeyError in a future version. Use a timezone-aware object instead.
  df.loc[('MANH',timeRange.values),('max_wind_speed_prop [m/s]', 'max_wind_speed_sonic [m/s]')]

		max_wind_speed_prop [m/s]	max_wind_speed_sonic [m/s]
station	time
MANH	2021-09-02 02:25:00+00:00	8.2	10.2
	2021-09-02 02:30:00+00:00	8.7	10.3
	2021-09-02 02:35:00+00:00	12.2	11.5

Summary¶

One can recast a Pandas DataFrame so it leverages multi-indexing
Selecting and specifying conditions on a multi-index DataFrame work just as in a single-index DataFrame
Pandas has a large set of utilities to work with time-series based data. xs is one such method.
In Python, tuples are akin to lists, but a tuple’s elements are immutable.

What’s Next?¶

We will next explore dataframes with geographic attributes, via the GeoPandas package.

ATM433/533 Fall 2023

Pandas 8: Multi-index DataFrames

Contents