pandas Logo

Pandas 5: Combining and Writing


Previously, we worked with NYS Mesonet data during the period when the remnants of Hurricane Ida was impacting parts of the region. We needed to use site location data that existed in a separate file in order to geolocate NYSM sites and to reduce station pressure to sea-level. It would be more convenient if we could have all the necessary data in a single file.

We’ll once again use NYS Mesonet data, but now we’ll use real-time info!

Overview

  1. Select the most recent time from a current NYSM file.

  2. Add columns from the NYSM Site Locations file to the new DataFrame.

  3. Write out the new DataFrame to a csv file.

Prerequisites

Concepts

Importance

Notes

Pandas Notebooks 1-5

Necessary

  • Time to learn: 30 minutes


Imports

import pandas as pd
from datetime import datetime 

Select the most recent time from a current NYSM dataset

Open the file containing the most recent hour’s worth of NYSM data (12 5-minute timesteps)

# Use a Linux shell command to take a peek at the current CSV file
! head -5 /data1/nysm/latest.csv
station,time,temp_2m [degC],temp_9m [degC],relative_humidity [percent],precip_incremental [mm],precip_local [mm],precip_max_intensity [mm/min],avg_wind_speed_prop [m/s],max_wind_speed_prop [m/s],wind_speed_stddev_prop [m/s],wind_direction_prop [degrees],wind_direction_stddev_prop [degrees],avg_wind_speed_sonic [m/s],max_wind_speed_sonic [m/s],wind_speed_stddev_sonic [m/s],wind_direction_sonic [degrees],wind_direction_stddev_sonic [degrees],solar_insolation [W/m^2],station_pressure [mbar],snow_depth [cm],frozen_soil_05cm [bit],frozen_soil_25cm [bit],frozen_soil_50cm [bit],soil_temp_05cm [degC],soil_temp_25cm [degC],soil_temp_50cm [degC],soil_moisture_05cm [m^3/m^3],soil_moisture_25cm [m^3/m^3],soil_moisture_50cm [m^3/m^3]

ADDI,2023-10-17 16:30:00 UTC,10.2,9.4,82.0,0.00,0.27,0.00,3.0,4.3,0.6,350,22,3.3,5.0,0.6,348,16,809,955.63,,0,0,0,11.3,12.1,12.6,0.54,0.44,0.44

ADDI,2023-10-17 16:35:00 UTC,10.3,9.7,79.8,0.00,0.27,0.00,3.2,4.8,0.8,356,14,3.6,5.4,0.8,358,14,668,955.61,,0,0,0,11.3,12.1,12.6,0.54,0.44,0.44

ADDI,2023-10-17 16:40:00 UTC,10.0,9.6,78.4,0.00,0.27,0.00,2.9,4.2,0.6,341,20,3.1,4.5,0.6,343,19,450,955.60,,0,0,0,11.3,12.1,12.6,0.54,0.44,0.43

ADDI,2023-10-17 16:45:00 UTC,10.4,9.7,77.3,0.00,0.27,0.00,2.8,4.2,0.7,346,18,3.1,4.8,0.6,346,17,681,955.56,,0,0,0,11.3,12.1,12.6,0.54,0.44,0.44

As we have done before, we will employ a lambda function to interpret the data in the time column as Datetime objects, but keep the default row index.

# First define the format and then define the function
timeFormat = "%Y-%m-%d %H:%M:%S UTC"

# This function will iterate over each string in a 1-d array
# and use Pandas' implementation of strptime to convert the string into a datetime object.
parseTime = lambda x: datetime.strptime(x, timeFormat)
nysm_data = pd.read_csv('/data1/nysm/latest.csv',parse_dates=['time'], date_parser=parseTime)
nysm_data
station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
0 ADDI 2023-10-17 16:30:00 10.2 9.4 82.0 0.0 0.27 0.0 3.0 4.3 ... NaN 0.0 0.0 0.0 11.3 12.1 12.6 0.54 0.44 0.44
1 ADDI 2023-10-17 16:35:00 10.3 9.7 79.8 0.0 0.27 0.0 3.2 4.8 ... NaN 0.0 0.0 0.0 11.3 12.1 12.6 0.54 0.44 0.44
2 ADDI 2023-10-17 16:40:00 10.0 9.6 78.4 0.0 0.27 0.0 2.9 4.2 ... NaN 0.0 0.0 0.0 11.3 12.1 12.6 0.54 0.44 0.43
3 ADDI 2023-10-17 16:45:00 10.4 9.7 77.3 0.0 0.27 0.0 2.8 4.2 ... NaN 0.0 0.0 0.0 11.3 12.1 12.6 0.54 0.44 0.44
4 ADDI 2023-10-17 16:50:00 10.1 9.7 75.8 0.0 0.27 0.0 2.5 3.8 ... NaN 0.0 0.0 0.0 11.3 12.1 12.6 0.54 0.44 0.43
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1633 YORK 2023-10-17 17:10:00 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1634 YORK 2023-10-17 17:15:00 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1635 YORK 2023-10-17 17:20:00 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1636 YORK 2023-10-17 17:25:00 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1637 YORK 2023-10-17 17:30:00 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

1638 rows × 30 columns

In order to select data corresponding to the most recent time, we treat “maximum” as “latest”.

most_recent = nysm_data.time.max()
most_recent
Timestamp('2023-10-17 17:30:00')

Set a condition on which we will select matching rows.

condition = nysm_data.time == most_recent
nysm_data[condition]
station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
12 ADDI 2023-10-17 17:30:00 10.5 10.3 73.8 0.0 0.27 0.0 2.8 5.0 ... NaN 0.0 0.0 0.0 11.5 12.1 12.6 0.54 0.44 0.44
25 ANDE 2023-10-17 17:30:00 9.9 9.5 77.9 0.0 0.14 0.0 2.2 4.3 ... NaN 0.0 0.0 NaN 12.3 12.6 NaN 0.32 0.16 NaN
38 BATA 2023-10-17 17:30:00 10.2 10.2 84.5 0.0 1.10 0.0 1.8 2.4 ... NaN 0.0 0.0 0.0 11.1 11.6 13.0 0.15 0.09 0.09
51 BEAC 2023-10-17 17:30:00 14.8 14.3 61.9 0.0 0.00 0.0 0.5 1.5 ... NaN 0.0 0.0 0.0 13.5 13.8 14.3 0.32 0.26 0.22
64 BELD 2023-10-17 17:30:00 9.9 9.7 81.6 0.0 0.00 0.0 1.5 2.7 ... NaN 0.0 0.0 0.0 12.1 12.6 13.3 0.52 0.44 0.40
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1585 WFMB 2023-10-17 17:30:00 9.0 8.7 76.4 0.0 0.38 0.0 0.8 2.4 ... NaN 0.0 0.0 0.0 10.6 13.7 12.3 0.28 0.25 0.28
1598 WGAT 2023-10-17 17:30:00 10.4 9.7 74.2 0.0 0.00 0.0 1.6 3.3 ... NaN 0.0 0.0 0.0 10.9 11.0 11.9 0.15 0.26 0.09
1611 WHIT 2023-10-17 17:30:00 13.8 13.6 69.1 0.0 0.00 0.0 0.9 1.8 ... NaN 0.0 0.0 0.0 12.4 13.1 14.1 0.47 0.51 0.49
1624 WOLC 2023-10-17 17:30:00 11.7 11.5 82.9 0.0 2.08 0.0 1.6 2.8 ... NaN 0.0 0.0 0.0 13.7 13.0 13.7 0.17 0.02 0.07
1637 YORK 2023-10-17 17:30:00 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

126 rows × 30 columns

The above cell returns another DataFrame. Note its row indices reflect their values in the original DataFrame. We need to reset this index. For this, we’ll use Pandas reset_index method.

Tip: Remember, you can typically view information about objects, functions and methods in Python by appending a ?. For example, you could create an execute a cell below that contains the line nysm_data.reset_index?
# Uncomment the next line if you wish to get info on the `reset_index` method
# nysm_data.reset_index?
nysm_data[condition].reset_index()
index station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
0 12 ADDI 2023-10-17 17:30:00 10.5 10.3 73.8 0.0 0.27 0.0 2.8 ... NaN 0.0 0.0 0.0 11.5 12.1 12.6 0.54 0.44 0.44
1 25 ANDE 2023-10-17 17:30:00 9.9 9.5 77.9 0.0 0.14 0.0 2.2 ... NaN 0.0 0.0 NaN 12.3 12.6 NaN 0.32 0.16 NaN
2 38 BATA 2023-10-17 17:30:00 10.2 10.2 84.5 0.0 1.10 0.0 1.8 ... NaN 0.0 0.0 0.0 11.1 11.6 13.0 0.15 0.09 0.09
3 51 BEAC 2023-10-17 17:30:00 14.8 14.3 61.9 0.0 0.00 0.0 0.5 ... NaN 0.0 0.0 0.0 13.5 13.8 14.3 0.32 0.26 0.22
4 64 BELD 2023-10-17 17:30:00 9.9 9.7 81.6 0.0 0.00 0.0 1.5 ... NaN 0.0 0.0 0.0 12.1 12.6 13.3 0.52 0.44 0.40
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 1585 WFMB 2023-10-17 17:30:00 9.0 8.7 76.4 0.0 0.38 0.0 0.8 ... NaN 0.0 0.0 0.0 10.6 13.7 12.3 0.28 0.25 0.28
122 1598 WGAT 2023-10-17 17:30:00 10.4 9.7 74.2 0.0 0.00 0.0 1.6 ... NaN 0.0 0.0 0.0 10.9 11.0 11.9 0.15 0.26 0.09
123 1611 WHIT 2023-10-17 17:30:00 13.8 13.6 69.1 0.0 0.00 0.0 0.9 ... NaN 0.0 0.0 0.0 12.4 13.1 14.1 0.47 0.51 0.49
124 1624 WOLC 2023-10-17 17:30:00 11.7 11.5 82.9 0.0 2.08 0.0 1.6 ... NaN 0.0 0.0 0.0 13.7 13.0 13.7 0.17 0.02 0.07
125 1637 YORK 2023-10-17 17:30:00 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

126 rows × 31 columns

Notice the second column, called index? That contains the row numbers from the original nysm_data DataFrame which we don’t want. Let’s repeat the reset_index call, but also include the drop=True argument to reset_index.

nysm_data[condition].reset_index(drop=True)
station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] ... snow_depth [cm] frozen_soil_05cm [bit] frozen_soil_25cm [bit] frozen_soil_50cm [bit] soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3]
0 ADDI 2023-10-17 17:30:00 10.5 10.3 73.8 0.0 0.27 0.0 2.8 5.0 ... NaN 0.0 0.0 0.0 11.5 12.1 12.6 0.54 0.44 0.44
1 ANDE 2023-10-17 17:30:00 9.9 9.5 77.9 0.0 0.14 0.0 2.2 4.3 ... NaN 0.0 0.0 NaN 12.3 12.6 NaN 0.32 0.16 NaN
2 BATA 2023-10-17 17:30:00 10.2 10.2 84.5 0.0 1.10 0.0 1.8 2.4 ... NaN 0.0 0.0 0.0 11.1 11.6 13.0 0.15 0.09 0.09
3 BEAC 2023-10-17 17:30:00 14.8 14.3 61.9 0.0 0.00 0.0 0.5 1.5 ... NaN 0.0 0.0 0.0 13.5 13.8 14.3 0.32 0.26 0.22
4 BELD 2023-10-17 17:30:00 9.9 9.7 81.6 0.0 0.00 0.0 1.5 2.7 ... NaN 0.0 0.0 0.0 12.1 12.6 13.3 0.52 0.44 0.40
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 WFMB 2023-10-17 17:30:00 9.0 8.7 76.4 0.0 0.38 0.0 0.8 2.4 ... NaN 0.0 0.0 0.0 10.6 13.7 12.3 0.28 0.25 0.28
122 WGAT 2023-10-17 17:30:00 10.4 9.7 74.2 0.0 0.00 0.0 1.6 3.3 ... NaN 0.0 0.0 0.0 10.9 11.0 11.9 0.15 0.26 0.09
123 WHIT 2023-10-17 17:30:00 13.8 13.6 69.1 0.0 0.00 0.0 0.9 1.8 ... NaN 0.0 0.0 0.0 12.4 13.1 14.1 0.47 0.51 0.49
124 WOLC 2023-10-17 17:30:00 11.7 11.5 82.9 0.0 2.08 0.0 1.6 2.8 ... NaN 0.0 0.0 0.0 13.7 13.0 13.7 0.17 0.02 0.07
125 YORK 2023-10-17 17:30:00 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

126 rows × 30 columns

Now that we have established how to reset the row index, create a new DataFrame object.

df_latest = nysm_data[condition].reset_index(drop=True)

Add columns from the NYSM Site Locations file to the new DataFrame.

nysm_sites = pd.read_csv('/spare11/atm533/data/nysm.csv')
nysm_sites
stid number name lat lon elevation county nearest_city state distance_from_town [km] direction_from_town [degrees] climate_division climate_division_name wfo commissioned decommissioned
0 ADDI 107 Addison 42.040360 -77.237260 507.6140 Steuben Addison NY 6.9 S 1 Western Plateau BGM 2016-08-10 18:15:00 UTC NaN
1 ANDE 111 Andes 42.182270 -74.801390 518.2820 Delaware Andes NY 1.5 WSW 2 Eastern Plateau BGM 2016-08-04 15:55:00 UTC NaN
2 BATA 24 Batavia 43.019940 -78.135660 276.1200 Genesee Batavia NY 4.9 ENE 9 Great Lakes BUF 2016-02-18 18:40:00 UTC NaN
3 BEAC 76 Beacon 41.528750 -73.945270 90.1598 Dutchess Beacon NY 3.3 NE 5 Hudson Valley ALY 2016-08-22 16:45:00 UTC NaN
4 BELD 90 Belden 42.223220 -75.668520 470.3700 Broome Belden NY 2.2 NNE 2 Eastern Plateau BGM 2015-11-30 20:20:00 UTC NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 WFMB 14 Whiteface Mountain Base 44.393236 -73.858829 614.5990 Essex Wilmington NY 3.5 W 3 Northern Plateau BTV 2016-01-29 20:55:00 UTC NaN
122 WGAT 123 Woodgate 43.532408 -75.158597 442.9660 Oneida Woodgate NY 1.4 NNW 3 Northern Plateau BGM 2016-08-29 18:20:00 UTC NaN
123 WHIT 10 Whitehall 43.485073 -73.423071 36.5638 Washington Whitehall NY 8.0 S 7 Champlain Valley ALY 2015-08-26 20:30:00 UTC NaN
124 WOLC 79 Wolcott 43.228680 -76.842610 121.2190 Wayne Wolcott NY 2.4 WNW 9 Great Lakes BUF 2016-03-09 18:10:00 UTC NaN
125 YORK 99 York 42.855040 -77.847760 177.9420 Livingston York NY 3.6 ESE 10 Central Lakes BUF 2016-08-09 17:55:00 UTC NaN

126 rows × 16 columns

To add a new column to a DataFrame, just create a new object for it.

df_latest['lat'] = nysm_sites['lat']
df_latest['lon'] = nysm_sites['lon']
df_latest['elevation'] = nysm_sites['elevation']
df_latest['name'] = nysm_sites['name']

Let’s look at what we have now. If you scroll over to the right, you will see the four columns have been appended to the df_latest DataFrame.

df_latest
station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] ... soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3] lat lon elevation name
0 ADDI 2023-10-17 17:30:00 10.5 10.3 73.8 0.0 0.27 0.0 2.8 5.0 ... 11.5 12.1 12.6 0.54 0.44 0.44 42.040360 -77.237260 507.6140 Addison
1 ANDE 2023-10-17 17:30:00 9.9 9.5 77.9 0.0 0.14 0.0 2.2 4.3 ... 12.3 12.6 NaN 0.32 0.16 NaN 42.182270 -74.801390 518.2820 Andes
2 BATA 2023-10-17 17:30:00 10.2 10.2 84.5 0.0 1.10 0.0 1.8 2.4 ... 11.1 11.6 13.0 0.15 0.09 0.09 43.019940 -78.135660 276.1200 Batavia
3 BEAC 2023-10-17 17:30:00 14.8 14.3 61.9 0.0 0.00 0.0 0.5 1.5 ... 13.5 13.8 14.3 0.32 0.26 0.22 41.528750 -73.945270 90.1598 Beacon
4 BELD 2023-10-17 17:30:00 9.9 9.7 81.6 0.0 0.00 0.0 1.5 2.7 ... 12.1 12.6 13.3 0.52 0.44 0.40 42.223220 -75.668520 470.3700 Belden
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 WFMB 2023-10-17 17:30:00 9.0 8.7 76.4 0.0 0.38 0.0 0.8 2.4 ... 10.6 13.7 12.3 0.28 0.25 0.28 44.393236 -73.858829 614.5990 Whiteface Mountain Base
122 WGAT 2023-10-17 17:30:00 10.4 9.7 74.2 0.0 0.00 0.0 1.6 3.3 ... 10.9 11.0 11.9 0.15 0.26 0.09 43.532408 -75.158597 442.9660 Woodgate
123 WHIT 2023-10-17 17:30:00 13.8 13.6 69.1 0.0 0.00 0.0 0.9 1.8 ... 12.4 13.1 14.1 0.47 0.51 0.49 43.485073 -73.423071 36.5638 Whitehall
124 WOLC 2023-10-17 17:30:00 11.7 11.5 82.9 0.0 2.08 0.0 1.6 2.8 ... 13.7 13.0 13.7 0.17 0.02 0.07 43.228680 -76.842610 121.2190 Wolcott
125 YORK 2023-10-17 17:30:00 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 42.855040 -77.847760 177.9420 York

126 rows × 34 columns

Let’s look at all the column names.

df_latest.columns
Index(['station', 'time', 'temp_2m [degC]', 'temp_9m [degC]',
       'relative_humidity [percent]', 'precip_incremental [mm]',
       'precip_local [mm]', 'precip_max_intensity [mm/min]',
       'avg_wind_speed_prop [m/s]', 'max_wind_speed_prop [m/s]',
       'wind_speed_stddev_prop [m/s]', 'wind_direction_prop [degrees]',
       'wind_direction_stddev_prop [degrees]', 'avg_wind_speed_sonic [m/s]',
       'max_wind_speed_sonic [m/s]', 'wind_speed_stddev_sonic [m/s]',
       'wind_direction_sonic [degrees]',
       'wind_direction_stddev_sonic [degrees]', 'solar_insolation [W/m^2]',
       'station_pressure [mbar]', 'snow_depth [cm]', 'frozen_soil_05cm [bit]',
       'frozen_soil_25cm [bit]', 'frozen_soil_50cm [bit]',
       'soil_temp_05cm [degC]', 'soil_temp_25cm [degC]',
       'soil_temp_50cm [degC]', 'soil_moisture_05cm [m^3/m^3]',
       'soil_moisture_25cm [m^3/m^3]', 'soil_moisture_50cm [m^3/m^3]', 'lat',
       'lon', 'elevation', 'name'],
      dtype='object')

Write out the new DataFrame to a csv file.

We’ll employ Pandas’ to_csv method, and specify index=False to avoid the creation of a new dummy column.

df_latest.to_csv('nysm_latest.csv',index=False)

Verify that we can read in the new csv file with Pandas

df = pd.read_csv('nysm_latest.csv')
df
station time temp_2m [degC] temp_9m [degC] relative_humidity [percent] precip_incremental [mm] precip_local [mm] precip_max_intensity [mm/min] avg_wind_speed_prop [m/s] max_wind_speed_prop [m/s] ... soil_temp_05cm [degC] soil_temp_25cm [degC] soil_temp_50cm [degC] soil_moisture_05cm [m^3/m^3] soil_moisture_25cm [m^3/m^3] soil_moisture_50cm [m^3/m^3] lat lon elevation name
0 ADDI 2023-10-17 17:30:00 10.5 10.3 73.8 0.0 0.27 0.0 2.8 5.0 ... 11.5 12.1 12.6 0.54 0.44 0.44 42.040360 -77.237260 507.6140 Addison
1 ANDE 2023-10-17 17:30:00 9.9 9.5 77.9 0.0 0.14 0.0 2.2 4.3 ... 12.3 12.6 NaN 0.32 0.16 NaN 42.182270 -74.801390 518.2820 Andes
2 BATA 2023-10-17 17:30:00 10.2 10.2 84.5 0.0 1.10 0.0 1.8 2.4 ... 11.1 11.6 13.0 0.15 0.09 0.09 43.019940 -78.135660 276.1200 Batavia
3 BEAC 2023-10-17 17:30:00 14.8 14.3 61.9 0.0 0.00 0.0 0.5 1.5 ... 13.5 13.8 14.3 0.32 0.26 0.22 41.528750 -73.945270 90.1598 Beacon
4 BELD 2023-10-17 17:30:00 9.9 9.7 81.6 0.0 0.00 0.0 1.5 2.7 ... 12.1 12.6 13.3 0.52 0.44 0.40 42.223220 -75.668520 470.3700 Belden
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121 WFMB 2023-10-17 17:30:00 9.0 8.7 76.4 0.0 0.38 0.0 0.8 2.4 ... 10.6 13.7 12.3 0.28 0.25 0.28 44.393236 -73.858829 614.5990 Whiteface Mountain Base
122 WGAT 2023-10-17 17:30:00 10.4 9.7 74.2 0.0 0.00 0.0 1.6 3.3 ... 10.9 11.0 11.9 0.15 0.26 0.09 43.532408 -75.158597 442.9660 Woodgate
123 WHIT 2023-10-17 17:30:00 13.8 13.6 69.1 0.0 0.00 0.0 0.9 1.8 ... 12.4 13.1 14.1 0.47 0.51 0.49 43.485073 -73.423071 36.5638 Whitehall
124 WOLC 2023-10-17 17:30:00 11.7 11.5 82.9 0.0 2.08 0.0 1.6 2.8 ... 13.7 13.0 13.7 0.17 0.02 0.07 43.228680 -76.842610 121.2190 Wolcott
125 YORK 2023-10-17 17:30:00 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 42.855040 -77.847760 177.9420 York

126 rows × 34 columns


Summary

  • Pandas includes functions to combine columns from multiple DataFrames.

  • Often, it is necessary to re-index the modified `DataFrame.

  • Modifed DataFrames are easily saved as csv files.

What’s Next?

In the next notebook, we will create multiple indexes on NYSM DataFrames.

Resources and References

  1. Pandas User Guide