Pandas 5: Combining and Writing¶

Previously, we worked with NYS Mesonet data during the period when the remnants of Hurricane Ida was impacting parts of the region. We needed to use site location data that existed in a separate file in order to geolocate NYSM sites and to reduce station pressure to sea-level. It would be more convenient if we could have all the necessary data in a single file.¶

We’ll once again use NYS Mesonet data, but now we’ll use real-time info!¶

Overview¶

Select the most recent time from a current NYSM file.
Add columns from the NYSM Site Locations file to the new DataFrame.
Write out the new DataFrame to a csv file.

Prerequisites¶

Concepts	Importance	Notes
Pandas Notebooks 1-5	Necessary

Time to learn: 30 minutes

Imports¶

import pandas as pd
from datetime import datetime 

Select the most recent time from a current NYSM dataset¶

Open the file containing the most recent hour’s worth of NYSM data (12 5-minute timesteps)

# Use a Linux shell command to take a peek at the current CSV file
! head -5 /data1/nysm/latest.csv

station,time,temp_2m [degC],temp_9m [degC],relative_humidity [percent],precip_incremental [mm],precip_local [mm],precip_max_intensity [mm/min],avg_wind_speed_prop [m/s],max_wind_speed_prop [m/s],wind_speed_stddev_prop [m/s],wind_direction_prop [degrees],wind_direction_stddev_prop [degrees],avg_wind_speed_sonic [m/s],max_wind_speed_sonic [m/s],wind_speed_stddev_sonic [m/s],wind_direction_sonic [degrees],wind_direction_stddev_sonic [degrees],solar_insolation [W/m^2],station_pressure [mbar],snow_depth [cm],frozen_soil_05cm [bit],frozen_soil_25cm [bit],frozen_soil_50cm [bit],soil_temp_05cm [degC],soil_temp_25cm [degC],soil_temp_50cm [degC],soil_moisture_05cm [m^3/m^3],soil_moisture_25cm [m^3/m^3],soil_moisture_50cm [m^3/m^3]

ADDI,2023-10-17 16:30:00 UTC,10.2,9.4,82.0,0.00,0.27,0.00,3.0,4.3,0.6,350,22,3.3,5.0,0.6,348,16,809,955.63,,0,0,0,11.3,12.1,12.6,0.54,0.44,0.44

ADDI,2023-10-17 16:35:00 UTC,10.3,9.7,79.8,0.00,0.27,0.00,3.2,4.8,0.8,356,14,3.6,5.4,0.8,358,14,668,955.61,,0,0,0,11.3,12.1,12.6,0.54,0.44,0.44

ADDI,2023-10-17 16:40:00 UTC,10.0,9.6,78.4,0.00,0.27,0.00,2.9,4.2,0.6,341,20,3.1,4.5,0.6,343,19,450,955.60,,0,0,0,11.3,12.1,12.6,0.54,0.44,0.43

ADDI,2023-10-17 16:45:00 UTC,10.4,9.7,77.3,0.00,0.27,0.00,2.8,4.2,0.7,346,18,3.1,4.8,0.6,346,17,681,955.56,,0,0,0,11.3,12.1,12.6,0.54,0.44,0.44

As we have done before, we will employ a `lambda` function to interpret the data in the time column as Datetime objects, but keep the default row index.¶

# First define the format and then define the function
timeFormat = "%Y-%m-%d %H:%M:%S UTC"

# This function will iterate over each string in a 1-d array
# and use Pandas' implementation of strptime to convert the string into a datetime object.
parseTime = lambda x: datetime.strptime(x, timeFormat)

nysm_data = pd.read_csv('/data1/nysm/latest.csv',parse_dates=['time'], date_parser=parseTime)

nysm_data

	station	time	temp_2m [degC]	temp_9m [degC]	relative_humidity [percent]	precip_incremental [mm]	precip_local [mm]	precip_max_intensity [mm/min]	avg_wind_speed_prop [m/s]	max_wind_speed_prop [m/s]	...	snow_depth [cm]	frozen_soil_05cm [bit]	frozen_soil_25cm [bit]	frozen_soil_50cm [bit]	soil_temp_05cm [degC]	soil_temp_25cm [degC]	soil_temp_50cm [degC]	soil_moisture_05cm [m^3/m^3]	soil_moisture_25cm [m^3/m^3]	soil_moisture_50cm [m^3/m^3]
0	ADDI	2023-10-17 16:30:00	10.2	9.4	82.0	0.0	0.27	0.0	3.0	4.3	...	NaN	0.0	0.0	0.0	11.3	12.1	12.6	0.54	0.44	0.44
1	ADDI	2023-10-17 16:35:00	10.3	9.7	79.8	0.0	0.27	0.0	3.2	4.8	...	NaN	0.0	0.0	0.0	11.3	12.1	12.6	0.54	0.44	0.44
2	ADDI	2023-10-17 16:40:00	10.0	9.6	78.4	0.0	0.27	0.0	2.9	4.2	...	NaN	0.0	0.0	0.0	11.3	12.1	12.6	0.54	0.44	0.43
3	ADDI	2023-10-17 16:45:00	10.4	9.7	77.3	0.0	0.27	0.0	2.8	4.2	...	NaN	0.0	0.0	0.0	11.3	12.1	12.6	0.54	0.44	0.44
4	ADDI	2023-10-17 16:50:00	10.1	9.7	75.8	0.0	0.27	0.0	2.5	3.8	...	NaN	0.0	0.0	0.0	11.3	12.1	12.6	0.54	0.44	0.43
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1633	YORK	2023-10-17 17:10:00	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1634	YORK	2023-10-17 17:15:00	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1635	YORK	2023-10-17 17:20:00	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1636	YORK	2023-10-17 17:25:00	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1637	YORK	2023-10-17 17:30:00	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

1638 rows × 30 columns

In order to select data corresponding to the most recent time, we treat “maximum” as “latest”.

most_recent = nysm_data.time.max()
most_recent

Timestamp('2023-10-17 17:30:00')

Set a condition on which we will select matching rows.

condition = nysm_data.time == most_recent

nysm_data[condition]

	station	time	temp_2m [degC]	temp_9m [degC]	relative_humidity [percent]	precip_incremental [mm]	precip_local [mm]	precip_max_intensity [mm/min]	avg_wind_speed_prop [m/s]	max_wind_speed_prop [m/s]	...	snow_depth [cm]	frozen_soil_05cm [bit]	frozen_soil_25cm [bit]	frozen_soil_50cm [bit]	soil_temp_05cm [degC]	soil_temp_25cm [degC]	soil_temp_50cm [degC]	soil_moisture_05cm [m^3/m^3]	soil_moisture_25cm [m^3/m^3]	soil_moisture_50cm [m^3/m^3]
12	ADDI	2023-10-17 17:30:00	10.5	10.3	73.8	0.0	0.27	0.0	2.8	5.0	...	NaN	0.0	0.0	0.0	11.5	12.1	12.6	0.54	0.44	0.44
25	ANDE	2023-10-17 17:30:00	9.9	9.5	77.9	0.0	0.14	0.0	2.2	4.3	...	NaN	0.0	0.0	NaN	12.3	12.6	NaN	0.32	0.16	NaN
38	BATA	2023-10-17 17:30:00	10.2	10.2	84.5	0.0	1.10	0.0	1.8	2.4	...	NaN	0.0	0.0	0.0	11.1	11.6	13.0	0.15	0.09	0.09
51	BEAC	2023-10-17 17:30:00	14.8	14.3	61.9	0.0	0.00	0.0	0.5	1.5	...	NaN	0.0	0.0	0.0	13.5	13.8	14.3	0.32	0.26	0.22
64	BELD	2023-10-17 17:30:00	9.9	9.7	81.6	0.0	0.00	0.0	1.5	2.7	...	NaN	0.0	0.0	0.0	12.1	12.6	13.3	0.52	0.44	0.40
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1585	WFMB	2023-10-17 17:30:00	9.0	8.7	76.4	0.0	0.38	0.0	0.8	2.4	...	NaN	0.0	0.0	0.0	10.6	13.7	12.3	0.28	0.25	0.28
1598	WGAT	2023-10-17 17:30:00	10.4	9.7	74.2	0.0	0.00	0.0	1.6	3.3	...	NaN	0.0	0.0	0.0	10.9	11.0	11.9	0.15	0.26	0.09
1611	WHIT	2023-10-17 17:30:00	13.8	13.6	69.1	0.0	0.00	0.0	0.9	1.8	...	NaN	0.0	0.0	0.0	12.4	13.1	14.1	0.47	0.51	0.49
1624	WOLC	2023-10-17 17:30:00	11.7	11.5	82.9	0.0	2.08	0.0	1.6	2.8	...	NaN	0.0	0.0	0.0	13.7	13.0	13.7	0.17	0.02	0.07
1637	YORK	2023-10-17 17:30:00	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

126 rows × 30 columns

The above cell returns another DataFrame. Note its row indices reflect their values in the original DataFrame. We need to reset this index. For this, we’ll use Pandas reset_index method.

Tip: Remember, you can typically view information about objects, functions and methods in Python by appending a ?. For example, you could create an execute a cell below that contains the line nysm_data.reset_index?

# Uncomment the next line if you wish to get info on the `reset_index` method
# nysm_data.reset_index?

nysm_data[condition].reset_index()

	index	station	time	temp_2m [degC]	temp_9m [degC]	relative_humidity [percent]	precip_incremental [mm]	precip_local [mm]	precip_max_intensity [mm/min]	avg_wind_speed_prop [m/s]	...	snow_depth [cm]	frozen_soil_05cm [bit]	frozen_soil_25cm [bit]	frozen_soil_50cm [bit]	soil_temp_05cm [degC]	soil_temp_25cm [degC]	soil_temp_50cm [degC]	soil_moisture_05cm [m^3/m^3]	soil_moisture_25cm [m^3/m^3]	soil_moisture_50cm [m^3/m^3]
0	12	ADDI	2023-10-17 17:30:00	10.5	10.3	73.8	0.0	0.27	0.0	2.8	...	NaN	0.0	0.0	0.0	11.5	12.1	12.6	0.54	0.44	0.44
1	25	ANDE	2023-10-17 17:30:00	9.9	9.5	77.9	0.0	0.14	0.0	2.2	...	NaN	0.0	0.0	NaN	12.3	12.6	NaN	0.32	0.16	NaN
2	38	BATA	2023-10-17 17:30:00	10.2	10.2	84.5	0.0	1.10	0.0	1.8	...	NaN	0.0	0.0	0.0	11.1	11.6	13.0	0.15	0.09	0.09
3	51	BEAC	2023-10-17 17:30:00	14.8	14.3	61.9	0.0	0.00	0.0	0.5	...	NaN	0.0	0.0	0.0	13.5	13.8	14.3	0.32	0.26	0.22
4	64	BELD	2023-10-17 17:30:00	9.9	9.7	81.6	0.0	0.00	0.0	1.5	...	NaN	0.0	0.0	0.0	12.1	12.6	13.3	0.52	0.44	0.40
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
121	1585	WFMB	2023-10-17 17:30:00	9.0	8.7	76.4	0.0	0.38	0.0	0.8	...	NaN	0.0	0.0	0.0	10.6	13.7	12.3	0.28	0.25	0.28
122	1598	WGAT	2023-10-17 17:30:00	10.4	9.7	74.2	0.0	0.00	0.0	1.6	...	NaN	0.0	0.0	0.0	10.9	11.0	11.9	0.15	0.26	0.09
123	1611	WHIT	2023-10-17 17:30:00	13.8	13.6	69.1	0.0	0.00	0.0	0.9	...	NaN	0.0	0.0	0.0	12.4	13.1	14.1	0.47	0.51	0.49
124	1624	WOLC	2023-10-17 17:30:00	11.7	11.5	82.9	0.0	2.08	0.0	1.6	...	NaN	0.0	0.0	0.0	13.7	13.0	13.7	0.17	0.02	0.07
125	1637	YORK	2023-10-17 17:30:00	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

126 rows × 31 columns

Notice the second column, called index? That contains the row numbers from the original nysm_data DataFrame which we don’t want. Let’s repeat the reset_index call, but also include the drop=True argument to reset_index.

nysm_data[condition].reset_index(drop=True)

	station	time	temp_2m [degC]	temp_9m [degC]	relative_humidity [percent]	precip_incremental [mm]	precip_local [mm]	precip_max_intensity [mm/min]	avg_wind_speed_prop [m/s]	max_wind_speed_prop [m/s]	...	snow_depth [cm]	frozen_soil_05cm [bit]	frozen_soil_25cm [bit]	frozen_soil_50cm [bit]	soil_temp_05cm [degC]	soil_temp_25cm [degC]	soil_temp_50cm [degC]	soil_moisture_05cm [m^3/m^3]	soil_moisture_25cm [m^3/m^3]	soil_moisture_50cm [m^3/m^3]
0	ADDI	2023-10-17 17:30:00	10.5	10.3	73.8	0.0	0.27	0.0	2.8	5.0	...	NaN	0.0	0.0	0.0	11.5	12.1	12.6	0.54	0.44	0.44
1	ANDE	2023-10-17 17:30:00	9.9	9.5	77.9	0.0	0.14	0.0	2.2	4.3	...	NaN	0.0	0.0	NaN	12.3	12.6	NaN	0.32	0.16	NaN
2	BATA	2023-10-17 17:30:00	10.2	10.2	84.5	0.0	1.10	0.0	1.8	2.4	...	NaN	0.0	0.0	0.0	11.1	11.6	13.0	0.15	0.09	0.09
3	BEAC	2023-10-17 17:30:00	14.8	14.3	61.9	0.0	0.00	0.0	0.5	1.5	...	NaN	0.0	0.0	0.0	13.5	13.8	14.3	0.32	0.26	0.22
4	BELD	2023-10-17 17:30:00	9.9	9.7	81.6	0.0	0.00	0.0	1.5	2.7	...	NaN	0.0	0.0	0.0	12.1	12.6	13.3	0.52	0.44	0.40
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
121	WFMB	2023-10-17 17:30:00	9.0	8.7	76.4	0.0	0.38	0.0	0.8	2.4	...	NaN	0.0	0.0	0.0	10.6	13.7	12.3	0.28	0.25	0.28
122	WGAT	2023-10-17 17:30:00	10.4	9.7	74.2	0.0	0.00	0.0	1.6	3.3	...	NaN	0.0	0.0	0.0	10.9	11.0	11.9	0.15	0.26	0.09
123	WHIT	2023-10-17 17:30:00	13.8	13.6	69.1	0.0	0.00	0.0	0.9	1.8	...	NaN	0.0	0.0	0.0	12.4	13.1	14.1	0.47	0.51	0.49
124	WOLC	2023-10-17 17:30:00	11.7	11.5	82.9	0.0	2.08	0.0	1.6	2.8	...	NaN	0.0	0.0	0.0	13.7	13.0	13.7	0.17	0.02	0.07
125	YORK	2023-10-17 17:30:00	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

126 rows × 30 columns

Now that we have established how to reset the row index, create a new DataFrame object.

df_latest = nysm_data[condition].reset_index(drop=True)

Add columns from the NYSM Site Locations file to the new DataFrame.¶

nysm_sites = pd.read_csv('/spare11/atm533/data/nysm.csv')

nysm_sites

	stid	number	name	lat	lon	elevation	county	nearest_city	state	distance_from_town [km]	direction_from_town [degrees]	climate_division	climate_division_name	wfo	commissioned	decommissioned
0	ADDI	107	Addison	42.040360	-77.237260	507.6140	Steuben	Addison	NY	6.9	S	1	Western Plateau	BGM	2016-08-10 18:15:00 UTC	NaN
1	ANDE	111	Andes	42.182270	-74.801390	518.2820	Delaware	Andes	NY	1.5	WSW	2	Eastern Plateau	BGM	2016-08-04 15:55:00 UTC	NaN
2	BATA	24	Batavia	43.019940	-78.135660	276.1200	Genesee	Batavia	NY	4.9	ENE	9	Great Lakes	BUF	2016-02-18 18:40:00 UTC	NaN
3	BEAC	76	Beacon	41.528750	-73.945270	90.1598	Dutchess	Beacon	NY	3.3	NE	5	Hudson Valley	ALY	2016-08-22 16:45:00 UTC	NaN
4	BELD	90	Belden	42.223220	-75.668520	470.3700	Broome	Belden	NY	2.2	NNE	2	Eastern Plateau	BGM	2015-11-30 20:20:00 UTC	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
121	WFMB	14	Whiteface Mountain Base	44.393236	-73.858829	614.5990	Essex	Wilmington	NY	3.5	W	3	Northern Plateau	BTV	2016-01-29 20:55:00 UTC	NaN
122	WGAT	123	Woodgate	43.532408	-75.158597	442.9660	Oneida	Woodgate	NY	1.4	NNW	3	Northern Plateau	BGM	2016-08-29 18:20:00 UTC	NaN
123	WHIT	10	Whitehall	43.485073	-73.423071	36.5638	Washington	Whitehall	NY	8.0	S	7	Champlain Valley	ALY	2015-08-26 20:30:00 UTC	NaN
124	WOLC	79	Wolcott	43.228680	-76.842610	121.2190	Wayne	Wolcott	NY	2.4	WNW	9	Great Lakes	BUF	2016-03-09 18:10:00 UTC	NaN
125	YORK	99	York	42.855040	-77.847760	177.9420	Livingston	York	NY	3.6	ESE	10	Central Lakes	BUF	2016-08-09 17:55:00 UTC	NaN

126 rows × 16 columns

To add a new column to a DataFrame, just create a new object for it.

df_latest['lat'] = nysm_sites['lat']

df_latest['lon'] = nysm_sites['lon']

df_latest['elevation'] = nysm_sites['elevation']

df_latest['name'] = nysm_sites['name']

Let’s look at what we have now. If you scroll over to the right, you will see the four columns have been appended to the df_latest DataFrame.

df_latest

	station	time	temp_2m [degC]	temp_9m [degC]	relative_humidity [percent]	precip_incremental [mm]	precip_local [mm]	precip_max_intensity [mm/min]	avg_wind_speed_prop [m/s]	max_wind_speed_prop [m/s]	...	soil_temp_05cm [degC]	soil_temp_25cm [degC]	soil_temp_50cm [degC]	soil_moisture_05cm [m^3/m^3]	soil_moisture_25cm [m^3/m^3]	soil_moisture_50cm [m^3/m^3]	lat	lon	elevation	name
0	ADDI	2023-10-17 17:30:00	10.5	10.3	73.8	0.0	0.27	0.0	2.8	5.0	...	11.5	12.1	12.6	0.54	0.44	0.44	42.040360	-77.237260	507.6140	Addison
1	ANDE	2023-10-17 17:30:00	9.9	9.5	77.9	0.0	0.14	0.0	2.2	4.3	...	12.3	12.6	NaN	0.32	0.16	NaN	42.182270	-74.801390	518.2820	Andes
2	BATA	2023-10-17 17:30:00	10.2	10.2	84.5	0.0	1.10	0.0	1.8	2.4	...	11.1	11.6	13.0	0.15	0.09	0.09	43.019940	-78.135660	276.1200	Batavia
3	BEAC	2023-10-17 17:30:00	14.8	14.3	61.9	0.0	0.00	0.0	0.5	1.5	...	13.5	13.8	14.3	0.32	0.26	0.22	41.528750	-73.945270	90.1598	Beacon
4	BELD	2023-10-17 17:30:00	9.9	9.7	81.6	0.0	0.00	0.0	1.5	2.7	...	12.1	12.6	13.3	0.52	0.44	0.40	42.223220	-75.668520	470.3700	Belden
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
121	WFMB	2023-10-17 17:30:00	9.0	8.7	76.4	0.0	0.38	0.0	0.8	2.4	...	10.6	13.7	12.3	0.28	0.25	0.28	44.393236	-73.858829	614.5990	Whiteface Mountain Base
122	WGAT	2023-10-17 17:30:00	10.4	9.7	74.2	0.0	0.00	0.0	1.6	3.3	...	10.9	11.0	11.9	0.15	0.26	0.09	43.532408	-75.158597	442.9660	Woodgate
123	WHIT	2023-10-17 17:30:00	13.8	13.6	69.1	0.0	0.00	0.0	0.9	1.8	...	12.4	13.1	14.1	0.47	0.51	0.49	43.485073	-73.423071	36.5638	Whitehall
124	WOLC	2023-10-17 17:30:00	11.7	11.5	82.9	0.0	2.08	0.0	1.6	2.8	...	13.7	13.0	13.7	0.17	0.02	0.07	43.228680	-76.842610	121.2190	Wolcott
125	YORK	2023-10-17 17:30:00	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	42.855040	-77.847760	177.9420	York

126 rows × 34 columns

Let’s look at all the column names.

df_latest.columns

Index(['station', 'time', 'temp_2m [degC]', 'temp_9m [degC]',
       'relative_humidity [percent]', 'precip_incremental [mm]',
       'precip_local [mm]', 'precip_max_intensity [mm/min]',
       'avg_wind_speed_prop [m/s]', 'max_wind_speed_prop [m/s]',
       'wind_speed_stddev_prop [m/s]', 'wind_direction_prop [degrees]',
       'wind_direction_stddev_prop [degrees]', 'avg_wind_speed_sonic [m/s]',
       'max_wind_speed_sonic [m/s]', 'wind_speed_stddev_sonic [m/s]',
       'wind_direction_sonic [degrees]',
       'wind_direction_stddev_sonic [degrees]', 'solar_insolation [W/m^2]',
       'station_pressure [mbar]', 'snow_depth [cm]', 'frozen_soil_05cm [bit]',
       'frozen_soil_25cm [bit]', 'frozen_soil_50cm [bit]',
       'soil_temp_05cm [degC]', 'soil_temp_25cm [degC]',
       'soil_temp_50cm [degC]', 'soil_moisture_05cm [m^3/m^3]',
       'soil_moisture_25cm [m^3/m^3]', 'soil_moisture_50cm [m^3/m^3]', 'lat',
       'lon', 'elevation', 'name'],
      dtype='object')

Write out the new DataFrame to a csv file.¶

We’ll employ Pandas’ to_csv method, and specify index=False to avoid the creation of a new dummy column.

df_latest.to_csv('nysm_latest.csv',index=False)

Verify that we can read in the new csv file with Pandas

df = pd.read_csv('nysm_latest.csv')

df

	station	time	temp_2m [degC]	temp_9m [degC]	relative_humidity [percent]	precip_incremental [mm]	precip_local [mm]	precip_max_intensity [mm/min]	avg_wind_speed_prop [m/s]	max_wind_speed_prop [m/s]	...	soil_temp_05cm [degC]	soil_temp_25cm [degC]	soil_temp_50cm [degC]	soil_moisture_05cm [m^3/m^3]	soil_moisture_25cm [m^3/m^3]	soil_moisture_50cm [m^3/m^3]	lat	lon	elevation	name
0	ADDI	2023-10-17 17:30:00	10.5	10.3	73.8	0.0	0.27	0.0	2.8	5.0	...	11.5	12.1	12.6	0.54	0.44	0.44	42.040360	-77.237260	507.6140	Addison
1	ANDE	2023-10-17 17:30:00	9.9	9.5	77.9	0.0	0.14	0.0	2.2	4.3	...	12.3	12.6	NaN	0.32	0.16	NaN	42.182270	-74.801390	518.2820	Andes
2	BATA	2023-10-17 17:30:00	10.2	10.2	84.5	0.0	1.10	0.0	1.8	2.4	...	11.1	11.6	13.0	0.15	0.09	0.09	43.019940	-78.135660	276.1200	Batavia
3	BEAC	2023-10-17 17:30:00	14.8	14.3	61.9	0.0	0.00	0.0	0.5	1.5	...	13.5	13.8	14.3	0.32	0.26	0.22	41.528750	-73.945270	90.1598	Beacon
4	BELD	2023-10-17 17:30:00	9.9	9.7	81.6	0.0	0.00	0.0	1.5	2.7	...	12.1	12.6	13.3	0.52	0.44	0.40	42.223220	-75.668520	470.3700	Belden
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
121	WFMB	2023-10-17 17:30:00	9.0	8.7	76.4	0.0	0.38	0.0	0.8	2.4	...	10.6	13.7	12.3	0.28	0.25	0.28	44.393236	-73.858829	614.5990	Whiteface Mountain Base
122	WGAT	2023-10-17 17:30:00	10.4	9.7	74.2	0.0	0.00	0.0	1.6	3.3	...	10.9	11.0	11.9	0.15	0.26	0.09	43.532408	-75.158597	442.9660	Woodgate
123	WHIT	2023-10-17 17:30:00	13.8	13.6	69.1	0.0	0.00	0.0	0.9	1.8	...	12.4	13.1	14.1	0.47	0.51	0.49	43.485073	-73.423071	36.5638	Whitehall
124	WOLC	2023-10-17 17:30:00	11.7	11.5	82.9	0.0	2.08	0.0	1.6	2.8	...	13.7	13.0	13.7	0.17	0.02	0.07	43.228680	-76.842610	121.2190	Wolcott
125	YORK	2023-10-17 17:30:00	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	42.855040	-77.847760	177.9420	York

126 rows × 34 columns

Summary¶

Pandas includes functions to combine columns from multiple DataFrames.
Often, it is necessary to re-index the modified `DataFrame.
Modifed DataFrames are easily saved as csv files.

What’s Next?¶

In the next notebook, we will create multiple indexes on NYSM DataFrames.

Resources and References¶

Pandas User Guide

ATM433/533 Fall 2023

Pandas 5: Combining and Writing

Contents

Pandas 5: Combining and Writing¶

We’ll once again use NYS Mesonet data, but now we’ll use real-time info!¶

Overview¶

Prerequisites¶

Imports¶

Select the most recent time from a current NYSM dataset¶

As we have done before, we will employ a `lambda` function to interpret the data in the time column as Datetime objects, but keep the default row index.¶

Add columns from the NYSM Site Locations file to the new DataFrame.¶

Write out the new DataFrame to a csv file.¶

Summary¶

What’s Next?¶

Resources and References¶

ATM433/533 Fall 2023

Pandas 5: Combining and Writing

Contents

Pandas 5: Combining and Writing¶

We’ll once again use NYS Mesonet data, but now we’ll use real-time info!¶

Overview¶

Prerequisites¶

Imports¶

Select the most recent time from a current NYSM dataset¶

As we have done before, we will employ a lambda function to interpret the data in the time column as Datetime objects, but keep the default row index.¶

Add columns from the NYSM Site Locations file to the new DataFrame.¶

Write out the new DataFrame to a csv file.¶

Summary¶

What’s Next?¶

Resources and References¶

As we have done before, we will employ a `lambda` function to interpret the data in the time column as Datetime objects, but keep the default row index.¶