Pandas 5: Combining and Writing
Contents
Pandas 5: Combining and Writing¶
Previously, we worked with NYS Mesonet data during the period when the remnants of Hurricane Ida was impacting parts of the region. We needed to use site location data that existed in a separate file in order to geolocate NYSM sites and to reduce station pressure to sea-level. It would be more convenient if we could have all the necessary data in a single file.¶
We’ll once again use NYS Mesonet data, but now we’ll use real-time info!¶
Overview¶
Select the most recent time from a current NYSM file.
Add columns from the NYSM Site Locations file to the new DataFrame.
Write out the new DataFrame to a csv file.
Imports¶
import pandas as pd
from datetime import datetime
Select the most recent time from a current NYSM dataset¶
Open the file containing the most recent hour’s worth of NYSM data (12 5-minute timesteps)
# Use a Linux shell command to take a peek at the current CSV file
! head -5 /data1/nysm/latest.csv
station,time,temp_2m [degC],temp_9m [degC],relative_humidity [percent],precip_incremental [mm],precip_local [mm],precip_max_intensity [mm/min],avg_wind_speed_prop [m/s],max_wind_speed_prop [m/s],wind_speed_stddev_prop [m/s],wind_direction_prop [degrees],wind_direction_stddev_prop [degrees],avg_wind_speed_sonic [m/s],max_wind_speed_sonic [m/s],wind_speed_stddev_sonic [m/s],wind_direction_sonic [degrees],wind_direction_stddev_sonic [degrees],solar_insolation [W/m^2],station_pressure [mbar],snow_depth [cm],frozen_soil_05cm [bit],frozen_soil_25cm [bit],frozen_soil_50cm [bit],soil_temp_05cm [degC],soil_temp_25cm [degC],soil_temp_50cm [degC],soil_moisture_05cm [m^3/m^3],soil_moisture_25cm [m^3/m^3],soil_moisture_50cm [m^3/m^3]
ADDI,2023-10-17 16:30:00 UTC,10.2,9.4,82.0,0.00,0.27,0.00,3.0,4.3,0.6,350,22,3.3,5.0,0.6,348,16,809,955.63,,0,0,0,11.3,12.1,12.6,0.54,0.44,0.44
ADDI,2023-10-17 16:35:00 UTC,10.3,9.7,79.8,0.00,0.27,0.00,3.2,4.8,0.8,356,14,3.6,5.4,0.8,358,14,668,955.61,,0,0,0,11.3,12.1,12.6,0.54,0.44,0.44
ADDI,2023-10-17 16:40:00 UTC,10.0,9.6,78.4,0.00,0.27,0.00,2.9,4.2,0.6,341,20,3.1,4.5,0.6,343,19,450,955.60,,0,0,0,11.3,12.1,12.6,0.54,0.44,0.43
ADDI,2023-10-17 16:45:00 UTC,10.4,9.7,77.3,0.00,0.27,0.00,2.8,4.2,0.7,346,18,3.1,4.8,0.6,346,17,681,955.56,,0,0,0,11.3,12.1,12.6,0.54,0.44,0.44
As we have done before, we will employ a lambda
function to interpret the data in the time column as Datetime objects, but keep the default row index.¶
# First define the format and then define the function
timeFormat = "%Y-%m-%d %H:%M:%S UTC"
# This function will iterate over each string in a 1-d array
# and use Pandas' implementation of strptime to convert the string into a datetime object.
parseTime = lambda x: datetime.strptime(x, timeFormat)
nysm_data = pd.read_csv('/data1/nysm/latest.csv',parse_dates=['time'], date_parser=parseTime)
nysm_data
station | time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ADDI | 2023-10-17 16:30:00 | 10.2 | 9.4 | 82.0 | 0.0 | 0.27 | 0.0 | 3.0 | 4.3 | ... | NaN | 0.0 | 0.0 | 0.0 | 11.3 | 12.1 | 12.6 | 0.54 | 0.44 | 0.44 |
1 | ADDI | 2023-10-17 16:35:00 | 10.3 | 9.7 | 79.8 | 0.0 | 0.27 | 0.0 | 3.2 | 4.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 11.3 | 12.1 | 12.6 | 0.54 | 0.44 | 0.44 |
2 | ADDI | 2023-10-17 16:40:00 | 10.0 | 9.6 | 78.4 | 0.0 | 0.27 | 0.0 | 2.9 | 4.2 | ... | NaN | 0.0 | 0.0 | 0.0 | 11.3 | 12.1 | 12.6 | 0.54 | 0.44 | 0.43 |
3 | ADDI | 2023-10-17 16:45:00 | 10.4 | 9.7 | 77.3 | 0.0 | 0.27 | 0.0 | 2.8 | 4.2 | ... | NaN | 0.0 | 0.0 | 0.0 | 11.3 | 12.1 | 12.6 | 0.54 | 0.44 | 0.44 |
4 | ADDI | 2023-10-17 16:50:00 | 10.1 | 9.7 | 75.8 | 0.0 | 0.27 | 0.0 | 2.5 | 3.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 11.3 | 12.1 | 12.6 | 0.54 | 0.44 | 0.43 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1633 | YORK | 2023-10-17 17:10:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1634 | YORK | 2023-10-17 17:15:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1635 | YORK | 2023-10-17 17:20:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1636 | YORK | 2023-10-17 17:25:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1637 | YORK | 2023-10-17 17:30:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1638 rows × 30 columns
In order to select data corresponding to the most recent time, we treat “maximum” as “latest”.
most_recent = nysm_data.time.max()
most_recent
Timestamp('2023-10-17 17:30:00')
Set a condition on which we will select matching rows.
condition = nysm_data.time == most_recent
nysm_data[condition]
station | time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12 | ADDI | 2023-10-17 17:30:00 | 10.5 | 10.3 | 73.8 | 0.0 | 0.27 | 0.0 | 2.8 | 5.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 11.5 | 12.1 | 12.6 | 0.54 | 0.44 | 0.44 |
25 | ANDE | 2023-10-17 17:30:00 | 9.9 | 9.5 | 77.9 | 0.0 | 0.14 | 0.0 | 2.2 | 4.3 | ... | NaN | 0.0 | 0.0 | NaN | 12.3 | 12.6 | NaN | 0.32 | 0.16 | NaN |
38 | BATA | 2023-10-17 17:30:00 | 10.2 | 10.2 | 84.5 | 0.0 | 1.10 | 0.0 | 1.8 | 2.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 11.1 | 11.6 | 13.0 | 0.15 | 0.09 | 0.09 |
51 | BEAC | 2023-10-17 17:30:00 | 14.8 | 14.3 | 61.9 | 0.0 | 0.00 | 0.0 | 0.5 | 1.5 | ... | NaN | 0.0 | 0.0 | 0.0 | 13.5 | 13.8 | 14.3 | 0.32 | 0.26 | 0.22 |
64 | BELD | 2023-10-17 17:30:00 | 9.9 | 9.7 | 81.6 | 0.0 | 0.00 | 0.0 | 1.5 | 2.7 | ... | NaN | 0.0 | 0.0 | 0.0 | 12.1 | 12.6 | 13.3 | 0.52 | 0.44 | 0.40 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1585 | WFMB | 2023-10-17 17:30:00 | 9.0 | 8.7 | 76.4 | 0.0 | 0.38 | 0.0 | 0.8 | 2.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 10.6 | 13.7 | 12.3 | 0.28 | 0.25 | 0.28 |
1598 | WGAT | 2023-10-17 17:30:00 | 10.4 | 9.7 | 74.2 | 0.0 | 0.00 | 0.0 | 1.6 | 3.3 | ... | NaN | 0.0 | 0.0 | 0.0 | 10.9 | 11.0 | 11.9 | 0.15 | 0.26 | 0.09 |
1611 | WHIT | 2023-10-17 17:30:00 | 13.8 | 13.6 | 69.1 | 0.0 | 0.00 | 0.0 | 0.9 | 1.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 12.4 | 13.1 | 14.1 | 0.47 | 0.51 | 0.49 |
1624 | WOLC | 2023-10-17 17:30:00 | 11.7 | 11.5 | 82.9 | 0.0 | 2.08 | 0.0 | 1.6 | 2.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 13.7 | 13.0 | 13.7 | 0.17 | 0.02 | 0.07 |
1637 | YORK | 2023-10-17 17:30:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
126 rows × 30 columns
The above cell returns another DataFrame
. Note its row indices reflect their values in the original DataFrame
. We need to reset this index. For this, we’ll use Pandas reset_index
method.
nysm_data.reset_index?
# Uncomment the next line if you wish to get info on the `reset_index` method
# nysm_data.reset_index?
nysm_data[condition].reset_index()
index | station | time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 12 | ADDI | 2023-10-17 17:30:00 | 10.5 | 10.3 | 73.8 | 0.0 | 0.27 | 0.0 | 2.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 11.5 | 12.1 | 12.6 | 0.54 | 0.44 | 0.44 |
1 | 25 | ANDE | 2023-10-17 17:30:00 | 9.9 | 9.5 | 77.9 | 0.0 | 0.14 | 0.0 | 2.2 | ... | NaN | 0.0 | 0.0 | NaN | 12.3 | 12.6 | NaN | 0.32 | 0.16 | NaN |
2 | 38 | BATA | 2023-10-17 17:30:00 | 10.2 | 10.2 | 84.5 | 0.0 | 1.10 | 0.0 | 1.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 11.1 | 11.6 | 13.0 | 0.15 | 0.09 | 0.09 |
3 | 51 | BEAC | 2023-10-17 17:30:00 | 14.8 | 14.3 | 61.9 | 0.0 | 0.00 | 0.0 | 0.5 | ... | NaN | 0.0 | 0.0 | 0.0 | 13.5 | 13.8 | 14.3 | 0.32 | 0.26 | 0.22 |
4 | 64 | BELD | 2023-10-17 17:30:00 | 9.9 | 9.7 | 81.6 | 0.0 | 0.00 | 0.0 | 1.5 | ... | NaN | 0.0 | 0.0 | 0.0 | 12.1 | 12.6 | 13.3 | 0.52 | 0.44 | 0.40 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
121 | 1585 | WFMB | 2023-10-17 17:30:00 | 9.0 | 8.7 | 76.4 | 0.0 | 0.38 | 0.0 | 0.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 10.6 | 13.7 | 12.3 | 0.28 | 0.25 | 0.28 |
122 | 1598 | WGAT | 2023-10-17 17:30:00 | 10.4 | 9.7 | 74.2 | 0.0 | 0.00 | 0.0 | 1.6 | ... | NaN | 0.0 | 0.0 | 0.0 | 10.9 | 11.0 | 11.9 | 0.15 | 0.26 | 0.09 |
123 | 1611 | WHIT | 2023-10-17 17:30:00 | 13.8 | 13.6 | 69.1 | 0.0 | 0.00 | 0.0 | 0.9 | ... | NaN | 0.0 | 0.0 | 0.0 | 12.4 | 13.1 | 14.1 | 0.47 | 0.51 | 0.49 |
124 | 1624 | WOLC | 2023-10-17 17:30:00 | 11.7 | 11.5 | 82.9 | 0.0 | 2.08 | 0.0 | 1.6 | ... | NaN | 0.0 | 0.0 | 0.0 | 13.7 | 13.0 | 13.7 | 0.17 | 0.02 | 0.07 |
125 | 1637 | YORK | 2023-10-17 17:30:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
126 rows × 31 columns
Notice the second column, called index
? That contains the row numbers from the original nysm_data DataFrame
which we don’t want. Let’s repeat the reset_index
call, but also include the drop=True
argument to reset_index
.
nysm_data[condition].reset_index(drop=True)
station | time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | ... | snow_depth [cm] | frozen_soil_05cm [bit] | frozen_soil_25cm [bit] | frozen_soil_50cm [bit] | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ADDI | 2023-10-17 17:30:00 | 10.5 | 10.3 | 73.8 | 0.0 | 0.27 | 0.0 | 2.8 | 5.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 11.5 | 12.1 | 12.6 | 0.54 | 0.44 | 0.44 |
1 | ANDE | 2023-10-17 17:30:00 | 9.9 | 9.5 | 77.9 | 0.0 | 0.14 | 0.0 | 2.2 | 4.3 | ... | NaN | 0.0 | 0.0 | NaN | 12.3 | 12.6 | NaN | 0.32 | 0.16 | NaN |
2 | BATA | 2023-10-17 17:30:00 | 10.2 | 10.2 | 84.5 | 0.0 | 1.10 | 0.0 | 1.8 | 2.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 11.1 | 11.6 | 13.0 | 0.15 | 0.09 | 0.09 |
3 | BEAC | 2023-10-17 17:30:00 | 14.8 | 14.3 | 61.9 | 0.0 | 0.00 | 0.0 | 0.5 | 1.5 | ... | NaN | 0.0 | 0.0 | 0.0 | 13.5 | 13.8 | 14.3 | 0.32 | 0.26 | 0.22 |
4 | BELD | 2023-10-17 17:30:00 | 9.9 | 9.7 | 81.6 | 0.0 | 0.00 | 0.0 | 1.5 | 2.7 | ... | NaN | 0.0 | 0.0 | 0.0 | 12.1 | 12.6 | 13.3 | 0.52 | 0.44 | 0.40 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
121 | WFMB | 2023-10-17 17:30:00 | 9.0 | 8.7 | 76.4 | 0.0 | 0.38 | 0.0 | 0.8 | 2.4 | ... | NaN | 0.0 | 0.0 | 0.0 | 10.6 | 13.7 | 12.3 | 0.28 | 0.25 | 0.28 |
122 | WGAT | 2023-10-17 17:30:00 | 10.4 | 9.7 | 74.2 | 0.0 | 0.00 | 0.0 | 1.6 | 3.3 | ... | NaN | 0.0 | 0.0 | 0.0 | 10.9 | 11.0 | 11.9 | 0.15 | 0.26 | 0.09 |
123 | WHIT | 2023-10-17 17:30:00 | 13.8 | 13.6 | 69.1 | 0.0 | 0.00 | 0.0 | 0.9 | 1.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 12.4 | 13.1 | 14.1 | 0.47 | 0.51 | 0.49 |
124 | WOLC | 2023-10-17 17:30:00 | 11.7 | 11.5 | 82.9 | 0.0 | 2.08 | 0.0 | 1.6 | 2.8 | ... | NaN | 0.0 | 0.0 | 0.0 | 13.7 | 13.0 | 13.7 | 0.17 | 0.02 | 0.07 |
125 | YORK | 2023-10-17 17:30:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
126 rows × 30 columns
Now that we have established how to reset the row index, create a new DataFrame object.
df_latest = nysm_data[condition].reset_index(drop=True)
Add columns from the NYSM Site Locations file to the new DataFrame.¶
nysm_sites = pd.read_csv('/spare11/atm533/data/nysm.csv')
nysm_sites
stid | number | name | lat | lon | elevation | county | nearest_city | state | distance_from_town [km] | direction_from_town [degrees] | climate_division | climate_division_name | wfo | commissioned | decommissioned | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ADDI | 107 | Addison | 42.040360 | -77.237260 | 507.6140 | Steuben | Addison | NY | 6.9 | S | 1 | Western Plateau | BGM | 2016-08-10 18:15:00 UTC | NaN |
1 | ANDE | 111 | Andes | 42.182270 | -74.801390 | 518.2820 | Delaware | Andes | NY | 1.5 | WSW | 2 | Eastern Plateau | BGM | 2016-08-04 15:55:00 UTC | NaN |
2 | BATA | 24 | Batavia | 43.019940 | -78.135660 | 276.1200 | Genesee | Batavia | NY | 4.9 | ENE | 9 | Great Lakes | BUF | 2016-02-18 18:40:00 UTC | NaN |
3 | BEAC | 76 | Beacon | 41.528750 | -73.945270 | 90.1598 | Dutchess | Beacon | NY | 3.3 | NE | 5 | Hudson Valley | ALY | 2016-08-22 16:45:00 UTC | NaN |
4 | BELD | 90 | Belden | 42.223220 | -75.668520 | 470.3700 | Broome | Belden | NY | 2.2 | NNE | 2 | Eastern Plateau | BGM | 2015-11-30 20:20:00 UTC | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
121 | WFMB | 14 | Whiteface Mountain Base | 44.393236 | -73.858829 | 614.5990 | Essex | Wilmington | NY | 3.5 | W | 3 | Northern Plateau | BTV | 2016-01-29 20:55:00 UTC | NaN |
122 | WGAT | 123 | Woodgate | 43.532408 | -75.158597 | 442.9660 | Oneida | Woodgate | NY | 1.4 | NNW | 3 | Northern Plateau | BGM | 2016-08-29 18:20:00 UTC | NaN |
123 | WHIT | 10 | Whitehall | 43.485073 | -73.423071 | 36.5638 | Washington | Whitehall | NY | 8.0 | S | 7 | Champlain Valley | ALY | 2015-08-26 20:30:00 UTC | NaN |
124 | WOLC | 79 | Wolcott | 43.228680 | -76.842610 | 121.2190 | Wayne | Wolcott | NY | 2.4 | WNW | 9 | Great Lakes | BUF | 2016-03-09 18:10:00 UTC | NaN |
125 | YORK | 99 | York | 42.855040 | -77.847760 | 177.9420 | Livingston | York | NY | 3.6 | ESE | 10 | Central Lakes | BUF | 2016-08-09 17:55:00 UTC | NaN |
126 rows × 16 columns
To add a new column to a DataFrame
, just create a new object for it.
df_latest['lat'] = nysm_sites['lat']
df_latest['lon'] = nysm_sites['lon']
df_latest['elevation'] = nysm_sites['elevation']
df_latest['name'] = nysm_sites['name']
Let’s look at what we have now. If you scroll over to the right, you will see the four columns have been appended to the df_latest
DataFrame
.
df_latest
station | time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | ... | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | lat | lon | elevation | name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ADDI | 2023-10-17 17:30:00 | 10.5 | 10.3 | 73.8 | 0.0 | 0.27 | 0.0 | 2.8 | 5.0 | ... | 11.5 | 12.1 | 12.6 | 0.54 | 0.44 | 0.44 | 42.040360 | -77.237260 | 507.6140 | Addison |
1 | ANDE | 2023-10-17 17:30:00 | 9.9 | 9.5 | 77.9 | 0.0 | 0.14 | 0.0 | 2.2 | 4.3 | ... | 12.3 | 12.6 | NaN | 0.32 | 0.16 | NaN | 42.182270 | -74.801390 | 518.2820 | Andes |
2 | BATA | 2023-10-17 17:30:00 | 10.2 | 10.2 | 84.5 | 0.0 | 1.10 | 0.0 | 1.8 | 2.4 | ... | 11.1 | 11.6 | 13.0 | 0.15 | 0.09 | 0.09 | 43.019940 | -78.135660 | 276.1200 | Batavia |
3 | BEAC | 2023-10-17 17:30:00 | 14.8 | 14.3 | 61.9 | 0.0 | 0.00 | 0.0 | 0.5 | 1.5 | ... | 13.5 | 13.8 | 14.3 | 0.32 | 0.26 | 0.22 | 41.528750 | -73.945270 | 90.1598 | Beacon |
4 | BELD | 2023-10-17 17:30:00 | 9.9 | 9.7 | 81.6 | 0.0 | 0.00 | 0.0 | 1.5 | 2.7 | ... | 12.1 | 12.6 | 13.3 | 0.52 | 0.44 | 0.40 | 42.223220 | -75.668520 | 470.3700 | Belden |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
121 | WFMB | 2023-10-17 17:30:00 | 9.0 | 8.7 | 76.4 | 0.0 | 0.38 | 0.0 | 0.8 | 2.4 | ... | 10.6 | 13.7 | 12.3 | 0.28 | 0.25 | 0.28 | 44.393236 | -73.858829 | 614.5990 | Whiteface Mountain Base |
122 | WGAT | 2023-10-17 17:30:00 | 10.4 | 9.7 | 74.2 | 0.0 | 0.00 | 0.0 | 1.6 | 3.3 | ... | 10.9 | 11.0 | 11.9 | 0.15 | 0.26 | 0.09 | 43.532408 | -75.158597 | 442.9660 | Woodgate |
123 | WHIT | 2023-10-17 17:30:00 | 13.8 | 13.6 | 69.1 | 0.0 | 0.00 | 0.0 | 0.9 | 1.8 | ... | 12.4 | 13.1 | 14.1 | 0.47 | 0.51 | 0.49 | 43.485073 | -73.423071 | 36.5638 | Whitehall |
124 | WOLC | 2023-10-17 17:30:00 | 11.7 | 11.5 | 82.9 | 0.0 | 2.08 | 0.0 | 1.6 | 2.8 | ... | 13.7 | 13.0 | 13.7 | 0.17 | 0.02 | 0.07 | 43.228680 | -76.842610 | 121.2190 | Wolcott |
125 | YORK | 2023-10-17 17:30:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 42.855040 | -77.847760 | 177.9420 | York |
126 rows × 34 columns
Let’s look at all the column names.
df_latest.columns
Index(['station', 'time', 'temp_2m [degC]', 'temp_9m [degC]',
'relative_humidity [percent]', 'precip_incremental [mm]',
'precip_local [mm]', 'precip_max_intensity [mm/min]',
'avg_wind_speed_prop [m/s]', 'max_wind_speed_prop [m/s]',
'wind_speed_stddev_prop [m/s]', 'wind_direction_prop [degrees]',
'wind_direction_stddev_prop [degrees]', 'avg_wind_speed_sonic [m/s]',
'max_wind_speed_sonic [m/s]', 'wind_speed_stddev_sonic [m/s]',
'wind_direction_sonic [degrees]',
'wind_direction_stddev_sonic [degrees]', 'solar_insolation [W/m^2]',
'station_pressure [mbar]', 'snow_depth [cm]', 'frozen_soil_05cm [bit]',
'frozen_soil_25cm [bit]', 'frozen_soil_50cm [bit]',
'soil_temp_05cm [degC]', 'soil_temp_25cm [degC]',
'soil_temp_50cm [degC]', 'soil_moisture_05cm [m^3/m^3]',
'soil_moisture_25cm [m^3/m^3]', 'soil_moisture_50cm [m^3/m^3]', 'lat',
'lon', 'elevation', 'name'],
dtype='object')
Write out the new DataFrame to a csv file.¶
We’ll employ Pandas’ to_csv
method, and specify index=False
to avoid the creation of a new dummy column.
df_latest.to_csv('nysm_latest.csv',index=False)
Verify that we can read in the new csv file with Pandas
df = pd.read_csv('nysm_latest.csv')
df
station | time | temp_2m [degC] | temp_9m [degC] | relative_humidity [percent] | precip_incremental [mm] | precip_local [mm] | precip_max_intensity [mm/min] | avg_wind_speed_prop [m/s] | max_wind_speed_prop [m/s] | ... | soil_temp_05cm [degC] | soil_temp_25cm [degC] | soil_temp_50cm [degC] | soil_moisture_05cm [m^3/m^3] | soil_moisture_25cm [m^3/m^3] | soil_moisture_50cm [m^3/m^3] | lat | lon | elevation | name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ADDI | 2023-10-17 17:30:00 | 10.5 | 10.3 | 73.8 | 0.0 | 0.27 | 0.0 | 2.8 | 5.0 | ... | 11.5 | 12.1 | 12.6 | 0.54 | 0.44 | 0.44 | 42.040360 | -77.237260 | 507.6140 | Addison |
1 | ANDE | 2023-10-17 17:30:00 | 9.9 | 9.5 | 77.9 | 0.0 | 0.14 | 0.0 | 2.2 | 4.3 | ... | 12.3 | 12.6 | NaN | 0.32 | 0.16 | NaN | 42.182270 | -74.801390 | 518.2820 | Andes |
2 | BATA | 2023-10-17 17:30:00 | 10.2 | 10.2 | 84.5 | 0.0 | 1.10 | 0.0 | 1.8 | 2.4 | ... | 11.1 | 11.6 | 13.0 | 0.15 | 0.09 | 0.09 | 43.019940 | -78.135660 | 276.1200 | Batavia |
3 | BEAC | 2023-10-17 17:30:00 | 14.8 | 14.3 | 61.9 | 0.0 | 0.00 | 0.0 | 0.5 | 1.5 | ... | 13.5 | 13.8 | 14.3 | 0.32 | 0.26 | 0.22 | 41.528750 | -73.945270 | 90.1598 | Beacon |
4 | BELD | 2023-10-17 17:30:00 | 9.9 | 9.7 | 81.6 | 0.0 | 0.00 | 0.0 | 1.5 | 2.7 | ... | 12.1 | 12.6 | 13.3 | 0.52 | 0.44 | 0.40 | 42.223220 | -75.668520 | 470.3700 | Belden |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
121 | WFMB | 2023-10-17 17:30:00 | 9.0 | 8.7 | 76.4 | 0.0 | 0.38 | 0.0 | 0.8 | 2.4 | ... | 10.6 | 13.7 | 12.3 | 0.28 | 0.25 | 0.28 | 44.393236 | -73.858829 | 614.5990 | Whiteface Mountain Base |
122 | WGAT | 2023-10-17 17:30:00 | 10.4 | 9.7 | 74.2 | 0.0 | 0.00 | 0.0 | 1.6 | 3.3 | ... | 10.9 | 11.0 | 11.9 | 0.15 | 0.26 | 0.09 | 43.532408 | -75.158597 | 442.9660 | Woodgate |
123 | WHIT | 2023-10-17 17:30:00 | 13.8 | 13.6 | 69.1 | 0.0 | 0.00 | 0.0 | 0.9 | 1.8 | ... | 12.4 | 13.1 | 14.1 | 0.47 | 0.51 | 0.49 | 43.485073 | -73.423071 | 36.5638 | Whitehall |
124 | WOLC | 2023-10-17 17:30:00 | 11.7 | 11.5 | 82.9 | 0.0 | 2.08 | 0.0 | 1.6 | 2.8 | ... | 13.7 | 13.0 | 13.7 | 0.17 | 0.02 | 0.07 | 43.228680 | -76.842610 | 121.2190 | Wolcott |
125 | YORK | 2023-10-17 17:30:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 42.855040 | -77.847760 | 177.9420 | York |
126 rows × 34 columns
Summary¶
Pandas includes functions to combine columns from multiple
DataFrame
s.Often, it is necessary to re-index the modified `DataFrame.
Modifed
DataFrame
s are easily saved as csv files.
What’s Next?¶
In the next notebook, we will create multiple indexes on NYSM DataFrame
s.