Pandas Notebook 2, ATM350 Spring 2023 ¶

Motivating Science Questions:¶

What was the daily temperature and precipitation at Albany last year?
What were the the days with the most precipitation?

Motivating Technical Question:¶

How can we use Pandas to do some basic statistical analyses of our data?

We’ll start by repeating some of the same steps we did in the first Pandas notebook.¶

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()

file = '/spare11/atm350/common/data/climo_alb_2022.csv'

Display the first five lines of this file using Python’s built-in readline function

fileObj = open(file)
nLines = 5
for n in range(nLines):
    line = fileObj.readline()
    print(line)

DATE,MAX,MIN,AVG,DEP,HDD,CDD,PCP,SNW,DPT

2022-01-01,51,41,46.0,19.7,19,0,0.12,0.0,0

2022-01-02,49,23,36.0,9.9,29,0,0.07,0.2,0

2022-01-03,23,13,18.0,-7.9,47,0,T,T,T

2022-01-04,29,10,19.5,-6.2,45,0,T,0.1,T

df = pd.read_csv(file, dtype='string')

nRows = df.shape[0]
print ("Number of rows = %d" % nRows )
nCols = df.shape[1]
print ("Number of columns = %d" % nCols)

date = df['DATE']
date = pd.to_datetime(date,format="%Y-%m-%d")

maxT = df['MAX'].astype("float32")
minT = df['MIN'].astype("float32")

Number of rows = 365
Number of columns = 10

Let’s generate the final timeseries we made in our first Pandas notebook, with all the “bells and whistles” included.¶

from matplotlib.dates import DateFormatter, AutoDateLocator,HourLocator,DayLocator,MonthLocator

Set the year so we don’t have to edit the string labels every year!

year = 2022

fig, ax = plt.subplots(figsize=(15,10))
ax.plot (date, maxT, color='red',label = "Max T")
ax.plot (date, minT, color='blue', label = "Min T")
ax.set_title ("ALB Year %d" % year)
ax.set_xlabel('Date')
ax.set_ylabel('Temperature ($^\circ$F)' )
ax.xaxis.set_major_locator(MonthLocator(interval=1))
dateFmt = DateFormatter('%b %d')
ax.xaxis.set_major_formatter(dateFmt)
ax.legend (loc="best")

<matplotlib.legend.Legend at 0x15139c245db0>

Read in precip data. This will be more challenging due to the presence of T(races).¶

Let’s remind ourselves what the Dataframe looks like, paying particular attention to the daily precip column (PCP).

df

	DATE	MAX	MIN	AVG	DEP	HDD	CDD	PCP	SNW	DPT
0	2022-01-01	51	41	46.0	19.7	19	0	0.12	0.0	0
1	2022-01-02	49	23	36.0	9.9	29	0	0.07	0.2	0
2	2022-01-03	23	13	18.0	-7.9	47	0	T	T	T
3	2022-01-04	29	10	19.5	-6.2	45	0	T	0.1	T
4	2022-01-05	38	28	33.0	7.5	32	0	0.00	0.0	T
...	...	...	...	...	...	...	...	...	...	...
360	2022-12-27	34	22	28.0	0.6	37	0	0.00	0.0	T
361	2022-12-28	41	22	31.5	4.3	33	0	0.00	0.0	T
362	2022-12-29	48	22	35.0	8.1	30	0	0.00	0.0	T
363	2022-12-30	57	43	50.0	23.3	15	0	0.00	0.0	0
364	2022-12-31	53	44	48.5	22.0	16	0	0.08	0.0	0

365 rows × 10 columns

Exercise: define a Pandas DataSeries called precip and populate it with the requisite column from our Dataframe. Then print out its values.

TIP: After you have tried on your own, you can uncomment the first line of the cell below and re-run to load the solution.

# %load /spare11/atm350/common/feb23/02a.py

The task now is to convert these values from strings to floating point values. Our task is more complicated due to the presence of strings that are clearly not numerical … such as “T” for trace.¶

As we did in the first Pandas notebook with max temperatures greater than or equal to 90, create a subset of our `Dataframe` that consists only of those days where precip was a trace.¶

traceDays = df[precip=='T']
traceDays

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 traceDays = df[precip=='T']
      2 traceDays

NameError: name 'precip' is not defined

traceDays.shape

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 traceDays.shape

NameError: name 'traceDays' is not defined

traceDays.shape[0]

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [13], in <cell line: 1>()
----> 1 traceDays.shape[0]

NameError: name 'traceDays' is not defined

Exercise: print out the total # of days where a trace of precip was measured. Hint: look back at how we calculated the total # of 90 degree days in our first Pandas notebook ... we used the shape attribute.

# %load /spare11/atm350/common/feb23/02b.py

Getting back to our task of converting precip amounts from strings to floating point numbers, one thing we could do is to create a new array and populate it via a loop, where we’d use an `if-else` logical test to check for Trace values and set the precip value to 0.00 for each day accordingly.¶

There is a more efficient way to do this, though!¶

See https://stackoverflow.com/questions/49154068/fixing-a-typeerror-when-using-pandas-after-replacing-a-string-with-a-floating-po?rq=1

We use the `loc` method of Pandas to find all elements of a DataSeries with a certain value, and then change that value to something else, all in the same line of code!¶

In this case, let’s set all values of ‘T’ to ‘0.00’¶

The line below is what we want! Before we execute it, let’s break it up into pieces.

df.loc[df['PCP'] =='T', ['PCP']] = '0.00'

First, create a `Series` of booleans corresponding to the specified condition.¶

df['PCP'] == 'T'

    False
    False
     True
     True
    False
       ...  
  False
  False
  False
  False
  False
Name: PCP, Length: 365, dtype: boolean

Next, build on that cell by using `loc` to display all rows that correspond to the condition being True.¶

df.loc[df['PCP'] == 'T']

	DATE	MAX	MIN	AVG	DEP	HDD	CDD	PCP	SNW	DPT
2	2022-01-03	23	13	18.0	-7.9	47	0	T	T	T
3	2022-01-04	29	10	19.5	-6.2	45	0	T	0.1	T
9	2022-01-10	32	16	24.0	-0.6	41	0	T	T	T
13	2022-01-14	32	6	19.0	-5.1	46	0	T	T	T
17	2022-01-18	27	7	17.0	-6.8	48	0	T	T	3
19	2022-01-20	38	7	22.5	-1.2	42	0	T	T	2
24	2022-01-25	32	18	25.0	1.4	40	0	T	0.1	3
27	2022-01-28	30	11	20.5	-3.2	44	0	T	0.1	2
38	2022-02-08	37	28	32.5	7.4	32	0	T	T	2
41	2022-02-11	51	27	39.0	13.3	26	0	T	T	2
42	2022-02-12	51	28	39.5	13.5	25	0	T	0.0	2
43	2022-02-13	27	15	21.0	-5.2	44	0	T	T	1
44	2022-02-14	18	6	12.0	-14.4	53	0	T	0.1	T
45	2022-02-15	29	8	18.5	-8.2	46	0	T	T	T
53	2022-02-23	57	22	39.5	10.6	25	0	T	T	0
57	2022-02-27	37	24	30.5	0.4	34	0	T	0.1	6
64	2022-03-06	64	38	51.0	18.7	14	0	T	0.0	2
71	2022-03-13	34	18	26.0	-8.5	39	0	T	T	4
74	2022-03-16	60	32	46.0	10.4	19	0	T	0.0	0
85	2022-03-27	41	22	31.5	-8.1	33	0	T	T	0
86	2022-03-28	27	18	22.5	-17.5	42	0	T	T	0
95	2022-04-06	59	45	52.0	8.2	13	0	T	0.0	0
99	2022-04-10	49	36	42.5	-3.1	22	0	T	0.0	0
102	2022-04-13	77	41	59.0	12.1	6	0	T	0.0	0
104	2022-04-15	68	35	51.5	3.6	13	0	T	0.0	0
106	2022-04-17	48	34	41.0	-7.8	24	0	T	T	0
109	2022-04-20	52	32	42.0	-8.1	23	0	T	0.0	0
110	2022-04-21	60	29	44.5	-6.1	20	0	T	0.0	0
112	2022-04-23	57	34	45.5	-5.9	19	0	T	0.0	0
113	2022-04-24	72	45	58.5	6.6	6	0	T	0.0	0
153	2022-06-03	78	58	68.0	3.1	0	3	T	0.0	0
166	2022-06-16	75	66	70.5	1.8	0	6	T	0.0	0
192	2022-07-12	91	70	80.5	7.2	0	16	T	0.0	0
193	2022-07-13	87	62	74.5	1.1	0	10	T	0.0	0
196	2022-07-16	88	60	74.0	0.5	0	9	T	0.0	0
208	2022-07-28	92	67	79.5	6.3	0	15	T	0.0	0
216	2022-08-05	93	71	82.0	9.3	0	17	T	0.0	0
227	2022-08-16	87	58	72.5	0.9	0	8	T	0.0	0
233	2022-08-22	81	70	75.5	4.8	0	11	T	0.0	0
242	2022-08-31	79	66	72.5	3.7	0	8	T	0.0	0
254	2022-09-12	80	65	72.5	7.4	0	8	T	0.0	0
262	2022-09-20	75	59	67.0	5.1	0	2	T	0.0	0
267	2022-09-25	66	49	57.5	-2.3	7	0	T	0.0	0
269	2022-09-27	68	50	59.0	0.1	6	0	T	0.0	0
291	2022-10-19	53	33	43.0	-7.1	22	0	T	0.0	0
303	2022-10-31	64	37	50.5	4.6	14	0	T	0.0	0
316	2022-11-13	49	36	42.5	1.1	22	0	T	0.0	0
321	2022-11-18	43	28	35.5	-4.1	29	0	T	T	T
324	2022-11-21	38	19	28.5	-10.0	36	0	T	T	T
331	2022-11-28	50	32	41.0	5.0	24	0	T	0.0	0
335	2022-12-02	41	31	36.0	1.4	29	0	T	0.0	0
347	2022-12-14	31	17	24.0	-6.9	41	0	T	T	3
352	2022-12-19	36	27	31.5	2.0	33	0	T	T	4
357	2022-12-24	15	7	11.0	-17.1	54	0	T	0.0	1

Further build this line of code by only returning the column of interest.¶

df.loc[df['PCP'] =='T', ['PCP']]

	PCP
2	T
3	T
9	T
13	T
17	T
19	T
24	T
27	T
38	T
41	T
42	T
43	T
44	T
45	T
53	T
57	T
64	T
71	T
74	T
85	T
86	T
95	T
99	T
102	T
104	T
106	T
109	T
110	T
112	T
113	T
153	T
166	T
192	T
193	T
196	T
208	T
216	T
227	T
233	T
242	T
254	T
262	T
267	T
269	T
291	T
303	T
316	T
321	T
324	T
331	T
335	T
347	T
352	T
357	T

Finally, we have arrived at the full line of code! Take the column of interest, in this case precip only on those days where a trace was measured, and set its value to 0.00.¶

df.loc[df['PCP'] =='T', ['PCP']] = '0.00'

df['PCP']

    0.12
    0.07
    0.00
    0.00
    0.00
       ... 
  0.00
  0.00
  0.00
  0.00
  0.08
Name: PCP, Length: 365, dtype: string

This operation actually modifies the `Dataframe` in place . We can prove this by printing out a row from a date that we know had a trace amount.¶

But first how do we simply print a specific row from a dataframe? Since we know that Jan. 3 had a trace of precip, try this:¶

jan03 = df['DATE'] == '2022-01-03'
jan03

    False
    False
     True
    False
    False
       ...  
  False
  False
  False
  False
  False
Name: DATE, Length: 365, dtype: boolean

That produces a series of booleans; the one matching our condition is `True`. Now we can retrieve all the values for this date.¶

df[jan03]

	DATE	MAX	MIN	AVG	DEP	HDD	CDD	PCP	SNW	DPT
2	2022-01-03	23	13	18.0	-7.9	47	0	0.00	T	T

We see that the precip has now been set to 0.00.¶

Having done this check, and thus re-set the values, let’s now convert this series into floating point values.¶

precip = df['PCP'].astype("float32")

precip

    0.12
    0.07
    0.00
    0.00
    0.00
       ... 
  0.00
  0.00
  0.00
  0.00
  0.08
Name: PCP, Length: 365, dtype: float32

Plot each day’s precip total.¶

fig, ax = plt.subplots(figsize=(15,10))
ax.plot (date, precip, color='blue', marker='+',label = "Precip")
ax.set_title ("ALB Year  %d" % year)
ax.set_xlabel('Date')
ax.set_ylabel('Precip (in.)' )
ax.xaxis.set_major_locator(MonthLocator(interval=1))
dateFmt = DateFormatter('%b %d')
ax.xaxis.set_major_formatter(dateFmt)
ax.legend (loc="best")

<matplotlib.legend.Legend at 0x15139bcabfa0>

What if we just want to pick a certain time range? One simple way is to just pass in a subset of our x and y to the `plot` method.¶

# Plot out just the trace for October. Corresponds to Julian days 214-245 ... thus, indices 213-244 (why?).
fig, ax = plt.subplots(figsize=(15,10))
ax.plot (date[213:244], precip[213:244], color='blue', marker='+',label = "Precip")
ax.set_title ("ALB Year  %d" % year)
ax.set_xlabel('Date')
ax.set_ylabel('Precip (in.)' )
ax.xaxis.set_major_locator(MonthLocator(interval=1))
dateFmt = DateFormatter('%b %d')
ax.xaxis.set_major_formatter(dateFmt)
ax.legend (loc="best")

<matplotlib.legend.Legend at 0x15139bd82f80>

Exercise: print out a table of days with precip amounts of at least 1.00 inches. In a separate cell, print out the total # of such days.

# %load '/spare11/atm350/common/feb23/02c.py'

Pandas has a function to compute the cumulative sum of a series. We’ll use it to compute and graph Albany’s total precip over the year.¶

precipTotal = precip.cumsum()

precipTotal

     0.120000
     0.190000
     0.190000
     0.190000
     0.190000
         ...    
  37.219997
  37.219997
  37.219997
  37.219997
  37.299999
Name: PCP, Length: 365, dtype: float32

We can see that the final total is in the last element of the precipTotal array. How can we explicitly print out just this value?¶

One of the methods available to us in a Pandas DataSeries is `values`. Let’s display it:¶

precipTotal.values

array([ 0.12     ,  0.19     ,  0.19     ,  0.19     ,  0.19     ,
        0.2      ,  0.28     ,  0.28     ,  0.5      ,  0.5      ,
        0.5      ,  0.5      ,  0.5      ,  0.5      ,  0.5      ,
        0.58     ,  1.16     ,  1.16     ,  1.16     ,  1.16     ,
        1.16     ,  1.16     ,  1.17     ,  1.1899999,  1.1899999,
        1.1899999,  1.1899999,  1.1899999,  1.28     ,  1.28     ,
        1.28     ,  1.28     ,  1.28     ,  2.11     ,  2.86     ,
        2.8799999,  2.8799999,  2.8899999,  2.8899999,  2.8899999,
        2.9099998,  2.9099998,  2.9099998,  2.9099998,  2.9099998,
        2.9099998,  2.9099998,  3.0299997,  3.1399996,  3.1699996,
        3.1699996,  3.1699996,  3.2799995,  3.2799995,  3.2799995,
        4.049999 ,  4.049999 ,  4.049999 ,  4.049999 ,  4.0599995,
        4.0599995,  4.0899997,  4.0899997,  4.0899997,  4.0899997,
        4.6699996,  4.6699996,  4.9599996,  4.9599996,  4.9599996,
        5.2899995,  5.2899995,  5.2899995,  5.3199997,  5.3199997,
        5.3599997,  5.3599997,  5.64     ,  5.68     ,  5.68     ,
        5.68     ,  5.7599998,  6.08     ,  6.38     ,  6.3900003,
        6.3900003,  6.3900003,  6.3900003,  6.3900003,  7.4000006,
        7.5900006,  7.5900006,  7.700001 ,  7.710001 ,  7.710001 ,
        7.710001 ,  9.740001 , 10.010001 , 10.4400015, 10.4400015,
       10.450002 , 10.570002 , 10.570002 , 10.580002 , 10.580002 ,
       11.020001 , 11.020001 , 11.310001 , 12.180001 , 12.180001 ,
       12.180001 , 12.180001 , 12.180001 , 12.180001 , 12.180001 ,
       12.380001 , 12.400002 , 12.400002 , 12.400002 , 12.400002 ,
       12.400002 , 12.560001 , 12.570002 , 12.9400015, 12.9400015,
       12.9400015, 12.9400015, 12.9400015, 12.9400015, 12.9400015,
       12.9400015, 12.9400015, 12.9400015, 13.010001 , 13.290001 ,
       13.780001 , 13.81     , 13.81     , 14.120001 , 14.120001 ,
       14.120001 , 14.210001 , 14.210001 , 14.210001 , 14.210001 ,
       14.210001 , 14.210001 , 14.250001 , 14.250001 , 14.250001 ,
       14.250001 , 14.720001 , 14.720001 , 14.720001 , 14.720001 ,
       14.720001 , 14.720001 , 14.800001 , 15.060001 , 15.810001 ,
       15.810001 , 15.810001 , 15.820002 , 15.820002 , 15.820002 ,
       15.820002 , 15.820002 , 15.820002 , 15.860002 , 15.860002 ,
       15.860002 , 15.910002 , 15.910002 , 15.930002 , 15.930002 ,
       15.930002 , 15.930002 , 16.230001 , 16.230001 , 16.230001 ,
       16.230001 , 16.36     , 16.43     , 16.43     , 16.43     ,
       16.78     , 16.800001 , 16.800001 , 16.800001 , 16.800001 ,
       16.800001 , 16.800001 , 16.800001 , 16.800001 , 16.800001 ,
       16.800001 , 16.800001 , 16.800001 , 16.960001 , 16.960001 ,
       16.990002 , 16.990002 , 17.000002 , 17.000002 , 17.070002 ,
       17.670002 , 17.670002 , 17.670002 , 17.670002 , 17.680002 ,
       17.680002 , 17.680002 , 17.680002 , 17.680002 , 17.680002 ,
       18.420002 , 18.420002 , 18.930002 , 19.330002 , 19.410002 ,
       19.480001 , 19.480001 , 19.480001 , 19.480001 , 19.480001 ,
       19.480001 , 19.480001 , 19.480001 , 20.240002 , 20.240002 ,
       20.240002 , 20.240002 , 20.240002 , 20.240002 , 20.710001 ,
       20.710001 , 20.730001 , 20.760002 , 20.760002 , 20.760002 ,
       20.760002 , 21.520002 , 21.520002 , 21.530003 , 21.530003 ,
       21.530003 , 21.540003 , 22.870003 , 23.670002 , 23.670002 ,
       23.670002 , 23.670002 , 23.670002 , 23.690002 , 23.690002 ,
       25.260002 , 25.260002 , 25.260002 , 25.260002 , 25.260002 ,
       25.700003 , 26.040003 , 26.040003 , 26.040003 , 26.680002 ,
       26.680002 , 26.680002 , 26.680002 , 26.910002 , 26.910002 ,
       26.970001 , 26.970001 , 26.970001 , 26.970001 , 26.970001 ,
       26.970001 , 27.000002 , 27.430002 , 27.430002 , 27.450003 ,
       27.450003 , 27.450003 , 27.480003 , 27.480003 , 27.480003 ,
       28.690002 , 29.010002 , 29.010002 , 29.010002 , 29.460003 ,
       29.460003 , 29.460003 , 29.460003 , 29.460003 , 29.460003 ,
       29.460003 , 29.870003 , 30.120003 , 30.120003 , 30.120003 ,
       30.120003 , 30.120003 , 30.120003 , 30.120003 , 30.130003 ,
       30.130003 , 30.130003 , 30.130003 , 30.130003 , 30.130003 ,
       30.380003 , 30.380003 , 30.380003 , 30.380003 , 30.920004 ,
       31.580004 , 31.580004 , 31.580004 , 31.760004 , 32.110004 ,
       32.140003 , 32.140003 , 32.140003 , 32.15     , 32.15     ,
       32.15     , 32.15     , 32.15     , 32.18     , 32.18     ,
       33.16     , 33.16     , 33.16     , 33.35     , 33.35     ,
       33.35     , 33.44     , 33.44     , 33.44     , 33.609997 ,
       33.739998 , 33.739998 , 33.739998 , 33.739998 , 34.14     ,
       34.149998 , 34.149998 , 34.149998 , 34.309998 , 35.339996 ,
       35.449997 , 35.449997 , 35.449997 , 35.449997 , 35.449997 ,
       35.629997 , 37.219997 , 37.219997 , 37.219997 , 37.219997 ,
       37.219997 , 37.219997 , 37.219997 , 37.219997 , 37.3      ],
      dtype=float32)

Exercise: It's an array! So, let's print out the last element of the array. What index # can we use?

# %load '/spare11/atm350/common/feb23/02d.py'

Plot the timeseries of the cumulative precip for Albany over the year.¶

fig, ax = plt.subplots(figsize=(15,10))
ax.plot (date, precipTotal, color='blue', marker='.',label = "Precip")
ax.set_title ("ALB Year  %d" % year)
ax.set_xlabel('Date')
ax.set_ylabel('Precip (in.)' )
ax.xaxis.set_major_locator(MonthLocator(interval=1))
dateFmt = DateFormatter('%b %d')
ax.xaxis.set_major_formatter(dateFmt)
ax.legend (loc="best")

<matplotlib.legend.Legend at 0x15139324fd00>

ATM350 Spring 2023

Pandas Notebook 2, ATM350 Spring 2023

Contents

Pandas Notebook 2, ATM350 Spring 2023 ¶

Motivating Science Questions:¶

Motivating Technical Question:¶

We’ll start by repeating some of the same steps we did in the first Pandas notebook.¶

Let’s generate the final timeseries we made in our first Pandas notebook, with all the “bells and whistles” included.¶

Read in precip data. This will be more challenging due to the presence of T(races).¶

The task now is to convert these values from strings to floating point values. Our task is more complicated due to the presence of strings that are clearly not numerical … such as “T” for trace.¶

As we did in the first Pandas notebook with max temperatures greater than or equal to 90, create a subset of our `Dataframe` that consists only of those days where precip was a trace.¶

Getting back to our task of converting precip amounts from strings to floating point numbers, one thing we could do is to create a new array and populate it via a loop, where we’d use an `if-else` logical test to check for Trace values and set the precip value to 0.00 for each day accordingly.¶

There is a more efficient way to do this, though!¶

We use the `loc` method of Pandas to find all elements of a DataSeries with a certain value, and then change that value to something else, all in the same line of code!¶

In this case, let’s set all values of ‘T’ to ‘0.00’¶

First, create a `Series` of booleans corresponding to the specified condition.¶

Next, build on that cell by using `loc` to display all rows that correspond to the condition being True.¶

Further build this line of code by only returning the column of interest.¶

Finally, we have arrived at the full line of code! Take the column of interest, in this case precip only on those days where a trace was measured, and set its value to 0.00.¶

This operation actually modifies the `Dataframe` in place . We can prove this by printing out a row from a date that we know had a trace amount.¶

But first how do we simply print a specific row from a dataframe? Since we know that Jan. 3 had a trace of precip, try this:¶

That produces a series of booleans; the one matching our condition is `True`. Now we can retrieve all the values for this date.¶

We see that the precip has now been set to 0.00.¶

Having done this check, and thus re-set the values, let’s now convert this series into floating point values.¶

Plot each day’s precip total.¶

What if we just want to pick a certain time range? One simple way is to just pass in a subset of our x and y to the `plot` method.¶

Pandas has a function to compute the cumulative sum of a series. We’ll use it to compute and graph Albany’s total precip over the year.¶

We can see that the final total is in the last element of the precipTotal array. How can we explicitly print out just this value?¶

One of the methods available to us in a Pandas DataSeries is `values`. Let’s display it:¶

Plot the timeseries of the cumulative precip for Albany over the year.¶

Pandas has a plethora of statistical analysis methods to apply on tabular data. An excellent summary method is `describe`.¶

We’ll wrap up by calculating and then plotting rolling means over a period of days in the year, in order to smooth out the day-to-day variations.¶

	PCP
2	T
3	T
9	T
13	T
17	T
19	T
24	T
27	T
38	T
41	T
42	T
43	T
44	T
45	T
53	T
57	T
64	T
71	T
74	T
85	T
86	T
95	T
99	T
102	T
104	T
106	T
109	T
110	T
112	T
113	T
153	T
166	T
192	T
193	T
196	T
208	T
216	T
227	T
233	T
242	T
254	T
262	T
267	T
269	T
291	T
303	T
316	T
321	T
324	T
331	T
335	T
347	T
352	T
357	T

	PCP
2	T
3	T
9	T
13	T
17	T
19	T
24	T
27	T
38	T
41	T
42	T
43	T
44	T
45	T
53	T
57	T
64	T
71	T
74	T
85	T
86	T
95	T
99	T
102	T
104	T
106	T
109	T
110	T
112	T
113	T
153	T
166	T
192	T
193	T
196	T
208	T
216	T
227	T
233	T
242	T
254	T
262	T
267	T
269	T
291	T
303	T
316	T
321	T
324	T
331	T
335	T
347	T
352	T
357	T

ATM350 Spring 2023

Pandas Notebook 2, ATM350 Spring 2023

Contents

Pandas Notebook 2, ATM350 Spring 2023 ¶

Motivating Science Questions:¶

Motivating Technical Question:¶

We’ll start by repeating some of the same steps we did in the first Pandas notebook.¶

Let’s generate the final timeseries we made in our first Pandas notebook, with all the “bells and whistles” included.¶

Read in precip data. This will be more challenging due to the presence of T(races).¶

The task now is to convert these values from strings to floating point values. Our task is more complicated due to the presence of strings that are clearly not numerical … such as “T” for trace.¶

As we did in the first Pandas notebook with max temperatures greater than or equal to 90, create a subset of our Dataframe that consists only of those days where precip was a trace.¶

Getting back to our task of converting precip amounts from strings to floating point numbers, one thing we could do is to create a new array and populate it via a loop, where we’d use an if-else logical test to check for Trace values and set the precip value to 0.00 for each day accordingly.¶

There is a more efficient way to do this, though!¶

We use the loc method of Pandas to find all elements of a DataSeries with a certain value, and then change that value to something else, all in the same line of code!¶

In this case, let’s set all values of ‘T’ to ‘0.00’¶

First, create a Series of booleans corresponding to the specified condition.¶

Next, build on that cell by using loc to display all rows that correspond to the condition being True.¶

Further build this line of code by only returning the column of interest.¶

Finally, we have arrived at the full line of code! Take the column of interest, in this case precip only on those days where a trace was measured, and set its value to 0.00.¶

This operation actually modifies the Dataframe in place . We can prove this by printing out a row from a date that we know had a trace amount.¶

But first how do we simply print a specific row from a dataframe? Since we know that Jan. 3 had a trace of precip, try this:¶

That produces a series of booleans; the one matching our condition is True. Now we can retrieve all the values for this date.¶

We see that the precip has now been set to 0.00.¶

Having done this check, and thus re-set the values, let’s now convert this series into floating point values.¶

Plot each day’s precip total.¶

What if we just want to pick a certain time range? One simple way is to just pass in a subset of our x and y to the plot method.¶

Pandas has a function to compute the cumulative sum of a series. We’ll use it to compute and graph Albany’s total precip over the year.¶

We can see that the final total is in the last element of the precipTotal array. How can we explicitly print out just this value?¶

One of the methods available to us in a Pandas DataSeries is values. Let’s display it:¶

Plot the timeseries of the cumulative precip for Albany over the year.¶

Pandas has a plethora of statistical analysis methods to apply on tabular data. An excellent summary method is describe.¶

We’ll wrap up by calculating and then plotting rolling means over a period of days in the year, in order to smooth out the day-to-day variations.¶

As we did in the first Pandas notebook with max temperatures greater than or equal to 90, create a subset of our `Dataframe` that consists only of those days where precip was a trace.¶

Getting back to our task of converting precip amounts from strings to floating point numbers, one thing we could do is to create a new array and populate it via a loop, where we’d use an `if-else` logical test to check for Trace values and set the precip value to 0.00 for each day accordingly.¶

We use the `loc` method of Pandas to find all elements of a DataSeries with a certain value, and then change that value to something else, all in the same line of code!¶

First, create a `Series` of booleans corresponding to the specified condition.¶

Next, build on that cell by using `loc` to display all rows that correspond to the condition being True.¶

This operation actually modifies the `Dataframe` in place . We can prove this by printing out a row from a date that we know had a trace amount.¶

That produces a series of booleans; the one matching our condition is `True`. Now we can retrieve all the values for this date.¶

What if we just want to pick a certain time range? One simple way is to just pass in a subset of our x and y to the `plot` method.¶

One of the methods available to us in a Pandas DataSeries is `values`. Let’s display it:¶

Pandas has a plethora of statistical analysis methods to apply on tabular data. An excellent summary method is `describe`.¶

	PCP
2	T
3	T
9	T
13	T
17	T
19	T
24	T
27	T
38	T
41	T
42	T
43	T
44	T
45	T
53	T
57	T
64	T
71	T
74	T
85	T
86	T
95	T
99	T
102	T
104	T
106	T
109	T
110	T
112	T
113	T
153	T
166	T
192	T
193	T
196	T
208	T
216	T
227	T
233	T
242	T
254	T
262	T
267	T
269	T
291	T
303	T
316	T
321	T
324	T
331	T
335	T
347	T
352	T
357	T