Pandas Notebook 1, ATM350 Spring 2025#
Here, we read in a text file that has climatological data compiled at the National Weather Service in Albany NY for 2024, previously downloaded and reformatted from the xmACIS2 climate data portal.
We will use the Pandas library to read and analyze the data. We will also use the Matplotlib package to visualize it.
Motivating Science Questions:#
How can we analyze and display tabular climate data for a site?
What was the yearly trace of max/min temperatures for Albany, NY last year?
What was the most common 10-degree maximum temperature range for Albany, NY last year?
# import Pandas and Numpy, and use their conventional two-letter abbreviations when we
# use methods from these packages. Also, import matplotlib's plotting package, using its
# standard abbreviation.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Specify the location of the file that contains the climo data. Use the linux ls command to verify it exists.#
Note that in a Jupyter notebook, we can simply use the ! directive to “call” a Linux command.#
Also notice how we refer to a Python variable name when passing it to a Linux command line in this way … we enclose it in braces!#
year = 2024
file = f'/spare11/atm350/common/data/climo_alb_{year}.csv'
! ls -l {file}
-rw-r--r-- 1 ktyle faculty 15184 Mar 5 19:20 /spare11/atm350/common/data/climo_alb_2024.csv
Use pandas’ read_csv
method to open the file. Specify that the data is to be read in as strings (not integers nor floating points).#
Once this call succeeds, it returns a Pandas Dataframe object which we reference as df
#
df = pd.read_csv(file, dtype='string')
By simply typing the name of the dataframe object, we can get some of its contents to be “pretty-printed” to the notebook!#
df
DATE | MAX | MIN | AVG | DEP | PCP | SNW | DPTH | HDD | CDD | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2024-01-01 | 36 | 25 | 30.5 | 4.2 | 0.00 | 0.0 | 0 | 34 | 0 |
1 | 2024-01-02 | 38 | 24 | 31.0 | 4.9 | 0.00 | 0.0 | 0 | 34 | 0 |
2 | 2024-01-03 | 40 | 32 | 36.0 | 10.1 | 0.00 | 0.0 | 0 | 29 | 0 |
3 | 2024-01-04 | 43 | 19 | 31.0 | 5.3 | T | T | 0 | 34 | 0 |
4 | 2024-01-05 | 31 | 16 | 23.5 | -2.0 | T | T | 0 | 41 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
361 | 2024-12-27 | 27 | 10 | 18.5 | -8.9 | 0.03 | 0.3 | 2 | 46 | 0 |
362 | 2024-12-28 | 35 | 22 | 28.5 | 1.3 | 0.04 | 0.0 | 2 | 36 | 0 |
363 | 2024-12-29 | 57 | 33 | 45.0 | 18.1 | 0.04 | 0.0 | 1 | 20 | 0 |
364 | 2024-12-30 | 58 | 44 | 51.0 | 24.3 | 0.17 | 0.0 | 0 | 14 | 0 |
365 | 2024-12-31 | 51 | 30 | 40.5 | 14.0 | T | 0.0 | 0 | 24 | 0 |
366 rows × 10 columns
Our dataframe has 365 or 366 rows (corresponding to all the days in the year) and 10 columns that contain data. This is expressed by calling the shape
attribute of the dataframe. The first number in the pair is the # of rows, while the second is the # of columns.#
df.shape
(366, 10)
It will be useful to have a variable (more accurately, an object ) that holds the value of the number of rows, and another for the number of columns.#
Remember that Python is a language that uses zero-based indexing, so the first value is accessed as element 0, and the second as element 1!#
Look at the syntax we use below to print out the (integer) value of nRows … it’s another example of string formating.#
nRows = df.shape[0]
print (f"Number of rows = {nRows}")
Number of rows = 366
Let’s do the same for the # of columns.#
nCols = df.shape[1]
print (f"Number of columns = {nCols}")
Number of columns = 10
To access the values in a particular column, we reference it with its column name as a string. The next cell pulls in all values of the year-month-date column, and assigns it to an object of the same name. We could have named the object anything we wanted, not just Date … but on the right side of the assignment statement, we have to use the exact name of the column.#
Print out what this object looks like.
Date = df['DATE']
print (Date)
0 2024-01-01
1 2024-01-02
2 2024-01-03
3 2024-01-04
4 2024-01-05
...
361 2024-12-27
362 2024-12-28
363 2024-12-29
364 2024-12-30
365 2024-12-31
Name: DATE, Length: 366, dtype: string
Each column of a Pandas dataframe is known as a series. It is basically an array of values, each of which has a corresponding row #. By default, row #’s accompanying a Series are numbered consecutively, starting with 0 (since Python’s convention is to use zero-based indexing ).#
We can reference a particular value, or set of values, of a Series by using array-based notation. Below, let’s print out the first 30 rows of the dates.#
print (Date[:30])
0 2024-01-01
1 2024-01-02
2 2024-01-03
3 2024-01-04
4 2024-01-05
5 2024-01-06
6 2024-01-07
7 2024-01-08
8 2024-01-09
9 2024-01-10
10 2024-01-11
11 2024-01-12
12 2024-01-13
13 2024-01-14
14 2024-01-15
15 2024-01-16
16 2024-01-17
17 2024-01-18
18 2024-01-19
19 2024-01-20
20 2024-01-21
21 2024-01-22
22 2024-01-23
23 2024-01-24
24 2024-01-25
25 2024-01-26
26 2024-01-27
27 2024-01-28
28 2024-01-29
29 2024-01-30
Name: DATE, dtype: string
Similarly, let’s print out the last, or 365th row (Why is it 365, not 366???)#
print(Date[365])
2024-12-31
Note that using -1 as the last index doesn’t work!
print(Date[-1])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File /knight/jan25/envs/jan25_env/lib/python3.12/site-packages/pandas/core/indexes/range.py:413, in RangeIndex.get_loc(self, key)
412 try:
--> 413 return self._range.index(new_key)
414 except ValueError as err:
ValueError: -1 is not in range
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[11], line 1
----> 1 print(Date[-1])
File /knight/jan25/envs/jan25_env/lib/python3.12/site-packages/pandas/core/series.py:1121, in Series.__getitem__(self, key)
1118 return self._values[key]
1120 elif key_is_scalar:
-> 1121 return self._get_value(key)
1123 # Convert generator to list before going through hashable part
1124 # (We will iterate through the generator there to check for slices)
1125 if is_iterator(key):
File /knight/jan25/envs/jan25_env/lib/python3.12/site-packages/pandas/core/series.py:1237, in Series._get_value(self, label, takeable)
1234 return self._values[label]
1236 # Similar to Index.get_value, but we do not fall back to positional
-> 1237 loc = self.index.get_loc(label)
1239 if is_integer(loc):
1240 return self._values[loc]
File /knight/jan25/envs/jan25_env/lib/python3.12/site-packages/pandas/core/indexes/range.py:415, in RangeIndex.get_loc(self, key)
413 return self._range.index(new_key)
414 except ValueError as err:
--> 415 raise KeyError(key) from err
416 if isinstance(key, Hashable):
417 raise KeyError(key)
KeyError: -1
However, using a negative value as part of a slice does work:
print(Date[-9:])
357 2024-12-23
358 2024-12-24
359 2024-12-25
360 2024-12-26
361 2024-12-27
362 2024-12-28
363 2024-12-29
364 2024-12-30
365 2024-12-31
Name: DATE, dtype: string
EXERCISE: Now, let’s create new Series objects; one for Max Temp (name it maxT), and the other for Min Temp (name it minT).#
# %load /spare11/atm350/common/mar04/01a.py
maxT = df['MAX']
minT = df['MIN']
maxT
0 36
1 38
2 40
3 43
4 31
..
361 27
362 35
363 57
364 58
365 51
Name: MAX, Length: 366, dtype: string
minT
0 25
1 24
2 32
3 19
4 16
..
361 10
362 22
363 33
364 44
365 30
Name: MIN, Length: 366, dtype: string
Let’s now list all the days that the high temperature was >= 90. Note carefully how we express this test. It will fail!#
hotDays = maxT >= 90
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[16], line 1
----> 1 hotDays = maxT >= 90
File /knight/jan25/envs/jan25_env/lib/python3.12/site-packages/pandas/core/ops/common.py:76, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
72 return NotImplemented
74 other = item_from_zerodim(other)
---> 76 return method(self, other)
File /knight/jan25/envs/jan25_env/lib/python3.12/site-packages/pandas/core/arraylike.py:60, in OpsMixin.__ge__(self, other)
58 @unpack_zerodim_and_defer("__ge__")
59 def __ge__(self, other):
---> 60 return self._cmp_method(other, operator.ge)
File /knight/jan25/envs/jan25_env/lib/python3.12/site-packages/pandas/core/series.py:6119, in Series._cmp_method(self, other, op)
6116 lvalues = self._values
6117 rvalues = extract_array(other, extract_numpy=True, extract_range=True)
-> 6119 res_values = ops.comparison_op(lvalues, rvalues, op)
6121 return self._construct_result(res_values, name=res_name)
File /knight/jan25/envs/jan25_env/lib/python3.12/site-packages/pandas/core/ops/array_ops.py:330, in comparison_op(left, right, op)
321 raise ValueError(
322 "Lengths must match to compare", lvalues.shape, rvalues.shape
323 )
325 if should_extension_dispatch(lvalues, rvalues) or (
326 (isinstance(rvalues, (Timedelta, BaseOffset, Timestamp)) or right is NaT)
327 and lvalues.dtype != object
328 ):
329 # Call the method on lvalues
--> 330 res_values = op(lvalues, rvalues)
332 elif is_scalar(rvalues) and isna(rvalues): # TODO: but not pd.NA?
333 # numpy does not like comparisons vs None
334 if op is operator.ne:
File /knight/jan25/envs/jan25_env/lib/python3.12/site-packages/pandas/core/ops/common.py:76, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
72 return NotImplemented
74 other = item_from_zerodim(other)
---> 76 return method(self, other)
File /knight/jan25/envs/jan25_env/lib/python3.12/site-packages/pandas/core/arraylike.py:60, in OpsMixin.__ge__(self, other)
58 @unpack_zerodim_and_defer("__ge__")
59 def __ge__(self, other):
---> 60 return self._cmp_method(other, operator.ge)
File /knight/jan25/envs/jan25_env/lib/python3.12/site-packages/pandas/core/arrays/string_.py:593, in StringArray._cmp_method(self, other, op)
590 else:
591 # logical
592 result = np.zeros(len(self._ndarray), dtype="bool")
--> 593 result[valid] = op(self._ndarray[valid], other)
594 return BooleanArray(result, mask)
TypeError: '>=' not supported between instances of 'str' and 'int'
Why did it fail? Remember, when we read in the file, we had Pandas assign the type of every column to string! We need to change the type of maxT to a numerical value. Let’s use a 32-bit floating point #, as that will be more than enough precision for this type of measurement. We’ll do the same for the minimum temp.#
maxT = maxT.astype("float32")
minT = minT.astype("float32")
maxT
0 36.0
1 38.0
2 40.0
3 43.0
4 31.0
...
361 27.0
362 35.0
363 57.0
364 58.0
365 51.0
Name: MAX, Length: 366, dtype: float32
hotDays = maxT >= 90
Now, the test works. What does this data series look like? It actually is a table of booleans … i.e., true/false values.#
print (hotDays)
0 False
1 False
2 False
3 False
4 False
...
361 False
362 False
363 False
364 False
365 False
Name: MAX, Length: 366, dtype: bool
As the default output only includes the first and last 5 rows , let’s slice
and pull out a period in the middle of the year, where we might be more likely to get some True
s!#
print (hotDays[180:195])
180 False
181 False
182 False
183 False
184 False
185 False
186 False
187 True
188 False
189 True
190 True
191 True
192 False
193 False
194 True
Name: MAX, dtype: bool
Now, let’s get a count of the # of days meeting this temperature criterion. Note carefully that we first have to express our set of days exceeding the threshold as a Pandas series. Then, recall that to get a count of the # of rows, we take the first (0th) element of the array returned by a call to the shape
method.#
df[maxT >= 90]
DATE | MAX | MIN | AVG | DEP | PCP | SNW | DPTH | HDD | CDD | |
---|---|---|---|---|---|---|---|---|---|---|
169 | 2024-06-18 | 94 | 70 | 82.0 | 12.8 | 0.00 | 0.0 | 0 | 0 | 17 |
170 | 2024-06-19 | 94 | 76 | 85.0 | 15.5 | T | 0.0 | 0 | 0 | 20 |
171 | 2024-06-20 | 96 | 73 | 84.5 | 14.8 | 0.01 | 0.0 | 0 | 0 | 20 |
187 | 2024-07-06 | 93 | 72 | 82.5 | 9.7 | 0.16 | 0.0 | 0 | 0 | 18 |
189 | 2024-07-08 | 92 | 65 | 78.5 | 5.5 | T | 0.0 | 0 | 0 | 14 |
190 | 2024-07-09 | 95 | 76 | 85.5 | 12.4 | T | 0.0 | 0 | 0 | 21 |
191 | 2024-07-10 | 94 | 76 | 85.0 | 11.8 | T | 0.0 | 0 | 0 | 20 |
194 | 2024-07-13 | 91 | 70 | 80.5 | 7.1 | 0.00 | 0.0 | 0 | 0 | 16 |
195 | 2024-07-14 | 91 | 63 | 77.0 | 3.6 | 0.00 | 0.0 | 0 | 0 | 12 |
196 | 2024-07-15 | 94 | 74 | 84.0 | 10.6 | 0.01 | 0.0 | 0 | 0 | 19 |
197 | 2024-07-16 | 93 | 72 | 82.5 | 9.0 | 0.60 | 0.0 | 0 | 0 | 18 |
211 | 2024-07-30 | 91 | 71 | 81.0 | 7.9 | 0.00 | 0.0 | 0 | 0 | 16 |
213 | 2024-08-01 | 92 | 69 | 80.5 | 7.5 | 0.00 | 0.0 | 0 | 0 | 16 |
df[maxT >= 90].shape[0]
13
Let’s reverse the sense of the test, and get its count. The two counts should add up to the total number of days in the year!#
df[maxT < 90].shape[0]
353
We can combine a test of two different thresholds. Let’s get a count of days where the max. temperature was in the 70s or 80s.#
df[(maxT< 90) & (maxT>=70)].shape[0]
149
Let’s show all the climate data for all these “pleasantly warm” days!#
pleasant = df[(maxT< 90) & (maxT>=70)]
pleasant
DATE | MAX | MIN | AVG | DEP | PCP | SNW | DPTH | HDD | CDD | |
---|---|---|---|---|---|---|---|---|---|---|
99 | 2024-04-09 | 77 | 37 | 57.0 | 11.9 | 0.00 | 0.0 | 0 | 8 | 0 |
113 | 2024-04-23 | 70 | 33 | 51.5 | 0.1 | T | 0.0 | 0 | 13 | 0 |
118 | 2024-04-28 | 75 | 48 | 61.5 | 8.0 | 0.05 | 0.0 | 0 | 3 | 0 |
120 | 2024-04-30 | 74 | 51 | 62.5 | 8.2 | 0.43 | 0.0 | 0 | 2 | 0 |
121 | 2024-05-01 | 75 | 54 | 64.5 | 9.8 | T | 0.0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
303 | 2024-10-30 | 74 | 54 | 64.0 | 17.7 | T | 0.0 | 0 | 1 | 0 |
304 | 2024-10-31 | 79 | 55 | 67.0 | 21.1 | 0.00 | 0.0 | 0 | 0 | 2 |
305 | 2024-11-01 | 74 | 48 | 61.0 | 15.4 | 0.02 | 0.0 | 0 | 4 | 0 |
309 | 2024-11-05 | 72 | 52 | 62.0 | 17.8 | 0.00 | 0.0 | 0 | 3 | 0 |
310 | 2024-11-06 | 77 | 53 | 65.0 | 21.2 | T | 0.0 | 0 | 0 | 0 |
149 rows × 10 columns
Notice that after a certain point, not all the rows are displayed to the notebook. We can eliminate the limit of maximum rows and thus show all of the matching days.#
pd.set_option ('display.max_rows', None)
pleasant
DATE | MAX | MIN | AVG | DEP | PCP | SNW | DPTH | HDD | CDD | |
---|---|---|---|---|---|---|---|---|---|---|
99 | 2024-04-09 | 77 | 37 | 57.0 | 11.9 | 0.00 | 0.0 | 0 | 8 | 0 |
113 | 2024-04-23 | 70 | 33 | 51.5 | 0.1 | T | 0.0 | 0 | 13 | 0 |
118 | 2024-04-28 | 75 | 48 | 61.5 | 8.0 | 0.05 | 0.0 | 0 | 3 | 0 |
120 | 2024-04-30 | 74 | 51 | 62.5 | 8.2 | 0.43 | 0.0 | 0 | 2 | 0 |
121 | 2024-05-01 | 75 | 54 | 64.5 | 9.8 | T | 0.0 | 0 | 0 | 0 |
122 | 2024-05-02 | 76 | 55 | 65.5 | 10.4 | 0.00 | 0.0 | 0 | 0 | 1 |
123 | 2024-05-03 | 73 | 50 | 61.5 | 6.0 | 0.00 | 0.0 | 0 | 3 | 0 |
124 | 2024-05-04 | 74 | 53 | 63.5 | 7.6 | 0.00 | 0.0 | 0 | 1 | 0 |
126 | 2024-05-06 | 78 | 52 | 65.0 | 8.4 | T | 0.0 | 0 | 0 | 0 |
127 | 2024-05-07 | 79 | 44 | 61.5 | 4.6 | 0.00 | 0.0 | 0 | 3 | 0 |
128 | 2024-05-08 | 82 | 52 | 67.0 | 9.7 | 0.48 | 0.0 | 0 | 0 | 2 |
133 | 2024-05-13 | 71 | 39 | 55.0 | -3.9 | 0.02 | 0.0 | 0 | 10 | 0 |
134 | 2024-05-14 | 83 | 58 | 70.5 | 11.3 | 0.00 | 0.0 | 0 | 0 | 6 |
136 | 2024-05-16 | 73 | 56 | 64.5 | 4.7 | T | 0.0 | 0 | 0 | 0 |
137 | 2024-05-17 | 76 | 50 | 63.0 | 2.9 | 0.00 | 0.0 | 0 | 2 | 0 |
138 | 2024-05-18 | 77 | 56 | 66.5 | 6.1 | 0.00 | 0.0 | 0 | 0 | 2 |
139 | 2024-05-19 | 80 | 57 | 68.5 | 7.8 | 0.00 | 0.0 | 0 | 0 | 4 |
140 | 2024-05-20 | 86 | 58 | 72.0 | 11.0 | 0.00 | 0.0 | 0 | 0 | 7 |
141 | 2024-05-21 | 85 | 63 | 74.0 | 12.8 | 0.39 | 0.0 | 0 | 0 | 9 |
142 | 2024-05-22 | 89 | 64 | 76.5 | 15.0 | 0.00 | 0.0 | 0 | 0 | 12 |
143 | 2024-05-23 | 80 | 56 | 68.0 | 6.2 | 0.04 | 0.0 | 0 | 0 | 3 |
144 | 2024-05-24 | 82 | 52 | 67.0 | 4.9 | 0.00 | 0.0 | 0 | 0 | 2 |
145 | 2024-05-25 | 81 | 51 | 66.0 | 3.6 | T | 0.0 | 0 | 0 | 1 |
146 | 2024-05-26 | 84 | 60 | 72.0 | 9.3 | 0.00 | 0.0 | 0 | 0 | 7 |
147 | 2024-05-27 | 81 | 69 | 75.0 | 12.1 | 0.56 | 0.0 | 0 | 0 | 10 |
148 | 2024-05-28 | 79 | 59 | 69.0 | 5.8 | T | 0.0 | 0 | 0 | 4 |
149 | 2024-05-29 | 76 | 57 | 66.5 | 3.0 | 0.00 | 0.0 | 0 | 0 | 2 |
150 | 2024-05-30 | 71 | 52 | 61.5 | -2.3 | T | 0.0 | 0 | 3 | 0 |
151 | 2024-05-31 | 76 | 47 | 61.5 | -2.6 | 0.00 | 0.0 | 0 | 3 | 0 |
152 | 2024-06-01 | 85 | 48 | 66.5 | 2.1 | 0.00 | 0.0 | 0 | 0 | 2 |
153 | 2024-06-02 | 84 | 51 | 67.5 | 2.9 | 0.00 | 0.0 | 0 | 0 | 3 |
154 | 2024-06-03 | 87 | 64 | 75.5 | 10.6 | 0.00 | 0.0 | 0 | 0 | 11 |
155 | 2024-06-04 | 88 | 58 | 73.0 | 7.8 | 0.00 | 0.0 | 0 | 0 | 8 |
156 | 2024-06-05 | 87 | 66 | 76.5 | 11.0 | 0.00 | 0.0 | 0 | 0 | 12 |
157 | 2024-06-06 | 83 | 63 | 73.0 | 7.2 | 0.32 | 0.0 | 0 | 0 | 8 |
158 | 2024-06-07 | 78 | 55 | 66.5 | 0.4 | 0.51 | 0.0 | 0 | 0 | 2 |
159 | 2024-06-08 | 73 | 54 | 63.5 | -2.9 | T | 0.0 | 0 | 1 | 0 |
160 | 2024-06-09 | 72 | 56 | 64.0 | -2.7 | 0.73 | 0.0 | 0 | 1 | 0 |
161 | 2024-06-10 | 71 | 52 | 61.5 | -5.5 | 0.00 | 0.0 | 0 | 3 | 0 |
163 | 2024-06-12 | 74 | 54 | 64.0 | -3.5 | 0.00 | 0.0 | 0 | 1 | 0 |
164 | 2024-06-13 | 86 | 51 | 68.5 | 0.7 | 0.00 | 0.0 | 0 | 0 | 4 |
165 | 2024-06-14 | 86 | 66 | 76.0 | 7.9 | 0.05 | 0.0 | 0 | 0 | 11 |
166 | 2024-06-15 | 79 | 56 | 67.5 | -0.9 | 0.00 | 0.0 | 0 | 0 | 3 |
167 | 2024-06-16 | 77 | 47 | 62.0 | -6.7 | 0.00 | 0.0 | 0 | 3 | 0 |
168 | 2024-06-17 | 88 | 63 | 75.5 | 6.6 | T | 0.0 | 0 | 0 | 11 |
172 | 2024-06-21 | 89 | 69 | 79.0 | 9.0 | 0.32 | 0.0 | 0 | 0 | 14 |
173 | 2024-06-22 | 81 | 70 | 75.5 | 5.3 | 0.53 | 0.0 | 0 | 0 | 11 |
174 | 2024-06-23 | 85 | 71 | 78.0 | 7.5 | 0.24 | 0.0 | 0 | 0 | 13 |
175 | 2024-06-24 | 76 | 61 | 68.5 | -2.2 | 0.23 | 0.0 | 0 | 0 | 4 |
176 | 2024-06-25 | 88 | 57 | 72.5 | 1.6 | T | 0.0 | 0 | 0 | 8 |
177 | 2024-06-26 | 86 | 69 | 77.5 | 6.3 | T | 0.0 | 0 | 0 | 13 |
178 | 2024-06-27 | 79 | 60 | 69.5 | -1.9 | 0.00 | 0.0 | 0 | 0 | 5 |
179 | 2024-06-28 | 77 | 50 | 63.5 | -8.1 | 0.00 | 0.0 | 0 | 1 | 0 |
180 | 2024-06-29 | 76 | 59 | 67.5 | -4.3 | 0.32 | 0.0 | 0 | 0 | 3 |
181 | 2024-06-30 | 87 | 64 | 75.5 | 3.5 | 0.02 | 0.0 | 0 | 0 | 11 |
182 | 2024-07-01 | 79 | 61 | 70.0 | -2.1 | 0.12 | 0.0 | 0 | 0 | 5 |
183 | 2024-07-02 | 84 | 54 | 69.0 | -3.3 | 0.00 | 0.0 | 0 | 0 | 4 |
184 | 2024-07-03 | 85 | 64 | 74.5 | 2.1 | 0.00 | 0.0 | 0 | 0 | 10 |
185 | 2024-07-04 | 85 | 70 | 77.5 | 4.9 | T | 0.0 | 0 | 0 | 13 |
186 | 2024-07-05 | 87 | 74 | 80.5 | 7.8 | T | 0.0 | 0 | 0 | 16 |
188 | 2024-07-07 | 89 | 68 | 78.5 | 5.6 | 0.00 | 0.0 | 0 | 0 | 14 |
192 | 2024-07-11 | 86 | 75 | 80.5 | 7.2 | 0.30 | 0.0 | 0 | 0 | 16 |
193 | 2024-07-12 | 85 | 73 | 79.0 | 5.7 | 0.00 | 0.0 | 0 | 0 | 14 |
198 | 2024-07-17 | 88 | 68 | 78.0 | 4.5 | 0.53 | 0.0 | 0 | 0 | 13 |
199 | 2024-07-18 | 83 | 65 | 74.0 | 0.5 | T | 0.0 | 0 | 0 | 9 |
200 | 2024-07-19 | 83 | 59 | 71.0 | -2.5 | 0.00 | 0.0 | 0 | 0 | 6 |
201 | 2024-07-20 | 86 | 59 | 72.5 | -1.0 | 0.00 | 0.0 | 0 | 0 | 8 |
202 | 2024-07-21 | 85 | 61 | 73.0 | -0.5 | 0.05 | 0.0 | 0 | 0 | 8 |
203 | 2024-07-22 | 87 | 64 | 75.5 | 2.1 | T | 0.0 | 0 | 0 | 11 |
204 | 2024-07-23 | 83 | 69 | 76.0 | 2.6 | 0.12 | 0.0 | 0 | 0 | 11 |
205 | 2024-07-24 | 83 | 70 | 76.5 | 3.1 | 0.02 | 0.0 | 0 | 0 | 12 |
206 | 2024-07-25 | 83 | 66 | 74.5 | 1.1 | T | 0.0 | 0 | 0 | 10 |
207 | 2024-07-26 | 85 | 61 | 73.0 | -0.3 | 0.00 | 0.0 | 0 | 0 | 8 |
208 | 2024-07-27 | 86 | 59 | 72.5 | -0.8 | 0.00 | 0.0 | 0 | 0 | 8 |
209 | 2024-07-28 | 87 | 62 | 74.5 | 1.3 | 0.00 | 0.0 | 0 | 0 | 10 |
210 | 2024-07-29 | 84 | 69 | 76.5 | 3.3 | 0.01 | 0.0 | 0 | 0 | 12 |
212 | 2024-07-31 | 89 | 74 | 81.5 | 8.5 | 0.65 | 0.0 | 0 | 0 | 17 |
214 | 2024-08-02 | 87 | 68 | 77.5 | 4.6 | 0.72 | 0.0 | 0 | 0 | 13 |
215 | 2024-08-03 | 86 | 73 | 79.5 | 6.7 | 0.48 | 0.0 | 0 | 0 | 15 |
216 | 2024-08-04 | 86 | 73 | 79.5 | 6.7 | 0.11 | 0.0 | 0 | 0 | 15 |
217 | 2024-08-05 | 87 | 67 | 77.0 | 4.3 | 0.02 | 0.0 | 0 | 0 | 12 |
218 | 2024-08-06 | 73 | 61 | 67.0 | -5.6 | 0.63 | 0.0 | 0 | 0 | 2 |
219 | 2024-08-07 | 78 | 61 | 69.5 | -3.0 | 0.18 | 0.0 | 0 | 0 | 5 |
220 | 2024-08-08 | 74 | 59 | 66.5 | -5.9 | 0.64 | 0.0 | 0 | 0 | 2 |
221 | 2024-08-09 | 82 | 67 | 74.5 | 2.2 | 2.91 | 0.0 | 0 | 0 | 10 |
222 | 2024-08-10 | 83 | 62 | 72.5 | 0.3 | 0.00 | 0.0 | 0 | 0 | 8 |
223 | 2024-08-11 | 81 | 56 | 68.5 | -3.6 | 0.07 | 0.0 | 0 | 0 | 4 |
224 | 2024-08-12 | 74 | 55 | 64.5 | -7.5 | T | 0.0 | 0 | 0 | 0 |
225 | 2024-08-13 | 82 | 55 | 68.5 | -3.4 | 0.00 | 0.0 | 0 | 0 | 4 |
226 | 2024-08-14 | 86 | 60 | 73.0 | 1.2 | 0.00 | 0.0 | 0 | 0 | 8 |
227 | 2024-08-15 | 86 | 59 | 72.5 | 0.8 | T | 0.0 | 0 | 0 | 8 |
228 | 2024-08-16 | 86 | 64 | 75.0 | 3.4 | 0.00 | 0.0 | 0 | 0 | 10 |
229 | 2024-08-17 | 80 | 70 | 75.0 | 3.5 | T | 0.0 | 0 | 0 | 10 |
230 | 2024-08-18 | 76 | 67 | 71.5 | 0.2 | 0.28 | 0.0 | 0 | 0 | 7 |
231 | 2024-08-19 | 82 | 59 | 70.5 | -0.7 | 1.89 | 0.0 | 0 | 0 | 6 |
233 | 2024-08-21 | 74 | 52 | 63.0 | -7.9 | T | 0.0 | 0 | 2 | 0 |
234 | 2024-08-22 | 75 | 56 | 65.5 | -5.2 | 0.00 | 0.0 | 0 | 0 | 1 |
235 | 2024-08-23 | 81 | 53 | 67.0 | -3.6 | 0.00 | 0.0 | 0 | 0 | 2 |
236 | 2024-08-24 | 83 | 55 | 69.0 | -1.4 | 0.00 | 0.0 | 0 | 0 | 4 |
237 | 2024-08-25 | 83 | 59 | 71.0 | 0.8 | 0.02 | 0.0 | 0 | 0 | 6 |
238 | 2024-08-26 | 84 | 61 | 72.5 | 2.5 | 0.00 | 0.0 | 0 | 0 | 8 |
239 | 2024-08-27 | 87 | 59 | 73.0 | 3.2 | 0.00 | 0.0 | 0 | 0 | 8 |
240 | 2024-08-28 | 88 | 65 | 76.5 | 6.9 | T | 0.0 | 0 | 0 | 12 |
241 | 2024-08-29 | 73 | 61 | 67.0 | -2.3 | 0.00 | 0.0 | 0 | 0 | 2 |
242 | 2024-08-30 | 77 | 66 | 71.5 | 2.4 | 0.00 | 0.0 | 0 | 0 | 7 |
243 | 2024-08-31 | 76 | 63 | 69.5 | 0.7 | 0.15 | 0.0 | 0 | 0 | 5 |
244 | 2024-09-01 | 87 | 67 | 77.0 | 8.4 | T | 0.0 | 0 | 0 | 12 |
245 | 2024-09-02 | 74 | 51 | 62.5 | -5.8 | 0.00 | 0.0 | 0 | 2 | 0 |
246 | 2024-09-03 | 75 | 45 | 60.0 | -8.0 | 0.00 | 0.0 | 0 | 5 | 0 |
247 | 2024-09-04 | 78 | 50 | 64.0 | -3.7 | 0.00 | 0.0 | 0 | 1 | 0 |
248 | 2024-09-05 | 80 | 53 | 66.5 | -0.9 | 0.00 | 0.0 | 0 | 0 | 2 |
249 | 2024-09-06 | 79 | 49 | 64.0 | -3.1 | 0.00 | 0.0 | 0 | 1 | 0 |
250 | 2024-09-07 | 75 | 54 | 64.5 | -2.3 | 0.53 | 0.0 | 0 | 0 | 0 |
252 | 2024-09-09 | 73 | 48 | 60.5 | -5.6 | 0.04 | 0.0 | 0 | 4 | 0 |
253 | 2024-09-10 | 73 | 51 | 62.0 | -3.8 | 0.00 | 0.0 | 0 | 3 | 0 |
254 | 2024-09-11 | 77 | 47 | 62.0 | -3.4 | 0.00 | 0.0 | 0 | 3 | 0 |
255 | 2024-09-12 | 81 | 51 | 66.0 | 0.9 | 0.00 | 0.0 | 0 | 0 | 1 |
256 | 2024-09-13 | 84 | 59 | 71.5 | 6.8 | 0.00 | 0.0 | 0 | 0 | 7 |
257 | 2024-09-14 | 85 | 59 | 72.0 | 7.7 | 0.00 | 0.0 | 0 | 0 | 7 |
258 | 2024-09-15 | 85 | 59 | 72.0 | 8.1 | 0.00 | 0.0 | 0 | 0 | 7 |
259 | 2024-09-16 | 84 | 57 | 70.5 | 7.0 | 0.00 | 0.0 | 0 | 0 | 6 |
260 | 2024-09-17 | 80 | 54 | 67.0 | 3.9 | 0.00 | 0.0 | 0 | 0 | 2 |
261 | 2024-09-18 | 79 | 54 | 66.5 | 3.8 | 0.00 | 0.0 | 0 | 0 | 2 |
262 | 2024-09-19 | 85 | 56 | 70.5 | 8.2 | 0.00 | 0.0 | 0 | 0 | 6 |
263 | 2024-09-20 | 80 | 57 | 68.5 | 6.6 | 0.00 | 0.0 | 0 | 0 | 4 |
264 | 2024-09-21 | 79 | 57 | 68.0 | 6.5 | 0.00 | 0.0 | 0 | 0 | 3 |
265 | 2024-09-22 | 76 | 53 | 64.5 | 3.5 | 0.00 | 0.0 | 0 | 0 | 0 |
266 | 2024-09-23 | 70 | 55 | 62.5 | 1.9 | 0.00 | 0.0 | 0 | 2 | 0 |
267 | 2024-09-24 | 71 | 47 | 59.0 | -1.2 | 0.00 | 0.0 | 0 | 6 | 0 |
270 | 2024-09-27 | 76 | 59 | 67.5 | 8.6 | 0.00 | 0.0 | 0 | 0 | 3 |
271 | 2024-09-28 | 80 | 58 | 69.0 | 10.5 | 0.00 | 0.0 | 0 | 0 | 4 |
272 | 2024-09-29 | 72 | 56 | 64.0 | 6.0 | 0.00 | 0.0 | 0 | 1 | 0 |
273 | 2024-09-30 | 77 | 55 | 66.0 | 8.4 | 0.00 | 0.0 | 0 | 0 | 1 |
274 | 2024-10-01 | 74 | 54 | 64.0 | 6.8 | 0.00 | 0.0 | 0 | 1 | 0 |
276 | 2024-10-03 | 74 | 54 | 64.0 | 7.6 | 0.00 | 0.0 | 0 | 1 | 0 |
277 | 2024-10-04 | 74 | 49 | 61.5 | 5.6 | 0.02 | 0.0 | 0 | 3 | 0 |
278 | 2024-10-05 | 71 | 49 | 60.0 | 4.5 | 0.02 | 0.0 | 0 | 5 | 0 |
279 | 2024-10-06 | 72 | 41 | 56.5 | 1.4 | 0.00 | 0.0 | 0 | 8 | 0 |
280 | 2024-10-07 | 70 | 49 | 59.5 | 4.8 | T | 0.0 | 0 | 5 | 0 |
292 | 2024-10-19 | 71 | 34 | 52.5 | 2.4 | 0.00 | 0.0 | 0 | 12 | 0 |
293 | 2024-10-20 | 70 | 33 | 51.5 | 1.7 | 0.00 | 0.0 | 0 | 13 | 0 |
294 | 2024-10-21 | 80 | 42 | 61.0 | 11.6 | 0.00 | 0.0 | 0 | 4 | 0 |
295 | 2024-10-22 | 80 | 47 | 63.5 | 14.4 | 0.00 | 0.0 | 0 | 1 | 0 |
296 | 2024-10-23 | 78 | 55 | 66.5 | 17.8 | T | 0.0 | 0 | 0 | 2 |
303 | 2024-10-30 | 74 | 54 | 64.0 | 17.7 | T | 0.0 | 0 | 1 | 0 |
304 | 2024-10-31 | 79 | 55 | 67.0 | 21.1 | 0.00 | 0.0 | 0 | 0 | 2 |
305 | 2024-11-01 | 74 | 48 | 61.0 | 15.4 | 0.02 | 0.0 | 0 | 4 | 0 |
309 | 2024-11-05 | 72 | 52 | 62.0 | 17.8 | 0.00 | 0.0 | 0 | 3 | 0 |
310 | 2024-11-06 | 77 | 53 | 65.0 | 21.2 | T | 0.0 | 0 | 0 | 0 |
Now let’s visualize the temperature trace over the year! Pandas has a method that directly calls Matplotlib’s plotting package.#
maxT.plot()
<Axes: >

minT.plot()
<Axes: >

The data plotted fine, but the look could be better. First, let’s import a package, seaborn
, that when imported and set
using its own method, makes matplotlib’s graphs look better.#
Info on seaborn: https://seaborn.pydata.org/index.html
import seaborn as sns
sns.set()
maxT.plot()
<Axes: >

minT.plot()
<Axes: >

Next, let’s plot the two traces simultaneously on the graph so we can better discern max and min temps (this will also enure a single y-axis that will encompass the range of temperature values). We’ll also add some helpful labels and expand the size of the figure.#
You will notice that this graphic took some time to render. Note that the x-axis label is virtually unreadable. This is because every date is being printed!#
fig, ax = plt.subplots(figsize=(15,10))
ax.plot (Date, maxT, color='red')
ax.plot (Date, minT, color='blue')
ax.set_title (f"ALB Year {year}")
ax.set_xlabel('Day of Year')
ax.set_ylabel('Temperature (°F')
Text(0, 0.5, 'Temperature (°F')

We will deal with this by using one of Pandas’ methods that take strings and convert them to a special type of data … not strings nor numbers, but datetime objects. Note carefully how we do this here … it is not terribly intuitive, but we’ll explain it more in an upcoming lecture/notebook on datetime
. You will see though that the output column now looks a bit more date-like, with a four-digit year followed by two-digit month and date.#
Date = pd.to_datetime(Date,format="%Y-%m-%d")
Date
0 2024-01-01
1 2024-01-02
2 2024-01-03
3 2024-01-04
4 2024-01-05
5 2024-01-06
6 2024-01-07
7 2024-01-08
8 2024-01-09
9 2024-01-10
10 2024-01-11
11 2024-01-12
12 2024-01-13
13 2024-01-14
14 2024-01-15
15 2024-01-16
16 2024-01-17
17 2024-01-18
18 2024-01-19
19 2024-01-20
20 2024-01-21
21 2024-01-22
22 2024-01-23
23 2024-01-24
24 2024-01-25
25 2024-01-26
26 2024-01-27
27 2024-01-28
28 2024-01-29
29 2024-01-30
30 2024-01-31
31 2024-02-01
32 2024-02-02
33 2024-02-03
34 2024-02-04
35 2024-02-05
36 2024-02-06
37 2024-02-07
38 2024-02-08
39 2024-02-09
40 2024-02-10
41 2024-02-11
42 2024-02-12
43 2024-02-13
44 2024-02-14
45 2024-02-15
46 2024-02-16
47 2024-02-17
48 2024-02-18
49 2024-02-19
50 2024-02-20
51 2024-02-21
52 2024-02-22
53 2024-02-23
54 2024-02-24
55 2024-02-25
56 2024-02-26
57 2024-02-27
58 2024-02-28
59 2024-02-29
60 2024-03-01
61 2024-03-02
62 2024-03-03
63 2024-03-04
64 2024-03-05
65 2024-03-06
66 2024-03-07
67 2024-03-08
68 2024-03-09
69 2024-03-10
70 2024-03-11
71 2024-03-12
72 2024-03-13
73 2024-03-14
74 2024-03-15
75 2024-03-16
76 2024-03-17
77 2024-03-18
78 2024-03-19
79 2024-03-20
80 2024-03-21
81 2024-03-22
82 2024-03-23
83 2024-03-24
84 2024-03-25
85 2024-03-26
86 2024-03-27
87 2024-03-28
88 2024-03-29
89 2024-03-30
90 2024-03-31
91 2024-04-01
92 2024-04-02
93 2024-04-03
94 2024-04-04
95 2024-04-05
96 2024-04-06
97 2024-04-07
98 2024-04-08
99 2024-04-09
100 2024-04-10
101 2024-04-11
102 2024-04-12
103 2024-04-13
104 2024-04-14
105 2024-04-15
106 2024-04-16
107 2024-04-17
108 2024-04-18
109 2024-04-19
110 2024-04-20
111 2024-04-21
112 2024-04-22
113 2024-04-23
114 2024-04-24
115 2024-04-25
116 2024-04-26
117 2024-04-27
118 2024-04-28
119 2024-04-29
120 2024-04-30
121 2024-05-01
122 2024-05-02
123 2024-05-03
124 2024-05-04
125 2024-05-05
126 2024-05-06
127 2024-05-07
128 2024-05-08
129 2024-05-09
130 2024-05-10
131 2024-05-11
132 2024-05-12
133 2024-05-13
134 2024-05-14
135 2024-05-15
136 2024-05-16
137 2024-05-17
138 2024-05-18
139 2024-05-19
140 2024-05-20
141 2024-05-21
142 2024-05-22
143 2024-05-23
144 2024-05-24
145 2024-05-25
146 2024-05-26
147 2024-05-27
148 2024-05-28
149 2024-05-29
150 2024-05-30
151 2024-05-31
152 2024-06-01
153 2024-06-02
154 2024-06-03
155 2024-06-04
156 2024-06-05
157 2024-06-06
158 2024-06-07
159 2024-06-08
160 2024-06-09
161 2024-06-10
162 2024-06-11
163 2024-06-12
164 2024-06-13
165 2024-06-14
166 2024-06-15
167 2024-06-16
168 2024-06-17
169 2024-06-18
170 2024-06-19
171 2024-06-20
172 2024-06-21
173 2024-06-22
174 2024-06-23
175 2024-06-24
176 2024-06-25
177 2024-06-26
178 2024-06-27
179 2024-06-28
180 2024-06-29
181 2024-06-30
182 2024-07-01
183 2024-07-02
184 2024-07-03
185 2024-07-04
186 2024-07-05
187 2024-07-06
188 2024-07-07
189 2024-07-08
190 2024-07-09
191 2024-07-10
192 2024-07-11
193 2024-07-12
194 2024-07-13
195 2024-07-14
196 2024-07-15
197 2024-07-16
198 2024-07-17
199 2024-07-18
200 2024-07-19
201 2024-07-20
202 2024-07-21
203 2024-07-22
204 2024-07-23
205 2024-07-24
206 2024-07-25
207 2024-07-26
208 2024-07-27
209 2024-07-28
210 2024-07-29
211 2024-07-30
212 2024-07-31
213 2024-08-01
214 2024-08-02
215 2024-08-03
216 2024-08-04
217 2024-08-05
218 2024-08-06
219 2024-08-07
220 2024-08-08
221 2024-08-09
222 2024-08-10
223 2024-08-11
224 2024-08-12
225 2024-08-13
226 2024-08-14
227 2024-08-15
228 2024-08-16
229 2024-08-17
230 2024-08-18
231 2024-08-19
232 2024-08-20
233 2024-08-21
234 2024-08-22
235 2024-08-23
236 2024-08-24
237 2024-08-25
238 2024-08-26
239 2024-08-27
240 2024-08-28
241 2024-08-29
242 2024-08-30
243 2024-08-31
244 2024-09-01
245 2024-09-02
246 2024-09-03
247 2024-09-04
248 2024-09-05
249 2024-09-06
250 2024-09-07
251 2024-09-08
252 2024-09-09
253 2024-09-10
254 2024-09-11
255 2024-09-12
256 2024-09-13
257 2024-09-14
258 2024-09-15
259 2024-09-16
260 2024-09-17
261 2024-09-18
262 2024-09-19
263 2024-09-20
264 2024-09-21
265 2024-09-22
266 2024-09-23
267 2024-09-24
268 2024-09-25
269 2024-09-26
270 2024-09-27
271 2024-09-28
272 2024-09-29
273 2024-09-30
274 2024-10-01
275 2024-10-02
276 2024-10-03
277 2024-10-04
278 2024-10-05
279 2024-10-06
280 2024-10-07
281 2024-10-08
282 2024-10-09
283 2024-10-10
284 2024-10-11
285 2024-10-12
286 2024-10-13
287 2024-10-14
288 2024-10-15
289 2024-10-16
290 2024-10-17
291 2024-10-18
292 2024-10-19
293 2024-10-20
294 2024-10-21
295 2024-10-22
296 2024-10-23
297 2024-10-24
298 2024-10-25
299 2024-10-26
300 2024-10-27
301 2024-10-28
302 2024-10-29
303 2024-10-30
304 2024-10-31
305 2024-11-01
306 2024-11-02
307 2024-11-03
308 2024-11-04
309 2024-11-05
310 2024-11-06
311 2024-11-07
312 2024-11-08
313 2024-11-09
314 2024-11-10
315 2024-11-11
316 2024-11-12
317 2024-11-13
318 2024-11-14
319 2024-11-15
320 2024-11-16
321 2024-11-17
322 2024-11-18
323 2024-11-19
324 2024-11-20
325 2024-11-21
326 2024-11-22
327 2024-11-23
328 2024-11-24
329 2024-11-25
330 2024-11-26
331 2024-11-27
332 2024-11-28
333 2024-11-29
334 2024-11-30
335 2024-12-01
336 2024-12-02
337 2024-12-03
338 2024-12-04
339 2024-12-05
340 2024-12-06
341 2024-12-07
342 2024-12-08
343 2024-12-09
344 2024-12-10
345 2024-12-11
346 2024-12-12
347 2024-12-13
348 2024-12-14
349 2024-12-15
350 2024-12-16
351 2024-12-17
352 2024-12-18
353 2024-12-19
354 2024-12-20
355 2024-12-21
356 2024-12-22
357 2024-12-23
358 2024-12-24
359 2024-12-25
360 2024-12-26
361 2024-12-27
362 2024-12-28
363 2024-12-29
364 2024-12-30
365 2024-12-31
Name: DATE, dtype: datetime64[ns]
We’ll further refine the look of the plot by adding a legend and have vertical grid lines on a frequency of one month.#
from matplotlib.dates import DateFormatter, AutoDateLocator,HourLocator,DayLocator,MonthLocator
fig, ax = plt.subplots(figsize=(15,10))
ax.plot (Date, maxT, color='red',label = "Max T")
ax.plot (Date, minT, color='blue', label = "Min T")
ax.set_title (f"ALB Year {year}")
ax.set_xlabel('Date')
ax.set_ylabel('Temperature (°F)' )
ax.xaxis.set_major_locator(MonthLocator(interval=1))
dateFmt = DateFormatter('%b %d')
ax.xaxis.set_major_formatter(dateFmt)
ax.legend (loc="best")
<matplotlib.legend.Legend at 0x146ec84bf020>

Let’s save our beautiful graphic to disk.#
fig.savefig (f'albTemps{year}.png')
Now, let’s answer the question, “what was the most common range of maximum temperatures last year in Albany?” via a histogram. We use matplotlib
’s hist
method.#
# %load '/spare11/atm350/common/mar04/01b.py'
# Create a figure and size it.
fig, ax = plt.subplots(figsize=(15,10))
# Create a histogram of our data series and divide it in to 10 bins.
ax.hist(maxT, bins=10, color='k', alpha=0.3)
(array([ 2., 9., 31., 42., 49., 40., 35., 57., 79., 22.]),
array([11. , 19.5, 28. , 36.5, 45. , 53.5, 62. , 70.5, 79. , 87.5, 96. ]),
<BarContainer object of 10 artists>)

Ok, but the 10 bins were autoselected. Let’s customize our call to the hist
method by specifying the bounds of each of our bins.#
How can we learn more about how to customize this call? Append a ?
to the name of the method.#
ax.hist?
Revise the call to ax.hist
, and also draw tick marks that align with the bounds of the histogram’s bins.#
# %load '/spare11/atm350/common/mar04/01c.py'
fig, ax = plt.subplots(figsize=(15,10))
ax.hist(maxT, bins=(0,10,20,30,40,50,60,70,80,90,100), color='k', alpha=0.3)
ax.xaxis.set_major_locator(plt.MultipleLocator(10))

Save this histogram to disk.#
fig.savefig("maxT_hist.png")
Use the describe
method on the maximum temperature series to reveal some simple statistical properties.
maxT.describe()
count 366.000000
mean 62.338799
std 19.533743
min 11.000000
25% 46.000000
50% 65.000000
75% 79.750000
max 96.000000
Name: MAX, dtype: float64