{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas Notebook 1, ATM350 Spring 2024\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we read in a text file that has climatological data compiled at the National Weather Service in Albany, NY for 2023, previously downloaded and reformatted from the [xmACIS2](https://xmacis.rcc-acis.org) climate data portal.\n", "\n", "We will use the Pandas library to read and analyze the data. We will also use the Matplotlib package to visualize it." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Motivating Science Questions:\n", "1. How can we analyze and display *tabular climate data* for a site?\n", "2. What was the yearly trace of max/min temperatures for Albany, NY last year?\n", "3. What was the most common 10-degree maximum temperature range for Albany, NY last year?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Import Pandas and NumPy, and use their conventional two-letter abbreviations when we\n", "# use methods from these packages. Also, import Matplotlib's plotting package, using its\n", "# standard abbreviation.\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specify the location of the file that contains the climo data. Use the Linux `ls` command to verify it exists.\n", "#### Note that in a Jupyter notebook, we can simply use the ! directive to \"call\" a Linux command.\n", "#### Also notice how we refer to a Python variable name when passing it to a Linux command line in this way ... we enclose it in braces!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file = '/spare11/atm350/common/data/climo_alb_2023.csv'\n", "! 
ls -l {file}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use pandas' `read_csv` function to open the file. Specify that the data is to be read in as strings (neither integers nor floating-point numbers).\n", "### Once this call succeeds, it returns a Pandas `DataFrame` object, which we reference as `df`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(file, dtype='string')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By simply typing the name of the dataframe object, we can get some of its contents to be \"pretty-printed\" to the notebook!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Our dataframe has 365 rows (corresponding to all the days in the year 2023) and 10 columns that contain data. We can confirm this by accessing the `shape` attribute of the dataframe. The first number in the pair is the # of rows, while the second is the # of columns." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### It will be useful to have a variable (more accurately, an object) that holds the value of the number of rows, and another for the number of columns.\n", "#### Remember that Python is a language that uses zero-based indexing, so the first value is accessed as element 0, and the second as element 1!\n", "#### Look at the syntax we use below to print out the (integer) value of nRows ... it's another example of **string formatting**." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nRows = df.shape[0]\n", "print(\"Number of rows = %d\" % nRows)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's do the same for the # of columns." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nCols = df.shape[1]\n", "print(\"Number of columns = %d\" % nCols)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### To access the values in a particular column, we reference it with its column name as a string. The next cell pulls in all values of the year-month-date column, and assigns them to an object with a similar name. We could have named the object anything we wanted, not just **Date** ... but on the right side of the assignment statement, we have to use the exact name of the column.\n", "\n", "Print out what this object looks like." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Date = df['DATE']\n", "print(Date)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Each column of a Pandas dataframe is known as a `Series`. It is basically an array of values, each of which has a corresponding row #. By default, row #'s accompanying a Series are numbered consecutively, starting with 0 (since Python's convention is to use zero-based indexing)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### We can reference a particular value, or set of values, of a Series by using array-based notation. Below, let's print out the first 30 rows of the dates." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(Date[:30])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Similarly, let's print out the last row, which has row # 364 (why is it 364, not 365?)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(Date[364])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that using -1 as the index of the last row doesn't work! With the default integer row #'s, `Date[-1]` looks for a row *labeled* -1 rather than the last row, so Pandas raises a `KeyError`. (`Date.iloc[-1]` selects by position instead.)" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(Date[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, using a negative value as part of a *slice* does work:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(Date[-9:])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### EXERCISE: Now, let's create new Series objects; one for Max Temp (name it *maxT*), and the other for Min Temp (name it *minT*)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "