{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas Notebook 2, ATM350 Spring 2023" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Motivating Science Questions:\n", "1. What were the daily temperature and precipitation at Albany last year?\n", "2. What were the days with the most precipitation?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Motivating Technical Question:\n", "1. How can we use Pandas to do some basic statistical analyses of our data?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### We'll start by repeating some of the same steps we did in the first Pandas notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sns.set()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file = '/spare11/atm350/common/data/climo_alb_2022.csv'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display the first five lines of this file using the `readline` method of Python's built-in file object." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fileObj = open(file)\n", "nLines = 5\n", "for n in range(nLines):\n", " line = fileObj.readline()\n", " print(line)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(file, dtype='string')\n", "\n", "nRows = df.shape[0]\n", "print (\"Number of rows = %d\" % nRows )\n", "nCols = df.shape[1]\n", "print (\"Number of columns = %d\" % nCols)\n", "\n", "date = df['DATE']\n", "date = pd.to_datetime(date,format=\"%Y-%m-%d\")\n", "\n", "maxT = df['MAX'].astype(\"float32\")\n", "minT = df['MIN'].astype(\"float32\")" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "### Let's generate the final timeseries we made in our first Pandas notebook, with all the \"bells and whistles\" included." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib.dates import DateFormatter, AutoDateLocator,HourLocator,DayLocator,MonthLocator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the year so we don't have to edit the string labels every year!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "year = 2022" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(15,10))\n", "ax.plot (date, maxT, color='red',label = \"Max T\")\n", "ax.plot (date, minT, color='blue', label = \"Min T\")\n", "ax.set_title (\"ALB Year %d\" % year)\n", "ax.set_xlabel('Date')\n", "ax.set_ylabel(r'Temperature ($^\\circ$F)' )\n", "ax.xaxis.set_major_locator(MonthLocator(interval=1))\n", "dateFmt = DateFormatter('%b %d')\n", "ax.xaxis.set_major_formatter(dateFmt)\n", "ax.legend (loc=\"best\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read in precip data. This will be more challenging due to the presence of T(races).\n", "Let's remind ourselves what the `DataFrame` looks like, paying particular attention to the daily precip column (**PCP**)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Create a Pandas `Series` called `precip` and populate it with the requisite column from our `DataFrame`. Then print out its `values.shape` attribute.
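
One possible way to approach this step is sketched below. It is only an illustration: the small `sample` table is a hypothetical stand-in for the real CSV file, and its values are made up for the example rather than taken from the Albany data. The column names (DATE, MAX, MIN, PCP) follow the ones used earlier in this notebook, and everything is read as strings, as in our `read_csv` call.

```python
import pandas as pd

# Hypothetical stand-in for the real file; these values are invented
# for illustration and are NOT actual Albany observations.
sample = pd.DataFrame(
    {
        'DATE': ['2022-01-01', '2022-01-02', '2022-01-03'],
        'MAX': ['51', '49', '33'],
        'MIN': ['41', '23', '13'],
        'PCP': ['0.12', 'T', '0.00'],
    },
    dtype='string',
)

# Selecting a single column with square brackets returns a Series.
precip = sample['PCP']

# The shape of the underlying values is a one-element tuple: (number of rows,).
print(precip.values.shape)
```

With the real `df`, the same column selection would apply. Note that the trace entry stays as the string T at this point, so the column cannot yet be treated as numeric.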