Assignment One Python Primer

Brian Tang | ATM 413 | January 27, 2020

This primer will guide you through example tasks to help you analyze and visualize the data in Assignment 1. I will assume you know basic Python syntax and the functionality of Jupyter Notebooks, e.g., how to add cells, run cells, move cells around, etc. If you do not, please see either the TA or me.

Loading in libraries

We first need to load the libraries that provide the data analysis and visualization functions. The two libraries we need for this assignment are numpy and matplotlib. We will use 'np' and 'mpl' as abbreviations for these two libraries, but you can change the abbreviations to whatever you want.

In [1]:
import numpy as np
import matplotlib as mpl

If we use a sublibrary or function frequently from a parent library, it's convenient to import just that sublibrary or function. For example,

In [2]:
import matplotlib.pyplot as plt

imports the main plotting sublibrary from matplotlib and abbreviates this sublibrary as 'plt'.
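As a quick check, and only as a sketch, the following cell shows that the abbreviation 'plt' and the longer name 'mpl.pyplot' refer to the same sublibrary, so either can be used to call plotting functions.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# 'plt' is just a shorter name for the pyplot sublibrary of matplotlib
print(plt is mpl.pyplot)
```

The abbreviation simply saves typing: plt.scatter is the same function as mpl.pyplot.scatter.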

Finally, we want our plots to show up in the notebook (inline). The following command does just that.

In [3]:
%matplotlib inline

Inputting your data

The first task is to input your data. If you do not have a lot of data, then you can input the numbers manually.

I am going to create some fake data: three variables with five elements for each variable. For each variable, I declare a numpy array and enter my elements. Let's say my data consists of day of the month, daily high temperature (degrees F), and number of people I observe wearing scarves over the course of the day.

In [4]:
#day of the month
day = np.array([1, 2, 3, 4, 5])

#daily high temperature (degrees F)
temperature = np.array([30, 25, 28, 43, 31])

#number of people wearing scarves
scarves = np.array([56, 78, 82, 15, 43])

You can use the print command to see the data that you've entered. For example,

In [5]:
print(temperature)
[30 25 28 43 31]
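Before plotting, it can help to confirm that every variable has the same number of elements, since a scatterplot pairs the arrays element by element. The cell below is a minimal sketch of such a check using the fake data from above; the variable name 'same_length' is my own choice for illustration.

```python
import numpy as np

# same fake data as above
day = np.array([1, 2, 3, 4, 5])
temperature = np.array([30, 25, 28, 43, 31])
scarves = np.array([56, 78, 82, 15, 43])

# number of elements in each array
print(np.size(day), np.size(temperature), np.size(scarves))

# True only if all three arrays have the same number of elements
same_length = np.size(day) == np.size(temperature) == np.size(scarves)
print(same_length)
```

If the last line prints False, one of the arrays is probably missing an element or has an extra one.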

Note: What if you had thousands of elements and hundreds of variables? Entering the data manually would not be practical, and the data would likely be contained in a file. A useful library for reading a variety of data files is Pandas, but you won't need to use it here.

Declare numpy arrays with elements from your assignment data. Note that you can enter numbers with exponential notation using 'E' (e.g., $2.00 \times 10^9$ as 2.00E9).
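Here is a short sketch of the exponential notation. The numbers are made up purely to illustrate the syntax and are not from the assignment data.

```python
import numpy as np

# made-up values, only to illustrate the 'E' notation
values = np.array([2.00E9, 3.5E-3, 1E2])

# 2.00E9 is the same number as 2.00 x 10^9
print(values[0] == 2.00 * 10**9)

# numbers written with 'E' are stored as floating-point numbers
print(values.dtype)
```

Note that a number written with 'E' is a floating-point number even when it has no fractional part, e.g., 1E2 is 100.0 rather than the integer 100.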

Visualizing your data: Scatterplots

A common way to visualize data and assess relationships between variables (e.g., daily high temperature versus number of people wearing scarves) is to generate a scatterplot. The general recipe for creating any plot is

  • Create your plotting handle
  • Generate your plot
  • Alter the aesthetics of your plot to your liking (e.g., axis labels, title, font sizes, legends, etc.)
  • Show the plot

We will first generate a very basic scatterplot with three lines of code.

In [6]:
#create plotting handle
plt.figure(figsize=(8,6)) #you can alter these numbers to change the size of your plot (width, height)

#generate scatter plot
plt.scatter(temperature,scarves) #I want temperature on the x-axis and scarves on the y-axis

#show plot
plt.show()

This works for a quick glance at the data, and matplotlib has made no-frills, default choices about the plot presentation for you. If you're presenting this plot, you will likely want to spruce it up. We can pass additional arguments to the scatter function and call additional pyplot functions to change the aesthetics of the plot. Let's create a spruced-up scatterplot with bigger symbols, axis labels, different axis limits, and bigger font sizes.

In [7]:
#create plotting handle
plt.figure(figsize=(8,6))

#generate scatter plot
plt.scatter(temperature,scarves,s=150) #added 's=150' as a third argument, which specifies the marker size that I want

#specify x-axis to go from 0 F to 50 F
plt.xlim(0,50)

#label x-axis and set fontsize to 14
plt.xlabel('Temperature (degrees F)',fontsize=14)

#change fontsize of x-axis numbers to 14
plt.xticks(fontsize=14)

#specify y-axis to go from 0 to 100 people
plt.ylim(0,100)

#label y-axis and set fontsize to 14
plt.ylabel('Num. people wearing scarves',fontsize=14)

#change fontsize of y-axis numbers to 14
plt.yticks(fontsize=14)

#show plot
plt.show()

Let's add another piece of information to the plot. Say I want to label each point with the day of the month that the point corresponds to. We have to loop through our array elements to label each point one by one using the text function.

In [8]:
#create plotting handle
plt.figure(figsize=(8,6))

#generate scatter plot
plt.scatter(temperature,scarves,s=150)

#label points with the day of the month
for n in range(np.size(day)):
    plt.text(temperature[n],scarves[n],day[n],fontsize=14)
    
#everything else is the same as before
plt.xlim(0,50)
plt.xlabel('Temperature (degrees F)',fontsize=14)
plt.xticks(fontsize=14)
plt.ylim(0,100)
plt.ylabel('Num. people wearing scarves',fontsize=14)
plt.yticks(fontsize=14)
plt.show()

In the for loop above, the index n runs from 0 to one less than the size of the day array (5 - 1 = 4), i.e., n = 0, 1, 2, 3, and 4. The text function writes day[n] at the point (temperature[n], scarves[n]). For example, for n=0 (the first element), '1' is written at the point (30, 56).
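To see what the loop is doing without drawing anything, the sketch below repeats the same loop but prints the index and the corresponding elements instead of calling the text function. The list 'indices' is only there for illustration.

```python
import numpy as np

# same fake data as above
day = np.array([1, 2, 3, 4, 5])
temperature = np.array([30, 25, 28, 43, 31])
scarves = np.array([56, 78, 82, 15, 43])

# the values that n takes in the loop
indices = list(range(np.size(day)))
print(indices)

# same loop as above, but printing instead of plotting
for n in range(np.size(day)):
    print(n, day[n], temperature[n], scarves[n])
```

Each printed line shows n followed by the label and the (x, y) location that the text function would use for that point.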

Make scatterplots with your assignment data. To help you answer questions 4 and 5, you might try to label your points with the corresponding PDSI, or you might look into the scatter function for options on coloring and/or sizing your points to scale with the PDSI.
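As one possible starting point, the sketch below colors each point by a third variable using the 'c' and 'cmap' arguments of the scatter function and adds a color bar. Since the PDSI values are not part of this primer, I use the day array as a stand-in for the third variable; the names 'points' and 'cbar' and the choice of the 'viridis' color map are my own assumptions, so adapt them to your data.

```python
import numpy as np
import matplotlib.pyplot as plt

# same fake data as above; 'day' is a stand-in for a third variable such as PDSI
day = np.array([1, 2, 3, 4, 5])
temperature = np.array([30, 25, 28, 43, 31])
scarves = np.array([56, 78, 82, 15, 43])

#create plotting handle
plt.figure(figsize=(8,6))

#generate scatter plot, coloring each point by the third variable
points = plt.scatter(temperature,scarves,s=150,c=day,cmap='viridis')

#add a color bar so the colors can be read off
cbar = plt.colorbar(points)
cbar.set_label('Day of the month',fontsize=14)

#label axes as before
plt.xlabel('Temperature (degrees F)',fontsize=14)
plt.ylabel('Num. people wearing scarves',fontsize=14)

#show plot
plt.show()
```

The 's' argument also accepts an array, so the marker sizes can be scaled with a third variable in a similar way.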