
# Pandas: Working with a JSON file

---

## Overview
In this notebook, we will create a [Pandas DataFrame](https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe) from a remotely served [JSON](https://www.json.org/) file. This particular file contains forecasted [solar wind](https://www.swpc.noaa.gov/phenomena/solar-wind) parameters from NOAA's [Space Weather Prediction Center](https://www.swpc.noaa.gov).

1. Read in a JSON file
1. Reformat the `DataFrame`
1. Visualize the dataset

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Pandas](https://foundations.projectpythia.org/core/pandas/pandas.html) | Necessary | |

- **Time to learn**: 10 minutes


---

## Imports

In [None]:
import pandas as pd

## Read in a JSON file

NOAA's SWPC provides a variety of forecast output in JSON format. Here, we use Pandas' `read_json` function to create a `DataFrame` from the current 1-day plasma forecast.

In [None]:
df = pd.read_json("https://services.swpc.noaa.gov/products/solar-wind/plasma-1-day.json")

Examine the `DataFrame`:

In [None]:
df
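
To see why the header ends up in the first row, it can help to run `read_json` on a small in-memory string. The sketch below assumes the file is a JSON array of arrays whose first inner array holds the column names, which is what the output above suggests. The values are made up for illustration and are not real SWPC data.

```python
from io import StringIO

import pandas as pd

# Made-up sample mimicking an array-of-arrays layout (not real SWPC data).
sample_json = """[
  ["time_tag", "density", "speed", "temperature"],
  ["2024-01-01 00:00:00.000", "5.12", "402.3", "81234"],
  ["2024-01-01 00:01:00.000", "5.08", "401.9", "80977"]
]"""

# Wrapping the string in StringIO lets read_json treat it like a file.
sample_df = pd.read_json(StringIO(sample_json))

print(sample_df)
# The column labels are plain integers; the intended names sit in row 0.
print(list(sample_df.columns))
```

Because the JSON carries no separate notion of a header, Pandas has no way to know that the first inner array is special, so it treats it as data.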

## Reformat the `DataFrame`

Notice that the column names appear in the `DataFrame`'s first row instead of in the column labels. Let's promote that row to the header.

In [None]:
# Set the columns to be the values of the first row. Then drop that first row.
df = df.rename(columns=df.iloc[0]).drop(df.index[0])

Examine the reformatted `DataFrame`:

In [None]:
df
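
An alternative way to promote the first row, shown here on a small made-up frame, is to slice off the data rows and assign the header values with `set_axis`. This is only a sketch of an alternative; the `rename`/`drop` approach used above produces the same column labels.

```python
import pandas as pd

# Made-up frame whose first row holds the intended column names.
raw = pd.DataFrame(
    [
        ["time_tag", "density", "speed", "temperature"],
        ["2024-01-01 00:00:00.000", "5.12", "402.3", "81234"],
        ["2024-01-01 00:01:00.000", "5.08", "401.9", "80977"],
    ]
)

# Approach used in this notebook: map old labels to row-0 values, then drop row 0.
via_rename = raw.rename(columns=raw.iloc[0]).drop(raw.index[0])

# Alternative: keep rows 1 onward and assign the header values as column labels.
via_set_axis = raw.iloc[1:].set_axis(raw.iloc[0].tolist(), axis="columns")

print(list(via_rename.columns))
print(list(via_set_axis.columns))
```

Either way, the remaining rows keep their original integer index labels (starting at 1), which does not matter here because the index is replaced in the next step.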

### Set the `DataFrame`'s index to the timestamp column

In [None]:
df.index

Currently, the `DataFrame` has a *default index* (i.e., a range of integers). For time series data (i.e., data in which time is the independent variable), it is [good practice](https://pandas.pydata.org/docs/user_guide/timeseries.html) to use a time-based column as the index.

In [None]:
df = df.set_index('time_tag')

In [None]:
df
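
The payoff of a time-based index comes once its values are real datetimes (that conversion happens in the next step). As a rough illustration on made-up data, a `DatetimeIndex` allows selecting rows by time strings and aggregating over fixed time windows. The frame name `demo` and its values are hypothetical.

```python
import pandas as pd

# Made-up one-minute series; not real solar wind data.
times = pd.date_range("2024-01-01 00:00", periods=6, freq="1min")
demo = pd.DataFrame(
    {"speed": [400.0, 402.0, 401.0, 405.0, 407.0, 406.0]},
    index=times,
)

# Select a time window by label; both endpoints are included.
first_three = demo.loc["2024-01-01 00:00":"2024-01-01 00:02"]

# Average the values over consecutive 3-minute windows.
three_min_mean = demo["speed"].resample("3min").mean()

print(first_three)
print(three_min_mean)
```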

### Check and edit the `dtypes` of the independent and dependent variables

In this case, the `DataFrame`'s index corresponds to the independent variable, and the columns correspond to the dependent variables.

In [None]:
df.index

In [None]:
df.dtypes

They are all `object`s, and as a result are not amenable to typical time-series visualization methods. Change them to more appropriate `dtype`s: `float32` for the dependent variables, and `datetime64` for the time-based index.

In [None]:
# Convert each data column from strings to 32-bit floats.
for col in df.columns:
    df[col] = df[col].astype("float32")
# Parse the string timestamps in the index into datetimes.
df.index = pd.to_datetime(df.index)
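
`astype("float32")` raises an error if a value cannot be parsed as a number. If a feed contained missing or malformed entries, one possible alternative is `pd.to_numeric` with `errors="coerce"`, which turns values it cannot parse into `NaN`. The frame below is made up to show that behavior; whether the SWPC file ever needs this is an assumption, not something checked in this notebook.

```python
import pandas as pd

# Made-up string data with one missing and one malformed entry.
messy = pd.DataFrame(
    {
        "density": ["5.12", None, "5.08"],
        "speed": ["402.3", "401.9", "bad"],
    },
    index=[
        "2024-01-01 00:00:00.000",
        "2024-01-01 00:01:00.000",
        "2024-01-01 00:02:00.000",
    ],
)

# Coerce unparseable values to NaN, then downcast to 32-bit floats.
clean = messy.apply(pd.to_numeric, errors="coerce").astype("float32")

# Parse the string timestamps in the index into datetimes.
clean.index = pd.to_datetime(clean.index)

print(clean)
print(clean.dtypes)
```

Coercion hides bad input rather than reporting it, so it is worth counting the resulting `NaN` values (for example with `clean.isna().sum()`) before plotting.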

## Visualize the dataset

In [None]:
df.temperature.plot(figsize=(10, 8));

---

## Summary
Pandas has several reader functions for differently formatted tabular datasets. In this notebook, we created a `DataFrame` via Pandas' `read_json` function, and then manipulated the `DataFrame` to allow for a useful time-series visualization.

<div class="admonition alert alert-warning">
    <p class="admonition-title" style="font-weight:bold">Note:</p>
    There is no strict format specification for how tabular data is laid out in JSON files. The strategy we followed to create and reformat the <code>DataFrame</code> in this notebook will likely need to change for other JSON-formatted datasets you may encounter!
</div>

### What's next?
Future [Project Pythia Foundations](https://foundations.projectpythia.org) Pandas notebooks will explore additional file format-specific reader methods.

## Resources and references
1. [pandas](https://pandas.pydata.org)
1. [JSON](https://www.json.org/)
1. [NOAA SWPC](https://www.swpc.noaa.gov)