{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6e124235-3846-4fcb-b533-10fa5856b4b4",
   "metadata": {},
   "source": [
    "\n",
    "# Pandas: Working with a JSON file"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e75bb80-da84-47a9-ae2d-210eb06d492e",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d226e98-85a1-4197-a757-448bd2bf4563",
   "metadata": {},
   "source": [
    "## Overview\n",
    "In this notebook, we will create a [Pandas Dataframe](https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe) from a remotely-served [JSON](https://www.json.org/) file. This particular file contains forecasted [solar wind](https://www.swpc.noaa.gov/phenomena/solar-wind) parameters from NOAA's [Space Weather Prediction Center](https://www.swpc.noaa.gov).\n",
    "\n",
    "1. Read in a JSON file\n",
    "1. Reformat the `Dataframe`\n",
    "1. Visualize the dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "daeabc7d-d4f6-4f9f-aad7-679123b9d2fb",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "| Concepts | Importance | Notes |\n",
    "| --- | --- | --- |\n",
    "| [Pandas](https://foundations.projectpythia.org/core/pandas/pandas.html) | Necessary | |\n",
    "\n",
    "- **Time to learn**: 10 minutes\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b875611-44ef-4b1f-9453-f2fa84bb4d82",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c732c9d1-0e00-4d9b-8e29-73ce71c99499",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d48491f-4332-4eff-af72-0382c6c5794a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "690e69eb-5ec0-458d-962f-a25cdf8983dc",
   "metadata": {},
   "source": [
    "## Read in a JSON file"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "630ace3c-7e81-45a8-a884-a045d7afbae6",
   "metadata": {},
   "source": [
    "### NOAA's SWPC has a variety of forecast output in JSON format. Here, we create a `Dataframe` Pandas' `read_json` method from the current 1-day plasma forecast."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3be45f7e-8f1b-4645-8eff-6d7d0c6976a4",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df = pd.read_json(\"https://services.swpc.noaa.gov/products/solar-wind/plasma-1-day.json\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2128b21f-7d34-45f6-b08f-23287a7761ea",
   "metadata": {},
   "source": [
    "Examine the `Dataframe`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "deea1ad0-78ef-4b47-8ed4-951b2f8e2a5b",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b34e2ce-19a9-46b6-a346-da9cc814a8a6",
   "metadata": {},
   "source": [
    "## Reformat the `Dataframe`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61ff9856-2a64-4606-8c13-4cc1bbe6f384",
   "metadata": {},
   "source": [
    "Notice that the column headers look to be in the `Dataframe`'s first row. Let's modify it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d198bce-c48e-4c77-9785-ab26bfaac669",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Set the columns to be the values of the first row. Then drop that first row.\n",
    "df = df.rename(columns=df.iloc[0]).drop(df.index[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3c8f341-a5f6-46f0-ac1d-2d1feeb4631d",
   "metadata": {},
   "source": [
    "Examine the reformatted `Dataframe`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a457cb3-6203-45d5-9467-c8c5ffc07e52",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43bfed63-0fa2-4d3a-8a0b-162386dace6f",
   "metadata": {},
   "source": [
    "### Set the `Dataframe`'s index to the timestamped column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e1f2933-6193-4f2d-b012-7c5f18a3999b",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df.index"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b3ade9e-eebd-49d0-a321-624939d42d7d",
   "metadata": {},
   "source": [
    "Currently, the `Dataframe` has a *default index* (i.e., a range of integers). For time series data (i.e., time is the independent variable), it is [good practice](https://pandas.pydata.org/docs/user_guide/timeseries.html) to use a time-based column as the index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83cdc00a-2a38-4ef7-ac33-c2f6875c2dc3",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df = df.set_index('time_tag')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9abc6605-9676-48ef-a9c5-57ef13b21eed",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d4e6843-638b-4801-af35-9ddbc191378a",
   "metadata": {},
   "source": [
    "### Check and edit the `dtypes` of the independent and dependent variables"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7fc15d05-b3cb-4575-a4c5-c25123613bf7",
   "metadata": {},
   "source": [
    "In this case, the `Dataframe`'s index corresponds to the independent variable, and the columns correspond to the dependent variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8341c2d-7d4b-4960-93e6-b90670aab9a3",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e165dc3-1a51-406d-9b31-b92a16721a1d",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90c69195-8111-4c0f-bb6f-17aa036b7147",
   "metadata": {
    "tags": []
   },
   "source": [
    "They are all `object`s ... and as a result won't be amenable to typical time-series visualization methods. Change them to more appropriate `dtype`s ... `float32` for the dependent variables, and `datetime64` for the time-based one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32beea01-3390-4eb6-9d62-090ad70ba2a9",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "for col in df.columns:\n",
    "    df[col] = df[col].astype(\"float32\")\n",
    "df.index = pd.to_datetime(df.index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eba917fd-6167-4726-ae0f-79c064d4ef72",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df.index = pd.to_datetime(df.index)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38611e12-3f66-4e39-ac83-2021df2bf63e",
   "metadata": {},
   "source": [
    "## Visualize the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "634749f1-f613-4e90-aff8-6efdd038d251",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df.temperature.plot(figsize=(10, 8));"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d085493-0e98-4e6f-a190-a948bd44f53a",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a43f367-1bdf-46b6-b253-f40ff965ff02",
   "metadata": {},
   "source": [
    "## Summary\n",
    "Pandas has several reader functions for differently-formatted tabular datasets. In this notebook, we created a `Dataframe` via Pandas `read_json` function, and then manipulated the `Dataframe` to allow for a useful time-series visualization."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "419af5fe-4cfc-466a-a4b0-1df66ddae8f8",
   "metadata": {},
   "source": [
    "<div class=\"admonition alert alert-warning\">\n",
    "    <p class=\"admonition-title\" style=\"font-weight:bold\">Note:</p>\n",
    "    There is no strict format specification for JSON files. The strategy we followed to create and reformat the <code>Dataframe</code> in this notebook will likely need to change for other JSON-formatted datasets you may encounter!\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62c4be7e-84a8-497d-a799-a74e5201c567",
   "metadata": {},
   "source": [
    "### What's next?\n",
    "Future [Project Pythia Foundations](https://foundations.projectpythia.org) Pandas notebooks will explore additional file format-specific reader methods."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd8c3c60-d703-4b22-b796-8c385130cfa2",
   "metadata": {},
   "source": [
    "## Resources and references\n",
    "1. [pandas](https://pandas.pydata.org)\n",
    "1. [JSON](https://json.io)\n",
    "1. [NOAA SWPC](https://www.swpc.noaa.gov)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 August 2023 Environment",
   "language": "python",
   "name": "aug23"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}