code,  covid-19

Local Covid-19 Dashboard

When I need a Data Science tool beyond vanilla SQL I usually reach for RStudio. There is no denying that R is fast and painless for doing statistical analysis, but it’s quirky and falls far out of the norms of coding and that’s always bugged me. So I’ve been debating making a switch.

Lately, I’ve been experimenting with Python programming. Python is a very popular general-purpose programming language that is used in Data Science, but it’s also used to make apps, websites, and services. I’m particularly interested in using this language along with AWS services to publish small web APIs and things like that. But, I need to learn the language first.

The best way to learn a new programming language is to give yourself a project. These days I’ve been obsessed with the Covid-19 pandemic numbers because I am watching these closely to inform our decisions about sending my kid back to school. So I decided to code a Covid-19 dashboard in Python.

To use Python as a Data Science tool you will need to install Anaconda. This will install Python itself, the core Data Science packages like Pandas, and it will install IDE and dashboard solutions. To code the Covid-19 dashboard, I used a tool called Jupyter Notebook. Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.

How I’m Coding the Covid-19 Dashboard

The key to building this dashboard is the availability of web APIs that publish Covid-19 data in a format that is easy to keep up to date. We have APIs for the entire US and each individual state as well as each county. Data for the US and states is very comprehensive, but the data we are getting back at the county level is sparse.

So here is an example of how you can use Python and Pandas to grab data from the API.

import pandas as pd
import requests
from IPython.display import display, HTML
from pandas import DataFrame

us = requests.get('https://covidtracking.com/api/v1/us/daily.json')
us = us.text
us = pd.read_json(us)
us

Here is what the output of that code would look like:

Covid-19 Rates (United States)

You get so much data back that it is hard to look at. We can only see about a third of the columns in this data frame. I want to see a subset of this data and I also want to derive so stats like positivity and death rates. Since I know that I need this for the entire US and nearby states like PA and NJ I decided to write a function that cleans and adds to the data.

def clean_covidtracking_api_data(df):
    
    # get subset
    df = df[['date','positiveIncrease','totalTestResultsIncrease'
           ,'hospitalizedCurrently','death','deathIncrease','positive']]
    
    # Rename dataset columns
    df = df.rename(columns={'positiveIncrease':'positive'
                              ,'totalTestResultsIncrease': 'tests'
                              ,'hospitalizedCurrently': 'hospitalized'
                              ,'death': 'death_total'
                              ,'deathIncrease': 'death'
                              ,'positive': 'positive_total'})
    
    # Harmonize date column
    df['date'] = df['date'].astype(str)
    df['date'] = pd.to_datetime(pd.Series(df['date']), format="%Y%m%d")
    
    # Derived Columns
    # Positivity Rate
    positive = df.loc[:,['positive']].values[0:]
    tests = df.loc[:,['tests']].values[0:]
    df['pos_rate'] = positive / tests * 100

    # Death Rate
    deaths = df.loc[:,['death_total']].values[0:]
    positive = df.loc[:,['positive_total']].values[0:]
    df['death_rate'] = (deaths / positive) * 100
    df['death_rate_est'] = (deaths / (positive * 10)) * 100
    
    return df

Now we can apply this function to our US dataset like this:

us = clean_covidtracking_api_data(us)
us

This gives us a much cleaner dataset.

We do have some missing data and that could be cleaned up a little further, but in interest of time I’ve left that alone for now.

There is much more to the dashboard, for instance I have a plot that shows you trends. The point of this is to give some context for these numbers. I want to be able to answer questions like “are we doing better or worse in terms of deaths, hospitalizations, and positivity rates?”.

PA Positivity Rates

In the dashboard, I do a bunch of other stuff that you might find interesting if you are looking to do something like this. You can check out the complete source code and see the dashboard here in my GitHub account.

Matt has worked as a data analyst, writer, counselor, and business owner for a total of 20 years. Since the start of his career he's been fascinated by technology and passionate about helping people use modern technology to hack their work and their lives.

Leave a Reply

Your email address will not be published. Required fields are marked *