When I need a Data Science tool beyond vanilla SQL I usually reach for RStudio. There is no denying that R is fast and painless for doing statistical analysis, but it’s quirky and falls far out of the norms of coding and that’s always bugged me. So I’ve been debating making a switch.
Lately, I’ve been experimenting with Python programming. Python is a very popular general-purpose programming language that is used in Data Science, but it’s also used to make apps, websites, and services. I’m particularly interested in using this language along with AWS services to publish small web APIs and things like that. But, I need to learn the language first.
The best way to learn a new programming language is to give yourself a project. These days I’ve been obsessed with the Covid-19 pandemic numbers because I am watching these closely to inform our decisions about sending my kid back to school. So I decided to code a Covid-19 dashboard in Python.
To use Python as a Data Science tool you will need to install Anaconda. This will install Python itself, the core Data Science packages like Pandas, and it will install IDE and dashboard solutions. To code the Covid-19 dashboard, I used a tool called Jupyter Notebook. Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
How I’m Coding the Covid-19 Dashboard
The key to building this dashboard is the availability of web APIs that publish Covid-19 data in a format that is easy to keep up to date. We have APIs for the entire US and each individual state as well as each county. Data for the US and states is very comprehensive, but the data we are getting back at the county level is sparse.
So here is an example of how you can use Python and Pandas to grab data from the API.
import pandas as pd import requests from IPython.display import display, HTML from pandas import DataFrame us = requests.get('https://covidtracking.com/api/v1/us/daily.json') us = us.text us = pd.read_json(us) us
Here is what the output of that code would look like:

You get so much data back that it is hard to look at. We can only see about a third of the columns in this data frame. I want to see a subset of this data and I also want to derive so stats like positivity and death rates. Since I know that I need this for the entire US and nearby states like PA and NJ I decided to write a function that cleans and adds to the data.
def clean_covidtracking_api_data(df): # get subset df = df[['date','positiveIncrease','totalTestResultsIncrease' ,'hospitalizedCurrently','death','deathIncrease','positive']] # Rename dataset columns df = df.rename(columns={'positiveIncrease':'positive' ,'totalTestResultsIncrease': 'tests' ,'hospitalizedCurrently': 'hospitalized' ,'death': 'death_total' ,'deathIncrease': 'death' ,'positive': 'positive_total'}) # Harmonize date column df['date'] = df['date'].astype(str) df['date'] = pd.to_datetime(pd.Series(df['date']), format="%Y%m%d") # Derived Columns # Positivity Rate positive = df.loc[:,['positive']].values[0:] tests = df.loc[:,['tests']].values[0:] df['pos_rate'] = positive / tests * 100 # Death Rate deaths = df.loc[:,['death_total']].values[0:] positive = df.loc[:,['positive_total']].values[0:] df['death_rate'] = (deaths / positive) * 100 df['death_rate_est'] = (deaths / (positive * 10)) * 100 return df
Now we can apply this function to our US dataset like this:
us = clean_covidtracking_api_data(us) us
This gives us a much cleaner dataset.

We do have some missing data and that could be cleaned up a little further, but in interest of time I’ve left that alone for now.
There is much more to the dashboard, for instance I have a plot that shows you trends. The point of this is to give some context for these numbers. I want to be able to answer questions like “are we doing better or worse in terms of deaths, hospitalizations, and positivity rates?”.

In the dashboard, I do a bunch of other stuff that you might find interesting if you are looking to do something like this. You can check out the complete source code and see the dashboard here in my GitHub account.