datapointsr

datapointsr is an R package that I wrote while I was working at College Board. It makes it a little easier to QC and work with statistical summary tables. datapointsr acts as a wrapper for sqldf and reshape, and it also puts statistical tables into a standard long format (categories, variables, values). The idea is that reusable functions will be easier to build later on if I can assume the data will be in this format.

I’m moving toward making this work more like the dplyr package, with an emphasis on verbs.

You can install datapointsr from GitHub using the devtools package like this:

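# install.packages("devtools")  # run once if devtools is not installed
library(devtools)
# The repo is given as "user/repo"; no separate username argument is needed
install_github('mattjcamp/datapointsr')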

Here is an example of how to use datapointsr. First, make a dataset to work on. Below, I summarize the built-in quakes dataset with dplyr.

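library(dplyr)  # provides %>%, group_by() and summarise()

# Mean and standard deviation of magnitude and depth per station count
quakes_summary <- quakes %>% 
    group_by(stations) %>% 
    summarise(mag_mean = mean(mag),
              mag_sd = sd(mag),
              depth_mean = mean(depth),
              depth_sd = sd(depth))

# Show the first six rows
quakes_summary %>% head()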

Here is what the output would look like:

  stations mag_mean    mag_sd depth_mean depth_sd
1       10 4.230000 0.1894591   345.5000 199.3979
2       11 4.228571 0.1843048   338.2857 214.3564
3       12 4.196000 0.2091252   364.4000 201.9051
4       13 4.333333 0.2008316   416.2381 207.2462
5       14 4.276923 0.2019139   312.1282 214.2942
6       15 4.282353 0.1930153   313.4706 188.8933

To use datapointsr, figure out which variables will act as categories and which will act as measure values. Since I grouped by stations, I can treat it as a category, while all the other columns look like measures. So I will use datapoints() to make a long dataset with stations as the category, and then filter it down to the rows I want.

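library(datapointsr)

quakes_summary %>% 
    # Column 1 (stations) is the category; the rest become Variable/Value rows
    datapoints(1) %>% 
    # Keep only the mag_sd rows for stations 10 and 43
    filter_sql("stations IN (10,43) AND Variable = 'mag_sd'")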

Here is what you would see if you ran the code above:

  stations Variable     Value
1       10   mag_sd 0.1894591
2       43   mag_sd 0.2143223
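
If you want to see the long format without installing datapointsr, here is a rough sketch of the same idea using tidyr and dplyr. This is not how datapointsr is implemented: pivot_longer() and filter() are stand-ins I am assuming behave comparably to datapoints(1) and filter_sql() for this table, and quakes_long is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Rebuild the summary table from the built-in quakes dataset
quakes_summary <- quakes %>%
    group_by(stations) %>%
    summarise(mag_mean = mean(mag),
              mag_sd = sd(mag),
              depth_mean = mean(depth),
              depth_sd = sd(depth))

# Assumption: pivot_longer() stands in for datapoints(1).
# Every column except stations becomes a Variable/Value pair.
quakes_long <- quakes_summary %>%
    pivot_longer(cols = -stations,
                 names_to = "Variable",
                 values_to = "Value")

# Assumption: filter() stands in for filter_sql()
quakes_long %>%
    filter(stations %in% c(10, 43), Variable == "mag_sd")
```

Each station ends up with one row per measure, so the long table has four times as many rows as the summary.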

There is also a wide() function that can be used to switch back to wide format. You can see the latest work on datapointsr in this GitHub repo.
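
To give a feel for the trip back to wide format, here is a small sketch using tidyr's pivot_wider(). I am using it only as a stand-in for wide(), whose exact arguments are not shown here, and the tiny long table is typed in by hand from the summary output earlier in this post.

```r
library(tidyr)

# A small long-format table, using values from the summary output above
quakes_long <- data.frame(
    stations = c(10, 10, 11, 11),
    Variable = c("mag_mean", "mag_sd", "mag_mean", "mag_sd"),
    Value    = c(4.230000, 0.1894591, 4.228571, 0.1843048)
)

# Assumption: pivot_wider() stands in for wide().
# Each distinct Variable becomes its own column again.
quakes_wide <- pivot_wider(quakes_long,
                           names_from = Variable,
                           values_from = Value)

quakes_wide
```

The result has one row per station and one column per measure, which is the shape the summary table started in.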

Matt has worked as a data analyst, writer, counselor, and business owner for a total of 20 years. Since the start of his career he's been fascinated by technology and passionate about helping people use modern technology to hack their work and their lives.
