datapointsr is an R package that I wrote while I was working at College Board. It makes it a little easier to QC and work with statistical summary tables. datapointsr acts as a wrapper for sqldf and reshape, and it also puts statistical tables into a standard format (categories, variables, values). The idea is that reusable functions will be easier to build later on if I can assume that the data is always in this format.
I’m moving toward making this work more like the dplyr package, with an emphasis on verbs.
You can install datapointsr from GitHub using the devtools package like this:
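library(devtools)
# The repository is given as 'user/repo'; current devtools versions treat a
# second positional argument as a branch or tag, so the username is not repeated.
install_github('mattjcamp/datapointsr')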
Here is an example of how to use datapointsr. First, make a dataset to work on. Below I summarized the built-in quakes dataset.
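library(dplyr)  # provides %>%, group_by() and summarise()

# One row per station count, with the mean and SD of magnitude and depth
quakes_summary <- quakes %>%
  group_by(stations) %>%
  summarise(mag_mean = mean(mag),
            mag_sd = sd(mag),
            depth_mean = mean(depth),
            depth_sd = sd(depth))

quakes_summary %>% head()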
Here is what the output would look like:
  stations mag_mean    mag_sd depth_mean depth_sd
1       10 4.230000 0.1894591   345.5000 199.3979
2       11 4.228571 0.1843048   338.2857 214.3564
3       12 4.196000 0.2091252   364.4000 201.9051
4       13 4.333333 0.2008316   416.2381 207.2462
5       14 4.276923 0.2019139   312.1282 214.2942
6       15 4.282353 0.1930153   313.4706 188.8933
To use datapointsr, figure out which variables will act as categories and which will act as measure values. Since I grouped by stations, I can treat it as a category, while all the other columns look like measures. So I will use datapoints to make a long dataset with stations as the category.
quakes_summary %>%
  datapoints(1) %>%
  filter_sql("stations IN (10,43) AND Variable = 'mag_sd'")
Here is what you would see if you ran the code above:
  stations Variable     Value
1       10   mag_sd 0.1894591
2       43   mag_sd 0.2143223
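To make the long (category, Variable, Value) layout concrete, here is a rough sketch of the same reshape and filter done with base R only. This is not how datapointsr is implemented; names such as measures, quakes_long and result are just for illustration.

```r
# Per-station means and standard deviations, computed with base R.
means <- aggregate(cbind(mag, depth) ~ stations, data = quakes, FUN = mean)
sds   <- aggregate(cbind(mag, depth) ~ stations, data = quakes, FUN = sd)

quakes_summary <- data.frame(
  stations   = means$stations,
  mag_mean   = means$mag,
  mag_sd     = sds$mag,
  depth_mean = means$depth,
  depth_sd   = sds$depth
)

# Every column except the category column is treated as a measure.
measures <- setdiff(names(quakes_summary), "stations")

# Stack the measure columns into Variable/Value pairs, repeating the
# category column once per measure (illustrative, not the package's code).
quakes_long <- data.frame(
  stations = rep(quakes_summary$stations, times = length(measures)),
  Variable = rep(measures, each = nrow(quakes_summary)),
  Value    = unlist(quakes_summary[measures], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Same filter as the filter_sql() call, written as a base R subset.
result <- subset(quakes_long, stations %in% c(10, 43) & Variable == "mag_sd")
result
```

Once everything is in three columns like this, one filter or QC function can work on any summary table, whatever measures it started with.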
There is also a wide() function that can be used to switch back to wide format. You can see the latest work on datapointsr in this GitHub repo.
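If you want to see what going back to wide format looks like without installing the package, here is a small sketch using base R's reshape() function. It only stands in for the idea behind wide(); it does not use datapointsr's code, and the object names long and wide_tbl are made up for this example.

```r
# Build a small long table (stations, Variable, Value) from the quakes data.
means <- aggregate(cbind(mag, depth) ~ stations, data = quakes, FUN = mean)

long <- data.frame(
  stations = rep(means$stations, times = 2),
  Variable = rep(c("mag_mean", "depth_mean"), each = nrow(means)),
  Value    = c(means$mag, means$depth),
  stringsAsFactors = FALSE
)

# Pivot back to one row per station and one column per variable.
wide_tbl <- reshape(long,
                    idvar = "stations",
                    timevar = "Variable",
                    direction = "wide")

# reshape() prefixes the new columns with "Value."; strip that prefix.
names(wide_tbl) <- sub("^Value\\.", "", names(wide_tbl))
rownames(wide_tbl) <- NULL

head(wide_tbl)
```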