

datapointsr is an R package that I wrote while I was working at the College Board. It makes it a little easier to QC and work with statistical summary tables. datapointsr acts as a wrapper for sqldf and reshape, and it also puts statistical tables into a standard format (categories, variables, values). The idea is that reusable functions will be easier to build later on if I can assume that the data is always in this format.

I’m moving toward making this work more like the dplyr package, with an emphasis on verbs.

You can install datapointsr from GitHub using the devtools package.


Here is an example of how to use datapointsr. First, make a dataset to work on. Below, I summarize the built-in quakes dataset with dplyr.

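library(dplyr)  # provides %>%, group_by(), and summarise()

# Mean and standard deviation of magnitude and depth for each value of stations
quakes_summary <- quakes %>%
    group_by(stations) %>%
    summarise(mag_mean = mean(mag),
              mag_sd = sd(mag),
              depth_mean = mean(depth),
              depth_sd = sd(depth))

quakes_summary %>% head()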

Here is what the output would look like:

  stations mag_mean    mag_sd depth_mean depth_sd
1       10 4.230000 0.1894591   345.5000 199.3979
2       11 4.228571 0.1843048   338.2857 214.3564
3       12 4.196000 0.2091252   364.4000 201.9051
4       13 4.333333 0.2008316   416.2381 207.2462
5       14 4.276923 0.2019139   312.1282 214.2942
6       15 4.282353 0.1930153   313.4706 188.8933

To use datapointsr, decide which variables will act as categories and which will act as measure values. Since I grouped by stations, I can treat it as a category, while all the other columns are measures. So I will use datapoints() to make a long dataset with stations as the category column.

library(datapointsr)

# Reshape to long format, then keep two stations and one variable
quakes_summary %>%
    datapoints(1) %>%
    filter_sql("stations IN (10,43) AND Variable = 'mag_sd'")

Here is what you would see if you ran the code above:

  stations Variable     Value
1       10   mag_sd 0.1894591
2       43   mag_sd 0.2143223
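
If you want to see what the long layout amounts to without installing anything, here is a rough base-R sketch of the same idea. It is not how datapointsr is implemented; to_long() is a hypothetical helper I made up for illustration, and the summary is rebuilt with aggregate() and merge() so that the snippet runs on its own.

```r
# Base-R sketch of the long "category / Variable / Value" layout.
# NOT datapointsr's implementation; to_long() is a hypothetical helper.

# Rebuild the per-stations summary with base R only.
means <- aggregate(cbind(mag, depth) ~ stations, data = quakes, FUN = mean)
sds   <- aggregate(cbind(mag, depth) ~ stations, data = quakes, FUN = sd)
names(means)[-1] <- c("mag_mean", "depth_mean")
names(sds)[-1]   <- c("mag_sd", "depth_sd")
quakes_summary <- merge(means, sds, by = "stations")

# Stack every non-category column into Variable/Value pairs.
to_long <- function(df, category_cols) {
  measure_cols <- setdiff(names(df), names(df)[category_cols])
  pieces <- lapply(measure_cols, function(v) {
    data.frame(df[category_cols], Variable = v, Value = df[[v]],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

quakes_long <- to_long(quakes_summary, 1)

# Same filter as the SQL above, written with subset().
picked <- subset(quakes_long, stations %in% c(10, 43) & Variable == "mag_sd")
print(picked)
```

Every measure column becomes a block of rows labeled by its name, which is why a single filter on Variable can pick out one statistic across all stations.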

There is also a wide() function that switches the data back to wide format. You can see the latest work on datapointsr in this GitHub repo.
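
To make the trip back to wide format concrete, here is a small sketch using base R's stats::reshape(). This only illustrates the long-to-wide idea; it is not the wide() function itself, and I am not assuming anything about how wide() works internally. The values are copied from the output shown earlier.

```r
# Long-to-wide sketch with base R's stats::reshape().
# Illustration only; this is not datapointsr's wide().

# A tiny long table using values from the summary output above.
long <- data.frame(
  stations = c(10, 11, 10, 11),
  Variable = c("mag_mean", "mag_mean", "mag_sd", "mag_sd"),
  Value    = c(4.230000, 4.228571, 0.1894591, 0.1843048),
  stringsAsFactors = FALSE
)

# One row per stations value, one column per distinct Variable.
wide_again <- reshape(long,
                      idvar = "stations",
                      timevar = "Variable",
                      v.names = "Value",
                      direction = "wide")

# reshape() prefixes the new columns with "Value."; strip that prefix.
names(wide_again) <- sub("^Value\\.", "", names(wide_again))

print(wide_again)
```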


Matt is the author of five Apress books including Learn RStudio IDE, Quick, Effective, and Productive Data Science, Objective-C Recipes, Swift Quick Syntax Reference, Objective-C Quick Reference, and the upcoming Pro Data Visualization with R and JavaScript. He has over 20 years of experience in technology, psychometrics, and data analytics working in major higher education institutions such as The College Board and Educational Testing Service. He has earned a Master’s degree in Information Systems Management and a Bachelor’s degree in Quantitative Psychology.