BBS Course: Good Software Engineering Practice for R Packages
Friedrich
February 10, 2023
Motivation
From an idea to a production-grade R package
Example scenario: in your daily work, you notice that you need certain one-off scripts again and again.
The idea of creating an R package is born: you realize that copying and pasting R scripts is inefficient, and you want to share your helpful R functions with colleagues and the world…
Professional Workflow
Photo CC0 by ELEVATE on pexels.com
Typical work steps
Idea
Concept creation
Validation planning
Specification:
User Requirements Spec (URS),
Functional Spec (FS), and
Software Design Spec (SDS)
R package programming
Documented verification
Completion of formal validation
R package release
Use in production
Maintenance
Workflow in Practice
Photo CC0 by Chevanon Photography on pexels.com
Frequently Used Workflow in Practice
Idea
R package programming
Use in production
Bug fixing
Use in production
Bug fixing + Documentation
Use in production
Bug fixing + Further development
Use in production
Bug fixing + …
Bad practice!
Why?
Why practice good engineering?
Cost distribution among software process activities
Let’s assume that you used some lines of code to create simulated data in multiple projects:
dat <- data.frame(
    group = c(rep(1, 50), rep(2, 50)),
    values = c(
        rnorm(n = 50, mean = 8, sd = 12),
        rnorm(n = 50, mean = 14, sd = 11)
    )
)
Idea: put the code into a package
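As a first step towards a package, the snippet can be wrapped in a function so that it no longer needs to be copied between projects. The following is only a minimal sketch: the function name simulateTwoGroups and its argument names are hypothetical, and the defaults simply reproduce the values from the snippet above.

```r
# Hypothetical helper (name and arguments are assumptions, not from the course
# material): wraps the copy-and-paste snippet in a reusable function.
simulateTwoGroups <- function(n = 50, mean1 = 8, sd1 = 12, mean2 = 14, sd2 = 11) {
    data.frame(
        group = c(rep(1, n), rep(2, n)),
        values = c(
            rnorm(n = n, mean = mean1, sd = sd1),
            rnorm(n = n, mean = mean2, sd = sd2)
        )
    )
}

# Same structure as the original snippet: 100 rows, two columns
dat <- simulateTwoGroups()
head(dat)
```

Once the code lives in a function, it can be documented, tested, and moved into the R/ directory of a package.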
Example - Step 2: Design docs
Describe the purpose and scope of the package
Analyse and describe the requirements in clear and simple terms (“prose”)
Obligation level | Key word | Description
Duty | shall | “must have”
Desire | should | “nice to have”
Intention | will | “optional”
Example - Step 2: Design docs
Purpose and Scope
The R package simulatr shall enable the creation of reproducible fake data.
Package Requirements
simulatr shall provide a function to generate normally distributed random data for two independent groups.
The function shall allow flexible definition of sample size per group, mean per group, and standard deviation per group.
The reproducibility of the simulated data shall be ensured via an optional seed.
It should be possible to print the function result.
A graphical presentation of the simulated data will also be possible.
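The “shall” requirements above could translate into a function along the following lines. This is a sketch under assumptions: the function name and argument names are taken from the documentation example in these slides, but the body, the input checks, and the structure of the returned object are illustrative and not necessarily how simulatr implements them.

```r
# Sketch of a generator for two independent, normally distributed groups.
# The returned list structure and the class assignment are assumptions.
getSimulatedTwoArmMeans <- function(n1, n2, mean1, mean2, sd1, sd2, seed = NULL) {
    stopifnot(n1 > 0, n2 > 0, sd1 >= 0, sd2 >= 0)

    # Optional seed: only set it when the user supplies one
    if (!is.null(seed)) {
        set.seed(seed)
    }

    data <- data.frame(
        group = c(rep(1, n1), rep(2, n2)),
        values = c(
            rnorm(n = n1, mean = mean1, sd = sd1),
            rnorm(n = n2, mean = mean2, sd = sd2)
        )
    )

    result <- list(
        n1 = n1, n2 = n2,
        mean1 = mean1, mean2 = mean2,
        sd1 = sd1, sd2 = sd2,
        seed = seed,
        data = data
    )
    class(result) <- "SimulationResult"
    result
}

# Two calls with the same seed should yield identical data
x <- getSimulatedTwoArmMeans(n1 = 50, n2 = 50, mean1 = 5, mean2 = 7,
    sd1 = 3, sd2 = 4, seed = 123)
y <- getSimulatedTwoArmMeans(n1 = 50, n2 = 50, mean1 = 5, mean2 = 7,
    sd1 = 3, sd2 = 4, seed = 123)
identical(x$data, y$data)
```

Returning a classed object rather than a bare data frame keeps the door open for the “should” and “will” requirements, since print and plot methods can be attached to the class later.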
#' @title
#' Print Simulation Result
#'
#' @description
#' Generic function to print a `SimulationResult` object.
#'
#' @param x a \code{SimulationResult} object to print.
#' @param ... further arguments passed to or from other methods.
#'
#' @examples
#' x <- getSimulatedTwoArmMeans(n1 = 50, n2 = 50, mean1 = 5,
#'     mean2 = 7, sd1 = 3, sd2 = 4, seed = 123)
#' print(x)
#'
#' @export
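A documentation block like the one above would typically sit directly above an S3 print method. The following sketch shows what such a method might look like; the object built here with structure() is a hand-made stand-in with an assumed data field, and the printed layout is purely illustrative.

```r
# Hand-made stand-in for a SimulationResult object (the fields are assumptions)
x <- structure(
    list(
        data = data.frame(
            group = c(1, 1, 2, 2),
            values = c(4.2, 5.1, 6.8, 7.3)
        )
    ),
    class = "SimulationResult"
)

# S3 method: print() dispatches here for objects of class "SimulationResult"
print.SimulationResult <- function(x, ...) {
    cat("Simulation result\n")
    counts <- table(x$data$group)
    groupMeans <- tapply(x$data$values, x$data$group, mean)
    for (g in names(groupMeans)) {
        cat(sprintf("  Group %s: n = %d, mean = %.2f\n",
            g, counts[[g]], groupMeans[[g]]))
    }
    # Print methods conventionally return their argument invisibly
    invisible(x)
}

print(x)
```

In a package, the @export tag lets roxygen2 register the method in the NAMESPACE file so that print(x) finds it after the package is loaded.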
Gillespie, C., & Lovelace, R. (2017). Efficient R Programming: A Practical Guide to Smarter Programming. O’Reilly UK Ltd. [Book | Online]
Grolemund, G. (2014). Hands-On Programming with R: Write Your Own Functions and Simulations (1st ed.). O’Reilly and Associates. [Book | Online]
Rupp, C., & SOPHISTen, die. (2009). Requirements-Engineering und -Management: Professionelle, iterative Anforderungsanalyse für die Praxis (5th ed.). Carl Hanser Verlag GmbH & Co. KG. [Book]
Wickham, H. (2015). R Packages: Organize, Test, Document, and Share Your Code (1st ed.). O’Reilly and Associates. [Book | Online]
Wickham, H. (2019). Advanced R, Second Edition. Taylor & Francis Ltd. [Book | Online]
Important: To use this work you must provide the name of the creator (initial author), a link to the material, a link to the license, and indicate if changes were made