# Validation Report for **adoptr** package

*2021-07-27*

# 1 Introduction

This work is licensed under the CC-BY-SA 4.0 license.

## 1.1 Preliminaries

R package validation for regulatory environments can be a
tedious endeavour.
The authors firmly believe that under the current regulation,
there is no such thing as a ‘validated R package’:
validation is by definition a process conducted by the *user*.
This validation report merely aims at facilitating
validation of **adoptr** as
much as possible.
The authors give no warranty whatsoever as to the correctness of **adoptr** or the
completeness of this validation report.

We assume that the reader is familiar with the notation and theoretical
background of **adoptr**.
Otherwise, the following resources might be of help:

- **adoptr** online documentation at https://kkmann.github.io/adoptr/
- a paper on the theoretical background of the core **adoptr** functionality (Pilz et al. 2019)
- a general overview of adaptive designs (Bauer et al. 2015)
- a more extensive treatment of the subject (Wassmer and Brannath 2016).

## 1.2 Scope

**adoptr** itself already makes extensive use of unittesting to
ensure correctness of all implemented functions.
Yet, due to constraints on the build-time for an R package,
the range of scenarios covered in the unittests of **adoptr** is
rather limited.
Furthermore, the current R unittesting framework does not permit
an easy generation of a human-readable report of the test cases
to ascertain coverage and test quality.

Therefore, **adoptr** splits testing into two parts: technical
correctness is ensured via an extensive unittesting suite in **adoptr**
itself (aiming to maintain 100% code coverage).
The validation report, however, runs through a wide range of possible
application scenarios and ensures plausibility of results as well
as consistency with existing methods wherever possible.
The report itself is implemented as a collection of Rmarkdown documents,
which show both the underlying code and the corresponding
output in a human-readable format.

The online version of the report is dynamically re-generated on a
weekly basis based on the respective
most current version of **adoptr** on CRAN.
The latest result of these builds is available at
https://kkmann.github.io/adoptr-validation-report/.
To ensure early warning in case of any test-case failures,
formal tests are implemented using the **testthat** package
(Wickham, R Studio, and R Core Team 2018).
That is, combining a unittesting framework with a continuous
integration and continuous deployment service leads to an always
up-to-date validation report (built on the current R release on Linux).
Any failure of the integrated formal tests will cause the build status
of the validation report to switch from ‘passing’ to ‘failed’ and
the respective maintainer will be notified immediately.

### 1.2.1 Validating a local installation of adoptr

Note that, strictly speaking, the online version of the validation
report only provides evidence of the correctness on the respective
Travis-CI cloud virtual machine infrastructure using the respective
most recent release of R and the most recent versions of the
dependencies available on CRAN.
In some instances it might therefore be desirable to conduct a
local validation of **adoptr**.

To do so, one should install **adoptr** with the `INSTALL_opts` option
to include tests and invoke the test suite locally via

```
install.packages("adoptr", INSTALL_opts = c("--install-tests"))
tools::testInstalledPackage("adoptr", types = c("examples", "tests"))
```

Upon passing the test suite successfully, the validation report can be built locally. To do so, first clone the entire source directory and switch to the newly created folder.

Make sure that all packages required for building the report are
available, i.e., install all dependencies listed in the top-level
`DESCRIPTION` file.

The book can then be built using a terminal command or directly from R.

This produces a new folder `_book` with the html and pdf versions
of the report.

## 1.3 Validation Scenarios

### 1.3.1 Scenario I: Large effect, point prior

This is the default scenario.

- **Data distribution:** Two-armed trial with normally distributed test statistic
- **Prior:** \(\delta\sim\textbf{1}_{\delta=0.4}\)
- **Null hypothesis:** \(\mathcal{H}_0:\delta \leq 0\)

#### 1.3.1.1 Variant I.1: Minimizing Expected Sample Size under the Alternative

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.4\big]\)
- **Constraints:**
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.4\big] \geq 0.8\)
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
  - Three variants: two-stage, group-sequential, one-stage.
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - All three **adoptr** variants (two-stage, group-sequential, one-stage) comply with the constraints. Internally validated by testing vs. simulated values of the power curve at the respective points.
  - Is \(n()\) of the optimal two-stage design monotonically decreasing on the continuation area?
  - \(ESS\) of the optimal two-stage design is lower than \(ESS\) of the optimal group-sequential one, which in turn is lower than that of the optimal one-stage design.
  - \(ESS\) of the optimal group-sequential design is lower than \(ESS\) of the externally computed group-sequential design using the rpact package.
  - Are the \(ESS\) values obtained from simulation the same as the ones obtained by numerical integration via `adoptr::evaluate`?
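As a rough illustration of the simulation-based constraint check, the following base-R sketch uses the closed-form one-stage design instead of **adoptr** itself. It assumes that the two-arm test statistic is \(Z\sim\mathcal{N}(\sqrt{n/2}\,\delta, 1)\) with \(n\) the per-group sample size; the helper `sim_reject_prob` and all variable names are hypothetical and not part of the report.

```r
# One-stage reference design for Scenario I (illustrative sketch, not adoptr code).
alpha <- 0.025   # maximal type one error rate
beta  <- 0.2     # 1 - minimal power
delta <- 0.4     # point alternative

# Assumption: Z ~ N(sqrt(n / 2) * delta, 1), n = per-group sample size.
crit <- qnorm(1 - alpha)
n    <- ceiling(2 * (crit + qnorm(1 - beta))^2 / delta^2)

# Hypothetical helper: Monte Carlo estimate of the rejection probability.
sim_reject_prob <- function(n, crit, delta, nsim = 1e5, seed = 42) {
  set.seed(seed)
  z <- rnorm(nsim, mean = sqrt(n / 2) * delta, sd = 1)
  mean(z > crit)
}

power_exact <- 1 - pnorm(crit - sqrt(n / 2) * delta)  # closed-form power
power_sim   <- sim_reject_prob(n, crit, delta)        # simulated power
toer_sim    <- sim_reject_prob(n, crit, delta = 0)    # simulated type one error rate
```

Comparing `power_sim` with `power_exact` and `toer_sim` with `alpha` mirrors, in a simplified setting, the comparison of simulated and numerically integrated quantities described in the formal tests.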

#### 1.3.1.2 Variant I.2: Minimizing Expected Sample Size under the Null Hypothesis

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\color{red}{\delta=0.0}\big]\)
- **Constraints:**
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.4\big] \geq 0.8\)
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - Validate constraint compliance by testing vs. simulated values of the power curve at the respective points.
  - \(n()\) of the optimal design is monotonically increasing on the continuation area.
  - \(ESS\) of the optimal two-stage design is lower than \(ESS\) of the externally computed group-sequential design using the rpact package.
  - Are the \(ESS\) values obtained from simulation the same as the ones obtained by numerical integration via `adoptr::evaluate`?
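The comparison of simulated and numerically integrated \(ESS\) values can be sketched in base R as follows. The two-stage design below is a made-up example (not an optimized **adoptr** design); `n2_fun`, the boundaries, and the sample sizes are hypothetical, and the sketch assumes \(X_1\sim\mathcal{N}(\sqrt{n_1/2}\,\delta, 1)\).

```r
# Hypothetical two-stage design (not optimized): per-group sizes and stopping bounds.
n1  <- 50
c1f <- 0.5   # early stopping for futility
c1e <- 2.5   # early stopping for efficacy
n2_fun <- function(x1) ifelse(x1 > c1f & x1 < c1e, 120 - 30 * (x1 - c1f), 0)

delta <- 0.4
mu1   <- sqrt(n1 / 2) * delta   # assumption: X1 ~ N(sqrt(n1 / 2) * delta, 1)

# ESS via numerical integration over the continuation region.
ess_int <- n1 + integrate(function(x) n2_fun(x) * dnorm(x, mean = mu1),
                          lower = c1f, upper = c1e)$value

# ESS via simulation of the first-stage test statistic.
set.seed(42)
x1      <- rnorm(1e6, mean = mu1)
ess_sim <- mean(n1 + n2_fun(x1))

# The two values should agree up to the sample size tolerance of 0.5.
abs(ess_sim - ess_int) <= 0.5
```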

#### 1.3.1.3 Variant I.3: Conditional Power Constraint

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.4\big]\)
- **Constraints:**
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.4\big] \geq 0.8\)
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
  - \(CP := \color{red}{\boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.4, X_1 = x_1\big] \geq 0.7}\) for all \(x_1\in(c_1^f, c_1^e)\)
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - Check the \(Power\) and \(TOER\) constraints with simulation. Check the \(CP\) constraint on 25 different values of \(x_1\) in \([c_1^f, c_1^e]\).
  - Are the \(CP\) values at the 25 test-pivots obtained from simulation the same as the ones obtained by numerical integration via `adoptr::evaluate`?
  - Is \(ESS\) of the optimal two-stage design with \(CP\) constraint higher than \(ESS\) of the optimal two-stage design without this constraint?
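A minimal sketch of the pivot-wise conditional power check, assuming \(X_2\,|\,\delta\sim\mathcal{N}(\sqrt{n_2/2}\,\delta, 1)\) with \(n_2\) the per-group stage-two sample size. The functions `n2_fun` and `c2_fun` and the boundaries are hypothetical placeholders, not the optimal design from the report.

```r
# Conditional power of a two-stage design at interim value x1 (illustrative sketch).
# Assumption: X2 | delta ~ N(sqrt(n2 / 2) * delta, 1), n2 = per-group stage-two size.
cond_power <- function(n2, c2, delta) {
  1 - pnorm(c2 - sqrt(n2 / 2) * delta)
}

# Hypothetical continuation region and design functions (not an optimized design).
c1f <- 0.5
c1e <- 2.5
n2_fun <- function(x1) 120 - 30 * (x1 - c1f)   # stage-two sample size
c2_fun <- function(x1) 2.0 - 0.8 * (x1 - c1f)  # stage-two critical value

# Evaluate the constraint on 25 equally spaced pivots in [c1f, c1e].
x1    <- seq(c1f, c1e, length.out = 25)
cp    <- cond_power(n2_fun(x1), c2_fun(x1), delta = 0.4)
cp_ok <- all(cp >= 0.7)
```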

### 1.3.2 Scenario II: Large effect, Gaussian prior

Similar scope to Scenario I, but with a continuous Gaussian prior on \(\delta\).

- **Data distribution:** Two-armed trial with normally distributed test statistic
- **Prior:** \(\delta\sim\mathcal{N}(0.4, .3)\)
- **Null hypothesis:** \(\mathcal{H}_0:\delta \leq 0\)
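To illustrate how a power constraint conditional on \(\delta>0\) can be evaluated under a continuous prior, here is a base-R sketch for a one-stage design. It assumes that the second parameter of \(\mathcal{N}(0.4, .3)\) is a standard deviation, which the text does not state; `expected_power` is a hypothetical helper and not an **adoptr** function.

```r
# Expected power under a Gaussian prior, conditional on delta > 0 (illustrative sketch).
# Assumption: the second parameter of N(0.4, .3) is a standard deviation.
prior_pdf <- function(delta) dnorm(delta, mean = 0.4, sd = 0.3)

# One-stage power for per-group sample size n (two-arm normal test statistic).
power_fun <- function(delta, n, alpha = 0.025) {
  1 - pnorm(qnorm(1 - alpha) - sqrt(n / 2) * delta)
}

# Hypothetical helper: average the power over the prior restricted to delta > 0.
expected_power <- function(n) {
  num <- integrate(function(d) power_fun(d, n) * prior_pdf(d),
                   lower = 0, upper = Inf)$value
  den <- integrate(prior_pdf, lower = 0, upper = Inf)$value
  num / den
}

ep_100 <- expected_power(100)
ep_200 <- expected_power(200)
```

Since the prior puts mass on small positive effects, the expected power grows more slowly in \(n\) than the power at the point alternative \(\delta=0.4\).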

#### 1.3.2.1 Variant II.1: Minimizing Expected Sample Size

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\big]\)
- **Constraints:**
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta> 0.0\big] \geq 0.8\)
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
  - Three variants: two-stage, group-sequential, one-stage.
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - All designs comply with the type one error rate constraint (tested via simulation).
  - \(ESS\) of the optimal two-stage design is lower than \(ESS\) of the optimal group-sequential one, which in turn is lower than that of the optimal one-stage design.

#### 1.3.2.2 Variant II.2: Minimizing Expected Sample Size under the Null hypothesis

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\color{red}{\delta\leq 0}\big]\)
- **Constraints:**
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta> 0.0\big] \geq 0.8\)
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - Does the design comply with the \(TOER\) constraint (via simulation)?
  - Is \(ESS\) lower than the expected sample size under the null hypothesis for the optimal two-stage design from Variant II.1?

#### 1.3.2.3 Variant II.3: Conditional Power Constraint

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\big]\)
- **Constraints:**
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta>0.0\big] \geq 0.8\)
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
  - \(CP := \color{red}{\boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta> 0.0, X_1 = x_1\big] \geq 0.7}\) for all \(x_1\in(c_1^f, c_1^e)\)
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - Check the \(TOER\) constraint with simulation.
  - Check the \(CP\) constraint on three different values of \(x_1\) in \((c_1^f, c_1^e)\).
  - Is \(ESS\) of the optimal two-stage design with \(CP\) constraint higher than \(ESS\) of the optimal two-stage design without the constraint?

### 1.3.3 Scenario III: Large effect, uniform prior

- **Data distribution:** Two-armed trial with normally distributed test statistic
- **Prior:** sequence of uniform distributions \(\delta\sim\operatorname{Unif}(0.4 - \Delta_i, 0.4 + \Delta_i)\) around \(0.4\) with \(\Delta_i=(3 - i)/10\) for \(i=0\ldots 3\). I.e., for \(\Delta_3=0\) the prior reduces to a point prior on \(\delta=0.4\).
- **Null hypothesis:** \(\mathcal{H}_0:\delta \leq 0\)
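The prior sequence can be written out explicitly. The short base-R sketch below only tabulates the interval bounds implied by \(\Delta_i=(3-i)/10\); the data frame `priors` and its column names are illustrative.

```r
# Half-widths Delta_i = (3 - i) / 10 for i = 0, ..., 3 (taken from the text above).
i       <- 0:3
delta_i <- (3 - i) / 10
priors  <- data.frame(
  i     = i,
  lower = 0.4 - delta_i,
  upper = 0.4 + delta_i
)
# Variance of Unif(a, b) is (b - a)^2 / 12; it shrinks to 0 as the prior concentrates.
priors$variance <- (priors$upper - priors$lower)^2 / 12
print(priors)
```

The supports are \([0.1, 0.7]\), \([0.2, 0.6]\), \([0.3, 0.5]\), and the single point \(0.4\), so the prior variance decreases monotonically to zero along the sequence.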

#### 1.3.3.1 Variant III.1: Convergence under Prior Concentration

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\big]\)
- **Constraints:**
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta>0.0\big] \geq 0.8\)
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - The simulated type one error rate is compared to the \(TOER\) constraint for each design.
  - \(ESS\) decreases with prior variance.

Additionally, the designs are compared graphically. Inspect the plot to see the convergence pattern.

### 1.3.4 Scenario IV: Smaller effect size, larger trials

#### 1.3.4.1 Variant IV.1: Minimizing Expected Sample Size under the Alternative

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.2\big]\)
- **Constraints:**
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.2\big] \geq 0.8\)
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
  - Three variants: two-stage, group-sequential, one-stage.
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - All three **adoptr** variants (two-stage, group-sequential, one-stage) comply with the constraints. Internally validated by testing vs. simulated values of the power curve at the respective points.
  - \(ESS\) of the optimal two-stage design is lower than \(ESS\) of the optimal group-sequential one, which in turn is lower than that of the optimal one-stage design.
  - \(ESS\) of the optimal group-sequential design is lower than \(ESS\) of the externally computed group-sequential design using the rpact package.
  - Are the \(ESS\) values obtained from simulation the same as the ones obtained by numerical integration via `adoptr::evaluate`?
  - Is \(n()\) of the optimal two-stage design monotonically decreasing on the continuation area?

#### 1.3.4.2 Variant IV.2: Increasing Power

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.2\big]\)
- **Constraints:**
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.2\big] \geq \color{red}{0.9}\)
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
  - Three variants: two-stage, group-sequential, one-stage.
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - Does the design respect all constraints (via simulation)?
  - \(ESS\) of the optimal group-sequential design is lower than \(ESS\) of the externally computed group-sequential design using the rpact package.
  - Are the \(ESS\) values obtained from simulation the same as the ones obtained by numerical integration via `adoptr::evaluate`?
  - Is \(n()\) of the optimal two-stage design monotonically decreasing on the continuation area?

#### 1.3.4.3 Variant IV.3: Increasing Maximal Type One Error Rate

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.2\big]\)
- **Constraints:**
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.2\big] \geq 0.8\)
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq \color{red}{0.05}\)
  - Three variants: two-stage, group-sequential, one-stage.
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - Does the design respect all constraints (via simulation)?
  - Are the \(ESS\) values obtained from simulation the same as the ones obtained by numerical integration via `adoptr::evaluate`?
  - Is \(n()\) of the optimal two-stage design monotonically decreasing on the continuation area?

### 1.3.5 Scenario V: Single-arm design, medium effect size

- **Data distribution:** Single-arm trial with normally distributed test statistic
- **Prior:** \(\delta\sim\delta_{0.3}\)
- **Null hypothesis:** \(\mathcal{H}_0:\delta \leq 0\)

#### 1.3.5.1 Variant V.1: Sensitivity to Integration Order

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.3\big]\)
- **Constraints:**
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\color{red}{\delta=0.3}\big] \geq 0.8\)
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
  - Three variants: integration order 5, 8, 11 two-stage designs.
- **Formal tests:**
  - Do all designs converge within the respective iteration limit?
  - Do all designs respect all constraints (via simulation)?

#### 1.3.5.2 Variant V.2: Utility Maximization

- **Objective:** \(\lambda\, Power - ESS := \lambda\, \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.3\big] - \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.3\big]\) for \(\lambda = 100\) and \(200\)
- **Constraints:**
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - Do both designs respect the type one error rate constraint (via simulation)?
  - Is the power of the design with larger \(\lambda\) larger?
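The effect of \(\lambda\) can be illustrated on one-stage designs, where \(ESS = n\) and the utility has a closed form. This is a simplified base-R sketch, assuming a single-arm test statistic \(Z\sim\mathcal{N}(\sqrt{n}\,\delta, 1)\); `optimal_n` is a hypothetical helper and the report itself optimizes two-stage designs with **adoptr**.

```r
# Utility maximization over one-stage designs (illustrative sketch, not adoptr code).
# Assumption: single-arm test statistic Z ~ N(sqrt(n) * delta, 1).
alpha <- 0.025
delta <- 0.3
power_fun <- function(n) 1 - pnorm(qnorm(1 - alpha) - sqrt(n) * delta)

# Utility lambda * Power - n; for a one-stage design ESS equals n.
utility <- function(n, lambda) lambda * power_fun(n) - n

# Hypothetical helper: maximize the utility over a continuous sample size.
optimal_n <- function(lambda) {
  optimize(utility, interval = c(1, 500), lambda = lambda, maximum = TRUE)$maximum
}

n_100 <- optimal_n(100)
n_200 <- optimal_n(200)
c(power_100 = power_fun(n_100), power_200 = power_fun(n_200))
```

A larger \(\lambda\) weights power more heavily relative to sample size, so the optimizer accepts a larger \(n\) and the resulting power is higher, which is the qualitative behaviour the last formal test checks.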

#### 1.3.5.3 Variant V.3: \(n_1\) penalty

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.3\big] + \lambda \, n_1\) for \(\lambda = 0.05\) and \(0.2\).
- **Constraints:**
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.3\big] \geq 0.8\)
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - Do both designs respect the type one error rate and power constraints (via simulation)?
  - Is \(n_1\) for the optimal design smaller than for the order-5 design in V.1?

#### 1.3.5.4 Variant V.4: \(n_2\) penalty

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.3\big] + \lambda\) `AverageN2` for \(\lambda = 0.01\) and \(0.1\).
- **Constraints:**
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.3\big] \geq 0.8\)
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - Do both designs respect the type one error rate and power constraints (via simulation)?
  - Is the `AverageN2` for the optimal design smaller than for the order-5 design in V.1?

### 1.3.6 Scenario VI: Binomial distribution

This scenario investigates the implementation of the binomial distribution.

- **Data distribution:** Two-armed trial with binomially distributed outcomes. Thus \(\delta := p_E - p_C\) refers to the rate difference here. The control rate is assumed to equal \(p_C = 0.3\).
- **Prior:** \(\delta\sim\textbf{1}_{\delta=0.2}\)
- **Null hypothesis:** \(\mathcal{H}_0:\delta \leq 0\)

#### 1.3.6.1 Variant VI.1: Minimizing Expected Sample Size under the Alternative

- **Objective:** \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.2\big]\)
- **Constraints:**
  - \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.2\big] \geq 0.9\)
  - \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
  - Three variants: two-stage, group-sequential, one-stage.
- **Formal tests:**
  - The number of iterations is checked against the default maximum to ensure proper convergence.
  - All three **adoptr** variants (two-stage, group-sequential, one-stage) comply with the constraints. Internally validated by testing vs. simulated values of the power curve at the respective points.
  - Are the \(ESS\) values obtained from simulation the same as the ones obtained by numerical integration via `adoptr::evaluate`?

## 1.4 Technical Setup

All scenarios are run in a single, shared R session.
Required packages are loaded here,
the random seed is defined and set centrally, and the default number
of iterations is increased to make sure that all scenarios
converge properly.
Additionally, R scripts with convenience functions are sourced here.
There are three additional functions for this report.
`rpact_design` creates a two-stage design via the package **rpact** (Wassmer and Pahlke 2018)
in the notation of **adoptr**.
`sim_pr_reject` and `sim_n` simulate rejection probabilities
and expected sample sizes, respectively, using the **adoptr** routine `simulate`.
Furthermore, global tolerances for the validation are set.
For error rates, a relative deviation of \(1\%\) from the target value is
accepted.
Deviations of (expected) sample sizes are accepted more liberally, up to an
absolute deviation of \(0.5\).

```
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.5     ✔ purrr   0.3.4
## ✔ tibble  3.1.3     ✔ dplyr   1.0.7
## ✔ tidyr   1.1.3     ✔ stringr 1.4.0
## ✔ readr   2.0.0     ✔ forcats 0.5.1
```

```
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::n() masks adoptr::n()
```

```
##
## Attaching package: 'testthat'
```

```
## The following object is masked from 'package:dplyr':
##
## matches
```

```
## The following object is masked from 'package:purrr':
##
## is_null
```

```
## The following objects are masked from 'package:readr':
##
## edition_get, local_edition
```

```
## The following object is masked from 'package:tidyr':
##
## matches
```

```
## The following object is masked from 'package:adoptr':
##
## expectation
```

```
library(tinytex)
# load custom functions from the subfolder 'R'
for (nm in list.files("R", pattern = "\\.[RrSsQq]$")) {
  source(file.path("R", nm))
}
# define seed value
seed <- 42
# define absolute tolerance for error rates
tol <- 0.01
# define absolute tolerance for sample sizes
tol_n <- 0.5
# define custom tolerance and iteration limit for nloptr
opts <- list(
  algorithm = "NLOPT_LN_COBYLA",
  xtol_rel  = 1e-5,
  maxeval   = 100000
)
```
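One way to read the two tolerance criteria is sketched below in base R. The helpers `within_relative` and `within_absolute` are hypothetical and only illustrate the difference between a relative and an absolute deviation; the numeric values are made up and are not results from the report.

```r
# Tolerances as defined in the setup chunk above.
tol   <- 0.01  # error rates
tol_n <- 0.5   # (expected) sample sizes

# Hypothetical helpers illustrating the two kinds of checks.
within_relative <- function(observed, target, tol) {
  abs(observed - target) <= tol * abs(target)
}
within_absolute <- function(observed, target, tol) {
  abs(observed - target) <= tol
}

# Made-up example values, for illustration only.
within_relative(0.0251, target = 0.025, tol = tol)    # deviation is 0.4% of the target
within_absolute(112.9, target = 112.6, tol = tol_n)   # deviation is 0.3
```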

### References

Bauer, P., F. Bretz, V. Dragalin, F. König, and G. Wassmer. 2015. “Twenty-Five Years of Confirmatory Adaptive Designs: Opportunities and Pitfalls.” *Statistics in Medicine* 35 (3): 325–47. https://doi.org/10.1002/sim.6472.

Pilz, M., K. Kunzmann, C. Herrmann, G. Rauch, and M. Kieser. 2019. “A Variational Approach to Optimal Two-Stage Designs.” *Statistics in Medicine* 38 (21): 4159–71. https://doi.org/10.1002/sim.8291.

Wassmer, G., and W. Brannath. 2016. *Group Sequential and Confirmatory Adaptive Designs in Clinical Trials*. Springer Series in Pharmaceutical Statistics. Springer International Publishing.

Wassmer, G., and F. Pahlke. 2018. *Rpact: Confirmatory Adaptive Clinical Trial Design and Analysis*. https://www.rpact.org.

Wickham, H., R Studio, and R Core Team. 2018. *Testthat: Unit Testing for R*. https://cran.r-project.org/web/packages/testthat/index.html.