罕见病疫苗有效性样本量估计

我们首先导入后面会用到的包,后续会用到的dplyr的%>%管道函数和的kableExtra输出表格[1,2]

library(dplyr)
library(kableExtra)

疫苗有效性(Vaccine Efficacy)

当我们检验一款新的疫苗是否有效时,通常会使用疫苗有效性\(\pi\)(Vaccine Efficacy)这个指标。让\(P_1\), \(P_2\)分别为安慰剂组和接种疫苗组的发病率;\(N_1\), \(N_2\)分别为安慰剂组和接种疫苗组的群体总数;\(X, Y\)分别为安慰剂和疫苗组的病例人数。则\(X \overset{i.i.d}{\sim}\text{Binomial}(N_1, P_1)\), \(Y \overset{i.i.d}{\sim}\text{Binomial}(N_2, P_2)\)

疫苗有效性为: \[ \pi = 1-(P_2/P_1) \tag{1}\] 原假设和备择假设为:

\[ H_0: \pi \leq \pi_0 \text{ versus } \ H_1:\pi>\pi_0 \tag{2}\]

如果一个疾病发病率较低,我们就需要更多受试者参加实验,在这种情况下\(X, Y\)可以被近似成为独立的泊松分布, 也就是\(X \overset{i.i.d}{\sim}\text{Poisson}(\lambda_1)\), \(Y \overset{i.i.d}{\sim}\text{Poisson}(\lambda_2)\) with \(\lambda_1 = N_1\cdot P_1, \lambda_2 = N_2\cdot P_2\).

\(Y\)给定\(T\)的条件概率分布为 \[ Y|T \sim \text{Binomial}(T, \theta)=\binom{T}{k} \theta^k(1-\theta)^{T-k} \text{ with } T = X + Y, \theta = \frac{\lambda_1}{\lambda_1+\lambda_2} \tag{3}\] 此时原假设和备择假设为:

\[ H_0: \theta \geq \theta_0 \text{ versus } \ H_1:\theta<\theta_0 \text{ where } \theta_0 = \frac{1-\pi_0}{2-\pi_0} \tag{4}\]

  • 计算的P value 公式为 \[ p = \Pr[Y \leq Y_{\text{obs}} \mid Y \sim \text{Binomial}(T, \theta_0)] = \sum_{k=0}^{Y_{\text{obs}}} \binom{T}{k} \theta_0^k (1 - \theta_0)^{T - k} \tag{5}\]

  • 计算Statistical power公式为 \[ 1 - \beta = \Pr[Y \leq Y_c \mid Y \sim \text{Binomial}(T, \theta_1)] = \sum_{k=0}^{Y_c} \binom{T}{k} \theta_1^k (1 - \theta_1)^{T - k} \tag{6}\]

  • 计算临床实验所需样本量公式为 \[ N_2 = T/[(2-\pi_1)/P_1] \tag{7}\]

计算样本量所需的变量为\(\pi_0, \pi_1, \alpha\), 期望统计功效(1-\(\beta\))和安慰剂组发病率(\(P_1\))。

详细推导过程可以看这两篇论文 [3,4]

R代码示例

exact_conditonal_test <- function(T, alpha, power, theta0, theta1) {
  Y_c <- qbinom(alpha, size = T, prob = theta0)-1
  p_value <- pbinom(Y_c, size = T, prob = theta0)
  power <- pbinom(Y_c, size = T, prob = theta1)
  return(c(T, Y_c, power, p_value))
}

ect_sample_size <- function(T, pi0, pi1, incidence, alpha, power){
  theta0 <- (1-pi0) / (2-pi0)
  theta1 <- (1-pi1) / (2-pi1)
  table_t <- as.data.frame(do.call(rbind, 
                               lapply(T, exact_conditonal_test, 
                                      alpha = alpha, theta0= theta0, theta1 = theta1)), 
                           .name_repair = "unique")
  colnames(table_t) <- c('T', 'Y_c', 'power', 'p-value')
  
  for (n in 1:nrow(table_t)){
    if (table_t$power[n] >= power && all(table_t$power[n:nrow(table_t)] >= power)) {
      min_n <- table_t$T[n]
      break
    }
  }
  result <- list(
    T_table = table_t,
    T = min_n,
    text = paste0('The min value of T achieve power of ', power, ' is ', min_n)
  )
  class(result) <- 'result'
  result
}

T2N <- function(T_value, incidence, pi1, dropout_rate){
  N2 <- T_value/((2-pi1)*incidence)/(1-dropout_rate)
  cat('The sample of vaccine group considering the drop out rate:', N2)
}

例1

[3] 在其论文中详细阐述了exact conditional方法的理论推导,并提供了不同样本量下统计功效(power)与显著性水平(significance level)的对应表格。为验证该方法的计算准确性,我们可以调用ect_sample_size()函数进行实证分析。具体参数设置:病例范围33到40,\(\pi_0=0.2,\pi_1=0.8, P_1=0.006, \alpha=0.025\), 目标统计功效为95%。将这些参数输入函数后,即可获得相应的样本量估计结果。

T <- 33:40 # the number of T
pi0 <- 0.2 # null hypothesis efficacy
pi1 <- 0.8 # True efficacy under alternative hypothesis
alpha <- 0.025 # type I error
incidence <- 0.006 # placebo incidence rate
power <- 0.95
res1 <- ect_sample_size(T, pi0, pi1, incidence, alpha, power)
kbl(res1$T_table)%>% 
  kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
T Y_c power p-value
33 8 0.9139690 0.0136117
34 9 0.9540856 0.0244451
35 9 0.9449925 0.0178969
36 9 0.9347919 0.0129998
37 10 0.9653937 0.0227940
38 10 0.9584044 0.0168288
39 10 0.9504998 0.0123313
40 11 0.9738542 0.0211901

代码中的incidence 就是\(P_1\), 其他参数与上面提及的保持一致。可以看到输出的表格中样本量,critical value, statistical power和p value与论文中的表格完全一致。我们打印出能够达到95% statistical power的病例数

res1$T
[1] 37

打印结果显示 T = 37,与论文中的结果一致。接下来,利用公式 Equation 7 计算疫苗组所需的样本量。为了方便重复使用,我将该计算过程封装成了函数 T2N()

在本示例中,未考虑受试者脱落情况,因此将 dropout_rate 参数设为 0。计算结果显示,疫苗组的样本量应不少于 5138.889。将该值乘以 2,即可得出疫苗组与安慰剂组的总样本量应不少于 10277.78,进一后为 10278

该结果与论文完全一致,说明我们的算法实现是正确的。

T2N(37, incidence, pi1, dropout_rate = 0)
The sample of vaccine group considering the drop out rate: 5138.889

HRV-三期

[5] 这篇论文原本是我们希望复现样本量计算的参考文献。然而在复现过程中发现,文中并未明确给出所需的关键参数 \(\pi_1\)\(\alpha\) 以及期望的统计功效,因此无法准确计算所需样本量。因此,我们决定后续不再以该论文作为参考。

接下来的示例将以 HRV-三期这篇论文的参考文献为依据,该研究同样是关于轮状病毒(rotavirus)疫苗有效性的临床试验。

RV5

[6] 这篇论文将作为我们后续进行疫苗有效性检验样本量计算的参考文献。文中提供了以下参数:15% 的脱落率,(_0 = 0)、(_1 = 0.6)、(P_1 = 0.02)、显著性水平 (= 0.025),以及期望的统计功效为 80%。

代入上述参数后,可得所需的最小病例数为47。为了方便分组,我们取双数为48。接着,使用T2N()函数计算疫苗组所需的样本量,结果为至少2016.807,向上取整后为2020。该样本量结果与论文中的报告一致,验证了我们算法的正确性。

T <- 40:50 # the number of T
pi0 <- 0 # null hypothesis efficacy
pi1 <- 0.6 # True efficacy under alternative hypothesis
alpha <- 0.025 # type I error
incidence <- 0.02 # placebo incidence rate
power <- 0.8
res_rv5 <- ect_sample_size(T, pi0, pi1, incidence, alpha, power)
kbl(res_rv5$T_table)%>% 
  kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
T Y_c power p-value
40 13 0.7692914 0.0192387
41 13 0.7363326 0.0137666
42 14 0.8052771 0.0217793
43 14 0.7757295 0.0157697
44 15 0.8362319 0.0243834
45 15 0.8100042 0.0178489
46 15 0.7819032 0.0129480
47 16 0.8396107 0.0199930
48 16 0.8146130 0.0146525
49 17 0.8650285 0.0221921
50 17 0.8429717 0.0164196
res_rv5$T
[1] 47
T2N(48, incidence, pi1, dropout_rate = 0.15)
The sample of vaccine group considering the drop out rate: 2016.807

PCV13

T <- 100:170 # the number of T
pi0 <- 0 # null hypothesis efficacy
pi1 <- 0.52 # True efficacy under alternative hypothesis
alpha <- 0.025 # type I error
incidence <- 0.0024 # placebo incidence rate
power <- 0.9
dropout <- 0.05
res_rv5 <- ect_sample_size(T, pi0, pi1, incidence, alpha, power)
kbl(res_rv5$T_table)%>%
  kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
T Y_c power p-value
100 39 0.9327041 0.0176001
101 40 0.9482422 0.0230220
102 40 0.9398983 0.0185334
103 41 0.9539240 0.0241168
104 41 0.9463670 0.0194790
105 41 0.9379379 0.0156509
106 42 0.9521763 0.0204360
107 42 0.9445280 0.0164733
108 43 0.9573873 0.0214036
109 43 0.9504577 0.0173077
110 44 0.9620567 0.0223810
111 44 0.9557870 0.0181532
112 45 0.9662365 0.0233675
113 45 0.9605713 0.0190093
114 46 0.9699746 0.0243623
115 46 0.9648618 0.0198753
116 46 0.9591042 0.0161360
117 47 0.9687057 0.0207504
118 47 0.9635009 0.0168941
119 48 0.9721463 0.0216341
120 48 0.9674468 0.0176618
121 49 0.9752232 0.0225259
122 49 0.9709847 0.0184387
123 50 0.9779725 0.0234250
124 50 0.9741539 0.0192242
125 51 0.9804271 0.0243311
126 51 0.9769904 0.0200179
127 51 0.9730892 0.0163948
128 52 0.9795269 0.0208191
129 52 0.9760104 0.0170934
130 53 0.9817936 0.0216276
131 53 0.9786270 0.0178000
132 54 0.9838175 0.0224427
133 54 0.9809686 0.0185143
134 55 0.9856234 0.0232641
135 55 0.9830627 0.0192358
136 56 0.9872337 0.0240913
137 56 0.9849340 0.0199641
138 57 0.9886687 0.0249240
139 57 0.9866051 0.0206989
140 57 0.9842415 0.0171178
141 58 0.9880963 0.0214398
142 58 0.9859725 0.0177687
143 59 0.9894263 0.0221865
144 59 0.9875194 0.0184261
145 60 0.9906117 0.0229385
146 60 0.9889008 0.0190897
147 61 0.9916676 0.0236957
148 61 0.9901338 0.0197592
149 62 0.9926077 0.0244576
150 62 0.9912335 0.0204342
151 62 0.9896509 0.0170053
152 63 0.9922139 0.0211146
153 63 0.9907943 0.0176055
154 64 0.9930874 0.0218000
155 64 0.9918148 0.0182114
156 65 0.9938653 0.0224901
157 65 0.9927253 0.0188226
158 66 0.9945576 0.0231847
159 66 0.9935370 0.0194390
160 67 0.9951735 0.0238835
161 67 0.9942604 0.0200603
162 68 0.9957211 0.0245863
163 68 0.9949046 0.0206862
164 68 0.9939581 0.0173403
165 69 0.9954782 0.0213166
166 69 0.9946308 0.0178995
167 70 0.9959886 0.0219512
168 70 0.9952305 0.0184634
169 71 0.9964426 0.0225898
170 71 0.9957646 0.0190318
res_rv5$T
[1] 100
T2N(156, 0.0024, 0.52, dropout_rate = dropout)
The sample of vaccine group considering the drop out rate: 46230.44

Reference

[1]
Wickham H, François R, Henry L, et al. Dplyr: A grammar of data manipulation [Internet]. 2023. Available from: https://dplyr.tidyverse.org.
[2]
Zhu H. kableExtra: Construct complex table with kable and pipe syntax [Internet]. 2024. Available from: http://haozhu233.github.io/kableExtra/.
[3]
Chan ISF, Bohidar NR. Exact power and sample size for vaccine efficacy studies. Communications in Statistics - Theory and Methods. 1998;27(6):1305–1322.
[4]
Loiacono MM. SAMPLE SIZE ESTIMATION AND POWER CALCULATIONS FOR VACCINE EFFICACY TRIALS FOR EXCEEDINGLY RARE DISEASES.
[5]
Wu Z, Li Q, Liu Y, et al. Efficacy, safety and immunogenicity of hexavalent rotavirus vaccine in Chinese infants. Virologica Sinica [Internet]. 2022 [cited 2025 Jul 22];37(5):724–730.
[6]