Janis Hardwick
Robert Oehmke
Quentin F. Stout
University of Michigan
Abstract: We describe a program for optimizing and analyzing sequential allocation problems involving three Bernoulli populations and a general objective function. Previous researchers had considered this problem computationally intractable, and there appear to be no prior exact optimizations for such problems, even for very small sample sizes. This paper contains a description of the program, along with the techniques used to scale it to large sample sizes. The program currently handles problems of size 200 or more by using a modest parallel computer, and problems of size 100 on a workstation. As an illustration, the program is used to create an adaptive sampling procedure that is the optimal solution to a 3-arm bandit problem. The bandit procedure is then compared to two other allocation procedures along various Bayesian and frequentist metrics. Such models can be applied to settings such as clinical trials.
An important aspect of the dynamic programming equations being solved is that they are near-neighbor recurrence equations, having very simple template patterns. Many other problems can be solved via similar equations, and the basic control and communication structure of these programs can be used to solve several such problems. A few of these are pointed out in the conclusion.
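To make the near-neighbor structure concrete, the following is a minimal sketch of such a recurrence for a 3-arm Bernoulli bandit. It is not the program described in the paper: it assumes independent uniform priors on each arm, takes the expected number of successes as the objective (the paper allows a general objective function), and uses plain memoized recursion with none of the paper's scaling or parallelization techniques. The name `optimal_value` and the state layout are illustrative choices.

```python
from fractions import Fraction
from functools import lru_cache


def optimal_value(n):
    """Expected number of successes in n trials under optimal allocation.

    Assumptions (illustrative, not from the paper): three Bernoulli arms,
    independent uniform Beta(1, 1) priors, objective = total successes.
    A state is (s1, f1, s2, f2, s3, f3): successes and failures per arm.
    """

    @lru_cache(maxsize=None)
    def value(state):
        # No trials left once all n observations have been taken.
        if sum(state) == n:
            return Fraction(0)
        best = Fraction(-1)
        for arm in range(3):
            s, f = state[2 * arm], state[2 * arm + 1]
            # Posterior mean success probability under a uniform prior.
            p = Fraction(s + 1, s + f + 2)
            # The two neighboring states differ from the current state
            # by +1 in exactly one coordinate.
            succ = list(state)
            succ[2 * arm] += 1
            fail = list(state)
            fail[2 * arm + 1] += 1
            v = p * (1 + value(tuple(succ))) + (1 - p) * value(tuple(fail))
            best = max(best, v)
        return best

    return value((0, 0, 0, 0, 0, 0))


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, float(optimal_value(n)))
```

Each state depends only on six neighbors, one success and one failure step per arm, which is the simple template pattern the paragraph above refers to. The number of states grows as a sixth-degree polynomial in the sample size, so this direct version is practical only for small `n`.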
Keywords: multi-arm bandit problem and allocation, controlled clinical trial, response adaptive sampling design, binary response, sequential allocation, adaptive procedure, dynamic programming, design of experiments, parallel computing, supercomputing
Complete paper. This paper appears in Computational Statistics and Data Analysis 31 (1999), pp. 397-416.
Copyright © 2004-2009 Quentin F. Stout