Statistics and Computing

Series Editors:
J. Chambers
W. Eddy
W. Härdle
S. Sheather
L. Tierney

Springer
New York Berlin Heidelberg Hong Kong London Milan Paris Tokyo

Statistics and Computing

Dalgaard: Introductory Statistics with R.
Gentle: Elements of Computational Statistics.
Gentle: Numerical Linear Algebra for Applications in Statistics.
Gentle: Random Number Generation and Monte Carlo Methods, 2nd Edition.
Härdle/Klinke/Turlach: XploRe: An Interactive Statistical Computing Environment.
Krause/Olson: The Basics of S and S-PLUS, 3rd Edition.
Lange: Numerical Analysis for Statisticians.
Loader: Local Regression and Likelihood.
Ó Ruanaidh/Fitzgerald: Numerical Bayesian Methods Applied to Signal Processing.
Pannatier: VARIOWIN: Software for Spatial Data Analysis in 2D.
Pinheiro/Bates: Mixed-Effects Models in S and S-PLUS.
Venables/Ripley: Modern Applied Statistics with S, 4th Edition.
Venables/Ripley: S Programming.
Wilkinson: The Grammar of Graphics.
James E. Gentle

Random Number Generation and Monte Carlo Methods
Second Edition

With 54 Illustrations

Springer

James E. Gentle
School of Computational Sciences
George Mason University
Fairfax, VA 22030-4444
USA
jgentle@gmu.edu

Series Editors:
J. Chambers
Bell Labs, Lucent Technologies
600 Mountain Avenue
Murray Hill, NJ 07974
USA

W. Eddy
Department of Statistics
Carnegie Mellon University
Pittsburgh, PA
USA

S. Sheather
Australian Graduate School of Management
University of New South Wales
Sydney, NSW 2052
Australia

L. Tierney
School of Statistics and Actuarial Science
University of Iowa
Iowa City, IA 52242-1414
USA

W. Härdle
Institut für Statistik und Ökonometrie
Humboldt-Universität
Spandauer Str. 1
D-10178 Berlin
Germany
Library of Congress Cataloging-in-Publication Data
Gentle, James E., 1943–
Random number generation and Monte Carlo methods / James E. Gentle.
p. cm. (Statistics and Computing)
Includes bibliographical references and index.
ISBN 0-387-00178-6 (alk. paper)
1. Monte Carlo method. 2. Random number generators. I. Title. II. Series.
QA298.G46 2003
519.2'82 dc21
2003042437

ISBN 0-387-00178-6
e-ISBN 0-387-21610

Printed on acid-free paper.

© 2003, 1998 Springer Science+Business Media, Inc.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.
9 8 7 6 5 4 3 2
springeronline.com

Corrected second printing, 2005.
SPIN 11016038
To María
Preface

The role of Monte Carlo methods and simulation in all of the sciences has increased in importance during the past several years. This edition incorporates discussion of many advances in the field of random number generation and Monte Carlo methods since the appearance of the first edition of this book in 1998. These methods play a central role in the rapidly developing subdisciplines of the computational physical sciences, the computational life sciences, and the other computational sciences. The growing power of computers and the evolving simulation methodology have led to the recognition of computation as a third approach for advancing the natural sciences, together with theory and traditional experimentation.

At the kernel of Monte Carlo simulation is random number generation. Generation of random numbers is also at the heart of many standard statistical methods. The random sampling required in most analyses is usually done by the computer. The computations required in Bayesian analysis have become viable because of Monte Carlo methods. This has led to much wider applications of Bayesian statistics, which, in turn, has led to development of new Monte Carlo methods and to refinement of existing procedures for random number generation.

Various methods for generation of random numbers have been used. Sometimes, processes that are considered random are used, but for Monte Carlo methods, which depend on millions of random numbers, a physical process as a source of random numbers is generally cumbersome. Instead of "random" numbers, most applications use "pseudorandom" numbers, which are deterministic but "look like" they were generated randomly. Chapter 1 discusses methods for generation of sequences of pseudorandom numbers that simulate a uniform distribution over the unit interval (0, 1). These are the basic sequences from which are derived pseudorandom numbers from other distributions, pseudorandom samples, and pseudostochastic processes. In Chapter 1, as elsewhere in this book, the emphasis is on methods that work. Development of these methods often requires close attention to details. For example, whereas many texts on random number generation use the fact that the uniform distribution over (0, 1) is the same as the uniform distribution over (0, 1] or [0, 1], I emphasize the fact that we are simulating this distribution with a discrete set of "computer numbers". In this case whether 0 and/or 1 is included does make a difference. A uniform random number generator should not yield a 0 or 1. Many authors ignore this fact. I learned it over twenty years ago, shortly after beginning to design industrial-strength software.

The Monte Carlo methods raise questions about the quality of the pseudorandom numbers that simulate physical processes and about the ability of those numbers to cover the range of a random variable adequately. In Chapter 2, I address some issues of the quality of pseudorandom generators.

Chapter 3 describes some of the basic issues in quasirandom sequences. These sequences are designed to be very regular in covering the support of the random process simulated.

Chapter 4 discusses general methods for transforming a uniform random deviate or a sequence of uniform random deviates into a deviate from a different distribution. Chapter 5 describes methods for some common specific distributions. The intent is not to provide a compendium in the manner of Devroye (1986a) but, for many standard distributions, to give at least a simple method or two, which may be the best method, but, if the better methods are quite complicated, to give references to those methods. Chapter 6 continues the developments of Chapters 4 and 5 to apply them to generation of samples and nonindependent sequences.

Chapter 7 considers some applications of random numbers. Some of these applications are to solve deterministic problems. This type of method is called Monte Carlo. Chapter 8 provides information on computer software for generation of random variates. The discussion concentrates on the S-Plus, R, and IMSL software systems.

Monte Carlo methods are widely used in the research literature to evaluate properties of statistical methods. Chapter 9 addresses some of the considerations that apply to this kind of study. I emphasize that a Monte Carlo study uses an experiment, and the principles of scientific experimentation should be observed.

The literature on random number generation and Monte Carlo methods is vast and ever-growing. There is a rather extensive list of references beginning on page 336; however, I do not attempt to provide a comprehensive bibliography or to distinguish the highly-varying quality of the literature.

The main prerequisite for this text is some background in what is generally called "mathematical statistics". In the discussions and exercises involving multivariate distributions, some knowledge of matrices is assumed. Some scientific computer literacy is also necessary. I do not use any particular software system in the book, but I do assume the ability to program in either Fortran or C and the availability of either S-Plus, R, Matlab, or Maple. For some exercises, the required software can be obtained from either statlib or netlib (see the bibliography).

The book is intended to be both a reference and a textbook. It can be
used as the primary text or a supplementary text for a variety of courses at the graduate or advanced undergraduate level. A course in Monte Carlo methods could proceed quickly through Chapter 1, skip Chapter 2, cover Chapters 3 through 6 rather carefully, and then, in Chapter 7, depending on the backgrounds of the students, discuss Monte Carlo applications in specific fields of interest. Alternatively, a course in Monte Carlo methods could begin with discussions of software to generate random numbers, as in Chapter 8, and then go on to cover Chapters 7 and 9. Although the material in Chapters 1 through 6 provides the background for understanding the methods, in this case the details of the algorithms are not covered, and the material in the first six chapters would only be used for reference as necessary. General courses in statistical computing or computational statistics could use the book as a supplemental text, emphasizing either the algorithms or the Monte Carlo applications as appropriate. The sections that address computer implementations, such as Section 1.2, can generally be skipped without affecting the students’ preparation for later sections. (In any event, when computer implementations are discussed, note should be taken of my warnings about use of software for random number generation that has not been developed by software development professionals.) In most classes that I teach in computational statistics, I give Exercise 9.3 in Chapter 9 (page 311) as a term project. It is to replicate and extend a Monte Carlo study reported in some recent journal article. In working on this exercise, the students learn the sad facts that many authors are irresponsible and many articles have been published without adequate review.
Acknowledgments

I thank John Kimmel of Springer for his encouragement and advice on this book and other books on which he has worked with me. I thank Bruce McCullough for comments that corrected some errors and improved clarity in a number of spots. I thank the anonymous reviewers of this edition for their comments and suggestions. I also thank the many readers of the first edition who informed me of errors and who otherwise provided comments or suggestions for improving the exposition. I thank my wife María, to whom this book is dedicated, for everything.

I did all of the typing, programming, etc., myself, so all mistakes are mine. I would appreciate receiving suggestions for improvement and notice of errors. Notes on this book, including errata, are available at http://www.science.gmu.edu/~jgentle/rngbk/
Fairfax County, Virginia
James E. Gentle April 10, 2003
Contents

Preface vii

1 Simulating Random Numbers from a Uniform Distribution 1
 1.1 Uniform Integers and an Approximate Uniform Density 5
 1.2 Simple Linear Congruential Generators 11
  1.2.1 Structure in the Generated Numbers 14
  1.2.2 Tests of Simple Linear Congruential Generators 20
  1.2.3 Shuffling the Output Stream 21
  1.2.4 Generation of Substreams in Simple Linear Congruential Generators 23
 1.3 Computer Implementation of Simple Linear Congruential Generators 27
  1.3.1 Ensuring Exact Computations 28
  1.3.2 Restriction that the Output Be in the Open Interval (0, 1) 29
  1.3.3 Efficiency Considerations 30
  1.3.4 Vector Processors 30
 1.4 Other Linear Congruential Generators 31
  1.4.1 Multiple Recursive Generators 32
  1.4.2 Matrix Congruential Generators 34
  1.4.3 Add-with-Carry, Subtract-with-Borrow, and Multiply-with-Carry Generators 35
 1.5 Nonlinear Congruential Generators 36
  1.5.1 Inversive Congruential Generators 36
  1.5.2 Other Nonlinear Congruential Generators 37
 1.6 Feedback Shift Register Generators 38
  1.6.1 Generalized Feedback Shift Registers and Variations 40
  1.6.2 Skipping Ahead in GFSR Generators 43
 1.7 Other Sources of Uniform Random Numbers 43
  1.7.1 Generators Based on Cellular Automata 44
  1.7.2 Generators Based on Chaotic Systems 45
  1.7.3 Other Recursive Generators 45
  1.7.4 Tables of Random Numbers 46
 1.8 Combining Generators 46
 1.9 Properties of Combined Generators 48
 1.10 Independent Streams and Parallel Random Number Generation 51
  1.10.1 Skipping Ahead with Combination Generators 52
  1.10.2 Different Generators for Different Streams 52
  1.10.3 Quality of Parallel Random Number Streams 53
 1.11 Portability of Random Number Generators 54
 1.12 Summary 55
 Exercises 56

2 Quality of Random Number Generators 61
 2.1 Properties of Random Numbers 62
 2.2 Measures of Lack of Fit 64
  2.2.1 Measures Based on the Lattice Structure 64
  2.2.2 Differences in Frequencies and Probabilities 67
  2.2.3 Independence 70
 2.3 Empirical Assessments 71
  2.3.1 Statistical Goodness-of-Fit Tests 71
  2.3.2 Comparisons of Simulated Results with Statistical Models in Physics 86
  2.3.3 Anecdotal Evidence 86
  2.3.4 Tests of Random Number Generators Used in Parallel 87
 2.4 Programming Issues 87
 2.5 Summary 87
 Exercises 88

3 Quasirandom Numbers 93
 3.1 Low Discrepancy 93
 3.2 Types of Sequences 94
  3.2.1 Halton Sequences 94
  3.2.2 Sobol' Sequences 96
  3.2.3 Comparisons 97
  3.2.4 Variations 97
  3.2.5 Computations 98
 3.3 Further Comments 98
 Exercises 100

4 Transformations of Uniform Deviates: General Methods 101
 4.1 Inverse CDF Method 102
 4.2 Decompositions of Distributions 109
 4.3 Transformations that Use More than One Uniform Deviate 111
 4.4 Multivariate Uniform Distributions with Nonuniform Marginals 112
 4.5 Acceptance/Rejection Methods 113
 4.6 Mixtures and Acceptance Methods 125
 4.7 Ratio-of-Uniforms Method
 4.8 Alias Method
 4.9 Use of the Characteristic Function
 4.10 Use of Stationary Distributions of Markov Chains
 4.11 Use of Conditional Distributions
 4.12 Weighted Resampling
 4.13 Methods for Distributions with Certain Special Properties
 4.14 General Methods for Multivariate Distributions
 4.15 Generating Samples from a Given Distribution
 Exercises

5 Simulating Random Numbers from Specific Distributions
 5.1 Modifications of Standard Distributions
 5.2 Some Specific Univariate Distributions
  5.2.1 Normal Distribution
  5.2.2 Exponential, Double Exponential, and Exponential Power Distributions
  5.2.3 Gamma Distribution
  5.2.4 Beta Distribution
  5.2.5 Chi-Squared, Student's t, and F Distributions
  5.2.6 Weibull Distribution
  5.2.7 Binomial Distribution
  5.2.8 Poisson Distribution
  5.2.9 Negative Binomial and Geometric Distributions
  5.2.10 Hypergeometric Distribution
  5.2.11 Logarithmic Distribution
  5.2.12 Other Specific Univariate Distributions
  5.2.13 General Families of Univariate Distributions
 5.3 Some Specific Multivariate Distributions
  5.3.1 Multivariate Normal Distribution
  5.3.2 Multinomial Distribution
  5.3.3 Correlation Matrices and Variance-Covariance Matrices
  5.3.4 Points on a Sphere
  5.3.5 Two-Way Tables
  5.3.6 Other Specific Multivariate Distributions
  5.3.7 Families of Multivariate Distributions
 5.4
After the guide table is set up, Algorithm 4.3 generates a random number from the given distribution.

Algorithm 4.3 Sampling a Discrete Random Variate Using the Chen and Asau Guide Table Method

1. Generate u from a U(0, 1) distribution, and set i = ⌈un⌉.
2. Set x = gi + 1.
3. While Σ_{k=1}^{x−1} pk > u, set x = x − 1.

Efficiency of the Inverse CDF for Discrete Distributions

Rather than using a stored table of the mass points of the distribution, we may seek other efficient methods of searching for the x in equation (4.2). The search can often be improved by knowledge of the relative magnitude of the probabilities of the points. The basic idea is to begin at a point with a high probability of satisfying the relation (4.2). Obviously, the mode is a good place to begin the search, especially if the probability at the mode is quite high.
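A minimal Python sketch of both the setup of the guide table and the sampling step of Algorithm 4.3 might look like the following. The function names and the 0-based indexing are illustrative choices, not from the book; one guide entry per mass point (m = k) is a common setting.

```python
import bisect
import random

def make_guide_table(p):
    """Build cumulative probabilities and a guide table for mass points 1..k.

    p is the list of probabilities; one guide entry per mass point (m = k)."""
    k = len(p)
    cdf = []
    s = 0.0
    for pi in p:
        s += pi
        cdf.append(s)
    # g[i] (0-based) = index of the first mass point whose CDF reaches (i+1)/k
    g = [min(bisect.bisect_left(cdf, (i + 1) / k), k - 1) for i in range(k)]
    return cdf, g

def guide_table_draw(cdf, g):
    """One deviate by the guide-table search of Algorithm 4.3."""
    k = len(cdf)
    u = random.random()
    i = int(u * k)               # the guide cell containing u
    x = g[i]                     # starting point taken from the guide table
    while x > 0 and cdf[x - 1] > u:
        x -= 1                   # step down until the CDF interval brackets u
    return x + 1                 # deliver the mass point, labeled 1..k
```

For example, `cdf, g = make_guide_table([0.1, 0.2, 0.3, 0.4])` followed by `guide_table_draw(cdf, g)` returns 1, 2, 3, or 4 with the given probabilities, with an expected search length of only about one step.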
For many discrete distributions of interest, there may be a simple recursive relationship between the probabilities of adjacent mass points:

p(x) = f(p(x − 1))   for x > x0,

where f is some simple function (and we assume that the mass points differ by 1, and x0 is the smallest value with positive mass). In the Poisson distribution (see page 188), for example, p(x) = θp(x − 1)/x for x > 0. For this case, Kemp (1981) describes two approaches. One is a "build-up search" method in which the CDF is built up by the recursive computation of the mass probabilities. This is Algorithm 4.4.

Algorithm 4.4 Build-Up Search for Discrete Distributions

0. Set t = p(x0).
1. Generate u from a U(0, 1) distribution, and set x = x0, px = t, and s = px.
2. If u ≤ s, then
   2.a. deliver x;
   otherwise,
   2.b. set x = x + 1, px = f(px), and s = s + px, and return to step 2.

The second method that uses the recursive evaluation of probabilities to speed up the search is a "chop-down" method in which the generated uniform variate is decreased by an amount equal to the CDF. This method is given in Algorithm 4.5.

Algorithm 4.5 Chop-Down Search for Discrete Distributions

0. Set t = p(x0).
1. Generate u from a U(0, 1) distribution, and set x = x0 and px = t.
2. If u ≤ px, then
   2.a. deliver x;
   otherwise,
   2.b. set u = u − px, x = x + 1, and px = f(px), and return to step 2.

Either of these methods could be modified to start at some other point, such as the mode.
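For the Poisson case just cited, with the recursion p(x) = θp(x − 1)/x, a chop-down search (Algorithm 4.5) can be sketched as below. This is only an illustration; it is practical only for modest θ because p(0) = exp(−θ) underflows for large θ.

```python
import math
import random

def poisson_chopdown(theta):
    """Chop-down search (Algorithm 4.5) for one Poisson(theta) deviate,
    using the recursion p(x) = theta * p(x - 1) / x."""
    u = random.random()
    x = 0
    px = math.exp(-theta)     # p(0); underflows for large theta
    while u > px:
        u -= px               # chop the uniform down by the current mass
        x += 1
        px *= theta / x       # recursive evaluation of p(x)
    return x
```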
Interpolating in Tables

Often, for a continuous random variable, we may have a table of values of the cumulative distribution function but not have a function representing the CDF over its full range. This situation may arise in applications in which a person familiar with the process can assign probabilities for the variable of interest yet may be unwilling to assume a particular distributional form. One approach to this problem is to fit a continuous function to the tabular values and then use the inverse CDF method on the interpolant. The simplest interpolating function, of course, is the piecewise linear function, but second- or third-degree polynomials may give a better fit. It is important, however, that the interpolant be monotone. Guerra, Tapia, and Thompson (1976) describe a scheme for approximating the CDF based on an interpolation method of Akima (1970). Their procedure is implemented in the IMSL routine rngct.

Multivariate Distributions

The inverse CDF method does not apply to a multivariate distribution, although marginal and conditional univariate distributions can be used in an inverse CDF method to generate multivariate random variates. If the CDF of the multivariate random variable (X1, X2, . . . , Xd) is decomposed as

PX1X2···Xd(x1, x2, . . . , xd) = PX1(x1) PX2|X1(x2|x1) · · · PXd|X1X2···Xd−1(xd|x1, x2, . . . , xd−1),

and if the functions are invertible, the inverse CDF method is applied sequentially using independent realizations of a U(0, 1) random variable, u1, u2, . . . , ud:

x1 = P⁻¹X1(u1),
x2 = P⁻¹X2|X1(u2),
. . .
xd = P⁻¹Xd|X1X2···Xd−1(ud).

The modifications of the inverse CDF for discrete random variables described above can be applied if necessary.
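As a sketch of this sequential use of conditional inverse CDFs, consider a hypothetical bivariate distribution (chosen only for illustration; it is not an example from the book) in which X1 is standard exponential and, given X1 = x1, X2 is exponential with rate 1 + x1:

```python
import math
import random

def bivariate_by_conditionals():
    """Sequential inverse CDF for a hypothetical bivariate distribution:
    X1 ~ exponential(1); X2 | X1 = x1 ~ exponential with rate 1 + x1."""
    u1, u2 = random.random(), random.random()
    x1 = -math.log(1.0 - u1)                # inverse CDF of X1
    x2 = -math.log(1.0 - u2) / (1.0 + x1)   # inverse CDF of X2 given X1 = x1
    return x1, x2
```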
4.2 Decompositions of Distributions
It is often useful to break up the range of the distribution of interest using one density over one subrange and another density over another subrange. More generally, we may represent the distribution of interest as a mixture distribution that is composed of proportions of other distributions. Suppose that the probability density or probability function of the random variable of interest,
p(·), can be represented as

p(x) = Σ_{j=1}^{k} wj pj(x),    (4.6)

where the pj(·) are density functions or probability functions of random variables, the union of whose support is the support of the random variable of interest. We require

wj ≥ 0 and Σ_{j=1}^{k} wj = 1.
The random variable of interest has a mixture distribution. If the pj are such that the pairwise intersections of the supports of the distributions are all null, the mixture is a stratification. To generate a random deviate from a mixture distribution, first use a single uniform to select the component distribution, and then generate a deviate from it. The mixture can consist of any number of terms. To generate a sample of n random deviates from a mixture distribution of d distributions, consider the proportions to be the parameters of a d-variate multinomial distribution. The first step is to generate a single multinomial deviate, and then generate the required number of deviates from each of the component distributions. Any decomposition of p into the sum of nonnegative integrable functions yields the decomposition in equation (4.6). The nonnegative wi are chosen to sum to 1. For example, suppose that a distribution has density p(x), and for some constant c, p(x) ≥ c over (a, b). Then, the distribution can be decomposed into a mixture of a uniform distribution over (a, b) with proportion c(b − a) and some leftover part, say g(x). Now, g(x)/(1 − c(b − a)) is a probability density function. To generate a deviate from p: with probability c(b − a), generate a deviate from U(a, b); otherwise, generate a deviate from the density
g(x)/(1 − c(b − a)).
If c(b − a) is close to 1, we will generate from the uniform distribution most of the time, so even if it is difficult to generate from g(x)/(1 − c(b − a)), this decomposition of the original distribution may be useful. Another way of forming a mixture distribution is to consider a density similar to equation (4.6) that is a conditional density, p(x|y) = yp1 (x) + (1 − y)p2 (x),
where y is the realization of a Bernoulli random variable, Y. If Y takes a value of 1 with probability w1/(w1 + w2), then the density in equation (4.6) is the marginal density. This conditional distribution yields

pX(x) = ∫ pX,Y(x, y) dy
      = Σ_y pX|Y=y(x) Pr(Y = y)
      = w1 p1(x) + w2 p2(x),

as in equation (4.6). More generally, for any random variable X with a distribution parameterized by θ, we can think of the parameter as being the realization of a random variable Θ. Some common distributions result from mixing other distributions; for example, if the gamma distribution is used to generate the parameter in a Poisson distribution, a negative binomial distribution is formed. Mixture distributions are often useful in their own right; for example, the beta-binomial distribution (see page 187) can be used to model overdispersion.
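A minimal sketch of generating one deviate from a mixture, using a single uniform to select the component as described above; the weights and the two normal components in the usage line are illustrative only, not an example from the text:

```python
import random

def mixture_deviate(weights, samplers):
    """Draw one deviate from the mixture (4.6): pick component j with
    probability w_j, then sample from that component."""
    u = random.random()
    cum = 0.0
    for w, sampler in zip(weights, samplers):
        cum += w
        if u <= cum:
            return sampler()
    return samplers[-1]()   # guard against floating-point rounding

# Illustrative usage: 0.3 N(0, 1) + 0.7 N(5, 2)
x = mixture_deviate([0.3, 0.7],
                    [lambda: random.gauss(0.0, 1.0),
                     lambda: random.gauss(5.0, 2.0)])
```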
4.3 Transformations that Use More than One Uniform Deviate
Methods for generating random deviates by first decomposing the distribution of interest require the use of more than one uniform deviate for each deviate from the target distribution. Most other methods discussed in this chapter also require more than one uniform deviate for each deviate of interest. For such methods we must be careful to avoid any deleterious effects of correlations in the underlying uniform generator. An example of a short-range correlation occurs in the use of a congruential generator, xi ≡ axi−1 mod m, when xi−1 is extremely small. In this case, the value of xi is just axi−1 with no modular reduction. A small value of xi−1 may correspond to some extreme intermediate value in one of the constituent distributions in the decomposition of the density in equation (4.6). Because xi = axi−1 , when xi is used to complete the transformation to the variate of interest, it may happen that the extreme values of that variate do not cover their appropriate range. As a simple example, consider a method for generating a variate from a double exponential distribution. One way to do this is to use one uniform variate to generate an exponential variate (using one of the methods that we discuss below) and then use a second uniform variate to decide whether to change the sign of the exponential variate (with probability 1/2). Suppose that the method for generating an exponential variate yields an extremely large value if the underlying uniform variate is extremely small. (The method given
by equation (5.10) on page 176 does this.) If the next uniform deviate from the basic generator is used to determine whether to change the sign, it may happen that all of the extreme double exponentials generated have the same sign.

Many such problems arise because of a poor uniform generator; a particular culprit is a multiplicative congruential generator with a small multiplier. Use of a high-quality uniform generator generally solves the problem. A more conservative approach may be to use a different uniform generator for each uniform deviate used in the generation of a single nonuniform deviate. For this to be effective, each generator must be of high quality, of course.

Because successive numbers in a quasirandom sequence are constructed so as to span a space systematically, such sequences generally should not be used when more than one uniform deviate is transformed into a single deviate from another distribution. The autocorrelations in the quasirandom sequence may prevent certain ranges of values of the transformations from being realized.

A common way in which uniform deviates are transformed to deviates from nonuniform distributions is to use one uniform random number to make a decision about how to use another uniform random number. The decision is often based on a comparison of two floating-point numbers. In rare cases, because of slight differences in rounding to a finite precision, this comparison may result in different decisions in different computer environments. The different decisions can result in the generation of different output streams from that point on. Our goal of completely portable random number generators (Section 1.11) may not be achieved when comparisons are made between two floating-point numbers that might differ in the least significant bits on different systems.
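The double exponential method mentioned above can be sketched as follows. The exponential step here uses plain inversion, which may differ in form from equation (5.10) in the book; this is a hedged illustration of the two-uniform scheme, not the book's exact method.

```python
import math
import random

def double_exponential():
    """One uniform gives an exponential deviate by inversion;
    a second uniform chooses the sign (probability 1/2 each)."""
    u1 = random.random()
    e = -math.log(1.0 - u1)      # exponential(1) by the inverse CDF
    u2 = random.random()
    return e if u2 < 0.5 else -e
```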
4.4 Multivariate Uniform Distributions with Nonuniform Marginals
Suppose that pX is a continuous probability density function, and consider the set

S = {(x, u), s.t. 0 ≤ u ≤ pX(x)}.    (4.7)

Let (X, U) be a bivariate random variable with uniform distribution over S. Its density function is

pXU(x, u) = IS(x, u).    (4.8)

The conditional distribution of U given X = x is U(0, pX(x)), and the conditional distribution of X given U = u is also uniform with density

pX|U(x|u) = I{t, s.t. pX(t)≥u}(x).

The important fact, which we see by integrating u out of the density in equation (4.8), is that the marginal distribution of X has density pX. This can be seen in Figure 4.3, where the points are uniformly distributed over S, but the marginal histogram of the x values corresponds to the density pX.
Figure 4.3: Support of a Bivariate Uniform Random Variable (X, U) Having a Marginal with Density p(x)

These facts form the basis of methods of generating random deviates from various nonuniform distributions. The effort in these methods is expended in getting the bivariate uniform points over the region S. In most cases, this is done by generating bivariate points uniformly over some larger region and then rejecting those points that are not in the region S. This same approach is valid if the random variable X is a vector. In this case, we would identify a higher-dimensional region S with a scalar u and a vector x corresponding respectively to a scalar uniform random variable and the vector random variable X.
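A small sketch of this idea: generate points uniformly over a bounding box and keep those falling in S; the x coordinates of the kept points then have marginal density pX. The triangular density in the usage line is an illustrative choice, not one from the text.

```python
import random

def uniform_over_S(p, xlo, xhi, pmax):
    """Generate (x, u) uniformly over S = {(x, u) : 0 <= u <= p(x)} by
    rejecting points drawn from the box [xlo, xhi] x [0, pmax].
    The density p is assumed to be bounded above by pmax on (xlo, xhi)."""
    while True:
        x = random.uniform(xlo, xhi)
        u = random.uniform(0.0, pmax)
        if u <= p(x):
            return x, u

# Illustrative usage with the triangular density p(x) = 2x on (0, 1)
x, u = uniform_over_S(lambda t: 2.0 * t, 0.0, 1.0, 2.0)
```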
4.5 Acceptance/Rejection Methods
To generate realizations of a random variable X, an acceptance/rejection method makes use of realizations of another random variable Y having probability density gY similar to the probability density of X, pX . The basic idea is that selective subsamples from samples from one distribution are stochastically equivalent to samples from a different distribution. The acceptance/rejection technique is one of the most important methods in random number generation, and it occurs in many variations.
Majorizing the Density

In the basic form of the method, to generate a deviate from a distribution with density pX, a random variable Y is chosen so that we can easily generate realizations of it and so that its density gY can be scaled to majorize pX using some constant c; that is, so that cgY(x) ≥ pX(x) for all x. The density gY is called the majorizing density, and cgY is called the majorizing function. The majorizing function is also called the "envelope" or the "hat function". The majorizing density is also sometimes called the "trial density", the "proposal density", or the "instrumental density".

There are many variations of the acceptance/rejection method. The method described here uses a sequence of i.i.d. variates from the majorizing density. It is also possible to use a sequence from a conditional majorizing density. A method using a nonindependent sequence is called a Metropolis method (and there are variations of these, with their own names, as we see below). Unlike the inverse CDF method, the acceptance/rejection method applies immediately to multivariate random variables, although, as we will see, the method may not be very efficient in high dimensions.

Algorithm 4.6 The Acceptance/Rejection Method to Convert Uniform Random Numbers

1. Generate y from the distribution with density function gY.
2. Generate u from a U(0, 1) distribution.
3. If u ≤ pX(y)/cgY(y), then
   3.a. take y as the desired realization;
   otherwise
   3.b. return to step 1.

It is easy to see that the random number delivered by Algorithm 4.6 has a density pX. (In Exercise 4.2, page 160, you are asked to write the formal proof.) The pairs (u, y) that are accepted follow a bivariate uniform distribution over the region S in equation (4.7).

Figure 4.4 illustrates the functions used in the acceptance/rejection method. (Figure 4.4 shows the same density used in Figure 4.3 with a different scaling of the axes. The density is the beta distribution with parameters 3 and 2. In Exercise 4.3, page 160, you are asked to write a program implementing the acceptance/rejection method with the majorizing density shown.) The acceptance/rejection method can be visualized as choosing a subsequence from a sequence of independently and identically distributed (i.i.d.) realizations from the distribution with density gY in such a way that the subsequence has density pX, as shown in Figure 4.5.

If we ignore the time required to generate y from the dominating density gY, the closer cgY(x) is to pX(x) (that is, the closer c is to its lower bound of 1), the faster the acceptance/rejection algorithm will be.
Figure 4.4: The Acceptance/Rejection Method to Convert Uniform Random Numbers

The proportion of acceptances to the total number of trials is the ratio of the area marked "A" in Figure 4.4 to the total area of region "A" and region "R". Because pX is a density, the area of "A" is 1, so the relevant proportion is

1/(r + 1),    (4.9)

where r is the area between the curves. This ratio only relates to the efficiency of the acceptance; other considerations in the efficiency, of course, involve the amount of computation necessary to generate from the majorizing density. The random variable corresponding to the number of passes through the steps of Algorithm 4.6 until the desired variate is delivered has a geometric distribution (equation (5.21) on page 189, except beginning at 1 instead of 0) with parameter π = 1/(r + 1).

Figure 4.5: Acceptance/Rejection (the i.i.d. candidates yi, yi+1, . . . from gY are each accepted or rejected; the accepted values form the i.i.d. sequence xj, xj+1, . . . from pX)

Selection of a majorizing function involves the principles of function approximation with the added constraint that the approximating function be a
probability density from which it is easy to generate random variates. Often, gY is chosen to be a very simple density, such as a uniform or a triangular density. When the dominating density is uniform, the acceptance/rejection method is similar to the "hit-or-miss" method (see Exercise 7.2, page 271).

The acceptance/rejection method can be used for multivariate random variables, in which case the majorizing distribution is also multivariate. For higher dimensions, however, the acceptance ratio (4.9) may be very small. Consider the use of a normal with mean 0 and variance 2 as a majorizing density for a normal with mean 0 and variance 1, as shown in Figure 4.6.

Figure 4.6: Normal (0, 1) Density with a Normal (0, 2) Majorizing Density

A majorizing density like this with a shape more closely approximating that of the target density is more efficient. (This majorizing function is just chosen for illustration. An obvious problem in this case would be that if we could generate deviates from the N(0, 2) distribution, then we could generate ones from the N(0, 1) distribution, and we would not use this method.) In the one-dimensional case, as shown in Figure 4.6, the acceptance region is the area under the lower curve, and the rejection region is the thin shell between the two curves. The acceptance proportion (4.9) is 1/√2. (Note that c = √2.) In higher dimensions, even a thin shell contains most of the volume, so the rejection proportion would be high. In d dimensions, use of a multivariate normal with a diagonal variance-covariance matrix with all entries equal to 2 as a majorizing density to generate a multivariate normal with a diagonal variance-covariance matrix with all entries equal to 1 would have an acceptance proportion of only 1/(√2)^d.
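As a concrete sketch of Algorithm 4.6, consider the beta(3, 2) density mentioned earlier, whose density is p(x) = 12x²(1 − x). This sketch uses a constant majorizer, gY uniform on (0, 1) with c = p(2/3) = 16/9, rather than the majorizing density shown in Figure 4.4:

```python
import random

def beta32():
    """Acceptance/rejection (Algorithm 4.6) for the beta(3, 2) density
    p(x) = 12 x^2 (1 - x), majorized by c * U(0, 1) with c = 16/9,
    the maximum of p (attained at x = 2/3)."""
    c = 16.0 / 9.0
    while True:
        y = random.random()                       # candidate from gY = U(0, 1)
        u = random.random()
        if u <= 12.0 * y * y * (1.0 - y) / c:     # test u <= p(y) / (c * gY(y))
            return y
```

The expected number of trials per delivered deviate is c = 16/9, about 1.8, consistent with the ratio (4.9).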
Figure 4.7: The Acceptance/Rejection Method with a Squeeze Function

Reducing the Computations in Acceptance/Rejection: Squeeze Functions

A primary concern in reducing the number of computations in the acceptance/rejection method is to ensure that the proportion of acceptances is high; that is, that the ratio (4.9) is close to one. Two other issues are the difficulty in generating variates from the majorizing density and the speed of the computations to determine whether to accept or to reject.

If the target density, p, is difficult to evaluate, an easy way of speeding up the process is to use simple functions that bracket p to avoid the evaluation of p with a high probability. This method is called a "squeeze" (see Marsaglia, 1977). This allows quicker acceptance. The squeeze function is often a linear or piecewise linear function. The basic idea is to do pretests using simpler functions. Most algorithms that use a squeeze function only use one below the density of interest. Figure 4.7 shows a piecewise linear squeeze function for the acceptance/rejection setup of Figure 4.4. For a given trial value y, before evaluating pX(y) we may evaluate the simpler s(y). If u ≤ s(y)/cgY(y), then u ≤ pX(y)/cgY(y), so we can accept without computing pX(y). Pairs (y, u) lying in the region marked "Q" allow for quick acceptance.

The efficiency of an acceptance/rejection method with a squeeze function depends not only on the area between the majorizing function and the target density, as in equation (4.9), but also on the difference in the total area of the acceptance region, which is 1, and the area under the squeeze function (that is, the area of the region marked "Q"). The closer this area is to 1, the more effective is the squeeze. These ratios of areas relate only to the efficiency of the
acceptance and the quick acceptance. Other considerations in the efficiency, of course, involve the amount of computation necessary to generate from the majorizing density and the amount of computation necessary to evaluate the squeeze function, which, it is presumed, is very small.

Another procedure for making the acceptance/rejection decision with fewer computations is the "patchwork" method of Kemp (1990). In this method, the unit square is divided into rectangles that correspond to pairs of uniform distributions that would lead to acceptance, rejection, or lack of decision. The full evaluations for the acceptance/rejection algorithm need be performed only if the pair of uniform deviates to be used are in a rectangle of the latter type.

For a density that is nearly linear (or nearly linear over some range), Marsaglia (1962) and Knuth (1998) describe some methods for efficient generation. These methods make use of simple methods for generating from a density that is exactly linear. Use of an inverse CDF method for a distribution with a density that is exactly linear over some range involves a square root operation, but another simple way of generating from a linear density is to use the maximum order statistic of a sample of size two from a uniform distribution; that is, independently generate two U(0, 1) variates, u1 and u2, and use max(u1, u2). (Order statistics from a uniform distribution have a beta distribution; see Section 6.4.1, page 221.)

Following Knuth's development, suppose that, as in Figure 4.8, the density over the interval (s, s + h) is bounded by two parallel lines, l1(x) = a − b(x − s)/h and l2(x) = b − b(x − s)/h. Consider the density p(x) shown in Figure 4.8. Algorithm 4.7, which is Knuth's method, yields deviates from the distribution with density p. Notice the use of the maximum of two uniform deviates to generate from an exactly linear density. By determining the probability that the resulting deviate falls in any given interval, it is easy to see that the algorithm yields deviates from the given density. You are asked to show this formally in Exercise 4.8, page 161. (The solution to the exercise is given in Appendix B.)

Algorithm 4.7 Sampling from a Nearly Linear Density

1. Generate u1 and u2 independently from a U(0, 1) distribution. Set u = min(u1, u2), v = max(u1, u2), and x = s + hu.
2. If v ≤ a/b, then
   2.a. go to step 3;
   otherwise,
   2.b. if v > u + p(x)/b, go to step 1.
3. Deliver x.
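A direct transcription of Algorithm 4.7 into Python follows; the function signature is an illustrative choice, and p must actually lie between the two parallel lines on (s, s + h) for the method to be valid.

```python
import random

def nearly_linear(p, s, h, a, b):
    """Knuth's method (Algorithm 4.7) for a nearly linear density p on
    (s, s + h), where a - b(x - s)/h <= p(x) <= b - b(x - s)/h."""
    while True:
        u1, u2 = random.random(), random.random()
        u, v = min(u1, u2), max(u1, u2)
        x = s + h * u
        if v <= a / b:             # quick acceptance below the lower line
            return x
        if v <= u + p(x) / b:      # full test against the density
            return x
        # otherwise reject and try again

# Degenerate but valid check case: the exactly linear density p(x) = 2(1 - x)
# on (0, 1) with a = b = 2, where the quick-accept test always succeeds
x = nearly_linear(lambda t: 2.0 * (1.0 - t), 0.0, 1.0, 2.0, 2.0)
```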
Figure 4.8: A Nearly Linear Density

Usually, when we take advantage of the fact that a density is nearly linear, it is not the complete density that is linear, but rather the nearly linear density is combined with other densities to form the density of interest. The density shown in Figure 4.8 may be the density over some interval (s, s + h), so that ∫_s^{s+h} p(x) dx < 1. (See the discussion of mixtures of densities in Section 4.2.)

For densities that are concave, we can also very easily form linear majorizing and linear squeeze functions. The majorizing function is a polygon of tangents and the squeeze function is a polygon of secants, as shown in Figure 4.9 (for the density p(x) = (3/4)(1 − x²) over [−1, 1]). Any number of polygonal sections could be used in this approach. The tradeoffs involve the amount of setup and housekeeping for the polygonal sections and the proportion of total rejections and the proportion of easy acceptances. The formation of the majorizing and squeeze functions can be done adaptively or sequentially, as we discuss on page 151.

Figure 4.9: Linear Majorizing and Squeeze Functions for a Concave Density

Acceptance/Rejection for Discrete Distributions

There are various ways that acceptance/rejection can be used for discrete distributions. One advantage of these methods is that they can be easily adapted to changes in the distribution. Rajasekaran and Ross (1993) consider the discrete random variable Xs such that

Pr(Xs = xi) = psi = asi/(as1 + as2 + · · · + ask),    i = 1, . . . , k.

(If Σ_{i=1}^{k} asi = 1, the numerator asi is the ordinary probability psi at the mass point i.) Suppose that there exists an a∗i such that asi ≤ a∗i for s = 1, 2, . . ., and b > 0 such that Σ_{i=1}^{k} asi ≥ b for s = 1, 2, . . .. Let

a∗ = max{a∗i},

and let

Psi = asi/a∗    for i = 1, . . . , k.

The generation method for Xs is shown in Algorithm 4.8.

Algorithm 4.8 Acceptance/Rejection Method for Discrete Distributions

1. Generate u from a U(0, 1) distribution, and let i = ⌈ku⌉.
2. Let r = i − ku.
3. If r ≤ Psi, then
   3.a. take i as the desired realization; otherwise,
   3.b. return to step 1.

Suppose that for the random variable Xs+1, ps+1,i ≠ psi for some i. (Of course, if this is the case for mass point i, it is also necessarily the case for some other mass point.) For each mass point for which the probability changes, reset Ps+1,i to as+1,i/a∗ and continue with Algorithm 4.8. Rajasekaran and Ross (1993) also gave two other acceptance/rejection type algorithms for discrete distributions that are particularly efficient for use with distributions that may be changing. The other algorithms require slightly more preprocessing time but yield faster generation times than Algorithm 4.8.
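A sketch of Algorithm 4.8 in Python; the table P holds the scaled values Psi = asi/a∗, and the shift to u in (0, 1] is an implementation detail added here so that the candidate index stays in 1, . . . , k:

```python
import math
import random

def discrete_ar(P):
    """Algorithm 4.8: P[i-1] = a_si / a* for mass points i = 1, ..., k.
    A single uniform supplies both the candidate point and the test value."""
    k = len(P)
    while True:
        u = 1.0 - random.random()   # u in (0, 1] so that ceil(ku) is in 1..k
        i = math.ceil(k * u)        # candidate mass point
        r = i - k * u               # leftover fraction, uniform on [0, 1)
        if r <= P[i - 1]:
            return i
```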
Variations of Acceptance/Rejection

There are many variations of the basic acceptance/rejection method, and the idea of selection of variates from one distribution to form a sample from a different distribution forms the basis of several other methods discussed in this chapter, such as formation of ratios of uniform deviates, use of the characteristic function, and various uses of Markov chains.

Wallace (1976) introduced a modified acceptance/rejection method called transformed rejection. In the transformed acceptance/rejection method, the steps of Algorithm 4.6 are combined and rearranged slightly. Let G be the CDF corresponding to the dominating density g. Let H(x) = G⁻¹(x), and let h(x) = dH(x)/dx. If v is a U(0, 1) deviate, step 1 in Algorithm 4.6 is equivalent to y = H(v), so we have Algorithm 4.9.

Algorithm 4.9 The Transformed Acceptance/Rejection Method

1. Generate u and v independently from a U(0, 1) distribution.
2. If u ≤ p(H(v))h(v)/c, then
   2.a. take H(v) as the desired realization;
   otherwise,
   2.b. return to step 1.

Marsaglia (1984) describes a method very similar to the transformed acceptance/rejection method: use ordinary acceptance/rejection to generate a variate x from the density proportional to p(H(·))h(·) and then return H(x). The choice of H is critical to the efficiency of the method, of course. It should be close to the inverse of the CDF of the target distribution, P⁻¹. Marsaglia called this the exact-approximation method. Devroye (1986a) calls the method almost exact inversion.

Other Applications of Acceptance/Rejection

The acceptance/rejection method can often be used to evaluate an elementary function at a random point. Suppose, for example, that we wish to evaluate tan(πU) for U distributed as U(−.5, .5). A realization of tan(πU) can be simulated by generating u1 and u2 independently from U(−1, 1), checking if u1² + u2² ≤ 1, and, if so, delivering u1/u2 as tan(πu). (To see this, think of u1 and u2 as sine and cosine values.) Von Neumann (1951) gives an acceptance/rejection method for generating sines and cosines of random angles. An example of evaluating a logarithm can be constructed by use of the equivalence of an inverse CDF method and an acceptance/rejection method for sampling an exponential random deviate. (The methods are equivalent in a stochastic sense; they are both valid, but they will not yield the same stream of deviates.) These methods of evaluating deterministic functions are essentially the same as using the "hit-or-miss" Monte Carlo method described in Exercise 7.2 on page 271 to evaluate an integral.
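The tan(πU) device just described, as a short sketch; the guard against u2 = 0 is an added implementation detail:

```python
import random

def random_tangent():
    """Simulate tan(pi * U), U ~ U(-0.5, 0.5), without calling tan():
    accept (u1, u2) uniform in the unit disk and return the ratio u1/u2."""
    while True:
        u1 = random.uniform(-1.0, 1.0)
        u2 = random.uniform(-1.0, 1.0)
        if u1 * u1 + u2 * u2 <= 1.0 and u2 != 0.0:
            return u1 / u2
```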
Generally, if reasonable numerical software is available for evaluating special functions, it should be used rather than using Monte Carlo methods to estimate the function values.

Quality and Portability of Acceptance/Rejection Methods

Acceptance/rejection methods, like any method for generating nonuniform random numbers, are dependent on a good source of uniform deviates. Hörmann and Derflinger (1993) illustrate that small values of the multiplier in a congruential generator for the uniform deviates can result in poor quality of the output from an acceptance/rejection method. Of course, we have seen that small multipliers are not good for generating uniform deviates. (See the discussion about Figure 1.3, page 16.) Hörmann and Derflinger rediscover the method of expression (1.23) and recommend using it so that larger multipliers can be used in the linear congruential generator.

Acceptance/rejection methods generally use two uniform deviates to decide whether to deliver one variate of interest. In implementing an acceptance/rejection method, we must be aware of the cautionary note in Section 4.3, page 111. If the y in Algorithm 4.6 is special (extreme, perhaps) and results from a special value from the uniform generator, the u generated subsequently may also be special and may almost always result in the same decision to accept or to reject. Thus, we may get either an abundance or a deficiency of special values for the distribution of interest.

Because of the comparison of floating-point numbers (that occurs in step 3 of Algorithm 4.6), there is a chance that an acceptance/rejection method may yield different streams on different computer systems or in implementations in different precisions. Even if the computations are carried out correctly, the program is inherently nonportable, and the results may not be strictly reproducible because if a comparison on one system at a given precision results in acceptance and the comparison on another system results in rejection, the two output streams will be different. At best, the streams will be the same except for a few differences; at worst, however, because of how the output is used, the results will be different beginning at the point at which the acceptance/rejection decision is different. If the decision results in the generation of another random number (as in Algorithm 4.10 on page 126), the two output streams can become completely different.

Acceptance/Rejection for Multivariate Distributions

The acceptance/rejection method is one of the most widely applicable methods for random number generation. It is used in many different forms, often in combination with other methods. It is clear from the description of the algorithm that the acceptance/rejection method applies equally to multivariate distributions. (The uniform random number is still univariate, of course.)
As we have mentioned, however, for higher dimensions, the rejection proportion may be high, and thus the efficiency of the acceptance/rejection method may be low.

Example of Acceptance/Rejection: A Bivariate Gamma Distribution

Becker and Roux (1981) defined a bivariate extension of the gamma distribution that serves as a useful model for failure times for two related components in a system. (The model is also a generalization of a bivariate exponential distribution introduced by Freund, 1961; see Steel and Le Roux, 1987.) The probability density is given by

pX1X2(x1, x2) =

  λ2 (Γ(α1) Γ(α2) β1^α1 β2^α2)⁻¹ x1^(α1−1) (λ2(x2 − x1) + x1)^(α2−1)
     × exp(−(1/β1 + 1/β2 − λ2/β2) x1 − (λ2/β2) x2)     for 0 ≤ x1 ≤ x2,

  λ1 (Γ(α1) Γ(α2) β1^α1 β2^α2)⁻¹ x2^(α2−1) (λ1(x1 − x2) + x2)^(α1−1)
     × exp(−(1/β1 + 1/β2 − λ1/β1) x2 − (λ1/β1) x1)     for 0 ≤ x2 < x1,

  0     elsewhere.     (4.10)

The density for α1 = 4, α2 = 3, β1 = 3, β2 = 1, λ1 = 3, and λ2 = 2 is shown in Figure 4.10.

It is a little more complicated to determine a majorizing density for this distribution. First of all, not many bivariate densities are familiar to us. The density must have support over the positive quadrant. A bivariate normal density might be tried, but the exp(−(u1x1 + u2x2)²) term in the normal density dies out more rapidly than the exp(−v1x1 − v2x2) term in the gamma density. The normal cannot majorize the gamma in the limit. We may be concerned about covariance of the variables in the bivariate gamma distribution, but the fact that the variables have nonzero covariance is of little concern in using the acceptance/rejection method. The main thing, of course, is that we determine a majorizing density so that the probability of acceptance is high.

We can use a bivariate density of independent variables as the majorizing density. The density would be the product of two univariate densities. A bivariate distribution of independent exponentials might work. Such a density has a maximum at (0, 0), however, and there would be a large volume between the bivariate gamma density and the majorizing function formed from a bivariate exponential density. We can reduce this volume by choosing a bivariate uniform over the rectangle with corners (0, 0) and (z1, z2). Our majorizing
Figure 4.10: A Bivariate Gamma Density, Equation (4.10)

density then is composed of two densities, a bivariate exponential,

g1(y1, y2) = (1/v) exp(−y1/θ1 − y2/θ2)   for y1 > z1 and y2 > 0, or y1 > 0 and y2 > z2,
           = 0   elsewhere,    (4.11)

where the constant v is chosen to make g1 a density, and a bivariate uniform,

g2(y1, y2) = 1/(z1 z2)   for 0 < y1 ≤ z1 and 0 < y2 ≤ z2,
           = 0   elsewhere.    (4.12)

Next, we choose θ1 and θ2 so that the bivariate exponential density can majorize the bivariate gamma density. This requires that

1/θ1 ≥ max(1/β1 + 1/β2 − λ2/β2, λ1/β1),

with a similar requirement for θ2. Let us choose θ1 = 1 and θ2 = 2. Next, we choose z1 and z2 as the mode of the bivariate gamma density. This point is (4 1/3, 2). We now choose c so that cg1(z1, z2) ≥ p(z1, z2). The method is:

1. Generate u from a U(0, 1) distribution.
2. Generate (y1, y2) from a bivariate exponential density such as (4.11) except over the full range; that is, with v = θ1θ2.
3. If (y1, y2) is outside of the rectangle with corners (0, 0) and (z1, z2), then
   3.a. if u ≤ p(y1, y2)/cg1(y1, y2), then
        3.a.i. deliver (y1, y2); otherwise,
        3.a.ii. go to step 1;
   otherwise,
   3.b. generate (y1, y2) as bivariate uniform deviates in that rectangle, and if u ≤ p(y1, y2)/cg1(z1, z2), then
        3.b.i. deliver (y1, y2); otherwise,
        3.b.ii. go to step 1.

The majorizing density could be changed so that it is closer to the bivariate gamma density. In particular, instead of the uniform density over the rectangle with a corner on the origin, a pyramidal density that is closer to the bivariate gamma density could be used.
4.6 Mixtures and Acceptance Methods
In practice, in acceptance/rejection methods, the density of interest p and/or the majorizing density are often decomposed into mixtures. If the mixture for the density is a stratification, it may be possible to have simple majorizing and squeeze functions within each stratum. Ahrens (1995) suggested using a stratification into equal-probability regions (that is, the wj s in equation (4.6) are all constant) and then using constant majorizing and squeeze functions in each stratum. There is, of course, a tradeoff in gains in high probability of acceptance (because the majorizing function is close to the density) and/or in efficiency of the evaluation of the acceptance decision (because the squeeze function is close to the density) and the complexity introduced by the decomposition. Decomposition into regions where the density is nearly constant almost always will result in overall gains in efficiency. If the decomposition is into equal-probability regions, the random selection of the stratum is very fast. There are many ways in which mixtures can be combined with acceptance/rejection methods. Suppose that the density of interest, p, may be written as p(x) = w1 p1 (x) + w2 p2 (x), and suppose that there is a density g that majorizes w1 p1 ; that is, g(x) ≥ w1 p1 (x) for all x. Kronmal and Peterson (1981, 1984) consider this case and propose the following algorithm, which they call the acceptance/complement method.
Algorithm 4.10 The Acceptance/Complement Method to Convert Uniform Random Numbers

1. Generate y from the distribution with density function g.
2. Generate u from a U(0, 1) distribution.
3. If u > w1p1(y)/g(y), then generate y from the density p2.
4. Take y as the desired realization.

We discussed nearly linear densities and gave Knuth's algorithm for generating from such densities as Algorithm 4.7. Devroye (1986a) gives an algorithm for a special nearly linear density; namely, one that is almost flat. The method is based on a simple decomposition using the supremum of the density. (In practice, as we have indicated in discussing other techniques, this method would probably be used for a component of a density that has already been decomposed.) To keep the description simple, assume that the range of the random variable is (−1, 1) and that the density p satisfies

sup_x p(x) − inf_x p(x) ≤ 1/2

over that interval. Now, because p is a density, we have

0 ≤ inf_x p(x) ≤ 1/2 ≤ sup_x p(x)

and

sup_x p(x) ≤ 1.

Let p∗ = sup_x p(x), and decompose the target density into

p1(x) = p(x) − (p∗ − 1/2)

and

p2(x) = p∗ − 1/2.

The method is shown in Algorithm 4.11.

Algorithm 4.11 Sampling from a Nearly Flat Density

1. Generate u from U(0, 1).
2. Generate x from U(−1, 1).
3. If u > 2(p(x) − (p∗ − 1/2)), then generate x from U(−1, 1).
4. Deliver x.
Another variation on the general theme of acceptance/rejection applied to mixtures was proposed by Deák (1981) in what he called the "economical method". To generate a deviate from the density p using this method, an auxiliary density g is used, and an "excess area" and a "shortage area" are defined. The excess area is where g(x) > p(x), and the shortage area is where g(x) ≤ p(x). We define two functions p1 and p2:

    p1(x) = g(x) − p(x)   if g(x) − p(x) > 0,
          = 0             otherwise,

    p2(x) = p(x) − g(x)   if p(x) − g(x) ≥ 0,
          = 0             otherwise.

Now, we define a transformation T that will map the excess area into the shortage area in a way that will yield the density p. Such a T is not unique, but one transformation that will work is

    T(x) = min{ t, s.t. ∫_{−∞}^{x} p1(s) ds = ∫_{−∞}^{t} p2(s) ds }.
Algorithm 4.12 shows the method.

Algorithm 4.12 The Economical Method to Convert Uniform Random Numbers
1. Generate y from the distribution with density function g.
2. If p(y)/g(y) < 1, then
   2.a. generate u from a U(0, 1) distribution;
   2.b. if u > p(y)/g(y), then replace y with T(y).
3. Take y as the desired realization.

Using the representation of a discrete distribution that has k mass points as an equally weighted mixture of k two-point distributions, Deák (1986) develops a version of the economical method for discrete distributions. (See Section 4.8, page 133, on the alias method for additional discussion of two-point representations.)

Marsaglia and Tsang (1984) give a method that involves forming a decomposition of a density into horizontal slices with equal areas. For a unimodal distribution, they first form two regions, one on each side of the mode, prior to the slicing decomposition. They call a method that uses this kind of decomposition the "ziggurat method".

Marsaglia and Tsang (1998) also describe a decomposition and transformation that they called the "Monty Python method", in which the density (or a part of the density) is divided into three regions as shown in the left-hand plot in Figure 4.11. If the density of interest has already been decomposed
into a mixture, the part to be decomposed further, p(x), is assumed to have been scaled to integrate to 1. The density in Figure 4.11 may represent the right-hand side of a t distribution, for example, but the function shown has been scaled to integrate to 1. The support of the density is transformed if necessary to begin at 0. One region is now rotated and stretched to fit into an area within a rectangle above another region, as shown in the right-hand plot in Figure 4.11.
Figure 4.11: The Monty Python Decomposition Method

The key parameter in the Monty Python method is b, the length of the base of a rectangle that has an area of 1. The portion of the distribution represented by the density above 1/b between 0 and p⁻¹(1/b) (denote this point by a) is transformed into a region of equal area between a and b bounded from below by the function

    g(x) = 1/b − c p(b − x) − d,

where c and d are chosen so that the area is equal to the original and g(x) ≥ p(x) over (a, b). This implies that the tail area beyond b is equal to the area between p(x) and g(x). If a point (x, y), chosen uniformly over the rectangle, falls in region A or B, then x is delivered; if it falls in C, then b − x is delivered; otherwise, x is discarded, and a variate is generated from the tail of the distribution. The efficiency of this method obviously depends on a choice of b in which the decomposition minimizes the tail area. Marsaglia and Tsang (1998) suggest an improvement that may allow a better choice of b. Instead of the function g(x), a polynomial, say a cubic, is determined that satisfies the requirements of majorizing p(x) over (a, b) and has an area equal to the original area of C. This allows more flexibility in the choice of b. This is the same as the transformation in the exact-approximation method of Marsaglia referred to earlier.
4.7 Ratio-of-Uniforms Method
Kinderman and Monahan (1977) discuss a very useful relationship among random variables U, V, and V/U. If (U, V) is uniformly distributed over the set

    C = { (u, v), s.t. 0 ≤ u ≤ √(h(v/u)) },    (4.13)

where h is a nonnegative integrable function, then V/U has probability density proportional to h. Use of this relationship is called a ratio-of-uniforms method.

It is easy to see that this relationship holds. For U and V as given, their joint density is pUV(u, v) = IC(u, v)/c, where c is the area of C. Let X = U and Y = V/U. The Jacobian of the transformation is x, so the joint density of X and Y is pXY(x, y) = x IC(x, y)/c. Hence, we have

    pXY(x, y) = (x/c) I_(0, √h(y))(x),

and integrating out x, we get

    pY(y) = ∫_0^{√h(y)} (x/c) dx = h(y)/(2c).

In practice, we may choose a simple geometric region that encloses C, generate a uniform point in the rectangle, and reject a point that does not satisfy

    u ≤ √(h(v/u)).

The larger region enclosing C is called the majorizing region because it is similar to the region under the majorizing function in acceptance/rejection methods. The ratio-of-uniforms method is very simple to apply, and it can be quite fast. If h(x) and x²h(x) are bounded, a simple form of the majorizing region is the rectangle {(u, v), s.t. 0 ≤ u ≤ b, c ≤ v ≤ d}, where

    b = sup_x √(h(x)),
    c = inf_x x√(h(x)),
    d = sup_x x√(h(x)).

This yields the method shown in Algorithm 4.13.
Algorithm 4.13 Ratio-of-Uniforms Method (Using a Rectangular Majorizing Region for Continuous Variates)
1. Generate u and v independently from a U(0, 1) distribution.
2. Set u1 = bu and v1 = c + (d − c)v.
3. Set x = v1/u1.
4. If u1² ≤ h(x), then
   4.a. take x as the desired realization;
   otherwise,
   4.b. return to step 1.

Figure 4.12 shows a rectangular region and the area of acceptance for the same density used to illustrate the acceptance/rejection method in Figure 4.4. The full rectangular region as defined above has a very low proportion of acceptances in the example shown in Figure 4.12. There are many obvious ways of reducing the size of this region. A simple reduction would be to truncate the rectangle by the line v = u, as shown. Just as in other acceptance/rejection methods, there is a tradeoff between the effort to generate uniform deviates over a region with a high acceptance rate and the wasted effort of generating uniform deviates that will be rejected. The effort to generate only in the acceptance region is likely to be slightly greater than the effort to invert the CDF.

Figure 4.12: The Ratio-of-Uniforms Method (Same Density as in Figure 4.4)

Wakefield, Gelfand, and Smith (1991) give a generalization of the ratio-of-uniforms method by introducing a strictly increasing, differentiable function g that has the property g(0) = 0. Their method uses the fact that if (U, V) is uniformly distributed over the set

    C_{h,g} = { (u, v), s.t. 0 ≤ u ≤ g(c h(v/g′(u))) },

where c is a positive constant and h is a nonnegative integrable function as before, then V/g′(U) has a probability density proportional to h.

Ratio-of-Uniforms and Acceptance/Rejection

Stadlober (1990, 1991) considers the relationship of the ratio-of-uniforms method to the ordinary acceptance/rejection method and applies the ratio-of-uniforms method to discrete distributions. If (U, V) is uniformly distributed over the rectangle {(u, v), s.t. 0 ≤ u ≤ 1, −1 ≤ v ≤ 1}, and X = sV/U + a for any s > 0, then X has the density

    gX(x) = 1/(4s)            for a − s ≤ x ≤ a + s,
          = s/(4(x − a)²)     elsewhere,

and the conditional density of Y = U², given X, is

    gY|X(y|x) = 1               for a − s ≤ x ≤ a + s and 0 ≤ y ≤ 1,
              = (x − a)²/s²     for |x − a| > s and 0 ≤ y ≤ s²/(x − a)²,
              = 0               elsewhere.

The conditional distribution of Y given X = x is uniform on (0, 4s gX(x)), and the ratio-of-uniforms method is an acceptance/rejection method with a table mountain majorizing function.

Ratio-of-Uniforms for Discrete Distributions

Stadlober (1990) gives the modification of the ratio-of-uniforms method in Algorithm 4.14 for a general discrete random variable with probability function p(·).

Algorithm 4.14 Ratio-of-Uniforms Method for Discrete Variates
1. Generate u and v independently from a U(0, 1) distribution.
2. Set x = ⌊a + s(2v − 1)/u⌋.
3. Set y = u².
4. If y ≤ p(x), then
   4.a. take x as the desired realization;
   otherwise,
   4.b. return to step 1.

Ahrens and Dieter (1991) describe a ratio-of-uniforms algorithm for the Poisson distribution, and Stadlober (1991) describes one for the binomial distribution.

Improving the Efficiency of the Ratio-of-Uniforms Method

As we discussed on page 117, the efficiency of any acceptance/rejection method is reduced by three things:
• the effort required to generate the trial variates;
• the effort required to make the acceptance/rejection decision; and
• the proportion of rejections.
There are often tradeoffs among them. We have indicated how the proportion of rejections can be decreased by forming the majorizing region so that it is closer in shape to the acceptance region. This generally comes at the cost of more effort to generate trial variates. The increase in effort is modest if the majorizing region is a polygon. Leydold (2000) described a systematic method of forming polygonal majorizing regions for a broad class of distributions (T-concave distributions; see page 152).

The effort required to make the acceptance/rejection decision can be reduced in the same manner as a squeeze in acceptance/rejection. If a convex polygonal set interior to the acceptance region can be defined, then acceptance decisions can be made quickly by comparisons with linear functions. For a class of distributions, Leydold (2000) described a systematic method for forming interior polygons from construction points defined by the sides of a polygonal majorizing region.

Quality of Random Numbers Produced by the Ratio-of-Uniforms Method

The ratio-of-uniforms method, like any method for generating nonuniform random numbers, is dependent on a good source of uniforms. The special relationships that may exist between two successive uniforms when one of them is an extreme value can cause problems, as we indicated on pages 111 and 122. Given a high-quality uniform generator, the method is subject to the same issues of floating-point computations that we discussed on page 122.
Afflerbach and Hörmann (1992) and Hörmann (1994b) indicate that, in some cases, output of the ratio-of-uniforms method can be quite poor because of structure in the uniforms. The ratio-of-uniforms method transforms all points lying on one line through the origin into a single number. Because of the lattice structure of the uniforms from a linear congruential generator, the lines passing through the origin have regular patterns, which result in structural gaps in the numbers yielded by the ratio-of-uniforms method. Noting these distribution problems, Hörmann and Derflinger (1994) make some comparisons of the ratio-of-uniforms method with the transformed rejection method (Algorithm 4.9, page 121), and based on their empirical study, they recommend the transformed rejection method over the ratio-of-uniforms method. The quality of the output of the ratio-of-uniforms method, however, is more a function of the quality of the uniform generator and would usually not be of any concern if a good uniform generator is used. The relative computational efficiencies of the two methods depend on the majorizing functions used. The polygonal majorizing functions used by Leydold (2000) in the ratio-of-uniforms method apparently alleviate some of the problems found by Hörmann (1994b).

Ratio-of-Uniforms for Multivariate Distributions

Stefănescu and Văduva (1987) and Wakefield, Gelfand, and Smith (1991) extend the ratio-of-uniforms method to multivariate distributions. As we mentioned in discussing simple acceptance/rejection methods, the probability of rejection may be quite high for multivariate distributions. High correlations in the target distribution can also reduce the efficiency of the ratio-of-uniforms method even further.
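To make the basic method concrete, here is a sketch of Algorithm 4.13 for the standard normal, with h(x) = e^{−x²/2}; the rectangle bounds b = 1 and d = −c = √2 e^{−1/2} follow from the extrema of √h(x) and x√h(x) (the latter at x = ±√2). The choice of target is an assumption made for this example.

import numpy as np

rng = np.random.default_rng()

def h(x):
    return np.exp(-x**2 / 2.0)      # unnormalized N(0,1) density

b = 1.0                              # sup sqrt(h(x)), attained at x = 0
d = np.sqrt(2.0) * np.exp(-0.5)      # sup x*sqrt(h(x)), attained at x = sqrt(2)
c = -d                               # inf x*sqrt(h(x)), by symmetry

def ratio_of_uniforms(n):
    out = []
    while len(out) < n:
        u1 = b * rng.uniform()               # steps 1 and 2
        v1 = c + (d - c) * rng.uniform()
        x = v1 / u1                          # step 3
        if u1**2 <= h(x):                    # step 4: (u1, v1) lies in C
            out.append(x)
    return np.array(out)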
4.8 Alias Method
Walker (1977) shows that a discrete distribution with k mass points can be represented as an equally weighted mixture of k two-point distributions; that is, distributions with only two mass points. Consider the random variable X such that

    Pr(X = xi) = pi,  i = 1, . . . , k,

and Σ_{i=1}^{k} pi = 1. Walker constructed k two-point distributions,

    Pr(Yi = yij) = qij,  j = 1, 2;  i = 1, . . . , k

(with qi1 + qi2 = 1) in such a way that any pi can be represented as k⁻¹ times a sum of qij's. (It is easy to prove that this can be done; use induction, starting with k = 1.) A setup procedure for the alias method is shown in Algorithm 4.15. The setup phase associates with each i = 1 to k a value Pi that will determine whether the original mass point or an "alias" mass point, indexed by ai, will be delivered when i is chosen with equal probability, 1/k. Two lists, L and H,
are maintained to determine which points or point pairs have probabilities less than or greater than 1/k. At termination of the setup phase, all points or point pairs have probabilities equal to 1/k. Marsaglia calls the setup phase "leveling the histogram". The outputs of the setup phase are two lists, P and a, each of length k.

Algorithm 4.15 Alias Method Setup to Initialize the Lists a and P
0. For i = 1 to k, set ai = i; set Pi = 0; set bi = pi − 1/k; and if bi < 0, put i in the list L; otherwise, put i in the list H.
1. If max(bi) = 0, then stop.
2. Select l ∈ L and h ∈ H.
3. Set c = bl and d = bh.
4. Set bl = 0 and bh = c + d.
5. Remove l from L.
6. If bh ≤ 0, then remove h from H; and if bh < 0, then put h in L.
7. Set al = h and Pl = 1 + kc.
8. Go to step 1.

Notice that Σ bi = 0 during every step. The steps are illustrated in Figure 4.13 for a distribution such that

    Pr(X = 1) = .30, Pr(X = 2) = .05, Pr(X = 3) = .20, Pr(X = 4) = .40, Pr(X = 5) = .05.
At the beginning, L = {2, 5} and H = {1, 4}. In the first step, the values corresponding to 2 and 4 are adjusted. The steps to generate deviates, after the values of Pi and ai are computed by the setup, are shown in Algorithm 4.16.

Algorithm 4.16 Generation Using the Alias Method Following the Setup in Algorithm 4.15
1. Generate u from a U(0, 1) distribution.
2. Generate i from a discrete uniform over 1, 2, . . . , k.
3. If u ≤ Pi, then
   3.a. deliver xi;
   otherwise,
   3.b. deliver x_{ai}.

Figure 4.13: Setup for the Alias Method; Leveling the Histogram

It is clear that the setup time for Algorithm 4.15 is O(k) because the total number of items in the lists L and H goes down by at least one at each step. If, in step 2, the minimum and maximum values of b are found, as in the original algorithm of Walker (1977), the algorithm may proceed slightly faster in some cases, but then the algorithm is O(k log k). The setup method given in Algorithm 4.15 is from Kronmal and Peterson (1979a). Vose (1991) also describes a setup procedure that is O(k). Once the setup is finished by whatever method, the generation time is constant, or O(1). The alias method is always at least as fast as the guide table method of Chen and Asau (Algorithm 4.3 on page 107). Its speed relative to the table-lookup method of Marsaglia as implemented by Norman and Cannon (Algorithm 4.2 on page 106) depends on the distribution. Variates from distributions with a substantial proportion of mass points whose base b representations (equation (4.5)) have many zeros can be generated very rapidly by the table-lookup method. In the IMSL Libraries, the routine rngda performs both the setup and the generation of discrete random deviates using an alias method.
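The following sketch implements the setup of Algorithm 4.15 and the generation step of Algorithm 4.16, using 0-based indices; the floating-point tolerance is an addition not in the algorithms as stated. Cells never selected as l keep Pi = 0 and ai = i, which still delivers xi in step 3.b.

import random

def alias_setup(p):
    # Algorithm 4.15: p is a list of k probabilities summing to 1
    k = len(p)
    a = list(range(k))                   # alias indices, a_i = i
    P = [0.0] * k                        # thresholds, P_i = 0
    b = [pi - 1.0 / k for pi in p]
    tol = 1e-12
    L = [i for i in range(k) if b[i] < -tol]
    H = [i for i in range(k) if b[i] > tol]
    while L and H:                       # with exact arithmetic, both empty together
        l = L.pop()                      # step 2: select l in L and h in H
        h = H[-1]
        c = b[l]                         # step 3 (c < 0)
        b[l] = 0.0                       # step 4
        b[h] += c
        a[l] = h                         # step 7
        P[l] = 1.0 + k * c
        if b[h] <= tol:                  # step 6
            H.pop()
            if b[h] < -tol:
                L.append(h)
    return P, a

def alias_draw(P, a):
    # Algorithm 4.16: returns an index in 0..k-1
    i = random.randrange(len(P))
    return i if random.random() <= P[i] else a[i]

P, a = alias_setup([.30, .05, .20, .40, .05])   # the distribution of Figure 4.13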
Kronmal and Peterson (1979a, 1979b) apply the alias method to mixture methods and acceptance/rejection methods for continuous random variables. Peterson and Kronmal (1982) describe a modification of the alias method incorporating some aspects of the urn method. This hybrid method, which they called the alias-urn method, reduces the burden of comparisons at the expense of slightly more storage space.
4.9 Use of the Characteristic Function
The characteristic function of a d-variate random variable X is defined as

    φX(t) = E(e^{i tᵀ X}),  t ∈ IRᵈ.    (4.14)

The characteristic function exists for any random variable. For a univariate random variable whose first two moments are finite, and whose characteristic function φ is such that ∫|φ(t)| dt and ∫|φ″(t)| dt are finite, Devroye (1986b) describes a method for generating random variates using the characteristic function. Algorithm 4.17 is Devroye's method for a univariate continuous random variable with probability density function p(·) and characteristic function φ(·).

Algorithm 4.17 Conversion of Uniform Random Numbers Using the Characteristic Function
0. Set a = (1/(2π)) ∫ |φ(t)| dt and b = (1/(2π)) ∫ |φ″(t)| dt.
1. Generate u and v independently from a U(−1, 1) distribution.
2. If u < 0, then
   2.a. set y = √(b/a) v and t = a|u|;
   otherwise,
   2.b. set y = √(b/a)/v and t = av²|u|.
3. If t ≤ p(y), then
   3.a. take y as the desired realization;
   otherwise,
   3.b. return to step 1.

This method relies on the facts that, under the existence conditions,

    p(y) ≤ (1/(2π)) ∫ |φ(t)| dt  for all y

and

    p(y) ≤ (1/(2πy²)) ∫ |φ″(t)| dt  for all y.

Both of these facts are easily established by use of the inverse characteristic function transform, which exists by the integrability conditions on φ(t).
The method requires evaluation of the density at each step. Devroye (1996b) also discusses variations that depend on Taylor expansion coefficients. Devroye (1991) describes a related method for the case of a discrete random variable. The characteristic function allows evaluation of all moments that exist. If only some of the moments are known, an approximate method described by Devroye (1989) can be used.
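A sketch of Algorithm 4.17 as reconstructed above, for the standard normal target, where φ(t) = e^{−t²/2} and φ″(t) = (t² − 1)e^{−t²/2}; the constants a and b are computed by a simple Riemann sum, and the grid and the choice of target are assumptions for this example.

import numpy as np

rng = np.random.default_rng()

# phi(t) = exp(-t^2/2), phi''(t) = (t^2 - 1)*exp(-t^2/2) for the N(0,1) target
t = np.linspace(-12.0, 12.0, 200001)
dt = t[1] - t[0]
a = np.sum(np.exp(-t**2 / 2)) * dt / (2 * np.pi)                  # about 1/sqrt(2*pi)
b = np.sum(np.abs(t**2 - 1) * np.exp(-t**2 / 2)) * dt / (2 * np.pi)

def p(y):
    return np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)

def char_fn_sample(n):
    s = np.sqrt(b / a)                 # crossover of the bounds a and b/y^2
    out = []
    while len(out) < n:
        u = rng.uniform(-1, 1)
        v = rng.uniform(-1, 1)
        if u < 0:
            y = s * v                  # step 2.a
            tt = a * abs(u)
        elif v != 0.0:
            y = s / v                  # step 2.b
            tt = a * v**2 * abs(u)
        else:
            continue
        if tt <= p(y):                 # step 3
            out.append(y)
    return np.array(out)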
4.10 Use of Stationary Distributions of Markov Chains
Many of the methods for generating deviates from a given distribution are based on a representation of the density that allows the use of some simple transformation or some selection rule for deviates generated from a different density. In the univariate ratio-of-uniforms method, for example, we identify a bivariate uniform random variable with a region of support such that one of the marginal distributions is the distribution of interest. We then generate bivariate uniform random deviates over the region, and then, by a very simple transformation and selection, we get univariate deviates from the distribution of interest. Another approach is to look for a stochastic process that can be easily simulated and such that the distribution of interest can be identified as a distribution at some point in the stochastic process. The simplest useful stochastic process is a Markov chain with a stationary distribution corresponding to the distribution of interest. Markov Chains: Basic Definitions A Markov chain is a sequence of random variables, X1 , X2 , . . ., such that the distribution of Xt+1 given Xt is independent of Xt−1 , Xt−2 , . . . A sequence of realizations of such random variables is also called a Markov chain (that is, the term Markov chain can be used to refer either to a random sequence or to a fixed sequence of realizations). In this section, we will briefly discuss some types of Markov chains and their properties. The main purpose is to introduce the terms that are used to characterize the Markov chains in the applications that we describe later. See Meyn and Tweedie (1993) for extensive discussions of Markov chains. Tierney (1996) discusses the aspects of Markov chains that are particularly relevant for the applications that we consider in later sections. The union of the supports of the random variables is called the state space of the Markov chain. Whether or not the state space is countable is an important characteristic of a Markov chain. A Markov chain with a countable or “discrete” state space is easier to work with and can be used to approximate a Markov chain with a continuous state space. Another important characteristic of a Markov chain is the nature of the indexing. As we have written the sequence
above, we have implied that the index is discrete. We can generalize this to a continuous index, in which case we usually use the notation X(t). A Markov chain is time homogeneous if the distribution of Xt+1 given Xt is independent of t. For our purposes, we can usually restrict attention to a time-homogeneous discrete-state Markov chain with a discrete index, and this is what we assume in the following discussion in this section.

For the random variable Xt in a discrete-state Markov chain with state space S, let I index the states; that is, i in I implies that si is in S. For si ∈ S, let Pr(Xt = si) = pti. The Markov chain can be characterized by an initial distribution and a square transition matrix or transition kernel K = (kij), where kij = Pr(Xt+1 = si | Xt = sj). The distribution at time t is characterized by a vector of probabilities pt = (pt1, pt2, . . .), so the vector itself is called a distribution. The initial distribution is p0 = (p01, p02, . . .), and the distribution at time t = 1 is Kp0. We sometimes refer to a Markov chain by the doubleton (K, p0) or just (K, p). In general, we have

    pt = K pt−1 = Kᵗ p0.

We denote the elements of Kᵗ by k_ij^(t). The relationships above require that Σ_i kij = 1. (A matrix with this property is called a stochastic matrix.) A distribution p such that

    Kp = p
is said to be invariant or stationary. From that definition, we see that an invariant distribution is an eigenvector of the transition matrix corresponding to an eigenvalue of 1. (Notice the unusual usage of the word "distribution"; in this context, it means a vector.) For a given Markov chain, it is of interest to know whether the chain has an invariant distribution (that is, whether the transition matrix has an eigenvalue equal to 1) and, if so, whether the invariant distribution can be reached from the starting distribution p0. Some Markov chains oscillate among a set of distributions. (For example, think of a two-state Markov chain whose transition matrix has elements k11 = k22 = 0 and k12 = k21 = 1.) We will be interested in chains that do not oscillate; that is, chains that are aperiodic. A chain is guaranteed to be aperiodic if, for some t sufficiently large, k_ii^(t) > 0 for all i in I. A Markov chain is reversible if, for any t, the conditional probability of Xt given Xt+1 is the same as the conditional probability of Xt given Xt−1. A discrete-space Markov chain obviously is reversible if and only if its transition matrix is symmetric. A Markov chain defined by (K, p) is said to be in detailed balance if kij pj = kji pi.
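Numerically, an invariant distribution can be found directly from this eigenvector characterization. A small sketch, in which the 3 × 3 transition matrix is an arbitrary example (columns sum to 1, following the convention kij = Pr(Xt+1 = si | Xt = sj)):

import numpy as np

K = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.1],
              [0.2, 0.3, 0.7]])            # columns sum to 1

vals, vecs = np.linalg.eig(K)
i = np.argmin(np.abs(vals - 1.0))          # locate the eigenvalue 1
p = np.real(vecs[:, i])
p = p / p.sum()                            # scale the eigenvector to a distribution
# check: K @ p equals p to within rounding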
A Markov chain is irreducible if, for all i, j in I, there exists a t > 0 such that k_ij^(t) > 0. If the chain is irreducible, detailed balance and reversibility are equivalent.

Another property of interest is when a Markov chain first takes on a given state. This is called the first passage time for that state. Given that the chain is in a particular state, the first passage time to that state is the first return time for that state. Let Tii be the first return time to state i; that is, for a discrete time chain, let

    Tii = min{ t, s.t. Xt = si | X0 = si }.

(Tii is a random variable.) An irreducible Markov chain is recurrent if, for some i, Pr(Tii < ∞) = 1. (For an irreducible chain, this implies the condition for all i.) An irreducible Markov chain is positive recurrent if, for some i, E(Tii) < ∞. (For an irreducible chain, this implies the condition for all i.) An aperiodic, irreducible, positive recurrent Markov chain is associated with a stationary distribution or invariant distribution, which is the limiting distribution of the chain. In applications of Markov chains, the question of whether the chain has converged to this limiting distribution is one of the primary concerns.

Applications that we discuss in later sections have uncountable state spaces, but the basic concepts extend to those. For a continuous state space, instead of a vector specifying the distribution at any given time, we have a probability density at that time; K is a conditional probability density for Xt+1 | Xt, and we have a similar expression for the density at t + 1 formed by integrating over the conditional density weighted by the unconditional density at t. Tierney (1996) carefully discusses the generalization to an uncountable state space and a continuous index.

Markov Chain Monte Carlo

There are various ways of using a Markov chain to generate random variates from some distribution related to the chain. Such methods are called Markov chain Monte Carlo, or MCMC. An algorithm based on a stationary distribution of a Markov chain is an iterative method because a sequence of operations must be performed until they converge.

A Markov chain is the basis for several schemes for generating random numbers. The interest is not in the sequence of the Markov chain itself. The elements of the chain are accepted or rejected in such a way as to form a different chain whose stationary distribution is the distribution of interest. Following engineering terminology for sampling sequences, the techniques based on these chains are generally called "samplers". The static sample, and not the sequence, is what is used. The objective in the Markov chain samplers is to generate a sequence of autocorrelated points with a given stationary distribution.
The Metropolis Random Walk

For a distribution with density pX, the Metropolis algorithm, introduced by Metropolis et al. (1953), generates a random walk and performs an acceptance/rejection based on p evaluated at successive steps in the walk. In the simplest version, the walk moves from the point yi to a candidate point yi+1 = yi + s, where s is a realization from U(−a, a), and the candidate is accepted if

    pX(yi+1)/pX(yi) ≥ u,    (4.15)

where u is an independent realization from U(0, 1). If the new point is at least as probable (that is, if pX(yi+1) ≥ pX(yi)), the condition (4.15) implies acceptance without the need to generate u. The random walk of Metropolis et al. is the basic algorithm of simulated annealing, which is currently widely used in optimization problems. It is also used in simulations of models in statistical mechanics (see Section 7.9). The algorithm is described in Exercise 7.16 on page 277.

If the range of the distribution is finite, the random walk is not allowed to go outside of the range. Consider, for example, the von Mises distribution, with density

    p(x) = (1/(2π I0(c))) e^{c cos(x)}  for −π ≤ x ≤ π,    (4.16)

where I0 is the modified Bessel function of the first kind and of order zero. Notice, however, that it is not necessary to know this normalizing constant because it is canceled in the ratio. The fact that all we need is a nonnegative function that is proportional to the density of interest is an important property of this method. In the ordinary acceptance/rejection methods, we need to know the constant.

If c = 3, after a quick inspection of the amount of fluctuation in p, we may choose a = 1. The output for n = 1000 and a starting value of y0 = 1 is shown in Figure 4.14. The output is a Markov chain. A histogram, which is not affected by the sequence of the output in a large sample, is shown in Figure 4.15. The von Mises distribution is an easy one to simulate by the Metropolis algorithm. This distribution is often used by physicists in simulations of lattice gauge and spin models, and the Metropolis method is widely used in these simulations. Notice the simplicity of the algorithm: we do not need to determine a majorizing density nor even evaluate the Bessel function that is the normalizing constant for the von Mises density.

The Markov chain samplers generally require a "burn-in" period (that is, a number of iterations before the stationary distribution is achieved). In practice, the variates generated during the burn-in period are discarded. The number of iterations needed varies with the distribution and can be quite large, sometimes thousands. The von Mises example shown in Figure 4.14 is unusual; no burn-in is required.
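A sketch of this random walk for the von Mises density with c = 3, a = 1, n = 1000, and y0 = 1, as in the text. Proposals falling outside (−π, π) are simply rejected here, which is one way of keeping the walk in the range; that treatment of boundary proposals is an assumption of this sketch.

import numpy as np

rng = np.random.default_rng()

def log_p(x, c=3.0):
    # log of the von Mises density up to a constant; 1/(2*pi*I_0(c)) cancels
    return c * np.cos(x)

a, n = 1.0, 1000
y = np.empty(n)
y[0] = 1.0
for i in range(n - 1):
    cand = y[i] + rng.uniform(-a, a)        # random-walk step
    if abs(cand) <= np.pi and np.log(rng.uniform()) <= log_p(cand) - log_p(y[i]):
        y[i + 1] = cand                     # accept per condition (4.15)
    else:
        y[i + 1] = y[i]                     # chain stays at the current point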
Figure 4.14: Sequential Output from the Metropolis Algorithm for a Von Mises Distribution

Figure 4.15: Histogram of the Output from the Metropolis Algorithm for a Von Mises Distribution

In general, convergence is much quicker for univariate distributions with finite ranges such as this one. It is important to remember what convergence means; it does not mean that the sequence is independent from the point of convergence forward. The deviates are still from a Markov chain. The Metropolis acceptance/rejection sequence is illustrated in Figure 4.16. Compare this with the acceptance/rejection method based on independent variables, as illustrated in Figure 4.5.

The Metropolis–Hastings Method

Hastings (1970) describes an algorithm that uses a more general chain for the acceptance/rejection step. Instead of just basing the decision on the probability density pX as in the inequality (4.15), the Metropolis–Hastings sampler to generate deviates from a distribution with a probability density pX uses deviates from a Markov chain with density gYt+1|Yt. The method is shown in Algorithm 4.18. The conditional density gYt+1|Yt is chosen so that it is easy to generate deviates from it.

Algorithm 4.18 Metropolis–Hastings Algorithm
0. Set k = 0.
1. Choose x(k) in the range of pX. (The choice can be arbitrary.)
2. Generate y from the density gYt+1|Yt(y|x(k)).
3. Set

       r = pX(y) gYt+1|Yt(x(k)|y) / ( pX(x(k)) gYt+1|Yt(y|x(k)) ).

4. If r ≥ 1, then
   4.a. set x(k+1) = y;
   otherwise,
   4.b. generate u from U(0, 1) and if u < r, then
        4.b.i. set x(k+1) = y;
        otherwise,
        4.b.ii. set x(k+1) = x(k).
5. If convergence has occurred, then
4.10. STATIONARY DISTRIBUTIONS OF MARKOV CHAINS
143
5.a. deliver x = x(k+1);
otherwise,
5.b. set k = k + 1, and go to step 2.

Figure 4.16: Metropolis Acceptance/Rejection

Compare Algorithm 4.18 with the basic acceptance/rejection method in Algorithm 4.6, page 114. The analog to the majorizing function in the Metropolis–Hastings algorithm is the reference function

    pX(x) gYt+1|Yt(y|x) / gYt+1|Yt(x|y).

In Algorithm 4.18, r is called the "Hastings ratio", and step 4 is called the "Metropolis rejection". The conditional density gYt+1|Yt(·|·) is called the "proposal density" or the "candidate generating density". Notice that because the reference function contains pX as a factor, we only need to know pX to within a constant of proportionality. As we have mentioned already, this is an important characteristic of the Metropolis algorithms.

We can see that this algorithm delivers realizations from the density pX by using the same method suggested in Exercise 4.2 (page 160); that is, determine the CDF and differentiate. The CDF is the probability-weighted sum of the two components corresponding to whether the chain moved or not. In the case in which the chain does move (that is, in the case of acceptance), for the random variable Z whose realization is y in Algorithm 4.18, we have

    Pr(Z ≤ x) = Pr( Y ≤ x | U ≤ p(Y)g(xi|Y)/(p(xi)g(Y|xi)) )
              = ∫_{−∞}^{x} ∫_{0}^{p(t)g(xi|t)/(p(xi)g(t|xi))} g(t|xi) ds dt / ∫_{−∞}^{∞} ∫_{0}^{p(t)g(xi|t)/(p(xi)g(t|xi))} g(t|xi) ds dt
              = ∫_{−∞}^{x} pX(t) dt.
We can illustrate the use of the Metropolis–Hastings algorithm using a Markov chain in which the density of Xt+1 is normal with a mean of Xt and a variance of σ². Let us use this density to generate a sample from a standard normal distribution (that is, a normal with a mean of 0 and a variance of 1). We start with x0 chosen arbitrarily. We take logs and cancel terms in the expression for r in Algorithm 4.18. The sequential output for n = 1000, a starting value of x0 = 10, and a variance of σ² = 9 is shown in Figure 4.17. Notice that the values descend very quickly from the starting value, which would be a very unusual realization of a standard normal. This example is also special. In practice, we generally cannot expect such a short burn-in period. Notice also in Figure 4.17 the horizontal line segments where the underlying Markov chain did not advance.
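A sketch matching this example: target N(0, 1), proposal Xt+1 ~ N(Xt, 9), n = 1000, x0 = 10. Because the normal proposal is symmetric, the proposal densities cancel and the log of the Hastings ratio reduces to (x(k)² − y²)/2.

import numpy as np

rng = np.random.default_rng()

sigma, n = 3.0, 1000        # proposal standard deviation (sigma^2 = 9)
x = np.empty(n)
x[0] = 10.0                 # a deliberately poor starting value
for k in range(n - 1):
    y = rng.normal(x[k], sigma)                   # step 2: candidate
    log_r = 0.5 * (x[k]**2 - y**2)                # step 3, after taking logs
    if log_r >= 0 or np.log(rng.uniform()) < log_r:
        x[k + 1] = y                              # steps 4.a and 4.b.i
    else:
        x[k + 1] = x[k]                           # step 4.b.ii: chain does not advance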
Figure 4.17: Sequential Output from a Standard Normal Distribution Using a Markov Chain, N(Xt, σ²)

There are several variations of the basic Metropolis–Hastings algorithm. See Bhanot (1988) and Chib and Greenberg (1995) for descriptions of modifications and generalizations. Also see Section 4.14 for two related methods: Gibbs sampling and hit-and-run sampling. Because those methods are particularly useful in multivariate simulation, we defer the discussion to that section.

The Markov chain Monte Carlo method has become one of the most important tools in statistics in recent years. Its applications pervade Bayesian analysis as well as Monte Carlo procedures in many settings. See Gilks, Richardson, and Spiegelhalter (1996) for several examples.

Whenever a correlated sequence such as a Markov chain is used, variance estimation must be performed with some care. In the more common cases of positive autocorrelation, the ordinary variance estimators are negatively biased. The method of batch means or some other method that attempts to account for the autocorrelation should be used. See Section 7.4 for discussions of these methods.

Tierney (1991, 1994) describes an independence sampler, a Metropolis–Hastings sampler for which the proposal density does not depend on Yt; that is, gYt+1|Yt(·|·) = gYt+1(·). For this type of proposal density, it is more critical that gYt+1(·) approximate pX(·) fairly well and that it can be scaled to majorize pX(·) in the tails. Liu (1996) and Roberts (1996) discuss some of the properties of the independence sampler and its relationship to other Metropolis–Hastings methods.

As with the acceptance/rejection methods using independent sequences, the acceptance/rejection methods based on Markov chains apply immediately to multivariate random variables. As mentioned above, however, convergence
generally becomes slower as the number of elements in the random vector increases. As an example of MCMC in higher dimensions, consider an example similar to that shown in Figure 4.17 except for a multivariate normal distribution instead of a univariate one. We use a d-dimensional normal with a mean vector xt and a variance-covariance matrix Σ to generate xt+1 for use in the Metropolis–Hastings method of Algorithm 4.18. Taking d = 3,

    Σ = ( 9 0 0
          0 9 0
          0 0 9 ),

and starting with x0 = (10, 10, 10), the first 1000 values of the first element (which should be a realization from a standard univariate normal) are shown in Figure 4.18.
Figure 4.18: Sequential Output of x1 from a Trivariate Standard Normal Distribution Using a Markov Chain, N(Xt , Σ)
Convergence

Two of the most important issues in MCMC concern the rate of convergence (that is, the length of the burn-in) and the frequency with which the chain advances. In many applications of simulation, such as studies of waiting times in queues, there is more interest in transient behavior than in stationary behavior. This is not the case in random number generation using an iterative method. For general use in random number generation, the stationary distribution is the only thing of interest. (We often use the terms "Monte Carlo" and "simulation"
rather synonymously; stationarity and transience, however, are often the key distinctions between Monte Carlo applications and simulation applications. In simulation in practice, the interest is rarely in the stationary behavior, but it is in these Monte Carlo applications.)

The issue of convergence is more difficult to address in multivariate distributions. It is for multivariate distributions, however, that the MCMC method is most useful. This is because the Metropolis–Hastings algorithm does not require knowledge of the normalizing constants, and the computation of a normalizing constant may be more difficult for multivariate distributions. Gelman and Rubin (1992b) give examples in which the burn-in is much longer than might be expected.

Various diagnostics have been proposed to assess convergence. Cowles and Carlin (1996) discuss and compare thirteen different ones. Most of these diagnostics use multiple chains in one way or another; see, for example, Gelman and Rubin (1992a), Roberts (1992), and Johnson (1996). Multiple chains or separate subsequences within a chain can be compared using analysis-of-variance methods. Once convergence has occurred, the variance within subsequences should be the same as the variance between subsequences. Measuring the variance within a subsequence must be done with some care, of course, because of the autocorrelations. Batch means from separate streams can be used to determine when the variance has stabilized. (See Section 7.4 for a description of batch means.) Yu (1995) uses a cusum plot on only one chain to help to identify convergence. Robert (1998a) provides a benchmark case for evaluation of convergence assessment techniques. Rosenthal (1995), under certain conditions, gives bounds on the length of runs required to give satisfactory results. Cowles and Rosenthal (1998) suggest using auxiliary simulations to determine if the conditions that ensure the bounds on the lengths are satisfied. All of these methods have limitations.

The collection of articles in Gilks, Richardson, and Spiegelhalter (1996) addresses many of the problems of convergence. Gamerman (1997) provides a general introduction to MCMC in which many of these convergence issues are explored. Additional reviews are given in Brooks and Roberts (1999) and the collection of articles in Robert (1998b). Mengersen, Robert, and Guihenneuc-Jouyaux (1999) give a classification of methods and review their performance. Methods of assessing convergence are currently an area of active research. Use of any method that indicates that convergence has occurred based on the generated data can introduce bias into the results, unless somehow the probability of making the decision that convergence has occurred can be accounted for in any subsequent inference. This is the basic problem in any adaptive statistical procedure. Cowles, Roberts, and Rosenthal (1999) discuss how bias may be introduced in inferences made using an MCMC method after a convergence diagnostic has been used in the sampling. The main point in this section is that there are many subtle issues, and MCMC must be used with some care.

Various methods have been proposed to speed up the convergence; see Gelfand and Sahu (1994), for example. Frigessi, Martinelli, and Stander (1997)
discuss general issues of convergence and acceleration of convergence. How quickly convergence occurs is obviously an important consideration for the efficiency of the method. The effects of slow convergence, however, are not as disastrous as the effects of prematurely assuming that convergence has occurred.

Coupled Markov Chains and "Perfect" Sampling

Convergence is an issue because we want to sample from the stationary distribution. The approach discussed above is to start at some arbitrary point, t = 0, and proceed until we think convergence has occurred. Propp and Wilson (1996, 1998) suggested another approach for aperiodic, irreducible, positive recurrent chains with finite state spaces. Their method is based on starting multiple chains at an earlier point. The method is to generate chains that are coupled by the same underlying element of the sample space. The coupling can be accomplished by generating a single realization of some random variable and then letting that realization determine the updating for each of the chains. This can be done in several ways. The simplest, perhaps, is to choose the coupling random variable to be U(0, 1) and use the inverse CDF method. At the point t, we generate ut+1 and update each chain with Xt+1 | ut+1, xt by the method of equation (4.3),

    xt+1 = min{ v, s.t. ut+1 ≤ PXt+1|xt(v) },    (4.17)

where PXt+1|xt(·) is the conditional CDF for Xt+1, given Xt = xt. With this setup for coupled chains, any one of the chains may be represented in a "stochastic recursive sequence",

    Xt = φ(Xt−1, Ut),    (4.18)

where φ is called the transition rule. The transition rule also allows us to generate Ut+1 | xt+1, xt as

    U( PXt+1|xt(xt+1 − ε), PXt+1|xt(xt+1) ),    (4.19)

where ε is vanishingly small.

The idea in the method of Propp and Wilson is to start coupled chains at ts = −1 at each of the states and advance them all to t = 0. If they coalesce (that is, if they all take the same value), they are the same chain from then on, and X0 has the stationary distribution. If they do not coalesce, then we can start the chains at ts = −2 and maintain exactly the same coupling; that is, we generate a u−1, but we use the same u0 as before. If these chains coalesce at t = 0, then we accept the common value as a realization of the stationary distribution. If they do not coalesce, we back up the starting points of the chains even further. Propp and Wilson (1996) suggested doubling the starting point each time, but any point further back in time would work. The important thing is that each time chains are started, the realizations of the uniform random variable from previous runs be used. This method is called coupling
from the past (CFTP). Propp and Wilson (1996) called this method of sampling "exact sampling". Note that if we do the same thing starting at a fixed point and proceeding forward with parallel chains, the value to which they coalesce is not a realization of the stationary distribution.

If the state space is large, checking for coalescence can be computationally intensive. There are various ways of reducing the burden of checking for coalescence. Propp and Wilson (1996) discussed the special case of a monotone chain (one for which the transition matrix stochastically preserves orderings of state vectors) that has two starting state vectors x0⁻ and x0⁺ such that, for all x ∈ S, x0⁻ ≤ x ≤ x0⁺. In that case, they show that if the sequence beginning with x0⁻ and the sequence beginning with x0⁺ coalesce, the sequence from that point on is a sample from the stationary distribution.

This is interesting, but of limited relevance. Because CFTP depends on fixed values of u0, u−1, . . ., for certain of these values, coalescence may occur with very small probability. (This is similar to the modified acceptance/rejection method described in Exercise 4.7b.) In these cases, the ts that will eventually result in coalescence may be very large in absolute value. An "impatient user" may decide just to start over. Doing so, however, biases the procedure.

Fill (1998) described a method for sampling directly from the invariant distribution that uses coupled Markov chains of a fixed length. It is an acceptance/rejection method based on whether coalescence has occurred. This method can be restarted without biasing the results; the method is "interruptible". In this method, an ending time and a state corresponding to that time are chosen arbitrarily. Then, we generate backwards from ts as follows.
1. Select a time ts > 0 and a state xts.
2. Generate xts−1 | xts, xts−2 | xts−1, . . . , x0 | x1.
3. Generate u1 | x0, x1, u2 | x1, x2, . . . , uts | xts−1, xts using, perhaps, the distribution (4.19).
4. Start chains at t = 0 at each of the states, and advance them to t = ts using the common u's.
5. If the chains have coalesced by time ts, then accept x0; otherwise, return to step 1.

Fill gives a simple proof that this method indeed samples from the invariant distribution.

Methods that attempt to sample directly from the invariant distribution of a Markov chain, such as CFTP and interruptible coupled chains, are sometimes called "perfect sampling" methods. The requirement of these methods of a finite state space obviously limits their usefulness. Møller and Schladitz (1999) extended the method to a class
of continuous-state Markov chains. Fill et al. (2000) also discussed the problem of continuous-state Markov chains and considered ways of increasing the computational efficiency.
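For a small finite state space, CFTP can be sketched directly. Here the coupling uses the inverse CDF update (4.17) with a common stream u0, u−1, . . . that is reused as the starting time is pushed back by doubling. The transition matrix is stored with rows as the current state (the transpose of the convention used earlier in this section), and the example matrix is arbitrary.

import numpy as np

def cftp(T, rng):
    # T[i, j] = Pr(X_{t+1} = s_j | X_t = s_i); rows sum to 1
    k = T.shape[0]
    cum = np.cumsum(T, axis=1)
    us = []                     # us[0] = u_0, us[1] = u_{-1}, ...
    back = 1
    while True:
        while len(us) < back:
            us.append(rng.random())
        states = np.arange(k)   # one chain started in every state at time -back
        for t in range(back, 0, -1):
            u = us[t - 1]       # the common u for the step into time -(t-1)
            states = np.array([np.searchsorted(cum[s], u) for s in states])
        if np.all(states == states[0]):
            return states[0]    # coalesced: X_0 has the stationary distribution
        back *= 2               # go further back, reusing the same u's

rng = np.random.default_rng()
T = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
draws = [cftp(T, rng) for _ in range(1000)]   # stationary distribution is (1/4, 1/2, 1/4)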
4.11 Use of Conditional Distributions
If the density of interest, pX, can be represented as a marginal density of some joint density pXY, observations on X can be generated as a Markov chain with elements having densities

    pY_i|X_{i−1}, pX_i|Y_i, pY_{i+1}|X_i, pX_{i+1}|Y_{i+1}, . . . .

This is a simple instance of the Gibbs algorithm, which we discuss beginning on page 156. Casella and George (1992) explain this method in general.

The usefulness of this method depends on identifying a joint density with conditionals that are easy to simulate. For example, if the distribution of interest is a standard normal, the joint density

    pXY(x, y) = 1/√(2π)  for −∞ < x < ∞, 0 < y < e^{−x²/2},

has a marginal density corresponding to the distribution of interest, and it has simple conditionals. The conditional distribution of Y|X is U(0, e^{−X²/2}), and the conditional of X|Y is U(−√(−2 log Y), √(−2 log Y)). Starting with x0 in the range of X, we generate y1 as a uniform conditional on x0, then x1 as a uniform conditional on y1, and so on. The auxiliary variable Y that we introduce just to simulate X is called a "latent variable".
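A sketch of this chain for the standard normal; the starting value and chain length are choices made for the example.

import numpy as np

rng = np.random.default_rng()

n = 10000
x = np.empty(n)
x[0] = 0.0                                        # starting point in the range of X
for i in range(n - 1):
    y = rng.uniform(0.0, np.exp(-x[i]**2 / 2.0))  # Y | X ~ U(0, e^{-X^2/2})
    bnd = np.sqrt(-2.0 * np.log(y))
    x[i + 1] = rng.uniform(-bnd, bnd)             # X | Y ~ U(-sqrt(-2 log Y), sqrt(-2 log Y))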
4.12 Weighted Resampling
To obtain a sample x1, x2, . . . , xm that has an approximate distribution with density pX, a sample y1, y2, . . . , yn from another distribution with density gY can be resampled using weights or probabilities

    wi = ( pX(yi)/gY(yi) ) / Σ_{j=1}^{n} ( pX(yj)/gY(yj) ),  for i = 1, 2, . . . , n.
The method was suggested by Rubin (1987, 1988), who called it SIR for sampling/importance resampling. The method is also called importance-weighted resampling. The resampling should be done without replacement to give points with low probabilities a chance to be represented. Methods for sampling from a given set with given probabilities are discussed in Section 6.1, page 217. Generally, in SIR, n is much larger than m. This method can work reasonably well if the density gY is very close to the target density pX .
This method, like the Markov chain methods above, has the advantage that the normalizing constant of the target density is not needed. Instead of the density pX (·), any nonnegative proportional function cpX (·) could be used. Gelman (1992) describes an iterative variation in which n is allowed to increase as m increases; that is, as the sampling continues, more variates are generated from the distribution with density gY .
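A sketch of SIR with a standard normal target known only up to a constant and a heavier-tailed N(0, 4) proposal; both densities are choices made for this example. Note that the resampling is done without replacement, as recommended above.

import numpy as np

rng = np.random.default_rng()

n, m = 100000, 1000
y = rng.normal(0.0, 2.0, n)                       # sample from g = N(0, 4)
log_w = -0.5 * y**2 - (-0.5 * (y / 2.0)**2)       # log(p/g), each up to a constant
w = np.exp(log_w - log_w.max())                   # stabilize before normalizing
w /= w.sum()
x = y[rng.choice(n, size=m, replace=False, p=w)]  # weighted resampling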
4.13 Methods for Distributions with Certain Special Properties
Because of the analytical and implementation burden involved in building a random number generator, a general rule is that a single algorithm that works in two settings is better than two different algorithms, one for each setting. This is true, of course, unless the individual algorithms perform better in the respective special cases, and then the question is how much better. In random number generation from nonuniform distributions, it is desirable to have "universal algorithms" that use general methods that we have discussed above but are optimized for certain broad classes of distributions.

For distributions with certain special properties, general algorithms using mixtures and rejection can be optimized for broad classes of distributions. We have already discussed densities that are nearly linear (Algorithm 4.7, page 118) and densities that are nearly flat (Algorithm 4.11, page 126). Another broad class of distributions are those that are infinitely divisible. Damien, Laud, and Smith (1995) give general methods for generation of random deviates from distributions that are infinitely divisible.

Distributions with Densities that Can Be Transformed to Concave Functions

An important special property of some distributions is concavity of the density or of some transformation of the density. On page 119, we discuss how easy it is to form polygonal majorizing and squeeze functions for concave densities. Similar ideas can be employed for cases in which the density can be invertibly transformed into a concave function. In some applications, especially in reliability or survival analysis, the logarithm is a standard transformation and log-concavity is an important property. A distribution is log-concave if its density (or probability function) has the property

    log p(x1) − 2 log p((x1 + x2)/2) + log p(x2) < 0

wherever the densities are positive. If the density is twice-differentiable, this condition is satisfied if the negative of the Hessian is positive definite. Many of the commonly used distributions, such as the normal, the gamma with shape
parameter greater than 1, and the beta with parameters greater than 1, are log-concave. See Pratt (1981) for discussion of these properties, and see Dellaportas and Smith (1993) for some examples in generalized linear models. Devroye (1984b) describes general methods for a log-concave distribution, and Devroye (1987) describes a method for a discrete distribution that is log-concave. The methods of forming polygonal majorizing and squeeze functions for concave densities can also be applied to convex densities or to densities that can be invertibly transformed into concave functions by reversing the role of the majorizing and squeeze functions.

Incremental Formation of Majorizing and Squeeze Functions: "Adaptive" Rejection

Gilks (1992) and Gilks and Wild (1992) describe a method that they call adaptive rejection sampling, or ARS, for a continuous log-concave distribution. The adaptive rejection method described by Gilks (1992) begins with a set Sk consisting of the points x0 < x1 < · · · < xk < xk+1 from the range of the distribution of interest. Define Li as the straight line determined by the points (xi, log p(xi)) and (xi+1, log p(xi+1)); then, for i = 1, 2, . . . , k, define the piecewise linear function hk(x) as

    hk(x) = min(Li−1(x), Li+1(x))  for xi ≤ x < xi+1.

This piecewise linear function is a majorizing function for the log of the density, as shown in Figure 4.19. The chords formed by the continuation of the line segments form functions that can be used as a squeeze function, mk(x), which is also piecewise linear. For the density itself, the majorizing function and the squeeze function are piecewise exponentials. The majorizing function is

    c gk(x) = exp(hk(x)),

where each piece of gk(x) is an exponential density function truncated to the appropriate range. The density is shown in Figure 4.20. In each step of the acceptance/rejection algorithm, the set Sk is augmented by the point generated from the majorizing distribution, and k is increased by 1. The method is shown in Algorithm 4.19. In Exercise 4.14, page 162, you are asked to write a program for performing adaptive rejection sampling for the density shown in Figure 4.20, which is the same one as in Figure 4.4, and to compare the efficiency of this method with the standard acceptance/rejection method.

Algorithm 4.19 Adaptive Acceptance/Rejection Sampling
0. Initialize k and Sk.
1. Generate y from gk.
Figure 4.19: Adaptive Majorizing Function with the Log-Density (Same Density as in Figure 4.4)

2. Generate u from a U(0, 1) distribution.
3. If u ≤ exp(mk(y))/(c gk(y)), then
   3.a. deliver y;
   otherwise,
   3.b. if u ≤ p(y)/(c gk(y)), then deliver y;
   3.c. set k = k + 1, add y to Sk, and update hk, gk, and mk.
4. Go to step 1.

After an update step, the new piecewise linear majorizing function for the log of the density is as shown in Figure 4.21.

Gilks and Wild (1992) describe a similar method, but instead of using secants as the piecewise linear majorizing function, they use tangents of the log of the density. This requires computation of numerical derivatives of the log density. Hörmann (1994a) adapts the methods of Gilks (1992) and Gilks and Wild (1992) to discrete distributions.

T-concave Distributions

Hörmann (1995) extends the methods for a distribution with a log-concave density to a distribution whose density p can be transformed by a strictly increasing
Figure 4.20: Exponential Adaptive Majorizing Function with the Density in Figure 4.4 operator T such that T (p(x)) is concave. In that case, the √ density p is said to be “T -concave”. Often, a good choice is T (s) = −1/ s. A density that is T -concave with respect to this transformation is log-concave. Many of the standard distributions have T -concave densities, and in those cases we refer to the distribution itself as T -concave. The normal distribution (equation (5.6)), for example, is T -concave for all values of its parameters. The gamma distribution (equation (5.13)) is T -concave for α ≥ 1 and β > 0. The beta distribution (equation (5.14)) is√T -concave for α ≥ 1 and β ≥ 1. The transformation T (s) = −1/ s allows construction of a table mountain majorizing function (reminiscent of a majorizing function in the ratioof-uniforms method) that is then used in an acceptance/rejection method. H¨ ormann calls this method transformed density rejection. Leydold (2001) describes an algorithm for T -concave distributions based on a ratio-of-uniforms type of acceptance/rejection method. The advantage of Leydold’s method is that it requires less setup time than H¨ ormann’s method, and so would be useful in applications in which the parameters of the distribution change relatively often compared to the number of variates generated at each fixed value. Gilks, Best, and Tan (1995; corrigendum, Gilks, Neal, Best, and Tan, 1997) develop an adaptive rejection method that does not require the density to be log-concave. They call the method adaptive rejection Metropolis sampling.
Figure 4.21: Adaptive Majorizing Function with an Additional Point

Unimodal Densities

Many densities of interest are unimodal, and some simple methods of random number generation take advantage of that property. The ziggurat and Monty Python decomposition methods of Marsaglia and Tsang (1984, 1998) are most effective for unimodal distributions, in which the first decomposition can involve forming two regions, one on each side of the mode. Devroye (1984a) describes general methods for generating variates from such distributions. If a distribution is not unimodal, it is sometimes useful to decompose the distribution into a mixture of unimodal distributions to use the techniques on them separately. Methods for sampling from unimodal discrete distributions, which often involve linear searches, can be more efficient if the search begins at the mode. (See, for example, the method for the Poisson distribution on page 188.)

Multimodal Densities

For simulating densities with multiple modes, it is generally best to express the distribution as a mixture and use different methods in different regions. MCMC methods can become trapped around a local mode. There are various ways of dealing with this problem. One way to do this is to modify the target density pX(·) in the Hastings ratio so that it becomes flatter, and therefore it is more likely that the sequence will move away from a local mode. Geyer and Thompson (1995) describe a method of "simulated tempering", in which a "temperature" parameter, which controls how likely it is that the sequence will move away from a current state, is varied randomly. This is similar to
methods used in simulated annealing (see Section 7.9). Neal (1996) describes a systematic method of alternating between the target density and a flatter one. He called the method "tempered transition".
4.14 General Methods for Multivariate Distributions
Two simple methods of generating multivariate random variates make use of variates from univariate distributions. One way is to generate a vector of i.i.d. variates and then apply a transformation to yield a vector from the desired multivariate distribution. Another way is to use the representation of the distribution function or density function as a product of the form
$$p_{X_1 X_2 X_3 \cdots X_d} = p_{X_1 | X_2 X_3 \cdots X_d} \cdot p_{X_2 | X_3 \cdots X_d} \cdot p_{X_3 | X_4 \cdots X_d} \cdots p_{X_d}.$$
In this method, we generate a marginal x_d from p_{X_d}, then a conditional x_{d−1} from p_{X_{d−1}|X_d}, and continue in this way until we have the full realization x_1, x_2, . . . , x_d.

We see two simple examples of these methods at the beginning of Section 5.3, page 197. In the first example in that section, we generate a d-variate normal with variance-covariance matrix Σ by the transformation x = T^T z, where T is a d × d matrix such that T^T T = Σ and z is a d-vector of i.i.d. N(0, 1) variates. In the second example, we generate x_1 from N(0, σ_11), then generate x_2 conditionally on x_1, then generate x_3 conditionally on x_1 and x_2, and so on.

As mentioned in discussing acceptance/rejection methods in Sections 4.5 and 4.10, these methods are directly applicable to multivariate distributions, so acceptance/rejection is a third general way of generating multivariate observations. As in the example of the bivariate gamma on page 123, however, this usually involves a multivariate majorizing function, so we are still faced with the basic problem of generating from some multivariate distribution. For higher dimensions, the major problem in using acceptance/rejection methods for generating multivariate deviates results from one of the effects of the so-called "curse of dimensionality": the proportion of the volume of a closed geometrical figure that is in the outer regions of that figure increases with increasing dimensionality. (See Section 10.7 of Gentle, 2002, and Exercise 4.4f at the end of this chapter.)

An iterative method somewhat similar to the use of marginals and conditionals can also be used to generate multivariate observations. This method was used by Geman and Geman (1984) for generating observations from a Gibbs distribution (Boltzmann distribution) and so is called the Gibbs method. In the Gibbs method, after choosing a starting point, the components of the d-vector variate are generated one at a time conditionally on all others. If p_X is the density of the d-variate random variable X, we use the conditional densities p_{X_1|X_2 X_3 \cdots X_d}, p_{X_2|X_1 X_3 \cdots X_d}, and so on. At each stage, the conditional distribution uses the most recent values of all of the other components.
Obviously, it may require a number of iterations before the effect of the initial starting point is washed out. The method is shown in Algorithm 4.20. (In the algorithms to follow, we represent the support of the density of interest by S, where S ⊆ IR^d.)

Algorithm 4.20 Gibbs Method

0. Set k = 0.
1. Choose x^(k) ∈ S.
2. Generate x_1^(k+1) conditionally on x_2^(k), x_3^(k), . . . , x_d^(k),
   Generate x_2^(k+1) conditionally on x_1^(k+1), x_3^(k), . . . , x_d^(k),
   . . .
   Generate x_{d−1}^(k+1) conditionally on x_1^(k+1), x_2^(k+1), . . . , x_d^(k),
   Generate x_d^(k+1) conditionally on x_1^(k+1), x_2^(k+1), . . . , x_{d−1}^(k+1).
3. If convergence has occurred, then
   3.a. deliver x = x^(k+1);
   otherwise,
   3.b. set k = k + 1, and go to step 2.
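To make Algorithm 4.20 concrete, the following minimal sketch applies it to a case in which the full conditionals are available in closed form: a bivariate normal with zero means, unit variances, and correlation ρ, for which X_1 | x_2 ~ N(ρx_2, 1 − ρ²) and symmetrically for X_2. Here std_normal() is an assumed N(0, 1) generator (any method of Section 5.2.1 would do), and the fixed iteration count stands in for an actual convergence assessment.

```c
#include <math.h>

extern double std_normal(void);   /* assumed N(0,1) generator */

void gibbs_bvn(double rho, int nsteps, double *x1, double *x2)
{
    double s = sqrt(1.0 - rho * rho);   /* conditional standard deviation */
    *x1 = 0.0;                          /* step 1: a starting point in S */
    *x2 = 0.0;
    for (int k = 0; k < nsteps; k++) {  /* step 2, repeated */
        *x1 = rho * (*x2) + s * std_normal();   /* X1 | current x2 */
        *x2 = rho * (*x1) + s * std_normal();   /* X2 | new x1 */
    }
    /* step 3: after enough iterations, (x1, x2) is approximately a
       realization from the bivariate normal */
}
```

Because the conditionals here are exact, there is no rejection step; the only approximation lies in treating a finite burn-in as "convergence".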
Casella and George (1992) give a simple proof that this iterative method converges; that is, as k → ∞, the density of the realizations approaches p_X. The question of whether convergence has practically occurred in a finite number of iterations in the Gibbs method is similar to the same question in the Metropolis–Hastings method discussed in Section 4.10. In either case, to determine that convergence has occurred is not a simple problem.

Once a realization is delivered in Algorithm 4.20 (that is, once convergence has been deemed to have occurred), subsequent realizations can be generated either by starting a new iteration with k = 0 in step 0 or by continuing at step 1 with the current value of x^(k). If the chain is continued at the current value of x^(k), we must remember that the subsequent realizations are not independent. This affects variance estimates (second-order sample moments) but not means (first-order moments). In order to get variance estimates, we may use means of batches of subsequences or use just every mth (for some m > 1) deviate in step 3. (The idea is that this separation in the sequence will yield subsequences or a systematic subsample with correlations nearer 0. See Section 7.4 for a description of batch means.) If we just want estimates of means, however, it is best not to subsample the sequence; that is, the variances of the estimates of means (first-order sample moments) using the full sequence are smaller than the variances of the estimates of the same means using a systematic (or any other) subsample (as long as the Markov chain is stationary).

To see this, let x̄_i be the mean of a systematic subsample of size n consisting of every mth realization beginning with the ith realization of the converged sequence. Now, following MacEachern and Berliner (1994), we observe that
$$|\mathrm{Cov}(\bar{x}_i, \bar{x}_j)| \leq \mathrm{V}(\bar{x}_l)$$
for any positive i, j, and l less than or equal to m. Hence, if x̄ is the sample mean of a full sequence of length nm, then
$$\mathrm{V}(\bar{x}) = \mathrm{V}(\bar{x}_l)/m + \sum_{i \neq j;\, i,j=1}^{m} \mathrm{Cov}(\bar{x}_i, \bar{x}_j)/m^2 \leq \mathrm{V}(\bar{x}_l)/m + m(m-1)\mathrm{V}(\bar{x}_l)/m^2 = \mathrm{V}(\bar{x}_l).$$
See also Geyer (1992) for a discussion of subsampling in the chain.

The paper by Gelfand and Smith (1990) was very important in popularizing the Gibbs method. Gelfand and Smith also describe a related method of Tanner and Wong (1987), called data augmentation, which Gelfand and Smith call substitution sampling. In this method, a single component of the d-vector is chosen (in step 1), and then multivariate subvectors are generated conditionally on just one component. This method requires d(d−1) conditional distributions. The reader is referred to their article and to Schervish and Carlin (1992) for descriptions and comparisons with different methods. Tanner (1996) defines a chained data augmentation, which is the Gibbs method described above.

In the Gibbs method, the components of the d-vector are changed systematically, one at a time. The method is sometimes called alternating conditional sampling to reflect this systematic traversal of the components of the vector.

Another type of Metropolis method is the hit-and-run sampler. In this method, all components of the vector are updated at once. The method is shown in Algorithm 4.21 in the general version described by Chen and Schmeiser (1996).

Algorithm 4.21 Hit-and-Run Sampling

0. Set k = 0.
1. Choose x^(k) ∈ S.
2. Generate a random normalized direction v^(k) in IR^d. (This is equivalent to a random point on a sphere, as discussed on page 201.)
3. Determine the set S^(k) ⊆ IR consisting of all λ such that (x^(k) + λv^(k)) ∈ S. (S^(k) is one-dimensional; S is d-dimensional.)
4. Generate λ^(k) from the density g^(k), which has support S^(k).
5. With probability a^(k),
   5.a. set x^(k+1) = x^(k) + λ^(k)v^(k);
   otherwise,
   5.b. set x^(k+1) = x^(k).
6. If convergence has occurred, then
   6.a. deliver x = x^(k+1);
   otherwise,
   6.b. set k = k + 1, and go to step 2.

Chen and Schmeiser (1996) discuss various choices for g^(k) and a^(k). One choice is
$$g^{(k)}(\lambda) = \begin{cases} \dfrac{p(x^{(k)} + \lambda v^{(k)})}{\int_{S^{(k)}} p(x^{(k)} + u v^{(k)})\, \mathrm{d}u} & \text{for } \lambda \in S^{(k)},\\[1ex] 0 & \text{otherwise,}\end{cases}$$
and a^(k) = 1. Another choice is g^(k) uniform over S^(k) if S^(k) is bounded, or else some symmetric distribution centered on 0 (such as a normal or Cauchy distribution), together with
$$a^{(k)} = \min\left(1,\; \frac{p(x^{(k)} + \lambda^{(k)} v^{(k)})}{p(x^{(k)})}\right).$$

Smith (1984) uses the hit-and-run sampler for generating uniform points over bounded regions, and Bélisle, Romeijn, and Smith (1993) use it for generating random variates from general multivariate distributions. Proofs of the convergence of the method can be found in Bélisle, Romeijn, and Smith (1993) and Chen and Schmeiser (1996).

Gilks, Roberts, and George (1994) describe a generalization of the hit-and-run algorithm called adaptive direction sampling. In this method, a set of current points is maintained, and only one, chosen at random from the set, is updated at each iteration (see Gilks and Roberts, 1996).

Both the Gibbs and hit-and-run methods are special cases of the Metropolis–Hastings method in which the r of step 2 in Algorithm 4.18 (page 141) is exactly 1, so there is never a rejection. The same issues of convergence that we encountered in discussing the Metropolis–Hastings method must be addressed when using the Gibbs or hit-and-run methods. The need to run long chains can increase the number of computations to unacceptable levels. Schervish and Carlin (1992) and Cowles and Carlin (1996) discuss general conditions for convergence of the Gibbs sampler. Dellaportas (1995) discusses some issues in the efficiency of random number generation using the Gibbs method. Berbee et al. (1987) compare the efficiency of hit-and-run methods with acceptance/rejection methods and find the hit-and-run methods to be more efficient in higher dimensions. Chen and Schmeiser (1993) give some general comparisons of Gibbs, hit-and-run, and variations. Generalizations about the performance of the methods are difficult; the best method often depends on the problem.
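The following minimal sketch illustrates Algorithm 4.21 in the setting that Smith (1984) describes, generating uniform points over a bounded region, here the unit disk in IR². In that case S^(k) is a chord of the disk, and the choices g^(k) uniform with a^(k) = 1 suffice. unif() is an assumed U(0,1) generator, and x must start inside the disk (step 1).

```c
#include <math.h>

extern double unif(void);   /* assumed U(0,1) generator */

void hit_and_run_disk(double x[2], int nsteps)
{
    for (int k = 0; k < nsteps; k++) {
        /* step 2: a random normalized direction in IR^2 */
        double theta = 2.0 * acos(-1.0) * unif();
        double v0 = cos(theta), v1 = sin(theta);

        /* step 3: S^(k) = {lambda : x + lambda v in the disk} is the
           interval between the roots of |x + lambda v|^2 = 1 */
        double b = x[0] * v0 + x[1] * v1;            /* x . v, with |v| = 1 */
        double c = x[0] * x[0] + x[1] * x[1] - 1.0;  /* < 0 inside the disk */
        double d = sqrt(b * b - c);
        double lo = -b - d, hi = -b + d;

        /* steps 4 and 5: lambda uniform on S^(k); a^(k) = 1, so the
           proposed point is always accepted */
        double lambda = lo + (hi - lo) * unif();
        x[0] += lambda * v0;
        x[1] += lambda * v1;
    }
}
```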
Multivariate Densities with Special Properties

We have seen that certain properties of univariate densities can be used to develop efficient algorithms for general distributions that possess those special properties. For example, adaptive rejection sampling and other special acceptance/rejection methods can be used for distributions having concave densities or concave transformed densities, as discussed on page 150. Hörmann (2000) describes a method for log-concave bivariate distributions that uses adaptive rejection sampling to develop the majorizing function. Leydold (1998) shows that while the methods for univariate T-concave distributions would work for multivariate T-concave distributions, such methods are unacceptably slow. He splits the T-concave multivariate density into a set of simple cones and constructs the majorizing function from piecewise hyperplanes that are tangent to the cones. He reports favorably on the performance of his method for as many as eight dimensions. As we have seen, unimodal distributions are generally easier to work with than multimodal distributions. A product multivariate density having unimodal factors will of course be unimodal. Devroye (1997) described general acceptance/rejection methods for multivariate distributions with the slightly weaker property of being orthounimodal; that is, each marginal density is unimodal.
4.15 Generating Samples from a Given Distribution
Usually, in applications, rather than just generating a single random deviate, we generate a random sample of deviates from the distribution of interest. A random sample of size n from a discrete distribution with probability function Pr(X = m_i) = p_i has a vector of counts of the mass points that has a multinomial (n, p_1, . . . , p_k) distribution. If the sample is to be used as a set, rather than as a sequence, and if n is large relative to k, it obviously makes more sense to generate a single multinomial vector of counts (x_1, x_2, . . . , x_k) and use these values as counts of occurrences of the respective mass points m_1, m_2, . . . , m_k. (Methods for generating multinomials are discussed in Section 5.3.2, page 198.) This same idea can be applied to continuous distributions with a modification to discretize the range (see Kemp and Kemp, 1987).
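A minimal sketch of this idea follows. The conditional-binomial decomposition used to produce the multinomial counts is for illustration only (Section 5.3.2 treats multinomial generation properly), and the naive O(n) binomial generator and unif() are assumptions of the sketch.

```c
#include <stdlib.h>

extern double unif(void);   /* assumed U(0,1) generator */

static int binomial(int n, double p)   /* naive Bernoulli-sum generator */
{
    int x = 0;
    for (int i = 0; i < n; i++)
        if (unif() < p) x++;
    return x;
}

/* Fill x[0..k-1] with multinomial(n; p[0..k-1]) counts; x[i] is then the
   number of occurrences of mass point m_i in the simulated sample. */
void multinomial_counts(int n, int k, const double p[], int x[])
{
    double rem = 1.0;                  /* probability not yet allocated */
    for (int i = 0; i < k - 1; i++) {
        x[i] = binomial(n, p[i] / rem);   /* conditional binomial */
        n -= x[i];
        rem -= p[i];
    }
    x[k - 1] = n;                      /* the last count is forced */
}
```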
Exercises

4.1. The inverse CDF method.
(a) Prove that if X is a random variable with an absolutely continuous distribution function P_X, the random variable P_X(X) has a U(0, 1) distribution.
(b) Prove that the inverse CDF method for discrete random variables, as specified in the relationship in expression (4.2) on page 104, is correct.
4.2. Formally prove that the random variable delivered in Algorithm 4.6 on page 114 has the density p_X. Hint: For the delivered variable, Z, determine the distribution function Pr(Z ≤ x) and differentiate.

4.3. Write a Fortran or C function to implement the acceptance/rejection method for generating a beta(3, 2) random deviate. Use the majorizing function shown in Figure 4.4 on page 115. The value of c is 1.2. Use the inverse CDF method to generate a deviate from g. (This will involve taking a square root.)

4.4. Acceptance/rejection methods.
(a) Give an algorithm to generate a normal random deviate using the basic acceptance/rejection method with the double exponential density (see equation (5.11), page 177) as the majorizing density.
(b) What is the acceptance proportion of this method?
(c) After you have obtained the basic acceptance/rejection test, try to simplify it.
(d) Develop an algorithm to generate bivariate normal deviates with mean (0, 0), variance (1, 1), and correlation ρ using a bivariate product double exponential density as the majorizing density. For ρ = 0, what is the acceptance probability?
(e) Write a program to generate bivariate normal deviates with mean (0, 0), variance (1, 1), and correlation ρ. Use a bivariate product double exponential density as the majorizing density. Now, set ρ = 0.5 and generate a sample of 1000 bivariate normals. Compare the sample statistics with the parameters of the simulated distribution.
(f) What is the acceptance probability for a basic acceptance/rejection method to generate d-variate normal deviates with mean 0 and diagonal variance-covariance matrix with all elements equal to 1 using a d-variate product double exponential density as the majorizing density?

4.5. What would be the problem with using a normal density to make a majorizing function for the double exponential distribution (or using a halfnormal for an exponential)?

4.6. (a) Write a Fortran or C function to implement the acceptance/rejection method for a bivariate gamma distribution whose density is given in
equation (4.10) on page 123 using the method described in the text. (You must develop a method for determining the mode.)
(b) Now, instead of the bivariate uniform in the rectangle near the origin, devise a pyramidal distribution to use as a majorizing density.
(c) Use Monte Carlo methods to compare the efficiency of the method using the bivariate uniform and the method using a pyramidal density.

4.7. Consider the acceptance/rejection method given in Algorithm 4.6 to generate a realization of a random variable X with density function p_X using a density function g_Y.
(a) Let T be the number of passes through the three steps until the desired variate is delivered. Determine the mean and variance of T (in terms of p_X and g_Y).
(b) Now, consider a modification of the rejection method in which steps 1 and 2 are reversed, and the branch in step 3 is back to the new step 2; that is:
1. Generate u from a uniform (0,1) distribution.
2. Generate y from the distribution with density function g_Y.
3. If u ≤ p_X(y)/(c g_Y(y)), then take y as the desired realization; otherwise, return to step 2.
Is this a better method? Let Q be the number of passes through these three steps until the desired variate is delivered. Determine the mean and variance of Q. (This method was suggested by Sibuya, 1961, and analyzed by Greenwood, 1976c.)

4.8. Formally prove that the random variable delivered in Algorithm 4.7 on page 118 has the density p.

4.9. Write a Fortran or C function to implement the ratio-of-uniforms method (page 130) to generate deviates from a gamma distribution with shape parameter α. Generate a sample of size 1000 and perform a chi-squared goodness-of-fit test (see Cheng and Feast, 1979).

4.10. Use the Metropolis–Hastings algorithm (page 141) to generate a sample of standard normal random variables. Use as the candidate generating density, g(x|y), a normal density in x with mean y. Experiment with different burn-in periods and different starting values. Plot the sequences generated. Test your samples for goodness of fit to a normal distribution. (Remember that they are correlated.) Experiment with different sample sizes.

4.11. Let Π have a beta distribution with parameters α and β, and let X have a conditional distribution given Π = π of a binomial with parameters n and π. Let Π conditional on X = x have a beta distribution with parameters
α + x and n + β − x. (This leads to the "beta-binomial" distribution; see page 187.) Consider a bivariate Markov chain, (Π_0, X_0), (Π_1, X_1), . . . , with an uncountable state space (see Casella and George, 1992).
(a) What is the transition kernel? That is, what is the conditional density of (Π_t, X_t) given (π_{t−1}, x_{t−1})?
(b) Consider just the Markov chain of the beta-binomial random variable X. What is the (i, j) element of the transition matrix?
4.12. Obtain a sample of size 100 from the beta(3,2) distribution using the SIR method of Section 4.12 and using a sample of size 1000 from the density g_Y that is proportional to the triangular majorizing function used in Exercise 4.3. (Use Algorithm 6.1, page 218, to generate the sample without replacement.) Compare the efficiency of the program that you have written with the one that you wrote in Exercise 4.3.

4.13. Formally prove that the random variable delivered in Algorithm 4.19 on page 151 has the density p_X. (Compare Exercise 4.2.)

4.14. Write a computer program to implement the adaptive acceptance/rejection method for generating a beta(3,2) random deviate. Use the majorizing function shown in Figure 4.19 on page 152. The initial value of k is 4, and S_k = {0.00, 0.10, 0.60, 0.75, 0.90, 1.00}. Compare the efficiency of the program that you have written with the ones that you wrote in Exercises 4.3 and 4.12.

4.15. Consider the trivariate normal distribution used as the example in Figure 4.18 (page 145).
(a) Use the Gibbs method to generate and plot 1000 realizations of X_1 (including any burn-in). Explain any choices that you make on how to proceed with the method.
(b) Use the hit-and-run method to generate and plot 1000 realizations of X_1 (including any burn-in). Explain any choices that you make on how to proceed with the method.
(c) Compare the Metropolis–Hastings method (page 145) and the Gibbs and hit-and-run methods for this problem.

4.16. Consider a probability model in which the random variable X has a binomial distribution with parameters n and y, which are, respectively, realizations of a conditional shifted Poisson distribution and a conditional beta distribution. For fixed λ, α, and β, let the joint density of X, N, and Y be proportional to
$$\frac{\lambda^n\, y^{x+\alpha-1} (1-y)^{n-x+\beta-1}\, e^{-\lambda}}{x!\,(n-x)!}$$
for x = 0, 1, . . . , n; 0 ≤ y ≤ 1; n = 1, 2, . . . .
First, determine the conditional densities for X|y,n, Y|x,n, and N|x,y. Next, write a Fortran or C program to sample X from the multivariate distribution for given λ, α, and β. Now, set λ = 16, α = 2, and β = 4, run 500 independent Gibbs sequences of length k = 10, taking only the final variate, and plot a histogram of the observed x. (Use a random starting point.) Now repeat the procedure, except using only one Gibbs sequence of length 5000, and plot a histogram of all observed xs after the ninth one (see Casella and George, 1992).

4.17. Generate a random sample of 1000 Bernoulli variates with π = 0.3. Do not use Algorithm 4.1; instead, use the method of Section 4.15.
Chapter 5

Simulating Random Numbers from Specific Distributions

For the important distributions, specialized algorithms based on the general methods discussed in the previous chapter are available. The important difference among the algorithms is their speed. A secondary difference is the size and complexity of the program to implement the algorithm. Because all of the algorithms for generating from nonuniform distributions rely on programs to generate from uniform distributions, an algorithm that uses only a small number of uniforms to yield a variate of the target distribution may be faster on a computer system on which the generation of the uniform is very fast. As we have mentioned, on a given computer system, there may be more than one program available to generate uniform deviates. Often, a portable generator is slower than a nonportable one, so for portable generators of nonuniform distributions, those that require a small number of uniform deviates may be better. If evaluation of elementary functions is a part of the algorithm for generating random deviates, then the speed of the overall algorithm depends on the speed of the evaluation of the functions. The relative speed of elementary function evaluation is different on different computer systems.

The algorithm for a given distribution is some specialized version of the methods discussed in the previous chapter. Often, the algorithm uses some combination of these general techniques. Many algorithms require some setup steps to compute various constants and to store tables; therefore, there are two considerations for the speed: the setup time and the generation time. In some applications, many random numbers from the same distribution are required. In those cases, the setup time may not be too important. In other applications, the random numbers come from different distributions—probably the same family of distributions but with changing
values of the parameters. In those cases, the setup time may be very significant. If the best algorithm for a given distribution has a long setup time, it may be desirable to identify another algorithm for use when the parameters vary. Any computation that results in a quantity that is constant with respect to the parameters of the distribution should of course be performed as part of the setup computations in order to avoid performing the computation in every pass through the main part of the algorithm.

The efficiency of an algorithm may depend on the values of the parameters of the distribution. Many of the best algorithms therefore switch from one method to another, depending on the values of the parameters. In some cases, the speed of the algorithm is independent of the parameters of the distribution. Such an algorithm is called a uniform time algorithm. In many cases, the most efficient algorithm in one range of the distribution is not the most efficient in other regions. Many of the best algorithms therefore use mixtures of the distribution. Sometimes, it is necessary to generate random numbers from some subrange of a given distribution, such as the tail region. In some cases, there are efficient algorithms for such truncated distributions. (If there is no specialized algorithm for a truncated distribution, acceptance/rejection applied to the full distribution will always work, of course.)

Methods for generating random variates from specific distributions are an area in which there have been literally hundreds of papers, each proposing some wrinkle (not always new or significant). Because the relative efficiencies ("efficiency" here means "speed") of the individual operations in the algorithms vary from one computing system to another, and also because these individual operations can be programmed in various ways, it is very difficult to compare the relative efficiencies of the algorithms. This provides fertile ground for a proliferation of "research" papers. Two other things contribute to the large number of insignificant papers in this area. It is easy to look at some algorithm, modify some step, and then offer the new algorithm. Thus, the intellectual capitalization required to enter the field is small. (In business and economics, this is the same reason that so many restaurants are started; only a relatively small capitalization is required.) Another reason for the large number of papers purporting to give new and better algorithms is the diversity of the substantive and application areas that constitute the backgrounds of the authors. Monte Carlo simulation is widely used throughout both the hard and the soft sciences. Research workers in one field often are not aware of the research published in another field.

Although, of course, it is important to seek efficient algorithms, it is also necessary to consider a problem in its proper context. In Monte Carlo simulation applications, literally millions of random numbers may be generated, but the time required to generate them is likely to be only a very small fraction of the total computing time. In fact, it is probably the case that the fraction of time required for the generation of the random numbers is somehow negatively correlated with the importance of the problem. The importance of the time
required to perform some task usually depends more on its proportion of the overall time of the job than on its total time. Another consideration is whether the algorithm is portable, that is, whether it yields the same stream on different computer systems. As we mention in Section 4.5, methods that accept or reject a candidate variate based on a floating-point comparison may not yield the same streams on different systems. The descriptions of the algorithms in this chapter are written with an emphasis on clarity, so they should not be incorporated directly into program code without consideration of efficiency. These considerations generally involve avoiding unnecessary computations. This may mean defining a variable not mentioned in the algorithm description or reordering the steps slightly.
5.1 Modifications of Standard Distributions
For many of the common distributions, there are variations that are useful either for computational or other practical reasons or because they model some stochastic process well. A distribution can sometimes be simplified by transformations of the random variable that effectively remove certain parameters that characterize the distribution. In many cases, the algorithms for generating random deviates address the simplified version of the distribution. An appropriate transformation is then applied to yield deviates from the distribution with the given parameters.

Standard Distributions

A linear transformation, Y = aX + b, is simple to apply and is one of the most useful. The multiplier affects the scale, and the addend affects the location. For example, a "three-parameter" gamma distribution with density
$$p(y) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, (y-\gamma)^{\alpha-1} e^{-(y-\gamma)/\beta}, \quad \text{for } \gamma \leq y \leq \infty,$$
can be formed from the simpler distribution with density
$$g(x) = \frac{1}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-x}, \quad \text{for } 0 \leq x \leq \infty,$$
using the transformation Y = βX + γ. (Here, and elsewhere, when we give an expression for a probability density function, we imply that the density is equal to 0 outside of the range specified.) The β parameter is a scale parameter, and γ is a location parameter. (The remaining parameter, α, is called the "shape parameter", and it is the essential parameter of the family of gamma distributions.) The simpler form is called the standard gamma distribution. Other distributions have similar standard forms. Standard distributions (or standardized random variables) allow us to develop simpler algorithms and more compact tables of values that can be used for a range of parameter values.
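As a sketch of how such a transformation is used in code, assuming a generator std_gamma() for the standard gamma distribution (the names here are illustrative, not from any particular library):

```c
double std_gamma(double alpha);   /* assumed standard gamma generator */

/* Y = beta*X + gamma has the three-parameter density p(y) above when
   X has the standard gamma density g(x). */
double gamma3(double alpha, double beta, double gamma_loc)
{
    return beta * std_gamma(alpha) + gamma_loc;
}
```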
Truncated Distributions

In many stochastic processes, the realizations of the random variable are constrained to be within a given region of the support of the random variable. Over the allowable region, the random variable has a probability density (or probability function) that is proportional to the density (or probability) of the unconstrained random variable. If the random variable Y has probability density p(y) over a domain S, and if Y is constrained to R ⊂ S, the probability density of the constrained random variable is
$$p_c(x) = \begin{cases} \dfrac{p(x)}{\Pr(Y \in R)} & \text{for } x \in R;\\[1ex] 0 & \text{elsewhere.} \end{cases} \quad (5.1)$$
The most common types of constraint are truncations, either left or right. In a left truncation at τ, say, the random variable Y is constrained by τ ≤ Y, and in a right truncation, it is constrained by Y ≤ τ.

Truncated distributions are useful models in applications in which the observations are censored. Such observations often arise in studies where a variable of interest is the time until a particular event occurs. At the end of the study, there may be a number of observational units that have not experienced the event. The corresponding times for these units are said to be censored, or right-censored; it is known only that the times for these units would be greater than some fixed value. In a similar fashion, left-censoring occurs when the exact times are not recorded early in the study. There are many issues to consider in the analysis of censored data, but it is not our purpose here to discuss the analysis.

Generation of random variates with constraints can be handled by the general methods discussed in the previous chapter. The use of acceptance/rejection is obvious: merely generate from the full distribution and reject any realizations outside of the acceptable region. Of course, choosing a majorizing density with no support in the truncated region is a better approach. Modification of the inverse CDF method to handle truncated distributions is simple. For a right truncation at τ of a distribution with CDF P_Y, for example, instead of the basic transformation (4.1), page 102, we use
$$X = P_Y^{-1}\left(U\, P_Y(\tau)\right), \quad (5.2)$$
where U is a random variable from U(0, 1). The method using a sequence of conditional distributions described on page 149 can often be modified easily to generate variates from truncated distributions. In some simple applications, the truncated distribution is simulated by a conditional uniform distribution, the range of which is the intersection of the full conditional range and the truncated range. See Damien and Walker (2001) for some examples. There are usually more efficient ways of generating variates from constrained distributions. We describe a few of the more common ones (which are invariably truncations) in the following sections.
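A minimal sketch of equation (5.2) for a case in which the inverse CDF is available in closed form, a right-truncated exponential distribution; unif() is an assumed U(0,1) generator:

```c
#include <math.h>

extern double unif(void);   /* assumed U(0,1) generator */

/* Exponential(lambda) right-truncated at tau.  Here
   P(x) = 1 - exp(-lambda*x), so P^{-1}(v) = -log(1 - v)/lambda. */
double rtrunc_exp(double lambda, double tau)
{
    double ptau = 1.0 - exp(-lambda * tau);   /* P_Y(tau) */
    double u = unif();
    return -log(1.0 - u * ptau) / lambda;     /* P_Y^{-1}(u * P_Y(tau)) */
}
```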
"Inverse" Distributions

In Bayesian applications, joint probability densities of interest often involve a product of the density of some well-known random variable and what might be considered the density of the multiplicative inverse of another well-known random variable. Common examples of this are the statistics used in studentization: the chi-squared and the Wishart. Many authors refer to the distribution of such a random variable as the "inverse distribution"; for example, an "inverse chi-squared distribution" is the distribution of X^{−1}, where X has a chi-squared distribution. Other distributions with this interpretation are the inverse gamma distribution and the inverse Wishart distribution. This interpretation of "inverse" is not the same as for that word in the inverse Gaussian distribution, with density given in equation (5.30) on page 193. In the cases of the inverse gamma, chi-squared, and Wishart distributions, the method for generating random variates is the obvious one: generate from the regular distribution and then obtain the inverse.

Folded Symmetric Distributions

For symmetric distributions, a useful nonlinear transformation is the absolute value. The distribution of the absolute value is often called a "folded" distribution. The exponential distribution, for example, is the folded double exponential distribution (see page 176). The halfnormal distribution, which is the distribution of the absolute value of a normal random variable, is a folded normal distribution.

Mixture Distributions

In Chapter 4, we discussed general methods for generating random deviates by decomposing the density p(·) into a mixture of other densities,
$$p(x) = \sum_i w_i\, p_i(x), \quad (5.3)$$
where Σ_i w_i = 1. Mixture distributions are useful in their own right. It was noticed as early as the nineteenth century by Francis Galton and Karl Pearson that certain observational data correspond very well to a mixture of two normal distributions, whereas a single normal does not fit the data well at all. Often, a simple mixture distribution can be used to model outliers or aberrant observations. This kind of mixture, in which a substantial proportion of the total probability follows one distribution and a small proportion follows another distribution, is called a "contaminated distribution". Mixture distributions are often used in robustness studies because the interest is in how well a standard procedure holds up when the data are subject to contamination by a different population or by incorrect measurements.

A very simple extension of a finite (or countable) mixture, as in equation (5.3), is one in which the parameter indexing the individual densities is used as a continuous weighting variable.
170
CHAPTER 5. SPECIFIC DISTRIBUTIONS
Let the individual densities be indexed continuously by θ; that is, the density corresponding to θ is p(·; θ). Now, let w(·) be a weight (density) associated with θ such that w(θ) ≥ 0 and ∫ w(θ) dθ = 1. Then, form the mixture density p(·) as
$$p(x) = \int w(\theta)\, p(x;\theta)\, \mathrm{d}\theta. \quad (5.4)$$
An example of this kind of mixture is the beta-binomial distribution, the density of which is given in equation (5.18).

Probability-Skewed Distributions

A special type of mixture distribution is a probability-skewed distribution, in which the mixing weights are the values of a CDF. The skew-normal distribution is a good example. The (standard) skew-normal distribution has density
$$g(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}\, \Phi(\lambda x) \quad \text{for } -\infty \leq x \leq \infty, \quad (5.5)$$
where Φ(·) is the standard normal CDF, and λ is a constant such that −∞ < λ < ∞. For λ = 0, the skew-normal distribution is the normal distribution, and in general, if |λ| is relatively small, the distribution is close to the normal. For larger |λ|, the distribution is more skewed, either positively or negatively. This distribution is an appropriate distribution for variables that would otherwise have a normal distribution but have been screened on the basis of a correlated normal random variable. See Arnold et al. (1993) for discussions.

Other distributions symmetric about 0 can also be skewed by a CDF in this manner. The general form of the probability density is g(x) ∝ p(x)P(λx), where p(·) is the density of the underlying symmetric distribution, and P(·) is a CDF (not necessarily the corresponding one). The idea also extends to multivariate distributions. Arnold and Beaver (2000) discuss definitions and applications of such densities, specifically a skew-Cauchy density.

In most cases, if |λ| is relatively small, generation of random variables from a probability-skewed symmetric distribution using an acceptance/rejection method with the underlying symmetric distribution as the majorizing density is entirely adequate. For larger values of |λ|, it is necessary to divide the support into two or more intervals. It is still generally possible to use the same majorizing density, but the multiplicative constant can be different in different intervals.
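A minimal sketch of the acceptance/rejection approach just described, for the skew-normal density (5.5): because g(x) = 2φ(x)Φ(λx) ≤ 2φ(x), the underlying normal serves as the majorizing density with c = 2, and a candidate y is accepted with probability Φ(λy) (so in this particular case the acceptance probability is 1/2 for any λ). unif() and std_normal() are assumed generators.

```c
#include <math.h>

extern double unif(void);         /* assumed U(0,1) generator */
extern double std_normal(void);   /* assumed N(0,1) generator */

static double Phi(double x)       /* standard normal CDF via erfc */
{
    return 0.5 * erfc(-x / sqrt(2.0));
}

double skew_normal(double lambda)
{
    for (;;) {
        double y = std_normal();          /* candidate from majorizing density */
        if (unif() <= Phi(lambda * y))    /* accept with probability Phi(lambda*y) */
            return y;
    }
}
```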
5.2 Some Specific Univariate Distributions
In this section, we consider several of the more common univariate distributions and indicate methods for simulating them. The methods discussed are generally
among the better ones, at least according to some criteria, but the discussion is not exhaustive. We give the details for some simpler algorithms, but in many cases the best algorithm involves many lines of a program with several constants that optimize a majorizing function, a squeeze function, or the breakpoints of mixtures. We sometimes do not describe the best method in detail but rather refer the interested reader to the relevant literature. Devroye (1986a) has given a comprehensive treatment of methods for generating deviates from various distributions, and more information on many of the algorithms in this section can be found in that reference.

The descriptions of the algorithms that we give indicate the computations, but if the reader develops a program from the algorithm, issues of computational efficiency should be considered. For example, in the descriptions, we do not identify the computations that should be removed from the main body of the algorithm and made part of some setup computations.

Two variations of a distribution are often of interest. In one variation, the distribution is truncated. In this case, as we mentioned above, the range of the original distribution is restricted to a subrange, and the probability measure is adjusted accordingly. In another variation, the roles of the random variable and the parameter of the distribution are interchanged. In some cases, these quantities have a natural association, and the corresponding distributions are said to be conjugate. The binomial and the beta are an example of two such distributions: what is a realization of a random variable in one distribution is a parameter in the other distribution. For many distributions, we may want to generate samples of a parameter, given realizations of the random variable (the data).
5.2.1 Normal Distribution
The normal distribution, which we denote by N(µ, σ²), has the probability density
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)} \quad \text{for } -\infty \leq x \leq \infty. \quad (5.6)$$
If Z ∼ N(0, 1) and X = σZ + µ, then X ∼ N(µ, σ²). Because of this simple relationship, it is sufficient to develop methods to generate deviates from the standard normal distribution, N(0, 1), so there is no setup involved. All constants necessary in any algorithm can be precomputed and stored.

There are several methods for transforming uniform random variates into normal random variates. One transformation not to use is:
1. Generate u_i for i = 1, . . . , 12 as i.i.d. U(0, 1).
2. Deliver x = Σ u_i − 6.
This method is the Central Limit Theorem applied to a sample of size 12. Not only is the method approximate (and based on a poor approximation!), but it is also slower than better methods.
A simple method is the Box–Muller method arising from a polar transformation: if U_1 and U_2 are independently distributed as U(0, 1), and
$$X_1 = \sqrt{-2\log(U_1)}\, \cos(2\pi U_2),$$
$$X_2 = \sqrt{-2\log(U_1)}\, \sin(2\pi U_2), \quad (5.7)$$
then X_1 and X_2 are independently distributed as N(0, 1) (see Exercises 5.1a and 5.1b on page 213).

The Box–Muller transformation is rather slow. It requires evaluation of one square root and two trigonometric functions for every two deviates generated. As noted by Neave (1973), if the uniform deviates used in the Box–Muller transformation are generated by a congruential generator with a small multiplier, the resulting normals are deficient in the tails. Golder and Settle (1976), under similar conditions, demonstrate that the density of the generated normal variates has a jagged shape, especially in the tails. Of course, if they had analyzed their small-multiplier congruential generator, they would have found that generator lacking. (See the discussion about Figure 1.3, page 16.)

It is easy to see that the largest and smallest numbers generated by the Box–Muller transformation occur when a value of u_1 from the uniform generator is close to 0. A bound on the absolute value of the numbers generated is √(−2 log(ε)), where ε is the smallest floating-point number that can be generated by the uniform generator. (In the "minimal standard" congruential generator and many similar generators, the smallest number is approximately 2^{−31}, so the bound on the absolute value is approximately 6.56.) How close the results of the transformation come to the bound depends on whether u_2 is close to 0, 1/4, 1/2, 3/4, or 1 when u_1 is close to 0. (In the "minimal standard" generator, when u_1 is at the smallest value possible, cos(2πu_2) is close to 1 because u_2 is relatively close to 0. The maximum number, therefore, is very close to the upper bound of 6.56, which has a p-value of the same order of magnitude as the reciprocal of the period of the generator. On the other hand, when u_1 is close to 0, the value of u_2 is never close enough to 1/2 or 3/4 to yield a value of one of the trigonometric functions close to −1. The minimum value that results from the Box–Muller transformation, therefore, does not approach the lower bound of −6.56. The p-value of the minimum value is three to four orders of magnitude greater than the reciprocal of the period of the generator.) If the Box–Muller transformation is used with a congruential generator, especially one with a relatively small multiplier, the roles of u_1 and u_2 should be exchanged periodically. This ensures that the lower and upper bounds are approximately symmetric if the generator has full period.

Tezuka (1991) shows that similar effects are also noticeable if a poor Tausworthe generator is used. These studies emphasize the importance of using a good uniform generator for whatever distribution is to be simulated. It is especially important to be wary of the effects of a poor uniform generator in algorithms that require more than one uniform deviate (see Section 4.3).

Bratley, Fox, and Schrage (1987) show that normal variates generated by the Box–Muller transformation lie pairwise on spirals. The spirals are of exactly
the same origin as the lattice of the congruential generator itself, so a solution would be to use a better uniform generator.

To alleviate potential problems of patterns in the output of a polar method such as the Box–Muller transformation, some authors have advocated that, for each pair of uniforms, only one of the resulting pair of normals be used. If there is any marginal gain in quality, it is generally not noticeable, especially if the roles of u_1 and u_2 are exchanged periodically, as recommended.

The Box–Muller transformation is one of several polar methods. All of them have similar properties, but the Box–Muller transformation generally requires slower computations. Although most currently available computing systems can evaluate the necessary trigonometric functions extremely rapidly, the Box–Muller transformation can often be performed more efficiently using an acceptance/rejection algorithm, as we indicated in the general discussion of acceptance/rejection methods (see Exercise 5.1d on page 213). The Box–Muller transformation is implemented via rejection in Algorithm 5.1.

Algorithm 5.1 A Rejection Polar Method for Normal Variates

1. Generate v_1 and v_2 independently from U(−1, 1), and set r² = v_1² + v_2².
2. If r² ≥ 1, then go to step 1;
   otherwise, deliver
   x_1 = v_1 √((−2 log r²)/r²),
   x_2 = v_2 √((−2 log r²)/r²).

Ahrens and Dieter (1988) describe fast polar methods for the Cauchy and exponential distributions in addition to the normal distribution.

The fastest algorithms for generating normal deviates use either a ratio-of-uniforms method or a mixture with acceptance/rejection. One of the best algorithms, called the rectangle/wedge/tail method, is described by Marsaglia, MacLaren, and Bray (1964). In that method, the normal density is decomposed into a mixture of densities with shapes as shown in Figure 5.1. It is easy to generate a variate from one of the rectangular densities, so the decomposition is done to give a high probability of being able to use a rectangular density. That, of course, means lots of rectangles, which brings some inefficiencies. The optimal decomposition must address those tradeoffs. The wedges are nearly linear densities (see Algorithm 4.7), so generating from them is relatively fast. The tail region takes the longest time, so the decomposition is such as to give a small probability to the tail. Ahrens and Dieter (1972) give an implementation of the rectangle/wedge/tail method that can be optimized at the bit level.

Kinderman and Ramage (1976) represent the normal density as a mixture and apply a variety of acceptance/rejection and table-lookup techniques for the components. The individual techniques for various regions have been developed by Marsaglia (1964), Marsaglia and Bray (1964), and Marsaglia, MacLaren, and Bray (1964).
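A minimal sketch of Algorithm 5.1 in C, assuming a U(0,1) generator unif(); each accepted pass delivers a pair of independent N(0, 1) deviates:

```c
#include <math.h>

extern double unif(void);   /* assumed U(0,1) generator */

void polar_normal(double *x1, double *x2)
{
    double v1, v2, r2;
    do {                               /* step 1 and the rejection loop */
        v1 = 2.0 * unif() - 1.0;       /* U(-1, 1) */
        v2 = 2.0 * unif() - 1.0;
        r2 = v1 * v1 + v2 * v2;
    } while (r2 >= 1.0 || r2 == 0.0);  /* step 2; r2 == 0 guards the log */

    double f = sqrt(-2.0 * log(r2) / r2);
    *x1 = v1 * f;                      /* deliver both deviates */
    *x2 = v2 * f;
}
```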
Figure 5.1: Rectangle/Wedge/Tail Decomposition

Marsaglia and Tsang (1984) also give a decomposition, resulting in what they call the "ziggurat method". Leva (1992a) describes a ratio-of-uniforms method with very tight bounding curves for generating normal deviates. (The 15-line Fortran program implementing Leva's method is Algorithm 712 of CALGO; see Leva, 1992b.) Given the current speed of the standard methods of evaluating the inverse normal CDF, the inverse CDF method is often useful, especially if order statistics are of interest. Even with the speed of the standard algorithms for the inverse normal CDF, specialized versions, possibly to a slightly lower accuracy, have been suggested, for example, by Marsaglia (1991) and Marsaglia, Zaman, and Marsaglia (1994). (The latter reference gives two algorithms for inverting the normal CDF: one very accurate, and one faster but slightly less accurate.)

Wallace (1996) describes an interesting method of generating normals from other normals rather than by making explicit transformations of uniforms. The method begins with a set of kp normal deviates generated by some standard method. The deviates are normalized so that their sum of squares is 1024. Let X be a k × p array containing those deviates, and let A_i be a k × k orthogonal matrix. New normal deviates are formed by multiplication of an orthogonal matrix and the columns of X. A random column from the array and a random method of moving from one column to another are chosen. In Wallace's implementation, k is 4 and p is 256. Four orthogonal 4 × 4 matrices are chosen to
make the matrix/vector multiplication fast:

[Display: the four orthogonal 4 × 4 matrices A_1, A_2, A_3, and A_4, each having every entry equal to +1/2 or −1/2.]
hence, the matrix multiplication is usually just the addition of two elements of the vector. After a random column of X is chosen (that is, a random integer between 1 and 256), a random odd number between 1 and 255 is chosen as a stride (that is, as an increment for the column number) to allow movement from one column to another. The first column chosen is multiplied by A_1, the next by A_2, the next by A_3, the next by A_4, and then the next by A_1, and so on. The elements of the vectors that result from these multiplications constitute both the normal deviates output in this pass and the elements of a new k × p array. Except for rounding errors, the elements in the new array should have a sum of squares of 1024 also. Just to avoid any problems from rounding, however, the last element generated is not delivered as a normal deviate but instead is used to generate a chi-squared deviate, y, with 1024 degrees of freedom via a Wilson–Hilferty approximation, and the 1023 other values are normalized by √(y/1024). (The Wilson–Hilferty approximation relates the chi-squared random variable Y with ν degrees of freedom to the standard normal random variable X by
$$X \approx \frac{\left(\frac{Y}{\nu}\right)^{1/3} - \left(1 - \frac{2}{9\nu}\right)}{\sqrt{\frac{2}{9\nu}}}.$$
The approximation is fairly good for ν > 30. See Abramowitz and Stegun, 1964.)
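As a sketch of the direction in which the approximation is used here, the following turns a standard normal deviate into an approximate chi-squared deviate by inverting the displayed relation; it is reasonable only for large ν (such as ν = 1024 in Wallace's method):

```c
#include <math.h>

/* Approximate chi-squared(nu) deviate from a standard normal deviate x
   via the Wilson-Hilferty relation: Y ~ nu * (1 - c + x*sqrt(c))^3
   with c = 2/(9 nu). */
double chisq_wilson_hilferty(double x, double nu)
{
    double c = 2.0 / (9.0 * nu);
    double t = 1.0 - c + x * sqrt(c);
    return nu * t * t * t;
}
```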
Truncated Normal Distribution

In Monte Carlo studies, the tail behavior is often of interest. Variates from the tail of a distribution can always be formed by selecting variates generated from the full distribution, of course, but this can be a very slow process. Marsaglia (1964), Geweke (1991a), Robert (1995), and Damien and Walker (2001) give methods for generating variates directly from a truncated normal distribution. The truncated normal with left truncation point τ has density
$$p(x) = \frac{e^{-(x-\mu)^2/(2\sigma^2)}}{\sqrt{2\pi}\,\sigma\left(1 - \Phi\!\left(\frac{\tau-\mu}{\sigma}\right)\right)} \quad \text{for } \tau \leq x \leq \infty,$$
where Φ(·) is the standard normal CDF.
The method of Robert uses an acceptance/rejection method with a translated exponential as the majorizing density, that is,
$$g(y) = \lambda^* e^{-\lambda^*(y-\tau)} \quad \text{for } \tau \leq y \leq \infty,$$
where
$$\lambda^* = \frac{\tau + \sqrt{\tau^2 + 4}}{2}. \quad (5.8)$$
(See the next section for methods to generate exponential random variates.)

The method of Damien and Walker uses conditional distributions. The range of the conditional uniform that yields the normal is taken as the intersection of the truncated range and the full conditional range (−√(−2 log Y), √(−2 log Y)) in the example on page 149.
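A minimal sketch of Robert's method for the standard normal truncated on the left at τ (µ = 0, σ = 1; for general parameters, apply the method at the standardized truncation point (τ − µ)/σ and transform back). The acceptance probability exp(−(y − λ*)²/2) follows from the ratio of the target density to the majorizing density g above. unif() is an assumed U(0,1) generator.

```c
#include <math.h>

extern double unif(void);   /* assumed U(0,1) generator */

double ltrunc_std_normal(double tau)
{
    double lam = 0.5 * (tau + sqrt(tau * tau + 4.0));  /* lambda*, eq. (5.8) */
    for (;;) {
        double y = tau - log(unif()) / lam;   /* translated exponential candidate */
        double d = y - lam;
        if (unif() <= exp(-0.5 * d * d))      /* accept/reject */
            return y;
    }
}
```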
Lognormal and Halfnormal Distributions

Two distributions closely related to the normal are the lognormal and the halfnormal. The lognormal is the distribution of a random variable whose logarithm has a normal distribution. A very good way to generate lognormal variates is just to generate normal variates and exponentiate. The halfnormal is the folded normal distribution. The best way to generate deviates from the halfnormal is just to take the absolute value of normal deviates.
5.2.2 Exponential, Double Exponential, and Exponential Power Distributions
The exponential distribution with parameter λ > 0 has the probability density
$$p(x) = \lambda e^{-\lambda x} \quad \text{for } 0 \leq x \leq \infty. \quad (5.9)$$
If Z has the standard exponential distribution (that is, with parameter equal to 1), and X = Z/λ, then X has the exponential distribution with parameter λ (called the "rate"). Because of this simple relationship, it is sufficient to develop methods to generate deviates from the standard exponential distribution. The exponential distribution is a special case of the gamma distribution, the density of which is given in equation (5.13), with parameters α = 1 and β = 1/λ.

The inverse CDF method is very easy to implement and is generally satisfactory for the exponential distribution. The method is to generate u from U(0, 1) and then take
$$x = -\frac{\log(u)}{\lambda}. \quad (5.10)$$
(This and similar computations are why we require that the simulated uniform not include its endpoints.)

Many other algorithms for generating exponential random numbers have been proposed over the years. Marsaglia, MacLaren, and Bray (1964) apply
the rectangle/wedge/tail method to the exponential distribution. Ahrens and Dieter (1972) give a method that can be highly optimized at the bit level. Ahrens and Dieter also provide a catalog of other methods for generating exponentials. These other algorithms seek greater speed by avoiding the computation of the logarithm. Many simple algorithms for random number generation involve evaluation of elementary functions. As we have indicated, evaluation of an elementary function at a random point can often be performed equivalently by acceptance/rejection, and Ahrens and Dieter (1988) describe a method for the exponential that does that. (See Hamilton, 1998, for some corrections to their algorithm.) As the software for evaluating elementary functions has become faster, the need to avoid their evaluation has decreased.

A common use of the exponential distribution is as the model of the interarrival times in a Poisson process. A (homogeneous) Poisson process, T_1 < T_2 < . . . , with rate parameter λ can be generated by taking the output of an exponential random number generator with parameter λ as the times t_1, t_2 − t_1, . . . . We consider nonhomogeneous Poisson processes in Section 6.5.2, page 225.

Truncated Exponential Distribution

The interarrival process is memoryless, and the tail of the exponential distribution has an exponential distribution; that is, if X has the density (5.9), and Y = X + τ, then Y has the density
$$\lambda e^{\lambda\tau} e^{-\lambda y} \quad \text{for } \tau \leq y \leq \infty.$$
This fact provides a very simple process for generating from the tail of an exponential distribution.
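A minimal sketch of the inverse CDF method (5.10) together with its use for the event times of a homogeneous Poisson process; unif() is an assumed U(0,1) generator that avoids the endpoints:

```c
#include <math.h>

extern double unif(void);   /* assumed U(0,1) generator */

double rexp(double lambda)
{
    return -log(unif()) / lambda;   /* equation (5.10) */
}

/* First n event times T_1 < T_2 < ... of a Poisson process with rate
   lambda, as cumulative sums of i.i.d. exponential interarrival times. */
void poisson_process(double lambda, int n, double t[])
{
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        s += rexp(lambda);
        t[i] = s;
    }
}
```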
Double Exponential Distribution

The double exponential distribution, also called the Laplace distribution, with parameter λ > 0 has the probability density
$$p(x) = \frac{\lambda}{2}\, e^{-\lambda|x|} \quad \text{for } -\infty \leq x \leq \infty. \quad (5.11)$$
The double exponential distribution is often used in Monte Carlo studies of robust procedures because it has a heavier tail than the normal distribution yet corresponds well with observed distributions. If Z has the standard exponential distribution and X = SZ/λ, where S is a random variable with probability mass 1/2 at −1 and at +1, then X has the double exponential distribution with parameter λ. This fact is the basis for the
method of generating double exponential variates: generate an exponential, and change the sign with probability 1/2. The method of bit stripping (see page 10) can be used to do this as long as the lower-order bits are the ones used and assuming that the basic uniform generator is a very good one.

Exponential Power Distribution

A generalization of the double exponential distribution is the exponential power distribution, having density
$$p(x) \propto e^{-\lambda|x|^{\alpha}} \quad \text{for } -\infty \leq x \leq \infty. \quad (5.12)$$
For α = 2, the exponential power distribution is the normal distribution. The members of this family with 1 ≤ α < 2 are often used to model distributions with slightly heavier tails than the normal distribution. Either the double exponential or the normal distribution, depending on the value of α, works well as a majorizing density to generate exponential power variates by acceptance/rejection (see Tadikamalla, 1980a).
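A minimal sketch of the double exponential generator described above, taking the random sign from a second uniform rather than by bit stripping so that the sketch does not depend on the bit structure of the underlying generator; unif() is an assumed U(0,1) source:

```c
#include <math.h>

extern double unif(void);   /* assumed U(0,1) generator */

double rlaplace(double lambda)
{
    double z = -log(unif());                  /* standard exponential */
    double s = (unif() < 0.5) ? -1.0 : 1.0;   /* random sign, probability 1/2 */
    return s * z / lambda;                    /* X = S Z / lambda */
}
```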
5.2.3 Gamma Distribution
The gamma distribution with parameters α > 0 and β > 0 has the probability density
$$p(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta} \quad \text{for } 0 \leq x \leq \infty, \quad (5.13)$$
where Γ(α) is the complete gamma function. The α parameter is called the shape parameter, and β is called the scale parameter. If the random variable Z has the standard gamma distribution with shape parameter α and scale parameter 1, and X = βZ, then X has a gamma distribution with parameters α and β. (Notice that the exponential is a gamma with α = 1 and β = 1/λ.)

Of the special distributions that we have considered thus far, this is the first one that has a parameter that cannot be handled by simple translations and scalings. Hence, the best algorithms for the gamma distribution may be different depending on the value of α and on how many deviates are to be generated for a given value of α.

Cheng and Feast (1979) use a ratio-of-uniforms method, as shown in Algorithm 5.2, for a gamma distribution with α > 1. The mean time of this algorithm is O(α^{1/2}), so for larger values of α it is less efficient. Cheng and Feast (1980) also gave an acceptance/rejection method that was better for large values of the shape parameter. Schmeiser and Lal (1980) use a composition of ten densities, some of the rectangle/wedge/tail type, followed by the acceptance/rejection method. The Schmeiser/Lal method is the algorithm used in the IMSL Libraries for values of the shape parameter greater than 1. The speed of the Schmeiser/Lal method does not depend on the value of the shape parameter. Sarkar (1996) gives a modification of the Schmeiser/Lal method that
has greater efficiency because of using more intervals, resulting in tighter majorizing and squeeze functions, and because of using an alias method to help speed the process.

Algorithm 5.2 The Cheng/Feast (1979) Algorithm for Generating Gamma Random Variates when the Shape Parameter Is Greater than 1

1. Generate u_1 and u_2 independently from U(0, 1), and set
$$v = \frac{\left(\alpha - \frac{1}{6\alpha}\right) u_1}{(\alpha - 1)\, u_2}.$$
2. If
$$\frac{2(u_2 - 1)}{\alpha - 1} + v + \frac{1}{v} \leq 2,$$
then deliver x = (α − 1)v;
otherwise, if
$$\frac{2 \log u_2}{\alpha - 1} - \log v + v \leq 1,$$
then deliver x = (α − 1)v.
3. Go to step 1.

An efficient algorithm for values of the shape parameter less than 1 is the acceptance/rejection method described in Ahrens and Dieter (1974) and modified by Best (1983), as shown in Algorithm 5.3. That method is the algorithm used in the IMSL Libraries for values of the shape parameter less than 1.

Algorithm 5.3 The Best/Ahrens/Dieter Algorithm for Generating Gamma Random Variates when the Shape Parameter Is Less than 1

0. Set t = 0.07 + 0.75√(1 − α) and
$$b = 1 + \frac{e^{-t}\,\alpha}{t}.$$
1. Generate u_1 and u_2 independently from U(0, 1), and set v = b u_1.
2. If v ≤ 1, then
   set x = t v^{1/α};
   if u_2 ≤ (2 − x)/(2 + x), then deliver x;
   otherwise, if u_2 ≤ e^{−x}, then deliver x;
otherwise,
   set x = −log(t(b − v)/α) and y = x/t;
   if u_2(α + y(1 − α)) ≤ 1, then deliver x;
   otherwise, if u_2 ≤ y^{α−1}, then deliver x.
3. Go to step 1.
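To show how these algorithm descriptions translate into code, here is a minimal sketch of Algorithm 5.2 above for the standard gamma with α > 1 (scale the result by β for the general distribution); the first test in step 2 acts as a cheap squeeze that usually avoids the logarithms in the second test. unif() is an assumed U(0,1) generator.

```c
#include <math.h>

extern double unif(void);   /* assumed U(0,1) generator */

double gamma_cheng_feast(double alpha)   /* requires alpha > 1 */
{
    double c1 = alpha - 1.0;
    double c2 = alpha - 1.0 / (6.0 * alpha);
    for (;;) {
        double u1 = unif(), u2 = unif();
        double v = c2 * u1 / (c1 * u2);              /* step 1 */
        if (2.0 * (u2 - 1.0) / c1 + v + 1.0 / v <= 2.0)
            return c1 * v;                           /* step 2, first test */
        if (2.0 * log(u2) / c1 - log(v) + v <= 1.0)
            return c1 * v;                           /* step 2, second test */
        /* otherwise, go back to step 1 */
    }
}
```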
There are two cases of the gamma distribution that are of particular interest. The shape parameter α often is a positive integer. In that case, the distribution is sometimes called the Erlang distribution. If Y_1, Y_2, . . . , Y_α are independently distributed as exponentials with parameter 1/β, then X = Σ Y_i has a gamma (Erlang) distribution with parameters α and β. Using the inverse CDF method (equation (5.10)) with the independent realizations u_1, u_2, . . . , u_α, we generate an Erlang deviate as
$$x = -\beta \log\left(\prod_{i=1}^{\alpha} u_i\right).$$
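A minimal sketch of this Erlang generator, suitable when α is a small positive integer; unif() is an assumed U(0,1) generator:

```c
#include <math.h>

extern double unif(void);   /* assumed U(0,1) generator */

double erlang(int alpha, double beta)
{
    double prod = 1.0;
    for (int i = 0; i < alpha; i++)   /* product of alpha uniforms */
        prod *= unif();
    return -beta * log(prod);         /* one log, one scaling */
}
```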
The general algorithms for gammas work better for the Erlang distribution if α is large. The other special case of the gamma is the chi-squared distribution, in which the scale parameter β is 2. Twice the shape parameter α is called the degrees of freedom. For large or nonintegral degrees of freedom, the general methods for generating gamma random deviates are best for generating chi-squared deviates; otherwise, special methods described below are used.

An important property of the gamma distribution is: If X and Y are independent gamma variates with common scale parameter β and shape parameters α1 and α2, then X + Y has a gamma distribution with scale parameter β and shape parameter α1 + α2. This fact may be used in developing schemes for generating gammas. For example, any gamma can be represented as the sum of an Erlang variate, which is the sum of exponential variates, and a gamma variate with shape parameter less than 1. This representation may effectively be used in a method of generating gamma variates (see Atkinson and Pearce, 1976).

Truncated Gamma Distribution

In some applications, especially ones involving censored observations, a truncated gamma is a useful model. In the case of left-censored data, we need to sample from the tail of a gamma distribution. The relevant distribution has the density

   p(x) = x^(α−1) e^(−x/β) / ((Γ(α) − Γ_{τ/β}(α)) β^α)  for τ ≤ x ≤ ∞,
where Γ_{τ/β}(·) is the incomplete gamma function (see page 321). Dagpunar (1978) describes a method of sampling from the tail of a gamma distribution. The method makes use of the fact mentioned above that an exponential distribution is memoryless. A truncated exponential is used as a majorizing density in an acceptance/rejection method. Dagpunar first determines the optimal value of the exponential scale parameter that will maximize the probability of acceptance. The value is the saddlepoint in the ratio of the truncated gamma density to the truncated exponential (both truncated at the same point, τ),

   λ = (τ − α + √((τ − α)^2 + 4τ)) / (2τ).

The procedure therefore is:

1. Generate y from the truncated exponential and u independently as U(0, 1). (y can be generated by generating u1 as U(0, 1) and taking y = −(log u1)/λ + τ.)

2. If (1 − λ)y − (α − 1)(1 + log y + log((1 − λ)/(α − 1))) ≤ −log u, then deliver y.
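A minimal R sketch of Dagpunar's procedure, assuming the standard scale (β = 1) and a shape parameter α > 1, with τ the truncation point (the function name is ours):

rgamma_tail <- function(alpha, tau) {
  # rejection from a truncated exponential majorizing density
  lambda <- (tau - alpha + sqrt((tau - alpha)^2 + 4*tau)) / (2*tau)
  repeat {
    y <- -log(runif(1))/lambda + tau     # truncated exponential deviate
    u <- runif(1)
    if ((1 - lambda)*y - (alpha - 1)*(1 + log(y) +
          log((1 - lambda)/(alpha - 1))) <= -log(u)) return(y)
  }
}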
Many common applications require truncation on the right; that is, the observations are right censored. Philippe (1997) describes a method for generating variates from a right-truncated gamma distribution, which has density

   p(x) = x^(α−1) e^(−x/β) / (Γ_{τ/β}(α) β^α)  for 0 ≤ x ≤ τ,
where Γ_{τ/β}(α) is the incomplete gamma function. Philippe shows that if the random variable X has this distribution, then it can be represented as an infinite mixture of beta random variables:

   X = Σ_{k=1}^{∞} [Γ(α) / (Γ(α + k) Γ_{1/β}(α) β^(α+k−1) e^(1/β))] Yk,
where Yk is a random variable with a beta distribution with parameters α and k. Philippe suggested as a majorizing density a finite series

   gm(y) ∝ Σ_{k=1}^{m} [1/(β^(k−1) Γ(α) Γ(k))] y^(α−1) (1 − y)^(k−1)

or, normalized,

   gm(y) = (1 / Σ_{i=1}^{m} [β^(i−1) Γ(α+i)]^(−1)) Σ_{k=1}^{m} [1/(β^(k−1) Γ(α+k))] hk(y),
where hk is a beta density (equation (5.14)) with parameters α and k. Thus, to generate a variate from a distribution with density gm, we select one of the component betas with probability equal to its weight and then use a method described in the next section for generating a beta variate. The number of terms depends on the probability of acceptance. Obviously, we want a high probability of acceptance,
but this requires a large number of terms in the series. For a probability of acceptance of at least p* (with p* < 1, obviously), Philippe shows that the number of terms required in the series is approximately

   m* = (1/4) (zp + √(zp^2 + 4/β))^2,

where zp = Φ^(−1)(p) and Φ is the standard normal CDF.

Algorithm 5.4 The Philippe (1997) Algorithm for Generating Gamma Random Variates Truncated on the Right at τ

0. Determine m*, and initialize quantities in gm∗.

1. Generate y from the distribution with density gm∗.

2. Generate u from U(0, 1).

3. If
   u ≤ e^(−y/β) Σ_{k=1}^{m∗} [1/(β^(k−1) Γ(k))] / Σ_{k=1}^{m∗} [(1 − y)^(k−1)/(β^(k−1) Γ(k))],
then take y as the desired realization; otherwise, return to step 1.

Philippe (1997) also describes methods for a left-truncated gamma distribution, including special algorithms for the case where the truncation point is an integer. The interested reader is referred to the paper for the details.

Damien and Walker (2001) also give a method for generating variates directly from a truncated gamma distribution. Their method uses conditional distributions, as we discuss on page 149. The range of the conditional uniform that yields the gamma is taken as the intersection of the truncated range and the full conditional range.

Generalized Gamma Distributions

There are a number of generalizations of the gamma distribution. The generalizations provide more flexibility in modeling because they have more parameters. Stacy (1962) defined a generalization that has two shape parameters. It is especially useful in failure-time models. The distribution has density

   p(x) = (|γ| / (Γ(α) β^(αγ))) x^(αγ−1) e^(−(x/β)^γ)  for 0 ≤ x ≤ ∞.
This distribution includes as special cases the ordinary gamma (with γ = 1), the halfnormal distribution (with α = 1/2 and γ = 2), and the Weibull (with
α = 1). The best way to generate a generalized gamma deviate is to use the best method for the corresponding ordinary gamma and then exponentiate; that is, if W has the standard gamma distribution with shape parameter α, then X = β W^(1/γ) has the density above.

Everitt (1998) describes a generalized gamma distribution, which he calls the “Creedy and Martin generalized gamma”, with density

   p(x) = θ0 x^(θ1) e^(θ2 x + θ3 x^2 + θ4 x^3)  for 0 ≤ x ≤ ∞.
This density can of course be scaled with a β, as in the other gamma distributions that we have discussed.

Ghitany (1998) and Agarwal and Al-Saleh (2001) have described a generalized gamma distribution based on a generalized gamma function,

   Γ(α, ν, λ) = ∫_0^∞ t^(α−1) (t + ν)^(−λ) e^(−t) dt,

introduced by Kobayashi (1991). The distribution has density

   p(x) = (1/(Γ(α, ν, λ) β^(α−λ))) x^(α−1) (x + βν)^(−λ) e^(−x/β)  for 0 ≤ x ≤ ∞.
This distribution is useful in reliability studies because of the shapes of the hazard function that are possible for various values of the parameters. It is the same as the ordinary gamma for λ = 0.

D-Distributions

A class of distributions, called D-distributions, that arise in extended gamma processes is studied by Laud, Ramgopal, and Smith (1993). The interested reader is referred to that paper for the details.
5.2.4 Beta Distribution
The beta distribution with parameters α > 0 and β > 0 has the probability density

   p(x) = (1/B(α, β)) x^(α−1) (1 − x)^(β−1)  for 0 ≤ x ≤ 1,   (5.14)

where B(α, β) is the complete beta function.

Efficient methods for generating beta variates require different algorithms for different values of the parameters. If either parameter is equal to 1, it is very simple to generate beta variates using the inverse CDF method, which in this case would just be a root of a uniform. If both values of the parameters are less than 1, the simple acceptance/rejection method of Jöhnk (1964), given as Algorithm 5.5, is one of the best. If one parameter is less than 1 and the other is greater than 1, the method of Atkinson (1979) is useful. If both parameters are greater than 1, the method of Schmeiser and Babu (1980) is very efficient, except that it requires a lot of setup time. For the case of both parameters
greater than 1, Cheng (1978) gives an algorithm that requires very little setup time. The IMSL Libraries use all five of these methods, depending on the values of the parameters and how many deviates are to be generated for a given setting of the parameters.

Algorithm 5.5 Jöhnk’s Algorithm for Generating Beta Random Variates when Both Parameters are Less than 1

1. Generate u1 and u2 independently from U(0, 1), and set v1 = u1^(1/α) and v2 = u2^(1/β).

2. Set w = v1 + v2.

3. If w > 1, then go to step 1.

4. Set x = v1/w, and deliver x.
5.2.5 Chi-Squared, Student’s t, and F Distributions
The chi-squared, Student’s t, and F distributions all are derived from the normal distribution. Variates from these distributions could, of course, be generated by transforming normal deviates. In the case of the chi-squared distribution, however, this would require generating and squaring several normals for each chi-squared deviate. A more direct algorithm is much more efficient. Even in the case of the t and F distributions, which would require only a couple of normals or chi-squared deviates, there are better algorithms.

Chi-Squared Distribution

The chi-squared distribution, as we have mentioned above, is a special case of the gamma distribution (see equation (5.13)) in which the scale parameter, β, is 2. Twice the shape parameter, 2α, is called the degrees of freedom and is often denoted by ν. If ν is large or is not an integer, the general methods for generating gamma random deviates described above are best for generating chi-squared deviates. If the degrees of freedom value is a small integer, the chi-squared deviates can be generated by taking a logarithm of the product of some independent uniforms. If ν is an even integer, the chi-squared deviate r is produced from ν/2 independent uniforms, ui, by

   r = −2 log(∏_{i=1}^{ν/2} ui).
If ν is an odd integer, this method can be used with the product going to (ν − 1)/2, and then the square of an independent normal deviate is added to produce r.
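A sketch in R of this method for small integer degrees of freedom (the function name is ours; R's rchisq would be used in practice):

rchisq_int <- function(nu) {
  # product of uniforms covers the even part of the degrees of freedom
  k <- nu %/% 2
  r <- if (k > 0) -2*log(prod(runif(k))) else 0
  if (nu %% 2 == 1) r <- r + rnorm(1)^2   # odd nu: add a squared normal
  r
}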
The square root of the chi-squared random variable is sometimes called a chi random variable. Although, clearly, a chi random variable could be generated as the square root of a chi-squared deviate generated as above, there are more efficient direct ways of generating a chi deviate; see Monahan (1987).

Student’s t Distribution

The standard t distribution with ν degrees of freedom has density
   p(x) = (Γ((ν+1)/2) / (Γ(ν/2) √(νπ))) (1 + x^2/ν)^(−(ν+1)/2)  for −∞ ≤ x ≤ ∞.   (5.15)
The degrees of freedom, ν, does not have to be an integer, but it must be positive.

A standard normal random variable divided by the square root of an independent chi-squared random variable with ν degrees of freedom that has itself been divided by ν (that is, Z/√(W/ν)) is a t random variable with ν degrees of freedom. Also, the square root of an F random variable with 1 and ν degrees of freedom is a t random variable with ν degrees of freedom. These relations could be used to generate t deviates, but neither yields an efficient method.

Kinderman and Monahan (1980) describe a ratio-of-uniforms method for the t distribution. The algorithm is rather complicated, but it is very efficient. Marsaglia (1980) gives a simpler procedure that is almost as fast. Either is almost twice as fast as generating a normal deviate and a chi-squared deviate and dividing by the square root of the chi-squared one. Marsaglia (1984) also gives a very fast algorithm for generating t variates that is based on a transformed acceptance/rejection method that he called exact-approximation (see Section 4.5).

Bailey (1994) gives the polar method shown in Algorithm 5.6 for the Student’s t distribution. It is similar to the polar method for normal variates given in Algorithm 5.1.

Algorithm 5.6 A Rejection Polar Method for t Variates with ν Degrees of Freedom

1. Generate v1 and v2 independently from U(−1, 1), and set r^2 = v1^2 + v2^2.

2. If r^2 ≥ 1, then go to step 1;
   otherwise, deliver

   x = v1 √(ν(r^(−4/ν) − 1) / r^2).
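A direct R transcription of Algorithm 5.6 (the function name is ours):

rt_polar <- function(nu) {
  # Bailey (1994) rejection polar method for Student's t
  repeat {
    v1 <- runif(1, -1, 1)
    v2 <- runif(1, -1, 1)
    r2 <- v1^2 + v2^2                  # this is r^2 in the algorithm
    if (r2 < 1) return(v1*sqrt(nu*(r2^(-2/nu) - 1)/r2))
  }
}

Note that r^(−4/ν) = (r^2)^(−2/ν), so only r^2 is needed.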
The jagged shape of the frequency curve of normals generated via a polar method based on a poor uniform generator that was observed by Neave (1973) and by Golder and Settle (1976) may also occur for t variates generated by this polar method. It is important to use a good uniform generator for whatever distribution is to be simulated.
In Bayesian analysis, it is sometimes necessary to generate random variates for the degrees of freedom in a t distribution conditional on the data. In the hierarchical model underlying the analysis, the t random variable is interpreted as a mixture of normal random variables divided by square roots of gamma random variables. For given realizations of gammas, λ1, λ2, . . . , λn, the density of the degrees of freedom ν is

   p(ν) ∝ ∏_{i=1}^{n} (ν^(ν/2) / (2^(ν/2) Γ(ν/2))) λi^(ν/2) e^(−νλi/2).
Mendoza-Blanco and Tu (1997) show that three different gamma distributions can be used to approximate this density very well for three different ranges of values of λg e^(−λa), where λg is the geometric mean of the λi and λa is the arithmetic mean. Although the approximations are very good, the gamma approximations could also be used as majorizing densities.

F Distribution

A variate from the F distribution can be generated as the ratio of two chi-squared deviates, which, of course, would be only half as fast as generating a chi-squared deviate. A better way to generate an F variate is as a transformed beta. If X is distributed as a beta with parameters ν1/2 and ν2/2, and

   Y = (ν2 X) / (ν1 (1 − X)),
then Y has an F distribution with ν1 and ν2 degrees of freedom. Generating a beta deviate and transforming it usually takes only slightly longer than generating a single chi-squared deviate.
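In R, for example, the transformation is simply (the function name is ours):

rf_beta <- function(nu1, nu2) {
  x <- rbeta(1, nu1/2, nu2/2)
  (nu2*x) / (nu1*(1 - x))
}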
5.2.6 Weibull Distribution
The Weibull distribution with parameters α > 0 and β > 0 has the probability density

   p(x) = (α/β) x^(α−1) e^(−x^α/β)  for 0 ≤ x ≤ ∞.   (5.16)

The simple inverse CDF method applied to the standard Weibull distribution (i.e., β = 1) is quite efficient. The expression is simply

   (−log u)^(1/α).

Of course, an acceptance/rejection method could be used to replace the evaluation of the logarithm in the inverse CDF. The standard Weibull deviates are then scaled by β^(1/α).
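As an R illustration (our function name; note that the parameterization in equation (5.16) differs from that of R's built-in rweibull, whose scale argument corresponds to β^(1/α) here):

rweibull_icdf <- function(n, alpha, beta) {
  # inverse CDF for the density in equation (5.16)
  (-beta * log(runif(n)))^(1/alpha)
}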
5.2.7 Binomial Distribution
The probability function for the binomial distribution with parameters n and π is

   p(x) = (n! / (x!(n − x)!)) π^x (1 − π)^(n−x)  for x = 0, 1, . . . , n,   (5.17)

where n is a positive integer and 0 < π < 1. To generate a binomial, a simple way is to sum Bernoullis (equation (4.4), and Algorithm 4.1, page 105), which is equivalent to an inverse CDF technique. If n, the number of independent Bernoullis, is small, this method is adequate. The time required for this kind of algorithm is obviously O(n). For larger values of n, the median of a random sample of size n from a Bernoulli distribution can be generated (it has an approximate beta distribution; see Relles, 1972), and then the inverse CDF method can be applied from that point. Starting at the median allows the time required to be halved. Kemp (1986) shows that starting at the mode results in an even faster method and gives a method to approximate the modal probability quickly. If this idea is applied recursively, the time becomes O(log n).

The time required for any method based on the CDF of the binomial is an increasing function of n. Several methods whose efficiencies are not so dependent on n are available, and for large values of n they are to be preferred to methods based on the CDF. (The value of π also affects the speed; the inverse CDF methods are generally competitive as long as nπ < 500.) Stadlober (1991) described an algorithm based on a ratio-of-uniforms method. Kachitvichyanukul (1982) gives an efficient method using acceptance/rejection over a composition of four regions (see Schmeiser, 1983; and Kachitvichyanukul and Schmeiser, 1988a, 1990). This is the method used in the IMSL Libraries.
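For small n, the sum-of-Bernoullis method is a one-liner in R (n and p, the Bernoulli probability π, are assumed already defined):

# sum of n Bernoullis; adequate for small n, with O(n) time
x <- sum(runif(n) < p)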
Beta-Binomial Distribution

The beta-binomial distribution is the mixture distribution that is a binomial for which the parameter π is a realization of a random variable that has a beta distribution. This distribution is useful for modeling overdispersion or “extravariation” in applications where there are clusters of separate binomial distributions. The probability function for the beta-binomial distribution with parameters n, which is a positive integer, α > 0, and β > 0 is

   p(x) = (n! / (x!(n − x)! B(α, β))) ∫_0^1 π^(α−1+x) (1 − π)^(n+β−1−x) dπ  for x = 0, 1, . . . , n,   (5.18)

where B(α, β) is the complete beta function. (The integral in this expression is B(α + x, n + β − x).) The mean of the beta-binomial is in the form of the binomial mean, nπ, with the beta mean, α/(α + β), in place of π, but the variance is

   nαβ(n + α + β) / ((α + β)^2 (1 + α + β)).

A simple way of generating deviates from a beta-binomial distribution is first to generate the parameter π as the appropriate beta and then to generate the binomial (see Ahn and Chen, 1995). In this case, an inverse CDF method for the binomial may be more efficient because it does not require as much setup time as the generally more efficient ratio-of-uniforms or acceptance/rejection methods referred to above.
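In R, this two-stage mixture is immediate (the function name is ours):

rbetabinom <- function(ndraws, n, alpha, beta) {
  # generate pi from the beta, then the binomial conditionally
  rbinom(ndraws, n, rbeta(ndraws, alpha, beta))
}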
5.2.8 Poisson Distribution
The probability function for the Poisson distribution with parameter θ > 0 is

   p(x) = e^(−θ) θ^x / x!  for x = 0, 1, 2, . . . .   (5.19)
A Poisson with a small mean, θ, can be generated efficiently by the inverse CDF technique. Kemp and Kemp (1991) describe a method that begins at the mode of the distribution and proceeds in the appropriate direction to identify the inverse. They give a method for identifying the mode and computing the modal probability. Many of the other methods that have been suggested for the Poisson also require longer times for distributions with larger means. Ahrens and Dieter (1980) and Schmeiser and Kachitvichyanukul give efficient methods having times that do not depend on the mean (see Schmeiser, 1983). The method of Schmeiser and Kachitvichyanukul uses acceptance/rejection over a composition of four regions. This is the method used in the IMSL Libraries.
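The inverse CDF technique for a Poisson with a small mean can be sketched in R as follows (the function name is ours; this is the generic chop-down search, not the modal-start refinement of Kemp and Kemp):

rpois_icdf <- function(theta) {
  x <- 0
  p <- exp(-theta)      # P(X = 0)
  u <- runif(1)
  while (u > p) {       # chop-down search through the probabilities
    u <- u - p
    x <- x + 1
    p <- p*theta/x      # recursion: p(x)/p(x-1) = theta/x
  }
  x
}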
5.2.9 Negative Binomial and Geometric Distributions
The probability function for the negative binomial is

   p(x) = C(x + r − 1, r − 1) π^r (1 − π)^x  for x = 0, 1, 2, . . . ,   (5.20)

where C(n, k) denotes the binomial coefficient n!/(k!(n − k)!), r > 0, and 0 < π < 1. If r is an integer, the negative binomial distribution is sometimes called the Pascal distribution. If π is the probability of a success in a single Bernoulli trial, the random variable can be thought of as the number of failures before r successes are obtained.

If the mean, r(1 − π)/π, is relatively small and π^r is not too small, the inverse CDF method works well. Otherwise, a gamma deviate with shape parameter r and scale parameter (1 − π)/π can be generated and used as the mean of a Poisson. The Poisson variate is then delivered as the negative binomial variate.
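A sketch of this gamma-Poisson mixture in R (the function name is ours; the gamma scale (1 − π)/π is what yields the probability function in equation (5.20) under the scale parameterization used in this chapter):

rnegbin_mix <- function(n, r, p) {
  # p is the Bernoulli success probability pi
  rpois(n, rgamma(n, shape = r, scale = (1 - p)/p))
}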
The geometric distribution is a special case of the negative binomial with r = 1. The probability function is

   p(x) = π(1 − π)^x  for x = 0, 1, 2, . . . .   (5.21)

The integer part of an exponential random variable with parameter λ = −log(1 − π) has a geometric distribution with parameter π; hence, the simplest, and also one of the best, methods for the geometric distribution with parameter π is to generate a uniform deviate u and take

   ⌊log u / log(1 − π)⌋.

It is common to see the negative binomial and the geometric distributions defined as starting at 1 instead of 0, as above. The distributions are the same after making an adjustment of subtracting 1.
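In R, for instance (p denoting π):

u <- runif(n)
x <- floor(log(u) / log(1 - p))   # geometric starting at 0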
5.2.10 Hypergeometric Distribution
The probability function for the hypergeometric distribution is

   p(x) = C(M, x) C(L − M, N − x) / C(L, N)   (5.22)

for x = max(0, N − L + M), . . . , min(N, M). The usual method of developing the hypergeometric distribution is with a finite sampling model: N items are to be sampled, independently with equal probability and without replacement, from a lot of L items of which M are special; the random variable X is the number of special items in the random sample.

A good method for generating from the hypergeometric distribution is the inverse CDF method. The inverse CDF can be evaluated recursively using the simple expression for the ratio p(x + 1)/p(x), so the build-up search of Algorithm 4.4 or the chop-down search of Algorithm 4.5 could be used. In either case, beginning at the mean, MN/L, can speed up the search. Another simple method that is good is straightforward use of the finite sampling model that defines the distribution. Kachitvichyanukul and Schmeiser (1985) give an algorithm based on acceptance/rejection of a probability function decomposed as a mixture, and Stadlober (1990) describes an algorithm based on a ratio-of-uniforms method. Both of these can be faster than the inverse CDF for larger values of N and M.
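The finite sampling model is especially transparent in R, where sample draws without replacement by default (the function name is ours; R's built-in rhyper implements a faster method):

rhyper_fsm <- function(L, M, N) {
  # lot of L items, M of them special; count special items in N draws
  sum(sample(rep(c(1L, 0L), c(M, L - M)), N))
}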
Kachitvichyanukul and Schmeiser (1988b) give a Fortran program for sampling from the hypergeometric distribution. The program uses either the inverse CDF or the acceptance/rejection method depending on the mode,

   m = ⌊(N + 1)(M + 1) / (L + 2)⌋.

If m − max(0, N + M − L) < 10, then the inverse CDF method is used; otherwise, the composition/acceptance/rejection method is used.

Extended Hypergeometric Distribution

A related distribution, called the extended hypergeometric distribution, can be developed by assuming that X and Y = N − X are binomial random variables with parameters M and πX and L − M and πY, respectively. Let ρ be the odds ratio,

   ρ = πX(1 − πY) / (πY(1 − πX));

then, the conditional distribution of X given X + Y = N has probability mass function

   p(x | x + y = N) = C(M, x) C(L − M, N − x) ρ^x / Σ_{j=a}^{b} C(M, j) C(L − M, N − j) ρ^j   (5.23)
for x = a, . . . , b, where a = max(0, N − L + M) and b = min(N, M). This function can also be evaluated recursively and random numbers generated by the inverse CDF method, similarly to the hypergeometric distribution. Liao and Rosen (2001) describe methods for evaluating the probability mass functions and also for computing the mode of the distribution in order to speed up the evaluation of the inverse CDF.

Another generalization, called the noncentral hypergeometric distribution, is developed by allowing different probabilities of selecting the two types of items. If the relative probability of selecting an item of the special type to that of selecting an item of the other type (given an equal number of each type) is ω, the realization of X can be built sequentially by Bernoulli realizations with probability Mk/(Mk + ω(Lk − Mk)), where Mk is the number of special items remaining and Lk is the total number of items remaining. Variates from this distribution can be generated by the finite sampling model underlying the distribution.
5.2.11 Logarithmic Distribution
The probability function for the logarithmic distribution with parameter θ is

   p(x) = −θ^x / (x log(1 − θ))  for x = 1, 2, 3, . . . ,   (5.24)

where 0 < θ < 1.

Kemp (1981) describes a method for generating efficiently from the inverse logarithmic CDF either using a chop-down approach (see page 108) to move
rapidly down the set of CDF values or using a mixture in which highly likely values are given priority.
5.2.12 Other Specific Univariate Distributions
Many other interesting distributions have simple relationships to the standard distributions discussed above. When that is the case, because there are highly optimized methods for the standard distributions, it is often best just to use a very good method for the standard distribution and then apply the appropriate transformation. For some distributions, the inverse CDF method is almost as good as more complicated methods.

Cauchy Distribution

Variates from the Cauchy or Lorentzian distribution, which has density

   p(x) = 1 / (πa (1 + ((x − b)/a)^2))  for −∞ ≤ x ≤ ∞,   (5.25)
can be generated easily by the inverse CDF method. For the standard Cauchy distribution (that is, with a = 1 and b = 0), given u from U(0, 1), a Cauchy deviate is tan(πu). The tangent function in the inverse CDF could be evaluated by acceptance/rejection in the manner mentioned on page 121, but if the inverse CDF is to be used, it is probably better just to use a good numerical function to evaluate the tangent. Kronmal and Peterson (1981) express the Cauchy distribution as a mixture and give an acceptance/complement method that is very fast.
Rayleigh Distribution

For the Rayleigh distribution with density

   p(x) = (x/σ^2) e^(−x^2/(2σ^2))  for 0 ≤ x ≤ ∞   (5.26)
(which is a Weibull distribution with parameters α = 2 and β = 2σ^2), variates can be generated by the inverse CDF method as

   x = σ √(−2 log u).

Faster acceptance/rejection methods can be constructed, but if the computing system has fast functions for exponentiation and taking logarithms, the inverse CDF is adequate.

Pareto Distribution

For the Pareto distribution with density

   p(x) = a b^a / x^(a+1)  for b ≤ x ≤ ∞,   (5.27)
variates can be generated by the inverse CDF method as

   x = b / u^(1/a).
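The three inverse CDF recipes just given are each a single line in R (u is a vector of uniforms; a, b, and sigma are the parameters above):

u <- runif(n)
xcauchy   <- b + a*tan(pi*u)          # Cauchy(a, b)
xrayleigh <- sigma*sqrt(-2*log(u))    # Rayleigh(sigma)
xpareto   <- b / u^(1/a)              # Pareto(a, b)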
There are many variations of the continuous Pareto distribution, the simplest of which is the one defined above. In addition, there are some discrete versions, including various zeta and Zipf distributions. (See Arnold, 1983, for an extensive discussion of the variations.) Variates from these distributions can usually be generated by discretizing some form of a Pareto distribution. Dagpunar (1988) describes such a method for a zeta distribution, in which the Pareto variates are first generated by an acceptance/rejection method.

Zipf Distribution

The standard Zipf distribution assigns probabilities to the positive integers x proportional to x^(−α), for α > 1. The probability function is

   p(x) = 1 / (ζ(α) x^α)  for x = 1, 2, 3, . . . ,   (5.28)
where ζ(α) = Σ_{x=1}^{∞} x^(−α) (the Riemann zeta function). Variates from the simple Zipf distribution can be generated efficiently by a direct acceptance/rejection method given by Devroye (1986a). In this method, first two variates u1 and u2 are generated from U(0, 1), and then x and t are defined as

   x = ⌊u1^(−1/(α−1))⌋  and  t = (1 + 1/x)^(α−1).

The variate x is accepted if

   x ≤ (t/(t − 1)) ((2^(α−1) − 1)/(2^(α−1) u2)).
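An R sketch of Devroye's method (the function name is ours; α > 1):

rzipf <- function(alpha) {
  b <- 2^(alpha - 1)
  repeat {
    u1 <- runif(1)
    u2 <- runif(1)
    x <- floor(u1^(-1/(alpha - 1)))
    t <- (1 + 1/x)^(alpha - 1)
    if (x <= (t/(t - 1)) * ((b - 1)/(b*u2))) return(x)
  }
}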
Von Mises Distribution

Variates from the von Mises distribution with density

   p(x) = (1/(2π I0(c))) e^(c cos(x))  for −π ≤ x ≤ π,   (5.29)
as discussed on page 140, can be generated by the acceptance/rejection method. Best and Fisher (1979) use a transformed folded Cauchy distribution as the majorizing distribution. The majorizing density is

   g(y) = (1 − ρ^2) / (π(1 + ρ^2 − 2ρ cos y))  for 0 ≤ y ≤ π,
where ρ is chosen in [0, 1) to optimize the probability of acceptance for a given value of the von Mises parameter, c. (Simple plots of g(·) with different values of ρ compared to a plot of p(·) with a given value of c visually lead to a relatively good choice of ρ.) This is the method used in the IMSL Libraries. Dagpunar (1990) gives an acceptance/rejection method for the von Mises distribution that is often more efficient.

Inverse Gaussian Distribution

The inverse Gaussian distribution is widely used in reliability studies. The density, for location parameter µ > 0 and scale parameter λ > 0, is

   p(x) = (λ/(2π))^(1/2) x^(−3/2) exp(−λ(x − µ)^2 / (2µ^2 x))  for 0 ≤ x ≤ ∞.   (5.30)
The inverse Gaussian distribution with µ = 1 is called the Wald distribution. It is the distribution of the first passage time in a standard Brownian motion with positive drift. Michael, Schucany, and Haas (1976) and Atkinson (1982) discussed methods for simulating inverse Gaussian random deviates. The method of Michael et al., given as Algorithm 5.7, is particularly straightforward but efficient.

Algorithm 5.7 Michael/Schucany/Haas Method for Generating Inverse Gaussians

1. Generate v from N(0, 1), and set y = v^2.

2. Set x1 = µ + µ^2 y/(2λ) − (µ/(2λ)) √(4µλy + µ^2 y^2).

3. Generate u from U(0, 1).

4. If u ≤ µ/(µ + x1), then deliver x = x1;
   otherwise, deliver x = µ^2/x1.
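Algorithm 5.7 transcribes directly into R (the function name is ours):

rinvgauss <- function(mu, lambda) {
  y <- rnorm(1)^2
  x1 <- mu + mu^2*y/(2*lambda) -
        (mu/(2*lambda))*sqrt(4*mu*lambda*y + mu^2*y^2)
  if (runif(1) <= mu/(mu + x1)) x1 else mu^2/x1
}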
The generalized inverse Gaussian distribution has an additional parameter that is the exponent of x in the density (5.30), which allows for a wider range of shapes. Barndorff-Nielsen and Shephard (2001) discuss the generalized inverse Gaussian distribution, and describe a method for generating random numbers from it.
5.2.13 General Families of Univariate Distributions
In simulation applications, one of the first questions is what is the distribution of the random variables that model observational data. Some distributions, such as Poisson or hypergeometric distributions, are sometimes obvious from first principles.
The IMSL routine rnsph uses these methods for three or four dimensions and uses scaled normals for higher dimensions. Banerjia and Dwyer (1993) consider the related problem of generating random points in a ball, which would be equivalent to generating random points on a sphere and then scaling the radius by the dth root of a uniform deviate. They describe a divide-and-conquer algorithm that can be used in any dimension and that is faster than scaling normals or scaling points from Marsaglia's method, assuming that the speed of the underlying uniform generator is high relative to the cost of square root computations.
5.3.5 Two-Way Tables
Boyett (1979) and Patefield (1981) consider the problem of generating random entries in a two-way table subject to given marginal row and column totals. The distribution is uniform over the integers that yield the given totals. Boyett derives the joint distribution for the cell counts and then develops the conditional distribution for a given cell, given the counts in all cells in previous rows and all cells in previous columns of the current given row. Patefield then uses the conditional expected value of a cell count to generate a random entry for each cell in turn. Let aij for i = 1, 2, . . . , r and j = 1, 2, . . . , c be the cell count, and use the “dot notation” for summation: a•j is the sum of the counts in the j th column,
for example, and a•• is the grand total. Let

   rl = al• − Σ_{j<m} alj,   cm = a•m − Σ_{i<l} aim,   and   T = Σ_{j=m}^{c} (a•j − Σ_{i<l} aij),

so that rl is the count that remains to be allocated in row l, cm is the count that remains in column m, and T is the total count that remains in columns m through c after the first l − 1 rows have been filled. The conditional probability that the count in the (l, m)th cell is alm, given the counts aij for 1 ≤ i < l and 1 ≤ j ≤ c and for i = l and 1 ≤ j < m, is the hypergeometric probability

   C(cm, alm) C(T − cm, rl − alm) / C(T, rl).
For each cell, a random uniform is generated, and the discrete inverse CDF method is used. Sequential evaluation of this expression is fairly simple, so the probability accumulation proceeds rapidly. The full expression is evaluated only once for each cell. Patefield (1981) also speeds up the process by beginning at the conditional expected value of each cell rather than accumulating the CDF from 0. The conditional expected value of the random count in the (l, m)th cell, Alm, given the aij for 1 ≤ i < l and 1 ≤ j ≤ c and for i = l and 1 ≤ j < m, is

   E(Alm | aij) = (al• − Σ_{j=1}^{m−1} alj)(a•m − Σ_{i=1}^{l−1} aim) / Σ_{j=m}^{c} (a•j − Σ_{i=1}^{l−1} aij)

unless the denominator is 0, in which case E(Alm | aij) is zero. Patefield (1981) gives a Fortran program implementing the method described. This is the method used in the IMSL routine rntab.
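In R, Patefield's algorithm is available directly as the function r2dtable in the base distribution, so generating such tables requires only a call like the following (the margins here are arbitrary examples):

# five random 3 x 4 tables with the given row and column totals
tabs <- r2dtable(5, r = c(10, 20, 30), c = c(15, 15, 15, 15))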
5.3.6 Other Specific Multivariate Distributions
Only a few of the standard univariate distributions have standard multivariate extensions. Various applications often lead to different extensions; see Kotz, Balakrishnan, and Johnson (2000). If the density of a multivariate distribution exists and can be specified, it is usually possible to generate variates from the distribution using an acceptance/rejection method. The majorizing density can often be just the product density; that is, a multivariate density with components that are the independent univariate variables, as in the example of the bivariate gamma on page 123.

Multivariate Bernoulli Variates and the Multivariate Binomial Distribution

A multivariate Bernoulli distribution of correlated binary random variables has applications in modeling system reliability, clinical trials with repeated measures, and genetic transmission of disease. For the multivariate Bernoulli distribution with marginal probabilities π1, π2, . . . , πd and pairwise correlations ρij, Emrich and Piedmonte (1991) propose identifying a multivariate normal
distribution with similar pairwise correlations. The normal is determined by solving for normal pairwise correlations, rij, in a system of d(d − 1)/2 equations involving the bivariate standard normal CDF, Φ2, evaluated at percentiles zπ corresponding to the Bernoulli probabilities:

   Φ2(zπi, zπj; rij) = ρij √(πi(1 − πi)πj(1 − πj)) + πiπj.   (5.37)

Once these pairwise correlations are determined, a multivariate normal y is generated and transformed to a Bernoulli, x, by the rule

   xi = 1 if yi ≤ zπi,
   xi = 0 otherwise.

Sums of multivariate Bernoulli random variables are multivariate binomial random variables. Phenomena modeled by binomial distributions, within clusters, often exhibit greater or less intracluster variation than independent binomial distributions would indicate. This behavior is called “overdispersion” or “underdispersion”. Overdispersion can be simulated by the beta-binomial distribution discussed earlier. A beta-binomial cannot model underdispersion, but the method of Emrich and Piedmonte (1991) to induce correlations in the Bernoulli variates can be used to model either overdispersion or underdispersion. Ahn and Chen (1995) discuss this method and compare it with the use of a beta-binomial in the case of overdispersion. They also compared the output of the simulation models for both underdispersed and overdispersed binomials with actual data from animal litters.

Park, Park, and Shin (1996) give a method for generating correlated binary variates based on sums of Poisson random variables in which the sums have some terms in common. They let Z1, Z2, and Z3 be independent Poisson random variables with nonnegative parameters α11 − α12, α22 − α12, and α12, respectively, with the convention that a Poisson with parameter 0 is a degenerate random variable equal to 0, and define the random variables Y1 and Y2 as Y1 = Z1 + Z3 and Y2 = Z2 + Z3. They then define the binary random variables X1 and X2 by

   Xi = 1 if Yi = 0,
   Xi = 0 otherwise.

They then determine the constants α11, α22, and α12 so that E(Xi) = πi and Corr(X1, X2) = ρ12.
It is easy to see that

   αij = log(1 + ρij √((1 − πi)(1 − πj) / (πiπj)))   (5.38)

yields those relations. After the αs are computed, the procedure is as shown in Algorithm 5.10.

Algorithm 5.10 Park/Park/Shin Method for Generating Correlated Binary Variates

0. Set k = 0.

1. Set k = k + 1. Let βk = αrs be the smallest positive αij.

2. If αrr = 0 or αss = 0, then stop.

3. Let Sk be the set of all indices, i, j, for which αij > 0. For all {i, j} ⊆ Sk, set αij = αij − βk.

4. If not all αij = 0, then go to step 1.

5. Generate k Poisson deviates, zj, with parameters βj. For i = 1, 2, . . . , d, set
   yi = Σ_{j: i∈Sj} zj.
6. For i = 1, 2, . . . , d, set
   xi = 1 if yi = 0,
   xi = 0 otherwise.
Lee (1993) gives another method to generate multivariate Bernoullis that uses odds ratios. (Odds ratios and correlations uniquely determine each other for binary variables.)

Multivariate Beta or Dirichlet Distribution

The Dirichlet distribution is a multivariate extension of a beta distribution, and the density of the Dirichlet is the obvious extension of the beta density (equation (5.14)),

   p(x) = (Γ(Σ_{j=1}^{d+1} αj) / ∏_{j=1}^{d+1} Γ(αj)) ∏_{j=1}^{d} xj^(αj−1) (1 − x1 − x2 − · · · − xd)^(α_{d+1}−1)  for 0 ≤ xj ≤ 1.   (5.39)
Arnason and Baniuk (1978) consider several ways to generate deviates from the Dirichlet distribution, including a sequence of conditional betas and the use of
the relationship of order statistics from a uniform distribution to a Dirichlet. (The ith order statistic from a sample of size n from a U(0, 1) distribution has a beta distribution with parameters i and n − i + 1.) The most efficient method that they found was the use of a relationship between independent gamma variates and a Dirichlet variate. If Y1, Y2, . . . , Yd, Yd+1 are independently distributed as gamma random variables with shape parameters α1, α2, . . . , αd, αd+1 and common scale parameter, then the d-vector X with elements

   Xj = Yj / Σ_{k=1}^{d+1} Yk,   j = 1, . . . , d,

has a Dirichlet distribution with parameters α1, α2, . . . , αd, αd+1. This relationship yields the straightforward method of generating Dirichlets by generating gammas.
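In R, for example (the function name is ours; alpha is the full vector α1, . . . , αd+1):

rdirichlet1 <- function(alpha) {
  # one Dirichlet deviate via independent gammas
  y <- rgamma(length(alpha), shape = alpha)
  (y / sum(y))[-length(alpha)]   # drop the redundant last element
}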
Dirichlet-Multinomial Distribution

The Dirichlet-multinomial distribution is the mixture distribution that is a multinomial with parameter π that is a realization of a random variable having a Dirichlet distribution. Just like the beta-binomial distribution (5.18), this distribution is useful for modeling overdispersion or extravariation in applications where there are clusters of separate multinomial distributions. A simple way of generating deviates from a Dirichlet-multinomial distribution is first to generate the parameter π as the appropriate Dirichlet and then to generate the multinomial conditionally.

There are other ways of inducing overdispersion in multinomial distributions. Morel (1992) describes a simple algorithm to generate a finite mixture of multinomials by clumping individual multinomials. This mixture distribution has the same first two moments as the Dirichlet-multinomial distribution, but it is not the same distribution.

Multivariate Hypergeometric Distribution

The multivariate hypergeometric distribution is a generalization of the hypergeometric distribution for more than two types of outcomes. The model is an urn filled with balls of different colors. The multivariate random variable is the vector of numbers of each type of ball when N balls are drawn randomly and without replacement. The probability function for the multivariate hypergeometric distribution is the same as that for the univariate hypergeometric distribution (equation (5.22), page 189) except with more classes.

To generate a multivariate hypergeometric random deviate, a simple way is to work with the marginals. The generation is done sequentially. Each succeeding conditional marginal is hypergeometric. To generate the deviate, combine all classes except the first in order to form just two classes. Next, generate a univariate hypergeometric deviate x1. Then remove the first class and form two classes consisting of the second one and the third through the last combined, and generate a univariate hypergeometric deviate based on N − x1 draws. This gives x2, the number of the second class. Continue in this manner until the number remaining to be drawn is 0 or until the classes are exhausted. For efficiency, the first marginal used would be the one with the largest probability.
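A sketch of this sequential method in R (the function name is ours; counts is the vector of class counts in the lot, and N is the sample size):

rmvhyper <- function(counts, N) {
  d <- length(counts)
  x <- integer(d)
  for (j in seq_len(d - 1)) {
    # rhyper(1, m, n, k): k draws from m special and n other items
    x[j] <- rhyper(1, counts[j], sum(counts[(j + 1):d]), N)
    N <- N - x[j]
  }
  x[d] <- N
  x
}

For efficiency, as noted above, the classes would first be ordered by decreasing counts.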
Multivariate Uniform Distribution

Falk (1999) considers the problem of simulating a d-variate Ud(0, 1) distribution with specified correlation matrix R = (ρij). A simple approximate method is to generate y from Nd(0, R) and take xi = Φ(yi), where Φ is the standard normal CDF. Falk shows that the correlation matrix of variates generated in this way is very close to the target correlation matrix R. He also shows that if the matrix

   R̃ = (r̃ij) = (2 sin(πρij/6))

is positive semidefinite, and if Y ∼ Nd(0, R̃) and Xi = Φ(Yi), then Corr(X) = R. Therefore, if the target correlation matrix R is such that the corresponding matrix R̃ is positive semidefinite, then variates generated as above are from a d-variate Ud(0, 1) distribution with exact correlation matrix R.
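Falk's method is easy to sketch in R, using, say, mvrnorm from the MASS package for the multivariate normal (any multivariate normal generator would do; whether R̃ is positive semidefinite must be checked for the given R):

library(MASS)   # for mvrnorm
rmvunif <- function(n, R) {
  Rtilde <- 2*sin(pi*R/6)   # assumed positive semidefinite
  pnorm(mvrnorm(n, mu = rep(0, nrow(R)), Sigma = Rtilde))
}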
γ/2 γΓ(d/2) . (5.40) − (x − µ)T Σ −1 (x − µ) p(x) = 1 exp 2π d/2 Γ(d/γ)|Σ| 2 Ernst shows that a simple way to generate a variate from this distribution is to generate a point s on the d-dimensional sphere (see Section 5.3.4), generate a generalized univariate gamma variate y (page 182) with parameters d, 1, and γ, and deliver x = yT T s + µ, where T T T = Σ. Kozubowski and Podg´ orski (2000) describe an asymmetric multivariate Laplace distribution (not elliptically contoured). They also describe a method for generating random deviates from that distribution.
Multivariate Gamma Distributions

The bivariate gamma distribution of Becker and Roux (1981) discussed in Section 4.5 (page 123) is only one possibility for extending the gamma. Others, motivated by different models of applications, are discussed by Mihram and Hultquist (1967), Ronning (1977), Ratnaparkhi (1981), and Jones, Lai, and Rayner (2000), for example. Ronning (1977) and London and Gennings (1999) describe specific multivariate gamma distributions and methods for generating variates from the distributions they consider.

The bivariate gamma distribution of Jones, Lai, and Rayner (2000) is formed from two univariate gamma distributions with fixed shape parameters and scale parameters ζ and ξ, each of which takes one of two values with a generalized Bernoulli distribution. For i, j = 1, 2, Pr(ζ = ζi, ξ = ξj) = πij. The correlation between the two elements of the bivariate gamma depends on the πij, as Jones, Lai, and Rayner (2000) discuss. It is easy to generate random variates from this bivariate distribution: for each variate, generate a value for ζ and ξ, and then generate the two univariate gammas.

Multivariate Stable Distributions

Various multivariate extensions of the stable distributions can be defined. Modarres and Nolan (1994) give a representation of a class of multivariate stable distributions in which the multivariate stable random variable is a weighted sum of a univariate stable random variable times a point on the unit sphere. The reader is referred to the paper for the description of the class of multivariate stable distributions for which the method applies. See also Nolan (1998a).
5.3.7 Families of Multivariate Distributions
Methods are available for generating multivariate distributions with various specific properties. Extensions have been given for multivariate versions of some of the general families of univariate distributions discussed on page 193. Parrish (1990) gives a method to generate random deviates from a multivariate Pearson family of distributions. Takahasi (1965) defines a multivariate extension of the Burr distributions. Generation of deviates from the multivariate Burr distribution can be accomplished by transformations of univariate samples. Gange (1995) gives a method for generating general multivariate categorical variates using iterative proportional fitting to the marginals.

A useful general class of multivariate distributions are the elliptically contoured distributions. A nonsingular elliptically contoured distribution has a density of the general form

   p(x) = (c / |Σ|^(1/2)) g((x − µ)^T Σ^(−1) (x − µ)),
where g(·) is a nonnegative function, and Σ is a positive definite matrix. The multivariate normal distribution is obviously of this class, as is the multivariate Laplace distribution discussed above. There are other interesting distributions in this class, including two types of multivariate Pearson distributions. Johnson (1987) discusses general methods for generating variates from the Pearson elliptically contoured distributions. The book edited by Fang and Anderson (1990) contains several papers on applications of elliptically contoured distributions. Cook and Johnson (1981, 1986) define families of non-elliptically symmetric multivariate distributions and consider their use in applications for modeling data. Johnson (1987) discusses methods for generating variates from those distributions.

Distributions with Specified Correlations

Li and Hammond (1975) propose a method for a d-variate distribution with specified marginals and variance-covariance matrix. The Li–Hammond method uses the inverse CDF method to transform a d-variate normal into a multivariate distribution with specified marginals. The variance-covariance matrix of the multivariate normal is chosen to yield the specified variance-covariance matrix for the target distribution. The determination of the variance-covariance matrix for the multivariate normal to yield the desired target distribution is difficult, however, and does not always yield a positive definite variance-covariance matrix for the multivariate normal. (An approximate variance-covariance or correlation matrix that is not positive definite can be a general problem in applications of multivariate simulation. See Exercise 6.1 in Gentle, 1998, for a possible solution.)

Lurie and Goldberg (1998) modify the Li–Hammond approach by iteratively refining the correlation matrix of the underlying normal using the sample correlation matrix of the transformed variates. They begin with a fixed sample of t multivariate normals with the identity matrix as the variance-covariance. These normal vectors are first linearly transformed by the matrix T^(k) as described on page 197 and then transformed by the inverse CDF method into a sample of t vectors with the specified marginal distributions. The correlation matrix of the transformed sample is computed and compared with the target correlation. A measure of the difference in the sample correlation matrix and the target correlation is minimized by iterations over T^(k). A good starting point for T^(0) is the d × d matrix that is the square root of the target correlation matrix R (that is, the Cholesky factor), so that (T^(0))^T T^(0) = R. The measure of the difference in the sample correlation matrix and the target correlation is a sum of squares, so the minimization is a nonlinear least squares problem.

The sample size t to use in the determination of the optimal transformation matrix must be chosen in advance. Obviously, t must be large enough to give some confidence that the sample correlation matrices reflect the target accurately. Because of the number of variables in the optimization
problem, it is likely that t should be chosen proportional to d^2. Once the transformation matrix is chosen, to generate a variate from the target distribution, first generate a variate from Nd(0, I), then apply the linear transformation, and finally apply the inverse CDF transformation. To generate n variates from the target distribution, Lurie and Goldberg (1998) also suggest that the normals be adjusted so that the sample has a mean of 0 and a variance-covariance matrix exactly equal to the expected value that the transformation would yield. (This is constrained sampling, as discussed on page 248.)

Vale and Maurelli (1983) also generate general random variates using a multivariate normal distribution with the target correlation matrix as a starting point. They express the individual elements of the multivariate random variable of interest as polynomials in the elements of the multivariate normal random variable, similar to the method of Fleishman (1978) in equation (5.33). They then determine the coefficients in the polynomials so that the lower-order marginal moments correspond to specified values. This does not, of course, mean that the correlation matrix of the random variable determined in this way is the desired matrix. Vale and Maurelli suggest use of the first four marginal moments.

Parrish (1990), as mentioned above, gives a method for generating variates from a multivariate Pearson family of distributions. A member of the Pearson family is specified by the first four moments, which of course includes the covariances. Kachitvichyanukul, Cheng, and Schmeiser (1988a) describe methods for inducing correlation in binomial and Poisson random variates.
5.4 Data-Based Random Number Generation
Often, we have a sample and need to generate random numbers from the unknown distribution that yielded it. Specifically, we have a set of observations, {x1, x2, . . . , xn}, and we wish to generate a pseudorandom sample from the same distribution as the given dataset. This kind of method is called data-based random number generation.

Discrete Distributions

How we proceed depends on some assumptions about the distribution. If the distribution is discrete and we have no other information about it than what is available from the given dataset, the best way of generating a pseudorandom sample from the distribution is just to generate a random sample of indices with replacement (see Chapter 6) and then use the index set as indices for the given sample. For scalar x, this is equivalent to using the inverse CDF method on the ECDF (the empirical cumulative distribution function):

   Pn(x) = (1/n)(number of xi ≤ x).
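In R, this amounts to a single call to sample (x is the given dataset, and m is the desired sample size):

# pseudorandom sample of size m from the distribution of the data
xstar <- sample(x, m, replace = TRUE)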
A well-designed S-Plus or R function that invokes a random number generator would have code similar to that in Figure 8.5.

oldseed <- .Random.seed    # save seed on entry
 ...
.Random.seed <<- oldseed   # restore seed on exit
return(...)

Figure 8.5: Saving and Restoring the State of the Generator within an S-Plus or R Function
Monte Carlo in S-Plus and R

Explicit loops in S-Plus or R execute very slowly. For that reason, it is best to use array arguments for functions rather than to loop over scalar values of the
arguments. Consider, for example, the problem of evaluating the integral

   ∫_0^2 log(x + 1) x^2 (2 − x)^3 dx.
This could be estimated in a loop as follows:

# First, initialize n.
uu <- runif(n, 0, 2)
eu <- 0
for (i in 1:n) eu <- eu + log(uu[i]+1)*uu[i]^2*(2-uu[i])^3
eu <- 2*eu/n

A much more efficient way, without the for loop but still using the uniform, is

uu <- runif(n, 0, 2)
eu <- 2*sum(log(uu+1)*uu^2*(2-uu)^3)/n

Alternatively, using the beta density as a weight function, we have

eb <- (16/15)*sum(log(2*rbeta(n,3,4)+1))/n

(Of course, if we recognize the relationship of the integral to the beta distribution, we would not use Monte Carlo as the method of integration.)

For large-scale Monte Carlo studies, an interpretive language such as S-Plus or R may require an inordinate amount of running time. These systems are very useful for prototyping Monte Carlo studies, but it is often better to do the actual computations in a compiled language such as Fortran or C.
Exercises

8.1. Identify as many random number generators as you can that are available on your computer system. Try to determine what method each uses. Do the generators appear to be of high quality?

8.2. Consider the problem of evaluating the integral

   ∫_{−π}^{π} ∫_0^4 ∫_0^∞ x^2 y^3 sin(z) (π + z)^2 (π − z)^3 e^(−x/2) dx dy dz.

Note the gamma and beta weighting functions.

(a) Write a Fortran or C program to use the IMSL Libraries to evaluate this integral by Monte Carlo methods. Use a sample of size 1000, and save the state of the generator, so you can restart it. Now, use a sample of size 10,000, starting where you left off in the first 1000. Combine your two estimates.

(b) Now, do the same thing in S-Plus.
(c) Now, do the same thing in Fortran 90 using its built-in random number functions. You may use other software to evaluate special functions if you wish.
8.3. Obtain the programs for Algorithm 738 for generating quasirandom numbers (Bratley, Fox, and Niederreiter, 1994) from the Collected Algorithms of the ACM. The programs are in Fortran and may require a small number of system-dependent modifications, which are described in the documentation embedded in the source code. Devise some tests for Monte Carlo evaluation of multidimensional integrals, and compare the performance of Algorithm 738 with that of a pseudorandom number generator. (Just use any convenient pseudorandom generator available to you.) The subroutine TESTF accompanying Algorithm 738 can be used for this purpose. Can you notice any difference in the performance?

8.4. Obtain the code for SPRNG, the scalable parallel random number generators. The source code is available at

   http://sprng.cs.fsu.edu

Get the code running, preferably on a parallel system. (It will run on a serial machine also.) Choose some simple statistical tests, and apply them to sample output from single streams and also to the output of separate streams. (In the latter case, the primary interest is in correlations across the streams.)
Chapter 9
Monte Carlo Studies in Statistics

In statistical inference, certain properties of the test statistic or estimator must be assumed to be known. In simple cases, under rigorous assumptions, we have complete knowledge of the statistic. In testing a mean of a normal distribution, for example, we use a t statistic, and we know its exact distribution. In other cases, however, we may have a perfectly reasonable test statistic but know very little about its distribution. For example, suppose that a statistic T, computed from a differenced time series, could be used to test the hypothesis that the order of differencing is sufficient to yield a series with a zero mean. If enough information about the distribution of T is known under the null hypothesis, that value may be used to construct a test that the differencing is adequate. This, in fact, was what Erastus Lyman de Forest studied in the 1870s in one of the earliest documented Monte Carlo studies of a statistical procedure. De Forest studied ways of smoothing a time series by simulating the data using cards drawn from a box. A description of De Forest’s Monte Carlo study is given in Stigler (1978). Stigler (1991) also describes other Monte Carlo simulation by nineteenth-century scientists and suggests that “simulation, in the modern sense of that term, may be the oldest of the stochastic arts”.

Another early use of Monte Carlo was the sampling experiment (using biometric data recorded on pieces of cardboard) that led W. S. Gosset to the discovery of the distribution of the t-statistic and the correlation coefficient. (See Student, 1908a, 1908b. Of course, it was Ronald Fisher who later worked out the distributions.) Major advances in Monte Carlo techniques were made during World War II and afterward by mathematicians and scientists working on problems in atomic physics. (In fact, it was the group led by John von Neumann and S. M. Ulam who coined the term “Monte Carlo” to refer to these methods.) The use of Monte Carlo techniques by statisticians gradually increased from the time of
De Forest, but after the widespread availability of digital computers, the usage greatly expanded. In the mathematical sciences, including statistics, simulation has become an important tool in the development of theory and methods. For example, if the properties of an estimator are very difficult to work out analytically, a Monte Carlo study may be conducted to estimate those properties. Often, the Monte Carlo study is an informal investigation whose main purpose is to indicate promising research directions. If a “quick and dirty” Monte Carlo study indicates that some method of inference has good properties, it may be worth the time of the research worker in developing the method and perhaps doing the difficult analysis to confirm the results of the Monte Carlo study. In addition to quick Monte Carlo studies that are mere precursors to analytic work, Monte Carlo studies often provide a significant amount of the available knowledge of the properties of statistical techniques, especially under various alternative models. A large proportion of the articles in the statistical literature include Monte Carlo studies. In recent issues of the Journal of the American Statistical Association, for example, almost half of the articles report on Monte Carlo studies that supported the research. One common use of Monte Carlo studies is to compare statistical methods. For example, we may wish to compare a procedure based on maximum likelihood with a procedure using least squares. The comparison of methods is often carried out for different distributions for the random component of the model used in the study. It is especially interesting to study how standard statistical methods perform when the distribution of the random component has heavy tails or when the distribution is contaminated by outliers. Monte Carlo methods are widely used in these kinds of studies of the robustness of statistical methods.
9.1 Simulation as an Experiment
A simulation study that incorporates a random component is an experiment. The principles of statistical design and analysis apply just as much to a Monte Carlo study as they do to any other scientific experiment. The Monte Carlo study should adhere to the same high standards of any scientific experimentation:

• control;
• reproducibility;
• efficiency;
• careful and complete documentation.

In simulation, control, among other things, relates to the fidelity of a nonrandom process to a random process. The experimental units are only simulated.
Questions about the computer model must be addressed (tests of the random number generators and so on). Likewise, reproducibility is predicated on good random number generators (or else on equally bad ones!). Portability of the random number generators enhances reproducibility and in fact can allow strict reproducibility. Reproducible research also requires preservation and documentation of the computer programs that produced the results (see Buckheit and Donoho, 1995).

The principles of good statistical design can improve the efficiency. Use of good designs (fractional factorials, etc.) can allow efficient simultaneous exploration of several factors. Also, there are often many opportunities to reduce the variance (improve the efficiency). Hammersley and Handscomb (1964, page 8) note:

    ... statisticians were insistent that other experimentalists should design experiments to be as little subject to unwanted error as possible, and had indeed given important and useful help to the experimentalist in this way; but in their own experiments they were singularly inefficient, nay negligent in this respect.

Many properties of statistical methods of inference are analytically intractable. Asymptotic results, which are often easy to work out, may imply excellent performance, such as consistency with a good rate of convergence, but the finite sample properties are ultimately what must be considered. Monte Carlo studies are a common tool for investigating the properties of a statistical method, as noted above. In the literature, the Monte Carlo studies are sometimes called "numerical results". Some numerical results are illustrated by just one randomly generated dataset; others are studied by averaging over thousands of randomly generated sets.

In a Monte Carlo study, there are usually several different things ("treatments" or "factors") that we want to investigate. As in other kinds of experiments, a factorial design is usually more efficient. Each factor occurs at different "levels", and the set of all levels of all factors that are used in the study constitutes the "design space". The measured responses are properties of the statistical methods, such as their sample means and variances.

The factors commonly studied in Monte Carlo experiments in statistics include the following:

• the statistical method (estimator, test procedure, etc.);
• the sample size;
• the problem for which the statistical method is being applied (that is, the "true" model, which may be different from the one for which the method was developed). Factors relating to the type of problem may be:
  – distribution of the random component in the model (normality?);
  – correlation among the observations (independence?);
  – homogeneity of the observations (outliers?, mixtures?);
  – structure of associated variables (leverage?).
The factor whose effect is of primary interest is the statistical method. The other factors are generally just blocking factors. There is, however, usually an interaction between the statistical method and these other factors.

As in physical experimentation, observational units are selected for each point in the design space and measured. The measurements, or "responses", made at the same design point are used to assess the amount of random variation, or variation that is not accounted for by the factors being studied. A comparison of the variation among observational units at the same levels of all factors with the variation among observational units at different levels is the basis for a decision as to whether there are real (or "significant") differences at the different levels of the factors. This comparison is called analysis of variance. The same basic rationale for identifying differences is used in simulation experiments.

A fundamental (and difficult) question in experimental design is how many experimental units to observe at the various design points. Because the experimental units in Monte Carlo studies are generated on the computer, they are usually rather inexpensive. The subsequent processing (the application of the factors, in the terminology of an experiment) may be very extensive, however, so there is a need to design an efficient experiment.
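This rationale can be sketched in code. The following is a minimal illustration (ours, not from the text) of comparing within-level variation with between-level variation by a one-way analysis of variance; the response values, level means, and seed are hypothetical.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(12345)
    m = 400   # Monte Carlo replications (observational units) per design point

    # Hypothetical responses (say, squared errors of an estimator) at three
    # levels of a blocking factor, generated here only to have data to test.
    level_a = rng.normal(loc=1.00, scale=0.2, size=m)
    level_b = rng.normal(loc=1.00, scale=0.2, size=m)
    level_c = rng.normal(loc=1.15, scale=0.2, size=m)

    # One-way analysis of variance: is the variation between levels large
    # relative to the replication-to-replication variation within a level?
    f_stat, p_value = stats.f_oneway(level_a, level_b, level_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")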
9.2 Reporting Simulation Experiments
The reporting of a simulation experiment should receive the same care and consideration that would be accorded the reporting of other scientific experiments. Hoaglin and Andrews (1975) outline the items that should be included in a report of a simulation study. In addition to a careful general description of the experiment, the report should include mention of the random number generator used, any variance-reducing methods employed, and a justification of the simulation sample size. The Journal of the American Statistical Association includes these reporting standards in its style guide for authors.

Closely related to the choice of the sample size is the standard deviation of the estimates that result from the study. The sample standard deviations actually achieved should be included as part of the report. Standard deviations are often reported in parentheses beside the estimates with which they are associated. A formal analysis, of course, would use the sample variance of each estimate to assess the significance of the differences observed between points in the design space; that is, a formal analysis of the simulation experiment would be a standard analysis of variance.

The most common method of reporting the results is by means of tables, but a better understanding of the results can often be conveyed by graphs.
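As a small illustration of that reporting convention, the following sketch (ours; the counts are hypothetical) computes a power estimate and the standard deviation that would be reported in parentheses beside it.

    import math

    m = 400          # Monte Carlo sample size
    z = 337          # hypothetical number of rejections observed
    p_hat = z / m
    std_dev = math.sqrt(p_hat * (1.0 - p_hat) / m)  # binomial standard deviation
    print(f"{p_hat:.3f} ({std_dev:.3f})")           # e.g., 0.842 (0.018)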
9.3 An Example
One area of statistics in which Monte Carlo studies have been used extensively is robust statistics. This is because the finite sampling distributions of many robust statistics are very difficult to work out, especially for the kinds of underlying distributions for which the statistics are to be studied. A well-known use of Monte Carlo methods is in the important study of robust statistics described by Andrews et al. (1972), who introduced and examined many alternative estimators of location for samples from univariate distributions. This study, which involved many Monte Carlo experiments, employed innovative methods of variance reduction and was very influential in subsequent Monte Carlo studies reported in the statistical literature.

As an example of a Monte Carlo study, we will now describe a simple experiment to assess the robustness of a statistical test in linear regression analysis. The purpose of this example is to illustrate some of the issues in designing a Monte Carlo experiment. The results of this small study are not of interest here. There are many important issues about the robustness of the procedures that we do not address in this example.

The Problem

Consider the simple linear regression model

    Y = β₀ + β₁x + E,

where a response or "dependent variable", Y, is modeled as a linear function of a single regressor or "independent variable", x, plus a random variable, E, called the "error". Because E is a random variable, Y is also a random variable. The statistical problem is to make inferences about the unknown, constant parameters β₀ and β₁ and about distributional parameters of the random variable, E. The inferences are made based on a sample of n pairs, (yᵢ, xᵢ), with which are associated unobservable realizations of the random error, εᵢ, and which are assumed to have the relationship

    yᵢ = β₀ + β₁xᵢ + εᵢ.    (9.1)

We also generally assume that the realizations of the random error are independent and are unrelated to the value of x.

For this example, let us consider just the specific problem of testing the hypothesis

    H₀: β₁ = 0    (9.2)

versus the universal alternative. If the distribution of E is normal and we make the additional assumptions above about the sample, the optimal test for the hypothesis (using the common definitions of optimality) is based on a least squares procedure that yields the statistic

    t = β̂₁ √((n − 2) Σᵢ (xᵢ − x̄)²) / √(Σᵢ rᵢ²),    (9.3)
where x̄ is the mean of the xs, and β̂₁ together with β̂₀ minimizes the function

    L₂(b₀, b₁) = Σ_{i=1}^{n} (yᵢ − b₀ − b₁xᵢ)²,

and

    rᵢ = yᵢ − (β̂₀ + β̂₁xᵢ).

[Figure 9.1: Least Squares Fit Using Two Datasets that are the Same Except for Two Outliers]
If the null hypothesis is true, then t is a realization of a Student's t distribution with n − 2 degrees of freedom. The test is performed by comparing the p-value from the Student's t distribution with a preassigned significance level, α, or by comparing the observed value of t with a critical value.

The test of the hypothesis depends on the estimates of β₀ and β₁ used in the test statistic t. Often, a dataset contains outliers (that is, observations that have a realized error that is very large in absolute value) or observations for which the model is not appropriate. In such cases, the least squares procedure may not perform so well. We can see the effect of some outliers on the least squares estimates of β₀ and β₁ in Figure 9.1. For well-behaved data, as in the plot on the left, the least squares estimates seem to fit the data fairly well. For data with two outlying points, as in the plot on the right in Figure 9.1, the least squares estimates are affected so much by the two points in the upper left part of the graph that the estimates do not provide a good fit for the bulk of the data.

Another method of fitting the linear regression line that is robust to outliers in E is to minimize the absolute values of the deviations. The least absolute
values procedure chooses estimates of β₀ and β₁ to minimize the function

    L₁(b₀, b₁) = Σ_{i=1}^{n} |yᵢ − b₀ − b₁xᵢ|.
Figure 9.2 shows the same two datasets as before with the least squares (LS) fit and the least absolute values (LAV) fit plotted on both graphs. We see that the least absolute values fit does not change because of the outliers.
[Figure 9.2: Least Squares Fits and Least Absolute Values Fits]

Another concern in regression analysis is the unduly large influence that some individual observations exert on the aggregate statistics because the values of x in those observations lie at a large distance from the mean of all of the xᵢs (that is, those observations whose values of the independent variables are outliers). The influence of an individual observation is called leverage. Figure 9.3 shows two datasets together with the least squares and the least absolute values fits for both. In both datasets, there is one value of x that lies far outside the range of the other values of x. All of the data in the plot on the left in Figure 9.3 lie relatively close to a line, and both fits are very similar. In the plot on the right, the observation with an extreme value of x also happens to have an outlying value of E. The effect on the least squares fit is marked, while the least absolute values fit is not affected as much. (Despite this example, least absolute values fits are generally not very robust to outliers at high leverage points, especially if there are multiple such outliers. There are other methods of fitting that are more robust to outliers at high leverage points. We refer the interested reader to Rousseeuw and Leroy, 1987, for discussion of these issues.)
[Figure 9.3: Least Squares and Least Absolute Values Fits]

Now, we continue with our original objective in this example: to evaluate ways of testing the hypothesis (9.2). A test statistic analogous to the one in equation (9.3), but based on the least absolute values fit, is

    t₁ = 2β̃₁ √(Σᵢ (xᵢ − x̄)²) / ((e(k₂) − e(k₁)) √(n − 2)),    (9.4)

where β̃₁ together with β̃₀ minimizes the function

    L₁(b₀, b₁) = Σ_{i=1}^{n} |yᵢ − b₀ − b₁xᵢ|,

e(k) is the kth order statistic from eᵢ = yᵢ − (β̃₀ + β̃₁xᵢ), k₁ is the integer closest to (n − 1)/2 − √(n − 2), and k₂ is the integer closest to (n − 1)/2 + √(n − 2). This statistic has an approximate Student's t distribution with n − 2 degrees of freedom (see Birkes and Dodge, 1993, for example).

If the distribution of the random error is normal, inference based on minimizing the sum of the absolute values is not nearly as efficient as inference based on least squares. This alternative to least squares should therefore be used with some discretion. Furthermore, there are other procedures that may warrant consideration. It is not our purpose here to explore these important issues in robust statistics, however.
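The two statistics can be sketched in code as follows. This is our illustration, not the book's program: the LAV fit here is a crude direct minimization of the L₁ objective, whereas production code would use a dedicated routine such as the S-Plus l1fit or the IMSL rlav mentioned in the exercises.

    import numpy as np
    from scipy.optimize import minimize

    def t_ls(x, y):
        # least squares statistic t of equation (9.3) for H0: beta1 = 0
        n = len(x)
        sxx = ((x - x.mean()) ** 2).sum()
        b1 = ((x - x.mean()) * (y - y.mean())).sum() / sxx
        b0 = y.mean() - b1 * x.mean()
        r = y - (b0 + b1 * x)                        # least squares residuals
        return b1 * np.sqrt((n - 2) * sxx / (r ** 2).sum())

    def t_lav(x, y):
        # LAV-based statistic t1 of equation (9.4)
        n = len(x)
        res = minimize(lambda b: np.abs(y - b[0] - b[1] * x).sum(),
                       x0=[0.0, 0.0], method="Nelder-Mead")
        b0, b1 = res.x
        e = np.sort(y - (b0 + b1 * x))               # ordered residuals
        k1 = int(round((n - 1) / 2 - np.sqrt(n - 2)))
        k2 = int(round((n - 1) / 2 + np.sqrt(n - 2)))
        sxx = ((x - x.mean()) ** 2).sum()
        # e[k - 1] is the kth order statistic (1-based in the text)
        return 2 * b1 * np.sqrt(sxx) / ((e[k2 - 1] - e[k1 - 1]) * np.sqrt(n - 2))

Either statistic is compared with a critical value of the Student's t distribution with n − 2 degrees of freedom.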
The Design of the Experiment

At this point, we should have a clear picture of the problem: we wish to compare two ways of testing the hypothesis (9.2) under various scenarios. The data may have outliers, and there may be observations with large leverage. We expect that the optimal test procedure will depend on the presence of outliers or, more generally, on the distribution of the random error and on the pattern of the values of the independent variable. The possibilities of interest for the distribution of the random error include:

• the family of the distribution (that is, normal, double exponential, Cauchy, and so on);
• whether the distribution is a mixture of more than one basic distribution and, if so, the proportions in the mixture;
• the values of the parameters of the distribution (that is, the variance, the skewness, or any other parameters that may affect the power of the test).

In textbooks on the design of experiments, a simple objective of an experiment is to perform a t test or an F test of whether different levels of response are associated with different treatments. Our objective in the Monte Carlo experiment that we are designing is to investigate and characterize the dependence of the performance of the hypothesis test on these factors. The principles of design are similar to those of other experiments, however.

It is possible that the optimal test of the hypothesis will depend on the sample size or on the true values of the coefficients in the regression model, so some additional issues that are relevant to the performance of a statistical test of this hypothesis are the sample size and the true values of β₀ and β₁.

In the terminology of statistical models, the factors in our Monte Carlo experiment are the estimation method and the associated test, the distribution of the random error, the pattern of the independent variable, the sample size, and the true values of β₀ and β₁. The estimation method together with the associated test is the "treatment" of interest. The "effect" of interest (that is, the measured response) is the proportion of times that the null hypothesis is rejected using the two treatments.

We now can see our objective more clearly: for each setting of the distribution, pattern, and size factors, we wish to measure the power of the two tests. These factors are similar to blocking factors except that there is likely to be an interaction between the treatment and these factors. Of course, the power depends on the nominal level of the test, α. It may be the case that the nominal level of the test affects the relative powers of the two tests. We can think of the problem in the context of a binary response model,

    E(P_ijklqsr) = f(τᵢ, δⱼ, φₖ, νₗ, α_q, β₁ₛ),    (9.5)

where the parameters represent levels of the factors listed above (β₁ₛ is the sth level of β₁), and P_ijklqsr is a binary variable representing whether the test
rejects the null hypothesis on the rth trial at the (ijklqs)th setting of the design factors. It is useful to write down a model like this to remind ourselves of the issues in designing an experiment.

At this point, it is necessary to pay careful attention to our terminology. We are planning to use a statistical procedure (a Monte Carlo experiment) to evaluate a statistical procedure (a statistical test in a linear model). For the statistical procedure that we will use, we have written a model (9.5) for the observations that we will make. Those observations are indexed by r in that model. Let m be the sample size for each combination of factor settings. This is the Monte Carlo sample size. It is not to be confused with the data sample size, n, which is one of the factors in our study.

We now choose the levels of the factors in the Monte Carlo experiment; a sketch of the resulting design space in code follows this list.

• For the estimation method, we have decided on two methods: least squares and least absolute values. Its differential effect in the binary response model (9.5) is denoted by τᵢ for i = 1, 2.
• For the distribution of the random error, we choose three general ones:
  1. normal (0, 1);
  2. normal (0, 1) with c% outliers from normal (0, d²);
  3. standard Cauchy.
  We choose different values of c and d as appropriate. For this example, let us choose c = 5 and 20 and d = 2 and 5. Thus, in the binary response model (9.5), j = 1, 2, 3, 4, 5, 6.
• For the pattern of the independent variable, we choose three different arrangements:
  1. uniform over the range;
  2. a group of extreme outliers;
  3. two groups of outliers.
  In the binary response model (9.5), k = 1, 2, 3. We use fixed values of the independent variable.
• For the sample size, we choose three values: 20, 200, and 2000. In the binary response model (9.5), l = 1, 2, 3.
• For the nominal level of the test, we choose two values: 0.01 and 0.05. In the binary response model (9.5), q = 1, 2.
• The true value of β₀ is probably not relevant, so we just choose β₀ = 1. We are interested in the power of the tests at different values of β₁. We expect the power function to be symmetric about β₁ = 0 and to approach 1 as |β₁| increases.
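As promised above, here is a sketch of the design space in code (ours; the tuple encoding and the names are our own choices, not the book's).

    import itertools
    import numpy as np

    rng = np.random.default_rng(42)

    def errors(dist, n):
        # dist is (family, c, d) following the list of error distributions
        family, c, d = dist
        if family == "normal":
            return rng.standard_normal(n)
        if family == "cauchy":
            return rng.standard_cauchy(n)
        # contaminated normal: c% outliers from normal(0, d**2)
        outlier = rng.random(n) < c / 100.0
        return np.where(outlier, d * rng.standard_normal(n),
                        rng.standard_normal(n))

    distributions = [("normal", 0, 0),
                     ("mixture", 5, 2), ("mixture", 5, 5),
                     ("mixture", 20, 2), ("mixture", 20, 5),
                     ("cauchy", 0, 0)]                       # j = 1, ..., 6
    patterns = ("uniform", "extreme group", "two groups")    # k = 1, 2, 3
    sizes = (20, 200, 2000)                                  # l = 1, 2, 3
    levels = (0.01, 0.05)                                    # q = 1, 2

    design = list(itertools.product(distributions, patterns, sizes, levels))
    print(len(design))    # 6 * 3 * 3 * 2 = 108 settings per test procedure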
The estimation method is the "treatment" of interest. Restating our objective in terms of the notation introduced above, for each of the two tests, we wish to estimate the power curve,

    Pr(reject H₀) = g(β₁ | τᵢ, δⱼ, φₖ, νₗ, α_q),

for any combination (τᵢ, δⱼ, φₖ, νₗ, α_q). For either test, this curve should have the general appearance of the curve shown in Figure 9.4. The minimum of the power curve should occur at β₁ = 0 and should be α. The curve should approach 1 symmetrically as |β₁| increases.
[Figure 9.4: Power Curve for Testing β₁ = 0]

To estimate the curve, we use a discrete set of points, and because of symmetry, all values chosen for β₁ can be nonnegative. The first question is at what point the curve flattens out just below 1. We might arbitrarily define the region of interest to be that in which the power is less than approximately 0.99. The abscissa of this point is the maximum β₁ of interest. This point, say β₁*, varies, depending on all of the factors in the study. We could work this out in the least squares case for uncontaminated normal errors using the noncentral Student's t distribution, but, for other cases, it is analytically intractable. Hence, we compute some preliminary Monte Carlo estimates to determine the maximum β₁ for each factor combination in the study.

To do a careful job of fitting a curve using a relatively small number of points, we would choose points where the second derivative is changing rapidly and especially near points of inflection where the second derivative changes sign. Because the problem of determining these points for each combination of
(i, j, k, l, q) is not analytically tractable (otherwise, we would not be doing the study!), we may conveniently choose a set of points equally spaced between 0 and β₁*. Let us decide on five such points for this example. It is not important that the β₁*s be chosen with a great deal of care. The objective is that we be able to calculate two power curves between 0 and β₁* that are meaningful for comparisons.

The Experiment

The observational units in the experiment are the values of the test statistics (9.3) and (9.4). The measurements are the binary variables corresponding to rejection of the hypothesis (9.2). At each point in the factor space, there will be m such observations. If z is the number of rejections observed, then the estimate of the power is z/m, and the variance of the estimator is π(1 − π)/m, where π is the true power at that point. (z is a realization of a binomial random variable with parameters m and π.)

This leads us to a choice of the value of m. The coefficient of variation at any point is √((1 − π)/(mπ)), which increases as π decreases. At π = 0.50, a 5% coefficient of variation can be achieved with a sample of size 400. This yields a standard deviation of 0.025. There may be some motivation to choose a slightly larger value of m because we can assume that the minimum of π will be approximately the minimum of α, which is 0.01. To achieve a 5% coefficient of variation at the point at which π = 0.01 would require a sample of size of approximately 40,000. That would correspond to a standard deviation of 0.0005, which is probably much smaller than we need. A sample size of 400 would yield a standard deviation of 0.005. Although that is large in a relative sense, it may be adequate for our purposes. Because this particular point (where β₁ = 0) corresponds to the null hypothesis, however, we may choose a larger sample size, say 4000, at that special point. A reasonable choice therefore is a Monte Carlo sample size of 4000 at the null hypothesis and 400 at all other points. We will, however, conduct the experiment in such a way that we can combine the results of this experiment with independent results from a subsequent experiment.

The experiment is conducted by running a computer program. The main computation in the program is to determine the values of the test statistics and to compare them with their critical values to decide on the hypothesis. These computations need to be performed at each setting of the factors and for any given realization of the random sample. We design a program that allows us to loop through the settings of the factors and, at each factor setting, to use a random sample. The result is a nest of loops. The program may be stopped and restarted, so we need to be able to control the seeds (see Section 8.2, page 286).

Recalling that the purpose of our experiment is to obtain estimates, we may now consider any appropriate methods of reducing the variance of those estimates. There is not much opportunity to apply the methods of variance reduction discussed in Section 7.5, but at least we might consider at what points to use
common realizations of the pseudorandom variables. Because the things that we want to compare most directly are the powers of the tests, we perform the tests on the same pseudorandom datasets. Also, because we are interested in the shape of the power curves, we may want to use the same pseudorandom datasets at each value of β₁; that is, to use the same set of errors in the model (9.1). Finally, following similar reasoning, we may use the same pseudorandom datasets at each setting of the pattern of the independent variable. This implies that our program of nested loops has the structure shown in Figure 9.5.

    Initialize a table of counts.
    Fix the data sample size. (Loop over the sample sizes n = 20, n = 200, and n = 2000.)
        Generate a set of residuals for the linear regression model (9.1).
        (This is the loop of m Monte Carlo replications.)
            Fix the pattern of the independent variable. (Loop over patterns P1, P2, and P3.)
                Choose the distribution of the error term. (Loop over the distributions D1, D2, D3, D4, D5, and D6.)
                    For each value of β₁, generate a set of observations (the y values) for the linear regression model (9.1), and perform the tests using both procedures and at both levels of significance. Record results.
                End distributions loop.
            End patterns loop.
        End Monte Carlo loop.
    End sample size loop.
    Perform computations of summary statistics.

    Figure 9.5: Program Structure for the Monte Carlo Experiment

After writing a computer program with this structure, the first thing is to test the program on a small set of problems and determine appropriate values of β₁*. We should compare the results with known values at a few points. (As mentioned earlier, the only points that we can work out correspond to the normal case with the ordinary t statistic. One of these points, at β₁ = 0, is easily checked.) We can also check the internal consistency of the results. For example, does the power curve increase? We must be careful, of course, in applying such consistency checks because we do not know the behavior of the tests in most cases.
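A runnable skeleton with the loop nesting of Figure 9.5 might look as follows. This is only our sketch, not the book's program: it implements a single slice of the design (the uniform pattern, standard normal errors, and the least squares test only), with the remaining factor levels plugging into the same structure; all of the names are ours.

    import numpy as np
    from scipy import stats

    def gen_pattern(pat, n):
        # placeholder: only the uniform pattern is implemented in this sketch
        return np.linspace(0.0, 1.0, n)

    def make_errors(dist, u):
        # placeholder: only normal errors; transforming the common uniforms u
        # keeps the datasets common across the factor settings
        return stats.norm.ppf(u)

    def ls_reject(x, y, alpha):
        # least squares test of H0: beta1 = 0 using the statistic (9.3)
        n = len(x)
        sxx = ((x - x.mean()) ** 2).sum()
        b1 = ((x - x.mean()) * (y - y.mean())).sum() / sxx
        r = y - (y.mean() - b1 * x.mean()) - b1 * x
        tval = b1 * np.sqrt((n - 2) * sxx / (r ** 2).sum())
        return int(abs(tval) > stats.t.ppf(1.0 - alpha / 2.0, n - 2))

    M, M0 = 400, 4000          # Monte Carlo sizes off and at beta1 = 0
    counts = {}                # table of rejection counts

    for n in (20, 200, 2000):                     # sample size loop
        rng = np.random.default_rng(1000 + n)     # controllable seed
        for rep in range(M0):                     # Monte Carlo loop
            u = rng.random(n)                     # common uniforms
            for pat in ("P1",):                   # pattern loop (P2, P3, ...)
                x = gen_pattern(pat, n)
                for dist in ("D1",):              # distribution loop (D2, ...)
                    eps = make_errors(dist, u)
                    for b1 in (0.0, 0.5, 1.0):    # grid from 0 up to beta1*
                        if b1 != 0.0 and rep >= M:
                            continue              # only M replications off the null
                        y = 1.0 + b1 * x + eps    # beta0 fixed at 1
                        for alpha in (0.01, 0.05):
                            key = ("LS", dist, pat, n, alpha, b1)
                            counts[key] = counts.get(key, 0) + ls_reject(x, y, alpha)

    # summary: estimated power at a setting is counts[key]/M, or /M0 at beta1 = 0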
Reporting the Results

The report of this Monte Carlo study should address as completely as possible the results of interest. The relative values of the power are the main points of interest. The estimated power at β₁ = 0 is also of interest: it is the actual significance level of the test, and how it compares with the nominal level α is of particular interest.

The presentation should be in a form easily assimilated by the reader. This may mean graphs similar to Figure 9.4, except showing only the nonnegative half and with tick marks on the horizontal axis. Two graphs, for the two test procedures, should be shown on the same set of axes. It is probably counterproductive to show a graph for each factor setting. (There are 108 combinations of factor settings.) In addition to the graphs, tables may allow presentation of a large amount of information in a compact format.

The Monte Carlo study should be described so carefully that the study could be replicated exactly. This means specifying the factor settings, the loop nesting, the software and computer used, the seed used, and the Monte Carlo sample size. There should also be at least a simple statement explaining the choice of the Monte Carlo sample size.

As mentioned earlier, the statistical literature is replete with reports of Monte Carlo studies. Some of these reports (and, likely, the studies themselves) are woefully deficient. An example of a careful Monte Carlo study and a good report of the study are given by Kleijnen (1977), who designed, performed, and reported on a Monte Carlo study to investigate the robustness of a multiple ranking procedure. In addition to reporting on the study of the question at hand, another purpose of the paper was to illustrate the methods of a Monte Carlo study.
Exercises

9.1. Write a computer program to implement the Monte Carlo experiment described in Section 9.3. The S-Plus functions lsfit and l1fit or the IMSL Fortran subroutines rline and rlav can be used to calculate the fits. See Chapter 8 for discussions of other software that you may use in the program.

9.2. Choose a recent issue of the Journal of the American Statistical Association and identify five articles that report on Monte Carlo studies of statistical methods. In each case, describe the Monte Carlo experiment.
(a) What are the factors in the experiment?
(b) What is the measured response?
(c) What is the design space (that is, the set of factor settings)?
(d) What random number generators were used?
(e) Critique the report in each article. Did the author(s) justify the sample size? Did the author(s) report variances or confidence intervals? Did the author(s) attempt to reduce the experimental variance?
9.3. Select an article that you identified in Exercise 9.2 that concerns a statistical method that you understand and that interests you. Choose a design space that is not a subspace of that used in the article but has a nonnull intersection with it, and perform a similar experiment. Compare your results with those reported in the article.
Appendix A
Notation and Definitions

All notation used in this work is "standard", and in most cases it conforms to the ISO conventions. (The notable exception is the notation for vectors.) I have opted for simple notation, which, of course, results in a one-to-many map of notation to object classes. Within a given context, however, the overloaded notation is generally unambiguous. I have endeavored to use notation consistently.

This appendix is not intended to be a comprehensive listing of definitions. The subject index, beginning on page 377, is a more reliable set of pointers to definitions, except for symbols that are not words.
General Notation

Uppercase italic Latin and Greek letters, A, B, E, Λ, and so on, are generally used to represent either matrices or random variables. Random variables are usually denoted by letters nearer the end of the Latin alphabet, X, Y, Z, and by the Greek letter E. Parameters in models (that is, unobservables in the models), whether or not they are considered to be random variables, are generally represented by lowercase Greek letters. Uppercase Latin and Greek letters, especially P, in general, and Φ, for the normal distribution, are also used to represent cumulative distribution functions. Also, uppercase Latin letters are used to denote sets.

Lowercase Latin and Greek letters are used to represent ordinary scalar or vector variables and functions. No distinction in the notation is made between scalars and vectors; thus, β may represent a vector, and βᵢ may represent the ith element of the vector β. In another context, however, β may represent a scalar. All vectors are considered to be column vectors, although we may write a vector as x = (x₁, x₂, . . . , xₙ). Transposition of a vector or a matrix is denoted by a superscript T.

Uppercase calligraphic Latin letters, F, V, W, and so on, are generally used to represent either vector spaces or transforms.
Subscripts generally represent indexes to a larger structure; for example, xᵢⱼ may represent the (i, j)th element of a matrix, X. A subscript in parentheses represents an order statistic. A superscript in parentheses represents an iteration; for example, xᵢ^(k) may represent the value of xᵢ at the kth step of an iterative process. The following are some examples:

xᵢ

The ith element of a structure (including a sample, which is a multiset).

x₍ᵢ₎

The ith order statistic.

x⁽ⁱ⁾

The value of x at the ith iteration.
Realizations of random variables and placeholders in functions associated with random variables are usually represented by lowercase letters corresponding to the uppercase letters; thus, ε may represent a realization of the random variable E.

A single symbol in an italic font is used to represent a single variable. A Roman font or a special font is often used to represent a standard operator or a standard mathematical structure. Sometimes, a string of symbols in a Roman font is used to represent an operator (or a standard function); for example, exp represents the exponential function, but a string of symbols in an italic font on the same baseline should be interpreted as representing a composition (probably by multiplication) of separate objects; for example, exp represents the product of e, x, and p.

A fixed-width font is used to represent computer input or output; for example, a = bx + sin(c). In computer text, a string of letters or numerals with no intervening spaces or other characters, such as bx above, represents a single object, and there is no distinction in the font to indicate the type of object.

Some important mathematical structures and other objects are:

IR
The field of reals or the set over which that field is defined.
IR^d
The usual d-dimensional vector space over the reals or the set of all d-tuples with elements in IR.
IR^d_+
The set of all d-tuples with positive real elements.
IC
The field of complex numbers or the set over which that field is defined.
ZZ
The ring of integers or the set over which that ring is defined.
IG(n)
A Galois field defined on a set with n elements.
C⁰, C¹, C², . . .

The set of continuous functions, the set of functions with continuous first derivatives, and so forth.

i

The imaginary unit, √−1.
Computer Number Systems

Computer number systems are used to simulate the more commonly used number systems. It is important to realize that they have different properties, however. Some notation for computer number systems follows.

IF
The set of floating-point numbers with a given precision, on a given computer system, or this set together with the four operators +, -, *, and /. IF is similar to IR in some useful ways; it is not, however, closed under the two basic operations, and not all reciprocals of the elements exclusive of the additive identity exist, so it is clearly not a field.
II
The set of fixed-point numbers with a given length, on a given computer system, or this set together with the four operators +, -, *, and /. II is similar to ZZ in some useful ways; it is not, however, closed under the two basic operations, so it is clearly not a ring.
emin and emax

The minimum and maximum values of the exponent in the set of floating-point numbers with a given length.

εmin and εmax

The minimum and maximum spacings around 1 in the set of floating-point numbers with a given length.

ε or εmach

The machine epsilon, the same as εmin.
[·]c
The computer version of the object ·.
NaN
Not-a-Number.
Notation Relating to Random Variables

A common function used with continuous random variables is a density function, and a common function used with discrete random variables is a probability function. The more fundamental function for either type of random variable is the cumulative distribution function, or CDF. The CDF of a random variable X, denoted by P_X(x) or just by P(x), is defined by

    P(x) = Pr(X ≤ x),

where "Pr", or "probability", can be taken here as a primitive (it is defined in terms of a measure). For vectors (of the same length), "X ≤ x" means that each element of X is less than or equal to the corresponding element of x. Both the CDF and the density or probability function for a d-dimensional random variable are defined over IR^d. (It is unfortunately necessary to state that "P(x)" means the "function P evaluated at x", and likewise "P(y)" means the same "function P evaluated at y" unless P has been redefined. Using a different expression as the argument does not redefine the function despite the sloppy convention adopted by some statisticians—including myself sometimes!)

The density for a continuous random variable is just the derivative of the CDF (if it exists). The CDF is therefore the integral. To keep the notation simple, we likewise consider the probability function for a discrete random variable to be a type of derivative (a Radon–Nikodym derivative) of the CDF. Instead of expressing the CDF of a discrete random variable as a sum over a countable set, we often also express it as an integral. (In this case, however, the integral is over a set whose ordinary Lebesgue measure is 0.)

A useful analog of the CDF for a random sample is the empirical cumulative distribution function, or ECDF. For a sample of size n, the ECDF is

    Pₙ(x) = (1/n) Σ_{i=1}^{n} I_(−∞,x](xᵢ)

for the indicator function I_(−∞,x](·).

Functions and operators such as Cov and E that are commonly associated with Latin letters or groups of Latin letters are generally represented by that letter in a Roman font.

Pr(A)
The probability of the event A.
pX (·) or PX (·)
The probability density function (or probability function), or the cumulative probability function, of the random variable X.
pXY (·) or PXY (·)
The joint probability density function (or probability function), or the joint cumulative probability function, of the random variables X and Y .
pX|Y (·) or PX|Y (·)
The conditional probability density function (or probability function), or the conditional cumulative probability function, of the random variable X given the random variable Y (these functions are random variables).
pX|y (·) or PX|y (·)
The conditional probability density function (or probability function), or the conditional cumulative probability function, of the random variable X given the realization y. Sometimes, the notation above is replaced by a similar notation in which the arguments indicate the nature of the distribution; for example, p(x, y) or p(x|y).
pθ (·) or Pθ (·)
The probability density function (or probability function), or the cumulative probability function, of the distribution characterized by the parameter θ.
Y ∼ DX (θ)
The random variable Y is distributed as DX (θ), where X is the name of a random variable associated with the distribution, and θ is a parameter of the distribution. The subscript may take forms similar to those used in the density and distribution functions, such as X|y, or it may be omitted. Alternatively, in place of DX , a symbol denoting a specific distribution may be used. An example is Z ∼ N(0, 1), which means that Z has a normal distribution with mean 0 and variance 1.
CDF
A cumulative distribution function.
ECDF
An empirical cumulative distribution function.
i.i.d.

Independent and identically distributed.

X⁽ⁱ⁾ →ᵈ X or Xᵢ →ᵈ X

The sequence of random variables X⁽ⁱ⁾ or Xᵢ converges in distribution to the random variable X. (The difference in the notation X⁽ⁱ⁾ and Xᵢ is generally unimportant. The former notation is often used to emphasize the iterative nature of a process.)
E(g(X))
The expected value of the function g of the random variable X. The notation EP (·), where P is a cumulative distribution function or some other identifier of a probability distribution, is sometimes used to indicate explicitly the distribution with respect to which the expectation is evaluated.
V(g(X))
The variance of the function g of the random variable X. The notation VP (·) is also often used.
Cov(X, Y )
The covariance of the random variables X and Y . The notation CovP (·, ·) is also often used.
Cov(X)
The variance-covariance matrix of the vector random variable X.
Corr(X, Y )
The correlation of the random variables X and Y . The notation CorrP (·, ·) is also often used.
Corr(X)
The correlation matrix of the vector random variable X.
Bias(T, θ) or Bias(T)

The bias of the estimator T (as an estimator of θ); that is,

    Bias(T, θ) = E(T) − θ.

MSE(T, θ) or MSE(T)

The mean squared error of the estimator T (as an estimator of θ); that is,

    MSE(T, θ) = (Bias(T, θ))² + V(T).
General Mathematical Functions and Operators

Functions such as sin, max, span, and so on that are commonly associated with groups of Latin letters are generally represented by those letters in a Roman font. Generally, the argument of a function is enclosed in parentheses: sin(x). Often, for the very common functions, the parentheses are omitted: sin x. In expressions involving functions, parentheses are generally used for clarity, for example, (E(X))² instead of E²(X).

Operators such as d (the differential operator) that are commonly associated with a Latin letter are generally represented by that letter in a Roman font.

|x|
The modulus of the real or complex number x; if x is real, |x| is the absolute value of x.
⌈x⌉

The ceiling function evaluated at the real number x: ⌈x⌉ is the smallest integer greater than or equal to x.
⌊x⌋

The floor function evaluated at the real number x: ⌊x⌋ is the largest integer less than or equal to x.
#S
The cardinality of the set S.
I_S(·)

The indicator function:

    I_S(x) = 1 if x ∈ S;
    I_S(x) = 0 otherwise.
If x is a scalar, the set S is often taken as the interval (−∞, y], and, in this case, the indicator function is the Heaviside function, H, evaluated at the difference of the argument and the upper bound on the interval:

    I_(−∞,y](x) = H(y − x).

(An alternative definition of the Heaviside function is the same as this, except that H(0) = 1/2.) In higher dimensions, the set S is often taken as the product set,

    A_d = (−∞, y₁] × (−∞, y₂] × · · · × (−∞, y_d]
        = A₁ × A₂ × · · · × A_d,

and, in this case,

    I_{A_d}(x) = I_{A₁}(x₁) I_{A₂}(x₂) · · · I_{A_d}(x_d),

where x = (x₁, x₂, . . . , x_d). The derivative of the indicator function is the Dirac delta function, δ(·).
The Dirac delta “function”, defined by δ(x) = 0 for x = 0, 0016
and
∞
δ(t) dt = 1. −∞
The Dirac delta function is not a function in the usual sense. For any continuous function f , we have the useful fact 0016 ∞ 0016 ∞ f (y) dI(−∞,y] (x) = f (y) δ(y − x) dy −∞
−∞
= f (x). minf (·) or min(S)
The minimum value of the real scalar-valued function f , or the smallest element in the countable set of real numbers S.
argminf (·)
The value of the argument of the real scalar-valued function f that yields its minimum value.
320
APPENDIX A. NOTATION AND DEFINITIONS
⊕
Bitwise binary exclusive-or (see page 39).
O(f (n))
Big O; g(n) = O(f (n)) means that there exists a positive constant M such that |g(n)| ≤ M |f (n)| as n → ∞. g(n) = O(1) means that g(n) is bounded from above.
d
The differential operator. The derivative with respect to the d . variable x is denoted by dx 0002
f 0007 , f 00070007 , . . . , f k For the scalar-valued function f of a scalar variable, differentiation (with respect to an implied variable) taken on the function once, twice, . . ., k times. x ¯
The mean of a sample of objects generically denoted by x.
x•
The 0001 sum of the elements in the object x. More generally, xi•k = j xijk .
x−
The multiplicative inverse of x with respect to some modulus (see page 36).
Special Functions log x
The natural logarithm evaluated at x.
sin x
The sine evaluated at x (in radians) and similarly for other trigonometric functions.
x!
The factorial of x. If x is a positive integer, x! = x(x−1) · · · 2·1. For other values of x, except negative integers, x! is often defined as x! = Γ(x + 1).
Γ(α)
The (complete) gamma function. For α not equal to a nonpositive integer, 0016 ∞ tα−1 e−t dt. Γ(α) = 0
We have the useful relationship √ Γ(α) = (α − 1)!. An important argument is 12 , and Γ( 12 ) = π.
Γₓ(α)

The incomplete gamma function:

    Γₓ(α) = ∫₀^x t^(α−1) e^(−t) dt.
B(α, β)

The (complete) beta function:

    B(α, β) = ∫₀¹ t^(α−1) (1 − t)^(β−1) dt,

where α > 0 and β > 0. A useful relationship is

    B(α, β) = Γ(α)Γ(β) / Γ(α + β).

Bₓ(α, β)

The incomplete beta function:

    Bₓ(α, β) = ∫₀^x t^(α−1) (1 − t)^(β−1) dt.
Appendix B
Solutions and Hints for Selected Exercises

1.5.
With a = 17, the correlation of pairs of successive numbers should be about 0.09, and the plot should show 17 lines. With a = 85, the correlation of lag 1 is about 0.03, but the correlation of lag 2 is about −0.09.
1.8.
35 planes for 65 541 and 15 planes for 65 533.
1.10.
950 706 376, 129 027 171, 1 728 259 899, 365 181 143, 1 966 843 080, 1 045 174 992, 636 176 783, 1 602 900 997, 640 853 092, 429 916 489.
1.13.
We seek x₀ such that 16 807x₀ − (2³¹ − 1)c₁ = 2³¹ − 2 for some integer c₁. First, observe that 2³¹ − 2 is equivalent to −1, so we use Euler's method (see, e.g., Ireland and Rosen, 1991, or Fang and Wang, 1994) with that simpler value and write

    x₀ = ((2³¹ − 1)c₁ − 1) / 16 807
       = 127 773c₁ + (2836c₁ − 1) / 16 807.

Because the latter term must also be an integer, we write

    16 807c₂ = 2836c₁ − 1,

or

    c₁ = 5c₂ + (2627c₂ + 1) / 2836

for some integer c₂. Continuing,

    c₂ = c₃ + (209c₃ − 1) / 2627,
    c₃ = 12c₄ + (119c₄ + 1) / 209,
    c₄ = c₅ + (90c₅ − 1) / 119,
    c₅ = c₆ + (29c₆ + 1) / 90,
    c₆ = 3c₇ + (3c₇ − 1) / 29,
    c₇ = 9c₈ + (2c₈ + 1) / 3,
    c₈ = c₉ + (c₉ − 1) / 2.

Letting c₉ = 1, we can backsolve to get x₀ = 739 806 647.
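A quick cross-check of this backsolution (ours, not part of the book's solution): the congruence 16 807x₀ ≡ 2³¹ − 2 (mod 2³¹ − 1) can be solved directly with a modular inverse. Note that pow(a, -1, m) requires Python 3.8 or later.

    # Solve 16807 * x0 = 2**31 - 2 (mod 2**31 - 1) via a modular inverse.
    m = 2**31 - 1
    x0 = (2**31 - 2) * pow(16807, -1, m) % m
    print(x0)   # 739806647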
1.14.
Using Maple, for example,

    > pr := 0:
    > while pr < 8191 do
    >   pr := primroot(pr, 8191)
    > od;

yields the 1728 primitive roots, starting with the smallest one, 17, and going through the largest, 8180. To use primroot, you may first have to attach the number theory package: with(numtheory):.
1.15.
0.5.
2.2c.
The distribution is degenerate with probability 1 for r = min(n, m); that is, the matrix is of full rank with probability 1.
2.3.
Out of the 100 trials, 97 times the maximum element is in position 1311. The test is not really valid because the seeds are all relatively small and are very close together. Try the same test but with 100 randomly generated seeds.
4.1a.
X is a random variable with an absolutely continuous distribution function P. Let Y be the random variable P(X). Then, for 0 ≤ t ≤ 1, using the existence of P⁻¹,

    Pr(Y ≤ t) = Pr(P(X) ≤ t)
              = Pr(X ≤ P⁻¹(t))
              = P(P⁻¹(t))
              = t.

Hence, Y has a U(0, 1) distribution.

4.2.
Let Z be the random variable delivered. For any x, because Y (from the density g) and U are independent, we have

    Pr(Z ≤ x) = Pr( Y ≤ x | U ≤ p(Y)/(cg(Y)) )

              = ( ∫₋∞ˣ ∫₀^{p(t)/(cg(t))} g(t) ds dt ) / ( ∫₋∞^∞ ∫₀^{p(t)/(cg(t))} g(t) ds dt )

              = ∫₋∞ˣ p(t) dt,

the distribution function corresponding to p. Differentiating this quantity with respect to x yields p(x).

4.4c.
Using the relationship

    (1/√(2π)) e^(−x²/2) ≤ (1/√(2π)) e^(1/2 − |x|)

(see Devroye, 1986a), we have the following algorithm, after simplification.

1. Generate x from the double exponential, and generate u from U(0, 1).
2. If x² + 1 − 2|x| ≤ −2 log u, then deliver x; otherwise, go to step 1.
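A direct coding of this algorithm (our sketch, not the book's program) follows; the double exponential variate is generated by attaching a random sign to a standard exponential.

    # Standard normal variates by rejection from the double exponential.
    import math, random

    def rnorm_rejection():
        while True:
            x = -math.log(random.random())     # Exp(1) magnitude
            if random.random() < 0.5:          # random sign
                x = -x
            u = random.random()
            # accept if x**2 + 1 - 2|x| <= -2 log u,
            # i.e. if u <= exp(-(|x| - 1)**2 / 2)
            if x * x + 1.0 - 2.0 * abs(x) <= -2.0 * math.log(u):
                return x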
4.5.
As x → ∞, there is no c such that cg(x) ≥ p(x), where g is the normal density and p is the exponential density.
4.7a.
E(T) = c; V(T) = c² − c. (Note that c ≥ 1.)
4.8.
For any t, we have

    Pr(X ≤ t) = Pr(X ≤ s + rh)   (for 0 ≤ r ≤ 1)
              = Pr( U ≤ r | V ≤ U + p(s + hU)/b )
              = ( ∫₀ʳ ∫ᵤ^{u + p(s+hu)/b} 2 dv du ) / ( ∫₀¹ ∫ᵤ^{u + p(s+hu)/b} 2 dv du )
              = ( ∫₀ʳ (p(s + hu)/b) du ) / ( ∫₀¹ (p(s + hu)/b) du )
              = ∫ₛᵗ p(x) dx,
where all of the symbols correspond to those in Algorithm 4.7 with the usual convention of uppercase representing random variables and lowercase representing constants or realizations of random variables.
5.2b.
We can consider only the case in which τ ≥ 0; otherwise, we could make use of the symmetry of the normal distribution and split the algorithm into two regions. Also, for simplicity, we can generate truncated normals with µ = 0 and σ² = 1 and then shift and scale just as we do for the full normal distribution. The probability of acceptance is the ratio of the area under the truncated normal density to the area under the truncated exponential (the majorizing function). For an exponential with parameter λ and truncated at τ, the density is

    g(x) = λ e^(−λ(x−τ)).

To scale this so that it majorizes the truncated normal density requires a constant c that does not depend on λ. We can write the probability of acceptance as

    cλ e^(λτ − λ²/2).

Maximizing this quantity (by taking the derivative with respect to λ and equating it to 0) yields the equation

    λ² − λτ − 1 = 0.
5.3.
Use the fact that U and 1 − U have the same distribution.
5.6b.
A simple program using the IMSL routine bnrdf can be used to compute r. Here is a fragment of code that will work:

       10 pl = bnrdf(z1,z2,rl)
          ph = bnrdf(z1,z2,rh)
          if (abs(ph-pl) .le. eps) go to 99
          rt = rl + (rh-rl)/2.
          pt = bnrdf(z1,z2,rt)
          if (pt .gt. prob) then
             rh = rt
          else
             rl = rt
          endif
          go to 10
       99 continue
          print *, rl

5.7b.
We need the partitions of Σ and of Σ⁻¹:

    Σ = [ Σ₁₁  Σ₁₂ ]        Σ⁻¹ = [ T₁₁  T₁₂ ]
        [ Σ₂₁  Σ₂₂ ],              [ T₂₁  T₂₂ ].

Now,

    T₁₁⁻¹ = Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁

(see Gentle, 1998, page 61).
The conditional distribution of X₁ given X₂ = x₂ is N_{d₁}(µ₁ + Σ₁₂Σ₂₂⁻¹(x₂ − µ₂), T₁₁⁻¹) (see any book on multivariate distributions, such as Kotz, Balakrishnan, and Johnson, 2000). Hence, first, take y₁ as rnorm(d1). Then,

    x₁ = T₁₁^(−1/2) y₁ + µ₁ + Σ₁₂Σ₂₂⁻¹(x₂ − µ₂),

where T₁₁^(−1/2) is the Cholesky factor of T₁₁⁻¹, that is, of Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁.

If Y₁ is a d₁-variate random variable with a standard circular normal distribution, and X₁ has the given relationship, then

    E(X₁ | X₂ = x₂) = µ₁ + Σ₁₂Σ₂₂⁻¹(x₂ − µ₂)

and

    V(X₁ | X₂ = x₂) = T₁₁^(−1/2) V(Y₁) (T₁₁^(−1/2))ᵀ = T₁₁⁻¹.
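A sketch of this construction in code (ours; numpy-based, with our own argument names) follows.

    # One draw from X1 | X2 = x2 for a partitioned multivariate normal with
    # mean (mu1, mu2) and covariance [[S11, S12], [S12.T, S22]].
    import numpy as np

    def cond_mvn_sample(rng, mu1, mu2, S11, S12, S22, x2):
        A = S12 @ np.linalg.inv(S22)        # regression coefficient matrix
        cond_mean = mu1 + A @ (x2 - mu2)
        cond_cov = S11 - A @ S12.T          # = T11^{-1} in the notation above
        L = np.linalg.cholesky(cond_cov)    # the "T11^{-1/2}" factor
        y1 = rng.standard_normal(len(mu1))  # standard circular normal
        return cond_mean + L @ y1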
6.2a.
The problem with random sampling using a pseudorandom number generator is that the fixed relationships in the generator must be ignored – else there can be no simple random sample larger than 1. On the other hand, if these fixed relationships are ignored, then it does not make sense to speak of a period.
7.2b.
The set is a random sample from the distribution with density f .
7.3b.
The integral can be reduced to

    ∫₀² √(π/y) cos(πy) dy.

Generate yᵢ as 2uᵢ, where the uᵢ are from U(0, 1), and estimate the integral as

    (2/n) Σᵢ √(π/yᵢ) cos(πyᵢ).
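A direct coding of this estimate (our sketch):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    y = 2.0 * rng.random(n)                 # y_i = 2 u_i, so y is U(0, 2)
    est = 2.0 * np.mean(np.sqrt(np.pi / y) * np.cos(np.pi * y))
    print(est)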
The order of the variance is O(n−2 ). The order is obviously dependent on the dimension of the integral, however, and, in higher dimensions, it is not competitive with the crude Monte Carlo method.
7.9a.
Generate xi from a gamma(3, 2) distribution, and take your estimator as 0001 sin(πxi ) . 16 n
7.10b. The optimum is l = d. 7.10d. An unbiased estimator for θ is d2 (n1 + n2 ) . (dl − l2 )n The optimum is l = d.
328
APPENDIX B. SOLUTIONS AND HINTS FOR EXERCISES
7.14a.
(
∗
x ) EP0017(¯
1 ∗ = EP0017 x n i i 1 = E (x∗ ) n i P0017 i 1 x¯ = n i
)
= x ¯. Note that the empirical distribution is a conditional distribution, given the sample. With the sample fixed, x ¯ is a “parameter” rather than a “statistic”. 7.14b.
(
∗
x ) EP (¯
= = = =
1 ∗ EP x n i i 1 EP (x∗i ) n i 1 µ n i µ.
)
Alternatively, EP (¯ x∗ )
7.14c. First, note that
EP EP0017(¯ x∗ ) = EP (¯ x) = µ.
=
x∗j ) = x ¯, EP0017(¯ 11 VP0017(¯ x∗j ) = ¯)2 , (xi − x nn
and VP0017(¯ x∗j ) =
1 (xi − x ¯ )2 . mn2
Now, EP0017(V )
=
1 2 E x¯∗j − x ¯∗j m − 1 P0017 j
APPENDIX B. SOLUTIONS AND HINTS FOR EXERCISES
= = = = =
329
1 2 E x ¯∗2 x∗j j − m¯ m − 1 P0017 j 1 2 m m 2 m¯ x + ¯)2 − m¯ x2 − − x ¯ ) (x (xi − x i m−1 n mn2 0012 0011
m 1 1 2 2 (x (x − x ¯ ) − − x ¯ ) i i m − 1 n2 n2 1 (xi − x ¯)2 n2 1 2 σ . n P0017
7.14d. EP (V )
= EP (EP0017(V )) 0012 0011 1 = EP (xi − x¯)2 /n n 1 n−1 2 = σP . n n
This page intentionally left blank
Bibliography As might be expected, the literature in the interface of computer science, numerical analysis, and statistics is quite diverse, and articles on random number generation and Monte Carlo methods are likely to appear in journals devoted to quite different disciplines. There are at least ten journals and serials with titles that contain some variants of both “computing” and “statistics”, but there are far more journals in numerical analysis and in areas such as “computational physics”, “computational biology”, and so on that publish articles relevant to the fields of statistical computing and computational statistics. Many of the methods of computational statistics involve random number generation and Monte Carlo methods. The journals in the mainstream of statistics also have a large proportion of articles in the fields of statistical computing and computational statistics because, as we suggested in the preface, recent developments in statistics and in the computational sciences have paralleled each other to a large extent. There are two well-known learned societies with a primary focus in statistical computing: the International Association for Statistical Computing (IASC), which is an affiliated society of the International Statistical Institute, and the Statistical Computing Section of the American Statistical Association (ASA). The Statistical Computing Section of the ASA has a regular newsletter carrying news and notices as well as articles on practicum. Also, the activities of the Society for Industrial and Applied Mathematics (SIAM) are often relevant to computational statistics. There are two regular conferences in the area of computational statistics: COMPSTAT, held biennially in Europe and sponsored by the IASC, and the Interface Symposium, generally held annually in North America and sponsored by the Interface Foundation of North America with cooperation from the Statistical Computing Section of the ASA. In addition to literature and learned societies in the traditional forms, an important source of communication and a repository of information are computer databases and forums. In some cases, the databases duplicate what is available in some other form, but often the material and the communications facilities provided by the computer are not available elsewhere.
331
332
BIBLIOGRAPHY
Literature in Computational Statistics In the Library of Congress classification scheme, most books on statistics, including statistical computing, are in the QA276 section, although some are classified under H, HA, and HG. Numerical analysis is generally in QA279 and computer science in QA76. Many of the books in the interface of these disciplines are classified in these or other places within QA. Current Index to Statistics, published annually by the American Statistical Association and the Institute for Mathematical Statistics, contains both author and subject indexes that are useful in finding journal articles or books in statistics. The Index is available in hard copy and on CD-ROM. The Association for Computing Machinery (ACM) publishes an annual index, by author, title, and keyword, of the literature in the computing sciences. Mathematical Reviews, published by the American Mathematical Society (AMS), contains brief reviews of articles in all areas of mathematics. The areas of “Statistics”, “Numerical Analysis”, and “Computer Science” contain reviews of articles relevant to computational statistics. The papers reviewed in Mathematical Reviews are categorized according to a standard system that has slowly evolved over the years. In this taxonomy, called the AMS MR classification system, “Statistics” is 62Xyy; “Numerical Analysis”, including random number generation, is 65Xyy; and “Computer Science” is 68Xyy. (“X” represents a letter and “yy” represents a two-digit number.) Mathematical Reviews is available to subscribers via the World Wide Web at MathSciNet: http://www.ams.org/mathscinet/ There are various handbooks of mathematical functions and formulas that are useful in numerical computations. Three that should be mentioned are Abramowitz and Stegun (1964), Spanier and Oldham (1987), and Thompson (1997). Anyone doing serious scientific computations should have ready access to at least one of these volumes. Almost all journals in statistics have occasional articles on computational statistics and statistical computing. The following is a list of journals, proceedings, and newsletters that emphasize this field. ACM Transactions on Mathematical Software, published quarterly by the ACM (Association for Computing Machinery). This journal publishes algorithms in Fortran and C. The ACM collection of algorithms is sometimes called CALGO. The algorithms published during the period 1975 through 1999 are available on a CD-ROM from ACM. Most of the algorithms are available through netlib at http://www.netlib.org/liblist.html ACM Transactions on Modeling and Computer Simulation, published quarterly by the ACM. Applied Statistics, published quarterly by the Royal Statistical Society. (Until 1998, it included algorithms in Fortran. Some of these algorithms, with cor-
BIBLIOGRAPHY
333
rections, were collected by Griffiths and Hill, 1985. Most of the algorithms are available through statlib at Carnegie Mellon University.) Communications in Statistics — Simulation and Computation, published quarterly by Marcel Dekker. (Until 1996, it included algorithms in Fortran. Until 1982, this journal was designated as Series B.) Computational Statistics, published quarterly by Physica-Verlag (formerly called Computational Statistics Quarterly). Computational Statistics. Proceedings of the xxth Symposium on Computational Statistics (COMPSTAT), published biennially by Physica-Verlag. (It is not refereed.) Computational Statistics & Data Analysis, published by North Holland. The number of issues per year varies. (This is also the official journal of the International Association for Statistical Computing, and as such incorporates the Statistical Software Newsletter.) Computing Science and Statistics. This is an annual publication containing papers presented at the Interface Symposium. Until 1992, these proceedings were named Computer Science and Statistics: Proceedings of the xxth Symposium on the Interface. (The 24th symposium was held in 1992.) In 1997, Volume 29 was published in two issues: Number 1, which contains the papers of the regular Interface Symposium, and Number 2, which contains papers from another conference. The two numbers are not sequentially paginated. Since 1999, the proceedings have been published only in CD-ROM form, by the Interface Foundation of North America. (It is not refereed.) Journal of Computational and Graphical Statistics, published quarterly by the American Statistical Association. Journal of Statistical Computation and Simulation, published irregularly in four numbers per volume by Gordon and Breach. Proceedings of the Statistical Computing Section, published annually by the American Statistical Association. (It is not refereed.) SIAM Journal on Scientific Computing, published bimonthly by SIAM. This journal was formerly SIAM Journal on Scientific and Statistical Computing. (Is this a step backward?) Statistical Computing & Graphics Newsletter, published quarterly by the Statistical Computing and the Statistical Graphics Sections of the American Statistical Association. (It is not refereed and is not generally available in university libraries.) Statistics and Computing, published quarterly by Chapman & Hall. There are two journals whose contents are primarily in the subject area of random number generation, simulation, and Monte Carlo methods: ACM Transactions on Modeling and Computer Simulation (Volume 1 appeared in 1992) and Monte Carlo Methods and Applications (Volume 1 appeared in 1995). There has been a series of conferences concentrating on this area (with an emphasis on quasirandom methods). The first International Conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing was held in Las Vegas, Nevada, in 1994. The fifth was held in Singapore in 2002.
The proceedings of the conferences have been published in the Lecture Notes in Statistics series of Springer-Verlag. The proceedings of the first conference were published as Niederreiter and Shiue (1995); those of the second as Niederreiter et al. (1998); those of the third as Niederreiter and Spanier (1999); and those of the fourth as Fang, Hickernell, and Niederreiter (2002).

The proceedings of the CRYPTO conferences often contain interesting articles on uniform random number generation, with an emphasis on cryptographic applications. These proceedings are published in the Lecture Notes in Computer Science series of Springer-Verlag under the name Proceedings of CRYPTO XX, where XX is a two-digit number representing the year of the conference.

There are a number of textbooks, monographs, and survey articles on random number generation and Monte Carlo methods. Some of particular note (listed alphabetically) are Bratley, Fox, and Schrage (1987), Dagpunar (1988), Deák (1990), Devroye (1986a), Fishman (1996), Knuth (1998), L'Ecuyer (1990), L'Ecuyer and Hellekalek (1998), Lewis and Orav (1989), Liu (2001), Morgan (1984), Niederreiter (1992, 1995c), Ripley (1987), Robert and Casella (1999), and Tezuka (1995).
World Wide Web, News Groups, List Servers, and Bulletin Boards

The best way of storing information is in a digital format that can be accessed by computers. In some cases, the best way for people to access information is by computers. In other cases, the best way is via hard copy, which means that the information stored on the computer must go through a printing process resulting in books, journals, or loose pages.

The references that I have cited in this text are generally traditional books, journal articles, or compact discs. This usually means that the material has been reviewed by someone other than the author. It also means that the author possibly has newer thoughts on the same material. The Internet provides a mechanism for the dissemination of large volumes of information that can be updated readily. The ease of providing material electronically is also the source of the major problem with the material: it is often half-baked and has not been reviewed critically. Another reason that I have refrained from making frequent reference to material available over the Internet is the unreliability of some sites. The average life of a Web site is measured in weeks.

For statistics, one of the most useful sites on the Internet is the electronic repository statlib, maintained at Carnegie Mellon University, which contains programs, datasets, and other items of interest. The URL is
http://lib.stat.cmu.edu
The collection of algorithms published in Applied Statistics is available in statlib. These algorithms are sometimes called the ApStat algorithms.
Another very useful site for scientific computing is netlib, which was established by research workers at AT&T (now Lucent) Bell Laboratories and national laboratories, primarily Oak Ridge National Laboratory. The URL is
http://www.netlib.org
The Collected Algorithms of the ACM (CALGO), which are the Fortran, C, and Algol programs published in ACM Transactions on Mathematical Software (or in Communications of the ACM prior to 1975), are available in netlib under the TOMS link.

The Guide to Available Mathematical Software (GAMS) can be accessed at
http://gams.nist.gov
A different interface, using Java, is available at
http://math.nist.gov/HotGAMS/

A good set of links for software is the Econometric Links of the Econometrics Journal (which are not limited to econometrics):
http://www.eur.nl/few/ei/links/software.html

There are two major problems in using the WWW to gather information. One is the sheer quantity of information and the number of sites providing information. The other is the "kiosk problem": anyone can put up material. Sadly, the average quality is affected by a very large denominator. The kiosk problem may be even worse than a random selection of material; the "fools in public places" syndrome is much in evidence.

There is not much that can be done about the second problem. It was not solved for traditional postings on uncontrolled kiosks, and it will not be solved on the WWW. For the first problem, there are remarkable programs that automatically crawl through WWW links to build a database that can be searched for logical combinations of terms and phrases. Such systems and databases have been built by several people and companies. One of the most useful is Google at
http://google.stanford.edu
A very widely used search program is Yahoo at
http://www.yahoo.com
A neophyte can be quickly disabused of an exaggerated sense of the value of such search engines by doing a search on "Monte Carlo".

It is not clear at this time what the media for the scientific literature will be within a few years. Many of the traditional journals will be converted to an electronic version of some kind; journals will become Web sites. That is certain; the details, however, are much less certain. Many bulletin boards and discussion groups have already evolved into "electronic journals". A publisher of a standard commercial journal has stated that "we reject 80% of the articles submitted to our journal; those are the ones you can find on the Web".
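Because repositories such as statlib and netlib serve their collections as ordinary files over HTTP, an item can be retrieved by a short script rather than through a browser. The following is a minimal sketch in Python, not part of any algorithm discussed in this book; it assumes only the standard library, and the helper name fetch is mine. The URL used is the netlib library list cited earlier in this section.

    # Minimal sketch: download one file from a repository such as netlib
    # or statlib. The host name appears in the text above; any other path
    # should be taken from the repository's own index.
    import urllib.request

    def fetch(url, outfile):
        # Read the entire response body and save it to a local file.
        with urllib.request.urlopen(url) as response:
            data = response.read()
        with open(outfile, "wb") as f:
            f.write(data)
        print(outfile, "-", len(data), "bytes")

    if __name__ == "__main__":
        fetch("http://www.netlib.org/liblist.html", "liblist.html")

The same pattern works for any of the URLs listed in this section; for retrieving a whole collection, a mirroring tool is more appropriate than fetching files one at a time.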
References for Software Packages

There is a wide range of software used in the computational sciences. Some of the software is produced by a single individual who is happy to share the software, sometimes for a fee, but who has no interest in maintaining the software. At the other extreme is software produced by large commercial companies whose continued existence depends on a process of production, distribution, and maintenance of the software. Information on much of the software can be obtained from GAMS. Some of the free software can be obtained from statlib or netlib.

The names of many software packages are trade names or trademarks. In this book, the use of names, even if the name is not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
References to the Literature

The following bibliography obviously covers a wide range of topics in random number generation and Monte Carlo methods. Except for a few of the general references, all of these entries have been cited in the text.

The purpose of this bibliography is to help the reader get more information; hence, I eschew "personal communications" and references to technical reports that may or may not exist. Those kinds of references are generally for the author rather than for the reader.

In some cases, important original papers have been reprinted in special collections, such as Samuel Kotz and Norman L. Johnson (Editors) (1997), Breakthroughs in Statistics, Volume III, Springer-Verlag, New York. In most such cases, because the special collection may be more readily available, I list both sources.
A Note on the Names of Authors

In these references, I have generally used the names of authors as they appear in the original sources. This may mean that the same author will appear with different forms of names, sometimes with given names spelled out and sometimes abbreviated. In the author index, beginning on page 371, I use a single name for the same author. The name is generally the most complete (i.e., least abbreviated) of any of the names of that author in any of the references. This convention may occasionally result in an entry in the author index that does not occur exactly in any of the references. For example, a reference to J. Paul Jones together with one to John P. Jones, if I know that the two names refer to the same person, would result in an author index entry for John Paul Jones.

Abramowitz, Milton, and Irene A. Stegun (Editors) (1964), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, National Bureau of Standards (NIST), Washington. (Reprinted by Dover Publications, New York, 1974. Work on an updated version is occurring at NIST; see http://dlmf.nist.gov/ for the current status.)
Afflerbach, L., and H. Grothe (1985), Calculation of Minkowski-reduced lattice bases, Computing 35, 269–276.
Afflerbach, Lothar, and Holger Grothe (1988), The lattice structure of pseudorandom vectors generated by matrix generators, Journal of Computational and Applied Mathematics 23, 127–131.
Afflerbach, L., and W. Hörmann (1992), Nonuniform random numbers: A sensitivity analysis for transformation methods, International Workshop on Computationally Intensive Methods in Simulation and Optimization (edited by U. Dieter and G. C. Pflug), Springer-Verlag, Berlin, 374.
Agarwal, Satish K., and Jamal A. Al-Saleh (2001), Generalized gamma type distribution and its hazard rate function, Communications in Statistics — Theory and Methods 30, 309–318.
Agresti, Alan (1992), A survey of exact inference for contingency tables (with discussion), Statistical Science 7, 131–177.
Ahn, Hongshik, and James J. Chen (1995), Generation of over-dispersed and under-dispersed binomial variates, Journal of Computational and Graphical Statistics 4, 55–64.
Ahrens, J. H. (1995), A one-table method for sampling from continuous and discrete distributions, Computing 52, 127–146.
Ahrens, J. H., and U. Dieter (1972), Computer methods for sampling from the exponential and normal distributions, Communications of the ACM 15, 873–882.
Ahrens, J. H., and U. Dieter (1974), Computer methods for sampling from gamma, beta, Poisson, and binomial distributions, Computing 12, 223–246.
Ahrens, J. H., and U. Dieter (1980), Sampling from binomial and Poisson distributions: A method with bounded computation times, Computing 25, 193–208.
Ahrens, J. H., and U. Dieter (1985), Sequential random sampling, ACM Transactions on Mathematical Software 11, 157–169.
Ahrens, Joachim H., and Ulrich Dieter (1988), Efficient, table-free sampling methods for the exponential, Cauchy and normal distributions, Communications of the ACM 31, 1330–1337. (See also Hamilton, 1998.)
Ahrens, J. H., and U. Dieter (1991), A convenient sampling method with bounded computation times for Poisson distributions, The Frontiers of Statistical Computation, Simulation & Modeling (edited by P. R. Nelson, E. J. Dudewicz, A. Öztürk, and E. C. van der Meulen), American Sciences Press, Columbus, Ohio, 137–149.
Akima, Hiroshi (1970), A new method of interpolation and smooth curve fitting based on local procedures, Journal of the ACM 17, 589–602.
Albert, James; Mohan Delampady; and Wolfgang Polasek (1991), A class of distributions for robustness studies, Journal of Statistical Planning and Inference 28, 291–304.
Alonso, Laurent, and René Schott (1995), Random Generation of Trees: Random Generators in Science, Kluwer Academic Publishers, Boston.
Altman, N. S. (1989), Bit-wise behavior of random number generators, SIAM Journal on Scientific and Statistical Computing 9, 941–949.
Aluru, S.; G. M. Prabhu; and John Gustafson (1992), A random number generator for parallel computers, Parallel Computing 18, 839–847.
Anderson, N. H., and D. M. Titterington (1993), Cross-correlation between simultaneously generated sequences of pseudo-random uniform deviates, Statistics and Computing 3, 61–65.
Anderson, T. W.; I. Olkin; and L. G. Underhill (1987), Generation of random orthogonal matrices, SIAM Journal on Scientific and Statistical Computing 8, 625–629.
Andrews, D. F.; P. J. Bickel; F. R. Hampel; P. J. Huber; W. H. Rogers; and J. W. Tukey (1972), Robust Estimates of Location: Survey and Advances, Princeton University Press, Princeton, New Jersey.
Antonov, I. A., and V. M. Saleev (1979), An economic method of computing LPτ-sequences, USSR Computational Mathematics and Mathematical Physics 19, 252–256.
Arnason, A. N., and L. Baniuk (1978), A computer generation of Dirichlet variates, Proceedings of the Eighth Manitoba Conference on Numerical Mathematics and Computing, Utilitas Mathematica Publishing, Winnipeg, 97–105.
Arnold, Barry C. (1983), Pareto Distributions, International Co-operative Publishing House, Fairland, Maryland.
Arnold, Barry C., and Robert J. Beaver (2000), The skew-Cauchy distribution, Statistics and Probability Letters 49, 285–290.
Arnold, Barry C., and Robert J. Beaver (2002), Skewed multivariate models related to hidden truncation and/or selective reporting (with discussion), Test 11, 7–54.
Arnold, Barry C.; Robert J. Beaver; Richard A. Groeneveld; and William Q. Meeker (1993), The nontruncated marginal of a truncated bivariate normal distribution, Psychometrika 58, 471–488.
Atkinson, A. C. (1979), A family of switching algorithms for the computer generation of beta random variates, Biometrika 66, 141–145.
Atkinson, A. C. (1980), Tests of pseudo-random numbers, Applied Statistics 29, 164–171.
Atkinson, A. C. (1982), The simulation of generalized inverse Gaussian and hyperbolic random variables, SIAM Journal on Scientific and Statistical Computing 3, 502–515.
Atkinson, A. C., and M. C. Pearce (1976), The computer generation of beta, gamma and normal random variables (with discussion), Journal of the Royal Statistical Society, Series A 139, 431–460.
Avramidis, Athanassios N., and James R. Wilson (1995), Correlation-induction techniques for estimating quantiles in simulation experiments, Proceedings of the 1995 Winter Simulation Conference, Association for Computing Machinery, New York, 268–277.
Azzalini, A., and A. Dalla Valle (1996), The multivariate skew-normal distribution, Biometrika 83, 715–726.
Bacon-Shone, J. (1985), Algorithm AS210: Fitting five parameter Johnson SB curves by moments, Applied Statistics 34, 95–100.
Bailey, David H., and Richard E. Crandall (2001), On the random character of fundamental constant expansions, Experimental Mathematics 10, 175–190.
Bailey, Ralph W. (1994), Polar generation of random variates with the t-distribution, Mathematics of Computation 62, 779–781.
Balakrishnan, N., and R. A. Sandhu (1995), A simple simulation algorithm for generating progressive Type-II censored samples, The American Statistician 49, 229–230.
Banerjia, Sanjeev, and Rex A. Dwyer (1993), Generating random points in a ball, Communications in Statistics — Simulation and Computation 22, 1205–1209.
Banks, David L. (1998), Testing random number generators, Proceedings of the Statistical Computing Section, ASA, 102–107.
Barnard, G. A. (1963), Discussion of Bartlett, "The spectral analysis of point processes", Journal of the Royal Statistical Society, Series B 25, 264–296.
Barndorff-Nielsen, Ole E., and Neil Shephard (2001), Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics (with discussion), Journal of the Royal Statistical Society, Series B 63, 167–241.
Bays, Carter, and S. D. Durham (1976), Improving a poor random number generator, ACM Transactions on Mathematical Software 2, 59–64.
Beck, J., and W. W. L. Chen (1987), Irregularities of Distribution, Cambridge University Press, Cambridge, United Kingdom.
Becker, P. J., and J. J. J. Roux (1981), A bivariate extension of the gamma distribution, South African Statistical Journal 15, 1–12.
Becker, Richard A.; John M. Chambers; and Allan R. Wilks (1988), The New S Language, Wadsworth & Brooks/Cole, Pacific Grove, California.
Beckman, Richard J., and Michael D. McKay (1987), Monte Carlo estimation under different distributions using the same simulation, Technometrics 29, 153–160.
Bélisle, Claude J. P.; H. Edwin Romeijn; and Robert L. Smith (1993), Hit-and-run algorithms for generating multivariate distributions, Mathematics of Operations Research 18, 255–266.
Bendel, R. B., and M. R. Mickey (1978), Population correlation matrices for sampling experiments, Communications in Statistics — Simulation and Computation B7, 163–182.
Berbee, H. C. P.; C. G. E. Boender; A. H. G. Rinnooy Kan; C. L. Scheffer; R. L. Smith; and J. Telgen (1987), Hit-and-run algorithms for the identification of nonredundant linear inequalities, Mathematical Programming 37, 184–207.
Best, D. J. (1983), A note on gamma variate generators with shape parameter less than unity, Computing 30, 185–188.
Best, D. J., and N. I. Fisher (1979), Efficient simulation of the von Mises distribution, Applied Statistics 28, 152–157.
Beyer, W. A. (1972), Lattice structure and reduced bases of random vectors generated by linear recurrences, Applications of Number Theory to Numerical Analysis (edited by S. K. Zaremba), Academic Press, New York, 361–370.
Beyer, W. A.; R. B. Roof; and D. Williamson (1971), The lattice structure of multiplicative congruential pseudo-random vectors, Mathematics of Computation 25, 345–363.
Bhanot, Gyan (1988), The Metropolis algorithm, Reports on Progress in Physics 51, 429–457.
Birkes, David, and Yadolah Dodge (1993), Alternative Methods of Regression, John Wiley & Sons, New York.
Blum, L.; M. Blum; and M. Shub (1986), A simple unpredictable pseudorandom number generator, SIAM Journal on Computing 15, 364–383.
Bouleau, Nicolas, and Dominique Lépingle (1994), Numerical Methods for Stochastic Processes, John Wiley & Sons, New York.
Boyar, J. (1989), Inferring sequences produced by pseudo-random number generators, Journal of the ACM 36, 129–141.
Boyett, J. M. (1979), Random R × C tables with given row and column totals, Applied Statistics 28, 329–332.
Braaten, E., and G. Weller (1979), An improved low-discrepancy sequence for multidimensional quasi-Monte Carlo integration, Journal of Computational Physics 33, 249–258.
Bratley, Paul, and Bennett L. Fox (1988), Algorithm 659: Implementing Sobol's quasirandom sequence generator, ACM Transactions on Mathematical Software 14, 88–100.
Bratley, Paul; Bennett L. Fox; and Harald Niederreiter (1992), Implementation and tests of low-discrepancy sequences, ACM Transactions on Modeling and Computer Simulation 2, 195–213.
Bratley, Paul; Bennett L. Fox; and Harald Niederreiter (1994), Algorithm 738: Programs to generate Niederreiter's low-discrepancy sequences, ACM Transactions on Mathematical Software 20, 494–495.
Bratley, Paul; Bennett L. Fox; and Linus E. Schrage (1987), A Guide to Simulation, second edition, Springer-Verlag, New York.
Brooks, S. P., and G. O. Roberts (1999), Assessing convergence of iterative simulations, Statistics and Computing 8, 319–335.
Brophy, John F.; James E. Gentle; Jing Li; and Philip W. Smith (1989), Software for advanced architecture computers, Computer Science and Statistics: Proceedings of the Twenty-first Symposium on the Interface (edited by Kenneth Berk and Linda Malone), American Statistical Association, Alexandria, Virginia, 116–120.
Brown, Morton B., and Judith Bromberg (1984), An efficient two-stage procedure for generating random variates from the multinomial distribution, The American Statistician 38, 216–219.
Buckheit, Jonathan B., and David L. Donoho (1995), WaveLab and reproducible research, Wavelets and Statistics (edited by Anestis Antoniadis and Georges Oppenheim), Springer-Verlag, New York, 55–81.
Buckle, D. J. (1995), Bayesian inference for stable distributions, Journal of the American Statistical Association 90, 605–613.
Burr, I. W. (1942), Cumulative frequency functions, Annals of Mathematical Statistics 13, 215–232.
Burr, Irving W., and Peter J. Cislak (1968), On a general system of distributions. I. Its curve-shape characteristics. II. The sample median, Journal of the American Statistical Association 63, 627–635.
Cabrera, Javier, and Dianne Cook (1992), Projection pursuit indices based on fractal dimension, Computing Science and Statistics 24, 474–477.
Caflisch, Russel E., and Bradley Moskowitz (1995), Modified Monte Carlo methods using quasi-random sequences, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (edited by Harald Niederreiter and Peter Jau-Shyong Shiue), Springer-Verlag, New York, 1–16.
Carlin, Bradley P., and Thomas A. Louis (1996), Bayes and Empirical Bayes Methods for Data Analysis, Chapman & Hall, New York.
Carta, David G. (1990), Two fast implementations of the "minimal standard" random number generator, Communications of the ACM 33, Number 1 (January), 87–88.
Casella, George, and Edward I. George (1992), Explaining the Gibbs sampler, The American Statistician 46, 167–174.
Chalmers, C. P. (1975), Generation of correlation matrices with given eigen-structure, Journal of Statistical Computation and Simulation 4, 133–139.
Chamayou, J.-F. (2001), Pseudo random numbers for the Landau and Vavilov distributions, Computational Statistics 19, 131–152.
Chambers, John M. (1997), The evolution of the S language, Computing Science and Statistics 28, 331–337.
Chambers, J. M.; C. L. Mallows; and B. W. Stuck (1976), A method for simulating stable random variables, Journal of the American Statistical Association 71, 340–344 (Corrections, 1987, ibid. 82, 704, and 1988, ibid. 83, 581).
Chen, H. C., and Y. Asau (1974), On generating random variates from an empirical distribution, AIIE Transactions 6, 163–166.
Chen, Huifen, and Bruce W. Schmeiser (1992), Simulation of Poisson processes with trigonometric rates, Proceedings of the 1992 Winter Simulation Conference, Association for Computing Machinery, New York, 609–617.
Chen, Ming-Hui, and Bruce Schmeiser (1993), Performance of the Gibbs, hit-and-run, and Metropolis samplers, Journal of Computational and Graphical Statistics 3, 251–272.
Chen, Ming-Hui, and Bruce W. Schmeiser (1996), General hit-and-run Monte Carlo sampling for evaluating multidimensional integrals, Operations Research Letters 19, 161–169.
Chen, Ming-Hui; Qi-Man Shao; and Joseph G. Ibrahim (2000), Monte Carlo Methods in Bayesian Computation, Springer-Verlag, New York.
Cheng, R. C. H. (1978), Generating beta variates with nonintegral shape parameters, Communications of the ACM 21, 317–322.
Cheng, R. C. H. (1984), Generation of inverse Gaussian variates with given sample mean and dispersion, Applied Statistics 33, 309–316.
Cheng, R. C. H. (1985), Generation of multivariate normal samples with given mean and covariance matrix, Journal of Statistical Computation and Simulation 21, 39–49.
Cheng, R. C. H., and G. M. Feast (1979), Some simple gamma variate generators, Applied Statistics 28, 290–295.
Cheng, R. C. H., and G. M. Feast (1980), Gamma variate generators with increased shape parameter range, Communications of the ACM 23, 389–393.
Chernick, Michael R. (1999), Bootstrap Methods: A Practitioner's Guide, John Wiley & Sons, New York.
Chib, Siddhartha, and Edward Greenberg (1995), Understanding the Metropolis–Hastings algorithm, The American Statistician 49, 327–335.
Chou, Wun-Seng, and Harald Niederreiter (1995), On the lattice test for inversive congruential pseudorandom numbers, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (edited by Harald Niederreiter and Peter Jau-Shyong Shiue), Springer-Verlag, New York, 186–197.
Chou, Youn-Min; S. Turner; S. Henson; D. Meyer; and K. S. Chen (1994), On using percentiles to fit data by a Johnson distribution, Communications in Statistics — Simulation and Computation 23, 341–354.
Cipra, Barry A. (1987), An introduction to the Ising model, The American Mathematical Monthly 94, 937–959.
Coldwell, R. L. (1974), Correlational defects in the standard IBM 360 random number generator and the classical ideal gas correlational function, Journal of Computational Physics 14, 223–226.
Collings, Bruce Jay (1987), Compound random number generators, Journal of the American Statistical Association 82, 525–527.
Compagner, Aaldert (1991), Definitions of randomness, American Journal of Physics 59, 700–705.
Compagner, A. (1995), Operational conditions for random-number generation, Physical Review E 52, 5634–5645.
Cook, R. Dennis, and Mark E. Johnson (1981), A family of distributions for modelling non-elliptically symmetric multivariate data, Journal of the Royal Statistical Society, Series B 43, 210–218.
Cook, R. Dennis, and Mark E. Johnson (1986), Generalized Burr–Pareto-logistic distributions with applications to a uranium exploration data set, Technometrics 28, 123–131.
Couture, R., and Pierre L'Ecuyer (1994), On the lattice structure of certain linear congruential sequences related to AWC/SWB generators, Mathematics of Computation 62, 799–808.
Couture, Raymond, and Pierre L'Ecuyer (1995), Linear recurrences with carry as uniform random number generators, Proceedings of the 1995 Winter Simulation Conference, Association for Computing Machinery, New York, 263–267.
Couture, Raymond, and Pierre L'Ecuyer (1997), Distribution properties of multiply-with-carry random number generators, Mathematics of Computation 66, 591–607.
Coveyou, R. R., and R. D. MacPherson (1967), Fourier analysis of uniform random number generators, Journal of the ACM 14, 100–119.
Cowles, Mary Kathryn, and Bradley P. Carlin (1996), Markov chain Monte Carlo convergence diagnostics: A comparative review, Journal of the American Statistical Association 91, 883–904.
Cowles, Mary Kathryn; Gareth O. Roberts; and Jeffrey S. Rosenthal (1999), Possible biases induced by MCMC convergence diagnostics, Journal of Statistical Computation and Simulation 64, 87–104.
Cowles, Mary Kathryn, and Jeffrey S. Rosenthal (1998), A simulation approach to convergence rates for Markov chain Monte Carlo algorithms, Statistics and Computing 8, 115–124.
Cuccaro, Steven A.; Michael Mascagni; and Daniel V. Pryor (1994), Techniques for testing the quality of parallel pseudorandom number generators, Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, Society for Industrial and Applied Mathematics, Philadelphia, 279–284.
Currin, Carla; Toby J. Mitchell; Max Morris; and Don Ylvisaker (1991), Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments, Journal of the American Statistical Association 86, 953–963.
D'Agostino, Ralph B. (1986), Tests for the normal distribution, Goodness-of-Fit Techniques (edited by Ralph B. D'Agostino and Michael A. Stephens), Marcel Dekker, New York, 367–419.
Dagpunar, J. S. (1978), Sampling of variates from a truncated gamma distribution, Journal of Statistical Computation and Simulation 8, 59–64.
Dagpunar, John (1988), Principles of Random Variate Generation, Clarendon Press, Oxford, United Kingdom.
Dagpunar, J. (1990), Sampling from the von Mises distribution via a comparison of random numbers, Journal of Applied Statistics 17, 165–168.
Damien, Paul; Purushottam W. Laud; and Adrian F. M. Smith (1995), Approximate random variate generation from infinitely divisible distributions with applications to Bayesian inference, Journal of the Royal Statistical Society, Series B 57, 547–563.
Damien, Paul, and Stephen G. Walker (2001), Sampling truncated normal, beta, and gamma densities, Journal of Computational and Graphical Statistics 10, 206–215.
David, Herbert A. (1981), Order Statistics, second edition, John Wiley & Sons, New York.
Davis, Charles S. (1993), The computer generation of multinomial random variates, Computational Statistics & Data Analysis 16, 205–217.
Davis, Don; Ross Ihaka; and Philip Fenstermacher (1994), Cryptographic randomness from air turbulence in disk drives, Advances in Cryptology — CRYPTO '94 (edited by Yvo G. Desmedt), Springer-Verlag, New York, 114–120.
Davison, A. C., and D. V. Hinkley (1997), Bootstrap Methods and Their Application, Cambridge University Press, Cambridge, United Kingdom.
Deák, I. (1981), An economical method for random number generation and a normal generator, Computing 27, 113–121.
Deák, I. (1986), The economical method for generating random samples from discrete distributions, ACM Transactions on Mathematical Software 12, 34–36.
Deák, István (1990), Random Number Generators and Simulation, Akadémiai Kiadó, Budapest.
Dellaportas, Petros (1995), Random variate transformations in the Gibbs sampler: Issues of efficiency and convergence, Statistics and Computing 5, 133–140.
Dellaportas, P., and A. F. M. Smith (1993), Bayesian inference for generalized linear and proportional hazards models via Gibbs sampling, Applied Statistics 42, 443–459.
De Matteis, A., and S. Pagnutti (1990), Long-range correlations in linear and non-linear random number generators, Parallel Computing 14, 207–210.
De Matteis, A., and S. Pagnutti (1993), Long-range correlation analysis of the Wichmann–Hill random number generator, Statistics and Computing 3, 67–70.
Deng, Lih-Yuan; Kwok Hung Chan; and Yilian Yuan (1994), Random number generators for multiprocessor systems, International Journal of Modelling and Simulation 14, 185–191.
Deng, Lih-Yuan, and E. Olusegun George (1990), Generation of uniform variates from several nearly uniformly distributed variables, Communications in Statistics — Simulation and Computation 19, 145–154.
Deng, Lih-Yuan, and E. Olusegun George (1992), Some characterizations of the uniform distribution with applications to random number generation, Annals of the Institute of Statistical Mathematics 44, 379–385.
Deng, Lih-Yuan, and Dennis K. J. Lin (2000), Random number generation for the new century, The American Statistician 54, 145–150.
Deng, L.-Y.; D. K. J. Lin; J. Wang; and Y. Yuan (1997), Statistical justification of combination generators, Statistica Sinica 7, 993–1003.
Devroye, L. (1984a), Random variate generation for unimodal and monotone densities, Computing 32, 43–68.
Devroye, L. (1984b), A simple algorithm for generating random variates with a log-concave density, Computing 33, 247–257.
Devroye, Luc (1986a), Non-Uniform Random Variate Generation, Springer-Verlag, New York.
Devroye, Luc (1986b), An automatic method for generating random variates with a given characteristic function, SIAM Journal on Applied Mathematics 46, 698–719.
Devroye, Luc (1987), A simple generator for discrete log-concave distributions, Computing 39, 87–91.
Devroye, Luc (1989), On random variate generation when only moments or Fourier coefficients are known, Mathematics and Computers in Simulation 31, 71–79.
Devroye, Luc (1991), Algorithms for generating discrete random variables with a given generating function or a given moment sequence, SIAM Journal on Scientific and Statistical Computing 12, 107–126.
Devroye, Luc (1997), Random variate generation for multivariate unimodal densities, ACM Transactions on Modeling and Computer Simulation 7, 447–477.
Devroye, Luc; Peter Epstein; and Jörg-Rüdiger Sack (1993), On generating random intervals and hyperrectangles, Journal of Computational and Graphical Statistics 2, 291–308.
Dieter, U. (1975), How to calculate shortest vectors in a lattice, Mathematics of Computation 29, 827–833.
Do, Kim-Anh (1991), Quasi-random resampling for the bootstrap, Computer Science and Statistics: Proceedings of the Twenty-third Symposium on the Interface (edited by Elaine M. Keramidas), Interface Foundation of North America, Fairfax, Virginia, 297–300.
Dodge, Yadolah (1996), A natural random number generator, International Statistical Review 64, 329–344.
Doucet, Arnaud; Nando de Freitas; and Neil Gordon (Editors) (2001), Sequential Monte Carlo Methods in Practice, Springer-Verlag, New York.
Efron, Bradley, and Robert J. Tibshirani (1993), An Introduction to the Bootstrap, Chapman & Hall, New York.
Eichenauer, J.; H. Grothe; and J. Lehn (1988), Marsaglia's lattice test and non-linear congruential pseudo random number generators, Metrika 35, 241–250.
Eichenauer, J., and H. Niederreiter (1988), On Marsaglia's lattice test for pseudorandom numbers, Manuscripta Mathematica 62, 245–248.
Eichenauer, Jürgen, and Jürgen Lehn (1986), A non-linear congruential pseudo random number generator, Statistische Hefte 27, 315–326.
Eichenauer-Herrmann, Jürgen (1995), Pseudorandom number generation by nonlinear methods, International Statistical Review 63, 247–255.
Eichenauer-Herrmann, Jürgen (1996), Modified explicit inversive congruential pseudorandom numbers with power of 2 modulus, Statistics and Computing 6, 31–36.
Eichenauer-Herrmann, J., and H. Grothe (1989), A remark on long-range correlations in multiplicative congruential pseudorandom number generators, Numerische Mathematik 56, 609–611.
Eichenauer-Herrmann, J., and H. Grothe (1990), Upper bounds for the Beyer ratios of linear congruential generators, Journal of Computational and Applied Mathematics 31, 73–80.
Eichenauer-Herrmann, Jürgen; Eva Herrmann; and Stefan Wegenkittl (1998), A survey of quadratic and inversive congruential pseudorandom numbers, Monte Carlo and Quasi-Monte Carlo Methods 1996 (edited by Harald Niederreiter, Peter Hellekalek, Gerhard Larcher, and Peter Zinterhof), Springer-Verlag, New York, 66–97.
Eichenauer-Herrmann, J., and K. Ickstadt (1994), Explicit inversive congruential pseudorandom numbers with power of 2 modulus, Mathematics of Computation 62, 787–797.
Emrich, Lawrence J., and Marion R. Piedmonte (1991), A method for generating high-dimensional multivariate binary variates, The American Statistician 45, 302–304.
Erber, T.; P. Everett; and P. W. Johnson (1979), The simulation of random processes on digital computers with Chebyshev mixing transformations, Journal of Computational Physics 32, 168–211.
Ernst, Michael D. (1998), A multivariate generalized Laplace distribution, Computational Statistics 13, 227–232.
Evans, Michael, and Tim Swartz (2000), Approximating Integrals via Monte Carlo and Deterministic Methods, Oxford University Press, Oxford, United Kingdom.
Everitt, B. S. (1998), The Cambridge Dictionary of Statistics, Cambridge University Press, Cambridge, United Kingdom.
Everson, Philip J., and Carl N. Morris (2000), Simulation from Wishart distributions with eigenvalue constraints, Journal of Computational and Graphical Statistics 9, 380–389.
Falk, Michael (1999), A simple approach to the generation of uniformly distributed random variables with prescribed correlations, Communications in Statistics — Simulation and Computation 28, 785–791.
Fang, Kai-Tai, and T. W. Anderson (Editors) (1990), Statistical Inference in Elliptically Contoured and Related Distributions, Allerton Press, New York.
Fang, K.-T.; F. J. Hickernell; and H. Niederreiter (Editors) (2002), Monte Carlo and Quasi-Monte Carlo Methods 2000, Springer-Verlag, New York.
Fang, Kai-Tai, and Run-Ze Li (1997), Some methods for generating both an NT-net and the uniform distribution on a Stiefel manifold and their applications, Computational Statistics & Data Analysis 24, 29–46.
Fang, Kai-Tai, and Yuan Wang (1994), Number Theoretic Methods in Statistics, Chapman & Hall, New York.
Faure, H. (1986), On the star discrepancy of generalised Hammersley sequences in two dimensions, Monatshefte für Mathematik 101, 291–300.
Ferrenberg, Alan M.; D. P. Landau; and Y. Joanna Wong (1992), Monte Carlo simulations: Hidden errors from "good" random number generators, Physical Review Letters 69, 3382–3384.
Fill, James A. (1998), An interruptible algorithm for perfect sampling via Markov chains, Annals of Applied Probability 8, 131–162.
Fill, James Allen; Motoya Machida; Duncan J. Murdoch; and Jeffrey S. Rosenthal (2000), Extensions of Fill's perfect rejection sampling algorithm to general chains, Random Structures and Algorithms 17, 290–316.
Fishman, George S. (1996), Monte Carlo: Concepts, Algorithms, and Applications, Springer-Verlag, New York.
Fishman, George S., and Louis R. Moore, III (1982), A statistical evaluation of multiplicative random number generators with modulus 2^31 − 1, Journal of the American Statistical Association 77, 129–136.
Fishman, George S., and Louis R. Moore, III (1986), An exhaustive analysis of multiplicative congruential random number generators with modulus 2^31 − 1, SIAM Journal on Scientific and Statistical Computing 7, 24–45.
Fleishman, Allen I. (1978), A method for simulating non-normal distributions, Psychometrika 43, 521–532.
Flournoy, Nancy, and Robert K. Tsutakawa (Editors) (1991), Statistical Multiple Integration, American Mathematical Society (Contemporary Mathematics, Volume 115), Providence, Rhode Island.
Forster, Jonathan J.; John W. McDonald; and Peter W. F. Smith (1996), Monte Carlo exact conditional tests for log-linear and logistic models, Journal of the Royal Statistical Society, Series B 55, 3–24.
Fouque, Jean-Pierre; George Papanicolaou; and K. Ronnie Sircar (2000), Derivatives in Financial Markets with Stochastic Volatility, Cambridge University Press, Cambridge, United Kingdom.
Fox, Bennett L. (1986), Implementation and relative efficiency of quasirandom sequence generators, ACM Transactions on Mathematical Software 12, 362–376.
Frederickson, P.; R. Hiromoto; T. L. Jordan; B. Smith; and T. Warnock (1984), Pseudo-random trees in Monte Carlo, Parallel Computing 1, 175–180.
Freimer, Marshall; Govind S. Mudholkar; Georgia Kollia; and Thomas C. Lin (1988), A study of the generalized Tukey lambda family, Communications in Statistics — Theory and Methods 17, 3547–3567.
Freund, John E. (1961), A bivariate extension of the exponential distribution, Journal of the American Statistical Association 56, 971–977.
Friedman, Jerome H.; Jon Louis Bentley; and Raphael Ari Finkel (1977), An algorithm for finding best matches in logarithmic expected time, ACM Transactions on Mathematical Software 3, 209–226.
Frigessi, Arnoldo; Fabio Martinelli; and Julian Stander (1997), Computational complexity of Markov chain Monte Carlo methods for finite Markov random fields, Biometrika 84, 1–18.
Fuller, A. T. (1976), The period of pseudo-random numbers generated by Lehmer's congruential method, Computer Journal 19, 173–177.
Fushimi, Masanori (1990), Random number generation with the recursion X_t = X_{t-3p} ⊕ X_{t-3q}, Journal of Computational and Applied Mathematics 31, 105–118.
Gamerman, Dani (1997), Markov Chain Monte Carlo, Chapman & Hall, London.
Gange, Stephen J. (1995), Generating multivariate categorical variates using the iterative proportional fitting algorithm, The American Statistician 49, 134–138.
Gelfand, Alan E., and Adrian F. M. Smith (1990), Sampling-based approaches to calculating marginal densities, Journal of the American Statistical Association 85, 398–409. (Reprinted in Samuel Kotz and Norman L. Johnson (Editors) (1997), Breakthroughs in Statistics, Volume III, Springer-Verlag, New York, 526–550.)
Gelfand, Alan E., and Sujit K. Sahu (1994), On Markov chain Monte Carlo acceleration, Journal of Computational and Graphical Statistics 3, 261–276.
Gelman, Andrew (1992), Iterative and non-iterative simulation algorithms, Computing Science and Statistics 24, 433–438.
Gelman, Andrew, and Xiao-Li Meng (1998), Simulating normalizing constants: From importance sampling to bridge sampling to path sampling, Statistical Science 13, 163–185.
Gelman, Andrew, and Donald B. Rubin (1992a), Inference from iterative simulation using multiple sequences (with discussion), Statistical Science 7, 457–511.
Gelman, Andrew, and Donald B. Rubin (1992b), A single series from the Gibbs sampler provides a false sense of security, Bayesian Statistics 4 (edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith), Oxford University Press, Oxford, United Kingdom, 625–631.
Gelman, Andrew; John B. Carlin; Hal S. Stern; and Donald B. Rubin (1995), Bayesian Data Analysis, Chapman & Hall, London.
Geman, Stuart, and Donald Geman (1984), Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741.
Gentle, James E. (1981), Portability considerations for random number generators, Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface (edited by William F. Eddy), Springer-Verlag, New York, 158–164.
Gentle, James E. (1990), Computer implementation of random number generators, Journal of Computational and Applied Mathematics 31, 119–125.
Gentleman, Robert, and Ross Ihaka (1997), The R language, Computing Science and Statistics 28, 326–330.
Gerontidis, I., and R. L. Smith (1982), Monte Carlo generation of order statistics from general distributions, Applied Statistics 31, 238–243.
Geweke, John (1991a), Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints, Computer Science and Statistics: Proceedings of the Twenty-third Symposium on the Interface (edited by Elaine M. Keramidas), Interface Foundation of North America, Fairfax, Virginia, 571–578.
Geweke, John (1991b), Generic, algorithmic approaches to Monte Carlo integration in Bayesian inference, Statistical Multiple Integration (edited by Nancy Flournoy and Robert K. Tsutakawa), American Mathematical Society, Providence, Rhode Island, 117–135.
Geyer, Charles J. (1992), Practical Markov chain Monte Carlo (with discussion), Statistical Science 7, 473–511.
Geyer, Charles J., and Elizabeth A. Thompson (1995), Annealing Markov chain Monte Carlo with applications to ancestral inference, Journal of the American Statistical Association 90, 909–920.
Ghitany, M. E. (1998), On a recent generalization of gamma distribution, Communications in Statistics — Theory and Methods 27, 223–233.
Gilks, W. R. (1992), Derivative-free adaptive rejection sampling for Gibbs sampling, Bayesian Statistics 4 (edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith), Oxford University Press, Oxford, United Kingdom, 641–649.
Gilks, W. R.; N. G. Best; and K. K. C. Tan (1995), Adaptive rejection Metropolis sampling within Gibbs sampling, Applied Statistics 44, 455–472 (Corrections, Gilks et al., 1997, ibid. 46, 541–542).
Gilks, W. R.; S. Richardson; and D. J. Spiegelhalter (Editors) (1996), Markov Chain Monte Carlo in Practice, Chapman & Hall, London.
Gilks, Walter R., and Gareth O. Roberts (1996), Strategies for improving MCMC, Markov Chain Monte Carlo in Practice (edited by W. R. Gilks, S. Richardson, and D. J. Spiegelhalter), Chapman & Hall, London, 89–114.
Gilks, W. R.; G. O. Roberts; and E. I. George (1994), Adaptive direction sampling, The Statistician 43, 179–189.
Gilks, W. R.; A. Thomas; and D. J. Spiegelhalter (1992), Software for the Gibbs sampler, Computing Science and Statistics 24, 439–448.
Gilks, W. R.; A. Thomas; and D. J. Spiegelhalter (1994), A language and program for complex Bayesian modelling, The Statistician 43, 169–178.
Gilks, W. R., and P. Wild (1992), Adaptive rejection sampling for Gibbs sampling, Applied Statistics 41, 337–348.
Gleser, Leon Jay (1976), A canonical representation for the noncentral Wishart distribution useful for simulation, Journal of the American Statistical Association 71, 690–695.
Golder, E. R., and J. G. Settle (1976), The Box–Muller method for generating pseudo-random normal deviates, Applied Statistics 25, 12–20.
Golomb, S. W. (1982), Shift Register Sequences, second edition, Aegean Park Press, Laguna Hills, California.
Gordon, J. (1989), Fast multiplicative inverse in modular arithmetic, Cryptography and Coding (edited by H. J. Beker and F. C. Piper), Clarendon Press, Oxford, United Kingdom, 269–279.
Gordon, N. J.; D. J. Salmond; and A. F. M. Smith (1993), Novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEE Proceedings F, Communications, Radar, and Signal Processing 140, 107–113.
Grafton, R. G. T. (1981), The runs-up and runs-down tests, Applied Statistics 30, 81–85.
Greenwood, J. Arthur (1976a), The demands of trivial combinatorial problems on random number generators, Proceedings of the Ninth Interface Symposium on Computer Science and Statistics (edited by David C. Hoaglin and Roy E. Welsch), Prindle, Weber, and Schmidt, Boston, 222–227.
Greenwood, J. A. (1976b), A fast machine-independent long-period generator for 31-bit pseudo-random numbers, Compstat 1976: Proceedings in Computational Statistics (edited by J. Gordesch and P. Naeve), Physica-Verlag, Vienna, 30–36.
Greenwood, J. Arthur (1976c), Moments of time to generate random variables by rejection, Annals of the Institute of Statistical Mathematics 28, 399–401.
Griffiths, P., and I. D. Hill (Editors) (1985), Applied Statistics Algorithms, Ellis Horwood Limited, Chichester, United Kingdom.
Grothe, H. (1987), Matrix generators for pseudo-random vector generation, Statistische Hefte 28, 233–238.
Guerra, Victor O.; Richard A. Tapia; and James R. Thompson (1976), A random number generator for continuous random variables based on an interpolation procedure of Akima, Computer Science and Statistics: 9th Annual Symposium on the Interface (edited by David C. Hoaglin and Roy E. Welsch), Prindle, Weber, and Schmidt, Boston, 228–230.
Halton, J. H. (1960), On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals, Numerische Mathematik 2, 84–90 (Corrections, 1960, ibid. 2, 190).
Hamilton, Kenneth G. (1998), Algorithm 780: Exponential pseudorandom distribution, ACM Transactions on Mathematical Software 24, 102–106.
Hammersley, J. M., and D. C. Handscomb (1964), Monte Carlo Methods, Methuen & Co., London.
Hartley, H. O., and D. L. Harris (1963), Monte Carlo computations in normal correlation procedures, Journal of the ACM 10, 302–306.
Hastings, W. K. (1970), Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57, 97–109. (Reprinted in Samuel Kotz and Norman L. Johnson (Editors) (1997), Breakthroughs in Statistics, Volume III, Springer-Verlag, New York, 240–256.)
Heiberger, Richard M. (1978), Algorithm AS127: Generation of random orthogonal matrices, Applied Statistics 27, 199–205. (See Tanner and Thisted, 1982.)
Hellekalek, P. (1984), Regularities of special sequences, Journal of Number Theory 18, 41–55.
Hesterberg, Tim (1995), Weighted average importance sampling and defensive mixture distributions, Technometrics 37, 185–194.
Hesterberg, Timothy C., and Barry L. Nelson (1998), Control variates for probability and quantile estimation, Management Science 44, 1295–1312.
Hickernell, Fred J. (1995), A comparison of random and quasirandom points for multidimensional quadrature, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (edited by Harald Niederreiter and Peter Jau-Shyong Shiue), Springer-Verlag, New York, 212–227.
Hill, I. D.; R. Hill; and R. L. Holder (1976), Algorithm AS99: Fitting Johnson curves by moments, Applied Statistics 25, 180–189 (Remark, 1981, ibid. 30, 106).
Hoaglin, David C., and David F. Andrews (1975), The reporting of computation-based results in statistics, The American Statistician 29, 122–126.
Hope, A. C. A. (1968), A simplified Monte Carlo significance test procedure, Journal of the Royal Statistical Society, Series B 30, 582–598.
Hopkins, T. R. (1983), A revised algorithm for the spectral test, Applied Statistics 32, 328–335. (See http://www.cs.ukc.ac.uk/pubs/1997 for an updated version.)
Hörmann, W. (1994a), A universal generator for discrete log-concave distributions, Computing 52, 89–96.
Hörmann, Wolfgang (1994b), A note on the quality of random variates generated by the ratio of uniforms method, ACM Transactions on Modeling and Computer Simulation 4, 96–106.
Hörmann, Wolfgang (1995), A rejection technique for sampling from T-concave distributions, ACM Transactions on Mathematical Software 21, 182–193.
Hörmann, Wolfgang (2000), Algorithm 802: An automatic generator for bivariate log-concave distributions, ACM Transactions on Mathematical Software 26, 201–219.
Hörmann, Wolfgang, and Gerhard Derflinger (1993), A portable random number generator well suited for the rejection method, ACM Transactions on Mathematical Software 19, 489–495.
Hörmann, Wolfgang, and Gerhard Derflinger (1994), The transformed rejection method for generating random variables, an alternative to the ratio of uniforms method, Communications in Statistics — Simulation and Computation 23, 847–860.
Hosack, J. M. (1986), The use of Chebyshev mixing to generate pseudo-random numbers, Journal of Computational Physics 67, 482–486.
Huber, Peter J. (1985), Projection pursuit (with discussion), The Annals of Statistics 13, 435–525.
Hull, John C. (2000), Options, Futures, & Other Derivatives, Prentice–Hall, Englewood Cliffs, New Jersey.
Ireland, Kenneth, and Michael Rosen (1991), A Classical Introduction to Modern Number Theory, Springer-Verlag, New York.
Jäckel, Peter (2002), Monte Carlo Methods in Finance, John Wiley & Sons Ltd., Chichester.
Jaditz, Ted (2000), Are the digits of π an independent and identically distributed sequence?, The American Statistician 54, 12–16.
James, F. (1990), A review of pseudorandom number generators, Computer Physics Communications 60, 329–344.
James, F. (1994), RANLUX: A Fortran implementation of the high-quality pseudorandom number generator of Lüscher, Computer Physics Communications 79, 111–114.
Jöhnk, M. D. (1964), Erzeugung von betaverteilten und gammaverteilten Zufallszahlen, Metrika 8, 5–15.
Johnson, Mark E. (1987), Multivariate Statistical Simulation, John Wiley & Sons, New York.
Johnson, Valen E. (1996), Studying convergence of Markov chain Monte Carlo algorithms using coupled sample paths, Journal of the American Statistical Association 91, 154–166.
Jones, G.; C. D. Lai; and J. C. W. Rayner (2000), A bivariate gamma mixture distribution, Communications in Statistics — Theory and Methods 29, 2775–2790.
Joy, Corwin; Phelim P. Boyle; and Ken Seng Tan (1996), Quasi-Monte Carlo methods in numerical finance, Management Science 42, 926–938.
Juneja, Sandeep, and Perwez Shahabuddin (2001), Fast simulation of Markov chains with small transition probabilities, Management Science 47, 547–562.
Kachitvichyanukul, Voratas (1982), Computer Generation of Poisson, Binomial, and Hypergeometric Random Variables, unpublished Ph.D. dissertation, Purdue University, West Lafayette, Indiana.
Kachitvichyanukul, Voratas; Shiow-Wen Cheng; and Bruce Schmeiser (1988), Fast Poisson and binomial algorithms for correlation induction, Journal of Statistical Computation and Simulation 29, 17–33.
Kachitvichyanukul, Voratas, and Bruce Schmeiser (1985), Computer generation of hypergeometric random variates, Journal of Statistical Computation and Simulation 22, 127–145.
Kachitvichyanukul, Voratas, and Bruce W. Schmeiser (1988a), Binomial random variate generation, Communications of the ACM 31, 216–223.
Kachitvichyanukul, Voratas, and Bruce W. Schmeiser (1988b), Algorithm 668: H2PEC: Sampling from the hypergeometric distribution, ACM Transactions on Mathematical Software 14, 397–398.
Kachitvichyanukul, Voratas, and Bruce W. Schmeiser (1990), BTPEC: Sampling from the binomial distribution, ACM Transactions on Mathematical Software 16, 394–397.
Kahn, H., and A. W. Marshall (1953), Methods of reducing sample size in Monte Carlo computations, Journal of the Operations Research Society of America 1, 263–278.
Kankaala, K.; T. Ala-Nissila; and I. Vattulainen (1993), Bit-level correlations in some pseudorandom number generators, Physical Review E 48, R4211–R4214.
Kao, Chiang, and H. C. Tang (1997a), Upper bounds in spectral test for multiple recursive random number generators with missing terms, Computers and Mathematical Applications 33, 113–118.
Kao, Chiang, and H. C. Tang (1997b), Systematic searches for good multiple recursive random number generators, Computers and Operations Research 24, 899–905.
Karian, Zaven A., and Edward J. Dudewicz (1999), Fitting the generalized lambda distribution to data: A method based on percentiles, Communications in Statistics — Simulation and Computation 28, 793–819.
Karian, Zaven A., and Edward J. Dudewicz (2000), Fitting Statistical Distributions, CRC Press, Boca Raton, Florida.
Karian, Zaven A.; Edward J. Dudewicz; and Patrick McDonald (1996), The extended generalized lambda distribution system for fitting distributions to data: History, completion of theory, tables, applications, the "final word" on moment fits, Communications in Statistics — Simulation and Computation 25, 611–642.
Kato, Takashi; Li-ming Wu; and Niro Yanagihara (1996a), On a nonlinear congruential pseudorandom number generator, Mathematics of Computation 65, 227–233.
Kato, Takashi; Li-ming Wu; and Niro Yanagihara (1996b), The serial test for a nonlinear pseudorandom number generator, Mathematics of Computation 65, 761–769.
Kemp, A. W. (1981), Efficient generation of logarithmically distributed pseudo-random variables, Applied Statistics 30, 249–253.
Kemp, A. W. (1990), Patchwork rejection algorithms, Journal of Computational and Applied Mathematics 31, 127–131.
Kemp, C. D. (1986), A modal method for generating binomial variables, Communications in Statistics — Theory and Methods 15, 805–813.
Kemp, C. D., and Adrienne W. Kemp (1987), Rapid generation of frequency tables, Applied Statistics 36, 277–282.
Kemp, C. D., and Adrienne W. Kemp (1991), Poisson random variate generation, Applied Statistics 40, 143–158.
Kinderman, A. J., and J. F. Monahan (1977), Computer generation of random variables using the ratio of uniform deviates, ACM Transactions on Mathematical Software 3, 257–260.
Kinderman, A. J., and J. F. Monahan (1980), New methods for generating Student's t and gamma variables, Computing 25, 369–377.
Kinderman, A. J., and J. G. Ramage (1976), Computer generation of normal random variables, Journal of the American Statistical Association 71, 893–896.
Kirkpatrick, S.; C. D. Gelatt; and M. P. Vecchi (1983), Optimization by simulated annealing, Science 220, 671–679.
Kirkpatrick, Scott, and Erich P. Stoll (1981), A very fast shift-register sequence random number generator, Journal of Computational Physics 40, 517–526.
Kleijnen, Jack P. C. (1977), Robustness of a multiple ranking procedure: A Monte Carlo experiment illustrating design and analysis techniques, Communications in Statistics — Simulation and Computation B6, 235–262.
Knuth, Donald E. (1975), Estimating the efficiency of backtrack programs, Mathematics of Computation 29, 121–136.
Knuth, Donald E. (1998), The Art of Computer Programming, Volume 2, Seminumerical Algorithms, third edition, Addison–Wesley Publishing Company, Reading, Massachusetts.
Kobayashi, K. (1991), On generalized gamma functions occurring in diffraction theory, Journal of the Physical Society of Japan 60, 1501–1512.
Kocis, Ladislav, and William J. Whiten (1997), Computational investigations of low-discrepancy sequences, ACM Transactions on Mathematical Software 23, 266–294.
Koehler, J. R., and A. B. Owen (1996), Computer experiments, Handbook of Statistics, Volume 13 (edited by S. Ghosh and C. R. Rao), Elsevier Science Publishers, Amsterdam, 261–308.
Kotz, Samuel; N. Balakrishnan; and Norman L. Johnson (2000), Continuous Multivariate Distributions, second edition, John Wiley & Sons, New York.
Kovalenko, I. N. (1972), Distribution of the linear rank of a random matrix, Theory of Probability and Its Applications 17, 342–346.
Kozubowski, Tomasz J., and Krzysztof Podgórski (2000), A multivariate and asymmetric generalization of Laplace distribution, Computational Statistics 15, 531–540.
Krawczyk, Hugo (1992), How to predict congruential generators, Journal of Algorithms 13, 527–545.
Krommer, Arnold R., and Christoph W. Ueberhuber (1994), Numerical Integration on Advanced Computer Systems, Springer-Verlag, New York.
Kronmal, Richard A., and Arthur V. Peterson (1979a), On the alias method for generating random variables from a discrete distribution, The American Statistician 33, 214–218.
Kronmal, R. A., and A. V. Peterson (1979b), The alias and alias-rejection-mixture methods for generating random variables from probability distributions, Proceedings of the 1979 Winter Simulation Conference, Institute of Electrical and Electronics Engineers, New York, 269–280.
Kronmal, Richard A., and Arthur V. Peterson (1981), A variant of the acceptance-rejection method for computer generation of random variables, Journal of the American Statistical Association 76, 446–451 (Corrections, 1982, ibid. 77, 954).
Kronmal, Richard A., and Arthur V. Peterson (1984), An acceptance-complement analogue of the mixture-plus-acceptance-rejection method for generating random variables, ACM Transactions on Mathematical Software 10, 271–281.
Kumada, Toshihiro; Hannes Leeb; Yoshiharu Kurita; and Makoto Matsumoto (2000), New primitive t-nomials (t = 3, 5) over GF(2) whose degree is a Mersenne exponent, Mathematics of Computation 69, 811–814.
Lagarias, Jeffrey C. (1993), Pseudorandom numbers, Statistical Science 8, 31–39.
Laud, Purushottam W.; Paul Ramgopal; and Adrian F. M. Smith (1993), Random variate generation from D-distributions, Statistics and Computing 3, 109–112.
Lawrance, A. J. (1992), Uniformly distributed first-order autoregressive time series models and multiplicative congruential random number generators, Journal of Applied Probability 29, 896–903.
Learmonth, G. P., and P. A. W. Lewis (1973), Statistical tests of some widely used and recently proposed uniform random number generators, Computer Science and Statistics: 7th Annual Symposium on the Interface (edited by William J. Kennedy), Statistical Laboratory, Iowa State University, Ames, Iowa, 163–171.
L'Ecuyer, Pierre (1988), Efficient and portable combined random number generators, Communications of the ACM 31, 742–749, 774.
L'Ecuyer, Pierre (1990), Random numbers for simulation, Communications of the ACM 33, Number 10 (October), 85–97.
L'Ecuyer, Pierre (1996), Combined multiple recursive random number generators, Operations Research 44, 816–822.
L'Ecuyer, Pierre (1997), Tests based on sum-functions of spacings for uniform random numbers, Journal of Statistical Computation and Simulation 59, 251–269.
L'Ecuyer, Pierre (1998), Random number generators and empirical tests, Monte Carlo and Quasi-Monte Carlo Methods 1996 (edited by Harald Niederreiter, Peter Hellekalek, Gerhard Larcher, and Peter Zinterhof), Springer-Verlag, New York, 124–138.
L'Ecuyer, Pierre (1999), Good parameters and implementations for combined multiple recursive random number generators, Operations Research 47, 159–164.
L'Ecuyer, Pierre; François Blouin; and Raymond Couture (1993), A search for good multiple recursive random number generators, ACM Transactions on Modeling and Computer Simulation 3, 87–98.
L'Ecuyer, Pierre; Jean-François Cordeau; and Richard Simard (2000), Close-point spatial tests and their application to random number generators, Operations Research 48, 308–317.
L'Ecuyer, Pierre, and Peter Hellekalek (1998), Random number generators: Selection criteria and testing, Random and Quasi-Random Point Sets (edited by Peter Hellekalek and Gerhard Larcher), Springer-Verlag, New York, 223–266.
L'Ecuyer, Pierre, and Richard Simard (1999), Beware of linear congruential generators with multipliers of the form a = ±2^q ± 2^r, ACM Transactions on Mathematical Software 25, 367–374.
L'Ecuyer, Pierre, and Shu Tezuka (1991), Structural properties for two classes of combined random number generators, Mathematics of Computation 57, 735–746.
Lee, A. J. (1993), Generating random binary deviates having fixed marginal distributions and specified degrees of association, The American Statistician 47, 209–215.
Leeb, Hannes, and Stefan Wegenkittl (1997), Inversive and linear congruential pseudorandom number generators in empirical tests, ACM Transactions on Modeling and Computer Simulation 7, 272–286.
356
BIBLIOGRAPHY
pseudorandom number generators in empirical tests, ACM Transactions on Modeling and Computer Simulation 7, 272–286. Lehmer, D. H. (1951), Mathematical methods in large-scale computing units, Proceedings of the Second Symposium on Large Scale Digital Computing Machinery, Harvard University Press, Cambridge, Massachusetts. 141–146. Leva, Joseph L. (1992a), A fast normal random number generator, ACM Transactions on Mathematical Software 18, 449–453. Leva, Joseph L. (1992b), Algorithm 712: A normal random number generator, ACM Transactions on Mathematical Software 18, 454–455. Lewis, P. A. W.; A. S. Goodman; and J. M. Miller (1969), A pseudo-random number generator for the System/360, IBM Systems Journal 8, 136–146. Lewis, P. A. W., and E. J. Orav (1989), Simulation Methodology for Statisticians, Operations Analysts, and Engineers, Volume I, Wadsworth & Brooks/Cole, Pacific Grove, California. Lewis, P. A. W., and G. S. Shedler (1979), Simulation of nonhomogeneous Poisson processes by thinning, Naval Logistics Quarterly 26, 403–413. Lewis, T. G., and W. H. Payne (1973), Generalized feedback shift register pseudorandom number algorithm, Journal of the ACM 20, 456–468. Leydold, Josef (1998), A rejection technique for sampling from log-concave multivariate distributions, ACM Transactions on Modeling and Computer Simulation 8, 254–280. Leydold, Josef (2000), Automatic sampling with the ratio-of-uniforms method, ACM Transactions on Mathematical Software 26, 78–98. Leydold, Josef (2001), A simple universal generator for continuous and discrete univariate T -concave distributions, ACM Transactions on Mathematical Software 27, 66–82. Li, Kim-Hung (1994), Reservoir-sampling algorithms of time complexity O(n(1+ log(N/n))), ACM Transactions on Mathematical Software 20, 481–493. Li, Shing Ted, and Joseph L. Hammond (1975), Generation of pseudo-random numbers with specified univariate distributions and correlation coefficients, IEEE Transactions on Systems, Man, and Cybernetics 5, 557–560. Liao, J. G., and Ori Rosen (2001), Fast and stable algorithms for computing and sampling from the noncentral hypergeometric distribution, The American Statistician 55, 366–369. Liu, Jun S. (1996), Metropolized independent sampling with comparisons to rejection sampling and importance sampling, Statistics and Computing 6, 113–119. Liu, Jun S. (2001), Monte Carlo Strategies in Scientific Computing, SpringerVerlag, New York. Liu, Jun S.; Rong Chen; and Tanya Logvinenko (2001), A theoretical framework for sequential importance sampling with resampling, Sequential Monte Carlo Methods in Practice (edited by Arnaud Doucet, Nando de Freitas, and Neil Gordon) Springer-Verlag, New York, 225–246. Liu, Jun S.; Rong Chen; and Wing Hung Wong (1998), Rejection control and
BIBLIOGRAPHY
357
sequential importance sampling Journal of the American Statistical Association 93, 1022–1031. London, Wendy B., and Chris Gennings (1999), Simulation of multivariate gamma data with exponential marginals for independent clusters, Communications in Statistics — Simulation and Computation 28, 487–500. Luby, Michael (1996), Pseudorandomness and Cryptographic Applications, Princeton University Press, Princeton. Lurie, D., and H. O. Hartley (1972), Machine generation of order statistics for Monte Carlo computations, The American Statistician 26(1), 26–27. Lurie, D., and R. L. Mason (1973), Empirical investigation of general techniques for computer generation of order statistics, Communications in Statistics 2, 363–371. Lurie, Philip M., and Matthew S. Goldberg (1998), An approximate method for sampling correlated random variables from partially specified distributions, Management Science 44, 203–218. L¨ uscher, Martin (1994), A portable high-quality random number generator for lattice field theory simulations, Computer Physics Communications 79, 100– 110. MacEachern, Steven N., and L. Mark Berliner (1994), Subsampling the Gibbs sampler, The American Statistician 48, 188–190. MacLaren, M. D., and G. Marsaglia (1965), Uniform random number generators, Journal of the ACM 12, 83–89. Manly, Bryan F. J. (1997), Randomization, Bootstrap and Monte Carlo Methods in Biology, second edition, Chapman & Hall, London. Marasinghe, Mervyn G., and William J. Kennedy, Jr. (1982), Direct methods for generating extreme characteristic roots of certain random matrices, Communications in Statistics — Simulation and Computation 11, 527–542. Marinari, Enzo, and G. Parisi (1992), Simulated tempering: A new Monte Carlo scheme, Europhysics Letters 19, 451–458. Marriott, F. H. C. (1979), Barnard’s Monte Carlo tests: How many simulations?, Applied Statistics 28, 75–78. Marsaglia, G. (1962), Random variables and computers, Information Theory, Statistical Decision Functions, and Random Processes (edited by J. Kozesnik), Czechoslovak Academy of Sciences, Prague, 499–510. Marsaglia, G. (1963), Generating discrete random variables in a computer, Communications of the ACM 6, 37–38. Marsaglia, George (1964), Generating a variable from the tail of a normal distribution, Technometrics 6, 101–102. Marsaglia, G. (1968), Random numbers fall mainly in the planes, Proceedings of the National Academy of Sciences 61, 25–28. Marsaglia, G. (1972a), The structure of linear congruential sequences, Applications of Number Theory to Numerical Analysis (edited by S. K. Zaremba), Academic Press, New York, 249–286. Marsaglia, George (1972b), Choosing a point from the surface of a sphere, Annals of Mathematical Statistics 43, 645–646.
Marsaglia, G. (1977), The squeeze method for generating gamma variates, Computers and Mathematics with Applications 3, 321–325.
Marsaglia, G. (1980), Generating random variables with a t-distribution, Mathematics of Computation 34, 235–236.
Marsaglia, George (1984), The exact-approximation method for generating random variables in a computer, Journal of the American Statistical Association 79, 218–221.
Marsaglia, George (1985), A current view of random number generators, Computer Science and Statistics: 16th Symposium on the Interface (edited by L. Billard), North-Holland, Amsterdam, 3–10.
Marsaglia, George (1991), Normal (Gaussian) random variables for supercomputers, Journal of Supercomputing 5, 49–55.
Marsaglia, George (1995), The Marsaglia Random Number CDROM, including the DIEHARD Battery of Tests of Randomness, Department of Statistics, Florida State University, Tallahassee, Florida. Available at http://stat.fsu.edu/~geo/diehard.html.
Marsaglia, G., and T. A. Bray (1964), A convenient method for generating normal variables, SIAM Review 6, 260–264.
Marsaglia, G.; M. D. MacLaren; and T. A. Bray (1964), A fast method for generating normal random variables, Communications of the ACM 7, 4–10.
Marsaglia, George, and Ingram Olkin (1984), Generating correlation matrices, SIAM Journal on Scientific and Statistical Computing 5, 470–475.
Marsaglia, George, and Wai Wan Tsang (1984), A fast, easily implemented method for sampling from decreasing or symmetric unimodal density functions, SIAM Journal on Scientific and Statistical Computing 5, 349–359.
Marsaglia, George, and Wai Wan Tsang (1998), The Monty Python method for generating random variables, ACM Transactions on Mathematical Software 24, 341–350.
Marsaglia, George, and Liang-Huei Tsay (1985), Matrices and the structure of random number sequences, Linear Algebra and Its Applications 67, 147–156.
Marsaglia, George, and Arif Zaman (1991), A new class of random number generators, The Annals of Applied Probability 1, 462–480.
Marsaglia, George; Arif Zaman; and John C. W. Marsaglia (1994), Rapid evaluation of the inverse normal distribution function, Statistics and Probability Letters 19, 259–266.
Marshall, Albert W., and Ingram Olkin (1967), A multivariate exponential distribution, Journal of the American Statistical Association 62, 30–44.
Marshall, Albert W., and Ingram Olkin (1979), Inequalities — Theory of Majorization and Its Applications, Academic Press, New York.
Mascagni, Michael; M. L. Robinson; Daniel V. Pryor; and Steven A. Cuccaro (1995), Parallel pseudorandom number generation using additive lagged-Fibonacci recursions, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (edited by Harald Niederreiter and Peter Jau-Shyong Shiue), Springer-Verlag, New York, 262–267.
Mascagni, Michael, and Ashok Srinivasan (2000), SPRNG: A scalable library for pseudorandom number generation, ACM Transactions on Mathematical Software 26, 436–461 (Assigned as Algorithm 806, 2000, ibid. 26, 618–619).
Matsumoto, Makoto, and Yoshiharu Kurita (1992), Twisted GFSR generators, ACM Transactions on Modeling and Computer Simulation 2, 179–194.
Matsumoto, Makoto, and Yoshiharu Kurita (1994), Twisted GFSR generators II, ACM Transactions on Modeling and Computer Simulation 4, 245–266.
Matsumoto, Makoto, and Takuji Nishimura (1998), Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Transactions on Modeling and Computer Simulation 8, 3–30.
Maurer, Ueli M. (1992), A universal statistical test for random bit generators, Journal of Cryptology 5, 89–105.
McCullough, B. D. (1999), Assessing the reliability of statistical software: Part II, The American Statistician 53, 149–159.
McKay, Michael D.; William J. Conover; and Richard J. Beckman (1979), A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics 21, 239–245.
McLeod, A. I., and D. R. Bellhouse (1983), A convenient algorithm for drawing a simple random sample, Applied Statistics 32, 182–184.
Mendoza-Blanco, José R., and Xin M. Tu (1997), An algorithm for sampling the degrees of freedom in Bayesian analysis of linear regressions with t-distributed errors, Applied Statistics 46, 383–413.
Mengersen, Kerrie L.; Christian P. Robert; and Chantal Guihenneuc-Jouyaux (1999), MCMC convergence diagnostics: A review, Bayesian Statistics 6 (edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith), Oxford University Press, Oxford, United Kingdom, 415–440.
Metropolis, N.; A. W. Rosenbluth; M. N. Rosenbluth; A. H. Teller; and E. Teller (1953), Equation of state calculations by fast computing machines, Journal of Chemical Physics 21, 1087–1092. (Reprinted in Samuel Kotz and Norman L. Johnson (Editors) (1997), Breakthroughs in Statistics, Volume III, Springer-Verlag, New York, 127–139.)
Meyn, S. P., and R. L. Tweedie (1993), Markov Chains and Stochastic Stability, Springer-Verlag, New York.
Michael, John R.; William R. Schucany; and Roy W. Haas (1976), Generating random variates using transformations with multiple roots, The American Statistician 30, 88–90.
Mihram, George A., and Robert A. Hultquist (1967), A bivariate warning-time/failure-time distribution, Journal of the American Statistical Association 62, 589–599.
Modarres, R., and J. P. Nolan (1994), A method for simulating stable random vectors, Computational Statistics 9, 11–19.
Møller, Jesper, and Katja Schladitz (1999), Extensions of Fill's algorithm for perfect simulation, Journal of the Royal Statistical Society, Series B 61, 955–969.
Monahan, John F. (1987), An algorithm for generating chi random variables, ACM Transactions on Mathematical Software 13, 168–171 (Corrections, 1988, ibid. 14, 111).
Morel, Jorge G. (1992), A simple algorithm for generating multinomial random vectors with extravariation, Communications in Statistics — Simulation and Computation 21, 1255–1268.
Morgan, B. J. T. (1984), Elements of Simulation, Chapman & Hall, London.
Nagaraja, H. N. (1979), Some relations between order statistics generated by different methods, Communications in Statistics — Simulation and Computation B8, 369–377.
Neal, Radford M. (1996), Sampling from multimodal distributions using tempered transitions, Statistics and Computing 6, 353–366.
Neave, H. R. (1973), On using the Box–Muller transformation with multiplicative congruential pseudo-random number generators, Applied Statistics 22, 92–97.
Newman, M. E. J., and G. T. Barkema (1999), Monte Carlo Methods in Statistical Physics, Oxford University Press, Oxford, United Kingdom.
Niederreiter, H. (1988), Remarks on nonlinear congruential pseudorandom numbers, Metrika 35, 321–328.
Niederreiter, H. (1989), The serial test for congruential pseudorandom numbers generated by inversions, Mathematics of Computation 52, 135–144.
Niederreiter, Harald (1992), Random Number Generation and Quasi-Monte Carlo Methods, Society for Industrial and Applied Mathematics, Philadelphia.
Niederreiter, Harald (1993), Factorization of polynomials and some linear-algebra problems over finite fields, Linear Algebra and Its Applications 192, 301–328.
Niederreiter, Harald (1995a), The multiple-recursive matrix method for pseudorandom number generation, Finite Fields and Their Applications 1, 3–30.
Niederreiter, Harald (1995b), Pseudorandom vector generation by the multiple-recursive matrix method, Mathematics of Computation 64, 279–294.
Niederreiter, Harald (1995c), New developments in uniform pseudorandom number and vector generation, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (edited by Harald Niederreiter and Peter Jau-Shyong Shiue), Springer-Verlag, New York, 87–120.
Niederreiter, Harald (1995d), Some linear and nonlinear methods for pseudorandom number generation, Proceedings of the 1995 Winter Simulation Conference, Association for Computing Machinery, New York, 250–254.
Niederreiter, Harald; Peter Hellekalek; Gerhard Larcher; and Peter Zinterhof (Editors) (1998), Monte Carlo and Quasi-Monte Carlo Methods 1996, Springer-Verlag, New York.
Niederreiter, Harald, and Peter Jau-Shyong Shiue (Editors) (1995), Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, Springer-Verlag, New York.
Niederreiter, Harald, and Jerome Spanier (Editors) (1999), Monte Carlo and Quasi-Monte Carlo Methods 1998, Springer-Verlag, New York.
NIST (2000), A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication 800-22, National Institute of Standards and Technology, Gaithersburg, Maryland.
Nolan, John P. (1998a), Multivariate stable distributions: Approximation, estimation, simulation and identification, A Practical Guide to Heavy Tails: Statistical Techniques and Applications (edited by Robert J. Adler, Raisa E. Feldman, and Murad S. Taqqu), Birkhäuser, Boston, 509–526.
Nolan, John P. (1998b), Univariate stable distributions: Parameterizations and software, A Practical Guide to Heavy Tails: Statistical Techniques and Applications (edited by Robert J. Adler, Raisa E. Feldman, and Murad S. Taqqu), Birkhäuser, Boston, 527–533.
Norman, J. E., and L. E. Cannon (1972), A computer program for the generation of random variables from any discrete distribution, Journal of Statistical Computation and Simulation 1, 331–348.
Odell, P. L., and A. H. Feiveson (1966), A numerical procedure to generate a sample covariance matrix, Journal of the American Statistical Association 61, 199–203.
Ogata, Yosihiko (1990), A Monte Carlo method for an objective Bayesian procedure, Annals of the Institute of Statistical Mathematics 42, 403–433.
Oh, Man-Suk, and James O. Berger (1993), Integration of multimodal functions by Monte Carlo importance sampling, Journal of the American Statistical Association 88, 450–456.
Øksendal, Bernt (1998), Stochastic Differential Equations: An Introduction with Applications, fifth edition, Springer-Verlag, Berlin.
Ökten, Giray (1998), Error estimates for quasi-Monte Carlo methods, Monte Carlo and Quasi-Monte Carlo Methods 1996 (edited by Harald Niederreiter, Peter Hellekalek, Gerhard Larcher, and Peter Zinterhof), Springer-Verlag, New York, 353–358.
Olken, Frank, and Doron Rotem (1995a), Random sampling from databases: A survey, Statistics and Computing 5, 25–42.
Olken, Frank, and Doron Rotem (1995b), Sampling from spatial databases, Statistics and Computing 5, 43–57.
Owen, A. B. (1992a), A central limit theorem for Latin hypercube sampling, Journal of the Royal Statistical Society, Series B 54, 541–551.
Owen, A. B. (1992b), Orthogonal arrays for computer experiments, integration and visualization, Statistica Sinica 2, 439–452.
Owen, A. B. (1994a), Lattice sampling revisited: Monte Carlo variance of means over randomized orthogonal arrays, Annals of Statistics 22, 930–945.
Owen, Art B. (1994b), Controlling correlations in Latin hypercube samples, Journal of the American Statistical Association 89, 1517–1522.
Owen, Art B. (1995), Randomly permuted (t, m, s)-nets and (t, s)-sequences, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (edited by Harald Niederreiter and Peter Jau-Shyong Shiue), Springer-Verlag, New York, 299–317.
Owen, Art B. (1997), Scrambled net variance for integrals of smooth functions, Annals of Statistics 25, 1541–1562.
Owen, Art B. (1998), Latin supercube sampling for very high-dimensional simulations, ACM Transactions on Modeling and Computer Simulation 8, 71–102.
Papageorgiou, A., and J. F. Traub (1996), Beating Monte Carlo, Risk (June), 63–65.
Park, Chul Gyu; Tasung Park; and Dong Wan Shin (1996), A simple method for generating correlated binary variates, The American Statistician 50, 306–310.
Park, Stephen K., and Keith W. Miller (1988), Random number generators: Good ones are hard to find, Communications of the ACM 31, 1192–1201.
Parrish, Rudolph S. (1990), Generating random deviates from multivariate Pearson distributions, Computational Statistics & Data Analysis 9, 283–295.
Patefield, W. M. (1981), An efficient method of generating r × c tables with given row and column totals, Applied Statistics 30, 91–97.
Pearson, E. S.; N. L. Johnson; and I. W. Burr (1979), Comparisons of the percentage points of distributions with the same first four moments, chosen from eight different systems of frequency curves, Communications in Statistics — Simulation and Computation 8, 191–230.
Perlman, Michael D., and Michael J. Wichura (1975), Sharpening Buffon's needle, The American Statistician 29, 157–163.
Peterson, Arthur V., and Richard A. Kronmal (1982), On mixture methods for the computer generation of random variables, The American Statistician 36, 184–191.
Philippe, Anne (1997), Simulation of right and left truncated gamma distributions by mixtures, Statistics and Computing 7, 173–181.
Pratt, John W. (1981), Concavity of the log likelihood, Journal of the American Statistical Association 76, 103–106.
Press, William H.; Saul A. Teukolsky; William T. Vetterling; and Brian P. Flannery (1992), Numerical Recipes in Fortran, second edition, Cambridge University Press, Cambridge, United Kingdom.
Press, William H.; Saul A. Teukolsky; William T. Vetterling; and Brian P. Flannery (2002), Numerical Recipes in C++, second edition, Cambridge University Press, Cambridge, United Kingdom.
Propp, James Gary, and David Bruce Wilson (1996), Exact sampling with coupled Markov chains and applications to statistical mechanics, Random Structures and Algorithms 9, 223–252.
Propp, James, and David Wilson (1998), Coupling from the past: A user's guide, Microsurveys in Discrete Probability (edited by D. Aldous and J. Propp), American Mathematical Society, Providence, Rhode Island, 181–192.
Pullin, D. I. (1979), Generation of normal variates with given sample mean and variance, Journal of Statistical Computation and Simulation 9, 303–309.
Rabinowitz, M., and M. L. Berenson (1974), A comparison of various methods of obtaining random order statistics for Monte Carlo computations, The American Statistician 28, 27–29.
Rajasekaran, Sanguthevar, and Keith W. Ross (1993), Fast algorithms for generating discrete random variates with changing distributions, ACM Transactions on Modeling and Computer Simulation 3, 1–19.
Ramberg, John S., and Bruce W. Schmeiser (1974), An approximate method for generating asymmetric random variables, Communications of the ACM 17, 78–82.
RAND Corporation (1955), A Million Random Digits with 100,000 Normal Deviates, Free Press, Glencoe, Illinois.
Ratnaparkhi, M. V. (1981), Some bivariate distributions of (X, Y) where the conditional distribution of Y, given X, is either beta or unit-gamma, Statistical Distributions in Scientific Work, Volume 4: Models, Structures, and Characterizations (edited by Charles Taillie, Ganapati P. Patil, and Bruno A. Baldessari), D. Reidel Publishing Company, Boston, 389–400.
Reeder, H. A. (1972), Machine generation of order statistics, The American Statistician 26(4), 56–57.
Relles, Daniel A. (1972), A simple algorithm for generating binomial random variables when N is large, Journal of the American Statistical Association 67, 612–613.
Ripley, Brian D. (1987), Stochastic Simulation, John Wiley & Sons, New York.
Robert, Christian P. (1995), Simulation of truncated normal variables, Statistics and Computing 5, 121–125.
Robert, Christian P. (1998a), A pathological MCMC algorithm and its use as a benchmark for convergence assessment techniques, Computational Statistics 13, 169–184.
Robert, Christian P. (Editor) (1998b), Discretization and MCMC Convergence Assessment, Springer-Verlag, New York.
Robert, Christian P., and George Casella (1999), Monte Carlo Statistical Methods, Springer-Verlag, New York.
Roberts, G. O. (1992), Convergence diagnostics of the Gibbs sampler, Bayesian Statistics 4 (edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith), Oxford University Press, Oxford, United Kingdom, 775–782.
Roberts, Gareth O. (1996), Markov chain concepts related to sampling algorithms, Practical Markov Chain Monte Carlo (edited by W. R. Gilks, S. Richardson, and D. J. Spiegelhalter), Chapman & Hall, London, 45–57.
Robertson, J. M., and G. R. Wood (1998), Information in Buffon experiments, Journal of Statistical Planning and Inference 66, 21–37.
Ronning, Gerd (1977), A simple scheme for generating multivariate gamma distributions with non-negative covariance matrix, Technometrics 19, 179–183.
Rosenbaum, Paul R. (1993), Sampling the leaves of a tree with equal probabilities, Journal of the American Statistical Association 88, 1455–1457.
Rosenthal, Jeffrey S. (1995), Minorization conditions and convergence rates for Markov chain Monte Carlo, Journal of the American Statistical Association 90, 558–566.
Rousseeuw, Peter J., and Annick M. Leroy (1987), Robust Regression and Outlier Detection, John Wiley & Sons, New York.
Rubin, Donald B. (1987), Comment on Tanner and Wong, "The calculation of posterior distributions by data augmentation", Journal of the American Statistical Association 82, 543–546.
Rubin, Donald B. (1988), Using the SIR algorithm to simulate posterior distributions (with discussion), Bayesian Statistics 3 (edited by J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith), Oxford University Press, Oxford, United Kingdom, 395–402.
Ryan, Thomas P. (1980), A new method of generating correlation matrices, Journal of Statistical Computation and Simulation 11, 79–85.
Sacks, Jerome; William J. Welch; Toby J. Mitchell; and Henry P. Wynn (1989), Design and analysis of computer experiments (with discussion), Statistical Science 4, 409–435.
Sarkar, P. K., and M. A. Prasad (1987), A comparative study of pseudo and quasirandom sequences for the solution of integral equations, Journal of Computational Physics 68, 66–88.
Sarkar, Tapas K. (1996), A composition-alias method for generating gamma variates with shape parameter greater than 1, ACM Transactions on Mathematical Software 22, 484–492.
Särndal, Carl-Erik; Bengt Swensson; and Jan Wretman (1992), Model Assisted Survey Sampling, Springer-Verlag, New York.
Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, Chapman & Hall, London.
Schervish, Mark J., and Bradley P. Carlin (1992), On the convergence of successive substitution sampling, Journal of Computational and Graphical Statistics 1, 111–127.
Schmeiser, Bruce (1983), Recent advances in generation of observations from discrete random variates, Computer Science and Statistics: The Interface (edited by James E. Gentle), North-Holland Publishing Company, Amsterdam, 154–160.
Schmeiser, Bruce, and A. J. G. Babu (1980), Beta variate generation via exponential majorizing functions, Operations Research 28, 917–926.
Schmeiser, Bruce, and Voratas Kachitvichyanukul (1990), Noninverse correlation induction: Guidelines for algorithm development, Journal of Computational and Applied Mathematics 31, 173–180.
Schmeiser, Bruce, and R. Lal (1980), Squeeze methods for generating gamma variates, Journal of the American Statistical Association 75, 679–682.
Schucany, W. R. (1972), Order statistics in simulation, Journal of Statistical Computation and Simulation 1, 281–286.
Selke, W.; A. L. Talapov; and L. N. Shchur (1993), Cluster-flipping Monte Carlo algorithm and correlations in "good" random number generators, JETP Letters 58, 665–668.
Shao, Jun, and Dongsheng Tu (1995), The Jackknife and Bootstrap, Springer-Verlag, New York.
Shaw, J. E. H. (1988), A quasi-random approach to integration in Bayesian statistics, Annals of Statistics 16, 895–914.
Shchur, Lev N., and Henk W. J. Blöte (1997), Cluster Monte Carlo: Scaling of systematic errors in the two-dimensional Ising model, Physical Review E 55, R4905–R4908.
Sibuya, M. (1961), Exponential and other variable generators, Annals of the Institute of Statistical Mathematics 13, 231–237.
Sinclair, C. D., and B. D. Spurr (1988), Approximations to the distribution function of the Anderson–Darling test statistic, Journal of the American Statistical Association 83, 1190–1191.
Smith, A. F. M., and G. O. Roberts (1993), Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B 55, 3–24.
Smith, Robert L. (1984), Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions, Operations Research 32, 1297–1308.
Smith, W. B., and R. R. Hocking (1972), Algorithm AS53: Wishart variate generator, Applied Statistics 21, 341–345.
Sobol', I. M. (1967), On the distribution of points in a cube and the approximate evaluation of integrals, USSR Computational Mathematics and Mathematical Physics 7, 86–112.
Sobol', I. M. (1976), Uniformly distributed sequences with an additional uniform property, USSR Computational Mathematics and Mathematical Physics 16, 236–242.
Spanier, Jerome, and Keith B. Oldham (1987), An Atlas of Functions, Hemisphere Publishing Corporation, Washington (also Springer-Verlag, Berlin).
Srinivasan, Ashok; Michael Mascagni; and David Ceperley (2003), Testing parallel random number generators, Parallel Computing 29, 69–94.
Stacy, E. W. (1962), A generalization of the gamma distribution, Annals of Mathematical Statistics 33, 1187–1191.
Stadlober, Ernst (1990), The ratio of uniforms approach for generating discrete random variates, Journal of Computational and Applied Mathematics 31, 181–189.
Stadlober, Ernst (1991), Binomial variate generation: A method based on ratio of uniforms, The Frontiers of Statistical Computation, Simulation & Modeling (edited by P. R. Nelson, E. J. Dudewicz, A. Öztürk, and E. C. van der Meulen), American Sciences Press, Columbus, Ohio, 93–112.
Steel, S. J., and N. J. le Roux (1987), A reparameterisation of a bivariate gamma extension, Communications in Statistics — Theory and Methods 16, 293–305.
Stefănescu, S., and I. Văduva (1987), On computer generation of random vectors by transformations of uniformly distributed vectors, Computing 39, 141–153.
Stein, Michael (1987), Large sample properties of simulations using Latin hypercube sampling, Technometrics 29, 143–151.
Stephens, Michael A. (1986), Tests based on EDF statistics, Goodness-of-Fit Techniques (edited by Ralph B. D'Agostino and Michael A. Stephens), Marcel Dekker, New York, 97–193.
Stewart, G. W. (1980), The efficient generation of random orthogonal matrices with an application to condition estimators, SIAM Journal on Numerical Analysis 17, 403–409.
Stigler, Stephen M. (1978), Mathematical statistics in the early states, Annals of Statistics 6, 239–265.
Stigler, Stephen M. (1991), Stochastic simulation in the nineteenth century, Statistical Science 6, 89–97.
Student (1908a), On the probable error of a mean, Biometrika 6, 1–25.
Student (1908b), Probable error of a correlation coefficient, Biometrika 6, 302–310.
Sullivan, Stephen J. (1993), Another test for randomness, Communications of the ACM 33, Number 7 (July), 108.
Tadikamalla, Pandu R. (1980a), Random sampling from the exponential power distribution, Journal of the American Statistical Association 75, 683–686.
Tadikamalla, Pandu R. (1980b), On simulating non-normal distributions, Psychometrika 45, 273–279.
Tadikamalla, Pandu R., and Norman L. Johnson (1982), Systems of frequency curves generated by transformations of logistic variables, Biometrika 69, 461–465.
Takahasi, K. (1965), Note on the multivariate Burr's distribution, Annals of the Institute of Statistical Mathematics 17, 257–260.
Tang, Boxin (1993), Orthogonal array-based Latin hypercubes, Journal of the American Statistical Association 88, 1392–1397.
Tanner, Martin A. (1996), Tools for Statistical Inference, third edition, Springer-Verlag, New York.
Tanner, M. A., and R. A. Thisted (1982), A remark on AS127: Generation of random orthogonal matrices, Applied Statistics 31, 190–192.
Tanner, Martin A., and Wing Hung Wong (1987), The calculation of posterior distributions by data augmentation (with discussion), Journal of the American Statistical Association 82, 528–549.
Tausworthe, R. C. (1965), Random numbers generated by linear recurrence modulo two, Mathematics of Computation 19, 201–209.
Taylor, Malcolm S., and James R. Thompson (1986), Data based random number generation for a multivariate distribution via stochastic simulation, Computational Statistics & Data Analysis 4, 93–101.
Tezuka, Shu (1991), Neave effect also occurs with Tausworthe sequences, Proceedings of the 1991 Winter Simulation Conference, Association for Computing Machinery, New York, 1030–1034.
Tezuka, Shu (1993), Polynomial arithmetic analogue of Halton sequences, ACM Transactions on Modeling and Computer Simulation 3, 99–107.
Tezuka, Shu (1995), Uniform Random Numbers: Theory and Practice, Kluwer Academic Publishers, Boston.
Tezuka, Shu, and Pierre L'Ecuyer (1992), Analysis of add-with-carry and subtract-with-borrow generators, Proceedings of the 1992 Winter Simulation Conference, Association for Computing Machinery, New York, 443–447.
Tezuka, Shu; Pierre L'Ecuyer; and R. Couture (1994), On the lattice structure of the add-with-carry and subtract-with-borrow random number generators, ACM Transactions on Modeling and Computer Simulation 3, 315–331.
Thomas, Andrew; David J. Spiegelhalter; and Wally R. Gilks (1992), BUGS: A program to perform Bayesian inference using Gibbs sampling, Bayesian Statistics 4 (edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith), Oxford University Press, Oxford, United Kingdom, 837–842.
Thompson, James R. (2000), Simulation: A Modeler's Approach, John Wiley & Sons, New York.
Thompson, William J. (1997), Atlas for Computing Mathematical Functions: An Illustrated Guide for Practitioners with Programs in C and Mathematica, John Wiley & Sons, New York.
Tierney, Luke (1991), Exploring posterior distributions using Markov chains, Computer Science and Statistics: Proceedings of the Twenty-third Symposium on the Interface (edited by Elaine M. Keramidas), Interface Foundation of North America, Fairfax, Virginia, 563–570.
Tierney, Luke (1994), Markov chains for exploring posterior distributions (with discussion), Annals of Statistics 22, 1701–1762.
Tierney, Luke (1996), Introduction to general state-space Markov chain theory, Practical Markov Chain Monte Carlo (edited by W. R. Gilks, S. Richardson, and D. J. Spiegelhalter), Chapman & Hall, London, 59–74.
Vale, C. David, and Vincent A. Maurelli (1983), Simulating multivariate nonnormal distributions, Psychometrika 48, 465–471.
Vattulainen, I. (1999), Framework for testing random numbers in parallel calculations, Physical Review E 59, 7200–7204.
Vattulainen, I.; T. Ala-Nissila; and K. Kankaala (1994), Physical tests for random numbers in simulations, Physical Review Letters 73, 2513–2516.
Vattulainen, I.; T. Ala-Nissila; and K. Kankaala (1995), Physical models as tests for randomness, Physical Review E 52, 3205–3214.
Vattulainen, I.; K. Kankaala; J. Saarinen; and T. Ala-Nissila (1995), A comparative study of some pseudorandom number generators, Computer Physics Communications 86, 209–226.
Vitter, J. S. (1984), Faster methods for random sampling, Communications of the ACM 27, 703–717.
Vitter, Jeffrey Scott (1985), Random sampling with a reservoir, ACM Transactions on Mathematical Software 11, 37–57.
Von Neumann, J. (1951), Various Techniques Used in Connection with Random Digits, NBS Applied Mathematics Series 12, National Bureau of Standards (now National Institute of Standards and Technology), Washington.
Vose, Michael D. (1991), A linear algorithm for generating random numbers with a given distribution, IEEE Transactions on Software Engineering 17, 972–975.
Wakefield, J. C.; A. E. Gelfand; and A. F. M. Smith (1991), Efficient generation of random variates via the ratio-of-uniforms method, Statistics and Computing 1, 129–133.
Walker, A. J. (1977), An efficient method for generating discrete random variables with general distributions, ACM Transactions on Mathematical Software 3, 253–256.
Wallace, C. S. (1976), Transformed rejection generators for gamma and normal pseudo-random variables, Australian Computer Journal 8, 103–105.
Wallace, C. S. (1996), Fast pseudorandom generators for normal and exponential variates, ACM Transactions on Mathematical Software 22, 119–127.
Wichmann, B. A., and I. D. Hill (1982), Algorithm AS183: An efficient and portable pseudo-random number generator, Applied Statistics 31, 188–190 (Corrections, 1984, ibid. 33, 123).
Wikramaratna, R. S. (1989), ACORN — A new method for generating sequences of uniformly distributed pseudo-random numbers, Journal of Computational Physics 83, 16–31.
Wilson, David Bruce, and James Gary Propp (1996), How to get an exact sample from a generic Markov chain and sample a random spanning tree from a directed graph, both within the cover time, Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York, 448–457.
Wolfram, Stephen (1984), Random sequence generation by cellular automata, Advances in Applied Mathematics 7, 123–169. (Reprinted in Wolfram, 1994.)
Wolfram, Stephen (1994), Cellular Automata and Complexity: Collected Papers, Addison–Wesley Publishing Company, Reading, Massachusetts.
Wolfram, Stephen (2002), A New Kind of Science, Wolfram Media, Inc., Champaign, Illinois.
Wollan, Peter C. (1992), A portable random number generator for parallel computers, Communications in Statistics — Simulation and Computation 21, 1247–1254.
Wu, Pei-Chi (1997), Multiplicative, congruential random-number generators with multiplier ±2^k1 ± 2^k2 and modulus 2^p − 1, ACM Transactions on Mathematical Software 23, 255–265.
Yu, Bin (1995), Comment on Besag et al., "Bayesian computation and stochastic systems": Extracting more diagnostic information from a single run using cusum path plot, Statistical Science 10, 54–58.
Zaremba, S. K. (Editor) (1972), Applications of Number Theory to Numerical Analysis, Academic Press, New York.
Zeisel, H. (1986), A remark on Algorithm AS183: An efficient and portable pseudo-random number generator, Applied Statistics 35, 89.
Zierler, Neal, and John Brillhart (1968), On primitive trinomials (mod 2), Information and Control 13, 541–554.
Zierler, Neal, and John Brillhart (1969), On primitive trinomials (mod 2), II, Information and Control 14, 566–569.
Ziff, Robert M. (1998), Four-tap shift-register-sequence random-number generators, Computers in Physics 12, 385–392.
Ziv, J., and A. Lempel (1977), A universal algorithm for sequential data compression, IEEE Transactions on Information Theory 23, 337–343.
Author Index

Abramowitz, Milton, 175, 332
Afflerbach, Lothar, 35, 66, 133
Agarwal, Satish K., 183
Agresti, Alan, 252
Ahn, Hongshik, 188, 204
Ahrens, Joachim H., 125, 132, 173, 177, 179, 188, 218
Akima, Hiroshi, 109
Al-Saleh, Jamal A., 183
Ala-Nissila, T., 21, 41, 79, 86, 260
Albert, James, 194
Alonso, Laurent, 219
Altman, N. S., 34
Aluru, Srinivas, 43
Anderson, N. H., 26
Anderson, T. W., 201, 209
Andrews, David F., 300, 301
Antonov, I. A., 96
Arnason, A. N., 205
Arnold, Barry C., 170, 192
Asau, Y., 105, 107
Atkinson, A. C., 66, 180, 183, 193
Avramidis, Athanassios N., 221, 249
Babu, A. J. G., 183
Bacon-Shone, J., 194
Bailey, David H., 44, 91
Bailey, Ralph W., 185
Balakrishnan, N., 203, 223, 327
Banerjia, Sanjeev, 202
Baniuk, L., 205
Banks, David L., 80, 85
Barkema, G. T., 229, 260, 261
Barnard, G. A., 251
Barndorff-Nielsen, Ole E., 193, 270
Bays, Carter, 22
Beaver, Robert J., 170
Beck, J., 97
Becker, P. J., 123, 208
Becker, Richard A., 291
Beckman, Richard J., 249
Bélisle, Claude J. P., 158, 197
Bellhouse, D. R., 219
Bendel, R. B., 200
Bentley, Jon Louis, 212
Berbee, H. C. P., 158
Berenson, M. L., 222
Berger, James O., 243
Berliner, L. Mark, 157
Best, D. J., 179, 192
Best, N. G., 153
Beyer, W. A., 66
Bhanot, Gyan, 143
Bickel, Peter J., 301
Birkes, David, 304
Blöte, Henk W. J., 260
Blouin, François, 32, 287
Blum, L., 4, 37
Blum, M., 4, 37
Boender, G. E., 158
Bouleau, Nicolas, 99
Boyar, J., 4
Boyett, J. M., 202
Boyle, Phelim P., 98
Braaten, E., 95, 98, 239
Bratley, Paul, 97, 98, 172, 296, 334
Bray, T. A., 173, 174, 176
Brillhart, John, 39
Bromberg, Judith, 198
Brooks, S. P., 146
Brophy, John F., 30
Brown, Morton B., 198
Buckheit, Jonathan B., 299
Buckle, D. J., 196
Burr, Irving W., 194, 195
Cabrera, Javier, 20
Caflisch, Russel E., 243
Cannon, L. E., 105, 106, 107
Carlin, Bradley P., 146, 157, 158, 256
Carlin, John B., 256
Carta, David G., 21
Casella, George, 149, 156, 251, 334
Ceperley, David, 87
Chalmers, C. P., 200
Chamayou, J.-F., 196
Chambers, John M., 196, 291
Chan, Kwok Hung, 52, 53
Chen, H. C., 105, 107
Chen, Huifen, 225
Chen, James J., 188, 204
Chen, K. S., 194
Chen, Ming-Hui, 157, 158, 256
Chen, Rong, 244, 273
Chen, W. W. L., 97
Cheng, R. C. H., 178, 184, 248
Cheng, Shiow-Wen, 210, 221
Chernick, Michael R., 255
Chib, Siddhartha, 143
Chou, Wun-Seng, 37
Chou, Youn-Min, 194
Cipra, Barry A., 260
Cislak, Peter J., 194
Coldwell, R. L., 20, 71, 87
Collings, Bruce Jay, 46
Compagner, Aaldert, 42
Conover, William J., 249
Cook, Dianne A., 20
Cook, R. Dennis, 209
Cordeau, Jean-François, 21, 67
Couture, Raymond, 32, 36, 287
Coveyou, R. R., 20, 65
Cowles, Mary Kathryn, 146, 158
Crandall, Richard E., 44, 91
Cuccaro, Steven A., 33, 87
Currin, Carla, 257
D'Agostino, Ralph B., 76
Dagpunar, John S., 181, 192, 193, 207, 334
Damien, Paul, 150, 168, 175, 182
David, Herbert A., 222, 227
Davis, Charles S., 198
Davis, Don, 3
Davison, Anthony C., 255
de Freitas, Nando, 234
De Matteis, A., 47, 70
Deák, István, 127, 197, 334
Delampady, Mohan, 194
Dellaportas, Petros, 151, 158
Deng, Lih-Yuan, 21, 32, 34, 49, 52, 53, 61
Derflinger, Gerhard, 122, 133
Devroye, Luc, 121, 126, 136, 137, 151, 154, 159, 171, 192, 194, 195, 196, 213, 334, vii
Dieter, Ulrich, 18, 65, 132, 173, 177, 179, 188, 218
Do, Kim-Anh, 98
Dodge, Yadolah, 43, 304
Donoho, David L., 299
Doucet, Arnaud, 234
Dudewicz, Edward J., 194
Durham, S. D., 22
Dwyer, Rex A., 202
Efron, Bradley, 255
Eichenauer, Jürgen, 36, 38, 66
Eichenauer-Herrmann, Jürgen, 37, 38, 66, 70
Emrich, Lawrence J., 203, 204, 214
Epstein, Peter, 213
Erber, T., 45
Ernst, Michael D., 207
Evans, Michael, 233
Everett, P., 45
Everitt, Brian S., 183
Everson, Philip J., 199
Falk, Michael, 207
Fang, Kai-Tai, 7, 47, 97, 201, 209, 334
Faure, H., 95
Feast, G. M., 178
Feiveson, A. H., 199
Fenstermacher, Philip, 3
Ferrenberg, Alan M., 21, 41, 86
Fill, James Allen, 148, 149
Finkel, Raphael Ari, 212
Fisher, N. I., 192
Fishman, George S., 20, 21, 58, 65, 79, 288, 334
Flannery, Brian P., 287
Fleishman, Allen I., 195, 210
Flournoy, Nancy, 233
Forster, Jonathan J., 252
Fouque, Jean-Pierre, 270
Fox, Bennett L., 97, 98, 172, 296, 334
Frederickson, P., 26
Freimer, Marshall, 194
Freund, John E., 123
Friedman, Jerome H., 212
Frigessi, A., 147
Fuller, A. T., 12
Fushimi, Masanori, 41, 288
Gamerman, Dani, 146
Gange, Stephen J., 208
Gelatt, C. D., 259, 278
Gelfand, Alan E., 130, 133, 146, 157, 256
Gelman, Andrew, 146, 150, 233, 256
Geman, Donald, 155
Geman, Stuart, 155
Gennings, Chris, 208
Gentle, James E., 6, 28, 30, 55, 59, 87, 251
Gentleman, Robert, 291
George, E. Olusegun, 49
George, Edward I., 149, 156, 158
Gerontidis, I., 222
Geweke, John, 175, 198, 256
Geyer, Charles J., 154, 157
Ghitany, M. E., 183
Gilks, Walter R., 144, 146, 151, 153, 158, 256
Gleser, Leon Jay, 200
Goldberg, Matthew S., 209, 210
Golder, E. R., 172, 185
Golomb, S. W., 40, 43
Goodman, A. S., 21, 288
Gordon, J., 37
Gordon, Neil J., 234, 244
Gosset, W. S. (“Student”), 297
Grafton, R. G. T., 78
Greenberg, Edward, 143
Greenwood, J. Arthur, 161, 220
Griffiths, P., 333
Groeneveld, Richard A., 170
Grothe, Holger, 35, 36, 38, 66, 70
Guerra, Victor O., 109
Guihenneuc-Jouyaux, Chantal, 146
Gustafson, John, 43
Haas, Roy W., 193
Halton, J. H., 94
Hamilton, Kenneth G., 177
Hammersley, J. M., 229, 271, 299
Hammond, Joseph L., 209
Hampel, Frank R., 301
Handscomb, D. C., 229, 271, 299
Harris, D. L., 199
Hartley, H. O., 199, 221
Hastings, W. K., 141
Heiberger, Richard M., 201
Hellekalek, Peter, 21, 95, 334
Henson, S., 194
Herrmann, Eva, 38
Hesterberg, Timothy C., 243, 245
Hickernell, Fred J., 99, 334
Hill, I. D., 47, 55, 194, 333
Hill, R., 194
Hinkley, David V., 255
Hiromoto, R., 26
Hoaglin, David C., 300
Hocking, R. R., 199
Holder, R. L., 194
Hope, A. C. A., 251
Hopkins, T. R., 65
Hörmann, Wolfgang, 122, 133, 152, 159
Hosack, J. M., 45
Huber, Peter J., 20, 301
Hull, John C., 264, 268
Hultquist, Robert A., 208
Ibrahim, Joseph G., 256
Ickstadt, K., 37
Ihaka, Ross, 3, 291
Ireland, Kenneth, 7, 9, 12
Jäckel, Peter, 97, 100, 270
Jaditz, Ted, 44
James, F., 20, 45, 58
Jöhnk, M. D., 183
Johnson, Mark E., 197, 209
Johnson, Norman L., 195, 203, 327
Johnson, P. W., 45
Johnson, Valen E., 146
Jones, G., 208
Jordan, T. L., 26
Joy, Corwin, 98
Juneja, Sandeep, 225
Kachitvichyanukul, Voratas, 187, 188, 189, 210, 221, 246
Kahn, H., 239
Kankaala, K., 21, 41, 79, 86, 260
Kao, Chiang, 33
Karian, Zaven A., 194
Kato, Takashi, 38, 78
Kemp, Adrienne W., 108, 118, 159, 188, 190
Kemp, C. D., 159, 187, 188
Kennedy, William J., 201
Kinderman, A. J., 129, 173, 185
Kirkpatrick, Scott, 41, 259, 278, 287
Kleijnen, Jack P. C., 310
Knuth, Donald E., 12, 32, 37, 53, 65, 118, 219, 334
Kobayashi, K., 183
Kocis, Ladislav, 95
Koehler, J. R., 257
Kollia, Georgia, 194
Kotz, Samuel, 203, 327
Kovalenko, I. N., 79
Kozubowski, Tomasz J., 207
Krawczyk, Hugo, 4
Krommer, Arnold R., 95
Kronmal, Richard A., 125, 135, 136, 191
Kumada, Toshihiro, 39
Kurita, Yoshiharu, 39, 41, 42
Lagarias, Jeffrey C., 4
Lai, C. D., 208
Lal, R., 178
Landau, D. P., 21, 41, 86
Larcher, Gerhard, 334
Laud, Purushottam W., 150, 183
Lawrance, A. J., 11
Le Roux, N. J., 123
Learmonth, G. P., 21, 46, 291
L'Ecuyer, Pierre, 14, 21, 29, 32, 36, 37, 41, 47, 48, 55, 57, 63, 65, 67, 80, 85, 287, 334
Lee, A. J., 205
Leeb, Hannes, 37, 39
Lehmer, D. H., 11
Lehn, Jürgen, 36, 38
Lempel, A., 84
Lépingle, Dominique, 99
Leva, Joseph L., 174
Lewis, P. A. W., 21, 46, 55, 58, 225, 288, 291, 334
Lewis, T. G., 40, 41
Leydold, Josef, 132, 133, 153, 159
Li, Jing, 30
Li, Kim-Hung, 219
Li, Run-Ze, 97, 201
Li, Shing Ted, 209
Liao, J. G., 190
Lin, Dennis K. J., 21, 32, 34, 49, 61
Lin, Thomas C., 194
Liu, Jun S., 144, 230, 244, 273, 334
Logvinenko, Tanya, 273
London, Wendy B., 208
Louis, Thomas A., 256
Luby, Michael, 3, 4
Lurie, D., 221, 222
Lurie, Philip M., 209, 210
Lüscher, Martin, 45
MacEachern, Steven N., 157
Machida, Motoya, 149
MacLaren, M. D., 21, 46, 173, 174, 176
MacPherson, R. D., 20, 65
Mallows, C. L., 196
Manly, Bryan F. J., 252
Marasinghe, Mervyn G., 201
Marinari, Enzo, 261
Marriott, F. H. C., 251
Marsaglia, George, 14, 17, 20, 21, 35, 43, 46, 49, 66, 79, 80, 83, 85, 105, 117, 118, 121, 127, 154, 173, 174, 175, 176, 185, 200, 202
Marsaglia, John C. W., 174
Marshall, A. W., 239
Marshall, Albert W., 49, 207
Martinelli, F., 147
Mascagni, Michael, 33, 53, 87
Mason, R. L., 222
Matsumoto, Makoto, 39, 41, 42
Maurelli, Vincent A., 210
Maurer, Ueli M., 84
McCullough, B. D., 83, 291
McDonald, John W., 252
McDonald, Patrick, 194
McKay, Michael D., 249
McLeod, A. I., 219
Meeker, William Q., 170
Mendoza-Blanco, José R., 186
Meng, Xiao-Li, 233
Mengersen, Kerrie L., 146
Metropolis, N., 140, 259, 277
Meyer, D., 194
Meyn, S. P., 137, 225
Michael, John R., 193
Mickey, M. R., 200
Mihram, George Arthur, 208
Miller, J. M., 21, 288
Miller, Keith W., 20, 28, 61, 86, 288
Mitchell, Toby J., 248, 257
Modarres, R., 208
Møller, Jesper, 148
Monahan, John F., 129, 185
Moore, Louis R., III, 20, 21, 58, 65, 79, 288
Morgan, B. J. T., 334
Morris, Carl N., 199
Morris, Max, 257
Moskowitz, Bradley, 243
Mudholkar, Govind S., 194
Murdoch, Duncan J., 149
Nagaraja, H. N., 222
Neal, N. G., 153
Neal, Radford M., 155
Neave, H. R., 172, 185
Nelson, Barry L., 245
Newman, M. E. J., 229, 260, 261
Niederreiter, Harald, 35, 36, 37, 38, 66, 94, 97, 98, 100, 296, 334
Nishimura, Takuji, 42
Nolan, John P., 196, 208
Norman, J. E., 105, 106, 107
Odell, P. L., 199
Ogata, Yosihiko, 233
Oh, Man-Suk, 243
Ökten, Giray, 99, 239
Oldham, Keith B., 332
Olken, Frank, 219
Olkin, Ingram, 49, 200, 201, 207
Orav, E. J., 55, 58, 334
Owen, Art B., 239, 249, 257
Pagnutti, S., 47, 70
Papageorgiou, A., 97
Papanicolaou, George, 270
Parisi, G., 261
Park, Chul Gyu, 204, 214
Park, Stephen K., 20, 28, 61, 86, 288
Park, Tasung, 204, 214
Parrish, Rudolph S., 208, 210
Patefield, W. M., 202, 203
Payne, W. H., 40, 41
Pearce, M. C., 180
Pearson, E. S., 195
Perlman, Michael D., 274
Peterson, Arthur V., 125, 135, 136, 191
Philippe, Anne, 181, 182
Piedmonte, Marion R., 203, 204, 214
Podgórski, Krzysztof, 207
Polasek, Wolfgang, 194
Prabhu, G. M., 43
Prasad, M. A., 97
Pratt, John W., 151
Press, William H., 287
Propp, James Gary, 147, 219
Pryor, Daniel V., 33, 87
Pullin, D. I., 248
Rabinowitz, M., 222
Rajasekaran, Sanguthevar, 119
Ramage, J. G., 173
Ramberg, John S., 194
Ramgopal, Paul, 183
Ratnaparkhi, M. V., 208
Rayner, J. C. W., 208
Reeder, H. A., 221, 222
Relles, Daniel A., 187
Richardson, S., 144, 146
Rinnooy Kan, A. H. G., 158
Ripley, Brian D., 334
Robert, Christian P., 146, 175, 251, 334
Roberts, Gareth O., 144, 146, 158, 256
Robertson, J. M., 275
Robinson, M. L., 33
Rogers, W. H., 301
Romeijn, H. Edwin, 158, 197
Ronning, Gerd, 208
Roof, R. B., 66
Rosen, Michael, 7, 9, 12
Rosen, Ori, 190
Rosenbaum, Paul R., 219
Rosenbluth, A. W., 140, 259, 277
Rosenbluth, M. N., 140, 259, 277
Rosenthal, Jeffrey S., 146, 149
Ross, Keith W., 119
Rotem, Doron, 219
Roux, J. J. J., 123, 208
Rubin, Donald B., 146, 149, 256
Ryan, T. P., 201
Saarinen, J., 86
Sack, Jörg-Rüdiger, 213
Sacks, Jerome, 248, 257
Sahu, Sujit K., 146
Saleev, V. M., 96
Salmond, D. J., 244
Sandhu, R. A., 223
Sarkar, P. K., 97
Sarkar, Tapas K., 178
Särndal, Carl-Erik, 218, 227, 239, 241
Schafer, J. L., 251
Scheffer, C. L., 158
Schervish, Mark J., 157, 158
Schladitz, Katja, 148
Schmeiser, Bruce W., 157, 158, 178, 183, 187, 188, 189, 194, 210, 221, 225, 246
Schott, René, 219
Schrage, Linus E., 172, 334
Schucany, William R., 193, 221
Selke, W., 41, 86
Sendrier, Nicolas, 4
Settle, J. G., 172, 185
Seznec, André, 4
Shahabuddin, Perwez, 225
Shao, Jun, 255
Shao, Qi-Man, 256
Shaw, J. E. H., 98
Shchur, Lev N., 41, 86, 260
Shedler, G. S., 225
Shephard, Neil, 193, 270
Shin, Dong Wan, 204, 214
Shiue, Peter Jau-Shyong, 334
Shub, M., 4, 37
Sibuya, M., 161
Simard, Richard, 14, 21, 67
Sinclair, C. D., 76
Sircar, K. Ronnie, 270
Smith, Adrian F. M., 130, 133, 150, 151, 157, 183, 244, 256
Smith, B., 26
Smith, Peter W. F., 252
Smith, Philip W., 30
Smith, Richard L., 222
Smith, Robert L., 158, 197
Smith, W. B., 199
Sobol', I. M., 94
Spanier, Jerome, 332, 334
Spiegelhalter, David J., 144, 146, 256
Spurr, B. D., 76
Srinivasan, Ashok, 53, 87
Stacy, E. W., 182
Stadlober, Ernst, 130, 131, 132, 187, 189
Stander, J., 147
Steel, S. J., 123
Stefănescu, S., 133
Stegun, Irene A., 175, 332
Stein, Michael, 249
Stephens, Michael A., 76
Stern, Hal S., 256
Stewart, G. W., 201
Stigler, Stephen M., 297
Stoll, Erich P., 41, 287
Stuck, B. W., 196
Sullivan, Stephen J., 89
Swartz, Tim, 233
Swensson, Bengt, 218, 227, 239, 241
Tadikamalla, Pandu R., 178, 195
Takahasi, K., 208
Talapov, A. L., 41, 86
Tan, K. K. C., 153
Tan, Ken Seng, 98
Tang, Boxin, 249
Tang, H. C., 33
Tanner, Martin A., 157, 201
Tapia, Richard A., 109
Tausworthe, R. C., 38
Taylor, Malcolm S., 212, 289
Telgen, J., 158
Teller, A. H., 140, 259, 277
Teller, E., 140, 259, 277
Teukolsky, Saul A., 287
Tezuka, Shu, 36, 47, 48, 97, 100, 172, 334
Thisted, Ronald A., 201
Thomas, Andrew, 256
Thompson, Elizabeth A., 154
Thompson, James R., 109, 212, 270, 289
Thompson, William J., 332
Tibshirani, Robert J., 255
Tierney, Luke, 137, 139, 144
Titterington, D. M., 26
Traub, J. F., 97
Tsang, Wai Wan, 127, 154, 174
Tsay, Liang-Huei, 79
Tsutakawa, Robert K., 233
Tu, Dongsheng, 255
Tu, Xin M., 186
Tukey, John W., 301
Turner, S., 194
Tweedie, R. L., 137, 225
Ueberhuber, Christoph W., 95
Underhill, L. G., 201
Văduva, I., 133
Vale, C. David, 210
Vattulainen, I., 21, 41, 79, 86, 87, 260
Vecchi, M. P., 259, 278
Vetterling, William T., 287
Vitter, Jeffrey Scott, 218, 219
Von Neumann, J., 121
Vose, Michael D., 135
Wakefield, J. C., 130, 133
Walker, A. J., 133
Walker, Stephen G., 168, 175, 182
Wallace, C. S., 121, 174
Wang, J., 49
Wang, Yuan, 7, 47, 97
Warnock, T., 26
Wegenkittl, Stefan, 37, 38
Welch, William J., 248, 257
Weller, G., 95, 98, 239
Whiten, William J., 95
Wichmann, B. A., 47, 55
Wichura, Michael J., 274
Wikramaratna, R. S., 45
Wild, P., 151
Wilks, Allan R., 291
Williamson, D., 66
Wilson, David Bruce, 147, 219
Wilson, James R., 221, 249
Wolfram, Stephen, 44
Wollan, Peter C., 52
Wong, Wing Hung, 157, 244
Wong, Y. Joanna, 21, 41, 86
Wood, G. R., 275
Wretman, Jan, 218, 227, 239, 241
Wu, Li-ming, 38, 78
Wu, Pei-Chi, 13
Wynn, Henry P., 248, 257
Yanagihara, Niro, 38, 78
Ylvisaker, Don, 257
Yu, Bin, 146
Yuan, Yilian, 49, 52, 53
Zaman, Arif, 35, 174
Zaremba, S. K., 7
Zeisel, H., 47
Zierler, Neal, 39
Ziff, Robert M., 41, 287
Zinterhof, Peter, 334
Ziv, J., 84
Subject Index

acceptance/complement method 125
acceptance/rejection method 113, 227
ACM Transactions on Mathematical Software 284, 332, 335
ACM Transactions on Modeling and Computer Simulation 332
ACORN congruential generator 45
adaptive direction sampling 158
adaptive rejection sampling 151
add-with-carry random number generator 35
additive congruential random number generator 11
alias method 133
alias-urn method 136
almost exact inversion 121
alternating conditional sampling 157
AMS MR classification system 332
analysis of variance 238
Anderson–Darling test 75
antithetic variates 26, 246
Applied Statistics 284, 332, 334
ARMA model 226
ARS (adaptive rejection sampling) 151
AWC random number generator 35
ball, generating random points in 202
batch means for variance estimation 237
Bernoulli distribution 105, 203
Bernoulli sampling 217
beta distribution 183
beta function 321
beta-binomial distribution 187, 204
Beyer ratio 66
binary matrix rank test 81
binary random variables 105, 203
binomial distribution 187
birthday spacing test 81
bit stream test 81
bit stripping 10, 13, 22
blocks, simulation experiments 51
Blum/Blum/Shub random number generator 37
Boltzmann distribution 258
bootstrap, nonparametric 253
bootstrap, parametric 254
Buffon needle problem 274
BUGS (software) 256
Burr distribution 194
Burr family of distributions 208
C (programming language) 283
CALGO (Collected Algorithms of the ACM) 332, 335
Cauchy distribution 191
CDF (cumulative distribution function) 102, 316
cellular automata 44
censored data, simulating 223
censored observations 168, 180
CFTP (coupling from the past) 147, 148
chaotic systems 45
characteristic function 136
Chebyshev generator 45
chi distribution 185
chi-squared distribution 180, 184
chi-squared test 74
chop-down method 108, 190
cluster algorithm 259
Collected Algorithms of the ACM (CALGO) 332, 335
combined multiple recursive generator 48, 287
common variates 246
Communications in Statistics — Simulation and Computation 333
complete beta function 321
complete gamma function 320
COMPSTAT 331, 333
Computational Statistics 333
Computational Statistics & Data Analysis 333
Computing Science and Statistics 333
concave density 119, 150
congruential random number generator 11
constrained random walk 234, 273
constrained sampling 248
contaminated distribution 169
control variate 245
convex density 151
378 correlated random variables 123 correlated random variables, generation 210, 221 correlation matrices, generating random ones 199 coupling from the past 147, 148 craps test 83 crude Monte Carlo 232 cryptography 3, 4, 37, 334 cumulative distribution function 316 Current Index to Statistics 332 cycle length of random number generator 3, 11, 22 D-distribution 183 d-variate uniformity 63 data augmentation 157 data-based random number generation 212, 289 DIEHARD tests for random number generators 80, 291 Dirac delta function 319 Dirichlet distribution 205 Dirichlet-multinomial distribution 206 discrepancy 69, 93 discrete uniform distribution 105, 217 DNA test for random numbers 82 double exponential distribution 177, 207 ECDF (empirical cumulative distribution function) 74, 210, 316 economical method 127 eigenvalues, generating ones from random Wishart matrices 201 elliptically contoured distribution 197, 207, 208 empirical cumulative distribution function 74, 316 empirical test 71 entropy 68 envelope 114 equidistributed 63 equivalence relationship 8 Erlang distribution 180 Euler totient function 9, 12 exact-approximation method 121 exact sampling 147, 148 exponential distribution 176 exponential power distribution 178 extended hypergeometric distribution 190 extended gamma processes 183 Faure sequence 94, 95 feedback shift register generator 38 Fibonacci random number generator 33 finite field 9 fixed-point representation 10
SUBJECT INDEX folded distributions 169 Fortran 95 283 Galois field 9, 38 gamma distribution 178, 208 gamma distribution, bivariate extension 208 gamma function 320 GAMS (Guide to Available Mathematical Software) 285, 335 GAMS, electronic access 335 GARCH model 226 generalized gamma distributions 182, 195 generalized inverse Gaussian distribution 193 generalized lambda family of distributions 194 geometric distribution 189 geometric splitting 241 GFSR (method) 38 Gibbs distribution 258 Gibbs method 149, 155, 256 GIS (geographic information system) 219 GNU Scientific Library (GSL) 287 goodness-of-fit test 74, 75 Google (Web search engine) 335 Gray code 96, 98 GSL (GNU Scientific Library) 287 halfnormal distribution 176 Halton sequence 94 Hamming weight 14 Hastings method 141 hat function 114 HAVEGE 4 Heaviside function 319 heavy-tailed distribution 196 hit-and-run method 157, 197 hit-or-miss Monte Carlo 116, 121, 232, 243, 271 hotbits 2 hybrid generator 98, 239 hypergeometric distribution 189 importance sampling 241, 271 importance-weighted resampling 149 IMSL Libraries 284, 288 incomplete beta function 321 incomplete gamma function 321 independence sampler 144 independent streams of random numbers 51 indicator function 319 infinitely divisible distribution 150 instrumental density 114 Interface Symposium 331, 333
SUBJECT INDEX International Association of Statistical Computing (IASC) 331, 333 interrupted sequence 230, 286, 290, 293 inverse CDF method for truncated distributions 168 inverse CDF method 102 inverse chi-squared distribution 169 “inverse” distributions 169 inverse gamma distribution 169 inverse Gaussian distribution 193 inverse Wishart distribution 169 inversive congruential generator 36 irreducible polynomial 38 Ising model 258 iterative method for random number generation 139, 155 Johnson family of distributions 194 Journal of Computational and Graphical Statistics 333 Journal of Statistical Computation and Simulation 333 k-d tree 212 Kepler conjecture 215 KISS (generator) 46 Kolmogorov distance 75 Kolmogorov–Smirnov test 74, 75 lagged Fibonacci generator 33 Lahiri’s sampling method 227 lambda family of distributions 194 Landau distribution 196 Laplace distribution 177, 207 Latin hypercube sampling 248 lattice test for random number generators 20, 66 leaped Halton sequence 95 leapfrogging, in random number generation 24, 43, 52 Lehmer congruential random number generator 11 Lehmer sequence 11 Lehmer tree 26 linear congruential random number generator 11 linear density 118 log-concave distributions 150 logarithmic distribution 190 lognormal distribution 176 Lorentzian distribution 191 M(RT)2 algorithm 259 machine epsilon 7 majorizing density 114, 203 Markov chain Monte Carlo 139, 144, 146, 156, 256
Markov chain 137 Markov process 224 Mathematical Reviews 332 Matlab (software) 284 matrix congruential generator 34 matrix congruential generator, multiple recursive 35 MCMC (Markov chain Monte Carlo) 139, 144, 146, 156, 256 Mersenne prime 13 Mersenne twister 42, 287 Metropolis algorithm 259 Metropolis–Hastings method 141, 156, 256 “minimal standard” generator 13, 20, 21, 28, 61, 86 minimum distance test 82 Minkowski reduced basis 66 mixture distributions 110, 169, 248 modular arithmetic 7 Monte Carlo evaluation of an integral 231 Monte Carlo experimentation 297 Monte Carlo study 297 Monte Carlo test 251 MR classification system 332 MT19937 (generator) 42, 287 multinomial distribution 198 multiple recursive random number generator 32, 35 multiplicative congruential random number generator 11 multiply-with-carry random number generator 36 multivariate distributions 197, 212 multivariate double exponential distribution 207 multivariate gamma distribution 208 multivariate hypergeometric distribution 207 multivariate Laplace distribution 207 multivariate normal distribution 197 multivariate stable distribution 208 nearest neighbors 212 nearly linear density 118 negative binomial distribution 188 netlib 285, 332, 335, vii Niederreiter sequence 94, 98 NIST Test Suite, for random number generators 83 noncentral hypergeometric distribution 190 noncentral Wishart distribution 200 nonhomogeneous Poisson process 225 nonlinear congruential generator 37 nonparametric bootstrap 253 norm, function 231 normal distribution 171
normal number 43, 91 one-way function 3 order of random number generator 3, 32 order statistics, generating random 221 Ornstein-Uhlenbeck process 264 orthogonal matrices, generating random ones 201 overdispersion 204 overlapping pairs test 81 overlapping permutation test 81 overlapping quadruples test 82 overlapping sums test 83 parallel processing 43, 51, 52 parallel random number generation 51 parametric bootstrap 254 Pareto distribution 192 Pareto-type distribution 196 parking lot test 82 particle filtering 234 Pascal distribution 188 patchwork method 118 Pearson family of distributions 194, 208 perfect sampling 147 period of random number generator 3, 11, 22, 220 permutation, generating random ones 217 π as a source of random numbers 44, 91 Poisson distribution 188 Poisson process, generating a random one 177 Poisson process, nonhomogeneous 225 Poisson sampling 218 portability of software 28, 54, 102, 122, 167 Potts model 260 primitive element 12 primitive polynomial 96 primitive root 12 probabilistic error bound 233, 235 probability-skewed distribution 170 Proceedings of the Statistical Computing Section (of the ASA) 333 projection pursuit 20 quasi-Monte Carlo method 93 quasirandom sequence 4, 94 R (software) 284, 291 R250 (generator) 41, 287 Random Master 3 random number generator, congruential 11 random number generator, feedback shift method 38 random number generator, parallel 51 random number generator, testing 71
random sampling 217 RANDU (generator) 18, 58, 87 rand 55, 285 RANLUX (generator) 45, 287 Rao-Blackwellization 247 ratio-of-uniforms method 129, 178, 185 Rayleigh distribution 191 rectangle/wedge/tail method 173, 177 reproducible research 286, 299 resampling 252 reservoir sampling 218 residue 8 robustness studies 169, 195, 298 roughness of a function 231 runs test 77, 83, 84 S, S-Plus (software) 284, 291 sampling, random 217 sampling/importance resampling 149 second-order test 71 seed 3, 11, 24, 26, 286, 290, 292 self-avoiding random walk 234, 273 sequential importance sampling 244 sequential Monte Carlo 233 serial test 78 setup time 165 shuffled random number generator 22, 46 SIAM Journal on Scientific Computing 333 side effect 285 simple random sampling 217 simplex 213 simulated annealing 140, 259, 277 simulated tempering 154, 261 simulation 1, 146, 297 SIR (sampling/importance resampling) 149 skew-normal distribution 170 skewed distributions 170 smoothed acceptance/rejection method, for random number generation 243 smoothing parameter 212 smoothing 212 Sobol’ sequence 94, 96, 98 software engineering 285 spanning trees, generating random ones 219 spectral test for random number generators 20, 65 sphere, generating random points on a sphere 201 SPRNG, software for parallel random number generation 53, 296 squeeze test 83 squeeze, in acceptance/rejection 117, 132 stable distribution 196, 208 standard distribution 167
Statistical Computing Section of the American Statistical Association 331, 333 Statistical Computing & Graphics Newsletter 333 Statistics and Computing 333 statlib 285, 333, 334, vii stratified distribution 110 stratified sampling 241 strict reproducibility 28, 54, 122, 230 Student’s t distribution 185 substitution sampling 157 substreams 23, 33, 43, 51 subtract-with-borrow random number generator 35 Super-Duper (generator) 46 SWC random number generator 35 Swendsen–Wang algorithm 259 swindle, Monte Carlo 240 T-concave distributions 153, 159 table, generating random tables with fixed marginals 202 table-lookup method 105 Tausworthe random number generator 38 tempered transition 155 test suites 79 testing random number generators 71 TestU01 tests for random number generators 80 thinning method 225 3-D sphere test 82 transcendental numbers as a source of random numbers 44 transformed density rejection method 153 transformed rejection method 121
truncated distribution 168, 223 truncated gamma distribution 180, 181, 182 truncated normal distribution 175, 198 twisted GFSR generator 42 twos-complement representation 10 underdispersion 204 uniform time algorithm 166 universal methods 102 unpredictable 4, 37 urn method 105, 136 van der Corput sequence 94 variance estimation 237 variance reduction 26, 239 variance-covariance matrices, generating random ones 199 Vavilov distribution 196 von Mises distribution 193 Wald distribution 193 Weibull distribution 186 weight window 241 weighted resampling 149 Wichmann/Hill random number generator 47, 59 Wilson–Hilferty approximation 175 Wishart distribution 199 Wolff algorithm 259 zeta distribution 192 ziggurat method 127, 174 Zipf distribution 192
Preface The role of Monte Carlo methods and simulation in all of the sciences has increased in importance during the past several years. This edition incorporates discussion of many advances in the field of random number generation and Monte Carlo methods since the appearance of the first edition of this book in 1998. These methods play a central role in the rapidly developing subdisciplines of the computational physical sciences, the computational life sciences, and the other computational sciences. The growing power of computers and the evolving simulation methodology have led to the recognition of computation as a third approach for advancing the natural sciences, together with theory and traditional experimentation. At the kernel of Monte Carlo simulation is random number generation. Generation of random numbers is also at the heart of many standard statistical methods. The random sampling required in most analyses is usually done by the computer. The computations required in Bayesian analysis have become viable because of Monte Carlo methods. This has led to much wider applications of Bayesian statistics, which, in turn, has led to development of new Monte Carlo methods and to refinement of existing procedures for random number generation. Various methods for generation of random numbers have been used. Sometimes, processes that are considered random are used, but for Monte Carlo methods, which depend on millions of random numbers, a physical process as a source of random numbers is generally cumbersome. Instead of “random” numbers, most applications use “pseudorandom” numbers, which are deterministic but “look like” they were generated randomly. Chapter 1 discusses methods for generation of sequences of pseudorandom numbers that simulate a uniform distribution over the unit interval (0, 1). These are the basic sequences from which are derived pseudorandom numbers from other distributions, pseudorandom samples, and pseudostochastic processes. In Chapter 1, as elsewhere in this book, the emphasis is on methods that work. Development of these methods often requires close attention to details. For example, whereas many texts on random number generation use the fact that the uniform distribution over (0, 1) is the same as the uniform distribution over (0, 1] or [0, 1], I emphasize the fact that we are simulating this disvii
tribution with a discrete set of “computer numbers”. In this case whether 0 and/or 1 is included does make a difference. A uniform random number generator should not yield a 0 or 1. Many authors ignore this fact. I learned it over twenty years ago, shortly after beginning to design industrial-strength software.

The Monte Carlo methods raise questions about the quality of the pseudorandom numbers that simulate physical processes and about the ability of those numbers to cover the range of a random variable adequately. In Chapter 2, I address some issues of the quality of pseudorandom generators.

Chapter 3 describes some of the basic issues in quasirandom sequences. These sequences are designed to be very regular in covering the support of the random process simulated.

Chapter 4 discusses general methods for transforming a uniform random deviate or a sequence of uniform random deviates into a deviate from a different distribution. Chapter 5 describes methods for some common specific distributions. The intent is not to provide a compendium in the manner of Devroye (1986a) but, for many standard distributions, to give at least a simple method or two, which may be the best method, but, if the better methods are quite complicated, to give references to those methods. Chapter 6 continues the developments of Chapters 4 and 5 to apply them to generation of samples and nonindependent sequences.

Chapter 7 considers some applications of random numbers. Some of these applications are to solve deterministic problems. This type of method is called Monte Carlo. Chapter 8 provides information on computer software for generation of random variates. The discussion concentrates on the S-Plus, R, and IMSL software systems.

Monte Carlo methods are widely used in the research literature to evaluate properties of statistical methods. Chapter 9 addresses some of the considerations that apply to this kind of study. I emphasize that a Monte Carlo study uses an experiment, and the principles of scientific experimentation should be observed.

The literature on random number generation and Monte Carlo methods is vast and ever-growing. There is a rather extensive list of references beginning on page 336; however, I do not attempt to provide a comprehensive bibliography or to distinguish the highly-varying quality of the literature.

The main prerequisite for this text is some background in what is generally called “mathematical statistics”. In the discussions and exercises involving multivariate distributions, some knowledge of matrices is assumed. Some scientific computer literacy is also necessary. I do not use any particular software system in the book, but I do assume the ability to program in either Fortran or C and the availability of either S-Plus, R, Matlab, or Maple. For some exercises, the required software can be obtained from either statlib or netlib (see the bibliography).

The book is intended to be both a reference and a textbook. It can be
used as the primary text or a supplementary text for a variety of courses at the graduate or advanced undergraduate level.

A course in Monte Carlo methods could proceed quickly through Chapter 1, skip Chapter 2, cover Chapters 3 through 6 rather carefully, and then, in Chapter 7, depending on the backgrounds of the students, discuss Monte Carlo applications in specific fields of interest. Alternatively, a course in Monte Carlo methods could begin with discussions of software to generate random numbers, as in Chapter 8, and then go on to cover Chapters 7 and 9. Although the material in Chapters 1 through 6 provides the background for understanding the methods, in this case the details of the algorithms are not covered, and the material in the first six chapters would only be used for reference as necessary.

General courses in statistical computing or computational statistics could use the book as a supplemental text, emphasizing either the algorithms or the Monte Carlo applications as appropriate. The sections that address computer implementations, such as Section 1.2, can generally be skipped without affecting the students’ preparation for later sections. (In any event, when computer implementations are discussed, note should be taken of my warnings about use of software for random number generation that has not been developed by software development professionals.)

In most classes that I teach in computational statistics, I give Exercise 9.3 in Chapter 9 (page 311) as a term project. It is to replicate and extend a Monte Carlo study reported in some recent journal article. In working on this exercise, the students learn the sad facts that many authors are irresponsible and many articles have been published without adequate review.
Acknowledgments

I thank John Kimmel of Springer for his encouragement and advice on this book and other books on which he has worked with me. I thank Bruce McCullough for comments that corrected some errors and improved clarity in a number of spots. I thank the anonymous reviewers of this edition for their comments and suggestions. I also thank the many readers of the first edition who informed me of errors and who otherwise provided comments or suggestions for improving the exposition. I thank my wife María, to whom this book is dedicated, for everything.

I did all of the typing, programming, etc., myself, so all mistakes are mine. I would appreciate receiving suggestions for improvement and notice of errors. Notes on this book, including errata, are available at

http://www.science.gmu.edu/~jgentle/rngbk/
Fairfax County, Virginia
James E. Gentle April 10, 2003
Contents

Preface

1 Simulating Random Numbers from a Uniform Distribution
1.1 Uniform Integers and an Approximate Uniform Density
1.2 Simple Linear Congruential Generators
1.2.1 Structure in the Generated Numbers
1.2.2 Tests of Simple Linear Congruential Generators
1.2.3 Shuffling the Output Stream
1.2.4 Generation of Substreams in Simple Linear Congruential Generators
1.3 Computer Implementation of Simple Linear Congruential Generators
1.3.1 Ensuring Exact Computations
1.3.2 Restriction that the Output Be in the Open Interval (0, 1)
1.3.3 Efficiency Considerations
1.3.4 Vector Processors
1.4 Other Linear Congruential Generators
1.4.1 Multiple Recursive Generators
1.4.2 Matrix Congruential Generators
1.4.3 Add-with-Carry, Subtract-with-Borrow, and Multiply-with-Carry Generators
1.5 Nonlinear Congruential Generators
1.5.1 Inversive Congruential Generators
1.5.2 Other Nonlinear Congruential Generators
1.6 Feedback Shift Register Generators
1.6.1 Generalized Feedback Shift Registers and Variations
1.6.2 Skipping Ahead in GFSR Generators
1.7 Other Sources of Uniform Random Numbers
1.7.1 Generators Based on Cellular Automata
1.7.2 Generators Based on Chaotic Systems
1.7.3 Other Recursive Generators
1.7.4 Tables of Random Numbers
1.8 Combining Generators
1.9 Properties of Combined Generators
1.10 Independent Streams and Parallel Random Number Generation
1.10.1 Skipping Ahead with Combination Generators
1.10.2 Different Generators for Different Streams
1.10.3 Quality of Parallel Random Number Streams
1.11 Portability of Random Number Generators
1.12 Summary
Exercises

2 Quality of Random Number Generators
2.1 Properties of Random Numbers
2.2 Measures of Lack of Fit
2.2.1 Measures Based on the Lattice Structure
2.2.2 Differences in Frequencies and Probabilities
2.2.3 Independence
2.3 Empirical Assessments
2.3.1 Statistical Goodness-of-Fit Tests
2.3.2 Comparisons of Simulated Results with Statistical Models in Physics
2.3.3 Anecdotal Evidence
2.3.4 Tests of Random Number Generators Used in Parallel
2.4 Programming Issues
2.5 Summary
Exercises

3 Quasirandom Numbers
3.1 Low Discrepancy
3.2 Types of Sequences
3.2.1 Halton Sequences
3.2.2 Sobol’ Sequences
3.2.3 Comparisons
3.2.4 Variations
3.2.5 Computations
3.3 Further Comments
Exercises

4 Transformations of Uniform Deviates: General Methods
4.1 Inverse CDF Method
4.2 Decompositions of Distributions
4.3 Transformations that Use More than One Uniform Deviate
4.4 Multivariate Uniform Distributions with Nonuniform Marginals
4.5 Acceptance/Rejection Methods
4.6 Mixtures and Acceptance Methods
4.7 Ratio-of-Uniforms Method
4.8 Alias Method
4.9 Use of the Characteristic Function
4.10 Use of Stationary Distributions of Markov Chains
4.11 Use of Conditional Distributions
4.12 Weighted Resampling
4.13 Methods for Distributions with Certain Special Properties
4.14 General Methods for Multivariate Distributions
4.15 Generating Samples from a Given Distribution
Exercises

5 Simulating Random Numbers from Specific Distributions
5.1 Modifications of Standard Distributions
5.2 Some Specific Univariate Distributions
5.2.1 Normal Distribution
5.2.2 Exponential, Double Exponential, and Exponential Power Distributions
5.2.3 Gamma Distribution
5.2.4 Beta Distribution
5.2.5 Chi-Squared, Student’s t, and F Distributions
5.2.6 Weibull Distribution
5.2.7 Binomial Distribution
5.2.8 Poisson Distribution
5.2.9 Negative Binomial and Geometric Distributions
5.2.10 Hypergeometric Distribution
5.2.11 Logarithmic Distribution
5.2.12 Other Specific Univariate Distributions
5.2.13 General Families of Univariate Distributions
5.3 Some Specific Multivariate Distributions
5.3.1 Multivariate Normal Distribution
5.3.2 Multinomial Distribution
5.3.3 Correlation Matrices and Variance-Covariance Matrices
5.3.4 Points on a Sphere
5.3.5 Two-Way Tables
5.3.6 Other Specific Multivariate Distributions
5.3.7 Families of Multivariate Distributions
5.4
After the guide table is set up, Algorithm 4.3 generates a random number from the given distribution.

Algorithm 4.3 Sampling a Discrete Random Variate Using the Chen and Asau Guide Table Method

1. Generate u from a U(0, 1) distribution, and set i = ⌈un⌉.
2. Set x = g_i + 1.
3. While Σ_{k=1}^{x−1} p_k > u, set x = x − 1.

Efficiency of the Inverse CDF for Discrete Distributions

Rather than using a stored table of the mass points of the distribution, we may seek other efficient methods of searching for the x in equation (4.2). The search can often be improved by knowledge of the relative magnitudes of the probabilities of the points. The basic idea is to begin at a point with a high probability of satisfying the relation (4.2). Obviously, the mode is a good place to begin the search, especially if the probability at the mode is quite high.
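A small sketch may help make the guide table concrete. The following code (Python is used for the sketches in this section purely for compactness; the function and variable names are mine, and the table is assumed to store, for each i, the largest mass-point index whose CDF value is below i/n, consistent with Algorithm 4.3) sets up the table and then samples from it.

    import math
    import random

    def build_guide_table(p):
        """Setup for Algorithm 4.3.  p[1], ..., p[n] are the mass
        probabilities (p[0] is unused); cdf[j] is the CDF at the jth mass
        point, and g[i] is the largest j with cdf[j] < i/n (0 if none)."""
        n = len(p) - 1
        cdf = [0.0] * (n + 1)
        for j in range(1, n + 1):
            cdf[j] = cdf[j - 1] + p[j]
        cdf[n] = 1.0               # guard against floating-point roundoff
        g = [0] * (n + 1)
        j = 0
        for i in range(1, n + 1):
            while j + 1 <= n and cdf[j + 1] < i / n:
                j += 1
            g[i] = j
        return cdf, g

    def guide_table_sample(cdf, g):
        """Algorithm 4.3: one table lookup, then a short downward search."""
        n = len(g) - 1
        u = random.random()
        i = math.ceil(u * n)
        x = g[i] + 1               # starting point from the guide table
        while cdf[x - 1] > u:      # search downward for the inverse CDF value
            x -= 1
        return x                   # index of the delivered mass point

Because the guide value already points near the solution of equation (4.2), the downward search takes fewer than two steps on average when there are as many guide values as mass points.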
For many discrete distributions of interest, there may be a simple recursive relationship between the probabilities of adjacent mass points:

    p(x) = f(p(x − 1))   for x > x0,

where f is some simple function (and we assume that the mass points differ by 1 and that x0 is the smallest value with positive mass). In the Poisson distribution (see page 188), for example, p(x) = θp(x − 1)/x for x > 0. For this case, Kemp (1981) describes two approaches. One is a “build-up search” method in which the CDF is built up by the recursive computation of the mass probabilities. This is Algorithm 4.4.

Algorithm 4.4 Build-Up Search for Discrete Distributions

0. Set t = p(x0).
1. Generate u from a U(0, 1) distribution, and set x = x0, px = t, and s = px.
2. If u ≤ s, then
   2.a. deliver x;
   otherwise,
   2.b. set x = x + 1, px = f(px), and s = s + px, and return to step 2.

The second method that uses the recursive evaluation of probabilities to speed up the search is a “chop-down” method in which the generated uniform variate is decreased by an amount equal to the CDF. This method is given in Algorithm 4.5.

Algorithm 4.5 Chop-Down Search for Discrete Distributions

0. Set t = p(x0).
1. Generate u from a U(0, 1) distribution, and set x = x0 and px = t.
2. If u ≤ px, then
   2.a. deliver x;
   otherwise,
   2.b. set u = u − px, x = x + 1, and px = f(px), and return to step 2.

Either of these methods could be modified to start at some other point, such as the mode.
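For the Poisson case mentioned above, the chop-down search takes a particularly simple form. The sketch below is a direct transcription of Algorithm 4.5 with x0 = 0, p(x0) = e^{−θ}, and f applied as p(x + 1) = θp(x)/(x + 1); note, as a practical caveat beyond what is said in the text, that for large θ the search from 0 is slow and e^{−θ} can underflow.

    import math
    import random

    def poisson_chop_down(theta):
        """Algorithm 4.5 specialized to the Poisson(theta) distribution,
        using the recursion p(x) = theta * p(x - 1) / x."""
        u = random.random()
        x = 0
        px = math.exp(-theta)      # p(x0), with x0 = 0
        while u > px:              # not yet accepted
            u -= px                # chop the current mass off u
            x += 1
            px = px * theta / x    # recursive update of p(x)
        return x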
Interpolating in Tables

Often, for a continuous random variable, we may have a table of values of the cumulative distribution function but not have a function representing the CDF over its full range. This situation may arise in applications in which a person familiar with the process can assign probabilities for the variable of interest yet may be unwilling to assume a particular distributional form. One approach to this problem is to fit a continuous function to the tabular values and then use the inverse CDF method on the interpolant. The simplest interpolating function, of course, is the piecewise linear function, but second- or third-degree polynomials may give a better fit. It is important, however, that the interpolant be monotone. Guerra, Tapia, and Thompson (1976) describe a scheme for approximating the CDF based on an interpolation method of Akima (1970). Their procedure is implemented in the IMSL routine rngct.

Multivariate Distributions

The inverse CDF method does not apply to a multivariate distribution, although marginal and conditional univariate distributions can be used in an inverse CDF method to generate multivariate random variates. If the CDF of the multivariate random variable (X1, X2, . . . , Xd) is decomposed as

    P_{X1 X2 ... Xd}(x1, x2, . . . , xd) = P_{X1}(x1) P_{X2|X1}(x2|x1) · · · P_{Xd|X1 X2 ... Xd−1}(xd|x1, x2, . . . , xd−1),

and if the functions are invertible, the inverse CDF method is applied sequentially using independent realizations of a U(0, 1) random variable, u1, u2, . . . , ud:

    x1 = P_{X1}^{−1}(u1),
    x2 = P_{X2|X1}^{−1}(u2),
    ...
    xd = P_{Xd|X1 X2 ... Xd−1}^{−1}(ud).

The modifications of the inverse CDF for discrete random variables described above can be applied if necessary.
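As a small illustration of this sequential use of inverse CDFs, the sketch below generates a bivariate deviate one coordinate at a time. The particular distributions (X1 standard exponential and, given X1 = x1, X2 exponential with rate x1 + 1) are my own example for illustration, not one taken from the text.

    import math
    import random

    def bivariate_by_conditionals():
        """Sequential inverse CDF: x1 = P_X1^{-1}(u1), then
        x2 = P_{X2|X1}^{-1}(u2), with u1 and u2 independent uniforms."""
        u1 = random.random()
        x1 = -math.log(1.0 - u1)                 # X1 ~ Exp(1)
        u2 = random.random()
        x2 = -math.log(1.0 - u2) / (x1 + 1.0)    # X2 | x1 ~ Exp(x1 + 1)
        return x1, x2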
4.2 Decompositions of Distributions
It is often useful to break up the range of the distribution of interest using one density over one subrange and another density over another subrange. More generally, we may represent the distribution of interest as a mixture distribution that is composed of proportions of other distributions. Suppose that the probability density or probability function of the random variable of interest,
p(·), can be represented as

    p(x) = Σ_{j=1}^{k} w_j p_j(x),                    (4.6)

where the p_j(·) are density functions or probability functions of random variables, the union of whose support is the support of the random variable of interest. We require

    w_j ≥ 0   and   Σ_{j=1}^{k} w_j = 1.
The random variable of interest has a mixture distribution. If the p_j are such that the pairwise intersections of the supports of the distributions are all null, the mixture is a stratification.

To generate a random deviate from a mixture distribution, first use a single uniform to select the component distribution, and then generate a deviate from it. The mixture can consist of any number of terms. To generate a sample of n random deviates from a mixture distribution of d distributions, consider the proportions to be the parameters of a d-variate multinomial distribution. The first step is to generate a single multinomial deviate, and then generate the required number of deviates from each of the component distributions.

Any decomposition of p into the sum of nonnegative integrable functions yields the decomposition in equation (4.6); the nonnegative w_j are chosen to sum to 1. For example, suppose that a distribution has density p(x), and for some constant c, p(x) ≥ c over (a, b). Then, the distribution can be decomposed into a mixture of a uniform distribution over (a, b) with proportion c(b − a) and some leftover part, say g(x). Now, g(x)/(1 − c(b − a)) is a probability density function. To generate a deviate from p: with probability c(b − a), generate a deviate from U(a, b); otherwise, generate a deviate from the density
    (1/(1 − c(b − a))) g(x).
If c(b − a) is close to 1, we will generate from the uniform distribution most of the time, so even if it is difficult to generate from g(x)/(1 − c(b − a)), this decomposition of the original distribution may be useful.

Another way of forming a mixture distribution is to consider a density similar to equation (4.6) that is a conditional density,

    p(x|y) = y p1(x) + (1 − y) p2(x),
where y is the realization of a Bernoulli random variable, Y. If Y takes a value of 1 with probability w1/(w1 + w2), then the density in equation (4.6) is the marginal density. This conditional distribution yields

    p_X(x) = ∫ p_{X,Y}(x, y) dy
           = Σ_y p_{X|Y=y}(x) Pr(Y = y)
           = w1 p1(x) + w2 p2(x),

as in equation (4.6). More generally, for any random variable X with a distribution parameterized by θ, we can think of the parameter as being the realization of a random variable Θ. Some common distributions result from mixing other distributions; for example, if the gamma distribution is used to generate the parameter in a Poisson distribution, a negative binomial distribution is formed. Mixture distributions are often useful in their own right; for example, the beta-binomial distribution (see page 187) can be used to model overdispersion.
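In code, the recipe for a single deviate from a two-component mixture is simply a branch on one uniform; the sketch below makes the point (the two component samplers are arbitrary stand-ins supplied by the caller).

    import random

    def sample_mixture(w1, sample1, sample2):
        """One deviate from w1*p1 + (1 - w1)*p2: a single uniform selects
        the component, which then supplies the deviate."""
        if random.random() <= w1:
            return sample1()
        return sample2()

    # For example, the mixture 0.3 U(0, 1) + 0.7 U(2, 3):
    x = sample_mixture(0.3, random.random,
                       lambda: 2.0 + random.random())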
4.3 Transformations that Use More than One Uniform Deviate
Methods for generating random deviates by first decomposing the distribution of interest require the use of more than one uniform deviate for each deviate from the target distribution. Most other methods discussed in this chapter also require more than one uniform deviate for each deviate of interest. For such methods we must be careful to avoid any deleterious effects of correlations in the underlying uniform generator.

An example of a short-range correlation occurs in the use of a congruential generator, x_i ≡ a x_{i−1} mod m, when x_{i−1} is extremely small. In this case, the value of x_i is just a x_{i−1} with no modular reduction. A small value of x_{i−1} may correspond to some extreme intermediate value in one of the constituent distributions in the decomposition of the density in equation (4.6). Because x_i = a x_{i−1}, when x_i is used to complete the transformation to the variate of interest, it may happen that the extreme values of that variate do not cover their appropriate range.

As a simple example, consider a method for generating a variate from a double exponential distribution. One way to do this is to use one uniform variate to generate an exponential variate (using one of the methods that we discuss below) and then use a second uniform variate to decide whether to change the sign of the exponential variate (with probability 1/2). Suppose that the method for generating an exponential variate yields an extremely large value if the underlying uniform variate is extremely small. (The method given
by equation (5.10) on page 176 does this.) If the next uniform deviate from the basic generator is used to determine whether to change the sign, it may happen that all of the extreme double exponentials generated have the same sign.

Many such problems arise because of a poor uniform generator; a particular culprit is a multiplicative congruential generator with a small multiplier. Use of a high-quality uniform generator generally solves the problem. A more conservative approach may be to use a different uniform generator for each uniform deviate used in the generation of a single nonuniform deviate. For this to be effective, each generator must be of high quality, of course.

Because successive numbers in a quasirandom sequence are constructed so as to span a space systematically, such sequences generally should not be used when more than one uniform deviate is transformed into a single deviate from another distribution. The autocorrelations in the quasirandom sequence may prevent certain ranges of values of the transformations from being realized.

A common way in which uniform deviates are transformed to deviates from nonuniform distributions is to use one uniform random number to make a decision about how to use another uniform random number. The decision is often based on a comparison of two floating-point numbers. In rare cases, because of slight differences in rounding to a finite precision, this comparison may result in different decisions in different computer environments. The different decisions can result in the generation of different output streams from that point on. Our goal of completely portable random number generators (Section 1.11) may not be achieved when comparisons are made between two floating-point numbers that might differ in the least significant bits on different systems.
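The double exponential example described above takes only a few lines. In this sketch, the exponential deviate is obtained by the inverse CDF (I use the form −log(1 − u); equation (5.10) is referred to but not reproduced here), and the cautions just discussed apply to the two consecutive uniforms.

    import math
    import random

    def double_exponential():
        """One uniform yields an Exp(1) magnitude via the inverse CDF; a
        second, independent uniform sets the sign with probability 1/2."""
        x = -math.log(1.0 - random.random())   # exponential deviate
        if random.random() < 0.5:              # change the sign half the time
            x = -x
        return x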
4.4 Multivariate Uniform Distributions with Nonuniform Marginals
Suppose that p_X is a continuous probability density function, and consider the set

    S = {(x, u), s.t. 0 ≤ u ≤ p_X(x)}.                (4.7)

Let (X, U) be a bivariate random variable with uniform distribution over S. Its density function is

    p_{XU}(x, u) = I_S(x, u).                         (4.8)

The conditional distribution of U given X = x is U(0, p_X(x)), and the conditional distribution of X given U = u is also uniform, with density

    p_{X|U}(x|u) = I_{{t, s.t. p_X(t) ≥ u}}(x).

The important fact, which we see by integrating u out of the density in equation (4.8), is that the marginal distribution of X has density p_X. This can be seen in Figure 4.3, where the points are uniformly distributed over S, but the marginal histogram of the x values corresponds to the density p_X.
Figure 4.3: Support of a Bivariate Uniform Random Variable (X, U) Having a Marginal with Density p(x)

These facts form the basis of methods of generating random deviates from various nonuniform distributions. The effort in these methods is expended in getting the bivariate uniform points over the region S. In most cases, this is done by generating bivariate points uniformly over some larger region and then rejecting those points that are not in the region S. This same approach is valid if the random variable X is a vector. In this case, we would identify a higher-dimensional region S with a scalar u and a vector x corresponding respectively to a scalar uniform random variable and the vector random variable X.
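These facts can be demonstrated directly. In the sketch below, uniform points over S are obtained by rejection from a bounding box, for the Beta(3, 2) density p(x) = 12x²(1 − x) used in Figure 4.3 (the constant 16/9 is the maximum of this density, a value computed here for the illustration); the delivered x values then have density p.

    import random

    def x_marginal_over_S():
        """Generate (x, u) uniformly over S = {(x, u): 0 <= u <= p(x)} by
        rejection from the box (0, 1) x (0, 16/9), where p(x) =
        12 x^2 (1 - x); return x, whose marginal density is p."""
        pmax = 16.0 / 9.0                       # p attains its maximum at x = 2/3
        while True:
            x = random.random()
            u = pmax * random.random()
            if u <= 12.0 * x * x * (1.0 - x):   # the point landed in S
                return x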
4.5 Acceptance/Rejection Methods
To generate realizations of a random variable X, an acceptance/rejection method makes use of realizations of another random variable Y having probability density gY similar to the probability density of X, pX . The basic idea is that selective subsamples from samples from one distribution are stochastically equivalent to samples from a different distribution. The acceptance/rejection technique is one of the most important methods in random number generation, and it occurs in many variations.
Majorizing the Density

In the basic form of the method, to generate a deviate from a distribution with density p_X, a random variable Y is chosen so that we can easily generate realizations of it and so that its density g_Y can be scaled to majorize p_X using some constant c; that is, so that c g_Y(x) ≥ p_X(x) for all x. The density g_Y is called the majorizing density, and c g_Y is called the majorizing function. The majorizing function is also called the “envelope” or the “hat function”. The majorizing density is also sometimes called the “trial density”, the “proposal density”, or the “instrumental density”.

There are many variations of the acceptance/rejection method. The method described here uses a sequence of i.i.d. variates from the majorizing density. It is also possible to use a sequence from a conditional majorizing density. A method using a nonindependent sequence is called a Metropolis method (and there are variations of these, with their own names, as we see below). Unlike the inverse CDF method, the acceptance/rejection method applies immediately to multivariate random variables, although, as we will see, the method may not be very efficient in high dimensions.

Algorithm 4.6 The Acceptance/Rejection Method to Convert Uniform Random Numbers

1. Generate y from the distribution with density function g_Y.
2. Generate u from a U(0, 1) distribution.
3. If u ≤ p_X(y)/(c g_Y(y)), then
   3.a. take y as the desired realization;
   otherwise,
   3.b. return to step 1.

It is easy to see that the random number delivered by Algorithm 4.6 has a density p_X. (In Exercise 4.2, page 160, you are asked to write the formal proof.) The pairs (u, y) that are accepted follow a bivariate uniform distribution over the region S in equation (4.7). Figure 4.4 illustrates the functions used in the acceptance/rejection method. (Figure 4.4 shows the same density used in Figure 4.3 with a different scaling of the axes. The density is the beta distribution with parameters 3 and 2. In Exercise 4.3, page 160, you are asked to write a program implementing the acceptance/rejection method with the majorizing density shown.) The acceptance/rejection method can be visualized as choosing a subsequence from a sequence of independently and identically distributed (i.i.d.) realizations from the distribution with density g_Y in such a way that the subsequence has density p_X, as shown in Figure 4.5.

If we ignore the time required to generate y from the dominating density g_Y, the closer c g_Y(x) is to p_X(x) (that is, the closer c is to its lower bound of 1), the faster the acceptance/rejection algorithm will be.
Figure 4.4: The Acceptance/Rejection Method to Convert Uniform Random Numbers

The proportion of acceptances to the total number of trials is the ratio of the area marked “A” in Figure 4.4 to the total area of region “A” and region “R”. Because p_X is a density, the area of “A” is 1, so the relevant proportion is

    1/(r + 1),                                        (4.9)
where r is the area between the curves. This ratio only relates to the efficiency of the acceptance; other considerations in the efficiency, of course, involve the amount of computation necessary to generate from the majorizing density. The random variable corresponding to the number of passes through the steps of Algorithm 4.6 until the desired variate is delivered has a geometric distribution (equation (5.21) on page 189, except beginning at 1 instead of 0) with parameter π = 1/(r + 1).

Figure 4.5: Acceptance/Rejection. A sequence y_i, y_{i+1}, ... of i.i.d. deviates from g_Y is thinned by the accept/reject decision into a subsequence x_j, x_{j+1}, ... that is i.i.d. from p_X.

Selection of a majorizing function involves the principles of function approximation with the added constraint that the approximating function be a
probability density from which it is easy to generate random variates. Often, g_Y is chosen to be a very simple density, such as a uniform or a triangular density. When the dominating density is uniform, the acceptance/rejection method is similar to the “hit-or-miss” method (see Exercise 7.2, page 271).

The acceptance/rejection method can be used for multivariate random variables, in which case the majorizing distribution is also multivariate. For higher dimensions, however, the acceptance ratio (4.9) may be very small.

Figure 4.6: Normal (0, 1) Density with a Normal (0, 2) Majorizing Density

Consider the use of a normal with mean 0 and variance 2 as a majorizing density for a normal with mean 0 and variance 1, as shown in Figure 4.6. A majorizing density with a shape more closely approximating that of the target density would be more efficient. (This majorizing function is just chosen for illustration. An obvious problem in this case would be that if we could generate deviates from the N(0, 2) distribution, then we could generate ones from the N(0, 1) distribution, and we would not use this method.) In the one-dimensional case, as shown in Figure 4.6, the acceptance region is the area under the lower curve, and the rejection region is the thin shell between the two curves. The acceptance proportion (4.9) is 1/√2. (Note that c = √2.) In higher dimensions, even a thin shell contains most of the volume, so the rejection proportion would be high. In d dimensions, use of a multivariate normal with a diagonal variance-covariance matrix with all entries equal to 2 as a majorizing density to generate a multivariate normal with a diagonal variance-covariance matrix with all entries equal to 1 would have an acceptance proportion of only (1/√2)^d.
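A minimal sketch of Algorithm 4.6 for the Beta(3, 2) density of Figure 4.4 follows. Here the majorizing density is taken to be uniform, with c = 16/9, the maximum of the density; the figure itself shows a different majorizer, so this choice is only an assumption for illustration.

    import random

    def beta32_acceptance_rejection():
        """Algorithm 4.6 for p(x) = 12 x^2 (1 - x) on (0, 1), with
        g_Y = U(0, 1) and c = p(2/3) = 16/9, so that c g_Y >= p."""
        c = 16.0 / 9.0
        while True:
            y = random.random()                     # step 1: y from g_Y
            u = random.random()                     # step 2: u from U(0, 1)
            if u <= 12.0 * y * y * (1.0 - y) / c:   # step 3: accept?
                return y

The acceptance proportion here is 1/c = 9/16, corresponding to r = 7/9 in expression (4.9).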
Figure 4.7: The Acceptance/Rejection Method with a Squeeze Function

Reducing the Computations in Acceptance/Rejection: Squeeze Functions

A primary concern in reducing the number of computations in the acceptance/rejection method is to ensure that the proportion of acceptances is high; that is, that the ratio (4.9) is close to one. Two other issues are the difficulty in generating variates from the majorizing density and the speed of the computations to determine whether to accept or to reject. If the target density, p, is difficult to evaluate, an easy way of speeding up the process is to use simple functions that bracket p to avoid the evaluation of p with a high probability. This method is called a “squeeze” (see Marsaglia, 1977). This allows quicker acceptance. The squeeze function is often a linear or piecewise linear function. The basic idea is to do pretests using simpler functions. Most algorithms that use a squeeze function only use one below the density of interest. Figure 4.7 shows a piecewise linear squeeze function for the acceptance/rejection setup of Figure 4.4.

For a given trial value y, before evaluating p_X(y) we may evaluate the simpler s(y). If u ≤ s(y)/(c g_Y(y)), then u ≤ p_X(y)/(c g_Y(y)), so we can accept without computing p_X(y). Pairs (y, u) lying in the region marked “Q” allow for quick acceptance. The efficiency of an acceptance/rejection method with a squeeze function depends not only on the area between the majorizing function and the target density, as in equation (4.9), but also on the difference between the total area of the acceptance region, which is 1, and the area under the squeeze function (that is, the area of the region marked “Q”). The closer this area is to 1, the more effective is the squeeze. These ratios of areas relate only to the efficiency of the
acceptance and the quick acceptance. Other considerations in the efficiency, of course, involve the amount of computation necessary to generate from the majorizing density and the amount of computation necessary to evaluate the squeeze function, which, it is presumed, is very small.

Another procedure for making the acceptance/rejection decision with fewer computations is the “patchwork” method of Kemp (1990). In this method, the unit square is divided into rectangles that correspond to pairs of uniform deviates that would lead to acceptance, rejection, or lack of decision. The full evaluations for the acceptance/rejection algorithm need be performed only if the pair of uniform deviates to be used are in a rectangle of the latter type.

For a density that is nearly linear (or nearly linear over some range), Marsaglia (1962) and Knuth (1998) describe some methods for efficient generation. These methods make use of simple methods for generating from a density that is exactly linear. Use of an inverse CDF method for a distribution with a density that is exactly linear over some range involves a square root operation, but another simple way of generating from a linear density is to use the maximum order statistic of a sample of size two from a uniform distribution; that is, independently generate two U(0, 1) variates, u1 and u2, and use max(u1, u2). (Order statistics from a uniform distribution have a beta distribution; see Section 6.4.1, page 221.)

Following Knuth’s development, suppose that, as in Figure 4.8, the density over the interval (s, s + h) is bounded by two parallel lines, l1(x) = a − b(x − s)/h and l2(x) = b − b(x − s)/h. Consider the density p(x) shown in Figure 4.8. Algorithm 4.7, which is Knuth’s method, yields deviates from the distribution with density p. Notice the use of the maximum of two uniform deviates to generate from an exactly linear density. By determining the probability that the resulting deviate falls in any given interval, it is easy to see that the algorithm yields deviates from the given density. You are asked to show this formally in Exercise 4.8, page 161. (The solution to the exercise is given in Appendix B.)

Algorithm 4.7 Sampling from a Nearly Linear Density

1. Generate u1 and u2 independently from a U(0, 1) distribution. Set u = min(u1, u2), v = max(u1, u2), and x = s + hu.
2. If v ≤ a/b, then
   2.a. go to step 3;
   otherwise,
   2.b. if v > u + p(x)/b, go to step 1.
3. Deliver x.
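To make Algorithm 4.7 concrete, the sketch below uses a density of my own construction, p(x) = 5/3 − (2/3)x − x² on (0, 1), which satisfies l1(x) = 5/3 − (8/3)x ≤ p(x) ≤ 8/3 − (8/3)x = l2(x), so that s = 0, h = 1, a = 5/3, and b = 8/3.

    import random

    def nearly_linear_sample():
        """Algorithm 4.7 for p(x) = 5/3 - (2/3) x - x^2 on (0, 1),
        with s = 0, h = 1, a = 5/3, and b = 8/3."""
        a, b = 5.0 / 3.0, 8.0 / 3.0
        while True:
            u1, u2 = random.random(), random.random()
            u, v = min(u1, u2), max(u1, u2)            # step 1
            x = u                                      # x = s + h u
            if v <= a / b:                             # step 2: quick acceptance
                return x
            p = 5.0 / 3.0 - (2.0 / 3.0) * x - x * x
            if v <= u + p / b:                         # step 2.b: full test
                return x                               # step 3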
Figure 4.8: A Nearly Linear Density

Usually, when we take advantage of the fact that a density is nearly linear, it is not the complete density that is linear; rather, the nearly linear density is combined with other densities to form the density of interest. The density shown in Figure 4.8 may be the density over some interval (s, s + h), so that ∫_s^{s+h} p(x) dx = p̄ < 1. (See the discussion of mixtures of densities in Section 4.2.)

For densities that are concave, we can also very easily form linear majorizing and linear squeeze functions. The majorizing function is a polygon of tangents, and the squeeze function is a polygon of secants, as shown in Figure 4.9 (for the density p(x) = (3/4)(1 − x²) over [−1, 1]). Any number of polygonal sections could be used in this approach. The tradeoffs involve the amount of setup and housekeeping for the polygonal sections, the proportion of total rejections, and the proportion of easy acceptances. The formation of the majorizing and squeeze functions can be done adaptively or sequentially, as we discuss on page 151.

Acceptance/Rejection for Discrete Distributions

There are various ways that acceptance/rejection can be used for discrete distributions. One advantage of these methods is that they can be easily adapted to changes in the distribution. Rajasekaran and Ross (1993) consider the discrete random variable X_s such that

    Pr(X_s = x_i) = p_{si} = a_{si} / (a_{s1} + a_{s2} + · · · + a_{sk}),   i = 1, . . . , k.

(If Σ_{i=1}^{k} a_{si} = 1, the numerator a_{si} is the ordinary probability p_{si} at the mass point i.) Suppose that there exists an a*_i such that a*_i ≥ a_{si} for s = 1, 2, . . . and
b > 0 such that Σ_{i=1}^{k} a_{si} ≥ b for s = 1, 2, . . .; let a* = max{a*_i}, and let

    P_{si} = a_{si}/a*   for i = 1, . . . , k.

Figure 4.9: Linear Majorizing and Squeeze Functions for a Concave Density
The generation method for X_s is shown in Algorithm 4.8.

Algorithm 4.8 Acceptance/Rejection Method for Discrete Distributions

1. Generate u from a U(0, 1) distribution, and let i = ⌈ku⌉.
2. Let r = i − ku.
3. If r ≤ P_{si}, then
   3.a. take i as the desired realization;
   otherwise,
   3.b. return to step 1.

Suppose that for the random variable X_{s+1}, p_{s+1,i} ≠ p_{si} for some i. (Of course, if this is the case for mass point i, it is also necessarily the case for some other mass point.) For each mass point for which the probability changes, reset P_{s+1,i} to a_{s+1,i}/a* and continue with Algorithm 4.8. Rajasekaran and Ross (1993) also gave two other acceptance/rejection-type algorithms for discrete distributions that are particularly efficient for use with distributions that may be changing. The other algorithms require slightly more preprocessing time but yield faster generation times than Algorithm 4.8.
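A sketch of Algorithm 4.8 follows. The arguments are mine: a_s holds the current values a_{si}, and a_star is an upper bound on every a_{si} over all s, as required above.

    import math
    import random

    def discrete_acceptance_rejection(a_s, a_star):
        """Algorithm 4.8: a single uniform provides both the candidate
        mass point i = ceil(k u) and, through r = i - k u, a U(0, 1)
        value for the acceptance test r <= a_{si}/a*."""
        k = len(a_s)
        while True:
            u = random.random()
            if u == 0.0:               # guard the measure-zero case u = 0
                continue
            i = math.ceil(k * u)       # candidate mass point, 1-based
            r = i - k * u              # conditionally U(0, 1)
            if r <= a_s[i - 1] / a_star:
                return i

When a probability changes from X_s to X_{s+1}, only the affected entries of a_s need to be replaced; nothing else in the setup is recomputed, which is the appeal of the method for changing distributions.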
Variations of Acceptance/Rejection

There are many variations of the basic acceptance/rejection method, and the idea of selection of variates from one distribution to form a sample from a different distribution forms the basis of several other methods discussed in this chapter, such as formation of ratios of uniform deviates, use of the characteristic function, and various uses of Markov chains.

Wallace (1976) introduced a modified acceptance/rejection method called transformed rejection. In the transformed acceptance/rejection method, the steps of Algorithm 4.6 are combined and rearranged slightly. Let G be the CDF corresponding to the dominating density g. Let H(x) = G^{−1}(x), and let h(x) = dH(x)/dx. If v is a U(0, 1) deviate, step 1 in Algorithm 4.6 is equivalent to y = H(v), so we have Algorithm 4.9.

Algorithm 4.9 The Transformed Acceptance/Rejection Method

1. Generate u and v independently from a U(0, 1) distribution.
2. If u ≤ p(H(v))h(v)/c, then
   2.a. take H(v) as the desired realization;
   otherwise,
   2.b. return to step 1.

Marsaglia (1984) describes a method very similar to the transformed acceptance/rejection method: use ordinary acceptance/rejection to generate a variate x from the density proportional to p(H(·))h(·) and then return H(x). The choice of H is critical to the efficiency of the method, of course. It should be close to the inverse of the CDF of the target distribution, P^{−1}. Marsaglia called this the exact-approximation method. Devroye (1986a) calls the method almost exact inversion.

Other Applications of Acceptance/Rejection

The acceptance/rejection method can often be used to evaluate an elementary function at a random point. Suppose, for example, that we wish to evaluate tan(πU) for U distributed as U(−.5, .5). A realization of tan(πU) can be simulated by generating u1 and u2 independently from U(−1, 1), checking whether u1² + u2² ≤ 1, and, if so, delivering u1/u2 as tan(πu). (To see this, think of u1 and u2 as sine and cosine values.) Von Neumann (1951) gives an acceptance/rejection method for generating sines and cosines of random angles. An example of evaluating a logarithm can be constructed by use of the equivalence of an inverse CDF method and an acceptance/rejection method for sampling an exponential random deviate. (The methods are equivalent in a stochastic sense; they are both valid, but they will not yield the same stream of deviates.) These methods of evaluating deterministic functions are essentially the same as using the “hit-or-miss” Monte Carlo method described in Exercise 7.2 on page 271 to evaluate an integral.
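The tangent evaluation just described is easily coded; the sketch below simply adds a guard against the measure-zero event u2 = 0.

    import random

    def random_tangent():
        """Simulate tan(pi * U) for U ~ U(-1/2, 1/2) without calling tan:
        (u1, u2) uniform over the square (-1, 1) x (-1, 1) is accepted
        only if it falls in the unit disk; u1 and u2 then act as a random
        sine and cosine pair, and u1/u2 is the desired tangent."""
        while True:
            u1 = 2.0 * random.random() - 1.0
            u2 = 2.0 * random.random() - 1.0
            if u1 * u1 + u2 * u2 <= 1.0 and u2 != 0.0:
                return u1 / u2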
Generally, if reasonable numerical software is available for evaluating special functions, it should be used rather than using Monte Carlo methods to estimate the function values.

Quality and Portability of Acceptance/Rejection Methods

Acceptance/rejection methods, like any method for generating nonuniform random numbers, are dependent on a good source of uniform deviates. Hörmann and Derflinger (1993) illustrate that small values of the multiplier in a congruential generator for the uniform deviates can result in poor quality of the output from an acceptance/rejection method. Of course, we have seen that small multipliers are not good for generating uniform deviates. (See the discussion about Figure 1.3, page 16.) Hörmann and Derflinger rediscover the method of expression (1.23) and recommend using it so that larger multipliers can be used in the linear congruential generator.

Acceptance/rejection methods generally use two uniform deviates to decide whether to deliver one variate of interest. In implementing an acceptance/rejection method, we must be aware of the cautionary note in Section 4.3, page 111. If the y in Algorithm 4.6 is special (extreme, perhaps) and results from a special value from the uniform generator, the u generated subsequently may also be special and may almost always result in the same decision to accept or to reject. Thus, we may get either an abundance or a deficiency of special values for the distribution of interest.

Because of the comparison of floating-point numbers (that occurs in step 3 of Algorithm 4.6), there is a chance that an acceptance/rejection method may yield different streams on different computer systems or in implementations in different precisions. Even if the computations are carried out correctly, the program is inherently nonportable, and the results may not be strictly reproducible, because if a comparison on one system at a given precision results in acceptance and the comparison on another system results in rejection, the two output streams will be different. At best, the streams will be the same except for a few differences; at worst, however, because of how the output is used, the results will be different beginning at the point at which the acceptance/rejection decision is different. If the decision results in the generation of another random number (as in Algorithm 4.10 on page 126), the two output streams can become completely different.

Acceptance/Rejection for Multivariate Distributions

The acceptance/rejection method is one of the most widely applicable methods for random number generation. It is used in many different forms, often in combination with other methods. It is clear from the description of the algorithm that the acceptance/rejection method applies equally to multivariate distributions. (The uniform random number is still univariate, of course.)
As we have mentioned, however, for higher dimensions, the rejection proportion may be high, and thus the efficiency of the acceptance/rejection method may be low.

Example of Acceptance/Rejection: A Bivariate Gamma Distribution

Becker and Roux (1981) defined a bivariate extension of the gamma distribution that serves as a useful model for failure times for two related components in a system. (The model is also a generalization of a bivariate exponential distribution introduced by Freund, 1961; see Steel and Le Roux, 1987.) The probability density is given by

    p_{X1 X2}(x1, x2) =

        (Γ(α1) Γ(α2) β1^{α1} β2^{α2})^{−1} λ2 x1^{α1−1} (λ2(x2 − x1) + x1)^{α2−1}
            × exp( −(1/β1 + 1/β2 − λ2/β2) x1 − (λ2/β2) x2 )       for 0 ≤ x1 ≤ x2,

        (Γ(α1) Γ(α2) β1^{α1} β2^{α2})^{−1} λ1 x2^{α2−1} (λ1(x1 − x2) + x2)^{α1−1}
            × exp( −(1/β1 + 1/β2 − λ1/β1) x2 − (λ1/β1) x1 )       for 0 ≤ x2 < x1,

        0                                                          elsewhere.    (4.10)

The density for α1 = 4, α2 = 3, β1 = 3, β2 = 1, λ1 = 3, and λ2 = 2 is shown in Figure 4.10.

It is a little more complicated to determine a majorizing density for this distribution. First of all, not many bivariate densities are familiar to us. The density must have support over the positive quadrant. A bivariate normal density might be tried, but the exp(−(u1 x1 + u2 x2)²) term in the normal density dies out more rapidly than the exp(−v1 x1 − v2 x2) term in the gamma density. The normal cannot majorize the gamma in the limit. We may be concerned about covariance of the variables in the bivariate gamma distribution, but the fact that the variables have nonzero covariance is of little concern in using the acceptance/rejection method. The main thing, of course, is that we determine a majorizing density so that the probability of acceptance is high. We can use a bivariate density of independent variables as the majorizing density. The density would be the product of two univariate densities. A bivariate distribution of independent exponentials might work. Such a density has a maximum at (0, 0), however, and there would be a large volume between the bivariate gamma density and the majorizing function formed from a bivariate exponential density. We can reduce this volume by choosing a bivariate uniform over the rectangle with corners (0, 0) and (z1, z2).
Figure 4.10: A Bivariate Gamma Density, Equation (4.10)

Our majorizing density then is composed of two densities, a bivariate exponential,

g_1(y_1, y_2) =
\begin{cases}
\frac{1}{v} \exp\left( -\frac{y_1}{\theta_1} - \frac{y_2}{\theta_2} \right)
& \text{for } y_1 > z_1 \text{ and } y_2 > 0, \text{ or } y_1 > 0 \text{ and } y_2 > z_2, \\
0 & \text{elsewhere,}
\end{cases}    (4.11)

where the constant v is chosen to make g1 a density, and a bivariate uniform,

g_2(y_1, y_2) =
\begin{cases}
\frac{1}{z_1 z_2} & \text{for } 0 < y_1 \le z_1 \text{ and } 0 < y_2 \le z_2, \\
0 & \text{elsewhere.}
\end{cases}    (4.12)

Next, we choose θ1 and θ2 so that the bivariate exponential density can majorize the bivariate gamma density. This requires that

\frac{1}{\theta_1} \ge \max\left( \frac{1}{\beta_1} + \frac{1}{\beta_2} - \frac{\lambda_2}{\beta_2},\ \frac{\lambda_1}{\beta_1} \right),

with a similar requirement for θ2. Let us choose θ1 = 1 and θ2 = 2. Next, we choose z1 and z2 as the mode of the bivariate gamma density. This point is (4 1/3, 2). We now choose c so that c g1(z1, z2) ≥ p(z1, z2). The method is:

1. Generate u from a U(0, 1) distribution.
2. Generate (y1, y2) from a bivariate exponential density such as (4.11), except over the full range; that is, with v = θ1θ2.
3. If (y1, y2) is outside of the rectangle with corners (0, 0) and (z1, z2), then
   3.a. if u ≤ p(y1, y2)/(c g1(y1, y2)), then
      3.a.i. deliver (y1, y2);
      otherwise,
      3.a.ii. go to step 1;
   otherwise,
   3.b. generate (y1, y2) as bivariate uniform deviates in that rectangle, and if u ≤ p(y1, y2)/(c y1 y2), then
      3.b.i. deliver (y1, y2);
      otherwise,
      3.b.ii. go to step 1.

The majorizing density could be changed so that it is closer to the bivariate gamma density. In particular, instead of the uniform density over the rectangle with a corner on the origin, a pyramidal density that is closer to the bivariate gamma density could be used.
4.6 Mixtures and Acceptance Methods
In practice, in acceptance/rejection methods, the density of interest p and/or the majorizing density are often decomposed into mixtures. If the mixture for the density is a stratification, it may be possible to have simple majorizing and squeeze functions within each stratum. Ahrens (1995) suggested using a stratification into equal-probability regions (that is, the wj's in equation (4.6) are all equal) and then using constant majorizing and squeeze functions in each stratum. There is, of course, a tradeoff between the gains (a high probability of acceptance because the majorizing function is close to the density, and/or efficient evaluation of the acceptance decision because the squeeze function is close to the density) and the complexity introduced by the decomposition. Decomposition into regions where the density is nearly constant almost always will result in overall gains in efficiency. If the decomposition is into equal-probability regions, the random selection of the stratum is very fast.

There are many ways in which mixtures can be combined with acceptance/rejection methods. Suppose that the density of interest, p, may be written as

p(x) = w_1 p_1(x) + w_2 p_2(x),

and suppose that there is a density g that majorizes w1p1; that is, g(x) ≥ w1p1(x) for all x. Kronmal and Peterson (1981, 1984) consider this case and propose the following algorithm, which they call the acceptance/complement method.
Algorithm 4.10 The Acceptance/Complement Method to Convert Uniform Random Numbers
1. Generate y from the distribution with density function g.
2. Generate u from a U(0, 1) distribution.
3. If u > w1 p1(y)/g(y), then generate y from the density p2.
4. Take y as the desired realization.

We discussed nearly linear densities and gave Knuth's algorithm for generating from such densities as Algorithm 4.7. Devroye (1986a) gives an algorithm for a special nearly linear density; namely, one that is almost flat. The method is based on a simple decomposition using the supremum of the density. (In practice, as we have indicated in discussing other techniques, this method would probably be used for a component of a density that has already been decomposed.) To keep the description simple, assume that the range of the random variable is (−1, 1) and that the density p satisfies

\sup_x p(x) - \inf_x p(x) \le \frac{1}{2}

over that interval. Now, because p is a density, we have

0 \le \inf_x p(x) \le \frac{1}{2} \le \sup_x p(x)

and

\sup_x p(x) \le 1.

Let p* = sup_x p(x), and decompose the target density into

p_1(x) = p(x) - \left( p^* - \frac{1}{2} \right)

and

p_2(x) = p^* - \frac{1}{2}.

The method is shown in Algorithm 4.11.

Algorithm 4.11 Sampling from a Nearly Flat Density
1. Generate u from U(0, 1).
2. Generate x from U(−1, 1).
3. If u > 2(p(x) − (p* − 1/2)), then generate x from U(−1, 1).
4. Deliver x.
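The following sketch (in Python; the particular density and the function name are our own illustrative choices, not from the text) implements Algorithm 4.11 for the nearly flat density p(x) = (1 + x/2)/2 on (−1, 1), for which p* = 3/4 and the sup − inf condition holds with equality.

import random

def p(x):
    # a nearly flat density on (-1, 1); sup p - inf p = 1/2
    return (1.0 + x / 2.0) / 2.0

P_STAR = 0.75  # sup of p over (-1, 1)

def nearly_flat_sample():
    u = random.uniform(0, 1)                  # step 1
    x = random.uniform(-1, 1)                 # step 2
    if u > 2.0 * (p(x) - (P_STAR - 0.5)):     # step 3: flat component p2
        x = random.uniform(-1, 1)             # replace x by a fresh uniform
    return x                                  # step 4

# the sample mean should be near E(X) = 1/6 for this p
xs = [nearly_flat_sample() for _ in range(100000)]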
Another variation on the general theme of acceptance/rejection applied to mixtures was proposed by Deák (1981) in what he called the "economical method". To generate a deviate from the density p using this method, an auxiliary density g is used, and an "excess area" and a "shortage area" are defined. The excess area is where g(x) > p(x), and the shortage area is where g(x) ≤ p(x). We define two functions p1 and p2:

p_1(x) = g(x) - p(x) \quad \text{if } g(x) - p(x) > 0, \quad \text{and } 0 \text{ otherwise};

p_2(x) = p(x) - g(x) \quad \text{if } p(x) - g(x) \ge 0, \quad \text{and } 0 \text{ otherwise}.

Now, we define a transformation T that will map the excess area into the shortage area in a way that will yield the density p. Such a T is not unique, but one transformation that will work is

T(x) = \min\left\{ t,\ \text{s.t.}\ \int_{-\infty}^{t} p_1(s)\, ds = \int_{-\infty}^{x} p_2(s)\, ds \right\}.
Algorithm 4.12 shows the method.

Algorithm 4.12 The Economical Method to Convert Uniform Random Numbers
1. Generate y from the distribution with density function g.
2. If p(y)/g(y) < 1, then
   2.a. generate u from a U(0, 1) distribution;
   2.b. if u > p(y)/g(y), then replace y with T(y).
3. Take y as the desired realization.

Using the representation of a discrete distribution that has k mass points as an equally weighted mixture of k two-point distributions, Deák (1986) develops a version of the economical method for discrete distributions. (See Section 4.8, page 133, on the alias method for additional discussion of two-point representations.)

Marsaglia and Tsang (1984) give a method that involves forming a decomposition of a density into horizontal slices with equal areas. For a unimodal distribution, they first form two regions, one on each side of the mode, prior to the slicing decomposition. They call a method that uses this kind of decomposition the "ziggurat method". Marsaglia and Tsang (1998) also describe a decomposition and transformation that they called the "Monty Python method", in which the density (or a part of the density) is divided into three regions as shown in the left-hand plot in Figure 4.11. If the density of interest has already been decomposed
into a mixture, the part to be decomposed further, p(x), is assumed to have been scaled to integrate to 1. The density in Figure 4.11 may represent the right-hand side of a t distribution, for example, but the function shown has been scaled to integrate to 1. The support of the density is transformed if necessary to begin at 0. One region is now rotated and stretched to fit into an area within a rectangle above another region, as shown in the right-hand plot in Figure 4.11.
Figure 4.11: The Monty Python Decomposition Method

The key parameter in the Monty Python method is b, the length of the base of a rectangle that has an area of 1. The portion of the distribution represented by the density above 1/b between 0 and p⁻¹(1/b) (denote this point by a) is transformed into a region of equal area between a and b bounded from below by the function

g(x) = \frac{1}{b} - c\, p(b - x) - d,

where c and d are chosen so the area is equal to the original and g(x) ≥ p(x) over (a, b). This implies that the tail area beyond b is equal to the area between p(x) and g(x). If a point (x, y), chosen uniformly over the rectangle, falls in region A or B, then x is delivered; if it falls in C, then b − x is delivered; otherwise, x is discarded, and a variate is generated from the tail of the distribution. The efficiency of this method obviously depends on a choice of b in which the decomposition minimizes the tail area. Marsaglia and Tsang (1998) suggest an improvement that may allow a better choice of b. Instead of the function g(x), a polynomial, say a cubic, is determined to satisfy the requirements of majorizing p(x) over (a, b) and having an area equal to the original area of C. This allows more flexibility in the choice of b. This is the same as the transformation in the exact-approximation method of Marsaglia referred to earlier.
4.7 Ratio-of-Uniforms Method
Kinderman and Monahan (1977) discuss a very useful relationship among random variables U, V, and V/U. If (U, V) is uniformly distributed over the set

C = \left\{ (u, v),\ \text{s.t.}\ 0 \le u \le \sqrt{h\left(\frac{v}{u}\right)} \right\},    (4.13)

where h is a nonnegative integrable function, then V/U has probability density proportional to h. Use of this relationship is called a ratio-of-uniforms method.

It is easy to see that this relationship holds. For U and V as given, their joint density is pUV(u, v) = I_C(u, v)/c, where c is the area of C. Let X = U and Y = V/U. The Jacobian of the transformation is x, so the joint density of X and Y is

p_{XY}(x, y) = \frac{x}{c}\, I_{[0, \sqrt{h(y)}]}(x),

and integrating out x, we get

p_Y(y) = \int_0^{\sqrt{h(y)}} \frac{x}{c}\, dx = \frac{1}{2c}\, h(y).

In practice, we may choose a simple geometric region that encloses C, generate a uniform point in that region, and reject a point that does not satisfy

u \le \sqrt{h\left(\frac{v}{u}\right)}.

The larger region enclosing C is called the majorizing region because it is similar to the region under the majorizing function in acceptance/rejection methods. The ratio-of-uniforms method is very simple to apply, and it can be quite fast. If h(x) and x²h(x) are bounded, a simple form of the majorizing region is the rectangle

\{ (u, v),\ \text{s.t.}\ 0 \le u \le b,\ c \le v \le d \},

where

b = \sup_x \sqrt{h(x)}, \quad c = \inf_x x\sqrt{h(x)}, \quad d = \sup_x x\sqrt{h(x)}.

This yields the method shown in Algorithm 4.13.
Algorithm 4.13 Ratio-of-Uniforms Method (Using a Rectangular Majorizing Region for Continuous Variates)
1. Generate u and v independently from a U(0, 1) distribution.
2. Set u1 = bu and v1 = c + (d − c)v.
3. Set x = v1/u1.
4. If u1² ≤ h(x), then
   4.a. take x as the desired realization;
   otherwise,
   4.b. return to step 1.

Figure 4.12 shows a rectangular region and the area of acceptance for the same density used to illustrate the acceptance/rejection method in Figure 4.4. The full rectangular region as defined above has a very low proportion of acceptances in the example shown in Figure 4.12. There are many obvious ways of reducing the size of this region. A simple reduction would be to truncate the rectangle by the line v = u, as shown. Just as in other acceptance/rejection methods, there is a tradeoff between the effort to generate uniform deviates over a region with a high acceptance rate and the wasted effort of generating uniform deviates that will be rejected. The effort to generate only in the acceptance region is likely to be slightly greater than the effort to invert the CDF.

Wakefield, Gelfand, and Smith (1991) give a generalization of the ratio-of-uniforms method by introducing a strictly increasing, differentiable function g that has the property g(0) = 0. Their method uses the fact that if (U, V) is uniformly distributed over the set

C_{h,g} = \left\{ (u, v),\ \text{s.t.}\ 0 \le u \le g\left( c\, h\left( \frac{v}{g'(u)} \right) \right) \right\},

where c is a positive constant and h is a nonnegative integrable function as before, then V/g′(U) has a probability density proportional to h.

Ratio-of-Uniforms and Acceptance/Rejection

Stadlober (1990, 1991) considers the relationship of the ratio-of-uniforms method to the ordinary acceptance/rejection method and applies the ratio-of-uniforms method to discrete distributions. If (U, V) is uniformly distributed over the rectangle

\{ (u, v),\ \text{s.t.}\ 0 \le u \le 1,\ -1 \le v \le 1 \},

and X = sV/U + a, for any s > 0, then X has the density

g_X(x) =
\begin{cases}
\frac{1}{4s} & \text{for } a - s \le x \le a + s, \\
\frac{s}{4(x - a)^2} & \text{elsewhere,}
\end{cases}
Figure 4.12: The Ratio-of-Uniforms Method (Same Density as in Figure 4.4)

and the conditional density of Y = U², given X, is

g_{Y|X}(y|x) =
\begin{cases}
1 & \text{for } a - s \le x \le a + s, \text{ and } 0 \le y \le 1, \\
\frac{(x - a)^2}{s^2} & \text{for } |x - a| > s, \text{ and } 0 \le y \le \frac{s^2}{(x - a)^2}, \\
0 & \text{elsewhere.}
\end{cases}

The conditional distribution of Y given X = x is uniform on (0, 4s g_X(x)), and the ratio-of-uniforms method is an acceptance/rejection method with a table mountain majorizing function.

Ratio-of-Uniforms for Discrete Distributions

Stadlober (1990) gives the modification of the ratio-of-uniforms method in Algorithm 4.14 for a general discrete random variable with probability function p(·).

Algorithm 4.14 Ratio-of-Uniforms Method for Discrete Variates
1. Generate u and v independently from a U(0, 1) distribution.
2. Set x = ⌊a + s(2v − 1)/u⌋.
3. Set y = u².
4. If y ≤ p(x), then
   4.a. take x as the desired realization;
   otherwise,
   4.b. return to step 1.

Ahrens and Dieter (1991) describe a ratio-of-uniforms algorithm for the Poisson distribution, and Stadlober (1991) describes one for the binomial distribution.

Improving the Efficiency of the Ratio-of-Uniforms Method

As we discussed on page 117, the efficiency of any acceptance/rejection method depends negatively on three things:
• the effort required to generate the trial variates;
• the effort required to make the acceptance/rejection decision; and
• the proportion of rejections.
There are often tradeoffs among them. We have indicated how the proportion of rejections can be decreased by forming the majorizing region so that it is closer in shape to the shape of the acceptance region. This generally comes at the cost of more effort to generate trial variates. The increase in effort is modest if the majorizing region is a polygon. Leydold (2000) described a systematic method of forming polygonal majorizing regions for a broad class of distributions (T-concave distributions, see page 152).

The effort required to make the acceptance/rejection decision can be reduced in the same manner as a squeeze in acceptance/rejection. If a convex polygonal set interior to the acceptance region can be defined, then acceptance decisions can be made quickly by comparisons with linear functions. For a class of distributions, Leydold (2000) described a systematic method for forming interior polygons from construction points defined by the sides of a polygonal majorizing region.

Quality of Random Numbers Produced by the Ratio-of-Uniforms Method

The ratio-of-uniforms method, like any method for generating nonuniform random numbers, is dependent on a good source of uniforms. The special relationships that may exist between two successive uniforms when one of them is an extreme value can cause problems, as we indicated on pages 111 and 122. Given a high-quality uniform generator, the method is subject to the same issues of floating-point computations that we discussed on page 122.
Afflerbach and Hörmann (1992) and Hörmann (1994b) indicate that, in some cases, output of the ratio-of-uniforms method can be quite poor because of structure in the uniforms. The ratio-of-uniforms method transforms all points lying on one line through the origin into a single number. Because of the lattice structure of the uniforms from a linear congruential generator, the lines passing through the origin have regular patterns, which result in structural gaps in the numbers yielded by the ratio-of-uniforms method. Noting these distribution problems, Hörmann and Derflinger (1994) make some comparisons of the ratio-of-uniforms method with the transformed rejection method (Algorithm 4.9, page 121), and based on their empirical study, they recommend the transformed rejection method over the ratio-of-uniforms method. The quality of the output of the ratio-of-uniforms method, however, is more a function of the quality of the uniform generator and would usually not be of any concern if a good uniform generator is used. The relative computational efficiencies of the two methods depend on the majorizing functions used. The polygonal majorizing functions used by Leydold (2000) in the ratio-of-uniforms method apparently alleviate some of the problems found by Hörmann (1994b).

Ratio-of-Uniforms for Multivariate Distributions

Stefănescu and Văduva (1987) and Wakefield, Gelfand, and Smith (1991) extend the ratio-of-uniforms method to multivariate distributions. As we mentioned in discussing simple acceptance/rejection methods, the probability of rejection may be quite high for multivariate distributions. High correlations in the target distribution can also reduce the efficiency of the ratio-of-uniforms method even further.
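As a concrete sketch of the basic rectangle method of Algorithm 4.13 (in Python; written for this discussion rather than taken from the text), consider the standard normal generated from the unnormalized density h(x) = e^{−x²/2}. The rectangle bounds follow from b = sup √h(x) = 1 and d = −c = sup x√h(x) = √(2/e).

import math, random

def h(x):
    # unnormalized standard normal density
    return math.exp(-x * x / 2.0)

b = 1.0                      # sup sqrt(h(x)), attained at x = 0
d = math.sqrt(2.0 / math.e)  # sup x sqrt(h(x)), attained at x = sqrt(2)
c = -d                       # inf x sqrt(h(x)), attained at x = -sqrt(2)

def rou_normal():
    while True:
        u1 = b * random.uniform(0, 1)          # steps 1 and 2
        v1 = c + (d - c) * random.uniform(0, 1)
        if u1 == 0.0:
            continue
        x = v1 / u1                            # step 3
        if u1 * u1 <= h(x):                    # step 4: (u1, v1) is inside C
            return x

sample = [rou_normal() for _ in range(100000)]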
4.8 Alias Method
Walker (1977) shows that a discrete distribution with k mass points can be represented as an equally weighted mixture of k two-point distributions; that is, distributions with only two mass points. Consider the random variable X such that

\Pr(X = x_i) = p_i, \quad i = 1, \ldots, k,

with \sum_{i=1}^{k} p_i = 1. Walker constructed k two-point distributions,

\Pr(Y_i = y_{ij}) = q_{ij}, \quad j = 1, 2; \quad i = 1, \ldots, k

(with q_{i1} + q_{i2} = 1) in such a way that any p_i can be represented as k⁻¹ times a sum of q_{i,j}s. (It is easy to prove that this can be done; use induction, starting with k = 1.) A setup procedure for the alias method is shown in Algorithm 4.15. The setup phase associates with each i = 1 to k a value Pi that will determine whether the original mass point or an "alias" mass point, indexed by ai, will be delivered when i is chosen with equal probability, 1/k. Two lists, L and H,
are maintained to determine which points or point pairs have probabilities less than or greater than 1/k. At termination of the setup phase, all points or point pairs have probabilities equal to 1/k. Marsaglia calls the setup phase "leveling the histogram". The outputs of the setup phase are two lists, P and a, each of length k.

Algorithm 4.15 Alias Method Setup to Initialize the Lists a and P
0. For i = 1 to k, set ai = i; set Pi = 0; set bi = pi − 1/k; and if bi < 0, put i in the list L; otherwise, put i in the list H.
1. If max(bi) = 0, then stop.
2. Select l ∈ L and h ∈ H.
3. Set c = bl and d = bh.
4. Set bl = 0 and bh = c + d.
5. Remove l from L.
6. If bh ≤ 0, then remove h from H; and if bh < 0, then put h in L.
7. Set al = h and Pl = 1 + kc.
8. Go to step 1.

Notice that \sum_i b_i = 0 during every step. The steps are illustrated in Figure 4.13 for a distribution such that

Pr(X = 1) = .30, Pr(X = 2) = .05, Pr(X = 3) = .20, Pr(X = 4) = .40, Pr(X = 5) = .05.
At the beginning, L = {2, 5} and H = {1, 4}. In the first step, the values corresponding to 2 and 4 are adjusted. The steps to generate deviates, after the values of Pi and ai are computed by the setup, are shown in Algorithm 4.16. Algorithm 4.16 Generation Using the Alias Method Following the Setup in Algorithm 4.15
Figure 4.13: Setup for the Alias Method; Leveling the Histogram

1. Generate u from a U(0, 1) distribution.
2. Generate i from a discrete uniform over 1, 2, . . . , k.
3. If u ≤ Pi, then
   3.a. deliver xi;
   otherwise,
   3.b. deliver x_{ai}.

It is clear that the setup time for Algorithm 4.15 is O(k) because the total number of items in the lists L and H goes down by at least one at each step. If, in step 2, the minimum and maximum values of b are found, as in the original algorithm of Walker (1977), the algorithm may proceed slightly faster in some cases, but then the algorithm is O(k log k). The setup method given in Algorithm 4.15 is from Kronmal and Peterson (1979a). Vose (1991) also describes a setup procedure that is O(k). Once the setup is finished by whatever method, the generation time is constant, or O(1). The alias method is always at least as fast as the guide table method of Chen and Asau (Algorithm 4.3 on page 107). Its speed relative to the table-lookup method of Marsaglia as implemented by Norman and Cannon (Algorithm 4.2 on page 106) depends on the distribution. Variates from distributions with a substantial proportion of mass points whose base b representations (equation (4.5)) have many zeros can be generated very rapidly by the table-lookup method. In the IMSL Libraries, the routine rngda performs both the setup and the generation of discrete random deviates using an alias method.
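A sketch of the setup and generation algorithms in Python follows (0-based indexing; the names are ours). A small tolerance guards the comparisons of the floating-point bi's, which the exact statement of Algorithm 4.15 does not need.

import random

TOL = 1e-12

def alias_setup(p):
    # Algorithm 4.15: returns the lists P and a
    k = len(p)
    a = list(range(k))
    P = [0.0] * k
    b = [pi - 1.0 / k for pi in p]
    L = [i for i in range(k) if b[i] < -TOL]
    H = [i for i in range(k) if b[i] >= -TOL]
    while L and H:
        l = L.pop()            # step 2: select l in L and h in H
        h = H[-1]
        c = b[l]               # step 3
        b[l] = 0.0             # step 4
        b[h] += c
        a[l] = h               # step 7: alias and cell probability
        P[l] = 1.0 + k * c
        if b[h] < -TOL:        # step 6: h has become deficient
            H.pop()
            L.append(h)
    return P, a

def alias_draw(x, P, a):
    # Algorithm 4.16
    u = random.uniform(0, 1)               # step 1
    i = random.randrange(len(P))           # step 2
    return x[i] if u <= P[i] else x[a[i]]  # step 3

P, a = alias_setup([0.30, 0.05, 0.20, 0.40, 0.05])
draws = [alias_draw([1, 2, 3, 4, 5], P, a) for _ in range(100000)]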
Kronmal and Peterson (1979a, 1979b) apply the alias method to mixture methods and acceptance/rejection methods for continuous random variables. Peterson and Kronmal (1982) describe a modification of the alias method incorporating some aspects of the urn method. This hybrid method, which they called the alias-urn method, reduces the burden of comparisons at the expense of slightly more storage space.
4.9 Use of the Characteristic Function
The characteristic function of a d-variate random variable X is defined as

\phi_X(t) = \mathrm{E}\left( e^{i t^{T} X} \right), \quad t \in \mathbb{R}^d.    (4.14)

The characteristic function exists for any random variable. For a univariate random variable whose first two moments are finite, and whose characteristic function φ is such that ∫|φ(t)| dt and ∫|φ″(t)| dt are finite, Devroye (1986b) describes a method for generating random variates using the characteristic function. Algorithm 4.17 is Devroye's method for a univariate continuous random variable with probability density function p(·) and characteristic function φ(·).

Algorithm 4.17 Conversion of Uniform Random Numbers Using the Characteristic Function
0. Set a = \sqrt{\frac{1}{2\pi} \int |\phi(t)|\, dt} and b = \sqrt{\frac{1}{2\pi} \int |\phi''(t)|\, dt}.
1. Generate u and v independently from a U(−1, 1) distribution.
2. If u < 0, then
   2.a. set y = bv/a and t = a²|u|;
   otherwise,
   2.b. set y = b/(va) and t = a²v²|u|.
3. If t ≤ p(y), then
   3.a. take y as the desired realization;
   otherwise,
   3.b. return to step 1.

This method relies on the facts that, under the existence conditions,

p(y) \le \frac{1}{2\pi} \int |\phi(t)|\, dt \quad \text{for all } y

and

p(y) \le \frac{1}{2\pi y^2} \int |\phi''(t)|\, dt \quad \text{for all } y.

Both of these facts are easily established by use of the inverse characteristic function transform, which exists by the integrability conditions on φ(t).
The method requires evaluation of the density at each step. Devroye (1996b) also discusses variations that depend on Taylor expansion coefficients. Devroye (1991) describes a related method for the case of a discrete random variable. The characteristic function allows evaluation of all moments that exist. If only some of the moments are known, an approximate method described by Devroye (1989) can be used.
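To make Algorithm 4.17 concrete, the sketch below (Python with SciPy; our own construction, not from the text) targets the standard normal, for which φ(t) = e^{−t²/2} and φ″(t) = (t² − 1)e^{−t²/2}; the constants a and b are computed by numerical integration.

import math, random
from scipy.integrate import quad

phi = lambda t: math.exp(-t * t / 2.0)                   # characteristic function
phi2 = lambda t: (t * t - 1.0) * math.exp(-t * t / 2.0)  # its second derivative

# step 0
a = math.sqrt(quad(lambda t: abs(phi(t)), -math.inf, math.inf)[0] / (2.0 * math.pi))
b = math.sqrt(quad(lambda t: abs(phi2(t)), -math.inf, math.inf)[0] / (2.0 * math.pi))

def p(y):
    # the density itself must be evaluated at each step
    return math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)

def charfn_sample():
    while True:
        u = random.uniform(-1, 1)       # step 1
        v = random.uniform(-1, 1)
        if v == 0.0:
            continue
        if u < 0:                       # step 2.a: flat part, height a^2
            y, t = b * v / a, a * a * abs(u)
        else:                           # step 2.b: tails, height b^2/y^2
            y, t = b / (v * a), a * a * v * v * abs(u)
        if t <= p(y):                   # step 3
            return y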
4.10 Use of Stationary Distributions of Markov Chains
Many of the methods for generating deviates from a given distribution are based on a representation of the density that allows the use of some simple transformation or some selection rule for deviates generated from a different density. In the univariate ratio-of-uniforms method, for example, we identify a bivariate uniform random variable with a region of support such that one of the marginal distributions is the distribution of interest. We then generate bivariate uniform random deviates over the region, and then, by a very simple transformation and selection, we get univariate deviates from the distribution of interest. Another approach is to look for a stochastic process that can be easily simulated and such that the distribution of interest can be identified as a distribution at some point in the stochastic process. The simplest useful stochastic process is a Markov chain with a stationary distribution corresponding to the distribution of interest.

Markov Chains: Basic Definitions

A Markov chain is a sequence of random variables, X1, X2, . . ., such that the distribution of Xt+1 given Xt is independent of Xt−1, Xt−2, . . . A sequence of realizations of such random variables is also called a Markov chain (that is, the term Markov chain can be used to refer either to a random sequence or to a fixed sequence of realizations). In this section, we will briefly discuss some types of Markov chains and their properties. The main purpose is to introduce the terms that are used to characterize the Markov chains in the applications that we describe later. See Meyn and Tweedie (1993) for extensive discussions of Markov chains. Tierney (1996) discusses the aspects of Markov chains that are particularly relevant for the applications that we consider in later sections.

The union of the supports of the random variables is called the state space of the Markov chain. Whether or not the state space is countable is an important characteristic of a Markov chain. A Markov chain with a countable or "discrete" state space is easier to work with and can be used to approximate a Markov chain with a continuous state space. Another important characteristic of a Markov chain is the nature of the indexing. As we have written the sequence
above, we have implied that the index is discrete. We can generalize this to a continuous index, in which case we usually use the notation X(t). A Markov chain is time homogeneous if the conditional distribution of Xt+1 given Xt does not depend on t. For our purposes, we can usually restrict attention to a time-homogeneous discrete-state Markov chain with a discrete index, and this is what we assume in the following discussion in this section.

For the random variable Xt in a discrete-state Markov chain with state space S, let I index the states; that is, i in I implies that si is in S. For si ∈ S, let Pr(Xt = si) = pti. The Markov chain can be characterized by an initial distribution and a square transition matrix or transition kernel K = (kij), where kij = Pr(Xt+1 = si | Xt = sj). The distribution at time t is characterized by a vector of probabilities pt = (pt1, pt2, . . .), so the vector itself is called a distribution. The initial distribution is p0 = (p01, p02, . . .), and the distribution at time t = 1 is Kp0. We sometimes refer to a Markov chain by the doubleton (K, p0) or just (K, p). In general, we have

p_t = K p_{t-1} = K^t p_0.

We denote the elements of K^t as k_{ij}^{(t)}. The relationships above require that \sum_i k_{ij} = 1; that is, each column of K sums to 1. (A matrix with this property is called a stochastic matrix.) A distribution p such that

Kp = p
is said to be invariant or stationary. From that definition, we see that an invariant is an eigenvector of the transition matrix corresponding to an eigenvalue of 1. (Notice the unusual usage of the word "distribution"; in this context, it means a vector.) For a given Markov chain, it is of interest to know whether the chain has an invariant (that is, whether the transition matrix has an eigenvalue equal to 1) and, if so, whether the invariant can be reached from the starting distribution p0.

Some Markov chains oscillate among a set of distributions. (For example, think of a two-state Markov chain whose transition matrix has elements k11 = k22 = 0 and k12 = k21 = 1.) We will be interested in chains that do not oscillate; that is, chains that are aperiodic. A chain is guaranteed to be aperiodic if, for some t sufficiently large, k_{ii}^{(t)} > 0 for all i in I.

A Markov chain is reversible if, for any t, the conditional probability of Xt given Xt+1 is the same as the conditional probability of Xt given Xt−1. A discrete-space Markov chain obviously is reversible if and only if its transition matrix is symmetric. A Markov chain defined by (K, p) is said to be in detailed balance if

k_{ij} p_j = k_{ji} p_i.
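A small numerical illustration (Python with NumPy; the three-state kernel is a hypothetical example of ours) shows the invariant distribution emerging as pt = K^t p0 and verifies that it is an eigenvector of K for the eigenvalue 1.

import numpy as np

# K[i, j] = Pr(X_{t+1} = s_i | X_t = s_j); each column sums to 1
K = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.3],
              [0.2, 0.2, 0.4]])

p = np.array([1.0, 0.0, 0.0])   # initial distribution p_0
for _ in range(200):            # p_t = K p_{t-1}
    p = K @ p

print(p)           # the invariant distribution
print(K @ p - p)   # essentially zero: Kp = p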
A Markov chain is irreducible if, for all i, j in I, there exists a t > 0 such that k_{ij}^{(t)} > 0. If the chain is irreducible, detailed balance and reversibility are equivalent.

Another property of interest is when a Markov chain first takes on a given state. This is called the first passage time for that state. Given that the chain is in a particular state, the first passage time to that state is the first return time for that state. Let Tii be the first return time to state i; that is, for a discrete-time chain, let

T_{ii} = \min\{ t,\ \text{s.t.}\ X_t = s_i \mid X_0 = s_i \}.

(Tii is a random variable.) An irreducible Markov chain is recurrent if, for some i, Pr(Tii < ∞) = 1. (For an irreducible chain, this implies the condition for all i.) An irreducible Markov chain is positive recurrent if, for some i, E(Tii) < ∞. (For an irreducible chain, this implies the condition for all i.) An aperiodic, irreducible, positive recurrent Markov chain is associated with a stationary distribution or invariant distribution, which is the limiting distribution of the chain. In applications of Markov chains, the question of whether the chain has converged to this limiting distribution is one of the primary concerns.

Applications that we discuss in later sections have uncountable state spaces, but the basic concepts extend to those. For a continuous state space, instead of a vector specifying the distribution at any given time, we have a probability density at that time, K is a conditional probability density for Xt+1|Xt, and we have a similar expression for the density at t + 1 formed by integrating over the conditional density weighted by the unconditional density at t. Tierney (1996) carefully discusses the generalization to an uncountable state space and a continuous index.

Markov Chain Monte Carlo

There are various ways of using a Markov chain to generate random variates from some distribution related to the chain. Such methods are called Markov chain Monte Carlo, or MCMC. An algorithm based on a stationary distribution of a Markov chain is an iterative method because a sequence of operations must be performed until they converge.

A Markov chain is the basis for several schemes for generating random numbers. The interest is not in the sequence of the Markov chain itself. The elements of the chain are accepted or rejected in such a way as to form a different chain whose stationary distribution is the distribution of interest. Following engineering terminology for sampling sequences, the techniques based on these chains are generally called "samplers". The static sample, and not the sequence, is what is used. The objective in the Markov chain samplers is to generate a sequence of autocorrelated points with a given stationary distribution.
The Metropolis Random Walk

For a distribution with density pX, the Metropolis algorithm, introduced by Metropolis et al. (1953), generates a random walk and performs an acceptance/rejection based on p evaluated at successive steps in the walk. In the simplest version, the walk moves from the point yi to a candidate point yi+1 = yi + s, where s is a realization from U(−a, a), and the candidate is accepted if

\frac{p_X(y_{i+1})}{p_X(y_i)} \ge u,    (4.15)

where u is an independent realization from U(0, 1). If the new point is at least as probable (that is, if pX(yi+1) ≥ pX(yi)), the condition (4.15) implies acceptance without the need to generate u.

The random walk of Metropolis et al. is the basic algorithm of simulated annealing, which is currently widely used in optimization problems. It is also used in simulations of models in statistical mechanics (see Section 7.9). The algorithm is described in Exercise 7.16 on page 277.

If the range of the distribution is finite, the random walk is not allowed to go outside of the range. Consider, for example, the von Mises distribution, with density

p(x) = \frac{1}{2\pi I_0(c)}\, e^{c \cos(x)} \quad \text{for } -\pi \le x \le \pi,    (4.16)

where I0 is the modified Bessel function of the first kind and of order zero. Notice, however, that it is not necessary to know this normalizing constant because it is canceled in the ratio. The fact that all we need is a nonnegative function that is proportional to the density of interest is an important property of this method. In the ordinary acceptance/rejection methods, we need to know the constant. If c = 3, after a quick inspection of the amount of fluctuation in p, we may choose a = 1. The output for n = 1000 and a starting value of y0 = 1 is shown in Figure 4.14. The output is a Markov chain. A histogram, which is not affected by the sequence of the output in a large sample, is shown in Figure 4.15.

The von Mises distribution is an easy one to simulate by the Metropolis algorithm. This distribution is often used by physicists in simulations of lattice gauge and spin models, and the Metropolis method is widely used in these simulations. Notice the simplicity of the algorithm: we do not need to determine a majorizing density nor even evaluate the Bessel function that is the normalizing constant for the von Mises density.

The Markov chain samplers generally require a "burn-in" period (that is, a number of iterations before the stationary distribution is achieved). In practice, the variates generated during the burn-in period are discarded. The number of iterations needed varies with the distribution and can be quite large, sometimes thousands. The von Mises example shown in Figure 4.14 is unusual; no burn-in is required.
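A sketch of this von Mises simulation (in Python; the variable names are ours) follows. Only a function proportional to the density is evaluated, and a candidate falling outside (−π, π) is simply rejected, which is one way of keeping the walk inside the finite range.

import math, random

def f(x):
    # proportional to the von Mises density with c = 3;
    # the constant 1/(2 pi I_0(c)) cancels in the ratio (4.15)
    return math.exp(3.0 * math.cos(x))

a = 1.0      # half-width of the uniform step
y = 1.0      # starting value y_0
chain = []
for _ in range(1000):
    cand = y + random.uniform(-a, a)
    if -math.pi <= cand <= math.pi:
        if f(cand) / f(y) >= random.uniform(0, 1):   # condition (4.15)
            y = cand
    chain.append(y)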
Figure 4.14: Sequential Output from the Metropolis Algorithm for a Von Mises Distribution

In general, convergence is much quicker for univariate distributions with finite ranges such as this one. It is important to remember what convergence means; it does not mean that the sequence is independent from the point of convergence forward. The deviates are still from a Markov chain. The Metropolis acceptance/rejection sequence is illustrated in Figure 4.16. Compare this with the acceptance/rejection method based on independent variables, as illustrated in Figure 4.5.

The Metropolis–Hastings Method

Hastings (1970) describes an algorithm that uses a more general chain for the acceptance/rejection step. Instead of just basing the decision on the probability density pX as in the inequality (4.15), the Metropolis–Hastings sampler to generate deviates from a distribution with a probability density pX uses deviates from a Markov chain with density g_{Y_{t+1}|Y_t}. The method is shown in Algorithm 4.18. The conditional density g_{Y_{t+1}|Y_t} is chosen so that it is easy to generate deviates from it.

Algorithm 4.18 Metropolis–Hastings Algorithm
0. Set k = 0.
1. Choose x^{(k)} in the range of pX. (The choice can be arbitrary.)
Figure 4.15: Histogram of the Output from the Metropolis Algorithm for a Von Mises Distribution

2. Generate y from the density g_{Y_{t+1}|Y_t}(y|x^{(k)}).
3. Set

r = \frac{p_X(y)\, g_{Y_{t+1}|Y_t}(x^{(k)}|y)}{p_X(x^{(k)})\, g_{Y_{t+1}|Y_t}(y|x^{(k)})}.

4. If r ≥ 1, then
   4.a. set x^{(k+1)} = y;
   otherwise,
   4.b. generate u from U(0, 1), and if u < r, then
      4.b.i. set x^{(k+1)} = y;
      otherwise,
      4.b.ii. set x^{(k+1)} = x^{(k)}.
5. If convergence has occurred, then
   5.a. deliver x = x^{(k+1)};
   otherwise,
   5.b. set k = k + 1, and go to step 2.

Figure 4.16: Metropolis Acceptance/Rejection

Compare Algorithm 4.18 with the basic acceptance/rejection method in Algorithm 4.6, page 114. The analog to the majorizing function in the Metropolis–Hastings algorithm is the reference function

\frac{g_{Y_{t+1}|Y_t}(x|y)}{p_X(x)\, g_{Y_{t+1}|Y_t}(y|x)}.

In Algorithm 4.18, r is called the "Hastings ratio", and step 4 is called the "Metropolis rejection". The conditional density g_{Y_{t+1}|Y_t}(·|·) is called the "proposal density" or the "candidate generating density". Notice that because the reference function contains pX as a factor, we only need to know pX to within a constant of proportionality. As we have mentioned already, this is an important characteristic of the Metropolis algorithms.

We can see that this algorithm delivers realizations from the density pX by using the same method suggested in Exercise 4.2 (page 160); that is, determine the CDF and differentiate. The CDF is the probability-weighted sum of the two components corresponding to whether the chain moved or not. In the case in which the chain does move (that is, in the case of acceptance), for the random variable Z whose realization is y in Algorithm 4.18, we have

\Pr(Z \le x) = \Pr\left( Y \le x \,\middle|\, U \le \frac{p(Y)\, g(x_i|Y)}{p(x_i)\, g(Y|x_i)} \right)
= \frac{ \int_{-\infty}^{x} \int_{0}^{p(t) g(x_i|t)/(p(x_i) g(t|x_i))} g(t|x_i)\, ds\, dt }{ \int_{-\infty}^{\infty} \int_{0}^{p(t) g(x_i|t)/(p(x_i) g(t|x_i))} g(t|x_i)\, ds\, dt }
= \int_{-\infty}^{x} p_X(t)\, dt.
We can illustrate the use of the Metropolis–Hastings algorithm using a Markov chain in which the density of Xt+1 is normal with a mean of Xt and a variance of σ². Let us use this density to generate a sample from a standard normal distribution (that is, a normal with a mean of 0 and a variance of 1). We start with x0 chosen arbitrarily. We take logs and cancel terms in the expression for r in Algorithm 4.18. The sequential output for n = 1000, a starting value of x0 = 10, and a variance of σ² = 9 is shown in Figure 4.17. Notice that the values descend very quickly from the starting value, which would be a very unusual realization of a standard normal. This example is also special. In practice, we generally cannot expect such a short burn-in period. Notice also in Figure 4.17 the horizontal line segments where the underlying Markov chain did not advance. There are several variations of the basic Metropolis–Hastings algorithm. See Bhanot (1988) and Chib and Greenberg (1995) for descriptions of modifications and generalizations.
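The following sketch (in Python; ours) carries out this example. Because the normal proposal is symmetric, the g factors in the Hastings ratio cancel, and taking logs leaves log r = (x² − y²)/2 for the standard normal target.

import math, random

def mh_normal(n=1000, x0=10.0, sigma=3.0):
    x = x0
    out = []
    for _ in range(n):
        y = random.gauss(x, sigma)       # step 2: proposal N(x, sigma^2)
        log_r = (x * x - y * y) / 2.0    # log Hastings ratio after cancellation
        if log_r >= 0.0 or random.random() < math.exp(log_r):   # step 4
            x = y
        out.append(x)
    return out

chain = mh_normal()   # x0 = 10 and sigma^2 = 9, as in Figure 4.17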
Figure 4.17: Sequential Output from a Standard Normal Distribution Using a Markov Chain, N(Xt, σ²)

Also see Section 4.14 for two related methods: Gibbs sampling and hit-and-run sampling. Because those methods are particularly useful in multivariate simulation, we defer the discussion to that section.

The Markov chain Monte Carlo method has become one of the most important tools in statistics in recent years. Its applications pervade Bayesian analysis as well as Monte Carlo procedures in many settings. See Gilks, Richardson, and Spiegelhalter (1996) for several examples.

Whenever a correlated sequence such as a Markov chain is used, variance estimation must be performed with some care. In the more common cases of positive autocorrelation, the ordinary variance estimators are negatively biased. The method of batch means or some other method that attempts to account for the autocorrelation should be used. See Section 7.4 for discussions of these methods.

Tierney (1991, 1994) describes an independence sampler, a Metropolis–Hastings sampler for which the proposal density does not depend on Yt; that is, g_{Y_{t+1}|Y_t}(·|·) = g_{Y_{t+1}}(·). For this type of proposal density, it is more critical that g_{Y_{t+1}}(·) approximates pX(·) fairly well and that it can be scaled to majorize pX(·) in the tails. Liu (1996) and Roberts (1996) discuss some of the properties of the independence sampler and its relationship to other Metropolis–Hastings methods.

As with the acceptance/rejection methods using independent sequences, the acceptance/rejection methods based on Markov chains apply immediately to multivariate random variables. As mentioned above, however, convergence generally becomes slower as the number of elements in the random vector increases.
As an example of MCMC in higher dimensions, consider an example similar to that shown in Figure 4.17 except for a multivariate normal distribution instead of a univariate one. We use a d-dimensional normal with a mean vector xt and a variance-covariance matrix Σ to generate xt+1 for use in the Metropolis–Hastings method of Algorithm 4.18. Taking d = 3,

\Sigma = \begin{pmatrix} 9 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 9 \end{pmatrix},

and starting with x0 = (10, 10, 10), the first 1000 values of the first element (which should be a realization from a standard univariate normal) are shown in Figure 4.18.
Figure 4.18: Sequential Output of x1 from a Trivariate Standard Normal Distribution Using a Markov Chain, N(Xt , Σ)
Convergence

Two of the most important issues in MCMC concern the rate of convergence (that is, the length of the burn-in) and the frequency with which the chain advances. In many applications of simulation, such as studies of waiting times in queues, there is more interest in transient behavior than in stationary behavior. This is not the case in random number generation using an iterative method. For general use in random number generation, the stationary distribution is the only thing of interest. (We often use the terms "Monte Carlo" and "simulation"
rather synonymously; stationarity and transience, however, are often the key distinctions between Monte Carlo applications and simulation applications. In simulation in practice, the interest is rarely in the stationary behavior, but it is in these Monte Carlo applications.)

The issue of convergence is more difficult to address in multivariate distributions. It is for multivariate distributions, however, that the MCMC method is most useful. This is because the Metropolis–Hastings algorithm does not require knowledge of the normalizing constants, and the computation of a normalizing constant may be more difficult for multivariate distributions. Gelman and Rubin (1992b) give examples in which the burn-in is much longer than might be expected.

Various diagnostics have been proposed to assess convergence. Cowles and Carlin (1996) discuss and compare thirteen different ones. Most of these diagnostics use multiple chains in one way or another; see, for example, Gelman and Rubin (1992a), Roberts (1992), and Johnson (1996). Multiple chains or separate subsequences within a chain can be compared using analysis-of-variance methods. Once convergence has occurred, the variance within subsequences should be the same as the variance between subsequences. Measuring the variance within a subsequence must be done with some care, of course, because of the autocorrelations. Batch means from separate streams can be used to determine when the variance has stabilized. (See Section 7.4 for a description of batch means.) Yu (1995) uses a cusum plot on only one chain to help identify convergence. Robert (1998a) provides a benchmark case for evaluation of convergence assessment techniques. Rosenthal (1995), under certain conditions, gives bounds on the length of runs required to give satisfactory results. Cowles and Rosenthal (1998) suggest using auxiliary simulations to determine if the conditions that ensure the bounds on the lengths are satisfied. All of these methods have limitations.

The collection of articles in Gilks, Richardson, and Spiegelhalter (1996) addresses many of the problems of convergence. Gamerman (1997) provides a general introduction to MCMC in which many of these convergence issues are explored. Additional reviews are given in Brooks and Roberts (1999) and the collection of articles in Robert (1998b). Mengersen, Robert, and Guihenneuc-Jouyaux (1999) give a classification of methods and review their performance.

Methods of assessing convergence are currently an area of active research. Use of any method that indicates that convergence has occurred based on the generated data can introduce bias into the results, unless somehow the probability of making the decision that convergence has occurred can be accounted for in any subsequent inference. This is the basic problem in any adaptive statistical procedure. Cowles, Roberts, and Rosenthal (1999) discuss how bias may be introduced in inferences made using an MCMC method after a convergence diagnostic has been used in the sampling. The main point in this section is that there are many subtle issues, and MCMC must be used with some care.

Various methods have been proposed to speed up the convergence; see Gelfand and Sahu (1994), for example. Frigessi, Martinelli, and Stander (1997)
discuss general issues of convergence and acceleration of convergence. How quickly convergence occurs is obviously an important consideration for the efficiency of the method. The effects of slow convergence, however, are not as disastrous as the effects of prematurely assuming that convergence has occurred.

Coupled Markov Chains and "Perfect" Sampling

Convergence is an issue because we want to sample from the stationary distribution. The approach discussed above is to start at some arbitrary point, t = 0, and proceed until we think convergence has occurred. Propp and Wilson (1996, 1998) suggested another approach for aperiodic, irreducible, positive recurrent chains with finite state spaces. Their method is based on starting multiple chains at an earlier point. The method is to generate chains that are coupled by the same underlying element of the sample space. The coupling can be accomplished by generating a single realization of some random variable and then letting that realization determine the updating for each of the chains. This can be done in several ways. The simplest, perhaps, is to choose the coupling random variable to be U(0, 1) and use the inverse CDF method. At the point t, we generate ut+1 and update each chain with Xt+1|ut+1, xt by the method of equation (4.3),

x_{t+1} = \min\{ v,\ \text{s.t.}\ u_{t+1} \le P_{X_{t+1}|x_t}(v) \},    (4.17)

where P_{X_{t+1}|x_t}(·) is the conditional CDF for Xt+1, given Xt = xt. With this setup for coupled chains, any one of the chains may be represented in a "stochastic recursive sequence",

X_t = \phi(X_{t-1}, U_t),    (4.18)

where φ is called the transition rule. The transition rule also allows us to generate Ut+1|xt+1, xt, as

U\left( P_{X_{t+1}|x_t}(x_{t+1} - \epsilon),\ P_{X_{t+1}|x_t}(x_{t+1}) \right),    (4.19)

where ε is vanishingly small.

The idea in the method of Propp and Wilson is to start coupled chains at ts = −1 at each of the states and advance them all to t = 0. If they coalesce (that is, if they all take the same value), they are the same chain from then on, and X0 has the stationary distribution. If they do not coalesce, then we can start the chains at ts = −2 and maintain exactly the same coupling; that is, we generate a u−1, but we use the same u0 as before. If these chains coalesce at t = 0, then we accept the common value as a realization of the stationary distribution. If they do not coalesce, we back up the starting points of the chains even further. Propp and Wilson (1996) suggested doubling the starting point each time, but any point further back in time would work. The important thing is that, each time chains are started, the realizations of the uniform random variable from previous runs are used. This method is called coupling from the past (CFTP).
Propp and Wilson (1996) called this method of sampling "exact sampling". Note that if we do the same thing starting at a fixed point and proceeding forward with parallel chains, the value to which they coalesce is not a realization of the stationary distribution.

If the state space is large, checking for coalescence can be computationally intensive. There are various ways of reducing the burden of checking for coalescence. Propp and Wilson (1996) discussed the special case of a monotone chain (one for which the transition matrix stochastically preserves orderings of state vectors) that has two starting state vectors x₀⁻ and x₀⁺ such that, for all x ∈ S, x₀⁻ ≤ x ≤ x₀⁺. In that case, they show that if the sequence beginning with x₀⁻ and the sequence beginning with x₀⁺ coalesce, the sequence from that point on is a sample from the stationary distribution. This is interesting, but of limited relevance.

Because CFTP depends on fixed values of u0, u−1, . . ., for certain of these values, coalescence may occur with very small probability. (This is similar to the modified acceptance/rejection method described in Exercise 4.7b.) In these cases, the ts that will eventually result in coalescence may be very large in absolute value. An "impatient user" may decide just to start over. Doing so, however, biases the procedure.

Fill (1998) described a method for sampling directly from the invariant distribution that uses coupled Markov chains of a fixed length. It is an acceptance/rejection method based on whether coalescence has occurred. This method can be restarted without biasing the results; the method is "interruptible". In this method, an ending time and a state corresponding to that time are chosen arbitrarily. Then, we generate backwards from ts as follows.
1. Select a time ts > 0 and a state x_{ts}.
2. Generate x_{ts−1}|x_{ts}, x_{ts−2}|x_{ts−1}, . . . , x0|x1.
3. Generate u1|x0, x1, u2|x1, x2, . . . , u_{ts}|x_{ts−1}, x_{ts}, using, perhaps, the distribution (4.19).
4. Start chains at t = 0 at each of the states, and advance them to t = ts using the common u's.
5. If the chains have coalesced by time ts, then accept x0; otherwise, return to step 1.

Fill gives a simple proof that this method indeed samples from the invariant distribution. Methods that attempt to sample directly from the invariant distribution of a Markov chain, such as CFTP and interruptible coupled chains, are sometimes called "perfect sampling" methods.

The requirement of these methods of a finite state space obviously limits their usefulness. Møller and Schladitz (1999) extended the method to a class
of continuous-state Markov chains. Fill et al. (2000) also discussed the problem of continuous-state Markov chains and considered ways of increasing the computational efficiency.
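The sketch below (in Python; the three-state kernel is a hypothetical example of ours) implements CFTP for a small finite-state chain, using the inverse CDF transition rule of equation (4.17) and reusing the stored uniforms each time the chains are restarted further back.

import random

K = [[0.5, 0.2, 0.3],    # K[i][j] = Pr(X_{t+1} = s_i | X_t = s_j)
     [0.3, 0.6, 0.3],
     [0.2, 0.2, 0.4]]

def update(j, u):
    # transition rule phi of equation (4.18), via the inverse CDF (4.17)
    cum = 0.0
    for i in range(len(K)):
        cum += K[i][j]
        if u <= cum:
            return i
    return len(K) - 1

def cftp():
    us = []    # us[k] is the uniform used at time -k; reused on every restart
    T = 1
    while True:
        while len(us) < T:
            us.append(random.random())
        states = list(range(len(K)))   # one chain started in every state
        for t in range(T, 0, -1):      # advance from time -T to time 0
            states = [update(s, us[t - 1]) for s in states]
        if len(set(states)) == 1:      # coalescence: the common value at
            return states[0]           # t = 0 is a draw from the invariant
        T *= 2                         # back up twice as far and try again

draws = [cftp() for _ in range(10000)]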
4.11 Use of Conditional Distributions
If the density of interest, pX, can be represented as a marginal density of some joint density pXY, observations on X can be generated as a Markov chain with elements having densities

p_{Y_i|X_{i-1}}, \quad p_{X_i|Y_i}, \quad p_{Y_{i+1}|X_i}, \quad p_{X_{i+1}|Y_{i+1}}, \ldots.
This is a simple instance of the Gibbs algorithm, which we discuss beginning on page 156. Casella and George (1992) explain this method in general. The usefulness of this method depends on identifying a joint density with conditionals that are easy to simulate. For example, if the distribution of interest is a standard normal, the joint density

p_{XY}(x, y) = \frac{1}{\sqrt{2\pi}}, \quad \text{for } -\infty < x < \infty,\ 0 < y < e^{-x^2/2},
has a marginal density corresponding to the distribution of interest, and it has simple conditionals. The conditional distribution of Y|X is U(0, e^{−X²/2}), and the conditional of X|Y is U(−√(−2 log Y), √(−2 log Y)). Starting with x0 in the range of X, we generate y1 as a uniform conditional on x0, then x1 as a uniform conditional on y1, and so on. The auxiliary variable Y that we introduce just to simulate X is called a "latent variable".
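In code (Python; the names are ours), the two uniform conditionals give a very short sampler for the standard normal:

import math, random

def gibbs_normal(n, burn=100):
    x = 0.0
    out = []
    for i in range(n + burn):
        y = 0.0
        while y == 0.0:     # guard against log(0) in the next step
            y = random.uniform(0.0, math.exp(-x * x / 2.0))   # Y | X = x
        r = math.sqrt(-2.0 * math.log(y))
        x = random.uniform(-r, r)                             # X | Y = y
        if i >= burn:
            out.append(x)
    return out

sample = gibbs_normal(10000)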
4.12 Weighted Resampling
To obtain a sample x1, x2, . . . , xm that has an approximate distribution with density pX, a sample y1, y2, . . . , yn from another distribution with density gY can be resampled using weights or probabilities

w_i = \frac{p_X(y_i)/g_Y(y_i)}{\sum_{j=1}^{n} p_X(y_j)/g_Y(y_j)}, \quad \text{for } i = 1, 2, \ldots, n.
The method was suggested by Rubin (1987, 1988), who called it SIR for sampling/importance resampling. The method is also called importance-weighted resampling. The resampling should be done without replacement to give points with low probabilities a chance to be represented. Methods for sampling from a given set with given probabilities are discussed in Section 6.1, page 217. Generally, in SIR, n is much larger than m. This method can work reasonably well if the density gY is very close to the target density pX .
This method, like the Markov chain methods above, has the advantage that the normalizing constant of the target density is not needed. Instead of the density pX (·), any nonnegative proportional function cpX (·) could be used. Gelman (1992) describes an iterative variation in which n is allowed to increase as m increases; that is, as the sampling continues, more variates are generated from the distribution with density gY .
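A sketch of SIR (in Python; the choice of a Cauchy gY and a standard normal target are ours) follows. Note that only a function proportional to pX is needed, and that we resample with replacement here for brevity, whereas the text recommends sampling without replacement.

import math, random

def sir(n=10000, m=1000):
    # trial sample from the Cauchy density g_Y
    y = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]
    # weights proportional to p_X(y_i)/g_Y(y_i) for a standard normal
    # target; all constant factors cancel in the normalization
    r = [math.exp(-t * t / 2.0) * (1.0 + t * t) for t in y]
    total = sum(r)
    w = [ri / total for ri in r]
    return random.choices(y, weights=w, k=m)

x = sir()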
4.13 Methods for Distributions with Certain Special Properties
Because of the analytical and implementation burden involved in building a random number generator, a general rule is that a single algorithm that works in two settings is better than two different algorithms, one for each setting. This is true, of course, unless the individual algorithms perform better in the respective special cases, and then the question is how much better. In random number generation from nonuniform distributions, it is desirable to have "universal algorithms" that use general methods that we have discussed above but are optimized for certain broad classes of distributions.

For distributions with certain special properties, general algorithms using mixtures and rejection can be optimized for broad classes of distributions. We have already discussed densities that are nearly linear (Algorithm 4.7, page 118) and densities that are nearly flat (Algorithm 4.11, page 126). Another broad class of distributions consists of those that are infinitely divisible. Damien, Laud, and Smith (1995) give general methods for generation of random deviates from distributions that are infinitely divisible.

Distributions with Densities that Can Be Transformed to Concave Functions

An important special property of some distributions is concavity of the density or of some transformation of the density. On page 119, we discuss how easy it is to form polygonal majorizing and squeeze functions for concave densities. Similar ideas can be employed for cases in which the density can be invertibly transformed into a concave function. In some applications, especially in reliability or survival analysis, the logarithm is a standard transformation, and log-concavity is an important property. A distribution is log-concave if its density (or probability function) has the property

\log p(x_1) - 2 \log p\left( \frac{x_1 + x_2}{2} \right) + \log p(x_2) < 0,

wherever the densities are positive. If the density is twice-differentiable, this condition is satisfied if the negative of the Hessian is positive definite. Many of the commonly used distributions, such as the normal, the gamma with shape
parameter greater than 1, and the beta with parameters greater than 1, are log-concave. See Pratt (1981) for discussion of these properties, and see Dellaportas and Smith (1993) for some examples in generalized linear models. Devroye (1984b) describes general methods for a log-concave distribution, and Devroye (1987) describes a method for a discrete distribution that is log-concave.

The methods of forming polygonal majorizing and squeeze functions for concave densities can also be applied to convex densities or to densities that can be invertibly transformed into concave functions by reversing the role of the majorizing and squeeze functions.

Incremental Formation of Majorizing and Squeeze Functions: "Adaptive" Rejection

Gilks (1992) and Gilks and Wild (1992) describe a method that they call adaptive rejection sampling or ARS for a continuous log-concave distribution. The adaptive rejection method described by Gilks (1992) begins with a set Sk consisting of the points x0 < x1 < . . . < xk < xk+1 from the range of the distribution of interest. Define Li as the straight line determined by the points (xi, log p(xi)) and (xi+1, log p(xi+1)); then, for i = 1, 2, . . . , k, define the piecewise linear function hk(x) as

h_k(x) = \min\left( L_{i-1}(x),\ L_{i+1}(x) \right) \quad \text{for } x_i \le x < x_{i+1}.
This piecewise linear function is a majorizing function for the log of the density, as shown in Figure 4.19. The chords formed by the continuation of the line segments form functions that can be used as a squeeze function, mk(x), which is also piecewise linear. For the density itself, the majorizing function and the squeeze function are piecewise exponentials. The majorizing function is c gk(x) = exp(hk(x)), where each piece of gk(x) is an exponential density function truncated to the appropriate range. The density is shown in Figure 4.20. In each step of the acceptance/rejection algorithm, the set Sk is augmented by the point generated from the majorizing distribution, and k is increased by 1. The method is shown in Algorithm 4.19. In Exercise 4.14, page 162, you are asked to write a program for performing adaptive rejection sampling for the density shown in Figure 4.20, which is the same one as in Figure 4.4, and to compare the efficiency of this method with the standard acceptance/rejection method.

Algorithm 4.19 Adaptive Acceptance/Rejection Sampling
0. Initialize k and Sk.
1. Generate y from gk.
Figure 4.19: Adaptive Majorizing Function with the Log-Density (Same Density as in Figure 4.4)

2. Generate u from a U(0, 1) distribution.
3. If u ≤ exp(mk(y))/(c gk(y)), then
   3.a. deliver y;
   otherwise,
   3.b. if u ≤ p(y)/(c gk(y)), then deliver y;
   otherwise,
   3.c. set k = k + 1, add y to Sk, and update hk, gk, and mk.
4. Go to step 1.

After an update step, the new piecewise linear majorizing function for the log of the density is as shown in Figure 4.21. Gilks and Wild (1992) describe a similar method, but instead of using secants as the piecewise linear majorizing function, they use tangents of the log of the density. This requires computation of numerical derivatives of the log density. Hörmann (1994a) adapts the methods of Gilks (1992) and Gilks and Wild (1992) to discrete distributions.
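As an illustration of the secant envelope that drives Algorithm 4.19, the following C sketch evaluates hk(x) = min(Li−1(x), Li+1(x)) from a table of abscissae and log-density values. The function names and fixed table layout are illustrative assumptions, not part of the algorithm as published.

    #include <math.h>

    /* A sketch of the secant envelope of Gilks (1992): xs[0..m-1] are the
       (sorted) points of S_k and lp[i] = log p(xs[i]) for a log-concave p.
       chord() evaluates the line L_j through (xs[j], lp[j]) and
       (xs[j+1], lp[j+1]); exp of the returned value majorizes p up to the
       constant c of Algorithm 4.19. */
    static double chord(const double *xs, const double *lp, int j, double x)
    {
        double slope = (lp[j + 1] - lp[j]) / (xs[j + 1] - xs[j]);
        return lp[j] + slope * (x - xs[j]);
    }

    double envelope_log(const double *xs, const double *lp, int m, double x)
    {
        int i = 0;
        while (i < m - 2 && x > xs[i + 1])   /* locate x in [xs[i], xs[i+1]] */
            i++;
        /* at the end intervals only one of the two chords exists */
        double a = (i > 0)     ? chord(xs, lp, i - 1, x) : HUGE_VAL;  /* L_{i-1} */
        double b = (i < m - 2) ? chord(xs, lp, i + 1, x) : HUGE_VAL;  /* L_{i+1} */
        return (a < b) ? a : b;
    }

In a full implementation, accepting or rejecting a candidate against exp(envelope_log(...)) and then inserting the candidate into the table is exactly the update of step 3.c.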
Figure 4.20: Exponential Adaptive Majorizing Function with the Density in Figure 4.4

T-concave Distributions

Hörmann (1995) extends the methods for a distribution with a log-concave density to a distribution whose density p can be transformed by a strictly increasing operator T such that T(p(x)) is concave. In that case, the density p is said to be “T-concave”. Often, a good choice is T(s) = −1/√s. A density that is log-concave is also T-concave with respect to this transformation. Many of the standard distributions have T-concave densities, and in those cases we refer to the distribution itself as T-concave. The normal distribution (equation (5.6)), for example, is T-concave for all values of its parameters. The gamma distribution (equation (5.13)) is T-concave for α ≥ 1 and β > 0. The beta distribution (equation (5.14)) is T-concave for α ≥ 1 and β ≥ 1. The transformation T(s) = −1/√s allows construction of a table mountain majorizing function (reminiscent of a majorizing function in the ratio-of-uniforms method) that is then used in an acceptance/rejection method. Hörmann calls this method transformed density rejection. Leydold (2001) describes an algorithm for T-concave distributions based on a ratio-of-uniforms type of acceptance/rejection method. The advantage of Leydold’s method is that it requires less setup time than Hörmann’s method, and so would be useful in applications in which the parameters of the distribution change relatively often compared to the number of variates generated at each fixed value. Gilks, Best, and Tan (1995; corrigendum, Gilks, Neal, Best, and Tan, 1997) develop an adaptive rejection method that does not require the density to be log-concave. They call the method adaptive rejection Metropolis sampling.
Figure 4.21: Adaptive Majorizing Function with an Additional Point

Unimodal Densities

Many densities of interest are unimodal, and some simple methods of random number generation take advantage of that property. The ziggurat and Monty Python decomposition methods of Marsaglia and Tsang (1984, 1998) are most effective for unimodal distributions, in which the first decomposition can involve forming two regions, one on each side of the mode. Devroye (1984a) describes general methods for generating variates from such distributions. If a distribution is not unimodal, it is sometimes useful to decompose the distribution into a mixture of unimodal distributions and to use the techniques on them separately. Methods for sampling from unimodal discrete distributions, which often involve linear searches, can be more efficient if the search begins at the mode. (See, for example, the method for the Poisson distribution on page 188.)

Multimodal Densities

For simulating densities with multiple modes, it is generally best to express the distribution as a mixture and use different methods in different regions. MCMC methods can become trapped around a local mode. There are various ways of dealing with this problem. One way is to modify the target density pX(·) in the Hastings ratio so that it becomes flatter, making it more likely that the sequence will move away from a local mode. Geyer and Thompson (1995) describe a method of “simulated tempering”, in which a “temperature” parameter, which controls how likely it is that the sequence will move away from a current state, is varied randomly. This is similar to
methods used in simulated annealing (see Section 7.9). Neal (1996) describes a systematic method of alternating between the target density and a flatter one. He called the method “tempered transition”.
4.14 General Methods for Multivariate Distributions
Two simple methods of generating multivariate random variates make use of variates from univariate distributions. One way is to generate a vector of i.i.d. variates and then apply a transformation to yield a vector from the desired multivariate distribution. Another way is to use the representation of the distribution function or density function as a product of the form

    p_{X1 X2 X3 ··· Xd} = p_{X1 | X2 X3 ··· Xd} · p_{X2 | X3 ··· Xd} · p_{X3 | ··· Xd} · · · p_{Xd}.

In this method, we generate a marginal xd from p_{Xd}, then a conditional xd−1 from p_{Xd−1 | Xd}, and continue in this way until we have the full realization x1, x2, . . . , xd. We see two simple examples of these methods at the beginning of Section 5.3, page 197. In the first example in that section, we generate a d-variate normal with variance-covariance matrix Σ by the transformation x = T^T z, where T is a d × d matrix such that T^T T = Σ and z is a d-vector of i.i.d. N(0, 1) variates. In the second case, we generate x1 from N1(0, σ11), then generate x2 conditionally on x1, then generate x3 conditionally on x1 and x2, and so on. As mentioned in discussing acceptance/rejection methods in Sections 4.5 and 4.10, these methods are directly applicable to multivariate distributions, so acceptance/rejection is a third general way of generating multivariate observations. As in the example of the bivariate gamma on page 123, however, this usually involves a multivariate majorizing function, so we are still faced with the basic problem of generating from some multivariate distribution. For higher dimensions, the major problem in using acceptance/rejection methods for generating multivariate deviates results from one of the effects of the so-called “curse of dimensionality”: the proportion of the volume of a closed geometrical figure that is in the outer regions of that figure increases with increasing dimensionality. (See Section 10.7 of Gentle, 2002, and Exercise 4.4f at the end of this chapter.)

An iterative method somewhat similar to the use of marginals and conditionals can also be used to generate multivariate observations. This method was used by Geman and Geman (1984) for generating observations from a Gibbs distribution (Boltzmann distribution) and so is called the Gibbs method. In the Gibbs method, after choosing a starting point, the components of the d-vector variate are generated one at a time conditionally on all others. If pX is the density of the d-variate random variable X, we use the conditional densities p_{X1 | X2 X3 ··· Xd}, p_{X2 | X1 X3 ··· Xd}, and so on. At each stage, the conditional distribution uses the most recent values of all of the other components. Obviously,
it may require a number of iterations before the choice of the initial starting point is washed out. The method is shown in Algorithm 4.20. (In the algorithms to follow, we represent the support of the density of interest by S, where S ⊆ IR^d.)

Algorithm 4.20 Gibbs Method
0. Set k = 0.
1. Choose x^(k) ∈ S.
2. Generate x1^(k+1) conditionally on x2^(k), x3^(k), . . . , xd^(k),
   Generate x2^(k+1) conditionally on x1^(k+1), x3^(k), . . . , xd^(k),
   . . .
   Generate xd−1^(k+1) conditionally on x1^(k+1), x2^(k+1), . . . , xd^(k),
   Generate xd^(k+1) conditionally on x1^(k+1), x2^(k+1), . . . , xd−1^(k+1).
3. If convergence has occurred, then
   3.a. deliver x = x^(k+1);
   otherwise,
   3.b. set k = k + 1, and go to step 2.
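To make the conditional-update cycle of step 2 concrete, here is a minimal C sketch for the bivariate normal with zero means, unit variances, and correlation ρ, a case in which both full conditionals are known in closed form (X1 | x2 ~ N(ρ x2, 1 − ρ^2), and symmetrically). The routine rnorm01() is an assumed standard normal generator, not something defined in the text.

    #include <math.h>

    extern double rnorm01(void);   /* assumed: one N(0,1) deviate per call */

    /* A sketch of Algorithm 4.20 for a bivariate normal with correlation rho.
       The first `burnin` cycles are discarded; n post-convergence pairs are
       stored in out1 and out2 (successive pairs are correlated, as discussed
       below). */
    void gibbs_bvn(double rho, int burnin, int n, double *out1, double *out2)
    {
        double x1 = 0.0, x2 = 0.0;             /* steps 0-1: starting point */
        double s = sqrt(1.0 - rho * rho);      /* conditional std. deviation */
        for (int k = 0; k < burnin + n; k++) {
            x1 = rho * x2 + s * rnorm01();     /* step 2: x1 | x2 */
            x2 = rho * x1 + s * rnorm01();     /*         x2 | newest x1 */
            if (k >= burnin) {                 /* step 3: deliver */
                out1[k - burnin] = x1;
                out2[k - burnin] = x2;
            }
        }
    }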
Casella and George (1992) give a simple proof that this iterative method converges; that is, as k → ∞, the density of the realizations approaches pX. The question of whether convergence has practically occurred in a finite number of iterations in the Gibbs method is similar to the same question in the Metropolis–Hastings method discussed in Section 4.10. In either case, to determine that convergence has occurred is not a simple problem. Once a realization is delivered in Algorithm 4.20 (that is, once convergence has been deemed to have occurred), subsequent realizations can be generated either by starting a new iteration with k = 0 in step 0 or by continuing at step 1 with the current value of x^(k). If the chain is continued at the current value of x^(k), we must remember that the subsequent realizations are not independent. This affects variance estimates (second-order sample moments) but not means (first-order moments). In order to get variance estimates, we may use means of batches of subsequences or use just every mth (for some m > 1) deviate in step 3. (The idea is that this separation in the sequence will yield subsequences or a systematic subsample with correlations nearer 0. See Section 7.4 for a description of batch means.) If we just want estimates of means, however, it is best not to subsample the sequence; that is, the variances of the estimates of means (first-order sample moments) using the full sequence are smaller than the variances of the estimates of the same means using a systematic (or any other) subsample (as long as the Markov chain is stationary). To see this, let x̄i be the mean of a systematic subsample of size n consisting of every mth realization beginning with the ith realization of the converged sequence. Now, following MacEachern and Berliner (1994), we observe that

    |Cov(x̄i, x̄j)| ≤ V(x̄l)

for any positive i, j, and l less than or equal to m. Hence, if x̄ is the sample mean of a full sequence of length nm, then

    V(x̄) = V(x̄l)/m + Σ_{i≠j; i,j=1}^{m} Cov(x̄i, x̄j)/m^2
         ≤ V(x̄l)/m + m(m − 1)V(x̄l)/m^2
         = V(x̄l).

See also Geyer (1992) for a discussion of subsampling in the chain. The paper by Gelfand and Smith (1990) was very important in popularizing the Gibbs method. Gelfand and Smith also describe a related method of Tanner and Wong (1987), called data augmentation, which Gelfand and Smith call substitution sampling. In this method, a single component of the d-vector is chosen (in step 1), and then multivariate subvectors are generated conditional on just one component. This method requires d(d − 1) conditional distributions. The reader is referred to their article and to Schervish and Carlin (1992) for descriptions and comparisons with different methods. Tanner (1996) defines a chained data augmentation, which is the Gibbs method described above. In the Gibbs method, the components of the d-vector are changed systematically, one at a time. The method is sometimes called alternating conditional sampling to reflect this systematic traversal of the components of the vector.

Another type of Metropolis method is the hit-and-run sampler. In this method, all components of the vector are updated at once. The method is shown in Algorithm 4.21 in the general version described by Chen and Schmeiser (1996).

Algorithm 4.21 Hit-and-Run Sampling
0. Set k = 0.
1. Choose x^(k) ∈ S.
2. Generate a random normalized direction v^(k) in IR^d. (This is equivalent to a random point on a sphere, as discussed on page 201.)
3. Determine the set S^(k) ⊆ IR consisting of all λ such that (x^(k) + λv^(k)) ∈ S. (S^(k) is one-dimensional; S is d-dimensional.)
4. Generate λ^(k) from the density g^(k), which has support S^(k).
5. With probability a^(k),
   5.a. set x^(k+1) = x^(k) + λ^(k)v^(k);
   otherwise,
   5.b. set x^(k+1) = x^(k).
6. If convergence has occurred, then
   6.a. deliver x = x^(k+1);
   otherwise,
   6.b. set k = k + 1, and go to step 2.

Chen and Schmeiser (1996) discuss various choices for g^(k) and a^(k). One choice is

    g^(k)(λ) = p(x^(k) + λv^(k)) / ∫_{S^(k)} p(x^(k) + u v^(k)) du   for λ ∈ S^(k),
    g^(k)(λ) = 0   otherwise,

with a^(k) = 1. Another choice is g^(k) uniform over S^(k) if S^(k) is bounded, or else some symmetric distribution centered on 0 (such as a normal or Cauchy distribution), together with

    a^(k) = min(1, p(x^(k) + λ^(k)v^(k)) / p(x^(k))).

Smith (1984) uses the hit-and-run sampler for generating uniform points over bounded regions, and Bélisle, Romeijn, and Smith (1993) use it for generating random variates from general multivariate distributions. Proofs of the convergence of the method can be found in Bélisle, Romeijn, and Smith (1993) and Chen and Schmeiser (1996). Gilks, Roberts, and George (1994) describe a generalization of the hit-and-run algorithm called adaptive direction sampling. In this method, a set of current points is maintained, and only one, chosen at random from the set, is updated at each iteration (see Gilks and Roberts, 1996).

Both the Gibbs and hit-and-run methods are special cases of the Metropolis–Hastings method in which the r of step 2 in Algorithm 4.18 (page 141) is exactly 1, so there is never a rejection. The same issues of convergence that we encountered in discussing the Metropolis–Hastings method must be addressed when using the Gibbs or hit-and-run methods. The need to run long chains can increase the number of computations to unacceptable levels. Schervish and Carlin (1992) and Cowles and Carlin (1996) discuss general conditions for convergence of the Gibbs sampler. Dellaportas (1995) discusses some issues in the efficiency of random number generation using the Gibbs method. Berbee et al. (1987) compare the efficiency of hit-and-run methods with acceptance/rejection methods and find the hit-and-run methods to be more efficient in higher dimensions. Chen and Schmeiser (1993) give some general comparisons of Gibbs, hit-and-run, and variations. Generalizations about the performance of the methods are difficult; the best method often depends on the problem.
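As a concrete instance of Algorithm 4.21, the following C sketch applies hit-and-run to the uniform distribution over the unit hypercube [0, 1]^d, the kind of use made by Smith (1984). For a uniform target, the first choice of g^(k) above reduces to a uniform density on S^(k) with a^(k) = 1. The helpers unif01() and rnorm01() are assumed uniform and standard normal generators.

    #include <math.h>

    extern double unif01(void);    /* assumed: U(0,1), endpoints excluded */
    extern double rnorm01(void);   /* assumed: one N(0,1) deviate */

    /* A sketch of hit-and-run for the uniform distribution on [0,1]^d.
       x (length d) is updated in place through nsteps iterations. */
    void hit_and_run_cube(double *x, int d, int nsteps)
    {
        for (int k = 0; k < nsteps; k++) {
            /* step 2: random direction as a normalized vector of normals */
            double v[32], norm = 0.0;          /* d <= 32 assumed for brevity */
            for (int i = 0; i < d; i++) { v[i] = rnorm01(); norm += v[i] * v[i]; }
            norm = sqrt(norm);

            /* step 3: S^(k) = [lo, hi] keeps x + lambda*v inside the cube */
            double lo = -HUGE_VAL, hi = HUGE_VAL;
            for (int i = 0; i < d; i++) {
                double vi = v[i] / norm;
                double a = (0.0 - x[i]) / vi, b = (1.0 - x[i]) / vi;
                if (vi < 0.0) { double t = a; a = b; b = t; }
                if (a > lo) lo = a;
                if (b < hi) hi = b;
            }

            /* steps 4-5: lambda uniform on S^(k); a^(k) = 1, so always move */
            double lambda = lo + (hi - lo) * unif01();
            for (int i = 0; i < d; i++) x[i] += lambda * v[i] / norm;
        }
    }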
Multivariate Densities with Special Properties

We have seen that certain properties of univariate densities can be used to develop efficient algorithms for general distributions that possess those special properties. For example, adaptive rejection sampling and other special acceptance/rejection methods can be used for distributions having concave densities or concave transformed densities, as discussed on page 150. Hörmann (2000) describes a method for log-concave bivariate distributions that uses adaptive rejection sampling to develop the majorizing function. Leydold (1998) shows that while the methods for univariate T-concave distributions would work for multivariate T-concave distributions, such methods are unacceptably slow. He splits the T-concave multivariate density into a set of simple cones and constructs the majorizing function from piecewise hyperplanes that are tangent to the cones. He reports favorably on the performance of his method for as many as eight dimensions. As we have seen, unimodal distributions are generally easier to work with than multimodal distributions. A product multivariate density having unimodal factors will of course be unimodal. Devroye (1997) describes general acceptance/rejection methods for multivariate distributions with the slightly weaker property of being orthounimodal; that is, each marginal density is unimodal.
4.15 Generating Samples from a Given Distribution
Usually, in applications, rather than just generating a single random deviate, we generate a random sample of deviates from the distribution of interest. A random sample of size n from a discrete distribution with probability function p(X = mi ) = pi has a vector of counts of the mass points that has a multinomial (n, p1 , . . . , pk ) distribution. If the sample is to be used as a set, rather than as a sequence, and if n is large relative to k, it obviously makes more sense to generate a single multinomial (x1 , x2 , . . . , xk ) and use these values as counts of occurrences of the respective mass points m1 , m2 , . . . , mk . (Methods for generating multinomials are discussed in Section 5.3.2, page 198.) This same idea can be applied to continuous distributions with a modification to discretize the range (see Kemp and Kemp, 1987).
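As an illustration of the count-based idea, here is a C sketch that draws the vector of multinomial counts by sequential conditional binomials, one standard construction; rbinom() is an assumed binomial generator, not one defined in the text.

    extern long rbinom(long n, double p);   /* assumed binomial generator */

    /* A sketch: counts (x1,...,xk) of a sample of size n from a k-point
       discrete distribution with probabilities p[0..k-1], via k-1
       conditional binomials. */
    void sample_counts(long n, const double *p, int k, long *x)
    {
        double rest = 1.0;                      /* probability not yet used */
        long left = n;                          /* observations not yet placed */
        for (int i = 0; i < k - 1; i++) {
            x[i] = rbinom(left, p[i] / rest);   /* count for mass point m_i */
            left -= x[i];
            rest -= p[i];
        }
        x[k - 1] = left;                        /* remainder goes to m_k */
    }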
Exercises

4.1. The inverse CDF method.
(a) Prove that if X is a random variable with an absolutely continuous distribution function PX, the random variable PX(X) has a U(0, 1) distribution.
(b) Prove that the inverse CDF method for discrete random variables as specified in the relationship in expression (4.2) on page 104 is correct.
4.2. Formally prove that the random variable delivered in Algorithm 4.6 on page 114 has the density pX. Hint: For the delivered variable, Z, determine the distribution function Pr(Z ≤ x) and differentiate.

4.3. Write a Fortran or C function to implement the acceptance/rejection method for generating a beta(3, 2) random deviate. Use the majorizing function shown in Figure 4.4 on page 115. The value of c is 1.2. Use the inverse CDF method to generate a deviate from g. (This will involve taking a square root.)

4.4. Acceptance/rejection methods.
(a) Give an algorithm to generate a normal random deviate using the basic acceptance/rejection method with the double exponential density (see equation (5.11), page 177) as the majorizing density.
(b) What is the acceptance proportion of this method?
(c) After you have obtained the basic acceptance/rejection test, try to simplify it.
(d) Develop an algorithm to generate bivariate normal deviates with mean (0, 0), variance (1, 1), and correlation ρ using a bivariate product double exponential density as the majorizing density. For ρ = 0, what is the acceptance probability?
(e) Write a program to generate bivariate normal deviates with mean (0, 0), variance (1, 1), and correlation ρ. Use a bivariate product double exponential density as the majorizing density. Now, set ρ = 0.5 and generate a sample of 1000 bivariate normals. Compare the sample statistics with the parameters of the simulated distribution.
(f) What is the acceptance probability for a basic acceptance/rejection method to generate d-variate normal deviates with mean 0 and diagonal variance-covariance matrix with all elements equal to 1 using a d-variate product double exponential density as the majorizing density?

4.5. What would be the problem with using a normal density to make a majorizing function for the double exponential distribution (or using a halfnormal for an exponential)?

4.6. (a) Write a Fortran or C function to implement the acceptance/rejection method for a bivariate gamma distribution whose density is given in
equation (4.10) on page 123 using the method described in the text. (You must develop a method for determining the mode.)
(b) Now, instead of the bivariate uniform in the rectangle near the origin, devise a pyramidal distribution to use as a majorizing density.
(c) Use Monte Carlo methods to compare the efficiency of the method using the bivariate uniform and the method using a pyramidal density.

4.7. Consider the acceptance/rejection method given in Algorithm 4.6 to generate a realization of a random variable X with density function pX using a density function gY.
(a) Let T be the number of passes through the three steps until the desired variate is delivered. Determine the mean and variance of T (in terms of pX and gY).
(b) Now, consider a modification of the rejection method in which steps 1 and 2 are reversed, and the branch in step 3 is back to the new step 2; that is:
   1. Generate u from a uniform (0,1) distribution.
   2. Generate y from the distribution with density function gY.
   3. If u ≤ pX(y)/(c gY(y)), then take y as the desired realization; otherwise return to step 2.
Is this a better method? Let Q be the number of passes through these three steps until the desired variate is delivered. Determine the mean and variance of Q. (This method was suggested by Sibuya, 1961, and analyzed by Greenwood, 1976c.)

4.8. Formally prove that the random variable delivered in Algorithm 4.7 on page 118 has the density p.

4.9. Write a Fortran or C function to implement the ratio-of-uniforms method (page 130) to generate deviates from a gamma distribution with shape parameter α. Generate a sample of size 1000 and perform a chi-squared goodness-of-fit test (see Cheng and Feast, 1979).

4.10. Use the Metropolis–Hastings algorithm (page 141) to generate a sample of standard normal random variables. Use as the candidate generating density g(x|y), a normal density in x with mean y. Experiment with different burn-in periods and different starting values. Plot the sequences generated. Test your samples for goodness-of-fit to a normal distribution. (Remember that they are correlated.) Experiment with different sample sizes.

4.11. Let Π have a beta distribution with parameters α and β, and let X have a conditional distribution given Π = π of a binomial with parameters n and π. Let Π conditional on X = x have a beta distribution with parameters
α + x and n + β − x. (This leads to the “beta-binomial” distribution; see page 187.) Consider a bivariate Markov chain, (Π0, X0), (Π1, X1), . . ., with an uncountable state space (see Casella and George, 1992).
(a) What is the transition kernel? That is, what is the conditional density of (Πt, Xt) given (πt−1, xt−1)?
(b) Consider just the Markov chain of the beta-binomial random variable X. What is the (i, j) element of the transition matrix?
4.12. Obtain a sample of size 100 from the beta(3,2) distribution using the SIR method of Section 4.12 and using a sample of size 1000 from the density gY that is proportional to the triangular majorizing function used in Exercise 4.3. (Use Algorithm 6.1, page 218, to generate the sample without replacement.) Compare the efficiency of the program that you have written with the one that you wrote in Exercise 4.3.

4.13. Formally prove that the random variable delivered in Algorithm 4.19 on page 151 has the density pX. (Compare Exercise 4.2.)

4.14. Write a computer program to implement the adaptive acceptance/rejection method for generating a beta(3,2) random deviate. Use the majorizing function shown in Figure 4.19 on page 152. The initial value of k is 4, and Sk = {0.00, 0.10, 0.60, 0.75, 0.90, 1.00}. Compare the efficiency of the program that you have written with the ones that you wrote in Exercises 4.3 and 4.12.

4.15. Consider the trivariate normal distribution used as the example in Figure 4.18 (page 145).
(a) Use the Gibbs method to generate and plot 1000 realizations of X1 (including any burn-in). Explain any choices that you make on how to proceed with the method.
(b) Use the hit-and-run method to generate and plot 1000 realizations of X1 (including any burn-in). Explain any choices that you make on how to proceed with the method.
(c) Compare the Metropolis–Hastings method (page 145) and the Gibbs and hit-and-run methods for this problem.

4.16. Consider a probability model in which the random variable X has a binomial distribution with parameters n and y, which are, respectively, realizations of a conditional shifted Poisson distribution and a conditional beta distribution. For fixed λ, α, and β, let the joint density of X, N, and Y be proportional to

    λ^n y^{x+α−1} (1 − y)^{n−x+β−1} e^{−λ} / (x!(n − x)!)

for x = 0, 1, . . . , n; 0 ≤ y ≤ 1; n = 1, 2, . . . .
First, determine the conditional densities for X|y, n, Y|x, n, and N|x, y. Next, write a Fortran or C program to sample X from the multivariate distribution for given λ, α, and β. Now, set λ = 16, α = 2, and β = 4, run 500 independent Gibbs sequences of length k = 10, taking only the final variate, and plot a histogram of the observed x. (Use a random starting point.) Now repeat the procedure, except using only one Gibbs sequence of length 5000, and plot a histogram of all observed xs after the ninth one (see Casella and George, 1992).

4.17. Generate a random sample of 1000 Bernoulli variates with π = 0.3. Do not use Algorithm 4.1; instead, use the method of Section 4.15.
Chapter 5

Simulating Random Numbers from Specific Distributions

For the important distributions, specialized algorithms based on the general methods discussed in the previous chapter are available. The important difference in the algorithms is their speed. A secondary difference is the size and complexity of the program to implement the algorithm. Because all of the algorithms for generating from nonuniform distributions rely on programs to generate from uniform distributions, an algorithm that uses only a small number of uniforms to yield a variate of the target distribution may be faster on a computer system on which the generation of the uniform is very fast. As we have mentioned, on a given computer system, there may be more than one program available to generate uniform deviates. Often, a portable generator is slower than a nonportable one, so for portable generators of nonuniform distributions, those that require a small number of uniform deviates may be better. If evaluation of elementary functions is a part of the algorithm for generating random deviates, then the speed of the overall algorithm depends on the speed of the evaluation of the functions. The relative speed of elementary function evaluation is different on different computer systems. The algorithm for a given distribution is some specialized version of those methods discussed in the previous chapter. Often, the algorithm uses some combination of these general techniques. Many algorithms require some setup steps to compute various constants and to store tables; therefore, there are two considerations for the speed: the setup time and the generation time. In some applications, many random numbers from the same distribution are required. In those cases, the setup time may not be too important. In other applications, the random numbers come from different distributions—probably the same family of distributions but with changing
values of the parameters. In those cases, the setup time may be very significant. If the best algorithm for a given distribution has a long setup time, it may be desirable to identify another algorithm for use when the parameters vary. Any computation that results in a quantity that is constant with respect to the parameters of the distribution should of course be performed as part of the setup computations in order to avoid performing the computation in every pass through the main part of the algorithm. The efficiency of an algorithm may depend on the values of the parameters of the distribution. Many of the best algorithms therefore switch from one method to another, depending on the values of the parameters. In some cases, the speed of the algorithm is independent of the parameters of the distribution. Such an algorithm is called a uniform time algorithm. In many cases, the most efficient algorithm in one range of the distribution is not the most efficient in other regions. Many of the best algorithms therefore use mixtures of the distribution. Sometimes, it is necessary to generate random numbers from some subrange of a given distribution, such as the tail region. In some cases, there are efficient algorithms for such truncated distributions. (If there is no specialized algorithm for a truncated distribution, acceptance/rejection applied to the full distribution will always work, of course.) Methods for generating random variates from specific distributions are an area in which there have been literally hundreds of papers, each proposing some wrinkle (not always new or significant). Because the relative efficiencies (“efficiency” here means “speed”) of the individual operations in the algorithms vary from one computing system to another, and also because these individual operations can be programmed in various ways, it is very difficult to compare the relative efficiencies of the algorithms. This provides fertile ground for a proliferation of “research” papers. Two other things contribute to the large number of insignificant papers in this area. It is easy to look at some algorithm, modify some step, and then offer the new algorithm. Thus, the intellectual capitalization required to enter the field is small. (In business and economics, this is the same reason that so many restaurants are started; only a relatively small capitalization is required.) Another reason for the large number of papers purporting to give new and better algorithms is the diversity of the substantive and application areas that constitute the backgrounds of the authors. Monte Carlo simulation is widely used throughout both the hard and the soft sciences. Research workers in one field often are not aware of the research published in another field. Although, of course, it is important to seek efficient algorithms, it is also necessary to consider a problem in its proper context. In Monte Carlo simulation applications, literally millions of random numbers may be generated, but the time required to generate them is likely to be only a very small fraction of the total computing time. In fact, it is probably the case that the fraction of time required for the generation of the random numbers is somehow negatively correlated with the importance of the problem. The importance of the time
required to perform some task usually depends more on its proportion of the overall time of the job rather than on its total time. Another consideration is whether the algorithm is portable; that is, whether it yields the same stream on different computer systems. As we mention in Section 4.5, methods that accept or reject a candidate variate based on a floatingpoint comparison may not yield the same streams on different systems. The descriptions of the algorithms in this chapter are written with an emphasis on clarity, so they should not be incorporated directly into program code without considerations of efficiency. These considerations generally involve avoiding unnecessary computations. This may mean defining a variable not mentioned in the algorithm description or reordering the steps slightly.
5.1 Modifications of Standard Distributions
For many of the common distributions, there are variations that are useful either for computational or other practical reasons or because they model some stochastic process well. A distribution can sometimes be simplified by transformations of the random variable that effectively remove certain parameters that characterize the distribution. In many cases, the algorithms for generating random deviates address the simplified version of the distribution. An appropriate transformation is then applied to yield deviates from the distribution with the given parameters.

Standard Distributions

A linear transformation, Y = aX + b, is simple to apply and is one of the most useful. The multiplier affects the scale, and the addend affects the location. For example, a “three-parameter” gamma distribution with density

    p(y) = (1/(Γ(α)β^α)) (y − γ)^{α−1} e^{−(y−γ)/β},   for γ ≤ y ≤ ∞,

can be formed from the simpler distribution with density

    g(x) = (1/Γ(α)) x^{α−1} e^{−x},   for 0 ≤ x ≤ ∞,
using the transformation Y = βX + γ. (Here, and elsewhere, when we give an expression for a probability density function, we imply that the density is equal to 0 outside of the range specified.) The β parameter is a scale parameter, and γ is a location parameter. (The remaining α parameter is called the “shape parameter”, and it is the essential parameter of the family of gamma distributions.) The simpler form is called the standard gamma distribution. Other distributions have similar standard forms. Standard distributions (or standardized random variables) allow us to develop simpler algorithms and more compact tables of values that can be used for a range of parameter values.
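In code, the standardization is a one-line wrapper. This C sketch assumes a generator rgamma_std() for the standard gamma (such as the algorithms of Section 5.2.3); the function names are illustrative.

    extern double rgamma_std(double alpha);   /* assumed standard gamma generator */

    /* Three-parameter gamma via the location-scale transformation
       Y = beta*X + gamma. */
    double rgamma3(double alpha, double beta, double gamma_loc)
    {
        return beta * rgamma_std(alpha) + gamma_loc;
    }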
Truncated Distributions

In many stochastic processes, the realizations of the random variable are constrained to be within a given region of the support of the random variable. Over the allowable region, the random variable has a probability density (or probability function) that is proportional to the density (or probability) of the unconstrained random variable. If the random variable Y has probability density p(y) over a domain S, and if Y is constrained to R ⊂ S, the probability density of the constrained random variable is

    pc(x) = p(x)/Pr(Y ∈ R)   for x ∈ R;
    pc(x) = 0,   elsewhere.   (5.1)

The most common types of constraint are truncations, either left or right. In a left truncation at τ, say, the random variable Y is constrained by τ ≤ Y, and in a right truncation, it is constrained by Y ≤ τ. Truncated distributions are useful models in applications in which the observations are censored. Such observations often arise in studies where a variable of interest is the time until a particular event occurs. At the end of the study, there may be a number of observational units that have not experienced the event. The corresponding times for these units are said to be censored, or right-censored; it is known only that the times for these units would be greater than some fixed value. In a similar fashion, left-censoring occurs when the exact times are not recorded early in the study. There are many issues to consider in the analysis of censored data, but it is not our purpose here to discuss the analysis. Generation of random variates with constraints can be handled by the general methods discussed in the previous chapter. The use of acceptance/rejection is obvious; merely generate from the full distribution and reject any realizations outside of the acceptable region. Of course, choosing a majorizing density with no support in the truncated region is a better approach. Modification of the inverse CDF method to handle truncated distributions is simple. For a right truncation at τ of a distribution with CDF PY, for example, instead of the basic transformation (4.1), page 102, we use

    X = PY^{−1}(U PY(τ)),   (5.2)
where U is a random variable from U(0, 1). The method using a sequence of conditional distributions described on page 149 can often be modified easily to generate variates from truncated distributions. In some simple applications, the truncated distribution is simulated by a conditional uniform distribution, the range of which is the intersection of the full conditional range and the truncated range. See Damien and Walker (2001) for some examples. There are usually more efficient ways of generating variates from constrained distributions. We describe a few of the more common ones (which are invariably truncations) in the following sections.
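For instance, here is a minimal C sketch of (5.2) for the exponential(λ) distribution right-truncated at τ, whose inverse CDF is available in closed form; unif01() is an assumed U(0,1) generator.

    #include <math.h>

    extern double unif01(void);   /* assumed: U(0,1), endpoints excluded */

    /* A sketch of (5.2): exponential(lambda) right-truncated at tau, via
       X = P^{-1}(U * P(tau)) with P(y) = 1 - exp(-lambda*y). */
    double rexp_righttrunc(double lambda, double tau)
    {
        double ptau = 1.0 - exp(-lambda * tau);       /* P_Y(tau) */
        return -log(1.0 - unif01() * ptau) / lambda;  /* P_Y^{-1}(u P_Y(tau)) */
    }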
“Inverse” Distributions

In Bayesian applications, joint probability densities of interest often involve a product of the density of some well-known random variable and what might be considered the density of the multiplicative inverse of another well-known random variable. Common examples of this are the statistics used in studentization: the chi-squared and the Wishart. Many authors refer to the distribution of such a random variable as the “inverse distribution”; for example, an “inverse chi-squared distribution” is the distribution of X^{−1}, where X has a chi-squared distribution. Other distributions with this interpretation are the inverse gamma distribution and the inverse Wishart distribution. This interpretation of “inverse” is not the same as for that word in the inverse Gaussian distribution with density given in equation (5.30) on page 193. In the cases of the inverse gamma, chi-squared, and Wishart distributions, the method for generating random variates is the obvious one: generate from the regular distribution and then obtain the inverse.

Folded Symmetric Distributions

For symmetric distributions, a useful nonlinear transformation is the absolute value. The distribution of the absolute value is often called a “folded” distribution. The exponential distribution, for example, is the folded double exponential distribution (see page 176). The halfnormal distribution, which is the distribution of the absolute value of a normal random variable, is a folded normal distribution.

Mixture Distributions

In Chapter 4, we discussed general methods for generating random deviates by decomposing the density p(·) into a mixture of other densities,

    p(x) = Σ_i wi pi(x),   (5.3)
where Σ_i wi = 1. Mixture distributions are useful in their own right. It was noticed as early as the nineteenth century by Francis Galton and Karl Pearson that certain observational data correspond very well to a mixture of two normal distributions, whereas a single normal does not fit the data well at all. Often, a simple mixture distribution can be used to model outliers or aberrant observations. This kind of mixture, in which a substantial proportion of the total probability follows one distribution and a small proportion follows another distribution, is called a “contaminated distribution”. Mixture distributions are often used in robustness studies because the interest is in how well a standard procedure holds up when the data are subject to contamination by a different population or by incorrect measurements.
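Sampling from such a mixture is a matter of choosing a component with the mixing probabilities and then sampling from it. As a C sketch for a contaminated normal, (1 − ε)N(0, 1) + εN(0, 9): the contaminant scale of 3 is an illustrative choice, not one from the text, and unif01() and rnorm01() are assumed generators.

    extern double unif01(void);    /* assumed U(0,1) generator */
    extern double rnorm01(void);   /* assumed N(0,1) generator */

    /* A sketch: contaminated normal, N(0,1) with probability 1-eps and
       N(0, 9) (standard deviation 3) with probability eps. */
    double rcontam(double eps)
    {
        return (unif01() < eps) ? 3.0 * rnorm01() : rnorm01();
    }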
A very simple extension of a finite (or countable) mixture, as in equation (5.3), is one in which a parameter indexing the individual densities is used to weight the densities continuously. Let the individual densities be indexed continuously by θ; that is, the density corresponding to θ is p(·; θ). Now, let w(·) be a weight (density) associated with θ such that w(θ) ≥ 0 and ∫ w(θ) dθ = 1. Then, form the mixture density p(·) as

    p(x) = ∫ w(θ) p(x; θ) dθ.   (5.4)

An example of this kind of mixture is the beta-binomial distribution, the density of which is given in equation (5.18).

Probability-Skewed Distributions

A special type of mixture distribution is a probability-skewed distribution, in which the mixing weights are the values of a CDF. The skew-normal distribution is a good example. The (standard) skew-normal distribution has density

    g(x) = (2/√(2π)) e^{−x^2/2} Φ(λx)   for −∞ ≤ x ≤ ∞,   (5.5)

where Φ(·) is the standard normal CDF, and λ is a constant such that −∞ < λ < ∞. For λ = 0, the skew-normal distribution is the normal distribution, and in general, if |λ| is relatively small, the distribution is close to the normal. For larger |λ|, the distribution is more skewed, either positively or negatively. This distribution is an appropriate distribution for variables that would otherwise have a normal distribution but have been screened on the basis of a correlated normal random variable. See Arnold et al. (1993) for discussions. Other distributions symmetric about 0 can also be skewed by a CDF in this manner. The general form of the probability density is g(x) ∝ p(x)P(λx), where p(·) is the density of the underlying symmetric distribution, and P(·) is a CDF (not necessarily the corresponding one). The idea also extends to multivariate distributions. Arnold and Beaver (2000) discuss definitions and applications of such densities, specifically a skew-Cauchy density. In most cases, if |λ| is relatively small, generation of random variables from a probability-skewed symmetric distribution using an acceptance/rejection method with the underlying symmetric distribution as the majorizing density is entirely adequate. For larger values of |λ|, it is necessary to divide the support into two or more intervals. It is still generally possible to use the same majorizing density, but the multiplicative constant can be different in different intervals.
5.2 Some Specific Univariate Distributions
In this section, we consider several of the more common univariate distributions and indicate methods for simulating them. The methods discussed are generally
among the better ones, at least according to some criteria, but the discussion is not exhaustive. We give the details for some simpler algorithms, but in many cases the best algorithm involves many lines of a program with several constants that optimize a majorizing function or a squeeze function or the breakpoints of mixtures. We sometimes do not describe the best method in detail but rather refer the interested reader to the relevant literature. Devroye (1986a) has given a comprehensive treatment of methods for generating deviates from various distributions, and more information on many of the algorithms in this section can be found in that reference. The descriptions of the algorithms that we give indicate the computations, but if the reader develops a program from the algorithm, issues of computational efficiency should be considered. For example, in the descriptions, we do not identify the computations that should be removed from the main body of the algorithm and made part of some setup computations. Two variations of a distribution are often of interest. In one variation, the distribution is truncated. In this case, as we mentioned above, the range of the original distribution is restricted to a subrange and the probability measure adjusted accordingly. In another variation, the role of the random variable and the parameter of the distribution are interchanged. In some cases, these quantities have a natural association, and the corresponding distributions are said to be conjugate. An example of two such distributions are the binomial and the beta. What is a realization of a random variable in one distribution is a parameter in the other distribution. For many distributions, we may want to generate samples of a parameter, given realizations of the random variable (the data).
5.2.1 Normal Distribution
The normal distribution, which we denote by N(µ, σ^2), has the probability density

    p(x) = (1/(√(2π)σ)) e^{−(x−µ)^2/(2σ^2)}   for −∞ ≤ x ≤ ∞.   (5.6)

If Z ∼ N(0, 1) and X = σZ + µ, then X ∼ N(µ, σ^2). Because of this simple relationship, it is sufficient to develop methods to generate deviates from the standard normal distribution, N(0, 1), so there is no setup involved. All constants necessary in any algorithm can be precomputed and stored. There are several methods for transforming uniform random variates into normal random variates. One transformation not to use is:

1. Generate ui, for i = 1, . . . , 12, as i.i.d. U(0, 1).
2. Deliver x = Σ ui − 6.

This method is the Central Limit Theorem applied to a sample of size 12. Not only is the method approximate (and based on a poor approximation!), but it is also slower than better methods.
A simple method is the Box–Muller method arising from a polar transformation: If U1 and U2 are independently distributed as U(0, 1), and

    X1 = √(−2 log U1) cos(2πU2),
    X2 = √(−2 log U1) sin(2πU2),   (5.7)

then X1 and X2 are independently distributed as N(0, 1) (see Exercises 5.1a and 5.1b on page 213). The Box–Muller transformation is rather slow. It requires evaluation of one square root and two trigonometric functions for every two deviates generated. As noted by Neave (1973), if the uniform deviates used in the Box–Muller transformation are generated by a congruential generator with small multiplier, the resulting normals are deficient in the tails. Golder and Settle (1976) under similar conditions demonstrate that the density of the generated normal variates has a jagged shape, especially in the tails. Of course, if they had analyzed their small-multiplier congruential generator, they would have found that generator lacking. (See the discussion about Figure 1.3, page 16.) It is easy to see that the largest and smallest numbers generated by the Box–Muller transformation occur when a value of u1 from the uniform generator is close to 0. A bound on the absolute value of the numbers generated is √(−2 log e), where e is the smallest floating-point number that can be generated by the uniform generator. (In the “minimal standard” congruential generator and many similar generators, the smallest number is approximately 2^{−31}, so the bound on the absolute value is approximately 6.56.) How close the results of the transformation come to the bound depends on whether u2 is close to 0, 1/4, 1/2, 3/4, or 1 when u1 is close to 0. (In the “minimal standard” generator, when u1 is at the smallest value possible, cos(2πu2) is close to 1 because u2 is relatively close to 0. The maximum number, therefore, is very close to the upper bound of 6.56, which has a p-value of the same order of magnitude as the reciprocal of the period of the generator. On the other hand, when u1 is close to 0, the value of u2 is never close enough to 1/2 or 3/4 to yield a value of one of the trigonometric functions close to −1. The minimum value that results from the Box–Muller transformation, therefore, does not approach the lower bound of −6.56. The p-value of the minimum value is three to four orders of magnitude greater than the reciprocal of the period of the generator.) If the Box–Muller transformation is used with a congruential generator, especially one with a relatively small multiplier, the roles of u1 and u2 should be exchanged periodically. This ensures that the lower and upper bounds are approximately symmetric if the generator has full period. Tezuka (1991) shows that similar effects also are noticeable if a poor Tausworthe generator is used. These studies emphasize the importance of using a good uniform generator for whatever distribution is to be simulated. It is especially important to be wary of the effects of a poor uniform generator in algorithms that require more than one uniform deviate (see Section 4.3).
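A direct C rendering of the transformation (5.7), as a sketch only, with unif01() standing in for the underlying uniform generator, assumed to return values strictly inside (0, 1):

    #include <math.h>

    extern double unif01(void);   /* assumed: U(0,1), endpoints excluded */

    /* The Box-Muller transformation (5.7): two independent N(0,1) deviates
       from two independent U(0,1) deviates. */
    void box_muller(double *x1, double *x2)
    {
        const double two_pi = 6.283185307179586;
        double u1 = unif01(), u2 = unif01();
        double r = sqrt(-2.0 * log(u1));
        *x1 = r * cos(two_pi * u2);
        *x2 = r * sin(two_pi * u2);
    }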
Bratley, Fox, and Schrage (1987) show that normal variates generated by the Box–Muller transformation lie pairwise on spirals. The spirals are of exactly
the same origin as the lattice of the congruential generator itself, so a solution would be to use a better uniform generator. To alleviate potential problems of patterns in the output of a polar method such as the Box–Muller transformation, some authors have advocated that, for each pair of uniforms, only one of the resulting pair of normals be used. If there is any marginal gain in quality, it is generally not noticeable, especially if the roles of u1 and u2 are exchanged periodically as recommended. The Box–Muller transformation is one of several polar methods. All of them have similar properties, but the Box–Muller transformation generally requires slower computations. Although most currently available computing systems can evaluate the necessary trigonometric functions extremely rapidly, the Box–Muller transformation can often be performed more efficiently using an acceptance/rejection algorithm, as we indicated in the general discussion of acceptance/rejection methods (see Exercise 5.1d on page 213). The Box–Muller transformation is implemented via rejection in Algorithm 5.1.

Algorithm 5.1 A Rejection Polar Method for Normal Variates
1. Generate v1 and v2 independently from U(−1, 1), and set r^2 = v1^2 + v2^2.
2. If r^2 ≥ 1, then go to step 1; otherwise, deliver
   x1 = v1 √(−2 log r^2/r^2),
   x2 = v2 √(−2 log r^2/r^2).
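A C sketch of Algorithm 5.1, again with an assumed unif01(); the rejection loop replaces the trigonometric evaluations of (5.7), at the cost of rejecting about 1 − π/4 of the candidate pairs.

    #include <math.h>

    extern double unif01(void);   /* assumed: U(0,1), endpoints excluded */

    /* Algorithm 5.1: the rejection polar method; delivers two independent
       N(0,1) deviates per call. */
    void polar_normal(double *x1, double *x2)
    {
        double v1, v2, r2;
        do {                                   /* step 1 */
            v1 = 2.0 * unif01() - 1.0;         /* U(-1,1) */
            v2 = 2.0 * unif01() - 1.0;
            r2 = v1 * v1 + v2 * v2;
        } while (r2 >= 1.0 || r2 == 0.0);      /* step 2: reject outside circle
                                                  (and the degenerate origin) */
        double f = sqrt(-2.0 * log(r2) / r2);
        *x1 = v1 * f;
        *x2 = v2 * f;
    }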
Ahrens and Dieter (1988) describe fast polar methods for the Cauchy and exponential distributions in addition to the normal distribution. The fastest algorithms for generating normal deviates use either a ratio-of-uniforms method or a mixture with acceptance/rejection. One of the best algorithms, called the rectangle/wedge/tail method, is described by Marsaglia, MacLaren, and Bray (1964). In that method, the normal density is decomposed into a mixture of densities with shapes as shown in Figure 5.1. It is easy to generate a variate from one of the rectangular densities, so the decomposition is done to give a high probability of being able to use a rectangular density. That, of course, means lots of rectangles, which brings some inefficiencies. The optimal decomposition must address those tradeoffs. The wedges are nearly linear densities (see Algorithm 4.7), so generating from them is relatively fast. The tail region takes the longest time, so the decomposition is such as to give a small probability to the tail. Ahrens and Dieter (1972) give an implementation of the rectangle/wedge/tail method that can be optimized at the bit level. Kinderman and Ramage (1976) represent the normal density as a mixture and apply a variety of acceptance/rejection and table-lookup techniques for the components. The individual techniques for various regions have been developed by Marsaglia (1964), Marsaglia and Bray (1964), and Marsaglia, MacLaren, and Bray (1964).

Figure 5.1: Rectangle/Wedge/Tail Decomposition
Marsaglia and Tsang (1984) also give a decomposition, resulting in what they call the “ziggurat method”. Leva (1992a) describes a ratio-of-uniforms method with very tight bounding curves for generating normal deviates. (The 15-line Fortran program implementing Leva’s method is Algorithm 712 of CALGO; see Leva, 1992b.) Given the current speed of the standard methods of evaluating the inverse normal CDF, the inverse CDF method is often useful, especially if order statistics are of interest. Even with the speed of the standard algorithms for the inverse normal CDF, specialized versions, possibly to a slightly lower accuracy, have been suggested, for example, by Marsaglia (1991) and Marsaglia, Zaman, and Marsaglia (1994). (The latter reference gives two algorithms for inverting the normal CDF: one very accurate, and one faster but slightly less accurate.) Wallace (1996) describes an interesting method of generating normals from other normals rather than by making explicit transformations of uniforms. The method begins with a set of kp normal deviates generated by some standard method. The deviates are normalized so that their sum of squares is 1024. Let X be a k × p array containing those deviates, and let Ai be a k × k orthogonal matrix. New normal deviates are formed by multiplication of an orthogonal matrix and the columns of X. A random column from the array and a random method of moving from one column to another are chosen. In Wallace’s implementation, k is 4 and p is 256. Four orthogonal 4 × 4 matrices are chosen to
make the matrix/vector multiplication fast:

    [Display: the four orthogonal 4 × 4 matrices A1, A2, A3, and A4, each of the form one-half times a matrix with entries ±1.]
hence, the matrix multiplication is usually just the addition of two elements of the vector. After a random column of X is chosen (that is, a random integer between 1 and 256), a random odd number between 1 and 255 is chosen as a stride (that is, as an increment for the column number) to allow movement from one column to another. The first column chosen is multiplied by A1, the next by A2, the next by A3, the next by A4, and then the next by A1, and so on. The elements of the vectors that result from these multiplications constitute both the normal deviates output in this pass and the elements of a new k × p array. Except for rounding errors, the elements in the new array should have a sum of squares of 1024 also. Just to avoid any problems from rounding, however, the last element generated is not delivered as a normal deviate but instead is used to generate a chi-squared deviate, y, with 1024 degrees of freedom via a Wilson–Hilferty approximation, and the 1023 other values are normalized by √(y/1024). (The Wilson–Hilferty approximation relates the chi-squared random variable Y with ν degrees of freedom to the standard normal random variable X by
    X ≈ ((Y/ν)^{1/3} − (1 − 2/(9ν))) / √(2/(9ν)).
The approximation is fairly good for ν > 30. See Abramowitz and Stegun, 1964.)

Truncated Normal Distribution

In Monte Carlo studies, the tail behavior is often of interest. Variates from the tail of a distribution can always be formed by selecting variates generated from the full distribution, of course, but this can be a very slow process. Marsaglia (1964), Geweke (1991a), Robert (1995), and Damien and Walker (2001) give methods for generating variates directly from a truncated normal distribution. The truncated normal with left truncation point τ has density

    p(x) = e^{−(x−µ)^2/(2σ^2)} / (√(2π) σ (1 − Φ((τ − µ)/σ)))   for τ ≤ x ≤ ∞,

where Φ(·) is the standard normal CDF. The method of Robert uses an acceptance/rejection method with a translated exponential as the majorizing density; that is,

    g(y) = λ* e^{−λ*(y−τ)}   for τ ≤ y ≤ ∞,

where

    λ* = (τ + √(τ^2 + 4))/2.   (5.8)

(See the next section for methods to generate exponential random variates.) The method of Damien and Walker uses conditional distributions. The range of the conditional uniform that yields the normal is taken as the intersection of the truncated range and the full conditional range (−√(−2 log Y), √(−2 log Y)) in the example on page 149.
Lognormal and Halfnormal Distributions

Two distributions closely related to the normal are the lognormal and the halfnormal. The lognormal is the distribution of a random variable whose logarithm has a normal distribution. A very good way to generate lognormal variates is just to generate normal variates and exponentiate. The halfnormal is the folded normal distribution. The best way to generate deviates from the halfnormal is just to take the absolute value of normal deviates.
5.2.2 Exponential, Double Exponential, and Exponential Power Distributions
The exponential distribution with parameter λ > 0 has the probability density

    p(x) = λ e^{−λx}   for 0 ≤ x ≤ ∞.   (5.9)
If Z has the standard exponential distribution (that is, with parameter equal to 1), and X = Z/λ, then X has the exponential distribution with parameter λ (called the “rate”). Because of this simple relationship, it is sufficient to develop methods to generate deviates from the standard exponential distribution. The exponential distribution is a special case of the gamma distribution, the density of which is given in equation (5.13). The parameters of the gamma distribution are α = 1 and β = 1/λ. The inverse CDF method is very easy to implement and is generally satisfactory for the exponential distribution. The method is to generate u from U(0, 1) and then take

    x = −log(u)/λ.   (5.10)

(This and similar computations are why we require that the simulated uniform not include its endpoints.)
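In C, (5.10) is a one-line generator (unif01() assumed as before):

    #include <math.h>

    extern double unif01(void);   /* assumed: U(0,1), endpoints excluded */

    /* Inverse CDF method (5.10) for the exponential with rate lambda. */
    double rexp(double lambda)
    {
        return -log(unif01()) / lambda;
    }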
Many other algorithms for generating exponential random numbers have been proposed over the years. Marsaglia, MacLaren, and Bray (1964) apply
the rectangle/wedge/tail method to the exponential distribution. Ahrens and Dieter (1972) give a method that can be highly optimized at the bit level. Ahrens and Dieter also provide a catalog of other methods for generating exponentials. These other algorithms seek greater speed by avoiding the computation of the logarithm. Many simple algorithms for random number generation involve evaluation of elementary functions. As we have indicated, evaluation of an elementary function at a random point can often be performed equivalently by acceptance/rejection, and Ahrens and Dieter (1988) describe a method for the exponential that does that. (See Hamilton, 1998, for some corrections to their algorithm.) As the software for evaluating elementary functions has become faster, the need to avoid their evaluation has decreased. A common use of the exponential distribution is as the model of the interarrival times in a Poisson process. A (homogeneous) Poisson process, T1 < T2 < . . . , with rate parameter λ can be generated by taking the output of an exponential random number generator with parameter λ as the times, t1, t2 − t1, . . . . We consider nonhomogeneous Poisson processes in Section 6.5.2, page 225.

Truncated Exponential Distribution

The interarrival process is memoryless, and the tail of the exponential distribution has an exponential distribution; that is, if X has the density (5.9), and Y = X + τ, then Y has the density

    λ e^{λτ} e^{−λy}   for τ ≤ y ≤ ∞.
This fact provides a very simple process for generating from the tail of an exponential distribution.

Double Exponential Distribution

The double exponential distribution, also called the Laplace distribution, with parameter λ > 0 has the probability density

    p(x) = (λ/2) e^{−λ|x|}   for −∞ ≤ x ≤ ∞.   (5.11)
The double exponential distribution is often used in Monte Carlo studies of robust procedures because it has a heavier tail than the normal distribution yet corresponds well with observed distributions. If Z has the standard exponential distribution and X = SZ/λ, where S is a random variable with probability mass 1/2 at −1 and at +1, then X has the double exponential distribution with parameter λ. This fact is the basis for the usual method of generating double exponential variates: generate an exponential, and change the sign with probability 1/2.
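A direct R sketch of this method, using a uniform for the sign rather than bit stripping:

lambda <- 1.5; n <- 1000                # illustrative values
s <- ifelse(runif(n) < 0.5, -1, 1)      # random signs S
x <- s * rexp(n) / lambda               # double exponential deviates X = SZ/lambda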
The method of bit stripping (see page 10) can be used to generate the random sign as long as the lower-order bits are the ones used and assuming that the basic uniform generator is a very good one.

Exponential Power Distribution

A generalization of the double exponential distribution is the exponential power distribution, having density

p(x) ∝ e^{−λ|x|^α} for −∞ ≤ x ≤ ∞. (5.12)
For α = 2, the exponential power distribution is the normal distribution. The members of this family with 1 ≤ α < 2 are often used to model distributions with slightly heavier tails than the normal distribution. Either the double exponential or the normal distribution, depending on the value of α, works well as a majorizing density to generate exponential power variates by acceptance/rejection (see Tadikamalla, 1980a).
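As an illustration of the idea, here is a minimal R sketch for the standardized case λ = 1 with the double exponential as majorizer; this is not Tadikamalla's optimized algorithm, the function name is ours, and for general λ the output would be scaled by λ^{−1/α}:

rexppow <- function(n, alpha) {
  # sup of exp(-|y|^alpha)/exp(-|y|) occurs at y* = alpha^(-1/(alpha-1));
  # logc is the log of the resulting majorizing constant
  ystar <- if (alpha == 1) 0 else alpha^(-1/(alpha - 1))
  logc <- ystar - ystar^alpha
  out <- numeric(n)
  for (i in 1:n) {
    repeat {
      y <- ifelse(runif(1) < 0.5, -1, 1) * rexp(1)   # double exponential proposal
      if (log(runif(1)) <= abs(y) - abs(y)^alpha - logc) { out[i] <- y; break }
    }
  }
  out
}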
5.2.3
Gamma Distribution
The gamma distribution with parameters α > 0 and β > 0 has the probability density

p(x) = [1/(Γ(α) β^α)] x^{α−1} e^{−x/β} for 0 ≤ x ≤ ∞, (5.13)

where Γ(α) is the complete gamma function. The α parameter is called the shape parameter, and β is called the scale parameter. If the random variable Z has the standard gamma distribution with shape parameter α and scale parameter 1, and X = βZ, then X has a gamma distribution with parameters α and β. (Notice that the exponential is a gamma with α = 1 and β = 1/λ.) Of the special distributions that we have considered thus far, this is the first one that has a parameter that cannot be handled by simple translations and scalings. Hence, the best algorithms for the gamma distribution may be different depending on the value of α and on how many deviates are to be generated for a given value of α. Cheng and Feast (1979) use a ratio-of-uniforms method, as shown in Algorithm 5.2, for a gamma distribution with α > 1. The mean time of this algorithm is O(α^{1/2}), so for larger values of α it is less efficient. Cheng and Feast (1980) also gave an acceptance/rejection method that was better for large values of the shape parameter. Schmeiser and Lal (1980) use a composition of ten densities, some of the rectangle/wedge/tail type, followed by the acceptance/rejection method. The Schmeiser/Lal method is the algorithm used in the IMSL Libraries for values of the shape parameter greater than 1. The speed of the Schmeiser/Lal method does not depend on the value of the shape parameter. Sarkar (1996) gives a modification of the Schmeiser/Lal method that
has greater efficiency because of using more intervals, resulting in tighter majorizing and squeeze functions, and because of using an alias method to help speed the process.

Algorithm 5.2 The Cheng/Feast (1979) Algorithm for Generating Gamma Random Variates when the Shape Parameter is Greater than 1

1. Generate u1 and u2 independently from U(0, 1), and set

   v = ((α − 1/(6α)) u1) / ((α − 1) u2).

2. If 2(u2 − 1)/(α − 1) + v + 1/v ≤ 2, then deliver x = (α − 1)v;
   otherwise, if 2 log(u2)/(α − 1) − log(v) + v ≤ 1, then deliver x = (α − 1)v.

3. Go to step 1.
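A direct R transcription of Algorithm 5.2 for a single deviate (α > 1 assumed; the function name is ours):

rgammacf <- function(alpha) {
  repeat {
    u1 <- runif(1); u2 <- runif(1)
    v <- (alpha - 1/(6*alpha)) * u1 / ((alpha - 1) * u2)
    if (2*(u2 - 1)/(alpha - 1) + v + 1/v <= 2 ||     # squeeze test
        2*log(u2)/(alpha - 1) - log(v) + v <= 1)     # acceptance test
      return((alpha - 1) * v)
  }
}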
An efficient algorithm for values of the shape parameter less than 1 is the acceptance/rejection method described in Ahrens and Dieter (1974) and modified by Best (1983), as shown in Algorithm 5.3. That method is the algorithm used in the IMSL Libraries for values of the shape parameter less than 1.

Algorithm 5.3 The Best/Ahrens/Dieter Algorithm for Generating Gamma Random Variates when the Shape Parameter Is Less than 1

0. Set t = 0.07 + 0.75 √(1 − α) and b = 1 + α e^{−t}/t.

1. Generate u1 and u2 independently from U(0, 1), and set v = b u1.

2. If v ≤ 1, then
   set x = t v^{1/α};
   if u2 ≤ (2 − x)/(2 + x), then deliver x;
   otherwise, if u2 ≤ e^{−x}, then deliver x;
   otherwise,
   set x = −log(t(b − v)/α) and y = x/t;
   if u2(α + y(1 − α)) ≤ 1, then deliver x;
   otherwise, if u2 ≤ y^{α−1}, then deliver x.

3. Go to step 1.
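Again in R, a direct transcription of Algorithm 5.3 for a single deviate (0 < α < 1; the function name is ours):

rgammabad <- function(alpha) {
  t <- 0.07 + 0.75 * sqrt(1 - alpha)
  b <- 1 + alpha * exp(-t) / t
  repeat {
    u1 <- runif(1); u2 <- runif(1)
    v <- b * u1
    if (v <= 1) {
      x <- t * v^(1/alpha)
      if (u2 <= (2 - x)/(2 + x) || u2 <= exp(-x)) return(x)
    } else {
      x <- -log(t * (b - v) / alpha)
      y <- x / t
      if (u2 * (alpha + y*(1 - alpha)) <= 1 || u2 <= y^(alpha - 1)) return(x)
    }
  }
}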
There are two cases of the gamma distribution that are of particular interest. The shape parameter α often is a positive integer. In that case, the distribution is sometimes called the Erlang distribution. If Y1, Y2, . . . , Yα are independently distributed as exponentials with parameter 1/β, then X = Σ Yi has a gamma (Erlang) distribution with parameters α and β. Using the inverse CDF method (equation (5.10)) with the independent realizations u1, u2, . . . , uα, we generate an Erlang deviate as

x = −β log( ∏_{i=1}^{α} u_i ).
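In R, for an integer shape α and scale β:

alpha <- 5; beta <- 2                   # illustrative values
x <- -beta * log(prod(runif(alpha)))    # one Erlang deviate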
The general algorithms for gammas work better for the Erlang distribution if α is large. The other special case of the gamma is the chi-squared distribution, in which the scale parameter β is 2. Twice the shape parameter, 2α, is called the degrees of freedom. For large or nonintegral degrees of freedom, the general methods for generating gamma random deviates are best for generating chi-squared deviates; otherwise, special methods described below are used.

An important property of the gamma distribution is: If X and Y are independent gamma variates with common scale parameter β and shape parameters α1 and α2, then X + Y has a gamma distribution with scale parameter β and shape parameter α1 + α2. This fact may be used in developing schemes for generating gammas. For example, any gamma can be represented as the sum of an Erlang variate, which is the sum of exponential variates, and a gamma variate with shape parameter less than 1. This representation may effectively be used in a method of generating gamma variates (see Atkinson and Pearce, 1976).

Truncated Gamma Distribution

In some applications, especially ones involving censored observations, a truncated gamma is a useful model. In the case of left-censored data, we need to sample from the tail of a gamma distribution. The relevant distribution has the density

p(x) = [1/((Γ(α) − Γ_{τ/β}(α)) β^α)] x^{α−1} e^{−x/β} for τ ≤ x ≤ ∞,
where Γ_{τ/β}(·) is the incomplete gamma function (see page 321). Dagpunar (1978) describes a method of sampling from the tail of a gamma distribution. The method makes use of the fact mentioned above that an exponential distribution is memoryless. A truncated exponential is used as a majorizing density in an acceptance/rejection method. Dagpunar first determines the optimal value of the exponential scale parameter that will maximize the probability of acceptance. The value is the saddlepoint in the ratio of the truncated gamma density to the truncated exponential (both truncated at the same point, τ),

λ = (τ − α + √((τ − α)² + 4τ)) / (2τ).

The procedure therefore is:

1. Generate y from the truncated exponential and u independently as U(0, 1). (y can be generated by generating u1 as U(0, 1) and taking y = −(log u1)/λ + τ.)

2. If (1 − λ)y − (α − 1)(1 + log y + log((1 − λ)/(α − 1))) ≤ −log u, then deliver y.
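An R sketch of this procedure for a standard (β = 1) gamma truncated below at τ, assuming α > 1 so that 0 < λ < 1 (the function name is ours; for general β, apply the method with τ/β and multiply the result by β):

rgammatail <- function(alpha, tau) {
  lam <- (tau - alpha + sqrt((tau - alpha)^2 + 4*tau)) / (2*tau)
  repeat {
    y <- tau - log(runif(1)) / lam      # truncated exponential majorizer
    u <- runif(1)
    if ((1 - lam)*y -
        (alpha - 1)*(1 + log(y) + log((1 - lam)/(alpha - 1))) <= -log(u))
      return(y)
  }
}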
Many common applications require truncation on the right; that is, the observations are right censored. Philippe (1997) describes a method for generating variates from a right-truncated gamma distribution, which has density

p(x) = [1/(Γ_{τ/β}(α) β^α)] x^{α−1} e^{−x/β} for 0 ≤ x ≤ τ,
where Γ_{τ/β}(α) is the incomplete gamma function. Philippe shows that if the random variable X has this distribution, then it can be represented as an infinite mixture of beta random variables:

X = Σ_{k=1}^{∞} [Γ(α) / (Γ(α + k) Γ_{1/β}(α) β^{α+k−1} e^{1/β})] Y_k,
where Y_k is a beta random variable with parameters α and k. Philippe suggested as a majorizing density the finite series

g_m(y) ∝ Σ_{k=1}^{m} [1/(β^{k−1} Γ(α) Γ(k))] y^{α−1} (1 − y)^{k−1}
       = [1 / Σ_{i=1}^{m} 1/(β^{i−1} Γ(α + i))] Σ_{k=1}^{m} [1/(β^{k−1} Γ(α + k))] h_k(y),
where h_k is a beta density (equation (5.14)) with parameters α and k. Thus, to generate a variate from a distribution with density g_m, we select one of the component betas with probability equal to its weight and then use a method described in the next section for generating a beta variate. The number of terms depends on the probability of acceptance. Obviously, we want a high probability of acceptance,
but this requires a large number of terms in the series. For a probability of acceptance of at least p* (with p* < 1, obviously), Philippe shows that the number of terms required in the series is approximately

m* = (1/4) (z_p + √(z_p² + 4/β))²,
where z_p = Φ^{−1}(p) and Φ is the standard normal CDF.

Algorithm 5.4 The Philippe (1997) Algorithm for Generating Gamma Random Variates Truncated on the Right at τ

0. Determine m*, and initialize quantities in g_{m*}.

1. Generate y from the distribution with density g_{m*}.

2. Generate u from U(0, 1).

3. If

   u ≤ e^{−y/β} [Σ_{k=1}^{m*} 1/(β^{k−1} Γ(k))] / [Σ_{k=1}^{m*} (1 − y)^{k−1}/(β^{k−1} Γ(k))],
then take y as the desired realization; otherwise, return to step 1.

Philippe (1997) also describes methods for a left-truncated gamma distribution, including special algorithms for the case where the truncation point is an integer. The interested reader is referred to the paper for the details. Damien and Walker (2001) also give a method for generating variates directly from a truncated gamma distribution. Their method uses conditional distributions, as we discuss on page 149. The range of the conditional uniform that yields the gamma is taken as the intersection of the truncated range and the full conditional range.

Generalized Gamma Distributions

There are a number of generalizations of the gamma distribution. The generalizations provide more flexibility in modeling because they have more parameters. Stacy (1962) defined a generalization that has two shape parameters. It is especially useful in failure-time models. The distribution has density

p(x) = [|γ| / (Γ(α) β^{αγ})] x^{αγ−1} e^{−(x/β)^γ} for 0 ≤ x ≤ ∞.
This distribution includes as special cases the ordinary gamma (with γ = 1), the halfnormal distribution (with α = 12 and γ = 2), and the Weibull (with
α = 1). The best way to generate a deviate from the generalized gamma distribution is to use the best method for the corresponding standard gamma and then apply the power transformation x = β z^{1/γ}. Everitt (1998) describes a generalized gamma distribution, which he calls the "Creedy and Martin generalized gamma", with density

p(x) = θ0 x^{θ1} e^{θ2 x + θ3 x² + θ4 x³} for 0 ≤ x ≤ ∞.
This density can of course be scaled with a β, as in the other gamma distributions that we have discussed. Ghitany (1998) and Agarwal and Al-Saleh (2001) have described a generalized gamma distribution based on a generalized gamma function,

Γ(α, ν, λ) = ∫_0^∞ t^{α−1} e^{−t} / (t + ν)^λ dt,

introduced by Kobayashi (1991). The distribution has density

p(x) = [1 / (Γ(α, ν, λ) β^{α−λ})] x^{α−1} e^{−x/β} / (x + βν)^λ for 0 ≤ x ≤ ∞.
This distribution is useful in reliability studies because of the shapes of the hazard function that are possible for various values of the parameters. It is the same as the ordinary gamma for λ = 0. D-Distributions A class of distributions, called D-distributions, that arise in extended gamma processes is studied by Laud, Ramgopal, and Smith (1993). The interested reader is referred to that paper for the details.
5.2.4
Beta Distribution
The beta distribution with parameters α > 0 and β > 0 has the probability density

p(x) = [1/B(α, β)] x^{α−1} (1 − x)^{β−1} for 0 ≤ x ≤ 1, (5.14)

where B(α, β) is the complete beta function. Efficient methods for generating beta variates require different algorithms for different values of the parameters. If either parameter is equal to 1, it is very simple to generate beta variates using the inverse CDF method, which in this case would just be a root of a uniform. If both values of the parameters are less than 1, the simple acceptance/rejection method of Jöhnk (1964), given as Algorithm 5.5, is one of the best. If one parameter is less than 1 and the other is greater than 1, the method of Atkinson (1979) is useful. If both parameters are greater than 1, the method of Schmeiser and Babu (1980) is very efficient, except that it requires a lot of setup time. For the case of both parameters
greater than 1, Cheng (1978) gives an algorithm that requires very little setup time. The IMSL Libraries use all five of these methods, depending on the values of the parameters and how many deviates are to be generated for a given setting of the parameters.

Algorithm 5.5 Jöhnk's Algorithm for Generating Beta Random Variates when Both Parameters are Less than 1

1. Generate u1 and u2 independently from U(0, 1), and set v1 = u1^{1/α} and v2 = u2^{1/β}.

2. Set w = v1 + v2.

3. If w > 1, then go to step 1.

4. Set x = v1/w, and deliver x.
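A direct R transcription of Algorithm 5.5 for a single deviate (the function name is ours):

rbetajohnk <- function(alpha, beta) {
  repeat {
    v1 <- runif(1)^(1/alpha)
    v2 <- runif(1)^(1/beta)
    w <- v1 + v2
    if (w <= 1) return(v1 / w)
  }
}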
5.2.5
Chi-Squared, Student’s t, and F Distributions
The chi-squared, Student's t, and F distributions all are derived from the normal distribution. Variates from these distributions could, of course, be generated by transforming normal deviates. In the case of the chi-squared distribution, however, this would require generating and squaring several normals for each chi-squared deviate. A more direct algorithm is much more efficient. Even in the case of the t and F distributions, which would require only a couple of normals or chi-squared deviates, there are better algorithms.

Chi-Squared Distribution

The chi-squared distribution, as we have mentioned above, is a special case of the gamma distribution (see equation (5.13)) in which the scale parameter, β, is 2. Twice the shape parameter, 2α, is called the degrees of freedom and is often denoted by ν. If ν is large or is not an integer, the general methods for generating gamma random deviates described above are best for generating chi-squared deviates. If the degrees of freedom value is a small integer, the chi-squared deviates can be generated by taking a logarithm of the product of some independent uniforms. If ν is an even integer, the chi-squared deviate r is produced from ν/2 independent uniforms, ui, by

r = −2 log( ∏_{i=1}^{ν/2} u_i ).
If ν is an odd integer, this method can be used with the product going to (ν − 1)/2, and then the square of an independent normal deviate is added to produce r.
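In R, for a small integer ν (a sketch; the function name is ours):

rchisqsmall <- function(nu) {
  m <- nu %/% 2
  r <- if (m > 0) -2 * log(prod(runif(m))) else 0
  if (nu %% 2 == 1) r <- r + rnorm(1)^2   # odd df: add a squared normal
  r
}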
The square root of a chi-squared random variable is sometimes called a chi random variable. Although, clearly, a chi random variable could be generated as the square root of a chi-squared deviate generated as above, there are more efficient direct ways of generating a chi deviate; see Monahan (1987).

Student's t Distribution

The standard t distribution with ν degrees of freedom has density
p(x) = [Γ((ν + 1)/2) / (Γ(ν/2) √(νπ))] (1 + x²/ν)^{−(ν+1)/2} for −∞ ≤ x ≤ ∞. (5.15)
The degrees of freedom, ν, does not have to be an integer, but it must be positive. If Z is a standard normal random variable and W is an independent chi-squared random variable with ν degrees of freedom, then Z/√(W/ν) is a t random variable with ν degrees of freedom. Also, the square root of an F random variable with 1 and ν degrees of freedom has the distribution of the absolute value of a t random variable with ν degrees of freedom. These relations could be used to generate t deviates, but neither yields an efficient method. Kinderman and Monahan (1980) describe a ratio-of-uniforms method for the t distribution. The algorithm is rather complicated, but it is very efficient. Marsaglia (1980) gives a simpler procedure that is almost as fast. Either is almost twice as fast as generating a normal deviate and a chi-squared deviate and dividing by the square root of the chi-squared one. Marsaglia (1984) also gives a very fast algorithm for generating t variates that is based on a transformed acceptance/rejection method that he called exact-approximation (see Section 4.5). Bailey (1994) gives the polar method shown in Algorithm 5.6 for the Student's t distribution. It is similar to the polar method for normal variates given in Algorithm 5.1.

Algorithm 5.6 A Rejection Polar Method for t Variates with ν Degrees of Freedom

1. Generate v1 and v2 independently from U(−1, 1), and set r² = v1² + v2².

2. If r² ≥ 1, then go to step 1; otherwise, deliver

   x = v1 √(ν(r^{−4/ν} − 1)/r²).
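A direct R transcription of Algorithm 5.6 for a single deviate (the function name is ours; note that r^{−4/ν} = (r²)^{−2/ν}):

rtpolar <- function(nu) {
  repeat {
    v1 <- runif(1, -1, 1); v2 <- runif(1, -1, 1)
    r2 <- v1^2 + v2^2
    if (r2 < 1) return(v1 * sqrt(nu * (r2^(-2/nu) - 1) / r2))
  }
}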
The jagged shape of the frequency curve of normals generated via a polar method based on a poor uniform generator that was observed by Neave (1973) and by Golder and Settle (1976) may also occur for t variates generated by this polar method. It is important to use a good uniform generator for whatever distribution is to be simulated.
In Bayesian analysis, it is sometimes necessary to generate random variates for the degrees of freedom in a t distribution conditional on the data. In the hierarchical model underlying the analysis, the t random variable is interpreted as a mixture of normal random variables divided by square roots of gamma random variables. For given realizations of gammas, λ1, λ2, . . . , λn, the density of the degrees of freedom ν is

p(ν) ∝ ∏_{i=1}^{n} [ν^{ν/2} λ_i^{ν/2} e^{−νλ_i/2} / (2^{ν/2} Γ(ν/2))].
Mendoza-Blanco and Tu (1997) show that three different gamma distributions can be used to approximate this density very well for three different ranges of values of λ_g e^{−λ_a}, where λ_g is the geometric mean of the λ_i and λ_a is the arithmetic mean. Although the approximations are very good, the gamma approximations could also be used as majorizing densities.

F Distribution

A variate from the F distribution can be generated as the ratio of two chi-squared deviates, which, of course, would be only half as fast as generating a chi-squared deviate. A better way to generate an F variate is as a transformed beta. If X is distributed as a beta with parameters ν1/2 and ν2/2, and

Y = (ν2 X) / (ν1 (1 − X)),
then Y has an F distribution with ν1 and ν2 degrees of freedom. Generating a beta deviate and transforming it usually takes only slightly longer than generating a single chi-squared deviate.
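In R, using the built-in beta generator (the function name is ours):

rfbeta <- function(n, nu1, nu2) {
  x <- rbeta(n, nu1/2, nu2/2)
  (nu2 * x) / (nu1 * (1 - x))
}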
5.2.6
Weibull Distribution
The Weibull distribution with parameters α > 0 and β > 0 has the probability density

p(x) = (α/β) x^{α−1} e^{−x^α/β} for 0 ≤ x ≤ ∞. (5.16)

The simple inverse CDF method applied to the standard Weibull distribution (i.e., β = 1) is quite efficient. The expression is simply

x = (−log u)^{1/α}.

Of course, an acceptance/rejection method could be used to replace the evaluation of the logarithm in the inverse CDF. The standard Weibull deviates are then scaled by β^{1/α}.
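In R notation, in the parameterization of equation (5.16) (a sketch; the function name is ours):

rweib <- function(n, alpha, beta) {
  beta^(1/alpha) * (-log(runif(n)))^(1/alpha)
}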
5.2.7
Binomial Distribution
The probability function for the binomial distribution with parameters n and π is

p(x) = [n!/(x!(n − x)!)] π^x (1 − π)^{n−x} for x = 0, 1, . . . , n, (5.17)

where n is a positive integer and 0 < π < 1. To generate a binomial, a simple way is to sum Bernoullis (equation (4.4) and Algorithm 4.1, page 105), which is equivalent to an inverse CDF technique. If n, the number of independent Bernoullis, is small, this method is adequate. The time required for this kind of algorithm is obviously O(n). For larger values of n, the median of a random sample of size n from a Bernoulli distribution can be generated (it has an approximate beta distribution; see Relles, 1972), and then the inverse CDF method can be applied from that point. Starting at the median allows the time required to be halved. Kemp (1986) shows that starting at the mode results in an even faster method and gives a method to approximate the modal probability quickly. If this idea is applied recursively, the time becomes O(log n). The time required for any method based on the CDF of the binomial is an increasing function of n. Several methods whose efficiencies are not so dependent on n are available, and for large values of n they are to be preferred to methods based on the CDF. (The value of π also affects the speed; the inverse CDF methods are generally competitive as long as nπ < 500.) Stadlober (1991) described an algorithm based on a ratio-of-uniforms method. Kachitvichyanukul (1982) gives an efficient method using acceptance/rejection over a composition of four regions (see Schmeiser, 1983; and Kachitvichyanukul and Schmeiser, 1988a, 1990). This is the method used in the IMSL Libraries.

Beta-Binomial Distribution

The beta-binomial distribution is the mixture distribution that is a binomial for which the parameter π is a realization of a random variable that has a beta distribution. This distribution is useful for modeling overdispersion or "extravariation" in applications where there are clusters of separate binomial distributions. The probability function for the beta-binomial distribution with parameters n (a positive integer), α > 0, and β > 0 is

p(x) = [n!/(x!(n − x)! B(α, β))] ∫_0^1 π^{α−1+x} (1 − π)^{n+β−1−x} dπ for x = 0, 1, . . . , n, (5.18)

where B(α, β) is the complete beta function. (The integral in this expression is B(α + x, n + β − x).)
The mean of the beta-binomial is in the form of the binomial mean, nπ, with the beta mean, α/(α + β), in place of π, but the variance is

nαβ(n + α + β) / ((α + β)²(1 + α + β)).

A simple way of generating deviates from a beta-binomial distribution is first to generate the parameter π as the appropriate beta and then to generate the binomial (see Ahn and Chen, 1995). In this case, an inverse CDF method for the binomial may be more efficient because it does not require as much setup time as the generally more efficient ratio-of-uniforms or acceptance/rejection methods referred to above.
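In R, the mixing takes one line (a sketch using the built-in beta and binomial generators; the function name is ours):

rbetabinom <- function(m, n, alpha, beta) {
  rbinom(m, n, rbeta(m, alpha, beta))    # one beta p for each binomial draw
}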
5.2.8
Poisson Distribution
The probability function for the Poisson distribution with parameter θ > 0 is

p(x) = e^{−θ} θ^x / x! for x = 0, 1, 2, . . . . (5.19)
A Poisson with a small mean, θ, can be generated efficiently by the inverse CDF technique. Kemp and Kemp (1991) describe a method that begins at the mode of the distribution and proceeds in the appropriate direction to identify the inverse. They give a method for identifying the mode and computing the modal probability. Many of the other methods that have been suggested for the Poisson also require longer times for distributions with larger means. Ahrens and Dieter (1980) and Schmeiser and Kachitvichyanukul give efficient methods having times that do not depend on the mean (see Schmeiser, 1983). The method of Schmeiser and Kachitvichyanukul uses acceptance/rejection over a composition of four regions. This is the method used in the IMSL Libraries.
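For small θ, a chop-down inverse CDF sketch in R (not the mode-first method of Kemp and Kemp; the function name is ours):

rpoischop <- function(theta) {
  u <- runif(1)
  x <- 0
  p <- exp(-theta)           # Pr(X = 0)
  repeat {
    if (u <= p) return(x)
    u <- u - p               # chop down
    x <- x + 1
    p <- p * theta / x       # recursive ratio p(x+1) = p(x) theta/(x+1)
  }
}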
5.2.9
Negative Binomial and Geometric Distributions
The probability function for the negative binomial is

p(x) = \binom{x + r − 1}{r − 1} π^r (1 − π)^x for x = 0, 1, 2, . . . , (5.20)
where r > 0 and 0 < π < 1. If r is an integer, the negative binomial distribution is sometimes called the Pascal distribution. If π is the probability of a success in a single Bernoulli trial, the random variable can be thought of as the number of failures before r successes are obtained. If rπ/(1 − π) is relatively small and (1 − π)r is not too small, the inverse CDF method works well. Otherwise, a gamma (r, π/(1 − π)) can be generated and used as the parameter to generate a Poisson. The Poisson variate is then delivered as the negative binomial variate.
The geometric distribution is a special case of the negative binomial with r = 1. The probability function is

p(x) = π(1 − π)^x for x = 0, 1, 2, . . . . (5.21)
The integer part of an exponential random variable with parameter λ = −log(1 − π) has a geometric distribution with parameter π; hence, the simplest, and also one of the best, methods for the geometric distribution with parameter π is to generate a uniform deviate u and take

⌊log u / log(1 − π)⌋.

It is common to see the negative binomial and the geometric distributions defined as starting at 1 instead of 0, as above. The distributions are the same after making an adjustment of subtracting 1.
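In R (vectorized; the function name is ours):

rgeo <- function(n, p) floor(log(runif(n)) / log(1 - p))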
5.2.10
Hypergeometric Distribution
The probability function for the hypergeometric distribution is

p(x) = \binom{M}{x} \binom{L − M}{N − x} / \binom{L}{N} (5.22)
for x = max(0, N − L + M), . . . , min(N, M). The usual method of developing the hypergeometric distribution is with a finite sampling model: N items are to be sampled, with equal probability and without replacement, from a lot of L items of which M are special; the random variable X is the number of special items in the random sample. A good method for generating from the hypergeometric distribution is the inverse CDF method. The inverse CDF can be evaluated recursively using the simple expression for the ratio p(x + 1)/p(x), so the build-up search of Algorithm 4.4 or the chop-down search of Algorithm 4.5 could be used. In either case, beginning at the mean, MN/L, can speed up the search. Another simple method that is good is straightforward use of the finite sampling model that defines the distribution. Kachitvichyanukul and Schmeiser (1985) give an algorithm based on acceptance/rejection of a probability function decomposed as a mixture, and Stadlober (1990) describes an algorithm based on a ratio-of-uniforms method. Both of these can be faster than the inverse CDF for larger values of N and M. Kachitvichyanukul and Schmeiser (1988b) give a Fortran program for sampling from the hypergeometric distribution. The program uses either the inverse CDF or the acceptance/rejection method depending on the mode,

m = ⌊(N + 1)(M + 1)/(L + 2)⌋.
If m − max(0, N + M − L) < 10, then the inverse CDF method is used; otherwise, the composition/acceptance/rejection method is used.

Extended Hypergeometric Distribution

A related distribution, called the extended hypergeometric distribution, can be developed by assuming that X and Y = N − X are binomial random variables with parameters M and πX and L − M and πY, respectively. Let ρ be the odds ratio,

ρ = (πX(1 − πY)) / (πY(1 − πX));

then, the conditional distribution of X given X + Y = N has probability mass function

p(x | x + y = N) = \binom{M}{x} \binom{L − M}{N − x} ρ^x / Σ_{j=a}^{b} \binom{M}{j} \binom{L − M}{N − j} ρ^j (5.23)

for x = a, . . . , b, where a = max(0, N − L + M) and b = min(N, M). This function can also be evaluated recursively and random numbers generated by the inverse CDF method, similarly to the hypergeometric distribution. Liao and Rosen (2001) describe methods for evaluating the probability mass functions and also for computing the mode of the distribution in order to speed up the evaluation of the inverse CDF. Another generalization, called the noncentral hypergeometric distribution, is developed by allowing different probabilities of selecting the two types of items. If the relative probability of selecting an item of the special type to that of selecting an item of the other type (given an equal number of each type) is ω, the realization of X can be built sequentially by Bernoulli realizations with probability Mk/(Mk + ω(Lk − Mk)), where Mk is the number of special items remaining. Variates from this distribution can be generated by the finite sampling model underlying the distribution.
5.2.11
Logarithmic Distribution
The probability function for the logarithmic distribution with parameter θ is

p(x) = −θ^x / (x log(1 − θ)) for x = 1, 2, 3, . . . , (5.24)
where 0 < θ < 1. Kemp (1981) describes a method for generating efficiently from the inverse logarithmic CDF either using a chop-down approach (see page 108) to move
rapidly down the set of CDF values or using a mixture in which highly likely values are given priority.
5.2.12
Other Specific Univariate Distributions
Many other interesting distributions have simple relationships to the standard distributions discussed above. When that is the case, because there are highly optimized methods for the standard distributions, it is often best just to use a very good method for the standard distribution and then apply the appropriate transformation. For some distributions, the inverse CDF method is almost as good as more complicated methods. Cauchy Distribution Variates from the Cauchy or Lorentzian distribution, which has density p(x) =
p(x) = 1 / (πa(1 + ((x − b)/a)²)) for −∞ ≤ x ≤ ∞, (5.25)
can be generated easily by the inverse CDF method. For the standard Cauchy distribution (that is, with a = 1 and b = 0), given u from U(0, 1), a Cauchy is tan(πu). The tangent function in the inverse CDF could be evaluated by acceptance/rejection in the manner mentioned on page 121, but if the inverse CDF is to be used, it is probably better just to use a good numerical function to evaluate the tangent. Kronmal and Peterson (1981) express the Cauchy distribution as a mixture and give an acceptance/complement method that is very fast.
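In R (vectorized, with location b and scale a; the function name is ours):

rcauchyicdf <- function(n, a = 1, b = 0) b + a * tan(pi * runif(n))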
Rayleigh Distribution

For the Rayleigh distribution with density

p(x) = (x/σ²) e^{−x²/(2σ²)} for 0 ≤ x ≤ ∞ (5.26)
(which is a Weibull distribution with parameters α = 2 and β = 2σ²), variates can be generated by the inverse CDF method as

x = σ √(−2 log u).

Faster acceptance/rejection methods can be constructed, but if the computing system has fast functions for exponentiation and taking logarithms, the inverse CDF is adequate.

Pareto Distribution

For the Pareto distribution with density

p(x) = a b^a / x^{a+1} for b ≤ x ≤ ∞, (5.27)
variates can be generated by the inverse CDF method as

x = b / u^{1/a}.
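In R (vectorized; the function name is ours):

rpareto <- function(n, a, b) b / runif(n)^(1/a)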
There are many variations of the continuous Pareto distribution, the simplest of which is the one defined above. In addition, there are some discrete versions, including various zeta and Zipf distributions. (See Arnold, 1983, for an extensive discussion of the variations.) Variates from these distributions can usually be generated by discretizing some form of a Pareto distribution. Dagpunar (1988) describes such a method for a zeta distribution, in which the Pareto variates are first generated by an acceptance/rejection method.

Zipf Distribution

The standard Zipf distribution assigns probabilities to the positive integers x proportional to x^{−α}, for α > 1. The probability function is

p(x) = 1 / (ζ(α) x^α) for x = 1, 2, 3, . . . , (5.28)
where ζ(α) = Σ_{x=1}^{∞} x^{−α} (the Riemann zeta function). Variates from the simple Zipf distribution can be generated efficiently by a direct acceptance/rejection method given by Devroye (1986a). In this method, first two variates u1 and u2 are generated from U(0, 1), and then x and t are defined as

x = ⌊u1^{−1/(α−1)}⌋ and t = (1 + 1/x)^{α−1}.

The variate x is accepted if

x ≤ (t/(t − 1)) · ((2^{α−1} − 1)/(2^{α−1} u2)).
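A direct R transcription of Devroye's method for a single deviate (the function name is ours):

rzipf <- function(alpha) {
  repeat {
    u1 <- runif(1); u2 <- runif(1)
    x <- floor(u1^(-1/(alpha - 1)))
    t <- (1 + 1/x)^(alpha - 1)
    if (x <= (t/(t - 1)) * ((2^(alpha - 1) - 1)/(2^(alpha - 1) * u2)))
      return(x)
  }
}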
Von Mises Distribution

Variates from the von Mises distribution with density

p(x) = (1/(2π I_0(c))) e^{c cos(x)} for −π ≤ x ≤ π, (5.29)
as discussed on page 140, can be generated by the acceptance/rejection method. Best and Fisher (1979) use a transformed folded Cauchy distribution as the majorizing distribution. The majorizing density is

g(y) = (1 − ρ²) / (π(1 + ρ² − 2ρ cos y)) for 0 ≤ y ≤ π,
where ρ is chosen in [0, 1) to optimize the probability of acceptance for a given value of the von Mises parameter, c. (Simple plots of g(·) with different values of ρ compared to a plot of p(·) with a given value of c visually lead to a relatively good choice of ρ.) This is the method used in the IMSL Libraries. Dagpunar (1990) gives an acceptance/rejection method for the von Mises distribution that is often more efficient.

Inverse Gaussian Distribution

The inverse Gaussian distribution is widely used in reliability studies. The density, for location parameter µ > 0 and scale parameter λ > 0, is

p(x) = (λ/(2π))^{1/2} x^{−3/2} exp(−λ(x − µ)²/(2µ²x)) for 0 ≤ x ≤ ∞. (5.30)

The inverse Gaussian distribution with µ = 1 is called the Wald distribution. It is the distribution of the first passage time in a standard Brownian motion with positive drift. Michael, Schucany, and Haas (1976) and Atkinson (1982) discussed methods for simulating inverse Gaussian random deviates. The method of Michael et al., given as Algorithm 5.7, is particularly straightforward but efficient.

Algorithm 5.7 Michael/Schucany/Haas Method for Generating Inverse Gaussians

1. Generate v from N(0, 1), and set y = v².

2. Set x1 = µ + µ²y/(2λ) − (µ/(2λ)) √(4µλy + µ²y²).

3. Generate u from U(0, 1).

4. If u ≤ µ/(µ + x1), then deliver x = x1; otherwise, deliver x = µ²/x1.
The generalized inverse Gaussian distribution has an additional parameter that is the exponent of x in the density (5.30), which allows for a wider range of shapes. Barndorff-Nielsen and Shephard (2001) discuss the generalized inverse Gaussian distribution, and describe a method for generating random numbers from it.
5.2.13
General Families of Univariate Distributions
In simulation applications, one of the first questions is what is the distribution of the random variables that model the observational data. Some distributions, such as the Poisson or the hypergeometric, are sometimes obvious from first principles of the data-generating process.

x4 = u4 √((1 − s1)/s2).
The IMSL routine rnsph uses these methods for three or four dimensions and uses scaled normals for higher dimensions. Banerjia and Dwyer (1993) consider the related problem of generating random points in a ball, which would be equivalent to generating random points on a sphere and then scaling the radius by a uniform deviate. They describe a divide-and-conquer algorithm that can be used in any dimension and that is faster than scaling normals or scaling points from Marsaglia’s method, assuming that the speed of the underlying uniform generator is great relative to square root computations.
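For the scaled-normals approach, a vectorized R sketch (each row of the result is one point on the unit sphere in d dimensions; the function name is ours):

runifsphere <- function(n, d) {
  z <- matrix(rnorm(n * d), n, d)
  z / sqrt(rowSums(z^2))
}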
5.3.5
Two-Way Tables
Boyett (1979) and Patefield (1981) consider the problem of generating random entries in a two-way table subject to given marginal row and column totals. The distribution is uniform over the integers that yield the given totals. Boyett derives the joint distribution for the cell counts and then develops the conditional distribution for a given cell, given the counts in all cells in previous rows and all cells in previous columns of the current given row. Patefield then uses the conditional expected value of a cell count to generate a random entry for each cell in turn. Let aij for i = 1, 2, . . . , r and j = 1, 2, . . . , c be the cell count, and use the “dot notation” for summation: a•j is the sum of the counts in the j th column,
for example, and a•• is the grand total. The conditional probability that the count in the (l, m)th cell is a_lm, given the counts a_ij for 1 ≤ i < l and 1 ≤ j ≤ c and for i = l and 1 ≤ j < m, is a hypergeometric probability: writing

r = a_l• − Σ_{j<m} a_lj (the count remaining to be allocated to row l),
s = a_•m − Σ_{i<l} a_im (the count remaining to be allocated to column m),
t = a_•• − Σ_{i<l} a_i• − Σ_{j<m} a_•j + Σ_{i<l} Σ_{j<m} a_ij (the count remaining in the unfilled cells),

the conditional probability is

Pr(a_lm) = \binom{s}{a_lm} \binom{t − s}{r − a_lm} / \binom{t}{r}.
For each cell, a random uniform is generated, and the discrete inverse CDF method is used. Sequential evaluation of this expression is fairly simple, so the probability accumulation proceeds rapidly. The full expression is evaluated only once for each cell. Patefield (1981) also speeds up the process by beginning at the conditional expected value of each cell rather than accumulating the CDF from 0. The conditional expected value of the random count in the (l, m)th cell, Alm, given the a_ij for 1 ≤ i < l and 1 ≤ j ≤ c and for i = l and 1 ≤ j < m, is

E(Alm | aij) = (a_l• − Σ_{j=1}^{m−1} a_lj)(a_•m − Σ_{i=1}^{l−1} a_im) / (Σ_{j=m}^{c} a_•j − Σ_{i=1}^{l−1} Σ_{j=m}^{c} a_ij),

unless the denominator is 0, in which case E(Alm | aij) is zero. Patefield (1981) gives a Fortran program implementing the method described. This is the method used in the IMSL routine rntab.
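In R, Patefield's algorithm is available as the built-in function r2dtable:

r2dtable(2, c(10, 6), c(9, 7))   # two random 2 x 2 tables with row totals 10, 6
                                 # and column totals 9, 7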
5.3.6
Other Specific Multivariate Distributions
Only a few of the standard univariate distributions have standard multivariate extensions. Various applications often lead to different extensions; see Kotz, Balakrishnan, and Johnson (2000). If the density of a multivariate distribution exists and can be specified, it is usually possible to generate variates from the distribution using an acceptance/rejection method. The majorizing density can often be just the product density; that is, a multivariate density with components that are the independent univariate variables, as in the example of the bivariate gamma on page 123. Multivariate Bernoulli Variates and the Multivariate Binomial Distribution A multivariate Bernoulli distribution of correlated binary random variables has applications in modeling system reliability, clinical trials with repeated measures, and genetic transmission of disease. For the multivariate Bernoulli distribution with marginal probabilities π1 , π2 , . . . , πd and pairwise correlations ρij , Emrich and Piedmonte (1991) propose identifying a multivariate normal
distribution with similar pairwise correlations. The normal is determined by solving for normal pairwise correlations, r_ij, in a system of d(d − 1)/2 equations involving the bivariate standard normal CDF, Φ₂, evaluated at percentiles z_π corresponding to the Bernoulli probabilities:

Φ₂(z_{π_i}, z_{π_j}; r_ij) = ρ_ij √(π_i(1 − π_i)π_j(1 − π_j)) + π_i π_j. (5.37)

Once these pairwise correlations are determined, a multivariate normal y is generated and transformed to a Bernoulli, x, by the rule x_i = 1 if y_i ≤ z_{π_i}, and x_i = 0 otherwise.
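A sketch of this approach for d = 2 in R, assuming the mvtnorm package for the bivariate normal CDF and using uniroot to solve equation (5.37) for r12 (the function name is ours, and feasibility of the target ρ is assumed):

library(mvtnorm)
rbivbern <- function(n, p1, p2, rho) {
  z1 <- qnorm(p1); z2 <- qnorm(p2)
  target <- rho * sqrt(p1*(1 - p1)*p2*(1 - p2)) + p1*p2
  f <- function(r)
    as.numeric(pmvnorm(upper = c(z1, z2),
                       corr = matrix(c(1, r, r, 1), 2))) - target
  r12 <- uniroot(f, c(-0.999, 0.999))$root
  y <- rmvnorm(n, sigma = matrix(c(1, r12, r12, 1), 2))
  (y <= matrix(c(z1, z2), n, 2, byrow = TRUE)) * 1L
}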
Sums of multivariate Bernoulli random variables are multivariate binomial random variables. Phenomena modeled by binomial distributions, within clusters, often exhibit greater or less intracluster variation than independent binomial distributions would indicate. This behavior is called "overdispersion" or "underdispersion". Overdispersion can be simulated by the beta-binomial distribution discussed earlier. A beta-binomial cannot model underdispersion, but the method of Emrich and Piedmonte (1991) to induce correlations in the Bernoulli variates can be used to model either overdispersion or underdispersion. Ahn and Chen (1995) discuss this method and compare it with the use of a beta-binomial in the case of overdispersion. They also compared the output of the simulation models for both underdispersed and overdispersed binomials with actual data from animal litters. Park, Park, and Shin (1996) give a method for generating correlated binary variates based on sums of Poisson random variables in which the sums have some terms in common. They let Z1, Z2, and Z3 be independent Poisson random variables with nonnegative parameters α11 − α12, α22 − α12, and α12, respectively, with the convention that a Poisson with parameter 0 is a degenerate random variable equal to 0, and define the random variables Y1 and Y2 as Y1 = Z1 + Z3 and Y2 = Z2 + Z3. They then define the binary random variables X1 and X2 by X_i = 1 if Y_i = 0, and X_i = 0 otherwise. They then determine the constants, α11, α22, and α12, so that E(Xi) = πi and Corr(X1, X2) = ρ12.
It is easy to see that

α_ij = log(1 + ρ_ij √((1 − π_i)(1 − π_j)/(π_i π_j))) (5.38)
yields those relations. After the αs are computed, the procedure is as shown in Algorithm 5.10.

Algorithm 5.10 Park/Park/Shin Method for Generating Correlated Binary Variates

0. Set k = 0.

1. Set k = k + 1. Let βk = αrs be the smallest positive αij.

2. If αrr = 0 or αss = 0, then stop.

3. Let Sk be the set of all indices, i, j, for which αij > 0. For all {i, j} ⊆ Sk, set αij = αij − βk.

4. If not all αij = 0, then go to step 1.

5. Generate k Poisson deviates, zj, with parameters βj. For i = 1, 2, . . . , d, set

   y_i = Σ_{j: i ∈ S_j} z_j.

6. For i = 1, 2, . . . , d, set x_i = 1 if y_i = 0, and x_i = 0 otherwise.
Lee (1993) gives another method to generate multivariate Bernoullis that uses odds ratios. (Odds ratios and correlations uniquely determine each other for binary variables.)

Multivariate Beta or Dirichlet Distribution

The Dirichlet distribution is a multivariate extension of a beta distribution, and the density of the Dirichlet is the obvious extension of the beta density (equation (5.14)),

p(x) = [Γ(Σ α_j) / ∏ Γ(α_j)] (∏_{j=1}^{d} x_j^{α_j − 1}) (1 − x_1 − x_2 − · · · − x_d)^{α_{d+1} − 1} for 0 ≤ x_j ≤ 1. (5.39)
Arnason and Baniuk (1978) consider several ways to generate deviates from the Dirichlet distribution, including a sequence of conditional betas and the use of
the relationship of order statistics from a uniform distribution to a Dirichlet. (The ith order statistic from a sample of size n from a U(0, 1) distribution has a beta distribution with parameters i and n − i + 1.) The most efficient method that they found was the use of a relationship between independent gamma variates and a Dirichlet variate. If Y1, Y2, . . . , Yd, Yd+1 are independently distributed as gamma random variables with shape parameters α1, α2, . . . , αd, αd+1 and common scale parameter, then the d-vector X with elements

X_j = Y_j / Σ_{k=1}^{d+1} Y_k, j = 1, . . . , d,

has a Dirichlet distribution with parameters α1, α2, . . . , αd+1. This relationship yields the straightforward method of generating Dirichlets by generating gammas.
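In R (each row of the result is one Dirichlet deviate; the d + 1 shape parameters are supplied in alpha, and the function name is ours):

rdirich <- function(n, alpha) {
  k <- length(alpha)                          # k = d + 1
  y <- matrix(rgamma(n * k, shape = alpha), n, k, byrow = TRUE)
  (y / rowSums(y))[, -k, drop = FALSE]        # first d coordinates
}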
Dirichlet-Multinomial Distribution

The Dirichlet-multinomial distribution is the mixture distribution that is a multinomial with parameter π that is a realization of a random variable having a Dirichlet distribution. Just like the beta-binomial distribution (5.18), this distribution is useful for modeling overdispersion or extravariation in applications where there are clusters of separate multinomial distributions. A simple way of generating deviates from a Dirichlet-multinomial distribution is first to generate the parameter π as the appropriate Dirichlet and then to generate the multinomial conditionally. There are other ways of inducing overdispersion in multinomial distributions. Morel (1992) describes a simple algorithm to generate a finite mixture of multinomials by clumping individual multinomials. This mixture distribution has the same first two moments as the Dirichlet-multinomial distribution, but it is not the same distribution.

Multivariate Hypergeometric Distribution

The multivariate hypergeometric distribution is a generalization of the hypergeometric distribution for more than two types of outcomes. The model is an urn filled with balls of different colors. The multivariate random variable is the vector of numbers of each type of ball when N balls are drawn randomly and without replacement. The probability function for the multivariate hypergeometric distribution is the same as that for the univariate hypergeometric distribution (equation (5.22), page 189) except with more classes. To generate a multivariate hypergeometric random deviate, a simple way is to work with the marginals. The generation is done sequentially. Each succeeding conditional marginal is hypergeometric. To generate the deviate, combine all classes except the first in order to form just two classes. Next, generate a univariate hypergeometric deviate x1. Then remove the first class and form two classes consisting of the second one and the third through the
last combined, and generate a univariate hypergeometric deviate based on N − x1 draws. This gives x2, the number of the second class. Continue in this manner until the number remaining to be drawn is 0 or until the classes are exhausted. For efficiency, the first marginal used would be the one with the largest probability.
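An R sketch of the sequential conditional-marginal method (m is the vector of class counts, n the number of draws; the function name is ours, and at least two classes are assumed):

rmvhyper <- function(m, n) {
  k <- length(m)
  stopifnot(k >= 2)
  x <- integer(k)
  left <- n
  for (j in 1:(k - 1)) {
    if (left == 0) break
    x[j] <- rhyper(1, m[j], sum(m[(j + 1):k]), left)   # conditional marginal
    left <- left - x[j]
  }
  x[k] <- left
  x
}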
Multivariate Uniform Distribution

Falk (1999) considers the problem of simulating a d-variate Ud(0, 1) distribution with specified correlation matrix R = (ρ_ij). A simple approximate method is to generate y from Nd(0, R) and take x_i = Φ(y_i), where Φ is the standard normal CDF. Falk shows that the correlation matrix of variates generated in this way is very close to the target correlation matrix R. He also shows that if the matrix

R̃ = (r̃_ij) = (2 sin(πρ_ij/6))

is positive semidefinite, and if Y ∼ Nd(0, R̃) and X_i = Φ(Y_i), then Corr(X) = R. Therefore, if the target correlation matrix R is such that the corresponding matrix R̃ is positive semidefinite, then variates generated as above are from a d-variate Ud(0, 1) distribution with exact correlation matrix R.
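In R, assuming the mvtnorm package and that R̃ is positive semidefinite (the function name is ours):

library(mvtnorm)
rmvunif <- function(n, R) {
  Rtilde <- 2 * sin(pi * R / 6)      # note: the diagonal stays 1
  pnorm(rmvnorm(n, sigma = Rtilde))
}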
Multivariate Exponential Distributions

A multivariate exponential distribution can be defined in terms of Poisson shocks (see Marshall and Olkin, 1967), and variates can be generated from that distribution by generating univariate Poisson variates (see Dagpunar, 1988). There are various ways to define a multivariate double exponential distribution, or a multivariate Laplace distribution. Ernst (1998) describes an elliptically contoured multivariate Laplace distribution with density

p(x) = [γ Γ(d/2) / (2 π^{d/2} Γ(d/γ) |Σ|^{1/2})] exp(−((x − µ)ᵀ Σ^{−1} (x − µ))^{γ/2}). (5.40)

Ernst shows that a simple way to generate a variate from this distribution is to generate a point s on the d-dimensional sphere (see Section 5.3.4), generate a generalized univariate gamma variate y (page 182) with parameters d, 1, and γ, and deliver

x = y Tᵀ s + µ, where TᵀT = Σ.

Kozubowski and Podgórski (2000) describe an asymmetric multivariate Laplace distribution (not elliptically contoured). They also describe a method for generating random deviates from that distribution.
Multivariate Gamma Distributions The bivariate gamma distribution of Becker and Roux (1981) discussed in Section 4.5 (page 123) is only one possibility for extending the gamma. Others, motivated by different models of applications, are discussed by Mihram and Hultquist (1967), Ronning (1977), Ratnaparkhi (1981), and Jones, Lai, and Rayner (2000), for example. Ronning (1977) and London and Gennings (1999) describe specific multivariate gamma distributions and describe methods for generating variates from the multivariate gamma distribution that they considered. The bivariate gamma distribution of Jones, Lai, and Rayner (2000) is formed from two univariate gamma distributions with fixed shape parameters and scale parameters ζ and ξ, each of which takes one of two values with a generalized Bernoulli distribution. For i, j = 1, 2, Pr(ζ = ζi , ξ = ξj ) = πij . The correlation between the two elements of the bivariate gamma depends on the πij , as Jones, Lai, and Rayner (2000) discuss. It is easy to generate random variates from this bivariate distribution: for each variate, generate a value for ζ and ξ, and then generate two of the univariate gammas. Multivariate Stable Distributions Various multivariate extensions of the stable distributions can be defined. Modarres and Nolan (1994) give a representation of a class of multivariate stable distributions in which the multivariate stable random variable is a weighted sum of a univariate stable random variable times a point on the unit sphere. The reader is referred to the paper for the description of the class of multivariate stable distributions for which the method applies. See also Nolan (1998a).
5.3.7
Families of Multivariate Distributions
Methods are available for generating multivariate distributions with various specific properties. Extensions have been given for multivariate versions of some of the general families of univariate distributions discussed on page 193. Parrish (1990) gives a method to generate random deviates from a multivariate Pearson family of distributions. Takahasi (1965) defines a multivariate extension of the Burr distributions. Generation of deviates from the multivariate Burr distribution can be accomplished by transformations of univariate samples. Gange (1995) gives a method for generating general multivariate categorical variates using iterative proportional fitting to the marginals. A useful general class of multivariate distributions is the class of elliptically contoured distributions. A nonsingular elliptically contoured distribution has a density of the general form

p(x) = (c/|Σ|^{1/2}) g((x − µ)ᵀ Σ^{−1} (x − µ)),
where g(·) is a nonnegative function, and Σ is a positive definite matrix. The multivariate normal distribution is obviously of this class, as is the multivariate Laplace distribution discussed above. There are other interesting distributions in this class, including two types of multivariate Pearson distributions. Johnson (1987) discusses general methods for generating variates from the Pearson elliptically contoured distributions. The book edited by Fang and Anderson (1990) contains several papers on applications of elliptically contoured distributions. Cook and Johnson (1981, 1986) define families of non-elliptically symmetric multivariate distributions, and consider their use in applications for modeling data. Johnson (1987) discusses methods for generating variates from those distributions. Distributions with Specified Correlations Li and Hammond (1975) propose a method for a d-variate distribution with specified marginals and variance-covariance matrix. The Li–Hammond method uses the inverse CDF method to transform a d-variate normal into a multivariate distribution with specified marginals. The variance-covariance matrix of the multivariate normal is chosen to yield the specified variance-covariance matrix for the target distribution. The determination of the variance-covariance matrix for the multivariate normal to yield the desired target distribution is difficult, however, and does not always yield a positive definite variance-covariance matrix for the multivariate normal. (An approximate variance-covariance or correlation matrix that is not positive definite can be a general problem in applications of multivariate simulation. See Exercise 6.1 in Gentle, 1998, for a possible solution.) Lurie and Goldberg (1998) modify the Li–Hammond approach by iteratively refining the correlation matrix of the underlying normal using the sample correlation matrix of the transformed variates. They begin with a fixed sample of t multivariate normals with the identity matrix as the variance-covariance. These normal vectors are first linearly transformed by the matrix T (k) as described on page 197 and then transformed by the inverse CDF method into a sample of t vectors with the specified marginal distributions. The correlation matrix of the transformed sample is computed and compared with the target correlation. A measure of the difference in the sample correlation matrix and the target correlation is minimized by iterations over T (k) . A good starting point for T (0) is the d × d matrix that is the square root of the target correlation matrix R (that is, the Cholesky factor) so that T (0)T T (0) = R. The measure of the difference in the sample correlation matrix and the target correlation is a sum of squares, so the minimization is a nonlinear least squares problem. The sample size t to use in the determination of the optimal transformation matrix must be chosen in advance. Obviously, t must be large enough to give some confidence that the sample correlation matrices reflect the target accurately. Because of the number of variables in the optimization
problem, it is likely that t should be chosen proportional to d². Once the transformation matrix is chosen, to generate a variate from the target distribution, first generate a variate from Nd(0, I), then apply the linear transformation, and finally apply the inverse CDF transformation. To generate n variates from the target distribution, Lurie and Goldberg (1998) also suggest that the normals be adjusted so that the sample has a mean of 0 and a variance-covariance matrix exactly equal to the expected value that the transformation would yield. (This is constrained sampling, as discussed on page 248.)

Vale and Maurelli (1983) also generate general random variates using a multivariate normal distribution with the target correlation matrix as a starting point. They express the individual elements of the multivariate random variable of interest as polynomials in the elements of the multivariate normal random variable, similar to the method of Fleishman (1978) in equation (5.33). They then determine the coefficients in the polynomials so that the lower-order marginal moments correspond to specified values. This does not, of course, mean that the correlation matrix of the random variable determined in this way is the desired matrix. Vale and Maurelli suggest use of the first four marginal moments. Parrish (1990), as mentioned above, gives a method for generating variates from a multivariate Pearson family of distributions. A member of the Pearson family is specified by the first four moments, which of course includes the covariances. Kachitvichyanukul, Cheng, and Schmeiser (1988a) describe methods for inducing correlation in binomial and Poisson random variates.
5.4
Data-Based Random Number Generation
Often, we have a sample and need to generate random numbers from the unknown distribution that yielded it. Specifically, we have a set of observations, {x1, x2, . . . , xn}, and we wish to generate a pseudorandom sample from the same distribution as the given dataset. This kind of method is called data-based random number generation.

Discrete Distributions

How we proceed depends on some assumptions about the distribution. If the distribution is discrete and we have no other information about it than what is available from the given dataset, the best way of generating a pseudorandom sample from the distribution is just to generate a random sample of indices with replacement (see Chapter 6) and then use the index set as indices for the given sample. For scalar x, this is equivalent to using the inverse CDF method on the ECDF (the empirical cumulative distribution function):

Pn(x) = (1/n) (number of x_i ≤ x).
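In R, sampling from the ECDF is simply resampling the data with replacement (the function name is ours):

rdata <- function(m, x) x[sample.int(length(x), m, replace = TRUE)]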
A well-designed S-Plus or R function that invokes a random number generator would have code similar to that in Figure 8.5.

oldseed <- .Random.seed     # save seed on entry
...
.Random.seed <<- oldseed    # restore seed on exit
return(...)
Figure 8.5: Saving and Restoring the State of the Generator within an S-Plus or R Function
Monte Carlo in S-Plus and R Explicit loops in S-Plus or R execute very slowly. For that reason, it is best to use array arguments for functions rather than to loop over scalar values of the
arguments. Consider, for example, the problem of evaluating the integral

∫_0^2 log(x + 1) x² (2 − x)³ dx.
This could be estimated in a loop as follows:

# First, initialize n.
uu <- runif(n, 0, 2)
eu <- 0
for (i in 1:n) eu <- eu + log(uu[i]+1)*uu[i]^2*(2-uu[i])^3
eu <- 2*eu/n

A much more efficient way, without the for loop but still using the uniform, is

uu <- runif(n, 0, 2)
eu <- 2*sum(log(uu+1)*uu^2*(2-uu)^3)/n

Alternatively, using the beta density as a weight function, we have

eb <- (16/15)*sum(log(2*rbeta(n,3,4)+1))/n

(Of course, if we recognize the relationship of the integral to the beta distribution, we would not use Monte Carlo as the method of integration.) For large-scale Monte Carlo studies, an interpretive language such as S-Plus or R may require an inordinate amount of running time. These systems are very useful for prototyping Monte Carlo studies, but it is often better to do the actual computations in a compiled language such as Fortran or C.
Exercises

8.1. Identify as many random number generators as you can that are available on your computer system. Try to determine what method each uses. Do the generators appear to be of high quality?

8.2. Consider the problem of evaluating the integral

∫_{−π}^{π} ∫_0^4 ∫_0^∞ x² y³ sin(z) (π + z)² (π − z)³ e^{−x/2} dx dy dz.
Note the gamma and beta weighting functions. (a) Write a Fortran or C program to use the IMSL Libraries to evaluate this integral by Monte Carlo methods. Use a sample of size 1000, and save the state of the generator, so you can restart it. Now, use a sample of size 10,000, starting where you left off in the first 1000. Combine your two estimates. (b) Now, do the same thing in S-Plus.
(c) Now, do the same thing in Fortran 90 using its built-in random number functions. You may use other software to evaluate special functions if you wish.
8.3. Obtain the programs for Algorithm 738 for generating quasirandom numbers (Bratley, Fox, and Niederreiter, 1994) from the Collected Algorithms of the ACM. The programs are in Fortran and may require a small number of system-dependent modifications, which are described in the documentation embedded in the source code. Devise some tests for Monte Carlo evaluation of multidimensional integrals, and compare the performance of Algorithm 738 with that of a pseudorandom number generator. (Just use any convenient pseudorandom generator available to you.) The subroutine TESTF accompanying Algorithm 738 can be used for this purpose. Can you notice any difference in the performance? 8.4. Obtain the code for SPRNG, the scalable parallel random number generators. The source code is available at http://sprng.cs.fsu.edu Get the code running, preferably on a parallel system. (It will run on a serial machine also.) Choose some simple statistical tests, and apply them to sample output from single streams and also to the output of separate streams. (In the latter case, the primary interest is in correlations across the streams.)
Chapter 9
Monte Carlo Studies in Statistics

In statistical inference, certain properties of the test statistic or estimator must be assumed to be known. In simple cases, under rigorous assumptions, we have complete knowledge of the statistic. In testing a mean of a normal distribution, for example, we use a t statistic, and we know its exact distribution. In other cases, however, we may have a perfectly reasonable test statistic but know very little about its distribution. For example, suppose that a statistic T, computed from a differenced time series, could be used to test the hypothesis that the order of differencing is sufficient to yield a series with a zero mean. If enough information about the distribution of T is known under the null hypothesis, that value may be used to construct a test that the differencing is adequate. This, in fact, was what Erastus Lyman de Forest studied in the 1870s in one of the earliest documented Monte Carlo studies of a statistical procedure. De Forest studied ways of smoothing a time series by simulating the data using cards drawn from a box. A description of De Forest's Monte Carlo study is given in Stigler (1978). Stigler (1991) also describes other Monte Carlo simulation by nineteenth-century scientists and suggests that "simulation, in the modern sense of that term, may be the oldest of the stochastic arts". Another early use of Monte Carlo was the sampling experiment (using biometric data recorded on pieces of cardboard) that led W. S. Gosset to the discovery of the distribution of the t-statistic and the correlation coefficient. (See Student, 1908a, 1908b. Of course, it was Ronald Fisher who later worked out the distributions.) Major advances in Monte Carlo techniques were made during World War II and afterward by mathematicians and scientists working on problems in atomic physics. (In fact, it was the group led by John von Neumann and S. M. Ulam who coined the term "Monte Carlo" to refer to these methods.) The use of Monte Carlo techniques by statisticians gradually increased from the time of
De Forest, but after the widespread availability of digital computers, the usage greatly expanded. In the mathematical sciences, including statistics, simulation has become an important tool in the development of theory and methods. For example, if the properties of an estimator are very difficult to work out analytically, a Monte Carlo study may be conducted to estimate those properties. Often, the Monte Carlo study is an informal investigation whose main purpose is to indicate promising research directions. If a “quick and dirty” Monte Carlo study indicates that some method of inference has good properties, it may be worth the time of the research worker in developing the method and perhaps doing the difficult analysis to confirm the results of the Monte Carlo study. In addition to quick Monte Carlo studies that are mere precursors to analytic work, Monte Carlo studies often provide a significant amount of the available knowledge of the properties of statistical techniques, especially under various alternative models. A large proportion of the articles in the statistical literature include Monte Carlo studies. In recent issues of the Journal of the American Statistical Association, for example, almost half of the articles report on Monte Carlo studies that supported the research. One common use of Monte Carlo studies is to compare statistical methods. For example, we may wish to compare a procedure based on maximum likelihood with a procedure using least squares. The comparison of methods is often carried out for different distributions for the random component of the model used in the study. It is especially interesting to study how standard statistical methods perform when the distribution of the random component has heavy tails or when the distribution is contaminated by outliers. Monte Carlo methods are widely used in these kinds of studies of the robustness of statistical methods.
9.1
Simulation as an Experiment
A simulation study that incorporates a random component is an experiment. The principles of statistical design and analysis apply just as much to a Monte Carlo study as they do to any other scientific experiment. The Monte Carlo study should adhere to the same high standards of any scientific experimentation:
• control;
• reproducibility;
• efficiency;
• careful and complete documentation.
In simulation, control, among other things, relates to the fidelity of a nonrandom process to a random process. The experimental units are only simulated.
Questions about the computer model must be addressed (tests of the random number generators and so on). Likewise, reproducibility is predicated on good random number generators (or else on equally bad ones!). Portability of the random number generators enhances reproducibility and in fact can allow strict reproducibility. Reproducible research also requires preservation and documentation of the computer programs that produced the results (see Buckheit and Donoho, 1995). The principles of good statistical design can improve the efficiency. Use of good designs (fractional factorials, etc.) can allow efficient simultaneous exploration of several factors. Also, there are often many opportunities to reduce the variance (improve the efficiency). Hammersley and Handscomb (1964, page 8) note

   ... statisticians were insistent that other experimentalists should design experiments to be as little subject to unwanted error as possible, and had indeed given important and useful help to the experimentalist in this way; but in their own experiments they were singularly inefficient, nay negligent in this respect.

Many properties of statistical methods of inference are analytically intractable. Asymptotic results, which are often easy to work out, may imply excellent performance, such as consistency with a good rate of convergence, but the finite sample properties are ultimately what must be considered. Monte Carlo studies are a common tool for investigating the properties of a statistical method, as noted above. In the literature, the Monte Carlo studies are sometimes called "numerical results". Some numerical results are illustrated by just one randomly generated dataset; others are studied by averaging over thousands of randomly generated sets.
In a Monte Carlo study, there are usually several different things ("treatments" or "factors") that we want to investigate. As in other kinds of experiments, a factorial design is usually more efficient. Each factor occurs at different "levels", and the set of all levels of all factors that are used in the study constitutes the "design space". The measured responses are properties of the statistical methods, such as their sample means and variances. The factors commonly studied in Monte Carlo experiments in statistics include the following.
• statistical method (estimator, test procedure, etc.)
• sample size
• the problem for which the statistical method is being applied (that is, the "true" model, which may be different from the one for which the method was developed). Factors relating to the type of problem may be:
– distribution of the random component in the model (normality?)
– correlation among observations (independence?)
– homogeneity of the observations (outliers?, mixtures?)
– structure of associated variables (leverage?)
The factor whose effect is of primary interest is the statistical method. The other factors are generally just blocking factors. There is, however, usually an interaction between the statistical method and these other factors. As in physical experimentation, observational units are selected for each point in the design space and measured. The measurements, or “responses” made at the same design point, are used to assess the amount of random variation, or variation that is not accounted for by the factors being studied. A comparison of the variation among observational units at the same levels of all factors with the variation among observational units at different levels is the basis for a decision as to whether there are real (or “significant”) differences at the different levels of the factors. This comparison is called analysis of variance. The same basic rationale for identifying differences is used in simulation experiments. A fundamental (and difficult) question in experimental design is how many experimental units to observe at the various design points. Because the experimental units in Monte Carlo studies are generated on the computer, they are usually rather inexpensive. The subsequent processing (the application of the factors, in the terminology of an experiment) may be very extensive, however, so there is a need to design an efficient experiment.
9.2
Reporting Simulation Experiments
The reporting of a simulation experiment should receive the same care and consideration that would be accorded the reporting of other scientific experiments. Hoaglin and Andrews (1975) outline the items that should be included in a report of a simulation study. In addition to a careful general description of the experiment, the report should include mention of the random number generator used, any variance-reducing methods employed, and a justification of the simulation sample size. The Journal of the American Statistical Association includes these reporting standards in its style guide for authors. Closely related to the choice of the sample size is the standard deviation of the estimates that result from the study. The sample standard deviations actually achieved should be included as part of the report. Standard deviations are often reported in parentheses beside the estimates with which they are associated. A formal analysis, of course, would use the sample variance of each estimate to assess the significance of the differences observed between points in the design space; that is, a formal analysis of the simulation experiment would be a standard analysis of variance. The most common method of reporting the results is by means of tables, but a better understanding of the results can often be conveyed by graphs.
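As a small illustration of the convention of reporting a standard deviation in parentheses beside its estimate, a result can be formatted directly in R; the numbers here are made up for the example.

   est <- 0.843   # hypothetical estimated power from a simulation
   se  <- 0.005   # its sample standard deviation
   sprintf("%.3f (%.3f)", est, se)   # "0.843 (0.005)"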
9.3
An Example
One area of statistics in which Monte Carlo studies have been used extensively is robust statistics. This is because the finite sampling distributions of many robust statistics are very difficult to work out, especially for the kinds of underlying distributions for which the statistics are to be studied. A well-known use of Monte Carlo methods is in the important study of robust statistics described by Andrews et al. (1972), who introduced and examined many alternative estimators of location for samples from univariate distributions. This study, which involved many Monte Carlo experiments, employed innovative methods of variance reduction and was very influential in subsequent Monte Carlo studies reported in the statistical literature.
As an example of a Monte Carlo study, we will now describe a simple experiment to assess the robustness of a statistical test in linear regression analysis. The purpose of this example is to illustrate some of the issues in designing a Monte Carlo experiment. The results of this small study are not of interest here. There are many important issues about the robustness of the procedures that we do not address in this example.

The Problem

Consider the simple linear regression model
$$Y = \beta_0 + \beta_1 x + E,$$
where a response or "dependent variable", Y, is modeled as a linear function of a single regressor or "independent variable", x, plus a random variable, E, called the "error". Because E is a random variable, Y is also a random variable. The statistical problem is to make inferences about the unknown, constant parameters β0 and β1 and about distributional parameters of the random variable, E. The inferences are made based on a sample of n pairs, (yi, xi), with which are associated unobservable realizations of the random error, εi, and which are assumed to have the relationship
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i. \qquad (9.1)$$
We also generally assume that the realizations of the random error are independent and are unrelated to the value of x.
For this example, let us consider just the specific problem of testing the hypothesis
$$H_0\colon \beta_1 = 0 \qquad (9.2)$$
versus the universal alternative. If the distribution of E is normal and we make the additional assumptions above about the sample, the optimal test for the hypothesis (using the common definitions of optimality) is based on a least squares procedure that yields the statistic
$$t = \frac{\hat{\beta}_1 \sqrt{(n-2)\sum (x_i - \bar{x})^2}}{\sqrt{\sum r_i^2}}, \qquad (9.3)$$
Figure 9.1: Least Squares Fit Using Two Datasets that are the Same Except for Two Outliers

where $\bar{x}$ is the mean of the $x$s, and $\hat{\beta}_1$ together with $\hat{\beta}_0$ minimizes the function
$$L_2(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2,$$
and
$$r_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i).$$
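As a quick computational check of equation (9.3), the hand-computed statistic can be compared with the t value reported by R's lm; the simulated data below are arbitrary, and this sketch is ours rather than part of the text.

   set.seed(1)
   n <- 30
   x <- runif(n, 0, 10)
   y <- 2 + 0.5*x + rnorm(n)            # arbitrary data for the check
   b <- coef(lm(y ~ x))                 # beta0-hat and beta1-hat
   r <- y - (b[1] + b[2]*x)             # least squares residuals
   tstat <- b[2]*sqrt((n - 2)*sum((x - mean(x))^2))/sqrt(sum(r^2))
   tstat - summary(lm(y ~ x))$coefficients[2, 3]   # essentially zero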
If the null hypothesis is true, then t is a realization of a Student’s t distribution with n − 2 degrees of freedom. The test is performed by comparing the p-value from the Student’s t distribution with a preassigned significance level, α, or by comparing the observed value of t with a critical value. The test of the hypothesis depends on the estimates of β0 and β1 used in the test statistic t. Often, a dataset contains outliers (that is, observations that have a realized error that is very large in absolute value) or observations for which the model is not appropriate. In such cases, the least squares procedure may not perform so well. We can see the effect of some outliers on the least squares estimates of β0 and β1 in Figure 9.1. For well-behaved data, as in the plot on the left, the least squares estimates seem to fit the data fairly well. For data with two outlying points, as in the plot on the right in Figure 9.1, the least squares estimates are affected so much by the two points in the upper left part of the graph that the estimates do not provide a good fit for the bulk of the data. Another method of fitting the linear regression line that is robust to outliers in E is to minimize the absolute values of the deviations. The least absolute
values procedure chooses estimates of β0 and β1 to minimize the function
$$L_1(b_0, b_1) = \sum_{i=1}^{n} |y_i - b_0 - b_1 x_i|.$$
Figure 9.2 shows the same two datasets as before with the least squares (LS) fit and the least absolute values (LAV) fit plotted on both graphs. We see that the least absolute values fit does not change because of the outliers.
Figure 9.2: Least Squares Fits and Least Absolute Values Fits

Another concern in regression analysis is the unduly large influence that some individual observations exert on the aggregate statistics because the values of x in those observations lie at a large distance from the mean of all of the xi s (that is, those observations whose values of the independent variables are outliers). The influence of an individual observation is called leverage. Figure 9.3 shows two datasets together with the least squares and the least absolute values fits for both. In both datasets, there is one value of x that lies far outside the range of the other values of x. All of the data in the plot on the left in Figure 9.3 lie relatively close to a line, and both fits are very similar. In the plot on the right, the observation with an extreme value of x also happens to have an outlying value of E. The effect on the least squares fit is marked, while the least absolute values fit is not affected as much. (Despite this example, least absolute values fits are generally not very robust to outliers at high leverage points, especially if there are multiple such outliers. There are other methods of fitting that are more robust to outliers at high leverage points. We refer the interested reader to Rousseeuw and Leroy, 1987, for discussion of these issues.)
Figure 9.3: Least Squares and Least Absolute Values Fits

Now, we continue with our original objective in this example: to evaluate ways of testing the hypothesis (9.2). A test statistic analogous to the one in equation (9.3), but based on the least absolute values fit, is
$$t_1 = \frac{2\tilde{\beta}_1 \sqrt{\sum (x_i - \bar{x})^2}}{(e_{(k_2)} - e_{(k_1)})\sqrt{n-2}}, \qquad (9.4)$$
where $\tilde{\beta}_1$ together with $\tilde{\beta}_0$ minimizes the function
$$L_1(b_0, b_1) = \sum_{i=1}^{n} |y_i - b_0 - b_1 x_i|,$$
$e_{(k)}$ is the $k$th order statistic from $e_i = y_i - (\tilde{\beta}_0 + \tilde{\beta}_1 x_i)$, $k_1$ is the integer closest to $(n-1)/2 - \sqrt{n-2}$, and $k_2$ is the integer closest to $(n-1)/2 + \sqrt{n-2}$. This statistic has an approximate Student's t distribution with n − 2 degrees of freedom (see Birkes and Dodge, 1993, for example).
If the distribution of the random error is normal, inference based on minimizing the sum of the absolute values is not nearly as efficient as inference based on least squares. This alternative to least squares should therefore be used with some discretion. Furthermore, there are other procedures that may warrant consideration. It is not our purpose here to explore these important issues in robust statistics, however.
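A sketch of computing $t_1$ in R, reusing x, y, and n from the earlier sketch: the quantreg package (an assumption on our part; the text itself mentions l1fit and rlav instead) provides the least absolute values fit as median regression.

   library(quantreg)                     # rq with tau = 0.5 gives the LAV fit
   bt <- coef(rq(y ~ x, tau = 0.5))      # beta0-tilde and beta1-tilde
   e  <- sort(y - (bt[1] + bt[2]*x))     # ordered residuals e_(1), ..., e_(n)
   k1 <- round((n - 1)/2 - sqrt(n - 2))
   k2 <- round((n - 1)/2 + sqrt(n - 2))
   t1 <- 2*bt[2]*sqrt(sum((x - mean(x))^2)) / ((e[k2] - e[k1])*sqrt(n - 2))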
The Design of the Experiment

At this point, we should have a clear picture of the problem: we wish to compare two ways of testing the hypothesis (9.2) under various scenarios. The data may have outliers, and there may be observations with large leverage. We expect that the optimal test procedure will depend on the presence of outliers or, more generally, on the distribution of the random error and on the pattern of the values of the independent variable. The possibilities of interest for the distribution of the random error include
• the family of the distribution (that is, normal, double exponential, Cauchy, and so on);
• whether the distribution is a mixture of more than one basic distribution, and, if so, the proportions in the mixture;
• the values of the parameters of the distribution (that is, the variance, the skewness, or any other parameters that may affect the power of the test).
In textbooks on the design of experiments, a simple objective of an experiment is to perform a t test or an F test of whether different levels of response are associated with different treatments. Our objective in the Monte Carlo experiment that we are designing is to investigate and characterize the dependence of the performance of the hypothesis test on these factors. The principles of design are similar to those of other experiments, however.
It is possible that the optimal test of the hypothesis will depend on the sample size or on the true values of the coefficients in the regression model, so some additional issues that are relevant to the performance of a statistical test of this hypothesis are the sample size and the true values of β0 and β1.
In the terminology of statistical models, the factors in our Monte Carlo experiment are the estimation method and the associated test, the distribution of the random error, the pattern of the independent variable, the sample size, and the true value of β0 and β1. The estimation method together with the associated test is the "treatment" of interest. The "effect" of interest (that is, the measured response) is the proportion of times that the null hypothesis is rejected using the two treatments. We now can see our objective more clearly: for each setting of the distribution, pattern, and size factors, we wish to measure the power of the two tests. These factors are similar to blocking factors except that there is likely to be an interaction between the treatment and these factors. Of course, the power depends on the nominal level of the test, α. It may be the case that the nominal level of the test affects the relative powers of the two tests. We can think of the problem in the context of a binary response model,
$$E(P_{ijklqsr}) = f(\tau_i, \delta_j, \phi_k, \nu_l, \alpha_q, \beta_{1s}), \qquad (9.5)$$
where the parameters represent levels of the factors listed above ($\beta_{1s}$ is the $s$th level of $\beta_1$), and $P_{ijklqsr}$ is a binary variable representing whether the test
rejects the null hypothesis on the rth trial at the (ijklqs)th setting of the design factors. It is useful to write down a model like this to remind ourselves of the issues in designing an experiment.
At this point, it is necessary to pay careful attention to our terminology. We are planning to use a statistical procedure (a Monte Carlo experiment) to evaluate a statistical procedure (a statistical test in a linear model). For the statistical procedure that we will use, we have written a model (9.5) for the observations that we will make. Those observations are indexed by r in that model. Let m be the sample size for each combination of factor settings. This is the Monte Carlo sample size. It is not to be confused with the data sample size, n, which is one of the factors in our study.
We now choose the levels of the factors in the Monte Carlo experiment.
• For the estimation method, we have decided on two methods: least squares and least absolute values. Its differential effect in the binary response model (9.5) is denoted by τi for i = 1, 2.
• For the distribution of the random error, we choose three general ones:
1. normal (0, 1);
2. normal (0, 1) with c% outliers from normal (0, d²);
3. standard Cauchy.
We choose different values of c and d as appropriate. For this example, let us choose c = 5 and 20 and d = 2 and 5. Thus, in the binary response model (9.5), j = 1, 2, 3, 4, 5, 6. (A sketch of generating such contaminated-normal errors follows this list.)
• For the pattern of the independent variable, we choose three different arrangements:
1. uniform over the range;
2. a group of extreme outliers;
3. two groups of outliers.
In the binary response model (9.5), k = 1, 2, 3. We use fixed values of the independent variable.
• For the sample size, we choose three values: 20, 200, and 2000. In the binary response model (9.5), l = 1, 2, 3.
• For the nominal level of the test, we choose two values: 0.01 and 0.05. In the binary response model (9.5), q = 1, 2.
• The true value of β0 is probably not relevant, so we just choose β0 = 1. We are interested in the power of the tests at different values of β1. We expect the power function to be symmetric about β1 = 0 and to approach 1 as |β1| increases.
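The mixture in error distribution 2 can be generated by drawing each observation from N(0, d²) with probability c/100 and from N(0, 1) otherwise; the following helper is our illustration, not code from the text.

   rcontam <- function(n, c, d) {
     out <- runif(n) < c/100              # which observations are outliers
     ifelse(out, rnorm(n, 0, d), rnorm(n, 0, 1))
   }
   e <- rcontam(200, c = 5, d = 2)        # distribution 2 with c = 5, d = 2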
The estimation method is the "treatment" of interest. Restating our objective in terms of the notation introduced above, for each of two tests, we wish to estimate the power curve,
$$\Pr(\text{reject } H_0) = g(\beta_1 \mid \tau_i, \delta_j, \phi_k, \nu_l, \alpha_q),$$
for any combination $(\tau_i, \delta_j, \phi_k, \nu_l, \alpha_q)$. For either test, this curve should have the general appearance of the curve shown in Figure 9.4. The minimum of the power curve should occur at β1 = 0 and should be α. The curve should approach 1 symmetrically as |β1| increases.
Figure 9.4: Power Curve for Testing β1 = 0

To estimate the curve, we use a discrete set of points, and because of symmetry, all values chosen for β1 can be nonnegative. The first question is at what point the curve flattens out just below 1. We might arbitrarily define the region of interest to be that in which the power is less than approximately 0.99. The abscissa of this point is the maximum β1 of interest. This point, say β1*, varies, depending on all of the factors in the study. We could work this out in the least squares case for uncontaminated normal errors using the noncentral Student's t distribution, but, for other cases, it is analytically intractable. Hence, we compute some preliminary Monte Carlo estimates to determine the maximum β1 for each factor combination in the study.
To do a careful job of fitting a curve using a relatively small number of points, we would choose points where the second derivative is changing rapidly and especially near points of inflection where the second derivative changes sign. Because the problem of determining these points for each combination of
(i, j, k, l, q) is not analytically tractable (otherwise, we would not be doing the study!), we may conveniently choose a set of points equally spaced between 0 and β1*. Let us decide on five such points for this example. It is not important that the β1*s be chosen with a great deal of care. The objective is that we be able to calculate two power curves between 0 and β1* that are meaningful for comparisons.

The Experiment

The observational units in the experiment are the values of the test statistics (9.3) and (9.4). The measurements are the binary variables corresponding to rejection of the hypothesis (9.2). At each point in the factor space, there will be m such observations. If z is the number of rejections observed, then the estimate of the power is z/m, and the variance of the estimator is π(1 − π)/m, where π is the true power at that point. (z is a realization of a binomial random variable with parameters m and π.) This leads us to a choice of the value of m. The coefficient of variation at any point is $\sqrt{(1-\pi)/(m\pi)}$, which increases as π decreases. At π = 0.50, a 5% coefficient of variation can be achieved with a sample of size 400. This yields a standard deviation of 0.025. There may be some motivation to choose a slightly larger value of m because we can assume that the minimum of π will be approximately the minimum of α. To achieve a 5% coefficient of variation at the point at which α = 0.05 would require a sample of size of approximately 160,000. That would correspond to a standard deviation of 0.0005, which is probably much smaller than we need. A sample size of 400 would yield a standard deviation of 0.005. Although that is large in a relative sense, it may be adequate for our purposes. Because this particular point (where β1 = 0) corresponds to the null hypothesis, however, we may choose a larger sample size, say 4000, at that special point. A reasonable choice therefore is a Monte Carlo sample size of 4000 at the null hypothesis and 400 at all other points. We will, however, conduct the experiment in such a way that we can combine the results of this experiment with independent results from a subsequent experiment.
The experiment is conducted by running a computer program. The main computation in the program is to determine the values of the test statistics and to compare them with their critical values to decide on the hypothesis. These computations need to be performed at each setting of the factors and for any given realization of the random sample. We design a program that allows us to loop through the settings of the factors and, at each factor setting, to use a random sample. The result is a nest of loops. The program may be stopped and restarted, so we need to be able to control the seeds (see Section 8.2, page 286).
Recalling that the purpose of our experiment is to obtain estimates, we may now consider any appropriate methods of reducing the variance of those estimates. There is not much opportunity to apply methods of variance reduction discussed in Section 7.5, but at least we might consider at what points to use
common realizations of the pseudorandom variables. Because the things that we want to compare most directly are the powers of the tests, we perform the tests on the same pseudorandom datasets. Also, because we are interested in the shape of the power curves, we may want to use the same pseudorandom datasets at each value of β1; that is, to use the same set of errors in the model (9.1). Finally, following similar reasoning, we may use the same pseudorandom datasets at each setting of the pattern of the independent variable. This implies that our program of nested loops has the structure shown in Figure 9.5.

   Initialize a table of counts.
   Fix the data sample size. (Loop over the sample sizes n = 20, n = 200, and n = 2000.)
      Generate a set of residuals for the linear regression model (9.1). (This is the loop of m Monte Carlo replications.)
         Fix the pattern of the independent variable. (Loop over patterns P1, P2, and P3.)
            Choose the distribution of the error term. (Loop over the distributions D1, D2, D3, D4, D5, and D6.)
               For each value of β1, generate a set of observations (the y values) for the linear regression model (9.1), and perform the tests using both procedures and at both levels of significance. Record results.
            End distributions loop.
         End patterns loop.
      End Monte Carlo loop.
   End sample size loop.
   Perform computations of summary statistics.

Figure 9.5: Program Structure for the Monte Carlo Experiment

After writing a computer program with this structure, the first thing is to test the program on a small set of problems and determine appropriate values of β1*. We should compare the results with known values at a few points. (As mentioned earlier, the only points that we can work out correspond to the normal case with the ordinary t statistic. One of these points, at β1 = 0, is easily checked.) We can also check the internal consistency of the results. For example, does the power curve increase? We must be careful, of course, in applying such consistency checks because we do not know the behavior of the tests in most cases.

Reporting the Results

The report of this Monte Carlo study should address as completely as possible the results of interest. The relative values of the power are the main points
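A runnable R miniature of the loop structure in Figure 9.5 follows; it is our sketch of the organization only, using just the least squares test, one error distribution, two made-up patterns, and a tiny Monte Carlo size, not the full experiment.

   m <- 10                                   # tiny Monte Carlo size for the sketch
   beta1.grid <- seq(0, 1, length.out = 5)   # made-up values of beta_1
   reject <- 0
   set.seed(100)
   for (n in c(20, 200)) {                   # sample-size loop (shortened)
     for (rep in 1:m) {                      # Monte Carlo loop
       z <- rnorm(n)                         # one set of errors, reused below
       for (pat in 1:2) {                    # pattern loop
         x <- if (pat == 1) seq(0, 10, length.out = n)
              else c(seq(0, 10, length.out = n - 1), 50)  # a high-leverage point
         for (b1 in beta1.grid) {
           y  <- 1 + b1*x + z                # model (9.1) with beta0 = 1
           tv <- summary(lm(y ~ x))$coefficients[2, 3]
           reject <- reject + (abs(tv) > qt(0.975, n - 2))
         }
       }
     }
   }
   reject                                    # total rejections over all settings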
of interest. The estimated power at β1 = 0 is of interest. This is the actual significance level of the test, and how it compares to the nominal level α is of particular interest. The presentation should be in a form easily assimilated by the reader. This may mean graphs similar to Figure 9.4, except only the nonnegative half, and with the tick marks on the horizontal axis. Two graphs, for the two test procedures, should be shown on the same set of axes. It is probably counterproductive to show a graph for each factor setting. (There are 108 combinations of factor settings.) In addition to the graphs, tables may allow presentation of a large amount of information in a compact format. The Monte Carlo study should be described so carefully that the study could be replicated exactly. This means specifying the factor settings, the loop nesting, the software and computer used, the seed used, and the Monte Carlo sample size. There should also be at least a simple statement explaining the choice of the Monte Carlo sample size. As mentioned earlier, the statistical literature is replete with reports of Monte Carlo studies. Some of these reports (and, likely, the studies themselves) are woefully deficient. An example of a careful Monte Carlo study and a good report of the study are given by Kleijnen (1977). He designed, performed, and reported on a Monte Carlo study to investigate the robustness of a multiple ranking procedure. In addition to reporting on the study of the question at hand, another purpose of the paper was to illustrate the methods of a Monte Carlo study.
Exercises

9.1. Write a computer program to implement the Monte Carlo experiment described in Section 9.3. The S-Plus functions lsfit and l1fit or the IMSL Fortran subroutines rline and rlav can be used to calculate the fits. See Chapter 8 for discussions of other software that you may use in the program.

9.2. Choose a recent issue of the Journal of the American Statistical Association and identify five articles that report on Monte Carlo studies of statistical methods. In each case, describe the Monte Carlo experiment.
(a) What are the factors in the experiment?
(b) What is the measured response?
(c) What is the design space (that is, the set of factor settings)?
(d) What random number generators were used?
(e) Critique the report in each article. Did the author(s) justify the sample size? Did the author(s) report variances or confidence intervals? Did the author(s) attempt to reduce the experimental variance?
9.3. Select an article that you identified in Exercise 9.2 that concerns a statistical method that you understand and that interests you. Choose a design space that is not a subspace of that used in the article but has a nonnull intersection with it, and perform a similar experiment. Compare your results with those reported in the article.
Appendix A
Notation and Definitions

All notation used in this work is "standard", and in most cases it conforms to the ISO conventions. (The notable exception is the notation for vectors.) I have opted for simple notation, which, of course, results in a one-to-many map of notation to object classes. Within a given context, however, the overloaded notation is generally unambiguous. I have endeavored to use notation consistently.
This appendix is not intended to be a comprehensive listing of definitions. The subject index, beginning on page 377, is a more reliable set of pointers to definitions, except for symbols that are not words.
General Notation

Uppercase italic Latin and Greek letters, A, B, E, Λ, and so on, are generally used to represent either matrices or random variables. Random variables are usually denoted by letters nearer the end of the Latin alphabet, X, Y, Z, and by the Greek letter E. Parameters in models (that is, unobservables in the models), whether or not they are considered to be random variables, are generally represented by lowercase Greek letters. Uppercase Latin and Greek letters, especially P, in general, and Φ, for the normal distribution, are also used to represent cumulative distribution functions. Also, uppercase Latin letters are used to denote sets.
Lowercase Latin and Greek letters are used to represent ordinary scalar or vector variables and functions. No distinction in the notation is made between scalars and vectors; thus, β may represent a vector, and βi may represent the ith element of the vector β. In another context, however, β may represent a scalar. All vectors are considered to be column vectors, although we may write a vector as x = (x1, x2, . . . , xn). Transposition of a vector or a matrix is denoted by a superscript T.
Uppercase calligraphic Latin letters, F, V, W, and so on, are generally used to represent either vector spaces or transforms.
Subscripts generally represent indexes to a larger structure; for example, $x_{ij}$ may represent the (i, j)th element of a matrix, X. A subscript in parentheses represents an order statistic. A superscript in parentheses represents an iteration; for example, $x_i^{(k)}$ may represent the value of $x_i$ at the $k$th step of an iterative process. The following are some examples:

$x_i$ — The ith element of a structure (including a sample, which is a multiset).
$x_{(i)}$ — The ith order statistic.
$x^{(i)}$ — The value of x at the ith iteration.
Realizations of random variables and placeholders in functions associated with random variables are usually represented by lowercase letters corresponding to the uppercase letters; thus, ε may represent a realization of the random variable E.
A single symbol in an italic font is used to represent a single variable. A Roman font or a special font is often used to represent a standard operator or a standard mathematical structure. Sometimes, a string of symbols in a Roman font is used to represent an operator (or a standard function); for example, exp represents the exponential function, but a string of symbols in an italic font on the same baseline should be interpreted as representing a composition (probably by multiplication) of separate objects; for example, exp represents the product of e, x, and p.
A fixed-width font is used to represent computer input or output; for example, a = bx + sin(c). In computer text, a string of letters or numerals with no intervening spaces or other characters, such as bx above, represents a single object, and there is no distinction in the font to indicate the type of object.
Some important mathematical structures and other objects are:

IR — The field of reals or the set over which that field is defined.
IRᵈ — The usual d-dimensional vector space over the reals or the set of all d-tuples with elements in IR.
IRᵈ₊ — The set of all d-tuples with positive real elements.
IC — The field of complex numbers or the set over which that field is defined.
ZZ — The ring of integers or the set over which that ring is defined.
IG(n) — A Galois field defined on a set with n elements.
C⁰, C¹, C², . . . — The set of continuous functions, the set of functions with continuous first derivatives, and so forth.
i — The imaginary unit, √−1.
Computer Number Systems

Computer number systems are used to simulate the more commonly used number systems. It is important to realize that they have different properties, however. Some notation for computer number systems follows.

IF — The set of floating-point numbers with a given precision, on a given computer system, or this set together with the four operators +, -, *, and /. IF is similar to IR in some useful ways; it is not, however, closed under the two basic operations, and not all reciprocals of the elements exclusive of the additive identity exist, so it is clearly not a field.
II — The set of fixed-point numbers with a given length, on a given computer system, or this set together with the four operators +, -, *, and /. II is similar to ZZ in some useful ways; it is not, however, closed under the two basic operations, so it is clearly not a ring.
e_min and e_max — The minimum and maximum values of the exponent in the set of floating-point numbers with a given length.
ε_min and ε_max — The minimum and maximum spacings around 1 in the set of floating-point numbers with a given length.
ε or ε_mach — The machine epsilon, the same as ε_min.
[·]_c — The computer version of the object ·.
NaN — Not-a-Number.
Notation Relating to Random Variables

A common function used with continuous random variables is a density function, and a common function used with discrete random variables is a probability function. The more fundamental function for either type of random variable is the cumulative distribution function, or CDF. The CDF of a random variable X, denoted by $P_X(x)$ or just by $P(x)$, is defined by
$$P(x) = \Pr(X \le x),$$
where "Pr", or "probability", can be taken here as a primitive (it is defined in terms of a measure). For vectors (of the same length), "X ≤ x" means that each element of X is less than or equal to the corresponding element of x. Both the CDF and the density or probability function for a d-dimensional random variable are defined over IRᵈ. (It is unfortunately necessary to state that "P(x)" means the "function P evaluated at x", and likewise "P(y)" means the same "function P evaluated at y" unless P has been redefined. Using a different expression as the argument does not redefine the function despite the sloppy convention adopted by some statisticians—including myself sometimes!)
The density for a continuous random variable is just the derivative of the CDF (if it exists). The CDF is therefore the integral. To keep the notation simple, we likewise consider the probability function for a discrete random variable to be a type of derivative (a Radon–Nikodym derivative) of the CDF. Instead of expressing the CDF of a discrete random variable as a sum over a countable set, we often also express it as an integral. (In this case, however, the integral is over a set whose ordinary Lebesgue measure is 0.)
A useful analog of the CDF for a random sample is the empirical cumulative distribution function, or ECDF. For a sample of size n, the ECDF is
$$P_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{(-\infty,x]}(x_i)$$
for the indicator function $I_{(-\infty,x]}(\cdot)$.
Functions and operators such as Cov and E that are commonly associated with Latin letters or groups of Latin letters are generally represented by that letter in a Roman font.

Pr(A) — The probability of the event A.
$p_X(\cdot)$ or $P_X(\cdot)$ — The probability density function (or probability function), or the cumulative probability function, of the random variable X.
$p_{XY}(\cdot)$ or $P_{XY}(\cdot)$ — The joint probability density function (or probability function), or the joint cumulative probability function, of the random variables X and Y.
$p_{X|Y}(\cdot)$ or $P_{X|Y}(\cdot)$ — The conditional probability density function (or probability function), or the conditional cumulative probability function, of the random variable X given the random variable Y (these functions are random variables).
$p_{X|y}(\cdot)$ or $P_{X|y}(\cdot)$ — The conditional probability density function (or probability function), or the conditional cumulative probability function, of the random variable X given the realization y. Sometimes, the notation above is replaced by a similar notation in which the arguments indicate the nature of the distribution; for example, p(x, y) or p(x|y).
$p_{\theta}(\cdot)$ or $P_{\theta}(\cdot)$ — The probability density function (or probability function), or the cumulative probability function, of the distribution characterized by the parameter θ.
$Y \sim D_X(\theta)$ — The random variable Y is distributed as $D_X(\theta)$, where X is the name of a random variable associated with the distribution, and θ is a parameter of the distribution. The subscript may take forms similar to those used in the density and distribution functions, such as X|y, or it may be omitted. Alternatively, in place of $D_X$, a symbol denoting a specific distribution may be used. An example is Z ∼ N(0, 1), which means that Z has a normal distribution with mean 0 and variance 1.
CDF — A cumulative distribution function.
ECDF — An empirical cumulative distribution function.
i.i.d. — Independent and identically distributed.
$X^{(i)} \xrightarrow{d} X$ or $X_i \xrightarrow{d} X$ — The sequence of random variables $X^{(i)}$ or $X_i$ converges in distribution to the random variable X. (The difference in the notation $X^{(i)}$ and $X_i$ is generally unimportant. The former notation is often used to emphasize the iterative nature of a process.)
E(g(X)) — The expected value of the function g of the random variable X. The notation $E_P(\cdot)$, where P is a cumulative distribution function or some other identifier of a probability distribution, is sometimes used to indicate explicitly the distribution with respect to which the expectation is evaluated.
V(g(X)) — The variance of the function g of the random variable X. The notation $V_P(\cdot)$ is also often used.
Cov(X, Y) — The covariance of the random variables X and Y. The notation $\mathrm{Cov}_P(\cdot, \cdot)$ is also often used.
Cov(X) — The variance-covariance matrix of the vector random variable X.
Corr(X, Y) — The correlation of the random variables X and Y. The notation $\mathrm{Corr}_P(\cdot, \cdot)$ is also often used.
Corr(X) — The correlation matrix of the vector random variable X.
Bias(T, θ) or Bias(T) — The bias of the estimator T (as an estimator of θ); that is,
$$\mathrm{Bias}(T, \theta) = E(T) - \theta.$$
MSE(T, θ) or MSE(T) — The mean squared error of the estimator T (as an estimator of θ); that is,
$$\mathrm{MSE}(T, \theta) = \left(\mathrm{Bias}(T, \theta)\right)^2 + V(T).$$
General Mathematical Functions and Operators

Functions such as sin, max, span, and so on that are commonly associated with groups of Latin letters are generally represented by those letters in a Roman font. Generally, the argument of a function is enclosed in parentheses: sin(x). Often, for the very common functions, the parentheses are omitted: sin x. In expressions involving functions, parentheses are generally used for clarity, for example, (E(X))² instead of E²(X). Operators such as d (the differential operator) that are commonly associated with a Latin letter are generally represented by that letter in a Roman font.

|x| — The modulus of the real or complex number x; if x is real, |x| is the absolute value of x.
⌈x⌉ — The ceiling function evaluated at the real number x: ⌈x⌉ is the smallest integer greater than or equal to x.
⌊x⌋ — The floor function evaluated at the real number x: ⌊x⌋ is the largest integer less than or equal to x.
#S — The cardinality of the set S.
$I_S(\cdot)$ — The indicator function:
$$I_S(x) = 1 \text{ if } x \in S; \quad = 0 \text{ otherwise.}$$
If x is a scalar, the set S is often taken as the interval (−∞, y], and, in this case, the indicator function is the Heaviside function, H, evaluated at the difference of the argument and the upper bound on the interval: $I_{(-\infty,y]}(x) = H(y - x)$. (An alternative definition of the Heaviside function is the same as this, except that H(0) = 1/2.) In higher dimensions, the set S is often taken as the product set,
$$A_d = (-\infty, y_1] \times (-\infty, y_2] \times \cdots \times (-\infty, y_d] = A_1 \times A_2 \times \cdots \times A_d,$$
and, in this case, $I_{A_d}(x) = I_{A_1}(x_1) I_{A_2}(x_2) \cdots I_{A_d}(x_d)$, where $x = (x_1, x_2, \ldots, x_d)$. The derivative of the indicator function is the Dirac delta function, δ(·).
The Dirac delta “function”, defined by δ(x) = 0 for x = 0, 0016
and
∞
δ(t) dt = 1. −∞
The Dirac delta function is not a function in the usual sense. For any continuous function f , we have the useful fact 0016 ∞ 0016 ∞ f (y) dI(−∞,y] (x) = f (y) δ(y − x) dy −∞
−∞
= f (x). minf (·) or min(S)
The minimum value of the real scalar-valued function f , or the smallest element in the countable set of real numbers S.
argminf (·)
The value of the argument of the real scalar-valued function f that yields its minimum value.
320
APPENDIX A. NOTATION AND DEFINITIONS
⊕
Bitwise binary exclusive-or (see page 39).
O(f (n))
Big O; g(n) = O(f (n)) means that there exists a positive constant M such that |g(n)| ≤ M |f (n)| as n → ∞. g(n) = O(1) means that g(n) is bounded from above.
d
The differential operator. The derivative with respect to the variable x is denoted by d/dx.
f′, f″, . . . , f⁽ᵏ⁾ — For the scalar-valued function f of a scalar variable, differentiation (with respect to an implied variable) taken on the function once, twice, . . ., k times.
x̄
The mean of a sample of objects generically denoted by x.
x•
The sum of the elements in the object x. More generally, $x_{i \bullet k} = \sum_j x_{ijk}$.
x−
The multiplicative inverse of x with respect to some modulus (see page 36).
Special Functions

log x — The natural logarithm evaluated at x.
sin x — The sine evaluated at x (in radians) and similarly for other trigonometric functions.
x! — The factorial of x. If x is a positive integer, x! = x(x−1) · · · 2 · 1. For other values of x, except negative integers, x! is often defined as x! = Γ(x + 1).
Γ(α) — The (complete) gamma function. For α not equal to a nonpositive integer,
$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\, dt.$$
We have the useful relationship Γ(α) = (α − 1)!. An important argument is 1/2, and Γ(1/2) = √π.
$\Gamma_x(\alpha)$ — The incomplete gamma function:
$$\Gamma_x(\alpha) = \int_0^{x} t^{\alpha-1} e^{-t}\, dt.$$
B(α, β) — The (complete) beta function:
$$B(\alpha, \beta) = \int_0^{1} t^{\alpha-1} (1-t)^{\beta-1}\, dt,$$
where α > 0 and β > 0. A useful relationship is
$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}.$$
$B_x(\alpha, \beta)$ — The incomplete beta function:
$$B_x(\alpha, \beta) = \int_0^{x} t^{\alpha-1} (1-t)^{\beta-1}\, dt.$$
Appendix B
Solutions and Hints for Selected Exercises

1.5.
With a = 17, the correlation of pairs of successive numbers should be about 0.09, and the plot should show 17 lines. With a = 85, the correlation of lag 1 is about 0.03, but the correlation of lag 2 is about −0.09.
1.8.
35 planes for 65 541 and 15 planes for 65 533.
1.10.
950 706 376, 129 027 171, 1 728 259 899, 365 181 143, 1 966 843 080, 1 045 174 992, 636 176 783, 1 602 900 997, 640 853 092, 429 916 489.
1.13.
We seek x₀ such that 16 807x₀ − (2³¹ − 1)c₁ = 2³¹ − 2 for some integer c₁. First, observe that 2³¹ − 2 is equivalent to −1, so we use Euler's method (see, e.g., Ireland and Rosen, 1991, or Fang and Wang, 1994) with that simpler value and write
$$x_0 = \frac{(2^{31}-1)c_1 - 1}{16\,807} = 127\,773 c_1 + \frac{2836 c_1 - 1}{16\,807}.$$
Because the latter term must also be an integer, we write 16 807c₂ = 2836c₁ − 1, or
$$c_1 = 5c_2 + \frac{2627 c_2 + 1}{2836}$$
for some integer c₂. Continuing,
$$c_2 = c_3 + \frac{209 c_3 - 1}{2627}, \quad c_3 = 12 c_4 + \frac{119 c_4 + 1}{209}, \quad c_4 = c_5 + \frac{90 c_5 - 1}{119}, \quad c_5 = c_6 + \frac{29 c_6 + 1}{90},$$
$$c_6 = 3 c_7 + \frac{3 c_7 - 1}{29}, \quad c_7 = 9 c_8 + \frac{2 c_8 + 1}{3}, \quad c_8 = c_9 + \frac{c_9 - 1}{2}.$$
Letting c₉ = 1, we can backsolve to get x₀ = 739 806 647.
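A one-line numerical check of this backsolve (ours, not part of the text) confirms that the generator moves from x₀ directly to 2³¹ − 2; the product stays below 2⁵³, so the arithmetic is exact in R.

   m  <- 2^31 - 1
   x0 <- 739806647
   (16807 * x0) %% m == m - 1   # TRUE: the next value is 2^31 - 2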
1.14.
Using Maple, for example,
   > pr := 0:
   > while pr < 8191 do
   >   pr := primroot(pr, 8191)
   > od;
yields the 1728 primitive roots, starting with the smallest one, 17, and going through the largest, 8180. To use primroot, you may first have to attach the number theory package: with(numtheory):.
1.15.
0.5.
2.2c.
The distribution is degenerate with probability 1 for r = min(n, m); that is, the matrix is of full rank with probability 1.
2.3.
Out of the 100 trials, 97 times the maximum element is in position 1311. The test is not really valid because the seeds are all relatively small and are very close together. Try the same test but with 100 randomly generated seeds.
4.1a.
X is a random variable with an absolutely continuous distribution function P. Let Y be the random variable P(X). Then, for 0 ≤ t ≤ 1, using the existence of P⁻¹,
$$\Pr(Y \le t) = \Pr(P(X) \le t) = \Pr(X \le P^{-1}(t)) = P(P^{-1}(t)) = t.$$
Hence, Y has a U(0, 1) distribution.

4.2.
Let Z be the random variable delivered. For any x, because Y (from the density g) and U are independent, we have
$$\Pr(Z \le x) = \Pr\!\left(Y \le x \;\middle|\; U \le \frac{p(Y)}{c\,g(Y)}\right) = \frac{\int_{-\infty}^{x} \int_{0}^{p(t)/c g(t)} g(t)\, ds\, dt}{\int_{-\infty}^{\infty} \int_{0}^{p(t)/c g(t)} g(t)\, ds\, dt} = \int_{-\infty}^{x} p(t)\, dt,$$
the distribution function corresponding to p. Differentiating this quantity with respect to x yields p(x).

4.4c.
Using the relationship
$$\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \le \frac{1}{\sqrt{2\pi}}\, e^{1/2 - |x|}$$
(see Devroye, 1986a), we have the following algorithm, after simplification.
1. Generate x from the double exponential, and generate u from U(0, 1).
2. If x² + 1 − 2|x| ≤ −2 log u, then deliver x; otherwise, go to step 1.
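A direct R transcription of these two steps (our sketch, not code from the text; the double exponential deviate is built from rexp and a random sign):

   rnorm.rej <- function() {
     repeat {
       x <- rexp(1) * sample(c(-1, 1), 1)   # a double exponential deviate
       u <- runif(1)
       if (x^2 + 1 - 2*abs(x) <= -2*log(u)) return(x)  # acceptance test
     }
   }
   z <- replicate(10000, rnorm.rej())       # approximately N(0, 1)
   c(mean(z), var(z))                       # near 0 and 1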
4.5.
As x → ∞, there is no c such that cg(x) ≥ p(x), where g is the normal density and p is the exponential density.
4.7a.
E(T ) = c; V(T ) = c2 − c. (Note that c ≥ 1.)
4.8.
For any t, we have
$$\Pr(X \le t) = \Pr(X \le s + rh) \quad (\text{for } 0 \le r \le 1)$$
$$= \Pr\!\left(U \le r \mid V \le U + p(s + hU)/b\right) = \frac{\int_0^{r} \int_u^{u + p(s+hu)/b} 2\, dv\, du}{\int_0^{1} \int_u^{u + p(s+hu)/b} 2\, dv\, du} = \frac{\int_0^{r} \left(p(s + hu)/b\right) du}{\int_0^{1} \left(p(s + hu)/b\right) du} = \int_s^{t} p(x)\, dx,$$
where all of the symbols correspond to those in Algorithm 4.7 with the usual convention of uppercase representing random variables and lowercase representing constants or realizations of random variables.
5.2b.
We can consider only the case in which τ ≥ 0; otherwise, we could make use of the symmetry of the normal distribution and split the algorithm into two regions. Also, for simplicity, we can generate truncated normals with µ = 0 and σ² = 1 and then shift and scale just as we do for the full normal distribution. The probability of acceptance is the ratio of the area under the truncated exponential (the majorizing function) to the area under the truncated normal density. For an exponential with parameter λ and truncated at τ, the density is
$$g(x) = \lambda e^{-\lambda(x-\tau)}.$$
To scale this so that it majorizes the truncated normal density requires a constant c that does not depend on λ. We can write the probability of acceptance as
$$c \lambda e^{\lambda\tau - \lambda^2/2}.$$
Maximizing this quantity (by taking the derivative and equating to 0) yields the equation
$$\lambda^2 - \lambda\tau - 1 = 0.$$
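Solving this quadratic for its positive root gives the optimal rate explicitly; the one-liner below is our illustration.

   lambda.opt <- function(tau) (tau + sqrt(tau^2 + 4))/2   # positive root
   lambda.opt(c(0, 1, 2))   # 1.000, 1.618, 2.414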
5.3.
Use the fact that U and 1 − U have the same distribution.
5.6b.
A simple program using the IMSL routine bnrdf can be used to compute r. Here is a fragment of code that will work:

   10 pl = bnrdf(z1,z2,rl)
      ph = bnrdf(z1,z2,rh)
      if (abs(ph-pl) .le. eps) go to 99
      rt = rl + (rh-rl)/2.
      pt = bnrdf(z1,z2,rt)
      if (pt .gt. prob) then
         rh = rt
      else
         rl = rt
      endif
      go to 10
   99 continue
      print *, rl

5.7b.
We need the partitions of Σ⁻¹:
$$\Sigma^{-1} = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} = \begin{pmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{pmatrix}.$$
Now,
$$T_{11}^{-1} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{12}^{T}$$
(see Gentle, 1998, page 61).
The conditional distribution of X₁ given X₂ = x₂ is $N_{d_1}\!\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; T_{11}^{-1}\right)$ (see any book on multivariate distributions, such as Kotz, Balakrishnan, and Johnson, 2000). Hence, first, take y1 as rnorm(d1). Then,
$$x_1 = T_{11}^{-1/2} y_1 + \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),$$
where $T_{11}^{-1/2}$ is the Cholesky factor of $T_{11}^{-1}$, that is, of $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}^{T}$.
If Y₁ is a d₁-variate random variable with a standard circular normal distribution, and X₁ has the given relationship, then
$$E(X_1 \mid X_2 = x_2) = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$$
and
$$V(X_1 \mid X_2 = x_2) = T_{11}^{-1/2}\, V(Y_1)\, T_{11}^{-1/2} = T_{11}^{-1}.$$
6.2a.
The problem with random sampling using a pseudorandom number generator is that the fixed relationships in the generator must be ignored – else there can be no simple random sample larger than 1. On the other hand, if these fixed relationships are ignored, then it does not make sense to speak of a period.
7.2b.
The set is a random sample from the distribution with density f .
7.3b.
The integral can be reduced to
$$\int_0^{2} \sqrt{\frac{\pi}{y}}\, \cos(\pi y)\, dy.$$
Generate yᵢ as 2uᵢ, where uᵢ are from U(0, 1), and estimate the integral as
$$\frac{2\sqrt{\pi}}{n} \sum \frac{\cos(\pi y_i)}{\sqrt{y_i}}.$$

7.6.
The order of the variance is O(n−2 ). The order is obviously dependent on the dimension of the integral, however, and, in higher dimensions, it is not competitive with the crude Monte Carlo method.
7.9a.
Generate xᵢ from a gamma(3, 2) distribution, and take your estimator as
$$16\, \frac{\sum \sin(\pi x_i)}{n}.$$
7.10b. The optimum is l = d.
7.10d. An unbiased estimator for θ is
$$\frac{d^2 (n_1 + n_2)}{(dl - l^2)\, n}.$$
The optimum is l = d.
328
APPENDIX B. SOLUTIONS AND HINTS FOR EXERCISES
7.14a.
$$E_{\hat{P}}(\bar{x}^*) = E_{\hat{P}}\!\left(\frac{1}{n}\sum_i x_i^*\right) = \frac{1}{n}\sum_i E_{\hat{P}}(x_i^*) = \frac{1}{n}\sum_i \bar{x} = \bar{x}.$$
Note that the empirical distribution is a conditional distribution, given the sample. With the sample fixed, x̄ is a "parameter" rather than a "statistic".
7.14b.
$$E_P(\bar{x}^*) = E_P\!\left(\frac{1}{n}\sum_i x_i^*\right) = \frac{1}{n}\sum_i E_P(x_i^*) = \frac{1}{n}\sum_i \mu = \mu.$$
Alternatively,
$$E_P(\bar{x}^*) = E_P\!\left(E_{\hat{P}}(\bar{x}^*)\right) = E_P(\bar{x}) = \mu.$$
7.14c. First, note that
$$E_{\hat{P}}(\bar{x}^*_j) = \bar{x}, \qquad V_{\hat{P}}(\bar{x}^*_j) = \frac{1}{n}\,\frac{1}{n}\sum_i (x_i - \bar{x})^2,$$
and
$$V_{\hat{P}}\!\left(\overline{\bar{x}^*}\right) = \frac{1}{m n^2}\sum_i (x_i - \bar{x})^2.$$
Now,
$$E_{\hat{P}}(V) = \frac{1}{m-1}\, E_{\hat{P}}\!\left(\sum_j \left(\bar{x}^*_j - \overline{\bar{x}^*}\right)^2\right) = \frac{1}{m-1}\, E_{\hat{P}}\!\left(\sum_j \bar{x}^{*2}_j - m\,\overline{\bar{x}^*}^{\,2}\right)$$
$$= \frac{1}{m-1}\left(m\bar{x}^2 + \frac{m}{n^2}\sum_i (x_i - \bar{x})^2 - m\bar{x}^2 - \frac{m}{m n^2}\sum_i (x_i - \bar{x})^2\right)$$
$$= \frac{m}{m-1}\left(\frac{1}{n^2} - \frac{1}{m n^2}\right)\sum_i (x_i - \bar{x})^2 = \frac{1}{n^2}\sum_i (x_i - \bar{x})^2 = \frac{1}{n}\,\sigma^2_{\hat{P}}.$$
7.14d.
$$E_P(V) = E_P\!\left(E_{\hat{P}}(V)\right) = E_P\!\left(\frac{1}{n}\sum_i (x_i - \bar{x})^2 / n\right) = \frac{1}{n}\,\frac{n-1}{n}\,\sigma^2_P.$$
Bibliography

As might be expected, the literature in the interface of computer science, numerical analysis, and statistics is quite diverse, and articles on random number generation and Monte Carlo methods are likely to appear in journals devoted to quite different disciplines. There are at least ten journals and serials with titles that contain some variants of both "computing" and "statistics", but there are far more journals in numerical analysis and in areas such as "computational physics", "computational biology", and so on that publish articles relevant to the fields of statistical computing and computational statistics. Many of the methods of computational statistics involve random number generation and Monte Carlo methods. The journals in the mainstream of statistics also have a large proportion of articles in the fields of statistical computing and computational statistics because, as we suggested in the preface, recent developments in statistics and in the computational sciences have paralleled each other to a large extent.
There are two well-known learned societies with a primary focus in statistical computing: the International Association for Statistical Computing (IASC), which is an affiliated society of the International Statistical Institute, and the Statistical Computing Section of the American Statistical Association (ASA). The Statistical Computing Section of the ASA has a regular newsletter carrying news and notices as well as articles on practicum. Also, the activities of the Society for Industrial and Applied Mathematics (SIAM) are often relevant to computational statistics.
There are two regular conferences in the area of computational statistics: COMPSTAT, held biennially in Europe and sponsored by the IASC, and the Interface Symposium, generally held annually in North America and sponsored by the Interface Foundation of North America with cooperation from the Statistical Computing Section of the ASA.
In addition to literature and learned societies in the traditional forms, an important source of communication and a repository of information are computer databases and forums. In some cases, the databases duplicate what is available in some other form, but often the material and the communications facilities provided by the computer are not available elsewhere.
Literature in Computational Statistics

In the Library of Congress classification scheme, most books on statistics, including statistical computing, are in the QA276 section, although some are classified under H, HA, and HG. Numerical analysis is generally in QA279 and computer science in QA76. Many of the books in the interface of these disciplines are classified in these or other places within QA.

Current Index to Statistics, published annually by the American Statistical Association and the Institute of Mathematical Statistics, contains both author and subject indexes that are useful in finding journal articles or books in statistics. The Index is available in hard copy and on CD-ROM.

The Association for Computing Machinery (ACM) publishes an annual index, by author, title, and keyword, of the literature in the computing sciences.

Mathematical Reviews, published by the American Mathematical Society (AMS), contains brief reviews of articles in all areas of mathematics. The areas of “Statistics”, “Numerical Analysis”, and “Computer Science” contain reviews of articles relevant to computational statistics. The papers reviewed in Mathematical Reviews are categorized according to a standard system that has slowly evolved over the years. In this taxonomy, called the AMS MR classification system, “Statistics” is 62Xyy; “Numerical Analysis”, including random number generation, is 65Xyy; and “Computer Science” is 68Xyy. (“X” represents a letter and “yy” represents a two-digit number.) Mathematical Reviews is available to subscribers via the World Wide Web at MathSciNet:
http://www.ams.org/mathscinet/

There are various handbooks of mathematical functions and formulas that are useful in numerical computations. Three that should be mentioned are Abramowitz and Stegun (1964), Spanier and Oldham (1987), and Thompson (1997). Anyone doing serious scientific computations should have ready access to at least one of these volumes.

Almost all journals in statistics have occasional articles on computational statistics and statistical computing. The following is a list of journals, proceedings, and newsletters that emphasize this field.

ACM Transactions on Mathematical Software, published quarterly by the ACM (Association for Computing Machinery). This journal publishes algorithms in Fortran and C. The ACM collection of algorithms is sometimes called CALGO. The algorithms published during the period 1975 through 1999 are available on a CD-ROM from ACM. Most of the algorithms are available through netlib at
http://www.netlib.org/liblist.html

ACM Transactions on Modeling and Computer Simulation, published quarterly by the ACM.

Applied Statistics, published quarterly by the Royal Statistical Society. (Until 1998, it included algorithms in Fortran. Some of these algorithms, with corrections, were collected by Griffiths and Hill, 1985. Most of the algorithms are available through statlib at Carnegie Mellon University.)

Communications in Statistics — Simulation and Computation, published quarterly by Marcel Dekker. (Until 1996, it included algorithms in Fortran. Until 1982, this journal was designated as Series B.)

Computational Statistics, published quarterly by Physica-Verlag (formerly called Computational Statistics Quarterly).

Computational Statistics. Proceedings of the xxth Symposium on Computational Statistics (COMPSTAT), published biennially by Physica-Verlag. (It is not refereed.)

Computational Statistics & Data Analysis, published by North Holland. The number of issues per year varies. (This is also the official journal of the International Association for Statistical Computing, and as such incorporates the Statistical Software Newsletter.)

Computing Science and Statistics. This is an annual publication containing papers presented at the Interface Symposium. Until 1992, these proceedings were named Computer Science and Statistics: Proceedings of the xxth Symposium on the Interface. (The 24th symposium was held in 1992.) In 1997, Volume 29 was published in two issues: Number 1, which contains the papers of the regular Interface Symposium, and Number 2, which contains papers from another conference. The two numbers are not sequentially paginated. Since 1999, the proceedings have been published only in CD-ROM form, by the Interface Foundation of North America. (It is not refereed.)

Journal of Computational and Graphical Statistics, published quarterly by the American Statistical Association.

Journal of Statistical Computation and Simulation, published irregularly in four numbers per volume by Gordon and Breach.

Proceedings of the Statistical Computing Section, published annually by the American Statistical Association. (It is not refereed.)

SIAM Journal on Scientific Computing, published bimonthly by SIAM. This journal was formerly SIAM Journal on Scientific and Statistical Computing. (Is this a step backward?)

Statistical Computing & Graphics Newsletter, published quarterly by the Statistical Computing and the Statistical Graphics Sections of the American Statistical Association. (It is not refereed and is not generally available in university libraries.)

Statistics and Computing, published quarterly by Chapman & Hall.

There are two journals whose contents are primarily in the subject area of random number generation, simulation, and Monte Carlo methods: ACM Transactions on Modeling and Computer Simulation (Volume 1 appeared in 1992) and Monte Carlo Methods and Applications (Volume 1 appeared in 1995).

There has been a series of conferences concentrating on this area (with an emphasis on quasirandom methods). The first International Conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing was held in Las Vegas, Nevada, in 1994. The fifth was held in Singapore in 2002.
The proceedings of the conferences have been published in the Lecture Notes in Statistics series of Springer-Verlag. The proceedings of the first conference were published as Niederreiter and Shiue (1995); those of the second as Niederreiter et al. (1998); those of the third as Niederreiter and Spanier (1999); and those of the fourth as Fang, Hickernell, and Niederreiter (2002).

The proceedings of the CRYPTO conferences often contain interesting articles on uniform random number generation, with an emphasis on cryptographic applications. These proceedings are published in the Lecture Notes in Computer Science series of Springer-Verlag under the name Proceedings of CRYPTO XX, where XX is a two-digit number representing the year of the conference.

There are a number of textbooks, monographs, and survey articles on random number generation and Monte Carlo methods. Some of particular note (listed alphabetically) are Bratley, Fox, and Schrage (1987), Dagpunar (1988), Deák (1990), Devroye (1986a), Fishman (1996), Knuth (1998), L’Ecuyer (1990), L’Ecuyer and Hellekalek (1998), Lewis and Orav (1989), Liu (2001), Morgan (1984), Niederreiter (1992, 1995c), Ripley (1987), Robert and Casella (1999), and Tezuka (1995).
World Wide Web, News Groups, List Servers, and Bulletin Boards

The best way of storing information is in a digital format that can be accessed by computers. In some cases, the best way for people to access information is by computers. In other cases, the best way is via hard copy, which means that the information stored on the computer must go through a printing process resulting in books, journals, or loose pages.

The references that I have cited in this text are generally traditional books, journal articles, or compact discs. This usually means that the material has been reviewed by someone other than the author. It also means that the author possibly has newer thoughts on the same material.

The Internet provides a mechanism for the dissemination of large volumes of information that can be updated readily. The ease of providing material electronically is also the source of the major problem with the material: it is often half-baked and has not been reviewed critically. Another reason that I have refrained from making frequent reference to material available over the Internet is the unreliability of some sites. The average life of a Web site is measured in weeks.

For statistics, one of the most useful sites on the Internet is the electronic repository statlib, maintained at Carnegie Mellon University, which contains programs, datasets, and other items of interest. The URL is
http://lib.stat.cmu.edu
The collection of algorithms published in Applied Statistics is available in statlib. These algorithms are sometimes called the ApStat algorithms.
Another very useful site for scientific computing is netlib, which was established by research workers at AT&T (now Lucent) Bell Laboratories and national laboratories, primarily Oak Ridge National Laboratory. The URL is
http://www.netlib.org
The Collected Algorithms of the ACM (CALGO), which are the Fortran, C, and Algol programs published in ACM Transactions on Mathematical Software (or in Communications of the ACM prior to 1975), are available in netlib under the TOMS link.

The Guide to Available Mathematical Software (GAMS) can be accessed at
http://gams.nist.gov
A different interface, using Java, is available at
http://math.nist.gov/HotGAMS/

A good set of links for software is the Econometric Links of the Econometrics Journal (which are not limited to econometrics):
http://www.eur.nl/few/ei/links/software.html

There are two major problems in using the WWW to gather information. One is the sheer quantity of information and the number of sites providing information. The other is the “kiosk problem”: anyone can put up material. Sadly, the average quality is affected by a very large denominator. The kiosk problem may be even worse than a random selection of material; the “fools in public places” syndrome is much in evidence.

There is not much that can be done about the second problem. It was not solved for traditional postings on uncontrolled kiosks, and it will not be solved on the WWW. For the first problem, there are remarkable programs that automatically crawl through WWW links to build a database that can be searched for logical combinations of terms and phrases. Such systems and databases have been built by several people and companies. One of the most useful is Google at
http://google.stanford.edu
A very widely used search program is Yahoo at
http://www.yahoo.com
A neophyte can be quickly disabused of an exaggerated sense of the value of such search engines by doing a search on “Monte Carlo”.

It is not clear at this time what will be the media for the scientific literature within a few years. Many of the traditional journals will be converted to an electronic version of some kind. Journals will become Web sites. That is for certain; the details, however, are much less certain. Many bulletin boards and discussion groups have already evolved into “electronic journals”. A publisher of a standard commercial journal has stated that “we reject 80% of the articles submitted to our journal; those are the ones you can find on the Web”.
References for Software Packages

There is a wide range of software used in the computational sciences. Some of the software is produced by a single individual who is happy to share the software, sometimes for a fee, but who has no interest in maintaining the software. At the other extreme is software produced by large commercial companies whose continued existence depends on a process of production, distribution, and maintenance of the software. Information on much of the software can be obtained from GAMS. Some of the free software can be obtained from statlib or netlib.

The names of many software packages are trade names or trademarks. In this book, the use of names, even if the name is not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
References to the Literature

The following bibliography obviously covers a wide range of topics in random number generation and Monte Carlo methods. Except for a few of the general references, all of these entries have been cited in the text. The purpose of this bibliography is to help the reader get more information; hence, I eschew “personal communications” and references to technical reports that may or may not exist. Those kinds of references are generally for the author rather than for the reader.

In some cases, important original papers have been reprinted in special collections, such as Samuel Kotz and Norman L. Johnson (Editors) (1997), Breakthroughs in Statistics, Volume III, Springer-Verlag, New York. In most such cases, because the special collection may be more readily available, I list both sources.
A Note on the Names of Authors

In these references, I have generally used the names of authors as they appear in the original sources. This may mean that the same author will appear with different forms of names, sometimes with given names spelled out and sometimes abbreviated. In the author index, beginning on page 371, I use a single name for the same author. The name is generally the most unique (i.e., least abbreviated) of any of the names of that author in any of the references. This convention may occasionally result in an entry in the author index that does not occur exactly in any references. For example, a reference to J. Paul Jones together with one to John P. Jones, if I know that the two names refer to the same person, would result in an author index entry for John Paul Jones.

Abramowitz, Milton, and Irene A. Stegun (Editors) (1964), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables,
National Bureau of Standards (NIST), Washington. (Reprinted by Dover Publications, New York, 1974. Work on an updated version is occurring at NIST; see http://dlmf.nist.gov/ for the current status.)
Afflerbach, L., and H. Grothe (1985), Calculation of Minkowski-reduced lattice bases, Computing 35, 269–276.
Afflerbach, Lothar, and Holger Grothe (1988), The lattice structure of pseudorandom vectors generated by matrix generators, Journal of Computational and Applied Mathematics 23, 127–131.
Afflerbach, L., and W. Hörmann (1992), Nonuniform random numbers: A sensitivity analysis for transformation methods, International Workshop on Computationally Intensive Methods in Simulation and Optimization (edited by U. Dieter and G. C. Pflug), Springer-Verlag, Berlin, 374.
Agarwal, Satish K., and Jamal A. Al-Saleh (2001), Generalized gamma type distribution and its hazard rate function, Communications in Statistics — Theory and Methods 30, 309–318.
Agresti, Alan (1992), A survey of exact inference for contingency tables (with discussion), Statistical Science 7, 131–177.
Ahn, Hongshik, and James J. Chen (1995), Generation of over-dispersed and under-dispersed binomial variates, Journal of Computational and Graphical Statistics 4, 55–64.
Ahrens, J. H. (1995), A one-sample method for sampling from continuous and discrete distributions, Computing 52, 127–146.
Ahrens, J. H., and U. Dieter (1972), Computer methods for sampling from the exponential and normal distributions, Communications of the ACM 15, 873–882.
Ahrens, J. H., and U. Dieter (1974), Computer methods for sampling from gamma, beta, Poisson, and binomial distributions, Computing 12, 223–246.
Ahrens, J. H., and U. Dieter (1980), Sampling from binomial and Poisson distributions: A method with bounded computation times, Computing 25, 193–208.
Ahrens, J. H., and U. Dieter (1985), Sequential random sampling, ACM Transactions on Mathematical Software 11, 157–169.
Ahrens, Joachim H., and Ulrich Dieter (1988), Efficient, table-free sampling methods for the exponential, Cauchy and normal distributions, Communications of the ACM 31, 1330–1337. (See also Hamilton, 1998.)
Ahrens, J. H., and U. Dieter (1991), A convenient sampling method with bounded computation times for Poisson distributions, The Frontiers of Statistical Computation, Simulation & Modeling (edited by P. R. Nelson, E. J. Dudewicz, A. Öztürk, and E. C. van der Meulen), American Sciences Press, Columbus, Ohio, 137–149.
Akima, Hiroshi (1970), A new method of interpolation and smooth curve fitting based on local procedures, Journal of the ACM 17, 589–602.
Albert, James; Mohan Delampady; and Wolfgang Polasek (1991), A class of distributions for robustness studies, Journal of Statistical Planning and Inference 28, 291–304.
Alonso, Laurent, and René Schott (1995), Random Generation of Trees: Random Generators in Science, Kluwer Academic Publishers, Boston.
Altman, N. S. (1989), Bit-wise behavior of random number generators, SIAM Journal on Scientific and Statistical Computing 9, 941–949.
Aluru, S.; G. M. Prabhu; and John Gustafson (1992), A random number generator for parallel computers, Parallel Computing 18, 839–847.
Anderson, N. H., and D. M. Titterington (1993), Cross-correlation between simultaneously generated sequences of pseudo-random uniform deviates, Statistics and Computing 3, 61–65.
Anderson, T. W.; I. Olkin; and L. G. Underhill (1987), Generation of random orthogonal matrices, SIAM Journal on Scientific and Statistical Computing 8, 625–629.
Andrews, D. F.; P. J. Bickel; F. R. Hampel; P. J. Huber; W. H. Rogers; and J. W. Tukey (1972), Robust Estimation of Location: Survey and Advances, Princeton University Press, Princeton, New Jersey.
Antonov, I. A., and V. M. Saleev (1979), An economic method of computing LPτ-sequences, USSR Computational Mathematics and Mathematical Physics 19, 252–256.
Arnason, A. N., and L. Baniuk (1978), A computer generation of Dirichlet variates, Proceedings of the Eighth Manitoba Conference on Numerical Mathematics and Computing, Utilitas Mathematica Publishing, Winnipeg, 97–105.
Arnold, Barry C. (1983), Pareto Distributions, International Co-operative Publishing House, Fairland, Maryland.
Arnold, Barry C., and Robert J. Beaver (2000), The skew-Cauchy distribution, Statistics and Probability Letters 49, 285–290.
Arnold, Barry C., and Robert J. Beaver (2002), Skewed multivariate models related to hidden truncation and/or selective reporting (with discussion), Test 11, 7–54.
Arnold, Barry C.; Robert J. Beaver; Richard A. Groeneveld; and William Q. Meeker (1993), The nontruncated marginal of a truncated bivariate normal distribution, Psychometrika 58, 471–488.
Atkinson, A. C. (1979), A family of switching algorithms for the computer generation of beta random variates, Biometrika 66, 141–145.
Atkinson, A. C. (1980), Tests of pseudo-random numbers, Applied Statistics 29, 164–171.
Atkinson, A. C. (1982), The simulation of generalized inverse Gaussian and hyperbolic random variables, SIAM Journal on Scientific and Statistical Computing 3, 502–515.
Atkinson, A. C., and M. C. Pearce (1976), The computer generation of beta, gamma and normal random variables (with discussion), Journal of the Royal Statistical Society, Series A 139, 431–460.
Avramidis, Athanassios N., and James R. Wilson (1995), Correlation-induction techniques for estimating quantiles in simulation experiments, Proceedings of the 1995 Winter Simulation Conference, Association for Computing Machinery, New York, 268–277.
Azzalini, A., and A. Dalla Valle (1996), The multivariate skew-normal distribution, Biometrika 83, 715–726.
Bacon-Shone, J. (1985), Algorithm AS210: Fitting five parameter Johnson SB curves by moments, Applied Statistics 34, 95–100.
Bailey, David H., and Richard E. Crandall (2001), On the random character of fundamental constant expressions, Experimental Mathematics 10, 175–190.
Bailey, Ralph W. (1994), Polar generation of random variates with the t-distribution, Mathematics of Computation 62, 779–781.
Balakrishnan, N., and R. A. Sandhu (1995), A simple simulation algorithm for generating progressive Type-II censored samples, The American Statistician 49, 229–230.
Banerjia, Sanjeev, and Rex A. Dwyer (1993), Generating random points in a ball, Communications in Statistics — Simulation and Computation 22, 1205–1209.
Banks, David L. (1998), Testing random number generators, Proceedings of the Statistical Computing Section, ASA, 102–107.
Barnard, G. A. (1963), Discussion of Bartlett, “The spectral analysis of point processes”, Journal of the Royal Statistical Society, Series B 25, 264–296.
Barndorff-Nielsen, Ole E., and Neil Shephard (2001), Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics (with discussion), Journal of the Royal Statistical Society, Series B 63, 167–241.
Bays, Carter, and S. D. Durham (1976), Improving a poor random number generator, ACM Transactions on Mathematical Software 2, 59–64.
Beck, J., and W. W. L. Chen (1987), Irregularities of Distribution, Cambridge University Press, Cambridge, United Kingdom.
Becker, P. J., and J. J. J. Roux (1981), A bivariate extension of the gamma distribution, South African Statistical Journal 15, 1–12.
Becker, Richard A.; John M. Chambers; and Allan R. Wilks (1988), The New S Language, Wadsworth & Brooks/Cole, Pacific Grove, California.
Beckman, Richard J., and Michael D. McKay (1987), Monte Carlo estimation under different distributions using the same simulation, Technometrics 29, 153–160.
Bélisle, Claude J. P.; H. Edwin Romeijn; and Robert L. Smith (1993), Hit-and-run algorithms for generating multivariate distributions, Mathematics of Operations Research 18, 255–266.
Bendel, R. B., and M. R. Mickey (1978), Population correlation matrices for sampling experiments, Communications in Statistics — Simulation and Computation B7, 163–182.
Berbee, H. C. P.; C. G. E. Boender; A. H. G. Rinnooy Kan; C. L. Scheffer; R. L. Smith; and J. Telgen (1987), Hit-and-run algorithms for the identification of nonredundant linear inequalities, Mathematical Programming 37, 184–207.
Best, D. J. (1983), A note on gamma variate generators with shape parameter less than unity, Computing 30, 185–188.
Best, D. J., and N. I. Fisher (1979), Efficient simulation of the von Mises distribution, Applied Statistics 28, 152–157.
Beyer, W. A. (1972), Lattice structure and reduced bases of random vectors generated by linear recurrences, Applications of Number Theory to Numerical Analysis (edited by S. K. Zaremba), Academic Press, New York, 361–370.
Beyer, W. A.; R. B. Roof; and D. Williamson (1971), The lattice structure of multiplicative congruential pseudo-random vectors, Mathematics of Computation 25, 345–363.
Bhanot, Gyan (1988), The Metropolis algorithm, Reports on Progress in Physics 51, 429–457.
Birkes, David, and Yadolah Dodge (1993), Alternative Methods of Regression, John Wiley & Sons, New York.
Blum, L.; M. Blum; and M. Shub (1986), A simple unpredictable pseudorandom number generator, SIAM Journal on Computing 15, 364–383.
Bouleau, Nicolas, and Dominique Lépingle (1994), Numerical Methods for Stochastic Processes, John Wiley & Sons, New York.
Boyar, J. (1989), Inferring sequences produced by pseudo-random number generators, Journal of the ACM 36, 129–141.
Boyett, J. M. (1979), Random R × C tables with given row and column totals, Applied Statistics 28, 329–332.
Braaten, E., and G. Weller (1979), An improved low-discrepancy sequence for multidimensional quasi-Monte Carlo integration, Journal of Computational Physics 33, 249–258.
Bratley, Paul, and Bennett L. Fox (1988), Algorithm 659: Implementing Sobol’s quasirandom sequence generator, ACM Transactions on Mathematical Software 14, 88–100.
Bratley, Paul; Bennett L. Fox; and Harald Niederreiter (1992), Implementation and tests of low-discrepancy sequences, ACM Transactions on Modeling and Computer Simulation 2, 195–213.
Bratley, Paul; Bennett L. Fox; and Harald Niederreiter (1994), Algorithm 738: Programs to generate Niederreiter’s low-discrepancy sequences, ACM Transactions on Mathematical Software 20, 494–495.
Bratley, Paul; Bennett L. Fox; and Linus E. Schrage (1987), A Guide to Simulation, second edition, Springer-Verlag, New York.
Brooks, S. P., and G. O. Roberts (1999), Assessing convergence of iterative simulations, Statistics and Computing 8, 319–335.
Brophy, John F.; James E. Gentle; Jing Li; and Philip W. Smith (1989), Software for advanced architecture computers, Computer Science and Statistics: Proceedings of the Twenty-first Symposium on the Interface (edited by Kenneth Berk and Linda Malone), American Statistical Association, Alexandria, Virginia, 116–120.
Brown, Morton B., and Judith Bromberg (1984), An efficient two-stage procedure for generating random variates from the multinomial distribution, The American Statistician 38, 216–219.
Buckheit, Jonathan B., and David L. Donoho (1995), WaveLab and reproducible research, Wavelets and Statistics (edited by Anestis Antoniadis and Georges Oppenheim), Springer-Verlag, New York, 55–81.
Buckle, D. J. (1995), Bayesian inference for stable distributions, Journal of the American Statistical Association 90, 605–613.
Burr, I. W. (1942), Cumulative frequency functions, Annals of Mathematical Statistics 13, 215–232.
Burr, Irving W., and Peter J. Cislak (1968), On a general system of distributions. I. Its curve-shape characteristics. II. The sample median, Journal of the American Statistical Association 63, 627–635.
Cabrera, Javier, and Dianne Cook (1992), Projection pursuit indices based on fractal dimension, Computing Science and Statistics 24, 474–477.
Caflisch, Russel E., and Bradley Moskowitz (1995), Modified Monte Carlo methods using quasi-random sequences, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (edited by Harald Niederreiter and Peter Jau-Shyong Shiue), Springer-Verlag, New York, 1–16.
Carlin, Bradley P., and Thomas A. Louis (1996), Bayes and Empirical Bayes Methods for Data Analysis, Chapman & Hall, New York.
Carta, David G. (1990), Two fast implementations of the “minimal standard” random number generator, Communications of the ACM 33, Number 1 (January), 87–88.
Casella, George, and Edward I. George (1992), Explaining the Gibbs sampler, The American Statistician 46, 167–174.
Chalmers, C. P. (1975), Generation of correlation matrices with given eigenstructure, Journal of Statistical Computation and Simulation 4, 133–139.
Chamayou, J.-F. (2001), Pseudo random numbers for the Landau and Vavilov distributions, Computational Statistics 19, 131–152.
Chambers, John M. (1997), The evolution of the S language, Computing Science and Statistics 28, 331–337.
Chambers, J. M.; C. L. Mallows; and B. W. Stuck (1976), A method for simulating stable random variables, Journal of the American Statistical Association 71, 340–344 (Corrections, 1987, ibid. 82, 704, and 1988, ibid. 83, 581).
Chen, H. C., and Y. Asau (1974), On generating random variates from an empirical distribution, AIIE Transactions 6, 163–166.
Chen, Huifen, and Bruce W. Schmeiser (1992), Simulation of Poisson processes with trigonometric rates, Proceedings of the 1992 Winter Simulation Conference, Association for Computing Machinery, New York, 609–617.
Chen, Ming-Hui, and Bruce Schmeiser (1993), Performance of the Gibbs, hit-and-run, and Metropolis samplers, Journal of Computational and Graphical Statistics 3, 251–272.
Chen, Ming-Hui, and Bruce W. Schmeiser (1996), General hit-and-run Monte Carlo sampling for evaluating multidimensional integrals, Operations Research Letters 19, 161–169.
Chen, Ming-Hui; Qi-Man Shao; and Joseph G. Ibrahim (2000), Monte Carlo Methods in Bayesian Computation, Springer-Verlag, New York.
Cheng, R. C. H. (1978), Generating beta variates with nonintegral shape parameters, Communications of the ACM 21, 317–322.
Cheng, R. C. H. (1984), Generation of inverse Gaussian variates with given sample mean and dispersion, Applied Statistics 33, 309–316.
Cheng, R. C. H. (1985), Generation of multivariate normal samples with given mean and covariance matrix, Journal of Statistical Computation and Simulation 21, 39–49.
Cheng, R. C. H., and G. M. Feast (1979), Some simple gamma variate generators, Applied Statistics 28, 290–295.
Cheng, R. C. H., and G. M. Feast (1980), Gamma variate generators with increased shape parameter range, Communications of the ACM 23, 389–393.
Chernick, Michael R. (1999), Bootstrap Methods: A Practitioner’s Guide, John Wiley & Sons, New York.
Chib, Siddhartha, and Edward Greenberg (1995), Understanding the Metropolis–Hastings algorithm, The American Statistician 49, 327–335.
Chou, Wun-Seng, and Harald Niederreiter (1995), On the lattice test for inversive congruential pseudorandom numbers, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (edited by Harald Niederreiter and Peter Jau-Shyong Shiue), Springer-Verlag, New York, 186–197.
Chou, Youn-Min; S. Turner; S. Henson; D. Meyer; and K. S. Chen (1994), On using percentiles to fit data by a Johnson distribution, Communications in Statistics — Simulation and Computation 23, 341–354.
Cipra, Barry A. (1987), An introduction to the Ising model, The American Mathematical Monthly 94, 937–959.
Coldwell, R. L. (1974), Correlational defects in the standard IBM 360 random number generator and the classical ideal gas correlational function, Journal of Computational Physics 14, 223–226.
Collings, Bruce Jay (1987), Compound random number generators, Journal of the American Statistical Association 82, 525–527.
Compagner, Aaldert (1991), Definitions of randomness, American Journal of Physics 59, 700–705.
Compagner, A. (1995), Operational conditions for random-number generation, Physical Review E 52, 5634–5645.
Cook, R. Dennis, and Mark E. Johnson (1981), A family of distributions for modelling non-elliptically symmetric multivariate data, Journal of the Royal Statistical Society, Series B 43, 210–218.
Cook, R. Dennis, and Mark E. Johnson (1986), Generalized Burr–Pareto-logistic distributions with applications to a uranium exploration data set, Technometrics 28, 123–131.
Couture, R., and Pierre L’Ecuyer (1994), On the lattice structure of certain linear congruential sequences related to AWC/SWB generators, Mathematics of Computation 62, 799–808.
Couture, Raymond, and Pierre L’Ecuyer (1995), Linear recurrences with carry as uniform random number generators, Proceedings of the 1995 Winter Simulation Conference, Association for Computing Machinery, New York, 263–267.
Couture, Raymond, and Pierre L’Ecuyer (1997), Distribution properties of multiply-with-carry random number generators, Mathematics of Computation 66, 591–607.
Coveyou, R. R., and R. D. MacPherson (1967), Fourier analysis of uniform random number generators, Journal of the ACM 14, 100–119.
Cowles, Mary Kathryn, and Bradley P. Carlin (1996), Markov chain Monte Carlo convergence diagnostics: A comparative review, Journal of the American Statistical Association 91, 883–904.
Cowles, Mary Kathryn; Gareth O. Roberts; and Jeffrey S. Rosenthal (1999), Possible biases induced by MCMC convergence diagnostics, Journal of Statistical Computation and Simulation 64, 87–104.
Cowles, Mary Kathryn, and Jeffrey S. Rosenthal (1998), A simulation approach to convergence rates for Markov chain Monte Carlo algorithms, Statistics and Computing 8, 115–124.
Cuccaro, Steven A.; Michael Mascagni; and Daniel V. Pryor (1994), Techniques for testing the quality of parallel pseudorandom number generators, Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, Society for Industrial and Applied Mathematics, Philadelphia, 279–284.
Currin, Carla; Toby J. Mitchell; Max Morris; and Don Ylvisaker (1991), Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments, Journal of the American Statistical Association 86, 953–963.
D’Agostino, Ralph B. (1986), Tests for the normal distribution, Goodness-of-Fit Techniques (edited by Ralph B. D’Agostino and Michael A. Stephens), Marcel Dekker, New York, 367–419.
Dagpunar, J. S. (1978), Sampling of variates from a truncated gamma distribution, Journal of Statistical Computation and Simulation 8, 59–64.
Dagpunar, John (1988), Principles of Random Variate Generation, Clarendon Press, Oxford, United Kingdom.
Dagpunar, J. (1990), Sampling from the von Mises distribution via a comparison of random numbers, Journal of Applied Statistics 17, 165–168.
Damien, Paul; Purushottam W. Laud; and Adrian F. M. Smith (1995), Approximate random variate generation from infinitely divisible distributions with applications to Bayesian inference, Journal of the Royal Statistical Society, Series B 57, 547–563.
Damien, Paul, and Stephen G. Walker (2001), Sampling truncated normal, beta, and gamma densities, Journal of Computational and Graphical Statistics 10, 206–215.
David, Herbert A. (1981), Order Statistics, second edition, John Wiley & Sons, New York.
Davis, Charles S. (1993), The computer generation of multinomial random variates, Computational Statistics & Data Analysis 16, 205–217.
Davis, Don; Ross Ihaka; and Philip Fenstermacher (1994), Cryptographic randomness from air turbulence in disk drives, Advances in Cryptology — CRYPTO ’94 (edited by Yvo G. Desmedt), Springer-Verlag, New York, 114–120.
Davison, A. C., and D. V. Hinkley (1997), Bootstrap Methods and Their Application, Cambridge University Press, Cambridge, United Kingdom.
Deák, I. (1981), An economical method for random number generation and a normal generator, Computing 27, 113–121.
Deák, I. (1986), The economical method for generating random samples from discrete distributions, ACM Transactions on Mathematical Software 12, 34–36.
Deák, István (1990), Random Number Generators and Simulation, Akadémiai Kiadó, Budapest.
Dellaportas, Petros (1995), Random variate transformations in the Gibbs sampler: Issues of efficiency and convergence, Statistics and Computing 5, 133–140.
Dellaportas, P., and A. F. M. Smith (1993), Bayesian inference for generalized linear and proportional hazards models via Gibbs sampling, Applied Statistics 42, 443–459.
De Matteis, A., and S. Pagnutti (1990), Long-range correlations in linear and non-linear random number generators, Parallel Computing 14, 207–210.
De Matteis, A., and S. Pagnutti (1993), Long-range correlation analysis of the Wichmann–Hill random number generator, Statistics and Computing 3, 67–70.
Deng, Lih-Yuan; Kwok Hung Chan; and Yilian Yuan (1994), Random number generators for multiprocessor systems, International Journal of Modelling and Simulation 14, 185–191.
Deng, Lih-Yuan, and E. Olusegun George (1990), Generation of uniform variates from several nearly uniformly distributed variables, Communications in Statistics — Simulation and Computation 19, 145–154.
Deng, Lih-Yuan, and E. Olusegun George (1992), Some characterizations of the uniform distribution with applications to random number generation, Annals of the Institute of Statistical Mathematics 44, 379–385.
Deng, Lih-Yuan, and Dennis K. J. Lin (2000), Random number generation for the new century, The American Statistician 54, 145–150.
Deng, L.-Y.; D. K. J. Lin; J. Wang; and Y. Yuan (1997), Statistical justification of combination generators, Statistica Sinica 7, 993–1003.
Devroye, L. (1984a), Random variate generation for unimodal and monotone densities, Computing 32, 43–68.
Devroye, L. (1984b), A simple algorithm for generating random variates with a log-concave density, Computing 33, 247–257.
Devroye, Luc (1986a), Non-Uniform Random Variate Generation, Springer-Verlag, New York.
Devroye, Luc (1986b), An automatic method for generating random variates with a given characteristic function, SIAM Journal on Applied Mathematics 46, 698–719.
Devroye, Luc (1987), A simple generator for discrete log-concave distributions, Computing 39, 87–91.
Devroye, Luc (1989), On random variate generation when only moments or Fourier coefficients are known, Mathematics and Computers in Simulation 31, 71–79.
Devroye, Luc (1991), Algorithms for generating discrete random variables with a given generating function or a given moment sequence, SIAM Journal on Scientific and Statistical Computing 12, 107–126.
Devroye, Luc (1997), Random variate generation for multivariate unimodal densities, ACM Transactions on Modeling and Computer Simulation 7, 447–477.
Devroye, Luc; Peter Epstein; and Jörg-Rüdiger Sack (1993), On generating random intervals and hyperrectangles, Journal of Computational and Graphical Statistics 2, 291–308.
Dieter, U. (1975), How to calculate shortest vectors in a lattice, Mathematics of Computation 29, 827–833.
Do, Kim-Anh (1991), Quasi-random resampling for the bootstrap, Computer Science and Statistics: Proceedings of the Twenty-third Symposium on the Interface (edited by Elaine M. Keramidas), Interface Foundation of North America, Fairfax, Virginia, 297–300.
Dodge, Yadolah (1996), A natural random number generator, International Statistical Review 64, 329–344.
Doucet, Arnaud; Nando de Freitas; and Neil Gordon (Editors) (2001), Sequential Monte Carlo Methods in Practice, Springer-Verlag, New York.
Efron, Bradley, and Robert J. Tibshirani (1993), An Introduction to the Bootstrap, Chapman & Hall, New York.
Eichenauer, J.; H. Grothe; and J. Lehn (1988), Marsaglia’s lattice test and nonlinear congruential pseudo random number generators, Metrika 35, 241–250.
Eichenauer, J., and H. Niederreiter (1988), On Marsaglia’s lattice test for pseudorandom numbers, Manuscripta Mathematica 62, 245–248.
Eichenauer, Jürgen, and Jürgen Lehn (1986), A non-linear congruential pseudo random number generator, Statistische Hefte 27, 315–326.
Eichenauer-Herrmann, Jürgen (1995), Pseudorandom number generation by nonlinear methods, International Statistical Review 63, 247–255.
Eichenauer-Herrmann, Jürgen (1996), Modified explicit inversive congruential pseudorandom numbers with power of 2 modulus, Statistics and Computing 6, 31–36.
Eichenauer-Herrmann, J., and H. Grothe (1989), A remark on long-range correlations in multiplicative congruential pseudorandom number generators, Numerische Mathematik 56, 609–611.
Eichenauer-Herrmann, J., and H. Grothe (1990), Upper bounds for the Beyer ratios of linear congruential generators, Journal of Computational and Applied Mathematics 31, 73–80.
Eichenauer-Herrmann, Jürgen; Eva Herrmann; and Stefan Wegenkittl (1998), A survey of quadratic and inversive congruential pseudorandom numbers, Monte Carlo and Quasi-Monte Carlo Methods 1996 (edited by Harald Niederreiter, Peter Hellekalek, Gerhard Larcher, and Peter Zinterhof), Springer-Verlag, New York, 66–97.
Eichenauer-Herrmann, J., and K. Ickstadt (1994), Explicit inversive congruential pseudorandom numbers with power of 2 modulus, Mathematics of Computation 62, 787–797.
Emrich, Lawrence J., and Marion R. Piedmonte (1991), A method for generating high-dimensional multivariate binary variates, The American Statistician 45, 302–304.
Erber, T.; P. Everett; and P. W. Johnson (1979), The simulation of random processes on digital computers with Chebyshev mixing transformations, Journal of Computational Physics 32, 168–211.
Ernst, Michael D. (1998), A multivariate generalized Laplace distribution, Computational Statistics 13, 227–232.
Evans, Michael, and Tim Swartz (2000), Approximating Integrals via Monte Carlo and Deterministic Methods, Oxford University Press, Oxford, United Kingdom.
Everitt, B. S. (1998), The Cambridge Dictionary of Statistics, Cambridge University Press, Cambridge, United Kingdom.
Everson, Philip J., and Carl N. Morris (2000), Simulation from Wishart distributions with eigenvalue constraints, Journal of Computational and Graphical Statistics 9, 380–389.
Falk, Michael (1999), A simple approach to the generation of uniformly distributed random variables with prescribed correlations, Communications in Statistics — Simulation and Computation 28, 785–791.
Fang, Kai-Tai, and T. W. Anderson (Editors) (1990), Statistical Inference in Elliptically Contoured and Related Distributions, Allerton Press, New York.
Fang, K.-T.; F. J. Hickernell; and H. Niederreiter (Editors) (2002), Monte Carlo and Quasi-Monte Carlo Methods 2000, Springer-Verlag, New York.
Fang, Kai-Tai, and Run-Ze Li (1997), Some methods for generating both an NT-net and the uniform distribution on a Stiefel manifold and their applications, Computational Statistics & Data Analysis 24, 29–46.
Fang, Kai-Tai, and Yuan Wang (1994), Number Theoretic Methods in Statistics, Chapman & Hall, New York.
Faure, H. (1986), On the star discrepancy of generalised Hammersley sequences in two dimensions, Monatshefte für Mathematik 101, 291–300.
Ferrenberg, Alan M.; D. P. Landau; and Y. Joanna Wong (1992), Monte Carlo simulations: Hidden errors from “good” random number generators, Physical Review Letters 69, 3382–3384.
Fill, James A. (1998), An interruptible algorithm for perfect sampling via Markov chains, Annals of Applied Probability 8, 131–162.
Fill, James Allen; Motoya Machida; Duncan J. Murdoch; and Jeffrey S. Rosenthal
(2000), Extensions of Fill’s perfect rejection sampling algorithm to general chains, Random Structures and Algorithms 17, 290–316.
Fishman, George S. (1996), Monte Carlo Concepts, Algorithms, and Applications, Springer-Verlag, New York.
Fishman, George S., and Louis R. Moore, III (1982), A statistical evaluation of multiplicative random number generators with modulus $2^{31}-1$, Journal of the American Statistical Association 77, 129–136.
Fishman, George S., and Louis R. Moore, III (1986), An exhaustive analysis of multiplicative congruential random number generators with modulus $2^{31}-1$, SIAM Journal on Scientific and Statistical Computing 7, 24–45.
Fleishman, Allen I. (1978), A method for simulating non-normal distributions, Psychometrika 43, 521–532.
Flournoy, Nancy, and Robert K. Tsutakawa (Editors) (1991), Statistical Multiple Integration, American Mathematical Society (Contemporary Mathematics, Volume 115), Providence, Rhode Island.
Forster, Jonathan J.; John W. McDonald; and Peter W. F. Smith (1996), Monte Carlo exact conditional tests for log-linear and logistic models, Journal of the Royal Statistical Society, Series B 55, 3–24.
Fouque, Jean-Pierre; George Papanicolaou; and K. Ronnie Sircar (2000), Derivatives in Financial Markets with Stochastic Volatility, Cambridge University Press, Cambridge, United Kingdom.
Fox, Bennett L. (1986), Implementation and relative efficiency of quasirandom sequence generators, ACM Transactions on Mathematical Software 12, 362–376.
Frederickson, P.; R. Hiromoto; T. L. Jordan; B. Smith; and T. Warnock (1984), Pseudo-random trees in Monte Carlo, Parallel Computing 1, 175–180.
Freimer, Marshall; Govind S. Mudholkar; Georgia Kollia; and Thomas C. Lin (1988), A study of the generalized Tukey lambda family, Communications in Statistics — Theory and Methods 17, 3547–3567.
Freund, John E. (1961), A bivariate extension of the exponential distribution, Journal of the American Statistical Association 56, 971–977.
Friedman, Jerome H.; Jon Louis Bentley; and Raphael Ari Finkel (1977), An algorithm for finding best matches in logarithmic expected time, ACM Transactions on Mathematical Software 3, 209–226.
Frigessi, Arnoldo; Fabio Martinelli; and Julian Stander (1997), Computational complexity of Markov chain Monte Carlo methods for finite Markov random fields, Biometrika 84, 1–18.
Fuller, A. T. (1976), The period of pseudo-random numbers generated by Lehmer’s congruential method, Computer Journal 19, 173–177.
Fushimi, Masanori (1990), Random number generation with the recursion $X_t = X_{t-3p} \oplus X_{t-3q}$, Journal of Computational and Applied Mathematics 31, 105–118.
Gamerman, Dani (1997), Markov Chain Monte Carlo, Chapman & Hall, London.
Gange, Stephen J. (1995), Generating multivariate categorical variates using the iterative proportional fitting algorithm, The American Statistician 49, 134–138.
Gelfand, Alan E., and Adrian F. M. Smith (1990), Sampling-based approaches to calculating marginal densities, Journal of the American Statistical Association 85, 398–409. (Reprinted in Samuel Kotz and Norman L. Johnson (Editors) (1997), Breakthroughs in Statistics, Volume III, Springer-Verlag, New York, 526–550.)
Gelfand, Alan E., and Sujit K. Sahu (1994), On Markov chain Monte Carlo acceleration, Journal of Computational and Graphical Statistics 3, 261–276.
Gelman, Andrew (1992), Iterative and non-iterative simulation algorithms, Computing Science and Statistics 24, 433–438.
Gelman, Andrew, and Xiao-Li Meng (1998), Simulating normalizing constants: From importance sampling to bridge sampling to path sampling, Statistical Science 13, 163–185.
Gelman, Andrew, and Donald B. Rubin (1992a), Inference from iterative simulation using multiple sequences (with discussion), Statistical Science 7, 457–511.
Gelman, Andrew, and Donald B. Rubin (1992b), A single series from the Gibbs sampler provides a false sense of security, Bayesian Statistics 4 (edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith), Oxford University Press, Oxford, United Kingdom, 625–631.
Gelman, Andrew; John B. Carlin; Hal S. Stern; and Donald B. Rubin (1995), Bayesian Data Analysis, Chapman & Hall, London.
Geman, Stuart, and Donald Geman (1984), Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741.
Gentle, James E. (1981), Portability considerations for random number generators, Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface (edited by William F. Eddy), Springer-Verlag, New York, 158–164.
Gentle, James E. (1990), Computer implementation of random number generators, Journal of Computational and Applied Mathematics 31, 119–125.
Gentleman, Robert, and Ross Ihaka (1997), The R language, Computing Science and Statistics 28, 326–330.
Gerontidis, I., and R. L. Smith (1982), Monte Carlo generation of order statistics from general distributions, Applied Statistics 31, 238–243.
Geweke, John (1991a), Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints, Computer Science and Statistics: Proceedings of the Twenty-third Symposium on the Interface (edited by Elaine M. Keramidas), Interface Foundation of North America, Fairfax, Virginia, 571–578.
Geweke, John (1991b), Generic, algorithmic approaches to Monte Carlo integration in Bayesian inference, Statistical Multiple Integration (edited by
Nancy Flournoy and Robert K. Tsutakawa), American Mathematical Society, Providence, Rhode Island, 117–135.
Geyer, Charles J. (1992), Practical Markov chain Monte Carlo (with discussion), Statistical Science 7, 473–511.
Geyer, Charles J., and Elizabeth A. Thompson (1995), Annealing Markov chain Monte Carlo with applications to ancestral inference, Journal of the American Statistical Association 90, 909–920.
Ghitany, M. E. (1998), On a recent generalization of gamma distribution, Communications in Statistics — Theory and Methods 27, 223–233.
Gilks, W. R. (1992), Derivative-free adaptive rejection sampling for Gibbs sampling, Bayesian Statistics 4 (edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith), Oxford University Press, Oxford, United Kingdom, 641–649.
Gilks, W. R.; N. G. Best; and K. K. C. Tan (1995), Adaptive rejection Metropolis sampling within Gibbs sampling, Applied Statistics 44, 455–472 (Corrections, Gilks et al., 1997, ibid. 46, 541–542).
Gilks, W. R.; S. Richardson; and D. J. Spiegelhalter (Editors) (1996), Markov Chain Monte Carlo in Practice, Chapman & Hall, London.
Gilks, Walter R., and Gareth O. Roberts (1996), Strategies for improving MCMC, Markov Chain Monte Carlo in Practice (edited by W. R. Gilks, S. Richardson, and D. J. Spiegelhalter), Chapman & Hall, London, 89–114.
Gilks, W. R.; G. O. Roberts; and E. I. George (1994), Adaptive direction sampling, The Statistician 43, 179–189.
Gilks, W. R.; A. Thomas; and D. J. Spiegelhalter (1992), Software for the Gibbs sampler, Computing Science and Statistics 24, 439–448.
Gilks, W. R.; A. Thomas; and D. J. Spiegelhalter (1994), A language and program for complex Bayesian modelling, The Statistician 43, 169–178.
Gilks, W. R., and P. Wild (1992), Adaptive rejection sampling for Gibbs sampling, Applied Statistics 41, 337–348.
Gleser, Leon Jay (1976), A canonical representation for the noncentral Wishart distribution useful for simulation, Journal of the American Statistical Association 71, 690–695.
Golder, E. R., and J. G. Settle (1976), The Box–Muller method for generating pseudo-random normal deviates, Applied Statistics 25, 12–20.
Golomb, S. W. (1982), Shift Register Sequences, second edition, Aegean Park Press, Laguna Hills, California.
Gordon, J. (1989), Fast multiplicative inverse in modular arithmetic, Cryptography and Coding (edited by H. J. Beker and F. C. Piper), Clarendon Press, Oxford, United Kingdom, 269–279.
Gordon, N. J.; D. J. Salmond; and A. F. M. Smith (1993), Novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEE Proceedings F, Communications, Radar, and Signal Processing 140, 107–113.
Grafton, R. G. T. (1981), The runs-up and runs-down tests, Applied Statistics 30, 81–85.
Greenwood, J. Arthur (1976a), The demands of trivial combinatorial problems on random number generators, Proceedings of the Ninth Interface Symposium on Computer Science and Statistics (edited by David C. Hoaglin and Roy E. Welsch), Prindle, Weber, and Schmidt, Boston, 222–227.
Greenwood, J. A. (1976b), A fast machine-independent long-period generator for 31-bit pseudo-random numbers, Compstat 1976: Proceedings in Computational Statistics (edited by J. Gordesch and P. Naeve), Physica-Verlag, Vienna, 30–36.
Greenwood, J. Arthur (1976c), Moments of time to generate random variables by rejection, Annals of the Institute of Statistical Mathematics 28, 399–401.
Griffiths, P., and I. D. Hill (Editors) (1985), Applied Statistics Algorithms, Ellis Horwood Limited, Chichester, United Kingdom.
Grothe, H. (1987), Matrix generators for pseudo-random vector generation, Statistische Hefte 28, 233–238.
Guerra, Victor O.; Richard A. Tapia; and James R. Thompson (1976), A random number generator for continuous random variables based on an interpolation procedure of Akima, Computer Science and Statistics: 9th Annual Symposium on the Interface (edited by David C. Hoaglin and Roy E. Welsch), Prindle, Weber, and Schmidt, Boston, 228–230.
Halton, J. H. (1960), On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals, Numerische Mathematik 2, 84–90 (Corrections, 1960, ibid. 2, 190).
Hamilton, Kenneth G. (1998), Algorithm 780: Exponential pseudorandom distribution, ACM Transactions on Mathematical Software 24, 102–106.
Hammersley, J. M., and D. C. Handscomb (1964), Monte Carlo Methods, Methuen & Co., London.
Hartley, H. O., and D. L. Harris (1963), Monte Carlo computations in normal correlation procedures, Journal of the ACM 10, 302–306.
Hastings, W. K. (1970), Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57, 97–109. (Reprinted in Samuel Kotz and Norman L. Johnson (Editors) (1997), Breakthroughs in Statistics, Volume III, Springer-Verlag, New York, 240–256.)
Heiberger, Richard M. (1978), Algorithm AS127: Generation of random orthogonal matrices, Applied Statistics 27, 199–205. (See Tanner and Thisted, 1982.)
Hellekalek, P. (1984), Regularities of special sequences, Journal of Number Theory 18, 41–55.
Hesterberg, Tim (1995), Weighted average importance sampling and defensive mixture distributions, Technometrics 37, 185–194.
Hesterberg, Timothy C., and Barry L. Nelson (1998), Control variates for probability and quantile estimation, Management Science 44, 1295–1312.
Hickernell, Fred J. (1995), A comparison of random and quasirandom points for multidimensional quadrature, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (edited by Harald Niederreiter and Peter Jau-Shyong Shiue), Springer-Verlag, New York, 212–227.
Hill, I. D.; R. Hill; and R. L. Holder (1976), Algorithm AS99: Fitting Johnson curves by moments, Applied Statistics 25, 180–189 (Remark, 1981, ibid. 30, 106).
Hoaglin, David C., and David F. Andrews (1975), The reporting of computation-based results in statistics, The American Statistician 29, 122–126.
Hope, A. C. A. (1968), A simplified Monte Carlo significance test procedure, Journal of the Royal Statistical Society, Series B 30, 582–598.
Hopkins, T. R. (1983), A revised algorithm for the spectral test, Applied Statistics 32, 328–335. (See http://www.cs.ukc.ac.uk/pubs/1997 for an updated version.)
Hörmann, W. (1994a), A universal generator for discrete log-concave distributions, Computing 52, 89–96.
Hörmann, Wolfgang (1994b), A note on the quality of random variates generated by the ratio of uniforms method, ACM Transactions on Modeling and Computer Simulation 4, 96–106.
Hörmann, Wolfgang (1995), A rejection technique for sampling from T-concave distributions, ACM Transactions on Mathematical Software 21, 182–193.
Hörmann, Wolfgang (2000), Algorithm 802: An automatic generator for bivariate log-concave distributions, ACM Transactions on Mathematical Software 26, 201–219.
Hörmann, Wolfgang, and Gerhard Derflinger (1993), A portable random number generator well suited for the rejection method, ACM Transactions on Mathematical Software 19, 489–495.
Hörmann, Wolfgang, and Gerhard Derflinger (1994), The transformed rejection method for generating random variables, an alternative to the ratio of uniforms method, Communications in Statistics — Simulation and Computation 23, 847–860.
Hosack, J. M. (1986), The use of Chebyshev mixing to generate pseudo-random numbers, Journal of Computational Physics 67, 482–486.
Huber, Peter J. (1985), Projection pursuit (with discussion), The Annals of Statistics 13, 435–525.
Hull, John C. (2000), Options, Futures, & Other Derivatives, Prentice-Hall, Englewood Cliffs, New Jersey.
Ireland, Kenneth, and Michael Rosen (1991), A Classical Introduction to Modern Number Theory, Springer-Verlag, New York.
Jäckel, Peter (2002), Monte Carlo Methods in Finance, John Wiley & Sons Ltd., Chichester.
Jaditz, Ted (2000), Are the digits of π an independent and identically distributed sequence?, The American Statistician 54, 12–16.
James, F. (1990), A review of pseudorandom number generators, Computer Physics Communications 60, 329–344.
James, F. (1994), RANLUX: A Fortran implementation of the high-quality pseudorandom number generator of Lüscher, Computer Physics Communications 79, 111–114.
Jöhnk, M. D. (1964), Erzeugung von Betaverteilter und Gammaverteilter Zufallszahlen, Metrika 8, 5–15.
Johnson, Mark E. (1987), Multivariate Statistical Simulation, John Wiley & Sons, New York.
Johnson, Valen E. (1996), Studying convergence of Markov chain Monte Carlo algorithms using coupled sample paths, Journal of the American Statistical Association 91, 154–166.
Jones, G.; C. D. Lai; and J. C. W. Rayner (2000), A bivariate gamma mixture distribution, Communications in Statistics — Theory and Methods 29, 2775–2790.
Joy, Corwin; Phelim P. Boyle; and Ken Seng Tan (1996), Quasi-Monte Carlo methods in numerical finance, Management Science 42, 926–938.
Juneja, Sandeep, and Perwez Shahabuddin (2001), Fast simulation of Markov chains with small transition probabilities, Management Science 47, 547–562.
Kachitvichyanukul, Voratas (1982), Computer Generation of Poisson, Binomial, and Hypergeometric Random Variables, unpublished Ph.D. dissertation, Purdue University, West Lafayette, Indiana.
Kachitvichyanukul, Voratas; Shiow-Wen Cheng; and Bruce Schmeiser (1988), Fast Poisson and binomial algorithms for correlation induction, Journal of Statistical Computation and Simulation 29, 17–33.
Kachitvichyanukul, Voratas, and Bruce Schmeiser (1985), Computer generation of hypergeometric random variates, Journal of Statistical Computation and Simulation 22, 127–145.
Kachitvichyanukul, Voratas, and Bruce W. Schmeiser (1988a), Binomial random variate generation, Communications of the ACM 31, 216–223.
Kachitvichyanukul, Voratas, and Bruce W. Schmeiser (1988b), Algorithm 668: H2PEC: Sampling from the hypergeometric distribution, ACM Transactions on Mathematical Software 14, 397–398.
Kachitvichyanukul, Voratas, and Bruce W. Schmeiser (1990), BTPEC: Sampling from the binomial distribution, ACM Transactions on Mathematical Software 16, 394–397.
Kahn, H., and A. W. Marshall (1953), Methods of reducing sample size in Monte Carlo computations, Journal of the Operations Research Society of America 1, 263–278.
Kankaala, K.; T. Ala-Nissila; and I. Vattulainen (1993), Bit-level correlations in some pseudorandom number generators, Physical Review E 48, R4211–R4214.
Kao, Chiang, and H. C. Tang (1997a), Upper bounds in spectral test for multiple recursive random number generators with missing terms, Computers and Mathematical Applications 33, 113–118.
Kao, Chiang, and H. C. Tang (1997b), Systematic searches for good multiple recursive random number generators, Computers and Operations Research 24, 899–905.
Karian, Zaven A., and Edward J. Dudewicz (1999), Fitting the generalized lambda distribution to data: A method based on percentiles, Communications in Statistics — Simulation and Computation 28, 793–819.
Karian, Zaven A., and Edward J. Dudewicz (2000), Fitting Statistical Distributions, CRC Press, Boca Raton, Florida.
Karian, Zaven A.; Edward J. Dudewicz; and Patrick McDonald (1996), The extended generalized lambda distribution system for fitting distributions to data: History, completion of theory, tables, applications, the “final word” on moment fits, Communications in Statistics — Simulation and Computation 25, 611–642.
Kato, Takashi; Li-ming Wu; and Niro Yanagihara (1996a), On a nonlinear congruential pseudorandom number generator, Mathematics of Computation 65, 227–233.
Kato, Takashi; Li-ming Wu; and Niro Yanagihara (1996b), The serial test for a nonlinear pseudorandom number generator, Mathematics of Computation 65, 761–769.
Kemp, A. W. (1981), Efficient generation of logarithmically distributed pseudorandom variables, Applied Statistics 30, 249–253.
Kemp, A. W. (1990), Patchwork rejection algorithms, Journal of Computational and Applied Mathematics 31, 127–131.
Kemp, C. D. (1986), A modal method for generating binomial variables, Communications in Statistics — Theory and Methods 15, 805–813.
Kemp, C. D., and Adrienne W. Kemp (1987), Rapid generation of frequency tables, Applied Statistics 36, 277–282.
Kemp, C. D., and Adrienne W. Kemp (1991), Poisson random variate generation, Applied Statistics 40, 143–158.
Kinderman, A. J., and J. F. Monahan (1977), Computer generation of random variables using the ratio of uniform deviates, ACM Transactions on Mathematical Software 3, 257–260.
Kinderman, A. J., and J. F. Monahan (1980), New methods for generating Student’s t and gamma variables, Computing 25, 369–377.
Kinderman, A. J., and J. G. Ramage (1976), Computer generation of normal random variables, Journal of the American Statistical Association 71, 893–896.
Kirkpatrick, S.; C. D. Gelatt; and M. P. Vecchi (1983), Optimization by simulated annealing, Science 220, 671–679.
Kirkpatrick, Scott, and Erich P. Stoll (1981), A very fast shift-register sequence random number generator, Journal of Computational Physics 40, 517–526.
Kleijnen, Jack P. C. (1977), Robustness of a multiple ranking procedure: A Monte Carlo experiment illustrating design and analysis techniques, Communications in Statistics — Simulation and Computation B6, 235–262.
Knuth, Donald E. (1975), Estimating the efficiency of backtrack programs, Mathematics of Computation 29, 121–136.
Knuth, Donald E. (1998), The Art of Computer Programming, Volume 2, Seminumerical Algorithms, third edition, Addison–Wesley Publishing Company, Reading, Massachusetts.
Kobayashi, K. (1991), On generalized gamma functions occurring in diffraction theory, Journal of the Physical Society of Japan 60, 1501–1512.
Kocis, Ladislav, and William J. Whiten (1997), Computational investigations of low-discrepancy sequences, ACM Transactions on Mathematical Software 23, 266–294.
Koehler, J. R., and A. B. Owen (1996), Computer experiments, Handbook of Statistics, Volume 13 (edited by S. Ghosh and C. R. Rao), Elsevier Science Publishers, Amsterdam, 261–308.
Kotz, Samuel; N. Balakrishnan; and Norman L. Johnson (2000), Continuous Multivariate Distributions, second edition, John Wiley & Sons, New York.
Kovalenko, I. N. (1972), Distribution of the linear rank of a random matrix, Theory of Probability and Its Applications 17, 342–346.
Kozubowski, Tomasz J., and Krzysztof Podgórski (2000), A multivariate and asymmetric generalization of Laplace distribution, Computational Statistics 15, 531–540.
Krawczyk, Hugo (1992), How to predict congruential generators, Journal of Algorithms 13, 527–545.
Krommer, Arnold R., and Christoph W. Ueberhuber (1994), Numerical Integration on Advanced Computer Systems, Springer-Verlag, New York.
Kronmal, Richard A., and Arthur V. Peterson (1979a), On the alias method for generating random variables from a discrete distribution, The American Statistician 33, 214–218.
Kronmal, R. A., and A. V. Peterson (1979b), The alias and alias-rejection-mixture methods for generating random variables from probability distributions, Proceedings of the 1979 Winter Simulation Conference, Institute of Electrical and Electronics Engineers, New York, 269–280.
Kronmal, Richard A., and Arthur V. Peterson (1981), A variant of the acceptance-rejection method for computer generation of random variables, Journal of the American Statistical Association 76, 446–451 (Corrections, 1982, ibid. 77, 954).
Kronmal, Richard A., and Arthur V. Peterson (1984), An acceptance-complement analogue of the mixture-plus-acceptance-rejection method for generating random variables, ACM Transactions on Mathematical Software 10, 271–281.
Kumada, Toshihiro; Hannes Leeb; Yoshiharu Kurita; and Makoto Matsumoto (2000), New primitive t-nomials (t = 3, 5) over GF(2) whose degree is a Mersenne exponent, Mathematics of Computation 69, 811–814.
Lagarias, Jeffrey C. (1993), Pseudorandom numbers, Statistical Science 8, 31–39.
Laud, Purushottam W.; Paul Ramgopal; and Adrian F. M. Smith (1993), Random variate generation from D-distributions, Statistics and Computing 3, 109–112.
Lawrance, A. J. (1992), Uniformly distributed first-order autoregressive time series models and multiplicative congruential random number generators, Journal of Applied Probability 29, 896–903.
Learmonth, G. P., and P. A. W. Lewis (1973), Statistical tests of some widely used and recently proposed uniform random number generators, Computer Science and Statistics: 7th Annual Symposium on the Interface (edited by William J. Kennedy), Statistical Laboratory, Iowa State University, Ames, Iowa, 163–171.
L’Ecuyer, Pierre (1988), Efficient and portable combined random number generators, Communications of the ACM 31, 742–749, 774.
L’Ecuyer, Pierre (1990), Random numbers for simulation, Communications of the ACM 33, Number 10 (October), 85–97.
L’Ecuyer, Pierre (1996), Combined multiple recursive random number generators, Operations Research 44, 816–822.
L’Ecuyer, Pierre (1997), Tests based on sum-functions of spacings for uniform random numbers, Journal of Statistical Computation and Simulation 59, 251–269.
L’Ecuyer, Pierre (1998), Random number generators and empirical tests, Monte Carlo and Quasi-Monte Carlo Methods 1996 (edited by Harald Niederreiter, Peter Hellekalek, Gerhard Larcher, and Peter Zinterhof), Springer-Verlag, New York, 124–138.
L’Ecuyer, Pierre (1999), Good parameters and implementations for combined multiple recursive random number generators, Operations Research 47, 159–164.
L’Ecuyer, Pierre; François Blouin; and Raymond Couture (1993), A search for good multiple recursive random number generators, ACM Transactions on Modeling and Computer Simulation 3, 87–98.
L’Ecuyer, Pierre; Jean-François Cordeau; and Richard Simard (2000), Close-point spatial tests and their application to random number generators, Operations Research 48, 308–317.
L’Ecuyer, Pierre, and Peter Hellekalek (1998), Random number generators: Selection criteria and testing, Random and Quasi-Random Point Sets (edited by Peter Hellekalek and Gerhard Larcher), Springer-Verlag, New York, 223–266.
L’Ecuyer, Pierre, and Richard Simard (1999), Beware of linear congruential generators with multipliers of the form a = ±2^q ± 2^r, ACM Transactions on Mathematical Software 25, 367–374.
L’Ecuyer, Pierre, and Shu Tezuka (1991), Structural properties for two classes of combined random number generators, Mathematics of Computation 57, 735–746.
Lee, A. J. (1993), Generating random binary deviates having fixed marginal distributions and specified degrees of association, The American Statistician 47, 209–215.
Leeb, Hannes, and Stefan Wegenkittl (1997), Inversive and linear congruential pseudorandom number generators in empirical tests, ACM Transactions on Modeling and Computer Simulation 7, 272–286.
Lehmer, D. H. (1951), Mathematical methods in large-scale computing units, Proceedings of the Second Symposium on Large Scale Digital Computing Machinery, Harvard University Press, Cambridge, Massachusetts, 141–146.
Leva, Joseph L. (1992a), A fast normal random number generator, ACM Transactions on Mathematical Software 18, 449–453.
Leva, Joseph L. (1992b), Algorithm 712: A normal random number generator, ACM Transactions on Mathematical Software 18, 454–455.
Lewis, P. A. W.; A. S. Goodman; and J. M. Miller (1969), A pseudo-random number generator for the System/360, IBM Systems Journal 8, 136–146.
Lewis, P. A. W., and E. J. Orav (1989), Simulation Methodology for Statisticians, Operations Analysts, and Engineers, Volume I, Wadsworth & Brooks/Cole, Pacific Grove, California.
Lewis, P. A. W., and G. S. Shedler (1979), Simulation of nonhomogeneous Poisson processes by thinning, Naval Research Logistics Quarterly 26, 403–413.
Lewis, T. G., and W. H. Payne (1973), Generalized feedback shift register pseudorandom number algorithm, Journal of the ACM 20, 456–468.
Leydold, Josef (1998), A rejection technique for sampling from log-concave multivariate distributions, ACM Transactions on Modeling and Computer Simulation 8, 254–280.
Leydold, Josef (2000), Automatic sampling with the ratio-of-uniforms method, ACM Transactions on Mathematical Software 26, 78–98.
Leydold, Josef (2001), A simple universal generator for continuous and discrete univariate T-concave distributions, ACM Transactions on Mathematical Software 27, 66–82.
Li, Kim-Hung (1994), Reservoir-sampling algorithms of time complexity O(n(1 + log(N/n))), ACM Transactions on Mathematical Software 20, 481–493.
Li, Shing Ted, and Joseph L. Hammond (1975), Generation of pseudo-random numbers with specified univariate distributions and correlation coefficients, IEEE Transactions on Systems, Man, and Cybernetics 5, 557–560.
Liao, J. G., and Ori Rosen (2001), Fast and stable algorithms for computing and sampling from the noncentral hypergeometric distribution, The American Statistician 55, 366–369.
Liu, Jun S. (1996), Metropolized independent sampling with comparisons to rejection sampling and importance sampling, Statistics and Computing 6, 113–119.
Liu, Jun S. (2001), Monte Carlo Strategies in Scientific Computing, Springer-Verlag, New York.
Liu, Jun S.; Rong Chen; and Tanya Logvinenko (2001), A theoretical framework for sequential importance sampling with resampling, Sequential Monte Carlo Methods in Practice (edited by Arnaud Doucet, Nando de Freitas, and Neil Gordon), Springer-Verlag, New York, 225–246.
Liu, Jun S.; Rong Chen; and Wing Hung Wong (1998), Rejection control and sequential importance sampling, Journal of the American Statistical Association 93, 1022–1031.
London, Wendy B., and Chris Gennings (1999), Simulation of multivariate gamma data with exponential marginals for independent clusters, Communications in Statistics — Simulation and Computation 28, 487–500.
Luby, Michael (1996), Pseudorandomness and Cryptographic Applications, Princeton University Press, Princeton.
Lurie, D., and H. O. Hartley (1972), Machine generation of order statistics for Monte Carlo computations, The American Statistician 26(1), 26–27.
Lurie, D., and R. L. Mason (1973), Empirical investigation of general techniques for computer generation of order statistics, Communications in Statistics 2, 363–371.
Lurie, Philip M., and Matthew S. Goldberg (1998), An approximate method for sampling correlated random variables from partially specified distributions, Management Science 44, 203–218.
Lüscher, Martin (1994), A portable high-quality random number generator for lattice field theory simulations, Computer Physics Communications 79, 100–110.
MacEachern, Steven N., and L. Mark Berliner (1994), Subsampling the Gibbs sampler, The American Statistician 48, 188–190.
MacLaren, M. D., and G. Marsaglia (1965), Uniform random number generators, Journal of the ACM 12, 83–89.
Manly, Bryan F. J. (1997), Randomization, Bootstrap and Monte Carlo Methods in Biology, second edition, Chapman & Hall, London.
Marasinghe, Mervyn G., and William J. Kennedy, Jr. (1982), Direct methods for generating extreme characteristic roots of certain random matrices, Communications in Statistics — Simulation and Computation 11, 527–542.
Marinari, Enzo, and G. Parisi (1992), Simulated tempering: A new Monte Carlo scheme, Europhysics Letters 19, 451–458.
Marriott, F. H. C. (1979), Barnard’s Monte Carlo tests: How many simulations?, Applied Statistics 28, 75–78.
Marsaglia, G. (1962), Random variables and computers, Information Theory, Statistical Decision Functions, and Random Processes (edited by J. Kozesnik), Czechoslovak Academy of Sciences, Prague, 499–510.
Marsaglia, G. (1963), Generating discrete random variables in a computer, Communications of the ACM 6, 37–38.
Marsaglia, George (1964), Generating a variable from the tail of a normal distribution, Technometrics 6, 101–102.
Marsaglia, G. (1968), Random numbers fall mainly in the planes, Proceedings of the National Academy of Sciences 61, 25–28.
Marsaglia, G. (1972a), The structure of linear congruential sequences, Applications of Number Theory to Numerical Analysis (edited by S. K. Zaremba), Academic Press, New York, 249–286.
Marsaglia, George (1972b), Choosing a point from the surface of a sphere, Annals of Mathematical Statistics 43, 645–646.
Marsaglia, G. (1977), The squeeze method for generating gamma variates, Computers and Mathematics with Applications 3, 321–325.
Marsaglia, G. (1980), Generating random variables with a t-distribution, Mathematics of Computation 34, 235–236.
Marsaglia, George (1984), The exact-approximation method for generating random variables in a computer, Journal of the American Statistical Association 79, 218–221.
Marsaglia, George (1985), A current view of random number generators, Computer Science and Statistics: 16th Symposium on the Interface (edited by L. Billard), North-Holland, Amsterdam, 3–10.
Marsaglia, George (1991), Normal (Gaussian) random variables for supercomputers, Journal of Supercomputing 5, 49–55.
Marsaglia, George (1995), The Marsaglia Random Number CDROM, including the DIEHARD Battery of Tests of Randomness, Department of Statistics, Florida State University, Tallahassee, Florida. Available at http://stat.fsu.edu/~geo/diehard.html.
Marsaglia, G., and T. A. Bray (1964), A convenient method for generating normal variables, SIAM Review 6, 260–264.
Marsaglia, G.; M. D. MacLaren; and T. A. Bray (1964), A fast method for generating normal random variables, Communications of the ACM 7, 4–10.
Marsaglia, George, and Ingram Olkin (1984), Generating correlation matrices, SIAM Journal on Scientific and Statistical Computing 5, 470–475.
Marsaglia, George, and Wai Wan Tsang (1984), A fast, easily implemented method for sampling from decreasing or symmetric unimodal density functions, SIAM Journal on Scientific and Statistical Computing 5, 349–359.
Marsaglia, George, and Wai Wan Tsang (1998), The Monty Python method for generating random variables, ACM Transactions on Mathematical Software 24, 341–350.
Marsaglia, George, and Liang-Huei Tsay (1985), Matrices and the structure of random number sequences, Linear Algebra and Its Applications 67, 147–156.
Marsaglia, George, and Arif Zaman (1991), A new class of random number generators, The Annals of Applied Probability 1, 462–480.
Marsaglia, George; Arif Zaman; and John C. W. Marsaglia (1994), Rapid evaluation of the inverse normal distribution function, Statistics and Probability Letters 19, 259–266.
Marshall, Albert W., and Ingram Olkin (1967), A multivariate exponential distribution, Journal of the American Statistical Association 62, 30–44.
Marshall, Albert W., and Ingram Olkin (1979), Inequalities — Theory of Majorization and Its Applications, Academic Press, New York.
Mascagni, Michael; M. L. Robinson; Daniel V. Pryor; and Steven A. Cuccaro (1995), Parallel pseudorandom number generation using additive lagged-Fibonacci recursions, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (edited by Harald Niederreiter and Peter Jau-Shyong Shiue), Springer-Verlag, New York, 262–267.
Mascagni, Michael, and Ashok Srinivasan (2000), SPRNG: A scalable library for pseudorandom number generation, ACM Transactions on Mathematical Software 26, 436–461. (Assigned as Algorithm 806, 2000, ibid. 26, 618–619).
Matsumoto, Makoto, and Yoshiharu Kurita (1992), Twisted GFSR generators, ACM Transactions on Modeling and Computer Simulation 2, 179–194.
Matsumoto, Makoto, and Yoshiharu Kurita (1994), Twisted GFSR generators II, ACM Transactions on Modeling and Computer Simulation 4, 245–266.
Matsumoto, Makoto, and Takuji Nishimura (1998), Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Transactions on Modeling and Computer Simulation 8, 3–30.
Maurer, Ueli M. (1992), A universal statistical test for random bit generators, Journal of Cryptology 5, 89–105.
McCullough, B. D. (1999), Assessing the reliability of statistical software: Part II, The American Statistician 53, 149–159.
McKay, Michael D.; William J. Conover; and Richard J. Beckman (1979), A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics 21, 239–245.
McLeod, A. I., and D. R. Bellhouse (1983), A convenient algorithm for drawing a simple random sample, Applied Statistics 32, 182–184.
Mendoza-Blanco, José R., and Xin M. Tu (1997), An algorithm for sampling the degrees of freedom in Bayesian analysis of linear regressions with t-distributed errors, Applied Statistics 46, 383–413.
Mengersen, Kerrie L.; Christian P. Robert; and Chantal Guihenneuc-Jouyaux (1999), MCMC convergence diagnostics: A review, Bayesian Statistics 6 (edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith), Oxford University Press, Oxford, United Kingdom, 415–440.
Metropolis, N.; A. W. Rosenbluth; M. N. Rosenbluth; A. H. Teller; and E. Teller (1953), Equation of state calculations by fast computing machines, Journal of Chemical Physics 21, 1087–1092. (Reprinted in Samuel Kotz and Norman L. Johnson (Editors) (1997), Breakthroughs in Statistics, Volume III, Springer-Verlag, New York, 127–139.)
Meyn, S. P., and R. L. Tweedie (1993), Markov Chains and Stochastic Stability, Springer-Verlag, New York.
Michael, John R.; William R. Schucany; and Roy W. Haas (1976), Generating random variates using transformations with multiple roots, The American Statistician 30, 88–90.
Mihram, George A., and Robert A. Hultquist (1967), A bivariate warning-time/failure-time distribution, Journal of the American Statistical Association 62, 589–599.
Modarres, R., and J. P. Nolan (1994), A method for simulating stable random vectors, Computational Statistics 9, 11–19.
Møller, Jesper, and Katja Schladitz (1999), Extensions of Fill’s algorithm for perfect simulation, Journal of the Royal Statistical Society, Series B 61, 955–969.
Monahan, John F. (1987), An algorithm for generating chi random variables, ACM Transactions on Mathematical Software 13, 168–171 (Corrections, 1988, ibid. 14, 111).
Morel, Jorge G. (1992), A simple algorithm for generating multinomial random vectors with extravariation, Communications in Statistics — Simulation and Computation 21, 1255–1268.
Morgan, B. J. T. (1984), Elements of Simulation, Chapman & Hall, London.
Nagaraja, H. N. (1979), Some relations between order statistics generated by different methods, Communications in Statistics — Simulation and Computation B8, 369–377.
Neal, Radford M. (1996), Sampling from multimodal distributions using tempered transitions, Statistics and Computing 6, 353–366.
Neave, H. R. (1973), On using the Box–Muller transformation with multiplicative congruential pseudo-random number generators, Applied Statistics 22, 92–97.
Newman, M. E. J., and G. T. Barkema (1999), Monte Carlo Methods in Statistical Physics, Oxford University Press, Oxford, United Kingdom.
Niederreiter, H. (1988), Remarks on nonlinear congruential pseudorandom numbers, Metrika 35, 321–328.
Niederreiter, H. (1989), The serial test for congruential pseudorandom numbers generated by inversions, Mathematics of Computation 52, 135–144.
Niederreiter, Harald (1992), Random Number Generation and Quasi-Monte Carlo Methods, Society for Industrial and Applied Mathematics, Philadelphia.
Niederreiter, Harald (1993), Factorization of polynomials and some linear-algebra problems over finite fields, Linear Algebra and Its Applications 192, 301–328.
Niederreiter, Harald (1995a), The multiple-recursive matrix method for pseudorandom number generation, Finite Fields and Their Applications 1, 3–30.
Niederreiter, Harald (1995b), Pseudorandom vector generation by the multiple-recursive matrix method, Mathematics of Computation 64, 279–294.
Niederreiter, Harald (1995c), New developments in uniform pseudorandom number and vector generation, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (edited by Harald Niederreiter and Peter Jau-Shyong Shiue), Springer-Verlag, New York, 87–120.
Niederreiter, Harald (1995d), Some linear and nonlinear methods for pseudorandom number generation, Proceedings of the 1995 Winter Simulation Conference, Association for Computing Machinery, New York, 250–254.
Niederreiter, Harald; Peter Hellekalek; Gerhard Larcher; and Peter Zinterhof (Editors) (1998), Monte Carlo and Quasi-Monte Carlo Methods 1996, Springer-Verlag, New York.
Niederreiter, Harald, and Peter Jau-Shyong Shiue (Editors) (1995), Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, Springer-Verlag, New York.
Niederreiter, Harald, and Jerome Spanier (Editors) (1999), Monte Carlo and Quasi-Monte Carlo Methods 1998, Springer-Verlag, New York.
NIST (2000), A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication 800-22, National Institute of Standards and Technology, Gaithersburg, Maryland.
Nolan, John P. (1998a), Multivariate stable distributions: Approximation, estimation, simulation and identification, A Practical Guide to Heavy Tails: Statistical Techniques and Applications (edited by Robert J. Adler, Raisa E. Feldman, and Murad S. Taqqu), Birkhäuser, Boston, 509–526.
Nolan, John P. (1998b), Univariate stable distributions: Parameterizations and software, A Practical Guide to Heavy Tails: Statistical Techniques and Applications (edited by Robert J. Adler, Raisa E. Feldman, and Murad S. Taqqu), Birkhäuser, Boston, 527–533.
Norman, J. E., and L. E. Cannon (1972), A computer program for the generation of random variables from any discrete distribution, Journal of Statistical Computation and Simulation 1, 331–348.
Odell, P. L., and A. H. Feiveson (1966), A numerical procedure to generate a sample covariance matrix, Journal of the American Statistical Association 61, 199–203.
Ogata, Yosihiko (1990), A Monte Carlo method for an objective Bayesian procedure, Annals of the Institute of Statistical Mathematics 42, 403–433.
Oh, Man-Suk, and James O. Berger (1993), Integration of multimodal functions by Monte Carlo importance sampling, Journal of the American Statistical Association 88, 450–456.
Øksendal, Bernt (1998), Stochastic Differential Equations. An Introduction with Applications, fifth edition, Springer-Verlag, Berlin.
Ökten, Giray (1998), Error estimates for quasi-Monte Carlo methods, Monte Carlo and Quasi-Monte Carlo Methods 1996 (edited by Harald Niederreiter, Peter Hellekalek, Gerhard Larcher, and Peter Zinterhof), Springer-Verlag, New York, 353–358.
Olken, Frank, and Doron Rotem (1995a), Random sampling from databases: A survey, Statistics and Computing 5, 25–42.
Olken, Frank, and Doron Rotem (1995b), Sampling from spatial databases, Statistics and Computing 5, 43–57.
Owen, A. B. (1992a), A central limit theorem for Latin hypercube sampling, Journal of the Royal Statistical Society, Series B 54, 541–551.
Owen, A. B. (1992b), Orthogonal arrays for computer experiments, integration and visualization, Statistica Sinica 2, 439–452.
Owen, A. B. (1994a), Lattice sampling revisited: Monte Carlo variance of means over randomized orthogonal arrays, Annals of Statistics 22, 930–945.
Owen, Art B. (1994b), Controlling correlations in Latin hypercube samples, Journal of the American Statistical Association 89, 1517–1522.
Owen, Art B. (1995), Randomly permuted (t, m, s)-nets and (t, s)-sequences, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (edited by Harald Niederreiter and Peter Jau-Shyong Shiue), Springer-Verlag, New York, 299–317.
Owen, Art B. (1997), Scrambled net variance for integrals of smooth functions, Annals of Statistics 25, 1541–1562.
Owen, Art B. (1998), Latin supercube sampling for very high-dimensional simulations, ACM Transactions on Modeling and Computer Simulation 8, 71–102.
Papageorgiou, A., and J. F. Traub (1996), Beating Monte Carlo, Risk (June), 63–65.
Park, Chul Gyu; Tasung Park; and Dong Wan Shin (1996), A simple method for generating correlated binary variates, The American Statistician 50, 306–310.
Park, Stephen K., and Keith W. Miller (1988), Random number generators: Good ones are hard to find, Communications of the ACM 31, 1192–1201.
Parrish, Rudolph S. (1990), Generating random deviates from multivariate Pearson distributions, Computational Statistics & Data Analysis 9, 283–295.
Patefield, W. M. (1981), An efficient method of generating r × c tables with given row and column totals, Applied Statistics 30, 91–97.
Pearson, E. S.; N. L. Johnson; and I. W. Burr (1979), Comparisons of the percentage points of distributions with the same first four moments, chosen from eight different systems of frequency curves, Communications in Statistics — Simulation and Computation 8, 191–230.
Perlman, Michael D., and Michael J. Wichura (1975), Sharpening Buffon’s needle, The American Statistician 29, 157–163.
Peterson, Arthur V., and Richard A. Kronmal (1982), On mixture methods for the computer generation of random variables, The American Statistician 36, 184–191.
Philippe, Anne (1997), Simulation of right and left truncated gamma distributions by mixtures, Statistics and Computing 7, 173–181.
Pratt, John W. (1981), Concavity of the log likelihood, Journal of the American Statistical Association 76, 103–106.
Press, William H.; Saul A. Teukolsky; William T. Vetterling; and Brian P. Flannery (1992), Numerical Recipes in Fortran, second edition, Cambridge University Press, Cambridge, United Kingdom.
Press, William H.; Saul A. Teukolsky; William T. Vetterling; and Brian P. Flannery (2002), Numerical Recipes in C++, second edition, Cambridge University Press, Cambridge, United Kingdom.
Propp, James Gary, and David Bruce Wilson (1996), Exact sampling with coupled Markov chains and applications to statistical mechanics, Random Structures and Algorithms 9, 223–252.
Propp, James, and David Wilson (1998), Coupling from the past: A user’s guide, Microsurveys in Discrete Probability (edited by D. Aldous and J. Propp), American Mathematical Society, Providence, Rhode Island, 181–192.
Pullin, D. I. (1979), Generation of normal variates with given sample mean and variance, Journal of Statistical Computation and Simulation 9, 303–309.
Rabinowitz, M., and M. L. Berenson (1974), A comparison of various methods of obtaining random order statistics for Monte-Carlo computations, The American Statistician 28, 27–29.
Rajasekaran, Sanguthevar, and Keith W. Ross (1993), Fast algorithms for generating discrete random variates with changing distributions, ACM Transactions on Modeling and Computer Simulation 3, 1–19.
Ramberg, John S., and Bruce W. Schmeiser (1974), An approximate method for generating asymmetric random variables, Communications of the ACM 17, 78–82.
RAND Corporation (1955), A Million Random Digits with 100,000 Normal Deviates, Free Press, Glencoe, Illinois.
Ratnaparkhi, M. V. (1981), Some bivariate distributions of (X, Y) where the conditional distribution of Y, given X, is either beta or unit-gamma, Statistical Distributions in Scientific Work, Volume 4 – Models, Structures, and Characterizations (edited by Charles Taillie, Ganapati P. Patil, and Bruno A. Baldessari), D. Reidel Publishing Company, Boston, 389–400.
Reeder, H. A. (1972), Machine generation of order statistics, The American Statistician 26(4), 56–57.
Relles, Daniel A. (1972), A simple algorithm for generating binomial random variables when N is large, Journal of the American Statistical Association 67, 612–613.
Ripley, Brian D. (1987), Stochastic Simulation, John Wiley & Sons, New York.
Robert, Christian P. (1995), Simulation of truncated normal variables, Statistics and Computing 5, 121–125.
Robert, Christian P. (1998a), A pathological MCMC algorithm and its use as a benchmark for convergence assessment techniques, Computational Statistics 13, 169–184.
Robert, Christian P. (Editor) (1998b), Discretization and MCMC Convergence Assessment, Springer-Verlag, New York.
Robert, Christian P., and George Casella (1999), Monte Carlo Statistical Methods, Springer-Verlag, New York.
Roberts, G. O. (1992), Convergence diagnostics of the Gibbs sampler, Bayesian Statistics 4 (edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith), Oxford University Press, Oxford, United Kingdom, 775–782.
Roberts, Gareth O. (1996), Markov chain concepts related to sampling algorithms, Practical Markov Chain Monte Carlo (edited by W. R. Gilks, S. Richardson, and D. J. Spiegelhalter), Chapman & Hall, London, 45–57.
Robertson, J. M., and G. R. Wood (1998), Information in Buffon experiments, Journal of Statistical Planning and Inference 66, 21–37.
Ronning, Gerd (1977), A simple scheme for generating multivariate gamma distributions with non-negative covariance matrix, Technometrics 19, 179–183.
Rosenbaum, Paul R. (1993), Sampling the leaves of a tree with equal probabilities, Journal of the American Statistical Association 88, 1455–1457.
Rosenthal, Jeffrey S. (1995), Minorization conditions and convergence rates for Markov chain Monte Carlo, Journal of the American Statistical Association 90, 558–566.
Rousseeuw, Peter J., and Annick M. Leroy (1987), Robust Regression and Outlier Detection, John Wiley & Sons, New York.
Rubin, Donald B. (1987), Comment on Tanner and Wong, “The calculation of posterior distributions by data augmentation”, Journal of the American Statistical Association 82, 543–546.
Rubin, Donald B. (1988), Using the SIR algorithm to simulate posterior distributions (with discussion), Bayesian Statistics 3 (edited by J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith), Oxford University Press, Oxford, United Kingdom, 395–402.
Ryan, Thomas P. (1980), A new method of generating correlation matrices, Journal of Statistical Computation and Simulation 11, 79–85.
Sacks, Jerome; William J. Welch; Toby J. Mitchell; and Henry P. Wynn (1989), Design and analysis of computer experiments (with discussion), Statistical Science 4, 409–435.
Sarkar, P. K., and M. A. Prasad (1987), A comparative study of pseudo and quasirandom sequences for the solution of integral equations, Journal of Computational Physics 68, 66–88.
Sarkar, Tapas K. (1996), A composition-alias method for generating gamma variates with shape parameter greater than 1, ACM Transactions on Mathematical Software 22, 484–492.
Särndal, Carl-Erik; Bengt Swensson; and Jan Wretman (1992), Model Assisted Survey Sampling, Springer-Verlag, New York.
Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, Chapman & Hall, London.
Schervish, Mark J., and Bradley P. Carlin (1992), On the convergence of successive substitution sampling, Journal of Computational and Graphical Statistics 1, 111–127.
Schmeiser, Bruce (1983), Recent advances in generation of observations from discrete random variates, Computer Science and Statistics: The Interface (edited by James E. Gentle), North-Holland Publishing Company, Amsterdam, 154–160.
Schmeiser, Bruce, and A. J. G. Babu (1980), Beta variate generation via exponential majorizing functions, Operations Research 28, 917–926.
Schmeiser, Bruce, and Voratas Kachitvichyanukul (1990), Noninverse correlation induction: Guidelines for algorithm development, Journal of Computational and Applied Mathematics 31, 173–180.
Schmeiser, Bruce, and R. Lal (1980), Squeeze methods for generating gamma variates, Journal of the American Statistical Association 75, 679–682.
Schucany, W. R. (1972), Order statistics in simulation, Journal of Statistical Computation and Simulation 1, 281–286.
Selke, W.; A. L. Talapov; and L. N. Shchur (1993), Cluster-flipping Monte Carlo algorithm and correlations in “good” random number generators, JETP Letters 58, 665–668.
Shao, Jun, and Dongsheng Tu (1995), The Jackknife and Bootstrap, Springer-Verlag, New York.
Shaw, J. E. H. (1988), A quasi-random approach to integration in Bayesian statistics, Annals of Statistics 16, 895–914.
Shchur, Lev N., and Henk W. J. Blöte (1997), Cluster Monte Carlo: Scaling of systematic errors in the two-dimensional Ising model, Physical Review E 55, R4905–R4908.
Sibuya, M. (1961), Exponential and other variable generators, Annals of the Institute of Statistical Mathematics 13, 231–237.
Sinclair, C. D., and B. D. Spurr (1988), Approximations to the distribution function of the Anderson–Darling test statistic, Journal of the American Statistical Association 83, 1190–1191.
Smith, A. F. M., and G. O. Roberts (1993), Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B 55, 3–24.
Smith, Robert L. (1984), Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions, Operations Research 32, 1297–1308.
Smith, W. B., and R. R. Hocking (1972), Algorithm AS53: Wishart variate generator, Applied Statistics 21, 341–345.
Sobol’, I. M. (1967), On the distribution of points in a cube and the approximate evaluation of integrals, USSR Computational Mathematics and Mathematical Physics 7, 86–112.
Sobol’, I. M. (1976), Uniformly distributed sequences with an additional uniform property, USSR Computational Mathematics and Mathematical Physics 16, 236–242.
Spanier, Jerome, and Keith B. Oldham (1987), An Atlas of Functions, Hemisphere Publishing Corporation, Washington (also Springer-Verlag, Berlin).
Srinivasan, Ashok; Michael Mascagni; and David Ceperley (2003), Testing parallel random number generators, Parallel Computing 29, 69–94.
Stacy, E. W. (1962), A generalization of the gamma distribution, Annals of Mathematical Statistics 33, 1187–1191.
Stadlober, Ernst (1990), The ratio of uniforms approach for generating discrete random variates, Journal of Computational and Applied Mathematics 31, 181–189.
Stadlober, Ernst (1991), Binomial variate generation: A method based on ratio of uniforms, The Frontiers of Statistical Computation, Simulation & Modeling (edited by P. R. Nelson, E. J. Dudewicz, A. Öztürk, and E. C. van der Meulen), American Sciences Press, Columbus, Ohio, 93–112.
Steel, S. J., and N. J. le Roux (1987), A reparameterisation of a bivariate gamma extension, Communications in Statistics — Theory and Methods 16, 293–305.
Stefănescu, S., and I. Văduva (1987), On computer generation of random vectors by transformations of uniformly distributed vectors, Computing 39, 141–153.
Stein, Michael (1987), Large sample properties of simulations using Latin hypercube sampling, Technometrics 29, 143–151.
Stephens, Michael A. (1986), Tests based on EDF statistics, Goodness-of-Fit Techniques (edited by Ralph B. D’Agostino and Michael A. Stephens), Marcel Dekker, New York, 97–193.
Stewart, G. W. (1980), The efficient generation of random orthogonal matrices with an application to condition estimators, SIAM Journal on Numerical Analysis 17, 403–409.
Stigler, Stephen M. (1978), Mathematical statistics in the early states, Annals of Statistics 6, 239–265.
Stigler, Stephen M. (1991), Stochastic simulation in the nineteenth century, Statistical Science 6, 89–97.
Student (1908a), On the probable error of a mean, Biometrika 6, 1–25.
Student (1908b), Probable error of a correlation coefficient, Biometrika 6, 302–310.
Sullivan, Stephen J. (1993), Another test for randomness, Communications of the ACM 33, Number 7 (July), 108.
Tadikamalla, Pandu R. (1980a), Random sampling from the exponential power distribution, Journal of the American Statistical Association 75, 683–686.
Tadikamalla, Pandu R. (1980b), On simulating non-normal distributions, Psychometrika 45, 273–279.
Tadikamalla, Pandu R., and Norman L. Johnson (1982), Systems of frequency curves generated by transformations of logistic variables, Biometrika 69, 461–465.
Takahasi, K. (1965), Note on the multivariate Burr’s distribution, Annals of the Institute of Statistical Mathematics 17, 257–260.
Tang, Boxin (1993), Orthogonal array-based Latin hypercubes, Journal of the American Statistical Association 88, 1392–1397.
Tanner, Martin A. (1996), Tools for Statistical Inference, third edition, Springer-Verlag, New York.
Tanner, M. A., and R. A. Thisted (1982), A remark on AS127: Generation of random orthogonal matrices, Applied Statistics 31, 190–192.
Tanner, Martin A., and Wing Hung Wong (1987), The calculation of posterior distributions by data augmentation (with discussion), Journal of the American Statistical Association 82, 528–549.
Tausworthe, R. C. (1965), Random numbers generated by linear recurrence modulo two, Mathematics of Computation 19, 201–209.
Taylor, Malcolm S., and James R. Thompson (1986), Data based random number generation for a multivariate distribution via stochastic simulation, Computational Statistics & Data Analysis 4, 93–101.
Tezuka, Shu (1991), Neave effect also occurs with Tausworthe sequences, Proceedings of the 1991 Winter Simulation Conference, Association for Computing Machinery, New York, 1030–1034.
Tezuka, Shu (1993), Polynomial arithmetic analogue of Halton sequences, ACM Transactions on Modeling and Computer Simulation 3, 99–107.
Tezuka, Shu (1995), Uniform Random Numbers: Theory and Practice, Kluwer Academic Publishers, Boston.
Tezuka, Shu, and Pierre L’Ecuyer (1992), Analysis of add-with-carry and subtract-with-borrow generators, Proceedings of the 1992 Winter Simulation Conference, Association for Computing Machinery, New York, 443–447.
Tezuka, Shu; Pierre L’Ecuyer; and R. Couture (1994), On the lattice structure of the add-with-carry and subtract-with-borrow random number generators, ACM Transactions on Modeling and Computer Simulation 3, 315–331.
Thomas, Andrew; David J. Spiegelhalter; and Wally R. Gilks (1992), BUGS: A program to perform Bayesian inference using Gibbs sampling, Bayesian Statistics 4 (edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith), Oxford University Press, Oxford, United Kingdom, 837–842.
Thompson, James R. (2000), Simulation: A Modeler’s Approach, John Wiley & Sons, New York.
Thompson, William J. (1997), Atlas for Computing Mathematical Functions: An Illustrated Guide for Practitioners with Programs in C and Mathematica, John Wiley & Sons, New York.
Tierney, Luke (1991), Exploring posterior distributions using Markov chains, Computer Science and Statistics: Proceedings of the Twenty-third Symposium on the Interface (edited by Elaine M. Keramidas), Interface Foundation of North America, Fairfax, Virginia, 563–570.
Tierney, Luke (1994), Markov chains for exploring posterior distributions (with discussion), Annals of Statistics 22, 1701–1762.
Tierney, Luke (1996), Introduction to general state-space Markov chain theory, Practical Markov Chain Monte Carlo (edited by W. R. Gilks, S. Richardson, and D. J. Spiegelhalter), Chapman & Hall, London, 59–74.
Vale, C. David, and Vincent A. Maurelli (1983), Simulating multivariate nonnormal distributions, Psychometrika 48, 465–471.
Vattulainen, I. (1999), Framework for testing random numbers in parallel calculations, Physical Review E 59, 7200–7204.
Vattulainen, I.; T. Ala-Nissila; and K. Kankaala (1994), Physical tests for random numbers in simulations, Physical Review Letters 73, 2513–2516.
Vattulainen, I.; T. Ala-Nissila; and K. Kankaala (1995), Physical models as tests for randomness, Physical Review E 52, 3205–3214.
Vattulainen, I.; K. Kankaala; J. Saarinen; and T. Ala-Nissila (1995), A comparative study of some pseudorandom number generators, Computer Physics Communications 86, 209–226.
Vitter, J. S. (1984), Faster methods for random sampling, Communications of the ACM 27, 703–717.
Vitter, Jeffrey Scott (1985), Random sampling with a reservoir, ACM Transactions on Mathematical Software 11, 37–57.
Von Neumann, J. (1951), Various Techniques Used in Connection with Random Digits, NBS Applied Mathematics Series 12, National Bureau of Standards (now National Institute of Standards and Technology), Washington.
Vose, Michael D. (1991), A linear algorithm for generating random numbers with a given distribution, IEEE Transactions on Software Engineering 17, 972–975.
Wakefield, J. C.; A. E. Gelfand; and A. F. M. Smith (1991), Efficient generation of random variates via the ratio-of-uniforms method, Statistics and Computing 1, 129–133.
Walker, A. J. (1977), An efficient method for generating discrete random variables with general distributions, ACM Transactions on Mathematical Software 3, 253–256.
Wallace, C. S. (1976), Transformed rejection generators for gamma and normal pseudo-random variables, Australian Computer Journal 8, 103–105.
Wallace, C. S. (1996), Fast pseudorandom generators for normal and exponential variates, ACM Transactions on Mathematical Software 22, 119–127.
Wichmann, B. A., and I. D. Hill (1982), Algorithm AS183: An efficient and portable pseudo-random number generator, Applied Statistics 31, 188–190 (Corrections, 1984, ibid. 33, 123).
Wikramaratna, R. S. (1989), ACORN — A new method for generating sequences of uniformly distributed pseudo-random numbers, Journal of Computational Physics 83, 16–31.
Wilson, David Bruce, and James Gary Propp (1996), How to get an exact sample from a generic Markov chain and sample a random spanning tree from a directed graph, both within the cover time, Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York, 448–457.
Wolfram, Stephen (1984), Random sequence generation by cellular automata, Advances in Applied Mathematics 7, 123–169. (Reprinted in Wolfram, 1994.)
Wolfram, Stephen (1994), Cellular Automata and Complexity. Collected Papers, Addison–Wesley Publishing Company, Reading, Massachusetts.
Wolfram, Stephen (2002), A New Kind of Science, Wolfram Media, Inc., Champaign, Illinois.
Wollan, Peter C. (1992), A portable random number generator for parallel computers, Communications in Statistics — Simulation and Computation 21, 1247–1254.
Wu, Pei-Chi (1997), Multiplicative, congruential random-number generators with multiplier ±2^k1 ± 2^k2 and modulus 2^p − 1, ACM Transactions on Mathematical Software 23, 255–265.
Yu, Bin (1995), Comment on Besag et al., “Bayesian computation and stochastic systems”: Extracting more diagnostic information from a single run using cusum path plot, Statistical Science 10, 54–58.
Zaremba, S. K. (Editor) (1972), Applications of Number Theory to Numerical Analysis, Academic Press, New York.
Zeisel, H. (1986), A remark on Algorithm AS183: An efficient and portable pseudo-random number generator, Applied Statistics 35, 89.
Zierler, Neal, and John Brillhart (1968), On primitive trinomials (mod 2), Information and Control 13, 541–554.
Zierler, Neal, and John Brillhart (1969), On primitive trinomials (mod 2), II, Information and Control 14, 566–569.
Ziff, Robert M. (1998), Four-tap shift-register-sequence random-number generators, Computers in Physics 12, 385–392.
Ziv, J., and A. Lempel (1977), A universal algorithm for sequential data compression, IEEE Transactions on Information Theory 23, 337–343.
Author Index

Abramowitz, Milton, 175, 332
Afflerbach, Lothar, 35, 66, 133
Agarwal, Satish K., 183
Agresti, Alan, 252
Ahn, Hongshik, 188, 204
Ahrens, Joachim H., 125, 132, 173, 177, 179, 188, 218
Akima, Hirosha, 109
Al-Saleh, Jamal A., 183
Ala-Nissila, T., 21, 41, 79, 86, 260
Albert, James, 194
Alonso, Laurent, 219
Altman, N. S., 34
Aluru, Srinivas, 43
Anderson, N. H., 26
Anderson, T. W., 201, 209
Andrews, David F., 300, 301
Antonov, I. A., 96
Arnason, A. N., 205
Arnold, Barry C., 170, 192
Asau, Y., 105, 107
Atkinson, A. C., 66, 180, 183, 193
Avramidis, Athanassios N., 221, 249
Babu, A. J. G., 183
Bacon-Shone, J., 194
Bailey, David H., 44, 91
Bailey, Ralph W., 185
Balakrishnan, N., 203, 223, 327
Banerjia, Sanjeev, 202
Baniuk, L., 205
Banks, David L., 80, 85
Barkema, G. T., 229, 260, 261
Barnard, G. A., 251
Barndorff-Nielsen, Ole E., 193, 270
Bays, Carter, 22
Beaver, Robert J., 170
Beck, J., 97
Becker, P. J., 123, 208
Becker, Richard A., 291
Beckman, Richard J., 249
Bélisle, Claude J. P., 158, 197
Bellhouse, D. R., 219
Bendel, R. B., 200
Bentley, Jon Louis, 212
Berbee, H. C. P., 158
Berenson, M. L., 222
Berger, James O., 243
Berliner, L. Mark, 157
Best, D. J., 179, 192
Best, N. G., 153
Beyer, W. A., 66
Bhanot, Gyan, 143
Bickel, Peter J., 301
Birkes, David, 304
Blöte, Henk W. J., 260
Blouin, François, 32, 287
Blum, L., 4, 37
Blum, M., 4, 37
Boender, G. E., 158
Bouleau, Nicolas, 99
Boyar, J., 4
Boyett, J. M., 202
Boyle, Phelim P., 98
Braaten, E., 95, 98, 239
Bratley, Paul, 97, 98, 172, 296, 334
Bray, T. A., 173, 174, 176
Brillhart, John, 39
Bromberg, Judith, 198
Brooks, S. P., 146
Brophy, John F., 30
Brown, Morton B., 198
Buckheit, Jonathan B., 299
Buckle, D. J., 196
Burr, Irving W., 194, 195
Cabrera, Javier, 20
Caflisch, Russel E., 243
Cannon, L. E., 105, 106, 107
Carlin, Bradley P., 146, 157, 158, 256
Carlin, John B., 256
Carta, David G., 21
Casella, George, 149, 156, 251, 334
Ceperley, David, 87
Chalmers, C. P., 200
Chamayou, J.-F., 196
Chambers, John M., 196, 291
Chan, Kwok Hung, 52, 53
Chen, H. C., 105, 107
Chen, Huifen, 225
Chen, James J., 188, 204
Chen, K. S., 194
Chen, Ming-Hui, 157, 158, 256
Chen, Rong, 244, 273
Chen, W. W. L., 97
Cheng, R. C. H., 178, 184, 248
Cheng, Shiow-Wen, 210, 221
Chernick, Michael R., 255
Chib, Siddhartha, 143
Chou, Wun-Seng, 37
Chou, Youn-Min, 194
Cipra, Barry A., 260
Cislak, Peter J., 194
Coldwell, R. L., 20, 71, 87
Collings, Bruce Jay, 46
Compagner, Aaldert, 42
Conover, William J., 249
Cook, Dianne A., 20
Cook, R. Dennis, 209
Cordeau, Jean-François, 21, 67
Couture, Raymond, 32, 36, 287
Coveyou, R. R., 20, 65
Cowles, Mary Kathryn, 146, 158
Crandall, Richard E., 44, 91
Cuccaro, Steven A., 33, 87
Currin, Carla, 257
D’Agostino, Ralph B., 76
Dagpunar, John S., 181, 192, 193, 207, 334
Damien, Paul, 150, 168, 175, 182
David, Herbert A., 222, 227
Davis, Charles S., 198
Davis, Don, 3
Davison, Anthony C., 255
de Freitas, Nando, 234
De Matteis, A., 47, 70
Deák, István, 127, 197, 334
Delampady, Mohan, 194
Dellaportas, Petros, 151, 158
Deng, Lih-Yuan, 21, 32, 34, 49, 52, 53, 61
Derflinger, Gerhard, 122, 133
Devroye, Luc, 121, 126, 136, 137, 151, 154, 159, 171, 192, 194, 195, 196, 213, 334, vii
Dieter, Ulrich, 18, 65, 132, 173, 177, 179, 188, 218
Do, Kim-Anh, 98
Dodge, Yadolah, 43, 304
Donoho, David L., 299
Doucet, Arnaud, 234
Dudewicz, Edward J., 194
Durham, S. D., 22
Dwyer, Rex A., 202
Efron, Bradley, 255
Eichenauer, Jürgen, 36, 38, 66
Eichenauer-Herrmann, Jürgen, 37, 38, 66, 70
Emrich, Lawrence J., 203, 204, 214
Epstein, Peter, 213
Erber, T., 45
Ernst, Michael D., 207
Evans, Michael, 233
Everett, P., 45
Everitt, Brian S., 183
Everson, Philip J., 199
Falk, Michael, 207
Fang, Kai-Tai, 7, 47, 97, 201, 209, 334
Faure, H., 95
Feast, G. M., 178
Feiveson, A. H., 199
Fenstermacher, Philip, 3
Ferrenberg, Alan M., 21, 41, 86
Fill, James Allen, 148, 149
Finkel, Raphael Ari, 212
Fisher, N. I., 192
Fishman, George S., 20, 21, 58, 65, 79, 288, 334
Flannery, Brian P., 287
Fleishman, Allen I., 195, 210
Flournoy, Nancy, 233
Forster, Jonathan J., 252
Fouque, Jean-Pierre, 270
Fox, Bennett L., 97, 98, 172, 296, 334
Frederickson, P., 26
Freimer, Marshall, 194
Freund, John E., 123
Friedman, Jerome H., 212
Frigessi, A., 147
Fuller, A. T., 12
Fushimi, Masanori, 41, 288
Gamerman, Dani, 146
Gange, Stephen J., 208
Gelatt, C. D., 259, 278
Gelfand, Alan E., 130, 133, 146, 157, 256
Gelman, Andrew, 146, 150, 233, 256
Geman, Donald, 155
Geman, Stuart, 155
Gennings, Chris, 208
Gentle, James E., 6, 28, 30, 55, 59, 87, 251
Gentleman, Robert, 291
George, E. Olusegun, 49
George, Edward I., 149, 156, 158
Gerontidis, I., 222
Geweke, John, 175, 198, 256
Geyer, Charles J., 154, 157
Ghitany, M. E., 183
Gilks, Walter R., 144, 146, 151, 153, 158, 256
Gleser, Leon Jay, 200
Goldberg, Matthew S., 209, 210
Golder, E. R., 172, 185
Golomb, S. W., 40, 43
Goodman, A. S., 21, 288
Gordon, J., 37
Gordon, Neil J., 234, 244
Gosset, W. S. (“Student”), 297
Grafton, R. G. T., 78
Greenberg, Edward, 143
Greenwood, J. Arthur, 161, 220
Griffiths, P., 333
Groeneveld, Richard A., 170
Grothe, Holger, 35, 36, 38, 66, 70
Guerra, Victor O., 109
Guihenneuc-Jouyaux, Chantal, 146
Gustafson, John, 43
Haas, Roy W., 193
Halton, J. H., 94
Hamilton, Kenneth G., 177
Hammersley, J. M., 229, 271, 299
Hammond, Joseph L., 209
Hampel, Frank R., 301
Handscomb, D. C., 229, 271, 299
Harris, D. L., 199
Hartley, H. O., 199, 221
Hastings, W. K., 141
Heiberger, Richard M., 201
Hellekalek, Peter, 21, 95, 334
Henson, S., 194
Herrmann, Eva, 38
Hesterberg, Timothy C., 243, 245
Hickernell, Fred J., 99, 334
Hill, I. D., 47, 55, 194, 333
Hill, R., 194
Hinkley, David V., 255
Hiromoto, R., 26
Hoaglin, David C., 300
Hocking, R. R., 199
Holder, R. L., 194
Hope, A. C. A., 251
Hopkins, T. R., 65
Hörmann, Wolfgang, 122, 133, 152, 159
Hosack, J. M., 45
Huber, Peter J., 20, 301
Hull, John C., 264, 268
Hultquist, Robert A., 208
Ibrahim, Joseph G., 256
Ickstadt, K., 37
Ihaka, Ross, 3, 291
Ireland, Kenneth, 7, 9, 12
Jäckel, Peter, 97, 100, 270
Jaditz, Ted, 44
James, F., 20, 45, 58
Jöhnk, M. D., 183
Johnson, Mark E., 197, 209
Johnson, Norman L., 195, 203, 327
Johnson, P. W., 45
Johnson, Valen E., 146
Jones, G., 208
Jordan, T. L., 26
Joy, Corwin, 98
Juneja, Sandeep, 225
Kachitvichyanukul, Voratas, 187, 188, 189, 210, 221, 246
Kahn, H., 239
Kankaala, K., 21, 41, 79, 86, 260
Kao, Chiang, 33
Karian, Zaven A., 194
Kato, Takashi, 38, 78
Kemp, Adrienne W., 108, 118, 159, 188, 190
Kemp, C. D., 159, 187, 188
Kennedy, William J., 201
Kinderman, A. J., 129, 173, 185
Kirkpatrick, Scott, 41, 259, 278, 287
Kleijnen, Jack P. C., 310
Knuth, Donald E., 12, 32, 37, 53, 65, 118, 219, 334
Kobayashi, K., 183
Kocis, Ladislav, 95
Koehler, J. R., 257
Kollia, Georgia, 194
Kotz, Samuel, 203, 327
Kovalenko, I. N., 79
Kozubowski, Tomasz J., 207
Krawczyk, Hugo, 4
Krommer, Arnold R., 95
Kronmal, Richard A., 125, 135, 136, 191
Kumada, Toshihiro, 39
Kurita, Yoshiharu, 39, 41, 42
Lagarias, Jeffrey C., 4
Lai, C. D., 208
Lal, R., 178
Landau, D. P., 21, 41, 86
Larcher, Gerhard, 334
Laud, Purushottam W., 150, 183
Lawrance, A. J., 11
Le Roux, N. J., 123
Learmonth, G. P., 21, 46, 291
L’Ecuyer, Pierre, 14, 21, 29, 32, 36, 37, 41, 47, 48, 55, 57, 63, 65, 67, 80, 85, 287, 334
Lee, A. J., 205
Leeb, Hannes, 37, 39
Lehmer, D. H., 11
Lehn, Jürgen, 36, 38
Lempel, A., 84
Lépingle, Dominique, 99
Leva, Joseph L., 174
Lewis, P. A. W., 21, 46, 55, 58, 225, 288, 291, 334
Lewis, T. G., 40, 41
Leydold, Josef, 132, 133, 153, 159
Li, Jing, 30
Li, Kim-Hung, 219
Li, Run-Ze, 97, 201
Li, Shing Ted, 209
Liao, J. G., 190
Lin, Dennis K. J., 21, 32, 34, 49, 61
Lin, Thomas C., 194
Liu, Jun S., 144, 230, 244, 273, 334
Logvinenko, Tanya, 273
London, Wendy B., 208
Louis, Thomas A., 256
Luby, Michael, 3, 4
Lurie, D., 221, 222
Lurie, Philip M., 209, 210
Lüscher, Martin, 45
MacEachern, Steven N., 157
Machida, Motoya, 149
MacLaren, M. D., 21, 46, 173, 174, 176
MacPherson, R. D., 20, 65
Mallows, C. L., 196
Manly, Bryan F. J., 252
Marasinghe, Mervyn G., 201
Marinari, Enzo, 261
Marriott, F. H. C., 251
Marsaglia, George, 14, 17, 20, 21, 35, 43, 46, 49, 66, 79, 80, 83, 85, 105, 117, 118, 121, 127, 154, 173, 174, 175, 176, 185, 200, 202
Marsaglia, John C. W., 174
Marshall, A. W., 239
Marshall, Albert W., 49, 207
Martinelli, F., 147
Mascagni, Michael, 33, 53, 87
Mason, R. L., 222
Matsumoto, Makoto, 39, 41, 42
Maurelli, Vincent A., 210
Maurer, Ueli M., 84
McCullough, B. D., 83, 291
McDonald, John W., 252
McDonald, Patrick, 194
McKay, Michael D., 249
McLeod, A. I., 219
Meeker, William Q., 170
Mendoza-Blanco, José R., 186
Meng, Xiao-Li, 233
Mengersen, Kerrie L., 146
Metropolis, N., 140, 259, 277
Meyer, D., 194
Meyn, S. P., 137, 225
Michael, John R., 193
Mickey, M. R., 200
Mihram, George Arthur, 208
Miller, J. M., 21, 288
Miller, Keith W., 20, 28, 61, 86, 288
Mitchell, Toby J., 248, 257
Modarres, R., 208
Møller, Jesper, 148
Monahan, John F., 129, 185
Moore, Louis R., III, 20, 21, 58, 65, 79, 288
Morgan, B. J. T., 334
Morris, Carl N., 199
Morris, Max, 257
Moskowitz, Bradley, 243
Mudholkar, Govind S., 194
Murdoch, Duncan J., 149
Nagaraja, H. N., 222
Neal, N. G., 153
Neal, Radford M., 155
Neave, H. R., 172, 185
Nelson, Barry L., 245
Newman, M. E. J., 229, 260, 261
Niederreiter, Harald, 35, 36, 37, 38, 66, 94, 97, 98, 100, 296, 334
Nishimura, Takuji, 42
Nolan, John P., 196, 208
Norman, J. E., 105, 106, 107
Odell, P. L., 199
Ogata, Yosihiko, 233
Oh, Man-Suk, 243
Ökten, Giray, 99, 239
Oldham, Keith B., 332
Olken, Frank, 219
Olkin, Ingram, 49, 200, 201, 207
Orav, E. J., 55, 58, 334
Owen, Art B., 239, 249, 257
Pagnutti, S., 47, 70
Papageorgiou, A., 97
Papanicolaou, George, 270
Parisi, G., 261
Park, Chul Gyu, 204, 214
Park, Stephen K., 20, 28, 61, 86, 288
Park, Tasung, 204, 214
Parrish, Rudolph S., 208, 210
Patefield, W. M., 202, 203
Payne, W. H., 40, 41
Pearce, M. C., 180
Pearson, E. S., 195
Perlman, Michael D., 274
Peterson, Arthur V., 125, 135, 136, 191
Philippe, Anne, 181, 182
Piedmonte, Marion R., 203, 204, 214
Podgórski, Krzysztof, 207
Polasek, Wolfgang, 194
Prabhu, G. M., 43
Prasad, M. A., 97
Pratt, John W., 151
Press, William H., 287
Propp, James Gary, 147, 219
Pryor, Daniel V., 33, 87
Pullin, D. I., 248
Rabinowitz, M., 222
Rajasekaran, Sanguthevar, 119
Ramage, J. G., 173
Ramberg, John S., 194
Ramgopal, Paul, 183
Ratnaparkhi, M. V., 208
Rayner, J. C. W., 208
Reeder, H. A., 221, 222
Relles, Daniel A., 187
Richardson, S., 144, 146
Rinnooy Kan, A. H. G., 158
Ripley, Brian D., 334
Robert, Christian P., 146, 175, 251, 334
Roberts, Gareth O., 144, 146, 158, 256
Robertson, J. M., 275
Robinson, M. L., 33
Rogers, W. H., 301
Romeijn, H. Edwin, 158, 197
Ronning, Gerd, 208
Roof, R. B., 66
Rosen, Michael, 7, 9, 12
Rosen, Ori, 190
Rosenbaum, Paul R., 219
Rosenbluth, A. W., 140, 259, 277
Rosenbluth, M. N., 140, 259, 277
Rosenthal, Jeffrey S., 146, 149
Ross, Keith W., 119
Rotem, Doron, 219
Roux, J. J. J., 123, 208
Rubin, Donald B., 146, 149, 256
Ryan, T. P., 201
Saarinen, J., 86
Sack, Jörg-Rüdiger, 213
Sacks, Jerome, 248, 257
Sahu, Sujit K., 146
Saleev, V. M., 96
Salmond, D. J., 244
Sandhu, R. A., 223
Sarkar, P. K., 97
Sarkar, Tapas K., 178
Särndal, Carl-Erik, 218, 227, 239, 241
Schafer, J. L., 251
Scheffer, C. L., 158
Schervish, Mark J., 157, 158
Schladitz, Katja, 148
Schmeiser, Bruce W., 157, 158, 178, 183, 187, 188, 189, 194, 210, 221, 225, 246
Schott, René, 219
Schrage, Linus E., 172, 334
Schucany, William R., 193, 221
Selke, W., 41, 86
Sendrier, Nicolas, 4
Settle, J. G., 172, 185
Seznec, André, 4
Shahabuddin, Perwez, 225
Shao, Jun, 255
Shao, Qi-Man, 256
Shaw, J. E. H., 98
Shchur, Lev N., 41, 86, 260
Shedler, G. S., 225
Shephard, Neil, 193, 270
Shin, Dong Wan, 204, 214
Shiue, Peter Jau-Shyong, 334
Shub, M., 4, 37
Sibuya, M., 161
Simard, Richard, 14, 21, 67
Sinclair, C. D., 76
Sircar, K. Ronnie, 270
Smith, Adrian F. M., 130, 133, 150, 151, 157, 183, 244, 256
Smith, B., 26
Smith, Peter W. F., 252
Smith, Philip W., 30
Smith, Richard L., 222
Smith, Robert L., 158, 197
Smith, W. B., 199
Sobol’, I. M., 94
Spanier, Jerome, 332, 334
Spiegelhalter, David J., 144, 146, 256
Spurr, B. D., 76
Srinivasan, Ashok, 53, 87
Stacy, E. W., 182
Stadlober, Ernst, 130, 131, 132, 187, 189
Stander, J., 147
Steel, S. J., 123
Stefănescu, S., 133
Stegun, Irene A., 175, 332
Stein, Michael, 249
Stephens, Michael A., 76
Stern, Hal S., 256
Stewart, G. W., 201
Stigler, Stephen M., 297
Stoll, Erich P., 41, 287
Stuck, B. W., 196
Sullivan, Stephen J., 89
Swartz, Tim, 233
Swensson, Bengt, 218, 227, 239, 241
Tadikamalla, Pandu R., 178, 195
Takahasi, K., 208
Talapov, A. L., 41, 86
Tan, K. K. C., 153
Tan, Ken Seng, 98
Tang, Boxin, 249
Tang, H. C., 33
Tanner, Martin A., 157, 201
Tapia, Richard A., 109 Tausworthe, R. C., 38 Taylor, Malcolm S., 212, 289 Telgen, J., 158 Teller, A. H., 140, 259, 277 Teller, E., 140, 259, 277 Teukolsky, Saul A., 287 Tezuka, Shu, 36, 47, 48, 97, 100, 172, 334 Thisted, Ronald A., 201 Thomas, Andrew, 256 Thompson, Elizabeth A., 154 Thompson, James R., 109, 212, 270, 289 Thompson, William J., 332 Tibshirani, Robert J., 255 Tierney, Luke, 137, 139, 144 Titterington, D. M., 26 Traub, J. F., 97 Tsang, Wai Wan, 127, 154, 174 Tsay, Liang-Huei, 79 Tsutakawa, Robert K., 233 Tu, Dongsheng, 255 Tu, Xin M., 186 Tukey, John W., 301 Turner, S., 194 Tweedie, R. L., 137, 225 Ueberhuber, Christoph W., 95 Underhill, L. G., 201 Văduva, I., 133 Vale, C. David, 210 Vattulainen, I., 21, 41, 79, 86, 87, 260 Vecchi, M. P., 259, 278 Vetterling, William T., 287 Vitter, Jeffrey Scott, 218, 219 Von Neumann, J., 121 Vose, Michael D., 135 Wakefield, J. C., 130, 133 Walker, A. J., 133
Walker, Stephen G., 168, 175, 182 Wallace, C. S., 121, 174 Wang, J., 49 Wang, Yuan, 7, 47, 97 Warnock, T., 26 Wegenkittl, Stefan, 37, 38 Welch, William J., 248, 257 Weller, G., 95, 98, 239 Whiten, William J., 95 Wichmann, B. A., 47, 55 Wichura, Michael J., 274 Wikramaratna, R. S., 45 Wild, P., 151 Wilks, Allan R., 291 Williamson, D., 66 Wilson, David Bruce, 147, 219 Wilson, James R., 221, 249 Wolfram, Stephen, 44 Wollan, Peter C., 52 Wong, Wing Hung, 157, 244 Wong, Y. Joanna, 21, 41, 86 Wood, G. R., 275 Wretman, Jan, 218, 227, 239, 241 Wu, Li-ming, 38, 78 Wu, Pei-Chi, 13 Wynn, Henry P., 248, 257 Yanagihara, Niro, 38, 78 Ylvisaker, Don, 257 Yu, Bin, 146 Yuan, Yilian, 49, 52, 53 Zaman, Arif, 35, 174 Zaremba, S. K., 7 Zeisel, H., 47 Zierler, Neal, 39 Ziff, Robert M., 41, 287 Zinterhof, Peter, 334 Ziv, J., 84
Subject Index

acceptance/complement method 125 acceptance/rejection method 113, 227 ACM Transactions on Mathematical Software 284, 332, 335 ACM Transactions on Modeling and Computer Simulation 332 ACORN congruential generator 45 adaptive direction sampling 158 adaptive rejection sampling 151 add-with-carry random number generator 35 additive congruential random number generator 11 alias method 133 alias-urn method 136 almost exact inversion 121 alternating conditional sampling 157 AMS MR classification system 332 analysis of variance 238 Anderson–Darling test 75 antithetic variates 26, 246 Applied Statistics 284, 332, 334 ARMA model 226 ARS (adaptive rejection sampling) 151 AWC random number generator 35

ball, generating random points in 202 batch means for variance estimation 237 Bernoulli distribution 105, 203 Bernoulli sampling 217 beta distribution 183 beta function 321 beta-binomial distribution 187, 204 Beyer ratio 66 binary matrix rank test 81 binary random variables 105, 203 binomial distribution 187 birthday spacing test 81 bit stream test 81 bit stripping 10, 13, 22 blocks, simulation experiments 51 Blum/Blum/Shub random number generator 37 Boltzmann distribution 258 bootstrap, nonparametric 253 bootstrap, parametric 254 Buffon needle problem 274 BUGS (software) 256 Burr distribution 194 Burr family of distributions 208

C (programming language) 283 CALGO (Collected Algorithms of the ACM) 332, 335 Cauchy distribution 191 CDF (cumulative distribution function) 102, 316 cellular automata 44 censored data, simulating 223 censored observations 168, 180 CFTP (coupling from the past) 147, 148 chaotic systems 45 characteristic function 136 Chebyshev generator 45 chi distribution 185 chi-squared distribution 180, 184 chi-squared test 74 chop-down method 108, 190 cluster algorithm 259 Collected Algorithms of the ACM (CALGO) 332, 335 combined multiple recursive generator 48, 287 common variates 246 Communications in Statistics — Simulation and Computation 333 complete beta function 321 complete gamma function 320 COMPSTAT 331, 333 Computational Statistics & Data Analysis 333 Computational Statistics 333 Computing Science and Statistics 333 concave density 119, 150 congruential random number generator 11 constrained random walk 234, 273 constrained sampling 248 contaminated distribution 169 control variate 245 convex density 151
correlated random variables 123 correlated random variables, generation 210, 221 correlation matrices, generating random ones 199 coupling from the past 147, 148 craps test 83 crude Monte Carlo 232 cryptography 3, 4, 37, 334 cumulative distribution function 316 Current Index to Statistics 332 cycle length of random number generator 3, 11, 22 D-distribution 183 d-variate uniformity 63 data augmentation 157 data-based random number generation 212, 289 DIEHARD tests for random number generators 80, 291 Dirac delta function 319 Dirichlet distribution 205 Dirichlet-multinomial distribution 206 discrepancy 69, 93 discrete uniform distribution 105, 217 DNA test for random numbers 82 double exponential distribution 177, 207 ECDF (empirical cumulative distribution function) 74, 210, 316 economical method 127 eigenvalues, generating ones from random Wishart matrices 201 elliptically contoured distribution 197, 207, 208 empirical cumulative distribution function 74, 316 empirical test 71 entropy 68 envelope 114 equidistributed 63 equivalence relationship 8 Erlang distribution 180 Euler totient function 9, 12 exact-approximation method 121 exact sampling 147, 148 exponential distribution 176 exponential power distribution 178 extended gamma processes 183 extended hypergeometric distribution 190 Faure sequence 94, 95 feedback shift register generator 38 Fibonacci random number generator 33 finite field 9 fixed-point representation 10
folded distributions 169 Fortran 95 283 Galois field 9, 38 gamma distribution 178, 208 gamma distribution, bivariate extension 208 gamma function 320 GAMS (Guide to Available Mathematical Software) 285, 335 GAMS, electronic access 335 GARCH model 226 generalized gamma distributions 182, 195 generalized inverse Gaussian distribution 193 generalized lambda family of distributions 194 geometric distribution 189 geometric splitting 241 GFSR (method) 38 Gibbs distribution 258 Gibbs method 149, 155, 256 GIS (geographic information system) 219 GNU Scientific Library (GSL) 287 goodness-of-fit test 74, 75 Google (Web search engine) 335 Gray code 96, 98 GSL (GNU Scientific Library) 287 halfnormal distribution 176 Halton sequence 94 Hamming weight 14 Hastings method 141 hat function 114 HAVEGE 4 Heaviside function 319 heavy-tailed distribution 196 hit-and-run method 157, 197 hit-or-miss Monte Carlo 116, 121, 232, 243, 271 hotbits 2 hybrid generator 98, 239 hypergeometric distribution 189 importance sampling 241, 271 importance-weighted resampling 149 IMSL Libraries 284, 288 incomplete beta function 321 incomplete gamma function 321 independence sampler 144 independent streams of random numbers 51 indicator function 319 infinitely divisible distribution 150 instrumental density 114 Interface Symposium 331, 333
International Association of Statistical Computing (IASC) 331, 333 interrupted sequence 230, 286, 290, 293 inverse CDF method for truncated distributions 168 inverse CDF method 102 inverse chi-squared distribution 169 “inverse” distributions 169 inverse gamma distribution 169 inverse Gaussian distribution 193 inverse Wishart distribution 169 inversive congruential generator 36 irreducible polynomial 38 Ising model 258 iterative method for random number generation 139, 155 Johnson family of distributions 194 Journal of Computational and Graphical Statistics 333 Journal of Statistical Computation and Simulation 333 k-d tree 212 Kepler conjecture 215 KISS (generator) 46 Kolmogorov distance 75 Kolmogorov–Smirnov test 74, 75 lagged Fibonacci generator 33 Lahiri’s sampling method 227 lambda family of distributions 194 Landau distribution 196 Laplace distribution 177, 207 Latin hypercube sampling 248 lattice test for random number generators 20, 66 leaped Halton sequence 95 leapfrogging, in random number generation 24, 43, 52 Lehmer congruential random number generator 11 Lehmer sequence 11 Lehmer tree 26 linear congruential random number generator 11 linear density 118 log-concave distributions 150 logarithmic distribution 190 lognormal distribution 176 Lorentzian distribution 191 M(RT)² algorithm 259 machine epsilon 7 majorizing density 114, 203 Markov chain 137 Markov chain Monte Carlo 139, 144, 146, 156, 256
Markov process 224 Mathematical Reviews 332 Matlab (software) 284 matrix congruential generator 34 matrix congruential generator, multiple recursive 35 MCMC (Markov chain Monte Carlo) 139, 144, 146, 156, 256 Mersenne prime 13 Mersenne twister 42, 287 Metropolis algorithm 259 Metropolis–Hastings method 141, 156, 256 “minimal standard” generator 13, 20, 21, 28, 61, 86 minimum distance test 82 Minkowski reduced basis 66 mixture distributions 110, 169, 248 modular arithmetic 7 Monte Carlo evaluation of an integral 231 Monte Carlo experimentation 297 Monte Carlo study 297 Monte Carlo test 251 MR classification system 332 MT19937 (generator) 42, 287 multinomial distribution 198 multiple recursive random number generator 32, 35 multiplicative congruential random number generator 11 multiply-with-carry random number generator 36 multivariate distributions 197, 212 multivariate double exponential distribution 207 multivariate gamma distribution 208 multivariate hypergeometric distribution 207 multivariate Laplace distribution 207 multivariate normal distribution 197 multivariate stable distribution 208 nearest neighbors 212 nearly linear density 118 negative binomial distribution 188 netlib 285, 332, 335, vii Niederreiter sequence 94, 98 NIST Test Suite, for random number generators 83 noncentral hypergeometric distribution 190 noncentral Wishart distribution 200 nonhomogeneous Poisson process 225 nonlinear congruential generator 37 nonparametric bootstrap 253 norm, function 231 normal distribution 171
normal number 43, 91 one-way function 3 order of random number generator 3, 32 order statistics, generating random 221 Ornstein–Uhlenbeck process 264 orthogonal matrices, generating random ones 201 overdispersion 204 overlapping pairs test 81 overlapping permutation test 81 overlapping quadruples test 82 overlapping sums test 83 parallel processing 43, 51, 52 parallel random number generation 51 parametric bootstrap 254 Pareto distribution 192 Pareto-type distribution 196 parking lot test 82 particle filtering 234 Pascal distribution 188 patchwork method 118 Pearson family of distributions 194, 208 perfect sampling 147 period of random number generator 3, 11, 22, 220 permutation, generating random ones 217 π as a source of random numbers 44, 91 Poisson distribution 188 Poisson process, generating a random one 177 Poisson process, nonhomogeneous 225 Poisson sampling 218 portability of software 28, 54, 102, 122, 167 Potts model 260 primitive element 12 primitive polynomial 96 primitive root 12 probabilistic error bound 233, 235 probability-skewed distribution 170 Proceedings of the Statistical Computing Section (of the ASA) 333 projection pursuit 20 quasi-Monte Carlo method 93 quasirandom sequence 4, 94 R (software) 284, 291 R250 (generator) 41, 287 Random Master 3 random number generator, congruential 11 random number generator, feedback shift method 38 random number generator, parallel 51 random number generator, testing 71
random sampling 217 RANDU (generator) 18, 58, 87 rand 55, 285 RANLUX (generator) 45, 287 Rao–Blackwellization 247 ratio-of-uniforms method 129, 178, 185 Rayleigh distribution 191 rectangle/wedge/tail method 173, 177 reproducible research 286, 299 resampling 252 reservoir sampling 218 residue 8 robustness studies 169, 195, 298 roughness of a function 231 runs test 77, 83, 84 S, S-Plus (software) 284, 291 sampling, random 217 sampling/importance resampling 149 second-order test 71 seed 3, 11, 24, 26, 286, 290, 292 self-avoiding random walk 234, 273 sequential importance sampling 244 sequential Monte Carlo 233 serial test 78 setup time 165 shuffled random number generator 22, 46 SIAM Journal on Scientific Computing 333 side effect 285 simple random sampling 217 simplex 213 simulated annealing 140, 259, 277 simulated tempering 154, 261 simulation 1, 146, 297 SIR (sampling/importance resampling) 149 skew-normal distribution 170 skewed distributions 170 smoothed acceptance/rejection method, for random number generation 243 smoothing parameter 212 smoothing 212 Sobol’ sequence 94, 96, 98 software engineering 285 spanning trees, generating random ones 219 spectral test for random number generators 20, 65 sphere, generating random points on a sphere 201 SPRNG, software for parallel random number generation 53, 296 squeeze test 83 squeeze, in acceptance/rejection 117, 132 stable distribution 196, 208 standard distribution 167
Statistical Computing Section of the American Statistical Association 331, 333 Statistical Computing & Graphics Newsletter 333 Statistics and Computing 333 statlib 285, 333, 334, vii stratified distribution 110 stratified sampling 241 strict reproducibility 28, 54, 122, 230 Student’s t distribution 185 substitution sampling 157 substreams 23, 33, 43, 51 subtract-with-borrow random number generator 35 Super-Duper (generator) 46 SWC random number generator 35 Swendsen–Wang algorithm 259 swindle, Monte Carlo 240 T-concave distributions 153, 159 table, generating random tables with fixed marginals 202 table-lookup method 105 Tausworthe random number generator 38 tempered transition 155 test suites 79 testing random number generators 71 TestU01 tests for random number generators 80 thinning method 225 3-D sphere test 82 transcendental numbers as a source of random numbers 44 transformed density rejection method 153 transformed rejection method 121
truncated distribution 168, 223 truncated gamma distribution 180, 181, 182 truncated normal distribution 175, 198 twisted GFSR generator 42 twos-complement representation 10 underdispersion 204 uniform time algorithm 166 universal methods 102 unpredictable 4, 37 urn method 105, 136 van der Corput sequence 94 variance estimation 237 variance reduction 26, 239 variance-covariance matrices, generating random ones 199 Vavilov distribution 196 von Mises distribution 193 Wald distribution 193 Weibull distribution 186 weight window 241 weighted resampling 149 Wichmann/Hill random number generator 47, 59 Wilson–Hilferty approximation 175 Wishart distribution 199 Wolff algorithm 259 zeta distribution 192 ziggurat method 127, 174 Zipf distribution 192
The generation of continuous random variables on a digital computer encounters a problem of accuracy caused by approximations and discretization error. These in turn.
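To make the accuracy problem concrete, here is a minimal sketch (in Python; not from the text, and the helper name exp_from_bits is hypothetical) of how a finite uniform bit budget discretizes a continuous target distribution under inverse-CDF sampling:

```python
import math

def exp_from_bits(k: int, b: int = 32) -> float:
    """Map a b-bit uniform integer k in [0, 2**b) to an Exp(1) variate
    by the inverse CDF. Only 2**b distinct outputs are possible, so the
    continuous exponential law is approximated on a finite grid."""
    u = k / 2.0**b           # discretized uniform in [0, 1)
    return -math.log1p(-u)   # inverse CDF of Exp(1); log1p is stable near u = 0

b = 32
largest = exp_from_bits(2**b - 1, b)    # ~ b*ln(2) ≈ 22.18; the tail is truncated
runner_up = exp_from_bits(2**b - 2, b)
print(largest, largest - runner_up)     # gap ≈ ln(2): coarse spacing in the tail
```

The two printed numbers illustrate the point: no value beyond roughly 22.18 can ever be produced, and the largest achievable outputs are separated by gaps of about ln 2, a discretization error that no amount of sampling can smooth out.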
- AHRENS, J.H. and DIETER, U. (1974): Computer methods for sampling from gamma, beta, Poisson and binomial distributions. Computing, vol. 12, pp. 223–246.
- AKHIEZER, N.I. (1965): The Classical Moment Problem. Hafner, New York.
- ALSMEYER, G. and IKSANOV, A. (2009): A log-type moment result for perpetuities and its application to martingales in supercritical branching random walks. Electronic Journal of Probability, vol. 14, pp. 289–313.
- ASMUSSEN, S., GLYNN, P. and THORISSON, H. (1992): Stationary detection in the initial transient problem. ACM Transactions on Modeling and Computer Simulation, vol. 2, pp. 130–157.
- BAILEY, R.W. (1994): Polar generation of random variates with the t distribution. Mathematics of Computation, vol. 62, pp. 779–781.
- BONDESSON, L. (1982): On simulation from infinitely divisible distributions. Advances in Applied Probability, vol. 14, pp. 855–869.
- BOX, G.E.P. and MULLER, M.E. (1958): A note on the generation of random normal deviates. Annals of Mathematical Statistics, vol. 29, pp. 610–611.
- CHAMBERS, J.M., MALLOWS, C.L. and STUCK, B.W. (1976): A method for simulating stable random variables. Journal of the American Statistical Association, vol. 71, pp. 340–344.
- DEVROYE, L. (1981a): The series method in random variate generation and its application to the Kolmogorov–Smirnov distribution. American Journal of Mathematical and Management Sciences, vol. 1, pp. 359–379.
- DEVROYE, L. (1981b): The computer generation of random variables with a given characteristic function. Computers and Mathematics with Applications, vol. 7, pp. 547–552.
- DEVROYE, L. (1986a): Non-Uniform Random Variate Generation. Springer-Verlag, New York.
- DEVROYE, L. (1986b): An automatic method for generating random variables with a given characteristic function. SIAM Journal on Applied Mathematics, vol. 46, pp. 698–719.
- DEVROYE, L. (1989): On random variate generation when only moments or Fourier coefficients are known. Mathematics and Computers in Simulation, vol. 31, pp. 71–89.
- DEVROYE, L. (1991): Algorithms for generating discrete random variables with a given generating function or a given moment sequence. SIAM Journal on Scientific and Statistical Computing, vol. 12, pp. 107–126.
- DEVROYE, L. (1996): Random variate generation in one line of code. In: 1996 Winter Simulation Conference Proceedings, Charnes, J.M., Morrice, D.J., Brunner, D.T. and Swain, J.J. (eds.), pp. 265–272, ACM, San Diego, CA.
- DEVROYE, L. (1997): Simulating theta random variates. Statistics and Probability Letters, vol. 31, pp. 2785–2791.
- DEVROYE, L., FILL, J. and NEININGER, R. (2000): Perfect simulation from the quicksort limit distribution. Electronic Communications in Probability, vol. 5, pp. 95–99.
- DEVROYE, L. (2001): Simulating perpetuities. Methodology and Computing in Applied Probability, vol. 3, pp. 97–115.
- DEVROYE, L. and NEININGER, R. (2002): Density approximation and exact simulation of random variables that are solutions of fixed-point equations. Advances in Applied Probability, vol. 34, pp. 441–468.
- DEVROYE, L. (2009): On exact simulation algorithms for some distributions related to Jacobi theta functions. Statistics and Probability Letters, vol. 79, pp. 2251–2259.
- DEVROYE, L. and FAWZI, O. (2010): Simulating the Dickman distribution. Statistics and Probability Letters, vol. 80, pp. 242–247.
- FILL, J. (1998): An interruptible algorithm for perfect sampling via Markov chains. The Annals of Applied Probability, vol. 8, pp. 131–162.
- FILL, J.A. and HUBER, M. (2009): Perfect simulation of perpetuities. To appear.
- FLAJOLET, P. and SAHEB, N. (1986): The complexity of generating an exponentially distributed variate. Journal of Algorithms, vol. 7, pp. 463–488.
- GOLDIE, C.M. and MALLER, R.A. (2000): Stability of perpetuities. Annals of Probability, vol. 28, pp. 1195–1218.
- GREEN, P.J. and MURDOCH, D.J. (2000): Exact sampling for Bayesian inference: towards general purpose algorithms (with discussion). In: Monte Carlo Methods, Bernardo, J.M., Berger, J.O., Dawid, A.P. and Smith, A.F.M. (eds.), pp. 301–321, Bayesian Statistics, vol. 6, Oxford University Press, Oxford.
- HASTINGS, C. (1955): Approximations for Digital Computers. Princeton University Press, Princeton, New Jersey.
- HÖRMANN, W., LEYDOLD, J. and DERFLINGER, G. (2004): Automatic Nonuniform Random Variate Generation. Springer-Verlag, Berlin.
- HUFFMAN, D. (1952): A method for the construction of minimum-redundancy codes. Proceedings of the IRE, vol. 40, pp. 1098–1101.
- KANTER, M. (1975): Stable densities under change of scale and total variation inequalities. Annals of Probability, vol. 3, pp. 697–707.
- KEANE, M.S. and O’BRIEN, G.L. (1994): A Bernoulli factory. ACM Transactions on Modeling and Computer Simulation, vol. 4, pp. 213–219.
- KENDALL, W. (2004): Random walk CFTP. Thönnes (ed.), Department of Statistics, University of Warwick.
- KNUTH, D.E. and YAO, A.C. (1976): The complexity of nonuniform random number generation. In: Algorithms and Complexity, Traub, J.F. (ed.), pp. 357–428, Academic Press, New York, N.Y.
- MARSAGLIA, G. (1968): Random numbers fall mainly in the planes. Proceedings of the National Academy of Sciences, vol. 60, pp. 25–28.
- MARSAGLIA, G. and ZAMAN, A. (1991): A new class of random number generators. Annals of Applied Probability, vol. 1, pp. 462–480.
- METROPOLIS, N., ROSENBLUTH, A., ROSENBLUTH, M., TELLER, A. and TELLER, E. (1953): Equations of state calculations by fast computing machines. Journal of Chemical Physics, vol. 21, pp. 1087–1091.
- MURDOCH, D.J. and GREEN, P.J. (1998): Exact sampling from a continuous state space. Scandinavian Journal of Statistics, vol. 25, pp. 483–502.
- PROPP, J.G. and WILSON, D.B. (1996): Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms, vol. 9, pp. 223–252.
- RÖSLER, U. and RÜSCHENDORF, L. (2001): The contraction method for recursive algorithms. Algorithmica, vol. 29, pp. 3–33.
- SATO, K. (2000): Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge.
- ULRICH, G. (1984): Computer generation of distributions on the m-sphere. Applied Statistics, vol. 33, pp. 158–163.
- VERVAAT, W. (1979): On a stochastic difference equation and a representation of non-negative infinitely divisible random variables. Advances in Applied Probability, vol. 11, pp. 750–783.
- VON NEUMANN, J. (1963): Various techniques used in connection with random digits. Collected Works, vol. 5, pp. 768–770, Pergamon Press. Also in (1951): Monte Carlo Method, National Bureau of Standards Series, vol. 12, pp. 36–38.
- WILSON, D.B. (2000): Layered multishift coupling for use in perfect sampling algorithms (with a primer on CFTP). In: Monte Carlo Methods, Madras, N. (ed.), pp. 141–176, Fields Institute Communications, vol. 6, American Mathematical Society.
- ZOLOTAREV, V.M. (1959): On analytic properties of stable distribution laws. Selected Translations in Mathematical Statistics and Probability, vol. 1, pp. 207–211.
- ZOLOTAREV, V.M. (1966): On the representation of stable laws by integrals. Selected Translations in Mathematical Statistics and Probability, vol. 6, pp. 84–88.
- ZOLOTAREV, V.M. (1981): Integral transformations of distributions and estimates of parameters of multidimensional spherically symmetric stable laws. In: Contributions to Probability, pp. 283–305, Academic Press.
- ZOLOTAREV, V.M. (1986): One-Dimensional Stable Distributions. American Mathematical Society, Providence, R.I.