Wednesday, July 17, 2019
Computational Efficiency of Polar
Lecture Notes on Monte Carlo Methods
Fall Semester, 2005
Courant Institute of Mathematical Sciences, NYU
Jonathan Goodman

Chapter 2: Simple Sampling of Gaussians
Created August 26, 2005

Generating univariate or multivariate Gaussian random variables is simple and fast. There should be no reason ever to use approximate methods based, for example, on the Central limit theorem.

1 Box Muller

It would be nice to get a standard normal from a standard uniform by inverting the distribution function, but there is no closed form formula for this distribution function

$$N(x) = P(X \le x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-x'^2/2}\,dx' .$$

The Box Muller method is a brilliant trick to overcome this by producing two independent standard normals from two independent uniforms. It is based on the familiar trick for calculating

$$I = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx .$$

This cannot be evaluated by direct integration: the indefinite integral does not have an algebraic expression in terms of elementary functions (exponentials, logs, trig functions). However,

$$I^2 = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy .$$

The last integral can be calculated using polar coordinates $x = r\cos(\theta)$, $y = r\sin(\theta)$ with area element $dx\,dy = r\,dr\,d\theta$, so that

$$I^2 = \int_{\theta=0}^{2\pi}\int_{r=0}^{\infty} e^{-r^2/2}\,r\,dr\,d\theta = 2\pi \int_{r=0}^{\infty} e^{-r^2/2}\,r\,dr .$$

Unlike the original $x$ integral, this $r$ integral is elementary. The substitution $s = r^2/2$ gives $ds = r\,dr$ and

$$I^2 = 2\pi \int_{s=0}^{\infty} e^{-s}\,ds = 2\pi .$$

The Box Muller algorithm is a probabilistic interpretation of this trick. If $(X, Y)$ is a pair of independent standard normals, then the probability density is a product

$$f(x,y) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \cdot \frac{1}{\sqrt{2\pi}} e^{-y^2/2} = \frac{1}{2\pi} e^{-(x^2+y^2)/2} .$$

Since this density is radially symmetric, it is natural to consider the polar coordinate random variables $(R, \Theta)$ defined by $0 \le \Theta < 2\pi$ and $X = R\cos(\Theta)$, and $Y = R\sin(\Theta)$. Clearly $\Theta$
is uniformly distributed in the interval $[0, 2\pi]$ and may be sampled using $\Theta = 2\pi U_1$. Unlike the original distribution function $N(x)$, there is a simple expression for the $R$ distribution function

$$G(r) = P(R \le r) = \int_{r'=0}^{r}\int_{\theta=0}^{2\pi} \frac{1}{2\pi} e^{-r'^2/2}\,r'\,dr'\,d\theta = \int_{r'=0}^{r} e^{-r'^2/2}\,r'\,dr' .$$

The same change of variable $r'^2/2 = s$, $r'\,dr' = ds$ (so that $r' = r$ when $s = r^2/2$) allows us to calculate

$$G(r) = \int_{s=0}^{r^2/2} e^{-s}\,ds = 1 - e^{-r^2/2} .$$

Therefore, we may sample $R$ by solving the distribution function equation[1]

$$G(R) = 1 - e^{-R^2/2} = 1 - U_2 ,$$

whose solution is $R = \sqrt{-2\ln(U_2)}$. Altogether, the Box Muller method takes independent standard uniform random variables $U_1$ and $U_2$ and produces independent standard normals $X$ and $Y$ using the formulas

$$\Theta = 2\pi U_1 , \quad R = \sqrt{-2\ln(U_2)} , \quad X = R\cos(\Theta) , \quad Y = R\sin(\Theta) . \tag{1}$$

It may seem odd that $X$ and $Y$ in (1) are independent given that they use the same $R$ and $\Theta$. Not only does our algebra show that this is true, but we can test the independence computationally, and it will be confirmed.

Part of this method was generating a point at random on the unit circle. We suggested doing this by choosing $\Theta$ uniformly in the interval $[0, 2\pi]$ and then taking the point on the circle to be $(\cos(\Theta), \sin(\Theta))$. This has the possible drawback that the computer must evaluate the sine and cosine functions. Another way to do this[2] is to choose a point uniformly in the $2 \times 2$ square $-1 \le x \le 1$, $-1 \le y \le 1$, then reject it if it falls outside the unit circle. The first accepted point will be uniformly distributed in the unit disk $x^2 + y^2 \le 1$, so its angle will be random and uniformly distributed. The final step is to get a point on the unit circle $x^2 + y^2 = 1$ by dividing by the length. The methods have equal accuracy (both are exact in exact arithmetic). What distinguishes them is computer performance (a topic discussed more in a later lecture, hopefully).
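Formulas (1) and the rejection alternative for the unit circle translate almost line by line into code. The sketch below is a minimal pure-Python illustration using only the standard library. It is an assumption, not the notes' own code, and the names `box_muller` and `random_point_on_circle` are hypothetical. The sample statistics at the end are the kind of computational check of independence the text mentions. They are crude, and they do not prove independence.

```python
import math
import random


def box_muller(u1, u2):
    """Formulas (1): two independent standard uniforms -> two independent standard normals.

    u2 must lie in (0, 1] so that log(u2) is defined.
    """
    theta = 2.0 * math.pi * u1
    r = math.sqrt(-2.0 * math.log(u2))
    return r * math.cos(theta), r * math.sin(theta)


def random_point_on_circle(rng):
    """Rejection alternative to (cos(Theta), sin(Theta)).

    Sample the 2x2 square, keep the first point inside the unit disk,
    then divide by its length to land on the unit circle.
    """
    while True:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        s = x * x + y * y
        if 0.0 < s <= 1.0:  # reject points outside the disk (and the measure-zero origin)
            length = math.sqrt(s)
            return x / length, y / length


rng = random.Random(2005)  # arbitrary fixed seed, for reproducibility
# rng.random() is in [0, 1), so 1 - rng.random() is in (0, 1] and safe to use as u2.
pairs = [box_muller(rng.random(), 1.0 - rng.random()) for _ in range(50_000)]
m = len(pairs)
mean_x = sum(x for x, _ in pairs) / m
var_x = sum(x * x for x, _ in pairs) / m - mean_x ** 2
# Crude checks (not a proof): independent standard normals have
# E[XY] = 0 and E[X^2 Y^2] = E[X^2] E[Y^2] = 1.
mean_xy = sum(x * y for x, y in pairs) / m
mean_x2y2 = sum((x * y) ** 2 for x, y in pairs) / m
print(f"mean {mean_x:.3f}, var {var_x:.3f}, E[XY] {mean_xy:.3f}, E[X^2Y^2] {mean_x2y2:.3f}")
print(random_point_on_circle(rng))
```

The rejection loop takes `rng.uniform` samples in pairs and accepts each pair with probability $\pi/4$. That is why it consumes more uniforms on average than the single $\Theta = 2\pi U_1$ draw.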
The rejection method, with an acceptance probability $\pi/4 \approx 78\%$, seems efficient, but rejection can break the instruction pipeline and slow a computation by a factor of ten. Also, the square root needed to compute the length may not be faster to evaluate than sine and cosine. Moreover, the rejection method uses two uniforms while the $\Theta$ method uses just one.

[1] Recall that $1 - U_2$ is a standard uniform if $U_2$ is.
[2] Suggested, for example, in the dubious book Numerical Recipes.

The method can be reversed to solve another sampling problem: generating a random point on the unit sphere in $\mathbb{R}^n$. If we generate $n$ independent standard normals, then the vector $X = (X_1, \ldots, X_n)$ has all angles equally likely, because the probability density $f(x) = \frac{1}{(2\pi)^{n/2}} \exp(-(x_1^2 + \cdots + x_n^2)/2)$ is radially symmetric. Therefore $X/\|X\|$ is uniformly distributed on the unit sphere, as desired.

1.1 Other methods for univariate normals

The Box Muller method is elegant and reasonably fast and is fine for casual computations, but it may not be the best method for serious users. Many software packages have built-in standard normal random number generators, which (if they are any good) use expertly optimized methods. There is very fast and accurate software on the web for directly inverting the normal distribution function $N(x)$. This is particularly important for quasi Monte Carlo, which substitutes equidistributed sequences for random sequences (see a later lecture).

2 Multivariate normals

An $n$ component multivariate normal, $X$, is characterized by its mean $\mu = E[X]$ and its covariance matrix $C = E[(X - \mu)(X - \mu)^t]$. We discuss the problem of generating such an $X$ with mean zero, since we achieve mean $\mu$ by adding $\mu$ to a mean zero multivariate normal.
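Two ideas from this passage are sketched below in pure Python, under stated assumptions:

- Normalizing $n$ independent standard normals gives a point on the unit sphere.
- A standard normal can be sampled by inverting $N(x)$.

The Python standard library's `random.Random.gauss` stands in for a "built-in" normal generator. `statistics.NormalDist().inv_cdf` (Python 3.8+) stands in for an inversion routine; it is not necessarily the fast software the notes have in mind. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def random_point_on_sphere(n, rng):
    """Uniform random point on the unit sphere in R^n: normalize n independent standard normals."""
    while True:
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in x))
        if norm > 0.0:  # an all-zero draw has probability zero but would break the division
            return [c / norm for c in x]


def normal_by_inversion(rng):
    """Standard normal X = N^{-1}(U), inverting the distribution function numerically."""
    u = 0.0
    while u == 0.0:  # inv_cdf needs 0 < u < 1, and rng.random() lies in [0, 1)
        u = rng.random()
    return NormalDist().inv_cdf(u)


rng = random.Random(26)  # arbitrary fixed seed
points = [random_point_on_sphere(3, rng) for _ in range(20_000)]
# For a uniform point on the sphere in R^3 each squared coordinate has mean 1/3.
mean_x1_sq = sum(p[0] ** 2 for p in points) / len(points)
draws = [normal_by_inversion(rng) for _ in range(20_000)]
mean_draw = sum(draws) / len(draws)
print(round(mean_x1_sq, 3), round(mean_draw, 3))
```

Inversion uses exactly one uniform per normal. This one-to-one map is what lets a quasi Monte Carlo sequence be substituted for the uniforms.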
The key to generating such an $X$ is the fact that if $Y$ is an $m$ component mean zero multivariate normal with covariance $D$ and $X = AY$, then $X$ is a mean zero multivariate normal with covariance

$$C = E[XX^t] = E[AY(AY)^t] = A\,E[YY^t]\,A^t = ADA^t .$$

We know how to sample the $n$ component multivariate normal with $D = I$: just take the components of $Y$ to be independent univariate standard normals. The formula $X = AY$ will produce the desired covariance matrix if we find $A$ with $AA^t = C$. A simple way to do this in practice is to use the Choleski factorization from numerical linear algebra. This is a simple algorithm that produces a lower triangular matrix, $L$, so that $LL^t = C$. It works for any positive definite $C$.

In physical applications it is common that one has not $C$ but its inverse, $H$. This would happen, for example, if $X$ had the Gibbs-Boltzmann distribution with $kT = 1$ (it's easy to change this), energy $\frac{1}{2}X^tHX$, and probability density $\frac{1}{Z}\exp(-\frac{1}{2}X^tHX)$. In large scale physical problems it may be impractical to calculate and store the covariance matrix $C = H^{-1}$, though the Choleski factorization $H = LL^t$ is available. Note that[3] $H^{-1} = L^{-t}L^{-1}$, so the choice $A = L^{-t}$ works. Computing $X = L^{-t}Y$ is the same as solving for $X$ in the equation $Y = L^tX$, which is the process of back substitution in numerical linear algebra.

[3] It is traditional to write $L^{-t}$ for the transpose of $L^{-1}$, which also is the inverse of $L^t$.

In some applications one knows the eigenvectors of $C$ (which also are the eigenvectors of $H$) and the corresponding eigenvalues. These (either the eigenvectors, or the eigenvectors and eigenvalues) sometimes are called principal components. Let $q_j$ be the eigenvectors, normalized to be orthonormal, and $\sigma_j^2$ the corresponding eigenvalues of $C$, so that

$$Cq_j = \sigma_j^2 q_j , \qquad q_j^t q_k = \delta_{jk} .$$

Denote the $q_j$ component of $X$ by $Z_j = q_j^tX$. This is a linear function of $X$ and therefore Gaussian with mean zero.
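Both routes described above can be sketched in pure Python:

- Factor $C = LL^t$ and take $X = LY$.
- Factor $H = LL^t$ and solve $L^tX = Y$ by back substitution.

This is an illustrative sketch, not production linear algebra. The $2 \times 2$ matrices are a hypothetical example chosen so that $H = C^{-1}$ exactly. `cholesky`, `sample_from_covariance` and `sample_from_precision` are made-up names. For clarity each call refactors the matrix; a real code would factor once.

```python
import math
import random


def cholesky(c):
    """Lower triangular L with L L^t = c, for a symmetric positive definite c (list of rows)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def sample_from_covariance(c, y):
    """X = L Y with L L^t = C; X has covariance C when the Y_j are independent standard normals."""
    low = cholesky(c)
    return [sum(low[i][k] * y[k] for k in range(i + 1)) for i in range(len(y))]


def sample_from_precision(h, y):
    """X = L^{-t} Y with L L^t = H = C^{-1}, computed by back substitution on L^t X = Y."""
    low = cholesky(h)
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        # Row i of L^t holds the entries low[k][i] for k >= i.
        s = y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = s / low[i][i]
    return x


def empirical_covariance(samples):
    """Estimate E[X X^t] for mean zero samples."""
    m, n = len(samples), len(samples[0])
    return [[sum(s[i] * s[j] for s in samples) / m for j in range(n)] for i in range(n)]


# Hypothetical example: C and its exact inverse H.
c = [[4.0, 2.0], [2.0, 3.0]]
h = [[3.0 / 8.0, -2.0 / 8.0], [-2.0 / 8.0, 4.0 / 8.0]]

rng = random.Random(17)  # arbitrary fixed seed
ys = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(40_000)]
cov_from_c = empirical_covariance([sample_from_covariance(c, y) for y in ys])
cov_from_h = empirical_covariance([sample_from_precision(h, y) for y in ys])
print(cov_from_c)
print(cov_from_h)
```

Both estimates should be close to $C$. One deterministic identity is also worth checking: for $X = L^{-t}Y$ we have $X^tHX = Y^tY$.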
Its variance (note $Z_j = Z_j^t = X^tq_j$) is

$$E[Z_j^2] = E[Z_jZ_j^t] = q_j^t\,E[XX^t]\,q_j = q_j^tCq_j = \sigma_j^2 .$$

A similar calculation shows that $Z_j$ and $Z_k$ are uncorrelated and hence (as components of a multivariate normal) independent. Therefore, we can generate $Y_j$ as independent standard normals and sample the $Z_j$ using

$$Z_j = \sigma_j Y_j . \tag{2}$$

After that, we can get an $X$ using

$$X = \sum_{j=1}^{n} Z_j q_j . \tag{3}$$

We restate this in matrix terms. Let $Q$ be the orthogonal matrix whose columns are the orthonormal eigenvectors of $C$, and let $\Sigma^2$ be the diagonal matrix with $\sigma_j^2$ in the $(j,j)$ diagonal position. The eigenvalue/eigenvector relations are

$$CQ = Q\Sigma^2 , \qquad Q^tQ = I = QQ^t . \tag{4}$$

The multivariate normal vector $Z = Q^tX$ then has covariance matrix $E[ZZ^t] = E[Q^tXX^tQ] = Q^tCQ = \Sigma^2$. This says that the $Z_j$, the components of $Z$, are independent univariate normals with variances $\sigma_j^2$. Therefore, we may sample $Z$ by choosing its components by (2) and then reconstruct $X$ by $X = QZ$, which is the same as (3). Alternatively, we can calculate, using (4), that

$$C = Q\Sigma^2Q^t = Q\Sigma\Sigma Q^t = (Q\Sigma)(Q\Sigma)^t .$$

Therefore $A = Q\Sigma$ satisfies $AA^t = C$, and $X = AY = Q\Sigma Y = QZ$ has covariance $C$ if the components of $Y$ are independent standard univariate normals or, equivalently, the components of $Z$ are independent univariate normals with variances $\sigma_j^2$.

3 Brownian motion examples

We illustrate these ideas for various kinds of Brownian motion. Let $X(t)$ be a Brownian motion path. Choose a final time $T$ and a time step $\Delta t = T/n$. The observation times will be $t_j = j\Delta t$ and the observations (or observation values) will be $X_j = X(t_j)$. These observations may be assembled into a vector $X = (X_1, \ldots, X_n)^t$. We seek to generate sample observation vectors (or observation paths). How we do this depends on the boundary conditions. The simplest case is standard Brownian motion. Specifying $X(0) = 0$ is a Dirichlet boundary condition at $t = 0$. Saying nothing about $X(T)$ is a free (or Neumann) condition at $t = T$.
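The principal component recipe, (2) followed by (3), is short enough to write out directly. The sketch below assumes the eigenvectors and eigenvalues are already known. It uses a hypothetical $2 \times 2$ covariance, $C = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, whose eigenpairs can be written down by hand. The function name is made up for this illustration.

```python
import math
import random


def sample_principal_components(q_columns, sigmas, y):
    """Z_j = sigma_j Y_j as in (2), then X = sum_j Z_j q_j as in (3)."""
    n = len(q_columns[0])
    x = [0.0] * n
    for q, sigma, y_j in zip(q_columns, sigmas, y):
        z_j = sigma * y_j
        for i in range(n):
            x[i] += z_j * q[i]
    return x


# Hypothetical example: C = [[2, 1], [1, 2]] has eigenvalues sigma^2 = 3 and 1 with
# orthonormal eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
c = [[2.0, 1.0], [1.0, 2.0]]
r = 1.0 / math.sqrt(2.0)
q_columns = [[r, r], [r, -r]]
sigmas = [math.sqrt(3.0), 1.0]

# Matrix form: A = Q Sigma has column j equal to sigma_j q_j, so A A^t should equal C.
a = [[sigmas[j] * q_columns[j][i] for j in range(2)] for i in range(2)]
aat = [[sum(a[i][k] * a[m][k] for k in range(2)) for m in range(2)] for i in range(2)]

rng = random.Random(4)  # arbitrary fixed seed
xs = [sample_principal_components(q_columns, sigmas, [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
      for _ in range(40_000)]
cov = [[sum(x[i] * x[j] for x in xs) / len(xs) for j in range(2)] for i in range(2)]
print(aat)
print(cov)
```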
The joint probability density for the observation vector, $f(x) = f(x_1, \ldots, x_n)$, is found by multiplying the conditional densities. Given $X_k = X(t_k)$, the next observation $X_{k+1} = X(t_k + \Delta t)$ is Gaussian with mean $X_k$ and variance $\Delta t$, so its conditional density is

$$\frac{1}{\sqrt{2\pi\Delta t}}\, e^{-(x_{k+1} - X_k)^2/2\Delta t} .$$

Multiply these together and use $X_0 = 0$ and you find (with the convention $x_0 = 0$)

$$f(x_1, \ldots, x_n) = \left(\frac{1}{2\pi\Delta t}\right)^{n/2} \exp\left(\frac{-1}{2\Delta t}\sum_{k=0}^{n-1}(x_{k+1} - x_k)^2\right) . \tag{5}$$

3.1 The random walk method

The simplest and possibly best way to generate a sample observation path, $X$, comes from the derivation of (5). First generate $X_1 = X(\Delta t)$ as a univariate normal with mean zero and variance $\Delta t$, i.e. $X_1 = \sqrt{\Delta t}\,Y_1$. Given $X_1$, $X_2$ is a univariate normal with mean $X_1$ and variance $\Delta t$, so we may take $X_2 = X_1 + \sqrt{\Delta t}\,Y_2$, and so on. This is the random walk method. If you just want to make standard Brownian motion paths, stop here. We push on for pedagogical purposes and to develop strategies that apply to other types of Brownian motion.

We describe the random walk method in terms of the matrices above, starting by identifying the matrices $C$ and $H$. Examining (5) leads to

$$H = \frac{1}{\Delta t}\begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix} .$$

This is a tridiagonal matrix with pattern $-1, 2, -1$ except at the bottom right corner. One can calculate the covariances $C_{jk}$ from the random walk representation

$$X_k = \sqrt{\Delta t}\,(Y_1 + \cdots + Y_k) .$$

Since the $Y_j$ are independent, we have

$$C_{kk} = \operatorname{var}(X_k) = \Delta t \sum_{j=1}^{k} \operatorname{var}(Y_j) = t_k ,$$

and, supposing $j < k$,

$$C_{jk} = E[X_jX_k] = \Delta t\,E\Big[\big((Y_1 + \cdots + Y_j) + (Y_{j+1} + \cdots + Y_k)\big)(Y_1 + \cdots + Y_j)\Big] = \Delta t\,E\big[(Y_1 + \cdots + Y_j)^2\big] = t_j .$$

These combine into the familiar formula

$$C_{jk} = \operatorname{cov}(X(t_j), X(t_k)) = \min(t_j, t_k) .$$

This is the same as saying that the matrix $C$ is

$$C = \Delta t\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & n \end{pmatrix} . \tag{6}$$
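The random walk method and the two matrices read off from (5) and from $C_{jk} = \min(t_j, t_k)$ can be checked against each other numerically. The sketch below is a pure-Python illustration with hypothetical helper names. It confirms that $HC = I$, which is the statement $H = C^{-1}$. It also estimates two covariances from sampled random walk paths.

```python
import math
import random


def random_walk_path(T, n, rng):
    """X_1..X_n at t_k = k T / n for standard Brownian motion with X(0) = 0."""
    dt = T / n
    path, x = [], 0.0
    for _ in range(n):
        x += math.sqrt(dt) * rng.gauss(0.0, 1.0)  # X_{k+1} = X_k + sqrt(dt) Y_{k+1}
        path.append(x)
    return path


def energy_matrix(T, n):
    """H from (5): (1/dt) tridiag(-1, 2, -1), with 1/dt in the bottom right corner."""
    dt = T / n
    h = [[0.0] * n for _ in range(n)]
    for k in range(n):
        h[k][k] = 2.0 / dt
        if k > 0:
            h[k][k - 1] = h[k - 1][k] = -1.0 / dt
    h[n - 1][n - 1] = 1.0 / dt  # free (Neumann) condition at t = T
    return h


def covariance_matrix(T, n):
    """C_jk = min(t_j, t_k), as in (6)."""
    dt = T / n
    return [[min(j, k) * dt for k in range(1, n + 1)] for j in range(1, n + 1)]


T, n = 2.0, 8
h, c = energy_matrix(T, n), covariance_matrix(T, n)
# H C should be the identity matrix, confirming H = C^{-1}.
hc = [[sum(h[i][k] * c[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

rng = random.Random(11)  # arbitrary fixed seed
paths = [random_walk_path(1.0, 4, rng) for _ in range(20_000)]
var_end = sum(p[3] ** 2 for p in paths) / len(paths)   # should be near t_4 = 1
cov_13 = sum(p[0] * p[2] for p in paths) / len(paths)  # should be near min(t_1, t_3) = 0.25
print(round(var_end, 3), round(cov_13, 3))
```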
The random walk method for generating $X$ may be expressed as

$$\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \sqrt{\Delta t}\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 1 & 1 & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ 1 & 1 & \cdots & 1 \end{pmatrix}\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} .$$

Thus, $X = AY$ with

$$A = \sqrt{\Delta t}\begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 1 & 1 & \ddots & \vdots \\ \vdots & & & \ddots & 0 \\ 1 & 1 & \cdots & 1 & 1 \end{pmatrix} . \tag{7}$$

The reader should do the matrix multiplication to check that indeed $C = AA^t$ for (6) and (7). Notice that $H$ is a sparse matrix indicating short range interactions while $C$ is full indicating long range correlations. This is true in a great number of physical applications, though it is rare to have an explicit formula for $C$.

We also can calculate the Choleski factorization of $H$. The reader can convince herself or himself that the Choleski factor, $L$, is bidiagonal, with nonzeros only on or immediately below the diagonal. However, the formulas are simpler if we reverse the order of the coordinates. Therefore we define the coordinate reversed observation vector

$$\widetilde{X} = (X_n, X_{n-1}, \ldots, X_1)^t ,$$

whose covariance matrix is

$$\widetilde{C} = \begin{pmatrix} t_n & t_{n-1} & \cdots & t_1 \\ t_{n-1} & t_{n-1} & \cdots & t_1 \\ \vdots & \vdots & \ddots & \vdots \\ t_1 & t_1 & \cdots & t_1 \end{pmatrix} ,$$

and energy matrix

$$\widetilde{H} = \frac{1}{\Delta t}\begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{pmatrix} .$$

We seek the Choleski factorization $\widetilde{H} = LL^t$ with bidiagonal

$$L = \frac{1}{\sqrt{\Delta t}}\begin{pmatrix} l_1 & & & \\ m_2 & l_2 & & \\ & m_3 & l_3 & \\ & & \ddots & \ddots \end{pmatrix} .$$

Multiplying out $\widetilde{H} = LL^t$ leads to equations that successively determine the $l_k$ and $m_k$:

$$l_1^2 = 1 \implies l_1 = 1 , \qquad l_1m_2 = -1 \implies m_2 = -1 , \qquad m_2^2 + l_2^2 = 2 \implies l_2 = 1 , \qquad l_2m_3 = -1 \implies m_3 = -1 ,$$

and so on. The result is $\widetilde{H} = LL^t$ with $L$ simply

$$L = \frac{1}{\sqrt{\Delta t}}\begin{pmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{pmatrix} .$$

The sampling algorithm using this information is to find $\widetilde{X}$ from $Y$ by solving $Y = L^t\widetilde{X}$:

$$\begin{pmatrix} Y_n \\ Y_{n-1} \\ \vdots \\ Y_2 \\ Y_1 \end{pmatrix} = \frac{1}{\sqrt{\Delta t}}\begin{pmatrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & \ddots & \ddots & \\ & & & 1 & -1 \\ & & & & 1 \end{pmatrix}\begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_2 \\ X_1 \end{pmatrix} .$$
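The claims in this passage can all be checked numerically for a small case:

- $AA^t = C$ for (6) and (7).
- A general-purpose Choleski routine applied to $\widetilde{H}$ returns the bidiagonal $L$ with entries $\pm 1/\sqrt{\Delta t}$.
- Back substitution on $Y = L^t\widetilde{X}$ reproduces the random walk.

The pure-Python sketch below does this with $n = 6$ and $\Delta t = 1/4$, values chosen arbitrarily so the arithmetic is exact. The fixed vector `y` is a hypothetical stand-in for standard normals.

```python
import math


def cholesky(c):
    """Lower triangular L with L L^t = c (c assumed symmetric positive definite, list of rows)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def back_substitute(u, b):
    """Solve the upper triangular system u x = b, last unknown first."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(u[i][k] * x[k] for k in range(i + 1, n))) / u[i][i]
    return x


n, dt = 6, 0.25

# A from (7): sqrt(dt) times the lower triangular matrix of ones; A A^t should be C from (6).
a = [[math.sqrt(dt) if k <= j else 0.0 for k in range(n)] for j in range(n)]
aat = [[sum(a[j][m] * a[k][m] for m in range(n)) for k in range(n)] for j in range(n)]
c = [[min(j, k) * dt for k in range(1, n + 1)] for j in range(1, n + 1)]

# H-tilde for (X_n, ..., X_1): the corner entry 1/dt moves to the top left.
h_rev = [[0.0] * n for _ in range(n)]
for k in range(n):
    h_rev[k][k] = 2.0 / dt
    if k > 0:
        h_rev[k][k - 1] = h_rev[k - 1][k] = -1.0 / dt
h_rev[0][0] = 1.0 / dt
low = cholesky(h_rev)  # expected: 1/sqrt(dt) on the diagonal, -1/sqrt(dt) just below it

# Solve L^t X-tilde = (Y_n, ..., Y_1) by back substitution, then undo the reversal.
y = [0.3, -1.2, 0.5, 2.0, -0.7, 0.1]  # stand-ins for Y_1, ..., Y_n
lt = [[low[k][i] for k in range(n)] for i in range(n)]  # L^t
x_choleski = list(reversed(back_substitute(lt, list(reversed(y)))))
x_random_walk = [math.sqrt(dt) * sum(y[: k + 1]) for k in range(n)]
print(x_choleski)
print(x_random_walk)
```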
Solving from the bottom up (back substitution), we have

$$Y_1 = \frac{1}{\sqrt{\Delta t}}X_1 \implies X_1 = \sqrt{\Delta t}\,Y_1 , \qquad Y_2 = \frac{1}{\sqrt{\Delta t}}(X_2 - X_1) \implies X_2 = X_1 + \sqrt{\Delta t}\,Y_2 , \qquad \text{etc.}$$

This whole process turns out to give the same random walk sampling method. Had we not gone to the time reversed ($\widetilde{X}$, etc.) variables, we could have calculated the bidiagonal Choleski factor $L$ numerically. This works for any problem with a tridiagonal energy matrix $H$ and has a name in the control theory/estimation literature that escapes me. In particular, it will allow us to find sample Brownian motion paths with other boundary conditions.

3.2 The Brownian bridge construction

The Brownian bridge construction is useful in the mathematical theory of Brownian motion. It also is the basis for the success of quasi Monte Carlo methods in finance. Suppose $n$ is a power of 2, $n = 2^L$. We will construct the observation path $X$ through a sequence of $L$ refinements. First, notice that $X_n$ is a univariate normal with mean zero and variance $T$, so we may take (with $Y_{k,l}$ being independent standard normals)

$$X_n = \sqrt{T}\,Y_{1,1} .$$

Given the value of $X_n$, the midpoint observation, $X_{n/2}$, is a univariate normal[4] with mean $\frac{1}{2}X_n$ and variance $T/4$, so we may take

$$X_{n/2} = \frac{1}{2}X_n + \frac{\sqrt{T}}{2}\,Y_{2,1} .$$

At the first level, we chose the endpoint value for $X$. We could draw a first level path by connecting $X_n$ to zero with a straight line. At the second level, or first refinement, we created a midpoint value. The second level path could be piecewise linear, connecting $0$ to $X_{n/2}$ to $X_n$.

[4] We leave this and related claims below as exercises for the student.

The second refinement level creates values for the quarter points. Given $X_{n/2}$, $X_{n/4}$ is a normal with mean $\frac{1}{2}X_{n/2}$ and variance $\frac{1}{4}\cdot\frac{T}{2}$. Similarly, $X_{3n/4}$ is a normal with mean $\frac{1}{2}(X_{n/2} + X_n)$ and variance $\frac{1}{4}\cdot\frac{T}{2}$. Therefore, we may take

$$X_{n/4} = \frac{1}{2}X_{n/2} + \frac{1}{2}\sqrt{\frac{T}{2}}\,Y_{3,1} \qquad\text{and}\qquad X_{3n/4} = \frac{1}{2}(X_{n/2} + X_n) + \frac{1}{2}\sqrt{\frac{T}{2}}\,Y_{3,2} .$$

The level three path would be piecewise linear with breakpoints at $t = T/4$, $T/2$, and $3T/4$.
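The refinements above, repeated level by level, give the whole construction. Each new midpoint is the average of its two neighbors plus a mean zero normal whose variance is one quarter of the interval length. The pure-Python sketch below is a minimal version under these assumptions:

- $n = 2^L$.
- The standard normals arrive already ordered as $Y_{1,1}, Y_{2,1}, Y_{3,1}, Y_{3,2}, \ldots$, with coarse levels first.

The function name is hypothetical. The final lines estimate two covariances to compare with $\min(t_j, t_k)$.

```python
import math
import random


def brownian_bridge_path(T, levels, normals):
    """X_1, ..., X_n with n = 2**levels, built by successive midpoint refinements.

    `normals` holds n standard normals, consumed in the order
    Y_{1,1}, Y_{2,1}, Y_{3,1}, Y_{3,2}, ... (coarse levels first).
    """
    n = 2 ** levels
    if len(normals) != n:
        raise ValueError("need exactly n = 2**levels standard normals")
    y = iter(normals)
    x = [0.0] * (n + 1)            # x[j] = X(t_j), with x[0] = X(0) = 0
    x[n] = math.sqrt(T) * next(y)  # first level: X_n has variance T
    step = n                       # index spacing of the current breakpoints
    while step > 1:
        half = step // 2
        # A bridge over an interval of length tau has midpoint variance tau / 4.
        sd = math.sqrt(step * T / n / 4.0)
        for left in range(0, n, step):
            x[left + half] = 0.5 * (x[left] + x[left + step]) + sd * next(y)
        step = half
    return x[1:]


rng = random.Random(8)  # arbitrary fixed seed
T, levels = 1.0, 2      # n = 4 observations at t = 0.25, 0.5, 0.75, 1
paths = [brownian_bridge_path(T, levels, [rng.gauss(0.0, 1.0) for _ in range(4)])
         for _ in range(20_000)]
var_end = sum(p[3] ** 2 for p in paths) / len(paths)   # should be near T = 1
cov_13 = sum(p[0] * p[2] for p in paths) / len(paths)  # should be near min(0.25, 0.75)
print(round(var_end, 3), round(cov_13, 3))
```

The first entry of `normals` alone fixes the endpoint, and the next few fix the coarse shape. This ordering is what makes the construction attractive when the early coordinates of a quasi Monte Carlo point are the most uniform.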
Note that in each case we add a mean zero normal of the appropriate variance to the linear interpolation value. In the general step, we go from the level $k-1$ path to the level $k$ path by creating values for the midpoints of the level $k-1$ intervals. The level $k$ observations are $X_{jn/2^{k-1}}$. The values with even $j$ are known from the previous level, so we need values for odd $j$. That is, we want to interpolate between the $j = 2m$ value and the $j = 2m+2$ value and add a mean zero normal of the appropriate variance:

$$X_{(2m+1)n/2^{k-1}} = \frac{1}{2}\left(X_{2mn/2^{k-1}} + X_{(2m+2)n/2^{k-1}}\right) + \frac{1}{2^{(k-2)/2}}\,\frac{\sqrt{T}}{2}\,Y_{k,m+1} .$$

The reader should check that the vector of standard normals $Y = (Y_{1,1}, Y_{2,1}, Y_{3,1}, Y_{3,2}, \ldots)^t$ indeed has $n = 2^L$ components. The value of this method for quasi Monte Carlo comes from the fact that the most important values, the ones that determine the large scale structure of $X$, are the first components of $Y$. As we will see, the components of the $Y$ vectors of quasi Monte Carlo have uneven quality, with the first components being the best.

3.3 Principal components

The principal component eigenvalues and eigenvectors for many types of Brownian motion are known in closed form. In many of these cases, the Fast Fourier Transform (FFT) algorithm leads to a reasonably fast sampling method. These FFT based methods are slower than random walk or Brownian bridge sampling for standard Brownian motion, but they sometimes are the most efficient for fractional Brownian motion. They may be better than Brownian bridge sampling with quasi Monte Carlo (I'm not sure about this).

The eigenvectors of $H$ are known[5] to have components ($q_{j,k}$ is the $k$th component of eigenvector $q_j$)

$$q_{j,k} = \text{const}\cdot\sin(\alpha_j t_k) . \tag{8}$$

[5] See, e.g., Analysis of Numerical Methods by Eugene Isaacson and Herbert Keller.

The $n$ eigenvectors and eigenvalues then are determined by the allowed values of $\alpha_j$, which, in turn, are determined through the boundary conditions. We can find $\sigma_j^{-2}$ in terms of $\alpha_j$
using the eigenvalue equation $Hq_j = \sigma_j^{-2}q_j$ (since $H = C^{-1}$, its eigenvalues are the reciprocals $\sigma_j^{-2}$ of those of $C$), evaluated at any of the interior components $1 < k < n$:

$$\frac{1}{\Delta t}\Big(-\sin(\alpha_j(t_k - \Delta t)) + 2\sin(\alpha_j t_k) - \sin(\alpha_j(t_k + \Delta t))\Big) = \sigma_j^{-2}\sin(\alpha_j t_k) .$$

Doing the math shows that the eigenvalue equation is satisfied and that

$$\sigma_j^{-2} = \frac{2}{\Delta t}\big(1 - \cos(\alpha_j\Delta t)\big) . \tag{9}$$

The eigenvalue equation also is satisfied at $k = 1$ because the form (8) automatically satisfies the boundary condition $q_{j,0} = 0$. This is why we used the sine and not the cosine. Only special values of $\alpha_j$ give $q_{j,k}$ that satisfy the eigenvalue equation at the right boundary point $k = n$.
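Formula (9) is easy to confirm numerically. Take $q_k = \sin(\alpha t_k)$ for an arbitrary $\alpha$, apply the rows of $H$ from (5), and compare with $\sigma^{-2}q_k$. The pure-Python sketch below assumes hypothetical parameter values: $\alpha = 1.3$, $T = 2$, $n = 16$. It reports two residuals. The residual over rows $k = 1, \ldots, n-1$ should sit at rounding level. The residual at row $k = n$ is generally not zero, matching the remark that only special $\alpha_j$ work there. The sketch does not compute those allowed values.

```python
import math


def eigen_residuals(alpha, T, n):
    """Compare (H q)_k with sigma^{-2} q_k for q_k = sin(alpha t_k), using H from (5).

    Returns (largest residual over rows k = 1..n-1, residual at row k = n).
    """
    dt = T / n
    lam = 2.0 / dt * (1.0 - math.cos(alpha * dt))  # sigma^{-2} from (9)
    q = [math.sin(alpha * k * dt) for k in range(n + 1)]  # q[0] = 0 encodes X(0) = 0
    interior = max(
        abs((-q[k - 1] + 2.0 * q[k] - q[k + 1]) / dt - lam * q[k]) for k in range(1, n)
    )
    # Row n of H is (1/dt) * (..., -1, 1): the free end at t = T.
    right = abs((-q[n - 1] + q[n]) / dt - lam * q[n])
    return interior, right


print(eigen_residuals(alpha=1.3, T=2.0, n=16))
```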