Up:Old Papers

Chapter 1
Cosmological N-body Simulation

George Lake       Neal Katz       Thomas Quinn       Joachim Stadel      

Abstract:

An epochal survey has thrown down the gauntlet for cosmological simulation. We describe three keys to meeting the challenge of N-body simulation: adaptive potential solvers, adaptive integrators and volume renormalization. With these techniques and a dedicated Teraflop facility, simulation can stay even with observation of the Universe.

The Scientific Importance of Cosmological N-body Simulation

Simulations are required to calculate the nonlinear final states of theories of structure formation as well as to design and analyze observational programs. Galaxies have six coordinates of velocity and position, but observations determine just two coordinates of position and the line-of-sight velocity that bundles the expansion of the Universe (the distance via Hubble's Law) together with random velocities created by the mass concentrations (see Figure 1). To determine the underlying structure and masses, we must use simulations. If we want to determine the structure of a cluster of galaxies, how large must the survey volume be? Without using simulations to define observing programs, the scarce resource of observing time on $2Billionspace observatories may be mispent. Finally, to test theories for the formation of structure, we must simulate the nonlinear evolution to the present epoch. This relationship to observational surveys defines our goal for the next decade. The Sloan Digital Sky Survey[8] will produce fluxes and sky positions for galaxies with redshifts for the brightest . Our ambitious observational colleagues have cut steel and ground glass to survey a ``fair volume" that we must simulate, but we need to do this. Direct summation of the gravitational forces using fixed timesteps would take Teraflop-years.

We will present the three keys to a realistic float count: 1) spatially adaptive potential solvers, 2) temporally adaptive integrators and 3) volume renormalizations. Another goal of this paper is to define ``high quality simulations" and the niche science that can be done with .

Figure Caption: The left panel shows a slice (100 Mpc on a side) from the final state of a cosmological simulation. The right panel shows the same slice looks in ``redshift" or ``observational" space. We must refine and test analysis techniques on simulated ``skies" where the underlying ground truth is known.

A Brief History of N

Over the last 20 years, the of our simulations has increased as: . To simulate particles, can we just wait until 2013? Computers in 1974 had speeds of flops, now they hit flops-a speed improvement. However, our current algorithms best this speed-up by evolving particles at a rate that is faster than the 1974 code. To reach with computer speed and algorithms contributing equally, we would have to achieve the impossible and make the cost of advancing a particle equal to one pairwise force evaluation! We will have to sidestep as well as timestep.

Declaration of N-dependence

The ``" obsession of -body simulators can be puzzling. There are a variety of problems where represents a minimum ante. For example, clusters of galaxies are extremely important for determining cosmological parameters such as the density of the Universe. Within a cluster, the galaxies are 1-10%of the mass, and there are roughly of them. If the galaxies have fewer than particles, they dissolve before the present epoch owing to two-body relaxation in the tidal field of the cluster. To prevent this, we need . Scaling to the Sloan Volume yields

is far smaller than the true number of particles and will compromise the physics of the system at some level. We use two principles to guide our science; make sure that: 1) the physics being examined has not been compromised and 2) gravitational softening, discrete timesteps, force accuracy and simulation volume don't make matters even worse. is not the figure of merit in most reported simulations-it should be! The N-body Constitution in the Appendix provides a set of necessary but not sufficient guidelines for N-body simulation. This evolving document can be viewed at http://www-hpcc.astro.washington.edu.

All the N that fits

There are two constraints on our choice of . The cost of computing a full simulation is floats. The memory needed to run a simulation is bytes. If we fix by filling memory, the time to run a simulation is Current machines are well balanced for our Grand Challenge simulations. With Gigaflops and Gigabytes, we can perform simulations with . With Teraflops and Terabytes, we can simulate particles. Simulations with lie in the nether world of Petaflops and Petabytes.

Parallel Adaptive N-body Solvers

Performance gains of the recent past and near future rely on parallel computers that reduce CPU-years to wall-clock-days. The challenge lies in dividing work amongst the processors while minimizing the latency of communication.

The dynamic range in densities demands that spatially adaptive methods be used. Our group is exploring both the adaptive mesh code[7] and tree-codes[4][1]. The latter use multipole expansions to approximate the gravitational acceleration on each particle. A tree is built with each node storing its multipole moments. Each node is recursively divided into smaller subvolumes until the final leaf nodes are reached. Starting from the root node and moving level by level toward the leaves of the tree, we obtain a progressively more detailed representation of the underlying mass distribution. In calculating the force on a particle, we can tolerate a cruder representation of the more distant particles leading to an method.

Since we only need a crude representation for distant mass, the concept of ``computational locality'' translates directly to spatial locality and leads to a natural domain decomposition. Experience has shown that the domain decomposition and the data structure for force calculation must be the same for efficient calculation. One approach is to build an octree for forces[4] and hash it to produce the domain decomposition[3].

We use a balanced k-D tree for both the domain decomposition[14] and the data structure. The tree is constructed by recursively bisecting the particle distribution along the longest axis. The lowest level nodes of this tree contain several particles (usually 8 to 32) whose force calculations are collectively optimized. Each domains is a single rectangular region of the upper levels of the tree. Pointers are unnecessary, each node in the tree can be indexed so that finding children, parents and siblings in the tree are simple bit shift operations[3].

This simple tree structure provides a fast and effective means to create portable parallel codes for N-body simulations, nearest-neighbor searching and group finding. The same code runs under PVM, the KSR's native pthreads and the INTEL's NX system. On the KSR1-64, a single timestep of particles takes about 10 minutes.

Hierarchical Timestepping

The great advance in the calculation of forces has come from hierarchical methods that are spatially adaptive. As the number of particles in a cosmological simulation grows, so do the density contrasts and the range of dynamical times (). If we take the final state of a simulation and weight the work done on particles inversely with their natural timesteps, we find a potential gain of of . Temporal adaptivity is one of the last algorithmic areas where we can squeeze an order of magnitude improvement.

The most commonly used time integration scheme for N-body simulations is leapfrog:

where is the position vector, is the velocity, is the acceleration, and is the timestep. The operator evolves the system under the Hamiltonian where is of order [15].

The existence of this surrogate Hamiltonian ensures that the leapfrog is symplectic-it is the exact solution of an approximate Hamiltonian. Errors explore the ensemble of systems close to the initial system rather than an ensemble of non-Hamiltonian time evolution operators near the desired one.

Leapfrog is a second-order symplectic integrator requiring only one costly force evaluation per timestep and only one copy of the physical state of the system. These properties are so desirable that we have concentrated on making an adaptive leapfrog. Unfortunately, simply choosing a new timestep for each leapfrog step evolves in a manner that may not be Hamiltonian, hence it is neither symplectic nor time-reversible. The results can be awful[5]. Time reversibility can be restored[11] if the timestep is determined implicitly from the state of the system at both the beginning and the end of the step. This requires backing up timesteps, throwing away expensive force calculations and using auxiliary storage. However, we can define an operator that ``adjusts'' the timestep, , yet retains time reversibility and only calculates a force if it is used to complete the timestep[13]. This is done by choosing such that it commutes with , so that is equivalent to . Since only changes the velocities, an operator that depends entirely on positions satisfies the commutation requirement. The ``natural definition" of timestep, , is ideal. Synchronization is maintained by choosing timesteps that are a power-of-two subdivision of the largest timestep, . That is, where is the timestep of a given particle[9][10]. We are currently experimenting with this approach and encourage others to look at variants.

Volume Renormalizations

The power of this technique has recently been shown in the simulation of rare quasar formation sites[12]. A large scale simulation is first done at modest resolution (particle mass of )- galaxies ``weigh" just particles. Current simulations with this resolution cover volumes of (100 Mpc) with particles. Within these volumes, regions of interest are identified. These can be sites of galaxy/QSO formation occuring at high redshift, a large cluster of galaxies or a structure matching our local group.

Next, initial conditions are reconstructed using the same low-frequency waves present in the low resolution simulation but adding the higher spatial frequencies. To reduce the number of particles and make the nonlinear simulation possible with the same cosmological context, we construct another set of initial conditions with particles whose mass, and therefore mean separation, increase with the distance from the center of our volume. Note that because tides are important in the formation of the filaments it is not sufficient to extract just the central region. Finally, the higher resolution simulations can be done adding gasdynamics using TREESPH[10].

Using this approach, we can simulate structures like the local Virgo cluster of galaxies using particles. A larger cluster like Coma requires particles. We can get the gross dynamics of the local group, capturing the larger satellites with particles.

Simulating the Sloan Volume

Our proposed program to simulate the Sloan Volume before the millenia is as follows:

The total cost for the first simulation is roughly a Teraflop-year and requires a machine with a Terabyte of memory. The second sequence of simulations should be designed to have roughly equal computational cost, but will require less memory.

References

1
A. W. Appel, An efficient Program for Many-Body Simulation, SIAM J. Sci. Stat. Comput., 6 (1985), pp. 85-93.

2
J. M. Bardeen, J. R. Bond, N. Kaiser and A. S. Szalay, The Statistics of Peaks of Gaussian Random Fields, Astrophys. J., 304, (1986), pp. 15-34.

3
J. Barnes, An efficient N-body algorithm for a fine-grain parallel computer", in The Use of Supercomputers in Stellar Dynamics, P. Hut and S. McMillan, eds., Springer Verlag, New York, (1986), pp. 175-80.

4
J. Barnes and P. Hut, A Hierarchical O(NlogN) Force-Calculation Algorithm, Nature, 324, (1986), pp. 446-50.

5
M.P. Calvo and J. M. Sanz-Serna, The development of variable-step symplectic integrators with application to the two-body problems, SIAM J. Sci. Comput., 14, (1993), pp. 936-52.

6
H.M.P. Couchman and R. G. Carlberg, Large-scale structure in a low-bias universe, Astrophys. J., 389, (1992), pp. 453-63.

7
H.M.P. Couchman, Mesh-refined PM: a fast adaptive N-body algorithm, Astrophys. J. Lett., 368, (1991), pp. L23-6.

8
J. E. Gunn and G. R. Knapp, The Sloan Digital Sky Survey, in Sky Surveys, Protostars to Protogalaxies, editor T. Soifer, Ast. Soc. of Pacific conference series, #43, (1992), pp. 267-79.

9
J.G. Jernigan and D. H. Porter, A tree code with logarithmic reduction of force terms, hierarchical regularization of all variables, and explicit accuracy controls, Astrophys. J. Suppl., 71, (1989), pp. 871-93.

10
L. Hernquist and N. Katz, TREESPH, a unification of SPH with the hierarchical tree method, Astrophys. J. Suppl., 70, (1989), pp. 419-46.

11
P. Hut, J. Makino and S. McMillan, preprint, (1994).

12
N. Katz, T. Quinn, E. Bertschinger and J. M. Gelb, Formation of quasars at high redshift, Mon. Not. Roy. Astr. Soc., in press, (1994).

13
T. Quinn, N. Katz, J. Stadel and G. Lake, Time stepping N-body simulations, in preparation.

14
J. K. Salmon, Parallel Hierarchical N-body Methods, Ph. D. Thesis, California Institute of Technology, 1990.

15
P. Saha and S. Tremaine, Symplectic integrators for solar system dynamics, Astron. J., 104, (1994), pp. 1633-40.

Appendix: The N-body Constitution, (short form without Preamble)

Article I: On the Gravitational Softening Length Section 1-The gravitational softening length should be large enough to minimize the effects of two body relaxation.

There should be at least 8 particles in each softening volume in objects of interest.

Article II: On the Size of Time steps Section 1-Time steps should be chosen to be small enough to eliminate the effects of two body scattering introduced through integration errors.

The time step must satisfy For an isothermal sphere this is equivalent to for a standard Plummer softening. For a typical cosmological simulation Spline kernel softening requires: 33%more time steps.

Article III: On the Accuracy of Forces Section 1-Forces should be calculated with a maximum absolute and relative error.

The error should always be less than 0.5%of the force or the rms force, whichever is less.

Article IV: On the Accuracy of the Integrator Section 1-The integrator must be second order and time reversible or symplectic.

Article V: On the Size of the Simulation Volume Section 1-The simulation volume must be large enough to model all non-linear effects.

Mode couplings between large and small scales can increase power on small scales. In particular, filaments greatly effect the gravitational evolution at small scales. The diameter of the simulation for a CDM spectrum with must be at least 40h Mpc. Models with more large scale power will require larger volumes.

Section 2-No one object should dominate the evolution of the simulated volume.

To prevent aliases and to keep objects from artificially lowering the mean background density, no virialized object may contain more than 1/10 the total simulated mass.

Section 3-The simulation must have a large enough volume that fundamental mode remains well within the linear regime.

Accurate evolution of the fundamental mode requires periodic boundary conditions.

Article VI: On the Starting Redshift of the Simulation

Section 1-The simulation must start at a redshift high enough to ensure that all represented mass scales are still in the linear regime.

In particular, if the initial conditions are generated using the Zel'dovich approximation, the starting redshift must be sufficiently high to ensure that the absolute maximum . For example, a CDM simulation with in a cubic volume 40h Mpc on a side simulated with particles would require a starting redshift of at least .

Article VII: What Ratification Shall Establish Constitution Section 1-This constitution must be ratified by the representatives of at least two different HPCC groups at a constitutional convention by late 1995.

The effects of violating the above articles are not known. Some violations will introduce unphysical dissipative effects while others will introduce errors that act like an artificial heat source.

About this document ...

This document was generated using the LaTeX2HTML translator Version 0.5.3 (Wed Jan 26 1994) Copyright © 1993, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
latex2html -split 0 siamhtml.tex.

The translation was initiated by lake@phys.washington.edu on Fri Oct 7 10:51:20 PDT 1994





lake@phys.washington.edu
Fri Oct 7 10:51:20 PDT 1994