
George Lake Neal Katz Thomas Quinn Joachim Stadel
Simulations are required to calculate the nonlinear final states
of theories of structure formation as well as to design and analyze
observational programs.
Galaxies have
six coordinates of velocity and position, but observations
determine just two coordinates of position and
the line-of-sight velocity that bundles the
expansion of the Universe (the distance via Hubble's Law)
together with random velocities created by
the mass concentrations (see Figure 1).
To determine the underlying
structure and masses, we must use simulations.
If we want to determine the structure of a cluster of galaxies,
how large must the survey volume be?
Without using simulations to define observing programs, the scarce
resource of observing time on $2Billionspace observatories
may be mispent. Finally, to test theories
for the formation of structure, we must simulate the nonlinear evolution
to the present epoch.
This relationship to observational surveys defines our
goal for the next
decade. The Sloan Digital Sky Survey[8] will produce fluxes and
sky positions for galaxies with redshifts for the
brightest
.
Our ambitious observational colleagues have cut steel and ground glass
to survey a ``fair volume" that we must simulate, but
we need
to do this.
Direct summation of the gravitational forces using fixed timesteps
would take
Teraflop-years.
We will present the three keys to a realistic
float count:
1) spatially adaptive potential solvers,
2) temporally adaptive integrators
and 3) volume renormalizations.
Another goal
of this paper is to define ``high quality simulations"
and the niche science that can be done with .
Figure Caption: The left panel shows a slice (100 Mpc on a side) from the final state of a cosmological simulation. The right panel shows the same slice looks in ``redshift" or ``observational" space. We must refine and test analysis techniques on simulated ``skies" where the underlying ground truth is known.
Over the last 20 years, the of our simulations has increased as:
.
To simulate
particles, can we just
wait until 2013?
Computers in 1974 had speeds of
flops, now they hit
flops-a
speed improvement.
However, our current
algorithms best this speed-up by evolving
particles at a rate that is
faster than the 1974 code.
To reach
with computer speed and algorithms contributing equally, we would have
to achieve the impossible and
make the cost of advancing a particle equal to
one pairwise force evaluation! We will have to sidestep as well as
timestep.
is far smaller than the true number of particles
and will compromise the physics of the
system at some level. We use two principles to guide our science; make sure
that:
1) the physics being examined has not been compromised
and 2) gravitational softening, discrete timesteps,
force accuracy and simulation volume don't make matters even worse.
is not the
figure of merit in most reported simulations-it should be!
The N-body Constitution in the Appendix
provides a set of necessary but not sufficient
guidelines for N-body simulation.
This evolving document can be viewed at
http://www-hpcc.astro.washington.edu.
Performance gains of the recent past and near future rely on parallel computers that reduce CPU-years to wall-clock-days. The challenge lies in dividing work amongst the processors while minimizing the latency of communication.
The dynamic range in densities demands that spatially
adaptive methods be used. Our group is exploring both the
adaptive mesh code[7] and
tree-codes[4][1].
The latter use multipole expansions to approximate the
gravitational acceleration on each particle. A tree is built with
each node storing its multipole moments.
Each node is recursively divided into smaller subvolumes until
the final leaf nodes are reached.
Starting from the root node and moving level by level toward
the leaves of the tree, we obtain a progressively more detailed
representation of the underlying mass distribution. In calculating the
force on a particle, we can tolerate a cruder representation
of the more distant particles leading to an method.
Since we only need a crude representation for distant mass, the concept of ``computational locality'' translates directly to spatial locality and leads to a natural domain decomposition. Experience has shown that the domain decomposition and the data structure for force calculation must be the same for efficient calculation. One approach is to build an octree for forces[4] and hash it to produce the domain decomposition[3].
We use
a balanced k-D tree for both the domain decomposition[14]
and the data structure.
The tree is constructed by recursively bisecting
the particle distribution
along the longest axis. The lowest level
nodes of this tree contain several particles (usually 8 to 32)
whose force calculations are collectively optimized.
Each domains is a single rectangular region of the
upper
levels of the tree.
Pointers are
unnecessary, each node in the tree can be indexed so
that finding children, parents and siblings in the tree are simple
bit shift operations[3].
This simple tree structure
provides a fast and effective means to create
portable parallel codes for N-body simulations,
nearest-neighbor searching and group finding.
The same code runs
under PVM, the KSR's native pthreads and the INTEL's
NX system.
On the KSR1-64, a single timestep of particles takes about 10
minutes.
The great advance in the calculation of forces has come from hierarchical
methods that are spatially adaptive.
As the number of particles in a cosmological simulation
grows, so do the density contrasts and the
range of dynamical times ().
If we
take the final
state of a simulation and weight the work done on particles
inversely with their natural timesteps, we find a potential gain of
of
.
Temporal
adaptivity is one of the last algorithmic areas where we can squeeze
an order of magnitude improvement.
The most commonly used time integration scheme for N-body simulations is leapfrog:
where is the position vector,
is the
velocity,
is the acceleration, and
is the timestep.
The operator
evolves the system under the Hamiltonian
where
is of order
[15].
The existence of this surrogate Hamiltonian ensures that the leapfrog is symplectic-it is the exact solution of an approximate Hamiltonian. Errors explore the ensemble of systems close to the initial system rather than an ensemble of non-Hamiltonian time evolution operators near the desired one.
Leapfrog is a second-order symplectic integrator requiring only one costly
force evaluation per timestep and only one copy of the physical state of the
system.
These properties are so desirable that we have concentrated
on making an adaptive leapfrog.
Unfortunately, simply choosing a
new timestep for each leapfrog step evolves
in a manner that may not be Hamiltonian, hence it is
neither symplectic nor time-reversible. The results can be awful[5].
Time reversibility
can be restored[11] if the timestep is
determined implicitly from the state of the system at both
the beginning and the end of the step.
This requires backing up timesteps,
throwing away expensive force calculations and using
auxiliary storage.
However, we can define
an operator that ``adjusts'' the timestep,
, yet retains
time reversibility and only calculates a force if it is used
to complete the timestep[13]. This is done by choosing
such that it
commutes with
, so that
is equivalent to
. Since
only changes the velocities, an
operator
that depends entirely on positions satisfies the commutation
requirement. The ``natural definition" of timestep,
, is ideal.
Synchronization is maintained by choosing
timesteps that are a power-of-two subdivision of the largest timestep,
. That is,
where
is the timestep of a given particle[9][10].
We are currently experimenting with this approach and encourage others
to look at variants.
The power of this technique has recently been shown in the simulation
of rare quasar formation sites[12].
A large scale simulation is first done at modest
resolution (particle mass of )-
galaxies ``weigh" just
particles.
Current simulations with this resolution cover volumes
of (100 Mpc)
with
particles. Within these volumes,
regions of interest are identified. These can be sites of galaxy/QSO
formation occuring at high redshift, a large cluster
of galaxies or a structure matching our local group.
Next, initial conditions are reconstructed using the same low-frequency waves present in the low resolution simulation but adding the higher spatial frequencies. To reduce the number of particles and make the nonlinear simulation possible with the same cosmological context, we construct another set of initial conditions with particles whose mass, and therefore mean separation, increase with the distance from the center of our volume. Note that because tides are important in the formation of the filaments it is not sufficient to extract just the central region. Finally, the higher resolution simulations can be done adding gasdynamics using TREESPH[10].
Using this approach, we can simulate structures like
the local Virgo cluster of galaxies using particles. A larger
cluster like Coma requires
particles. We can get the gross
dynamics of the local group, capturing the larger satellites with
particles.
Our proposed program to simulate the Sloan Volume before the millenia is as follows:
Article I: On the Gravitational Softening Length Section 1-The gravitational softening length should be large enough to minimize the effects of two body relaxation.
There should be at least 8 particles in each softening volume in objects of interest.
Article II: On the Size of Time steps Section 1-Time steps should be chosen to be small enough to eliminate the effects of two body scattering introduced through integration errors.
The time step must satisfy
For an isothermal sphere this is equivalent to
for a standard Plummer softening. For a typical cosmological simulation
Spline kernel softening requires: 33%more
time steps.
Article III: On the Accuracy of Forces Section 1-Forces should be calculated with a maximum absolute and relative error.
The error should always be less than 0.5%of the force or the rms force, whichever is less.
Article IV: On the Accuracy of the Integrator Section 1-The integrator must be second order and time reversible or symplectic.
Article V: On the Size of the Simulation Volume Section 1-The simulation volume must be large enough to model all non-linear effects.
Mode couplings between large and small scales can increase power
on small scales. In particular, filaments greatly
effect the gravitational evolution at small scales. The diameter of the
simulation for a CDM spectrum with must be at least 40h
Mpc. Models with more large scale power will require larger volumes.
Section 2-No one object should dominate the evolution of the simulated volume.
To prevent aliases and to keep objects from artificially lowering the mean background density, no virialized object may contain more than 1/10 the total simulated mass.
Section 3-The simulation must have a large enough volume that fundamental mode remains well within the linear regime.
Accurate evolution of the fundamental mode requires periodic boundary conditions.
Article VI: On the Starting Redshift of the Simulation
Section 1-The simulation must start at a redshift high enough to ensure that all represented mass scales are still in the linear regime.
In particular, if the initial conditions are generated using the Zel'dovich
approximation, the starting redshift must be sufficiently high to ensure
that the absolute maximum . For example, a CDM simulation with
in a cubic volume 40h
Mpc on a side simulated with
particles would require a starting redshift of at least
.
Article VII: What Ratification Shall Establish Constitution Section 1-This constitution must be ratified by the representatives of at least two different HPCC groups at a constitutional convention by late 1995.
The effects of violating the above articles are not known. Some violations will introduce unphysical dissipative effects while others will introduce errors that act like an artificial heat source.
This document was generated using the LaTeX2HTML translator Version 0.5.3 (Wed Jan 26 1994) Copyright © 1993, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html -split 0 siamhtml.tex.
The translation was initiated by lake@phys.washington.edu on Fri Oct 7 10:51:20 PDT 1994