Running¶
Command line usage¶
Note
These instructions apply for manual runs of shark. If you want to run shark in an HPC system please refer to Running with shark-submit.
After successfully Building shark you can directly run it from the command line:
$> ./shark -h
This will print out detailed information about how to run shark.
shark requires, and accepts, a number of command line options:
-h
or-?
show the help message and exits.-V
shows version information and exits.-v <verbosity>
specifies how verbose should shark be. Values forverbosity
range from 0 to 5, with 0 being mute and 5 being extremely verbose.-t <threads>
specifies how many OpenMP threads to use to run shark. If 0 is given, then OpenMP will use its own default value. If shark is compiled without OpenMP support this option is ignored.-o <option>
specifies additional configuration values to use. See Specifying configuration values for details.
Any other argument is interpreted as the name of a configuration file to load. At least one configuration file is required for shark to run. See Specifying configuration values for details on how configuration works.
Exit code¶
Upon a successful run,
shark returns with an exit code equals to 0
,
or something different from 0
in case of any error.
Scalability¶
shark scales using two approaches:
- Using independent input data, and
- Using multiple CPUs with OpenMP.
We describe both approached here, and mention when they should be used.
Input data¶
shark input data (volumes) is usually divided into separate files (sub-volumes). These sub-volumes are self-sufficient, meaning that they can be processed independently.
Based on this, a shark instance can be commanded to process one or more sub-volumes. This extremely simple but flexible scheme allows for easy parallelisation based on input data. In other words, multiple shark instances can be independently executed to process a big number of sub-volumes in parallel. Note that this strategy does not require communication between instances, reducing both the complexity of the software and its dependencies (i.e., it does not require MPI to work).
This is the basic strategy used by the shark-submit
script
when running under an HPC environment.
OpenMP¶
During the main shark evolution loop, and for any snapshot, the evolution of galaxies belonging to different merger trees is independent from each other. This is the place in the code in which most of the time is spent, and thus shark parallelises the evolution of individual merger trees so they take place in different threads. Other parts of the code are parallelised as well.
shark uses OpenMP to carry out this parallelisation.
OpenMP is widely supported by compilers nowadays,
but not universally (see Building for details).
The number of threads to use is
specified on the command-line
(see Command line usage, option -t
),
and can be set to either a fixed number,
or to the default value provided by the OpenMP library.
Using OpenMP will result in a speed up in most cases, but only up to certain threshold when using more CPUs will not necessarily improve the runtime of shark.
Reproducibility¶
shark uses random number generators (RNGs) for drawing values out of certain probability distributions as part of some of its calculations, and hence it is inherently stochastic. However, the seed used to prime the RNGs can be manipulated in order to fully reproduce a previous execution. In particular:
- If no explicit seed is given, a different random seed is used each time. As a consequence, all runs will produce different outputs by default, regardless of any other factor.
- If a seed is given (via the
execution.seed
configuration option) shark guarantees that the exact same results will be produced each time the same seed value is given, for the same given inputs, configuration and software version.
The second point,
combined with the fact that the seed of an execution
is always recorded under the run_info
group
of the galaxies.hdf5 outputs
means that executions are fully reproducible.
Note also that the number of threads
used to execute shark
has no relevance on the reproducibility of results:
executions using different number of threads
but the same seed, inputs, configuration and software version
will yield the same results.