Optimization
shark ships with an optimization package that lets users easily explore different shark parameter sets in different scenarios.
Specifying the search space
The parameter search space to be used by the optimization routine is specified in a text file, where each row fully describes a parameter.
Each row is a comma-separated list of values looking like this:
parameter-name, plot-label, is_log, lb, ub
where:
parameter-name is the name of the shark option as given in the command line (e.g., reincorporation.tau_reinc).
plot-label is the label that will appear on any plots produced to diagnose the execution.
is_log should be 1 if the parameter should be optimized in log space, or 0 if it should be optimized linearly.
lb and ub define the lower and upper boundaries of the parameter's search space. If is_log is 1 these values should already be given in log space.
As an example:
reincorporation.tau_reinc, $\log_{10}(\tau_{reinc})$, 1, 0, 1.4771212547196624
reincorporation.mhalo_norm, $\log_{10}(M_{norm})$, 1, 9, 12
reincorporation.halo_mass_power, $\gamma$, 0, -3, 0
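As a rough illustration of the row format described above, the following Python sketch parses a single row. The function name parse_space_row and the dictionary keys are hypothetical and not part of shark; the optim package may read the file differently. The conversion back to linear space assumes base-10 logarithms, as the plot labels in the example suggest.

```python
def parse_space_row(row):
    """Parse one 'parameter-name, plot-label, is_log, lb, ub' row (illustrative only)."""
    # The first field is the name; the last three are is_log, lb and ub.
    # Splitting from both ends keeps a plot label intact even if it contains commas.
    name, rest = row.split(",", 1)
    label, is_log, lb, ub = rest.rsplit(",", 3)
    return {
        "name": name.strip(),
        "label": label.strip(),
        "is_log": is_log.strip() == "1",
        "lb": float(lb),
        "ub": float(ub),
    }


row = r"reincorporation.tau_reinc, $\log_{10}(\tau_{reinc})$, 1, 0, 1.4771212547196624"
param = parse_space_row(row)

# For a log-space parameter the bounds are exponents (base 10 is an assumption here).
linear_ub = 10 ** param["ub"] if param["is_log"] else param["ub"]
```

Under that assumption, the upper bound 1.4771212547196624 of the first example row corresponds to a linear value of about 30.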
Running
To run the optimization routines for shark
you need to enter the optim subdirectory
of the shark git repository
and run the main.py script.
Run main.py -h to get a comprehensive list
of all options that can be passed down to the script,
including options specific to
PSO,
HPC environments
and more.
Optimization methods
Currently the only supported optimization method is Particle Swarm Optimization (PSO); we plan to add more methods over time.
PSO
When running a PSO optimization for shark, users can specify a number of options:
-s SWARM_SIZE indicates the swarm size (i.e., the number of particles to use). It defaults to 10 + sqrt(D) * 2, where D is the number of dimensions of the problem, which corresponds to the number of parameters being fitted.
-m MAX_ITERATIONS is the maximum number of iterations that PSO should run for before giving up. PSO stops earlier on its own if the particles converge within certain limits (1e-8 in particle step differences or objective function changes).
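To give a feel for the default swarm size, here is a minimal Python sketch of the formula quoted above. The function name default_swarm_size is hypothetical, and the sketch returns the raw value of the expression; how the optimization package turns it into a whole number of particles is not stated here.

```python
import math


def default_swarm_size(n_params):
    """Evaluate 10 + sqrt(D) * 2 for D = n_params (illustrative only)."""
    return 10 + math.sqrt(n_params) * 2


# Three fitted parameters, as in the search space example
size_for_three = default_swarm_size(3)
# Four fitted parameters give an exact value, since sqrt(4) = 2
size_for_four = default_swarm_size(4)
```

With three parameters the expression evaluates to roughly 13.46, and with four parameters to exactly 14, so the default swarm grows slowly with the number of dimensions.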
Evaluation functions
When comparing model data against observational data,
different functions can be used
to evaluate how well they compare to each other.
We currently support two evaluation functions,
which are specified using the -t flag:
chi2: A \(\chi^2\) distribution
student-t: A log Student-T distribution
In both cases the evaluation function is applied to individual pairs of data points (observations vs. model data), and the per-pair values are summed to get the final result. Thus, constraints with more data points naturally carry more weight in the final result.
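The effect of summing over data point pairs can be sketched in Python as follows. This assumes the common chi-squared form ((observation - model) / error)^2 with an observational uncertainty per point; the exact expression used by shark is not given here, and the function names are hypothetical.

```python
def chi2_terms(obs, model, err):
    """Per-point chi-squared terms, assuming the form ((obs - model) / err) ** 2."""
    return [((o - m) / e) ** 2 for o, m, e in zip(obs, model, err)]


def evaluate(obs, model, err):
    """Sum the per-point terms into a single value for one constraint."""
    return sum(chi2_terms(obs, model, err))


# Two constraints with the same per-point mismatch but different numbers of points
small = evaluate([1.0, 2.0], [1.5, 2.5], [0.5, 0.5])
large = evaluate([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5], [0.5] * 4)
```

Each point contributes 1 in this toy case, so the four-point constraint contributes twice as much as the two-point one even though the per-point agreement is identical.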
Constraints
A flexible number of constraints are supported
to evaluate how well a shark parameter set behaves.
Constraints are specified on the command line
with the -x switch (see main.py -h for details).
Each constraint specification follows this pattern:
<name>[(<min>-<max>)][*<weight>]
Here sections within [] are optional,
meaning that only <name> is required.
<name> is the name of the constraint (see below),
<min> and <max> specify the domain to consider
during the evaluation of the constraint,
and <weight> is the relative weight of the constraint
when evaluating in a multi-constraint scenario.
Each constraint has a hard-coded domain that is used as default,
if the user doesn’t specify one.
<weight> defaults to 1,
meaning that the results of the evaluation function for all constraints
(see above for details on this)
are weighted equally.
The following constraints are currently supported:
HIMF: it evaluates the HI mass function at z=0. Its default domain is (7, 12).
SMF_z0: it evaluates the stellar mass function at z=0. Its default domain is (8, 13).
SMF_z1: like SMF_z0 but at z=1.
If you want to add new constraints, please refer to the in-depth documentation on the subject.
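The constraint specification pattern <name>[(<min>-<max>)][*<weight>] can be illustrated with a small Python parser. This is only a sketch: the function name parse_constraint, the regular expression and the accepted number formats are assumptions, and the parser in main.py may behave differently. A domain of None stands for "use the constraint's hard-coded default".

```python
import re

# <name>[(<min>-<max>)][*<weight>]; the number syntax accepted here is an assumption
_SPEC = re.compile(
    r"^(?P<name>\w+)"
    r"(?:\((?P<min>-?\d+(?:\.\d+)?)-(?P<max>-?\d+(?:\.\d+)?)\))?"
    r"(?:\*(?P<weight>\d+(?:\.\d+)?))?$"
)


def parse_constraint(spec):
    """Return (name, domain, weight) for one constraint specification."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError("invalid constraint specification: %r" % spec)
    domain = None  # None means the constraint's default domain applies
    if match.group("min") is not None:
        domain = (float(match.group("min")), float(match.group("max")))
    # The weight defaults to 1 when it is not given
    weight = float(match.group("weight")) if match.group("weight") else 1.0
    return match.group("name"), domain, weight


full = parse_constraint("HIMF(8-11)*2")
name_only = parse_constraint("SMF_z0")
weighted = parse_constraint("SMF_z1*0.5")
```

In this sketch "HIMF(8-11)*2" restricts HIMF to the domain 8 to 11 with twice the default weight, while "SMF_z0" alone keeps both the default domain and a weight of 1.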
HPC support
HPC support for running shark optimizations
comes out of the box.
In particular,
the parallel parameter set evaluation
supported by the shark-submit script is used
to execute as many shark instances in parallel as possible
in the cluster.
To turn on HPC support use the -H option
when calling main.py.
Use main.py -h to get a full list of parameters.
Remember that some of these can already be defined
via environment variables,
easing the usage of the system.
Diagnostics
After running,
the optimization routines will generate a series of files
under a tracks folder.
These can be visually analyzed by running the diagnostics.py script
pointing to the tracks folder.