Optimization¶
shark ships with an optimization package that lets users easily explore different shark parameter sets in different scenarios.
Specifying the search space¶
The parameter search space to be used by the optimization routine is specified in a text file, where each row fully describes a parameter.
Each row is a comma-separated list of values looking like this:
parameter-name, plot-label, is_log, lb, ub
where:
parameter-name
  is the name of the shark option as given on the command line (e.g., reincorporation.tau_reinc).
plot-label
  is the label that will appear on any plots produced to diagnose the execution.
is_log
  should be 1 if the parameter should be optimized in log space, or 0 if it should be optimized linearly.
lb and ub
  define the parameter's search space boundaries. If is_log is 1 these values should already be logged.
As an example:
reincorporation.tau_reinc, $\log_{10}(\tau_{reinc})$, 1, 0, 1.4771212547196624
reincorporation.mhalo_norm, $\log_{10}(M_{norm})$, 1, 9, 12
reincorporation.halo_mass_power, $\gamma$, 0, -3, 0
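As a sketch, a file in this format could be read as follows. This is an illustration only: the `Parameter` type and `read_space` function are hypothetical helpers, not part of shark's actual code.

```python
# Hypothetical reader for the search-space file format described above;
# shark's own reader in the optim subdirectory may differ in details.
from collections import namedtuple

Parameter = namedtuple('Parameter', 'name plot_label is_log lb ub')

def read_space(text):
    """Parse rows of "parameter-name, plot-label, is_log, lb, ub"."""
    params = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, label, is_log, lb, ub = [f.strip() for f in line.split(',')]
        params.append(Parameter(name, label, bool(int(is_log)),
                                float(lb), float(ub)))
    return params

space = read_space(
    "reincorporation.tau_reinc, $\\log_{10}(\\tau_{reinc})$, 1, 0, 1.4771212547196624\n"
    "reincorporation.halo_mass_power, $\\gamma$, 0, -3, 0")
```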
Running¶
To run the optimization routines for shark you need to enter the optim subdirectory of the shark git repository and run the main.py script. Run main.py -h to get a comprehensive list of all options that can be passed to the script, including options specific to PSO, HPC environments and more.
Optimization methods¶
Currently the only supported optimization method is Particle Swarm Optimization (PSO), but we plan to add more methods over time.
PSO¶
When running a PSO optimization for shark, users can specify a number of options:
-s SWARM_SIZE
  indicates the swarm size (i.e., the number of particles to use). It defaults to 10 + sqrt(D) * 2, where D is the number of dimensions of the problem, which corresponds to the number of parameters being fitted.
-m MAX_ITERATIONS
  is the maximum number of iterations that PSO should run for before giving up. Otherwise PSO will automatically stop when the particles start converging within certain limits (1e-8 in particle step differences or objective function changes).
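The default swarm size formula above can be illustrated with a short snippet. Note that how main.py rounds the (generally non-integer) result is an assumption here:

```python
import math

# Default swarm size as described above: 10 + sqrt(D) * 2, where D is the
# number of parameters being fitted (the problem's dimensionality).
# Rounding to the nearest integer is an assumption for illustration.
def default_swarm_size(num_parameters):
    return int(round(10 + math.sqrt(num_parameters) * 2))
```

For the three-parameter search-space example above, this gives a swarm of 13 particles.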
Evaluation functions¶
When comparing model data against observational data, different functions can be used to evaluate how well they compare to each other. We currently support two evaluation functions, which are specified using the -t flag:

chi2
  : A \(\chi^2\) distribution
student-t
  : A log Student-T distribution
In both cases the evaluation function is applied to individual data-point pairs (observations vs. model data), and a final sum is taken to get the final result. Constraints with more data points therefore naturally carry more weight in the final result.
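For illustration, a \(\chi^2\)-style evaluation over paired data points could be sketched as below. This is a toy version only: shark's actual evaluation functions (including the log Student-T) may differ in normalisation and weighting.

```python
def chi2_eval(model, obs, err):
    # Toy chi^2-style evaluation: sum of squared, error-normalised
    # residuals over individual (observation, model) data-point pairs.
    # Because it is a plain sum, constraints contributing more data
    # points naturally weigh more in the total.
    return sum(((m - o) / e) ** 2 for m, o, e in zip(model, obs, err))
```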
Constraints¶
A flexible number of constraints are supported to evaluate how well a shark parameter set behaves. Constraints are specified on the command line with the -x switch (see main.py -h for details). Each constraint specification follows this pattern:

<name>[(<min>-<max>)][*<weight>]
Here sections within [] are optional, meaning that only <name> is required. <name> is the name of the constraint (see below), <min> and <max> specify the domain to consider during the evaluation of the constraint, and <weight> is the relative weight of the constraint when evaluating in a multi-constraint scenario. Each constraint has a hard-coded domain that is used as default if the user doesn't specify one. <weight> defaults to 1, meaning that the results of the evaluation function for all constraints (see above for details) are weighted equally.
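The pattern above could be parsed as in the following sketch. This is a hypothetical parser, not shark's own: it assumes plain decimal bounds and a numeric weight, while the real parsing in main.py may differ.

```python
import re

# Toy parser for the "<name>[(<min>-<max>)][*<weight>]" pattern described
# above; shark's own parsing in main.py may differ.
_SPEC = re.compile(r'^(?P<name>\w+)'
                   r'(?:\((?P<min>-?[\d.]+)-(?P<max>-?[\d.]+)\))?'
                   r'(?:\*(?P<weight>[\d.]+))?$')

def parse_constraint(spec, default_domain=None):
    m = _SPEC.match(spec)
    if not m:
        raise ValueError('invalid constraint spec: %r' % spec)
    # Fall back to the constraint's hard-coded default domain when the
    # user doesn't give one; <weight> defaults to 1.
    domain = ((float(m.group('min')), float(m.group('max')))
              if m.group('min') else default_domain)
    weight = float(m.group('weight')) if m.group('weight') else 1.0
    return m.group('name'), domain, weight
```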
The following constraints are currently supported:
HIMF
  : evaluates the HI mass function at z=0. Its default domain is (7, 12).
SMF_z0
  : evaluates the stellar mass function at z=0. Its default domain is (8, 13).
SMF_z1
  : like SMF_z0 but at z=1.
If you want to add new constraints, please refer to the in-depth documentation on the subject.
HPC support¶
HPC support for running shark optimizations comes out of the box. In particular, the parallel parameter set evaluation supported by the shark-submit script is used to execute as many shark instances in parallel as possible on the cluster. To turn on HPC support use the -H option when calling main.py. Use main.py -h to get a full list of parameters. Remember that some of these can already be defined via environment variables, easing the usage of the system.
Diagnostics¶
After running, the optimization routines will generate a series of files under a tracks folder. These can be visually analyzed by running the diagnostics.py script pointed at the tracks folder.