When comparing more than 2 algorithms, I would appreciate the ability to define my own line patterns for the graphs. Why? If I compare several unrelated algorithms, it is fine to plot each of them in a different color. But if I have, e.g., 6 algorithms which are actually 6 instances of the same algorithm differing only in the levels of, e.g., 2 factors, say popsize (large, medium, small) and crossover (on, off), then it would be highly desirable to encode popsize by, e.g., 3 different markers, and the presence of crossover by the line type (solid, dashed).
Current solution (not sure to which graphs this applies): the variable line_styles in the file genericsettings.py defines the line styles. The simplest solution is therefore to assign a different value to this variable. An example will be provided in the file in the next release.
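As a sketch of the idea, the two factors can be crossed programmatically into 6 style dictionaries. This assumes line_styles is a list of matplotlib-style property dicts, as in genericsettings.py; the exact keys expected by the post-processing should be checked against the shipped file.

```python
# Sketch: encode popsize by marker and crossover by line type.
# Assumes `line_styles` is a list of matplotlib-style dicts
# (keys 'marker', 'linestyle', 'color'), as in genericsettings.py.
import itertools

markers = ['o', 's', 'v']   # popsize: large, medium, small
linestyles = ['-', '--']    # crossover: on, off

line_styles = [
    {'marker': m, 'linestyle': ls, 'color': 'k'}
    for m, ls in itertools.product(markers, linestyles)
]
# Then assign this list to the `line_styles` variable in
# genericsettings.py (or monkey-patch the module before plotting).
```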
Provide the user with the means (and a tutorial) to easily define one definitive criterion that could be used to rank the algorithms. The docs actually suggest using COCO to explore various parameter settings of an algorithm. But what COCO provides to the user, with its automatically generated graphs and tables, is usually a kind of "feeling" for which algorithm works best, and this feeling is based on some unarticulated criterion the user implicitly applies.
This issue will be covered by a short tutorial during BBOB if time permits. We will show how to access the experimental data using the COCO DataSet and DataSetList classes and we will show how to aggregate them using (variants of) the geometric mean as the performance index.
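To illustrate the aggregation step, here is a minimal sketch of ranking two algorithms by the geometric mean of per-function expected runtimes (ERT). The ERT values below are made up for illustration; in practice they would be extracted via the DataSet and DataSetList classes.

```python
# Sketch: aggregate per-function expected runtimes (ERT) into a single
# performance index using the geometric mean. The ERT values here are
# hypothetical placeholders, not real benchmark data.
import math

def geometric_mean(values):
    """Geometric mean of positive runtime values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

ert_algo_a = [120.0, 4500.0, 9.0e4]   # hypothetical ERTs on 3 functions
ert_algo_b = [300.0, 2000.0, 5.0e4]

# Smaller aggregated score means better (faster) overall:
score_a = geometric_mean(ert_algo_a)
score_b = geometric_mean(ert_algo_b)
ranking = sorted([('A', score_a), ('B', score_b)], key=lambda t: t[1])
```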
Things worth repeating over and over.
Q: I benchmarked 2 algorithms with the same evaluation budget and both solved the same proportion of problems (the crosses in the ECDF graphs are virtually in the same place). How come the curve to the right of the cross rises significantly for one algorithm, while it stagnates for the other?
A: This happens if one algorithm solves only a few instances on each of several functions, while the other solves all instances of one of these functions and none on the others. Up to the cross, their performance looks similar (they solve the same overall number of instances). However, since the ECDF graphs are derived from virtual restarts on the same function (but with random instances), the curve of the first algorithm will rise and appear to solve all functions after the cross, whereas the second will stagnate, as it only ever solved one of the functions.
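The effect can be made concrete with a small simulation, under assumed numbers: algorithm A solves 1 of 5 instances on each of 3 functions, algorithm B solves all 5 instances of one function and none of the other two. Both solve 3 of 15 runs (same cross position), but under independent virtual restarts A eventually solves every function while B never solves the other two.

```python
# Sketch: virtual restarts with made-up per-function success rates.
# A solves each of 3 functions with probability 1/5 per run;
# B solves one function always and the other two never.
import random

def solves_after_restarts(success_prob, max_restarts=1000, rng=None):
    """True if at least one of the restarted runs succeeds."""
    rng = rng or random
    return any(rng.random() < success_prob for _ in range(max_restarts))

rng = random.Random(1)
solved_a = sum(solves_after_restarts(0.2, rng=rng) for _ in range(3))
solved_b = sum(solves_after_restarts(p, rng=rng) for p in (1.0, 0.0, 0.0))
# With a large restart budget, A solves all 3 functions, B only 1:
# its ECDF curve therefore stagnates after the cross.
```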