coevolutionary free lunches

Anthony • Dec 16, 2021 • #coev #complexity #solutions #domains

Refers to the general idea that for a fixed class of problems, there can be coevolutionary algorithms
that perform better than others on all instances of that class. It extends a series of no free lunch theorems initiated by David Wolpert and William Macready, which argue that, over a class of optimization problems, any (reasonable) algorithm performs as well as any other: a particular algorithm will perform optimally on some instances of the class, reasonably well on others, and poorly on still others, so that in the aggregate all algorithms' performance comes out the same. There's a lengthy theoretical setup, but that's the gist. No free lunch has become a sort of folklore result, which (in my view) is one of the things that makes a coevolutionary free lunch interesting.

Elena Popovici and I extended and grounded Wolpert and Macready's theoretical paper, actually exhibiting pseudocode for a free lunch algorithm. Later, Elena and Ezra Winston extended that result to cover algorithm mechanisms, something W&M did not consider, landing a paper about it in the journal Theoretical Computer Science.

A key construct in these proofs is an algorithm history or trace. We considered best worst-case optimization of a simple test-based problem, with interactive domains
taking the form $p\colon S\times T\rightarrow R$, where $S$, $T$, and $R$ are finite, $R$ is totally ordered with higher values being better, and the value of $p$ accrues to the elements of $S$. The solution concept was expected to output a single element of $S$ (or, to be more precise, a set of singletons of $S$). A history or trace, then, is a finite sequence $((s_1,t_1),(s_2,t_2),\dots,(s_n,t_n))$ of pairs in which no pair appears twice, though an individual $s_i$ may repeat (and similarly for the $t_i$), together with the value of $p$ on each pair. We followed W&M in being algorithm-agnostic and focused only on the output mechanism, the salient question being: given such a history, which element of $S$ should the algorithm output as its "answer"? Their answer, developed in their paper, was a kind of Bayes optimality property, where expectations are taken with respect to all possible extensions of the problem consistent with the observed history.
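
To make this concrete, a history can be represented as nothing more than a list of (solution, test, value) triples, with the one structural requirement that no pair repeats. Here's a minimal sketch in Python (the names are mine, and I'm standing in for the totally ordered $R$ with plain integers):

```python
from typing import Hashable, List, Tuple

# A trace: (solution, test, observed value) triples, in interaction order.
History = List[Tuple[Hashable, Hashable, int]]

def is_valid_history(history: History) -> bool:
    """No (s, t) pair may appear twice; repeats of s or t alone are fine."""
    pairs = [(s, t) for s, t, _ in history]
    return len(pairs) == len(set(pairs))
```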

What Elena and I first observed is that in a fairly wide set of circumstances (sticking with best worst-case, though), this complicated-sounding condition boiled down to a greedy criterion. In other words, the Bayes-optimal way to choose which solution to output, given a history, is along these lines: output a solution with the best worst-case value according to the history, preferring the one that has been tested most, unless that best worst-case is the worst value possible, in which case output an arbitrary solution not in the history (or all such solutions, if you're looking for all of them). What Elena and Ezra showed is that, again in a wide (though more restricted) set of circumstances, the optimal way to choose which solution to interact next, the algorithm mechanism, is to choose the solution you would have output from the history up to that point, and test it against an arbitrary test that does not create a repeat in the history. These mechanisms can be implemented in a straightforward way, as the sketch below shows.
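
Here is a rough Python sketch of both pieces. To be clear about what's mine: the function names are invented, integers stand in for $R$, and the tie-breaking order reflects my reading of the criterion (best observed worst case first, most-tested among ties), chosen because it reproduces the behavior described in the next paragraph:

```python
from collections import defaultdict

def greedy_output(history, solutions, worst_value):
    """Output rule: best observed worst case, ties toward the most-tested
    solution; if every tested solution has bottomed out at worst_value,
    fall back to an arbitrary solution not yet in the history."""
    worst, count = {}, defaultdict(int)
    for s, t, v in history:
        worst[s] = min(worst.get(s, v), v)
        count[s] += 1
    untested = [s for s in solutions if s not in worst]
    if not worst:
        return untested[0]
    best = max(worst, key=lambda s: (worst[s], count[s]))
    if worst[best] == worst_value and untested:
        return untested[0]
    return best

def greedy_mechanism(p, solutions, tests, worst_value, budget):
    """Mechanism: repeatedly interact the solution we would currently
    output with an arbitrary test that doesn't repeat a history pair."""
    history, seen = [], set()
    for _ in range(budget):
        s = greedy_output(history, solutions, worst_value)
        fresh = [t for t in tests if (s, t) not in seen]
        if not fresh:
            break  # the chosen solution is fully evaluated
        t = fresh[0]
        seen.add((s, t))
        history.append((s, t, p(s, t)))
    return history, greedy_output(history, solutions, worst_value)
```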

I feel compelled to point out, though, that when you run them, or sit and think about them for a minute, these mechanisms are disappointing. In practice, such an algorithm will repeatedly test the same candidate solution over and over until it is found to produce the worst possible value; then it will select an arbitrary new candidate and repeat. Doing this has nice properties when considered in aggregate over all problems in the problem class, but it's unsatisfactory in practice because it does not explore.
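
You can watch this happen on a toy domain. The payoff table below is random and means nothing in itself; the point is the shape of the trace, with one candidate hammered until it produces the worst value:

```python
import random

# A made-up binary domain: payoffs in {0, 1}, with 0 the worst value.
random.seed(0)
S = [f"s{i}" for i in range(6)]
T = [f"t{j}" for j in range(4)]
table = {(s, t): random.randint(0, 1) for s in S for t in T}

history, answer = greedy_mechanism(lambda s, t: table[(s, t)],
                                   S, T, worst_value=0, budget=12)
for s, t, v in history:
    print(s, t, v)  # watch it hammer one candidate until it sees a 0
print("output:", answer)
```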

Best worst-case is well known to be a non-monotonic solution concept, and thus stymies algorithms or theories that rely on a monotonicity condition (which I've argued is a form of convexity). Thus, this line of work is important not just because it exhibits provably-optimal algorithms, but also because it fills a theoretical gap.
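
For a crude illustration of the non-monotonicity (my example, not the formal definition): with only $t_1$ in view, $a$ has the better worst case, but adding $t_2$ flips the answer to $b$, so growing the available information does not monotonically refine the solution:

```python
# Made-up payoffs: a dominates on t1 alone, but b has the better worst
# case once t2 enters the picture.
p = {("a", "t1"): 2, ("a", "t2"): 0,
     ("b", "t1"): 1, ("b", "t2"): 1}

def best_worst_case(solutions, tests):
    return max(solutions, key=lambda s: min(p[(s, t)] for t in tests))

print(best_worst_case(["a", "b"], ["t1"]))        # -> a
print(best_worst_case(["a", "b"], ["t1", "t2"]))  # -> b
```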