Without commenting on their work or its success, I will say that, in general, this is a use of large language models that holds a lot of potential for discovery and invention.
I think this is more machine learning than language model stuff, though. “This outcome is GOOD: +1. THIS outcome is BAD: -1. Now, go get the high score.” (That is, by running the same functions millions of times and propagating the “genes” of each generation's highest scorers to the next.)
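For anyone who hasn't seen that loop written out, here's a rough sketch of the score-and-propagate idea as a toy genetic algorithm. To be clear, this is not the method used in the work being discussed; the `score` and `evolve` names, the bitstring "genes", and all the parameter values are made up purely for illustration.

```python
import random

def score(genome):
    # Toy scoring rule (an assumption): +1 for each "good" gene (1), -1 for each "bad" gene (0).
    return sum(1 if g == 1 else -1 for g in genome)

def evolve(genome_len=20, pop_size=30, generations=200, mutation_rate=0.05, seed=0):
    # Hypothetical helper: evolves bitstrings toward the highest score.
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        # Keep the top-scoring half of this generation as parents.
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            # Splice the "genes" of two parents at a random cut point.
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)
            child = a[:cut] + b[cut:]
            # Occasionally flip a gene so new variants keep appearing.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        # Parents survive unchanged, so the best score never goes down.
        population = parents + children
        best = max(population, key=score)
    return best, score(best)

if __name__ == "__main__":
    best, best_score = evolve()
    print(best, best_score)
```

Nothing in there knows anything about language. It's just score, select, mutate, repeat, which is why I'd file it under machine learning (or really evolutionary search) rather than language modeling.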