AlphaEvolve: ToolKami Style

(Last updated: 2025-09-23)

TL;DR: We implemented AlphaEvolve as an LLM workflow with MCP tools to optimize a Perlin noise implementation for procedural image generation. Code is available at the end of this post.

When I had just started experimenting with ToolKami, Google released AlphaEvolve: A coding agent for scientific and algorithmic discovery. The paper made waves in the news—unsurprisingly, given its impressive results and its innovative combination of two powerful techniques:

While their implementation remains closed-source, they shared enough details for me to quickly replicate a pseudo loop using my existing agentic setup. For this experiment, I set myself the goal of optimizing a Perlin noise implementation so that its output resembles a target fire image.

Perlin noise

Perlin noise is an algorithm commonly used for procedural generation of wave-like, undulating materials and textures. You’ll find it in games such as Minecraft, where it generates terrain and biomes, and in movies, where it powers visual effects like clouds, fire, or water.
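To make the idea concrete, here is a minimal pure-Python sketch of classic 2D Perlin noise. It is not the implementation evolved in this experiment; the function names (`make_gradients`, `fade`, `lerp`, `perlin`) and the lattice layout are assumptions chosen for illustration.

```python
import math
import random


def make_gradients(size, seed=0):
    """Build a (size + 1) x (size + 1) lattice of random unit gradient vectors."""
    rng = random.Random(seed)
    grid = []
    for _ in range(size + 1):
        row = []
        for _ in range(size + 1):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            row.append((math.cos(angle), math.sin(angle)))
        grid.append(row)
    return grid


def fade(t):
    """Perlin's quintic smoothing curve: 6t^5 - 15t^4 + 10t^3."""
    return t * t * t * (t * (t * 6.0 - 15.0) + 10.0)


def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + t * (b - a)


def perlin(x, y, gradients):
    """Sample noise at (x, y); valid for 0 <= x, y < lattice size."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - x0, y - y0

    def corner(ix, iy):
        # Dot product of the corner gradient with the offset to (x, y).
        gx, gy = gradients[iy][ix]
        return gx * (x - ix) + gy * (y - iy)

    n00 = corner(x0, y0)
    n10 = corner(x0 + 1, y0)
    n01 = corner(x0, y0 + 1)
    n11 = corner(x0 + 1, y0 + 1)
    u, v = fade(fx), fade(fy)
    return lerp(lerp(n00, n10, u), lerp(n01, n11, u), v)
```

Sampling `perlin` over a grid of fractional coordinates and mapping the values to colors gives the familiar cloudy texture. The gradient choice, the smoothing curve, and the way octaves are combined are all knobs a search process can turn.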

Sweet spot of AI

Optimizing Perlin noise is a sweet spot for AI because it fulfills three important criteria:

Massive combinatorial search space: There are countless ways to implement Perlin noise and populate its parameters.
Clear objective function (metric) to optimize against: We can use the Mean Squared Error (MSE) between pixels of the target image and the image generated by the Perlin noise implementation.
Either lots of data and/or an accurate and efficient simulator: In this case, the “efficient simulator” is the code interpreter, which can be run in parallel.
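The objective can be sketched in a few lines. The version below assumes images are plain nested lists of grayscale pixel values; the `score` helper that negates the MSE is an assumption based on the negative scores reported later in this post, not a confirmed detail of the evaluation script.

```python
def mse(target, generated):
    """Mean squared error between two equally sized 2D grids of pixel values."""
    if not target or len(target) != len(generated):
        raise ValueError("images must be non-empty and have the same height")
    total = 0.0
    count = 0
    for row_t, row_g in zip(target, generated):
        if len(row_t) != len(row_g):
            raise ValueError("images must have the same width")
        for pixel_t, pixel_g in zip(row_t, row_g):
            total += (pixel_t - pixel_g) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    return total / count


def score(target, generated):
    """Negated MSE (assumed convention), so a higher score is better."""
    return -mse(target, generated)
```

With this convention a perfect match scores 0 and every mismatch pushes the score below zero, which makes "higher is better" hold for any ranking code.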

Implementation

The “Controller Loop” can be implemented manually, or as an agentic workflow with the right tools.

The general steps are as follows:

Tell the agent its role and goal
Sample the implementations using the File tool
Instruct the LLM which region of code it is allowed to modify
Evaluate the implementation using the Shell tool
Save the implementations and evaluation results with the File tool
Repeat
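The steps above can be sketched as an ordinary loop. This is a simplified stand-in, not the agentic workflow itself: `mutate` is a hypothetical callable that plays the role of the LLM edit, `evaluate` is a hypothetical callable that plays the role of the Shell tool, and the population is kept in memory instead of on disk.

```python
import random


def evolve(seed_program, mutate, evaluate, iterations=100, top_k=10, rng=None):
    """Run a sample, mutate, evaluate, save loop and return (best_score, best_program)."""
    rng = rng or random.Random()
    # Every entry is a (score, program) pair; higher scores are better.
    population = [(evaluate(seed_program), seed_program)]
    for _ in range(iterations):
        # Sample a parent from the top_k entries, not necessarily the best one.
        population.sort(key=lambda item: item[0], reverse=True)
        _, parent = rng.choice(population[:top_k])
        # The mutate callable stands in for the LLM proposing an edit.
        child = mutate(parent)
        # Evaluate the child and save it alongside its score.
        population.append((evaluate(child), child))
    return max(population, key=lambda item: item[0])
```

Because the seed program always stays in the population, the returned score can never be worse than the starting point. Sampling from the top few rather than always taking the best keeps some diversity in the search.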

Here’s the actual prompt I used:

Act as an expert software developer. Your task is to iteratively improve the provided codebase.

Create a Perlin noise implementation that resembles the target image (a fire in this case).
1. Use list_directory tool with sort_order 'asc' and limit 10 in the directory '/workspaces/toolkami/projects/perlin/results' which were saved with convention '{score}_{md5sum}.py'.
2. Sample 1 program from the list, it doesn't have to be the best, sample randomly.
3. Make a copy of the file with the name 'candidate_{random_id}_{md5sum}.py' with executable permission and save it in the directory '/workspaces/toolkami/projects/perlin/results'.
4. You are only allowed to modify the content between '# EVOLVE-BLOCK-START' and '# EVOLVE-BLOCK-END', suggest a new idea to improve the code that is inspired by your expert knowledge of game programming, graphics and optimization.
5. Edit the candidate file using the edit tool with the diff-fenced format.
6. Write the output of edit tool to the candidate file
7. Execute the program (as a UV script) and after obtaining the output score
8. rename the file with convention '{score}_{md5sum}.py'.
9. Forget current memory
10. Repeat the process

Notice step 4, which specifies the region the agent is allowed to edit. The whitepaper highlights this technique:

API. To support evolving multiple components across a codebase, AlphaEvolve exposes an input API where blocks of code can be annotated as to-be-evolved-by-the-system; This design facilitates integrating it with existing codebases while requiring only minimal changes, simply by adding special markers as comments into the code.
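The marker convention can be handled with plain string processing. The sketch below assumes a single evolve block per file and uses the marker strings from the prompt; the helper names `split_evolve_block` and `replace_evolve_block` are hypothetical and not part of AlphaEvolve or ToolKami.

```python
START = "# EVOLVE-BLOCK-START"
END = "# EVOLVE-BLOCK-END"


def split_evolve_block(source):
    """Return (prefix, block, suffix); the marker lines stay in prefix and suffix."""
    lines = source.splitlines(keepends=True)
    starts = [i for i, line in enumerate(lines) if line.strip() == START]
    ends = [i for i, line in enumerate(lines) if line.strip() == END]
    if len(starts) != 1 or len(ends) != 1 or starts[0] >= ends[0]:
        raise ValueError("expected exactly one well-ordered evolve block")
    s, e = starts[0], ends[0]
    return "".join(lines[: s + 1]), "".join(lines[s + 1 : e]), "".join(lines[e:])


def replace_evolve_block(source, new_block):
    """Swap the code between the markers and leave everything else untouched."""
    prefix, _, suffix = split_evolve_block(source)
    if new_block and not new_block.endswith("\n"):
        new_block += "\n"
    return prefix + new_block + suffix
```

Keeping the markers outside the replaceable region means the evaluation harness and imports around the block cannot be changed by an edit, which is the point of restricting the agent in step 4.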

Result

The agent improved the score of the base implementation from -0.1373 to -0.0564 in just 100 iterations. The score is the negated MSE, so closer to 0 is better. Improvements came mainly from:

Adding a bilinear_interpolate function
Using exact fractional and integer parts of coordinates for gradient dot product computations
Generating smoother transitions across gradient cells due to higher-fidelity calculations
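For reference, a generic bilinear interpolation helper looks like the sketch below. This is not the function the agent wrote; only the name `bilinear_interpolate` comes from the run, and the signature and argument order are assumptions.

```python
def bilinear_interpolate(v00, v10, v01, v11, tx, ty):
    """Blend four corner values; tx and ty are fractional offsets in [0, 1].

    v00 and v10 are the corners at ty = 0, v01 and v11 the corners at ty = 1.
    """
    top = v00 + tx * (v10 - v00)        # blend along x at ty = 0
    bottom = v01 + tx * (v11 - v01)     # blend along x at ty = 1
    return top + ty * (bottom - top)    # blend the two results along y
```

In a Perlin noise implementation the four inputs would be the gradient dot products at the cell corners, and `tx`, `ty` would be the (usually smoothed) fractional parts of the sample coordinates.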

Visually, the progression looks like this:

Base implementation: resembles lava more than fire
After 20 iterations: begins to take the shape of fire
After 100 iterations: fire streaks appear, burning brightly at the center

With more iterations, we’d expect even better results.

Conclusion

To summarize our modifications to the original setup:

We implemented it as an agentic workflow with simple, composable tools instead of a full-fledged program
Instead of a database, we stored single-file executable programs (UV scripts) in a directory, with scores prefixed in the filenames
We used diff-fenced editing (credit to Phil Schmid) as an API for modifying Evolve Blocks
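The directory-as-database idea can be sketched with the standard library. The code below follows the '{score}_{md5sum}.py' convention from the prompt; the helper names, the four-decimal score formatting, and the choice to hash the file contents are assumptions, and in the actual workflow these steps are carried out by the agent through the File and Shell tools.

```python
import hashlib
import random
import re
from pathlib import Path

# Matches names such as "-0.0564_<32 hex chars>.py"; candidate files are ignored.
NAME_PATTERN = re.compile(r"^(-?\d+(?:\.\d+)?)_([0-9a-f]{32})\.py$")


def result_name(score, source):
    """Build a '{score}_{md5sum}.py' filename (hashing the source is an assumption)."""
    digest = hashlib.md5(source.encode("utf-8")).hexdigest()
    return f"{score:.4f}_{digest}.py"


def save_result(directory, score, source):
    """Write the program to the results directory with executable permission."""
    path = Path(directory) / result_name(score, source)
    path.write_text(source, encoding="utf-8")
    path.chmod(0o755)
    return path


def sample_parent(directory, limit=10, rng=random):
    """Pick one program at random from the `limit` best-scoring files."""
    scored = []
    for path in Path(directory).iterdir():
        match = NAME_PATTERN.match(path.name)
        if match:
            scored.append((float(match.group(1)), path))
    if not scored:
        raise FileNotFoundError("no scored programs found")
    scored.sort(key=lambda item: item[0], reverse=True)
    return rng.choice(scored[:limit])[1]
```

Encoding the score in the filename means a plain directory listing doubles as the leaderboard, so no separate index has to be kept in sync with the files.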

The complete implementation is available here: Code.

As this is my first blog post, I hope you enjoyed reading it! Feel free to share suggestions to improve both the post and the implementation.

