OpenAI Codex: Tooling Deep-Dive
With the release of GPT-5-Codex (a version of GPT‑5 further optimized for agentic coding in Codex), Codex has become my default coding agent of choice:
```bash
codex \
  --model 'gpt-5-codex' \
  --full-auto \
  -c model_reasoning_summary_format=experimental \
  --search "$@"
```
I noticed that web search (the `--search` flag) is disabled by default and must be explicitly enabled when starting the agent. That prompted me to dig deeper into the tooling setup.

From their thankfully open repository, I learnt that only two tools are enabled by default: `shell` and `view_image`. The latter is straightforward: `view_image` lets the LLM “see” uploaded images in the conversation. The `shell` tool, however, has some additional and very interesting capabilities.
Shell
The basic mode executes a command and immediately returns the result as JSON:
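(The field names below are my paraphrase of the shape, not the verbatim schema.)

```bash
# Sketch only: the tool runs a command to completion and hands back a
# single JSON result, roughly along these lines.
#
#   request:  {"command": ["ls", "-la"], "timeout_ms": 10000}
#   response: {"exit_code": 0, "output": "total 24\n..."}
```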
Beyond this, there are two interactive modes that allow for a persistent, streamable shell session keyed by a `session_id`. The first is `unified_exec`.

`unified_exec` is useful for sessions that operate at the line or command level. For lower-level control, the API splits into `exec_command` and `write_stdin`:

- `exec_command`: starts a session
- `write_stdin`: sends additional input to that session
The key distinction is that `write_stdin` can transmit raw control characters (e.g. `\u0003`, i.e. Ctrl-C) and streams direct text output instead of JSON, enabling more fine-grained, low-level integration with the shell.
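As a rough sketch (parameter names are paraphrased, not the actual schema), driving a Python REPL might look like this:

```bash
# Hypothetical exchange; names and payload shapes are paraphrased.
#
#   exec_command: {"cmd": "python3", "session_id": "s1"}
#     -> streams raw text: ">>> "
#   write_stdin:  {"session_id": "s1", "chars": "1 + 1\n"}
#     -> "2\n>>> "
#   write_stdin:  {"session_id": "s1", "chars": "\u0003"}   # raw Ctrl-C
#     -> "KeyboardInterrupt\n>>> "
```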
Interestingly, the code editing tool `apply_patch` is also just a wrapper around `shell`.
Apply Patch
Getting LLMs to perform multiple file edits with precision is tricky, as each model favors a different patch specification.
- Google’s Gemini prefers the diff-fenced format, with filenames placed inside the code fence (```).
- OpenAI models, by contrast, prefer their patch format described here:
```bash
apply_patch <<"EOF"
*** Begin Patch
[YOUR_PATCH]
*** End Patch
EOF
```
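For concreteness, here is a minimal patch body in that format (the file and hunk are invented for illustration):

```bash
# Hypothetical example: update one file, replacing a single line inside
# a context hunk located by the @@ marker.
apply_patch <<"EOF"
*** Begin Patch
*** Update File: src/app.py
@@ def greet():
-    print("hello")
+    print("hello, world")
*** End Patch
EOF
```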
Beyond syntax, we also need to consider the implementation:
- Should code search be exact or regex-based?
- Should replacements be single or multiple idempotent operations?
These choices directly affect whether edits apply reliably across files.
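A quick sketch of why ambiguity is the classic failure mode here (`demo.py` is invented for illustration):

```bash
# Create a file in which the exact same line appears twice.
printf 'print("hi")\nprint("hi")\n' > demo.py

# A patch whose context is just `print("hi")` now matches twice: an exact,
# single-occurrence matcher should reject it and ask for more surrounding
# context, while a regex or replace-all scheme would silently edit both.
```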
This is OpenAI’s `apply_patch` implementation.
Before diving into `web_search`, though, I want to zoom out and reflect on coding agents more broadly.
Coding Agents
It feels like coding agents have been around forever, but in reality, it’s been less than six months since I started experimenting with them—and even tried to build my own.
Back then, I created ToolKami because I wanted a Claude Code–like agent that worked with Gemini 2.5 Pro. I had fun (and some success), and eventually released it as an open-source project. Ironically, OpenAI announced Codex and Google announced Jules that same month.

I had to pause ToolKami for about three months due to other priorities. Now, with Codex as my go-to agent, I find myself questioning whether ToolKami should still exist.
There are, however, strong arguments in its favor:
- Open vs. closed: most options out there—except Codex—are closed.
- Python vs. Rust/TypeScript: personal preference matters, and I’d rather hack on an agent in a language I love.
- Model agnostic: most agents pair best with their own models, but model popularity shifts quickly.
- Simplicity and extensibility: ToolKami was designed to be simple, composable, and extensible, starting with four core tools—Ask, Browse, File, and Shell.

This design also unlocked some neat possibilities:
- UV Script: each tool is a standalone UV script, meaning you can ship an MCP server as a single file (see the sketch after this list).
- Turbo Mode: toggle full autonomy simply by disabling the `ask` tool.
- Hot Reload: the MCP server picks up changes to tool files in real time, enabling the agent to self-modify.
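To make the UV Script point concrete (the file name is hypothetical, not ToolKami's actual layout), deploying a tool is just running its file; `uv` resolves the dependencies declared inline in the script:

```bash
# A single-file tool with PEP 723 inline metadata needs no install step.
uv run browse_tool.py

# Hot reload follows naturally: edit browse_tool.py while the server is
# running and the agent picks up the new behavior.
```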
Still, one area consistently underwhelmed me: `web_search`.
Web Search
While `web_search` is invaluable for pulling up-to-date documentation, a true `browse` tool goes further. It allows the agent to:
- Render pages exactly as the client’s browser would.
- Inspect JavaScript errors via the browser console.
- Review artifacts multimodally (images, animations, audio, video) to confirm they play or render correctly.
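As a crude stand-in for the rendering half of this (assuming Node and Playwright are installed), compare what `curl` sees with what a real browser sees:

```bash
# curl returns raw HTML only: no JavaScript execution, no rendering.
curl -s https://example.com | head -n 5

# Playwright's CLI renders the page in a real browser engine and saves
# what the agent would actually "see". (Console-log inspection needs the
# library API rather than this one-liner.)
npx playwright screenshot https://example.com page.png
```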

This isn’t just nice-to-have. It’s foundational, and too important not to be a default tool for coding agents.
Conclusion
With all this in mind:
- The space is evolving rapidly. This set of tools is too important to leave entirely in the hands of others—I want a simple, extensible framework I actually enjoy working with.
- It’s not an either–or choice. ToolKami tools can easily be embedded into Codex or even into Cursor, as I showed here.
- Beyond coding, I’ve extended it into an AlphaEvolve Agent—and it’s been too much fun to stop.
So yes, I’ll be resuming work on ToolKami. Every star (or even issue) you leave on GitHub motivates me to keep going.
I’m also planning another post on `update_plan`, covering some context engineering.
Thanks for reading—I hope you found some takeaways, especially if you’re experimenting with Codex yourself.