Weeknotes: An Agentic Loop in PHP, Forge Git Review, and Verifying and Demoing Agent Work

A few bits from this week. A bit of fun code golf seeing how small an agentic loop in PHP can be, and more work on Forge with several improvements to the Git review pane. Also been leaning into computer use and terminal recording tools to get agents to verify and demo their own work.

Code Golf: An Agentic Loop in PHP

I spent some time this week writing up Code Golf: An Agentic Loop in PHP. There’s been a lot of discussion recently about agent frameworks - orchestration layers, SDKs, harnesses - and over lunch I got curious about how much of that is actually essential. The post is a bit of fun code golf seeing how small a working agentic loop can be, whilst still doing useful work. The end result is a working agent with tool calling in under 1KB of PHP, using Ollama for local inference and a single shell tool.

The use of qwen3.5:35b-a3b impressed me a lot whilst working on this - it’s piqued my interest to peel under Ollama and explore mlx-lm and llama.cpp / llama-server directly. This article also did the rounds just after I worked on this and is worth a read: Stop Using Ollama.

Forge

Continuing on Forge from last week, this week was about a few improvements to the Git review pane - syntax highlighting being the big one, with some usability and performance tweaks alongside. I’m really happy with where the review side of Forge has got to. It’s now surpassing the experience I had with revu, which is where this whole idea started.

Forge Git review pane with tree-sitter syntax highlighting and inline comment

Syntax highlighting is built on tree-sitter via the SwiftTreeSitter binding, with a per-language grammar pulled in for each one supported (Swift, TypeScript, Python, etc.). It does feel a bit heavyweight though. Those 13 grammar packages are Forge’s first SPM dependencies, and each compiled grammar gets bundled into the binary, which adds up. File and hunk expanding / collapsing was also added this week alongside the highlighting, and the review pane feels much more usable now.

Verifying and Demoing Agent Work

A couple of ideas sit underneath this one, both about getting the agent to do more of the work itself rather than handing it back to me unfinished. The first is getting the agent to verify what it built before declaring done. The second is getting it to demo that work back to me, so I can review it visually before merging.

For the verification side, I’ve been leaning into computer use - both in Claude Code and in Codex. Rather than me having to manually exercise a feature once the agent says it’s done, I can now get the agent to explore and verify the feature itself - sometimes a bit too much, in the case of Forge.

The demo side is video recordings - getting the agent to record what it built and provide that at the end of its work, so I can review the change visually before merging it back. A nice way to see the work is actually done before I sign off on it.

In a similar vein, I’ve been having a play with VHS. Scripted recordings of terminal / ASCII output, which I’ve been using to capture demos for terminal-driven work alongside what I’m doing with computer use for the GUI side.

The throughline across all of this is the same: get the agent to confirm its own work, then present that work in a nice form, with videos, to sell it to me to actually merge, much like you would do in a regular PR in a team setting.

What I’ve Been Learning From

Articles:

Videos/Podcasts:

Tweets:

Compiled Conversations podcast album art

Compiled Conversations

Podcast I host, featuring conversations with the people shaping software and technology.

Listen to Compiled Conversations