Weeknotes: Harness Engineering, PWAKit Documentation, and Marathon Reality Checks

A bit of everything this week. PWAKit docs, rethinking how agents get isolated workspaces, pruning overstuffed config files, and Jeeves diplomatically telling me I’m not marathon-ready.

PWAKit Documentation

This week I’ve managed to spend a little time building out some docs for PWAKit.

I took inspiration from Peter Steinberger and the OpenClaw docs: AI-generated documentation following a progressive disclosure pattern, with the source code as the source of truth.

PWAKit landing page
PWAKit getting started guide

It’s a good start that can be fine-tuned going forward. Another case of steering intent rather than writing: describe the structure you want, point the agent at the source, and let it generate.

It also raises an interesting question about what documentation is even for now. If someone can point their agent at a repo and it can explore, understand, and answer questions about it, do we still need traditional docs? Much like comments in code, maybe documentation should be reserved for the things you can’t discern from the source itself.

From Prompt Engineering to Harness Engineering

Ryan Lopopolo published an interesting blog post last month, Harness Engineering. Five months, roughly a million lines of code, none of it written by hand. Engineers design environments and specify intent; agents implement.

What caught my attention was the role shift. The job is less about writing code and more about designing the environment the agent works within.

It crystallised something I’d been feeling: there’s an evolution across AI engineering disciplines. Prompt engineering: learning to write good prompts. Context engineering: realising that what you surround the prompt with matters more. Harness engineering: the scaffolding, tooling, constraints, and feedback loops around the agent.

These aren’t distinct eras though. They’re concentric circles, each subsuming the previous. You still need good prompts and good context, but the harness is the outermost circle. And it’s all about closing that feedback loop.

Copy-on-Write Clones for Agent Sandboxing

There’s been a wave of GUI-based coding agent tools appearing: T3 Code, Polyscope, Conductor, and others. A common problem they all hit: how do you give multiple agents isolated workspaces?

Git worktrees are the usual answer. But there’s something nicer on macOS: APFS copy-on-write clones.

clonefile() gives you an instant copy of an entire directory that uses no extra disk space. Nothing actually gets copied; the filesystem just duplicates pointers to the same blocks, and new data is only written when something changes. From the shell, cp -c leverages it.

Compared to worktrees, you get a genuinely isolated full project copy without the cleanup overhead. Each agent gets its own workspace, works on its own branch, diffs merge back. I’ve been writing some scripts to automate this and it’s replaced worktrees in my workflow.
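The scripts themselves are simple. A minimal Python sketch of the idea, with names and layout of my own choosing rather than from any of those tools, could look like:

```python
import datetime
import pathlib
import subprocess


def clone_command(src: pathlib.Path, dest: pathlib.Path) -> list[str]:
    # -c asks cp to use clonefile(2); -R recurses into the directory.
    return ["cp", "-c", "-R", str(src), str(dest)]


def clone_workspace(repo: pathlib.Path, agent: str) -> pathlib.Path:
    """Give one agent its own instant copy-on-write clone of the repo.

    On APFS this is near-instant and uses no extra disk until files
    diverge; on other filesystems cp exits non-zero and the error
    surfaces (a fallback to a regular copy is omitted here).
    """
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = repo.parent / f"{repo.name}-{agent}-{stamp}"
    subprocess.run(clone_command(repo, dest), check=True)
    return dest
```

Each agent then checks out its own branch inside its clone, and the clone directory can simply be deleted when the diff has been merged back.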

It’s macOS-only (Linux needs Btrfs or XFS reflinks), but for native Mac tooling it’s elegant.

Peeling Back AGENTS.md

An ETH Zurich/DeepMind paper showed that comprehensive AGENTS.md files might actually hurt agent performance. Too much context, too generic. Small, targeted instructions beat kitchen-sink configs.

I’ve been guilty of this with my CLAUDE.md and AGENTS.md files: going a bit too heavy, stuffing too much in, then forgetting about it. Now I’m peeling them back, working out what actually helps versus what’s just noise sitting in the system prompt on every interaction.

The hard part: evals are fuzzy. Agents are nondeterministic, drift happens slowly, and you don’t notice. You write these config files, forget they exist, and they shape everything silently. Anthropic’s skill evaluation approach is interesting here: compare output with and without a skill active to measure its actual impact, which also helps catch regressions across model changes. Maybe something similar could be done for AGENTS.md files, evaluating whether each instruction actually improves output.
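The with/without comparison could be sketched roughly like this. Both `run_task` and `score` are hypothetical hooks you’d wire up to your own harness; nothing here comes from Anthropic’s actual implementation:

```python
import statistics


def instruction_lift(run_task, score, instruction, trials=10):
    """Rough A/B test for a single AGENTS.md instruction.

    run_task(instruction) runs the agent on a fixed task with the
    instruction in its system prompt (None means it's removed);
    score(output) maps the result to a number. Positive lift means
    the instruction earns its place; near zero means it's noise.
    """
    with_it = [score(run_task(instruction)) for _ in range(trials)]
    without = [score(run_task(None)) for _ in range(trials)]
    return statistics.mean(with_it) - statistics.mean(without)
```

Run it per instruction rather than on the whole file, and the kitchen-sink config starts to prune itself.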

The answer isn’t a bigger prompt file, it’s better scaffolding. Aligns neatly with the harness engineering thesis.

Jeeves

Jeeves keeps ticking along. New skills, new pages. The weeknotes skill is particularly meta, helping me draft these posts from voice notes.

The Garmin integration has been interesting, correlating health data with the baby’s arrival: sleep, activity, all the physiological metrics shifting. Best moment: Jeeves analysed my training data and gently broke the news that I’m not marathon-ready for mid-April. Diplomatically delivered, but brutal.

Jeeves marathon readiness analysis

I’ve also been reading about Lossless Context Management as an alternative memory model. Instead of lossy compaction, LCM builds a DAG of hierarchical summaries: nothing is lost, raw messages stay in SQLite, and agents can drill down into any summary. Relevant to Jeeves’s context bloat, and something I’m looking to explore in the near future.
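I haven’t built this yet, but the core shape might look something like the minimal sketch below: raw messages live in SQLite, summary nodes point down at what they cover, and expanding a summary recovers the originals. Table names and helpers are mine, not LCM’s, and the summary text here is passed in rather than generated by a model:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, role TEXT, content TEXT);
    CREATE TABLE summaries (id INTEGER PRIMARY KEY, text TEXT);
    -- DAG edges: a summary covers raw messages (and, in a fuller
    -- version, lower-level summaries too).
    CREATE TABLE edges (summary_id INTEGER, child_kind TEXT, child_id INTEGER);
""")


def add_message(role: str, content: str) -> int:
    cur = con.execute(
        "INSERT INTO messages (role, content) VALUES (?, ?)", (role, content)
    )
    return cur.lastrowid


def summarise(text: str, message_ids: list[int]) -> int:
    """Record a summary node covering some raw messages."""
    cur = con.execute("INSERT INTO summaries (text) VALUES (?)", (text,))
    sid = cur.lastrowid
    con.executemany(
        "INSERT INTO edges VALUES (?, 'message', ?)",
        [(sid, mid) for mid in message_ids],
    )
    return sid


def expand(summary_id: int) -> list[str]:
    """Drill down from a summary to the raw messages beneath it."""
    rows = con.execute(
        """SELECT m.content FROM edges e
           JOIN messages m ON m.id = e.child_id
           WHERE e.summary_id = ? AND e.child_kind = 'message'
           ORDER BY m.id""",
        (summary_id,),
    )
    return [content for (content,) in rows]
```

The compaction step then replaces messages in the live context with their summary, while `expand` keeps the originals one query away.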

What I’ve Been Learning From

Links, reads, watches, and listens from the week.

Articles:

Podcasts/Videos:


A lot of this week comes back to the harness engineering idea: PWAKit docs, copy-on-write sandboxing, pruning AGENTS.md files. Less writing code, more shaping the environment the agent works within. And then Jeeves quietly letting me know the marathon isn’t happening.

Compiled Conversations podcast album art

Compiled Conversations

Podcast I host, featuring conversations with the people shaping software and technology.

Listen to Compiled Conversations