The Toolkit Pattern

The toolkit pattern is a way of documenting your project’s configuration so that any AI can generate working inputs from a plain-English description. You and the AI create a single file that describes your tool’s configuration format, its constraints, and enough worked examples for any AI to do that translation. You build it iteratively, working with the AI (or, better, multiple AIs) to draft it. You test it by starting a fresh AI session and trying to use it, and every time that fails you grow the toolkit from those failures. When you build the toolkit well, your users will never need to learn how your tool’s configuration files work, because they describe what they want in conversation and the AI handles the translation. That means you don’t have to compromise on the way your project is configured, because the config files can be more complex and more complete than they would be if a human had to edit and understand them.

To understand why all of this matters, let me take you back to the mid-1980s.

I was 12 years old, and our family got an AT&T PC 6300, an IBM-compatible that came with a 159-page user’s guide. Chapter 4 of that manual was called “What Every User Should Know.” It covered things like how to use the keyboard, how to care for your diskettes, and, memorably, how to label them, complete with hand-drawn illustrations and genuinely practical advice, like using only felt-tipped pens, never ballpoint, because the pressure might damage the magnetic surface.

A page from the AT&T PC 6300 User’s Guide, Chapter 4: “Labeling Diskettes”

I remember being fascinated by this manual. It wasn’t our first computer. I’d been writing BASIC programs and dialing into BBSs and CompuServe for a couple of years, so I knew there were all sorts of amazing things you could do with a PC, especially one with a blazing-fast 8MHz processor. But the manual barely mentioned any of that. Even as a kid, I found it weird that you would give someone a manual with a whole page on using the backspace key to correct typing mistakes (really!) but no real guidance on how to use the thing to do anything useful.

That’s how most developer documentation works. We write the stuff that’s easy to write—installation, setup, the getting-started guide—because it’s a lot easier than writing the stuff that’s actually hard: the deep explanation of how all the pieces fit together, the constraints you only discover by hitting them, the patterns that separate a configuration that works from one that almost works. This is yet another “looking for your keys under the streetlight” problem: We write the documentation we write because it’s easiest to write, even if it’s not really the documentation our users need.

Developers who came up through the Unix era know this well. Man pages were thorough, accurate, and often completely impenetrable if you didn’t already know what you were doing. The tar man page is the canonical example: It documents every flag and option in exhaustive detail, but if you just want to know how to extract a .tar.gz file, it’s almost useless. (The right flag is -xzvf, in case you’re curious.) Stack Overflow exists in large part because man pages like tar’s left a gap between what the documentation said and what developers actually needed to know.
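For what it’s worth, here’s that extraction as a runnable sketch, round-tripped so there’s something to extract, with each flag spelled out:

```shell
# Create a small archive so there's something to extract
mkdir -p demo && echo "ahoy" > demo/file.txt
tar -czf demo.tar.gz demo   # -c create, -z gzip, -f write to this file
rm -r demo

# The line most people actually wanted from the man page:
# -x extract, -z gunzip, -v list files as they go, -f read from this file
tar -xzvf demo.tar.gz
```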

And now we have AI assistants. You can ask Claude or ChatGPT about, say, Kubernetes, Terraform, or React, and you’ll actually get useful answers, because those are all established projects that have been written about extensively and the training data is everywhere.

But AI hits a hard wall at the boundary of its training data. If you’ve built something new—a framework, an internal platform, a tool your team created—no model has ever seen it. Your users can’t ask their AI assistant for help, because the AI doesn’t know your thing even exists.

There’s been a lot of great work moving AI documentation in the right direction. AGENTS.md tells AI coding agents how to work on your codebase, treating the AI as a developer. llms.txt gives models a structured summary of your external documentation, treating the AI as a search engine. What’s been missing is a practice for treating the AI as a support engineer. Every project needs configuration: input files, option schemas, workflow definitions, usually in the form of a whole bunch of JSON or YAML files with cryptic formats that users have to learn before they can do anything useful.

The toolkit pattern solves that problem of getting AIs to write configuration files for a project that isn’t in its training data. It consists of a documentation file that teaches any AI enough about your project’s configuration that it can generate working inputs from a plain-English description, without your users ever having to learn the format themselves. Developers have been arriving at this same pattern (or something very similar) independently from different directions, but as far as I can tell, nobody has named it or described a methodology for doing it well. This article distills what I learned from building the toolkit for Octobatch pipelines into a set of practices you can apply to your own projects.

Build the AI its own manual

Traditionally, developers face a trade-off with configuration: keep it simple and easy to understand, or let it grow to handle real complexity and accept that it now requires a manual. The toolkit pattern emerged for me while I was building Octobatch, the batch-processing orchestrator I’ve been writing about in this series. As I described in the previous installments, “The Accidental Orchestrator” and “Keep Deterministic Work Deterministic,” Octobatch runs complex multistep LLM pipelines that generate files or run Monte Carlo simulations. Each pipeline is defined by a configuration that combines YAML, Jinja2 templates, JSON schemas, and expression steps, with a set of rules tying it all together. The toolkit pattern let me sidestep that traditional trade-off.
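To give a feel for the kind of configuration involved, here’s an illustrative fragment in that style. This is a hypothetical sketch for flavor only, not actual Octobatch syntax; every key name here is invented:

```yaml
# Hypothetical pipeline config, for illustration only; not real Octobatch syntax.
pipeline: drunken_sailor
steps:
  - name: simulate_walk
    type: expression          # deterministic math, evaluated with asteval
    expr: "position + choice([-1, 1])"
  - name: narrate_result
    type: llm                 # prompt rendered from a Jinja2 template
    prompt_template: templates/narrate.j2
    output_schema: schemas/narration.json   # JSON Schema validates the response
```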

As Octobatch grew more complex, I found myself relying on the AIs (Claude and Gemini) to build configuration files for me, which turned out to be genuinely valuable. When I developed a new feature, I would work with the AIs to come up with the configuration structure to support it. At first I defined the configuration myself, but by the end of the project I relied on the AIs for the first cut, and I’d push back when something seemed off or not forward-looking enough. Once we all agreed, I would have an AI produce the actual updated config for whatever pipeline we were working on. Having the AIs do the heavy lifting of writing the configuration was really valuable, because it let me evolve a very robust format very quickly without spending hours updating existing configurations every time I changed the syntax or semantics.

At some point I realized that every time a new user wanted to build a pipeline, they faced the same learning curve and implementation challenges that I’d already worked through with the AIs. The project already had a README.md file, and every time I modified the configuration I had an AI update it to keep the documentation current. But by this time, the README.md file was doing way too much work: It was comprehensive but a headache to read, with eight separate subdocuments showing the user how to do pretty much everything Octobatch supported. The bulk of it was focused on configuration, and it was becoming exactly the kind of documentation nobody ever wants to read. That particularly bothered me as a writer; I’d produced documentation that was genuinely painful to read.

Looking back at my chats, I can trace how the toolkit pattern developed. My first instinct was to build an AI-assisted editor. About four weeks into the project, I described the idea to Gemini:

I’m thinking about how to provide any kind of AI-assisted tool to help people create their own pipeline. I was thinking about a feature we would call “Octobatch Studio” where we make it easy to prompt for modifying pipeline stages, possibly assisting in creating the prompts. But maybe instead we include a lot of documentation in Markdown files, and expect them to use Claude Code, and give lots of guidance for creating it.

I can actually see the pivot to the toolkit pattern happening in real time in this later message I sent to Claude. It had sunk in that my users could use Claude Code, Cursor, or another AI as interactive documentation to build their configs exactly the same way I’ve been doing:

My plan is to use Claude Code as the IDE for creating new pipelines, so people who want to create them can just spin up Claude Code and start generating them. That means we need to give Claude Code specific context files to tell it everything it needs to know to create the pipeline YAML config with asteval expressions and Jinja2 template files.

The traditional trade-off between simplicity and flexibility comes from cognitive overhead: the cost of holding all of a system’s rules, constraints, and interactions in your head while you work with it. It’s why many developers opt for simpler config files, so they don’t overload their users (or themselves). Once the AI was writing the configuration, that trade-off disappeared. The configs could get as complicated as they needed to be, because I wasn’t the one who had to remember how all the pieces fit together. At some point I realized the toolkit pattern was worth standardizing.

That toolkit-based workflow—users describe what they want, the AI reads TOOLKIT.md and generates the config—is the core of the Octobatch user experience now. A user clones the repo and opens Claude Code, Cursor, or Copilot, the same way they would with any open source project. Every configuration prompt starts the same way: “Read pipelines/TOOLKIT.md and use it as your guide.” The AI reads the file, understands the project structure, and guides them step by step.

To see what this looks like in practice, take the Drunken Sailor pipeline I described in “The Accidental Orchestrator.” It’s a Monte Carlo random walk simulation: A sailor leaves a bar and stumbles randomly toward the ship or the water. The pipeline configuration for that involves multiple YAML files, JSON schemas, Jinja2 templates, and expression steps with real mathematical logic, all wired together with specific rules.

Drunken Sailor is Octobatch’s simplest “Hello, World!” Monte Carlo pipeline, but it still has 148 lines of config spread across four files.
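The simulation logic itself is simple. Here’s a minimal Python sketch of the random walk, my own illustration of the math rather than the Octobatch implementation, which expresses this as config-driven expression steps:

```python
import random

def drunken_sailor(start=0, ship=10, water=-10, max_steps=1000, rng=None):
    """One trial: the sailor steps +1 or -1 until reaching the ship or the water."""
    rng = rng or random.Random()
    pos, steps = start, 0
    while water < pos < ship and steps < max_steps:
        pos += rng.choice((-1, 1))
        steps += 1
    if pos >= ship:
        return "ship"
    return "water" if pos <= water else "still stumbling"

# Monte Carlo: run many independent trials and tally the outcomes
rng = random.Random(42)
trials = [drunken_sailor(rng=rng) for _ in range(1000)]
print({outcome: trials.count(outcome) for outcome in ("ship", "water", "still stumbling")})
```

With the sailor starting midway between the two absorbing boundaries, the tallies come out roughly even, which is exactly the kind of sanity check a Monte Carlo pipeline’s output should pass.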

Here’s the prompt that generated all of that. The user describes what they want in plain English, and the AI produces the entire configuration by reading TOOLKIT.md. This is the exact prompt I gave Claude Code to generate the Drunken Sailor pipeline—notice the first line of the prompt, telling it to read the toolkit file.

You don’t need to know Octobatch to understand the prompt I used to create the Drunken Sailor pipeline.

But configuration generation is only half of what the toolkit file does. Users can also upload TOOLKIT.md and PROJECT_CONTEXT.md (which has information about the project) to any AI assistant—ChatGPT, Gemini, Claude, Copilot, whatever they prefer—and use it as interactive documentation. A pipeline run finished with validation failures? Upload the two files and ask what went wrong. Stuck on how retries work? Ask. You can even paste in a screenshot of the TUI and say, “What do I do?” and the AI will read the screen and give specific advice. The toolkit file turns any AI into an on-demand support engineer for your project.

The toolkit helps turn ChatGPT into an AI manual that helps with Octobatch.

What the Octobatch project taught me about the toolkit pattern

Building the generative toolkit for Octobatch produced more than just documentation that an AI could use to create configuration files that worked; it also yielded a set of practices, and those practices turn out to be pretty consistent regardless of what kind of project you’re building. Here are the five that mattered most:

  • Start with the toolkit file and grow it from failures. Don’t wait until the project is finished to write the documentation. Create the toolkit file first, then let each real failure add one principle at a time.
  • Let the AI write the config files. Your job is product vision—what the project should do and how it should feel. The AI’s job is translating that into valid configuration.
  • Keep guidance lean. State the principle, give one concrete example, move on. Every guardrail costs tokens, and bloated guidance makes AI performance worse.
  • Treat every use as a test. There’s no separate testing phase for documentation. Every time someone uses the toolkit file to build something, that’s a test of whether the documentation works.
  • Use more than one model. Different models catch different things. In a three-model audit of Octobatch, three-quarters of the defects were caught by only one model.

I’m not proposing a standard format for a toolkit file, and I think trying to create one would be counterproductive. Configuration formats vary wildly from tool to tool—that’s the whole problem we’re trying to solve—and a toolkit file that describes your project’s building blocks is going to look completely different from one that describes someone else’s. What I found is that the AI is perfectly capable of reading whatever you give it, and is probably better at writing the file than you are anyway, because it’s writing for another AI. These five practices should help you build an effective toolkit regardless of what your project looks like.

Start with the toolkit file and grow it from failures

You can start building a toolkit at any point in your project. The way it happened for me was organic: After weeks of working with Claude and Gemini on Octobatch configuration, the knowledge about what worked and what didn’t was scattered across dozens of chat sessions and context files. I wrote a prompt asking Gemini to consolidate everything it knew about the config format—the structure, the rules, the constraints, the examples, everything we’d talked about—into a single TOOLKIT.md file. That first version wasn’t great, but it was a starting point, and every failure after that made it better.

I didn’t plan the toolkit from the beginning of the Octobatch project. It started because I wanted my users to be able to build pipelines the same way I had, by working with an AI, but everything they’d need to do that was buried in those chat logs and the CONTEXT.md files I’d been maintaining to bootstrap new development sessions. Once Gemini had produced the consolidated TOOLKIT.md and Claude had reviewed it, I treated it the way I treat any other code: Every time something broke, I found the root cause, worked with the AIs to update the toolkit to account for it, and verified that a fresh AI session could still use it to generate valid configuration.

That incremental approach worked well for me, and it let me test my toolkit the way I test any other code: try it out, find bugs, fix them, rinse, repeat.

You can do the same thing. If you’re starting a new project, you can plan to create the toolkit at the end. But it’s more effective to start with a simple version early and let it emerge over the course of development. That way you’re dogfooding it the whole time instead of guessing what users will need.

Let the AI write the config files (but stay in control!)

Early Octobatch pipelines had simple enough configuration that a human could read and understand them, but not because I was writing them by hand. One of the ground rules I set for the Octobatch experiment in AI-driven development was that the AIs would write all of the code, and that included writing all of the configuration files. The problem was that even though they were doing the writing, I was unconsciously constraining the AIs: pushing back on anything that felt too complex, steering toward structures I could still hold in my head.

At some point I realized my pushback was placing an artificial limit on the project. The whole point of having AIs write the config was that I didn’t need to keep every single line in my head—it was okay to let the AIs handle that level of complexity. Once I stopped constraining them, the cognitive overhead limit I described earlier went away. I could have full pipelines defined in config, including expression steps with real mathematical logic, without needing to hold all the rules and relationships in my head.

Once the project really got rolling, I never wrote YAML by hand again. The cycle was always: need a feature, discuss it with Claude and Gemini, push back when something seemed off, and one of them produces the updated config. My job was product vision. Their job was translating that into valid configuration. And every config file they wrote was another test of whether the toolkit actually worked.

This job delineation, however, meant inevitable disagreements between me and the AIs, and disagreeing with a machine isn’t always easy, because they’re surprisingly stubborn (and often shockingly stupid). It took persistence and vigilance to stay in control of the project, especially when I turned over large responsibilities to the AIs.

The AIs consistently optimized for technical correctness—separation of concerns, code organization, effort estimation—which was great, because that’s the job I asked them to do. I optimized for product value. I found that keeping that value as my north star and always focusing on building useful features consistently helped with these disagreements.

Keep guidance lean

Once you start growing the toolkit from failures, the natural progression is to overdocument everything. Generative AIs are biased toward generating, and it’s easy to let them get carried away with it. Every bug feels like it deserves a warning, every edge case feels like it needs a caveat, and before long your toolkit file is bloated with guardrails that cost tokens without adding much value. And since the AI is the one writing your toolkit updates, you need to push back on it the same way you push back on architecture decisions. AIs love adding WARNING blocks and exhaustive caveats. The discipline you need to bring is telling them when not to add something.

The right level is to state the principle, give one concrete example, and trust the AI to apply it to new situations. When Claude Code made a choice about JSON schema constraints that I might have second-guessed, I had to decide whether to add more guardrails to TOOLKIT.md. The answer was no—the guidance was already there, and the choice it made was actually correct. If you keep tightening guardrails every time an AI makes a judgment call, the signal gets lost in the noise and performance gets worse, not better. When something goes wrong, the impulse—for both you and the AI—is to add a WARNING block. Resist it. One principle, one example, move on.

Treat every use as a test

There was no separate “testing phase” for Octobatch’s TOOLKIT.md. Every pipeline that I created with it was a new test. After the very first version, I opened a fresh Claude Code session that had never seen any of my development conversations, pointed it at the newly minted TOOLKIT.md, and asked it to build a pipeline. The first time I tried it, I was surprised at how well it worked! So I kept using it, and as the project rolled along, I updated it with every new feature and tested those updates. When something failed, I traced it back to a missing or unclear rule in the toolkit and fixed it there.

That’s the practical test for any toolkit: open a fresh AI session with no context beyond the file, describe what you want in plain English, and see if the output works. If it doesn’t, the toolkit has a bug.

Use more than one model

When you’re building and testing your toolkit, don’t just use one AI. Run the same task through a second model. A pattern that worked consistently for me was having Claude generate the toolkit and Gemini check its work.

Different models catch different things, and this matters for both developing and testing the toolkit. I used Claude and Gemini together throughout Octobatch development, and I overruled both when they were wrong about product intent. You can do the same thing: If you work with multiple AIs throughout your project, you’ll start to get a feel for the different kinds of questions they’re good at answering.

When you have multiple models generate config from the same toolkit independently, you find out fast where your documentation is ambiguous. If two models interpret the same rule differently, the rule needs rewriting. That’s a signal you can’t get from using just one model.

The manual, revisited

That AT&T PC 6300 manual devoted a full page to labeling diskettes, which may have been overkill, but it got one thing right: it described the building blocks and trusted the reader to figure out the rest. It just had the wrong reader in mind.

The toolkit pattern is the same idea, pointed at a different audience. You write a file that describes your project’s configuration format, its constraints, and enough worked examples that any AI can generate working inputs from a plain-English description. Your users never have to learn YAML or memorize your schema, because they have a conversation with the AI and it handles the translation.

If you’re building a project and you want AI to be able to help your users, start here: write the toolkit file before you write the README, grow it from real failures instead of trying to plan it all upfront, keep it lean, test it by using it, and use more than one model because no single AI catches everything.

The AT&T manual’s Chapter 4 was called “What Every User Should Know.” Your toolkit file is “What Every AI Should Know.” The difference is that this time, the reader will actually use it.

In the next article, I’ll start with a statistic about developer trust in AI-generated code that turned out to be fabricated by the AI itself—and use that to explain why I built a quality playbook that revives the traditional quality practices most teams cut decades ago. The playbook explores an unfamiliar codebase, generates a complete quality infrastructure—tests, review protocols, validation rules—and finds real bugs in the process. It works across Java, C#, Python, and Scala, and it’s available as an open source Claude Code skill.
