From framework to install: wiring a self-improving content loop into your team's files
by Ygor Fonseca, Founder & Systems Lead
Two marketing teams started the same quarter with the same setup. Both watched Tom Blomfield's YC batch talk on self-improving companies. Both agreed the five-layer frame (sensors, policy, tools, quality gates, learning) was the shape they wanted. Both bought the same AI tools. Both put their head of growth in charge.
Three months in, the first team's setup is a folder. A top-level rules file the AI reads on every task. A few smaller instructions files for the work the team does most often. A growing log of what's been learned. The team's head of growth opens it on Monday morning, runs the week's content brief, and the output already reflects last week's edits — because last week's edits went into the file. The second team's setup is a Notion workspace, a Slack channel where the lifecycle manager has been "training the AI in conversation," and a folder of one-off prompts pasted into chat windows. The output looks similar to where it started in February. Nobody's edits compounded.
Same framework. Different install. The gap between the two is the subject of this post.
Yesterday's companion piece walked through what Blomfield's self-improving loop does when applied to growth and content — the five layers as workflow, the LeanBoat blog itself as the worked example. This piece is the other half. What does it actually look like to wire those five layers into a team's files? Where do the layers live? What gets created first? What happens when nothing happens — when the team has the AI tools but the install never gets built?
The install: layer by layer, as files
Every layer in Blomfield's frame becomes one or more files in the install. The files are plain text — markdown, mostly. Versioned the way engineers version code. Read by the AI assistant on demand. Who edits them — the AI on its own, the human after approving an AI proposal, or a mix that varies by file — is a design choice we'll come back to. The install isn't a SaaS product. It's a folder.
Sensor layer. One file per signal source. Each sensor file is a small instructions file that tells the AI what to pull, how to score what it pulled, and where to store the result. The team that pulls a daily research file has a sensor instructions file for that pull — what voices to read, what to skip, what to flag, what fields to fill in on the output. The team that pulls a monthly citation report has a sensor file for that pull. Each sensor produces an output file — the daily research log, the citation log — that the next layer can read.
The shape of a sensor file is consistent. A trigger condition (what makes this pull run). A source list (where it reads from). A scoring rubric (what counts as worth keeping). An output template (what the result looks like when it lands). What the file does NOT contain is the actual analysis. The AI runs the pull, produces the output, and stops. The team reads the output the way an analyst would read a research brief.
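As a rough illustration of those four parts, here is a minimal sketch in Python. It is not our actual file format (the real files are markdown); `SensorSpec`, `run_sensor`, the keyword-based rubric, and the source names are all hypothetical, chosen only to show how trigger, source list, rubric, and output template fit together.

```python
from dataclasses import dataclass


@dataclass
class SensorSpec:
    name: str
    trigger: str                   # what makes this pull run
    sources: list[str]             # where it reads from
    keep_keywords: dict[str, int]  # scoring rubric (assumed: keyword -> points)
    min_score: int                 # threshold for "worth keeping"
    output_template: str = "- [{score}] {source}: {title}"


def score(spec: SensorSpec, title: str) -> int:
    """Sum the points of every rubric keyword found in the title."""
    text = title.lower()
    return sum(pts for kw, pts in spec.keep_keywords.items() if kw in text)


def run_sensor(spec: SensorSpec, pulled: list[tuple[str, str]]) -> list[str]:
    """Filter and format pulled (source, title) pairs. No analysis happens here."""
    lines = []
    for source, title in pulled:
        if source not in spec.sources:
            continue  # not on the source list: skip
        s = score(spec, title)
        if s >= spec.min_score:
            lines.append(
                spec.output_template.format(score=s, source=source, title=title)
            )
    return lines
```

The function stops at producing output lines, mirroring the point above: the sensor pulls and scores, and the reading is left to the team.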
Policy layer. One top-level rules file that the AI reads on every task, plus a memory folder for the smaller running rules. The top-level file holds the constants — the brand voice constraints, the off-limit topics, the recent founder calls that override defaults, and the kill switch (the conditions that halt the loop and hand control back to the human). The memory folder holds the running ledger of rules the team has learned over time, with one rule per file. Each rule has a trigger ("when X happens, apply Y") and a reason ("we set this rule after Z went wrong" or "we set this rule because the founder decided W on this date").
The catalog of rules is the part nobody publishes. Architecturally, this is the brand voice in machine-readable form, accumulated over time. It is also the moat. The shape — a top-level file, a memory folder, one rule per file, trigger + reason for each — is shareable. The catalog itself is not.
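To make the shareable shape concrete, here is a small sketch of loading a memory folder where each file holds one rule. The `trigger:` / `reason:` line format, the `Rule` class, and the function names are assumptions for illustration only; nothing here reflects the contents of a real catalog.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Rule:
    name: str     # taken from the file name
    trigger: str  # "when X happens, apply Y"
    reason: str   # why the rule exists


def parse_rule(path: Path) -> Rule:
    """Read one rule file; it must contain a trigger line and a reason line."""
    fields = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in ("trigger", "reason"):
            fields[key] = value.strip()
    missing = {"trigger", "reason"} - fields.keys()
    if missing:
        raise ValueError(f"{path.name} is missing: {sorted(missing)}")
    return Rule(path.stem, fields["trigger"], fields["reason"])


def load_rules(memory_dir: Path) -> list[Rule]:
    """Load every rule in the memory folder, one rule per markdown file."""
    return [parse_rule(p) for p in sorted(memory_dir.glob("*.md"))]
```

Rejecting a file that lacks a reason is a design choice in this sketch: a rule without its reason is hard to revisit later, so the loader refuses it rather than guessing.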
Tool layer. One instructions file per recurring task. An "instructions file" is what we call a small markdown file that tells the AI assistant how to do one kind of work — write a competitive brief, run a pre-publish check, draft a LinkedIn distillation, propose a headline. Each file has a clear trigger (when to use this), a scope (what this is for and what it's not for), the structural moves the task uses, examples of past instances that landed, examples that didn't. When the team starts a new kind of task, they don't invent a new shape from scratch. They write a new instructions file and the next instance of that task uses it.
The instructions files compound the way the rules do. A new engagement adds a handful of new ones. After six months the folder is the agency's living methodology, written for an AI to run, not for a slide to look good.
Quality gate layer. One check file per gate. A pre-publish review check. A fact-verification check that fires on quoted material. A headline check. A LinkedIn-distillation check. Each check file has a list of items, a pass/fail criterion per item, and a behavior on failure (block ship, flag for review, propose a fix). New checks start in a dry pass — for two weeks, the gate logs what it would have flagged without blocking anything. The human reads the log and decides which items graduate to enforcement and which get cut. A check that hasn't caught a real fail isn't a gate yet.
The check contents are not published either. The shape — a check file per gate, items per check, dry-pass-then-graduate — is the architecture. The actual check items, the actual fail conditions, the actual pattern library a check uses to score a draft, all sit in the folder and stay there.
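The dry-pass-then-graduate shape can be sketched without revealing any real check items. In the Python below, `Check`, `CheckItem`, `run_check`, and the example headline-length item used later are hypothetical; the only behavior it aims to show is that an unenforced check logs what it would have flagged and blocks nothing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckItem:
    name: str
    passes: Callable[[str], bool]  # pass/fail criterion for one item


@dataclass
class Check:
    name: str
    items: list[CheckItem]
    on_fail: str = "block"   # "block", "flag", or "propose_fix"
    enforced: bool = False   # False means dry pass: log only


@dataclass
class GateResult:
    failed_items: list[str]
    action: str              # what the gate actually did


def run_check(check: Check, draft: str, log: list[str]) -> GateResult:
    """Evaluate every item; only an enforced check applies its on_fail behavior."""
    failed = [item.name for item in check.items if not item.passes(draft)]
    mode = "enforced" if check.enforced else "dry-pass"
    for name in failed:
        log.append(f"{check.name}/{name}: failed ({mode})")
    if not failed:
        action = "pass"
    elif check.enforced:
        action = check.on_fail
    else:
        action = "logged_only"
    return GateResult(failed, action)
```

Graduating a check is then a one-field change (`enforced = True`) that a human makes after reading the log, which keeps the decision visible in the file's history.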
Learning mechanism layer. One scheduled file per cycle. A weekly learning pass file. A monthly learning pass file. Each file is itself a small instructions file that tells the AI what to read, what patterns to look for, and what kind of proposed edits to produce. The AI reads the prior period of ships and outputs a list: proposed edits to the rules file, proposed new instructions files for tasks that have come up more than twice, proposed updates to sensor configurations, proposed promotions of dry-pass checks to enforcement. The human reviews each proposed edit and approves, modifies, or rejects.
This is the layer most installs skip. The AI generates plenty without it; the system just never compounds. Without a learning file, the team's instructions look the same in November as they did in May. The labor multiplier is gone. The whole reason for the install evaporates.
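One slice of a learning pass, sketched under assumptions: proposing a new instructions file for any task that has come up more than twice without one, then recording the human's decision. The `Proposal` class, the `instructions/` path, and the task names are hypothetical, and in practice the AI would find patterns by reading the ships, not by counting labels.

```python
from collections import Counter
from dataclasses import dataclass

DECISIONS = {"approved", "modified", "rejected"}


@dataclass
class Proposal:
    target: str                # file the edit would touch
    summary: str
    status: str = "proposed"   # changes only through human review


def propose_instruction_files(
    shipped_tasks: list[str], existing: set[str]
) -> list[Proposal]:
    """Propose a file for each task seen more than twice that has none yet."""
    counts = Counter(shipped_tasks)
    return [
        Proposal(
            target=f"instructions/{task}.md",
            summary=f"'{task}' came up {n} times with no instructions file",
        )
        for task, n in sorted(counts.items())
        if n > 2 and task not in existing
    ]


def review(proposal: Proposal, decision: str) -> Proposal:
    """Record the human's decision; nothing is written to the folder here."""
    if decision not in DECISIONS:
        raise ValueError(f"unknown decision: {decision}")
    proposal.status = decision
    return proposal
```

Keeping `review` separate from any file write reflects the human-gated design: a proposal is data until someone approves it.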
How the AI knows what to read when
The wiring matters as much as the files. The principle is simple: the AI assistant reads the top-level rules file on every task. The top-level file points to the relevant instructions file by name when the task matches a trigger. The memory pointers get loaded when the task needs context. Sensor outputs, check outputs, and learning outputs all sit in their own files; the assistant reads them when the task references them.
The assistant does not read everything every time. That is the design. A new task triggers the top-level file plus one or two named instructions files plus the relevant memory pointers. A different task triggers a different combination. Anthropic's documentation describes this as routing — the assistant figures out what the task is and pulls in only the files that are relevant. The team does not maintain a giant context window for every invocation. The team maintains a folder, and the assistant assembles the right slice on demand.
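The "right slice on demand" idea can be sketched as a lookup. This is a deliberately crude stand-in: a real assistant decides relevance itself, whereas the hypothetical `assemble_context` below matches trigger phrases as plain substrings. The file names and the `routes` mapping are assumptions.

```python
TOP_LEVEL = "rules.md"  # assumed name for the top-level rules file


def assemble_context(task: str, routes: dict[str, list[str]]) -> list[str]:
    """Return the files to load for a task.

    The top-level file is always first; other files are added only when
    their trigger phrase appears in the task description.
    """
    files = [TOP_LEVEL]
    text = task.lower()
    for trigger, paths in routes.items():
        if trigger in text:
            for path in paths:
                if path not in files:  # avoid loading a file twice
                    files.append(path)
    return files


routes = {
    "competitive brief": [
        "instructions/competitive-brief.md",
        "memory/no-competitor-names.md",
    ],
    "linkedin": ["instructions/linkedin-distillation.md"],
}
```

With this mapping, a task mentioning a competitive brief loads three files, and a task that matches no trigger loads only the top-level rules file.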
The install lives in version control. Every change is a commit with a message and a timestamp. The team can see what changed last week, what the AI proposed and the human approved, what the human rejected. The folder is auditable the way a code repository is auditable. If a piece of output goes wrong, the team can trace which files the assistant used to produce it and what version those files were in at the time.
When the install can run autonomously
Yesterday's piece covered why our loop runs human-gated rather than fully autonomous like Blomfield's YC office-hours agent. The install supports either variant — the policy layer is the switch. The top-level rules file says "for changes in category X, a second AI agent's review is the final gate; for changes in category Y, the human approves." Both are valid installs of the same five-layer frame. The autonomous variant works at YC because the file edits are narrow and low-risk by category; a second AI agent can catch most regressions before merge.
Most marketing and growth installs are not in that category. Edits to brand voice rules, headline conventions, tone constraints, the kill-switch list — these do not fail loudly. A bad edit ships six weeks of subtly off-brand work under a named identity before anyone catches the drift. By the time the team notices, the catalog has been quietly accumulating the wrong shape and the labor multiplier has gone in reverse.
Our default is the human approval gate. The architecture supports both variants; we choose the human gate because the human's judgment is the part the brand pays for. The loop compounds the team's judgment into the folder — that compounding is the point. Autonomous mode would remove the part that adds experience and authority to the content. That is why the human needs to be in this loop. Narrow categories can graduate to autonomous over time — a new sensor configuration that has run cleanly for six weeks, a low-risk instructions file refinement another AI agent can review on its own — with a drift-monitoring agent reading every committed change for pattern shifts and surfacing them to the human weekly. But the policy edits, the brand voice edits, and anything shipping under a named author stay with the human, by design.
The honest test sits in the policy file itself. If the policy file lets the AI commit edits to brand voice or kill-switch rules without a human approval step, the install has chosen autonomous mode and the team has accepted drift as a category of risk. The choice is legible — it's a line in a file anyone on the team can read. We don't ship that line.
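That test is simple enough to express as code. The sketch below assumes the policy is a mapping from change category to gate; the category names, the `required_gate` and `autonomous_on_protected` helpers, and the default-to-human behavior for unlisted categories are our illustrative assumptions, not a prescribed format.

```python
HUMAN = "human"
AGENT = "second_agent"

# Categories that, in our default, never leave the human gate.
PROTECTED = {"brand_voice", "kill_switch"}


def required_gate(category: str, policy: dict[str, str]) -> str:
    """Look up who approves a change; unlisted categories fall back to the human."""
    return policy.get(category, HUMAN)


def autonomous_on_protected(policy: dict[str, str]) -> list[str]:
    """List protected categories the policy lets through without a human."""
    return sorted(c for c in PROTECTED if policy.get(c, HUMAN) != HUMAN)


policy = {"sensor_config": AGENT, "brand_voice": HUMAN}
```

An empty result from `autonomous_on_protected(policy)` means the install is still human-gated where it matters; a non-empty one names exactly which lines in the policy made the other choice.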
Install order
The five layers do not get built on day one. Trying to wire everything at once does not work — the team builds a beautiful empty folder and nothing runs.
A working install builds in roughly this order.
Week 1. Top-level rules file. Working hypothesis at the top — three sentences on what the team is trying to do, why now, what's blocking it. Brand voice constraints. Kill switch — the conditions that halt the loop and hand control back to the human. One instructions file for the single task the team does most often. That is enough to start running real work against the install. Everything else gets added by being needed.
Week 2. First sensor file. Pick the signal that already drives decisions and isn't being captured anywhere — usually related to content engagement, inbound responses, or citation tracking. Wire the sensor pull, point its output to a file the team can read. Add a second instructions file for the next-most-common task. Run the existing instructions file in production and write down what fails.
Week 3. First quality gate. Pick the failure mode that has shown up twice already in weeks 1 and 2. Write the check as a dry pass — log what it would have flagged without blocking anything. After two weeks of dry pass, the human reviews the log and graduates the check to enforcement.
Week 4. First learning pass. The AI reads the prior month of ships, the sensor outputs, and the check logs, then proposes edits to whichever files the pattern points at — the rules file, an instructions file, a sensor configuration, anything that has compounded into a recurring pattern. The human reviews the proposed edits and approves the ones that make sense. From here, the loop runs on a weekly cadence and the files are the part that gets smarter every cycle.
This sequence puts a working install in place by the end of the first month — not a finished install, but one that is producing output and learning. The catalog grows from there. By month three, the team's working methodology lives in the files.
Where the install breaks
Four failure modes, in order of how often they show up.
The first is no write-back. The human reviews the AI's output, makes corrections verbally or by editing the output document, but never updates the underlying files. The same kind of mistake recurs on the next run because the rules and instructions didn't change. The discipline is: no correction counts until the file is updated. If a decision happens off-file, the file is wrong and gets fixed before the next invocation.
The second is one human, one bottleneck. The AI proposes plenty; if one person owns every approval, the queue grows faster than it clears. Either approvals get rubber-stamped (the gate stops working) or the system slows until people work around it (the gate stops being used). The fix is rotation by domain. Sensor edits get approved by the person closest to that sensor. Tool-file edits get approved by the person who runs that task. Policy-level edits stay with the lead because the policy is the brand voice.
The third is rules contradicting other rules. Over time the policy memory accumulates rules; some will conflict. The fix is a monthly contradiction pass — the AI reads the rules folder, surfaces conflicts, the human resolves them, the resolutions land back in the files. Without this pass, the AI ends up applying whichever rule it read most recently, which is not the same thing as the rule the team wanted.
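A narrow version of the contradiction pass can be sketched mechanically. Real conflicts are usually semantic, which is why the AI reads the folder; the hypothetical `find_conflicts` below only catches the easy case where two rule files share the same trigger text but prescribe different actions. The tuple layout and file names are assumptions.

```python
from collections import defaultdict


def find_conflicts(
    rules: list[tuple[str, str, str]]
) -> dict[str, list[tuple[str, str]]]:
    """Group rules by trigger and keep triggers with more than one distinct action.

    Each rule is (file name, trigger, action). Matching is exact after
    trimming whitespace and lowercasing.
    """
    by_trigger = defaultdict(list)
    for name, trigger, action in rules:
        key = trigger.strip().lower()
        by_trigger[key].append((name, action.strip().lower()))
    return {
        trigger: entries
        for trigger, entries in by_trigger.items()
        if len({action for _, action in entries}) > 1
    }
```

The output is a list for the human to resolve, not a resolution: the function surfaces the conflict and leaves the choice of which rule wins to whoever owns the policy.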
The fourth is publishing the catalog. The shape of the install is the publishable part. The catalog of rules, checks, sensor sources, and instructions is the part that compounds and the part that is brand-specific. Treating the install as a downloadable template (a "starter folder" the team hands to a buyer) collapses the moat. The catalog earns its value by being built per-team over time, not by being copied wholesale.
The architecture and the install
Blomfield's frame is the architecture. The install is what it looks like when a team wires the architecture into their files. The architecture is shareable; this post shares it. The install is per-team; the catalog it produces over time is the part the team owns.
The first team in the opening — the one with the folder — built a working install in their first month and let the catalog grow from there. Three months in, the install is the team. Whoever joins next reads the files and is up to speed in an afternoon. Whoever leaves takes their judgment with them, but the rules and instructions they wrote stay. The labor multiplier is doing what Blomfield's frame said it would do.
The second team is still typing into chat windows. Their AI tools are working as advertised. They are also working without anything underneath them.