Skills as contracts, not prompts

Prompts don’t compose

Most agent systems I see embed procedural knowledge inside the agent’s system prompt. “When the user asks about X, do Y. When the user asks about Z, prefer the W approach.” It works until the prompt is 4,000 tokens long and nobody can remember which rule came from which incident.

The other common shape is the prompt library. A folder of .md files, each one a named prompt, copy-pasted into context when the agent needs it. Better than one giant prompt. Still brittle. Two prompts that want to compose have no way to reference each other cleanly. The agent has to pick one.

Neither shape is good at what actually happens in production: an agent needs to reach for a capability by name, compose it with another capability, and know when to not use either.

That’s what skills are for.

What a skill actually is

A skill is a folder with a SKILL.md file. Every skill declares, minimally:

name: unique identifier
description: one-sentence purpose
trigger_phrases: natural-language signals that activate the skill
user_invocable: whether a human invokes it directly or it’s agent-only
when_to_use: scope and preconditions
workflows: the actual procedural content, often multi-step
core_principles: what not to do, what to prefer

The file is plain markdown. Git-tracked. Reviewable. Rollback-able. Line-blameable when someone changes it and breaks an agent.
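One way to make the contract concrete is to refuse to load any skill that is missing one of the fields above. This is a minimal sketch, assuming the SKILL.md metadata has already been parsed into a dict. The Skill class, load_skill, the field types, and the example values are all hypothetical illustrations, not part of any published spec.

```python
from dataclasses import dataclass

# Field names mirror the list above; the types are assumptions for illustration.
REQUIRED_FIELDS = (
    "name", "description", "trigger_phrases", "user_invocable",
    "when_to_use", "workflows", "core_principles",
)


@dataclass(frozen=True)
class Skill:
    name: str
    description: str
    trigger_phrases: tuple[str, ...]
    user_invocable: bool
    when_to_use: str
    workflows: tuple[str, ...]
    core_principles: tuple[str, ...]


def load_skill(metadata: dict) -> Skill:
    """Build a Skill from parsed metadata, rejecting incomplete contracts."""
    missing = [key for key in REQUIRED_FIELDS if key not in metadata]
    if missing:
        raise ValueError(f"skill is missing required fields: {missing}")
    return Skill(
        name=metadata["name"],
        description=metadata["description"],
        trigger_phrases=tuple(metadata["trigger_phrases"]),
        user_invocable=bool(metadata["user_invocable"]),
        when_to_use=metadata["when_to_use"],
        workflows=tuple(metadata["workflows"]),
        core_principles=tuple(metadata["core_principles"]),
    )


# Hypothetical example values, not copied from a real skill.
ship = load_skill({
    "name": "ship",
    "description": "Run the release gate before a deploy.",
    "trigger_phrases": ["ship it", "run the release gate"],
    "user_invocable": True,
    "when_to_use": "A release candidate exists and tests have run.",
    "workflows": ["check the changelog", "verify tests", "tag the release"],
    "core_principles": ["never skip a failing check"],
})
```

A loader like this turns "every skill declares" from a convention into a check that fails loudly in CI.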

Anthropic published the Skills standard in October 2025. The folder structure, metadata schema, and progressive-disclosure loading model are all in the spec. The public agent-skills reference library follows the contract. So does our internal 32-skill collection at RAGnos Labs.

The important move isn’t the file format. It’s the structural claim: a skill is a contract. The contract says “under these conditions, this workflow applies. Here is how to invoke it. Here is what to do when it doesn’t fit.”

Prompts don’t make that claim. Prompts are strings. A contract gives you triggers, scope, composition, and pruning.

Three reasons skills matter for agent harness design

1. Composability beats monolithic agents

The monolithic “marketing agent” that knows about content, outreach, CMS, social, and voice judgment is a fragile agent. Every capability competes for prompt space. Every new responsibility destabilizes the old ones.

Decompose the same responsibilities into 32 skills and something different happens. The same prompt-optimization skill works inside a release-gate pipeline, inside a content voice-scoring flow, and inside a standalone agent session. The composition surface is wide. The individual components are small.

The agent isn’t doing less. It’s reaching for fewer things at any one moment, and what it reaches for is sharper.
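Composition can be sketched as a small dependency walk: each skill names the skills it references, and the harness loads them in order. This is an illustration under assumptions. The uses mapping, the resolve function, and the skill names release-gate and voice-scoring are hypothetical stand-ins for the flows described above.

```python
def resolve(name: str, uses: dict[str, list[str]]) -> list[str]:
    """Return the skills to load for `name`, referenced skills first, each once."""
    ordered: list[str] = []
    visiting: set[str] = set()

    def visit(skill: str) -> None:
        if skill in ordered:
            return  # already scheduled by an earlier reference
        if skill in visiting:
            raise ValueError(f"composition cycle through {skill!r}")
        if skill not in uses:
            raise KeyError(f"unknown skill {skill!r}")
        visiting.add(skill)
        for referenced in uses[skill]:
            visit(referenced)
        visiting.remove(skill)
        ordered.append(skill)

    visit(name)
    return ordered


# Hypothetical catalog: two flows reuse the same small skill.
uses = {
    "prompt-optimization": [],
    "release-gate": ["prompt-optimization"],
    "voice-scoring": ["prompt-optimization"],
}

plan = resolve("release-gate", uses)  # ["prompt-optimization", "release-gate"]
```

The cycle check matters more than it looks. Two skills that reference each other are usually one skill that hasn't been merged yet.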

2. Triggered activation beats always-on context

A long system prompt that says “here are all the things you can do” burns tokens even when the agent is doing only one of them. At scale that cost compounds. More importantly, it pulls the agent’s attention across capabilities it doesn’t need in the current turn.

Skills invert the default. Nothing is in context until a trigger matches. The trigger_phrases metadata lets the harness scope what’s live at any moment. An agent handling an NDA triage doesn’t need the transcript-editorial skill in context. An agent running a release gate doesn’t need CRM-integration skills.
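A minimal sketch of that scoping step, assuming the harness holds a mapping from skill name to trigger phrases and uses plain substring matching. The live_skills function, the nda-triage name, and the phrases themselves are hypothetical; a real harness might match on embeddings or let the model choose from descriptions.

```python
def live_skills(message: str, catalog: dict[str, list[str]]) -> list[str]:
    """Return the names of skills whose trigger phrases appear in the message."""
    text = message.lower()
    return sorted(
        name
        for name, phrases in catalog.items()
        if any(phrase.lower() in text for phrase in phrases)
    )


# Hypothetical trigger phrases for three skills.
catalog = {
    "nda-triage": ["nda", "non-disclosure"],
    "rough-cutter": ["rough cut", "transcript"],
    "ship": ["release gate", "ship it"],
}

active = live_skills("Can you triage this NDA from the vendor?", catalog)
# active == ["nda-triage"]; the other two skills stay out of context
```

Substring matching is crude: "nda" also matches "agenda". That kind of false positive is exactly what invocation logs should surface.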

Context becomes a scarce resource that’s spent deliberately, not a bucket you dump everything into.

Anthropic’s own framing is that context is a finite resource. Skills are one of the cleanest patterns I’ve found for respecting that.

3. Versioned and testable beats prompts-in-code

A prompt embedded in agent code is invisible to most review processes. Code review catches the code around it. The prompt gets waved through because “it’s just a string.”

A skill is a file. It shows up in git blame. Pull requests that change a skill’s behavior are legible. You can tag a skill version and roll back. You can A/B two versions of the same skill by pointing the same trigger at one version or the other.
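One hedged way to run that kind of A/B comparison is to keep the trigger fixed and route each session to a version deterministically. The hash-bucket approach below is my illustration, not something the Skills spec prescribes. The pick_version function and the version labels are hypothetical.

```python
import hashlib


def pick_version(skill: str, session_id: str, versions: tuple[str, str]) -> str:
    """Assign a session to one of two skill versions, stable across calls."""
    digest = hashlib.sha256(f"{skill}:{session_id}".encode()).digest()
    return versions[digest[0] % 2]


# The same session always lands on the same version of the same skill.
chosen = pick_version("ship", "session-1234", ("v1", "v2"))
```

Because the assignment depends only on the skill name and session ID, the invocation log can be split by version after the fact without storing any extra state.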

The cost of introducing a skill is higher than the cost of adding a prompt. The cost of maintaining it is dramatically lower.

Skills as agent affordances

Anthropic describes Skills as folders of instructions, scripts, and resources that make Claude better at specialized tasks. They load through progressive disclosure and can include executable code.

The word I keep coming back to: affordance.

An agent with access to rough-cutter (a transcript-to-timeline editorial flow) is a different agent from one without it. An agent with access to prompt-optimization (a self-improving prompt analyzer) behaves differently from one without it. The skill isn’t just a capability attached to the agent. The skill changes what the agent is.

That’s what “agent affordance” means in the literal sense. A skill affords the agent a way of acting it didn’t have before.

Our 32 skills are the capability map available in each session. The agent can reach for any one of them, compose them, or decide none apply.

Invocation is plural, not single

Three ways a skill gets invoked:

User explicit: user types /skill-name or asks for the capability directly
Agent trigger: agent detects a trigger phrase or context match and auto-invokes
Composition: another skill references this skill, chaining them

Each invocation gets logged with the triggering context. Which means after a few weeks you can ask the actual questions a practitioner wants answered:

Which skills are used most?
Which skills get auto-invoked vs. explicitly called?
Which skills compose together, and which never do?
Which skills haven’t fired in 30 days?

That last question is the interesting one.
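Assuming each invocation is logged as a record with the skill name, the invocation mode, a timestamp, and the triggering context, all four questions reduce to a few lines of counting. The Invocation record, the function names, and the sample log below are hypothetical. For composition entries I assume the context field holds the parent skill's name.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Invocation:
    skill: str
    mode: str      # "user", "agent", or "composition"
    at: datetime
    context: str   # triggering text, or the parent skill for compositions


def usage_counts(log):
    """Which skills are used most?"""
    return Counter(entry.skill for entry in log)


def mode_split(log):
    """Which skills get auto-invoked vs. explicitly called?"""
    split = {}
    for entry in log:
        split.setdefault(entry.skill, Counter())[entry.mode] += 1
    return split


def composed_pairs(log):
    """Which (parent, child) skill pairs compose together?"""
    return Counter((e.context, e.skill) for e in log if e.mode == "composition")


def idle_skills(catalog, log, now, days=30):
    """Which skills haven't fired in the last `days` days?"""
    cutoff = now - timedelta(days=days)
    recent = {e.skill for e in log if e.at >= cutoff}
    return sorted(set(catalog) - recent)


# Hypothetical log and catalog with arbitrary dates.
now = datetime(2026, 1, 31)
log = [
    Invocation("ship", "user", datetime(2026, 1, 20), "/ship"),
    Invocation("prompt-optimization", "composition", datetime(2026, 1, 20), "ship"),
    Invocation("ship", "agent", datetime(2026, 1, 25), "run the release gate"),
    Invocation("rough-cutter", "agent", datetime(2025, 12, 1), "cut this transcript"),
]
catalog = ["ship", "prompt-optimization", "rough-cutter", "nda-triage"]

idle = idle_skills(catalog, log, now)  # ["nda-triage", "rough-cutter"]
```

Note that idle_skills needs the catalog, not just the log. A skill that never fired leaves no trace in the log at all.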

Pruning is the unglamorous discipline

Skills decay.

A skill that hasn’t triggered in 30+ days is not an accomplishment. It’s clutter. It inflates the trigger-matching surface. It makes the catalog harder to maintain. It signals “we started this and didn’t finish.”

Every few weeks someone runs a sweep of the skill-use logs. Dead skills get removed or consolidated. Live skills that duplicate each other get merged. New triggers get added where the logs show near-misses.
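The duplicate-detection half of the sweep can be sketched by comparing trigger sets. The version below uses Jaccard overlap with an arbitrary 0.5 threshold, both of which are my assumptions rather than a description of any real tooling. The overlapping_skills function and the release-check skill are hypothetical.

```python
from itertools import combinations


def overlapping_skills(catalog: dict[str, set[str]], threshold: float = 0.5):
    """Return (skill_a, skill_b, score) for trigger sets that overlap heavily."""
    pairs = []
    for a, b in combinations(sorted(catalog), 2):
        union = catalog[a] | catalog[b]
        if not union:
            continue  # two skills with no triggers share nothing measurable
        score = len(catalog[a] & catalog[b]) / len(union)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical catalog in which two skills compete for the same triggers.
catalog = {
    "ship": {"ship it", "release gate"},
    "release-check": {"release gate", "ship it", "cut a release"},
    "rough-cutter": {"rough cut"},
}

merge_candidates = overlapping_skills(catalog)
# one candidate pair: release-check and ship share 2 of 3 distinct triggers
```

A flagged pair is a prompt for a human decision, not an automatic merge. Two skills can share triggers and still differ in scope.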

Without this discipline, the 32-skill library would be 200 in eighteen months. Half of them stale. The other half competing for the same triggers.

The discipline is unglamorous because it’s the opposite of launching. Launching is creation. Pruning is subtraction. But a skill library without pruning is a fossil record, not a working system.

When not to make a skill

Not every procedural pattern deserves a skill.

A one-off task that you’ll never repeat. Just write the prompt inline and move on. Don’t pollute the library.

A workflow that only one person uses. Keep it in their personal notes, not the shared skill catalog.

A capability that’s already a first-class tool (a CLI, an API endpoint, a library function). Wrap the tool in the agent’s tool set. Don’t re-describe the tool as a skill.

A procedure that’s actively being redesigned. Wait until the shape stabilizes. Writing a skill around a moving target locks in the wrong version.

The test I use: would I want another agent to reach for this, a month from now, by name, without me being there to explain it? If yes, it’s a skill. If no, it’s a note.

The bigger thesis

Prompts describe what to say. Skills describe what to do.

That sounds like a small difference until you watch it compound. A prompt-first agent accumulates instructions until the prompt eats itself. A skills-first agent accumulates capabilities the way a person accumulates skills. Named, practiced, composable, occasionally pruned.

Both approaches can work on a toy. Only one of them stays maintainable when you’re trying to run a real business on top of it.

Public reference implementation: github.com/ragnos-labs/agent-skills. Starts with the /ship release-gate skill; more to come.