Field Manual / agents

Writing SKILL.md files that actually work

Skills are procedures, not vibes. The anatomy of a SKILL.md that triggers reliably, refuses to fabricate, and survives being run by an agent — plus how to test and version skills across a team.

If you remember one thing

A skill's description decides WHEN it loads; its numbered procedure decides WHAT happens. Write the description for the trigger, the body as steps an intern could follow, and add evidence rules and approval gates — then run it three times before you trust it.

Cowork’s custom skills are deceptively simple: a markdown file in OneDrive at /Documents/Cowork/skills/<name>/SKILL.md, YAML frontmatter with a name and description, instructions underneath. Auto-discovered every conversation. Fifty skills max, 1 MB per file, 20 companion files and 10 MB per skill.

Anyone can write one in five minutes. Almost nobody writes one that works on the tenth run, on a bad input, on a colleague’s machine. The difference is craft, and the craft is learnable. Here it is.

Skills are procedures, not vibes

The most common failed skill reads like a job posting: “You are an expert communicator who produces polished, insightful weekly reports.” That’s a vibe. The agent already has vibes. What it doesn’t have is your procedure — the steps you’d dictate to a competent intern doing this task for the first time.

The test for every line you write: could someone who has never met you execute this? “Make it punchy” fails. “Max 150 words, lead with the decision needed, no adjectives in the subject line” passes. If the line isn’t executable, it’s decoration, and decoration in a skill file is worse than useless — it dilutes the instructions that matter.

The anatomy, part one: frontmatter decides WHEN

The name and description in the YAML frontmatter aren’t metadata. They’re the routing layer. Each conversation, Cowork scans your skills and decides which ones are relevant to the task at hand — and it decides based on the description.

So write the description for the trigger, not the marketing:

# Bad — describes how great the skill is
description: A comprehensive framework for excellent status reporting

# Good — describes when to fire
description: Use when asked to write, draft, or update my weekly
  status report for my manager, or anything mentioning "Friday update"

The good version names the verbs, the artifact, and the magic phrases real requests contain. If your skill chip never lights up in the side panel, this is the line to fix — the body of the skill is never even read if the description doesn’t match.

The anatomy, part two: the body decides WHAT

Numbered procedures beat prose. Always. A paragraph gets interpreted; a numbered list gets executed. Steps also give the agent (and you, watching the Progress panel) a checkpoint structure:

1. Search my sent mail and meetings from the last 7 days.
2. Extract: decisions made, items I committed to, blockers raised.
3. Draft in the format from template.md (companion file).
4. List anything you could not verify from a real source.
5. Show me the draft. Do NOT send until I approve.

Five lines, and notice that three of the patterns below are already in them.

Evidence rules and anti-fabrication clauses. An agent running unattended will pad thin material to look complete — same retrieval-starvation behavior you know from Copilot Chat, but now it ships in a deliverable. Every skill that summarizes, reports, or analyzes needs an explicit clause: “Only include items you found in actual mail, files, or transcripts. If a section has nothing, write ‘nothing this week’ — do not generalize or infer.” Cheap to write, saves you from the worst failure mode a skill has.

Approval gates on anything that sends or saves. Cowork has a system-level approval flow, but your skill should declare its own gates too: “Show me the final draft before sending,” “Save to /Drafts, never overwrite the original.” Belt and suspenders — because the day someone (maybe future-you) runs this skill with Approve All on, the in-skill gate is the only one left.

Separate STYLE from HARD RULES. Literally use those headings. Style (“prefer short sentences, warm tone”) is guidance the agent may bend under pressure from your prompt. Hard rules (“never include salary data, never email outside the org, always cite the source file”) must survive any prompt. Mixing them teaches the agent that everything is negotiable. Separating them is the closest thing SKILL.md has to a permission system.

The thin-week principle. Design for the bad input. A reporting skill written against a busy week produces fiction on a quiet one — unless you’ve told it that thin weeks should read thin. “A two-line report on a slow week is correct output” is one sentence that immunizes the skill against its most embarrassing failure.

Companion files: stop describing, start showing

You get 20 companion files per skill. Use them for the things prose describes badly: the actual report template, two or three real examples of good output, the boilerplate disclaimer, the list of project codenames. “Match the structure of template.md” outperforms three paragraphs describing that structure — and when the format changes, you update one file instead of rewriting instructions.

A skill with a template and two worked examples behaves like a different species from the same skill without them.

Testing: run it three times

Nobody tests skills, which is why most skills work once. The protocol:

  1. Run it three times, varying the inputs — a normal case, a thin case (empty week, missing file), and a messy case (ambiguous request, conflicting sources). One run tells you it can work; three tell you whether it does work.
  2. Watch the side-panel chips. Did your skill actually load? If the chip didn’t appear, the description failed to trigger and the agent winged it — and a winged success is worse than a failure, because it teaches you false confidence.
  3. Read the output against the hard rules, not against “looks good.” Did it cite sources? Did it stop at the approval gate? Did the thin case read thin?

By the way: you don’t have to start from a blank file. Ask Cowork to create the skill in natural language and it walks you through it — then apply everything above as the editing pass.

Versioning across a team

Here’s the structural problem: skills live in each user’s OneDrive. There is no tenant-wide skill deployment for custom skills — so the moment two people on your team hand-tune their own copies of “the weekly report skill,” you have a fork, and forks rot.

The pattern that works: a shared library folder plus copy-in.

  • Keep the canonical SKILL.md files (and companion files) in a shared location your team already uses — a SharePoint library or shared OneDrive folder, one subfolder per skill, with a version note at the top of each file.
  • Each user copies skills into their own /Documents/Cowork/skills/ folder. Copy-in is the deployment step.
  • Changes go to the library first, then announce “v3 of weekly-report is up, re-copy.” Crude? Yes. But it gives you a single source of truth, a review point before changes spread, and a way to answer “which version are you running?” when someone’s output looks wrong.

Treat the library like code: someone owns each skill, edits get a one-line changelog, and the worked examples get refreshed when the format changes.

The open-standard angle

SKILL.md isn’t a Microsoft invention — it follows the Agent Skills open standard (agentskills.io), the same format used by Claude Code, GitHub Copilot, Cursor, and a growing list of agentic tools. The practical consequences are bigger than they sound:

  • Your skills are portable. The procedure you wrote for Cowork drops into a developer’s Claude Code or Cursor setup with little or no editing. Org knowledge encoded once, usable everywhere your people run agents.
  • The craft transfers too. Everything in this guide — trigger-focused descriptions, numbered procedures, anti-fabrication clauses, hard rules — applies in every tool that reads the format. You’re not learning a Cowork dialect; you’re learning to write procedures for agents, full stop.

That’s the real return on doing this well. A vibe prompt is disposable. A tested, versioned, portable procedure is an asset — and right now, almost nobody in your org knows how to write one.

← All guides Steal prompts from the Vault →