AI Reference

How To Build And Design Skills

This page is a practical playbook for building reliable AI skills that trigger correctly, run predictable workflows, and stay maintainable as your team scales.

Last updated: 2026-03-11

1. Start with 2-3 concrete outcomes

What to do

Define what the user wants to accomplish in plain language before writing the skill file. Keep scope narrow at first.

Why it matters

A focused scope makes triggering cleaner, instructions shorter, and testing easier. Broad skills often underperform because intent is unclear.

Example

Good: "Set up sprint tasks in Linear from a project brief." Bad: "Help with project management."

Common mistake: Trying to solve every workflow in v1.

2. Write description text for real user phrasing

What to do

In frontmatter, state what the skill does, when to use it, and include realistic trigger phrases users would actually type.

Why it matters

Triggering quality is mostly determined by description clarity. If it is vague, the skill is either ignored or over-triggered.

Example

Use phrases like "create tickets", "plan sprint", "generate handoff" instead of generic text like "helps with tasks."

Common mistake: Using internal architecture jargon instead of user language.
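A description written in that spirit might look like the following frontmatter sketch (the skill name and phrases are hypothetical, chosen to match the "sprint tasks" example above):

```yaml
---
name: sprint-planner
description: Create sprint tasks in Linear from a project brief. Use when the
  user says "plan sprint", "create tickets", or "set up sprint tasks".
metadata:
  version: 1.0.0
---
```

Note that the description states both what the skill does and when to use it, and the trigger phrases are things a user would actually type.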

3. Use a strict, predictable skill structure

What to do

Use kebab-case folder names, name the entry file exactly SKILL.md, and split large reference material into references/ and tooling into scripts/.

Why it matters

Predictable structure reduces upload and configuration failures and keeps maintenance manageable as a team's skill count grows.

Example

Use customer-onboarding-skill/SKILL.md with references/api-errors.md and scripts/validate_input.py.

Common mistake: Scattering instructions across random files or misnaming SKILL.md.
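The example above corresponds to a layout like this:

```
customer-onboarding-skill/
├── SKILL.md
├── references/
│   └── api-errors.md
└── scripts/
    └── validate_input.py
```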

4. Make every step explicit and executable

What to do

Write instructions as ordered steps with clear inputs, expected output, and completion checks for each phase.

Why it matters

Ambiguous steps create inconsistent behavior. Explicit steps produce stable outcomes across sessions and users.

Example

Step: "Call create_subscription with plan_id and customer_id, then verify subscription status is active."

Common mistake: Using vague instructions like "handle payment setup properly."
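A step in that form can be sketched as a small function: known inputs, one call, one completion check. The create_subscription client below is a hypothetical stub for illustration; a real skill would call the actual billing API.

```python
# Hypothetical API stub for illustration; a real skill would call the billing service.
def create_subscription(plan_id: str, customer_id: str) -> dict:
    return {"id": "sub_123", "status": "active"}

def provision_subscription(plan_id: str, customer_id: str) -> dict:
    """One explicit step: clear inputs, one call, one completion check."""
    sub = create_subscription(plan_id, customer_id)
    # Completion check: do not proceed to the next step on an inactive subscription.
    if sub.get("status") != "active":
        raise RuntimeError(f"Subscription {sub.get('id')} is not active")
    return sub
```

The completion check is what makes the step executable rather than aspirational: the next phase can assume an active subscription.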

5. Validate before side effects

What to do

Add preflight checks before create/update/delete actions. Prefer deterministic script-based checks for critical rules.

Why it matters

Validation gates prevent invalid writes and reduce expensive retries in production systems.

Example

Before creating a deployment ticket: verify service name exists, owner is assigned, and deadline is not in the past.

Common mistake: Only validating after failures happen.
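A deterministic preflight check for the deployment-ticket example could look like this sketch (field names are assumptions for illustration):

```python
from datetime import date

def preflight_deployment_ticket(ticket: dict, known_services: set) -> list:
    """Return a list of human-readable validation errors; create the ticket only if empty."""
    errors = []
    if ticket.get("service") not in known_services:
        errors.append("service name does not exist")
    if not ticket.get("owner"):
        errors.append("no owner assigned")
    deadline = ticket.get("deadline")
    if deadline is None or deadline < date.today():
        errors.append("deadline is missing or in the past")
    return errors
```

Collecting all errors in one pass, instead of failing on the first, lets the skill report every missing field to the user at once.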

6. Include error handling and recovery paths

What to do

Document common failure modes, likely causes, and exact remediation steps users or the skill should take.

Why it matters

Most real-world breakage comes from auth, connectivity, or data shape issues. Recovery guidance keeps workflows usable.

Example

If API returns 401: refresh token, retry once, then ask user to reconnect integration if still unauthorized.

Common mistake: Assuming happy-path infrastructure with no retries or fallback.
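The 401 recovery path above can be written as a small wrapper, sketched here with generic request and refresh callables rather than any particular HTTP client:

```python
def call_with_auth_recovery(request, refresh_credentials):
    """On a 401, refresh credentials and retry once; if still unauthorized, escalate."""
    response = request()
    if response.get("status") == 401:
        refresh_credentials()
        response = request()
        if response.get("status") == 401:
            # Remediation is exhausted; hand control back to the user.
            raise PermissionError("Still unauthorized: ask the user to reconnect the integration.")
    return response
```

Capping the retry at one attempt keeps the failure mode bounded; the escalation message tells the user exactly what to do next.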

7. Keep core instructions short and defer deep docs

What to do

Put only decision-critical instructions in SKILL.md; link detailed specs, error catalogs, and templates in references/.

Why it matters

Smaller core instructions reduce context load and keep model attention on the active workflow.

Example

In SKILL.md: "For pagination edge cases, consult references/pagination-playbook.md."

Common mistake: Dumping full API manuals inside the main skill file.

8. Test triggering, function, and performance separately

What to do

Run tests in three buckets: trigger accuracy, output correctness, and baseline-vs-skill efficiency.

Why it matters

A skill can produce correct outputs but still fail in production if it triggers poorly or wastes tokens/tool calls.

Example

Track: trigger hit-rate on paraphrases, failed calls per run, and total tool calls compared to no-skill baseline.

Common mistake: Only doing one manual demo and calling it done.
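Trigger hit-rate, for example, reduces to a simple comparison over a labeled phrase set that mixes paraphrases (should trigger) with off-topic phrases (should not):

```python
def trigger_hit_rate(expected: list, observed: list) -> float:
    """Fraction of test phrases where the trigger decision matched the label.

    `expected` holds True/False labels per phrase; `observed` holds what
    actually happened when each phrase was run.
    """
    hits = sum(1 for want, got in zip(expected, observed) if want == got)
    return hits / len(expected)
```

Functional correctness and token/tool-call efficiency need their own harnesses; this metric only covers the first bucket.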

9. Choose one orchestration pattern per workflow

What to do

Pick an explicit pattern (sequential steps, multi-service phases, refinement loop, or context-based routing) for each use case.

Why it matters

Pattern clarity prevents instruction conflicts and makes troubleshooting much faster.

Example

Design handoff: Figma export -> Drive upload -> Linear task creation -> Slack notification.

Common mistake: Mixing multiple patterns in one unstructured instruction block.
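The sequential pattern in the design-handoff example can be sketched as an explicit step chain over a shared context dict; the step functions and return values here are hypothetical stand-ins for the real integrations:

```python
# Hypothetical step functions; each takes and returns the shared context,
# so the chain stays explicit and each phase is easy to troubleshoot.
def export_figma(ctx):
    return {**ctx, "export_url": "https://example.com/design.fig"}

def upload_to_drive(ctx):
    return {**ctx, "drive_file_id": "file_123"}

def create_linear_task(ctx):
    return {**ctx, "task_id": "LIN-42"}

def notify_slack(ctx):
    return {**ctx, "notified": True}

PIPELINE = [export_figma, upload_to_drive, create_linear_task, notify_slack]

def run_handoff(ctx):
    for step in PIPELINE:
        ctx = step(ctx)
    return ctx
```

Because the pattern is a flat ordered list, a failure can be localized to one step and the instructions for each phase never conflict.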

10. Treat skills as living assets with versioned iteration

What to do

Use under-triggering, over-triggering, and correction frequency as feedback signals. Update description/instructions and version metadata.

Why it matters

Most skills improve through operational feedback, not first-draft perfection.

Example

If users keep manually enabling a skill, add missing trigger terms and clarify scope boundaries.

Common mistake: Never revisiting frontmatter after launch.

Starter Template

Use this minimal template as a clean starting point, then iterate based on real workflow feedback.

```markdown
---
name: customer-onboarding
description: End-to-end onboarding for new e-commerce customers. Use when user says "onboard customer", "create subscription", or "set up billing".
metadata:
  version: 1.0.0
  category: onboarding
---

# Customer Onboarding

## Step 1: Validate Inputs
- Confirm customer name, email, and plan are present.
- Stop and ask for missing fields before any write.

## Step 2: Create Customer
- Call create_customer.
- Verify returned customer_id is non-empty.

## Step 3: Create Subscription
- Call create_subscription with customer_id and plan_id.
- Confirm status is active.

## Common Issues
- If auth fails: refresh credentials, retry once, then ask user to reconnect integration.
```

Validation Checklist

1. Scope is limited to 2-3 concrete workflows.
2. Frontmatter description includes both WHAT and WHEN.
3. Trigger phrases reflect real user wording, not internal jargon.
4. Instructions are step-based with validation gates before side effects.
5. Common errors are documented with exact fixes.
6. references/ is used for deep docs instead of overloading SKILL.md.
7. Triggering tests include paraphrases and explicit non-trigger cases.
8. Functional tests verify outputs, tool call success, and edge-case handling.
9. Performance is compared against a baseline workflow without the skill.
10. Metadata version is updated after meaningful behavior changes.
