Agent Tooling
Agent tooling is not a separate product surface. It adds progressively higher-level adapters over the same metadata, validation rules, and package APIs that humans use. Build the slow deterministic layer first. Add model-facing tools only after the lower layer can answer the question or reject the invalid state.
Policy Compiler
Turn repeated guidance into facts, gates, commands, and agent tools.
Build Inside Out
- Metadata is the durable contract. Generate facts once, then let docs, lint, CLI, skills, apps, and MCP consume them. If a fact exists only in a prompt, README, or tool description, the harness does not own it.
- Static tools come next. When agents repeat a mistake, encode the fix as a type or schema, lint rule, test, CLI validation, or MCP tool. If CI cannot enforce a rule that a parser can see, the harness is incomplete.
- CLI and skills adapt the deterministic layer for human and agent workflows. The CLI proves a capability without chat context. Skills should carry workflow order and project policy, not duplicate API catalogs.
- MCP and MCP Apps expose existing services to agents through narrow schemas, structured outputs, and explicit side-effect annotations. They should mirror the service layer, not own domain logic.
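As a rough illustration of "generate facts once, then let consumers read them," the sketch below has one small fact list feed both a docs renderer and a lint check. Everything here is hypothetical: the `CommandFact` shape, the `FACTS` list, and the `render_docs` and `lint_script` helpers are invented for the example, and a real harness would generate the facts from source rather than hand-write them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandFact:
    """One generated fact about a CLI command (hypothetical shape)."""
    name: str
    summary: str
    deprecated: bool = False


# Assumption: in a real harness this list is generated, not hand-written.
FACTS = [
    CommandFact("build", "Compile the package."),
    CommandFact("publish", "Upload a release.", deprecated=True),
]


def render_docs(facts):
    """Docs consumer: one line per command, derived only from the facts."""
    lines = []
    for fact in facts:
        suffix = " (deprecated)" if fact.deprecated else ""
        lines.append(f"- {fact.name}: {fact.summary}{suffix}")
    return "\n".join(lines)


def lint_script(facts, script):
    """Lint consumer: flag unknown or deprecated commands in a script."""
    known = {fact.name: fact for fact in facts}
    problems = []
    for number, line in enumerate(script.splitlines(), start=1):
        words = line.split()
        if not words:
            continue  # skip blank lines
        command = words[0]
        if command not in known:
            problems.append(f"line {number}: unknown command '{command}'")
        elif known[command].deprecated:
            problems.append(f"line {number}: '{command}' is deprecated")
    return problems
```

Because both consumers read the same `FACTS`, marking a command as deprecated changes the docs and the lint result together, with no prompt or README to keep in sync.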
Layering Rules
- Start with metadata. Add or fix the generated fact before building consumers.
- Fail statically. Prefer lint, types, and tests for any rule a parser can verify.
- Prove with CLI. Make the command usable by humans before agents call it.
- Guide with skills. Put workflow order, repository policy, and validation habits in skills.
- Expose through MCP last. Use MCP as a focused access point over existing services and standardized functionality.
- Do not hide facts in prompts. Prompts are runtime hints, not durable data.
- Do not make MCP the source of truth. Treat it as an adapter over services.
- Do not ship model-only validation. If CI cannot enforce it, the harness is incomplete.
- Do not return raw dumps. Distill context before it reaches the agent.
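The layering above might look like the following minimal sketch: one service function owns the logic and distills its result, a CLI adapter proves it works without chat context, and an agent-facing adapter wraps the same call in a structured envelope. All names (`package_summary`, `cli_main`, `tool_package_summary`, the `_PACKAGES` data) are assumptions made for illustration, and the tool adapter is a plain function standing in for whatever MCP server library a project actually uses.

```python
import argparse
import json
import sys


class UnknownPackage(Exception):
    """Raised by the service layer for a package it does not know."""


# Assumption: a tiny in-memory stand-in for the real service layer's data.
_PACKAGES = {
    "core": {"version": "2.1.0", "deps": ["utils"], "changelog": "long text " * 50},
    "utils": {"version": "1.4.2", "deps": [], "changelog": "long text " * 50},
}


def package_summary(name):
    """Service layer: owns the domain logic and returns a distilled result."""
    if name not in _PACKAGES:
        raise UnknownPackage(f"unknown package: {name}")
    record = _PACKAGES[name]
    # Distill: return only what the task needs, not the raw record.
    return {"name": name, "version": record["version"], "deps": record["deps"]}


def cli_main(argv):
    """CLI adapter: usable by a human with no chat context."""
    parser = argparse.ArgumentParser(prog="pkg-summary")
    parser.add_argument("name")
    args = parser.parse_args(argv)
    try:
        print(json.dumps(package_summary(args.name)))
    except UnknownPackage as error:
        print(error, file=sys.stderr)
        return 1
    return 0


def tool_package_summary(arguments):
    """Agent-facing adapter: checks input shape, then defers to the service."""
    name = arguments.get("name")
    if not isinstance(name, str):
        return {"ok": False, "error": "name must be a string"}
    try:
        return {"ok": True, "result": package_summary(name)}
    except UnknownPackage as error:
        return {"ok": False, "error": str(error)}
```

Neither adapter contains domain logic, so deleting the agent-facing function loses nothing that the CLI and service layer cannot still do.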
Decision Checklist
Before creating a new agent-facing tool, answer these questions:
- What metadata does this tool need, and where is that metadata generated?
- Which invalid states can type checking, JSON Schema, linting, or tests reject first?
- Can a human use the same capability through the CLI without chat context?
- Does the tool have a bounded input schema and a structured output schema?
- Is the result distilled for the task, or does it push context cleanup onto the model?
- Is this capability general enough for MCP, or is it only workflow context for a skill?
- What test fails if the tool disappears, changes shape, or starts returning stale data?
If the answer starts with "tell the model to remember," stop. Build the harness layer that makes remembering unnecessary.
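Two of the checklist questions, the bounded input schema and the test that fails when the tool changes shape, can be sketched in a few lines. This is only an illustration under stated assumptions: `search_symbols`, `SearchInput`, `MAX_RESULTS`, and the `_SYMBOLS` list are hypothetical, and a real project might express the same bounds in JSON Schema instead of a dataclass.

```python
from dataclasses import dataclass

MAX_RESULTS = 20  # hypothetical upper bound on results per call


@dataclass(frozen=True)
class SearchInput:
    """Bounded input for a hypothetical `search_symbols` tool."""
    query: str
    limit: int = 5

    def __post_init__(self):
        if not isinstance(self.query, str) or not self.query.strip():
            raise ValueError("query must be a non-empty string")
        if isinstance(self.limit, bool) or not isinstance(self.limit, int):
            raise ValueError("limit must be an integer")
        if not 1 <= self.limit <= MAX_RESULTS:
            raise ValueError(f"limit must be between 1 and {MAX_RESULTS}")


# Assumption: stand-in for generated metadata.
_SYMBOLS = ["parse_config", "parse_args", "render_docs", "run_lint"]


def search_symbols(params):
    """Return a structured, size-bounded result."""
    matches = [symbol for symbol in _SYMBOLS if params.query in symbol]
    return {
        "query": params.query,
        "total": len(matches),
        "matches": matches[: params.limit],
    }


def test_contract():
    """Fails if the tool changes shape or stops rejecting invalid input."""
    result = search_symbols(SearchInput("parse", limit=1))
    assert set(result) == {"query", "total", "matches"}
    assert result["total"] == 2
    assert result["matches"] == ["parse_config"]
    invalid = ({"query": ""}, {"query": "x", "limit": 0}, {"query": "x", "limit": 999})
    for bad in invalid:
        try:
            SearchInput(**bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted invalid input: {bad}")
```

Running `test_contract()` in CI means a renamed key, a missing bound, or a silently widened limit fails a build instead of relying on the model to notice.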