SkillAudit
- Type: Benchmark
- Year: 2026
- Status: active
Skill-centered assessment of agent skills across three dimensions (utility, efficiency and cost, and safety), backed by evidence from sandboxed execution.
SkillAudit accepts an arbitrary agent skill package, derives capability-aligned evaluation tasks, runs paired experiments in isolated sandboxes, and produces auditable reports covering utility, efficiency, cost, and safety. A Chromium extension surfaces the results when developers are deciding whether to install a skill.
Unlike a fixed benchmark suite, SkillAudit generates its evaluation around the capabilities each submitted skill claims. The paired setup compares agent behavior with and without the skill, and preserves execution traces and safety evidence for review.
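The paired comparison can be illustrated with a minimal sketch. It assumes that each sandboxed run is summarized as a record with success, token, cost, timing, and safety-flag fields, and that results are aggregated as simple per-task mean differences. `RunResult`, `paired_summary`, and all field names are hypothetical. They are not SkillAudit's actual schema or statistics.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class RunResult:
    """Outcome of one sandboxed run of a task (hypothetical schema)."""
    task_id: str
    with_skill: bool
    success: bool        # did the agent complete the task?
    tokens: int          # total tokens consumed
    cost_usd: float      # model/API spend for the run
    wall_time_s: float   # elapsed time in the sandbox
    safety_flags: list[str] = field(default_factory=list)  # e.g. "network_egress"


def paired_summary(results: list[RunResult]) -> dict:
    """Compare skill-enabled and baseline runs task by task."""
    by_task: dict[str, dict[bool, RunResult]] = {}
    for r in results:
        by_task.setdefault(r.task_id, {})[r.with_skill] = r

    # Keep only tasks that have both arms of the pair.
    pairs = [(arms[True], arms[False]) for arms in by_task.values()
             if True in arms and False in arms]
    if not pairs:
        raise ValueError("no complete with/without pairs")

    return {
        "tasks": len(pairs),
        # Utility: change in success rate (positive means the skill helps).
        "success_delta": mean(int(w.success) - int(b.success) for w, b in pairs),
        # Efficiency and cost: mean per-task differences (negative means cheaper).
        "token_delta": mean(w.tokens - b.tokens for w, b in pairs),
        "cost_delta_usd": mean(w.cost_usd - b.cost_usd for w, b in pairs),
        "time_delta_s": mean(w.wall_time_s - b.wall_time_s for w, b in pairs),
        # Safety: flags raised only when the skill was installed, keyed by
        # task id so each finding can be traced back to its run.
        "skill_only_flags": {
            w.task_id: sorted(set(w.safety_flags) - set(b.safety_flags))
            for w, b in pairs
            if set(w.safety_flags) - set(b.safety_flags)
        },
    }


# Usage with made-up illustrative runs (not real SkillAudit data).
runs = [
    RunResult("t1", True, True, 1200, 0.012, 30.0),
    RunResult("t1", False, False, 1500, 0.015, 42.0),
    RunResult("t2", True, True, 900, 0.009, 20.0, ["network_egress"]),
    RunResult("t2", False, True, 800, 0.008, 18.0),
]
summary = paired_summary(runs)
print(summary)
```

Pairing each skill-enabled run with its baseline on the same task keeps differences in task difficulty from skewing the comparison. Keying safety flags by task id keeps every finding traceable to a specific execution trace.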