About Evaluation Items for AI Collaboration Skills

"AI Collaboration Skills" is a category that evaluates how well a developer working with an AI coding assistant instructs the AI, prepares the information and environment it needs, and raises the quality of deliverables by validating its output.

Assessment is based not simply on whether AI is being used, but on how AI is integrated into the development process and whether the user takes responsibility for the final quality.

Structure of Evaluation Items

AI Collaboration Skills
├── Agent Environment Construction
│   ├── Static Context
│   └── Tool Utilization
├── AI Instruction
│   ├── Task Design
│   ├── Instruction Clarity
│   └── Dynamic Context
├── Iterative Improvement with AI
│   └── Convergence Efficiency
└── Validation of AI Output
    └── AI Output Review

AI Collaboration Skills consist of the following four categories:

Category	Description
Agent Environment Construction	Preparations to ensure the AI understands the project and can work appropriately.
AI Instruction	The ability to clearly communicate objectives, constraints, and procedures to the AI.
Iterative Improvement with AI	The ability to analyze the causes of flawed AI outputs or errors and to improve on them efficiently.
Validation of AI Output	The ability of the human to verify AI-generated code and deliverables and ensure their quality.

List of Evaluation Items

Static Context

Evaluates whether documents such as AGENT.md, CLAUDE.md, custom rules, and skill settings have been established so that the AI can understand project premises before starting work.

It is crucial that the project structure, coding conventions, and guidelines are defined in advance so that the AI can work in accordance with them.

Tool Utilization

Evaluates whether CLI tools, MCP, external integration tools, and the like are used to improve the efficiency and accuracy of the AI's work.

Ideally, the user selects appropriate tools based on the task and switches between them while considering characteristics such as speed, accuracy, and token consumption.

Task Design

Evaluates whether tasks requested of the AI are broken down into appropriate levels of granularity.

For multi-step tasks, the user is expected to consider dependencies and execution order while clarifying the deliverables and completion criteria for each phase. This item assesses whether the user organizes work into a form that is easy for the AI to handle, rather than simply offloading large tasks as-is.
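As a minimal sketch of what considering dependencies and execution order can look like, the Python example below describes a multi-step task as phases with explicit dependencies and completion criteria, then derives an order in which to hand the phases to the AI. The phase names, the `depends_on` and `done_when` keys, and the `execution_order` helper are all hypothetical and chosen only for illustration; they are not part of the evaluation criteria.

```python
from graphlib import TopologicalSorter

# Hypothetical phases of one multi-step task (illustrative names only).
# Each phase lists the phases it depends on and its completion criterion.
phases = {
    "design_schema": {"depends_on": [], "done_when": "schema agreed with reviewer"},
    "write_migration": {"depends_on": ["design_schema"], "done_when": "migration runs cleanly"},
    "implement_endpoint": {"depends_on": ["write_migration"], "done_when": "endpoint returns stored data"},
    "add_tests": {"depends_on": ["implement_endpoint"], "done_when": "tests pass in CI"},
}


def execution_order(phases):
    """Return phase names so that every phase comes after its dependencies."""
    graph = {name: set(spec["depends_on"]) for name, spec in phases.items()}
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


for name in execution_order(phases):
    print(f"{name}: done when {phases[name]['done_when']}")
```

Writing the phases down this way makes it harder to hand over a large task without stating what each step must deliver before the next one starts.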

Instruction Clarity

Evaluates whether instructions given to the AI are specific and easy to understand.

When goals, constraints, scope, prohibitions, expected output formats, and completion criteria are clearly communicated, it becomes easier for the AI to proceed without hesitation. If instructions are ambiguous, the AI is more likely to engage in unnecessary exploration or guesswork, leading to lower quality and efficiency.
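The elements listed above can be treated as a checklist before an instruction is sent. The following Python sketch is one possible way to do that, assuming an instruction is drafted as six free-text fields; the `Instruction` class and the `missing_elements` helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class Instruction:
    """Hypothetical container for the elements of a clear instruction."""
    goal: str = ""
    constraints: str = ""
    scope: str = ""
    prohibitions: str = ""
    output_format: str = ""
    completion_criteria: str = ""


def missing_elements(instruction):
    """Return the names of the elements that are still empty."""
    return [
        f.name
        for f in fields(instruction)
        if not getattr(instruction, f.name).strip()
    ]


draft = Instruction(goal="Fix the failing login test", scope="auth/ module only")
print(missing_elements(draft))
# → ['constraints', 'prohibitions', 'output_format', 'completion_criteria']
```

A check like this only confirms that each element was written down; whether the wording is specific enough still requires human judgment.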

Dynamic Context

Evaluates whether the information needed for each task is provided to the AI at the appropriate time.

It is important to provide relevant files, specification documents, background information, and change logs in the right amount, neither too much nor too little, and to add or update information as the work progresses. If necessary information is missing, the AI may explore unrelated files or work from incorrect premises.

Convergence Efficiency

Evaluates whether problems are solved efficiently through interaction with the AI.

When errors or bugs occur, this assesses whether the user analyzes the cause, provides corrective instructions, and reaches the goal in a small number of iterations. Repeatedly giving the same instructions or continuing trial-and-error without investigating the cause will result in a lower evaluation.
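One of the patterns described above, repeating the same instruction, can be spotted mechanically in a session log. The Python sketch below is a rough heuristic under the assumption that the user's instructions are available as a list of strings; the `repeated_instructions` helper is hypothetical and does not represent how the evaluation is actually scored.

```python
from collections import Counter


def repeated_instructions(instructions):
    """Return instructions that occur more than once.

    Comparison ignores letter case and surrounding whitespace, so this
    only catches near-verbatim repeats, not rephrased ones.
    """
    counts = Counter(text.strip().lower() for text in instructions)
    return [text for text, n in counts.items() if n > 1]


# Hypothetical session: the same request is sent twice before the cause is examined.
session = [
    "Fix the test",
    "fix the test ",
    "The stack trace points at token expiry; adjust the expiry check",
]
print(repeated_instructions(session))
# → ['fix the test']
```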

AI Output Review

Evaluates whether a human is properly verifying the code and deliverables generated by the AI.

It is vital not to accept AI output blindly, but to review the content and correct, improve, or reject it if issues are found. This assesses whether the user ensures final quality through their own judgment rather than leaving it entirely to the AI.

Evaluation Philosophy

In AI Collaboration Skills, what is evaluated is the process of collaborating with the AI, not just the results of using it.

High evaluations are given for the following states:

The environment and rules are prepared in advance to make it easy for the AI to work.
Task objectives, constraints, and expected deliverables are clearly communicated.
Necessary context, such as files and specifications, is provided appropriately.
When problems occur, improvements are made efficiently by analyzing the cause.
The human takes responsibility for verification rather than taking AI output at face value.

Conversely, states such as offloading tasks entirely to the AI, giving ambiguous instructions, failing to provide necessary information, or adopting generated content without verification tend to result in low evaluations.