SkillComposer: Structured Skill Composition Generation for LLM Agents

This paper addresses the skill selection bottleneck that large language model (LLM) agents face in complex tasks by proposing SkillComposer, a structured skill composition generation framework. Existing approaches treat skill selection as an independent retrieval or reasoning problem and ignore the strong coupling among the skill subset, the number of skills, and their execution order. SkillComposer instead formalizes the process as task-conditioned skill sequence prediction: a constrained autoregressive decoder jointly determines the activated skill subset, its size, and the execution order in a single decoding pass. Experiments on the SkillsBench benchmark with a human-curated skill library show that SkillComposer improves pass rates by 23.1 and 18.2 percentage points over skillless baselines on GPT-5.2-Codex and Gemini-3-Pro-Preview, respectively, surpassing top-3 retrieval strategies while approaching the upper bound of golden-skill retrieval at lower prompt token cost.

Background and Context

The integration of large language model agents into complex problem-solving workflows has revealed a critical bottleneck in skill selection. As agents increasingly rely on modular packages that encapsulate procedural knowledge and instructions, the available skill libraries have grown significantly. This growth increases the potential for reuse across domains, but it also introduces a fundamental challenge: identifying the right subset of skills from a large repository. Current mainstream methods typically take one of two routes. The first exposes the full skill collection to the agent and leaves the choice to its reasoning process, while the second relies on embedding vectors or LLM-based rerankers to retrieve relevant skills. Although these approaches offer valuable insights into tool usage, they treat skill selection as an independent retrieval or reasoning problem. This perspective ignores the strong coupling between the subset of skills chosen, the number of skills activated, and their execution order. Consequently, existing methods struggle to model the interdependencies between skills, which limits agent performance in scenarios where sequential logic and combined tool usage are essential.

To address this structural deficiency, the research introduces SkillComposer, a framework that formalizes skill selection as a task-conditioned skill sequence prediction problem. Rather than viewing skill selection as a series of disjointed steps, SkillComposer treats it as a unified composition task. This shift in perspective aligns more closely with the logical requirements of actual programming and task execution, where the decision of which tools to use is inextricably linked to when and in what order they are invoked. By redefining the problem space, the framework aims to capture the nuanced dependencies that traditional retrieval methods miss, thereby enabling agents to construct more coherent and effective action plans for complex tasks.
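One way to write this formulation down is the standard autoregressive factorization. The notation is assumed here for illustration and is not taken from the paper: x denotes the task, \mathcal{S} the skill library, s_t the t-th selected skill, and an end-of-sequence symbol terminates the composition.

```latex
P(s_1, \dots, s_n \mid x)
  = \Bigl[\prod_{t=1}^{n} P(s_t \mid x, s_{<t})\Bigr]\,
    P(\langle\mathrm{eos}\rangle \mid x, s_{1:n}),
\qquad s_t \in \mathcal{S}.
```

Under this reading, the subset is the set of emitted identifiers, the count n is fixed by where the end-of-sequence symbol appears, and the order is the order of emission.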

Deep Analysis

The core innovation of SkillComposer lies in its use of a constrained autoregressive decoder to predict skill identifiers directly. This architectural choice allows the model to jointly determine the activated skill subset, the number of skills, and their execution order within a single decoding pass. Unlike multi-step heuristic rules or independent modules that are pieced together, this end-to-end sequence prediction approach ensures that the dependencies between consecutive skills are naturally captured. Each subsequent skill prediction is conditioned on the previously generated sequence, allowing the model to learn and enforce logical constraints dynamically. This design not only simplifies the system architecture but also significantly enhances the accuracy and executability of the generated plans by constraining the decoding space to valid combinations.
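The decoding idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `score_fn` interface, the `EOS` marker, the greedy search, the no-repeat constraint, and the toy skill names are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

EOS = "<eos>"  # hypothetical end-of-sequence marker; emitting it fixes the skill count


def constrained_decode(
    score_fn: Callable[[str, Sequence[str]], Dict[str, float]],
    task: str,
    skill_ids: Sequence[str],
    max_skills: int = 5,
) -> List[str]:
    """Greedy decoding restricted to library skills that are not yet used."""
    chosen: List[str] = []
    while len(chosen) < max_skills:
        # Each step is conditioned on the task and the prefix generated so far.
        scores = score_fn(task, chosen)
        # Constraint: only unused library identifiers plus EOS are candidates.
        allowed = [s for s in skill_ids if s not in chosen] + [EOS]
        best = max(allowed, key=lambda s: scores.get(s, -math.inf))
        if best == EOS:
            break
        chosen.append(best)
    return chosen


def toy_score(task: str, prefix: Sequence[str]) -> Dict[str, float]:
    """Hypothetical scorer standing in for a trained model."""
    plan = ["parse_csv", "plot_chart"]
    nxt = plan[len(prefix)] if len(prefix) < len(plan) else EOS
    # The out-of-library identifier scores highest but can never be emitted.
    return {nxt: 1.0, "not_in_library": 5.0}


library = ["parse_csv", "plot_chart", "send_email"]
composition = constrained_decode(toy_score, "plot monthly sales", library)
print(composition)
```

The mask is what makes the decoding constrained: an identifier outside the library is never a candidate, however highly the scorer rates it. A real system would apply the same mask to model logits and might use beam search instead of the greedy choice shown here.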

The training data for SkillComposer was constructed from a human-curated skill library, which helps ensure quality and relevance. By extracting task-composition pairs from real-world scenarios, the researchers ensured that the model learned from practical examples of effective skill usage, an empirical foundation that supports generalization to unseen tasks. The constrained decoding mechanism plays a pivotal role here, as it prevents the generation of invalid or logically conflicting skill combinations. By enforcing structural constraints during prediction, the framework avoids the common pitfall of agents proposing tool sequences that are technically possible but practically incoherent, so the output stays within the logical flow required for successful task completion.

Furthermore, the framework's ability to handle the joint decision-making process of subset selection, quantity determination, and ordering addresses a key limitation of previous methods. By treating these three dimensions as inseparable, SkillComposer can model complex interactions between skills that independent retrieval strategies would overlook. For instance, the effectiveness of a specific tool might depend heavily on the preceding tool in the sequence, a relationship that is easily captured by the autoregressive nature of the decoder but lost in flat retrieval models. This holistic view of skill composition allows the agent to construct sophisticated workflows that leverage the synergistic effects of multiple tools, leading to more robust and reliable performance in complex environments.
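To make the three dimensions concrete, a predicted composition can be checked against a reference one on each dimension separately. The metric names and the function below are illustrative assumptions, not the measures used by SkillsBench or the paper.

```python
from typing import Dict, Sequence


def composition_scores(
    predicted: Sequence[str], reference: Sequence[str]
) -> Dict[str, float]:
    """Score subset, count, and order agreement separately (hypothetical metrics)."""
    pred_set, ref_set = set(predicted), set(reference)
    overlap = len(pred_set & ref_set)
    precision = overlap / len(pred_set) if pred_set else 0.0
    recall = overlap / len(ref_set) if ref_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {
        "subset_f1": f1,  # were the right skills picked, ignoring order?
        "count_match": float(len(predicted) == len(reference)),
        "order_match": float(list(predicted) == list(reference)),
    }


reference = ["parse_csv", "plot_chart"]
swapped = composition_scores(["plot_chart", "parse_csv"], reference)
padded = composition_scores(["parse_csv", "plot_chart", "send_email"], reference)
print(swapped)
print(padded)
```

In this toy case the swapped prediction has a perfect subset score but fails on order, and the fixed-size list of three fails on count. These are the kinds of errors a flat retrieval score does not expose.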

Industry Impact

SkillComposer was evaluated on the SkillsBench benchmark, with a focus on composition quality and downstream task success rates. The experiments used two production-grade coding agents, one based on the GPT-5.2-Codex model and the other on the Gemini-3-Pro-Preview model. Compared to skillless baselines, SkillComposer improved the pass rate by 23.1 percentage points on GPT-5.2-Codex and by 18.2 percentage points on Gemini-3-Pro-Preview. These are absolute gains in pass rate, and they indicate a substantial improvement in the agents' ability to complete complex tasks that require multi-step tool usage. The framework also surpassed top-3 retrieval strategies, which suggests that structured sequence prediction is more effective than relevance-based filtering alone.

SkillComposer is also efficient in its use of resources. It approached the upper bound set by golden-skill retrieval while incurring lower prompt token costs. This matters for industrial applications, where API cost and the latency of processing long prompts are significant constraints. By reducing the number of prompt tokens needed to supply the agent with the correct skills in the correct order, SkillComposer lowers the cost of deploying sophisticated agent systems in real-time scenarios. This makes it feasible to integrate complex skill combinations into applications that demand high responsiveness and scalability, such as automated customer support, real-time data analysis, and dynamic code generation.
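The cost argument can be illustrated with a rough proxy. The sketch below assumes that prompt cost scales with the text of whichever skill documents are injected into the prompt, and it counts whitespace-delimited words instead of real tokenizer tokens. The skill names and descriptions are hypothetical.

```python
from typing import Dict, Iterable


def prompt_tokens(skill_docs: Dict[str, str], injected: Iterable[str]) -> int:
    """Crude prompt-cost proxy: word count across the injected skill documents."""
    return sum(len(skill_docs[s].split()) for s in injected)


# Hypothetical three-skill library with one-line descriptions.
library = {
    "parse_csv": "Read a CSV file and return rows as dictionaries.",
    "plot_chart": "Render a line chart from numeric columns.",
    "send_email": "Send a plain text email to one recipient.",
}

full = prompt_tokens(library, library)  # expose every skill to the agent
composed = prompt_tokens(library, ["parse_csv", "plot_chart"])  # predicted composition only
print(full, composed)
```

With real skill packages, which can run to many hundreds of tokens each, the gap between injecting the whole library and injecting only the composed sequence would be correspondingly larger. A production measurement would use the target model's own tokenizer.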

The implications for the open-source community and industrial deployment are profound. By providing a reproducible benchmark and a reference implementation based on a human-curated skill library, SkillComposer sets a new standard for agent skill management. It offers a clear path for other researchers and developers to build upon, fostering a more standardized and efficient ecosystem for agent development. The framework's success in bridging the gap between theoretical capability and practical efficiency highlights its potential to accelerate the adoption of advanced agent technologies across various industries. It demonstrates that with the right architectural choices, agents can move beyond simple tool calling to engage in complex logical planning, thereby unlocking new levels of automation and productivity.

Outlook

The success of SkillComposer in demonstrating the efficacy of structured sequence prediction for skill composition opens new avenues for future research. One promising direction is the exploration of more complex skill dependency structures, such as conditional branching and parallel execution paths. As agents become more capable, the need for frameworks that can handle non-linear workflows will grow. Additionally, the development of dynamic skill library update mechanisms is crucial for maintaining the relevance and accuracy of the agent's knowledge base in rapidly changing environments. SkillComposer's architecture provides a solid foundation for integrating such dynamic updates, allowing agents to adapt their skill sets in real-time based on new information or changing task requirements.

Another significant area for advancement is cross-domain skill transfer. The ability to generalize skills learned in one context to another could dramatically reduce the effort required to onboard agents into new domains. By leveraging the structured nature of skill compositions, researchers can investigate methods for transferring not just individual skills but entire workflow patterns. This could lead to more versatile agents that can quickly adapt to novel tasks by recombining existing skills in innovative ways. Furthermore, the principles underlying SkillComposer can be applied to other fields requiring complex decision sequence generation, such as supply chain optimization, financial trading, and medical diagnosis, showcasing the broader potential of joint decision models in handling high-dimensional combinatorial problems.

Ultimately, SkillComposer represents a step forward in the evolution of AI agents from simple tool users to sophisticated planners. By addressing the structural challenges of skill selection and composition, it provides a robust framework for building agents that can navigate the complexity of real-world tasks with greater autonomy and efficiency. As the field continues to advance, the insights gained from this research will likely inform the design of next-generation agent architectures, pushing the boundaries of what is possible in automated reasoning and action. The journey toward fully autonomous agents is ongoing, and frameworks like SkillComposer are essential building blocks in this endeavor, paving the way for a future where AI systems can seamlessly integrate into and enhance human workflows.

Sources