some specification. Successful code generation can improve the efficiency and quality of software
development, even causing changes in social production modes. Therefore, code generation has been
a significant research hotspot in the fields of artificial intelligence, natural language processing, and
software engineering. Recently, code generation has made substantial advancements in both academic
and industrial domains [Chen et al., 2021, Shen et al., 2022, Li et al., 2022, Dong et al., 2023a]. In
particular, LLMs have achieved excellent performance and demonstrate promising potential on code
generation tasks [Nijkamp et al., 2022, Fried et al., 2022, Zheng et al., 2023].
Nonetheless, generating correct code for complex requirements poses a substantial challenge, even for
experienced human programmers. Intuitively, humans, as social beings, tend to rely on collaborative
teamwork when encountering complex tasks. Teamwork solves complex problems through division of
labor, interaction, and collaboration, and it has been theorized to play an important role in dealing
with complexity, as posited in both teamwork theory [Belbin, 2012, Katzenbach and Smith, 2015] and
software engineering practice [Beck et al., 2001, McChesney and Gallagher, 2004, DeMarco and
Lister, 2013]. The benefits of collaborative teamwork are manifold: 1) It breaks down
complex tasks into smaller subtasks, making the entire code generation process more efficient and
controllable. 2) It assists with error detection and quality control. Team members can review and test
the generated code, providing feedback and suggestions for improvement, thus reducing potential
errors and defects. 3) It ensures that the generated code is consistent with the expected requirements.
Team members can offer different viewpoints to solve problems and reduce misunderstandings.
∗Equal Contribution
†Corresponding author
Preprint. Under review.
A straightforward way to implement collaborative teamwork is to train a separate model for each
subtask and then train the models jointly, so that they learn one another's behavior and can be
assembled into a team [Schick et al., 2022]. However, this training approach is
costly, especially for LLMs. The scarcity of relevant training data further exacerbates the difficulty of
achieving collaborative code generation. Revolutionary advancements in artificial general intelligence
(AGI), especially LLMs represented by ChatGPT [OpenAI], provide a turning point. These LLMs
perform commendably across tasks in various stages of software development, laying the groundwork
for division of labor. Furthermore, LLMs use language as the foundation for input and output
and align with human needs through instructions or prompts, offering the potential for inter-model
interaction and collaboration.
To this end, we propose a self-collaboration framework aimed at guiding LLMs to collaborate
with themselves, thereby dealing with complex requirements and boosting the performance of code
generation. This framework is divided into two parts: division of labor and collaboration, both
of which are driven by role instructions. First, role instructions achieve division of labor by
allocating specific roles and responsibilities to LLMs. This strategy enables LLMs to think about and
tackle tasks from the standpoint of the roles they play, transforming the original LLMs into domain
“experts”, as shown in Fig. 1. Second, role instructions control the interaction between roles, allowing
otherwise isolated roles to form a virtual team and facilitate each other’s work.
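
To make the two parts concrete, the following is a minimal sketch of how role instructions might drive both division of labor and collaboration. It is an illustration under stated assumptions, not the implementation used in this work: the role names (analyst, coder, tester), the wording of the instructions, the `PASS` convention, and the helpers `ask`, `self_collaborate`, and `fake_llm` are all hypothetical, and the LLM is represented by an arbitrary callable that maps a prompt string to a reply string.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical role instructions; role names and wording are illustrative only.
ROLE_INSTRUCTIONS = {
    "analyst": "You are a requirements analyst. Break the requirement into a short plan.",
    "coder": "You are a developer. Write code that follows the plan and any feedback.",
    "tester": "You are a tester. Review the code and reply 'PASS' or describe the problems.",
}


@dataclass
class Message:
    role: str      # which role produced this message
    content: str   # the reply returned by the LLM


def ask(llm: Callable[[str], str], role: str, payload: str) -> Message:
    """Division of labor: the same LLM answers under a role-specific instruction."""
    prompt = f"{ROLE_INSTRUCTIONS[role]}\n\n{payload}"
    return Message(role, llm(prompt))


def self_collaborate(llm: Callable[[str], str], requirement: str,
                     max_rounds: int = 3) -> List[Message]:
    """Collaboration: each role's output becomes the next role's input."""
    history = [ask(llm, "analyst", f"Requirement:\n{requirement}")]
    plan = history[0].content
    feedback = "none"
    for _ in range(max_rounds):
        code = ask(llm, "coder", f"Plan:\n{plan}\n\nFeedback:\n{feedback}")
        report = ask(llm, "tester", f"Code:\n{code.content}")
        history += [code, report]
        if report.content.strip().upper().startswith("PASS"):
            break  # the tester role accepted the code
        feedback = report.content  # otherwise, send the report back to the coder
    return history


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption), so the sketch runs offline."""
    if prompt.startswith(ROLE_INSTRUCTIONS["analyst"]):
        return "1. Define add(a, b). 2. Return a + b."
    if prompt.startswith(ROLE_INSTRUCTIONS["coder"]):
        return "def add(a, b):\n    return a + b"
    return "PASS"


if __name__ == "__main__":
    for message in self_collaborate(fake_llm, "Add two numbers."):
        print(f"[{message.role}] {message.content}")
```

In this sketch a single underlying model plays every role; only the instruction prepended to the prompt changes, and the messages passed between roles are plain text. Replacing `fake_llm` with a call to an actual model would be the only change needed to try the loop against a real LLM.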