Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
📖 Documentation Website | 📄 Full Paper | 📋 Tables & Resources
Based on a systematic review of 176 papers and online resources, this survey establishes a holistic theoretical framework for Issue Resolution in software engineering. We examine how Large Language Models (LLMs) are transforming the automation of GitHub issue resolution. Beyond the theoretical analysis, we have curated a comprehensive collection of datasets and model training resources, which are continuously synchronized with our GitHub repository and project documentation website.
🔍 Explore This Survey:
- 📊 Data: Evaluation and training datasets, data collection and synthesis methods
- 🛠️ Methods: Training-free (agent/workflow) and training-based (SFT/RL) approaches
- Training-free Methods
- Training-based Methods
- 🔍 Analysis: Insights into both data characteristics and method performance
- 📋 Tables & Resources: Comprehensive statistical tables and resources
- 📄 Full Paper: Read the complete survey paper
- 🤝 Contributing: How to contribute to this project
Total: 176 works across 14 categories
Benchmarks for evaluating issue resolution systems
- SWE-bench Lite: SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (2024)
- SWE-bench Verified: Introducing SWE-bench Verified | OpenAI (2024)
- SWE-bench-java: SWE-bench-java: A GitHub Issue Resolving Benchmark for Java (2024)
- Visual SWE-bench: CodeV: Issue Resolving with Visual Data (2025)
- SWE-Lancer: SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? (2025)
- FEA-Bench: FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation (2025)
- Multi-SWE-bench: Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving (2025)
- SWE-PolyBench: SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents (2025)
- SWE-bench Multilingual: SWE-smith: Scaling Data for Software Engineering Agents (2025)
- SwingArena: SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving (2025)
- SWE-bench Multimodal: SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? (2025)
- OmniGIRL: OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution (2025)
- SWE-bench-Live: SWE-bench Goes Live! (2025)
- SWE-Factory: SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks (2025)
- SWE-MERA: SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks (2025)
- SWE-Perf: SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? (2025)
- SWE-Bench Pro: SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? (2025)
- SWE-InfraBench: SWE-InfraBench: Evaluating Language Models on Cloud Infrastructure Code (2025)
- SWE-Sharp-Bench: SWE-Sharp-Bench: A Reproducible Benchmark for C# Software Engineering Tasks (2025)
- SWE-fficiency: SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads? (2025)
- SWE-Compass: SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models (2025)
- SWE-EVO: SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios (2025)
Datasets for training issue resolution systems
- SWE-bench-extra: SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (2024)
- Multi-SWE-RL: Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving (2025)
- R2E-Gym: R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents (2025)
- SWE-Synth: SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs (2025)
- LocAgent: LocAgent: Graph-Guided LLM Agents for Code Localization (2025)
- SWE-Smith: SWE-smith: Scaling Data for Software Engineering Agents (2025)
- SWE-Fixer: SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution
- SWELoc: SweRank: Software Issue Localization with Code Ranking (2025)
- SWE-Gym: Training Software Engineering Agents and Verifiers with SWE-Gym
- SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner (2025)
- SWE-Factory: SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks (2025)
- Skywork-SWE: Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs (2025)
- RepoForge: RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale (2025)
- SWE-Mirror: SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories (2025)
- SWE-Lego: SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving (2026)
Individual autonomous agents for issue resolution
- SWE-agent: SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (2024)
- Aider (2026)
- Devin: SWE-bench technical report (2025)
- PatchPilot: PatchPilot: A Cost-Efficient Software Engineering Agent with Early Attempts on Formal Verification (2025)
- LCLM: Putting It All into Context: Simplifying Agents with LCLMs (2025)
- DGM: Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents (2025)
- Trae Agent: Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling (2025)
- Live-SWE-agent: SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents (2025)
- Lita: Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs (2025)
- TOM-SWE: TOM-SWE: User Mental Modeling For Software Engineering Agents (2025)
- Confucius Code Agent: Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases (2025)
Collaborative multi-agent frameworks
- MAGIS: MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution (2024)
- AutoCodeRover: AutoCodeRover: Autonomous Program Improvement (2024)
- CodeR: CodeR: Issue Resolving with Multi-Agent and Task Graphs (2024)
- OpenHands: OpenHands: An Open Platform for AI Software Developers as Generalist Agents (2025)
- AgentScope: SWE-Bench - AgentScope (2025)
- OrcaLoca: OrcaLoca: An LLM Agent Framework for Software Issue Localization (2025)
- DEI: Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents (2025)
- MarsCode Agent: MarsCode Agent: AI-native Automated Bug Fixing (2024)
- Lingxi: Lingxi Technical Report (lingxi-agent/Lingxi) (2026)
- Devlo: Achieving SOTA on SWE-bench (2026)
- Refact.ai Agent: AI Coding Agent for Software Development - Refact.ai (2025)
- HyperAgent: HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale (2024)
- SWE-Search: SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement (2025)
- CodeCoR: CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation (2025)
- Agent KB: Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving (2025)
- SWE-Debate: SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution (2026)
- SWE-Exp: SWE-Exp: Experience-Driven Software Issue Resolution (2025)
- Meta-RAG: Meta-RAG on Large Codebases Using Code Summarization (2025)
Structured pipeline approaches
- Agentless: Demystifying LLM-Based Software Engineering Agents (2025)
- Conversational Pipeline: Exploring the Potential of Conversational Test Suite Based Program Repair on SWE-bench (2024)
- SynFix: SynFix: Dependency-Aware Program Repair via RelationGraph Analysis (2025)
- CodeV: CodeV: Issue Resolving with Visual Data (2025)
- GUIRepair: Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Fixing (2025)
Methods leveraging external tools
- MAGIS: MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution (2024)
- AutoCodeRover: AutoCodeRover: Autonomous Program Improvement (2024)
- SWE-agent: SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (2024)
- Alibaba LingmaAgent: Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration (2025)
- OpenHands: OpenHands: An Open Platform for AI Software Developers as Generalist Agents (2025)
- SpecRover: SpecRover: Code Intent Extraction via LLMs (2025)
- MarsCode Agent: MarsCode Agent: AI-native Automated Bug Fixing (2024)
- RepoGraph: RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph
- SuperCoder2.0: SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer (2024)
- EvoCoder: LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues (2024)
- AEGIS: AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions (2025)
- CoRNStack: CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking (2025)
- OrcaLoca: OrcaLoca: An LLM Agent Framework for Software Issue Localization (2025)
- DARS: DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal (2025)
- Otter: Otter: Generating Tests from Issues to Validate SWE Patches (2025)
- Quadropic Insiders: Quadropic Insiders: Syntheo Tops Swelite Feb (2025)
- Issue2Test: Issue2Test: Generating Reproducing Test Cases from Issue Reports (2025)
- KGCompass: Enhancing repository-level software repair via repository-aware knowledge graphs (2025)
- CoSIL: Issue Localization via LLM-Driven Iterative Code Graph Searching (2025)
- InfantAgent-Next: InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction (2025)
- Co-PatcheR: Co-PatcheR: Collaborative Software Patching with Component(s)-specific Small Reasoning Models (2025)
- SWERank: SweRank: Software Issue Localization with Code Ranking (2025)
- Nemotron-CORTEXA: Nemotron-CORTEXA: Enhancing LLM Agents for Software Engineering Tasks via Improved Localization and Solution Diversity (2025)
- LCLM: Putting It All into Context: Simplifying Agents with LCLMs (2025)
- SACL: SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization (2025)
- SWE-Debate: SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution (2026)
- OpenHands-Versa: Coding Agents with Multimodal Browsing are Generalist Problem Solvers
- SemAgent: SemAgent: A Semantics Aware Program Repair Agent (2025)
- Repeton: Repeton: Structured Bug Repair with ReAct-Guided Patch-and-Test Cycles (2025)
- cAST: cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree (2025)
- Prometheus: Prometheus: Unified Knowledge Graphs for Issue Resolution in Multilingual Codebases (2025)
- Git Context Controller: Git Context Controller: Manage the Context of LLM-based Agents like Git (2025)
- Trae Agent: Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling (2025)
- BugPilot: BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills (2025)
- TestPrune: When Old Meets New: Evaluating the Impact of Regression Tests on SWE Issue Resolution (2025)
- Meta-RAG: Meta-RAG on Large Codebases Using Code Summarization (2025)
- InfCode: InfCode: Adversarial Iterative Refinement of Tests and Patches for Reliable Software Issue Resolution (2025)
- GraphLocator: GraphLocator: Graph-guided Causal Reasoning for Issue Localization (2025)
Systems with memory mechanisms
- Infant Agent: Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage (2024)
- EvoCoder: LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues (2024)
- Learn-by-interact: Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in Realistic Environments (2025)
- DGM: Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents (2025)
- ExpeRepair: ExpeRepair: Dual-Memory Enhanced LLM-based Repository-Level Program Repair (2025)
- Agent KB: Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving (2025)
- SWE-Exp: SWE-Exp: Experience-Driven Software Issue Resolution (2025)
- RepoMem: Improving Code Localization with Repository Memory (2025)
- AgentDiet: Improving the Efficiency of LLM Agent Systems through Trajectory Reduction (2025)
- ReasoningBank: ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory (2025)
- MemGovern: MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences (2026)
Models trained via supervised fine-tuning
- Lingma SWE-GPT: SWE-GPT: A Process-Centric Language Model for Automated Software Improvement (2025)
- ReSAT: Repository Structure-Aware Training Makes SLMs Better Issue Resolver (2024)
- Scaling data collection: Scaling Data Collection for Training SWE Agents (2024)
- CodeXEmbed: CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval (2025)
- SWE-Gym: Training Software Engineering Agents and Verifiers with SWE-Gym
- Thinking Longer: Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute (2025)
- Search for training: Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents (2025)
- Co-PatcheR: Co-PatcheR: Collaborative Software Patching with Component(s)-specific Small Reasoning Models (2025)
- MCTS-Refined CoT: MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution (2025)
- SWE-Swiss: SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution (2025)
- Devstral: Devstral: Fine-tuning Language Models for Coding Agent Applications (2025)
- Kimi-Dev: Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents (2025)
- SWE-Compressor: Context as a Tool: Context Management for Long-Horizon SWE-Agents (2025)
- SWE-Lego: SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving (2026)
- Agentic Rubrics: Agentic Rubrics as Contextual Verifiers for SWE Agents (2026)
Models trained via reinforcement learning
- SWE-RL: SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution (2025)
- SoRFT: SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning (2025)
- SEAlign: SEAlign: Alignment Training for Software Engineering Agent (2026)
- SWE-Dev1: SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development (2025)
- Satori-SWE: Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering (2025)
- Agent-RLVR: Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards (2025)
- DeepSWE: DeepSWE: Training a State-of-the-Art Coding Agent from Scratch by Scaling RL (2025)
- SWE-Dev2: SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling (2025)
- Tool-integrated RL: Tool-integrated Reinforcement Learning for Repo Deep Search (2025)
- SWE-Swiss: SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution (2025)
- SeamlessFlow: SeamlessFlow: A Trainer Agent Isolation RL Framework Achieving Bubble-Free Pipelines via Tag Scheduling (2025)
- DAPO: Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning (2025)
- CoreThink: CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs (2025)
- CWM: CWM: An Open-Weights LLM for Research on Code Generation with World Models (2025)
- EntroPO: Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization (2025)
- Kimi-Dev: Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents (2025)
- FoldGRPO: Scaling Long-Horizon LLM Agent via Context-Folding (2025)
- GRPO-based Method: A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning (2025)
- TSP: Think-Search-Patch: A Retrieval-Augmented Reasoning Framework for Repository-Level Code Repair (2025)
- Self-play SWE-RL: Toward Training Superintelligent Software Agents through Self-Play SWE-RL (2025)
- SWE-Playground: Training Versatile Coding Agents in Synthetic Environments (2025)
- Supervised RL: Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning (2025)
- OSCA: Scaling LLM Inference Efficiently with Optimized Sample Compute Allocation (2025)
- SWE-RM: SWE-RM: Execution-free Feedback For Software Engineering Agents (2025)
- One Tool Is Enough: One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents (2025)
- Let It Flow: Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem (2025)
- KAT-Coder: KAT-Coder Technical Report (2025)
- Seed1.5-Thinking: Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning (2025)
- Deepseek V3.2: DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models (2025)
- Kimi-K2-Instruct: Kimi K2: Open Agentic Intelligence (2025)
- gpt-oss: gpt-oss-120b & gpt-oss-20b model card (2025)
- Qwen3-Coder: Qwen3 Technical Report (2025)
- GLM-4.6: GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models (2025)
- MiniMax M2: MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention (2025)
- LongCat-Flash-Think: Introducing LongCat-Flash-Thinking: A Technical Report (2025)
- MiMo-V2-Flash: MiMo-V2-Flash Technical Report (2026)
Methods for scaling at inference time
- SWE-Search: SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement (2025)
- CodeMonkeys: CodeMonkeys: Scaling Test-Time Compute for Software Engineering (2025)
- SWE-PRM: When Agents go Astray: Course-Correcting SWE Agents with PRMs (2025)
- SIADAFIX: SIADAFIX: issue description response for adaptive program repair (2025)
Techniques for collecting training data
- SWE-rebench: SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents (2025)
- RepoLaunch: SWE-bench Goes Live! (2025)
- SWE-Factory: SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks (2025)
- SWE-MERA: SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks (2025)
- RepoForge: RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale (2025)
- Multi-Docker-Eval: Multi-Docker-Eval: A 'Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering (2025)
Approaches for synthetic data generation
- Learn-by-interact: Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in Realistic Environments (2025)
- R2E-Gym: R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents (2025)
- SWE-Synth: SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs (2025)
- SWE-smith: SWE-smith: Scaling Data for Software Engineering Agents (2025)
- SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner (2025)
- SWE-Mirror: SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories (2025)
Analysis of datasets and benchmarks
- SWE-bench Verified: Introducing SWE-bench Verified | OpenAI (2024)
- Patch Correctness: Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study (2025)
- UTBoost: UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench (2025)
- Trustworthiness: Is Your Automated Software Engineer Trustworthy? (2025)
- Rigorous agentic benchmarks: Establishing Best Practices for Building Rigorous Agentic Benchmarks (2025)
- The SWE-Bench Illusion: The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason (2025)
- Revisiting SWE-Bench: Revisiting SWE-Bench: On the Importance of Data Quality for LLM-Based Code Models (2025)
- SPICE: SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation (2025)
- Data contamination: Does SWE-Bench-Verified Test Agent Ability or Model Memory? (2025)
Comparative analysis of different methods
- Context Retrieval: On The Importance of Reasoning for Context Retrieval in Repository-Level Code Editing (2024)
- Evaluating software development agents: Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios (2025)
- Overthinking: The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks (2025)
- Beyond final code: Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios (2025)
- GSO: GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents (2025)
- Dissecting the SWE-Bench Leaderboards: Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems (2025)
- Security analysis: How Safe Are AI-Generated Patches? A Large-scale Study on Security Risks in LLM and Agentic Automated Program Repair on SWE-bench (2025)
- Failures analysis: An Empirical Study on Failures in Automated Issue Solving (2025)
- SeaView: SeaView: Software Engineering Agent Visual Interface for Enhanced Workflow (2025)
- SWEnergy: SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs (2026)
- Strong-Weak Model Collaboration: An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation (2025)
- Agents in the Wild (2025)
Comprehensive tables and statistics about issue resolution datasets, methods, and benchmarks.
A comprehensive survey and statistical overview of issue resolution datasets. We categorize these datasets based on programming language, modality support, source repositories, data scale (Amount), and the availability of reproducible execution environments.
A survey of trajectory datasets used for agent training or analysis. We list the programming language, number of source repositories, and total trajectories for each dataset.
Overview of SFT-based methods for issue resolution. This table categorizes models by their base architecture and training scaffold (Sorted by Performance).
A comprehensive overview of specialized models for issue resolution, categorized by parameter size. The table details each model's base architecture, the training scaffold used for rollout, the type of reward signal employed (Outcome vs. Process), and their performance results (Res. %) on issue resolution benchmarks.
Overview of general foundation models evaluated on issue resolution. The table details the specific inference scaffolds (e.g., OpenHands, Agentless) employed during the evaluation process to achieve the reported results.
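As a concrete starting point for the dataset tables above, most evaluation sets in this area are distributed through the Hugging Face Hub. The snippet below is a minimal sketch that loads SWE-bench Lite with the `datasets` library; the dataset ID and field names follow the public SWE-bench release, but verify them against the dataset card before relying on them.

```python
# Minimal sketch: inspect one SWE-bench Lite task instance.
# Assumes the `datasets` library is installed and the public
# princeton-nlp/SWE-bench_Lite dataset ID is still current.
from datasets import load_dataset

ds = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")
print(len(ds), "task instances")

example = ds[0]
print(example["instance_id"])                 # repository__number-style identifier
print(example["repo"], example["base_commit"])  # repo snapshot the patch must apply to
print(example["problem_statement"][:300])     # the GitHub issue text the system must resolve
```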
Windows:
run.bat

Linux/Mac:
chmod +x run.sh
./run.sh

Options:
- [1] Add Paper - Interactive paper entry with duplicate check
- [2] Add Table - Update statistical tables
- [3] Batch Import - Import papers from CSV template
- [4] Sync & Build - Render website and sync README

Manual operations:
# Local preview
mkdocs serve
# Deploy (or push to GitHub for auto-deploy via Actions)
mkdocs gh-deploy

We welcome contributions! To add new papers or tables:
- Fork this repository
- Run run.bat (Windows) or run.sh (Linux/Mac)
- Or manually edit the YAML/CSV files in the data/ directory (see the sketch after this list)
- Submit a PR with your changes
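For manual edits, a new record might be appended to a data file roughly as sketched below. This is a hypothetical illustration only: the field names and the data/benchmarks.yaml path are placeholders, so copy the schema of an existing entry under data/ before opening a PR.

```python
# Hypothetical illustration; the real schema is defined by the existing YAML files under data/.
import yaml  # PyYAML

entry = {
    "name": "SWE-bench Lite",
    "title": "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?",
    "year": 2024,
    "category": "benchmark",
    "url": "https://arxiv.org/abs/2310.06770",
}

# Append the entry to a (hypothetical) data file before submitting a PR.
with open("data/benchmarks.yaml", "a", encoding="utf-8") as f:
    f.write(yaml.safe_dump([entry], sort_keys=False, allow_unicode=True))
```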
The application of LLMs in the programming domain has witnessed explosive growth. Early research focused primarily on function-level code generation, with benchmarks such as HumanEval serving as the de facto standard. However, such generic benchmarks often fail to capture the nuances of real-world development. To bridge this gap, recent initiatives extend evaluation tasks to align more closely with realistic software development scenarios, revealing the limitations of general-purpose models in specialized domains. Methods have evolved in parallel to handle these broader contexts: while foundational approaches relied primarily on SFT or standard retrieval-augmented generation, RL-based methods have emerged as a pivotal direction for complex coding tasks.
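For reference, the function-level benchmarks mentioned above are typically scored with pass@k. The sketch below shows the unbiased pass@k estimator popularized by the HumanEval/Codex paper, where n samples are drawn per problem and c of them pass the unit tests.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (HumanEval/Codex paper):
    probability that at least one of k drawn samples is correct,
    given n generated samples of which c pass the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 37 of them correct, estimate pass@10.
print(round(pass_at_k(n=200, c=37, k=10), 4))
```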
Related:
- HumanEval: Evaluating Large Language Models Trained on Code
- Program Synthesis: Program Synthesis with Large Language Models
- Repository-Level Code Completion: RLCoder: Reinforcement Learning for Repository-Level Code Completion
- Domain-Specific Benchmarks: Top General Performance = Top Domain Performance? DomainCodeBench
- Long-Context Code Models: Long Code Arena
- Multitask Fine-Tuning: MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
- RAG for Code: RAG or Fine-tuning? A Comparative Study on LCMs-based Code Completion in Industry, Repoformer: Selective Retrieval for Repository-Level Code Completion, CodeRAG-Bench
- Code Generation Survey: A Survey on Large Language Models for Code Generation
The primary goal of this task is to autonomously construct complete, executable software systems from high-level natural language requirements. Unlike code completion, it must cover the full Software Development Life Cycle (SDLC), including requirement analysis, system design, coding, and testing. To manage the complexity and potential logical inconsistencies of this process, state-of-the-art frameworks rely on multi-agent collaboration, simulating human development teams to decompose complex tasks into streamlined, verifiable workflows.
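To make the decomposition concrete, the sketch below shows one way such a pipeline can be wired up. It is an illustrative simplification, not the API of ChatDev, MetaGPT, or any specific framework; the stage prompts and the `llm` callable are placeholders.

```python
from typing import Callable

def run_sdlc_pipeline(requirement: str, llm: Callable[[str], str]) -> dict:
    """Decompose a high-level requirement into sequential SDLC stages,
    passing each stage's output as context to the next 'agent'."""
    stages = {
        "analysis": "Extract functional requirements from:\n{ctx}",
        "design": "Propose a module-level design for:\n{ctx}",
        "coding": "Implement the design as code:\n{ctx}",
        "testing": "Write tests and report failures for:\n{ctx}",
    }
    context, outputs = requirement, {}
    for name, template in stages.items():
        outputs[name] = llm(template.format(ctx=context))
        context = outputs[name]  # feed forward to the next stage
    return outputs
```

Real frameworks add inter-agent communication, review loops, and executable test feedback on top of this linear hand-off.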
Related:
- ChatDev: Communicative Agents for Software Development
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
- RPG: Repository Planning Graph for Unified and Scalable Codebase Generation
Issue resolution is intrinsically linked to the broader domain of automated software maintenance. Methodologies established in this field are frequently encapsulated as callable tools to augment the capabilities of LLMs in software development tasks.
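One common way to encapsulate such a maintenance technique is to expose it to the model through function calling. The snippet below sketches an OpenAI-style tool schema for a hypothetical fault-localization utility; the tool name, parameters, and backing implementation are assumptions made purely for illustration.

```python
# Hypothetical tool specification; the actual schema depends on the agent framework in use.
FAULT_LOCALIZATION_TOOL = {
    "type": "function",
    "function": {
        "name": "localize_fault",
        "description": "Rank repository files and functions most likely related to a bug report.",
        "parameters": {
            "type": "object",
            "properties": {
                "issue_text": {"type": "string", "description": "Natural-language issue description."},
                "top_k": {"type": "integer", "description": "Number of candidate locations to return."},
            },
            "required": ["issue_text"],
        },
    },
}
```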
Related:
- Bug Reproduction: AssertFlip, Automated Generation of Issue-Reproducing Tests
- Fault Localization:
- Code Search: A Benchmark for Localizing Code and Non-Code Issues
- Test Generation:
- Security: Is Vibe Coding Safe?
- Survey Papers:
Recent initiatives focus on automating the configuration of runtime environments for entire repositories. This capability has developed in parallel with data construction for issue resolution, since every new task instance requires a reproducible, executable environment.
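A minimal sketch of what such automation must accomplish for a single task instance is shown below; the commands are placeholders for what benchmark harnesses typically script per repository, and real pipelines additionally pin interpreter versions, system packages, and usually run inside containers.

```python
# Illustrative sketch only, not any specific benchmark's harness.
import subprocess

def setup_environment(repo_url: str, commit: str, workdir: str = "workspace") -> None:
    """Clone a repository at a fixed commit, install it, and run its test suite."""
    subprocess.run(["git", "clone", repo_url, workdir], check=True)
    subprocess.run(["git", "checkout", commit], cwd=workdir, check=True)
    subprocess.run(["python", "-m", "pip", "install", "-e", "."], cwd=workdir, check=True)
    subprocess.run(["python", "-m", "pytest", "-x", "-q"], cwd=workdir, check=True)
```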
Related:
- EnvBench: A Benchmark for Automated Environment Setup
- PIPer: On-Device Environment Setup via Online Reinforcement Learning
- Automated Benchmark Generation: Automated Benchmark Generation for Repository-Level Coding Tasks
Existing surveys primarily focus on code generation or other tasks within the software engineering domain, leaving issue resolution without a dedicated treatment. This paper fills that gap by offering the first systematic survey of the entire spectrum of issue resolution, ranging from non-agent approaches to the latest agentic advancements.
Related:
- A Survey on Large Language Models for Code Generation
- Agents in software engineering: survey, landscape, and vision
- A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System
If you use this project or related survey in your research or system, please cite the following:
Li, Caihua, Guo, Lianghong, Wang, Yanlin, et al. (2026). Advances, Frontiers, and Future of Issue Resolution in Software Engineering: A Comprehensive Survey. TechRxiv. DOI: 10.36227/techrxiv.176779734.47868328/v2
BibTeX:
@article{li2026advances,
title={Advances, Frontiers, and Future of Issue Resolution in Software Engineering: A Comprehensive Survey},
author={Li, Caihua and Guo, Lianghong and Wang, Yanlin and Guo, Daya and Tao, Wei and Shan, Zhenyu and Liu, Mingwei and Chen, Jiachi and Song, Haoyu and Tang, Duyu and Zhang, Hongyu and Zheng, Zibin},
journal={TechRxiv},
year={2026},
pages={1375056},
doi={10.36227/techrxiv.176779734.47868328/v2},
publisher={IEEE}
}

Once published on arXiv or at a conference, please replace the entry with the official citation information (authors, DOI/arXiv ID, conference name, etc.).
We would like to express our sincere gratitude to:
- The authors of cited papers who provided valuable feedback on how their work is presented in this survey, greatly improving its accuracy and comprehensiveness.
- All contributors who have helped improve this project through issues, pull requests, and discussions.
- The open-source community for developing the amazing tools and frameworks that made this project possible.
- @chao-peng (Dr. Chao Peng), ByteDance Software Engineering Lab, for providing valuable suggestions on the Challenges and Opportunities section of our survey.
- @EuniAI/awesome-code-agents for providing an excellent reference on managing survey papers through documentation systems and inspiring our project structure.
If you have any questions or suggestions, please contact us through:
- 📧 Email: noranotdor4@gmail.com
- 💬 GitHub Issues: Open an issue
This project is licensed under the MIT License - see the LICENSE file for details.
⭐ Star this repository if you find it helpful!
Made with ❤️ by the DeepSoftwareAnalytics team
Documentation | Paper | Tables | About | Cite