⚡ Bolt: Concurrent LLM tool execution via `asyncio.gather` (#112)
ishaanxgupta wants to merge 1 commit into main
Conversation
Replaced sequential `for tc in ai_response.tool_calls` loops with `asyncio.gather()` in `RetrievalPipeline` and `CodeRetrievalPipeline`. This substantially improves performance for network/I/O-bound operations (such as Pinecone queries and LLM sub-calls) by resolving them concurrently, then appending the results to the pipeline state in order. Multi-repo searches inside `_search_symbols` and `_search_files` were optimized the same way.
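A minimal sketch of the refactor described above, using hypothetical names (`execute_tool` stands in for a Pinecone query or LLM sub-call; the real pipeline methods differ):

```python
import asyncio

# Hypothetical stand-in for a Pinecone query or LLM sub-call.
async def execute_tool(tc: str) -> str:
    await asyncio.sleep(0.1)  # simulate network/I-O latency
    return f"result:{tc}"

async def run_sequential(tool_calls: list[str]) -> list[str]:
    results = []
    for tc in tool_calls:  # each call waits for the previous one to finish
        results.append(await execute_tool(tc))
    return results

async def run_concurrent(tool_calls: list[str]) -> list[str]:
    # asyncio.gather() resolves all calls concurrently and preserves
    # input order, so results can still be appended to state sequentially.
    return list(await asyncio.gather(*(execute_tool(tc) for tc in tool_calls)))

print(asyncio.run(run_concurrent(["a", "b", "c"])))
# → ['result:a', 'result:b', 'result:c']
```

Because `gather()` returns results in the order the awaitables were passed in, the sequential state-update step after the concurrent fan-out remains deterministic.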
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
💡 What: Refactored `RetrievalPipeline.run()`, `CodeRetrievalPipeline.run()`, `CodeRetrievalPipeline.run_stream()`, `CodeRetrievalPipeline._search_symbols()`, and `CodeRetrievalPipeline._search_files()` to execute their iteration loops concurrently using `asyncio.gather()`.
🎯 Why: LLM tool calls (and multi-repository namespace searches) ran sequentially, creating a major network/I/O bottleneck where each Pinecone query/DB read waited for the previous one to finish.
📊 Impact: Considerably faster overall pipeline response times. For queries that generate multiple tool calls or search across all repositories, pipeline latency will scale with the longest-running tool/repo call rather than linearly with the number of calls.
🔬 Measurement: Can be verified by sending the retrieval pipeline prompts that trigger multiple distinct tool calls and observing the logged timing metrics in the server console, or by observing multiple concurrent read tasks in flight rather than a single sequential loop.
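The latency claim can be checked outside the pipeline with a small timing sketch (all names here are hypothetical; `fake_repo_search` simulates a per-repository read): sequential execution takes roughly the sum of the per-call delays, while `asyncio.gather()` takes roughly the maximum.

```python
import asyncio
import time

# Hypothetical stand-in for a per-repository Pinecone query / DB read.
async def fake_repo_search(delay: float) -> None:
    await asyncio.sleep(delay)

async def compare(delays: list[float]) -> tuple[float, float]:
    t0 = time.perf_counter()
    for d in delays:  # sequential: total time ~ sum(delays)
        await fake_repo_search(d)
    sequential = time.perf_counter() - t0

    t0 = time.perf_counter()  # concurrent: total time ~ max(delays)
    await asyncio.gather(*(fake_repo_search(d) for d in delays))
    concurrent = time.perf_counter() - t0
    return sequential, concurrent

seq, con = asyncio.run(compare([0.05, 0.10, 0.15]))
print(f"sequential={seq:.2f}s concurrent={con:.2f}s")
```

With the sample delays, the sequential run takes about 0.30s versus about 0.15s concurrently, mirroring the max-vs-sum scaling described above.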
PR created automatically by Jules for task 6328948430720287357 started by @ishaanxgupta