feat(routing): Initialize model routing architecture #8153
Conversation
Summary of Changes
Hello @abhipatel12, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request establishes the core architecture for dynamic model routing within the application. It introduces a service to intelligently select the most suitable language model for user interactions, ensuring consistency across multi-turn conversations. This foundational change enables future enhancements for optimizing model usage based on various criteria like cost and capability, without altering current user-facing behavior.
Highlights
- Foundational Model Routing Architecture: Introduced a new `ModelRouterService` responsible for dynamically selecting the appropriate LLM (e.g., Gemini Pro, Gemini Flash) for a given user prompt, laying the groundwork for more intelligent and efficient model usage.
- Model Stickiness: Implemented a "model stickiness" mechanism where, once a model is selected for the first turn of a conversation (identified by a `prompt_id`), that same model is used for all subsequent turns within that multi-turn prompt to ensure conversational consistency.
- Inversion of Control Refactor: Refactored the `GeminiClient`, `GeminiChat`, and `Turn` components to accept the chosen model as a parameter, rather than fetching it from the configuration. This improves modularity, predictability, and testability of downstream components.
- Default Routing Strategy: Included an initial `DefaultStrategy` that always routes to the standard Gemini Pro model, ensuring that existing behavior remains unchanged while the new architecture is in place.
- Future Extensibility: The new architecture is designed to easily accommodate advanced routing strategies in the future, such as cost-based, capability-based, or content-based routing, to optimize model selection.
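To make the highlights above concrete, here is a rough sketch of what a strategy-based router like the one described could look like. This is a hypothetical illustration, not the PR's actual code: only `ModelRouterService` and `DefaultStrategy` are named in this summary, and the `RoutingDecision`/`RoutingStrategy` shapes here are assumptions.

```typescript
// Hypothetical sketch of a strategy-based model router.
interface RoutingDecision {
  model: string;  // e.g. "gemini-2.5-pro" or "gemini-2.5-flash"
  reason: string; // which strategy made the call, useful for logging
}

interface RoutingStrategy {
  // Returns a decision, or null to let the next strategy in the chain try.
  route(promptText: string): RoutingDecision | null;
}

class ModelRouterService {
  constructor(private strategies: RoutingStrategy[]) {}

  route(promptText: string): RoutingDecision {
    for (const strategy of this.strategies) {
      const decision = strategy.route(promptText);
      if (decision) return decision;
    }
    // A terminal DefaultStrategy should make this unreachable.
    return { model: 'gemini-2.5-pro', reason: 'default' };
  }
}

// A DefaultStrategy that preserves existing behavior by always
// choosing the standard Pro model.
const defaultStrategy: RoutingStrategy = {
  route: () => ({ model: 'gemini-2.5-pro', reason: 'default' }),
};

const router = new ModelRouterService([defaultStrategy]);
```

Because every strategy can decline by returning `null`, adding a smarter strategy later is just a matter of prepending it to the array.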
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Size Change: +6.36 kB (+0.05%) Total Size: 13.2 MB
Code Review
This pull request introduces a foundational model routing architecture, which is a significant and well-designed improvement for enabling dynamic model selection. The introduction of the ModelRouterService and the refactoring to pass the model down the call stack are implemented consistently across the codebase. The new architecture is more modular and paves the way for more intelligent model usage. I have identified one high-severity issue regarding the interaction between the new routing logic and the existing chat compression feature, which could lead to incorrect behavior and unexpected costs. Please see the detailed comment.
Code Coverage Summary
CLI Package - Full Text Report · Core Package - Full Text Report

For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
```ts
if (this.contentGeneratorConfig) {
  this.contentGeneratorConfig.model = newModel;
  // Do not allow Pro usage if the user is in fallback mode.
  if (newModel === DEFAULT_GEMINI_MODEL && this.isInFallbackMode()) {
```
Should the fallback be handled at this level of abstraction? Feels like it should be a part of the router.
This is for when the user decides to set their configured model while they are in fallback mode. I thought it made sense to fail fast and early if the model cannot be upgraded.
We could allow this action, but then on the next user request they would be downgraded again on their actual request, which could make for an odd user experience.
What do you think?
```ts
let modelToUse: string;

// Determine Model (Stickiness vs. Routing)
```
I'd encapsulate all of this logic in the router.
To determine model stickiness, we currently rely on the promptId. However, if we moved this into the router service, the router would be in charge of routing as well as keeping track of every client and its current turn.
That may work, but it becomes more of an issue as we add more agents that require a router. For now, we should keep model stickiness as a responsibility of the client.
I am removing the fallback check since that was a copypasta mistake from my previous PR.
```ts
  return decision;
}

return this.strategy.route(context, this.config.getBaseLlmClient());
```
The returned model name needs to be validated (or make it an enum).
How do you recommend validating it?
Models can be strings passed from the user, so if it is wrong it will fail during the request.
For the classifier case in the future, we simply map the response to the actual model string constant so that should be safe as well.
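The classifier-mapping approach mentioned above could look something like this: instead of validating free-form model strings, map the classifier's coarse label onto a known model constant, so an unexpected label can never reach the API. All names here are hypothetical illustrations, not code from this PR.

```typescript
// Map a classifier label to a known model constant, falling back to the
// default model for any unrecognized label.
const LABEL_TO_MODEL: Record<string, string> = {
  complex: 'gemini-2.5-pro',
  simple: 'gemini-2.5-flash',
};

const DEFAULT_MODEL = 'gemini-2.5-pro';

function modelForLabel(label: string): string {
  // Unknown labels degrade safely to the default instead of failing
  // later at request time.
  return LABEL_TO_MODEL[label] ?? DEFAULT_MODEL;
}
```

User-supplied model strings would still pass through untouched; only classifier output is constrained this way.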
```ts
 */
export interface RoutingContext {
  /** The full history of the conversation. */
  history: Content[];
```
Full history or compressed history?
This is the most current history as provided by the client. Since compression happens prior to routing, this is the history after compression has taken place.
Force-pushed from 1b1cdd6 to b58578e
abhipatel12 left a comment
Thanks for the feedback Victor! Let me know what you think of the changes.
TLDR
This PR introduces a foundational model routing architecture to enable dynamic model selection for user prompts.
NOTE: This PR does not make any functional changes to actually implement dynamic model routing. It only adds the architecture to do so.
- `ModelRouterService` is introduced, which is responsible for deciding which model (e.g., Gemini Pro, Gemini Flash) to use for a given conversation.
- `GeminiClient` now orchestrates model selection. For the first turn of a new prompt, it calls the `ModelRouterService`.
- Downstream components (`GeminiChat`, `Turn`) have been refactored to accept the chosen model as a parameter, rather than looking it up in the config.
- The `ModelRouterService` uses a chain-of-responsibility pattern with different strategies.

The core logic changes are in `packages/core/src/core/client.ts` and the new `packages/core/src/routing/` directory.

Dive Deeper
This change introduces a more sophisticated approach to model management. Previously, the model was a static configuration value. Now, it's a dynamic decision made at the start of each conversational sequence (identified by a `prompt_id`).

Model Stickiness
A key concept here is "model stickiness." Once the router selects a model for a prompt, that same model is used for all subsequent turns within that prompt. This is crucial for maintaining conversational context and preventing jarring shifts in tone or capability that could occur if the model changed mid-conversation. The `GeminiClient` now manages this state via the `currentSequenceModel` property.

Inversion of Control
This refactor also improves the overall architecture by applying an inversion of control pattern. Instead of lower-level modules like `GeminiChat` pulling the model from the `Config` service, the high-level `GeminiClient` now pushes the selected model down as a parameter. This makes the downstream components more modular, predictable, and easier to test in isolation.

Strategies
The initial order of strategies is:

1. `FallbackStrategy`: If the client is in fallback mode (e.g., due to a previous quota error), this strategy immediately takes over. It intelligently downgrades "pro" models to "flash" while honoring requests for "lite" models to preserve cost-saving intentions.
2. `OverrideStrategy`: If a model is explicitly set in the user's configuration, this strategy ensures that the user's choice is respected, bypassing any other logic.
3. `DefaultStrategy`: This is the terminal strategy in the chain. If no other strategy makes a decision, it ensures that we always fall back to the default `gemini-2.5-pro` model.

Future Possibilities
Although this PR does not introduce functional changes, it unlocks the ability to build more advanced routing strategies in the future, such as cost-based, capability-based, and content-based routing.
NOTE
The Default strategy is not used because the override mechanism is greedy. This is in preparation for how we will explicitly set the model once we introduce an actual dynamic router.
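As an illustration of the `FallbackStrategy` downgrade rules described under Strategies, here is a hypothetical sketch. The model names follow this PR's description; the function itself is an assumption, not the PR's implementation.

```typescript
// While quota-limited, downgrade "pro" models to flash, but honor an
// explicit "lite" request to preserve the user's cost-saving intent.
function fallbackModel(requestedModel: string): string {
  // A lite request stays lite.
  if (requestedModel.includes('lite')) return requestedModel;
  // Pro is downgraded to flash while in fallback mode.
  if (requestedModel.includes('pro')) return 'gemini-2.5-flash';
  // Anything else (e.g. flash itself) is left unchanged.
  return requestedModel;
}
```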
Reviewer Test Plan
The most effective way to validate this change is by reviewing the new unit tests, which thoroughly cover the routing and stickiness logic.
- `packages/core/src/core/client.test.ts`: See the new "Model Routing" `describe` block.
- `packages/core/src/routing/modelRouterService.test.ts`: Validates the new service logic.

Manual Validation (Optional)
Since the `DefaultStrategy` preserves the existing behavior, the CLI should function identically from a user's perspective. To see the new architecture in action, a reviewer can add logging:

- In `packages/core/src/routing/modelRouterService.ts`, log the decision in the `route` method.
- In `packages/core/src/core/client.ts`, log when the router is being called vs. when a sticky model is being used.

Running the CLI should then show logs from the `ModelRouterService` indicating it's making a decision.

Testing Matrix
Linked issues / bugs
This PR makes progress on #6079