
Merge master changes #140

Merged
tttennnnn merged 9 commits into ttft-analysis from main on Mar 6, 2026
Conversation

@tttennnnn
Collaborator

No description provided.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the VQA benchmarking framework by enabling dynamic measurement of input token counts and generating token-matched dummy text for text-only baseline comparisons. These changes improve the precision and flexibility of VQA performance evaluations across various AI providers, allowing for more robust analysis of vision encoder latency. Additionally, provider configurations were updated to support newer vision models and improve client stability.

Highlights

  • VQA Benchmarking Configuration Update: The VQA streaming benchmark configuration was updated to remove 'TogetherAI' as a provider, switch to 'vision-model-02', and increase the number of requests from 100 to 150.
  • Provider Model Mappings and Robustness: AWS Bedrock, Azure, and Google providers were updated to include new 'vision-model-01' and 'vision-model-02' mappings. The AWS Bedrock client also received configuration for connection/read timeouts and retries to improve stability.
  • Input Token Count Extraction: New get_input_token_count methods were implemented across the AWS Bedrock, Azure, and Google providers to accurately extract the number of input tokens from both streaming and non-streaming responses (see the sketch after this list).
  • Dynamic VQA Dummy Text Generation: The base ProviderInterface now includes functionality to dynamically generate dummy text for VQA benchmarks. This text is token-count-matched to the multimodal input, using Hugging Face tokenizers and a corpus from Project Gutenberg, ensuring more accurate text-only baseline comparisons.
  • VQA Logging Enhancement: The VQA log output was extended to include the multimodal_input_tokens count, providing more detailed metrics for analysis.
  • New Dependency Added: The transformers library (version 4.46.3) was added to the project dependencies to support the new tokenization functionality.
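
The provider-specific method bodies are only partially shown in the review below; as a rough illustration of the idea, an OpenAI-style variant might look like the following. This is a minimal sketch: the response shape and the prompt_tokens field are assumptions, not code from this PR.

def get_input_token_count(response) -> int:
    """Return the number of input (prompt) tokens reported by the API.

    Assumes an OpenAI-style schema: non-streaming responses expose
    `response.usage.prompt_tokens`, while streaming responses deliver the
    usage block on (usually) the final chunk. Searching backwards keeps
    the lookup robust if trailing chunks carry no usage data.
    """
    if isinstance(response, list):  # a list of collected streaming chunks
        for chunk in reversed(response):
            usage = getattr(chunk, "usage", None)
            if usage is not None:
                return usage.prompt_tokens
        return 0
    usage = getattr(response, "usage", None)
    return usage.prompt_tokens if usage is not None else 0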


Changelog
  • benchmarking/experiments/compare_providers_streaming_vqa.json
    • Removed 'TogetherAI' from the list of providers.
    • Updated the 'models' entry from 'vision-model' to 'vision-model-02'.
    • Increased 'num_requests' from 100 to 150.
  • providers/aws_provider.py
    • Imported Config from botocore.config.
    • Added 'vision-model-01' and 'vision-model-02' mappings to self.model_map.
    • Configured the boto3 client with connect_timeout, read_timeout, and retries (see the sketch after this changelog).
    • Implemented the get_input_token_count method to extract input token usage from responses.
  • providers/azure_provider.py
    • Added 'vision-model-01' and 'vision-model-02' mappings to self.model_map.
    • Implemented the get_input_token_count method to extract prompt token usage from responses.
  • providers/google_provider.py
    • Added 'vision-model-01' and 'vision-model-02' mappings to self.model_map.
    • Implemented the get_input_token_count method to extract prompt token count from responses.
  • providers/provider_interface.py
    • Added imports for random, requests, and transformers.AutoTokenizer.
    • Initialized _vqa_tokenizers and _vqa_corpus_words attributes.
    • Implemented the get_vqa_dummy_text method to generate text of a specified token count.
    • Modified the VQA log header to include 'multimodal_input_tokens'.
    • Removed the dummy_text field from VQA item processing.
    • Dynamically generated text_only_messages using get_vqa_dummy_text based on multimodal input token count.
    • Added a print statement to report input tokens for the text-only pass.
  • requirements.txt
    • Added transformers==4.46.3 to the list of dependencies.
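
As a concrete illustration of the aws_provider.py client change above, the configuration could look like the following. This is a minimal sketch: the timeout and retry values are assumptions, since the PR's actual values are not shown on this page.

import boto3
from botocore.config import Config

# Explicit timeouts and retries guard against hung connections during
# long streaming benchmarks; the values below are illustrative only.
_BEDROCK_CONFIG = Config(
    connect_timeout=10,    # seconds to establish the TCP connection
    read_timeout=300,      # generous read timeout for slow streams
    retries={"max_attempts": 3, "mode": "standard"},
)

client = boto3.client("bedrock-runtime", config=_BEDROCK_CONFIG)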

@tttennnnn merged commit 75acf26 into ttft-analysis on Mar 6, 2026
3 checks passed

@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces a new methodology for benchmarking Visual Question Answering (VQA) models by measuring vision encoder latency, comparing time-to-first-token (TTFT) of multimodal and text-only requests. It includes helper methods for dummy text generation and input token extraction. Please note that the security review could not be completed as the provided model response was malformed and did not adhere to the specified JSON schema. Furthermore, a critical bug in the dummy text generation has been identified that could invalidate benchmark results, and there are opportunities for improving robustness and code quality.
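
To make the methodology concrete: the vision-encoder latency estimate is the TTFT gap between the two passes. A minimal sketch of that comparison (the function name and aggregation are illustrative, not from the PR):

from statistics import mean

def estimate_vision_encoder_latency(
    multimodal_ttfts: list[float], text_only_ttfts: list[float]
) -> float:
    """Estimate vision-encoder latency as the mean TTFT gap between
    multimodal requests and token-count-matched text-only requests.

    Assumes both request sets target the same model, so prompt prefill
    cost is comparable and the residual gap reflects image encoding.
    """
    return mean(multimodal_ttfts) - mean(text_only_ttfts)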

Comment on lines +285 to +297
if not self._vqa_corpus_words:
    url = "https://www.gutenberg.org/cache/epub/2701/pg2701.txt"
    try:
        text = requests.get(url, timeout=10).text
        all_ids = tokenizer.encode(text, add_special_tokens=False)
        special_ids = set(tokenizer.all_special_ids)  # Filter out special tokens
        self._vqa_corpus_words = [tid for tid in all_ids if tid not in special_ids]
    except Exception as e:
        print(f"Fallback to dummy tokens: {e}")
        self._vqa_corpus_words = tokenizer.encode("The quick brown fox jumps over the lazy dog", add_special_tokens=False)

# Build dummy text
sampled_ids = random.choices(self._vqa_corpus_words, k=total_tokens)


critical

There is a critical bug here. The _vqa_corpus_words attribute is shared across all models, but it stores token IDs that are specific to a particular tokenizer. When get_vqa_dummy_text is called for a second model with a different tokenizer, it will incorrectly reuse the token IDs from the first model's tokenizer. This will lead to incorrect dummy text generation and invalidate the benchmark results.

To fix this, _vqa_corpus_words should be a dictionary keyed by the normalized model_id, similar to _vqa_tokenizers. You will also need to update __init__ to initialize self._vqa_corpus_words = {} (a sketch follows the suggested change).

Suggested change
-if not self._vqa_corpus_words:
-    url = "https://www.gutenberg.org/cache/epub/2701/pg2701.txt"
-    try:
-        text = requests.get(url, timeout=10).text
-        all_ids = tokenizer.encode(text, add_special_tokens=False)
-        special_ids = set(tokenizer.all_special_ids)  # Filter out special tokens
-        self._vqa_corpus_words = [tid for tid in all_ids if tid not in special_ids]
-    except Exception as e:
-        print(f"Fallback to dummy tokens: {e}")
-        self._vqa_corpus_words = tokenizer.encode("The quick brown fox jumps over the lazy dog", add_special_tokens=False)
-
-# Build dummy text
-sampled_ids = random.choices(self._vqa_corpus_words, k=total_tokens)
+# Load corpus for the specific tokenizer if not already loaded
+if not self._vqa_corpus_words.get(model_id):
+    url = "https://www.gutenberg.org/cache/epub/2701/pg2701.txt"
+    try:
+        text = requests.get(url, timeout=10).text
+        all_ids = tokenizer.encode(text, add_special_tokens=False)
+        special_ids = set(tokenizer.all_special_ids)  # Filter out special tokens
+        self._vqa_corpus_words[model_id] = [tid for tid in all_ids if tid not in special_ids]
+    except Exception as e:
+        print(f"Fallback to dummy tokens: {e}")
+        self._vqa_corpus_words[model_id] = tokenizer.encode("The quick brown fox jumps over the lazy dog", add_special_tokens=False)
+
+# Build dummy text
+sampled_ids = random.choices(self._vqa_corpus_words[model_id], k=total_tokens)
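
For completeness, a minimal sketch of the matching __init__ change described above (the surrounding class context is illustrative):

class ProviderInterface:
    def __init__(self):
        # Both caches are keyed by the normalized model_id so that token IDs
        # produced by one tokenizer are never reused with another (the bug above).
        self._vqa_tokenizers = {}
        self._vqa_corpus_words = {}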

Comment on lines +131 to +133
last_chunk = response[-1]
usage = last_chunk.get('usage_metadata', {})
return usage.get('prompt_token_count', 0)


medium

This implementation assumes that usage_metadata is always present in the last chunk of the streaming response. If for some reason it's in an earlier chunk (e.g., followed by empty chunks), this will fail to find it and return 0. To make this more robust, it's better to iterate backwards through the response to find the last chunk containing usage data, similar to the implementation for the AWS and Azure providers.

            # The usage metadata is usually in the last chunk, but we search backwards for robustness.
            for chunk in reversed(response):
                if 'usage_metadata' in chunk:
                    usage = chunk.get('usage_metadata', {})
                    return usage.get('prompt_token_count', 0)
            return 0

elif 'gemini-3' in lowered_id:
    model_id = 'gemini-3'
else:
    raise Exception('Invalid model for VQA input type.')


medium

Using a more specific exception type like ValueError instead of the generic Exception is a good practice as it allows for more granular error handling. Including the invalid model_id in the error message would also be helpful for debugging.

Suggested change
-raise Exception('Invalid model for VQA input type.')
+raise ValueError(f'Invalid model for VQA input type: {model_id}')

Comment on lines +301 to +303
final_ids = tokenizer.encode(dummy_text, add_special_tokens=False)
if len(final_ids) < total_tokens:
    dummy_text += "." * (total_tokens - len(final_ids))


medium

The current padding logic is not guaranteed to produce text with the exact total_tokens count. The added . characters can be merged with the preceding token by the tokenizer, resulting in a lower-than-expected token count. This could affect the accuracy of the benchmark. A more robust approach is to iteratively pad and re-tokenize until the target count is met, and then truncate if you've overshot.

        final_ids = tokenizer.encode(dummy_text, add_special_tokens=False)
        # Iteratively adjust to ensure the token count is precise.
        while len(final_ids) < total_tokens:
            dummy_text += " ."  # Adding a space helps prevent merging.
            final_ids = tokenizer.encode(dummy_text, add_special_tokens=False)

        # If we overshot, truncate the token list and decode back to text.
        if len(final_ids) > total_tokens:
            dummy_text = tokenizer.decode(final_ids[:total_tokens], clean_up_tokenization_spaces=False)
