[diffusion] Skip negative prompt encoding when guidance_scale <= 1.0 or negative_prompt is None #16919
Conversation
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
Summary of Changes: Hello @zhuyuhua-v, I'm Gemini Code Assist! This pull request introduces a significant optimization to the diffusion pipeline by making the negative prompt encoding conditional: the computationally intensive encoding step is performed only when classifier-free guidance is actively engaged.
Code Review
This pull request introduces a valuable optimization by skipping the negative prompt encoding when classifier-free guidance is not active. The logic is sound and the performance benefits are clear from the benchmarks. My review includes one suggestion to improve code maintainability by using a pre-existing flag, do_classifier_free_guidance, instead of re-evaluating the condition in two places. This will make the code cleaner and more robust against future changes.
```python
if batch.guidance_scale > 1.0 and batch.negative_prompt is not None:
    neg_outputs = self.text_encoder(
        input_ids=neg_image_inputs.input_ids,
        attention_mask=neg_image_inputs.attention_mask,
        pixel_values=neg_image_inputs.pixel_values,
        image_grid_thw=neg_image_inputs.image_grid_thw,
        output_hidden_states=True,
    )
batch.prompt_embeds.append(
    self.encoding_qwen_image_edit(outputs, image_inputs)
)
if batch.guidance_scale > 1.0 and batch.negative_prompt is not None:
    batch.negative_prompt_embeds.append(
        self.encoding_qwen_image_edit(neg_outputs, neg_image_inputs)
    )
```
This change correctly adds the optimization, but the condition `batch.guidance_scale > 1.0 and batch.negative_prompt is not None` is duplicated, which is a maintenance risk.
The `Req` class (which `batch` is an instance of) already computes a `do_classifier_free_guidance` boolean flag for this exact purpose in its `__post_init__` method.
Using the existing flag `batch.do_classifier_free_guidance` would make the code cleaner and more robust by centralizing the logic; avoiding this kind of duplication is recommended practice.
```diff
-if batch.guidance_scale > 1.0 and batch.negative_prompt is not None:
+if batch.do_classifier_free_guidance:
     neg_outputs = self.text_encoder(
         input_ids=neg_image_inputs.input_ids,
         attention_mask=neg_image_inputs.attention_mask,
         pixel_values=neg_image_inputs.pixel_values,
         image_grid_thw=neg_image_inputs.image_grid_thw,
         output_hidden_states=True,
     )
 batch.prompt_embeds.append(
     self.encoding_qwen_image_edit(outputs, image_inputs)
 )
-if batch.guidance_scale > 1.0 and batch.negative_prompt is not None:
-    batch.negative_prompt_embeds.append(
-        self.encoding_qwen_image_edit(neg_outputs, neg_image_inputs)
-    )
+if batch.do_classifier_free_guidance:
+    batch.negative_prompt_embeds.append(
+        self.encoding_qwen_image_edit(neg_outputs, neg_image_inputs)
+    )
```
```python
        image_grid_thw=neg_image_inputs.image_grid_thw,
        output_hidden_states=True,
    )
if batch.guidance_scale > 1.0 and batch.negative_prompt is not None:
```
could we generalize this logic to something like batch::should_do_cfg, and apply that logic to denoising stage as well?
actually, we have a do_classifier_free_guidance already
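For the denoising stage mentioned above, one hedged sketch of how the same flag could skip the unconditional forward pass entirely. Function and argument names here are illustrative, not the pipeline's actual API:

```python
def denoise_step(latents, t, batch, transformer):
    # Always predict noise for the (conditional) prompt.
    noise_pred_text = transformer(latents, t, batch.prompt_embeds)
    if not batch.do_classifier_free_guidance:
        # CFG is off: the unconditional branch is never needed, so both
        # its encoding and its forward pass are skipped.
        return noise_pred_text
    # CFG is on: run the unconditional branch and combine.
    noise_pred_uncond = transformer(latents, t, batch.negative_prompt_embeds)
    return noise_pred_uncond + batch.guidance_scale * (
        noise_pred_text - noise_pred_uncond
    )
```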
Thanks a lot for the notification! I've updated these changes to use `do_classifier_free_guidance` for the check.
in this case, should we skip these too?
/tag-and-rerun-ci
Motivation
Classifier-free guidance is enabled only when both `guidance_scale > 1.0` and `negative_prompt is not None`. Consequently, the negative prompt needs to be encoded only under this condition.
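To see why the unconditional branch is dead weight outside this condition, recall the standard CFG combine. A small numeric sketch (array values made up for illustration):

```python
import numpy as np

# Classifier-free guidance combines two noise predictions:
#   noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
# At guidance_scale == 1.0 this reduces to noise_pred_text exactly, so the
# unconditional prediction (and hence the negative prompt encoding) has no effect.
uncond = np.array([0.3, -0.1, 0.7])
text = np.array([0.5, 0.2, -0.4])

combined = uncond + 1.0 * (text - uncond)
assert np.allclose(combined, text)  # uncond contributes nothing at scale 1.0
```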
When either `guidance_scale <= 1.0` or `negative_prompt is None`, the unconditional noise prediction (`noise_pred_uncond`) is not used, making the negative prompt encoding redundant. Skipping it in these cases avoids unnecessary computation in the text encoder and improves inference efficiency.

Modifications
Skip negative prompt encoding when either `guidance_scale <= 1.0` or `negative_prompt is None`.

Accuracy Tests
test cmd:
input image:

prompt:
`make the clothes to red`

output with this pr:

output without this pr:

Benchmarking and Profiling
performance without this pr on MI308:
performance with this pr on MI308: