Fix validation error for other models #750

Merged
copybara-service[bot] merged 1 commit into main from fix_mega_validation on Jul 4, 2024
@RissyRan RissyRan commented Jul 3, 2024

Collaborator

Description

Nightly gpt3 175b tests hit the Megablox parallelism validation error `ValueError: Currently we only support Megablox with data parallelism` (here). This happens because the megablox flag was set to True when the brute-force implementation was cleaned up (here), so the check now also fires for models that do not use Megablox.

The solution is to run this validation only for MoE models.
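
A minimal sketch of that gating, for illustration only. It assumes the config is a plain dict; the key names (`megablox`, `num_experts`, and the three parallelism keys) and both function names are hypothetical and may not match the repository's actual config or code.

```python
# Hypothetical config keys for non-data parallelism; assumed for illustration.
NON_DATA_PARALLELISM_KEYS = (
    "tensor_parallelism",
    "pipeline_parallelism",
    "expert_parallelism",
)


def validate_megablox_parallelism(config: dict) -> None:
    """Raise if Megablox is enabled together with any non-data parallelism."""
    uses_other_parallelism = any(
        config.get(key, 1) > 1 for key in NON_DATA_PARALLELISM_KEYS
    )
    if config.get("megablox", False) and uses_other_parallelism:
        raise ValueError("Currently we only support Megablox with data parallelism")


def validate_config(config: dict) -> None:
    """Run the Megablox check only for MoE models (more than one expert)."""
    if config.get("num_experts", 1) > 1:
        validate_megablox_parallelism(config)
```

With this shape, a dense model such as gpt3 175b passes validation even when the megablox flag defaults to True, while an MoE model that combines Megablox with another parallelism strategy still raises the error.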

Test

Running train_compile.py script (non-MoE): link

@RissyRan RissyRan requested a review from raymondzouu July 3, 2024 22:35
@RissyRan RissyRan force-pushed the fix_mega_validation branch from 3935619 to 491e466 Compare July 3, 2024 22:37
@RissyRan RissyRan force-pushed the fix_mega_validation branch from 491e466 to eb5cf82 Compare July 3, 2024 22:39
@RissyRan RissyRan marked this pull request as ready for review July 3, 2024 23:01
@RissyRan RissyRan requested a review from gobbleturk as a code owner July 3, 2024 23:01

@raymondzouu raymondzouu left a comment

Collaborator

Thank you for the quick fix @RissyRan!

@copybara-service copybara-service Bot merged commit b83a7a4 into main Jul 4, 2024
@copybara-service copybara-service Bot deleted the fix_mega_validation branch July 4, 2024 00:53

3 participants