Fix Megatron OOM when calculating entropy #550

Merged
chenyushuo merged 11 commits into agentscope-ai:main from pan-x-c:feature/megatron_entropy on May 25, 2026
@pan-x-c (Collaborator) commented May 25, 2026

Description

Fixes the out-of-memory (OOM) error that Megatron hits when calculating entropy, as the title says.

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@pan-x-c (Collaborator, Author) commented May 25, 2026

/unittest-module-trainer

@pan-x-c (Collaborator, Author) commented May 25, 2026

/unittest-pattern-TestTrainerGSM8K

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 4 | 4 | 0 | 0 | 0 | 0 | 13m 11s |

Tests

| Test Name | Status | Flaky | Duration |
| --- | --- | --- | --- |
| tests/trainer/trainer_test.py::TestTrainerGSM8K_0_megatron::test_trainer | ✅ | | 2m 25s |
| tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer | ✅ | | 1m 37s |
| tests/trainer/trainer_test.py::TestTrainerGSM8K_2_megatron::test_trainer | ✅ | | 5m 17s |
| tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer | ✅ | | 3m 29s |

Github Test Reporter by CTRF 💚

@chenyushuo merged commit 21eb8b1 into agentscope-ai:main on May 25, 2026
1 check passed

@chenyushuo (Collaborator) commented

/unittest-module-common

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 81 | 78 | 1 | 2 | 0 | 0 | 31m 12s |

Failed Tests

| Failed Tests ❌ | Fail Message |
| --- | --- |
| ❌ tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api | The test failed in the call phase |

Skipped

| Tests | Status |
| --- | --- |
| tests/common/vllm_test.py::TestAPIServer::test_reasoning_content | skipped ⏭️ |
| tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async | skipped ⏭️ |

Tests

| Test Name | Status | Flaky | Duration |
| --- | --- | --- | --- |
| tests/common/config_test.py::TestConfig::test_all_examples_are_valid | ✅ | | 17.3s |
| tests/common/config_test.py::TestConfig::test_chat_template_path | ✅ | | 98ms |
| tests/common/config_test.py::TestConfig::test_config_flatten | ✅ | | 37ms |
| tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid | ✅ | | 210ms |
| tests/common/config_test.py::TestConfig::test_default_workflow | ✅ | | 97ms |
| tests/common/config_test.py::TestConfig::test_inference_model_base_port_falls_back_when_unavailable | ✅ | | 2ms |
| tests/common/config_test.py::TestConfig::test_inference_model_base_port_uses_engine_id | ✅ | | 1ms |
| tests/common/config_test.py::TestConfig::test_inference_model_random_port_can_use_port_reserved_by_api_server | ✅ | | 1ms |
| tests/common/config_test.py::TestConfig::test_inference_model_random_port_ignores_base_port | ✅ | | 1ms |
| tests/common/config_test.py::TestConfig::test_inference_model_without_base_port_uses_ephemeral_port | ✅ | | 1ms |
| tests/common/config_test.py::TestConfig::test_load_default_config | ✅ | | 421ms |
| tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly | ✅ | | 433ms |
| tests/common/config_test.py::TestConfig::test_multinode_inference_is_rejected_for_non_vllm_sglang_engines | ✅ | | 37ms |
| tests/common/config_test.py::TestConfig::test_multinode_vllm_config_is_valid | ✅ | | 36ms |
| tests/common/config_test.py::TestConfig::test_multinode_vllm_requires_full_node_occupancy | ✅ | | 36ms |
| tests/common/config_test.py::TestConfig::test_multinode_vllm_requires_matching_nnodes_for_full_nodes | ✅ | | 37ms |
| tests/common/config_test.py::TestConfig::test_multinode_vllm_requires_nnodes_within_cluster_size | ✅ | | 37ms |
| tests/common/config_test.py::TestConfig::test_optimizer_config_propagation | ✅ | | 99ms |
| tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster | ✅ | | 1.7s |
| tests/common/experience_extraction_test.py::TestExperienceExtraction::test_convert_completion_output_extracts_sglang_routed_experts | ✅ | | 1ms |
| tests/common/experience_extraction_test.py::TestExperienceExtraction::test_convert_completion_output_ignores_invalid_routed_experts_shape | ✅ | | 2ms |
| tests/common/experience_test.py::TestEID::test_eid_properties | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_assertions | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_build_experience_token_view_aligns_prompt_action_mask_and_logprobs | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_deserialize_legacy_pickle_payload | ✅ | | 2ms |
| tests/common/experience_test.py::TestExperience::test_deserialize_single_rejects_batch_payload | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_dpo_experience | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_action_mask | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_decoded_token_text | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion | ✅ | | 17ms |
| tests/common/experience_test.py::TestExperience::test_multi_turn_experience | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_print_colored_tokens_writes_to_file | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_serialize_deserialize | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_serialize_deserialize_with_routed_experts | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_serialize_many_deserialize_many | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_serialize_many_deserialize_many_with_routed_experts | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_serialize_many_with_shared_multimodal_tensor | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_single_turn_experience | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperience::test_to_dict | ✅ | | 1ms |
| tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion | ✅ | | 1ms |
| tests/common/external_model_test.py::TestExternalModel::test_external_model_load | ✅ | | 835ms |
| tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_first_message_is_assistant | ✅ | | 1.3s |
| tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_messages_empty | ✅ | | 698ms |
| tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_mm_messages | ✅ | | 1.3s |
| tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_no_assistant_messages | ✅ | | 816ms |
| tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_normal_conversation_data | ✅ | | 1.2s |
| tests/common/sglang_test.py::TestSGLangOpenAIAPI_0::test_chat_completions | ✅ | | 4m 37s |
| tests/common/sglang_test.py::TestSGLangOpenAIAPI_1::test_chat_completions | ✅ | | 1m 5s |
| tests/common/sglang_test.py::TestSGLangOpenAIAPI_2::test_chat_completions | ✅ | | 55.1s |
| tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution | ✅ | | 1ms |
| tests/common/sudoku_test.py::test_9x9_generator_creates_holes | ✅ | | 1ms |
| tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled | ✅ | | 2ms |
| tests/common/sudoku_test.py::test_judge_allows_incomplete_board | ✅ | | 1ms |
| tests/common/sudoku_test.py::test_judge_detects_row_violation | ✅ | | 1ms |
| tests/common/sudoku_test.py::test_judge_detects_column_violation | ✅ | | 1ms |
| tests/common/sudoku_test.py::test_judge_detects_block_violation | ✅ | | 1ms |
| tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution | ✅ | | 1ms |
| tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled | ✅ | | 1ms |
| tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation | ✅ | | 1ms |
| tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation | ✅ | | 1ms |
| tests/common/vllm_test.py::ModelWrapperTest_0::test_generate | ✅ | | 1m 10s |
| tests/common/vllm_test.py::ModelWrapperTest_1::test_generate | ✅ | | 1m |
| tests/common/vllm_test.py::ModelWrapperTest_2::test_generate | ✅ | | 3m 15s |
| tests/common/vllm_test.py::TestModelLen_0::test_model_len | ✅ | | 56.8s |
| tests/common/vllm_test.py::TestModelLen_1::test_model_len | ✅ | | 54.6s |
| tests/common/vllm_test.py::TestModelLen_2::test_model_len | ✅ | | 54.9s |
| tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len | ✅ | | 54.4s |
| tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation | ✅ | | 54.1s |
| tests/common/vllm_test.py::TestMessageProcess::test_truncation_status | ✅ | | 54.2s |
| tests/common/vllm_test.py::TestAPIServer::test_api | ✅ | | 57.2s |
| tests/common/vllm_test.py::TestAPIServer::test_reasoning_content | ⏭️ | | 515ms |
| tests/common/vllm_test.py::TestLogprobs::test_logprobs_api | ✅ | | 39.3s |
| tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async | ✅ | | 36.1s |
| tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async | ⏭️ | | 1ms |
| tests/common/vllm_test.py::TestTokenizer::test_action_mask | ✅ | | 306ms |
| tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools | ✅ | | 601ms |
| tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls | ✅ | | 2m 28s |
| tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls | ✅ | | 1m 59s |
| tests/common/vllm_test.py::TestSuperLongGeneration::test_generate | ✅ | | 5m 7s |
| tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api | ❌ | | 1m 14s |
