add single_model network and use intermediate api #9412

blacksheep-Aristotle · 2024-11-12T08:54:49Z

PR types

New features

PR changes

Models

Description

添加llama ，qwen，gpt 单卡组网。
auto_trainer支持使用中层api & 运行脚本支持使用中层api

验证文档：https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/ESWJRriQZ-/sfEV74J-hHGXIR

paddle-bot · 2024-11-12T08:54:54Z

Thanks for your contribution!

jeff41404 · 2024-11-14T03:56:36Z

paddlenlp/trainer/auto_trainer.py

+                level = "os_g"
+            elif ShardingOption.FULL_SHARD in self.args.sharding:
+                level = "p_g_os"
+            model, self.optimizer = sharded_data_parallel(model, self.optimizer, level)


是否构造dp_config传入parallelize，不需要再单独进行 sharded_data_parallel

done udpated

codecov · 2024-11-15T03:27:49Z

Codecov Report

Attention: Patch coverage is 16.87500% with 1197 lines in your changes missing coverage. Please review.

Project coverage is 52.33%. Comparing base (3374e7f) to head (67bb667).
Report is 287 commits behind head on develop.

Files with missing lines	Patch %	Lines
paddlenlp/transformers/gpt/modeling_network.py	18.52%	387 Missing ⚠️
paddlenlp/transformers/llama/modeling_network.py	16.93%	368 Missing ⚠️
paddlenlp/transformers/qwen/modeling_network.py	17.50%	330 Missing ⚠️
paddlenlp/transformers/model_utils.py	3.03%	64 Missing ⚠️
paddlenlp/trainer/auto_trainer.py	0.00%	35 Missing ⚠️
paddlenlp/transformers/gpt/modeling_auto.py	15.38%	11 Missing ⚠️
paddlenlp/transformers/llama/modeling_auto.py	50.00%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #9412      +/-   ##
===========================================
- Coverage    53.17%   52.33%   -0.85%     
===========================================
  Files          718      721       +3     
  Lines       114694   113772     -922     
===========================================
- Hits         60990    59540    -1450     
- Misses       53704    54232     +528

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

JunnYu · 2024-11-22T01:45:00Z

paddlenlp/transformers/qwen/modeling_network.py

+        return logits
+
+
+loss_cnt = 0


这些都是干啥的

遗留代码，已删除。

JunnYu · 2024-11-22T01:49:26Z

paddlenlp/trainer/auto_trainer.py


 import numpy as np
 import paddle
 import paddle.distributed as dist


未来的目标是啥，全面迁移auto_trainer?如果是的话，是不是直接所有trainer逻辑都统一重写，不要与旧的耦合

按之前讨论，自动并行集成到auto_trainer，trainer主要是集成原来手动并行的逻辑，两者不耦合，只是共用基础设施

主要是公用基础设施，如果公用的api发生变化，可能会导致这里也挂了，从开发体验上来看，当前自动并行的测试监控需要能够及时发现定位这些问题

wawltor · 2024-11-22T01:47:32Z

paddlenlp/transformers/llama/modeling_network.py

+                    f"{prefix}lm_head.weight": ColWiseParallel(),
+                }
+            },
+            "pp_config": {"split_spec": f"{prefix}llama.layers"},


对于非layers层以及empty层切分方式麻烦提供下示例

同时有个疑问，share weight的参数预计怎么如何标记了？

框架会自动识别整理出share weight的参数，做特殊处理，用户在使用时当作正常参数标记即可。

wawltor · 2024-11-22T01:49:47Z

paddlenlp/transformers/llama/modeling_network.py

+    _keys_to_ignore_on_load_unexpected = [r"self_attn.rotary_emb.inv_freq"]
+
+    @classmethod
+    def _get_name_mappings(cls, config: LlamaConfig) -> list[StateDictNameMapping]:


在 auto_dist_config 配置了 sp、tp的切分信息，建议去掉name_mapping的配置，在自动并行的中层API里面进行切分

wawltor · 2024-11-22T02:01:24Z

paddlenlp/transformers/model_utils.py

+                    level = 2
+                if ShardingOption.FULL_SHARD in sharding:
+                    level = 3
+            final_config["dp_config"] = {"level": level}


需要考虑LoRA、DPO、KTO训练

LoRA训练会自定义tensor parallel的LoRA层，这里如何配置了？

KTO 和 DPO 有两个模型，一个更新参数，另外不更新参数，两个模型如何配置分布式策略了？

按之前讨论，预训练之后的流程会逐步验证支持

wawltor · 2024-11-22T02:11:25Z

paddlenlp/transformers/llama/modeling_network.py

+                warnings.warn(
+                    f"enable_parallel_cross_entropy, the vocab_size should be splited: {prediction_scores.shape[-1]}, {self.config.vocab_size}"
+                )
+                self.loss_func = paddle.nn.CrossEntropyLoss(reduction="none", ignore_index=self.ignore_index)


在大模型训练中有一些自定义的PyLayer算子，非TP组网中能支持折中PyLayer算子吗？同时折中PyLayer算子和张量并行也是耦合在一起，如何开发了?

PaddleNLP/paddlenlp/transformers/tensor_parallel_utils.py

Line 145 in 9ee691a

class FusedHeadAndCrossEntropy(PyLayer):

自动并行支持PyLayer的工作在另外开展中，相关进展可以请 @From00 介绍下

wawltor · 2024-11-22T02:18:26Z

paddlenlp/transformers/model_utils.py

+                    level = 3
+            final_config["dp_config"] = {"level": level}
+
+        return final_config


中层API对unified checkpoint的支持情况如何了？是否可以支持自适应的分布式策略的扩展

当前已支持checkpoint转换为单卡权重，单卡权重到unified checkpoint的转换在支持中

Unified checkpoint现在可以支持基本任意的分布式策略切换；

Unified checkpoint 保存出来的模型权重可以直接应用到推理框架、或者其他的训练流程中。

现在自动并行保存的格式可以同时支持这两点吗？
另外我看是需要把checkpoint转换为单卡权重，再转到unified checkpoint，是否太复杂了。能否直接支持 Unified checkpoint 的保存？

lugimzzz · 2024-11-22T02:32:19Z

paddlenlp/transformers/llama/modeling_network.py

+        )
+
+
+class LlamaPretrainingCriterion3DNet(paddle.nn.Layer):


中层API设计涉及criterion吗？比如ParallelCrossEntropy？

支持，在tensor_parallel_config中加上replace_with_parallel_cross_entropy配置即可

lugimzzz · 2024-11-22T02:35:00Z

paddlenlp/transformers/llama/modeling_network.py

+
+
+class LlamaPretrainingCriterion3DNet(paddle.nn.Layer):
+    """


中层API的设计主要是涉及DP和TP，和PP的场景需要特殊的兼容吗？

中层api也支持pp，详情可参考文档：http://preview-pr-7011.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/distributed/parallelize_cn.html

lugimzzz · 2024-11-22T02:37:46Z

paddlenlp/transformers/llama/modeling_network.py

+                    f"{prefix}llama.layers.*.mlp.up_proj": ColWiseParallel(),
+                    f"{prefix}llama.layers.*.mlp.gate_up_fused_proj": ColWiseParallel(),
+                    f"{prefix}llama.layers.*.mlp.down_proj": RowWiseParallel(),
+                    f"{prefix}lm_head.weight": ColWiseParallel(),


ColWiseParallel()后原先的linear会转化为ColumnParallelLinear或ColumnSequenceParallelLinear，还是另一个新的linear类型？

ColWiseParallel会对指定的linear的权重用基础api重写一遍，在后续运行的时候自动并行会自动推出来分布式状态同时插入通信算子。不会改变linear类型。

DesmonDay · 2024-11-22T03:35:50Z

llm/auto_parallel/llama/run_pretrain_auto.py

    config.use_recompute = training_args.recompute
    config.tensor_parallel_degree = training_args.tensor_parallel_degree
    config.tensor_parallel_rank = training_args.tensor_parallel_rank
+    config.sharding_parallel_degree = training_args.sharding_parallel_degree


看了一下代码，为啥在400多行创建Topology的位置，sharding_degree默认设置为1？

auto_parallel's sharding is not orthogonal with dp, mp and pp

dp_degree已经包含了sharding_degree，所以sharding_degree设为1即可。

ZHUI · 2024-12-24T03:58:15Z

paddlenlp/transformers/gpt/modeling_auto.py

            normalized_shape=normalized_shape, epsilon=epsilon, weight_attr=weight_attr, bias_attr=bias_attr
        )
        self.config = config
+        self.ipp = ipp


ipp 是什么，为什现在需要额外传入 ipp ?

基础组网需要传入ipp，表示该layer在pipeline stage中的位置。

可以避免这样传参数吗？每一个层都要接一个这样的参数，很麻烦。这个不能自动做吗？

ZHUI · 2024-12-24T04:00:03Z

paddlenlp/transformers/llama/modeling_network.py

+    flash_attention = None
+
+__all__ = [
+    "LlamaForCausalLM3DNet",


Suggested change

"LlamaForCausalLM3DNet",

"LlamaForCausalLM3DNet",

为什么要叫 3DNet，为什么要加特殊的 3D 前缀？

因为基础api组网中带了3D前缀，表示该网络支持pp，dp，tp 3d混合并行，故此保留

单卡和3D的组网不一样吗?如果一样是不是可以直接去掉3D的前缀？

ZHUI · 2024-12-24T04:00:59Z

paddlenlp/transformers/llama/modeling_network.py

+
+
+class LlamaMLPNet(nn.Layer):
+    def __init__(self, config, ipp: Optional[int] = None):


ipp 是什么意思，代表什么？必须传入吗?

遗留代码，已删除，done

ZHUI · 2024-12-24T04:03:11Z

paddlenlp/transformers/llama/modeling_network.py

+        #     output = (logits,) + outputs[1:]
+        #     return (loss,) + output if loss is not None else output
+
+        # return CausalLMOutputWithCrossAttentions(


这些不支持吗? 可以支持 model.generate 生成吗?

动转静要求损失函数和模型分离，故此注释。

ZHUI · 2024-12-24T04:03:48Z

paddlenlp/transformers/llama/modeling_network.py

+        # )
+
+    def auto_dist_config(self, prefix=""):
+        if prefix != "":


这些东西需要如何配置？有文档介绍吗？

详情可参考文档：http://preview-pr-7011.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/distributed/parallelize_cn.html

ZHUI · 2024-12-24T04:11:30Z

paddlenlp/transformers/model_utils.py

            )

+    def merge_auto_dist_configs(self, configs):
+        """


这个函数必须放到模型基类里面吗？可以放到 autotrainer 吗?

因为可能会有如下场景，模型A包含模型B和模型C。模型A，模型B和模型C都有各自的分布式配置，所以放在模型的基类中便于处理，能方便的找到各个模型自己的分布式配置，在auto trainer中merge最终得到模型A的分布式配置。

ZHUI · 2024-12-24T04:11:57Z

paddlenlp/transformers/model_utils.py

+
+        return final_config
+
+    def _generate_auto_dist_config(self, auto_dist_degree):


同上，看能不能放到 autotrainer

ZHUI · 2024-12-24T04:13:50Z

paddlenlp/transformers/qwen/modeling_3D_auto.py

        attention_mask = paddle.where(attention_mask, zero, neg_inf)
        attention_mask = dist.shard_tensor(attention_mask, get_mesh(), [dist.Replicate(), dist.Replicate()])
        hidden_states = self.drop(hidden_states)
-        hidden_states = dist.reshard(hidden_states, get_mesh(), [dist.Shard(0), dist.Replicate()])


modeling_3D_auto.py
modeling_auto.py

要不统一一下，我看不同模型不同写法

ZHUI · 2024-12-25T08:47:13Z

paddlenlp/transformers/gpt/modeling_auto.py

            normalized_shape=normalized_shape, epsilon=epsilon, weight_attr=weight_attr, bias_attr=bias_attr
        )
        self.config = config
+        self.ipp = ipp


可以避免这样传参数吗？每一个层都要接一个这样的参数，很麻烦。这个不能自动做吗？

ZHUI · 2024-12-25T08:50:37Z

paddlenlp/transformers/model_utils.py

+            "pp_config": None,
+        }
+        for name, layer in self.named_sublayers(include_self=True):
+            if hasattr(layer, "auto_dist_config"):


这个属性每层layer 需要额外设置吗？

问题一：基础组网暂时无法避免。因为每一层layer都需要知道自己在pipeline stage中的位置。
问题二：不需要。只是为了处理模型A包含模型B和模型C的场景。此时模型B和模型C属于模型A的sublayer且都有自己的auto_dist_config

补充下我的理解：
“可以避免这样传参数吗？每一个层都要接一个这样的参数，很麻烦。这个不能自动做吗？”
使用自动并行基础API的组网(modeling_auto.py)需要ipp参数，和之前一样，这种组网在自动并行中层API成熟之后会逐步退场。
而使用自动并行中层API的组网(modeling_network.py)即单卡组网就不需要ipp参数了，从代码中也可以看出来，是未来建议使用的方式

ZHUI · 2024-12-25T08:54:19Z

scripts/distribute/ci_case_auto.sh

    mem=-1
    echo "result: loss=$loss ips=$ips mem=$mem loss_md5=$loss_md5"
-    loss_base=10.59486389 # output of dropout is different after supporting spmd
+    loss_base=10.55848312 # output of dropout is different after supporting spmd


这些影响了啥呀，为啥改。

gpt初始化权重改变了

ZHUI · 2024-12-25T08:57:44Z

paddlenlp/transformers/llama/modeling_network.py

+    flash_attention = None
+
+__all__ = [
+    "LlamaForCausalLM3DNet",


单卡和3D的组网不一样吗?如果一样是不是可以直接去掉3D的前缀？

JunnYu · 2024-12-26T09:13:20Z

paddlenlp/transformers/model_utils.py

+        """
+        Merged all auto dist configs into one config.
+        """
+        assert isinstance(configs, (dict, list))


这种assert，注明一下报错原因吧

JunnYu · 2024-12-26T09:13:47Z

paddlenlp/transformers/model_utils.py

+                    final_config["sp_config"] = config["sp_config"]
+                else:
+                    for k, v in config["sp_config"]["parallelize_plan"].items():
+                        assert k not in final_config["sp_config"]["parallelize_plan"].keys()


JunnYu · 2024-12-26T09:14:11Z

paddlenlp/transformers/model_utils.py

+            "sp_config": None,
+            "pp_config": None,
+        }
+        for config in configs:


整体适当加一些注释，便于理解

JunnYu · 2024-12-26T09:18:53Z

paddlenlp/trainer/auto_trainer.py

+            assert model is not None
+            assert isinstance(model, PretrainedModel)


assert 添加一些后面error的原因

JunnYu · 2024-12-26T09:30:23Z

paddlenlp/transformers/qwen/modeling_network.py

+        # # up
+        # a1 = self.w1(hidden_states)
+        # # gate
+        # a2 = self.w2(hidden_states)
+        # intermediate_parallel = a1 * F.silu(a2)
+        # down


如果是没用的一些代码可以删除

wawltor · 2024-12-27T01:53:58Z

llm/auto_parallel/gpt-3/gpt_with_intermediate.sh

+# export PYTHONPATH=../../../:$PYTHONPATH
+
+python -u -m paddle.distributed.launch \
+    --gpus "4,5,6,7" \


改成0,1,2,3是不是更合适点了？

wawltor · 2024-12-27T01:54:33Z

llm/auto_parallel/gpt-3/gpt_with_intermediate.sh

@@ -0,0 +1,113 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.


这里依赖的paddle版本是否给出明确的版本信息，可以提现在readme文档上

wawltor · 2024-12-27T02:05:36Z

paddlenlp/transformers/llama/modeling_network.py

+                    f"{prefix}lm_head.weight": dist.ColWiseParallel(),
+                }
+            },
+            "pp_config": {"split_spec": f"{prefix}llama.layers", "global_spec": "llama.global_layer"},


现在可以支持share weight的pipeline方式吗？

wawltor · 2024-12-27T02:08:34Z

paddlenlp/transformers/llama/modeling_network.py

+        if prefix != "":
+            assert prefix.endswith(".")
+        config = {
+            "sp_config": {


这里的配置有点疑问，sequence parallel虽然要依赖tensor parallel，但这里的配置和 tp config大部分是重复，是否可以减少点重复配置？

可以在后续优化

lugimzzz · 2024-12-27T09:48:19Z

对于DPO、KTO pipeline场景需要替换原有的criterion变成定制的DPOCriterion和KTOCriterion是否能支持
DPOCriterion pipeline需要通过公共变量infohub传出logit 半自动并行有影响吗 https://github.com/lugimzzz/PaddleNLP/blob/develop/paddlenlp/trl/dpo_criterion.py#L299C1-L304C24 https://github.com/lugimzzz/PaddleNLP/blob/develop/paddlenlp/trl/dpo_trainer.py#L342C1-L349C31
那么原来是nn.Linear现在还是nn.Linear?那么我如何判断这个linear是rowparallellinear还是 columnparallellinear
在LoRA 我原本是把ColumnParallelLinear替换为 ColumnParallelLoRALinear，现在我要怎么操作https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/peft/lora/lora_model.py#L509C1-L527C14

DesmonDay · 2024-12-27T09:52:57Z

llm/auto_parallel/gpt-3/run_pretrain_auto.py

    AutoTokenizer,
    CosineAnnealingWithWarmupDecay,
    GPTConfig,
    GPTForCausalLMAuto,


是否可以考虑统一一个run_pretrain_auto.py，不然每个模型维护一个脚本，维护成本比较大。

考率到修改启动脚本涉及到的ci/ce脚本较多。故计划此pr合入之后提交个pr统一修改

DesmonDay · 2024-12-27T10:05:36Z

paddlenlp/trainer/auto_trainer.py

    def _wrap_for_auto(self, model, train_dataloader):
-        logger.info("Wrapping model for auto paralle")
+        logger.info(f"Wrapping model for auto parallel using intermediate api {self.args.use_intermediate_api} ")
        dist_loader = self._wrap_for_dist_loader(train_dataloader)


这块dist_loader和我们目前PaddleNLP的 distributed_dataloader，区别大吗？看起来功能差得有点多

区别不大，只是包了一层dataloader。对dataloader其余的功能支持还在陆续开发中。coming soon

blacksheep-Aristotle · 2024-12-27T11:36:28Z

对于DPO、KTO pipeline场景需要替换原有的criterion变成定制的DPOCriterion和KTOCriterion是否能支持

DPOCriterion pipeline需要通过公共变量infohub传出logit 半自动并行有影响吗 https://github.com/lugimzzz/PaddleNLP/blob/develop/paddlenlp/trl/dpo_criterion.py#L299C1-L304C24 https://github.com/lugimzzz/PaddleNLP/blob/develop/paddlenlp/trl/dpo_trainer.py#L342C1-L349C31

那么原来是nn.Linear现在还是nn.Linear?那么我如何判断这个linear是rowparallellinear还是 columnparallellinear
在LoRA 我原本是把ColumnParallelLinear替换为 ColumnParallelLoRALinear，现在我要怎么操作https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/peft/lora/lora_model.py#L509C1-L527C14

1.sft/dpo/ppo的支持还在开发验证当中。当前pr只考虑了预训练的场景。
2.同上
3.是的.对于基础api，可以通过判断权重的切分方式来对lora weight设置分布式状态。对于中层api，可以直接修改layer的auto_dist_config。

blacksheep-Aristotle · 2024-12-27T11:38:13Z

对于DPO、KTO pipeline场景需要替换原有的criterion变成定制的DPOCriterion和KTOCriterion是否能支持

DPOCriterion pipeline需要通过公共变量infohub传出logit 半自动并行有影响吗 https://github.com/lugimzzz/PaddleNLP/blob/develop/paddlenlp/trl/dpo_criterion.py#L299C1-L304C24 https://github.com/lugimzzz/PaddleNLP/blob/develop/paddlenlp/trl/dpo_trainer.py#L342C1-L349C31

那么原来是nn.Linear现在还是nn.Linear?那么我如何判断这个linear是rowparallellinear还是 columnparallellinear
在LoRA 我原本是把ColumnParallelLinear替换为 ColumnParallelLoRALinear，现在我要怎么操作https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/peft/lora/lora_model.py#L509C1-L527C14
比如当前适配lora的做法：

DesmonDay · 2024-12-31T03:37:18Z

Unified checkpoint现在可以支持基本任意的分布式策略切换；
Unified checkpoint 保存出来的模型权重可以直接应用到推理框架、或者其他的训练流程中。

现在自动并行保存的格式可以同时支持这两点吗？
另外我看是需要把checkpoint转换为单卡权重，再转到unified checkpoint，是否太复杂了。能否直接支持 Unified checkpoint 的保存？

blacksheep-Aristotle · 2025-01-01T17:11:45Z

Unified checkpoint现在可以支持基本任意的分布式策略切换； Unified checkpoint 保存出来的模型权重可以直接应用到推理框架、或者其他的训练流程中。

现在自动并行保存的格式可以同时支持这两点吗？另外我看是需要把checkpoint转换为单卡权重，再转到unified checkpoint，是否太复杂了。能否直接支持 Unified checkpoint 的保存？

这个还在开发中呢。

lugimzzz · 2025-01-02T06:40:37Z

对于DPO、KTO pipeline场景需要替换原有的criterion变成定制的DPOCriterion和KTOCriterion是否能支持

DPOCriterion pipeline需要通过公共变量infohub传出logit 半自动并行有影响吗 https://github.com/lugimzzz/PaddleNLP/blob/develop/paddlenlp/trl/dpo_criterion.py#L299C1-L304C24 https://github.com/lugimzzz/PaddleNLP/blob/develop/paddlenlp/trl/dpo_trainer.py#L342C1-L349C31

那么原来是nn.Linear现在还是nn.Linear?那么我如何判断这个linear是rowparallellinear还是 columnparallellinear
在LoRA 我原本是把ColumnParallelLinear替换为 ColumnParallelLoRALinear，现在我要怎么操作https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/peft/lora/lora_model.py#L509C1-L527C14

1.sft/dpo/ppo的支持还在开发验证当中。当前pr只考虑了预训练的场景。 2.同上 3.是的.对于基础api，可以通过判断权重的切分方式来对lora weight设置分布式状态。对于中层api，可以直接修改layer的auto_dist_config。

DPO/KTO主要需要考虑的是需要支持Criterion这块是否可以更加灵活的支持，而不是限制需要写在组网里

wawltor

LGTM

jeff41404 reviewed Nov 14, 2024

View reviewed changes

FeixLiu force-pushed the single_network branch from 3f735c3 to 0632d75 Compare November 20, 2024 02:18

JunnYu reviewed Nov 22, 2024

View reviewed changes

wawltor reviewed Nov 22, 2024

View reviewed changes

lugimzzz reviewed Nov 22, 2024

View reviewed changes

DesmonDay reviewed Nov 22, 2024

View reviewed changes

FeixLiu force-pushed the single_network branch 2 times, most recently from f1f4e46 to 612237d Compare November 27, 2024 05:04

FeixLiu force-pushed the single_network branch 2 times, most recently from 5e24d14 to f35407b Compare December 9, 2024 05:58

blacksheep-Aristotle force-pushed the single_network branch 2 times, most recently from 1720943 to 2ebb3dc Compare December 19, 2024 05:34

ZHUI reviewed Dec 24, 2024

View reviewed changes

blacksheep-Aristotle and others added 11 commits December 24, 2024 13:56

add single_model network and use intermediate api

b26a6ca

[AutoParallel]: fix llama_model_network run error

3f01533

New version of auto config

09d102f

bug fix

38e1e03

[AutoParallel]:add qwen run intermedaite api script

bf5f2b0

fix sharding

d9a776e

support attn_mask not None for pp

ddffc98

fix for mp

5367921

update api calling

edcd14d

update the script

f2a3574

add qwen mp/sp/pp config and process attn_mask and position_id

38de88f

blacksheep-Aristotle added 8 commits December 24, 2024 13:58

fix gpt_network to use intermediate_api

41f65c5

update ci loss baseline

ffde7ad

update gpt run_pretrain_py

5d0d2eb

fix sharding error

5a28b85

fix gpt format error

49a1cec

[AutoParallel]:fix llama vpp ci error

88c512e

[AutoParallel]:fix llama A100 ci failed

e0f6d0f

[AutoParallel]:update model formate

f00161a

blacksheep-Aristotle force-pushed the single_network branch from ca32e1d to f00161a Compare December 24, 2024 05:59

blacksheep-Aristotle added 3 commits December 24, 2024 15:07

[AutoParallel]:fix ipp error

78be8b0

[AutoParallel]:fix a100 ci error

27df94b

[AutoParallel]:fix a100 ci error

2102ad6

ZHUI reviewed Dec 25, 2024

View reviewed changes

[AutoParallel]:update model_netowrk formate

d295a76

JunnYu reviewed Dec 26, 2024

View reviewed changes

wawltor reviewed Dec 27, 2024

View reviewed changes

blacksheep-Aristotle and others added 4 commits December 27, 2024 10:40

[AutoParallel]:add explanatory note

9fe59d9

[AutoParallel]:add explanatory note

cdcfcc6

[AutoParallel]:add explanatory note

4b03203

Delete =1.0.0

67bb667

DesmonDay reviewed Dec 27, 2024

View reviewed changes

wawltor approved these changes Jan 3, 2025

View reviewed changes

wawltor merged commit 575896c into PaddlePaddle:develop Jan 3, 2025
9 of 12 checks passed



		class LlamaMLPNet(nn.Layer):
		def __init__(self, config, ipp: Optional[int] = None):


		return final_config

		def _generate_auto_dist_config(self, auto_dist_degree):

		assert model is not None
		assert isinstance(model, PretrainedModel)

		@@ -0,0 +1,113 @@
		# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.

add single_model network and use intermediate api #9412

add single_model network and use intermediate api #9412

Uh oh!

Conversation

blacksheep-Aristotle commented Nov 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR types

PR changes

Description

Uh oh!

paddle-bot bot commented Nov 12, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Nov 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wawltor Nov 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DesmonDay Dec 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

blacksheep-Aristotle Dec 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

blacksheep-Aristotle commented Nov 12, 2024 •

edited

Loading

codecov bot commented Nov 15, 2024 •

edited

Loading

wawltor Nov 22, 2024 •

edited

Loading

DesmonDay Dec 27, 2024 •

edited

Loading

blacksheep-Aristotle Dec 20, 2024 •

edited

Loading

blacksheep-Aristotle Dec 24, 2024 •

edited

Loading

blacksheep-Aristotle Dec 24, 2024 •

edited

Loading