This repository was archived by the owner on Mar 11, 2026. It is now read-only.
Describe the feature and the current behavior/state.
Hi, it would be great if someone could add a gradient accumulation optimizer to this repo. This feature is really helpful for people who train large models such as BERT with limited resources. The usage should be similar to tfa.optimizers.SWA.
There is an existing implementation of a gradient accumulator, but it targets a custom training loop rather than Keras model.fit (see link).
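To make the request concrete, here is a minimal sketch of the gradient accumulation idea in pure Python (not TensorFlow, and not tied to any tfa API): gradients from several micro-batches are summed and applied once, which simulates a larger effective batch size without holding the full batch in memory. The function names (`grad_mse`, `train_accumulated`) are illustrative, not part of any library.

```python
def grad_mse(w, xs, ys):
    # Gradient of mean squared error for the model y ~ w * x.
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def train_accumulated(w, batches, lr, accum_steps):
    # Accumulate micro-batch gradients; apply the averaged gradient
    # only every `accum_steps` steps, then reset the accumulator.
    accum = 0.0
    for step, (xs, ys) in enumerate(batches, start=1):
        accum += grad_mse(w, xs, ys)         # accumulate, don't apply yet
        if step % accum_steps == 0:
            w -= lr * (accum / accum_steps)  # one update with the average
            accum = 0.0
    return w

# With equal-size micro-batches, the averaged accumulated gradient equals
# the full-batch gradient, so both runs produce the same weight update.
xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
w_micro = train_accumulated(0.0, [(xs[:2], ys[:2]), (xs[2:], ys[2:])],
                            lr=0.01, accum_steps=2)
w_full = train_accumulated(0.0, [(xs, ys)], lr=0.01, accum_steps=1)
print(abs(w_micro - w_full) < 1e-12)  # prints True
```

A tfa optimizer wrapper would do the same bookkeeping inside `apply_gradients`, so that it works transparently with Keras model.fit.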
Relevant information
Are you willing to contribute it (yes/no): no
Are you willing to maintain it going forward? (yes/no): no
Is there a relevant academic paper? (if so, where):
Is there already an implementation in another framework? (if so, where): yes, linked above, but for a custom training loop.
Was it part of tf.contrib? (if so, where): no
Which API type would this fall under (layer, metric, optimizer, etc.)?
optimizer
Who will benefit from this feature?
All TensorFlow users.
Any other info.