Commit 8dbe42e

Update README.md
1 parent 09d34cf commit 8dbe42e

File tree

1 file changed (+7, −4 lines changed)


README.md

Lines changed: 7 additions & 4 deletions
@@ -2,14 +2,17 @@
 </br>
 Ranger - a synergistic optimizer combining RAdam (Rectified Adam) and LookAhead, and now GC (gradient centralization) in one optimizer.
 </br>
-Latest version 20.4.11 - new record for accuracy on benchmarks with the addition of Gradient Centralization! </br> </br>
-What is Gradient Centralization? = "GC can be viewed as a projected gradient descent method with a constrained loss function. The Lipschitzness of the constrained loss function and its gradient is better so that the training process becomes more efficient and stable." Source paper: https://arxiv.org/abs/2004.01461v2
 
+### Latest version 20.4.11 - new record for accuracy on benchmarks vs all optimizers tested, with the addition of Gradient Centralization!
+</br> </br>
+What is Gradient Centralization? = "GC can be viewed as a projected gradient descent method with a constrained loss function. The Lipschitzness of the constrained loss function and its gradient is better so that the training process becomes more efficient and stable." Source paper: https://arxiv.org/abs/2004.01461v2
+</br>
 Ranger now uses Gradient Centralization by default, and applies it to all conv and fc layers by default. However, everything is customizable so you can test with and without on your own datasets. (Turn on off via "use_gc" flag at init).
 </br>
 ### Best training results - use a 75% flat lr, then step down and run lower lr for 25%, or cosine descend last 25%.
-</br>
-</br> It's important to note that simply running one learning rate the entire time will not produce optimal results. Effectively Ranger will end up 'hovering' around the an optimal zone but can't descend into it unless it has some additional run time at a lower rate to drop down into the optimal valley.
+
+</br> Per extensive testing - It's important to note that simply running one learning rate the entire time will not produce optimal results.
+Effectively Ranger will end up 'hovering' around the optimal zone, but can't descend into it unless it has some additional run time at a lower rate to drop down into the optimal valley.
 
 ### Full customization at init:
 
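The "75% flat lr, then cosine descend last 25%" schedule described above can be sketched as a plain learning-rate multiplier function. This is a hypothetical helper for illustration, not part of Ranger itself; the `flat_then_cosine` name and `flat_frac` parameter are my own, and it would plug into a framework scheduler such as PyTorch's `LambdaLR`:

```python
import math

def flat_then_cosine(step, total_steps, flat_frac=0.75):
    """LR multiplier: 1.0 for the first flat_frac of training,
    then cosine decay from 1.0 down to 0.0 over the remainder."""
    flat_steps = int(total_steps * flat_frac)
    if step < flat_steps:
        return 1.0
    # progress through the decay phase, clamped denominator avoids div-by-zero
    progress = (step - flat_steps) / max(1, total_steps - flat_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

With PyTorch this could be attached via `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda s: flat_then_cosine(s, total_steps))`, so the base lr stays flat for 75% of training and then descends smoothly into the "optimal valley" the README describes.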