Ranger - a synergistic optimizer combining RAdam (Rectified Adam) and LookAhead in one codebase. Medium article with more info: https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d
Multiple updates: 1 - We used Ranger to beat the FastAI leaderboard score by nearly 20% (19.77%). The trick was to combine Ranger with: Mish activation function, and flat+ cosine anneal training curve. 2 - Based on that, also found .95 is better than .90 for beta1 (momentum) param (ala betas=(0.95, 0.999)). 3 - Verified no load/save issues in our codebase here. It was an issue for people that were using LookAhead/RAdam as seperate components.
Usage and notebook to test are available here: https://github.com/lessw2020/Ranger-Mish-ImageWoof-5