It would be worth having a generic module for everything related to stochastic approximation, kept separate from variational inference itself. E.g., an SGD class that makes different stochastic gradient methods available, a learning rate class for testing various learning rate schedules, a subsampling class, etc. This will eventually be necessary as we start working on more research tracks, e.g., Mandt and Blei (2014), Theis and Hoffman (2015), Tran et al. (2015).
It should also be applicable to computing the penalized MLE, so that Stan's optimization interface also makes SGD available to users.
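To make the proposed separation concrete, here is a rough sketch of how the three pieces (learning rate, subsampling, SGD loop) could be decoupled. It is written in Python only for brevity and is not Stan code. All names (`ConstantRate`, `PolynomialDecayRate`, `Subsampler`, `sgd`) are hypothetical and do not correspond to any existing Stan API. The toy problem at the end is an assumed example of a penalized MLE: the mean of a unit-variance normal with an L2 penalty, whose closed-form solution is `sum(y) / (n + lam)`.

```python
import random
from typing import Callable, List, Sequence


class ConstantRate:
    """Learning rate that stays fixed at eta for every iteration."""

    def __init__(self, eta: float) -> None:
        self.eta = eta

    def __call__(self, t: int) -> float:
        return self.eta


class PolynomialDecayRate:
    """Robbins-Monro style rate: eta * (t + tau) ** (-kappa)."""

    def __init__(self, eta: float, tau: float = 1.0, kappa: float = 1.0) -> None:
        self.eta, self.tau, self.kappa = eta, tau, kappa

    def __call__(self, t: int) -> float:
        return self.eta * (t + self.tau) ** (-self.kappa)


class Subsampler:
    """Draws minibatches of data indices uniformly, without replacement."""

    def __init__(self, n: int, batch_size: int, seed: int = 0) -> None:
        self.n, self.batch_size = n, batch_size
        self.rng = random.Random(seed)

    def draw(self) -> List[int]:
        return self.rng.sample(range(self.n), self.batch_size)


def sgd(
    grad: Callable[[List[float], List[int]], List[float]],
    theta0: Sequence[float],
    rate: Callable[[int], float],
    sampler: Subsampler,
    n_iter: int,
) -> List[float]:
    """Plain SGD: theta <- theta - rate(t) * grad(theta, minibatch)."""
    theta = list(theta0)
    for t in range(n_iter):
        batch = sampler.draw()
        g = grad(theta, batch)
        step = rate(t)
        theta = [th - step * gi for th, gi in zip(theta, g)]
    return theta


# Toy penalized MLE (assumed example): minimize the per-datum average of
#   0.5 * sum_i (y_i - mu)^2 + 0.5 * lam * mu^2
y = [1.0, 2.0, 3.0, 4.0]
lam = 1.0
n = len(y)


def grad(theta: List[float], batch: List[int]) -> List[float]:
    """Unbiased minibatch estimate of the gradient of the averaged objective."""
    mu = theta[0]
    data_term = sum(mu - y[i] for i in batch) / len(batch)
    return [data_term + (lam / n) * mu]


mu_star = sum(y) / (n + lam)  # closed-form penalized MLE

# Minibatch SGD with a decaying rate.
mu_hat = sgd(grad, [0.0], PolynomialDecayRate(eta=1.0), Subsampler(n, 2), 5000)

# Full-batch gradient descent falls out of the same interface.
mu_full = sgd(grad, [0.0], ConstantRate(0.5), Subsampler(n, n), 100)

print(mu_star, mu_hat[0], mu_full[0])
```

Because the rate and the subsampler are passed in as separate objects, swapping a schedule or a batching scheme does not touch the update loop, and the same `sgd` routine serves any objective that can supply a minibatch gradient, whether that is an ELBO or a penalized log-likelihood.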