- Updated Distribution class to include dimensions.
- Updated Mujoco to v1.31
- Fixed `tensor_utils.concat_tensor_dict_list` to handle nested situations properly.
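A minimal sketch of the fixed behavior, assuming dicts of numpy arrays (the actual rllab implementation may differ in details):

```python
import numpy as np

def concat_tensor_dict_list(tensor_dict_list):
    """Concatenate a list of (possibly nested) dicts of arrays along
    the first axis, recursing into sub-dicts instead of attempting to
    concatenate them directly."""
    result = {}
    for key, example in tensor_dict_list[0].items():
        if isinstance(example, dict):
            # Nested case: recurse on the list of sub-dicts.
            result[key] = concat_tensor_dict_list(
                [d[key] for d in tensor_dict_list])
        else:
            result[key] = np.concatenate([d[key] for d in tensor_dict_list])
    return result

# Two paths with a nested "info" dict; both levels get concatenated.
paths = [
    dict(obs=np.zeros((2, 3)), info=dict(probs=np.ones((2, 4)))),
    dict(obs=np.zeros((5, 3)), info=dict(probs=np.ones((5, 4)))),
]
merged = concat_tensor_dict_list(paths)
```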
- Default nonlinearity for `CategoricalMLPPolicy` changed to `tanh` as well, for consistency.
- Added `flatten_n`, `unflatten_n` support for `Discrete` and `Product` spaces.
- Changed the `dist_info_sym` and `dist_info` interface for policies. Previously they took both the observations and actions as input arguments, where the actions were needed for recurrent policies that take both the current state and the previous action into account. However, this was rather artificial. The interface now takes the observation plus a dictionary of state-related information. An extra property `state_info_keys` is added to specify the list of keys used for state-related information; by default this is an empty list.
- Removed `lasagne_recurrent.py`, since it is not used anywhere and its functionality is replaced by `GRUNetwork` implemented in `rllab.core.network`.
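A minimal Python sketch of the new `dist_info` calling convention described above. Only `dist_info` and `state_info_keys` come from the changelog; the class names and the distribution computation here are illustrative, and the exact rllab signatures may differ:

```python
class Policy:
    # By default a policy needs no extra state-related information.
    @property
    def state_info_keys(self):
        return []

    def dist_info(self, obs, state_infos):
        raise NotImplementedError


class RecurrentPolicy(Policy):
    # A recurrent policy that conditions on the previous action now
    # declares the extra key it needs, instead of taking actions as a
    # separate argument.
    @property
    def state_info_keys(self):
        return ["prev_action"]

    def dist_info(self, obs, state_infos):
        prev_action = state_infos["prev_action"]
        # Placeholder computation of distribution parameters from the
        # observation and the previous action.
        return dict(mean=[o + a for o, a in zip(obs, prev_action)])


policy = RecurrentPolicy()
info = policy.dist_info([1.0], {"prev_action": [0.5]})
```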
- Restored the default value of the `whole_paths` parameter in `BatchPolopt` to `True`. This is more consistent with previous configurations.
- Removed the helper method `rllab.misc.ext.merge_dict`. It turns out Python's `dict` constructor already supports this functionality: `merge_dict(dict1, dict2) == dict(dict1, **dict2)`.
- Added a `min_std` option to `GaussianMLPPolicy`. This avoids the gradients becoming unstable near deterministic policies.
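The `merge_dict` replacement noted above can be checked quickly; on key collisions, the values of the second dict win:

```python
# dict(dict1, **dict2) copies dict1 and then applies dict2's entries
# as keyword overrides, which is exactly a two-dict merge.
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}
merged = dict(dict1, **dict2)
# merged == {"a": 1, "b": 3, "c": 4}
```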
- Added a method `truncate_paths` to the `rllab.sampler.parallel_sampler` module. This should be sufficient to replace the old configurable parameter `whole_paths`, which was removed during refactoring.
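A sketch of what path truncation might look like, assuming paths are dicts of per-step lists and a total sample budget; this is illustrative, not the exact rllab implementation:

```python
def truncate_paths(paths, max_samples):
    """Keep whole paths until the sample budget is reached, then
    truncate the last included path so the total number of samples
    does not exceed max_samples."""
    truncated = []
    total = 0
    for path in paths:
        n = len(path["rewards"])
        if total + n < max_samples:
            truncated.append(path)
            total += n
        else:
            keep = max_samples - total
            # Truncate every per-step field of the last path.
            truncated.append({k: v[:keep] for k, v in path.items()})
            break
    return truncated

# Two 4-step paths truncated to a 6-sample budget: the first path is
# kept whole, the second is cut to 2 steps.
paths = [dict(rewards=[1.0] * 4), dict(rewards=[1.0] * 4)]
short = truncate_paths(paths, max_samples=6)
```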
- Known issues:
  - TRPO does not work well with ReLU, since the Hessian is undefined at 0, which sometimes causes NaNs. This Theano issue is tracked at Theano/Theano#4353. If ReLU must be used, try `theano.tensor.maximum(x, 0.)` as opposed to `theano.tensor.nnet.relu`.
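The forward values of the two formulations agree; the difference is only in how Theano differentiates the expressions at 0. A numpy sketch of the equivalence (numpy stands in for the Theano symbolic ops here):

```python
import numpy as np

x = np.array([-2.0, 0.0, 3.0])
# theano.tensor.maximum(x, 0.) computes the same forward values as
# theano.tensor.nnet.relu; shown with the numpy counterparts.
relu_via_maximum = np.maximum(x, 0.)
relu_reference = np.where(x > 0, x, 0.)
```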
- Fixed a bug in TNPG (`max_backtracks` should be set to 1 instead of 0).
- Neural network policies now use `tanh` nonlinearities by default.
- Refactored the interface for `rllab.sampler.parallel_sampler`. Extracted a new module `rllab.sampler.stateful_pool` containing general parallelization utilities.
- Fixed numerous issues in tests that caused them to take too long to run.
- Merged the release branch onto master and removed the release branch, to avoid potential confusion.
Features:
- Upgraded the Mujoco interface to accommodate v1.30.