
Commit 881903e

add advantage normalization for ppo gae
1 parent 7f2bb74 commit 881903e


1 file changed: +4 -2 lines changed


ppo_gae_discrete.py

Lines changed: 4 additions & 2 deletions
@@ -75,7 +75,9 @@ def train_net(self):
 advantage_lst.append([advantage])
 advantage_lst.reverse()
 advantage = torch.tensor(advantage_lst, dtype=torch.float)
-
+# this can have significant improvement (efficiency, stability) on performance
+advantage = (advantage - advantage.mean()) / (advantage.std() + 1e-5)
+
 pi = self.pi(s, softmax_dim=-1)
 dist_entropy = Categorical(pi).entropy()
 pi_a = pi.gather(1,a)
@@ -126,4 +128,4 @@ def main():
 env.close()
 
 if __name__ == '__main__':
-main()
+main()
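A framework-free sketch of what the patched part of `train_net` computes, assuming a single trajectory with no episode-boundary masking. `compute_gae` follows the backward recursion that fills `advantage_lst`, and `normalize` mirrors the added normalization line. Both helper names are hypothetical, plain lists stand in for `torch` tensors, and the TD errors and hyperparameters in the demo are made up for illustration. `statistics.stdev` divides by n - 1, which matches the default of `torch.Tensor.std()`.

```python
import statistics
from typing import List


def compute_gae(deltas: List[float], gamma: float, lmbda: float) -> List[float]:
    """Backward GAE recursion: A_t = delta_t + gamma * lmbda * A_{t+1}.

    Walks the TD errors from last to first, accumulating the discounted sum,
    then reverses the result so it lines up with the original time order
    (the same shape as the loop that builds advantage_lst in the diff).
    Assumption: deltas come from one uninterrupted trajectory, so no
    episode-boundary masking is applied.
    """
    advantages: List[float] = []
    advantage = 0.0
    for delta_t in reversed(deltas):
        advantage = gamma * lmbda * advantage + delta_t
        advantages.append(advantage)
    advantages.reverse()
    return advantages


def normalize(advantages: List[float], eps: float = 1e-5) -> List[float]:
    """Shift advantages to zero mean and divide by (sample std + eps).

    statistics.stdev uses the n - 1 denominator, like torch.Tensor.std().
    eps keeps the division finite when all advantages are equal (std == 0).
    A single-element batch is treated as having zero spread; this is a
    choice of the sketch (torch's default std returns nan in that case).
    """
    mean = statistics.fmean(advantages)
    std = statistics.stdev(advantages) if len(advantages) > 1 else 0.0
    return [(a - mean) / (std + eps) for a in advantages]


if __name__ == "__main__":
    # Hypothetical TD errors and hyperparameters, for illustration only.
    deltas = [1.0, 0.5, -0.25, 2.0]
    raw = compute_gae(deltas, gamma=0.98, lmbda=0.95)
    norm = normalize(raw)
    print("raw:       ", [round(a, 4) for a in raw])
    print("normalized:", [round(a, 4) for a in norm])
```

Normalizing per batch subtracts the batch mean and divides by the batch spread, so the ordering of advantages within a batch is preserved (their signs can change because the mean is removed), while the size of the policy-gradient term becomes largely independent of the reward scale. The `1e-5` term only matters when the advantages in a batch are nearly identical.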
