Exactly reproduce 56 layers ResNet on CIFAR10 by Answeror · Pull Request #2046 · apache/mxnet

Answeror · 2016-05-06T02:38:59Z

Test accuracy is 0.9309 in this patch and 0.9303 in Kaiming He's paper.
The reported accuracy is the best of the last 3 epochs (0.930288, 0.930889 and 0.929587), while the original paper selects the best of 5 runs. The Dockerfile and log are at: https://gist.github.com/Answeror/f9160145e1c64bb509f52c00014bdb77

The only differences between this patch and Facebook's implementation (https://github.com/gcr/torch-residual-networks and https://github.com/facebook/fb.resnet.torch) are:

The kernel of the downsampling shortcut is 2x2 rather than 1x1. I can't reproduce this accuracy with a 1x1 kernel. Note that the shortcut does not contain learnable parameters.
I use a BatchNorm after the data layer to simulate z-score normalization, although subtracting (127, 127, 127) and dividing by 60 works equally well.
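
For reference, the two input normalizations mentioned in the second item can be sketched in NumPy. This is only an illustration, not the code in this patch: the function names are hypothetical, the batch is random data rather than CIFAR10, and the z-score here uses plain per-channel batch statistics, whereas a BatchNorm layer also carries its own gamma/beta and running averages.

```python
import numpy as np

def zscore_normalize(images, eps=1e-5):
    """Per-channel z-score over a batch shaped (N, C, H, W).

    Hypothetical stand-in for what a BatchNorm placed right after the
    data layer roughly does to the raw pixels.
    """
    images = images.astype(np.float32)
    mean = images.mean(axis=(0, 2, 3), keepdims=True)
    std = images.std(axis=(0, 2, 3), keepdims=True)
    return (images - mean) / (std + eps)

def fixed_normalize(images, offset=127.0, scale=60.0):
    """Subtract a constant from every channel and divide by a constant."""
    return (images.astype(np.float32) - offset) / scale

# Random uint8 batch with CIFAR10-like shape (assumption: NCHW layout).
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 3, 32, 32), dtype=np.uint8)

z = zscore_normalize(batch)   # per-channel mean ~0, std ~1
f = fixed_normalize(batch)    # pixel value 127 maps to 0
```

Both variants bring the input to roughly zero mean and unit scale; the fixed constants simply avoid computing statistics at run time.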

Some details affect the accuracy:

Z-score normalization of the input.
Weight decay of all parameters (weight, bias, gamma, beta). See comments in train_cifar10_resnet.py for details.
Nesterov momentum
fix_gamma=False in BatchNorm (gamma is necessary because of the weight decay of the conv weight)
Initialization
4 pixel padding
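
As an illustration of the last item, here is a minimal NumPy sketch of 4 pixel padding used as augmentation. It assumes the usual CIFAR10 recipe of zero-padding each side and then taking a random crop of the original size; the list above only says "4 pixel padding", so the random crop and the function name `pad_and_random_crop` are assumptions, not taken from this patch.

```python
import numpy as np

def pad_and_random_crop(image, pad=4, rng=None):
    """Zero-pad a (C, H, W) image by `pad` pixels per side, then crop
    a random H x W window from the padded image (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = image.shape
    padded = np.pad(image, ((0, 0), (pad, pad), (pad, pad)), mode="constant")
    # Offsets range over 0..2*pad inclusive, so the crop always fits.
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    return padded[:, top:top + h, left:left + w]

# Usage on a dummy 3x32x32 image.
image = np.arange(3 * 32 * 32, dtype=np.float32).reshape(3, 32, 32)
augmented = pad_and_random_crop(image, pad=4, rng=np.random.default_rng(0))
```

The output keeps the 32x32 input size, so the network definition does not change; only the content is shifted by up to 4 pixels in each direction.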

Thanks to #1230 (@freesouls) and #1041 (@shuokay) for providing preliminary implementations.

update@2016-06-08

With #2366 and a batch size of 64, I got an accuracy of 0.939704 after 200 epochs on 2 GPUs.
Note that the accuracy is strongly affected by the batch size: the more GPUs you use, the smaller the batch size should be.
See https://gist.github.com/Answeror/f9160145e1c64bb509f52c00014bdb77#file-resnet-dual-gpu-log for the full log.

piiswrong · 2016-05-06T03:28:24Z

@mavenlin @antinucleon

mli · 2016-05-06T06:21:12Z

+        # Note we use kernel (2, 2) rather than (1, 1) and a custom initializer
+        # in train_cifar10_resnet.py
+        # Test accuracy 0.918 on CIFAR10 with 56 layers and kernel (1, 1)
+        # TODO: Don't know why (1, 1) got much lower accuracy


please use TODO(Answeror), otherwise it's hard for others to track the TODO

Track TODO

Godricly · 2016-06-08T07:32:07Z

May I ask if you are using cuDNN for this experiment? The 1e-5 eps gives a bad_param error. After adding 1e-9 (the value my coworker used), it works well now.

Answeror · 2016-06-08T07:39:48Z

@Godricly Thank you for your report. Pending PR #2366 fixed this issue with cuDNN v5 (2e-5 is ok).

mli reviewed May 6, 2016

Exactly reproduce 56 layers ResNet on CIFAR10

8ca342e

Track TODO

Answeror force-pushed the example/resnet branch from 8e33090 to 8ca342e on May 6, 2016 06:42

Answeror added 2 commits May 7, 2016 13:06

Merge branch 'master' into example/resnet

bd223b3

Merge branch 'master' into example/resnet

9b9a438

antinucleon merged commit f59e82b into apache:master May 9, 2016

Answeror mentioned this pull request Jun 10, 2016

bug in symbol_resnet.py #2354

Closed

antinucleon merged 3 commits into apache:master from Answeror:example/resnet
