This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Exactly reproduce 56 layers ResNet on CIFAR10#2046

Merged
antinucleon merged 3 commits into apache:master from Answeror:example/resnet
May 9, 2016

Conversation


Answeror commented May 6, 2016

Test accuracy is 0.9309 with this patch, versus 0.9303 in Kaiming He's paper.
The reported accuracy is the best of the last 3 epochs (0.930288, 0.930889 and 0.929587), whereas the original paper selects the best of 5 runs. The Dockerfile and log are at: https://gist.github.com/Answeror/f9160145e1c64bb509f52c00014bdb77

The only differences between this patch and Facebook's implementation (https://github.com/gcr/torch-residual-networks and https://github.com/facebook/fb.resnet.torch) are:

  1. The kernel of the shortcut with downsampling is 2x2 rather than 1x1. I can't reproduce this accuracy with a 1x1 kernel. Note that the shortcut does not contain learnable parameters.
  2. I use a BatchNorm after the data layer to simulate z-score normalization, although subtracting (127, 127, 127) and dividing by 60 works equally well.
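
To make the two input-normalization options in item 2 concrete, here is a minimal NumPy sketch. It is not the code from this patch (the patch uses a BatchNorm layer after the data layer); the function names `zscore_normalize` and `fixed_normalize`, the use of per-channel batch statistics, and the small stabilizing constant are assumptions made for illustration.

```python
import numpy as np


def zscore_normalize(images):
    """Per-channel z-score over a batch shaped (N, C, H, W).

    Hypothetical helper: statistics are taken over the batch and the
    spatial axes, roughly what a BatchNorm on the raw input computes.
    """
    mean = images.mean(axis=(0, 2, 3), keepdims=True)
    std = images.std(axis=(0, 2, 3), keepdims=True)
    return (images - mean) / (std + 1e-5)  # small constant avoids division by zero


def fixed_normalize(images, shift=127.0, scale=60.0):
    """Fixed alternative from the description: subtract 127, divide by 60."""
    return (images - shift) / scale


# Random stand-in for a CIFAR-10 batch (8 RGB images of 32x32 pixels).
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 3, 32, 32)).astype(np.float32)

z = zscore_normalize(batch)
f = fixed_normalize(batch)
print("z-score mean per channel:", z.mean(axis=(0, 2, 3)))
print("fixed   mean per channel:", f.mean(axis=(0, 2, 3)))
```

Both variants bring the input to roughly zero mean and unit scale; the fixed constants only approximate the dataset statistics.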

Some details affect the accuracy:

  1. Z-score normalization of the input.
  2. Weight decay on all parameters (weight, bias, gamma, beta). See the comments in train_cifar10_resnet.py for details.
  3. Nesterov momentum
  4. fix_gamma=False in BatchNorm (gamma is necessary because of the weight decay of the conv weight)
  5. Initialization
  6. 4 pixel padding
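
Items 2, 3 and 6 above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the code of this patch: `pad_and_random_crop` and `nesterov_sgd_step` are hypothetical names, the padding is assumed to be followed by a random crop back to the original size (the usual CIFAR-10 augmentation), the Nesterov update follows the formulation used by Torch's SGD with zero dampening, and the default hyperparameters are placeholders.

```python
import numpy as np


def pad_and_random_crop(image, pad=4, rng=None):
    """Zero-pad a (C, H, W) image by `pad` pixels per side, then take a
    random crop of the original (H, W) size. Hypothetical helper."""
    if rng is None:
        rng = np.random.default_rng()
    _, h, w = image.shape
    padded = np.pad(image, ((0, 0), (pad, pad), (pad, pad)), mode="constant")
    top = rng.integers(0, 2 * pad + 1)   # offset in [0, 2 * pad]
    left = rng.integers(0, 2 * pad + 1)
    return padded[:, top:top + h, left:left + w]


def nesterov_sgd_step(w, grad, velocity, lr=0.1, momentum=0.9, wd=1e-4):
    """One SGD step with Nesterov momentum and L2 weight decay.

    The same rule is meant to be applied to every parameter, including
    bias, gamma and beta, which is what "weight decay on all parameters"
    refers to. Default values are assumptions for illustration.
    """
    g = grad + wd * w                       # weight decay added to the gradient
    velocity = momentum * velocity + g      # momentum buffer update
    w = w - lr * (g + momentum * velocity)  # Nesterov look-ahead step
    return w, velocity


image = np.ones((3, 32, 32), dtype=np.float32)
crop = pad_and_random_crop(image, rng=np.random.default_rng(0))
print(crop.shape)

w, v = nesterov_sgd_step(np.array([1.0]), np.array([0.5]), np.array([0.0]))
print(w, v)
```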

Thanks to #1230 (@freesouls) and #1041 (@shuokay) for providing preliminary implementations.

Update (2016-06-08)

With #2366 and a batch size of 64, I got an accuracy of 0.939704 after 200 epochs on 2 GPUs.
Note that the accuracy is strongly affected by the batch size: the more GPUs you use, the smaller the batch size should be.
See https://gist.github.com/Answeror/f9160145e1c64bb509f52c00014bdb77#file-resnet-dual-gpu-log for the full log.

@piiswrong
Contributor

@mavenlin @antinucleon

# Note we use kernel (2, 2) rather than (1, 1) and a custom initializer
# in train_cifar10_resnet.py
# Test accuracy 0.918 on CIFAR10 with 56 layers and kernel (1, 1)
# TODO: Don't know why (1, 1) got much lower accuracy

Please use TODO(Answeror); otherwise it's hard for others to track the TODO.

@antinucleon antinucleon merged commit f59e82b into apache:master May 9, 2016

Godricly commented Jun 8, 2016

May I ask whether you are using cuDNN for this experiment? An eps of 1e-5 gives a bad_param error. After adding 1e-9 to it (the value my coworker used), it works well now.


Answeror commented Jun 8, 2016

@Godricly Thank you for your report. The pending PR #2366 fixes this issue with cuDNN v5 (2e-5 is OK).
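
One possible explanation for the eps behaviour discussed above, which nobody in this thread confirms, is single-precision rounding: 1e-5 stored as a 32-bit float is slightly smaller than 1e-5 as a double, so it would fail a "not below the minimum" check. The sketch below assumes the cuDNN lower bound for the batch-norm epsilon is 1e-5; `CUDNN_MIN_EPS` and `passes_min_eps_check` are hypothetical names used only for this illustration.

```python
import numpy as np

# Assumption: the library rejects epsilon values below 1e-5 (as a double).
CUDNN_MIN_EPS = 1e-5


def passes_min_eps_check(eps):
    """Round eps to float32 (as a float parameter would be stored),
    then compare it against the double-precision lower bound."""
    return float(np.float32(eps)) >= CUDNN_MIN_EPS


for eps in (1e-5, 1e-5 + 1e-9, 2e-5):
    print(eps, float(np.float32(eps)), passes_min_eps_check(eps))
```

Under that assumption, 1e-5 fails the check while both 1e-5 + 1e-9 and 2e-5 pass, which matches the values reported in the two comments above.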
