- I experimented with several model architectures and found that this network performed best
- Don't blindly add dropout to every layer
- A learning rate scheduler helps a lot if your training loss is not converging
- Don't train for too many epochs on a small dataset like this one
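
The last two points can be sketched in a few lines of plain Python. This is only an illustration, not the code used for this network: `PlateauController`, its parameters (`factor`, `lr_patience`, `stop_patience`, `min_lr`), and the loss values are all made up for the example. It halves the learning rate when the monitored loss stops improving and signals when to stop training. In practice you would probably use your framework's built-in equivalents instead, such as `ReduceLROnPlateau` in PyTorch or the `ReduceLROnPlateau` and `EarlyStopping` callbacks in Keras.

```python
class PlateauController:
    """Hypothetical helper: lowers the learning rate on plateaus and signals early stopping."""

    def __init__(self, lr, factor=0.5, lr_patience=2, stop_patience=5, min_lr=1e-6):
        self.lr = lr                        # current learning rate
        self.factor = factor                # multiply lr by this on a plateau
        self.lr_patience = lr_patience      # reduce lr every N epochs without improvement
        self.stop_patience = stop_patience  # stop after N epochs without improvement
        self.min_lr = min_lr                # never reduce lr below this value
        self.best = float("inf")            # best (lowest) loss seen so far
        self.stale_epochs = 0               # epochs since the last improvement

    def step(self, loss):
        """Record one epoch's loss; return True if training should stop."""
        if loss < self.best:
            self.best = loss
            self.stale_epochs = 0
            return False
        self.stale_epochs += 1
        if self.stale_epochs % self.lr_patience == 0:
            self.lr = max(self.lr * self.factor, self.min_lr)
        return self.stale_epochs >= self.stop_patience


# Made-up per-epoch losses: improvement stalls after the third epoch.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.73, 0.74]

controller = PlateauController(lr=0.01)
stopped_at = None
for epoch, loss in enumerate(losses):
    # A real loop would train for one epoch here using controller.lr.
    if controller.step(loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, final lr {controller.lr}")
```

Whether to monitor training or validation loss is a design choice. For early stopping on a small dataset, validation loss is usually the more useful signal, since training loss can keep falling while the model overfits.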