## Key takeaways

- I experimented with several model architectures and found this network to work best.
- Don't blindly add dropout to every layer.
- A learning rate scheduler helps a lot when your training loss is not converging.
- Don't train for too many epochs on a small dataset like this one.
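
The scheduler and epoch-count takeaways can be illustrated with a small, framework-free sketch. This is not the training code used here: `PlateauScheduler`, `EarlyStopping`, `run`, and all default values are hypothetical names and numbers chosen for illustration. The idea is to cut the learning rate when the loss plateaus and to stop training once the validation loss has not improved for a few epochs.

```python
class PlateauScheduler:
    """Multiply the learning rate by `factor` when the loss stops improving.

    Hypothetical helper; defaults are illustrative, not tuned values.
    """

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record this epoch's loss and return the learning rate to use next."""
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            # Reduce only after more than `patience` epochs without improvement.
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


class EarlyStopping:
    """Signal a stop after `patience` epochs without a new best validation loss."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run(losses, lr=0.1, lr_patience=1, stop_patience=3):
    """Replay a list of per-epoch losses and return (epoch, lr) pairs.

    A real loop would train the model and compute the loss each epoch;
    here the losses are passed in so the sketch stays self-contained.
    """
    scheduler = PlateauScheduler(lr, patience=lr_patience)
    stopper = EarlyStopping(patience=stop_patience)
    history = []
    for epoch, loss in enumerate(losses):
        history.append((epoch, scheduler.step(loss)))
        if stopper.should_stop(loss):
            break
    return history


if __name__ == "__main__":
    # Made-up losses that improve once and then plateau.
    for epoch, lr in run([1.0, 0.8, 0.8, 0.8, 0.8, 0.8]):
        print(epoch, lr)
```

In practice you would likely use the equivalents your framework already ships, such as `torch.optim.lr_scheduler.ReduceLROnPlateau` in PyTorch or the `ReduceLROnPlateau` and `EarlyStopping` callbacks in Keras, rather than hand-rolled classes like these.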