A quick demonstration of the Transformer architecture from the original 2017 paper, applied to a neural machine translation (NMT) task.
A second demonstration of the Vision Transformer (ViT) architecture from the 2020 paper, applied to image classification on two different datasets.
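
Both architectures are built around scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as defined in the 2017 paper. The snippet below is a minimal single-head sketch of that operation in pure Python, with no batching, masking, or learned projections. It is an illustration only, not the code used in these demonstrations, and the function names `softmax` and `scaled_dot_product_attention` are chosen here for the example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention over lists of vectors (illustrative sketch).

    queries: n_q vectors of length d_k
    keys:    n_k vectors of length d_k
    values:  n_k vectors of length d_v
    Returns n_q output vectors of length d_v.
    """
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        # Attention weights over the keys sum to 1.
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(scaled_dot_product_attention(q, k, v))
```

In the translation model, the queries, keys, and values come from token embeddings. In the Vision Transformer they come from embedded image patches, but the attention computation itself is the same.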