Hi, I have a question about the loss calculation in the BERT notebook:
https://github.com/ChanCheeKean/DataScience/blob/main/13%20-%20NLP/C04%20-%20BERT%20(Pytorch%20Scratch).ipynb
# 2-1. NLL (negative log likelihood) loss of is_next classification result
next_loss = self.criterion(next_sent_output, data["is_next"])
# 2-2. NLLLoss of predicting masked token word
# transpose to (m, vocab_size, seq_len) vs (m, seq_len)
# criterion(mask_lm_output.view(-1, mask_lm_output.size(-1)), data["bert_label"].view(-1))
mask_loss = self.criterion(mask_lm_output.transpose(1, 2), data["bert_label"])
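But the loss is defined as self.criterion = torch.nn.NLLLoss(ignore_index=0), and the same criterion is used for both losses.
So when the ground truth of 'is_next' is 0, is that sample ignored in next_loss, meaning no gradient is backpropagated for it?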
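
To check this, here is a minimal sketch outside the notebook. The tensors are made-up stand-ins for next_sent_output and data["is_next"], and the helper rows_with_gradient is a hypothetical name of mine, not something from the notebook. It compares the shared criterion with a plain NLLLoss() and reports which samples receive a non-zero gradient:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def rows_with_gradient(criterion):
    """Return, per sample, whether the loss sends any gradient back to its logits."""
    # Hypothetical stand-in for next_sent_output: 4 samples, 2 classes
    logits = torch.randn(4, 2, requires_grad=True)
    log_probs = torch.log_softmax(logits, dim=-1)
    # Hypothetical stand-in for data["is_next"]
    is_next = torch.tensor([0, 1, 0, 1])
    loss = criterion(log_probs, is_next)
    loss.backward()
    return (logits.grad.abs().sum(dim=1) > 0).tolist()

# Same criterion as in the notebook: targets equal to 0 are skipped
shared = rows_with_gradient(nn.NLLLoss(ignore_index=0))
# Assumption: a separate criterion without ignore_index for the is_next loss
separate = rows_with_gradient(nn.NLLLoss())

print(shared)    # [False, True, False, True]
print(separate)  # [True, True, True, True]
```

If this matches what happens in the notebook, the samples with is_next = 0 contribute nothing to next_loss, and one possible fix would be a second criterion without ignore_index for the is_next classification only.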