A question about the BERT PyTorch scratch notebook #2

Hi, I have a question about the loss calculation in the BERT notebook:

https://github.com/ChanCheeKean/DataScience/blob/main/13%20-%20NLP/C04%20-%20BERT%20(Pytorch%20Scratch).ipynb

# 2-1. NLL(negative log likelihood) loss of is_next classification result
next_loss = self.criterion(next_sent_output, data["is_next"])

# 2-2. NLLLoss of predicting masked token word
# transpose to (m, vocab_size, seq_len) vs (m, seq_len)
# criterion(mask_lm_output.view(-1, mask_lm_output.size(-1)), data["bert_label"].view(-1))
mask_loss = self.criterion(mask_lm_output.transpose(1, 2), data["bert_label"])

But the loss is defined as self.criterion = torch.nn.NLLLoss(ignore_index=0), and the same criterion is used for both losses.
When the ground truth of 'is_next' is 0, will the gradient for that sample not be backpropagated?
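
To make the concern concrete, here is a minimal sketch of how `torch.nn.NLLLoss(ignore_index=0)` treats targets equal to 0. It does not use the notebook's model: `base` is a hypothetical stand-in for `next_sent_output` (4 samples, 2 classes), `is_next` is a made-up label tensor, and `grad_row_is_zero` is a helper invented for this illustration. The second criterion, without `ignore_index`, is only one possible way to handle the is_next loss and is an assumption, not something taken from the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the raw next-sentence scores: 4 samples, 2 classes.
base = torch.randn(4, 2)
# Made-up is_next labels; samples 0 and 2 have label 0.
is_next = torch.tensor([0, 1, 0, 1])


def grad_row_is_zero(criterion):
    """Return, per sample, whether its gradient is exactly zero after backward()."""
    logits = base.clone().requires_grad_(True)
    log_probs = torch.log_softmax(logits, dim=-1)
    loss = criterion(log_probs, is_next)
    loss.backward()
    return (logits.grad.abs().sum(dim=1) == 0).tolist()


# Same definition as in the notebook: targets equal to 0 are skipped.
shared = nn.NLLLoss(ignore_index=0)
ignored = grad_row_is_zero(shared)

# Assumed alternative: a separate criterion for is_next with no ignore_index.
separate = nn.NLLLoss()
not_ignored = grad_row_is_zero(separate)

print(ignored)      # rows whose label equals ignore_index receive no gradient
print(not_ignored)  # every row contributes when nothing is ignored
```

With `ignore_index=0`, samples labelled 0 are dropped from the mean and contribute no gradient, so under these assumptions only the `is_next = 1` samples would drive the is_next loss. For the masked-token loss the same setting is presumably intended, since index 0 there marks positions that should not be predicted.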
