Pre-trained Models for Natural Language Processing: A Survey [paper]
- Non-contextual Embeddings
- Contextual Embeddings
- Neural Contextual Encoders
- Sequence Models
- CNN
- RNN
- Non-sequence Models
- Fully-connected self-attention model
- Why pre-training?
- Learn universal language representations
- Better model initialization -> better generalization on downstream tasks
- Acts as a regularizer, helping avoid overfitting on small datasets
- 1st Generation - Pretrained word embeddings
- Neural Network LM
- Word2vec (CBOW, Skip-Gram), GloVe
- 2nd Generation - Pretrained Contextual encoders
- ELMo
- ULMFiT
- GPT, BERT
- Pretraining Tasks (a toy sketch of these objectives follows this list)
- Language Modeling (LM)
- Masked Language Model (MLM)
- Permuted Language Model (PLM)
- Denoising Autoencoder (DAE)
- Contrastive Learning (CTL)
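A toy Python sketch of what the first four objectives ask the model to predict. The example tokens, the number of masked positions, and the corruption choices are made up for illustration; contrastive learning is omitted because it scores pairs of texts rather than predicting tokens.

```python
import random

# Example token sequence (tokenization is simplified for illustration).
tokens = ["使用", "语言", "模型", "来", "预测", "下一个", "词"]

# Language Modeling (LM): predict each token from its left context.
lm_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Masked Language Model (MLM): replace some positions with [MASK] in the
# input and predict the original tokens at exactly those positions.
masked = sorted(random.sample(range(len(tokens)), 2))
mlm_input = ["[MASK]" if i in masked else t for i, t in enumerate(tokens)]
mlm_targets = {i: tokens[i] for i in masked}

# Permuted Language Model (PLM): like LM, but tokens are predicted in a
# randomly sampled factorization order rather than strictly left to right.
factorization_order = random.sample(range(len(tokens)), len(tokens))

# Denoising Autoencoder (DAE): corrupt the whole sequence (here by random
# token deletion) and train the model to reconstruct the original.
dae_input = [t for t in tokens if random.random() > 0.15]
dae_target = tokens

print(lm_pairs[1], mlm_input, mlm_targets, factorization_order, dae_input, sep="\n")
```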
- When pre-training BERT, do not mask individual tokens at random; instead, mask contiguous spans of tokens (see the code sketch after the example below).
- For example:
  - [Original sentence] 使用语言模型来预测下一个词的probability。
  - [Random token masking] 使用语言[MASK]型来[MASK]测下一次的pro[MASK]##lity。
  - [Span masking used in this work] 使用语言[MASK][MASK]来[MASK][MASK]下一个词的[MASK][MASK][MASK]。
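A minimal Python sketch of the two masking strategies above. The 15% masking budget and the clipped geometric span-length distribution are assumptions borrowed from the SpanBERT setup; the bookkeeping is simplified (spans may overlap, and the 80/10/10 token-replacement rule is omitted).

```python
import random

MASK = "[MASK]"

def random_token_mask(tokens, mask_ratio=0.15):
    """BERT-style masking: choose individual positions uniformly at random."""
    out = list(tokens)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    for i in random.sample(range(len(tokens)), n_mask):
        out[i] = MASK
    return out

def sample_span_length(p=0.2, max_span=10):
    """Geometric span length, clipped at max_span (short spans are more likely)."""
    length = 1
    while random.random() > p and length < max_span:
        length += 1
    return length

def span_mask(tokens, mask_ratio=0.15, p=0.2, max_span=10):
    """Span masking sketch: mask contiguous runs of tokens until roughly
    mask_ratio of the sequence is covered (overlaps are not corrected)."""
    out = list(tokens)
    budget = max(1, int(len(tokens) * mask_ratio))
    while budget > 0:
        length = min(sample_span_length(p, max_span), budget, len(tokens))
        start = random.randrange(0, len(tokens) - length + 1)
        for i in range(start, start + length):
            out[i] = MASK
        budget -= length
    return out

if __name__ == "__main__":
    sentence = list("使用语言模型来预测下一个词的概率")  # character-level tokens
    print("".join(random_token_mask(sentence)))
    print("".join(span_mask(sentence)))
```

On a character-tokenized sentence like the example above, span masking tends to hide whole words or phrases rather than isolated characters, which is the contrast the example is meant to show.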
Contributions
- Proposed an improved span-masking scheme: randomly masking a contiguous span of characters works better than randomly masking scattered individual characters.
- Added the Span Boundary Objective (SBO) training objective, which strengthens BERT, especially on span-related tasks such as extractive question answering (a sketch of an SBO head follows this list).
- Experiments reached a conclusion similar to XLNet's: dropping the Next Sentence Prediction task and training on single contiguous long segments works better.
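A rough PyTorch sketch of an SBO prediction head, assuming the formulation in which each token inside a masked span is predicted from the encoder outputs of the two tokens just outside the span plus a relative position embedding, passed through a small feed-forward network. The class name, layer sizes, and MLP depth here are illustrative choices, not taken from the notes above.

```python
import torch
import torch.nn as nn

class SpanBoundaryObjective(nn.Module):
    """Illustrative SBO head: predict a token inside a masked span from the
    encoder states of the two boundary tokens plus a position embedding."""

    def __init__(self, hidden_size=768, vocab_size=30522, max_span_len=10):
        super().__init__()
        self.pos_emb = nn.Embedding(max_span_len, hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(3 * hidden_size, hidden_size),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
        )
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, left_boundary, right_boundary, rel_position):
        # left_boundary / right_boundary: (batch, hidden) encoder outputs of
        # the tokens just outside the masked span.
        # rel_position: (batch,) index of the target token within the span.
        h = torch.cat(
            [left_boundary, right_boundary, self.pos_emb(rel_position)], dim=-1
        )
        return self.decoder(self.mlp(h))  # logits over the vocabulary
```

In this setup, the SBO loss for the span tokens is added on top of the usual MLM loss, so span-internal tokens are predicted both from their own masked positions and from the span boundaries.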
Method
- Zhihu: A summary of common pre-trained language models