BERT
GPT-3
BERT fine-tuning strategies
- The top layer of BERT is more useful for text classification;
- With an appropriate layer-wise decreasing learning rate, BERT can overcome the catastrophic forgetting problem;
- Within-task and in-domain further pre-training can significantly boost its performance;
- A preceding multi-task fine-tuning is also helpful to the single-task fine-tuning, but its benefit is smaller than further pre-training;
- BERT can still improve task performance even with small amounts of training data.
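The layer-wise decreasing learning rate above can be sketched as follows. This is a minimal illustration, not the paper's exact setup: the base learning rate (2e-5) and decay factor (0.95) are assumed defaults, and in practice each entry would become a parameter group passed to an optimizer such as AdamW.

```python
def layerwise_lrs(num_layers, base_lr=2e-5, decay=0.95):
    """Assign each encoder layer its own learning rate.

    The top layer (closest to the classification head) keeps base_lr;
    every layer below it is scaled down by one more factor of `decay`,
    so lower layers (more general features) change more slowly.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

# e.g. for a 4-layer encoder: rates increase from the bottom layer to the top
rates = layerwise_lrs(4)
print(rates)
```

In a framework like PyTorch, each rate would be attached to the corresponding layer's parameters via optimizer parameter groups (`[{"params": ..., "lr": r} for r in rates]`).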
CTR prediction models