BERT
GPT-3
BERT fine-tuning strategies
- The top layer of BERT is more useful for text classification;
- With an appropriate layer-wise decreasing learning rate, BERT can overcome the catastrophic forgetting problem;
- Within-task and in-domain further pre-training can significantly boost its performance;
- A preceding multi-task fine-tuning is also helpful to the single-task fine-tuning, but its benefit is smaller than further pre-training;
- BERT can still improve task performance even with small amounts of training data.
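The layer-wise decreasing learning rate above can be sketched as follows. This is a minimal illustration, not the paper's exact setup: the base learning rate (2e-5) and decay factor (0.95) are assumed defaults, and in practice each entry would become a parameter group passed to an optimizer such as AdamW.

```python
def layerwise_lrs(num_layers, base_lr=2e-5, decay=0.95):
    """Assign each encoder layer its own learning rate.

    The top layer (closest to the classification head) keeps base_lr;
    every layer below it is scaled down by one more factor of `decay`,
    so lower layers (more general features) change more slowly.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

# e.g. for a 4-layer encoder: rates increase from the bottom layer to the top
rates = layerwise_lrs(4)
print(rates)
```

In a framework like PyTorch, each rate would be attached to the corresponding layer's parameters via optimizer parameter groups (`[{"params": ..., "lr": r} for r in rates]`).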
CTR prediction models