
2 Related Work
There is a long history of pre-training general language representations, and we briefly review the most widely-used approaches in this section.
2.1 Unsupervised Feature-based Approaches
Learning widely applicable representations of words has been an active area of research for decades, including non-neural (Brown et al., 1992; Ando and Zhang, 2005; Blitzer et al., 2006) and neural (Mikolov et al., 2013; Pennington et al., 2014) methods. Pre-trained word embeddings are an integral part of modern NLP systems, offering significant improvements over embeddings learned from scratch (Turian et al., 2010). To pre-train word embedding vectors, left-to-right language modeling objectives have been used (Mnih and Hinton, 2009), as well as objectives to discriminate correct from incorrect words in left and right context (Mikolov et al., 2013).
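For concreteness, the two objective families can be sketched as follows (the notation here is ours, not taken verbatim from the cited papers). A left-to-right language model maximizes the log-likelihood of each word given its preceding context,

$$\mathcal{L}_{\text{LM}} = \sum_{t=1}^{T} \log p(w_t \mid w_1, \ldots, w_{t-1}),$$

while the skip-gram objective with negative sampling of Mikolov et al. (2013) trains an input-word vector $v_{w_I}$ to score an observed context word $w_O$ above $k$ words drawn from a noise distribution $P_n(w)$:

$$\log \sigma\!\left({v'}_{w_O}^{\top} v_{w_I}\right) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\!\left[\log \sigma\!\left(-{v'}_{w_i}^{\top} v_{w_I}\right)\right].$$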