Python Knowledge Sharing Network (Python222) - a professional Python learning site
Transformer paper collection - download
Posted anonymously on 2025-05-26 09:54:38

Sample content:

1 Introduction
The Transformer has been the most widely used architecture for machine translation (Vaswani et al., 2017). Despite its strong performance, Transformer decoding is inefficient because its probability model adopts a sequential, auto-regressive factorization (Figure 1a). Recent work such as the non-autoregressive transformer (NAT) aims to decode target tokens in parallel to speed up generation (Gu et al., 2018). However, the vanilla NAT still lags behind the Transformer in translation quality, with a gap of about 7.0 BLEU. NAT assumes that the target tokens are conditionally independent given the source sentence. We suspect that this conditional independence assumption prevents NAT from learning word interdependency in the target sentence. Such word interdependency is crucial: the Transformer captures it explicitly by decoding from left to right (Figure 1a).
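The two factorizations contrasted above can be sketched in a few lines. This is a toy illustration, not either paper's implementation; the scoring functions stand in for the models' per-token distributions and are hypothetical placeholders:

```python
import math

# Toy contrast of the two factorizations. The scoring functions are
# hypothetical placeholders, not a real Transformer or NAT model.

def autoregressive_logprob(target, step_logprob):
    """log p(y|x) = sum_t log p(y_t | y_<t, x).
    Each token conditions on the already-generated prefix, so decoding
    must proceed sequentially, one position at a time."""
    total = 0.0
    for t, tok in enumerate(target):
        prefix = tuple(target[:t])  # y_<t, only available after step t-1
        total += step_logprob(prefix, tok)
    return total

def nat_logprob(target, pos_logprob):
    """log p(y|x) = sum_t log p(y_t | x).
    Tokens are conditionally independent given the source, so every
    position can be scored (and decoded) in parallel."""
    return sum(pos_logprob(t, tok) for t, tok in enumerate(target))

target = ["we", "eat", "fish"]
# Uniform dummy distributions, just to make the sketch runnable.
ar = autoregressive_logprob(target, lambda prefix, tok: math.log(0.5))
na = nat_logprob(target, lambda pos, tok: math.log(0.5))
```

The point of the contrast is structural: the autoregressive sum threads each term through the growing prefix, forcing a sequential loop, while the NAT sum has no cross-position dependency, which is what permits parallel decoding but also removes the word interdependency the paragraph above describes.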