Top 10 Classic Papers in Machine Translation

Recommended by: Liu Yang (Associate Professor, Tsinghua University; machine translation expert)

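Note: This article was recommended by Liu Qun (Professor at Dublin City University, Research Professor at the Institute of Computing Technology, Chinese Academy of Sciences, and a leading figure in machine translation). The original is available on Professor Liu Qun's homepage.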

Below are 10 statistical machine translation papers hand-picked by Dr. Liu Yang for newcomers to the field.

-----------------------------------------------------------------------------------------

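To help students who have just joined the lab get up to speed in statistical machine translation, I have selected 10 good papers. If you truly master these ten papers, you will have a fairly deep understanding of statistical machine translation; anything beyond that has to be learned by writing code and building real systems. Any Top 10 list is controversial, and my choices are inevitably shaped by my own research experience, but every paper on this list is certainly a good one (even if some better ones were left out). I hope it helps.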

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2):263-311.
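Rating: *****
Why read it: This is the first major paper in statistical machine translation and can fairly be called the founding work of the field. The authors propose five word-based translation models (IBM Models 1-5), which have shaped translation modeling and word-alignment research ever since. GIZA++, the toolkit built on the five IBM models, is currently the most widely used word-alignment tool in statistical machine translation.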
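To make "word-based translation model" concrete, here is a minimal Python sketch of EM training for IBM Model 1, the simplest of the five models. It assumes tokenized sentence pairs, a NULL token prepended to each English sentence, and uniform initialization of the lexical probabilities t(f | e); the function name `train_ibm_model1` and the toy German-English corpus are illustrative only. GIZA++ goes much further (higher IBM models, HMM alignment, many refinements).

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate lexical translation probabilities t(f | e) with EM (IBM Model 1).

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (f, e) -> t(f | e).
    """
    f_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # every t(f | e) starts out uniform

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in corpus:
            e_with_null = ["NULL"] + es  # f may also be generated by NULL
            for f in fs:
                # E-step: share this occurrence of f among all e in proportion to t(f | e)
                norm = sum(t[(f, e)] for e in e_with_null)
                for e in e_with_null:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise the expected counts into probabilities
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(corpus, iterations=10)
print(round(t[("Haus", "house")], 3), round(t[("das", "house")], 3))
```

On this toy corpus, "das" is increasingly explained by "the" (they co-occur twice), so probability mass for "house" shifts toward "Haus". No word order is modeled at all; that is what the higher IBM models add.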
Franz J. Och and Hermann Ney. 2004. The Alignment Template Approach to Statistical Machine Translation. Computational Linguistics, 30(4):417-449.
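Rating: *****
Why read it: Franz J. Och's alignment template system consistently topped the NIST evaluations, and this paper describes the alignment template model in detail. More importantly, Och was the first to successfully apply log-linear models to statistical machine translation (work that won the ACL 2002 Best Paper Award), and this paper covers that material as well.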
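As a reference point, the log-linear model scores a translation e of a source sentence f with M feature functions h_m(e, f) and weights λ_m. The formula below uses simplified notation (whole sentences e and f rather than indexed word sequences); the classic source-channel model is recovered as the special case h_1 = log p(e), h_2 = log p(f | e), λ_1 = λ_2 = 1.

```latex
\Pr(e \mid f) = \frac{\exp\!\left(\sum_{m=1}^{M} \lambda_m h_m(e, f)\right)}
                     {\sum_{e'} \exp\!\left(\sum_{m=1}^{M} \lambda_m h_m(e', f)\right)},
\qquad
\hat{e} = \arg\max_{e} \sum_{m=1}^{M} \lambda_m h_m(e, f)
```

Because the denominator does not depend on e, decoding only needs the weighted feature sum, which is why new features can be added freely without renormalizing anything.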
David Chiang. 2007. Hierarchical Phrase-Based Translation. Computational Linguistics, 33(2):201-228.
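Rating: *****
Why read it: David Chiang's hierarchical phrase-based model strikes a near-perfect balance between phrase-based and syntax-based translation (its conference version won the ACL 2005 Best Paper Award). The model is simple and efficient, and its performance is among the best of current models. The techniques the paper describes are also fully up to date.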
Franz J. Och and Hermann Ney. 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1):19-51.
Rating: ****
Why read it: This is a key paper on word alignment. Its author, Franz J. Och, developed GIZA++; anyone who wants to understand GIZA++ in depth must read it.
Dekai Wu. 1997. Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora. Computational Linguistics, 23(3):377-404.
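Rating: ****
Why read it: Dekai Wu's Inversion Transduction Grammar (ITG) has been enormously influential for a decade; it simply has to be on the list.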
Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pages 127-133, Edmonton, Canada, May.
Rating: ***
Why read it: Written by three leading figures of phrase-based translation, this paper gives a clear picture of the basic ideas behind the phrase-based approach.
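Of the phrase-learning methods compared in this paper, the one that became standard collects phrase pairs that are consistent with a word alignment: a pair must contain at least one link, and no link may connect a word inside the pair to a word outside it. The Python sketch below shows that consistency check under simplifying assumptions: the alignment is given as a set of (i, j) links (in practice it would come from a symmetrized GIZA++ alignment), `extract_phrases` and its `max_len` limit are hypothetical names, and target spans are not extended over unaligned words as full implementations do.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Return all phrase pairs (as strings) consistent with a word alignment.

    alignment: set of (i, j) links, where i indexes src and j indexes tgt.
    Simplification: target spans are not extended over unaligned target words.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue  # a phrase pair needs at least one link
            j1, j2 = min(linked), max(linked)
            if j2 - j1 >= max_len:
                continue  # target side too long
            # inconsistent if a target word in [j1, j2] links outside [i1, i2]
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


src = "das Haus ist klein".split()
tgt = "the house is small".split()
links = {(0, 0), (1, 1), (2, 2), (3, 3)}
for pair in sorted(extract_phrases(src, tgt, links)):
    print(pair)
```

With a crossing alignment such as {(0, 1), (1, 0)}, the two-word pair is still extracted as a whole, which is how phrase tables capture local reordering.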
Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable Inference and Training of Context-Rich Syntactic Translation Models. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 961-968, Sydney, Australia, July.
Rating: ***
Why read it: ISI (the USC Information Sciences Institute) is currently doing the most advanced work in the field. Since no journal article yet describes ISI's string-to-tree model in full, I chose this paper to represent it; to understand ISI's work in depth, you will need to read several more of their papers.
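Franz J. Och. 2003. Minimum Error Rate Training in Statistical Machine Translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 160-167, Sapporo, Japan, July.
Rating: ***
Why read it: The log-linear model is now the standard modeling framework for statistical machine translation, and minimum error rate training (MERT), which tunes its feature weights, is an essential technique to master.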
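The key observation in MERT is that along any line through weight space, weights + γ · direction, each candidate's model score is a linear function of γ, so the 1-best output (and hence the error) can only change where two candidates' lines cross. The Python sketch below uses that property for a single line search. Simplifying assumptions: the n-best lists are fixed, each candidate carries a per-sentence error that simply sums over the corpus (real MERT with BLEU accumulates n-gram statistics instead), and crossings are found by brute-force pairwise comparison rather than the more efficient upper-envelope computation. The names `line_search`, `corpus_error`, and the toy n-best list are hypothetical.

```python
from itertools import combinations


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def corpus_error(nbest, weights):
    """Total error of the 1-best candidates under a log-linear model.

    nbest: one list per sentence of (feature_vector, error) pairs.
    The 1-best candidate is the one with the highest weighted feature sum.
    """
    return sum(max(cands, key=lambda c: dot(weights, c[0]))[1] for cands in nbest)


def line_search(nbest, weights, direction):
    """Return (gamma, error) minimising error along weights + gamma * direction."""
    crossings = set()
    for cands in nbest:
        # each candidate's score is the line a + gamma * b
        lines = [(dot(weights, h), dot(direction, h)) for h, _ in cands]
        for (a1, b1), (a2, b2) in combinations(lines, 2):
            if b1 != b2:
                crossings.add((a2 - a1) / (b1 - b2))
    points = sorted(crossings)
    if not points:
        return 0.0, corpus_error(nbest, weights)
    # 1-best choices are constant between crossings, so one probe per
    # interval (plus one beyond each end) covers every possible outcome
    probes = [points[0] - 1.0, points[-1] + 1.0]
    probes += [(x + y) / 2 for x, y in zip(points, points[1:])]
    best_gamma, best_err = None, float("inf")
    for g in probes:
        w = [wi + g * di for wi, di in zip(weights, direction)]
        err = corpus_error(nbest, w)
        if err < best_err:
            best_gamma, best_err = g, err
    return best_gamma, best_err


# toy n-best list: one sentence, two candidates, two features
nbest = [[([1.0, 0.0], 2.0), ([0.0, 1.0], 0.0)]]
print(line_search(nbest, weights=[1.0, 0.0], direction=[0.0, 1.0]))
```

A full tuning loop would repeat this search along several directions (for example, each feature axis) and regenerate the n-best lists with the decoder until the weights stop changing.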
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the ACL, pages 311-318, Philadelphia, PA, July.
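Rating: ***
Why read it: Anyone who has heard of SMT has heard of BLEU. This paper introduces BLEU, an automatic evaluation metric, and is a landmark in the automatic evaluation of machine translation.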
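BLEU combines clipped (modified) n-gram precisions p_n for n = 1 to N with a brevity penalty: BLEU = BP · exp(Σ_n w_n log p_n), with uniform weights w_n = 1/N (N = 4 by default), BP = 1 if the total candidate length c exceeds the effective reference length r, and BP = exp(1 − r/c) otherwise. The Python sketch below computes corpus-level BLEU on pre-tokenized text, taking for each sentence the reference length closest to the candidate length (ties go to the shorter reference, an assumption). It applies no smoothing, and `corpus_bleu` is only an illustrative name, not the API of any particular toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """candidates: list of token lists; references: list of lists of token lists."""
    clipped = [0] * max_n
    totals = [0] * max_n
    c_len = r_len = 0
    for cand, refs in zip(candidates, references):
        c_len += len(cand)
        # effective reference length: the reference length closest to len(cand)
        r_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise max count over references
            # clip each candidate n-gram count by its max count in any reference
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    if min(clipped) == 0:
        return 0.0  # the geometric mean is zero if any precision is zero
    log_p = sum(math.log(clipped[i] / totals[i]) for i in range(max_n)) / max_n
    bp = 1.0 if c_len > r_len else math.exp(1 - r_len / c_len)
    return bp * math.exp(log_p)


ref = "the cat is on the mat".split()
print(corpus_bleu([ref], [[ref]]))  # 1.0
```

Clipping is what defeats degenerate output: the candidate "the the the the the the the" gets a modified unigram precision of only 2/7 against this reference, as in the paper's own example, and a 4-gram BLEU of 0 because none of its bigrams match.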
Ben Taskar, Simon Lacoste-Julien, and Dan Klein. 2005. A Discriminative Matching Approach to Word Alignment. In Proceedings of HLT/EMNLP 2005, pages 73-80, Vancouver, British Columbia, Canada, October.
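Rating: ***
Why read it: Word alignment, and discriminative models in particular, has attracted a lot of attention in recent years. Among the many papers on the topic, this one is cited especially often; reading it gives a good overview of the basic ideas behind discriminative word alignment.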
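The paper casts word alignment as maximum-weight bipartite matching: each candidate link (i, j) receives a score from a discriminatively learned weighted feature vector, and the predicted alignment is the highest-scoring set of links in which each word has at most one partner. The Python sketch below shows only that inference step, with made-up scores standing in for learned ones and an exhaustive search standing in for the efficient matching algorithms a real system would use, so it is feasible only for very short sentences. `best_matching` is a hypothetical name.

```python
def best_matching(scores):
    """Maximum-weight alignment in which each word has at most one link.

    scores[i][j] is a (hypothetical) score for linking source word i to
    target word j; a word may also stay unaligned, which contributes 0.
    Exhaustive search: only suitable for very short sentences.
    """
    n_src = len(scores)
    n_tgt = len(scores[0]) if scores else 0
    best = (0.0, frozenset())  # (total score, links); the empty alignment scores 0

    def search(i, used, total, links):
        nonlocal best
        if i == n_src:
            if total > best[0]:
                best = (total, frozenset(links))
            return
        search(i + 1, used, total, links)  # leave source word i unaligned
        for j in range(n_tgt):
            if j not in used:  # target word j is still free
                search(i + 1, used | {j}, total + scores[i][j], links + [(i, j)])

    search(0, frozenset(), 0.0, [])
    return set(best[1]), best[0]


scores = [[2.0, 1.0],
          [1.5, -1.0]]
print(best_matching(scores))
```

Note that the best matching here links source word 0 to target word 1 even though (0, 0) is the single highest-scoring link: the one-to-one constraint makes the decision global rather than word-by-word.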

All of these papers can be downloaded from the ACL Anthology or Google Scholar.

我爱计算机

Tags: Machine Translation