References
- Eis00
Jason Eisner. Bilexical grammars and their cubic-time parsing algorithms. In Advances in probabilistic and other parsing technologies, pages 29–61. Springer, 2000.
- Eis02
Jason Eisner. Parameter estimation for probabilistic finite-state transducers. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 1–8, 2002.
- Eis16
Jason Eisner. Inside-outside and forward-backward algorithms are just backprop (tutorial paper). In Proceedings of the Workshop on Structured Prediction for NLP, pages 1–17, 2016.
- Goo99
Joshua Goodman. Semiring parsing. Computational Linguistics, 25(4):573–605, 1999.
- GDBK17
Kartik Goyal, Chris Dyer, and Taylor Berg-Kirkpatrick. Differentiable scheduled sampling for credit assignment. arXiv preprint arXiv:1704.06970, 2017.
- HXY15
Zhiheng Huang, Wei Xu, and Kai Yu. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991, 2015.
- KGCPérezC07
Terry Koo, Amir Globerson, Xavier Carreras Pérez, and Michael Collins. Structured prediction models via the matrix-tree theorem. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 141–150, 2007.
- KvHW19
Wouter Kool, Herke van Hoof, and Max Welling. Stochastic beams and where to find them: the Gumbel-Top-k trick for sampling sequences without replacement. arXiv preprint arXiv:1903.06059, 2019.
- LE09
Zhifei Li and Jason Eisner. First- and second-order expectation semirings with applications to minimum-risk training on translation forests. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 40–51. Association for Computational Linguistics, 2009.
- MA16
Andre Martins and Ramon Astudillo. From softmax to sparsemax: a sparse model of attention and multi-label classification. In International Conference on Machine Learning, pages 1614–1623, 2016.
- MPRHajič05
Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 523–530. Association for Computational Linguistics, 2005.
- MB18
Arthur Mensch and Mathieu Blondel. Differentiable dynamic programming for structured prediction and attention. arXiv preprint arXiv:1802.03676, 2018.
- SAK17
Mitchell Stern, Jacob Andreas, and Dan Klein. A minimal span-based neural constituency parser. arXiv preprint arXiv:1705.03919, 2017.
- SM+12
Charles Sutton and Andrew McCallum. An introduction to conditional random fields. Foundations and Trends® in Machine Learning, 4(4):267–373, 2012.