

Jason Eisner. Bilexical grammars and their cubic-time parsing algorithms. In Advances in probabilistic and other parsing technologies, pages 29–61. Springer, 2000.


Jason Eisner. Parameter estimation for probabilistic finite-state transducers. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 1–8. 2002.


Jason Eisner. Inside-outside and forward-backward algorithms are just backprop (tutorial paper). In Proceedings of the Workshop on Structured Prediction for NLP, 1–17. 2016.


Joshua Goodman. Semiring parsing. Computational Linguistics, 25(4):573–605, 1999.


Kartik Goyal, Chris Dyer, and Taylor Berg-Kirkpatrick. Differentiable scheduled sampling for credit assignment. arXiv preprint arXiv:1704.06970, 2017.


Zhiheng Huang, Wei Xu, and Kai Yu. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991, 2015.


Terry Koo, Amir Globerson, Xavier Carreras Pérez, and Michael Collins. Structured prediction models via the matrix-tree theorem. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 141–150. 2007.


Wouter Kool, Herke van Hoof, and Max Welling. Stochastic beams and where to find them: the Gumbel-top-k trick for sampling sequences without replacement. arXiv preprint arXiv:1903.06059, 2019.


Zhifei Li and Jason Eisner. First- and second-order expectation semirings with applications to minimum-risk training on translation forests. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1, 40–51. Association for Computational Linguistics, 2009.


Andre Martins and Ramon Astudillo. From softmax to sparsemax: a sparse model of attention and multi-label classification. In International Conference on Machine Learning, 1614–1623. 2016.


Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, 523–530. Association for Computational Linguistics, 2005.


Arthur Mensch and Mathieu Blondel. Differentiable dynamic programming for structured prediction and attention. arXiv preprint arXiv:1802.03676, 2018.


Mitchell Stern, Jacob Andreas, and Dan Klein. A minimal span-based neural constituency parser. arXiv preprint arXiv:1705.03919, 2017.


Charles Sutton and Andrew McCallum. An introduction to conditional random fields. Foundations and Trends® in Machine Learning, 4(4):267–373, 2012.