References

Eis00

Jason Eisner. Bilexical grammars and their cubic-time parsing algorithms. In Advances in probabilistic and other parsing technologies, pages 29–61. Springer, 2000.

Eis02

Jason Eisner. Parameter estimation for probabilistic finite-state transducers. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 1–8. 2002.

Eis16

Jason Eisner. Inside-outside and forward-backward algorithms are just backprop (tutorial paper). In Proceedings of the Workshop on Structured Prediction for NLP, 1–17. 2016.

Goo99

Joshua Goodman. Semiring parsing. Computational Linguistics, 25(4):573–605, 1999.

GDBK17

Kartik Goyal, Chris Dyer, and Taylor Berg-Kirkpatrick. Differentiable scheduled sampling for credit assignment. arXiv preprint arXiv:1704.06970, 2017.

HXY15

Zhiheng Huang, Wei Xu, and Kai Yu. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991, 2015.

KGCPerezC07

Terry Koo, Amir Globerson, Xavier Carreras Pérez, and Michael Collins. Structured prediction models via the matrix-tree theorem. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 141–150. 2007.

KvHW19

Wouter Kool, Herke van Hoof, and Max Welling. Stochastic beams and where to find them: the Gumbel-top-k trick for sampling sequences without replacement. CoRR, 2019. URL: http://arxiv.org/abs/1903.06059, arXiv:1903.06059.

LE09

Zhifei Li and Jason Eisner. First- and second-order expectation semirings with applications to minimum-risk training on translation forests. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1, 40–51. Association for Computational Linguistics, 2009.

MA16

Andre Martins and Ramon Astudillo. From softmax to sparsemax: a sparse model of attention and multi-label classification. In International Conference on Machine Learning, 1614–1623. 2016.

MPRHajivc05

Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, 523–530. Association for Computational Linguistics, 2005.

MB18

Arthur Mensch and Mathieu Blondel. Differentiable dynamic programming for structured prediction and attention. arXiv preprint arXiv:1802.03676, 2018.

SAK17

Mitchell Stern, Jacob Andreas, and Dan Klein. A minimal span-based neural constituency parser. arXiv preprint arXiv:1705.03919, 2017.

SM+12

Charles Sutton, Andrew McCallum, and others. An introduction to conditional random fields. Foundations and Trends® in Machine Learning, 4(4):267–373, 2012.