A Statistical Model for Lost Language Decipherment
| Title | A Statistical Model for Lost Language Decipherment |
| Publication Type | Conference Paper |
| Year of Publication | 2010 |
| Authors | Snyder B, Barzilay R, Knight K |
| Conference Name | Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics |
| Date Published | July |
| Publisher | Association for Computational Linguistics |
| Conference Location | Uppsala, Sweden |
| Abstract | In this paper we propose a method for the automatic decipherment of lost languages. Given a non-parallel corpus in a known related language, our model produces both alphabetic mappings and translations of words into their corresponding cognates. We employ a non-parametric Bayesian framework to simultaneously capture both low-level character mappings and high-level morphemic correspondences. This formulation enables us to encode some of the linguistic intuitions that have guided human decipherers. When applied to the ancient Semitic language Ugaritic, the model correctly maps 29 of 30 letters to their Hebrew counterparts, and deduces the correct Hebrew cognate for 60% of the Ugaritic words which have cognates in Hebrew. |
| URL | http://www.aclweb.org/anthology/P10-1107 |
| Attachment | Size |
|---|---|
| Statistical Model for Lost Language Decipherment.pdf | 371.39 KB |