SRL for low resource languages isn’t needed for semantic SMT

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/76031
Item information
Title: SRL for low resource languages isn’t needed for semantic SMT
Authors: Beloucif, Meriem | Wu, Dekai
Keywords: Machine Translation
Knowledge Area: Lenguajes y Sistemas Informáticos (Computer Languages and Systems)
Issue Date: 2018
Publisher: European Association for Machine Translation
Citation: Beloucif, Meriem; Wu, Dekai. “SRL for low resource languages isn’t needed for semantic SMT”. In: Pérez-Ortiz, Juan Antonio, et al. (Eds.). Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain, pp. 59-68
Abstract: Previous attempts at injecting semantic frame biases into SMT training for low resource languages failed because either (a) no semantic parser is available for the low resource input language; or (b) the output English language semantic parses excise relevant parts of the alignment space too aggressively. We present the first semantic SMT model to succeed in significantly improving translation quality across many low resource input languages for which no automatic SRL is available, consistently and across all common MT metrics. The results we report are the best by far to date for this type of approach; our analyses suggest that in general, easier approaches toward including semantics in training SMT models may be more feasible than generally assumed even for low resource languages where semantic parsers remain scarce. While recent proposals to use the crosslingual evaluation metric XMEANT during inversion transduction grammar (ITG) induction are inapplicable to low resource languages that lack semantic parsers, we break the bottleneck via a vastly improved method of biasing ITG induction toward learning more semantically correct alignments using the monolingual semantic evaluation metric MEANT. Unlike XMEANT, MEANT requires only a readily-available English (output language) semantic parser. The advances we report here exploit the novel realization that MEANT represents an excellent way to semantically bias expectation-maximization induction even for low resource languages. We test our systems on challenging languages including Amharic, Uyghur, Tigrinya and Oromo. Results show that our model influences the learning towards more semantically correct alignments, leading to better translation quality than both the standard ITG and GIZA++ based SMT training models on different datasets.
Sponsor: This material is based upon work supported in part by the Defense Advanced Research Projects Agency (DARPA) under LORELEI contract HR0011-15-C-0114, BOLT contracts HR0011-12-C-0014 and HR0011-12-C-0016, and GALE contracts HR0011-06-C-0022 and HR0011-06-C-0023; by the European Union under the Horizon 2020 grant agreement 645452 (QT21) and FP7 grant agreement 287658; and by the Hong Kong Research Grants Council (RGC) research grants GRF16210714, GRF16214315, GRF620811 and GRF621008.
URI: http://hdl.handle.net/10045/76031
ISBN: 978-84-09-01901-4
Language: eng
Type: info:eu-repo/semantics/conferenceObject
Rights: © 2018 The authors. This article is licensed under a Creative Commons 3.0 licence, no derivative works, attribution, CC-BY-ND.
Peer Review: yes
Publisher version: http://eamt2018.dlsi.ua.es/proceedings-eamt2018.pdf
Appears in Collections: Congresos - EAMT2018 - Proceedings
Research funded by the EU

Files in This Item:
File: EAMT2018-Proceedings_08.pdf | Size: 1.94 MB | Format: Adobe PDF


This item is licensed under a Creative Commons License.