Neural models for pronouns in machine translation

Period: 2018-01-01 to 2021-12-31

Project leader: Christian Hardmeier

Funder: Vetenskapsrådet (Swedish Research Council)

Grant type: Project grant

Budget: 5 200 000 SEK

Pronouns are challenging for machine translation (MT) because their use varies between languages and because anaphoric pronouns are subject to long-distance constraints outside the standard unit of MT. The problem has usually been viewed as simply that of selecting a pronoun's correct gender and number. As this approach has continually failed to improve translation quality, a new approach is required.

This project aims to improve our understanding of pronouns in translation and create computational models of pronoun translation for MT. We study the language pairs English-French and English-German. Informed by a review of the monolingual and contrastive literature on pronouns, we create a parallel core resource with rich manual annotations of pronouns. This will be used to develop tools to collect quantitative data from larger and more diverse corpora, providing a reliable characterisation of different pronominal constructions and translation strategies. The annotations and tools will be released to benefit the community.

Next, we use deep neural network models to capture and predict patterns in pronoun translation across a wide range of pronoun types. We explore computational methods to keep track of discourse-level information such as entities in the discourse and common ground between the text producer and comprehender. Finally, we integrate our model into an end-to-end neural MT system and test how it affects translation quality using targeted manual evaluation.