Typologically based dependency parsing

Period: 2020-07-01 to 2023-06-30

Project leader: Miryam de Lhoneux

Funder: Vetenskapsrådet (Swedish Research Council)

Grant type: Grant for employment or scholarships

Budget: 3 150 000 SEK

Language technology is increasingly becoming a part of our daily lives. We rely on search engines, spell checkers, machine translation, and similar tools. Access to such language technology is, however, very unequal across the world, as many languages are poorly covered or not covered at all. These tools rely on machine learning, which requires large amounts of (annotated) data, so they are poorly equipped to deal with languages for which little data is available. To mitigate this problem, Natural Language Processing (NLP) models that use cross-lingual transfer, where knowledge is transferred between languages, have started gaining popularity. This has improved the situation for some languages, but it remains unchanged for most of the world's languages. In many of these methods, transfer is learned implicitly by the machine learning model, and there are indications that this implicit transfer only works well in specific situations. This project uses linguistic knowledge to transfer knowledge across languages explicitly, thereby combining human and machine intelligence. We use information from typological databases to inform syntactic parsers, models that predict the structure of sentences. Parsers are a central component of many NLP tools, and a multilingual collection of annotated data is available for them. We seek to advance our understanding of current methods of cross-lingual transfer, develop new ones, and increase the number of languages we can accurately parse.