Period: 2017-01-01 to 2020-12-31
Project leader: Joakim Nivre
Budget: 3 448 000 SEK
Natural language processing plays an increasingly important role in our lives. Whenever we make use of search engines, voice interfaces, or online translation services, we rely on the capacity of computers to interpret human language. However, high-quality language processing is currently limited to a small set of languages, in particular English, and models developed for these languages do not generalize well to structurally different languages. Until recently, multilingual research was also hampered by the lack of cross-linguistically consistent standards for linguistic annotation. This project studies models for grammatical analysis of typologically diverse languages with the aim of finding out what techniques work well across languages and what aspects require language-specific adaptation. The processing model is based on dependency parsing, the currently dominant approach to syntactic analysis, but extends it to better cope with typological diversity, in particular to handle different ways of encoding grammatical structure by means of morphological inflection, function words, and word order patterns. The theoretical framework is Universal Dependencies (UD), a system for cross-linguistically consistent grammatical analysis so far applied to over 40 languages. The project results will further our theoretical understanding of the interplay between language and technology and will enable high-quality language processing for a much wider range of languages than before.