Maybe revisit our strategy for selecting the language of a lexeme
Open, Low, Public

Description

When we map a Wikidata lexeme to an instance of Z6005 / Wikidata lexeme, we need to determine its language as a ZObject of type Z60. The language field of a Wikidata lexeme contains a QID. Because there is no perfect mapping from that field to our Z60 instances, and maintaining such a mapping would take some effort, we currently use a heuristic that we believe works for the vast majority of lexemes.

However, it might be good to investigate more thoroughly whether this is the best available approach.

The relevant code is the getLexemeLanguage function in db.js, in the orchestrator.
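
For discussion, here is a minimal sketch of one alternative: consult an explicit QID-to-Z60 table first and only fall back to a heuristic when the table has no entry. This is not the current getLexemeLanguage code. The names LANGUAGE_QID_TO_ZID and selectLexemeLanguage are hypothetical, the single table entry is illustrative, and the heuristic is passed in by the caller because its details are not described in this task.

```javascript
// Hypothetical table from Wikidata language QID to Z60 ZID.
// The entry below is illustrative only; a real table would need curation.
const LANGUAGE_QID_TO_ZID = new Map([
  ['Q1860', 'Z1002'], // English (illustrative entry)
]);

/**
 * Pick a Z60 ZID for a lexeme's language QID.
 * Assumption: fallbackHeuristic is any function (qid) => ZID string or null;
 * it stands in for whatever heuristic is used today.
 */
function selectLexemeLanguage(languageQid, fallbackHeuristic) {
  // Prefer the explicit mapping when one exists.
  const mapped = LANGUAGE_QID_TO_ZID.get(languageQid);
  if (mapped !== undefined) {
    return { zid: mapped, source: 'table' };
  }
  // Otherwise defer to the heuristic and record where the answer came from.
  const guessed = fallbackHeuristic(languageQid);
  if (guessed) {
    return { zid: guessed, source: 'heuristic' };
  }
  return { zid: null, source: 'none' };
}
```

Recording the source of each answer would make it possible to count how many lexemes still depend on the heuristic, which might help judge whether maintaining the table is worth the effort.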

Desired behavior/Acceptance criteria (returned value, expected error, performance expectations, etc.)

  • A strategy that selects the most correct Z60/Natural language for the greatest number of lexemes.

Completion checklist

Event Timeline

I believe T344170 would be helpful here.

I agree. Also, Wikidata language QIDs need to be highly stable in the Lexeme space so I’m not convinced that maintaining such a link would require much effort. If it does, I imagine that maintaining it at the language level would require less effort than maintaining it at the Lexeme level and, in practice, I guess we would need to do something anyway!