
Separate dumps for Lexemes
Closed, Resolved, Public

Description

As a data reuser, I want to work with reasonably sized dumps and not have to deal with parts of the data that are not interesting to me.
We should have separate dumps for the lexicographical data in the Lexeme namespace. This should be done before loading lexicographical data into the query service.
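To illustrate what "separate dumps" saves a reuser from doing, here is a minimal sketch of splitting lexemes out of a combined dump on the consumer side. It assumes the line-oriented JSON layout of the Wikidata JSON dumps (an opening `[` line, one entity per line with a trailing comma, a closing `]` line) and that each entity carries a `type` field whose value is `"lexeme"` for lexemes. The function names `iter_entities` and `split_lexemes` are hypothetical, and this is not how the production dumps are generated.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entity objects from a line-oriented JSON array dump.

    Assumption: '[' on its own line, one entity per line ending in ',',
    and ']' on its own line (the Wikidata JSON dump layout).
    """
    for raw in lines:
        line = raw.strip()
        if line in ("[", "]", ""):
            continue
        # Drop the comma that separates array elements.
        if line.endswith(","):
            line = line[:-1]
        yield json.loads(line)


def split_lexemes(lines: Iterable[str]) -> Tuple[List[dict], List[dict]]:
    """Partition entities into (lexemes, everything else) by their 'type' field."""
    lexemes: List[dict] = []
    others: List[dict] = []
    for entity in iter_entities(lines):
        if entity.get("type") == "lexeme":
            lexemes.append(entity)
        else:
            others.append(entity)
    return lexemes, others


# Usage with a tiny hypothetical sample in the assumed layout.
sample = [
    "[",
    '{"type": "item", "id": "Q42"},',
    '{"type": "lexeme", "id": "L7"},',
    '{"type": "property", "id": "P31"}',
    "]",
]
lexemes, others = split_lexemes(sample)
print([e["id"] for e in lexemes])  # ['L7']
```

Even this simple pass has to read and parse every entity in the full dump, which is exactly the cost a dedicated lexeme dump avoids.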

Open questions:

  • We definitely need to do this for the RDF dumps. What about the other dump formats?

Related Objects

Status    Assigned
Resolved  None
Resolved  Smalyshev

Event Timeline

Lydia_Pintscher created this task.

Another question: lexeme dumps will refer to many Wikidata items. Do we want to include some information about those items (such as stubs or labels) in the lexeme dump, or just ignore them?

Change 461862 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Add lexemes dump as separate dump

https://gerrit.wikimedia.org/r/461862

Change 461862 merged by ArielGlenn:
[operations/puppet@production] Add lexemes dump as separate dump

https://gerrit.wikimedia.org/r/461862