
Enable dumpJson.php to output Lexemes
Closed, Resolved (Public)

Description

This may already work but we have to ensure it does the right thing.

  • run repo/maintenance/dumpJson.php with Lexemes and ensure the output looks "right" (e.g. consistent with what Special:EntityData returns)
  • write an integration test which verifies this

Event Timeline

Change 633177 had a related patch set uploaded (by Itamar Givon; owner: Itamar Givon):
[mediawiki/extensions/Wikibase@master] Add lexeme dump integration tests

https://gerrit.wikimedia.org/r/633177

Change 633196 had a related patch set uploaded (by Itamar Givon; owner: Itamar Givon):
[mediawiki/extensions/WikibaseLexeme@master] Add lexeme dupm integration tests

https://gerrit.wikimedia.org/r/633196

Change 633177 abandoned by Itamar Givon:
[mediawiki/extensions/Wikibase@master] Add lexeme dump integration tests

Reason:

https://gerrit.wikimedia.org/r/633177

toan removed toan as the assignee of this task. (Oct 16 2020, 9:30 AM)
toan added a subscriber: toan.

@hoo pointed out in the test (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseLexeme/+/633196/2/tests/phpunit/maintenance/DumpJsonTest.php#155) that it is not certain that forms / senses will get the correct datatype when serialized. The current test does not expect any datatypes because of the way it is set up.

I started looking into this yesterday and found that we probably need to change Wikibase in one of two ways: either allow extensions to register where Wikibase should look when injecting these datatypes, OR make the injection general enough to support nested claims no matter how deep they are hidden inside the entity.

I find the second alternative better because the structure of serialized claims won't really change between entities. However, making these changes also carries a high risk of slowing down the already very slow dump scripts to the point where they are too slow.
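To illustrate the second alternative and where its cost comes from, here is a minimal Python sketch (Wikibase itself is PHP; this is not its implementation). It assumes a snak can be recognised as any object carrying both a "snaktype" and a "property" key. The function name `inject_datatypes`, the `lookup` callback and the sample lexeme are hypothetical.

```python
from typing import Any, Callable

def inject_datatypes(node: Any, lookup: Callable[[str], str]) -> int:
    """Walk the whole serialization and add "datatype" to every snak-like dict.

    Returns the number of snaks that were annotated.
    """
    injected = 0
    if isinstance(node, dict):
        # Assumption: a dict with both keys is a serialized snak.
        if "snaktype" in node and "property" in node:
            node["datatype"] = lookup(node["property"])
            injected += 1
        for child in node.values():
            injected += inject_datatypes(child, lookup)
    elif isinstance(node, list):
        for child in node:
            injected += inject_datatypes(child, lookup)
    return injected

# Hypothetical property-to-datatype table standing in for a real lookup service.
PROPERTY_DATATYPES = {"P1": "string", "P2": "wikibase-item"}

# Heavily trimmed, hypothetical lexeme serialization with one form.
lexeme = {
    "type": "lexeme",
    "id": "L1",
    "claims": {"P1": [{"mainsnak": {"snaktype": "value", "property": "P1"}}]},
    "forms": [
        {
            "id": "L1-F1",
            "claims": {
                "P2": [
                    {
                        "mainsnak": {"snaktype": "value", "property": "P2"},
                        "qualifiers": {
                            "P1": [{"snaktype": "somevalue", "property": "P1"}]
                        },
                    }
                ]
            },
        }
    ],
}

count = inject_datatypes(lexeme, PROPERTY_DATATYPES.__getitem__)
```

The walk finds nested claims at any depth without knowing about forms or senses, but it visits every node of every entity, which is the performance concern raised above.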

Therefore I started looking into how these injections could be separated into their own class, allowing extensions to register these paths for any entity type, for example through WikibaseLexeme.entitytypes.php (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/634462).

However, it's still unclear to me exactly where these datatypes are supposed to show up, and I would like someone in the know to confirm that this is the right way forward.

Note that we already solved the missing datatypes for API output earlier this year; see T249206. Specifically, change 620097 solved it by expanding the “paths” where datatypes are added from e.g. 'claims/*/*/qualifiers' to include '*/*/claims/*/*/qualifiers'.
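As a rough illustration of why the extra path matters, the Python sketch below resolves such wildcard paths against a trimmed lexeme serialization. It is not the Wikibase (PHP) code: `resolve`, `inject_at_paths`, the lookup table and the sample data are all hypothetical, and it assumes '*' matches any dict key or list index and that each path ends at a property-keyed map of snak lists.

```python
from typing import Any, Callable, Iterator, List

def resolve(node: Any, segments: List[str]) -> Iterator[Any]:
    """Yield every node reached by following the segments from node."""
    if not segments:
        yield node
        return
    head, rest = segments[0], segments[1:]
    if isinstance(node, dict):
        if head == "*":
            children = list(node.values())
        else:
            children = [node[head]] if head in node else []
    elif isinstance(node, list):
        # Only the wildcard is supported for list positions in this sketch.
        children = node if head == "*" else []
    else:
        children = []
    for child in children:
        yield from resolve(child, rest)

def inject_at_paths(entity: dict, paths: List[str],
                    lookup: Callable[[str], str]) -> int:
    """Add "datatype" to the snaks under each path; return how many were set."""
    injected = 0
    for path in paths:
        for snak_map in resolve(entity, path.split("/")):
            if not isinstance(snak_map, dict):
                continue
            for snaks in snak_map.values():
                if not isinstance(snaks, list):
                    continue
                for snak in snaks:
                    snak["datatype"] = lookup(snak["property"])
                    injected += 1
    return injected

def make_lexeme() -> dict:
    """Build a trimmed, hypothetical lexeme with qualifiers at two depths."""
    def statement() -> dict:
        return {"qualifiers": {"P2": [{"snaktype": "value", "property": "P2"}]}}
    return {
        "type": "lexeme",
        "id": "L1",
        "claims": {"P1": [statement()]},
        "forms": [{"id": "L1-F1", "claims": {"P1": [statement()]}}],
    }

def lookup(property_id: str) -> str:
    # Hypothetical stand-in for a real property datatype lookup.
    return {"P2": "string"}[property_id]

top_level_only = make_lexeme()
narrow_count = inject_at_paths(top_level_only, ["claims/*/*/qualifiers"], lookup)

with_nested = make_lexeme()
wide_count = inject_at_paths(
    with_nested,
    ["claims/*/*/qualifiers", "*/*/claims/*/*/qualifiers"],
    lookup,
)
```

With only the first path, the qualifier inside the form is never reached; the second path covers claims nested one level down, for example under `forms`.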

Change 634462 had a related patch set uploaded (by Tobias Andersson; owner: Tobias Andersson):
[mediawiki/extensions/Wikibase@master] Add JsonDumpDataTypeInjector

https://gerrit.wikimedia.org/r/634462

Change 634462 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add JsonDataTypeInjector

https://gerrit.wikimedia.org/r/634462

Change 633196 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add lexeme dump integration tests

https://gerrit.wikimedia.org/r/633196