
Seed test system with data useful for toolbuilder validation
Closed, Resolved · Public · 3 Estimated Story Points

Description

We need to add some sample data sets (minimally, properties for the equivEntities portion of the Manifest) to our test system on cloudVPS. This is so that tool builders can use the output of the test system's WBManifest file to configure their tools to work properly.
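
As a rough illustration of how a tool might consume the equivEntities data, here is a minimal Python sketch. The manifest shape (the `equiv_entities`, `wikidata.org`, and `properties` keys), the local IDs `P1` and `P2`, and the helper name `local_property_id` are all assumptions made for this example; they are not taken from the actual WBManifest output. Only the Wikidata IDs P31 ("Instance of") and P279 ("Subclass of") are real.

```python
from typing import Optional

# Hypothetical manifest excerpt: key names and local IDs are assumptions for
# illustration, not the confirmed WikibaseManifest output format.
SAMPLE_MANIFEST = {
    "equiv_entities": {
        "wikidata.org": {
            "properties": {"P31": "P1", "P279": "P2"},
        }
    }
}


def local_property_id(
    manifest: dict, wikidata_id: str, source: str = "wikidata.org"
) -> Optional[str]:
    """Return the local property ID mapped to a Wikidata property, or None."""
    properties = (
        manifest.get("equiv_entities", {}).get(source, {}).get("properties", {})
    )
    return properties.get(wikidata_id)


# P31 is "Instance of" and P279 is "Subclass of" on Wikidata.
instance_of = local_property_id(SAMPLE_MANIFEST, "P31")
subclass_of = local_property_id(SAMPLE_MANIFEST, "P279")
```

Returning None for an unmapped property lets a tool decide for itself whether a missing mapping is fatal or can be skipped.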

The minimal set of entities would include "Instance of" and "Subclass of".
We also want some items with some statements.

This should be automated in some way.

Possible Solutions:

  • Use whatever the federated properties work built (some kind of maintenance script); see T255648 and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/612633/
    • ✅ : Something is already there
    • ❌ We'd need to extend this (no properties or statements there)
  • WikibaseImport (a maintenance script)
    • ❌: Needs to be checked for compatibility with the test system
    • ✅: We maintain some semi-abandonware
    • ✅: We can import complete entities from a Wikibase (e.g. Wikidata.org), so we don't need to think up test data
  • Make API calls wbeditentity
    • ❌: Where do we run it?
    • ✅: Anyone could run it (even tool builders were mentioned?)

We think we should use WikibaseImport unless it's massively incompatible with the test system. Our fallback option is the API calls.
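
For reference, the fallback could look roughly like the following Python sketch, which only builds the POST parameters for `wbeditentity` and does not send anything. The helper name `new_property_request`, the `wikibase-item` datatype choice, and the placeholder token are assumptions for illustration. A real run would first have to log in and fetch a CSRF token (`action=query&meta=tokens`) and then POST each parameter set to the wiki's api.php.

```python
import json


def new_property_request(
    label: str, datatype: str, csrf_token: str, language: str = "en"
) -> dict:
    """Build POST parameters that ask wbeditentity to create a new property."""
    data = {
        "labels": {language: {"language": language, "value": label}},
        # A datatype is required when creating a property.
        "datatype": datatype,
    }
    return {
        "action": "wbeditentity",
        "new": "property",
        "data": json.dumps(data),
        "token": csrf_token,
        "format": "json",
    }


# Placeholder token: a real one comes from action=query&meta=tokens.
requests_to_send = [
    new_property_request(label, "wikibase-item", "PLACEHOLDER_TOKEN")
    for label in ("Instance of", "Subclass of")
]
```

Keeping the request building separate from the sending makes the seeding script easy to dry-run, which may help with the "where do we run it" question.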

Event Timeline

Restricted Application added a subscriber: Aklapper.
Tarrow changed the task status from Stalled to Open. · Sep 14 2020, 8:46 PM
Tarrow claimed this task.

Hey @Pintoch, it would be cool to get your feedback here. Do you think the data we've seeded into the test system for the WikibaseManifest would be useful to OpenRefine when trying to test an integration? It includes a number of properties (including Instance of & Subclass Of) as well as items. Let me know if you think anything additional would be needed.

It looks great! Here are a few comments:

  • I assume the constraint ids don't appear there because the quality constraints extension is not installed on this wiki, but would appear otherwise?
  • In the manifests that we use, we have currently added a maxlag setting (to know which maxlag= is recommended when making edits). I don't know if that is worth including in your version or if it is actually always going to be 5 in all deployments…
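
To make the maxlag point concrete, here is a small Python sketch of how a tool might apply such a setting. The manifest key `max_lag` and both helper names are assumptions for this example, since the field had not been defined in the extension at this point. The `maxlag` request parameter and the `maxlag` error code are standard MediaWiki API behaviour.

```python
DEFAULT_MAXLAG_SECONDS = 5  # the value mentioned above as the usual default


def with_maxlag(params: dict, manifest: dict) -> dict:
    """Return a copy of the API parameters with a maxlag value attached.

    "max_lag" is a hypothetical manifest key used for illustration only.
    """
    maxlag = manifest.get("max_lag", DEFAULT_MAXLAG_SECONDS)
    return {**params, "maxlag": str(maxlag)}


def is_maxlag_error(response_json: dict) -> bool:
    """True when the API rejected the request because replication lag is too high."""
    return response_json.get("error", {}).get("code") == "maxlag"


edit_params = with_maxlag({"action": "wbeditentity", "format": "json"}, {})
```

A client that sees `is_maxlag_error` return True would normally wait and retry the edit rather than treat it as a failure.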

Also, there will inevitably be things that will need to be added to the manifest, as the tools ecosystem and the core platform evolve. For instance, say OpenRefine wants to display a note about the license used on the target Wikibase instance, to make sure users are aware of it. Assuming the WikibaseManifest extension has already been deployed on Wikidata and other Wikibase instances, what sort of process should we follow to add this? I imagine we need to file a Phabricator ticket to propose a syntax for the new field, submit a Gerrit patch to add it to the extension, wait for a new release and deployment? Do you already have some ideas about the sort of policy you would apply to these changes? I assume it would be legitimate for you to be reluctant to add fields which are overly tool-specific and therefore outside the scope of this project.

With that in mind, what is your expectation for tool builders:

  • We rely entirely on the manifest exposed by Wikibase: users can configure a Wikibase instance just by adding the corresponding URL. It is clean, but relies on the assumption that all configuration parameters we will ever need will be available there (and puts pressure on the release and deployment process on your side to get new fields shipped quickly)
  • We still use our own configuration format, which is stripped of all information we can extract from the Wikibase instance via the new extension. This is more flexible, but still requires users to write manifests manually (although they would be much smaller).
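
The second option could be sketched in Python as a merge in which the tool's own, smaller configuration is layered over whatever the Wikibase manifest provides. The function name `merge_config` and every key in the sample dictionaries are hypothetical; this is one possible approach under those assumptions, not how OpenRefine actually handles it.

```python
def merge_config(remote: dict, local: dict) -> dict:
    """Recursively merge two configs; values from `local` win on conflict."""
    merged = dict(remote)
    for key, value in local.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical inputs: what the extension exposes vs. what the tool adds itself.
remote_manifest = {"name": "Test Wikibase", "api": {"action": "https://example.org/w/api.php"}}
tool_overrides = {"api": {"maxlag": 5}, "license_note": "CC0"}

effective_config = merge_config(remote_manifest, tool_overrides)
```

With this layering, a field such as the license note can live in the tool's own file until it is added to the manifest, at which point the local entry can simply be dropped.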

Change 629127 had a related patch set uploaded (by Tarrow; owner: Tarrow):
[mediawiki/extensions/WikibaseManifest@master] Clone and Install WikibaseImport on test system

https://gerrit.wikimedia.org/r/629127

Change 629128 had a related patch set uploaded (by Tarrow; owner: Tarrow):
[mediawiki/extensions/WikibaseManifest@master] Add folder on test system for extra MW config

https://gerrit.wikimedia.org/r/629128

Change 629129 had a related patch set uploaded (by Tarrow; owner: Tarrow):
[mediawiki/extensions/WikibaseManifest@master] Enable WikibaseImport and use on test system

https://gerrit.wikimedia.org/r/629129

Change 629130 had a related patch set uploaded (by Tarrow; owner: Tarrow):
[mediawiki/extensions/WikibaseManifest@master] Build Manifest config as result of importing

https://gerrit.wikimedia.org/r/629130

Change 629127 merged by jenkins-bot:
[mediawiki/extensions/WikibaseManifest@master] Clone and Install WikibaseImport on test system

https://gerrit.wikimedia.org/r/629127

Change 629128 merged by jenkins-bot:
[mediawiki/extensions/WikibaseManifest@master] Add folder on test system for extra MW config

https://gerrit.wikimedia.org/r/629128

@Pintoch I think I worded my question in a less-than-clear way -- I was wondering whether the number of entities (items, properties) that we loaded into the test Wikibase is sufficient to replicate a production WB. As you correctly point out, the Manifest itself, linked on the main page of the test wiki, is still a work in progress and has yet to reflect the changes we are making in response to earlier feedback and the OpenRefine manifest. This includes adding maxlag (see T263640). In terms of incorporating additional feedback, that is the goal of this test system -- for tool builders to try out this first version while it is still under development, to incorporate any changes we can before the first release, and then to continue collecting feedback to make the Manifest useful for the broadest range of tools before a second release/deployment to Wikidata. I'll shoot you an email later this week to see if we can connect and talk through some of these details in the near future.

Oh I see, sorry for the misunderstanding!

Concerning the items and properties in this Wikibase instance, I think that looks good. I see you have added items and properties for the constraint system - that's great! But it is not clear to me if the quality constraints extension has been installed accordingly?

Change 629129 merged by jenkins-bot:
[mediawiki/extensions/WikibaseManifest@master] Enable WikibaseImport and use on test system

https://gerrit.wikimedia.org/r/629129

Change 629130 merged by jenkins-bot:
[mediawiki/extensions/WikibaseManifest@master] Build Manifest config as result of importing

https://gerrit.wikimedia.org/r/629130