[Investigation: Timebox 4 hours] Compare prototype output with json-schema from OpenRefine
Closed, Resolved · Public

Description

Please find the differences between the two outputs, particularly data that is in the OpenRefine file but not in the prototype. Wherever possible, we should also document the "why" behind the differences.

Key differences to consider:

  1. EditGroups (templates for generating wikitext)
  2. String rather than EntityID keys for WikibaseQualityConstraints

Event Timeline

Tarrow created this task. Aug 21 2020, 10:36 AM
Restricted Application added a subscriber: Aklapper. Aug 21 2020, 10:36 AM

Restricted Application added a project: Wikidata. Aug 26 2020, 8:22 AM
Samantha_Alipio_WMDE updated the task description.
Samantha_Alipio_WMDE renamed this task from Compare prototype output with json-schema from OpenRefine to [Investigation: Timebox 4 hours] Compare prototype output with json-schema from OpenRefine. Aug 26 2020, 10:14 AM
Samantha_Alipio_WMDE triaged this task as Medium priority. Aug 31 2020, 11:03 AM

OpenRefine's Manifest Schema spec
WikibaseManifest prototype

Entity mapping
  • WikibaseManifest Prototype: maps entity IDs, where the key is the Wikidata entity and the value is the local Wikibase entity, e.g. "P31": "P1"
  • OpenRefine Manifest Spec: lists the label and the respective ID on the local Wikibase, e.g. "instance_of": "P1"
  • Comment: We should find out which format is more convenient for tool builders

Properties
  • WikibaseManifest Prototype: has "instance_of"
  • OpenRefine Manifest Spec: has "instance of" and "subclass_of"
  • Comment: Maybe we should look at Basic membership properties to see the full list and decide which we want to include

QualityConstraints
  • WikibaseManifest Prototype: does not have it
  • OpenRefine Manifest Spec: lists them, but it's optional, as QualityConstraints is an extension and it might not be installed on some Wikibases
  • Comment: We might not provide them in v1 of WikibaseManifest

OAuth
  • WikibaseManifest Prototype: does not have it
  • OpenRefine Manifest Spec: lists the configs, but it's optional; if available, it requires the URL of the registration page, e.g. https://meta.wikimedia.org/wiki/Special:OAuthConsumerRegistration/propose

Reconciliation
  • WikibaseManifest Prototype: has it as an external service
  • OpenRefine Manifest Spec: has it and it's required

EditGroups
  • WikibaseManifest Prototype: does not have it
  • OpenRefine Manifest Spec: has it as optional, and if provided it needs a url_schema. The URL schema must contain the variable ${batch_id}

RDF Namespaces
  • WikibaseManifest Prototype: has them listed
  • OpenRefine Manifest Spec: does not have them
  • Comment: No harm in keeping them

MediaWiki info
  • WikibaseManifest Prototype: lists the Wikibase name and root URL
  • OpenRefine Manifest Spec: requires the name of the Wikibase, e.g. Wikidata; the URL of the root, e.g. https://www.wikidata.org/wiki/; the main page URL, e.g. https://www.wikidata.org/wiki/Wikidata:Main_Page; and the API endpoint, e.g. https://www.wikidata.org/w/api.php
  • Comment: We should craft a list of MediaWiki configs we can expose

Other
  • WikibaseManifest Prototype: lists the namespaces for item and property
  • OpenRefine Manifest Spec: requires max_lag and site_iri
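
To make the entity-mapping difference concrete, here is a minimal Python sketch of converting the prototype's ID-keyed mapping ("P31": "P1") into the label-keyed form the OpenRefine spec uses ("instance_of": "P1"). The lookup table, the function name, and the local IDs other than "P1" are assumptions for illustration only; neither project is known to ship such a helper.

```python
# Hypothetical lookup from Wikidata entity IDs to the label-style keys
# used by the OpenRefine manifest spec. Only two entries, for illustration.
WIKIDATA_ID_TO_LABEL = {
    "P31": "instance_of",
    "P279": "subclass_of",
}


def to_label_keys(id_mapping, labels=WIKIDATA_ID_TO_LABEL):
    """Split an ID-keyed mapping into label-keyed entries and leftovers.

    id_mapping maps a Wikidata entity ID to the local Wikibase entity ID,
    as in the prototype output. Entries without a known label are returned
    separately rather than dropped.
    """
    converted = {}
    unmapped = {}
    for wikidata_id, local_id in id_mapping.items():
        label = labels.get(wikidata_id)
        if label is None:
            unmapped[wikidata_id] = local_id
        else:
            converted[label] = local_id
    return converted, unmapped


# Local IDs "P2" and "P5" are made up for this example.
converted, unmapped = to_label_keys({"P31": "P1", "P279": "P2", "P2302": "P5"})
```

The sketch suggests one trade-off to weigh: ID keys can describe any Wikidata entity without a fixed vocabulary, while label keys need an agreed list of names up front.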

Here is my take with changes marked in italic

Entity mapping
  • WikibaseManifest Prototype: maps entity IDs, where the key is the Wikidata entity and the value is the local Wikibase entity, e.g. "P31": "P1"
  • OpenRefine Manifest Spec: lists the label and the respective ID on the local Wikibase, e.g. "instance_of": "P1"
  • Comment: We should find out which format is more convenient for tool builders

Properties
  • WikibaseManifest Prototype: no specific section, but can be mapped as entities
  • OpenRefine Manifest Spec: has "instance_of" and "subclass_of", required
  • Comment: Maybe we should look at Basic membership properties to see the full list and decide which we want to include

QualityConstraints
  • WikibaseManifest Prototype: no specific section, but can be mapped as entities
  • OpenRefine Manifest Spec: lists 17 properties and 47 items for Wikidata. Otherwise optional, as QualityConstraints is an extension and it might not be installed on some Wikibases
  • Comment: We might not provide them in v1 of WikibaseManifest

OAuth
  • WikibaseManifest Prototype: does not have it
  • OpenRefine Manifest Spec: lists the configs, but it's optional; if available, it requires the URL of the registration page, e.g. https://meta.wikimedia.org/wiki/Special:OAuthConsumerRegistration/propose

Reconciliation
  • WikibaseManifest Prototype: allows it as an external service
  • OpenRefine Manifest Spec: has it and it's required

EditGroups
  • WikibaseManifest Prototype: does not have it
  • OpenRefine Manifest Spec: has it as optional, and if provided it needs a url_schema. The URL schema must contain the variable ${batch_id}

RDF Namespaces
  • WikibaseManifest Prototype: has them listed
  • OpenRefine Manifest Spec: does not have them
  • Comment: No harm in keeping them

MediaWiki info
  • WikibaseManifest Prototype: lists the Wikibase name and root URL
  • OpenRefine Manifest Spec: requires the name of the Wikibase, e.g. Wikidata; the URL of the root, e.g. https://www.wikidata.org/wiki/; the main page URL, e.g. https://www.wikidata.org/wiki/Wikidata:Main_Page; and the API endpoint, e.g. https://www.wikidata.org/w/api.php
  • Comment: We should craft a list of MediaWiki configs we can expose

Other
  • WikibaseManifest Prototype: lists the namespaces for item and property
  • OpenRefine Manifest Spec: requires max_lag and site_iri
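
The EditGroups requirement that url_schema contain the variable ${batch_id} could be checked and applied roughly as follows. This is a sketch under assumptions: the function name and the example.org URL are hypothetical, and Python's string.Template is used only because its ${name} placeholder syntax happens to match; it is not how OpenRefine itself does the substitution.

```python
from string import Template

BATCH_PLACEHOLDER = "${batch_id}"


def build_batch_url(url_schema, batch_id):
    """Return url_schema with ${batch_id} replaced by the given batch ID.

    Raises ValueError if the schema lacks the required placeholder.
    """
    if BATCH_PLACEHOLDER not in url_schema:
        raise ValueError("url_schema must contain " + BATCH_PLACEHOLDER)
    # safe_substitute leaves any other "$" sequences in the URL untouched.
    return Template(url_schema).safe_substitute(batch_id=batch_id)


# Hypothetical schema; a real manifest would point at its EditGroups instance.
url = build_batch_url("https://editgroups.example.org/b/${batch_id}", "abc123")
```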

Change 624011 had a related patch set uploaded (by Tonina Zhelyazkova; owner: Tonina Zhelyazkova):
[mediawiki/extensions/WikibaseManifest@master] Add ADR 3 - Manifest Output Format

https://gerrit.wikimedia.org/r/624011

Change 624011 merged by jenkins-bot:
[mediawiki/extensions/WikibaseManifest@master] Add docs - Manifest Output Format

https://gerrit.wikimedia.org/r/624011

Samantha_Alipio_WMDE closed this task as Resolved. Sep 24 2020, 12:10 PM