
Wikidata integration for proveit gadget
Open, Medium, Public

Description

One of the main goals of my grant to enhance the ProveIt gadget is connecting it with Wikidata, so that references are autocompleted when the user enters enough data to match a Wikidata item. Furthermore, new references added with the gadget should be inserted into Wikidata, and changes made to existing references should be pushed there too, so that users only need to enter the information once.

This is a very desirable feature, but I've been thinking about it since the grant began and I still don't see clearly what the UX and UI should look like, not to mention the actual implementation details. So I'm opening this task to get some input and help.

Some months ago I contacted the Wikidata community, and they are OK with allowing ProveIt to create items on Wikidata, even if this means that items for non-notable sources may be created. They also pointed me to WikiProject Source MetaData, which seems to be involved in making this kind of connection possible.

UX and UI

What would be the best way to present and interact with this functionality?

The best I could think of so far is this: every time the user moves from one field to the next, we query Wikidata in search of a match. If we find one, we display a link to the item at the bottom-left corner of the gadget; clicking the link opens the Wikidata item in a new browser tab. Next to the link, a "Use" button appears. When the button is clicked, the reference form is filled with the data from the Wikidata item. After that, any changes made to the form will be pushed to Wikidata when the user hits the Insert button. (Well, not all of them: there are many irrelevant fields, such as the page number, citation text, etc., which raises the question of how to distinguish them; see below.) Once the Use button is clicked, it should probably be replaced with an Unlink button that unlinks the form from Wikidata, in case the user doesn't want to push the changes to Wikidata, or in case the Use button was hit by mistake.

If more than one item matches, we could show several links (one beneath the other?), each with its own Use/Unlink button.
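To make the Use/Unlink flow a bit more concrete, here is a minimal Python sketch of the state the gadget would need to track. The class and method names (`WikidataLink`, `on_field_change`, `use`, `unlink`) are hypothetical, not part of ProveIt, and the Wikidata search itself is left out: the sketch assumes some search step hands it a list of matching item IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WikidataLink:
    """Hypothetical model of the Use/Unlink state of the reference form."""

    candidates: List[str] = field(default_factory=list)  # matched item IDs, e.g. "Q42"
    linked: Optional[str] = None  # item currently filling the form, if any

    def on_field_change(self, matches: List[str]) -> None:
        # Called after each search, i.e. whenever the user moves to another field.
        # A linked item stays listed so that its Unlink button doesn't disappear.
        if self.linked is not None and self.linked not in matches:
            matches = [self.linked, *matches]
        self.candidates = matches

    def use(self, item_id: str) -> None:
        # "Use" button: link the form to one of the matched items.
        if item_id not in self.candidates:
            raise ValueError(f"{item_id} is not among the current matches")
        self.linked = item_id

    def unlink(self) -> None:
        # "Unlink" button: keep the form contents, but stop pushing to Wikidata.
        self.linked = None

    def push_on_insert(self) -> bool:
        # Changes go to Wikidata on Insert only while the form is linked.
        return self.linked is not None

    def buttons(self) -> Dict[str, str]:
        # One button per listed item, shown one beneath the other.
        return {q: "Unlink" if q == self.linked else "Use" for q in self.candidates}
```

Keeping the linked item listed even when a later search no longer returns it is just one possible choice; dropping the link automatically would also be defensible.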

Implementation

How to map template parameters to Wikidata properties?

My best idea so far is this: in many cases, there is a correlation between parameter names and property names. For example, the "author" parameter in the English citation templates corresponds to the "author" property in Wikidata. Same in Spanish: the "autor" parameter corresponds to the "autor" property, and in other languages it will probably be the same, simply because it's the most obvious name to give to the field. If the property name has spaces, replacing them with dashes often yields a match; for example, the "publication-date" parameter of the English citation templates corresponds to the "publication date" property in Wikidata. Lastly, we can use parameter aliases to match properties that have different names in Wikidata. For example, the "publication-place" parameter of the English citation templates corresponds to the "place of publication" property in Wikidata; adding a parameter alias called "place-of-publication" to the templates' TemplateData lets the gadget find the match.
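A minimal Python sketch of this matching rule, assuming the gadget already has the template's parameters with their aliases (from TemplateData) and a list of Wikidata property labels in the wiki's language; how that label list is fetched is left open. In the usage line, the "published-in" alias is hypothetical; it is attached to the "journal" parameter because Wikidata's "published in" property holds the journal an article appears in.

```python
from typing import Dict, List


def normalize(label: str) -> str:
    """Lowercase a property label and replace spaces with dashes."""
    return label.strip().lower().replace(" ", "-")


def match_parameters(
    params: Dict[str, List[str]], property_labels: List[str]
) -> Dict[str, str]:
    """Map template parameters to Wikidata property labels.

    params: parameter name -> its aliases (as listed in TemplateData)
    property_labels: property labels in the wiki's content language
    """
    by_name = {normalize(label): label for label in property_labels}
    mapping: Dict[str, str] = {}
    for name, aliases in params.items():
        # Try the parameter name first, then each alias, in order.
        for candidate in [name, *aliases]:
            if candidate in by_name:
                mapping[name] = by_name[candidate]
                break
    return mapping
```

For example, `match_parameters({"author": [], "publication-date": [], "journal": ["published-in"], "quote": []}, ["author", "publication date", "published in", "title"])` pairs "author", "publication-date" and "journal" with their properties and leaves "quote" unmatched.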

This way of matching parameter names with property names can probably be used not only for filling the reference form with the data retrieved from Wikidata, but also for querying Wikidata effectively when searching for a match.
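For the querying side, one option is to start with identifier parameters, which match exactly. Below is a minimal Python sketch that builds a SPARQL query, assuming the gadget would send it to the Wikidata Query Service (https://query.wikidata.org/sparql). The parameter-to-property table is hypothetical and only covers DOI and PubMed ID; free-text fields such as the title would need a fuzzier search and are left out.

```python
from typing import Dict, Optional

# Hypothetical parameter -> property ID table for identifier-type parameters.
# Double-check the IDs on Wikidata before relying on them.
IDENTIFIER_PROPERTIES: Dict[str, str] = {
    "doi": "P356",   # DOI
    "pmid": "P698",  # PubMed ID
}


def sparql_literal(value: str) -> str:
    # Escape backslashes and double quotes so the value is a valid SPARQL string.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_match_query(form: Dict[str, str], limit: int = 5) -> Optional[str]:
    """Build a SPARQL query for items whose identifiers match the form.

    Every identifier present in the form must match; returns None when the
    form has no identifiers yet, so no request needs to be sent.
    """
    patterns = []
    for param, prop in IDENTIFIER_PROPERTIES.items():
        value = form.get(param, "").strip()
        if not value:
            continue
        if param == "doi":
            # Assumption: DOIs are stored uppercase on Wikidata.
            value = value.upper()
        patterns.append(f"  ?item wdt:{prop} {sparql_literal(value)} .")
    if not patterns:
        return None
    return "SELECT ?item WHERE {\n" + "\n".join(patterns) + f"\n}} LIMIT {limit}"
```

Requiring all identifiers to match keeps false positives low; a UNION of the patterns would instead return items matching any of them.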

Given that properties in Wikidata are predefined and cannot be arbitrarily created, it should be possible to distinguish between data that should be pushed to Wikidata and data that should not: if a corresponding property exists in Wikidata, push the value; otherwise, don't. Some cases, however, may be problematic. For example, I found that articles have a "page(s)" property indicating the pages on which the article appears within the journal (example), but in Wikipedia citations the page(s) parameter specifies the page(s) relevant to the reference, not the whole article. I'm not sure about the solution here, but maybe these special cases can be hard-coded.
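The rule "push only if a property exists, minus hard-coded exceptions" fits in a few lines. A minimal Python sketch, assuming a parameter-to-property mapping has already been built; the `DO_NOT_PUSH` list is a hypothetical starting point, and converting values into Wikidata's data types (items, dates, monolingual text) is not shown.

```python
from typing import Dict

# Hard-coded exceptions: parameters that have a same-named Wikidata property
# but mean something else in a citation. Hypothetical starting list.
DO_NOT_PUSH = {"page", "pages"}


def fields_to_push(form: Dict[str, str], param_to_property: Dict[str, str]) -> Dict[str, str]:
    """Return {property: value} for the form fields that should go to Wikidata.

    A field is pushed only if it is non-empty, has a matching property, and
    is not one of the hard-coded exceptions.
    """
    return {
        param_to_property[param]: value
        for param, value in form.items()
        if value and param in param_to_property and param not in DO_NOT_PUSH
    }
```

Keying the exceptions on parameter names keeps them next to the template-specific data; keying them on property labels would work just as well.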

These are my ideas and solutions so far, and I think they are pretty good. But because I haven't started the actual coding yet, and because the functionality is relatively complex and I don't know Wikidata too well, I may be missing important issues. If so, I hope you can help me spot and solve them. Thanks!

Event Timeline

Sophivorus updated the task description.

I have tried using ProveIt a few times, but it never fit with my workflows, so thanks for working on its usability.

More to your point, I think https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData/Bibliographic_metadata_for_scholarly_articles_in_Wikidata , while a bit outdated, is still a good basis for discussing the correspondence between Wikipedia citation template parameters and Wikidata properties for citations.

Mvolz renamed this task from "Wikidata integration" to "Wikidata integration for proveit gadget". Oct 25 2016, 8:30 AM

You might be interested in this task, which is about adding Citoid support to Wikidata: T131661

It faces many of the same challenges, although there it maps Citoid parameters to Wikidata properties, rather than template parameters directly to Wikidata properties. However, we already map Citoid parameters to template parameters in several places (in VE, using TemplateData, and also in the RefToolbar, which does it directly), so you could probably reuse much of the same architecture and simply go from template parameters, to Citoid parameters, to Wikidata properties.
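Chaining the two maps is mostly a dictionary lookup. A minimal Python sketch, assuming a TemplateData-style citoid map (Citoid field name to template parameter) and a second, hypothetical map from Citoid fields to Wikidata property labels; creator fields, which the citoid map expresses as nested lists of parameter names, are skipped.

```python
from typing import Dict, List, Union

# Value in a TemplateData citoid map: a parameter name, or (for creators such as
# "author") a nested list of parameter names, which this sketch skips.
MapValue = Union[str, List[List[str]]]


def compose(
    template_map: Dict[str, MapValue], citoid_to_wikidata: Dict[str, str]
) -> Dict[str, str]:
    """Chain template parameter -> Citoid field -> Wikidata property."""
    result: Dict[str, str] = {}
    for citoid_field, param in template_map.items():
        if isinstance(param, str) and citoid_field in citoid_to_wikidata:
            result[param] = citoid_to_wikidata[citoid_field]
    return result
```

With an abridged map such as `{"title": "title", "publicationTitle": "journal", "DOI": "doi"}` and `{"title": "title", "publicationTitle": "published in", "DOI": "DOI"}`, the result pairs "journal" with "published in" without any per-template Wikidata mapping.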

Sophivorus changed the task status from Open to Stalled. Nov 16 2016, 4:30 PM

This task is stalled until the relevant people review my midpoint report and the second part of my grant is released. Both are already two months overdue.

Sophivorus changed the task status from Stalled to Open. Apr 7 2017, 1:19 AM

The new REST API offers a simple way to get citation data out of a URL, DOI, PMID, etc.
https://en.wikipedia.org/api/rest_v1/#!/Citation/getCitation
This, coupled with the Citoid parameter map already available for all the main templates, could greatly simplify the task of adding a Citoid service to ProveIt!
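As a rough illustration of how little glue this might need, here is a minimal Python sketch. It assumes the endpoint path `/api/rest_v1/data/citation/mediawiki/{query}` behind the API page linked above, and a response that is a JSON list of citation objects keyed by Citoid (Zotero-style) field names; the HTTP request itself is left out, and the function names are hypothetical.

```python
from typing import Any, Dict
from urllib.parse import quote


def citation_url(query: str, wiki: str = "en.wikipedia.org") -> str:
    # Assumed endpoint path (mediawiki format); verify against the REST API docs.
    # safe="" also percent-encodes "/", which DOIs and URLs contain.
    return f"https://{wiki}/api/rest_v1/data/citation/mediawiki/{quote(query, safe='')}"


def fill_form(citation: Dict[str, Any], citoid_map: Dict[str, Any]) -> Dict[str, str]:
    """Copy Citoid fields into template parameters using a TemplateData citoid map.

    Only plain string fields are handled; creator lists are skipped here.
    """
    form: Dict[str, str] = {}
    for citoid_field, param in citoid_map.items():
        value = citation.get(citoid_field)
        if isinstance(param, str) and isinstance(value, str) and value:
            form[param] = value
    return form
```

For instance, `citation_url("10.1000/xyz")` ends in `/mediawiki/10.1000%2Fxyz`, and `fill_form` applied to the first object of the response would give the values to prefill in the reference form.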

@Deskana As you may have noticed, there's a rule that adds VisualEditor to every Citoid task, including ones like this that have no relation to VE. Even when you remove it, the rule adds it back in. Should we consider removing the rule?

Sophivorus lowered the priority of this task from High to Medium. Aug 16 2018, 9:42 PM
Sophivorus removed projects: VisualEditor, Citoid.

@Mvolz Heh. I hadn't noticed until you pointed it out. I could have sworn that some Herald rules act only once, so that if someone manually overrides what Herald did, it doesn't override them again. Either I imagined that, or it's true but simply not the case here.

The simplest solution is probably to change the rule so that VisualEditor isn't automatically added to Citoid tasks.