Step 1: nicer formatting of reference snaks (impact: high)
Closed, Resolved · Public · 13 Estimated Story Points

Description

As an editor I want to be able to read the references shown in the Bridge.

Problem:
For anything but simple datatypes the formatting contains additional markup etc. We should remove it.

Screenshots/mockups:

references.png (488×500 px, 31 KB)

How it looks on Wikipedia (formatting of Wikidata references)

image.png (127×379 px, 17 KB)

BDD
GIVEN a statement with a reference
WHEN editing that statement in the Bridge
THEN the value is shown in a human-readable format

Acceptance criteria:

  • reference is formatted in a human-readable way
  • language of the client wiki's content is taken into account

Notes:

  • There are a lot of properties that can be expected to be a part of a reference (https://www.wikidata.org/wiki/Wikidata:List_of_properties/Citing_sources). We can't cover all cases.
    • For now we try to go with a format that is similar to this: "$title. $statedIn. $author. $publisher. $publicationdate. <everything else in the order it is on the repository>. Retrieved $retrieveddate." $title would be linked to the reference URL if available. If a reference URL exists but no title then we spell out the URL instead of the title.
    • We will want to make the formatting a bit more sophisticated later but for now that probably suffices. We expect that others will want to be able to use the same reference formatting via the API and Lua in the future so wrapping this into a standard API would be useful.
  • For any links to other entities (like author or publication) we output them as plain text and not links to the respective entities or their Wikipedia articles.
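The format described in the notes above could be sketched roughly as follows. This is only an illustration, not the implementation: the role names in the `parts` dictionary, the function name, and the choice to link a bare URL to itself are all assumptions made for the example, and the values are assumed to be already formatted as plain text.

```python
from html import escape

# Roles that follow the title, in the order given in the note above.
# The role names are hypothetical; they are not Wikibase identifiers.
LEADING_ROLES = ["statedIn", "author", "publisher", "publicationDate"]


def format_reference(parts, others=()):
    """Join reference parts into "Title. Stated in. ... Retrieved date."

    parts:  dict mapping a role name to an already formatted plain-text value
    others: remaining values, in the order they have on the repository
    """
    url = parts.get("referenceUrl")
    title = parts.get("title")
    segments = []

    if url and title:
        # Title linked to the reference URL.
        segments.append('<a href="{}">{}</a>'.format(escape(url), escape(title)))
    elif url:
        # No title: spell out the URL instead (linked here, as an assumption).
        segments.append('<a href="{0}">{0}</a>'.format(escape(url)))
    elif title:
        segments.append(escape(title))

    for role in LEADING_ROLES:
        if parts.get(role):
            # Entity values such as author stay plain text, never links.
            segments.append(escape(parts[role]))

    segments.extend(escape(value) for value in others)

    if parts.get("retrieved"):
        segments.append("Retrieved " + escape(parts["retrieved"]))

    return "".join(segment + ". " for segment in segments).rstrip()


print(format_reference({
    "title": "My message to video game databases",
    "referenceUrl": "https://example.org/post",
    "publicationDate": "10 October 2019",
    "retrieved": "14 November 2019",
}))
```

Missing parts are simply skipped, so a reference with only a URL and a retrieved date still produces a readable line.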

Event Timeline

Restricted Application added a subscriber: Aklapper.
Lydia_Pintscher renamed this task from nicer formatting of reference snaks to Step 1: nicer formatting of reference snaks.Dec 5 2019, 1:12 PM
Lydia_Pintscher renamed this task from Step 1: nicer formatting of reference snaks to Step 1: nicer formatting of reference snaks (impact: high).Dec 8 2019, 1:21 PM

Story writing notes:

  • We don’t want to invest into formatting the values in the browser. It would take some effort; it would either be conceptually ugly or require other extensions to extend Bridge (because e. g. mathematical expressions and musical notation are defined in separate extensions and Wikibase doesn’t yet know about them); it would require us to keep this formatting code up to date with future changes; and it’s not as useful to others.
  • We could use the wbformatvalue API, formatting each snak value at a time. But that’s a lot of concurrent API requests (possibly enough to reach a rate limit), and the formatting isn’t always ideal for us (e. g. we want to combine URL + title into a link with that text; the retrieved date should be distinguishable from the publication date; we would suffer from T218646; etc.), likely requiring further post-processing in the browser (see above).
  • We could introduce a new API, e. g. wbformatreference, which takes a full reference and formats it. This would be based on the Lua formatting code (e. g. turn item links into links to a certain client wiki if sitelinks exist), would mostly eliminate the concurrent API requests problem (it would take enough references, rather than enough reference snaks, to exhaust the rate limit), and would be useful to third parties as well. We could also expose this reference formatting logic via Lua; large wikis that already have their own modules for that might not (initially) use it, but it may be very useful for others, as well as for third-party installs.

We are leaning towards the new API.

To match the mock-ups, this API would have to do at least the following, beyond formatting snaks normally:

  • Combine a URL and a title into a link with that URL and link text. (What happens in case of multiple URLs? Is “title” a certain property, any string, any monolingual text snak?)
  • Never format entity IDs as links. References should be sparse in links; the URL itself is the most important one. (This seems to match the style of existing Wikipedia references.)
  • Add the word “retrieved” before the retrieved date.
  • Order reference snaks. (Compare the existing ordering of statements by main snak property ID, via a certain interface message.)

The property IDs of the following properties will be added to the config / repo settings:

  • title
  • stated in
  • author
  • publisher
  • publication date
  • retrieved date
  • reference URL
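As a rough illustration of how such a setting could drive the ordering of reference snaks, here is a minimal sketch. The property IDs shown are the ones Wikidata uses for these roles; the setting keys, the `(property_id, value)` snak shape, and the function name are assumptions made for this example and do not reflect the actual Wikibase configuration.

```python
# Hypothetical repo setting: role name -> property ID.
# The IDs are Wikidata's; other Wikibase installs would configure their own.
REFERENCE_PROPERTIES = {
    "title": "P1476",
    "statedIn": "P248",
    "author": "P50",
    "publisher": "P123",
    "publicationDate": "P577",
    "retrievedDate": "P813",
    "referenceUrl": "P854",
}

# Roles shown first, in display order. The reference URL is listed before the
# title because the two are later combined into a single link.
LEADING_ROLES = [
    "referenceUrl", "title", "statedIn", "author", "publisher", "publicationDate",
]


def order_snaks(snaks, config=REFERENCE_PROPERTIES):
    """Order (property_id, value) pairs: known roles, then the rest, then retrieved.

    Snaks of unconfigured properties keep the order they have on the repository.
    """
    rank = {config[role]: index for index, role in enumerate(LEADING_ROLES)}
    retrieved_id = config["retrievedDate"]

    # sorted() is stable, so several snaks of one property keep their order.
    leading = sorted((s for s in snaks if s[0] in rank), key=lambda s: rank[s[0]])
    rest = [s for s in snaks if s[0] not in rank and s[0] != retrieved_id]
    trailing = [s for s in snaks if s[0] == retrieved_id]
    return leading + rest + trailing


ordered = order_snaks([
    ("P813", "14 November 2019"),
    ("P407", "English"),
    ("P1476", "My message to video game databases"),
    ("P854", "https://example.org/post"),
])
print([property_id for property_id, _ in ordered])
```

Keeping the IDs in configuration rather than in code is what lets the same formatter work on installs where these properties have different IDs.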

The API will be added to the repo, not the client. It will initially be marked as internal. Different citation styles may be accommodated at a later date. Lua access is also not part of this task.

Estimated story points for the API part: 13
Estimated story points for the Bridge integration part: 8

The API will be added to the repo, not the client

Based on the ADR, a follow-up in-person discussion with @Lucas_Werkmeister_WMDE, and another in-person discussion with @Lydia_Pintscher, we may revise the "repo, not the client" decision. Let's have another round of chat tomorrow (2020-02-21; it seems I failed to submit this comment earlier).

Lydia and I also identified a disconnect about the term "citation style": to me it is what Lucas described in T238661#5808551 (with some personal assumption of ho while for Lydia, e.g., the way entity labels are shown (buzzword: fallback language name) seems to be part of that definition. It seems we need to sharpen the specification.

darthmon_wmde set the point value for this task to 13.Mar 10 2020, 4:17 PM
darthmon_wmde moved this task from Ready to estimate to Ready to pick up on the Wikidata-Bridge board.

Product question: in case we are formatting a reference containing a snak for a property that does not exist (any longer) (cf.) - how do we handle this? The existing formatters we (re)use have an opinion about that but it may not necessarily be the desired behavior, especially when we combine their output to form a larger string.

Example: [...] <span class="error wb-format-error">Property P711 not found, cannot determine the data type to use.</span> [...] (the mark-up will not be seen by the user if we embed it into the Bridge)

Urgh... Why are people keeping statements with deleted properties 😭

Would showing the raw value like we do right now in the Bridge work? It's clearly not great but I'm not sure what better options we can offer.

Urgh... Why are people keeping statements with deleted properties 😭

I don’t think they are? Pablo’s link goes to an old revision, not a current version.

Would showing the raw value like we do right now in the Bridge work? It's clearly not great but I'm not sure what better options we can offer.

It could work, but would that be better than what we do now? Right now, the value is still somewhat formatted, after all – the error message is only added afterwards. I’ll paste the full HTML snippet from Pablo’s experiment (from chat):

<span>https://commonists.wordpress.com/2019/10/10/my-message-to-video-game-databases-wekidata-come-in-peace/ <span class="error wb-format-error">Property P709 not found, cannot determine the data type to use.</span></span>. <span>My message to video game databases: We(kidata) come in peace <span class="error wb-format-error">Property P711 not found, cannot determine the data type to use.</span></span>. <span>10 October 2019 <span class="error wb-format-error">Property P766 not found, cannot determine the data type to use.</span></span>. <span>14 November 2019 <span class="error wb-format-error">Property P761 not found, cannot determine the data type to use.</span></span>.

You can see that the URL at the beginning isn’t formatted as a URL (because the data value has value type “string” – the information that the data type is “url” is only attached to the property), but e. g. the date near the end is still formatted correctly (because the data value has value type “time”).

Urgh... Why are people keeping statements with deleted properties 😭

I don’t think they are? Pablo’s link goes to an old revision, not a current version.

Ahhh puh. Good.

My main concern is the "Property PXXX not found, cannot determine the data type to use." part. I don't think it is helpful to show this in the Bridge because the reader will most likely not know what this means or what to do about it.
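If the decision were to hide these messages, one conceivable approach is to drop the error spans from the formatter output while keeping the partially formatted value. The following is a sketch under that assumption only; no such decision is recorded here, the class and function names are made up for the example, and it reduces the HTML to plain text, which would also discard any links.

```python
import re
from html.parser import HTMLParser


class ErrorSpanStripper(HTMLParser):
    """Collect text content, skipping anything inside a wb-format-error span."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = []
        # One flag per currently open <span>: True if it is an error span.
        self.open_spans = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            self.open_spans.append("wb-format-error" in classes)

    def handle_endtag(self, tag):
        if tag == "span" and self.open_spans:
            self.open_spans.pop()

    def handle_data(self, data):
        # Keep text only when no enclosing span is an error span.
        if not any(self.open_spans):
            self.text.append(data)


def strip_format_errors(html):
    parser = ErrorSpanStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    text = " ".join("".join(parser.text).split())
    # Removing a span leaves a space before the following punctuation.
    return re.sub(r"\s+([.,;])", r"\1", text)


snippet = (
    '<span>10 October 2019 <span class="error wb-format-error">Property P766 '
    "not found, cannot determine the data type to use.</span></span>."
)
print(strip_format_errors(snippet))
```

The date survives because its value type is "time" and was formatted before the error message was appended; only the message itself is removed.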

This should be done except for T248479: Add external link icon after links in references, so I think it’s useful to move this to Verification already. (Some of us devs aren’t sure if that icon is supposed to be there at all.)

Some of us devs aren’t sure if that icon is supposed to be there at all

Chiming in to add that this is not a question of opinion but of observed inconsistent use of the icon in Figma (specimen 1, 2, 3). It would be great to hear an explicit requirement (=> T248479) before we invest. I'm particularly interested in the observed difference between desktop and mobile (1 & 2, if I did not botch the links). /cc @Sarai-WMDE @Charlie_WMDE

I'm sorry about the inconsistency represented by the designs: specimen 3 is stale, specimens 1 and 2 are inconsistent, both should display the icon indicating external links.

I believe we overlooked the skin interpretation and tried to go for a "custom" more "user-friendly" and visible icon (based on the Minerva interpretation) in Vector.

As already mentioned in T248479, I don't think we need to invest in creating a new icon. Simply adding the necessary class and letting each skin insert its interpretation is more than good enough, and makes the Bridge consistent with each UI environment.

Change 583664 had a related patch set uploaded (by Pablo Grass (WMDE); owner: Pablo Grass (WMDE)):
[mediawiki/extensions/Wikibase@master] ApiFormatReference: add basic e2e tests

https://gerrit.wikimedia.org/r/583664

Lydia_Pintscher claimed this task.
Lydia_Pintscher moved this task from Verification to Done on the Wikidata-Bridge-Sprint-17 board.

Looking good!
\o/

i just realised that we never specified that the links in the references should open in a new tab. i will make a ticket for the next iteration.

…that is going to be, uh, interesting to implement.

Change 583664 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] ApiFormatReference: add basic tests

https://gerrit.wikimedia.org/r/583664