
Show word-level diff on textual fields in Wikibase diffs
Open, Needs Triage | Public | 13 Estimated Story Points | Feature

Description

User story:
As a Wikidata editor patrolling some edits,
I want to spot the actual changes more easily
in order to assess them more efficiently.

Problem:
Currently, Wikibase highlights the whole fields in diffs for items and properties. I’d like to have word-level diffs, just like how it works on non-entity pages, for any textual fields, including terms (labels, descriptions, aliases), sitelink targets and all textual properties’ values (strings, monolingual strings, Commons files etc.).

Example:
For example, do you see the difference in this diff? (Tip: in the fourth word, е was changed to є. I found it using NavPopups (which does word-level diff on the JSON, so it shows \u0435\u0454) and then looking very closely at it.)
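To illustrate why such a change is hard to spot by eye, here is a minimal Python sketch that lists the code points that differ between two strings. It is not part of Wikibase or NavPopups; the function name `describe_char_changes` and the sample strings are made up for illustration, and the sketch assumes a pure substitution (both strings have the same length).

```python
import unicodedata


def describe_char_changes(old: str, new: str):
    """Return (index, old_char_info, new_char_info) for every differing position.

    Hypothetical helper: only handles same-length strings (pure substitutions).
    """
    if len(old) != len(new):
        raise ValueError("this sketch only handles same-length strings")
    changes = []
    for index, (a, b) in enumerate(zip(old, new)):
        if a != b:
            changes.append((
                index,
                f"U+{ord(a):04X} {unicodedata.name(a)}",
                f"U+{ord(b):04X} {unicodedata.name(b)}",
            ))
    return changes


# Made-up sample: the second letter changes from U+0435 to U+0454.
old_label = "\u0442\u0435\u0441\u0442"
new_label = "\u0442\u0454\u0441\u0442"
for change in describe_char_changes(old_label, new_label):
    print(change)
```

The two letters render almost identically, but their Unicode names (CYRILLIC SMALL LETTER IE versus CYRILLIC SMALL LETTER UKRAINIAN IE) make the difference explicit.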

Solution:
Do word-level diff on textual fields in Wikibase diffs

Mockups:

How it currently looks: wikibase-word-level-diff-before.png (292×1 px, 27 KB)
How it should look: wikibase-word-level-diff-after.png (293×1 px, 26 KB)

Acceptance criteria:

  • Diff shows word-level differences on textual fields in Wikibase.

Notes:

Open questions:

Event Timeline

(After spending at least ten minutes handcrafting the screenshots using Firefox’s developer tools, I only now noticed that I left off the Property / Commons category heading on the left side. 🙁 I won’t redo them; I hope you get the point anyway.)

Thank you! The illustration is very helpful. I agree that this would be a very useful improvement, especially for patrollers.
@Addshore @ItamarWMDE Any idea how big an endeavor this could be?

I'd assess this endeavor to be of medium effort. From what I can gather, there are a couple of places where we need to change the way diffs are rendered: Firstly, we might want to use or extend the WordLevelDiff class from core's Diff namespace inside our own data-model-services sub-package's EntityDiff and / or ItemDiff classes. Then, we might want to update the way terms and claim diffs are rendered within BasicDiffView and ItemDiffView in the Wikibase/Repo/Diff namespace, to highlight only the changed words rather than the whole changed sentence.
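As a rough illustration of what "highlight only the changed words" means, the following Python sketch computes a word-level diff with the standard library's `difflib`. It is a stand-in for the idea only: it does not use or mirror the API of core's WordLevelDiff (which is PHP), and the function names and the `[-...-]` / `{+...+}` markers are assumptions chosen for this example.

```python
import difflib
import re


def split_words(text):
    # Keep words and whitespace runs as separate tokens so the text can be rejoined.
    return re.findall(r"\s+|\S+", text)


def word_level_diff(old, new):
    """Return one string in which removed words are wrapped in [-...-]
    and added words in {+...+}; unchanged text is passed through."""
    old_tokens = split_words(old)
    new_tokens = split_words(new)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append("".join(old_tokens[i1:i2]))
            continue
        removed = "".join(old_tokens[i1:i2])
        added = "".join(new_tokens[j1:j2])
        if removed:
            parts.append(f"[-{removed}-]")
        if added:
            parts.append(f"{{+{added}+}}")
    return "".join(parts)


print(word_level_diff("capital of France", "capital of Germany"))
# prints: capital of [-France-]{+Germany+}
```

With a whole-field diff, the entire description would be marked as changed; here only the last word is.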

I'd still be happy to hear what @Addshore thinks of it, though, just as a sanity check.

> I'd still be happy to hear what @Addshore thinks of it, though, just as a sanity check.

Nothing really to add that I can think of from my side.
Sounds good to me!

Manuel renamed this task from Do word-level diff on textual fields in Wikibase diffs to Show word-level diff on textual fields in Wikibase diffs. (May 4 2022, 3:53 PM)
Manuel updated the task description.
Manuel updated the task description.
karapayneWMDE set the point value for this task to 13. (Tue, Jun 14, 10:09 AM)

Task Breakdown Notes

  • If any subtasks are created, they should be created in the currently running sprint board, to be picked up there.
  • The classes might be involved in detecting merge conflicts, so we might need to tread carefully.
  • We should probably try to work our way from BasicDiffView and ItemDiffView and see what we need to change along the way, but avoid touching the merge and conflict resolution functionality; that is, the way the diff is programmatically represented should probably not be touched.
  • We can use WordLevelDiff to try to achieve the acceptance criteria, but in case we need to extend or modify it, we should consider the MediaWiki Stable Interface Policy.
  • Make sure to consider whether the core class WordLevelDiff is marked as "newable", meaning that we are able to instantiate it outside of core.
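The separation described in these notes (leave the programmatic diff representation alone, change only the rendering) can be sketched as follows. This is a Python illustration under stated assumptions, not Wikibase code: `DiffOpChange` here is a hypothetical stand-in for a stored "value changed" operation, `render_change` is a hypothetical view-layer function, and the `<del>` / `<ins>` markup is chosen only for the example.

```python
import difflib
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffOpChange:
    """Hypothetical stand-in for a stored 'value changed' operation.
    It is immutable here to stress that rendering never modifies it."""
    old_value: str
    new_value: str


def render_change(op):
    """Return (old_html, new_html) with only the changed words wrapped.

    Word-level comparison happens at render time only; the operation that
    merge and conflict handling would rely on stays untouched.
    Note: str.split() collapses runs of whitespace in this sketch.
    """
    old_words = op.old_value.split()
    new_words = op.new_value.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    old_out, new_out = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_chunk = html.escape(" ".join(old_words[i1:i2]))
        new_chunk = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            old_out.append(old_chunk)
            new_out.append(new_chunk)
        else:
            if old_chunk:
                old_out.append(f"<del>{old_chunk}</del>")
            if new_chunk:
                new_out.append(f"<ins>{new_chunk}</ins>")
    return " ".join(old_out), " ".join(new_out)


op = DiffOpChange("Museum of Modern Art", "Museum of Contemporary Art")
old_html, new_html = render_change(op)
print(old_html)  # Museum of <del>Modern</del> Art
print(new_html)  # Museum of <ins>Contemporary</ins> Art
```

Because the comparison lives entirely in the rendering function, anything that consumes the stored operation sees exactly the same data as before.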

To understand the topic better, @noarave will create a separate task from these notes.

  • Figure out where we use WordLevelDiff, but also consider alternative solutions to understand our way forward.
  • The investigation should focus on WordLevelDiff and our possibilities to use it or, if need be, to modify it so that we can use it.
  • It appears that Tech Wishes is using it, so asking Adam or Svantje might be a good option.

> Make sure to consider whether the core class WordLevelDiff is marked as "newable", meaning that we are able to instantiate it outside of core

WordLevelDiff was last substantially changed in 2016, so it predates the stable interface policy (created at the beginning of 2017). To me it seems plausible that it simply hasn’t been marked newable yet, and that there wouldn’t be any reason not to do so. (Alternatively, TableDiffFormatter is already newable and uses WordLevelDiff; we might be able to use TableDiffFormatter.)

@Lucas_Werkmeister_WMDE I linked to this comment in the subtask so it doesn't escape us.

Change 810017 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Rename item diff related classes for clarity

https://gerrit.wikimedia.org/r/810017

Change 810017 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Rename item diff related classes for clarity

https://gerrit.wikimedia.org/r/810017

Change 810302 had a related patch set uploaded (by Noa wmde; author: Noa wmde):

[mediawiki/extensions/Wikibase@master] Prepare switching to WordLevelDiff in BasicDiffView

https://gerrit.wikimedia.org/r/810302

Change 810315 had a related patch set uploaded (by Noa wmde; author: Noa wmde):

[mediawiki/extensions/Wikibase@master] Use WordLevelDiff for Labels/Description/Aliases

https://gerrit.wikimedia.org/r/810315

This implements word-level diffs for the contents of labels, descriptions, and aliases. I think diffing the titles of sitelink changes would also be fairly doable; diffing statements feels like more work to me, both to define (e.g. should we diff the labels of item values word by word, or just treat the whole thing as one change, because after all the whole item value was changed to a different ID?) and to implement. Should either of those be part of this task as well?

Good thinking! Yes, let's please diff the titles of sitelink changes as well! I don't want to increase complexity, so let's not do the word-by-word diffs for statements for now. If the community wants this at some point, I will create another task for it.

Change 810302 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Prepare switching to WordLevelDiff in BasicDiffView

https://gerrit.wikimedia.org/r/810302

Change 810361 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Prepare switching to WordLevelDiff in SiteLinkDiffView

https://gerrit.wikimedia.org/r/810361

Change 810362 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Use WordLevelDiff for site link titles

https://gerrit.wikimedia.org/r/810362