Page MenuHomePhabricator

[MEX] M3.1.2 - Spike - Improve Publish Info Option 5: Use ClaimSummaryBuilder in EditEntity
Closed, ResolvedPublic

Description

Per the investigation in T407876: [MEX] M3.1.2 - Spike - Improve Publish Info Option 4: Teach EditEntity to respect the summary returned by ModifyEntity::applyChangeOp, let’s try a fifth option where EditEntity uses the ClaimSummaryBuilder class which is also used in SetClaim.

Event Timeline

Change #1205123 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] POC: Use ClaimSummaryBuilder in EditEntity

https://gerrit.wikimedia.org/r/1205123

Quoting the commit message:

This is, I think, an in principle viable approach to building detailed edit summaries for wbeditentity. Moreover, I think it should be possible to reuse this approach for EditSummaryHelper, and frankly I’d very much like to do that, because that would cut down on the different number of ways we use to obtain the edit summary, which we really have quite enough of already (diff the edit and analyze the change, as in SetClaim and now EditEntity; update the summary in the change ops, as in most API modules; analyze the ChangeOpResult, as in EditSummaryHelper).

The one problem I’ve noticed so far is that if you edit a qualifier, it incorrectly reports it as “Changed 2 qualifiers of claim”, because ClaimSummaryBuilder will see a qualifier diff with two diff ops (remove old qualifier, add new qualifier) rather than one (change qualifier value). This didn’t come up before because, while ClaimSummaryBuilder has code to emit update-qualifiers and update-references summaries, I believe those are never actually used in the end: SetClaim will pass the summary from the ClaimSummaryBuilder into the ChangeOp, which in turn will reset the action to a plain “update”. (The incorrect number “2” in the autocomment args actually persists – you can see it in this API request for this diff – but it’s not used by the “update” message and thus never visible in the user interface.)

(That last part is basically another instance of the “the code has ambitions, but because it was never used that way, it doesn’t quite work” pattern that I also noticed in T407876#11371379.)

I think the one possible downside of this is performance – diffing the entity feels wasteful when the change ops “should” already be able to tell us what they’ve changed. (And if the edit specified an old baserevid, then the conflict resolution code in MediaWikiEditEntity will diff the entity again in order to apply the resulting patch to the latest revision, so we might calculate the same diff twice.) That said, given all the issues with option 4 (T407876), I think this might still be the best approach.

Change #1205158 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] POC: Use EntityDiff for EditSummaryHelper

https://gerrit.wikimedia.org/r/1205158

The one problem I’ve noticed so far is that if you edit a qualifier, it incorrectly reports it as “Changed 2 qualifiers of claim”, because ClaimSummaryBuilder will see a qualifier diff with two diff ops (remove old qualifier, add new qualifier) rather than one (change qualifier value). This didn’t come up before because, while ClaimSummaryBuilder has code to emit update-qualifiers and update-references summaries, I believe those are never actually used in the end: SetClaim will pass the summary from the ClaimSummaryBuilder into the ChangeOp, which in turn will reset the action to a plain “update”.

Apparently the update-qualifiers messages were used between 2013 and 2014:

mysql:research@dbstore1009.eqiad.wmnet [wikidatawiki]> SELECT rev_id, rev_timestamp, comment_id, comment_text FROM revision JOIN comment ON rev_comment_id = comment_id WHERE comment_text LIKE '/* wbsetclaim-update-qualifiers:%' LIMIT 10;
+----------+----------------+------------+-------------------------------------------------------------------------+
| rev_id   | rev_timestamp  | comment_id | comment_text                                                            |
+----------+----------------+------------+-------------------------------------------------------------------------+
| 33407810 | 20130501182516 |   50991409 | /* wbsetclaim-update-qualifiers:1||1|2 */ [[Property:P37]]: [[Q8752]]   |
| 33451051 | 20130501203713 |   51001489 | /* wbsetclaim-update-qualifiers:1||1|1 */ [[Property:P39]]: [[Q19546]]  |
| 33494087 | 20130501231927 |   51010969 | /* wbsetclaim-update-qualifiers:1||1|1 */ [[Property:P214]]: 54229582   |
| 33494235 | 20130501231953 |   51010969 | /* wbsetclaim-update-qualifiers:1||1|1 */ [[Property:P214]]: 54229582   |
| 33520866 | 20130502005117 |   51015183 | /* wbsetclaim-update-qualifiers:1||1|2 */ [[Property:P39]]: [[Q191954]] |
| 33521561 | 20130502005515 |   51015183 | /* wbsetclaim-update-qualifiers:1||1|2 */ [[Property:P39]]: [[Q191954]] |
| 33521859 | 20130502005653 |   51015183 | /* wbsetclaim-update-qualifiers:1||1|2 */ [[Property:P39]]: [[Q191954]] |
| 33522081 | 20130502005802 |   51015183 | /* wbsetclaim-update-qualifiers:1||1|2 */ [[Property:P39]]: [[Q191954]] |
| 33522366 | 20130502005928 |   51015183 | /* wbsetclaim-update-qualifiers:1||1|2 */ [[Property:P39]]: [[Q191954]] |
| 33522624 | 20130502010050 |   51015183 | /* wbsetclaim-update-qualifiers:1||1|2 */ [[Property:P39]]: [[Q191954]] |
+----------+----------------+------------+-------------------------------------------------------------------------+
10 rows in set (1 min 20.379 sec)

mysql:research@dbstore1009.eqiad.wmnet [wikidatawiki]> SELECT rev_id, rev_timestamp, comment_id, comment_text FROM revision JOIN comment ON rev_comment_id = comment_id WHERE comment_text LIKE '/* wbsetclaim-update-qualifiers:%' ORDER BY rev_timestamp DESC LIMIT 10;
+-----------+----------------+------------+---------------------------------------------------------------------------+
| rev_id    | rev_timestamp  | comment_id | comment_text                                                              |
+-----------+----------------+------------+---------------------------------------------------------------------------+
| 123778792 | 20140429180058 |   74348645 | /* wbsetclaim-update-qualifiers:1||1|1 */ [[Property:P360]]: [[Q5]]       |
| 123771658 | 20140429173352 |   77643554 | /* wbsetclaim-update-qualifiers:1||1|1 */ [[Property:P138]]: [[Q76365]]   |
| 123767133 | 20140429171424 |   77642224 | /* wbsetclaim-update-qualifiers:1||1|1 */ [[Property:P54]]: [[Q2143177]]  |
| 123767087 | 20140429171411 |   77642209 | /* wbsetclaim-update-qualifiers:1||1|1 */ [[Property:P54]]: [[Q483965]]   |
| 123767018 | 20140429171356 |   77642186 | /* wbsetclaim-update-qualifiers:1||1|1 */ [[Property:P54]]: [[Q750437]]   |
| 123766172 | 20140429171017 |   77641965 | /* wbsetclaim-update-qualifiers:1||1|2 */ [[Property:P54]]: [[Q16331994]] |
| 123766098 | 20140429170957 |   77410136 | /* wbsetclaim-update-qualifiers:1||1|2 */ [[Property:P54]]: [[Q240783]]   |
| 123766010 | 20140429170935 |   77641924 | /* wbsetclaim-update-qualifiers:1||1|2 */ [[Property:P54]]: [[Q131378]]   |
| 123765358 | 20140429170651 |   75338423 | /* wbsetclaim-update-qualifiers:1||1|2 */ [[Property:P54]]: [[Q592241]]   |
| 123765177 | 20140429170607 |   77641711 | /* wbsetclaim-update-qualifiers:1||1|2 */ [[Property:P54]]: [[Q1349814]]  |
+-----------+----------------+------------+---------------------------------------------------------------------------+
10 rows in set (3 hours 53 min 4.138 sec)

And they already suffered from the “changed 2 qualifiers” issue at the time, as seen in the very first edit with this summary, which really changed the value of the single “located in the administrative territorial entity” qualifier. I’m inclined to call this a problem we can live with, to be honest, or at least defer until later.

Moreover, I think it should be possible to reuse this approach for EditSummaryHelper, and frankly I’d very much like to do that…

(This was done in the other POC change above, by the way, and generally seems to work out fine there.)

Change #1201726 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Inline classes into EditSummaryHelper

https://gerrit.wikimedia.org/r/1201726

Change #1208371 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Use EntityDiff for EditSummaryHelper

https://gerrit.wikimedia.org/r/1208371

Change #1208372 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Move more summary generation into EditSummaryHelper

https://gerrit.wikimedia.org/r/1208372

Change #1208373 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Refactor EditSummaryHelper for different modules

https://gerrit.wikimedia.org/r/1208373

Change #1208374 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] Use ClaimSummaryBuilder in EditSummaryHelper

https://gerrit.wikimedia.org/r/1208374

Change #1205158 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/extensions/Wikibase@master] POC: Use EntityDiff for EditSummaryHelper

Reason:

superseded by Ia0d71e037e

https://gerrit.wikimedia.org/r/1205158

Change #1205123 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/extensions/Wikibase@master] POC: Use ClaimSummaryBuilder in EditEntity

Reason:

superseded by I99122c508f

https://gerrit.wikimedia.org/r/1205123

I rearranged the changes to be less of a proof of concept and closer to a form that I think we could merge (which includes swapping the two POC changes around: first refactor EditSummaryHelper to use EntityDiff, then add ClaimSummaryBuilder); to leave the context for the above comments clear, I decided to upload this whole chain with new change IDs and abandon the two POC changes instead of updating them.

If you’re just starting out reviewing this investigation, it probably makes sense to look just at the new changes, and especially Use ClaimSummaryBuilder in EditSummaryHelper which is the new functionality (everything leading up to that is refactoring); if you were in the middle of reviewing the POC changes, they’re still there, just abandoned now.

Change #1210645 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] WIP: Further improve wbeditentity summaries for terms

https://gerrit.wikimedia.org/r/1210645

Team discussion outcome: We’ll go ahead with this option. We expect it to be an “M (medium)” sized task.

  • The impact of the diffing on performance is a bit of an unknown. We did a few basic tests on XHGui: when diffing a huge item (found via Special:LongPages), EntityDiffer::diffEntities, can take up to 1 second of inclusive wall-clock time; when resolving an edit conflict on a smallish item (the sandbox item), three calls to EntityDiffer::diffEntities() occupy 7 milliseconds. Based on that, we’re comfortable to go ahead with this approach for now; we can then see the actual performance impact in production once it’s rolled out.
    • Suggestion by @hoo: We may be able to optimize the diffing by skipping parts that we know weren’t modified, based on the JSON input to the API (no "claims" → statements certainly weren’t modified) or on the ChangeOpResult. But this can wait.
  • We don’t know at the moment which edit summaries the REST API generates or how it generates them. I want to check that before going ahead with this option, just in case we’d be duplicating a lot of code that’s already available there. (Edit: This is also relevant for T410617: [MEX] M5 - Replace ActionAPI `wbeditentity` call with REST API PATCH request.)

Leszek and Silvan helped me out here. The REST API generates basically the same summaries as wbeditentity on the PATCH endpoint for full entities: term changes get the termbox messages, anything else gets generic “changed an Item”. The code for this lives in WholeEntityEditSummaryToFormattableSummaryConverter.php and a relevant task with a specification is T342993.