Page MenuHomePhabricator

[MEX] M3.1.2 - Spike - Improve Publish Info Option 4: Teach EditEntity to respect the summary returned by ModifyEntity::applyChangeOp
Closed, ResolvedPublic

Description

In order to gain a better understanding of the level of effort and impact each option would have, we will create a prototype of the solution.

Timebox: 8 hrs?

From T403149: [MEX] M3.1.2 - Investigate options for improving information saved while publishing

Option 4:

  1. Teach EditEntity to respect the summary returned by ModifyEntity::applyChangeOp. At least in the most basic situations when an EditEntity API call boils down to a single ChangeOp with a single summary line. This is currently not easy to detect, unfortunately, but not hard to fix. Add a new method ChangeOp::isAtomic or ChangeOp::isSingular or similar and make non-atomic instances like the ChangeOps class return false (only when it really contains multiple), while all basic ChangeOp return true. Callers can use this information to decide which summary they should use.
  2. Implement and wire new ChangeOp instances for all the real-world situations you want to cover. These new classes would be rather short: All they do is to generate a nice summary specific to the use-case they are designed for. The actual changes are forwarded to any combination of existing ChangeOp classes (but their summaries are ignored). isAtomic would be true for these new ChangeOp instances because they are able to describe the change in a single, meaningful summary line.
  3. A good starting point is ChangeOpFingerprint. That would be a great place to detect and generate new summaries like "a label and a description in the same language changed". However, the class currently doesn't even try to generate a meaningful summary. Why is that?

To me this doesn't seem to be a larger change, unless I'm missing something.

As far as I understand the other proposals so far both option 1 and 2 have notable problems:

  • Both suggest to analyze ChangeOpResult instances, not ChangeOps. But that's to late, isn't it? ChangeOpResult doesn't contain the information that would be needed to understand if a change can be described with a more narrow summary line. For example: What if a label and a description was changed the same time? If that happens often enough it could get a separate summary line. How would ChangeOpResult be able to understand that?
  • Both proposals would generally re-implement something that already largely exists in the ChangeOp hierarchy. Why not just use the existing logic instead?

Event Timeline

It turns out this approach suffers from more problems than anticipated; I think the overall issue is simply that EditEntity has never used the change ops to generate edit summaries before, and so while there are some ambitions in the code in that direction, none of it quite works out properly.

  • The ChangeOps can form a pretty convoluted hierarchy, yet in order to know if we can apply a custom edit summary, we need to determine if it’s “actually” just one (“atomic”) change op (albeit possibly one with several interior change ops that takes care of the edit summary, as @thiemowmde suggested in T403149#11224870). ChangeOps::apply() currently has some very simple code for this (if count( $this->changeOps ) === 1, then use summary, else not), which breaks down very quickly: e.g. ItemChangeOpDeserializer may return ChangeOps with one inner ChangeOpFingerprint for the terms and one inner ChangeOps for all the statement edits together – thus the outer ChangeOps has two children and won’t apply any edit summary. Fixing this isn’t trivial – a naive attempt by me to “flatten” change ops (discard interior empty ChangeOps instances, collect all nested ChangeOp instances into a single array) ran into a couple of issues (the existence of NullChangeOp as a change op that should be dropped; the existence of ChangeOpFingerprint as a subclass of ChangeOps that shouldn’t be flattened away because of its custom ChangeOpFingerprintResult result).
  • Option 4 assumes that ChangeOp classes (or ModifyEntity::applyChangeOp()) “return a summary”, but that’s not quite right: they receive an (optional) summary and modify it (if it’s not null). This is an important difference, because it turns out that ChangeOp classes never set the moduleName component of a Summary (indeed, the class doesn’t even have a setter for it – the $moduleName is passed into the constructor and immutable from then on). But the module name is a crucial part of the edit summary; to make option 4 workable, all the ChangeOp classes need to learn to optionally set a $moduleName on the summary, and EditEntity needs to pass in an edit summary where the module name isn’t set yet.
  • ChangeOpStatement is even worse, because it doesn’t even set all the autocomment args (number of changes) and autosummary args (property ID of statement changed) – that actually happens in SetClaim::getSummary() and ClaimSummaryBuilder, using a ClaimDiffer that diffs the existing and new statement. Without this, all that ChangeOpStatement does for us regarding the summary is distinguish between “create” (new statement) and “update” (existing statement).
  • Some of the other ChangeOp classes are more useful for summaries, such as ChangeOpQualifier. However, it turns out that this is not useful to us in wbeditentity, because ClaimsChangeOpDeserializer (which creates the change ops to apply) will only ever create change ops to set (ChangeOpStatement) and remove (ChangeOpRemoveStatement) statements. Editing a statement’s qualifiers via wbeditentity doesn’t involve ChangeOpQualitier at all.
  • Also, ClaimsChangeOpDeserializer creates a ChangeOpStatement for every statement included in the wbeditentity payload, even statements that weren’t changed. This ties back to the first problem, because ChangeOps::apply() will now see several ChangeOp classes, with no way to determine which ones of them actually change the data and should be included in the edit summary.

I’ll have to think about the ramifications of this a bit more. I think all of these issues are in principle solvable, but it certainly won’t be as easy as I hoped earlier. I’ll take a closer look at this ClaimSummaryBuilder / ClaimDiffer – maybe we’ll end up using that directly somewhere else…

Change #1204919 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] WIP DNM: Some code changes for better wbeditentity summaries

https://gerrit.wikimedia.org/r/1204919

Change #1204919 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/extensions/Wikibase@master] WIP DNM: Some code changes for better wbeditentity summaries

Reason:

investigation complete, we’re not pursuing this option

https://gerrit.wikimedia.org/r/1204919