Page MenuHomePhabricator

[MEX] M3.1.2 - Investigate options for improving information saved while publishing
Closed, ResolvedPublic

Description

Context

The MEX design for editing item statements would like the user to be able to edit multiple statements in a statement group at once, and all the changes then saved in a single edit when the user clicks “publish”.

However, the API currently doesn’t support this very well – the only API module we could use would be wbeditentity.

Main Objective

What options do we have to update the API module wbeditentity so that its creates a short summary when specifically adding multiple statements with a single property, and what is the level of effort for these options?

Related Tasks
T67846: wbeditentity: try to use appropriate autocomment instead of the generic one
T191885: Reflect atomic changes in wbeditentity summary

Compare SpecialSetLabelDescriptionAliases::applyChangeOpList(), which tries to apply a specific summary / comment if possible (see also T67846#1997210).

Problem statement: When a user edits multiple statements with one action, the watchlist summary needs a concise overview of what was changed.

Related Objects

StatusSubtypeAssignedTask
OpenNone
OpenNone
OpenNone
ResolvedkarapayneWMDE
ResolvedkarapayneWMDE
ResolvedkarapayneWMDE
ResolvedkarapayneWMDE
ResolvedkarapayneWMDE
ResolvedkarapayneWMDE
ResolvedkarapayneWMDE
ResolvedkarapayneWMDE
ResolvedkarapayneWMDE
ResolvedkarapayneWMDE
ResolvedkarapayneWMDE
ResolvedkarapayneWMDE
ResolvedArian_Bozorg
ResolvedkarapayneWMDE

Event Timeline

Investigation Report: Improving wbeditentity Summary Generation for Multiple Statements

Problem Statement

When a user edits multiple statements with one action, the watchlist summary needs a concise overview of what was changed. Currently, the wbeditentity API generates generic summaries that don't provide specific information about the changes made.

Current State

  • API Module: wbeditentity (located in extensions/Wikibase/repo/includes/Api/EditEntity.php)
  • Summary Generation: Handled by EditSummaryHelper class
  • Current Behavior: Generates generic summaries like:
    • /* wbeditentity-update: */ - for any entity update
    • /* wbeditentity-update-languages-short:0||de, en */
    • /* wbeditentity-create-item:0| */

Investigation Options

Option 1: Statement-Specific Summary Detection (Recommended)

Level of Effort: Medium (2-3 weeks)

Create a new StatementChangesAnalyzer class (or whatever the the name you see fitting the most) to detect when multiple statements with the same property are being added and generate specific summaries.

Technical Implementation:

// New class: StatementChangesAnalyzer.php
class StatementChangesAnalyzer {
    public function analyzeStatementChanges(ChangeOpResult $changeOpResult): array {
        // Analyze ChangeOpResult to detect:
        // - Number of statements added per property
        // - Whether all changes are to the same property
        // - Types of changes (add/remove/modify)
    }
    
    public function isSinglePropertyMultipleStatements(array $changes): ?array {
        // Check if multiple statements are being added to a single property
    }
}

Code Changes Required:

  1. New Class: StatementChangesAnalyzer.php
  2. Modified Class: EditSummaryHelper.php - Add statement analysis logic
  3. New i18n Messages: Add messages for new summary actions
  4. Tests: Comprehensive test coverage for new functionality

Expected Result:
This will create a new comment value in the comment table in the database.
The used key wbeditentity-batch-single-property can be changed as well, this is just an exanple.

comment_text: /* wbeditentity-batch-single-property:4|P31|3|0|0 */ Added 3 statements for P31

Benefits:

  • ✅ Provides specific information about property changes.
  • ✅ Maintains consistency with existing Wikibase patterns.
  • ✅ Extensible for future enhancements.
  • ✅ Doesn't break existing functionality.

Disadvantage:

  • ❌ Requires new analysis logic.
  • ❌ Medium complexity implementation.

Option 2: Enhanced ChangeOpResult Analysis

Level of Effort: Medium-High (3-4 weeks)

Extend the existing ChangeOpResult system to provide more detailed information about statement changes.

Technical Implementation:

// New class: StatementChangeOpResult.php
class StatementChangeOpResult extends ChangeOpResult {
    private $propertyId;
    private $changeType;
    private $statementCount;
    
    public function getPropertyId(): string {
        return $this->propertyId;
    }
    
    public function getChangeType(): string {
        return $this->changeType;
    }
    
    public function getStatementCount(): int {
        return $this->statementCount;
    }
}

Code Changes Required:

  1. New Class: StatementChangeOpResult.php
  2. Modified Classes: All existing ChangeOp classes to return detailed results
  3. Updated Logic: EditSummaryHelper to use new result types
  4. Comprehensive Tests: For all modified ChangeOp classes

Benefits:

  • ✅ Most comprehensive solution.
  • ✅ Provides detailed change information.
  • ✅ Extensible for future neeeds.

Disadvantage:

  • ❌ Requires changes to core ChangeOp system.
  • ❌ High risk of breaking existing functionality.
  • ❌ Complex implementation.

Option 3: Client-Side Summary Enhancement

Level of Effort: Low (1 week)

Allow clients to provide detailed summaries when making batch edits (kinda out-of-scope but still a valid option to consider, but not recomended).

Technical Implementation:

// In EditEntity.php (extensions/Wikibase/repo/includes/Api/EditEntity.php)
protected function getAllowedParams(): array {
    return array_merge(
        parent::getAllowedParams(),
        [
            'batch_summary' => [
                ParamValidator::PARAM_TYPE => 'text',
                ParamValidator::PARAM_DEFAULT => null,
            ],
        ]
    );
}

private function getSummary(array $preparedParameters, EntityDocument $entity, ChangeOpResult $changeOpResult): Summary {
    if (!empty($preparedParameters['batch_summary'])) {
        $summary = new Summary('wbeditentity');
        $summary->setUserSummary($preparedParameters['batch_summary']);
        return $summary;
    }
    
    // Existing logic
}

Usage Example:

const response = await fetch('/w/api.php', {
    method: 'POST',
    body: new URLSearchParams({
        action: 'wbeditentity',
        id: 'Q123',
        data: JSON.stringify({
            claims: [/* multiple statements for P31 */]
        }),
        batch_summary: 'Added multiple instance of statements',
        token: token
    })
});

Benefits:

  • ✅ Quick to implement
  • ✅ No risk to existing functionality
  • ✅ Flexible for client needs

Disadvantages:

  • ❌ Relies on client cooperation
  • ❌ Inconsistent user experience
  • ❌ Doesn't solve the core problem

Comment/Revision Table Impact Analysis

Which option will create multiple comments in the comments/revision tables?

  • Option 1: ✅ Single comment - Creates one consolidated comment per API call, regardless of how many statements are added.
  • Option 2: ✅ Single comment - Creates one consolidated comment per API call, regardless of how many statements are added.
  • Option 3: ✅ Single comment - Creates one comment per API call, using the client-provided summary.

Current Problem: The existing system already creates only one comment per wbeditentity API call, but the comment is generic and doesn't provide useful information about multiple statement changes.

All options maintain the current behavior of creating a single comment per API call, which is the desired outcome for the MEX design requirement.


Comparison Matrix

OptionEffortRiskMaintainabilityUser ValueTechnical Debt
1. Statement-Specific DetectionMediumLowHighHighLow
2. Enhanced ChangeOpResultMedium-HighHighHighHighLow
3. Client-Side EnhancementLowVery LowLowLowHigh

Recommendation

Recommended Approach: Option 1 (Statement-Specific Summary Detection)

Justification

  1. Optimal Risk/Reward Ratio: Provides significant value with manageable risk.
  2. Consistency: Follows existing Wikibase patterns (similar to SpecialSetLabelDescriptionAliases).
  3. Extensibility: Can be enhanced in the future without breaking changes.
  4. Maintainability: Clean, focused implementation that's easy to understand and maintain.

Implementation Timeline

  • Week 1: Create StatementChangesAnalyzer class and unit tests.
  • Week 2: Modify EditSummaryHelper to integrate statement analysis.
  • Week 3: Add i18n messages, integration tests, and documentation.
  • Week 3: Final testing and code review.

Success Criteria

  • Multiple statements added to the same property generate a single, descriptive summary
  • Existing functionality remains unchanged
  • New summaries are properly localized
  • Performance impact is minimal
  • Covered with proper tests.

Conclusion

Option 1 provides the best balance of functionality improvement, development effort, and risk management. It directly addresses the problem statement while maintaining system stability and following the established patterns of Wikidata/Wikibase.

I would go with a combination of option 1 and 2. The new logic should be in EditSummaryHelper, but I think it should be based on newly added information in the ChangeOpResult system, just like the current implementation there is based on LanguageBoundChangeOpResult. This seems more in accordance with ADR 4 to me. I think the risk of breakage should be manageable – IIRC this code is relatively well covered by existing tests – and to me it seems less complex to add this functionality to the existing change ops than to effectively reimplement the change op analysis in EditSummaryHelper directly.

I’m not sure I understand option 3 correctly. Some version of this already exists with the summary parameter, but that edit summary is not translatable. Also, if the intention is that the batch_summary should replace the server-generated edit summary, then this would mean that we’re letting API users lie about the contents of edits, which I don’t think we should allow. And also, any client-side implementation only benefits wbui2025, whereas the server-side options 1 and 2 benefit any API user.

Yup - I agree with @Lucas_Werkmeister_WMDE . Extend ChangeOpResult and change EditSummaryHelper. I don't have much hands-on experience with either, but from my limited understanding it seems like that would align most with our existing architecture and provide the functionality we need.

So from my perspective, Option 1 is the better choice. Since I’m not that experienced, going by what I've read here, the simplicity and maintainability of Option 1 seems more straightforward since Option 2 will risk messing up existing functionality and overall complexity.

Sounds like we're all in agreement already that Option 3 is not worth considering.

For Option 1 vs 2.... I also don't have much familiarity with this code and would defer to @Lucas_Werkmeister_WMDE.

I would love to understand option 2 better. At the moment it confuses me a bit. For example, how does a single ChangeOp have a "count" of statements? What is a "type"? Does this refer to "add/remove/modify"? If so, isn't this redundant to ChangeOpRemoveStatement and such?

In other words: As far as I understand option 2 it really is the same as option 1, just so that the result of the analysis can be stored directly in the ChangeOp hierarchy. It seems like this does indeed have benefits. However, the analysis code itself would be the same and mostly live in the same place as in option 1, wouldn't it?

Based on my understanding of the code base analysis:
Option 2 is essentially Option 1 but with the analysis results stored directly in the ChangeOpResult hierarchy instead of being computed separately.
The "count" and "type" you're asking about would be:
Count: Number of statements affected (e.g., 3 statements added to P31) .. the goal is to make the multiple statements change with a single property.
Type: "add/remove/modify" - I'm not sure if we already have the same functionality in ChangeOpRemoveStatement, based on my understanding I assumed it has one functionality of removing only.
The key difference is where the analysis lives:
Option 1: Analysis code in StatementChangesAnalyzer + EditSummaryHelper
Option 2: Analysis code in ChangeOp class + EditSummaryHelper (but same logic, extending the current functionality of ChangeOpResult)

For example, how does a single ChangeOp have a "count" of statements?

By being a ChangeOps instance with a list of one or more other ChangeOp instances (each of which can be ChangeOpStatement, ChangeOpRemoveStatement, etc.). The EditSummaryHelper-related code has a helper for iterating through these change op collections (ChangeOpResultTraversal::makeRecursiveTraversable()).

  • About the "type" of an edit: I just wanted to point out that the information is already in the system. remove actions are usually separate ChangeOps already. The difference between "add" and "modify" is less important, but can also be found. ChangeOpStatement for example distinguishes between create and update. ChangeOpQualifier distinguishes between add and update.
  • If it turns out to be helpful to add some kind of getChangeType to either ChangeOp or ChangeOpResult interfaces, why not?
  • However, note that only a few ChangeOps can make use of the proposed methods. You will probably always need them to be able to return some generic value or null.
  • Grouping similar ChangeOps together based on certain aspects they share (e.g. multiple ChangeOps that change multiple values on the same item and same property) should easily be possible with the existing system. There is already a ChangeOps class you can use either as it is or use as a template for such classes.
  • I wonder if there even is a need to store the result of the analysis? Isn't the goal to not change the number of summary lines that get generated? Isn't the summary line generated by the ChangeOp hierarchy the only thing that needs to change?

I probably miss something about the proposal, but I'm not entirely sure how a ChangeOp can "analyze itself"? The current system relies on the hierarchy of ChangeOpDeserializers to "deconstruct" a generic EditEntity request into smaller parts. For example, even if someone uses the generic EditEntity API to hand in 2 MB of JSON with unknown changes, we are currently comparing and diffing this against the previous JSON and can understand that – for example – only a single label changed. The diff makes it down the hierarchy to ItemChangeOpDeserializers to FingerprintChangeOpDeserializer to LabelsChangeOpDeserializer which generates a single ChangeOpLabel. If there is no other ChangeOp with a conflicting interest to write something into the summary, we can use that meaningful summary line. Otherwise we need a more generic one.

Sticking to the example, at the moment there is only a single ChangeOpLabel that internally distinguishes between remove, set, and add. However, the current system could as well be rewritten to have separate ChangeOpAddLabel, ChangeOpRemoveLabel, and ChangeOpModifyLabel. There is no strong reason for one or the other. In many cases there is already a separate ChangeOpRemove… just because the code felt cleaner that way.

My proposal (option 4?) would be to look into ClaimsChangeOpDeserializer and try to teach it a few new tricks in addition to what it already does.

  • For example, it could understand situations when more than one statement is removed. This currently results in a generic ChangeOps instance that contains two ChangeOpRemoveStatement but is unable to generate a meaningful summary line for that. Instead, introduce a new ChangeOpRemoveMultipleStatements or rework the existing ChangeOpRemoveStatement to support more than one statement.
  • Similarly, introduce a new ChangeOpMultipleStatements that is more or less identical to the existing ChangeOps (it just contains a list of more than one ChangeOpStatement instances) but is able to generate a much more meaningful summary line, knowing that it can only contain ChangeOpStatement instances.
  • And so on for all the new situations you want to support.
  • Personally, I would not try to add more functionality to the existing ChangeOps class but leave it untouched for all the (unavoidable) edge-cases that aren't worth being covered with new summary lines.

Isn't the summary line generated by the ChangeOp hierarchy the only thing that needs to change?

As far as I can tell, the summary generated by the ChangeOp hierarchy has never been used by wbeditentity. EditEntity::modifyEntity() calls $this->applyChangeOp() without a Summary argument, and later creates its own summary in getSummary() (using the EditSummaryHelper); the edit summary has been created separately since 2013 (though applyChangeOp() was only added in 2014, EditSummaryHelper in 2019, and of course the code moved around between methods and stuff).

I kind of like the idea of tweaking the ChangeOp hierarchy until the edit summary created by it is sufficient for all cases – ideally we could even get rid of EditSummaryHelper – but that would require larger changes to the existing code.

This ticket is a duplicate of T67846: wbeditentity: try to use appropriate autocomment instead of the generic one then, I'm afraid.

Option 4:

  1. Teach EditEntity to respect the summary returned by ModifyEntity::applyChangeOp. At least in the most basic situations when an EditEntity API call boils down to a single ChangeOp with a single summary line. This is currently not easy to detect, unfortunately, but not hard to fix. Add a new method ChangeOp::isAtomic or ChangeOp::isSingular or similar and make non-atomic instances like the ChangeOps class return false (only when it really contains multiple), while all basic ChangeOp return true. Callers can use this information to decide which summary they should use.
  2. Implement and wire new ChangeOp instances for all the real-world situations you want to cover. These new classes would be rather short: All they do is to generate a nice summary specific to the use-case they are designed for. The actual changes are forwarded to any combination of existing ChangeOp classes (but their summaries are ignored). isAtomic would be true for these new ChangeOp instances because they are able to describe the change in a single, meaningful summary line.
  3. A good starting point is ChangeOpFingerprint. That would be a great place to detect and generate new summaries like "a label and a description in the same language changed". However, the class currently doesn't even try to generate a meaningful summary. Why is that?

To me this doesn't seem to be a larger change, unless I'm missing something.

As far as I understand the other proposals so far both option 1 and 2 have notable problems:

  • Both suggest to analyze ChangeOpResult instances, not ChangeOps. But that's to late, isn't it? ChangeOpResult doesn't contain the information that would be needed to understand if a change can be described with a more narrow summary line. For example: What if a label and a description was changed the same time? If that happens often enough it could get a separate summary line. How would ChangeOpResult be able to understand that?
  • Both proposals would generally re-implement something that already largely exists in the ChangeOp hierarchy. Why not just use the existing logic instead?

I disagree, though it could be a subtask of that one. But this task is specifically about the edit summaries we need for MEX, i.e. statement edits within the same statement group (property ID).

  • Both suggest to analyze ChangeOpResult instances, not ChangeOps. But that's to late, isn't it? ChangeOpResult doesn't contain the information that would be needed to understand if a change can be described with a more narrow summary line.

We can change the code to add the information, just like ChangeOpLabel et al. already add their information via custom GenericChangeOpResult subclasses.

For example: What if a label and a description was changed the same time? If that happens often enough it could get a separate summary line. How would ChangeOpResult be able to understand that?

That would be a ChangeOpsResult that contains one ChangeOpLabelResult and one ChangeOpDescriptionResult. (Possibly nested inside any number of intermediate ChangeOpsResults. I’m inclined to say that both ChangeOps and ChangeOpsResult should be changed to “flatten themselves” at some point, so that they never contain intermediate instances of themselves. This would also make the if ( count( $this->changeOps ) === 1 ) condition inside ChangeOps::apply() more meainingful, and it would make ChangeOpResultTraversal unnecessary.)

FWIW, I agree that Thiemo’s option 4 is a viable option to implement this task (and better than option 3), though I’m not yet convinced it’s better than option 2.

I might be underestimating the amount of effort required, but I’m starting to think we could make a better decision if we first create prototype implementations of options 1, 2 and 4. (Options 1 and 2 should probably not be done in parallel, because I expect some code could just be copy+pasted between them without having to write it twice. Option 4 on the other hand seems fully parallelizable to me.)

Do ChangeOpsResult instances pull the summary from their corresponding ChangeOp? Or do they generate their own summary? The later would imply quite some code duplication, wouldn't it? That's my main concern here. I don't think it's a good idea to implement the same idea in two places – and the two potentially even generating different summaries for the same input.

I'm also a bit confused how a backend problem with the editentity API can be locally solved in a mobile UI? Both option 1 and 2 are not talking about a UI but suggest to add additional backend infrastructure that would affect all users of the API. That's why I believe this is essentially a duplicate of T67846.

Or in other words: When you partially solve T67846 only for the one use case you care about this ticket here becomes obsolete.

  1. Add a single new "ChangeOpStatementGroup" that generates a nice summary for an edit that affects multiple statements in the same group.
  2. Wire that into the hierarchy of ChangeOpDeserializer, specifically into ClaimsChangeOpDeserializer. All it needs to do is to iterate the list of incoming statements and check if they are all about the same property id.
  3. Teach EditEntity to respect and use this summary. See the suggested "atomic" flag above.

Do ChangeOpsResult instances pull the summary from their corresponding ChangeOp? Or do they generate their own summary? The later would imply quite some code duplication, wouldn't it? That's my main concern here. I don't think it's a good idea to implement the same idea in two places – and the two potentially even generating different summaries for the same input.

ChangeOpsResult instances don’t generate any summaries. EditSummaryHelper implements some edit summaries based on their data, but without (AFAICT) any code duplication at the moment. We basically have two kinds of edit summaries: the fine-grained ones generated by the ChangeOps themselves, which are used in all the other API modules, and the coarser ones generated by EditSummaryHelper, which are exclusively used by wbeditentity and just try to improve on the default of “Changed an Item”.

I'm also a bit confused how a backend problem with the editentity API can be locally solved in a mobile UI? Both option 1 and 2 are not talking about a UI but suggest to add additional backend infrastructure that would affect all users of the API.

Correct – just like the improved edit summaries for the termbox (T220696 and subtasks) also happen to affect e.g. Namescript (example). (This is also true for option 4, by the way.)

That's why I believe this is essentially a duplicate of T67846.

Or in other words: When you partially solve T67846 only for the one use case you care about this ticket here becomes obsolete.

I don’t understand this. When we partially solve T67846 only for the one use case we care about, then T67846 isn’t fully resolved yet and should stay open. But the part we care about, which is this task (or a successor non-investigation task), will be done, so we’ll be able to close it. Which is why it makes sense as a separate task.

We are aligned here. 👍️ The confusion is only about how the tickets should depend on each other, which is your team's decision to make. 😇️

Following the discussion, I have created a new parent task for this topic: T407877. Additionally, there are now 3 spike tasks to create prototypes for the viable options: T407874, T407875, T407876. As the A/C of this investigation task were met, I will mark this closed and the follow up work will be managed in these new tickets

Just to put my cards on the table before I launch into the investigations over at T407877 (which the team has apparently decided I should be doing, all three of them :D): by now I’ve come around to option 4. When we started discussing this task, I was mainly thinking about the new edit summary we needed, for something you can’t do in the desktop UI: edit multiple statements for the same property at once. But of course we also want all the existing edit summaries for the edits you can do in the desktop UI: if you edit just one statement (which should be the common case!), we want specific edit summaries like “added statement”, “edited statement value”, “removed statement”, “added qualifier”, “edited reference value”, and so on and so forth. And I feel like option 4 is best suited to do all of this without, as the desktop UI does, making the frontend issue a bunch of different API calls depending on the diff.

Let’s see if that’s still my opinion after I go through the investigations :)