Page MenuHomePhabricator

Not getting an edit conflict on two conflicting structured data edits on Commons (since MediaWiki 1.36/wmf.9?)
Open, Needs TriagePublic

Description

Today someone pointed out someone pointed out that my bots did duplicate edits on Commons. I often have multiple bots running that might collide a bit and this would just give some edit conflicts and one of the bots would wait a bit. I tested this multiple times in the past and never had duplicate edits.

To test this now I fired up two bots up on the same category. https://commons.wikimedia.org/w/index.php?title=File%3A%22A_Muslim_Pilgrim_Learns_a_Lesson_in_Piety_from_a_Brahman%22%2C_Folio_from_a_Khamsa_%28Quintet%29_of_Amir_Khusrau_Dihlavi_MET_CAT_12r1_89A.jpg&type=revision&diff=464154718&oldid=428228218 is what happened and no edit conflicts. Checking my recent edits I see a whole bunch of duplicates. I have only done minor updates on the client code and haven't updates Pywikibot recently, the only thing that has changed is that MediaWiki 1.36/wmf.9 seems to have been deployed yesterday on Commons

Event Timeline

Restricted Application added a project: Platform Engineering. · View Herald TranscriptSep 18 2020, 9:32 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Ramsey-WMF added subscribers: Addshore, Ramsey-WMF.

@Addshore any thoughts on this? Something change in Wikibase recently?

holger.knust added a subscriber: holger.knust.

Untagging Platform team for now unless there is something for us to work on.

CBogen added a subscriber: CBogen.Sep 23 2020, 7:36 PM

@Multichill are you only seeing this issue on Commons, or is it on Wikidata as well? Also, if you could point us to your bot code and the exact part where this conflict happens that will help us and WMDE diagnose the problem.

So if I make subsequent requests to add the same statement It only makes and edit the first time. (using some api parameters that I whipped up which may not match what was being done in this ticket.
See in ApiSandbox
Subsequent responses end up looking like this

{
    "entity": {
        "labels": {},
        "descriptions": {},
        "claims": {
            "P95560": [
                {
                    "mainsnak": {
                        "snaktype": "value",
                        "property": "P95560",
                        "hash": "a622e895bd403055234a24348a77a8125f7b3b0c",
                        "datavalue": {
                            "value": {
                                "entity-type": "item",
                                "numeric-id": 16597,
                                "id": "Q16597"
                            },
                            "type": "wikibase-entityid"
                        },
                        "datatype": "wikibase-item"
                    },
                    "type": "statement",
                    "id": "M388$7B743B1C-6B91-40FB-AB79-6469D76A4BCE",
                    "rank": "normal"
                }
            ]
        },
        "id": "M388",
        "type": "mediainfo",
        "lastrevid": 3220,
        "nochange": ""
    },
    "success": 1
}

This is also the behaviour that I observe when creating 2 statements in a single request.

When passing a baserevid (used to detect and resolve edit conflicts if possible) I correctly see an edit conflict warning

{
    "error": {
        "code": "editconflict",
        "info": "Edit conflict. Could not patch the current revision.",
        "messages": [
            {
                "name": "wikibase-api-editconflict",
                "parameters": [],
                "html": {
                    "*": "Edit conflict. Could not patch the current revision."
                }
            },
            {
                "name": "edit-conflict",
                "parameters": [],
                "html": {
                    "*": "Edit conflict."
                }
            }
        ],
        "*": "See https://test-commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."
    },
    "servedby": "mw2297"
}

It's only on Commons and to reproduce you need a file with no structured data yet. First try doing an edit with "baserevid", you'll get a nasty API error (that's why I'm unable to use that).
To reproduce just do two edits close to each other from different jobs.

So as @Multichill points out if no structured data exists and you try to make an edit while providing a baserevid you get:

{
    "error": {
        "code": "nosuchrevid",
        "info": "Revision with ID not found.",
        "messages": [
            {
                "name": "wikibase-api-nosuchrevid",
                "parameters": [],
                "html": {
                    "*": "Revision with ID not found."
                }
            }
        ],
        "*": "See https://test-commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."
    },
    "servedby": "mw2368"
}

This sounds like something that could be fixed (probably related to MCR etc)

It looks like doing multiple queries like this ApiSandbox results in multiple created statements.

This was missed during my tests in T263298#6491777 as I was actually providing an id in the statement serialization.

Specifying a baserevid ends up resulting in a patch that still adds the new statement, and as far as I can tell this is consistent behaviour with wikdata.org and has been the state for a while.

Not sure who wants to change what here, but I guess this needs to be discussed between @Ramsey-WMF and @Lydia_Pintscher

T234457: [M] Don't create a duplicate depicts statement upon label approval is also related I figure. wbsetclaim could be used instead of wbeditentity to make use of this functionality

Based on discussion today, @Cparle will reach out to @Addshore to discuss technical implications of changing things here.

I just noticed this mid air collision: https://commons.wikimedia.org/w/index.php?title=File%3ANSG_Salmorth_PM19-09.jpg&type=revision&diff=485124008&oldid=485009736 .
If I understand Adman correctly, adding the option to use baserevid won't solve this. Using wbsetclaim isn't really an option because edits like this would take 8 edits instead of one. Maybe introduce baserevid and some kind of option for the bot to indicate that it wants strict checking instead of a warning?

Looking at the edit conflict options for action=edit:
baserevid

ID of the base revision, used to detect edit conflicts. May be obtained through action=query&prop=revisions. Self-conflicts cause the edit to fail unless basetimestamp is set. 
Type: integer

basetimestamp

Timestamp of the base revision, used to detect edit conflicts. May be obtained through action=query&prop=revisions&rvprop=timestamp. Self-conflicts are ignored. 
Type: timestamp (allowed formats)

starttimestamp

Timestamp when the editing process began, used to detect edit conflicts. An appropriate value may be obtained using curtimestamp when beginning the edit process (e.g. when loading the page content to edit).

Maybe we can bring things closer together here?

Ok @Multichill I've talked with @Addshore and we think possibly the simplest thing is to introduce a param that you can pass with the api call to ask it to throw an error if there's a duplicate. Does that work for you?

Ok @Multichill I've talked with @Addshore and we think possibly the simplest thing is to introduce a param that you can pass with the api call to ask it to throw an error if there's a duplicate. Does that work for you?

So, just to clarify, such an API param already exists on wbsetclaim and it was introduced in T234457: [M] Don't create a duplicate depicts statement upon label approval.
I'm not sure if it makes sense to introduce this in in wbeditentity.

Looking forward to a REST Api as well, this would make sense as an option on a statement API endpoint, but perhaps not at an entity level.

I still think we should hear from @Ramsey-WMF, @Lydia_Pintscher and @Samantha_Alipio_WMDE here from the product side.
It's easy to make changes, but the real question is what should this part of the product look like and what behaviour do we want to by default and as options present to the users.

Stepping back a bit. Wbeditentity should work more like the normal edit (action=edit) with things like how to handle edit conflicts and also minor edits.

Ramsey-WMF added a comment.EditedOct 20 2020, 4:10 PM

@Lydia_Pintscher and @Samantha_Alipio_WMDE - given that WMDE is drafting new stuff for the REST API, I'd defer to you on ideas for how this kind of change should apply in more general use cases. On our side, we're fine with a param on wbeditentity

@Lydia_Pintscher and @Samantha_Alipio_WMDE - given that WMDE is drafting new stuff for the REST API, I'd defer to you on ideas for how this kind of change should apply in more general use cases. On our side, we're fine with a param on wbeditentity

Taking some snippets of thought from an email I sent recently on this matter (and recording them here for all to see and so they are not lost)


So this also somewhat ties in with the use of, or potential use of baserevid and how conflicts are resolved or merged.

If you simply make 2 edits adding the same statement one after another with no baserevid then the server should probably not complain at all. (no edit conflict)
If you make 2 edits adding the exact same statement one after another with the same baserevid, then perhaps the server should complain? (right now it doesn't and successfully merges the changes).
This edit conflict possibility would be a product decision for Wikibase / Wikidata

The expectation then would be that client correctly set the baserevid for their edits and for the situation a conflict would then be triggered.
The "situation" in one sentence would be, adding a statement that is exactly the same as another statement that was not in the baserevid, but is in the newer version.
Fixing this will likely involve some code changes in areas we havn't touched since 2013/14, and I have no estimate as to how much work this would be right now.

This original usecase (covered by ignoreduplicatemainsnak) is slightly different to this and probably best handed to the client and mediainfo extension and they could create a specialized REST endpoint for it
This endpoint would check if the tagging thing wants to make the edit or not, and then set the claim, rather than using wbsetclaim directly.
If we wanted to include it in wikibase as an edit conflict rule we would end up with something like:
Adding a statement with a mainsnak the same as another statement mainsnak that was not in the baserevid, but is in the newer version.
This edit conflict possibility would be a product decision for Wikibase / Wikidata

A final point that is then also relevant here:
When a file on commons does not get have a media info entity attached to it, you can not use a baserevid in Wikibase API calls
This is a small issue with the MCR integration that someone will need to tackle for the edit conflict rules to be able to work in all cases.

Addshore added a comment.EditedNov 5 2020, 6:50 PM

Quickly discussed with Lydia and Sam after our sync call today:

1# We will investigate the "final point" being the MCR issue with baserevid and see if we can get closer to what needs to happen there. (This could then land on one of us, or on the core team)

T267363: Investigate the use of baserevid with MCR and creating a new MediaInfo entity on an existing page

2# We discussed the possibility of adding more edit conflict situations in Wikibase but didn't come to a final conclusion.
The main points here are:

  • These issues have not surfaced since the edit conflict rules etc were setup in 2013, thus they obviously do not cause that much of an issue (nothing has changed here), so if this is a one off pain, it might not be worth touching anything
  • @Lydia_Pintscher will come back to this "is it worth it" point at some point.
  • Even if we agree that it may not be worth it to be prioritized ahead of other things right now, it might still be nice to have at some point in the future, as the edit conflict condition does make some sense.
  • The second bold issue discussed in T263298#6586005 which we currently created ignoreduplicatemainsnak for will likely never become an edit conflict issue.
    • We also don't think it makes sense to include this extra functionality in our baseline future REST API.
    • This will mean that T234457 will need to be solved in a different way in the future. (as part of the CAT tool, or as a MediaInfo / CAT tool specific API) as mentioned in the comment above.

I hope my inline bold is not too annoying here.

So T267363 is set to be looked at in our prioritization session soon and then by some folks.
We decided that case 1 should probably be an edit conflict, but as the issue doesn't surface very often we likely won't invest dev resources to turn it into an edit conflict at this stage.
Case 2 probably should not be and edit conflict, we won't be doing anything there.
Not sure what to do with this ticket given those decisions?