Step 6: Consolidate overwrite data cycle
Closed, Resolved · Public

Authored By: Yug, Oct 8 2025, 2:48 PM

Description

Current behavior

Re-recording: a speaker may sometimes want to re-record some items. In that case we observed:

  • the new audio file is uploaded over the existing one (overwrite), as expected
  • the new data is added to Commons in addition to the existing data, creating duplicates, which we don't want

Screenshot from 2025-10-07 01-10-56.png (101×1 px, 81 KB)

Screenshot from 2025-10-05 19-18-24.png (847×1 px, 106 KB)

Expected behavior

  • the new audio file is uploaded over the existing one (overwrite), as it is today
  • the old data is removed from Commons and the new data is added, so there is no duplication

Files

The file upload cycle involves the following:

  1. src/views/ReviewStep.vue
  2. API call
  3. upload_batches/views.py: imports serializers.py
  4. upload_batches/serializers.py: imports and calls add_structured_data_to_recording() from upload2commons.py; errors are logged here.
  5. upload_batches/helpers/upload2commons.py: code of add_structured_data_to_recording()

Inspiration to review (not pure Python, but Claude Sonnet could translate it into vanilla Python):

SDC examples

Note: the content may have changed by the time you review this task.

  • Inline has SDC claims P275 and P6216 (incomplete)
  • travailler has SDC claims P6216, P275, P31, P407, P9533, P3575, P2047, P1163, P4092, P585, P10893, P10894
    • to delete, then recreate: P3575 (data size), P2047 (duration), P4092 (checksum), P585 (point in time)
  • scarabée has no SDC claims.
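
As an illustration of the "delete, then recreate" list above, here is a minimal Python sketch that collects the statement IDs of those four claims from a MediaInfo entity shaped like the responses in the Sample section. The names `VOLATILE_PROPERTIES` and `volatile_claim_ids` are hypothetical and are not taken from upload2commons.py.

```python
# Properties the task lists as "to delete then recreate" after a re-recording.
VOLATILE_PROPERTIES = ("P3575", "P2047", "P4092", "P585")


def volatile_claim_ids(entity, properties=VOLATILE_PROPERTIES):
    """Return the statement IDs of the volatile claims found on a MediaInfo entity."""
    # A file without SDC may have no "statements" key or an empty container.
    statements = entity.get("statements") or {}
    claim_ids = []
    for prop in properties:
        for statement in statements.get(prop, []):
            claim_ids.append(statement["id"])
    return claim_ids


# Trimmed-down entity, shaped like the "travailler" response in the Sample section.
example_entity = {
    "id": "M175475927",
    "statements": {
        "P275": [{"id": "M175475927$228AB0E6-4709-454E-A874-059781AC4BFD"}],
        "P3575": [{"id": "M175475927$1D6D3D22-C9FC-43AF-B4C1-E5279B6F9C63"}],
        "P585": [{"id": "M175475927$DC76328F-44DA-4FF3-A921-417861B2B43B"}],
    },
}
print(volatile_claim_ids(example_entity))
```

Claims such as P275 (license) are left alone here, since they do not depend on the audio content.
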
Sample

Response for file with all SDC data

json
{
    "entities": {
        "M175475927": {
            "pageid": 175475927,
            "ns": 6,
            "title": "File:LL-Q150—Yug—travailler.wav",
            "lastrevid": 1089941946,
            "modified": "2025-09-23T12:55:35Z",
            "type": "mediainfo",
            "id": "M175475927",
            "labels": {},
            "descriptions": {},
            "statements": {
                "P6216": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P6216",
                            "hash": "f88a8b9472789ca66067ed50ce20dc2650af5744",
                            "datavalue": {
                                "value": {
                                    "entity-type": "item",
                                    "numeric-id": 88088423,
                                    "id": "Q88088423"
                                },
                                "type": "wikibase-entityid"
                            }
                        },
                        "type": "statement",
                        "id": "M175475927$BC76B2D0-22B0-4D89-AF8F-6165D2E8DBFB",
                        "rank": "normal"
                    }
                ],
                "P275": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P275",
                            "hash": "28a6efcc224656589c283263e9efa1effe56d682",
                            "datavalue": {
                                "value": {
                                    "entity-type": "item",
                                    "numeric-id": 6938433,
                                    "id": "Q6938433"
                                },
                                "type": "wikibase-entityid"
                            }
                        },
                        "type": "statement",
                        "id": "M175475927$228AB0E6-4709-454E-A874-059781AC4BFD",
                        "rank": "normal"
                    }
                ],
                "P31": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P31",
                            "hash": "ea121edceb9f865460bdf51a6b0ea5460c7f33cd",
                            "datavalue": {
                                "value": {
                                    "entity-type": "item",
                                    "numeric-id": 108167708,
                                    "id": "Q108167708"
                                },
                                "type": "wikibase-entityid"
                            }
                        },
                        "type": "statement",
                        "id": "M175475927$EEA65FF9-681A-473D-8309-C55941474C5D",
                        "rank": "normal"
                    }
                ],
                "P407": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P407",
                            "hash": "d197d0a5efa4b4c23a302a829dd3ef43684fe002",
                            "datavalue": {
                                "value": {
                                    "entity-type": "item",
                                    "numeric-id": 150,
                                    "id": "Q150"
                                },
                                "type": "wikibase-entityid"
                            }
                        },
                        "type": "statement",
                        "id": "M175475927$B3115A2F-0AB4-470F-8C02-027C06DB94A8",
                        "rank": "normal"
                    }
                ],
                "P9533": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P9533",
                            "hash": "c36e4ad8f5ef10268e309fa8a924c1719d54a196",
                            "datavalue": {
                                "value": {
                                    "text": "travailler",
                                    "language": "fr"
                                },
                                "type": "monolingualtext"
                            }
                        },
                        "type": "statement",
                        "id": "M175475927$75E8E6B3-E10D-45A9-B98D-9970218A5999",
                        "rank": "normal"
                    }
                ],
                "P3575": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P3575",
                            "hash": "963419415dab322d0bd3a7df1353be5b49d4ed02",
                            "datavalue": {
                                "value": {
                                    "amount": "+93740",
                                    "unit": "http://www.wikidata.org/entity/Q8799"
                                },
                                "type": "quantity"
                            }
                        },
                        "type": "statement",
                        "id": "M175475927$1D6D3D22-C9FC-43AF-B4C1-E5279B6F9C63",
                        "rank": "normal"
                    }
                ],
                "P2047": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P2047",
                            "hash": "c8d98b44b272f223ef59e8fdfff30c7f4ab7f159",
                            "datavalue": {
                                "value": {
                                    "amount": "+0.976",
                                    "unit": "http://www.wikidata.org/entity/Q11574"
                                },
                                "type": "quantity"
                            }
                        },
                        "type": "statement",
                        "id": "M175475927$2ABDD401-F0ED-4DEC-8EA6-5375D5FB8A2C",
                        "rank": "normal"
                    }
                ],
                "P1163": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P1163",
                            "hash": "dd8572d4245e5e6c2c5a2c4110de83fca2a87574",
                            "datavalue": {
                                "value": "audio/wav",
                                "type": "string"
                            }
                        },
                        "type": "statement",
                        "id": "M175475927$334067A4-5FE0-47BD-A3E6-149616BE9479",
                        "rank": "normal"
                    }
                ],
                "P4092": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P4092",
                            "hash": "30795b47373b5f9bf28fcfed6a171e45bee0c686",
                            "datavalue": {
                                "value": "d428ad816fe2a536994004da1b79916ff5cd86db9cf32f7dcbd53e2ae524cee8",
                                "type": "string"
                            }
                        },
                        "type": "statement",
                        "qualifiers": {
                            "P459": [
                                {
                                    "snaktype": "value",
                                    "property": "P459",
                                    "hash": "1260aceb09495672eacb8900e5c1624ecb3bf7fd",
                                    "datavalue": {
                                        "value": {
                                            "entity-type": "item",
                                            "numeric-id": 110651361,
                                            "id": "Q110651361"
                                        },
                                        "type": "wikibase-entityid"
                                    }
                                }
                            ]
                        },
                        "qualifiers-order": [
                            "P459"
                        ],
                        "id": "M175475927$98FE10CC-5205-4276-BEBA-D857BA838B28",
                        "rank": "normal"
                    }
                ],
                "P585": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P585",
                            "hash": "3feb680bc68c53c309ffb0e972c18c31d25c885a",
                            "datavalue": {
                                "value": {
                                    "time": "+2025-09-23T00:00:00Z",
                                    "timezone": 0,
                                    "before": 0,
                                    "after": 0,
                                    "precision": 11,
                                    "calendarmodel": "http://www.wikidata.org/entity/Q1985727"
                                },
                                "type": "time"
                            }
                        },
                        "type": "statement",
                        "id": "M175475927$DC76328F-44DA-4FF3-A921-417861B2B43B",
                        "rank": "normal"
                    }
                ],
                "P10893": [
                    {
                        "mainsnak": {
                            "snaktype": "somevalue",
                            "property": "P10893",
                            "hash": "ba348c7c20e84bd0d037f86c525a7a56ec7c92e0"
                        },
                        "type": "statement",
                        "qualifiers": {
                            "P4174": [
                                {
                                    "snaktype": "value",
                                    "property": "P4174",
                                    "hash": "835a86d2cdd0280833283fc7ebc4e97e6e02b529",
                                    "datavalue": {
                                        "value": "Yug",
                                        "type": "string"
                                    }
                                }
                            ]
                        },
                        "qualifiers-order": [
                            "P4174"
                        ],
                        "id": "M175475927$102717FB-BCB2-4554-B58C-DB6269642C14",
                        "rank": "normal"
                    }
                ],
                "P10894": [
                    {
                        "mainsnak": {
                            "snaktype": "somevalue",
                            "property": "P10894",
                            "hash": "c1e8360599474248ad0dc834f6ed9831d866cf41"
                        },
                        "type": "statement",
                        "qualifiers": {
                            "P2093": [
                                {
                                    "snaktype": "value",
                                    "property": "P2093",
                                    "hash": "5b30edf5fdaf8e824947c009f562ffba7cb47dee",
                                    "datavalue": {
                                        "value": "Yug",
                                        "type": "string"
                                    }
                                }
                            ]
                        },
                        "qualifiers-order": [
                            "P2093"
                        ],
                        "id": "M175475927$664043A0-F7B1-45F1-8851-C3A28B08A1BF",
                        "rank": "normal"
                    }
                ]
            }
        }
    },
    "success": 1
}

Response for file with partial SDC data

json
{
    "entities": {
        "M175851816": {
            "pageid": 175851816,
            "ns": 6,
            "title": "File:LL-Q150—Yug—Inline.wav",
            "lastrevid": 1093216135,
            "modified": "2025-09-30T13:25:06Z",
            "type": "mediainfo",
            "id": "M175851816",
            "labels": {},
            "descriptions": {},
            "statements": {
                "P275": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P275",
                            "hash": "28a6efcc224656589c283263e9efa1effe56d682",
                            "datavalue": {
                                "value": {
                                    "entity-type": "item",
                                    "numeric-id": 6938433,
                                    "id": "Q6938433"
                                },
                                "type": "wikibase-entityid"
                            }
                        },
                        "type": "statement",
                        "id": "M175851816$7DA492CE-F66A-44AE-80CF-A6F0AA6595F2",
                        "rank": "normal"
                    }
                ],
                "P6216": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P6216",
                            "hash": "f88a8b9472789ca66067ed50ce20dc2650af5744",
                            "datavalue": {
                                "value": {
                                    "entity-type": "item",
                                    "numeric-id": 88088423,
                                    "id": "Q88088423"
                                },
                                "type": "wikibase-entityid"
                            }
                        },
                        "type": "statement",
                        "id": "M175851816$4BA158C6-4E24-4DA6-8D30-A7FFA591C194",
                        "rank": "normal"
                    }
                ]
            }
        }
    },
    "success": 1
}

Response for non-existing files

json
{
    "entities": {
        "-1": {
            "site": "enwiki",
            "title": "File:Q150—Yug—travailler.wav",
            "missing": ""
        }
    },
    "success": 1
}
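
The three sample responses above can be told apart with a small helper. The following is a sketch only: it assumes the responses come from a wbgetentities-style call (the task does not name the call) and that "complete" means all twelve properties listed for travailler are present. `EXPECTED_PROPERTIES` and `sdc_state` are hypothetical names.

```python
# The twelve properties the task lists for the fully described file (travailler).
EXPECTED_PROPERTIES = frozenset({
    "P6216", "P275", "P31", "P407", "P9533", "P3575",
    "P2047", "P1163", "P4092", "P585", "P10893", "P10894",
})


def sdc_state(response):
    """Classify a response as 'missing', 'none', 'partial' or 'complete'."""
    # Each sample response holds a single entity keyed by its ID ("M..." or "-1").
    entity = next(iter(response["entities"].values()))
    if "missing" in entity:
        return "missing"
    # Treat an absent or empty "statements" container as "no SDC".
    statements = entity.get("statements") or {}
    present = set(statements)
    if not present:
        return "none"
    if EXPECTED_PROPERTIES <= present:
        return "complete"
    return "partial"


missing_response = {
    "entities": {"-1": {"site": "enwiki", "title": "File:Example.wav", "missing": ""}},
    "success": 1,
}
partial_response = {
    "entities": {"M175851816": {"statements": {"P275": [], "P6216": []}}},
    "success": 1,
}
print(sdc_state(missing_response), sdc_state(partial_response))
```
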

Note

Some of the code is in ReviewStep.vue and some in upload2commons.py. The consolidation should ideally put most of the behavior on one side.

Event Timeline

Yug updated the task description.
Yug renamed this task from Step 6: Consolidate upload cycle to Step 6: Consolidate upload data cycle. Oct 8 2025, 3:12 PM
Yug renamed this task from Step 6: Consolidate upload data cycle to Step 6: Consolidate overwrite data cycle.
Yug triaged this task as Medium priority.
Yug updated the task description.
Aditya changed the task status from Open to In Progress. Oct 8 2025, 5:12 PM
Aditya claimed this task.

To fix this, we need only one check:

If a recording already exists and the user uploads the same one again, we want to delete the existing SDC.
A new SDC will be created automatically according to the current flow.

Since the new recording may change some of the claims, we want to keep them consistent with the new recording.
For example: data size, duration, checksum, point in time.
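
One possible way to express the deletion step is the Wikibase `wbremoveclaims` action, which takes pipe-separated statement IDs in its `claim` parameter. The sketch below only builds the POST parameters. It is not necessarily what MR #86 does, and `build_remove_claims_params` is a hypothetical helper name.

```python
def build_remove_claims_params(claim_ids, csrf_token):
    """Build the POST parameters for a single wbremoveclaims request."""
    if not claim_ids:
        raise ValueError("no claims to remove")
    return {
        "action": "wbremoveclaims",
        # wbremoveclaims accepts several statement IDs separated by "|".
        "claim": "|".join(claim_ids),
        "token": csrf_token,
        "format": "json",
    }


params = build_remove_claims_params(
    [
        "M175475927$1D6D3D22-C9FC-43AF-B4C1-E5279B6F9C63",
        "M175475927$2ABDD401-F0ED-4DEC-8EA6-5375D5FB8A2C",
    ],
    csrf_token="PLACEHOLDER_TOKEN",  # placeholder, not a real token
)
print(params["claim"])
```
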

I have raised MR #86.

The expected behaviour after this change: if a user records the same word again, the audio is updated, the old SDC is removed, and the new SDC is added.

Yug closed this task as Resolved. Oct 9 2025, 11:28 AM (edited)

Previously, we needed to check the exact metadata state in order to limit the excessive edits that would otherwise be needed to delete and then recreate claims.
The new URL parameters introduced by Claude Sonnet are a game changer, since they clear and rewrite all claims in a single edit. The need to check whether metadata exists, and which, therefore nearly disappears.
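
The comment does not say which URL parameters are meant. One parameter that matches the description is the `clear` flag of the Wikibase `wbeditentity` action, which empties the entity before applying the submitted `data`. The sketch below assumes that reading and only builds the request parameters. `build_clear_and_rewrite_params` is a hypothetical name, not the function from MR #86 or MR #87.

```python
import json


def build_clear_and_rewrite_params(media_id, statements, csrf_token):
    """Build POST parameters that replace all claims of an entity in one edit."""
    return {
        "action": "wbeditentity",
        "id": media_id,
        # Assumption: "clear" is the parameter the comment refers to.
        "clear": "1",
        "data": json.dumps({"claims": statements}),
        "token": csrf_token,
        "format": "json",
    }


# One statement, shaped like the P1163 (MIME type) claim in the Sample section.
mime_claim = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P1163",
        "datavalue": {"value": "audio/wav", "type": "string"},
    },
    "type": "statement",
    "rank": "normal",
}
params = build_clear_and_rewrite_params(
    "M175475927", [mime_claim], csrf_token="PLACEHOLDER_TOKEN"
)
print(params["data"])
```

With this approach the full set of claims has to be submitted every time, because anything not present in `data` is gone after the edit.
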

Both the SDC and the edit logs are now clean and up to production expectations.

{F66740180}

{F66740179}

See MR#86 + MR#87 🚀

⚠️ The progress bar and upload counter UI still work but seem slightly degraded.