
Merging of Senses
Closed, Resolved · Public · 8 Story Points

Description

As an editor I want Senses to be taken into account when merging Lexemes in order to have a complete merge and not lose any valuable information when merging two Lexemes.

Problem:
Lexeme merging so far only takes into account the Lexeme- and Form-level data and not the Senses data.

BDD
GIVEN a source and a target Lexeme that should be merged
AND the source Lexeme contains at least one Sense
WHEN merging the source Lexeme into the target Lexeme
THEN the Sense data is removed from the source Lexeme
AND each individual Sense including Glosses and statements is added as a new Sense to the target Lexeme

GIVEN a source and a target Lexeme that should be merged
AND the source or target Lexeme contains at least one Sense with a statement linking to the other one
WHEN trying to merge the source Lexeme into the target Lexeme
THEN the merge is not done
AND an error message "Failed to merge Lexemes, please resolve any conflicts first. Error: Lexemes link to each other in a statement." is shown

Acceptance criteria:

  • no Sense data is lost in a merge
  • new IDs in the target Lexeme for the Senses are acceptable for now
  • if the source and target Lexeme link to each other in a statement on a Sense then the merge is blocked
  • works via SpecialPage and API
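
A minimal sketch of the behaviour described by the BDD scenarios and acceptance criteria above, written in Python against a simplified dict model rather than the actual WikibaseLexeme classes. The names `merge_senses`, `links_to`, and `MergeError`, the flat `statements` list, and the rule that a statement value equal to the other Lexeme's ID (or one of its sub-entity IDs) counts as a link are all assumptions made for illustration.

```python
from copy import deepcopy

ERROR = ("Failed to merge Lexemes, please resolve any conflicts first. "
         "Error: Lexemes link to each other in a statement.")


class MergeError(Exception):
    """Raised when the merge is blocked (hypothetical name)."""


def links_to(lexeme, other_id):
    """True if any Sense statement in `lexeme` points at `other_id`
    or at one of its sub-entities (assumed rule, e.g. "L4" or "L4-S1")."""
    return any(
        str(stmt.get("value")) == other_id
        or str(stmt.get("value")).startswith(other_id + "-")
        for sense in lexeme["senses"]
        for stmt in sense.get("statements", [])
    )


def merge_senses(source, target):
    """Move every Sense from `source` to `target` under a new Sense ID."""
    if links_to(source, target["id"]) or links_to(target, source["id"]):
        raise MergeError(ERROR)

    # Continue numbering after the highest Sense number already on the target.
    next_number = 1 + max(
        (int(s["id"].rsplit("-S", 1)[1]) for s in target["senses"]),
        default=0,
    )
    for sense in source["senses"]:
        moved = deepcopy(sense)  # glosses and statements travel with the Sense
        moved["id"] = f"{target['id']}-S{next_number}"
        next_number += 1
        target["senses"].append(moved)

    source["senses"] = []  # the Sense data is removed from the source Lexeme
```

Every source Sense arrives as a new Sense on the target, which matches the "new IDs are acceptable for now" criterion; no attempt is made to merge individual Senses into each other.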

Event Timeline

Restricted Application added a project: Wikidata. · Aug 9 2018, 11:46 AM

Should only start once T189129 is finished.

We could implement a kind of "internal redirect": replace the merged Sense with a "stub" entity that holds a "symbolic" link to the Senses where the content now lives, so that existing links are not broken.

Lydia_Pintscher added a subscriber: Denny.

I can't think of a case where we can meaningfully automatically merge individual Senses from the source and target Lexeme into each other. @Denny, @daniel: can you? Otherwise they'll always be added as new Senses and need to potentially be sorted out by hand. But I fear that's the best we can do.

As each Sense may have some statements, we could create a new special page, Special:MergeSense, to handle that.

Addshore moved this task from Doing to Peer Review on the Wikidata-Senses-Iteration4 board.

Yeah that could work! Let's see how much of a problem it becomes and then file a ticket for it if needed.

It appears that the statement on L1-S8 lost its GUID during the merge. ChangeOpFormAdd has some logic to (re)generate GUIDs when adding a form with existing statements. It looks like this may be missing for senses.
wbgetentities says

{
    "id": "L1-S8",
    "glosses": {
        "de": {
            "language": "de",
            "value": "bar"
        }
    },
    "claims": {
        "P13432": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P13432",
                    "hash": "86829551052769864607436e311ec0eb9574bc59",
                    "datavalue": {
                        "value": {
                            "text": "testing a statement",
                            "language": "en"
                        },
                        "type": "monolingualtext"
                    }
                },
                "type": "statement",
                "rank": "normal"
            }
        ]
    }
}
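
For illustration, a small Python sketch of what reassigning statement IDs on a sense could look like, operating on JSON shaped like the output above. It assumes the usual Wikibase GUID layout of entity ID, `$`, and a UUID; the function names are hypothetical, and this is not the code in ChangeOpFormAdd or the eventual patch.

```python
import uuid


def statements_missing_ids(sense):
    """Return the property IDs that have at least one statement without an "id"."""
    return [
        prop
        for prop, statements in sense.get("claims", {}).items()
        if any("id" not in statement for statement in statements)
    ]


def reassign_statement_ids(sense):
    """Give every statement a fresh GUID prefixed with the sense's own ID
    (assumed format: "<sense id>$<uppercase UUID>")."""
    for statements in sense.get("claims", {}).values():
        for statement in statements:
            statement["id"] = f"{sense['id']}${str(uuid.uuid4()).upper()}"
```

Run against the L1-S8 output above, `statements_missing_ids` would report `P13432`, since that statement has no `"id"` key.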

Change 460891 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Reassign statement IDs when adding a sense

https://gerrit.wikimedia.org/r/460891

I think this is related to the same problem – the statements have different IDs before and after the merge, so it looks like a diff. SenseDiffer should probably special-case that somehow – I’ll compare with FormDiffer later.
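
To make the idea concrete, here is a rough Python sketch of comparing two statement lists while ignoring their IDs. It only illustrates the special-casing mentioned above on plain dicts; it is not how SenseDiffer or FormDiffer work, and the helper names are made up.

```python
def without_id(statement):
    """Copy of a statement dict with its "id" (GUID) dropped."""
    return {key: value for key, value in statement.items() if key != "id"}


def statements_equal_ignoring_ids(before, after):
    """True if both lists hold the same statements in the same order,
    apart from their GUIDs."""
    return [without_id(s) for s in before] == [without_id(s) for s in after]
```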

Weird, when I merge a lexeme that has only forms, but no senses, I don’t even get that separate edit (“merged lexeme into Lxxx”), only the final “redirected to Lxxx” edit.

EDIT: just to avoid confusion – it turns out that the difference isn’t whether the lexeme has forms or senses, it’s whether any forms or senses have statements. See the patch in the next comment.

Change 461112 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Clone source entity in ChangeOp{Form,Sense}Clone

https://gerrit.wikimedia.org/r/461112

Change 461112 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Clone source entity in ChangeOp{Form,Sense}Clone

https://gerrit.wikimedia.org/r/461112

Addshore moved this task from incoming to in progress on the Wikidata board. · Sep 18 2018, 2:24 PM

Change 460891 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Reassign statement IDs when adding a sense

https://gerrit.wikimedia.org/r/460891

Lydia_Pintscher closed this task as Resolved. · Sep 26 2018, 9:34 AM
Lydia_Pintscher claimed this task.

\o/