Implement Lexeme merging ChangeOps
Closed, Resolved (Public)

Description

  • Implement ChangeOps per rules documented in T198105 (and researched in T196554#4291641)
  • Abstract statement merging logic from the way it is done for items, reuse for lexemes and forms

Wikibase\Repo\ChangeOp\ChangeOpsMerge::generateStatementsChangeOps

Note:
For ChangeOps in the Lexeme API we introduced ChangeOpApplyException, which allows ChangeOps to emit localized errors.

Details

Related Gerrit Patches:
mediawiki/extensions/WikibaseLexeme : master : lexeme merging: implement ChangeOps
mediawiki/extensions/Wikibase : master : MergeChangeOpsFactory: rename
mediawiki/extensions/Wikibase : master : StatementsMerger: allow reuse

Event Timeline

Pablo-WMDE updated the task description. Jul 11 2018, 11:28 AM

Change 446814 had a related patch set uploaded (by Pablo Grass (WMDE); owner: Pablo Grass (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] lexeme merging: implement ChangeOps

https://gerrit.wikimedia.org/r/446814

@Lydia_Pintscher Questions regarding the merging of forms. I'm putting them in writing to reduce misunderstandings (let's see how it goes), but please let's also talk.

a

Imagine two lexemes, source and target.
Source has two forms. Both have identical grammatical features and a common representation, and no contradicting representations.
Source's second form also has a statement attached to it.
Target has one form that matches both of source's forms in grammatical features and representation. It has no statements.

How many forms will target have after the merge?

1) The statement from source's second form will be merged into the pre-existing target form; the other forms are disregarded as duplicates (strict garbage collection).
2) Source's first form will not be disregarded; source's second form will be merged into the target form.
3) All forms are simply applied to target (no garbage collection).

b

Imagine the previous example but with one form on source, two on target.

Will the merger actively remove/merge redundant forms within target?

Pablo-WMDE added a comment. Edited Jul 27 2018, 9:49 AM

@Lydia_Pintscher Another question regarding how to detect "cross-referencing" between Lexemes, Forms, and their statements.

Back in the day there was a decision to only check the main snak for this cross-referencing.
Are we to check only the main snak for lexeme cross-referencing, or all the snaks (patch up here: https://gerrit.wikimedia.org/r/448027)?
Would it be ok/desired if this also changed the behavior for item statement cross-referencing checks -> T119614?

I just talked it through with @Jakob_WMDE.

@Lydia_Pintscher Questions regarding the merging of forms - putting it in writing to reduce misunderstandings (let's see how it goes), but please let's talk

a

Imagine two lexemes, source and target.
Source has two forms. Both have identical grammatical features and a common representation, and no contradicting representations.
Source's second form also has a statement attached to it.
Target has one form that matches both of source's forms in grammatical features and representation. It has no statements.
How many forms will target have after the merge?
1) The statement from source's second form will be merged into the pre-existing target form; the other forms are disregarded as duplicates (strict garbage collection).
2) Source's first form will not be disregarded; source's second form will be merged into the target form.
3) All forms are simply applied to target (no garbage collection).

This seems to be missing a 4th option, which I think we should go for: the target would have 2 Forms, and one of them (the second if possible, but it's not really important) would have the statements.
(This assumes that option 3 means having 3 Forms in the target after merging.)

b

Imagine the previous example but with one form on source, two on target.
Will the merger actively remove/merge redundant forms within target?

Merging should never remove anything; it should only add, or ignore what is redundant. So in this example the target would have 2 Forms after merging and one of them would have statements.
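
To make the two answers above concrete, here is a minimal Python sketch of one possible reading of them. It is not the WikibaseLexeme implementation: `Form`, `forms_match` and `merge_forms` are hypothetical names, statements are reduced to plain strings, and the one-to-one pairing of source forms with pre-existing target forms is an assumption chosen because it reproduces both outcomes (2 Forms in example a, 2 Forms in example b, one of them carrying the statements).

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """Hypothetical stand-in for a lexeme form (not the WikibaseLexeme class)."""
    representations: dict          # language code -> text
    grammatical_features: frozenset  # item IDs such as "Q1"
    statements: list = field(default_factory=list)  # simplified to strings


def forms_match(a: Form, b: Form) -> bool:
    """Assumed rule: identical features, a shared representation, no conflicts."""
    if a.grammatical_features != b.grammatical_features:
        return False
    shared = a.representations.keys() & b.representations.keys()
    return bool(shared) and all(
        a.representations[k] == b.representations[k] for k in shared
    )


def copy_form(f: Form) -> Form:
    return Form(dict(f.representations), f.grammatical_features, list(f.statements))


def merge_forms(source: list, target: list) -> list:
    """Return target's forms after the merge. Target forms are never removed."""
    result = [copy_form(f) for f in target]
    original_count = len(result)
    used = set()  # target forms that already absorbed a source form
    for src in source:
        # Assumption: each pre-existing target form absorbs at most one source form.
        idx = next(
            (i for i in range(original_count)
             if i not in used and forms_match(src, result[i])),
            None,
        )
        if idx is None:
            result.append(copy_form(src))  # no free match: add as a new form
            continue
        used.add(idx)
        match = result[idx]
        for lang, text in src.representations.items():
            match.representations.setdefault(lang, text)
        for statement in src.statements:
            if statement not in match.statements:  # ignore redundant statements
                match.statements.append(statement)
    return result
```

With example a (two matching source forms, one target form) this yields two forms, the second carrying the statement; with example b (one source form, two target forms) the target keeps both forms and the first one receives the statement.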

@Lydia_Pintscher Another question regarding how to detect "cross-referencing" between Lexemes, Forms, and their statements.
Back in the day there was a decision to only check the main snak for this cross-referencing.
Are we to check only the main snak for lexeme cross-referencing, or all the snaks (patch up here: https://gerrit.wikimedia.org/r/448027)?
Would it be ok/desired if this also changed the behavior for item statement cross-referencing checks -> T119614?

Oh nice. Yes if we can do this now then we should be doing it for item statements as well. IIRC we did not do this in the past because it was too hard and not super important back then. If we have it now then \o/
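
The difference between checking only the main snak and checking all snaks can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch linked above: `Snak`, `Statement`, `all_snaks` and `references_entity` are hypothetical names, snak values are reduced to plain strings, and treating a form ID such as "L2-F1" as a reference to lexeme "L2" is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Snak:
    """Hypothetical, simplified snak: a property and a string value."""
    property_id: str
    value: str  # an entity ID such as "L2" or "L2-F1", or any other value


@dataclass
class Statement:
    main_snak: Snak
    qualifiers: list = field(default_factory=list)   # list of Snak
    references: list = field(default_factory=list)   # list of lists of Snak


def all_snaks(statement: Statement):
    """Yield the main snak, then qualifier snaks, then reference snaks."""
    yield statement.main_snak
    yield from statement.qualifiers
    for reference in statement.references:
        yield from reference


def references_entity(statements, entity_id: str, main_snak_only: bool = False) -> bool:
    """True if any inspected snak points at entity_id or one of its sub-entities."""
    for statement in statements:
        snaks = [statement.main_snak] if main_snak_only else all_snaks(statement)
        for snak in snaks:
            # Assumption: "L2-F1" counts as a reference to lexeme "L2".
            if snak.value == entity_id or snak.value.startswith(entity_id + "-"):
                return True
    return False


# A statement whose only link to L2 sits in a qualifier.
statement = Statement(Snak("P1", "Q5"), qualifiers=[Snak("P2", "L2")])
print(references_entity([statement], "L2", main_snak_only=True))
print(references_entity([statement], "L2"))
```

In this sketch a merge check would call `references_entity` in both directions (source statements against the target ID and target statements against the source ID) and refuse the merge if either returns true.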

Change 454513 had a related patch set uploaded (by Pablo Grass (WMDE); owner: Pablo Grass (WMDE)):
[mediawiki/extensions/Wikibase@master] StatementsMerger: allow reuse

https://gerrit.wikimedia.org/r/454513

Change 454516 had a related patch set uploaded (by Pablo Grass (WMDE); owner: Pablo Grass (WMDE)):
[mediawiki/extensions/Wikibase@master] MergeChangeOpsFactory: rename

https://gerrit.wikimedia.org/r/454516

Change 454513 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] StatementsMerger: allow reuse

https://gerrit.wikimedia.org/r/454513

Change 454516 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] MergeChangeOpsFactory: rename

https://gerrit.wikimedia.org/r/454516

Pablo-WMDE removed Pablo-WMDE as the assignee of this task.Aug 27 2018, 11:47 AM

Change 446814 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] lexeme merging: implement ChangeOps

https://gerrit.wikimedia.org/r/446814