
[Story] Merging wizard shouldn't allow dissimilar items to be merged
Closed, ResolvedPublic

Description

Recently, users have been merging items that, by consensus, should be kept separate. This has happened in bioinformatics, where protein items have been merged with their associated gene items. The merging wizard likely makes such merges too easy.

In Magnus Manske's merging game, it is, AFAIU, not possible to merge two items that are linked. It is therefore not possible to merge a protein/gene pair there, as these items are usually linked by the "expressed by" property. A similar check could be added to the merging wizard, either to warn users or to disable merging entirely.

The issue has been discussed on the Wikidata mailing list, see threads https://lists.wikimedia.org/pipermail/wikidata/2015-November/007586.html and https://lists.wikimedia.org/pipermail/wikidata/2015-November/007580.html

Event Timeline

Fnielsen created this task. Nov 10 2015, 9:01 PM
Fnielsen updated the task description.
Fnielsen raised the priority of this task to Needs Triage.
Fnielsen added a project: Wikidata-Gadgets.
Fnielsen added a subscriber: Fnielsen.
Restricted Application added a project: Wikidata. Nov 10 2015, 9:01 PM
Restricted Application added subscribers: StudiesWorld, Aklapper.
I9606 added a subscriber: I9606. Nov 10 2015, 9:12 PM

Thumbs up for checking whether the items being merged link to each other and highlighting those links to the would-be merger. That sounds like a nice way to reduce merging errors. In the case discussed above, a gene item "encodes" (P688) a protein item (or items), and, reciprocally, a protein item is "encoded by" (P702) a gene.

agray added a subscriber: agray. Nov 11 2015, 10:48 AM

This is a good idea, I think.

It might be best to simply prevent merging any two items which link to each other (A>B, B>A, or A<>B), in the same way that a merge is prevented if it would lead to a duplicate sitelink.

If it's a legitimate merge, the user still has the option of removing the links and retrying the merge. The prevention message thus has the same effect as a highlight/warning message but takes a little more conscious action to override.
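The link check proposed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Wikibase implementation: items are modelled as plain dicts with a hypothetical flat `statements` list whose `value` holds the main-snak value, and any string of the form "Q" followed by digits is treated as an item link. Only main snaks are inspected.

```python
from typing import Any

# Hypothetical, simplified item model (not the Wikibase JSON format):
# {"id": "Q1", "statements": [{"property": "P688", "value": "Q2"}, ...]}
Item = dict[str, Any]


def linked_item_ids(item: Item) -> set[str]:
    """Return the IDs of items referenced by this item's statement main snaks."""
    ids: set[str] = set()
    for statement in item.get("statements", []):
        value = statement.get("value")
        # Only strings shaped like item IDs ("Q" + digits) count as links here.
        if isinstance(value, str) and value.startswith("Q") and value[1:].isdigit():
            ids.add(value)
    return ids


def items_link_to_each_other(a: Item, b: Item) -> bool:
    """True if A links to B, B links to A, or both (A>B, B>A, A<>B)."""
    return b["id"] in linked_item_ids(a) or a["id"] in linked_item_ids(b)


def check_mergeable(a: Item, b: Item) -> None:
    """Refuse the merge, like a sitelink conflict would, if the items are cross-linked."""
    if items_link_to_each_other(a, b):
        raise ValueError(
            f"{a['id']} and {b['id']} link to each other; remove the links before merging"
        )
```

With a hypothetical gene item Q1 carrying P688 → Q2 and a protein item Q2 carrying P702 → Q1, `items_link_to_each_other` returns `True` whichever direction the link goes, and `check_mergeable` raises, so the user would have to remove the links before retrying the merge.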

From my side this could work. Are there any major objections?

Lydia_Pintscher renamed this task from Merging wizard shouldn't allow dissimilar items to be merged to [Story] Merging wizard shouldn't allow dissimilar items to be merged.
Lydia_Pintscher triaged this task as High priority.
Lydia_Pintscher set Security to None.

CCing a few people to notify them and to make sure they see no major issue with this.

If we do this, we should IMHO implement it in the API rather than in the merge gadget.

hoo added a comment. Nov 13 2015, 6:03 PM

Not sure we should hard-disallow that (the behavior could be changed by another ignore parameter to the API), but I don't see a problem with issuing a warning, or even disallowing the action unless an override is set, in such cases.
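The middle ground described here, warning or refusing unless an override is set, might look roughly like the following sketch. `precheck_merge`, `ignore_links` and `MergeRefused` are hypothetical names chosen for illustration, not parameters or classes of the real wbmergeitems module, and the cross-link test itself is passed in as a boolean to keep the sketch self-contained.

```python
import warnings


class MergeRefused(Exception):
    """Hypothetical error raised when a merge is blocked."""


def precheck_merge(items_are_linked: bool, ignore_links: bool = False) -> None:
    """Block a merge of cross-linked items unless the caller explicitly overrides."""
    if not items_are_linked:
        return  # nothing to complain about
    if ignore_links:
        # Override set: let the merge proceed, but still surface a warning.
        warnings.warn("merging items that link to each other")
        return
    raise MergeRefused(
        "items link to each other; set ignore_links=True to merge anyway"
    )
```

Whether the overridden case should still warn or stay silent is a design choice; this sketch warns so that an explicit override does not hide the problem entirely.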

Bene added a comment. Nov 14 2015, 7:04 PM

I agree. As long as we add another ignore option, it would be better to integrate this into the Wikidata API. However, we should first make sure that we do not disallow valid merges.

If there's an ignore option, it sounds fine to me too.

The only case I can think of where it might be a bit annoying is when people add "said to be the same as" statements to two items that need merging but originally couldn't be merged because of interwiki conflicts: the statements saying the items are the same are exactly the ones that would make the merge fail. That's pretty minor, though, since those statements need to be removed anyway if the items are merged.

How can we ensure wbmergeitems isn't going to become a medley of ignore parameters?

Bene claimed this task. Nov 17 2015, 11:03 AM

Change 253583 had a related patch set uploaded (by Bene):
Disallow merging of items that link to each other

https://gerrit.wikimedia.org/r/253583

> How can we ensure wbmergeitems isn't going to become a medley of ignore parameters?

As far as I can read Bene*'s change, no extra ignore parameters are introduced. It simply tests whether the items link to each other and throws an exception if they do.

thiemowmde moved this task from Backlog to Review on the Wikidata-Sprint-2015-11-17 board.

What about items that have a sitelink to each other?

Bene added a comment. Nov 25 2015, 9:39 AM

> What about items that have a sitelink to each other?

So far this is only about statements. I don't think there are any cases where an item links to another item using a sitelink.

>> What about items that have a sitelink to each other?
>
> So far this is only about statements. I don't think there are any cases where an item links to another item using a sitelink.

Probably not, but it's technically possible. I'm just wondering if it wouldn't make more sense to ignore conflicts based on the type of conflict (›colliding sitelinks‹, ›colliding descriptions‹, ›items are linked with each other‹) instead of the location in the entity (›sitelinks‹, ›descriptions‹, ›statements‹). Maybe it would also make sense to use the referenced entities concept from ReferencedEntitiesDataUpdater.
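A minimal sketch of keying ignore options on the type of conflict rather than on its location in the entity. The enum members mirror the three examples in the comment; the names and the `blocking_conflicts` helper are illustrative assumptions, not existing Wikibase options.

```python
from enum import Enum


class ConflictType(Enum):
    """Kinds of merge conflicts, named by what collides rather than where it lives."""

    COLLIDING_SITELINKS = "colliding sitelinks"
    COLLIDING_DESCRIPTIONS = "colliding descriptions"
    ITEMS_LINKED = "items are linked with each other"


def blocking_conflicts(
    found: list[ConflictType], ignored: set[ConflictType]
) -> list[ConflictType]:
    """Return the detected conflicts that still block the merge after ignore choices."""
    return [conflict for conflict in found if conflict not in ignored]
```

For example, `blocking_conflicts([ConflictType.COLLIDING_DESCRIPTIONS, ConflictType.ITEMS_LINKED], {ConflictType.COLLIDING_DESCRIPTIONS})` leaves only `ConflictType.ITEMS_LINKED`. One possible upside, relevant to the concern about a medley of ignore parameters: a new kind of conflict adds an enum member rather than another per-location parameter.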

daniel added a subscriber: daniel. Edited Nov 25 2015, 10:24 AM

Sitelinks are not really entity references. But how about items that reference each other via qualifiers or references?

UPDATE: after a brief discussion in the team, it seems we also want to prevent merging items when they are cross-linked via qualifiers or references. I'll update the description to clarify.

daniel updated the task description. Nov 25 2015, 10:33 AM
Bene added a comment. Nov 25 2015, 10:33 AM

> Sitelinks are not really entity references. But how about items that reference each other via qualifiers or references?
>
> UPDATE: after a brief discussion in the team, it seems we also want to prevent merging items when they are cross-linked via qualifiers or references. I'll update the description to clarify.

As far as I understand, this task should only consider main snaks. Otherwise, a ReferencedEntitiesFinder would be needed, as noted by Adrian.
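The difference between checking only main snaks and also checking qualifiers and references can be illustrated as follows. The statement structure here (`mainsnak`, a flat `qualifiers` list, `references` as lists of values) is a hypothetical simplification of the Wikibase data model, and this Python function is not the API of ReferencedEntitiesFinder or ReferencedEntitiesDataUpdater.

```python
from typing import Any


def is_item_id(value: Any) -> bool:
    """Treat strings like "Q42" as item IDs in this simplified model."""
    return isinstance(value, str) and value.startswith("Q") and value[1:].isdigit()


def referenced_items(
    statements: list[dict[str, Any]], main_snaks_only: bool = True
) -> set[str]:
    """Collect item IDs referenced by a list of (simplified) statements.

    main_snaks_only=True mirrors the first version discussed here; False also
    scans qualifiers and references, as a full referenced-entities pass would.
    """
    found: set[str] = set()
    for statement in statements:
        values = [statement.get("mainsnak")]
        if not main_snaks_only:
            values += statement.get("qualifiers", [])
            for reference in statement.get("references", []):
                values += reference  # each reference is a list of values here
        found.update(v for v in values if is_item_id(v))
    return found
```

With a statement whose main snak points to Q2, a qualifier to Q5 and a reference to Q7 (placeholder IDs), the default call finds only Q2, while `main_snaks_only=False` finds all three.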

daniel updated the task description. Nov 25 2015, 10:34 AM

Change 253583 merged by jenkins-bot:
Disallow merging of items that link to each other

https://gerrit.wikimedia.org/r/253583

Yeah, for the first version checking only the main snak is fine. Let's keep it that way for now and see whether it turns out not to be enough.

daniel added a comment. Edited Nov 25 2015, 10:36 AM

> As far as I understand, this task should only consider main snaks. Otherwise, a ReferencedEntitiesFinder would be needed, as noted by Adrian.

Yes, that's what we just realized when we discussed this with Lydia. Note that ReferencedEntitiesFinder is now called ReferencedEntitiesDataUpdater.

I agree with Lydia that it's fine for the current patch to only consider main snaks (I just gave a +2), but we should leave this ticket open until the other cases are covered too.

This comment was removed by daniel.
Bene closed this task as Resolved. Nov 25 2015, 10:50 AM

Closing this as resolved. If it turns out we also need to consider references and qualifiers, we can reopen this ticket or create a new one.

Bene moved this task from Review to Done on the Wikidata-Sprint-2015-11-17 board. Nov 25 2015, 10:50 AM