
[Story] Merging wizard shouldn't allow dissimilar items to be merged
Closed, ResolvedPublic

Description

Recently, users have been merging items that by consensus should be kept separate. This happened in bioinformatics, where protein items have been merged with their associated gene items. It is likely that the merging wizard makes it too easy to merge.

In Magnus Manske's merging game, AFAIU, it is not possible to merge two items that are linked. It is therefore not possible to merge protein/gene items there, as these are usually linked by the "expressed by" property. A similar solution could be put into the merging wizard to warn users or to disable merging entirely.

The issue has been discussed on the Wikidata mailing list, see threads https://lists.wikimedia.org/pipermail/wikidata/2015-November/007586.html and https://lists.wikimedia.org/pipermail/wikidata/2015-November/007580.html

Event Timeline

Fnielsen raised the priority of this task from to Needs Triage.
Fnielsen updated the task description.
Fnielsen added a project: Wikidata-Gadgets.
Fnielsen subscribed.
Restricted Application added subscribers: StudiesWorld, Aklapper.

Thumbs up for checking to see if the items being merged link to each other and highlighting those links to the would-be merger. That sounds like a nice way to reduce merging errors. In the case discussed above a gene item "encodes" P688 a protein item (or items) and reciprocally a protein item is "encoded by" P702 a gene.

This is a good idea, I think.

It might be best to simply prevent merging any two items which link to each other (A>B, B>A, or A<>B), in the same way that a merge is prevented if it would lead to a duplicate sitelink.

If it's a legitimate merge, the user still has the option of removing the links and retrying the merge. The prevention message thus has the same effect as a highlight/warning message but takes a little more conscious action to override.
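The check proposed above could be sketched roughly as follows. This is only an illustration, assuming a simplified item representation (a dict with `id` and `claims`, loosely modeled on the Wikibase entity JSON). The function names and the `MergeBlocked` exception are hypothetical and are not part of Wikibase.

```python
class MergeBlocked(Exception):
    """Hypothetical error raised when two items link to each other."""


def main_snak_targets(item):
    """Return the IDs of items referenced by the main snaks of `item`."""
    targets = set()
    for statements in item.get("claims", {}).values():
        for statement in statements:
            # "somevalue"/"novalue" snaks carry no datavalue and are skipped.
            datavalue = statement.get("mainsnak", {}).get("datavalue")
            if datavalue and datavalue.get("type") == "wikibase-entityid":
                targets.add(datavalue["value"]["id"])
    return targets


def assert_mergeable(a, b):
    """Raise MergeBlocked if A links to B or B links to A (A>B, B>A, A<>B)."""
    if b["id"] in main_snak_targets(a) or a["id"] in main_snak_targets(b):
        raise MergeBlocked(f"{a['id']} and {b['id']} link to each other")


def entity_statement(prop, target_id):
    """Build a minimal statement whose main snak points at another item."""
    return {"mainsnak": {"snaktype": "value", "property": prop,
                         "datavalue": {"type": "wikibase-entityid",
                                       "value": {"id": target_id}}}}


# Placeholder IDs: Q1 stands in for a gene item, Q2 for its protein item.
gene = {"id": "Q1", "claims": {"P688": [entity_statement("P688", "Q2")]}}
protein = {"id": "Q2", "claims": {"P702": [entity_statement("P702", "Q1")]}}
unrelated = {"id": "Q3", "claims": {}}
```

With this sketch, `assert_mergeable(gene, protein)` raises `MergeBlocked`, while `assert_mergeable(gene, unrelated)` passes. A user who really wants the merge would first have to remove the linking statements and retry.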

From my side this could work. Are there any major objections?

Lydia_Pintscher renamed this task from Merging wizard shouldn't allow dissimilar items to be merged to [Story] Merging wizard shouldn't allow dissimilar items to be merged. Nov 13 2015, 1:32 PM
Lydia_Pintscher triaged this task as High priority.
Lydia_Pintscher moved this task from incoming to consider for next sprint on the Wikidata board.
Lydia_Pintscher set Security to None.

CCing a few people for notification and making sure they see no major issue with this.

If we do it we should imho not do it in the merge gadget but in the API.

Not sure we should hard disallow that (the behavior could be changed by another ignore parameter to the API), but I don't see a problem with issuing a warning or even disallowing the action (unless the overwrite is set) in such cases.

I agree. As long as we add another ignore option, it would be better to integrate this into the Wikidata API. However, we should first make sure that we do not disallow possibly valid merges.

If there's an ignore option, it sounds fine to me too.

The only thing I can think of where it might be a bit annoying is when people add "said to be the same as" to two items which need merging but originally couldn't be merged because of interwiki conflicts - the statements saying the items are the same are the ones which would make the merge fail. That's pretty minor though, since those statements need removing anyway if the items are being merged.

How can we ensure wbmergeitems isn't going to become a medley of ignore parameters?

Change 253583 had a related patch set uploaded (by Bene):
Disallow merging of items that link to each other

https://gerrit.wikimedia.org/r/253583

How can we ensure wbmergeitems isn't going to become a medley of ignore parameters?

As far as I can read Bene*'s change, no extra ignore parameters are introduced. It simply tests for the condition and throws an exception.

What about items that have a sitelink to each other?

What about items that have a sitelink to each other?

So far this is only about statements. I don't think there are any cases where an item links to another item using a sitelink.

What about items that have a sitelink to each other?

So far this is only about statements. I don't think there are any cases where an item links to another item using a sitelink.

Probably not, but it's technically possible. I'm just wondering if it wouldn't make more sense to ignore conflicts based on the type of conflict (›colliding sitelinks‹, ›colliding descriptions‹, ›items are linked with each other‹) instead of the location in the entity (›sitelinks‹, ›descriptions‹, ›statements‹). Maybe it would also make sense to use the referenced entities concept from ReferencedEntitiesDataUpdater.
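To illustrate what ignoring by type of conflict could look like, here is a small sketch. All names in it (`ConflictType`, `blocking_conflicts`, the string values) are hypothetical and do not correspond to actual wbmergeitems parameters.

```python
from enum import Enum


class ConflictType(Enum):
    # Hypothetical names mirroring the three conflict types listed above.
    COLLIDING_SITELINKS = "colliding-sitelinks"
    COLLIDING_DESCRIPTIONS = "colliding-descriptions"
    LINKED_ITEMS = "linked-items"


def blocking_conflicts(detected, ignored=frozenset()):
    """Return the detected conflicts that the caller did not choose to ignore."""
    return set(detected) - set(ignored)


detected = {ConflictType.COLLIDING_DESCRIPTIONS, ConflictType.LINKED_ITEMS}
remaining = blocking_conflicts(
    detected, ignored={ConflictType.COLLIDING_DESCRIPTIONS}
)
```

Here `remaining` contains only `ConflictType.LINKED_ITEMS`, so the merge would still be refused until the caller either ignores that type as well or removes the links.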

Sitelinks are not really entity references. But how about items that reference each other via qualifiers or references?

UPDATE: after a brief discussion in the team, it seems we also want to avoid merging items when they are cross-linked via qualifiers or references. I'll update the description to clarify.

Sitelinks are not really entity references. But how about items that reference each other via qualifiers or references?

UPDATE: after a brief discussion in the team, it seems we also want to avoid merging items when they are cross-linked via qualifiers or references. I'll update the description to clarify.

As far as I understand, this task should only consider main snaks. Otherwise, a ReferencedEntitiesFinder would be needed, as noted by Adrian.

Change 253583 merged by jenkins-bot:
Disallow merging of items that link to each other

https://gerrit.wikimedia.org/r/253583

Yeah, for the first version main snaks are fine. Let's keep that for now and see whether it turns out not to be enough.

As far as I understand, this task should only consider main snaks. Otherwise, a ReferencedEntitiesFinder would be needed, as noted by Adrian.

Yes, that's what we just realized when we discussed this with Lydia. Note that ReferencedEntitiesFinder is now called ReferencedEntitiesDataUpdater.

I agree with Lydia that it's fine for the current patch to only consider main snaks (I just gave a +2), but we should leave this ticket open until the other cases are covered too.
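For the remaining cases, collecting item references from qualifiers and references as well as main snaks might look roughly like this. It is a sketch over a simplified version of the Wikibase entity JSON (`mainsnak`, `qualifiers`, `references` with `snaks`). The function names are hypothetical and this is not how ReferencedEntitiesDataUpdater is implemented.

```python
def snak_target(snak):
    """Return the item ID a snak points at, or None if it has no item value."""
    datavalue = snak.get("datavalue")
    if datavalue and datavalue.get("type") == "wikibase-entityid":
        return datavalue["value"]["id"]
    return None


def statement_snaks(statement):
    """Yield the main snak, then all qualifier snaks, then all reference snaks."""
    yield statement.get("mainsnak", {})
    for snaks in statement.get("qualifiers", {}).values():
        yield from snaks
    for reference in statement.get("references", []):
        for snaks in reference.get("snaks", {}).values():
            yield from snaks


def referenced_items(item):
    """Return the IDs of all items referenced anywhere in the item's statements."""
    targets = set()
    for statements in item.get("claims", {}).values():
        for statement in statements:
            for snak in statement_snaks(statement):
                target = snak_target(snak)
                if target is not None:
                    targets.add(target)
    return targets


def entity_snak(prop, target_id):
    """Build a minimal snak pointing at another item (placeholder IDs only)."""
    return {"snaktype": "value", "property": prop,
            "datavalue": {"type": "wikibase-entityid",
                          "value": {"id": target_id}}}


item = {"id": "Q1", "claims": {"P1": [{
    "mainsnak": entity_snak("P1", "Q2"),
    "qualifiers": {"P2": [entity_snak("P2", "Q3")]},
    "references": [{"snaks": {"P3": [entity_snak("P3", "Q4")]}}],
}]}}
```

In this example `referenced_items(item)` returns Q2, Q3 and Q4, whereas a main-snak-only check would find just Q2.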

This comment was removed by daniel.

Closing this as resolved. If it turns out we also need to consider references and qualifiers, we can reopen this ticket or create a new one.