Automatically drop redundant sitelink to redirect when merging in wbmergeitems api module
Closed, Resolved (Public)

Description

Improvement proposal:
Suppose two items to be merged have different sitelinks to the same MediaWiki site, and one or both of these sitelinks point to a redirect. If one redirect resolves to the other sitelink, the redundant redirect sitelink should be dropped automatically; otherwise the merge should fail.

Previous discussion: https://www.wikidata.org/wiki/MediaWiki_talk:Gadget-Merge.js#Resolve_redirected_sitelinks

Event Timeline

petr.matas raised the priority of this task from to Needs Triage.
petr.matas updated the task description.
petr.matas added a subscriber: petr.matas.
Lydia_Pintscher raised the priority of this task from Medium to High.
Lydia_Pintscher set Security to None.
Lydia_Pintscher added subscribers: hoo, daniel.

@Lydia_Pintscher, if discussion leads to us wanting to do this feel free to assign me and I'll take a poke when I find the time!

Addshore renamed this task from "Automatically drop redundant sitelink to redirect when merging" to "Automatically drop redundant sitelink to redirect when merging in wbmergeitems api module". Jan 9 2015, 3:03 PM

This seems fine to me. Subscribing a few more people for input.

Would be great if this also works with the real duplicates (same sitelink on two items), but I like the idea. :)

So the bit of code we would need to touch is this method: https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/repo/includes/ChangeOp/ChangeOpsMerge.php#L199

I guess it would be: if there are conflicts for a site, find the actual final target of both sitelinks (the one we are trying to add and the one already there).
If both targets match, remove both sitelinks from both items and add the final target (thus also resolving redirects).
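
The approach described in this comment can be sketched roughly as follows. This is a minimal Python illustration, not the Wikibase PHP code: the `redirects` dictionary, both function names, and the page titles `Page A`, `Page B` and `Page C` are hypothetical stand-ins for whatever redirect lookup the real code would use.

```python
def resolve_final_target(title, redirects):
    """Follow redirects from `title` until a non-redirect page is reached."""
    seen = set()
    while title in redirects:
        if title in seen:
            raise ValueError(f"redirect loop at {title!r}")
        seen.add(title)
        title = redirects[title]
    return title


def merge_sitelinks_by_final_target(from_title, to_title, redirects):
    """Return the one sitelink to keep, or raise if the conflict is real."""
    from_target = resolve_final_target(from_title, redirects)
    to_target = resolve_final_target(to_title, redirects)
    if from_target != to_target:
        raise ValueError(
            f"conflicting sitelinks: {from_title!r} and {to_title!r}"
        )
    # Both sitelinks end up on the same page, so keep only that page.
    return from_target


# Hypothetical data: "Page A" is a redirect to "Page B".
redirects = {"Page A": "Page B"}
kept = merge_sitelinks_by_final_target("Page A", "Page B", redirects)
```

With this assumed data, the two sitelinks resolve to the same page, so the merge keeps the final target; two sitelinks with different final targets would still be reported as a conflict.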

All sounds good! Should make it easier to clean up duplicates/Wikipedia article merges.

> I guess it would be: if there are conflicts for a site, find the actual final target of both sitelinks (the one we are trying to add and the one already there).
> If both targets match, remove both sitelinks from both items and add the final target (thus also resolving redirects).

This is too strong in my opinion. Suppose that you are trying to merge the following items:

  • Bonnie (Q1) with sitelink [[en:Bonnie Parker]], which is a redirect to [[en:Bonnie and Clyde]]
  • Clyde (Q2) with sitelink [[en:Clyde Barrow]], which is also a redirect to [[en:Bonnie and Clyde]]

Although the final targets are identical, neither of the redirect chains passes through the other sitelink. This indicates that Bonnie (Q1) and Clyde (Q2) may be different concepts and they should not be merged.

I also think that merging should not replace redirects with their final targets. Suppose that you want to merge Clyde (Q2) with

  • Clyde B. (Q3) with sitelink [[en:Clyde B.]], which is a redirect to [[en:Clyde Barrow]], i.e. a double redirect

This merge should succeed, but the resulting sitelink should be one of the original sitelinks (the one closer to the final target), i.e. [[en:Clyde Barrow]], not [[en:Bonnie and Clyde]].
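
The stricter rule argued for in this comment could look something like the sketch below. It is only an illustration in Python under stated assumptions: `REDIRECTS` is a hypothetical lookup table encoding the three redirects from the examples, and `redirect_chain` and `sitelink_to_keep` are made-up names, not Wikibase functions.

```python
# Hypothetical redirect table for the examples in this comment.
REDIRECTS = {
    "Bonnie Parker": "Bonnie and Clyde",
    "Clyde Barrow": "Bonnie and Clyde",
    "Clyde B.": "Clyde Barrow",  # double redirect
}


def redirect_chain(title, redirects):
    """Return [title, next target, ..., final target]."""
    chain = [title]
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        if target in chain:
            raise ValueError(f"redirect loop at {target!r}")
        chain.append(target)
    return chain


def sitelink_to_keep(a, b, redirects):
    """Keep the sitelink that the other sitelink's redirect chain passes through."""
    if a == b:
        return a
    if b in redirect_chain(a, redirects):
        return b  # a redirects (possibly indirectly) to b
    if a in redirect_chain(b, redirects):
        return a  # b redirects (possibly indirectly) to a
    # Sharing a final target is not enough under this rule.
    raise ValueError(f"cannot merge: {a!r} and {b!r} do not redirect to each other")


kept = sitelink_to_keep("Clyde Barrow", "Clyde B.", REDIRECTS)
```

Under these assumptions, merging Q2 with Q3 keeps `Clyde Barrow` rather than `Bonnie and Clyde`, while `sitelink_to_keep("Bonnie Parker", "Clyde Barrow", REDIRECTS)` raises, because neither chain passes through the other sitelink.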

The case you have just described should not / can not happen.

> The case you have just described should not / can not happen.

It can, after T67064 is resolved.

But the case will still never be able to happen on Wikidata..?

It can happen the following way:

  • Create [[en:Bonnie Parker]], linked to a new item Q1
  • Create [[en:Clyde Barrow]], linked to a new item Q2
  • Create [[en:Bonnie and Clyde]], not linked to any item
  • Merge [[en:Bonnie Parker]] and [[en:Clyde Barrow]] into [[en:Bonnie and Clyde]] and redirect them there
  • Mark [[en:Clyde Barrow]] as a redirect with possibilities (this will prevent double-redirect straightening after T67064-related changes are implemented)
  • Create [[en:Clyde B.]] linked to a new item Q3
  • Redirect [[en:Clyde B.]] to [[en:Clyde Barrow]]

The resulting Q1, Q2, Q3 are the ones from the example above.
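
To make the resulting state concrete, the steps above can be replayed against a toy model. This is a hedged Python sketch: `TinyWiki` and its methods are invented for illustration and do not correspond to any MediaWiki or Wikibase API, and the "redirect with possibilities" marker is modelled as a plain set.

```python
class TinyWiki:
    """Toy model of pages, redirects and item sitelinks (hypothetical)."""

    def __init__(self):
        self.pages = set()
        self.redirects = {}      # redirect title -> target title
        self.sitelinks = {}      # item id -> page title
        self.with_possibilities = set()

    def create_page(self, title, item=None):
        self.pages.add(title)
        if item is not None:
            self.sitelinks[item] = title

    def redirect(self, source, target):
        self.redirects[source] = target

    def double_redirects(self):
        """Redirects whose target is itself a redirect."""
        return sorted(s for s, t in self.redirects.items() if t in self.redirects)


wiki = TinyWiki()
wiki.create_page("Bonnie Parker", item="Q1")
wiki.create_page("Clyde Barrow", item="Q2")
wiki.create_page("Bonnie and Clyde")
wiki.redirect("Bonnie Parker", "Bonnie and Clyde")
wiki.redirect("Clyde Barrow", "Bonnie and Clyde")
wiki.with_possibilities.add("Clyde Barrow")
wiki.create_page("Clyde B.", item="Q3")
wiki.redirect("Clyde B.", "Clyde Barrow")

double = wiki.double_redirects()
```

In this model the only double redirect is `Clyde B.`, and all three items keep a sitelink to a page that is now a redirect.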

But this is a sort of broken state which breaks things.
There should only ever be one sitelink on one item (including redirects to that sitelink), which is why the software doesn't allow you to add them in the first place!
And that's why I don't see the issue with what I described, as it slowly fixes the state of the data.

Change 204811 had a related patch set uploaded (by Addshore):
Drop redundant sitelinks when merging items

https://gerrit.wikimedia.org/r/204811

Change 204811 merged by jenkins-bot:
Drop redundant sitelinks when merging items

https://gerrit.wikimedia.org/r/204811

hoo moved this task from Review to Done on the Wikidata-Sprint-2015-04-07 board.
hoo removed projects: Patch-For-Review, patch-welcome.