[feature request] how to clean up redirects in sitelinks
Closed, Resolved; Public

Description

Problems:

We need to find a better way to clean these up.

Workarounds, reports and possible fixes:

  1. The request T143486 might limit this from happening.
  2. https://www.wikidata.org/w/index.php?title=Wikidata:Bot_requests&oldid=375991287#Delete_redirects for https://www.wikidata.org/wiki/User:Josve05a/dupes
  3. Lists are available at wikidata-redirects-conflicts-reports (Pywikibot script for them)

Event Timeline

Esc3300 updated the task description.
Esc3300 updated the task description.

For what it's worth, I've built WRCR with the intent of finding items that link to pages that redirect to other pages that have items linking to them too.

I must warn that there are some false positives, due to a tradeoff between specificity and speed; and even though I'd like better specificity, I had no choice given the time "big" wikis (anything bigger than arwiki, really) would have taken (certainly hours, possibly days or even weeks, instead of seconds or minutes with the current setup). However, I don't believe there are false negatives: sensitivity should be 100%. I am still looking for a way to improve specificity to 100%.

It publishes TSV files for every wiki every Friday around 03:00 UTC, and keeps track of the number of hits on each wiki - a visualization tool for the number of hits over time is planned.
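
The check described above (an item whose sitelink is a redirect to a page that another item already links to) can be sketched roughly as follows. This is only a minimal illustration, not WRCR's actual code: the function name `find_redirect_conflicts`, the two dictionaries, and the toy data are assumptions made for the example, and the sketch ignores the speed/specificity tradeoff mentioned above.

```python
def find_redirect_conflicts(sitelinks, redirects):
    """Return (item, redirect_title, other_item, target_title) tuples.

    sitelinks: dict mapping item id -> page title on ONE wiki (hypothetical input)
    redirects: dict mapping redirect title -> redirect target title (hypothetical input)
    """
    # Reverse index so we can ask "which item links to this title?"
    item_by_title = {title: item for item, title in sitelinks.items()}

    conflicts = []
    for item, title in sitelinks.items():
        target = redirects.get(title)
        if target is None:
            continue  # the sitelink is not a redirect
        other = item_by_title.get(target)
        if other is not None and other != item:
            # Another item already links to the redirect's target.
            conflicts.append((item, title, other, target))
    return conflicts


# Toy data only; titles are illustrative, not taken from a real dump.
toy_sitelinks = {"Q12902372": "Flautist", "Q11405": "Flute"}
toy_redirects = {"Flautist": "Flute"}
conflicts = find_redirect_conflicts(toy_sitelinks, toy_redirects)
```

With the toy data, the only reported pair is the item linking to the redirect together with the item linking to its target. Whether such a pair is a duplicate or a deliberate link still needs human judgement.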

Thanks for the input. I think it's a good idea to track the extent of the problem. It might be worth adding the number per wiki on a page on wikidata.org

@Lydia_Pintscher can you set this as blocking for T54564 ? Already the current situation is problematic, but expanding it just makes things worse.

Could someone kindly show a case -- or two :) -- of how such links cause a problem?
For example... looking at flautist / flûtist / fløjtenist (Q12902372) with an article e.g. at Fløjtenist but some other links, such as in enwiki, redirecting to the instrument page -- vs the instrument, flute (Q11405) but also recorder (Q187851), etc. etc. ... so aren't such additional links in fact rather helpful ?

Thanks much.

A flute is a woodwind instrument, therefore a wind instrument, therefore an aerophone, therefore a musical instrument, therefore a tool.
Are you saying flautists are tools? Yes, the Q2030511 was fully intended.

A redirect from enwiki:flautist to enwiki:flute can somewhat make sense locally - although it prevents enwiki from having an article about the actual "obnoxious or uptight person" behind the instrument.
But from a Wikidata perspective, when someone expects to click on a link and find an article on the specific subject they're looking at, and in fact finds an article on something related instead of what they were looking for, it is only confusing, not "rather helpful" at all.

aye, someone...clicks on a link to find... ok, not a full-blown article but a subsection.. a paragraph, just a sentence even... however the particular language wiki has been capable of covering the subject so far... isn't this a good thing wikipedia can offer to its readers?

wikipedias have such redirects for a valid reason, don't they.

the "redirect linkworthiness" question might be the task to tackle instead :) would you agree?

It doesn't matter that Wikipedias chose to handle things that way. Wikipedia chose to fill any gap by pointing to a related article. Wikidata chose to fill any gap by actually filling said gap.

Wikidata chose to point to articles specifically about the entity described. If you wish for Q12902372 to have a link to an article on enwiki, have an enwiki article about Q12902372. There can be no approximation. So no, I would not agree, no matter how you put it.

There can be no approximation >

  • ..is an idea rather orthogonal to the project that aims at modeling the real world in its complexity, I dare say.

So your concern, the problem with these "approximate" links as you see it, lest I get it backwards, is about the easily confused wikipedia readers?

In this problematic 'musical instrument / player' linked redirect example, would it be a Dane who checks the dawiki:fløjtenist, then clicks a Q12902372 enwiki:flautist link into en.wikipedia and lands on the enwiki:flute article, then reads through the intro to the second paragraph "A musician who plays the flute can be referred to as a flute player, flautist, flutist...", and then somehow reaches into wiktionary:tool and reads bottom-up? Or worse yet, scrolls down the en:flute page and hits the template:flutes, yeah that'd be harsh. :)

I don't see a real 'confusion' problem in there.

How to select linkworthy redirects among, potentially, a multitude of them, or how to automate this task: I can see a problem here, but maybe there is a solution already?

Thanks again.

The debate is long over.
Links MUST point to articles specifically about the entity described. Not sections, not redirects, not articles about generic entities, not articles about something unrelated. There can be no exceptions. This is especially useful to machines, which are the first consumers of Wikidata - including interwikis on Wikipedia, as they're served to you by machines that need to get them from Wikidata. Machines can't read URL fragments in a meaningful fashion, only browsers will scroll until the top of the screen aligns with the top of the (x|ht|xht)ml element with the id specified as fragment so the user can read from their screen, but that's browser behaviour for web browsing, and should not be expected from other software.

Additionally, assuming the debate wasn't over, this would not be the appropriate place to have it, so please, drop it.
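
The remark above about URL fragments can be illustrated with a small sketch. A fragment is resolved on the client side and is not sent to the server in the HTTP request, so a program that fetches the URL receives the whole page and has to decide for itself what to do with the fragment. The URL and the helper name `split_fragment` below are made up for the illustration.

```python
from urllib.parse import urldefrag


def split_fragment(url):
    """Split a URL into (url_without_fragment, fragment)."""
    base, fragment = urldefrag(url)
    return base, fragment


# Illustrative URL only; the section name is a placeholder.
page_url, section = split_fragment("https://en.wikipedia.org/wiki/Flute#Types")
```

Here `page_url` is what a non-browser client would actually request, while `section` is just a string it would have to interpret on its own.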

@Alphos: Thanks for replying,

not sure I fully understand why you would bring up browsers, as my link-redirect question is about software function or perhaps design (not the URL redirection itself, which is an existing function of Wikimedia software), so ultimately it is about how such links are going to be constructed (permitted, validated...) within wiki software, I think? -- so that'd be at a level, figuratively speaking, prior to rendering by the browser.

Also, the (x|ht|xht)ml elements you mention, xml being about data and html & xhtml about presentation, yes all are about markup but of different nature, speaking of which... I am wondering if a description of an extension http://www.rddl.org/natures/ might be of any interest.

Debate is out of scope :) - I am still looking to answer some questions though.

I have just published a script for Pywikibot which loads problematic items from @Alphos' lists via screen scraping. It depends on another library which should reduce mess caused by merging Wikidata items. I hope this will help us reduce the number of duplicates with sitelinks redirected to other ones.

thiemowmde added a project: patch-welcome.

In an earlier post here, matej_suchanek posted a link to Wikilinks_and_redirects.

As someone experienced in closing RFCs, I'd like to note that the linked discussion is a pretty clear informal consensus that some Wikidata links to redirects are valid and appropriate. If anyone got an impression to the contrary, the key here is to cut through the bombardment of the discussion by a single persistent critic. By my count the discussion was two people opposed to such links, and all six other participants supporting them. It wasn't a formal RFC, but 75% support is generally a pretty clear consensus.

When redundant Wikidata items are directly or indirectly pointing to the same place, obviously they should be cleaned up. However it would be destructive to "clean up" distinct Wikidata items that deliberately point to redirects.

The real fix here is to fix Wikidata's broken restriction preventing two pages from linking to the same Wikidata item, and preventing a Wikidata item from linking to more than one page. You can't arrogantly assert that English gets to define concepts. A structure-X (Houpačka) may be one concept with one word in one language, whereas another language may consider horizontal-structure-X (seesaw) and vertical-structure-X (swing) to be two different concepts.

The English articles for seesaw and swing both need interwiki links to the Czech article Houpačka, and the Czech article Houpačka needs interwiki links to both seesaw and swing. Until we can create those links, the closest thing we can do is have one of the English pages link to a redirect to the Czech page.

Follow-up: I just found an explicit Wikidata consensus that Wikidata links to Wikipedia redirect pages are allowed.

Is this still a valid feature request? The main problem identified in the description (that some articles are linked to in Wikidata via redirects) doesn't seem to actually be a problem, as this is explicitly allowed on Wikidata, and desirable in many situations. If the feature request is just to remove unintentional links via redirects, that should be clarified in the title and description.

Michael closed this task as Resolved. (Edited Wed, Mar 13, 7:51 AM)
Michael claimed this task.
Michael added a subscriber: Michael.

I think with sitelinks to redirects being explicitly allowed if they have the respective badge (to solve the "Bonnie & Clyde"-problem), this should be somewhat mitigated. Also, there has been basically no movement here in years.

The other tasks T143486 and T143485 are still relevant.

Please reopen if there is still a specific open request here, or you think this problem still persists in a significant fashion that needs addressing.
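
For anyone still wanting to separate unintentional redirect sitelinks from badged ones, a rough sketch of the filter might look like this. It assumes the standard Wikibase entity JSON layout for `sitelinks` (each entry with `title` and `badges`). The badge ids in `REDIRECT_BADGES` are, to the best of my knowledge, the "sitelink to redirect" and "intentional sitelink to redirect" badges, but treat them as assumptions and verify them on wikidata.org. The function name, the `redirect_titles` input, and the toy entity are hypothetical.

```python
# Assumed badge item ids; verify on wikidata.org before relying on them.
REDIRECT_BADGES = {"Q70893996", "Q70894304"}


def unbadged_redirect_sitelinks(entity, redirect_titles):
    """Return site ids whose sitelink is a known redirect but has no redirect badge.

    entity: dict in Wikibase entity JSON form, with a "sitelinks" mapping
    redirect_titles: dict mapping site id -> set of titles known to be redirects
                     (hypothetical input, e.g. built from a dump)
    """
    flagged = []
    for site, link in entity.get("sitelinks", {}).items():
        is_redirect = link["title"] in redirect_titles.get(site, set())
        has_badge = bool(REDIRECT_BADGES & set(link.get("badges", [])))
        if is_redirect and not has_badge:
            flagged.append(site)
    return sorted(flagged)


# Toy entity; titles and badges are illustrative only.
toy_entity = {
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Flautist", "badges": []},
        "dawiki": {"site": "dawiki", "title": "Fløjtenist", "badges": []},
        "cswiki": {"site": "cswiki", "title": "Flétnista", "badges": ["Q70894304"]},
    }
}
toy_redirects = {"enwiki": {"Flautist"}, "cswiki": {"Flétnista"}}
flagged_sites = unbadged_redirect_sitelinks(toy_entity, toy_redirects)
```

In the toy data only the enwiki link is flagged: it is a redirect without a badge, whereas the cswiki link is a redirect that carries one and dawiki is not a redirect at all.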