
Mark URLs to pre-2018 pages on Tour de France official websites as dead
Closed, Resolved, Public

Description

Every pre-2018 URL on letour.com and letour.fr redirects to the root page of its site. The bot sees these as alive and is not rescuing them. Can someone please mark these domains as dead or partially dead? Examples: https://en.wikipedia.org/wiki/2012_Tour_de_France for letour.com and https://en.wikipedia.org/wiki/Col_de_la_Bonette for letour.fr.

Event Timeline

BaldBoris updated the task description.

Thanks for the report. For the record, there are about 340 URLs in the database, most of which are affected.

@Cyberpower678 Is there a batch process similar to reporting false positives? Otherwise, I suggest marking the domain as dead, running the bot on all affected pages, and then changing it back.

Cirdan renamed this task from "Tour de France official websites" to "Mark URLs to pre-2018 pages on Tour de France official websites as dead". (Jul 31 2018, 4:30 PM)
Cirdan updated the task description.

I'd say just let the bot eventually see the dead ones as dead.

The bot has picked up only the letour.com links and not the letour.fr ones, as can be seen in this September 2018 diff: https://en.wikipedia.org/w/index.php?title=2016_Tour_de_France&diff=860380679&oldid=858035945

Yeah. It seems that letour.fr actually redirects to https://www.letour.fr/fr/landing-page, which is not the root page. That's why the bot isn't picking up on it.
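To illustrate why a "redirects to the root" check misses letour.fr, here is a minimal sketch of such a heuristic. It is a hypothetical helper (`redirects_to_root`), not the bot's actual code, and the page paths in the usage note are made up: a deep link is flagged only if it ends up at the bare root of the same site, so a redirect to `/fr/landing-page` slips through.

```python
from urllib.parse import urlsplit


def _site(host: str) -> str:
    """Normalize a hostname so www.example.org and example.org compare equal."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def redirects_to_root(original_url: str, final_url: str) -> bool:
    """Hypothetical soft-404 heuristic (NOT the bot's real implementation).

    Returns True when a link that pointed at a specific page ends up,
    after redirects, at the root of the same site.
    """
    orig = urlsplit(original_url)
    final = urlsplit(final_url)
    # A link that already pointed at the root is not suspicious.
    if orig.path in ("", "/"):
        return False
    same_site = _site(orig.hostname or "") == _site(final.hostname or "")
    # Only a bare root (no path, no query) counts; a landing page does not.
    at_root = final.path in ("", "/") and not final.query
    return same_site and at_root
```

For example, `redirects_to_root("http://www.letour.com/old-page", "http://www.letour.com/")` is `True`, while `redirects_to_root("http://www.letour.fr/old-page", "https://www.letour.fr/fr/landing-page")` is `False`, which matches the behaviour described above.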

Can I blacklist the entirety of letour.fr, or do I have to run a script to mark every single URL with an access date of 2017 or earlier as dead?

Letour.com is dead, so blacklist that. Letour.fr has kept only the pages for the current year, apart from http://histo.letour.fr/HISTO/us/TDF/, so mark everything from 2018 and earlier as dead.
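If this were done with a script rather than a domain-wide flag, the rule from the reply above could be expressed roughly as follows. This is a sketch under stated assumptions: `should_mark_dead` and `LAST_DEAD_YEAR` are hypothetical names, the cutoff follows the reply's "2018 and earlier" (the question above used 2017, so adjust as needed), and it says nothing about how the bot actually stores or applies dead-link flags.

```python
from datetime import date
from urllib.parse import urlsplit

# Hypothetical cutoff taken from the reply: access dates in this year or earlier are dead.
LAST_DEAD_YEAR = 2018
# The one archive section the reply says is still live.
HISTO_HOST = "histo.letour.fr"
HISTO_PATH_PREFIX = "/HISTO/us/TDF/"


def _in_domain(host: str, domain: str) -> bool:
    """True for the domain itself or any subdomain of it."""
    return host == domain or host.endswith("." + domain)


def should_mark_dead(url: str, access_date: date) -> bool:
    """Hypothetical classifier for letour.com / letour.fr citations."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # letour.com is dead across the board, regardless of access date.
    if _in_domain(host, "letour.com"):
        return True
    # The historical archive on histo.letour.fr is an explicit exception.
    if host == HISTO_HOST and parts.path.startswith(HISTO_PATH_PREFIX):
        return False
    # Other letour.fr pages survive only for recent seasons.
    if _in_domain(host, "letour.fr"):
        return access_date.year <= LAST_DEAD_YEAR
    # Anything else is outside the scope of this task.
    return False
```

Keeping the date cutoff and the archive exception as named constants makes it easy to rerun the same pass later, since letour.fr appears to drop each season's pages once a new one begins.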