Bot is changing archive.ph to web.archive.org
Closed, Resolved. Public.

Description

Affected wiki(s):
Enwiki + others

Diff(s) (if applicable):
*1111734025 (User-run)
*1116544669 (Automatic)
*1116625546 (User-run, as a test for this report)
*63143717 (On nlwiki, user-run, as a test for this report)

What is happening?:
Bot is changing an archive.ph (a.k.a. archive.today) link located in the |archive-url= parameter to a web.archive.org one.

What should happen instead?:
Bot should not change the link.

List of steps to reproduce (step by step, including full links if applicable):
*Insert any archive.ph url at |archive-url=
*Run bot with the single-page analyzer or submit a job

Other information:
When trying to fix this manually in the URL management tool, I get "URL data error: The archive URL given is not a valid archive." Was able to reproduce on nlwiki.

Event Timeline

Ok, that's... weird. Note that anything with archive.today redirects to archive.ph (including the link above), but it appears that neither http://archive.today/20220922072433.... nor http://archive.ph/20220922072433.... used in this article triggers the bot into changing it the way I described in the initial report (Diffs 1, 2), so despite getting the longer form being a bit more inconvenient it seems like a reasonable alternative.

Styyx renamed this task from "Bot is changing archive.ph to web.archve.org" to "Bot is changing archive.ph to web.archive.org". Oct 17 2022, 3:26 PM
Harej triaged this task as Low priority. Oct 17 2022, 8:10 PM
Harej moved this task from Inbox to Backlog: URLs on the InternetArchiveBot board.

so despite getting the longer form being a bit more inconvenient it seems like a reasonable alternative.

technically it should be the long form anyway per https://en.wikipedia.org/wiki/Help_talk:Using_archive.today#RfC:_Should_we_use_short_or_long_format_URLs?
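A rough sketch of why the long form is easier for a bot to handle: the snapshot timestamp and the original URL can be read straight out of the link, whereas a short link carries neither. This is an illustration only, not IABot's code; the function name `parse_long_form`, the regular expression, and the example URLs in the usage note are assumptions made for the example.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Assumed pattern: only the two hosts named in this task are covered.
# Long form: http(s)://archive.today/<14-digit timestamp>/<original URL>
_LONG_FORM = re.compile(r"^https?://archive\.(?:today|ph)/(\d{14})/(.+)$")


def parse_long_form(archive_url: str) -> Optional[Tuple[datetime, str]]:
    """Return (snapshot time, original URL) for a long-form link, else None."""
    match = _LONG_FORM.match(archive_url)
    if match is None:
        # Short form (or unknown host): nothing can be recovered from the
        # link itself without asking the archive server.
        return None
    try:
        snapshot = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:
        # Fourteen digits that do not form a real date and time.
        return None
    return snapshot, match.group(2)
```

For instance, `parse_long_form("http://archive.today/20220922072433/https://example.org/page")` yields the snapshot time and `https://example.org/page`, while a short link such as `https://archive.ph/AbCdE` yields `None`.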

Cyberpower678 raised the priority of this task from Low to High. Dec 4 2022, 1:03 AM
Cyberpower678 added a subscriber: Natuur12.

It is clear that the operators of archive.ph have neither the interest nor the ability to address the issue of IABot being unable to communicate with their servers. A workaround that doesn't require users to clean up after the bot, or to appease it, needs to be implemented in the bot code.

Code is being tested.

The old behavior of the bot was that any archive URL that couldn't be validated directly from the URL itself had to be validated by making a request to the archive provider. If that request failed for any reason, the archive URL failed validation and was considered invalid, resulting in its subsequent replacement on wiki. The original solution was to have the archive provider simply exempt the bot from anti-bot checks, but this never happened.

The new behavior of the bot is to instead raise a "partially validated" flag, which the bot code will look for when vital metadata is missing. The bot will not acknowledge the archive URL as valid, but it will not replace the archive URL on wiki either.
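The difference between the old and new behavior can be sketched roughly as follows. This is a minimal illustration and not the bot's actual code: the names `Validation`, `validate_archive`, `should_replace_on_wiki`, and the two injected callables are hypothetical, and treating an empty provider response as "invalid" is an assumption made for the example.

```python
from enum import Enum
from typing import Callable, Optional


class Validation(Enum):
    VALID = "valid"
    PARTIAL = "partially validated"  # the new flag described above
    INVALID = "invalid"


def validate_archive(
    archive_url: str,
    parse_from_url: Callable[[str], Optional[dict]],
    fetch_metadata: Callable[[str], dict],
) -> Validation:
    """Classify an archive URL (hypothetical helper, not IABot's API)."""
    # Step 1: try to validate directly from the URL itself.
    if parse_from_url(archive_url) is not None:
        return Validation.VALID
    # Step 2: otherwise ask the archive provider.
    try:
        metadata = fetch_metadata(archive_url)
    except OSError:
        # Old behavior: a failed request meant "invalid", so the link was
        # replaced. New behavior: flag it as only partially validated.
        return Validation.PARTIAL
    # Assumption: an empty answer means the provider does not know the URL.
    return Validation.VALID if metadata else Validation.INVALID


def should_replace_on_wiki(status: Validation) -> bool:
    # A partially validated link is not treated as valid,
    # but it is left alone on wiki.
    return status is Validation.INVALID
```

With a provider that blocks the bot, `fetch_metadata` would raise, the status would come back as `Validation.PARTIAL`, and `should_replace_on_wiki` would return `False`, so the archive.ph link stays in place.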

Cyberpower678 claimed this task.