IABot overwriting Ghostarchive
Closed, Resolved (Public)

Description

Sometimes IABot will overwrite a Ghostarchive link that has already been placed in a cite web template with a Wayback link when someone uses the management interface. Of course, this isn't a big deal: if Wayback has the archived page as well, that should be fine. But sometimes there are good reasons for using an archive site other than Wayback (e.g., the website is broken on Wayback, or certain content doesn't work on it).

Of course, {{cbignore}} exists, but I don't think that's necessarily the right way to go (or maybe it is).

Is this intended behavior or a bug?

Example with diffs here: https://en.wikipedia.org/w/index.php?title=GStreamer&action=history

EDIT 10/24/2021: Changed the title to more accurately describe what's going on.

Event Timeline

Looks like Ghost Archive is a service the bot, and I, don't yet recognize as a legitimate archive service. The bot will need to be made aware of it first.

archivelover renamed this task from IABot overwriting other archive sites to IABot overwriting Ghostarchive. Oct 24 2021, 9:43 PM
archivelover updated the task description.
Cyberpower678 changed the task status from Open to Stalled. Dec 15 2021, 8:56 PM
Cyberpower678 triaged this task as Medium priority.

Ghostarchive is missing needed elements of snapshot data in the URLs themselves. They will need to make the URLs compliant first before IABot can accept them.
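
To illustrate what "snapshot data in the URL" means, here is a minimal Python sketch, not IABot's actual implementation. It assumes the common Wayback URL shape (a 14-digit timestamp followed by the original URL); the helper name `parse_snapshot` and the Ghostarchive ID in the usage note are hypothetical. A Wayback URL yields both the capture time and the original URL, while an opaque-ID URL yields neither.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class Snapshot(NamedTuple):
    timestamp: datetime   # when the snapshot was captured
    original_url: str     # the URL that was archived


# Assumed Wayback shape: https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot(archive_url: str) -> Optional[Snapshot]:
    """Hypothetical helper: recover snapshot data from the archive URL alone.

    Returns None when the URL does not carry a timestamp and original URL.
    """
    match = WAYBACK_RE.match(archive_url)
    if match is None:
        return None
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return Snapshot(timestamp=captured, original_url=match.group(2))
```

For example, `parse_snapshot("https://web.archive.org/web/20211024214300/https://example.com/page")` returns the capture time and `https://example.com/page`, whereas a URL of the form `https://ghostarchive.org/archive/AbCdE` (made-up ID) returns `None`, because nothing in it identifies when or what was archived.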

Cyberpower678 changed the task status from Stalled to Open. Dec 15 2021, 9:03 PM
Cyberpower678 raised the priority of this task from Medium to High.

Through experimentation, I updated some of the URL formats here: https://en.wikipedia.org/wiki/Wikipedia:List_of_web_archives_on_Wikipedia.

If you don't want to implement the other URL formats, one option is to convert any other format to the universal one (which you already implemented in https://github.com/internetarchive/internetarchivebot/commit/3c9b7066366af7719102e885f7b16e41fb805c7e ) using the original URL.

Cyberpower678 claimed this task.

IABot has again overwritten a Ghostarchive link, as seen here: https://en.wikipedia.org/w/index.php?diff=prev&oldid=1091413557&diffmode=source. While the Wayback archive thankfully works in this case, this should not happen in general, since, for example, Wayback cannot archive Instagram but Ghostarchive can.