Deploy InternetArchiveBot on the German Wikipedia (dewiki)
Closed, Resolved, Public

Description

The German Wikipedia is the fourth largest Wikipedia, with just under 2 million articles. This should be the fourth wiki to be deployed to.

Event Timeline

Cyberpower678 renamed this task from Deploy InternetArchiveBot on the Cebuano Wikipedia (cebwiki) to Deploy InternetArchiveBot on the German Wikipedia (dewiki). May 24 2016, 11:33 PM
Cyberpower678 triaged this task as High priority.

It seems like we're heading towards no consensus here. :/

I see only one comment here. Was this discussion adequately advertised? Village Pumps and such?

That is the village pump. It was also advertised elsewhere by a German user there.

The problem with dewiki is that they are strongly opposed to bots editing articles. They would rather deal with a backlog of broken links than let the bot replace links with archives.

That said, maybe the proposal can be adapted so that the semi-automated tools are usable on dewiki and the bot is queued up on demand. That adds human control over the bot and may get us the consensus we need to deploy to dewiki.

I think we should take another stab at convincing dewiki to allow IABot on their articles. dewiki is a huge player in the top wikis.

Looks like this has turned into a full-blown RfC. The RfC is still under construction.

Module deployed. Single page analysis tool can be used on articles.

Bot is now awaiting final approval.

@Cirdan While the bot issues and how it should behave are being discussed, it's probably good to mention here what should be changed so the bot behaves as it should.

The bugs which surfaced in the accidental test run are the following:

  • for {{Webarchiv}} the archive bot parameter is archiv-bot not arkiv-bot (I fixed all edits with AWB)
  • for {{Internetquelle}}, the parameter is archiv-bot as well and not archivebot (needs to be fixed, I can take care of a significant portion with AWB and then see whether the remaining ones are few enough to be fixed manually)
  • all archive bot parameter values should include both the name of the bot and a human-readable timestamp for user information and potential future analysis (it's probably best to use the same format you are required to use for {{Internetarchiv}})
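The parameter fixes listed above could be sketched roughly as follows. This is only an illustration, not IABot's actual code: the function names are made up, the regex handles only template calls without nested templates, and the timestamp layout is an assumption, since the thread only asks for the bot name plus a human-readable timestamp.

```python
import re
from datetime import datetime

# Misspelled parameter names reported in the thread, keyed by template.
WRONG_NAMES = {
    "Webarchiv": "arkiv-bot",
    "Internetquelle": "archivebot",
}
CORRECT_NAME = "archiv-bot"


def fix_parameter_names(wikitext):
    """Rename the misspelled bot parameter inside the two affected templates."""
    for template, wrong in WRONG_NAMES.items():
        # Matches one template call without nested templates,
        # e.g. {{Webarchiv|url=...|arkiv-bot=...}}.
        call = re.compile(r"\{\{\s*" + re.escape(template) + r"\s*\|[^{}]*\}\}")
        param = re.compile(r"\|\s*" + re.escape(wrong) + r"\s*=")

        def rename(match, param=param):
            return param.sub("|" + CORRECT_NAME + "=", match.group(0))

        wikitext = call.sub(rename, wikitext)
    return wikitext


def bot_parameter_value(bot_name, when):
    """Combine a timestamp and the bot name (the layout is an assumption)."""
    return f"{when:%Y-%m-%d %H:%M:%S} {bot_name}"
```

A cleanup run like the AWB one mentioned above would apply `fix_parameter_names` to each affected page's wikitext and save only when the text changed.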

Regarding the double links resulting from adding {{Webarchiv}}/{{Toter Link}} after an existing, non-template URL, I suggest turning

[http://originalurl.tld original title] potential descriptive text

into

<!-- [http://originalurl.tld original title] -->{{Webarchiv|url=originalurl.tld|text=original title|...}} potential descriptive text

This makes it very easy to revert the bot edit manually, keeps any formatting of the original link, but does not result in two separate links to the same target.
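As a rough sketch of that rewrite, assuming the dead URL is already known: the function name is hypothetical, the archive-specific parameters elided as `...` in the proposal are left out, and the full URL is passed through unchanged.

```python
import re

# A bracketed external link: [http://example.tld link title]
BRACKET_LINK = re.compile(r"\[(https?://\S+) ([^\]]+)\]")


def wrap_in_webarchiv(wikitext, dead_url):
    """Comment out the bracketed link to dead_url and put {{Webarchiv}} after it."""

    def replace(match):
        url, title = match.group(1), match.group(2)
        if url != dead_url:
            return match.group(0)  # leave links to other targets untouched
        # Only url and text are filled in; archive parameters are omitted here.
        template = "{{Webarchiv|url=" + url + "|text=" + title + "}}"
        return "<!-- " + match.group(0) + " -->" + template

    return BRACKET_LINK.sub(replace, wikitext)
```

Because the original link survives inside the HTML comment, reverting by hand means deleting the template and the comment markers.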

Regarding the talk page notifications I'm not sure yet. The Meinungsbild does not require the bot to do that and I suggest to disable this feature for de-wiki. I'm working on a comprehensive how-to for the bot which we will link from the "this link was edited by a bot"-markers.

Perfect thanks.

Also, per the bot requests page, the bot should not tag a cite template that is already using offline=1. On enwiki, deadurl=yes does nothing without an archive URL to go with it, but on dewiki the opposite is true.

You are talking about edits like this which Wi-luc-ky complained about, right? In that case, yes, the additional {{Toter Link}} is not necessary. {{Internetquelle}} (and all other cite-templates in fact) invoke {{Toter Link}} internally. So whenever there is an offline/deadurl parameter, it is sufficient to set this and add the archivebot-parameter as well. Then the correct information will be displayed (and if not, that's our task as a community to take care of it).

If the bot finds a template where the link is already marked as offline, it should of course still try to find an archived version and if it does, add both the archive URL as well as the archivebot-parameter.
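The rules from the last few comments might be sketched like this. The function name, the dict representation of template parameters, and the `archiv-url` parameter name are assumptions for illustration; any non-empty `offline` value is treated as "already marked offline".

```python
def update_cite_params(params, archive_url, bot_value):
    """Return updated cite-template parameters for a dead link.

    params: parameter name -> value for a cite template such as {{Internetquelle}}.
    archive_url: archived copy of the link, or None if none was found.
    """
    updated = dict(params)
    already_offline = bool(updated.get("offline", "").strip())

    if not already_offline:
        # Setting the parameter is sufficient: the cite template invokes
        # {{Toter Link}} internally, so no separate template is added.
        updated["offline"] = "1"
        updated["archiv-bot"] = bot_value

    if archive_url:
        # Even for links already marked offline, record the archive
        # together with the bot marker. "archiv-url" is an assumed name.
        updated["archiv-url"] = archive_url
        updated["archiv-bot"] = bot_value

    return updated
```

A template that is already marked offline and has no archive available comes back unchanged, which matches the request not to tag such templates again.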

@MGChecker and I have been discussing this on IRC. Regarding the last part, it might be best to restore IABot's original behavior of simply replacing the link, and then have the template render the original link when archiv-bot is set.

I just ran another trial and realized I forgot to fix the archiv-bot parameter bug. :/ I will have to fix those manually.

For English cite templates, it might be better to change the {{{archivebot|}}} to {{{archivebot|{{{archiv-bot|}}}}}}. Otherwise I have to code in different use cases, which is a pain.
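For readers less familiar with wikitext defaults, the nested expression behaves roughly like the lookup below. This is an illustrative model with a hypothetical function name; it follows MediaWiki's rule that a default applies only when the parameter was not passed at all.

```python
def resolve_bot_parameter(args):
    """Model of {{{archivebot|{{{archiv-bot|}}}}}} for a dict of passed arguments."""
    if "archivebot" in args:
        # A parameter that was passed wins, even if its value is empty.
        return args["archivebot"]
    # Otherwise fall back to archiv-bot, and finally to the empty string.
    return args.get("archiv-bot", "")
```

With this fallback in the template, the bot can write archiv-bot everywhere and the English-named cite templates still pick it up.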

We are still getting double links, resulting from adding {{Webarchiv}} with a Wayback ID to {{Webarchiv}} with a Webcite ID, or adding {{Webarchiv}} with a Webcite ID to an existing {{Webarchiv}} with a Webcite ID.
Examples:

As we discussed, this is not a bot problem, but someone manually ensured that the original link would still be visible. Prior to my change to the Webarchiv template two days ago, the original link would not be shown to the user. This edit shows this clearly and fixed it.

YAY! This was a tough challenge. Very well done! Congrats and thank you for all your work on this!