
Start a discussion on English Wikipedia about allowing IABot to add archive URLs for non-dead links
Closed, DeclinedPublic

Description

Per T177676, we should see if the English Wikipedia community supports enabling this option.

Event Timeline

Restricted Application added a subscriber: Cyberpower678.

I certainly oppose relying on archive.is. It has proven to be notoriously unreliable. In 40% of the cases you will get a bad archive snapshot from them.

As for the analysis of proactive archiving: while it’s beneficial to ensure that the snapshot exists, even if Enwiki adopts proactive archiving of all links, which I doubt it will, it is most certainly going to be rejected on other wikis. Thus I don’t foresee IABot’s current functionality being replaced in the foreseeable future. This is already a highly controversial subject on enwiki and, from my observations so far, will undoubtedly end with no consensus.

That being said, while IABot’s DB may be large, it’s certainly not useless. IABot maintains a lot of archive URLs confirmed to be working with various services.

Did I miss anything? The analysis was rather long.

@Cyberpower678: Thanks for the comments. If you're correct that English Wikipedia will reject proactive archiving, that will basically leave us at the status quo (which is fine with me). It's probably worth at least having the discussion though to see what folks think of the idea.

Some brief discussions have been had already on the topic of using IABot for live links: BOTN July 2017, BOTN September 2017, VPM October 2017.

> @Cyberpower678: Thanks for the comments. If you're correct that English Wikipedia will reject proactive archiving, that will basically leave us at the status quo (which is fine with me). It's probably worth at least having the discussion though to see what folks think of the idea.

No doubt. I'm not going to say it's happening with 100% certainty, but https://tools.wmflabs.org/iabot?page=runbotsingle has stirred up some controversy with the additional option users are given, namely the "Add archives to all non-dead references (optional)" bit. Some users have been using it in the belief that having the archives there while the links are still alive is useful, while others feel it needlessly bloats the page and also confuses readers. As far as I've observed, opinion on what IABot should be doing, and on how users should be using my tool, is pretty divided.

Even if the proposal passes and all the links have archives added to them, IABot would still be useful for maintaining its DB. It doesn't have to edit Wikipedia, but it can still crawl it and learn, and it applies what it learns to other Wikipedias. Not to mention the bot has an API; tools and gadgets can be written to use it. The DB is optimized for performance and can handle quite a bit of stress. See the instance cyberbot-db-01 at Grafana (https://grafana-labs.wikimedia.org/dashboard/db/labs-project-board?var-project=cyberbot&var-server=All&from=now-1h&to=now&orgId=1&refresh=10s). At the moment the bot has 32 concurrent connections open to it, all of them active. See https://meta.wikimedia.org/wiki/InternetArchiveBot/API to read about what automated processes can do with IABot. A bot is already tied into IABot.

We should also definitely mention use of the dead-url parameter as that seems to be something that folks aren't clear about.
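A rough sketch of what a proactive-archiving edit adds to a single live citation may help when explaining that parameter. It assumes the CS1 {{cite web}} parameters |archive-url=, |archive-date= and |dead-url=, where dead-url=no keeps the live original as the title link and shows the archive as a secondary "Archived from the original" link, and it assumes the usual Wayback Machine URL shape https://web.archive.org/web/<14-digit timestamp>/<original URL>. The helper name add_archive_params is hypothetical and this is not how IABot itself is implemented.

```python
from datetime import datetime


def add_archive_params(cite: str, original_url: str, timestamp: str, dead: bool = False) -> str:
    """Append archive parameters to a {{cite web}} template string.

    Hypothetical helper for illustration only. Assumes the CS1 parameter
    names |archive-url=, |archive-date= and |dead-url=, and a Wayback
    snapshot identified by a 14-digit YYYYMMDDhhmmss timestamp.
    """
    if not cite.endswith("}}"):
        raise ValueError("expected a template ending in '}}'")
    snapshot = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    archive_url = f"https://web.archive.org/web/{timestamp}/{original_url}"
    # Day-month-year style, e.g. "19 September 2013".
    archive_date = f"{snapshot.day} {snapshot.strftime('%B')} {snapshot.year}"
    # dead-url=no: the link is still alive, so readers keep getting the
    # original URL first; the archive is offered as a fallback.
    extra = (
        f" |archive-url={archive_url}"
        f" |archive-date={archive_date}"
        f" |dead-url={'yes' if dead else 'no'}"
    )
    # Insert the new parameters just before the closing braces.
    return cite[:-2] + extra + "}}"


before = "{{cite web |url=http://example.com/ |title=Example}}"
after = add_archive_params(before, "http://example.com/", "20130919044612")
print(after)
print(f"{len(after) - len(before)} bytes of wikitext added to this one reference")
```

The byte count also gives a rough sense of the per-reference wikitext growth that the "bloat" objection is about, multiplied across every live reference on a page.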

Looking at the discussions that Max linked to, I don't think we're going to get to a good consensus around this. The positions seem to be:

a) this is obviously useful, why wait for the link to go dead?
b) this is unnecessary, adds bloat to the wikitext and increases page load time

In the three discussions from July, September and October, I didn't see anybody say "you're right, I've changed my opinion." The new information that we could add to the discussion is that 73 people support-voted this and nobody opposed it in the survey, but seeing how all three discussions played out, I'm not sure it's worth trying to create a more authoritative RfC.

From my reading of the discussions, it seems like just as many are in favor of it as there are those opposed to it. The reason it's been brought up over and over is that some people strongly feel the need for it. I think a final RfC would help in finding consensus either way. We could ping the 73 people who voted for it to chime in on the RfC as well.

Aside, I don't buy the argument about the article size increase that people have been citing in the discussions as a reason to not do this.

> Looking at the discussions that Max linked to,...

:( But I linked to them!

> From my reading of the discussions, it seems like just as many are in favor of it as there are those opposed to it. The reason it's been brought up over and over is that some people strongly feel the need for it. I think a final RfC would help in finding consensus either way. We could ping the 73 people who voted for it to chime in on the RfC as well.

Indeed, it is mixed. However, when you're proposing it for 5.5 million pages and a hundred thousand authors (or the active 2,000 anyway), 50% isn't a great ratio.

The argument to preserve verifiability probably hasn't been heard well enough. Or stated well enough...

> Aside, I don't buy the argument about the article size increase that people have been citing in the discussions as a reason to not do this.

It is a really terrible argument from the point of view of the final HTML served to the reader (images, of course, dominate that bit stream), but it can be a real problem in wikitext when every ref contains an archive URL, usually a really long one. (I generally prefer LDR, so it's not a problem in the articles I mostly write, but in [[Barack Obama]]? Or even in the generic article, where everyone still has to play nice together.) Wikitext folding and highlighting by default are on the way, or so I hear, which will help with that particular problem.

@Izno, sorry -- I misread the icon, and thought you were Max. :)

I totally agree that "wikitext bloat" is not a persuasive argument, especially now that we have syntax highlighting.

People seem to be triggered by edits like this simply because it adds 563 archive URLs in one edit rather than three, which people probably wouldn't mind as much. Seeing a bot make hundreds of changes at once makes people think that something is "out of control" and needs to be stopped.

Part of my reluctance to start another discussion about this is that it's the beginning of December, and we should be wrapping up work on 2017's wishes, not beginning a discussion about a project we haven't started to work on yet. I know that from the point of view of the people who voted for this wish, that's not fair or logical, but that's the way we have to work, on this unusual team.

> Part of my reluctance to start another discussion about this is that it's the beginning of December, and we should be wrapping up work on 2017's wishes, not beginning a discussion about a project we haven't started to work on yet. I know that from the point of view of the people who voted for this wish, that's not fair or logical, but that's the way we have to work, on this unusual team.

It’s not like it’s much of a project. You would simply flip a switch and the project is practically finished.

TBolliger removed a project: Community-Tech.
TBolliger added a subscriber: DannyH.