arkivurl= should contain actual URL, not a Wikimedia-related test URL
Closed, Resolved · Public

Event Timeline

There are no other articles containing the link.

This is an issue with the URL resolver, which is resolving the URL to "https:///". Since they are all seen as the same URL, the bot is inserting the same garbage URL into the sources. Since the problem is limited to these articles, the bot can safely operate without fear of additional disruption. If it continues to make these errors, the run page for the bot can be found at https://en.wikipedia.org/wiki/User:InternetArchiveBot

I fixed what was causing the first diff to happen. It wasn't resolving the template's URL properly.

I will point out that the URLs in the second diff are broken, not to be confused with dead. They are improperly formatted, and not even a web browser can read them right. I'm just getting a blank tab when I click on one. They need to be fixed on wiki.

I'm not going to fix the issue in the second diff. If a browser can't open it, then IABot can't be expected to handle it correctly either.

@Cyberpower678: the bot does not need to fix broken stuff; the bot just needs to not make it worse.

(FWIW, the second URL is double-escaped, and has repeated '://' at the front;

urllib.unquote(URL).replace('://://','://')

gives:

Please can the revision control (Git) link to the proposed fix be posted here?
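
For anyone trying the one-liner above on Python 3: `urllib.unquote` moved to `urllib.parse.unquote`. A minimal sketch follows, assuming the same two repairs are wanted (one round of percent-decoding, then collapsing the repeated '://'). The function name `repair_url` and the sample URL are made up for illustration; the actual URL from the diff is not reproduced here.

```python
from urllib.parse import unquote


def repair_url(url):
    """Undo one level of percent-escaping and collapse a doubled '://'.

    Hypothetical helper: mirrors the one-liner quoted in this task,
    not anything in the bot's code.
    """
    return unquote(url).replace("://://", "://")


# Made-up input: doubled '://' and a double-escaped space (%2520).
print(repair_url("http://://example.org/a%2520b"))
```

Only one level of escaping is removed, so a double-escaped URL comes out single-escaped, which is what a browser expects.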

I did however add a check to the routine to ignore instances where it can't read the URL correctly.
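
To illustrate what a check of that kind could look like (this is not the bot's code, which has not been posted in this task; it is only a rough Python sketch under my own assumptions): parse the URL and skip it unless it has an http(s) scheme and a non-empty host. The name `is_usable_url` is hypothetical.

```python
from urllib.parse import urlsplit


def is_usable_url(url):
    """Return True only if the URL parses to an http(s) scheme and a host.

    Hypothetical helper; the accepted schemes are an assumption.
    """
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs outright.
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


for candidate in ("https://example.org/page", "https:///", "http://://example.org/"):
    print(candidate, is_usable_url(candidate))
```

With a guard like this, a URL that resolves to "https:///" or that starts with a doubled '://' would be left alone rather than rewritten.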

The Git link isn't there yet because the fix hasn't been committed or deployed.

https://sv.wikipedia.org/w/index.php?title=Studenthuset,_Stockholms_universitet&diff=39866435&oldid=39835828 is an edit made with the new code, and as you can see, it ignored the badly formatted URLs.

@Cyberpower678: The combined diff for "v1.3.2" is:

Which change is the "Added a sanity check to ignore URLs it can't process correctly." part?