
Stop expanding perma.cc to long-form URL
Closed, Resolved · Public

Description

perma.cc has short-form URLs:

https://perma.cc/9CRH-WVZ8

These could previously be expanded to a long-form URL:

https://perma-archives.org/warc/20161118180310/http://www.vexillology.org/FB-234-WSO-20161117-Final.pdf

However, this no longer works: the perma-archives.org domain is no longer operational for retrieving archives, and perma.cc no longer seems to support it.

IABot should continue to support perma.cc links to the extent of adding them to the database, but it should not try to convert them to long-form or perform any other operations on them.

If this is not possible due to a missing 14-digit date, it should ignore perma.cc links entirely.
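A minimal sketch of the requested behaviour, in Python for illustration only (IABot itself is not written in Python). The function names, the returned dictionary and the assumed ID shape (two groups of four alphanumeric characters, as in 9CRH-WVZ8) are hypothetical and are not taken from IABot or from perma.cc documentation:

```python
import re
from urllib.parse import urlsplit

# Assumption: a short-form ID looks like 9CRH-WVZ8 (4 + 4 alphanumerics).
_PERMA_ID = re.compile(r"^/([A-Za-z0-9]{4}-[A-Za-z0-9]{4})/?$")


def perma_short_id(url):
    """Return the record ID if url is a short-form perma.cc link, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ("perma.cc", "www.perma.cc"):
        return None
    match = _PERMA_ID.match(parts.path)
    return match.group(1).upper() if match else None


def handle_archive_url(url):
    """Hypothetical policy: store short-form perma.cc links untouched."""
    record_id = perma_short_id(url)
    if record_id is not None:
        # No expansion to long-form and no other rewriting of the URL.
        return {"action": "store_as_is", "url": url, "id": record_id}
    # Anything else goes through whatever handling already exists.
    return {"action": "default", "url": url}
```

The point of the sketch is that the short-form URL is passed through unchanged, so nothing depends on the long-form domain being reachable.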

Update: Reopened because there are still active bugs; see the comments.

Event Timeline

Restricted Application added a subscriber: Cirdan.
Cirdan renamed this task from "perma-archives.org adds extra slashes" to "IABot adds extra slashes to perma-archives.org". · May 18 2018, 1:36 PM
Cirdan updated the task description.
Cirdan triaged this task as Low priority.
Cirdan added a subscriber: Cyberpower678.

@Green_Cardamom Can you provide a couple more diffs, please? This way it would be easier to spot what's wrong in APII.php where these URLs are generated.

@Cirdan Here you go:

Samuel Alito - shows recursive

Last two edits of "Nominated Member of Parliament"

Settlement (litigation) - real-time expansion from short-form

Whitney Smith has an invalid snapshot date of 1970. Caused by a header redirect. A header query shows:

Location: https://perma-archives.org/warc/20161118180310/http://www.vexillology.org/FB-234-WSO-20161117-Final.pdf

This provides the correct snapshot information.

Mikhail Farikh - another 1970 header redirect. The correct snapshot date is in the header.


If you need any more, let me know.
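For the 1970 cases, the header shown above already carries the 14-digit snapshot date. A rough Python sketch of reading it from the Location value follows; the function name and the regular expression are illustrative assumptions and are not how APII.php does it:

```python
import re
from datetime import datetime, timezone

# Assumed long-form layout, taken from the Location header quoted above:
#   https://perma-archives.org/warc/<14-digit timestamp>/<original URL>
_WARC = re.compile(r"^https?://perma-archives\.org/warc/(\d{14})/(.+)$")


def parse_long_form(location):
    """Return (snapshot_time, original_url), or None if the value does not match."""
    match = _WARC.match(location.strip())
    if not match:
        return None
    stamp, original = match.groups()
    try:
        when = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    except ValueError:
        # 14 digits that do not form a valid date and time.
        return None
    return when.replace(tzinfo=timezone.utc), original
```

Returning None when no timestamp is found, rather than a default value, is one way to avoid recording a 1970 date when the timestamp cannot be read.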

Cirdan moved this task from Backlog to IABot on the User-Cirdan board.

@Green_Cardamom Thanks. Does the problem also appear in recent IABot edits? Hopefully this has been fixed in 2.0. If not, I'm still happy to have a look at it, but I would like to verify that the bug is still there.

Nothing in the archive handling routines has been updated yet. That will happen when I process the tickets in the archive requests section.

Vvjjkkii renamed this task from "IABot adds extra slashes to perma-archives.org" to "h1caaaaaaa". · Jul 1 2018, 1:10 AM
Vvjjkkii removed Cirdan as the assignee of this task.
Vvjjkkii raised the priority of this task from Low to High.
Vvjjkkii updated the task description.
Green_Cardamom renamed this task from "h1caaaaaaa" to "IABot adds extra slashes to perma-archives.org". · Jul 1 2018, 4:20 AM
Green_Cardamom assigned this task to Cirdan.
Green_Cardamom lowered the priority of this task from High to Medium.
Green_Cardamom updated the task description.
Green_Cardamom removed a subscriber: Cirdan.
Cirdan lowered the priority of this task from Medium to Low. · Jul 2 2018, 4:50 AM
Cirdan added a project: User-Cirdan.
Cirdan subscribed.
Green_Cardamom renamed this task from "IABot adds extra slashes to perma-archives.org" to "Stop expanding perma.cc to long-form URL". · Aug 9 2018, 9:05 PM
Green_Cardamom updated the task description.

I've changed the ticket to reflect no longer supporting perma.cc since they don't have a long-form option that works.

Cirdan removed a project: User-Cirdan.
Green_Cardamom raised the priority of this task from Low to Medium. · Aug 10 2018, 2:45 AM

Also note that I emailed perma.cc to explain that Wikipedia needs long-form URLs that include the date, but I have not had a response. Their API (the one at /timemap/) is also no longer working.

Update: I've removed all perma-archives.org links from the IABot database and from Enwiki (converted to other providers).

Long-form URLs are functional again.

There is still at least one bug. Test cases:

https://en.wikipedia.org/w/index.php?title=User%3AGreenC%2Ftestcases%2Fpermacc&type=revision&diff=867715234&oldid=867715190

Case #2 is adding an extra slash. It also ignores anything that is not in an archiveurl; I'm not sure whether that is a bug or intentional.

Green_Cardamom updated the task description.
Cyberpower678 claimed this task.