Unable to update URL even though API reports success
Closed, Invalid; Public

Description

For ID 23388724

https://tools.wmflabs.org/iabot/index.php?page=manageurlsingle&url=http%3A%2F%2Fhitparade.ch%2Fshowitem.asp%3Finterpret%253DNerina%252BPallot%2526titel%253DEverybody%2527s%252BGone%252BTo%252BWar%2526cat%253Ds

Sending POST data:

action=modifyurl&urlid=23388724&overridearchivevalidation=1&archiveurl=http%3A%2F%2Fwww%2Ewebcitation%2Eorg%2F6IBvTspq8%3Furl%3Dhttp%3A%2F%2Fhitparade%2Ech%2Fshowitem%2Easp%3Finterpret%253DNerina%252BPallot%2526titel%253DEverybody%2527s%252BGone%252BTo%252BWar%2526cat%253Ds

The archiveurl decodes as:

http://www.webcitation.org/6IBvTspq8?url=http://hitparade.ch/showitem.asp?interpret%3DNerina%2BPallot%26titel%3DEverybody%27s%2BGone%2BTo%2BWar%26cat%3Ds
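The decoding step can be checked with a short sketch, assuming Python's standard urllib.parse module stands in for whatever the server does with the form field. The variable names are illustrative only. One decoding pass yields the archiveurl shown above, and a second pass exposes the original target URL with its "+" separators intact.

```python
from urllib.parse import unquote

# The archiveurl value exactly as it appears in the POST body.
posted = (
    "http%3A%2F%2Fwww%2Ewebcitation%2Eorg%2F6IBvTspq8%3Furl%3D"
    "http%3A%2F%2Fhitparade%2Ech%2Fshowitem%2Easp%3Finterpret%253DNerina"
    "%252BPallot%2526titel%253DEverybody%2527s%252BGone%252BTo%252BWar"
    "%2526cat%253Ds"
)

# First pass: what a form parser hands to the application.
once = unquote(posted)

# Second pass: the target URL embedded after "?url=".
# unquote() leaves "+" untouched; only unquote_plus() turns it into a space.
twice = unquote(once)

print(once)
print(twice)
```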

The API reports "success"; however, the archive URL did not change, and the URL log in the interface reports "Changed the archive URL to <the old URL>". That is strange, because I sent a new URL in the POST.

Event Timeline

The archive URL is being sanitized. The sanitizer tidies the URL for readability; the change has no bearing on what loads.

True, because WebCite drops anything beyond "?url=", and in theory the link should work whether the query uses "+" or "%20". But what if the site is not flexible in this way?

The original URL uses "+" instead of "%20":

http://hitparade.ch/showitem.asp?interpret=Nerina+Pallot&titel=Everybody%27s+Gone+To+War&cat=s

Thus, archive.is has it available as "+":

http://archive.fo/20130703190443/http://hitparade.ch/showitem.asp?interpret=Nerina+Pallot&titel=Everybody's+Gone+To+War&cat=s

But not in "%20":

https://archive.fo/http://hitparade.ch/showitem.asp?interpret=Nerina%20Pallot&titel=Everybody%27s%20Gone%20To%20War&cat=s

Or:

https://archive.fo/20130703190443/http://hitparade.ch/showitem.asp?interpret=Nerina%20Pallot&titel=Everybody%27s%20Gone%20To%20War&cat=s

This is ultimately a failing of archive.is for not being flexible, but it is our problem as well.

I don't have a good solution. The problem arises later, when trying to hunt down new archives and relying on the data in the url= portion. In this case the citation is a cite web template, so the original URL using "+" is retained, but that is not always the case.
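The mismatch can be illustrated with a minimal sketch, assuming Python's urllib.parse; the helper name query_params is made up for this example. Under form-style query parsing the "+" and "%20" spellings carry the same parameters, but they are different strings, so a service that looks up snapshots by the literal URL will find only one of them.

```python
from urllib.parse import parse_qs, urlsplit

plus_form = (
    "http://hitparade.ch/showitem.asp"
    "?interpret=Nerina+Pallot&titel=Everybody%27s+Gone+To+War&cat=s"
)
pct_form = (
    "http://hitparade.ch/showitem.asp"
    "?interpret=Nerina%20Pallot&titel=Everybody%27s%20Gone%20To%20War&cat=s"
)

def query_params(url: str) -> dict:
    """Decode the query string; parse_qs treats "+" and "%20" as a space."""
    return parse_qs(urlsplit(url).query)

# Literal comparison: the two spellings are different lookup keys.
same_string = plus_form == pct_form

# Semantic comparison: both decode to the same parameters.
same_params = query_params(plus_form) == query_params(pct_form)

print(same_string, same_params)
```

Comparing decoded parameters is one way to recognize that two spellings point at the same page, but it does not recover which spelling an archive actually stored.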

Each archive service is handled differently internally, and the URL is converted as appropriate for that service. For archive.is, the original formatting is preserved. If any cases come up where the conversion actually results in a URL that does not load, please do report them.
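As a rough sketch of what per-service handling could look like: this is not the tool's actual code. The function names, the host list, and the WebCite tidying rule (decoding one layer of percent-encoding after "?url=") are all assumptions made for illustration.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical host list for the archive.is family.
ARCHIVE_IS_HOSTS = {"archive.is", "archive.fo", "archive.today"}

def tidy_webcite(url: str) -> str:
    """Assumed rule: decode one layer of encoding in the url= value."""
    head, sep, target = url.partition("?url=")
    return head + sep + unquote(target) if sep else url

def normalize_archive_url(url: str) -> str:
    """Dispatch on the archive host; unknown hosts pass through unchanged."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in ARCHIVE_IS_HOSTS:
        return url  # keep "+" versus "%20" exactly as supplied
    if host == "webcitation.org":
        return tidy_webcite(url)
    return url

archive_is = (
    "http://archive.fo/20130703190443/http://hitparade.ch/showitem.asp"
    "?interpret=Nerina+Pallot&titel=Everybody's+Gone+To+War&cat=s"
)
webcite = (
    "http://www.webcitation.org/6IBvTspq8?url=http://hitparade.ch/showitem.asp"
    "?interpret%3DNerina%2BPallot%26titel%3DEverybody%27s%2BGone%2BTo%2BWar%26cat%3Ds"
)

print(normalize_archive_url(archive_is))
print(normalize_archive_url(webcite))
```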