Unable to retrieve IA metadata for any item
Closed, Resolved (Public)

Description

No IA items' metadata can be retrieved by IA-Upload.

Original report:

'20231002_20231002_0537' is not a valid Internet Archive identifier

Hello. For some reason it doesn't recognize https://archive.org/details/20231002_20231002_0537 as a valid IA item. I want to upload this book via IA tool.

Event Timeline

Are any IA items working?

The above item seems to be working fine: https://archive.org/details/20231002_20231002_0537?output=json

But I can't request it from a Toolforge webservice.

In the Toolforge account:

$ curl -I https://archive.org/details/20231002_20231002_0537?output=json
HTTP/2 200 
server: nginx/1.24.0
date: Mon, 11 Nov 2024 01:06:08 GMT
content-type: application/json
strict-transport-security: max-age=15724800
referrer-policy: no-referrer-when-downgrade

But not in the webservice container:

$ toolforge webservice shell

$ curl -I https://archive.org/details/20231002_20231002_0537?output=json
curl: (7) Failed to connect to archive.org port 443: Connection timed out
$ curl -I https://archive.org/details/history-of-telegraphy-wa-3?output=json
curl: (7) Failed to connect to archive.org port 443: Connection timed out

Sorry, ignore me! ia-upload is not running on Toolforge any more, it's got its own VPS. :-/

However, the issue is the same:

$ curl -I https://archive.org/details/20231002_20231002_0537?output=json
curl: (28) Failed to connect to archive.org port 443 after 129880 ms: Couldn't connect to server
$ curl -I https://archive.org/details/history-of-telegraphy-wa-3?output=json
curl: (28) Failed to connect to archive.org port 443 after 130678 ms: Couldn't connect to server
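
When repeating checks like these, it may help to cap the wait and turn curl's exit code into a short diagnosis rather than waiting two minutes per request. A minimal sketch, assuming a POSIX shell with curl installed; the function names `describe_curl_exit` and `check_ia` are hypothetical, and the 10-second limit is an arbitrary choice:

```shell
#!/bin/sh
# Map a curl exit code to a short diagnosis.
# Codes 6, 7 and 28 are the ones documented in curl's man page.
describe_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "failed to connect" ;;
        28) echo "timed out" ;;
        *)  echo "other curl error ($1)" ;;
    esac
}

# Send a HEAD request for an item's JSON view, giving up after 10 seconds.
# $1 is the Internet Archive identifier.
check_ia() {
    curl -sS -I --connect-timeout 10 --max-time 10 \
        "https://archive.org/details/$1?output=json" >/dev/null
    describe_curl_exit "$?"
}
```

Running `check_ia history-of-telegraphy-wa-3` from each environment (Toolforge bastion, webservice container, VPS) would then give a one-line result per host that is easy to compare.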
Samwilson renamed this task from "'20231002_20231002_0537' is not a valid Internet Archive identifier" to "Unable to retrieve IA metadata for any item". Nov 11 2024, 7:06 AM
Samwilson updated the task description.
Samwilson added a subscriber: HLHJ.

There haven't been any new uploads in over 30 days so you won't find anything in commons:Special:RecentChanges (e.g., the recent-uploads link at the top of the tool page).

The error message is no-found-on-ia, so this is likely src/Controller/UploadController.php line 252 or 369. But if you cannot get to the metadata from the tool shell prompt, it probably isn't an ia-upload code issue.

It appears to have gone offline at about the same time as Internet Archive had their recent devastating attacks, so this might be due to changes at IA to mitigate such attacks.

And on that note, wouldn't it be good to switch the tool from using https://archive.org/details/iaId?output=json to https://archive.org/metadata/iaId? I realize the JSON formats are slightly different but it seems to be better documented.
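
For comparison, the two endpoints differ only in the URL path. A rough sketch, assuming a POSIX shell with curl installed; `ia_details_url`, `ia_metadata_url`, and `fetch_ia_metadata` are hypothetical helper names for illustration and are not part of ia-upload:

```shell
#!/bin/sh
# Build the URL the tool currently requests. $1 is the IA identifier.
ia_details_url() {
    printf 'https://archive.org/details/%s?output=json\n' "$1"
}

# Build the URL for the metadata endpoint suggested above.
ia_metadata_url() {
    printf 'https://archive.org/metadata/%s\n' "$1"
}

# Fetch an item's metadata as JSON; fail on HTTP errors and after 30 seconds.
fetch_ia_metadata() {
    curl -sS --fail --max-time 30 "$(ia_metadata_url "$1")"
}
```

For example, `fetch_ia_metadata history-of-telegraphy-wa-3` should print the item's JSON when archive.org is reachable. Because the JSON layout is not the same as the `details` output, any code that reads the response would need adjusting too.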

I get the same thing from a PAWS terminal but it appears to work fine from login.toolforge.org. Maybe this is some sort of VPS security issue? FYI: I also used -k because the curl I used did not seem to know about the archive.org TLS certificate.

I used dig to check the DNS and they seemed the same.

FWIW, one of my tools which relies on the Internet Archive also always times out. Perhaps the Internet Archive has temporarily(?) blocked some WMCS IPs following the outage?

That is possible, but I was thinking it might be a local firewall issue; see Firewall - Wikitech.

Jason Scott has suggested that the IA could be blocking us, so I've emailed info@archive.org to see if there's anything that can be done. I wouldn't be surprised if the Toolforge IPs have been blocked, considering IA must see somewhat higher traffic from them. It sounds like IA is still in recovery mode, so we should be patient.

Samwilson claimed this task.
Samwilson added a subscriber: Marnanel.

It looks like things are working again! Uploads started working on 16 November.