
IABot API: support for www.webarchive.org.uk
Closed, Resolved. Public

Description

When attempting to upload a new archive URL for www.webarchive.org.uk via modifyurl, the API responds:

"errormesage": "The provided archive URL is not recognized as a supported archive URL."

Body:

action=modifyurl&urlid=3219517&overridearchivevalidation=1&archiveurl=https%3A%2F%2Fwww%2Ewebarchive%2Eorg%2Euk%2Fwayback%2Farchive%2F20151128210021mp%5F%2Fhttp%3A%2F%2Fnewsroom%2Eherefordshire%2Egov%2Euk%2F2006%2Fnovember%2Fnew%252Dsculpture%252Dto%252Dbe%252Dhanded%252Dover%2Easpx
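
The percent-encoded body is hard to read, so here is a minimal Python sketch that decodes it with the standard library to show the archive URL actually being submitted. The variable names are illustrative only and are not part of the IABot API.

```python
from urllib.parse import parse_qs

# Request body copied from the report above, split across lines for readability.
body = (
    "action=modifyurl&urlid=3219517&overridearchivevalidation=1"
    "&archiveurl=https%3A%2F%2Fwww%2Ewebarchive%2Eorg%2Euk%2Fwayback%2Farchive"
    "%2F20151128210021mp%5F%2Fhttp%3A%2F%2Fnewsroom%2Eherefordshire%2Egov%2Euk"
    "%2F2006%2Fnovember%2Fnew%252Dsculpture%252Dto%252Dbe%252Dhanded%252Dover%2Easpx"
)

# parse_qs percent-decodes each value once and returns a dict of lists.
params = parse_qs(body)
archive_url = params["archiveurl"][0]
print(archive_url)
```

Decoded once, the value begins with `https://www.webarchive.org.uk/wayback/archive/20151128210021mp_/`, so the 14-digit timestamp in this request is followed by `mp_`.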

Event Timeline

Restricted Application added a project: Internet-Archive. Aug 1 2019, 2:55 PM
Restricted Application added a subscriber: Cyberpower678.

The site is the national archives and libraries of the UK.

It uses a standard URL syntax, with no other variants that I know of. Example:

http://www.webarchive.org.uk/wayback/archive/20100602000217/www.westsussex.gov.uk/ccm/navigation/your-council/election
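
As a rough illustration of that syntax, the following Python sketch splits the example URL into its snapshot time and original URL. The pattern and the `parse_clean` helper are hypothetical, not IABot's actual validation code, and the sketch assumes the 14 digits read as YYYYMMDDhhmmss, as in other Wayback-style archives.

```python
import re
from datetime import datetime

# Hypothetical pattern for the clean form: a 14-digit timestamp, a slash,
# then the original URL (with or without its scheme).
CLEAN_FORM = re.compile(
    r"^https?://www\.webarchive\.org\.uk/wayback/archive/(\d{14})/(.+)$"
)

def parse_clean(archive_url):
    """Return (snapshot_time, original_url), or None if the URL does not fit."""
    m = CLEAN_FORM.match(archive_url)
    if m is None:
        return None
    # Assumption: the timestamp is YYYYMMDDhhmmss.
    snapshot_time = datetime.strptime(m.group(1), "%Y%m%d%H%M%S")
    return snapshot_time, m.group(2)

example = (
    "http://www.webarchive.org.uk/wayback/archive/20100602000217/"
    "www.westsussex.gov.uk/ccm/navigation/your-council/election"
)
print(parse_clean(example))
```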

Cyberpower678 closed this task as Resolved. Aug 18 2019, 3:21 PM
Cyberpower678 claimed this task.
Green_Cardamom added a comment. Edited. Thu, Aug 22, 3:10 PM

API reports invalid archive

{
    "result": "fail",
    "urldataerror": "invalidarchive",
    "errormesage": "The provided archive URL is not recognized as a supported archive URL.",
    "loggedon": true,
    "username": "GreenC bot",
    "checksum": "d7f29210159510b11f60c17eb3625e26",
    "csrf": "0978ebbd24e8defb62dedb1aa4f240ca",
    "servetime": 0.0866
}

Body:

action=modifyurl&urlid=54654663&overridearchivevalidation=1&archiveurl=https%3A%2F%2Fwww%2Ewebarchive%2Eorg%2Euk%2Fwayback%2Farchive%2F20131026090256mp%5F%2Fhttp%3A%2F%2Fwww%2Ebbc%2Eco%2Euk%2Fsport%2F0%2Ffootball%2F24576598&reason=new%20archive%20%7C%20iab%2Eawk%20%7C%20iab20180915%2D20181108%2E52601%2D71009%2Fnewaltarch%20%7C%20Albert%20Adomah%20%7C%20url%2054654663&token=<redacted>&checksum=d7f29210159510b11f60c17eb3625e26&wiki=enwiki
Green_Cardamom reopened this task as Open. Thu, Aug 22, 3:10 PM
Cyberpower678 added a comment. Edited. Thu, Aug 22, 3:17 PM

That URL is not consistent with the format you gave me; that's why it was rejected. The timestamp is followed by 'mp_'.

Hmm, OK. I guess it will sometimes have extraneous characters after the 14-digit timestamp. Can those be ignored, or do you need literal strings?

Green_Cardamom added a comment. Edited. Thu, Aug 22, 3:33 PM

Looking through the logs, it is usually a clean 14-digit timestamp, but when it is not, it is always "<14digit>mp_", so the literal string would be safe to check for.
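
A minimal Python sketch of the check described here: accept either a clean 14-digit timestamp or one followed by the literal `mp_`, and nothing else. The pattern and the `split_archive_url` helper are hypothetical and only illustrate the rule; they are not IABot's implementation.

```python
import re

# Hypothetical pattern: 14-digit timestamp, optionally followed by the
# literal "mp_", then a slash and the original URL.
ARCHIVE_URL = re.compile(
    r"^https?://www\.webarchive\.org\.uk/wayback/archive/"
    r"(?P<timestamp>\d{14})(?:mp_)?/(?P<original>.+)$"
)

def split_archive_url(url):
    """Return (timestamp, original_url), or None if the URL is not recognized."""
    m = ARCHIVE_URL.match(url)
    if m is None:
        return None
    return m.group("timestamp"), m.group("original")

clean = (
    "http://www.webarchive.org.uk/wayback/archive/20100602000217/"
    "www.westsussex.gov.uk/ccm/navigation/your-council/election"
)
with_mp = (
    "https://www.webarchive.org.uk/wayback/archive/20131026090256mp_/"
    "http://www.bbc.co.uk/sport/0/football/24576598"
)
print(split_archive_url(clean))
print(split_archive_url(with_mp))
```

Matching the literal `mp_` rather than any trailing characters keeps the check strict, so a suffix not seen in the logs would still be rejected.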

Cyberpower678 closed this task as Resolved. Thu, Aug 22, 3:48 PM