F16912142
list_archive_links.py
Tgr (Gergő Tisza)
Authored By
Tgr
Apr 11 2018, 8:50 AM
2018-04-11 08:50:28 (UTC+0)
Size
942 B
Referenced Files
None
Subscribers
Cirdan
list_archive_links.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""List archive.org REST API URLs needed to archive all external links
from a given wiki to a given domain.

Usage:
    list_archive_links.py <wiki-domain> <link-domain>
"""

import bisect
import urllib

from docopt import docopt
from wikitools import wiki, api

arguments = docopt(__doc__)
site = wiki.Wiki('https://%s/w/api.php' % arguments['<wiki-domain>'])
params = {
    "action": "query",
    "format": "json",
    "list": "exturlusage",
    "euprop": "url",
    "euquery": arguments['<link-domain>'],
    "eulimit": "max",
    "euexpandurl": 1,
}
protocols = ['http', 'https']
archive_url = 'https://web.archive.org/save/%s'

urls = {}
for protocol in protocols:
    params['euprotocol'] = protocol
    for result in api.APIRequest(site, params).queryGen():
        for item in result['query']['exturlusage']:
            urls[item['url']] = True

for url in urls:
    print archive_url % url
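The script above targets Python 2 and the wikitools library. As a minimal sketch, not the author's code, the same three steps (build the exturlusage query for each protocol, collect the unique URLs across result pages, format the archive.org save URLs) can be written as pure Python 3 functions that are testable without network access. The function names `build_params`, `collect_urls` and `save_requests` are hypothetical, and the example pages are made-up data shaped like the API responses the script reads.

```python
ARCHIVE_URL = 'https://web.archive.org/save/%s'


def build_params(link_domain, protocol):
    """Return the list=exturlusage query parameters for one protocol."""
    return {
        'action': 'query',
        'format': 'json',
        'list': 'exturlusage',
        'euprop': 'url',
        'euquery': link_domain,
        'euprotocol': protocol,
        'eulimit': 'max',
        'euexpandurl': 1,
    }


def collect_urls(result_pages):
    """Return the unique URLs from an iterable of API result pages.

    A dict is used as an ordered set (Python 3.7+ keeps insertion
    order), mirroring the `urls = {}` idiom in the original script.
    """
    seen = {}
    for result in result_pages:
        for item in result['query']['exturlusage']:
            seen[item['url']] = True
    return list(seen)


def save_requests(urls):
    """Format one archive.org save URL per collected link."""
    return [ARCHIVE_URL % url for url in urls]


if __name__ == '__main__':
    # Made-up pages standing in for what a paginated API query would yield.
    example_pages = [
        {'query': {'exturlusage': [{'url': 'http://a.example/1'},
                                   {'url': 'http://a.example/2'}]}},
        {'query': {'exturlusage': [{'url': 'http://a.example/1'}]}},
    ]
    for line in save_requests(collect_urls(example_pages)):
        print(line)
```

Keeping the network call out of these functions is a design choice for the sketch: any client that yields result pages in this shape could feed `collect_urls`.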
File Metadata
Details
Attached
Mime Type
text/x-python
Storage Engine
blob
Storage Format
Raw Data
Storage Handle
5725220
Default Alt Text
list_archive_links.py (942 B)
Attached To
Mode
T184856: Deploy InternetArchiveBot on the Hungarian Wikipedia (huwiki)
Attached
Event Timeline
Cirdan
subscribed.
Apr 12 2018, 4:34 AM
2018-04-12 04:34:56 (UTC+0)