
InternetArchiveBot for Wikimedia Commons
Open, Needs Triage, Public


Currently, InternetArchiveBot does not run on Wikimedia Commons. Many images there are uploaded from external sources, and proper attribution depends on the copyright status published on sites such as Verizon's Flickr and others. Although all of this information IS copied at import time, Flickr users (or other media providers) can later change the stated copyright status, and websites can cease to exist altogether. For that reason, I would like to suggest running InternetArchiveBot on Wikimedia Commons and allowing it to archive the linked websites as of the upload date and as of the date a (new) link is added (which can happen in cases of misattribution).
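To make the "archive as of the upload date, or as of the date a link was added" idea concrete, here is a minimal sketch that only builds lookup URLs and does no network access. It assumes the bot would ask the public Wayback Machine availability endpoint (`https://archive.org/wayback/available`, which takes `url` and a `YYYYMMDD` `timestamp`) for the snapshot closest to each target date. The function names `wayback_query` and `plan_lookups` and the example links are hypothetical; this is not how InternetArchiveBot is actually implemented.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint. Using it here is an
# assumption of this sketch, not a description of InternetArchiveBot.
WAYBACK_AVAILABLE = "https://archive.org/wayback/available"


def wayback_query(link: str, when: date) -> str:
    """Return an availability-API URL asking for the snapshot of `link`
    closest to `when` (timestamp format YYYYMMDD)."""
    params = {"url": link, "timestamp": when.strftime("%Y%m%d")}
    return f"{WAYBACK_AVAILABLE}?{urlencode(params)}"


def plan_lookups(upload_date: date, links: dict[str, Optional[date]]) -> list[str]:
    """Hypothetical helper: for each link, target the date it was added to
    the file page, falling back to the upload date when the link has been
    there since the file was uploaded (value None)."""
    return [wayback_query(url, added or upload_date) for url, added in links.items()]


if __name__ == "__main__":
    # Illustrative data only: one link present since upload, one added later
    # (e.g. after a misattribution was corrected).
    queries = plan_lookups(
        date(2018, 3, 21),
        {
            "https://www.flickr.com/photos/example/1": None,
            "https://example.org/corrected-source": date(2019, 7, 8),
        },
    )
    for q in queries:
        print(q)
```

Keying each lookup on the date the link appeared, rather than always on the upload date, is what lets a corrected source be archived in the state it was in when it was cited.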

InternetArchiveBot should mainly scan the "Source" field, but also every other link in the description and elsewhere on the file page, since some media files cite multiple links for attribution and older files often give their sources in the description.
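The scanning order described above ("Source" first, then every other link on the page) could be sketched roughly as follows. This assumes the file page uses the Commons `{{Information}}` template with a `|source=` parameter and that simple regular expressions are good enough for illustration; a real bot would more likely use a proper wikitext parser, and nested templates inside the source value are not handled here. The names `candidate_links`, `URL_RE` and `SOURCE_RE` are hypothetical.

```python
import re

# Bare http(s) URLs in wikitext; stops at whitespace, pipes, brackets, braces.
URL_RE = re.compile(r"https?://[^\s|\]\[{}<>]+")
# Value of a |source= parameter, up to the next parameter line or the
# closing "}}" (simplified: nested templates in the value are not handled).
SOURCE_RE = re.compile(r"\|\s*source\s*=(.*?)(?=\n\s*\||\}\})", re.IGNORECASE | re.DOTALL)


def candidate_links(wikitext: str) -> list[str]:
    """Return every external link on the page, with links from the
    Source field first and duplicates removed."""
    ordered: list[str] = []
    source = SOURCE_RE.search(wikitext)
    if source:
        ordered.extend(URL_RE.findall(source.group(1)))
    ordered.extend(URL_RE.findall(wikitext))
    # dict.fromkeys keeps first-seen order while dropping repeats
    return list(dict.fromkeys(ordered))


if __name__ == "__main__":
    # Illustrative page text: an older source sits in the description.
    page = """{{Information
|description=Photo. Earlier source: https://old.example/page
|source=[https://www.flickr.com/photos/example/1 Flickr]
|author=Someone
}}"""
    print(candidate_links(page))
```

Putting the Source-field links first means the most attribution-relevant pages get archived first, while links buried in the description of older files are still picked up.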

Event Timeline

DonTrung created this task. Mar 21 2018, 2:17 AM
Restricted Application added subscribers: Cyberpower678, Aklapper. Mar 21 2018, 2:17 AM

Comment: this mostly concerns link rot. For a good example, please see:


Restricted Application assigned this task to Cyberpower678. Mar 21 2018, 4:03 AM
revi added a subscriber: revi. Mar 22 2018, 6:41 AM
Hmxhmx added a subscriber: Hmxhmx. Jul 8 2019, 2:49 PM
4nn1l2 added a subscriber: 4nn1l2. Jul 14 2019, 3:24 AM
whym awarded a token. Aug 25 2019, 2:29 AM

What's the status on this? Having the ability to summon InternetArchiveBot would be extremely useful for licence checking.

@mdaniels5757: Hi, see the upper left corner: "Open". Someone needs to volunteer to work on this task...

@Aklapper I'd be happy to work on this, but have no experience with programming/deploying this bot. What steps could I help with?

@Cyberpower678 What steps could I assist with to get this done?

Aklapper removed Cyberpower678 as the assignee of this task. Apr 16 2020, 8:17 AM

@Cyberpower678: I am resetting the assignee of this task because there has not been progress lately (please correct me if I am wrong!). Resetting the assignee avoids the impression that somebody is already working on this task. It also allows others to potentially work towards fixing this task. Please claim this task again when you realistically plan to work on it (via Add Action...Assign / Claim in the dropdown menu). Thanks for your understanding!

Nintendofan885 renamed this task from InternetArchiverBot for Wikimedia Commons to InternetArchiveBot for Wikimedia Commons. Sep 19 2020, 6:37 PM
Nintendofan885 updated the task description.