
InternetArchiveBot for Wikimedia Commons
Open, Needs Triage · Public


Currently, InternetArchiveBot does not run on Wikimedia Commons. Many images are uploaded from external sources, and for proper attribution the copyright status of an image has to be fetched from sites such as Verizon's Flickr. Although all of this information is currently copied at import time, Flickr users (or other media providers) can later change the copyright status, and websites can cease to exist. For that reason I would like to suggest running InternetArchiveBot on Wikimedia Commons and allowing it to archive the linked websites as of the upload date and as of the date a (new) link was added (which can happen in cases of misattribution).

InternetArchiveBot should mainly scan the "Source" field, but also all other links in the description and other areas of a file page, as media files sometimes have multiple links for attribution, and older files can have their sources in the description.
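
To illustrate the scanning described above, here is a minimal sketch, not InternetArchiveBot's actual implementation. It assumes the file page uses an {{Information}}-style template with a `|source=` parameter on one line; the function name `extract_links`, the regular expressions, and the sample wikitext are all hypothetical. It separates links found in the source field from links found elsewhere on the page, so the former could be prioritised for archiving.

```python
import re

# Matches http(s) URLs up to whitespace or common wikitext delimiters.
URL_RE = re.compile(r"https?://[^\s\]\[|<>{}\"]+")

# Matches a single-line "|source=" parameter (assumed template layout).
SOURCE_RE = re.compile(
    r"^[ \t]*\|[ \t]*source[ \t]*=[ \t]*(.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_links(wikitext):
    """Return (source_links, other_links) found in a file page's wikitext."""
    source_links = []
    for match in SOURCE_RE.finditer(wikitext):
        source_links.extend(URL_RE.findall(match.group(1)))
    # Every remaining URL on the page, e.g. links in the description.
    other_links = [u for u in URL_RE.findall(wikitext) if u not in source_links]
    return source_links, other_links


# Hypothetical file page content, for illustration only.
sample = """{{Information
|description=Photo of a bridge, see also https://example.org/bridge-history
|source=[https://www.flickr.com/photos/example/123 Flickr]
|author=Example
}}"""

source_links, other_links = extract_links(sample)
print(source_links)
print(other_links)
```

A real bot would need a proper wikitext parser, since source fields can span several lines or be wrapped in other templates; the regular expressions here only cover the simple case.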

Event Timeline

DonTrung created this task. Mar 21 2018, 2:17 AM
Restricted Application added subscribers: Cyberpower678, Aklapper. Mar 21 2018, 2:17 AM

Comment: This mostly concerns link rot. For a good example, please see:


Restricted Application assigned this task to Cyberpower678. Mar 21 2018, 4:03 AM
Hmxhmx added a subscriber: Hmxhmx. Jul 8 2019, 2:49 PM
4nn1l2 added a subscriber: 4nn1l2. Jul 14 2019, 3:24 AM
whym awarded a token. Aug 25 2019, 2:29 AM

What's the status on this? Having the ability to summon InternetArchiveBot would be extremely useful for licence checking.

@mdaniels5757: Hi, see the upper left corner: "Open". Someone needs to volunteer to work on this task...

@Aklapper I'd be happy to work on this, but have no experience with programming/deploying this bot. What steps could I help with?

@Cyberpower678 What steps could I assist with to get this done?