
Create a service to crawl external links and test them against known bad sites list
Open, Needs Triage · Public · Feature

Description

Feature summary (what you would like to be able to do and where):
Implement a service which crawls all known external links embedded in Wikipedia and checks them against a service such as https://www.stopbadware.org (or similar) to detect domains that are now serving badware, viruses, or other malware, possibly due to a change of domain ownership.
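
A minimal sketch of what such a service could look like, assuming the MediaWiki API's existing `list=exturlusage` module for enumerating external links; the `is_flagged()` check and `KNOWN_BAD_DOMAINS` set below are hypothetical placeholders, since this task does not specify which reputation API (StopBadware or otherwise) would be queried:

```
import requests
from urllib.parse import urlparse

API = "https://en.wikipedia.org/w/api.php"

# Hypothetical local blocklist; a real service would instead query a
# reputation provider such as StopBadware or a similar service.
KNOWN_BAD_DOMAINS = {"malware.example", "squatted.example"}

def iter_external_links(limit=500):
    """Yield (page title, URL) pairs via the MediaWiki list=exturlusage module."""
    params = {
        "action": "query",
        "list": "exturlusage",
        "euprop": "title|url",
        "euprotocol": "https",  # one crawl pass per protocol
        "eulimit": limit,
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params, timeout=30).json()
        for entry in data["query"]["exturlusage"]:
            yield entry["title"], entry["url"]
        cont = data.get("continue")
        if not cont:
            break
        params.update(cont)  # follow API continuation to the next batch

def is_flagged(url):
    """Hypothetical reputation check: flag URLs whose domain is known bad."""
    return urlparse(url).hostname in KNOWN_BAD_DOMAINS

if __name__ == "__main__":
    for title, url in iter_external_links():
        if is_flagged(url):
            print(f"FLAGGED on [[{title}]]: {url}")
```

A production version would need rate limiting, batched lookups against the reputation service, and persistence of results so that pages can be re-checked as domains change hands.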

Use case(s)
Wikipedia has lots of pages with external links, and many of those articles are now over a dozen years old. As the internet ages, many of these domains change owner, because for various reasons the original owner lets go of the old domain. Such domains often get picked up by domain squatters and/or malware publishers.

Benefits (why should this be implemented?):
As these links are still part of our corpus, it would be good to scan them so that we can detect bad domains and protect people from malware infection, either by showing a warning before the link is opened or by unlinking it, for instance.
