
cleanupSpam.php fails to work if domain is not http://
Open, Medium, Public

Description

I've been using cleanupSpam.php to get rid of spam pages, as I have been doing routinely. However, for about a week I have not been able to delete any page with the script, because it apparently finds no pages linking to the given domain, even though those domains are present on the spam pages. Special:LinkSearch and API:Extlinks do not list them either. What could be causing this, and how can it be fixed? Thanks.

Edit: the problem with Special:LinkSearch seems to be the https:// protocol. As for API:Extlinks, the links sometimes appear and sometimes do not. So this may be an issue with the maintenance script only, with the wiki database, or both.

Edit 2: after further checks, domains starting with http:// are detected correctly, but those starting with https:// or other protocols are not.
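To see why only http:// links would be found, here is a minimal Python sketch under an assumed, simplified model of how MediaWiki indexes external links: the index key starts with the protocol, followed by the host labels in reverse order and a trailing dot (so http://www.example.com/foo becomes http://com.example.www./foo), and a domain search is a prefix match on that key. If the lookup only ever builds an http:// prefix, an https:// key can never match. The helper names `make_index`, `make_like_prefix` and `matches` are hypothetical; the real link-filtering code also handles ports, IP addresses and other details left out here.

```python
# Simplified, assumed model of external-link index keys; not MediaWiki's actual code.
# Ports, IP hosts, userinfo and case folding are deliberately ignored.


def make_index(url: str) -> str:
    """Turn 'proto://a.b.c/path' into 'proto://c.b.a./path'."""
    proto, rest = url.split("://", 1)
    host, _, path = rest.partition("/")
    reversed_host = ".".join(reversed(host.split(".")))
    return f"{proto}://{reversed_host}./{path}"


def make_like_prefix(domain_spec: str, protocol: str = "http://") -> str:
    """Prefix an index key must start with to match domain_spec.

    A leading '*.' also matches subdomains; otherwise the host must match exactly.
    The http:// default models a lookup that only ever builds an http:// pattern.
    """
    wildcard = domain_spec.startswith("*.")
    host = domain_spec[2:] if wildcard else domain_spec
    reversed_host = ".".join(reversed(host.split(".")))
    prefix = f"{protocol}{reversed_host}."
    # Exact host: the next character after the reversed host must be the path separator.
    return prefix if wildcard else prefix + "/"


def matches(url: str, domain_spec: str, protocol: str = "http://") -> bool:
    """True if the URL's index key starts with the prefix built for domain_spec."""
    return make_index(url).startswith(make_like_prefix(domain_spec, protocol))


if __name__ == "__main__":
    print(matches("http://spam.example.org/buy", "*.example.org"))               # True
    print(matches("https://spam.example.org/buy", "*.example.org"))              # False
    print(matches("https://spam.example.org/buy", "*.example.org", "https://"))  # True
```

The second call mirrors the symptom in this task: the link is in the table, but a pattern built for http:// never reaches an https:// key.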

Event Timeline

Restricted Application added a subscriber: Aklapper. Feb 8 2018, 2:16 PM
MarcoAurelio triaged this task as High priority. Feb 8 2018, 2:17 PM

(hinders maintenance)

So this is weird: wikiadmin@deployment-db04[deploymentwiki]> select * from externallinks where el_from='3066'; displays all the external links on that page, but Special:LinkSearch does not. cleanupSpam.php, which queries the externallinks table, isn't able to find them either:

maurelio@deployment-tin:~$ mwscript cleanupSpam.php --wiki=deploymentwiki *.spam_domain_here --delete
Found 0 articles containing *.spam_domain_here

(replaced the actual spam domain with spam_domain_here to avoid keeping a permanent track of spam on Phabricator)

MarcoAurelio renamed this task from External links tables not updating/populating on Beta Cluster to Special:LinkSearch and cleanupSpam.php not working anymore on Beta Cluster. Feb 8 2018, 3:19 PM
MarcoAurelio renamed this task from Special:LinkSearch and cleanupSpam.php not working anymore on Beta Cluster to cleanupSpam.php not working anymore on Beta Cluster. Feb 8 2018, 8:52 PM
MarcoAurelio lowered the priority of this task from High to Medium.
MarcoAurelio updated the task description.
MarcoAurelio renamed this task from cleanupSpam.php not working anymore on Beta Cluster to cleanupSpam.php does not seem to be working anymore. Feb 11 2018, 7:08 PM
MarcoAurelio renamed this task from cleanupSpam.php does not seem to be working anymore to cleanupSpam.php fails to work if domain is https://. Feb 12 2018, 11:22 AM
MarcoAurelio renamed this task from cleanupSpam.php fails to work if domain is https:// to cleanupSpam.php fails to work if domain is not http://.
MarcoAurelio updated the task description.

Change 409845 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@master] Make cleanupSpam.php query for multiple different protocols

https://gerrit.wikimedia.org/r/409845

Change 409845 merged by jenkins-bot:
[mediawiki/core@master] Make cleanupSpam.php query for http and https

https://gerrit.wikimedia.org/r/409845