
Beetstra (Dirk Beetstra)

User Details

User Since
Oct 20 2014, 6:11 AM (497 w, 6 d)
Availability
Available
IRC Nick
Beetstra
LDAP User
Unknown
MediaWiki User
Beetstra [ Global Accounts ]

Recent Activity

Dec 14 2023

Beetstra added a comment to T352827: Directory traversal allows single-page whitelisting to override entire spam-blacklist entry.

I don’t mind the patching, but first, I have not seen this being done
on-wiki. It would only apply to the cases where we whitelist a
suitable path (unless it also works on specific documents? Does this work
on badsite.com/path/goodtxt2read.pdf?), which does not seem to give
malicious editors much chance to exploit this (you need a suitable
whitelisted link).

Dec 14 2023, 4:12 AM · SecTeam-Processed, Vuln-Misconfiguration, SpamBlacklist, Security, Security-Team

Jun 14 2023

Beetstra added a comment to T313107: Grant bots the sboverride userright.

if a vandalism edit removes the blacklisted link, and a good follow-up edit then makes it impossible to revert the vandalism edit, you cannot edit the vandalised information back in because you are not a bot.

It has always worked this way; the change discussed here did not change this mechanism or make it worse, so this is not the place to make such claims. And secondly, you can roll back such an edit if you are a rollbacker.

Jun 14 2023, 3:22 AM · User-notice-archive, MediaWiki-User-management, SpamBlacklist
Beetstra updated subscribers of T313107: Grant bots the sboverride userright.

@MBH: on en.wiki, bots just don’t archive when they can’t save. Moreover, the discussions are not lost; barring oversight and deletion, everything is still there. The better solution is to break the link so it is disabled. Well, actually the better solution is to have the SBL be able to distinguish namespaces.
Bots should not repair spam links; those links should be removed. There is a reason why stuff is blacklisted: the community does not want it. Do you want spammers to deliberately maim redirect links so a bot repairs them and they have their link? Spammers spam because it makes them money. Any link they get gives them a chance it gets followed.

Jun 14 2023, 3:18 AM · User-notice-archive, MediaWiki-User-management, SpamBlacklist

Jun 13 2023

Beetstra added a comment to T313107: Grant bots the sboverride userright.

What I mean is, the only way to get the blacklisted link back in is through
a revert or undo, not through a regular edit. And situations do happen where
a revert is wrong (you would revert one or more good edits, which
you then have to re-do), and where undo does not work.

Jun 13 2023, 7:08 PM · User-notice-archive, MediaWiki-User-management, SpamBlacklist
Beetstra added a comment to T313107: Grant bots the sboverride userright.
  1. Now try to revert the edit that vandalized … you can if the diffs are not conflicting (that overrides the SBL), but if they are conflicting you cannot; you have to edit it back in, but you cannot because that would add a blacklisted link back in.

Jun 13 2023, 7:00 PM · User-notice-archive, MediaWiki-User-management, SpamBlacklist
Beetstra added a comment to T313107: Grant bots the sboverride userright.

This should not be hardcoded like this; it should be coded to allow additions of matches according to user rights. Some should simply never be added (not even by admins, as the current SBL works); others should be disallowed only for unregistered or new editors. In this way one could even allow the use of link shorteners by e.g. extended-confirmed users or users with ‘given’ rights, but disallow them being used by spambots (and then bots can replace all of them).
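Such a rights-aware blacklist could be sketched roughly like this (the rule format, group names, and ranking are hypothetical illustrations, not the actual SpamBlacklist code):

```python
import re

# Hypothetical rule format: each entry pairs a regex with the minimum
# user group allowed to add matching links ("nobody" blocks everyone,
# matching the "never, not even admins" tier described above).
RULES = [
    (re.compile(r"\bbit\.ly/"), "extendedconfirmed"),   # link shorteners
    (re.compile(r"\bspamdomain\.example/"), "nobody"),  # never allowed
]

# Ordered from least to most trusted (illustrative names only).
GROUP_RANK = {"new": 0, "autoconfirmed": 1, "extendedconfirmed": 2, "sysop": 3}

def may_add_link(url: str, user_group: str) -> bool:
    """Return True if a user in user_group may add this external link."""
    for pattern, min_group in RULES:
        if pattern.search(url):
            if min_group == "nobody":
                return False  # blocked for everyone, as the current SBL does
            return GROUP_RANK.get(user_group, -1) >= GROUP_RANK[min_group]
    return True  # not blacklisted at all
```

A bot replacing shortened links would then simply run under a group ranked high enough to pass the check.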

Jun 13 2023, 4:19 AM · User-notice-archive, MediaWiki-User-management, SpamBlacklist

Apr 12 2023

Beetstra added a comment to T328691: [toolsdb] Migrate linkwatcher db to Trove.

Dropping the diff column may give problems with the current code, which I
do not have time to change at the moment.

Apr 12 2023, 12:29 AM · linkwatcher, cloud-services-team (FY2022/2023-Q4), Toolforge

Jun 19 2021

Beetstra closed T224154: Reduce size of linkwatcher db if at all possible as Declined.

The db needs to stay as-is to retain a sufficient historical log. It could be moved to its own server, though, if it consumes too many resources in the current config (see my previous comment).

Jun 19 2021, 8:20 AM · cloud-services-team, linkwatcher, Tools
Beetstra closed T224154: Reduce size of linkwatcher db if at all possible, a subtask of T224152: toolsdb replica is running low on space -- cleanup large tables if possible, as Declined.
Jun 19 2021, 8:20 AM · cloud-services-team (Kanban), Data-Services

Mar 1 2021

Beetstra added a comment to T254649: Rename SpamBlacklist.

Can someone test whether this works? Then we can start an RfC on en.wiki
to implement this.

Mar 1 2021, 5:14 PM · SpamBlacklist

Feb 23 2021

Beetstra added a comment to T254649: Rename SpamBlacklist.

Would it be an option to have all wiki pages be custom set by the wiki:
$whereIsSpamBlacklist=Wikipedia:here?

Feb 23 2021, 3:18 AM · SpamBlacklist

Nov 12 2020

Beetstra updated subscribers of T266587: ToolsDB replication is broken.
Nov 12 2020, 7:06 PM · Data-Services, cloud-services-team (Kanban)

Oct 16 2020

Beetstra added a comment to T202989: Administrators can no longer view deleted history of js/css pages.

You can repeat that over and over. It is the task of administrators to
administer wikis. That means that they must be able to see evidence that
was deleted. Unless, which is extremely unlikely, the deleted page is
malicious code (which should have been suppressed, not deleted), what is on
the page is part of the accountability, part of the patterns, etc.
Administrators need to see that.

Oct 16 2020, 1:02 PM · User-notice-archive, MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-DannyS712, Security, User-Tgr, Trust-and-Safety, WMF-General-or-Unknown, JavaScript

Oct 15 2020

Beetstra added a comment to T202989: Administrators can no longer view deleted history of js/css pages.

I am going to pile on here. Admins should be able to see the deleted
revisions. If there is something malicious in there, it should be
oversighted/suppressed, not just deleted.

Oct 15 2020, 3:49 AM · User-notice-archive, MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-DannyS712, Security, User-Tgr, Trust-and-Safety, WMF-General-or-Unknown, JavaScript

Jun 6 2020

Beetstra added a comment to T254649: Rename SpamBlacklist.

Should we not get rid of the word ‘spam’? It is a blocklist for external links, which are often, but certainly not only, spam.

Jun 6 2020, 5:55 PM · SpamBlacklist

Jan 29 2020

Beetstra updated the task description for T243484: Spam blacklist excludes on WikiData and spill-over to client projects.
Jan 29 2020, 11:07 AM · Wikidata, SpamBlacklist
Beetstra added a comment to T243484: Spam blacklist excludes on WikiData and spill-over to client projects.

Related discussion to show how spamming WikiData has an effect on other wikis: https://en.wikipedia.org/w/index.php?title=Talk:The_Pirate_Bay&oldid=938126260#Official_website_template

Jan 29 2020, 8:37 AM · Wikidata, SpamBlacklist

Jan 26 2020

Beetstra updated Beetstra.
Jan 26 2020, 8:08 AM
Beetstra added a comment to T243484: Spam blacklist excludes on WikiData and spill-over to client projects.

Two points:

Jan 26 2020, 7:12 AM · Wikidata, SpamBlacklist

Jan 23 2020

Beetstra created T243484: Spam blacklist excludes on WikiData and spill-over to client projects.
Jan 23 2020, 6:24 AM · Wikidata, SpamBlacklist

Nov 2 2019

Beetstra added a comment to T146837: Add ability to search by user agent from CheckUser interface.

The only way I would see is an abusefilter variety that is
enabled for checkusers (so a separate one). It was however admitted to me
that the AbuseFilter itself needs a serious upgrade, so I can imagine that
a CU clone of it is not going to happen soon.

Nov 2 2019, 7:56 PM · Stewards-and-global-tools, CheckUser

Oct 29 2019

Beetstra added a comment to T20110: Define AbuseFilter consequence to display a CAPTCHA.

(Did not see this earlier)

Oct 29 2019, 9:25 AM · ConfirmEdit (CAPTCHA extension), Patch-For-Review, Wikimedia-Hackathon-2024, AbuseFilter
Beetstra created T236760: Make captcha an option in the abusefilter.
Oct 29 2019, 8:50 AM · AbuseFilter

Oct 22 2019

Beetstra closed T123121: Linkwatcher spawns many processes without parent as Resolved.

LinkWatcher has moved to its own instance (VM). This is not an issue on the shared instances anymore.

Oct 22 2019, 6:29 AM · Cloud-Services, Toolforge

Oct 11 2019

Beetstra added a comment to T146837: Add ability to search by user agent from CheckUser interface.

A way to circumvent the large index is to turn this into something like an abusefilter for checkusers only. Getting alerted when someone in a range uses a recognizable UA is a gazillion times better than finding a sock after 50 edits, then waiting for CUs to check and confirm while the sock is already on its next account.
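Such a CU-only alert rule (an IP range plus a user-agent pattern) could look roughly like this; the rule format and names are invented for illustration, not an existing CheckUser or AbuseFilter feature:

```python
import ipaddress
import re

# Hypothetical watch rule: alert when an editor in this range
# uses a user agent matching this pattern.
WATCHED_RANGE = ipaddress.ip_network("198.51.100.0/24")
UA_PATTERN = re.compile(r"OldBrowser/1\.2", re.IGNORECASE)

def should_alert(ip: str, user_agent: str) -> bool:
    """True if the edit comes from the watched range with the flagged UA."""
    in_range = ipaddress.ip_address(ip) in WATCHED_RANGE
    return in_range and bool(UA_PATTERN.search(user_agent))
```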

Oct 11 2019, 6:53 PM · Stewards-and-global-tools, CheckUser

Jul 19 2019

Beetstra added a comment to T227377: Request creation of Linkwatcher and COIBot VPS project.

@Bstorm: Thanks! All is working now, except that I now have to make explicit to perl where 'toolsdb' is (previously, basically saying 'sqlhost=toolsdb' was enough). What is the full address? - Got it!!

Jul 19 2019, 11:39 AM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)

Jul 17 2019

Beetstra added a comment to T227377: Request creation of Linkwatcher and COIBot VPS project.

@Bstorm, is there anything you need from my end now? How do I proceed?

Jul 17 2019, 3:33 PM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)

Jul 15 2019

Beetstra added a comment to T227377: Request creation of Linkwatcher and COIBot VPS project.

Just as a very recent example, see https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Spam/LinkReports/yeyebook.com. That is one year's worth of very slow addition of external links by a multitude of IPs. If you saw one individual IP doing one or two edits on one wiki, you would not know that this is part of a larger campaign. You would only see one or two edits on one wiki, and without the db you would have no clue that this is happening on 6 different wikis by 13 IPs.

Jul 15 2019, 10:59 AM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)
Beetstra added a comment to T227377: Request creation of Linkwatcher and COIBot VPS project.

@bd808 I understand. I do maintain these bots with a ‘fear’ that at some
point a failure will render my db broken (it has happened before, and this is
the third place where I started this db from scratch). It is ‘painful’ but
it happens. Thank you for your evaluation.

Jul 15 2019, 4:12 AM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)

Jul 13 2019

Beetstra added a comment to T227377: Request creation of Linkwatcher and COIBot VPS project.

@bd808 Can you tell me what was the outcome of the 9/7 meeting?

Jul 13 2019, 9:16 AM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)

Jul 10 2019

Beetstra added a comment to T227377: Request creation of Linkwatcher and COIBot VPS project.

The data is quite valuable, as it enables on-wiki users to see who added what links in the past, and the content allows for statistical spam detection. It is therefore persistent. There are 7.5 years of data there, and given that one tools-sgeexec host has difficulty keeping up with current additions and statistics, rebuilding it is a gigantic task (plus, valuable information in the form of deleted articles is invisible and hence cannot be rebuilt without a global admin bit).

Jul 10 2019, 4:39 AM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)

Jul 7 2019

Beetstra added a comment to T227377: Request creation of Linkwatcher and COIBot VPS project.

Re db size: this db is about 7.5 years' worth of data; I expect that this
will be enough for more than 5 years into the future. As MediaWiki now starts to
store similar data itself, I may be able to use MediaWiki’s data in the
future and stop storing it myself (or store less).

Jul 7 2019, 3:16 AM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)

Jul 6 2019

Beetstra added a comment to T227377: Request creation of Linkwatcher and COIBot VPS project.

I have to see what is needed. It is also something that is useful for me
to learn, but I will likely need help.

Jul 6 2019, 10:55 AM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)
Beetstra updated subscribers of T227377: Request creation of Linkwatcher and COIBot VPS project.
Jul 6 2019, 7:02 AM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)
Beetstra added a comment to T227377: Request creation of Linkwatcher and COIBot VPS project.

See also https://phabricator.wikimedia.org/T224154

Jul 6 2019, 7:01 AM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)
Beetstra created T227377: Request creation of Linkwatcher and COIBot VPS project.
Jul 6 2019, 6:59 AM · cloud-services-team (Kanban), Cloud-VPS (Project-requests)

Jun 26 2019

Beetstra added a comment to T224154: Reduce size of linkwatcher db if at all possible.

The idea is to move. However, I will need some help (some has already been
offered, and I have been asking around for more). Finding the time to get
this up and running is another issue (as a volunteer).

Jun 26 2019, 3:02 AM · cloud-services-team, linkwatcher, Tools

Jun 25 2019

Beetstra added a comment to T224154: Reduce size of linkwatcher db if at all possible.

We handle cases back to 2008 ... I currently have an AfD of a case that is
8.5 years old.

Jun 25 2019, 7:49 PM · cloud-services-team, linkwatcher, Tools
Beetstra added a comment to T216504: page-links-change stream is assigning template propagation events to the wrong edits.

I am looking at this from a spam-detection point of view. The way I see
this, it may result in records under my name because I ‘add’ a spamlink
when a spammer added a link to a template. That would disable a lot of
statistical spam-detection mechanisms (and, e.g., mechanisms like XLinkBot).

Jun 25 2019, 4:07 AM · Data-Engineering, Event-Platform, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), The-Wikipedia-Library, Internet-Archive

May 22 2019

Beetstra added a comment to T224154: Reduce size of linkwatcher db if at all possible.

I will have a look and maybe get the ball rolling. I generally do not have a lot of time, but I should have 2-3 weeks with more time at the end of July to do more work on it.

May 22 2019, 7:42 PM · cloud-services-team, linkwatcher, Tools
Beetstra added a comment to T224154: Reduce size of linkwatcher db if at all possible.

Replicas are too slow; linkwatcher tries to work in real time, keeping up
with the edit feeds (if the capacity of the sgeexec hosts allows, which it
currently doesn't). Warning or blocking IP- or account-hopping spammers, or
blacklisting their links to get the message through, only makes sense if
they are caught in the act. There is info in the wiki db, but I doubt it
is easy to search (even wiki-by-wiki: try to find those additions of
porhub.com and realise that it is all single-edit IPs that add it; that is
already a good test. Write a query from which you can conclude it is xwiki
spam; two queries (or even one ..) on my db show you that there are the
same number of additions by IPs as there are additions of the link they
were adding, an unlikely coincidence.)
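The ‘unlikely coincidence’ check described here can be sketched as follows; the record format is invented for illustration and does not reflect the real linkwatcher schema:

```python
from collections import Counter

# Hypothetical link-addition records: (wiki, editor, is_ip, domain).
additions = [
    ("en", "198.51.100.1", True, "porhub.com"),
    ("de", "198.51.100.2", True, "porhub.com"),
    ("fr", "203.0.113.9", True, "porhub.com"),
    ("en", "SomeUser", False, "example.org"),
]

def looks_like_xwiki_spam(domain: str) -> bool:
    """Flag a domain when every addition comes from a distinct,
    single-edit IP, spread over more than one wiki."""
    recs = [r for r in additions if r[3] == domain]
    ip_recs = [r for r in recs if r[2]]
    if len(recs) < 2 or len(ip_recs) != len(recs):
        return False  # need multiple additions, all by IPs
    edits_per_ip = Counter(r[1] for r in ip_recs)
    wikis = {r[0] for r in recs}
    # As many adding IPs as additions of the link: each IP edited once.
    return max(edits_per_ip.values()) == 1 and len(wikis) > 1
```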

May 22 2019, 7:25 PM · cloud-services-team, linkwatcher, Tools
Beetstra added a comment to T224154: Reduce size of linkwatcher db if at all possible.

That would be a great idea. Note that my tool coibot would also need to go
there, and that both need significant capacity to run (linkwatcher is
struggling on its current instance due to workload, and if coibot needs to
run there as well ...).

May 22 2019, 7:06 PM · cloud-services-team, linkwatcher, Tools
Beetstra updated subscribers of T224154: Reduce size of linkwatcher db if at all possible.
May 22 2019, 6:53 PM · cloud-services-team, linkwatcher, Tools
Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

Can someone please move all other bots away from the instance that runs linkwatcher? @valhallasw?

May 22 2019, 6:51 PM · Cloud-Services, Toolforge
Beetstra added a comment to T224154: Reduce size of linkwatcher db if at all possible.

@Bstorm: can you provide me with the names/IPs of the editors that were
spamming porhub.com (including the diffs of addition and on which wiki)?

May 22 2019, 6:42 PM · cloud-services-team, linkwatcher, Tools
Beetstra added a comment to T224154: Reduce size of linkwatcher db if at all possible.

That would result in an immediate loss of functionality for the spam
fighting community.

May 22 2019, 6:05 PM · cloud-services-team, linkwatcher, Tools

Apr 7 2018

Beetstra added a comment to T191686: Ability to blacklist based on sister wiki family or whitelist based on same.

See https://phabricator.wikimedia.org/T6459 and https://meta.wikimedia.org/wiki/2017_Community_Wishlist_Survey/Miscellaneous#Overhaul_spam-blacklist

Apr 7 2018, 5:21 AM · SpamBlacklist

Jul 3 2017

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@bd808 It is not that trivial; the new project would need to run coibot and linkwatcher, as they both do their share of analysis on the created db.

Jul 3 2017, 5:42 PM · Cloud-Services, Toolforge

Jul 2 2017

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw Do you mind clearing the instance that linkwatcher is on? It does not have enough resources and is starting to build up a backlog. It is currently on 1438. Thanks!

Jul 2 2017, 4:08 AM · Cloud-Services, Toolforge

Apr 18 2017

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw Do you mind making sure that linkwatcher is the only bot on 1403? I had to restart it this morning; it apparently crashed. Thanks!

Apr 18 2017, 3:38 AM · Cloud-Services, Toolforge

Mar 29 2017

Beetstra added a comment to T115119: Create a feed or log of changed links on Wikimedia projects.

@Samwalton9, what do you mean by 'at some point'? Do you mean that this
has an enormous lag? We do see some effect in deterring spammers by acting
in real time (within minutes); many are hit-and-run editors, and I have
seen 'good faith spammers' with many warnings on many IPs complain that
they were never contacted ..

Mar 29 2017, 3:46 AM · Internet-Archive, MediaWiki-extensions-WikimediaEvents, WMF-General-or-Unknown, The-Wikipedia-Library

Feb 15 2017

Beetstra added a comment to T115119: Create a feed or log of changed links on Wikimedia projects.

If there is an rc-feed of edits for testwiki, I could set up LiWa3 to feed
added links to a channel on freenode (I have to figure that out; it is a
matter of changing on-wiki settings and some killing on the server, and it
is a long time since I added a feed manually). I would say that if the
rc-feed processed an edit, then the db should also be updated. In my
experience, external-link searches on wikis are quickly updated (as fast as
a diff gets saved and reported to rc), and as far as I understand, that
search is based on a separate table that gets updated after every diff. I
presume that that same hook would update the list of added and removed
links.

Feb 15 2017, 4:04 AM · Internet-Archive, MediaWiki-extensions-WikimediaEvents, WMF-General-or-Unknown, The-Wikipedia-Library

Feb 10 2017

Beetstra added a comment to T157826: Spam blacklist should provide options like the Title blacklist.

This is probably better handled through T6459, a complete overhaul so it
is also easier to administer.

Feb 10 2017, 8:00 PM · Stewards-and-global-tools, SpamBlacklist

Feb 1 2017

Beetstra added a comment to T156074: webiron batched abuse reports 1/23/2017 for coibot and linkwatcher.

@Beetstra thank you for your proactive response. FWIW, I don't see, and don't currently think there is, any abusive behavior or wrongdoing on the part of the bots in question. Looking forward to further details from the reporter, but the language used in the emails doesn't leave me the impression this is a high-value reporter.

Feb 1 2017, 1:46 PM · Security, Cloud-Services

Jan 24 2017

Beetstra added a comment to T156074: webiron batched abuse reports 1/23/2017 for coibot and linkwatcher.

With hundreds of edits per minute to the 800 wikis that are checked ...
likely there are domains in every conceivable range ...

Jan 24 2017, 3:21 PM · Security, Cloud-Services
Beetstra added a comment to T156074: webiron batched abuse reports 1/23/2017 for coibot and linkwatcher.

Just to clarify:

Jan 24 2017, 5:48 AM · Security, Cloud-Services
Beetstra added a comment to T156074: webiron batched abuse reports 1/23/2017 for coibot and linkwatcher.

That is also one of my suspicions. The other is that a domain owner
noticed that there is a bot requesting data from their site, and they want
to know whether that is/was legit ... or the site itself got added a
lot in some tracking template in places where linkwatcher and coibot would
notice (the latter being odd but not impossible). I would need to know the
specific requests that triggered this ...

Jan 24 2017, 4:10 AM · Security, Cloud-Services
Beetstra added a comment to T156074: webiron batched abuse reports 1/23/2017 for coibot and linkwatcher.

I also need to know what they see as harmful. Coibot and linkwatcher
check added links for viability, whether they are redirects, and whether
they contain typical 'money making schemes'. If they note a lot of
traffic, then those links are being added to Wikipedia at a somewhat
alarming rate.

Jan 24 2017, 3:50 AM · Security, Cloud-Services

Jan 4 2017

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

Hmm. Any idea how long those 3 python scripts will stay? Linkwatcher will
munch away its backlog in time. Until the wikimedia linklog system comes
online I don't foresee a way of making linkwatcher smaller.

Jan 4 2017, 7:00 PM · Cloud-Services, Toolforge
Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw, do you mind clearing the instance linkwatcher is on? There
are three heavy python scripts there as well, and LiWa3 is building up a
massive backlog. Thanks.

Jan 4 2017, 5:26 PM · Cloud-Services, Toolforge

Dec 6 2016

Beetstra added a comment to T152316: Display filtered data in different categories.

If I understand it correctly, these basically give the possibility to search 'by link/domain', 'by username/IP' or 'by pagename', right? That sounds about right for the most important cases: the first two are how we generally search (we either know the domain and find the spammers, or we know a spammer and want to find the domains), and the third is useful to monitor typical pages spammers would hit. The system should be designed in such a way that the three searches can link to each other: if I am looking at a list of additions of a certain domain, with 3 different users adding them, I should have for each user a link to 'search for this user', so I can snowball quickly.

Dec 6 2016, 4:03 AM · The-Wikipedia-Library

Nov 14 2016

Beetstra closed T150120: Perl module problems on 14## exec nodes as Resolved.

Now works.

Nov 14 2016, 11:06 AM · Cloud-Services, Toolforge
Beetstra added a comment to T150120: Perl module problems on 14## exec nodes.

The only other thing that has now changed is that I ran cpan to install LWP - has that changed settings that now make everything run? Or did someone force a refresh of the modules server-wide? The regular LWP::UserAgent also works now ...

Nov 14 2016, 11:00 AM · Cloud-Services, Toolforge
Beetstra added a comment to T150120: Perl module problems on 14## exec nodes.

I resolved the first three:

  • The regex problem is a perl problem: it has apparently become more strict (all my bots complain about those regexes, and it is a known perl thing - either a newer version has become more strict, or a setting has changed in how the regex module is loaded).
  • The BSD::Resource problem disappears when I change the order of the called modules (funny - it suggests that some things get loaded by earlier modules that make later modules fail: already-loaded versions of older modules that don't get reloaded by the next modules and which have a different 'version'?).
  • The getrlimit problem seems to have been resolved as well by reshuffling the module calling ..
Nov 14 2016, 10:55 AM · Cloud-Services, Toolforge
Beetstra added a comment to T150120: Perl module problems on 14## exec nodes.

@valhallasw: you say ".. on an older Perl version doesn't work on a newer version anymore" .. so there is a newer Perl version on the 14XX hosts (and also a newer PHP, for that matter)?

Nov 14 2016, 5:48 AM · Cloud-Services, Toolforge

Nov 13 2016

Beetstra added a comment to T150120: Perl module problems on 14## exec nodes.

Well, obviously, they are not the same as the 12xx nodes (see also a bug about sudden php errors on 14xx nodes that were not there on the 12xx nodes, bug T149810). These issues are also impossible for me to debug, as the perl errors were not there on the 12xx nodes, which apparently had everything correctly installed. Again, I did not change anything, yet everything crashes. Am I now supposed to tell how the 14xx nodes are different from the 12xx nodes .. as you say, libbsd-resource-perl is installed, yet it throws a 'not found' error ..

Nov 13 2016, 2:12 PM · Cloud-Services, Toolforge
Beetstra added a comment to T150120: Perl module problems on 14## exec nodes.

(I temporarily killed the bot (tools.xlinkbot) that is now non-functional; I expect others to become problematic in time.)

Nov 13 2016, 7:32 AM · Cloud-Services, Toolforge

Nov 9 2016

Beetstra added a comment to T6459: Create a special page to handle additions, removals, changes and logging of spam blacklist entries.

I am going to work out some thought experiment here. My suggestion to re-write the current spam-blacklist extension (or better, rewrite another extension):

  • take the current AbuseFilter, take out all the code that interprets the rule ('conditions').
  • Make 2 fields:
    • one text field for regexes that block added external links (the blacklist). Can contain many rules (one on each line).
    • one text field for regexes that override the block (whitelist overriding this blacklist field; that is generally simpler and cleaner than writing a complex regex, not everybody is a specialist on regexes).
  • Add namespace choice (checkboxes, so one can choose not to blacklist something in one particular namespace; or, with the addition of an 'all' option, a 'content-namespaces only' and a 'talk-namespaces only' choice).
  • Add user status choice (checkboxes for the different roles, or like the page-protection levels)
    • Some links are fine in discussions but should not be used in mainspace; others are a total no-no.
    • Some image links are fine in the file namespace to tell where the image came from, but are not needed in mainspace.
  • Leave all the other options:
    • Discussion field for evidence (or better, a talk-page like function)
    • Enabled/disabled/deleted - not needed, turn it off, obsolete then delete
    • 'Flag the edit in the edit filter log' - maybe nice to be able to turn it off, to get rid of the real rubbish that doesn't need to be logged
    • Rate limiting - catch editors that start spamming an otherwise reasonably good link
    • Warn - could be a replacement for en:User:XLinkBot
    • Prevent the action - as is the current blacklist/whitelist function
    • Revoke autoconfirmed - make sure that spammers are caught and checked
    • Tagging - for combining certain rules to be checked by RC patrollers.
    • I would consider adding a button to auto-block editors on certain typical spambot domains.
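The core of the two-field idea (blacklist regexes with a whitelist override, one rule per line) can be sketched like this; the function and field names are illustrative only:

```python
import re

def is_blocked(url: str, blacklist_field: str, whitelist_field: str) -> bool:
    """Block a link if any blacklist regex matches, unless a
    whitelist regex overrides it. One regex per line per field."""
    def rules(field: str):
        return [re.compile(line.strip())
                for line in field.splitlines() if line.strip()]
    if any(p.search(url) for p in rules(whitelist_field)):
        return False  # whitelist overrides the blacklist
    return any(p.search(url) for p in rules(blacklist_field))
```

A whitelist line such as badsite\.example/good/ would then unblock a single path under an otherwise blacklisted domain, which is simpler than writing one complex blacklist regex with exceptions.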
Nov 9 2016, 10:57 AM · Stewards-and-global-tools, SpamBlacklist

Nov 6 2016

Beetstra created T150120: Perl module problems on 14## exec nodes.
Nov 6 2016, 11:35 AM · Cloud-Services, Toolforge

Nov 3 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw The bot moved again two days ago, and I had to restart it just now .. it is now on tools-exec-1417 (2 x edited comment).

Nov 3 2016, 3:28 AM · Cloud-Services, Toolforge

Oct 26 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw The bot yesterday moved to 1216. It is not backlogging, but maybe it is good to make sure other tasks do not run on this instance.

Oct 26 2016, 3:34 AM · Cloud-Services, Toolforge

Sep 7 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw - it crashed, and is now on 1213. Do you mind moving the other tasks (it is back making backlogs again)?

Sep 7 2016, 5:17 AM · Cloud-Services, Toolforge

Aug 25 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw can you please resubmit jobs on tools-exec-1203 .. linkwatcher seems to interfere with other scripts running there.

Aug 25 2016, 1:02 PM · Cloud-Services, Toolforge

Jun 1 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

Thanks, valhallasw.

Jun 1 2016, 3:32 AM · Cloud-Services, Toolforge

May 19 2016

Beetstra added a comment to T115119: Create a feed or log of changed links on Wikimedia projects.

@Sadads, @kaldari - if this is supposed to help the anti-spam efforts, this
should be standard enabled for ALL wikis.

May 19 2016, 3:50 AM · Internet-Archive, MediaWiki-extensions-WikimediaEvents, WMF-General-or-Unknown, The-Wikipedia-Library

Feb 24 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw - I had to move the bot to another instance; it is now on 1205 (if I 'become linkwatcher' I can't ssh to 1209: access denied).

Feb 24 2016, 3:54 AM · Cloud-Services, Toolforge

Feb 21 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw - the bot moved to 1209

Feb 21 2016, 3:31 AM · Cloud-Services, Toolforge

Jan 24 2016

Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

I'm working on that @MarcoAurelio - Now back online.

Jan 24 2016, 3:19 PM · linkwatcher, Stewards-and-global-tools, Toolforge
Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw - the bot crashed (no clue why; it seems to have trouble with MySQL). I restarted it this morning, and it is now on 1215.

Jan 24 2016, 3:45 AM · Cloud-Services, Toolforge

Jan 19 2016

Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Grr, I noticed a bug in one of the counts (resolved) - it is now counting those and filling the proper table to reduce the counts. Re-indexing of the broken index is now done.

Jan 19 2016, 5:56 AM · linkwatcher, Stewards-and-global-tools, Toolforge
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Thank you. I am not sure I understand the situation with the privacy: you mean that there is no way to exclude the queries from other people, which may contain information that I should not see? As the bot operator, I do know (in principle) which queries the bot runs.

Jan 19 2016, 4:05 AM · linkwatcher, Stewards-and-global-tools, Toolforge
Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw: a good solution would be assigning 200-300% processor to the whole task. I found http://wiki.crc.nd.edu/wiki/index.php/Submitting_Batch/SGE_jobs - which suggests "-pe mpi-# #" would be the option .. (I'm not a specialist in this)

Jan 19 2016, 3:25 AM · Cloud-Services, Toolforge

Jan 18 2016

Beetstra added a comment to T123270: Make gridengine exec hosts also submit hosts.

Can this enforce that all sub-spawned processes run on the same exec host (or can the spawning command 'enforce' that)? As my bots are currently set up, the sub-processes communicate with the mother process over TCP, which means that they (at the moment) cannot communicate between exec hosts (this would help with T123121). (I could switch the communication to MySQL or files, but that would be quite a task.)
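The same-host constraint can be sketched as follows. This is a minimal illustration, not the bot's actual (Perl) code: a child process reports to the mother process over a loopback TCP socket, and since 127.0.0.1 never leaves the machine, this scheme only works when both processes land on the same exec host. The port and message are made up for the example.

```python
import multiprocessing
import socket

def child(port: int) -> None:
    # A sub-process reports back to the mother process over TCP.
    # The loopback address never leaves the machine, so this only
    # works when parent and child share an exec host.
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(b"parsed: 1 edit\n")

def mother() -> bytes:
    # The mother process listens and collects a report from a child.
    with socket.create_server(("127.0.0.1", 0)) as srv:
        port = srv.getsockname()[1]       # ephemeral port chosen by the OS
        p = multiprocessing.Process(target=child, args=(port,))
        p.start()
        peer, _ = srv.accept()
        with peer:
            data = peer.recv(1024)
        p.join()
    return data

if __name__ == "__main__":
    print(mother().decode().strip())
```

Moving the channel to MySQL or shared files, as mentioned above, would lift the same-host restriction at the cost of rewriting this layer.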

Jan 18 2016, 6:28 AM · Patch-For-Review, Toolforge

Jan 17 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

The bot is still eating away at its (old) backlog, which goes slowly. The bot seems to operate fine now with far fewer processes. Still, it uses 200-250% of processor power, which seems necessary for a bot doing all this work. As before, we could consider a rewrite that makes the sub-processes run independently, or I could split the bot into three smaller bots - but both options require significant rewrites for which I do not have time.

Jan 17 2016, 5:42 AM · Cloud-Services, Toolforge

Jan 14 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw - I have added 2 more parsers (total now 12) - the bot is creating a backlog, likely during the American daytime, which it does not munch away at night.

Jan 14 2016, 7:30 AM · Cloud-Services, Toolforge

Jan 12 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw: taking the number of parsers down from 10 to 8 resulted in a backlog forming within 10 minutes. Trying 9 now. (The parsers are the processor-intensive processes; the others hardly ever take more than 3% each, and are often at 0.)

Jan 12 2016, 11:12 AM · Cloud-Services, Toolforge
Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw - thank you for the lengthy explanation. This bot has been running on Labs for a long time now (sometimes with long uptimes without problems - it has at least once managed to run for more than 6 months in a row), and it has been running smoothly here. The main thing I see from running this system in a multi-bot environment is indeed the interaction with the other bots. When it was privately hosted, the other bots sometimes 'munched' too much and the bot started lagging - I see that here as well (and obviously, my apologies, the opposite also happens). In the early days of Labs it did indeed run on its own instance for some time, both to avoid bringing down other bots and to avoid being brought down by them.

Jan 12 2016, 11:00 AM · Cloud-Services, Toolforge

Jan 11 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw: it spawns many subprocesses to be able to keep up with Wikipedia editing. It needs to parse in real time, as anti-spam bots and anti-spam work depend on it.

Jan 11 2016, 2:11 PM · Cloud-Services, Toolforge

Jan 5 2016

Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

@jcrespo: the last upgrade of the bot seems to have brought the load down significantly over the (my) night - successive 'show processlist;' statements do not show many queries running longer than 5 seconds, and hardly any longer than 10 seconds (which should now happen less and less). When this bot ([[:m:User:LiWa3]]) is back up and running in full, I will turn my attention to the second bot ([[:m:User:COIBot]]) that makes heavy use of this db.
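The check described above - repeatedly scanning SHOW PROCESSLIST for long runners - boils down to filtering rows by their Time column. A minimal sketch, with simulated rows (in practice these come from a live MySQL connection, and the column names follow MySQL's processlist output):

```python
# Filter SHOW PROCESSLIST-style rows whose running time exceeds a
# threshold, ignoring idle (Sleep) connections. Rows are simulated
# dicts here; the real check runs against the MySQL server.
def long_runners(processlist, threshold_seconds=5):
    return [row for row in processlist
            if row.get("Command") == "Query"
            and row.get("Time", 0) > threshold_seconds]

rows = [
    {"Id": 1, "Command": "Query", "Time": 2,  "Info": "INSERT INTO linklog ..."},
    {"Id": 2, "Command": "Query", "Time": 12, "Info": "SELECT COUNT(*) FROM ..."},
    {"Id": 3, "Command": "Sleep", "Time": 40, "Info": None},
]
print([r["Id"] for r in long_runners(rows)])  # → [2]
```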

Jan 5 2016, 6:08 AM · linkwatcher, Stewards-and-global-tools, Toolforge

Jan 4 2016

Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Working on it again. Some of the new counting mechanisms were not performing as intended, but that has now been fixed.

Jan 4 2016, 12:24 PM · linkwatcher, Stewards-and-global-tools, Toolforge

Dec 24 2015

Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Is that since this morning (UTC+3)?

Dec 24 2015, 5:59 PM · linkwatcher, Stewards-and-global-tools, Toolforge

Dec 14 2015

Beetstra added a comment to T118042: Map different types of measurement for T115119 's schema.

It would be great if this could also be exposed as a real-time IRC feed - then http://en.wikipedia.org/wiki/User:XLinkBot could hook into the feed live and revert when conditions are met.

Dec 14 2015, 9:05 AM · The-Wikipedia-Library, Possible-Tech-Projects
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

@jcrespo: I have implemented new counting tables based on the three 'offending' queries above (and will implement more if needed - just tell me here what is being queried and I will devise a solution for it).
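The counting-table idea above replaces expensive full scans (e.g. a COUNT(*) over a large link log) with a per-key counter that is bumped at insert time. A sketch of the pattern - in the real bot this would be an SQL table, not the plain dict used here for illustration:

```python
# Instead of answering "how many links to this domain?" with a
# COUNT(*) scan over the whole log, keep an incrementally maintained
# counter that is updated once per inserted row: O(1) per insert,
# O(1) per lookup.
from collections import Counter

link_counts = Counter()  # stands in for the "counting table"

def record_link(domain: str) -> None:
    # Called once per inserted link row, alongside the insert itself.
    link_counts[domain] += 1

for d in ["example.com", "example.com", "spam.example.org"]:
    record_link(d)

print(link_counts["example.com"])  # → 2
```

In SQL the equivalent would be an UPDATE ... SET count = count + 1 (or an upsert) on a small per-domain table, queried instead of the raw log.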

Dec 14 2015, 7:15 AM · linkwatcher, Stewards-and-global-tools, Toolforge
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

@jcrespo: I was notified immediately, but unfortunately at the start of my weekend, with an email that told me hardly anything - just that the number of connections had been restricted. I reacted immediately after the weekend, and it still took time to realise why the bots were affected. Moreover, the en.wikipedia policy you quote concerns on-wiki editing, which was not the problem here (the main bot does not even edit on-wiki) - it was the database.

Dec 14 2015, 3:54 AM · linkwatcher, Stewards-and-global-tools, Toolforge

Dec 13 2015

Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Let me have a manual look at the 'offending queries' one of these days and see if I can reproduce this. When WikiData started I had problems with three bots that ran at hundreds of edits per minute, which brought everything down. Maybe I have a similar problem here now.

Dec 13 2015, 2:10 PM · linkwatcher, Stewards-and-global-tools, Toolforge
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Well, with one connection the bots cannot run. LiWa3 uses something like 50 parallel processes (to keep up with the 600+ edits per minute), each with its own connection; COIBot adds a couple more. The main process would take the only connection, and the processes the bot spawns would then crash the main bot.
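The one-connection-per-worker setup described above can be sketched as follows. Each parallel parser opens its own database handle rather than sharing the mother process's single connection; sqlite3 stands in for MySQL purely for illustration, and the worker function is hypothetical:

```python
# Each worker process in the pool opens its own database connection
# in an initializer; sharing one connection object across processes
# is not safe, which is why 50 parsers need ~50 connections.
import multiprocessing
import sqlite3

_conn = None  # per-process connection, set in the worker initializer

def init_worker():
    global _conn
    _conn = sqlite3.connect(":memory:")  # each worker gets its own handle

def parse_edit(n):
    # Each worker queries its own connection independently.
    return _conn.execute("SELECT ? * 2", (n,)).fetchone()[0]

if __name__ == "__main__":
    with multiprocessing.Pool(4, initializer=init_worker) as pool:
        print(pool.map(parse_edit, [1, 2, 3]))  # → [2, 4, 6]
```

With a hard cap of one connection for the whole tool user, only the first process to connect would succeed and the rest would fail, matching the crash described above.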

Dec 13 2015, 12:32 PM · linkwatcher, Stewards-and-global-tools, Toolforge
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

@jcrespo - I receive complaints from other users every time there is a CPU/IO spike - which means you knew for a long time that there was a CPU/IO spike every now and then, and you could have seen back then which bot was causing it and asked the bot owner/maintainer.

Dec 13 2015, 12:03 PM · linkwatcher, Stewards-and-global-tools, Toolforge
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

@jcrespo .. what issue? What query is causing this? It can't be the couple of hundred routine insert queries the bots do; it must be one (or a few) of the select queries. Do I have a broken query, an unoptimised query, or a missing index on a table?

Dec 13 2015, 12:01 PM · linkwatcher, Stewards-and-global-tools, Toolforge
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

@yuvipanda, @jcrespo - with all respect, this has just brought the entire Wikipedia anti-spam effort to a near halt (I've taken the bots offline). It is fine that there are problems, and that they need to be solved, but it would be great if we finally got a bit more consideration from the WMF (this is not the first time that unannounced and undiscussed actions from the WMF have brought down bots - a couple of months ago my bots went down for days because of an unannounced and very minor change in server output). Your databases will run just fine when there are no bot operators willing to use Labs. Thank you.

Dec 13 2015, 11:48 AM · linkwatcher, Stewards-and-global-tools, Toolforge
Beetstra added a comment to T115119: Create a feed or log of changed links on Wikimedia projects.

hm, interesting idea but I don't think the data stored from this process would be used for this kind of searching. Rather, the data stored here would be analyzed and aggregated, then stored that way in a different table that would facilitate searching.

Dec 13 2015, 3:45 AM · Internet-Archive, MediaWiki-extensions-WikimediaEvents, WMF-General-or-Unknown, The-Wikipedia-Library