
Beetstra (Dirk Beetstra)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 20 2014, 6:11 AM (244 w, 3 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Beetstra [ Global Accounts ]

Recent Activity

Yesterday

Beetstra added a comment to T224154: Reduce size of linkwatcher db on toolsdb if at all possible.

The idea is to move. However, I will need some help (some has already been
offered, and I have been asking around for more). Finding the time to get
this up and running is another issue (as a volunteer)

Wed, Jun 26, 3:02 AM · Data-Services

Tue, Jun 25

Beetstra added a comment to T224154: Reduce size of linkwatcher db on toolsdb if at all possible.

We handle cases back to 2008 ... I currently have an AfD of a case that is
8.5 years old.

Tue, Jun 25, 7:49 PM · Data-Services
Beetstra added a comment to T216504: page-links-change stream is assigning template propagation events to the wrong edits.

I am looking at this from a spam-detection point of view. As I see it,
this may result in records under my name saying that I added a spam link,
when in fact a spammer added the link to a template. That would disable a lot of
statistical spam-detection mechanisms (and, e.g., mechanisms like XLinkBot).

Tue, Jun 25, 4:07 AM · Services (watching), The-Wikipedia-Library, MediaWiki-extensions-WikimediaEvents, Internet-Archive

May 22 2019

Beetstra added a comment to T224154: Reduce size of linkwatcher db on toolsdb if at all possible.

I will have a look and maybe get the ball rolling. I generally do not have a lot of time, but should have 2-3 weeks with more time at the end of July to do more work on it.

May 22 2019, 7:42 PM · Data-Services
Beetstra added a comment to T224154: Reduce size of linkwatcher db on toolsdb if at all possible.

Replicas are too slow. Linkwatcher tries to work in real time: it tries to
keep up with the edit feeds (if the capacity of the sgeexec hosts allows, which
currently it doesn't), because warning or blocking IP/account-hopping spammers, or
blacklisting their links to get the message through, only makes sense if
they are caught in the act. There is info in the wiki db, but I doubt it
is easy to search, even wiki-by-wiki. Trying to find those additions of
porhub.com, and realising that they all come from single-edit IPs, is
already a good test: write a query from which you can conclude it is xwiki
spam. Two queries (or even one) on my db show you that the number of IPs
adding the link is the same as the number of additions of that link,
an unlikely coincidence.
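
The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the actual linkwatcher query: the record layout (user, domain, wiki), the function names, and the `min_additions` threshold are all assumptions made for the example.

```python
import ipaddress
from collections import defaultdict


def is_ip(user):
    """Return True if the user name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def domain_stats(additions):
    """Aggregate (user, domain, wiki) records per domain.

    The record layout is hypothetical; it only mirrors the idea of
    counting link additions and the distinct IPs that made them.
    """
    stats = defaultdict(lambda: {"additions": 0, "ips": set(), "wikis": set()})
    for user, domain, wiki in additions:
        entry = stats[domain]
        entry["additions"] += 1
        entry["wikis"].add(wiki)
        if is_ip(user):
            entry["ips"].add(user)
    return stats


def looks_like_hopping_spam(entry, min_additions=3):
    """Flag a domain when every addition came from a different IP."""
    return (
        entry["additions"] >= min_additions
        and len(entry["ips"]) == entry["additions"]
    )
```

A domain for which the two counts match across several wikis is a candidate for closer inspection, not proof of spam by itself.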

May 22 2019, 7:25 PM · Data-Services
Beetstra added a comment to T224154: Reduce size of linkwatcher db on toolsdb if at all possible.

That would be a great idea. Note that also my tool coibot would need to go
there, and that both need significant capacity to run (linkwatcher is
struggling on its current instance due to workload, and if coibot needs to
run there as well ...).

May 22 2019, 7:06 PM · Data-Services
Beetstra updated subscribers of T224154: Reduce size of linkwatcher db on toolsdb if at all possible.
May 22 2019, 6:53 PM · Data-Services
Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

Can someone please move all other bots away from the instance that runs linkwatcher? @valhallasw?

May 22 2019, 6:51 PM · Cloud-Services, Toolforge
Beetstra added a comment to T224154: Reduce size of linkwatcher db on toolsdb if at all possible.

@Bstorm: can you provide me with the names/IPs of the editors that were
spamming porhub.com (including the diffs of addition and on which wiki)?

May 22 2019, 6:42 PM · Data-Services
Beetstra added a comment to T224154: Reduce size of linkwatcher db on toolsdb if at all possible.

That would result in an immediate loss of functionality for the spam
fighting community.

May 22 2019, 6:05 PM · Data-Services

Apr 7 2018

Beetstra added a comment to T191686: Ability to blacklist based on sister wiki family or whitelist based on same.

See https://phabricator.wikimedia.org/T6459 and https://meta.wikimedia.org/wiki/2017_Community_Wishlist_Survey/Miscellaneous#Overhaul_spam-blacklist

Apr 7 2018, 5:21 AM · SpamBlacklist

Jul 3 2017

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@bd808 It is not that trivial, the new project would need to run coibot and linkwatcher, as they both do their share of analysis on the created db.

Jul 3 2017, 5:42 PM · Cloud-Services, Toolforge

Jul 2 2017

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw Do you mind clearing the instance that linkwatcher is on .. it does not have enough resources and starts to build up a backlog. It is currently on 1438. Thanks!

Jul 2 2017, 4:08 AM · Cloud-Services, Toolforge

Apr 18 2017

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw Would you mind making sure that linkwatcher is the only bot on 1403? I had to start it this morning; it apparently crashed. Thanks!

Apr 18 2017, 3:38 AM · Cloud-Services, Toolforge

Mar 29 2017

Beetstra added a comment to T115119: Create a feed or log of changed links on Wikimedia projects.

@Samwalton9, what do you mean by 'at some point'? Do you mean that this
has an enormous lag? We do see some effect in deterring spammers by acting
in real time (within minutes), many are hit and run editors, and I have
seen 'good faith spammers' with many warnings on many IPs complain that
they were never contacted ..

Mar 29 2017, 3:46 AM · Internet-Archive, MediaWiki-extensions-WikimediaEvents, Wikimedia-General-or-Unknown, The-Wikipedia-Library

Feb 15 2017

Beetstra added a comment to T115119: Create a feed or log of changed links on Wikimedia projects.

If there is a rc-feed of edits for testwiki, I could set up LiWa3 to feed
added links to a channel on freenode (have to figure that out, it is a
matter of changing on-wiki settings and some killing on the server, it is a
long time since I added a feed manually). I would say that if the rc-feed
processed an edit, that then the db should also be updated. In my
experience, external link searches on wikis are quickly updated (as fast as
a diff gets saved and reported to rc), and as far as I understand that
search is based on a separate table that gets updated after every diff. I
presume that that same hook would update the list of added and removed
links.

Feb 15 2017, 4:04 AM · Internet-Archive, MediaWiki-extensions-WikimediaEvents, Wikimedia-General-or-Unknown, The-Wikipedia-Library

Feb 10 2017

Beetstra added a comment to T157826: Spam blacklist should provide options like the Title blacklist.

This is probably better handled through T6459, a complete overhaul so it
is also easier to administer.

Feb 10 2017, 8:00 PM · Stewards-and-global-tools, SpamBlacklist

Feb 1 2017

Beetstra added a comment to T156074: webiron batched abuse reports 1/23/2017 for coibot and linkwatcher.

@Beetstra thank you for your proactive response. FWIW I don't see, and I don't currently think there is, any abusive behavior or wrongdoing on the part of the bots in question. Looking forward to further details from the reporter, but the language used in the emails doesn't leave me with the impression that this is a high-value reporter.

Feb 1 2017, 1:46 PM · Cloud-Services, Security

Jan 24 2017

Beetstra added a comment to T156074: webiron batched abuse reports 1/23/2017 for coibot and linkwatcher.

With hundreds of edits per minute to the 800 wikis that are checked ...
likely there are domains in every thinkable range ...

Jan 24 2017, 3:21 PM · Cloud-Services, Security
Beetstra added a comment to T156074: webiron batched abuse reports 1/23/2017 for coibot and linkwatcher.

Just to clarify:

Jan 24 2017, 5:48 AM · Cloud-Services, Security
Beetstra added a comment to T156074: webiron batched abuse reports 1/23/2017 for coibot and linkwatcher.

That is also one of my suspicions. The other one is that a domainowner
noticed that there is a bot requesting data from their site, and they want
to know whether that is/was legit ... or that the site itself got added a
lot in some tracking template in places where Linkwatcher and coibot would
notice (the latter being odd but not impossible). I would need to know
specific requests that triggered this now ...

Jan 24 2017, 4:10 AM · Cloud-Services, Security
Beetstra added a comment to T156074: webiron batched abuse reports 1/23/2017 for coibot and linkwatcher.

I also need to know what they see as harmful. Coibot and Linkwatcher are
checking added links for viability, whether they are redirects, and whether
they are containing typical 'money making schemes'. If they note a lot of
traffic, then those links are added to Wikipedia at a somewhat alarming
rate.

Jan 24 2017, 3:50 AM · Cloud-Services, Security

Jan 4 2017

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

hmm. Any idea how long those 3 python scripts will stay? linkwatcher will
munch away its backlog in time. Until the wikimedia linklog system comes
online I don't foresee a way of making linkwatcher smaller.

Jan 4 2017, 7:00 PM · Cloud-Services, Toolforge
Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw, would you mind clearing the instance linkwatcher is on? There
are three heavy python scripts there as well, and LiWa3 is building up a
massive backlog. Thanks.

Jan 4 2017, 5:26 PM · Cloud-Services, Toolforge

Dec 6 2016

Beetstra added a comment to T152316: Display filtered data in different categories.

If I understand it correctly, these basically give the possibility to search 'by link/domain', 'by username/IP' or 'by pagename', right? That sounds about the most important. The first two are how we generally search: we either know the domain and find the spammers, or we know a spammer and want to find the domains. The third one is useful to monitor typical pages spammers would hit. The system should be designed in such a way that the three searches can link to each other: if I am looking at a list of additions of a certain domain, with 3 different users adding them, I should have for each user a link to the 'search for this user', so I can snowball quickly.

Dec 6 2016, 4:03 AM · The-Wikipedia-Library

Nov 14 2016

Beetstra closed T150120: Perl module problems on 14## exec nodes as Resolved.

Now works.

Nov 14 2016, 11:06 AM · Toolforge, Cloud-Services
Beetstra added a comment to T150120: Perl module problems on 14## exec nodes.

The only other thing that has now changed is that I ran cpan to install LWP' - has that changed settings that now make everything run? Or did someone enforce a refresh of the modules serverwide - also the regular LWP::UserAgent now works ...

Nov 14 2016, 11:00 AM · Toolforge, Cloud-Services
Beetstra added a comment to T150120: Perl module problems on 14## exec nodes.

I resolved the first three

  • The regex problem is a Perl problem: it has apparently been set to be stricter (it is something all my bots complain about on those regexes, and it is known for Perl - either a newer version has become stricter, or a setting has changed in how the regex module is loaded).
  • The BSD::Resource problem disappears when I change the order of the called modules (funny - it suggests that some things get loaded in earlier modules that make later modules fail - already loaded versions of older modules that don't get reloaded with the next modules and which have a different 'version'?).
  • The getrlimit problem seems to have been resolved as well by reshuffling the module-calling ..
Nov 14 2016, 10:55 AM · Toolforge, Cloud-Services
Beetstra added a comment to T150120: Perl module problems on 14## exec nodes.

@valhallasw: you say ".. on an older Perl version doesn't work on a newer version anymore" .. there is a newer Perl version on the 14XX hosts (and also a new PHP for that matter)?

Nov 14 2016, 5:48 AM · Toolforge, Cloud-Services

Nov 13 2016

Beetstra added a comment to T150120: Perl module problems on 14## exec nodes.

Well, obviously, they are not the same as the 12xx nodes (see also a bug about sudden php errors on 14xx nodes that were not there on the 12xx nodes, bug T149810). These issues are also for me impossible to debug, as the perl errors were not there on the 12xx nodes, which apparently had everything correctly installed. Again, I did not change anything, yet everything crashes. Am I now to tell how the 14xx nodes are different from the 12xx nodes .. as you say libbsd-resource-perl is installed yet throws a 'not found' error ..

Nov 13 2016, 2:12 PM · Toolforge, Cloud-Services
Beetstra added a comment to T150120: Perl module problems on 14## exec nodes.

(I temporarily killed the bot (tools.xlinkbot) that is now non-functional; I expect others to become problematic in time.)

Nov 13 2016, 7:32 AM · Toolforge, Cloud-Services

Nov 9 2016

Beetstra added a comment to T6459: Create a special page to handle additions, removals, changes and logging of spam blacklist entries.

I am going to work out some thought experiment here. My suggestion to re-write the current spam-blacklist extension (or better, rewrite another extension):

  • take the current AbuseFilter, take out all the code that interprets the rule ('conditions').
  • Make 2 fields:
    • one text field for regexes that block added external links (the blacklist). Can contain many rules (one on each line).
    • one text field for regexes that override the block (whitelist overriding this blacklist field; that is generally simpler and cleaner than writing a complex regex, not everybody is a specialist on regexes).
  • Add namespace choice (checkboxes, so one can choose not to blacklist something in one particular namespace), with the addition of an 'all', a 'content-namespace only' and a 'talk-namespace only' option.
  • Add user status choice (checkboxes for the different roles, or like the page-protection levels)
    • Some links are fine in discussions but should not be used in mainspace, others are a total no-no
    • Some image links are fine in the file-namespace to tell where the image came from, but not needed in mainspace
  • Leave all the other options:
    • Discussion field for evidence (or better, a talk-page like function)
    • Enabled/disabled/deleted - not needed, turn it off, obsolete then delete
    • 'Flag the edit in the edit filter log' - maybe nice to be able to turn it off, to get rid of the real rubbish that doesn't need to be logged
    • Rate limiting - catch editors that start spamming an otherwise reasonably good link
    • Warn - could be a replacement for en:User:XLinkBot
    • Prevent the action - as is the current blacklist/whitelist function
    • Revoke autoconfirmed - make sure that spammers are caught and checked
    • Tagging - for combining certain rules to be checked by RC patrollers.
    • I would consider to add a button to auto-block editors on certain typical spambot-domains.
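
The core of the proposal above (a blacklist field, a whitelist field that overrides it, and a namespace choice) can be sketched as follows. This is a thought-experiment sketch in Python, not code from the SpamBlacklist or AbuseFilter extensions; the class name `LinkRule`, its fields, and the `blocks` method are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class LinkRule:
    """One hypothetical rule: regexes that block, regexes that override."""

    blacklist: List[str]                            # one regex per line in the proposal
    whitelist: List[str] = field(default_factory=list)
    namespaces: Optional[Set[int]] = None           # None means 'all namespaces'

    def blocks(self, url: str, namespace: int) -> bool:
        """Return True if adding `url` in `namespace` should be prevented."""
        # The rule does not apply outside its selected namespaces.
        if self.namespaces is not None and namespace not in self.namespaces:
            return False
        # A whitelist match overrides any blacklist match.
        if any(re.search(pattern, url) for pattern in self.whitelist):
            return False
        return any(re.search(pattern, url) for pattern in self.blacklist)


# Example rule: block example.com in mainspace (namespace 0),
# except for its /about page.
rule = LinkRule(
    blacklist=[r"\bexample\.com\b"],
    whitelist=[r"\bexample\.com/about\b"],
    namespaces={0},
)
```

Keeping the whitelist as a separate field means the blocking regex can stay simple, which is the point made in the list above about not everybody being a regex specialist.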
Nov 9 2016, 10:57 AM · Stewards-and-global-tools, SpamBlacklist

Nov 6 2016

Beetstra created T150120: Perl module problems on 14## exec nodes.
Nov 6 2016, 11:35 AM · Toolforge, Cloud-Services

Nov 3 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw The bot moved two days ago again, and I had to restart it now .. it is now on tools-exec-1417 (2 x edited comment).

Nov 3 2016, 3:28 AM · Cloud-Services, Toolforge

Oct 26 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw The bot yesterday moved to 1216. It is not backlogging, but maybe it is good to make sure other tasks do not run on this instance.

Oct 26 2016, 3:34 AM · Cloud-Services, Toolforge

Sep 7 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw - it crashed, and is now on 1213. Do you mind moving the other tasks (it is back making backlogs again)?

Sep 7 2016, 5:17 AM · Cloud-Services, Toolforge

Aug 25 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw can you please resubmit jobs on tools-exec-1203 .. linkwatcher seems to interfere with other scripts running there.

Aug 25 2016, 1:02 PM · Cloud-Services, Toolforge

Jun 1 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

Thanks, valhallasw.

Jun 1 2016, 3:32 AM · Cloud-Services, Toolforge

May 19 2016

Beetstra added a comment to T115119: Create a feed or log of changed links on Wikimedia projects.

@Sadads, @kaldari - if this is supposed to help the anti-spam efforts, this
should be standard enabled for ALL wikis.

May 19 2016, 3:50 AM · Internet-Archive, MediaWiki-extensions-WikimediaEvents, Wikimedia-General-or-Unknown, The-Wikipedia-Library

Feb 24 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw - I had to move the bot to another instance, it is now on 1205 (if I become linkwatcher I can't ssh to 1209, access denied).

Feb 24 2016, 3:54 AM · Cloud-Services, Toolforge

Feb 21 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw - the bot moved to 1209

Feb 21 2016, 3:31 AM · Cloud-Services, Toolforge

Jan 24 2016

Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

I'm working on that @MarcoAurelio - Now back online.

Jan 24 2016, 3:19 PM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge
Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw - the bot crashed (no clue why, it seems to have troubles with MySQL). I restarted it this morning, and it is now on 1215

Jan 24 2016, 3:45 AM · Cloud-Services, Toolforge

Jan 19 2016

Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Grr, I noted a bug on one of the counts (resolved) - it is now counting those and filling the proper table to reduce the counts. Re-indexing of the broken index is now done.

Jan 19 2016, 5:56 AM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Thank you. Not sure if I understand the situation with the privacy, you mean that there is no way to exclude the queries from other people which may contain information that I should not see - as the bot operator, I do know (in principle) which queries the bot runs.

Jan 19 2016, 4:05 AM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge
Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw: a good solution would be assigning 200-300% processor to the whole task. I found http://wiki.crc.nd.edu/wiki/index.php/Submitting_Batch/SGE_jobs - which suggests "-pe mpi-# #" would be the option .. (I'm not a specialist in this)

Jan 19 2016, 3:25 AM · Cloud-Services, Toolforge

Jan 18 2016

Beetstra added a comment to T123270: Make gridengine exec hosts also submit hosts.

Can this enforce that all sub-spawned processes are running on the same exec-host (or can the spawning command 'enforce' that)? As my bots are currently set up, the sub-processes communicate with the mother process through TCP, which means that they (at the moment) cannot communicate between exec hosts (this would help with T123121). (I could make the communication go through MySQL or files, but that would be quite a task.)

Jan 18 2016, 6:28 AM · Patch-For-Review, Toolforge

Jan 17 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

The bot is still eating away its (old) backlog, which goes slowly. The bot seems to operate fine now with far fewer processes. Still, it uses 200-250% of processor power, which seems to be necessary for a bot doing all this work. As mentioned earlier, we could consider a rewrite making the sub-processes run independently, or I could split the bot into three smaller bots - but both actions require significant rewrites for which I do not have time.

Jan 17 2016, 5:42 AM · Cloud-Services, Toolforge

Jan 14 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw - I have added 2 more parsers (total now 12) - the bot is creating a backlog, likely during the American daytime, which it does not munch away at night.

Jan 14 2016, 7:30 AM · Cloud-Services, Toolforge

Jan 12 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw: taking the number of parsers down from 10 to 8 resulted in formation of a backlog within 10 minutes. Trying 9 .. (the parsers are the processor intensive processes, the others hardly ever take more than 3% each, and often are 0).

Jan 12 2016, 11:12 AM · Cloud-Services, Toolforge
Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

@valhallasw - thank you for the lengthy explanation. This bot has now been running on labs for a long time (and sometimes for long uptimes without problems - it has at least once managed to run for more than 6 months in a row), and has been running smoothly here. The main thing that I see from running this system in a multi-bot environment is indeed the interaction with the other bots. When it was privately hosted, sometimes the other bots were 'munching' too much and the bot started lagging - I see that here as well (and obviously, and my apologies for that, the opposite also happens). In the early times of Labs, it did indeed run on its own instance for some time, both to avoid bringing down other bots and to avoid being brought down by other bots.

Jan 12 2016, 11:00 AM · Cloud-Services, Toolforge

Jan 11 2016

Beetstra added a comment to T123121: Linkwatcher spawns many processes without parent.

Valhallasw, it spawns many subprocesses to be able to keep up with
wikipedia editing. It needs to parse in real time as anti-spam bots and
work depend on it.

Jan 11 2016, 2:11 PM · Cloud-Services, Toolforge

Jan 5 2016

Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

@jcrespo. The last upgrade of the bot seems to have brought down the load significantly over the (my) night - doing successive 'show processlist;' statements does not show many queries running longer than 5 seconds, and hardly any longer than 10 seconds (which should now really happen less and less). When this bot ([[:m:User:LiWa3]]) is back up and running in full, I will turn my attention to the second bot ([[:m:User:COIBot]]) that makes heavy use of this db.

Jan 5 2016, 6:08 AM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge

Jan 4 2016

Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Working on it again. Some of the new counting mechanisms were not performing as requested, but that has now been updated.

Jan 4 2016, 12:24 PM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge

Dec 24 2015

Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Is that since this morning (UTC+3)?

Dec 24 2015, 5:59 PM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge

Dec 14 2015

Beetstra added a comment to T118042: Map different types of measurement for T115119 's schema.

It would be great if this could be a real-time IRC feed as well - as then http://en.wikipedia.org/wiki/User:XLinkBot can hook live into the feed and revert when conditions are met.

Dec 14 2015, 9:05 AM · The-Wikipedia-Library, Possible-Tech-Projects
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

@jcrespo: I have implemented new counting tables based on the three 'offending' queries above (and will implement if there are more, just tell me here what is being queried and I will devise a solution for it).
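
The 'counting tables' idea mentioned above, keeping a running count next to the raw records so that expensive COUNT queries are not needed, can be sketched with Python's built-in sqlite3 module. The real bots use MySQL on toolsdb and their schema is not given here, so the table names `link_additions` and `domain_counts` and all function names are hypothetical.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a raw table and a counting table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE link_additions (user TEXT, domain TEXT)")
    db.execute(
        "CREATE TABLE domain_counts (domain TEXT PRIMARY KEY, n INTEGER NOT NULL)"
    )
    return db


def record_addition(db, user, domain):
    """Store one link addition and bump its counter in the same transaction."""
    with db:  # commits on success, rolls back on an exception
        db.execute("INSERT INTO link_additions VALUES (?, ?)", (user, domain))
        db.execute("INSERT OR IGNORE INTO domain_counts VALUES (?, 0)", (domain,))
        db.execute(
            "UPDATE domain_counts SET n = n + 1 WHERE domain = ?", (domain,)
        )


def count_for(db, domain):
    """Read the precomputed count: a primary-key lookup, not a table scan."""
    row = db.execute(
        "SELECT n FROM domain_counts WHERE domain = ?", (domain,)
    ).fetchone()
    return row[0] if row else 0
```

The trade-off in this sketch is a slightly more expensive insert in exchange for a cheap read, which suits a workload with many repeated count lookups.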

Dec 14 2015, 7:15 AM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

@jcrespo: I was notified immediately, but unfortunately at the start of my weekend, with an email which is hardly telling me anything, just that the number of connections were restricted. I reacted immediately after the weekend, and it still took time to realise why the bots were affected by this. Moreover, the en.wikipedia policy that you quote regards the editing on-wiki, which was here not the problem (the main bot does not even edit on-wiki) - it was the database.

Dec 14 2015, 3:54 AM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge

Dec 13 2015

Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Let me have a manual look at the 'offending queries' one of these days .. see if I can reproduce. When WikiData started I had problems with three bots that ran at hundreds of edits per minute, which brought everything down. Maybe I have a similar problem here now.

Dec 13 2015, 2:10 PM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Well, with one connection the bots cannot run. LiWa3 uses something like 50 parallel processes (to keep up with the 600+ edits a minute), each with its own connection. COIBot adds a couple more. The first bot's main process takes the only connection, and the rest of the processes the bot spawns then crash the main bot.

Dec 13 2015, 12:32 PM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

@jcrespo - I receive complaints every time there is a CPU/IO spike from other users - that means that you knew for a long time there was a CPU/IO spike every now and then .. and you could have seen then which bot was causing that, and asked the bot-owner/maintainer.

Dec 13 2015, 12:03 PM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

@jcrespo .. what issue? What query is making this happen? It can't be the couple of hundred usual insert queries that the bots do, it must be one (or a couple) of the select queries. Do I have a broken query, do I have a query that is not optimised, or do I have a missing index on a table ...??

Dec 13 2015, 12:01 PM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

@yuvipanda, @jcrespo - with all respect, this has just completely brought the complete Wikipedia anti-spam effort to a near halt (I've taken the bots offline). It is fine that there are problems, and that those need to be solved, but it would be great if we finally would get a bit more consideration from the WMF (this is not the first time that unannounced and undiscussed actions from WMF bring down bots - a couple of months ago my bots went down for days because of an unannounced and very minor change in server output) - your databases will run just fine when there are no bot operators that are willing to use Labs. Thank you.

Dec 13 2015, 11:48 AM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge
Beetstra added a comment to T115119: Create a feed or log of changed links on Wikimedia projects.

hm, interesting idea but I don't think the data stored from this process would be used for this kind of searching. Rather, the data stored here would be analyzed and aggregated, then stored that way in a different table that would facilitate searching.

Dec 13 2015, 3:45 AM · Internet-Archive, MediaWiki-extensions-WikimediaEvents, Wikimedia-General-or-Unknown, The-Wikipedia-Library
Beetstra added a comment to T121094: Throttling linkwatcher tool user as it is consuming 100% CPU.

Please revert this. This is effectively killing the whole anti-spam effort on Wikipedia. The bot needs multiple user connections into the database.

Dec 13 2015, 3:25 AM · Stewards-and-global-tools, DBA, Cloud-Services, Toolforge

Nov 24 2015

Beetstra added a comment to T115119: Create a feed or log of changed links on Wikimedia projects.

Does this scheme also include a quick-searchable domain (it is unclear to me) - I mean storing the domain 'www.example.com' as 'com.example.www', as that greatly improves search speed for domains.
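
The reversed-domain storage asked about above can be illustrated with a short Python sketch. Reversing the labels turns 'all subdomains of example.com' into a prefix range over sorted keys, which an index can answer without scanning. The function names and the in-memory sorted list (standing in for an indexed database column) are assumptions for illustration.

```python
import bisect


def reverse_domain(domain):
    """Turn 'www.example.com' into 'com.example.www'."""
    return ".".join(reversed(domain.strip().lower().split(".")))


def find_with_subdomains(sorted_keys, domain):
    """Return the stored keys for `domain` and all of its subdomains.

    `sorted_keys` must be a sorted list of reversed domains. Subdomains
    share the prefix '<reversed domain>.', so they form one contiguous
    range that binary search can locate.
    """
    base = reverse_domain(domain)
    hits = []
    # Exact match on the domain itself.
    i = bisect.bisect_left(sorted_keys, base)
    if i < len(sorted_keys) and sorted_keys[i] == base:
        hits.append(base)
    # Every key starting with base + '.' sorts before base + '/',
    # because '/' is the character directly after '.' in ASCII.
    lo = bisect.bisect_left(sorted_keys, base + ".")
    hi = bisect.bisect_left(sorted_keys, base + "/")
    hits.extend(sorted_keys[lo:hi])
    return hits
```

In a database the same effect would come from an indexed prefix match on the reversed string, whereas a search on the original form needs a leading wildcard that an ordinary index cannot use.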

Nov 24 2015, 5:18 AM · Internet-Archive, MediaWiki-extensions-WikimediaEvents, Wikimedia-General-or-Unknown, The-Wikipedia-Library