Tool Name: anomiebot
Quota increase requested: Maybe +5 pods and +2 CPU? Details below.
Reason: T319557: Migrate anomiebot from Toolforge GridEngine to Toolforge Kubernetes
Since T319557 asks me to migrate AnomieBOT to Kubernetes, I looked at what the quotas are versus AnomieBOT's current usage on GridEngine. That current usage is:
ID      Bot                State          CPU   VMem   Peak    Max     % Queue
------- ------------------ ------- ---------- ------ ------ ------ ----- ---------------------------------------------------------------------
6020222 AnomieBOT-2        running   30:12:10  86.2M  95.2M 350.0M 24.6% continuous@tools-sgeexec-10-19.tools.eqiad1.wikimedia.cloud
6020223 AnomieBOT-3        running   51:03:51 138.0M 158.5M 350.0M 39.4% continuous@tools-sgeexec-10-8.tools.eqiad1.wikimedia.cloud
6020224 AnomieBOT-4        running 1028:38:41 120.0M 128.8M 512.0M 23.4% continuous@tools-sgeexec-10-19.tools.eqiad1.wikimedia.cloud
6020225 AnomieBOT-5        running    2:29:25 123.1M 140.3M 256.0M 48.1% continuous@tools-sgeexec-10-8.tools.eqiad1.wikimedia.cloud
1237289 AnomieBOT-7        running   12:23:31  94.0M 104.6M 256.0M 36.7% continuous@tools-sgeexec-10-20.tools.eqiad1.wikimedia.cloud
6017869 AnomieBOT-200      running    0:10:42  81.1M  81.6M 256.0M 31.7% continuous@tools-sgeexec-10-13.tools.eqiad1.wikimedia.cloud
6017870 AnomieBOT-500      running    0:15:11  80.3M  81.4M 256.0M 31.4% continuous@tools-sgeexec-10-13.tools.eqiad1.wikimedia.cloud
6017871 AnomieBOT-501      running    1:06:08  80.1M  80.8M 256.0M 31.3% continuous@tools-sgeexec-10-8.tools.eqiad1.wikimedia.cloud
6017872 AnomieBOT-999      running   16:24:00 235.0M 425.8M 512.0M 45.9% continuous@tools-sgeexec-10-11.tools.eqiad1.wikimedia.cloud
9999436 lighttpd-anomiebot running    0:20:05 175.2M 316.2M   4.0G  4.3% webgrid-lighttpd@tools-sgeweblight-10-19.tools.eqiad1.wikimedia.cloud
As I understand it, each of the 10 jobs there would be a "pod" in Kubernetes, and the default quota is 10 pods. That leaves no headroom for, e.g., the daily cron task that sends me a status email or the AnomieBOT-1 job used to run on-demand tasks.
The default 8Gi quota for memory seems like it should be fine, since AnomieBOT doesn't use a lot, especially if I can turn the webserver's request down when switching it to Kubernetes.
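For reference, a rough tally of the memory figures from the job listing above. This is only a sketch: it assumes the GridEngine "M" and "G" units can be compared directly with the Kubernetes "Mi" and "Gi" units, and the names `JOBS_MB` and `QUOTA_MB` are made up for the example.

```python
# (peak, limit) per job in megabytes, copied from the GridEngine listing.
JOBS_MB = {
    "AnomieBOT-2": (95.2, 350.0),
    "AnomieBOT-3": (158.5, 350.0),
    "AnomieBOT-4": (128.8, 512.0),
    "AnomieBOT-5": (140.3, 256.0),
    "AnomieBOT-7": (104.6, 256.0),
    "AnomieBOT-200": (81.6, 256.0),
    "AnomieBOT-500": (81.4, 256.0),
    "AnomieBOT-501": (80.8, 256.0),
    "AnomieBOT-999": (425.8, 512.0),
    "lighttpd-anomiebot": (316.2, 4096.0),  # 4.0G limit on the webserver
}

# Default 8Gi memory quota, treating 1G as 1024M (assumption).
QUOTA_MB = 8 * 1024

total_peak = sum(peak for peak, _ in JOBS_MB.values())
total_limit = sum(limit for _, limit in JOBS_MB.values())

print(f"sum of peaks:  {total_peak:.1f}M")
print(f"sum of limits: {total_limit:.1f}M of {QUOTA_MB}M quota")
```

Even carrying the current limits over unchanged stays under the quota, and more than half of that total is the webserver's 4.0G limit.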
As for CPU, that's where I could really use some advice. AnomieBOT-4 clearly does the most processing and could probably use 1 full CPU. The rest should be fine with fractions of a CPU, although 1/9 each seems likely to be too low. The +2 requested would be enough for 1/4 each plus 3/4 left over for overhead, but I'd be happy to have more. If you can point me at monitoring (Grafana?), that would also help me balance the allocations once I start switching over.
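The arithmetic behind the requested numbers can be sketched as follows. The default CPU quota of 2 is an assumption inferred from the "1/9 each" figure (1 CPU for AnomieBOT-4, 1 CPU split across the other nine jobs); the default of 10 pods is as stated above. All constant names are made up for the example.

```python
from fractions import Fraction

DEFAULT_CPU = 2    # assumption: inferred from "1/9 each", not a confirmed quota
EXTRA_CPU = 2      # the +2 CPU requested
DEFAULT_PODS = 10  # default pod quota as stated in this request
EXTRA_PODS = 5     # the +5 pods requested

HEAVY_CPU = 1      # AnomieBOT-4 gets one full CPU
OTHER_JOBS = 9     # the remaining nine jobs in the listing

# Under the assumed default quota, the other jobs would split what is left.
share_default = Fraction(DEFAULT_CPU - HEAVY_CPU, OTHER_JOBS)

# With the increase, give each of the other jobs 1/4 CPU and see what remains.
share_requested = Fraction(1, 4)
spare_cpu = (DEFAULT_CPU + EXTRA_CPU) - HEAVY_CPU - OTHER_JOBS * share_requested

# Pods: the 10 listed jobs, plus the daily cron task and AnomieBOT-1.
pods_needed = 10 + 2
spare_pods = (DEFAULT_PODS + EXTRA_PODS) - pods_needed

print(f"default share per job: {share_default} CPU")
print(f"spare CPU with +{EXTRA_CPU}: {spare_cpu}")
print(f"spare pods with +{EXTRA_PODS}: {spare_pods}")
```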
For background, AnomieBOT currently runs 40 separate tasks for enwiki. Rather than having 40 separate jobs, most of them usually idle but liable to become a thundering herd if they all wake at once, the tasks are divided among a small number of "runners" that each execute their tasks in series.
- AnomieBOT-1 runs on-demand tasks, if someone asks me to run one.
- AnomieBOT-2 runs 12 different clerking tasks, which generally all want to run at hourly, 4-hourly, or 6-hourly intervals.
- AnomieBOT-3 runs 14 continuous but not particularly time-sensitive tasks.
- AnomieBOT-4 runs a CPU-intensive task that runs pretty much continuously.
- AnomieBOT-5 runs a task which runs infrequently but is IO-bound when it runs, so I put it on a separate runner to avoid blocking tasks on -2 or -3.
- AnomieBOT-6 doesn't do anything right now; the task it used to run was discontinued.
- AnomieBOT-7 runs an IO-bound task that runs fairly continuously.
- AnomieBOT-200 runs 2 tasks that use the AnomieBOT II account (which has the templateeditor group).
- AnomieBOT-500 runs 2 tasks that use the AnomieBOT III account (which is an adminbot).
- AnomieBOT-501 runs a task using the AnomieBOT III account that needs particularly low latency.
- AnomieBOT-999 runs 6 tasks that operate under a "does not need specific approval" clause of enwiki's bot policy.
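To illustrate the runner model described above, here is a minimal sketch of one runner executing its due tasks in series. It is not AnomieBOT's actual code: `Task`, `Runner`, `run_due`, and the task names are all hypothetical, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    interval: float            # seconds between runs
    run: Callable[[], None]    # the work the task does
    next_due: float = 0.0      # time at which the task should next run


class Runner:
    """Runs its tasks one after another, never in parallel."""

    def __init__(self, name: str, tasks: List[Task]) -> None:
        self.name = name
        self.tasks = tasks

    def run_due(self, now: float) -> List[str]:
        """Run every task that is due at `now`, in order; return their names."""
        ran = []
        for task in self.tasks:
            if task.next_due <= now:
                task.run()
                task.next_due = now + task.interval
                ran.append(task.name)
        return ran


# Hypothetical runner with one hourly and one 4-hourly task.
runner = Runner("example-runner", [
    Task("hourly-clerk", 3600, lambda: None),
    Task("four-hourly-clerk", 4 * 3600, lambda: None),
])
first_pass = runner.run_due(now=0)
second_pass = runner.run_due(now=3600)
print(first_pass, second_pass)
```

Because each runner is a single job that works through its tasks one at a time, several tasks coming due together cost one pod's worth of CPU rather than one pod each.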