
Re-create poolcounter instance in Beta Cluster (deployment-prep)
Open, Needs Triage, Public

Description

Background

poolcounter05 and poolcounter06.deployment-prep.eqiad1.wikimedia.cloud were shut down in T370458: Remove or replace poolcounter06.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) despite being actively used by MediaWiki and Thumbor.

https://codesearch.wmcloud.org/search/?q=deployment-poolcounter

deployment-prep/deployment-imagescaler.yaml
thumbor::poolcounter_server: deployment-poolcounter05.deployment-prep.eqiad.wmflabs
wmf-config/LabsServices.php
		'poolcounter' => [
			'deployment-poolcounter06.deployment-prep.eqiad.wmflabs',
		],

Found this from T332015 and a simple question on my side is, does anyone know why this VM needs to exist? Aside from "this is how it is in production", […], isn't a very good answer.

It's not so much about the VM existing as about its absence making services behave substantially differently. E.g. MediaWiki with and without serving stale content and coalescing Parser invocations, MW CirrusSearch with and without throttling, and Thumbor (or MW image scaling) with or without throttling.

I imagine this is likely causing logspam at the moment, making it harder to diagnose other issues.
In terms of testing, this of course decreases the value of testing in Beta, e.g. thumbnails are generated concurrently without limits, which has often been the source of bugs in production and would benefit from behaving the same in Beta. Likewise, fast-serving stale ParserOutput is the kind of thing that can catch people off guard. The earlier people notice this, the better.
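To make the behavioral difference concrete, here is a minimal sketch of the idea, not the real PoolCounter protocol or MediaWiki's client code. `ToyPoolCounter` and `render` are hypothetical names invented for illustration, and the real service also queues waiting clients, which this sketch leaves out. It only shows that with a pool in place, a request that finds the pool full is served stale output, while without one every request does the expensive work.

```python
import threading


class ToyPoolCounter:
    """Hypothetical in-process stand-in; NOT the real PoolCounter protocol."""

    def __init__(self, max_workers):
        self.max_workers = max_workers
        self._active = {}  # key -> number of workers currently holding a slot
        self._lock = threading.Lock()

    def try_acquire(self, key):
        """Take a slot for `key` if one is free; return False when the pool is full."""
        with self._lock:
            if self._active.get(key, 0) >= self.max_workers:
                return False
            self._active[key] = self._active.get(key, 0) + 1
            return True

    def release(self, key):
        """Give back a slot previously taken with try_acquire."""
        with self._lock:
            self._active[key] -= 1
            if self._active[key] == 0:
                del self._active[key]


def render(key, pool, stale_cache, do_work):
    """Do the expensive work if a slot is free, otherwise serve stale output."""
    if pool is None:
        # No pool counter configured: every caller does the work, without limits.
        return do_work(key)
    if not pool.try_acquire(key):
        # Pool is full: fall back to stale content if any, else report busy.
        return stale_cache.get(key, "busy")
    try:
        result = do_work(key)
        stale_cache[key] = result
        return result
    finally:
        pool.release(key)
```

Passing `pool=None` models the current Beta situation as described above: nothing limits concurrency and the stale path is never taken, so code that only misbehaves on that path would not be exercised.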

Event Timeline

I am not the one who shut down that VM; I just offered a PoV in T370458: Remove or replace poolcounter06.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation). For that Beta VM to have been shut down, it probably means that no one showed up willing to claim it (and work on migrating it to Debian bullseye). WMCS went ahead with their policy to delete unclaimed VMs. See https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2024_Purge, where this is documented.

I wish I had a good solution, but I don't. Assuming that it is still true that no one is willing to claim the work of setting up a poolcounter VM in Beta, the only suggestion I can offer is to remove the VM from the configuration altogether. That should stop the logspam at least. As far as the different behavior goes, I can only acknowledge what you point out. I have no solution, other than suggesting to bring that up in T215217: deployment-prep (beta cluster): Code stewardship request. Maybe it can be put in the roadmap of the team.