Thu, Dec 5
I'll reach out to the maintainer for a Buster backport.
Wed, Dec 4
This is complete
2.190.3 is now available in thirdparty for jessie-wikimedia and thirdparty/ci for stretch-wikimedia. Given that the entire component is synced/updated by "reprepro update", this also updated docker-ce to 5:19.03.5 and containerd.io to 1.2.10-3.
The old puppetdb host (puppetdb1001) should be ready to go away; @jbond merged the patches to stop broadcasting to it last week. It also has 16G RAM, which would be freed up once that's done.
The systemd timer threw a number of errors on boron when trying to remove /var/cache/pbuilder/build/cow.6815/sys/devices* and proc/*: "Operation not permitted".
Tue, Dec 3
Mon, Dec 2
@RobH As you offered help in the SRE meeting last Monday, can you upgrade the firmware on cp3053?
As mentioned in last week's SRE meeting, let's upgrade the firmware on cp3053 to the latest revision.
Thu, Nov 28
Two nits: When reimaging the servers (or when it's done), please also update the Cumin aliases and update https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions
JFTR, there's no immediate hurry: it was removed from Debian unstable, i.e. Diamond will not be part of the next Debian release in 1.5 years, but this doesn't affect existing releases (removals from stable releases are only done in exceptional cases).
Wed, Nov 27
JFTR, Diamond has been removed from Debian as part of the Python 2 removal: https://packages.qa.debian.org/d/diamond/news/20191127T071040Z.html
Mon, Nov 25
Fri, Nov 22
@Andrew: I created an (untested) patch which should fix this; can you take it from here?
The metrics below were added in PDNS 4 and were not yet present in 3. They needed to be added to the metrics dictionary of the exporter in operations/debs/prometheus-pdns-exporter.
You can find docs on the specific metrics at https://doc.powerdns.com/authoritative/performance.html (from a quick glance they should all be of GaugeMetricFamily)
Thu, Nov 21
We already run a dedicated mirror host (mirrors.wikimedia.org, running on sodium) which currently mirrors Debian, Ubuntu and Tails (and which currently has 7.8 TB of free diskspace). I'm not sure if we have specific criteria for what we mirror, but it seems like a suitable candidate. Adding @faidon for comments.
Wed, Nov 20
Ah, that explains it; it was removed from Debian in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=932607
I don't see imposm2 packaged in Debian, which package would that be?
Tue, Nov 19
Mon, Nov 18
I could also use one for debugging with user credentials when we go live on the 2nd.
Thu, Nov 14
cp1077 might also be a totally different issue than cp3* (which are from the same model/generation/ordering batch); in kern.log on cp1077 there are two oopses from Nov 5, and it's not unlikely that this corrupted some internal state, which made the server eventually crash later.
I tried to narrow this down a bit, but no real luck:
Wed, Nov 13
Mon, Nov 11
This task is missing a rationale: what do we need it for on the non-labweb mw* servers? Is it for anything which will be rolled out in the future?
We're in the process of rolling out Apereo CAS (and initial services are getting migrated to it), see https://phabricator.wikimedia.org/T233921 and sub tasks.
Fri, Nov 8
Nov 8 2019
I removed a bunch of old data (e.g. trusty leftovers), we now have over 70G free diskspace again and I filed https://phabricator.wikimedia.org/T237713 for pruning old builds.
We use group-based access control and based on the description that means membership in the "restricted" group. I see that you've already "signed" https://phabricator.wikimedia.org/L3, so we're good to go once Corey approves and the three day waiting period (from https://wikitech.wikimedia.org/wiki/SRE_Clinic_Duty#Access_requests) has passed.
Nov 7 2019
One big disk space hog is the fact that we don't expire old builds in /var/cache/pbuilder/result/*; there are builds which date back to 2016. There's some value in keeping the last, say, six months of builds (e.g. for rollbacks), but anything older does not need to be retained. We could simply add a cron to purge old stuff; that should free ~40G or so.
The last remaining patch should be https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/525220/ (and some semi-related refactoring to switch more prod hosts to the new ldap Hiera settings).
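A purge job along these lines could look roughly like the following sketch. The 180-day cutoff approximates the six months mentioned above, and the function names and the dry-run default are illustrative assumptions, not an existing script on the build host.

```python
import time
from pathlib import Path


def find_stale_builds(result_dir, max_age_days=180, now=None):
    """Return regular files under result_dir not modified for max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for path in Path(result_dir).rglob("*"):
        # Skip directories and symlinks; only plain build artifacts are purged.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)


def purge_stale_builds(result_dir, max_age_days=180, dry_run=True, now=None):
    """Delete stale files; with dry_run=True nothing is removed."""
    stale = find_stale_builds(result_dir, max_age_days, now)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Calling `purge_stale_builds("/var/cache/pbuilder/result", dry_run=True)` first lists what would be removed; a cron job or systemd timer could then invoke it with `dry_run=False`.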
The systemd unit shipped in the Buster package is fine, this was specific to the Puppet Labs one, so closing.
This isn't "unfair", this task is about the update error in the package itself and there's absolutely nothing we can do about this. If you want to have some kind of image rebuild, open a task for that.
Closing the task; this was a one-time migration and jessie is on its way out.
Nov 6 2019
That's all, I'll add you tomorrow to the group.
@Cmjohnson: According to the DHCP logs on install1002, the server correctly assigned the IP address, but I suspect the error is caused by the OS here; it's currently configured for jessie, which probably doesn't support the new hardware. Those are intended to run buster anyway, so please adjust the config to use buster and retry: https://phabricator.wikimedia.org/T224563#5636773
JFTR, the mwdebug* servers are running 7.2.24 and can be used for additional tests.
Ack, searching by procurement properly addresses the batching aspect, I missed that before.
You are in a group which should allow you to edit dashboards (cn=wmf); did you log in (using the arrow to the right in the grey navigation bar on the left)? If you're logged in, are you getting some error message?
Nov 5 2019
Nov 4 2019
I've removed jeroendedauw from the wmde LDAP group. There were no NDA-sensitive Phab groups present.
The lldpd unit (lldpd.service) only depends on network.target, not network-online.target; per systemd-special(7), only the latter will postpone startup until the network interface is fully set up.
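For checking other units for the same ordering gap, a small sketch like the one below can help. It only reads `After=` lines from the `[Unit]` section and ignores drop-ins and line continuations; the sample unit text is made up for illustration and is not the actual lldpd.service.

```python
def ordering_targets(unit_text, key="After"):
    """Collect the values of `key` from the [Unit] section of a unit file."""
    section = None
    values = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Values are space-separated and accumulate across lines.
            values.extend(value.split())
    return values


def waits_for_network_online(unit_text):
    """True if the unit is ordered after network-online.target."""
    return "network-online.target" in ordering_targets(unit_text, "After")


# Hypothetical unit, used only to exercise the helpers above.
EXAMPLE_UNIT = """\
[Unit]
Description=Example daemon (hypothetical)
After=network.target

[Service]
ExecStart=/usr/sbin/example
"""

print(waits_for_network_online(EXAMPLE_UNIT))
```

A unit that needs the network to be fully configured would typically list network-online.target in both `After=` and `Wants=`; the same helper can check the latter by passing `key="Wants"`.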
You can mimic the existing thirdparty/kubeadm-k8s-docker.com component for wikimedia-stretch. And the binary name changed, it's docker-ce now.
Ack, just ping the task with the new SSH key once you've received your new computer.
Ticked off the relevant bits, closing.
I fixed the data.yaml entry
Oct 30 2019
Obviously the jessie kernel is built with the jessie GCC, but there's no toolchain change which would explain the difference. You can try installing the jessie kernel on stretch and vice versa for further tests, though.
There's no difference, the 4.9.189-3+deb9u1 and 4.9.189-3+deb9u1~deb8u1 are identical feature-wise.