User Details
- User Since
- Aug 2 2021, 1:52 PM (227 w, 16 h)
- Availability
- Available
- IRC Nick
- Emperor
- LDAP User
- MVernon
- MediaWiki User
- MVernon (WMF) [ Global Accounts ]
Thu, Dec 4
Per radosgw-admin user stats --uid=docker-registry there are only 11 objects in that account, which I think means it isn't currently being used.
Tue, Dec 2
@Jhancock.wm RAID rebuilt OK, server back in production. Thanks for your help here :)
Mon, Dec 1
@dancy it's been a while now, but I think we just moved trafficserver to specify tags: [wmcs] for all its jobs and that worked around the issue.
@Jhancock.wm please go ahead - server is depooled.
Tue, Nov 25
Great find, thank you!
Fri, Nov 21
I have solved the easy one, though: ecosia. If you do an image search there (e.g. https://www.ecosia.org/images?q=cattle) and find the Wikipedia hit (about the fourth row down), it's hard-coding the link to https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Cow_(Fleckvieh_breed)_Oeschinensee_Slaunger_2009-07-07.jpg/480px-Cow_(Fleckvieh_breed)_Oeschinensee_Slaunger_2009-07-07.jpg (though that does raise the question of where/how it's getting that from).
Thanks! I've spent a fair chunk of time searching and have come up with nothing. My next stop is likely #no-stupid-questions...
Thu, Nov 20
So 480 is quite common, but hasn't shown up in our search. I thought it might be instructive to check the referer:
select referer, count(*) as hits from wmf.webrequest where webrequest_source='upload' and year=2025 and month=10 and day=24 and hour=10 and http_status='200' and uri_path like '/wikipedia/%/thumb/%' and regexp_extract(uri_path, '([0-9]+)px[^/]+$')='480' group by referer order by hits desc LIMIT 10;
Gives us
| referer | hits |
|---|---|
| https://en.wikipedia.org/ | 664267 |
| https://de.wikipedia.org/ | 159180 |
| https://ru.wikipedia.org/ | 143781 |
| https://www.ecosia.org/ | 143604 |
| https://ja.wikipedia.org/ | 90948 |
| https://fr.wikipedia.org/ | 88355 |
| https://it.wikipedia.org/ | 75985 |
| https://es.wikipedia.org/ | 54362 |
| https://pl.wikipedia.org/ | 44896 |
| https://zh.wikipedia.org/ | 28721 |
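For illustration, the filter-and-count logic of the referer query above can be sketched in Python over an in-memory list of rows. This is a minimal sketch, not how wmf.webrequest is actually queried. The (uri_path, referer, http_status) tuple shape, the `top_referers` helper and the sample rows are all hypothetical. `fnmatch.fnmatchcase` stands in for SQL `LIKE`, which is equivalent here because the pattern only uses `%`. `re.search` is assumed to behave like Spark's `regexp_extract` for this simple pattern.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase

# Same thumb-size pattern as the Spark query: digits before "px" in the final path element.
THUMB_SIZE = re.compile(r'([0-9]+)px[^/]+$')


def top_referers(rows, size='480', limit=10):
    """Count referers for successful thumbnail requests of one width.

    rows: iterable of (uri_path, referer, http_status) tuples; a hypothetical
    in-memory stand-in for wmf.webrequest rows.
    """
    counts = Counter()
    for uri_path, referer, http_status in rows:
        if http_status != '200':
            continue
        # Equivalent of: uri_path LIKE '/wikipedia/%/thumb/%'
        if not fnmatchcase(uri_path, '/wikipedia/*/thumb/*'):
            continue
        m = THUMB_SIZE.search(uri_path)
        if m and m.group(1) == size:
            counts[referer] += 1
    # Like ORDER BY hits DESC LIMIT n (ties keep first-seen order).
    return counts.most_common(limit)


rows = [
    ('/wikipedia/commons/thumb/8/8c/Foo.jpg/480px-Foo.jpg', 'https://en.wikipedia.org/', '200'),
    ('/wikipedia/commons/thumb/8/8c/Foo.jpg/480px-Foo.jpg', 'https://en.wikipedia.org/', '200'),
    ('/wikipedia/commons/thumb/8/8c/Foo.jpg/480px-Foo.jpg', 'https://www.ecosia.org/', '200'),
    ('/wikipedia/commons/thumb/8/8c/Foo.jpg/330px-Foo.jpg', 'https://de.wikipedia.org/', '200'),
    ('/wikipedia/commons/thumb/8/8c/Foo.jpg/480px-Foo.jpg', 'https://de.wikipedia.org/', '404'),
    ('/wikipedia/commons/8/8c/Foo.jpg', 'https://fr.wikipedia.org/', '200'),
]
print(top_referers(rows))
# [('https://en.wikipedia.org/', 2), ('https://www.ecosia.org/', 1)]
```

The 404 row, the 330px row and the non-thumb path are all filtered out, mirroring the WHERE clause.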
(to be clear, the testwiki link above does give 500 as the smallest size, not the 400 I get with Commons. I don't know if that's expected.)
Wed, Nov 19
@BCornwall Pcre2 was first released in 2015. Pcre3 stopped receiving any upstream support (including security fixes) back in 2021, and I filed bugs against all packages depending on the obsolete pcre3 late in 2021. The initial aim had been to not ship pcre3 in Bookworm, but there were enough stragglers that the removal of pcre3 was delayed until the trixie development cycle. So while pcre3 was dropped from Debian in February of 2025 (and thus didn't go into trixie), this has been coming for quite some time. I wouldn't want to use pcre3 in any context involving untrusted input at this point. The author of pcre3 has handed over pcre maintenance, so the current maintainers have very little exposure to the old pcre3 code base.
Tue, Nov 18
Can we maybe adjust the set of sizes in the light of what is already in common use (cf T410304) - e.g. 400 is currently very uncommon compared to 500 or 330, so if we're trying to rationalise, it'd make sense to go with one of those rather than 400.
@RobH / @Jclark-ctr as I noted above, moss-be1002 can be done whenever, I'd just like to be told when you're going to do it, please.
It's about a 0.5% difference in the count for 250, which isn't a vast amount, but it's not nothing. And the ranking of the top 30 sizes by hits changes (at least 200 and 600 swap places; there are other shifts too, albeit not in the top 10). So I think it was worth spending a little time improving the query.
@Jhancock.wm I've just jbodded the new drive, and it seems good, thanks :)
Now obsoleted by a regexp-based approach (see https://phabricator.wikimedia.org/T410304#11383363)
A couple of notes on extracting thumbnail size from uri_path - a previous approach used
SELECT split(split(uri_path, '/')[7], 'px-')[0] as thumbsize
but this has a number of shortcomings, particularly that the array index of 7 is fragile, and incorrect for e.g. /archive/ thumbs. So I refined it somewhat: take the final path element, split that at px-, then split the first part on - and take the final element (thus coping with the prefix-NNNpx form you get with translated SVG files):
This still left a very few stragglers (15, mostly SVG files with URL-encodings in their names), which is likely good enough, but we can do better with a simple regexp:
select regexp_extract( slice(split(uri_path, '/'),-1,1)[0], '([0-9]+)px') as thumbsize, count(*) as hits from wmf.webrequest where webrequest_source = 'upload' and year = 2025 and month = 10 and day = 24 and http_status = '200' and uri_path like '/wikipedia/%/thumb/%' group by thumbsize order by hits desc;
This produces the same answers (modulo the 15 errors), is clearer, and only takes ~10% longer to run. Finally, of course, we can do the whole operation with a single regexp: match the thumb size as before, and then require that it be followed only by non-/ characters (i.e. it must be in the final path element):
select regexp_extract(uri_path, '([0-9]+)px[^/]+$') as thumbsize, count(*) as hits from wmf.webrequest where webrequest_source = 'upload' and year = 2025 and month = 10 and day = 24 and http_status = '200' and uri_path like '/wikipedia/%/thumb/%' group by thumbsize order by hits desc;
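The small Python comparison below shows why the fixed array index is fragile and how the two regexps behave. It is a sketch under assumptions. The sample paths (including the archive one, and all hash directories and file names) are illustrative and were constructed to follow the patterns described in this thread, not taken from real logs. Python's `re.search` is assumed to match Spark's `regexp_extract` for these simple patterns: first match, and an empty string when nothing matches.

```python
import re

WHOLE_PATH = re.compile(r'([0-9]+)px[^/]+$')   # final query: must be in the last path element
LAST_ELEMENT = re.compile(r'([0-9]+)px')       # earlier query: applied to the last element only


def first_group(pattern, s):
    """Rough analogue of Spark's regexp_extract(s, pattern, 1): '' when there is no match."""
    m = pattern.search(s)
    return m.group(1) if m else ''


def by_index_7(uri_path):
    """The original approach: split(split(uri_path, '/')[7], 'px-')[0]."""
    return uri_path.split('/')[7].split('px-')[0]


def by_last_element(uri_path):
    return first_group(LAST_ELEMENT, uri_path.split('/')[-1])


def by_whole_path(uri_path):
    return first_group(WHOLE_PATH, uri_path)


# Illustrative paths (hypothetical names and hash directories).
PLAIN = '/wikipedia/commons/thumb/8/8c/Foo.jpg/480px-Foo.jpg'
# Archive thumb: one extra path element, so index 7 is the original file name.
ARCHIVE = '/wikipedia/commons/thumb/archive/a/ab/20250101000000!Foo.jpg/250px-Foo.jpg'
# Translated SVG with a language prefix before the size.
LANG_SVG = '/wikipedia/commons/thumb/6/66/Bar.svg/langfr-250px-Bar.svg.png'
# "12px" inside the original file name must not confuse the whole-path regexp.
TRICKY = '/wikipedia/commons/thumb/1/1a/Icon_12px.png/330px-Icon_12px.png'

for path in (PLAIN, ARCHIVE, LANG_SVG, TRICKY):
    print(by_index_7(path), by_last_element(path), by_whole_path(path))
```

The index-7 version returns the archived file name for ARCHIVE and "langfr-250" for LANG_SVG. Both regexp versions return the size in all four cases. The `$` anchor stops the whole-path pattern from matching "12px" in the directory element of TRICKY.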
@Jhancock.wm the reimage had stalled again because puppet wasn't happy, once more because of an EFI/vfat partition on one of the spinning disks:
Notice: /Stage[main]/Profile::Swift::Storage::Configure_disks/Exec[mkfs-pci-0000:50:00.0-scsi-0:2:12:0]/returns: mkfs.xfs: /dev/disk/by-path/pci-0000:50:00.0-scsi-0:2:12:0-part1 appears to contain an existing filesystem (vfat).
I wiped that partition (and then the partition table of the drive), and then the reimage went OK.
Mon, Nov 17
@Jhancock.wm for each of ms-be209[0-3] the install was failing because puppet couldn't run, because one of the spinning disks had a vfat partition on it containing an EFI setup. In each case, wiping that filesystem (and the partition table of the drive) unwedged puppet and I could then reimage them OK.
Nov 6 2025
We want the final element of uri_path split by / (it's not a fixed length because of archive thumbs).
Then we take the string up to "px-" (which is usually just the size), hence splitting on "px-" and taking the first element.
The complication is that there are a number of prefixes that might come before the size (e.g. langfr-250px for a translated SVG file), so we then want the last element of that string split on '-'.
(spark 3.3.0 gains the split_part function, which would make this rather simpler)
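The three split steps above can be sketched in Python as below. This is an illustrative translation, not the query itself. Spark's `split` takes a regular expression, but "/", "px-" and "-" contain no regex metacharacters, so Python's literal `str.split` should behave the same here. The `thumbsize_split` name and the sample paths are hypothetical. The `page2-` form for multi-page documents is an assumption about how the page number appears in the thumb name.

```python
def thumbsize_split(uri_path: str) -> str:
    """Split-based thumb-size extraction (hypothetical helper name)."""
    # 1. Final element of the path (its index varies, e.g. for archive thumbs).
    last = uri_path.split('/')[-1]
    # 2. Everything before the first "px-": usually just the size, e.g. "250".
    before_px = last.split('px-')[0]
    # 3. Last '-'-separated piece, dropping prefixes such as "langfr-" or "page2-".
    return before_px.split('-')[-1]


print(thumbsize_split('/wikipedia/commons/thumb/6/66/Bar.svg/langfr-250px-Bar.svg.png'))  # 250
print(thumbsize_split('/wikipedia/commons/thumb/a/a1/Doc.pdf/page2-300px-Doc.pdf.jpg'))   # 300
```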
Nov 5 2025
All completed now.
@Jhancock.wm Thanks! We're all done here now :)
The necessary packages are now all available, and ms-be1088 managed to run puppet OK as a trixie host. Thanks, all :)
:(
IME the iDRAC basically never notices a bad disk. The kernel log above (and the Media error reported by perccli64) are all the errors I have.
It might still be caching, but https://apt-staging.wikimedia.org/wikimedia-staging/dists/trixie-wikimedia/main/binary-amd64/ says the Packages file hasn't been updated since 28 Oct (and it lacks e.g. python3-conftool, which should be there by now).
Nov 3 2025
@Jhancock.wm that's a good question, to which I don't have a good answer :-/ I think my inclination would be to go for a like-for-like replacement (if nothing else to avoid surprising ourselves later).
Hi @Jhancock.wm ms-be208[5-7] are now ready for you to swap their controllers, please. I've downtimed them for a couple of days, so please go ahead whenever suits you.
Cool, thank you :)
Thanks! Do we still have a suitable spare in stock, in the meantime?
Oct 31 2025
A further complication - some wikis (I've found at least fr and de) add a lang{fr,de,...} prefix to the thumb size, e.g. https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Armoiries_de_Marseille.svg/250px-Armoiries_de_Marseille.svg.png (I think this is done to all SVGs, presumably because some of them can be translated?); for multi-page documents, the page number gets inserted into the thumb path too.
Updated in the light of review from Android and iOS folks - only change to our list of sizes is the addition of 80.
@VRiley-WMF looks good now, thanks!
Oct 30 2025
@VRiley-WMF the host is up, but it can't reach any of its spinning disks (the OS sees none, and the BMC says 0 physical disks). Could you take another look, please?
(as suggested in the other task, it might also/instead be worth trying to build megacli for trixie; I've just not had time to do so as yet)
Oct 29 2025
Oct 27 2025
Yes, for each of those hosts you can depool by running depool from the host in question (and then pool afterwards). Thanks!
Oct 24 2025
I've not been able to reproduce the boot failure (except by cheating), but the underlying issue remains - the installer is installing the EFI System Partition onto only 1 of the two OS disks, and doesn't touch the equivalent partition on the other one. So as long as drive ordering (as seen by the installer) is consistent, everything is good. We've learned in the past that this isn't something to rely upon.
Oct 23 2025
[trixie does have the unofficial https://packages.debian.org/stable/admin/megactl but I don't know whether a) that works or b) we'd want to trust it]
megacli might have been copied to trixie, but it's useless there, because as you say it depends upon libncurses5, which isn't in trixie.
Oct 22 2025
(to answer the question - like all ms-* nodes, this will continue to be Debian 11 for now, although we might use it for a test install of Debian 13 before it's returned to service; it's partman/custom/ms-be_simple-efi.cfg or partman/custom/ms-be_simple.cfg as appropriate for UEFI/BIOS booting)
@VRiley-WMF the last two nodes ms-be1089 and ms-be1090 are ready for controller swap, please; I've downtimed them for a couple of days.
Oct 17 2025
Thanks; our weekly rclone job would have caught up with this on Monday, but it's nice to have it resolved sooner :)
Oct 16 2025
@elukey FWIW, feel free to wipe these disks (the host isn't in the swift rings ATM).
Looks good now, thanks :)
Oct 15 2025
So, the swift & Ceph nodes:
Oct 14 2025
Hi @VRiley-WMF I'm afraid not (filesystems still about 25% full, so a little way to go yet).
Hi @Jhancock.wm ms-be2083 looks great, thank you.
Oct 8 2025
@Jhancock.wm ms-be2083 and ms-be2084 are now ready to have their controllers swapped - can you do them, please? I've downtimed them for a couple of days.
The new fact works; the failure is because the following key packages are not available in trixie: python3-conftool, megacli, prometheus-statsd-exporter. So we can close this out, and I'll open a ticket with infra foundations about the missing packages.
Oct 7 2025
As expected, the Monday rclone copied this image across:
curl -o /dev/null -v --connect-to ::upload-lb.eqiad.wikimedia.org https://upload.wikimedia.org/wikipedia/commons/c/cf/Things_near_the_Nautical_Museum_of_Litochoro_10.jpg 2>&1 | grep "< HTTP"
< HTTP/2 200
Oct 6 2025
@LSobanski I worked round it for ms-be nodes; it was later re-opened by @jbond to look at the more general issue (see his comment above); I don't know if anything got done about that...
Oct 3 2025
As expected from the report, the object is in codfw, but not eqiad:
root@ms-fe1009:~# swift stat wikipedia-commons-local-public.cf c/cf/Things_near_the_Nautical_Museum_of_Litochoro_10.jpg
Object HEAD failed: http://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw/wikipedia-commons-local-public.cf/c/cf/Things_near_the_Nautical_Museum_of_Litochoro_10.jpg 404 Not Found
Failed Transaction ID: tx7ac2ef9f7c79486d84f80-0068dfb6b8
root@ms-fe2009:~# swift stat wikipedia-commons-local-public.cf c/cf/Things_near_the_Nautical_Museum_of_Litochoro_10.jpg
Account: AUTH_mw
Container: wikipedia-commons-local-public.cf
Object: c/cf/Things_near_the_Nautical_Museum_of_Litochoro_10.jpg
Content Type: image/jpeg
Content Length: 7341349
Last Modified: Tue, 30 Sep 2025 20:53:18 GMT
ETag: 6383419dbeec344b83cb353b472ab95f
Meta Sha1Base36: 957witx4ns8tlo71g1o7h4qsrpng7v9
X-Timestamp: 1759265597.26006
Accept-Ranges: bytes
X-Trans-Id: tx7c819eef97524e50a8617-0068dfb6b7
X-Openstack-Request-Id: tx7c819eef97524e50a8617-0068dfb6b7