User Details
- User Since
- Jul 26 2022, 2:11 PM (129 w, 6 d)
- Availability
- Available
- IRC Nick
- claime
- LDAP User
- Clément Goubert
- MediaWiki User
- CGoubert-WMF [ Global Accounts ]
Today
The rsyslog container was added to mercurius on the 7th https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1105800 but again, that doesn't line up with the beginning of the slope
I enabled logging for mw-jobrunner through rsyslog on the 13th https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1110786 but it looks like the increase of udp_localhost-info precedes that a bit?
Fri, Jan 17
Mon, Jan 13
Logs are now appearing in logstash, as well as benthos metrics in the Application Servers RED - k8s dashboard. Other metrics can be found in the mw-jobrunner service dashboard
While investigating whether this task could be closed, I realized we were not logging the same way in mw-jobrunner as in the rest of mw-on-k8s, meaning we didn't get benthos metrics for this deployment. The linked patch should fix that.
Wed, Jan 8
Re-prioritizing this to High, as the alert is actually critical, and would mask other wikifunctions httpbb alerts if they were to happen.
SGTM, thanks for doing the maths
Mon, Jan 6
@tstarling @TheDJ This has been flagged in T382517: PHP Warning seen by logspam-watch but not by mediawiki-errors logstash page and is due to the mw-videoscaler deployment missing the rsyslog sidecar. A patch is currently in review and will be deployed soon.
Dec 19 2024
There will be a deployment for the future mw-cron and maybe for mw-videoscaler (@hnowlan will be able to weigh in on this after the holidays), but we can handle that in separate issues. The main deployments of mediawiki are indeed covered. Resolving.
Dec 17 2024
FTR, direct link to that patch which is part of the work on T378458: Modernize code for the Translation notifications extension
Dec 16 2024
You probably also want to if-guard the service definition in templates/service.yaml.tpl
[...]
I see various references to mwscript and mercurius here, so if you (or someone) could let me know the current direction of travel regarding these two things, I'd be grateful.
Dec 13 2024
Dec 12 2024
The problem should be resolved for all private wikis now.
I've tweaked a rate limiting rule, could you please try again?
Sorry for the delay in responding.
Dec 11 2024
Dec 5 2024
Dec 2 2024
From T381252: legalteam wiki reliably returns 500s
Currently, any request to https://legalteam.wikimedia.org is returning 500 with this:
Uncaught MediaWiki\Config\ConfigException: Translate: Message group subscriptions (TranslateEnableMessageGroupSubscription) are enabled but Echo extension is not installed in /srv/mediawiki/php-1.44.0-wmf.5/extensions/Translate/src/HookHandler.php:438
This wiki is used in health checks, so needs to be fixed ASAP.
Nov 27 2024
Nov 26 2024
Nov 25 2024
Because of T375845: WikiKube clusters close to exhausting Calico IPPool allocations, putting these nodes in production needs to wait for T379599: Reevaluate the requirement for dedicated sessionstore/kask nodes in wikikube clusters to be completed, so that we have enough IP blocks to proceed.
Host reimaged, RAID ok, repooled
Nov 22 2024
I'm good with removing them as well.
Re-imaging because I accidentally overwrote the partition table on the good disk with the partition table on the new disk...
Nov 21 2024
wikikube-worker2159.codfw.wmnet is in C4 and blocked by the management switch being down
wikikube-worker2157.codfw.wmnet has the same issue as T380265: hw troubleshooting: Link down for wikikube-worker2140.codfw.wmnet
I have the same issue on wikikube-worker2157.codfw.wmnet, the interface in netbox is eno12409np1 but it has no link, whereas eno12399np0 does.
Nov 20 2024
@Papaul sorry for the misunderstanding, but it's not resolved. The interface that is supposed to have the link according to Netbox doesn't. I don't know if the best course of action is to change the connection in Netbox to be to eno12399np0 and reprovision the server?
Yes, eno12409np1 was the one where the IPs were originally mounted when I encountered the issue. In order to troubleshoot, I changed the config in /etc/network/interfaces to mount the IPs on eno12399np0, and that interface has the link up.
I just managed to mount the IP addresses on the other interface eno12399np0 and the link is up. Looks like the wrong one got provisioned?
All done and pooled except 2140, which is waiting on T380265: hw troubleshooting: Link down for wikikube-worker2140.codfw.wmnet
Nov 19 2024
Thanks @Jhancock.wm :)