User Details
- User Since
- Aug 14 2018, 10:50 AM (400 w, 3 d)
- Availability
- Available
- IRC Nick
- effie
- LDAP User
- Effie Mouzeli
- MediaWiki User
- EMouzeli (WMF) [ Global Accounts ]
Yesterday
Wed, Apr 15
Thu, Apr 9
Wed, Apr 8
While investigating a different problem, I found that we have a similar(?) issue when MediaWiki contacts the DBs, see T422489: rdbms errors in eqiad
I can't help noticing that the MediaWiki periodic job update-special-pages-s5 failed twice for the same reason, which is either a very unfortunate coincidence related to T422489: rdbms errors in eqiad, or something worth investigating.
I am closing this, since we are tracking that in T422486
MW-Interfaces-Team same questions for you for changeprop/cpjobqueue /api-gateway/Ratelimit:
Tue, Apr 7
Things look quite good so far for mw-cron (MediaWiki Periodic Jobs on k8s) after merging 1268569. More details in T422455#11795500
Infrastructure-Foundations two questions:
- how will netbox behave if it loses connectivity to its redis and then starts with a cold cache?
- do we have any concerns about updating to redis 8?
Take this with a grain of salt: it seems like something did indeed change during the week of March 17th, and if eqiad were not producing all those errors, we wouldn't have noticed.
failed jobs have been deleted, closing this too for T422486
I filtered timeouts from the mediamoderation-hourlyscan job in an attempt to establish whether we are seeing those timeouts more often after switching to eqiad.
@A_smart_kitten Thank you!
Setting aside any MediaWiki changes, the difference between the two DCs in the same time period (post codfw repooling) is alarming.
It seems like "EtcdConfig failed to fetch data: (curl error: 28) Timeout was reached" started sometime around March 17th, with eqiad exhibiting
Mon, Apr 6
Seems like a temporary network error
The nearest (timestamp-wise) log entry I found pointed to a temporary network problem. I don't know how long it takes for @phaultfinder to create a task, so I can't be absolutely sure this is the one.
This was due to temporary connection errors to the DB: https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-k8s-1-7.0.0-1-2026.04.05?id=Z0EgXJ0BbI6kJ8WywbVL
Due to an unfortunate coincidence, this issue caused a paging event.
Fri, Apr 3
I just realised this is actually two tasks
Updated. I also sorted the dashboard's variables to be in line with the other ones.
