The guy who fixes bugs. In daily life I am a Computer Science student specialising in Cyber Security & Cloud. I am the co-founder of Miraheze, a not-for-profit MediaWiki-based wiki farm. If you notice activity here, it's me either working on SRE tasks or reporting bugs on behalf of downstream (Miraheze).
User Details
- User Since
- Oct 12 2014, 7:12 AM (339 w, 3 h)
- Availability
- Available
- IRC Nick
- SPF|Cloud
- LDAP User
- Southparkfan
- MediaWiki User
- Southparkfan [ Global Accounts ]
Fri, Apr 2
Short update: using local puppet patches, I got the syslog forwarding working. Next step will be getting those patches into the production branch.
Mar 4 2021
@Bstorm thank you! I presume the project will be created shortly? Let me know if there is anything I can do.
Mar 2 2021
Feb 26 2021
Feb 25 2021
Yes, Grafana Loki is one of the newer solutions. I have no experience with Grafana Loki, but it may be the best solution for extracting metrics from logs, although I am not sure whether it meets any high availability / scalability requirements. Elastic is better known, but given the licensing issues, it's better to set that aside for now (if the software is forked, we can reconsider!). What do you think?
Feb 17 2021
Feb 15 2021
As the original reporter of T127656, could you use my help here? Central logging sounds like a fun but useful side project.
Feb 6 2021
Aug 25 2020
@MoritzMuehlenhoff understood. Patch set 4 will use the custom unit (with /run) on systems older than bullseye. On bullseye systems we'll be using the built-in unit.
Correct, see this:
Aug 24 2020
I have uploaded a new patch using /run on all servers (regardless of OS). However, what about removing our unit and using the built-in one? That reduces the likelihood of our unit conflicting with upstream's intended behavior. Running apt-get source nagios-nrpe=3.2.1-2 shows the following unit (debian/nagios-nrpe-server.service):
[Unit]
Description=Nagios Remote Plugin Executor
Documentation=http://www.nagios.org/documentation
After=var-run.mount nss-lookup.target network.target local-fs.target remote-fs.target time-sync.target
Before=getty@tty1.service plymouth-quit.service xdm.service
Conflicts=nrpe.socket
Aug 23 2020
@Ciencia_Al_Poder What do you mean by internal URLs? purgeList.php fetches the URLs using Title->getCdnUrls(), which seems to work fine in 1.34.2:
The PID directory has been changed to /run since Buster: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=932353#10
You're welcome.
To be precise, I wanted to ask where you got the 308 from.
I got that from T256276. However, https://gerrit.wikimedia.org/r/c/operations/puppet/+/612442/3/modules/dynamicproxy/files/domainproxy.lua says "ngx.HTTP_MOVED_PERMANENTLY", which is a 301. Either one is fine, though.
Aug 18 2020
Aug 11 2020
Aug 8 2020
Aug 7 2020
May 22 2020
Apr 10 2020
Mar 13 2020
Jan 11 2020
Definitely not working on this. Thanks for the help!
Dec 4 2019
Nov 4 2019
I feel my point was completely missed, regardless of the cause or fix: this change has a performance impact of roughly 28x. And since I have new questions, reopening.
Oct 17 2019
It doesn't seem to be limited to broken citations, {{Special:WhatlinksHere/{{FULLPAGENAME}}}} also seems enough to reproduce this.
Oct 15 2019
Aug 8 2019
gdnsd supports HTTP health checks (and custom ones as well). Provided the DNS TTL is no longer than a few minutes, 'automatic failover' is possible.
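As an illustrative sketch only (the service name and URL path are assumptions; check the gdnsd documentation for the exact option set in your version), an HTTP health check is defined as a service type in the gdnsd config:

```
# gdnsd config fragment (hypothetical values)
service_types => {
  www_health => {
    plugin => http_status,  # poll the address over HTTP
    url_path => /healthz,   # assumed health endpoint
    up_thresh => 20,        # consecutive successes before marking UP
    down_thresh => 10,      # consecutive failures before marking DOWN
  }
}
```

Addresses monitored with such a service type can then be failed over by a failover-capable plugin (e.g. multifo), which is what makes the low-TTL automatic failover work.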
Jun 10 2019
The tasks regarding loss of PSU redundancy on cp303[2689] are normal priority, does this one need to be high priority?
Mar 28 2019
Jan 25 2017
(correct datacenter)
Dec 30 2016
Nov 18 2016
The same issue is occurring again at a high rate, and is causing problems. Also, users are reporting that they are silently being logged out of the site. I'm not sure whether these are related, but I spotted this in the debug logs:
[session] Session "[50]CentralAuthSessionProvider<-:2:Southparkfan>[REDACTED]": Metadata merge failed: [Exception MediaWiki\Session\MetadataMergeException( /srv/mediawiki/w/includes/session/SessionProvider.php:195) Key "CentralAuthSource" changed]
#0 /srv/mediawiki/w/includes/session/SessionManager.php(629): MediaWiki\Session\SessionProvider->mergeMetadata(array, array)
#1 /srv/mediawiki/w/includes/session/SessionManager.php(498): MediaWiki\Session\SessionManager->loadSessionInfoFromStore(MediaWiki\Session\SessionInfo, WebRequest)
#2 /srv/mediawiki/w/includes/session/SessionManager.php(182): MediaWiki\Session\SessionManager->getSessionInfoForRequest(WebRequest)
#3 /srv/mediawiki/w/includes/WebRequest.php(700): MediaWiki\Session\SessionManager->getSessionForRequest(WebRequest)
#4 /srv/mediawiki/w/includes/session/SessionManager.php(121): WebRequest->getSession()
#5 /srv/mediawiki/w/includes/Setup.php(747): MediaWiki\Session\SessionManager::getGlobalSession()
#6 /srv/mediawiki/w/includes/WebStart.php(137): require_once(string)
#7 /srv/mediawiki/w/index.php(40): require(string)
#8 {main}
Nov 9 2016
Sep 17 2016
Is this the same as the recent behavior seen on various API appservers? https://ganglia.wikimedia.org/latest/?c=API%20application%20servers%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
Aug 24 2016
I wonder why @RobH and @Cmjohnson are talking about 4TB disks. The current problems are caused by the 4k (6TB) disks, and the accessories link given by Cmjohnson mentions a non-4k 6TB disk.
Aug 10 2016
Do we prefer a fallback that cannot be impacted by a Wikimedia outage of any kind? Conpherence is an option, but it is not off-site; a network outage affecting Wikimedia will render Conpherence useless.
Jul 31 2016
It's possible that my configuration is not up to date. Like Wikimedia, we use Redis for sessions (production 1.26 config):
$wgObjectCaches['redis'] = array(
    'class' => 'RedisBagOStuff',
    'servers' => array( '<IP>' ),
    'password' => $wmgRedisPassword,
);
$wgMainCacheType = 'redis';
$wgSessionCacheType = 'redis';
$wgSessionsInObjectCache = true;
$wgMessageCacheType = CACHE_NONE;
$wgParserCacheType = CACHE_DB;
$wgLanguageConverterCacheType = CACHE_DB;
Jul 27 2016
Jul 26 2016
Jul 15 2016
Jul 12 2016
Yes, it does.
Step 1 seems to be re-introducing the texvc package. Its deletion was actually unrelated when I created this issue; it just made this task more or less invalid, since the problem was mostly gone: nobody (except users who still had the package installed) would be hit by it anymore. That is a blocker for this task.
Jul 11 2016
@Andrew labvirt1012 lacks hyperthreading. Can you enable that?
Jul 3 2016
mw1017 and mw1099 are the eqiad MW debug hosts, so those should not be in conftool-data.
mw1161-mw1169 were previously appservers, but were converted to jobrunners in December, per rOPUPba0a47b56ded3dc748c436c2940f114389b312f2. Jobrunners are not present in conftool-data, so I guess everything is as expected?
@Kghbln yes, I am aware of that. I spoke to @MoritzMuehlenhoff, and he said that in hindsight the mediawiki-math-texvc package should never have been deleted. Perhaps it will be packaged again, but I'm not sure whether Debian's policies allow that, since it would have to be reintroduced in the jessie (stable) suite. Otherwise we'd need to wait until Debian 9 is released.
Jun 17 2016
@Joe mw1306 seems to have the same IP as mw1091 (although that one shows up as mw1091 in Ganglia, whereas mw1090 shows up as mw1305...). So I think you need to fix that one as well.
Jun 11 2016
Just tested this: it doesn't happen in Firefox 47.0 on Windows 8.1. Perhaps an upgrade to 47.0 might help, although I doubt it, since I can't see anything related to this in the 47.0 release notes.
Yep, that change (and the maintenance script) worked perfectly. Everything is displaying correctly now.
Jun 10 2016
cp1044 has been decommissioned per T133614
Jun 8 2016
As of PHP 5.5 (Jessie has 5.6), an opcache extension is included (it only needs to be enabled in the configuration), which caches compiled bytecode in memory. Together with the other performance improvements since 5.4 (assuming paymentswiki currently runs on 5.3), CPU usage and response times should already improve without HHVM, giving some more headroom before HHVM is really needed.
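As a hedged illustration (the file path follows Debian's conf.d layout but is an assumption; the directive names are the standard OPcache ones), enabling and sizing the cache is just an ini fragment:

```
; e.g. /etc/php5/fpm/conf.d/05-opcache.ini (path is an assumption)
zend_extension=opcache.so
opcache.enable=1
opcache.memory_consumption=128    ; MB of shared memory for compiled bytecode
opcache.max_accelerated_files=10000
opcache.validate_timestamps=1     ; re-check source files for changes
opcache.revalidate_freq=60        ; at most once per 60 seconds
```

The revalidation settings trade freshness for stat() overhead; on a deploy-managed server you can revalidate less often.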
Jessie has PHP 5.6 actually.
Jun 6 2016
I'm sure Legoktm is right :-)! But I failed to make it work, so honestly someone else should take it over...
It would be cool if this bug could be fixed (whatever the solution is).
May 31 2016
southparkfan@tools-exec-1407:~$ echo 'oht0ipe1Pho7aa7pohChie8eath0ogoo9Eesoh9nahc3aefoh7ie2ais6oohugoo' > reset_2fa.txt
southparkfan@tools-exec-1407:~$ cat reset_2fa.txt
oht0ipe1Pho7aa7pohChie8eath0ogoo9Eesoh9nahc3aefoh7ie2ais6oohugoo
southparkfan@tools-exec-1407:~$
May 30 2016
May 22 2016
@Paladox I've already upgraded to HHVM 3.12, which seems to work.
@Joe did you already install one of those servers? I noticed https://ganglia.wikimedia.org/latest/?c=Application%20servers%20eqiad&h=mw1305.eqiad.wmnet&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 (I can't see which specs the new servers have, but this 'mw1305' seems to have the same CPU/RAM specs as the old appservers?)
May 6 2016
@Cmjohnson yeah, perhaps I was a bit too quick in already doing the DNS part (though it seems that's the only thing I can do) :-)
May 5 2016
Just to learn how the process works, I've submitted a patch for the DNS adjustments. I noticed db1058 is referenced in the dhcpd and manifests/role/coredb.pp files in puppet but I have no idea how the latter one works, so I'll leave the puppet work to someone else.
Apr 28 2016
Apr 26 2016
Can this patch be backported to REL1_26, REL1_25 and REL1_23 too? The current security patch is only available in master, and these three branches still receive security support.
Apr 23 2016
Yeah, this doesn't seem to be a Varnish problem:
Apr 19 2016
Apr 17 2016
@Andrew mw1138 is not depooled (anymore); its CPU and network graphs show it is serving traffic. Looking at http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&h=mw1132.eqiad.wmnet&m=cpu_report&s=by+name&mc=2&g=network_report&c=API+application+servers+eqiad it was idle for a few hours, but then it somehow got repooled...
Apr 16 2016
Apr 15 2016
I have a setup where I tell Varnish to redirect all /.well-known/acme-challenge traffic to one backend server (practically any server with a webserver should work), so I can use this (acme-tiny).
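A minimal sketch of that idea, assuming Varnish 4+ VCL; the backend name and address are hypothetical, not my actual config:

```
# Hypothetical VCL fragment: route ACME HTTP-01 challenges to one backend.
backend acme_backend {
    .host = "10.0.0.5";   # assumed address of the host serving the challenge files
    .port = "80";
}

sub vcl_recv {
    if (req.url ~ "^/\.well-known/acme-challenge/") {
        set req.backend_hint = acme_backend;
        return (pass);    # don't cache challenge responses
    }
}
```

With this in place, any domain pointed at the Varnish edge can be validated by acme-tiny running on that single backend, without touching the per-site webserver configs.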