aaron (Aaron Schulz)
User

User Details

User Since
Oct 20 2014, 5:25 PM (174 w, 4 d)
Availability
Available
IRC Nick
AaronSchulz
LDAP User
Aaron Schulz
MediaWiki User
Aaron Schulz

Recent Activity

Yesterday

aaron added a comment to T187980: Memcached error "A TIMEOUT OCCURRED" for key "WANCache:v:enwiki:sidebar:en".

Since the value is "false", the callback runs, unless it's running somewhere else and there is no interim value. When this happens a lot in a short time, there will be interim values (lasting up to 30 sec) used, unless they also return false due to some memcached error. If everything returns false, then the callback runs all the time, regardless of the mutex. It won't be empty though.
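
To make the failure mode concrete, here is a rough Python sketch of one simplified reading of that logic. It is not the actual WANObjectCache code: FakeMemcached, get_with_callback, the "v:"/"m:"/"i:" prefixes (loosely modeled on the key in the task title) and LOCK_TTL are all assumptions for illustration. In this reading, a caller that cannot take the mutex serves the interim value if one exists and otherwise regenerates.

```python
import time

LOCK_TTL = 10      # assumed mutex lifetime in seconds (hypothetical value)
INTERIM_TTL = 30   # interim values last up to 30 sec, per the comment above


class FakeMemcached:
    """In-memory stand-in for memcached; every call returns False while failing."""

    def __init__(self):
        self.store = {}
        self.failing = False  # flip to True to simulate backend errors

    def get(self, key):
        entry = self.store.get(key)
        if self.failing or entry is None or entry[1] <= time.time():
            return False
        return entry[0]

    def set(self, key, value, ttl):
        if self.failing:
            return False
        self.store[key] = (value, time.time() + ttl)
        return True

    def add(self, key, value, ttl):
        # Only succeeds when the key is absent, which makes it usable as a mutex.
        if self.failing or self.get(key) is not False:
            return False
        return self.set(key, value, ttl)


def get_with_callback(cache, key, ttl, callback):
    value = cache.get("v:" + key)
    if value is not False:
        return value  # plain cache hit
    if not cache.add("m:" + key, 1, LOCK_TTL):
        # Someone else appears to hold the mutex; prefer their interim value.
        interim = cache.get("i:" + key)
        if interim is not False:
            return interim
        # No interim value either (or the backend is erroring): regenerate anyway.
    value = callback()
    cache.set("i:" + key, value, INTERIM_TTL)
    cache.set("v:" + key, value, ttl)
    return value
```

With a healthy backend the callback runs once and later calls hit the stored value. With failing set to True, every get() and add() returns False, so each call falls through to the callback, which matches the "runs all the time, regardless of the mutex" case.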

Thu, Feb 22, 11:08 PM · Performance-Team, Wikimedia-log-errors, MediaWiki-Cache, MediaWiki-Interface
aaron added a comment to T187980: Memcached error "A TIMEOUT OCCURRED" for key "WANCache:v:enwiki:sidebar:en".

I noticed that too yesterday. Note that there is a PECL memcached bug that causes things to say TIMEOUT after a KEY TOO LONG or VALUE TOO LARGE error, which makes for confusing failures and logs. I'm not sure if that is at play, but it wouldn't surprise me, and statistically it would affect the most-fetched keys (whatever they are).

Thu, Feb 22, 5:18 PM · Performance-Team, Wikimedia-log-errors, MediaWiki-Cache, MediaWiki-Interface
aaron added a comment to T187942: Replication lag detection broken in wmf.22.

The php warning is noise. The "Database is read-only" flood is an actual bug...no idea why that happened.

Thu, Feb 22, 3:18 AM · MW-1.31-release-notes (WMF-deploy-2018-02-20 (1.31.0-wmf.22)), User-notice, Patch-For-Review, Performance-Team, MediaWiki-Database, Wikimedia-log-errors
aaron added a comment to T187942: Replication lag detection broken in wmf.22.

$dbr->getLag() and $lb->getLagTimes() work fine in eval.php on wmf22 wikis as well.

Thu, Feb 22, 2:37 AM · MW-1.31-release-notes (WMF-deploy-2018-02-20 (1.31.0-wmf.22)), User-notice, Patch-For-Review, Performance-Team, MediaWiki-Database, Wikimedia-log-errors
aaron added a comment to T187942: Replication lag detection broken in wmf.22.

I've been looking at the 21->22 logs and changes, and trying things on mw.org. I don't see a read-only problem there and https://www.mediawiki.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb= looks fine. I don't see any lag in the DBs or seen by MW in that time (LoadBalancer graph at Grafana, though the resolution is low).

Thu, Feb 22, 2:32 AM · MW-1.31-release-notes (WMF-deploy-2018-02-20 (1.31.0-wmf.22)), User-notice, Patch-For-Review, Performance-Team, MediaWiki-Database, Wikimedia-log-errors

Tue, Feb 20

zeljkofilipin awarded T185328: "User should be able to change preferences" Selenium test fails when targeting mediawiki-vagrant a Party Time token.
Tue, Feb 20, 9:07 AM · MW-1.31-release-notes (WMF-deploy-2018-02-06 (1.31.0-wmf.20)), Patch-For-Review, Performance-Team (Radar), MediaWiki-Cache, MediaWiki-Vagrant, User-zeljkofilipin, Release-Engineering-Team (Kanban)

Sat, Feb 17

aaron updated the task description for T185664: FlaggedRevs: code stewardship review.
Sat, Feb 17, 9:06 PM · MediaWiki-extensions-FlaggedRevs, Code-Stewardship-Reviews

Thu, Feb 15

aaron closed T186947: many statistics have fallen to 0 on azwiktionary, ruwikiquote, and ptwikisource as Resolved.
Thu, Feb 15, 11:01 PM · MW-1.31-release-notes (WMF-deploy-2018-02-20 (1.31.0-wmf.22)), Patch-For-Review, Performance-Team, MediaWiki-General-or-Unknown, MediaWiki-Special-pages

Wed, Feb 14

aaron placed T169249: /usr/local/bin/xenon-generate-svgs and flamegraph.pl cronspam up for grabs.
Wed, Feb 14, 9:48 PM · Patch-For-Review, Performance-Team, Operations

Thu, Feb 8

aaron committed rESRXa24942ef371e: Add TTL to set() call (authored by aaron).
Add TTL to set() call
Thu, Feb 8, 8:51 PM
aaron committed rERXBa39a6a4ac147: Add TTL to set() call (authored by aaron).
Add TTL to set() call
Thu, Feb 8, 8:50 PM

Wed, Feb 7

aaron added a comment to T184854: hhvm memcached and php7 memcached extensions do not play well together.

I see, hhvm works with and without the flags, so they could be set in the background.

Wed, Feb 7, 10:23 PM · MW-1.31-release-notes (WMF-deploy-2018-02-20 (1.31.0-wmf.22)), Performance-Team, Patch-For-Review, User-ArielGlenn, NewPHP, MediaWiki-Platform-Team
aaron added a comment to T184854: hhvm memcached and php7 memcached extensions do not play well together.

Lots of keys use no value, 0, or TTL_INDEFINITE (all infinite), so there will be a lot of old keys.

Wed, Feb 7, 10:07 PM · MW-1.31-release-notes (WMF-deploy-2018-02-20 (1.31.0-wmf.22)), Performance-Team, Patch-For-Review, User-ArielGlenn, NewPHP, MediaWiki-Platform-Team
aaron added a comment to T186752: Swap objectcache table for MEMORY engine?.

MEMORY tables were kind of lame last time anyone checked, though I suppose someone can take a look. I doubt it would be too useful given a good innodb buffer pool size.

Wed, Feb 7, 9:50 PM · MediaWiki-Cache, MediaWiki-Database
aaron added a comment to T152934: Log accessing private information by those with 'abusefilter-private' permission.

Sorry about the slow review...this extension has a bit of an ownership problem, with random people stepping in for CR. I was thinking someone else would have merged this by now.

Wed, Feb 7, 9:48 PM · Epic, MW-1.31-release-notes (WMF-deploy-2018-02-13 (1.31.0-wmf.21)), Stewards-and-global-tools, Security-Team, AbuseFilter

Tue, Feb 6

aaron closed T185328: "User should be able to change preferences" Selenium test fails when targeting mediawiki-vagrant as Resolved.

Verified by local selenium test runs (passes with the fix and fails without the fix).

Tue, Feb 6, 11:56 PM · MW-1.31-release-notes (WMF-deploy-2018-02-06 (1.31.0-wmf.20)), Patch-For-Review, Performance-Team (Radar), MediaWiki-Cache, MediaWiki-Vagrant, User-zeljkofilipin, Release-Engineering-Team (Kanban)

Fri, Jan 26

aaron added a comment to T185328: "User should be able to change preferences" Selenium test fails when targeting mediawiki-vagrant.

Do these tests actually use replication, or is it a single DB server? Header logs would also be useful.

Fri, Jan 26, 8:00 PM · MW-1.31-release-notes (WMF-deploy-2018-02-06 (1.31.0-wmf.20)), Patch-For-Review, Performance-Team (Radar), MediaWiki-Cache, MediaWiki-Vagrant, User-zeljkofilipin, Release-Engineering-Team (Kanban)

Jan 17 2018

aaron added a comment to T161190: Configure External Store on Vagrant with blobs tables in databases named after wiki DB and proper isolation.

@aaron Am I right that having two databases with the same name (i.e. "enwiki where revision, page, etc. are" and "enwiki where blobs is") on the same machine requires two different MySQL data directories and ports/sockets?

Jan 17 2018, 10:35 PM · Patch-For-Review, MediaWiki-Database, MediaWiki-Vagrant
aaron added a comment to T185055: Stack overflow when Redis is down.

So I cannot contact redis via nutcracker on tin. I noticed the password was not actually set for redis (trying to AUTH when no password set results in an error); using CONFIG SET requirepass <x> didn't make a difference though. In any case, I can use redis-cli to talk to the local redis instance on 01/02 themselves. I'm not sure how much of this is nutcracker vs redis. Restarting either does not help.

Jan 17 2018, 6:07 PM · Beta-Cluster-Infrastructure, Performance-Team (Radar), MediaWiki-JobQueue, Operations, Beta-Cluster-reproducible

Jan 13 2018

aaron added a comment to T182322: ChronologyProtector breaks if two requests write different sets of databases.

Yes, https://gerrit.wikimedia.org/r/396546 .

Jan 13 2018, 8:55 PM · MW-1.31-release-notes (WMF-deploy-2018-01-16 (1.31.0-wmf.17)), Patch-For-Review, MediaWiki-Database, Wikidata, Performance-Team, User-Addshore, User-notice

Jan 10 2018

aaron added a comment to T171071: Perform testing for TLS effect on connection rate.

I fixed a stupid hostname var bug. Now I get numbers that make sense:

Same-DC (db2070.codfw.wmnet):
string(57) "0.001196186542511 sec/conn (non-SSL) [db2070.codfw.wmnet]"
string(60) "0.00027136325836182 sec/query (non-SSL) [db2070.codfw.wmnet]"
string(53) "0.059528641700745 sec/conn (SSL) [db2070.codfw.wmnet]"
string(56) "0.00028834581375122 sec/query (SSL) [db2070.codfw.wmnet]"
Cross-DC (db1055.eqiad.wmnet):
string(56) "0.10918385744095 sec/conn (non-SSL) [db1055.eqiad.wmnet]"
string(57) "0.03636349439621 sec/query (non-SSL) [db1055.eqiad.wmnet]"
string(52) "0.25189030647278 sec/conn (SSL) [db1055.eqiad.wmnet]"
string(54) "0.036419949531555 sec/query (SSL) [db1055.eqiad.wmnet]"
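
For reading these numbers, a small Python sketch (not part of the original test script; tls_overhead is a name made up here) that turns the figures above into the extra cost of SSL per connection and per query. By this arithmetic the handshake adds roughly 58 ms per same-DC connection and roughly 143 ms cross-DC, while per-query time barely moves.

```python
# Seconds per operation as (non-SSL, SSL), copied from the figures above.
timings = {
    "same-DC": {
        "conn": (0.001196186542511, 0.059528641700745),
        "query": (0.00027136325836182, 0.00028834581375122),
    },
    "cross-DC": {
        "conn": (0.10918385744095, 0.25189030647278),
        "query": (0.03636349439621, 0.036419949531555),
    },
}


def tls_overhead(plain, tls):
    """Return (extra seconds, slowdown factor) of SSL relative to non-SSL."""
    return tls - plain, tls / plain


for dc, ops in timings.items():
    for op, (plain, tls) in ops.items():
        extra, factor = tls_overhead(plain, tls)
        print(f"{dc} {op}: +{extra * 1000:.3f} ms ({factor:.2f}x)")
```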
Jan 10 2018, 12:09 AM · Patch-For-Review, Availability (Multiple-active-datacenters), DBA, Operations, Performance-Team

Jan 9 2018

aaron added a comment to T184529: Define a way to get a database connection based on a logical wiki ID..

I see wiki IDs as a type of "domain ID" that just uses two ASCII components, (dbname,prefix), neither using slashes, to avoid the ugliness of having things like "mysite?hnewswiki-en" appear in config or in "table_wiki" DB fields. For B/C, the non-slash rule can't be a hard rule that throws errors. Given that, the getWiki() functions should use known-to-be-encoded wiki ID values or use DatabaseDomain to derive them. There could be a stricter WikiDatabaseDomain subclass. Changing those methods would probably both fix and break things for the slash scenario; maybe the "doesn't use domain hierarchy delimiter character" restriction could then be enforced by default behind a flag that could be disabled for legacy mode.

Jan 9 2018, 11:00 PM · User-Daniel, MediaWiki-Database
RandomDSdevel awarded T182322: ChronologyProtector breaks if two requests write different sets of databases a Doubloon token.
Jan 9 2018, 1:42 AM · MW-1.31-release-notes (WMF-deploy-2018-01-16 (1.31.0-wmf.17)), Patch-For-Review, MediaWiki-Database, Wikidata, Performance-Team, User-Addshore, User-notice

Dec 14 2017

aaron added a comment to T171071: Perform testing for TLS effect on connection rate.

I keep coming up with times like:

Dec 14 2017, 9:55 PM · Patch-For-Review, Availability (Multiple-active-datacenters), DBA, Operations, Performance-Team
aaron moved T171071: Perform testing for TLS effect on connection rate from Blocked to Doing on the Performance-Team board.
Dec 14 2017, 9:45 PM · Patch-For-Review, Availability (Multiple-active-datacenters), DBA, Operations, Performance-Team

Dec 12 2017

aaron added a comment to T173450: Setup grafana alert for job error rate.

I started a quick dashboard at https://grafana.wikimedia.org/dashboard/db/job-queue-alerts?orgId=1&from=now-12h&to=now with some alerts.

Dec 12 2017, 11:29 PM · Performance-Team
aaron added a comment to T175672: Make apache/maintenance hosts TLS connections to mariadb work.

@aaron the proxy is installed but unconfigured, - we still have to fix some issues with the start and process, but do you want me to point it to the real master? Do you want me to point it to a soon to be setup master test host?

Dec 12 2017, 6:29 PM · Performance-Team (Radar), Availability (Multiple-active-datacenters), DBA, Operations

Dec 11 2017

aaron added a comment to T173450: Setup grafana alert for job error rate.

I suppose we can use jobrunner.runner-status.error.rate, sumSeries(jobrunner.pop.*.failed.*.rate), and sumSeries(jobrunner.pop.*.ok.*.rate) to make alerts in a Grafana dashboard.

Dec 11 2017, 9:53 PM · Performance-Team
aaron closed T182390: 2017-12-07 Huge SaveTiming spike as Resolved.

Yeah, same thing.

Dec 11 2017, 7:38 PM · Performance-Team

Dec 8 2017

aaron added a comment to T182322: ChronologyProtector breaks if two requests write different sets of databases.

I'm not sure why the time check logic is so complicated, I guess it got prematurely generalized from the single-DB case.

Dec 8 2017, 8:33 PM · MW-1.31-release-notes (WMF-deploy-2018-01-16 (1.31.0-wmf.17)), Patch-For-Review, MediaWiki-Database, Wikidata, Performance-Team, User-Addshore, User-notice

Dec 7 2017

aaron updated the task description for T151466: Performance Q2 2017/18 goal: Install and use mcrouter in deployment-prep.
Dec 7 2017, 6:53 AM · Release-Engineering-Team (Watching / External), Availability (Multiple-active-datacenters), Beta-Cluster-Infrastructure, Performance-Team

Dec 6 2017

Envlh awarded T181385: Wikidata entity dumpers stuck with 100% CPU on snapshot1007 a Heartbreak token.
Dec 6 2017, 8:50 AM · MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Performance-Team, Wikidata, Datasets-General-or-Unknown

Dec 5 2017

aaron moved T181385: Wikidata entity dumpers stuck with 100% CPU on snapshot1007 from Doing to Blocked on the Performance-Team board.
Dec 5 2017, 11:36 PM · MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Performance-Team, Wikidata, Datasets-General-or-Unknown
aaron closed T178531: Add statsd metric to WANObjectCache as Resolved.

Probably some MW fixes actually reaching production.

Dec 5 2017, 11:07 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Patch-For-Review, MediaWiki-Cache, monitoring, Performance-Team
aaron added a comment to T180035: MediaWiki core Selenium tests fail when targeting Vagrant.

Does this still occur?

Dec 5 2017, 4:19 AM · Performance-Team (Radar), MediaWiki-Cache, MediaWiki-Vagrant, Release-Engineering-Team (Kanban), User-zeljkofilipin
aaron closed T180793: Frequent "Wikimedia\\Rdbms\\DatabaseMysqlBase::lock failed to acquire lock" errors on WMF mediawiki logs as Resolved.
Dec 5 2017, 4:17 AM · MW-1.31-release-notes (WMF-deploy-2017-12-05 (1.31.0-wmf.11)), Patch-For-Review, DBA, Wikimedia-log-errors, Performance-Team
aaron added a comment to T180793: Frequent "Wikimedia\\Rdbms\\DatabaseMysqlBase::lock failed to acquire lock" errors on WMF mediawiki logs.

By reducing the lock max wait times and pushing the brunt of lag waits out of the critical section, less real time should be wasted.

Dec 5 2017, 4:17 AM · MW-1.31-release-notes (WMF-deploy-2017-12-05 (1.31.0-wmf.11)), Patch-For-Review, DBA, Wikimedia-log-errors, Performance-Team

Dec 4 2017

aaron moved T181385: Wikidata entity dumpers stuck with 100% CPU on snapshot1007 from Inbox to Doing on the Performance-Team board.
Dec 4 2017, 9:00 PM · MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Performance-Team, Wikidata, Datasets-General-or-Unknown

Dec 2 2017

aaron added a comment to T181385: Wikidata entity dumpers stuck with 100% CPU on snapshot1007.

Change 394779 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Try to opportunistically flush statsd data in maintenance scripts

https://gerrit.wikimedia.org/r/394779

Dec 2 2017, 9:20 PM · MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Performance-Team, Wikidata, Datasets-General-or-Unknown
aaron added a comment to T178531: Add statsd metric to WANObjectCache.

There is some caller that is not making keys correctly, which causes this. I can't find any more looking through all of core and extensions and mediawiki-config.

Thanks for looking into it! In the meantime I've blackholed said metrics to avoid graphite disks filling up

Dec 2 2017, 8:27 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Patch-For-Review, MediaWiki-Cache, monitoring, Performance-Team
aaron added a comment to T178531: Add statsd metric to WANObjectCache.

Change 394493 merged by jenkins-bot:
[mediawiki/core@wmf/1.31.0-wmf.10] Add temporary logging for bad WAN cache statsd keys

https://gerrit.wikimedia.org/r/394493

Dec 2 2017, 8:26 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Patch-For-Review, MediaWiki-Cache, monitoring, Performance-Team
aaron added a comment to T181385: Wikidata entity dumpers stuck with 100% CPU on snapshot1007.

How long do these run? The sample rate in config is set to be extremely low. So perhaps:

  • The buffering class buffers things that won't even be saved
  • The buffering could be disabled in CLI mode
Dec 2 2017, 8:18 PM · MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Performance-Team, Wikidata, Datasets-General-or-Unknown

Nov 29 2017

aaron added a comment to T180035: MediaWiki core Selenium tests fail when targeting Vagrant.

I noticed a worse bug of cpPosTime cookies not being used (not related to WAN cache). The patch for that is above.

Change 393983 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Make ChronologyProtector actually use cpPosTime cookies

https://gerrit.wikimedia.org/r/393983

Nov 29 2017, 5:51 AM · Performance-Team (Radar), MediaWiki-Cache, MediaWiki-Vagrant, Release-Engineering-Team (Kanban), User-zeljkofilipin
aaron added a comment to T180035: MediaWiki core Selenium tests fail when targeting Vagrant.

I noticed a worse bug of cpPosTime cookies not being used (not related to WAN cache). The patch for that is above.

Nov 29 2017, 4:52 AM · Performance-Team (Radar), MediaWiki-Cache, MediaWiki-Vagrant, Release-Engineering-Team (Kanban), User-zeljkofilipin
aaron added a comment to T180035: MediaWiki core Selenium tests fail when targeting Vagrant.

The simple thing is to not set INTERIM keys in the same request that purged them. The duration of that rule would be HOLDOFF_TTL so that the array holding the purged keys doesn't get too big for long-running maintenance scripts. This can be done with a HashBagOStuff nested in the WAN cache object easily enough.
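
As a rough illustration of that idea (a Python sketch, not the MediaWiki patch; RecentPurges, its method names and the 11-second default are assumptions), a per-request map of purged keys with expiry could look like this:

```python
import time

HOLDOFF_TTL = 11  # seconds; assumed value for illustration only


class RecentPurges:
    """Per-process record of keys purged in this request (hypothetical helper).

    Entries expire after `ttl` seconds so the map stays small in
    long-running maintenance scripts.
    """

    def __init__(self, ttl=HOLDOFF_TTL, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._expiry = {}  # key -> time at which the purge record lapses

    def _prune(self):
        now = self.clock()
        self._expiry = {k: t for k, t in self._expiry.items() if t > now}

    def mark_purged(self, key):
        self._prune()
        self._expiry[key] = self.clock() + self.ttl

    def may_set_interim(self, key):
        # An interim value is allowed only if this process did not purge
        # the key within the last `ttl` seconds.
        self._prune()
        return key not in self._expiry
```

A cache wrapper would call mark_purged() from its delete path and consult may_set_interim() before writing an interim value. The injectable clock is only there to make the expiry testable.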

Nov 29 2017, 12:52 AM · Performance-Team (Radar), MediaWiki-Cache, MediaWiki-Vagrant, Release-Engineering-Team (Kanban), User-zeljkofilipin
aaron added a comment to T180035: MediaWiki core Selenium tests fail when targeting Vagrant.

This looks like an integration issue with ChronologyProtector vs WANObjectCache.

Nov 29 2017, 12:43 AM · Performance-Team (Radar), MediaWiki-Cache, MediaWiki-Vagrant, Release-Engineering-Team (Kanban), User-zeljkofilipin

Nov 28 2017

aaron added a comment to T178531: Add statsd metric to WANObjectCache.

I guess we will need MW side logging now. Probably can just add it to wmf branch.

Nov 28 2017, 5:14 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Patch-For-Review, MediaWiki-Cache, monitoring, Performance-Team
aaron created T181528: Document the various parameters of each Job class.
Nov 28 2017, 5:02 PM · Documentation, MediaWiki-JobQueue

Nov 27 2017

aaron added a comment to T178531: Add statsd metric to WANObjectCache.

There is some caller that is not making keys correctly, which causes this. I can't find any more looking through all of core and extensions and mediawiki-config.

Nov 27 2017, 7:59 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Patch-For-Review, MediaWiki-Cache, monitoring, Performance-Team

Nov 23 2017

aaron renamed T181216: Get rid of pointless EnqueueJob usage from Get rid of enqueue job to Get rid of pointless EnqueueJob usage.
Nov 23 2017, 10:03 AM · MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review, Services (done), MediaWiki-JobQueue
aaron added a comment to T181216: Get rid of pointless EnqueueJob usage.

They were mentioned in https://www.mediawiki.org/wiki/Requests_for_comment/Master-slave_datacenter_strategy_for_MediaWiki#Job_queuing though it was never set up (partly from people being busy with other things). In general, jobs are enqueued on POST requests or from other jobs, all in the master datacenter. In rare cases, jobs are enqueued on GET, or possibly on POST if the api-promise-nonwrite thing is set up in VCL. This should work in a way where the cross-DC propagation is async, rather than having JobQueue::push() block on cross-DC traffic.

Nov 23 2017, 10:01 AM · MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review, Services (done), MediaWiki-JobQueue

Nov 21 2017

jcrespo awarded T180793: Frequent "Wikimedia\\Rdbms\\DatabaseMysqlBase::lock failed to acquire lock" errors on WMF mediawiki logs a Doubloon token.
Nov 21 2017, 11:57 AM · MW-1.31-release-notes (WMF-deploy-2017-12-05 (1.31.0-wmf.11)), Patch-For-Review, DBA, Wikimedia-log-errors, Performance-Team

Nov 20 2017

aaron moved T180793: Frequent "Wikimedia\\Rdbms\\DatabaseMysqlBase::lock failed to acquire lock" errors on WMF mediawiki logs from Next-up to Doing on the Performance-Team board.
Nov 20 2017, 9:23 PM · MW-1.31-release-notes (WMF-deploy-2017-12-05 (1.31.0-wmf.11)), Patch-For-Review, DBA, Wikimedia-log-errors, Performance-Team
aaron moved T151466: Performance Q2 2017/18 goal: Install and use mcrouter in deployment-prep from Next-up to Doing on the Performance-Team board.
Nov 20 2017, 9:23 PM · Release-Engineering-Team (Watching / External), Availability (Multiple-active-datacenters), Beta-Cluster-Infrastructure, Performance-Team

Nov 17 2017

aaron added a comment to T171881: CL support for Wikipedia Zero piracy problems.

On the MW side of (2) above, it appears the SwiftFileBackend code in MW uses PHP's urlencode to transform the filenames into upload URL paths. urlencode documentation claims that it percent-encodes everything but alphanumerics and -_. (so the set it does not encode is almost the official Unreserved Set, but it's missing the tilde). It also encodes spaces as + rather than %20 because it's meant for query strings rather than paths. PHP's rawurlencode would probably have been more appropriate here as it conforms to the RFC and excludes from encoding exactly the Unreserved Set and doesn't do the +-for-spaces thing. However, in practice, we can deal with the ~ issue and spaces have already been made into underscores, so the plusses shouldn't ever actually appear.

Regardless, this explanation seems consistent with observations of the upload.wm.o paths I've seen. We can normalize on similar rules there (but leave spaces as %20 just to be technically-correct, which again won't matter in practice). If at some later date we want to use a prettier normalization we can do that, too, but for now it would be simplest to leave the MediaWiki side alone and just conform everything else to its expectations.
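
The query-string versus path distinction can be seen with Python's standard library, used here only as an analogy for PHP's urlencode and rawurlencode (assuming Python 3.7 or later; form_style and path_style are names made up for this sketch). One difference to keep in mind: Python's quote_plus leaves the tilde alone, whereas PHP's urlencode, as described above, encodes it.

```python
from urllib.parse import quote, quote_plus


def form_style(name):
    # Query-string style: spaces become '+', roughly like PHP's urlencode.
    return quote_plus(name, safe="")


def path_style(name):
    # Path style: spaces become '%20', roughly like PHP's rawurlencode.
    return quote(name, safe="")


for title in ("My file~1.pdf", "My_file~1.pdf"):
    print(title, "->", form_style(title), "|", path_style(title))
```

Once spaces have been turned into underscores, both styles produce the same string, which is why the plus-for-space behaviour should not matter in practice.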

Nov 17 2017, 11:26 PM · Patch-For-Review, Community-Liaisons (Oct-Dec 2017), Zero

Nov 16 2017

aaron added a comment to T178849: Click on fullImageLink <a> for PDF on File: page no longer rendering in browser.

So, the post_as_copy = true case works if SwiftFileBackend no longer blacklists Content-Type for non-PUTs. It would always re-assert the old value if nothing was passed in by the describe() caller.

Nov 16 2017, 8:41 PM · MediaWiki-File-management, Commons, MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Patch-For-Review, Regression, media-storage, Multimedia, Multimedia-Team-Working-Board
aaron added a comment to T178849: Click on fullImageLink <a> for PDF on File: page no longer rendering in browser.

We should be mindful of the Swift post_as_copy option when set to false. At the moment that does *not* allow changing Content-Type via POST.

Nov 16 2017, 8:24 PM · MediaWiki-File-management, Commons, MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Patch-For-Review, Regression, media-storage, Multimedia, Multimedia-Team-Working-Board

Nov 8 2017

aaron closed T178531: Add statsd metric to WANObjectCache as Resolved.
Nov 8 2017, 8:35 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Patch-For-Review, MediaWiki-Cache, monitoring, Performance-Team
aaron closed T179999: CentralAuthUser::loadFromCache doesn't call the makeKey() methods as needed as Resolved.
Nov 8 2017, 8:35 PM · MW-1.31-release-notes (WMF-deploy-2017-10-31 (1.31.0-wmf.6)), MediaWiki-extensions-CentralAuth, Patch-For-Review, Performance-Team
aaron closed T179999: CentralAuthUser::loadFromCache doesn't call the makeKey() methods as needed, a subtask of T178634: 1.31.0-wmf.7 deployment blockers, as Resolved.
Nov 8 2017, 8:35 PM · RelEng-Archive-FY201718-Q2, Train Deployments, Release
aaron created T179999: CentralAuthUser::loadFromCache doesn't call the makeKey() methods as needed.
Nov 8 2017, 2:50 AM · MW-1.31-release-notes (WMF-deploy-2017-10-31 (1.31.0-wmf.6)), MediaWiki-extensions-CentralAuth, Patch-For-Review, Performance-Team

Oct 31 2017

aaron updated the task description for T151466: Performance Q2 2017/18 goal: Install and use mcrouter in deployment-prep.
Oct 31 2017, 10:27 PM · Release-Engineering-Team (Watching / External), Availability (Multiple-active-datacenters), Beta-Cluster-Infrastructure, Performance-Team
aaron added a comment to T151466: Performance Q2 2017/18 goal: Install and use mcrouter in deployment-prep.

So, running mcrouter via screen -r with the config in /etc/mcrouter/mcrouter.json on tin seems to work fine. The pool replication works and the timings are comparable to twemproxy -- often better than twemproxy.

Oct 31 2017, 10:21 PM · Release-Engineering-Team (Watching / External), Availability (Multiple-active-datacenters), Beta-Cluster-Infrastructure, Performance-Team

Oct 30 2017

aaron moved T171071: Perform testing for TLS effect on connection rate from Doing to Blocked on the Performance-Team board.
Oct 30 2017, 9:03 PM · Patch-For-Review, Availability (Multiple-active-datacenters), DBA, Operations, Performance-Team

Oct 26 2017

aaron closed T175418: Create new instances memc05 and memc06 running memcached as Resolved.
Oct 26 2017, 10:26 PM · Release-Engineering-Team (Watching / External), Availability (Multiple-active-datacenters), Beta-Cluster-Infrastructure
aaron closed T175418: Create new instances memc05 and memc06 running memcached, a subtask of T151466: Performance Q2 2017/18 goal: Install and use mcrouter in deployment-prep, as Resolved.
Oct 26 2017, 10:26 PM · Release-Engineering-Team (Watching / External), Availability (Multiple-active-datacenters), Beta-Cluster-Infrastructure, Performance-Team

Oct 24 2017

aaron added a comment to T135261: {{REVISION*}} magic words should not display usernames and timestamps for null edits.

That "cannot merge" message is mostly useless and overly-technical in a Gerrit specific way (e.g. you can't "submit" without "+2", which is obvious anyway). Just look for "merge conflict" on the changeset page or where the patch shows up in listings, since that actually matters and is common.

Oct 24 2017, 1:54 AM · MW-1.31-release-notes (WMF-deploy-2017-11-07 (1.31.0-wmf.7)), MW-1.28-release (WMF-deploy-2016-06-14_(1.28.0-wmf.6)), Patch-For-Review, MediaWiki-Parser
aaron added a comment to T178857: Is the ConfirmAccount extension maintained?.

There have always been a lot of feature requests or bug reports due to misconfiguration/version-mismatch and so on. I don't really have the time anymore (for some time in fact) to sift through and find the serious bugs. When I become aware of one I try to fix it, but if it's not major then I probably won't look at it.

Oct 24 2017, 1:52 AM · MediaWiki-extensions-ConfirmAccount

Oct 23 2017

aaron added a comment to T177073: Split the backend savetiming metric into submetrics.

Actually, I just moved them to https://grafana-admin.wikimedia.org/dashboard/db/backend-save-timing-breakdown?refresh=5m&orgId=1 .

Oct 23 2017, 8:50 PM · MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)), Patch-For-Review, Performance-Team
aaron moved T171071: Perform testing for TLS effect on connection rate from Blocked to Doing on the Performance-Team board.
Oct 23 2017, 8:29 PM · Patch-For-Review, Availability (Multiple-active-datacenters), DBA, Operations, Performance-Team
aaron moved T169249: /usr/local/bin/xenon-generate-svgs and flamegraph.pl cronspam from Doing to Next-up on the Performance-Team board.
Oct 23 2017, 8:29 PM · Patch-For-Review, Performance-Team, Operations
aaron closed T177073: Split the backend savetiming metric into submetrics as Resolved.

They are on the main dashboard. If more are added, it would be good to split them out since the main save timing board is getting long.

Oct 23 2017, 8:09 PM · MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)), Patch-For-Review, Performance-Team
aaron moved T178531: Add statsd metric to WANObjectCache from Inbox to Doing on the Performance-Team board.
Oct 23 2017, 8:08 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Patch-For-Review, MediaWiki-Cache, monitoring, Performance-Team
aaron triaged T178531: Add statsd metric to WANObjectCache as Normal priority.
Oct 23 2017, 8:08 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Patch-For-Review, MediaWiki-Cache, monitoring, Performance-Team

Oct 20 2017

aaron added a comment to T173696: Cache format constraint check results.

Probably hotTTR is way too high. It's really "expected time till refresh given 1 hit/sec". With 50/min, you'd get maybe 2 updates (new values) per regex. I'll put up a patch for that.
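
A back-of-the-envelope Python model of that statement, for intuition only. It assumes each hit triggers a refresh with probability 1/hotTTR, which is a simplification and not the actual WANObjectCache formula; the function names and any values passed in are hypothetical.

```python
def expected_seconds_until_refresh(hot_ttr, hits_per_sec):
    """Expected wait for a refresh if each hit refreshes with chance 1/hot_ttr."""
    chance_per_hit = 1.0 / hot_ttr
    expected_hits = 1.0 / chance_per_hit   # mean of a geometric distribution
    return expected_hits / hits_per_sec


def expected_refreshes(hot_ttr, hits_per_sec, window_sec):
    """Rough number of refreshes to expect within a time window."""
    return window_sec / expected_seconds_until_refresh(hot_ttr, hits_per_sec)


# At exactly 1 hit/sec the expected wait equals hot_ttr; at 50 hits/min
# (about 0.83 hits/sec) it is 1.2 times hot_ttr.
print(expected_seconds_until_refresh(hot_ttr=900, hits_per_sec=50 / 60))
```

Under this model, a key that is hit less often than once per second refreshes less often than hotTTR suggests, so a large hotTTR combined with a modest hit rate yields only a handful of refreshes over the lifetime of a value.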

Oct 20 2017, 4:16 PM · MW-1.31-release-notes (WMF-deploy-2017-10-24 (1.31.0-wmf.5)), MW-1.30-release-notes (WMF-deploy-2017-07-25_(1.30.0-wmf.11)), Patch-For-Review, Wikidata-Former-Sprint-Board, Wikibase-Quality-Constraints, Wikibase-Quality, Wikidata

Oct 19 2017

aaron added a comment to T173696: Cache format constraint check results.

I did a bunch of requests against https://www.wikidata.org/w/api.php?action=wbcheckconstraints&format=json&id=Q42&constraintid=P1476%24F24FF782-E994-4946-BEEC-104CC592534F, which checks a format constraint for “title”. It’s always the same regex and only a handful of different values (17). But while I could see a sharp rise in requests in Grafana corresponding to the times when I sent those requests (permalink), most of them are still cache misses. I’m not sure how to interpret that – it seems values aren’t entering the cache map very often?

Oct 19 2017, 5:45 PM · MW-1.31-release-notes (WMF-deploy-2017-10-24 (1.31.0-wmf.5)), MW-1.30-release-notes (WMF-deploy-2017-07-25_(1.30.0-wmf.11)), Patch-For-Review, Wikidata-Former-Sprint-Board, Wikibase-Quality-Constraints, Wikibase-Quality, Wikidata

Oct 18 2017

Krinkle awarded T178531: Add statsd metric to WANObjectCache an Orange Medal token.
Oct 18 2017, 8:37 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Patch-For-Review, MediaWiki-Cache, monitoring, Performance-Team
aaron created T178531: Add statsd metric to WANObjectCache.
Oct 18 2017, 8:31 PM · MW-1.31-release-notes (WMF-deploy-2017-11-28 (1.31.0-wmf.10)), Patch-For-Review, MediaWiki-Cache, monitoring, Performance-Team
aaron placed T160298: "Special:ActiveUsers" throws database query error with sql_mode=only_full_group_by up for grabs.
Oct 18 2017, 6:21 PM · Patch-For-Review, MW-1.27-release-notes, MW-1.29-release-notes, MW-1.28-release-notes, MW-1.29-release, MW-1.27-release, Technical-Debt, MediaWiki-Special-pages

Oct 17 2017

aaron added a comment to T173696: Cache format constraint check results.

Reopening. This task is supposed to be for caching results in general, which isn’t done yet at all, though we had a lot of discussion on caching regex checks specifically here, which in hindsight should’ve been in a separate task. Also, IMO the regex caching isn’t done yet, since the Grafana stats are pretty unsatisfactory.

(Perhaps we should repurpose this task to be just about regex checking, open a new one for general caching, and reshuffle the parent tasks so that this one is a child of the new task?)

Oct 17 2017, 3:53 PM · MW-1.31-release-notes (WMF-deploy-2017-10-24 (1.31.0-wmf.5)), MW-1.30-release-notes (WMF-deploy-2017-07-25_(1.30.0-wmf.11)), Patch-For-Review, Wikidata-Former-Sprint-Board, Wikibase-Quality-Constraints, Wikibase-Quality, Wikidata

Oct 12 2017

aaron placed T75174: Make PHPUnit tests pass with PHP 5.5/PostgreSQL on Travis CI up for grabs.
Oct 12 2017, 9:38 PM · User-Addshore, MW-1.31-release-notes (WMF-deploy-2017-12-05 (1.31.0-wmf.11)), Patch-For-Review, MW-1.30-release-notes, PostgreSQL, Goal, MediaWiki-Core-Tests

Oct 6 2017

aaron moved T177073: Split the backend savetiming metric into submetrics from Next-up to Doing on the Performance-Team board.
Oct 6 2017, 6:02 PM · MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)), Patch-For-Review, Performance-Team

Oct 5 2017

aaron added a comment to T175672: Make apache/maintenance hosts TLS connections to mariadb work.

We discussed proxies in the last performance meeting and we're OK with that (it would cut down on handshake latency anyway).

Oct 5 2017, 10:06 PM · Performance-Team (Radar), Availability (Multiple-active-datacenters), DBA, Operations
aaron added a comment to T155110: JobRunner transaction fname for Job::run() can mismatch __METHOD__ in a subclass.

JobRunner always starts an LBFactory transaction.

Oct 5 2017, 10:01 PM · MediaWiki-JobQueue
aaron closed T42451: "Transaction already in progress" error in sqlite as Resolved.

This was actually fixed for new installs before that patch by moving the object cache table to a separate DB.

Oct 5 2017, 9:41 PM · Performance-Team, SQLite, MediaWiki-Database
aaron closed T42451: "Transaction already in progress" error in sqlite, a subtask of T72710: StorageException in EditEntityActionTest::testActionForPage (edit-already-exists) and related failures, as Resolved.
Oct 5 2017, 9:41 PM · § Wikidata-Sprint-2015-02-25, Patch-For-Review, Wikidata, MediaWiki-extensions-WikibaseRepository
aaron placed T134811: Consider REST with SSL (HyperSwitch/Cassandra) for session storage up for grabs.
Oct 5 2017, 1:12 AM · Services (blocked), Availability (Multiple-active-datacenters), Operations, Performance-Team

Oct 4 2017

aaron added a comment to T175672: Make apache/maintenance hosts TLS connections to mariadb work.

So what I extract from the errors is that you're trying to connect to db2048 by IP and not by hostname, and the certificates we expose for mysql do not include verification information for the IP address in their SAN. In fact, I don't think we ever did add that info to our certs.

So if we had the hostname instead of the IP in db-codfw.php, it should work. I think performance was a reason for using IPs instead of hostnames there, so we might need to reissue the certificates if we want to keep using IPs. I think that would mean a huge amount of maintenance work for the DBAs.

Oct 4 2017, 8:47 PM · Performance-Team (Radar), Availability (Multiple-active-datacenters), DBA, Operations
aaron added a comment to T175672: Make apache/maintenance hosts TLS connections to mariadb work.

Also, there is https://bugs.php.net/bug.php?id=74445 :)

Oct 4 2017, 8:39 PM · Performance-Team (Radar), Availability (Multiple-active-datacenters), DBA, Operations
aaron updated the task description for T175672: Make apache/maintenance hosts TLS connections to mariadb work.
Oct 4 2017, 8:33 PM · Performance-Team (Radar), Availability (Multiple-active-datacenters), DBA, Operations
aaron renamed T175672: Make apache/maintenance hosts TLS connections to mariadb work from Make client certs available for apache/maintenance hosts for TLS connections to mariadb to Make apache/maintenance hosts TLS connections to mariadb work.
Oct 4 2017, 7:07 PM · Performance-Team (Radar), Availability (Multiple-active-datacenters), DBA, Operations
aaron added a comment to T155110: JobRunner transaction fname for Job::run() can mismatch __METHOD__ in a subclass.

You can always do what extensions/CentralAuth/includes/LocalRenameJob/LocalRenameJob.php does AFAIK.

Oct 4 2017, 6:03 PM · MediaWiki-JobQueue

Oct 3 2017

aaron added a comment to T175672: Make apache/maintenance hosts TLS connections to mariadb work.

Looking at http://php.net/manual/en/mysqli.ssl-set.php, I would think you'd only need to set capath=/etc/ssl/certs, while setting all other parameters to NULL (except maybe cipher, as I have no idea what the actual default cipher list is for mysqli on HHVM).

I tried that first but it yields "SSL connection error: SSL_CTX_set_default_verify_paths failed (10.192.32.108)".

Oct 3 2017, 11:01 PM · Performance-Team (Radar), Availability (Multiple-active-datacenters), DBA, Operations
aaron added a comment to T177017: Re-enable per-filter profiling on wikis where it was disabled.

I'd look for the new method calls that are being reached, whether they show up, and how large their profile is if they do. Note that you can use Ctrl-F on the SVG images to highlight matches in purple.

Oct 3 2017, 9:15 PM · AbuseFilter

Oct 2 2017

aaron created T177258: Update.php fails with postgres due to ip_changes population.
Oct 2 2017, 9:58 PM · Community-Tech, MW-1.31-release-notes (WMF-deploy-2017-12-05 (1.31.0-wmf.11)), MW-1.30-release-notes, MW-1.30-release, PostgreSQL, MediaWiki-Maintenance-scripts, MediaWiki-Installer
aaron added a comment to T177017: Re-enable per-filter profiling on wikis where it was disabled.

I think it's fine to roll out there as long as you watch https://grafana.wikimedia.org/dashboard/db/save-timing?refresh=5m&orgId=1 and, the day after deployment, check the -index.svg flamegraph at https://performance.wikimedia.org/xenon/svgs/daily/ for the day of deployment (current-day values are always useless/incomplete).

Oct 2 2017, 9:45 PM · AbuseFilter
aaron closed T160298: "Special:ActiveUsers" throws database query error with sql_mode=only_full_group_by as Resolved.
Oct 2 2017, 6:09 PM · Patch-For-Review, MW-1.27-release-notes, MW-1.29-release-notes, MW-1.28-release-notes, MW-1.29-release, MW-1.27-release, Technical-Debt, MediaWiki-Special-pages
aaron placed T173450: Setup grafana alert for job error rate up for grabs.
Oct 2 2017, 6:08 PM · Performance-Team

Sep 26 2017

aaron added a comment to T175672: Make apache/maintenance hosts TLS connections to mariadb work.

Looking at http://php.net/manual/en/mysqli.ssl-set.php, I would think you'd only need to set capath=/etc/ssl/certs, while setting all other parameters to NULL (except maybe cipher, as I have no idea what the actual default cipher list is for mysqli on HHVM).

Sep 26 2017, 3:29 PM · Performance-Team (Radar), Availability (Multiple-active-datacenters), DBA, Operations

Sep 20 2017

aaron moved T166199: Add metrics for master queries on HTTP GET/HEAD from Next-up to Doing on the Performance-Team board.
Sep 20 2017, 7:21 PM · MW-1.31-release-notes (WMF-deploy-2017-10-03 (1.31.0-wmf.2)), Performance-Team, Availability (Multiple-active-datacenters)